

An Undergraduate Course on Quantum Computing

Third Edition, plus a section on the 2022 Nobel Prize

Required text for PHYS 150/CSE 109, Spring Quarter 2024

Peter Young,
e-mail: petery@[Link]
University of California Santa Cruz, CA 95064

“Anyone who is not shocked by quantum mechanics hasn’t understood it.”

(Attributed to Niels Bohr)
“I think I can say that nobody understands quantum mechanics.”
(Richard Feynman)

[Cover figures: a quantum gate (controlled-NOT), with control qubit $|x\rangle \to |x\rangle$ and target qubit $|y\rangle \to |y \oplus x\rangle$; the general state of a qubit (the Bloch sphere), with angles $\theta$ and $\phi$.]
Contents

Preface

1 The Strange World of Quantum Mechanics
1.1 Introduction
1.2 The Two-Slit Experiment
1.3 Stern-Gerlach Experiment
1.4 Photons

2 Review of Linear Algebra
2.1 Vectors
2.2 Complex Vectors
2.3 Matrices
2.4 Matrix Diagonalization
2.5 Some Important 2 × 2 matrices
2.6 Properties of Matrices

3 Introduction to Quantum Mechanics
3.1 Quantum States as Complex Vectors
3.2 Phases
3.3 Observables
3.4 The Computational Basis and Change of Basis
3.5 Outer Product Notation
3.6 Functions of operators
3.7 Measurements
3.8 Statistics of Measurements
3.9 Composite Systems
3.10 Generalized Born Rule
3.11 The Uncertainty Principle
3.12 Time Evolution of Quantum States

4 General state of a qubit, no-cloning theorem, entanglement and Bell states
4.1 General qubit states
4.2 No-cloning theorem
4.3 Entanglement and Bell states
4.A Angular Momentum Eigenstates
5 The Density Matrix
5.1 Introduction
5.2 Definition of the Density Matrix
5.2.1 Density matrix of a system in a well defined state
5.2.2 Density matrix of a subsystem when the combined system is in a well defined state
5.3 Determining if a state is entangled
5.4 Some Simple Examples
5.4.1 Example 1:
5.4.2 Example 2:
5.4.3 Example 3:
5.5 Systems not in a single quantum state
5.6 Conclusions
5.A Schmidt Decomposition
5.B Change in the density matrix under a unitary transformation

6 Einstein-Podolsky-Rosen (EPR), Bell’s inequalities, and Local Realism
6.1 Introduction
6.2 An EPR Experiment
6.3 Bell’s Inequality
6.4 Summary
6.A The 2022 Physics Nobel Prize
6.B Information does not propagate faster than the speed of light
6.C The spin-singlet state is isotropic

7 Classical and Quantum Gates
7.1 Classical Gates
7.2 Quantum Gates

8 Generating and measuring Bell States

9 Quantum Functions
9.1 An elementary quantum function
9.2 Quantum Parallelism

10 Deutsch’s Algorithm
10.1 Introduction
10.A An alternative derivation
10.B Derivation of some useful identities in quantum circuits

11 The Bernstein-Vazirani Algorithm
11.1 The Algorithm
11.2 An Alternative Derivation

12 Simon’s Algorithm

13 Factoring and RSA (Rivest-Shamir-Adleman) Encryption
13.A The Euclidean Algorithm
13.B Extension of the Euclidean Algorithm to find an inverse modulo an integer

14 Using Period Finding to Factor an Integer
15 The Fourier Transform and the Fast Fourier Transform (FFT)
15.1 Introduction
15.2 The Discrete Fourier Transform
15.A The Fast Fourier Transform; an example with N = 8
15.B Beyond N = 8
15.C The General Case

16 The Quantum Fourier Transform (QFT)
16.1 Introduction
16.2 QFT with two qubits
16.3 QFT with three or more qubits
16.4 The Phase Estimation Algorithm
16.A Comparison between FFT and the QFT for N = 4
16.B Comparison of the FFT and QFT for N = 8 and generalization to larger N

17 Shor’s Algorithm
17.1 Introduction
17.2 Modular Exponentiation
17.3 Quantum Fourier Transform (QFT)
17.4 A special case: the period r is a power of 2.
17.5 The general case: the period is not a power of 2.
17.6 An example
17.7 Summary
17.A Continued Fractions
17.B Eliminating the two-qubit gates
17.C Unimportance of Small Phase Errors

18 Coherent Superposition Versus Incoherent Addition of Probabilities
18.1 Coherent Linear Superposition: 1 qubit
18.2 Incoherent (Classical) Addition of Probabilities
18.2.1 Example with 1 qubit
18.2.2 Example with 2 qubits
18.3 Summary

19 Quantum Error Correction
19.1 Introduction
19.2 Correcting bit flip errors
19.3 Stabilizer formalism
19.4 Phase Flip Code
19.5 General Errors and the Effects of the Environment
19.6 Correcting Arbitrary Errors: the 9-qubit Shor code
19.7 Other error-correcting codes
19.7.1 The 5-qubit code
19.7.2 The Steane 7-qubit code
19.7.3 Surface Codes
19.8 Fault Tolerant Quantum Computing
19.A Summary of Quantum Error Correction

20 Grover’s Search Algorithm
20.1 Introduction
20.2 The Black Box (Oracle)
20.3 The second step of the Grover iteration
20.4 Subsequent iterations
20.5 Extensions
20.5.1 More than one special value
20.5.2 Quantum Counting

21 Quantum Protocols Using Photons
21.1 Quantum Key Distribution
21.1.1 BB84 protocol
21.1.2 BB92 protocol

22 Epilogue: Quantum Simulators

Bibliography

Index

Preface

This material has been given as a one-quarter course for undergraduates in the physical sciences at
the University of California Santa Cruz.
In order that the course be accessible to majors other than physics, the rules of quantum mechanics were taught from scratch in the first part. Some of my physics colleagues were surprised that this could be done, but it is perfectly feasible. Much of a traditional physics course on quantum mechanics concerns continuous degrees of freedom, so one has to cover complicated topics such as partial differential equations, boundary conditions, angular momentum, and a plethora of special functions. All this can be omitted in a quantum computing course, which focuses on 2-state systems. A solid background in linear algebra is required. A brief review of this is given at the start, but the treatment is fast and it is assumed that the students will have seen the material before.
The aim of the course is to get students to the level where they can understand the two most important topics covered: Shor’s algorithm in Chapter 17 and quantum error correction in Chapter 19. Unlike quantum algorithms proposed previously, Shor’s algorithm for factoring integers gives a spectacular speedup on a problem of practical importance (encryption of data sent down a public channel). Considerable experimental challenges remain before Shor’s algorithm can be implemented for a large number of qubits, and quantum error correction will be essential to achieve this, because qubits are highly susceptible to noise. Incorporating quantum error correction still leaves huge experimental challenges before achieving the goal of factoring integers larger than what is possible classically, but without quantum error correction it would clearly be impossible.
The goal, then, is to present a course at the undergraduate level, but which still goes into enough
depth to give a good understanding of Shor’s algorithm and the basics of quantum error correction.
No details will be given on the many experimental approaches to building a quantum computer, which
is a huge topic that would merit a separate course in its own right.
There are, of course, excellent more advanced texts, such as the monumental classic by Nielsen and Chuang [NC00] and the books by Mermin [Mer07] and Rieffel and Polak [RP14]. The book closest in level and spirit to the present text is the one by Vathsan [Vat16], which I found very useful when preparing this material. Whereas these books, and mine, focus mainly or entirely on theory, the book by LaPierre [LaP21] also devotes a substantial amount of material to experimental implementations of quantum computers.
My hope is that this text will take students to a level where they can follow the rapidly moving advanced literature in the field.

Peter Young
University of California, Santa Cruz
April 2, 2024

Chapter 1

The Strange World of Quantum Mechanics

1.1 Introduction
The quantum world is strange, and different from the classical world that we see around us. Our intuition, gained from everyday experience, applies to objects that we can see. It does not apply to the quantum world, where we are dealing with very small objects, objects that (in most cases) are too small to see. I give two quotations from eminent physicists which illustrate the strangeness of the quantum world:

“Anyone who is not shocked by quantum mechanics hasn’t understood it.”

(Attributed to Niels Bohr).

“I think I can say that nobody understands quantum mechanics.”

(Richard Feynman).

The big question which we will address in this course is whether we can exploit the difference between the quantum and classical worlds to find more efficient algorithms for certain problems, by processing the data in a quantum computer according to quantum rules rather than classical rules. We shall see that for some problems the answer is “yes”. I should mention now that there is a practical question of whether we can actually build a useful quantum computer. The difficulties of building such a device have not yet been overcome, though much progress has been made. In this course, which focuses on theory, we will not describe the many experimental approaches that are being pursued to achieve this goal. However, we will discuss in Chapter 19 how one can reduce errors caused by an imperfect device, a topic called “Quantum Error Correction”.
A quantum computer, then, is one in which data is processed by quantum, rather than classical, rules. What do we mean by this? In a classical computer the data is stored in bits, which take two values, 0 and 1. A quantum computer also uses 2-state systems, called qubits. We indicate these two states by $|0\rangle$ and $|1\rangle$, a notation introduced by the physicist Paul Dirac. The difference from classical bits is that the general state of a qubit, which we will write as $|\psi\rangle$, is a superposition of states $|0\rangle$ and $|1\rangle$:

$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle, \tag{1.1}$$

where $\alpha$ and $\beta$ are numbers (complex in general). For reasons that will be explained later, we need the condition $|\alpha|^2 + |\beta|^2 = 1$. One sometimes says loosely that a qubit in the state described by Eq. (1.1) is simultaneously in states $|0\rangle$ and $|1\rangle$. This is to be contrasted with a classical bit, which takes the value 0 or 1.
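To make Eq. (1.1) concrete, here is a minimal sketch that stores a qubit state as a two-component complex NumPy array. It assumes the common convention that $|0\rangle$ and $|1\rangle$ are the column vectors $(1, 0)$ and $(0, 1)$. The function names `qubit_state` and `outcome_probabilities` are hypothetical helpers, not from any library. The probabilities $|\alpha|^2$ and $|\beta|^2$ anticipate the rule, stated later, that a probability is the squared modulus of an amplitude.

```python
import numpy as np

# Computational basis states |0> and |1> as vectors (a common convention).
KET0 = np.array([1, 0], dtype=complex)
KET1 = np.array([0, 1], dtype=complex)


def qubit_state(alpha: complex, beta: complex) -> np.ndarray:
    """Return |psi> = alpha|0> + beta|1>, checking |alpha|^2 + |beta|^2 = 1."""
    psi = alpha * KET0 + beta * KET1
    norm_sq = np.vdot(psi, psi).real  # vdot conjugates its first argument
    if not np.isclose(norm_sq, 1.0):
        raise ValueError(f"state is not normalized: |alpha|^2 + |beta|^2 = {norm_sq}")
    return psi


def outcome_probabilities(psi: np.ndarray) -> np.ndarray:
    """Probabilities of finding |0> and |1>: squared moduli of the amplitudes."""
    return np.abs(psi) ** 2


# Equal-magnitude amplitudes with a complex relative phase.
psi = qubit_state(1 / np.sqrt(2), 1j / np.sqrt(2))
print(outcome_probabilities(psi))  # approximately [0.5 0.5]
```

The complex amplitude $i/\sqrt{2}$ gives the same probability as $1/\sqrt{2}$. Phases matter only when amplitudes are combined, as in the interference experiments that follow.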


Our main goal in this course will be to see if one can gain computationally from superposition
states.
In the next two sections of this chapter I describe experiments which illustrate the strangeness of
the quantum world. More information on this topic can be found in Refs. [NC00, Mer07, Vat16] and
in Ch. 1, Vol. 3 of the Feynman Lectures on Physics [FLS64].

1.2 The Two-Slit Experiment

You are probably familiar with experiments involving light going through slits which demonstrate that
light, being a wave, shows interference.
First consider just one slit. If the slit width d is very large compared with the wavelength of light
λ (the geometrical optics limit) then, to a good approximation, the light continues in a straight line.
However, if the slit width is comparable to, or less than, λ, the light spreads out after passing through
the slit, which is called diffraction. Figure 1.1 sketches the intensity of light observed on a screen
behind the slit.
Figure 1.1: A beam of light spreads out (diffracts) when passing through a slit of width d which is comparable to, or smaller than, the wavelength of the light λ. The figure shows a sketch of the intensity of the beam on a screen after it has passed through the slit.

Figure 1.2: A two-slit experiment. Interference fringes, oscillations of strong and weak intensity, are seen due to constructive and destructive interference. The overall envelope of the intensity has a similar form to that from a single slit shown in Fig. 1.1.

If the light beam passes through two slits, as shown in Fig. 1.2, one observes interference fringes, oscillations of strong and weak intensity, due to interference between the beams going through the two slits. If the difference in path length $|r_1 - r_2|$ (see Fig. 1.3) satisfies $|r_1 - r_2| = n\lambda$ (for integer $n$), one has constructive interference and a maximum intensity. If instead $|r_1 - r_2| = (n + \tfrac{1}{2})\lambda$, one has a minimum intensity. Hence, as one moves along the screen, one alternately gets regions of low intensity and high intensity. These are called interference fringes.
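The fringe condition can be checked numerically. This is a minimal sketch, assuming two waves of equal (unit) amplitude whose relative phase is $2\pi|r_1 - r_2|/\lambda$. The function name `two_slit_intensity` and the 500 nm wavelength are illustrative choices. The combined intensity $|1 + e^{i\phi}|^2 = 4\cos^2(\phi/2)$ is maximal when the path difference is $n\lambda$ and zero when it is $(n + \tfrac{1}{2})\lambda$.

```python
import numpy as np


def two_slit_intensity(path_difference: float, wavelength: float) -> float:
    """Intensity of two unit-amplitude waves with the given path difference.

    The relative phase is 2*pi*path_difference/wavelength, and the result is
    |1 + e^{i*phase}|^2, which ranges from 0 (destructive) to 4 (constructive).
    """
    phase = 2 * np.pi * path_difference / wavelength
    amplitude = 1 + np.exp(1j * phase)
    return float(abs(amplitude) ** 2)


wavelength = 500e-9  # 500 nm, an arbitrary illustrative value
for n in range(3):
    bright = two_slit_intensity(n * wavelength, wavelength)
    dark = two_slit_intensity((n + 0.5) * wavelength, wavelength)
    print(f"n={n}: n*lambda -> {bright:.3f}, (n+1/2)*lambda -> {dark:.3f}")
```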
This is the classical picture. That is, we shine a beam at the two slits, some of it goes through one
slit, some goes through the other slit, and when these two beams recombine they interfere.

Figure 1.3: The difference in the length of the paths taken by the beams going through the two slits is $|r_1 - r_2|$. This varies as a function of the location on the screen, so the interference changes from constructive, where $|r_1 - r_2| = n\lambda$, to destructive, where $|r_1 - r_2| = (n + \tfrac{1}{2})\lambda$, with $n$ an integer.

Now we reduce the intensity of the light. At some point we notice that light is not a continuous
wave but consists of discrete bunches of energy called photons. To detect individual photons, we place
an array of photon counters on the screen and count the number of discrete clicks in each counter, see
Fig. 1.4. We record the number of clicks for counters placed at different points on the screen.


Figure 1.4: The two slit experiment where individual photons are detected by an array of counters
which count the number of photons at their location.

Suppose we reduce the intensity so much that the time between emitting photons is greater than
the time it takes a photon to pass through the experimental setup, i.e. the photons go through one at
a time. Do we see an interference pattern? Using our classical intuition we would say “no” because
surely each photon “must” either go through the upper slit or the lower one and can therefore not
interfere with itself. In other words we would expect the intensity of clicks in the counters to vary
smoothly along the screen, as in the classical single slit experiment shown in Fig. 1.1.
Amazingly, this is not so, and we do see an interference pattern. In other words, the number of clicks in the counters varies rapidly and in an oscillatory manner as we move along the screen, just as in the classical two-slit experiment shown in Fig. 1.2. It looks as though a single photon does go through both slits. You may already be feeling (correctly) that this looks suspiciously like a superposition state such as the one we wrote down in Eq. (1.1). Here $|0\rangle$ refers to the photon going through the upper slit and $|1\rangle$ to the photon going through the lower slit.
You might ask, “why don’t we just look and see which slit the photon went through?” Well, photons, being electrically neutral, are hard to observe unless we absorb them (which we want to do only when they reach the screen), and the rate of scattering of one photon by another is immeasurably small. So, with photons we can’t observe which slit they went through. However, we can do the same experiment with electrons rather than photons. Like photons, electrons have both particle-like and wave-like properties, but, being charged, they readily scatter light, so we can observe them by shining light on them. The discussion which follows is based on Ch. 1, Vol. 3 of Feynman [FLS64].
In this new version of the experiment we send electrons through the slits one at a time. To see
which slit they went through we shine light of wavelength λ at the slits and observe a flash of light
every time an electron goes through.
Suppose that we choose a light source that has a wavelength λ which is bigger than the slit spacing d. We do see a flash every time an electron passes through, and observe that there is still an interference pattern. However, the flash of light has a size of order λ, which is greater than the separation of the slits, so we can’t tell which slit the electron went through. Clearly we need to use a light source with wavelength less than d. When we do this, we indeed see a flash at either the upper slit or the lower slit every time an electron passes, so we’ve achieved our goal of observing which slit each electron goes through. But alas, when we look at the counts registered on the detectors, we see that the interference fringes have been washed out, and we have just a smooth variation in the number of clicks along the screen. Observations such as these show that it is not possible both to determine which slit each electron goes through and to observe interference fringes.
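The difference between the two outcomes comes down to how the contributions of the two slits are combined. This is a minimal sketch under a simplifying assumption: each slit contributes a complex amplitude of equal magnitude, with a relative phase that varies along the screen. The variable names are illustrative. Without which-slit information, the amplitudes are added before squaring, which produces fringes. Once the slit is observed, the probabilities of the two alternatives are added instead, and the fringes disappear.

```python
import numpy as np

# Relative phase between the two paths at a set of points along the screen.
phases = np.linspace(0, 4 * np.pi, 9)

# Equal-magnitude amplitudes for "through upper slit" and "through lower slit".
a_upper = np.full(phases.shape, 1 / np.sqrt(2), dtype=complex)
a_lower = np.exp(1j * phases) / np.sqrt(2)

# No which-slit measurement: add amplitudes, then take the squared modulus.
rate_interfering = np.abs(a_upper + a_lower) ** 2

# Which-slit measurement: add the probabilities of the two alternatives.
rate_observed = np.abs(a_upper) ** 2 + np.abs(a_lower) ** 2

print(np.round(rate_interfering, 3))  # oscillates between 2 and 0: fringes
print(np.round(rate_observed, 3))     # 1 everywhere: no fringes
```

These are relative click rates at each point, not a probability distribution normalized over the whole screen.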
This observation guides us to a second piece of intuition regarding quantum mechanics (the first, mentioned above, is that a quantum system can be in a superposition state), namely that a measurement can unavoidably change a quantum state, and in particular can destroy a superposition.
Classically, measurements are passive, and can be done in a delicate way so they simply reveal a
reality which is already present whether we observe it or not. Quantum mechanically, measurements
play a much more active role and can change the state of the system. In particular, we shall see that
if we observe a system in a particular state, we can’t necessarily say that it was in that state before
the measurement.

1.3 Stern-Gerlach Experiment

We will now discuss a second experiment which gives additional insight into superposition states.
Consider the hydrogen atom, which consists of one proton (the nucleus), which has a positive electric charge, and one electron, which has a negative charge. In its ground state the electron has a symmetric distribution of velocities, so there is no net circulating electric current around the proton. Hence the orbital motion of the electron does not give rise to a magnetic moment which could interact with an external magnetic field. However, the electron has an internal state, called spin, which does give rise to a magnetic moment $\vec{\mu}$,¹ proportional to the spin angular momentum.

¹ The proton also has a spin and hence a magnetic moment but, because of its much larger mass, its magnetic moment is much smaller than that of the electron and so does not play a role in our discussion.

There is a force on a magnetic moment in a field if the field is non-uniform. To see this, recall that the energy of a magnetic moment in a magnetic field $\vec{B}$ is $-\vec{\mu}\cdot\vec{B}$. The force is minus the spatial gradient of the energy, so it is given by

$$\vec{F} = \vec{\nabla}\left(\vec{\mu}\cdot\vec{B}\right), \tag{1.2}$$

so

$$F_z = \vec{\mu}\cdot\frac{d\vec{B}}{dz}, \tag{1.3}$$

where we have assumed, without loss of generality, that the field changes as a function of $z$. Hence a beam of hydrogen atoms in a non-uniform field varying in the $z$-direction will be deflected in the $z$-direction. For simplicity we assume that the field itself is also (predominantly) along the $z$-direction, see Fig. 1.5, so

$$F_z = \mu_z \frac{dB_z}{dz}, \tag{1.4}$$

and hence the deflection will be proportional to $\mu_z$.
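Equation (1.4) can be checked with a finite-difference gradient. This is a minimal sketch assuming a hypothetical field $\vec{B} = (0, 0, B_0 + g z)$ that points along $z$ and grows linearly with $z$. The values of $B_0$, $g$ and $\vec{\mu}$ are arbitrary and illustrative. The numerical $F_z = -dE/dz$, with $E = -\vec{\mu}\cdot\vec{B}$, should equal $\mu_z\, dB_z/dz = \mu_z g$, whatever the values of $\mu_x$ and $\mu_y$.

```python
import numpy as np

B0 = 1.0    # field at z = 0 (arbitrary units, illustrative)
GRAD = 0.5  # dB_z/dz, the field gradient (arbitrary units, illustrative)


def field(z: float) -> np.ndarray:
    """Hypothetical non-uniform field along z, varying linearly with z."""
    return np.array([0.0, 0.0, B0 + GRAD * z])


def energy(mu: np.ndarray, z: float) -> float:
    """Energy of a magnetic moment mu in the field at height z: E = -mu . B."""
    return -float(np.dot(mu, field(z)))


def force_z(mu: np.ndarray, z: float, h: float = 1e-6) -> float:
    """F_z = -dE/dz, estimated with a central finite difference of step h."""
    return -(energy(mu, z + h) - energy(mu, z - h)) / (2 * h)


mu = np.array([0.3, -0.2, 0.8])
print(force_z(mu, z=0.0), mu[2] * GRAD)  # both approximately 0.4
```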

Figure 1.5: A cross section of the magnet in the Stern-Gerlach experiment. The beam goes between the poles of the magnet, into the plane, i.e. in the y-direction, and intersects the symmetry axis (which is in the z-direction and shown by the dashed line).

Figure 1.6: The Stern-Gerlach apparatus. The two panels each show the beam passing through the magnet onto a screen, labeled “expected classically” and “observed”.

We send a beam of unpolarized hydrogen atoms into a non-uniform field. This is the famous Stern-Gerlach (SG) experiment. Since the direction of $\vec{\mu}$ is random, classically $\mu_z$ takes a range of values, so we would expect a continuous range of deflections. However, it is found that only two beams emerge, which are deflected in opposite directions, see Fig. 1.6. Since $\vec{\mu}$ is proportional to the spin, it seems that the spin component along $z$ has only two possible values. These correspond to states which we might label as² $|{\uparrow_z}\rangle$ and $|{\downarrow_z}\rangle$, or alternatively as $|0\rangle$ and $|1\rangle$ respectively.
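The classical expectation can be made concrete with a short simulation. This is a minimal sketch of the classical model only, not the quantum result. It assumes that each atom's moment has a fixed magnitude and a direction drawn uniformly over the sphere. The constants are illustrative. In that model $\mu_z = \mu\cos\theta$ is spread uniformly between $-\mu$ and $+\mu$, so the deflections would fill a continuous band rather than form two beams.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
MU = 1.0          # magnitude of the moment (arbitrary units)
N_ATOMS = 100_000

# Random directions, uniform over the sphere: normalize 3D Gaussian vectors.
directions = rng.normal(size=(N_ATOMS, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
mu_z = MU * directions[:, 2]

# Classically, mu_z (and hence the deflection) takes a continuous range of values.
counts, edges = np.histogram(mu_z, bins=10, range=(-MU, MU))
print(counts)  # roughly equal counts in every bin: a flat, continuous spread

# The Stern-Gerlach experiment instead finds only two deflections.
```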
Now suppose that we orient the magnet so that the field and its gradient are in the $x$-direction. Again we will see two beams emerging, indicating that $\mu_x$ has only two possible values, corresponding to states $|{\uparrow_x}\rangle$ and $|{\downarrow_x}\rangle$. How are $|{\uparrow_x}\rangle$ and $|{\downarrow_x}\rangle$ related to $|{\uparrow_z}\rangle$ and $|{\downarrow_z}\rangle$? We can get an idea of this if we run our beam first through an SG setup with the field in the $z$-direction. We then pass one of the resulting beams through an SG setup in the $x$-direction, as shown in Fig. 1.7. The final result is found to be two beams of equal intensity.

[Figure 1.7 schematic. Upper: separate SGz and SGx apparatuses, each splitting a beam into two. Lower: a beam of intensity I enters SGz. One output beam (I/2) is blocked, and the other (I/2) enters SGx, which splits it into two beams of intensity I/4 each.]

Figure 1.7: The upper figure shows schematically separate Stern-Gerlach experiments with the field in the z-direction (SGz) and in the x-direction (SGx). The lower figure shows a double Stern-Gerlach experiment. The beam is passed first through an SG apparatus with a field in the z-direction, and then one of the beams is passed through an SG apparatus with the field in the x-direction.

$$|{\uparrow_z}\rangle = \frac{1}{\sqrt{2}}\left(|{\uparrow_x}\rangle + |{\downarrow_x}\rangle\right), \tag{1.5}$$

where we say that there is an amplitude³ $1/\sqrt{2}$ for $|{\uparrow_z}\rangle$ to be $|{\uparrow_x}\rangle$ and amplitude $1/\sqrt{2}$ for it to be $|{\downarrow_x}\rangle$. As we shall also see later, the probability that a measurement gives a certain result is the square of the modulus of the corresponding amplitude.⁴ So the probability of measuring $|{\uparrow_x}\rangle$ after the SGx apparatus is 1/2 (as observed), and the same for $|{\downarrow_x}\rangle$.
It is also true that

$$|{\uparrow_x}\rangle = \frac{1}{\sqrt{2}}\left(|{\uparrow_z}\rangle + |{\downarrow_z}\rangle\right), \tag{1.6}$$

so if we run one of the beams from the SGx apparatus in Fig. 1.7 through another SGz apparatus, we will get beams of equal intensity for $|{\uparrow_z}\rangle$ and $|{\downarrow_z}\rangle$, see Fig. 1.8. Note a surprising aspect of this result. After the first SGz apparatus, there is zero probability of getting $|{\downarrow_z}\rangle$ (because we blocked it off), but after the SGx apparatus there is a 50% probability of finding $|{\downarrow_z}\rangle$. In other words, a non-zero probability of getting $|{\downarrow_z}\rangle$ has been generated by the measurement. This is a clear example of a measurement (in this case that done by the SGx apparatus) affecting the state of the system.
² The electron spin is the simplest two-state system.
³ Sometimes called a probability amplitude.
⁴ The fact that probabilities add up to 1 is why $|\alpha|^2 + |\beta|^2 = 1$ in Eq. (1.1).
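[Figure 1.8 schematic: a beam of intensity I enters SGz. The lower beam (I/2) is blocked, giving zero probability of $|{\downarrow_z}\rangle$. The upper beam (I/2) enters SGx. Its lower output (I/4) is blocked, and its upper output (I/4) enters a second SGz, which produces two beams of intensity I/8 each, i.e. probability 1/2 of $|{\downarrow_z}\rangle$.]

Figure 1.8: We now add another SGz apparatus after the SGx apparatus in Fig. 1.7. The result is equal intensity in the beams for $|{\uparrow_z}\rangle$ and $|{\downarrow_z}\rangle$. After each SG apparatus the upper line is for the “up” spin and the lower line for the “down” spin.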
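The beam intensities in Figs. 1.7 and 1.8 can be reproduced from Eqs. (1.5) and (1.6). This is a minimal sketch assuming $|{\uparrow_z}\rangle = (1, 0)$, $|{\downarrow_z}\rangle = (0, 1)$, and $(1, \pm 1)/\sqrt{2}$ for the two $x$-spin states, consistent with Eq. (1.6). `stern_gerlach` is a hypothetical helper. Each apparatus sends a fraction $|\langle\text{basis}|\text{state}\rangle|^2$ of the incoming beam into each output, and each outgoing beam is left in the corresponding basis state. The unpolarized input to the first SGz cannot be written as a single state vector, so its 50/50 split is entered by hand.

```python
import numpy as np

UP_Z = np.array([1, 0], dtype=complex)
DOWN_Z = np.array([0, 1], dtype=complex)
UP_X = (UP_Z + DOWN_Z) / np.sqrt(2)    # Eq. (1.6)
DOWN_X = (UP_Z - DOWN_Z) / np.sqrt(2)


def stern_gerlach(state, intensity, up, down):
    """Split a beam into the 'up' and 'down' outputs of an SG apparatus.

    Returns [(intensity, outgoing state), ...], one pair per output beam.
    The fraction of the beam in each output is |<basis|state>|^2, and the
    outgoing beam is in that basis state.
    """
    beams = []
    for basis in (up, down):
        amplitude = np.vdot(basis, state)  # <basis|state>
        beams.append((intensity * abs(amplitude) ** 2, basis))
    return beams


# Input intensity I = 1. After the first SG_z, the down_z beam is blocked,
# leaving I/2 in state up_z (entered by hand, see above).
(i_up_x, state_up_x), (i_down_x, _) = stern_gerlach(UP_Z, 0.5, UP_X, DOWN_X)
# The down_x beam is blocked; the up_x beam enters a second SG_z.
(i_up_z, _), (i_down_z, _) = stern_gerlach(state_up_x, i_up_x, UP_Z, DOWN_Z)
print(i_up_x, i_down_x)  # both approximately I/4
print(i_up_z, i_down_z)  # both approximately I/8: down_z has reappeared
```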

1.4 Photons
In the previous section we noted that the spin of the electron is a two-state quantum system. Here we
discuss another two-state quantum system, the photon, the quantum of light.
Light is an oscillating transverse electromagnetic field. The electric field $\vec{E}$ and magnetic field $\vec{B}$ are perpendicular both to each other and to the direction of propagation specified by the wavevector $\vec{k}$. For example, if $\vec{E}$ is in the $x$ direction, $\vec{B}$ in the $y$ direction, and $\vec{k}$ in the $z$ direction, we have⁵

$$\vec{E} = E_0\,\hat{x}\,e^{i(kz-\omega t)}, \qquad \vec{B} = B_0\,\hat{y}\,e^{i(kz-\omega t)}. \tag{1.7}$$
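The direction of $\vec{E}$ is called the polarization direction. There are two distinct polarizations, which we can call “horizontal” (along $\hat{x}$),

$$|{\leftrightarrow}\rangle, \quad \text{equivalent to} \quad |{\uparrow_z}\rangle \equiv |0\rangle, \tag{1.8}$$

and “vertical” (along $\hat{y}$),

$$|{\updownarrow}\rangle, \quad \text{equivalent to} \quad |{\downarrow_z}\rangle \equiv |1\rangle. \tag{1.9}$$

What are the analogs of $|{\uparrow_x}\rangle$ and $|{\downarrow_x}\rangle$? The answer is diagonal polarizations:

$$\begin{aligned}
|{\diagup}\rangle &\equiv \frac{1}{\sqrt{2}}\left(|{\updownarrow}\rangle + |{\leftrightarrow}\rangle\right), \quad \text{equivalent to} \quad |{\uparrow_x}\rangle \equiv \frac{1}{\sqrt{2}}\left(|0\rangle + |1\rangle\right), \\
|{\diagdown}\rangle &\equiv \frac{1}{\sqrt{2}}\left(|{\updownarrow}\rangle - |{\leftrightarrow}\rangle\right), \quad \text{equivalent to} \quad |{\downarrow_x}\rangle \equiv \frac{1}{\sqrt{2}}\left(|0\rangle - |1\rangle\right).
\end{aligned} \tag{1.10}$$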
More details on the correspondence between photon polarization and qubit states will be given in
Sec. 4.1.
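Here is a quick numerical check of Eqs. (1.8) to (1.10). It is a minimal sketch assuming horizontal and vertical polarization are represented by $(1, 0)$ and $(0, 1)$; the variable names are illustrative. The two diagonal states are orthonormal, and each matches the corresponding $x$-spin state up to an overall factor. For the second pair that factor is $-1$, because (vertical minus horizontal) is $|1\rangle - |0\rangle$. An overall factor of this kind does not change any measurement probability.

```python
import numpy as np

H = np.array([1, 0], dtype=complex)  # horizontal polarization, identified with |0>
V = np.array([0, 1], dtype=complex)  # vertical polarization, identified with |1>

DIAG_PLUS = (V + H) / np.sqrt(2)     # first diagonal state in Eq. (1.10)
DIAG_MINUS = (V - H) / np.sqrt(2)    # second diagonal state in Eq. (1.10)

UP_X = (H + V) / np.sqrt(2)          # |up_x> = (|0> + |1>)/sqrt(2)
DOWN_X = (H - V) / np.sqrt(2)        # |down_x> = (|0> - |1>)/sqrt(2)

# The two diagonal polarizations form an orthonormal pair.
print(np.vdot(DIAG_PLUS, DIAG_PLUS).real, np.vdot(DIAG_MINUS, DIAG_MINUS).real)
print(abs(np.vdot(DIAG_PLUS, DIAG_MINUS)))

# |<a|b>| = 1 means the states agree up to an overall phase factor.
print(abs(np.vdot(UP_X, DIAG_PLUS)), abs(np.vdot(DOWN_X, DIAG_MINUS)))
print(np.allclose(DIAG_MINUS, -DOWN_X))  # True: they differ by an overall sign
```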
Photons do not interact with each other to a measurable extent, and cannot readily be stored, so they are unsuitable for most types of quantum computer. However, they have the advantage that they can be transmitted over great distances down optical fibers while preserving their polarization. These properties will be useful for some quantum protocols to be discussed in Chapter 21.
⁵ We understand that the physical fields are the real parts of these expressions.
Chapter 2

Review of Linear Algebra

The theory of quantum mechanics is based on linear algebra, which is a prerequisite for the course and is standard material available in many books. In this chapter we summarize those topics in linear algebra which will be needed for this class. The treatment is brief and is intended as a review for students who have seen the material before.

2.1 Vectors
An abstract vector $\vec v$ can be represented in terms of its $N$ components $v_i$ ($i = 1, \ldots, N$),

$$\vec v = \sum_{i=1}^N v_i\,\hat e_i, \tag{2.1}$$

with respect to a set of basis vectors $\hat e_i$, which form an orthonormal set, i.e.

$$\hat e_i\cdot\hat e_j = \delta_{ij}, \tag{2.2}$$

where the left-hand side is a scalar product

$$\vec a\cdot\vec b = \sum_{i=1}^N a_i b_i, \tag{2.3}$$

and $\delta_{ij}$ is the Kronecker delta function,

$$\delta_{ij} = \begin{cases} 1 & (i = j),\\ 0 & (i \neq j). \end{cases} \tag{2.4}$$

We say that a vector $\vec v$ is normalized if $\vec v\cdot\vec v = 1$, and that two vectors $\vec a$ and $\vec b$ are orthogonal if $\vec a\cdot\vec b = 0$. A set of vectors is said to be orthonormal if each is normalized and every pair is orthogonal. The number of independent basis vectors required to represent any vector is called the dimension (size) of the “vector space”. It is denoted here by $N$.
The vector $\vec v$ can be represented in terms of its components $v_i$ as a column vector

$$\vec v = \begin{pmatrix} v_1\\ v_2\\ \vdots\\ v_N \end{pmatrix}, \tag{2.5}$$


and its transpose as a row vector

$$\vec v^{\,T} = (v_1\;\; v_2\;\; \cdots\;\; v_N). \tag{2.6}$$

The length of a vector is given by

$$|v| = \left(\sum_{i=1}^N v_i^2\right)^{1/2} = (\vec v\cdot\vec v)^{1/2}. \tag{2.7}$$

One can represent a vector with respect to different orthonormal bases rotated with respect to each other. If a vector has components $v_i'$ with respect to the new basis, there is a linear relation between the old and new components,

$$v_i' = \sum_{j=1}^N M_{ij} v_j, \tag{2.8}$$

where $M$ is an $N\times N$ matrix with elements $M_{ij}$. In order that $M$ describe a rotation (which preserves lengths of vectors and angles between them), it is necessary that $M$ be an orthogonal matrix, i.e.

$$M^{-1} = M^T, \tag{2.9}$$

where $M^T$ is the transpose matrix, and $M^{-1}$ is the matrix inverse, which means that $M^{-1}M = MM^{-1} = \mathbb{1}$, where $\mathbb{1}$ is the identity matrix. An example of a rotation matrix for two-component vectors is

$$M = \begin{pmatrix} \cos\theta & \sin\theta\\ -\sin\theta & \cos\theta \end{pmatrix}, \tag{2.10}$$

where $\theta$ is the rotation angle.
The scalar product of two vectors is independent of basis, so

$$\vec a\cdot\vec b = \sum_{i=1}^N a_i b_i = \sum_{i=1}^N a_i' b_i'. \tag{2.11}$$

This is why $\vec a\cdot\vec b$ is called a scalar product.
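A short NumPy check, offered as a sketch (the rotation angle and the two vectors are arbitrary choices), that the matrix of Eq. (2.10) is orthogonal and that transforming components by Eq. (2.8) leaves the scalar product unchanged, as Eq. (2.11) states:

```python
import numpy as np

def rotation(theta):
    # Two-component rotation matrix of Eq. (2.10)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [-s, c]])

M = rotation(0.3)
a = np.array([1.0, 2.0])
b = np.array([-0.5, 4.0])

# Eq. (2.9): M is orthogonal, M^T M = 1
print(np.allclose(M.T @ M, np.eye(2)))     # True

# Eq. (2.8): components in the rotated basis
a_new, b_new = M @ a, M @ b

# Eq. (2.11): the scalar product does not depend on the basis
print(np.isclose(a @ b, a_new @ b_new))    # True
```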

2.2 Complex Vectors

In quantum mechanics, we need complex vectors, i.e. vectors with complex coefficients. The main new feature compared with real vectors is a slight difference in the definition of the scalar product, namely one takes the complex conjugate of the left-hand vector, i.e.

$$\vec a\cdot\vec b = \sum_{i=1}^N a_i^* b_i. \tag{2.12}$$

In terms of the rules for matrix multiplication, one can view the scalar product as the matrix product of the complex conjugate of the transpose vector (row vector) for $a$ with the vector (column vector) for $b$, i.e.

$$\vec a\cdot\vec b \equiv a^{T*} b = (a_1^*\;\; a_2^*\;\; \cdots\;\; a_N^*)\begin{pmatrix} b_1\\ b_2\\ \vdots\\ b_N \end{pmatrix}, \tag{2.13}$$

in which $a^{T*}$ is a $1\times N$ matrix, $b$ is an $N\times 1$ matrix, and $a^{T*}b$ denotes matrix multiplication, with the result being a single number (a scalar).
The length of a complex vector, called the norm $|a|$ from now on, is still the square root of the scalar product of the vector with itself, i.e.

$$|a| = (\vec a\cdot\vec a)^{1/2} = \left(\sum_{i=1}^N |a_i|^2\right)^{1/2}. \tag{2.14}$$
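The complex scalar product is easy to get wrong in code. In NumPy (our choice of tool, not the text's), `np.vdot` conjugates its first argument, which matches Eq. (2.12). A minimal sketch with arbitrary example vectors:

```python
import numpy as np

a = np.array([1 + 1j, 2j])
b = np.array([3, 1 - 1j])

# Eq. (2.12): conjugate the LEFT vector, then sum the products
manual = sum(np.conj(ai) * bi for ai, bi in zip(a, b))
print(np.isclose(np.vdot(a, b), manual))          # True: vdot conjugates its first argument

# Eq. (2.13): the same number as a (1 x N) row times an (N x 1) column
row, col = a.conj().reshape(1, -1), b.reshape(-1, 1)
print(np.isclose((row @ col)[0, 0], manual))      # True

# Without the conjugate the result is different (and not a proper scalar product)
naive = a @ b
print(np.isclose(naive, manual))                  # False

# Eq. (2.14): the norm uses |a_i|^2, so it is real and non-negative
norm_a = np.sqrt(np.vdot(a, a).real)
print(np.isclose(norm_a, np.linalg.norm(a)))      # True
```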

2.3 Matrices
If $A$ and $B$ are matrices then the matrix product $C = AB$ is given in terms of its elements by

$$C_{ij} = \sum_{k=1}^M A_{ik} B_{kj}. \tag{2.15}$$

We assume here that A is of dimension N × M (N rows and M columns), in which case B must have
M rows. If B has P columns then C is of dimension N × P . As noted above, it will sometimes be
useful to think of a column vector as an N × 1 dimensional matrix (N rows and 1 column), and a row
vector as a 1 × N dimensional matrix. Apart from vectors, the matrices in this course will be square
(number of rows equals number of columns).
Matrix multiplication has the property that the order of multiplication matters in general. We
define the commutator of two matrices by

$$[A, B] \equiv AB - BA. \tag{2.16}$$

If $[A, B] = 0$ we say that $A$ and $B$ commute. However, in general matrices do not commute, i.e. their commutator is nonzero. Lack of commutation of matrices will have important consequences in quantum mechanics.
Some important, special types of matrices are:

• Symmetric: $M^T = M$ ($M^T$ is the transpose, so $(M^T)_{ij} = M_{ji}$).

• Orthogonal: $M^T = M^{-1}$.

In quantum mechanics we will deal with complex matrices, as well as complex vectors. In the case of complex matrices, one is usually interested in Hermitian matrices rather than symmetric ones, and unitary matrices rather than orthogonal ones, where these are defined by:

• Hermitian: $M^\dagger = M$ ($M^\dagger$ is the adjoint, the complex conjugate of the transpose, so $M^\dagger = (M^T)^*$).

• Unitary: $M^\dagger = M^{-1}$.
Unitary matrices have the useful property that the rows form orthonormal vectors, as do the columns. To determine whether a matrix is unitary, it may be easier to do this check than to compute the inverse.

Hermitian and unitary matrices play important roles in quantum mechanics.
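The definitions above translate directly into small test functions. This is a NumPy sketch; the helper names `is_hermitian`, `is_unitary` and `commutator` are hypothetical, chosen for illustration, and the test matrices are arbitrary. The unitarity test uses the orthonormality property ($M^\dagger M = \mathbb 1$) rather than computing an inverse.

```python
import numpy as np

def is_hermitian(M):
    # Hermitian: M equals its adjoint (complex conjugate of the transpose)
    return np.allclose(M, M.conj().T)

def is_unitary(M):
    # Orthonormal columns  <=>  M^dagger M = 1 (no inverse needed)
    return np.allclose(M.conj().T @ M, np.eye(M.shape[0]))

def commutator(A, B):
    # Eq. (2.16): [A, B] = AB - BA
    return A @ B - B @ A

A = np.array([[1, 2j], [-2j, 3]])
B = np.array([[0, 1], [1, 0]])

print(is_hermitian(A), is_unitary(A))    # True False
print(is_hermitian(B), is_unitary(B))    # True True
print(np.allclose(commutator(A, B), 0))  # False: A and B do not commute
```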


2.4 Matrix Diagonalization

Let $A$ be an $N\times N$ matrix and $\vec x$ an $N$-component vector. If $A\vec x$ is proportional to $\vec x$ itself, i.e. if $A\vec x = \lambda\vec x$ or, in terms of elements,

$$\sum_{j=1}^N A_{ij} x_j = \lambda x_i, \tag{2.17}$$

then we say that $\lambda$ is an eigenvalue and $\vec x$ the corresponding eigenvector of $A$. There are $N$ eigenvalues, which may not all be distinct. If two or more eigenvalues are equal we say that they are degenerate. We can always multiply an eigenvector by a constant and it remains an eigenvector. In quantum mechanics we will need to choose this multiplicative constant so the vector is “normalized”, i.e. has unit length.
The eigenvalues are obtained from solving

$$\det(A - \lambda\mathbb{1}) = 0, \tag{2.18}$$

where det is short for determinant. Expanding out the determinant gives an $N$-th order polynomial equation for $\lambda$. One can then get the eigenvectors by solving the linear equations in Eq. (2.17) for each value of $\lambda$.
The eigenvalues and eigenvectors of Hermitian matrices have special properties:
• The eigenvalues are all real.
• Eigenvectors corresponding to unequal (non-degenerate) eigenvalues are orthogonal. For eigen-
vectors corresponding to degenerate eigenvalues, one can form linear combinations which are
orthogonal.
Once one has the eigenvectors, a matrix $A$ can be “diagonalized” as follows:¹

$$D = S^{-1}AS, \tag{2.19}$$

where $D$ is a diagonal matrix with the eigenvalues of $A$ on the diagonal,

$$D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_N \end{pmatrix}, \tag{2.20}$$

and the matrix $S$, which effects the diagonalization, is constructed out of the eigenvectors of $A$ as follows:

$$S = \big(\vec e^{\,(1)},\ \vec e^{\,(2)},\ \cdots,\ \vec e^{\,(N)}\big), \tag{2.21}$$

where $\vec e^{\,(i)}$ is the $i$-th eigenvector of $A$ written as a column vector.
If $A$ is Hermitian then the eigenvectors are orthogonal, so if we normalize them, the matrix of eigenvectors $S$ is unitary; let’s call it $U$, i.e. $U^{-1} = U^\dagger$. Hence a Hermitian matrix $A$ is diagonalized by the following transformation

$$D = U^\dagger A U. \tag{2.22}$$
If we consider two diagonalizable $N\times N$ matrices $A$ and $B$, one can show that they have a common set of eigenvectors if and only if the matrices commute, i.e. if $[A, B] \equiv AB - BA = 0$. This result will have important consequences in quantum mechanics.
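Numerically, Eq. (2.22) can be checked with `np.linalg.eigh`, NumPy's eigensolver for Hermitian matrices, which returns real eigenvalues in ascending order and orthonormal eigenvectors as the columns of a unitary matrix. This is a sketch: the example matrix is arbitrary, and `B`, a polynomial in `A`, is just one convenient choice of a matrix that commutes with it.

```python
import numpy as np

A = np.array([[2, 1j], [-1j, 2]])           # an arbitrary Hermitian example

# eigh: real eigenvalues (ascending) and orthonormal eigenvectors as the
# COLUMNS of U, i.e. U plays the role of S in Eq. (2.21)
eigenvalues, U = np.linalg.eigh(A)
print(np.allclose(eigenvalues, [1, 3]))     # True

# U is unitary, so Eq. (2.22) holds: D = U^dagger A U is diagonal
D = U.conj().T @ A @ U
print(np.allclose(D, np.diag(eigenvalues))) # True

# B commutes with A, and the same U diagonalizes it
B = A @ A + 5 * A
print(np.allclose(A @ B - B @ A, 0))        # True
B_rotated = U.conj().T @ B @ U
print(np.allclose(B_rotated - np.diag(np.diag(B_rotated)), 0))  # True
```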
¹ There are some matrices with degenerate eigenvalues which have fewer than $N$ independent eigenvectors. These cannot be diagonalized. However, this situation does not occur for Hermitian or unitary matrices, the two categories that are of principal interest in quantum mechanics, and so we will ignore non-diagonalizable matrices in this course.

2.5 Some Important 2 × 2 matrices

In quantum computing we deal most frequently with 2 × 2 matrices because qubits have two states.
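Important examples of $2\times 2$ Hermitian matrices are the Pauli (spin) matrices

$$X = \begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}, \qquad Y = \begin{pmatrix} 0 & -i\\ i & 0 \end{pmatrix}, \qquad Z = \begin{pmatrix} 1 & 0\\ 0 & -1 \end{pmatrix}, \tag{2.23}$$

(called $\sigma_x$, $\sigma_y$ and $\sigma_z$ in the physics literature).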

Any $2\times 2$ matrix can be expressed as a linear combination of the three Pauli matrices plus the identity. To see this, note that $X$, $Y$, $Z$ and $\mathbb{1}$ are linearly independent (i.e. we can’t write any one as a linear combination of the others). Also a general $2\times 2$ matrix

$$A = \begin{pmatrix} t & u\\ v & w \end{pmatrix} \tag{2.24}$$

has 4 complex elements, and so a total of 8 real parameters. If we write

$$A = a_0\mathbb{1} + a_x X + a_y Y + a_z Z \tag{2.25}$$

then there are also 4 complex coefficients (8 real parameters). Hence there are just the right number of coefficients to specify any $2\times 2$ matrix, so Eq. (2.25) is a general expression for a $2\times 2$ matrix.
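To make Eq. (2.25) concrete, the coefficients can be extracted with traces. This sketch relies on the identity $\mathrm{Tr}(P_\mu P_\nu) = 2\delta_{\mu\nu}$ for $P_\mu \in \{\mathbb 1, X, Y, Z\}$, a standard fact not derived in the text (it can be checked with the results of Problem 2.10). It uses NumPy; `pauli_coefficients` is a hypothetical helper name and the test matrix is arbitrary.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULI_BASIS = [I2, X, Y, Z]

def pauli_coefficients(A):
    # a_mu = Tr(P_mu A) / 2, using Tr(P_mu P_nu) = 2 delta_{mu nu}
    return [np.trace(P @ A) / 2 for P in PAULI_BASIS]

A = np.array([[1 + 2j, 3], [-1j, 4]])   # arbitrary complex t, u, v, w
a0, ax, ay, az = pauli_coefficients(A)

rebuilt = a0 * I2 + ax * X + ay * Y + az * Z   # Eq. (2.25)
print(np.allclose(rebuilt, A))                 # True
```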
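Let’s determine the eigenvalues and eigenvectors of $X$. The eigenvalues $\lambda$ are obtained from

$$\begin{vmatrix} 0-\lambda & 1\\ 1 & 0-\lambda \end{vmatrix} = 0, \tag{2.26}$$

which gives $\lambda^2 - 1 = 0$, or $\lambda = \pm 1$. These are real, which they must be since $X$ is Hermitian.
Let us now get the eigenvectors. We denote the corresponding normalized eigenvectors by $\vec e_{+1}$ and $\vec e_{-1}$ and indicate the coefficients by $a$ and $b$.

• $\lambda = +1$.

$$\begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}\begin{pmatrix} a\\ b \end{pmatrix} = \begin{pmatrix} a\\ b \end{pmatrix}, \tag{2.27}$$

which gives the equations $b = a$ and $a = b$, which are the same. To normalize the eigenvector, we take $a = b = 1/\sqrt2$, so

$$\vec e_{+1} = \frac{1}{\sqrt2}\begin{pmatrix} 1\\ 1 \end{pmatrix}. \tag{2.28}$$

• $\lambda = -1$.

$$\begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}\begin{pmatrix} a\\ b \end{pmatrix} = -\begin{pmatrix} a\\ b \end{pmatrix}, \tag{2.29}$$

which gives the two equations $b = -a$ and $a = -b$ (which are equivalent). The normalized eigenvector is therefore

$$\vec e_{-1} = \frac{1}{\sqrt2}\begin{pmatrix} 1\\ -1 \end{pmatrix}. \tag{2.30}$$

The eigenvectors $\vec e_{+1}$ and $\vec e_{-1}$ are orthogonal, as we know they must be since $X$ is Hermitian.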
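The same calculation can be checked numerically. A NumPy sketch that verifies Eqs. (2.27) to (2.30), then builds the matrix of eigenvectors (as columns) and shows that it diagonalizes $X$ and, for this particular choice, is its own inverse:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]])
e_plus = np.array([1, 1]) / np.sqrt(2)     # Eq. (2.28)
e_minus = np.array([1, -1]) / np.sqrt(2)   # Eq. (2.30)

print(np.allclose(X @ e_plus, e_plus))     # True: eigenvalue +1, Eq. (2.27)
print(np.allclose(X @ e_minus, -e_minus))  # True: eigenvalue -1, Eq. (2.29)
print(np.isclose(e_plus @ e_minus, 0))     # True: orthogonal

U = np.column_stack([e_plus, e_minus])     # eigenvectors as columns
print(np.allclose(U.conj().T @ X @ U, np.diag([1, -1])))  # True: Eq. (2.22)
print(np.allclose(U @ U, np.eye(2)))       # True: this U is its own inverse
```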

Forming the matrix of normalized eigenvectors gives

$$U = \frac{1}{\sqrt2}\begin{pmatrix} 1 & 1\\ 1 & -1 \end{pmatrix}, \tag{2.31}$$

which is unitary as expected.²
It is instructive for the student to show that the eigenvalues of Y and Z are also ±1 and to determine
their eigenvectors. The student should also be able to show that X, Y and Z are not only Hermitian
but also unitary.
Pauli matrices have the property that the commutator of any two of them is proportional to the third one, e.g.

$$[X, Y] = 2iZ, \tag{2.32}$$

and similarly $[Y, Z] = 2iX$ and $[Z, X] = 2iY$. Furthermore, if we define the anti-commutator of two matrices by

$$\{A, B\} \equiv AB + BA, \tag{2.33}$$

then, interestingly, different Pauli matrices anti-commute, e.g.

$$\{X, Y\} = 0, \tag{2.34}$$

and similarly $\{Y, Z\} = \{Z, X\} = 0$.
Another $2\times 2$ matrix which is very important in quantum computing is the Hadamard, defined by

$$H = \frac{1}{\sqrt2}(X + Z) = \frac{1}{\sqrt2}\begin{pmatrix} 1 & 1\\ 1 & -1 \end{pmatrix}. \tag{2.35}$$

The Hadamard also has eigenvalues $\pm 1$.
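A NumPy sketch that checks the commutation relations (2.32), the anti-commutation relations (2.34) and the eigenvalues of the Hadamard (2.35). Since $H$ is Hermitian we use `np.linalg.eigvalsh`, which returns the eigenvalues in ascending order; the helper names `comm` and `anti` are our own.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = (X + Z) / np.sqrt(2)                     # Hadamard, Eq. (2.35)

def comm(A, B):
    return A @ B - B @ A                     # Eq. (2.16)

def anti(A, B):
    return A @ B + B @ A                     # Eq. (2.33)

checks = [
    np.allclose(comm(X, Y), 2j * Z),         # Eq. (2.32)
    np.allclose(comm(Y, Z), 2j * X),
    np.allclose(comm(Z, X), 2j * Y),
    np.allclose(anti(X, Y), 0),              # Eq. (2.34)
    np.allclose(anti(Y, Z), 0),
    np.allclose(anti(Z, X), 0),
]
print(all(checks))                                   # True
print(np.allclose(np.linalg.eigvalsh(H), [-1, 1]))   # True
```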

2.6 Properties of Matrices

Two properties of square matrices will be important: the trace, which is the sum of the diagonal
elements, and the determinant. It is left as an exercise for the student to show (i) that the trace is
the sum of the eigenvalues, and (ii) that the trace of a product of matrices is invariant under a cyclic
permutation of the matrices so, for example, Tr AB = Tr BA even if A and B don’t commute.
We will now show (iii) that the determinant is the product of the eigenvalues. If we multiply Eq. (2.19) on the left by $S$ and on the right by $S^{-1}$ we get

$$A = SDS^{-1}. \tag{2.36}$$

An important result of linear algebra, which is not as well known in the physics community as it should be, is that the determinant of a product of matrices is equal to the product of the determinants, i.e.

$$\det(AB) = \det A\,\det B. \tag{2.37}$$

Taking the determinant of both sides of Eq. (2.36) gives

$$\begin{aligned} \det A &= \det S\,\det D\,\det S^{-1}\\ &= \det D\,\det S\,\det S^{-1}\\ &= \det D\,\det\big(SS^{-1}\big)\\ &= \det D = \prod_{m=1}^N \lambda_m, \end{aligned} \tag{2.38}$$

which is the desired result.

² A unitary matrix has the property that $U^{-1} = U^\dagger$. Here, in addition, it turns out that $U^{-1}$ is equal to $U$ itself. This is not necessary for $U$ to be unitary, though many unitary operations in this course will have this property.
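A numerical illustration (not a proof) of properties (i), (ii) and (iii), together with Eq. (2.37), on random matrices. This is a NumPy sketch; the seed and matrix size are arbitrary, and `np.linalg.eigvals` is used because a general real matrix can have complex eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
A = rng.normal(size=(4, 4))      # generic real, non-symmetric matrices
B = rng.normal(size=(4, 4))
lam = np.linalg.eigvals(A)       # may include complex-conjugate pairs

print(np.isclose(np.trace(A), lam.sum()))            # (i)  trace = sum of eigenvalues
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))  # (ii) Tr AB = Tr BA ...
print(np.allclose(A @ B, B @ A))                     # ... even though AB != BA here
print(np.isclose(np.linalg.det(A), lam.prod()))      # (iii) det = product of eigenvalues
print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))  # Eq. (2.37)
```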

Problems
2.1. For the following matrix $A$,

$$A = \begin{pmatrix} 1 & 2 & i\\ -2 & 1 & 3\\ -i & 1 & 0 \end{pmatrix},$$

determine $A^T$ and $A^\dagger$.

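2.2. Show whether the following matrices are Hermitian or unitary or both or neither.

$$\text{(a) } A = \frac{1}{\sqrt2}\begin{pmatrix} 1 & 1\\ 1 & -1 \end{pmatrix}, \qquad \text{(b) } B = \begin{pmatrix} 7 & 3i\\ -3i & 4 \end{pmatrix}, \qquad \text{(c) } C = \frac{1}{5}\begin{pmatrix} 3 & 4\\ -4 & 3 \end{pmatrix}.$$

Note: To decide if a matrix $A$ is unitary it is simpler to check if $A^\dagger A = \mathbb{1}$ (where $\mathbb{1}$ is the identity matrix) than to check if $A^{-1} = A^\dagger$.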

2.3. Find the eigenvalues and normalized eigenvectors of the following matrix

$$\begin{pmatrix} 1 & 0 & 1\\ 0 & 1 & 0\\ 1 & 0 & 1 \end{pmatrix}. \tag{2.39}$$

Note: As a check you should verify that the sum of the eigenvalues you find is equal to the trace
(sum of diagonal elements). You should also check that the eigenvectors are orthogonal.

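2.4. Verify that the trace of a matrix is equal to the sum of its eigenvalues for the following matrices:

$$\text{(a) } A = \begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}, \qquad \text{(b) } B = \begin{pmatrix} 2 & 0 & 2\\ 0 & 4 & 4\\ 1 & 0 & 3 \end{pmatrix}.$$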

2.5. For the matrices in Qu. 2.4 verify that the determinant is equal to the product of the eigenvalues.

2.6. Show that the eigenvalues of a Hermitian matrix are real. Show also that the eigenvectors of a
Hermitian matrix belonging to distinct eigenvalues are orthogonal.

2.7. Cyclic invariance of the trace

Show that the trace of a product of matrices is invariant under a cyclic permutation of the
matrices, e.g.
$$\mathrm{Tr}(ABC) = \mathrm{Tr}(BCA) = \mathrm{Tr}(CAB). \tag{2.40}$$
Hence show that the trace of a matrix is equal to the sum of its eigenvalues.

2.8. Show that the eigenvalues of a matrix whose square is the identity are ±1.

2.9. Show that two matrices $A$ and $B$ have common eigenvectors only if they commute, i.e. if

$$[A, B] \equiv AB - BA = 0. \tag{2.41}$$

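2.10. Consider the Pauli spin matrices

$$\sigma_x \equiv X = \begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}, \qquad \sigma_y \equiv Y = \begin{pmatrix} 0 & -i\\ i & 0 \end{pmatrix}, \qquad \sigma_z \equiv Z = \begin{pmatrix} 1 & 0\\ 0 & -1 \end{pmatrix}. \tag{2.42}$$

(i) Show that $\sigma_i^2 = \mathbb{1}$ for $i = x, y, z$.
(ii) Determine the eigenvalues of each of the matrices.
(iii) Determine the commutators $[\sigma_i, \sigma_j] \equiv \sigma_i\sigma_j - \sigma_j\sigma_i$ for all distinct pairs $i$ and $j$. Express your results in terms of Pauli matrices.
(iv) Determine the anti-commutators $\{\sigma_i, \sigma_j\} \equiv \sigma_i\sigma_j + \sigma_j\sigma_i$ for all distinct pairs $i$ and $j$.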

2.11. Consider $\vec\sigma = \hat x\sigma_x + \hat y\sigma_y + \hat z\sigma_z$, where $\hat x$ etc. refer to unit vectors in the indicated coordinate directions. Show that

$$(\vec a\cdot\vec\sigma)(\vec b\cdot\vec\sigma) = (\vec a\cdot\vec b)\,\mathbb{1} + i(\vec a\times\vec b)\cdot\vec\sigma.$$

Note: The answers to Qu. 2.10 will be useful here.

2.12. If $A = BC$ show that $A^\dagger = C^\dagger B^\dagger$ (note the reverse order). Hence show that if $B$ and $C$ are Hermitian, then $A$ is Hermitian only if $[B, C] = 0$ (i.e. if $B$ commutes with $C$).
Chapter 3

Introduction to Quantum Mechanics

In this chapter we give an introduction to quantum mechanics. A good textbook on the subject, at an
undergraduate level, is Griffiths [Gri05].

3.1 Quantum States as Complex Vectors

In Chapter 2 we reviewed linear algebra, including vectors, generalized to the case where the coefficients
of the vectors are complex.
We now describe the basic postulates of quantum mechanics. We will see that the framework
is precisely that of complex vectors. The notation, however, is quite different and so, for the next
few equations, we will show both a statement concerning quantum mechanics in quantum mechanics
notation, and the corresponding statement for complex vectors in the standard notation of linear
algebra.
While the discussion which follows may seem very abstract, don’t forget that quantum mechanics is arguably the most successful theory in all of physics, with countless precise comparisons between theory and experiment, some to the most exquisite accuracy.¹
Now we get started with quantum mechanics:

Ansatz 1: The state of a quantum system is a complex vector (which we shall often call a “state
vector” or just a vector).

In quantum computing one uses the notation of Dirac, in which a quantum state is written as $|\psi\rangle$.

$$\text{QM state: } |\psi\rangle \iff \text{complex vector: } \vec v. \tag{3.1}$$

In equations with the double arrow $\iff$ in the middle, the part to the left of the arrow is in the notation of quantum mechanics, and the part to the right is the corresponding statement in standard linear algebra notation. The state $|\psi\rangle$ can be expressed as a linear combination of basis states $|n\rangle$,

$$|\psi\rangle = \sum_{n=1}^N c_n|n\rangle \iff \vec v = \sum_{n=1}^N v_n\,\hat e_n, \tag{3.2}$$

in which the $c_n$ are called “amplitudes” or sometimes “probability amplitudes”.

¹ For example, experimental and theoretical values for the magnetic moment of the electron agree to better than a part in a trillion; see Eq. (15.15d) of [Link]


We can write the state as a column vector

$$|\psi\rangle = \begin{pmatrix} c_1\\ c_2\\ \vdots\\ c_N \end{pmatrix} \iff \vec v = \begin{pmatrix} v_1\\ v_2\\ \vdots\\ v_N \end{pmatrix}. \tag{3.3}$$

We also introduce the dual state vector, denoted by $\langle\psi|$. This corresponds to the complex conjugate of the transpose vector introduced in Eq. (2.13) in the context of the scalar product of complex vectors. In other words, if $|\phi\rangle$ is represented as a column vector by

$$|\phi\rangle = \begin{pmatrix} d_1\\ d_2\\ \vdots\\ d_N \end{pmatrix}, \tag{3.4}$$

then the corresponding dual vector is

$$\langle\phi| = (d_1^*\;\; d_2^*\;\; \cdots\;\; d_N^*) \iff \vec v^{\,T*} = (v_1^*\;\; v_2^*\;\; \cdots\;\; v_N^*), \tag{3.5}$$

i.e. a row vector in which the coefficients are the complex conjugates of the coefficients in the original column vector.
The scalar product of two vectors is called the “inner product” in a general context, and this nomenclature will be used here from now on. In quantum mechanics, the inner product of a vector $|\psi\rangle$ with a vector $|\phi\rangle$ is written as $\langle\phi|\psi\rangle$:

$$\langle\phi|\psi\rangle = \sum_{n=1}^N d_n^* c_n \iff \vec a\cdot\vec b = \sum_{n=1}^N a_n^* b_n. \tag{3.6}$$

From this definition it follows that

$$\langle\phi|\psi\rangle = \langle\psi|\phi\rangle^*. \tag{3.7}$$

The length of a vector in quantum mechanics is called the “norm” and written $\|\psi\|$. As with ordinary vectors, the norm of a state vector in quantum mechanics is the square root of the inner product with itself, i.e.

$$\|\psi\| = \langle\psi|\psi\rangle^{1/2} = \left(\sum_{n=1}^N |c_n|^2\right)^{1/2} \iff |v| = (\vec v\cdot\vec v)^{1/2} = \left(\sum_{i=1}^N |v_i|^2\right)^{1/2}. \tag{3.8}$$

As we shall see later, in quantum mechanics state vectors must have unit norm. Such vectors are said to be normalized.
Orthogonality. Two state vectors are said to be orthogonal if their inner product is zero:

$$\langle\phi|\psi\rangle = \langle\psi|\phi\rangle = 0 \iff \vec a\cdot\vec b = \vec b\cdot\vec a = 0. \tag{3.9}$$

We choose basis states $|n\rangle$ which are orthonormal, i.e. normalized and orthogonal,

$$\langle n|m\rangle = \delta_{nm} \iff \hat e_n\cdot\hat e_m = \delta_{nm}. \tag{3.10}$$

So far, in this chapter we have emphasized the correspondence between quantum mechanical states
and complex vectors. Now that we are familiar with this correspondence, from now on we will describe
the formulation of quantum mechanics using only quantum mechanics notation.

It will be useful to rewrite Eq. (3.2) for a linear superposition in a different way. Starting with Eq. (3.2),

$$|\psi\rangle = \sum_{n=1}^N c_n|n\rangle, \tag{3.11}$$

we take the inner product of both sides with the dual of one of the basis states, $\langle m|$ say. Using the orthonormality property in Eq. (3.10) gives $c_m = \langle m|\psi\rangle$ or, relabeling the index,²

$$c_n = \langle n|\psi\rangle, \tag{3.12}$$

so we can rewrite Eq. (3.11) as

$$|\psi\rangle = \sum_{n=1}^N |n\rangle\langle n|\psi\rangle. \tag{3.13}$$

We call $\langle n|\psi\rangle$ the probability amplitude for the state $|\psi\rangle$ to be in basis state $|n\rangle$. Since Eq. (3.13) holds for any $|\psi\rangle$, it shows us that

$$\sum_{n=1}^N |n\rangle\langle n| = \mathbb{1}, \tag{3.14}$$

the identity matrix. Equation (3.14) is sometimes called a completeness relation. A single term in this sum, $|n\rangle\langle n|$, is an $N\times N$ matrix with all elements 0 except that the $n$-th diagonal element is 1.
To make our discussion more concrete, consider the following example of a 2-state system, i.e. a single qubit,

$$|\psi'\rangle = \begin{pmatrix} 1\\ 2i \end{pmatrix}. \tag{3.15}$$

This is not normalized because the norm is

$$\|\psi'\| = \sqrt{1^2 + |2i|^2} = \sqrt{1 + 4} = \sqrt5. \tag{3.16}$$

To get a valid quantum state it must be properly normalized, so we divide by the norm. Hence

$$|\psi\rangle = \frac{1}{\sqrt5}|\psi'\rangle = \frac{1}{\sqrt5}\begin{pmatrix} 1\\ 2i \end{pmatrix} \tag{3.17}$$

is a valid quantum state. To get the dual state vector we take the complex conjugate of the transpose, so

$$\langle\psi| = \frac{1}{\sqrt5}\,(1\;\; -2i). \tag{3.18}$$

Suppose we also have a second state,

$$|\phi\rangle = \frac{1}{\sqrt5}\begin{pmatrix} 2\\ -i \end{pmatrix}, \tag{3.19}$$

which we see is normalized because $\sqrt{2^2 + |-i|^2} = \sqrt5$. What then is the inner product $\langle\psi|\phi\rangle$? We have

$$\langle\psi|\phi\rangle = \frac{1}{5}\,(1\;\; -2i)\begin{pmatrix} 2\\ -i \end{pmatrix} = \frac{1}{5}\big(1\cdot 2 + (-2i)\cdot(-i)\big) = 0, \tag{3.20}$$

so $|\psi\rangle$ and $|\phi\rangle$ are actually orthogonal. In this example we had to be careful with the factors of $i$ because a complex conjugate is taken when we form the dual vector (which we need to get the inner product with another vector).
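The same example in NumPy, as a sketch: the `.conj()` used when forming the bra is exactly the complex conjugate the text warns about, and leaving it out gives a wrong, nonzero answer.

```python
import numpy as np

psi_prime = np.array([1, 2j])                        # Eq. (3.15), not normalized
norm = np.sqrt(np.vdot(psi_prime, psi_prime).real)   # Eq. (3.16)
psi = psi_prime / norm                               # Eq. (3.17)
phi = np.array([2, -1j]) / np.sqrt(5)                # Eq. (3.19)

print(np.isclose(norm, np.sqrt(5)))    # True

bra_psi = psi.conj()                   # Eq. (3.18): (1, -2i)/sqrt(5)
print(np.isclose(bra_psi @ phi, 0))    # True: Eq. (3.20), orthogonal

# Forgetting the conjugate gives (1*2 + 2i*(-i))/5 = 0.8 instead of 0
print(np.isclose(psi @ phi, 0))        # False
```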
To make sure we haven’t forgotten it, let’s reiterate (with a bit more math jargon) the first Ansatz
of quantum mechanics which we stated at the beginning of this section:
Ansatz 1: The state of a quantum system is a vector in a complex vector space (technically a Hilbert
space though we won’t need that level of mathematical sophistication here).
² For ordinary vectors the corresponding expression would be $v_n = \hat e_n\cdot\vec v$.

3.2 Phases
At this point it is convenient to discuss an important topic, namely phases. Suppose we have a 2-state system with complex amplitudes, which we write in polar form as

$$|\psi\rangle = r_0 e^{i\theta_0}|0\rangle + r_1 e^{i\theta_1}|1\rangle, \tag{3.21}$$

where $r_0^2 + r_1^2 = 1$ for normalization. Let’s take out the factor of $e^{i\theta_0}$, so

$$|\psi\rangle = e^{i\theta_0}\Big(r_0|0\rangle + r_1 e^{i(\theta_1 - \theta_0)}|1\rangle\Big). \tag{3.22}$$

We call $\theta_0$ the global phase, which turns out to have no physical significance, while $\theta_1 - \theta_0$ is the relative phase (of basis states $|1\rangle$ and $|0\rangle$), which is important because it gives rise to interference. It is crucial to understand the difference between global phase and relative phase. States which differ only in the overall phase are physically identical. As we will see in Sec. 3.7, the reason for this is that no measurement can distinguish states which only differ by a global phase. By contrast, states which differ in a relative phase are physically distinct because measurements can distinguish between them.
For example,

$$|\psi_1\rangle = \frac{1}{\sqrt2}\begin{pmatrix} 1\\ -1 \end{pmatrix}, \qquad |\psi_2\rangle = \frac{1}{\sqrt2}\begin{pmatrix} -1\\ 1 \end{pmatrix}, \tag{3.23}$$

describe the same state because one is just the negative of the other. By contrast,

$$|\psi_1'\rangle = \frac{1}{\sqrt2}\begin{pmatrix} 1\\ 1 \end{pmatrix}, \qquad |\psi_2'\rangle = \frac{1}{\sqrt2}\begin{pmatrix} 1\\ -1 \end{pmatrix}, \tag{3.24}$$

describe different states because the relative phase of $|1\rangle$ and $|0\rangle$ is different in the two cases (0 for $|\psi_1'\rangle$ and $\pi$ for $|\psi_2'\rangle$).
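To see the distinction numerically, the sketch below borrows the Born rule from Sec. 3.7 (the probability of outcome $n$ is $|\langle n|\psi\rangle|^2$) and computes measurement probabilities in the Z basis and in the X basis $\{|+\rangle, |-\rangle\}$ of Eq. (3.42). NumPy and the helper name `probabilities` are our own assumptions, not part of the text.

```python
import numpy as np

ket0, ket1 = np.array([1, 0]), np.array([0, 1])
plus, minus = (ket0 + ket1) / np.sqrt(2), (ket0 - ket1) / np.sqrt(2)
z_basis, x_basis = [ket0, ket1], [plus, minus]

def probabilities(state, basis):
    # Born rule (Sec. 3.7): P(n) = |<n|psi>|^2
    return np.array([abs(np.vdot(b, state)) ** 2 for b in basis])

psi1 = np.array([1, -1]) / np.sqrt(2)     # Eq. (3.23)
psi2 = np.array([-1, 1]) / np.sqrt(2)     # = -psi1: differs only by a global phase
psi1p = np.array([1, 1]) / np.sqrt(2)     # Eq. (3.24), relative phase 0
psi2p = np.array([1, -1]) / np.sqrt(2)    # Eq. (3.24), relative phase pi

# A global phase changes no measurement probability, in either basis
for basis in (z_basis, x_basis):
    print(np.allclose(probabilities(psi1, basis), probabilities(psi2, basis)))  # True

# A relative phase is invisible in the Z basis but decisive in the X basis
print(np.allclose(probabilities(psi1p, z_basis), probabilities(psi2p, z_basis)))  # True
print(np.allclose(probabilities(psi1p, x_basis), [1, 0]))   # True
print(np.allclose(probabilities(psi2p, x_basis), [0, 1]))   # True
```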

3.3 Observables
How is all this abstract stuff about complex vectors related to the real world, i.e. to quantities that we can measure?
The answer is that an observable quantity will be an operator, Ô say, acting on these vectors. The
“hat” symbol “ ˆ ” indicates an operator, though for simplicity of notation we will usually omit the hat
when context makes clear that we are dealing with an operator. In terms of components, operators
are represented by matrices.
An operator acting on a state vector gives another state vector, so

$$\hat O|\psi\rangle = |\phi\rangle. \tag{3.25}$$

A crucial point is that operators in quantum mechanics are linear, so

$$\hat O\big(a|\psi\rangle + b|\phi\rangle\big) = a\,\hat O|\psi\rangle + b\,\hat O|\phi\rangle. \tag{3.26}$$

This brings us to the second Ansatz of quantum mechanics:

Ansatz 2: Observables are represented by linear Hermitian operators. The result of a measurement
is one of the eigenvalues of the corresponding operator Ô. After the measurement, the system is in the
eigenstate corresponding to the measured eigenvalue.

Why is it assumed that a quantity which can be measured is represented by a Hermitian operator? The answer is that the eigenvalues of a Hermitian operator (matrix) are guaranteed to be real, and we know that the results of a measurement must be real.
We now discuss how to represent operators as matrices using the Dirac notation. We take orthonormal basis vectors $|n\rangle$, which have the property $\langle m|n\rangle = \delta_{mn}$. In terms of components, $|n\rangle$ will be a column vector with the $n$-th entry equal to 1 and all the others zero. In other words

$$|n\rangle = (0,\ \ldots,\ 0,\ 1,\ 0,\ \ldots,\ 0)^T \quad (\text{the 1 is in row } n). \tag{3.27}$$

Consider the action of an operator $A$ on one of the basis vectors $|n\rangle$. It will give a linear combination of the basis vectors:

$$\begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1n} & \cdots & A_{1N}\\
A_{21} & A_{22} & \cdots & A_{2n} & \cdots & A_{2N}\\
\vdots & \vdots & & \vdots & & \vdots\\
A_{n1} & A_{n2} & \cdots & A_{nn} & \cdots & A_{nN}\\
\vdots & \vdots & & \vdots & & \vdots\\
A_{N1} & A_{N2} & \cdots & A_{Nn} & \cdots & A_{NN}
\end{pmatrix}
\begin{pmatrix} 0\\ 0\\ \vdots\\ 1\\ \vdots\\ 0 \end{pmatrix}
=
\begin{pmatrix} c_1\\ c_2\\ \vdots\\ c_n\\ \vdots\\ c_N \end{pmatrix}. \tag{3.28}$$

We see that $c_k$ is equal to the element of $A$ in the $k$-th row and $n$-th column, i.e. $A_{kn}$. We can therefore write Eq. (3.28) as

$$A|n\rangle = \sum_k A_{kn}|k\rangle. \tag{3.29}$$

Acting on the left with the dual vector $\langle m|$ and using the orthonormality of the basis vectors, we get

$$A_{mn} = \langle m|A|n\rangle, \tag{3.30}$$

which is the connection between the usual suffix notation for an element of a matrix, $A_{mn}$, and the Dirac notation for the same thing, $\langle m|A|n\rangle$. They both refer to the element in the $m$-th row and $n$-th column of the matrix $A$.
Recall that the adjoint of a matrix is defined by $A^\dagger = (A^T)^*$. Hence, in Dirac notation,

$$\langle m|A^\dagger|n\rangle = \langle n|A|m\rangle^*. \tag{3.31}$$

If $A$ is Hermitian then it is equal to its adjoint, so

$$\langle m|A|n\rangle = \langle n|A|m\rangle^* \quad (\text{for } A \text{ Hermitian}). \tag{3.32}$$

Note that this states, in component form, that the transpose of a Hermitian matrix is equal to its complex conjugate, which is precisely the definition of a Hermitian matrix.
To gain still more familiarity with the Dirac notation consider $\langle\phi|A|\psi\rangle$. If we write this out in components in some basis, then $|\psi\rangle$ is a column vector, $A$ is a matrix and $\langle\phi|$ is a row vector, i.e. we have

$$\langle\phi|A|\psi\rangle = (\phi_1^*\;\; \phi_2^*\;\; \cdots\;\; \phi_N^*)\begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1N}\\ A_{21} & A_{22} & \cdots & A_{2N}\\ \vdots & \vdots & \ddots & \vdots\\ A_{N1} & A_{N2} & \cdots & A_{NN} \end{pmatrix}\begin{pmatrix} \psi_1\\ \psi_2\\ \vdots\\ \psi_N \end{pmatrix}, \tag{3.33}$$

in an obvious notation. The multiplication can be done either by acting with $A$ on $|\psi\rangle$ to get $|A\psi\rangle$ and then taking the inner product with $\langle\phi|$, or by acting with $A$ to the left on $\langle\phi|$ and then taking the inner product with $|\psi\rangle$. But what does acting with $A$ to the left on $\langle\phi|$ mean? Let’s suppose that

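$$\langle\phi|A = \langle\mu|. \tag{3.34}$$

Then we have

$$(\phi_1^*\;\; \phi_2^*\;\; \cdots\;\; \phi_N^*)\begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1N}\\ A_{21} & A_{22} & \cdots & A_{2N}\\ \vdots & \vdots & \ddots & \vdots\\ A_{N1} & A_{N2} & \cdots & A_{NN} \end{pmatrix} = (\mu_1^*\;\; \mu_2^*\;\; \cdots\;\; \mu_N^*). \tag{3.35}$$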

Evaluating components gives

$$\mu_m^* = \sum_k \phi_k^* A_{km}. \tag{3.36}$$

This can be rearranged as

$$\mu_m = \sum_k \phi_k A_{km}^* = \sum_k \big(A^{T*}\big)_{mk}\phi_k = \sum_k A^\dagger_{mk}\phi_k, \tag{3.37}$$

or, for the vector as a whole,

$$|\mu\rangle = A^\dagger|\phi\rangle, \tag{3.38}$$

which is equivalent to Eq. (3.34). Hence the action of $A$ acting to the left on $\langle\phi|$ can be written as

$$\langle\phi|A = \langle A^\dagger\phi|. \tag{3.39}$$

Summarizing, we see that in $\langle\phi|A|\psi\rangle$, the operator $A$ can be considered to act either to the left or the right as follows:

$$\langle\phi|A|\psi\rangle = \langle A^\dagger\phi|\psi\rangle = \langle\phi|A\psi\rangle. \tag{3.40}$$

In quantum mechanics $A$ will commonly be a Hermitian operator (since observables are represented by Hermitian operators) for which $A^\dagger = A$, so $A$ acts equally to the right and to the left as follows:

$$\langle\phi|A|\psi\rangle = \langle A\phi|\psi\rangle = \langle\phi|A\psi\rangle \quad (\text{for } A \text{ Hermitian}). \tag{3.41}$$
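A NumPy sketch checking Eqs. (3.30) and (3.40) for a random, deliberately non-Hermitian operator, so that the adjoint in $\langle A^\dagger\phi|\psi\rangle$ actually matters. Note that `np.vdot(u, v)` computes $\langle u|v\rangle$ by conjugating `u`; the seed and dimension are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
N = 3
A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))   # not Hermitian
psi = rng.normal(size=N) + 1j * rng.normal(size=N)
phi = rng.normal(size=N) + 1j * rng.normal(size=N)
A_dag = A.conj().T

# Eq. (3.30): A_mn = <m|A|n>, with |n> the unit column vectors of Eq. (3.27)
kets = np.eye(N)
elements = np.array([[kets[m] @ A @ kets[n] for n in range(N)] for m in range(N)])
print(np.allclose(elements, A))                          # True

# Eq. (3.40): A may act to the right on |psi>, or as A^dagger on |phi>
sandwich = phi.conj() @ A @ psi                          # <phi|A|psi>
print(np.isclose(sandwich, np.vdot(A_dag @ phi, psi)))   # True
print(np.isclose(sandwich, np.vdot(phi, A @ psi)))       # True
print(np.isclose(sandwich, np.vdot(A @ phi, psi)))       # generally False: A is not Hermitian
```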

3.4 The Computational Basis and Change of Basis

When dealing with standard vectors, we know that we can work with different sets of bases rotated
with respect to each other. In quantum mechanics, too, it will be convenient to represent state vectors
in terms of different bases, transformed with respect to each other.
The standard basis for a single qubit comprises the states $|0\rangle$ and $|1\rangle$, and in this basis the Pauli operator $Z$ is diagonal; see Eq. (2.23). This basis is called the computational basis. It is the basis in which measurements are performed. Since $Z$ is diagonal in this basis, the eigenvectors of $Z$ are the basis vectors. For this reason the computational basis is sometimes called the Z-basis.
Note that for state $|0\rangle$ the eigenvalue of $Z$ is $+1$ and for state $|1\rangle$ the eigenvalue of $Z$ is $-1$. One might have thought it should be the other way round, but this is the convention that has been adopted.
We will also need to consider other bases, one of the most common being the X-basis, i.e. the basis in which $X$ (see Eq. (2.23)) is diagonal. We showed in Sec. 2.5 that the eigenvalues of $X$ are $+1$ and $-1$, with corresponding eigenvectors, called $|+\rangle$ and $|-\rangle$ (sometimes $|0_x\rangle$ and $|1_x\rangle$), given by

$$\begin{aligned} |0_x\rangle \equiv |+\rangle &= \frac{1}{\sqrt2}\big(|0\rangle + |1\rangle\big),\\ |1_x\rangle \equiv |-\rangle &= \frac{1}{\sqrt2}\big(|0\rangle - |1\rangle\big). \end{aligned} \tag{3.42}$$

From these results it follows that, in the X basis, the Pauli X-matrix is written as

$$X = \begin{array}{c|cc} & |+\rangle & |-\rangle\\ \hline \langle +| & 1 & 0\\ \langle -| & 0 & -1 \end{array}, \tag{3.43}$$

which looks just like the Pauli-Z matrix in the Z (computational) basis.
There is a linear relation between the new basis vectors and the old ones. Denoting the old basis vectors by Latin letters, e.g. $|n\rangle$, and the new basis vectors by Greek letters, e.g. $|\alpha\rangle$, we write

$$|\alpha\rangle = \sum_n U_{\alpha n}|n\rangle. \tag{3.44}$$

The new basis vectors must be orthonormal, like the old set, and this constrains the matrix of coefficients $U$ in a way that we will now determine. Writing the equivalent of Eq. (3.44) in terms of row vectors and taking the complex conjugate, we get the following transformation for the dual basis state vectors

$$\langle\beta| = \sum_k U_{\beta k}^*\langle k|. \tag{3.45}$$

Taking the inner product of Eqs. (3.44) and (3.45) gives

$$\begin{aligned} \langle\beta|\alpha\rangle &= \sum_{n,k} U_{\beta k}^* U_{\alpha n}\langle k|n\rangle\\ &= \sum_n U_{\beta n}^* U_{\alpha n}\\ &= \sum_n U_{\alpha n}\big(U^{T*}\big)_{n\beta} = \sum_n U_{\alpha n}U^\dagger_{n\beta} = \big(UU^\dagger\big)_{\alpha\beta}, \end{aligned} \tag{3.46}$$

where we used that $\langle k|n\rangle = \delta_{kn}$ to get the second line. However, $\langle\beta|\alpha\rangle = \delta_{\alpha\beta}$ and so we must have $UU^\dagger = \mathbb{1}$, the identity matrix. Thus the matrix of coefficients which transforms from one basis to another as in Eq. (3.44) must be unitary.
As an example, according to Eq. (3.42) the matrix which transforms from the Z-basis to the X-basis for one qubit is

$$U = \frac{1}{\sqrt2}\begin{pmatrix} 1 & 1\\ 1 & -1 \end{pmatrix}. \tag{3.47}$$

We can verify that this matrix is unitary by evaluating its inverse and checking that $U^{-1} = U^\dagger$ or, more simply, by recalling that the rows of a unitary matrix are orthonormal vectors, and the same for the columns. By inspection, this is the case here.
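A NumPy sketch of this change of basis: it checks that the $U$ of Eq. (3.47) is unitary, then computes the matrix elements $\langle\alpha|X|\beta\rangle$ with the new basis vectors read off from the rows of $U$ (as in Eq. (3.44)), recovering Eq. (3.43).

```python
import numpy as np

X = np.array([[0, 1], [1, 0]])
U = np.array([[1, 1], [1, -1]]) / np.sqrt(2)     # Eq. (3.47)

# Unitarity, Eq. (3.46): U U^dagger = 1
print(np.allclose(U @ U.conj().T, np.eye(2)))    # True

# Row alpha of U lists the Z-basis components of |alpha>, Eq. (3.44)
plus, minus = U[0], U[1]
new_basis = [plus, minus]

# <alpha|X|beta> in the new basis reproduces Eq. (3.43)
X_new = np.array([[np.vdot(a, X @ b) for b in new_basis] for a in new_basis])
print(np.allclose(X_new, np.diag([1, -1])))      # True
```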

3.5 Outer Product Notation

For orthonormal basis vectors, we have $\langle i|j\rangle = \delta_{ij}$. As a further exercise in familiarization with the Dirac notation, consider what we mean if we write the vector and the dual vector the other way round, i.e. $|i\rangle\langle j|$, which is called an “outer product”. It is actually a matrix. By sandwiching it on the left and right by basis states,

$$\langle k|\big(|i\rangle\langle j|\big)|l\rangle = \langle k|i\rangle\langle j|l\rangle = \delta_{ki}\,\delta_{jl}, \tag{3.48}$$

we see that it is a matrix whose entries are all zero except for the element in the $i$-th row and $j$-th column, which is 1. In other words

$$|i\rangle\langle j| = \begin{pmatrix} 0 & \cdots & 0 & \cdots & 0\\ \vdots & & \vdots & & \vdots\\ 0 & \cdots & 1 & \cdots & 0\\ \vdots & & \vdots & & \vdots\\ 0 & \cdots & 0 & \cdots & 0 \end{pmatrix} \quad (\text{the 1 is in row } i,\ \text{column } j). \tag{3.49}$$

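If $j = i$, then we have a 1 in the $i$-th diagonal element and 0 everywhere else. This is a projection operator onto state $|i\rangle$, so we denote it by $P_i$, i.e.

$$P_i = |i\rangle\langle i|. \tag{3.50}$$

One can see it is a projection operator because, if it acts on an arbitrary state $|\psi\rangle$, we have

$$P_i|\psi\rangle = |i\rangle\langle i|\psi\rangle, \tag{3.51}$$

i.e. it picks out the component of $|\psi\rangle$ along $|i\rangle$. Summing the projectors over all basis states gives

$$\sum_i P_i = \sum_i |i\rangle\langle i| = \mathbb{1}, \tag{3.52}$$

which is also known as a completeness relation; see Eq. (3.14).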
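A NumPy sketch of outer products and projectors in a 3-dimensional space. NumPy indexes from 0, so `kets[0]` plays the role of the first basis state; the state `psi` is an arbitrary normalized example, and `np.outer(u, v.conj())` builds $|u\rangle\langle v|$.

```python
import numpy as np

N = 3
kets = np.eye(N)                                  # rows are the basis states (0-based)

outer = np.outer(kets[0], kets[2].conj())         # |first><third|
print(outer)                                      # single 1 in row 0, column 2

P = [np.outer(k, k.conj()) for k in kets]         # P_i = |i><i|, Eq. (3.50)
psi = np.array([0.6, 0.0, 0.8j])                  # normalized: 0.36 + 0.64 = 1

print(np.allclose(P[2] @ P[2], P[2]))             # True: projecting twice = projecting once
print(np.allclose(P[2] @ psi, kets[2] * np.vdot(kets[2], psi)))  # True: Eq. (3.51)
print(np.allclose(sum(P), np.eye(N)))             # True: completeness, Eq. (3.14)
```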

3.6 Functions of operators

We will need to evaluate functions of operators. For example, what is $e^A$? In this case there is a convergent series expansion which can be used to evaluate the function:

$$e^A = \mathbb{1} + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots. \tag{3.53}$$

In some cases the infinite series can be evaluated in closed form. Consider for example $e^{cX}$ where $c$ is a constant and $X$, the Pauli operator, is given in Eq. (2.23). We have $X^2 = \mathbb{1}$ and so $X^3 = X^5 = \cdots = X^{2n+1} = \cdots = X$, while $X^2 = X^4 = \cdots = X^{2n} = \cdots = \mathbb{1}$. Hence

$$\begin{aligned} e^{cX} &= \mathbb{1}\left(1 + \frac{c^2}{2!} + \frac{c^4}{4!} + \cdots\right) + X\left(c + \frac{c^3}{3!} + \frac{c^5}{5!} + \cdots\right)\\ &= \mathbb{1}\cosh c + X\sinh c = \begin{pmatrix} \cosh c & \sinh c\\ \sinh c & \cosh c \end{pmatrix}. \end{aligned} \tag{3.54}$$

More generally, we can evaluate a function of an operator by diagonalizing it. Consider first a
diagonal matrix,

$$D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_N \end{pmatrix}. \tag{3.55}$$

When multiplying $D$ by itself $n$ times, say, all that happens is each diagonal element is multiplied by itself $n$ times. Hence if $f(D)$ is some function of $D$ which can be represented by a series expansion, we have

$$f(D) = \begin{pmatrix} f(\lambda_1) & 0 & \cdots & 0\\ 0 & f(\lambda_2) & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & f(\lambda_N) \end{pmatrix}. \tag{3.56}$$

If the function $f(x)$ for scalar argument $x$ does not have a series expansion, we take Eq. (3.56) as the definition of the matrix function $f(D)$ for a diagonal matrix $D$.
In general, a matrix $A$ is not already in diagonal form. However, we can diagonalize it by a similarity transform, see Eq. (2.36), which we repeat here:

$$A = S D S^{-1}, \tag{3.57}$$

where $D$ is a diagonal matrix with the eigenvalues of $A$ on the diagonal. Hence it follows that

$$
\begin{aligned}
A^2 &= S D S^{-1} S D S^{-1} = S D^2 S^{-1}, \\
A^3 &= S D S^{-1} S D S^{-1} S D S^{-1} = S D^3 S^{-1}, \quad \text{and so} \\
A^n &= S D^n S^{-1}, \quad \text{and hence} \\
f(A) &= S f(D) S^{-1}
= S \begin{pmatrix}
f(\lambda_1) & 0 & \cdots & 0 \\
0 & f(\lambda_2) & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & f(\lambda_N)
\end{pmatrix} S^{-1},
\end{aligned}
\tag{3.58}
$$

which is the desired expression showing how to construct a function of a matrix from its eigenvalues
and eigenvectors.
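Equation (3.58) translates almost directly into code. The sketch below assumes $A$ is Hermitian, so that `numpy.linalg.eigh` returns real eigenvalues and a unitary $S$ whose columns are the eigenvectors, giving $S^{-1} = S^\dagger$. For a general diagonalizable matrix one would use `numpy.linalg.eig` and an explicit inverse instead. The square root is included as an example of a function with no power series about $x = 0$, for which Eq. (3.56) serves as the definition.

```python
import numpy as np

def matrix_function(A, f):
    """f(A) = S f(D) S^{-1} for a Hermitian matrix A (Eq. 3.58).

    numpy.linalg.eigh gives the eigenvalues and a unitary S whose columns are
    the orthonormal eigenvectors, so S^{-1} = S^dagger.
    """
    eigenvalues, S = np.linalg.eigh(A)
    return S @ np.diag(f(eigenvalues)) @ S.conj().T

X = np.array([[0, 1], [1, 0]], dtype=complex)
c = 0.7
expcX = matrix_function(c * X, np.exp)
print(np.allclose(expcX, np.cosh(c) * np.eye(2) + np.sinh(c) * X))   # agrees with Eq. (3.54)

# Square root of a positive matrix, defined through its eigenvalues
A = np.array([[2, 1], [1, 2]], dtype=complex)    # eigenvalues 1 and 3
sqrtA = matrix_function(A, np.sqrt)
print(np.allclose(sqrtA @ sqrtA, A))
```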

3.7 Measurements
Now we have to discuss in detail the vexed topic of measurement in quantum mechanics. The reason for using the term “vexed” will become clear later, especially in Chapter 6 when we discuss a famous thought experiment of Einstein, Podolsky and Rosen (EPR).
In a measurement, our delicate quantum system is brought into contact with a macroscopic experimental apparatus. Measurement is an irreversible process and as such has a special status in quantum mechanics.
Assume that the Hermitian operator $A$ corresponding to the measured quantity of interest has eigenvalues $\lambda_n$ and normalized eigenvectors $|n\rangle$. Because $A$ is Hermitian, the eigenvalues are real. In addition, for a Hermitian matrix of size $N$ there are $N$ orthogonal eigenvectors, which can therefore be used as a basis. An arbitrary state can then be expanded as

$$|\psi\rangle = \sum_n a_n |n\rangle = \sum_n |n\rangle\langle n|\psi\rangle, \tag{3.59}$$

where the last expression is from Eq. (3.13).

According to Ansatz 2 in Sec. 3.3, a measurement will give one of the eigenvalues, $\lambda_n$, but which one? To answer this question, we need to add one more ingredient to Ansatz 2. It was first proposed by Born in a footnote in a 1926 paper, and is therefore called the “Born rule”. This states that the probability, $P(n)$, to get eigenvalue $\lambda_n$ (and, after the measurement, to leave the system in eigenstate $|n\rangle$) is the square of the modulus of the amplitude $a_n$, i.e.

$$P(n) = |a_n|^2 \equiv |\langle n|\psi\rangle|^2 \equiv \langle\psi|n\rangle \langle n|\psi\rangle, \tag{3.60}$$

where we used Eq. (3.12) and that $\langle\psi|n\rangle = \langle n|\psi\rangle^*$. Since probabilities must add up to 1, it follows that state vectors in quantum mechanics must be normalized to unity, i.e.

$$1 = \sum_n P(n) = \sum_n |a_n|^2 = \sum_n |\langle n|\psi\rangle|^2 = \sum_n \langle\psi|n\rangle \langle n|\psi\rangle = \langle\psi|\psi\rangle. \tag{3.61}$$

Note that the probability of getting a particular measured value depends only on the square of the modulus of the amplitude of the corresponding eigenstate. This means that the global phase of a state has no physical significance, since no measurement can distinguish two states which differ only by a global phase.
However, if two states differ in a relative phase, there are measurements which can distinguish between them. For example, $|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$ are eigenstates of $X$ with eigenvalues $+1$ and $-1$ respectively. A measurement of $X$ will therefore give different results: $+1$ for $|+\rangle$ and $-1$ for $|-\rangle$, with probability 1 in both cases.
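The Born rule of Eq. (3.60) can be evaluated directly. The sketch below uses NumPy, and its helper name `born_probabilities` is our own invention for illustration. The helper diagonalizes the measured operator and returns $|\langle n|\psi\rangle|^2$ for each eigenvector. The code then confirms the two statements above: a global phase changes no probabilities, while the relative phase between $|+\rangle$ and $|-\rangle$ is detected by measuring $X$.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def born_probabilities(op, psi):
    """Return (eigenvalues, probabilities) for measuring Hermitian `op` on `psi`.

    P(n) = |<n|psi>|^2 (Eq. 3.60), where |n> are the eigenvectors of op.
    """
    eigenvalues, vecs = np.linalg.eigh(op)      # eigenvalues in ascending order
    amplitudes = vecs.conj().T @ psi            # a_n = <n|psi>
    return eigenvalues, np.abs(amplitudes) ** 2

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
minus = np.array([1, -1], dtype=complex) / np.sqrt(2)

# A global phase e^{0.3i} leaves every probability unchanged ...
for op in (X, Z):
    _, p1 = born_probabilities(op, plus)
    _, p2 = born_probabilities(op, np.exp(0.3j) * plus)
    print(np.allclose(p1, p2))

# ... but a relative phase is seen by X: eigenvalues [-1, +1] get
# probabilities close to [0, 1] for |+> and close to [1, 0] for |->
print(born_probabilities(X, plus))
print(born_probabilities(X, minus))
```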
We therefore have to complete our Ansatz 2 to include the probabilities of different results:
Ansatz 2′: “Observables are represented by linear Hermitian operators. Measurement of an observable corresponding to a (linear) Hermitian operator $\hat{O}$ gives one of the eigenvalues of $\hat{O}$. The probability of getting an eigenvalue is the square of the modulus of the amplitude for the state of the system to be in the corresponding eigenstate of $\hat{O}$. After the measurement, the system is in this eigenstate.”
The fact that probabilities enter into the results of measurements has led to a lot of “vexed” discussion. Your first reaction might be “What’s the fuss? After all, don’t probabilities enter in classical physics too? If one tosses a coin, isn’t the result randomly heads or tails with equal probability?”
Well, is it really random? Suppose one could measure the initial momentum and angular momentum of the coin with sufficient precision. If one then integrated the equations of motion for its trajectory, including the effects of air resistance, to sufficient accuracy, one would be able to compute, with certainty, on which side it would land. The difficulty is that the coin toss is very sensitive to the initial conditions: changing the initial velocity by an immeasurably small amount can change the result. In other words, for all practical purposes (FAPP) a coin toss is random. Nonetheless, from a fundamental point of view it is not, since it is uniquely determined by the initial conditions. However, the situation in quantum mechanics is different since, as far as we know, probabilities enter in a fundamental way.

The most famous critic of probabilities being part of a fundamental theory of physics was Einstein,
who had many discussions on the topic with Niels Bohr. As we shall see in our study of the EPR thought
experiment in Chapter 6, despite Einstein’s claim that “God doesn’t play dice with the universe”,
quantum mechanics has been repeatedly vindicated.
We have said that after a measurement the system is left in eigenstate $|n\rangle$. Measurement therefore “projects” the initial state $|\psi\rangle$ onto $|n\rangle$. This is accomplished by the projection operator

$$\hat{P}_n = |n\rangle\langle n| \tag{3.62}$$

so

$$\hat{P}_n |\psi\rangle = |n\rangle\langle n|\psi\rangle \tag{3.63}$$

(no sum on $n$). The sum of the projection operators must add to the identity, i.e.

$$\sum_n \hat{P}_n \equiv \sum_n |n\rangle\langle n| = 1. \tag{3.64}$$

The fact that $\sum_n |n\rangle\langle n|$ can be replaced by the identity is called a “completeness” relation.
Note that the state in Eq. (3.63) is not normalized. If we continue to follow the system after the measurement, then we need to multiply the state by $1/|\langle n|\psi\rangle|$, so that it is again correctly normalized and the probabilities of the results of a future measurement add to unity. Something similar is also done in classical statistics. If we have a sequence of measurements, and we know the result of the first one, then we can determine the “conditional probability” of subsequent measurements, given the result of the first measurement, and these conditional probabilities add to unity. In effect, this is what is done by multiplying a state by a constant to get its norm back to 1 after a measurement. The resulting state gives the conditional probabilities for a subsequent measurement, given the result of the first measurement.
Let’s give a simple example of a measurement. Consider one qubit in state $|\psi\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and measure $Z$. The eigenstates of $Z$ are $|0\rangle$ and $|1\rangle$ with eigenvalues $+1$ and $-1$ respectively. Hence the results of a measurement of $Z$ are

$$
\begin{aligned}
+1, &\quad \text{prob. } \left|\tfrac{1}{\sqrt{2}}\right|^2 = \tfrac{1}{2}, \quad \text{qubit is in state } |0\rangle \text{ after the measurement}, \\
-1, &\quad \text{prob. } \left|\tfrac{1}{\sqrt{2}}\right|^2 = \tfrac{1}{2}, \quad \text{qubit is in state } |1\rangle \text{ after the measurement}.
\end{aligned}
\tag{3.65}
$$

Now suppose that we measure $X$. The eigenstates of $X$ are shown in Eqs. (2.28) and (2.30) to be $\frac{1}{\sqrt{2}}(|0\rangle \pm |1\rangle)$. Hence $|\psi\rangle$ is the eigenstate with eigenvalue $+1$, so the result of the measurement of $X$ is $+1$ with 100% probability. Similarly, a measurement of $X$ on state $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$ would give $-1$ with probability 1.
We see that if the initial state is an eigenstate of the operator being measured, then the result
will, with certainty, be the corresponding eigenvalue, and the state will remain unchanged after the
measurement. However, if this is not the case, i.e. if the initial state is in a superposition of eigenstates
of the measurement operator, then (i) the result of the measurement will take one of several values
with appropriate probabilities, and (ii) the measurement changes the state, leaving it in the eigenstate
corresponding to the eigenvalue which is measured.
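The whole measurement procedure can be simulated in a few lines. This covers the Born-rule choice of outcome, the projection of Eq. (3.63), and the renormalization by $1/|\langle n|\psi\rangle|$. The sketch below is illustrative only and makes two assumptions. First, the helper `measure` is a hypothetical name of our own. Second, the measured operator is taken to have a nondegenerate spectrum, so that each eigenvalue has a single eigenvector. Run on $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$, it reproduces the statistics of Eq. (3.65) for $Z$ and the certain outcome for $X$.

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # seeded only so that runs are reproducible

def measure(op, psi, rng):
    """Simulate one measurement of Hermitian `op` on normalized state `psi`.

    Picks eigenvalue lambda_n with probability |<n|psi>|^2 (Born rule), applies
    P_n = |n><n| (Eq. 3.63) and renormalizes. Assumes a nondegenerate spectrum.
    """
    eigenvalues, vecs = np.linalg.eigh(op)
    probs = np.abs(vecs.conj().T @ psi) ** 2
    n = rng.choice(len(eigenvalues), p=probs / probs.sum())
    ket_n = vecs[:, n]
    projected = ket_n * (ket_n.conj() @ psi)            # P_n |psi> = |n><n|psi>
    return eigenvalues[n], projected / np.linalg.norm(projected)

Z = np.array([[1, 0], [0, -1]], dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
psi = np.array([1, 1], dtype=complex) / np.sqrt(2)

# Z: +1 and -1 each occur about half the time, Eq. (3.65)
results = np.array([measure(Z, psi, rng)[0] for _ in range(10_000)])
print(np.mean(np.isclose(results, 1)))

# X: psi is an eigenstate, so the outcome is +1 and the state is unchanged
value, after = measure(X, psi, rng)
print(value, abs(np.vdot(after, psi)))
```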

3.8 Statistics of Measurements

If we prepare many identical copies of the system and measure each of them, what can we say about the statistics of the measured values $\lambda_n$, the eigenvalues of $A$? First of all, what would be the mean of the measurements, $\langle A\rangle$? We have

$$
\begin{aligned}
\langle A\rangle &= \sum_n P(n)\,\lambda_n \\
&= \sum_n |\langle n|\psi\rangle|^2 \lambda_n = \sum_n \langle\psi|n\rangle \lambda_n \langle n|\psi\rangle \\
&= \sum_n \langle\psi|A|n\rangle\langle n|\psi\rangle \\
&= \langle\psi|A|\psi\rangle,
\end{aligned}
\tag{3.66}
$$

where we used Eq. (3.60) to get the second line, $A|n\rangle = \lambda_n|n\rangle$ to get the third line, and Eq. (3.64) to get the last line. The final result, $\langle\psi|A|\psi\rangle$, is called the “expectation value” of $A$ in state $|\psi\rangle$.
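In addition to the average result, we are also often interested in the scatter about the average. This is characterized by the standard deviation, defined by

$$\Delta A = \left\langle \left(A - \langle A\rangle\right)^2 \right\rangle^{1/2}, \tag{3.67}$$

which is the root mean square deviation about the mean. It can be expressed in a slightly simpler form since

$$
\begin{aligned}
\left\langle \left(A - \langle A\rangle\right)^2 \right\rangle &= \left\langle A^2 - 2A\langle A\rangle + \langle A\rangle^2 \right\rangle \\
&= \langle A^2\rangle - 2\langle A\rangle^2 + \langle A\rangle^2 \\
&= \langle A^2\rangle - \langle A\rangle^2,
\end{aligned}
\tag{3.68}
$$

so

$$\Delta A = \left(\langle A^2\rangle - \langle A\rangle^2\right)^{1/2}. \tag{3.69}$$

We will call $\Delta A$ the uncertainty in $A$.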
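Let’s illustrate this with the example we considered just above, namely $|\psi\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$. If we measure $Z$ we have

$$\langle Z\rangle = \langle\psi|Z|\psi\rangle = \frac{1}{2}\begin{pmatrix} 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = 0. \tag{3.70}$$

This agrees with our previous discussion, where we found $+1$ and $-1$ with equal probability. We also have $Z^2 = 1$ and, since the average of 1 is always 1,

$$\langle Z^2\rangle = 1, \tag{3.71}$$

so

$$\Delta Z = \left(\langle Z^2\rangle - \langle Z\rangle^2\right)^{1/2} = 1. \tag{3.72}$$

For a measurement of $X$, we already showed that $|\psi\rangle$ is an eigenstate with eigenvalue 1, and so the measured value is always 1. If we use Eqs. (3.66) and (3.69), we obtain $\langle X\rangle = 1$, $\langle X^2\rangle = 1$, and so $\Delta X = 0$, as expected.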
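Equations (3.66) and (3.69) are easy to evaluate numerically. The following NumPy sketch reproduces the results $\langle Z\rangle = 0$, $\Delta Z = 1$, $\langle X\rangle = 1$ and $\Delta X = 0$ for the same state, up to floating-point rounding. The function names `expectation` and `uncertainty` are our own. The variance is clipped at zero because rounding can make $\langle A^2\rangle - \langle A\rangle^2$ very slightly negative for an eigenstate.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def expectation(op, psi):
    """<psi|A|psi> (Eq. 3.66); real for Hermitian A, so keep the real part."""
    return np.vdot(psi, op @ psi).real       # np.vdot conjugates its first argument

def uncertainty(op, psi):
    """Delta A = (<A^2> - <A>^2)^(1/2) (Eq. 3.69)."""
    variance = expectation(op @ op, psi) - expectation(op, psi) ** 2
    return np.sqrt(max(variance, 0.0))       # clip tiny negative rounding errors

psi = np.array([1, 1], dtype=complex) / np.sqrt(2)
print(expectation(Z, psi), uncertainty(Z, psi))   # approximately 0 and 1, Eqs. (3.70)-(3.72)
print(expectation(X, psi), uncertainty(X, psi))   # approximately 1 and 0
```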

3.9 Composite Systems

So far, we have described states of just a single qubit. How should we describe states of the many qubits which we will need for a quantum computer? Suppose, as an example, we have two qubits A and B. We can label the states of qubit A by $|0_A\rangle$ and $|1_A\rangle$, and similarly the states of qubit B by $|0_B\rangle$ and $|1_B\rangle$. A state of both qubits is written as a “tensor product”, also known as a “direct product”, e.g. $|0_A\rangle \otimes |1_B\rangle$. In this example, the product indicates that qubit A is in state $|0\rangle$ and qubit B is in state $|1\rangle$.
This notation is heavy, so we will usually write the same state more compactly as $|0_A\rangle|1_B\rangle$. Provided the order of the qubits has been specified, we can write it even more concisely as $|01\rangle$. In this notation, the four possible states of two qubits are

$$|00\rangle, \quad |01\rangle, \quad |10\rangle, \quad |11\rangle. \tag{3.73}$$

Note that the label of each state is a number in binary notation from 0 to 3. This provides an even more compact notation, which is particularly convenient when the number of qubits is large, namely $|x\rangle_2$, where $x = 0, 1, 2$ or $3$. It is necessary to indicate the number of qubits by a subscript on the ket to avoid ambiguity. For example, if we just wrote a state as $|2\rangle$, we wouldn’t know if it is state $|10\rangle$ for 2 qubits, or $|010\rangle$ for 3 qubits, and so on. An exception to this will be the states $|0\rangle$ and $|1\rangle$ (without subscript), which always refer to the 1-qubit basis states. The four states in Eq. (3.73) can therefore also be written as

$$|0\rangle_2, \quad |1\rangle_2, \quad |2\rangle_2, \quad |3\rangle_2. \tag{3.74}$$

Similarly, for three qubits we can specify the 8 possible states by $|x\rangle_3$, where $x = 0, 1, \dots, 7$, and for $n$ qubits the $2^n$ states are indicated by $|x\rangle_n$, where $x = 0, 1, \dots, 2^n - 1$. We see that to use this convenient binary notation we need to label the states starting from 0 rather than 1. The last state then has label $2^n - 1$.
I emphasize that an $n$-qubit basis state $|x_{n-1} x_{n-2} \cdots x_1 x_0\rangle$, where the $x_i$ are the values of the qubits, can also be represented as $|x\rangle_n$. Here $x$ is the $n$-bit integer whose bits are the $x_i$, i.e. $x = \sum_{i=0}^{n-1} x_i 2^i$. Furthermore, summing over the two values (0 and 1) of $x_i$ for each bit $i$ is equivalent to summing over all the $x$ values from 0 to $2^n - 1$.
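The correspondence between the bit-string label $|x_{n-1} \cdots x_1 x_0\rangle$ and the integer label $|x\rangle_n$ can be sketched as below. The helper names are hypothetical, and the convention, consistent with the text, is that the leftmost qubit is the most significant bit. The last lines check that summing over all bit values is the same as summing over $x = 0, \dots, 2^n - 1$.

```python
from itertools import product

def bits_to_int(bits):
    """Map qubit values (x_{n-1}, ..., x_1, x_0), read left to right, to x."""
    x = 0
    for b in bits:
        x = 2 * x + b
    return x

def int_to_bits(x, n):
    """Inverse map: the n-bit binary expansion of x, most significant bit first."""
    return tuple((x >> k) & 1 for k in reversed(range(n)))

# |10> (2 qubits) and |010> (3 qubits) are both labelled x = 2,
# which is why the subscript n in |x>_n is needed.
print(bits_to_int((1, 0)), bits_to_int((0, 1, 0)))   # 2 2
print(int_to_bits(2, 2), int_to_bits(2, 3))          # (1, 0) (0, 1, 0)

# Summing over each x_i in {0, 1} visits every x from 0 to 2^n - 1 exactly once
n = 3
labels = sorted(bits_to_int(b) for b in product((0, 1), repeat=n))
print(labels == list(range(2 ** n)))                 # True
```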
We need to be familiar with these ways of labeling multi-qubit states.
Next we discuss matrix representations of operators on multiple qubits, and we take as an example the case of two qubits. An operator acting on the space of two qubits is a $4 \times 4$ matrix. We will write the four basis states as $|00\rangle, |01\rangle, |10\rangle, |11\rangle$. Consider an operator where $X$ acts on the first (left-hand) qubit and the identity $1$ acts on the second (right-hand) qubit. The 2-qubit operator is a tensor product of the 1-qubit operators, i.e. $X \otimes 1$. Its action on the four basis states is as follows:

$$
\begin{aligned}
X \otimes 1\,|00\rangle &= |10\rangle, \\
X \otimes 1\,|01\rangle &= |11\rangle, \\
X \otimes 1\,|10\rangle &= |00\rangle, \\
X \otimes 1\,|11\rangle &= |01\rangle,
\end{aligned}
\tag{3.75}
$$

so its matrix representation is

$$
X \otimes 1 =
\begin{array}{c|cccc}
 & |00\rangle & |01\rangle & |10\rangle & |11\rangle \\ \hline
\langle 00| & 0 & 0 & 1 & 0 \\
\langle 01| & 0 & 0 & 0 & 1 \\
\langle 10| & 1 & 0 & 0 & 0 \\
\langle 11| & 0 & 1 & 0 & 0
\end{array}
= \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},
\tag{3.76}
$$

where in the last expression each entry is a $2 \times 2$ block. Note how this block structure reflects the operators in the tensor product on the left of the expression. The $2 \times 2$ block structure is that of $X$ (the left-hand operator), while each block is made up of the identity $1$ (the right-hand operator).

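Similarly

$$
1 \otimes X =
\begin{array}{c|cccc}
 & |00\rangle & |01\rangle & |10\rangle & |11\rangle \\ \hline
\langle 00| & 0 & 1 & 0 & 0 \\
\langle 01| & 1 & 0 & 0 & 0 \\
\langle 10| & 0 & 0 & 0 & 1 \\
\langle 11| & 0 & 0 & 1 & 0
\end{array}
= \begin{pmatrix} X & 0 \\ 0 & X \end{pmatrix}.
\tag{3.77}
$$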

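As another example consider $X \otimes Z$. We have

$$
\begin{aligned}
X \otimes Z\,|00\rangle &= |10\rangle, \\
X \otimes Z\,|01\rangle &= -|11\rangle, \\
X \otimes Z\,|10\rangle &= |00\rangle, \\
X \otimes Z\,|11\rangle &= -|01\rangle,
\end{aligned}
\tag{3.78}
$$

so its matrix representation is

$$
X \otimes Z =
\begin{array}{c|cccc}
 & |00\rangle & |01\rangle & |10\rangle & |11\rangle \\ \hline
\langle 00| & 0 & 0 & 1 & 0 \\
\langle 01| & 0 & 0 & 0 & -1 \\
\langle 10| & 1 & 0 & 0 & 0 \\
\langle 11| & 0 & -1 & 0 & 0
\end{array}
= \begin{pmatrix} 0 & Z \\ Z & 0 \end{pmatrix}.
\tag{3.79}
$$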

Again notice how the block structure in the last expression reflects the operators in the tensor product.
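In NumPy the tensor product of matrices is `numpy.kron`. With the basis ordered $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ and the left qubit written first, `np.kron(A, B)` has the block structure of `A`, with each block a multiple of `B`, exactly as described above. The sketch below rebuilds Eqs. (3.76), (3.77) and (3.79) this way and checks one line of Eq. (3.78).

```python
import numpy as np

I = np.eye(2, dtype=int)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])

# Rows/columns ordered |00>, |01>, |10>, |11>, left qubit first
print(np.kron(X, I))   # Eq. (3.76): block structure of X, blocks of 1
print(np.kron(I, X))   # Eq. (3.77): X on the diagonal blocks
print(np.kron(X, Z))   # Eq. (3.79): Z in the off-diagonal blocks

# Acting on a basis state: X (x) Z |01> = -|11>, as in Eq. (3.78)
ket01 = np.array([0, 1, 0, 0])
print(np.kron(X, Z) @ ket01)   # [ 0  0  0 -1]
```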

3.10 Generalized Born Rule

In Sec. 3.7 we gave the standard physics textbook discussion of measurement in quantum mechanics. For quantum computing we need to extend this to situations involving multiple qubits, where we measure only some of the qubits and need to know the state of the remaining qubits after the measurement. As a simple example, suppose we have 2 qubits A and B, in a state

$$|\psi\rangle = a_0 |00\rangle + a_1 |01\rangle + a_2 |10\rangle + a_3 |11\rangle, \tag{3.80}$$

where the left qubit is A and the right qubit is B. Because the state has to be normalized, we need $|a_0|^2 + |a_1|^2 + |a_2|^2 + |a_3|^2 = 1$. Suppose we measure $Z$ for qubit A, the left qubit. We want to know the possible measurement results, the probabilities of the different results, and, for each case, the state of qubit B after the measurement.
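We will rewrite Eq. (3.80) by grouping together all the terms where qubit A is $|0\rangle$ (more generally, an eigenstate of the operator acting on A), and all the terms where qubit A is $|1\rangle$ (the other eigenstate). The terms involving qubit A in state $|0\rangle$ are $a_0|00\rangle + a_1|01\rangle$. We write this as

$$a_0 |00\rangle + a_1 |01\rangle = \alpha_0 |0_A\rangle \left( \frac{a_0}{\alpha_0} |0_B\rangle + \frac{a_1}{\alpha_0} |1_B\rangle \right) = \alpha_0 |0_A\rangle |\phi_{0,B}\rangle, \tag{3.81}$$

where

$$|\alpha_0|^2 = |a_0|^2 + |a_1|^2 \tag{3.82}$$

and

$$|\phi_{0,B}\rangle = \frac{1}{\alpha_0} \left( a_0 |0_B\rangle + a_1 |1_B\rangle \right) \tag{3.83}$$

is a normalized state for qubit B. Similarly

$$a_2 |10\rangle + a_3 |11\rangle = \alpha_1 |1_A\rangle \left( \frac{a_2}{\alpha_1} |0_B\rangle + \frac{a_3}{\alpha_1} |1_B\rangle \right) = \alpha_1 |1_A\rangle |\phi_{1,B}\rangle, \tag{3.84}$$

where

$$|\alpha_1|^2 = |a_2|^2 + |a_3|^2 \tag{3.85}$$

and

$$|\phi_{1,B}\rangle = \frac{1}{\alpha_1} \left( a_2 |0_B\rangle + a_3 |1_B\rangle \right) \tag{3.86}$$

is normalized.
Combining, we get

$$|\psi\rangle = \alpha_0 |0_A\rangle |\phi_{0,B}\rangle + \alpha_1 |1_A\rangle |\phi_{1,B}\rangle, \tag{3.87}$$

where we emphasize that all the states in this expression are normalized.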
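Only $|\alpha_0|$ and $|\alpha_1|$ are fixed by Eqs. (3.82) and (3.85), so we are free to choose $\alpha_0$ and $\alpha_1$ real and positive. With this choice, the inner product of $|\phi_{0,B}\rangle$ and $|\phi_{1,B}\rangle$ is

$$\langle\phi_{0,B}|\phi_{1,B}\rangle = \frac{a_0^* a_2 + a_1^* a_3}{\sqrt{|a_0|^2 + |a_1|^2}\,\sqrt{|a_2|^2 + |a_3|^2}}, \tag{3.88}$$

and there is no reason for this to be zero in general. Hence, while $|\phi_{0,B}\rangle$ and $|\phi_{1,B}\rangle$ are normalized, they are not necessarily orthogonal.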
The natural extension of the Born rule is called the “generalized Born rule”. It states that, when the two qubits are in the state given in Eq. (3.87), the possible results of the measurement of $Z$ on qubit A are

$$
\begin{aligned}
&\text{result } +1, \quad \text{probability } |\alpha_0|^2, \quad \text{final state } |0_A\rangle|\phi_{0,B}\rangle, \\
&\text{result } -1, \quad \text{probability } |\alpha_1|^2, \quad \text{final state } |1_A\rangle|\phi_{1,B}\rangle.
\end{aligned}
\tag{3.89}
$$

It is straightforward to generalize this result in two ways. The first is to an arbitrary situation with $n + m$ qubits, $n$ of which are measured, where we want to know the possible final states of the remaining $m$ qubits after the measurement. The second is to arbitrary measurement operators.
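The decomposition in Eqs. (3.81)–(3.89) can be written as a short routine. The sketch below uses a hypothetical helper, `measure_qubit_A`, and follows the choice made before Eq. (3.88) by taking $\alpha_0, \alpha_1$ real and non-negative. It returns each outcome with its probability and the normalized state of qubit B. The example amplitudes are an arbitrary normalized choice, picked to show that $|\phi_{0,B}\rangle$ and $|\phi_{1,B}\rangle$ need not be orthogonal.

```python
import numpy as np

def measure_qubit_A(a):
    """Generalized Born rule, Eqs. (3.81)-(3.89), for amplitudes a = (a0, a1, a2, a3).

    Returns [(result, probability, normalized state of B), ...] for a Z measurement
    on the left qubit A, with alpha_0, alpha_1 taken real and non-negative.
    """
    a = np.asarray(a, dtype=complex)
    outcomes = []
    for result, (amp0, amp1) in ((+1, a[0:2]), (-1, a[2:4])):
        alpha = np.sqrt(abs(amp0) ** 2 + abs(amp1) ** 2)    # Eqs. (3.82), (3.85)
        if alpha > 0:                                        # skip impossible outcomes
            phi_B = np.array([amp0, amp1]) / alpha           # Eqs. (3.83), (3.86)
            outcomes.append((result, alpha ** 2, phi_B))
    return outcomes

a = np.array([1, 1j, 2, -1]) / np.sqrt(7)       # an arbitrary normalized example
for result, prob, phi_B in measure_qubit_A(a):
    print(result, round(prob, 4), phi_B)        # probabilities 2/7 and 5/7

# The two possible states of B are normalized but not orthogonal (Eq. 3.88)
(_, _, phi0), (_, _, phi1) = measure_qubit_A(a)
print(np.vdot(phi0, phi1))                      # equals (2 + i)/sqrt(10), not zero
```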

3.11 The Uncertainty Principle

Now we come to a key concept in quantum mechanics, the uncertainty principle. We shall see that some variables are incompatible with each other, which means that, in general, one cannot have definite values for both of them in the same state. The important quantity for deciding whether two operators, $A$ and $B$ say, are compatible is their commutator

$$[A, B] \equiv AB - BA. \tag{3.90}$$

If $[A, B] \neq 0$, then it is shown in linear algebra texts that $A$ and $B$ cannot be diagonalized simultaneously, i.e. they do not have a complete set of common eigenvectors. We have already noted that we only get a definite value for some operator if the state is an eigenstate of that operator. Hence, if $[A, B] \neq 0$, there is no basis of states each of which gives definite values for both operators. In many cases, such as the Pauli operators below, there is no such state at all.
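As an example of a commutator consider $X$ and $Z$. We have

$$
[Z, X] = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} - \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
= \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} - \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}
= 2\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = 2iY, \tag{3.91}
$$

where $Y$ is defined in Eq. (2.23). Since the commutator is non-zero, it is impossible to find a state which is a simultaneous eigenstate of both $X$ and $Z$, and so either $\Delta X$ or $\Delta Z$, or both, must be non-zero.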
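The commutator in Eq. (3.91) is quick to verify numerically. The NumPy sketch below checks $[Z, X] = 2iY$. It then tests whether either eigenvector of $Z$ is also an eigenvector of $X$. The test relies on the Cauchy–Schwarz inequality: for a normalized $v$, $|\langle v|Xv\rangle| = \|Xv\|$ holds exactly when $Xv$ is parallel to $v$.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def commutator(A, B):
    """[A, B] = AB - BA (Eq. 3.90)."""
    return A @ B - B @ A

print(np.allclose(commutator(Z, X), 2j * Y))     # True, Eq. (3.91)

# No eigenvector of Z is an eigenvector of X
_, z_vecs = np.linalg.eigh(Z)
for v in z_vecs.T:                               # columns of z_vecs are the eigenvectors
    Xv = X @ v
    print(np.isclose(abs(np.vdot(v, Xv)), np.linalg.norm(Xv)))   # False for both
```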

An important inequality involving the uncertainties $\Delta A$ and $\Delta B$ of two operators in a state $|\psi\rangle$ is

$$(\Delta A\, \Delta B)_\psi \geq \frac{1}{2} \left| \langle [A, B] \rangle_\psi \right|, \tag{3.92}$$

which is known as the Heisenberg uncertainty principle. We shall not prove this result. The most famous case of the uncertainty principle is for $A = x$, the position of a particle, and $B = p$, its momentum, for which the commutator is a constant³, $i\hbar$, so

$$\Delta x\, \Delta p \geq \frac{\hbar}{2}. \tag{3.93}$$
However, this particular version of the uncertainty principle does not play a role in quantum computing
which is concerned with (discrete) 2-state systems, rather than (continuous) trajectories of particles.
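Although Eq. (3.92) is not proved here, it can be spot-checked numerically for the qubit operators $X$ and $Z$. The sketch below draws random normalized states from a seeded NumPy generator and compares $\Delta X\,\Delta Z$ with $\frac{1}{2}|\langle [Z, X]\rangle|$. The number of samples and the small tolerance for rounding are arbitrary choices. This is an illustration, not a proof.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def expect(op, psi):
    """<psi|op|psi>, which may be complex for a non-Hermitian op such as [Z, X]."""
    return np.vdot(psi, op @ psi)

def delta(op, psi):
    """Uncertainty (<A^2> - <A>^2)^(1/2) of a Hermitian op, Eq. (3.69)."""
    mean = expect(op, psi).real
    mean_sq = expect(op @ op, psi).real
    return np.sqrt(max(mean_sq - mean ** 2, 0.0))

rng = np.random.default_rng(seed=0)
for _ in range(5):
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    psi = v / np.linalg.norm(v)                  # random normalized qubit state
    lhs = delta(X, psi) * delta(Z, psi)
    rhs = 0.5 * abs(expect(Z @ X - X @ Z, psi))
    print(f"{lhs:.4f} >= {rhs:.4f}", lhs >= rhs - 1e-12)
```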

3.12 Time Evolution of Quantum States

So far, we have described fixed quantum states. Now we need to discuss how they evolve with time. Let the state at an initial time be $|\psi\rangle$ and the state at a later time be $|\psi'\rangle$. According to quantum mechanics, there is a linear relation between the two, so

$$|\psi'\rangle = U|\psi\rangle, \tag{3.94}$$

for some linear operator $U$. The normalization condition must be preserved, so $\langle\psi'|\psi'\rangle = \langle\psi|\psi\rangle = 1$. This provides a constraint on the form of $U$, as we will now show. The equation corresponding to Eq. (3.94) for the dual vector $\langle\psi'|$ is

$$\langle\psi'| = \langle\psi| U^\dagger. \tag{3.95}$$

To see this, compare Eqs. (3.38) and (3.34) and note that $(A^\dagger)^\dagger = A$. Combining Eqs. (3.94) and (3.95) gives

$$\langle\psi'|\psi'\rangle = \langle\psi| U^\dagger U |\psi\rangle, \tag{3.96}$$

and for this to equal $\langle\psi|\psi\rangle$ for every state $|\psi\rangle$ we need

$$U^\dagger U = 1, \tag{3.97}$$

so $U$ has to be unitary.
In quantum computing we change the state of the qubits by a sequence of discrete unitary transformations. Note that for a unitary operator $U^{-1} = U^\dagger$, and $U^\dagger$ is well defined, so the inverse transformation, which acts on the final state and converts it to the initial state, exists. Thus quantum transformations are reversible. The exception is measurement, in which the quantum system is coupled to a macroscopic, external apparatus, which leads to an irreversible change. As we shall see, standard classical gates which manipulate the bits in a classical computer are irreversible. The necessity of doing reversible operations in a quantum computer will be a major difference compared with a classical computer.
In a quantum computer, as noted above, we act on the qubits with a series of discrete unitary
operations, but we should be aware that these are implemented by acting with some operation for a
finite amount of time, see e.g. Chs. 14–17 of the book by LaPierre [LaP21].
Microscopically, quantum states evolve continuously with time, and we will finish this chapter with a brief discussion of continuous time evolution in quantum mechanics (even though it will not be needed in the rest of the course).

³ $\hbar$ is Planck’s constant divided by $2\pi$. It is of paramount importance in physics but does not explicitly play a role in the theory of quantum computation.

Time evolution is determined by the Hamiltonian (energy) $H$, a Hermitian operator, according to Ansatz 3:
Ansatz 3: The time dependence of a state is given by Schrödinger’s equation

$$i\hbar \frac{\partial}{\partial t} |\psi(t)\rangle = H |\psi(t)\rangle. \tag{3.98}$$

Assuming that $H$ does not change with time, we can integrate Eq. (3.98) to get

$$|\psi(t)\rangle = U(t)|\psi(0)\rangle, \tag{3.99}$$

where

$$U(t) = e^{-iHt/\hbar}. \tag{3.100}$$

Since $H$ is Hermitian, we can show that $U$ is unitary by the following argument. To get the adjoint of $U$ we take its complex conjugate and replace any operators in the expression for $U$ by their adjoints. Since $H$ is self-adjoint (Hermitian), we have

$$U^\dagger(t) = e^{iHt/\hbar}, \tag{3.101}$$

from which one sees that

$$U^\dagger(t) U(t) = e^{iHt/\hbar} e^{-iHt/\hbar} = e^{i(Ht - Ht)/\hbar} = 1, \tag{3.102}$$

so $U$ is unitary, as required. Note that if we have operators in exponentials which don’t commute, we can’t manipulate them as we do with ordinary numbers. For example, $e^A e^B$ is in general not equal to $e^{A+B}$ unless $[A, B] = 0$. However, here both exponents are proportional to $H$, which commutes with itself, so combining the exponentials as done in Eq. (3.102) is valid.
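The following NumPy sketch illustrates Eqs. (3.100)–(3.102). It works in units with $\hbar = 1$, and both that choice and the Hamiltonian $H = 0.5X + 0.3Z$ are arbitrary assumptions made purely for illustration. The code builds $U(t)$ from the eigendecomposition of $H$, as in Eq. (3.58), and checks that it is unitary and preserves normalization. It then shows that $e^{-iX}e^{-iZ} \neq e^{-i(X+Z)}$, because $X$ and $Z$ do not commute.

```python
import numpy as np

def evolution_operator(H, t, hbar=1.0):
    """U(t) = exp(-i H t / hbar) for Hermitian H, via H = S D S^dagger (Eq. 3.58)."""
    energies, S = np.linalg.eigh(H)
    return S @ np.diag(np.exp(-1j * energies * t / hbar)) @ S.conj().T

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

H = 0.5 * X + 0.3 * Z               # hypothetical Hermitian Hamiltonian (hbar = 1)
U = evolution_operator(H, t=2.0)
print(np.allclose(U.conj().T @ U, np.eye(2)))    # True: U is unitary, Eq. (3.102)

psi_t = U @ np.array([1, 0], dtype=complex)
print(np.isclose(np.linalg.norm(psi_t), 1))      # True: normalization is preserved

# Exponentials of non-commuting operators cannot be combined like numbers
lhs = evolution_operator(X, 1.0) @ evolution_operator(Z, 1.0)   # e^{-iX} e^{-iZ}
rhs = evolution_operator(X + Z, 1.0)                            # e^{-i(X+Z)}
print(np.allclose(lhs, rhs))                                    # False
```

Because $H$ commutes with itself, evolving for $t_1$ and then $t_2$ is the same as evolving for $t_1 + t_2$, which is the property used in Eq. (3.102).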

Problems
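3.1. Consider the following state vectors:

$$|\psi\rangle = \begin{pmatrix} 2 \\ 3i \end{pmatrix}, \qquad |\phi\rangle = \begin{pmatrix} 4 \\ 5i \end{pmatrix}.$$

(i) Write down the dual vectors $\langle\psi|$ and $\langle\phi|$.

(ii) Normalize $|\psi\rangle$ and $|\phi\rangle$.
(iii) For the normalized states determined in the last part, compute the inner product $\langle\phi|\psi\rangle$.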

3.2. Which of the following pairs of quantum states represent the same physical state? (You must explain your results.)

(i) $|0\rangle$ and $-|0\rangle$.

(ii) $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$.

(iii) $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$ and $\frac{1}{\sqrt{2}}(|1\rangle - |0\rangle)$.

(iv) $\frac{1}{\sqrt{2}}(|0\rangle + e^{i\pi/4}|1\rangle)$ and $\frac{1}{\sqrt{2}}(e^{-i\pi/4}|0\rangle + |1\rangle)$.

(v) $\frac{1}{\sqrt{2}}(|0\rangle + i|1\rangle)$ and $\frac{1}{\sqrt{2}}(i|0\rangle + |1\rangle)$.

For those cases where the two states are different, what measured quantity would give different results? Show that the measured quantity acting on the states does yield different results. Hint: Think Pauli matrices.

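3.3. Tensor products of matrices

Note the block structure of the following tensor product:

$$
X \otimes Z = \begin{pmatrix} (0)Z & (1)Z \\ (1)Z & (0)Z \end{pmatrix}
= \begin{pmatrix}
0 & 0 & 1 & 0 \\
0 & 0 & 0 & -1 \\
1 & 0 & 0 & 0 \\
0 & -1 & 0 & 0
\end{pmatrix}, \tag{3.103}
$$

with rows and columns in the order $|00\rangle, |01\rangle, |10\rangle, |11\rangle$.

Obtain, in a similar way, the tensor products $Z \otimes X$ and $H \otimes H$, where $H$ is the Hadamard matrix.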

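3.4. Consider the 2-qubit state

$$|\psi\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle_1 |0\rangle_2 - |1\rangle_1 |1\rangle_2 \right). \tag{3.104}$$

Determine the expectation values $\langle Z_1 Z_2\rangle$ and $\langle X_1 X_2\rangle$ in this state.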

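3.5. Show that the following state is separable (i.e. is a product state):

$$\frac{1}{2} \left( |00\rangle - |01\rangle - |10\rangle + |11\rangle \right), \tag{3.105}$$

and, by inspection, write the state in separable form.

3.6. Consider the state

$$|\psi\rangle = \frac{1}{\sqrt{2}} (|0\rangle + |1\rangle). \tag{3.106}$$

Determine $\langle Z\rangle$, $\langle Z^2\rangle$, $\Delta Z$, $\langle X\rangle$, $\langle X^2\rangle$, $\Delta X$, $\langle Y\rangle$, $\langle Y^2\rangle$ and $\Delta Y$. Show that the uncertainty is only zero for those operators for which the state is an eigenstate.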

3.7. Consider the Bell states discussed in class

$$|\beta_{xy}\rangle = \frac{|0y\rangle + (-1)^x |1\bar{y}\rangle}{\sqrt{2}}, \tag{3.107}$$

where $\bar{y}$ is the complement of $y$, i.e. $\bar{y} = 1 - y$. Show that

$$Z \otimes Z\, |\beta_{xy}\rangle = (-1)^y |\beta_{xy}\rangle, \tag{3.108}$$

$$X \otimes X\, |\beta_{xy}\rangle = (-1)^x |\beta_{xy}\rangle, \tag{3.109}$$

$$Y \otimes Y\, |\beta_{xy}\rangle = -(-1)^{x+y} |\beta_{xy}\rangle. \tag{3.110}$$

3.8. Show that the unitary operator which transforms the Z-basis (i.e. the basis in which $Z$ is diagonal) to the X-basis is

$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}. \tag{3.111}$$

$H$ is called the Hadamard operator.

3.9. Consider a system with two qubits, A and B, in state

$$|\phi\rangle = \frac{1}{3} \left( |0\rangle_A |0\rangle_B + \sqrt{3}\, |1\rangle_A |0\rangle_B + \sqrt{5}\, |1\rangle_A |1\rangle_B \right).$$

(i) Show that the state is normalized.
(ii) If measurements are made of both qubits, what are the possible results and their probabilities?
(iii) If a measurement is made only on qubit A, what are the possible resulting states for qubit B and what are their probabilities?
Note: Make sure that the probabilities add up to 1.

Chapter 4

General state of a qubit, no-cloning theorem, entanglement and Bell states

4.1 General qubit states

As already discussed in Sec. 2.5, the following $2 \times 2$ matrices, called Pauli matrices, acting on the states of a single qubit will be important in the rest of the course:

$$X \equiv \sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \tag{4.1a}$$

$$Y \equiv \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \tag{4.1b}$$

$$Z \equiv \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{4.1c}$$

In the physics literature the notation used is $\sigma_x$, $\sigma_y$ and $\sigma_z$, but in this course we shall use the quantum computing notation: $X$, $Y$, and $Z$. As shown in Sec. 2.4, an arbitrary $2 \times 2$ matrix can be written as a linear combination of the three Pauli matrices plus the $2 \times 2$ identity matrix. These matrices are Hermitian and have eigenvalues $\pm 1$, see Sec. 2.4.
If the qubit is the spin of an electron, then the eigenstate with Z = 1 has spin along the +z direction,
and analogously the eigenstate with Y = 1 has spin along the +y direction, and the eigenstate with
X = 1 has spin along the +x direction. Also, the eigenstate with Z = −1 has the spin pointing in the
−z direction, and analogously for X = −1 and Y = −1.
How can we specify a general state of a qubit? To see this, we first ask how many parameters we need to specify a general state. A qubit vector has two complex components, making a total of four real parameters. However, one of these can be eliminated because the state must be normalized, and another can be eliminated because an overall phase is unimportant. This leaves two parameters necessary to describe a general qubit state.
We shall see that we can conveniently take these two parameters to be the two angles which describe a direction in space in spherical polar coordinates. To see this, we compute the eigenstates for the spin of an electron aligned along a general direction with polar angle $\theta$ and azimuthal angle $\phi$. These angles describe a unit vector $\hat{n}$, where

$$\hat{n} = (\sin\theta\cos\phi,\ \sin\theta\sin\phi,\ \cos\theta), \tag{4.2}$$

so $n_x = \sin\theta\cos\phi$, etc. In other words, we compute the eigenvalues and eigenvectors of $\vec{\sigma} \cdot \hat{n}$. We have

$$\vec{\sigma} \cdot \hat{n} = \begin{pmatrix} n_z & n_x - i n_y \\ n_x + i n_y & -n_z \end{pmatrix}, \tag{4.3}$$

so the eigenvalues are given by

$$\begin{vmatrix} n_z - \lambda & n_x - i n_y \\ n_x + i n_y & -n_z - \lambda \end{vmatrix} = 0. \tag{4.4}$$

Expanding the determinant gives $(n_z - \lambda)(-n_z - \lambda) - (n_x^2 + n_y^2) = \lambda^2 - 1 = 0$, where we used that $n_x^2 + n_y^2 + n_z^2 = 1$. We therefore find the eigenvalues to be

$$\lambda = \pm 1. \tag{4.5}$$

Thus, the eigenvalues are not only ±1 when measured along the Cartesian directions, but take the
same values along any direction.
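Next we look at the eigenvectors. First, the eigenvector for eigenvalue $+1$ is

$$|0_{\hat{n}}\rangle = \begin{pmatrix} a \\ b \end{pmatrix}, \tag{4.6}$$

where

$$\begin{pmatrix} \cos\theta & \sin\theta\, e^{-i\phi} \\ \sin\theta\, e^{i\phi} & -\cos\theta \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix}, \tag{4.7}$$

where we used Eqs. (4.2) and (4.3). Writing out the two equations, we get

$$\sin\theta\, e^{-i\phi}\, b = a(1 - \cos\theta), \tag{4.8a}$$

$$\sin\theta\, e^{i\phi}\, a = b(1 + \cos\theta). \tag{4.8b}$$

Both these equations are satisfied by

$$b \cos\frac{\theta}{2} = a\, e^{i\phi} \sin\frac{\theta}{2}, \tag{4.9}$$

in which we used that

$$\sin\theta = 2\sin\frac{\theta}{2}\cos\frac{\theta}{2}, \qquad \cos\theta = 2\cos^2\frac{\theta}{2} - 1 = 1 - 2\sin^2\frac{\theta}{2}. \tag{4.10}$$

We require the state to be normalized, i.e. $|a|^2 + |b|^2 = 1$, so we get

$$|0_{\hat{n}}\rangle = \begin{pmatrix} \cos\frac{\theta}{2} \\ e^{i\phi}\sin\frac{\theta}{2} \end{pmatrix}, \tag{4.11}$$

or equivalently, in Dirac notation,

$$|0_{\hat{n}}\rangle = \cos\frac{\theta}{2}\, |0\rangle + e^{i\phi} \sin\frac{\theta}{2}\, |1\rangle. \tag{4.12a}$$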

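A similar calculation gives the eigenstate corresponding to eigenvalue $-1$ to be

$$|1_{\hat{n}}\rangle = -\sin\frac{\theta}{2}\, |0\rangle + e^{i\phi} \cos\frac{\theta}{2}\, |1\rangle. \tag{4.12b}$$

It is straightforward to see that the states in Eqs. (4.12) are normalized, i.e.

$$\langle 0_{\hat{n}}|0_{\hat{n}}\rangle = 1, \qquad \langle 1_{\hat{n}}|1_{\hat{n}}\rangle = 1, \tag{4.13}$$

and are mutually orthogonal,

$$\langle 0_{\hat{n}}|1_{\hat{n}}\rangle = 0. \tag{4.14}$$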
Note that we can always multiply eigenstates by an arbitrary phase factor so you might see expres-
sions for these eigenstates which look different from Eqs. (4.12a) and (4.12b), but which are actually
equivalent.
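Equations (4.5) and (4.12) can be checked for any direction. The NumPy sketch below builds $\vec{\sigma}\cdot\hat{n}$ from Eqs. (4.1)–(4.3) for an arbitrarily chosen $(\theta, \phi)$. It confirms that the eigenvalues are $\pm 1$, that $|0_{\hat{n}}\rangle$ and $|1_{\hat{n}}\rangle$ from Eqs. (4.12a) and (4.12b) are eigenvectors with eigenvalues $+1$ and $-1$, and that they are orthogonal. The helper names are our own.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def sigma_dot_n(theta, phi):
    """sigma . n for the unit vector n of Eq. (4.2)."""
    nx, ny, nz = np.sin(theta) * np.cos(phi), np.sin(theta) * np.sin(phi), np.cos(theta)
    return nx * X + ny * Y + nz * Z

def bloch_states(theta, phi):
    """|0_n> and |1_n> from Eqs. (4.12a) and (4.12b)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    up = np.array([c, np.exp(1j * phi) * s])
    down = np.array([-s, np.exp(1j * phi) * c])
    return up, down

theta, phi = 1.1, 2.3                       # an arbitrary direction
M = sigma_dot_n(theta, phi)
up, down = bloch_states(theta, phi)
print(np.linalg.eigvalsh(M))                                    # approximately [-1, 1]
print(np.allclose(M @ up, up), np.allclose(M @ down, -down))    # True True
print(np.isclose(np.vdot(up, down), 0))                         # True, Eq. (4.14)
```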

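[Figure 4.1: The Bloch sphere. Axes $x$, $y$, $z$; angles $\theta$ and $\phi$; points labelled $|0\rangle$, $|0_{\hat{n}}\rangle$ and $|1_{\hat{n}}\rangle$.]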

If we consider a point on a unit sphere (often called the Bloch sphere) with polar angles $\theta$ and $\phi$, then the eigenstate of spin in that direction with eigenvalue $+1$ is given by Eq. (4.12a), see Fig. 4.1. Even if the qubit is not an electron spin, Eq. (4.12a) provides a convenient description of an arbitrary qubit state.
Similarly (apart from a possible unimportant overall phase factor), the eigenstate with eigenvalue $-1$ is given by Eq. (4.12b), which corresponds to the antipodal point where $\theta \to \pi - \theta$, $\phi \to \phi + \pi$.
It is useful to consider four special cases of Eqs. (4.12):
(i) ($\theta = \phi = 0$), the $z$ direction. Clearly $|0_{\hat{z}}\rangle = |0\rangle$ and $|1_{\hat{z}}\rangle = |1\rangle$, as required.
(ii) ($\theta = \pi/2$, $\phi = 0$), the $x$ direction:

$$|0_{\hat{x}}\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + |1\rangle\right) = |+\rangle, \tag{4.15}$$

$$|1_{\hat{x}}\rangle = \frac{1}{\sqrt{2}}\left(-|0\rangle + |1\rangle\right) = -|-\rangle. \tag{4.16}$$

These are the eigenstates of $X$, as expected. ($|1_{\hat{x}}\rangle$ has the opposite sign to the conventionally defined state $|-\rangle$, but the overall sign of a state is of no importance.)
(iii) ($\theta$ arbitrary, $\phi = 0$), a direction $\hat{n}$ in the $x$-$z$ plane at an angle $\theta$ to the $z$ axis:

$$|0_{\hat{n}}\rangle = \cos\frac{\theta}{2}\, |0\rangle + \sin\frac{\theta}{2}\, |1\rangle \qquad (\phi = 0), \tag{4.17}$$

$$|1_{\hat{n}}\rangle = -\sin\frac{\theta}{2}\, |0\rangle + \cos\frac{\theta}{2}\, |1\rangle. \tag{4.18}$$

(iv) ($\theta = \pi/2$, $\phi = \pi/2$), the $y$ direction:

$$|0_{\hat{y}}\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + i|1\rangle\right), \tag{4.19}$$

$$|1_{\hat{y}}\rangle = \frac{1}{\sqrt{2}}\left(-|0\rangle + i|1\rangle\right). \tag{4.20}$$

These are the eigenstates of $Y$, as expected.

We mentioned in Sec. 1.4 that for certain quantum protocols photons make good qubits, with the state of the qubit being characterized by its polarization (the direction and phase of the electric field). Consider a photon propagating in the $\hat{z}$ direction, corresponding to a qubit $|0_{\hat{n}}\rangle$ specified by angles $\theta$ and $\phi$. Using Eqs. (1.7)–(1.10) and (4.12a), we find that its electric field is given by

$$\vec{E} = \Re\left[ E_0 \cos(\theta/2)\, e^{i(kz - \omega t)}\, \hat{x} + E_0 \sin(\theta/2)\, e^{i\phi}\, e^{i(kz - \omega t)}\, \hat{y} \right], \tag{4.21}$$

where $\Re$ means real part, so

$$
\begin{aligned}
E_x &= E_0 \cos(\theta/2) \cos(\omega t - kz), \\
E_y &= E_0 \sin(\theta/2) \cos(\omega t - kz - \phi).
\end{aligned}
\tag{4.22}
$$
Hence one can create an arbitrary qubit state by an appropriate choice of photon polarization. The polarization states for a photon for each of the four special cases given above are:
(i) ($\theta = \phi = 0$), i.e. $|0_{\hat{z}}\rangle \equiv |0\rangle$. Linearly polarized along $\hat{x}$. (The photon corresponding to $|1\rangle$ is polarized along $\hat{y}$.)
(ii) ($\theta = \pi/2$, $\phi = 0$), i.e. $|0_{\hat{x}}\rangle$. Linearly polarized along a diagonal direction. (The photon corresponding to $|1_{\hat{x}}\rangle$ is polarized along the other diagonal direction.)
(iii) ($\theta$ arbitrary, $\phi = 0$). Linearly polarized, with the polarization direction at an angle $\theta/2$ to the $x$-axis.
(iv) ($\theta = \pi/2$, $\phi = \pi/2$), i.e. $|0_{\hat{y}}\rangle$. Circularly polarized¹, with the $\vec{E}$ vector rotating in a particular sense as a function of time. (The photon corresponding to $|1_{\hat{y}}\rangle$ is circularly polarized, with the $\vec{E}$ vector rotating in the opposite sense.)

¹ Recall that $\cos(x - \pi/2) = \sin(x)$.

4.2 No-cloning theorem

A classical bit, 0 or 1, can be copied, i.e. cloned. You just observe it and create another one. With qubits, however, it turns out not to be possible to clone an arbitrary, unknown state. This is called the “no-cloning theorem”. It imposes an important limitation on our ability to manipulate quantum states. We now give the simple derivation of this important result.
Consider the general qubit state

$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle, \quad \text{where } |\alpha|^2 + |\beta|^2 = 1. \tag{4.23}$$

Suppose there were a unitary operator $U$ which clones any such state. Acting on $|\psi\rangle$ together with a second qubit prepared in $|0\rangle$, it would give

$$U|\psi\rangle|0\rangle = |\psi\rangle|\psi\rangle. \tag{4.24}$$

We shall see that no such operator can exist, because operators in quantum mechanics are linear. Suppose that

$$
\begin{aligned}
U|\psi\rangle|0\rangle &= |\psi\rangle|\psi\rangle, \\
U|\phi\rangle|0\rangle &= |\phi\rangle|\phi\rangle.
\end{aligned}
\tag{4.25}
$$

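Then, by linearity,

$$U\left(\alpha|\psi\rangle + \beta|\phi\rangle\right)|0\rangle = \alpha|\psi\rangle|\psi\rangle + \beta|\phi\rangle|\phi\rangle. \tag{4.26}$$

However, this is not a clone of $\alpha|\psi\rangle + \beta|\phi\rangle$, which would be

$$\left(\alpha|\psi\rangle + \beta|\phi\rangle\right)\left(\alpha|\psi\rangle + \beta|\phi\rangle\right) = \alpha^2|\psi\rangle|\psi\rangle + \alpha\beta|\psi\rangle|\phi\rangle + \alpha\beta|\phi\rangle|\psi\rangle + \beta^2|\phi\rangle|\phi\rangle. \tag{4.27}$$

The two expressions differ in general; for example, Eq. (4.26) has no cross terms $|\psi\rangle|\phi\rangle$ and $|\phi\rangle|\psi\rangle$. There is therefore an inconsistency, so a unitary operator $U$ for cloning does not exist.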
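A concrete illustration of the argument uses one particular choice of $U$: the controlled-NOT gate, taken here with the left qubit as control. This choice is ours; the text does not use CNOT at this point. CNOT maps $|x\rangle|0\rangle \to |x\rangle|x\rangle$ for $x = 0, 1$, so it copies both basis states. By linearity, however, acting on $|+\rangle|0\rangle$ it produces $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ rather than the clone $|+\rangle|+\rangle$. The NumPy sketch below checks this.

```python
import numpy as np

# CNOT in the basis |00>, |01>, |10>, |11> (left qubit = control)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
plus = (ket0 + ket1) / np.sqrt(2)

# It "clones" the two basis states: |x>|0> -> |x>|x>
for x in (ket0, ket1):
    print(np.allclose(CNOT @ np.kron(x, ket0), np.kron(x, x)))       # True, True

# Linearity then forces CNOT|+>|0> = (|00> + |11>)/sqrt(2), which is not |+>|+>
out = CNOT @ np.kron(plus, ket0)
print(np.allclose(out, np.kron(plus, plus)))                          # False
print(np.allclose(out, (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)))   # True
```

The output state is the entangled Bell state $|\beta_{00}\rangle$ introduced in the next section.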
The no-cloning theorem will be an important limitation when designing quantum algorithms.

4.3 Entanglement and Bell states

A striking aspect of quantum states of more than one qubit, which seems mysterious and plays a
crucial role in quantum algorithms, is called “entanglement”. Here we will illustrate this concept for
the simplest case of two qubits.
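Let’s suppose that the first qubit is in state $|\psi_1\rangle = \alpha_1|0\rangle + \beta_1|1\rangle$ and the second qubit is in state $|\psi_2\rangle = \alpha_2|0\rangle + \beta_2|1\rangle$. The state of the two-qubit system is the tensor product

$$|\psi_1\rangle \otimes |\psi_2\rangle = \begin{pmatrix} \alpha_1 \\ \beta_1 \end{pmatrix} \otimes \begin{pmatrix} \alpha_2 \\ \beta_2 \end{pmatrix} = \begin{pmatrix} \alpha_1\alpha_2 \\ \alpha_1\beta_2 \\ \beta_1\alpha_2 \\ \beta_1\beta_2 \end{pmatrix}. \tag{4.28}$$

This is an example of what is called a product state (it is also sometimes called a separable state). However, a general two-qubit state is not a product state. It can be written as

$$|\phi\rangle_2 = c_0|00\rangle + c_1|01\rangle + c_2|10\rangle + c_3|11\rangle, \tag{4.29}$$

or equivalently as

$$|\phi\rangle_2 = c_0|0\rangle_2 + c_1|1\rangle_2 + c_2|2\rangle_2 + c_3|3\rangle_2 = \sum_{x=0}^{3} c_x |x\rangle_2. \tag{4.30}$$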
The product state has

$$c_0 = \alpha_1\alpha_2, \quad c_1 = \alpha_1\beta_2, \quad c_2 = \beta_1\alpha_2, \quad c_3 = \beta_1\beta_2, \tag{4.31}$$

and so satisfies

$$c_0 c_3 = c_1 c_2. \tag{4.32}$$

This is the condition for a 2-qubit state to be a product state. States which do not have this property are said to be entangled.
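The two-qubit criterion of Eq. (4.32) makes a one-line test. The sketch below is a minimal illustration. The helper name `is_product_state` and the tolerance `tol`, which absorbs floating-point rounding, are our own assumptions. It builds a product state with `numpy.kron`, whose output coefficients are exactly those of Eq. (4.31), and compares it with the entangled state $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$.

```python
import numpy as np

def is_product_state(c, tol=1e-12):
    """Two-qubit test of Eq. (4.32): a product state iff c0*c3 == c1*c2."""
    c0, c1, c2, c3 = c
    return abs(c0 * c3 - c1 * c2) < tol

psi1 = np.array([0.6, 0.8j])              # (alpha1, beta1), an arbitrary qubit state
psi2 = np.array([1, -1]) / np.sqrt(2)     # (alpha2, beta2)
product = np.kron(psi1, psi2)             # coefficients c_x of Eq. (4.31)
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)

print(is_product_state(product), is_product_state(bell))   # True False
```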
The most-studied entangled states are so-called Bell states which involve two qubits. They are
named in honor of the physicist John Bell whose inequalities (to be discussed later) demonstrated that
the description of nature provided by quantum mechanics is fundamentally different from the classical
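description. The Bell states are defined by
$$\begin{align}
|\beta_{00}\rangle &= \frac{1}{\sqrt{2}}\big(|00\rangle + |11\rangle\big), \tag{4.33a}\\
|\beta_{01}\rangle &= \frac{1}{\sqrt{2}}\big(|01\rangle + |10\rangle\big), \tag{4.33b}\\
|\beta_{10}\rangle &= \frac{1}{\sqrt{2}}\big(|00\rangle - |11\rangle\big), \tag{4.33c}\\
|\beta_{11}\rangle &= \frac{1}{\sqrt{2}}\big(|01\rangle - |10\rangle\big). \tag{4.33d}
\end{align}$$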
42CHAPTER 4. GENERAL STATE OF A QUBIT, NO-CLONING THEOREM, ENTANGLEMENT AND BE

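These four equations can be combined as follows:
$$|\beta_{xy}\rangle = \frac{1}{\sqrt{2}}\big(|0\,y\rangle + (-1)^x\,|1\,\bar{y}\rangle\big), \tag{4.34}$$
where ȳ is the complement of y, i.e. ȳ = 1 − y. The Bell states are clearly entangled: for |β_00⟩, for example, c_0c_3 = 1/2 while c_1c_2 = 0, so Eq. (4.32) is violated.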
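Equation (4.34) can be checked by building all four Bell states from it. The following sketch (NumPy assumed; `ket` and `bell_state` are hypothetical helper names) confirms that the four states are orthonormal, so they form a basis, and that each one violates the product condition (4.32).

```python
import numpy as np

def ket(index, dim=4):
    """Computational basis vector |index> in a space of dimension dim."""
    v = np.zeros(dim)
    v[index] = 1.0
    return v

def bell_state(x, y):
    """|beta_xy> = (|0 y> + (-1)^x |1 ybar>)/sqrt(2), Eq. (4.34).

    The two-qubit state |a b> has index 2*a + b, matching |00>, |01>, |10>, |11>.
    """
    y_bar = 1 - y
    return (ket(2 * 0 + y) + (-1) ** x * ket(2 * 1 + y_bar)) / np.sqrt(2)

bells = {(x, y): bell_state(x, y) for x in (0, 1) for y in (0, 1)}

# Rows of B are the Bell states; B B^T = 1 means they are orthonormal.
B = np.array([bells[key] for key in sorted(bells)])
print(np.allclose(B @ B.T, np.eye(4)))   # True

# Each Bell state violates the product condition c0*c3 = c1*c2 of Eq. (4.32).
for (x, y), c in sorted(bells.items()):
    print(f"beta_{x}{y}: c0*c3 - c1*c2 = {c[0] * c[3] - c[1] * c[2]:+.3f}")
```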
There are correlations between the qubits in the Bell states (and, quite generally, between the qubits in entangled states). For example, if we consider |β_00⟩ and do a measurement on qubit 1, then a measurement of qubit 2 (if performed) would find the same result with 100% probability. We will discuss quantum correlations in entangled states in some detail in Chapter 6 when we investigate the claim of Einstein-Podolsky-Rosen (EPR) that quantum mechanics is incomplete.
For the case of two qubits, Eq. (4.32) is a convenient way to test whether a state is a product state or entangled. In a more general case where we have, say, n = n_A + n_B qubits, we may want to know whether a partition of the system into two subsystems, A with n_A qubits and B with n_B qubits, gives a product state, i.e. whether
$$|\psi\rangle_n = |\psi_A\rangle_{n_A} \otimes |\psi_B\rangle_{n_B}, \tag{4.35}$$
or whether the state is entangled with respect to this partition. For more than n = 2 qubits, there is no simple expression analogous to Eq. (4.32) for the 2^n coefficients c_x (x = 0, 1, …, 2^n − 1) which indicates a product state. Instead, a systematic way to investigate whether such a state is entangled or a product state is to use the density matrix, discussed in Chapter 5.

Appendix

4.A Angular Momentum Eigenstates

Physics students learn about quantum states which are eigenstates of angular momentum. This appendix relates Bell states to spin angular momentum eigenstates of two electrons. It is intended for physics students and is not essential reading for students of other disciplines.
The spin of an electron, s⃗, is given by
$$\vec{s} = \frac{\hbar}{2}\,\vec{\sigma}, \tag{4.36}$$
where ħ is Planck’s constant divided by 2π and the Pauli operators σ⃗ are defined in Eqs. (4.1).
In general, spin angular momentum states, |S, m⟩, are specified by two quantum numbers, S and m. The total spin quantum number S is defined by
$$\big(S_x^2 + S_y^2 + S_z^2\big)\,|S, m\rangle = \hbar^2 S(S+1)\,|S, m\rangle, \tag{4.37}$$
where S_x, for example, is the x-component of the total spin, so |S, m⟩ is an eigenvector of S⃗² with eigenvalue ħ²S(S + 1). The quantum number m is defined such that |S, m⟩ is also an eigenstate of S_z with eigenvalue ħm, where m ranges from −S to S in integer steps (so there are 2S + 1 values of m for a given S). The spin of an electron has S = 1/2, and its two basis states are |S = 1/2, m = 1/2⟩ and |S = 1/2, m = −1/2⟩, which are often written as |↑⟩ and |↓⟩ respectively. The latter notation indicates that one thinks of these two states as spin “up” and spin “down”. By convention, the correspondence between the basis states of the electron spin in physics, |↑⟩ and |↓⟩, and the computational basis states in quantum computer science, |0⟩ and |1⟩, is taken to be
$$|{\uparrow}\rangle \equiv |0\rangle, \qquad |{\downarrow}\rangle \equiv |1\rangle. \tag{4.38}$$
If we have two particles with total spin quantum numbers S_1 and S_2 then, as shown in textbooks on quantum mechanics [Gri05], the “vector rule” for addition of angular momentum states that the total spin quantum number of the combined system, S_tot, takes values from |S_1 − S_2| to S_1 + S_2 in integer steps. Thus, two electrons can have combined total spin quantum number S_tot = 1 (for which there are 3 values of m_tot, namely 1, 0 and −1) and S_tot = 0 (for which there is only one value of m_tot, namely 0). These are called “triplet” and “singlet” states respectively. Note that the total number of states works out right, since there are 2² = 4 states, of which 3 have S_tot = 1 and 1 has S_tot = 0 (i.e. 2 × 2 = 3 + 1).
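It is also shown in the quantum mechanics textbooks that the states of two spin-1/2 particles with specified values of S_tot and m_tot are given by
$$\begin{align}
|S_{tot}=1, m_{tot}=1\rangle &= |{\uparrow\uparrow}\rangle \equiv |00\rangle, \tag{4.39a}\\
|S_{tot}=1, m_{tot}=0\rangle &= \frac{1}{\sqrt2}\big(|{\uparrow\downarrow}\rangle + |{\downarrow\uparrow}\rangle\big) \equiv \frac{1}{\sqrt2}\big(|01\rangle + |10\rangle\big), \tag{4.39b}\\
|S_{tot}=1, m_{tot}=-1\rangle &= |{\downarrow\downarrow}\rangle \equiv |11\rangle, \tag{4.39c}\\
|S_{tot}=0, m_{tot}=0\rangle &= \frac{1}{\sqrt2}\big(|{\uparrow\downarrow}\rangle - |{\downarrow\uparrow}\rangle\big) \equiv \frac{1}{\sqrt2}\big(|01\rangle - |10\rangle\big). \tag{4.39d}
\end{align}$$
Eqs. (4.39a)–(4.39c) are the triplet states while Eq. (4.39d) is the singlet state.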
Comparing with Eqs. (4.33) we see that
$$\begin{align}
|S_{tot}=1, m_{tot}=1\rangle &= \frac{1}{\sqrt2}\big(|\beta_{00}\rangle + |\beta_{10}\rangle\big), \tag{4.40a}\\
|S_{tot}=1, m_{tot}=0\rangle &= |\beta_{01}\rangle, \tag{4.40b}\\
|S_{tot}=1, m_{tot}=-1\rangle &= \frac{1}{\sqrt2}\big(|\beta_{00}\rangle - |\beta_{10}\rangle\big), \tag{4.40c}\\
|S_{tot}=0, m_{tot}=0\rangle &= |\beta_{11}\rangle. \tag{4.40d}
\end{align}$$
Equations (4.40) connect Bell states and angular momentum states, while Eqs. (4.39) connect computational basis states and angular momentum states.
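For readers who want to check Eqs. (4.39) and (4.40) numerically, the sketch below builds the total spin operators of two spin-1/2 particles from the Pauli matrices and verifies that each state of Eqs. (4.39) has the stated S_tot and m_tot, and that the Bell-state combinations (4.40) reproduce them. It assumes NumPy and works in units where ħ = 1, an assumption made only for the code; the checks are written as assertions, so the script stops if any relation fails.

```python
import numpy as np

# Pauli matrices and the one-qubit identity.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

# Total spin S_a = (sigma_a (x) 1 + 1 (x) sigma_a) / 2 in units of hbar, cf. Eq. (4.36).
S = [0.5 * (np.kron(P, I2) + np.kron(I2, P)) for P in (X, Y, Z)]
S_squared = sum(Sa @ Sa for Sa in S)      # S_x^2 + S_y^2 + S_z^2, Eq. (4.37)
S_z = S[2]

r = 1 / np.sqrt(2)
# Eqs. (4.39) in the basis |00>, |01>, |10>, |11>, keyed by (S_tot, m_tot).
spin_states = {
    (1, 1): np.array([1.0, 0, 0, 0]),
    (1, 0): r * np.array([0, 1.0, 1.0, 0]),
    (1, -1): np.array([0, 0, 0, 1.0]),
    (0, 0): r * np.array([0, 1.0, -1.0, 0]),
}
for (s_tot, m_tot), v in spin_states.items():
    assert np.allclose(S_squared @ v, s_tot * (s_tot + 1) * v)   # eigenvalue S(S+1)
    assert np.allclose(S_z @ v, m_tot * v)                         # eigenvalue m

# Bell states, Eqs. (4.33), and the relations (4.40).
b00 = r * np.array([1.0, 0, 0, 1.0])
b01 = r * np.array([0, 1.0, 1.0, 0])
b10 = r * np.array([1.0, 0, 0, -1.0])
b11 = r * np.array([0, 1.0, -1.0, 0])
assert np.allclose(spin_states[(1, 1)], r * (b00 + b10))    # (4.40a)
assert np.allclose(spin_states[(1, 0)], b01)                # (4.40b)
assert np.allclose(spin_states[(1, -1)], r * (b00 - b10))   # (4.40c)
assert np.allclose(spin_states[(0, 0)], b11)                # (4.40d)
print("Eqs. (4.39) and (4.40) are consistent")
```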
In this chapter we have encountered three sets of states which can describe 2 qubits:

• the computational basis states |xy⟩,

• the Bell states |β_xy⟩, and

• the angular momentum states |S_tot, m_tot⟩.

Each of these forms a basis set. In quantum computing we generally use computational basis states
but sometimes the Bell basis will be useful. However, there does not seem to be a use for angular
momentum basis states in quantum computing.
Chapter 5

The Density Matrix

5.1 Introduction
The material in this chapter is not essential for the rest of the course and so could be omitted if
necessary. It is, however, necessary for advanced treatments of quantum error correction which go
beyond the discussion in Ch. 19.
We will be interested in situations where a system is in contact with another, possibly much larger,
system. Let’s call the system of interest subsystem A, and denote the other system by subsystem
B. We use the word “subsystem” for A and B, since we now consider them as the two parts of the
combined AB system. We want to describe the properties of subsystem A without explicitly including
the degrees of freedom of subsystem B. This is accomplished by the “density matrix”. Two situations
where the density matrix is useful are:

• To determine whether a state is a product state or entangled with respect to a partition of the
system into two subsystems.

• To understand and correct errors in quantum computers, where A consists of the qubits of the computer and B is the environment which inevitably couples to the computational qubits. The environment is very complicated, with a huge (essentially infinite) number of degrees of freedom, so we cannot include it explicitly and we need a description involving just the degrees of freedom of A, in which the effects of the environment have been averaged over in some sense. This description is provided by the density matrix. In practice, approximations will have to be made to determine it. We will discuss the effects of the environment on a quantum computer in Chapter 19.

For further reading on the density matrix see Refs. [NC00, Vat16, RP14].

5.2 Definition of the Density Matrix

To become familiar with the notation in a gentle way we first consider the density matrix of a system
in a well-defined quantum state. This is not terribly useful in itself, and will just be a rewriting of
results we have already obtained, but doing this will help us understand the much more useful case
of the density matrix of a subsystem A, say, coupled to another system B, such that the total system
A ⊗ B is in a well defined quantum state, but is entangled with respect to the A-B subdivision, so that neither A nor B is in a well defined state.

46 CHAPTER 5. THE DENSITY MATRIX

5.2.1 Density matrix of a system in a well defined state

Consider, then, a quantum system in a well defined quantum state, |ψ⟩. We define its density matrix ρ by the outer product
$$\rho = |\psi\rangle\langle\psi|. \tag{5.1}$$
We will understand the reason for this definition as we go along.¹ The matrix elements of ρ are
$$\langle n|\rho|m\rangle = \langle n|\psi\rangle\langle\psi|m\rangle. \tag{5.2}$$
Note that its diagonal elements are |⟨n|ψ⟩|², which are the probabilities, P_n, of a measurement finding the system in state |n⟩. Since probabilities add up to one, the trace (sum of diagonal elements) must satisfy
$$\operatorname{Tr}\rho = 1. \tag{5.3}$$
This will turn out to be a general property of a density matrix.
We shall now show that expectation values of operators in state |ψ⟩ can be expressed in terms of ρ. We showed in Eq. (3.66) that the expectation value of an operator Ô is given by
$$\langle\hat O\rangle = \langle\psi|\hat O|\psi\rangle. \tag{5.4}$$
This can be re-expressed in terms of the density matrix ρ since
$$\begin{aligned}
\langle\psi|\hat O|\psi\rangle &= \sum_m \langle\psi|\hat O|m\rangle\langle m|\psi\rangle \\
&= \sum_m \langle m|\psi\rangle\langle\psi|\hat O|m\rangle \\
&= \sum_m \langle m|\rho\,\hat O|m\rangle \\
&= \operatorname{Tr}\big(\rho\,\hat O\big),
\end{aligned} \tag{5.5}$$
where we used Eq. (5.1) and the completeness relation Σ_n |n⟩⟨n| = 1, the identity. Hence expectation values can be obtained directly from the density matrix.
I emphasize that this is a trivial example in which the density matrix is not needed, but this
discussion provides a useful starting point for the general formulation of the density matrix described
in the next subsection.
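A quick numerical check of Eqs. (5.1)–(5.5): the sketch below (NumPy assumed; the particular state and the choice Ô = Y, the Pauli operator, are arbitrary illustrations) forms ρ = |ψ⟩⟨ψ| with an outer product and compares Tr(ρÔ) with ⟨ψ|Ô|ψ⟩. A complex amplitude and a complex observable are used so that the complex conjugations actually matter.

```python
import numpy as np

# An arbitrary normalized single-qubit state 0.6|0> + 0.8i|1>.
psi = np.array([0.6, 0.8j])

# rho = |psi><psi|, Eq. (5.1); np.outer does not conjugate, so the bra is conjugated explicitly.
rho = np.outer(psi, psi.conj())

# Pauli Y as the observable.
Y = np.array([[0, -1j], [1j, 0]])

expect_direct = np.vdot(psi, Y @ psi)   # <psi|Y|psi>, Eq. (5.4); vdot conjugates its first argument
expect_trace = np.trace(rho @ Y)        # Tr(rho Y), Eq. (5.5)

print("Tr rho =", np.trace(rho).real)             # Eq. (5.3)
print("probabilities P_n =", np.diag(rho).real)   # |<n|psi>|^2
print(np.isclose(expect_direct, expect_trace))    # True
```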

5.2.2 Density matrix of a subsystem when the combined system is in a well defined state
In the rest of this chapter we consider a system composed of two subsystems A and B, such that the
combined system is in a single quantum state. In general, subsystems A and B will be entangled, so
neither subsystem is in a well defined state. We will only be interested in one of the subsystems, A
say, and would like a description in terms of just the states of A. This is where the density matrix
becomes very useful.
We will assume that subsystem A has n_A qubits, and subsystem B has n_B qubits, so the number of states of each subsystem is given by
$$N_A = 2^{n_A}, \qquad N_B = 2^{n_B}. \tag{5.6}$$

¹ In standard linear algebra notation, with c_n = ⟨n|ψ⟩, ρ_nm = c_n c_m^* and ρ_mn = c_m c_n^* = (ρ_nm)^*, so ρ is Hermitian.

As in the previous subsection, the density matrix of the whole system is given by
$$\rho^{AB} = |\psi_{AB}\rangle\langle\psi_{AB}|. \tag{5.7}$$
This is a matrix involving the states of both A and B. We shall now show that information about averages of the A degrees of freedom can be obtained, without explicitly considering the B degrees of freedom, from the density matrix ρ^A, where²
$$\rho^A = \operatorname{Tr}_B\,\rho^{AB} = \sum_{j_B=1}^{N_B} \langle j_B|\psi_{AB}\rangle\langle\psi_{AB}|j_B\rangle, \tag{5.8}$$
which is a matrix in the space of the states of A only. Here |j_B⟩ is a basis state for subsystem B. We say that we have “traced out” the B states.
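The state |ψ_AB⟩ can be expressed in terms of basis states. In the Dirac notation this has the rather cumbersome form
$$|\psi_{AB}\rangle = \sum_{i_A=1}^{N_A}\sum_{j_B=1}^{N_B} |i_A\rangle|j_B\rangle\,\langle i_A|\langle j_B|\psi_{AB}\rangle, \tag{5.9}$$
where |i_A⟩ is a basis state for subsystem A. Because this is cumbersome, I prefer to use here the standard matrix notation with indices, rather than the Dirac notation, writing Eq. (5.9) as
$$|\psi_{AB}\rangle = \sum_{i_A=1}^{N_A}\sum_{j_B=1}^{N_B} c_{i_A j_B}\,|i_A\rangle|j_B\rangle. \tag{5.10}$$
Substituting Eq. (5.10) into Eq. (5.8) gives the matrix elements of the reduced density matrix,
$$\langle i_A|\rho^A|i'_A\rangle = \sum_{j_B=1}^{N_B} c_{i_A j_B}\,c^*_{i'_A j_B}, \tag{5.11}$$
and, because |ψ_AB⟩ is normalized,
$$\operatorname{Tr}\rho^A = \sum_{i_A=1}^{N_A}\sum_{j_B=1}^{N_B} \big|c_{i_A j_B}\big|^2 = 1, \tag{5.12}$$
so the trace of the density matrix is always equal to 1. Also we see that
$$\langle i'|\rho^A|i\rangle = \langle i|\rho^A|i'\rangle^*, \tag{5.13}$$
omitting for conciseness the label A on |i_A⟩ and |i'_A⟩ when there is no ambiguity, so ρ^A is Hermitian. As discussed earlier, this condition is a general property of density matrices.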
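We want to compute the expectation value of some operator Ô_A acting only on the A degrees of freedom, i.e.
$$\begin{aligned}
\langle\hat O_A\rangle &= \langle\psi_{AB}|\hat O_A|\psi_{AB}\rangle \\
&= \sum_{i_A, i'_A}\sum_{j_B, j'_B} \langle j'_B\, i'_A|\hat O_A|i_A\, j_B\rangle\; c^*_{i'_A j'_B}\, c_{i_A j_B} \\
&= \sum_{i_A, i'_A} \langle i'_A|\hat O_A|i_A\rangle \sum_{j_B} c_{i_A j_B}\, c^*_{i'_A j_B} \\
&= \sum_{i_A, i'_A} \langle i'_A|\hat O_A|i_A\rangle\,\langle i_A|\rho^A|i'_A\rangle,
\end{aligned} \tag{5.14}$$
where the third line follows because Ô_A does not act on B, so the matrix element vanishes unless j'_B = j_B, and the last line uses Eq. (5.11). Hence
$$\langle\hat O_A\rangle = \operatorname{Tr}\big(\rho^A\,\hat O_A\big), \tag{5.15}$$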

² As stated above, in this chapter we assume that the combined AB system is in a single quantum state. If, instead, the combined system is itself described by a non-trivial density matrix ρ^AB, then the reduced density matrix for subsystem A is still given by ρ^A = Tr_B ρ^AB, but ρ^AB is no longer given by Eq. (5.7).
³ See footnote 1 (Sec. 5.2.1).

which has the same form as Eq. (5.5). Thus we can compute averages of quantities involving subsystem A from a knowledge of the density matrix ρ^A, without needing to explicitly consider subsystem B: all the information about B that is needed for such averages is contained in ρ^A. Note that ρ^A is the same no matter what quantity of system A is to be calculated, and so it only has to be calculated once.
One can equivalently trace out the degrees of freedom in A to get the density matrix for subsystem B, i.e. ρ^B = Tr_A ρ^AB, so
$$\langle j_B|\rho^B|j'_B\rangle = \sum_{i_A} c_{i_A j_B}\,c^*_{i_A j'_B}. \tag{5.16}$$
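In the index notation of Eq. (5.10), the amplitudes c_{i_A j_B} form an N_A × N_B matrix C, and Eqs. (5.11) and (5.16) then read ρ^A = C C† and ρ^B = Cᵀ C*. The sketch below uses this observation (an implementation choice, not something stated in the text) with NumPy. It assumes the combined basis state |i_A⟩|j_B⟩ has index i_A·N_B + j_B, which is the ordering `np.kron` produces when A is the left factor. The function names are hypothetical.

```python
import numpy as np

def amplitude_matrix(psi_ab, n_a, n_b):
    """Arrange the amplitudes c_{i_A j_B} of Eq. (5.10) as an N_A x N_B matrix."""
    return np.asarray(psi_ab).reshape(2 ** n_a, 2 ** n_b)

def reduced_density_matrices(psi_ab, n_a, n_b):
    """Return (rho_A, rho_B) from Eqs. (5.11) and (5.16)."""
    C = amplitude_matrix(psi_ab, n_a, n_b)
    rho_a = C @ C.conj().T      # <i|rho_A|i'> = sum_j c_{ij} c*_{i'j}
    rho_b = C.T @ C.conj()      # <j|rho_B|j'> = sum_i c_{ij} c*_{ij'}
    return rho_a, rho_b

def expectation_on_a(psi_ab, op_a, n_b):
    """<O_A> computed directly as <psi|O_A (x) 1_B|psi>, for comparison with Eq. (5.15)."""
    full_op = np.kron(op_a, np.eye(2 ** n_b))
    return np.vdot(psi_ab, full_op @ psi_ab)

# Example: a random normalized state of 1 + 2 qubits.
rng = np.random.default_rng(seed=1)
psi = rng.normal(size=8) + 1j * rng.normal(size=8)
psi /= np.linalg.norm(psi)

rho_a, rho_b = reduced_density_matrices(psi, n_a=1, n_b=2)
Z = np.diag([1.0, -1.0])

print(np.isclose(np.trace(rho_a), 1), np.isclose(np.trace(rho_b), 1))  # Eq. (5.12)
print(np.allclose(rho_a, rho_a.conj().T))                              # Hermitian, Eq. (5.13)
print(np.isclose(np.trace(rho_a @ Z), expectation_on_a(psi, Z, 2)))    # Eq. (5.15)
```

The same helper works for any split n = n_A + n_B; only the reshape changes.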

As we shall see, it is useful to diagonalize the density matrix, obtaining its eigenvalues λ_α and eigenvectors |φ_α⟩. Since the sum of the eigenvalues is equal to the trace, we have, according to Eq. (5.12),
$$\sum_\alpha \lambda_\alpha = 1, \tag{5.17}$$
which suggests that the eigenvalues can be interpreted as probabilities (since probabilities also sum to 1). We shall now see that this interpretation is correct.
Let’s consider Eq. (5.15) in the basis where ρ^A is diagonal. We have
$$\langle\hat O_A\rangle = \operatorname{Tr}\rho^A\hat O_A = \sum_\alpha \lambda_\alpha\,\langle\phi_\alpha|\hat O_A|\phi_\alpha\rangle. \tag{5.18}$$
Thus we get the expectation value of Ô_A in state |ψ_AB⟩ by (i) computing the expectation of Ô_A in state |φ_α⟩ (an eigenvector of ρ^A), (ii) multiplying by λ_α (the corresponding eigenvalue of ρ^A), and (iii) summing over α. This clearly shows that λ_α should be thought of as the probability that subsystem A is in state |φ_α⟩. To emphasize this, from now on we will denote the eigenvalues of the density matrix by p_α.
Furthermore, if we consider Eq. (5.15) in the basis |m⟩ where Ô_A is diagonal, we get
$$\langle\hat O_A\rangle = \operatorname{Tr}\rho^A\hat O_A = \sum_m \langle m|\rho^A|m\rangle\langle m|\hat O_A|m\rangle, \tag{5.19}$$
where ⟨m|Ô_A|m⟩ is an eigenvalue of Ô_A. We interpret this to mean that the probability that a measurement of Ô_A yields eigenvalue ⟨m|Ô_A|m⟩, leaving the system in state |m⟩, is the corresponding diagonal element of the density matrix, ⟨m|ρ^A|m⟩.
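Equations (5.18) and (5.19) evaluate the same trace in two different bases. The following sketch checks this numerically for the reduced density matrix of a random two-qubit state. It assumes NumPy; `np.linalg.eigh` is used because both ρ^A and the observable are Hermitian, and the observable Ô_A is an arbitrary Hermitian matrix chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Random normalized two-qubit state; rho_A from Eq. (5.11) via the amplitude matrix C.
C = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
C /= np.linalg.norm(C)              # Frobenius norm: sum |c_ij|^2 = 1
rho_a = C @ C.conj().T

# An arbitrary Hermitian observable on qubit A.
M = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
O = M + M.conj().T

# Eq. (5.18): sum_alpha p_alpha <phi_alpha|O|phi_alpha>, in the eigenbasis of rho_A.
p, phi = np.linalg.eigh(rho_a)      # columns of phi are the eigenvectors |phi_alpha>
via_rho_basis = sum(p[a] * np.vdot(phi[:, a], O @ phi[:, a]) for a in range(2))

# Eq. (5.19): sum_m <m|rho_A|m> o_m, in the eigenbasis |m> of O.
o, m_vecs = np.linalg.eigh(O)
probs = [np.vdot(m_vecs[:, k], rho_a @ m_vecs[:, k]).real for k in range(2)]
via_o_basis = sum(probs[k] * o[k] for k in range(2))

direct = np.trace(rho_a @ O)        # Eq. (5.15)
print(np.isclose(via_rho_basis, direct), np.isclose(via_o_basis, direct))
print("p_alpha:", p, "sum:", p.sum())   # non-negative, summing to 1
```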
To summarize, to determine the properties of a subsystem from the density matrix when the whole system is in a single quantum state:

1. We compute the elements of the density matrix according to Eq. (5.11).

2. The density matrix is Hermitian and so has real eigenvalues, p_i. The sum of the eigenvalues is equal to one, and the eigenvalues are interpreted as probabilities.

3. If the eigenstate corresponding to eigenvalue p_i is denoted by |u_i⟩, then the significance of the density matrix is that the subsystem can be thought of as being in state |u_i⟩ with probability p_i.

4. If measurements are made in some basis, then the probability that the measurement finds the system in state |n⟩ is the corresponding diagonal element of the density matrix, ⟨n|ρ|n⟩.

5. Expectation values of operators acting on the subsystem can be obtained from Eq. (5.15).

5.3 Determining if a state is entangled

One use of the density matrix is that it gives a systematic prescription for determining whether a state
is a product state or entangled with respect to a partition into subsystems A and B. If it is not a
product state we say that it is a mixed state and is “entangled” with respect to this partition. We
shall use the terms “mixed state” and “entangled state” interchangeably.
To see how the density matrix can determine if a state is a product state or is entangled with respect to a partition into A-B subsystems, let’s assume initially that |ψ_AB⟩ is a product state, i.e.
$$|\psi_{AB}\rangle = |\phi\rangle_A|\mu\rangle_B. \tag{5.20}$$
In this case subsystem A is definitely in state |φ⟩, so the eigenvalues of ρ^A must be p_1 = 1 and p_α = 0 for α ≠ 1. Also the eigenvector for the non-zero eigenvalue must be given by |φ_1⟩ = |φ⟩.
Hence, if the state of the combined AB system is a product state then one of the eigenvalues of
the density matrix of A (or of B) will be 1 and the others zero. Conversely if more than one of the
eigenvalues of the density matrix are positive (since they are probabilities they can only be positive or
zero) the state is mixed, i.e. entangled.
It is actually not necessary to diagonalize the density matrix to determine if the state is a product state or entangled. Instead it is sufficient to take its square. To see this, note that
$$\operatorname{Tr}\big(\rho^A\big)^2 = \sum_\alpha p_\alpha^2, \tag{5.21}$$
where we used that the trace is the sum of the eigenvalues, see Sec. 2.6, and that the eigenvalues of the square of a matrix are the squares of the eigenvalues of that matrix. Since the p_α must lie between 0 and 1 and Σ_α p_α = 1, one can show that Σ_α p_α² ≤ 1, with the equality only holding if one of the p_α is 1 and the others zero. (Each term satisfies p_α² ≤ p_α, with equality only for p_α = 0 or 1, so Σ_α p_α² ≤ Σ_α p_α = 1.) As an example, consider the case of two states, for which the eigenvalues are p and 1 − p with 0 ≤ p ≤ 1. Now
$$\operatorname{Tr}\big(\rho^A\big)^2 = \sum_{\alpha=1}^{2} p_\alpha^2 = p^2 + (1-p)^2 = 1 - 2p + 2p^2 = 1 - 2p(1-p). \tag{5.22}$$
For 0 ≤ p ≤ 1, we see that 0 ≤ 2p(1 − p) ≤ 1/2, and it is only zero for p = 0 and 1. Consequently, Tr(ρ^A)² < 1 unless p = 0 or 1.
Hence we have the following general criterion:
$$\text{if } \operatorname{Tr}\big(\rho^A\big)^2 \;\begin{cases} = 1, & \text{then we have a product state,}\\ < 1, & \text{then we have a mixed (entangled) state.}\end{cases} \tag{5.23}$$
We emphasize again that Tr ρ^A = 1 always.
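The quantity Tr(ρ^A)² in the criterion (5.23) needs only a matrix product and a trace. As a sketch, assuming NumPy (`purity` and `reduced_rho_a` are hypothetical helper names, and arranging the amplitudes as a matrix C with ρ^A = C C† is just a compact way of evaluating Eq. (5.11)), here it is applied to a product state and to a Bell state.

```python
import numpy as np

def reduced_rho_a(psi_ab, n_a, n_b):
    """rho_A of Eq. (5.11), with the amplitudes arranged as an N_A x N_B matrix."""
    C = np.asarray(psi_ab).reshape(2 ** n_a, 2 ** n_b)
    return C @ C.conj().T

def purity(rho):
    """Tr(rho^2): equal to 1 for a product state, < 1 for an entangled state, Eq. (5.23)."""
    return np.trace(rho @ rho).real

plus = np.array([1.0, 1.0]) / np.sqrt(2)
product = np.kron(plus, np.array([1.0, 0.0]))          # |+>|0>
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)     # (|00> + |11>)/sqrt(2)

for name, state in [("product", product), ("Bell", bell)]:
    p = purity(reduced_rho_a(state, 1, 1))
    print(f"{name}: Tr(rho_A^2) = {p:.3f}")
```

This prints 1.000 for the product state and 0.500 for the Bell state, in line with (5.23).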

Sometimes one defines the von Neumann entanglement entropy by
$$S(\rho^A) = -\operatorname{Tr}\rho^A\log\rho^A \;\Big(= -\sum_\alpha p_\alpha\log p_\alpha\Big). \tag{5.24}$$
It is easy to see that S(ρ^A) = 0 if the state is a product state, since then one p_α equals 1 (and 1 · log 1 = 0) while the others vanish, and lim_{x→0}(x ln x) = 0. In the opposite limit, of a maximally entangled state where p_α = 1/N_A for all α, one has S(ρ^A) = log N_A. For the case where subsystem A is a single qubit, this gives S(ρ^A) = log 2.
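The entropy (5.24) is most easily evaluated from the eigenvalues p_α, dropping the zero eigenvalues in accordance with lim_{x→0} x ln x = 0. A minimal sketch, assuming NumPy and the natural logarithm (base 2 would give the answer in bits instead); the cutoff `eps` is an illustrative choice.

```python
import numpy as np

def von_neumann_entropy(rho, eps=1e-12):
    """S(rho) = -sum_alpha p_alpha ln p_alpha, Eq. (5.24), with 0 ln 0 taken as 0."""
    p = np.linalg.eigvalsh(rho)      # real eigenvalues of the Hermitian matrix rho
    p = p[p > eps]                   # drop (numerically) zero eigenvalues
    return float(-np.sum(p * np.log(p)))

pure = np.array([[1.0, 0.0], [0.0, 0.0]])   # product state: one eigenvalue equal to 1
maximal = np.eye(2) / 2                     # rho_A of a Bell state: both eigenvalues 1/2

print(von_neumann_entropy(pure))                 # zero for a product state
print(von_neumann_entropy(maximal), np.log(2))   # both equal to ln 2
```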

5.4 Some Simple Examples

In this section we consider some simple examples where subsystems A and B each have just a single
qubit.

5.4.1 Example 1:
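We take
$$|\psi_{AB}\rangle = \frac12\big(|0_A0_B\rangle + |0_A1_B\rangle - |1_A0_B\rangle - |1_A1_B\rangle\big). \tag{5.25}$$
Note: We can see “by inspection” that this is a product state,
$$|\psi_{AB}\rangle = \frac{1}{\sqrt2}\big(|0_A\rangle - |1_A\rangle\big)\otimes\frac{1}{\sqrt2}\big(|0_B\rangle + |1_B\rangle\big). \tag{5.26}$$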

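We shall now show how this result is obtained from the density matrices ρ^A and ρ^B. We have
$$c_{00} = c_{01} = \tfrac12, \qquad c_{10} = c_{11} = -\tfrac12, \tag{5.27}$$
so, from Eq. (5.11) (the amplitudes are real, so the complex conjugation can be dropped),
$$\begin{aligned}
\rho^A_{00} &= c_{00}c_{00} + c_{01}c_{01} = \tfrac12,\\
\rho^A_{01} &= c_{00}c_{10} + c_{01}c_{11} = -\tfrac12,\\
\rho^A_{10} &= c_{10}c_{00} + c_{11}c_{01} = -\tfrac12,\\
\rho^A_{11} &= c_{10}c_{10} + c_{11}c_{11} = \tfrac12,
\end{aligned} \tag{5.28}$$
and hence
$$\rho^A = \frac12\begin{pmatrix}1 & -1\\ -1 & 1\end{pmatrix}. \tag{5.29}$$
The eigenvalues are given by
$$\begin{vmatrix}\tfrac12-\lambda & -\tfrac12\\ -\tfrac12 & \tfrac12-\lambda\end{vmatrix} = 0, \tag{5.30}$$
so
$$\left(\lambda-\tfrac12\right)^2 - \left(-\tfrac12\right)^2 = 0, \tag{5.31}$$
which gives λ = 1 and 0. Since only one eigenvalue is non-zero, this is a product state, as we saw above. One easily finds that
$$\big(\rho^A\big)^2 = \frac12\begin{pmatrix}1 & -1\\ -1 & 1\end{pmatrix}, \tag{5.32}$$
and so Tr(ρ^A)² = 1, as required for a product state.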
5.4. SOME SIMPLE EXAMPLES 51

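The eigenvector with eigenvalue λ = 1 is given by
$$\frac12\begin{pmatrix}1 & -1\\ -1 & 1\end{pmatrix}\begin{pmatrix}a\\ b\end{pmatrix} = \begin{pmatrix}a\\ b\end{pmatrix}. \tag{5.33}$$
Both the resulting equations give b = −a, so the normalized eigenvector is
$$|\phi_{1,A}\rangle = \frac{1}{\sqrt2}\big(|0_A\rangle - |1_A\rangle\big). \tag{5.34}$$
Hence, with probability 1, subsystem A is in state |φ_1⟩, in agreement with Eq. (5.26).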
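One can repeat the same calculation for ρ^B. The results are
$$\begin{aligned}
\rho^B_{00} &= c_{00}c_{00} + c_{10}c_{10} = \tfrac12,\\
\rho^B_{01} &= c_{00}c_{01} + c_{10}c_{11} = \tfrac12,\\
\rho^B_{10} &= c_{01}c_{00} + c_{11}c_{10} = \tfrac12,\\
\rho^B_{11} &= c_{01}c_{01} + c_{11}c_{11} = \tfrac12,
\end{aligned} \tag{5.35}$$
so
$$\rho^B = \frac12\begin{pmatrix}1 & 1\\ 1 & 1\end{pmatrix}. \tag{5.36}$$
The eigenvalues are given by
$$\begin{vmatrix}\tfrac12-\lambda & \tfrac12\\ \tfrac12 & \tfrac12-\lambda\end{vmatrix} = 0, \tag{5.37}$$
so
$$\left(\lambda-\tfrac12\right)^2 - \left(\tfrac12\right)^2 = 0, \tag{5.38}$$
which gives λ = 1 and 0, the same as for ρ^A. It is true in general that the non-zero eigenvalues of ρ^A and ρ^B must be equal, provided that the combined AB system is in a single quantum state. This is discussed further in the more advanced material in Appendix 5.A.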
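The eigenvector with eigenvalue λ = 1 is given by
$$\frac12\begin{pmatrix}1 & 1\\ 1 & 1\end{pmatrix}\begin{pmatrix}a\\ b\end{pmatrix} = \begin{pmatrix}a\\ b\end{pmatrix}. \tag{5.39}$$
Both the resulting equations give b = a, so the normalized eigenvector is
$$|\sigma_{1,B}\rangle = \frac{1}{\sqrt2}\big(|0_B\rangle + |1_B\rangle\big). \tag{5.40}$$
Hence, with probability 1, subsystem B is in state |σ_1⟩, again in agreement with Eq. (5.26).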
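The hand calculation of Example 1 can be reproduced in a few lines. This sketch assumes NumPy and stores the amplitudes of Eq. (5.27) as a matrix c[i_A, j_B]. Note that `np.linalg.eigh` returns eigenvalues in ascending order, and its eigenvectors are defined only up to an overall sign.

```python
import numpy as np

# Amplitudes of Eq. (5.25)/(5.27), arranged as c[i_A, j_B].
C = np.array([[0.5, 0.5],
              [-0.5, -0.5]])

rho_a = C @ C.conj().T     # Eq. (5.29): (1/2)[[1, -1], [-1, 1]]
rho_b = C.T @ C.conj()     # Eq. (5.36): (1/2)[[1, 1], [1, 1]]

for name, rho in [("rho_A", rho_a), ("rho_B", rho_b)]:
    evals, evecs = np.linalg.eigh(rho)
    top = evecs[:, np.argmax(evals)]          # eigenvector belonging to eigenvalue 1
    print(name, "eigenvalues:", np.round(evals, 12), "eigenvector for 1:", np.round(top, 4))
    print("  Tr(rho^2) =", np.trace(rho @ rho))
```

Up to sign, the printed eigenvectors are (1, −1)/√2 and (1, 1)/√2, matching Eqs. (5.34) and (5.40).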

5.4.2 Example 2:
In this example we take one of the Bell states,
$$|\psi_{AB}\rangle = \frac{1}{\sqrt2}\big(|0_A0_B\rangle + |1_A1_B\rangle\big), \tag{5.41}$$
which is clearly entangled. Here we have
$$c_{00} = c_{11} = \frac{1}{\sqrt2}, \qquad c_{10} = c_{01} = 0. \tag{5.42}$$

Hence
$$\begin{aligned}
\rho^A_{00} &= c_{00}c_{00} + c_{01}c_{01} = \tfrac12,\\
\rho^A_{01} &= c_{00}c_{10} + c_{01}c_{11} = 0,\\
\rho^A_{10} &= c_{10}c_{00} + c_{11}c_{01} = 0,\\
\rho^A_{11} &= c_{10}c_{10} + c_{11}c_{11} = \tfrac12,
\end{aligned} \tag{5.43}$$
so
$$\rho^A = \frac12\begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}. \tag{5.44}$$
This is already in diagonal form, so we read off that the two eigenvalues are both equal to 1/2. Since more than one eigenvalue is positive, the state is mixed. A reduced density matrix like this, with all eigenvalues equal, corresponds to a maximally entangled state. It is easy to see that the same eigenvalues are obtained from ρ^B. Trivially
$$\operatorname{Tr}\big(\rho^A\big)^2 = \operatorname{Tr}\,\frac14\begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix} = \frac12 \;(<1), \tag{5.45}$$
which indicates, again, that Eq. (5.41) is a mixed state.

5.4.3 Example 3:
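This example is slightly more complicated, but it is useful to go through it in detail. We take
$$|\psi_{AB}\rangle = \frac{1}{\sqrt8}\Big(|0_A0_B\rangle + \sqrt3\,|0_A1_B\rangle - \sqrt3\,|1_A0_B\rangle - |1_A1_B\rangle\Big), \tag{5.46}$$
so
$$c_{00} = \frac{1}{\sqrt8}, \quad c_{01} = \sqrt{\tfrac38}, \quad c_{10} = -\sqrt{\tfrac38}, \quad c_{11} = -\frac{1}{\sqrt8}. \tag{5.47}$$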
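It follows that
$$\begin{aligned}
\rho^A_{00} &= c_{00}c_{00} + c_{01}c_{01} = \tfrac12,\\
\rho^A_{01} &= c_{00}c_{10} + c_{01}c_{11} = -\tfrac{\sqrt3}{4},\\
\rho^A_{10} &= c_{10}c_{00} + c_{11}c_{01} = -\tfrac{\sqrt3}{4},\\
\rho^A_{11} &= c_{10}c_{10} + c_{11}c_{11} = \tfrac12,
\end{aligned} \tag{5.48}$$
so
$$\rho^A = \frac14\begin{pmatrix}2 & -\sqrt3\\ -\sqrt3 & 2\end{pmatrix}. \tag{5.49}$$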
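The eigenvalues are found to be
$$p_1 = \frac14\big(2+\sqrt3\big), \qquad p_2 = \frac14\big(2-\sqrt3\big), \tag{5.50}$$
with corresponding eigenvectors
$$\begin{aligned}
|\phi_{1,A}\rangle &= \frac{1}{\sqrt2}\big(|0_A\rangle - |1_A\rangle\big),\\
|\phi_{2,A}\rangle &= \frac{1}{\sqrt2}\big(|0_A\rangle + |1_A\rangle\big).
\end{aligned} \tag{5.51}$$
Thus subsystem A can be regarded as being in state |φ_1⟩ with probability p_1 and in state |φ_2⟩ with probability p_2.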

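It is straightforward to show that
$$\big(\rho^A\big)^2 = \frac{1}{16}\begin{pmatrix}7 & -4\sqrt3\\ -4\sqrt3 & 7\end{pmatrix}, \tag{5.52}$$
and so
$$\operatorname{Tr}\big(\rho^A\big)^2 = \frac78 \;(<1), \tag{5.53}$$
in agreement with Eq. (5.46) being a mixed state.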
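Repeating the same arguments for ρ^B gives
$$\begin{aligned}
\rho^B_{00} &= c_{00}c_{00} + c_{10}c_{10} = \tfrac12,\\
\rho^B_{01} &= c_{00}c_{01} + c_{10}c_{11} = \tfrac{\sqrt3}{4},\\
\rho^B_{10} &= c_{01}c_{00} + c_{11}c_{10} = \tfrac{\sqrt3}{4},\\
\rho^B_{11} &= c_{01}c_{01} + c_{11}c_{11} = \tfrac12,
\end{aligned} \tag{5.54}$$
so
$$\rho^B = \frac14\begin{pmatrix}2 & \sqrt3\\ \sqrt3 & 2\end{pmatrix}. \tag{5.55}$$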
The eigenvalues are again given by Eq. (5.50), and the corresponding eigenvectors are
$$\begin{aligned}
|\sigma_{1,B}\rangle &= \frac{1}{\sqrt2}\big(|0_B\rangle + |1_B\rangle\big),\\
|\sigma_{2,B}\rangle &= \frac{1}{\sqrt2}\big(-|0_B\rangle + |1_B\rangle\big).
\end{aligned} \tag{5.56}$$
Subsystem B can therefore be regarded as being in state |σ_1⟩ with probability p_1 and in state |σ_2⟩ with probability p_2.
It is interesting to note that if we define
$$c_1 = \frac12\sqrt{2+\sqrt3}, \qquad c_2 = \frac12\sqrt{2-\sqrt3}, \tag{5.57}$$
so that
$$p_1 = c_1^2, \qquad p_2 = c_2^2, \tag{5.58}$$
then a bit of algebra⁴ shows that
$$|\psi_{AB}\rangle = c_1\,|\phi_{1,A}\rangle\otimes|\sigma_{1,B}\rangle + c_2\,|\phi_{2,A}\rangle\otimes|\sigma_{2,B}\rangle. \tag{5.59}$$
This is an example of a Schmidt decomposition, which is described in the more advanced material in Appendix 5.A. The coefficients c_1 and c_2 are known as Schmidt coefficients.
According to Eq. (5.59) we can decompose |ψ_AB⟩ in the following way: with probability p_1 = c_1², subsystem A is in state |φ_{1,A}⟩ and subsystem B is in state |σ_{1,B}⟩, and with probability p_2 = c_2² (= 1 − p_1), subsystem A is in state |φ_{2,A}⟩ and subsystem B is in state |σ_{2,B}⟩. In this way one can see why the non-zero eigenvalues of the two subsystem density matrices ρ^A and ρ^B must be equal: for both matrices the eigenvalues are given by c_1² and c_2².
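Example 3 can also be checked numerically. The sketch below (NumPy assumed) recomputes ρ^A, its eigenvalues (5.50) and Tr(ρ^A)² from Eq. (5.53). It then obtains the Schmidt coefficients of Eq. (5.59) from a singular value decomposition of the amplitude matrix c_{i_A j_B}. Using the SVD is our choice of method (a standard way to compute a Schmidt decomposition), not the route taken in the text, and the signs of the singular vectors returned by `np.linalg.svd` may differ from those in Eqs. (5.51) and (5.56).

```python
import numpy as np

s3 = np.sqrt(3)
# Amplitudes of Eq. (5.47) arranged as c[i_A, j_B].
C = np.array([[1, s3],
              [-s3, -1]]) / np.sqrt(8)

rho_a = C @ C.T                      # the amplitudes are real, so C.conj() == C
p = np.linalg.eigvalsh(rho_a)[::-1]  # descending order: p1, p2
print("p1, p2 =", p, "expected", (2 + s3) / 4, (2 - s3) / 4)
print("Tr(rho_A^2) =", np.trace(rho_a @ rho_a), "expected", 7 / 8)

# Schmidt decomposition via the SVD C = U diag(s) Vh, i.e.
# |psi> = sum_k s_k |u_k>_A (x) |v_k>_B with u_k = U[:, k] and v_k = Vh[k, :].
U, s, Vh = np.linalg.svd(C)
print("Schmidt coefficients:", s, "expected", np.sqrt(2 + s3) / 2, np.sqrt(2 - s3) / 2)

# Rebuild the state from the decomposition and compare with Eq. (5.46).
rebuilt = sum(s[k] * np.kron(U[:, k], Vh[k, :]) for k in range(2))
print(np.allclose(rebuilt, C.reshape(-1)))   # True
```

The singular values are the square roots of the eigenvalues of C Cᵀ = ρ^A, which is why they reproduce c_1 and c_2.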
⁴ Note that √(2 + √3) − √(2 − √3) = √2 and √(2 + √3) + √(2 − √3) = √6, which are proved by squaring both sides.

5.5 Systems not in a single quantum state

An additional application for the density matrix is for systems which are not described by a single
quantum state. An example would be to characterize the behavior of a stream of particles (electrons,
say) which are polarized in different directions. We need to average over the different spin orientations
using standard classical statistics.
Suppose, for example, that a fraction p of the electrons are polarized in the +z direction, i.e. are in state |0⟩, and a fraction 1 − p are polarized in the −z direction, i.e. are in state |1⟩. The density matrix for particles in state |0⟩ is
$$|0\rangle\langle0| = \begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}, \tag{5.60}$$
and that for state |1⟩ is
$$|1\rangle\langle1| = \begin{pmatrix}0 & 0\\ 0 & 1\end{pmatrix}. \tag{5.61}$$
The density matrix of the stream of electrons is therefore
$$\rho = p\,|0\rangle\langle0| + (1-p)\,|1\rangle\langle1| = \begin{pmatrix}p & 0\\ 0 & 1-p\end{pmatrix}. \tag{5.62}$$
For a less trivial example, consider the case that a fraction p of the electrons are in state |0⟩ (polarized in the +z direction) while a fraction 1 − p are in state |+⟩ = (|0⟩ + |1⟩)/√2 (polarized in the +x direction). The density matrix can then be conveniently written as
$$\rho = p\,|0\rangle\langle0| + (1-p)\,|+\rangle\langle+|. \tag{5.63}$$
Note that states |0⟩ and |+⟩ are not orthogonal. If we rewrite Eq. (5.63) in terms of orthogonal states (for example, computational basis states), it becomes more complicated. To do this we note that
$$|+\rangle\langle+| = \frac12\begin{pmatrix}1 & 1\\ 1 & 1\end{pmatrix}, \tag{5.64}$$
so the density matrix for the beam of electrons can be written in the computational basis as
$$\rho = \begin{pmatrix}(1+p)/2 & (1-p)/2\\ (1-p)/2 & (1-p)/2\end{pmatrix}. \tag{5.65}$$
The eigenvalues of ρ are
$$\lambda_\pm = \frac12\Big[1 \pm \sqrt{1-2p+2p^2}\,\Big], \tag{5.66}$$
while the eigenvectors are
$$|\psi_\pm\rangle = C_\pm\begin{pmatrix}\dfrac{p \pm \sqrt{1-2p+2p^2}}{1-p}\\[2mm] 1\end{pmatrix}, \tag{5.67}$$
where the C_± are normalization factors which are sufficiently messy that I prefer not to write them down.
Here we have given a description of the density matrix in terms of non-orthogonal states.⁵ Note that the factors p and (1 − p) in Eq. (5.63) are not the eigenvalues, which are given by Eq. (5.66). The weights in a decomposition of ρ into orthogonal states are its eigenvalues; for non-orthogonal states, as here, they generally are not.
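The difference between the mixing weights p, 1 − p and the eigenvalues (5.66) is easy to see numerically. This sketch assumes NumPy; the value p = 0.3 is an arbitrary choice. It builds ρ from Eq. (5.63), checks it against Eq. (5.65), and compares its eigenvalues with Eq. (5.66).

```python
import numpy as np

p = 0.3
ket0 = np.array([1.0, 0.0])
plus = np.array([1.0, 1.0]) / np.sqrt(2)

# Eq. (5.63): a mixture of two non-orthogonal pure states.
rho = p * np.outer(ket0, ket0) + (1 - p) * np.outer(plus, plus)

# Eq. (5.65), written out in the computational basis.
rho_565 = np.array([[(1 + p) / 2, (1 - p) / 2],
                    [(1 - p) / 2, (1 - p) / 2]])
print(np.allclose(rho, rho_565))            # True

# Eq. (5.66) versus a numerical diagonalization.
root = np.sqrt(1 - 2 * p + 2 * p ** 2)
lam_formula = np.array([(1 - root) / 2, (1 + root) / 2])
lam_numeric = np.linalg.eigvalsh(rho)       # ascending order
print(lam_numeric, lam_formula)             # neither equals the weights 0.3 and 0.7
```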
⁵ The statistical properties of measurements on a system are completely determined by its density matrix. However, this example shows that the interpretation of the density matrix in terms of the system being in different states with various probabilities is not unique if one allows for non-orthogonal states. Sometimes, as in this example, it may be simpler to use non-orthogonal states.

5.6 Conclusions
We have seen that the density matrix is useful when studying the properties of a system composed of
two subsystems A and B. More precisely, it can be used to:

• Determine the properties of one of the subsystems A without explicitly having to include the
degrees of freedom of the other subsystem B. This is particularly useful if B contains a very
large number of degrees of freedom. An example of a large “subsystem” is the environment, with
which, unfortunately, the qubits of a quantum computer unavoidably interact.

• If the combined AB system is in a single state, the properties of the subsystem density matrices
tell us whether that state is a product state with respect to the A-B partition or whether, on
the other hand, it is a mixed state in which the two subsystems are entangled.

Problems
5.1. 1. Compute the reduced density matrix for qubit A for the 2-qubit state
$$\frac{1}{\sqrt2}\big(|0_A1_B\rangle - |1_A0_B\rangle\big). \tag{5.68}$$
How can you deduce from this density matrix that the state is entangled?
2. Compute the reduced density matrix for the left-hand qubit in the state in Eq. (3.105). How can you deduce from this density matrix that the state is a product state?

5.2. Find the reduced density matrices for each subsystem for the state
$$|\psi_{AB}\rangle = \frac{1}{2\sqrt2}\Big(|0_A0_B\rangle + \sqrt3\,|0_A1_B\rangle + \sqrt3\,|1_A0_B\rangle + |1_A1_B\rangle\Big). \tag{5.69}$$

Determine the eigenvalues of each density matrix, ρ^A and ρ^B, and hence deduce if the state is separable or entangled.
Note: There is a general theorem which states that if a system in a single state is decomposed into subsystems A and B, then ρ^A has the same non-zero eigenvalues as ρ^B.

5.3. Using your result for ρ^A, the density matrix of subsystem A, in question 5.2, compute ⟨X_A⟩ and ⟨Z_A⟩, where the average is for state |ψ_AB⟩.

Appendices

5.A Schmidt Decomposition

(This is more advanced material which is not required for the course.)
It can be shown [NC00] that a state |ψ_AB⟩ can be written as
$$|\psi_{AB}\rangle = \sum_\alpha c_\alpha\,|\phi_{\alpha,A}\rangle\otimes|\sigma_{\alpha,B}\rangle, \tag{5.70}$$
where the number of terms is less than or equal to the smaller of N_A and N_B, and the |φ_α⟩ are mutually orthogonal, as are the |σ_α⟩. The c_α are real, non-negative numbers called Schmidt coefficients. It is always possible to make the c_α real and non-negative because the phases of |φ_{α,A}⟩ and |σ_{α,B}⟩ can be chosen independently. The sum in Eq. (5.70) is known as a Schmidt decomposition. An example of a Schmidt decomposition is shown in Eq. (5.59).
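Using the definition of ρ^A given in Eq. (5.8), and working in the |φ_α⟩ basis for the states of A and the |σ_α⟩ basis for the states of B, one has
$$\rho^A = \operatorname{Tr}_B\,|\psi_{AB}\rangle\langle\psi_{AB}| = \sum_\alpha c_\alpha^2\,|\phi_{\alpha,A}\rangle\langle\phi_{\alpha,A}|, \tag{5.71}$$
so ρ^A has non-zero eigenvalues p_α = c_α² with eigenvectors |φ_α⟩_A. Similarly,
$$\rho^B = \operatorname{Tr}_A\,|\psi_{AB}\rangle\langle\psi_{AB}| = \sum_\alpha c_\alpha^2\,|\sigma_{\alpha,B}\rangle\langle\sigma_{\alpha,B}|, \tag{5.72}$$
which shows that ρ^B has non-zero eigenvalues p_α = c_α² (the same as for ρ^A) with corresponding eigenvectors |σ_α⟩_B. The number of non-zero Schmidt coefficients (the c_α) is called the Schmidt number (or Schmidt rank). If the Schmidt number is 1 the state is a product state, while if it is greater than 1, the state is entangled (mixed).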

5.B Change in the density matrix under a unitary transformation

If qubit A (more generally, subsystem A) is acted on by a unitary transformation U^A, then, as we now show, the density matrix for subsystem A changes from ρ^A to ρ'^A, where
$$\rho'^A = U^A\,\rho^A\,U^{A\dagger}. \tag{5.73}$$

To see this, note that |ψ_AB⟩ in Eq. (5.10) goes to |ψ'_AB⟩, where
$$|\psi'_{AB}\rangle = \sum_{i,j} c'_{ij}\,|i_A\rangle\otimes|j_B\rangle, \tag{5.74}$$
in which
$$c'_{ij} = \sum_k U^A_{ik}\,c_{kj} \tag{5.75}$$
describes the change in amplitudes produced by the action of U^A. Note that the second index j on the amplitude c_ij refers to subsystem B and is not changed. Hence
$$\begin{aligned}
\rho'^A_{i,i'} &= \sum_j c'_{ij}\,c'^{*}_{i'j} \\
&= \sum_{j,k_1,k_2} U^A_{ik_1}\,c_{k_1 j}\,c^*_{k_2 j}\,U^{A*}_{i'k_2} \\
&= \sum_{k_1,k_2} U^A_{ik_1}\Big(\sum_j c_{k_1 j}\,c^*_{k_2 j}\Big)U^{A*}_{i'k_2} \\
&= \sum_{k_1,k_2} U^A_{ik_1}\,\rho^A_{k_1,k_2}\,\big(U^{A\dagger}\big)_{k_2 i'} \\
&= \big(U^A\rho^A U^{A\dagger}\big)_{i,i'},
\end{aligned} \tag{5.76}$$

so we obtain Eq. (5.73).

Note that the most general operation that can be applied to the combined AB system is a unitary
transformation acting on the whole system, not just on subsystem A. One can show that if one
performs such a general unitary operation on the combined system, and then recomputes the density
matrix of subsystem A, the new density matrix is not in general related to the old one by a unitary
transformation. This is how irreversible processes can occur in a subsystem which is coupled to the
environment. A more detailed discussion of this is beyond the scope of the course but the interested
student is referred to Nielsen and Chuang [NC00] and Rieffel and Polak [RP14].
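Both statements can be illustrated numerically. In the sketch below, NumPy is assumed, and the random seed, the Hadamard as the local unitary, and CNOT as the joint unitary are illustrative choices. A unitary acting on A alone changes ρ^A exactly as in Eq. (5.73), whereas CNOT acting on both qubits turns a pure ρ^A into a mixed one. Conjugation by a unitary preserves the eigenvalues, and hence Tr(ρ^A)², so no unitary acting on A alone could produce that change.

```python
import numpy as np

def rho_a_of(psi):
    """Reduced density matrix of the first qubit of a two-qubit state, Eq. (5.11)."""
    C = psi.reshape(2, 2)
    return C @ C.conj().T

def purity(rho):
    """Tr(rho^2), which is unchanged by rho -> U rho U^dagger."""
    return np.trace(rho @ rho).real

rng = np.random.default_rng(seed=3)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)

# A unitary acting on A only, U_A (x) 1_B, here the Hadamard.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
psi_local = np.kron(H, np.eye(2)) @ psi
print(np.allclose(rho_a_of(psi_local), H @ rho_a_of(psi) @ H.conj().T))   # Eq. (5.73): True

# A unitary acting on A and B together: CNOT with A as the control.
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])
start = np.kron(np.array([1, 1]) / np.sqrt(2), np.array([1, 0]))   # |+>|0>
after = CNOT @ start                                                  # (|00> + |11>)/sqrt(2)
print(purity(rho_a_of(start)), purity(rho_a_of(after)))              # 1 becomes 1/2
```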
Chapter 6

Einstein-Podolsky-Rosen (EPR), Bell’s inequalities, and Local Realism

6.1 Introduction
In classical physics, objects have definite properties irrespective of whether we measure them or not.
This is called objective reality. A measurement just reveals a property which already existed.
However, this is not the case in quantum mechanics. To see this, suppose that a qubit is initially in the state
$$|\psi\rangle = \frac{1}{\sqrt2}\big(|0\rangle + |1\rangle\big). \tag{6.1}$$
If we measure the qubit (i.e. measure Z), the Born rule states that we get |0⟩ (i.e. eigenvalue +1) with probability 1/2 and |1⟩ (i.e. eigenvalue −1) with probability 1/2. However, we cannot infer from this that, before the measurement, the qubit was in state |0⟩ with probability 1/2 and |1⟩ with probability 1/2, for this leads to a contradiction, as we will now see.
If we apply the Hadamard operator,
$$H = \frac{1}{\sqrt2}\begin{pmatrix}1 & 1\\ 1 & -1\end{pmatrix}, \tag{6.2}$$
to |ψ⟩ we get
$$H|\psi\rangle = |0\rangle. \tag{6.3}$$
Hence, according to the Born rule, if we measure a qubit in state H|ψ⟩, i.e. after applying the Hadamard, we get |0⟩ with probability 1.
Suppose, however, that before the measurement the qubit in state |ψ⟩ were really in state |0⟩ with probability 1/2 and in state |1⟩ with probability 1/2. Then the action of H would produce either (|0⟩ + |1⟩)/√2 or (|0⟩ − |1⟩)/√2, again with equal probability, and so, in either case, a measurement of the qubit would give |0⟩ or |1⟩ with equal probability. This contradicts Eq. (6.3), which states that the measurement gives |0⟩ with probability 1. Hence we cannot assume that state |ψ⟩ in Eq. (6.1) corresponds to its being in |0⟩ with probability 1/2 and |1⟩ with probability 1/2 before the measurement, even though this is the result of the measurement. In other words, the description of the world provided by quantum mechanics does not have objective reality.
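In the language of Chapter 5, the two pictures correspond to different density matrices: the pure state ρ = |ψ⟩⟨ψ| of Eq. (6.1), and the mixture ½|0⟩⟨0| + ½|1⟩⟨1| of Sec. 5.5. They give identical Z statistics but differ after the Hadamard, as this short sketch shows. It assumes NumPy and applies H through ρ → HρH†, the rule of Eq. (5.73); `prob_zero` is a hypothetical helper name.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)     # Eq. (6.2)
psi = np.array([1, 1]) / np.sqrt(2)               # Eq. (6.1)

rho_pure = np.outer(psi, psi.conj())              # |psi><psi|
rho_mixture = 0.5 * np.diag([1.0, 0.0]) + 0.5 * np.diag([0.0, 1.0])   # |0> or |1>, 50/50

def prob_zero(rho):
    """Born-rule probability of finding |0>, the diagonal element <0|rho|0>."""
    return rho[0, 0].real

for name, rho in [("pure state", rho_pure), ("mixture", rho_mixture)]:
    before = prob_zero(rho)
    after = prob_zero(H @ rho @ H.conj().T)       # rho -> H rho H^dagger
    print(f"{name}: P(0) before H = {before:.2f}, after H = {after:.2f}")
```

The pure state gives P(0) = 1.00 after H, as in Eq. (6.3), while the mixture stays at 0.50.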
One person who did not like that quantum mechanics describes a world without objective reality (and that quantum mechanics involves probabilities at a fundamental level) was Albert Einstein.¹
¹ He reputedly claimed to Niels Bohr that “God does not play dice with the universe”. Bohr’s reported reply, which may be apocryphal, was “Albert, you shouldn’t tell God what to do”.


In 1935 he wrote a famous paper with Podolsky and Rosen (now called EPR), in which they simply
asserted that nature has the property of objective reality. According to this picture of the world, the
reason that, in general, measurements do not give a definite answer but give different results with
various probabilities, is that quantum mechanics, as we have it, is incomplete. Rather, there is a
deeper level of structure, which we don’t have access to at present, with extra, hidden, variables, such
that if we could access those variables, the measurement would be deterministic and would just reveal
the state of the system which existed previously, i.e. we would have objective reality. The fact that
measurements on a quantum state do not give a unique result is, in this picture, because the hidden
variables have different values when the different measurements are done.
The classical, EPR picture is called local realism:

1. Realism. The measured values of each particle are objectively real. They have definite values
before measurement and irrespective of whether or not a measurement is made.

2. Locality. A measurement of A does not affect B instantaneously. More precisely, the measurement of A has no effect on B if A and B are spatially separated such that $|\vec r_A - \vec r_B| > ct$, where t is the time between the measurements and c is the speed of light. This is just special relativity, one of Einstein’s greatest insights.
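As a small illustration of the locality condition, the check $|\vec r_A - \vec r_B| > ct$ can be written as shown below. This is our own sketch. The value of c is the exact SI speed of light, and the 400 m separation is only an example, chosen to match the Zeilinger experiment mentioned in Appendix 6.A.

```python
C = 299_792_458.0  # speed of light in m/s (exact SI value)


def causally_disconnected(separation_m, time_between_s):
    """True if |r_A - r_B| > c t, so no light-speed signal can link the two measurements."""
    return separation_m > C * abs(time_between_s)


light_time = 400 / C  # time for light to cross 400 m, about 1.33 microseconds
print(f"light travel time over 400 m: {light_time * 1e6:.2f} microseconds")
print(causally_disconnected(400, 1e-6))  # True: 1 microsecond is too short for a light signal
print(causally_disconnected(400, 2e-6))  # False: a light signal could arrive within 2 microseconds
```

The comparison is strict, matching the inequality in the text. The function name is hypothetical.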

6.2 An EPR Experiment

In this chapter we will describe an experiment in which quantum mechanics gives different results
from any local realistic theory. Such experiments have been done and found to be in agreement with
quantum mechanics and in disagreement with local realism.
EPR examined a thought experiment with entangled particles. We shall consider a simpler version of the EPR thought experiment due to Bohm. For this experiment we will derive a condition (an inequality) that any theory with local realism must satisfy, but that is violated by quantum mechanics. This is one of many inequalities of a similar nature, the first of which was discovered by John Bell. Hence they are known quite generally as Bell’s inequalities.
We suppose that an experimenter prepares pairs of 2-state particles (qubits) in the following entangled Bell state:

$$|\psi\rangle = \frac{1}{\sqrt{2}}\left(|01\rangle - |10\rangle\right). \qquad (6.4)$$

In experiments the qubits will be photons. He sends one particle of the pair to Alice and the other, in the opposite direction, to Bob, see Fig. 6.1. He then repeats this for many pairs. Suppose that Alice and Bob measure the particles in the computational (Z) basis. If Alice measures |0⟩ (for which the eigenvalue of Z is +1), then Bob must measure the opposite, i.e. |1⟩ (for which the eigenvalue of Z is −1).
Now consider a general basis. As discussed in Chapter 4, a general qubit state $|0_{\hat n}\rangle$ is characterized by two parameters, θ and φ. These are the polar and azimuthal angles of a point in direction n̂ on the unit sphere, known in this context as the Bloch sphere, see Fig. 4.1. The state at the antipodal point on the sphere is denoted by $|1_{\hat n}\rangle$. The connection between $|0_{\hat n}\rangle$ and $|1_{\hat n}\rangle$ and the basis states of the computational basis, |0⟩ and |1⟩, is given by Eqs. (4.12).
It is shown in Eq. (6.36) of Appendix 6.C that Eq. (6.4) can equivalently be written as

$$|\psi\rangle = \frac{1}{\sqrt{2}}\left(|0_{\hat n} 1_{\hat n}\rangle - |1_{\hat n} 0_{\hat n}\rangle\right), \qquad (6.5)$$

ignoring an overall phase, for any direction n̂. Hence the state in Eq. (6.4) has the interesting property that Alice and Bob will always get opposite results as long as they measure in the same basis², no matter what that basis is.

Figure 6.1: Sketch of the experimental setup for the version of the EPR experiment discussed in the text. The source, located between Alice and Bob, emits pairs of qubits (in practice photons) in the state $|\psi\rangle = \frac{1}{\sqrt{2}}(|01\rangle - |10\rangle)$ given in Eq. (6.4). For each pair, Alice and Bob decide independently and randomly which of the three non-orthogonal directions, $\vec a$, $\vec b$ or $\vec c$, to measure along. The result in each case is +1 or −1. The double lines indicate that the result of the measurement is a classical bit.
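The basis independence of Eq. (6.5) can also be checked numerically. The minimal pure-Python sketch below assumes the parametrization of $|0_{\hat n}\rangle$ and $|1_{\hat n}\rangle$ in Eqs. (6.30a,b) of Appendix 6.C. It stores the 2-qubit state as a dictionary from (Alice bit, Bob bit) to amplitude, and the function names are hypothetical. For random directions n̂, the probability that Alice and Bob get the same result comes out as zero.

```python
import cmath
import math
import random


def basis_n(theta, phi):
    """|0_n> and |1_n> of Eqs. (6.30a,b), as [amp of |0>, amp of |1>]."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    e = cmath.exp(1j * phi)
    return [c, e * s], [-s, e * c]


def amplitude(u, v, state):
    """<u|_1 <v|_2 |state> for a state stored as {(i, j): amplitude of |ij>}."""
    return sum(u[i].conjugate() * v[j].conjugate() * amp for (i, j), amp in state.items())


# Singlet |psi> = (|01> - |10>)/sqrt 2, Eq. (6.4).
singlet = {(0, 1): 1 / math.sqrt(2), (1, 0): -1 / math.sqrt(2)}

rng = random.Random(1)
max_same = 0.0
for _ in range(1000):
    theta, phi = rng.uniform(0, math.pi), rng.uniform(0, 2 * math.pi)
    zero_n, one_n = basis_n(theta, phi)
    # Probability that both qubits give +1, or both give -1, along n.
    p_same = (abs(amplitude(zero_n, zero_n, singlet)) ** 2
              + abs(amplitude(one_n, one_n, singlet)) ** 2)
    max_same = max(max_same, p_same)

print(f"largest P(same result) over 1000 random directions: {max_same:.1e}")
```

Up to rounding error, the printed value is zero, which is consistent with Eq. (6.5).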
The results of the measurements of Alice and Bob are therefore strongly correlated. Of course, one can also have correlations between experimental results in classical systems. However, we will show below that the quantum correlations in entangled states like that in Eq. (6.5) are different from classical correlations.
In the experiment that we will consider, Alice and Bob each choose to measure in one of three³ distinct, non-orthogonal directions $\vec a$, $\vec b$ and $\vec c$. Every time they receive a particle, they separately choose one of these three directions at random and record whether they get +1 or −1.
The timing of the measurements is important. They must be done in a causally disconnected manner, so that information about the direction that Alice, for example, has chosen cannot have reached Bob when he makes his measurement, and vice versa.
The setup is sketched in Fig. 6.1.

6.3 Bell’s Inequality

If Alice and Bob choose the same direction, we know that they will get opposite results. Next we consider in some detail what happens when Alice and Bob do not choose the same direction.
First let us see what happens in a classical picture with objective reality.
The qubits then have a well-defined state prior to the measurement. The reason that we don’t always get the same result for measurements along a given direction must be that the qubit pairs are not all emitted in the same state each time. Rather, each possible set of measurement results corresponds to a particular type of initial state. There are three directions, and for each of them Alice and Bob get one of two possible results. Let’s first consider the results that Alice might get. For each of the three directions she gets one of two possible results, ±1. With objective reality, the result of the measurement is pre-ordained before the measurement takes place and depends only on the state of the photon. Furthermore, even though only one measurement direction is used for each photon, assuming objective reality it makes sense to talk about the results that Alice would have got if she had measured in one of the other directions. For example, there are photons for which Alice would find +1, +1, +1 in the three directions. Let us call these photons type 1. Since there are 2³ = 8 possible results for the three directions, there are eight possible types of photon, as far as Alice is concerned.
² Note: when we say “measure the qubit in the n̂ basis” we mean measure $\vec\sigma\cdot\hat n$, where, as discussed in Sec. 2.4, the $\sigma_\alpha$ (α = x, y, z) are just another notation for the Pauli operators X, Y and Z.
³ In the simpler setup of two directions, one finds that there is no incompatibility between quantum mechanics and local realism. Three directions is the minimum needed to derive an inequality that is violated by quantum mechanics.

                 Alice           Bob
Population     a   b   c       a   b   c
N1             +   +   +       −   −   −
N2             +   +   −       −   −   +
N3             +   −   +       −   +   −
N4             +   −   −       −   +   +
N5             −   +   +       +   −   −
N6             −   +   −       +   −   +
N7             −   −   +       +   +   −
N8             −   −   −       +   +   +

Table 6.1: The eight types of qubit pairs give different results when measured along the $\vec a$, $\vec b$ and $\vec c$ directions. Note that Alice and Bob get opposite results if they measure in the same direction, so Bob’s side of the table is precisely the opposite of Alice’s. Hence there are 2³ = 8 possible sets of outcomes.

Now we incorporate Bob’s results with those of Alice. Assume that Bob’s results are not affected by the measurement direction chosen by Alice, which is the case if the measurements are done in a causally disconnected manner. Then Bob’s results are also determined only by the state of his photon when emitted by the source. In this case, if he and Alice measure in the same direction, we know that they must get opposite results⁴. For example, if Alice receives a photon that would give +1, +1, +1 in the three directions, then Bob’s photon would give −1, −1, −1. Hence, including both Alice’s and Bob’s results, there are still only eight possible types of photon pair that we need to consider, and these are shown in Table 6.1. For the i-th type, $N_i$ pairs will be generated, where

$$N = \sum_{i=1}^{8} N_i \qquad (6.6)$$

is the total number of pairs.

Let us next discuss some examples taken from Table 6.1. For a qubit pair in population 4, Alice will get +1 if she measures in direction $\vec a$, and Bob will get +1 if he measures in direction $\vec b$. Similarly, for population 7, Alice will get −1 if she measures in direction $\vec a$ and Bob will get +1 if he measures in direction $\vec b$. In all cases, if Alice and Bob measure in the same direction they get opposite results.
We now make some simple observations. (Each observation is simple, but one needs to focus to follow the thread of the argument to the end.) Clearly $N_i \ge 0$, so it must be true that

$$\frac{N_3 + N_4}{N} \le \frac{N_2 + N_4}{N} + \frac{N_3 + N_7}{N}, \qquad (6.7)$$

since $N_2$ and $N_7$, which cannot be negative, have been added on the RHS.

• $(N_3, N_4)$. Let’s suppose that Alice measures along $\vec a$ and Bob along $\vec b$. The appropriate columns of Table 6.1 are collected in Table 6.2 for clarity. According to Table 6.2, only for populations 3 and 4 would Alice and Bob both get +1. None of the other populations give this. Hence, among the times that Alice measures along $\vec a$ and Bob along $\vec b$, the probability that they both get +1 is $(N_3 + N_4)/N$. Let’s call this $P(+\vec a; +\vec b)$, in which the first argument refers to Alice and the second to Bob, i.e.

$$\frac{N_3 + N_4}{N} = P(+\vec a; +\vec b). \qquad (6.8)$$

                Alice   Bob
Population        a      b
N1                +      −
N2                +      −
N3                +      +
N4                +      +
N5                −      −
N6                −      −
N7                −      +
N8                −      +

Table 6.2: The columns of Table 6.1 for the case when Alice measures along $\vec a$ and Bob along $\vec b$.

⁴ The state |ψ⟩ in Eq. (6.4) is known as a “spin singlet” state in the physics literature and has zero total spin angular momentum. Assume that the initial state of the source, before the qubits are emitted, has zero angular momentum. Then conservation of angular momentum requires that the qubits be emitted in state |ψ⟩, and therefore that Alice and Bob must get opposite results if they measure in the same direction.
• $(N_2, N_4)$. Following similar arguments, only for populations 2 and 4 would Alice get +1 measuring along $\vec a$ and Bob get +1 measuring along $\vec c$. Hence

$$\frac{N_2 + N_4}{N} = P(+\vec a; +\vec c). \qquad (6.9)$$

• $(N_3, N_7)$. Similarly, only for populations 3 and 7 would Alice get +1 measuring along $\vec c$ and Bob get +1 measuring along $\vec b$. Hence

$$\frac{N_3 + N_7}{N} = P(+\vec c; +\vec b). \qquad (6.10)$$

Combining Eqs. (6.7)–(6.10), we have⁵

$$P(+\vec a; +\vec b) \le P(+\vec a; +\vec c) + P(+\vec c; +\vec b). \qquad (6.11)$$

In the simple case that all the populations are equal, each probability is 1/4, so the inequality is trivially satisfied. Equation (6.11) is an example of a Bell’s inequality. It is satisfied by any theory with local realism. Note that there is nothing sophisticated about this Bell’s inequality: it is just bookkeeping. I emphasize that Eq. (6.11) has nothing to do with quantum mechanics. In fact, we will now see that it is violated by quantum mechanics for a broad range of measurement directions $\vec a$, $\vec b$, $\vec c$.
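Because Eq. (6.11) is pure bookkeeping, it can be checked by brute force. The sketch below is our own illustration, and the names in it are hypothetical. It rebuilds the eight rows of Table 6.1 and reads off which populations contribute to each probability directly from the table, rather than from Eqs. (6.8) to (6.10). It then tests the inequality for many random choices of the non-negative populations $N_i$.

```python
import itertools
import random

DIRS = "abc"

# Rows of Table 6.1 in order N1..N8: Alice's predetermined results along a, b, c.
# Bob's result along any direction is the opposite of Alice's along that direction.
TYPES = [dict(zip(DIRS, signs)) for signs in itertools.product([+1, -1], repeat=3)]


def p_both_plus(pops, alice_dir, bob_dir):
    """P(+alice_dir; +bob_dir): fraction of pairs where Alice gets +1 along alice_dir
    and Bob gets +1 along bob_dir."""
    hits = sum(n for n, t in zip(pops, TYPES) if t[alice_dir] == +1 and -t[bob_dir] == +1)
    return hits / sum(pops)


rng = random.Random(0)
violations = 0
for _ in range(10_000):
    pops = [rng.random() for _ in TYPES]  # arbitrary non-negative N_1, ..., N_8
    lhs = p_both_plus(pops, "a", "b")
    rhs = p_both_plus(pops, "a", "c") + p_both_plus(pops, "c", "b")
    if lhs > rhs + 1e-12:  # small tolerance for floating-point rounding
        violations += 1

print("violations of Eq. (6.11):", violations)              # 0
print("equal populations:", p_both_plus([1] * 8, "a", "b"))  # 0.25
```

No choice of populations violates the inequality, and equal populations give the value 1/4 quoted above.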
We therefore now consider what quantum mechanics has to say.
The 2-qubit state generated by the source is given by Eq. (6.5) for any direction n̂, where $|0_{\hat n}\rangle$ and $|1_{\hat n}\rangle$ are given by Eqs. (6.30). We take the θ = φ = 0 direction to be that of $\vec a$, so we write

$$|\psi\rangle = \frac{1}{\sqrt{2}}\left(|0_{\vec a}\rangle_1 |1_{\vec a}\rangle_2 - |1_{\vec a}\rangle_1 |0_{\vec a}\rangle_2\right), \qquad (6.12)$$

where the subscript on each ket indicates which qubit is meant (1 for Alice’s and 2 for Bob’s).
⁵ Recall what we mean by these probabilities. $P(+\vec a; +\vec b)$, for example, is the probability, out of the times when Alice measures along $\vec a$ and Bob measures along $\vec b$, that they both get +1. The probabilities of the different measurement results for these fixed directions must add to 1, i.e. $P(+\vec a; +\vec b) + P(+\vec a; -\vec b) + P(-\vec a; +\vec b) + P(-\vec a; -\vec b) = 1$.

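We now compute $P(+\vec a; +\vec c)$ according to quantum mechanics. We need the probability amplitude for the state in Eq. (6.12) to have eigenvalue +1 along $\vec a$ for Alice and eigenvalue +1 along $\vec c$ for Bob, i.e. to be in $|0_{\vec a}\rangle_1 |0_{\vec c}\rangle_2$. Hence, to get $P(+\vec a; +\vec c)$, we first compute the amplitude

$${}_1\langle 0_{\vec a}|\,{}_2\langle 0_{\vec c}|\psi\rangle = \frac{1}{\sqrt{2}}\left(\langle 0_{\vec a}|0_{\vec a}\rangle_1 \langle 0_{\vec c}|1_{\vec a}\rangle_2 - \langle 0_{\vec a}|1_{\vec a}\rangle_1 \langle 0_{\vec c}|0_{\vec a}\rangle_2\right). \qquad (6.13)$$

Now $\langle 0_{\vec a}|0_{\vec a}\rangle = 1$ and $\langle 0_{\vec a}|1_{\vec a}\rangle = 0$, so

$$\langle 0_{\vec a} 0_{\vec c}|\psi\rangle = \frac{1}{\sqrt{2}}\,\langle 0_{\vec c}|1_{\vec a}\rangle. \qquad (6.14)$$

If $\vec c$ is at angles $(\theta_{ac}, \phi_{ac})$ relative to $\vec a$, then, according to Eq. (6.30a) (the bra $\langle 0_{\vec c}|$ carries the complex conjugate of the phase $e^{i\phi_{ac}}$),

$$\frac{1}{\sqrt{2}}\,\langle 0_{\vec c}|1_{\vec a}\rangle = \frac{1}{\sqrt{2}}\, e^{-i\phi_{ac}} \sin\frac{\theta_{ac}}{2}, \qquad (6.15)$$

so

$$P(+\vec a; +\vec c) = \left|\langle 0_{\vec a} 0_{\vec c}|\psi\rangle\right|^2 = \frac{1}{2}\left|e^{-i\phi_{ac}} \sin\frac{\theta_{ac}}{2}\right|^2 = \frac{1}{2}\sin^2\frac{\theta_{ac}}{2}. \qquad (6.16)$$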
Recall that this is the probability that Alice and Bob both get +1, out of the times when Alice measures along $\vec a$ and Bob measures along $\vec c$. For these same directions there are three other possibilities. It is straightforward to check that $P(-\vec a; -\vec c) = P(+\vec a; +\vec c)$, and a calculation shows that

$$P(+\vec a; -\vec c) = P(-\vec a; +\vec c) = \frac{1}{2}\cos^2\frac{\theta_{ac}}{2}. \qquad (6.17)$$

Hence the probabilities of the four different (±1) results when Alice measures along $\vec a$ and Bob measures along $\vec c$ add up to 1, i.e.

$$P(+\vec a; +\vec c) + P(+\vec a; -\vec c) + P(-\vec a; +\vec c) + P(-\vec a; -\vec c) = 1, \qquad (6.18)$$

as required.
A further check on Eq. (6.16) is that it predicts $P(+\vec a; +\vec c) \to 0$ if $\vec a$ and $\vec c$ are in the same direction. This result is correct: because of the nature of |ψ⟩, Alice and Bob must get opposite results when they measure in the same direction, see Eq. (6.5).
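Eqs. (6.16) to (6.18) can be checked numerically with the minimal sketch below. It is our own illustration. It uses the kets of Eqs. (6.30a,b) and takes $\vec a$ along the z axis, as in the text. The angles 1.1 and 0.4 are arbitrary, and the function names are hypothetical.

```python
import cmath
import math


def ket_n(theta, phi, outcome):
    """Eigenstate of sigma.n with eigenvalue outcome (+1 or -1), Eqs. (6.30a,b)."""
    c, s, e = math.cos(theta / 2), math.sin(theta / 2), cmath.exp(1j * phi)
    return [c, e * s] if outcome == +1 else [-s, e * c]


# SINGLET[i][j] is the amplitude of |ij> in Eq. (6.4); qubit 1 is Alice's, qubit 2 is Bob's.
R = 1 / math.sqrt(2)
SINGLET = [[0, R], [-R, 0]]


def joint_prob(alice_ket, bob_ket):
    """|<alice|_1 <bob|_2 |psi>|^2 for the singlet."""
    amp = sum(alice_ket[i].conjugate() * bob_ket[j].conjugate() * SINGLET[i][j]
              for i in range(2) for j in range(2))
    return abs(amp) ** 2


theta_ac, phi_ac = 1.1, 0.4  # arbitrary illustrative angles of c relative to a
probs = {(sa, sc): joint_prob(ket_n(0, 0, sa), ket_n(theta_ac, phi_ac, sc))
         for sa in (+1, -1) for sc in (+1, -1)}

print(probs[(+1, +1)], 0.5 * math.sin(theta_ac / 2) ** 2)  # Eq. (6.16): the two agree
print(probs[(+1, -1)], 0.5 * math.cos(theta_ac / 2) ** 2)  # Eq. (6.17): the two agree
print(sum(probs.values()))                                 # Eq. (6.18): 1 up to rounding
```

The azimuthal angle drops out of every probability, as the derivation above shows it must.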
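Similarly, one has

$$P(+\vec a; +\vec b) = \frac{1}{2}\sin^2\frac{\theta_{ab}}{2}, \qquad (6.19)$$

$$P(+\vec c; +\vec b) = \frac{1}{2}\sin^2\frac{\theta_{cb}}{2}. \qquad (6.20)$$

Hence Bell’s inequality, Eq. (6.11), when applied to quantum mechanics, gives (after cancelling the common factor of 1/2)

$$\sin^2\frac{\theta_{ab}}{2} \le \sin^2\frac{\theta_{ac}}{2} + \sin^2\frac{\theta_{cb}}{2}. \qquad (6.21)$$

As we shall now see, it is easy to find cases where this is violated.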

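Consider the situation in Fig. 6.2, where $\theta_{ac} = \theta_{cb} = \theta$, so $\theta_{ab} = 2\theta$, and take θ = π/3. We have

$$\sin^2\frac{\theta_{ac}}{2} = \sin^2\frac{\theta_{cb}}{2} = \sin^2\frac{\theta}{2} = \sin^2\frac{\pi}{6} = \frac{1}{4}, \qquad (6.22)$$

and

$$\sin^2\frac{\theta_{ab}}{2} = \sin^2\theta = \sin^2\frac{\pi}{3} = \frac{3}{4}. \qquad (6.23)$$

Hence the LHS of Eq. (6.21) is 3/4 while the RHS is 1/2, so the inequality is violated. For general θ in Fig. 6.2 (with 0 ≤ θ ≤ π, so that both sides below are non-negative), the inequality in Eq. (6.21) can be written

$$\sin\theta \le \sqrt{2}\,\sin\frac{\theta}{2}, \qquad (6.24)$$

which is violated for the broad range 0 < θ < π/2, as shown graphically in Fig. 6.3. (Writing sin θ = 2 sin(θ/2) cos(θ/2), the violation condition becomes cos(θ/2) > 1/√2, i.e. θ < π/2.)

Figure 6.2: A possible choice of directions for which the Bell’s inequality in Eq. (6.21) is violated: $\vec c$ lies between $\vec a$ and $\vec b$, at angle θ from each.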

Figure 6.3: Plot of sin θ and √2 sin(θ/2) against θ (from 0 to 3), showing that the inequality in Eq. (6.24) is violated for 0 < θ < π/2.
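The short numerical sketch below evaluates both sides of Eq. (6.24) at θ = π/3 and scans 0 < θ < π for violations. It is our own illustration, and the grid of 999 interior points is an arbitrary choice.

```python
import math


def sides(theta):
    """LHS and RHS of Eq. (6.24) for the geometry of Fig. 6.2."""
    return math.sin(theta), math.sqrt(2) * math.sin(theta / 2)


lhs, rhs = sides(math.pi / 3)
# Squared, these are the two sides of Eq. (6.21) at theta = pi/3.
print(f"sin^2(pi/3) = {lhs ** 2:.3f}, 2 sin^2(pi/6) = {rhs ** 2:.3f}")  # 0.750 and 0.500

grid = [k * math.pi / 1000 for k in range(1, 1000)]
violated = [t for t in grid if sides(t)[0] > sides(t)[1]]
print(f"violated from {min(violated):.4f} to {max(violated):.4f} (pi/2 = {math.pi / 2:.4f})")
```

Up to the grid spacing, the violated region ends at π/2, in agreement with Fig. 6.3.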

6.4 Summary
We have seen that quantum mechanics violates Bell’s inequalities, whereas these inequalities are satisfied by any theory with local realism. Experiments along the lines of that sketched in Fig. 6.1 have been done, using polarized photons. These experiments agree with quantum mechanics and disagree with local realism. See [Link] for a brief discussion of these experiments. The experiments vary in the initial state of the entangled qubits and in which Bell’s inequality is being tested, but they are all equivalent in that each tests whether the data can be explained by local realism. The more sophisticated experiments choose the polarization directions randomly while the photons are in flight. This makes it impossible for the emitted photons to be affected by the chosen orientations of the polarizers. Similarly, the polarizer directions are set at times such that information about the direction of one polarizer has not had time to reach the other polarizer when it performs its measurement. (Note that information cannot travel faster than the speed of light.) Features of the experiment like these are necessary to show that no local hidden variable theory can explain the data.

Bell’s inequalities characterize quantum correlations between two entangled qubits, which are different from classical correlations. Very recently, non-classical correlations distinct from those of Bell have been found in experiments with three sources of pairs of entangled photons and three detectors in the shape of a triangle, see [Link]. Thus the study of non-classical correlations in quantum mechanics, stimulated by EPR in the 1930s, made precise by Bell in the 1960s, and studied experimentally since the 1970s, remains an active field up to the present day.
Although the experimentally found violations of Bell’s inequalities rule out local theories with objective reality, they do not, in principle, rule out non-local⁶ theories with objective reality. However, these would violate special relativity. Hence very few physicists think that a non-local theory of quantum mechanics will turn out to be the correct theory of nature.
In an EPR-like experiment the entangled state changes when one qubit is measured. We can
ask whether any information is instantaneously transmitted to the other qubit at the moment of
measurement. Since the two qubits in an entangled state are correlated, naively one might imagine
that this occurs. If so, special relativity, one of the cornerstones of modern physics, would be violated.
Fortunately, no information is transmitted at the moment of measurement, as we show in Appendix
6.B, so special relativity is preserved.
To conclude, we see that quantum mechanics is strange:
• Unlike in classical physics, probabilities enter in a fundamental way.
• Unlike in classical physics, we do not have objective reality. Reality is an emergent concept for
bigger systems when we go over to a description in terms of classical physics.
Many physicists feel uncomfortable with these aspects of quantum mechanics, and hope that a
better insight will emerge. But, in the 87 years since the EPR paper this has not happened, so we will
probably have to continue living with the strange world of quantum mechanics as we now understand
it.
Can we use the differences between the strange quantum world and the familiar classical world to
do more efficient computation, at least for some problems? This question will be the focus of the rest
of the course.

Problems
6.1. We showed in this chapter that, out of the times that Alice measures along $\vec a$ and Bob along $\vec c$, the probability that they both get +1 is given by $P(+\vec a; +\vec c) = \frac{1}{2}\sin^2(\theta_{ac}/2)$, where $\theta_{ac}$ is the angle between the directions $\vec a$ and $\vec c$.
Perform a similar calculation to compute the probability that Alice gets −1 and Bob gets +1, which we call $P(-\vec a; +\vec c)$.
6.2. We showed in this chapter that the so-called “singlet” state,

$$|\beta_{11}\rangle = \frac{1}{\sqrt{2}}\left(|01\rangle - |10\rangle\right), \qquad (6.25)$$

which is one of the Bell states, has the same form in all bases, apart from an unimportant overall phase factor. Show that the same result is not true for the Bell state

$$|\beta_{01}\rangle = \frac{1}{\sqrt{2}}\left(|01\rangle + |10\rangle\right). \qquad (6.26)$$

Note: like |β₀₁⟩, the other two Bell states, |β₁₀⟩ and |β₀₀⟩, also have a different form in other bases.

⁶ The term non-local refers to information propagating faster than the speed of light.

Appendices

6.A The 2022 Physics Nobel Prize

In 2022 the Royal Swedish Academy of Sciences awarded the Nobel Prize in Physics to Alain Aspect,
John F. Clauser and Anton Zeilinger
“for experiments with entangled photons, establishing the violation of Bell inequalities and
pioneering quantum information science”.
The announcement can be seen at
[Link]
John Bell himself died unexpectedly of a cerebral hemorrhage in 1990. Apparently he had been
nominated for the Nobel prize that year. Whether or not he would have received it then, he would
certainly have received it at some point had he not died prematurely.
Here is a summary of the contributions made by the three awardees, gleaned from the Nobel announcement.
John Clauser
The first experiment to test Bell inequalities was performed by Stuart Freedman (now deceased) and
Clauser, who found a violation of a version of the Bell inequality proposed earlier by John Clauser,
Michael Horne, Abner Shimony and Richard Holt (CHSH). The results agreed well with quantum
mechanics.
Alain Aspect
An assumption in the Bell inequalities is that the two observers, Alice and Bob, make random choices of what to measure, independent of each other. For this to be true, one must make sure that Alice cannot send Bob a message about which polarization direction she will measure that reaches him before he chooses his own polarization direction. In other words, Alice must not be able to influence Bob’s choices.
Assuming that special relativity is correct, this locality condition amounts to making sure that such
a message would have to travel with a speed greater than that of light. Alain Aspect was the first to
design an experiment to overcome this locality loophole. Aspect ensured the independence of Alice
and Bob by using polarization settings that changed randomly during the time of flight of the photons
between the detectors. His results agreed well with quantum mechanics and violated the relevant Bell
inequality.
Anton Zeilinger
We have discussed in this course that an unknown, arbitrary quantum state cannot be copied, i.e. cloned. However, as we will see in Chapter 21, it is possible, using entanglement, to “teleport” an arbitrary state from one position to another, as long as the original copy is destroyed. Zeilinger’s group was one of the first to demonstrate teleportation. It has been possible to create entanglement over very large distances, and a Chinese group, in collaboration with Zeilinger, was able to distribute entanglement between China and Austria using a satellite.
The locality loophole in experiments to test Bell inequalities was mentioned above in the context of Aspect’s work. This loophole was largely eliminated by Aspect, but in his experiment the distance between the polarizers was too small to allow for truly random settings. Later, Zeilinger’s group was able to test the inequality under strict locality conditions, with the observers separated by no less than 400 m.
Another loophole is the “detection loophole”, which arises because no detector has 100% efficiency,
so a quantum skeptic could argue that the lost photons might conspire to give a fake violation of a
Bell inequality. While this possibility seems unlikely it is important to rule it out. The detection
loophole was first closed in an experiment using trapped ions rather than photons. However, in these
systems one could not close the locality loophole. It was only relatively recently, in the years 2015-17
that several groups, including that of Zeilinger, managed to simultaneously close both the locality and
detection loopholes.

6.B Information does not propagate faster than the speed of light
Consider a pair of entangled qubits A and B which are widely separated. If a measurement is done
on qubit B then the state of the system changes, the final state depending on the result of the
measurement. This change in state happens instantaneously. Does this mean that information is
transmitted instantaneously to qubit A? If so, this would violate special relativity. We shall now see that this is not the case: no information is transferred at the moment of measurement, and therefore quantum mechanics does not violate special relativity.
Since qubits A and B are entangled, we have to describe qubit A by a density matrix. To see its form, we separate out the parts of the entangled state corresponding to B being in state |0⟩ and B being in state |1⟩. Referring to our discussion of the generalized Born rule in Sec. 3.10, we write the state of the two qubits before the measurement as

$$|\psi_{AB}\rangle = \alpha\,|\psi_{0,A}\rangle|0_B\rangle + \beta\,|\psi_{1,A}\rangle|1_B\rangle, \qquad (6.27)$$

where $|\psi_{0,A}\rangle$ and $|\psi_{1,A}\rangle$ are normalized (but not, in general, orthogonal) states of qubit A, and $|\alpha|^2 + |\beta|^2 = 1$. As stated in Eq. (5.7) in Sec. 5.2, the density matrix of the two qubits in a well-defined quantum state is

$$\rho_{AB} = |\psi_{AB}\rangle\langle\psi_{AB}| = \left(\alpha|\psi_{0,A}\rangle|0_B\rangle + \beta|\psi_{1,A}\rangle|1_B\rangle\right)\left(\alpha^*\langle\psi_{0,A}|\langle 0_B| + \beta^*\langle\psi_{1,A}|\langle 1_B|\right), \qquad (6.28)$$

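and the density matrix of qubit A alone is

$$\rho_A = \mathrm{Tr}_B\,\rho_{AB} = \langle 0_B|\rho_{AB}|0_B\rangle + \langle 1_B|\rho_{AB}|1_B\rangle = |\alpha|^2\,|\psi_{0,A}\rangle\langle\psi_{0,A}| + |\beta|^2\,|\psi_{1,A}\rangle\langle\psi_{1,A}|. \qquad (6.29)$$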

Note that Eq. (6.29) is a representation of the density matrix in terms of the non-orthogonal states $|\psi_{0,A}\rangle$ and $|\psi_{1,A}\rangle$. Another example involving non-orthogonal states was described in Sec. 5.5. As discussed in Sec. 5.2.2, Eq. (6.29) implies that qubit A is in state $|\psi_{0,A}\rangle$ with probability $|\alpha|^2$ and in state $|\psi_{1,A}\rangle$ with probability $|\beta|^2$.
Now consider the situation after the measurement on qubit B. According to the generalized Born rule discussed in Sec. 3.10, for the state of the combined AB system in Eq. (6.27), there is probability $|\alpha|^2$ that qubit B is measured to be in state $|0_B\rangle$ while qubit A is left in state $|\psi_{0,A}\rangle$. There is probability $|\beta|^2$ that qubit B is measured to be in state $|1_B\rangle$ while qubit A is left in state $|\psi_{1,A}\rangle$. For qubit A this situation is exactly the same as we found before the measurement, see Eq. (6.29). Hence the density matrix for qubit A, which determines the probabilities of the results of subsequent measurements on A, is unchanged by the measurement of the distant qubit B, even though the two qubits are entangled. Thus, although our description of the state of the two qubits does change instantaneously at the moment of measurement, no information is propagated instantaneously by the measurement, and so special relativity is satisfied.
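The conclusion can be checked numerically for a concrete state of the form of Eq. (6.27). The sketch below is our own illustration. The values of α and β and the non-orthogonal state $|\psi_{1,A}\rangle$ are arbitrary choices, and the function names are hypothetical. Qubit B is measured along several directions n̂ using the kets of Eqs. (6.30a,b). The resulting density matrix of A, averaged over the outcome unknown to A, is then compared with $\rho_A$ of Eq. (6.29).

```python
import cmath
import math


def reduced_rho_A(psi):
    """rho_A = Tr_B |psi><psi|, where psi[i][j] is the amplitude of |i>_A |j>_B."""
    return [[sum(psi[i][b] * psi[k][b].conjugate() for b in range(2)) for k in range(2)]
            for i in range(2)]


def rho_A_after_measuring_B(psi, theta, phi):
    """State of A after B is measured along n = (theta, phi), averaged over the outcomes."""
    c, s, e = math.cos(theta / 2), math.sin(theta / 2), cmath.exp(1j * phi)
    rho = [[0j, 0j], [0j, 0j]]
    for m in ([c, e * s], [-s, e * c]):  # |0_n> and |1_n>, Eqs. (6.30a,b)
        # Unnormalized state of A for outcome m: a = <m|_B |psi>, so that
        # |a><a| = p_m |phi_m><phi_m| and no division by p_m is needed.
        a = [sum(m[j].conjugate() * psi[i][j] for j in range(2)) for i in range(2)]
        for i in range(2):
            for k in range(2):
                rho[i][k] += a[i] * a[k].conjugate()
    return rho


# A state of the form of Eq. (6.27) with non-orthogonal |psi_0,A> and |psi_1,A>.
alpha, beta = math.sqrt(0.3), math.sqrt(0.7)
psi0_A = [1.0, 0.0]
psi1_A = [math.cos(0.4), math.sin(0.4)]
psi = [[alpha * psi0_A[i], beta * psi1_A[i]] for i in range(2)]

before = reduced_rho_A(psi)
changes = []
for theta, phi in [(0.0, 0.0), (1.0, 2.0), (2.5, -0.7)]:
    after = rho_A_after_measuring_B(psi, theta, phi)
    changes.append(max(abs(after[i][k] - before[i][k]) for i in range(2) for k in range(2)))
print("largest change in rho_A:", max(changes))  # zero up to rounding error
```

The direction θ = φ = 0 is the Z-basis measurement treated in the text. The other two directions show that the result does not depend on which basis B is measured in.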

6.C The spin-singlet state is isotropic

Equations (4.12) of Chapter 4 show that the eigenstate of spin with eigenvalue +1 in a direction specified by polar angles (θ, φ) is given by

$$|0_{\hat n}\rangle = \cos\frac{\theta}{2}\,|0\rangle + e^{i\phi}\sin\frac{\theta}{2}\,|1\rangle, \qquad (6.30a)$$

see Fig. 4.1. We also showed that the eigenstate corresponding to eigenvalue −1 is

$$|1_{\hat n}\rangle = -\sin\frac{\theta}{2}\,|0\rangle + e^{i\phi}\cos\frac{\theta}{2}\,|1\rangle, \qquad (6.30b)$$

which corresponds to the antipodal point, where θ → π − θ, φ → φ + π, see Eq. (4.12b).

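From Eqs. (6.30a) and (6.30b), we see that the unitary matrix that transforms from the Z basis to the n̂ basis is

$$U = \begin{pmatrix} \cos\frac{\theta}{2} & e^{i\phi}\sin\frac{\theta}{2} \\ -\sin\frac{\theta}{2} & e^{i\phi}\cos\frac{\theta}{2} \end{pmatrix}. \qquad (6.31)$$

The inverse transformation is given by $U^{-1}$, but since U is unitary we have

$$U^{-1} = U^\dagger \equiv \left(U^T\right)^* = \begin{pmatrix} \cos\frac{\theta}{2} & -\sin\frac{\theta}{2} \\ e^{-i\phi}\sin\frac{\theta}{2} & e^{-i\phi}\cos\frac{\theta}{2} \end{pmatrix}, \qquad (6.32)$$

so

$$|0\rangle = \cos\frac{\theta}{2}\,|0_{\hat n}\rangle - \sin\frac{\theta}{2}\,|1_{\hat n}\rangle, \qquad (6.33a)$$

$$|1\rangle = e^{-i\phi}\sin\frac{\theta}{2}\,|0_{\hat n}\rangle + e^{-i\phi}\cos\frac{\theta}{2}\,|1_{\hat n}\rangle. \qquad (6.33b)$$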

Hence the entangled Bell state |ψ⟩ in Eq. (6.4) (which is called the spin-singlet state in the physics literature) can be written in the n̂ basis as

$$|\psi\rangle = \frac{1}{\sqrt{2}}\left(|01\rangle - |10\rangle\right) \qquad (6.34)$$

$$= \frac{1}{\sqrt{2}}\Big[\left(\cos\tfrac{\theta}{2}|0_{\hat n}\rangle_1 - \sin\tfrac{\theta}{2}|1_{\hat n}\rangle_1\right)\left(e^{-i\phi}\sin\tfrac{\theta}{2}|0_{\hat n}\rangle_2 + e^{-i\phi}\cos\tfrac{\theta}{2}|1_{\hat n}\rangle_2\right) - \left(e^{-i\phi}\sin\tfrac{\theta}{2}|0_{\hat n}\rangle_1 + e^{-i\phi}\cos\tfrac{\theta}{2}|1_{\hat n}\rangle_1\right)\left(\cos\tfrac{\theta}{2}|0_{\hat n}\rangle_2 - \sin\tfrac{\theta}{2}|1_{\hat n}\rangle_2\right)\Big] \qquad (6.35)$$

$$= \frac{e^{-i\phi}}{\sqrt{2}}\left(|0_{\hat n}1_{\hat n}\rangle - |1_{\hat n}0_{\hat n}\rangle\right), \qquad (6.36)$$

where, in the middle expression, we indicate by a subscript, e.g. $|\cdots\rangle_1$, whether the state is that of the first or second qubit. Apart from the unimportant overall phase factor⁷ $e^{-i\phi}$, Eq. (6.36) has the same form as the state in the computational (Z) basis, Eq. (6.34). Hence, if two qubits in the entangled Bell state in Eq. (6.4) are observed in the same basis (see footnote 2 on page 61), no matter which one, the results of the two measurements will always be opposite, one giving +1 and the other −1.
⁷ Note that $e^{-i\phi}$ is just the determinant of the transformation matrix from the computational basis to the n̂ basis given in Eq. (6.32). Quite generally, if the same unitary transformation V is applied to both qubits of the “singlet” state |ψ⟩ in Eq. (6.34), one can show that $(V \otimes V)|\psi\rangle = \det V\,|\psi\rangle$. Since V is unitary, its determinant can only be a pure phase.
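The claim in footnote 7 can be tested numerically. The sketch below is our own illustration. `random_unitary_2x2` is a hypothetical helper that builds a general 2×2 unitary from random angles. The code applies the same random unitary V to both qubits of the singlet and compares the result with det V |ψ⟩.

```python
import cmath
import math
import random


def random_unitary_2x2(rng):
    """A general 2x2 unitary g * [[a, b], [-conj(b), conj(a)]] with |a|^2 + |b|^2 = 1, |g| = 1."""
    x = rng.uniform(0, math.pi / 2)
    a = math.cos(x) * cmath.exp(1j * rng.uniform(0, 2 * math.pi))
    b = math.sin(x) * cmath.exp(1j * rng.uniform(0, 2 * math.pi))
    g = cmath.exp(1j * rng.uniform(0, 2 * math.pi))
    return [[g * a, g * b], [-g * b.conjugate(), g * a.conjugate()]]


def apply_to_both(V, psi):
    """(V tensor V)|psi> for psi[i][j] = amplitude of |ij>."""
    return [[sum(V[i][k] * V[j][m] * psi[k][m] for k in range(2) for m in range(2))
             for j in range(2)] for i in range(2)]


R = 1 / math.sqrt(2)
singlet = [[0, R], [-R, 0]]  # Eq. (6.34)

rng = random.Random(7)
worst = 0.0
for _ in range(100):
    V = random_unitary_2x2(rng)
    det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
    assert abs(abs(det) - 1) < 1e-12  # det V is a pure phase
    out = apply_to_both(V, singlet)
    worst = max(worst, max(abs(out[i][j] - det * singlet[i][j])
                           for i in range(2) for j in range(2)))
print(f"largest deviation from det(V)|psi>: {worst:.1e}")
```

The algebra behind this is short. The |ij⟩ component of $(V\otimes V)|\psi\rangle$ is $(V_{i0}V_{j1} - V_{i1}V_{j0})/\sqrt{2}$. This equals $\pm\det V/\sqrt{2}$ for i ≠ j and zero for i = j. So the identity actually holds for any 2×2 matrix, and unitarity is what makes det V a pure phase.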
Chapter 7

Classical and Quantum Gates

Now, finally, we get to computation!

The elementary circuit elements that act on the data in a computer are called gates. In this chapter we will first discuss classical gates and then go on to describe quantum gates.

7.1 Classical Gates

Data in a classical digital computer is in the form of bits, x, which take values 0 or 1. The only
operation involving a single classical bit, i.e. the only 1-bit classical gate, is NOT which takes 0 to 1
and vice versa.
Of particular interest are 2-bit gates, the most common ones being

In (x y)    AND, x ∧ y    OR, x ∨ y    XOR, x ⊕ y
  0 0           0             0            0
  0 1           0             1            1
  1 0           0             1            1
  1 1           1             1            0          (7.1)

These have two input bits and one output bit. For the AND gate the result is 0 unless both inputs are 1. For the OR gate the result is 0 unless one or both of the inputs are 1. The XOR gate differs from the OR gate only in giving 0 if both inputs are 1.
Note that AND gives the same results as multiplication of the bits, xy. The XOR operation is equivalent to addition of the bits modulo 2, i.e. x + y (mod 2). To see this, note that the modulo operation gives the remainder after integer division. For example, since 13 = (5 × 2) + 3, we have 13 (mod 5) = 3. Referring to the XOR gate, consider the case x = y = 1, so we have 1 + 1 (mod 2) = 0, which is the value of XOR in this case. It is trivial to see that XOR is also addition modulo 2 for the other values of x and y. For convenience of notation, x + y (mod 2) is written as x ⊕ y.
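The truth tables in Eq. (7.1) and the two arithmetic identities just described can be verified in a few lines of Python. This is a minimal sketch. Python’s bitwise operators `&`, `|` and `^` act as AND, OR and XOR on the integers 0 and 1.

```python
from itertools import product

print("x y  AND OR XOR")
for x, y in product((0, 1), repeat=2):
    print(f"{x} {y}   {x & y}   {x | y}   {x ^ y}")
    assert x & y == x * y          # AND is multiplication of the bits
    assert x ^ y == (x + y) % 2    # XOR is addition modulo 2

print(13 % 5)  # 3, since 13 = (5 x 2) + 3
```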
One can show that the AND, NOT and OR gates form a universal set, which means that any logical operation on an arbitrary number of bits on a classical computer can be expressed in terms of these gates. Thus, classically, we only need 1-bit and 2-bit gates to perform any operation.
However, we cannot directly take over gates like AND, OR and XOR to a quantum computer, for the following reason. A gate in a quantum computer will be implemented by a unitary operator acting on a small number of qubits. A unitary operator has the property that $U^{-1} = U^\dagger$. Now $U^{-1}$ performs the inverse operation, and since $U^\dagger$ is well defined, the inverse operation must exist. Thus, quantum gates must be reversible.
However, AND, OR and XOR cannot be reversible, because they have a different number of outputs and inputs. Suppose, for example, that we know the output from an OR gate is 1 and want to know what the input was. We cannot tell, because there are three possible inputs, 01, 10 and 11, that give this output.
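A gate is reversible only if every output comes from exactly one input. The following sketch counts how many 2-bit inputs produce each output of AND, OR and XOR. It is our own illustration, and `preimage_counts` is a hypothetical helper.

```python
from collections import Counter
from itertools import product


def preimage_counts(gate):
    """Map each output value of a 2-bit -> 1-bit gate to the number of inputs producing it."""
    return dict(Counter(gate(x, y) for x, y in product((0, 1), repeat=2)))


print("AND:", preimage_counts(lambda x, y: x & y))  # {0: 3, 1: 1}
print("OR: ", preimage_counts(lambda x, y: x | y))  # {0: 1, 1: 3}
print("XOR:", preimage_counts(lambda x, y: x ^ y))  # {0: 2, 1: 2}
```

Every gate has some output with more than one preimage, so none of them can be inverted. The OR output 1, with its three preimages 01, 10 and 11, is the example in the text.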
Thus, a major change in going from classical to quantum computing will be having to deal with reversible computation. Next we will consider reversible classical computation before doing the quantum case.
Clearly a necessary condition for a gate to be reversible is that it has the same number of input and output bits. The 1-bit NOT gate has one input and one output. It is reversible, since acting twice gives back the original bit, i.e. (NOT)² = IDENTITY, so (NOT)⁻¹ = NOT, i.e. NOT is its own inverse.
We will now consider a reversible, classical, 2-bit gate, the quantum analog of which will play an
important role in quantum computing. This is the controlled-NOT, or CNOT gate. It is similar to
XOR except that it has a second output bit, which is equal to one of the input bits, i.e. this bit is
unchanged on output. As we shall see, this simple modification, namely keeping one of the input bits
as part of the output, suffices to make the CNOT gate a reversible version of XOR.
One way of representing the action of CNOT is

$$\begin{pmatrix} x \\ y \end{pmatrix} \longrightarrow \begin{pmatrix} x \\ x \oplus y \end{pmatrix}. \qquad (7.2)$$

The first (upper) bit is called the control bit. This is unchanged by the action of CNOT. The second (lower) bit is called the target bit, and the effect of the XOR operation x ⊕ y is to flip y if x = 1 and to leave y alone if x = 0. Hence, as far as the target bit is concerned, the gate is indeed a controlled NOT, since the NOT acts if x, the control bit, is 1, and does not act if x = 0. The truth table is as follows:

x  y    x′  y′
0  0    0   0
0  1    0   1
1  0    1   1
1  1    1   0      (7.3)

It is useful to represent the CNOT gate by a diagram, as shown in Fig. 7.1. The input is on the left and the output on the right. The upper line is the control bit and has value x on input, while the lower line is the target bit and has value y on input. On output, the control bit is unchanged and the target bit is the exclusive OR (XOR) of x and y.
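Figure 7.1: The CNOT gate, with control bit x (unchanged on output) and target bit y (output $x \oplus y$). The input is on the left and the output on the right.

It is easy to see that CNOT is reversible since, if we act twice, we get back the original input:
$$\begin{pmatrix} x \\ y \end{pmatrix} \xrightarrow{\ \mathrm{CNOT}\ } \begin{pmatrix} x \\ x \oplus y \end{pmatrix} \xrightarrow{\ \mathrm{CNOT}\ } \begin{pmatrix} x \\ x \oplus x \oplus y \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix}. \qquad (7.4)$$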

The last step follows because $x \oplus x = 0$, since $0 + 0 = 0$ and $1 + 1 = 0 \pmod 2$. Thus CNOT is its own inverse and, as mentioned earlier, it can be regarded as a reversible version of XOR.
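The truth table (7.3) and the self-inverse property (7.4) are easy to check by brute force. The following is a minimal Python sketch (the notes themselves contain no code), assuming bits are stored as the integers 0 and 1 so that $\oplus$ is Python's `^` operator; the function name `cnot` is our own.

```python
def cnot(x, y):
    """Classical CNOT on bits 0/1: the control x is unchanged, the target becomes x XOR y."""
    return x, x ^ y


# Reproduce the truth table of Eq. (7.3): columns x, y, x', y'.
for x in (0, 1):
    for y in (0, 1):
        print(x, y, *cnot(x, y))

# Eq. (7.4): acting twice restores every input, so CNOT is its own inverse.
assert all(cnot(*cnot(x, y)) == (x, y) for x in (0, 1) for y in (0, 1))
```

The loop prints the four rows of (7.3) in the same order as the table.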
Note that reversibility does not require the inverse operator to be the same as the original operator, only that the inverse exists. However, it turns out that most of the quantum gates we consider will be their own inverses.
Figure 7.2: The Toffoli gate. This has two control bits x and y and one target bit z. On output the control bits are unchanged and the target bit is flipped if both control bits are 1, so $z \to z \oplus xy$.

Figure 7.3: Left: the Fredkin gate, with inputs x, y, z and outputs x, $\bar{x}y + xz$, $\bar{x}z + xy$. This is a controlled-swap gate. If the upper (control) bit is 1 then the two lower (target) bits are swapped, and otherwise the target bits are unchanged. $\bar{x} \equiv 1 - x$ is the complement of x. Right: the elemental SWAP gate, which exchanges y and z.

We mentioned above that the 1-bit NOT gate and a set of irreversible 2-bit gates (AND and OR) together form a universal set for a classical computer, which means that any logical operation on an arbitrary number of bits can be constructed out of these gates. The question we now ask is whether 1-bit and 2-bit reversible classical gates are universal. The answer is no. Classically one also needs a 3-bit gate such as the Toffoli gate shown in Fig. 7.2 or the Fredkin gate shown in Fig. 7.3.
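To make Figs. 7.2 and 7.3 concrete, here is a minimal Python sketch of the classical Toffoli and Fredkin gates as functions on bits (the names `toffoli` and `fredkin` are ours). It checks by brute force that each gate maps the 8 possible 3-bit inputs to 8 distinct outputs, i.e. is reversible, and that each is its own inverse. It also shows that a Toffoli gate with its target set to 0 computes AND, which is one way to see how a reversible 3-bit gate can stand in for the irreversible AND.

```python
from itertools import product


def toffoli(x, y, z):
    """Toffoli gate: flip the target z only if both controls are 1, i.e. z -> z XOR (x AND y)."""
    return x, y, z ^ (x & y)


def fredkin(x, y, z):
    """Fredkin (controlled-swap) gate: swap y and z if the control x is 1."""
    xbar = 1 - x  # complement of x
    # The two products are never both 1, so OR here equals the sum written in Fig. 7.3.
    return x, (xbar & y) | (x & z), (xbar & z) | (x & y)


inputs = list(product((0, 1), repeat=3))  # all 8 inputs, in lexicographic order
for gate in (toffoli, fredkin):
    outputs = [gate(*bits) for bits in inputs]
    assert sorted(outputs) == inputs                           # a permutation: reversible
    assert all(gate(*gate(*bits)) == bits for bits in inputs)  # self-inverse

# With the target initialized to 0, the Toffoli target output is x AND y.
print([toffoli(x, y, 0)[2] for x in (0, 1) for y in (0, 1)])  # [0, 0, 0, 1]
```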
Amazingly, we shall see that 3-qubit gates are not needed quantum mechanically. In fact it is possible to build the Toffoli gate, for example, out of 1-qubit and 2-qubit gates, and you will go through how to do this in homework question 11.4 in Ch. 11. We shall see that quantum mechanics allows for a big range of 1-qubit gates, whereas we have already noted that classically the only nontrivial 1-bit gate is NOT. It is this wide range of 1-qubit gates that allows us to construct a quantum mechanical Toffoli gate out of 1-qubit and 2-qubit gates, whereas no such construction is possible using classical reversible gates.

7.2 Quantum Gates

Following David Deutsch, we represent the action of quantum gates by a circuit. The circuit comprises a set of qubits in some initial state, acted on by gates and ending up in a final state. Each qubit is represented by a line in the circuit diagram and time runs from left to right; see, e.g., Fig. 7.4.
Figure 7.4: A schematic circuit with three qubits and two gates, G1 and G2. Time runs from left to right. The initial state of the qubits is $|i_1\rangle \otimes |i_2\rangle \otimes |i_3\rangle$.

Sometimes we will indicate a set of n qubits (called a register) compactly by a single line with a slash through it, labeled n.
Quantum circuits have the following properties:
• There are no loops, because qubits can’t go back in time.

• Lines can’t splay out (fan out) because of the no-cloning theorem.

• Similarly lines can’t merge.

• Gates and circuits are linear. We evaluate the effect of the circuit on an initial state which is a
computational basis state. However, if the initial qubits are in a superposition of computational
basis states, then the final state of the qubits, after the circuit has acted, is easily computed since
it is the corresponding linear superposition of outputs for each of the computational basis state
inputs.
Circuits have several gates acting in succession on a qubit, and it is important to understand the order in which they act. Unfortunately, this can be confusing. By convention, time in diagrams runs from left to right, so in a diagram where the initial state $|i\rangle$ passes first through gate A and then through gate B, ending in the final state $|f\rangle$, A (the leftmost gate) acts first on the state $|i\rangle$, and then B acts. However, operator expressions work from right to left, so this diagram corresponds to
$$|f\rangle = BA|i\rangle, \qquad (7.5)$$
in which A is on the right. You simply have to get used to this reversal of order when going from circuit diagrams to operator expressions.
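A minimal numerical check of this ordering rule, in Python with matrices stored as plain lists of rows (the helper names `matvec` and `matmul` are ours). As an illustrative assumption we take A = X and B = Z, the bit-flip and phase-flip matrices of Eqs. (7.6) and (7.7), acting on $|0\rangle$; since X and Z do not commute, the two orderings give different answers.

```python
def matvec(m, v):
    """Apply matrix m (a list of rows) to the column vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def matmul(a, b):
    """Matrix product a @ b for matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


X = [[0, 1], [1, 0]]   # bit flip
Z = [[1, 0], [0, -1]]  # phase flip
ket0 = [1, 0]

# Circuit |0> --[A = X]--[B = Z]-- |f>: A acts first.
step_by_step = matvec(Z, matvec(X, ket0))    # apply A, then B
operator_form = matvec(matmul(Z, X), ket0)   # |f> = B A |0>, with A written on the right
wrong_order = matvec(matmul(X, Z), ket0)     # A B |0>: not what the circuit does

print(step_by_step, operator_form, wrong_order)  # [0, -1] [0, -1] [0, 1]
```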
Now we describe some commonly used quantum gates, recalling that quantum gates are unitary operators and are therefore reversible.
Firstly we consider 1-qubit gates.

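• NOT, i.e. bit-flip (corresponds to the Pauli X matrix)
$$X|0\rangle = |1\rangle, \quad X|1\rangle = |0\rangle, \qquad X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \text{so} \quad X\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} \beta \\ \alpha \end{pmatrix}. \qquad (7.6)$$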

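• Phase flip (corresponds to the Pauli Z matrix)
$$Z|0\rangle = |0\rangle, \quad Z|1\rangle = -|1\rangle, \qquad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad \text{so} \quad Z\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} \alpha \\ -\beta \end{pmatrix}. \qquad (7.7)$$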
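In the physics literature X and Z are called Pauli spin matrices. There is also a third Pauli spin matrix, Y, where
$$Y|0\rangle = i|1\rangle, \quad Y|1\rangle = -i|0\rangle, \qquad Y = iXZ = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \text{so} \quad Y\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = i\begin{pmatrix} -\beta \\ \alpha \end{pmatrix}, \qquad (7.8)$$
which corresponds to a combined bit- and phase-flip. The Pauli Y-matrix will only appear again when we do quantum error correction in Chapter 19.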
• Hadamard
The Hadamard gate H will be very important.
$$H = \frac{1}{\sqrt{2}}(X + Z) = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}. \qquad (7.9)$$
Note that $H^2 = 1$, and similarly $X^2 = Y^2 = Z^2 = 1$.
Now a matrix which squares to the identity has eigenvalues ±1. To see this, note that if $\vec{x}$ is an eigenvector of A with eigenvalue λ then
$$A^2\vec{x} = A(A\vec{x}) = A\lambda\vec{x} = \lambda A\vec{x} = \lambda^2\vec{x}. \qquad (7.10)$$
But if $A^2 = 1$ then it follows that $\lambda^2 = 1$ and so $\lambda = \pm 1$.
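Equations (7.6) to (7.10) can be checked numerically. The sketch below, in Python with plain lists of rows and a helper `close` of our own for floating-point comparison, builds Y from $Y = iXZ$, reads $Y|0\rangle$ and $Y|1\rangle$ off its columns, and confirms that X, Y, Z and H all square to the identity.

```python
import math


def matmul(a, b):
    """Matrix product a @ b for matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def close(a, b, tol=1e-12):
    """Entrywise comparison of two equally sized matrices, allowing rounding error."""
    return all(abs(p - q) < tol for ra, rb in zip(a, b) for p, q in zip(ra, rb))


s = 1 / math.sqrt(2)
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]                                 # Eq. (7.6)
Z = [[1, 0], [0, -1]]                                # Eq. (7.7)
Y = [[1j * e for e in row] for row in matmul(X, Z)]  # Y = iXZ, Eq. (7.8)
H = [[s, s], [s, -s]]                                # Eq. (7.9)

assert close(Y, [[0, -1j], [1j, 0]])
# The columns of Y give Y|0> = i|1> and Y|1> = -i|0>.
assert close([[Y[0][0]], [Y[1][0]]], [[0], [1j]])
assert close([[Y[0][1]], [Y[1][1]]], [[-1j], [0]])
# H = (X + Z)/sqrt(2).
assert close(H, [[s * (X[i][j] + Z[i][j]) for j in range(2)] for i in range(2)])
# Each of X, Y, Z, H squares to the identity, so by Eq. (7.10) its eigenvalues are +1 or -1.
for M in (X, Y, Z, H):
    assert close(matmul(M, M), I2)
```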
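We need to become familiar with the action of H on computational basis states. This is:
$$H|0\rangle = |+\rangle \equiv \frac{1}{\sqrt{2}}\left(|0\rangle + |1\rangle\right), \qquad H|1\rangle = |-\rangle \equiv \frac{1}{\sqrt{2}}\left(|0\rangle - |1\rangle\right). \qquad (7.11)$$
Combining these two equations, the action of H on a computational basis state $|x\rangle$ is seen to be
$$H|x\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + (-1)^x|1\rangle\right), \qquad (7.12)$$
for both values of x, namely 0 and 1.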
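A crucial point is that these gates are linear, and so they act in the same way on a superposition. For example:
$$H\left[\alpha|0\rangle + \beta|1\rangle\right] = \frac{\alpha}{\sqrt{2}}\left(|0\rangle + |1\rangle\right) + \frac{\beta}{\sqrt{2}}\left(|0\rangle - |1\rangle\right) = \frac{\alpha + \beta}{\sqrt{2}}|0\rangle + \frac{\alpha - \beta}{\sqrt{2}}|1\rangle. \qquad (7.13)$$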
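Linearity can be seen directly in a short Python sketch (our own illustration, with hypothetical helper `matvec` and amplitudes α = 0.6, β = 0.8 chosen only because they are normalized): applying H to the superposition gives the same vector as combining $H|0\rangle$ and $H|1\rangle$ with weights α and β, and both match the right-hand side of Eq. (7.13).

```python
import math


def matvec(m, v):
    """Apply matrix m (a list of rows) to the column vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]

# Illustrative amplitudes (our choice), normalized: 0.6**2 + 0.8**2 = 1.
alpha, beta = 0.6, 0.8
out = matvec(H, [alpha, beta])  # H acting on alpha|0> + beta|1>

# Linearity: combine H|0> and H|1> with the same weights alpha and beta.
h0, h1 = matvec(H, [1, 0]), matvec(H, [0, 1])
by_linearity = [alpha * a + beta * b for a, b in zip(h0, h1)]

# Right-hand side of Eq. (7.13).
expected = [(alpha + beta) * s, (alpha - beta) * s]
assert all(abs(p - q) < 1e-12 for p, q in zip(out, by_linearity))
assert all(abs(p - q) < 1e-12 for p, q in zip(out, expected))
print([round(a, 5) for a in out])  # [0.98995, -0.14142]
```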
We also need to consider measurement gates, in which a classical measurement of a qubit takes place. The basis in which measurements are made is called the computational basis. The Pauli matrices above are written in the computational basis, and since the Pauli Z matrix is diagonal in this basis we also call the computational basis the Z-basis.
The result of the measurement is a classical bit. In circuit diagrams we indicate a classical bit by a double line, and so a measurement gate is indicated as follows:
[Figure: a measurement gate, drawn as a box with an arrow, followed by a double line carrying the classical result.]

The measurement apparatus acting on a qubit determines the value of Z for that qubit, obtaining either +1, in which case the qubit is left in state $|0\rangle$, or −1, in which case the qubit is left in state $|1\rangle$. If one wants to measure the value of some other quantity one needs to perform an appropriate unitary transformation first. For example, to determine the value of X one acts with a Hadamard before the measurement, since the Hadamard converts the X-basis to the Z-basis and vice versa. In other words, a state $\alpha|+\rangle + \beta|-\rangle$ becomes $\alpha|0\rangle + \beta|1\rangle$ after the Hadamard, and so a measurement gives $|0\rangle$ with probability $|\alpha|^2$ and $|1\rangle$ with probability $|\beta|^2$. These are the probabilities one would have of measuring $|+\rangle$ and $|-\rangle$ respectively (before the Hadamard acted) if one could measure X directly.
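The Hadamard-then-measure recipe for X can be sketched in Python as follows. The amplitudes α = 0.6, β = 0.8i are an illustrative choice of ours, and the simulated measurement record uses the standard-library `random.choices` with a fixed seed; none of this is part of the notes.

```python
import math
import random


def matvec(m, v):
    """Apply matrix m (a list of rows) to the column vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
plus, minus = [s, s], [s, -s]  # |+> and |->, Eq. (7.11)

# Illustrative state alpha|+> + beta|-> (amplitudes chosen by us; complex is allowed).
alpha, beta = 0.6, 0.8j
state = [alpha * a + beta * b for a, b in zip(plus, minus)]

after = matvec(H, state)  # the Hadamard before the Z measurement gives alpha|0> + beta|1>
p0, p1 = abs(after[0]) ** 2, abs(after[1]) ** 2
print(round(p0, 6), round(p1, 6))  # 0.36 0.64, i.e. |alpha|^2 and |beta|^2

# Simulated measurement record: outcome 0 means X = +1, outcome 1 means X = -1.
random.seed(1)
shots = random.choices([0, 1], weights=[p0, p1], k=10_000)
print(shots.count(0) / len(shots))  # close to 0.36
```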
Note, however, that this procedure leaves the qubit in an eigenstate of Z rather than of X. This is a problem if we want to continue to use the qubit after the measurement, because then the qubit should be left in the eigenstate of the measured operator. It turns out that this can be done by coupling the qubit to another "ancilla" qubit and measuring the ancilla, as explained in Fig. 7.8 below.
Next we consider 2-qubit gates, the most important of which by far is the CNOT. We already met the classical CNOT gate in Fig. 7.1. In the quantum case, if initially the qubits are in a computational basis state, then the action of the CNOT is the same as classically. Since the NOT function is implemented by the Pauli X operator, the CNOT operation can equivalently be thought of as Ctrl-X, and we indicate the action of X explicitly in the circuit representation of the quantum CNOT gate shown in Fig. 7.5.

Figure 7.5: A quantum CNOT gate, with control qubit x (unchanged) and target qubit y (output $y \oplus x$). This representation makes clear that the NOT operation is performed by the Pauli X operator. If the initial state of the qubits (on the left) is a computational basis state, then the action of the quantum CNOT gate is the same as that of the classical CNOT shown in Fig. 7.1. The upper line represents the control qubit and the lower line the target qubit.

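The CNOT gate has the matrix representation
$$U_{\mathrm{CNOT}} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \qquad (7.14)$$
where the rows are labeled $\langle 00|, \langle 01|, \langle 10|, \langle 11|$ and the columns $|00\rangle, |01\rangle, |10\rangle, |11\rangle$, in that order. In these tensor-product labels the control qubit is the one to the left. The target qubit (to the right) is flipped if the control qubit is 1 (so, relative to the identity matrix, columns 3 and 4 are interchanged). We can also write $U_{\mathrm{CNOT}}$ in terms of 2 × 2 blocks as follows:
$$U_{\mathrm{CNOT}} = \begin{pmatrix} 1 & 0 \\ 0 & X \end{pmatrix}. \qquad (7.15)$$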
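The block form (7.15) and the explicit matrix (7.14) can be compared in a few lines of Python. This is a minimal sketch with helper names of our own (`block_diag`, `matvec`), assuming the basis state $|xy\rangle$ is stored at index $2x + y$, so that the control qubit x is the left, more significant, bit.

```python
def matvec(m, v):
    """Apply matrix m (a list of rows) to the column vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def block_diag(a, b):
    """4x4 matrix [[a, 0], [0, b]] built from the 2x2 blocks a and b."""
    return [row + [0, 0] for row in a] + [[0, 0] + row for row in b]


I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
U_CNOT = block_diag(I2, X)                      # Eq. (7.15)
assert U_CNOT == [[1, 0, 0, 0], [0, 1, 0, 0],
                  [0, 0, 0, 1], [0, 0, 1, 0]]   # Eq. (7.14)

# Basis |xy> has index 2*x + y: the control x is the left (more significant) bit.
for x in (0, 1):
    for y in (0, 1):
        ket = [0, 0, 0, 0]
        ket[2 * x + y] = 1
        out = matvec(U_CNOT, ket)
        assert out.index(1) == 2 * x + (x ^ y)  # |x>|x XOR y>, as classically
```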

The quantum aspect appears if we input (on the left) a linear combination of basis states. Suppose, for example, we set the target (lower) qubit to $|0\rangle$. Then if the control qubit is initially $|0\rangle$ the final state of the 2-qubit system is $|00\rangle$, because the target qubit is not flipped and stays as $|0\rangle$ (we take the control qubit to be the left one). If the control qubit is initially $|1\rangle$ then the final state of the 2-qubit system is $|11\rangle$, because the target qubit is flipped from $|0\rangle$ to $|1\rangle$. Hence, by linearity, if the initial state of the control qubit is the superposition $\alpha|0\rangle + \beta|1\rangle$, then the final state of the 2-qubit system is $\alpha|00\rangle + \beta|11\rangle$, see Fig. 7.6. Note that the CNOT gate has entangled the control and target qubits.

Figure 7.6: The action of the CNOT gate when the upper (control) qubit is initially in a superposition $\alpha|0\rangle + \beta|1\rangle$, and the lower (target) qubit is initially $|0\rangle$. By linearity, the final state is α times the result of inputting $|0\rangle$ in the control qubit plus β times the result of inputting $|1\rangle$, i.e. $\alpha|00\rangle + \beta|11\rangle$. We see that the final state is entangled.

Note that if α = 0 (so β = 1, since $|\alpha|^2 + |\beta|^2 = 1$) or α = 1 (β = 0), the final state is a clone of the initial state of the control qubit. However, for a general input state, the final state of the two qubits, $\alpha|00\rangle + \beta|11\rangle$, is not a clone of the initial state of the control qubit, which would be
$$\left(\alpha|0\rangle + \beta|1\rangle\right) \otimes \left(\alpha|0\rangle + \beta|1\rangle\right) = \alpha^2|00\rangle + \alpha\beta\left(|01\rangle + |10\rangle\right) + \beta^2|11\rangle.$$
Hence there is no violation of the no-cloning theorem, which states that a general, unknown quantum state cannot be cloned.
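The difference between $\alpha|00\rangle + \beta|11\rangle$ and a true clone can be seen numerically. In this Python sketch the helper names are ours, the basis order is $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ with the control on the left, and α = 0.6, β = 0.8 are illustrative values of our choosing.

```python
def matvec(m, v):
    """Apply matrix m (a list of rows) to the column vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def kron_vec(u, v):
    """Tensor product of two state vectors: component 2*i + j is u[i] * v[j]."""
    return [a * b for a in u for b in v]


U_CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]  # Eq. (7.14)
target = [1, 0]  # |0>

alpha, beta = 0.6, 0.8  # illustrative amplitudes (our choice)
control = [alpha, beta]
out = matvec(U_CNOT, kron_vec(control, target))
print(out)  # [0.6, 0.0, 0.0, 0.8], i.e. alpha|00> + beta|11>

clone = kron_vec(control, control)  # what a true clone would look like
assert any(abs(p - q) > 1e-9 for p, q in zip(out, clone))  # not a clone

# For a computational basis state the CNOT does copy the control bit.
for basis in ([1, 0], [0, 1]):
    assert matvec(U_CNOT, kron_vec(basis, target)) == kron_vec(basis, basis)
```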
In this course, we will specify the action of a gate by its action on an initial computational basis state. If we denote a qubit by a Latin letter, e.g. $|x\rangle$, we mean that this is a computational basis state and x takes values 0 or 1. General quantum states, i.e. superpositions of computational basis states, will be indicated by Greek letters, e.g. $|\psi\rangle$.
As already mentioned above, we do not need 3-qubit gates for quantum computing. More precisely,
the statement is that one can generate an arbitrary unitary transformation (to a specified level of
accuracy) on an arbitrary number of qubits, using only CNOT and single-qubit gates. I do not prove
this result but refer interested students to a more advanced text [NC00]. It is fortunate that we don’t
need 3-qubit gates given the difficulty of making quantum circuits.

Figure 7.7: A circuit in which a Hadamard acts on the control qubit, then a CNOT acts, then a second Hadamard acts on the control qubit; the successive states are labeled $|\psi_0\rangle$ to $|\psi_3\rangle$. The initial state of both qubits is $|0\rangle$. What is the final state $|\psi_3\rangle$? Equation (7.16) gives the state of the two qubits at each stage. The end result is that the two qubits are entangled and, in contrast to what one might have thought, the control (upper) qubit has a non-zero amplitude to be flipped relative to its initial state, i.e. to be in state $|1\rangle$.

It is useful to mention here that one has to be careful when dealing with superpositions, and one's initial intuition as to the final result may be incorrect. As an example, consider the circuit in Fig. 7.7.
Since $H^2 = 1$ and the CNOT gate doesn't change the control (upper) qubit, one might think that the final state of the control qubit would be the same as the initial state, i.e. $|0\rangle$. However, this is not correct because the control and target qubits become entangled. Let's go through each stage of the circuit using the notation for successive states indicated in Fig. 7.7, and taking the left-hand qubit in the formulae to be the control qubit:

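$$\begin{aligned}
|\psi_0\rangle &= |00\rangle \\
|\psi_1\rangle &= \frac{1}{\sqrt{2}}\left(|00\rangle + |10\rangle\right) \\
|\psi_2\rangle &= \frac{1}{\sqrt{2}}\left(|00\rangle + |11\rangle\right) \\
|\psi_3\rangle &= \frac{1}{2}\left(|00\rangle + |10\rangle + |01\rangle - |11\rangle\right) \\
&= \frac{1}{\sqrt{2}}\left[|0\rangle_c \otimes \frac{|0\rangle_t + |1\rangle_t}{\sqrt{2}} + |1\rangle_c \otimes \frac{|0\rangle_t - |1\rangle_t}{\sqrt{2}}\right],
\end{aligned} \qquad (7.16)$$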
where in the last expression we indicate explicitly which qubit is the control qubit ("c") and which the target qubit ("t"). We see that, contrary to what one might have initially guessed, there is an amplitude for the final state of the control qubit to be $|1\rangle$ because of its entanglement with the target qubit.
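Each stage of Eq. (7.16) can be reproduced with a small state-vector calculation. This Python sketch is ours: the helpers `matvec` and `kron` are hypothetical names, and we assume the basis order $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ with the control qubit on the left, so that a Hadamard on the control alone is $H \otimes 1$.

```python
import math


def matvec(m, v):
    """Apply matrix m (a list of rows) to the column vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def kron(a, b):
    """Kronecker (tensor) product of matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
I2 = [[1, 0], [0, 1]]
U_CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
H_control = kron(H, I2)  # Hadamard on the left (control) qubit only

psi0 = [1, 0, 0, 0]  # |00>, basis order |00>, |01>, |10>, |11>
psi1 = matvec(H_control, psi0)
psi2 = matvec(U_CNOT, psi1)
psi3 = matvec(H_control, psi2)

expected = [0.5, 0.5, 0.5, -0.5]  # Eq. (7.16): (|00> + |01> + |10> - |11>)/2
assert all(abs(p - q) < 1e-12 for p, q in zip(psi3, expected))

# Probability that the control qubit is found in |1>: amplitudes of |10> and |11>.
print(round(abs(psi3[2]) ** 2 + abs(psi3[3]) ** 2, 12))  # 0.5
```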
We have noted that the Pauli operators X, Y and Z, and the Hadamard operator, have eigenvalues ±1. Later in the course, when we consider the important topic of quantum error correction, we will encounter combinations of these operators on different qubits which also have ±1 eigenvalues. We will now describe a convenient way of measuring such operators. Let us denote the operator by U. It will have matrix elements given by
$$U = \begin{pmatrix} u_{00} & u_{01} \\ u_{10} & u_{11} \end{pmatrix} \qquad (7.17)$$
and eigenvalue +1 with eigenvector $|\psi_+\rangle$ and eigenvalue −1 with eigenvector $|\psi_-\rangle$, i.e.
$$U|\psi_+\rangle = |\psi_+\rangle, \qquad U|\psi_-\rangle = -|\psi_-\rangle. \qquad (7.18)$$
We would like to investigate the qubit (or qubits) to determine which eigenstate of U it is in or, if it is in a linear superposition, to project it by measurement onto one of the eigenstates and know which one.
Figure 7.8: A circuit with a control-U gate in which the control (upper) qubit, initially $|0\rangle$, is surrounded by Hadamards, and the target (lower) qubit is initially $|\psi\rangle$. U is an operator with eigenvalues ±1 and corresponding eigenvectors $|\psi_+\rangle$ and $|\psi_-\rangle$. As shown in the text, if a measurement of the upper qubit gives $|0\rangle$ then the lower qubit will be in state $|\psi_+\rangle$, and if the measurement gives $|1\rangle$ then the lower qubit will be in state $|\psi_-\rangle$. The states $|\phi_i\rangle$ (i = 0, 1, 2, 3) are described in the text.

A convenient way is to use the circuit shown in Fig. 7.8, which has a control-U gate.¹ The matrix representation of control-U is
$$\text{control-}U = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & u_{00} & u_{01} \\ 0 & 0 & u_{10} & u_{11} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & U \end{pmatrix}, \qquad (7.19)$$
with rows and columns labeled by $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ in that order, and where the last expression is written in terms of 2 × 2 blocks. If the control qubit is 1 then U acts on the target qubit according to Eq. (7.18), while if the control qubit is 0 then the target qubit is unchanged.
The lower (target) qubit is initially in state $|\psi\rangle$, which can be written as a linear combination of the two eigenvectors
$$|\psi\rangle = \alpha_+|\psi_+\rangle + \alpha_-|\psi_-\rangle, \qquad (7.20)$$
and so, including the upper (control) qubit which is initially in state $|0\rangle$, the initial state of the circuit (on the left of Fig. 7.8) is
$$|\phi_0\rangle = \alpha_+|0\,\psi_+\rangle + \alpha_-|0\,\psi_-\rangle. \qquad (7.21a)$$
In labeling the states, we put the state of the control qubit to the left and that of the target qubit to the right. After the first Hadamard on the upper qubit the state is
$$|\phi_1\rangle = \frac{\alpha_+}{\sqrt{2}}\left(|0\,\psi_+\rangle + |1\,\psi_+\rangle\right) + \frac{\alpha_-}{\sqrt{2}}\left(|0\,\psi_-\rangle + |1\,\psi_-\rangle\right). \qquad (7.21b)$$

The effect of the control-U gate on the target qubit is given by Eq. (7.18) when the control qubit is 1, and it has no effect if the control qubit is 0. Hence, after the control-U gate, the state is
$$|\phi_2\rangle = \frac{\alpha_+}{\sqrt{2}}\left(|0\,\psi_+\rangle + |1\,\psi_+\rangle\right) + \frac{\alpha_-}{\sqrt{2}}\left(|0\,\psi_-\rangle - |1\,\psi_-\rangle\right). \qquad (7.21c)$$
Applying the right-hand Hadamard in Fig. 7.8 to the upper (control) qubit we get
$$|\phi_3\rangle = \alpha_+|0\,\psi_+\rangle + \alpha_-|1\,\psi_-\rangle. \qquad (7.21d)$$

Hence if a measurement of the upper qubit gives $|0\rangle$ (which it does with probability $|\alpha_+|^2$) the lower qubit will be in state $|\psi_+\rangle$, and if the measurement gives $|1\rangle$ (with probability $|\alpha_-|^2$) the lower qubit will be in state $|\psi_-\rangle$. We see that measuring the control qubit projects the target qubit onto an eigenstate of U and tells us which one.
Note that we use the control qubit as an ancilla, and by measuring it we can determine which eigenstate of U the target qubit is left in. Directly measuring the qubit of interest does not achieve this, for the following reason. To determine which eigenstate of U the qubit is in, we would first apply a unitary transformation that takes the eigenstates of U to computational basis states, say $|\psi_+\rangle \to |0\rangle$ and $|\psi_-\rangle \to |1\rangle$ (for U = X this is the Hadamard, as discussed above), and then perform a measurement. If the qubit is measured to be in state $|0\rangle$ then, before the transformation, the qubit was in state $|\psi_+\rangle$, and if it is measured to be in state $|1\rangle$ then it was in state $|\psi_-\rangle$. However, we want the qubit to be left in the eigenstate of U, whereas the measurement leaves it in a computational basis state (i.e. a Z-basis state). In order to leave the qubit in an eigenstate of U, and know which one it is, we cannot measure the qubit itself but, fortunately, as we just saw, we can do this by coupling the qubit to an ancilla and measuring the ancilla.
We will return to the circuit in Fig. 7.8 in Chapter 19 when we discuss quantum error correction.

¹ Apart from the absence of the final measurement gate, Fig. 7.7 is a special case of Fig. 7.8 with U = X and $|\psi\rangle = |0\rangle$.

Problems
7.1. Show that the n-qubit Hadamard gate acts as
$$H^{\otimes n}|x\rangle_n = \frac{1}{\sqrt{2^n}}\sum_{y=0}^{2^n-1}(-1)^{x\cdot y}|y\rangle_n, \qquad (7.22)$$
where $x \cdot y$ is the bitwise inner product of x and y with modulo 2 addition:
$$x \cdot y = x_0y_0 \oplus x_1y_1 \oplus \cdots \oplus x_{n-1}y_{n-1}. \qquad (7.23)$$

7.2. Show the following identities:
$$HXH = Z, \qquad HZH = X, \qquad HYH = -Y,$$
where H is the Hadamard matrix.
7.3. The "SWAP" gate S interchanges the two inputs. It is defined by
$$S|xy\rangle = |yx\rangle. \qquad (7.24)$$
(i) Give the matrix representing this gate.
(ii) Show that it is equivalent to three CNOT gates as
$$S_{12} = C_{12}\,C_{21}\,C_{12}, \qquad (7.25)$$
where, in $C_{ij}$, i refers to the control bit and j to the target bit.
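7.4. Verify the following circuit identities:
(a) [Circuit figure: a left-hand and a right-hand circuit on qubits $|x\rangle$ and $|y\rangle$, built from a control-X gate and single-qubit X gates, with intermediate state $|\psi_1\rangle$ and final state $|\psi_f\rangle$.]
(b) [Circuit figure: a left-hand and a right-hand circuit on qubits $|x\rangle$ and $|y\rangle$, built from a control-X gate and single-qubit Z gates, with intermediate state $|\psi_1\rangle$ and final state $|\psi_f\rangle$.]
Note: Control-X is another way of writing the CNOT gate.
Hint: Consider arbitrary initial computational basis states $|x\rangle$ and $|y\rangle$, and determine, for all four figures, the intermediate states $|\psi_1\rangle$ and the final states $|\psi_f\rangle$. Show that the final state $|\psi_f\rangle$ is the same for the left-hand and the corresponding right-hand figures.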

7.5. Consider a CNOT gate in which the target qubit is $|0\rangle$. Show that it clones the control qubit if the control qubit is a computational basis state, $|x\rangle$, where x = 0 or 1, but does not clone it if the control qubit is a linear superposition of computational basis states.
Note: This is in agreement with the no-cloning theorem, which states that one cannot clone an arbitrary unknown quantum state.

7.6. The notion of a controlled (i.e. conditional) gate can be generalized to an arbitrary single-qubit operation U as follows:
$$U_{CU}|x\rangle|y\rangle = |x\rangle\,U^x|y\rangle, \qquad (7.26)$$
where x and y are 0 or 1. Here $|x\rangle$ is the control qubit, and $|y\rangle$ is the target qubit. If x = 0 then U does not act because $U^0 = 1$, whereas U does act on the target qubit if x = 1. The matrix representation of this gate is
$$U_{CU} = \begin{pmatrix} 1 & 0 \\ 0 & U \end{pmatrix}, \qquad (7.27)$$
where 1 and U represent 2 × 2 blocks. The circuit diagram is as follows:
[Circuit figure: the control line carries $|x\rangle$ unchanged; the target line carries $|y\rangle$ through a U box controlled by x, giving $U^x|y\rangle$.]
In most of the examples that we will discuss, it turns out that $U^2 = 1$ and so, as shown earlier, the eigenvalues are ±1. The operator U in the circuit below has this property.

[Circuit figure: the upper (control) qubit starts in $|0\rangle$, passes through a Hadamard, controls U, passes through a second Hadamard and is measured, giving 0 or 1; the lower (target) qubit starts in $|\psi_i\rangle$ and ends in $|\psi_f\rangle$.]
Now we add a measurement of the control qubit as shown in the figure above. The box with an arrow indicates a measurement. The double line to the right indicates that the result of the measurement is a classical bit, 0 or 1.
Show that if the measurement of the upper (control) qubit finds $|0\rangle$ then the lower (target) qubit ends up in a state $|\psi_f\rangle$ which is the eigenstate of U with eigenvalue +1, whereas if the measurement of the upper (control) qubit finds $|1\rangle$ then the lower (target) qubit ends up in the eigenstate of U with eigenvalue −1.
Note: We say that this circuit measures the operator U. It will play an important role when we study quantum error correction.
Chapter 8

Generating and measuring Bell States

Entangled states play an important role in quantum computing. The most-studied entangled states are
so-called Bell states which involve two qubits. They are named in honor of the physicist John Bell who
clarified the Einstein-Podolsky-Rosen (EPR) paradox, discussed in Chapter 6, and whose inequalities
demonstrated that the description of nature provided by quantum mechanics is fundamentally different
from the classical description. The Bell states are defined by
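$$|\beta_{00}\rangle = \frac{1}{\sqrt{2}}\left(|00\rangle + |11\rangle\right), \qquad (8.1a)$$
$$|\beta_{01}\rangle = \frac{1}{\sqrt{2}}\left(|01\rangle + |10\rangle\right), \qquad (8.1b)$$
$$|\beta_{10}\rangle = \frac{1}{\sqrt{2}}\left(|00\rangle - |11\rangle\right), \qquad (8.1c)$$
$$|\beta_{11}\rangle = \frac{1}{\sqrt{2}}\left(|01\rangle - |10\rangle\right). \qquad (8.1d)$$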
These four equations can be combined as follows:
$$|\beta_{xy}\rangle = \frac{1}{\sqrt{2}}\left(|0y\rangle + (-1)^x|1\bar{y}\rangle\right), \qquad (8.2)$$
where $\bar{y}$ is the complement of y, i.e. $\bar{y} = 1 - y$. Note that the Bell states form a basis for two qubits, as do the computational basis states $|x\rangle_2$.

Figure 8.1: Circuit to create the Bell states defined by Eqs. (8.1): a Hadamard on the upper qubit followed by a CNOT, with output $|\beta_{xy}\rangle$. In the CNOT (Ctrl-X) gate, the upper qubit $|x\rangle$ is the control qubit and the lower qubit $|y\rangle$ is the target qubit.

The Bell states are clearly entangled. They can be created out of two (unentangled) qubits in computational basis states $|xy\rangle$ by the circuit shown in Fig. 8.1. To see this, note that, according to Eq. (7.12), after the Hadamard the state is
$$|xy\rangle \to \frac{1}{\sqrt{2}}\left(|0y\rangle + (-1)^x|1y\rangle\right). \qquad (8.3)$$

The effect of the CNOT gate is to flip y in the second term (since the control qubit is 1 there), turning $|1y\rangle$ into $|1\bar{y}\rangle$, and so we get Eq. (8.2).
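As a check on Eqs. (8.1) to (8.3), the following Python sketch runs the circuit of Fig. 8.1 on all four inputs $|xy\rangle$. The helper names are ours, and we assume the basis order $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ with the upper qubit on the left; since the Hadamard acts first, the operator product is CNOT applied after $H \otimes 1$.

```python
import math


def matvec(m, v):
    """Apply matrix m (a list of rows) to the column vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def kron(a, b):
    """Kronecker (tensor) product of matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
I2 = [[1, 0], [0, 1]]
U_CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

# Eqs. (8.1a)-(8.1d), amplitudes in the order |00>, |01>, |10>, |11>.
bell = {
    (0, 0): [s, 0, 0, s],
    (0, 1): [0, s, s, 0],
    (1, 0): [s, 0, 0, -s],
    (1, 1): [0, s, -s, 0],
}

for (x, y), expected in bell.items():
    ket = [0, 0, 0, 0]
    ket[2 * x + y] = 1                                # input |xy>, upper qubit x on the left
    out = matvec(U_CNOT, matvec(kron(H, I2), ket))    # Hadamard first, then CNOT
    assert all(abs(p - q) < 1e-12 for p, q in zip(out, expected)), (x, y)
    # Eq. (8.2): amplitude 1/sqrt(2) on |0 y> and (-1)^x/sqrt(2) on |1 ybar>.
    ybar = 1 - y
    assert abs(out[y] - s) < 1e-12 and abs(out[2 + ybar] - (-1) ** x * s) < 1e-12
```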

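Figure 8.2: Circuit for Bell measurements: the input $|\beta_{xy}\rangle$ passes through a CNOT followed by a Hadamard on the upper qubit, giving outputs x and y. This will be used later in the course when we discuss teleportation.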

The circuit in Fig. 8.1 converts the computational basis to the Bell basis. The reverse of this circuit can be used to convert the Bell basis back to the computational basis, as shown in Fig. 8.2. The measured values of x and y tell us which Bell state we started with. This is called a Bell measurement.
To see that this works, note that after the CNOT gate the state of the two qubits in Fig. 8.2 is¹
$$\frac{1}{\sqrt{2}}\left[|0y\rangle + (-1)^x|1y\rangle\right], \qquad (8.4)$$
which is separable and so can be written as
$$\frac{1}{\sqrt{2}}\left[|0\rangle + (-1)^x|1\rangle\right] \otimes |y\rangle. \qquad (8.5)$$
Recall that the left-hand qubit is the upper (control) qubit in Fig. 8.2 and the right-hand qubit is the lower (target) qubit. Acting with the Hadamard has the effect
$$H\,\frac{1}{\sqrt{2}}\left[|0\rangle + (-1)^x|1\rangle\right] = |x\rangle, \qquad (8.6)$$
so the final state in Fig. 8.2 is $|xy\rangle$, as desired.
Note that the Bell states $|\beta_{xy}\rangle$ provide a basis for two qubits, see Appendix 4.A in Chapter 4, since they are normalized and mutually orthogonal. Consequently, if the state input into the Bell measurement circuit in Fig. 8.2 is not a single Bell state, but rather a linear combination,
$$|\psi_{in}\rangle = \sum_{x,y=0}^{1}\alpha_{xy}|\beta_{xy}\rangle, \qquad (8.7)$$
with $\sum_{x,y}|\alpha_{xy}|^2 = 1$, then the probability that the measurements obtain a particular set of values for x and y is $|\alpha_{xy}|^2$.
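A short Python sketch of the Bell measurement acting on a superposition of Bell states. The coefficients $\alpha_{xy}$ below are an illustrative choice of ours, the helper names are ours, and the basis order puts the upper qubit on the left. Because the CNOT acts first in Fig. 8.2, the operator product is $(H \otimes 1)$ applied after CNOT, and the output amplitude on $|xy\rangle$ comes out equal to $\alpha_{xy}$.

```python
import math


def matvec(m, v):
    """Apply matrix m (a list of rows) to the column vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def kron(a, b):
    """Kronecker (tensor) product of matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
I2 = [[1, 0], [0, 1]]
U_CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
bell = {(0, 0): [s, 0, 0, s], (0, 1): [0, s, s, 0],
        (1, 0): [s, 0, 0, -s], (1, 1): [0, s, -s, 0]}  # Eqs. (8.1)

# Illustrative coefficients alpha_xy (our choice), with sum of |alpha_xy|^2 equal to 1.
alphas = {(0, 0): 0.5, (0, 1): 0.5j, (1, 0): -0.5, (1, 1): 0.5}
psi_in = [sum(alphas[k] * bell[k][i] for k in bell) for i in range(4)]  # Eq. (8.7)

# Fig. 8.2: the CNOT acts first, then the Hadamard on the upper qubit.
out = matvec(kron(H, I2), matvec(U_CNOT, psi_in))

for (x, y), a in alphas.items():
    assert abs(out[2 * x + y] - a) < 1e-12  # |beta_xy> has been mapped to |xy>
    print((x, y), round(abs(out[2 * x + y]) ** 2, 6))  # probability |alpha_xy|^2 = 0.25
```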

¹ The reason that $\bar{y}$ in the Bell state, Eq. (8.2), changes to y in the second term in Eq. (8.4) is that the control qubit is 1 in that term, and so the y (target) qubit is flipped.
Chapter 9

Quantum Functions

9.1 An elementary quantum function

In computation we need to evaluate functions. How can we do this in a quantum computer, where functions are determined by unitary transformations, which are reversible?
Let us first consider the simplest case, where the argument of the function, x, is a single bit, and the result of the function, f(x), is also a single bit. In other words, x takes only the values 0 and 1, and the same for f(x). We need to have a qubit for x and an additional qubit¹ which contains information on the function f(x).
The function f(x) will be implemented by a unitary operator $U_f$ acting on two qubits such that
$$U_f|x\rangle|y\rangle = |x\rangle|f(x) \oplus y\rangle. \qquad (9.1)$$
Note the similarity with the CNOT gate, which is precisely of this form with f(x) = x. It is easy to see that $U_f^2 = 1$ since
$$U_f^2|x\rangle|y\rangle = U_f|x\rangle|f(x) \oplus y\rangle = |x\rangle|f(x) \oplus f(x) \oplus y\rangle = |x\rangle|y\rangle, \qquad (9.2)$$
because, as discussed earlier in the course, $f(x) \oplus f(x) = 0$. Hence $U_f$ has an inverse, which is $U_f$ itself. The corresponding circuit diagram is shown in Fig. 9.1.
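Equations (9.1) and (9.2) can be made concrete by writing $U_f$ as a 4 × 4 permutation matrix. This Python sketch is ours (the helper `u_f_matrix` and the names of the four functions are hypothetical), assuming the basis state $|x\rangle|y\rangle$ is stored at index $2x + y$. It checks that $U_f^2 = 1$ for every 1-bit function and that f(x) = x reproduces the CNOT matrix of Eq. (7.14).

```python
from itertools import product


def u_f_matrix(f):
    """4x4 matrix of U_f|x>|y> = |x>|f(x) XOR y>, Eq. (9.1); basis index 2*x + y."""
    u = [[0] * 4 for _ in range(4)]
    for x, y in product((0, 1), repeat=2):
        u[2 * x + (f(x) ^ y)][2 * x + y] = 1  # column = input, row = output
    return u


def matmul(a, b):
    """Matrix product a @ b for matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


I4 = [[int(i == j) for j in range(4)] for i in range(4)]
# The four possible 1-bit functions (names are ours).
functions = {
    "constant 0": lambda x: 0,
    "identity": lambda x: x,
    "negation": lambda x: 1 - x,
    "constant 1": lambda x: 1,
}
for name, f in functions.items():
    u = u_f_matrix(f)
    assert matmul(u, u) == I4, name  # Eq. (9.2): U_f is its own inverse

# With f(x) = x, U_f is exactly the CNOT matrix of Eq. (7.14).
assert u_f_matrix(lambda x: x) == [[1, 0, 0, 0], [0, 1, 0, 0],
                                   [0, 0, 0, 1], [0, 0, 1, 0]]
```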

Figure 9.1: Schematic diagram of a unitary transformation $U_f$ for a function f(x) in which both the argument x and the function just take two values, 0 and 1. The inputs are x and y; the outputs are x and $y \oplus f(x)$.

For a general function, the range of inputs can be represented by n bits, say, and the range of outputs by m bits. Thus we need a total of n + m qubits both in the initial state and final state. The unitary transformation is
$$U_f|x\rangle_n|y\rangle_m = |x\rangle_n|f(x) \oplus y\rangle_m, \qquad (9.3)$$
where the modulo 2 addition, indicated by ⊕, applies separately to each of the m bits of f(x) and y. The proof that $U_f$ is its own inverse is the same as that in Eq. (9.2). The circuit diagram corresponding to Eq. (9.3) is shown in Fig. 9.2.

¹ We need to have two qubits in both the initial and final states in order that the function is reversible, just as we needed two qubits in the final state as well as the initial state to make the CNOT gate, which is a reversible generalization of the XOR gate; see Chapter 7.

Figure 9.2: Schematic diagram of a general unitary transformation $U_f$ for an n-bit input x and an m-bit output f(x). The upper register in the figure has n qubits and contains the input value x. The lower register has m qubits (input y, output $y \oplus f(x)$) and, in the final state on the right, contains information about the function value f(x). The registers are shown as single lines. To ensure the transformation is reversible there are n + m qubits in both the initial state (to the left) and final state (to the right).

One sometimes calls the upper register in Fig. 9.2 the "input" register, because it contains the input, x, and the lower register the "output" register, because it contains information on the function f(x). However, since both registers are present in the initial state (on the left) and the final state (on the right), this terminology can be confusing.
Note that if y = 0 the lower register contains precisely the function value f(x).

9.2 Quantum Parallelism

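Things get interesting if we feed in a superposition. We can generate a uniform superposition by acting with Hadamards on $|0\rangle_n$. Note that for one qubit
$$H|0\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + |1\rangle\right), \qquad (9.4)$$
and similarly, applying a Hadamard to each of two qubits,
$$\begin{aligned}
H|0\rangle \otimes H|0\rangle &= \frac{1}{2}\left(|0\rangle + |1\rangle\right) \otimes \left(|0\rangle + |1\rangle\right) \\
&= \frac{1}{2}\left(|00\rangle + |01\rangle + |10\rangle + |11\rangle\right) \\
&= \frac{1}{2}\left(|0\rangle_2 + |1\rangle_2 + |2\rangle_2 + |3\rangle_2\right) = \frac{1}{2}\sum_{x=0}^{3}|x\rangle_2.
\end{aligned} \qquad (9.5)$$
Generalizing, we have
$$H^{\otimes n}|0\rangle_n = \frac{1}{2^{n/2}}\sum_{x=0}^{2^n-1}|x\rangle_n. \qquad (9.6)$$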
Now let's consider the circuit shown in Fig. 9.3. The initial state is
$$|\phi_0\rangle = |0\rangle_n|0\rangle_m, \qquad (9.7)$$
so the state fed into the unitary operator $U_f$ is the superposition
$$|\phi_1\rangle = \frac{1}{2^{n/2}}\sum_{x=0}^{2^n-1}|x\rangle_n|0\rangle_m. \qquad (9.8)$$
Noting that the lower register is initialized to $|0\rangle_m$, by linearity, according to Eq. (9.3), the final state must be
$$|\phi_2\rangle = \frac{1}{2^{n/2}}\sum_{x=0}^{2^n-1}|x\rangle_n|f(x)\rangle_m. \qquad (9.9)$$
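The chain (9.6) to (9.9) can be followed with a small state-vector simulation. In this Python sketch the helpers `kron_vec` and `apply_u_f` are ours, the basis index of $|x\rangle_n|y\rangle_m$ is taken to be $x \cdot 2^m + y$, and the function $f(x) = x^2 \bmod 4$ from 3 bits to 2 bits is a hypothetical example chosen only for illustration.

```python
import math


def kron_vec(u, v):
    """Tensor product of state vectors: component i*len(v) + j is u[i] * v[j]."""
    return [a * b for a in u for b in v]


def apply_u_f(f, n, m, state):
    """Apply U_f|x>_n|y>_m = |x>_n|f(x) XOR y>_m, Eq. (9.3), to a vector of length 2**(n+m).

    The basis index is x * 2**m + y, with the n-qubit register x on the left."""
    out = [0.0] * 2 ** (n + m)
    for x in range(2 ** n):
        for y in range(2 ** m):
            out[x * 2 ** m + (f(x) ^ y)] += state[x * 2 ** m + y]
    return out


def f(x):
    """Hypothetical example function from 3 bits to 2 bits."""
    return (x * x) % 4


s = 1 / math.sqrt(2)
n, m = 3, 2

# Eq. (9.6): a Hadamard on each of n qubits in |0> gives amplitude 2**(-n/2) on every |x>_n.
upper = [1.0]
for _ in range(n):
    upper = kron_vec(upper, [s, s])
assert all(abs(a - 2 ** (-n / 2)) < 1e-12 for a in upper)

lower = [1.0] + [0.0] * (2 ** m - 1)  # |0>_m
phi1 = kron_vec(upper, lower)         # Eq. (9.8)
phi2 = apply_u_f(f, n, m, phi1)       # Eq. (9.9)

# Every |x>_n|f(x)>_m carries amplitude 2**(-n/2); all other amplitudes vanish.
for x in range(2 ** n):
    for y in range(2 ** m):
        expected = 2 ** (-n / 2) if y == f(x) else 0.0
        assert abs(phi2[x * 2 ** m + y] - expected) < 1e-12
```

A single measurement of this state still returns only one pair x, f(x), each with probability $1/2^n$, which is the catch discussed next.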

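Figure 9.3: The upper register, initially $|0\rangle_n$, passes through $H^{\otimes n}$ and then, together with the lower register, initially $|0\rangle_m$, through $U_f$; the successive states are $|\phi_0\rangle$, $|\phi_1\rangle$, $|\phi_2\rangle$. Because of the Hadamards, the input to $U_f$ is now the uniform superposition of all computational basis states in Eq. (9.6). The output from $U_f$ is given by Eq. (9.9).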

This is an astonishing result. The final state contains the function values for all $2^n$ possible values of the input x. They have been evaluated in parallel, a feature of quantum mechanics called, naturally enough, "quantum parallelism". For n = 100 we have $2^{100} \simeq 10^{30}$ function evaluations in parallel.
A speedup of $10^{30}$ seems too good to be true and, unfortunately, it is. What's the catch? The catch is that the only way one can access the information contained in the state is to do a measurement of the lower register. This does not give $10^{30}$ results but just one result, the value of f(x) for a single value of x. The probabilities of the different results are the squares of the amplitudes (which are all equal here, so there is a probability $1/2^n$ of getting the value of f(x) for each of the $2^n$ possible values of x). So it seems that we have achieved nothing. We have found the value of the function for one value of its argument, which we could have got much more easily on a classical computer. However, for some problems, one can gain enough useful information to get a "quantum speedup" by doing clever pre-processing before the measurement, in order to reduce the number of possible measurement outcomes (sometimes to just one). How to achieve this in practice will occupy us for most of the rest of the course.
Philosophers, and some physicists, debate whether one can really state that all $2^n$ values of the function have been evaluated, since one cannot observe them. Most physicists would argue that the only "real" quantities are those that can be observed and, in particular, that the quantum mechanical state itself is not real. Rather, it is a device from which one can compute the results of measurements. From this point of view, it is not valid to claim that all $2^n$ values of the function have actually been evaluated.
Now we have done enough preliminaries to study our first quantum algorithm! This will be described in the next chapter.
Chapter 10

Deutsch’s Algorithm

10.1 Introduction
We now turn to our first algorithm, due to David Deutsch [Deu85], which is generally felt to have started the field of quantum computing.
As we shall see, the problem is trivial. It concerns functions which take a 1-qubit argument and give a 1-qubit output. The problem is clearly contrived and is of no practical interest. However, it does show a quantum speedup, and this arises from the same features of quantum circuits, namely quantum parallelism and interference, that are used in more sophisticated and useful quantum algorithms such as that of Shor.
Since the input takes one of two values, 0 and 1, as does the output, there are only four distinct
functions as shown in the table.

$$\begin{array}{c|cc} & x = 0 & x = 1 \\ \hline f_1 & 0 & 0 \\ f_2 & 0 & 1 \\ f_3 & 1 & 0 \\ f_4 & 1 & 1 \end{array}$$

Table 10.1: The four functions which have a 1-qubit input and a 1-qubit output.

You see that $f_1$ and $f_4$ give the same result for each input: they are constant. On the other hand, $f_2$ and $f_3$ give different results for the two inputs. This is analogous to a coin toss. The two values of x correspond to the two physical sides of the coin, the upper and the lower sides. The function values correspond to what is represented on those sides, heads or tails. If the two sides of the coin give different results (one heads and the other tails), corresponding to a non-constant function, the coin is honest. From now on we shall use the term "balanced", rather than "non-constant", to indicate a function which gives different results for x = 0 and x = 1. However, if the two sides of the coin give the same result (both heads or both tails) the coin is dishonest, corresponding to a constant function.
We are given a “black box”¹ function f(x) and we want to learn about it. Of course we could just feed in x = 0 and x = 1 and observe the results. Suppose, however, we only want to know whether the function is constant (satisfied by f1 and f4) or balanced (satisfied by f2 and f3). On a classical computer the only thing to do is to evaluate the function for both values of x and compare them, i.e. we need to make two calls to the function. However, we shall see that we can answer this question on a quantum computer with only one call to the function. We get less information than classically, because we don’t determine the individual values of f(0) and f(1), but we do determine whether or not f is constant. Hence Deutsch’s problem may be thought of as determining whether a coin is honest or not with just one toss of the coin.

¹ The term “black box” implies that the only information we can get about the function is by evaluating it for different inputs. We can’t open up the box to see what is inside. A black box function is often called an “oracle”.
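A minimal Python sketch of the classical side of Deutsch’s problem, assuming the four functions of Table 10.1 are modelled as lambdas; the names `FUNCTIONS` and `classify_classically` are illustrative, not from the text. The query counter makes explicit that the classical check described above needs two calls to the black box.

```python
# Illustrative model (not from the text): the four functions of Table 10.1.
FUNCTIONS = {
    "f1": lambda x: 0,      # constant
    "f2": lambda x: x,      # balanced
    "f3": lambda x: 1 - x,  # balanced
    "f4": lambda x: 1,      # constant
}


def classify_classically(f):
    """Return (verdict, calls) where verdict is 'constant' or 'balanced'."""
    calls = 0

    def query(x):
        nonlocal calls
        calls += 1  # count every evaluation of the black box
        return f(x)

    # Classically both outputs must be seen before they can be compared.
    f0 = query(0)
    f1 = query(1)
    return ("constant" if f0 == f1 else "balanced"), calls


for name, f in FUNCTIONS.items():
    print(name, *classify_classically(f))
```

Each printed line reports two calls, for example `f2 balanced 2`.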
As we discussed in Chapter 9, a quantum function f is implemented by a unitary operator Uf as
shown in Fig. 10.1.

(Circuit: the upper wire carries x into Uf and out unchanged as x; the lower wire carries y in and y ⊕ f(x) out.)

Figure 10.1: The black-box routine Uf for a function f(x) which takes a 1-qubit input x and computes a 1-qubit function f(x). Here x and y are computational basis states |0⟩ or |1⟩. However, to gain a quantum speedup, we will input superpositions, generated by Hadamard gates, as shown in Fig. 10.2. We obtain the result of inputting a superposition from the results of inputting computational basis states by using linearity. Recall that time runs from left to right in circuit diagrams.

In order to take advantage of quantum parallelism we insert Hadamard gates before the black box function Uf on both the upper (input) and lower (output) qubits, and to take advantage of quantum interference of the results we also put Hadamards on both qubits after Uf has acted², see Fig. 10.2. We initialize the upper qubit to |0⟩ and the lower qubit to |1⟩. The upper qubit could be initialized to either |0⟩ or |1⟩, but, as we shall see, it is essential to initialize the lower qubit to |1⟩.

(Circuit: the upper wire starts in |0⟩ and passes through H, Uf and H before being measured, giving x; the lower wire starts in |1⟩, passes through H, Uf and H, and ends in |1⟩. The states |ψ0⟩, |ψ1⟩ and |ψ2⟩ are marked after the first Hadamards, after Uf, and after the second Hadamards respectively.)

Figure 10.2: Circuit for Deutsch’s algorithm. The initial state (on left) has |0⟩ in the upper (input) qubit and |1⟩ in the lower (output) qubit. Hadamard gates are applied to both qubits both before and after the function Uf (which we assume to be an unknown black box). In the final state the lower qubit is unchanged at |1⟩. A measurement is made of the final value (on right) of the upper qubit. If this is unchanged, i.e. x = 0 in this case, then the function is constant, while if the upper qubit has flipped, then the function is balanced. One could equivalently start with the upper qubit as |1⟩ and reach the same conclusion: namely, if the upper qubit is unchanged the function is constant, whereas if it has flipped the function is balanced. However, it is essential to start the lower qubit in state |1⟩ for the algorithm to work.

² This is actually an improved version of Deutsch’s original algorithm. The improved version works every time, whereas the original version only worked half the time.

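Recalling that

$$ H|0\rangle = \frac{1}{\sqrt{2}}\big(|0\rangle + |1\rangle\big), \qquad H|1\rangle = \frac{1}{\sqrt{2}}\big(|0\rangle - |1\rangle\big), \tag{10.1} $$

we find that after the first Hadamards the state in Fig. 10.2 is

$$ |\psi_0\rangle = \frac{1}{2}\big(|0\rangle_u + |1\rangle_u\big) \otimes \big(|0\rangle_l - |1\rangle_l\big) = \frac{1}{2}|0\rangle_u \otimes \big(|0\rangle_l - |1\rangle_l\big) + \frac{1}{2}|1\rangle_u \otimes \big(|0\rangle_l - |1\rangle_l\big), \tag{10.2} $$

where, in the tensor product, the upper qubit (labeled “u”) is to the left and the lower qubit (labeled “l”) is to the right.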
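The function Uf is then applied. Recall from Fig. 10.1 that if the state of the upper qubit is x, then the final state of the lower qubit is f(x) if its initial state is zero, and the complement of f(x), i.e. 1 ⊕ f(x), if its initial state is one:

$$ |x\rangle|y\rangle \to |x\rangle|y \oplus f(x)\rangle, \tag{10.3} $$

so

$$
\begin{aligned}
|0\rangle|0\rangle &\to |0\rangle|f(0)\rangle, \\
|0\rangle|1\rangle &\to |0\rangle|\overline{f(0)}\rangle, \\
|1\rangle|0\rangle &\to |1\rangle|f(1)\rangle, \\
|1\rangle|1\rangle &\to |1\rangle|\overline{f(1)}\rangle,
\end{aligned}
$$

where the overbar denotes the complement, \overline{f(x)} = 1 ⊕ f(x).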

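Hence, after Uf has been applied, the state is

$$ |\psi_1\rangle = \frac{1}{2}|0\rangle_u \otimes \big(|f(0)\rangle_l - |\overline{f(0)}\rangle_l\big) + \frac{1}{2}|1\rangle_u \otimes \big(|f(1)\rangle_l - |\overline{f(1)}\rangle_l\big). \tag{10.4} $$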
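It is helpful to note that

$$ |f(x)\rangle_l - |\overline{f(x)}\rangle_l = \begin{cases} |0\rangle_l - |1\rangle_l & \text{if } f(x) = 0,\\ |1\rangle_l - |0\rangle_l & \text{if } f(x) = 1, \end{cases} \;=\; (-1)^{f(x)}\big(|0\rangle_l - |1\rangle_l\big). \tag{10.5} $$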

Hence the value of f(x), whether 0 or 1, just changes the overall sign of the lower-qubit state. To get this effect it was necessary to prepare the lower qubit in state |1⟩ rather than |0⟩. Vathsan [Vat16] calls Eq. (10.5) “phase kickback”. Consequently we can write |ψ1⟩ as

$$ |\psi_1\rangle = \frac{(-1)^{f(0)}|0\rangle_u + (-1)^{f(1)}|1\rangle_u}{\sqrt{2}} \otimes \frac{|0\rangle_l - |1\rangle_l}{\sqrt{2}}. \tag{10.6} $$
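A quick numerical check of the phase kickback in Eq. (10.5), as a sketch under stated assumptions: the lower qubit is stored as a pair (amplitude of |0⟩, amplitude of |1⟩), and the helper `apply_oracle_to_lower` is hypothetical; it only applies y → y ⊕ f(x) for a fixed value of f(x).

```python
import math


# Hypothetical helper: the lower qubit is a pair (amplitude of |0>, amplitude of |1>).
def apply_oracle_to_lower(state, fx):
    """Apply |y> -> |y XOR f(x)> to the lower qubit for a fixed value fx = f(x)."""
    amp0, amp1 = state
    return (amp1, amp0) if fx == 1 else (amp0, amp1)


s = 1 / math.sqrt(2)
minus = (s, -s)  # (|0> - |1>)/sqrt(2), the lower qubit after its first Hadamard

for fx in (0, 1):
    out = apply_oracle_to_lower(minus, fx)
    sign = (-1) ** fx
    # Eq. (10.5): flipping (|0> - |1>) only multiplies it by (-1)^f(x).
    assert out == (sign * minus[0], sign * minus[1])
    print(f"f(x) = {fx}: output = {sign:+d} * input")
```

It prints `f(x) = 0: output = +1 * input` and `f(x) = 1: output = -1 * input`: the bit flip shows up only as a sign.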

Now we run both qubits through Hadamards (those to the right of Uf in Fig. 10.2). It is easy to see that the action on the lower qubit (the right-hand one in the tensor product) is to convert (|0⟩_l − |1⟩_l)/√2 back to |1⟩_l. The action of H on the upper qubit is to give

$$ \frac{1}{2}\Big[(-1)^{f(0)}\big(|0\rangle_u + |1\rangle_u\big) + (-1)^{f(1)}\big(|0\rangle_u - |1\rangle_u\big)\Big], \tag{10.7} $$

which can be written as

$$ \frac{1}{2}|0\rangle_u\Big[(-1)^{f(0)} + (-1)^{f(1)}\Big] + \frac{1}{2}|1\rangle_u\Big[(-1)^{f(0)} - (-1)^{f(1)}\Big]. \tag{10.8} $$

Clearly this is ±|0⟩_u if f(0) = f(1) (where the plus sign is for f(0) = f(1) = 0 and the minus sign for f(0) = f(1) = 1), and is ±|1⟩_u if f(0) ≠ f(1) (where the sign depends on whether f(0) = 1, f(1) = 0 or vice versa). Hence the state to the right of the Hadamards in Fig. 10.2 is

$$ |\psi_2\rangle = \begin{cases} \pm|0\rangle_u \otimes |1\rangle_l & \text{if } f(1) = f(0),\\ \pm|1\rangle_u \otimes |1\rangle_l & \text{if } f(1) \neq f(0). \end{cases} \tag{10.9} $$

Consequently, if a measurement of the upper qubit in Fig. 10.2 (left in the tensor product) finds that it is unchanged³ from its value in the initial state then f(0) = f(1), whereas if it is flipped then f(0) ≠ f(1). We do this with one call to the function, so we have achieved a “quantum speedup” of 2, which is admittedly not spectacular, but it is interesting that we get any speedup at all. We will get more impressive speedups in later algorithms.
If we could measure the sign of the state we could determine the values of f(0) and f(1) separately, but the overall sign of the state (more generally its global phase) has no measurable effect and cannot be determined.
A crucial role has been played by the Hadamards. Those which act before Uf is called generate a superposition state with both inputs x = 0 and 1 present. Looking at Eq. (10.4) it “seems” that Uf has computed f(x) for both values of x with just one call to it. This is “quantum parallelism”. If we do a measurement directly after the application of Uf we only get one value. However, for certain problems like this one, if we do some additional post-processing (in this case acting with Hadamards again), we can use “quantum interference” between the different pieces in the superposition to set to zero the probability of getting certain results (in this case all possible results bar one are suppressed). Consequently it is possible to get useful information (in this case whether the function is constant or not) when the measurement is subsequently done.
Note that the Deutsch algorithm is not probabilistic: it succeeds with probability 1. This shows that quantum algorithms don’t necessarily have to be probabilistic (though many are). In this case, quantum interference transforms the state to be measured into a computational basis state, i.e. an eigenstate of the measured observable. As we know, if we measure an eigenstate we always get the same answer (the eigenvalue) and there is no uncertainty.
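The whole circuit of Fig. 10.2 is small enough to simulate directly. The sketch below is a plain state-vector simulation written for illustration (the helpers `apply_h`, `apply_uf` and `deutsch` are hypothetical names, and the basis ordering index = 2u + l is an assumption matching the convention that the upper qubit sits on the left of the tensor product). It makes exactly one call to f and reports the probability of finding the upper qubit in |1⟩.

```python
import math

S = 1 / math.sqrt(2)


def apply_h(state, qubit):
    """Hadamard on one qubit of a 2-qubit state [a00, a01, a10, a11].

    Index = 2*u + l, so qubit 0 is the upper (u) and qubit 1 the lower (l) one."""
    bit = 2 if qubit == 0 else 1           # weight of this qubit in the index
    new = [0.0] * 4
    for i, amp in enumerate(state):
        i0, i1 = i & ~bit, i | bit         # same index with this qubit = 0 / = 1
        new[i0] += S * amp                 # H|0> and H|1> both contain +|0>/sqrt(2)
        new[i1] += -S * amp if i & bit else S * amp   # ... and -|1> only for H|1>
    return new


def apply_uf(state, f):
    """U_f |x>|y> = |x>|y XOR f(x)>, Eq. (10.3)."""
    new = [0.0] * 4
    for i, amp in enumerate(state):
        x, y = i >> 1, i & 1
        new[2 * x + (y ^ f(x))] += amp
    return new


def deutsch(f):
    """Run the circuit of Fig. 10.2; return (verdict, P(upper qubit = 1))."""
    state = [0.0, 1.0, 0.0, 0.0]           # |0>_u |1>_l
    for q in (0, 1):
        state = apply_h(state, q)
    state = apply_uf(state, f)             # the single call to f
    for q in (0, 1):
        state = apply_h(state, q)
    p_upper_1 = state[2] ** 2 + state[3] ** 2
    return ("balanced" if p_upper_1 > 0.5 else "constant"), p_upper_1


FUNCTIONS = {"f1": lambda x: 0, "f2": lambda x: x,
             "f3": lambda x: 1 - x, "f4": lambda x: 1}
for name, f in FUNCTIONS.items():
    verdict, p = deutsch(f)
    print(f"{name}: {verdict}, P(upper = 1) = {p:.3f}")
```

The probabilities come out as 0 or 1 up to rounding, which is the deterministic behaviour described above.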
Appendices

10.A An alternative derivation

This appendix is based on Mermin [Mer07].
To familiarize ourselves with quantum circuits we will obtain Eq. (10.9) in a different way by explicitly writing down circuits for the four functions f1 to f4, see Fig. 2.1 of Mermin [Mer07]. Noting that the function flips the lower (output) qubit if the result of the function is 1 but leaves it alone if the function gives 0, we can represent the four functions in Table 10.1 by the circuits shown in Fig. 10.3. Explanations of why each circuit is equivalent to the corresponding function are given in the figure caption. We sandwich each of these functions between Hadamards to carry out the Deutsch algorithm, as shown in Fig. 10.2, and prepare the qubits in the initial state |x⟩ ⊗ |1⟩. The results are shown in Fig. 10.4.
We now explain each of the diagrams in this figure.
• f1:
This follows simply because Uf1 makes no change, see Fig. 10.3, and H² = 1 (the identity), see Fig. 10.5(a) in Appendix 10.B, so the final qubits are the same as the initial qubits, |x⟩ ⊗ |1⟩, see Fig. 10.4. In particular x is unchanged, indicating, correctly, that the function is constant.
³ It does not matter whether we initialize the upper qubit to be 0 or 1, the conclusion is the same. Namely, if the upper qubit is unchanged, then the function is constant, whereas if it is flipped the function is balanced.

(Circuits, each with the input x on the upper wire and y on the lower wire, and the values f(0) f(1) listed at the right. Uf1 (0 0): no gates; outputs x and y. Uf2 (0 1): a CNOT with control x and target y; outputs x, and y for x = 0 or ȳ for x = 1. Uf3 (1 0): an X gate on the lower wire together with a CNOT with control x; outputs x, and ȳ for x = 0 or y for x = 1. Uf4 (1 1): an X gate on the lower wire; outputs x and ȳ.)

Figure 10.3: Circuit diagrams for each of the four functions f1, ..., f4 in Table 10.1. As seen in Fig. 10.1, the function flips the lower (output) qubit if the result of the function is 1 but leaves it alone if the function gives 0. If f(x) = 0, y is unchanged, but if f(x) = 1 then y is flipped and becomes ȳ, the complement. Note that x is always unchanged. For example, with f1 (top diagram), nothing happens. For f4, y is always flipped, which is done with the X gate on the lower qubit. For f2, y is only flipped if x = 1, which is done by the CNOT gate as shown. For f3, y is only flipped if x = 0, which is accomplished by the extra X gate on the y-qubit.

• f2:
The function Uf2 has a CNOT gate in which the upper qubit is the control and the lower qubit is the target, see Fig. 10.3. The result of sandwiching a CNOT between Hadamards is, perhaps surprisingly, to interchange the roles of the target and control qubits. This is shown in Appendix 10.B, see Fig. 10.5(f). Hence we see that x is flipped because the lower qubit, which now acts as the control, is set to |1⟩, see Fig. 10.4. This is correct because the function is balanced.

• f3:
The circuit for Uf3 is shown in Fig. 10.3. Noting that H² = 1, one can insert two Hadamards between the two X gates on the lower qubit in the circuit for Uf3 in Fig. 10.3. As we noted for f2, the effect of putting Hadamards on either side of the CNOT gate is to interchange the roles of the target and control qubits. In addition, we have HXH = Z, see Fig. 10.5(b) in Appendix 10.B. Hence x is flipped and there is a sign change, see Fig. 10.4. We can’t measure the sign change, but the fact that x is flipped correctly indicates that the function is balanced.

• f4:
The function Uf4 has an X gate on the lower qubit, see Fig. 10.3, and again we have HXH = Z. Since Z|1⟩ = −|1⟩, x remains unchanged and there is a sign change, see Fig. 10.4. Again we cannot measure the sign change, and the fact that x is not flipped indicates correctly that the function is constant.

(Circuits: for each function the upper wire |x⟩ and the lower wire |1⟩ pass through H, then Uf, then H; the equivalent circuit is shown at the right, with f(0) f(1) listed. f1 (0 0): x → x and |1⟩ → |1⟩. f2 (0 1): an X acts on the upper wire, x → x̄, and |1⟩ → |1⟩. f3 (1 0): an X acts on the upper wire, x → x̄, and a Z acts on the lower wire, |1⟩ → −|1⟩. f4 (1 1): x → x, and a Z acts on the lower wire, |1⟩ → −|1⟩.)

Figure 10.4: The circuits for the four functions f1, ..., f4 given in Fig. 10.3 when sandwiched between Hadamards in order to perform the Deutsch algorithm. The upper qubit is initialized in either of the computational basis states, |x⟩ with x = 0 or 1, while the lower qubit is initialized to |1⟩. The derivations of the equivalent circuits shown are given in the text. One sees that the upper qubit is flipped for those functions which are balanced, and is not flipped for the constant functions.

10.B Derivation of some useful identities in quantum circuits

We have

$$ X = \begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}, \qquad Z = \begin{pmatrix} 1 & 0\\ 0 & -1 \end{pmatrix}, \qquad H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1\\ 1 & -1 \end{pmatrix}. \tag{10.10} $$

By direct calculation it is easy to see that X² = 1, Z² = 1, and

$$ H^2 = 1, \tag{10.11} $$

where 1 is the identity

$$ 1 = \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix}. \tag{10.12} $$

Equation (10.11) is represented graphically by Fig. 10.5(a). Also by direct calculation, we have XH = HZ. Hence multiplying on the left by H gives

$$ HXH = Z, \tag{10.13} $$

see Fig. 10.5(b) for a graphical illustration, and multiplying on the right by H gives

$$ HZH = X, \tag{10.14} $$

which is illustrated graphically in Fig. 10.5(c).
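The "direct calculations" above can be checked numerically. This is a small sketch using plain nested lists rather than any matrix library (the helpers `matmul` and `close` are illustrative names, and equality is tested up to a floating-point tolerance because H contains 1/√2).

```python
import math

s = 1 / math.sqrt(2)
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
H = [[s, s], [s, -s]]
I = [[1, 0], [0, 1]]


def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def close(A, B, tol=1e-12):
    """Entrywise comparison with a tolerance for rounding errors."""
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


checks = {
    "X^2 = 1": close(matmul(X, X), I),
    "Z^2 = 1": close(matmul(Z, Z), I),
    "H^2 = 1  (10.11)": close(matmul(H, H), I),
    "XH = HZ": close(matmul(X, H), matmul(H, Z)),
    "HXH = Z  (10.13)": close(matmul(matmul(H, X), H), Z),
    "HZH = X  (10.14)": close(matmul(matmul(H, Z), H), X),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```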

The NOT part of the CNOT gate is performed by the X operator. Hence we represent the CNOT gate as a control-X gate as in Fig. 10.5(d). We will also meet the control-Z gate, in which the target qubit is acted upon by Z if the control qubit is 1, and otherwise the target qubit is unchanged. As with the control-X gate, there is no change in the control qubit. With a bit of thought, we see that the only effect of the control-Z gate is to change the sign of the basis state |11⟩, i.e. the state in which both the target and control qubits are one. Thus the distinction between target and control is non-existent, so the control and target qubits can be interchanged in a control-Z gate, see Fig. 10.5(e).

(Panels: (a) H followed by H on one wire equals a bare wire, i.e. the identity. (b) H X H = Z. (c) H Z H = X. (d) Representation of a CNOT gate as a control-X gate. (e) A control-Z gate with the control on the upper qubit equals one with the control on the lower qubit. (f) A CNOT, with control upper and target lower, with Hadamards on both qubits before and after, equals (i) a control-Z with the same control and target and with Hadamards before and after on the upper qubit only, which equals (ii) the same circuit with the control-Z reversed, so that the upper qubit sees H Z H, which equals (iii) a CNOT with the lower qubit as control and the upper qubit as target.)

Figure 10.5: Some useful identities in quantum circuits. Of particular note is identity (f), which shows that putting Hadamards around a CNOT gate is equivalent to a CNOT gate without Hadamards, but with the control and target qubits interchanged.
Now consider a CNOT (control-X) gate sandwiched between Hadamards as shown in Fig. 10.5(f). Consider the target (lower) qubit. If the control qubit does not act on it, the target qubit is just acted on by the two Hadamards, which is equivalent to the identity, see Fig. 10.5(a). If the control qubit does act on the target qubit, the target qubit is acted on by the succession of gates HXH, which is equivalent to Z, see Fig. 10.5(b). Both these possibilities are taken care of by the equivalent circuit in Fig. 10.5(f)(i), which is a control-Z gate. As illustrated in Fig. 10.5(e), the target and control qubits in a control-Z gate can be interchanged, so Fig. 10.5(f)(i) is equivalent to Fig. 10.5(f)(ii). Now the target qubit is the upper one, and has the sequence of gates H, Ctrl-Z, H acting on it. By the same argument that showed Fig. 10.5(f) is equivalent to Fig. 10.5(f)(i), this is equivalent to Ctrl-X because of the identities in Fig. 10.5(a) and Fig. 10.5(c). Hence Fig. 10.5(f) is equivalent to Fig. 10.5(f)(iii). So we see that a CNOT surrounded by Hadamards is equivalent to a CNOT gate without Hadamards but with the control and target qubits interchanged, a quite surprising result.
One could also derive this result by multiplying 4 × 4 matrices, which is more tedious. However, for completeness we will do it here. The CNOT gate has the matrix representation

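$$ \begin{matrix} |00\rangle & |01\rangle & |10\rangle & |11\rangle \end{matrix} \tag{10.15} $$

$$ U_{\rm CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0 \end{pmatrix}. \tag{10.16} $$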

In this tensor product the control qubit is to the left. The target qubit (to the right) is flipped if the
control qubit (to the left) is 1 (so, relative to the identity matrix, columns 3 and 4 are interchanged).
In a CNOT gate with target and control qubits swapped, the left hand qubit is flipped if the right
hand qubit is 1 (so columns 2 and 4 are interchanged). Hence we have

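$$ \begin{matrix} |00\rangle & |01\rangle & |10\rangle & |11\rangle \end{matrix} \tag{10.17} $$

$$ U_{\rm CNOT\,SWAP} = \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0\\ 0 & 1 & 0 & 0 \end{pmatrix}. \tag{10.18} $$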

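The tensor product H^⊗2 is given by

$$ H^{\otimes 2} = \frac{1}{\sqrt{2}}\begin{pmatrix} H & H\\ H & -H \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & 1 & 1 & 1\\ 1 & -1 & 1 & -1\\ 1 & 1 & -1 & -1\\ 1 & -1 & -1 & 1 \end{pmatrix}. \tag{10.19} $$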

One can check by working out the matrix multiplication that

$$ U_{\rm CNOT\,SWAP} = H^{\otimes 2}\, U_{\rm CNOT}\, H^{\otimes 2}, \tag{10.20} $$

in agreement with Fig. 10.5(f). This is a bit tedious, so I used Mathematica. It is more straightforward to use the circuit identities shown in Fig. 10.5.
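For readers without Mathematica, the same multiplication can be sketched in a few lines of plain Python (an illustrative alternative, not the author's calculation). The matrices are those of Eqs. (10.16), (10.18) and (10.19) with basis order |00⟩, |01⟩, |10⟩, |11⟩; the sketch also checks the intermediate step of Fig. 10.5(f)(i), that Hadamards on the target alone turn a CNOT into a control-Z.

```python
import math


def kron(A, B):
    """Tensor (Kronecker) product of two matrices stored as nested lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


s = 1 / math.sqrt(2)
I2 = [[1, 0], [0, 1]]
H = [[s, s], [s, -s]]
H2 = kron(H, H)                                   # Eq. (10.19)

# Basis order |00>, |01>, |10>, |11>; the left qubit is the control of CNOT.
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]       # Eq. (10.16)
CNOT_SWAP = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]  # Eq. (10.18)
CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]        # symmetric in the qubits

# Eq. (10.20): Hadamards on both qubits turn CNOT into the swapped CNOT.
eq_10_20 = close(matmul(matmul(H2, CNOT), H2), CNOT_SWAP)

# Fig. 10.5(f)(i): Hadamards on the target (right) qubit only give control-Z.
IH = kron(I2, H)
fig_f_i = close(matmul(matmul(IH, CNOT), IH), CZ)

print("Eq. (10.20):", eq_10_20, " Fig. 10.5(f)(i):", fig_f_i)
```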
Chapter 11

The Bernstein-Vazirani Algorithm

11.1 The Algorithm

Like the Deutsch algorithm, the Bernstein-Vazirani algorithm finds information about a black box function, but it has a bigger speedup. It is very similar to the Deutsch-Jozsa algorithm, which is set as a homework problem.
Consider a function

$$ f(x) = a\cdot x, \tag{11.1} $$

where a and x have n bits while the function itself, f, has one bit. The dot indicates a bitwise inner product with modulo 2 addition:

$$ a\cdot x \equiv a_0x_0 \oplus a_1x_1 \oplus \cdots \oplus a_{n-1}x_{n-1}. \tag{11.2} $$

The problem is to determine a.

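Let’s make sure that we understand the “dot”. Each product a_i x_i is 0 or 1. Hence

$$ a\cdot x = a_0x_0 \oplus a_1x_1 \oplus \cdots \oplus a_{n-1}x_{n-1} \tag{11.3} $$

$$ a\cdot x = \begin{cases} 1 & \text{if an odd number of terms is 1},\\ 0 & \text{if an even number of terms is 1}. \end{cases} \tag{11.4} $$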

For example, for n = 4, if the bits of a are 1101 and the bits of x are 1110 (recall that the zeroth bit is the least significant, i.e. the rightmost one), then¹

$$ a\cdot x = (1\times 0) + (0\times 1) + (1\times 1) + (1\times 1) \bmod 2 = 0 + 0 + 1 + 1 \bmod 2 = 2 \bmod 2 = 0. \tag{11.5} $$

Hence, for these values of a and x, f(x) = 0. If we take x = 1000 then f(x) = 0 + 0 + 0 + 1 mod 2 = 1.
Classically we can only determine the bits of a one at a time. The k-th bit of a can be determined by feeding in x = 2^k. To see this, consider the binary representations of a and x:

$$
\begin{aligned}
a &= a_0 + a_1 2^1 + \cdots + a_k 2^k + \cdots + a_{n-1}2^{n-1},\\
x &= x_0 + x_1 2^1 + \cdots + x_k 2^k + \cdots + x_{n-1}2^{n-1}.
\end{aligned} \tag{11.6}
$$

Hence if x = 2^k then x_k = 1 while, for l ≠ k, x_l = 0, so a·x = a_k. Consequently f(2^k) = a_k. We have to do this for each bit, k = 0, 1, 2, ..., n − 1, so it requires n calls of the function.

¹ One can either do the mod 2 operation after each addition or add up in the normal way and apply the mod 2 operation at the end. In either case, the result is 0 if an even number of terms in the sum are 1, and 1 if an odd number of terms are 1.
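The bitwise dot product and the classical one-bit-per-call strategy can be sketched in Python, assuming n-bit integers stand for the bit strings (bit k of the integer is a_k, so 0b1101 is the a of the worked example). The names `dot` and `classical_find_a` are illustrative.

```python
def dot(a, x):
    """Bitwise inner product a.x mod 2, Eq. (11.2): parity of the common 1 bits."""
    return bin(a & x).count("1") % 2


def classical_find_a(f, n):
    """Recover a from f(x) = a.x with n calls, feeding in x = 2**k (Eq. 11.6)."""
    a = 0
    for k in range(n):
        a |= f(1 << k) << k   # f(2^k) = a_k
    return a


a = 0b1101
print(dot(a, 0b1110))                                # 0, the example of Eq. (11.5)
print(dot(a, 0b1000))                                # 1
print(bin(classical_find_a(lambda x: dot(a, x), 4)))  # 0b1101, after 4 calls
```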

We will see that the quantum algorithm succeeds in determining a with just one call!
A schematic diagram of a general reversible unitary transformation which takes an n-bit input x in the upper register and generates an m-bit output f(x) in the lower register is shown in Fig. 9.2. For the Bernstein-Vazirani algorithm there are n qubits in the upper register but only 1 qubit in the lower register. In addition, the unitary Uf is surrounded by Hadamards, as shown in Fig. 11.1. The upper register is set to |0⟩_n and the lower qubit to |1⟩. This is the same circuit as for the Deutsch-Jozsa algorithm, see Problem 11.1.

(Circuit: the n-qubit upper register starts in |0⟩_n and passes through H^⊗n, Uf and H^⊗n, ending in |a⟩_n; the lower qubit starts in |1⟩ and passes through H, Uf and H, ending in |1⟩.)

Figure 11.1: Circuit diagram for the Bernstein-Vazirani algorithm. In the final state the upper (input) register contains |a⟩ while the lower (output) qubit reverts to its initial state |1⟩. The desired value of a can therefore be read off by measuring the upper register.

Acting with H on |0⟩ gives an equal linear superposition of the two basis states. Similarly, acting with H^⊗n on |0⟩_n gives an equal superposition of the 2^n basis states. Hence, including the lower register, the state input to Uf is

$$ H^{\otimes n}|0\rangle_n \otimes H|1\rangle = \frac{1}{\sqrt{2^n}}\sum_{x=0}^{2^n-1}|x\rangle_n \otimes \frac{|0\rangle - |1\rangle}{\sqrt{2}}. \tag{11.7} $$

For each term in the superposition, the function Uf acts in the same way as for the Deutsch algorithm described in Chapter 10. The lower qubit is flipped if f(x) = 1 which, since the lower qubit is in the state (|0⟩ − |1⟩)/√2, is the same as changing the sign of the state. If f(x) = 0 there is no change. Hence each term in the superposition acquires a factor of (−1)^{f(x)}, so the state of the system immediately after the action of Uf is

$$ \frac{1}{\sqrt{2^n}}\sum_{x=0}^{2^n-1}(-1)^{f(x)}|x\rangle_n \otimes \frac{|0\rangle - |1\rangle}{\sqrt{2}}. \tag{11.8} $$

Next consider the effect of the Hadamards acting after Uf. The action on the lower qubit is to convert (|0⟩ − |1⟩)/√2 to |1⟩. However, the effect of H^⊗n acting on an arbitrary computational basis state |x⟩_n needs more thought. Consider first just one qubit. Then

$$ H|x\rangle = \frac{1}{\sqrt{2}}\big(|0\rangle + (-1)^x|1\rangle\big) = \frac{1}{\sqrt{2}}\sum_{y=0}^{1}(-1)^{xy}|y\rangle. \tag{11.9} $$
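Hence the effect of applying H^⊗n on an n-qubit computational basis state is

$$ H^{\otimes n}|x\rangle_n = \bigotimes_{j=0}^{n-1}\left(\frac{1}{\sqrt{2}}\sum_{y_j=0}^{1}(-1)^{x_jy_j}|y_j\rangle\right) = \frac{1}{2^{n/2}}\sum_{y=0}^{2^n-1}(-1)^{\sum_{j=0}^{n-1}x_jy_j}|y\rangle_n = \frac{1}{2^{n/2}}\sum_{y=0}^{2^n-1}(-1)^{x\cdot y}|y\rangle_n, \tag{11.10} $$

where x·y is the bitwise inner product with modulo 2 addition defined in Eq. (11.2), and we have used the fact that we only need to know whether Σ_{j=0}^{n−1} x_j y_j is even or odd. All the amplitudes, c_y ≡ (1/2^{n/2})(−1)^{x·y}, are equal in magnitude, and the sign is +1 if Σ_{j=0}^{n−1} x_j y_j is even (note that each term in the sum is 1 or 0) and −1 if Σ_{j=0}^{n−1} x_j y_j is odd.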
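Eq. (11.10) can be checked against a direct state-vector calculation. The sketch below assumes basis index x encodes the bits x_j as the binary digits of x, and applies the 2 × 2 Hadamard to each qubit in turn; the helper names `dot` and `hadamard_all` are illustrative.

```python
import math


def dot(x, y):
    """Bitwise inner product x.y mod 2, Eq. (11.2)."""
    return bin(x & y).count("1") % 2


def hadamard_all(state):
    """Apply H to every qubit of an n-qubit state vector of length 2**n."""
    state = list(state)
    s = 1 / math.sqrt(2)
    bit = 1
    while bit < len(state):          # one pass per qubit
        for i in range(len(state)):
            if not i & bit:          # visit each pair (i, i | bit) once
                a0, a1 = state[i], state[i | bit]
                state[i], state[i | bit] = s * (a0 + a1), s * (a0 - a1)
        bit <<= 1
    return state


n = 3
for x in range(2 ** n):
    basis = [1.0 if i == x else 0.0 for i in range(2 ** n)]
    out = hadamard_all(basis)
    # Eq. (11.10): the amplitude of |y> is (-1)^(x.y) / 2^(n/2).
    expected = [(-1) ** dot(x, y) / 2 ** (n / 2) for y in range(2 ** n)]
    assert all(abs(o - e) < 1e-12 for o, e in zip(out, expected))
print("Eq. (11.10) agrees with direct computation for n =", n)
```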
Hence, combining Eqs. (11.8) and (11.10), the amplitude to find the upper register in state |y⟩_n ≡ |y_{n−1}⟩ ··· |y_1⟩|y_0⟩ is

$$ c_y = \frac{1}{2^n}\sum_{x=0}^{2^n-1}(-1)^{f(x) + x\cdot y} = \frac{1}{2^n}\prod_{j=0}^{n-1}\left(\sum_{x_j=0}^{1}(-1)^{(a_j+y_j)x_j}\right). \tag{11.11} $$

Let us evaluate this for the state where y_j = a_j for all j, in which case a_j + y_j = 2 or 0. If x_j = 0 then (−1)^{(a_j+y_j)x_j} = 1, and if x_j = 1 we also get (−1)^{(a_j+y_j)x_j} = 1, so Σ_{x_j=0}^{1} (−1)^{(a_j+y_j)x_j} = 2, i.e. the two terms add up in phase. Hence, from Eq. (11.11), we have c_a = 2^n/2^n = 1. Since the total probability must add up to 1, this means that all the other amplitudes must be zero. To see that this is indeed the case, note that for each qubit where y_j ≠ a_j, a_j + y_j = 1, and so the sum over x_j for that qubit gives 1 + (−1) = 0. The final result in Eq. (11.11) is a product over terms for each qubit, and so we get zero, as required.
Including the lower (output) qubit, the final state is

$$ |a\rangle_n \otimes |1\rangle, \tag{11.12} $$

and a measurement of the upper register in Fig. 11.1 gives a with probability one, even though we made just one call to the function.
Since a classical computation of a requires n function calls, we have obtained a “quantum speedup” of n. Note that the procedure is analogous to Deutsch’s algorithm. The first set of Hadamards generates a superposition of inputs to the gate Uf, which “evaluates”² the function for all 2^n inputs using quantum parallelism, and then the second set of Hadamards destroys all the outputs apart from a, using quantum interference.
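A compact simulation of Fig. 11.1, offered as a sketch with one simplification stated up front: only the upper register is stored, and the effect of the single call to Uf is replaced by the phase (−1)^{a·x} on each |x⟩, which is what Eq. (11.8) says happens while the lower qubit sits in (|0⟩ − |1⟩)/√2. The function names are illustrative.

```python
import math


def dot(a, x):
    """Bitwise inner product a.x mod 2, Eq. (11.2)."""
    return bin(a & x).count("1") % 2


def hadamard_all(state):
    """Apply H to every qubit of an n-qubit state vector of length 2**n."""
    state = list(state)
    s = 1 / math.sqrt(2)
    bit = 1
    while bit < len(state):
        for i in range(len(state)):
            if not i & bit:
                a0, a1 = state[i], state[i | bit]
                state[i], state[i | bit] = s * (a0 + a1), s * (a0 - a1)
        bit <<= 1
    return state


def bernstein_vazirani(a, n):
    """Simulate the upper register of Fig. 11.1 for f(x) = a.x.

    Simplification (phase kickback, Eq. 11.8): the lower qubit is not stored;
    the single oracle call just multiplies each |x> by (-1)**f(x)."""
    state = [1.0] + [0.0] * (2 ** n - 1)                            # |0>_n
    state = hadamard_all(state)                                     # Eq. (11.7)
    state = [(-1) ** dot(a, x) * amp for x, amp in enumerate(state)]  # one call
    state = hadamard_all(state)                                     # second Hadamards
    probs = [amp * amp for amp in state]
    outcome = max(range(2 ** n), key=lambda y: probs[y])
    return outcome, probs


outcome, probs = bernstein_vazirani(0b11010, 5)
print(format(outcome, "05b"), probs[outcome])
```

With a = 11010 it should report the outcome 11010 with probability 1 up to rounding, and every other outcome with probability essentially zero.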

11.2 An Alternative Derivation

Following Mermin [Mer07] and Vathsan [Vat16], it is useful to give an alternative derivation of how the circuit in Fig. 11.1 works, by giving an explicit construction of the black box Uf. It is convenient to illustrate by a specific example. We take n = 5 and a = 11010, so a0 = 0, a1 = 1, a2 = 0, a3 = 1, a4 = 1 (recall we read the bits from right, the least significant, to left, the most significant). The function a·x can be implemented by the gates shown in Fig. 11.2.

(Circuit for a = 11010: the input wires x4, x3, x2, x1, x0 pass through unchanged; three CNOT gates, with controls on x4, x3 and x1, each act with X on the output wire, which goes from y to y ⊕ a·x.)

Figure 11.2: A circuit diagram for n = 5 to implement the function f(x) = a·x with a = 11010, i.e. f(x) = x1 + x3 + x4 mod 2. The circuit flips the output qubit, the lowest one, initialized to y, whenever x1 ⊕ x3 ⊕ x4 = 1. (Note that flipping y is equivalent to adding 1 to y mod 2.) Hence the final value of the output qubit is y ⊕ (a·x) = y ⊕ x1 ⊕ x3 ⊕ x4, as required.

To incorporate Uf into the Bernstein-Vazirani algorithm, we sandwich it between Hadamards, see Fig. 11.1, and note that the Hadamards interchange control and target qubits in the CNOT (control-X) gates, see Fig. 10.5(f) in Chapter 10. As before, the initial upper register is |0⟩_n and the lower register is |1⟩. We see immediately from Fig. 11.3 that a is directly imprinted in the final state of the input register. There does not appear to be any parallelism or interference.

(Circuit: every input qubit x4, ..., x0 starts in |0⟩ and the output qubit y in |1⟩, with Hadamards on all six qubits before and after the three CNOTs of Fig. 11.2. In the equivalent circuit the output qubit, in state |1⟩, is the control and applies X to x4, x3 and x1, so the input register ends in x4x3x2x1x0 = 11010 while the output qubit remains |1⟩.)

Figure 11.3: Sandwiching the circuit for Uf in Fig. 11.2 between Hadamards, and realizing that the effect of the Hadamards is to interchange the control and target qubits in the CNOT (control-X) gates, we see immediately that the final state of the upper (input) register contains a = 11010.

Hence these two explanations of the Bernstein-Vazirani algorithm are quite different. To quote Mermin [Mer07]:
“The first applies Uf to the quantum superposition of all possible inputs and then applies operations which leads to perfect destructive interference of all states in the superposition except for the one in which the upper (input) register is in the state |a⟩. The second suggests a specific mechanism for representing the subroutine that executes Uf and then shows that sandwiching such a mechanism between Hadamards automatically (my italics) imprints a on the upper register. Interestingly, quantum mechanics appears in the second method only because it allows the reversal of the control and target qubits of a cNOT operation solely by means of 1-qubit (Hadamard) gates.”

(I have used the conventional spelling of “qubit” rather than Mermin’s idiosyncratic “Qbit”.)

² To understand the reason for the quotation marks see the discussion at the end of Sec. 9.2.
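Because every state in the Mermin picture is a computational basis state, both circuits can be sketched with ordinary bits. The functions below are illustrative, not from the text: `uf_circuit` is the CNOT construction of Fig. 11.2, and `reversed_circuit` takes the identity of Fig. 10.5(f) for granted, so each CNOT is replaced by one with control and target interchanged, acting on inputs 0...0 and output 1 as in Fig. 11.3.

```python
def uf_circuit(a, n, x, y):
    """Fig. 11.2: one CNOT from each input bit x_j with a_j = 1 onto the output y."""
    for j in range(n):
        if (a >> j) & 1:
            y ^= (x >> j) & 1          # CNOT: control x_j, target y
    return x, y                        # x is untouched


def reversed_circuit(a, n):
    """Fig. 11.3 after using Fig. 10.5(f): the same CNOTs with control and
    target interchanged, acting on inputs |0...0> and output |1>."""
    x, y = 0, 1
    for j in range(n):
        if (a >> j) & 1:
            x ^= y << j                # reversed CNOT: control y, target x_j
    return x, y


a, n = 0b11010, 5
# The forward circuit really computes y XOR a.x for every input.
assert all(uf_circuit(a, n, x, y)[1] == y ^ (bin(a & x).count("1") % 2)
           for x in range(2 ** n) for y in (0, 1))
print(reversed_circuit(a, n))   # (26, 1): the input register holds a = 0b11010
```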

Problems
11.1. The Deutsch-Jozsa Algorithm
This is an extension of the Deutsch algorithm discussed in class. Recall that in Deutsch’s algorithm the input is one bit and the output is also one bit. In the Deutsch-Jozsa algorithm, the output is still one bit but the input has n bits, so there are 2^n distinct inputs. We are told that either the function is “constant” (in which case the function outputs the same value for all 2^n inputs) or is “balanced” (in which case an equal number of inputs give the results 1 and 0).

Clearly this is a very artificially constructed problem, but it will be our first quantum algorithm with more than a one-bit input. Note that it is precisely the Deutsch algorithm for n = 1.
The circuit for the Deutsch-Jozsa algorithm is almost identical to that for the Deutsch algorithm, except that the upper qubit in the Deutsch algorithm (sometimes called the “input” qubit) is replaced by an n-qubit register. The circuit is shown in the figure below.

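(Circuit: the n-qubit upper register starts in |0⟩_n and passes through H^⊗n, Uf and H^⊗n before being measured, giving y; the lower qubit starts in |1⟩ and passes through H, Uf and H, ending in |1⟩. The states |ψ0⟩, |ψ1⟩ and |ψ2⟩ are marked after the first Hadamards, after Uf, and after the second Hadamards respectively.)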

The function Uf acts as follows on computational basis states |x⟩_n and |z⟩:

$$ U_f|x\rangle_n|z\rangle = |x\rangle_n|z \oplus f(x)\rangle, \tag{11.13} $$

where x is an n-bit integer, |x⟩ is the state of the n-qubit upper register in the figure, z and f(x) are 1-bit integers, and |z⟩ is the lower qubit in the figure.
As in the Deutsch algorithm, the lower qubit is initialized to |1⟩. In the Deutsch algorithm, the upper qubit is initialized to |0⟩. Here the single qubit is replaced by an n-qubit register which, by analogy, is initialized to |0⟩_n.

(i) Show that

$$ |\psi_0\rangle_n = H^{\otimes n}|0\rangle_n = \frac{1}{\sqrt{2^n}}\sum_{x=0}^{2^n-1}|x\rangle_n, \tag{11.14} $$

so the input to the function Uf is the uniform superposition of all 2^n basis states.
(ii) Show that after the action of Uf the state of the upper register is

$$ |\psi_1\rangle_n = \frac{1}{\sqrt{2^n}}\sum_{x=0}^{2^n-1}(-1)^{f(x)}|x\rangle_n. \tag{11.15} $$

The fact that the value of f(x) only changes the sign of the corresponding term is called the phase kickback trick by Vathsan.
(iii) Show that after the action of the second set of Hadamards on the n-qubit register, the state of that register is

$$ |\psi_2\rangle_n = H^{\otimes n}|\psi_1\rangle_n = \frac{1}{2^n}\sum_{x,y=0}^{2^n-1}(-1)^{[f(x) + x\cdot y]}|y\rangle_n, \tag{11.16} $$

where x·y is the bitwise inner product of x and y with modulo 2 addition:

$$ x\cdot y = x_0y_0 \oplus x_1y_1 \oplus \cdots \oplus x_{n-1}y_{n-1}. \tag{11.17} $$

(iv) The upper register is then measured, and an n-bit integer y is obtained. Show that if the function is constant then y = 0 with probability 1. Show also that if the function is balanced then one must get a non-zero value of y. Hence the Deutsch-Jozsa algorithm succeeds with just one function call.

(v) How does this compare with a classical approach? The only thing one can do classically is keep computing f(x) for different values of x and checking whether one gets more than one value for the output. If the function is balanced, one would probably get different outputs quite quickly. If the function is constant one would need to evaluate half the inputs plus one, i.e. 2^{n−1} + 1, to be 100% sure that the function is not balanced. This is exponentially (in n) worse than the quantum algorithm.
However, this is arguably not fair. We may well be content to establish that the function is constant with some high probability³, a bit less than one. If the function is constant, how many function calls would you need classically to rule out the possibility that it is balanced with a probability of error of no more than (i) 10^{−3} and (ii) 10^{−6}?
Note: For simplicity, assume that the number of function calls is much less than 2^n/2, the number of values of x which give the same result if the function is balanced.

11.2. Consider the Deutsch-Jozsa algorithm for n = 2, and assume a constant function f(x) = 0 for all x, i.e. x = 0, 1, 2, 3. Compute explicitly the state of the system at each stage and show that you get the state |y⟩ = |00⟩ in the upper register at the end.
Hint: Evaluate explicitly Eqs. (11.14)–(11.16) for this situation.

11.3. Consider again the Deutsch-Jozsa algorithm for n = 2, but this time assume that f(00) = f(01) = 0, f(10) = f(11) = 1 (a balanced function). Determine the final state of the upper register and show that this implies the function is balanced, as indeed it is.
Hint: See the hint for Qu. 11.2.

11.4. The Toffoli Gate.

We stated in Sec. 7.1 that for classical reversible computation we need three-bit gates, such as the Toffoli gate, in addition to 1-bit and 2-bit gates, to be able to perform universal computation. However, three-qubit gates are, fortunately, not needed in quantum computation, because the appropriate three-qubit gates can be constructed out of 1-qubit and 2-qubit gates.
Here we consider the quantum Toffoli gate, which is a control-control-NOT (C-C-NOT) gate:

(Circuit: the control qubits x and y pass through unchanged; the target qubit z becomes z ⊕ xy.)

The target qubit z is flipped if both the control qubits, x and y, are 1, and is otherwise unchanged.

(i) Consider the following circuit for an arbitrary unitary operator V:

(Circuit: qubits x, y and z; two X gates act on y, each controlled by x; three controlled gates involving V act on the target z.)

Show that it acts with V² on |z⟩ if both x and y are 1 and otherwise does nothing.
Hint: One possible way of approaching this question (though not the only way) is to consider separately what happens for the four possible input values of the control qubits x and y, namely 00, 01, 10, and 11.
Another, more elegant, way is to note that the effect of a Ctrl-V gate in which |z⟩ is the target and |x⟩ is the control is |x⟩|z⟩ → |x⟩V^x|z⟩.
(ii) Now take V to be the following 1-qubit gate:

$$ V = (1 - i)\,\frac{1 + iX}{2}. \tag{11.18} $$

Show that V†V = 1, and hence V is unitary. Show also that V² = X and hence the above circuit is a quantum Toffoli gate.
Note: One sometimes says that V is the “square root of X”.

³ For later quantum algorithms we will only be able to solve the problem with high probability. Since we have to give up 100% certainty in the quantum case, we should not insist on 100% certainty here from the classical algorithm.
Chapter 12

Simon’s Algorithm

So far we have studied Deutsch’s algorithm in Chapter 10 which gave a quantum speedup of a factor
of 2, and the Bernstein-Vazirani algorithm in Chapter 11, which gave a speedup of n, where n is the
size of the problem. Next we consider a problem, due to Daniel Simon, which gives an exponential
speedup in n. Like the previous algorithms it has an artificial character and is not of practical use,
but it has features in common with the vastly more useful algorithm of Shor for factoring integers,
which we shall spend a substantial amount of time on in the next few chapters. Like Shor’s algorithm,
Simon’s is of a probabilistic nature.
In Simon’s problem we are given a black box function which takes an n-bit input and has the property that

$$ f(x \oplus a) = f(x), \tag{12.1} $$

where a is a non-zero n-bit integer and ⊕ means bitwise addition modulo 2. Note that each bit is treated separately, so if the integer x is represented in binary notation by bits x_{n−1}x_{n−2} ··· x_1x_0, and similarly for a, then x ⊕ a is an integer y with binary representation y_{n−1}y_{n−2} ··· y_1y_0, where y_j = x_j ⊕ a_j. Adding a twice to x (modulo 2) gives back x, i.e.

$$ x \oplus a \oplus a = x, \tag{12.2} $$

since adding a bit to itself gives 0 (mod 2) irrespective of whether that bit is 0 or 1. Hence

$$ f(x) = f(x \oplus a) = f(x \oplus a \oplus a), \tag{12.3} $$

and so on, so f(x) is periodic, with period a, under bitwise mod 2 addition. We are told that for every x there is only one other input to the function, x ⊕ a, which gives the same output, so there are 2^{n−1} distinct values of f. Hence we assume that we can represent f by n − 1 qubits. An example of a function with the desired property is shown in Table 12.1.
The problem is to determine the period a with the least number of function calls.
If we input different values of x and find a repeated output, i.e. if f (xi ) = f (xj ), then xj = xi ⊕ a.
If we add xi to both sides (bitwise addition modulo 2) we get

a = xi ⊕ xj . (12.4)

so we obtain a if we can find two values of x which give the same function value.
Classically this problem is hard, by which we mean that the number of function calls grows
exponentially with n. All one can do is call the function with different values of x until one finds a
repeated output, i.e. $f(x_i) = f(x_j)$, which gives us a from Eq. (12.4). After m calls to the function we have compared m(m − 1)/2 pairs. A given pair of distinct inputs has the same output with probability of about $1/2^n$, so for a reasonable chance of success we need $\frac{1}{2} m(m-1) \sim 2^n$, i.e. $m = O(2^{n/2})$, which is exponential in the number of bits n.
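To make the classical cost concrete, here is a small Python sketch. It is our own illustration and not part of the original notes. The helper make_simon_function, the seeds, and the choice of n = 16 with a particular a are all hypothetical. The search keeps a dictionary of the outputs seen so far, which is equivalent to comparing all m(m − 1)/2 pairs.

```python
import random

def make_simon_function(n, a, seed=0):
    """Hypothetical helper: a random two-to-one f on n-bit inputs with
    f(x) = f(x XOR a), taking 2**(n - 1) distinct values."""
    rng = random.Random(seed)
    labels = list(range(2 ** (n - 1)))
    rng.shuffle(labels)
    table = {}
    for x in range(2 ** n):
        if x not in table:
            label = labels.pop()
            table[x] = table[x ^ a] = label   # x and x XOR a share one output
    return lambda x: table[x]

def classical_find_period(f, n, seed=1):
    """Query f on distinct random inputs until an output repeats.
    Returns (a, number_of_calls), with a obtained from Eq. (12.4)."""
    rng = random.Random(seed)
    inputs = list(range(2 ** n))
    rng.shuffle(inputs)
    seen = {}                                 # output value -> input that produced it
    for calls, x in enumerate(inputs, start=1):
        fx = f(x)
        if fx in seen:
            return seen[fx] ^ x, calls        # a = x_i XOR x_j
        seen[fx] = x
    raise ValueError("no repeated output, so a must be zero")

n, a = 16, 0b1011001110001101
f = make_simon_function(n, a)
found, calls = classical_find_period(f, n)
print(found == a, calls)
```

With n = 16 the number of calls should typically be of order $2^{n/2} = 256$, far fewer than the $2^{n-1} + 1$ calls that would guarantee a repeat. It still grows exponentially with n.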

106 CHAPTER 12. SIMON’S ALGORITHM

x 0 1 2 3 4 5 6 7
f (x) 3 2 2 3 0 1 1 0

Table 12.1: An example with n = 3 bits of the type of function that is considered in Simon’s algorithm.
The function satisfies f (x) = f (x ⊕ a) for some non-zero a. To determine a we look for repetitions.
An example is f (4) = f (7) = 0. Hence, according to Eq. (12.4), a = 4 ⊕ 7 = 100 ⊕ 111 = 011 = 3. The
other repetitions satisfy this same condition as you can check.
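The repetitions in Table 12.1 can be checked mechanically. This short Python snippet, an illustration added here, lists every pair of inputs with equal outputs and XORs each pair, as in Eq. (12.4).

```python
f_table = [3, 2, 2, 3, 0, 1, 1, 0]      # f(x) for x = 0, ..., 7, from Table 12.1

a = 4 ^ 7                               # 100 XOR 111 = 011
assert a == 3
assert all(f_table[x ^ a] == f_table[x] for x in range(8))

pairs = [(x, y) for x in range(8) for y in range(x + 1, 8) if f_table[x] == f_table[y]]
print(pairs)                            # [(0, 3), (1, 2), (4, 7), (5, 6)]
print({x ^ y for x, y in pairs})        # {3}: every repetition gives the same a
```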

The circuit to solve this problem quantum mechanically is similar to that in the Bernstein-
Vazirani algorithm except that the lower register has enough qubits to contain the function values,
i.e. n − 1. Also, the phase kickback is not used, so the lower register is initialized to $|0\rangle_{n-1}$ rather than $|1\rangle$, and we do not have Hadamards on the lower register. A final difference is that we measure first on
the lower register rather than the upper one. The circuit diagram is shown in Fig. 12.1.

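[Figure: the upper n-qubit register starts in $|0\rangle_n$ and passes through $H^{\otimes n}$, $U_f$ and $H^{\otimes n}$, then is measured to give y. The lower (n − 1)-qubit register starts in $|0\rangle_{n-1}$, passes through $U_f$, and is measured to give $f_{\rm meas}$. The states $|\psi_0\rangle$, $|\psi_1\rangle$, $|\psi_2\rangle$ and $|\psi_3\rangle$ are marked after successive stages.]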

Figure 12.1: Circuit diagram for Simon’s algorithm. The upper register has n qubits and contains the
x values, while the lower register has n − 1 qubits and contains the values of the function f (x).

After the first Hadamards in the upper register the state of the system is
$$|\psi_0\rangle = \frac{1}{2^{n/2}} \sum_{x=0}^{2^n-1} |x\rangle_n \otimes |0\rangle_{n-1} . \qquad (12.5)$$

The function call makes the transformation $|x\rangle_n \otimes |y\rangle_{n-1} \to |x\rangle_n \otimes |y \oplus f(x)\rangle_{n-1}$; see Fig. 11.1 in Chapter 11. Here y = 0, so after the function call the state becomes

$$|\psi_1\rangle = \frac{1}{2^{n/2}} \sum_{x=0}^{2^n-1} |x\rangle_n \otimes |f(x)\rangle_{n-1} . \qquad (12.6)$$

A measurement is then done on the lower register, which will record some value of the function, $f_{\rm meas}$ say. All values are equally probable. There are two values of x which give the function value $f_{\rm meas}$, and we denote them by $x_{\rm meas}$ and $x_{\rm meas} \oplus a$. Hence, immediately after the measurement, the state of the system is

$$|\psi_2\rangle = \frac{|x_{\rm meas}\rangle_n + |x_{\rm meas} \oplus a\rangle_n}{\sqrt{2}} \otimes |f_{\rm meas}\rangle_{n-1} . \qquad (12.7)$$
If we were now to measure the upper register, we would get either $x_{\rm meas}$ or $x_{\rm meas} \oplus a$. At first glance, this might seem like progress since we appear to be halfway there. If we could just get the other number, we would have a. However, there is no way to get both. If we could clone the state several times and measure each clone then, with high probability, we would be able to determine both of them. However, the no-cloning theorem says that we can't clone an arbitrary, unknown state. Also, repeating the whole procedure doesn't help because, with high probability, we would get a different function value, $\tilde f_{\rm meas}$, and one of a different pair of x-values, $\tilde x_{\rm meas}$ or $\tilde x_{\rm meas} \oplus a$, from which again we would not be able to extract a.
As in Deutsch’s algorithm and the Bernstein-Vazirani algorithm, we must do some processing before
the final measurement. As we showed in Eq. (11.10) in Chapter 11 on the Bernstein-Vazirani algorithm,
the effect of Hadamards on an n-qubit register which is in a computational basis state $|x\rangle_n$ is given by

$$H^{\otimes n} |x\rangle_n = \frac{1}{2^{n/2}} \sum_{y=0}^{2^n-1} (-1)^{x \cdot y} |y\rangle_n , \qquad (12.8)$$

where x · y is the bitwise inner product modulo 2,

$$x \cdot y \equiv x_0 y_0 \oplus x_1 y_1 \oplus \cdots \oplus x_{n-1} y_{n-1} \pmod 2 , \qquad (12.9)$$

discussed in Sec. 11.1. Hence, applying Hadamards to the n-qubit upper register in the state $|\psi_2\rangle$ of Eq. (12.7), the state of that register becomes
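$$|\psi_3\rangle_n = \frac{1}{\sqrt{2}} \frac{1}{2^{n/2}} \sum_{y=0}^{2^n-1} \left[ (-1)^{x_{\rm meas} \cdot y} + (-1)^{(x_{\rm meas} \oplus a) \cdot y} \right] |y\rangle_n . \qquad (12.10)$$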

Now $(x \oplus a) \cdot y = (x \cdot y) \oplus (a \cdot y)$,¹ so we can write

$$|\psi_3\rangle_n = \frac{1}{\sqrt{2}} \frac{1}{2^{n/2}} \sum_{y=0}^{2^n-1} (-1)^{x_{\rm meas} \cdot y} \left[ 1 + (-1)^{a \cdot y} \right] |y\rangle_n . \qquad (12.11)$$

Noting that a · y = 0 or 1 we see that if a · y = 1 then the two terms in Eq. (12.11) cancel. Hence
the only terms with a non-zero amplitude are those with a · y = 0. All values of y which satisfy this
condition are equally probable. Note that the condition does not depend on the value of xmeas .
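A small state-vector calculation in Python reproduces Eqs. (12.7) to (12.11) for the function of Table 12.1, which has a = 3. This is a minimal sketch under stated assumptions. We assume that the lower-register measurement gave $f_{\rm meas} = 0$. The helpers dot2 and hadamard_n are our own names, and hadamard_n applies Eq. (12.8) directly rather than simulating individual gates. The snippet also brute-force checks the identity used in going from Eq. (12.10) to Eq. (12.11).

```python
import math

def dot2(x, y):
    """Bitwise inner product mod 2, Eq. (12.9): parity of the bits of x AND y."""
    return bin(x & y).count("1") % 2

def hadamard_n(state, n):
    """Apply a Hadamard to each of n qubits, using Eq. (12.8) term by term."""
    N = 2 ** n
    return [sum(state[x] * (-1) ** dot2(x, y) for x in range(N)) / 2 ** (n / 2)
            for y in range(N)]

# The identity (x XOR s).y = (x.y) XOR (s.y), used between Eqs. (12.10) and (12.11)
assert all(dot2(x ^ s, y) == dot2(x, y) ^ dot2(s, y)
           for x in range(16) for s in range(16) for y in range(16))

n, a = 3, 3
f_table = [3, 2, 2, 3, 0, 1, 1, 0]   # Table 12.1
f_meas = 0                           # assumed result of measuring the lower register
# Eq. (12.7): equal superposition of the two x with f(x) = f_meas
psi2 = [1 / math.sqrt(2) if f_table[x] == f_meas else 0.0 for x in range(2 ** n)]
psi3 = hadamard_n(psi2, n)           # Eq. (12.11); all amplitudes are real here
probs = [amp ** 2 for amp in psi3]
for y, p in enumerate(probs):
    print(f"y = {y:03b}   a.y = {dot2(a, y)}   probability = {p:.3f}")
# Only y = 000, 011, 100, 111 (those with a.y = 0) have probability 0.250
```

Changing f_meas to 1, 2 or 3 changes the signs of the surviving amplitudes but not the set of y that survive, in line with the remark that the condition does not depend on $x_{\rm meas}$.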
A measurement on the upper register then gives, with equal probability, one value of y with a·y = 0.
This is a linear equation (mod 2) for the $a_i$, the bits of a, i.e.

$$a_0 y_0 + a_1 y_1 + \cdots + a_{n-1} y_{n-1} = 0 \pmod 2 . \qquad (12.12)$$

If we can find n − 1 such equations for the $a_i$ which are linearly independent, we can obtain the solution. (No more than n − 1 of them can be independent, since every y satisfies a · y = 0 with a ≠ 0, and n − 1 independent equations have exactly one non-zero solution, which must be a.) Hence we have to repeat the procedure, each time obtaining a new value of y. As discussed in Appendix G of Mermin [Mer07], one needs to run the algorithm a little more than n times because the equations one gets for the $a_i$ are not necessarily linearly independent. The result is that if one runs n + p times, then the probability of getting n − 1 linearly independent equations (and hence the solution for the $a_i$) is greater than

$$1 - \frac{1}{2^{p+1}} . \qquad (12.13)$$

Hence there is less than one chance in a million of failure if one calls the function n + 20 times. A
crucial point in this expression is that the number of calls beyond n needed to find a solution with
some high probability does not depend on n.
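The classical post-processing can be sketched as Gaussian elimination over the integers mod 2. This Python sketch is an illustration only. The functions solve_for_a and sample_y are hypothetical names, and sample_y is a classical stand-in for one run of the quantum circuit: it simply draws a uniformly random y with a · y = 0. The values n = 10 and a are arbitrary.

```python
import random

def dot2(x, y):
    """Bitwise inner product mod 2, Eq. (12.9)."""
    return bin(x & y).count("1") % 2

def solve_for_a(ys, n):
    """Gaussian elimination mod 2. Each n-bit integer y gives one equation
    a.y = 0 (mod 2), Eq. (12.12). Returns the unique non-zero solution a once
    the ys contain n - 1 linearly independent equations, otherwise None."""
    pivots = {}                          # leading bit -> row with that highest set bit
    for y in ys:
        for bit in reversed(range(n)):
            if not (y >> bit) & 1:
                continue
            if bit in pivots:
                y ^= pivots[bit]         # eliminate this bit using an earlier row
            else:
                pivots[bit] = y          # a new independent equation
                break
    if len(pivots) < n - 1:
        return None
    if len(pivots) == n:
        raise ValueError("only a = 0 satisfies these equations")
    free = next(b for b in range(n) if b not in pivots)
    a = 1 << free                        # the one free bit must be 1, since a != 0
    for p in sorted(pivots):             # back-substitute, lowest pivot first
        lower = pivots[p] & ((1 << p) - 1)
        if dot2(lower, a):               # a_p = sum of row_j * a_j over j < p (mod 2)
            a |= 1 << p
    return a

def sample_y(a, n, rng):
    """Classical stand-in for one run of the circuit: a uniformly random y
    with a.y = 0, the distribution the final measurement produces."""
    while True:
        y = rng.randrange(2 ** n)
        if dot2(a, y) == 0:
            return y

rng = random.Random(2024)
n, a_true = 10, 0b1100101001
ys = []
while (a_found := solve_for_a(ys, n)) is None:
    ys.append(sample_y(a_true, n, rng))
print(a_found == a_true, len(ys))       # True, and len(ys) is at least n - 1
```

Each elimination step uses only XORs of n-bit integers, so solving the system takes a number of operations polynomial in n. The walrus operator requires Python 3.8 or later.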
The occurrence of probability, and some arcane mathematical arguments to prove that one does
get the solution with high probability within the specified number of runs, is characteristic of several
quantum algorithms including Shor’s.
¹ This is the mod 2 version of the usual distributive rule for addition and multiplication: a × (b + c) = (a × b) + (a × c).

In the case of Simon's problem, the classical algorithm takes of order $2^{n/2}$ function calls, whereas the quantum algorithm finds the answer with high probability with little more than n calls.² This is an exponential speedup.³
Finally, a few words in anticipation of Shor's algorithm, which we turn to next. Simon's problem considers a function which is periodic under bitwise modulo 2 addition, i.e. f(x ⊕ a) = f(x). Shor's algorithm investigates functions which are periodic under ordinary addition, f(x + a) = f(x), which is much more useful. In Simon's problem, the action of the n Hadamards in Eq. (12.8) can be written

$$H^{\otimes n} |x\rangle_n = \frac{1}{2^{n/2}} \sum_{y=0}^{2^n-1} e^{i\pi\, x \cdot y} |y\rangle_n . \qquad (12.14)$$

Since x · y is the bitwise inner product modulo 2, it only takes values 0 and 1, so the phases in the
complex exponential are just 0 and π. The core of Shor’s algorithm is a quantum Fourier transform
(QFT), where an essential difference from Eq. (12.14) is that the bitwise inner product is replaced by
ordinary multiplication. Hence the QFT generates many different phases, with the result that, unlike
Simon’s algorithm, it cannot, in general, be constructed entirely out of 1-qubit gates. Fortunately, it
can be constructed entirely out of 1- and 2-qubit gates. All this and more will be discussed in Chapter
17.

Problems
12.1. Consider Simon's problem, i.e. we have a function f(x), where x has n bits and f has (n − 1) bits, such that f(x) = f(x ⊕ a) where a ≠ 0. The quantum algorithm obtains values for x such that a · x = 0. From these linear equations for x one deduces a.
Consider the case of n = 4. You are given that some of the values of x for which x · a = 0 are

x = 3 (0011)
x = 4 (0100)
x = 7 (0111)
x = 9 (1001)

(i) Using only this information, determine a.

(ii) For this value of a, show that a · x = 1 for x = 1 and 2.
(Hence x = 1 and x = 2 would not appear as possible results.)

² In the interests of full disclosure I should state that one also needs to solve n linear equations on a classical computer, which takes of order $n^3$ steps. An algorithm which takes a time proportional to a power of the problem size n is said to be polynomial. Since classical hardware is cheap, it is not clear whether one should include the time spent on a classical computer in the computational cost of Simon's algorithm. However, since $n^3$ is polynomial, even if one does include this time the comparison is still between a polynomial quantum (+classical) algorithm and an exponential purely classical algorithm, which is still an exponential speedup; see footnote 3.
³ An algorithm which takes a time proportional to a power of the problem size is said to have polynomial complexity,
while if the time increases exponentially with size (or exponentially with a power of the size) it is said to have exponential
complexity. If one algorithm has polynomial complexity and another has exponential complexity then the former is said
to have an exponential speedup compared with the latter.
Chapter 13

Factoring and RSA (Rivest-Shamir-Adleman) Encryption

Shor’s famous quantum algorithm, to be discussed in detail in Chapter 17, factors large integers much
more efficiently than any known classical algorithm. Factoring is not just of interest to mathematicians,
however, because the difficulty of factoring is at the heart of the popular RSA method of encrypting
sensitive information sent via the internet (or some other public channel). While RSA is not the only method used to encrypt information, my understanding is that some version of Shor's algorithm can be used to crack other encryption methods such as Diffie-Hellman. RSA stands for the names of its
inventors, Rivest, Shamir and Adleman.
This chapter is a LaTeX copy of a Mathematica notebook, the original of which is available at [Link]. In it, the RSA algorithm is implemented, parameters are chosen, and random messages are generated. These are encrypted, the encrypted messages are decrypted, and a check is made that the original message is recovered. If you have Mathematica you can run the notebook version and verify that the RSA algorithm works.
Suppose that Bob wants to receive a message from Alice on the internet (a public channel). Anything sent on a public channel can be intercepted by others. How can Bob and Alice agree on a coding
scheme and then send each other coded messages which can be decoded by the other person but not
by anyone “sniffing” on the internet? This has to be accomplished by only sending messages down the
public channel.
We will now describe the RSA encryption scheme for doing this. It uses a result of number theory
which we will quote but not prove. To receive the message from Alice, Bob picks two large prime
numbers p and q, and sends to Alice, on the public channel, their product

N = pq, (13.1)

but not p and q separately. N is taken to be large enough, typically a few thousand bits, that it cannot be factored on a classical computer. You might ask how one can choose the large prime numbers p and q. If one selects a large integer M at random, it can be shown that the probability that it is prime is about 1/ln M. Hence, even if M has, say, 400 digits (around 1300 bits), you only have to test a few hundred to a thousand random integers to typically find a prime number. But can one efficiently test whether a number is prime? It turns out that one can, even though, if the number is found not to be prime, there is no known efficient classical algorithm to determine its prime factors. The website [Link] explains how the test for primality is done in Mathematica.
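The search for a random prime can be sketched with the Miller-Rabin test, a standard probabilistic primality test. We substitute it here for illustration; it is not necessarily what Mathematica's PrimeQ does. The names is_probable_prime and random_prime, the 25 rounds, and the 1024-bit size are our own choices. Because only odd candidates are drawn, the expected number of tries is about (ln M)/2 rather than ln M.

```python
import random

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

def is_probable_prime(m, rng, rounds=25):
    """Miller-Rabin test: False means m is certainly composite; True means m
    is prime except with probability at most 4**(-rounds)."""
    if m < 2:
        return False
    for p in SMALL_PRIMES:               # cheap trial division first
        if m % p == 0:
            return m == p
    d, s = m - 1, 0
    while d % 2 == 0:                    # write m - 1 = 2**s * d with d odd
        d //= 2
        s += 1
    for _ in range(rounds):
        x = pow(rng.randrange(2, m - 1), d, m)
        if x in (1, m - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, m)
            if x == m - 1:
                break
        else:
            return False                 # found a witness: m is composite
    return True

def random_prime(bits, rng):
    """Draw random odd integers with exactly `bits` bits until one passes."""
    tries = 0
    while True:
        tries += 1
        candidate = rng.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate, rng):
            return candidate, tries

rng = random.Random(7)
p, tries = random_prime(1024, rng)
print(p.bit_length(), tries)
```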
Bob also sends a large "encoding number" c which has no factors in common with (p − 1)(q − 1). If there are no factors in common then the greatest common divisor (GCD) is 1. The GCD of two integers is easily determined by Euclid's algorithm, discussed in Sec. 13.A. According to Appendix J of Mermin [Mer07], the probability that two large random integers have no common factors is greater than 1/2, so it is not difficult to find a suitable value for c.
Hence the public key (available to everyone) is N and c.
Since Bob knows both p and q, and hence (p − 1)(q − 1), he can also determine the integer d such
that
$$c\, d = 1 \pmod{(p-1)(q-1)} . \qquad (13.2)$$
Let us remind ourselves of this mod function. The value of a mod b is the result after one subtracts
(or adds) the appropriate multiple of b to a to get a value which lies in the range 0 to b − 1. If a is
positive, things are simple, one subtracts a multiple of b (possibly 0) so the mod function is just the
remainder after integer division. Hence, for example, 9 mod 5 = 4 because 9/5 = 1 remainder 4. If a
is negative one has to add a multiple of b, so, for example, (−13) mod 5 = 2 (since −13 + (3 × 5) = 2).
The above equation, c d = 1 mod (something), looks strange at first. If c is an integer we would
normally think that its inverse should be a fraction. However, here d is also an integer, and the product
of two integers can give 1 if we use modular arithmetic. For example if c = 5 and d = 3 then cd = 15,
and cd mod 7 = 1 (since 15 = (7 × 2) + 1).
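Python's % operator follows the same convention when the modulus is positive, which makes it convenient for checking small cases. The % operator in C and Java can return a negative result for negative a, so the convention is language dependent. Python 3.8 and later also compute modular inverses directly with pow(c, -1, m). This is a short illustration of the examples above.

```python
print(9 % 5)          # 4
print((-13) % 5)      # 2, since -13 + 3*5 = 2
print((5 * 3) % 7)    # 1, so 3 is an inverse of 5 modulo 7
print(pow(5, -1, 7))  # 3, the modular inverse computed directly (Python 3.8+)
```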
The algorithm for computing d in Eq. (13.2) is efficient and is an extension of Euclid's algorithm. It is given in Appendix 13.B and in Appendix J of Mermin [Mer07]. It turns out that d is unique modulo (p − 1)(q − 1). Alice, and anyone else sniffing on the public channel, knows N and c, but not p and q, and hence not d.
The private key (known only to Bob) is p and q (and hence d).
Alice breaks up her message into chunks each containing a number of bits less than the number
of bits of the integer N. Each chunk is then a binary number less than N. Let's denote by a the numerical value of one chunk; a is the original message.
Using the values of N and c that Bob has sent, Alice computes

$$b = a^c \pmod N \quad \text{(the encoded message)}. \qquad (13.3)$$

The encoded message b is another large integer, and is sent down the public channel from Alice to Bob.
Bob knows not only c and N , but also the value of d. Here number theory kicks in and shows that
the original (unencoded) message a is given by

$$a = b^d \pmod N \quad \text{(the original message is recovered)}. \qquad (13.4)$$

For a proof of this result see the book by Mermin [Mer07]. Note the symmetry between the encoding
formula, Eq. (13.3) and the decoding formula, Eq. (13.4), with c and d related by Eq. (13.2).
Bob can compute the original message a because he knows d, but anyone sniffing on the public
channel does not know d. However, if a third person, traditionally called Eve, listening on the public
channel, could factor N (which is sent down the public channel) into its factors p and q, she would
then have (p − 1)(q − 1) and, since c is also sent down the public channel, she could determine d where
c d = 1 (mod (p − 1)(q − 1)) using the extension of the Euclid algorithm mentioned above. Hence she
could find the original unencrypted message a from Eq. (13.4).
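Eve's attack is easy to carry out when N is tiny. The following Python sketch is our illustration, and eve_recovers_d is a hypothetical name. It factors N by trial division, which is only feasible for small N, and then computes d with Python's built-in modular inverse. It uses the numbers of the example that follows, N = 91 and c = 11.

```python
from math import gcd, isqrt

def eve_recovers_d(N, c):
    """Factor N by trial division (feasible only for tiny N), then rebuild d."""
    for p in range(2, isqrt(N) + 1):
        if N % p == 0:
            q = N // p
            phi = (p - 1) * (q - 1)
            if gcd(c, phi) != 1:
                raise ValueError("c shares a factor with (p-1)(q-1)")
            return pow(c, -1, phi)      # c d = 1 mod (p-1)(q-1), Eq. (13.2)
    raise ValueError("N has no nontrivial factor")

print(eve_recovers_d(91, 11))           # 59
```

For a modulus of a few thousand bits the trial-division loop is hopeless, which is exactly the point of RSA.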
Let’s do a simple example. We will take

p = 7, q = 13, so N = 91. (13.5)

For the encoding integer we take c = 11, which has no factors in common with (p − 1)(q − 1) =
6 × 12 = 72. As shown in Appendix 13.B, using the extended Euclid algorithm one finds that d = 59.
(Let's verify this: cd = 11 × 59 = 649 = (9 × 72) + 1, so cd mod (p − 1)(q − 1) = 1, as desired.) The Mathematica code below sets these values, checks that p and q are prime while N is not, and checks that cd = 1 (mod (p − 1)(q − 1)). (Note: in Mathematica commands I use n rather than N because N has a special meaning in Mathematica.) The code then generates a message a by computing a random integer between 0 and N − 1, and next computes the encoded message b from $b = a^c \pmod N$. It then computes $b^d \pmod N$ and checks that it gives back the original message a. If you have Mathematica you can run the code several times (each time a different random value for the message a will be generated) and see that the original message is always returned.

In[1]:= p=7; q=13; c=11; d=59; n=p*q

Out[1]= 91

We check that p and q are prime. The Mathematica command PrimeQ[p] returns “True” if p is prime
and “False” if it is not.

In[2]:= PrimeQ[p]
Out[2]= True
In[3]:= PrimeQ[q]
Out[3]= True
In[4]:= PrimeQ[n]
Out[4]= False

We check that cd = 1 mod ((p − 1)(q − 1)).

In[5]:= Mod[c * d, (p-1)(q-1)]

Out[5]= 1

We generate a random message using the command Random[Integer, n − 1], which generates a random integer between 0 and n − 1.

In[6]:= mess = Random[Integer, n - 1]

Out[6]= 51

We compute the encoded message.

In[7]:= encodedmess = Mod[mess^c, n]

Out[7]= 25

We decode the encoded message and check that we recover the original message.

In[8]:= recoveredmess = Mod[encodedmess^d, n]

Out[8]= 51
In[9]:= recoveredmess == mess
Out[9]= True

Hence the message was successfully decoded.
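For readers without Mathematica, here is a rough Python equivalent of the session above. It is our translation, not the original notebook. Python's built-in pow(base, exp, mod) plays the role of Mod[mess^c, n], and random.randint(0, n - 1) plays the role of Random[Integer, n - 1]. The final check runs through every possible message rather than a single random one.

```python
import random

p, q, c, d = 7, 13, 11, 59
n = p * q                                    # 91; n rather than N, as in the notebook
assert (c * d) % ((p - 1) * (q - 1)) == 1    # Eq. (13.2)

mess = random.randint(0, n - 1)              # random message a, 0 <= a <= n - 1
encodedmess = pow(mess, c, n)                # b = a^c (mod N), Eq. (13.3)
recoveredmess = pow(encodedmess, d, n)       # b^d (mod N), Eq. (13.4)
print(mess, encodedmess, recoveredmess, recoveredmess == mess)

# Every possible message is recovered, including the run shown above (51 -> 25 -> 51)
assert all(pow(pow(m, c, n), d, n) == m for m in range(n))
```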

Problems
13.1. Consider the RSA scheme for encryption with p = 11, q = 3 so N = pq = 33. For the encoding
integer take c = 3 which has no factors in common with (p − 1)(q − 1) = 20.

(i) Using the extended Euclid algorithm, find the decoding number d which satisfies cd = 1
mod (p − 1)(q − 1).
(ii) Assume that the original message m is represented by the integer 7. Compute the encoded message m′ given by

$$m' = m^c \bmod N . \qquad (13.6)$$

(iii) Compute $(m')^d \bmod N$, and show that you recover the original message m.

Appendices

13.A The Euclidean Algorithm

We want to efficiently find the Greatest Common Divisor (GCD) of two integers. This is the largest
factor that they have in common. As a simple example, the GCD of 24 and 9 is 3, since $24 = 2^3 \times 3$ and $9 = 3^2$.
Suppose we want the GCD of two numbers $a_0$ and $b_0$ with $a_0 > b_0$. We proceed iteratively. At each stage, the new value of a is equal to the old value of b, and the new value of b is equal to the remainder when the old value of a is divided by the old value of b, i.e.

$$a_{n+1} = b_n , \qquad b_{n+1} = a_n - [a_n/b_n]\, b_n \quad \text{(which is the same as } b_{n+1} = a_n \bmod b_n\text{)}, \qquad (13.7)$$

where [· · · ] means the integer part of the quantity in brackets.

Assuming that $b_n < a_n$ and using Eq. (13.7) to get $a_{n+1}$ and $b_{n+1}$, one finds (i) $b_{n+1} < b_n$, since the largest value that a number can have mod $b_n$ is $b_n - 1$, (ii) $b_n = a_{n+1}$, so combined with (i) we have $b_{n+1} < a_{n+1}$, and (iii) $a_{n+1} = b_n < a_n$. Hence $a_n$ and $b_n$:

(a) decrease at successive iterations, and

(b) maintain the inequality $a_n > b_n$.

Note too that $a_n$ and $b_n$ have the same common factors as $a_0$ and $b_0$, because $a_{n+1}$ and $b_{n+1}$ are linear combinations of the values at the previous stage, $a_n$ and $b_n$, so any common factor is preserved. Conversely, $a_n = [a_n/b_n]\, a_{n+1} + b_{n+1}$ and $b_n = a_{n+1}$, so no new common factor can appear. Eventually we get to a stage where $b_{n+1} = 0$, at which point the procedure stops. This means that $a_n$ is divisible by $b_n$, so $b_n$ is the greatest common divisor. As an example we take $a_0 = 24$, $b_0 = 9$,

n   $a_n$   $b_n$
0 24 9 (the initial values)
1 9 6 (since 24 = 2 × 9 + 6)
2 6 3 (since 9 = 6 × 1 + 3)
3 3 0 (since 6 = 3 × 2 + 0) .

Hence the GCD of 24 and 9 is $b_2$ (= 3), which is correct.
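Here is a Python sketch of the iteration in Eq. (13.7). It is our illustration, and euclid_gcd is a hypothetical name. When verbose is set, it prints the rows of the table above. Python's standard library already provides math.gcd, so this is only for showing the steps.

```python
def euclid_gcd(a, b, verbose=False):
    """GCD by the iteration of Eq. (13.7): (a, b) -> (b, a mod b) until b = 0."""
    n = 0
    if verbose:
        print(n, a, b)
    while b != 0:
        a, b = b, a % b
        n += 1
        if verbose:
            print(n, a, b)
    return a              # the last non-zero b, moved into a by the final step

print(euclid_gcd(24, 9, verbose=True))
# 0 24 9
# 1 9 6
# 2 6 3
# 3 3 0
# 3
```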

13.B Extension of the Euclidean Algorithm to find an inverse modulo an integer

Given a and c which have no common factors, and a > c, we want to find d where

$$c\, d = 1 \bmod a . \qquad (13.8)$$

The greatest common divisor of c and a is 1 since, by assumption, they have no common factors.
We go through the Euclid algorithm

$$a_{n+1} = c_n , \qquad c_{n+1} = a_n - [a_n/c_n]\, c_n \qquad (13.9)$$

until we get to the stage where $c_n = 1$, the greatest common divisor. One can then obtain d by working backwards through the iterations. This is best shown by an example. We take p = 7, q = 13, as in the example above, so we have a = (p − 1)(q − 1) = 72 and hence we initialize $a_0 = 72$. We also take c = 11 (again as in the example), which has no factors in common with a, and so initialize $c_0 = 11$. Hence the Euclid algorithm proceeds as follows

n   $a_n$   $c_n$
0   72   11   $a_0 = a$, $c_0 = c$ (the initial values)
1   11   6    $a_1 = c_0$, $c_1 = a_0 - 6c_0 = 6$
2   6    5    $a_2 = c_1$, $c_2 = a_1 - c_1 = 5$
3   5    1    $a_3 = c_2$, $c_3 = a_2 - c_2 = 1$ ($c_3 = 1$, so we stop).

Hence working backwards,

$$1 = a_2 - c_2 = c_1 - (a_1 - c_1) = 2c_1 - a_1 = 2(a_0 - 6c_0) - c_0 = 2a_0 - 13c_0 \;(= 2a - 13c). \qquad (13.10)$$

We want to take this (mod a). Now 2a (mod a) = 0. Since −13c is negative, we need to make it positive by adding a × c (which is zero (mod a)). Hence

1 = 2a − 13c (mod a) = −13c (mod a) = (−13 + a)c (mod a) = 59c (mod a), (13.11)

where we used that a = 72 to get the last equality. Hence d = 59 as stated in the above example.
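The back-substitution above can be automated. This recursive Python sketch of the extended Euclidean algorithm is our illustration, and extended_euclid and inverse_mod are hypothetical names. It returns integers s and t with gcd = s·a + t·c. For a = 72 and c = 11 it reproduces Eq. (13.10), and reducing t modulo a then gives d as in Eq. (13.11). In Python 3.8+ the built-in pow(c, -1, a) gives the same result.

```python
def extended_euclid(a, b):
    """Return (g, s, t) with g = gcd(a, b) = s*a + t*b."""
    if b == 0:
        return a, 1, 0
    g, s, t = extended_euclid(b, a % b)
    # g = s*b + t*(a mod b) = s*b + t*(a - (a // b)*b) = t*a + (s - (a // b)*t)*b
    return g, t, s - (a // b) * t

def inverse_mod(c, a):
    """The d in 0..a-1 with c*d = 1 (mod a), Eq. (13.8)."""
    g, s, t = extended_euclid(a, c)
    if g != 1:
        raise ValueError("c and a share a factor, so no inverse exists")
    return t % a          # 1 = s*a + t*c, so t*c = 1 (mod a)

print(extended_euclid(72, 11))  # (1, 2, -13): 1 = 2*72 - 13*11, as in Eq. (13.10)
print(inverse_mod(11, 72))      # 59, as in Eq. (13.11)
```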
Chapter 14

Using Period Finding to Factor an Integer

In this chapter, we explain how finding the period of a certain function will enable us to factor integers.
We will also illustrate the technique with a simple example. This will probably seem a strange approach
for factoring, and is not the preferred method on a classical computer, but it is the method used by
Shor in his quantum algorithm.
We take two large primes p and q and form the product

N = pq. (14.1)

The goal is to find the factors p and q given only the product N . This is a problem which is hard
classically. For applications in cryptography p and q may have around 600 digits (around 2000 bits)
so n, the number of bits of N , will be several thousand.
We proceed by choosing a random integer a less than N which has no factors in common with N .
Whether or not a and N have a common factor can be determined efficiently using Euclid’s algorithm,
which was described in Sec. 13.A. In the very unlikely event that a and N do have a common factor
we have found a factor of N and the problem is solved. Otherwise we compute the following function

$$f(x) \equiv a^x \pmod N \qquad (14.2)$$

for x = 1, 2, · · · . As stated, a and N have no common factors, and in this case one can show that eventually we will get f(x) = 1 for some value of x. Calling the smallest such value x = r, we have

$$a^r \pmod N \equiv 1 . \qquad (14.3)$$

The function then repeats since

$$f(x + r) \equiv a^{x+r} \pmod N \equiv a^x \pmod N \times a^r \pmod N \equiv a^x \pmod N = f(x) , \qquad (14.4)$$

using Eq. (14.3). Hence r is the period of the function.

We illustrate with a simple example,

N = p q = 91, with factors p = 13, q = 7. (14.5)

We also take a = 4, which has no factors in common with 91. We plot $f(x) \equiv 4^x \pmod{91}$ in Fig. 14.1. The periodic nature is clear, and the period is found to equal 6 by inspection.

[Plot: $4^x \pmod{91}$ against x, for x from 0 to 25.]

Figure 14.1: The function $f(x) \equiv 4^x \pmod{91}$. The period is seen by inspection to equal 6.

Let's make sure we understand how this figure is obtained by working out the values of $4^x \pmod{91}$ for x = 1, 2, · · · , 6.

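$$x = 1, \quad 4^x = 4 , \qquad (14.6a)$$
$$x = 2, \quad 4^x = 16 , \qquad (14.6b)$$
$$x = 3, \quad 4^x = 64 , \qquad (14.6c)$$
$$x = 4, \quad 4^x = 64 \times 4 = 256 = 2 \times 91 + 74 \equiv 74 \pmod{91} , \qquad (14.6d)$$
$$x = 5, \quad 4^x \equiv 74 \times 4 = 296 = 3 \times 91 + 23 \equiv 23 \pmod{91} , \qquad (14.6e)$$
$$x = 6, \quad 4^x \equiv 23 \times 4 = 92 = 91 + 1 \equiv 1 \pmod{91} . \qquad (14.6f)$$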

In the above equations the symbol ≡ means equivalent to (mod 91).

The plot in Fig. 14.1 seems to have a fairly regular behavior, but such smooth behavior is exceptional
and occurs here only because of the particularly simple choice of parameters. Figure 14.2 shows a plot
for the same value of N but with a = 19. This is a much more random looking figure, as is typical. In
this case the period is r = 12. The apparently random shape of f(x) means that one cannot estimate the period by taking a few nearby values of x and extrapolating.
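The period can be found classically by brute force, as in Eqs. (14.6). This Python sketch is our illustration, and period is a hypothetical name. It multiplies by a until the value returns to 1, which takes up to about N steps. That is fine for N = 91 but hopeless for an N of several thousand bits. The sketch also confirms r = 12 for a = 19.

```python
from math import gcd

def period(a, N):
    """Smallest r > 0 with a**r = 1 (mod N), found by repeated multiplication."""
    if gcd(a, N) != 1:
        raise ValueError("a and N share a factor, so no such r exists")
    value, r = a % N, 1
    while value != 1:
        value = (value * a) % N
        r += 1
    return r

print([pow(4, x, 91) for x in range(1, 8)])  # [4, 16, 64, 74, 23, 1, 4]
print(period(4, 91), period(19, 91))         # 6 12
```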
Having found the period we now need to be lucky in two respects:

1. The period r must be even. This means that r/2 is an integer and so is $a^{r/2}$. Hence we can write

$$0 \equiv a^r - 1 \equiv (a^{r/2} - 1)(a^{r/2} + 1) \pmod{pq} . \qquad (14.7)$$

2. We need that

$$a^{r/2} + 1 \not\equiv 0 \pmod{pq} . \qquad (14.8)$$

It is automatically true that $a^{r/2} - 1 \not\equiv 0 \pmod{pq}$ because, by definition, x = r is the smallest power for which $a^x - 1 \equiv 0 \pmod{pq}$. Hence, if Eq. (14.8) is true, neither $a^{r/2} + 1$ nor $a^{r/2} - 1$ is divisible by N = pq but, according to Eq. (14.7), their product is, i.e. $(a^{r/2} + 1)(a^{r/2} - 1) = \text{const.} \times pq$. Since p and q are primes (and neither $a^{r/2} + 1$ nor $a^{r/2} - 1$ is a multiple of pq), this is only possible if $a^{r/2} + 1$ is a multiple of one of the factors, p say, i.e. $a^{r/2} + 1 = Cp$, and $a^{r/2} - 1$ is a multiple of the other one, q, i.e. $a^{r/2} - 1 = C'q$ (C and C′ are constants). Consequently p is the greatest common divisor of N (= pq) and $a^{r/2} + 1$ (= Cp), and q is the greatest common divisor of N (= pq) and $a^{r/2} - 1$ (= C′q). We can therefore find p and q using the Euclidean algorithm mentioned earlier.

[Plot: $19^x \pmod{91}$ against x, for x from 0 to 25.]

Figure 14.2: The function $f(x) \equiv 19^x \pmod{91}$. The period is seen by inspection to equal 12.
What are the odds that we will be doubly lucky in this way? According to Appendix M in Mermin [Mer07], the probability is greater than 0.5 for large N. If one is unlucky, one tries a different choice for a. Since the probability of success is quite high at each attempt, one does not have to repeat the process very many times to succeed with very high probability.
Back to our example. For N = 91 and a = 4 we found r = 6. Indeed we are lucky! This is even. Also $a^{r/2} + 1 = 65 \not\equiv 0 \pmod{91}$. So we are doubly lucky! However, this is not remarkable. As noted above, the probability of this double luck is greater than 0.5 (at least for large N).
Hence one of the factors is the greatest common divisor (GCD) of 91 and $a^{r/2} + 1 = 65$. The other factor is the greatest common divisor of 91 and $a^{r/2} - 1 = 63$.
Applying Euclid's algorithm, described in Sec. 13.A, to $f_0 = 91$, $g_0 = 65$:

$$\begin{aligned}
f_1 &= 65, & g_1 &= 91 - [91/65]\, 65 = 91 - 65 = 26, \\
f_2 &= 26, & g_2 &= 65 - [65/26]\, 26 = 65 - 52 = 13, \\
f_3 &= 13, & g_3 &= 26 - [26/13]\, 13 = 26 - 26 = 0.
\end{aligned} \qquad (14.9)$$
Hence the GCD is $g_2 = 13$, which is indeed one of the factors of 91. By the same process the GCD of 63 and 91 is found to be 7, the other factor of 91.
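Putting the pieces together, here is a Python sketch of the classical post-processing. It is our illustration, and factor_via_period is a hypothetical name. The period is again found by brute force, and this is the step that Shor's algorithm replaces. The sketch also counts, for N = 91, how many values of a coprime to N satisfy both luck conditions.

```python
from math import gcd

def period(a, N):
    """Smallest r > 0 with a**r = 1 (mod N); assumes gcd(a, N) = 1."""
    value, r = a % N, 1
    while value != 1:
        value = (value * a) % N
        r += 1
    return r

def factor_via_period(N, a):
    """Return (p, q) found from the period of a**x mod N, or None if unlucky."""
    g = gcd(a, N)
    if g != 1:
        return g, N // g            # a already shares a factor with N
    r = period(a, N)
    if r % 2 == 1:
        return None                 # unlucky: r is odd
    half = pow(a, r // 2, N)        # a**(r/2) mod N
    if half == N - 1:
        return None                 # unlucky: Eq. (14.8) fails
    return gcd(half + 1, N), gcd(half - 1, N)

print(factor_via_period(91, 4))     # (13, 7)

# How many a coprime to 91 satisfy both conditions?
coprime = [a for a in range(1, 91) if gcd(a, 91) == 1]
lucky = [a for a in coprime if factor_via_period(91, a) is not None]
print(len(lucky), len(coprime))     # 54 72
```

For N = 91 this gives 54 of the 72 possible values, i.e. 3/4, which is consistent with the bound of 0.5 quoted above.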

Period finding is a rather indirect method for factoring integers and is not the most efficient one on a classical computer, because of the amount of work in computing $a^x \pmod N$ for all x from 1 to r, where r can be of order N. However, Shor realized that it lends itself to a very efficient implementation on a quantum computer. Part of Shor's algorithm, which we will discuss in Chapter 17, uses quantum parallelism to compute all needed values of $a^x \pmod N$ in a time that only increases as a power of n rather than exponentially in n, where we recall that the number to be factored, N, has n bits.
Chapter 15

The Fourier Transform and the Fast Fourier Transform (FFT)

15.1 Introduction
The standard Fourier Transform concerns a continuous function, x(t) say. For descriptive purposes it
will be convenient to think of t as time, but this is not essential. In the Fourier transform we decompose
x(t) into its components at different “frequencies” ω as follows:
$$y(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{i\omega t}\, x(t)\, dt . \qquad (15.1)$$
If x(t) comprises oscillations at a frequency $\omega_0$, say (i.e. has a period T equal to $2\pi/\omega_0$), so that $x(t) \sim e^{-i\omega_0 t}$, then y(ω) will be sharply peaked at $\omega = \omega_0$ (or equivalently at ω = 2π/T). Note the inverse relation
between the period T and the position of the peak in the Fourier Transform. The larger the period,
the smaller the value of ω at the peak.
As an example, if $x(t) = \cos\omega_0 t = \frac{1}{2}\left(e^{i\omega_0 t} + e^{-i\omega_0 t}\right)$, then y(ω) has sharp "delta function" peaks at $\omega = \pm\omega_0$. A completely different situation is when x(t) is random (i.e. white noise), in which case the magnitude of y(ω) is, on average, constant (at least for |ω| less than a cut-off value $\omega_c$).
There is also an inverse Fourier transform,
$$x(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-i\omega t}\, y(\omega)\, d\omega , \qquad (15.2)$$
which has almost the same form as the original (forward) transform, apart from the sign of i in the
exponential. It is shown in standard mathematics texts that substituting for y(ω) from Eq. (15.1) into
the RHS of Eq. (15.2) does give back x(t) for a wide class of functions x(t).
This chapter is concerned with the discrete analog of Eqs. (15.1) and (15.2) in which the data xm
is at a set of N equally spaced “times”, and the Fourier transform yk is at a set of N equally spaced
“frequencies”. In addition, in the discrete Fourier Transform, the data only covers a finite range,
whereas the data in the original, continuous Fourier Transform extends to ±∞.

15.2 The Discrete Fourier Transform

If we have a set of N data points xm (m = 0, 1, · · · , N − 1), the discrete Fourier transform (FT) is a
set of N new values yk given by
$$y_k = \frac{1}{\sqrt N} \sum_{m=0}^{N-1} \exp(2\pi i\, km/N)\, x_m , \qquad (15.3)$$


evaluated for k = 0, 1, · · · , N − 1. We don't need to consider k values outside this range because $y_{k+N} = y_k$ (so the $y_k$ are periodic with period N). Equation (15.3) corresponds to a discretized and finite-range version of Eq. (15.1), with m corresponding to t and 2πk/N corresponding to ω. If $x_m$ is a periodic function of m with period T, i.e. $x_m \sim e^{-2\pi i m/T}$, then $y_k$ will be peaked for k around N/T, since the terms in Eq. (15.3) then add up in phase. This corresponds, in the continuous Fourier Transform, to a peak for ω at around 2π/T.
The inverse Fourier transform has almost the same form; one just needs to take the complex
conjugate of the exponential, i.e.
$$x_m = \frac{1}{\sqrt N} \sum_{k=0}^{N-1} \exp(-2\pi i\, km/N)\, y_k , \qquad (m = 0, 1, \dots, N-1). \qquad (15.4)$$

To see this we substitute Eq. (15.3) into Eq. (15.4) so

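$$\begin{aligned}
x_m &= \frac{1}{\sqrt N} \sum_{k=0}^{N-1} \exp(-2\pi i\, km/N)\, \frac{1}{\sqrt N} \sum_{l=0}^{N-1} \exp(2\pi i\, kl/N)\, x_l \\
&= \frac{1}{N} \sum_{l=0}^{N-1} x_l \left[ \sum_{k=0}^{N-1} \exp\big(2\pi i\, k(l-m)/N\big) \right] \\
&= \frac{1}{N} \sum_{l=0}^{N-1} x_l\, \frac{1 - \exp\big(2\pi i\, (l-m)\big)}{1 - \exp\big(2\pi i\, (l-m)/N\big)} ,
\end{aligned} \qquad (15.5)$$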

where, in the last expression, we summed up the geometric series. The numerator of the fraction is always zero. The denominator is only zero if l = m. Hence, as long as l ≠ m, the sum is zero. However, if l = m we get 0/0, which is undefined, and so, to get the answer, we either evaluate it as the limit l → m or go back to the start and put l = m from the beginning. In either method one finds that the term in rectangular brackets is equal to N for l = m. Hence the RHS of Eq. (15.5) is $x_m$, showing that the inverse transform in Eq. (15.4) does give back the original dataset $x_m$, as claimed.
Note that $x_{m+N} = x_m$, so the x-values obtained from the inverse Fourier transform are actually a
periodic repetition of the original data (i.e. the xm for m = 0, · · · , N − 1) with period N .
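Here is a direct transcription of Eqs. (15.3) and (15.4) into Python. It is our illustration, using O(N²) sums, and dft and inverse_dft are hypothetical names. It lets one check both the round trip derived above and the peak at k = N/T for data of period T.

```python
import cmath
import math

def dft(x):
    """Discrete Fourier transform with the convention of Eq. (15.3):
    y_k = N**-0.5 * sum_m exp(2*pi*i*k*m/N) x_m."""
    N = len(x)
    return [sum(cmath.exp(2j * math.pi * k * m / N) * x[m] for m in range(N)) / math.sqrt(N)
            for k in range(N)]

def inverse_dft(y):
    """Inverse transform, Eq. (15.4): the same sum with the sign of the exponent flipped."""
    N = len(y)
    return [sum(cmath.exp(-2j * math.pi * k * m / N) * y[k] for k in range(N)) / math.sqrt(N)
            for m in range(N)]

# Round trip: Eq. (15.4) applied to Eq. (15.3) gives back the data
x = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, 2.5, 1.0]
error = max(abs(b - a) for a, b in zip(x, inverse_dft(dft(x))))
print(error)          # tiny: rounding error only

# Data of period T, x_m = exp(-2*pi*i*m/T), gives a peak at k = N/T
N, T = 16, 4
spectrum = dft([cmath.exp(-2j * math.pi * m / T) for m in range(N)])
peak = max(range(N), key=lambda k: abs(spectrum[k]))
print(peak)           # 4
```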
The discrete Fourier transform can be conveniently written as
$$y_k = \frac{1}{\sqrt N} \sum_{m=0}^{N-1} \omega^{km} x_m , \qquad (k = 0, 1, \dots, N-1), \qquad (15.6)$$

where

$$\omega = \exp(2\pi i/N) \qquad (15.7)$$

is an N-th root of unity ($\omega^N = 1$).
For example, for N = 4 we have ω = i, and so

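$$\vec y = U \vec x , \qquad (15.8)$$

where the matrix of coefficients is

$$U = \frac{1}{2} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & i & i^2 & i^3 \\ 1 & i^2 & i^4 & i^6 \\ 1 & i^3 & i^6 & i^9 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & i & -1 & -i \\ 1 & -1 & 1 & -1 \\ 1 & -i & -1 & i \end{pmatrix} . \qquad (15.9)$$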
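The matrix in Eq. (15.9) can be generated from Eq. (15.6) and checked to be unitary, as a transform between normalized states should be. This is a short illustration of our own, in Python.

```python
import cmath
import math

N = 4
omega = cmath.exp(2j * math.pi / N)    # Eq. (15.7); equals i (up to rounding) for N = 4
U = [[omega ** ((k * m) % N) / math.sqrt(N) for m in range(N)] for k in range(N)]

for row in U:
    # round away floating-point noise; adding 0.0 turns -0.0 into 0.0
    print([complex(round(z.real, 12) + 0.0, round(z.imag, 12) + 0.0) for z in row])

# U is unitary: U times its conjugate transpose is the identity matrix
product = [[sum(U[i][k] * U[j][k].conjugate() for k in range(N)) for j in range(N)]
           for i in range(N)]
assert all(abs(product[i][j] - (1 if i == j else 0)) < 1e-12
           for i in range(N) for j in range(N))
```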

To determine the FT, each application of Eq. (15.6) requires N additions and N multiplications for each of the N values of k, so the operation count is $O(N^2)$.
In the appendices of this chapter we describe the fast Fourier transform (FFT) which is a much
more efficient way to calculate a discrete Fourier transform. We don’t need the FFT for this course,
but I include a description of it here in the appendices partly to stimulate students’ interest in it (since
it is a gem of computer science), and partly because it bears a strong resemblance to Shor’s quantum
Fourier transform (QFT), see Chapter 16, which is the heart of his factoring algorithm. We shall show
this connection in the appendices of Chapter 16.
The Fast Fourier Transform (FFT) requires an operation count of only $N \log_2 N$, compared with the $N^2$ needed for a straightforward evaluation of Eq. (15.6) for all k. This reduction (which is considerable for large N) is possible because $\omega^n$ is a periodic function of n with period N, and so $\omega^{km}$ takes only N distinct values, even though km runs over $O(N^2)$ values.
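As a preview, here is a compact recursive radix-2 (Cooley-Tukey) FFT in Python, compared against the direct sum of Eq. (15.6). This is our own sketch of the standard recursive form. It uses the same sign and 1/√N convention as Eq. (15.3), but applies the normalization once at the end rather than a factor 1/√2 at each stage. Because it splits the data into even- and odd-indexed halves at every level, for N = 8 its innermost level combines the pairs $x_j, x_{j+4}$ used for the length-2 transforms in Appendix 15.A.

```python
import cmath
import math
import random

def dft(x):
    """Direct O(N**2) evaluation of Eq. (15.6), for comparison."""
    N = len(x)
    w = cmath.exp(2j * math.pi / N)
    return [sum(w ** ((k * m) % N) * x[m] for m in range(N)) / math.sqrt(N)
            for k in range(N)]

def _fft_sum(x):
    """Unnormalized sum_m exp(2*pi*i*k*m/N) x_m for all k, by recursive halving."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    even = _fft_sum(x[0::2])          # length-N/2 transform of x_0, x_2, x_4, ...
    odd = _fft_sum(x[1::2])           # length-N/2 transform of x_1, x_3, x_5, ...
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(2j * math.pi * k / N) * odd[k]   # omega**k times the odd part
        out[k] = even[k] + t
        out[k + N // 2] = even[k] - t                  # omega**(k + N/2) = -omega**k
    return out

def fft(x):
    N = len(x)
    if N == 0 or N & (N - 1):
        raise ValueError("N must be a power of 2 (pad the data with zeros)")
    return [v / math.sqrt(N) for v in _fft_sum(x)]

rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(64)]
print(max(abs(a - b) for a, b in zip(fft(x), dft(x))))   # tiny: the two agree
```

Each level of the recursion does O(N) work and there are $\log_2 N$ levels, which is where the $N \log_2 N$ count comes from.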
The FFT is discussed in the appendices which now follow. As mentioned above, this material is
not required for the rest of the course and can be omitted.

Appendices

15.A The Fast Fourier Transform; an example with N = 8

We will understand the Fast Fourier Transform (FFT) by first working out in detail a simple example.
The number of data points N must be a power of 2, $N = 2^n$; if it is not, one pads the data with zeroes to make it so. We will take n = 3, i.e. N = 8. Written out explicitly, the Fourier Transform for N = 8 data points is

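$$y_0 = \frac{1}{\sqrt{8}}\left(x_0 + x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7\right), \tag{15.10a}$$
$$y_1 = \frac{1}{\sqrt{8}}\left(x_0 + \omega x_1 + \omega^2 x_2 + \omega^3 x_3 + \omega^4 x_4 + \omega^5 x_5 + \omega^6 x_6 + \omega^7 x_7\right), \tag{15.10b}$$
$$y_2 = \frac{1}{\sqrt{8}}\left(x_0 + \omega^2 x_1 + \omega^4 x_2 + \omega^6 x_3 + x_4 + \omega^2 x_5 + \omega^4 x_6 + \omega^6 x_7\right), \tag{15.10c}$$
$$y_3 = \frac{1}{\sqrt{8}}\left(x_0 + \omega^3 x_1 + \omega^6 x_2 + \omega x_3 + \omega^4 x_4 + \omega^7 x_5 + \omega^2 x_6 + \omega^5 x_7\right), \tag{15.10d}$$
$$y_4 = \frac{1}{\sqrt{8}}\left(x_0 + \omega^4 x_1 + x_2 + \omega^4 x_3 + x_4 + \omega^4 x_5 + x_6 + \omega^4 x_7\right), \tag{15.10e}$$
$$y_5 = \frac{1}{\sqrt{8}}\left(x_0 + \omega^5 x_1 + \omega^2 x_2 + \omega^7 x_3 + \omega^4 x_4 + \omega x_5 + \omega^6 x_6 + \omega^3 x_7\right), \tag{15.10f}$$
$$y_6 = \frac{1}{\sqrt{8}}\left(x_0 + \omega^6 x_1 + \omega^4 x_2 + \omega^2 x_3 + x_4 + \omega^6 x_5 + \omega^4 x_6 + \omega^2 x_7\right), \tag{15.10g}$$
$$y_7 = \frac{1}{\sqrt{8}}\left(x_0 + \omega^7 x_1 + \omega^6 x_2 + \omega^5 x_3 + \omega^4 x_4 + \omega^3 x_5 + \omega^2 x_6 + \omega x_7\right), \tag{15.10h}$$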

where the $x_j$ are the original data, the $y_j$ are the Fourier transformed data,
$$\omega = \exp(2\pi i/8) = \frac{1}{\sqrt{2}}(1 + i), \tag{15.11}$$
and we note that
$$\omega^8 = 1 = \omega^0, \tag{15.12}$$
so we have reduced all the powers of $\omega$ to lie between 0 and 7 ($= N - 1$). We also note that
$$\omega^2 = i, \qquad \omega^4 = -1. \tag{15.13}$$
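The reduced exponents in Eqs. (15.10) are simply $km \bmod 8$. The short Python sketch below (an illustration only; the variable names are hypothetical) regenerates that exponent table and checks Eqs. (15.11) and (15.13) numerically.

```python
import cmath
import math

N = 8
omega = cmath.exp(2j * math.pi / N)  # Eq. (15.11)

# exponents[k][m] is the power of omega multiplying x_m in y_k, reduced using omega**8 = 1
exponents = [[(k * m) % N for m in range(N)] for k in range(N)]
for k, row in enumerate(exponents):
    print(f"y{k}:", row)

print("omega   =", omega)       # (1 + i)/sqrt(2)
print("omega^2 =", omega ** 2)  # i, up to rounding
print("omega^4 =", omega ** 4)  # -1, up to rounding
```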
122CHAPTER 15. THE FOURIER TRANSFORM AND THE FAST FOURIER TRANSFORM (FFT)

To evaluate Eqs. (15.10) efficiently, the FFT proceeds recursively. We first define Fourier transforms of length 2:

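$$u_0 = \frac{1}{\sqrt{2}}(x_0 + x_4) = \frac{1}{\sqrt{2}}(x_0 + \omega^{4k} x_4) \quad (k = 0), \tag{15.14a}$$
$$u_1 = \frac{1}{\sqrt{2}}(x_1 + x_5) = \frac{1}{\sqrt{2}}(x_1 + \omega^{4k} x_5) \quad (k = 0), \tag{15.14b}$$
$$u_2 = \frac{1}{\sqrt{2}}(x_2 + x_6) = \frac{1}{\sqrt{2}}(x_2 + \omega^{4k} x_6) \quad (k = 0), \tag{15.14c}$$
$$u_3 = \frac{1}{\sqrt{2}}(x_3 + x_7) = \frac{1}{\sqrt{2}}(x_3 + \omega^{4k} x_7) \quad (k = 0), \tag{15.14d}$$
$$u_4 = \frac{1}{\sqrt{2}}(x_0 - x_4) = \frac{1}{\sqrt{2}}(x_0 + \omega^{4k} x_4) \quad (k = 1), \tag{15.14e}$$
$$u_5 = \frac{1}{\sqrt{2}}(x_1 - x_5) = \frac{1}{\sqrt{2}}(x_1 + \omega^{4k} x_5) \quad (k = 1), \tag{15.14f}$$
$$u_6 = \frac{1}{\sqrt{2}}(x_2 - x_6) = \frac{1}{\sqrt{2}}(x_2 + \omega^{4k} x_6) \quad (k = 1), \tag{15.14g}$$
$$u_7 = \frac{1}{\sqrt{2}}(x_3 - x_7) = \frac{1}{\sqrt{2}}(x_3 + \omega^{4k} x_7) \quad (k = 1). \tag{15.14h}$$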

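Pairs of quantities in Eqs. (15.14) are combined into Fourier Transforms of length 4:
$$v_0 = \frac{1}{\sqrt{2}}(u_0 + u_2) = \frac{1}{\sqrt{2}}(u_0 + \omega^{2k} u_2) \quad (k = 0), \tag{15.15a}$$
$$v_1 = \frac{1}{\sqrt{2}}(u_1 + u_3) = \frac{1}{\sqrt{2}}(u_1 + \omega^{2k} u_3) \quad (k = 0), \tag{15.15b}$$
$$v_2 = \frac{1}{\sqrt{2}}(u_4 + i u_6) = \frac{1}{\sqrt{2}}(u_4 + \omega^{2k} u_6) \quad (k = 1), \tag{15.15c}$$
$$v_3 = \frac{1}{\sqrt{2}}(u_5 + i u_7) = \frac{1}{\sqrt{2}}(u_5 + \omega^{2k} u_7) \quad (k = 1), \tag{15.15d}$$
$$v_4 = \frac{1}{\sqrt{2}}(u_0 - u_2) = \frac{1}{\sqrt{2}}(u_0 + \omega^{2k} u_2) \quad (k = 2), \tag{15.15e}$$
$$v_5 = \frac{1}{\sqrt{2}}(u_1 - u_3) = \frac{1}{\sqrt{2}}(u_1 + \omega^{2k} u_3) \quad (k = 2), \tag{15.15f}$$
$$v_6 = \frac{1}{\sqrt{2}}(u_4 - i u_6) = \frac{1}{\sqrt{2}}(u_4 + \omega^{2k} u_6) \quad (k = 3), \tag{15.15g}$$
$$v_7 = \frac{1}{\sqrt{2}}(u_5 - i u_7) = \frac{1}{\sqrt{2}}(u_5 + \omega^{2k} u_7) \quad (k = 3), \tag{15.15h}$$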

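and finally pairs of quantities in Eqs. (15.15) are combined to form the Fourier Transform in Eqs. (15.10):
$$y_0 = \frac{1}{\sqrt{2}}(v_0 + v_1) = \frac{1}{\sqrt{2}}(v_0 + \omega^{k} v_1) \quad (k = 0), \tag{15.16a}$$
$$y_1 = \frac{1}{\sqrt{2}}(v_2 + \omega v_3) = \frac{1}{\sqrt{2}}(v_2 + \omega^{k} v_3) \quad (k = 1), \tag{15.16b}$$
$$y_2 = \frac{1}{\sqrt{2}}(v_4 + i v_5) = \frac{1}{\sqrt{2}}(v_4 + \omega^{k} v_5) \quad (k = 2), \tag{15.16c}$$
$$y_3 = \frac{1}{\sqrt{2}}(v_6 + \omega^3 v_7) = \frac{1}{\sqrt{2}}(v_6 + \omega^{k} v_7) \quad (k = 3), \tag{15.16d}$$
$$y_4 = \frac{1}{\sqrt{2}}(v_0 - v_1) = \frac{1}{\sqrt{2}}(v_0 + \omega^{k} v_1) \quad (k = 4), \tag{15.16e}$$
$$y_5 = \frac{1}{\sqrt{2}}(v_2 - \omega v_3) = \frac{1}{\sqrt{2}}(v_2 + \omega^{k} v_3) \quad (k = 5), \tag{15.16f}$$
$$y_6 = \frac{1}{\sqrt{2}}(v_4 - i v_5) = \frac{1}{\sqrt{2}}(v_4 + \omega^{k} v_5) \quad (k = 6), \tag{15.16g}$$
$$y_7 = \frac{1}{\sqrt{2}}(v_6 - \omega^3 v_7) = \frac{1}{\sqrt{2}}(v_6 + \omega^{k} v_7) \quad (k = 7). \tag{15.16h}$$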

Equations (15.14)–(15.16) are represented graphically by Fig. 15.1.

We see that the FFT, specified by Eqs. (15.14)–(15.16), requires $8 \times 3$ ($= N\log_2 N$ for $N = 8$) additions and multiplications, whereas a direct evaluation of the FT according to Eq. (15.10) takes $8 \times 8$ ($= N^2$) additions and multiplications. For large $N$, the speedup factor, $N/\log_2 N$, gained by using the FFT rather than direct evaluation of the FT is considerable.
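The three stages can be transcribed almost line by line into code. The following is a minimal Python sketch, assuming the eight data points are given as a list; `fft8` is a hypothetical name. Rather than typing out all 24 lines of Eqs. (15.14)–(15.16), the loops use the compact index patterns derived in Sec. 15.B, which reproduce those lines exactly. Each stage computes 8 outputs with one multiplication and one addition apiece (apart from the common factor $1/\sqrt{2}$), which is where the $8 \times 3$ count comes from.

```python
import cmath
import math


def fft8(x):
    """FFT for N = 8, one stage per set of equations (15.14), (15.15), (15.16)."""
    if len(x) != 8:
        raise ValueError("this sketch handles exactly 8 data points")
    w = cmath.exp(2j * math.pi / 8)  # omega, Eq. (15.11)
    s = 1 / math.sqrt(2)

    # Eqs. (15.14): u_{4k+p} = (x_p + w^{4k} x_{p+4}) / sqrt(2), k = 0, 1
    u = [0j] * 8
    for k in range(2):
        for p in range(4):
            u[4 * k + p] = s * (x[p] + w ** (4 * k) * x[p + 4])

    # Eqs. (15.15): v_{2k+p} = (u_{4k+p} + w^{2k} u_{4k+p+2}) / sqrt(2), indices mod 8
    v = [0j] * 8
    for k in range(4):
        for p in range(2):
            v[2 * k + p] = s * (u[(4 * k + p) % 8] + w ** (2 * k) * u[(4 * k + p + 2) % 8])

    # Eqs. (15.16): y_k = (v_{2k} + w^k v_{2k+1}) / sqrt(2), indices mod 8
    return [s * (v[(2 * k) % 8] + w ** k * v[(2 * k + 1) % 8]) for k in range(8)]


data = [1, 2, 0, -1, 3, 0, 0, 1]
print(fft8(data))
```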

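[Figure 15.1: butterfly diagram for the $N = 8$ FFT. Columns from left to right: $x_0, x_4, x_2, x_6, x_1, x_5, x_3, x_7$; then $u_0, u_4, u_2, u_6, u_1, u_5, u_3, u_7$; then $v_0, v_2, v_4, v_6, v_1, v_3, v_5, v_7$; then $y_0, \dots, y_7$. Legend: $-1$, $\omega$, $\omega^2 = i$, $\omega^3$.]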

Figure 15.1: A graphical representation of Eqs. (15.14)–(15.16), which is the FFT for $N = 8$ ($= 2^n$ with $n = 3$). The original data are the $x_j$ and the Fourier transformed data are the $y_j$. The dashed (red) lines have a factor of $-1$ and the solid lines have a factor of 1. The thick (green) circle transmits a factor of $\omega$ to the right, the dashed (blue) circles transmit a factor of $\omega^2$ ($= i$) to the right, and the (brown) filled-in circle transmits a factor of $\omega^3$ to the right. In Sec. 15.C we will change to a notation applicable for general $n$, as follows: $y_j \equiv x_j^{(0)}$, $v_j \equiv x_j^{(1)}$, $u_j \equiv x_j^{(2)}$, and $x_j \equiv x_j^{(3)}$. (Adapted from R. Vathsan, Introduction to Quantum Physics and Information Processing.)

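Let’s check that this works by evaluating $y_1$. We have
$$y_1 = \frac{1}{\sqrt{2}}(v_2 + \omega v_3) \tag{15.17a}$$
$$= \frac{1}{2}\left(u_4 + i u_6 + \omega(u_5 + i u_7)\right) = \frac{1}{2}\left(u_4 + \omega^2 u_6 + \omega u_5 + \omega^3 u_7\right) \tag{15.17b}$$
$$= \frac{1}{\sqrt{8}}\left(x_0 - x_4 + \omega^2(x_2 - x_6) + \omega(x_1 - x_5) + \omega^3(x_3 - x_7)\right) \tag{15.17c}$$
$$= \frac{1}{\sqrt{8}}\left(x_0 + \omega x_1 + \omega^2 x_2 + \omega^3 x_3 + \omega^4 x_4 + \omega^5 x_5 + \omega^6 x_6 + \omega^7 x_7\right), \tag{15.17d}$$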

which agrees with Eq. (15.10b). We have used Eq. (15.16b) to get Eq. (15.17a), Eqs. (15.15c) and
(15.15d) to get Eq. (15.17b), and Eqs. (15.14e), (15.14g), (15.14f) and (15.14h) to get Eq. (15.17c).
Equation (15.17d) is the same as Eq. (15.17c) with powers of ω written out explicitly using Eq. (15.13).
It is instructive to write the linear transformations in Eqs. (15.10), (15.14), (15.15) and (15.16) in matrix form. Equation (15.10) becomes
$$\vec y = U\vec x, \tag{15.18}$$

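where
$$U = \frac{1}{\sqrt{8}}\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & \omega & \omega^2 & \omega^3 & \omega^4 & \omega^5 & \omega^6 & \omega^7 \\
1 & \omega^2 & \omega^4 & \omega^6 & 1 & \omega^2 & \omega^4 & \omega^6 \\
1 & \omega^3 & \omega^6 & \omega & \omega^4 & \omega^7 & \omega^2 & \omega^5 \\
1 & \omega^4 & 1 & \omega^4 & 1 & \omega^4 & 1 & \omega^4 \\
1 & \omega^5 & \omega^2 & \omega^7 & \omega^4 & \omega & \omega^6 & \omega^3 \\
1 & \omega^6 & \omega^4 & \omega^2 & 1 & \omega^6 & \omega^4 & \omega^2 \\
1 & \omega^7 & \omega^6 & \omega^5 & \omega^4 & \omega^3 & \omega^2 & \omega
\end{pmatrix}. \tag{15.19}$$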
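Equation (15.14) in matrix form is
$$\vec u = D\vec x, \tag{15.20}$$
where
$$D = \frac{1}{\sqrt{2}}\begin{pmatrix}
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & \omega^4 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & \omega^4 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & \omega^4 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & \omega^4
\end{pmatrix}. \tag{15.21}$$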
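Equation (15.15) in matrix form is
$$\vec v = E\vec u, \tag{15.22}$$
where
$$E = \frac{1}{\sqrt{2}}\begin{pmatrix}
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & \omega^2 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & \omega^2 \\
1 & 0 & \omega^4 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & \omega^4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & \omega^6 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & \omega^6
\end{pmatrix}. \tag{15.23}$$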
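Equation (15.16) in matrix form is
$$\vec y = F\vec v, \tag{15.24}$$
where
$$F = \frac{1}{\sqrt{2}}\begin{pmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & \omega & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & \omega^2 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & \omega^3 \\
1 & \omega^4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & \omega^5 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & \omega^6 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & \omega^7
\end{pmatrix}. \tag{15.25}$$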
Notice that $D$, $E$ and $F$, which describe the FFT, are very sparse: each has only two nonzero entries in every row and column, so multiplying a vector by any one of them takes only $O(N)$ operations, whereas the matrix $U$, which describes the original Fourier transform, is dense. With some tedious matrix manipulations one can verify that
$$U = FED, \tag{15.26}$$
as required. (I used Mathematica.)
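As a stand-in for the Mathematica check, here is a minimal Python sketch (standard library only) that builds $D$, $E$ and $F$ from the row patterns of Eqs. (15.14)–(15.16), forms $FED$, and compares it with $U$. The helper names `zeros` and `matmul` are hypothetical; plain nested lists keep the sketch self-contained.

```python
import cmath
import math

N = 8
w = cmath.exp(2j * math.pi / N)  # omega, Eq. (15.11)
s = 1 / math.sqrt(2)


def zeros():
    return [[0j] * N for _ in range(N)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


# U of Eq. (15.19): U[k][m] = w^{km} / sqrt(8)
U = [[w ** ((k * m) % N) / math.sqrt(N) for m in range(N)] for k in range(N)]

# D of Eq. (15.21): u_{4k+p} = (x_p + w^{4k} x_{p+4}) / sqrt(2)
D = zeros()
for k in range(2):
    for p in range(4):
        D[4 * k + p][p] += s
        D[4 * k + p][p + 4] += s * w ** (4 * k)

# E of Eq. (15.23): v_{2k+p} = (u_{4k+p} + w^{2k} u_{4k+p+2}) / sqrt(2), indices mod 8
E = zeros()
for k in range(4):
    for p in range(2):
        E[2 * k + p][(4 * k + p) % N] += s
        E[2 * k + p][(4 * k + p + 2) % N] += s * w ** (2 * k)

# F of Eq. (15.25): y_k = (v_{2k} + w^k v_{2k+1}) / sqrt(2), indices mod 8
F = zeros()
for k in range(N):
    F[k][(2 * k) % N] += s
    F[k][(2 * k + 1) % N] += s * w ** k

FED = matmul(F, matmul(E, D))
max_error = max(abs(FED[i][j] - U[i][j]) for i in range(N) for j in range(N))
print("max |FED - U| =", max_error)  # rounding-level, i.e. U = FED
```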

15.B Beyond N = 8
Now we discuss how we obtained Eqs. (15.14)–(15.16). For a general value of $n$, with $N = 2^n$, the FT is defined by
$$y_k = \frac{1}{\sqrt{N}}\sum_{m=0}^{N-1}\omega^{km}x_m, \qquad (k = 0, 1, \dots, N-1) \tag{15.27}$$

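with $\omega$ given by Eq. (15.7). We can break Eq. (15.27) into even and odd terms as follows:
$$\begin{aligned} y_k &= \frac{1}{\sqrt{N}}\left[\sum_{m=0}^{N/2-1}\omega^{2km}x_{2m} + \sum_{m=0}^{N/2-1}\omega^{k(2m+1)}x_{2m+1}\right] \\ &= \frac{1}{\sqrt{2}}\left[\sqrt{\frac{2}{N}}\sum_{m=0}^{N/2-1}(\omega^2)^{km}x_{2m} + \omega^k\sqrt{\frac{2}{N}}\sum_{m=0}^{N/2-1}(\omega^2)^{km}x_{2m+1}\right], \qquad (k = 0, 1, \dots, N-1). \end{aligned} \tag{15.28}$$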

Noting that $\omega^2$ is the complex exponential factor, analogous to Eq. (15.7), which appears in a Fourier Transform with $N/2$ points, we see that the first term in Eq. (15.28) is the FT of the $N/2$ even-indexed points and the second term is the FT of the $N/2$ odd-indexed points. We can write Eq. (15.28) as
$$y_k = \frac{1}{\sqrt{2}}\left[v_{2k} + \omega^k v_{2k+1}\right], \qquad (k = 0, 1, \dots, N-1), \tag{15.29}$$

where
$$v_{2k} = \sqrt{\frac{2}{N}}\sum_{m=0}^{N/2-1}(\omega^2)^{km}x_{2m}, \tag{15.30a}$$
$$v_{2k+1} = \sqrt{\frac{2}{N}}\sum_{m=0}^{N/2-1}(\omega^2)^{km}x_{2m+1}, \qquad (k = 0, 1, \dots, N-1). \tag{15.30b}$$

Here $k$ runs over the range $0, 1, \dots, N-1$, so the indices on the $v_j$ in Eqs. (15.30) run from 0 to $2N-1$. However, since $\omega^N = 1$, see Eq. (15.7), it follows from the definition of the $v_j$ in Eqs. (15.30) that $v_{j+N} = v_j$. Hence the index $j$ of $v_j$ is to be evaluated modulo $N$. This applies in an obvious way to other quantities as well, such as the $u_j$, and, in Sec. 15.C, to the lower index on the $x_j^{(\ell)}$.
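Equations (15.29) and (15.30) translate directly into a recursive program: transform the even-indexed and odd-indexed points separately, then combine them. Here is a minimal Python sketch, assuming the length is a power of 2 (`fft_recursive` is a hypothetical name). Because $v_{2k}$ and $v_{2k+1}$ repeat with period $N/2$ in $k$, the half-length results are indexed with $k \bmod N/2$, which is the modulo rule just stated.

```python
import cmath
import math


def fft_recursive(x):
    """Normalized DFT of Eq. (15.27) via the even/odd split of Eqs. (15.29)-(15.30).

    len(x) must be a power of 2.
    """
    N = len(x)
    if N == 1:
        return list(x)
    if N % 2:
        raise ValueError("length must be a power of 2")
    w = cmath.exp(2j * math.pi / N)
    even = fft_recursive(x[0::2])  # v_{2k}:   FT of the N/2 even-indexed points
    odd = fft_recursive(x[1::2])   # v_{2k+1}: FT of the N/2 odd-indexed points
    half = N // 2
    # Eq. (15.29); the sub-transforms have period N/2 in k, hence k % half
    return [(even[k % half] + w ** k * odd[k % half]) / math.sqrt(2) for k in range(N)]


print(fft_recursive([1, 0, 0, 0, 0, 0, 0, 0]))  # all entries equal 1/sqrt(8)
```

Each level of the recursion divides by $\sqrt{2}$, and there are $\log_2 N$ levels, so the overall factor is $1/\sqrt{N}$, as Eq. (15.27) requires.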
For $N = 8$ please check that Eq. (15.29) corresponds to our Eqs. (15.16) for $k = 0, 1, 2, \dots, 7$ and that, according to Eqs. (15.30), the expressions for the $v_k$ in terms of the original data $x_m$ are
$$v_0 = \frac{1}{2}\sum_{m=0}^{3}x_{2m}, \quad v_2 = \frac{1}{2}\sum_{m=0}^{3}(\omega^2)^{m}x_{2m}, \quad v_4 = \frac{1}{2}\sum_{m=0}^{3}(\omega^2)^{2m}x_{2m}, \quad v_6 = \frac{1}{2}\sum_{m=0}^{3}(\omega^2)^{3m}x_{2m}, \tag{15.31a}$$
$$v_1 = \frac{1}{2}\sum_{m=0}^{3}x_{2m+1}, \quad v_3 = \frac{1}{2}\sum_{m=0}^{3}(\omega^2)^{m}x_{2m+1}, \quad v_5 = \frac{1}{2}\sum_{m=0}^{3}(\omega^2)^{2m}x_{2m+1}, \quad v_7 = \frac{1}{2}\sum_{m=0}^{3}(\omega^2)^{3m}x_{2m+1}, \tag{15.31b}$$
so $v_0, v_2, v_4$ and $v_6$ are the FTs of the 4 even-indexed points for $k = 0, 1, 2$ and 3, respectively, while $v_1, v_3, v_5$ and $v_7$ are the FTs of the 4 odd-indexed points for $k = 0, 1, 2$ and 3, respectively.

We can again separate each of Eqs. (15.30) into even and odd terms by analogy with Eq. (15.28). We have
$$v_{2k} = \sqrt{\frac{2}{N}}\left[\sum_{m=0}^{N/4-1}(\omega^4)^{km}x_{4m} + (\omega^2)^k\sum_{m=0}^{N/4-1}(\omega^4)^{km}x_{4m+2}\right], \tag{15.32a}$$
$$v_{2k+1} = \sqrt{\frac{2}{N}}\left[\sum_{m=0}^{N/4-1}(\omega^4)^{km}x_{4m+1} + (\omega^2)^k\sum_{m=0}^{N/4-1}(\omega^4)^{km}x_{4m+3}\right]. \tag{15.32b}$$

We can write these equations as
$$v_{2k} = \frac{1}{\sqrt{2}}\left[u_{4k} + (\omega^2)^k u_{4k+2}\right], \tag{15.33a}$$
$$v_{2k+1} = \frac{1}{\sqrt{2}}\left[u_{4k+1} + (\omega^2)^k u_{4k+3}\right], \qquad (k = 0, 1, \dots, N/2-1), \tag{15.33b}$$

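where
$$u_{4k} = \sqrt{\frac{4}{N}}\sum_{m=0}^{N/4-1}(\omega^4)^{km}x_{4m}, \qquad u_{4k+1} = \sqrt{\frac{4}{N}}\sum_{m=0}^{N/4-1}(\omega^4)^{km}x_{4m+1}, \tag{15.34a}$$
$$u_{4k+2} = \sqrt{\frac{4}{N}}\sum_{m=0}^{N/4-1}(\omega^4)^{km}x_{4m+2}, \qquad u_{4k+3} = \sqrt{\frac{4}{N}}\sum_{m=0}^{N/4-1}(\omega^4)^{km}x_{4m+3}. \tag{15.34b}$$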

Note that the two equations in Eqs. (15.33) can be combined as
$$v_{2k+p} = \frac{1}{\sqrt{2}}\left[u_{4k+p} + (\omega^2)^k u_{4k+p+2}\right], \qquad (p = 0, 1), \quad (k = 0, 1, \dots, N/2-1). \tag{15.35}$$
Again, the index $j$ on the $u_j$ is to be evaluated modulo $N$.

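For $N = 8$ please check that Eq. (15.35) corresponds to our Eqs. (15.15) for $p = 0, 1$ and $k = 0, 1, 2$ and 3, and that, according to Eqs. (15.34), the explicit expressions for the $u_j$ are
$$u_0 = \frac{1}{\sqrt{2}}\sum_{m=0}^{1}x_{4m} = \frac{1}{\sqrt{2}}(x_0 + x_4), \qquad u_1 = \frac{1}{\sqrt{2}}\sum_{m=0}^{1}x_{4m+1} = \frac{1}{\sqrt{2}}(x_1 + x_5), \tag{15.36a}$$
$$u_2 = \frac{1}{\sqrt{2}}\sum_{m=0}^{1}x_{4m+2} = \frac{1}{\sqrt{2}}(x_2 + x_6), \qquad u_3 = \frac{1}{\sqrt{2}}\sum_{m=0}^{1}x_{4m+3} = \frac{1}{\sqrt{2}}(x_3 + x_7), \tag{15.36b}$$
$$u_4 = \frac{1}{\sqrt{2}}\sum_{m=0}^{1}(\omega^4)^m x_{4m} = \frac{1}{\sqrt{2}}(x_0 - x_4), \qquad u_5 = \frac{1}{\sqrt{2}}\sum_{m=0}^{1}(\omega^4)^m x_{4m+1} = \frac{1}{\sqrt{2}}(x_1 - x_5), \tag{15.36c}$$
$$u_6 = \frac{1}{\sqrt{2}}\sum_{m=0}^{1}(\omega^4)^m x_{4m+2} = \frac{1}{\sqrt{2}}(x_2 - x_6), \qquad u_7 = \frac{1}{\sqrt{2}}\sum_{m=0}^{1}(\omega^4)^m x_{4m+3} = \frac{1}{\sqrt{2}}(x_3 - x_7). \tag{15.36d}$$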

Equations (15.36) agree with the expressions in Eqs. (15.14). They can be combined into the single equation
$$u_{4k+p} = \frac{1}{\sqrt{2}}\left[x_p + (-1)^k x_{p+4}\right], \qquad (p = 0, 1, 2, 3), \quad (k = 0, 1). \tag{15.37}$$

Thus we have seen that the FFT for $N = 8$ ($= 2^n$ with $n = 3$), which is written out explicitly in Eqs. (15.14)–(15.16), corresponds to first doing the Fourier transforms of length 2 in Eq. (15.37), followed by two applications of the iterative procedure, the first shown in Eq. (15.35) and the second shown in Eq. (15.29).

15.C The General Case

So far we have unsystematically labeled the results at each stage of the iteration by a different symbol, $x \to u \to v \to y$, see Fig. 15.1. When writing a computer program applicable for $N = 2^n$ data points with arbitrary $n$, one would use a common symbol but add a superscript index, so
$$x_j^{(n)} \equiv x_j, \tag{15.38a}$$
$$\vdots$$
$$u_j \equiv x_j^{(2)}, \tag{15.38b}$$
$$v_j \equiv x_j^{(1)}, \tag{15.38c}$$
$$y_j \equiv x_j^{(0)}. \tag{15.38d}$$

Note that since $\omega = \exp(2\pi i/2^n)$ we have
$$\omega^{2^n} = \exp(2\pi i) = 1, \qquad \omega^{2^{n-1}} = \exp(\pi i) = -1. \tag{15.39}$$

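The $\ell$-th iteration, analogous to Eqs. (15.35), (15.29) and (15.37), is
$$x^{(\ell-1)}_{2^{\ell-1}k+p} = \frac{1}{\sqrt{2}}\left[x^{(\ell)}_{2^{\ell}k+p} + \left(\omega^{2^{\ell-1}}\right)^k x^{(\ell)}_{2^{\ell}k+p+2^{\ell-1}}\right], \tag{15.40}$$
with
$$p = 0, 1, \dots, 2^{\ell-1}-1, \qquad k = 0, 1, \dots, 2^{n-\ell+1}-1. \tag{15.41}$$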
I’m sorry that the notation is messy, but I can’t see how to improve it; one just has to keep track of the indices and the powers of $\omega$. Recall that the lower index $j$ on the $x_j^{(\ell)}$ is to be evaluated modulo $2^n$.
Let’s see how this works.
• We start with $\ell = n$, for which $x_j^{(\ell)} \equiv x_j$, the original data points. Using Eq. (15.39), Eq. (15.40) is then
$$x^{(n-1)}_{2^{n-1}k+p} = \frac{1}{\sqrt{2}}\left[x_p + (-1)^k x_{p+2^{n-1}}\right], \qquad (p = 0, 1, \dots, 2^{n-1}-1), \quad (k = 0, 1). \tag{15.42}$$
For $n = 3$ ($N = 8$) this corresponds to Eq. (15.37) with $x_j^{(n-1)} \equiv u_j$.

• We then iterate Eq. (15.40) for $\ell = n-1, n-2, \dots, 2, 1$.

At the next-to-last iteration, $\ell = 2$, we have
$$x^{(1)}_{2k+p} = \frac{1}{\sqrt{2}}\left[x^{(2)}_{4k+p} + (\omega^2)^k x^{(2)}_{4k+p+2}\right], \qquad (p = 0, 1), \quad (k = 0, 1, \dots, 2^{n-1}-1), \tag{15.43}$$
which corresponds to Eq. (15.35) with $x_j^{(1)} \equiv v_j$ and $x_j^{(2)} \equiv u_j$. At the last iteration, $\ell = 1$, we obtain
$$y_k = \frac{1}{\sqrt{2}}\left[x^{(1)}_{2k} + \omega^k x^{(1)}_{2k+1}\right], \qquad (k = 0, 1, 2, \dots, 2^n-1), \tag{15.44}$$
which is Eq. (15.29). (Recall that $x_j^{(0)} \equiv y_j$, the Fourier transformed data, and $x_j^{(1)} \equiv v_j$.)

Note that the iterations are evaluated in reverse order, starting with $\ell = n$ and working down to $\ell = 1$.
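The general iteration can be coded without recursion by storing the current level $x_j^{(\ell)}$ in an array and replacing it once per value of $\ell$. Here is a minimal Python sketch of Eqs. (15.40) and (15.41), assuming $N = 2^n$ data points in a list (`fft_iterative` is a hypothetical name). Each of the $n$ levels costs $O(N)$ operations, giving the $N\log_2 N$ total quoted earlier.

```python
import cmath
import math


def fft_iterative(x):
    """Normalized DFT of Eq. (15.27) by iterating Eq. (15.40) from l = n down to l = 1.

    len(x) must be 2**n. `level` holds x_j^{(l)} for the current l.
    """
    N = len(x)
    n = N.bit_length() - 1
    if N == 0 or 2 ** n != N:
        raise ValueError("length must be a power of 2")
    w = cmath.exp(2j * math.pi / N)
    s = 1 / math.sqrt(2)
    level = [complex(v) for v in x]  # x^{(n)}: the original data
    for l in range(n, 0, -1):        # l = n, n-1, ..., 1
        half = 2 ** (l - 1)
        twiddle = w ** half          # omega^{2^{l-1}}
        new = [0j] * N
        for k in range(2 ** (n - l + 1)):  # ranges of Eq. (15.41)
            for p in range(half):
                a = level[(2 ** l * k + p) % N]         # lower indices taken mod 2^n
                b = level[(2 ** l * k + p + half) % N]
                new[half * k + p] = s * (a + twiddle ** k * b)
        level = new                  # now holds x^{(l-1)}
    return level                     # x^{(0)}: the transformed data y_k


print(fft_iterative([1, 1, 1, 1, 0, 0, 0, 0]))
```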
Chapter 16

The Quantum Fourier Transform (QFT)

16.1 Introduction
This chapter introduces the quantum Fourier transform (QFT), which is at the heart of Shor’s algorithm for period finding, and hence for factoring. Shor’s algorithm will be discussed in Chapter 17.
The appendices of this chapter make a detailed comparison with the (classical) Fast Fourier Transform (FFT). The FFT is not part of the course, so if you are not interested in this comparison you can ignore the appendices.
The QFT can be defined as follows. Starting with $n$ qubits in a single computational basis state $|x\rangle_n$, where $x$ is an $n$-bit integer, one generates the following superposition:
$$|x\rangle_n \xrightarrow{\text{QFT}} |\psi_x\rangle_n = \frac{1}{2^{n/2}}\sum_{y=0}^{2^n-1}\exp[2\pi i xy/2^n]\,|y\rangle_n, \tag{16.1}$$
where $y$ is also an $n$-bit integer.

The real power of the QFT arises, of course, because it acts in parallel on a superposition: if one inputs $\sum_{x=0}^{2^n-1}a_x|x\rangle_n$, then
$$\sum_{x=0}^{2^n-1}a_x|x\rangle_n \xrightarrow{\text{QFT}} \frac{1}{2^{n/2}}\sum_{x=0}^{2^n-1}a_x\sum_{y=0}^{2^n-1}\exp[2\pi i xy/2^n]\,|y\rangle_n = \frac{1}{2^{n/2}}\sum_{y=0}^{2^n-1}\left[\sum_{x=0}^{2^n-1}a_x\exp[2\pi i xy/2^n]\right]|y\rangle_n. \tag{16.2}$$
The circuit to perform the QFT, the derivation of which is the main topic of this chapter and which
is shown below in Fig. 16.5, takes no more time to act on the superposition in Eq. (16.2) than on the
single basis state in Eq. (16.1). This is where the power of the QFT lies.
Note that the effect of the QFT acting on a superposition, given in Eq. (16.2), can be written as
$$\sum_{x=0}^{2^n-1}a_x|x\rangle_n \xrightarrow{\text{QFT}} \sum_{y=0}^{2^n-1}a'_y|y\rangle_n, \tag{16.3}$$
where the transformed amplitudes $a'_y$ are related to the original amplitudes $a_x$ by
$$a'_y = \frac{1}{2^{n/2}}\sum_{x=0}^{2^n-1}\exp[2\pi i xy/2^n]\,a_x, \tag{16.4}$$
which is a discrete Fourier transform of the amplitudes. This transformation of the amplitudes is a useful alternative way of defining a QFT, and is equivalent to Eq. (16.1).
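Because Eq. (16.4) is an ordinary discrete Fourier transform of the amplitude vector (with the same sign convention as Eq. (15.6), and $N = 2^n$), it can be evaluated classically for small $n$. The following minimal Python sketch assumes the $2^n$ amplitudes are stored in a list (`qft_amplitudes` is a hypothetical name). It is a classical simulation costing $O(4^n)$ operations, not a model of the quantum circuit.

```python
import cmath
import math


def qft_amplitudes(a):
    """Transformed amplitudes a'_y of Eq. (16.4); len(a) must be 2**n."""
    dim = len(a)  # 2**n
    return [sum(cmath.exp(2j * math.pi * x * y / dim) * a[x] for x in range(dim)) / math.sqrt(dim)
            for y in range(dim)]


# A basis state |x> = |5> with n = 3 qubits is mapped to the phases of Eq. (16.1).
basis = [0j] * 8
basis[5] = 1
print([round(abs(z), 6) for z in qft_amplitudes(basis)])  # all equal to 1/sqrt(8)

# The map preserves the norm of a normalized superposition, as a unitary map must.
a = [complex(x + 1, -x) for x in range(8)]
norm = math.sqrt(sum(abs(z) ** 2 for z in a))
a = [z / norm for z in a]
print(sum(abs(z) ** 2 for z in qft_amplitudes(a)))  # 1.0 up to rounding
```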


16.2 QFT with two qubits

To help us derive the circuit which generates the transformation in Eq. (16.1) we start with just n = 2
qubits.
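The Quantum Fourier Transform (QFT) in Eq. (16.1) for $n = 2$ qubits is
$$|\psi_x\rangle_2 = \frac{1}{2}\sum_{y=0}^{3}\exp\left(2\pi i xy/2^2\right)|y\rangle_2, \tag{16.5}$$
and the resulting states are orthonormal:
$${}_2\langle\psi_x|\psi_{x'}\rangle_2 = \delta_{x,x'}. \tag{16.6}$$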

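Noting that $y = y_0 + 2y_1$ and $x = x_0 + 2x_1$, we can simplify the argument of the exponential:
$$\frac{2\pi i xy}{2^2} = \frac{2\pi i(x_0 + 2x_1)(y_0 + 2y_1)}{2^2} = 2\pi i\left\{y_0\left(\frac{x_0}{4} + \frac{x_1}{2}\right) + y_1\left(\frac{x_0}{2} + x_1\right)\right\}. \tag{16.7}$$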

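Now $\exp(2\pi i y_1 x_1) = 1$, so the term $y_1 x_1$ above can be dropped. Hence Eq. (16.5) becomes
$$|\psi_x\rangle_2 = \left[\frac{1}{\sqrt{2}}\sum_{y_0=0}^{1}\exp\left[2\pi i y_0\left(\frac{x_0}{4} + \frac{x_1}{2}\right)\right]\right]\left[\frac{1}{\sqrt{2}}\sum_{y_1=0}^{1}\exp\left[2\pi i y_1\frac{x_0}{2}\right]\right]|y_1 y_0\rangle. \tag{16.8}$$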

Next we will explain how to perform the operations in Eq. (16.8) using quantum gates.
Consider the second factor on the RHS of Eq. (16.8), which involves a sum over $y_1$. If $x_0 = 1$ the exponential is 1 for $y_1 = 0$ and is $-1$ for $y_1 = 1$. If $x_0 = 0$ the exponential is always 1. This functionality is provided by a Hadamard gate $H$, since

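$$H|x_0\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + (-1)^{x_0}|1\rangle\right) = \frac{1}{\sqrt{2}}\sum_{y_1=0}^{1}(-1)^{x_0 y_1}|y_1\rangle = \frac{1}{\sqrt{2}}\sum_{y_1=0}^{1}\exp[2\pi i y_1 x_0/2]\,|y_1\rangle, \tag{16.9}$$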

where we denote by y1 the dummy summation variable in order to correspond with the notation in
the second factor on the RHS of Eq. (16.8). We therefore see that the second factor on the RHS of
Eq. (16.8), including the sum over y1 , is generated by the Hadamard gate shown in Fig. 16.1.

[Figure 16.1: a Hadamard gate $H$ acting on the input qubit $|x_0\rangle$; the output qubit is labeled $y_1$.]

What about the first factor on the RHS of Eq. (16.8), which involves $y_0$? There are two pieces in the exponential. The piece involving $2\pi i y_0 x_1/2$, including the sum over $y_0$, can be dealt with by a Hadamard, similar to Fig. 16.1 but with the input qubit being $x_1$ and the output qubit labeled by $y_0$. However, the piece involving $2\pi i y_0 x_0/4$ is different. It induces a phase shift of $e^{i\pi/2}$ for $y_0 = 1$ provided that $x_0$ is also 1. This requires a controlled phase gate. We define a phase gate $R_d$ by¹

$$R_d = \begin{pmatrix} 1 & 0 \\ 0 & e^{\pi i/2^d} \end{pmatrix}. \tag{16.10}$$

Acting on $|0\rangle$, $R_d$ makes no change, while acting on $|1\rangle$, $R_d$ changes the phase by $\pi/2^d$. Note that $R_0$ is just the $Z$ gate. Here we need $R_1$.
Hence the exponential in the first term on the RHS of Eq. (16.8) can be generated by a Hadamard
followed by a controlled R1 gate as shown for the top qubit in Fig. 16.2, in which the R1 gate on the
upper qubit is controlled by the lower qubit, x0 . Including the Hadamard on the lower qubit, Fig. 16.2
generates both factors on the RHS of Eq. (16.8).

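[Figure 16.2 circuit: the upper qubit $|x_1\rangle$ passes through $H$ and then an $R_1$ gate controlled by the lower qubit, ending as $y_0$; the lower qubit $|x_0\rangle$ then passes through $H$, ending as $y_1$. The intermediate states $|\varphi_1\rangle_2$, $|\varphi_2\rangle_2$ and the final state $|\psi'_x\rangle_2$ are marked.]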

Figure 16.2: The initial state on the left is the single quantum state $|x\rangle_2 \equiv |x_1 x_0\rangle$ in the computational basis. The final state on the right is the superposition $|\psi'_x\rangle_2 = \frac{1}{2}\sum_{y=0}^{3}\exp(2\pi i xy/2^2)|y_0 y_1\rangle$, which is almost $|\psi_x\rangle_2$, the QFT of $|x\rangle_2 \equiv |x_1 x_0\rangle$ given in Eq. (16.8), except that the order of the bits in the final state is the reverse of what it should be according to Eq. (16.8). This can be corrected by a swap gate as shown in Fig. 16.3. Note the controlled-$R_1$ phase gate. This acts if the control qubit, $x_0$, is 1, and changes the phase of the state if the target qubit, $y_0$, is also equal to 1. The general phase gate $R_d$ is defined in Eq. (16.10).

To make sure we understand what is happening in the circuit in Fig. 16.2, we now write down the state at each of the steps shown in the figure. The initial state is

$$|x\rangle_2 = |x_1 x_0\rangle. \tag{16.11a}$$

After the first Hadamard the state is
$$|\varphi_1\rangle_2 = \frac{1}{\sqrt{2}}\sum_{y_0=0}^{1}e^{2\pi i y_0 x_1/2}|y_0 x_0\rangle. \tag{16.11b}$$

After the controlled-$R_1$ gate we have
$$|\varphi_2\rangle_2 = \frac{1}{\sqrt{2}}\sum_{y_0=0}^{1}e^{2\pi i y_0 x_1/2}e^{2\pi i y_0 x_0/4}|y_0 x_0\rangle. \tag{16.11c}$$

¹ This is the definition of $R_d$ that I find most convenient. Some other authors adopt a slightly different definition with a factor of $e^{2\pi i/2^d}$ instead of $e^{\pi i/2^d}$.

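[Figure 16.3 circuit: as in Fig. 16.2, followed by a swap gate between the two qubits, so that the upper output is $y_1$ and the lower output is $y_0$; the states $|\varphi_1\rangle_2$, $|\varphi_2\rangle_2$, $|\psi'_x\rangle_2$ and $|\psi_x\rangle_2$ are marked.]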

Figure 16.3: The same as Fig. 16.2 but with the addition of a swap gate on the right (the dashed line with crosses at the ends). The final state is now precisely $|\psi_x\rangle_2 = \frac{1}{2}\sum_{y=0}^{3}\exp(2\pi i xy/2^2)|y_1 y_0\rangle$, the QFT given in Eq. (16.8).

The final state after the Hadamard on the lower qubit is therefore
$$|\psi'_x\rangle_2 = \left[\frac{1}{\sqrt{2}}\sum_{y_0=0}^{1}e^{2\pi i y_0 x_1/2}e^{2\pi i y_0 x_0/4}\right]\left[\frac{1}{\sqrt{2}}\sum_{y_1=0}^{1}e^{2\pi i y_1 x_0/2}\right]|y_0 y_1\rangle. \tag{16.11d}$$

$|\psi'_x\rangle$ is almost the desired QFT in Eq. (16.8), except that the order of the qubits in the final state on the right has been reversed. This can be compensated for by adding a swap gate on the right, as shown in Fig. 16.3.
In terms of operators the circuit in Fig. 16.3 corresponds to

$$\text{QFT}_4 = (\text{SWAP})\,(I \otimes H)\,(\text{Ctrl-}R_1)\,(H \otimes I), \tag{16.12}$$

where in the tensor product the left operator refers to the upper qubit in the figure. We recall that for
operators we read from right to left (the opposite of circuit diagrams).
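The $4 \times 4$ matrices for each piece in this operator product are
$$\text{SWAP} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \tag{16.13}$$
$$I \otimes H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & -1 \end{pmatrix}, \tag{16.14}$$
$$\text{Ctrl-}R_1 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & i \end{pmatrix}, \tag{16.15}$$
$$H \otimes I = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{pmatrix}. \tag{16.16}$$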

For a discussion of how to construct matrices for a direct product of operators on two qubits, see Sec. 3.9. Multiplying the above matrices in the order specified in Eq. (16.12), one can verify that one correctly obtains the Fourier Transform for $N = 4$ states given in Eq. (15.9).
This confirms that Fig. 16.3 displays the circuit to implement the QFT for 2 qubits. I emphasize that initially (on the left) the qubits are in a single computational basis state, $|x_0\rangle$ and $|x_1\rangle$, whereas in the final state (on the right) there is a sum over the states $y_0$ and $y_1$ (the sum being generated by the Hadamards).
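The matrix multiplication suggested above takes only a few lines of code. Here is a minimal Python sketch, assuming the basis ordering $|x_1 x_0\rangle$ with the upper qubit as the most significant bit, as in Eqs. (16.13)–(16.16); `matmul` is a hypothetical helper.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


s = 1 / math.sqrt(2)
SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]      # Eq. (16.13)
I_H = [[s, s, 0, 0], [s, -s, 0, 0], [0, 0, s, s], [0, 0, s, -s]]     # Eq. (16.14)
CTRL_R1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1j]]  # Eq. (16.15)
H_I = [[s, 0, s, 0], [0, s, 0, s], [s, 0, -s, 0], [0, s, 0, -s]]     # Eq. (16.16)

# Eq. (16.12): operators act right to left, so H (x) I is applied first.
QFT4 = matmul(SWAP, matmul(I_H, matmul(CTRL_R1, H_I)))

# Fourier matrix for N = 4, Eq. (15.9)
U = [[v / 2 for v in row] for row in
     [[1, 1, 1, 1], [1, 1j, -1, -1j], [1, -1, 1, -1], [1, -1j, -1, 1j]]]
print("max |QFT4 - U| =", max(abs(QFT4[i][j] - U[i][j]) for i in range(4) for j in range(4)))
```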

16.3 QFT with three or more qubits

We next do another special case, this time with n = 3 qubits. After this, we will be able to see the
structure of the circuit for general n.
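The QFT analogous to Eq. (16.8) is
$$|\psi_x\rangle_3 = \frac{1}{2^{3/2}}\sum_{y=0}^{7}\exp[2\pi i xy/2^3]\,|y\rangle_3 \tag{16.17}$$
$$\begin{aligned} |\psi_x\rangle_3 = {}& \left[\frac{1}{\sqrt{2}}\sum_{y_0=0}^{1}\exp\left[2\pi i y_0\left(\frac{x_0}{8} + \frac{x_1}{4} + \frac{x_2}{2}\right)\right]\right]\left[\frac{1}{\sqrt{2}}\sum_{y_1=0}^{1}\exp\left[2\pi i y_1\left(\frac{x_0}{4} + \frac{x_1}{2}\right)\right]\right] \\ &\times\left[\frac{1}{\sqrt{2}}\sum_{y_2=0}^{1}\exp\left[2\pi i y_2\frac{x_0}{2}\right]\right]|y_2 y_1 y_0\rangle, \end{aligned} \tag{16.18}$$
where we have again replaced factors of $\exp(2\pi i \times \text{integer})$ by unity.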
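The factorization in Eq. (16.18) can also be checked numerically against the definition, Eq. (16.17). In this minimal Python sketch (function names hypothetical) the bits are extracted from $x = x_0 + 2x_1 + 4x_2$ and $y = y_0 + 2y_1 + 4y_2$, matching the ket ordering $|y_2 y_1 y_0\rangle$.

```python
import cmath
import math


def qft_direct(x, n):
    """Amplitudes <y|psi_x>_n from the definition, Eqs. (16.1) and (16.17)."""
    dim = 2 ** n
    return [cmath.exp(2j * math.pi * x * y / dim) / math.sqrt(dim) for y in range(dim)]


def qft_product_form(x):
    """Amplitudes from the factorized form, Eq. (16.18), for n = 3, indexed by y."""
    x0, x1, x2 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    amps = []
    for y in range(8):
        y0, y1, y2 = y & 1, (y >> 1) & 1, (y >> 2) & 1
        f0 = cmath.exp(2j * math.pi * y0 * (x0 / 8 + x1 / 4 + x2 / 2))
        f1 = cmath.exp(2j * math.pi * y1 * (x0 / 4 + x1 / 2))
        f2 = cmath.exp(2j * math.pi * y2 * x0 / 2)
        amps.append(f0 * f1 * f2 / 2 ** 1.5)  # (1/sqrt 2)^3 = 1/2^(3/2)
    return amps


worst = max(abs(a - b) for x in range(8)
            for a, b in zip(qft_direct(x, 3), qft_product_form(x)))
print("largest discrepancy:", worst)  # rounding-level
```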

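Note that the terms in the exponential are of the form
$$2\pi i\,x_j y_k\,\frac{2^j 2^k}{2^n}, \tag{16.19}$$
where $j$ runs from 0 to $n-1$ and $k$ runs from 0 to $n-j-1$. (Terms with $j + k \ge n$ are integer multiples of $2\pi i$ and have been dropped.)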
Following along the lines of the previous section, the circuit diagram which performs this transformation is shown in Fig. 16.4. To make sure we understand this circuit, we will write down the state at each stage indicated on the figure. (Although these expressions look rather complicated, it is useful to make the effort to understand them.) The initial state is

$$|x\rangle_3 = |x_2 x_1 x_0\rangle, \tag{16.20a}$$

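and the subsequent states, labeled in Fig. 16.4, are
$$|\varphi_1\rangle_3 = \left[\frac{1}{\sqrt{2}}\sum_{y_0=0}^{1}\exp\left[2\pi i y_0\frac{x_2}{2}\right]\right]|y_0 x_1 x_0\rangle, \tag{16.20b}$$
$$|\varphi_2\rangle_3 = \left[\frac{1}{\sqrt{2}}\sum_{y_0=0}^{1}\exp\left[2\pi i y_0\left(\frac{x_0}{8} + \frac{x_1}{4} + \frac{x_2}{2}\right)\right]\right]|y_0 x_1 x_0\rangle, \tag{16.20c}$$
$$|\varphi_3\rangle_3 = \left[\frac{1}{\sqrt{2}}\sum_{y_0=0}^{1}\exp\left[2\pi i y_0\left(\frac{x_0}{8} + \frac{x_1}{4} + \frac{x_2}{2}\right)\right]\right]\left[\frac{1}{\sqrt{2}}\sum_{y_1=0}^{1}\exp\left[2\pi i y_1\frac{x_1}{2}\right]\right]|y_0 y_1 x_0\rangle, \tag{16.20d}$$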

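[Figure 16.4 circuit: the top qubit $|x_2\rangle$ passes through $H$, then $R_1$ controlled by $x_1$ and $R_2$ controlled by $x_0$; the middle qubit $|x_1\rangle$ passes through $H$ and then $R_1$ controlled by $x_0$; the bottom qubit $|x_0\rangle$ passes through $H$; a final swap exchanges the top and bottom qubits. Outputs before the swap, top to bottom: $y_0, y_1, y_2$; after it: $y_2, y_1, y_0$. The states $|\varphi_1\rangle_3, \dots, |\varphi_4\rangle_3$, $|\psi'_x\rangle_3$ and $|\psi_x\rangle_3$ are marked.]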

Figure 16.4: Circuit diagram for performing the QFT with $n = 3$ qubits. It generates the transformation shown in Eq. (16.18). The initial state is $|x\rangle_3 = |x_2 x_1 x_0\rangle$ and the subsequent states are given in Eqs. (16.20). The phase gates, $R_d$, are defined in Eq. (16.10). The dashed line with crosses at the ends indicates a swap gate between qubits 0 and 2. This serves to reverse the order of the qubits.
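$$|\varphi_4\rangle_3 = \left[\frac{1}{\sqrt{2}}\sum_{y_0=0}^{1}\exp\left[2\pi i y_0\left(\frac{x_0}{8} + \frac{x_1}{4} + \frac{x_2}{2}\right)\right]\right]\left[\frac{1}{\sqrt{2}}\sum_{y_1=0}^{1}\exp\left[2\pi i y_1\left(\frac{x_0}{4} + \frac{x_1}{2}\right)\right]\right]|y_0 y_1 x_0\rangle, \tag{16.20e}$$
$$\begin{aligned} |\psi'_x\rangle_3 = {}& \left[\frac{1}{\sqrt{2}}\sum_{y_0=0}^{1}\exp\left[2\pi i y_0\left(\frac{x_0}{8} + \frac{x_1}{4} + \frac{x_2}{2}\right)\right]\right]\left[\frac{1}{\sqrt{2}}\sum_{y_1=0}^{1}\exp\left[2\pi i y_1\left(\frac{x_0}{4} + \frac{x_1}{2}\right)\right]\right] \\ &\times\left[\frac{1}{\sqrt{2}}\sum_{y_2=0}^{1}\exp\left[2\pi i y_2\frac{x_0}{2}\right]\right]|y_0 y_1 y_2\rangle. \end{aligned} \tag{16.20f}$$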

$|\psi'_x\rangle$ is almost the desired QFT in Eq. (16.18), except that the order of the qubits in the final state on the right has been reversed. This can be compensated for by adding a swap gate between qubits 0 and 2 (the top and bottom qubits). Hence $|\psi_x\rangle$ in the figure is the desired QFT for 3 qubits given in Eq. (16.18).
Intuitively, the reason for the reversed order of the qubits in the final state (before the swaps) is the following. The Hadamards generate the superpositions, i.e. the sums over the $y_j$. They also produce the factors in the exponential involving $2\pi i/2$. From the straightforward generalization of Eq. (16.18) to arbitrary $n$, see Eq. (16.19), it follows that the factors generated by the Hadamards are $(2\pi i/2)\sum_{j=0}^{n-1}x_j y_{n-j-1}$. Here $x_j$ is the label of the $j$-th physical qubit in its initial state, and $y_{n-j-1}$ is the dummy label for the state of the same physical qubit in its final state. Because it is the label $y_{n-j-1}$ (rather than $y_j$) which occurs on the same physical qubit as $x_j$, the qubits in the final state are in reverse order.
By comparing the case of two qubits, shown in Fig. 16.3, with that of three qubits, in Fig. 16.4, the generalization to an arbitrary number of qubits can be deduced; it is shown in Fig. 16.5. Note that the controlled phase gate between qubits $x_i$ and $x_j$ is $R_{|i-j|}$, which makes the structure fairly simple.
For an $n$-qubit QFT one needs $n$ Hadamard gates. The number of controlled phase gates is $1 + 2 + \dots + (n-1) = n(n-1)/2$. Also $\lfloor n/2\rfloor$ swaps are required, where $\lfloor k\rfloor$ denotes the largest integer less than or equal to $k$.

[Figure 16.5 circuit: qubit $x_{n-1}$ passes through $H, R_1, \dots, R_{n-2}, R_{n-1}$ and ends as $y_0$; qubit $x_{n-2}$ passes through $H, R_1, \dots, R_{n-3}, R_{n-2}$ and ends as $y_1$; and so on, down to qubit $x_1$, which passes through $H, R_1$ and ends as $y_{n-2}$, and qubit $x_0$, which passes through $H$ and ends as $y_{n-1}$.]

Figure 16.5: Circuit diagram for performing the QFT with an arbitrary number of qubits. For clarity the final swaps are not shown, so the input states on the left, $x_i$, and the output states on the right, $y_i$, are in opposite order. Note that the controlled phase gate between qubits $x_i$ and $x_j$ is $R_{|i-j|}$, which makes the structure fairly simple. The state input on the left is a single computational basis state $|x\rangle_n$, and if we add the final swaps, the state output on the right is the superposition in Eq. (16.1).

The circuit therefore provides an algorithm for performing the QFT in $O(n^2)$ steps. By contrast, the FFT for $N = 2^n$ data points requires $O(n2^n)$ steps, which is exponentially more.
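To see the gate sequence and the gate counts in action, the sketch below simulates Fig. 16.5, including the final swaps, on a classical state vector of length $2^n$. It is a minimal Python sketch under these assumptions: qubit $j$ corresponds to bit $j$ of the basis-state index (so $x = \sum_j x_j 2^j$), $R_d$ is as defined in Eq. (16.10), and the helper names are hypothetical. Each simulated gate costs $O(2^n)$ classical operations, so this illustrates the circuit, not the quantum speedup.

```python
import cmath
import math


def apply_h(state, q):
    """Hadamard on qubit q, where qubit q is bit q of the basis-state index."""
    s = 1 / math.sqrt(2)
    for i in range(len(state)):
        if not (i >> q) & 1:  # visit each pair (i, i with bit q set) once
            j = i | (1 << q)
            a, b = state[i], state[j]
            state[i], state[j] = s * (a + b), s * (a - b)


def apply_ctrl_r(state, control, target, d):
    """Controlled-R_d, Eq. (16.10): multiply by exp(i*pi/2**d) when both qubits are 1."""
    phase = cmath.exp(1j * math.pi / 2 ** d)
    for i in range(len(state)):
        if (i >> control) & 1 and (i >> target) & 1:
            state[i] *= phase


def apply_swap(state, q1, q2):
    """Exchange the states of qubits q1 and q2."""
    for i in range(len(state)):
        b1, b2 = (i >> q1) & 1, (i >> q2) & 1
        if b1 == 1 and b2 == 0:  # each mismatched pair is visited once
            j = i ^ (1 << q1) ^ (1 << q2)
            state[i], state[j] = state[j], state[i]


def qft_circuit(state, n):
    """Apply the gates of Fig. 16.5 plus the final swaps, in place; return gate counts."""
    gates = {"H": 0, "ctrl-R": 0, "SWAP": 0}
    for j in range(n - 1, -1, -1):          # top qubit x_{n-1} first
        apply_h(state, j)
        gates["H"] += 1
        for i in range(j - 1, -1, -1):      # controls x_{j-1}, ..., x_0
            apply_ctrl_r(state, i, j, j - i)  # R_{|i-j|}
            gates["ctrl-R"] += 1
    for j in range(n // 2):                 # reverse the order of the qubits
        apply_swap(state, j, n - 1 - j)
        gates["SWAP"] += 1
    return gates


n = 4
state = [0j] * 2 ** n
state[5] = 1  # basis state |x> with x = 5
print(qft_circuit(state, n))  # {'H': 4, 'ctrl-R': 6, 'SWAP': 2}
```

After the call, `state[y]` holds $e^{2\pi i xy/2^n}/2^{n/2}$, the amplitude of Eq. (16.1).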
However, we cannot obtain all $2^n$ Fourier amplitudes from the QFT, since a measurement will just give one of the basis states, with a probability equal to the squared modulus of its Fourier amplitude. Nevertheless, the QFT does give useful information if the input state is a linear combination $\sum_x a_x|x\rangle$, see Eq. (16.2), in which the $a_x$ are periodic in $x$ with some period $r$. As we shall see in Chapter 17, the Fourier amplitudes are then strongly peaked at values of $y$ which are multiples of $2^n/r$, so there is a high probability that a measurement of $y$ will give a value which is equal, or close, to a multiple of $2^n/r$. From this information one can then deduce the period $r$ with high probability. Hence the QFT is very useful for period finding.
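A small numerical experiment shows the peaking in the simplest case, where $r$ divides $2^n$ exactly (for general $r$ the peaks are only approximate, as stated above). This minimal Python sketch uses the hypothetical example values $n = 6$, $r = 4$ and offset 1, together with a classical evaluation of Eq. (16.4).

```python
import cmath
import math


def qft_amplitudes(a):
    """Transformed amplitudes a'_y of Eq. (16.4); len(a) must be 2**n."""
    dim = len(a)
    return [sum(cmath.exp(2j * math.pi * x * y / dim) * a[x] for x in range(dim)) / math.sqrt(dim)
            for y in range(dim)]


n, r, offset = 6, 4, 1  # illustrative choices: 2**n = 64 basis states, period r = 4
dim = 2 ** n
support = [x for x in range(dim) if x % r == offset]
a = [0j] * dim
for x in support:
    a[x] = 1 / math.sqrt(len(support))  # equal amplitudes on x = offset, offset + r, ...

probs = [abs(z) ** 2 for z in qft_amplitudes(a)]
peaks = [y for y in range(dim) if probs[y] > 1e-9]
print(peaks)                                 # [0, 16, 32, 48], the multiples of 2**n / r = 16
print([round(probs[y], 6) for y in peaks])   # each equal to 1/r = 0.25
```

The offset only changes the phases of the peaks (compare the shift property in Problem 16.3), not their positions.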
As we saw in Chapter 14, period finding can be used to factor integers. If one could factor large
integers, one would be able to decode messages sent down the internet which have been encoded with
the standard RSA encryption method. We discussed RSA encryption in Chapter 13.
Another application of the QFT is to estimate the phase of the eigenvalues of a unitary matrix.
This is discussed in section 16.4.

16.4 The Phase Estimation Algorithm

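Every eigenvalue of a unitary operator $U$ must be a pure phase, i.e. $\lambda = e^{i\theta}$. The reason is that $U$ preserves the norm of states, so if $|\psi'\rangle = U|\psi\rangle$, with $|\psi\rangle$ normalized, we have
$$\langle\psi'|\psi'\rangle = \langle U\psi|U\psi\rangle = \langle\psi|U^\dagger U|\psi\rangle = \langle\psi|\psi\rangle = 1, \tag{16.21}$$
since $U^\dagger U = 1$, and we used Eq. (3.39). If $|\psi\rangle$ is an eigenstate of $U$, i.e. $|\psi'\rangle = \lambda|\psi\rangle$, this last equation becomes
$$1 = \langle\psi'|\psi'\rangle = \langle\lambda\psi|\lambda\psi\rangle = \langle\psi|\lambda^*\lambda|\psi\rangle = |\lambda|^2\langle\psi|\psi\rangle = |\lambda|^2, \tag{16.22}$$
so $|\lambda|^2 = 1$, and hence $\lambda = e^{i\theta}$ for some real $\theta$.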

The objective of this section is to determine an eigenvalue of a unitary matrix, which is equivalent
to determining its (complex) phase (since, as we just showed, its modulus is 1). Hence this problem is
called “phase estimation”.
Let us write
$$\theta = 2\pi\phi, \tag{16.23}$$
with $0 \le \phi < 1$. The result for the phase $\phi$ will be encoded as an integer (formed from the values of the measured qubits), and let’s suppose we want to determine $\phi$ correct to $n$ bits of precision. The procedure is to compute an $n$-bit integer $\phi'$, related to $\phi$ and $\theta$ by
$$\phi' = 2^n\phi, \quad\text{so}\quad \theta = 2\pi\frac{\phi'}{2^n}. \tag{16.24}$$
The possible values of $\phi'$ are $0, 1, 2, \dots, 2^n - 1$.

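[Figure 16.6 circuit: the control qubit, prepared in $|0\rangle$, passes through $H$, acts as the control of a controlled-$U$ on the target qubit $|u\rangle$, passes through a second $H$, and is measured to give $\phi'$; the states $|\psi_1\rangle$ and $|\psi_2\rangle$ are marked.]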

Figure 16.6: The circuit for phase estimation for 1 bit of precision.

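We start with a simple example in which we only require 1 bit of accuracy, so $\phi' = 0$ or 1. We will see that the circuit in Fig. 16.6 does the trick. Figure 16.6 is essentially the same as Fig. 7.8 in Chapter 7. Here we assume that $|u\rangle$ is an eigenstate of $U$ with eigenvalue $\exp(2\pi i\phi'/2)$. Following the discussion after Fig. 7.8, we find that the state of the upper qubit is
$$\begin{aligned} |\psi_1\rangle &= \frac{1}{\sqrt{2}}\left(|0\rangle + e^{2\pi i\phi'/2}|1\rangle\right), \\ |\psi_2\rangle &= \frac{1}{2}\left[\left(1 + e^{2\pi i\phi'/2}\right)|0\rangle + \left(1 - e^{2\pi i\phi'/2}\right)|1\rangle\right]. \end{aligned} \tag{16.25}$$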

We see that if $\phi' = 0$ the measurement of the upper qubit gives $|0\rangle$, and if $\phi' = 1$ the measurement of the upper qubit gives $|1\rangle$. Hence a measurement of the upper qubit in Fig. 16.6 determines the phase to one bit of precision.
We note that the right-hand Hadamard on the upper qubit in Fig. 16.6 is just the QFT for 1 qubit, see Fig. 16.1 (and, since $H$ is its own inverse, it is also the inverse QFT for 1 qubit). In fact, one can obtain $\phi'$ to an arbitrary accuracy of $n$ bits by using the $n$-bit QFT (strictly speaking, the inverse QFT).
To see this we proceed gently by considering the circuit in Fig. 16.7 which is for two qubits. Both
of the upper qubits are acted on by a Hadamard, after which one of them is the control for a control-U

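[Figure 16.7 circuit: the two upper qubits, each prepared in $|0\rangle$ and acted on by $H$, control powers of $U$ acting on the lower qubit $|u\rangle$; they then pass through the two-qubit inverse QFT ($\mathrm{QFT}_2^{-1}$, made of Hadamards and a controlled $R_{-1}$ gate) and are measured to give $\phi'$. The states $|\psi\rangle$ and $|\psi'\rangle$ are marked.]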

Figure 16.7: The circuit for phase estimation for two bits of precision. In their final state the two upper qubits contain the two bits of $\phi'$, which is related to the phase $\theta$ by Eq. (16.24).

This is just the QFT of $|\phi'\rangle$, which can be undone by an inverse QFT, i.e.
$$|k\rangle_2 \to \frac{1}{2}\sum_{y=0}^{3}e^{-2\pi i yk/2^2}|y\rangle_2, \tag{16.27}$$

In terms of gates, what is the difference between the quantum Fourier transform and its inverse? For the quantum Fourier transform we use phase gates $R_d$, defined by Eq. (16.10), which increase the phase of basis state $|1\rangle$ by $\pi/2^d$ and leave the phase of basis state $|0\rangle$ unchanged. In the inverse transform the gates are applied in the reverse order, and the $R_d$ are replaced by gates, which we label $R_{-d}$, which decrease the phase of basis state $|1\rangle$ by $\pi/2^d$ and leave the phase of basis state $|0\rangle$ unchanged, i.e.
$$R_{-d} = \begin{pmatrix} 1 & 0 \\ 0 & e^{-\pi i/2^d} \end{pmatrix}. \tag{16.29}$$

The gates which perform the inverse quantum Fourier transform for two qubits are indicated in Fig. 16.7. According to Eq. (16.28), the final measurement in Fig. 16.7, after the inverse quantum Fourier transform has been done, gives the 2-bit integer $\phi'$, from which the eigenvalue is given by $\lambda = e^{2\pi i\phi'/2^2}$.
This generalizes to the case of $n$ bits of precision. We need $n$ qubits to act as control-$U$, control-$U^2$, control-$U^4$, $\dots$, control-$U^{2^{n-1}}$ gates on the qubit containing $|u\rangle$. After the control-$U^{2^l}$ gates, for $l = 0, 1, \dots, n-1$, have acted, we run the qubits through the inverse Fourier transform to get $|\phi'\rangle_n$, from which $\theta = 2\pi\phi'/2^n$. The circuit is shown in Fig. 16.8.
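The whole procedure can be simulated classically for a single eigenstate. The minimal Python sketch below assumes that $|u\rangle$ is an eigenstate with eigenvalue $e^{2\pi i\phi}$, so that after the controlled-$U^{2^l}$ gates the control register is $2^{-n/2}\sum_y e^{2\pi i\phi y}|y\rangle$, and then applies the inverse QFT; `phase_estimation_probs` is a hypothetical name. When $2^n\phi$ is an integer the result is exact; otherwise the most likely outcome is the nearest $n$-bit value.

```python
import cmath
import math


def phase_estimation_probs(phi, n):
    """Outcome probabilities for the n control qubits of Fig. 16.8.

    Assumes |u> is an eigenstate of U with eigenvalue exp(2*pi*i*phi).
    """
    dim = 2 ** n
    # After the Hadamards and the controlled-U^(2^l) gates, control qubit l carries the
    # phase exp(2*pi*i*phi*2^l) on |1>, so the register is sum_y exp(2*pi*i*phi*y)|y> / 2^(n/2).
    register = [cmath.exp(2j * math.pi * phi * y) / math.sqrt(dim) for y in range(dim)]
    # Inverse QFT: Eq. (16.4) with the sign of the exponent reversed.
    out = [sum(cmath.exp(-2j * math.pi * y * k / dim) * register[y] for y in range(dim)) / math.sqrt(dim)
           for k in range(dim)]
    return [abs(z) ** 2 for z in out]


n = 4
probs = phase_estimation_probs(11 / 2 ** n, n)    # phi' = 11 fits exactly in 4 bits
print(max(range(2 ** n), key=probs.__getitem__))  # 11, obtained with probability 1

probs = phase_estimation_probs(0.3, n)            # 2**n * phi = 4.8 is not an integer
print(max(range(2 ** n), key=probs.__getitem__))  # 5, the nearest 4-bit value
```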

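[Figure 16.8 circuit: $n$ control qubits, each prepared in $|0\rangle$ and acted on by $H$, control the gates $U, U^2, \dots, U^{2^{n-1}}$ acting on $|u\rangle$; the control qubits then pass through $\mathrm{QFT}^{-1}$ and are measured to give $\phi'$.]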

Figure 16.8: The circuit for phase estimation. The values of the $n$ measured qubits form the binary representation of the integer $\phi'$, related to an eigenvalue $\lambda = e^{i\theta}$ of the unitary operator $U$ by $\theta = 2\pi\phi'/2^n$.

What happens if $|u\rangle$ is not a single eigenstate of $U$, as we have been assuming up to now, but a superposition of eigenstates? After the inverse QFT, the state of the $n$ qubits will be a superposition of computational basis states $|\phi'\rangle$, one for each of the eigenvalues present in the decomposition of $|u\rangle$ into eigenstates of $U$. Measurement will then project onto the value of $\phi'$ corresponding to one of the eigenvalues.

Problems
16.1. We stated (without proof) that any quantum gate can be made out of single-qubit gates and the CNOT gate (i.e. the only gate needed with more than one qubit is CNOT). Here we illustrate this for the controlled phase gate used in Shor’s quantum Fourier transform.
The (uncontrolled) phase gate (acting on one qubit) has the matrix representation
$$R(\theta) = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\theta} \end{pmatrix}, \tag{16.30}$$
so the phase is changed by $\theta$ if the qubit is in state $|1\rangle$ and is unchanged if the qubit is in state $|0\rangle$.
Now we want this gate to be controlled by a control qubit such that the gate will only act on the target qubit if the control qubit is in state $|1\rangle$. We want to find out how to do this using 1-qubit gates (including $R(\theta)$) and the CNOT (Ctrl-$X$) 2-qubit gate. Note that the $4 \times 4$ matrix representation for the controlled phase gate is
$$\begin{array}{c|cccc} & |00\rangle & |01\rangle & |10\rangle & |11\rangle \\ \hline \langle 00| & 1 & 0 & 0 & 0 \\ \langle 01| & 0 & 1 & 0 & 0 \\ \langle 10| & 0 & 0 & 1 & 0 \\ \langle 11| & 0 & 0 & 0 & e^{i\theta} \end{array} \tag{16.31}$$
where the control qubit is to the left and the target qubit is to the right.
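Show that the following circuit almost generates a controlled-phase gate:

[Circuit: the gates $R(\theta/2)$, $X$, $R(-\theta/2)$, $X$ act in sequence on the target (lower) qubit, each $X$ being controlled by the upper qubit, i.e. a CNOT.]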

In particular, write the $4 \times 4$ matrix representation of this circuit. From this you should be able to show that if the control (upper) qubit is 0 then the qubits are unchanged (as required), but that if the control qubit is 1 (so the gate is activated) then the relative phase between the $|1\rangle$ and $|0\rangle$ states of the target (lower) qubit is $\theta$, as required, but the overall phase of these two states is not correct.²
Show that this phase can be corrected by adding another R(θ/2) gate on the control qubit after
the other gates have acted (i.e. at the right).
Note: If we have two or more qubits we often find it convenient to associate the global phase (or
the sign in simple cases) with just one of the qubits. However, you should appreciate that this
is simply a manner of speaking; the global phase is a property of the whole state.
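As a numerical sanity check on Problem 16.1 (not a substitute for the algebra), the sketch below multiplies out the $4 \times 4$ matrices in plain Python. It assumes the basis order $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ with the control qubit on the left, as in Eq. (16.31); the helper names `matmul`, `kron`, `circuit` and `controlled_phase` are illustrative and not from the text.

```python
import cmath

def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def kron(a, b):
    """Tensor product a (x) b; a acts on the control (left) qubit."""
    p = len(b)
    size = len(a) * p
    return [[a[r // p][c // p] * b[r % p][c % p] for c in range(size)]
            for r in range(size)]

def R(theta):
    """Single-qubit phase gate, Eq. (16.30)."""
    return [[1, 0], [0, cmath.exp(1j * theta)]]

I2 = [[1, 0], [0, 1]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]  # control = left qubit

def circuit(theta, corrected=True):
    """Gates listed in time order (left to right in the circuit diagram)."""
    gates = [kron(I2, R(theta / 2)), CNOT, kron(I2, R(-theta / 2)), CNOT]
    if corrected:
        gates.append(kron(R(theta / 2), I2))  # extra R(theta/2) on the control qubit
    total = kron(I2, I2)                       # 4x4 identity
    for g in gates:
        total = matmul(g, total)               # a later gate multiplies from the left
    return total

def controlled_phase(theta):
    """Target matrix, Eq. (16.31)."""
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, cmath.exp(1j * theta)]]

def close(a, b, tol=1e-12):
    return all(abs(a[r][c] - b[r][c]) < tol for r in range(len(a)) for c in range(len(a)))

theta = 0.7
print(close(circuit(theta, corrected=False), controlled_phase(theta)))  # False
print(close(circuit(theta), controlled_phase(theta)))                   # True
```

Without the final gate, the lower-right $2 \times 2$ block of `circuit(theta, corrected=False)` is $\mathrm{diag}(e^{-i\theta/2}, e^{i\theta/2})$: the relative phase is $\theta$ but the common phase is off by $e^{-i\theta/2}$, which is what the extra $R(\theta/2)$ on the control qubit removes.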

16.2. Consider a function $f(x)$ which is periodic with period $N$. We are given a unitary operator $U_y$ that performs the transformation
$$U_y |f(x)\rangle = |f(x+y)\rangle . \tag{16.32}$$
Show that the state
$$|\tilde f(k)\rangle = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} e^{-2\pi i k x/N} |f(x)\rangle \tag{16.33}$$
is an eigenvector of $U_y$. Calculate the corresponding eigenvalue.

16.3. We have defined the quantum Fourier transform (QFT) in terms of a transformation of basis states
$$|x\rangle \to \frac{1}{\sqrt{2^n}} \sum_{y=0}^{2^n-1} \exp[2\pi i x y/2^n]\, |y\rangle . \tag{16.34}$$
² Note that what I mean by this overall phase is the common phase of the two states of the target qubit when the control qubit is $|1\rangle$. The existence of this phase means that there is an error in the relative phase between the two states when the control qubit is $|1\rangle$ and those when the control qubit is $|0\rangle$, compared with what is expected in Eq. (16.31).

(i) If we consider a superposition
$$|\psi\rangle = \sum_{x=0}^{2^n-1} c_x |x\rangle, \tag{16.35}$$
show that one can regard the QFT as a transformation of the coefficients
$$|\psi\rangle \to \sum_{y=0}^{2^n-1} \tilde c_y |y\rangle \tag{16.36}$$
where
$$\tilde c_y = \frac{1}{\sqrt{2^n}} \sum_{x=0}^{2^n-1} \exp[2\pi i x y/2^n]\, c_x . \tag{16.37}$$

(ii) Now suppose we shift the basis states by $a$, say, in the sense that we define a new state
$$|\psi'\rangle = \sum_{x=0}^{2^n-1} c_x |x+a\rangle, \tag{16.38}$$
where $x + a$ is understood modulo $2^n$. Show that, after the quantum Fourier transform,
$$|\psi'\rangle \to \sum_{y=0}^{2^n-1} \tilde c'_y |y\rangle \tag{16.39}$$
where
$$\tilde c'_y = e^{2\pi i a y/2^n}\, \tilde c_y . \tag{16.40}$$
This is called the “shift-invariance” property of the Fourier transform.
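A quick numerical illustration of Eq. (16.40), not a proof: the sketch evaluates Eq. (16.37) directly for $n = 3$ and a shift $a = 3$, assuming (as above) that $x + a$ wraps around modulo $2^n$. The amplitudes are arbitrary and unnormalized, which does not matter because the property is linear in the $c_x$; the function names are illustrative.

```python
import cmath

def qft_coefficients(c):
    """c~_y of Eq. (16.37) for a coefficient list c of length 2^n."""
    M = len(c)
    return [sum(cmath.exp(2j * cmath.pi * x * y / M) * c[x] for x in range(M)) / M ** 0.5
            for y in range(M)]

def shifted(c, a):
    """Coefficients of |psi'> = sum_x c_x |x + a>, with x + a taken mod 2^n."""
    M = len(c)
    out = [0j] * M
    for x in range(M):
        out[(x + a) % M] = c[x]
    return out

n, a = 3, 3
M = 2 ** n
c = [complex(x + 1, (-1) ** x) for x in range(M)]   # arbitrary test amplitudes
ct = qft_coefficients(c)
ct_shift = qft_coefficients(shifted(c, a))
ok = all(abs(ct_shift[y] - cmath.exp(2j * cmath.pi * a * y / M) * ct[y]) < 1e-9
         for y in range(M))
print(ok)  # True
```

Since the shift only multiplies each $\tilde c_y$ by a phase, the probabilities $|\tilde c_y|^2$ are unchanged; this is why the unknown offset $x_0$ drops out of the measurement probabilities in Shor’s algorithm.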

16.4. (i) Write down the 4 × 4 matrix for the Fourier transform for N = 4 (2 qubits).
(ii) Consider the circuit diagram below,

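[Circuit diagram: qubits $x$ (upper) and $y$ (lower), each with a Hadamard gate $H$, together with the two-qubit gate $U_F$; the output is marked “?”.]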

where F indicates the quantum Fourier transform.

What is the final state if x = y = 0?
(iii) What is the final state for the other possible values of x and y?

16.5. The circuit for the quantum Fourier transform with 4 qubits is shown in the figure below. (The final swap gates are omitted.)
[Circuit: inputs $x_3, x_2, x_1, x_0$ (top to bottom), outputs $y_0, y_1, y_2, y_3$ (top to bottom). Top qubit: $H$, then controlled $R_1$, $R_2$, $R_3$; second qubit: $H$, $R_1$, $R_2$; third qubit: $H$, $R_1$; bottom qubit: $H$.]

Write down the following states of the system:

(i) Immediately after the R3 gate on the top qubit.
(ii) Immediately after the R2 gate on the next to top qubit.
(iii) Immediately after the R1 gate on the second from top qubit.
(iv) The final state at the right.
16.6. Phase Estimation Algorithm
We showed in Sec. 16.4 that the eigenvalues of a unitary matrix are pure phases, i.e. are of the form $e^{i\theta}$ for some phase $\theta$. We also showed that to determine the phase with a quantum algorithm one takes out a factor of $2\pi$ and writes $\theta = 2\pi\varphi$, where $0 \le \varphi < 1$. To determine $\varphi$ to $n$ bits of precision one then writes $\varphi = \varphi_0/2^n$, where $\varphi_0$ is an integer in the range from 0 to $2^n - 1$. The quantum algorithm, discussed in Sec. 16.4, determines $\varphi_0$.
Here we consider the following two unitary matrices:
$$\text{(a)}\quad X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \tag{16.41}$$
$$\text{(b)}\quad R_1 = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/2} \end{pmatrix}. \tag{16.42}$$
For each matrix determine how many bits $n$ you need to evaluate the eigenvalues, and draw the quantum circuit for each case. Explain how each circuit works.
Appendices

16.A Comparison between FFT and the QFT for N = 4

In this and the subsequent appendix in this chapter, we describe the connection between the QFT and
FFT. This material is not necessary for the rest of the course and can be skipped.
We start by considering the simplest case of 2 qubits, i.e. $N = 4$. The FFT for $N = 4$ is
$$\begin{align}
y_0 &= \tfrac{1}{2}\left(x_0 + x_1 + x_2 + x_3\right), \tag{16.43a}\\
y_1 &= \tfrac{1}{2}\left(x_0 + i x_1 + i^2 x_2 + i^3 x_3\right), \tag{16.43b}\\
y_2 &= \tfrac{1}{2}\left(x_0 + i^2 x_1 + x_2 + i^2 x_3\right), \tag{16.43c}\\
y_3 &= \tfrac{1}{2}\left(x_0 + i^3 x_1 + i^2 x_2 + i x_3\right), \tag{16.43d}
\end{align}$$

where the $x_j$ are the original data, the $y_j$ are the Fourier transformed data, and we have used that
$$\exp(2\pi i/4) = i . \tag{16.44}$$

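To evaluate Eqs. (16.43) efficiently, the FFT proceeds recursively. We first define Fourier transforms of length 2:
$$\begin{align}
u_0 &= \tfrac{1}{\sqrt{2}}(x_0 + x_2) = \tfrac{1}{\sqrt{2}}(x_0 + i^{2k} x_2) \quad (k = 0), \tag{16.45a}\\
u_1 &= \tfrac{1}{\sqrt{2}}(x_1 + x_3) = \tfrac{1}{\sqrt{2}}(x_1 + i^{2k} x_3) \quad (k = 0), \tag{16.45b}\\
u_2 &= \tfrac{1}{\sqrt{2}}(x_0 - x_2) = \tfrac{1}{\sqrt{2}}(x_0 + i^{2k} x_2) \quad (k = 1), \tag{16.45c}\\
u_3 &= \tfrac{1}{\sqrt{2}}(x_1 - x_3) = \tfrac{1}{\sqrt{2}}(x_1 + i^{2k} x_3) \quad (k = 1). \tag{16.45d}
\end{align}$$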

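Pairs of quantities in Eqs. (16.45) are combined to form the Fourier transform in Eqs. (16.43):
$$\begin{align}
y_0 &= \tfrac{1}{\sqrt{2}}(u_0 + u_1) = \tfrac{1}{\sqrt{2}}(u_0 + i^{k} u_1) \quad (k = 0), \tag{16.46a}\\
y_1 &= \tfrac{1}{\sqrt{2}}(u_2 + i u_3) = \tfrac{1}{\sqrt{2}}(u_2 + i^{k} u_3) \quad (k = 1), \tag{16.46b}\\
y_2 &= \tfrac{1}{\sqrt{2}}(u_0 - u_1) = \tfrac{1}{\sqrt{2}}(u_0 + i^{k} u_1) \quad (k = 2), \tag{16.46c}\\
y_3 &= \tfrac{1}{\sqrt{2}}(u_2 - i u_3) = \tfrac{1}{\sqrt{2}}(u_2 + i^{k} u_3) \quad (k = 3). \tag{16.46d}
\end{align}$$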

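Let’s check that this works by evaluating $y_1$. We have
$$\begin{align}
y_1 &= \tfrac{1}{\sqrt{2}}\left(u_2 + i u_3\right) \tag{16.47a}\\
&= \tfrac{1}{2}\left(x_0 - x_2 + i(x_1 - x_3)\right) = \tfrac{1}{2}\left(x_0 + i x_1 + i^2 x_2 + i^3 x_3\right), \tag{16.47b}
\end{align}$$
which agrees with Eq. (16.43b).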

It is instructive to write the linear transformations in Eqs. (16.43), (16.45), and (16.46) in matrix form. In matrix form, Eq. (16.43) becomes
$$\vec y = U \vec x , \tag{16.48}$$
where
$$U = \frac{1}{2}\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & i & i^2 & i^3 \\ 1 & i^2 & 1 & i^2 \\ 1 & i^3 & i^2 & i \end{pmatrix}. \tag{16.49}$$
Equation (16.45) in matrix form is
$$\vec u = U_1 \vec x , \tag{16.50}$$
where
$$U_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & i^2 & 0 \\ 0 & 1 & 0 & i^2 \end{pmatrix}. \tag{16.51}$$
Equation (16.46) in matrix form is
$$\vec y = U_2 \vec u , \tag{16.52}$$
where
$$U_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & i \\ 1 & i^2 & 0 & 0 \\ 0 & 0 & 1 & i^3 \end{pmatrix}. \tag{16.53}$$
With some matrix manipulations one can verify that
$$U = U_2 U_1 , \tag{16.54}$$
as required. (I used Mathematica.)
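For readers without Mathematica, the same check can be sketched in plain Python: the matrices below are typed in from Eqs. (16.49), (16.51) and (16.53), and the helper names (`matmul`, `scale`, `close`) are illustrative choices, not from the text.

```python
I = 1j  # the imaginary unit i = exp(2*pi*i/4), Eq. (16.44)

def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def scale(m, s):
    return [[s * v for v in row] for row in m]

def close(a, b, tol=1e-12):
    return all(abs(a[r][c] - b[r][c]) < tol
               for r in range(len(a)) for c in range(len(a)))

# Eq. (16.49): the full N = 4 Fourier transform, U[y][x] = i**(x*y) / 2
U = [[I ** (x * y) / 2 for x in range(4)] for y in range(4)]

# Eq. (16.51): first stage, length-2 transforms of (x0, x2) and (x1, x3)
U1 = scale([[1, 0, 1, 0],
            [0, 1, 0, 1],
            [1, 0, I**2, 0],
            [0, 1, 0, I**2]], 2 ** -0.5)

# Eq. (16.53): second stage, combining pairs with factors i**k
U2 = scale([[1, 1, 0, 0],
            [0, 0, 1, I],
            [1, I**2, 0, 0],
            [0, 0, 1, I**3]], 2 ** -0.5)

print(close(matmul(U2, U1), U))  # True
```

Each of $U_1$ and $U_2$ has only two non-zero entries per row, which is the source of the FFT’s saving: applying them costs $2N$ multiplications instead of the $N^2$ needed for $U$ directly.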

We will now show that there is a close connection between the FFT and the QFT, and in particular
that the transformations U1 and U2 correspond to different parts of the diagram in Fig. 16.3.
The swap gate interchanges states $|01\rangle$ and $|10\rangle$, so it has the matrix representation
$$S = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{16.55}$$

The Hadamard gate acting on the lower qubit of Fig. 16.3 was shown in Eq. (10.11). Including now also the (unchanged) upper qubit, the matrix representation of the transformation induced by this gate is
$$H_l = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & -1 \end{pmatrix}. \tag{16.56}$$
The Hadamard on the upper qubit has a similar representation, except that the two qubits have been interchanged, i.e.
$$H_u = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{pmatrix}. \tag{16.57}$$
The controlled $R_1$ phase gate gives a multiplicative factor of $i$ if $y_0$ and $x_0$ are both 1, i.e. for state $|3\rangle$. Hence
$$R_1 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & i \end{pmatrix}. \tag{16.58}$$
The total effect of the quantum circuit in Fig. 16.3, reading from left to right on the circuit, is given by the matrix product $S H_l R_1 H_u$. Note that one reads from right to left in a product of operators because the operators act on the state to their right. It can be confusing that the direction of time in the circuit diagram is opposite to that in an expression of operators. Multiplying out these matrices using Mathematica one gets the expected result,
$$S H_l R_1 H_u = U , \tag{16.59}$$
where $U$ is the Fourier transform shown in Eq. (16.49). Recall that $S$ is the swap, $H_l$ is the Hadamard on the lower qubit, $R_1$ is the controlled phase gate, and $H_u$ is the Hadamard on the upper qubit. Hence the gates in the quantum circuit in Fig. 16.3 do indeed effect a Fourier transform for 2 qubits.
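[Circuit for Fig. 16.9: the upper qubit $x_1$ passes through $H$ and the controlled $R_1$; the lower qubit $x_0$ passes through $H$; a final swap (X symbols) exchanges the two qubits. The circuit is divided into the stages $U_1$ and $U_2$, and a state $\psi_2$ is marked.]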

Figure 16.9: The same as Fig. 16.3 but also showing the correspondence with the breakup of the FFT
into two operations U = U2 U1 , see Eqs. (16.60).

In the FFT we decomposed $U$ into a product of two sparse matrices, $U = U_2 U_1$, see Eq. (16.54). We can also make a connection between the individual matrices $U_2$ and $U_1$ of the FFT and the individual matrices $S$, $H_l$, $H_u$ and $R_1$ of the QFT. One finds
$$\begin{align}
U_1 &= H_u , \tag{16.60a}\\
U_2 &= S H_l R_1 . \tag{16.60b}
\end{align}$$

The first is obtained by inspection and the second I checked with Mathematica. Hence the first
operation U1 in the FFT for N = 4 corresponds, in the QFT, to the Hadamard on the upper qubit in
Fig. 16.3, while the second operation U2 in the FFT corresponds to the remaining operations in the
QFT: the controlled phase gate on the upper qubit, the Hadamard on the lower qubit, and the swap.
This breakup is shown in Fig. 16.9.
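Eqs. (16.59) and (16.60) can likewise be checked in a few lines of Python. In this sketch (an illustration, not the author’s Mathematica session) the Hadamards are built as tensor products $H \otimes I$ and $I \otimes H$ rather than typed in, with the upper qubit on the left; the names `kron`, `matmul` and `close` are illustrative.

```python
import cmath

def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def kron(a, b):
    """Kronecker (tensor) product of square matrices a (upper) and b (lower)."""
    p = len(b)
    size = len(a) * p
    return [[a[r // p][c // p] * b[r % p][c % p] for c in range(size)]
            for r in range(size)]

def close(a, b, tol=1e-12):
    return all(abs(a[r][c] - b[r][c]) < tol
               for r in range(len(a)) for c in range(len(a)))

s = 2 ** -0.5
H = [[s, s], [s, -s]]
I2 = [[1, 0], [0, 1]]

Hu = kron(H, I2)                      # Eq. (16.57), Hadamard on upper qubit
Hl = kron(I2, H)                      # Eq. (16.56), Hadamard on lower qubit
R1 = [[1, 0, 0, 0], [0, 1, 0, 0],
      [0, 0, 1, 0], [0, 0, 0, 1j]]    # Eq. (16.58), phase i on |11>
S = [[1, 0, 0, 0], [0, 0, 1, 0],
     [0, 1, 0, 0], [0, 0, 0, 1]]      # Eq. (16.55), swap

# Eq. (16.49): U[y][x] = exp(2 pi i x y / 4) / 2
U = [[cmath.exp(2j * cmath.pi * x * y / 4) / 2 for x in range(4)]
     for y in range(4)]

U2 = matmul(S, matmul(Hl, R1))        # Eq. (16.60b)
circuit = matmul(U2, Hu)              # S Hl R1 Hu: the rightmost gate acts first

print(close(circuit, U))              # Eq. (16.59): True
```

Note that the operator product is assembled right to left, the reverse of the order in which the gates appear in Fig. 16.3.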
To conclude this section, we have seen that for 2 qubits there is a close connection between the breakup used in the FFT and that used in the QFT. This should not be a surprise. In the FFT we iteratively divide the FT into two FTs of half the length, while in the QFT we have a binary representation of the states and treat each bit in turn, so clearly these are related. For $N = 4$, this connection is expressed in Eqs. (16.60).

16.B Comparison of the FFT and QFT for N = 8 and generalization to larger N

In this appendix we show how the breakup of the FFT for 3 qubits, i.e. $N = 8$, is related to the circuit for the QFT. Our final result will be Fig. 16.10, which is the analog of Fig. 16.9 for $N = 4$.
As shown in Chapter 15, the FFT for $N = 8$ can be written as
$$U^{(8)} = U^{(8)}_3 U^{(8)}_2 U^{(8)}_1 \tag{16.61}$$

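where, with $\omega = e^{2\pi i/8}$,
$$U^{(8)} = \frac{1}{\sqrt{8}}\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & \omega & \omega^2 & \omega^3 & \omega^4 & \omega^5 & \omega^6 & \omega^7 \\
1 & \omega^2 & \omega^4 & \omega^6 & 1 & \omega^2 & \omega^4 & \omega^6 \\
1 & \omega^3 & \omega^6 & \omega & \omega^4 & \omega^7 & \omega^2 & \omega^5 \\
1 & \omega^4 & 1 & \omega^4 & 1 & \omega^4 & 1 & \omega^4 \\
1 & \omega^5 & \omega^2 & \omega^7 & \omega^4 & \omega & \omega^6 & \omega^3 \\
1 & \omega^6 & \omega^4 & \omega^2 & 1 & \omega^6 & \omega^4 & \omega^2 \\
1 & \omega^7 & \omega^6 & \omega^5 & \omega^4 & \omega^3 & \omega^2 & \omega
\end{pmatrix}, \tag{16.62}$$
$$U^{(8)}_1 = \frac{1}{\sqrt{2}}\begin{pmatrix}
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & \omega^4 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & \omega^4 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & \omega^4 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & \omega^4
\end{pmatrix}, \tag{16.63}$$
$$U^{(8)}_2 = \frac{1}{\sqrt{2}}\begin{pmatrix}
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & \omega^2 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & \omega^2 \\
1 & 0 & \omega^4 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & \omega^4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & \omega^6 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & \omega^6
\end{pmatrix}, \tag{16.64}$$
and
$$U^{(8)}_3 = \frac{1}{\sqrt{2}}\begin{pmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & \omega & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & \omega^2 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & \omega^3 \\
1 & \omega^4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & \omega^5 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & \omega^6 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & \omega^7
\end{pmatrix}. \tag{16.65}$$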
One can verify by doing the matrix multiplication (using Mathematica helps) that Eq. (16.61) is
satisfied.
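A plain-Python sketch of that matrix multiplication, assuming $\omega = e^{2\pi i/8}$: to reduce typing errors, each factor is entered as a table of exponents of $\omega$ copied from Eqs. (16.63) to (16.65), with `Z` marking a zero entry. The helper names are illustrative.

```python
import cmath

W = cmath.exp(2j * cmath.pi / 8)  # omega = e^{2 pi i / 8}
Z = None                           # marks a zero entry in the exponent tables

def from_exponents(rows, factor):
    """Entry is factor * W**e, or 0 where the exponent table holds Z."""
    return [[0 if e is Z else factor * W ** e for e in row] for row in rows]

def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def close(a, b, tol=1e-12):
    return all(abs(a[r][c] - b[r][c]) < tol
               for r in range(len(a)) for c in range(len(a)))

s = 2 ** -0.5
# An exponent e stands for W**e, so 0 means an entry equal to 1.
U1 = from_exponents([[0, Z, Z, Z, 0, Z, Z, Z],
                     [Z, 0, Z, Z, Z, 0, Z, Z],
                     [Z, Z, 0, Z, Z, Z, 0, Z],
                     [Z, Z, Z, 0, Z, Z, Z, 0],
                     [0, Z, Z, Z, 4, Z, Z, Z],
                     [Z, 0, Z, Z, Z, 4, Z, Z],
                     [Z, Z, 0, Z, Z, Z, 4, Z],
                     [Z, Z, Z, 0, Z, Z, Z, 4]], s)   # Eq. (16.63)
U2 = from_exponents([[0, Z, 0, Z, Z, Z, Z, Z],
                     [Z, 0, Z, 0, Z, Z, Z, Z],
                     [Z, Z, Z, Z, 0, Z, 2, Z],
                     [Z, Z, Z, Z, Z, 0, Z, 2],
                     [0, Z, 4, Z, Z, Z, Z, Z],
                     [Z, 0, Z, 4, Z, Z, Z, Z],
                     [Z, Z, Z, Z, 0, Z, 6, Z],
                     [Z, Z, Z, Z, Z, 0, Z, 6]], s)   # Eq. (16.64)
U3 = from_exponents([[0, 0, Z, Z, Z, Z, Z, Z],
                     [Z, Z, 0, 1, Z, Z, Z, Z],
                     [Z, Z, Z, Z, 0, 2, Z, Z],
                     [Z, Z, Z, Z, Z, Z, 0, 3],
                     [0, 4, Z, Z, Z, Z, Z, Z],
                     [Z, Z, 0, 5, Z, Z, Z, Z],
                     [Z, Z, Z, Z, 0, 6, Z, Z],
                     [Z, Z, Z, Z, Z, Z, 0, 7]], s)   # Eq. (16.65)

# Eq. (16.62): U(8)[y][x] = W**(x*y) / sqrt(8)
U8 = [[W ** ((x * y) % 8) / 8 ** 0.5 for x in range(8)] for y in range(8)]

print(close(matmul(U3, matmul(U2, U1)), U8))  # True
```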
One can see from Fig. 16.4 that the QFT can be written as³
$$U^{(8)} = S^{(8)}_{02} H^{(8)}_l R^{(8)}_{1,m} H^{(8)}_m R^{(8)}_{2,u} R^{(8)}_{1,u} H^{(8)}_u , \tag{16.66}$$
³ Recall that we work from right to left in operator equations like Eq. (16.66) but from left to right in circuit diagrams such as Fig. 16.4.

in a fairly obvious notation, where

 
1 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0
 
0 0 1 0 0 0 0 0
 
(8) 0 0 0 0 0 0 1 0
S02 =
0
, (16.67)
 1 0 0 0 0 0 0

0 0 0 0 0 1 0 0
 
0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 1

 
1 1 0 0 0 0 0 0
1 −1 0 0 0 0 0 0 
 
0 0 1 1 0 0 0 0 
 
(8) 1  0 0 1 −1 0 0 0 0 
Hl =√   , (16.68)
2 0 0 0 0 1 1 0 0  
0 0 0 0 1 −1 0 0 
 
0 0 0 0 0 0 1 1 
0 0 0 0 0 0 1 −1

 
1 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0
 
0 0 1 0 0 0 0 0
 
(8) 0 0 0 i 0 0 0 0
R1,m =
0
, (16.69)
 1 0 0 1 0 0 0

0 0 0 0 0 1 0 0
 
0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 i

 
1 0 1 0 0 0 0 0
0 1 0 1 0 0 0 0
 
1
 0 −1 0 0 0 0 0
(8) 1  0 1 0 −1 0 0 0 0
Hm =√  , (16.70)
2 0
 0 0 0 1 0 1 0
0 0 0 0 0 1 0 1
 
0 0 0 0 1 0 −1 0 
0 0 0 0 0 1 0 −1

 
1 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0
 
0 0 1 0 0 0 0 0
 
(8) 0 0 0 1 0 0 0 0
R2,u =
0
, (16.71)
 1 0 0 1 0 0 0
0 0 0 0 0 ω 0 0
 
0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 ω
16.B. COMPARISON OF THE FFT AND QFT FOR N = 8 AND GENERALIZATION TO LARGER N 147

 
1 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0
 
0 0 1 0 0 0 0 0
 
(8) 0 0 0 1 0 0 0 0
R1,u = 0 0 0 0 1
, (16.72)
 0 0 0

0 0 0 0 0 1 0 0
 
0 0 0 0 0 0 i 0
0 0 0 0 0 0 0 i
 
1 0 0 0 1 0 0 0
0 1 0 0 0 1 0 0
 
0 0 1 0 0 0 1 0
 
1  0 0 0 1 0 0 0 1
Hu(8) =√  , (16.73)
2 1 0 0 0 −1
 0 0 0 
0 1 0 0 0
 −1 0 0 
0 0 1 0 1 0 −1 0 
0 0 0 1 0 1 0 −1
One may verify Eq. (16.66) using Mathematica. Note that $S^{(8)}_{02}$ swaps qubits 0 and 2, as required to reverse the order of the qubits.
Can we make a connection between the individual matrices $U^{(8)}_1$, $U^{(8)}_2$, and $U^{(8)}_3$ in the FFT, Eq. (16.61), and the individual matrices $S^{(8)}_{02}$, $H^{(8)}_l$, $R^{(8)}_{1,m}$, $H^{(8)}_m$, $R^{(8)}_{2,u}$, $R^{(8)}_{1,u}$, and $H^{(8)}_u$ in the QFT, Eq. (16.66)?
One immediately sees that $U^{(8)}_1 = H^{(8)}_u$. However, to make a connection between the other parts of the FFT, $U^{(8)}_2$ and $U^{(8)}_3$, we introduce the swap operator between qubits 1 and 2:
$$S^{(8)}_{12} = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}. \tag{16.74}$$

We also need to realize that we can move the $R_2$ gate in Fig. 16.4 to the right as long as it does not cross the Hadamard on the lowest qubit (since this is the control qubit). Hence we can also write Eq. (16.66) as
$$U^{(8)} = S^{(8)}_{02} H^{(8)}_l R^{(8)}_{1,m} R^{(8)}_{2,u} H^{(8)}_m R^{(8)}_{1,u} H^{(8)}_u , \tag{16.75}$$
where we have moved $R^{(8)}_{2,u}$ to the left in the operator product. We then find that
$$\begin{align}
U^{(8)}_1 &= H^{(8)}_u , \tag{16.76a}\\
U^{(8)}_2 &= S^{(8)}_{12} H^{(8)}_m R^{(8)}_{1,u} , \tag{16.76b}\\
U^{(8)}_3 &= S^{(8)}_{02} H^{(8)}_l R^{(8)}_{1,m} R^{(8)}_{2,u} S^{(8)}_{12} , \tag{16.76c}
\end{align}$$
which agrees with Eqs. (16.75) and (16.61) since $\big(S^{(8)}_{12}\big)^2$ is the identity (swapping twice makes no change). This breakup is shown in Fig. 16.10. Apart from the reversals of qubit order, the correspondence between the QFT and the FFT is straightforward to see.
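The gate-level statements, Eqs. (16.66) and (16.76), can be checked the same way. This sketch builds each gate from $2 \times 2$ pieces instead of typing in the $8 \times 8$ matrices, assuming the basis index $x = 4x_2 + 2x_1 + x_0$ with qubit 2 on top, as in Fig. 16.4. The helpers `chain`, `perm` and `swap_bits` are illustrative names.

```python
import cmath

def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def chain(*ms):
    """Operator product m1 m2 ... mk (the rightmost factor acts first)."""
    out = ms[-1]
    for m in reversed(ms[:-1]):
        out = matmul(m, out)
    return out

def kron(a, b):
    p = len(b)
    size = len(a) * p
    return [[a[r // p][c // p] * b[r % p][c % p] for c in range(size)]
            for r in range(size)]

def diag(values):
    return [[values[r] if r == c else 0 for c in range(len(values))]
            for r in range(len(values))]

def swap_bits(x, i, j):
    """Exchange bits i and j of the integer x."""
    if ((x >> i) & 1) != ((x >> j) & 1):
        x ^= (1 << i) | (1 << j)
    return x

def perm(f, n=8):
    """Permutation matrix sending basis state |x> to |f(x)>."""
    return [[1 if r == f(c) else 0 for c in range(n)] for r in range(n)]

def close(a, b, tol=1e-12):
    return all(abs(a[r][c] - b[r][c]) < tol
               for r in range(len(a)) for c in range(len(a)))

s = 2 ** -0.5
H = [[s, s], [s, -s]]
I2 = [[1, 0], [0, 1]]
w = cmath.exp(2j * cmath.pi / 8)

Hu = kron(H, kron(I2, I2))                  # Eq. (16.73)
Hm = kron(I2, kron(H, I2))                  # Eq. (16.70)
Hl = kron(I2, kron(I2, H))                  # Eq. (16.68)
R1u = diag([1, 1, 1, 1, 1, 1, 1j, 1j])      # Eq. (16.72): x2 = x1 = 1
R2u = diag([1, 1, 1, 1, 1, w, 1, w])        # Eq. (16.71): x2 = x0 = 1
R1m = diag([1, 1, 1, 1j, 1, 1, 1, 1j])      # Eq. (16.69): x1 = x0 = 1
S02 = perm(lambda x: swap_bits(x, 0, 2))    # Eq. (16.67)
S12 = perm(lambda x: swap_bits(x, 1, 2))    # Eq. (16.74)

U8 = [[w ** ((x * y) % 8) / 8 ** 0.5 for x in range(8)] for y in range(8)]

qft = chain(S02, Hl, R1m, Hm, R2u, R1u, Hu)  # Eq. (16.66)
U1 = Hu                                      # Eq. (16.76a)
U2 = chain(S12, Hm, R1u)                     # Eq. (16.76b)
U3 = chain(S02, Hl, R1m, R2u, S12)           # Eq. (16.76c)
print(close(qft, U8), close(chain(U3, U2, U1), U8))  # True True
```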
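[Circuit for Fig. 16.10: three qubits with inputs $x_2$ (top), $x_1$, $x_0$ (bottom). Stage $U_1$: $H$ on the top qubit. Stage $U_2$: controlled $R_1$ on the top qubit, $H$ on the middle qubit, then a swap (X symbols) of the top two qubits. Stage $U_3$: a second swap of the top two qubits, controlled $R_2$ on the top qubit and controlled $R_1$ on the middle qubit, $H$ on the bottom qubit, then a swap of the top and bottom qubits. A state $\psi_3$ is marked.]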

Figure 16.10: Like Fig. 16.4 except that the R2 gate has been moved to the right of the Hadamard
on the middle qubit (which has no effect) and that a pair of reversals of the order of qubits 1 and 2
have been added (which also has no effect). The reversal is accomplished by a swap gate. Note that
the final reversal of the order of all three qubits (on the right of the diagram) is also accomplished by
a single swap gate. The correspondence with the breakup of the FFT (U = U3 U2 U1 ) is indicated, see
Eqs. (16.76). To see this correspondence it is necessary to include the pair of reversals of the order of
qubits 1 and 2.

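[Circuit for Fig. 16.11: four qubits with inputs $x_3$ (top) to $x_0$ (bottom) and outputs $y_3$ (top) to $y_0$ (bottom), divided into stages $U_1$ to $U_4$. Top qubit: $H$, $R_1$, $R_2$, $R_3$; second qubit: $H$, $R_1$, $R_2$; third qubit: $H$, $R_1$; bottom qubit: $H$. Swap gates (X symbols) are interleaved between the stages, and a state $\psi_4$ is marked.]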

Figure 16.11: The generalization of Figs. 16.10 and 16.9 to the case of four qubits. The correspondence
with the breakup of the FFT (U = U4 U3 U2 U1 ) is indicated.

Following the structure of Fig. 16.9 for two qubits and Fig. 16.10 for three qubits, the generalization to four qubits is shown in Fig. 16.11. The correspondence with the FFT is clear; the only complication is that, in order to show the correspondence, pairs of reversals of the order of the qubits (which cancel each other out) have to be introduced, with one reversal in one stage of the QFT and the other reversal in the next stage. Reading Fig. 16.11 from left to right, the first reversal pair reverses qubits 2 and 3 (which needs a single swap gate between qubits 2 and 3), the next reverses qubits 1, 2 and 3 (which only needs a single swap gate between qubits 1 and 3), and the last reversal (not a pair, because this is the last stage so there is no subsequent stage to compensate it) reverses all 4 qubits (which needs two swap gates, one between qubits 0 and 3 and the other between qubits 1 and 2).
To conclude, we see that there is a close parallel between the breakup of the FFT and the circuit for the QFT. The details are slightly complicated because one needs reversals of the order of the qubits to make the correspondence precise. Note that Fig. 1 in [Link] is related to the results presented here.
Chapter 17

Shor’s Algorithm
When computers we build become quantum,
Then spies of all factions will want ’em.
Our codes will all fail,
They’ll hack our email,
But crypto that’s quantum will daunt ’em.

This is a slightly edited version of a limerick by Peter and Jennifer Shor. (The original version is
printed in the book by Nielsen and Chuang [NC00].) Continuing in a literary vein, on p. 453 of Nielsen
and Chuang is a very well-crafted (Shakespearean) sonnet by Daniel Gottesman on quantum error
correction. It seems that quantum computing brings out latent literary qualities in some scientists
who work on it, but unfortunately not for me!

17.1 Introduction
Consider an integer $N$ composed of two prime factors $p$ and $q$, i.e. $N = pq$. In Chapter 14 we showed how to determine the factors of $N$ from the period $r$ of the function
$$f(x) \equiv a^x \pmod N , \tag{17.1}$$
where $a$ is some number less than $N$ that has no factors in common with $N$. Since $a^0 = 1$, the period is the smallest positive value $x = r$ such that
$$a^r \pmod N = 1. \tag{17.2}$$
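For small $N$ the period defined by Eq. (17.2) can be found classically by brute force, as in the sketch below (`classical_period` is an illustrative name). The loop takes up to $r$ steps, and $r$ can be of order $N \sim 2^{n_0}$, i.e. exponential in the number of bits; this is the cost that Shor’s algorithm avoids. The pair $a = 7$, $N = 15$ anticipates the example of Sec. 17.4; $a = 2$, $N = 21$ is an extra example chosen here.

```python
from math import gcd

def classical_period(a, N):
    """Smallest r > 0 with a^r mod N = 1, by trying r = 1, 2, 3, ..."""
    if gcd(a, N) != 1:
        raise ValueError("a must have no factors in common with N")
    value, r = a % N, 1
    while value != 1:          # up to about N iterations in the worst case
        value = (value * a) % N
        r += 1
    return r

print(classical_period(7, 15))   # 4
print(classical_period(2, 21))   # 6
```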

In 1994 Peter Shor [Sho94] developed a famous quantum algorithm for period finding which is much
more efficient for factoring large integers than any known algorithm running on a classical computer.
The ability to factor a large integer can be used to decode messages sent down a public channel (such
as the internet) which have been encrypted with the RSA scheme. The first four lines of the above limerick refer to this.¹ The RSA encryption scheme is described in Chapter 13.
Here we describe in detail Shor’s algorithm to determine the period of the function $f(x)$ in Eq. (17.1). Useful references are [Mer07, NC00, Vat16]. There is also a helpful YouTube video at [Link] which is less technical than the present discussion.
We denote by $n_0$ the number of bits needed to contain $N$, so $N$ is comparable to $2^{n_0}$. In cryptography, $N$ may have of the order of 600 decimal digits (so $n_0 \sim 2000$ bits).
¹ The last line of the limerick refers to quantum key distribution (QKD), which will be discussed in Chapter 21.

17.2 Modular Exponentiation

In Shor’s algorithm the period is found by a quantum Fourier transform of the function in Eq. (17.1) evaluated for $x = 0, 1, 2, \cdots, 2^n - 1$. What do we take for $n$? Now the period may be comparable to $N$ and, according to Mermin [Mer07], in general we need at least $N$ periods in the data, i.e. $2^n > N^2$, and so we set $n = 2n_0$. We will see why the doubling of the number of qubits is necessary in Sec. 17.5. Hence, if $n_0 \sim 2000$ we have $n \sim 4000$.
It would seem to be a formidable (nay, impossible) task to calculate $a^x \pmod N$ for a value of $x$ of order $2^{4000}$. However, it can be done as follows. First compute $a, a^2, a^4, \cdots, a^{2^{n-1}} \pmod N$ by successive squaring. This only takes $n$ multiplications and so can be done on a classical or quantum computer. Let the binary expansion of $x$ be
$$x = x_{n-1} x_{n-2} \cdots x_2 x_1 x_0 . \tag{17.3}$$
Then we have
$$a^x = \prod_{j=0}^{n-1} \left(a^{2^j}\right)^{x_j} . \tag{17.4}$$
For example, for $n = 4$ and $x = 10$, the binary expansion of $x$ is 1010 (note the least significant bit is to the right), so
$$a^{10} = \left(a^8\right)^1 \left(a^4\right)^0 \left(a^2\right)^1 \left(a^1\right)^0 . \tag{17.5}$$
The use of Eq. (17.4) to compute ax for huge values of x is called “modular exponentiation”.
We can perform modular exponentiation on a classical or quantum computer as follows. We start with a certain value for $x \equiv x_{n-1} x_{n-2} \cdots x_2 x_1 x_0$ in the input register and $1 \equiv 000\cdots001$ in the output register. We also need an additional work register with $n_0$ qubits, whose contents we will denote by $w$, with initial value $w = a$. The following steps compute $a^x \pmod N$ using Eq. (17.4):

• (a) Multiply the output register by $w$ (mod $N$) if $x_0 = 1$.

• (b) Replace $w$ by its square, $w \to w^2$ (mod $N$).

• (a’) Repeat (a) but for $x_1$.

• (b’) Repeat (b).

• Continue repeating (a) (with successive bits of $x$) and (b).
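The steps above can be sketched classically for one value of $x$ at a time; the quantum version applies the same multiplications, controlled by the bits $x_j$, to all $x$ in superposition. `mod_exp` is an illustrative name, and Python’s built-in three-argument `pow` serves as a cross-check.

```python
def mod_exp(a, x, N):
    """a^x mod N by repeated squaring, following steps (a) and (b)."""
    output = 1          # output register, initially 000...001
    w = a % N           # work register, initially a
    while x > 0:
        if x & 1:                       # step (a): current bit of x is 1
            output = (output * w) % N
        w = (w * w) % N                 # step (b): w -> w^2 (mod N)
        x >>= 1                         # move on to the next bit of x
    return output

print(mod_exp(7, 10, 15))                    # 4
print(mod_exp(7, 10, 15) == pow(7, 10, 15))  # True
```

For $x = 10 = 1010_2$ the output register is multiplied only by $w = a^2$ and $w = a^8$, reproducing Eq. (17.5).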

On a classical computer, the computation has to be performed separately for each $x$, whereas on a quantum computer, as we shall see, Eq. (17.4) can be computed efficiently for all $x$ between 0 and $2^n - 1$ using quantum parallelism.
A schematic circuit diagram for doing modular exponentiation on a quantum computer is shown in Fig. 17.1. There are $n$ upper or “input” qubits, which contain the values of $x$, and $n_0$ lower or “output” qubits, which contain the function values $f(x)$. As discussed above, we usually take $n = 2n_0$. We will call the set of input qubits the “input register”, and similarly denote the output qubits as the “output register”. The notation “input” and “output”, though often used, can be rather confusing since both input and output registers are present in the initial state (left edge of the circuit diagram in Fig. 17.1) and in the final state (right edge of the circuit), so from now on we will refer to these registers as “upper” and “lower”.
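[Fig. 17.1: the upper (“input”) register of $n$ qubits starts in $|0\rangle$ and passes through $H^{\otimes n}$; it then enters the box $U$ together with the lower (“output”) register of $n_0$ qubits, which also starts in $|0\rangle$. The states $\psi_0$ and $\psi_1$ before and after $U$ are marked.]
Both the upper and lower registers are initialized to $|0\rangle$. The qubits in the upper register are each run through a Hadamard gate. As shown in Sec. 9.2, Hadamards acting on $n$ qubits give the symmetric sum of all $2^n$ basis states. Hence, before entering the box $U$ shown in Fig. 17.1, the state of the system is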
$$|\psi_0\rangle = \frac{1}{2^{n/2}} \sum_{x=0}^{2^n-1} |x\rangle_n |0\rangle_{n_0} . \tag{17.6}$$
On exiting the box $U$, the state of the system has the values of $f(x)$ in the lower register, i.e.
$$|\psi_1\rangle = \frac{1}{2^{n/2}} \sum_{x=0}^{2^n-1} |x\rangle_n |f(x)\rangle_{n_0} . \tag{17.7}$$
Note that, in general, if the lower qubits were initialized to $|y\rangle_{n_0}$, then after the function acted they would be in state $|y \oplus f(x)\rangle_{n_0}$, but here $y = 0$.
How many operations does this require? If we consider (b) we need to do $n$ squares of an $n_0$-bit number. Multiplying two $n_0$-bit numbers in the simplest way² takes $O(n_0^2)$ operations. Since $n = 2n_0$, we see that the operation count for (b) is $O(n^3)$. The operation count for (a) is similar, so the total operation count for modular exponentiation is $O(n^3)$.
On a classical computer one would have to perform these calculations sequentially for $x = 1, 2, \cdots, r$, where the period $r$ is of order $N$, which in turn is of order $2^{n/2}$, but on a quantum computer they are done in parallel using quantum superposition. Hence a quantum computer performs the modular exponentiation part of Shor’s algorithm exponentially faster than a classical computer.

17.3 Quantum Fourier Transform (QFT)

Now that the state of the qubits contains $f(x)$ for all $x$ from 0 to $2^n - 1$, how do we determine the period $r$? A schematic of the full circuit for doing this is shown in Fig. 17.2.
The first (left) part of the algorithm is the modular exponentiation also shown in Fig. 17.1. A measurement is then made of the result in the lower register from the modular exponentiation routine $U$. This measurement is indicated by the lower box with an arrow in Fig. 17.2. The measurement will yield some value for $f(x)$, say $f_0$. According to the extended Born hypothesis, the upper register will then contain a superposition of those basis states for which $f(x) = f_0$. Since $f(x)$ is periodic with period $r$, the possible values of $x$ are of the form $x_0 + kr$, so, after the measurement on the lower
² As mentioned in Nielsen and Chuang [NC00], there are more sophisticated methods of multiplying $n$-bit numbers which only take $O(n \ln n \ln\ln n)$ operations rather than $O(n^2)$. This gives a total operation count for modular exponentiation of $O(n^2 \ln n \ln\ln n)$, hardly more than $O(n^2)$.

Figure 17.2: Schematic circuit diagram for Shor’s algorithm for period finding on a quantum computer. The black box $U$ does the modular exponentiation as described in the text, see also Fig. 17.1. The state input to $U$ is given by $|\psi_0\rangle$ in Eq. (17.6) and the state output from $U$ is given by $|\psi_1\rangle$ in Eq. (17.7). A measurement (indicated by the box with the arrow) is performed on the lower register, giving some value $f_0$. The double lines indicate that the measurement gives classical bits which take values 0 or 1. The state of the upper register is then given by $|\psi_2\rangle$ in Eq. (17.8), the equally weighted superposition of all values of $x$ for which $f(x) = f_0$. The $n$ qubits in the upper register then go through the quantum Fourier transform, the result of which is given by $|\psi_3\rangle$ in Eq. (17.10). A measurement of the upper qubits then gives a result $y$ which is close to an integer multiple of $2^n/r$, where $r$ is the period, as discussed in the text.

register, the state of the upper register becomes
$$|\psi_2\rangle = \frac{1}{\sqrt{Q}} \sum_{k=0}^{Q-1} |x_0 + kr\rangle_n . \tag{17.8}$$
Here $0 \le x_0 \le r - 1$, $x_0 + (Q-1)r \le 2^n - 1$, $f(x_0 + kr) = f_0$, and the number of states in the sum is
$$Q = \left[\frac{2^n}{r}\right], \tag{17.9}$$
where $[\cdots]$ denotes the integer part. Thus $P_x(x)$, the probability of measuring state $|x\rangle$ in the upper register, consists of $Q$ delta functions at positions $x_0 + kr$, $k = 0, 1, \cdots, Q-1$, see Fig. 17.3.
If we were to measure $|\psi_2\rangle$ we would just get one value of $x_0 + kr$, which, because of the dependence on the unknown quantity $x_0$, does not give any information from which we might be able to determine the period $r$. In order to extract information on $r$, we have to perform a quantum Fourier transform on the states in Eq. (17.8) before measuring. This gives
$$|\psi_3\rangle = \sum_{y=0}^{2^n-1} \left( \frac{1}{\sqrt{2^n Q}} \sum_{k=0}^{Q-1} e^{2\pi i (x_0 + kr) y/2^n} \right) |y\rangle_n . \tag{17.10}$$

The quantum circuit which performs the quantum Fourier transform is described in Chapter 16. An example for $n = 4$ qubits is shown in Fig. 17.4. The controlled phase gates act on the target qubit according to Eq. (16.10) if the control qubit is 1, and otherwise do nothing. Like the controlled-$Z$ gate, the controlled phase gate is symmetric between the control and target qubits (the phase is changed only if both qubits are $|1\rangle$), so the control and target qubits can be exchanged. We will use this in Appendix 17.B when we see how to actually eliminate these 2-qubit gates.
Generalizing the diagram in Fig. 17.4 to the case of $n$ qubits, we see that controlled phase gates $R_d$ are required for $d = 1, 2, \cdots, n-1$. Hence, in total, we need $n$ Hadamard gates and $1 + 2 + \cdots + (n-1) =$

Figure 17.3: The probability of getting state $x$ in the upper register if a measurement were performed before doing the quantum Fourier transform. There are $Q$ delta functions, where $Q = [2^n/r]$, each with weight $1/Q$ and separated by $r$, the period. The values of $x$ where these delta functions appear, $x_0 + kr$, $k = 0, 1, \cdots, Q-1$, are those values for which $f(x) = f_0$, the result obtained from the measurement of the lower register. A measurement would get a value for $x_0 + kr$ for some $k$, but since we don’t know $x_0$ this is no help in determining the period $r$. Hence measuring the upper register at this point is not useful. We need to Fourier transform the state of the upper register before measuring it in order to determine the period.

$n(n-1)/2$ controlled phase gates. However, as discussed in Sec. 3.9 of Mermin [Mer07], and in Appendix 17.C, it is both impossible to construct gates giving a phase change which is exponentially small in $n$ and unnecessary to do so to obtain the QFT with the required precision. Mermin shows that one only needs controlled phase gates $R_d$ for $d < \log_2(\text{const.}\, n)$, where the constant Mermin gives is large but independent of $n$. Thus the number of controlled phase gates needed in practice is of order $n \log_2 n$, which is considerably less than $O(n^2)$ if $n$ is several thousand.
In fact we can eliminate the 2-qubit controlled phase gates by measuring each qubit immediately
after the gates of the QFT have acted on it, rather than after completion of the QFT. This is discussed
in Appendix 17.B.
After the quantum Fourier transform we measure the upper (input) register in Fig. 17.2, obtaining a value for $y$. The probability of getting a particular state $y$ is given by the square of the absolute value of the amplitude of $|y\rangle$ in Eq. (17.10), i.e.
$$P(y) = \frac{1}{2^n Q} \left| \sum_{k=0}^{Q-1} e^{2\pi i k r y/2^n} \right|^2 . \tag{17.11}$$

Note that the dependence on x0 , which was troublesome before doing the Fourier transform, and
appears just as a phase factor after the Fourier transform, Eq. (17.10), now drops out completely when
we take the square of the absolute value to get the probabilities in Eq. (17.11).
If $y$ could take real values, the exponentials would add up precisely in phase (and so there would be a peak in the probability for $y$) when $yr/2^n$ is an integer, i.e. for $y = y_m$ where
$$y_m = m\,\frac{2^n}{r} , \tag{17.12}$$


Figure 17.4: The circuit for the quantum Fourier transform for $n = 4$ qubits. The controlled phase gates act on the target qubit according to Eq. (16.10) if the control qubit is 1 and otherwise do nothing. The final swap gates to reverse the order of the qubits output on the right are not included here. Note that the controlled phase gate between qubits $x_i$ and $x_j$ is $R_{|i-j|}$, which makes the structure of the circuit quite simple to understand.

in which $m$ is an integer. Note that there are $r$ values of $m$, from 0 to $r-1$, since $y$ runs over a range of $2^n$ values. We emphasize that $y_m$ is not an integer in general, but the measured values of $y$ are integers, so there will be peaks in $P(y)$ at integer values close to the $y_m$ in Eq. (17.12), see the sketch in Fig. 17.5. Precise values of $P(y)$ for a particular case will be calculated in Sec. 17.5. Hence there is a high probability that we will obtain an integer $y$ close to an integral multiple of $2^n/r$.
To summarize this part, $P(y)$ has $r$ peaks separated by $2^n/r$. We recall that $r$ is the period, which is what we want to compute.
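To see these peaks numerically, the sketch below evaluates Eq. (17.11) directly for an example that is an assumption here, not taken from the text: period $r = 6$ (that of $2^x \bmod 21$, which is not a power of 2) with $n = 10$ upper qubits, so that $2^n = 1024 > N^2 = 441$. $Q$ is taken as the integer part of $2^n/r$, as in Eq. (17.9); `shor_probabilities` is an illustrative name.

```python
import cmath

def shor_probabilities(r, n):
    """P(y) from Eq. (17.11) for y = 0, ..., 2^n - 1."""
    M = 2 ** n
    Q = M // r                       # integer part of 2^n / r, Eq. (17.9)
    probs = []
    for y in range(M):
        amp = sum(cmath.exp(2j * cmath.pi * k * r * y / M) for k in range(Q))
        probs.append(abs(amp) ** 2 / (M * Q))
    return probs

r, n = 6, 10                         # assumed example: period of 2^x mod 21
P = shor_probabilities(r, n)
peaks = sorted(sorted(range(2 ** n), key=lambda y: P[y], reverse=True)[:r])
print(peaks)                         # [0, 171, 341, 512, 683, 853]
print(round(sum(P), 9))              # 1.0
```

The six most probable outcomes are the integers nearest $m\,2^n/r = 0, 170.67, 341.33, 512, 682.67, 853.33$. Because $x_0$ enters Eq. (17.10) only as a phase, it does not appear in the sketch at all.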

17.4 A special case: the period r is a power of 2.

In some special cases the period $r$ will be a power of 2, in which case an integer number of periods fits exactly into the range of $x$-values ($2^n$). An example discussed by Mermin [Mer07] is if $p$ and $q$ are both primes of the form $2^\ell + 1$ (e.g. the commonly studied case of $N = pq = 15$). In this situation we will not need $n$ to be as big as $2n_0$ (where $n_0$ is the number of bits needed to contain $N$). Rather, we will see that we just need $2^n$ to be big enough to contain some integer number³ of periods for us to exactly determine an integer multiple of $2^n/r$. Since the period might be as large as $N$, when $r$ is a power of 2 we need
$$2^n = \text{const.}\; 2^{n_0} \quad\text{rather than}\quad 2^n = 2^{2n_0} \ \text{in the general case.} \tag{17.13}$$
Here we go through this special case because the mathematics is simpler than the general case, which we will study in the next section.
³ Even one period is sufficient, i.e. $2^n = r$.


Figure 17.5: A sketch of the probability of getting state $y$ in the input register after the quantum Fourier transform. There are $r$ peaks at $y_m = m\,2^n/r$ for $m = 0, 1, 2, \cdots, r-1$. Note that $[2^n/r] = Q$, so the separation between the peaks in $P(y)$ is no more than 1 away from $Q$, the number of peaks in the distribution $P_x(x)$ for the state before the quantum Fourier transform, see Fig. 17.3. Precise values of $P(y)$ will be calculated in Sec. 17.5 for a particular case and the resulting values of $P(y)$ will be shown in Fig. 17.8.

First of all we check for $N = 15$ that the period is a power of 2 as stated above. Let’s take $a = 7$, which has no factors in common with 15:
$$\begin{align}
x &= 1, & a^x &= 7 , \tag{17.14a}\\
x &= 2, & a^x &= 7 \times 7 = 49 \equiv 4 \pmod{15} , \tag{17.14b}\\
x &= 3, & a^x &\equiv 7 \times 4 = 28 \equiv 13 \pmod{15} , \tag{17.14c}\\
x &= 4, & a^x &\equiv 7 \times 13 = 91 \equiv 1 \pmod{15} , \tag{17.14d}
\end{align}$$
so the period is $r = 4$, i.e. a power of 2 as claimed.
Now we perform the sum in Eq. (17.11). Since $r$ is a power of 2 here, and $2^n \ge r$, it follows that $2^n/r$ is an integer, so $Q$, the number of terms in the sum in Eq. (17.11), is given exactly by
$$Q = \frac{2^n}{r} . \tag{17.15}$$
From Eq. (17.15), we see that Eq. (17.11) becomes
$$P(y) = \frac{1}{r} \left| \frac{1}{Q} \sum_{k=0}^{Q-1} e^{2\pi i k y/Q} \right|^2 . \tag{17.16}$$

Firstly suppose that $y = mQ$ for integer $m$. It is trivial to see that all the exponentials in Eq. (17.16) are unity, so
$$P(y = mQ) = \frac{1}{r} . \tag{17.17}$$
Note that there are $r$ distinct values of $m$, $m = 0, 1, 2, \cdots, r-1$, since $y$ runs over a range of $2^n$ values and $Q = 2^n/r$, see Eq. (17.15). Hence the sum of the probabilities for the set of values $y = mQ$ is unity. Since the total probability must be unity there can be no probability for other values of $y$, as we will now verify.
The sum in Eq. (17.16) is a geometric series, which can be summed to give
$$\sum_{k=0}^{Q-1} e^{2\pi i k y/Q} = \frac{1 - e^{2\pi i y}}{1 - e^{2\pi i y/Q}}. \tag{17.18}$$
The numerator is zero for all y (recall that y is an integer), but for $y \ne mQ$ the denominator is non-zero, so
$$P(y \ne mQ) = 0, \tag{17.19}$$
as required. Thus, with probability 1, the measured value of y is an integer multiple of $2^n/r$. This is shown in Fig. 17.6. Superficially, this may look similar to the situation before the QFT shown in Fig. 17.3. The difference is that the unknown quantity $x_0$ does not appear in Fig. 17.6. Rather, the delta functions occur at positions $y_m$ where $y_m/2^n = m/r$, from which one can determine r.
Notice the reciprocal relation between the period r in the original data in Fig. 17.3 and the period in the Fourier-transformed data, which is the size of the dataset, $2^n$, divided by r. To use terminology from sound waves and frequencies: quite generally, if the original dataset is a periodic function of "time" with period r, the Fourier transform will have a peak at the "fundamental frequency" $2^n/r$, and in addition can have peaks at "higher harmonics" ($m\,2^n/r$ for m > 1). It can also have a component at zero "frequency" (y = 0) if the average of the original data is non-zero. The special nature of the original dataset here (equally weighted, uniformly spaced delta functions; see Fig. 17.3) leads to a Fourier transform which also comprises equally weighted, uniformly spaced delta functions.

[Figure 17.6: P(y) versus y for $0 \le y < 2^n$, showing delta functions labelled m = 0, 1, 2, …, r − 2, r − 1, each separated from the next by $2^n/r$.]

Figure 17.6: The probability of getting state y in the input register after the quantum Fourier transform for the special case where r is a power of 2, so there is an exact integer number of periods in the interval $2^n$. There are r delta functions of equal weight at exactly $y_m = m\,2^n/r$, for $m = 0, 1, \dots, r-1$.

Let us give a simple example so we can see in detail how to extract the period r from this knowledge. We take our previous example of N = 15, a = 7, for which we found in Eq. (17.14) that the period is r = 4. This means that $7^4 \equiv 1 \pmod{15}$. We will assume that we have n = 5 qubits, so $2^n = 32$. The only possible results of a measurement of y are integer multiples of $Q = 2^n/r$ (= 8), so here we have y = 0, 8, 16 or 24, each with equal probability 1/4; see Table 17.1.
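As a numerical check of Eqs. (17.16) and (17.19), the minimal sketch below evaluates P(y) for every y in this example (n = 5, r = 4), assuming, as in this section, that r divides $2^n$ exactly. It is a brute-force classical evaluation of the formula, not a simulation of the quantum circuit, and the helper name `prob_y` is ours.

```python
import cmath


def prob_y(y: int, n: int, r: int) -> float:
    """P(y) from Eq. (17.16); assumes r divides 2**n exactly."""
    Q = 2**n // r  # number of terms, Eq. (17.15)
    amplitude = sum(cmath.exp(2j * cmath.pi * k * y / Q) for k in range(Q)) / Q
    return abs(amplitude) ** 2 / r


if __name__ == "__main__":
    n, r = 5, 4  # N = 15, a = 7 example: 2^n = 32
    probs = [prob_y(y, n, r) for y in range(2**n)]
    nonzero = {y: round(p, 6) for y, p in enumerate(probs) if p > 1e-9}
    print(nonzero)  # {0: 0.25, 8: 0.25, 16: 0.25, 24: 0.25}
    print(round(sum(probs), 6))  # total probability
```

Every other y gives a geometric sum that cancels to rounding error, as Eq. (17.18) predicts.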

| y | m | $y/2^n = m'/r'$ | $c = r/r'$ |
|---|---|---|---|
| 0 | 0 | 0 | – |
| 8 | 1 | 1/4 | 1 |
| 16 | 2 | 1/2 | 2 |
| 24 | 3 | 3/4 | 1 |

Table 17.1: The possible results of a measurement of y for the case of N = 15, a = 7, $2^n = 32$, for which r = 4. The value of y gives us the fraction $y/2^n$, which is also equal to m/r for some m. However, any common factor c is divided out, so we write $y/2^n$ as $m'/r'$ with $m = cm'$, $r = cr'$. Hence we obtain r' (and m'), but not c. We determine c by computing $a^{cr'} \bmod N$ for $c = 1, 2, \dots$ until we get the value 1. There is a probability 1/2 that c = 1 works, and it is extremely unlikely that a large value of c will be needed. The values of c in this example are shown in the last column.

From the measurement of y we determine the fraction $y/2^n$, which is also equal to m/r for some m. However, any common factor c is divided out. We therefore write $y/2^n$ as $m'/r'$ with $m = cm'$, $r = cr'$. The values of c in this example are shown in the last column of Table 17.1. In general, to determine $r = cr'$ we compute $a^{cr'} \bmod N$ for the first few values $c = 1, 2, \dots$ and see for which value of c we obtain 1, the result if $cr' = r$; see Eq. (17.2). The common factor c is unlikely to be large. For example, if m is odd, which occurs with probability 1/2, then c must equal 1 (since here r is a power of 2). Similarly, there is probability 1/4 that m is even but not a multiple of 4, in which case c cannot be greater than 2. Proceeding in this vein we see that it is very unlikely that c is large. In the rare case that the common factor c is large, we would stop after the first few values of c and restart the quantum computation (the steps shown in Fig. 17.2).
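The classical bookkeeping just described can be sketched as follows for the power-of-2 case. This is a minimal illustration, assuming the measured y is an exact multiple of $2^n/r$; the function name `period_from_y` and the cut-off `max_c` are our own choices. Python's `fractions.Fraction` reduces $y/2^n$ to lowest terms, which is exactly the step that divides out the common factor c and leaves $m'/r'$; we then test $a^{cr'} \bmod N$ for $c = 1, 2, \dots$.

```python
from fractions import Fraction


def period_from_y(y: int, n: int, a: int, N: int, max_c: int = 10):
    """Recover r from a measured y when r divides 2**n (Sec. 17.4 sketch).

    Returns None if y = 0 or no c <= max_c works (then rerun the quantum part).
    """
    if y == 0:
        return None  # y = 0 carries no information about r
    r_prime = Fraction(y, 2**n).denominator  # y/2^n = m'/r' in lowest terms
    for c in range(1, max_c + 1):
        # the first c with a^(c r') = 1 (mod N) gives c r' = r
        if pow(a, c * r_prime, N) == 1:
            return c * r_prime
    return None


if __name__ == "__main__":
    # Table 17.1: N = 15, a = 7, n = 5 (2^n = 32)
    for y in (0, 8, 16, 24):
        print(y, period_from_y(y, n=5, a=7, N=15))
```

For y = 16 the reduced fraction is 1/2, so c = 1 fails and c = 2 succeeds, matching the last column of Table 17.1.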
In Table 17.1 we see that the value y = 0 does not give useful information but, since the number
of possible results is equal to r and each result is equally probable, the probability of getting y = 0 is
small if the period r is large (the situation if one needs a quantum computer).
In this section, we have seen that in the rare situation that the period is a power of 2, the measurement of y gives an integer multiple of $2^n/r$ with probability one. Hence $y/2^n = m/r$ with integer m exactly. However, in the general case, which we discuss in the next section, the measurement of y will give, with a probability which is high but less than one, a value such that $y/2^n$ is close to (but not equal to) m/r. The continued fraction method in Appendix 17.A is then needed to determine m/r. For the continued fraction method to work it turns out that we need to have at least N periods in the range of values of x, and so we will take $n = 2n_0$.

17.5 The general case: the period is not a power of 2.

We now evaluate the sum in Eq. (17.11) for the general case when r is not a power of 2, so we do not have an exact integer number of periods in the range of x-values, $2^n$, over which f(x) is calculated. As discussed after Eq. (17.11), P(y) has r peaks, where each peak is in the vicinity of one of the values $y_m = m\,2^n/r$, with $m = 0, 1, 2, \dots, r-1$. We set
$$y = y_m + \delta_m = m\,\frac{2^n}{r} + \delta_m. \tag{17.20}$$
We assume that $\delta_m$ is small, so we are close to the m-th peak, but $2^n$, r and m are large, since we only need the quantum algorithm when these numbers are large. (Recall that y, the measured value, is an integer, whereas $y_m$ and $\delta_m$ in general are not.)

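Equation (17.11) involves a geometric series, which can be summed as follows:
$$\begin{aligned}
\sum_{k=0}^{Q-1} e^{2\pi i k r y/2^n} &= \sum_{k=0}^{Q-1} e^{2\pi i k m}\, e^{2\pi i k r \delta_m/2^n} \\
&= \sum_{k=0}^{Q-1} e^{2\pi i k r \delta_m/2^n} \\
&= \frac{1 - e^{2\pi i Q r \delta_m/2^n}}{1 - e^{2\pi i r \delta_m/2^n}} \\
&= \frac{e^{\pi i Q r \delta_m/2^n}\, \sin(\pi Q r \delta_m/2^n)}{e^{\pi i r \delta_m/2^n}\, \sin(\pi r \delta_m/2^n)},
\end{aligned} \tag{17.21}$$
where we used that $\sin x = \frac{1}{2i}\left(e^{ix} - e^{-ix}\right)$. Inserting Eq. (17.21) into Eq. (17.11), the phase factors drop out and we get
$$P(y) = \frac{1}{2^n Q}\, \frac{\sin^2(\pi Q r \delta_m/2^n)}{\sin^2(\pi r \delta_m/2^n)}. \tag{17.22}$$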
Now Q is within 1 of $2^n/r$, and Q is also large, so we can replace $Qr/2^n$ by 1 with negligible error. Also $r/2^n$ is very small, since we take n to be big enough that there are many periods within the range of x computed, so the sine in the denominator can be replaced by its argument. Hence, to a good approximation,
$$P(y) = \frac{1}{r} \left( \frac{\sin \pi\delta_m}{\pi\delta_m} \right)^2, \tag{17.23}$$
for y in the vicinity of $y_m$. Recall that the relation between $\delta_m$ and y is given in Eq. (17.20). The function in Eq. (17.23), without the factor of 1/r, is plotted in Fig. 17.7. The area under that curve is 1, and most of the weight is in the peak centered at 0.
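To see how good these approximations are, the sketch below evaluates P(y) three ways for the N = 91, a = 4 example of Sec. 17.6 (r = 6, n = 14): by summing the series of Eq. (17.11) directly, from the closed form (17.22), and from the approximation (17.23). It assumes the sum has $Q = [2^n/r]$ terms, as in Eq. (17.38); the function names are our own.

```python
import cmath
import math


def p_direct(y: int, n: int, r: int, Q: int) -> float:
    """P(y) from Eq. (17.11): sum the Q-term series explicitly."""
    s = sum(cmath.exp(2j * math.pi * k * r * y / 2**n) for k in range(Q))
    return abs(s) ** 2 / (2**n * Q)


def p_closed(y: int, n: int, r: int, Q: int) -> float:
    """P(y) from the summed geometric series, Eq. (17.22)."""
    m = round(y * r / 2**n)            # index of the nearest peak y_m
    r_delta = r * y - m * 2**n         # r * delta_m, an exact integer
    if r_delta == 0:                   # y sits exactly on a peak
        return Q / 2**n
    theta = math.pi * r_delta / 2**n   # pi r delta_m / 2^n
    return math.sin(Q * theta) ** 2 / (2**n * Q * math.sin(theta) ** 2)


def p_approx(y: int, n: int, r: int) -> float:
    """Approximate P(y) from Eq. (17.23)."""
    m = round(y * r / 2**n)
    delta = y - m * 2**n / r           # Eq. (17.20)
    if delta == 0:
        return 1 / r
    x = math.pi * delta
    return (math.sin(x) / x) ** 2 / r


if __name__ == "__main__":
    n, r = 14, 6
    Q = 2**n // r                      # 2730
    for y in range(5458, 5465):
        print(y, f"{p_direct(y, n, r, Q):.5f}",
              f"{p_closed(y, n, r, Q):.5f}", f"{p_approx(y, n, r):.5f}")
```

Working with the integer $r\delta_m = ry - m\,2^n$ in `p_closed` avoids dividing by a floating-point zero when y lands exactly on a peak.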
To find the period we would like to get the integer y which is closest to $m\,2^n/r$ for some integer m, i.e. $|\delta_m| < 1/2$. Writing $\pi\delta_m = x$, this corresponds to $|x| < \pi/2$, and in this region
$$\frac{\sin x}{x} > \frac{2}{\pi}, \tag{17.24}$$
so, according to Eq. (17.23), the probability of getting the nearest integer to $y_m$ is greater than
$$\frac{1}{r}\,\frac{4}{\pi^2} \simeq \frac{0.40}{r}; \tag{17.25}$$
see Fig. 17.7. There are r distinct values⁴ of m, so the total probability of getting the closest integer to one of the $y_m$ is greater than 40%.⁵
So, with fairly high probability, we have obtained the nearest integer to $m\,2^n/r$ for some integer m (which we don't know). How can we determine r from this information? We need some post-processing, which will be done on a classical computer.
In deriving Eq. (17.23) we only needed the range of x studied to contain many periods, i.e. $2^n \gg r$. Since r cannot be bigger than N, it is sufficient that $2^n \gg N$. However, to actually extract r we need a stronger condition, $2^n > N^2$, as we shall now see.
⁴ One of these is for m = 0, which doesn't give useful information, but since we are interested in situations where r is large, the difference between r and r − 1 is negligible.
⁵ In fact, according to Mermin [Mer07], Appendix L, when N is the product of two primes (as we have here) the period is not only less than N but less than N/2. As a result, still using $n = 2n_0$ qubits in the input register, the algorithm will provide a result for r not only if the measured value of y is the closest integer to $m\,2^n/r$, but also if it is the second, third or fourth closest. This increases the probability of a successful run to about 0.9.

[Figure 17.7: plot of $[\sin(\pi\delta_m)/(\pi\delta_m)]^2$ versus $\delta_m$ for $-3 \le \delta_m \le 3$, with dashed lines at $\delta_m = \pm 1/2$ and at height $4/\pi^2$.]

Figure 17.7: A plot of the function in Eq. (17.23), neglecting the factor of 1/r, where r is the number of peaks. The area under the curve is 1. The result of a measurement will be one of a series of uniformly spaced possible values of $\delta_m$ separated by 1. For example, if $y_m + 0.3$ is an integer, the possible measured values of $\delta_m$ would be $\dots, -1.7, -0.7, 0.3, 1.3, \dots$. One of these values for $\delta_m$ must be within 1/2 of 0, and the figure shows that the probability for this is greater than $4/\pi^2$, the dashed horizontal line. (The dashed vertical lines are at $\delta_m = \pm 1/2$.) An example of actual, numerically computed values is shown in Fig. 17.9.

We assume now that we have been successful and found a y which is within 1/2 of $2^n m/r$. Dividing by $2^n$ we have
$$\left| \frac{y}{2^n} - \frac{m}{r} \right| < \frac{1}{2^{n+1}}, \tag{17.26}$$
so $y/2^n$, our estimate for m/r, is off by no more than $1/(2 \cdot 2^n)$.
The value of m/r can then be obtained using continued fractions. A continued fraction representation of a number x has the form
$$x = c_0 + \cfrac{1}{c_1 + \cfrac{1}{c_2 + \cfrac{1}{c_3 + \cdots}}}, \tag{17.27}$$
where the $c_i$ are integers known as the continued fraction coefficients. If we stop after a certain number of iterations and ignore the remainder we have a "partial sum", which is an approximation for x. If x is a rational number (ratio of two integers) the continued fraction will eventually terminate. If x is irrational (like π) the continued fraction will go on for ever. More details about continued fractions are given in Appendix 17.A.
The crucial result of continued fractions which we need is Theorem A4.16 in Appendix 4 of Ref. [NC00], which states that if
$$\left| \frac{y}{2^n} - \frac{m}{r} \right| < \frac{1}{2r^2} \tag{17.28}$$
then m/r is one of the partial sums in the continued fraction representation of $y/2^n$. Here $r < N \sim 2^{n_0} = 2^{n/2}$, so $1/2^{n+1} < 1/(2r^2)$, and we see from Eq. (17.26) that the theorem applies.⁶ Hence m/r will appear as one of the partial sums in the continued fraction representation of $y/2^n$. Since r < N, this must be a partial sum with denominator less than N. Successive partial sums get more and more accurate, so we want the one with the largest denominator less than N.⁷

⁶ It is at this point that we need the data to contain at least N periods.
As we already noted for the special case when r is a power of 2 (Sec. 17.4), if m and r have a common factor, c say, then the continued fraction representation will divide this out and give $m'/r'$, where $m' = m/c$, $r' = r/c$. Thus we actually get r', which is a divisor of r. However, we may be lucky and still get r straight away. As shown in Appendix J of Mermin [Mer07], the probability that two large numbers chosen at random have no common factors is greater than 1/2. Thus, with probability greater than 1/2, we get r directly. We can check if r' is the period r by computing, on a classical computer, $a^{r'} \pmod N$ and seeing if we get 1. If we do not, we would try simple multiples, $r = 2r', 3r', 4r', \dots$, since it is very unlikely that the common factor is large. If we are very unlucky, and the common factor is large, we could start again from the beginning, get another value for m/r and hence another value for r', and compute $a^{r'} \pmod N$. If this is not 1, then again we try $r = 2r', 3r', 4r', \dots$. There is also a chance that the measured value of y is not close enough to one of the $y_m$ to get the period from continued fractions. Again, if this happens we need to repeat the whole procedure. However, we will not have to repeat very many times, because the probability of success in one run is quite high.
The probabilistic nature of Shor’s algorithm, with the resultant need to run the algorithm several
times (usually not very many), is a quite common feature of quantum algorithms.
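The continued fraction step of the classical post-processing can be sketched in Python as below. It is a minimal illustration rather than an optimized implementation, and all function names are our own. `cf_coefficients` follows the recipe of Appendix 17.A using exact integer division; `partial_sums` builds the successive partial sums $h_i/k_i$ with the standard recurrence $h_i = c_i h_{i-1} + h_{i-2}$, $k_i = c_i k_{i-1} + k_{i-2}$ (a well-known property of continued fractions that is not derived in the text); and `estimate_m_over_r` keeps the partial sum with the largest denominator less than N.

```python
from fractions import Fraction


def cf_coefficients(p: int, q: int) -> list[int]:
    """Continued fraction coefficients c_0, c_1, ... of p/q (q > 0)."""
    coeffs = []
    while q:
        c, rem = divmod(p, q)  # c = integer part; rem/q is the remainder
        coeffs.append(c)
        p, q = q, rem          # next term comes from the inverse of the remainder
    return coeffs


def partial_sums(coeffs: list[int]) -> list[Fraction]:
    """Successive partial sums, using h_i = c_i h_{i-1} + h_{i-2} (same for k)."""
    h_prev, h = 1, coeffs[0]
    k_prev, k = 0, 1
    sums = [Fraction(h, k)]
    for c in coeffs[1:]:
        h_prev, h = h, c * h + h_prev
        k_prev, k = k, c * k + k_prev
        sums.append(Fraction(h, k))
    return sums


def estimate_m_over_r(y: int, n: int, N: int) -> Fraction:
    """Partial sum of y/2^n with the largest denominator less than N."""
    sums = partial_sums(cf_coefficients(y, 2**n))
    return [s for s in sums if s.denominator < N][-1]


if __name__ == "__main__":
    print(cf_coefficients(5461, 2**14))      # coefficients of 5461/16384
    print(estimate_m_over_r(5461, 14, 91))   # Sec. 17.6 example
    print(estimate_m_over_r(13653, 14, 91))  # Appendix 17.A example
```

For y = 5461 the selected partial sum is 1/3, and for y = 13653 it is 5/6, in agreement with the hand calculations in Sec. 17.6 and Appendix 17.A.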

17.6 An example
The last section was probably hard going, so we will try to clarify things by discussing a simple example. Consider N = 91, a = 4, which was also discussed in Chapter 14. As shown in Eq. (14.6), the period is r = 6. Since the period is not a power of 2, this is an example of the general case, as discussed in the previous two sections.

| order (m) | peak position ($y_m = m\,2^n/r$) | nearest integer | P (nearest int.) |
|---|---|---|---|
| 0 | 0 | 0 | 0.167 |
| 1 | 2730.67 | 2731 | 0.114 |
| 2 | 5461.33 | 5461 | 0.114 |
| 3 | 8192 | 8192 | 0.167 |
| 4 | 10922.67 | 10923 | 0.114 |
| 5 | 13653.33 | 13653 | 0.114 |

Table 17.2: The peak positions in the Fourier transform for the example discussed in this chapter. The output is at integer values of y, and the nearest integers to the peaks are shown along with the probability at those nearest integer values, computed numerically from Eq. (17.11). Neglecting the zeroth-order peak at y = 0, which doesn't give useful information, the sum of the other probabilities at the nearest integers is 0.623, so we have a greater than 60% probability of obtaining the nearest integer to a non-zero multiple of $2^n/r$, from which one can deduce r using continued fractions, as discussed in the text and Appendix 17.A.

⁷ If we have two approximants for $y/2^n$, p/q and p'/q' say, then
$$\left| \frac{p}{q} - \frac{p'}{q'} \right| = \frac{|pq' - p'q|}{qq'} > \frac{1}{N^2}$$
(since q and q' are less than N, and $|pq' - p'q| \ge 1$) unless the two approximants are equal. On the other hand, two approximants that both satisfy Eq. (17.26) differ by less than $2/2^{n+1} = 2^{-n} < 1/N^2$. Hence there is at most one approximant with denominator less than N which satisfies Eq. (17.26). Since successive approximants give better approximations, the unique partial fraction that we want must be the one with the largest denominator less than N.

One needs $n_0 = 7$ bits to represent N, so we take $n = 2n_0 = 14$. Hence
$$\frac{2^n}{r} = 2730.67, \tag{17.29}$$
so
$$Q = 2730. \tag{17.30}$$
Hence there are 2730 (and two thirds) periods in our data. As discussed in Mermin [Mer07] and Sec. 17.5, we need at least N (= 91) periods, so 2730 is something of an overkill. The peaks in the Fourier transform, which are at integers next to multiples of $2^n/r$ as discussed above, are shown in Table 17.2.
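The numbers in Table 17.2 can be reproduced by a direct evaluation of Eq. (17.11). The sketch below assumes, as in Eq. (17.30), that the sum has $Q = [2^n/r] = 2730$ terms, and it takes the period r = 6 as known (from Eq. (14.6)); the function name `prob` is ours. It is a brute-force evaluation, chosen for transparency rather than speed.

```python
import cmath
import math

N, r = 91, 6           # N = 91, a = 4 example; period r from Eq. (14.6)
n0 = N.bit_length()    # bits needed to represent N
n = 2 * n0             # size of the input register
Q = 2**n // r          # number of terms, Eq. (17.30)


def prob(y: int) -> float:
    """P(y) from Eq. (17.11), summing the Q terms directly."""
    s = sum(cmath.exp(2j * math.pi * k * r * y / 2**n) for k in range(Q))
    return abs(s) ** 2 / (2**n * Q)


rows = []
for m in range(r):
    y_m = m * 2**n / r     # peak position
    y_near = round(y_m)    # nearest integer
    rows.append((m, y_m, y_near, prob(y_near)))

if __name__ == "__main__":
    print(n0, n, Q)
    for m, y_m, y_near, p in rows:
        print(f"{m}  {y_m:9.2f}  {y_near:6d}  {p:.3f}")
    print(f"sum for m >= 1: {sum(row[3] for row in rows[1:]):.3f}")
```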

[Figure 17.8: plot of P(y) versus y for $0 \le y < 16384$, annotated "N = 91, a = 4, r = 6" and "n = 14, $2^n$ = 16384".]

Figure 17.8: Probabilities for the different components of the Fourier-transformed state for the example studied with N = 91, a = 4, for which the period is r = 6. These are computed numerically from Eq. (17.11). There are six sharp peaks near $y_m = m\,2^n/r$, for $m = 0, 1, \dots, 5$. The one at y = 0 (m = 0) doesn't give useful information. However, the probability of hitting the highest point of one of the other five peaks, i.e. the nearest integer to a non-zero multiple of $2^n/r$, is greater than 60%; see Table 17.2. If, as is likely, the measurement gives one of these results, it can then be used to determine the period r, as discussed in the text and Appendix 17.A. A blowup of the m = 2 peak is shown in Fig. 17.9.

I have evaluated P(y) numerically from Eq. (17.11), and the results are shown in Fig. 17.8. There are r = 6 peaks at values close to $y_m = m\,2^n/r$. There is a trivial peak at exactly y = 0 (m = 0), but this cannot give any useful information about the period r. The other 5 peaks are not, in general, centered at exactly integer values, so the possible observed (integer) values of y are a set of discrete values around each peak, as shown in the histogram in Fig. 17.9, which blows up the region around the m = 2 peak.
As discussed in Sec. 17.5, the sum in Eq. (17.11) can be evaluated, and is given, to a good approximation, by Eq. (17.23) in the region of the m-th peak, where y is given by Eq. (17.20), and $y_m$, given by Eq. (17.12), indicates the peak position. (Recall that y itself is an integer.) The function in Eq. (17.23) is plotted for continuous y as the solid curve in Fig. 17.9. When evaluated at integer y, it agrees very well with the values numerically computed from Eq. (17.11), which are shown as the histogram in Fig. 17.9.

[Figure 17.9: histogram of P(y) for $5458 \le y \le 5464$ with a solid curve and a vertical dashed line, annotated "Second peak, $2(2^n/r)$ = 5461.33", "N = 91, a = 4, r = 6" and "n = 14, $2^n$ = 16384".]

Figure 17.9: A blowup of the region around the m = 2 peak in Fig. 17.8 (see also Table 17.2). The histogram is obtained from numerical evaluation of Eq. (17.11). The probability is dominated by the biggest bar, which is at y = 5461, the nearest integer to $y_2 = 2 \times (2^n/r) = 5461.33$ (indicated by the vertical dashed line). According to Eq. (17.32), the sum of the weights in the histogram is 1/r (= 1/6 here). The solid curve is the expression shown in Eq. (17.23), with y (= $y_m + \delta_m$) considered to be a continuous variable.
Note that $\delta_m$ in Eq. (17.23) is defined in Eq. (17.20) and can be written as
$$\delta_m = \epsilon + \ell, \tag{17.31}$$
where $\ell$ is an integer and $|\epsilon| < 0.5$. Note too that
$$\sum_{\ell=-\infty}^{\infty} \left( \frac{\sin(\pi(\epsilon + \ell))}{\pi(\epsilon + \ell)} \right)^2 = 1, \tag{17.32}$$
for arbitrary $\epsilon$ (recent versions of Mathematica know this). Hence, according to Eqs. (17.23), (17.31) and (17.32), the weight around each of the peaks in Fig. 17.8 is equal to 1/r (= 1/6 here). There are r peaks, so the total probability is $r \times (1/r) = 1$, as required. Referring to Fig. 17.9, the weight in the largest bar is 0.114, which is 68% of 1/6, the total weight in all the bars for this (m = 2) peak.
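Equation (17.32) and the 68% figure can be checked numerically. The sketch below (function name ours) truncates the infinite sum at $|\ell| \le L$, so its result falls short of 1 by a small amount of order 1/L. The last line evaluates Eq. (17.23) without the 1/r factor at $|\epsilon| = 1/3$, the offset of the y = 5461 bar from $y_2 = 5461.33$.

```python
import math


def sinc2_sum(eps: float, L: int = 100_000) -> float:
    """Truncated version of Eq. (17.32): sum over -L <= ell <= L."""
    total = 0.0
    for ell in range(-L, L + 1):
        x = math.pi * (eps + ell)
        total += 1.0 if x == 0.0 else (math.sin(x) / x) ** 2
    return total


if __name__ == "__main__":
    for eps in (0.0, 0.1, -1 / 3, 0.45):
        print(eps, round(sinc2_sum(eps), 4))
    # Share of one peak's weight in the bar nearest the peak (|eps| = 1/3 here)
    x = math.pi / 3
    print(round((math.sin(x) / x) ** 2, 3))
```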
From Table 17.2 we see that the probability of getting the nearest integer to an integral multiple of $2^n/r$ is greater than 60%. Let's suppose we get one of these. In fact, let's suppose we get the large bar at y = 5461 in Fig. 17.9. (Recall that Fig. 17.9 is a blowup of the m = 2 peak in Fig. 17.8.) Given the measured value, y = 5461, we will now see how to determine the period r using continued fractions.
We define $x = y/2^n$. This is close to m/r, where r, the period, is what we want to determine. Since r is no greater than N, as discussed in Sec. 17.5, the best estimate of m/r is the partial sum of x having the largest denominator less than N. As stated above, we assume in this example that the measurement gave the value y = 5461, the highest bar of the histogram in Fig. 17.9. We therefore determine the continued fraction representation for x = 5461/16384 (since n = 14 we have $2^n = 16384$). Since this is a rational fraction, the continued fraction terminates.
We use the methods of Appendix 17.A to determine the coefficients as follows. We have $c_0 = [x] = 0$ (note: $[\cdots]$ means the integer part of what is in the brackets). We subtract $c_0$ from x and call the inverse of the remainder $x_1$, so $x_1 = 16384/5461$. $c_1$ is the integer part of $x_1$, so $c_1 = 3$. Subtract $c_1$ from $x_1$ and call the inverse of the remainder $x_2$. Since $x_1 - c_1 = 1/5461$, we have $x_2 = 5461$. Since this is an integer, the continued fraction terminates at this point. Hence the coefficients are
$$c_0 = 0, \quad c_1 = 3, \quad c_2 = 5461, \tag{17.33}$$
and the corresponding partial sums are
$$\begin{aligned}
c_0 &= 0, \\
c_0 + \frac{1}{c_1} &= \frac{1}{3}, \\
c_0 + \cfrac{1}{c_1 + \cfrac{1}{c_2}} &= \frac{5461}{16384}.
\end{aligned} \tag{17.34}$$
The last result has a denominator bigger than N (= 91), so we neglect it and conclude that⁸
$$\frac{m}{r} = \frac{1}{3}. \tag{17.35}$$
It is possible that m and r have a common factor, i.e. m = c, r = 3c for some integer c. We try some small values for c. Starting with c = 1, so r = 3, we compute $a^3 \pmod{91}$ and find that it is not 1; see Eq. (14.6c). However, we find that c = 2 does work, since $a^6 \equiv 1 \pmod{91}$; see Eq. (14.6f). Hence the period r is equal to 6, the desired result.

17.7 Summary
What is the operation count for Shor's period-finding algorithm?
For an input register of n qubits, the QFT requires, in principle, $O(n^2)$ operations, as shown in Sec. 17.3. Note, however, as discussed there, in Appendix 17.C, and in Mermin [Mer07], that in practice one only needs of order $n \log_2 n$ gates to perform the QFT to within the necessary precision.
The computation of the function values using modular exponentiation takes $O(n^3)$ operations, as shown in Sec. 17.2 (but see footnote 2 on page 153, which states that the operation count is $O(n^2 \log n \log\log n)$, not much more than $O(n^2)$, if one uses a sophisticated method for multiplying two large numbers).
What about the continued fraction part, which is, of course, done on a classical computer? Each division of an n-bit number takes of order $n^2$ operations if the division is done in a simple way. In fact, division can be rewritten as several multiplications, see [Link] algorithm, so the operation count can be reduced to that for multiplication, i.e. $O(n \log n \log\log n)$. The depth of the continued fraction where the denominator is O(N) is O(log N), since the coefficients in the continued fraction multiply to give the numerator and denominator. This is O(n), since N contains no more than n/2 bits. Hence the operation count for the continued fraction post-processing is $O(n^3)$, but recall that this is done on a classical computer. Again, the count is not much more than $O(n^2)$ if one uses a sophisticated method for dividing two large numbers. Hence, the overall operation count of Shor's algorithm is⁹ $O(n^3)$.
Shor's algorithm for factoring integers therefore runs in polynomial time as a function of the number of bits in N. For comparison, no polynomial-time classical algorithm for factoring integers is known. The fastest classical algorithm at present, the general number field sieve (GNFS), takes a time $\exp(\text{const.}\, n^{1/3} \log^{2/3} n)$. It is not known whether a polynomial-time classical algorithm for factorization exists.
Even though the power of n in the exponent of the GNFS running time is less than one, the GNFS is still much slower for large n than Shor's polynomial-time algorithm. Hence, if the considerable technical difficulties could be overcome, and a quantum computer built with a sufficiently large number of qubits and a sufficiently low error rate, such a device could decode encrypted messages now being sent over the internet which are impossible to decode on a classical computer.

⁸ In this case, where there are many more than N periods in the interval $2^n$, one also gets the right answer from the continued fraction if the measurement gives one of the other nearby y values. For example, if we get y = 5460 (the third closest to the peak), the continued fraction coefficients are 0, 3 and 1365, which give the partial sums 0, 1/3 and 1365/4096. The last value has a denominator greater than N, so we ignore it and take the previous partial sum, again getting m/r = 1/3.

⁹ This can be reduced to $O(n^2 \log n \log\log n)$ using sophisticated methods for multiplying and dividing large numbers.

Problems
17.1. Continued Fractions
Consider the Shor algorithm for n = 10, so $2^n = 1024$. Recall that there are peaks in the quantum Fourier transform in the vicinity of $y_m = m\,2^n/r$, for integer m, where r is the period that we wish to determine. Suppose we measure y = 695, which, with high probability, will be close to one of the peaks.

(i) Go through the continued fraction calculation to determine the period.

(ii) Compare the resulting value of $y_m = m\,2^n/r$ with the measured (integer) value of 695.

Note:

• Recall that you want the partial sum of the continued fraction with the largest denominator less than N, the number being factored. Since we take $2^n$ to be comparable to $N^2$, as discussed in class, you may assume that N is no bigger than 50.
• Note that the period could, potentially, be a multiple of the denominator, r', you found in the continued fraction. In a real situation, this would be checked by seeing if $a^{cr'} \bmod N = 1$, for $c = 1, 2, \dots$. Neglect this possibility here and take the period r to equal the denominator r'.
• If you wish you may use a package such as Mathematica, or write your own computer program, to help with evaluating the continued fraction.

17.2. Consider Shor's algorithm for determining the period r of the function
$$f(x) = a^x \bmod N, \tag{17.36}$$
so $a^r \bmod N = 1$. Recall that the register containing the values of f(x) is measured, and then the register containing the x-values is acted on by a quantum Fourier transform (QFT). The values of x range from 0 to $2^n - 1$. We showed that if one then measures the n-qubit register containing the x-values, the probability of getting the (integer) value y is given by
$$P(y) = \frac{1}{2^n Q} \left| \sum_{k=0}^{Q-1} e^{2\pi i r k y/2^n} \right|^2, \tag{17.37}$$
where, in general,
$$Q = \left[ \frac{2^n}{r} \right], \tag{17.38}$$
in which [x] means the integer part of x.
In this question we consider a simple case in which r is a power of 2, so here $2^n/r$ is precisely an integer.
(i) Show that the probability of getting y = mQ for $m = 0, 1, 2, \dots, r-1$ is given by P(y = mQ) = 1/r, and that the probability of getting any other y-value is zero.
(ii) Suppose n = 6 (so $2^n = 64$) and the period is r = 8. What are the possible values of y?
(iii) We showed in class that $y/2^n = k/r$ for some integer k. For each of the possible values of y from the previous part, what are the values of k/r (dividing out any common factors)?
Example: for y = 48 we have k/r = 3/4.
(iv) For each of the possible y values there is still a little work to determine the period r. Explain what you have to do for each possible value of y.
Example: in the example in the last part, for y = 48 we have k/r = 3/4. Is the period equal to the denominator, i.e. 4? How would one check this? If 4 is not the period, what would one check next?

Appendices

17.A Continued Fractions

Continued fractions are a convenient way of finding a simple rational approximation to a number. The continued fraction representation of a number x is obtained as follows. If there is an integer part of x, call this $c_0$. Subtract $c_0$ from x and call the inverse of the remainder $x_1$, so
$$x = c_0 + \frac{1}{x_1}. \tag{17.39}$$
Let the integer part of $x_1$ be $c_1$. Subtract $c_1$ from $x_1$ and call the inverse of the remainder $x_2$, so $x_1 = c_1 + 1/x_2$. Continuing in the same way for $c_2$ and $x_3$, etc., we get
$$x = c_0 + \cfrac{1}{c_1 + \cfrac{1}{x_2}} = c_0 + \cfrac{1}{c_1 + \cfrac{1}{c_2 + \cfrac{1}{x_3}}} = \cdots = c_0 + \cfrac{1}{c_1 + \cfrac{1}{c_2 + \cfrac{1}{c_3 + \cdots}}}. \tag{17.40}$$
To evaluate continued fractions we start at the bottom. For example, if we wish to evaluate
$$x = \cfrac{1}{2 + \cfrac{1}{5 + \cfrac{1}{4}}}, \tag{17.41}$$
we determine first that
$$5 + \frac{1}{4} = \frac{21}{4} \tag{17.42}$$
and then that
$$2 + \frac{4}{21} = \frac{46}{21}, \tag{17.43}$$
so
$$x = \frac{21}{46}. \tag{17.44}$$
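The bottom-up evaluation of Eqs. (17.41) to (17.44) translates directly into a short loop. In this minimal sketch (the function name `evaluate_cf` is ours), Python's exact `fractions.Fraction` arithmetic avoids any rounding.

```python
from fractions import Fraction


def evaluate_cf(coeffs: list[int]) -> Fraction:
    """Evaluate c0 + 1/(c1 + 1/(c2 + ...)) exactly, starting at the bottom."""
    value = Fraction(coeffs[-1])      # innermost coefficient
    for c in reversed(coeffs[:-1]):
        value = c + 1 / value         # e.g. 5 + 1/4 = 21/4, then 2 + 4/21 = 46/21
    return value


if __name__ == "__main__":
    print(evaluate_cf([0, 2, 5, 4]))  # Eq. (17.41): prints 21/46
```

Truncating the coefficient list before calling `evaluate_cf` gives the partial sums discussed next.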
If we stop after a certain number of iterations and ignore the remainder we have a "partial sum", which is an approximation for x. After each iteration the approximation improves. If x is a rational number (ratio of two integers) the continued fraction will eventually terminate. If x is irrational (like π) the continued fraction will go on for ever. The first few continued fraction coefficients $c_i$ ($i = 0, 1, 2, \dots$) for π = 3.141592654… are
$$3, 7, 15, 1, 292, 1, \cdots. \tag{17.45}$$
It is a property of continued fractions, which you can verify, that if a relatively large coefficient appears at some point, stopping the continued fraction at the previous coefficient gives an accurate approximation to the number. For example, omitting 15 and subsequent coefficients in Eq. (17.45) gives the well-known approximation¹⁰
$$\mathbf{3} + \frac{1}{\mathbf{7}} = \frac{22}{7} = 3.14286\ldots, \tag{17.46}$$
which has an error of about $10^{-3}$ (the continued fraction coefficients are in bold).
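The coefficients in Eq. (17.45) and the approximations 22/7 and 355/113 can be checked numerically. One caveat: `math.pi` is only the double-precision value of π, so the sketch converts it exactly to a `Fraction` and uses only the first few coefficients, which agree with those of π itself. The function name `leading_cf_coefficients` is ours; `Fraction.limit_denominator` is the standard-library method that returns the closest fraction with a bounded denominator, which here reproduces the two partial sums.

```python
import math
from fractions import Fraction


def leading_cf_coefficients(x: Fraction, count: int) -> list[int]:
    """First `count` continued fraction coefficients of x, as in Eqs. (17.39)-(17.40)."""
    coeffs = []
    for _ in range(count):
        c = math.floor(x)   # integer part c_i
        coeffs.append(c)
        if x == c:          # rational x: the expansion terminates
            break
        x = 1 / (x - c)     # invert the remainder to get x_{i+1}
    return coeffs


if __name__ == "__main__":
    pi_value = Fraction(math.pi)                  # exact value of the double
    print(leading_cf_coefficients(pi_value, 6))   # compare Eq. (17.45)
    print(pi_value.limit_denominator(10))         # 22/7, Eq. (17.46)
    print(pi_value.limit_denominator(1000))       # 355/113
```

Applied to `Fraction(13653, 16384)`, the same function returns the coefficients 0, 1, 4, 1, 1364, 2 worked out by hand below.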
In the present case we are interested in the continued fraction representation of $y/2^n$, which is a rational fraction, so the continued fraction will eventually terminate. As discussed in the text, the value of $y/2^n$ is close to m/r, where r is no bigger than N (N can be represented by $n_0$ qubits with $n_0 = n/2$). So we are interested in a continued fraction approximation to $y/2^n$ with a denominator no bigger than N. (Recall that $2^n = (2^{n_0})^2$, which is greater than $N^2$.)
¹⁰ A much more accurate result is obtained by omitting 292 and subsequent terms, which gives the value 355/113 = 3.141592920…, with an error of a bit less than $3 \times 10^{-7}$. This rational approximation to π was apparently first obtained by the Chinese mathematician Zu Chongzhi about 1500 years ago.

Consider the example described in this chapter, which has N = 91, a = 4 and n = 14, so $2^n = 16384$. The most probable results for y are those in the column labeled "nearest integer" in Table 17.2. Suppose the measurement of y gives the nearest integer for m = 5, i.e. 13653. The continued fraction representation of 13653/16384 is obtained as follows:
$$\begin{aligned}
x &= \frac{13653}{16384}, \\
c_0 &= [x] = 0, & x_1 &= (x - c_0)^{-1} = \frac{16384}{13653}, \\
c_1 &= [x_1] = 1, & x_2 &= (x_1 - c_1)^{-1} = \frac{13653}{2731}, \\
c_2 &= [x_2] = 4, & x_3 &= (x_2 - c_2)^{-1} = \frac{2731}{2729}, \\
c_3 &= [x_3] = 1, & x_4 &= (x_3 - c_3)^{-1} = \frac{2729}{2}, \\
c_4 &= [x_4] = 1364, & x_5 &= (x_4 - c_4)^{-1} = 2, \\
c_5 &= [x_5] = 2,
\end{aligned} \tag{17.47}$$
and the series terminates since $x_5$ is an integer. Hence the exact continued fraction coefficients of 13653/16384 are
$$0, 1, 4, 1, 1364, 2. \tag{17.48}$$
Successive partial sums are 0, 1, 4/5, 5/6, 6824/8189 and 13653/16384. We want the partial sum with the largest denominator less than N (= 91), which is 5/6. This tells us, if m and r have no common factors, that m = 5 and r = 6.
We check if r = 6 works by directly calculating $4^6 \pmod{91}$. We find that it is equal to 1 (see Eq. (14.6f)), so the period is indeed 6. According to Appendix M in Mermin [Mer07], the probability of two large randomly chosen numbers not having a common factor is greater than 1/2. If we are unlucky and the assumption of no common factor does not work, then usually we would only have to try a few values for the common factor, i.e. $2, 3, 4, \dots$, before succeeding. If we are really unlucky, and the common factor is very large, we would give up at some point, start again and get a different value for y. In the related example studied in detail in Sec. 17.6, where the measurement gives the nearest integer to the second peak, the common factor is 2.

17.B Eliminating the two-qubit gates

It is possible to replace the 2-qubit gates by 1-qubit gates which act or not depending on the result of a
measurement. This is important from a technological point of view since 1-qubit gates are much easier
to implement than 2-qubit gates. The point is that we measure the final state of the QFT anyway,
and we will see that we can eliminate the 2-qubit gates by measuring each qubit immediately after all
the gates of the QFT have acted on it rather than waiting until the QFT is completed. We now see
how to do this.
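[Figure 17.10: 4-qubit QFT circuit with inputs $x_3, x_2, x_1, x_0$ (top to bottom) and outputs $y_0, y_1, y_2, y_3$; the gate sequences are $x_3$: H; $x_2$: $R_1$, H; $x_1$: $R_2$, $R_1$, H; $x_0$: $R_3$, $R_2$, $R_1$, H.]

Figure 17.10: Circuit equivalent to Fig. 17.4 but with the target and control qubits interchanged on the controlled phase gates.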

First of all we note that, similar to the control-Z gate, the target and control qubits in the controlled
phase gates can be interchanged. Hence Fig. 17.4 is equivalent to Fig. 17.10.
In Fig 17.10 we see that, for each qubit, once the phase gates and Hadamard have acted the qubit
doesn’t change, so it could be measured at this point. (Recall that time flows from left to right in circuit
diagrams). Consider the top qubit x3 which, on output, is y0 . We can measure it immediately after
170 CHAPTER 17. SHOR’S ALGORITHM

x3 H y0 y0

y y
x2 R10 H y1 1

y
x1 R20 Ry11 H y2
y
2

y y1 y y
x0 R3 0 R2 R1 2 H y3 3

Figure 17.11: Circuit for the QFT with 4 qubits, equivalent to Fig. 17.10 but with each qubit measured immediately after its Hadamard gate. Subsequent phase gates (on qubits lower in the diagram) are controlled by classical circuits (not shown) that use the values of the already measured qubits. Note that $R_1^{y_0}$ means $R_1$ raised to the power $y_0$. Since $y_0$ is 0 or 1, this gives $R_1$ if $y_0 = 1$ and the identity $\mathbb{1}$ if $y_0 = 0$. Hence we obtain the required control, but it is done by a classical circuit rather than by the 2-qubit controlled phase gates in Fig. 17.10.

the Hadamard has acted, since it does not change after that. If the result is $y_0 = 1$, then the $R_1$ phase gate on $x_2$ is activated, as are the $R_2$ gate on $x_1$ and the $R_3$ gate on $x_0$. If instead $y_0 = 0$, those phase gates are not activated. Since $y_0$ has already been measured, this control can be done by a classical circuit, which is much simpler to implement than a 2-qubit quantum gate. Similarly, we measure $x_2$, which is $y_1$ on output, immediately after its Hadamard. The $R_1$ gate on $x_1$ and the $R_2$ gate on $x_0$ can then be activated classically if $y_1 = 1$. We proceed in this way through the whole circuit. Each qubit is measured right after its Hadamard, and the result decides, via classical control, whether phase changes are applied to the remaining qubits. The resulting circuit is shown in Fig. 17.11.
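The equivalence can be checked numerically with the following small Python/NumPy sketch. It rests on several assumptions of ours, not of the text:

- The QFT is taken to be $F_{yx} = e^{2\pi i xy/N}/\sqrt{N}$ with $N = 2^n$.
- The top input qubit is the most significant bit of $x$.
- The outcome on the top qubit, $y_0$, is the lowest bit of $y$.
- The phase gate between qubits a distance $d$ apart is diag$(1, e^{i\pi/2^d})$. This should be checked against Eq. (16.10).

All helper names are hypothetical. Rather than sampling, the code enumerates every sequence of measurement outcomes. It computes the exact probability of each sequence by projecting after every Hadamard, then compares the result with the full QFT.

```python
import itertools
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def phase_gate(d):
    """Assumed phase gate between qubits a distance d apart: diag(1, e^{i pi / 2^d})."""
    return np.diag([1, np.exp(1j * np.pi / 2**d)])

def apply_1q(gate, psi, axis):
    """Apply a 2x2 gate to one axis of an n-qubit state stored as a tensor of shape (2,)*n."""
    return np.moveaxis(np.tensordot(gate, psi, axes=([1], [axis])), 0, axis)

def qft_probs(psi):
    """Exact outcome probabilities |<y|F|psi>|^2 with F[y, x] = exp(2 pi i x y / N) / sqrt(N)."""
    N = len(psi)
    k = np.arange(N)
    F = np.exp(2j * np.pi * np.outer(k, k) / N) / np.sqrt(N)
    return np.abs(F @ psi) ** 2

def semiclassical_probs(psi):
    """Same probabilities from the measure-then-classically-control circuit (cf. Fig. 17.11)."""
    n = int(round(np.log2(len(psi))))
    probs = np.zeros(len(psi))
    for bits in itertools.product((0, 1), repeat=n):    # bits[m]: outcome on qubit m (m=0 is top)
        phi = psi.reshape((2,) * n).astype(complex)
        for m in range(n):
            for j in range(m):
                if bits[j]:                               # classical control by an earlier outcome
                    phi = apply_1q(phase_gate(m - j), phi, m)
            phi = apply_1q(H, phi, m)
            phi = apply_1q(np.diag([1 - bits[m], bits[m]]), phi, m)  # project onto the outcome
        y = sum(b << m for m, b in enumerate(bits))      # top qubit gives y0, the lowest bit of y
        probs[y] = np.sum(np.abs(phi) ** 2)
    return probs
```

For a random normalized 4-qubit state, `np.allclose(qft_probs(psi), semiclassical_probs(psi))` should hold. Only 1-qubit gates act on the quantum state here; the "control" is an ordinary `if` on previously measured bits.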

17.C Unimportance of Small Phase Errors

The action of the controlled phase gate is given by Eq. (16.10), and the QFT requires, in principle, these gates for $d = 1, 2, \dots, n-1$. The total number of controlled phase gates is therefore $1 + 2 + \dots + (n-1) = n(n-1)/2 = O(n^2)$. However, it is clearly impossible to construct accurately a phase gate whose phase is exponentially small in $n$ when $n$ is large, and for factoring $n$ would typically be several thousand.
Fortunately, it is not necessary to include the controlled phase gates with such small phase changes. Mermin [Mer07] shows that one can neglect the controlled phase gates with $d > d^* = \log_2(Cn)$, where the constant $C$ is quite large ($500\pi$) but independent of $n$. One still obtains the closest integer to a multiple of $2^n/r$ with almost the same probability as when all the gates are included (the probability is reduced by at most 1%). Hence, in practice, one needs only of order $d^* n \sim n \log_2 n$ controlled phase gates to obtain the desired result. This is far fewer than the $O(n^2)$ needed if all the gates with $d$ up to $n-1$ are included. The size of the circuit therefore grows only slightly faster than $n$, which is a huge improvement over $O(n^2)$ when $n$ is several thousand.
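To get a feel for the numbers, the following Python sketch counts controlled phase gates with and without the cutoff. It assumes, as in the standard QFT circuit, that the qubit in position $k$ (counting from 0) receives gates with $d = 1, \dots, k$, so truncating at $d^*$ leaves $\min(k, d^*)$ of them. It also rounds $d^*$ down to an integer. The function names are hypothetical.

```python
import math

def full_qft_phase_gates(n):
    """Number of controlled phase gates in the n-qubit QFT: 1 + 2 + ... + (n-1)."""
    return n * (n - 1) // 2

def truncated_qft_phase_gates(n, d_max):
    """Keep only gates with d <= d_max; the qubit that would receive k gates keeps min(k, d_max)."""
    return sum(min(k, d_max) for k in range(n))

def mermin_cutoff(n, C=500 * math.pi):
    """Cutoff d* = log2(C n) quoted in the text, rounded down to an integer."""
    return math.floor(math.log2(C * n))

n = 4096
d_star = mermin_cutoff(n)
print(d_star, full_qft_phase_gates(n), truncated_qft_phase_gates(n, d_star))
```

For $n = 4096$ this gives $d^* = 22$, with 89859 gates kept out of 8386560, roughly a hundredfold reduction.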
Chapter 18

Coherent Superposition Versus Incoherent Addition of Probabilities

In the next chapter we shall discuss the effects of external noise on qubits. This will require us to
understand the distinction between a coherent superposition of amplitudes in quantum mechanics and
an incoherent (classical) addition of probabilities. This is the topic that we discuss here.

18.1 Coherent Linear Superposition: 1 qubit

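To illustrate coherent superposition, consider one qubit in the following state:

$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle, \tag{18.1}$$

with $|\alpha|^2 + |\beta|^2 = 1$. If we measure $|\psi\rangle$ in the computational basis, we obtain

$$\begin{cases} |0\rangle & \text{with probability } |\alpha|^2 = p, \\ |1\rangle & \text{with probability } |\beta|^2 = 1 - p. \end{cases} \tag{18.2}$$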

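To show the effects of interference, we apply a Hadamard gate, defined in Eq. (2.35), before doing the measurement. The result is

$$|\psi'\rangle = H|\psi\rangle = \frac{\alpha}{\sqrt2}\left(|0\rangle + |1\rangle\right) + \frac{\beta}{\sqrt2}\left(|0\rangle - |1\rangle\right) = \frac{\alpha+\beta}{\sqrt2}|0\rangle + \frac{\alpha-\beta}{\sqrt2}|1\rangle. \tag{18.3}$$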
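If we do a measurement in the computational basis after applying the Hadamard, the results are

$$\begin{cases} |0\rangle & \text{with probability } \tfrac12|\alpha+\beta|^2 = \tfrac12\left(1 + \alpha\beta^* + \alpha^*\beta\right), \\ |1\rangle & \text{with probability } \tfrac12|\alpha-\beta|^2 = \tfrac12\left(1 - \alpha\beta^* - \alpha^*\beta\right). \end{cases} \tag{18.4}$$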

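The factor $\alpha\beta^* + \alpha^*\beta$ comes from interference between the two pieces in the linear combination of $|\psi\rangle$ in Eq. (18.1). In particular, if $\alpha = \beta = 1/\sqrt2$, so $p = 1/2$, we get

$$\begin{cases} |0\rangle & \text{with probability } 1, \\ |1\rangle & \text{with probability } 0, \end{cases} \tag{18.5}$$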

showing that there is zero probability of getting state $|1\rangle$ in this case if we measure after performing a Hadamard. The vanishing probability of getting $|1\rangle$ is due to destructive interference between the two pieces of the superposition in state $|\psi\rangle$ in Eq. (18.1).
We emphasise that it is incorrect to claim that the state in Eq. (18.1) corresponds to the qubit being in state $|0\rangle$ with probability $|\alpha|^2$ and in state $|1\rangle$ with probability $|\beta|^2$. This picture gives the correct result if we measure without acting with the Hadamard, but it gives incorrect results if we apply the Hadamard before measuring. To see this, suppose the picture were correct. After the Hadamard, the system would then be in state $H|0\rangle = \frac{1}{\sqrt2}(|0\rangle + |1\rangle)$ with probability $|\alpha|^2$ and in state $H|1\rangle = \frac{1}{\sqrt2}(|0\rangle - |1\rangle)$ with probability $|\beta|^2$. Adding the probabilities and using $|\alpha|^2 + |\beta|^2 = 1$, we find that a measurement would then give

$$\begin{cases} |0\rangle & \text{with probability } \tfrac12, \\ |1\rangle & \text{with probability } \tfrac12, \end{cases} \tag{18.6}$$

which does not have the interference terms present in Eq. (18.4).
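The difference between Eqs. (18.4) and (18.6) can be made concrete with a short Python/NumPy sketch. The first function applies $H$ to the amplitudes and then squares them; the second adds the probabilities of the two branches $H|0\rangle$ and $H|1\rangle$ with classical weights $|\alpha|^2$ and $|\beta|^2$. The function names are our own, chosen for illustration.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def coherent_probs(alpha, beta):
    """Measure H|psi> for the superposition |psi> = alpha|0> + beta|1>, Eq. (18.4)."""
    amp = H @ np.array([alpha, beta], dtype=complex)
    return np.abs(amp) ** 2

def incoherent_probs(alpha, beta):
    """Qubit in |0> with prob |alpha|^2 and in |1> with prob |beta|^2, Eq. (18.6)."""
    p0, p1 = abs(alpha) ** 2, abs(beta) ** 2
    # Add probabilities (not amplitudes) of the two branches H|0> and H|1>.
    return p0 * np.abs(H[:, 0]) ** 2 + p1 * np.abs(H[:, 1]) ** 2

a = 1 / np.sqrt(2)
print(coherent_probs(a, a), incoherent_probs(a, a))
```

For $\alpha = \beta = 1/\sqrt2$, the coherent version gives probabilities $(1, 0)$ as in Eq. (18.5), while the incoherent version gives $(\tfrac12, \tfrac12)$ as in Eq. (18.6).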

18.2 Incoherent (Classical) Addition of Probabilities

18.2.1 Example with 1 qubit
An example of a situation with classical probabilities is measuring a single qubit in the presence of external noise. Suppose the qubit starts out in the state $|\psi\rangle$ of Eq. (18.1) but is then acted on by noise which randomises the phases of the two parts of the superposition. After the noise has acted for some time, we can write the state in terms of a global phase θ and a relative phase φ as

$$|\psi\rangle = e^{i\theta}\left(\alpha|0\rangle + e^{i\phi}\beta|1\rangle\right). \tag{18.7}$$

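Measuring $|\psi\rangle$ in the computational basis gives the same results as without noise, Eq. (18.2). However, there is a difference if we apply a Hadamard before doing the measurement. After a Hadamard this state becomes

$$e^{i\theta}\,\frac{\alpha + e^{i\phi}\beta}{\sqrt2}\,|0\rangle + e^{i\theta}\,\frac{\alpha - e^{i\phi}\beta}{\sqrt2}\,|1\rangle. \tag{18.8}$$

If we then measure, we will get state $|0\rangle$ with probability

$$\frac12\, e^{i\theta}\left(\alpha + e^{i\phi}\beta\right) e^{-i\theta}\left(\alpha^* + e^{-i\phi}\beta^*\right) = \frac12\left(|\alpha|^2 + |\beta|^2 + e^{-i\phi}\alpha\beta^* + e^{i\phi}\alpha^*\beta\right). \tag{18.9}$$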
The global phase θ drops out, of course, but we still need to average over the relative phase φ. After some time the external noise will have completely randomized the phases, so each value of φ will be equally probable. Since $\int_0^{2\pi} e^{i\phi}\, d\phi = 0$, the interference terms disappear when we average over the relative phase. The probability of getting state $|0\rangle$ is therefore $\frac12\left(|\alpha|^2 + |\beta|^2\right) = \frac12$. In other words, measuring the qubit after acting with a Hadamard, one finds

$$\begin{cases} |0\rangle & \text{with probability } \tfrac12, \\ |1\rangle & \text{with probability } \tfrac12. \end{cases} \tag{18.10}$$

The probabilities in Eq. (18.10) differ from those in Eq. (18.4), which is for the case of a coherent superposition, by the absence of the factors of $\alpha\beta^* + \alpha^*\beta$ which came from interference. Interference does not happen here because the phase relation between the $|0\rangle$ and $|1\rangle$ parts of the qubit state has been erased by noise.
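The averaging step can be illustrated with a short Python/NumPy sketch. It evaluates Eq. (18.9) (the global phase θ drops out, so it is omitted) and averages it over φ. Full randomization of the phase is modelled here by a uniform grid of phases on $[0, 2\pi)$, which is our own simplifying assumption in place of a real noise model. The function names are hypothetical.

```python
import numpy as np

def prob0_after_hadamard(alpha, beta, phi):
    """Eq. (18.9): probability of |0> after H for relative phase phi (global phase dropped)."""
    amp0 = (alpha + np.exp(1j * phi) * beta) / np.sqrt(2)
    return np.abs(amp0) ** 2

def dephased_prob0(alpha, beta, n_phases=360):
    """Average Eq. (18.9) over equally spaced phases in [0, 2*pi), mimicking fully randomized phi."""
    phis = np.linspace(0.0, 2 * np.pi, n_phases, endpoint=False)
    return prob0_after_hadamard(alpha, beta, phis).mean()

a = 1 / np.sqrt(2)
print(prob0_after_hadamard(a, a, 0.0), dephased_prob0(a, a))
```

Without noise ($\phi = 0$) the probability is 1, as in Eq. (18.5). After averaging over φ it becomes $\tfrac12$, as in Eq. (18.10), because the $e^{\pm i\phi}$ terms average to zero.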

However, Eq. (18.10) is the same as Eq. (18.6). There we assumed that the qubit is in state $|0\rangle$ with probability $|\alpha|^2$ and in state $|1\rangle$ with probability $|\beta|^2$, and added those probabilities as in classical statistics. We shall therefore call a result like Eq. (18.10), obtained when noise has erased the phase difference, an incoherent (classical) average over probabilities. This is as opposed to a result like Eq. (18.4) for the superposition state in Eq. (18.1), which is a coherent sum over amplitudes.

18.2.2 Example with 2 qubits

As another example of the incoherent addition of probabilities, consider two qubits in the following entangled state:

$$|\psi_2\rangle = \alpha|00\rangle + \beta|11\rangle, \tag{18.11}$$

where we again denote $|\alpha|^2$ by $p$. If $\alpha = \pm\beta = 1/\sqrt2$, this is a Bell state. Let us write $|\psi_2\rangle$ more explicitly as

$$|\psi_2\rangle = \alpha|0_A\rangle \otimes |0_B\rangle + \beta|1_A\rangle \otimes |1_B\rangle. \tag{18.12}$$
If we focus on qubit A, say, then the state $|\psi_2\rangle$ looks rather similar to the 1-qubit state $|\psi\rangle$ in Eq. (18.1). There is a piece where qubit A is $|0\rangle$ with amplitude α and a piece where qubit A is $|1\rangle$ with amplitude β. However, for $|\psi_2\rangle$, unlike for $|\psi\rangle$, each of these pieces goes with a different state of qubit B (i.e. $|\psi_2\rangle$ is entangled). Because of this entanglement, we will not get interference between the pieces of $|\psi_2\rangle$ if we perform operations on qubit A followed by a measurement of that qubit, as we now show.
If we measure qubit A before doing any operation on it we get

$$\begin{cases} |0\rangle & \text{with probability } p\ (= |\alpha|^2), \\ |1\rangle & \text{with probability } 1-p\ (= |\beta|^2), \end{cases} \tag{18.13}$$

which is the same as for the other examples.
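However, if we first apply a Hadamard to qubit A, the state becomes

$$\begin{aligned}
|\psi_2'\rangle &= H_A|\psi_2\rangle \\
&= \alpha\,(H_A|0_A\rangle) \otimes |0_B\rangle + \beta\,(H_A|1_A\rangle) \otimes |1_B\rangle \\
&= \frac{\alpha}{\sqrt2}\left(|0_A0_B\rangle + |1_A0_B\rangle\right) + \frac{\beta}{\sqrt2}\left(|0_A1_B\rangle - |1_A1_B\rangle\right) \\
&= \frac{1}{\sqrt2}\Big[|0_A\rangle \otimes (\alpha|0_B\rangle + \beta|1_B\rangle) + |1_A\rangle \otimes (\alpha|0_B\rangle - \beta|1_B\rangle)\Big] \\
&= \frac{1}{\sqrt2}\Big[|0_A\rangle \otimes |\phi_{0,B}\rangle + |1_A\rangle \otimes |\phi_{1,B}\rangle\Big],
\end{aligned} \tag{18.14}$$

where

$$\begin{aligned}
|\phi_{0,B}\rangle &= \alpha|0_B\rangle + \beta|1_B\rangle, \\
|\phi_{1,B}\rangle &= \alpha|0_B\rangle - \beta|1_B\rangle.
\end{aligned} \tag{18.15}$$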
According to the generalized Born hypothesis discussed in Sec. 3.10, suppose one measures qubit A after acting with the Hadamard. With probability 1/2 one finds qubit A in state $|0\rangle$, leaving qubit B in state $|\phi_{0,B}\rangle$. With probability 1/2 one finds it in state $|1\rangle$, in which case qubit B is left in state $|\phi_{1,B}\rangle$. (Both probabilities equal 1/2 because $|\phi_{0,B}\rangle$ and $|\phi_{1,B}\rangle$ are each normalized, since $|\alpha|^2 + |\beta|^2 = 1$.) Again, the probabilities differ from those in Eq. (18.4), which is for the case of a coherent superposition, by the absence of the factors of $\alpha\beta^* + \alpha^*\beta$ which came from interference.
One could also obtain these results by computing the density matrix for qubit A, see Chapter 5,
particularly Example 2 in Sec. 5.4.
Intuitively, interference terms do not appear when the qubit being investigated (qubit A here) is entangled with another qubit. There is then no well-defined phase relation between the two parts of the superposition ($|0_A\rangle$ and $|1_A\rangle$).
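The density-matrix route mentioned above can be sketched in Python/NumPy. We trace out qubit B from the 2-qubit state vector, with basis ordering $|q_A q_B\rangle$ (our convention), and read the measurement probabilities for qubit A off the diagonal. The function names are hypothetical.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)

def reduced_density_matrix_A(psi_AB):
    """Trace out qubit B from a 2-qubit state vector ordered |q_A q_B>."""
    M = psi_AB.reshape(2, 2)      # M[a, b] = amplitude of |a_A b_B>
    return M @ M.conj().T         # rho_A[a, a'] = sum_b M[a, b] * conj(M[a', b])

def probs_A_after_hadamard(alpha, beta):
    """Probabilities of |0> and |1> for qubit A after H_A acts on Eq. (18.11)."""
    psi2 = np.array([alpha, 0, 0, beta], dtype=complex)   # alpha|00> + beta|11>
    psi2p = np.kron(H, I2) @ psi2                         # H acts on qubit A only
    return np.real(np.diag(reduced_density_matrix_A(psi2p)))

print(reduced_density_matrix_A(np.array([0.6, 0, 0, 0.8], dtype=complex)))
print(probs_A_after_hadamard(0.6, 0.8))
```

For $|\psi_2\rangle$ itself, $\rho_A = \mathrm{diag}(|\alpha|^2, |\beta|^2)$ has no off-diagonal (coherence) terms. This is why the Hadamard then yields $\tfrac12, \tfrac12$ for any α and β.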

18.3 Summary
For a coherent superposition, to compute probabilities one sums the amplitudes and then squares, e.g.

$$\frac12|\alpha + \beta|^2, \tag{18.16}$$

while for an incoherent addition of probabilities, which happens when the relative phase is erased by noise or by entanglement with other qubits, one squares and then sums, e.g.

$$\frac12\left(|\alpha|^2 + |\beta|^2\right). \tag{18.17}$$
Chapter 19

Quantum Error Correction

19.1 Introduction
Quantum error correction has developed into a huge topic, so here we will only be able to describe the
main ideas.
Error correction is essential for quantum computing, but at first it appeared to be impossible, for reasons that we shall soon see. The field was transformed in 1995-96 by Shor [Sho95] and Steane [Ste96], who showed that quantum error correction is feasible. Before Shor and Steane, the goal of building a useful quantum computer seemed clearly unattainable. After those two papers, building a quantum computer still posed enormous experimental challenges, but it was no longer necessarily impossible.
Some general references on quantum error correction are Refs. [Mer07, NC00, Vat16, RP14].
Let us start by giving a simple discussion of classical error correction which will motivate our study
of quantum error correction. Classically, error correction is not necessary for computation. This is
because the hardware for one bit is huge on an atomic scale and the states 0 and 1 are so different
that the probability of an unwanted flip is tiny. However, error correction is needed classically for
transmitting a signal over large distances where it attenuates and can be corrupted by noise.
To perform error correction one needs redundancy. One simple way of doing classical error correction is to encode each logical bit by three physical bits, i.e.

$$|0\rangle \to |\bar 0\rangle \equiv |0\rangle|0\rangle|0\rangle \equiv |000\rangle, \tag{19.1a}$$

$$|1\rangle \to |\bar 1\rangle \equiv |1\rangle|1\rangle|1\rangle \equiv |111\rangle, \tag{19.1b}$$

(For convenience we are using Dirac notation here, even though for now these are classical bits.) The sets of three bits, $|000\rangle$ and $|111\rangle$, are called codewords. One monitors the codewords to look for errors. If the bits in a codeword are not all the same, one uses "majority rule" to correct them. For example

$$\begin{aligned} |010\rangle &\ \text{is corrected to}\ |000\rangle, \\ |110\rangle &\ \text{is corrected to}\ |111\rangle. \end{aligned} \tag{19.2}$$

This works if no more than one bit is corrupted and so the error rate must be sufficiently low that the
probability of two or more bits in a codeword being corrupted is negligible.
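The requirement of a low error rate can be quantified with a small Python sketch. It assumes that each of the three bits flips independently with probability $p$; this error model is our assumption and is not stated in the text. Majority rule then fails only when two or three bits flip, which happens with probability $3p^2(1-p) + p^3$. That is much smaller than $p$ when $p$ is small. The function names are hypothetical, and the Monte Carlo function is only a sanity check.

```python
import random

def majority_decode(bits):
    """Correct a 3-bit codeword by majority rule, e.g. (0, 1, 0) -> (0, 0, 0)."""
    b = 1 if sum(bits) >= 2 else 0
    return (b, b, b)

def logical_error_rate(p):
    """Probability that two or more of the three bits flip (independent flips, prob p each)."""
    return 3 * p**2 * (1 - p) + p**3

def simulate(p, trials=100_000, seed=1):
    """Monte Carlo estimate: send 000, flip each bit with prob p, decode, count failures."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        received = tuple(1 if rng.random() < p else 0 for _ in range(3))
        failures += majority_decode(received) != (0, 0, 0)
    return failures / trials

print(logical_error_rate(0.01), simulate(0.1), logical_error_rate(0.1))
```

For $p = 0.01$ the logical error rate is about $3 \times 10^{-4}$. The encoding helps only if the probability of two flips is negligible, as stated above.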
In quantum error correction one also uses multi-qubit codewords and monitoring. However, there
are several major differences compared with classical error correction:

1. Error correction is essential. Quantum computing requires error correction. This is because the
physical systems for a single qubit are very small, often on an atomic scale, so any small outside
interference can disrupt the quantum state.


2. Measurement destroys quantum information. In contrast to the classical case, checking for errors is problematic. Monitoring means measuring, and measuring a general quantum state alters it. Thus it seems that any attempt at error correction must destroy important quantum information.

3. More general types of error can occur. Bit flips are not the only possible errors. For example, one can have phase errors where $\frac{1}{\sqrt2}(|0\rangle + |1\rangle) \to \frac{1}{\sqrt2}(|0\rangle + e^{i\phi}|1\rangle)$.

4. Errors are continuous. Unlike all-or-nothing bit flip errors for classical bits, errors in qubits can
grow continuously out of the uncorrupted state.

One might imagine that point (2), in particular, would be fatal. Amazingly this is not so as we shall
see.

19.2 Correcting bit flip errors

We start our discussion of quantum error correction by considering how one can correct for just bit
flip errors. If the error rate is low we might hope to correct them by tripling the number of qubits as
in the classical case, Eq. (19.1).
The tripling of the qubits can be accomplished by the circuit in Fig. 19.1. To see how this works, suppose that the input qubit, $|x\rangle$, is $|0\rangle$. Then none of the Ctrl-X (CNOT) gates acts on its target qubit, so all three qubits are $|0\rangle$ at the end (i.e. on the right). However, if the input qubit $|x\rangle$ is $|1\rangle$, then the Ctrl-X gates act, so all three qubits are $|1\rangle$ at the end.

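[Figure 19.1 circuit: the input qubit $|x\rangle$ controls Ctrl-X gates on qubits initialized to $|0\rangle$; every output line is $|x\rangle$.]

[Figure 19.2 circuit: the input $\alpha|0\rangle + \beta|1\rangle$ controls Ctrl-X gates on two qubits initialized to $|0\rangle$; the output is $\alpha|000\rangle + \beta|111\rangle$.]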

Figure 19.2: Circuit to encode the 3-qubit bit-flip code acting on a linear combination of |0i and |1i.

By linearity, a linear combination of $|0\rangle$ and $|1\rangle$ is transformed as we want:

$$\alpha|0\rangle + \beta|1\rangle \to \alpha|000\rangle + \beta|111\rangle, \tag{19.3}$$

see Fig. 19.2. Note that this is not a clone of the input state, which would be

$$(\alpha|0\rangle + \beta|1\rangle)^{\otimes 3} = \alpha^3|000\rangle + \alpha^2\beta\,(|001\rangle + |010\rangle + |100\rangle) + \alpha\beta^2\,(|110\rangle + |101\rangle + |011\rangle) + \beta^3|111\rangle. \tag{19.4}$$

We recall that cloning an arbitrary unknown state is impossible according to the no-cloning theorem.
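The difference between Eqs. (19.3) and (19.4) can be seen directly in a short Python/NumPy sketch. It builds the CNOTs of the encoding circuit as permutation matrices, with qubit 0 as the leftmost (most significant) bit, which is our convention. It then compares the encoded state with the threefold tensor product of the input. The helper names are hypothetical.

```python
import numpy as np

def cnot(n, control, target):
    """n-qubit CNOT as a permutation matrix; qubit 0 is the leftmost (most significant) bit."""
    dim = 2 ** n
    U = np.zeros((dim, dim))
    for i in range(dim):
        bits = [(i >> (n - 1 - k)) & 1 for k in range(n)]
        if bits[control]:
            bits[target] ^= 1
        j = sum(b << (n - 1 - k) for k, b in enumerate(bits))
        U[j, i] = 1                      # maps basis state |i> to |j>
    return U

def encode(alpha, beta):
    """Bit-flip encoding: CNOTs from qubit 0 onto qubits 1 and 2, which start in |0>."""
    psi = np.kron(np.array([alpha, beta], dtype=complex), np.array([1, 0, 0, 0]))
    return cnot(3, 0, 2) @ cnot(3, 0, 1) @ psi

v = np.array([0.6, 0.8])
print(np.round(encode(0.6, 0.8).real, 3))             # weight only on |000> and |111>
print(np.round(np.kron(np.kron(v, v), v), 3))         # the (impossible) clone, Eq. (19.4)
```

The encoded state has nonzero amplitudes only at indices 0 ($|000\rangle$) and 7 ($|111\rangle$). The clone of Eq. (19.4) is spread over all eight basis states.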
Now we have to check if any of the three qubits generated by the circuit in Fig. 19.2 are flipped,
i.e. if the situation is that shown in Fig. 19.3. We assume that no more than one has been flipped,
which is a reasonable approximation if the error rate is small.

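[Figure 19.3 circuit: the encoding circuit of Fig. 19.2, followed by a possible X (bit-flip) error on each of the three qubits.]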

Figure 19.3: Circuit indicating that at most one of the three bits generated by the circuit in Fig. 19.2
has flipped due to an error. The goal will be to determine whether any have flipped, if so which one,
and then correct the error.

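We therefore have one uncorrupted state and three corrupted states:

$$|\psi\rangle = \alpha|000\rangle + \beta|111\rangle, \tag{19.5a}$$

$$|\psi_1\rangle = \alpha|100\rangle + \beta|011\rangle = X_1|\psi\rangle \quad \text{(qubit 1 flipped)}, \tag{19.5b}$$

$$|\psi_2\rangle = \alpha|010\rangle + \beta|101\rangle = X_2|\psi\rangle \quad \text{(qubit 2 flipped)}, \tag{19.5c}$$

$$|\psi_3\rangle = \alpha|001\rangle + \beta|110\rangle = X_3|\psi\rangle \quad \text{(qubit 3 flipped)}. \tag{19.5d}$$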

These four states are called the "syndromes". Note that we denote the left-hand qubit as the first qubit, the one to its right as the second qubit, and so on, e.g. $|x_1x_2x_3\rangle$. Hence in Eq. (19.5) $|\psi_i\rangle$ refers to the state in which qubit $i$ is flipped relative to the uncorrupted state $|\psi\rangle$.
Classically, to determine if one of the bits is flipped we just have to look at them. However, quantum mechanically, if we measure $|\psi\rangle$, say, we get $|000\rangle$ with probability $|\alpha|^2$ and $|111\rangle$ with probability $|\beta|^2$, which destroys the coherent superposition. It might therefore seem that quantum error correction is impossible.
Amazingly this is not so. The secret is to couple the codeword qubits to ancillary qubits and
measure only the ancillas. This will give enough information to determine which syndrome the state
is in without destroying the coherent superposition.
Here we need two ancillary qubits. The circuit including them is shown in Fig. 19.4. The three
codeword qubits are at the bottom and the ancillary qubits are at the top. The ancillary qubits are
measured and give values x and y. We shall now see that each of the four possible pairs of values for
x and y corresponds to one of the syndrome states in Eq. (19.5).
Both ancillas are targeted by two of the codeword qubits:

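[Figure 19.4 circuit. Detection: two ancillas (top) start in $|0\rangle$; codeword qubits 1 and 2 act as controls of X gates on the upper ancilla, and qubits 2 and 3 on the lower ancilla, for the input $\alpha|000\rangle + \beta|111\rangle$ possibly with one bit flipped; the ancillas are then measured, giving $x$ and $y$. Correction: $X^{x\tilde y}$ on qubit 1, $X^{xy}$ on qubit 2 and $X^{\tilde x y}$ on qubit 3, giving the output $\alpha|000\rangle + \beta|111\rangle$.]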

Figure 19.4: Circuit to determine the syndrome for the 3-qubit bit-flip code, and correct if necessary. A
box with an arrow denotes a measurement. The double lines indicate that the result of a measurement
is a classical bit.

1st (upper) ancilla (x) is targeted by codeword qubits 1 and 2.

2nd (lower) ancilla (y) is targeted by codeword qubits 2 and 3.

Let’s see what happens for the four syndrome states.

$|\psi\rangle$: Codeword $|000\rangle$. No ancilla is flipped, so $x = 0$, $y = 0$.

Codeword $|111\rangle$. Both ancillas are flipped twice, so again $x = 0$, $y = 0$.
Note that the result of the measurement is the same for both the $|000\rangle$ and $|111\rangle$ parts of the state $|\psi\rangle$. Hence the coherent superposition of $|\psi\rangle$ is not destroyed by the measurement on the ancillas. If the result of the measurement were different for the different parts of the superposition, then only the piece corresponding to the measured value would survive and the superposition would be broken.

$|\psi_1\rangle$: Codeword $|100\rangle$. $x$ is flipped once and $y$ is not flipped, so $x = 1$, $y = 0$.

Codeword $|011\rangle$. $x$ is flipped once and $y$ is flipped twice, so again $x = 1$, $y = 0$.
Recall that the qubits are ordered such that qubit 1 is on the left.

$|\psi_2\rangle$: Codeword $|010\rangle$. $x$ and $y$ are both flipped once, so $x = 1$, $y = 1$.

Codeword $|101\rangle$. $x$ and $y$ are both flipped once, so again $x = 1$, $y = 1$.

$|\psi_3\rangle$: Codeword $|001\rangle$. $x$ is not flipped and $y$ is flipped once, so $x = 0$, $y = 1$.

Codeword $|110\rangle$. $x$ is flipped twice and $y$ is flipped once, so again $x = 0$, $y = 1$.

Hence we get the table of results shown in Table 19.1. Note that in all cases the coherent superposition
of the syndrome state is not destroyed by the measurement of the ancillas.
Hence, by measuring the auxiliary qubits, we can determine which, if any, of the codeword qubits has flipped and then apply a compensating flip if necessary. The X gates which perform these compensating flips are shown at the right of Fig. 19.4. For example, the $X^{x\tilde y}$ gate on qubit 1 indicates that a flip is done by acting with the X operator on qubit 1 only if $x\tilde y = 1$, i.e. $x = 1$ and $y = 0$. This corresponds to the second entry in Table 19.1 ($\tilde y$ means the complement of $y$).
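The case-by-case argument above can be automated with a small Python sketch. It tracks only the two computational-basis branches, $|000\rangle$ and $|111\rangle$, of the codeword. For each single bit flip it checks two things: that both branches give the same ancilla bits, so the superposition survives, and that the correction from Table 19.1 restores them. The function names are our own.

```python
def ancilla_bits(codeword):
    """Parities recorded by the ancillas: x from qubits 1, 2 and y from qubits 2, 3 (Fig. 19.4)."""
    q1, q2, q3 = codeword
    return q1 ^ q2, q2 ^ q3

def correction(x, y):
    """Which qubit to flip according to Table 19.1 (None means no correction)."""
    return {(0, 0): None, (1, 0): 1, (1, 1): 2, (0, 1): 3}[(x, y)]

def flip(codeword, qubit):
    bits = list(codeword)
    bits[qubit - 1] ^= 1            # qubits are numbered 1, 2, 3 from the left
    return tuple(bits)

def check_all_single_flips():
    """Return {error: (x, y)} after confirming both branches agree and the correction works."""
    table = {}
    for error in (None, 1, 2, 3):
        branches = [(0, 0, 0), (1, 1, 1)]            # the two pieces of alpha|000> + beta|111>
        if error is not None:
            branches = [flip(b, error) for b in branches]
        results = {ancilla_bits(b) for b in branches}
        if len(results) != 1:
            raise ValueError("measurement would distinguish the branches")
        (x, y), = results
        fix = correction(x, y)
        repaired = [b if fix is None else flip(b, fix) for b in branches]
        if repaired != [(0, 0, 0), (1, 1, 1)]:
            raise ValueError("correction failed")
        table[error] = (x, y)
    return table

print(check_all_single_flips())
```

The returned dictionary reproduces Table 19.1: no error gives (0, 0), and flips on qubits 1, 2, 3 give (1, 0), (1, 1), (0, 1).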
We have assumed up to now that, if a qubit is flipped, it is flipped with probability one. However, as already noted, errors in quantum circuits can grow continuously from zero. We are concerned with the situation in which the error rate is small (otherwise we cannot correct the errors).

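$$\begin{array}{cccc}
\text{syndrome} & \text{bit flipped} & x & y \\ \hline
|\psi\rangle & \text{none} & 0 & 0 \\
|\psi_1\rangle & 1 & 1 & 0 \\
|\psi_2\rangle & 2 & 1 & 1 \\
|\psi_3\rangle & 3 & 0 & 1
\end{array}$$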

Table 19.1: Results of measurement of the ancillary qubits for the different syndromes of the codeword
qubits.

Consider, then, a more realistic scenario in which the state of the three qubits in the codeword has a small amplitude to have any of the qubits flipped, i.e. the state of the codeword is given by

$$|\psi\rangle \to \left[1 + (\epsilon_1 X_1 + \epsilon_2 X_2 + \epsilon_3 X_3)\right]|\psi\rangle, \tag{19.6}$$

where $|\psi\rangle$ is given by Eq. (19.5a) and the $\epsilon_i$ may be complex, with $|\epsilon_i| \ll 1$. We have kept only terms to first order in the $\epsilon_i$ and ignored corrections to the normalization, which are second order in the $\epsilon_i$. Hence the state of the codeword and ancilla qubits that is input to the detection phase of the circuit in Fig. 19.4 is

$$\left[1 + (\epsilon_1 X_1 + \epsilon_2 X_2 + \epsilon_3 X_3)\right]|\psi\rangle \otimes |00\rangle_A, \tag{19.7}$$

where $|\cdots\rangle_A$ refers to the ancillas. In the detection phase, the codeword qubits are entangled with the ancillas. Just before the measuring gates in Fig. 19.4, the state of the combined codeword-ancilla system is

$$|\psi\rangle|00\rangle_A + \epsilon_1 X_1|\psi\rangle|10\rangle_A + \epsilon_2 X_2|\psi\rangle|11\rangle_A + \epsilon_3 X_3|\psi\rangle|01\rangle_A. \tag{19.8}$$

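The ancillas are then measured, with the possible results shown below:

$$\begin{array}{cccc}
\text{measured ancillas} & \text{probability} & \text{resulting syndrome} & \text{operator to correct the error} \\ \hline
|00\rangle_A & \simeq 1 & |\psi\rangle & \text{none needed} \\
|10\rangle_A & |\epsilon_1|^2 & X_1|\psi\rangle & X_1 \\
|11\rangle_A & |\epsilon_2|^2 & X_2|\psi\rangle & X_2 \\
|01\rangle_A & |\epsilon_3|^2 & X_3|\psi\rangle & X_3
\end{array}$$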

Since the $\epsilon_i$ are small, the probability that a corrupted state is detected is small. The most probable outcome is projection onto the uncorrupted state, in which case no correction is needed. However, there is a small probability that the projection will be onto one of the corrupted syndromes. The corrupted syndromes differ substantially from the uncorrupted state; in fact, they are further from it than the original state in Eq. (19.6) was. This might at first seem like a retrograde step, but it is not. The corrupted state is known precisely, so it can be corrected back to the uncorrupted state.
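The measurement probabilities in the table above can be reproduced with a short Python/NumPy sketch. We take the state of Eq. (19.6) and normalize it explicitly; the text drops that normalization to first order. We then project onto the four syndromes. Since the syndromes are orthonormal, the probability of each outcome is $|\epsilon_i|^2/(1 + \sum_j |\epsilon_j|^2)$. The example values of α, β and $\epsilon_i$ and the function names are illustrative assumptions.

```python
import numpy as np

def single_qubit_X(n, k):
    """X on qubit k (0 = leftmost) of an n-qubit register, identity elsewhere."""
    X = np.array([[0, 1], [1, 0]])
    out = np.array([[1.0]])
    for j in range(n):
        out = np.kron(out, X if j == k else np.eye(2))
    return out

def syndrome_probabilities(alpha, beta, eps):
    """Probabilities of the ancilla outcomes 00, 10, 11, 01 for the state of Eq. (19.6)."""
    psi = np.zeros(8, dtype=complex)
    psi[0], psi[7] = alpha, beta                               # alpha|000> + beta|111>
    corrupted = [single_qubit_X(3, k) @ psi for k in range(3)]  # X1|psi>, X2|psi>, X3|psi>
    state = psi + sum(e * c for e, c in zip(eps, corrupted))
    state /= np.linalg.norm(state)                             # restore normalization
    # The four syndromes are orthonormal, so each outcome projects onto one of them.
    return [abs(np.vdot(s, state)) ** 2 for s in [psi] + corrupted]

print(syndrome_probabilities(0.6, 0.8, (0.01, 0.02j, 0.0)))
```

With these values the uncorrupted outcome has probability close to 1. The others are about $10^{-4}$, $4 \times 10^{-4}$ and 0, which are the $|\epsilon_i|^2$ listed in the table.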
To summarize this part, quantum error correction is feasible even though errors arise continuously. This is because possibly corrupted states are projected onto one of a discrete set of states, which can be corrected if necessary. We will discuss this important point again in Sec. 19.5 when we consider how general errors arise.
It should be noted that in classical analog computers, where errors also arise continuously, no such projection can be done, and hence error correction cannot be performed. This is why practical classical computers are digital rather than analog.

Going back to the discussion of Fig. 19.4, one can avoid explicitly measuring the ancillas. Instead, any bit-flip error can be corrected coherently and automatically by having the ancillas act back on the codeword qubits, as shown in Fig. 19.5. In that figure, the rightmost three controlled gates have the same effect as the NOT (i.e. X) gates on the right of Fig. 19.4, which depend on the results of measuring the $x$ and $y$ ancillary qubits. The rightmost gate in Fig. 19.5 has two control qubits and three target qubits, and flips all the target qubits if both control qubits are 1. It is a generalization of the Toffoli gate $T$, which has two control qubits and one target qubit. The target is flipped if both control qubits are 1, i.e. $T|x\rangle|y\rangle|z\rangle = |x\rangle|y\rangle|z \oplus xy\rangle$. If we denote by $T^*$ the rightmost gate in Fig. 19.5, then $T^*|x\rangle|y\rangle|z\rangle|u\rangle|v\rangle = |x\rangle|y\rangle|z \oplus xy\rangle|u \oplus xy\rangle|v \oplus xy\rangle$. This gate is equivalent to three separate Toffoli gates with the two ancilla qubits as the controls. Qubit 1 is the target for the first Toffoli, qubit 2 for the second, and qubit 3 for the third. After the error on the codeword qubits has been corrected, the ancilla qubits have to be reinitialized to zero.

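[Figure 19.5 circuit. Detection: as in Fig. 19.4, the two ancillas (starting in $|0\rangle$) are targeted by codeword qubits 1, 2 and 2, 3 respectively, acting on $\alpha|000\rangle + \beta|111\rangle$ with at most one X error. Correction: three gates controlled by the ancillas $x$ and $y$ act on the codeword qubits, the rightmost having two controls and three targets; the output is $\alpha|000\rangle + \beta|111\rangle$.]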

Figure 19.5: Automation of the error correction procedure of Fig. 19.4. The three controlled gates on
the right have the same effect as the NOT (i.e. X) gates on the right of Fig. 19.4 which depend on the
result of measurements of the x and y ancillary qubits. The rightmost gate, with two control qubits
and three target qubits, is discussed in the text. The values of the control bits x and y at the end
depend on which of the four syndromes is present (i.e. which if any of the X gates on the left of the
figure have acted) according to Table 19.1. Before this circuit can be used again, the ancillary qubits
have to be reinitialized to 0.

It is instructive to show, for the different syndromes in Eq. (19.5), that the circuits in Figs. 19.4 and 19.5 give the same result, i.e. the end product is the uncorrupted state $|\psi\rangle$. The results from the circuit of Fig. 19.4 have already been discussed above. For the circuit in Fig. 19.5 we just consider the case of $|\psi_2\rangle$ (so qubit 2 has been flipped), for which $x = 1$, $y = 1$ according to Table 19.1. Consider the rightmost three gates in Fig. 19.5 (these are the ones that do the error correction). For $x = 1$, $y = 1$, the rightmost gate is active and flips all three codeword qubits. Hence, between them, the rightmost three gates flip codeword qubit 1 twice, codeword qubit 2 once, and codeword qubit 3 twice. The net result is that only codeword qubit 2 is flipped, so we recover the uncorrupted state $|\psi\rangle$. It is useful to check that the circuit in Fig. 19.5 also works to correct $|\psi_1\rangle$ and $|\psi_3\rangle$.
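The suggested check for $|\psi_1\rangle$ and $|\psi_3\rangle$ can be done with a small Python sketch acting on classical bit strings, i.e. on each branch of the superposition separately. The text fixes only the rightmost gate and the net effect for $x = y = 1$ (qubit 1 flipped twice, qubit 2 once, qubit 3 twice). From this we infer, as an assumption, that the other two correction gates are a CNOT from ancilla $x$ onto qubit 1 and a CNOT from ancilla $y$ onto qubit 3. The function names are hypothetical.

```python
def detect(q):
    """Ancilla bits after the detection stage: x = q1 XOR q2, y = q2 XOR q3."""
    return q[0] ^ q[1], q[1] ^ q[2]

def automatic_correction(q):
    """Ancilla-controlled flips inferred from the description of Fig. 19.5 (an assumption)."""
    x, y = detect(q)
    q1, q2, q3 = q
    q1 ^= x                      # CNOT: ancilla x -> codeword qubit 1
    q3 ^= y                      # CNOT: ancilla y -> codeword qubit 3
    t = x & y                    # generalized Toffoli T*: flip all three if x = y = 1
    return (q1 ^ t, q2 ^ t, q3 ^ t)

for branch in [(0, 0, 0), (1, 1, 1)]:
    for k in range(3):
        bad = list(branch)
        bad[k] ^= 1
        print(branch, "->", tuple(bad), "->", automatic_correction(tuple(bad)))
```

Every single-flip input is restored to its original branch, and uncorrupted inputs pass through unchanged. Both branches of $\alpha|000\rangle + \beta|111\rangle$ are treated identically, so the superposition is preserved.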

19.3 Stabilizer formalism

In order to conveniently generalize the ideas in the previous section to arbitrary errors we need to
reformulate them.

For reasons that will shortly become clear, consider the two Hermitian¹ operators $Z_1Z_2$ and $Z_2Z_3$. Because $Z_i^2 = 1$ (the identity) and different $Z$'s commute, we have

$$(Z_1Z_2)^2 = 1, \qquad (Z_2Z_3)^2 = 1. \tag{19.9}$$

An operator whose square is unity has eigenvalues equal to ±1. Acting twice with the operator on an eigenvector gives back the eigenvector, so the square of the eigenvalue is 1. We also know that $Z_1Z_2$ and $Z_2Z_3$ commute with each other and hence have a common set of eigenvectors.

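$$\begin{array}{cccccc}
\text{syndrome} & & Z_1Z_2 & Z_2Z_3 & x & y \\ \hline
|\psi\rangle & & 1 & 1 & 0 & 0 \\
|\psi_1\rangle & X_1|\psi\rangle & -1 & 1 & 1 & 0 \\
|\psi_2\rangle & X_2|\psi\rangle & -1 & -1 & 1 & 1 \\
|\psi_3\rangle & X_3|\psi\rangle & 1 & -1 & 0 & 1
\end{array}$$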

Table 19.2: The eigenvalues of the stabilizers $Z_1Z_2$ and $Z_2Z_3$ for the four syndromes of the 3-qubit bit-flip code, and a comparison with the measurements of the ancillary qubits $x$ and $y$ used to measure them, see Fig. 19.7. The uncorrupted state has eigenvalue +1 for both stabilizers. This is an important property that stabilizers must have in general. Note that $Z_1Z_2 = 1$ corresponds to $x = 0$, and $Z_1Z_2 = -1$ corresponds to $x = 1$. There is a similar connection between $Z_2Z_3$ and $y$, so $Z_1Z_2 = (-1)^x$, $Z_2Z_3 = (-1)^y$. The second column shows how the corrupted state is generated from the uncorrupted state.

One can verify that the syndrome states in Eq. (19.5) are eigenvectors of $Z_1Z_2$ and $Z_2Z_3$ with the eigenvalues listed in Table 19.2. In general we use the term "stabilizers" to denote operators like $Z_1Z_2$ and $Z_2Z_3$ whose ±1 eigenvalues distinguish the different syndromes. As we will see below, each stabilizer is measured by an ancilla qubit, $|x\rangle$ for $Z_1Z_2$ and $|y\rangle$ for $Z_2Z_3$, see Fig. 19.7 below. The ancilla state $|x = 0\rangle$ corresponds to $Z_1Z_2 = +1$, and $|x = 1\rangle$ corresponds to $Z_1Z_2 = -1$. In other words, $Z_1Z_2 = (-1)^x$, and similarly $Z_2Z_3 = (-1)^y$.
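The verification suggested above can be done numerically. The following Python/NumPy sketch builds $Z_1Z_2$ and $Z_2Z_3$ as 8×8 matrices, with qubit 1 leftmost as in the text. For each syndrome it confirms that the state is an eigenvector and reads off the eigenvalue. The amplitudes α = 0.6, β = 0.8 and the helper names are illustrative choices of ours.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.diag([1, -1])

def op(single, qubits, n=3):
    """Apply `single` to each qubit in `qubits` (1-indexed, qubit 1 leftmost), identity elsewhere."""
    return reduce(np.kron, [single if k in qubits else I2 for k in range(1, n + 1)])

def syndrome_eigenvalues(alpha=0.6, beta=0.8):
    """Eigenvalues of (Z1 Z2, Z2 Z3) for each syndrome of Eq. (19.5)."""
    psi = np.zeros(8)
    psi[0], psi[7] = alpha, beta                        # alpha|000> + beta|111>
    stabilizers = [op(Z, {1, 2}), op(Z, {2, 3})]        # Z1 Z2 and Z2 Z3
    syndromes = {"psi": psi, **{f"psi{k}": op(X, {k}) @ psi for k in (1, 2, 3)}}
    table = {}
    for name, state in syndromes.items():
        eigs = []
        for S in stabilizers:
            ev = int(np.round(state @ S @ state))       # <state|S|state>; states are real here
            assert np.allclose(S @ state, ev * state)   # confirm it really is an eigenvector
            eigs.append(ev)
        table[name] = tuple(eigs)
    return table

print(syndrome_eigenvalues())
```

The output matches the $Z_1Z_2$ and $Z_2Z_3$ columns of Table 19.2: (1, 1), (−1, 1), (−1, −1) and (1, −1).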
Below we will discuss the circuit with which we measure the stabilizers, but first we show a more
straightforward way to determine whether the eigenvalue of a stabilizer in a syndrome is +1 or −1
than simply acting with the stabilizer on the syndrome.
We note first that every stabilizer has eigenvalue +1 in the uncorrupted syndrome $|\psi\rangle$. This is an essential property that stabilizers must have. Also note that the stabilizers will be built out of the single-qubit operators $Z_i$ and $X_i$. For the 3-qubit bit-flip code we only have the $Z_i$, but the $X_i$ will also be needed to correct for general errors. Furthermore, the syndromes with a single-qubit error are obtained by acting on the uncorrupted syndrome with the $X_i$, $Y_i$ and $Z_i$ operators.² Again, for our simple example above we only had the $X_i$, but the other operators will also be used when we deal with general errors.
The Pauli operators $X_i$, $Y_i$, $Z_i$ have the property that they commute for different qubits $i$, whereas different operators on the same qubit anti-commute. Here the anti-commutator of $A$ and $B$ is defined by $\{A, B\} \equiv AB + BA$. Hence we have, for example,

$$[X_i, Y_j] \equiv X_iY_j - Y_jX_i = 0 \quad (i \ne j), \tag{19.10a}$$

$$\{X_i, Y_i\} \equiv X_iY_i + Y_iX_i = 0. \tag{19.10b}$$
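¹ As discussed in Chapter 3, it is an axiom of quantum mechanics that measurable quantities are represented by Hermitian operators.

² Recall that the Pauli operators X, Y and Z are given by $X \equiv \sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, $Y \equiv \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}$, $Z \equiv \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$, and so $Y = iXZ$.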

(Verify the anti-commutation relations like Eq. (19.10b) by explicitly working out some cases.)
Consequently, if we consider a general stabilizer $A_\alpha$ and a syndrome state $|\psi_\beta\rangle = B_\beta|\psi\rangle$, then $A_\alpha$ either commutes or anti-commutes with $B_\beta$. Note that $B_\beta$ involves only a single Pauli operator (which, in general, can be an X, a Y or a Z). By contrast, $A_\alpha$ involves a product of Pauli operators, which, in the general case, can be made up of X's and Z's. We will now show that if $A_\alpha$ commutes with $B_\beta$ the eigenvalue of the stabilizer $A_\alpha$ in state $|\psi_\beta\rangle$ is +1, and if they anti-commute the eigenvalue is −1.
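Firstly, if $A_\alpha$ commutes with $B_\beta$ then

$$A_\alpha|\psi_\beta\rangle = A_\alpha B_\beta|\psi\rangle = B_\beta A_\alpha|\psi\rangle = B_\beta|\psi\rangle = |\psi_\beta\rangle, \tag{19.11}$$

where we used that the eigenvalues of all the stabilizers $A_\alpha$ are +1 in the uncorrupted state $|\psi\rangle$ to get the third equality. Hence the eigenvalue of $A_\alpha$ in state $|\psi_\beta\rangle$ is +1 if $A_\alpha$ commutes with $B_\beta$. Similarly, if $A_\alpha$ anti-commutes with $B_\beta$ then

$$A_\alpha|\psi_\beta\rangle = A_\alpha B_\beta|\psi\rangle = -B_\beta A_\alpha|\psi\rangle = -B_\beta|\psi\rangle = -|\psi_\beta\rangle, \tag{19.12}$$

so the eigenvalue is −1.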

We emphasize that the syndromes must be eigenstates of all the stabilizers which means that the
stabilizers must commute with each other.
Next we will see how to determine efficiently if a stabilizer commutes or anti-commutes with the
operator which generates a corrupted syndrome out of the uncorrupted state.
For the case of the 3-qubit, bit-flip code discussed so far the stabilizers are

Z1 Z2 and Z2 Z3 , (19.13)

and the operators which generate the corrupted syndrome from the uncorrupted state are

X1 , X2 and X3 . (19.14)

As an example, we see that $X_1$ commutes with $Z_2Z_3$ because there are no sites in common, so the eigenvalue of $Z_2Z_3$ for $|\psi_1\rangle$ must be +1, which agrees with Table 19.2. On the other hand, $X_2$ has one site in common with $Z_2Z_3$, so

$$X_2Z_2Z_3 = -Z_2X_2Z_3 = -Z_2Z_3X_2, \tag{19.15}$$

and the operators anticommute, so the eigenvalue of $Z_2Z_3$ for $|\psi_2\rangle$ must be −1, which again agrees with Table 19.2.

The point is that every time we have to interchange the order of two different operators
acting on the same qubit we pick up a minus sign.

Hence it is straightforward to deduce the overall sign. Note that operators of the same type, e.g. the
Zi , always commute.
As a more complicated example, which occurs in a scheme for full error correction, consider the
stabilizer Z1 Z3 X4 X5 . For the syndrome which has been corrupted by Z4 the eigenvalue is −1, the
minus sign coming from interchanging the order of X4 and Z4 . However, for the syndrome which was
corrupted by X4 the eigenvalue is +1 since, for the qubit in common, (qubit 4), both operators are X
and so commute. As another example, for the syndrome which was corrupted by X2 the eigenvalue is
+1, because X2 and the stabilizer commute since they have no qubits in common.
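The sign-counting rule lends itself to a few lines of Python. In this sketch a Pauli product is written as a string with one character per qubit from 'I', 'X', 'Y', 'Z'; this encoding is our own choice, so 'ZIZXX' stands for $Z_1Z_3X_4X_5$. Each qubit on which both operators are non-identity and different contributes one minus sign, so an even count means they commute. The eigenvalue of the stabilizer in the corrupted syndrome then follows from Eqs. (19.11) and (19.12).

```python
def commutes(p, q):
    """True if Pauli strings p and q commute, False if they anti-commute.

    One character per qubit from 'I', 'X', 'Y', 'Z'; e.g. 'ZIZXX' is Z1 Z3 X4 X5.
    Each qubit where both are non-identity and different contributes one minus sign.
    """
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0

def eigenvalue(stabilizer, error):
    """Eigenvalue (+1 or -1) of `stabilizer` in the syndrome error|psi>, Eqs. (19.11)-(19.12)."""
    return 1 if commutes(stabilizer, error) else -1

# The examples in the text: corruptions Z4, X4 and X2 for the stabilizer Z1 Z3 X4 X5.
print([eigenvalue("ZIZXX", e) for e in ("IIIZI", "IIIXI", "IXIII")])
```

This prints `[-1, 1, 1]`, in agreement with the discussion above. Applied to the stabilizers "ZZI" and "IZZ" with errors "XII", "IXI", "IIX", it reproduces the $Z_1Z_2$ and $Z_2Z_3$ columns of Table 19.2.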
To summarize, in the stabilizer formalism we need to construct a set of Hermitian operators (the
stabilizers) which have the following properties:

1. they square to 1, (so the eigenvalues are ±1),

2. they mutually commute (so they have the same eigenstates),

3. the syndromes are eigenstates of all the stabilizers,

4. the uncorrupted syndrome has eigenvalue +1 for all stabilizers, and

5. the set of ±1 eigenvalues of the stabilizers uniquely specifies the syndrome. Whether the eigen-
value is +1 or −1 is easily determined from the commutation properties of the stabilizer with
respect to the operator which generates the corruption in the syndrome.

In Sec. 19.6 we will describe an example with full error correction which has codewords with 9
qubits and needs 8 stabilizers.

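[Figure 19.6 circuit: the control (upper) qubit starts in $|0\rangle$ and passes through $H$, the control of a control-$U$ gate, and a second $H$; the target (lower) qubit starts in $|\psi\rangle$ and is acted on by $U$; the states $|\phi_0\rangle$, $|\phi_1\rangle$, $|\phi_2\rangle$, $|\phi_3\rangle$ label successive stages of the circuit.]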

Figure 19.6: A circuit with a control-U gate in which the control (upper) qubit is surrounded by Hadamards. U is an operator with eigenvalues ±1 and corresponding eigenvectors $|\psi_+\rangle$ and $|\psi_-\rangle$. As shown in the text, if a measurement of the upper qubit gives $|0\rangle$ then the lower qubit will be in state $|\psi_+\rangle$, and if the measurement gives $|1\rangle$ then the lower qubit will be in state $|\psi_-\rangle$. The states $|\phi_i\rangle$ ($i = 0, 1, 2, 3$) are described in the text. Note that this figure is identical to Fig. 7.8 and was discussed in Chapter 7.

Next we describe the circuit which will measure the eigenvalues of the stabilizers and hence determine which syndrome has occurred. Consider the circuit in Fig. 19.6, which includes a control-U gate in which the control qubit is sandwiched between Hadamards. Here U is an operator which, like the stabilizers, has eigenvalues ±1. If the control qubit is 1, the effect on the target qubit is

$$U|\psi_+\rangle = |\psi_+\rangle, \qquad U|\psi_-\rangle = -|\psi_-\rangle, \tag{19.16}$$

where $|\psi_+\rangle$ and $|\psi_-\rangle$ are the eigenvectors with eigenvalue +1 and −1 respectively. If the control qubit is 0, the target qubit is unchanged. The initial state of the target qubit can be written as a superposition of eigenstates, i.e.

$$|\psi\rangle = \alpha_+|\psi_+\rangle + \alpha_-|\psi_-\rangle. \tag{19.17}$$
We discussed the circuit of Fig. 19.6 in Chapter 7 and found that the states |φi i, (i = 0, 1, 2, 3) are
given by Eqs. (7.21). In particular, the final state |φ3 i, before the measurement of the upper qubit, is
given by
|φ3 i = α+ |0 ψ+ i + α− |1 ψ− i . (19.18)
Hence if a measurement of the upper qubit gives |0i (which it does with probability |α+ |2 ) the lower
qubit will be in state |ψ+ i, and if the measurement gives |1i (probability is |α− |2 ) the lower qubit will
be in state |ψ− i. Hence we see that measuring the control qubit tells us which eigenstate of U the
target qubit is in.
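The claim that measuring the control qubit leaves the target in an eigenstate of U can be checked with a few lines of code. Below is a minimal pure-Python sketch of the circuit of Fig. 19.6, assuming for illustration that U = X (eigenvectors |+⟩ and |−⟩) and that α+ = 0.6, α− = 0.8; the helper names are our own and not from the text.

```python
import math

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]
U = [[0, 1], [1, 0]]           # illustrative choice U = X, eigenvalues +1 and -1
psi_plus = [S, S]              # |psi+> = |+>, eigenvalue +1
psi_minus = [S, -S]            # |psi-> = |->, eigenvalue -1

def kron(a, b):
    """Tensor product of two single-qubit vectors (control qubit on the left)."""
    return [x * y for x in a for y in b]

def apply_to_control(gate, state):
    """Apply a 2x2 gate to the control qubit of a 2-qubit state |c t>."""
    out = [0.0] * 4
    for c in range(2):
        for t in range(2):
            for c_new in range(2):
                out[2 * c_new + t] += gate[c_new][c] * state[2 * c + t]
    return out

def controlled_u(state):
    """Apply U to the target qubit only in the control = 1 sector."""
    out = list(state)
    out[2] = U[0][0] * state[2] + U[0][1] * state[3]
    out[3] = U[1][0] * state[2] + U[1][1] * state[3]
    return out

alpha_p, alpha_m = 0.6, 0.8    # amplitudes of Eq. (19.17), chosen for illustration
psi = [alpha_p * p + alpha_m * m for p, m in zip(psi_plus, psi_minus)]

phi0 = kron([1, 0], psi)               # control starts in |0>
phi1 = apply_to_control(H, phi0)
phi2 = controlled_u(phi1)
phi3 = apply_to_control(H, phi2)

# Eq. (19.18): |phi3> = alpha+ |0 psi+> + alpha- |1 psi->
expected = [a + b for a, b in zip(kron([alpha_p, 0], psi_plus),
                                  kron([0, alpha_m], psi_minus))]
prob_control_0 = phi3[0] ** 2 + phi3[1] ** 2   # amplitudes are real here
print("phi3 =", [round(a, 6) for a in phi3])
print("P(control = 0) =", round(prob_control_0, 6))
```

Replacing U by another 2 × 2 matrix with eigenvalues ±1 (and updating `psi_plus`, `psi_minus` to match) should leave the agreement with Eq. (19.18) unchanged.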

Stabilizers involve more than one codeword qubit so the gates we need will have several target
qubits. For the 3-qubit, bit-flip code, the circuit equivalent to Fig. 19.4 is shown in Fig. 19.7. We see
that the x ancilla is the control qubit for a control-Z1 Z2 gate which is sandwiched between Hadamards,
and similarly the y ancilla is the control qubit for a control-Z2 Z3 gate. Hence if x = 0 the state of the
codeword bits has Z1 Z2 = +1, whereas if x = 1 the state of the codeword bits has Z1 Z2 = −1. There
is an analogous correspondence between y and Z2 Z3 .

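[Figure 19.7 (circuit diagram): a detection stage, in which ancillas x and y, each starting in |0⟩ and sandwiched between two Hadamards, control Z1Z2 and Z2Z3 respectively on the codeword α|000⟩ + β|111⟩ (qubits 1–3); then a correction stage, in which X gates on qubits 1, 2 and 3 are conditioned on the measured values of x and y.]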

Figure 19.7: Circuit equivalent to that in Fig. 19.4 but in the stabilizer formalism. In this circuit x
measures Z1 Z2 , and y measures Z2 Z3 . In other words, if x = 0 the state of the codeword bits has
Z1 Z2 = +1, whereas if x = 1 the state of the codeword bits has Z1 Z2 = −1, with an analogous
correspondence between y and Z2 Z3 . Note that Z1 Z2 and Z2 Z3 have eigenvalues ±1 and commute
with each other.

The equivalence of the circuits in Figs. 19.4 and 19.7 can also be understood from the simpler equivalences shown in Fig. 19.8, in which the left-hand equality comes from the fact that the target and control qubits can be exchanged in a control-Z gate,³ and the right-hand equality is because HZH = X and H² = 1 (the identity).

[Figure 19.8 (circuit diagrams): two gate identities, a control-Z gate with control and target exchanged (left), and H Z H = X on a single qubit (right).]

Figure 19.8: The equalities in this figure are helpful to understand the equivalence of Figs. 19.4 and 19.7. The left-hand equality comes from the fact that the target and control qubits can be exchanged in a control-Z gate, and the right-hand equality is because HZH = X and H² = 1.
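Both identities of Fig. 19.8 are small matrix statements, so they can be confirmed directly. A minimal pure-Python check (helper names are hypothetical), using the two-qubit basis ordering |control, target⟩:

```python
import math

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
IDENTITY = [[1, 0], [0, 1]]

def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Right-hand equality of Fig. 19.8
HZH = matmul(matmul(H, Z), H)
HH = matmul(H, H)

# Left-hand equality: in the basis |00>, |01>, |10>, |11> (control first),
# control-Z is diag(1, 1, 1, -1). Conjugating by SWAP exchanges the roles of
# control and target, and the matrix is unchanged.
CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]
SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
CZ_swapped = matmul(matmul(SWAP, CZ), SWAP)

print("HZH == X:", close(HZH, X))
print("H^2 == 1:", close(HH, IDENTITY))
print("CZ symmetric under swap:", CZ_swapped == CZ)
```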

The stabilizer formalism will be convenient when devising circuits for full error correction, rather than just correcting bit flips as we have done up to now.

³ Because the only effect of the gate is to change the sign of the state if both target and control qubits are 1.

19.4 Phase Flip Code

Before discussing how to correct general errors, we will briefly mention another special case, a phase flip, which has no classical equivalent since classical bits don't have any property corresponding to phase. In this error model, with some probability p, the relative phase of |0⟩ and |1⟩ is flipped so

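$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle \;\to\; \alpha|0\rangle - \beta|1\rangle. \tag{19.19}$$

Phase flips are generated by the Z operator since

$$\begin{pmatrix}\alpha\\ \beta\end{pmatrix} \to Z\begin{pmatrix}\alpha\\ \beta\end{pmatrix} = \begin{pmatrix}\alpha\\ -\beta\end{pmatrix} \quad \text{(computational basis)}. \tag{19.20}$$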

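The phase-flip error model can be turned into the already-studied bit-flip model by transforming to the ± basis (also called the X-basis because it is the basis in which X is diagonal), where

$$|+\rangle = \frac{1}{\sqrt2}\left(|0\rangle + |1\rangle\right), \qquad |-\rangle = \frac{1}{\sqrt2}\left(|0\rangle - |1\rangle\right). \tag{19.21}$$

One transforms between the ± basis and the computational basis using Hadamards:

$$H|0\rangle = |+\rangle, \qquad H|1\rangle = |-\rangle, \tag{19.22a}$$

$$H|+\rangle = |0\rangle, \qquad H|-\rangle = |1\rangle. \tag{19.22b}$$

In the ± basis the roles of X and Z are interchanged since

$$X|0\rangle = |1\rangle,\quad X|1\rangle = |0\rangle,\quad Z|0\rangle = |0\rangle,\quad Z|1\rangle = -|1\rangle, \tag{19.23a}$$

$$Z|+\rangle = |-\rangle,\quad Z|-\rangle = |+\rangle,\quad X|+\rangle = |+\rangle,\quad X|-\rangle = -|-\rangle. \tag{19.23b}$$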
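A short pure-Python check of Eqs. (19.22) and (19.23b), sketched under the assumption that single-qubit states are plain lists of amplitudes in the computational basis. It also illustrates the point of the change of basis: a phase flip sandwiched between Hadamards acts as a bit flip.

```python
import math

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
ket0, ket1 = [1, 0], [0, 1]
plus, minus = [S, S], [S, -S]          # Eq. (19.21)

def apply(m, v):
    """Apply a 2x2 matrix to a single-qubit state vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def close(u, v, tol=1e-12):
    return all(abs(a - b) < tol for a, b in zip(u, v))

# Eq. (19.22): Hadamards convert between the two bases
assert close(apply(H, ket0), plus) and close(apply(H, ket1), minus)
assert close(apply(H, plus), ket0) and close(apply(H, minus), ket1)

# Eq. (19.23b): in the +/- basis Z acts as a "bit flip" and X as a "phase flip"
assert close(apply(Z, plus), minus) and close(apply(Z, minus), plus)
assert close(apply(X, plus), plus) and close(apply(X, minus), [-c for c in minus])

# Consequence: rotate to the +/- basis with H, suffer a phase flip Z, rotate
# back with H; the amplitudes come out swapped, i.e. a bit flip.
alpha, beta = 0.6, 0.8
state = apply(H, apply(Z, apply(H, [alpha, beta])))
print(state)
```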

Thus we shall find in Sec. 19.6 that stabilizers to detect phase errors involve X operators, as opposed
to those used to detect bit-flip errors which involve Z operators (see Fig. 19.7).

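[Figure 19.9 (circuit diagram): qubit 1, in state α|0⟩ + β|1⟩, controls X (CNOT) gates on qubits 2 and 3, both initially |0⟩; a Hadamard then acts on each of the three qubits, producing α|+++⟩ + β|−−−⟩.]

Figure 19.9: Encoding circuit for the 3-qubit phase-flip code.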

The encoding circuit for the 3-qubit phase-flip code is obtained from that for the 3-qubit bit-flip
code in Fig. 19.2 by adding Hadamards to the circuit, with the result shown in Fig. 19.9. We shall use
this circuit in Sec. 19.6 as part of the encoding circuit in Fig. 19.10 for a code (due to Shor) which
corrects general 1-qubit errors.
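As a sanity check on Fig. 19.9, here is a small state-vector simulation of the encoding circuit, written in pure Python with hypothetical helper functions (`cnot`, `hadamard`, `product`). Qubit 1 is taken as the leftmost (most significant) bit of a basis-state index, and the amplitudes α = 0.6, β = 0.8i are arbitrary.

```python
import math

N = 3  # qubit 1 is the leftmost (most significant) bit of a basis-state index

def bit(index, k):
    return (index >> (N - k)) & 1

def cnot(state, control, target):
    out = [0j] * len(state)
    for i, amp in enumerate(state):
        j = i ^ (1 << (N - target)) if bit(i, control) else i
        out[j] += amp
    return out

def hadamard(state, k):
    s = 1 / math.sqrt(2)
    mask = 1 << (N - k)
    out = [0j] * len(state)
    for i, amp in enumerate(state):
        sign = -1 if bit(i, k) else 1
        out[i & ~mask] += s * amp            # component with qubit k = 0
        out[i | mask] += s * sign * amp      # component with qubit k = 1
    return out

def product(*qubits):
    """Tensor product of single-qubit states; the first argument is qubit 1."""
    out = [1]
    for q in qubits:
        out = [a * b for a in out for b in q]
    return out

def encode_phase_flip(alpha, beta):
    state = product([alpha, beta], [1, 0], [1, 0])   # (a|0> + b|1>)|0>|0>
    state = cnot(state, 1, 2)                         # bit-flip encoding ...
    state = cnot(state, 1, 3)                         # ... gives a|000> + b|111>
    for k in (1, 2, 3):                               # then rotate to the +/- basis
        state = hadamard(state, k)
    return state

s = 1 / math.sqrt(2)
plus, minus = [s, s], [s, -s]
alpha, beta = 0.6, 0.8j
encoded = encode_phase_flip(alpha, beta)
target = [alpha * p + beta * m
          for p, m in zip(product(plus, plus, plus), product(minus, minus, minus))]
print(max(abs(a - b) for a, b in zip(encoded, target)))
```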

19.5 General Errors and the Effects of the Environment

In our discussion of errors we have so far implicitly assumed that the errors occur because of some malfunction in the circuit: the state has undergone a unitary transformation, but not exactly the right one. Another, and very important, source of error is interaction between the qubits and the environment, which is unavoidable even though quantum computer engineers work very hard to reduce it to a minimum. This can lead to errors due to a non-unitary change in the computational qubits (though the combined system of qubits plus environment undergoes unitary time development). In this section we include the effects of the environment and also consider the most general type of single-qubit error. The discussion below follows Mermin [Mer07].
Consider a single qubit |x⟩, and call the environment |e⟩. Unlike the state of the qubit, the state of the environment lies in a space of very many dimensions. Ideally |x⟩ evolves under the effects of the gates only, independent of the environment. However, interactions with the environment cannot be avoided, and these lead to a corruption of the qubit and an entangling of the qubit with the environment.
The most general form of these effects is

$$|e\rangle|0\rangle \to |e_0\rangle|0\rangle + |e_1\rangle|1\rangle, \tag{19.24a}$$

$$|e\rangle|1\rangle \to |e_2\rangle|0\rangle + |e_3\rangle|1\rangle, \tag{19.24b}$$

where |ei⟩ (i = 0, …, 3) are possible final states of the environment. The environment states are in general neither normalized nor orthogonal. However, the two states on the right-hand side of Eqs. (19.24) must be orthogonal since the time evolution of the combined qubit-environment system is unitary. In other words

$$\langle e_2|e_0\rangle + \langle e_3|e_1\rangle = 0. \tag{19.25}$$

The corruption of the computation by the environment indicated in Eqs. (19.24) is called "decoherence". It is the main source of difficulty in building a practical quantum computer.
In previous sections we have neglected entanglement with the environment. Rather, errors were assumed to occur because of mistakes made in the circuit itself. This corresponds to a special case of Eqs. (19.24) in which all the environment states are the same apart from normalization, i.e. |ei⟩ = ci|e⟩ for i = 0, …, 3.
We are interested in the case where the probability of an error is small (otherwise we would not be able to correct for it), i.e.

$$\langle e|e\rangle = 1,\quad \langle e_0|e_0\rangle \simeq 1,\quad \langle e_3|e_3\rangle \simeq 1,\quad \langle e_1|e_1\rangle \ll 1,\quad \langle e_2|e_2\rangle \ll 1. \tag{19.26}$$
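Equations (19.24) can be combined into one as

$$|e\rangle|x\rangle \to \left(\frac{|e_0\rangle+|e_3\rangle}{2}\,\mathbb{1} + \frac{|e_0\rangle-|e_3\rangle}{2}\,Z + \frac{|e_2\rangle+|e_1\rangle}{2}\,X + \frac{|e_2\rangle-|e_1\rangle}{2}\,(iY)\right)|x\rangle, \tag{19.27}$$

where x = 0 or 1 and, as usual,⁴

$$Z = \begin{pmatrix}1&0\\0&-1\end{pmatrix},\quad X = \begin{pmatrix}0&1\\1&0\end{pmatrix},\quad iY = ZX = \begin{pmatrix}0&1\\-1&0\end{pmatrix},\quad \mathbb{1} = \begin{pmatrix}1&0\\0&1\end{pmatrix}. \tag{19.28}$$

Please evaluate Eq. (19.27) separately for x = 0 and 1 to verify that it is equivalent to Eqs. (19.24).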
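The suggested check of Eq. (19.27) against Eqs. (19.24) can also be done numerically. In the sketch below the environment vectors |e0⟩, …, |e3⟩ are random vectors in a small, hypothetical 5-dimensional environment space; since the identity is purely algebraic, they need not satisfy the unitarity constraint (19.25). Names are illustrative only.

```python
import random

random.seed(1)
DIM_ENV = 5   # hypothetical size of a truncated environment space

def random_env_state():
    """An arbitrary (unnormalized) environment vector."""
    return [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(DIM_ENV)]

def lin(u, v, cu, cv):
    """Return cu*u + cv*v for environment vectors u and v."""
    return [cu * a + cv * b for a, b in zip(u, v)]

e0, e1, e2, e3 = (random_env_state() for _ in range(4))

ONE = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]
X = [[0, 1], [1, 0]]
iY = [[0, 1], [-1, 0]]          # iY = ZX, Eq. (19.28)

# (operator, environment coefficient) pairs read off from Eq. (19.27)
terms = [
    (ONE, lin(e0, e3, 0.5, 0.5)),
    (Z,   lin(e0, e3, 0.5, -0.5)),
    (X,   lin(e2, e1, 0.5, 0.5)),
    (iY,  lin(e2, e1, 0.5, -0.5)),
]

def rhs_19_27(x):
    """For input |e>|x>, map each |x'> to the environment vector multiplying it."""
    out = {0: [0j] * DIM_ENV, 1: [0j] * DIM_ENV}
    for op, env in terms:
        for xp in (0, 1):
            out[xp] = lin(out[xp], env, 1, op[xp][x])   # <x'|op|x> times env
    return out

# Eqs. (19.24): |e>|0> -> |e0>|0> + |e1>|1>,  |e>|1> -> |e2>|0> + |e3>|1>
lhs_19_24 = {0: {0: e0, 1: e1}, 1: {0: e2, 1: e3}}
```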
The combinations of environment states in Eq. (19.27) are themselves just other (unnormalized) environment states, so we can rename them and write

$$|e\rangle|x\rangle \to \big(|d\rangle\mathbb{1} + |a\rangle X + |b\rangle(iY) + |c\rangle Z\big)|x\rangle. \tag{19.29}$$

Equation (19.29) applies to both x = 0 and x = 1. Since time evolution of the combined qubit-environment system follows quantum mechanics and so is unitary and linear, it also applies to a linear superposition |ψ⟩ = α|0⟩ + β|1⟩, so

$$|e\rangle|\psi\rangle \to \big(|d\rangle\mathbb{1} + |a\rangle X + |b\rangle(iY) + |c\rangle Z\big)|\psi\rangle. \tag{19.30}$$

⁴ I prefer to write equations like (19.27) in terms of iY (= ZX) rather than Y to avoid having explicitly complex coefficients in the matrices. Many texts on quantum computing write ZX rather than iY. Note that iY (= ZX) is not Hermitian (though Y is), but we do not need the Hermitian property here. What we do need is that iY, like X, Y and Z, is unitary.

We see that the effects of the environment on the uncorrupted state of a single qubit can be
expressed entirely in terms of the Pauli operators, X, (iY ) and Z. These are characterized as follows:

• X corresponds to a bit-flip error,

• Z corresponds to a phase-flip error, and

• iY (= ZX) corresponds to combined bit-flip and phase-flip errors.

Intuitively, the reason that the new state can be expressed in terms of the Pauli operators and the identity is that any 2 × 2 matrix can be written as a linear combination of these operators, see Eq. (2.25).
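The decomposition can be made explicit. Because 1, X, iY and Z are real and mutually orthogonal under the trace inner product tr(Bᵀ M), with tr(Bᵀ B) = 2 for each of them, every coefficient is tr(Bᵀ M)/2. A minimal pure-Python sketch (function names are our own):

```python
# Basis of 2x2 matrices used in the text: 1, X, iY (= ZX) and Z, Eq. (19.28)
BASIS = {
    "1":  [[1, 0], [0, 1]],
    "X":  [[0, 1], [1, 0]],
    "iY": [[0, 1], [-1, 0]],
    "Z":  [[1, 0], [0, -1]],
}

def decompose(m):
    """Coefficients c_B with m = sum_B c_B * B.

    The four (real) basis matrices are orthogonal under tr(B^T M), and each
    has tr(B^T B) = 2, so c_B = tr(B^T M) / 2.
    """
    return {name: sum(b[i][j] * m[i][j] for i in range(2) for j in range(2)) / 2
            for name, b in BASIS.items()}

def rebuild(coeffs):
    """Inverse of decompose: form sum_B c_B * B."""
    return [[sum(coeffs[n] * BASIS[n][i][j] for n in BASIS) for j in range(2)]
            for i in range(2)]

# The "reset to |0>" matrix (1 1; 0 0), used in Eq. (19.34), has all four
# coefficients equal to 1/2, in agreement with Eq. (19.35).
reset = [[1, 1], [0, 0]]
reset_coeffs = decompose(reset)

# An arbitrary complex matrix is reproduced as well.
m = [[0.3 + 1j, -2.0], [0.5j, 4.0 - 0.25j]]
m_rebuilt = rebuild(decompose(m))
print(reset_coeffs)
```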
We remind the reader that the environment states are not normalized, and so, in the important case where the initial state is close to the final state, we have

$$\langle a|a\rangle \ll 1,\qquad \langle b|b\rangle \ll 1,\qquad \langle c|c\rangle \ll 1 \tag{19.31}$$

in Eq. (19.30).
We now extend this discussion to the situation where we have expanded a single qubit into an n-qubit codeword, which we write as |ψ⟩n. In this course we consider only how to correct single-qubit errors, so we neglect the possibility that two or more of the qubits in the codeword are corrupted. From Eq. (19.30), we see that all single-qubit errors are incorporated by

$$|e\rangle|\psi\rangle_n \to \Big(|d\rangle\mathbb{1} + \sum_{i=1}^{n}\big(|a_i\rangle X_i + |b_i\rangle\, iY_i + |c_i\rangle Z_i\big)\Big)|\psi\rangle_n. \tag{19.32}$$

Based on Eq. (19.32), single-qubit quantum error correction involves the following steps:

• Expand the logical qubit to an n-qubit codeword.

• Project the possibly corrupted state to one of the 3n + 1 states (syndromes) on the right hand
side of Eq. (19.32), with information indicating which one.

• Correct, if necessary, the 1-qubit error by acting with the appropriate Xk , Yk or Zk .

Please note the following important points:

1. The whole continuum of errors can be represented by a finite set of discrete errors. Errors emerge continuously from the uncorrupted state by increasing from zero the size of the terms in Eq. (19.32) involving Xi, Yi and Zi, which are characterized by √⟨ai|ai⟩, √⟨bi|bi⟩ and √⟨ci|ci⟩ respectively. However, the projection is always to one of the 3n + 1 discrete states. If the amplitude of the error is small then, with high probability, the projection will be to the uncorrupted state (which needs no correction), but with small but non-zero probability the projection will be to one of the 3n corrupted states (which do need correction).

2. An arbitrary error on a single qubit will be corrected, not just bit-flip (X), phase-flip (Z), or combined bit- and phase-flip (iY) errors, but also any combination of them. For example, suppose that the k-th qubit has been reinitialized to zero, i.e.

$$|0_k\rangle \to |0_k\rangle, \qquad |1_k\rangle \to |0_k\rangle. \tag{19.33}$$

The matrix which accomplishes this transformation is⁵

$$\begin{pmatrix}1&1\\0&0\end{pmatrix}, \tag{19.34}$$

which can be written as

$$\frac{\mathbb{1} + X_k + iY_k + Z_k}{2}. \tag{19.35}$$

Hence the state of the codeword qubits and environment has been transformed as follows:

$$|e\rangle|\psi\rangle_n \to |e'\rangle|\psi'\rangle_n = |e'\rangle\,\frac{1}{2}\left(\mathbb{1} + X_k + iY_k + Z_k\right)|\psi\rangle_n. \tag{19.36}$$

The codeword qubits are now in a linear combination of four syndromes, corresponding to the four terms in this equation. A general syndrome-measuring circuit, such as the one for the Shor 9-qubit code discussed in the next section, will detect these syndromes and obtain a unique set of values for the ancilla qubits for each of them. Hence, even for this non-unitary error, measuring the ancillas will project onto one of the syndromes, which can then be corrected if necessary.

3. A full discussion of how the entanglement of qubits with the environment generates errors and
how they can subsequently be corrected, requires a detailed treatment of the density matrix, see
Chapter 5. This advanced material is discussed in Refs. [NC00, RP14] but is beyond the scope
of the present course.

19.6 Correcting Arbitrary Errors: the 9-qubit Shor code

In this section we discuss a code, due to Peter Shor [Sho95], for correcting arbitrary 1-qubit errors. This code needs codewords of nine qubits to represent one logical qubit. It is not the most efficient code (there are others which use smaller codewords and so don't need as many physical qubits), but the structure of Shor's code follows quite naturally from the discussion we have already given of 1-qubit bit-flip and 1-qubit phase-flip errors, so we will discuss it here.
Shor's code combines the bit-flip (X) and phase-flip (Z) codes, and it turns out that it then automatically corrects combined bit-flip, phase-flip (iY) errors as well. As discussed in the previous section, it then also corrects arbitrary 1-qubit errors.
We first encode for phase-flip errors:

$$|0\rangle \to |{+}{+}{+}\rangle, \qquad |1\rangle \to |{-}{-}{-}\rangle, \tag{19.37}$$

and then encode for bit-flip errors:

$$|+\rangle = \frac{1}{\sqrt2}\left(|0\rangle + |1\rangle\right) \to \frac{1}{\sqrt2}\left(|000\rangle + |111\rangle\right), \qquad |-\rangle = \frac{1}{\sqrt2}\left(|0\rangle - |1\rangle\right) \to \frac{1}{\sqrt2}\left(|000\rangle - |111\rangle\right). \tag{19.38}$$

The final result is the 9-qubit encoding

$$|0\rangle \to |\bar 0\rangle = \frac{1}{2^{3/2}}\left(|000\rangle + |111\rangle\right)\left(|000\rangle + |111\rangle\right)\left(|000\rangle + |111\rangle\right), \tag{19.39a}$$

$$|1\rangle \to |\bar 1\rangle = \frac{1}{2^{3/2}}\left(|000\rangle - |111\rangle\right)\left(|000\rangle - |111\rangle\right)\left(|000\rangle - |111\rangle\right). \tag{19.39b}$$

⁵ The reader will notice that the transformation in Eq. (19.34), which is a linear combination of 1, X, iY and Z on a single qubit, is not unitary. The evolution of an isolated (closed) system is unitary; however, qubits are coupled to the environment. If we consider a system coupled to the environment (called an open system), subject the combined system+environment to a unitary transformation, and finally consider the behavior of just the system by tracing out the environment, the resulting transformation of the system is not necessarily unitary [NC00, RP14].

These two equations can be combined as

$$|x\rangle \to |\bar x\rangle = \frac{1}{2^{3/2}}\left(|000\rangle + (-1)^x|111\rangle\right)\left(|000\rangle + (-1)^x|111\rangle\right)\left(|000\rangle + (-1)^x|111\rangle\right), \tag{19.40}$$

or more concisely as

$$|\bar x\rangle = \frac{1}{2^{3/2}}\left(|000\rangle + (-1)^x|111\rangle\right)^{\otimes 3}. \tag{19.41}$$

Such a code is called a concatenated code. The circuit to achieve this encoding is obtained by concatenating the phase-flip and the bit-flip encodings, as shown in Fig. 19.10. Note the labeling of the qubits: the qubits in each of the three blocks in Eq. (19.39) have labels 123, 456 and 789.

[Figure 19.10 (circuit diagram): qubit 1 starts in |x⟩ and qubits 2–9 in |0⟩. Qubit 1 first controls X gates on qubits 4 and 7; Hadamards then act on qubits 1, 4 and 7; finally qubit 1 controls X gates on qubits 2 and 3, qubit 4 on qubits 5 and 6, and qubit 7 on qubits 8 and 9. The states |φ0⟩, |φ1⟩ and |φ2⟩ are marked after these three stages.]

Figure 19.10: Encoding for the Shor 9-qubit code. If the initial state at the top left, |x⟩, is a computational basis state, |0⟩ or |1⟩, then |φ0⟩ = |xxx⟩ and |φ1⟩ = 2^(−3/2)(|0⟩ + (−1)^x|1⟩)(|0⟩ + (−1)^x|1⟩)(|0⟩ + (−1)^x|1⟩), since H|x⟩ = 2^(−1/2)(|0⟩ + (−1)^x|1⟩). By comparison with Fig. 19.1, we see that |φ2⟩ = |x̄⟩ given in Eq. (19.40). Hence, if the initial state at the top left is a linear combination α|0⟩ + β|1⟩ then, by linearity, the final state at the right is α|0̄⟩ + β|1̄⟩. The numbers at the right are the labels of the nine qubits. Note that this circuit is a concatenation of the encoding circuit for phase flips shown in Fig. 19.9, and that for bit flips in Fig. 19.1.
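The following pure-Python sketch simulates the circuit of Fig. 19.10 on a 512-component state vector and compares the output with Eq. (19.41). The gate ordering (CNOTs from qubit 1 to qubits 4 and 7, Hadamards on qubits 1, 4 and 7, then CNOTs within each block) is read off from the caption; the helper names and the amplitudes α = 0.6, β = 0.8i are illustrative assumptions.

```python
import math

N = 9  # qubit 1 is the leftmost (most significant) bit of a basis-state index

def bit(index, k):
    return (index >> (N - k)) & 1

def cnot(state, control, target):
    out = [0j] * len(state)
    for i, amp in enumerate(state):
        out[i ^ (1 << (N - target)) if bit(i, control) else i] += amp
    return out

def hadamard(state, k):
    s, mask = 1 / math.sqrt(2), 1 << (N - k)
    out = [0j] * len(state)
    for i, amp in enumerate(state):
        out[i & ~mask] += s * amp                       # qubit k -> 0 component
        out[i | mask] += (-s if bit(i, k) else s) * amp  # qubit k -> 1 component
    return out

def shor_encode(alpha, beta):
    """Fig. 19.10: phase-flip encoding on qubits 1, 4, 7, then bit-flip encoding."""
    state = [0j] * 2 ** N
    state[0], state[1 << (N - 1)] = alpha, beta      # (a|0> + b|1>) |00000000>
    for t in (4, 7):
        state = cnot(state, 1, t)                     # |phi0>: |xxx> on qubits 1, 4, 7
    for k in (1, 4, 7):
        state = hadamard(state, k)                    # |phi1>
    for c, targets in ((1, (2, 3)), (4, (5, 6)), (7, (8, 9))):
        for t in targets:
            state = cnot(state, c, t)                 # |phi2>
    return state

def codeword(x):
    """Eq. (19.41): 2^(-3/2) (|000> + (-1)^x |111>) tensored three times."""
    s = 1 / math.sqrt(2)
    block = [0.0] * 8
    block[0], block[7] = s, s * (-1) ** x
    out = [1.0]
    for _ in range(3):
        out = [a * b for a in out for b in block]
    return out

alpha, beta = 0.6, 0.8j
encoded = shor_encode(alpha, beta)
expected = [alpha * a + beta * b for a, b in zip(codeword(0), codeword(1))]
print(max(abs(a - b) for a, b in zip(encoded, expected)))
```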

The form of the 1-qubit corruption in Eq. (19.32) simplifies a little here because, if |ψ⟩ is a linear combination of the codeword states in Eq. (19.39), then

$$Z_1|\psi\rangle = Z_2|\psi\rangle = Z_3|\psi\rangle, \tag{19.42a}$$

$$Z_4|\psi\rangle = Z_5|\psi\rangle = Z_6|\psi\rangle, \tag{19.42b}$$

$$Z_7|\psi\rangle = Z_8|\psi\rangle = Z_9|\psi\rangle. \tag{19.42c}$$

The reason is that, for example, changing the first of the + signs in Eq. (19.39a) into a − sign, and the first − sign in Eq. (19.39b) into a + sign, can be accomplished by acting with either Z1, Z2 or Z3. Hence, the general form of a 1-qubit corruption contains only 22 independent syndromes rather than 28 = (3 × 9) + 1:

$$|e\rangle|\psi\rangle \to \left(|d\rangle\mathbb{1} + |c\rangle Z_1 + |c'\rangle Z_4 + |c''\rangle Z_7 + \sum_{i=1}^{9}|a_i\rangle X_i + \sum_{i=1}^{9}|b_i\rangle\, iY_i\right)|\psi\rangle. \tag{19.43}$$

[Figure 19.11 (circuit diagram): eight ancillas x1–x8 (top), each prepared in |0⟩ and sandwiched between two Hadamards, control the stabilizer gates acting on the nine codeword qubits 1–9 (bottom): Z gates for M1–M6 and X gates for M7 and M8.]

Figure 19.11: A circuit to measure the error syndrome for the Shor 9-qubit code. The nine codeword qubits are at the bottom and the eight ancillary qubits at the top. The ancillary qubits determine the values of the eight mutually commuting stabilizers in Eq. (19.44), M1 = Z1Z2, M2 = Z2Z3, M3 = Z4Z5, M4 = Z5Z6, M5 = Z7Z8, M6 = Z8Z9, M7 = X1X2X3X4X5X6 and M8 = X4X5X6X7X8X9. The nine codeword qubits can be conveniently grouped into three groups of three as indicated. The measured value of the i-th ancilla xi (= 0 or 1) is related to the value of the corresponding stabilizer Mi by Mi = (−1)^xi. The measured values of the eight xi (or equivalently the Mi) determine which syndrome in Eq. (19.43) has been projected out by the measurement of the ancillas, as discussed in the text and Table 19.3. If one of the corrupted syndromes is found, it can be corrected back to the uncorrupted state by acting with the appropriate Xi, Yi or Zi.

The eight stabilizers which we use to diagnose the error are

$$\begin{aligned} &M_1 = Z_1Z_2,\quad M_2 = Z_2Z_3,\quad M_3 = Z_4Z_5,\quad M_4 = Z_5Z_6,\quad M_5 = Z_7Z_8,\quad M_6 = Z_8Z_9,\\ &M_7 = X_1X_2X_3X_4X_5X_6,\quad M_8 = X_4X_5X_6X_7X_8X_9. \end{aligned} \tag{19.44}$$

Note that the nine qubits can conveniently be grouped into three blocks of three, containing qubits 123, 456 and 789 respectively. M1 and M2 act entirely on the first block, and do so in the same way as the stabilizers of the 3-qubit bit-flip code shown in Fig. 19.7. Similarly M3 and M4 act on the second block, and M5 and M6 act on the third block. M7 acts on all qubits in blocks 1 and 2, while M8 acts on all qubits in blocks 2 and 3.
The circuit for determining the syndrome eigenvalues is shown in Fig. 19.11.
We will now show that the Mi have the desired properties:

• They all square to unity (since each of the Z’s and X’s square to unity and the X’s commute
amongst each other as do the Z’s). Hence their eigenvalues are ±1.

• They mutually commute. The six Z-stabilizers trivially commute with each other, as do the two X-stabilizers. Comparing the indices on the Z-stabilizers with those on the X-stabilizers, one sees that either they have none in common, in which case the X-stabilizer and Z-stabilizer trivially commute, or they have two in common, in which case there are two minus signs when one pulls one of the stabilizers through the other, so the overall sign is positive and again the X-stabilizer and the Z-stabilizer commute.

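• The eigenvalue of the uncorrupted codewords |0̄⟩ and |1̄⟩ is +1 for all stabilizers.
This is trivially seen for M1–M6, which involve pairs of Z operators, since, for each pair, both qubits are 0 or both are 1 in the codewords. Note that the pairs lie entirely within the blocks of three adjacent qubits in Eq. (19.39), see Fig. 19.10.
Next consider M7 and M8, which involve a product of six X operators, each spanning two of the three blocks shown in Fig. 19.10. For example, M7 is a product of the X operators for the qubits in the first two blocks. We have

$$\begin{aligned} M_7|\bar 0\rangle &= X_1X_2X_3X_4X_5X_6\,\frac{1}{2^{3/2}}\left(|000\rangle + |111\rangle\right)\left(|000\rangle + |111\rangle\right)\left(|000\rangle + |111\rangle\right)\\ &= \frac{1}{2^{3/2}}\left(|111\rangle + |000\rangle\right)\left(|111\rangle + |000\rangle\right)\left(|000\rangle + |111\rangle\right)\\ &= \frac{1}{2^{3/2}}\left(|000\rangle + |111\rangle\right)\left(|000\rangle + |111\rangle\right)\left(|000\rangle + |111\rangle\right)\\ &= |\bar 0\rangle, \end{aligned} \tag{19.45}$$

and

$$\begin{aligned} M_7|\bar 1\rangle &= X_1X_2X_3X_4X_5X_6\,\frac{1}{2^{3/2}}\left(|000\rangle - |111\rangle\right)\left(|000\rangle - |111\rangle\right)\left(|000\rangle - |111\rangle\right)\\ &= \frac{1}{2^{3/2}}\left(|111\rangle - |000\rangle\right)\left(|111\rangle - |000\rangle\right)\left(|000\rangle - |111\rangle\right)\\ &= \frac{1}{2^{3/2}}\left(|000\rangle - |111\rangle\right)\left(|000\rangle - |111\rangle\right)\left(|000\rangle - |111\rangle\right)\\ &= |\bar 1\rangle, \end{aligned} \tag{19.46}$$

where, in going to the third line of Eq. (19.46), each of the first two blocks changes sign, giving an overall factor (−1)² = 1. Hence M7 has eigenvalue +1 for both uncorrupted codewords. The argument for M8 goes along the same lines.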

• The ±1 eigenvalues of the stabilizers allow one to determine which of the 22 syndromes in
Eq. (19.43) the system has projected on to. Recalling the discussion in Sec. 19.3, the eigenvalue
is +1 if the stabilizer commutes with the operator which caused the 1-qubit corruption, and is −1
if it anti-commutes. Each time two different operators on the same qubit are pulled through each
other to perform the commutation one generates a minus sign. The operators which generate the
corruption are the 21 Xi , Yi and Zi in Eq. (19.43). A table of the eigenvalues of the stabilizers
for all 22 syndromes is given in Table 19.3.
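The first three properties above can be spot-checked numerically. The sketch below (pure Python, hypothetical helper names) applies the stabilizers of Eq. (19.44) to 512-component state vectors. The operator identities Mi² = 1 and MiMj = MjMi are tested on a random 9-qubit state, which exposes any difference between two operators with probability 1, and the +1 eigenvalue is tested on the codewords of Eq. (19.39).

```python
import itertools
import math
import random

N = 9  # qubit 1 is the leftmost (most significant) bit of a basis-state index

def apply_pauli(pauli, state):
    """Apply a product of single-qubit 'X' and 'Z' operators, e.g. {1: 'Z', 2: 'Z'}."""
    out = [0j] * len(state)
    for i, amp in enumerate(state):
        j, phase = i, 1
        for k, p in pauli.items():
            mask = 1 << (N - k)
            if p == 'X':
                j ^= mask
            elif p == 'Z' and i & mask:
                phase = -phase
        out[j] += phase * amp
    return out

def codeword(x):
    """Eq. (19.41) for x = 0 or 1."""
    s = 1 / math.sqrt(2)
    block = [0.0] * 8
    block[0], block[7] = s, s * (-1) ** x
    out = [1.0]
    for _ in range(3):
        out = [a * b for a in out for b in block]
    return out

def close(u, v, tol=1e-9):
    return all(abs(a - b) < tol for a, b in zip(u, v))

# Eq. (19.44), in the order M1 ... M8
STABILIZERS = [{1: 'Z', 2: 'Z'}, {2: 'Z', 3: 'Z'}, {4: 'Z', 5: 'Z'},
               {5: 'Z', 6: 'Z'}, {7: 'Z', 8: 'Z'}, {8: 'Z', 9: 'Z'},
               {k: 'X' for k in range(1, 7)}, {k: 'X' for k in range(4, 10)}]

random.seed(0)
r = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(2 ** N)]

squares_to_one = all(close(apply_pauli(M, apply_pauli(M, r)), r) for M in STABILIZERS)
all_commute = all(close(apply_pauli(A, apply_pauli(B, r)), apply_pauli(B, apply_pauli(A, r)))
                  for A, B in itertools.combinations(STABILIZERS, 2))
plus_one = all(close(apply_pauli(M, c), c)
               for M in STABILIZERS for c in (codeword(0), codeword(1)))
print(squares_to_one, all_commute, plus_one)
```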
Let’s make sure that we understand how the syndrome-detection circuit in Fig. 19.11 works. Firstly
we remind the reader that if the measurement of an auxiliary qubit, xi say, is 0, then the value of the
corresponding stabilizer Mi is +1, while if the measurement is 1, then the value of Mi is −1. Thus we
can say that xi measures Mi , see the discussion of Fig. 19.6 on page 183. Next we discuss how each of
the stabilizers works.

Syndrome M1 M2 M3 M4 M5 M6 M7 M8
1 + + + + + + + +
X1 − + + + + + + +
X2 − − + + + + + +
X3 + − + + + + + +
X4 + + − + + + + +
X5 + + − − + + + +
X6 + + + − + + + +
X7 + + + + − + + +
X8 + + + + − − + +
X9 + + + + + − + +
Y1 − + + + + + − +
Y2 − − + + + + − +
Y3 + − + + + + − +
Y4 + + − + + + − −
Y5 + + − − + + − −
Y6 + + + − + + − −
Y7 + + + + − + + −
Y8 + + + + − − + −
Y9 + + + + + − + −
Z1 (= Z2 = Z3 ) + + + + + + − +
Z4 (= Z5 = Z6 ) + + + + + + − −
Z7 (= Z8 = Z9 ) + + + + + + + −

Table 19.3: The eigenvalues of the 8 stabilizers defined in Eq. (19.44) for the 22 syndromes of Shor’s
9-qubit error correcting code. The left column indicates which Pauli operator generates the syndrome
from the uncorrupted state. A + sign indicates eigenvalue +1 and a − sign indicates eigenvalue −1.
Each stabilizer Mi is measured by an ancilla qubit xi , see Fig. 19.11, such that if Mi = +1 then xi = 0
and if Mi = −1 then xi = 1. An essential feature is that each of the 22 rows, i.e. syndromes, has a
unique pattern of + and − signs. Hence the measured values of the xi indicate which syndrome has
been projected out by the measurement. If this is one of the corrupted syndromes, the set of xi indicate
which Pauli operator generated the corruption, and the syndrome is then corrected by applying the
same Pauli operator. This works because the Pauli operators square to the identity.
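Table 19.3 can be regenerated mechanically from the rule used in the text: a single-qubit error anticommutes with a stabilizer exactly when, on their common qubit, the two Pauli operators differ. A minimal pure-Python sketch (names are our own); Z2, Z3 and so on are omitted because, by Eq. (19.42), they give the same syndromes as Z1, Z4 and Z7.

```python
# Eq. (19.44), in the order M1 ... M8
STABILIZERS = [{1: 'Z', 2: 'Z'}, {2: 'Z', 3: 'Z'}, {4: 'Z', 5: 'Z'},
               {5: 'Z', 6: 'Z'}, {7: 'Z', 8: 'Z'}, {8: 'Z', 9: 'Z'},
               {k: 'X' for k in range(1, 7)}, {k: 'X' for k in range(4, 10)}]

def sign(error, stabilizer):
    """'+' if the error commutes with the stabilizer, '-' if it anticommutes.

    Two single-qubit Paulis on the same qubit anticommute iff they differ,
    so we count the qubits on which the two operators clash.
    """
    clashes = sum(1 for k, p in error.items() if k in stabilizer and stabilizer[k] != p)
    return '-' if clashes % 2 else '+'

# The 22 syndromes of Eq. (19.43)
errors = [('1', {})]
errors += [(f'X{k}', {k: 'X'}) for k in range(1, 10)]
errors += [(f'Y{k}', {k: 'Y'}) for k in range(1, 10)]
errors += [(f'Z{k}', {k: 'Z'}) for k in (1, 4, 7)]

table = {name: ''.join(sign(err, M) for M in STABILIZERS) for name, err in errors}
for name, row in table.items():
    print(f'{name:>3}  ' + ' '.join(row))
```

The final check of interest is that all 22 rows are distinct, which is what allows the ancilla measurement to identify the syndrome.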

• We consider first M1 –M6 , the stabilizers involving Z operators.

The ancilla qubits x1 and x2 measure M1 = Z1 Z2 and M2 = Z2 Z3 respectively, and so detect a
bit-flip error in the first group of three qubits in the 9-qubit encoding of Eq. (19.39), in exactly
the same way as for the 3-qubit, bit-flip code shown in Fig. 19.7. Similarly x3 and x4 detect a
bit-flip error in the second group of three qubits (qubits 4–6), and x5 and x6 detect a bit-flip
error in the third group of three qubits (qubits 7–9).

• Next we consider M7 and M8 , the stabilizers involving X operators.

The ancilla x7 measures M7 = X1 X2 X3 X4 X5 X6 and the ancilla x8 measures M8 = X4 X5 X6 X7 X8 X9 .
These detect phase flips. M7 detects a phase flip in the first two groups of three qubits (qubits
1–6) while M8 detects a phase flip in the second and third groups of three qubits (qubits 4–9).

We now illustrate in more detail how Table 19.3 was obtained by working through a few cases.
(Eigenvalues are taken to be +1 unless otherwise stated.)

(a) Z2 : Clearly Z2 commutes with all the Z-stabilizers. It anticommutes with M7 (because it has
one qubit in common and X and Z anticommute) and commutes with M8 because it has no
qubits in common. Hence M7 has eigenvalue −1 while all other stabilizers have eigenvalue +1.

(b) Z4 : Both M7 and M8 have eigenvalue −1 since they have one qubit in common with Z4 (and X
and Z anticommute).

(c) X4 : Clearly X4 commutes with both X-stabilizers. It anticommutes with M3 because it has one
qubit in common (and Z and X anticommute). Hence M3 has eigenvalue −1.

(d) Y5 : We note that Y anticommutes with both X and Z so we have to consider all the stabilizers.
Y5 has a qubit in common with M3 , M4 , M7 and M8 so these stabilizers have eigenvalue −1.

Table 19.3 shows that each syndrome gives rise to a unique set of +1 and −1 eigenvalues of
the stabilizers as required. Thus, measuring the eigenvalues of the eight stabilizers in Eq. (19.44)
projects the corrupted state on to one of the 22 syndromes in Eq. (19.43), and the set of eigenvalues
determines which one it is. One then applies an appropriate unitary transformation to correct the
state if necessary. Note that the Shor code is explicitly designed to detect and correct bit-flip (X)
and phase-flip (Z) errors, but then automatically detects and corrects combined bit-flip and phase-flip
(ZX ≡ iY ) errors.
Not only that, it also corrects arbitrary errors on a single qubit, which, as discussed in Sec. 19.5, can be expressed as linear combinations of bit-flip, phase-flip, and combined bit- and phase-flip errors. As an example consider the situation mentioned in Eq. (19.36) in Sec. 19.5, in which a qubit has been reset to |0⟩. This is an example of a non-unitary⁶ operation on the qubit. Let's take it to be qubit 1, and indicate the codeword qubits by putting the first on the left and the last on the right (we will use the same ordering below for the ancilla qubits). In other words

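$$|\psi\rangle = \alpha|\bar 0\rangle + \beta|\bar 1\rangle \tag{19.47}$$

has been transformed to

$$|\psi'\rangle = \frac{\alpha}{2^{3/2}}\left(|000\rangle + |011\rangle\right)\left(|000\rangle + |111\rangle\right)\left(|000\rangle + |111\rangle\right) + \frac{\beta}{2^{3/2}}\left(|000\rangle - |011\rangle\right)\left(|000\rangle - |111\rangle\right)\left(|000\rangle - |111\rangle\right). \tag{19.48}$$

According to Eq. (19.36) this can be written as

$$|\psi'\rangle = \frac{1}{2}\left(\mathbb{1} + X_1 + iY_1 + Z_1\right)|\psi\rangle, \tag{19.49}$$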
2
where, dropping the overall factor 2^(−3/2) common to all terms,

$$|\psi\rangle = \alpha\left(|000\rangle + |111\rangle\right)(\cdots)_+(\cdots)_+ + \beta\left(|000\rangle - |111\rangle\right)(\cdots)_-(\cdots)_- \tag{19.50a}$$

$$X_1|\psi\rangle = \alpha\left(|100\rangle + |011\rangle\right)(\cdots)_+(\cdots)_+ + \beta\left(|100\rangle - |011\rangle\right)(\cdots)_-(\cdots)_- \tag{19.50b}$$

$$iY_1|\psi\rangle = \alpha\left(-|100\rangle + |011\rangle\right)(\cdots)_+(\cdots)_+ + \beta\left(-|100\rangle - |011\rangle\right)(\cdots)_-(\cdots)_- \tag{19.50c}$$

$$Z_1|\psi\rangle = \alpha\left(|000\rangle - |111\rangle\right)(\cdots)_+(\cdots)_+ + \beta\left(|000\rangle + |111\rangle\right)(\cdots)_-(\cdots)_-, \tag{19.50d}$$

in which

$$(\cdots)_+ \equiv \left(|000\rangle + |111\rangle\right), \qquad (\cdots)_- \equiv \left(|000\rangle - |111\rangle\right). \tag{19.51}$$

One can verify that adding Eqs. (19.50) (and dividing by 2 according to Eq. (19.49)) does indeed give Eq. (19.48).

⁶ In footnote 5 we noted that, while a transformation of the combined system+environment is unitary, if the system is coupled to the environment, then a unitary operation applied to system+environment followed by a trace over the environment leaves the system in a new state which is not, in general, related by a unitary transformation to its initial state.
Equation (19.49) is the input to the syndrome measurement circuit. According to Table 19.3, after the syndrome measurement circuit in Fig. 19.11 has acted, the state of the system is

$$\frac{1}{2}\Big[\,|\psi\rangle|00000000\rangle_A + X_1|\psi\rangle|10000000\rangle_A + iY_1|\psi\rangle|10000010\rangle_A + Z_1|\psi\rangle|00000010\rangle_A\Big], \tag{19.52}$$

where the subscript A denotes the ancillas, which are ordered from 1 on the left to 8 on the right. Measuring the ancillas will project the computational qubits onto one of the four syndromes, |ψ⟩, X1|ψ⟩, iY1|ψ⟩ or Z1|ψ⟩. Since the measurements of the ancillas tell us which syndrome the state has been projected onto, the computational qubits can then be corrected if necessary.
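The reset example can be followed through numerically. The sketch below (pure Python; α = 0.6, β = 0.8 and all helper names are illustrative assumptions) resets qubit 1 of an encoded state, checks Eq. (19.49), and computes for each of the four terms the ancilla bit string defined by Mi = (−1)^xi. Rather than simulating the ancillas of Fig. 19.11, it reads off each Mi as the expectation value of that stabilizer, which is exactly ±1 because every term is an eigenstate of all eight stabilizers.

```python
import math

N = 9  # qubit 1 is the leftmost (most significant) bit of a basis-state index

def apply_pauli(pauli, state):
    """Apply a product of single-qubit 'X', 'Z' or 'iY' (= ZX) operators."""
    out = [0j] * len(state)
    for i, amp in enumerate(state):
        j, phase = i, 1
        for k, p in pauli.items():
            mask = 1 << (N - k)
            b = 1 if i & mask else 0
            if p in ('X', 'iY'):
                j ^= mask
            if p == 'Z' and b:
                phase = -phase
            if p == 'iY' and not b:          # iY|0> = -|1>, iY|1> = |0>
                phase = -phase
        out[j] += phase * amp
    return out

def codeword(x):
    """Eq. (19.41) for x = 0 or 1."""
    s = 1 / math.sqrt(2)
    block = [0.0] * 8
    block[0], block[7] = s, s * (-1) ** x
    out = [1.0]
    for _ in range(3):
        out = [a * b for a in out for b in block]
    return out

def reset_qubit(state, k):
    """The non-unitary map of Eq. (19.33): |0_k> -> |0_k>, |1_k> -> |0_k>."""
    out = [0j] * len(state)
    for i, amp in enumerate(state):
        out[i & ~(1 << (N - k))] += amp
    return out

def inner(u, v):
    return sum(a.conjugate() * b for a, b in zip(u, v))

STABILIZERS = [{1: 'Z', 2: 'Z'}, {2: 'Z', 3: 'Z'}, {4: 'Z', 5: 'Z'},
               {5: 'Z', 6: 'Z'}, {7: 'Z', 8: 'Z'}, {8: 'Z', 9: 'Z'},
               {k: 'X' for k in range(1, 7)}, {k: 'X' for k in range(4, 10)}]

def ancilla_bits(state):
    """x_i = 0 if the state has M_i = +1, x_i = 1 if M_i = -1."""
    bits = ''
    for M in STABILIZERS:
        ev = inner(state, apply_pauli(M, state)).real / inner(state, state).real
        bits += '0' if ev > 0 else '1'
    return bits

alpha, beta = 0.6, 0.8
psi = [alpha * a + beta * b for a, b in zip(codeword(0), codeword(1))]
psi_reset = reset_qubit(psi, 1)                      # Eq. (19.48)

components = {name: apply_pauli(p, psi) for name, p in
              [('1', {}), ('X1', {1: 'X'}), ('iY1', {1: 'iY'}), ('Z1', {1: 'Z'})]}
# Eq. (19.49): psi' = (1/2)(1 + X1 + iY1 + Z1) psi
combined = [0.5 * sum(c[i] for c in components.values()) for i in range(2 ** N)]
syndromes = {name: ancilla_bits(c) for name, c in components.items()}
print(syndromes)
```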
Thus, Shor’s 9-qubit code, and other codes designed to correct both bit-flip and phase-flip errors,
actually correct arbitrary 1-qubit errors. I find this amazing.

19.7 Other error-correcting codes

The Shor code uses nine physical qubits to encode one logical qubit. What is the minimum number of physical qubits needed to correct all 1-qubit errors? If we encode using n qubits, the dimension of the space of states is 2ⁿ. Now the uncorrupted syndrome is a linear combination of |0̄⟩ and |1̄⟩, i.e. of two basis states. Similarly each of the corrupted syndromes is a linear combination of two basis states. Hence the 2ⁿ-dimensional space must be large enough to contain 3n + 1 mutually orthogonal two-dimensional subspaces for the syndromes (the 1 is for the uncorrupted state, and there are n possible corruptions with each of the X, iY and Z operators). Hence we need

$$2^n \ge 2(3n + 1), \tag{19.53}$$

so the smallest value is n = 5, which satisfies this condition as an equality.
There is a 5-qubit code, but it turns out to be difficult to construct the necessary gates. A more popular choice is a 7-qubit code due to Steane [Ste96]. The Shor code, which has 9-qubit codewords, is now mainly of pedagogical interest.
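The counting condition of Eq. (19.53) is easy to tabulate. The tiny Python sketch below finds the smallest n; note that it only gives a lower bound, and a code attaining it still has to be constructed (Sec. 19.7.1).

```python
def min_codeword_length():
    """Smallest n with 2**n >= 2 * (3n + 1), the condition of Eq. (19.53)."""
    n = 1
    while 2 ** n < 2 * (3 * n + 1):
        n += 1
    return n

# Dimension of the n-qubit space versus dimension needed by the 3n + 1 syndromes
for n in range(3, 10):
    print(n, 2 ** n, 2 * (3 * n + 1))
print("smallest n:", min_codeword_length())
```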

19.7.1 The 5-qubit code

We now state, without much discussion, the codewords and stabilizers for the 5-qubit code. Further
details are in Mermin [Mer07].
For the 5-qubit code we have (3 × 5) + 1 = 16 mutually orthogonal two-dimensional subspaces, i.e. 16 syndromes. There are four stabilizers and, since they each have two eigenvalues (±1), the number of distinct sets of eigenvalues is 2⁴ = 16, which is just enough to distinguish the syndromes. These stabilizers are

$$M_1 = Z_2X_3X_4Z_5, \tag{19.54a}$$

$$M_2 = Z_3X_4X_5Z_1, \tag{19.54b}$$

$$M_3 = Z_4X_5X_1Z_2, \tag{19.54c}$$

$$M_4 = Z_5X_1X_2Z_3. \tag{19.54d}$$

The circuit to measure the Mi is shown in Fig. 19.12.

[Figure 19.12 (circuit diagram): four ancillas x1–x4 (top), each prepared in |0⟩ and sandwiched between two Hadamards, control the Z and X gates of the stabilizers M1–M4 on the five codeword qubits 1–5 (bottom).]

Figure 19.12: A circuit to measure the error syndrome for the 5-qubit code. The five codeword qubits are at the bottom and the four ancillary qubits at the top. The ancillary qubits determine the values of the four mutually commuting stabilizers in Eq. (19.54), M1 = Z2X3X4Z5, M2 = Z3X4X5Z1, M3 = Z4X5X1Z2, M4 = Z5X1X2Z3.

The 5-qubit codewords are most conveniently expressed in terms of the Mi:

$$|\bar 0\rangle = \frac{1}{4}(\mathbb{1} + M_1)(\mathbb{1} + M_2)(\mathbb{1} + M_3)(\mathbb{1} + M_4)|00000\rangle, \tag{19.55a}$$

$$|\bar 1\rangle = \frac{1}{4}(\mathbb{1} + M_1)(\mathbb{1} + M_2)(\mathbb{1} + M_3)(\mathbb{1} + M_4)|11111\rangle. \tag{19.55b}$$

Note that |0̄⟩ is composed of the 16 basis states with an even number of 1's, while |1̄⟩ is composed of the 16 basis states with an odd number of 1's, so the two codewords are orthogonal. It is not completely trivial to generate these codewords; see Mermin [Mer07] for details.
Furthermore the Mi square to unity, are mutually commuting, and each has eigenvalue +1 for the uncorrupted codewords in Eq. (19.55). Each of them commutes or anti-commutes with the Xi, Yi and Zi error operators, so the 15 corrupted syndromes and the uncorrupted state are distinguished by the set of ±1 eigenvalues of the M's, as shown in Table 19.4.
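These claims can be spot-checked with a small state-vector sketch in pure Python (helper names are our own). It builds the codewords of Eq. (19.55) by applying the factors (1 + Mi) in turn, checks normalization, orthogonality, parity and the +1 eigenvalues, and recomputes the commute/anticommute pattern of each error operator with M1–M4 from the rule that two single-qubit Paulis on the same qubit anticommute exactly when they differ.

```python
import itertools

N = 5  # qubit 1 is the leftmost (most significant) bit of a basis-state index

def apply_pauli(pauli, state):
    """Apply a product of single-qubit 'X' and 'Z' operators (one per qubit)."""
    out = [0j] * len(state)
    for i, amp in enumerate(state):
        j, phase = i, 1
        for k, p in pauli.items():
            mask = 1 << (N - k)
            if p == 'X':
                j ^= mask
            elif p == 'Z' and i & mask:
                phase = -phase
        out[j] += phase * amp
    return out

# Eq. (19.54)
STABILIZERS = [{2: 'Z', 3: 'X', 4: 'X', 5: 'Z'},
               {3: 'Z', 4: 'X', 5: 'X', 1: 'Z'},
               {4: 'Z', 5: 'X', 1: 'X', 2: 'Z'},
               {5: 'Z', 1: 'X', 2: 'X', 3: 'Z'}]

def codeword(bits):
    """Eq. (19.55): (1/4)(1 + M1)(1 + M2)(1 + M3)(1 + M4) |bits>."""
    state = [0j] * 2 ** N
    state[int(bits, 2)] = 1
    for M in reversed(STABILIZERS):      # the rightmost factor acts first
        state = [a + b for a, b in zip(state, apply_pauli(M, state))]
    return [a / 4 for a in state]

def inner(u, v):
    return sum(a.conjugate() * b for a, b in zip(u, v))

def close(u, v, tol=1e-12):
    return all(abs(a - b) < tol for a, b in zip(u, v))

def sign(error, stabilizer):
    """'+' if the two Pauli products commute, '-' if they anticommute."""
    clashes = sum(1 for k, p in error.items() if k in stabilizer and stabilizer[k] != p)
    return '-' if clashes % 2 else '+'

zero_L, one_L = codeword('00000'), codeword('11111')
stabilizers_commute = all(sign(A, B) == '+'
                          for A, B in itertools.combinations(STABILIZERS, 2))

errors = [('1', {})] + [(f'{p}{k}', {k: p}) for p in 'XYZ' for k in range(1, N + 1)]
table = {name: ''.join(sign(e, M) for M in STABILIZERS) for name, e in errors}
for name, row in table.items():
    print(f'{name:>2}  ' + ' '.join(row))
```

All 16 rows of `table` come out distinct, which is the property the syndrome measurement relies on.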

19.7.2 The Steane 7-qubit code

Next I describe briefly the 7-qubit Steane code.
There are 6 stabilizers, which are

$$\begin{aligned} M_1 &= X_1X_5X_6X_7, & N_1 &= Z_1Z_5Z_6Z_7,\\ M_2 &= X_2X_4X_6X_7, & N_2 &= Z_2Z_4Z_6Z_7,\\ M_3 &= X_3X_4X_5X_7, & N_3 &= Z_3Z_4Z_5Z_7. \end{aligned} \tag{19.56}$$

The circuit to detect errors is shown in Fig. 19.13. The 7-qubit codewords are given by

$$\begin{aligned} |\bar 0\rangle &= \frac{1}{\sqrt 8}(\mathbb{1} + M_1)(\mathbb{1} + M_2)(\mathbb{1} + M_3)|0\rangle_7,\\ |\bar 1\rangle &= \frac{1}{\sqrt 8}(\mathbb{1} + M_1)(\mathbb{1} + M_2)(\mathbb{1} + M_3)\bar X|0\rangle_7, \end{aligned} \tag{19.57}$$

Syndrome M1 = Z2 X3 X4 Z5 M2 = Z3 X4 X5 Z1 M3 = Z4 X5 X1 Z2 M4 = Z5 X1 X2 Z3
1 + + + +
X1 + − + +
X2 − + − +
X3 + − + −
X4 + + − +
X5 − + + −
Y1 + − − −
Y2 − + − −
Y3 − − + −
Y4 − − − +
Y5 − − − −
Z1 + + − −
Z2 + + + −
Z3 − + + +
Z4 − − + +
Z5 + − − +

Table 19.4: The table shows whether the four stabilizers Mi for the 5-qubit error correcting code
commute (+) or anti-commute (−) with the 15 operators Xi , Yi and Zi , i = 1, 2, · · · , 5 (which generate
a corruption of the codeword) as well as with the identity. Each of the 16 rows has a unique pattern of
+ and − signs. A + sign corresponds to an eigenvalue +1 while a − sign indicates an eigenvalue −1.

where
X = X1 X2 X3 X4 X5 X6 X7 , (19.58)

so
|1111111i = X|0000000i. (19.59)

It is instructive for the student to show the following:

(a) The stabilizers mutually commute and square to the identity.

(b) The two states in Eq. (19.57) are orthogonal.

(c) The two states in Eq. (19.57) are normalized.

Hint: You will need to use that the Mi square to the identity, as does X, and that X commutes
with the Mi .

(d) The codewords $|\overline{0}\rangle$ and $|\overline{1}\rangle$ are eigenstates of each of the stabilizers with eigenvalue $+1$.
Hint: Note that $M_i(1 + M_i) = 1 + M_i$ (why?), that the $N_j$ commute with $\overline{X}$ (explain why), and that $|0\rangle_7$ is an eigenstate of the $N_i$ with eigenvalue 1.
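Parts (a)–(d) can also be checked numerically, since the codewords live in a space of only $2^7 = 128$ dimensions. The NumPy sketch below is such a check, not a substitute for the algebraic argument the exercise asks for. The helper `on_qubits` and the convention that qubit 1 is the leftmost tensor factor are our own choices.

```python
from functools import reduce

import numpy as np

I2 = np.eye(2)
PAULI_X = np.array([[0.0, 1.0], [1.0, 0.0]])
PAULI_Z = np.array([[1.0, 0.0], [0.0, -1.0]])


def on_qubits(op, qubits, n=7):
    """Tensor product with `op` on the listed qubits (1-based) and I elsewhere.

    Qubit 1 is the leftmost (most significant) tensor factor."""
    return reduce(np.kron, [op if q in qubits else I2 for q in range(1, n + 1)])


# Supports of M1/N1, M2/N2 and M3/N3, Eq. (19.56).
SUPPORTS = [(1, 5, 6, 7), (2, 4, 6, 7), (3, 4, 5, 7)]
M = [on_qubits(PAULI_X, s) for s in SUPPORTS]
N = [on_qubits(PAULI_Z, s) for s in SUPPORTS]
X_BAR = on_qubits(PAULI_X, range(1, 8))  # X1 X2 ... X7, Eq. (19.58)

ID = np.eye(2**7)
zero7 = np.zeros(2**7)
zero7[0] = 1.0  # |0000000>

# Codewords of Eq. (19.57); the "1" of the text is the identity ID here.
encoder = (ID + M[0]) @ (ID + M[1]) @ (ID + M[2]) / np.sqrt(8)
logical0 = encoder @ zero7
logical1 = encoder @ X_BAR @ zero7

if __name__ == "__main__":
    stabilizers = M + N
    print("(a) square to identity:", all(np.allclose(S @ S, ID) for S in stabilizers))
    print("(a) mutually commute:",
          all(np.allclose(A @ B, B @ A) for A in stabilizers for B in stabilizers))
    print("(b) orthogonal:", np.isclose(logical0 @ logical1, 0.0))
    print("(c) normalized:", np.isclose(logical0 @ logical0, 1.0),
          np.isclose(logical1 @ logical1, 1.0))
    print("(d) eigenvalue +1:",
          all(np.allclose(S @ w, w) for S in stabilizers for w in (logical0, logical1)))
```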

19.7.3 Surface Codes

A different approach to quantum error correction, and one that currently seems the most promising, is to use “surface codes”, in which the physical qubits are arranged in a square array and the values of the logical qubits are encoded in complicated entangled states of the whole array. Unfortunately, I have not been able to find a simple introduction to this topic.
19.8. FAULT TOLERANT QUANTUM COMPUTING 197

[Circuit diagram: ancilla lines x1–x6 at the top, each prepared in 0 with H gates before and after its controlled gates; Z gates (for N1–N3) and X gates (for M1–M3) act on the computational qubits 1–7.]

Figure 19.13: The circuit of Steane’s 7-qubit code to detect errors in the computational qubits (labeled 1–7 in the figure). There are also six ancilla qubits (at the top), each of which is associated with one of the stabilizers as follows: $N_1$–$N_3$ correspond to $x_1$–$x_3$ respectively, and $M_1$–$M_3$ correspond to $x_4$–$x_6$ respectively, in the usual way, e.g. $N_1 = (-1)^{x_1}$, $M_1 = (-1)^{x_4}$.
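Once the six ancillas have been measured, deciding which correction to apply is a purely classical step. The Z-type stabilizers $N_j$ anti-commute with an $X$ (or $Y$) error on a qubit in their support, and the X-type stabilizers $M_j$ anti-commute with a $Z$ (or $Y$) error. Below is a minimal Python sketch of the resulting lookup. It assumes, as in the caption, that an ancilla reading of 1 means eigenvalue $-1$, and the function `decode` is hypothetical, written only for illustration.

```python
# Supports of N1..N3 (read by ancillas x1..x3) and of M1..M3 (ancillas x4..x6),
# Eq. (19.56); the N's and the M's share the same three supports.
SUPPORTS = [(1, 5, 6, 7), (2, 4, 6, 7), (3, 4, 5, 7)]

# An error on qubit q anti-commutes with exactly the stabilizers whose support
# contains q, so each of the 7 qubits gives a distinct, nonzero 3-bit pattern.
PATTERN_TO_QUBIT = {tuple(int(q in s) for s in SUPPORTS): q for q in range(1, 8)}


def decode(bits):
    """Return the single-qubit error indicated by the ancilla bits (x1, ..., x6).

    x1..x3 = 1 means N1..N3 has eigenvalue -1: the error has an X component.
    x4..x6 = 1 means M1..M3 has eigenvalue -1: the error has a Z component."""
    x_part, z_part = tuple(bits[:3]), tuple(bits[3:])
    zero = (0, 0, 0)
    if x_part == zero and z_part == zero:
        return "no error"
    if x_part != zero and z_part != zero:
        if x_part != z_part:
            raise ValueError("syndrome is not that of a single-qubit error")
        return f"Y{PATTERN_TO_QUBIT[x_part]}"  # Y = iXZ has both components
    if x_part != zero:
        return f"X{PATTERN_TO_QUBIT[x_part]}"
    return f"Z{PATTERN_TO_QUBIT[z_part]}"


if __name__ == "__main__":
    print(decode((1, 0, 1, 0, 0, 0)))  # X5: only N1 and N3 flipped
    print(decode((0, 0, 0, 0, 1, 1)))  # Z4: only M2 and M3 flipped
    print(decode((1, 1, 1, 1, 1, 1)))  # Y7: qubit 7 is in every support
```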

19.8 Fault Tolerant Quantum Computing

So far we have assumed that an error has occurred in some way and that we can correct it with perfect gates which do not introduce any further errors. This is, of course, unreasonable, since all aspects of quantum computing can introduce errors: acting with gates, making measurements, or simply waiting. Looking at the number of gates in Shor’s 9-qubit syndrome-detection circuit in Fig. 19.11, we might imagine that this circuit could introduce more errors than it corrects. It is particularly important that a circuit does not spread an error initially in one qubit into multiple qubits, which would then be much harder to correct. A circuit which does not spread errors in this way is said to be “fault tolerant”.
An important result in quantum error correction is the “threshold theorem”, which states that if the intrinsic error rate of an individual gate in a fault tolerant circuit is less than a critical value $p_c$, then the overall error rate in the circuit can be reduced to arbitrarily low levels by quantum error correction. This means that errors are being corrected faster than they are being generated. However, since error correction requires redundancy, getting the error rate down to an acceptable level will require that the number of physical qubits be much greater than the number of logical qubits (those that appear in the algorithm).
To see how one might reduce errors to an arbitrarily low level, suppose that the intrinsic error rate is $p$ and that we have a fault tolerant error correction scheme which corrects 1-qubit errors. This means that the error rate after error correction is $cp^2$ for some constant $c$.⁷ If $cp < 1$ then $cp^2 < p$, so we have decreased the errors; the threshold error rate is therefore $p_c = 1/c$.

⁷ The crucial point is that the new error rate is proportional to the square of the old error rate. I don’t think it’s obvious that one can design a circuit with this property, but a detailed study indicates that one can [NC00, RP14].

How can we decrease the errors further? Suppose the error correction procedure requires $n$ physical qubits for each logical qubit, so, for example, $n = 9$ for the Shor code and $n = 7$ for the 7-qubit Steane code. We can then take each of the $n$ qubits and error correct these with the same code. This procedure is known as concatenation. We then have $n^2$ physical qubits, and the error rate is $c(cp^2)^2 = c^{-1}(cp)^{2^2}$. Generalizing, if we concatenate $l$ times, then the number of qubits is $n^l$, while the resulting error rate is $c^{-1}(cp)^{2^l}$. Note that while the number of qubits increases exponentially with the level of concatenation $l$, the error rate decreases doubly exponentially with $l$. As an example, to get a feel for what this means, consider the case $p = 1/8$, $c = 2$, so $cp = 1/4$, and also suppose that $n = 7$ (corresponding to the Steane code). Then successive concatenations give the numbers in Table 19.5.

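$$
\begin{array}{clll}
\text{no. of concatenations } (l) & \text{error rate (formula)} & \text{error rate (numeric)} & \text{no. of qubits} \\ \hline
0 & p & 1/2^{3} = 0.125 & 1 \\
1 & cp^2 = c^{-1}(cp)^{2} & 1/2^{5} = 0.03125 & n\ (= 7) \\
2 & c(cp^2)^2 = c^{-1}(cp)^{2^2} & 1/2^{9} = 1.953 \times 10^{-3} & n^2\ (= 49) \\
3 & c(c(cp^2)^2)^2 = c^{-1}(cp)^{2^3} & 1/2^{17} = 7.629 \times 10^{-6} & n^3\ (= 343) \\
4 & c(c(c(cp^2)^2)^2)^2 = c^{-1}(cp)^{2^4} & 1/2^{33} = 1.164 \times 10^{-10} & n^4\ (= 2401) \\
5 & c(c(c(c(cp^2)^2)^2)^2)^2 = c^{-1}(cp)^{2^5} & 1/2^{65} = 2.711 \times 10^{-20} & n^5\ (= 16807)
\end{array}
$$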

Table 19.5: Parameters for the concatenation of a fault tolerant circuit with an (artificial) choice of
parameters discussed in the text.
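The entries of Table 19.5 follow from applying the rule “new error rate $= c \times$ (old error rate)$^2$” once per level of concatenation. The Python sketch below iterates this rule with exact fractions and checks each level against the closed form $c^{-1}(cp)^{2^l}$. With $p = 1/8$, $c = 2$ and $n = 7$ it reproduces the numbers in the table. The function name `concatenation_levels` is our own.

```python
from fractions import Fraction


def concatenation_levels(p, c, n, levels):
    """Rows (l, error rate, physical qubits) for l = 0..levels.

    Each extra level of concatenation maps an error rate r to c * r**2 and
    multiplies the number of physical qubits by n."""
    rows = []
    rate = p
    for l in range(levels + 1):
        # Consistency check against the closed form c**-1 * (c*p)**(2**l).
        assert rate == (c * p) ** (2**l) / c
        rows.append((l, rate, n**l))
        rate = c * rate**2
    return rows


if __name__ == "__main__":
    # The (artificial) parameters of Table 19.5: p = 1/8, c = 2, n = 7.
    for l, rate, qubits in concatenation_levels(Fraction(1, 8), 2, 7, 5):
        print(f"l={l}  error rate={rate} ({float(rate):.4g})  qubits={qubits}")
```

Exact fractions are used so that the comparison with the closed form is exact rather than subject to floating-point rounding.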

These numbers are not realistic. They correspond to a threshold value of $p_c = 1/c = 1/2$, and any
realistic circuit would have a much smaller value. However, they do show, and this is the main point,
that the error rate goes down much faster than the number of physical qubits goes up. Of course, the
number of physical qubits per logical qubit will still have to be very large to get the error rate down
to an acceptable value for computation.
Various calculations have estimated the threshold for the 7-qubit Steane code at around $10^{-5}$. To perform error correction one would need individual circuit elements with an error rate significantly less than this, which, to my knowledge, is not feasible at present. Surface codes, which were briefly mentioned above, are estimated to have a higher threshold, of around $10^{-2}$, and it does seem feasible to make gates with a lower error rate than this. For example, at the end of a very long and technical paper, Ref. [FMMC12] estimates that factoring, using Shor’s algorithm, an integer which is too large to be factored on a classical computer (2000 bits) would require some $220 \times 10^6$ qubits, with then state-of-the-art superconducting qubits using quantum error correction with surface codes.
At present, quantum computers (using the “gate” model of quantum computing which is the topic of
this course) have at most a few tens of qubits, so a huge increase in scale will be required. However,
who is to say that this cannot happen in a few decades? An example of a comparable increase in scale
which has already happened is the number of transistors on a modern chip compared with the number
on early integrated circuits.
Thus, in my view, in the next few years, we may see quantum computers with a modest number
of logical qubits which perform error correction. However, quantum computers with error correction
having enough logical qubits to outperform classical computers for some useful problem such as integer
factorization are for the distant future, if ever.
I thank Eleanor Rieffel for a helpful email exchange on quantum error correction.

Unfortunately, I have not been able to find a simple explanation of this result.

Problems
19.1. Consider the 3-qubit, bit-flip code discussed in class, and in the lecture material. The circuit is
shown in Fig. 19.14. We commented that this circuit works in the situation where a bit-flip error
builds up continuously from zero. Let us verify this. Consider the corrupted state

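$$
|\psi'\rangle = \left[\left(1 - \frac{\epsilon^2}{2}\right) 1 + i\,(\epsilon_1 X_1 + \epsilon_2 X_2 + \epsilon_3 X_3)\right] |\psi\rangle,
\tag{19.60}
$$
where $\epsilon_k \ll 1$ and $\epsilon^2 = \sum_{k=1}^{3} \epsilon_k^2$, and
$$
|\psi\rangle = \alpha|000\rangle + \beta|111\rangle
\tag{19.61}
$$
is the uncorrupted state. We will work to first order in $\epsilon$ (the factor of $1 - \epsilon^2/2$ is inserted so that the normalization constant is 1 through order $\epsilon^2$). $|\psi'\rangle$ is the initial state (on the left) of the three computational qubits, labeled 1, 2 and 3, in Fig. 19.14.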
Determine the state of the system (computational qubits plus ancillas) after the error detection
circuit has operated.
Then consider the correction phase. What are the possible results of the measurements of the ancillas, what are the probabilities of these results, and what is the resulting state of the computational qubits?
(You should conclude that the bit-flip error has been corrected for all possible results of the
measurement of the ancillas.)

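[Circuit diagram: ancillas x and y, each prepared in 0 with H gates before and after, control Z gates on the computational qubits 1–3 (detection); X gates on qubits 1–3 are then applied conditionally on the measured x and y (correction).]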

Figure 19.14: Circuit for syndrome detection for the 3-qubit bit-flip code, and for correction if necessary.

19.2. In question 19.1 we implicitly assumed that the time dependence of the computational qubits has
proceeded in a unitary manner including the point where the error has developed. In other words
the error is in the circuit itself. However, a very common cause of errors in a quantum computer
is that the qubits have an unwanted interaction with the environment. The environment becomes
entangled with the qubits leading to “decoherence”, which is the main difficulty in building a
useful quantum computer.
Let us apply the same 3-qubit, bit-flip code shown in Fig. 19.14 to an error model where the
error comes from the prior interaction of the qubits with the environment.

A system interacting with the environment is not in a single quantum state but can be represented as being in different quantum states with various probabilities.⁸ Let us suppose, then, that the system is described as follows (in which we again only allow for single bit-flips):
$$
\begin{array}{ll}
\text{Probability } P_0: & |\psi'\rangle = \alpha|000\rangle + \beta|111\rangle = |\psi\rangle, \\
\text{Probability } P_1: & |\psi'\rangle = \alpha|100\rangle + \beta|011\rangle = X_1|\psi\rangle, \\
\text{Probability } P_2: & |\psi'\rangle = \alpha|010\rangle + \beta|101\rangle = X_2|\psi\rangle, \\
\text{Probability } P_3: & |\psi'\rangle = \alpha|001\rangle + \beta|110\rangle = X_3|\psi\rangle,
\end{array}
\tag{19.62}
$$
where, of course, $\sum_{i=0}^{3} P_i = 1$. Note that these states are incoherent in the sense that there is no interference between the different states. This is different from Eq. (19.60), where the different pieces of the wave function have well defined relative phases (i.e. the superposition is coherent) and so can potentially interfere.
Describe the result of acting with the “detection” part of the circuit.
Then consider the “correction” part and derive the possible results of the measurements of the
ancillas and their probabilities. Show that, like the case of the coherent bit-flip error of Eq. (19.60)
in Qu. 19.1, the circuit succeeds in correcting the error.
Note: The difference between questions 19.1 and 19.2 is that in the former the corruption is
due to a coherent superposition of 1-qubit corrupted states, while in the latter it is due to an
incoherent sum of 1-qubit corrupted states with various probabilities. To answer Qu. 19.2 you
have to discuss, for each of the states in the incoherent sum, what is the state of the ancillas and
how the error correction is done.
By doing both of these questions you will see that error correction works irrespective of whether the error is due to a coherent addition of corrupted states (perhaps due to the gates not functioning correctly) or to an incoherent addition of corrupted states due to the computational qubits becoming entangled with the environment.
19.3. Shor’s 9-qubit code

(i) We mentioned in class that it is necessary that the (uncorrupted) codewords are eigenvectors
of all the stabilizers with eigenvalue +1. Show that this is the case for stabilizers M1 and
M8 of Shor’s 9-qubit code.
(ii) We discussed in detail the table of $\pm 1$ eigenvalues for the stabilizers acting on the 22 syndromes. Here is an extract from that table, for the syndrome where there is a 1-qubit corruption due to $Y_4$ (i.e. a combined phase-flip and bit-flip acting on qubit 4):
$$
\begin{array}{c|cccccccc}
\text{Syndrome} & M_1 & M_2 & M_3 & M_4 & M_5 & M_6 & M_7 & M_8 \\ \hline
Y_4 & + & + & - & + & + & + & - & -
\end{array}
$$
Here “$+$” means eigenvalue $+1$ and “$-$” means eigenvalue $-1$.
Explain the sign of each of these ±1 eigenvalues.

19.4. As discussed in class, the four stabilizers for the 5-qubit error correcting code are
$$M_1 = Z_2 X_3 X_4 Z_5, \tag{19.63a}$$
$$M_2 = Z_3 X_4 X_5 Z_1, \tag{19.63b}$$
$$M_3 = Z_4 X_5 X_1 Z_2, \tag{19.63c}$$
$$M_4 = Z_5 X_1 X_2 Z_3. \tag{19.63d}$$

⁸ The correct way to describe this is with the density matrix discussed in Chapter 5, but we will not need the details of the density matrix here.

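We also stated that the pattern of $+1$ and $-1$ eigenvalues for the stabilizers among the 16 syndromes (1 uncorrupted and $3 \times 5 = 15$ corrupted) is given by
$$
\begin{array}{l|ccc|ccc|ccc|ccc|ccc|c}
 & X_1 & Y_1 & Z_1 & X_2 & Y_2 & Z_2 & X_3 & Y_3 & Z_3 & X_4 & Y_4 & Z_4 & X_5 & Y_5 & Z_5 & 1 \\ \hline
M_1 = Z_2 X_3 X_4 Z_5 & + & + & + & - & - & + & + & - & - & + & - & - & - & - & + & + \\
M_2 = Z_3 X_4 X_5 Z_1 & - & - & + & + & + & + & - & - & + & + & - & - & + & - & - & + \\
M_3 = Z_4 X_5 X_1 Z_2 & + & - & - & - & - & + & + & + & + & - & - & + & + & - & - & + \\
M_4 = Z_5 X_1 X_2 Z_3 & + & - & - & + & - & - & - & - & + & + & + & + & - & - & + & +
\end{array}
$$
where the top row indicates which Pauli operator is used to generate the corrupted state from the uncorrupted state.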

(i) Show that the stabilizers square to the identity.

(ii) Show that they are mutually commuting.
(iii) By considering the nature of the commutation of the stabilizer with the relevant Pauli operator, explain the results in the table for the columns $X_3$, $Y_4$ and $Z_5$.
Note: You may assume without proof that the right-hand column is correct, i.e. the eigen-
values of all the stabilizers are +1 for the uncorrupted state.

19.5. Using the expressions for the stabilizers of the 5-qubit code given in Qu. 19.4, draw the circuit
to detect 1-qubit errors in the 5-qubit code.

19.6. (More challenging)

Consider the 7-qubit Steane code. There are 6 stabilizers which are

$$
\begin{aligned}
M_1 &= X_1 X_5 X_6 X_7, & N_1 &= Z_1 Z_5 Z_6 Z_7, \\
M_2 &= X_2 X_4 X_6 X_7, & N_2 &= Z_2 Z_4 Z_6 Z_7, \\
M_3 &= X_3 X_4 X_5 X_7, & N_3 &= Z_3 Z_4 Z_5 Z_7.
\end{aligned}
\tag{19.64}
$$

The circuit to detect errors is shown in Fig. 19.15. The 7-qubit codewords are given by
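$$
\begin{aligned}
|\overline{0}\rangle &= \frac{1}{\sqrt{8}}\,(1 + M_1)(1 + M_2)(1 + M_3)\,|0\rangle_7, \\
|\overline{1}\rangle &= \frac{1}{\sqrt{8}}\,(1 + M_1)(1 + M_2)(1 + M_3)\,\overline{X}\,|0\rangle_7,
\end{aligned}
\tag{19.65}
$$
where “1” refers to the identity operator,
$$\overline{X} = X_1 X_2 X_3 X_4 X_5 X_6 X_7, \tag{19.66}$$
so
$$|1111111\rangle = \overline{X}\,|0000000\rangle. \tag{19.67}$$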

(i) Show that the stabilizers mutually commute and square to the identity.
(ii) Show that the two states in Eq. (19.65) are orthogonal.
(iii) Show that the two states in Eq. (19.65) are normalized.
Hint: You will need to use that the $M_i$ square to the identity, as does $\overline{X}$, and that $\overline{X}$ commutes with the $M_i$.

[Circuit diagram: ancilla lines x1–x6 at the top, each prepared in 0 with H gates before and after its controlled gates; Z gates (for N1–N3) and X gates (for M1–M3) act on the computational qubits 1–7.]

Figure 19.15: The circuit of Steane’s 7-qubit code to detect errors in the computational qubits (labeled 1–7 in the figure). There are also six ancilla qubits (at the top), each of which is associated with one of the stabilizers as follows: $N_1$–$N_3$ correspond to $x_1$–$x_3$, and $M_1$–$M_3$ correspond to $x_4$–$x_6$, in the usual way, e.g. $M_1 = (-1)^{x_4}$.

(iv) Show that the codewords $|\overline{0}\rangle$ and $|\overline{1}\rangle$ are eigenstates of each of the stabilizers with eigenvalue $+1$.
Hint: Note that $M_i(1 + M_i) = 1 + M_i$ (why?), that the $N_j$ commute with $\overline{X}$ (explain why), and that $|0\rangle_7$ is an eigenstate of the $N_i$ with eigenvalue 1.

19.7. (More challenging)

Consider operators which act equally on all qubits in the 7 qubit code:
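$$\overline{Z} = Z_1 Z_2 Z_3 Z_4 Z_5 Z_6 Z_7, \qquad \overline{H} = H_1 H_2 H_3 H_4 H_5 H_6 H_7, \tag{19.68}$$
and similarly $\overline{X}$ defined in Eq. (19.66).
(i) Show that $\overline{X}$ implements the logical NOT gate (i.e. logical $X$) on the codewords, i.e.
$$\overline{X}|\overline{0}\rangle = |\overline{1}\rangle, \qquad \overline{X}|\overline{1}\rangle = |\overline{0}\rangle. \tag{19.69}$$
(ii) Show that $\overline{Z}$ implements the logical $Z$ on the codewords, i.e.
$$\overline{Z}|\overline{0}\rangle = |\overline{0}\rangle, \qquad \overline{Z}|\overline{1}\rangle = -|\overline{1}\rangle. \tag{19.70}$$
(iii) (Harder) Show that $\overline{H}$ implements the logical $H$ on the codewords, i.e.
$$\overline{H}|\overline{0}\rangle = \frac{1}{\sqrt{2}}\left(|\overline{0}\rangle + |\overline{1}\rangle\right), \qquad \overline{H}|\overline{1}\rangle = \frac{1}{\sqrt{2}}\left(|\overline{0}\rangle - |\overline{1}\rangle\right). \tag{19.71}$$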
Hints:

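• We want to show that
$$\langle\overline{0}|\overline{H}|\overline{0}\rangle = \langle\overline{1}|\overline{H}|\overline{0}\rangle = \langle\overline{0}|\overline{H}|\overline{1}\rangle = \frac{1}{\sqrt{2}}, \qquad \langle\overline{1}|\overline{H}|\overline{1}\rangle = -\frac{1}{\sqrt{2}}. \tag{19.72}$$
• Hence we need to calculate
$$\langle\overline{x}|\overline{H}|\overline{y}\rangle = \frac{1}{8}\, {}_7\langle 0|\, \overline{X}^{\,x} (1 + M_1)(1 + M_2)(1 + M_3)\, \overline{H}\, (1 + M_1)(1 + M_2)(1 + M_3)\, \overline{X}^{\,y} |0\rangle_7 . \tag{19.73}$$
• Derive the results
$$\overline{H} M_i = N_i \overline{H}, \qquad M_i \overline{H} = \overline{H} N_i, \tag{19.74}$$
and use them to show that you can replace the $M_i$ in Eq. (19.73) by $N_i$.
• Show that each $N_i$ commutes with $\overline{X}$ and apply this result.
• Use that each $N_i$ acts as the identity on $|0\rangle_7$.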

Note: Having codeword gates that are tensor products of single qubit gates is very helpful when
designing circuits to implement an error correcting code. A similar result also holds for CNOT.
In Steane’s code the logical CNOT gate, which takes $|x\rangle|y\rangle$ to $|x\rangle|x \oplus y\rangle$, is simply made up of CNOT gates applied to each of the seven pairs of qubits in the two codewords.
The results in this question for Hadamards and CNOT gates do not apply, for example, to the
5 qubit code of Qu. 19.4. That they do apply to Steane’s 7 qubit code is one of the reasons why
this code is a popular choice.

19.8. In the last question we showed that, for the 7-qubit Steane code, the logical $X$ acting on the codewords is implemented by $\prod_j X_j$, and the logical $Z$ is implemented by $\prod_j Z_j$. Show that the corresponding results for Shor’s 9-qubit code do not hold. Instead, show that one has, rather curiously,
$$\prod_{j=1}^{9} Z_j \equiv \overline{X}, \qquad \prod_{j=1}^{9} X_j \equiv \overline{Z}. \tag{19.75}$$

Appendices

19.A Summary of Quantum Error Correction

This chapter has been quite involved so it is easy to get lost in the details and not appreciate the main
ideas. Therefore, in this appendix, I summarize those ideas.
A logical qubit is represented by n physical qubits. We consider codes that can correct errors in
just one of those qubits. The initial state is therefore assumed to be a superposition of the uncorrupted
state, with an amplitude close to 1, plus all possible single qubit corruptions with small amplitude.
Since each qubit can be corrupted with an $X$, $Y$ or $Z$ Pauli operator, there are usually $3n$ corrupted states,⁹ and so there are usually $3n + 1$ states in total in the superposition. These are called syndromes. Omitting the states of the environment for simplicity of notation, the initial state is
$$|\psi\rangle \to \sum_{\alpha=0}^{N_s - 1} c_\alpha A_\alpha |\psi\rangle \tag{19.76}$$

⁹ The Shor 9-qubit code that we discussed in detail has fewer because some corruptions give the same state.

where $|\psi\rangle$ is the uncorrupted state, $\alpha = 0$ represents the uncorrupted state so $A_0 = I$ (the identity), the other $A_\alpha$ are Pauli operators $X_i$, $Y_i$ or $Z_i$ ($i = 1, 2, \dots, n$), $N_s$ is the number of syndromes (usually $N_s = 3n + 1$), $c_0$ is the amplitude of the uncorrupted state, which is close to 1 in magnitude, and the other $c_\alpha$ are much less than 1 in magnitude.
In addition we have $m$ ancilla qubits. We denote a state of the ancillas by $|x\rangle_A$, where $x$ is an $m$-bit integer whose binary representation is the state of the ancilla qubits. Initially, the state of the ancillas is $|0\rangle_A$.
The error detection circuit entangles the $n$ codeword qubits with the $m$ ancilla qubits, so the final state of the combined codeword-ancilla system, after the error detection circuit has acted, is
$$\sum_{\alpha=0}^{N_s - 1} c_\alpha A_\alpha |\psi\rangle \otimes |x_\alpha\rangle_A, \tag{19.77}$$
where each syndrome is associated with a distinct state of the ancillas, represented by the integer $x_\alpha$, with the unperturbed syndrome having $x_0 = 0$.
A measurement is then made of the ancillas, whose state after the measurement is represented by the $m$-bit integer $x_{\tilde\alpha}$ corresponding to one of the syndromes $\tilde\alpha$. The codeword has then been projected on to the $\tilde\alpha$ syndrome, i.e. $A_{\tilde\alpha}|\psi\rangle$. From the measured $x_{\tilde\alpha}$ we know $\tilde\alpha$ (since each $x_\alpha$ specifies a unique syndrome $\alpha$), and hence, if $\tilde\alpha \neq 0$ so there is an error, we can correct that error by acting on the codeword qubits with $A_{\tilde\alpha}$.¹⁰ The codeword qubits are then in the uncorrupted state $|\psi\rangle$, as required.

¹⁰ Recall that the $A_\alpha$ are Pauli operators which square to unity.
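For a concrete instance of this procedure, take the 3-qubit bit-flip code of Problem 19.1, with stabilizers $Z_1 Z_2$ and $Z_2 Z_3$ (the standard choice) and syndromes $A_0 = 1$, $X_1$, $X_2$, $X_3$. The NumPy sketch below is illustrative only. It does not simulate the ancillas; it assumes, as in the text, that the measurement has already projected the codeword on to $A_{\tilde\alpha}|\psi\rangle$ with reading $x_{\tilde\alpha}$. The names `LOOKUP` and `correct` are our own.

```python
from functools import reduce

import numpy as np

I2 = np.eye(2)
PAULI_X = np.array([[0.0, 1.0], [1.0, 0.0]])
PAULI_Z = np.array([[1.0, 0.0], [0.0, -1.0]])


def on_qubits(op, qubits, n=3):
    """Tensor product with `op` on the listed qubits (1-based) and I elsewhere."""
    return reduce(np.kron, [op if q in qubits else I2 for q in range(1, n + 1)])


STABILIZERS = [on_qubits(PAULI_Z, (1, 2)), on_qubits(PAULI_Z, (2, 3))]
# A_0 = identity (uncorrupted), then the single bit flips X1, X2, X3.
SYNDROMES = [np.eye(8)] + [on_qubits(PAULI_X, (k,)) for k in (1, 2, 3)]


def ancilla_integer(A):
    """x_alpha as a 2-bit integer (Z1 Z2 ancilla is the high bit): a bit is 1
    if A anti-commutes with that stabilizer, i.e. the ancilla reads -1."""
    bits = [int(np.allclose(S @ A, -(A @ S))) for S in STABILIZERS]
    return 2 * bits[0] + bits[1]


# Each syndrome must map to a distinct ancilla reading; x_0 = 0 for the identity.
LOOKUP = {ancilla_integer(A): A for A in SYNDROMES}
assert len(LOOKUP) == len(SYNDROMES)


def correct(codeword, x_measured):
    """Apply A_alpha for the measured x_alpha; the Pauli operators square to 1."""
    return LOOKUP[x_measured] @ codeword


if __name__ == "__main__":
    psi = np.zeros(8)
    psi[0], psi[7] = 0.6, 0.8  # alpha|000> + beta|111>
    for A in SYNDROMES:
        # After the measurement the codeword is A|psi> and the ancillas read x_alpha.
        recovered = correct(A @ psi, ancilla_integer(A))
        print(ancilla_integer(A), np.allclose(recovered, psi))
```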
Chapter 20

Grover’s Search Algorithm

20.1 Introduction
Grover’s algorithm, discussed in this chapter, is of a different type from Shor’s algorithm. Whereas Shor’s algorithm (and related algorithms like Simon’s) depends on a quantum Fourier transform (of some sort), Grover’s algorithm uses a different approach, amplitude amplification.
To motivate Grover’s algorithm, consider looking up someone in a phone directory. It is straightforward to look up a person’s phone number in a directory if one is given the name, because names are in alphabetical order. To locate the name systematically one would go to the midpoint of the list, see which half the name is in, divide that half in two, again see which half the name is in, and so on. One continues this procedure until the size of the region containing the desired entry is just one. For a directory with $N$ entries, this bisection method takes $\log_2 N$ operations (rounded up to the nearest integer if $N$ is not a power of 2), since one halves the range over which the special entry could be at each stage.
By contrast, suppose one is given the number and asked which person has that number. Since the numbers are not ordered, all one can do is go through the entries one at a time and see if each one has the desired number. On average this would take $N/2$ operations before success was achieved.
If $N$ is large this is a huge difference. For example, if $N = 10^6$ then $\log_2 N \simeq 20$, to be compared with $N/2 = 5 \times 10^5$. Note that if the $N$ possible values are represented by the configurations of $n$ qubits then
$$N = 2^n. \tag{20.1}$$

The quantum search algorithm discussed here, due to Grover, is often presented as such a search of an unstructured database.¹ Grover’s algorithm requires a quantum computer running a subroutine for which the input is a number corresponding to an entry in the database, and which performs a test to see if this is the special value being searched for. For large $N$ it will determine the special value, with probability close to 1, by calling the subroutine only about $(\pi/4)\sqrt{N}$ times. This is a quadratic speedup compared with a classical computer. While less spectacular than the exponential speedup of Shor’s algorithm, it can potentially be applied to a wide variety of problems.²
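To put numbers on these operation counts, here is a small Python sketch comparing bisection in a sorted directory, a linear scan of an unsorted one, and the approximate number of oracle calls $(\pi/4)\sqrt{N}$ quoted for Grover’s algorithm, for $N = 10^6$. The function names are our own.

```python
import math


def bisection_steps(n_entries: int) -> int:
    """Lookups needed to locate an entry in a sorted list by repeated halving."""
    return math.ceil(math.log2(n_entries))


def classical_average_unsorted(n_entries: int) -> float:
    """Average number of entries checked in an unsorted list before success."""
    return n_entries / 2


def grover_calls(n_entries: int) -> float:
    """Approximate number of oracle calls in Grover's algorithm, (pi/4) sqrt(N)."""
    return math.pi / 4 * math.sqrt(n_entries)


if __name__ == "__main__":
    N = 10**6
    print(bisection_steps(N), classical_average_unsorted(N), round(grover_calls(N)))
```

For $N = 10^6$ this prints 20, 500000.0 and 785: the quantum count sits between the structured (sorted) and unstructured classical cases.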

¹ Though it is doubtful it would ever be used in this way, since using qubits to store classical information would be a very extravagant use of a precious resource.
² However, most applications of practical interest have some structure, whereas Grover’s algorithm is designed for problems with no structure. In most cases where Grover’s algorithm could potentially be applied, the structure of the problem allows an efficient classical algorithm which outperforms it. Thus it is debated whether Grover’s algorithm would be of practical utility, even if one could overcome the severe experimental difficulties of building a large quantum computer.

206 CHAPTER 20. GROVER’S SEARCH ALGORITHM

20.2 The Black Box (Oracle)

To formulate the problem we consider n-bit integers, one of which, a, is special. The goal is to find a.
We need a subroutine which outputs 1 if the input value $x$ is equal to $a$ and outputs 0 otherwise, i.e.
$$f(x) = \begin{cases} 0, & x \neq a, \\ 1, & x = a. \end{cases} \tag{20.2}$$
As usual, the function will be determined from a unitary transformation acting on an $n$-qubit “input” register and an “output” qubit which is flipped or not flipped depending on whether or not $x$ is the special number $a$:
$$U\,|x\rangle_n |y\rangle_1 = |x\rangle_n |y \oplus f(x)\rangle_1. \tag{20.3}$$

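[Circuit diagram: input qubits x0–x4 and output qubit y; X gates on x1, x2 and x4 on either side of a five-fold-controlled-NOT targeting y, which ends in the state y ⊕ f(x).]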

Figure 20.1: A black box circuit that executes the first part of a Grover iteration, Eq. (20.3), in which $f(x) = 0$ if $x \neq a$ and $f(a) = 1$, for the case of $n = 5$ qubits and where the special number $a$ is 01001. The 6-qubit gate in the center is a five-fold-controlled-NOT gate which acts to flip the target qubit $y$ only if all the control qubits are 1. The $X$ gates on the left flip qubits $x_1$, $x_2$ and $x_4$. Hence the target qubit is flipped if and only if $x_0 = 1$, $x_1 = 0$, $x_2 = 0$, $x_3 = 1$, $x_4 = 0$, which are the bits of $a$. The $X$ gates on the right flip back those qubits which had previously been flipped, thus leaving the “input” register, the $\{|x_i\rangle\}$, unchanged. The lower “output” qubit, which is initialized to $|y\rangle$, contains information on the function $f(x)$ in its final state.

A simple example of such a function for n = 5 and a = 01001 is shown in Fig. 20.1. Recall
that x0 is the least significant (i.e. right-hand) bit. The target qubit is flipped only if all five of the
control bits are one, which requires x0 = 1, x1 = 0, x2 = 0, x3 = 1, x4 = 0 (the bits of a). How to
construct such a five-fold-controlled-NOT gate out of 1-qubit and 2-qubit elementary gates is discussed
in Mermin [Mer07] §4.2.
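Because the black box of Fig. 20.1 only permutes computational basis states, its action on $|x\rangle|y\rangle$ can be followed with ordinary bit operations on integers. Below is a minimal Python sketch mirroring the three layers of the figure. The function `oracle_circuit` is our own illustration, not a standard API.

```python
def oracle_circuit(x: int, y: int, a: int, n: int) -> tuple[int, int]:
    """Apply a Fig. 20.1-style black box to the basis state |x>|y>.

    Bit k of the integer x is qubit x_k, so x_0 is the least significant bit."""
    all_ones = (1 << n) - 1
    flip_mask = ~a & all_ones  # X gates act on the qubits where a has a 0
    x = x ^ flip_mask          # left-hand layer of X gates
    if x == all_ones:          # n-fold-controlled NOT: every control is 1
        y ^= 1
    x = x ^ flip_mask          # right-hand layer restores the input register
    return x, y


if __name__ == "__main__":
    a, n = 0b01001, 5
    print(bin(~a & ((1 << n) - 1)))  # 0b10110: qubits x1, x2 and x4 are flipped
    marked = [x for x in range(2**n) if oracle_circuit(x, 0, a, n)[1] == 1]
    print(marked)  # [9]: only the special value a is marked
```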
Such a black box function is called an oracle. An oracle gives the output for the input values
which are fed into it but one is not allowed to “open the box” and see how it is made. Of course, for
the implementation in Fig. 20.1 if you did look at the workings of the circuit you would immediately
determine the special value a. However, the implementation of the black box in Fig. 20.1 is a simple
example. The Grover algorithm can also be applied in more useful situations where the value of f (x)
is not built in explicitly but has to be calculated in a non-trivial way and so for these cases “opening
the box” wouldn’t help to solve the problem. Examples are discussed in Mermin [Mer07] and Nielsen
and Chuang [NC00].
It is useful to initially set the “output” qubit $y$ to be 1 and then apply a Hadamard gate before applying $U$. The “output” qubit is then
$$H|1\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle - |1\rangle\right). \tag{20.4}$$
If the result of $U$ is $f(x) = 0$ then the “output” qubit is unchanged. If the result is $f(x) = 1$ then $|0\rangle \to |1\rangle$ and vice versa, so the “output” qubit changes sign. Consequently
$$U\left(|x\rangle_n \otimes H|1\rangle_1\right) = (-1)^{f(x)}\, |x\rangle_n \otimes H|1\rangle_1. \tag{20.5}$$

We can associate the possible sign change with the “input” register, in which case the “output” qubit remains unchanged. Hence, for simplicity, the “output” qubit will be ignored in what follows. Thus we consider the following unitary operator $\hat O$ acting only on the $n$-qubit “input” register:³
$$\hat O|x\rangle = (-1)^{f(x)}|x\rangle = \begin{cases} |x\rangle, & x \neq a, \\ -|a\rangle, & x = a. \end{cases} \tag{20.6}$$
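The sign change in Eq. (20.5), often called phase kickback, can be confirmed with a small state-vector calculation. In the NumPy sketch below, which is our own illustration, the oracle is built as a permutation matrix for $n = 3$ input qubits with an arbitrarily chosen special value $a = 5$. Each $|x\rangle \otimes H|1\rangle$ is then checked against $(-1)^{f(x)}|x\rangle \otimes H|1\rangle$.

```python
import numpy as np

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)


def oracle_unitary(f, n):
    """Permutation matrix of U|x>|y> = |x>|y XOR f(x)>, Eq. (20.3).

    The output qubit is the last tensor factor, so |x>|y> has index 2*x + y."""
    dim = 2 ** (n + 1)
    U = np.zeros((dim, dim))
    for x in range(2**n):
        for y in (0, 1):
            U[2 * x + (y ^ f(x)), 2 * x + y] = 1.0
    return U


if __name__ == "__main__":
    n, a = 3, 5

    def f(x):
        return int(x == a)

    U = oracle_unitary(f, n)
    minus = H @ np.array([0.0, 1.0])  # H|1> = (|0> - |1>)/sqrt(2), Eq. (20.4)
    for x in range(2**n):
        ket_x = np.zeros(2**n)
        ket_x[x] = 1.0
        before = np.kron(ket_x, minus)
        # Eq. (20.5): |x> picks up the sign (-1)^f(x); the output qubit is unchanged.
        print(x, np.allclose(U @ before, (-1) ** f(x) * before))
```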

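Since $U$, and hence $\hat O$, is linear, acting with $\hat O$ on a superposition changes the sign of the component along $|a\rangle$ but leaves the component perpendicular to $|a\rangle$ unchanged. Hence if
$$|\psi\rangle = \sum_x c_x |x\rangle, \tag{20.7}$$
then
$$\hat O|\psi\rangle = \sum_{x \neq a} c_x |x\rangle - c_a |a\rangle. \tag{20.8}$$

The algorithm starts by applying a Hadamard to each of the $n$ input qubits, initially in $|0\rangle$, which produces an equal linear combination of all $N$ basis states,
$$|\psi_0\rangle = H^{\otimes n}|0\rangle = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} |x\rangle. \tag{20.9}$$
Separating out the special state, this can be written as
$$|\psi_0\rangle = \frac{1}{\sqrt{N}}\,|a\rangle + \sqrt{\frac{N-1}{N}}\,|a_\perp\rangle, \tag{20.10}$$
where
$$|a_\perp\rangle = \frac{1}{\sqrt{N-1}} \sum_{x \neq a} |x\rangle \tag{20.11}$$
is the normalized equal linear combination of all basis states other than $|a\rangle$.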

We shall see that all the subsequent states generated during the Grover algorithm can also be written as a linear combination of $|a\rangle$ and $|a_\perp\rangle$. These can be conveniently drawn as vectors in the 2-dimensional space spanned by these two basis vectors, see Fig. 20.2.
Hence $|\psi_0\rangle$ makes an angle $\theta_0$ with the $|a_\perp\rangle$ axis, where $\sin\theta_0 = \langle a|\psi_0\rangle$, or
$$\sin\theta_0 = \frac{1}{\sqrt{N}}, \tag{20.12}$$
³ We omit the subscript $n$ on the states from now on since we will only be dealing with $n$-qubit states.

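[Diagram: the $|a_\perp\rangle$ and $|a\rangle$ axes, with $|\psi_0\rangle$ at angle $\theta_0$ to $|a_\perp\rangle$ and projection $1/\sqrt{N}$ on to $|a\rangle$.]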

Figure 20.2: Projection of the $N$-dimensional ($N = 2^n$) space on to the 2-dimensional space spanned by $|a\rangle$ and $|a_\perp\rangle$, the latter being a (normalized) equal linear combination of all basis states except for $|a\rangle$ itself, see Eq. (20.11). The vector in bold is the initial state $|\psi_0\rangle$, an equal linear combination of all basis states, see Eq. (20.9). The vector $|\psi_0\rangle$ has a projection $1/\sqrt{N}$ on to $|a\rangle$, so $\sin\theta_0 = 1/\sqrt{N}$, where $\theta_0$ is the angle between $|\psi_0\rangle$ and $|a_\perp\rangle$.

so we can express $|\psi_0\rangle$ in Eq. (20.10) as
$$|\psi_0\rangle = \sin\theta_0\, |a\rangle + \cos\theta_0\, |a_\perp\rangle. \tag{20.13}$$
Note that $|\psi_0\rangle$, $|a_\perp\rangle$ and $|a\rangle$ are all normalized.

From Eq. (20.13) we see that if we were to measure $|\psi_0\rangle$ now, we would get $|a\rangle$ with probability $\sin^2\theta_0\ (= 1/N)$, which is very small for large $N$. (Of course we can also see that the probability is $1/N$ directly from Eq. (20.9).) The goal of the Grover algorithm is to iteratively rotate the vector representing the state of the input register from its initial direction, that of $|\psi_0\rangle$ (which is close to the $|a_\perp\rangle$ axis), to a direction close to the $|a\rangle$ axis, because a measurement of it will then give $a$ with high probability. This is called amplitude amplification.

[Diagram: as in Fig. 20.2, with $\hat O|\psi_0\rangle$ at angle $\theta_0$ below the $|a_\perp\rangle$ axis.]

Figure 20.3: Figure showing that the action of the operator $\hat O$ is to reflect the state it is acting on, in this case $|\psi_0\rangle$, about the $|a_\perp\rangle$ axis.

As shown in Eq. (20.8), the action of $\hat O$ is to invert the component along $|a\rangle$ of the vector it acts on, while keeping the component perpendicular to $|a\rangle$ unchanged. The net effect is to reflect about the $|a_\perp\rangle$ axis. Figure 20.3 shows the effect of $\hat O$ on the initial state $|\psi_0\rangle$. To rotate the direction of the state towards the $|a\rangle$ axis we will need a second unitary operation, which is discussed in the next section.

20.3 The second step of the Grover iteration

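[Diagram: $|\psi_0\rangle$, $\hat O|\psi_0\rangle$, and $|\psi_1\rangle = \hat S\hat O|\psi_0\rangle$ at angle $\theta_1$ to the $|a_\perp\rangle$ axis, $2\theta_0$ beyond $|\psi_0\rangle$.]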

Figure 20.4: Figure showing that the action of the operator $\hat S$ is to reflect the state it is acting on, in this case $\hat O|\psi_0\rangle$, about the direction of $|\psi_0\rangle$, which is defined in Eq. (20.9). The net result of the two operations, $\hat O$ followed by $\hat S$, is to rotate the direction of $|\psi_0\rangle$ by $2\theta_0$ in an anti-clockwise direction. We will call the new state $|\psi_1\rangle$. It is at an angle $\theta_1 = \theta_0 + 2\theta_0$ to the $|a_\perp\rangle$ axis.

The second stage of a single Grover iteration is independent of the special number $a$. It changes the sign of the component perpendicular to the initial state $|\psi_0\rangle$ and keeps unchanged the component along $|\psi_0\rangle$. Denoting this operation by $\hat S$ we have
$$|\phi\rangle \to |\phi'\rangle = \hat S|\phi\rangle = 2|\psi_0\rangle\langle\psi_0|\phi\rangle - |\phi\rangle, \tag{20.14}$$
where $|\phi\rangle$ is an arbitrary state. You should check that $\langle\psi_0|\phi'\rangle = \langle\psi_0|\phi\rangle$, so the component along $|\psi_0\rangle$ is unchanged, and that for a state $|\mu\rangle$ which is orthogonal to $|\psi_0\rangle$, $\langle\mu|\phi'\rangle = -\langle\mu|\phi\rangle$, showing that the component perpendicular to $|\psi_0\rangle$ has its sign changed. The net result is to reflect $|\phi\rangle$ about the direction of $|\psi_0\rangle$.
Figure 20.4 shows the effect of $\hat S$ acting on the state $\hat O|\psi_0\rangle$. The combined effect of $\hat O$ followed by $\hat S$ is to rotate the initial state $|\psi_0\rangle$ by $2\theta_0$ in an anti-clockwise direction, i.e. $2\theta_0$ towards the desired direction of the $|a\rangle$ axis. The combination of these two operations is called a Grover iteration, implemented by the Grover operator
$$\hat G = \hat S \hat O. \tag{20.15}$$
The effect of the first Grover iteration, therefore, is to take the initial state $|\psi_0\rangle$ and rotate it anti-clockwise by $2\theta_0$. We will call the resulting state $|\psi_1\rangle$. It is at an angle $\theta_1$ to the $|a_\perp\rangle$ axis, where
$$\theta_1 = \theta_0 + 2\theta_0, \tag{20.16}$$
see Fig. 20.4.
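All the amplitudes here are real, so one Grover iteration can be checked numerically in the 2-dimensional plane of Fig. 20.4, with $|a_\perp\rangle$ as the first basis vector and $|a\rangle$ as the second. The NumPy sketch below builds $\hat O$ and $\hat S$ as $2 \times 2$ matrices and confirms that $\theta_1 = 3\theta_0$. It also performs the check on $\hat S$ suggested after Eq. (20.14). It is our own illustration, and $N = 16$ is chosen arbitrarily.

```python
import numpy as np


def first_grover_iteration(N):
    """Apply G = S O once to |psi_0> in the plane with basis (|a_perp>, |a>).

    Returns theta_0, the angle theta_1 of the new state, the matrix S and |psi_0>."""
    theta0 = np.arcsin(1 / np.sqrt(N))                 # Eq. (20.12)
    psi0 = np.array([np.cos(theta0), np.sin(theta0)])  # Eq. (20.13)
    O = np.diag([1.0, -1.0])                           # flips the |a> component
    S = 2 * np.outer(psi0, psi0) - np.eye(2)           # Eq. (20.14)
    psi1 = S @ O @ psi0                                # Eq. (20.15)
    theta1 = np.arctan2(psi1[1], psi1[0])
    return theta0, theta1, S, psi0


if __name__ == "__main__":
    theta0, theta1, S, psi0 = first_grover_iteration(16)
    print(theta1 / theta0)  # close to 3, Eq. (20.16)
    phi = np.array([0.3, -0.7])         # an arbitrary real vector
    mu = np.array([-psi0[1], psi0[0]])  # unit vector orthogonal to |psi_0>
    print(np.isclose(psi0 @ (S @ phi), psi0 @ phi),
          np.isclose(mu @ (S @ phi), -(mu @ phi)))
```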


20.4 Subsequent iterations

Subsequent Grover iterations perform the same two steps: $\hat O$, which reflects about $|a_\perp\rangle$, followed by $\hat S$, which reflects about $|\psi_0\rangle$. The overall circuit implementing the Grover algorithm is shown in Fig. 20.5.

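[Circuit diagram: the $n$ input qubits, initially $|0\rangle$, pass through Hadamards and then $O(\sqrt{N})$ applications of $\hat G$ before being measured to give $a$; the output qubit, initially $|1\rangle$, passes through a Hadamard.]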

Figure 20.5: Circuit implementing the Grover algorithm. $\hat G$ is the Grover operator, given by $\hat G = \hat S\hat O$, where $\hat O$ and $\hat S$ are given by Eqs. (20.8) and (20.14) respectively. It acts only on the $n$ input qubits (the upper line). The output qubit (the lower line) remains unchanged by $\hat G$. After $O(\sqrt{N})$ iterations of the Grover operator, the result of a measurement on the input qubits is the special value $a$ with high probability.

If $m$ iterations have already been done, so the current state is $|\psi_m\rangle$, Fig. 20.6 shows the effect of doing an additional iteration. The state $|\psi_m\rangle$ makes an angle $\theta_m$ with the $|a_\perp\rangle$ axis, so $\hat O$ rotates the direction by $2\theta_m$ clockwise, while $\hat S$ rotates it by $2(\theta_m + \theta_0)$ anti-clockwise. The net result is a rotation by $2\theta_0$ (independent of $\theta_m$) anti-clockwise, which is towards the desired direction, $|a\rangle$, i.e.
$$\theta_{m+1} = \theta_m + 2\theta_0, \tag{20.17}$$
which gives
$$\theta_m = (2m + 1)\,\theta_0. \tag{20.18}$$
The relationship between $|\psi_m\rangle$, $|a\rangle$ and $|a_\perp\rangle$ is
$$|\psi_m\rangle = \cos\theta_m\, |a_\perp\rangle + \sin\theta_m\, |a\rangle. \tag{20.19}$$
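Equation (20.19) can be tested against a direct simulation of the full $N$-dimensional state vector, without using the 2-dimensional picture at all. In the NumPy sketch below, $\hat O$ flips the sign of the $|a\rangle$ amplitude and $\hat S$ is applied in the form of Eq. (20.14). The printed amplitudes agree with $\sin[(2m + 1)\theta_0]$. The sketch is our own, and $n = 6$ and $a = 37$ are arbitrary choices.

```python
import numpy as np


def grover_amplitudes(n: int, a: int, iterations: int) -> list[float]:
    """Amplitude <a|psi_m> for m = 0..iterations, from a state-vector simulation."""
    N = 2**n
    psi0 = np.full(N, 1 / np.sqrt(N))  # equal superposition of all basis states
    state = psi0.copy()
    amplitudes = [state[a]]
    for _ in range(iterations):
        state[a] = -state[a]                       # O: flip the sign of the |a> component
        state = 2 * psi0 * (psi0 @ state) - state  # S: reflect about |psi_0>, Eq. (20.14)
        amplitudes.append(state[a])
    return amplitudes


if __name__ == "__main__":
    n, a = 6, 37
    theta0 = np.arcsin(1 / np.sqrt(2**n))
    for m, amp in enumerate(grover_amplitudes(n, a, 8)):
        print(m, round(amp, 4), round(np.sin((2 * m + 1) * theta0), 4))
```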

According to Eq. (20.19), the amplitude for $|\psi_m\rangle$ to be measured in state $|a\rangle$, i.e. $\langle a|\psi_m\rangle$, is $\sin\theta_m = \sin[(2m + 1)\theta_0]$, the projection on to the vertical axis in Fig. 20.6. This increases as $m$ increases up to the point where $\theta_m = \pi/2$, but then decreases. One therefore takes the number of Grover iterations, $m$, to be such that $\theta_m \simeq \pi/2$. From Eqs. (20.18) and (20.12) we see that we need
$$\theta_m = (2m + 1)\,\theta_0 = (2m + 1)\sin^{-1}\frac{1}{\sqrt{N}} = \frac{\pi}{2}, \tag{20.20}$$
which, for large $N$, gives
$$m = \frac{\pi}{4}\sqrt{N}. \tag{20.21}$$
When $\theta_m \simeq \pi/2$, measuring the state gives $a$ with high probability.
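For finite $N$, Eq. (20.20) can be solved for the integer $m$ closest to $\pi/(4\theta_0) - 1/2$, rather than using the large-$N$ form (20.21). Here is a short Python sketch comparing the two and printing the resulting success probability. The function names are our own.

```python
import math


def optimal_iterations(N: int) -> int:
    """Integer m closest to solving (2m + 1) * arcsin(1/sqrt(N)) = pi/2, Eq. (20.20)."""
    theta0 = math.asin(1 / math.sqrt(N))
    return round(math.pi / (4 * theta0) - 0.5)


def success_probability(N: int, m: int) -> float:
    """Probability that a measurement after m iterations gives a: sin^2[(2m + 1) theta0]."""
    theta0 = math.asin(1 / math.sqrt(N))
    return math.sin((2 * m + 1) * theta0) ** 2


if __name__ == "__main__":
    for n in (4, 10, 20):
        N = 2**n
        m = optimal_iterations(N)
        print(f"N={N}: m={m}, (pi/4)sqrt(N)={math.pi / 4 * math.sqrt(N):.1f}, "
              f"P(success)={success_probability(N, m):.6f}")
```

Even for $N = 16$ the optimal $m$ (namely 3) already gives a success probability of about 0.96. For $N = 2^{20}$ both estimates give about 804 iterations.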
[Diagram: $|\psi_m\rangle$, $\hat O|\psi_m\rangle$ and $|\psi_{m+1}\rangle = \hat S\hat O|\psi_m\rangle$ in the plane of $|a_\perp\rangle$ and $|a\rangle$.]

Figure 20.6: After the $m$-th iteration of the Grover algorithm, the state $|\psi_0\rangle$ has been rotated to $|\psi_m\rangle$, which makes an angle $\theta_m$ with the $|a_\perp\rangle$ axis. At the next iteration of the Grover algorithm, firstly the action of $\hat O$ reflects $|\psi_m\rangle$ about the $|a_\perp\rangle$ axis as shown. This is equivalent to a clockwise rotation by $2\theta_m$, so $\hat O|\psi_m\rangle$ is at an angle $\theta_m$ below the $|a_\perp\rangle$ axis. Secondly, the state $\hat O|\psi_m\rangle$ is acted on by $\hat S$, which reflects about the direction of $|\psi_0\rangle$. This is equivalent to an anti-clockwise rotation by $2(\theta_m + \theta_0)$. The net effect of the two operations is to rotate $|\psi_m\rangle$ by an angle $2\theta_0$ in an anti-clockwise direction. Hence the new state $|\psi_{m+1}\rangle$ is at an angle $\theta_{m+1} = \theta_m + 2\theta_0$ to the $|a_\perp\rangle$ axis. The amplitude for the state $|\psi_m\rangle$ to be $|a\rangle$ is the projection on to the vertical axis, which increases with $m$ up to the point where $\theta_m = \pi/2$.

We do not have to get the number of iterations precisely right. After $m$ iterations, the probability that a measurement gives $a$ is $\sin^2\theta_m = \sin^2[(2m + 1)\theta_0]$. Any value of $\theta_m$ in the range
$$\frac{\pi}{4} < \theta_m < \frac{3\pi}{4} \tag{20.22}$$
will determine $a$ correctly with a probability greater than 1/2. For large $N$ this corresponds to
$$\frac{\pi}{8}\sqrt{N} < m < \frac{3\pi}{8}\sqrt{N}. \tag{20.23}$$

Note that the probability decreases for $m > (\pi/4)\sqrt{N}$, unlike many algorithms where increasing the
number of iterations progressively improves the probability of success.
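Both the window of Eq. (20.23) and the fall-off beyond $m \simeq (\pi/4)\sqrt{N}$ can be checked by tabulating $\sin^2[(2m+1)\theta_0]$. The sketch below uses NumPy and an illustrative $N = 2^{16}$; it reports the smallest and largest $m$ with success probability above 1/2, the best $m$, and the probability after roughly twice the optimal number of iterations.

```python
import numpy as np

N = 2 ** 16
theta0 = np.arcsin(1 / np.sqrt(N))
m = np.arange(int(np.pi / 2 * np.sqrt(N)))      # from 0 to roughly twice the optimum
prob = np.sin((2 * m + 1) * theta0) ** 2        # success probability after m iterations

good = m[prob > 0.5]                            # iteration counts that succeed with p > 1/2
print(good.min(), np.pi / 8 * np.sqrt(N))       # lower edge vs Eq. (20.23)
print(good.max(), 3 * np.pi / 8 * np.sqrt(N))   # upper edge vs Eq. (20.23)
print(m[np.argmax(prob)], np.pi / 4 * np.sqrt(N))   # best m vs Eq. (20.21)
print(prob[-1])                                 # small again: too many iterations overshoot
```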
The operation count of the Grover algorithm is $O(\sqrt{N})$, which is a quadratic speedup compared
with the $O(N)$ count on a classical computer. The quantum speedup comes, of course, from quantum
parallelism; all $N = 2^n$ values of $f(x)$ are evaluated in parallel, so naively it looks as though we should
be able to get a speedup by a factor of $N$, i.e. an operation count of $O(1)$. However, if one measured
directly after computing the function, one would just get one value of $x$ and the corresponding $f(x)$,
which is no better than on a classical computer. It requires additional operations, in the form of
the Grover operator $\hat G$ applied iteratively, to extract a speedup, which in this case only reduces the
operation count to $O(\sqrt{N})$, not $O(1)$. One can show that the $O(\sqrt{N})$ operation count of the Grover
algorithm is optimal. An operation count of $O(1)$ is proved to be impossible.

20.5 Extensions
20.5.1 More than one special value
In the standard implementation of the Grover algorithm it is assumed that there is only one special
value. If there are $M$ solutions, $a_i$, $i = 1, \ldots, M$, then, proceeding along the lines of the derivation for
one solution, one finds [NC00, Mer07, Vat16, RP14]:

(a) The states generated by the Grover algorithm can be written as a linear combination of a uniform
superposition of all the special states,

$$|a\rangle = \frac{1}{\sqrt{M}} \sum_{x \in \{a_i\}} |x\rangle, \qquad (20.24)$$

and a uniform superposition of all the other states,

$$|a^\perp\rangle = \frac{1}{\sqrt{N-M}} \sum_{x \notin \{a_i\}} |x\rangle. \qquad (20.25)$$

We see that $|a\rangle$ and $|a^\perp\rangle$ are normalized.

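(b) The initial uniform superposition of all $N$ states is

$$|\psi_0\rangle = \sqrt{\frac{N-M}{N}}\,|a^\perp\rangle + \sqrt{\frac{M}{N}}\,|a\rangle, \qquad (20.26)$$

so the angle $\theta_0$ which $|\psi_0\rangle$ makes with the $|a^\perp\rangle$ axis is given by

$$\sin\theta_0 = \sqrt{\frac{M}{N}} \qquad (20.27)$$

rather than Eq. (20.12). Consequently we can write Eq. (20.26) in terms of $\theta_0$ in the same way
as for $M = 1$, namely Eq. (20.13).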

(c) Subsequent iterations rotate the direction of the state by an angle $2\theta_0$ towards the $|a\rangle$ axis and
so, after $m$ iterations, the angle $\theta_m$ is given by Eq. (20.18), and the state $|\psi_m\rangle$ is given by
Eq. (20.19). Hence the effect of each Grover iteration, when expressed in terms of $\theta_0$, is the same
as for $M = 1$, and the only difference compared with $M = 1$ is that $\theta_0$ is given by Eq. (20.27)
rather than (20.12).

(d) Assuming $M \ll N$, then $\theta_m$ is approximately $\pi/2$ when the number of iterations $m$ is given by

$$m = \frac{\pi}{4}\sqrt{\frac{N}{M}}. \qquad (20.28)$$

After this number of iterations of the Grover operator, with high probability a measurement of
the state will give one of the special values $a_i$ with equal likelihood.
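The $M > 1$ case can be checked in the same state-vector style. This sketch assumes that $\hat O$ now flips the sign of every special value $a_i$ and that $\hat S$ is unchanged; the value of $N$ and the marked set are illustrative. With $m$ taken from Eq. (20.28), the total probability of measuring some $a_i$ should be close to 1 and shared equally among them.

```python
import numpy as np

def grover_multi(N, marked, m):
    """Probabilities of each basis state after m Grover iterations with several marked items."""
    psi0 = np.full(N, 1 / np.sqrt(N))
    psi = psi0.copy()
    idx = np.asarray(marked)
    for _ in range(m):
        psi[idx] = -psi[idx]                    # O: sign flip on every special value a_i
        psi = 2 * psi0 * (psi0 @ psi) - psi     # S: reflect about |psi_0>
    return psi ** 2                             # amplitudes stay real here

N, marked = 4096, [5, 700, 1234, 4000]
M = len(marked)
m = int(round(np.pi / 4 * np.sqrt(N / M)))      # Eq. (20.28)
p = grover_multi(N, marked, m)
print(m, p[marked].sum(), p[marked])            # total near 1, split evenly over the a_i
```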

The student is advised to check these steps.


20.5.2 Quantum Counting

The results of the previous subsection are only useful if we know in advance how many special values,
$M$, there are. If we have no prior knowledge of $M$, how can we determine it? We saw that the Grover
operator $\hat G$ rotates vectors in the plane of $|a\rangle$ and $|a^\perp\rangle$ by an angle $2\theta_0$, where $\theta_0$ is given by Eq. (20.27) and
so depends on $M$. In other words, in the space of $|a\rangle$ and $|a^\perp\rangle$, the Grover operator has the standard
form of a rotation matrix

$$\hat G = \begin{pmatrix} \cos 2\theta_0 & -\sin 2\theta_0 \\ \sin 2\theta_0 & \cos 2\theta_0 \end{pmatrix}. \qquad (20.29)$$

The eigenvalues of $\hat G$ are easily found to be $\exp(\pm 2i\theta_0)$. (It is a general property of unitary matrices
that their eigenvalues are pure phases.) We already showed in Sec. 16.4 of Chapter 16 that the
phase of the eigenvalue of a unitary matrix can be determined from the phase estimation algorithm
using Shor's quantum Fourier transform.
Consequently, we can determine θ0 (and hence M ), and also get one of the special values ai , by
combining the Quantum Fourier Transform with Grover’s algorithm. In fact this “quantum counting”
algorithm will even tell us whether or not a special value exists at all, i.e. whether or not M = 0. The
interested student can find more details in advanced texts such as Refs. [NC00, RP14].
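The algebra behind quantum counting can be illustrated classically: build the rotation matrix of Eq. (20.29), compute its eigenvalues, and recover $M$ from the eigenphase $2\theta_0$. This sketch assumes $\sin\theta_0 = \sqrt{M/N}$, the $M > 1$ analogue of Eq. (20.12), and uses NumPy's eigenvalue routine in place of phase estimation, so it is not a simulation of the quantum counting circuit itself. The function name is hypothetical.

```python
import numpy as np

def count_from_rotation(N, M):
    """Recover M from the eigenphases exp(±2i theta0) of the 2x2 Grover rotation, Eq. (20.29)."""
    theta0 = np.arcsin(np.sqrt(M / N))
    G = np.array([[np.cos(2 * theta0), -np.sin(2 * theta0)],
                  [np.sin(2 * theta0),  np.cos(2 * theta0)]])
    eigvals = np.linalg.eigvals(G)
    assert np.allclose(np.abs(eigvals), 1)      # unitary: eigenvalues are pure phases
    phase = np.abs(np.angle(eigvals[0]))        # = 2 * theta0, taken in [0, pi]
    return N * np.sin(phase / 2) ** 2           # invert sin(theta0) = sqrt(M / N)

for M in (0, 1, 3, 17):
    print(M, count_from_rotation(1024, M))
```

In the quantum algorithm the phase is only known to the precision set by the number of counting qubits, so the recovered $M$ would be an estimate rather than the exact value printed here.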

Problems
20.1. Consider the Grover algorithm in which you have to find one marked state out of N = 4 states.
Show that the algorithm succeeds with probability 1 after 1 iteration.

20.2. You have to find one marked state out of N = 2 states. Classically, picking one state at random
has a probability of 1/2 to succeed. Show that the Grover algorithm does not improve these
odds.

20.3. Assume that there are M marked states out of N . Fill in the details of the derivation, sketched
in Sec. 20.5.1, of the required number of Grover iterations. (Assume that N is large.)
Chapter 21

Quantum Protocols Using Photons

There are several problems of interest where qubits can be considered one at a time, without needing
any qubit-qubit interactions. Photons are ideal qubits for this because their interactions with each
other are immeasurably weak, and they can be propagated down optical fibres over large distances
with little attenuation while preserving their polarization. You will recall from Sec. 1.4 that it is the
polarization of the photon which characterizes the qubit, e.g.:

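$$\begin{aligned}
|0\rangle &\equiv |\leftrightarrow\rangle, && \text{(left-right)}\\
|1\rangle &\equiv |\updownarrow\rangle, && \text{(up-down)}\\
|+\rangle &= H|0\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + |1\rangle\right) \equiv |\diagup\rangle, && \text{(one of the diagonals)}\\
|-\rangle &= H|1\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle - |1\rangle\right) \equiv |\diagdown\rangle, && \text{(the other diagonal).}
\end{aligned} \qquad (21.1)$$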
The connection between the polarization of photons and qubit states was described in more detail in
Sec. 4.1.
Several quantum protocols involving photons have been successfully implemented. Here we will
discuss applications to cryptography and “teleportation”, the latter being set as a homework problem
with lots of help. Some references are [NC00, Vat16, Mer07].
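The overlaps that the following protocols rely on can be tabulated directly. This NumPy sketch assumes the column-vector representation $|0\rangle = (1, 0)^T$, $|1\rangle = (0, 1)^T$ and $H = \frac{1}{\sqrt{2}}\begin{pmatrix}1 & 1\\ 1 & -1\end{pmatrix}$, builds the four states of Eq. (21.1), and prints $|\langle u|v\rangle|^2$, the probability that a photon prepared in $|v\rangle$ is found in $|u\rangle$: 1 or 0 within a basis, and 1/2 across bases.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
states = {"0 (left-right)": ket0, "1 (up-down)": ket1,
          "+ (one diagonal)": H @ ket0, "- (other diagonal)": H @ ket1}

# probs[i, j] = |<state_i|state_j>|^2 (all amplitudes here are real)
vecs = np.array(list(states.values()))
probs = (vecs @ vecs.T) ** 2
print(list(states))
print(np.round(probs, 3))
```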

21.1 Quantum Key Distribution

Cryptography is concerned with transmitting secret messages. There are two main approaches:

• Public Key
An example is the RSA scheme which we already met in Chapter 13 in the context of Shor’s
algorithm for factoring integers. Let us briefly review the basic idea. Suppose Bob wants to send
a message to Alice. Alice sends her public key down an open channel to Bob who uses this to
encrypt his message. Alice decodes the encrypted message using her private key. The private key
is not shared, only the public key. Security depends on the difficulty of decoding the message
without the private key. In the case of RSA we recall that this required factoring a large integer.

• Private key (or symmetric key). (Note: public key encryption is not symmetric between
sender and receiver.)
Alice and Bob share a private key, which has been generated and shared in advance. This must
be as long as the message and, as we shall explain later, can only be used once. But how do Alice
and Bob share the private key securely? Perhaps Alice could put it in a box and send it to Bob
by FedEx. This is not convenient, which is why internet transactions use public key encryption
instead.

We shall now see that quantum mechanics can help with securely sharing private keys, using what
is called Quantum Key Distribution (QKD).
The idea of QKD is to create a one-time codepad which Alice and Bob share. By using quantum
mechanics, Alice and Bob will be able to detect whether an eavesdropper whom, following tradition,
we shall call Eve, is trying to intercept their messages when they share the codepad.
The codepad is a shared random string of bits R, which must be at least as long as the message.
Alice encodes the message $M$ by bit-wise XOR-ing it with the random string, i.e.

$$\text{Alice}: \quad M \longrightarrow M \oplus R \;(= M'). \qquad (21.2)$$

Bob decodes the encoded message $M'$ by also XOR-ing it with $R$, i.e.

$$\text{Bob}: \quad M' \longrightarrow M' \oplus R = M. \qquad (21.3)$$

This works because M ⊕ R ⊕ R = M , as we have discussed several times before in the course.
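Eqs. (21.2) and (21.3) can be sketched on bytes in a few lines. Here Python's `secrets` module stands in for the shared random string $R$; in the QKD setting, $R$ would instead be produced by a protocol such as BB84. The helper `xor_bytes` is illustrative and, for simplicity, requires the pad to be exactly as long as the message.

```python
import secrets

def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Bit-wise XOR of two equal-length byte strings."""
    if len(x) != len(y):
        raise ValueError("pad and message must have the same length")
    return bytes(a ^ b for a, b in zip(x, y))

message = b"MEET AT NOON"
R = secrets.token_bytes(len(message))   # shared one-time pad
encrypted = xor_bytes(message, R)       # Alice: M -> M xor R = M'
decrypted = xor_bytes(encrypted, R)     # Bob:   M' -> M' xor R = M
print(decrypted == message)             # True
```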
We now explain why this codepad can only be used once securely. Suppose we send two encoded
messages using the same codepad, i.e.

M10 = M1 ⊕ R
(21.4)
M20 = M2 ⊕ R.

Anyone intercepting the message can XOR the two messages with the result

M10 ⊕ M20 = M1 ⊕ R ⊕ M2 ⊕ R = M1 ⊕ M2 , (21.5)

so the random string has dropped out. The eavesdropper can then use standard methods (e.g. letter
frequency) to decrypt. This is harder than for a single message since one has to extract both messages,
but may be feasible. Hence the great security¹ coming from using a random bit string has been lost.
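Eq. (21.5) can be seen numerically: XOR-ing two ciphertexts made with the same pad cancels $R$ exactly. In this illustrative sketch (messages chosen arbitrarily), anyone who correctly guesses one plaintext then recovers the other without ever seeing $R$.

```python
import secrets

def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Bit-wise XOR of two byte strings (truncated to the shorter one)."""
    return bytes(a ^ b for a, b in zip(x, y))

M1, M2 = b"ATTACK AT DAWN", b"RETREAT AT TEN"
R = secrets.token_bytes(len(M1))          # the same pad, wrongly used twice
C1, C2 = xor_bytes(M1, R), xor_bytes(M2, R)

leak = xor_bytes(C1, C2)                  # Eq. (21.5): R drops out
print(leak == xor_bytes(M1, M2))          # True, whatever R was
print(xor_bytes(leak, M1))                # guessing M1 reveals M2: b'RETREAT AT TEN'
```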
How do Alice and Bob know that their random bit string $R$ was not intercepted by Eve as they were
sharing it? This is where quantum mechanics comes into play.

21.1.1 BB84 protocol

We describe here the method proposed by Bennett and Brassard in 1984 (BB84). Alice sends Bob
a long string of photons. Each photon is in one of the four polarization states in Eq. (21.1). The
polarization states corresponding to qubits $|0\rangle$ and $|1\rangle$ we will call Z-basis qubits (since this is the
basis in which $Z$ is diagonal). The polarization states corresponding to $H|0\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and
$H|1\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$ we will call X-basis qubits (since this is the basis in which $X$ is diagonal). To
decide in which basis to send a photon, Alice generates a random integer taking values 0 and 1. If
she gets 0 she sends a Z-basis photon, and if she gets 1 she sends an X-basis photon. Within each
basis-type there are two states, which Alice chooses by generating a second random integer, again
taking values 0 and 1. If she gets 0 she sends $|0\rangle$ if the Z-basis was chosen and $H|0\rangle$ if the X-basis
was chosen. If she gets 1 for the second random number, she sends $|1\rangle$ or $H|1\rangle$, depending on whether
the Z-basis or X-basis was chosen. An example of a set of photons sent to Bob is

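$$\begin{array}{lcccccccccc}
\text{basis} & Z & X & X & X & Z & Z & X & Z & X & \cdots\\
\text{state} & 0 & 1 & 0 & 1 & 1 & 0 & 1 & 0 & 0 & \cdots
\end{array} \qquad (21.6)$$

¹ If the bit string is truly random, it is impossible to decrypt the message without knowing the string.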

Bob receives these qubits and decides randomly whether to measure in the Z-basis or the X-basis.
Note that the photons are individually identifiable by the sequence in which they arrive.
If the basis in which Alice sends a photon (Z or X) is the same as that in which Bob measures it,
then the state which Bob measures, 0 or 1, must be the same as the state that Alice sent. However,
if the bases for sending and measuring are different, then Bob will only find the same state as Alice
about half the time. Alice tells Bob over an insecure channel which photons were in the Z-basis and
which in the X-basis, but not the state. Bob then tells Alice, over an insecure channel, for which of
the photons he measured in the same basis as she sent them in. They keep these and discard the others
(about 1/2 on average).
The one-time codepad is the set of random bits corresponding to the state of the qubits for which
Alice and Bob used the same basis. Note that this information was not sent down the insecure
channel; only the basis was sent. Recall that if Alice and Bob use the same basis they must get the
same state.
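Let's complete the above example with a possible set of measurements that Bob made.

$$\begin{array}{llcccccccccc}
\text{Alice} & \text{basis} & Z & \boxed{X} & \boxed{X} & X & \boxed{Z} & Z & X & \boxed{Z} & X & \cdots\\
 & \text{state} & 0^{*} & \boxed{1} & \boxed{0} & 1 & \boxed{1} & 0^{*} & 1 & \boxed{0} & 0 & \cdots\\
\text{Bob} & \text{basis} & X & \boxed{X} & \boxed{X} & Z & \boxed{Z} & X & Z & \boxed{Z} & Z & \cdots\\
 & \text{state} & 1^{*} & \boxed{1} & \boxed{0} & 1 & \boxed{1} & 1^{*} & 1 & \boxed{0} & 0 & \cdots
\end{array} \qquad (21.7)$$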

For the photons where Alice’s and Bob’s bases agree, the information is boxed. For these photons,
the state that Alice generated and that which Bob measured agree. For the other photons, the states
agree only half the time on average. The cases where the states disagree are starred (in this example,
the states differ for 2 out of the 5 cases where the bases differ).
The codepad which Alice and Bob have shared is the set of states for which their bases agree, i.e.

$$R = 1010\cdots. \qquad (21.8)$$
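The sifting step is easy to express in code. This sketch (the function name `sift` is illustrative) takes the nine photons of Eq. (21.7) as strings, keeps the positions where the bases agree, and reproduces the key $R = 1010$ of Eq. (21.8) for both Alice and Bob.

```python
def sift(alice_bases, alice_bits, bob_bases, bob_bits):
    """Keep only the positions where Alice's and Bob's bases agree (BB84 sifting)."""
    key_alice, key_bob = [], []
    for ab, abit, bb, bbit in zip(alice_bases, alice_bits, bob_bases, bob_bits):
        if ab == bb:                     # only the bases are ever compared in public
            key_alice.append(abit)
            key_bob.append(bbit)
    return "".join(key_alice), "".join(key_bob)

# The example of Eq. (21.7)
alice_bases, alice_bits = "ZXXXZZXZX", "010110100"
bob_bases,   bob_bits   = "XXXZZXZZZ", "110111100"
print(sift(alice_bases, alice_bits, bob_bases, bob_bits))   # ('1010', '1010')
```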

How can Alice and Bob know if Eve is intercepting the photons? Consider the "good" photons, those
where Alice and Bob used the same basis. If Eve is not intercepting them, then Alice and Bob agree
on the state with 100% probability. However, if Eve measures the photons and sends them on to Bob,
then Alice and Bob will have different states some of the time, as we now show.
Like Alice and Bob, Eve will have to choose a random basis for each photon. There is probability
1/2 that she will choose the same basis as the common basis of Alice and Bob, and probability 1/2
that she will choose a different basis. If she chooses the same basis, then the state of the qubit which
she measures and sends on to Bob will be the same as the one Alice sent. Hence, for these photons,
Eve's interception cannot be detected. However, from

$$\begin{aligned}
H|0\rangle &= \frac{1}{\sqrt{2}}\left(|0\rangle + |1\rangle\right), & H|1\rangle &= \frac{1}{\sqrt{2}}\left(|0\rangle - |1\rangle\right),\\
|0\rangle &= \frac{1}{\sqrt{2}}\left(H|0\rangle + H|1\rangle\right), & |1\rangle &= \frac{1}{\sqrt{2}}\left(H|0\rangle - H|1\rangle\right),
\end{aligned} \qquad (21.9)$$

we see that, out of the times when Eve chooses a different basis from the common basis of Alice and Bob,
there is a probability 1/2 that Eve's intervention will result in her sending on to Bob a photon in the
opposite state from the one which Alice sent. Hence, for the photons where Alice and Bob used the
same basis, Eve's intervention results in Alice and Bob having different states about 1/4 of the time².

² There is a probability 1/2 that Eve measures in a different basis, and for those qubits there is a probability 1/2 that
her measurement changes the state.

To see if this is happening, Alice and Bob sacrifice some fraction of the good photons by sending
their values for the state down an insecure channel. If about 1/4 of the states disagree, then they know
that the photons are being intercepted. If only a small fraction disagree, Alice and Bob would have
needed to decide beforehand up to what fraction of disagreements they would consider an acceptable
risk in order to still send the message.
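A Monte Carlo sketch of this argument, assuming the simplest intercept-and-resend attack: Eve measures every photon in a random basis and forwards the state she found. Measurement is modeled only through its outcome statistics (same basis: the bit is reproduced; different basis: a fair coin), which is the content of Eq. (21.9). Without Eve the error rate on the sifted key is exactly 0; with her it should come out close to 1/4. The function names are hypothetical.

```python
import random

def measure(basis_sent, bit_sent, basis_meas, rng):
    """Outcome of measuring a BB84 photon: deterministic in the same basis, 50/50 otherwise."""
    return bit_sent if basis_sent == basis_meas else rng.randint(0, 1)

def sifted_error_rate(n_photons, eve_present, seed=0):
    rng = random.Random(seed)
    errors = kept = 0
    for _ in range(n_photons):
        a_basis, a_bit = rng.choice("ZX"), rng.randint(0, 1)
        basis, bit = a_basis, a_bit                    # state travelling down the fibre
        if eve_present:
            e_basis = rng.choice("ZX")
            bit = measure(basis, bit, e_basis, rng)    # Eve measures ...
            basis = e_basis                            # ... and resends what she found
        b_basis = rng.choice("ZX")
        b_bit = measure(basis, bit, b_basis, rng)
        if a_basis == b_basis:                         # sifting: keep matching bases only
            kept += 1
            errors += (a_bit != b_bit)
    return errors / kept

print(sifted_error_rate(100_000, eve_present=False))   # 0.0
print(sifted_error_rate(100_000, eve_present=True))    # close to 0.25
```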
In summary, a quantum key distribution protocol is able to detect an eavesdropper because
measurements in quantum mechanics in general change the state.

21.1.2 BB92 protocol

There is a later version, proposed by Bennett in 1992 (BB92), in which only two
polarizations are used: $\leftrightarrow$ and $\diagup$. Note that these states are not orthogonal. Lack of orthogonality
is essential for the method to work. If only orthogonal states are used then there is only one basis, so
if Eve knows what this is she can measure the states of the photons in this basis and send them on to
Bob without being detected.
The BB92 protocol works as follows. To decide in which state to send the $i$-th photon, Alice
generates a random bit, $k_i$, which is 0 or 1. If she gets 0 she sends $|\leftrightarrow\rangle \equiv |0\rangle$, a Z-type photon, whereas
if she gets 1 she sends $|\diagup\rangle \equiv H|0\rangle$, an X-type photon.

If Bob were to always measure in the same basis as the one Alice used, i.e. the Z basis for Z-type
photons and the X basis for X-type photons, he would always get 0. However, he doesn't know which
basis Alice used, so, for each photon, he chooses a random basis by generating a random bit $l_i$. As
Alice also did, Bob chooses the Z-basis if $l_i = 0$ and the X-basis if $l_i = 1$. He notes the
photons for which he measures 1 and sends this information to Alice on a public channel. This only happens
when they use different bases, i.e. when they generate complementary random bits, $k_i = 1 - l_i$, since if they
use the same basis Bob must get 0. The shared key is then the set $\{l_i\}$ for which Bob measures 1.
Alice just has to take the complement of her bits for the same photons to get the same key as Bob.
Note that Bob measures 1 either if Alice chooses a Z-basis and Bob an X-basis, or vice versa, but the
information as to which one is not transmitted down the public channel.
If an eavesdropper intercepts the qubits to try to determine this information, the result is similar
to that for the BB84 protocol. Alice and Bob could check, via a public channel, some of the bits of the
key. If the qubits are being intercepted, Alice and Bob would find that for about 1/4 of them, they
actually used the same basis.
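A corresponding sketch of the BB92 bookkeeping, without an eavesdropper and using the same outcome-statistics model of measurement: Alice's bit $k_i$ selects $|0\rangle$ or $H|0\rangle$, Bob's bit $l_i$ selects the Z or X basis, and only outcomes 1 are announced. About 1/4 of the photons should contribute to the key, and Alice's complemented bits should match Bob's $l_i$ exactly. The function name is hypothetical.

```python
import random

def bb92_keys(n_photons, seed=1):
    rng = random.Random(seed)
    key_alice, key_bob = [], []
    for _ in range(n_photons):
        k = rng.randint(0, 1)      # Alice: 0 -> |0> (Z-type), 1 -> H|0> (X-type)
        l = rng.randint(0, 1)      # Bob:   0 -> measure in Z basis, 1 -> measure in X basis
        # Alice always sends the "0" state of her basis: same basis gives 0, otherwise 50/50
        outcome = 0 if k == l else rng.randint(0, 1)
        if outcome == 1:           # Bob announces only which photons gave 1
            key_bob.append(l)
            key_alice.append(1 - k)
    return key_alice, key_bob

ka, kb = bb92_keys(100_000)
print(len(kb) / 100_000, ka == kb)     # about 0.25, True
```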

Problems
21.1. BB84 Quantum Key Distribution
Consider the BB84 Quantum Key Distribution (QKD) protocol discussed in this chapter. Assume
that Eve intercepts every qubit (photon) that Alice sends, and then transmits it to
Bob. Like Alice and Bob, Eve chooses one of the bases (the Z or the X basis) at random. Alice and
Bob compare, over a public channel which can be intercepted by Eve, which qubits they used
the same basis for (Alice for sending and Bob for measuring). The values of these qubits (0 or
1) (which Alice and Bob agree on if Eve did not eavesdrop) form the shared key.

(i) For what fraction of the shared key qubits would Alice and Bob get different results
due to Eve's interception? (If Eve had not intercepted the qubits, then Alice and Bob
would agree for all qubits in the shared key.)
(ii) Supposing that the shared key has 10 qubits, what is the probability that all of Alice's and
Bob's qubits would agree (in which case Eve's eavesdropping would not be detected)?

(iii) What is the probability that all qubits would agree if the shared key has 100 qubits?

21.2. Teleportation
Suppose that Alice has a qubit in a state

$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle. \qquad (21.10)$$

The values of α and β are unknown to her and can not be determined as discussed in class. The
no-cloning theorem means that we can’t do repeated measurements on copies of this state. This
qubit may be the result of a (possibly complicated) quantum computation which Alice would like
to send on to Bob to continue the computation. Bob is far away and Alice can not physically
transport the qubit to Bob but wants to send the state.
Now Alice and Bob:

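• share an entangled pair of qubits

$$|\beta_{00}\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle_a|0\rangle_b + |1\rangle_a|1\rangle_b\right), \qquad (21.11)$$

where $a$ stands for Alice's qubit and $b$ stands for Bob's, and
• can communicate over a classical channel (e.g. a phone).

Hence, together they have a 3-qubit state,

$$\begin{aligned}
|\phi_0\rangle &= \frac{1}{\sqrt{2}}\left(\alpha|0\rangle_a + \beta|1\rangle_a\right) \otimes \left(|0\rangle_a|0\rangle_b + |1\rangle_a|1\rangle_b\right) \qquad (21.12)\\
&= \frac{1}{\sqrt{2}}\left(\alpha|000\rangle + \alpha|011\rangle + \beta|100\rangle + \beta|111\rangle\right), \qquad (21.13)
\end{aligned}$$

where the leftmost two qubits refer to Alice and the rightmost qubit to Bob.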
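Alice now applies a Bell measurement (discussed in class) to the two qubits in her possession;
see the circuit below.

[Circuit: the top line carries Alice's qubit $|\psi\rangle$, the middle line Alice's half of $|\beta_{00}\rangle$, and the bottom line Bob's half. A Hadamard gate $H$ acts on the top line, and Alice's two qubits are then measured, giving the results $x$ (top) and $y$ (middle). The states at successive stages are labeled $|\phi_0\rangle$, $|\phi_1\rangle$ and $|\phi_2\rangle$.]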

(i) Determine the states $|\phi_1\rangle$ and $|\phi_2\rangle$ shown in the figure.
(ii) Alice then measures the two qubits in her possession, obtaining results $x$ and $y$ as shown.
She then calls up Bob and tells him the result of her measurements.
Explain what Bob needs to do, depending on the results of Alice's measurements, for his
qubit to be in state

$$|\psi\rangle = \alpha|0\rangle_b + \beta|1\rangle_b, \qquad (21.14)$$

i.e. the state that was originally in Alice's possession.

Note:

• The state, but not the physical qubit, has been transported. This is called teleportation.
• This procedure doesn’t violate relativity (information can not be transmitted faster than
the speed of light) since classical communication between Alice and Bob is required.
• It does not violate the no-cloning theorem because, at the end, Alice doesn’t have her
original state |ψi, only two classical bits x and y. There is never more than one copy of |ψi
in existence.

Final Comment:
There are claims that teleportation has been verified experimentally which I will now discuss
briefly. One would like to show the following:

• Alice stores state |ψi.

• The state |ψi is transported to Bob who is far away.
• Bob stores state |ψi.

To transport qubits over a long distance one needs photons. One can teleport photons over a
large distance while retaining their polarization, but at present one can not store them in a way
which preserves their polarization. One can store other types of qubits, e.g. trapped ions, but
can’t entangle them over large distances, so they can be teleported only locally. Hence, in my
view, a complete demonstration of teleportation, incorporating all three bullet points above, has
not yet been achieved.
Chapter 22

Epilogue: Quantum Simulators

We are currently in the middle of what is called the “second quantum revolution”. The first quantum
revolution was the development of quantum mechanics in the 1920s and subsequent applications to
devices like integrated circuits, which use quantum mechanics in the design of the hardware, but these
applications treat the information, i.e. the bits, classically. However, in the second quantum revolution,
the information itself is treated according to the rules of quantum mechanics.
In this book we have discussed what is called the circuit model (or gate model) of a quantum
computer. The qubits are initialized, and then acted on by a series of discrete unitary transformations
to solve the problem at hand. This sort of quantum device is what people normally refer to when they
talk about a “quantum computer”. The circuit model quantum computer was proposed initially by
David Deutsch [Deu85].
However, other types of quantum device are being developed as part of the second quantum revolution,
which can be termed “quantum simulators”. It is anticipated that we will have interesting new
results from quantum simulators, i.e. results which could not be obtained by a classical computer, in
the next few years. By contrast, the ability to get interesting new results from a circuit model quantum
computer, for example by decoding information sent down the internet using Shor’s algorithm (which
requires factoring a huge integer) will be very far in the future, if ever.
The idea of a quantum simulator is to use an artificial quantum device to simulate the quantum
system which we want to understand. It was first proposed by Feynman [Fey82]. For example, a quantum
chemist might want to understand the properties of a certain molecule, or a condensed matter physicist
might want to understand a material with unusual magnetic or superconducting behavior. Properties
of these materials are, of course, determined by quantum mechanics. Many problems in nature are not
amenable to analytic (i.e. pencil and paper) calculations and need to be simulated. Although many
problems can be simulated efficiently on a classical computer, there remain problems of interest where
the quantum aspects cause serious difficulties for classical simulations. As an example, we learn in
quantum mechanics classes that particles of a certain type (e.g. electrons or protons or π mesons) are
(i) all identical and (ii) fall into one of two classes, bosons or fermions. For bosons, the state of the
system (wave function) does not change if two identical particles are interchanged, whereas for fermions
the state changes sign under particle interchange. This sign change for fermions can create great
difficulty when trying to simulate fermions on a classical computer.
By and large quantum simulators are analog devices. The reason is that, in order for the system of
qubits to model the problem of interest, there must be interactions between the qubits. In a classical
(digital) computer they would be represented by floating point numbers with typically 16 digits of
precision¹. In a quantum computer, however, interactions are induced by turning some “knob” on the
experimental apparatus, the nature of the knob depending on the hardware used for the qubits. For
example, in the case of superconducting qubits, interactions would be determined by the value of a
magnetic field threading superconducting loops. The magnetic field takes a continuous range of values
(i.e. is analog) and can only be set within a certain level of precision.

¹ For many purposes this can be considered exact but, in any case, the interactions are represented by a precisely known
string of bits.
Above I stated that we will probably have interesting new results from a quantum simulator before
we have new results from a (circuit model) quantum computer. Why is this? A quantum computer
uses quantum parallelism to get its quantum speedup. This depends on accurately preserving phase
relations between the different pieces of the state. These phase relations are destroyed by noise, an effect
called decoherence. Present-day qubits are quite noisy. In principle one can include error correction,
but this requires a huge number of physical qubits for each logical qubit. Thus, in the near future, we
will have to live with noisy qubits. However, as stated, noise is a disaster for circuit model quantum
computers.
Is a modest amount of noise as big a disaster for a quantum simulator? The answer is “probably
not”. For example, suppose we want to simulate the temperature dependence of the behavior of
a material which goes superconducting. A non-zero temperature means that there is noise due to
thermal fluctuations. One might hope that a bit of extra (even non-thermal) noise from the qubits
would not change the results all that much, so the results would, nonetheless, be useful. The next
paragraph discusses another example for which there is also reason to believe that some noise is not
disastrous.
A particular type of quantum simulator is one used to solve “optimization” problems, where we
need to find the maximum (or minimum) of some “objective function” with constraints. Let’s assume
for concreteness that we want the minimum. Optimization problems are very important in science and
engineering, two widely used applications being speech recognition and image recognition. Optimization
problems are hard when there is “frustration”, i.e. competition, between different pieces of the
function that one has to minimize. In these cases, if one locally minimizes individual pieces, one will
end up in a “local minimum” rather than the global minimum. It has been proposed to use “quantum
annealing”² to try to find the global minimum. We recall from Chapter 3 that if we have two operators
which don’t commute then one or both of them must have an uncertainty in any quantum state. Thus
non-commuting operators generate fluctuations. By making the (classical) objective function become
quantum by adding a non-commuting “driver” piece to it, one induces fluctuations, which can help get
one out of a local minimum. In such a “quantum annealer” the qubits simulate the “objective function
plus driver function”. By letting the driver piece tend to zero at the end of the simulation, the model
simulated at the end is just the objective function, and we anticipate that the set of qubits will then be
close to the ground state. Quantum annealing has been pioneered by a company called D-Wave, which
has manufactured machines with around 5000 qubits. These 5000 qubits do not maintain coherence
during the time of the simulation, but it is anticipated that, despite some noise, the induced quantum
fluctuations will help to find the ground state.
To summarize, in the near future qubits will be noisy and we won’t be capable of assembling a
huge number of them together. Hence, in the short and intermediate term, we will only have “Noisy
Intermediate-Scale Quantum” (NISQ) devices. I expect that in the next few years we will be able to get
interesting, new³ results from NISQ simulators, but probably not from NISQ circuit model quantum
computers.

² It was earlier proposed to add thermal fluctuations to solve optimization problems. This approach is called “thermal
annealing” or “simulated annealing”. Whether quantum annealing is more efficient in finding ground states than classical
algorithms such as simulated annealing is hotly debated at present.

³ By “new” I mean results that would be impossible to obtain on a classical computer. By “interesting” I mean results
that scientists would like to know for their own sake, not just as an illustration of the capabilities of a quantum computer.
Bibliography

[Deu85] D. Deutsch. Quantum theory, the Church-Turing Principle and the Universal Quantum
Computer. Proc. Roy. Soc. London, 400:97, 1985.

[Fey82] R. P. Feynman. Simulating physics with computers. Int. J. Theor. Phys., 21:467, 1982.

[FLS64] R. P. Feynman, R. B. Leighton, and M. Sands. The Feynman Lectures on Physics. Addison-Wesley,
New York, 1964. Available online at [Link]

[FMMC12] A.G. Fowler, M. Mariantoni, J.M. Martinis, and A.N. Cleland. Surface codes: Towards
practical large-scale quantum computation. Phys. Rev. A, 86:032324, 2012.

[Gri05] D. J. Griffiths. Introduction to Quantum Mechanics. Addison-Wesley, Boston, 2005.

[LaP21] R. LaPierre. Introduction to Quantum Computing. Springer, Materials Research Society Series, 2021.

[Mer07] N. D. Mermin. Quantum Computer Science. Cambridge University Press, Cambridge, 2007.

[NC00] M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information. Cambridge
University Press, Cambridge, England, 2000.

[RP14] E. Rieffel and W. Polak. Quantum Computing: A Gentle Introduction. MIT Press, Massachusetts
Institute of Technology, 2014.

[Sho94] P. W. Shor. Algorithms for quantum computing: discrete logarithms and factoring. In
S. Goldwasser, editor, Proc. 35th Symp. on Foundations of Computer Science, page 124,
Los Alamitos CA, 1994. IEEE Computer Society Press.

[Sho95] P. Shor. Scheme for reducing decoherence in quantum computer memory. Phys. Rev. A,
52:2493, 1995.

[Ste96] A. M. Steane. Error correcting codes in quantum theory. Phys. Rev. Lett., 77:793, 1996.

[Vat16] R. Vathsan. Introduction to Quantum Physics and Information Processing. CRC Press,
Boca Raton, 2016.

223
224 BIBLIOGRAPHY
Index

σx , see Pauli X matrix Comparison between FFT and QFT, 143, 144
σy , see Pauli Y matrix completeness relation, 19, 24, 27, 46
σz , see Pauli Z matrix complexity
5-qubit code, 194 exponential, 108
polynomial, 108
amplitude, see probability amplitude composite systems, 28
amplitude amplification, 205 computational basis, 23, 75
analog device, 221 continued fraction, 161, 165, 167, 169
ancilla qubits, 40, 177 control qubit, 72, 76, 84, 96, 99, 100, 169, 180, 206
AND gate, 71 control-U gate, 78
angular momentum eigenstates, 42 cryptography, 215
Aspect, Alain, 67
D-Wave, 222
basis decoherence, 186, 222
change of, 10, 23 density matrix, 42, 45, 46, 68
vectors, 9 detection loophole, 68
Bell measurement, 84 Deutsch’s algorithm, 89, 105
Bell state, 41, 42, 51, 60, 69, 83, 173 Deutsch, David, 74, 89, 221
Bell’s inequality, see Bell’s theorem Deutsch-Josza Algorithm, 100
Bell’s theorem, 60, 61, 63–65, 67, 68 diagonalization of matrices, 12
Bell, John, 41, 60, 66, 67, 83 Diffie-Hellman encryption, 109
Bernstein-Vazirani algorithm, 97, 105, 106 Dirac notation, 1, 17
bit-flip code, 3-qubits, 176 direct product, see tensor product
bit-flip gate, see X matrix
bitwise addition, 105 eigenvalues and eigenvectors, 12, 13
bitwise inner product, 97 Einstein, Albert, 27, 59
black box, 89, 97, 105, 154, 206 Einstein, Podolsky, Rosen, see EPR
Bloch sphere, 39, 60 entanglement, 41, 45, 49, 78, 173, 186, 188
Bohm, David, 60 entropy (Von Neuman), 50
Bohr, Niels, 1, 27, 59 EPR, 25, 27, 42, 60, 66, 83
Born rule, 26, 59 Euclidean algorithm, 110, 112, 115, 117
extension, 110, 112
circuit identities, 94 expectation value, 28
circuit model, 74
classical gates, 71 Fast Fourier Transform, see FFT
Clauser, John, 67 fault tolerant quantum computing, 197
CNOT classical gate, 72 Feynman, Richard, 2, 221
CNOT quantum gate, 76 FFT, 119, 129, 141
codeword, 175 Fourier Transform, 119
coherence, 171, 177, 179 Fredkin gate, 73

225
226 INDEX

frustration, 222
functions of operators, 24

GCD, 109, 112, 117
generalized Born rule, 30, 68
global minimum, 222
greatest common divisor, see GCD
Grover’s search algorithm, 205
  more than one special value, 212

Hadamard matrix (gate), 14, 75, 86, 152
Hamiltonian, 33
hidden variable theories, 60, 66

inner product, 18
instantaneous transfer of information, 66
interference
  of light, 2
  quantum, 20, 89, 99, 171

linearity, 20, 74
local minimum, 222
local realism, 60
locality loophole, 67

magnetic moment, 4, 17
majority rule, 175
matrices
  anti-commuting, 14
  commuting, 11
matrix
  commutator, 11, 12, 14, 31
  determinant, 12, 14, 69
  diagonalization, 12
  Hermitian, 11–13, 33, 37, 180
  multiplication of, 11
  trace, 14, 46
  unitary, 11, 23, 33
maximally entangled state, 50, 52
measurement gates, 75
measurements, 20, 25
mixed state, 49
modular exponentiation, 152
momentum, 32

NISQ devices, 222
no-cloning theorem, 40, 74, 77, 107, 177, 219
Nobel Prize, 67
non-local theories, 66
non-orthogonal states, 54, 68
non-unitary transformation, 186, 188, 193
norm, 11, 18, 19
normalization, 9, 12, 13, 18, 19
NOT gate, 71

objective function, 222
objective reality, 59, 66
observables, 20
optimization problems, 222
OR gate, 71
oracle, see black box
orthogonality, 9
orthonormal, 9, 18
outer product, 24, 46

Pauli matrices
  X matrix, 13, 37, 75
  Y matrix, 13, 37, 75
  Z matrix, 13, 37, 75
period finding algorithm, 115, 135
phase
  global, 20
  relative, 20
phase estimation, 135, 213
phase flip code, 184
phase kickback, 91, 106
phase-flip gate, see Pauli Z matrix
phases, 20
photon, 7, 65, 215
  circular polarization, 40
  polarization, 7, 40, 65, 67, 215, 216, 218
Planck’s constant, 32, 42
position, 32
private key, 110, 215
private key cryptography, 215
probability amplitude, 6, 19, 20, 24, 26
product state, 41, 42, 45, 49, 50, 55, 56
public key, 110, 215
public key cryptography, 215

QFT, 108, 121, 130, 153
QKD, see quantum key distribution
quantum annealing, 222
quantum counting algorithm, 213
quantum Fourier transform, see QFT
quantum functions, 85
quantum gates, 74
quantum key distribution, 151, 215
  BB84 protocol, 216, 218
  BB92 protocol, 218
quantum NOT gate, see X matrix
quantum parallelism, 86, 87, 89, 99, 152, 211
quantum simulator, 221
quantum state, 17
qubit, 1, 19, 22
  general state of, 39

register, 74
  input, 86
  output, 86
reversible computation, 32, 72
RSA encryption, 109, 135, 151, 215

Schmidt
  coefficients, 55
  decomposition, 53, 55
  number (rank), 56
Schrödinger’s equation, 33
second quantum revolution, 221
Shor’s 9-qubit code, 188
Shor’s factoring algorithm, 108, 109, 151, 205
  operation count, 165
Shor, Peter, 151
sign change under particle interchange, 221
Simon’s algorithm, 105, 108
simulated annealing, see thermal annealing
special relativity, 60, 66
spin-singlet state, 69
stabilizers, 180, 182
  for 3-qubit code, 182
  for 5-qubit code, 194
  for Shor’s 9-qubit code, 190
  for Steane’s 7-qubit code, 195
state vector, see quantum state
Steane’s 7-qubit code, 195
Stern-Gerlach experiment, 4
superposition, 1, 26, 40, 75, 86, 92, 98, 99, 130, 171, 186, 207, 212
surface codes, 196, 198
swap gate, 73
syndrome, 177

target qubit, 72, 76, 84, 96, 99, 100, 169, 180, 206
teleportation, 67, 219
tensor product, 29, 76, 91, 96
thermal annealing, 222
time evolution, 32
Toffoli gate, 73, 102, 180
  generalized, 180

uncertainty, 28
uncertainty principle, 31, 32
unitary transformation, 23, 32, 56, 69
universal set of gates, 73

vector, 9
  complex, 10, 17

X, see Pauli X matrix
XOR gate, 71

Y, see Pauli Y matrix

Z, see Pauli Z matrix
Zeilinger, Anton, 67