Preskill Quantum computing
can be implemented in an ion trap with altogether 5 laser pulses. The conditional excitation of a phonon, Eq. (1.35), has been demonstrated experimentally, for a single trapped ion, by the NIST group.
One big drawback of the ion trap computer is that it is an intrinsically slow device. Its speed is ultimately limited by the energy-time uncertainty relation. Since the uncertainty in the energy of the laser photons should be small compared to the characteristic vibrational splitting ν, each laser pulse should last a time long compared to ν⁻¹. In practice, ν is likely to be of order 100 kHz.
1.9.3 NMR
A third (dark horse) hardware scheme has sprung up in the past year, and has leapfrogged over the ion trap and cavity QED to take the current lead in coherent quantum processing. The new scheme uses nuclear magnetic resonance (NMR) technology. Now qubits are carried by certain nuclear spins in a particular molecule. Each spin can either be aligned (|↑⟩ = |0⟩) or antialigned (|↓⟩ = |1⟩) with an applied constant magnetic field. The spins take a long time to relax or decohere, so the qubits can be stored for a reasonable time.
We can also turn on a pulsed rotating magnetic field with frequency ω (where ω is the energy splitting between the spin-up and spin-down states), and induce Rabi oscillations of the spin. By timing the pulse suitably, we can perform a desired unitary transformation on a single spin (just as in our discussion of the ion trap). All the spins in the molecule are exposed to the rotating magnetic field, but only those on resonance respond.
Furthermore, the spins have dipole-dipole interactions, and this coupling can be exploited to perform a gate. The splitting between |↑⟩ and |↓⟩ for one spin actually depends on the state of neighboring spins. So whether a driving pulse is on resonance to tip the spin over is conditioned on the state of another spin.
All this has been known to chemists for decades. Yet it was only in the past year that Gershenfeld and Chuang, and independently Cory, Fahmy, and Havel, pointed out that NMR provides a useful implementation of quantum computation. This was not obvious for several reasons. Most importantly, NMR systems are very hot. The typical temperature of the spins (room temperature, say) might be of order a million times larger than the energy splitting between |0⟩ and |1⟩. This means that the quantum state of our computer (the spins in a single molecule) is very noisy -- it is subject to strong random thermal fluctuations. This noise will disguise the quantum information. Furthermore, we actually perform our processing not on a single molecule, but on a macroscopic sample containing of order 10²³ "computers," and the signal we read out of this device is actually averaged over this ensemble. But quantum algorithms are probabilistic, because of the randomness of quantum measurement. Hence averaging over the ensemble is not equivalent to running the computation on a single device; averaging may obscure the results.
Gershenfeld and Chuang, and Cory, Fahmy, and Havel, explained how to overcome these difficulties. They described how "effective pure states" can be prepared, manipulated, and monitored by performing suitable operations on the thermal ensemble. The idea is to arrange for the fluctuating properties of the molecule to average out when the signal is detected, so that only the underlying coherent properties are measured. They also pointed out that some quantum algorithms (including Shor's factoring algorithm) can be cast in a deterministic form (so that at least a large fraction of the computers give the same answer); then averaging over many computations will not spoil the result.
Quite recently, NMR methods have been used to prepare a maximally entangled state of three qubits, which had never been achieved before.
Clearly, quantum computing hardware is in its infancy. Existing hardware
will need to be scaled up by many orders of magnitude (both in the number of
stored qubits, and the number of gates that can be applied) before ambitious
computations can be attempted. In the case of the NMR method, there is
a particularly serious limitation that arises as a matter of principle, because
the ratio of the coherent signal to the background declines exponentially with
the number of spins per molecule. In practice, it will be very challenging to
perform an NMR quantum computation with more than of order 10 qubits.
Probably, if quantum computers are eventually to become practical devices, new ideas about how to construct quantum hardware will be needed.
1.10 Summary
This concludes our introductory overview to quantum computation. We
have seen that three converging factors have combined to make this subject
exciting.
1. Quantum computers can solve hard problems. It seems that a new classification of complexity has been erected, a classification better founded on the fundamental laws of physics than traditional complexity theory. (But it remains to characterize more precisely the class of problems for which quantum computers have a big advantage over classical computers.)
2. Quantum errors can be corrected. With suitable coding methods, we can protect a complicated quantum system from the debilitating effects of decoherence. We may never see an actual cat that is half dead and half alive, but perhaps we can prepare and preserve an encoded cat that is half dead and half alive.
3. Quantum hardware can be constructed. We are privileged to be
witnessing the dawn of the age of coherent manipulation of quantum
information in the laboratory.
Our aim, in this course, will be to deepen our understanding of points
(1), (2), and (3).
Chapter 2
Foundations I: States and
Ensembles
2.1 Axioms of quantum mechanics
For a few lectures I have been talking about quantum this and that, but I have never defined what quantum theory is. It is time to correct that omission.
Quantum theory is a mathematical model of the physical world. To characterize the model, we need to specify how it will represent: states, observables, measurements, dynamics.
1. States. A state is a complete description of a physical system. In
quantum mechanics, a state is a ray in a Hilbert space.
What is a Hilbert space?
a) It is a vector space over the complex numbers C. Vectors will be denoted |ψ⟩ (Dirac's ket notation).
b) It has an inner product ⟨ψ|φ⟩ that maps an ordered pair of vectors to C, defined by the properties
(i) Positivity: ⟨ψ|ψ⟩ > 0 for |ψ⟩ ≠ 0
(ii) Linearity: ⟨φ|(a|ψ₁⟩ + b|ψ₂⟩) = a⟨φ|ψ₁⟩ + b⟨φ|ψ₂⟩
(iii) Skew symmetry: ⟨φ|ψ⟩ = ⟨ψ|φ⟩*
c) It is complete in the norm ‖ψ‖ = ⟨ψ|ψ⟩^(1/2)
(Completeness is an important proviso in infinite-dimensional function spaces, since it will ensure the convergence of certain eigenfunction expansions -- e.g., Fourier analysis. But mostly we'll be content to work with finite-dimensional inner product spaces.)
What is a ray? It is an equivalence class of vectors that differ by multiplication by a nonzero complex scalar. We can choose a representative of this class (for any nonvanishing vector) to have unit norm
⟨ψ|ψ⟩ = 1.    (2.1)
We will also say that |ψ⟩ and e^{iα}|ψ⟩ describe the same physical state, where |e^{iα}| = 1.
(Note that every ray corresponds to a possible state, so that given two states |φ⟩, |ψ⟩, we can form another as a|φ⟩ + b|ψ⟩ (the "superposition principle"). The relative phase in this superposition is physically significant; we identify a|φ⟩ + b|ψ⟩ with e^{iα}(a|φ⟩ + b|ψ⟩) but not with a|φ⟩ + e^{iα}b|ψ⟩.)
2. Observables. An observable is a property of a physical system that in principle can be measured. In quantum mechanics, an observable is a self-adjoint operator. An operator is a linear map taking vectors to vectors,
A : |ψ⟩ → A|ψ⟩,   A(a|ψ₁⟩ + b|ψ₂⟩) = aA|ψ₁⟩ + bA|ψ₂⟩.    (2.2)
The adjoint of the operator A is defined by
⟨φ|Aψ⟩ = ⟨A†φ|ψ⟩,    (2.3)
for all vectors |φ⟩, |ψ⟩ (where here I have denoted A|ψ⟩ as |Aψ⟩). A is self-adjoint if A = A†.
If A and B are self-adjoint, then so is A + B (because (A + B)† = A† + B†), but (AB)† = B†A†, so AB is self-adjoint only if A and B commute. Note that AB + BA and i(AB − BA) are always self-adjoint if A and B are.
A self-adjoint operator in a Hilbert space H has a spectral representation -- its eigenstates form a complete orthonormal basis in H. We can express a self-adjoint operator A as
A = Σ_n a_n P_n.    (2.4)
Here each a_n is an eigenvalue of A, and P_n is the corresponding orthogonal projection onto the space of eigenvectors with eigenvalue a_n. (If a_n is nondegenerate, then P_n = |n⟩⟨n|; it is the projection onto the corresponding eigenvector.) The P_n's satisfy
P_n P_m = δ_{nm} P_n,   P_n† = P_n.    (2.5)
(For unbounded operators in an infinite-dimensional space, the definition of self-adjoint and the statement of the spectral theorem are more subtle, but this need not concern us.)
3. Measurement. In quantum mechanics, the numerical outcome of a measurement of the observable A is an eigenvalue of A; right after the measurement, the quantum state is an eigenstate of A with the measured eigenvalue. If the quantum state just prior to the measurement is |ψ⟩, then the outcome a_n is obtained with probability
Prob(a_n) = ‖P_n|ψ⟩‖² = ⟨ψ|P_n|ψ⟩.    (2.6)
If the outcome a_n is attained, then the (normalized) quantum state becomes
P_n|ψ⟩ / (⟨ψ|P_n|ψ⟩)^{1/2}.    (2.7)
(Note that if the measurement is immediately repeated, then according to this rule the same outcome is attained again, with probability one.)
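To make the measurement axiom concrete, here is a minimal numerical sketch (my addition, not part of the notes) of eqs. (2.6) and (2.7) for a single qubit in Python/numpy; the observable σ₃ and the state are arbitrary choices for illustration.

```python
import numpy as np

# Observable A = sum_n a_n P_n on a qubit: take A = sigma_3, with projectors
# P_up = |0><0|, P_down = |1><1| and eigenvalues +1, -1.
P = [np.array([[1, 0], [0, 0]]), np.array([[0, 0], [0, 1]])]
a = [+1, -1]

# An arbitrary normalized state |psi> = a|0> + b|1> (illustrative choice).
psi = np.array([0.6, 0.8j])

for an, Pn in zip(a, P):
    prob = np.vdot(psi, Pn @ psi).real                 # Prob(a_n) = <psi|P_n|psi>, eq. (2.6)
    post = Pn @ psi / np.sqrt(prob)                    # collapsed state, eq. (2.7)
    print(f"outcome {an:+d}: probability {prob:.2f}, post-measurement state {post}")
```

Running this prints probabilities 0.36 and 0.64, and the post-measurement state is the corresponding eigenvector, consistent with the repeatability remark above.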
4. Dynamics. Time evolution of a quantum state is unitary; it is generated by a self-adjoint operator, called the Hamiltonian of the system. In the Schrödinger picture of dynamics, the vector describing the system moves in time as governed by the Schrödinger equation
(d/dt)|ψ(t)⟩ = −iH|ψ(t)⟩,    (2.8)
where H is the Hamiltonian. We may reexpress this equation, to first order in the infinitesimal quantity dt, as
|ψ(t + dt)⟩ = (1 − iH dt)|ψ(t)⟩.    (2.9)
The operator U(dt) ≡ 1 − iH dt is unitary; because H is self-adjoint it satisfies U†U = 1 to linear order in dt. Since a product of unitary operators is unitary, time evolution over a finite interval is also unitary,
|ψ(t)⟩ = U(t)|ψ(0)⟩.    (2.10)
In the case where H is t-independent, we may write U = e^{−itH}.
This completes the mathematical formulation of quantum mechanics. We immediately notice some curious features. One oddity is that the Schrödinger equation is linear, while we are accustomed to nonlinear dynamical equations in classical physics. This property seems to beg for an explanation. But far more curious is the mysterious dualism; there are two quite distinct ways for a quantum state to change. On the one hand there is unitary evolution, which is deterministic. If we specify |ψ(0)⟩, the theory predicts the state |ψ(t)⟩ at a later time.
But on the other hand there is measurement, which is probabilistic. The theory does not make definite predictions about the measurement outcomes; it only assigns probabilities to the various alternatives. This is troubling, because it is unclear why the measurement process should be governed by different physical laws than other processes.
Beginning students of quantum mechanics, when first exposed to these rules, are often told not to ask "why?" There is much wisdom in this advice. But I believe that it can be useful to ask why. In future lectures, we will return to this disconcerting dualism between unitary evolution and measurement, and will seek a resolution.
2.2.1 Spin-1/2
First of all, the coefficients a and b in eq. (2.11) encode more than just the probabilities of the outcomes of a measurement in the {|0⟩, |1⟩} basis. In particular, the relative phase of a and b also has physical significance.
For a physicist, it is natural to interpret eq. (2.11) as the spin state of an object with spin-1/2 (like an electron). Then |0⟩ and |1⟩ are the spin up (|↑⟩) and spin down (|↓⟩) states along a particular axis such as the z-axis. The two real numbers characterizing the qubit (the complex numbers a and b, modulo the normalization and overall phase) describe the orientation of the spin in three-dimensional space (the polar angle θ and the azimuthal angle φ).
We cannot go deeply here into the theory of symmetry in quantum mechanics, but we will briefly recall some elements of the theory that will prove useful to us. A symmetry is a transformation that acts on a state of a system,
yet leaves all observable properties of the system unchanged. In quantum mechanics, observations are measurements of self-adjoint operators. If A is measured in the state |ψ⟩, then the outcome |a⟩ (an eigenvector of A) occurs with probability |⟨a|ψ⟩|². A symmetry should leave these probabilities unchanged (when we "rotate" both the system and the apparatus).
A symmetry, then, is a mapping of vectors in Hilbert space
|ψ⟩ → |ψ′⟩,    (2.12)
that preserves the absolute values of inner products
|⟨φ|ψ⟩| = |⟨φ′|ψ′⟩|,    (2.13)
for all |φ⟩ and |ψ⟩. According to a famous theorem due to Wigner, a mapping with this property can always be chosen (by adopting suitable phase conventions) to be either unitary or antiunitary. The antiunitary alternative, while important for discrete symmetries, can be excluded for continuous symmetries. Then the symmetry acts as
|ψ⟩ → |ψ′⟩ = U|ψ⟩,    (2.14)
where U is unitary (and in particular, linear).
Symmetries form a group: a symmetry transformation can be inverted, and the product of two symmetries is a symmetry. For each symmetry operation R acting on our physical system, there is a corresponding unitary transformation U(R). Multiplication of these unitary operators must respect the group multiplication law of the symmetries -- applying R₁ ∘ R₂ should be equivalent to first applying R₂ and subsequently R₁. Thus we demand
U(R₁)U(R₂) = Phase(R₁, R₂) U(R₁ ∘ R₂).    (2.15)
The phase is permitted in eq. (2.15) because quantum states are rays; we need only demand that U(R₁ ∘ R₂) act the same way as U(R₁)U(R₂) on rays, not on vectors. U(R) provides a unitary representation (up to a phase) of the symmetry group.
So far, our concept of symmetry has no connection with dynamics. Usually, we demand of a symmetry that it respect the dynamical evolution of the system. This means that it should not matter whether we first transform the system and then evolve it, or first evolve it and then transform it. In other words, the diagram
                  dynamics
     Initial ---------------> Final
        |                        |
     rotation                 rotation
        |                        |
        v         dynamics       v
   New Initial --------------> New Final
is commutative. This means that the time evolution operator e^{−itH} should commute with the symmetry transformation U(R):
U(R) e^{−itH} = e^{−itH} U(R),    (2.16)
and expanding to linear order in t we obtain
U(R) H = H U(R).    (2.17)
For a continuous symmetry, we can choose R infinitesimally close to the identity, R = I + εT, and then U is close to 1,
U = 1 − iεQ + O(ε²).    (2.18)
From the unitarity of U (to order ε) it follows that Q is an observable, Q = Q†. Expanding eq. (2.17) to linear order in ε we find
[Q, H] = 0;    (2.19)
the observable Q commutes with the Hamiltonian.
Eq. (2.19) is a conservation law. It says, for example, that if we prepare an eigenstate of Q, then time evolution governed by the Schrödinger equation will preserve the eigenstate. We have seen that symmetries imply conservation laws. Conversely, given a conserved quantity Q satisfying eq. (2.19) we can construct the corresponding symmetry transformations. Finite transformations can be built as a product of many infinitesimal ones
R = (1 + (θ/N) T)^N  ⇒  U(R) = (1 + i(θ/N) Q)^N → e^{iθQ}    (2.20)
(taking the limit N → ∞). Once we have decided how infinitesimal symmetry transformations are represented by unitary operators, then it is also
determined how finite transformations are represented, for these can be built as a product of infinitesimal transformations. We say that Q is the generator of the symmetry.
Let us briefly recall how this general theory applies to spatial rotations and angular momentum. An infinitesimal rotation by dθ about the axis specified by the unit vector n̂ = (n₁, n₂, n₃) can be expressed as
R(n̂, dθ) = I − i dθ n̂·J⃗,    (2.21)
where (J₁, J₂, J₃) are the components of the angular momentum. A finite rotation is expressed as
R(n̂, θ) = exp(−iθ n̂·J⃗).    (2.22)
Rotations about distinct axes don't commute. From elementary properties of rotations, we find the commutation relations
[J_k, J_ℓ] = i ε_{kℓm} J_m,    (2.23)
where ε_{kℓm} is the totally antisymmetric tensor with ε₁₂₃ = 1, and repeated indices are summed. To implement rotations on a quantum system, we find self-adjoint operators J₁, J₂, J₃ in Hilbert space that satisfy these relations.
The "defining" representation of the rotation group is three dimensional, but the simplest nontrivial irreducible representation is two dimensional, given by
J_k = (1/2) σ_k,    (2.24)
where
σ₁ = ( 0  1 ; 1  0 ),   σ₂ = ( 0  −i ; i  0 ),   σ₃ = ( 1  0 ; 0  −1 )    (2.25)
are the Pauli matrices (rows separated by semicolons). This is the unique two-dimensional irreducible representation, up to a unitary change of basis. Since the eigenvalues of J_k are ±1/2, we call this the spin-1/2 representation. (By identifying J⃗ as the angular momentum, we have implicitly chosen units with ℏ = 1.)
The Pauli matrices also have the properties of being mutually anticommuting and squaring to the identity,
σ_k σ_ℓ + σ_ℓ σ_k = 2δ_{kℓ} 1.    (2.26)
So we see that (n̂·σ⃗)² = n_k n_ℓ σ_k σ_ℓ = n_k n_k 1 = 1. By expanding the exponential series, we see that finite rotations are represented as
U(n̂, θ) = e^{−i(θ/2) n̂·σ⃗} = 1 cos(θ/2) − i n̂·σ⃗ sin(θ/2).    (2.27)
The most general 2 × 2 unitary matrix with determinant 1 can be expressed in this form. Thus, we are entitled to think of a qubit as the state of a spin-1/2 object, and an arbitrary unitary transformation acting on the state (aside from a possible rotation of the overall phase) is a rotation of the spin.
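Eq. (2.27) is easy to verify numerically; the following sketch (my addition, with an arbitrarily chosen axis and angle) compares the matrix exponential with the closed form, and also exhibits the curious 2π rotation discussed next.

```python
import numpy as np
from scipy.linalg import expm

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)

theta = 1.2                                              # arbitrary rotation angle
n = np.array([1.0, 2.0, 2.0]); n /= np.linalg.norm(n)    # arbitrary unit axis
n_dot_sigma = n[0]*s1 + n[1]*s2 + n[2]*s3

U_exp = expm(-1j * (theta / 2) * n_dot_sigma)
U_closed = np.cos(theta/2) * np.eye(2) - 1j * np.sin(theta/2) * n_dot_sigma
print(np.allclose(U_exp, U_closed))                      # True: eq. (2.27)
print(np.allclose(expm(-1j * np.pi * n_dot_sigma), -np.eye(2)))  # a 2*pi rotation gives -1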
A peculiar property of the representation U(n̂, θ) is that it is double-valued. In particular a rotation by 2π about any axis is represented nontrivially:
U(n̂, θ = 2π) = −1.    (2.28)
Our representation of the rotation group is really a representation "up to a sign"
U(R₁)U(R₂) = ±U(R₁ ∘ R₂).    (2.29)
But as already noted, this is acceptable, because the group multiplication is respected on rays, though not on vectors. These double-valued representations of the rotation group are called spinor representations. (The existence of spinors follows from a topological property of the group -- it is not simply connected.)
While it is true that a rotation by 2π has no detectable effect on a spin-1/2 object, it would be wrong to conclude that the spinor property has no observable consequences. Suppose I have a machine that acts on a pair of spins. If the first spin is up, it does nothing, but if the first spin is down, it rotates the second spin by 2π. Now let the machine act when the first spin is in a superposition of up and down. Then
(1/√2)(|↑⟩₁ + |↓⟩₁) |↑⟩₂  →  (1/√2)(|↑⟩₁ − |↓⟩₁) |↑⟩₂.    (2.30)
While there is no detectable effect on the second spin, the state of the first has flipped to an orthogonal state, which is very much observable.
In a rotated frame of reference, a rotation R(n̂, θ) becomes a rotation through the same angle but about a rotated axis. It follows that the three components of angular momentum transform under rotations as a vector:
U(R) J_k U(R)† = R_{kℓ} J_ℓ.    (2.31)
Thus, if a state |m⟩ is an eigenstate of J₃,
J₃|m⟩ = m|m⟩,    (2.32)
then U(R)|m⟩ is an eigenstate of RJ₃ with the same eigenvalue:
RJ₃ (U(R)|m⟩) = U(R) J₃ U(R)† U(R)|m⟩ = U(R) J₃ |m⟩ = m (U(R)|m⟩).    (2.33)
Therefore, we can construct eigenstates of angular momentum along the axis n̂ = (sinθ cosφ, sinθ sinφ, cosθ) by applying a rotation through θ, about the axis n̂′ = (−sinφ, cosφ, 0), to a J₃ eigenstate. For our spin-1/2 representation, this rotation is
exp(−i(θ/2) n̂′·σ⃗) = exp[ (θ/2) ( 0  −e^{−iφ} ; e^{iφ}  0 ) ]
                  = ( cos(θ/2)  −e^{−iφ} sin(θ/2) ; e^{iφ} sin(θ/2)  cos(θ/2) ),    (2.34)
and applying it to ( 1 ; 0 ), the J₃ eigenstate with eigenvalue 1/2, we obtain
|ψ(θ, φ)⟩ = ( e^{−iφ/2} cos(θ/2) ; e^{iφ/2} sin(θ/2) )    (2.35)
(up to an overall phase). We can check directly that this is an eigenstate of
n̂·σ⃗ = ( cosθ  e^{−iφ} sinθ ; e^{iφ} sinθ  −cosθ ),    (2.36)
with eigenvalue one. So we have seen that eq. (2.11) with a = e^{−iφ/2} cos(θ/2), b = e^{iφ/2} sin(θ/2) can be interpreted as a spin pointing in the (θ, φ) direction.
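The "check directly" step can also be done numerically; a small sketch (my addition, with arbitrarily chosen angles) verifying that |ψ(θ, φ)⟩ of eq. (2.35) is an eigenstate of n̂·σ⃗ with eigenvalue one, and that eq. (2.37) below holds:

```python
import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)

theta, phi = 0.7, 2.1                                    # arbitrary direction (theta, phi)
psi = np.array([np.exp(-1j*phi/2) * np.cos(theta/2),
                np.exp(+1j*phi/2) * np.sin(theta/2)])    # eq. (2.35)
n = np.array([np.sin(theta)*np.cos(phi), np.sin(theta)*np.sin(phi), np.cos(theta)])
n_dot_sigma = n[0]*s1 + n[1]*s2 + n[2]*s3                # eq. (2.36)

print(np.allclose(n_dot_sigma @ psi, psi))               # True: eigenvalue +1
print(np.isclose(np.vdot(psi, s3 @ psi).real, np.cos(theta)))  # eq. (2.37)
```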
We noted that we cannot determine a and b with a single measurement. Furthermore, even with many identical copies of the state, we cannot completely determine the state by measuring each copy only along the z-axis. This would enable us to estimate |a| and |b|, but we would learn nothing about the relative phase of a and b. Equivalently, we would find the component of the spin along the z-axis,
⟨ψ(θ, φ)|σ₃|ψ(θ, φ)⟩ = cos²(θ/2) − sin²(θ/2) = cosθ,    (2.37)
but we would not learn about the component in the x-y plane. The problem of determining |ψ⟩ by measuring the spin is equivalent to determining the unit vector n̂ by measuring its components along various axes. Altogether, measurements along three different axes are required. E.g., from ⟨σ₃⟩ and ⟨σ₁⟩ we can determine n₃ and n₁, but the sign of n₂ remains undetermined. Measuring ⟨σ₂⟩ would remove this remaining ambiguity.
Of course, if we are permitted to rotate the spin, then only measurements along the z-axis will suffice. That is, measuring a spin along the n̂ axis is equivalent to first applying a rotation that rotates the n̂ axis to the axis ẑ, and then measuring along ẑ.
In the special case θ = π/2 and φ = 0 (the x̂-axis) our spin state is
|↑_x⟩ = (1/√2)(|↑_z⟩ + |↓_z⟩)    (2.38)
("spin-up along the x-axis"). The orthogonal state ("spin down along the x-axis") is
|↓_x⟩ = (1/√2)(|↑_z⟩ − |↓_z⟩).    (2.39)
For either of these states, if we measure the spin along the z-axis, we will obtain |↑_z⟩ with probability 1/2 and |↓_z⟩ with probability 1/2.
Now consider the combination
(1/√2)(|↑_x⟩ + |↓_x⟩).    (2.40)
This state has the property that, if we measure the spin along the x-axis, we obtain |↑_x⟩ or |↓_x⟩, each with probability 1/2. Now we may ask, what if we measure the state in eq. (2.40) along the z-axis?
If these were probabilistic classical bits, the answer would be obvious. The state in eq. (2.40) is in one of two states, and for each of the two, the probability is 1/2 for pointing up or down along the z-axis. So of course we should find up with probability 1/2 when we measure along the z-axis.
But not so for qubits! By adding eq. (2.38) and eq. (2.39), we see that the state in eq. (2.40) is really |↑_z⟩ in disguise. When we measure along the z-axis, we always find |↑_z⟩, never |↓_z⟩.
We see that for qubits, as opposed to probabilistic classical bits, probabilities can add in unexpected ways. This is, in its simplest guise, the phenomenon called "quantum interference," an important feature of quantum information.
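A tiny numerical illustration of this interference effect (my addition): writing |↑_x⟩ and |↓_x⟩ in the z basis, their equally weighted superposition is |↑_z⟩, so a z measurement gives "up" with probability one.

```python
import numpy as np

up_z, down_z = np.array([1.0, 0.0]), np.array([0.0, 1.0])
up_x = (up_z + down_z) / np.sqrt(2)        # eq. (2.38)
down_x = (up_z - down_z) / np.sqrt(2)      # eq. (2.39)

state = (up_x + down_x) / np.sqrt(2)       # eq. (2.40)
prob_up_z = abs(np.vdot(up_z, state))**2
print(prob_up_z)                           # 1.0 -- the "equal mixture" of x states is |up_z>
```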
It should be emphasized that, while this formal equivalence with a spin-1/2 object applies to any two-level quantum system, of course not every two-level system transforms as a spinor under rotations!
with eigenvalues ±1. Because the eigenvalues are ±1 (not ±1/2) we say that the photon has spin-1.
In this context, the quantum interference phenomenon can be described this way: Suppose that we have a polarization analyzer that allows only one of the two linear photon polarizations to pass through. Then an x or y polarized photon has probability 1/2 of getting through a 45° rotated polarizer, and a 45° polarized photon has probability 1/2 of getting through an x or y analyzer. But an x photon never passes through a y analyzer. If we put a 45° rotated analyzer in between an x and a y analyzer, then half of the photons make it through each analyzer. But if we remove the analyzer in the middle no photons make it through the y analyzer.
A device can be constructed easily that rotates the linear polarization of a photon, and so applies the transformation Eq. (2.41) to our qubit. As noted, this is not the most general possible unitary transformation. But if we also have a device that alters the relative phase of the two orthogonal linear polarization states,
|x⟩ → e^{iω/2}|x⟩,
|y⟩ → e^{−iω/2}|y⟩,    (2.45)
the two devices can be employed together to apply an arbitrary 2 × 2 unitary transformation (of determinant 1) to the photon polarization state.
where 0 < p_a ≤ 1 and Σ_a p_a = 1. If the state is not pure, there are two or more terms in this sum, and ρ² ≠ ρ; in fact, tr ρ² = Σ p_a² < Σ p_a = 1. We say that ρ is an incoherent superposition of the states {|ψ_a⟩}; incoherent meaning that the relative phases of the |ψ_a⟩ are experimentally inaccessible.
Since the expectation value of any observable M acting on the subsystem can be expressed as
⟨M⟩ = tr(Mρ) = Σ_a p_a ⟨ψ_a|M|ψ_a⟩,    (2.61)
= (1/2)(1 + n̂·σ⃗),    (2.68)
where n̂ = (sinθ cosφ, sinθ sinφ, cosθ). One nice property of the Bloch parametrization of the pure states is that while |ψ(θ, φ)⟩ has an arbitrary overall phase that has no physical significance, there is no phase ambiguity in the density matrix ρ(θ, φ) = |ψ(θ, φ)⟩⟨ψ(θ, φ)|; all the parameters in ρ have a physical meaning.
From the property
(1/2) tr(σ_i σ_j) = δ_{ij}    (2.69)
we see that
⟨n̂·σ⃗⟩_{P⃗} = tr( n̂·σ⃗ ρ(P⃗) ) = n̂·P⃗.    (2.70)
Thus the vector P⃗ in Eq. (2.62) parametrizes the polarization of the spin. If there are many identically prepared systems at our disposal, we can determine P⃗ (and hence the complete density matrix ρ(P⃗)) by measuring ⟨n̂·σ⃗⟩ along each of three linearly independent axes.
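A short sketch of this tomographic reconstruction (my addition, not from the notes): given an arbitrarily chosen single-qubit density matrix, the three numbers P_i = ⟨σ_i⟩ determine ρ completely.

```python
import numpy as np

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]

# An arbitrary single-qubit density matrix (mixed state, chosen for illustration).
rho = np.array([[0.7, 0.2 - 0.1j],
                [0.2 + 0.1j, 0.3]])

P = np.array([np.trace(rho @ s).real for s in sigma])    # P_i = <sigma_i> = tr(rho sigma_i)
rho_reconstructed = 0.5 * (np.eye(2) + sum(P[i] * sigma[i] for i in range(3)))
print(P, np.allclose(rho, rho_reconstructed))            # recovers rho exactly
```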
2.4.1 Entanglement
With any bipartite pure state |ψ⟩_AB we may associate a positive integer, the Schmidt number, which is the number of nonzero eigenvalues of ρ_A (or ρ_B) and hence the number of terms in the Schmidt decomposition of |ψ⟩_AB. In terms of this quantity, we can define what it means for a bipartite pure state to be entangled: |ψ⟩_AB is entangled (or nonseparable) if its Schmidt number is greater than one; otherwise, it is separable (or unentangled). Thus, a separable bipartite pure state is a direct product of pure states in H_A and H_B,
|ψ⟩_AB = |φ⟩_A ⊗ |χ⟩_B;    (2.92)
then the reduced density matrices ρ_A = |φ⟩_A A⟨φ| and ρ_B = |χ⟩_B B⟨χ| are pure. Any state that cannot be expressed as such a direct product is entangled; then ρ_A and ρ_B are mixed states.
One of our main goals this term will be to understand better the significance of entanglement. It is not strictly correct to say that subsystems A and B are uncorrelated if |ψ⟩_AB is separable; after all, the two spins in the separable state
|↑⟩_A ⊗ |↑⟩_B    (2.93)
are surely correlated -- they are both pointing in the same direction. But the correlations between A and B in an entangled state have a different character than those in a separable state. Perhaps the critical difference is that entanglement cannot be created locally. The only way to entangle A and B is for the two subsystems to directly interact with one another.
We can prepare the state eq. (2.93) without allowing spins A and B to ever come into contact with one another. We need only send a (classical!) message to two preparers (Alice and Bob) telling both of them to prepare a spin pointing along the z-axis. But the only way to turn the state eq. (2.93) into an entangled state like
(1/√2)(|↑⟩_A ⊗ |↑⟩_B + |↓⟩_A ⊗ |↓⟩_B)    (2.94)
is to apply a collective unitary transformation to the state. Local unitary transformations of the form U_A ⊗ U_B, and local measurements performed by Alice or Bob, cannot increase the Schmidt number of the two-qubit state, no matter how much Alice and Bob discuss what they do. To entangle two qubits, we must bring them together and allow them to interact.
As we will discuss later, it is also possible to make the distinction between
entangled and separable bipartite mixed states. We will also discuss various
ways in which local operations can modify the form of entanglement, and
some ways that entanglement can be put to use.
2.6 Summary
Axioms. The arena of quantum mechanics is a Hilbert space H. The
fundamental assumptions are:
(1) A state is a ray in H.
(2) An observable is a self-adjoint operator on H.
(3) A measurement is an orthogonal projection.
(4) Time evolution is unitary.
Density operator. But if we confine our attention to only a portion of a larger quantum system, assumptions (1)-(4) need not be satisfied. In particular, a quantum state is described not by a ray, but by a density operator ρ, a nonnegative operator with unit trace. The density operator is pure (and the state can be described by a ray) if ρ² = ρ; otherwise, the state is mixed. An observable M has expectation value tr(Mρ) in this state.
Qubit. A quantum system with a two-dimensional Hilbert space is called a qubit. The general density matrix of a qubit is
ρ(P⃗) = (1/2)(1 + P⃗·σ⃗),    (2.120)
where P⃗ is a three-component vector of length |P⃗| ≤ 1. Pure states have |P⃗| = 1.
Schmidt decomposition. For any quantum system divided into two parts A and B (a bipartite system), the Hilbert space is a tensor product H_A ⊗ H_B. For any pure state |ψ⟩_AB of a bipartite system, there are orthonormal bases {|i⟩_A} for H_A and {|i′⟩_B} for H_B such that
|ψ⟩_AB = Σ_i √p_i |i⟩_A ⊗ |i′⟩_B.    (2.121)
Eq. (2.121) is called the Schmidt decomposition of |ψ⟩_AB. In a bipartite pure state, subsystems A and B separately are described by density operators ρ_A and ρ_B; it follows from eq. (2.121) that ρ_A and ρ_B have the same nonvanishing eigenvalues (the p_i's). The number of nonvanishing eigenvalues is called the Schmidt number of |ψ⟩_AB. A bipartite pure state is said to be entangled if its Schmidt number is greater than one.
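In practice the Schmidt decomposition can be obtained from the singular value decomposition of the matrix of amplitudes ψ_{ij}; here is a minimal sketch (my addition, with a randomly chosen state) that also confirms that ρ_A and ρ_B share the same nonzero spectrum.

```python
import numpy as np

dA, dB = 2, 3
rng = np.random.default_rng(1)

# Amplitudes psi[i, j] of an arbitrary bipartite state sum_ij psi_ij |i>_A |j>_B.
psi = rng.normal(size=(dA, dB)) + 1j * rng.normal(size=(dA, dB))
psi /= np.linalg.norm(psi)

U, s, Vh = np.linalg.svd(psi, full_matrices=False)
p = s**2                                   # Schmidt coefficients p_i (descending)
schmidt_number = np.sum(p > 1e-12)

rho_A = psi @ psi.conj().T                 # tr_B |psi><psi|
rho_B = psi.T @ psi.conj()                 # tr_A |psi><psi|
print(p, schmidt_number)
print(np.allclose(np.sort(np.linalg.eigvalsh(rho_A))[::-1], p))           # same spectrum
print(np.allclose(np.sort(np.linalg.eigvalsh(rho_B))[::-1][:len(p)], p))  # same nonzero spectrum
```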
Ensembles. The density operators on a Hilbert space form a convex set, and the pure states are the extremal points of the set. A mixed state of a system A can be prepared as an ensemble of pure states in many different ways, all of which are experimentally indistinguishable if we observe system A alone. Given any mixed state ρ_A of system A, any preparation of ρ_A as an ensemble of pure states can be realized in principle by performing a
measurement in another system B with which A is entangled. In fact, given many such preparations of ρ_A, there is a single entangled state of A and B such that any one of these preparations can be realized by measuring a suitable observable in B (the GHJW theorem). By measuring in system B and reporting the measurement outcome to system A, we can extract from the mixture a pure state chosen from one of the ensembles.
2.7 Exercises
2.1 Fidelity of a random guess
A single qubit (spin-1/2 object) is in an unknown pure state |ψ⟩, selected at random from an ensemble uniformly distributed over the Bloch sphere. We guess at random that the state is |φ⟩. On the average, what is the fidelity F of our guess, defined by
F ≡ |⟨φ|ψ⟩|².    (2.122)
2.2 Fidelity after measurement
After randomly selecting a one-qubit pure state as in the previous problem, we perform a measurement of the spin along the ẑ-axis. This measurement prepares a state described by the density matrix
ρ = P↑ ⟨ψ|P↑|ψ⟩ + P↓ ⟨ψ|P↓|ψ⟩    (2.123)
(where P↑,↓ denote the projections onto the spin-up and spin-down states along the ẑ-axis). On the average, with what fidelity
F ≡ ⟨ψ|ρ|ψ⟩    (2.124)
does this density matrix represent the initial state |ψ⟩? (The improvement in F compared to the answer to the previous problem is a crude measure of how much we learned by making the measurement.)
2.3 Schmidt decomposition
For the two-qubit state
|ψ⟩ = (1/√2) |↑⟩_A ⊗ ( (1/2)|↑⟩_B + (√3/2)|↓⟩_B ) + (1/√2) |↓⟩_A ⊗ ( (√3/2)|↑⟩_B + (1/2)|↓⟩_B ),    (2.125)
a. Compute ρ_A = tr_B(|ψ⟩⟨ψ|) and ρ_B = tr_A(|ψ⟩⟨ψ|).
b. Find the Schmidt decomposition of |ψ⟩.
2.4 Tripartite pure state
Is there a Schmidt decomposition for an arbitrary tripartite pure state? That is, if |ψ⟩_ABC is an arbitrary vector in H_A ⊗ H_B ⊗ H_C, can we find orthonormal bases {|i⟩_A}, {|i⟩_B}, {|i⟩_C} such that
|ψ⟩_ABC = Σ_i √p_i |i⟩_A ⊗ |i⟩_B ⊗ |i⟩_C ?    (2.126)
Explain your answer.
2.5 Quantum correlations in a mixed state
Consider a density matrix for two qubits
ρ = (1/8) 1 + (1/2) |ψ⁻⟩⟨ψ⁻|,    (2.127)
where 1 denotes the 4 × 4 unit matrix, and
|ψ⁻⟩ = (1/√2)(|↑⟩|↓⟩ − |↓⟩|↑⟩).    (2.128)
Suppose we measure the first spin along the n̂ axis and the second spin along the m̂ axis, where n̂·m̂ = cosθ. What is the probability that both spins are "spin-up" along their respective axes?
Chapter 3
Foundations II: Measurement
and Evolution
3.1 Orthogonal Measurement and Beyond
3.1.1 Orthogonal Measurements
We would like to examine the properties of the generalized measurements that can be realized on system A by performing orthogonal measurements on a larger system that contains A. But first we will briefly consider how (orthogonal) measurements of an arbitrary observable can be achieved in principle, following the classic treatment of Von Neumann.
To measure an observable M, we will modify the Hamiltonian of the world
by turning on a coupling between that observable and a \pointer" variable
that will serve as the apparatus. The coupling establishes entanglement
between the eigenstates of the observable and the distinguishable states of the
pointer, so that we can prepare an eigenstate of the observable by \observing"
the pointer.
Of course, this is not a fully satisfying model of measurement because we
have not explained how it is possible to measure the pointer. Von Neumann's
attitude was that one can see that it is possible in principle to correlate
the state of a microscopic quantum system with the value of a macroscopic
classical variable, and we may take it for granted that we can perceive the
value of the classical variable. A more complete explanation is desirable and
possible; we will return to this issue later.
We may think of the pointer as a particle that propagates freely apart
from its tunable coupling to the quantum system being measured. Since we intend to measure the position of the pointer, it should be prepared initially in a wavepacket state that is narrow in position space, but not too narrow, because a very narrow wave packet will spread too rapidly. If the initial width of the wave packet is Δx, then the uncertainty in its velocity will be of order Δv = Δp/m ≈ ℏ/(mΔx), so that after a time t, the wavepacket will spread to a width
Δx(t) ≈ Δx + ℏt/(mΔx),    (3.1)
which is minimized for [Δx(t)]² ≈ [Δx]² ≈ ℏt/m. Therefore, if the experiment takes a time t, the resolution we can achieve for the final position of the pointer is limited by
Δx ≳ (Δx)_SQL ≈ √(ℏt/m),    (3.2)
the "standard quantum limit." We will choose our pointer to be sufficiently heavy that this limitation is not serious.
The Hamiltonian describing the coupling of the quantum system to the pointer has the form
H = H₀ + P²/2m + λ M P,    (3.3)
where P²/2m is the Hamiltonian of the free pointer particle (which we will henceforth ignore on the grounds that the pointer is so heavy that spreading of its wavepacket may be neglected), H₀ is the unperturbed Hamiltonian of the system to be measured, and λ is a coupling constant that we are able to turn on and off as desired. The observable to be measured, M, is coupled to the momentum P of the pointer.
If M does not commute with H₀, then we have to worry about how the observable evolves during the course of the measurement. To simplify the analysis, let us suppose that either [M, H₀] = 0, or else the measurement is carried out quickly enough that the free evolution of the system can be neglected during the measurement procedure. Then the Hamiltonian can be approximated as H ≈ λ M P (where of course [M, P] = 0 because M is an observable of the system and P is an observable of the pointer), and the time evolution operator is
U(t) ≈ exp[−iλt M P].    (3.4)
Expanding in the basis in which M is diagonal,
M = Σ_a |a⟩ M_a ⟨a|,    (3.5)
we express U(t) as
U(t) = Σ_a |a⟩ exp[−iλt M_a P] ⟨a|.    (3.6)
Now we recall that P generates a translation of the position of the pointer: P = −i d/dx in the position representation, so that e^{−ix₀P} = exp(−x₀ d/dx), and by Taylor expanding,
e^{−ix₀P} ψ(x) = ψ(x − x₀);    (3.7)
in other words, e^{−ix₀P} acting on a wavepacket translates the wavepacket by x₀. We see that if our quantum system starts in a superposition of M eigenstates, initially unentangled with the position-space wavepacket |ψ(x)⟩ of the pointer, then after time t the quantum state has evolved to
U(t) ( Σ_a α_a |a⟩ ⊗ |ψ(x)⟩ ) = Σ_a α_a |a⟩ ⊗ |ψ(x − λt M_a)⟩;    (3.8)
the position of the pointer is now correlated with the value of the observable M. If the pointer wavepacket is narrow enough for us to resolve all values of the M_a that occur (Δx ≲ λt ΔM_a), then when we observe the position of the pointer (never mind how!) we will prepare an eigenstate of the observable. With probability |α_a|², we will detect that the pointer has shifted its position by λt M_a, in which case we will have prepared the M eigenstate |a⟩. In the end, then, we conclude that the initial state |φ⟩ of the quantum system is projected to |a⟩ with probability |⟨a|φ⟩|². This is Von Neumann's model of orthogonal measurement.
The classic example is the Stern-Gerlach apparatus. To measure σ₃ for a spin-1/2 object, we allow the object to pass through a region of inhomogeneous magnetic field
B₃ = λz.    (3.9)
The magnetic moment of the object is μσ⃗, and the coupling induced by the magnetic field is
H = −λμ z σ₃.    (3.10)
In this case σ₃ is the observable to be measured, coupled to the position z rather than the momentum of the pointer, but that's all right because z generates a translation of P_z, and so the coupling imparts an impulse to the pointer. We can perceive whether the object is pushed up or down, and so project out the spin state |↑_z⟩ or |↓_z⟩. Of course, by rotating the magnet, we can measure the observable n̂·σ⃗ instead.
Our discussion of the quantum eraser has cautioned us that establishing the entangled state eq. (3.8) is not sufficient to explain why the measurement procedure prepares an eigenstate of M. In principle, the measurement of the pointer could project out a peculiar superposition of position eigenstates, and so prepare the quantum system in a superposition of M eigenstates. To achieve a deeper understanding of the measurement process, we will need to explain why the position eigenstate basis of the pointer enjoys a privileged status over other possible bases.
If indeed we can couple any observable to a pointer as just described, and we can observe the pointer, then we can perform any conceivable orthogonal projection in Hilbert space. Given a set of operators {E_a} such that
E_a = E_a†,   E_a E_b = δ_{ab} E_a,   Σ_a E_a = 1,    (3.11)
we can carry out a measurement procedure that will take a pure state |ψ⟩⟨ψ| to
E_a|ψ⟩⟨ψ|E_a / ⟨ψ|E_a|ψ⟩    (3.12)
with probability
Prob(a) = ⟨ψ|E_a|ψ⟩.    (3.13)
The measurement outcomes can be described by a density matrix obtained by summing over all possible outcomes weighted by the probability of that outcome (rather than by choosing one particular outcome), in which case the measurement modifies the initial pure state according to
|ψ⟩⟨ψ| → Σ_a E_a|ψ⟩⟨ψ|E_a.    (3.14)
This is the ensemble of pure states describing the measurement outcomes -- it is the description we would use if we knew a measurement had been performed, but we did not know the result. Hence, the initial pure state has become a mixed state unless the initial state happened to be an eigenstate of the observable being measured. If the initial state before the measurement were a mixed state with density matrix ρ, then by expressing ρ as an ensemble of pure states we find that the effect of the measurement is
ρ → Σ_a E_a ρ E_a.    (3.15)
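Here is a small numerical sketch of eqs. (3.12)-(3.15) for a qubit (my addition; the projectors and the input state are arbitrary choices): the outcome probabilities, the conditional post-measurement states, and the decohered density matrix of eq. (3.14).

```python
import numpy as np

# Complete set of orthogonal projectors {E_a} on a qubit (z-basis, as an example).
E = [np.array([[1, 0], [0, 0]]), np.array([[0, 0], [0, 1]])]

psi = np.array([0.6, 0.8])                                  # arbitrary pure state
rho = np.outer(psi, psi.conj())

probs = [np.vdot(psi, Ea @ psi).real for Ea in E]           # eq. (3.13)
post = [Ea @ rho @ Ea / p for Ea, p in zip(E, probs)]       # eq. (3.12)
rho_after = sum(Ea @ rho @ Ea for Ea in E)                  # eq. (3.14): outcome unknown
print(probs)        # [0.36, 0.64]
print(rho_after)    # diagonal: the initial coherences are destroyed
```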
Now let's change our perspective on eq. (3.28). Interpret the (ψ̃_a)_i's not as n ≥ N vectors in an N-dimensional space, but rather as N ≤ n vectors (ψ̃ᵀ_i)_a in an n-dimensional space. Then eq. (3.28) becomes the statement that these N vectors form an orthonormal set. Naturally, it is possible to extend these vectors to an orthonormal basis for the n-dimensional space. In other words, there is an n × n matrix u_{ai}, with u_{ai} = (ψ̃_a)_i for i = 1, 2, ..., N, such that
Σ_a u*_{ai} u_{aj} = δ_{ij}.    (3.29)
3.2 Superoperators
3.2.1 The operator-sum representation
We now proceed to the next step of our program of understanding the behavior of one part of a bipartite quantum system. We have seen that a pure state of the bipartite system may behave like a mixed state when we observe subsystem A alone, and that an orthogonal measurement of the bipartite system may be a (nonorthogonal) POVM on A alone. Next we ask, if a state of the bipartite system undergoes unitary evolution, how do we describe the evolution of A alone?
Suppose that the initial density matrix of the bipartite system is a tensor product state of the form
ρ_A ⊗ |0⟩_B B⟨0|;    (3.64)
system A has density matrix ρ_A, and system B is assumed to be in a pure state that we have designated |0⟩_B. The bipartite system evolves for a finite time, governed by the unitary time evolution operator
U_AB (ρ_A ⊗ |0⟩_B B⟨0|) U_AB†.    (3.65)
Now we perform the partial trace over H_B to find the final density matrix of system A,
ρ′_A = tr_B [ U_AB (ρ_A ⊗ |0⟩_B B⟨0|) U_AB† ]
     = Σ_μ B⟨μ|U_AB|0⟩_B ρ_A B⟨0|U_AB†|μ⟩_B,    (3.66)
where {|μ⟩_B} is an orthonormal basis for H_B and B⟨μ|U_AB|0⟩_B is an operator acting on H_A. (If {|i⟩_A ⊗ |μ⟩_B} is an orthonormal basis for H_A ⊗ H_B, then B⟨μ|U_AB|ν⟩_B denotes the operator whose matrix elements are
A⟨i| ( B⟨μ|U_AB|ν⟩_B ) |j⟩_A
= ( A⟨i| ⊗ B⟨μ| ) U_AB ( |j⟩_A ⊗ |ν⟩_B ).)    (3.67)
If we denote
M_μ = B⟨μ|U_AB|0⟩_B,    (3.68)
then we may express ρ′_A as
$(ρ_A) ≡ ρ′_A = Σ_μ M_μ ρ_A M_μ†.    (3.69)
It follows from the unitarity of U_AB that the M_μ's satisfy the property
Σ_μ M_μ† M_μ = Σ_μ B⟨0|U_AB†|μ⟩_B B⟨μ|U_AB|0⟩_B = B⟨0|U_AB† U_AB|0⟩_B = 1_A.    (3.70)
Eq. (3.69) defines a linear map $ that takes linear operators to linear operators. Such a map, if the property in eq. (3.70) is satisfied, is called a superoperator, and eq. (3.69) is called the operator sum representation (or Kraus representation) of the superoperator. A superoperator can be regarded as a linear map that takes density operators to density operators, because it follows from eq. (3.69) and eq. (3.70) that ρ′_A is a density matrix if ρ_A is:
(1) ρ′_A is hermitian: ρ′_A† = Σ_μ M_μ ρ_A† M_μ† = ρ′_A.
(2) ρ′_A has unit trace: tr ρ′_A = Σ_μ tr(ρ_A M_μ† M_μ) = tr ρ_A = 1.
(3) ρ′_A is positive: A⟨ψ|ρ′_A|ψ⟩_A = Σ_μ (⟨ψ|M_μ) ρ_A (M_μ†|ψ⟩) ≥ 0.
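The whole construction is easy to check numerically. The sketch below (my addition) draws a random unitary on H_A ⊗ H_B, reads off the Kraus operators M_μ = B⟨μ|U_AB|0⟩_B of eq. (3.68), and verifies eq. (3.70) and the agreement between eqs. (3.66) and (3.69); the dimensions and the input state are arbitrary choices.

```python
import numpy as np

dA, dB = 2, 3
rng = np.random.default_rng(0)

# A random unitary U_AB on H_A (x) H_B (QR of a random complex matrix).
X = rng.normal(size=(dA*dB, dA*dB)) + 1j * rng.normal(size=(dA*dB, dA*dB))
U, _ = np.linalg.qr(X)
U4 = U.reshape(dA, dB, dA, dB)                    # indices [i, mu, j, nu]

# Kraus operators M_mu = <mu|_B U_AB |0>_B, eq. (3.68).
M = [U4[:, mu, :, 0] for mu in range(dB)]
print(np.allclose(sum(m.conj().T @ m for m in M), np.eye(dA)))   # eq. (3.70)

# Compare the operator-sum map, eq. (3.69), with the unitary representation, eq. (3.66).
rho_A = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)        # arbitrary input state
e0 = np.zeros((dB, dB)); e0[0, 0] = 1.0                          # |0><0| on H_B
rho_AB = U @ np.kron(rho_A, e0) @ U.conj().T
rho_out_unitary = np.einsum('imjm->ij', rho_AB.reshape(dA, dB, dA, dB))  # tr_B
rho_out_kraus = sum(m @ rho_A @ m.conj().T for m in M)
print(np.allclose(rho_out_unitary, rho_out_kraus))               # True
```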
We showed that the operator sum representation in eq. (3.69) follows from the "unitary representation" in eq. (3.66). But furthermore, given the operator sum representation of a superoperator, it is always possible to construct a corresponding unitary representation. We choose H_B to be a Hilbert space whose dimension is at least as large as the number of terms in the operator sum. If |φ⟩_A is any vector in H_A, the {|μ⟩_B} are orthonormal states in H_B, and |0⟩_B is some normalized state in H_B, define the action of U_AB by
U_AB (|φ⟩_A ⊗ |0⟩_B) = Σ_μ M_μ|φ⟩_A ⊗ |μ⟩_B.    (3.71)
This action is inner product preserving:
( Σ_ν A⟨φ₂|M_ν† ⊗ B⟨ν| ) ( Σ_μ M_μ|φ₁⟩_A ⊗ |μ⟩_B )
    = A⟨φ₂| Σ_μ M_μ† M_μ |φ₁⟩_A = A⟨φ₂|φ₁⟩_A;    (3.72)
therefore, U_AB can be extended to a unitary operator acting on all of H_A ⊗ H_B. Taking the partial trace we find
tr_B [ U_AB (|φ⟩_A ⊗ |0⟩_B)( A⟨φ| ⊗ B⟨0| ) U_AB† ]
    = Σ_μ M_μ (|φ⟩_A A⟨φ|) M_μ†.    (3.73)
Since any ρ_A can be expressed as an ensemble of pure states, we recover the operator sum representation acting on an arbitrary ρ_A.
It is clear that the operator sum representation of a given superoperator $ is not unique. We can perform the partial trace in any basis we please. If we use the basis { B⟨μ′| = Σ_ν U_{μν} B⟨ν| } then we obtain the representation
$(ρ_A) = Σ_μ N_μ ρ_A N_μ†,    (3.74)
where N_μ = Σ_ν U_{μν} M_ν. We will see shortly that any two operator-sum representations of the same superoperator are always related this way.
Superoperators are important because they provide us with a formalism for discussing the general theory of decoherence, the evolution of pure states into mixed states. Unitary evolution of ρ_A is the special case in which there is only one term in the operator sum. If there are two or more terms, then there are pure initial states of H_A that become entangled with H_B under evolution governed by U_AB. That is, if the operators M₁ and M₂ appearing in the operator sum are linearly independent, then there is a vector |φ⟩_A such that |φ̃₁⟩_A = M₁|φ⟩_A and |φ̃₂⟩_A = M₂|φ⟩_A are linearly independent, so that the state |φ̃₁⟩_A ⊗ |1⟩_B + |φ̃₂⟩_A ⊗ |2⟩_B + ... has Schmidt number greater than one. Therefore, the pure state |φ⟩_A A⟨φ| evolves to the mixed final state ρ′_A.
Two superoperators $₁ and $₂ can be composed to obtain another superoperator $₂ ∘ $₁; if $₁ describes evolution from yesterday to today, and $₂
describes evolution from today to tomorrow, then $₂ ∘ $₁ describes the evolution from yesterday to tomorrow. But is the inverse of a superoperator also a superoperator; that is, is there a superoperator that describes the evolution from today to yesterday? In fact, you will show in a homework exercise that a superoperator is invertible only if it is unitary.
Unitary evolution operators form a group, but superoperators define a dynamical semigroup. When decoherence occurs, there is an arrow of time; even at the microscopic level, one can tell the difference between a movie that runs forwards and one running backwards. Decoherence causes an irrevocable loss of quantum information -- once the (dead) cat is out of the bag, we can't put it back in again.
3.2.2 Linearity
Now we will broaden our viewpoint a bit and consider the essential properties that should be satisfied by any "reasonable" time evolution law for density matrices. We will see that any such law admits an operator-sum representation, so in a sense the dynamical behavior we extracted by considering part of a bipartite system is actually the most general possible.
A mapping $ : ρ → ρ′ that takes an initial density matrix ρ to a final density matrix ρ′ is a mapping of operators to operators that satisfies
(1) $ preserves hermiticity: ρ′ hermitian if ρ is.
(2) $ is trace preserving: tr ρ′ = 1 if tr ρ = 1.
(3) $ is positive: ρ′ is nonnegative if ρ is.
It is also customary to assume
(0) $ is linear.
While (1), (2), and (3) really are necessary if ρ′ is to be a density matrix, (0) is more open to question. Why linearity?
One possible answer is that nonlinear evolution of the density matrix would be hard to reconcile with any ensemble interpretation. If
$(ρ(λ)) ≡ $(λρ₁ + (1 − λ)ρ₂) = λ$(ρ₁) + (1 − λ)$(ρ₂),    (3.75)
then time evolution is faithful to the probabilistic interpretation of ρ(λ): either (with probability λ) ρ₁ was initially prepared and evolved to $(ρ₁), or (with probability 1 − λ) ρ₂ was initially prepared and evolved to $(ρ₂). But a nonlinear $ typically has consequences that are seemingly paradoxical. Consider, for example, a single qubit evolving according to
$(ρ) = exp[iπσ₁ tr(ρσ₁)] ρ exp[−iπσ₁ tr(ρσ₁)].    (3.76)
One can easily check that $ is positive and trace-preserving. Suppose that the initial density matrix is ρ = (1/2)1, realized as the ensemble
ρ = (1/2)|↑_z⟩⟨↑_z| + (1/2)|↓_z⟩⟨↓_z|.    (3.77)
Since tr(ρσ₁) = 0, the evolution of ρ is trivial, and both representatives of the ensemble are unchanged. If the spin was prepared as |↑_z⟩, it remains in the state |↑_z⟩.
But now imagine that, immediately after preparing the ensemble, we do nothing if the state has been prepared as |↑_z⟩, but we rotate it to |↑_x⟩ if it has been prepared as |↓_z⟩. The density matrix is now
ρ′ = (1/2)|↑_z⟩⟨↑_z| + (1/2)|↑_x⟩⟨↑_x|,    (3.78)
so that tr(ρ′σ₁) = 1/2. Under evolution governed by $, this becomes $(ρ′) = σ₁ρ′σ₁. In this case then, if the spin was prepared as |↑_z⟩, it evolves to the orthogonal state |↓_z⟩.
The state initially prepared as |↑_z⟩ evolves differently under these two scenarios. But what is the difference between the two cases? The difference was that if the spin was initially prepared as |↓_z⟩, we took different actions: doing nothing in case (1) but rotating the spin in case (2). Yet we have found that the spin behaves differently in the two cases, even if it was initially prepared as |↑_z⟩!
We are accustomed to saying that ρ describes two (or more) different alternative pure state preparations, only one of which is actually realized each time we prepare a qubit. But we have found that what happens if we prepare |↑_z⟩ actually depends on what we would have done if we had prepared |↓_z⟩ instead. It is no longer sensible, apparently, to regard the two possible preparations as mutually exclusive alternatives. Evolution of the alternatives actually depends on the other alternatives that supposedly were not realized.
Joe Polchinski has called this phenomenon the "Everett phone," because the different "branches of the wave function" seem to be able to "communicate" with one another.
Nonlinear evolution of the density matrix, then, can have strange, perhaps even absurd, consequences. Even so, the argument that nonlinear evolution should be excluded is not completely compelling. Indeed Jim Hartle has argued that there are versions of "generalized quantum mechanics" in which nonlinear evolution is permitted, yet a consistent probability interpretation can be salvaged. Nevertheless, we will follow tradition here and demand that $ be linear.
(where q_μ > 0, Σ_μ q_μ = 1, and each |Φ̃_μ⟩, like |ψ̃⟩_AB, is normalized so that ⟨Φ̃_μ|Φ̃_μ⟩ = N). Invoking the relative-state method, we have
$_A(|φ⟩_A A⟨φ|) = B⟨φ*| ($_A ⊗ I_B)(|ψ̃⟩_AB AB⟨ψ̃|) |φ*⟩_B
    = Σ_μ q_μ B⟨φ*|Φ̃_μ⟩_AB AB⟨Φ̃_μ|φ*⟩_B.    (3.100)
Now we are almost done; we define an operator M_μ on H_A by
M_μ : |φ⟩_A → √q_μ B⟨φ*|Φ̃_μ⟩_AB.    (3.101)
We can check that:
1. M_μ is linear, because the map |φ⟩_A → |φ*⟩_B is antilinear.
2. $_A(|φ⟩_A A⟨φ|) = Σ_μ M_μ (|φ⟩_A A⟨φ|) M_μ†, for any pure state |φ⟩_A ∈ H_A.
3. $_A(ρ_A) = Σ_μ M_μ ρ_A M_μ† for any density matrix ρ_A, because ρ_A can be expressed as an ensemble of pure states, and $_A is linear.
4. Σ_μ M_μ† M_μ = 1_A, because $_A is trace preserving for any ρ_A.
Thus, we have constructed an operator-sum representation of $_A.
Put succinctly, the argument went as follows. Because $_A is completely positive, $_A ⊗ I_B takes a maximally entangled density matrix on H_A ⊗ H_B to another density matrix. This density matrix can be expressed as an ensemble of pure states. With each of these pure states in H_A ⊗ H_B, we may associate (via the relative-state method) a term in the operator sum.
Viewing the operator-sum representation this way, we may quickly establish two important corollaries:
How many Kraus operators? Each M_μ is associated with a state |Φ̃_μ⟩ in the ensemble representation of ρ̃_AB. Since ρ̃_AB has a rank at most N² (where N = dim H_A), $_A always has an operator-sum representation with at most N² Kraus operators.
How ambiguous? We remarked earlier that the Kraus operators
N_a = Σ_μ M_μ U_{μa}    (3.102)
(where U_{μa} is unitary) represent the same superoperator $ as the M_μ's. Now we can see that any two Kraus representations of $ must always be related in this way. (If there are more N_a's than M_μ's, then it is understood that some zero operators are added to the M_μ's so that the two operator sets have the same cardinality.) This property may be viewed as a consequence of the GHJW theorem.
The relative-state construction described above established a one-to-one correspondence between ensemble representations of the (unnormalized) density matrix ($_A ⊗ I_B)(|ψ̃⟩_AB AB⟨ψ̃|) and operator-sum representations of $_A. (We explicitly described how to proceed from the ensemble representation to the operator sum, but we can clearly go the other way, too: If
$_A(|i⟩_A A⟨j|) = Σ_μ M_μ |i⟩_A A⟨j| M_μ†,    (3.103)
then
($_A ⊗ I_B)(|ψ̃⟩_AB AB⟨ψ̃|) = Σ_{i,j} Σ_μ (M_μ|i⟩_A ⊗ |i′⟩_B)(A⟨j|M_μ† ⊗ B⟨j′|)
    = Σ_μ q_μ |Φ̃_μ⟩_AB AB⟨Φ̃_μ|,    (3.104)
where
√q_μ |Φ̃_μ⟩_AB = Σ_i M_μ|i⟩_A ⊗ |i′⟩_B. )    (3.105)
Now consider two such ensembles (or, correspondingly, two operator-sum representations of $_A), {√q_μ |Φ̃_μ⟩_AB} and {√p_a |Ψ̃_a⟩_AB}. For each ensemble, there is a corresponding "purification" in H_AB ⊗ H_C:
Σ_μ √q_μ |Φ̃_μ⟩_AB ⊗ |γ_μ⟩_C,
Σ_a √p_a |Ψ̃_a⟩_AB ⊗ |α_a⟩_C,    (3.106)
where {|γ_μ⟩_C} and {|α_a⟩_C} are two different orthonormal sets in H_C. The GHJW theorem asserts that these two purifications are related by 1_AB ⊗ U′_C, a unitary transformation on H_C. Therefore,
Σ_a √p_a |Ψ̃_a⟩_AB ⊗ |α_a⟩_C = Σ_μ √q_μ |Φ̃_μ⟩_AB ⊗ U′_C|γ_μ⟩_C
    = Σ_{μ,a} √q_μ |Φ̃_μ⟩_AB ⊗ U_{μa}|α_a⟩_C,    (3.107)
where, to establish the second equality, we note that the orthonormal bases {|γ_μ⟩_C} and {|α_a⟩_C} are related by a unitary transformation, and that a product of unitary transformations is unitary. We conclude that
√p_a |Ψ̃_a⟩_AB = Σ_μ √q_μ |Φ̃_μ⟩_AB U_{μa},    (3.108)
(where U_{μa} is unitary) from which follows
N_a = Σ_μ M_μ U_{μa}.    (3.109)
Remark. Since we have already established that we can proceed from an operator-sum representation of $ to a unitary representation, we have now found that any "reasonable" evolution law for density operators on H_A can
be realized by a unitary transformation U_AB that acts on H_A ⊗ H_B according to
U_AB : |ψ⟩_A ⊗ |0⟩_B → Σ_μ M_μ|ψ⟩_A ⊗ |μ⟩_B.    (3.110)
Is this result surprising? Perhaps it is. We may interpret a superoperator as describing the evolution of a system (A) that interacts with its environment (B). The general states of system plus environment are entangled states. But in eq. (3.110), we have assumed an initial state of A and B that is unentangled. Apparently, though a real system is bound to be entangled with its surroundings, for the purpose of describing the evolution of its density matrix there is no loss of generality if we imagine that there is no pre-existing entanglement when we begin to track the evolution!
Remark: The operator-sum representation provides a very convenient way to express any completely positive $. But a positive $ does not admit such a representation if it is not completely positive. As far as I know, there is no convenient way, comparable to the Kraus representation, to express the most general positive $.
If an error occurs, then |ψ⟩ evolves to an ensemble of the three states σ₁|ψ⟩, σ₂|ψ⟩, σ₃|ψ⟩, all occurring with equal likelihood.
Unitary representation
The depolarizing channel can be represented by a unitary operator acting on H_A ⊗ H_E, where H_E has dimension 4. (I am calling it H_E here to encourage you to think of the auxiliary system as the environment.) The unitary operator U_AE acts as
U_AE : |ψ⟩_A ⊗ |0⟩_E → √(1 − p) |ψ⟩_A ⊗ |0⟩_E + √(p/3) [ σ₁|ψ⟩_A ⊗ |1⟩_E + σ₂|ψ⟩_A ⊗ |2⟩_E + σ₃|ψ⟩_A ⊗ |3⟩_E ].    (3.111)
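Reading off the Kraus operators from eq. (3.111) gives M₀ = √(1−p) 1 and M_k = √(p/3) σ_k. A short numerical sketch (my addition; p and the input polarization are arbitrary choices) checks the normalization condition and shows that the polarization vector shrinks by the factor 1 − 4p/3, anticipating the Bloch-sphere picture below.

```python
import numpy as np

p = 0.3                                               # error probability (arbitrary choice)
I2 = np.eye(2, dtype=complex)
s = [np.array([[0, 1], [1, 0]], dtype=complex),
     np.array([[0, -1j], [1j, 0]]),
     np.array([[1, 0], [0, -1]], dtype=complex)]

# Kraus operators read off from eq. (3.111): M_0 = sqrt(1-p) 1, M_k = sqrt(p/3) sigma_k.
M = [np.sqrt(1 - p) * I2] + [np.sqrt(p / 3) * sk for sk in s]
print(np.allclose(sum(m.conj().T @ m for m in M), I2))            # sum M^dag M = 1

# Apply the channel to rho = (1/2)(1 + P.sigma) and read off the output polarization.
P = np.array([0.3, -0.5, 0.6])                                    # arbitrary Bloch vector
rho = 0.5 * (I2 + sum(P[k] * s[k] for k in range(3)))
rho_out = sum(m @ rho @ m.conj().T for m in M)
P_out = np.array([np.trace(rho_out @ sk).real for sk in s])
print(np.allclose(P_out, (1 - 4*p/3) * P))                        # polarization shrinks uniformly
```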
Bloch-sphere representation
This will be worked out in a homework exercise.
Interpretation
We might interpret the phase-damping channel as describing a heavy "classical" particle (e.g., an interstellar dust grain) interacting with a background gas of light particles (e.g., the 3 K microwave photons). We can imagine that the dust is initially prepared in a superposition of position eigenstates |ψ⟩ = (1/√2)(|x⟩ + |−x⟩) (or more generally a superposition of position-space wavepackets with little overlap). We might be able to monitor the behavior of the dust particle, but it is hopeless to keep track of the quantum state of all the photons that scatter from the particle; for our purposes, the quantum state of the particle is described by the density matrix ρ obtained by tracing over the photon degrees of freedom.
Our analysis of the phase damping channel indicates that if photons are scattered by the dust particle at a rate Γ, then the off-diagonal terms in ρ decay like exp(−Γt), and so become completely negligible for t ≫ Γ⁻¹. At that point, the coherence of the superposition of position eigenstates is completely lost -- there is no chance that we can recombine the wavepackets and induce them to interfere. (If we attempt to do a double-slit interference pattern with dust grains, we will not see any interference pattern if it takes a time t ≫ Γ⁻¹ for the grain to travel from the source to the screen.)
The dust grain is heavy. Because of its large inertia, its state of motion is little affected by the scattered photons. Thus, there are two disparate time scales relevant to its dynamics. On the one hand, there is a damping time scale, the time for a significant amount of the particle's momentum to be transferred to the photons; this is a long time if the particle is heavy. On the other hand, there is the decoherence time scale. In this model, the time scale for decoherence is of order Γ⁻¹, the time for a single photon to be scattered by the dust grain, which is far shorter than the damping time scale. For a
macroscopic object, decoherence is fast.
As we have already noted, the phase-damping channel picks out a preferred basis for decoherence, which in our "interpretation" we have assumed to be the position-eigenstate basis. Physically, decoherence prefers the spatially localized states of the dust grain because the interactions of photons and grains are localized in space. Grains in distinguishable positions tend to scatter the photons of the environment into mutually orthogonal states.
Even if the separation between the "grains" were so small that it could not be resolved very well by the scattered photons, the decoherence process would still work in a similar way. Perhaps photons that scatter off grains at positions $x$ and $-x$ are not mutually orthogonal, but instead have an overlap
$$
\langle \gamma_+ | \gamma_- \rangle = 1 - \varepsilon, \qquad \varepsilon \ll 1. \qquad (3.128)
$$
The phase-damping channel would still describe this situation, but with $p$ replaced by $p\varepsilon$ (if $p$ is still the probability of a scattering event). Thus, the decoherence rate would become $\Gamma_{\rm dec} = \varepsilon\,\Gamma_{\rm scat}$, where $\Gamma_{\rm scat}$ is the scattering rate (see the homework).
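A minimal numerical sketch (my own, with assumed values of p, the overlap deficit, and the scattering rate): after the mean number of scattering events accumulated by time t, the off-diagonal element of $\rho$ is suppressed by $(1 - p\varepsilon)^{\Gamma_{\rm scat} t} \approx e^{-\varepsilon \Gamma_{\rm scat} t}$, which is the statement $\Gamma_{\rm dec} = \varepsilon\,\Gamma_{\rm scat}$.

    import numpy as np

    p, eps = 1.0, 0.01            # per-event scattering probability, overlap deficit (assumed)
    gamma_scat = 1.0              # scattering rate, arbitrary units
    t = np.linspace(0, 300, 4)
    n_events = gamma_scat * t     # mean number of scattering events by time t

    print((1 - p * eps) ** n_events)        # actual suppression of the off-diagonal term
    print(np.exp(-eps * gamma_scat * t))    # exp(-Gamma_dec t), Gamma_dec = eps * Gamma_scat
    # the two agree to a few per cent for small eps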
The intuition we distill from this simple model applies to a vast variety of physical situations. A coherent superposition of macroscopically distinguishable states of a "heavy" object decoheres very rapidly compared to its damping rate. The spatial locality of the interactions of the system with its environment gives rise to a preferred "local" basis for decoherence. Presumably, the same principles would apply to the decoherence of a "cat state" $\frac{1}{\sqrt{2}}(|{\rm dead}\rangle + |{\rm alive}\rangle)$, since "deadness" and "aliveness" can be distinguished by localized probes.
9 The nth level of excitation of the oscillator may be interpreted as a state of n noninteracting particles; the rate is $n\Gamma$ because any one of the n particles can decay.
10 This model extends our discussion of the amplitude-damping channel to a damped oscillator rather than a damped qubit.
$$
= \Gamma\, {\rm tr}\Big(\tfrac{1}{2}[a^\dagger, a]\, a\, \rho_I\Big) = -\frac{\Gamma}{2}\, {\rm tr}(a\rho_I) = -\frac{\Gamma}{2}\langle \tilde a \rangle. \qquad (3.161)
$$
Integrating this equation, we obtain
$$
\langle \tilde a(t)\rangle = e^{-\Gamma t/2}\, \langle \tilde a(0)\rangle. \qquad (3.162)
$$
Similarly, the occupation number of the oscillator $n \equiv a^\dagger a = \tilde a^\dagger \tilde a$ decays according to
$$
\frac{d}{dt}\langle n\rangle = \frac{d}{dt}\langle \tilde a^\dagger \tilde a\rangle = {\rm tr}\,(a^\dagger a\, \dot\rho_I)
= \Gamma\, {\rm tr}\Big(a^\dagger a\, a\rho_I a^\dagger - \tfrac{1}{2} a^\dagger a\, a^\dagger a\, \rho_I - \tfrac{1}{2} a^\dagger a\, \rho_I\, a^\dagger a\Big)
= \Gamma\, {\rm tr}\big(a^\dagger [a^\dagger, a]\, a\, \rho_I\big) = -\Gamma\, {\rm tr}(a^\dagger a \rho_I) = -\Gamma \langle n\rangle, \qquad (3.163)
$$
which integrates to
$$
\langle n(t)\rangle = e^{-\Gamma t}\, \langle n(0)\rangle. \qquad (3.164)
$$
Thus $\Gamma$ is the damping rate of the oscillator. We can interpret the nth excitation state of the oscillator as a state of n noninteracting particles, each with a decay probability $\Gamma$ per unit time; hence eq. (3.164) is just the exponential law satisfied by the population of decaying particles.
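A short numerical check of eq. (3.164) (my own sketch, assuming a Fock space truncated at 20 levels, an initial n = 3 Fock state, and simple Euler integration of the zero-temperature master equation of eq. (3.210)):

    import numpy as np

    N, Gamma, dt, steps = 20, 1.0, 1e-3, 2000
    a = np.diag(np.sqrt(np.arange(1, N)), k=1)      # truncated annihilation operator
    ad = a.conj().T
    num = ad @ a

    rho = np.zeros((N, N)); rho[3, 3] = 1.0         # start in the n = 3 Fock state (arbitrary)
    for _ in range(steps):                          # Euler step of the Lindblad equation
        rho = rho + dt * Gamma * (a @ rho @ ad - 0.5 * (num @ rho + rho @ num))

    t = steps * dt
    print(np.trace(num @ rho).real, 3 * np.exp(-Gamma * t))   # both ~ 3*exp(-2) ~ 0.406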
More interesting is what the master equation tells us about decoherence. The details of that analysis will be a homework exercise. But we will analyze here a simpler problem: an oscillator undergoing phase damping.
$$
|{\rm No~decay}\rangle_{\rm atom}\, |{\rm Alive}\rangle_{\rm cat}\, |{\rm Know~it's~Alive}\rangle_{\rm me}, \qquad {\rm Prob} = \tfrac{1}{2}. \qquad (3.178)
$$
This describes two alternatives, but for either alternative, I am certain
about the health of the cat. I never see a cat that is half alive and half dead.
(I am in an eigenstate of the \certainty operator," in accord with experience.)
By assuming that the wave function describes reality and that all evo-
lution is unitary, we are led to the \many-worlds interpretation" of quan-
tum theory. In this picture, each time there is a \measurement," the wave
function of the universe \splits" into two branches, corresponding to the
two possible outcomes. After many measurements, there are many branches
(many worlds), all with an equal claim to describing reality. This prolifera-
tion of worlds seems like an ironic consequence of our program to develop the
most economical possible description. But we ourselves follow one particular
branch, and for the purpose of predicting what we will see in the next instant,
the many other branches are of no consequence. The proliferation of worlds
comes at no cost to us. The \many worlds" may seem weird, but should
we be surprised if a complete description of reality, something completely
foreign to our experience, seems weird to us?
By including ourselves in the reality described by the wave function, we
have understood why we perceive a de nite outcome to a measurement, but
there is still a further question: how does the concept of probability enter
into this (deterministic) formalism? This question remains troubling, for to
answer it we must be prepared to state what is meant by \probability."
The word \probability" is used in two rather di erent senses. Sometimes
probability means frequency. We say the probability of a coin coming up
heads is 1=2 if we expect, as we toss the coin many times, the number of
heads divided by the total number of tosses to converge to 1=2. (This is a
tricky concept though; even if the probability is 1=2, the coin still might come
up heads a trillion times in a row.) In rigorous mathematical discussions,
probability theory often seems to be a branch of measure theory { it concerns
the properties of in nite sequences.
But in everyday life, and also in quantum theory, probabilities typically
are not frequencies. When we make a measurement, we do not repeat it
an in nite number of times on identically prepared systems. In the Everett
viewpoint, or in cosmology, there is just one universe, not many identically
prepared ones.
So what is a probability? In practice, it is a number that quantifies the plausibility of a proposition given a state of knowledge. Perhaps surprisingly, this view can be made the basis of a well-defined mathematical theory, sometimes called the "Bayesian" view of probability. The term "Bayesian" reflects the way probability theory is typically used (both in science and in everyday life): to test a hypothesis given some observed data. Hypothesis testing is carried out using Bayes's rule for conditional probability
$$
P(A_0|B) = P(B|A_0)\, P(A_0) / P(B). \qquad (3.179)
$$
For example, suppose that $A_0$ is the preparation of a particular quantum state, and $B$ is a particular outcome of a measurement of the state. We have made the measurement (obtaining $B$) and now we want to infer how the state was prepared (compute $P(A_0|B)$). Quantum mechanics allows us to compute $P(B|A_0)$. But it does not tell us $P(A_0)$ (or $P(B)$). We have to make a guess of $P(A_0)$, which is possible if we adopt a "principle of indifference": if we have no knowledge that $A_i$ is more or less likely than $A_j$, we assume $P(A_i) = P(A_j)$. Once an ensemble of preparations is chosen, we can compute
$$
P(B) = \sum_i P(B|A_i)\, P(A_i), \qquad (3.180)
$$
and so obtain $P(A_0|B)$ by applying Bayes's rule.
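A toy numerical illustration of this inference (my own example, not from the notes): two equally likely preparations, spin up along z or spin up along x, and a $\sigma_3$ measurement that yields the outcome "up".

    # Hypothetical example of Bayes's rule (eq. 3.179) with a principle-of-indifference prior
    priors = {"up_z": 0.5, "up_x": 0.5}
    likelihood_up = {"up_z": 1.0, "up_x": 0.5}   # P("up" | preparation), from quantum mechanics

    p_up = sum(likelihood_up[a] * priors[a] for a in priors)           # eq. (3.180)
    posterior = {a: likelihood_up[a] * priors[a] / p_up for a in priors}
    print(posterior)   # {'up_z': 2/3, 'up_x': 1/3}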
But if our attitude is that probability theory quantifies plausibility given a state of knowledge, we are obligated to ask "whose state of knowledge?" To recover an objective theory, we must interpret probability in quantum theory not as a prediction based on our actual state of knowledge, but rather as a prediction based on the most complete possible knowledge about the quantum state. If we prepare $|\uparrow_x\rangle$ and measure $\sigma_3$, then we say that the result is $|\uparrow_z\rangle$ with probability 1/2, not because that is the best prediction we can make based on what we know, but because it is the best prediction anyone can make, no matter how much they know. It is in this sense that the outcome is truly random; it cannot be predicted with certainty even when our knowledge is complete (in contrast to the pseudo-randomness that arises in classical physics because our knowledge is incomplete).
So how, now, are we to extract probabilities from Everett's deterministic
universe? Probabilities arise because we (a part of the system) cannot predict
our future with certainty. I know the formalism, I know the Hamiltonian and
wave function of the universe, I know my branch of the wave function. Now
I am about to look at the cat. A second from now, I will either be certain
that the cat is dead or certain that it is alive. Yet even with all I
know, I cannot predict the future. Even with complete knowledge about the
present, I cannot say what my state of knowledge will be after I look at the
cat. The best I can do is assign probabilities to the outcomes. So, while the
wave function of the universe is deterministic I, as a part of the system, can
do no better than making probabilistic predictions.
Of course, as already noted, decoherence is a crucial part of this story.
We may consistently assign probabilities to the alternatives Dead and Alive
only if there is no (or at least negligible) possibility of interference among the
alternatives. Probabilities make sense only when we can identify an exhaus-
tive set of mutually exclusive alternatives. Since the issue is really whether
interference might arise at a later time, we cannot decide whether probabil-
ity theory applies by considering a quantum state at a xed time; we must
examine a set of mutually exclusive (coarse-grained) histories, or sequences
of events. There is a sophisticated technology (\decoherence functionals")
for adjudicating whether the various histories decohere to a sucient extent
for probabilities to be sensibly assigned.
So the Everett viewpoint can be reconciled with the quantum indeter-
minism that we observe, but there is still a troubling gap in the picture, at
least as far as I can tell. I am about to look at the cat, and I know that the
density matrix a second from now will be
We can compute
$$
\langle \psi_x^{(N)} | \bar\sigma_3 | \psi_x^{(N)} \rangle = 0,
$$
$$
\langle \psi_x^{(N)} | \left(\bar\sigma_3\right)^2 | \psi_x^{(N)} \rangle
= \frac{1}{N^2} \sum_{i,j} \langle \psi_x^{(N)} | \sigma_3^{(i)} \sigma_3^{(j)} | \psi_x^{(N)} \rangle
= \frac{1}{N^2} \sum_{i,j} \delta_{ij} = \frac{N}{N^2} = \frac{1}{N}. \qquad (3.186)
$$
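A quick numerical check (my own, assuming the state is the product state with every spin pointing along +x): the expectation of the averaged operator $\bar\sigma_3 = \frac{1}{N}\sum_i \sigma_3^{(i)}$ vanishes and its variance is 1/N.

    import numpy as np
    from functools import reduce

    N = 4
    sz = np.diag([1.0, -1.0])
    up_x = np.array([1.0, 1.0]) / np.sqrt(2)
    psi = reduce(np.kron, [up_x] * N)               # |up_x> tensored N times

    def sz_on(i):                                   # sigma_3 acting on spin i
        ops = [np.eye(2)] * N
        ops[i] = sz
        return reduce(np.kron, ops)

    sbar = sum(sz_on(i) for i in range(N)) / N      # averaged operator
    print(psi @ sbar @ psi)                         # 0
    print(psi @ sbar @ sbar @ psi)                  # 1/N = 0.25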
3.7 Summary
POVM. If we restrict our attention to a subspace of a larger Hilbert space, then an orthogonal (Von Neumann) measurement performed on the larger space cannot in general be described as an orthogonal measurement on the subspace. Rather, it is a generalized measurement or POVM; the outcome $a$ occurs with a probability
$$
{\rm Prob}(a) = {\rm tr}\,(F_a \rho), \qquad (3.191)
$$
where $\rho$ is the density matrix of the subsystem, each $F_a$ is a positive hermitian operator, and the $F_a$'s satisfy
$$
\sum_a F_a = 1. \qquad (3.192)
$$
A POVM in $\mathcal{H}_A$ can be realized as a unitary transformation on the tensor product $\mathcal{H}_A \otimes \mathcal{H}_B$, followed by an orthogonal measurement in $\mathcal{H}_B$.
Superoperator. Unitary evolution on $\mathcal{H}_A \otimes \mathcal{H}_B$ will not in general appear to be unitary if we restrict our attention to $\mathcal{H}_A$ alone. Rather, evolution in $\mathcal{H}_A$ will be described by a superoperator (which can be inverted by another superoperator only if it is unitary). A general superoperator \$ has an operator-sum (Kraus) representation:
$$
\$ : \rho \to \$(\rho) = \sum_\mu M_\mu\, \rho\, M_\mu^\dagger, \qquad (3.193)
$$
where
$$
\sum_\mu M_\mu^\dagger M_\mu = 1. \qquad (3.194)
$$
In fact, any reasonable (linear and completely positive) mapping of density matrices to density matrices has unitary and operator-sum representations.
Decoherence. Decoherence, the decay of quantum information due to the interaction of a system with its environment, can be described by a superoperator. If the environment frequently "scatters" off the system, and the state of the environment is not monitored, then off-diagonal terms in the density matrix of the system decay rapidly in a preferred basis (typically a spatially localized basis selected by the nature of the coupling of the system to the environment). The time scale for decoherence is set by the scattering rate, which may be much larger than the damping rate for the system.
Master Equation. When the relevant dynamical time scale of an open quantum system is long compared to the time for the environment to "forget" quantum information, the evolution of the system is effectively local in time (the Markovian approximation). Much as general unitary evolution is generated by a Hamiltonian, a general Markovian superoperator is generated by a Lindbladian $\mathcal{L}$ as described by the master equation:
$$
\dot\rho = \mathcal{L}[\rho] = -i[H, \rho] + \sum_\mu \left( L_\mu \rho L_\mu^\dagger - \tfrac{1}{2} L_\mu^\dagger L_\mu \rho - \tfrac{1}{2} \rho L_\mu^\dagger L_\mu \right). \qquad (3.195)
$$
Here each Lindblad operator (or quantum jump operator) $L_\mu$ represents a "quantum jump" that could in principle be detected if we monitored the environment faithfully. By solving the master equation, we can compute the decoherence rate of an open system.
3.8 Exercises
3.1 Realization of a POVM
Consider the POVM defined by the four positive operators
$$
P_1 = \tfrac{1}{2}\,|\!\uparrow_{\hat z}\rangle\langle\uparrow_{\hat z}\!|, \quad
P_2 = \tfrac{1}{2}\,|\!\downarrow_{\hat z}\rangle\langle\downarrow_{\hat z}\!|, \quad
P_3 = \tfrac{1}{2}\,|\!\uparrow_{\hat x}\rangle\langle\uparrow_{\hat x}\!|, \quad
P_4 = \tfrac{1}{2}\,|\!\downarrow_{\hat x}\rangle\langle\downarrow_{\hat x}\!|. \qquad (3.196)
$$
Show how this POVM can be realized as an orthogonal measurement
in a two-qubit Hilbert space, if one ancilla spin is introduced.
3.2 Invertibility of superoperators
The purpose of this exercise is to show that a superoperator is invertible only if it is unitary. Recall that any superoperator has an operator-sum representation; it acts on a pure state as
$$
\mathcal{M}(|\psi\rangle\langle\psi|) = \sum_\mu M_\mu |\psi\rangle\langle\psi| M_\mu^\dagger, \qquad (3.197)
$$
where $\sum_\mu M_\mu^\dagger M_\mu = 1$. Another superoperator $\mathcal{N}$ is said to be the inverse of $\mathcal{M}$ if $\mathcal{N} \circ \mathcal{M} = \mathcal{I}$, or
$$
\sum_{\mu, a} N_a M_\mu |\psi\rangle\langle\psi| M_\mu^\dagger N_a^\dagger = |\psi\rangle\langle\psi|, \qquad (3.198)
$$
for any $|\psi\rangle$. It follows that
$$
\sum_{\mu, a} |\langle\psi| N_a M_\mu |\psi\rangle|^2 = 1. \qquad (3.199)
$$
$$
M_2 = \sqrt{p}\; \tfrac{1}{2}\left( 1 - \sigma_3 \right). \qquad (3.203)
$$
a) Find an alternative representation using only two Kraus operators $N_0, N_1$.
b) Find a unitary $3\times 3$ matrix $U_{\mu a}$ such that your Kraus operators found in (a) (augmented by $N_2 = 0$) are related to $M_{0,1,2}$ by
$$
M_\mu = \sum_a U_{\mu a} N_a. \qquad (3.204)
$$
c) Consider a single-qubit channel with a unitary representation
$$
|0\rangle_A \otimes |0\rangle_E \to \sqrt{1-p}\; |0\rangle_A \otimes |0\rangle_E + \sqrt{p}\; |0\rangle_A \otimes |\gamma_0\rangle_E,
$$
$$
|1\rangle_A \otimes |0\rangle_E \to \sqrt{1-p}\; |1\rangle_A \otimes |0\rangle_E + \sqrt{p}\; |1\rangle_A \otimes |\gamma_1\rangle_E, \qquad (3.205)
$$
where $|\gamma_0\rangle_E$ and $|\gamma_1\rangle_E$ are normalized states, both orthogonal to $|0\rangle_E$, that satisfy
$$
{}_E\langle \gamma_0 | \gamma_1 \rangle_E = 1 - \varepsilon, \qquad 0 < \varepsilon < 1. \qquad (3.206)
$$
Show that this is again the phase-damping channel, and find its operator-sum representation with two Kraus operators.
d) Suppose that the channel in (c) describes what happens to the qubit when a single photon scatters from it. Find the decoherence rate $\Gamma_{\rm decoh}$ in terms of the scattering rate $\Gamma_{\rm scatt}$.
3.6 Decoherence on the Bloch sphere
Parametrize the density matrix of a single qubit as
$$
\rho = \tfrac{1}{2}\left( 1 + \vec{P} \cdot \vec{\sigma} \right). \qquad (3.207)
$$
a) Describe what happens to $\vec{P}$ under the action of the phase-damping channel.
b) Describe what happens to $\vec{P}$ under the action of the amplitude-damping channel defined by the Kraus operators
$$
M_0 = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1-p} \end{pmatrix}, \qquad
M_1 = \begin{pmatrix} 0 & \sqrt{p} \\ 0 & 0 \end{pmatrix}. \qquad (3.208)
$$
c) The same for the "two-Pauli channel":
$$
M_0 = \sqrt{1-p}\; 1, \qquad
M_1 = \sqrt{\tfrac{p}{2}}\; \sigma_1, \qquad
M_2 = \sqrt{\tfrac{p}{2}}\; \sigma_3. \qquad (3.209)
$$
3.7 Decoherence of the damped oscillator
We saw in class that, for an oscillator that can emit quanta into a zero-temperature reservoir, the interaction-picture density matrix $\rho_I(t)$ of the oscillator obeys the master equation
$$
\dot\rho_I = \Gamma \left( a \rho_I a^\dagger - \tfrac{1}{2}\, a^\dagger a\, \rho_I - \tfrac{1}{2}\, \rho_I\, a^\dagger a \right), \qquad (3.210)
$$
where $a$ is the annihilation operator of the oscillator.
a) Consider the quantity
$$
X(\lambda, t) = {\rm tr}\left[ \rho_I(t)\, e^{\lambda a^\dagger} e^{-\lambda^* a} \right] \qquad (3.211)
$$
(where $\lambda$ is a complex number). Use the master equation to derive and solve a differential equation for $X(\lambda, t)$. You should find
$$
X(\lambda, t) = X(\lambda', 0), \qquad (3.212)
$$
where $\lambda'$ is a function of $\lambda$, $\Gamma$, and $t$. What is this function $\lambda'(\lambda, \Gamma, t)$?
b) Now suppose that a "cat state" of the oscillator is prepared at $t = 0$:
$$
|{\rm cat}\rangle = \frac{1}{\sqrt{2}}\left( |\alpha_1\rangle + |\alpha_2\rangle \right), \qquad (3.213)
$$
where $|\alpha\rangle$ denotes the coherent state
$$
|\alpha\rangle = e^{-|\alpha|^2/2}\, e^{\alpha a^\dagger} |0\rangle. \qquad (3.214)
$$
Use the result of (a) to infer the density matrix at a later time $t$. Assuming $\Gamma t \ll 1$, at what rate do the off-diagonal terms in $\rho$ decay (in this coherent-state basis)?
Chapter 4
Quantum Entanglement
4.1 Nonseparability of EPR pairs
4.1.1 Hidden quantum information
The deep ways that quantum information differs from classical information involve the properties, implications, and uses of quantum entanglement. Recall from §2.4.1 that a bipartite pure state is entangled if its Schmidt number is greater than one. Entangled states are interesting because they exhibit correlations that have no classical analog. We will begin the study of these correlations in this chapter.
Recall, for example, the maximally entangled state of two qubits defined in §3.4.1:
$$
|\phi^+\rangle_{AB} = \frac{1}{\sqrt{2}}\left( |00\rangle_{AB} + |11\rangle_{AB} \right). \qquad (4.1)
$$
"Maximally entangled" means that when we trace over qubit $B$ to find the density operator $\rho_A$ of qubit $A$, we obtain a multiple of the identity operator
$$
\rho_A = {\rm tr}_B\left( |\phi^+\rangle_{AB}\; {}_{AB}\langle \phi^+| \right) = \tfrac{1}{2}\, 1_A \qquad (4.2)
$$
(and similarly $\rho_B = \tfrac{1}{2}\, 1_B$). This means that if we measure spin $A$ along any axis, the result is completely random; we find spin up with probability 1/2 and spin down with probability 1/2. Therefore, if we perform any local measurement of $A$ or $B$, we acquire no information about the preparation of the state; instead we merely generate a random bit. This situation contrasts
sharply with the case of a single qubit in a pure state; there we can store a bit by preparing, say, either $|\uparrow_{\hat n}\rangle$ or $|\downarrow_{\hat n}\rangle$, and we can recover that bit reliably by measuring along the $\hat n$-axis. With two qubits, we ought to be able to store two bits, but in the state $|\phi^+\rangle_{AB}$ this information is hidden; at least, we can't acquire it by measuring $A$ or $B$.
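A short numpy check of eq. (4.2) (my own illustration): tracing qubit B out of $|\phi^+\rangle$ leaves the maximally mixed state on A, so no local measurement reveals the preparation.

    import numpy as np

    phi_plus = np.zeros(4); phi_plus[0] = phi_plus[3] = 1 / np.sqrt(2)   # (|00> + |11>)/sqrt(2)
    rho_AB = np.outer(phi_plus, phi_plus)

    # partial trace over qubit B: reshape to (A, B, A', B') and trace the B indices
    rho_A = np.trace(rho_AB.reshape(2, 2, 2, 2), axis1=1, axis2=3)
    print(rho_A)     # [[0.5, 0], [0, 0.5]]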
In fact, $|\phi^+\rangle$ is one member of a basis of four mutually orthogonal states for the two qubits, all of which are maximally entangled: the basis
$$
|\phi^\pm\rangle = \frac{1}{\sqrt{2}}\left( |00\rangle \pm |11\rangle \right), \qquad
|\psi^\pm\rangle = \frac{1}{\sqrt{2}}\left( |01\rangle \pm |10\rangle \right), \qquad (4.3)
$$
introduced in §3.4.1. We can choose to prepare one of these four states, thus encoding two bits in the state of the two-qubit system. One bit is the parity bit ($\phi$ or $\psi$): are the two spins aligned or antialigned? The other is the phase bit ($+$ or $-$): which superposition was chosen of the two states of like parity. Of course, we can recover the information by performing an orthogonal measurement that projects onto the $\{|\phi^+\rangle, |\phi^-\rangle, |\psi^+\rangle, |\psi^-\rangle\}$ basis. But if the two qubits are distantly separated, we cannot acquire this information locally; that is, by measuring $A$ or measuring $B$.
What we can do locally is manipulate this information. Suppose that Alice has access to qubit $A$, but not qubit $B$. She may apply $\sigma_3$ to her qubit, flipping the relative phase of $|0\rangle_A$ and $|1\rangle_A$. This action flips the phase bit stored in the entangled state:
$$
|\phi^+\rangle \leftrightarrow |\phi^-\rangle, \qquad |\psi^+\rangle \leftrightarrow |\psi^-\rangle. \qquad (4.4)
$$
On the other hand, she can apply $\sigma_1$, which flips her spin ($|0\rangle_A \leftrightarrow |1\rangle_A$), and also flips the parity bit of the entangled state:
$$
|\phi^+\rangle \leftrightarrow |\psi^+\rangle, \qquad |\phi^-\rangle \leftrightarrow -|\psi^-\rangle. \qquad (4.5)
$$
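A minimal numpy check of eqs. (4.4) and (4.5) (my own illustration): applying $\sigma_3$ or $\sigma_1$ to Alice's qubit alone toggles the phase bit or the parity bit of the Bell state.

    import numpy as np

    sx = np.array([[0, 1], [1, 0]]); sz = np.diag([1, -1]); I2 = np.eye(2)

    def bell(parity, sign):           # |phi^+->  (parity 0) and |psi^+-> (parity 1)
        v = np.zeros(4)
        v[0 if parity == 0 else 1] = 1 / np.sqrt(2)
        v[3 if parity == 0 else 2] = sign / np.sqrt(2)
        return v

    phi_plus, phi_minus = bell(0, +1), bell(0, -1)
    psi_plus = bell(1, +1)

    print(np.allclose(np.kron(sz, I2) @ phi_plus, phi_minus))   # sigma_3 on A flips the phase bit
    print(np.allclose(np.kron(sx, I2) @ phi_plus, psi_plus))    # sigma_1 on A flips the parity bit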
Bob can manipulate the entangled state similarly. In fact, as we discussed in §2.4, either Alice or Bob can perform a local unitary transformation that changes one maximally entangled state to any other maximally entangled state.1 What their local unitary transformations cannot do is alter $\rho_A = \rho_B = \tfrac{1}{2}\,1$; the information they are manipulating is information that neither one can read.
But now suppose that Alice and Bob are able to exchange (classical) messages about their measurement outcomes; together, then, they can learn about how their measurements are correlated. The entangled basis states are conveniently characterized as the simultaneous eigenstates of two commuting observables:
$$
\sigma_1^{(A)} \sigma_1^{(B)}, \qquad \sigma_3^{(A)} \sigma_3^{(B)}; \qquad (4.6)
$$
the eigenvalue of $\sigma_3^{(A)}\sigma_3^{(B)}$ is the parity bit, and the eigenvalue of $\sigma_1^{(A)}\sigma_1^{(B)}$ is the phase bit. Since these operators commute, they can in principle be measured simultaneously. But they cannot be measured simultaneously if Alice and Bob perform localized measurements. Alice and Bob could both choose to measure their spins along the z-axis, preparing a simultaneous eigenstate of $\sigma_3^{(A)}$ and $\sigma_3^{(B)}$. Since $\sigma_3^{(A)}$ and $\sigma_3^{(B)}$ both commute with the parity operator $\sigma_3^{(A)}\sigma_3^{(B)}$, their orthogonal measurements do not disturb the parity bit, and they can combine their results to infer the parity bit. But $\sigma_3^{(A)}$ and $\sigma_3^{(B)}$ do not commute with the phase operator $\sigma_1^{(A)}\sigma_1^{(B)}$, so their measurement disturbs the phase bit. On the other hand, they could both choose to measure their spins along the x-axis; then they would learn the phase bit at the cost of disturbing the parity bit. But they can't have it both ways. To have any hope of acquiring the parity bit without disturbing the phase bit, they would need to learn about the product $\sigma_3^{(A)}\sigma_3^{(B)}$ without finding out anything about $\sigma_3^{(A)}$ and $\sigma_3^{(B)}$ separately. That cannot be done locally.
Now let us bring Alice and Bob together, so that they can operate on their qubits jointly. How might they acquire both the parity bit and the phase bit of their pair? By applying an appropriate unitary transformation, they can rotate the entangled basis $\{|\phi^\pm\rangle, |\psi^\pm\rangle\}$ to the unentangled basis $\{|00\rangle, |01\rangle, |10\rangle, |11\rangle\}$. Then they can measure qubits $A$ and $B$ separately to acquire the bits they seek. How is this transformation constructed?
1 But of course, this does not suffice to perform an arbitrary unitary transformation on the four-dimensional space $\mathcal{H}_A \otimes \mathcal{H}_B$, which contains states that are not maximally entangled. The maximally entangled states are not a subspace; a superposition of maximally entangled states typically is not maximally entangled.
This is a good time to introduce notation that will be used heavily later in the course, the quantum circuit notation. Qubits are denoted by horizontal lines, and the single-qubit unitary transformation $U$ is denoted by a box on the qubit's line:

    ----[ U ]----

A particular single-qubit unitary we will find useful is the Hadamard transform
$$
H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \frac{1}{\sqrt{2}}\left( \sigma_1 + \sigma_3 \right), \qquad (4.7)
$$
which has the properties
$$
H^2 = 1, \qquad (4.8)
$$
and
$$
H \sigma_1 H = \sigma_3, \qquad H \sigma_3 H = \sigma_1. \qquad (4.9)
$$
(We can envision $H$, up to an overall phase, as a $\theta = \pi$ rotation about the axis $\hat n = \frac{1}{\sqrt{2}}(\hat n_1 + \hat n_3)$ that rotates $\hat x$ to $\hat z$ and vice versa.) The circuit

    ----[ H ]----*----
                 |
    ------------(+)---

(to be read from left to right) represents the product of $H$ applied to the first qubit followed by CNOT with the first bit as the source and the second bit as the target. It is straightforward to see that this circuit transforms the
standard basis to the entangled basis,
$$
\begin{aligned}
|00\rangle &\to \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle)|0\rangle \to |\phi^+\rangle, \\
|01\rangle &\to \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle)|1\rangle \to |\psi^+\rangle, \\
|10\rangle &\to \tfrac{1}{\sqrt{2}}(|0\rangle - |1\rangle)|0\rangle \to |\phi^-\rangle, \\
|11\rangle &\to \tfrac{1}{\sqrt{2}}(|0\rangle - |1\rangle)|1\rangle \to |\psi^-\rangle, \qquad (4.13)
\end{aligned}
$$
so that the first bit becomes the phase bit in the entangled basis, and the second bit becomes the parity bit.
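Here is a small numpy sketch (my own check, not part of the notes) of eq. (4.13): apply H to the first qubit, then CNOT with the first qubit as control, to each computational basis state and compare with the Bell basis.

    import numpy as np

    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])
    circuit = CNOT @ np.kron(H, np.eye(2))            # H on qubit 1, then CNOT

    s = 1 / np.sqrt(2)
    bell = {"phi+": [s, 0, 0, s], "psi+": [0, s, s, 0],
            "phi-": [s, 0, 0, -s], "psi-": [0, s, -s, 0]}
    for col, name in enumerate(["phi+", "psi+", "phi-", "psi-"]):
        basis_state = np.eye(4)[col]                  # |00>, |01>, |10>, |11>
        assert np.allclose(circuit @ basis_state, bell[name])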
Similarly, we can invert the transformation by running the circuit back-
wards (since both CNOT and H square to the identity); if we apply the
inverted circuit to an entangled state, and then measure both bits, we can
learn the value of both the phase bit and the parity bit.
Of course, $H$ acts on only one of the qubits; the "nonlocal" part of our circuit is the controlled-NOT gate: this is the operation that establishes or removes entanglement. If we could only perform an "interstellar CNOT," we would be able to create entanglement among distantly separated pairs, or extract the information encoded in entanglement. But we can't. To do its job, the CNOT gate must act on its target without revealing the value of its source. Local operations and classical communication will not suffice.
4.1.4 Photons
Experiments that test the Bell inequality are done with entangled photons, not with spin-$\frac{1}{2}$ objects. What are the quantum-mechanical predictions for photons?
Suppose, for example, that an excited atom emits two photons that come out back to back, with vanishing angular momentum and even parity. If $|x\rangle$ and $|y\rangle$ are horizontal and vertical linear polarization states of the photon,
then we have seen that
$$
|+\rangle = \frac{1}{\sqrt{2}}\left( |x\rangle + i|y\rangle \right), \qquad
|-\rangle = \frac{1}{\sqrt{2}}\left( i|x\rangle + |y\rangle \right) \qquad (4.23)
$$
are the eigenstates of helicity (angular momentum along the axis of propagation $\hat z$). For two photons, one propagating in the $+\hat z$ direction and the other in the $-\hat z$ direction, the states
$$
|+\rangle_A |-\rangle_B, \qquad |-\rangle_A |+\rangle_B \qquad (4.24)
$$
are invariant under rotations about $\hat z$. (The photons have opposite values of $J_z$, but the same helicity, since they are propagating in opposite directions.)
Under a reflection in the $y$-$z$ plane, the polarization states are modified according to
$$
|x\rangle \to -|x\rangle, \qquad |+\rangle \to +i|-\rangle,
$$
$$
|y\rangle \to |y\rangle, \qquad |-\rangle \to -i|+\rangle; \qquad (4.25)
$$
therefore, the parity eigenstates are the entangled states
$$
\frac{1}{\sqrt{2}}\left( |+\rangle_A |-\rangle_B \pm |-\rangle_A |+\rangle_B \right). \qquad (4.26)
$$
The state with $J_z = 0$ and even parity, then, expressed in terms of the linear polarization states, is
$$
-\frac{i}{\sqrt{2}}\left( |+\rangle_A |-\rangle_B + |-\rangle_A |+\rangle_B \right)
= \frac{1}{\sqrt{2}}\left( |xx\rangle_{AB} + |yy\rangle_{AB} \right) = |\phi^+\rangle_{AB}. \qquad (4.27)
$$
Because of invariance under rotations about z^, the state has this form irre-
spective of how we orient the x and y axes.
We can use a polarization analyzer to measure the linear polarization of
either photon along any axis in the xy plane. Let jx()i and jy()i denote
the linear polarization eigenstates along axes rotated by angle $\theta$ relative to the canonical $x$ and $y$ axes. We may define an operator (the analog of $\vec\sigma \cdot \hat n$)
$$
\tau(\theta) = |x(\theta)\rangle\langle x(\theta)| - |y(\theta)\rangle\langle y(\theta)|, \qquad (4.28)
$$
which has these polarization states as eigenstates with respective eigenvalues $\pm 1$. Since
$$
|x(\theta)\rangle = \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}, \qquad
|y(\theta)\rangle = \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix} \qquad (4.29)
$$
in the $|x\rangle, |y\rangle$ basis, we can easily compute the expectation value
$$
{}_{AB}\langle \phi^+ | \tau^{(A)}(\theta_1)\, \tau^{(B)}(\theta_2) | \phi^+\rangle_{AB}. \qquad (4.30)
$$
Using rotational invariance:
$$
= {}_{AB}\langle \phi^+ | \tau^{(A)}(0)\, \tau^{(B)}(\theta_2 - \theta_1) | \phi^+\rangle_{AB}
= \tfrac{1}{2}\, {}_B\langle x | \tau^{(B)}(\theta_2 - \theta_1) | x\rangle_B
- \tfrac{1}{2}\, {}_B\langle y | \tau^{(B)}(\theta_2 - \theta_1) | y\rangle_B
$$
$$
= \cos^2(\theta_2 - \theta_1) - \sin^2(\theta_2 - \theta_1) = \cos[2(\theta_2 - \theta_1)]. \qquad (4.31)
$$
(For spin-$\frac{1}{2}$ objects, we would obtain
$$
{}_{AB}\langle \phi^+ | (\vec\sigma^{(A)} \cdot \hat n_1)\, (\vec\sigma^{(B)} \cdot \hat n_2) | \phi^+\rangle_{AB} = \hat n_1 \cdot \hat n_2 = \cos(\theta_2 - \theta_1); \qquad (4.32)
$$
the argument of the cosine is different than in the case of photons, because the half angle $\theta/2$ appears in the formula analogous to eq. (4.29).)
$$
b = \left( \vec\sigma^{(B)} \cdot \hat n_2 \right) = \begin{pmatrix} \cos\theta_2 & \sin\theta_2 \\ \sin\theta_2 & -\cos\theta_2 \end{pmatrix}, \qquad (4.61)
$$
so that quantum mechanics predicts
$$
\langle ab \rangle = \langle \phi | ab | \phi \rangle
= \cos\theta_1 \cos\theta_2 + 2\alpha\beta\, \sin\theta_1 \sin\theta_2 \qquad (4.62)
$$
(and we recover $\cos(\theta_1 - \theta_2)$ in the maximally entangled case $\alpha = \beta = 1/\sqrt{2}$).
Now let us consider, for simplicity, the (nonoptimal!) special case
$$
\theta_A = 0, \qquad \theta_{A'} = \frac{\pi}{2}, \qquad \theta_{B'} = -\theta_B, \qquad (4.63)
$$
so that the quantum predictions are:
$$
\langle ab \rangle = \cos\theta_B = \langle ab' \rangle,
$$
$$
\langle a'b \rangle = 2\alpha\beta \sin\theta_B = -\langle a'b' \rangle. \qquad (4.64)
$$
Plugging into the CHSH inequality, we obtain
$$
|\cos\theta_B - 2\alpha\beta \sin\theta_B| \le 1, \qquad (4.65)
$$
and we easily see that violations occur for $\theta_B$ close to 0 or $\pi$. Expanding to linear order in $\theta_B$, the left-hand side is
$$
\simeq 1 - 2\alpha\beta\, \theta_B, \qquad (4.66)
$$
which surely exceeds 1 for $\theta_B$ negative and small.
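A quick numerical illustration (my own, with arbitrarily chosen $\alpha$, $\beta$ and a small negative $\theta_B$) that the partially entangled state violates eq. (4.65):

    import numpy as np

    alpha = 0.9                                 # arbitrary nonmaximal entanglement
    beta = np.sqrt(1 - alpha**2)
    theta_B = -0.1                              # small negative angle

    lhs = abs(np.cos(theta_B) - 2 * alpha * beta * np.sin(theta_B))
    print(lhs)                                  # ~ 1.07 > 1: the inequality is violated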
We have shown, then, that any entangled pure state of two qubits violates
some Bell inequality. It is not hard to generalize the argument to an arbitrary
bipartite pure state. For bipartite pure states, then, \entangled" is equivalent
to \Bell-inequality violating." For bipartite mixed states, however, we will
see shortly that the situation is more subtle.
4.2 Uses of Entanglement
After Bell's work, quantum entanglement became a subject of intensive study
among those interested in the foundations of quantum theory. But more
recently (starting less than ten years ago), entanglement has come to be
viewed not just as a tool for exposing the weirdness of quantum mechanics,
but as a potentially valuable resource. By exploiting entangled quantum
states, we can perform tasks that are otherwise dicult or impossible.
$$
= H(X, Y) - H(Y), \qquad (5.16)
$$
and similarly
$$
H(Y|X) \equiv \langle -\log p(y|x) \rangle = \left\langle -\log \frac{p(x, y)}{p(x)} \right\rangle = H(X, Y) - H(X). \qquad (5.17)
$$
We may interpret $H(X|Y)$, then, as the number of additional bits per letter needed to specify both $x$ and $y$ once $y$ is known. Obviously, then, this quantity cannot be negative.
The information about $X$ that I gain when I learn $Y$ is quantified by how much the number of bits per letter needed to specify $X$ is reduced when $Y$ is known. Thus
$$
I(X; Y) \equiv H(X) - H(X|Y) = H(X) + H(Y) - H(X, Y) = H(Y) - H(Y|X). \qquad (5.18)
$$
$I(X; Y)$ is called the mutual information. It is obviously symmetric under interchange of $X$ and $Y$; I find out as much about $X$ by learning $Y$ as about $Y$
by learning $X$. Learning $Y$ can never reduce my knowledge of $X$, so $I(X; Y)$ is obviously nonnegative. (The inequalities $H(X) \ge H(X|Y) \ge 0$ are easily proved using the convexity of the log function; see for example Elements of Information Theory by T. Cover and J. Thomas.)
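A small Python sketch (my own, for a hypothetical joint distribution of two binary variables) computing the entropies and the mutual information of eq. (5.18):

    import numpy as np

    p_xy = np.array([[0.4, 0.1],       # hypothetical joint distribution p(x, y)
                     [0.1, 0.4]])

    def H(p):                          # Shannon entropy in bits
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    H_X, H_Y, H_XY = H(p_xy.sum(axis=1)), H(p_xy.sum(axis=0)), H(p_xy)
    print(H_X + H_Y - H_XY)            # I(X;Y) ~ 0.278 bits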
Of course, if X and Y are completely uncorrelated, we have p(x; y) =
p(x)p(y), and
C is called the channel capacity and depends only on the conditional probabilities $p(y|x)$ that define the channel.
We have now shown that any rate $R < C$ is attainable, but is it possible for $R$ to exceed $C$ (with the error probability still approaching 0 for large $n$)? To show that $C$ is an upper bound on the rate may seem more subtle in the general case than for the binary symmetric channel; the probability of error is different for different letters, and we are free to exploit this in the design of our code. However, we may reason as follows:
Suppose we have chosen $2^{nR}$ strings of $n$ letters as our codewords. Consider a probability distribution (denoted $\tilde X^n$) in which each codeword occurs with equal probability ($= 2^{-nR}$). Evidently, then,
$$
H(\tilde X^n) = nR. \qquad (5.33)
$$
Sending the codewords through the channel, we obtain a probability distribution $\tilde Y^n$ of output states.
Because we assume that the channel acts on each letter independently, the conditional probability for a string of $n$ letters factorizes:
$$
p(y_1 y_2 \cdots y_n | x_1 x_2 \cdots x_n) = p(y_1|x_1)\, p(y_2|x_2) \cdots p(y_n|x_n), \qquad (5.34)
$$
and it follows that the conditional entropy satisfies
$$
H(\tilde Y^n | \tilde X^n) = \langle -\log p(y^n | x^n) \rangle = \sum_i \langle -\log p(y_i | x_i) \rangle = \sum_i H(\tilde Y_i | \tilde X_i), \qquad (5.35)
$$
where $\tilde X_i$ and $\tilde Y_i$ are the marginal probability distributions for the $i$th letter determined by our distribution on the codewords. Recall that we also know that $H(X, Y) \le H(X) + H(Y)$, or
$$
H(\tilde Y^n) \le \sum_i H(\tilde Y_i). \qquad (5.36)
$$
It follows that
$$
I(\tilde Y^n; \tilde X^n) = H(\tilde Y^n) - H(\tilde Y^n | \tilde X^n)
\le \sum_i \left( H(\tilde Y_i) - H(\tilde Y_i | \tilde X_i) \right)
= \sum_i I(\tilde Y_i; \tilde X_i) \le nC; \qquad (5.37)
$$
the mutual information of the messages sent and received is bounded above by the sum of the mutual information per letter, and the mutual information for each letter is bounded above by the capacity (because $C$ is defined as the maximum of $I(X; Y)$).
Recalling the symmetry of mutual information, we have
$$
I(\tilde X^n; \tilde Y^n) = H(\tilde X^n) - H(\tilde X^n | \tilde Y^n) = nR - H(\tilde X^n | \tilde Y^n) \le nC. \qquad (5.38)
$$
Now, if we can decode reliably as $n \to \infty$, this means that the input codeword is completely determined by the signal received, or that the conditional entropy of the input (per letter) must get small:
$$
\frac{1}{n} H(\tilde X^n | \tilde Y^n) \to 0. \qquad (5.39)
$$
If errorless transmission is possible, then, eq. (5.38) becomes
$$
R \le C, \qquad (5.40)
$$
in the limit $n \to \infty$. The rate cannot exceed the capacity. (Remember that the conditional entropy, unlike the mutual information, is not symmetric. Indeed $(1/n) H(\tilde Y^n | \tilde X^n)$ does not become small, because the channel introduces uncertainty about what message will be received. But if we can decode accurately, there is no uncertainty about what codeword was sent, once the signal has been received.)
We have now shown that the capacity $C$ is the highest rate of communication through the noisy channel that can be attained, where the probability of error goes to zero as the number of letters in the message goes to infinity. This is Shannon's noisy channel coding theorem.
Of course the method we have used to show that $R = C$ is asymptotically attainable (averaging over random codes) is not very constructive. Since a random code has no structure or pattern, encoding and decoding would be quite unwieldy (we require an exponentially large code book). Nevertheless, the theorem is important and useful, because it tells us what is in principle attainable, and furthermore, what is not attainable, even in principle. Also, since $I(X; Y)$ is a concave function of $X = \{x, p(x)\}$ (with $\{p(y|x)\}$ fixed), it has a unique local maximum, and $C$ can often be computed (at least numerically) for channels of interest.
where $E'$ denotes the orthogonal projection onto the subspace $\Lambda'$. The average fidelity therefore obeys
$$
\bar F = \sum_i p_i F_i \le \sum_i p_i \langle \varphi_i | E' | \varphi_i \rangle = {\rm tr}\,(\rho^n E'). \qquad (5.102)
$$
But since $E'$ projects onto a space of dimension $2^{n(S-\delta)}$, ${\rm tr}(\rho^n E')$ can be no larger than the sum of the $2^{n(S-\delta)}$ largest eigenvalues of $\rho^n$. It follows from the properties of typical subspaces that this sum becomes as small as we please; for $n$ large enough,
$$
\bar F \le {\rm tr}\,(\rho^n E') < \varepsilon. \qquad (5.103)
$$
Thus we have shown that, if we attempt to compress to $S - \delta$ qubits per letter, then the fidelity inevitably becomes poor for $n$ sufficiently large. We conclude, then, that $S(\rho)$ qubits per letter is the optimal compression of the quantum information that can be attained if we are to obtain good fidelity as $n$ goes to infinity. This is Schumacher's noiseless quantum coding theorem.
The above argument applies to any conceivable encoding scheme, but only to a restricted class of decoding schemes (unitary decodings). A more general decoding scheme can certainly be contemplated, described by a superoperator. More technology is then required to prove that better compression than $S$ qubits per letter is not possible. But the conclusion is the same. The point is that $n(S - \delta)$ qubits are not sufficient to distinguish all of the typical states.
To summarize, there is a close analogy between Shannon's noiseless cod-
ing theorem and Schumacher's noiseless quantum coding theorem. In the
classical case, nearly all long messages are typical sequences, so we can code
only these and still have a small probability of error. In the quantum case,
nearly all long messages have nearly unit overlap with the typical subspace,
so we can code only the typical subspace and still achieve good delity.
In fact, Alice could send e ectively classical information to Bob|the
string x1x2 xn encoded in mutually orthogonal quantum states|and Bob
could then follow these classical instructions to reconstruct Alice's state.
By this means, they could achieve high- delity compression to H (X ) bits|
or qubits|per letter. But if the letters are drawn from an ensemble of
nonorthogonal pure states, this amount of compression is not optimal; some
of the classical information about the preparation of the state has become re-
dundant, because the nonorthogonal states cannot be perfectly distinguished.
Thus Schumacher coding can go further, achieving optimal compression to
S () qubits per letter. The information has been packaged more eciently,
but at a price|Bob has received what Alice intended, but Bob can't know
what he has. In contrast to the classical case, Bob can't make any measure-
ment that is certain to decipher Alice's message correctly. An attempt to
read the message will unavoidably disturb it.
a spin-$\frac{1}{2}$ object points in one of three directions that are symmetrically distributed in the xz-plane. Each state has a priori probability $\frac{1}{3}$. Evidently, Alice's "signal states" are nonorthogonal:
$$
\langle \varphi_1 | \varphi_2 \rangle = \langle \varphi_1 | \varphi_3 \rangle = \langle \varphi_2 | \varphi_3 \rangle = -\frac{1}{2}. \qquad (5.150)
$$
Bob's task is to find out as much as he can about what Alice prepared by making a suitable measurement. The density matrix of Alice's ensemble is
$$
\rho = \frac{1}{3}\left( |\varphi_1\rangle\langle\varphi_1| + |\varphi_2\rangle\langle\varphi_2| + |\varphi_3\rangle\langle\varphi_3| \right) = \frac{1}{2}\, 1, \qquad (5.151)
$$
which has $S(\rho) = 1$. Therefore, the Holevo bound tells us that the mutual information of Alice's preparation and Bob's measurement outcome cannot exceed 1 bit.
In fact, though, the accessible information is considerably less than the one bit allowed by the Holevo bound. In this case, Alice's ensemble has enough symmetry that it is not hard to guess the optimal measurement. Bob may choose a POVM with three outcomes, where
$$
F_a = \frac{2}{3}\left( 1 - |\varphi_a\rangle\langle\varphi_a| \right), \qquad a = 1, 2, 3; \qquad (5.152)
$$
we see that
$$
p(a|b) = \langle \varphi_b | F_a | \varphi_b \rangle = \begin{cases} 0 & a = b, \\ \tfrac{1}{2} & a \ne b. \end{cases} \qquad (5.153)
$$
Therefore, the measurement outcome $a$ excludes the possibility that Alice prepared $a$, but leaves equal a posteriori probabilities ($p = \frac{1}{2}$) for the other two states. Bob's information gain is
$$
I = H(X) - H(X|Y) = \log_2 3 - 1 = .58496. \qquad (5.154)
$$
To show that this measurement is really optimal, we may appeal to a variation on a theorem of Davies, which assures us that an optimal POVM can be chosen with three $F_a$'s that share the same three-fold symmetry as the three states in the input ensemble. This result restricts the possible POVMs enough so that we can check that eq. (5.152) is optimal with an explicit calculation. Hence we have found that the ensemble $\mathcal{E} = \{|\varphi_a\rangle, p_a = \frac{1}{3}\}$ has accessible information
$$
{\rm Acc}(\mathcal{E}) = \log_2 \frac{3}{2} = .58496\ldots \qquad (5.155)
$$
The Holevo bound is not saturated.
Now suppose that Alice has enough cash so that she can afford to send two qubits to Bob, where again each qubit is drawn from the ensemble $\mathcal{E}$. The obvious thing for Alice to do is prepare one of the nine states
$$
|\varphi_a\rangle |\varphi_b\rangle, \qquad a, b = 1, 2, 3, \qquad (5.156)
$$
each with $p_{ab} = 1/9$. Then Bob's best strategy is to perform the POVM eq. (5.152) on each of the two qubits, achieving a mutual information of .58496 bits per qubit, as before.
But Alice and Bob are determined to do better. After discussing the problem with A. Peres and W. Wootters, they decide on a different strategy. Alice will prepare one of three two-qubit states
$$
|\Phi_a\rangle = |\varphi_a\rangle |\varphi_a\rangle, \qquad a = 1, 2, 3, \qquad (5.157)
$$
each occurring with a priori probability $p_a = 1/3$. Considered one qubit at a time, Alice's choice is governed by the ensemble $\mathcal{E}$, but now her two qubits have (classical) correlations: both are prepared the same way.
The three $|\Phi_a\rangle$'s are linearly independent, and so span a three-dimensional subspace of the four-dimensional two-qubit Hilbert space. In a homework exercise, you will show that the density matrix
$$
\rho = \frac{1}{3}\left( \sum_{a=1}^{3} |\Phi_a\rangle\langle\Phi_a| \right) \qquad (5.158)
$$
has the nonzero eigenvalues $1/2, 1/4, 1/4$, so that
$$
S(\rho) = -\frac{1}{2}\log\frac{1}{2} - 2\left( \frac{1}{4}\log\frac{1}{4} \right) = \frac{3}{2}. \qquad (5.159)
$$
The Holevo bound requires that the accessible information per qubit is less than $3/4$ bit. This would at least be consistent with the possibility that we can exceed the .58496 bits per qubit attained by the nine-state method.
Naively, it may seem that Alice won't be able to convey as much classical information to Bob if she chooses to send one of only three possible states instead of nine. But on further reflection, this conclusion is not obvious. True, Alice has fewer signals to choose from, but the signals are more distinguishable; we have
$$
\langle \Phi_a | \Phi_b \rangle = \frac{1}{4}, \qquad a \ne b, \qquad (5.160)
$$
instead of eq. (5.150). It is up to Bob to exploit this improved distinguishability in his choice of measurement. In particular, Bob will find it advantageous to perform collective measurements on the two qubits instead of measuring them one at a time.
It is no longer obvious what Bob's optimal measurement will be. But Bob can invoke a general procedure that, while not guaranteed optimal, is usually at least pretty good. We'll call the POVM constructed by this procedure a "pretty good measurement" (or PGM).
Consider some collection of vectors $|\tilde\Phi_a\rangle$ that are not assumed to be orthogonal or normalized. We want to devise a POVM that can distinguish these vectors reasonably well. Let us first construct
$$
G = \sum_a |\tilde\Phi_a\rangle\langle\tilde\Phi_a|. \qquad (5.161)
$$
This is a positive operator on the space spanned by the $|\tilde\Phi_a\rangle$'s. Therefore, on that subspace, $G$ has an inverse $G^{-1}$, and that inverse has a positive square root $G^{-1/2}$. Now we define
$$
F_a = G^{-1/2} |\tilde\Phi_a\rangle\langle\tilde\Phi_a| G^{-1/2}, \qquad (5.162)
$$
and we see that
$$
\sum_a F_a = G^{-1/2} \left( \sum_a |\tilde\Phi_a\rangle\langle\tilde\Phi_a| \right) G^{-1/2} = G^{-1/2}\, G\, G^{-1/2} = 1, \qquad (5.163)
$$
on the span of the $|\tilde\Phi_a\rangle$'s. If necessary, we can augment these $F_a$'s with one more positive operator, the projection $F_0$ onto the orthogonal complement of the span of the $|\tilde\Phi_a\rangle$'s, and so construct a POVM. This POVM is the PGM associated with the vectors $|\tilde\Phi_a\rangle$.
In the special case where the $|\tilde\Phi_a\rangle$'s are orthogonal,
$$
|\tilde\Phi_a\rangle = \sqrt{\lambda_a}\, |\Phi_a\rangle \qquad (5.164)
$$
(where the $|\Phi_a\rangle$'s are orthonormal), we have
$$
F_a = \sum_{b,c} \left( |\Phi_b\rangle \lambda_b^{-1/2} \langle\Phi_b| \right) \left( \lambda_a |\Phi_a\rangle\langle\Phi_a| \right) \left( |\Phi_c\rangle \lambda_c^{-1/2} \langle\Phi_c| \right) = |\Phi_a\rangle\langle\Phi_a|; \qquad (5.165)
$$
this is the orthogonal measurement that perfectly distinguishes the $|\Phi_a\rangle$'s and so clearly is optimal. If the $|\tilde\Phi_a\rangle$'s are linearly independent but not orthogonal, then the PGM is again an orthogonal measurement (because $n$ one-dimensional operators in an $n$-dimensional space can constitute a POVM only if mutually orthogonal), but in that case the measurement may not be optimal.
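A small numpy sketch (mine, not from the notes) of the PGM construction of eqs. (5.161)-(5.163), applied to three hypothetical nonorthogonal qubit states; it checks that the resulting operators sum to the identity on their span (here the full space, so no extra projector F_0 is needed).

    import numpy as np

    # three "trine" qubit states, 120 degrees apart on the Bloch sphere (illustrative choice)
    angles = [0, 2 * np.pi / 3, 4 * np.pi / 3]
    vecs = [np.array([np.cos(t / 2), np.sin(t / 2)]) for t in angles]

    G = sum(np.outer(v, v) for v in vecs)                 # eq. (5.161)
    w, U = np.linalg.eigh(G)                              # G is positive on the span
    G_inv_sqrt = U @ np.diag(w ** -0.5) @ U.T             # G^(-1/2)

    F = [G_inv_sqrt @ np.outer(v, v) @ G_inv_sqrt for v in vecs]   # eq. (5.162)
    print(np.allclose(sum(F), np.eye(2)))                 # True: eq. (5.163)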
In the homework, you'll construct the PGM for the vectors $|\Phi_a\rangle$ in eq. (5.157), and you'll show that
$$
p(a|a) = \langle \Phi_a | F_a | \Phi_a \rangle = \frac{1}{3}\left( 1 + \frac{1}{\sqrt{2}} \right)^2 = .971405,
$$
$$
p(b|a) = \langle \Phi_a | F_b | \Phi_a \rangle = \frac{1}{6}\left( 1 - \frac{1}{\sqrt{2}} \right)^2 = .0142977 \qquad (5.166)
$$
(for $b \ne a$). It follows that the conditional entropy of the input is
$$
H(X|Y) = .215893, \qquad (5.167)
$$
and since $H(X) = \log_2 3 = 1.58496$, the information gain is
$$
I = H(X) - H(X|Y) = 1.36907, \qquad (5.168)
$$
a mutual information of .684535 bits per qubit. Thus, the improved distinguishability of Alice's signals has indeed paid off; we have exceeded the
.58496 bits that can be extracted from a single qubit. We still didn't saturate
the Holevo bound (I < 1:5 in this case), but we came a lot closer than before.
This example, rst described by Peres and Wootters, teaches some useful
lessons. First, Alice is able to convey more information to Bob by \pruning"
her set of codewords. She is better o choosing among fewer signals that
are more distinguishable than more signals that are less distinguishable. An
alphabet of three letters encodes more than an alphabet of nine letters.
Second, Bob is able to read more of the information if he performs a
collective measurement instead of measuring each qubit separately. His opti-
mal orthogonal measurement projects Alice's signal onto a basis of entangled
states.
The PGM described here is \optimal" in the sense that it gives the best
information gain of any known measurement. Most likely, this is really the
highest I that can be achieved with any measurement, but I have not proved
it.
5.7 Exercises
5.1 Distinguishing nonorthogonal states.
Alice has prepared a single qubit in one of the two (nonorthogonal) states
$$
|u\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad
|v\rangle = \begin{pmatrix} \cos\frac{\theta}{2} \\ \sin\frac{\theta}{2} \end{pmatrix}, \qquad (5.222)
$$
where $0 < \theta < \pi$. Bob knows the value of $\theta$, but he has no idea whether Alice prepared $|u\rangle$ or $|v\rangle$, and he is to perform a measurement to learn what he can about Alice's preparation.
Bob considers three possible measurements:
a) An orthogonal measurement with
$$
E_1 = |u\rangle\langle u|, \qquad E_2 = 1 - |u\rangle\langle u|. \qquad (5.223)
$$
(In this case, if Bob obtains outcome 2, he knows that Alice must have prepared $|v\rangle$.)
b) A three-outcome POVM with
$$
F_1 = A\left( 1 - |u\rangle\langle u| \right), \qquad
F_2 = A\left( 1 - |v\rangle\langle v| \right),
$$
$$
F_3 = (1 - 2A)\, 1 + A\left( |u\rangle\langle u| + |v\rangle\langle v| \right), \qquad (5.224)
$$
where $A$ has the largest value consistent with positivity of $F_3$. (In this case, Bob determines the preparation unambiguously if he obtains outcomes 1 or 2, but learns nothing from outcome 3.)
c) An orthogonal measurement with
$$
E_1 = |w\rangle\langle w|, \qquad E_2 = 1 - |w\rangle\langle w|, \qquad (5.225)
$$
where
$$
|w\rangle = \begin{pmatrix} \cos\left[ \frac{1}{2}\left( \frac{\theta}{2} + \frac{\pi}{2} \right) \right] \\ \sin\left[ \frac{1}{2}\left( \frac{\theta}{2} + \frac{\pi}{2} \right) \right] \end{pmatrix}. \qquad (5.226)
$$
(In this case $E_1$ and $E_2$ are projections onto the spin states that are oriented in the $x$-$z$ plane normal to the axis that bisects the orientations of $|u\rangle$ and $|v\rangle$.)
Find Bob's average information gain $I(\theta)$ (the mutual information of the preparation and the measurement outcome) in all three cases, and plot all three as a function of $\theta$. Which measurement should Bob choose?
5.2 Relative entropy.
The relative entropy $S(\rho|\sigma)$ of two density matrices $\rho$ and $\sigma$ is defined by
$$
S(\rho|\sigma) = {\rm tr}\,\rho\left( \log\rho - \log\sigma \right). \qquad (5.227)
$$
You will show that $S(\rho|\sigma)$ is nonnegative, and derive some consequences of this property.
a) A differentiable real-valued function of a real variable is concave if
$$
f(y) - f(x) \le (y - x) f'(x), \qquad (5.228)
$$
for all $x$ and $y$. Show that if $a$ and $b$ are observables, and $f$ is concave, then
$$
{\rm tr}\left( f(b) - f(a) \right) \le {\rm tr}\left[ (b - a) f'(a) \right]. \qquad (5.229)
$$
b) Show that $f(x) = -x\log x$ is concave for $x > 0$.
c) Use (a) and (b) to show $S(\rho|\sigma) \ge 0$ for any two density matrices $\rho$ and $\sigma$.
d) Use nonnegativity of $S(\rho|\sigma)$ to show that if $\rho$ has its support on a space of dimension $D$, then
$$
S(\rho) \le \log D. \qquad (5.230)
$$
e) Use nonnegativity of relative entropy to prove the subadditivity of entropy
$$
S(\rho_{AB}) \le S(\rho_A) + S(\rho_B). \qquad (5.231)
$$
[Hint: Consider the relative entropy of $\rho_A \otimes \rho_B$ and $\rho_{AB}$.]
f) Use subadditivity to prove the concavity of the entropy:
$$
S\left( \sum_i \lambda_i \rho_i \right) \ge \sum_i \lambda_i S(\rho_i), \qquad (5.232)
$$
where the $\lambda_i$'s are positive real numbers summing to one. [Hint: Apply subadditivity to
$$
\rho_{AB} = \sum_i \lambda_i\, (\rho_i)_A \otimes \left( |e_i\rangle\langle e_i| \right)_B.\ ] \qquad (5.233)
$$
g) Use subadditivity to prove the triangle inequality (also called the Araki-Lieb inequality):
$$
S(\rho_{AB}) \ge |S(\rho_A) - S(\rho_B)|. \qquad (5.234)
$$
[Hint: Consider a purification of $\rho_{AB}$; that is, construct a pure state $|\psi\rangle$ such that $\rho_{AB} = {\rm tr}_C\, |\psi\rangle\langle\psi|$. Then apply subadditivity to $\rho_{BC}$.]
5.3 Lindblad-Uhlmann monotonicity.
According to a theorem proved by Lindblad and by Uhlmann, relative entropy on $\mathcal{H}_A \otimes \mathcal{H}_B$ has a property called monotonicity:
$$
S(\rho_A|\sigma_A) \le S(\rho_{AB}|\sigma_{AB}); \qquad (5.235)
$$
the relative entropy of two density matrices on a system $AB$ cannot be less than the induced relative entropy on the subsystem $A$.
a) Use Lindblad-Uhlmann monotonicity to prove the strong subadditivity property of the Von Neumann entropy. [Hint: On a tripartite system $ABC$, consider the relative entropy of $\rho_{ABC}$ and $\rho_A \otimes \rho_{BC}$.]
b) Use Lindblad-Uhlmann monotonicity to show that the action of a superoperator cannot increase relative entropy, that is,
$$
S(\$\rho\,|\,\$\sigma) \le S(\rho|\sigma), \qquad (5.236)
$$
where \$ is any superoperator (completely positive map). [Hint: Recall that any superoperator has a unitary representation.]
c) Show that it follows from (b) that a superoperator cannot increase the Holevo information of an ensemble $\mathcal{E} = \{\rho_x, p_x\}$ of mixed states:
$$
\chi(\$(\mathcal{E})) \le \chi(\mathcal{E}), \qquad (5.237)
$$
where
$$
\chi(\mathcal{E}) = S\left( \sum_x p_x \rho_x \right) - \sum_x p_x S(\rho_x). \qquad (5.238)
$$
5.4 The Peres-Wootters POVM.
Consider the Peres-Wootters information source described in §5.4.2 of the lecture notes. It prepares one of the three states
$$
|\Phi_a\rangle = |\varphi_a\rangle |\varphi_a\rangle, \qquad a = 1, 2, 3, \qquad (5.239)
$$
each occurring with a priori probability $\frac{1}{3}$, where the $|\varphi_a\rangle$'s are defined in eq. (5.149).
a) Express the density matrix
$$
\rho = \frac{1}{3}\left( \sum_a |\Phi_a\rangle\langle\Phi_a| \right) \qquad (5.240)
$$
in terms of the Bell basis of maximally entangled states $\{|\phi^\pm\rangle, |\psi^\pm\rangle\}$, and compute $S(\rho)$.
b) For the three vectors $|\Phi_a\rangle$, $a = 1, 2, 3$, construct the "pretty good measurement" defined in eq. (5.162). (Again, expand the $|\Phi_a\rangle$'s in the Bell basis.) In this case, the PGM is an orthogonal measurement. Express the elements of the PGM basis in terms of the Bell basis.
c) Compute the mutual information of the PGM outcome and the prepara-
tion.
5.5 Teleportation with mixed states.
An operational way to define entanglement is that an entangled state can be used to teleport an unknown quantum state with better fidelity than could be achieved with local operations and classical communication only. In this exercise, you will show that there are mixed states that are entangled in this sense, yet do not violate any Bell inequality. Hence, for mixed states (in contrast to pure states) "entangled" and "Bell-inequality-violating" are not equivalent.
Consider a "noisy" entangled pair with density matrix
$$
\rho(\lambda) = (1 - \lambda)\, |\psi^-\rangle\langle\psi^-| + \frac{\lambda}{4}\, 1. \qquad (5.241)
$$
a) Find the fidelity $F$ that can be attained if the state $\rho(\lambda)$ is used to teleport a qubit from Alice to Bob. [Hint: Recall that you showed in an earlier exercise that a "random guess" has fidelity $F = \frac{1}{2}$.]
b) For what values of $\lambda$ is the fidelity found in (a) better than what can be achieved if Alice measures her qubit and sends a classical message to Bob? [Hint: Earlier, you showed that $F = 2/3$ can be achieved if Alice measures her qubit. In fact this is the best possible $F$ attainable with classical communication.]
c) Compute
$$
{\rm Prob}(\uparrow_{\hat n} \uparrow_{\hat m}) \equiv {\rm tr}\left( E_A(\hat n)\, E_B(\hat m)\, \rho(\lambda) \right), \qquad (5.242)
$$
where $E_A(\hat n)$ is the projection of Alice's qubit onto $|\uparrow_{\hat n}\rangle$ and $E_B(\hat m)$ is the projection of Bob's qubit onto $|\uparrow_{\hat m}\rangle$.
d) Consider the case $\lambda = 1/2$. Show that in this case the state $\rho(\lambda)$ violates no Bell inequalities. Hint: It suffices to construct a local hidden variable model that correctly reproduces the spin correlations found in (c), for $\lambda = 1/2$. Suppose that the hidden variable $\hat\lambda$ is uniformly distributed on the unit sphere, and that there are functions $f_A$ and $f_B$ such that
$$
{\rm Prob}_A(\uparrow_{\hat n}) = f_A(\hat\lambda \cdot \hat n), \qquad {\rm Prob}_B(\uparrow_{\hat m}) = f_B(\hat\lambda \cdot \hat m). \qquad (5.243)
$$
The problem is to find $f_A$ and $f_B$ (where $0 \le f_{A,B} \le 1$) with the properties
$$
\int_{\hat\lambda} f_A(\hat\lambda \cdot \hat n) = 1/2, \qquad
\int_{\hat\lambda} f_B(\hat\lambda \cdot \hat m) = 1/2,
$$
$$
\int_{\hat\lambda} f_A(\hat\lambda \cdot \hat n)\, f_B(\hat\lambda \cdot \hat m) = {\rm Prob}(\uparrow_{\hat n} \uparrow_{\hat m}). \qquad (5.244)
$$
Chapter 6
Quantum Computation
6.1 Classical Circuits
The concept of a quantum computer was introduced in Chapter 1. Here we
will specify our model of quantum computation more precisely, and we will
point out some basic properties of the model. But before we explain what a
quantum computer does, perhaps we should say what a classical computer
does.
Then
$$
f(x) = f^{(1)}(x) \vee f^{(2)}(x) \vee f^{(3)}(x) \vee \ldots; \qquad (6.5)
$$
$f$ is the logical OR ($\vee$) of all the $f^{(a)}$'s. In binary arithmetic the $\vee$ operation of two bits may be represented
$$
x \vee y = x + y - x\cdot y; \qquad (6.6)
$$
it has the value 0 if $x$ and $y$ are both zero, and the value 1 otherwise.
Now consider the evaluation of $f^{(a)}$. In the case where $x^{(a)} = 111\ldots 1$, we may write
$$
f^{(a)}(x) = x_1 \wedge x_2 \wedge x_3 \ldots \wedge x_n; \qquad (6.7)
$$
it is the logical AND ($\wedge$) of all $n$ bits. In binary arithmetic, the AND is the product
$$
x \wedge y = x \cdot y. \qquad (6.8)
$$
For any other $x^{(a)}$, $f^{(a)}$ is again obtained as the AND of $n$ bits, but where the NOT ($\neg$) operation is first applied to each $x_i$ such that $x_i^{(a)} = 0$; for example
    x ----*---- x
          |
    y ---(+)--- x (+) y

This gate flips the second bit if the first is 1, and does nothing if the first bit is 0 (hence the name controlled-NOT). Its square is trivial; that is, it inverts itself. Of course, this gate performs a NOT on the second bit if the first bit is set to 1, and it performs the copy operation if $y$ is initially set to zero:
$$
{\rm XOR}: (x, 0) \mapsto (x, x). \qquad (6.34)
$$
With the circuit

    x ---*---(+)---*--- y
         |    |    |
    y --(+)---*---(+)-- x

constructed from three XORs, we can swap two bits: $(x, y) \to (y, x)$.
The Toffoli gate $\theta^{(3)}$ flips the third bit if the first two are 1 and does nothing otherwise. Like the XOR gate, it is its own inverse.
Unlike the reversible 2-bit gates, the Toffoli gate serves as a universal gate for Boolean logic, if we can provide fixed input bits and ignore output bits. If $z$ is initially 1, then $x \uparrow y = 1 - xy$ appears in the third output: we can perform NAND. If we fix $x = 1$, the Toffoli gate functions like an XOR gate, and we can use it to copy.
The Toffoli gate $\theta^{(3)}$ is universal in the sense that we can build a circuit to compute any reversible function using Toffoli gates alone (if we can fix input bits and ignore output bits). It will be instructive to show this directly, without relying on our earlier argument that NAND/NOT is universal for Boolean functions. In fact, we can show the following: From the NOT gate
and the Toffoli gate $\theta^{(3)}$, we can construct any invertible function on $n$ bits, provided we have one extra bit of scratchpad space available.
The first step is to show that from the three-bit Toffoli gate $\theta^{(3)}$ we can construct an $n$-bit Toffoli gate $\theta^{(n)}$ that acts as
$$
(x_1, x_2, \ldots, x_{n-1}, y) \to (x_1, x_2, \ldots, x_{n-1}, y \oplus x_1 x_2 \cdots x_{n-1}). \qquad (6.40)
$$
The construction requires one extra bit of scratch space. For example, we construct $\theta^{(4)}$ from $\theta^{(3)}$'s with the circuit

    x1 ---*---------*--- x1
          |         |
    x2 ---*---------*--- x2
          |         |
    0 ---(+)---*---(+)-- 0
               |
    x3 --------*-------- x3
               |
    y ---------(+)------ y (+) x1 x2 x3

The purpose of the last $\theta^{(3)}$ gate is to reset the scratch bit back to its original value zero. Actually, with one more gate we can obtain an implementation of $\theta^{(4)}$ that works irrespective of the initial value of the scratch bit:

    x1 ---*--------*-------- x1
          |        |
    x2 ---*--------*-------- x2
          |        |
    w ---(+)--*---(+)--*---- w
              |        |
    x3 -------*--------*---- x3
              |        |
    y --------(+)------(+)-- y (+) x1 x2 x3
Again, we can eliminate the last gate if we don't mind flipping the value of the scratch bit.
We can see that the scratch bit really is necessary, because $\theta^{(4)}$ is an odd permutation (in fact a transposition) of the $2^4$ 4-bit strings: it transposes 1111 and 1110. But $\theta^{(3)}$ acting on any three of the four bits is an even permutation; e.g., acting on the last three bits it transposes 0111 with 0110, and 1111 with 1110. Since a product of even permutations is also even, we cannot obtain $\theta^{(4)}$ as a product of $\theta^{(3)}$'s that act on four bits only.
The construction of $\theta^{(4)}$ from four $\theta^{(3)}$'s generalizes immediately to the construction of $\theta^{(n)}$ from two $\theta^{(n-1)}$'s and two $\theta^{(3)}$'s (just expand $x_1$ to several control bits in the above diagram). Iterating the construction, we obtain $\theta^{(n)}$ from a circuit with $2^{n-2} + 2^{n-3} - 2$ $\theta^{(3)}$'s. Furthermore, just one bit of scratch space is sufficient.2 (When we need to construct $\theta^{(k)}$, any available extra bit will do, since the circuit returns the scratch bit to its original value.) The next step is to note that, by conjugating $\theta^{(n)}$ with NOT gates, we can in effect modify the value of the control string that "triggers" the gate. For example, the circuit
example, the circuit
x1 gs g
x2 s
x3 gs g
y g
ips the value of y if x1x2x3 = 010, and it acts trivially otherwise. Thus
this circuit transposes the two strings 0100 and 0101. In like fashion, with
(n) and NOT gates, we can devise a circuit that transposes any two n-bit
strings that di er in only one bit. (The location of the bit where they di er
is chosen to be the target of the (n) gate.)
But in fact a transposition that exchanges any two n-bit strings can be
expressed as a product of transpositions that interchange strings that di er
in only one bit. If a0 and as are two strings that are Hamming distance s
apart (di er in s places), then there is a chain
a0; a1; a2; a3; : : : ; as; (6.41)
such that each string in the chain is Hamming distance one from its neighbors.
Therefore, each of the transpositions
(a0a1); (a1a2); (a2a3); : : : (as;1as); (6.42)
2 With more scratch space, we can build (n) from (3) 's much more eciently | see
the exercises.
can be implemented as a $\theta^{(n)}$ gate conjugated by NOT gates. By composing transpositions we find
$$
(a_0 a_s) = (a_{s-1} a_s)(a_{s-2} a_{s-1}) \cdots (a_2 a_3)(a_1 a_2)(a_0 a_1)(a_1 a_2)(a_2 a_3) \cdots (a_{s-2} a_{s-1})(a_{s-1} a_s); \qquad (6.43)
$$
we can construct the Hamming-distance-$s$ transposition from $2s - 1$ Hamming-distance-one transpositions. It follows that we can construct $(a_0 a_s)$ from $\theta^{(n)}$'s and NOT gates.
Finally, since every permutation is a product of transpositions, we have shown that every invertible function on $n$ bits (every permutation on $n$-bit strings) is a product of $\theta^{(3)}$'s and NOTs, using just one bit of scratch space.
Of course, a NOT can be performed with a $\theta^{(3)}$ gate if we fix two input bits at 1. Thus the Toffoli gate $\theta^{(3)}$ is universal for reversible computation, if we can fix input bits and discard output bits.
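A tiny Python sketch (my own illustration) of the fixed-input trick just described: the Toffoli gate acts as NAND when the target is fixed to 1, and as COPY when the first control is fixed to 1 and the target to 0.

    def toffoli(x, y, z):
        """Reversible Toffoli gate: flips z iff x = y = 1."""
        return x, y, z ^ (x & y)

    # NAND from Toffoli: fix the target bit to 1
    for x in (0, 1):
        for y in (0, 1):
            assert toffoli(x, y, 1)[2] == 1 - (x & y)      # x NAND y

    # COPY from Toffoli: fix the first control to 1 and the target to 0
    for y in (0, 1):
        assert toffoli(1, y, 0)[2] == y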
builds the Fredkin gate from four switch gates (two running forward and two
running backward). Time delays needed to maintain synchronization are not
explicitly shown.
In the billiard ball computer, the switch gate is constructed with two reflectors, such that (in the case $x = y = 1$) two moving balls collide twice. The trajectories of the balls in this case are:

A ball labeled $x$ emerges from the gate along the same trajectory (and at the same time) regardless of whether the other ball is present. But for $x = 1$, the position of the other ball (if present) is shifted down compared to its final position for $x = 0$; this is a switch gate. Since we can perform a switch gate, we can construct a Fredkin gate, and implement universal reversible logic with a billiard ball computer.
An evident weakness of the billiard-ball scheme is that initial errors in the positions and velocities of the balls will accumulate rapidly, and the computer will eventually fail. As we noted in Chapter 1 (and Landauer has insistently pointed out), a similar problem will afflict any proposed scheme for dissipationless computation. To control errors we must be able to compress the phase space of the device, which will necessarily be a dissipative process.
6.2.1 Accuracy
Let's discuss the issue of accuracy. We imagine that we wish to implement a computation in which the quantum gates $U_1, U_2, \ldots, U_T$ are applied sequentially to the initial state $|\varphi_0\rangle$. The state prepared by our ideal quantum circuit is
$$
|\varphi_T\rangle = U_T U_{T-1} \cdots U_2 U_1 |\varphi_0\rangle. \qquad (6.60)
$$
But in fact our gates do not have perfect accuracy. When we attempt to apply the unitary transformation $U_t$, we instead apply some "nearby" unitary transformation $\tilde U_t$. (Of course, this is not the most general type of error that we might contemplate; the unitary $U_t$ might be replaced by a superoperator. Considerations similar to those below would apply in that case, but for now we confine our attention to "unitary errors.")
The errors cause the actual state of the computer to wander away from the ideal state. How far does it wander? Let $|\varphi_t\rangle$ denote the ideal state after $t$ quantum gates are applied, so that
$$
|\varphi_t\rangle = U_t |\varphi_{t-1}\rangle. \qquad (6.61)
$$
But if we apply the actual transformation $\tilde U_t$, then
$$
\tilde U_t |\varphi_{t-1}\rangle = |\varphi_t\rangle + |E_t\rangle, \qquad (6.62)
$$
where
$$
|E_t\rangle = \left( \tilde U_t - U_t \right) |\varphi_{t-1}\rangle \qquad (6.63)
$$
is an unnormalized vector. If $|\tilde\varphi_t\rangle$ denotes the actual state after $t$ steps, then we have
$$
|\tilde\varphi_1\rangle = |\varphi_1\rangle + |E_1\rangle,
$$
$$
|\tilde\varphi_2\rangle = \tilde U_2 |\tilde\varphi_1\rangle = |\varphi_2\rangle + |E_2\rangle + \tilde U_2 |E_1\rangle, \qquad (6.64)
$$
and so forth; we ultimately obtain
$$
|\tilde\varphi_T\rangle = |\varphi_T\rangle + |E_T\rangle + \tilde U_T |E_{T-1}\rangle + \tilde U_T \tilde U_{T-1} |E_{T-2}\rangle + \ldots + \tilde U_T \tilde U_{T-1} \cdots \tilde U_2 |E_1\rangle. \qquad (6.65)
$$
Thus we have expressed the difference between $|\tilde\varphi_T\rangle$ and $|\varphi_T\rangle$ as a sum of $T$ remainder terms. The worst case, yielding the largest deviation of $|\tilde\varphi_T\rangle$ from $|\varphi_T\rangle$, occurs if all remainder terms line up in the same direction, so that the errors interfere constructively. Therefore, we conclude that
$$
\|\, |\tilde\varphi_T\rangle - |\varphi_T\rangle \,\| \le \|\, |E_T\rangle \,\| + \|\, |E_{T-1}\rangle \,\| + \ldots + \|\, |E_2\rangle \,\| + \|\, |E_1\rangle \,\|, \qquad (6.66)
$$
where we have used the property $\| U |E_i\rangle \| = \|\, |E_i\rangle \,\|$ for any unitary $U$.
Let $\| A \|_{\sup}$ denote the sup norm of the operator $A$, that is, the maximum modulus of an eigenvalue of $A$. We then have
$$
\|\, |E_t\rangle \,\| = \| \left( \tilde U_t - U_t \right) |\varphi_{t-1}\rangle \| \le \| \tilde U_t - U_t \|_{\sup} \qquad (6.67)
$$
(since $|\varphi_{t-1}\rangle$ is normalized). Now suppose that, for each value of $t$, the error in our quantum gate is bounded by
$$
\| \tilde U_t - U_t \|_{\sup} < \varepsilon. \qquad (6.68)
$$
Then after $T$ quantum gates are applied, we have
$$
\|\, |\tilde\varphi_T\rangle - |\varphi_T\rangle \,\| < T\varepsilon; \qquad (6.69)
$$
in this sense, the accumulated error in the state grows linearly with the length of the computation.
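A small numpy sketch (my own, with randomly chosen small unitary errors, not from the notes) illustrating eq. (6.69): the distance between the ideal and noisy states stays below Tε.

    import numpy as np
    from scipy.linalg import expm

    rng = np.random.default_rng(0)
    d, T, eps = 4, 50, 1e-3          # dimension, circuit length, per-gate error (arbitrary)

    psi_ideal = psi_noisy = np.eye(d)[0].astype(complex)
    for _ in range(T):
        U, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))  # random ideal gate
        H = rng.normal(size=(d, d)); H = (H + H.T) / 2                               # error generator
        U_err = expm(1j * eps * H / np.linalg.norm(H, 2)) @ U     # ||U_err - U||_sup <= eps
        psi_ideal, psi_noisy = U @ psi_ideal, U_err @ psi_noisy

    print(np.linalg.norm(psi_noisy - psi_ideal), "<", T * eps)    # linear bound of eq. (6.69)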
The distance bounded in eq. (6.68) can equivalently be expressed as $\| W_t - \mathbf{1} \|_{\sup}$, where $W_t = \tilde U_t U_t^\dagger$. Since $W_t$ is unitary, each of its eigenvalues is a phase $e^{i\theta}$, and the corresponding eigenvalue of $W_t - \mathbf{1}$ has modulus
$$|e^{i\theta} - 1| = (2 - 2\cos\theta)^{1/2}, \qquad (6.70)$$
so that eq. (6.68) is the requirement that each eigenvalue satisfies
$$\cos\theta > 1 - \varepsilon^2/2 \qquad (6.71)$$
(or $|\theta| < \varepsilon$, for $\varepsilon$ small). The origin of eq. (6.69) is clear. In each time step, $|\tilde\varphi\rangle$ rotates relative to $|\varphi\rangle$ by (at worst) an angle of order $\varepsilon$, and the distance between the vectors increases by at most of order $\varepsilon$.
How much accuracy is good enough? In the final step of our computation, we perform an orthogonal measurement, and the probability of outcome $a$, in the ideal case, is
$$P(a) = |\langle a|\varphi_T\rangle|^2. \qquad (6.72)$$
Because of the errors, the actual probability is
$$\tilde P(a) = |\langle a|\tilde\varphi_T\rangle|^2. \qquad (6.73)$$
If the actual vector is close to the ideal vector, then the probability distributions are close, too. If we sum over an orthonormal basis $\{|a\rangle\}$, we have
$$\sum_a |\tilde P(a) - P(a)| \le 2\,\| |\tilde\varphi_T\rangle - |\varphi_T\rangle \|, \qquad (6.74)$$
as you will show in a homework exercise. Therefore, if we keep $T\varepsilon$ fixed (and small) as $T$ gets large, the error in the probability distribution also remains fixed. In particular, if we have designed a quantum algorithm that solves a decision problem correctly with probability greater than $\frac{1}{2} + \delta$ (in the ideal case), then we can achieve success probability greater than $\frac{1}{2}$ with our noisy gates, if we can perform the gates with an accuracy $T\varepsilon < O(\delta)$. A quantum circuit family in the BQP class can really solve hard problems, as long as we can improve the accuracy of the gates linearly with the computation size $T$.
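The linear accumulation of eq. (6.69) is easy to see numerically. The following sketch (an illustration of mine, not part of the notes; it assumes only numpy is available) applies T ideal gates and T "nearby" noisy gates to the same initial state and compares the final deviation with the bound T*eps.

import numpy as np

def random_unitary(dim, rng):
    # Haar-like random unitary from the QR decomposition of a Gaussian matrix
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def perturb(u, eps, rng):
    # a nearby unitary exp(i*eps*h) u with ||exp(i*eps*h) - 1||_sup <= eps
    dim = u.shape[0]
    h = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    h = (h + h.conj().T) / 2
    h /= np.linalg.norm(h, 2)                 # normalize so the kick angle is eps
    vals, vecs = np.linalg.eigh(h)
    kick = (vecs * np.exp(1j * eps * vals)) @ vecs.conj().T
    return kick @ u

rng = np.random.default_rng(0)
dim, T, eps = 8, 50, 1e-3
ideal = np.zeros(dim, dtype=complex); ideal[0] = 1.0
noisy = ideal.copy()
for _ in range(T):
    u = random_unitary(dim, rng)
    ideal = u @ ideal
    noisy = perturb(u, eps, rng) @ noisy
print(np.linalg.norm(noisy - ideal), "<=", T * eps)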
$$U' = P\,U\,P^{-1}$$
that applies R to the third qubit if the first two qubits have the value 1; otherwise it acts trivially. Here
$$R = -i\,R_x(\theta) = (-i)\exp\!\left(i\frac{\theta}{2}\sigma_x\right) = (-i)\left(\cos\frac{\theta}{2} + i\sigma_x\sin\frac{\theta}{2}\right) \qquad (6.89)$$
is, up to a phase, a rotation by $\theta$ about the x-axis, where $\theta$ is a particular angle incommensurate with $\pi$.
The nth power of the Deutsch gate is the controlled-controlled-$R^n$. In particular, $R^4 = R_x(4\theta)$, so that all one-qubit transformations generated by $\sigma_x$ are reachable by integer powers of R. Furthermore, the $(4n+1)$st power is
$$(-i)\left[\cos\frac{(4n+1)\theta}{2} + i\sigma_x\sin\frac{(4n+1)\theta}{2}\right], \qquad (6.90)$$
{ Figure {
denotes the controlled-U gate (the $2\times 2$ unitary U is applied to the second qubit if the first qubit is 1; otherwise the gate acts trivially), then a controlled-controlled-$U^2$ gate is obtained from the circuit
{ Figure: circuit identity constructing controlled-controlled-$U^2$ from controlled-$U$, controlled-$U^\dagger$, and CNOT gates {
(for $\varepsilon$ sufficiently small) we can reach any $e^{i\alpha A}$ to within distance $\varepsilon$ with $e^{inA}$, for some integer n of order $\varepsilon^{-2k}$. We also know that we can obtain transformations $\{e^{iA_a}\}$ where the $A_a$'s span the full $U(2^k)$ Lie algebra, using circuits of fixed size (independent of $\varepsilon$). We may then approach any $\exp(i\sum_a \alpha_a A_a)$ as in eq. (6.87), also with polynomial convergence.
In principle, we should be able to do much better, reaching a desired k-qubit unitary within distance $\varepsilon$ using just poly$(\log(\varepsilon^{-1}))$ quantum gates. Since the number of size-T circuits that we can construct acting on k qubits is exponential in T, and the circuits fill $U(2^k)$ roughly uniformly, there should be a size-T circuit reaching within a distance of order $e^{-T}$ of any point in $U(2^k)$. However, it might be a computationally hard problem classically to work out the circuit that comes exponentially close to the unitary we are trying to reach. Therefore, it would be dishonest to rely on this more efficient construction in an asymptotic analysis of quantum complexity.
{ Figure: Deutsch's circuit: the inputs $|0\rangle$ and $|1\rangle$ are each Hadamard-transformed, $U_f$ is applied, a final Hadamard acts on the first qubit, and the first qubit is measured {
7 The term "oracle" signifies that the box responds to a query immediately; that is, the time it takes the box to operate is not included in the complexity analysis.
Here H denotes the Hadamard transform
$$H: |x\rangle \to \frac{1}{\sqrt 2}\sum_y (-1)^{xy}|y\rangle; \qquad (6.107)$$
or
$$H: |0\rangle \to \frac{1}{\sqrt 2}(|0\rangle + |1\rangle),\qquad |1\rangle \to \frac{1}{\sqrt 2}(|0\rangle - |1\rangle); \qquad (6.108)$$
that is, H is the $2\times 2$ matrix
$$H = \begin{pmatrix} \frac{1}{\sqrt 2} & \frac{1}{\sqrt 2} \\ \frac{1}{\sqrt 2} & -\frac{1}{\sqrt 2} \end{pmatrix}. \qquad (6.109)$$
The circuit takes the input $|0\rangle|1\rangle$ to
$$|0\rangle|1\rangle \to \frac{1}{2}(|0\rangle + |1\rangle)(|0\rangle - |1\rangle)$$
$$\to \frac{1}{2}\left[(-1)^{f(0)}|0\rangle + (-1)^{f(1)}|1\rangle\right](|0\rangle - |1\rangle)$$
$$\to \frac{1}{2}\left[\left((-1)^{f(0)} + (-1)^{f(1)}\right)|0\rangle + \left((-1)^{f(0)} - (-1)^{f(1)}\right)|1\rangle\right]\frac{1}{\sqrt 2}(|0\rangle - |1\rangle). \qquad (6.110)$$
Then when we measure the first qubit, we find the outcome $|0\rangle$ with probability one if $f(0) = f(1)$ (constant function) and the outcome $|1\rangle$ with probability one if $f(0) \ne f(1)$ (balanced function).
A quantum computer enjoys an advantage over a classical computer be-
cause it can invoke quantum parallelism. Because we input a superposition
of j0i and j1i, the output is sensitive to both the values of f (0) and f (1),
even though we ran the box just once.
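A minimal simulation makes the point concrete. The sketch below (my own illustration, not from the notes) builds the two-qubit circuit of eq. (6.110) in numpy and checks that the first-qubit measurement distinguishes constant from balanced with a single call to the oracle.

import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def deutsch_outcome(f):
    """Probability of measuring |1> on the first qubit."""
    state = np.kron([1, 0], [0, 1]).astype(complex)   # |0>|1>, index = 2x + y
    state = np.kron(H, H) @ state                      # Hadamard both qubits
    U_f = np.zeros((4, 4))                             # |x>|y> -> |x>|y XOR f(x)>
    for x in (0, 1):
        for y in (0, 1):
            U_f[2 * x + (y ^ f(x)), 2 * x + y] = 1
    state = U_f @ state
    state = np.kron(H, np.eye(2)) @ state              # Hadamard the first qubit
    return sum(abs(state[i]) ** 2 for i in (2, 3))     # first qubit = 1

print(deutsch_outcome(lambda x: 0))    # constant -> 0.0
print(deutsch_outcome(lambda x: x))    # balanced -> 1.0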
Deutsch–Jozsa problem. Now we'll consider some generalizations of Deutsch's problem. We will continue to assume that we are to analyze a quantum black box ("quantum oracle"). But in the hope of learning something about complexity, we will imagine that we have a family of black boxes,
with variable input size. We are interested in how the time needed to find out what is inside the box scales with the size of the input (where "time" is measured by how many times we query the box).
In the Deutsch–Jozsa problem, we are presented with a quantum black box that computes a function taking n bits to 1,
$$f: \{0,1\}^n \to \{0,1\}, \qquad (6.111)$$
and we have it on good authority that f is either constant ($f(x) = c$ for all x) or balanced ($f(x) = 0$ for exactly half of the possible input values). We are to solve the decision problem: Is f constant or balanced?
In fact, we can solve this problem, too, accessing the box only once, using the same circuit as for Deutsch's problem (but with x expanded from one bit to n bits). We note that if we apply n Hadamard gates in parallel to n qubits,
$$H^{(n)} = H \otimes H \otimes \cdots \otimes H, \qquad (6.112)$$
then the n-qubit state transforms as
$$H^{(n)}: |x\rangle \to \prod_{i=1}^{n}\left(\frac{1}{\sqrt 2}\sum_{y_i\in\{0,1\}}(-1)^{x_i y_i}|y_i\rangle\right) \equiv \frac{1}{2^{n/2}}\sum_{y=0}^{2^n-1}(-1)^{x\cdot y}|y\rangle, \qquad (6.113)$$
where x, y represent n-bit strings, and $x\cdot y$ denotes the bitwise AND (or mod 2 scalar product)
$$x\cdot y = (x_1\wedge y_1)\oplus(x_2\wedge y_2)\oplus\cdots\oplus(x_n\wedge y_n). \qquad (6.114)$$
Acting on the input $(|0\rangle)^n|1\rangle$, the action of the circuit is
$$(|0\rangle)^n|1\rangle \to \left(\frac{1}{2^{n/2}}\sum_{x=0}^{2^n-1}|x\rangle\right)\frac{1}{\sqrt 2}(|0\rangle - |1\rangle)$$
$$\to \left(\frac{1}{2^{n/2}}\sum_{x=0}^{2^n-1}(-1)^{f(x)}|x\rangle\right)\frac{1}{\sqrt 2}(|0\rangle - |1\rangle)$$
$$\to \left(\frac{1}{2^{n}}\sum_{x=0}^{2^n-1}\sum_{y=0}^{2^n-1}(-1)^{f(x)}(-1)^{x\cdot y}|y\rangle\right)\frac{1}{\sqrt 2}(|0\rangle - |1\rangle). \qquad (6.115)$$
Now let us evaluate the sum
$$\frac{1}{2^n}\sum_{x=0}^{2^n-1}(-1)^{f(x)}(-1)^{x\cdot y}. \qquad (6.116)$$
If f is a constant function, this sum is $(-1)^{f(0)}\delta_{y,0}$: all of the amplitude is concentrated on $y = 0$. But if the function is balanced, then for $y = 0$ the sum becomes $\frac{1}{2^n}\sum_x(-1)^{f(x)} = 0$ (because half of the terms are (+1) and half are (−1)). Therefore, the probability of obtaining the measurement outcome $|y = 0\rangle$ is zero.
We conclude that one query of the quantum oracle suffices to distinguish constant and balanced functions with 100% confidence. The measurement result y = 0 means constant; any other result means balanced.
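The y = 0 amplitude is easy to check directly, since the first register ends up in the state of eq. (6.115). The sketch below (illustrative only; the choice of constant and balanced functions is mine) computes that amplitude for an n-bit oracle.

import numpy as np

def dj_prob_y0(f, n):
    """Probability of measuring y = 0...0 on the first register."""
    N = 2 ** n
    amp = np.array([(-1) ** f(x) for x in range(N)], dtype=float) / np.sqrt(N)
    # after H^(n): amplitude of |0...0> is (1/sqrt(N)) * sum_x amp[x]
    return (amp.sum() / np.sqrt(N)) ** 2

n = 4
constant = lambda x: 1
balanced = lambda x: bin(x).count("1") % 2      # parity is balanced
print(dj_prob_y0(constant, n))   # -> 1.0
print(dj_prob_y0(balanced, n))   # -> 0.0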
So quantum computation solves this problem neatly, but is the problem really hard classically? If we are restricted to classical input states $|x\rangle$, we can query the oracle repeatedly, choosing the input x at random (without replacement) each time. Once we obtain distinct outputs for two different queries, we have determined that the function is balanced (not constant). But if the function is in fact constant, we will not be certain it is constant until we have submitted $2^{n-1}+1$ queries and have obtained the same response every time. In contrast, the quantum computation gives a definite response in only one go. So in this sense (if we demand absolute certainty) the classical calculation requires a number of queries exponential in n, while the quantum computation does not, and we might therefore claim an exponential quantum speedup.
But perhaps it is not reasonable to demand absolute certainty of the classical computation (particularly since any real quantum computer will be susceptible to errors, so that the quantum computer will also be unable to attain absolute certainty). Suppose we are satisfied to guess balanced or constant, with a probability of success
$$P(\text{success}) > 1 - \varepsilon. \qquad (6.119)$$
If the function is actually balanced, then if we make k queries, the probability of getting the same response every time is $p = 2^{-(k-1)}$. If after receiving the
same response k consecutive times we guess that the function is constant, then a quick Bayesian analysis shows that the probability that our guess is wrong is $\frac{1}{2^{k-1}+1}$ (assuming that balanced and constant are a priori equally probable). So if we guess after k queries, the probability of a wrong guess is
$$1 - P(\text{success}) = \frac{1}{2^{k-1}(2^{k-1}+1)}. \qquad (6.120)$$
Therefore, we can achieve success probability $1 - \varepsilon$ for $\varepsilon^{-1} = 2^{k-1}(2^{k-1}+1)$, or $k \simeq \frac{1}{2}\log\frac{1}{\varepsilon}$. Since we can reach an exponentially good success probability with a polynomial number of trials, it is not really fair to say that the problem is hard.
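The Bayesian step quoted above can be checked with exact arithmetic. The sketch below (an illustration under my reading of the argument: equal priors, and the probability $2^{-(k-1)}$ of k identical responses for a balanced function) computes the posterior probability that the function is balanced after k identical responses, which matches $1/(2^{k-1}+1)$.

from fractions import Fraction

def posterior_balanced(k):
    p_same_if_constant = Fraction(1)                 # a constant function always repeats
    p_same_if_balanced = Fraction(1, 2 ** (k - 1))   # as quoted in the text
    prior = Fraction(1, 2)
    return (p_same_if_balanced * prior) / (
        p_same_if_balanced * prior + p_same_if_constant * prior)

for k in (2, 4, 8):
    print(k, posterior_balanced(k), Fraction(1, 2 ** (k - 1) + 1))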
Bernstein–Vazirani problem. Exactly the same circuit can be used to solve another variation on the Deutsch–Jozsa problem. Let's suppose that
our quantum black box computes one of the functions $f_a$, where
$$f_a(x) = a\cdot x, \qquad (6.121)$$
and a is an n-bit string. Our job is to determine a.
The quantum algorithm can solve this problem with certainty, given just one (n-qubit) quantum query. For this particular function, the quantum state in eq. (6.115) becomes
$$\frac{1}{2^n}\sum_{x=0}^{2^n-1}\sum_{y=0}^{2^n-1}(-1)^{a\cdot x}(-1)^{x\cdot y}|y\rangle. \qquad (6.122)$$
But in fact
$$\frac{1}{2^n}\sum_{x=0}^{2^n-1}(-1)^{a\cdot x}(-1)^{x\cdot y} = \delta_{a,y}, \qquad (6.123)$$
so this state is $|a\rangle$. We can execute the circuit once and measure the n-qubit register, finding the n-bit string a with probability one.
If only classical queries are allowed, we acquire only one bit of information from each query, and it takes n queries to determine the value of a. Therefore, we have a clear separation between the quantum and classical difficulty of the problem. Even so, this example does not probe the relation of BPP to BQP, because the classical problem is not hard. The number of queries required classically is only linear in the input size, not exponential.
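The single-query recovery of a is easy to verify numerically. In the sketch below (mine; the phase-oracle formulation is equivalent to the circuit above), the register is Hadamard-transformed, picks up the phases $(-1)^{a\cdot x}$, is Hadamard-transformed again, and ends up exactly in $|a\rangle$.

import numpy as np

def hadamard_n(n):
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    out = np.array([[1.0]])
    for _ in range(n):
        out = np.kron(out, H)
    return out

def bernstein_vazirani(a, n):
    N = 2 ** n
    state = np.zeros(N); state[0] = 1.0
    state = hadamard_n(n) @ state
    phases = np.array([(-1) ** bin(a & x).count("1") for x in range(N)])
    state = phases * state                   # oracle: |x> -> (-1)^{a.x}|x>
    state = hadamard_n(n) @ state
    return int(np.argmax(np.abs(state)))     # the outcome with probability one

print(bernstein_vazirani(a=0b1011, n=4))     # -> 11 (= 0b1011)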
Simon's problem. Bernstein and Vazirani managed to formulate a variation on the above problem that is hard classically, and so established for the first time a "relativized" separation between quantum and classical complexity. We will find it more instructive to consider a simpler example proposed somewhat later by Daniel Simon.
Once again we are presented with a quantum black box, and this time we are assured that the box computes a function
$$f: \{0,1\}^n \to \{0,1\}^n \qquad (6.124)$$
that is 2-to-1. Furthermore, the function has a "period" given by the n-bit string a; that is,
$$f(x) = f(y)\quad\text{iff}\quad y = x\oplus a, \qquad (6.125)$$
where $\oplus$ here denotes the bitwise XOR operation. (So a is the period if we regard x as taking values in $(\mathbb{Z}_2)^n$ rather than $\mathbb{Z}_{2^n}$.) This is all we know about f. Our job is to determine the value of a.
Classically this problem is hard. We need to query the oracle an exponentially large number of times to have any reasonable probability of finding a. We don't learn anything until we are fortunate enough to choose two queries x and y that happen to satisfy $x\oplus y = a$. Suppose, for example, that we choose $2^{n/4}$ queries. The number of pairs of queries is less than $(2^{n/4})^2$, and for each pair $\{x, y\}$, the probability that $x\oplus y = a$ is $2^{-n}$. Therefore, the probability of successfully finding a is less than
$$2^{-n}(2^{n/4})^2 = 2^{-n/2}; \qquad (6.126)$$
even with exponentially many queries, the success probability is exponentially small.
If we wish, we can frame the question as a decision problem: Either f is a 1-to-1 function, or it is 2-to-1 with some randomly chosen period a, each occurring with an a priori probability $\frac{1}{2}$. We are to determine whether the function is 1-to-1 or 2-to-1. Then, after $2^{n/4}$ classical queries, our probability of making a correct guess is
$$P(\text{success}) < \frac{1}{2} + \frac{1}{2^{n/2}}, \qquad (6.127)$$
which does not remain bounded away from $\frac{1}{2}$ as n gets large.
But with quantum queries the problem is easy! The circuit we use is essentially the same as above, but now both registers are expanded to n qubits. We prepare the equally weighted superposition of all n-bit strings (by acting on $|0\rangle$ with $H^{(n)}$), and then we query the oracle:
$$U_f: \left(\sum_{x=0}^{2^n-1}|x\rangle\right)|0\rangle \to \sum_{x=0}^{2^n-1}|x\rangle|f(x)\rangle. \qquad (6.128)$$
Now we measure the second register. (This step is not actually necessary, but I include it here for the sake of pedagogical clarity.) The measurement outcome is selected at random from the $2^{n-1}$ possible values of f(x), each occurring equiprobably. Suppose the outcome is $f(x_0)$. Then because both $x_0$ and $x_0\oplus a$, and only these values, are mapped by f to $f(x_0)$, we have prepared the state
$$\frac{1}{\sqrt 2}(|x_0\rangle + |x_0\oplus a\rangle) \qquad (6.129)$$
in the first register.
Now we want to extract some information about a. Clearly it would do us no good to measure the register (in the computational basis) at this point. We would obtain either the outcome $x_0$ or $x_0\oplus a$, each occurring with probability $\frac{1}{2}$, but neither outcome would reveal anything about the value of a.
But suppose we apply the Hadamard transform $H^{(n)}$ to the register before we measure:
$$H^{(n)}: \frac{1}{\sqrt 2}(|x_0\rangle + |x_0\oplus a\rangle) \to \frac{1}{2^{(n+1)/2}}\sum_{y=0}^{2^n-1}\left[(-1)^{x_0\cdot y} + (-1)^{(x_0\oplus a)\cdot y}\right]|y\rangle = \frac{1}{2^{(n-1)/2}}\sum_{a\cdot y = 0}(-1)^{x_0\cdot y}|y\rangle. \qquad (6.130)$$
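The content of eq. (6.130) is that every measurement outcome y satisfies $a\cdot y = 0$, so each run yields one linear constraint on a. The sketch below (my own numerical check; the choice of $x_0$ and a is arbitrary) prepares the post-measurement state, applies $H^{(n)}$, and verifies that the output distribution is supported only on strings orthogonal to a.

import numpy as np

def simon_outcome_distribution(x0, a, n):
    N = 2 ** n
    state = np.zeros(N); state[x0] = state[x0 ^ a] = 1 / np.sqrt(2)
    # n-qubit Hadamard transform via the (-1)^{x.y} kernel
    Hn = np.array([[(-1) ** bin(x & y).count("1") for x in range(N)]
                   for y in range(N)]) / np.sqrt(N)
    return np.abs(Hn @ state) ** 2

n, a, x0 = 4, 0b0110, 0b1010
probs = simon_outcome_distribution(x0, a, n)
support = [y for y in range(2 ** n) if probs[y] > 1e-12]
print(all(bin(a & y).count("1") % 2 == 0 for y in support))   # -> True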
$$\simeq \frac{1}{\sqrt N} \qquad (6.149)$$
for N large; if we choose
$$T = \frac{\pi}{4}\sqrt{N}\,\big(1 + O(N^{-1/2})\big), \qquad (6.150)$$
$$U_s = 2|s\rangle\langle s| - \mathbf{1}, \qquad (6.155)$$
that reflects a vector about the axis defined by the vector $|s\rangle$. How do we build this transformation efficiently from quantum gates? Since $|s\rangle = H^{(n)}|0\rangle$, where $H^{(n)}$ is the bitwise Hadamard transformation, we may write
$$U_s = H^{(n)}\big(2|0\rangle\langle 0| - \mathbf{1}\big)H^{(n)}, \qquad (6.156)$$
so it will suffice to construct a reflection about the axis $|0\rangle$. We can easily build this reflection from an n-bit Toffoli gate $\theta^{(n)}$.
Recall that
$$H\sigma_x H = \sigma_z, \qquad (6.157)$$
{ Figure: circuit constructing the reflection about $|0\rangle$ from an n-bit Toffoli gate and Hadamards {
Since each term in the polynomial $P_{\rm even}^{(2T)}(\tilde X)$ contains at most 2T of the $\tilde X_i$'s, we can invoke the identity
$$\sum_{\tilde X_i\in\{1,-1\}}\tilde X_i = 0 \qquad (6.185)$$
to see that the sum in eq. (6.184) must vanish if N > 2T. We conclude that
$$\sum_{{\rm par}(\tilde X)=1}P_{\rm even}^{(2T)}(\tilde X) = \sum_{{\rm par}(\tilde X)=-1}P_{\rm even}^{(2T)}(\tilde X); \qquad (6.186)$$
hence, for T < N/2, we are just as likely to guess "even" when the actual PARITY($\tilde X$) is odd as when it is even (on average). Our quantum algorithm
fails to tell us anything about the value of PARITY($\tilde X$); that is, averaged over the (a priori equally likely) possible values of $X_i$, we are just as likely to be right as wrong.
We can also show, by exhibiting an explicit algorithm (exercise), that N/2 queries (assuming N even) are sufficient to determine PARITY (either probabilistically or deterministically). In a sense, then, we can achieve a factor of 2 speedup compared to classical queries. But that is the best we can do.
6.9 Periodicity
So far, the one case for which we have found an exponential separation be-
tween the speed of a quantum algorithm and the speed of the corresponding
15 R. Cleve et al., "Quantum Entanglement and the Communication Complexity of the Inner Product Function," quant-ph/9708019; W. van Dam et al., "Multiparty Quantum Communication Complexity," quant-ph/9710054.
classical algorithm is the case of Simon's problem. Simon's algorithm exploits quantum parallelism to speed up the search for the period of a function. Its success encourages us to seek other quantum algorithms designed for other kinds of period finding.
Simon studied periodic functions taking values in $(\mathbb{Z}_2)^n$. For that purpose the n-bit Hadamard transform $H^{(n)}$ was a powerful tool. If we wish instead to study periodic functions taking values in $\mathbb{Z}_{2^n}$, the (discrete) Fourier transform will be a tool of comparable power.
The moral of Simon's problem is that, while finding needles in a haystack may be difficult, finding periodically spaced needles in a haystack can be far easier. For example, if we scatter a photon off of a periodic array of needles, the photon is likely to be scattered in one of a set of preferred directions, where the Bragg scattering condition is satisfied. These preferred directions depend on the spacing between the needles, so by scattering just one photon, we can already collect some useful information about the spacing. We should further explore the implications of this metaphor for the construction of efficient quantum algorithms.
So imagine a quantum oracle that computes a function
$$f: \{0,1\}^n \to \{0,1\}^m \qquad (6.192)$$
that has an unknown period r, where r is a positive integer satisfying
$$1 \ll r \ll 2^n. \qquad (6.193)$$
That is,
$$f(x) = f(x + mr), \qquad (6.194)$$
where m is any integer such that x and x + mr lie in $\{0, 1, 2, \ldots, 2^n - 1\}$.
We are to find the period r. Classically, this problem is hard. If r is, say, of order $2^{n/2}$, we will need to query the oracle of order $2^{n/4}$ times before we are likely to find two values of x that are mapped to the same value of f(x), and hence learn something about r. But we will see that there is a quantum algorithm that finds r in time poly(n).
Even if we know how to compute the function f(x) efficiently, it may be a hard problem to determine its period. Our quantum algorithm can be applied to finding, in poly(n) time, the period of any function that we can compute in poly(n) time. Efficient period finding allows us to efficiently
solve a variety of (apparently) hard problems, such as factoring an integer,
or evaluating a discrete logarithm.
The key idea underlying quantum period finding is that the Fourier transform can be evaluated by an efficient quantum circuit (as discovered by Peter Shor). The quantum Fourier transform (QFT) exploits the power of quantum parallelism to achieve an exponential speedup of the well-known (classical) fast Fourier transform (FFT). Since the FFT has such a wide variety of applications, perhaps the QFT will also come into widespread use someday.
{ Figure: the three-qubit QFT circuit, built from Hadamard gates and controlled-$R_1$, controlled-$R_2$ rotations acting on $|x_2\rangle, |x_1\rangle, |x_0\rangle$ and producing outputs $|y_2\rangle, |y_1\rangle, |y_0\rangle$ {
does the job (but note that the order of the bits has been reversed in the output). Each Hadamard gate acts as
$$H: |x_k\rangle \to \frac{1}{\sqrt 2}\left(|0\rangle + e^{2\pi i(.x_k)}|1\rangle\right). \qquad (6.221)$$
The other contributions to the relative phase of $|0\rangle$ and $|1\rangle$ in the kth qubit are provided by the two-qubit conditional rotations, where
$$R_d = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/2^d} \end{pmatrix}, \qquad (6.222)$$
and $d = (k - j)$ is the "distance" between the qubits.
In the case n = 3, the QFT is constructed from three H gates and three controlled-R gates. For general n, the obvious generalization of this circuit requires n H gates and $\binom{n}{2} = \frac{1}{2}n(n-1)$ controlled-R's. A two-qubit gate is applied to each pair of qubits, again with controlled relative phase $\pi/2^d$, where d is the "distance" between the qubits. Thus the circuit family that implements the QFT has a size of order $(\log N)^2$.
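The gate count can be checked by assembling the circuit explicitly. The sketch below (my own numpy illustration; the qubit-ordering and bit-reversal conventions are assumptions of this code, not taken from the notes) builds the n-qubit QFT from n Hadamards and $\binom{n}{2}$ controlled phase rotations and compares it with the DFT matrix.

import numpy as np

def apply_1q(gate, q, state, n):
    """Apply a single-qubit gate to qubit q (qubit 0 = least significant)."""
    psi = state.reshape([2] * n)
    psi = np.moveaxis(psi, n - 1 - q, 0)
    psi = np.tensordot(gate, psi, axes=1)
    return np.moveaxis(psi, 0, n - 1 - q).reshape(-1)

def apply_cphase(control, target, phase, state, n):
    """Multiply amplitudes with control = target = 1 by exp(i*phase)."""
    out = state.copy()
    for x in range(len(state)):
        if (x >> control) & 1 and (x >> target) & 1:
            out[x] *= np.exp(1j * phase)
    return out

def qft_circuit(n):
    N = 2 ** n
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    U = np.eye(N, dtype=complex)
    for col in range(N):
        state = U[:, col]
        for k in reversed(range(n)):
            state = apply_1q(H, k, state, n)
            for j in reversed(range(k)):
                state = apply_cphase(j, k, np.pi / 2 ** (k - j), state, n)
        # undo the bit reversal noted in the text
        state = state.reshape([2] * n).transpose(list(reversed(range(n)))).reshape(-1)
        U[:, col] = state
    return U

n = 3; N = 2 ** n
dft = np.array([[np.exp(2j * np.pi * x * y / N) for x in range(N)]
                for y in range(N)]) / np.sqrt(N)
print(np.allclose(qft_circuit(n), dft))     # -> True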
We can reduce the circuit complexity to linear in log N if we are willing to settle for an implementation of fixed accuracy, because the two-qubit gates acting on distantly separated qubits contribute only exponentially small phases. If we drop the gates acting on pairs with distance greater than m, then each term in eq. (6.217) is replaced by an approximation to m bits of accuracy; the total error in $xy/2^n$ is certainly no worse than $n\,2^{-m}$, so we can achieve accuracy $\varepsilon$ in $xy/2^n$ with $m \sim \log(n/\varepsilon)$. If we retain only the gates acting on qubit pairs with distance m or less, then the circuit size is $mn \sim n\log(n/\varepsilon)$.
In fact, if we are going to measure in the computational basis immediately after implementing the QFT (or its inverse), a further simplification is possible: no two-qubit gates are needed at all! We first remark that the controlled-$R_d$ gate acts symmetrically on the two qubits: it acts trivially on $|00\rangle, |01\rangle$, and $|10\rangle$, and modifies the phase of $|11\rangle$ by $e^{i\pi/2^d}$. Thus, we can interchange the "control" and "target" bits without modifying the gate. With this change, our circuit for the 3-qubit QFT can be redrawn as:
{ Figure: the 3-qubit QFT redrawn with the controls and targets of the conditional rotations interchanged {
Once we have measured $|y_0\rangle$, we know the value of the control bit in the controlled-$R_1$ gate that acted on the first two qubits. Therefore, we will obtain the same probability distribution of measurement outcomes if, instead of applying controlled-$R_1$ and then measuring, we instead measure $y_0$ first, and then apply $(R_1)^{y_0}$ to the next qubit, conditioned on the outcome of the measurement of the first qubit. Similarly, we can replace the controlled-$R_1$ and controlled-$R_2$ gates acting on the third qubit by the single-qubit rotation
$$(R_2)^{y_0}(R_1)^{y_1} \qquad (6.223)$$
(that is, a rotation with relative phase $\pi(.y_1y_0)$) after the values of $y_1$ and $y_0$ have been measured.
Altogether then, if we are going to measure after performing the QFT, only n Hadamard gates and n − 1 single-qubit rotations are needed to implement it. The QFT is remarkably simple!
6.10 Factoring
6.10.1 Factoring as period finding
What does the factoring problem (finding the prime factors of a large composite positive integer) have to do with periodicity? There is a well-known
(randomized) reduction of factoring to determining the period of a function.
Although this reduction is not directly related to quantum computing, we
will discuss it here for completeness, and because the prospect of using a
quantum computer as a factoring engine has generated so much excitement.
Suppose we want to find a factor of the n-bit number N. Select pseudo-randomly a < N, and compute the greatest common divisor GCD(a, N), which can be done efficiently (in a time of order $(\log N)^3$) using the Euclidean algorithm. If GCD(a, N) ≠ 1 then the GCD is a nontrivial factor of N, and we are done. So suppose GCD(a, N) = 1.
[Aside: The Euclidean algorithm. To compute GCD($N_1, N_2$) (for $N_1 > N_2$), first divide $N_1$ by $N_2$, obtaining remainder $R_1$. Then divide $N_2$ by $R_1$, obtaining remainder $R_2$. Divide $R_1$ by $R_2$, etc., until the remainder is 0. The last nonzero remainder is $R = $ GCD($N_1, N_2$). To see that the algorithm works, just note that (1) R divides all previous remainders and hence also $N_1$ and $N_2$, and (2) any number that divides $N_1$ and $N_2$ will also divide all remainders, including R. A number that divides both $N_1$ and $N_2$, and also is divided by any number that divides both $N_1$ and $N_2$, must be GCD($N_1, N_2$). To see how long the Euclidean algorithm takes, note that
$$R_j = qR_{j+1} + R_{j+2}, \qquad (6.224)$$
where $q \ge 1$ and $R_{j+2} < R_{j+1}$; therefore $R_{j+2} < \frac{1}{2}R_j$. Two divisions reduce the remainder by at least a factor of 2, so no more than $2\log N_1$ divisions are required, with each division using $O((\log N)^2)$ elementary operations; the total number of operations is $O((\log N)^3)$.]
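A compact sketch of the aside, keeping count of the divisions so that the $2\log_2 N_1$ bound can be seen in practice (illustrative code of mine):

def euclid_gcd(n1, n2):
    divisions = 0
    while n2 != 0:
        n1, n2 = n2, n1 % n2      # replace (N1, N2) by (N2, remainder)
        divisions += 1
    return n1, divisions

print(euclid_gcd(2 ** 61 - 1, 2 ** 31 - 1))   # coprime: gcd 1, few divisions
print(euclid_gcd(12378, 3054))                # gcd 6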
The numbers a < N coprime to N (having no common factor with N) form a finite group under multiplication mod N. [Why? We need to establish that each element a has an inverse. But for given a < N coprime to N, each ab (mod N) is distinct, as b ranges over all b < N coprime to N.16 Therefore, for some b, we must have $ab \equiv 1$ (mod N); hence the inverse of a exists.] Each element a of this finite group has a finite order r, the smallest positive integer such that
$$a^r \equiv 1 \pmod N. \qquad (6.225)$$
16 If N divides $ab - ab'$, it must divide $b - b'$.
The order of a mod N is the period of the function
$$f_{N,a}(x) = a^x \pmod N. \qquad (6.226)$$
We know there is an efficient quantum algorithm that can find the period of a function; therefore, if we can compute $f_{N,a}$ efficiently, we can find the order of a efficiently.
Computing $f_{N,a}$ may look difficult at first, since the exponent x can be very large. But if $x < 2^m$ and we express x as a binary expansion
$$x = x_{m-1}\,2^{m-1} + x_{m-2}\,2^{m-2} + \cdots + x_0, \qquad (6.227)$$
we have
$$a^x \pmod N = (a^{2^{m-1}})^{x_{m-1}}(a^{2^{m-2}})^{x_{m-2}}\cdots(a)^{x_0} \pmod N. \qquad (6.228)$$
Each $a^{2^j}$ has a large exponent, but can be computed efficiently by a classical computer, using repeated squaring
$$a^{2^j} \pmod N = (a^{2^{j-1}})^2 \pmod N. \qquad (6.229)$$
So only m − 1 (classical) mod N multiplications are needed to assemble a table of all the $a^{2^j}$'s.
The computation of $a^x \pmod N$ is carried out by executing a routine:
INPUT 1
For j = 0 to m − 1: if $x_j = 1$, MULTIPLY by $a^{2^j}$.
This routine requires at most m mod N multiplications, each requiring of order $(\log N)^2$ elementary operations.17 Since r < N, we will have a reasonable chance of success at extracting the period if we choose $m \simeq 2\log N$. Hence, the computation of $f_{N,a}$ can be carried out by a circuit family of size $O((\log N)^3)$. Schematically, the circuit has the structure:
17 Using tricks for performing efficient multiplication of very large numbers, the number of elementary operations can be reduced to $O(\log N\,\log\log N\,\log\log\log N)$; thus, asymptotically for large N, a circuit family with size $O(\log^2 N\,\log\log N\,\log\log\log N)$ can compute $f_{N,a}$.
{ Figure: controlled multiplications by $a$, $a^2$, $a^4$, controlled by the bits $x_0, x_1, x_2$, acting on a register initialized to $|1\rangle$ {
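A classical sketch of the routine above (illustrative code of mine): build the table of $a^{2^j}$ by repeated squaring, then multiply in the entries selected by the bits of x.

def mod_exp(a, x, N):
    m = x.bit_length()
    table = [a % N]                     # table[j] = a^(2^j) mod N
    for j in range(1, m):
        table.append(table[-1] ** 2 % N)
    result = 1                          # "INPUT 1"
    for j in range(m):
        if (x >> j) & 1:                # if x_j = 1, MULTIPLY by a^(2^j)
            result = result * table[j] % N
    return result

print(mod_exp(7, 2 ** 40 + 3, 15))      # agrees with pow(7, 2**40 + 3, 15)
print(pow(7, 2 ** 40 + 3, 15))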
6.10.2 RSA
Does anyone care whether factoring is easy or hard? Well, yes, some people
do.
The presumed difficulty of factoring is the basis of the security of the widely used RSA18 scheme for public key cryptography, which you may have used yourself if you have ever sent your credit card number over the internet.
The idea behind public key cryptography is to avoid the need to exchange
a secret key (which might be intercepted and copied) between the parties
that want to communicate. The enciphering key is public knowledge. But
using the enciphering key to infer the deciphering key involves a prohibitively
difficult computation. Therefore, Bob can send the enciphering key to Alice and everyone else, but only Bob will be able to decode the message that Alice (or anyone else) encodes using the key. Encoding is a "one-way function"
that is easy to compute but very hard to invert.
18For Rivest, Shamir, and Adleman
(Of course, Alice and Bob could have avoided the need to exchange the
public key if they had decided on a private key in their previous clandestine
meeting. For example, they could have agreed to use a long random string
as a one-time pad for encoding and decoding. But perhaps Alice and Bob
never anticipated that they would someday need to communicate privately.
Or perhaps they did agree in advance to use a one-time pad, but they have now used up their private key, and they are loath to reuse it for fear that an eavesdropper might then be able to break their code. Now they are too far apart to safely exchange a new private key; public key cryptography appears to be their most secure option.)
To construct the public key Bob chooses two large prime numbers p and q. But he does not publicly reveal their values. Instead he computes the product
$$N = pq. \qquad (6.239)$$
Since Bob knows the prime factorization of N, he also knows the value of the Euler function $\varphi(N)$, the number of numbers less than N that are coprime with N. In the case of a product of two primes it is
$$\varphi(N) = N - p - q + 1 = (p-1)(q-1) \qquad (6.240)$$
(only multiples of p and q share a factor with N). It is easy to find $\varphi(N)$ if you know the prime factorization of N, but it is hard if you know only N.
Bob then pseudo-randomly selects $e < \varphi(N)$ that is coprime with $\varphi(N)$. He reveals to Alice (and anyone else who is listening) the value of N and e, but nothing else.
Alice converts her message to ASCII, a number a < N. She encodes the message by computing
$$b = f(a) = a^e \pmod N, \qquad (6.241)$$
which she can do quickly by repeated squaring. How does Bob decode the message?
Suppose that a is coprime to N (which is overwhelmingly likely if p and q are very large; anyway Alice can check before she encodes). Then
$$a^{\varphi(N)} \equiv 1 \pmod N \qquad (6.242)$$
(Euler's theorem). This is so because the numbers less than N and coprime to N form a group (of order $\varphi(N)$) under mod N multiplication. The order of
any group element must divide the order of the group (the powers of a form a subgroup). Since GCD$(e, \varphi(N)) = 1$, we know that e has a multiplicative inverse $d = e^{-1}$ mod $\varphi(N)$:
$$ed \equiv 1 \pmod{\varphi(N)}. \qquad (6.243)$$
The value of d is Bob's closely guarded secret; he uses it to decode by computing:
$$f^{-1}(b) = b^d \pmod N = a^{ed} \pmod N = a\cdot\big(a^{\varphi(N)}\big)^{\rm integer} \pmod N = a \pmod N. \qquad (6.244)$$
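A toy illustration of the scheme just described, with artificially small primes (real keys use primes hundreds of digits long); the particular numbers are mine, and the modular inverse uses Python 3.8+'s built-in pow.

from math import gcd

p, q = 61, 53
N = p * q                       # public modulus, eq. (6.239)
phi = (p - 1) * (q - 1)         # Euler function, eq. (6.240)

e = 17                          # public exponent, coprime with phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)             # Bob's secret: d = e^{-1} mod phi(N)

a = 1234                        # Alice's message, a < N and coprime to N
b = pow(a, e, N)                # encode: b = a^e mod N, eq. (6.241)
print(pow(b, d, N) == a)        # decode: b^d mod N recovers a, eq. (6.244)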
[Aside: How does Bob compute $d = e^{-1}$? The multiplicative inverse is a byproduct of carrying out the Euclidean algorithm to compute GCD$(e, \varphi(N)) = 1$. Tracing the chain of remainders from the bottom up, starting with $R_n = 1$:
$$1 = R_n = R_{n-2} - q_{n-1}R_{n-1},$$
$$R_{n-1} = R_{n-3} - q_{n-2}R_{n-2},$$
$$R_{n-2} = R_{n-4} - q_{n-3}R_{n-3},$$
$$\text{etc.}\ldots \qquad (6.245)$$
(where the $q_j$'s are the quotients), so that
$$1 = (1 + q_{n-1}q_{n-2})R_{n-2} - q_{n-1}R_{n-3},$$
$$1 = \big(-q_{n-1} - q_{n-3}(1 + q_{n-1}q_{n-2})\big)R_{n-3} + (1 + q_{n-1}q_{n-2})R_{n-4},$$
$$\text{etc.}\ldots \qquad (6.246)$$
Continuing, we can express 1 as a linear combination of any two successive remainders; eventually we work our way up to
$$1 = d\cdot e + q\cdot\varphi(N), \qquad (6.247)$$
and identify d as $e^{-1} \pmod{\varphi(N)}$.]
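An equivalent iterative form of this back-substitution (a sketch of mine, tracking only the coefficient of e as the remainders are generated) produces the same d:

def inverse_mod(e, phi):
    r_prev, r = e, phi
    s_prev, s = 1, 0            # invariant: r_prev = s_prev * e (mod phi)
    while r != 0:
        quotient = r_prev // r
        r_prev, r = r, r_prev - quotient * r
        s_prev, s = s, s_prev - quotient * s
    assert r_prev == 1, "e and phi must be coprime"
    return s_prev % phi

e, phi = 17, 3120
d = inverse_mod(e, phi)
print(d, (d * e) % phi)          # -> d, 1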
Of course, if Eve has a superfast factoring engine, the RSA scheme is insecure. She factors N, finds $\varphi(N)$, and quickly computes d. In fact, she does not really need to factor N; it is sufficient to compute the order modulo N of the encoded message $a^e$ (mod N). Since e is coprime with $\varphi(N)$, the order of $a^e$ (mod N) is the same as the order of a (both elements generate the same orbit, or cyclic subgroup). Once the order Ord(a) is known, Eve computes $\tilde d$ such that
$$\tilde d e \equiv 1 \pmod{{\rm Ord}(a)}, \qquad (6.248)$$
so that
$$(a^e)^{\tilde d} \equiv a\cdot\big(a^{{\rm Ord}(a)}\big)^{\rm integer} \pmod N \equiv a \pmod N, \qquad (6.249)$$
and Eve can decipher the message. If our only concern is to defeat RSA, we run the Shor algorithm to find $r = {\rm Ord}(a^e)$, and we needn't worry about whether we can use r to extract a factor of N or not.
How important are such prospective cryptographic applications of quantum computing? When fast quantum computers are readily available, concerned parties can stop using RSA, or can use longer keys to stay a step ahead of contemporary technology. However, people with secrets sometimes want their messages to remain confidential for a while (30 years?). They may not be satisfied by longer keys if they are not confident about the pace of future technological advances.
And if they shun RSA, what will they use instead? Not so many suitable
one-way functions are known, and others besides RSA are (or may be) vul-
nerable to a quantum attack. So there really is a lot at stake. If fast large
scale quantum computers become available, the cryptographic implications
may be far reaching.
But while quantum theory taketh away, quantum theory also giveth; quantum computers may compromise public key schemes, but also offer an alternative: secure quantum key distribution, as discussed in Chapter 4.
$$\text{Prob}(1) = \left|\tfrac{1}{2}(1 - \lambda)\right|^2 = \sin^2(\pi\phi), \qquad (6.255)$$
where $\lambda = e^{2\pi i\phi}$.
As we have discussed previously (for example in connection with Deutsch's problem), this procedure distinguishes with certainty between the eigenvalues $\lambda = 1$ ($\phi = 0$) and $\lambda = -1$ ($\phi = 1/2$). But other possible values of $\lambda$ can also be distinguished, albeit with less statistical confidence. For example, suppose the state on which U acts is a superposition of U eigenstates
$$\alpha_1|\lambda_1\rangle + \alpha_2|\lambda_2\rangle. \qquad (6.256)$$
And suppose we execute the above circuit n times, with n distinct control bits. We thus prepare the state
$$\alpha_1|\lambda_1\rangle\left(\frac{1+\lambda_1}{2}|0\rangle + \frac{1-\lambda_1}{2}|1\rangle\right)^{\otimes n} + \alpha_2|\lambda_2\rangle\left(\frac{1+\lambda_2}{2}|0\rangle + \frac{1-\lambda_2}{2}|1\rangle\right)^{\otimes n}. \qquad (6.257)$$
If $\lambda_1 \ne \lambda_2$, the overlap between the two states of the n control bits is exponentially small for large n; by measuring the control bits, we can perform the orthogonal projection onto the $\{|\lambda_1\rangle, |\lambda_2\rangle\}$ basis, at least to an excellent approximation.
If we use enough control bits, we have a large enough sample to measure Prob(0) $= \frac{1}{2}(1 + \cos 2\pi\phi)$ with reasonable statistical confidence. By executing a controlled-$(iU)$, we can also measure $\frac{1}{2}(1 + \sin 2\pi\phi)$, which suffices to determine $\phi$ modulo an integer.
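A quick Monte Carlo sketch (mine, purely illustrative) of this statistical strategy: sample the two probabilities quoted above and reconstruct $\phi$ from the estimated cosine and sine.

import numpy as np

rng = np.random.default_rng(1)
phi_true, shots = 0.3217, 100000

p_cos = (1 + np.cos(2 * np.pi * phi_true)) / 2      # controlled-U statistics
p_sin = (1 + np.sin(2 * np.pi * phi_true)) / 2      # controlled-(iU) statistics
cos_est = 2 * rng.binomial(shots, p_cos) / shots - 1
sin_est = 2 * rng.binomial(shots, p_sin) / shots - 1

phi_est = (np.arctan2(sin_est, cos_est) / (2 * np.pi)) % 1.0
print(phi_true, round(phi_est, 4))    # accuracy ~ 1/sqrt(shots)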
However, in the factoring algorithm, we need to measure the phase of $e^{2\pi ik/r}$ to exponential accuracy, which seems to require an exponential number of trials. Suppose, though, that we can efficiently compute high powers of U (as is the case for $U_a$), such as
$$U^{2^j}. \qquad (6.258)$$
By applying the above procedure to measurement of $U^{2^j}$, we determine
$$\exp(2\pi i\,2^j\phi), \qquad (6.259)$$
where $e^{2\pi i\phi}$ is an eigenvalue of U. Hence, measuring $U^{2^j}$ to one bit of accuracy is equivalent to measuring the jth bit of the eigenvalue of U.
We can use this phase estimation procedure for order finding, and hence factorization. We invert eq. (6.253) to obtain
$$|x_0\rangle = \frac{1}{\sqrt r}\sum_{k=0}^{r-1}|\lambda_k\rangle; \qquad (6.260)$$
each computational basis state (for $x_0 \ne 0$) is an equally weighted superposition of r eigenstates of $U_a$.
Measuring the eigenvalue, we obtain $\lambda_k = e^{2\pi ik/r}$, with k selected from $\{0, 1, \ldots, r-1\}$ equiprobably. If $r < 2^n$, we measure to 2n bits of precision to determine k/r. In principle, we can carry out this procedure in a computer that stores fewer qubits than we would need to evaluate the QFT, because we can attack just one bit of k/r at a time.
But it is instructive to imagine that we incorporate the QFT into this
phase estimation procedure. Suppose the circuit
{ Figure: control qubits, each prepared by a Hadamard, acting as controls of $U$, $U^2$, $U^4$ applied to the target state $|\lambda\rangle$, leaving each control in $\frac{1}{\sqrt 2}(|0\rangle + \lambda^{2^j}|1\rangle)$ {
acts on the eigenstate $|\lambda\rangle$ of the unitary transformation U. The conditional U prepares $\frac{1}{\sqrt 2}(|0\rangle + \lambda|1\rangle)$, the conditional $U^2$ prepares $\frac{1}{\sqrt 2}(|0\rangle + \lambda^2|1\rangle)$, the conditional $U^4$ prepares $\frac{1}{\sqrt 2}(|0\rangle + \lambda^4|1\rangle)$, and so on. We could perform a Hadamard and measure each of these qubits to sample the probability distribution governed by the jth bit of $\phi$, where $\lambda = e^{2\pi i\phi}$. But a more efficient method is to note that the state prepared by the circuit is
$$\frac{1}{2^{m/2}}\sum_{y=0}^{2^m-1}e^{2\pi i\phi y}|y\rangle. \qquad (6.261)$$
A better way to learn the value of $\phi$ is to perform the QFT$^{(m)}$, not the Hadamard $H^{(m)}$, before we measure.
Considering the case m = 3 for clarity, the circuit that prepares and then Fourier analyzes the state
$$\frac{1}{\sqrt 8}\sum_{y=0}^{7}e^{2\pi i\phi y}|y\rangle \qquad (6.262)$$
is
{ Figure: the m = 3 circuit: Hadamards and controlled-$U$, $U^2$, $U^4$, followed by the Fourier analysis gates (Hadamards and conditional rotations) and measurement of $\tilde y_0, \tilde y_1, \tilde y_2$ {
This circuit very nearly carries out our strategy for phase estimation outlined above, but with a significant modification. Before we execute the final Hadamard transformation and measurement of $\tilde y_1$ and $\tilde y_2$, some conditional phase rotations are performed. It is those phase rotations that distinguish the QFT$^{(3)}$ from the Hadamard transform $H^{(3)}$, and they strongly enhance the reliability with which we can extract the value of $\phi$.
We can understand better what the conditional rotations are doing if we suppose that $\phi = k/8$, for $k \in \{0, 1, 2, \ldots, 7\}$; in that case, we know that the Fourier transform will generate the output $\tilde y = k$ with probability one. We may express k as the binary expansion
$$k = k_2k_1k_0 \equiv k_2\cdot 4 + k_1\cdot 2 + k_0. \qquad (6.263)$$
In fact, the circuit for the least significant bit $\tilde y_0$ of the Fourier transform is precisely Kitaev's measurement circuit applied to the unitary $U^4$, whose eigenvalue is
$$(e^{2\pi i\phi})^4 = e^{i\pi k} = e^{i\pi k_0} = \pm 1. \qquad (6.264)$$
The measurement circuit distinguishes eigenvalues $\pm 1$ perfectly, so that $\tilde y_0 = k_0$.
The circuit for the next bit $\tilde y_1$ is almost the measurement circuit for $U^2$, with eigenvalue
$$(e^{2\pi i\phi})^2 = e^{i\pi k/2} = e^{i\pi(k_1.k_0)}, \qquad (6.265)$$
except that the conditional phase rotation has been inserted, which multiplies the phase by $\exp[-i\pi(.k_0)]$, resulting in $e^{i\pi k_1}$. Again, applying a Hadamard followed by measurement, we obtain the outcome $\tilde y_1 = k_1$ with certainty.
Similarly, the circuit for $\tilde y_2$ measures the eigenvalue
$$e^{2\pi i\phi} = e^{i\pi k/4} = e^{i\pi(k_2.k_1k_0)}, \qquad (6.266)$$
except that the conditional rotation removes $e^{i\pi(.k_1k_0)}$, so that the outcome is $\tilde y_2 = k_2$ with certainty.
Thus, the QFT implements the phase estimation routine with maximal cleverness. We measure the less significant bits of $\phi$ first, and we exploit the information gained in the measurements to improve the reliability of our estimate of the more significant bits. Keeping this interpretation in mind, you will find it easy to remember the circuit for the QFT$^{(n)}$!
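This "measure the low bits first, then correct" reading can be simulated one control qubit at a time. The sketch below (my own illustration of the idea for $\phi = k/8$; the sign convention of the correction is an assumption of this code) reproduces $\tilde y = k$ with certainty.

import numpy as np

def semiclassical_phase_estimation(phi, m=3, rng=np.random.default_rng(2)):
    bits = []                                    # k_0, k_1, ... (LSB first)
    for s in range(m):
        # controlled-U^(2^(m-1-s)) kicks the phase 2*pi*phi*2^(m-1-s) onto |1>
        alpha = 2 * np.pi * phi * 2 ** (m - 1 - s)
        # remove the binary-fraction contribution of the bits already measured
        correction = sum(b / 2 ** (j + 1) for j, b in enumerate(reversed(bits)))
        alpha -= np.pi * correction
        prob1 = np.sin(alpha / 2) ** 2           # Hadamard, then measure
        bits.append(int(rng.random() < prob1))
    return sum(b << j for j, b in enumerate(bits))

for k in range(8):
    print(k, semiclassical_phase_estimation(k / 8))   # each line prints k, k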
c) Suppose that $\rho = |\psi\rangle\langle\psi|$ and $\tilde\rho = |\tilde\psi\rangle\langle\tilde\psi|$ are pure states. Use (b) to show that
$$\sum_a |P_a - \tilde P_a| \le 2\,\| |\psi\rangle - |\tilde\psi\rangle \|. \qquad (6.275)$$
{ Figure {
and finally measure the ancilla qubit. If qubits 1 and 2 are in a state with $Z_1Z_2 = -1$ (either $|0\rangle_1|1\rangle_2$ or $|1\rangle_1|0\rangle_2$), then the ancilla qubit will flip once and the measurement outcome will be $|1\rangle$. But if qubits 1 and 2 are in a state with $Z_1Z_2 = 1$ (either $|0\rangle_1|0\rangle_2$ or $|1\rangle_1|1\rangle_2$), then the ancilla qubit will flip either twice or not at all, and the measurement outcome will be $|0\rangle$.
Similarly, the two-qubit operators
$$Z_4Z_5,\quad Z_7Z_8,\quad Z_5Z_6,\quad Z_8Z_9 \qquad (7.3)$$
can be measured to diagnose bit flip errors in the other two clusters of three qubits.
A three-qubit code would suffice to protect against a single bit flip. The reason the 3-qubit clusters are repeated three times is to protect against
phase errors as well. Suppose now that a phase error
$$|\psi\rangle \to Z|\psi\rangle \qquad (7.4)$$
occurs acting on one of the nine qubits. We can diagnose in which cluster the phase error occurred by measuring the two six-qubit observables
$$X_1X_2X_3X_4X_5X_6,\qquad X_4X_5X_6X_7X_8X_9. \qquad (7.5)$$
The logical basis states $|\bar 0\rangle$ and $|\bar 1\rangle$ are both eigenstates with eigenvalue one of these observables. A phase error acting on any one of the qubits in a particular cluster will change the value of XXX in that cluster relative to the other two; the location of the change can be identified by measuring the observables in eq. (7.5). Once the affected cluster is identified, we can reverse the error by applying Z to one of the qubits in that cluster.
How do we measure the six-qubit observable $X_1X_2X_3X_4X_5X_6$? Notice that if its control qubit is initially in the state $\frac{1}{\sqrt 2}(|0\rangle + |1\rangle)$, and its target is an eigenstate of X (that is, of NOT), then a controlled-NOT acts according to
$$\text{CNOT}: \frac{1}{\sqrt 2}(|0\rangle + |1\rangle)\otimes|x\rangle \to \frac{1}{\sqrt 2}(|0\rangle + (-1)^x|1\rangle)\otimes|x\rangle; \qquad (7.6)$$
it acts trivially if the target is the X = 1 (x = 0) state, and it flips the control if the target is the X = −1 (x = 1) state. To measure a product of X's, then, we execute the circuit
{ Figure {
and then measure the ancilla in the $\frac{1}{\sqrt 2}(|0\rangle \pm |1\rangle)$ basis.
We see that a single error acting on any one of the nine qubits in the block will cause no irrevocable damage. But if two bit flips occur in a single cluster of three qubits, then the encoded information will be damaged. For example, if the first two qubits in a cluster both flip, we will misdiagnose the error and attempt to recover by flipping the third. In all, the errors, together with our
mistaken recovery attempt, apply the operator $X_1X_2X_3$ to the code block. Since $|\bar 0\rangle$ and $|\bar 1\rangle$ are eigenstates of $X_1X_2X_3$ with distinct eigenvalues, the effect of two bit flips in a single cluster is a phase error in the encoded qubit:
$$X_1X_2X_3:\ a|\bar 0\rangle + b|\bar 1\rangle \to a|\bar 0\rangle - b|\bar 1\rangle. \qquad (7.7)$$
The encoded information will also be damaged if phase errors occur in two different clusters. Then we will introduce a phase error into the third cluster in our misguided attempt at recovery, so that altogether $Z_1Z_4Z_7$ will have been applied, which flips the encoded qubit:
$$Z_1Z_4Z_7:\ a|\bar 0\rangle + b|\bar 1\rangle \to a|\bar 1\rangle + b|\bar 0\rangle. \qquad (7.8)$$
If the likelihood of an error is small enough, and if the errors acting on distinct qubits are not strongly correlated, then using the nine-qubit code will allow us to preserve our unknown qubit more reliably than if we had not bothered to encode it at all. Suppose, for example, that the environment acts on each of the nine qubits, independently subjecting it to the depolarizing channel described in Chapter 3, with error probability p. Then a bit flip occurs with probability $\frac{2}{3}p$, and a phase flip with probability $\frac{2}{3}p$. (The probability that both occur is $\frac{1}{3}p$.) We can see that the probability of a phase error affecting the logical qubit is bounded above by $4p^2$, and the probability of a bit flip error is bounded above by $12p^2$. The total error probability is no worse than $16p^2$; this is an improvement over the error probability p for an unprotected qubit, provided that $p < 1/16$.
Of course, in this analysis we have implicitly assumed that encoding, decoding, error syndrome measurement, and recovery are all performed flawlessly. In Chapter 8 we will examine the more realistic case in which errors occur during these operations.
where now the states $|\mu\rangle_E$ are elements of an orthonormal basis for the environment, and the matrices $M_\mu$ are linear combinations of the Pauli operators
$E_a$ contained in $\mathcal{E}$, satisfying the operator-sum normalization condition
$$\sum_\mu M_\mu^\dagger M_\mu = \mathbf{1}. \qquad (7.21)$$
The error can be reversed by a recovery superoperator if there exist operators $R_\nu$ such that
$$\sum_\nu R_\nu^\dagger R_\nu = \mathbf{1}, \qquad (7.22)$$
and
$$\sum R_\nu M_\mu\,|i\rangle\otimes|\mu\rangle_E\otimes|\,\cdot\,\rangle_A\ \cdots$$
{ Figure {
(7.51)
The fidelity of the recovered state therefore satisfies
$$F \ge \langle\psi|\boldsymbol{\rho}'_{\rm GOOD}|\psi\rangle = \| |s\rangle_{EA} \|^2 = \| |{\rm GOOD}'\rangle \|^2. \qquad (7.52)$$
Furthermore, since the recovery operation is unitary, we have $\| |{\rm GOOD}'\rangle \| = \| |{\rm GOOD}\rangle \|$, and hence
$$F \ge \| |{\rm GOOD}\rangle \|^2 = \Big\| \sum_{E_a\in\mathcal{E}} E_a|\psi\rangle\otimes|e_a\rangle_E \Big\|^2. \qquad (7.53)$$
In general, though, $|{\rm BAD}\rangle$ need not be orthogonal to $|{\rm GOOD}\rangle$, so that $|{\rm BAD}'\rangle$ need not be orthogonal to $|{\rm GOOD}'\rangle$. Then $|{\rm BAD}'\rangle$ might have a component along $|{\rm GOOD}'\rangle$ that interferes destructively with $|{\rm GOOD}'\rangle$ and so reduces the fidelity. We can still obtain a lower bound on the fidelity in this more general case by resolving $|{\rm BAD}'\rangle$ into a component along $|{\rm GOOD}'\rangle$ and an orthogonal component, as
$$|{\rm BAD}'\rangle = |{\rm BAD}'_\parallel\rangle + |{\rm BAD}'_\perp\rangle. \qquad (7.54)$$
Then reasoning just as above we obtain
$$F \ge \| |{\rm GOOD}'\rangle + |{\rm BAD}'_\parallel\rangle \|^2. \qquad (7.55)$$
Of course, since both the error operation and the recovery operation are unitary acting on data, environment, and ancilla, the complete state $|{\rm GOOD}'\rangle + |{\rm BAD}'\rangle$ is normalized, or
$$\| |{\rm GOOD}'\rangle + |{\rm BAD}'_\parallel\rangle \|^2 + \| |{\rm BAD}'_\perp\rangle \|^2 = 1, \qquad (7.56)$$
and eq. (7.55) becomes
$$F \ge 1 - \| |{\rm BAD}'_\perp\rangle \|^2. \qquad (7.57)$$
Finally, the norm of $|{\rm BAD}'_\perp\rangle$ cannot exceed the norm of $|{\rm BAD}'\rangle$, and we conclude that
$$1 - F \le \| |{\rm BAD}'\rangle \|^2 = \| |{\rm BAD}\rangle \|^2 = \Big\| \sum_{E_b\notin\mathcal{E}} E_b|\psi\rangle\otimes|e_b\rangle_E \Big\|^2. \qquad (7.58)$$
This is our general bound on the "failure probability" of the recovery operation. The result eq. (7.53) then follows in the special case where $|{\rm GOOD}\rangle$ and $|{\rm BAD}\rangle$ are orthogonal states.
where each $\alpha_i \in \{0,1\}$, and addition is modulo 2. We may say that the length-n vector $v(\alpha_1\ldots\alpha_k)$ encodes the k-bit message $\alpha = (\alpha_1, \ldots, \alpha_k)$.
The k basis vectors $v_1, \ldots, v_k$ may be assembled into a $k\times n$ matrix
$$G = \begin{pmatrix} v_1 \\ \vdots \\ v_k \end{pmatrix}, \qquad (7.67)$$
called the generator matrix of the code. Then in matrix notation, eq. (7.66) can be rewritten as
$$v(\alpha) = \alpha G; \qquad (7.68)$$
the matrix G, acting to the left, encodes the message $\alpha$.
An alternative way to characterize the k-dimensional code subspace of $F_2^n$ is to specify $n - k$ linear constraints. There is an $(n-k)\times n$ matrix H such that
$$Hv = 0 \qquad (7.69)$$
for all those and only those vectors v in the code C. This matrix H is called the parity check matrix of the code C. The rows of H are $n - k$ linearly independent vectors, and the code space is the space of vectors that are orthogonal to all of these vectors. Orthogonality is defined with respect to the mod 2 bitwise inner product; two length-n binary strings are orthogonal if they "collide" (both take the value 1) at an even number of locations. Note that
$$HG^T = 0, \qquad (7.70)$$
where $G^T$ is the transpose of G; the rows of G are orthogonal to the rows of H.
For a classical bit, the only kind of error is a bit flip. An error occurring in an n-bit string can be characterized by an n-component vector e, where the 1's in e mark the locations where errors occur. When afflicted by the error e, the string v becomes
$$v \to v + e. \qquad (7.71)$$
Errors can be detected by applying the parity check matrix. If v is a codeword, then
$$H(v + e) = Hv + He = He. \qquad (7.72)$$
He is called the syndrome of the error e. Denote by $\mathcal{E}$ the set of errors $\{e_i\}$ that we wish to be able to correct. Error recovery will be possible if and only if all errors $e_i$ have distinct syndromes. If this is the case, we can unambiguously diagnose the error given the syndrome He, and we may then recover by flipping the bits specified by e, as in
$$v + e \to (v + e) + e = v. \qquad (7.73)$$
On the other hand, if $He_1 = He_2$ for $e_1 \ne e_2$, then we may misinterpret an $e_1$ error as an $e_2$ error; our attempt at recovery then has the effect
$$v + e_1 \to v + (e_1 + e_2) \ne v. \qquad (7.74)$$
The recovered message $v + e_1 + e_2$ lies in the code, but it differs from the intended message v; the encoded information has been damaged.
The distance d of a code C is the minimum weight of any nonzero vector $v \in C$, where the weight is the number of 1's in the string v. A linear code with distance $d = 2t + 1$ can correct t errors: the code assigns a distinct syndrome to each $e \in \mathcal{E}$, where $\mathcal{E}$ contains all vectors of weight t or less. This is so because, if $He_1 = He_2$, then
$$0 = He_1 + He_2 = H(e_1 + e_2), \qquad (7.75)$$
and therefore $e_1 + e_2 \in C$. But if $e_1$ and $e_2$ are unequal and each has weight no larger than t, then the weight of $e_1 + e_2$ is greater than zero and no larger than 2t. Since $d = 2t + 1$, there is no such vector in C. Hence $He_1$ and $He_2$ cannot be equal.
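Syndrome decoding is easy to see in code. The sketch below (mine; the particular parity check matrix is one common choice for the classical [7,4,3] Hamming code) computes the syndrome He and corrects a single bit flip, since the syndrome, read as a binary number, points at the flipped position.

import numpy as np

H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])

def syndrome(word):
    return H @ word % 2

def correct_single_error(word):
    s = syndrome(word)
    position = s[0] * 4 + s[1] * 2 + s[2]      # syndrome as a binary number
    if position != 0:
        word = word.copy()
        word[position - 1] ^= 1                # flip the indicated bit
    return word

codeword = np.array([1, 0, 1, 0, 1, 0, 1])     # satisfies H v = 0
corrupted = codeword.copy(); corrupted[4] ^= 1
print(syndrome(codeword), syndrome(corrupted)) # [0 0 0], then the error syndrome
print(np.array_equal(correct_single_error(corrupted), codeword))   # -> True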
A useful concept in classical coding theory is that of the dual code. We have seen that the $k\times n$ generator matrix G and the $(n-k)\times n$ parity check matrix H of a code C are related by $HG^T = 0$. Taking the transpose, it follows that $GH^T = 0$. Thus we may regard $H^T$ as the generator and G as the parity check of an $(n-k)$-dimensional code, which is denoted $C^\perp$ and called the dual of C. In other words, $C^\perp$ is the orthogonal complement of C in $F_2^n$. A vector is self-orthogonal if it has even weight, so it is possible for C and $C^\perp$ to intersect. A code contains its dual if all of its codewords have even weight and are mutually orthogonal. If $n = 2k$ it is possible that $C = C^\perp$, in which case C is said to be self-dual.
An identity relating the code C and its dual $C^\perp$ will prove useful in the
following section:
$$\sum_{v\in C}(-1)^{v\cdot u} = \begin{cases} 2^k & u\in C^\perp \\ 0 & u\notin C^\perp. \end{cases} \qquad (7.76)$$
The nontrivial content of the identity is the statement that the sum vanishes for $u \notin C^\perp$. This readily follows from the familiar identity
$$\sum_{v\in\{0,1\}^k}(-1)^{v\cdot w} = 0,\qquad w \ne 0. \qquad (7.77)$$
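A small brute-force check of eq. (7.76) for the [7,4] Hamming code (illustrative code of mine; the generator and parity check matrices are one standard choice, with the dual code taken to be the row space of H):

import numpy as np
from itertools import product

G = np.array([[1, 0, 0, 0, 0, 1, 1],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 1, 1, 0],
              [0, 0, 0, 1, 1, 1, 1]])
H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])

codewords = [np.array(a) @ G % 2 for a in product([0, 1], repeat=4)]
dual = {tuple(np.array(b) @ H % 2) for b in product([0, 1], repeat=3)}

def char_sum(u):
    return sum((-1) ** int(v @ u % 2) for v in codewords)

print(all(char_sum(np.array(u)) == (16 if u in dual else 0)
          for u in product([0, 1], repeat=7)))      # -> True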
{ Figure {
Separate syndromes are measured to diagnose the bit flip errors and the phase errors. An important special case of the CSS construction arises when a code C contains its dual $C^\perp$. Then we may choose $C_1 = C$ and $C_2 = C^\perp \subseteq C$; the C parity check is computed in both the F basis and the P basis to determine the two syndromes.
$$|\bar 1\rangle_F = \frac{1}{\sqrt 8}\sum_{{\rm odd}\ v\,\in\,{\rm Hamming}}|v\rangle. \qquad (7.93)$$
Since both $|\bar 0\rangle$ and $|\bar 1\rangle$ are superpositions of Hamming codewords, bit flips can be diagnosed in this basis by performing an H parity check. In the Hadamard-rotated basis, these codewords become
$$H^{(7)}: |\bar 0\rangle_F \to |\bar 0\rangle_P \equiv \frac{1}{4}\sum_{v\in{\rm Hamming}}|v\rangle = \frac{1}{\sqrt 2}(|\bar 0\rangle_F + |\bar 1\rangle_F),$$
$$\phantom{H^{(7)}:}\ |\bar 1\rangle_F \to |\bar 1\rangle_P \equiv \frac{1}{4}\sum_{v\in{\rm Hamming}}(-1)^{{\rm wt}(v)}|v\rangle = \frac{1}{\sqrt 2}(|\bar 0\rangle_F - |\bar 1\rangle_F). \qquad (7.94)$$
In this basis as well, the states are superpositions of Hamming codewords, so that bit flips in the P basis (phase flips in the original basis) can again be diagnosed with an H parity check. (We note in passing that for this code, performing the bitwise Hadamard transformation also implements a Hadamard rotation on the encoded data, a point that will be relevant to our discussion of fault-tolerant quantum computation in the next chapter.)
Steane's quantum code can correct a single bit flip and a single phase flip on any one of the seven qubits in the block. But recovery will fail if two different qubits both undergo either bit flips or phase flips. If $e_1$ and $e_2$ are two distinct weight-one strings, then $He_1 + He_2$ is a sum of two distinct columns of H, and hence a third column of H (all seven of the nontrivial strings of length 3 appear as columns of H). Therefore, there is another weight-one string $e_3$ such that $He_1 + He_2 = He_3$, or
$$H(e_1 + e_2 + e_3) = 0; \qquad (7.95)$$
thus $e_1 + e_2 + e_3$ is a weight-3 word in the Hamming code. We will interpret the syndrome $He_3$ as an indication that the error $v \to v + e_3$ has arisen, and we will attempt to recover by applying the operation $v \to v + e_3$. Altogether
then, the effect of the two bit flip errors and our faulty attempt at recovery will be to add $e_1 + e_2 + e_3$ (an odd-weight Hamming codeword) to the data, which will induce a flip of the encoded qubit:
$$|\bar 0\rangle_F \leftrightarrow |\bar 1\rangle_F. \qquad (7.96)$$
Similarly, two phase flips in the F basis are two bit flips in the P basis, which (after the botched recovery) induce on the encoded qubit
$$|\bar 0\rangle_P \leftrightarrow |\bar 1\rangle_P, \qquad (7.97)$$
or equivalently
$$|\bar 0\rangle_F \to |\bar 0\rangle_F,\qquad |\bar 1\rangle_F \to -|\bar 1\rangle_F, \qquad (7.98)$$
a phase flip of the encoded qubit in the F basis. If there is one bit flip and one phase flip (either on the same qubit or on different qubits), then recovery will be successful.
{ Figure {
If we append $|00\rangle$ to each of those two sub-blocks, then the original block has spawned two offspring, each with two located errors. If we were able to correct the two located errors in each of the offspring, we would obtain two identical copies of the parent block; we would have cloned an unknown quantum state, which is impossible. Therefore, no [[4, 1, 3]] quantum code can exist. We conclude that n = 5 is the minimal block size of a quantum code that corrects one error, whether the code is degenerate or not.
The same reasoning shows that an $[[n, k\ge 1, d]]$ code can exist only for
$$n > 2(d - 1). \qquad (7.102)$$
$$X_1X_2X_3X_4X_5X_6,\qquad X_4X_5X_6X_7X_8X_9. \qquad (7.136)$$
In the notation of eq. (7.134) these become
$$\tilde H = \left(\begin{array}{c|c}
\begin{matrix}110\,000\,000\\ 011\,000\,000\\ 000\,110\,000\\ 000\,011\,000\\ 000\,000\,110\\ 000\,000\,011\end{matrix} & 0\\[1ex]
0 & \begin{matrix}111\,111\,000\\ 000\,111\,111\end{matrix}
\end{array}\right).$$
(b) The seven-qubit code. This [[7, 1, 3]] code has six stabilizer generators, which can be expressed as
$$\tilde H = \begin{pmatrix} H_{\rm Ham} & 0 \\ 0 & H_{\rm Ham} \end{pmatrix}, \qquad (7.137)$$
where $H_{\rm Ham}$ is the $3\times 7$ parity-check matrix of the classical [7,4,3] Hamming code. The three check operators
$$M_1 = Z_1Z_3Z_5Z_7,\quad M_2 = Z_2Z_3Z_6Z_7,\quad M_3 = Z_4Z_5Z_6Z_7 \qquad (7.138)$$
detect the bit flips, and the three check operators
$$M_4 = X_1X_3X_5X_7,\quad M_5 = X_2X_3X_6X_7,\quad M_6 = X_4X_5X_6X_7 \qquad (7.139)$$
detect the phase errors. The space with $M_1 = M_2 = M_3 = 1$ is spanned by the codewords that satisfy the Hamming parity check. Recalling that a Hadamard change of basis interchanges Z and X, we see that the space with $M_4 = M_5 = M_6 = 1$ is spanned by codewords that satisfy the Hamming parity check in the Hadamard-rotated basis. Indeed, we constructed the seven-qubit code by demanding that the Hamming parity check be satisfied in both bases. The generators commute because the Hamming code contains its dual code; i.e., each row of $H_{\rm Ham}$ satisfies the Hamming parity check.
(c) CSS codes. Recall that whenever an [n, k, d] classical code C contains its dual code $C^\perp$, we can perform the CSS construction to obtain an $[[n, 2k - n, d]]$ quantum code. The stabilizer of this code can be written as
$$\tilde H = \begin{pmatrix} H & 0 \\ 0 & H \end{pmatrix}, \qquad (7.140)$$
where H is the $(n-k)\times n$ parity check matrix of C. As for the seven-qubit code, the stabilizer generators commute because C contains $C^\perp$, and the code subspace is spanned by states that satisfy the H parity check in both the F-basis and the P-basis. Equivalently, codewords obey the H parity check and are invariant under
$$|v\rangle \to |v + w\rangle, \qquad (7.141)$$
where $w \in C^\perp$.
(d) More general CSS codes. Consider, more generally, a stabilizer whose generators can each be chosen to be either a product of Z's ($\alpha|0$) or a product of X's ($0|\beta$). Then the generators have the form
$$\tilde H = \begin{pmatrix} H_Z & 0 \\ 0 & H_X \end{pmatrix}. \qquad (7.142)$$
Now, what condition must $H_X$ and $H_Z$ satisfy if the Z-generators and X-generators are to commute? Since the Z's must collide with the X's an even number of times, we have
$$H_XH_Z^T = H_ZH_X^T = 0. \qquad (7.143)$$
But this is just the requirement that the dual $C_X^\perp$ of the code whose parity check is $H_X$ be contained in the code $C_Z$ whose parity check is $H_Z$. In other words, this QECC fits into the CSS framework, with
$$C_2 = C_X^\perp \subseteq C_1 = C_Z. \qquad (7.144)$$
So we may characterize CSS codes as those and only those for which the stabilizer has generators of the form eq. (7.142).
However, there is a caveat. The code defined by eq. (7.142) will be nondegenerate if errors are restricted to weight less than $d = \min(d_Z, d_X)$ (where $d_Z$ is the distance of $C_Z$, and $d_X$ the distance of $C_X$). But the true distance of the QECC could exceed d. For example, the 9-qubit code is in this generalized sense a CSS code. But in that case the classical code $C_X$ has distance 1, reflecting that, e.g., $Z_1Z_2$ is contained in the stabilizer. Nevertheless, the distance of the CSS code is d = 3, since no weight-2 Pauli operator lies in $S^\perp\setminus S$.
(a) The 9-qubit code. As we have discussed previously, the logical operators can be chosen to be
$$\bar Z = X_1X_2X_3,\qquad \bar X = Z_1Z_4Z_7. \qquad (7.146)$$
These anticommute with one another (an X and a Z collide at position 1), commute with the stabilizer generators, and are independent of the generators (no element of the stabilizer contains three X's or three Z's).
{ Figure {
The Hadamard rotations on the first and fourth qubits rotate $M_1$ to the tensor product of Z's, ZZZZI, and the CNOTs then imprint the value of this operator on the ancilla. The final Hadamard rotations return the encoded block to the standard code subspace. Circuits for measuring $M_{2,3,4}$ are obtained from the above by cyclically permuting the five qubits in the code block.
What about encoding? We want to construct a unitary transformation
$$U_{\rm encode}: |0000\rangle\otimes(a|0\rangle + b|1\rangle) \to a|\bar 0\rangle + b|\bar 1\rangle. \qquad (7.158)$$
We have already seen that $|00000\rangle$ is a $\bar Z = 1$ eigenstate, and that $|00001\rangle$ is a $\bar Z = -1$ eigenstate. Therefore (up to normalization)
$$a|\bar 0\rangle + b|\bar 1\rangle = \left(\sum_{M\in S}M\right)|0000\rangle\otimes(a|0\rangle + b|1\rangle). \qquad (7.159)$$
So we need to figure out how to construct a circuit that applies $\sum_M M$ to an initial state.
Since the generators are independent, each element of the stabilizer can be expressed as a product of generators in a unique way, and we may therefore rewrite the sum as
$$\sum_{M\in S}M = (\mathbf{1} + M_4)(\mathbf{1} + M_3)(\mathbf{1} + M_2)(\mathbf{1} + M_1). \qquad (7.160)$$
Now to proceed further it is convenient to express the stabilizer in an alternative form. Note that we have the freedom to replace the generator $M_i$ by $M_iM_j$ without changing the stabilizer. This replacement is equivalent to adding the jth row to the ith row in the matrix $\tilde H$. With such row operations, we can perform a Gaussian elimination on the $4\times 5$ matrix $H_X$, and so obtain the new presentation for the stabilizer
$$\tilde H' = \left(\begin{array}{c|c}
11011 & 10001\\
00110 & 01001\\
11000 & 00101\\
10111 & 00011
\end{array}\right), \qquad (7.161)$$
or
$$M_1 = Y\,Z\,I\,Z\,Y,\quad M_2 = I\,X\,Z\,Z\,X,\quad M_3 = Z\,Z\,X\,I\,X,\quad M_4 = Z\,I\,Z\,Y\,Y. \qquad (7.162)$$
In this form $M_i$ applies an X (flip) only to qubits i and 5 in the block. Adopting this form for the stabilizer, we can apply $\frac{1}{\sqrt 2}(\mathbf{1} + M_1)$ to a state $|0, z_2, z_3, z_4, z_5\rangle$ by executing the circuit
{ Figure {
The Hadamard prepares $\frac{1}{\sqrt 2}(|0\rangle + |1\rangle)$. If the first qubit is $|0\rangle$, the other operations don't do anything, so $\mathbf{1}$ is applied. But if the first qubit is $|1\rangle$, then X has been applied to this qubit, and the other gates in the circuit apply
ZZIZY, conditioned on the first qubit being $|1\rangle$. Hence, $YZIZY = M_1$ has been applied. Similar circuits can be constructed that apply $\frac{1}{\sqrt 2}(\mathbf{1} + M_2)$ to $|z_1, 0, z_3, z_4, z_5\rangle$, and so forth. Apart from the Hadamard gates, each of these circuits applies only Z's and conditional Z's to qubits 1 through 4; these qubits never flip. (It was to ensure this that we performed the Gaussian elimination on $H_X$.) Therefore, we can construct our encoding circuit as
{ Figure {
{ Figure {
There are $2^{2^m}$ such functions, forming what we may regard as a binary vector space of dimension $2^m$. It will be useful to have a basis for this space. Recall (§6.1) that any Boolean function has a disjunctive normal form. Since the NOT of a bit x is $1 - x$, and the OR of two bits x and y can be expressed as
$$x \vee y \equiv x + y - xy, \qquad (7.216)$$
$$\$^{(n)} = \$\otimes\cdots\otimes\$, \qquad (7.243)$$
and yet can still be decoded with high fidelity.
The rate of a code is defined as
$$R = \frac{\log(\dim\mathcal{H}_{\rm code})}{\log(\dim\mathcal{H}^{(n)})}; \qquad (7.244)$$
this is the number of encoded qubits carried per qubit of the block. The quantum channel capacity $Q(\$)$ of the superoperator \$ is the
maximum asymptotic rate at which quantum information can be sent over the channel with arbitrarily good fidelity. That is, $Q(\$)$ is the largest number such that for any $R < Q(\$)$ and any $\varepsilon > 0$, there is a code $\mathcal{H}_{\rm code}^{(n)}$ with rate at least R, such that for any $|\psi\rangle\in\mathcal{H}_{\rm code}^{(n)}$, the state $\rho$ recovered after $|\psi\rangle$ passes through $\$^{(n)}$ has fidelity
$$F = \langle\psi|\rho|\psi\rangle > 1 - \varepsilon. \qquad (7.245)$$
Thus, $Q(\$)$ is a quantum version of the capacity defined by Shannon for a classical noisy channel. As we have already seen in Chapter 5, this $Q(\$)$ is not the only sort of capacity that can be associated with a quantum channel. It is also of considerable interest to ask about $C(\$)$, the maximum rate at which classical information can be transmitted through a quantum channel with arbitrarily small probability of error. A formal answer to this question was formulated in §5.4, but only for a restricted class of possible encoding schemes; the general answer is still unknown. The quantum channel capacity $Q(\$)$ is even less well understood than the classical capacity $C(\$)$ of a quantum channel. Note that $Q(\$)$ is not the same thing as the maximum asymptotic rate k/n that can be achieved by "good" [[n, k, d]] QECC's with positive d/n. In the case of the quantum channel capacity we need not insist that the code correct any possible distribution of pn errors, as long as the errors that cannot be corrected become highly atypical for n large.
Here we will mostly limit the discussion to two interesting examples of
quantum channels acting on a single qubit | the quantum erasure channel
(for which Q is exactly known), and the depolarizing channel (for which Q
is still unknown, but useful upper and lower bounds can be derived).
What are these channels? In the case of the quantum erasure chan-
nel, a qubit transmitted through the channel either arrives intact, or (with
probability p) becomes lost and is never received. We can find a unitary rep-
resentation of this channel by embedding the qubit in the three-dimensional
Hilbert space of a qutrit, with orthonormal basis {|0⟩, |1⟩, |2⟩}. The channel
acts according to
|0⟩ ⊗ |0⟩_E → √(1−p) |0⟩ ⊗ |0⟩_E + √p |2⟩ ⊗ |1⟩_E ,
|1⟩ ⊗ |0⟩_E → √(1−p) |1⟩ ⊗ |0⟩_E + √p |2⟩ ⊗ |2⟩_E ,   (7.246)
where {|0⟩_E, |1⟩_E, |2⟩_E} are mutually orthogonal states of the environment.
The receiver can measure the observable |2⟩⟨2| to determine whether the
qubit is undamaged or has been "erased."
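Numerically, the map (7.246) is easy to verify to be an isometry. The following sketch (my own illustration; the array layout and variable names are not from the notes) builds the corresponding 9 × 2 isometry, checks that it preserves inner products, and traces out the environment to recover the erasure behavior:

import numpy as np

p = 0.3                      # erasure probability (arbitrary choice)
dS, dE = 3, 3                # system qutrit, three environment states

# isometry V: C^2 -> C^3 (x) C^3, columns are the images of |0> and |1>
V = np.zeros((dS * dE, 2))
def idx(s, e):               # index of |s> (x) |e>_E
    return s * dE + e

V[idx(0, 0), 0] = np.sqrt(1 - p)   # |0>|0>_E -> sqrt(1-p)|0>|0>_E + sqrt(p)|2>|1>_E
V[idx(2, 1), 0] = np.sqrt(p)
V[idx(1, 0), 1] = np.sqrt(1 - p)   # |1>|0>_E -> sqrt(1-p)|1>|0>_E + sqrt(p)|2>|2>_E
V[idx(2, 2), 1] = np.sqrt(p)

assert np.allclose(V.T @ V, np.eye(2))   # V^dagger V = 1: the channel is trace preserving

# tracing out the environment gives the Kraus operators of the erasure channel
kraus = [V.reshape(dS, dE, 2)[:, e, :] for e in range(dE)]
rho = np.array([[0.5, 0.5], [0.5, 0.5]])          # an arbitrary qubit state
rho_out = sum(K @ rho @ K.conj().T for K in kraus)
print(np.round(rho_out, 3))   # weight p has moved onto the "erased" state |2>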
The depolarizing channel (with error probability p) was discussed at
length in §3.4.1. We see that, for p ≤ 3/4, we may describe the fate of
a qubit transmitted through the channel this way: with probability 1 − q
(where q = 4p/3), the qubit arrives undamaged, and with probability q it is
destroyed, in which case it is described by the random density matrix ½ 1.
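As a reminder of where this rewriting comes from (a one-line check of mine, assuming the convention of §3.4.1 for the depolarizing channel):

ρ → (1 − p) ρ + (p/3)(XρX + YρY + ZρZ)
  = (1 − 4p/3) ρ + (2p/3) 1        [using XρX + YρY + ZρZ = 2 tr(ρ) 1 − ρ and tr ρ = 1]
  = (1 − q) ρ + q · ½ 1 ,   with q = 4p/3 .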
Both the erasure channel and the depolarizing channel destroy a qubit
with a specified probability. The crucial difference between the two channels
is that in the case of the erasure channel, the receiver knows which qubits
have been destroyed; in the case of the depolarizing channel, the damaged
qubits carry no identifying marks, which makes recovery more challenging.
Of course, for both channels, the sender has no way to know ahead of time
which qubits will be obliterated.
{ Figure {
We observe that the erasure channel can be realized if Alice sends a qubit
to Bob, and a third party Charlie decides at random to either steal the
qubit (with probability p) or allow the qubit to pass unscathed to Bob (with
probability 1 − p).
If Alice sends a large number n of qubits, then about (1 − p)n reach Bob,
and pn are intercepted by Charlie. Hence for p > 1/2, Charlie winds up in
possession of more qubits than Bob, and if Bob can recover the quantum
information encoded by Alice, then certainly Charlie can as well. Therefore,
if Q(p) > 0 for p > 1/2, Bob and Charlie can clone the unknown encoded
quantum states sent by Alice, which is impossible. (Strictly speaking, they
can clone with fidelity F = 1 − ε, for any ε > 0.) We conclude that Q(p) = 0
for p > 1/2.
To obtain a bound on Q(p) in the case p < 1/2, we will appeal to the
following lemma. Suppose that Alice and Bob are connected by both a
perfect noiseless channel and a noisy channel with capacity Q > 0. And
suppose that Alice sends m qubits over the perfect channel and n qubits
over the noisy channel. Then the number r of encoded qubits that Bob may
recover with arbitrarily high fidelity must satisfy
r ≤ m + Qn .   (7.247)
We derive this inequality by noting that Alice and Bob can simulate the m
qubits sent over the perfect channel by sending m/Q qubits over the noisy channel,
and so achieve a rate
R = r/(m/Q + n) = [r/(m + Qn)] · Q   (7.248)
over the noisy channel. Were r to exceed m + Qn, this rate R would exceed
the capacity, a contradiction. Therefore eq. (7.247) is satisfied.
Now consider the erasure channel with error probability p1, and suppose
Q(p1) > 0. Then we can bound Q(p2) for p2 ≤ p1 by
Q(p2) ≤ 1 − p2/p1 + (p2/p1) Q(p1) .   (7.249)
(In other words, if we plot Q(p) in the (p, Q) plane, and we draw a straight line
segment from any point (p1, Q(p1)) on the plot to the point (p = 0, Q = 1), then
the curve Q(p) must lie on or below the segment in the interval 0 ≤ p ≤ p1; if
Q(p) is twice differentiable, then its second derivative cannot be positive.) To
obtain this bound, imagine that Alice sends n qubits to Bob, knowing ahead
of time that n(1 − p2/p1) specified qubits will arrive safely. The remaining
n(p2/p1) qubits are erased with probability p1. Therefore, Alice and Bob are
using both a perfect channel and an erasure channel with erasure probability
p1; eq. (7.247) holds, and the rate R they can attain is bounded by
R ≤ 1 − p2/p1 + (p2/p1) Q(p1) .   (7.250)
On the other hand, for n large, altogether about np2 qubits are erased, and
(1 − p2)n arrive safely. Thus Alice and Bob have an erasure channel with
erasure probability p2 , except that they have the additional advantage of
knowing ahead of time that some of the qubits that Alice sends are invul-
nerable to erasure. With this information, they can be no worse off than
without it; eq. (7.249) then follows. The same bound applies to the depolar-
izing channel as well.
Now, the result Q(p) = 0 for p > 1/2 can be combined with eq. (7.249).
We conclude that the curve Q(p) must lie on or below the straight line
connecting the points (p = 0, Q = 1) and (p = 1/2, Q = 0), or
Q(p) ≤ 1 − 2p ,   0 ≤ p ≤ 1/2 .   (7.251)
In fact, there are stabilizer codes that actually attain the rate 1 − 2p for
0 ≤ p ≤ 1/2. We can see this by borrowing an idea from Claude Shannon,
and averaging over random stabilizer codes. Imagine choosing, in succession,
altogether n − k stabilizer generators. Each is selected from among the
4^n Pauli operators, where all have equal a priori probability, except that
each generator is required to commute with all generators chosen in previous
rounds.
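A minimal sketch of this random construction (my own illustration, using the symplectic representation of Pauli operators and rejection sampling; the names are hypothetical):

import random

def random_commuting_generators(n, k, seed=1):
    # Pick n-k random n-qubit Pauli operators (as X/Z bit vectors) that mutually commute.
    rng = random.Random(seed)
    gens = []
    while len(gens) < n - k:
        x = [rng.randint(0, 1) for _ in range(n)]
        z = [rng.randint(0, 1) for _ in range(n)]
        # keep the candidate only if its symplectic product with every earlier choice vanishes
        ok = all(
            (sum(a * b for a, b in zip(x, z2)) + sum(a * b for a, b in zip(z, x2))) % 2 == 0
            for x2, z2 in gens)
        if ok:
            gens.append((x, z))
    return gens

gens = random_commuting_generators(n=10, k=2)
print(len(gens), "mutually commuting generators chosen")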
Now Alice uses this stabilizer code to encode an arbitrary quantum state
in the 2^k-dimensional code subspace, and sends the n qubits to Bob over an
erasure channel with erasure probability p. Will Bob be able to recover the
state sent by Alice?
Bob replaces each erased qubit by a qubit in the state |0⟩, and then
proceeds to measure all n − k stabilizer generators. From this syndrome
measurement, he hopes to infer the Pauli operator E acting on the replaced
qubits. Once E is known, he can apply E† to recover a perfect duplicate
of the state sent by Alice. For n large, the number of qubits that Bob must
replace is about pn, and he will recover successfully if there is a unique Pauli
operator E that can produce the syndrome that he finds. If more than one
Pauli operator acting on the replaced qubits has this same syndrome, then
recovery may fail.
How likely is failure? Since there are about pn replaced qubits, there are
about 4^{pn} Pauli operators with support on these qubits. Furthermore, for any
particular Pauli operator E, a random stabilizer code generates a random
syndrome — each stabilizer generator has probability 1/2 of commuting with
E, and probability 1/2 of anti-commuting with E. Therefore, the probability
that two Pauli operators have the same syndrome is (1/2)^{n−k}.
There is at least one particular Pauli operator acting on the replaced
qubits that has the syndrome found by Bob. But the probability that an-
other Pauli operator has this same syndrome (and hence the probability of
a recovery failure) is no worse than
P_fail ≤ 4^{pn} (1/2)^{n−k} = 2^{−n(1−2p−R)} ,   (7.252)
where R = k/n is the rate. Eq. (7.252) bounds the failure probability if
we average over all stabilizer codes with rate R; it follows that at least one
particular stabilizer code must exist whose failure probability also satis es
the bound.
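To get a feel for the numbers in eq. (7.252), here is a small evaluation of the bound (my own aside; the parameters are chosen arbitrarily):

import math

def pfail_bound(n, p, R):
    # upper bound 2^{-n(1-2p-R)} on the average failure probability, eq. (7.252)
    return 2.0 ** (-n * (1 - 2 * p - R))

p, R = 0.1, 0.5            # erasure probability and rate, with R < 1 - 2p = 0.8
for n in (50, 100, 500, 1000):
    print(n, pfail_bound(n, p, R))
# the bound decays exponentially in n whenever R < 1 - 2p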
For that particular code, P_fail gets arbitrarily small as n → ∞, for any rate
R strictly less than 1 − 2p. Therefore R = 1 − 2p is asymptotically
attainable; combining this result with the inequality eq. (7.251) we obtain
the capacity of the quantum erasure channel:
Q(p) = 1 − 2p ,   0 ≤ p ≤ 1/2 .   (7.253)
If we wanted assurance that a distinct syndrome could be assigned to
all ways of damaging pn erased qubits, then we would require an [[n,k,d]]
quantum code with distance d > pn. Our Gilbert–Varshamov bound of §7.14
guarantees the existence of such a code for
R < 1 − H2(p) − p log2 3 .   (7.254)
This rate can be achieved by a code that recovers from any of the possible
ways of erasing up to pn qubits. It lies strictly below the capacity for p > 0,
because to achieve high average fidelity, it suffices to be able to correct the
typical erasures, rather than all possible erasures.
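A short numerical comparison of the Gilbert–Varshamov rate (7.254) with the erasure capacity (7.253) makes the gap explicit (my own illustration; the function names are not from the notes):

import math

def H2(x):
    # binary Shannon entropy
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def gv_rate(p):
    # rate guaranteed by the quantum Gilbert-Varshamov bound, eq. (7.254)
    return 1 - H2(p) - p * math.log2(3)

for p in (0.05, 0.10, 0.15, 0.18):
    print(f"p={p:.2f}  GV rate={gv_rate(p):.3f}  erasure capacity={1 - 2*p:.3f}")
# for each p > 0 sampled, the GV rate falls strictly below the capacity 1 - 2p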
{ Figure {
This modified channel consists (as shown) of: first the inner encoder, then
propagation through the original noisy channel, and finally inner decoding
and inner recovery. The rate that can be attained through the original chan-
nel, via concatenated coding, is the same as the rate that can be attained
through the modi ed channel, via random coding.
Specifically, suppose that the inner code is an m-qubit repetition code,
with stabilizer
Z_1 Z_2 , Z_1 Z_3 , Z_1 Z_4 , . . . , Z_1 Z_m .   (7.266)
This is not much of a quantum code; it has distance 1, since it is insensi-
tive to phase errors — each Z_j commutes with the stabilizer. But in the
present context its important feature is its high degeneracy: all Z_i errors are
equivalent.
The encoding (and decoding) circuit for the repetition code consists of
just m − 1 CNOT's, so our composite channel looks like (in the case m = 3)
{ Figure {
where $ denotes the original noisy channel. (We have also suppressed the
final recovery step of the decoding; e.g., if the measured qubits both read
1, we should flip the data qubit. In fact, to simplify the analysis of the
composite channel, we will dispense with this step.)
Since we recall that a CNOT propagates bit flips forward (from control
to target) and phase flips backward (from target to control), we see that for
each possible measurement outcome of the auxiliary qubits, the composite
channel is a Pauli channel. If we imagine that this measurement of the m − 1
inner block qubits is performed for each of the n qubits of the outer block,
then Pauli channels act independently on each of the n qubits, but the chan-
nels acting on different qubits have different parameters (error probabilities
p_I^{(i)}, p_X^{(i)}, p_Y^{(i)}, p_Z^{(i)} for the ith qubit). Now the number of typical error operators
acting on the n qubits is
2^{Σ_{i=1}^{n} H_i} ,   (7.267)
where
H_i = H(p_I^{(i)}, p_X^{(i)}, p_Y^{(i)}, p_Z^{(i)})   (7.268)
is the Shannon entropy of the Pauli channel acting on the ith qubit. By the
law of large numbers, we will have
Σ_{i=1}^{n} H_i = n⟨H⟩ ,   (7.269)
for large n, where ⟨H⟩ is the Shannon entropy, averaged over the 2^{m−1} pos-
sible classical outcomes of the measurement of the extra qubits of the inner
code. Therefore, the rate that can be attained by the random outer code is
R = (1 − ⟨H⟩)/m ,   (7.270)
(we divide by m, because the concatenated code has a length m times longer
than the random code).
Shor and Smolin discovered that there are repetition codes (values of m)
for which, in a suitable range of p, 1 − ⟨H⟩ is positive while 1 − H2(p) − p log2 3
is negative. In this range, then, the capacity Q(p) is nonzero, showing that
the lower bound eq. (7.262) is not tight.
A nonvanishing asymptotic rate is attainable through random coding for
1 − H2(p) − p log2 3 > 0, or p < p_max ≃ .18929. If a random outer code is
concatenated with a 5-qubit inner repetition code (m = 5 turns out to be the
optimal choice), then 1 − ⟨H⟩ > 0 for p < p′_max ≃ .19036; the maximum error
probability for which a nonzero rate is attainable increases by about 0.6%.
It is not obvious that the concatenated code should outperform the random
code in this range of error probability, though as we have indicated, it might
have been expected because of the (phase) degeneracy of the repetition code.
Nor is it obvious that m = 5 should be the best choice, but this can be
verified by an explicit calculation of ⟨H⟩.7
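The threshold p_max quoted above is just the root of 1 − H2(p) − p log2 3, the random-coding lower bound of eq. (7.262); a few lines of bisection (my own check, not part of the notes) reproduce the value:

import math

def hashing_rate(p):
    # random-coding lower bound 1 - H2(p) - p*log2(3) on Q(p) for the depolarizing channel
    H2 = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return 1 - H2 - p * math.log2(3)

lo, hi = 0.1, 0.25           # the rate is positive at p = 0.1 and negative at p = 0.25
for _ in range(60):          # bisect down to machine precision
    mid = 0.5 * (lo + hi)
    if hashing_rate(mid) > 0:
        lo = mid
    else:
        hi = mid
print(round(lo, 5))          # ~0.18929, as quoted in the text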
The depolarizing channel is one of the very simplest of quantum chan-
nels. Yet even for this case, the problem of characterizing and calculating
the capacity is largely unsolved. This example illustrates that, due to the
possibility of degenerate coding, the capacity problem is considerably more
subtle for quantum channels than for classical channels.
We have seen that (if the errors are well described by the depolarizing
channel), quantum information can be recovered from a quantum memory
with arbitrarily high fidelity, as long as the probability of error per qubit is
less than 19%. This is an improvement relative to the 10% error rate that
we found could be handled by concatenation of the [[5,1,3]] code. In fact
[[n,k,d]] codes that can recover from any distribution of up to pn errors do
not exist for p > 1/6, according to the Rains bound. Nonzero capacity is
possible for error rates between 16.7% and 19% because it is sufficient for the
QECC to be able to correct the typical errors rather than all possible errors.
However, the claim that recovery is possible even if 19% of the qubits
sustain damage is highly misleading in an important respect. This result
applies if encoding, decoding, and recovery can be executed flawlessly. But
these operations are actually very intricate quantum computations that in
practice will certainly be susceptible to error. We will not fully understand
how well coding can protect quantum information from harm until we have
learned to design an error recovery protocol that is robust even if the execu-
tion of the protocol is flawed. Such fault-tolerant protocols will be developed
in Chapter 8.
7 In fact, a very slight further improvement can be achieved by concatenating a random
code with the 25-qubit generalized Shor code described in the exercises; then a nonzero
rate is attainable for p < p′′_max ≃ .19056 (another 0.1% better than the maximum tolerable
error probability).
7.18 Exercises
7.1 Phase error-correcting code
a) Construct stabilizer generators for an n = 3, k = 1 code that can
correct a single bit flip; that is, ensure that recovery is possible for
any of the errors in the set E = {III, XII, IXI, IIX}. Find
an orthonormal basis for the two-dimensional code subspace.
b) Construct stabilizer generators for an n = 3, k = 1 code that can
correct a single phase error; that is, ensure that recovery is possible
for any of the errors in the set E = {III, ZII, IZI, IIZ}. Find
an orthonormal basis for the two-dimensional code subspace.
7.2 Error-detecting codes
a) Construct stabilizer generators for an [[n,k,d]] = [[3,0,2]] quantum
code. With this code, we can detect any single-qubit error. Find
the encoded state. (Does it look familiar?)
b) Two QECC's C1 and C2 (with the same length n) are equivalent
if a permutation of qubits, combined with single-qubit unitary
transformations, transforms the code subspace of C1 to that of
C2. Are all [[3,0,2]] stabilizer codes equivalent?
c) Does a [[3,1,2]] stabilizer code exist?
7.3 Maximal entanglement
Consider the [[5,1,3]] quantum code, whose stabilizer generators are
M_1 = XZZXI, and M_{2,3,4} obtained by cyclic permutations of M_1,
and choose the encoded operation Z̄ to be Z̄ = ZZZZZ. From the
encoded states |0̄⟩ with Z̄|0̄⟩ = |0̄⟩ and |1̄⟩ with Z̄|1̄⟩ = −|1̄⟩, construct
the n = 6, k = 0 code whose encoded state is
(1/√2) (|0̄⟩ ⊗ |0̄⟩ + |1̄⟩ ⊗ |1̄⟩) .   (7.271)
a) Construct a set of stabilizer generators for this n = 6, k = 0 code.
b) Find the distance of this code. (Recall that for a k = 0 code, the
distance is defined as the minimum weight of any element of the
stabilizer.)
c) Find ρ^{(3)}, the density matrix that is obtained if three qubits are
selected and the remaining three are traced out.
7.4 Codewords and nonlocality
For the [[5,1,3]] code with stabilizer generators and logical operators as
in the preceding problem,
a) Express Z̄ as a weight-3 Pauli operator, a tensor product of I's,
X's, and Z's (no Y's). Note that because the code is cyclic,
all cyclic permutations of your expression are equivalent ways to
represent Z̄.
b) Use the Einstein locality assumption (local hidden variables) to pre-
dict a relation between the five (cyclically related) observables
found in (a) and the observable ZZZZZ. Is this relation among
observables satisfied for the state |0̄⟩?
c) What would Einstein say?
7.5 Generalized Shor code
For integer m ≥ 2, consider the n = m², k = 1 generalization of Shor's
nine-qubit code, with code subspace spanned by the two states:
|0̄⟩ = (|000 . . . 0⟩ + |111 . . . 1⟩)^{⊗m} ,
|1̄⟩ = (|000 . . . 0⟩ − |111 . . . 1⟩)^{⊗m} .   (7.272)
a) Construct stabilizer generators for this code, and construct the log-
ical operations Z̄ and X̄ such that
Z̄|0̄⟩ = |0̄⟩ ,  X̄|0̄⟩ = |1̄⟩ ,
Z̄|1̄⟩ = −|1̄⟩ ,  X̄|1̄⟩ = |0̄⟩ .   (7.273)
b) What is the distance of this code?
c) Suppose that m is odd, and suppose that each of the n = m² qubits
is subjected to the depolarizing channel with error probability p.
How well does this code protect the encoded qubit? Specifically,
(i) estimate the probability, to leading nontrivial order in p, of a
logical bit-flip error |0̄⟩ ↔ |1̄⟩, and (ii) estimate the probability,
to leading nontrivial order in p, of a logical phase error |0̄⟩ → |0̄⟩,
|1̄⟩ → −|1̄⟩.
d) Consider the asymptotic behavior of your answer to (c) for m large.
What condition on p should be satisfied for the code to provide
good protection against (i) bit flips and (ii) phase errors, in the
n → ∞ limit?
7.6 Encoding circuits
For an [[n,k,d]] quantum code, an encoding transformation is a unitary
U that acts as
U : |ψ⟩ ⊗ |0⟩^{⊗(n−k)} → |ψ̄⟩ ,   (7.274)
where |ψ⟩ is an arbitrary k-qubit state, and |ψ̄⟩ is the corresponding
encoded state. Design a quantum circuit that implements the encoding
transformation for
a) Shor's [[9,1,3]] code.
b) Steane's [[7,1,3]] code.
7.7 Shortening a quantum code
a) Consider a binary [[n,k,d]] stabilizer code. Show that it is possible
to choose the n − k stabilizer generators so that at most two act
nontrivially on the last qubit. (That is, the remaining n − k − 2
generators apply I to the last qubit.)
b) These n − k − 2 stabilizer generators that apply I to the last qubit will
still commute and are still independent if we drop the last qubit.
Hence they are the generators for a code with length n − 1 and k + 1
encoded qubits. Show that if the original code is nondegenerate,
then the distance of the shortened code is at least d − 1. (Hint:
First show that if there is a weight-t element of the (n ; 1)-qubit
Pauli group that commutes with the stabilizer of the shortened
code, then there is an element of the n-qubit Pauli group of weight
at most t + 1 that commutes with the stabilizer of the original
code.)
c) Apply the code-shortening procedure of (a) and (b) to the [[5,1,3]]
QECC. Do you recognize the code that results? (Hint: It may
be helpful to exploit the freedom to perform a change of basis on
some of the qubits.)
7.8 Codes for qudits
A qudit is a d-dimensional quantum system. The Pauli operators
I, X, Y, Z acting on qubits can be generalized to qudits as follows.
Let {|0⟩, |1⟩, . . . , |d − 1⟩} denote an orthonormal basis for the Hilbert
space of a single qudit. Define the operators:
X : |j⟩ → |j + 1 (mod d)⟩ ,
Z : |j⟩ → ω^j |j⟩ ,   (7.275)
where ω = exp(2πi/d). Then the d × d Pauli operators E_{r,s} are
E_{r,s} ≡ X^r Z^s ,   r, s = 0, 1, . . . , d − 1 .   (7.276)
a) Are the E_{r,s}'s a basis for the space of operators acting on a qudit?
Are they unitary? Evaluate tr(E†_{r,s} E_{t,u}).
b) The Pauli operators obey
E_{r,s} E_{t,u} = η(r, s; t, u) E_{t,u} E_{r,s} ,   (7.277)
where η(r, s; t, u) is a phase. Evaluate this phase.
The n-fold tensor products of these qudit Pauli operators form a group
G_n^{(d)} of order d^{2n+1} (and if we mod out its d-element center, we obtain
the group Ḡ_n^{(d)} of order d^{2n}). To construct a stabilizer code for qudits,
we choose an abelian subgroup of G_n^{(d)} with n − k generators; the code
subspace is the simultaneous eigenspace with eigenvalue one of these generators.
If d is prime, then the code subspace has dimension d^k: k logical qudits
are encoded in a block of n qudits.
c) Explain how the dimension might be different if d is not prime.
(Hint: Consider the case d = 4 and n = 1.)
7.9 Syndrome measurement for qudits
Errors on qudits are diagnosed by measuring the stabilizer generators.
For this purpose, we may invoke the two-qudit gate SUM (which gen-
eralizes the controlled-NOT), acting as
SUM : |j⟩ ⊗ |k⟩ → |j⟩ ⊗ |k + j (mod d)⟩ .   (7.278)
a) Describe a quantum circuit containing SUM gates that can be exe-
cuted to measure an n-qudit observable of the form
⊗_a Z_a^{s_a} .   (7.279)
If d is prime, then for each r, s = 0, 1, 2, . . . , d − 1, there is a single-qudit
unitary operator U_{r,s} such that
U_{r,s} E_{r,s} U†_{r,s} = Z .   (7.280)
b) Describe a quantum circuit containing SUM gates and U_{r,s} gates
that can be executed to measure an arbitrary element of G_n^{(d)} of
the form
⊗_a E_{r_a, s_a} .   (7.281)
7.10 Error-detecting codes for qudits
A qudit with d = 3 is called a qutrit. Consider a qutrit stabilizer
code with length n = 3 and k = 1 encoded qutrit defined by the two
stabilizer generators
ZZZ ,  XXX .   (7.282)
a) Do the generators commute?
b) Find the distance of this code.
c) In terms of the orthonormal basis {|0⟩, |1⟩, |2⟩} for the qutrit, write
out explicitly an orthonormal basis for the three-dimensional code
subspace.
d) Construct the stabilizer generators for an n = 3m qutrit code (where
m is any positive integer), with k = n − 2, that can detect one
error.
e) Construct the stabilizer generators for a qudit code that detects one
error, with parameters n = d, k = d − 2.
7.11 Error-correcting code for qudits
Consider an n = 5, k = 1 qudit stabilizer code with stabilizer generators
X Z Z^{-1} X^{-1} I
I X Z Z^{-1} X^{-1}
X^{-1} I X Z Z^{-1}
Z^{-1} X^{-1} I X Z   (7.283)
(the second, third, and fourth generators are obtained from the first by
a cyclic permutation of the qudits).
a) Find the order of each generator. Are the generators really in-
dependent? Do they commute? Is the fifth cyclic permutation
Z Z^{-1} X^{-1} I X independent of the rest?
b) Find the distance of this code. Is the code nondegenerate?
c) Construct the encoded operations X̄ and Z̄, each expressed as an
operator of weight 3. (Be sure to check that these operators obey
the right commutation relations for any value of d.)
Lecture Notes for Physics 219:
Quantum Computation
John Preskill
California Institute of Technology
14 June 2004
9 Topological quantum computation
9.1 Anyons, anyone?
∗ Two interesting approaches to realizing nonabelian anyons — using superconduct-
ing junction arrays and using cold atoms trapped in optical lattices — have been
discussed in the recent literature.
The relative sign in the superposition flips, but this has no detectable
physical effects, since all observables are block diagonal in the (−1)^F
basis.
Similarly, in two dimensions, the shift in the angular momentum spec-
trum e^{−2πiJ} = e^{iθ} has no unacceptable physical consequences if there is
a phase generated when one of the two objects is rotated by 2π. Thus the
connection between spin and statistics continues to hold, in a form that
is a natural generalization of the connection that applies to bosons and
fermions.
The origin of this connection is fairly clear in our flux-charge composite
model, but in fact it holds much more generally. Why? Reading textbooks
on relativistic quantum field theory, one can easily get the impression that
the spin-statistics connection is founded on Lorentz invariance, and has
something to do with the properties of the complexified Lorentz group.
Actually, this impression is quite misleading. All that is essential for a
spin-statistics connection to hold is the existence of antiparticles. Special
relativity is not an essential ingredient.
Consider an anyon, characterized by the phase θ, and suppose that this
particle has a corresponding antiparticle. This means that the particle
and its antiparticle, when combined, have trivial quantum numbers (in
particular, zero angular momentum) and therefore that there are physical
processes in which particle-antiparticle pairs can be created and annihi-
lated. Draw a world line in spacetime that represents a process in which
two particle-antiparticle pairs are created (one pair on the left and the
other pair on the right), the particle from the pair on the right is ex-
changed in a counterclockwise sense with the particle from the pair on
the left, and then both pairs reannihilate. (The world line has an orien-
tation; if directed forward in time it represents a particle, and if directed
backward in time it represents an antiparticle.) Turning our diagram 90◦ ,
we obtain a depiction of a process in which a single particle-antiparticle
pair is created, the particle and antiparticle are exchanged in a clock-
wise sense, and then the pair reannihilates. Turning it 90◦ yet again, we
have a process in which two pairs are created and the antiparticle from
the pair on the right is exchanged, in a counterclockwise sense, with the
antiparticle from the pair on the left, before reannihilation.
R_{aa} = R_{aā}^{−1} = R_{āā} .   (9.6)
If a is an anyon with exchange phase eiθ , then its antiparticle ā also has
the same exchange phase. Furthermore, when a and ā are exchanged
counterclockwise, the phase acquired is e−iθ .
These conclusions are unsurprising when we interpret them from the
perspective of our flux-charge composite model of anyons. The antipar-
ticle of the object with flux Φ and charge q has flux −Φ and charge −q.
Hence, when we exchange two antiparticles, the minus signs cancel and
the effect is the same as though the particles were exchanged. But if we
exchange a particle and an antiparticle, then the relative sign of charge
and flux results in the exchange phase e−iqΦ = e−iθ .
But what is the connection between these observations about statistics
and the spin? Continuing to contemplate the same spacetime diagram, let
us consider its implications regarding the orientation of the particles. For
keeping track of the orientation, it is convenient to envision the particle
world line not as a thread but as a ribbon in spacetime. I claim that our
process can be smoothly deformed to one in which a particle-antiparticle
pair is created, the particle is rotated counterclockwise by 2π, and then
the pair reannihilates. A convenient way to verify this assertion is to take
off your belt (or borrow a friend’s). The buckle at one end specifies an
orientation; point your thumb toward the buckle, and following the right-
hand rule, twist the belt by 2π before rebuckling it. You should be able
to check that you can lay out the belt to match the spacetime diagram for
any of the exchange processes described earlier, and also for the process
in which the particle rotates by 2π.
Thus, in a topological sense, rotating a particle counterclockwise by 2π
is really the same thing as exchanging two particles in a counterclockwise
sense (or exchanging particle and antiparticle in a clockwise sense), which
provides a satisfying explanation for a general spin-statistics connection.†
I emphasize again that this argument invokes processes in which particle-
antiparticle pairs are created and annihilated, and therefore the existence
of antiparticles is an essential prerequisite for it to apply.
† Actually, this discussion has been oversimplified. Though it is adequate for abelian
anyons, we will see that it must be amended for nonabelian anyons, because Rab has
more than one eigenvalue in the nonabelian case. Similarly, the discussion in the next
section of “combining anyons” will need to be elaborated because, in the nonabelian
case, more than one kind of composite anyon can be obtained when two anyons are
fused together.
Suppose that a is an anyon with exchange phase eiθ , and that we build
a “molecule” from n of these a anyons. What phase is acquired under a
counterclockwise exchange of the two molecules?
The answer is clear in our flux-charge composite model. Each of the n
charges in one molecule acquires a phase e^{iθ/2} when transported half way
around each of the n fluxes in the other molecule. Altogether then, 2n²
factors of the phase e^{iθ/2} are generated, resulting in the total phase
e^{iθ_n} = e^{i n² θ} .   (9.7)
Said another way, the phase e^{iθ} occurs altogether n² times because in
effect n anyons in one molecule are being exchanged with n anyons in
the other molecule. Contrary to what we might have naively expected, if
we split a fermion (say) into two identical constituents, the constituents
have, not an exchange phase of √−1 = i, but rather (e^{iπ})^{1/4} = e^{iπ/4}.
This behavior is compatible with the spin-statistics connection: the
angular momentum J_n of the n-anyon molecule satisfies
e^{−2πiJ_n} = e^{−2πi n² J} = e^{i n² θ} .   (9.8)
e−2πiJn = e−2πin = ein . (9.8)
and this orbital angular momentum combines additively with the spin S
to produce the total angular momentum
−2πJ = −2πL−2πS = 2θ+2θ+ 2π(integer) = 4θ+ 2π(integer) . (9.12)
What if, on the other hand, we build a molecule āa from an anyon a
and its antiparticle ā? Then, as we’ve seen, the spin S has the same value
as for the aa molecule. But the exchange phase has the opposite value, so
that the noninteger part of the orbital angular momentum is −2πL = −2θ
instead of −2πL = 2θ, and the total angular momentum J = L + S is
an integer. This property is necessary, of course, if the āa pair is to be
able to annihilate without leaving behind an object that carries nontrivial
angular momentum.
which is sometimes called the Yang-Baxter relation. You can verify the
Yang-Baxter relation by drawing the two braids σ1 σ2 σ1 and σ2 σ1 σ2 on
a piece of paper, and observing that both describe a process in which
the particles initially in positions 1 and 3 are exchanged counterclockwise
about the particle labeled 2, which stays fixed — i.e., these are topologi-
cally equivalent braids.
V1 V2
V2 V1
V1 V2
length later on) that there is more to a model of anyons than a mere rep-
resentation of the braid group. In our flux tube model of abelian anyons,
we were able to describe not only the effects of an exchange of anyons, but
also the types of particles that can be obtained when two or more anyons
are combined together. Likewise, in a general anyon model, the anyons
are of various types, and the model incorporates “fusion rules” that spec-
ify what types can be obtained when two anyons of particular types are
combined. Nontrivial consistency conditions arise because fusion is asso-
ciate (fusing a with b and then fusing the result with c is equivalent to
fusing b with c and then fusing the result with a), and because the fusion
rules must be consistent with the braiding rules. Though these consis-
tency conditions are highly restrictive, many solutions exist, and hence
many different models of nonabelian anyons are realizable in principle.
θ = πp/q , (9.18)
where q and p (p < 2q) are positive integers with no common factor. Then
we conclude that T1 must have at least q distinct eigenvalues; T1 acting
on α generates an orbit with q distinct values:
α + (2πp/q) k (mod 2π) ,   k = 0, 1, 2, . . . , q − 1 .   (9.19)
Since T1 commutes with H, on the torus the ground state of our anyonic
system (indeed, any energy eigenstate) must have a degeneracy that is an
integer multiple of q. Indeed, generically (barring further symmetries or
accidental degeneracies), the degeneracy is expected to be exactly q.
For a two-dimensional surface with genus g (a sphere with g “handles”),
the degree of this topological degeneracy becomes q^g, because there are
operators analogous to T1 and T2 associated with each of the g handles,
and all of the T1 -like operators can be simultaneously diagonalized. Fur-
thermore, we can apply a similar argument to a finite planar medium if
single anyons can be created and destroyed at the edges of the system. For
example, consider an annulus in which anyons can appear or disappear
at the inner and outer edges. Then we could define the unitary opera-
tor T1 as describing a process in which an anyon winds counterclockwise
around the annulus, and a unitary operator T2 as describing a process in
which an anyon appears at the outer edge, propagates to the inner edge,
and disappears. These operators T1 and T2 have the same commutator
as the corresponding operators defined on the torus, and so we conclude
as before that the ground state on the annulus is q-fold degenerate for
θ = πp/q. For a disc with h holes, there is an operator analogous to
T1 that winds an anyon counterclockwise around each of the holes, and
an operator analogous to T2 that propagates an anyon from the outer
boundary of the disk to the edge of the hole; thus the degeneracy is q^h.
‡ If you are familiar with Euclidean path integral methods, you'll find it easy to verify
that in the leading semiclassical approximation the amplitude A for such a tunneling
process in which the anyon propagates a distance L has the form A = C e^{−L/L_0},
where C is a constant and L_0 = h̄ (2m∗∆)^{−1/2}; here h̄ is Planck's constant and m∗
is the effective mass of the anyon, defined so that the kinetic energy of an anyon
traveling at speed v is (1/2) m∗ v².
both arising from processes in which world lines of charges and fluxons link
once with one another. Thus T1,S and T2,S can be diagonalized simulta-
neously, and can be regarded as the encoded Pauli operators Z̄1 and Z̄2
acting on two protected qubits. The operator T2,P , which commutes with
Z̄1 and anticommutes with Z̄2 , can be regarded as the encoded X̄1 , and
similarly T1,P is the encoded X̄2 .
On the torus, the degeneracy of the four ground states is exact for
the ideal Hamiltonian we constructed (the particles have infinite effective
masses). Weak local perturbations will break the degeneracy, but only
by an amount that gets exponentially small as the linear size L of the
torus increases. To be concrete, suppose the perturbation is a uniform
“magnetic field” pointing in the ẑ direction, coupling to the magnetic
moments of the qubits:
H = −h Σ_i Z_i .   (9.22)
Because of the nonzero energy gap, for the purpose of computing in per-
turbation theory the leading contribution to the splitting of the degen-
eracy, it suffices to consider the effect of the perturbation in the four-
dimensional subspace spanned by the ground states of the unperturbed
system. In the toric code, the operators with nontrivial matrix elements
in this subspace are those such that Z ’s act on links that form a closed
loop that wraps around the torus (or X ’s act on links whose dual links
form a closed loop that wraps around the torus). For an L × L lattice on
the torus, the minimal length of such a closed loop is L; therefore nonva-
nishing matrix elements do not arise in perturbation theory until the Lth
order, and are suppressed by hL . Thus, for small h and large L, memory
errors due to quantum fluctuations occur only with exponentially small
amplitude.
The matrix elements D^R_ij(a) are measurable in principle, for example by
conducting interference experiments in which a beam of calibrated charges
can pass on either side of the flux. (The phase of the complex number
D^R_ij(a) determines the magnitude of the shift of the interference fringes,
and the modulus of D^R_ij(a) determines the visibility of the fringes.) Thus
once we have chosen a standard basis for the charges, we can use the
charges to attach labels (elements of G) to all fluxes. The flux labels
are unambiguous as long as the representation R is faithful, and barring
any group automorphisms (which create ambiguities that we are free to
resolve however we please).
However, the group elements that we attach to the fluxes depend on our
conventions. Suppose I am presented with k fluxons (particles that carry
flux), and that I use my standard charges to measure the flux of each
particle. I assign group elements a1 , a2 , . . . , ak ∈ G to the k fluxons. You
are then asked to measure the flux, to verify my assignments. But your
standard charges differ from mine, because they have been surreptitiously
transported around another flux (one that I would label with g ∈ G).
Therefore you will assign the group elements ga1 g −1 , ga2g −1 , . . ., gak g −1
to the k fluxons; our assignments differ by an overall conjugation by g.
The moral of this story is that the assignment of group elements to
fluxons is inherently ambiguous and has no invariant meaning. But be-
cause the valid assignments of group elements to fluxons differ only by
conjugation by some element g ∈ G, the conjugacy class of the flux in
G does have an invariant meaning on which all observers will agree. In-
deed, even if we fix our conventions at the charge bureau of standards, the
group element that we assign to a particular fluxon may change if that
fluxon takes part in a physical process in which it braids with other flux-
ons. For that reason, the fluxons belonging to the same conjugacy class
should all be regarded as indistinguishable particles, even though they
come in many varieties (one for each representative of the class) that can
be distinguished when we make measurements at a particular time and
place: The fluxons are nonabelian anyons.
{ Figure: paths based at the point x0 {
It follows that the effect of transporting a charge around the path α, after
the exchange, is equivalent to the effect of transport around the path
αβα−1 , before the exchange; similarly, the effect of transport around β,
after the exchange, is the same as the effect of transport around α before.
We conclude that the braid operator R representing a counterclockwise
Thus, if the two fluxons are exchanged three times, they swap positions
(the number of exchanges is odd), yet the labeling of the state is unmod-
ified. This observation means that there can be quantum interference
between the “direct” and “exchange” scattering of two fluxons that carry
distinct labels in the same conjugacy class, reinforcing the notion that
fluxes carrying conjugate labels ought to be regarded as indistinguishable
particles.
Since the braid operator acting on pairs of two-cycle fluxes satisfies
R³ = I, its eigenvalues are third roots of unity. For example, by taking
linear combinations of the three states with total flux (123), we obtain
the R eigenstates
where ω = e^{2πi/3}.
Although a pair of fluxes |a, a−1 with trivial total flux has trivial braid-
ing properties, it is interesting for another reason — it carries charge. The
way to detect the charge of an object is to carry a flux b around the ob-
ject (counterclockwise); this modifies the object by the action of DR (b) for
some representation R of G. If the charge is zero then the representation
is trivial — D(b) = I for all b ∈ G. But if we carry flux b counterclockwise
around the state |a, a−1 , the state transforms as
where |α| denotes the order of α. A pair of fluxons in the class α that can
be created in a local process must not carry any conserved charges and
therefore must be in the state |0; α⟩. Other linear combinations orthogonal
to |0; α⟩ carry nonzero charge. This charge carried by a pair of fluxons can
be detected by other fluxons, yet oddly the charge cannot be localized on
the core of either particle in the pair. Rather it is a collective property of
the pair. If two fluxons with a nonzero total charge are brought together,
complete annihilation of the pair will be forbidden by charge conservation,
even though the total flux is zero.
where
χ_R(a) = Σ_i D^R_ii(a) = tr D^R(a)   (9.41)
is the character of the representation R, evaluated at a. In fact, the
character (a trace) is unchanged by conjugation — it takes the same value
for all a ∈ α. Therefore, eq. (9.40) is also the probability that the pair of
chargeons has zero total charge when one chargeon (initially a member
of a pair in the state |0; R⟩) winds around one fluxon (initially a member
of a pair in the state |0; α⟩). Of course, since the total charge of all four
particles is zero and charge is conserved, after the winding the two pairs
have opposite charges — if the pair of chargeons has total charge R , then
the pair of fluxons must have total charge R̄ , combined with R to give
trivial total charge. A pair of particles with zero total charge and flux can
annihilate, leaving no stable particle behind, while a pair with nonzero
charge will be unable to annihilate completely. We conclude, then, that
if the world lines of a fluxon pair and a chargeon pair link once, the
probability that both pairs will be able to annihilate is given by eq. (9.40).
This probability is less than one, provided that the representation R
is not one dimensional and the class α is not represented trivially. Thus
the linking of the world lines induces an exchange of charge between the
two pairs.
For example, in the case where α is the two-cycle class of G = S3 and
R = [2] (the two-dimensional irreducible representation of S3 ), we see
from eq. (9.37) that χ[2](α) = 0. Therefore, charge is transferred with
certainty; after the winding, both the fluxon pair and the chargeon pair
transform as R = [2].
Since the sum over the dimension squared for all irreducible representa-
tions of a finite group is the order of the group, and the order of the
normalizer N (α) is |G|/|α|, we obtain
D² = Σ_α |α| · |G| = |G|² ;   (9.44)
We have already noted that the fusion of two two-cycle fluxes can yield
either a trivial total flux or a three-cycle flux, and that the charge of the
composite with trivial total flux can be either [+] or [2]. If the total flux
is a three-cycle, then the charge eigenstates are just the braid operator
eigenstates that we constructed in eq. (9.33).
For a system of two anyons, why should the eigenstates of the total
charge also be eigenstates of the braid operator? We can understand this
connection more generally by thinking about the angular momentum of
the two-anyon composite object. The monodromy operator R² captures
the effect of winding one particle counterclockwise around another. This
winding is almost the same thing as rotating the composite system coun-
terclockwise by 2π, except that the rotation of the composite system also
rotates both of the constituents. We can compensate for the rotation of
the constituents by following the counterclockwise rotation of the compos-
ite by a clockwise rotation of the constituents. Therefore, the monodromy
operator can be expressed as
which is less than one if the flux ab−1 is not the identity (assuming that the
representation R is not one-dimensional and represents ab−1 nontrivially).
Thus, if annihilation of the chargeon pair does not occur, we know for sure
that a and b are distinct fluxes, and each time annihilation does occur,
it becomes increasingly likely that a and b are equal. By repeating this
procedure a modest number of times, we can draw a conclusion about
whether a and b are the same, with high statistical confidence.
This procedure allows us to sort the fluxon pairs into bins, where each
pair in a bin has the same flux. If a bin contains n pairs, its state is, in
general, a mixture of states of the form
Σ_{a∈G} ψ_a (|a, a−1⟩)^{⊗n} .   (9.50)
By discarding just one pair in the bin, each such state becomes a mixture
Σ_{a∈G} ρ_a (|a, a−1⟩⟨a, a−1|)^{⊗(n−1)} ;   (9.51)
we may regard each bin as containing (n − 1) pairs, all with the same
definite flux, but where that flux is as yet unknown.
Which bin is which? We want to label the bins with elements of G. To
arrive at a consistent labeling, we withdraw fluxon pairs from three dif-
ferent bins. Suppose the three pairs are |a, a−1⟩, |b, b−1⟩, and |c, c−1⟩, and
that we want to check whether c = ab. We create a chargeon-antichargeon
pair, carry the chargeon around a closed path that encloses the first mem-
ber of the first fluxon pair, the first member of the second fluxon pair,
and second member of the third fluxon pair, and observe whether the
reunited chargeon pair annihilates or not. Since the total flux enclosed
by the chargeon’s path is abc−1 , by repeating this procedure we can de-
termine with high statistical confidence whether ab and c are the same.
Such observations allow us to label the bins in some manner that is consis-
tent with the group composition rule. This labeling is unique apart from
group automorphisms (and ambiguities arising from any automorphisms
may be resolved arbitrarily).
Once the flux bureau of standards is established, we can use it to mea-
sure the unknown flux of an unlabeled pair. If the state of the pair to
be measured is |d, d−1⟩, we can withdraw the labeled pair |a, a−1⟩ from
a bin, and use chargeon pairs to measure the flux ad−1 . By repeating
this procedure with other labeled fluxes, we can eventually determine the
value of the flux d, realizing a projective measurement of the flux.
For a simulation of a quantum circuit using fluxons, we will need to
perform logic gates that act upon the value of the flux. The basic gate we
will use is realized by winding counterclockwise a fluxon pair with state
|a, a−1⟩ around the first member of another fluxon pair with state |b, b−1⟩.
Since the |a, a−1⟩ pair has trivial total flux, the |b, b−1⟩ pair is unaffected
by this procedure. But since in effect the flux b travels counterclockwise
about both members of the pair whose initial state was |a, a−1⟩, this pair
is transformed as
|a, a−1⟩ → |bab−1, ba−1b−1⟩ .   (9.52)
We will refer to this operation as the conjugation gate acting on the fluxon
pair.
To summarize what has been said so far, our primitive and derived
capabilities allow us to: (1) Perform a projective flux measurement, (2)
perform a destructive measurement that determines whether or not the
flux and charge of a pair is trivial, and (3) execute a conjugation gate.
Now we must discuss how to simulate a quantum circuit using these ca-
pabilities.
The next step is to decide how to encode qubits using fluxons. Ap-
propriate encodings can be chosen in many ways; we will stick to one
particular choice that illustrates the key ideas — namely we will encode a
qubit by using a pair of fluxons, where the total flux of the pair is trivial.
We select two noncommuting elements a, b ∈ G, where b² = e, and choose
a computational basis for the qubit
The crucial point is that a single isolated fluxon with flux a looks iden-
tical to a fluxon with the conjugate flux bab−1 . Therefore, if the two
fluxons in a pair are kept far apart from one another, local interactions
with the environment will not cause a superposition of the states |0̄⟩ and
|1̄⟩ to decohere. The quantum information is protected from damage be-
cause it is stored nonlocally, by exploiting a topological degeneracy of the
states where the fluxon and antifluxon are pinned to fixed and distantly
separated positions.
However, in contrast with the topological degeneracy that arises in
systems with abelian anyons, this protected qubit can be measured rela-
tively easily, without resorting to delicate interferometric procedures that
extract Aharonov-Bohm phases. We have already described how to mea-
sure flux using previously calibrated fluxons; therefore we can perform
a projective measurement of the encoded Pauli operator Z̄ (a projection
onto the basis {|0̄⟩, |1̄⟩}). We can also measure the complementary Pauli
operator X̄, albeit destructively and imperfectly. The X̄ eigenstates are
|±⟩ = (1/√2) (|0̄⟩ ± |1̄⟩) ≡ (1/√2) (|a, a−1⟩ ± |bab−1, ba−1b−1⟩) ;   (9.54)
where α is the conjugacy class that contains a. On the other hand, the
state |+⟩ has a nonzero overlap with |0; α⟩
Therefore, if the two members of the fluxon pair are brought together,
complete annihilation is impossible if the state of the pair is |−⟩, and
occurs with probability Prob(0) = 2/|α| if the state is |+⟩.
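(For instance, putting in a number not spelled out here: if α is the two-cycle class of S3, which contains |α| = 3 elements, a pair prepared in |+⟩ annihilates completely with probability 2/3, while a pair in |−⟩ never does.)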
Note that it is also possible to prepare a fluxon pair in the state |+⟩.
One way to do that is to create a pair in the state |0; α⟩. If α contains
only the two elements a and bab−1 we are done. Otherwise, we compare
the newly created pair with calibrated pairs in each of the states |c, c−1⟩,
where c ∈ α and c is distinct from both a and bab−1. If the pair fails to
match any of these |c, c−1⟩ pairs, its state must be |+⟩.
To go further, we need to characterize the computational power of the
conjugation gate. Let us use a more compact notation, in which the
state |x, x−1⟩ of a fluxon pair is simply denoted |x⟩, and consider the
transformations of the state |x, y, z⟩ that can be built from conjugation
gates. By winding the third pair through the first, either counterclockwise
or clockwise, we can execute the gates
and by winding the third pair through the second, either counterclockwise
or clockwise, we can execute
furthermore, by borrowing a pair with flux |c⟩ from the bureau of stan-
dards, we can execute
where the function f (x, y) can be expressed in product form — that is,
as a finite product of group elements, where the elements appearing in
the product may be the inputs x and y, their inverses x−1 and y −1 , or
constant elements of G, each of which may appear in the product any
number of times.
What are the functions f (x, y) that can be expressed in this form?
The answer depends on the structure of the group G, but the following
characterization will suffice for our purposes. Recall that a subgroup H
of a finite group G is normal if for any h ∈ H and any g ∈ G, ghg −1 ∈ H,
and recall that a finite group G is said to be simple if G has no normal
subgroups other than G itself and the trivial group {e}. It turns out that
if G is a simple nonabelian finite group, then any function f (x, y) can be
expressed in product form. In the computer science literature, a closely
related result is often called Barrington’s theorem.
In particular, then, if the group G is a nonabelian simple group, there
is a function f realizable in product form such that
f (a, a) = f (a, bab−1) = f (bab−1 , a) = e , f (bab−1 , bab−1) = b . (9.61)
Thus for x, y, z ∈ {a, bab−1}, the action eq. (9.60) causes the flux of the
third pair to “flip” if and only if x = y = bab−1 ; we have constructed
from our elementary operations a Toffoli gate in the computational ba-
sis. Therefore, conjugation gates suffice for universal reversible classical
computation acting on the standard basis states.
The nonabelian simple group of minimal order is A5 , the group of even
permutations of five objects, with |A5 | = 60. Therefore, one concrete
realization of universal classical computation using conjugation gates is
obtained by choosing a to be the three-cycle element a = (345) ∈ A5 , and
b to be the product of two-cycles b = (12)(34) ∈ A5 , so that bab−1 = (435).
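To make the conjugation-gate arithmetic concrete, here is a small sketch (mine, not the text's) that represents group elements as permutations of {1, . . . , 5} and checks the quoted A5 elements:

# a = (345), b = (12)(34), as in the text; permutations act on {1,...,5}
def perm_from_cycles(cycles, n=5):
    p = list(range(n + 1))            # p[i] = image of i; index 0 unused
    for cyc in cycles:
        for i, x in enumerate(cyc):
            p[x] = cyc[(i + 1) % len(cyc)]
    return tuple(p)

def compose(p, q):                     # (p o q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = list(range(len(p)))
    for i in range(1, len(p)):
        inv[p[i]] = i
    return tuple(inv)

def conjugation_gate(x, y):
    # eq. (9.52): the pair carrying flux x is conjugated by the flux y, x -> y x y^-1
    return compose(compose(y, x), inverse(y))

a = perm_from_cycles([(3, 4, 5)])
b = perm_from_cycles([(1, 2), (3, 4)])
assert conjugation_gate(a, b) == perm_from_cycles([(4, 3, 5)])   # bab^-1 = (435)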
With this judicious choice of the group G, we achieve a topological real-
ization of universal classical computation, but how can we go still further,
to realize universal quantum computation? We have the ability to prepare
computational basis states, to measure in the computational basis, and
to execute Toffoli gates, but these tools are entirely classical. The only
nonclassical tricks at our disposal are the ability to prepare X̄ = 1 eigen-
states, and the ability to perform an imperfect destructive measurement
of X̄. Fortunately, these additional capabilities are sufficient.
In our previous discussions of quantum fault tolerance, we have noted
that if we can do the classical gates Toffoli and CNOT, it suffices for
universal quantum computation to be able to apply each of the Pauli op-
erators X, Y , and Z, and to be able to perform projective measurements
of each of X, Y , and Z. We already know how to apply the classical
gate X and to measure Z (that is, project onto the computational basis).
Projective measurement of X and Y , and execution of Z, are still missing
from our repertoire. (Of course, if we can apply X and Z, we can also
apply their product ZX = iY .)
CNOT : XI → XX , (9.62)
where the first qubit is the control and the second qubit is the target of
the CNOT. Therefore, CNOT gates, together with the ability to prepare
X = 1 eigenstates and to perform destructive measurements of X, suffice
to realize projective measurements of X. We can prepare an ancilla qubit
in the X = 1 eigenstate, perform a CNOT with the ancilla as control
and the data to be measured as target, and then measure the ancilla
destructively. The measurement prepares the data in an eigenstate of X,
whose eigenvalue matches the outcome of the measurement of the ancilla.
In our case, the destructive measurement is not fully reliable, but we
can repeat the measurement multiple times. Each time we prepare and
measure a fresh ancilla, and after a few repetitions, we have acceptable
statistical confidence in the inferred outcome of the measurement.
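A minimal numerical sketch of this indirect X measurement (my own illustration; ideal gates and a perfect destructive measurement of the ancilla are assumed):

import numpy as np

X = np.array([[0., 1.], [1., 0.]])
plus  = np.array([1., 1.]) / np.sqrt(2)     # X = +1 ancilla state
minus = np.array([1., -1.]) / np.sqrt(2)    # X = -1

# CNOT with the ancilla (first factor) as control and the data (second) as target
CNOT = np.array([[1., 0., 0., 0.],
                 [0., 1., 0., 0.],
                 [0., 0., 0., 1.],
                 [0., 0., 1., 0.]])

data = np.array([0.8, 0.6])                  # an arbitrary data qubit
state = CNOT @ np.kron(plus, data)           # couple the X = +1 ancilla to the data

for outcome, anc in ((+1, plus), (-1, minus)):
    bra = np.kron(anc, np.eye(2))            # <anc| (x) I, shape (2, 4)
    post = bra @ state                       # unnormalized data state given this outcome
    prob = post @ post
    post = post / np.sqrt(prob)
    # the data qubit is left in the X eigenstate matching the ancilla outcome
    assert np.allclose(X @ post, outcome * post)
    print(f"outcome X = {outcome:+d} with probability {prob:.2f}")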
Now that we can measure X projectively, we can prepare X = −1
eigenstates as well as X = 1 eigenstates (for example, we follow a Z mea-
surement with an X measurement until we eventually obtain the outcome
X = −1). Then, by performing a CNOT gate whose target is an X = −1
eigenstate, we can realize the Pauli operator Z acting on the control qubit.
It only remains to show that a measurement of Y can be realized.
Measurement of Y seems problematic at first, since our physical capa-
bilities have not provided any means to distinguish between Y = 1 and
Y = −1 eigenstates (that is, between a state ψ and its complex conjugate
ψ ∗ ). However, this ambiguity actually poses no serious difficulty, because
it makes no difference how the ambiguity is resolved. Were we to replace
measurement of Y by measurement of −Y in our simulation of a unitary
transformation U , the effect would be that U ∗ is simulated instead; this
replacement would not alter the probability distributions of outcomes for
measurements in the standard computational basis.
To be explicit, we can formulate a protocol for measuring Y by noting
first that applying a Toffoli gate whose target qubit is an X = −1 eigen-
state realizes the controlled-phase gate Λ(Z) acting on the two control
qubits. By composing this gate with the CNOT gate Λ(X), we obtain
the gate Λ(iY ) acting as
where the first qubit is the control and the second is the target. Now
suppose that my trusted friend gives me just one qubit that he assures
me has been prepared in the state |Y = 1⟩. I know how to prepare
|X = 1⟩ states myself and I can execute Λ(iY ) gates; therefore since a
Λ(iY ) gate with |Y = 1⟩ as its target transforms |X = 1⟩ to |Y = 1⟩, I
can make many copies of the |Y = 1⟩ state I obtained from my friend.
When I wish to measure Y , I apply the inverse of Λ(iY ), whose target is
the qubit to be measured, and whose control is one of my Y = 1 states;
then I perform an X measurement of the ancilla to read out the result of
the Y measurement of the other qubit.
What if my friend lies to me, and gives me a copy of the state |Y = −1⟩
instead? Then I'll make many copies of the |Y = −1⟩ state, and I will
be measuring −Y when I think I am measuring Y . My simulation will
work just the same as before; I'll actually be simulating the complex
conjugate of the ideal circuit, but that won't change the final outcome of
the quantum computation. If my friend flipped a coin to decide whether
to give me the |Y = 1⟩ state or the |Y = −1⟩ state, this too would have no
effect on the fidelity of my simulation. Therefore, it turns out I don't
need my friend's help at all — instead of using the |Y = 1⟩ state I would
have received from him, I may use the random state ρ = I/2 (an equally
weighted mixture of |Y = 1⟩ and |Y = −1⟩, which I know how to prepare
myself).
This completes the demonstration that we can simulate a quantum cir-
cuit efficiently and fault tolerantly using the fluxons and chargeons of
a nonabelian superconductor, at least in the case where G is a simple
nonabelian finite group.§ Viewed as a whole, including all state prepara-
tion and calibration of fluxes, the simulation can be described this way:
Many pairs of anyons (fluxons and chargeons) are prepared, the anyon
world lines follow a particular braid, and pairs of anyons are fused to see
whether they will annihilate. The simulation is nondeterministic in the
sense that the actual braid executed by the anyons depends on the out-
comes of measurements performed (via fusion) during the course of the
simulation. It is robust if the temperature is low compared to the energy
gap, and if particles are kept sufficiently far apart from one another (ex-
cept when pairs are being created and fused), to suppress the exchange
of virtual anyons. Small deformations in the world lines of the particles
have no effect on the outcome of the computation, as long as the braiding
of the particles is in the correct topological class.
§ Mochon has shown that universal quantum computation is possible for a larger class
of groups.
1. A list of particle types. The types are labels that specify the possible
values of the conserved charge that a particle can carry.
2. Rules for fusing and splitting, which specify the possible values of the
charge that can be obtained when two particles of known charge
are combined together, and the possible ways in which the charge
carried by a single particle can be split into two parts.
3. Rules for braiding, which specify what happens when two particles are
exchanged (or when one particle is rotated by 2π).
9.12.1 Labels
I will use Latin letters {a, b, c, . . .} for the labels that distinguish different
types of particles. (For the case of the nonabelian superconductor, the
label was (α, R(α)), specifying a conjugacy class and an irreducible rep-
resentation of the normalizer of the class, but now our notation will be
more compact). We will assume that the set of possible labels is finite.
The symbol a represents the value of the conserved charge carried by the
particle. When particles with charges a and b are combined, the possible
charges of the composite are encoded in the fusion rules

a × b = Σ_c N_{ab}^c c ,

where each N_{ab}^c is a nonnegative integer and the sum is over the complete
set of labels. Note that a, b and c are labels, not vector spaces; the
product on the left-hand side is not a tensor product and the sum on
the right-hand side is not a direct sum. Rather, the fusion rules can be
regarded as an abstract relation on the label set that maps the ordered
triple (a, b; c) to N_{ab}^c. This relation is symmetric in a and b (a × b = b × a)
— the possible charges of the composite do not depend on whether a is on
the left or the right. Read backwards, the fusion rules specify the possible
ways for the charge c to split into two parts with charges a and b.
If N_{ab}^c = 0, then charge c cannot be obtained when we combine a and
b. If N_{ab}^c = 1, then c can be obtained — in a unique way. If N_{ab}^c > 1, then
c can be obtained in N_{ab}^c distinguishable ways. The notion that
fusing two charges can yield a third charge in more than one possible way
should be familiar from group representation theory. For example, the
rule governing the fusion of two octet representations of SU(3) is
8 × 8 = 1 + 8 + 8 + 10 + \overline{10} + 27 ,     (9.66)

so that N_{88}^8 = 2. We emphasize again, however, that while the fusion
rules for group representations can be interpreted as a decomposition of a
tensor product of vector spaces as a direct sum of vector spaces, in general
the fusion rules in an anyon model have no such interpretation.
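To make this concrete, here is a minimal sketch in Python using the two-label rule 1 × 1 = 0 + 1 (the Fibonacci rule that appears later in this chapter); it checks the symmetry a × b = b × a, and also the associativity of the total multiplicities, Σ_e N_{ab}^e N_{ec}^d = Σ_f N_{bc}^f N_{af}^d, a consistency property that any fusion rule must satisfy:

import itertools

labels = (0, 1)                                 # 0 is the trivial charge
N = {(a, b, c): 0 for a in labels for b in labels for c in labels}
for abc in [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1)]:
    N[abc] = 1                                  # the nonzero multiplicities for 1 x 1 = 0 + 1

# symmetry of fusion: a x b = b x a
assert all(N[a, b, c] == N[b, a, c] for (a, b, c) in N)

# associativity of the total fusion multiplicities
for a, b, c, d in itertools.product(labels, repeat=4):
    assert (sum(N[a, b, e] * N[e, c, d] for e in labels)
            == sum(N[b, c, f] * N[a, f, d] for f in labels))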
The N_{ab}^c distinguishable ways that c can arise by fusing a and b can
be regarded as the orthonormal basis elements of a vector space, the fusion
space V_{ab}^c ; we denote this basis

{ |ab; c, µ⟩ ,  µ = 1, 2, . . . , N_{ab}^c } .     (9.67)
It is quite convenient to introduce a graphical notation for the fusion basis
states:
[Diagrams: the fusion state |ab; c, µ⟩ is drawn as a trivalent vertex with incoming
lines labeled a and b and an outgoing line labeled c; the dual ⟨ab; c, µ| is the
reflected vertex. Further diagrams express the orthonormality and completeness of
the basis {|ab; c, µ⟩}.]
There are some natural isomorphisms among fusion spaces. First of all,
V_{ab}^c ≅ V_{ba}^c ; these vector spaces are associated with different labelings of
the two particles (if a ≠ b) and so should be regarded as distinct, but they
are isomorphic spaces because fusion is symmetric. We may also “raise
and lower indices” of a fusion space by replacing a label by its conjugate,
e.g.,

V_{ab}^c ≅ V_{ac̄}^{b̄} ≅ V_{abc̄}^{1} ≅ V_{a}^{b̄c} ≅ V_{c̄}^{āb̄} ≅ · · · ;     (9.70)
in the diagrammatic notation, we have the freedom to reverse the sense
of a line while conjugating the line’s label. The space V_{abc̄}^1 , represented
as a diagram with three incoming lines, is the space spanned by the dis-
tinguishable ways to obtain the trivial total charge 1 when fusing three
particles with labels a, b, c̄.
The charge 1 deserves its name because it fuses trivially with other
particles:
a × 1 = a .     (9.71)

Because of the isomorphism V_{a1}^a ≅ V_{aā}^1 , we conclude that ā is the unique
label that can fuse with a to yield 1, and that this fusion can occur in
only one way. Similarly, V_{a1}^a ≅ V_1^{aā} means that pairs of particles created
out of the vacuum have conjugate charges.
An anyon model is nonabelian if
dim V_{ab}^c = N_{ab}^c ≥ 2     (9.72)

for at least some pair of labels ab; otherwise the model is abelian. In an
abelian model, any two particles fuse in a unique way, but in a nonabelian
model, there are some pairs of particles that can fuse in more than one
way, and there is a Hilbert space of two or more dimensions spanned by
these distinguishable states. We will refer to this space as the “topological
Hilbert space.” When a pair of particles is exchanged counterclockwise, the
braiding operator R maps the fusion space V_{ba}^c to the fusion space V_{ab}^c .
If we choose canonical bases {|ba; c, µ⟩} and {|ab; c, µ⟩} for these two
spaces, R can be expressed as the unitary matrix
R : |ba; c, µ⟩ → Σ_{µ'} |ab; c, µ'⟩ (R_{ab}^c)_{µ'µ} ;     (9.74)

note that R may have a nontrivial action on the fusion states. When
we represent the action of R diagrammatically, it is convenient to fix the
positions of the labels a and b on the incoming lines, and twist the lines
counterclockwise as they move toward the fusion vertex — the graph
with twisted lines represents the state in V_{ab}^c obtained by applying R to
|ba; c, µ⟩, which can be expanded in terms of the canonical basis for V_{ab}^c :
[Diagram: the R-move. The vertex |ba; c, µ⟩ with its incoming lines twisted
counterclockwise equals Σ_{µ'} (R_{ab}^c)_{µ'µ} times the untwisted vertex |ab; c, µ'⟩.]
Correspondingly, there are two natural orthonormal bases for V_{abc}^d (the
fusion space of three particles a, b, c with total charge d), which we may
denote
|(ab)c → d; eµν⟩ ≡ |ab; e, µ⟩ ⊗ |ec; d, ν⟩ ,
|a(bc) → d; e'µ'ν'⟩ ≡ |ae'; d, ν'⟩ ⊗ |bc; e', µ'⟩ ,     (9.80)

and which are related by a unitary transformation F :

|(ab)c → d; eµν⟩ = Σ_{e'µ'ν'} |a(bc) → d; e'µ'ν'⟩ (F_{abc}^d)_{e'µ'ν', eµν} .     (9.81)
[Diagram: the F-move, relating the two fusion orderings of three particles a, b, c
with total charge d.]
The unitary matrices F_{abc}^d are sometimes called fusion matrices; however,
rather than risk causing confusion between F and the fusion rules N_{ab}^c ,
I will just call F the F-matrix.
Note that the space V_{a_1 a_2 ··· a_n}^c of n anyons with total charge c does not
have a natural decomposition as a tensor product of subsystems associated
with the localized particles; rather, we
have expressed it as a direct sum of many tensor products. For nonabelian
anyons, its dimension
dim V_{a_1 a_2 a_3 ··· a_n}^c ≡ N_{a_1 a_2 a_3 ··· a_n}^c = Σ_{b_1, b_2, b_3, ..., b_{n-2}} N_{a_1 a_2}^{b_1} N_{b_1 a_3}^{b_2} N_{b_2 a_4}^{b_3} · · · N_{b_{n-2} a_n}^{c}     (9.83)
A standard basis for this space is

{ |a_1 a_2; b_1, µ_1⟩ ⊗ |b_1 a_3; b_2, µ_2⟩ ⊗ · · · ⊗ |b_{n-3} a_{n-1}; b_{n-2}, µ_{n-2}⟩ ⊗ |b_{n-2} a_n; c, µ_{n-1}⟩ } ,     (9.84)
or in diagrammatic notation:
[Diagram: the standard fusion tree, with external lines a_1, a_2, ..., a_n, internal
lines b_1, b_2, ..., b_{n-2}, vertex labels µ_1, ..., µ_{n-1}, and total charge c.]
[Diagram: braiding two neighboring particles in the standard basis is accomplished
by an F-move, followed by an R-move, followed by the inverse F-move.]
To reduce the number of subscripts, we will call this space V_{acb}^d , which is
transformed by the exchange as

B : V_{acb}^d → V_{abc}^d .     (9.86)

Let us express the action of B in terms of the standard bases for the two
spaces V_{acb}^d and V_{abc}^d .
[Diagram: the B-move, expanding the braided fusion tree in V_{acb}^d in terms of the
standard basis of V_{abc}^d .]
To avoid cluttering the equations, I suppress the labels for the fusion
space basis elements (it is obvious where they should go). Hence we write
B|(ac)b → d; e⟩ = Σ_f B|a(cb) → d; f⟩ (F_{acb}^d)_{fe}
              = Σ_f |a(bc) → d; f⟩ R_f^{bc} (F_{acb}^d)_{fe}
              = Σ_{f,g} |(ab)c → d; g⟩ (F_{abc}^d)^{-1}_{gf} R_f^{bc} (F_{acb}^d)_{fe} ,     (9.87)
or

B : |(ac)b → d; e⟩ → Σ_g |(ab)c → d; g⟩ (B_{abc}^d)_{ge} ,     (9.88)

where

(B_{abc}^d)_{ge} = Σ_f (F_{abc}^d)^{-1}_{gf} R_f^{bc} (F_{acb}^d)_{fe} .     (9.89)
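Equation (9.89) is straightforward to mechanize. The sketch below (Python with NumPy; the function and its arguments are hypothetical placeholders, not notation from the text) assembles the braid matrix from a pair of F-matrices and the braiding eigenvalues:

import numpy as np

def b_matrix(F_abc, R_bc, F_acb):
    """Assemble (B_abc^d)_{ge} = sum_f (F_abc^d)^{-1}_{gf} R_f^{bc} (F_acb^d)_{fe}, as in eq. (9.89).

    F_abc, F_acb : unitary F-matrices, indexed by the intermediate charge,
    R_bc         : the braiding eigenvalues R_f^{bc}, one for each intermediate charge f.
    """
    return np.linalg.inv(F_abc) @ np.diag(R_bc) @ F_acb

Because the F-matrices are unitary and the R's are phases, the B-matrix obtained this way is automatically unitary.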
9.13 Simulating anyons with a quantum circuit

Each qudit used in the simulation carries the Hilbert space
H_d = ⊕_{a,b,c} V_{abc}^1 .     (9.92)
Here, a, b, c are summed over the complete label set of the model (which
we have assumed is finite), so that Hd contains all the possible fusion
states of three particles, and the dimension d of Hd is
d = Σ_{a,b,c} N_{abc}^1 .     (9.93)
[Diagrams: the action of the braid matrix B_{aeb}^d and of the F-matrix F_{abe}^d on
the fusion-tree basis of the simulated qudits.]
we have separated the sum over g into the component for which (be) fuses
to 1, plus the remainder. After the F -move (which is just a particular
two-qudit unitary gate), we can sample the probability that (be) fuses to
1 by performing a projective measurement of the second qudit in the basis
{|b, ḡ, e⟩}, and recording whether g = 1.
This completes our demonstration that a quantum circuit can simulate
efficiently a topological quantum computer.
In the Fibonacci model there are just two labels, 0 (the trivial charge) and 1,
and the only nontrivial fusion rule is 1 × 1 = 0 + 1: when two anyons are brought
together they either annihilate, or fuse to become a single anyon. The model is
nonabelian because two anyons can fuse in two distinguishable ways.
Consider the standard basis for the Hilbert space V_{11···1}^b of n anyons, where
each basis element describes a distinguishable way in which the n anyons
could fuse to give total charge b ∈ {0, 1}. If the two anyons furthest to
the left were fused first, the resulting charge could be 0 or 1; this charge
could then fuse with the third anyon, yielding a total charge of 0 or 1,
and so on. Finally, the last anyon fuses with the total charge of the first
n − 1 anyons to give the total charge b. Altogether n − 2 intermediate
charges b1 , b2, b3 , . . . bn−2 appear in this description of the fusion process;
thus the corresponding basis element can be designated with a binary
string of length n − 2. If the total charge is 0, the result of fusing the
first n − 1 anyons must be 1; counting the allowed strings, one finds that
the number N_n^0 of basis elements with total charge 0 obeys the Fibonacci
recursion

N_n^0 = N_{n-1}^0 + N_{n-2}^0 .     (9.97)
n     =  1  2  3  4  5  6  7   8   9  ...
N_n^0 =  0  1  1  2  3  5  8  13  21  ...     (9.98)
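The recursion and the table are easy to reproduce by propagating the intermediate charge from left to right (a short Python sketch):

counts = {0: 0, 1: 1}                        # a single anyon carries charge 1
table = [counts[0]]
for _ in range(8):                           # add one anyon at a time, up to n = 9
    counts = {0: counts[1],                  # total charge 0 requires the previous charge to be 1
              1: counts[0] + counts[1]}      # total charge 1 can follow either previous charge
    table.append(counts[0])
print(table)                                 # [0, 1, 1, 2, 3, 5, 8, 13, 21], as in (9.98)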
[Diagram: world lines of aā pairs that are created and then annihilated, either as
two separate closed loops or with partners swapped into a single loop; the single
loop carries a relative factor of 1/d_a.]
If two pairs are created and then each pair annihilates immediately, the
world lines of the pairs form two closed loops, and |R| counts the number
of distinct “colors” that propagate around each loop. But if the particle
from each pair annihilates the antiparticle from the other pair, there is
only one closed loop and therefore one sum over colors; if we normalize
the process on the left to unity, the amplitude for the process on the right
is suppressed by a factor of 1/|R|. To say the same thing in an equation,
the normalized state of an RR̄ pair is
|RR̄⟩ = (1/√|R|) Σ_i |i⟩ ⊗ |ī⟩ ,     (9.99)

where {|i⟩} denotes an orthonormal basis for R and {|ī⟩} is a basis for R̄.
Suppose that two pairs |RR̄⟩ and |R'R̄'⟩ are created; if the pairs are fused
after swapping partners, the amplitude for annihilation is

⟨RR̄, R'R̄' | RR̄', R'R̄⟩ = (1/|R|²) Σ_{i,i',j,j'} ⟨jj̄, j'j̄' | iī', i'ī⟩
                        = (1/|R|²) Σ_{i,i',j,j'} δ_{ji} δ_{ji'} δ_{j'i'} δ_{j'i}
                        = (1/|R|²) Σ_i δ_{ii} = 1/|R| .     (9.100)
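The same 1/|R| suppression can be checked numerically; in this Python/NumPy sketch |R| = 3, and for simplicity the conjugate basis {|ī⟩} is identified with {|i⟩}:

import numpy as np

d = 3                                            # |R|
pair = np.eye(d).reshape(d * d) / np.sqrt(d)     # (1/sqrt|R|) sum_i |i>|i>
psi = np.kron(pair, pair).reshape(d, d, d, d)    # particles 1,2,3,4 paired as (1,2) and (3,4)
swapped = np.einsum('il,kj->ijkl', np.eye(d), np.eye(d)) / d   # paired as (1,4) and (3,2)
print(np.einsum('ijkl,ijkl->', psi, swapped), 1 / d)           # both 1/3, as in eq. (9.100)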
fusing the particle from each pair with the antiparticle from the neighbor-
ing pair. However, each zigzag reduces the amplitude by another factor
of 1/d_a . We can compensate for these factors of 1/d_a if we weight each
pair creation or annihilation event by a factor of √d_a . With this new
convention, we can bend the world line of a particle forward or backward
in time without paying any penalty:
[Diagram: an a world line with a zigzag; with a factor of √d_a attached to each
creation and annihilation event, the zigzag equals the straight world line.]
[Diagram: a closed a loop beside a closed b loop, with value d_a d_b, expanded via
the completeness of the fusion basis as a sum over the channels c (and vertices µ)
in which a and b fuse and then split again; each closed c loop contributes d_c.]
Evaluating both sides of the diagrammatic identity gives

d_a d_b = Σ_c N_{ab}^c d_c ,

which says that the vector d⃗ of quantum dimensions is an eigenvector, with
eigenvalue d_a, of the matrix N_a whose matrix elements are (N_a)_b^c = N_{ab}^c .
Furthermore, since N_a has nonnegative entries and all components of d⃗ are
positive, d_a is the largest eigenvalue of N_a and is nondegenerate. (This
simple observation is sometimes called the Perron-Frobenius theorem.)
For n anyons, each with label a, the topological Hilbert space V_{aaa···a}^b for
the sector with total charge b has dimension

N_{aaa···a}^b = Σ_{b_1, b_2, ..., b_{n-2}} N_{aa}^{b_1} N_{ab_1}^{b_2} N_{ab_2}^{b_3} · · · N_{ab_{n-2}}^{b} = ⟨b| (N_a)^{n-1} |a⟩ .     (9.103)
Since d_a is the largest eigenvalue of N_a, for the purpose of estimating this
dimension at large n we may write

N_a = |v⟩ d_a ⟨v| + · · · ,     (9.104)

where

|v⟩ = (1/D) Σ_c d_c |c⟩ ,     D ≡ ( Σ_c d_c² )^{1/2} ;     (9.105)

hence

N_{aaa···a}^b = ⟨b| (N_a)^{n-1} |a⟩ ≈ d_a^n d_b / D² + · · · ,     (9.106)

where the ellipsis represents terms that are exponentially suppressed for
large n. We see that the quantum dimension d_a controls the rate of growth
of the n-particle Hilbert space for anyons of type a.
Because the label 0 with trivial charge fuses trivially, we have d0 = 1. In
the case of the Fibonacci model, it follows from the fusion rule 1×1 = 0+1
that d_1² = 1 + d_1 , which is solved by d_1 = φ as we found earlier; therefore
D² = d_0² + d_1² = 1 + φ² = 2 + φ. Our formula becomes

N_{111···1}^0 ≈ φ^n / (2 + φ) ,     (9.107)
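These statements are easy to verify numerically for the Fibonacci fusion rule (a Python/NumPy sketch):

import numpy as np

N1 = np.array([[0, 1],
               [1, 1]])                            # (N_1)_b^c = N_{1b}^c, labels ordered (0, 1)
phi = (1 + np.sqrt(5)) / 2
d1 = np.linalg.eigvalsh(N1).max()                  # the Perron-Frobenius eigenvalue
print(d1, phi)                                     # both 1.6180...

D2 = 1 + d1 ** 2                                   # D^2 = d_0^2 + d_1^2 = 2 + phi
n = 20
exact = np.linalg.matrix_power(N1, n - 1)[0, 1]    # <0| (N_1)^(n-1) |1> = N^0_{11...1}
print(exact, phi ** n / D2)                        # 4181 versus 4181.0000...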
[Diagram: the normalized amplitude for anyons a and b to fuse into the channel c,
with the loops weighted by the quantum dimensions.] Reading off the diagram, the
probability that a and b fuse to total charge c is

p(ab → c) = N_{ab}^c d_c / (d_a d_b) .

If anyons are produced in a random process with a stable probability distribution
{p_a} over the labels, that distribution must reproduce itself under pairwise
fusion, Σ_{a,b} p_a p_b p(ab → c) = p_c .
Using

Σ_a N_{ab}^c d_a = Σ_a N_{bc̄}^{ā} d_ā = d_b d_c̄ = d_b d_c ,     (9.111)

we can easily verify that this condition is satisfied by

p_a = d_a² / D² .     (9.112)
We conclude that if anyons are created in a random process, those carrying
labels with larger quantum dimension are more likely to be produced, in
keeping with the property that anyons with larger dimension have more
quantum states.
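For the Fibonacci model this conclusion is easy to check directly (a Python sketch, using the fusion probability p(ab → c) = N_{ab}^c d_c /(d_a d_b) and the stability condition quoted above, both of which are assumptions of this sketch):

phi = (1 + 5 ** 0.5) / 2
d = {0: 1.0, 1: phi}                                 # quantum dimensions
D2 = sum(x * x for x in d.values())                  # D^2 = 2 + phi
N = {(a, b, c): 0 for a in d for b in d for c in d}
for abc in [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1)]:
    N[abc] = 1

p = {a: d[a] ** 2 / D2 for a in d}                   # p_a = d_a^2 / D^2, eq. (9.112)
for c in d:
    reproduced = sum(p[a] * p[b] * N[a, b, c] * d[c] / (d[a] * d[b])
                     for a in d for b in d)
    assert abs(reproduced - p[c]) < 1e-12            # the distribution reproduces itself under fusion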
9.16 Pentagon and hexagon equations
[Pentagon diagram: five bases for the fusion of particles 1, 2, 3, 4 into total
charge 5, connected by two F-moves across the top of the pentagon and three
F-moves across the bottom.]
The basis shown furthest to the left in this pentagon diagram is the “left
standard basis” {|left; a, b⟩}, in which particles 1 and 2 are fused first,
the resulting charge a is fused with particle 3 to yield charge b, and then
finally b is fused with particle 4 to yield the total charge 5. The basis
shown furthest to the right is the “right standard basis” {|right; c, d⟩}, in
which the particles are fused from right to left instead of left to right.
Across the top of the pentagon, these two bases are related by two F -moves,
and we find

|left; a, b⟩ = Σ_{c,d} |right; c, d⟩ (F_{12c}^5)_{da} (F_{a34}^5)_{cb} .     (9.113)

Across the bottom of the pentagon, the bases are related by three F -moves,
and we find
|left; a, b⟩ = Σ_{c,d,e} |right; c, d⟩ (F_{234}^d)_{ce} (F_{1e4}^5)_{db} (F_{123}^b)_{ea} .     (9.114)
Equating our two expressions for |left; a, b⟩, we obtain the pentagon equation:

(F_{12c}^5)_{da} (F_{a34}^5)_{cb} = Σ_e (F_{234}^d)_{ce} (F_{1e4}^5)_{db} (F_{123}^b)_{ea} .     (9.115)
Another nontrivial consistency condition is found by considering the
various ways that three particles can fuse:
[Hexagon diagram: six bases for the fusion of particles 1, 2, 3 into total charge 4,
connected by the sequence of moves F, R, F across the top of the hexagon and
R, F, R across the bottom.]
The basis {|left; a⟩} furthest to the left in this hexagon diagram is obtained
if the particles are arranged in the order 123, and particles 1 and 2 are
fused first, while the basis {|right; c⟩} furthest to the right is obtained if
the particles are arranged in order 231, and particles 1 and 3 are fused
first. Across the top of the hexagon, the two bases are related by the
sequence of moves F RF :
|left; a⟩ = Σ_{b,c} |right; c⟩ (F_{231}^4)_{cb} R_4^{1b} (F_{123}^4)_{ba} .     (9.116)
Across the bottom of the hexagon, the bases are related by the sequence
of moves RFR, and we find

|left; a⟩ = Σ_c |right; c⟩ R_c^{13} (F_{213}^4)_{ca} R_a^{12} .     (9.117)
Equating our two expressions for |left; a⟩, we obtain the hexagon equation:

R_c^{13} (F_{213}^4)_{ca} R_a^{12} = Σ_b (F_{231}^4)_{cb} R_4^{1b} (F_{123}^4)_{ba} .     (9.118)
A beautiful theorem, which I will not prove here, says that there are
no further conditions that must be imposed to ensure the consistency of
braiding and fusing. That is, for any choice of an initial and final basis
for n anyons, all sequences of R-moves and F -moves that take the initial
basis to the final basis yield the same isomorphism, provided that the
pentagon equation and hexagon equation are satisfied. This theorem is
an instance of the MacLane coherence theorem, a fundamental result in
category theory. The pentagon and hexagon equations together are called
the Moore-Seiberg polynomial equations — their relevance to physics was
first appreciated in studies of (1+1)-dimensional conformal field theory
during the 1980’s.
A solution to the polynomial equations defines a viable anyon model.
Therefore, there is a systematic procedure for constructing anyon models:
τ² + τ = 1 .     (9.123)
The only other solution is the complex conjugate of this one; this second
solution really describes the same model, but with clockwise and coun-
terclockwise braiding interchanged. Therefore, an anyon model with the
Fibonacci fusion rule really does exist, and it is essentially unique.
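In a commonly used gauge (an assumption here, since the derivation is not reproduced above), the only nontrivial F-matrix of the Fibonacci model is

F = [[ τ, √τ ], [ √τ, −τ ]] ,    with τ² + τ = 1 ,

while every F-move involving the trivial label equals 1. The following Python sketch checks that this assignment satisfies the pentagon equation (9.115) for every admissible labeling:

import itertools
import math

labels = (0, 1)                                  # 0 = trivial charge, 1 = Fibonacci anyon

def N(a, b, c):
    """Fusion multiplicity N_ab^c for the rule 1 x 1 = 0 + 1."""
    return 1 if (a, b, c) in {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1)} else 0

tau = (math.sqrt(5.0) - 1.0) / 2.0               # the positive solution of tau^2 + tau = 1
F_nontrivial = [[tau, math.sqrt(tau)],
                [math.sqrt(tau), -tau]]          # (F^1_{111})_{ef}, with e, f in {0, 1}

def F(a, b, c, d, e, f):
    """F-symbol (F^d_{abc})_{ef}; zero unless both fusion trees are admissible."""
    if not (N(a, b, e) and N(e, c, d) and N(b, c, f) and N(a, f, d)):
        return 0.0
    if 0 in (a, b, c) or d == 0:
        return 1.0                               # F-moves involving the trivial charge are trivial
    return F_nontrivial[e][f]                    # the single nontrivial block, a = b = c = d = 1

# pentagon equation (9.115):
# (F^5_{12c})_{da} (F^5_{a34})_{cb} = sum_e (F^d_{234})_{ce} (F^5_{1e4})_{db} (F^b_{123})_{ea}
for p1, p2, p3, p4, p5, a, b, c, d in itertools.product(labels, repeat=9):
    lhs = F(p1, p2, c, p5, d, a) * F(a, p3, p4, p5, c, b)
    rhs = sum(F(p2, p3, p4, d, c, e) * F(p1, e, p4, p5, d, b) * F(p1, p2, p3, b, e, a)
              for e in labels)
    assert abs(lhs - rhs) < 1e-12
print("pentagon equation satisfied for the Fibonacci F-matrix")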
9.18 Epilogue
That is as far as I got in class. I will mention briefly here a few other
topics that I might have covered if I had not run out of time.
9.18.2 S-matrix
The modular S-matrix of an anyon model can be defined in terms of two
anyon world lines that form a Hopf link:
[Diagram: S_ab equals 1/D times the amplitude for two closed anyon world lines,
labeled a and b, that form a Hopf link.]
Here D is the total quantum dimension of the model, and we have used
the normalization where unlinked loops would have the value d_a d_b ; then
the matrix Sab is symmetric and unitary. In abelian anyon models, the
Hopf link arose in our discussion of topological degeneracy, where we
characterized how the vacuum state of an anyon model on the torus is
affected when an anyon is transported around one of the cycles of the
torus. The S-matrix has a similar interpretation in the nonabelian case.
By elementary reasoning, S can be related to the fusion rules:
(N_a)_b^c = Σ_d S_{bd} (S_{ad} / S_{1d}) (S^{-1})_{dc} ;     (9.133)
that is, the S-matrix simultaneously diagonalizes all the matrices {N_a}
(the Verlinde relation). Note that it follows from the definition that
S_{1a} = d_a /D.
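For the Fibonacci model the S-matrix takes the explicit form S = (1/D) [[1, φ], [φ, −1]] in a standard convention (this particular matrix is an assumption of the sketch, not something derived above), and the Verlinde relation can then be checked in a few lines of Python/NumPy:

import numpy as np

phi = (1 + np.sqrt(5)) / 2
D = np.sqrt(2 + phi)
S = np.array([[1, phi],
              [phi, -1]]) / D                        # Fibonacci S-matrix (standard convention)

assert np.allclose(S @ S.T, np.eye(2))               # symmetric and unitary (here real orthogonal)
assert np.allclose(S[0], np.array([1, phi]) / D)     # S_{1a} = d_a / D

N1 = np.array([[0, 1],
               [1, 1]])                              # fusion matrix of the nontrivial label
Lam = np.diag(S[1] / S[0])                           # diag(S_{ad}/S_{1d}) for a = 1
assert np.allclose(N1, S @ Lam @ np.linalg.inv(S))   # eq. (9.133), the Verlinde relation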
The chiral central charge c_− of the edge theory satisfies

e^{2πi c_− /8} = (1/D) Σ_a d_a² e^{2πiJ_a} ,

where the sum is over the complete label set of the anyon model, and
e^{2πiJ_a} = R_1^{aā} is the topological spin of the label a. This expression re-
lates the quantity c− , characteristic of the edge theory, to the quantum
dimensions and topological spins of the bulk theory, but determines c−
only modulo 8. Therefore, at least in principle, there can be multiple edge
theories corresponding to a single theory of anyons in the bulk.
9.19 Bibliographical notes

was discussed in [4, 5]. My discussion of the universal gate set is based
on [6], where more general models are also discussed. Other schemes,
which make more extensive use of electric charges and are universal
for smaller groups (like S3), are described in [7].
Diagrammatic methods, like those I used in the discussion of the quan-
tum dimension, are extensively applied to derive properties of anyons in
[8]. The role of the polynomial equations (pentagon and hexagon equa-
tions) in (1+1)-dimensional conformal field theory is discussed in [9].
Simulation of anyons using a quantum circuit is discussed in [10]. Simu-
lation of a universal quantum computer using the anyons of the SU(2)k=3
Chern-Simons theory is discussed in [11]. That the Yang-Lee model is
also universal was pointed out in [12].
I did not discuss physical implementations in my lectures, but I list a
few relevant references here anyway: Ideas about realizing abelian and
nonabelian anyons using superconducting Josephson-junction arrays are
discussed in [13]. A spin model with nearest-neighbor interactions that
has nonabelian anyons (though not ones that are computationally univer-
sal) is proposed and solved in [14], and a proposal for realizing this model
using cold atoms trapped in an optical lattice is described in [15]. Some
ideas about realizing the (computationally universal) SU(2)k=3 model in
a system of interacting electrons are discussed in [16].
Much of my understanding of the theory of computing with nonabelian
anyons was derived from many helpful discussions with Alexei Kitaev.
References