Course Notes
Stephen A. Fenner∗
Computer Science and Engineering Department
University of South Carolina
September 7, 2021
Abstract
These notes are mainly for me to lecture with, but you may find them useful to see what was
covered when. All exercises are due one week from when they are assigned. These notes are
subject to change during the semester. The date shown above is the date of the latest version.
∗ Columbia, SC 29208 USA. E-mail: [email protected]. This material is based upon work supported by the National
Science Foundation under Grant Nos. CCF-0515269 and CCF-0915948. Any opinions, findings and conclusions or
recommendations expressed in this material are those of the author and do not necessarily reflect the views of the
National Science Foundation (NSF).
Contents

1 Week 1: Overview  8
  Brief, vague history of quantum mechanics, informatics, and the combination of the two.  8
  Implementations of Quantum Computers (the Bad News).  9
  Implementations of Quantum Cryptography (the Good News).  9
2 Week 1: Preliminaries  9
  Just Enough Linear Algebra to Understand Just Enough Quantum Mechanics.  9
  The Complex Numbers.  10
  The Exponential Map.  10
  Vector Spaces.  11
  Matrices.  12
  Adding and Multiplying Matrices.  12
  The Identity Matrix.  13
  Nonsingular Matrices.  13
  Determinant.  13
  Trace.  14
  Hilbert Spaces.  14
  Example.  15
  Orthogonality and Normality.  15
3 Week 2: Preliminaries  17
  Linear Transformations and Matrices.  17
  Adjoints.  18
  Polarization Identities.  19
  Gram-Schmidt Orthonormalization.  19
  Hermitean and Unitary Operators.  20
  L(H) is a Hilbert space.  21
4 Week 2: Preliminaries  22
  Dirac Notation.  22
  Change of (Orthonormal) Basis.  23
  Unitary Conjugation.  23
  Back to Quantum Physics: The Double Slit Experiment.  24
5 Week 3: Unitary conjugation  25
  Invariance under Unitary Conjugation: Trace and Determinant.  25
  Orthogonal Subspaces, Projection Operators.  25
  Fundamentals of Quantum Mechanics.  29
  Physical Systems and States.  29
  Time Evolution of an Isolated System.  29
  Projective Measurement.  30
7 Week 4: Qubits  34
  Qubits.  34
  Back to Electron Spin.  34
  Quantum Circuits.  67
  A Hilbert Space Is a Metric Space.  114
26 Week 13: Quantum error correction  164
  Quantum Error Correction.  164
  The Quantum Bit-Flip Channel.  165
  The Quantum Phase-Flip Channel.  169
  The Shor Code.  171
B.6 Relative Entropy  217
B.7 A Standard Tail Inequality  218
1 Week 1: Overview
Brief, vague history of quantum mechanics, informatics, and the combination of the two.
Quantum Theory The foundations of quantum mechanics were established “by committee”:
Niels Bohr, Albert Einstein, Werner Heisenberg, Erwin Schrödinger, Max Planck, Louis
de Broglie, Max Born, John von Neumann, Paul A.M. Dirac, Wolfgang Pauli, and others
over the first half of the 20th century. The theory provides extremely accurate descriptions
of the world at the atomic and subatomic levels, where “classical” (i.e., Newtonian) physics
and electrodynamics break down. Examples: stability of atoms, black body radiation, sharp
spectral absorption lines, etc.
Informatics Broadly, this is the study of all aspects of information—its storage, transmission, and
manipulation (i.e., computation). It includes what is commonly called Computer Science in
the US, as well as Information Theory. Foundations of Computer Science were laid at about
the same time as quantum mechanics by Gottlob Frege, David Hilbert, Alonzo Church,
Haskell Curry, Kurt Gödel, John Barkley Rosser, Alan Turing, Jacques Herbrand, Emil Post,
Stephen Kleene and others, who were developing a formal notion of “algorithm” or “effec-
tive procedure” to understand problems in the foundations of mathematics. Foundations of
computability culminated in the Church-Turing thesis. Largely independently, the field of Information Theory started in 1948 with Claude Shannon's paper, "A Mathematical Theory of
Communication.” Information theory deals with quantifying information and understand-
ing how it can be stored and transmitted, both securely and otherwise. Shannon defined
the notion of information entropy, somewhat analogously to physical entropy, and proved
engineering-related results about compression and noisy transmission that are in common
use today.
Quantum Information and Computation The physicist Richard Feynman first suggested the idea
of a quantum computer and what it could be used for. Charles Bennett (1973) showed that
reversible computation (with no heat dissipation or entropy increase) was possible at least
in principle. Paul Benioff (80s) showed how quantum dynamics could be used to simulate
classical (reversible) computation, David Deutsch (80s) defined the Quantum Turing Ma-
chine (QTM) and quantum circuits as theoretical models of a quantum computer. Further
foundational work was done by Bernstein & Vazirani, Yao, and others (quantum complexity
theory). Bennett and Gilles Brassard (1984) proposed a scheme for unconditionally secure
cryptographic key exchange based on quantum mechanical principles, using polarized pho-
tons. Deutsch & Jozsa and Simon (early 90s) gave “toy” problems on which quantum
computers performed provably better than classical ones. A big breakthrough came in the
mid 1990s when Peter Shor showed how a quantum computer can factor large integers
quickly (1994), as well as compute discrete logarithms (these would break the security of
most public key encryption schemes in use today). Grover (1996) proposed a completely
different quantum algorithm to quadratically speed up list search. Calderbank & Shor and
Steane (1996) showed that good quantum error-correcting codes exist and that fault-tolerant
quantum computation is possible. This led to the threshold theorem (D. Aharonov, A. Yu. Kitaev(?)), which states that there is a constant ε0 > 0 (current rough estimates are around 10⁻⁴) such that if the noise associated with each gate can be kept below ε0, then any quantum computation can be carried out with arbitrarily small probability of error. This theorem shows that noise is not a fundamental impediment to quantum computation.
Implementations of Quantum Computers (the Bad News). There are several proposals for
physical devices implementing the elements of quantum computation. Each has its own strengths
and weaknesses. In recent years, ion traps look the most promising. We’re still far off from a
viable, scalable, robust prototype.
Nuclear Magnetic Resonance (NMR) Quantum bits are nuclei of atoms (hydrogen?) arranged
on an organic molecule. The value of the bit is given by the spin of the nucleus. Nuclear
spins can be controlled by electromagnetic pulses of the right frequency and duration. Main
advantage: spins are well shielded from the outside by the electron clouds surrounding
them, so they stay coherent for a long time. Main disadvantage: since the nuclei need to be
on the same molecule to control the distances between them, NMR does not scale well. Homay
Valafar will talk about NMR toward the end of the course.
Ions in traps Qubits are ions kept equally spaced in a row (a couple of inches apart) by an
oscillating electric field. Laser pulses can control the states of the ions.
Quantum dots Qubits are particles (electrons?) kept in nanoscopic wells on the surface of a silicon
chip. Main advantage: easy to control and fabricate (solid state). Main disadvantage: short
decoherence times.
Optical schemes Qubits are polarized photons traveling through mirrors, lenses, crystals, and the
vacuum. Main advantages: photons don’t decay and their polarizations are easy to measure;
computation is at the speed of light. Main disadvantage: hard to get photons to interact with
each other.
Superconducting/Josephson junctions I don’t know much about this, except that it presumably
needs temperatures close to absolute zero.
Implementations of Quantum Cryptography (the Good News). Quantum crypto not only
works in the real world, but works just fine on fiber optic networks already in place. British
Telecom (mid 1990s?) demonstrated the BB84 quantum key exchange protocol using cable laid
across Lake Geneva in Switzerland. I believe the scheme has also been demonstrated to work
with photons through the air over modest distances (a few kilometers?). It is now feasible to use
the fiber optic cable already in place to implement quantum crypto in the network of a major city
(New York banks are already using it(?)). It still won’t work over really large distances without
classical repeaters (“quantum amplification” is theoretically impossible).
2 Week 1: Preliminaries
Just Enough Linear Algebra to Understand Just Enough Quantum Mechanics. We let Z denote
the set of integers, Q denote the set of rational numbers, R denote the set of real numbers, and C
denote the set of complex numbers.
The Complex Numbers. C is the set of all numbers of the form z = x + iy, where x, y ∈ R and
i2 = −1. We often represent z as the point (x, y) in the plane. The complex conjugate (or adjoint) of z
is
z∗ = z̄ = x − iy.

Note that x = (z + z∗)/2 is the real part of z (ℜ(z)). Similarly, y = (z − z∗)/2i is the imaginary part of z (ℑ(z)). The norm or absolute value of z is

|z| = √(z∗z) = √(x² + y²) ≥ 0,
with equality holding iff z = 0. If z1 , z2 ∈ C, it’s easy to check that |z1 z2 | = |z1 | · |z2 |. It’s not quite
so easy to check that
|z1 + z2| ≤ |z1| + |z2| ,  (1)
but see Corollary B.2 in Section B.1 for a proof. (1) is an example of a triangle inequality.
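These arithmetic facts are easy to sanity-check with Python's built-in complex type. A small illustrative sketch (the particular values here are arbitrary, my own choices):

```python
import math

z1 = 3 - 7j          # Python's complex type: j plays the role of i
z2 = -1 + 2j

# |z1 z2| = |z1| |z2| (up to floating-point rounding)
assert math.isclose(abs(z1 * z2), abs(z1) * abs(z2))

# the triangle inequality (1): |z1 + z2| <= |z1| + |z2|
assert abs(z1 + z2) <= abs(z1) + abs(z2)

# |z|^2 = z* z, with the conjugate z* given by .conjugate()
assert math.isclose(abs(z1) ** 2, (z1.conjugate() * z1).real)
```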
Exercise 2.1 Let z := 3 − 7i and w := −1 + 2i. Find (a) z + w, (b) zw, (c) |z|, (d) z∗, and (e) 1/w. Express each answer in the form x + iy for real x, y.

Exercise 2.2 Check that (z1z2)∗ = z∗1z∗2 and (z1 + z2)∗ = z∗1 + z∗2 and (−z1)∗ = −z∗1 for all z1, z2 ∈ C.
If z ≠ 0, then the argument of z (arg(z)) is defined as the angle that z makes with the positive real axis. Our convention will be that 0 ≤ arg(z) < 2π. It is known that arg(z1z2) = arg(z1) + arg(z2) up to a multiple of 2π.
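Python's cmath.phase returns an angle in (−π, π], so a small wrapper is needed to match our convention. An illustrative sketch (the helper name arg is my own):

```python
import cmath
import math

def arg(z):
    """Argument of z normalized to [0, 2*pi), per our convention."""
    return cmath.phase(z) % (2 * math.pi)   # phase(z) lies in (-pi, pi]

z1 = 1 + 1j       # arg = pi/4
z2 = -1 + 0j      # arg = pi

assert math.isclose(arg(z1), math.pi / 4)
assert math.isclose(arg(z2), math.pi)

# arg(z1 z2) = arg(z1) + arg(z2) up to a multiple of 2*pi
diff = (arg(z1 * z2) - arg(z1) - arg(z2)) % (2 * math.pi)
assert math.isclose(diff, 0, abs_tol=1e-12) or math.isclose(diff, 2 * math.pi)
```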
The real numbers R form a subset of C consisting of those complex numbers with 0 imaginary
part, namely,
R = {z ∈ C : z = z∗ }.
The unit circle in C is the set of all z of unit norm, i.e., {z ∈ C : |z| = 1}.
C is an algebraically closed field. That is, every polynomial of positive degree with coefficients
in C has a root in C, in fact n of them, where n is the degree of the polynomial. This is equivalent
to saying that every polynomial over C is a product of linear (i.e., degree 1) factors. This fact is
known as the Fundamental Theorem of Algebra.
Every polynomial over R can be factored into real polynomial factors of degrees 1 and 2. This
implies that any odd-degree real polynomial has at least one real root.
The Exponential Map. For any z, we can define ez = exp(z) by the usual power series:
e^z = 1 + z + z²/2! + z³/3! + · · · + z^k/k! + · · · ,  (2)
which converges for all z.
Here are some essential properties of the exponential map on C:
• e^0 = 1.
• e^{−z} = 1/e^z.
• e^z ≠ 0.
Exercise 2.4 Using Euler's formula (from the previous exercise), find e^{iπ/2} and e^{−iπ/3}. Express each answer in the form x + iy for real x, y.
By Exercise 2.3, we have e^z = e^x(cos y + i sin y). The unit circle is the set {e^{iθ} : θ ∈ R}.
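One can watch the series (2) converge and spot-check Euler's formula numerically. An illustrative sketch (exp_series is my own helper name):

```python
import cmath
import math

def exp_series(z, terms=60):
    """Partial sum of the power series (2) for e^z."""
    total, term = 0j, 1 + 0j
    for k in range(terms):
        total += term
        term *= z / (k + 1)    # z^k/k!  ->  z^(k+1)/(k+1)!
    return total

z = 0.3 + 1.2j
assert cmath.isclose(exp_series(z), cmath.exp(z))

# Euler's formula: e^(i*theta) lies on the unit circle
theta = 2.0
w = cmath.exp(1j * theta)
assert math.isclose(abs(w), 1.0)
assert cmath.isclose(w, complex(math.cos(theta), math.sin(theta)))
```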
Vector Spaces. We’ll deal with finite dimensional vector spaces only. Much of quantum mechan-
ics requires infinite dimensional spaces, but thankfully, the QM that relates to information and
computation only requires finite dimensions. So all our vector spaces are finite dimensional.
Our vector spaces will usually be over C, the field of complex numbers, but sometimes they
will be over R (i.e., real vector spaces), and when we do information theory, we will need to look at bit vectors (vectors in spaces over the two-element field Z2 = {0, 1}).
In a vector space, vectors can be added to each other and multiplied by scalars, obeying the
usual rules. If V is an n-dimensional vector space and B = {b1 , . . . , bn } is a basis for V, then every
v ∈ V is written as a linear combination of basis vectors:
v = a1 b1 + · · · + an bn ,
where a1 , . . . , an are unique scalars. Thus we can identify the vector v with the n-tuple
a1
..
. ,
an
which we may also write as (a1 , . . . , an ). Under this identification, vector addition and scalar
multiplication are componentwise.
The vector (0, . . . , 0) is the zero vector, denoted by 0.
Matrices. For integers m, n > 0, an m × n matrix is a rectangular array of scalars with m rows
and n columns. If A is such a matrix and 1 ≤ i ≤ m and 1 ≤ j ≤ n, we denote the (i, j)th entry of
A (i.e., the scalar in the ith row and jth column) as [A]ij or A[i, j]. The former notation is useful if
the matrix is given by a more complicated expression.
A matrix A is upper triangular if [A]ij = 0 whenever i > j. A is lower triangular if [A]ij = 0
whenever i < j. A is triangular if A is either upper or lower triangular. If A is both upper and
lower triangular, then we can say that A is a diagonal matrix. In this case, all nonzero entries of
A must lie on the main diagonal. Triangular matrices have some nice properties that make them
simple to work with in some cases.
Adding and Multiplying Matrices. Given positive integers m and n and m × n matrices A and
B, we can define the matrix sum A + B to be the unique m × n matrix satisfying

[A + B]ij = [A]ij + [B]ij

for all 1 ≤ i ≤ m and 1 ≤ j ≤ n. That is, one just adds corresponding entries in A and B for
the corresponding entry in the sum. For this to be well defined, A and B must have the same
dimensions, in which case we say that A and B are conformant (for matrix addition); otherwise,
A + B is undefined. If k is a scalar, we can define the scalar multiplication of k with A as the unique
m × n matrix kA satisfying
[kA]ij = k[A]ij
for all i and j as above. One just multiplies each entry of A by k to get the corresponding entry of
kA. One can also write Ak for the same matrix. As you may expect, we write −A for (−1)A and
write A − B for A + (−B). If k ≠ 0, we can also write A/k for (1/k)A.
For positive integers m, n, and s, suppose A is an m × s matrix and B is an s × n matrix. Then
we define the matrix product of A and B as the unique m × n matrix AB satisfying
[AB]ij = ∑_{k=1}^{s} [A]ik [B]kj

for all 1 ≤ i ≤ m and 1 ≤ j ≤ n. Note that the number of columns of A must equal the number of
rows of B for the product to be well-defined, in which case we say that A and B are conformant (for
matrix multiplication).
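The definition translates directly into code. A minimal sketch in pure Python (mat_mul is my own helper name; matrices are lists of rows):

```python
def mat_mul(A, B):
    """Product of an m x s matrix A and an s x n matrix B, per the definition."""
    m, s, n = len(A), len(B), len(B[0])
    assert all(len(row) == s for row in A), "A and B are not conformant"
    # [AB]_ij = sum over k of [A]_ik [B]_kj
    return [[sum(A[i][k] * B[k][j] for k in range(s)) for j in range(n)]
            for i in range(m)]

A = [[1, 2, 0],
     [0, 1, 1]]          # 2 x 3
B = [[1, 0],
     [2, 1],
     [3, 4]]             # 3 x 2
assert mat_mul(A, B) == [[5, 2], [5, 5]]   # result is 2 x 2
```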
Most of the usual laws of addition and multiplication of scalars extend to matrices. In each
identity below, we use A, B, C to stand for arbitrary matrices, I for a unit matrix, and k and ℓ for
any scalars. For each identity, one side is well-defined if and only if the other side is well-defined.
Commutativity of matrix +: A + B = B + A.
Matrix negation: A − A = 0.
Distributivity of scalar × over matrix +: k(A + B) = kA + kB.
There is no commutative law for matrix multiplication; that is, it is not generally true that AB = BA
for matrices A and B, even if both sides are well-defined. If this equation does hold, then we say
that A and B commute.
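A quick numerical illustration of noncommutativity (the example matrices are my own arbitrary choices):

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 1],
     [0, 1]]
B = [[1, 0],
     [1, 1]]
assert mat_mul(A, B) != mat_mul(B, A)      # A and B do not commute

I = [[1, 0],
     [0, 1]]
assert mat_mul(A, I) == A and mat_mul(I, A) == A   # but every matrix commutes with I
```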
Exercise 2.6 Find two 2 × 2 matrices A and B such that AB = 0 (the zero matrix), but BA ≠ 0.
The Identity Matrix. For any n, the n × n identity matrix or unit matrix In has 1’s on its main
diagonal and 0’s everywhere off the diagonal, so
[In]ij = δij = {1 if i = j; 0 if i ≠ j}.

This equation also defines the expression δij, which is called the Kronecker delta. For example,

I3 =
1 0 0
0 1 0
0 0 1 .
Unit matrices have the property that for any matrix A (say, p × q),
Ip A = AIq = A .
We may drop the subscript (I instead of In ) if the dimension is clear from the context.
Nonsingular Matrices. A square matrix A is nonsingular or invertible iff there exists a matrix B of
the same dimensions such that AB = BA = I. Such a B, if it exists, is uniquely determined by A
and is denoted A−1 . In this case, it is of course true that B is also nonsingular and that B−1 = A.
Determinant. For an n × n matrix A, the determinant of A, denoted det A, is a scalar value that
depends on the entries and their positions inside A. A compact expression for the determinant is
beyond the scope of this course, and besides, we won’t deal with it very much, except to define
eigenvalues and eigenvectors. But at least for the record we can say (without proof) that the map
det mapping n × n matrices to scalars is the unique map satisfying the following two properties:
1. det(AB) = (det A)(det B) for all n × n matrices A and B.
One fundamental fact about the determinant is that a matrix A is nonsingular if and only if det A ≠ 0.
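For 2 × 2 matrices everything can be made fully explicit; the following sketch (det2 and inv2 are my own helper names) checks that a matrix with nonzero determinant indeed has an inverse:

```python
def det2(A):
    """Determinant of a 2 x 2 matrix: ad - bc."""
    (a, b), (c, d) = A
    return a * d - b * c

def inv2(A):
    """Inverse of a nonsingular 2 x 2 matrix, via the classical adjugate formula."""
    (a, b), (c, d) = A
    D = det2(A)
    assert D != 0, "singular matrix"
    return [[d / D, -b / D], [-c / D, a / D]]

A = [[2, 1],
     [5, 3]]
assert det2(A) == 1          # nonzero, so A is nonsingular
Ainv = inv2(A)
# check A * A^{-1} = I
prod = [[sum(A[i][k] * Ainv[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
assert prod == [[1, 0], [0, 1]]
```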
Trace. If A is an n × n matrix, the trace of A (denoted tr A) is defined as the sum of all the diagonal
elements of A, i.e.,
tr A = ∑_{i=1}^{n} [A]ii .
The trace has three fundamental properties:

2. tr(A + aB) = tr A + a tr B, for n × n matrices A and B and scalar a. (The trace is linear.)

3. tr(AB) = tr(BA) whenever both AB and BA are defined.

In fact, tr is the only function from n × n matrices to scalars that satisfies (1)–(3) above.
Exercise 2.8 Show that for any integers m, n ≥ 1, if A is an n × m matrix and B is an m × n matrix, then

tr(AB) = tr(BA) .  (4)
This verifies item (3) above about the trace. We will use this fact frequently.
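Equation (4) is easy to spot-check numerically, even for non-square A and B (an illustrative sketch; helper names are mine):

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def tr(A):
    """Trace: the sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
B = [[1, 0],
     [0, 1],
     [1, 1]]             # 3 x 2
# AB is 2 x 2 and BA is 3 x 3, yet their traces agree, as in (4)
assert tr(mat_mul(A, B)) == tr(mat_mul(B, A))
```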
Hilbert Spaces. A vector space H over C is a Hilbert space if it has a scalar product ⟨·, ·⟩ : H × H → C that behaves as follows for all u, v, w ∈ H and a ∈ C:

1. ⟨u, v + aw⟩ = ⟨u, v⟩ + a⟨u, w⟩ (linearity in the second argument);

2. ⟨u, v⟩ = ⟨v, u⟩∗ (conjugate symmetry);

3. ⟨u, u⟩ ≥ 0, with equality if and only if u = 0 (positive definiteness).

Note that (2) implies that ⟨u, u⟩ ∈ R, so (3) merely asserts that it can't be negative. Also note that (1) and (2) imply that ⟨v + aw, u⟩ = ⟨v, u⟩ + a∗⟨w, u⟩, i.e., ⟨·, ·⟩ is conjugate linear in the first argument. Such a scalar product is called a Hermitean form or a Hermitean inner product.

The norm of a vector u ∈ H is defined as ‖u‖ = √⟨u, u⟩. Note that by (3), ‖0‖ = 0 and ‖u‖ > 0 if u ≠ 0.
Exercise 2.9 Show that for any u ∈ H and any a ∈ C, ‖au‖ = |a| ‖u‖.
Example. We consider the vector space Cn of all n-tuples of complex numbers (for some n > 0),
where vector addition and scalar multiplication are componentwise, i.e.,
(u1, . . . , un) + (v1, . . . , vn) = (u1 + v1, . . . , un + vn)   and   a(u1, . . . , un) = (au1, . . . , aun).
We define the Hermitean inner product for all vectors u = (u1 , . . . , un ) and v = (v1 , . . . , vn ) as
⟨u, v⟩ = u∗1 v1 + · · · + u∗n vn = ∑_{i=1}^{n} u∗i vi .
i=1
In this example, u and v can be expressed as linear combinations over the “standard” basis
{e1 , . . . , en }, where
ei = (0, . . . , 0, 1, 0, . . . , 0),  (5)

where the 1 occurs in the ith position.
Exercise 2.10 Check that the three properties of a Hermitean form are satisfied in this example.
Note that if we restrict the ui and vi to be real numbers, then this is just the familiar dot product
of two real vectors. Also note that in this example,
‖u‖ = √⟨u, u⟩ = √(u∗1 u1 + · · · + u∗n un) = √(|u1|² + · · · + |un|²) .
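The inner product and norm on Cn translate directly into code using Python's complex type. An illustrative sketch (inner and norm are my own helper names):

```python
import math

def inner(u, v):
    """Hermitean inner product on C^n: <u, v> = sum_i u_i^* v_i."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

def norm(u):
    return math.sqrt(inner(u, u).real)

u = [1 + 1j, 2 - 1j]
v = [0 + 1j, 3 + 0j]

# conjugate symmetry: <u, v> = <v, u>*
assert inner(u, v) == inner(v, u).conjugate()

# conjugate linearity in the first argument: <a u, v> = a* <u, v>
a = 2 - 3j
assert abs(inner([a * ui for ui in u], v) - a.conjugate() * inner(u, v)) < 1e-12

# ||u||^2 = |u1|^2 + |u2|^2
assert math.isclose(norm(u) ** 2, abs(u[0]) ** 2 + abs(u[1]) ** 2)
```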
Orthogonality and Normality. In a genuine sense, the example above is the only example that
really matters. First some more definitions. Two vectors u, v in a Hilbert space H are orthogonal or perpendicular if ⟨u, v⟩ = 0. A vector u is a normal or a unit vector if ‖u‖ = 1. A set of vectors v1, . . . , vk ∈ H is an orthogonal set if different vectors are orthogonal. The set is an orthonormal set if, in addition, each vector is a unit vector. That is, for all 1 ≤ i, j ≤ k, we have

⟨vi, vj⟩ = δij = {1 if i = j; 0 if i ≠ j}.
A basis for H is an orthonormal basis if it is an orthonormal set. Orthonormal bases are special
and have nice properties that make them preferable to other bases. From now on we will assume
that all our bases (for Hilbert spaces) are orthonormal unless I say otherwise, and I won’t.
In the example above, e1 , . . . , en clearly form an orthonormal basis. We’ll see later that every
Hilbert space has an orthonormal basis—lots of them, in fact. But let’s get back to our example. If
we fix an orthonormal basis B = {β1 , . . . , βn } for a Hilbert space H, then we can write two vectors
u, v ∈ H in terms of B as
u = ∑_{i=1}^{n} ui βi   and   v = ∑_{j=1}^{n} vj βj .

Then

⟨u, v⟩ = ⟨u1β1 + · · · + unβn, v⟩
  = ∑_{i=1}^{n} u∗i ⟨βi, v⟩   (conjugate linearity in the first argument)
  = ∑_i u∗i ⟨βi, v1β1 + · · · + vnβn⟩
  = ∑_i u∗i ∑_{j=1}^{n} vj ⟨βi, βj⟩   (linearity in the second argument)
  = ∑_{i,j} u∗i vj ⟨βi, βj⟩
  = ∑_{i,j} u∗i vj δij   (the basis is orthonormal)
  = ∑_{i=1}^{n} u∗i vi .
In other words, ⟨u, v⟩ is exactly the quantity of our example above, if we identify u with the tuple
(u1 , . . . , un ) ∈ Cn and v with the tuple (v1 , . . . , vn ) ∈ Cn .
Exercise 2.11 Show that any orthogonal set of nonzero vectors is linearly independent. [Hint: Let
v be any linear combination of such vectors, and consider ⟨v, v⟩. You'll need the fact that ⟨·, ·⟩ is
positive definite.]
3 Week 2: Preliminaries
Linear Transformations and Matrices. Let U and V be vector spaces. A linear map is a function
T : U → V such that, for all vectors u, v ∈ U and scalar a,
T (u + av) = T u + aT v.
The vector addition and scalar multiplication on the left-hand side is in U, and the right-hand side
is in V. If {α1 , . . . , αn } is a basis for U and {β1 , . . . , βm } is a basis for V, then T can be expressed
uniquely in matrix form with respect to these bases: For each 1 ≤ j ≤ n, we write Tαj uniquely as
a linear combination of the βi :
Tαj = ∑_{i=1}^{m} aij βi ,  (6)
where each aij is a scalar. Now let A be the m × n matrix whose (i, j)th entry is aij . Expressing
any u ∈ U with respect to the first basis (of U) as
u = ∑_{j=1}^{n} uj αj = (u1, . . . , un),
we get
Tu = T ( ∑_{j=1}^{n} uj αj )
  = ∑_{j=1}^{n} uj Tαj   (by linearity)
  = ∑_j uj ∑_{i=1}^{m} aij βi   (by (6))
  = ∑_{i} ( ∑_{j} aij uj ) βi
  = ( ∑_j a1j uj , . . . , ∑_j amj uj )
  = A (u1, . . . , un) ,
expressed with respect to the second basis (of V). Thus applying T to a vector u amounts to
multiplying the corresponding matrix on the left with the corresponding column vector on the
right.
Conversely, given bases for U and for V, an m × n matrix defines a unique linear map T whose
action on a vector u is given above.
Thus, linear maps and matrices are interchangeable, given bases for the requisite spaces.
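Here is a small sketch of the correspondence (the map T below is my own arbitrary example): the columns of the matrix A are the coordinate vectors of the Tαj as in (6), and applying T agrees with multiplying by A.

```python
def T(u):
    """An example linear map C^2 -> C^3 (my own choice, for illustration)."""
    x, y = u
    return [x + 2 * y, 3 * y, x]

# Columns of A are the images of the standard basis vectors, as in (6)
e = [[1, 0], [0, 1]]
cols = [T(ej) for ej in e]
A = [[cols[j][i] for j in range(2)] for i in range(3)]   # A[i][j] = a_ij

def apply(A, u):
    """Matrix-vector multiplication."""
    return [sum(A[i][j] * u[j] for j in range(len(u))) for i in range(len(A))]

u = [2 + 1j, -1 + 0j]
assert apply(A, u) == T(u)    # multiplying by A reproduces the action of T
```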
Linear maps (with the same domain and codomain) can be added and multiplied by scalars
thus:
(T1 + T2 )u = T1 u + T2 u,
(aT )u = a(T u).
The two equations above define T1 + T2 and aT respectively (a a scalar) by showing how they map
an arbitrary vector u. This makes the set of all such linear maps a vector space in its own right.
If U and V are Hilbert spaces and the {αj } and {βi } are orthonormal bases, then each entry aij
can be expressed as a scalar product in V:
⟨βi, Tαj⟩ = ⟨βi, a1jβ1 + · · · + amjβm⟩ = ∑_{k=1}^{m} akj ⟨βi, βk⟩ = aij .
One upshot of this is that a linear map T is completely determined by the quantities ⟨βi, Tαj⟩ for all i and j.
Adjoints. If A is an m × n matrix, its adjoint (or conjugate transpose) A∗ is the n × m matrix obtained by transposing A and conjugating each entry, i.e., [A∗]ij = ([A]ji)∗. Some basic properties:

1. (A∗)∗ = A.
2. (A + aB)∗ = A∗ + a∗ B∗ . (Here, A and B have the same dimensions, and a ∈ C.)
3. (AB)∗ = B∗ A∗ .
In particular, if u is a column vector (an n × 1 matrix), then u∗ is a row vector (i.e., a 1 × n matrix), called the dual vector of u. If u = (u1, . . . , un) and
v = (v1 , . . . , vn ) are vectors in some Hilbert space, expressed with respect to an orthonormal basis
{α1 , . . . , αn }, then by our previous example we have
⟨u, v⟩ = ∑_{i=1}^{n} u∗i vi = u∗v .  (7)
i=1
Here, we identify the 1 × 1 matrix u∗ v with the scalar comprising its sole entry.
If H and J are Hilbert spaces and T : H → J is linear, then there exists a unique linear map
T∗ : J → H such that for all u ∈ H and v ∈ J,

⟨v, Tu⟩ = ⟨T∗v, u⟩.
Note that the left-hand side is the scalar product in J, and the right-hand side is the scalar product
in H.
If we pick any orthonormal bases for H and J, then these two definitions of the adjoint coincide.
That is, if T is represented by the matrix A, then T ∗ is represented by the matrix A∗ .
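For matrices, then, the adjoint is the conjugate transpose, and the defining identity ⟨v, Tu⟩ = ⟨T∗v, u⟩ can be spot-checked numerically. An illustrative sketch (helper names are mine):

```python
def inner(u, v):
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

def apply(A, u):
    return [sum(A[i][j] * u[j] for j in range(len(u))) for i in range(len(A))]

def adjoint(A):
    """Conjugate transpose: [A*]_ij = ([A]_ji)*."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

A = [[1 + 1j, 2 + 0j, 0 + 1j],
     [0 + 0j, 3 + 0j, 1 - 1j]]         # a 2 x 3 matrix, i.e., a map C^3 -> C^2
u = [1 + 0j, 1j, 2 + 0j]               # u in C^3
v = [1 - 1j, 2 + 0j]                   # v in C^2

# <v, A u> = <A* v, u>
assert abs(inner(v, apply(A, u)) - inner(apply(adjoint(A), v), u)) < 1e-12
```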
Polarization Identities. Let H and J be Hilbert spaces and A, B : H → J linear maps. We have the following easily verifiable polarization identity: For every x, y ∈ H,
⟨Ax, By⟩ = (1/4) ∑_{k=0}^{3} (−i)^k ⟨A(x + i^k y), B(x + i^k y)⟩ .  (8)
Equation (8) has a number of interesting special cases. Here are two: If A = B, then we have
⟨Ax, Ay⟩ = (1/4) ∑_{k=0}^{3} (−i)^k ⟨A(x + i^k y), A(x + i^k y)⟩ = (1/4) ∑_{k=0}^{3} (−i)^k ‖A(x + i^k y)‖² ,  (9)

and if A = B = I, then

⟨x, y⟩ = (1/4) ∑_{k=0}^{3} (−i)^k ⟨x + i^k y, x + i^k y⟩ = (1/4) ∑_{k=0}^{3} (−i)^k ‖x + i^k y‖² .  (10)
Equation (10) is significant because it shows that the inner product on H is completely determined
by the norm itself. Equation (9) implies that if an operator A preserves norms, it must also preserve
inner products.
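Identity (10) is easy to verify numerically for particular vectors. An illustrative sketch (the vectors are my own arbitrary choices):

```python
def inner(u, v):
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

def norm_sq(u):
    return inner(u, u).real

x = [1 + 2j, 0 - 1j]
y = [2 - 1j, 1 + 1j]

# Equation (10): <x, y> = (1/4) * sum_{k=0}^{3} (-i)^k * ||x + i^k y||^2
rhs = sum((-1j) ** k * norm_sq([xi + (1j ** k) * yi for xi, yi in zip(x, y)])
          for k in range(4)) / 4
assert abs(inner(x, y) - rhs) < 1e-12
```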
Gram-Schmidt Orthonormalization. We prefer orthonormal bases for our Hilbert spaces. Here
we show that they actually exist, and in abundance. Let H be an n-dimensional Hilbert space and
let {b1 , . . . , bn } be any basis (not necessarily orthonormal) for H. For i = 1 to n in order, define
xi = bi − ∑_{k=1}^{i−1} ⟨yk, bi⟩ yk ,
yi = xi / ‖xi‖ .
This is known as the Gram-Schmidt procedure. We’ll see that {y1 , . . . , yn } is an orthonormal basis.
It's not obvious that the yi are even well-defined, since we need to establish that ‖xi‖ in the denominator is nonzero. We can prove the following facts simultaneously by induction on i for 1 ≤ i ≤ n; that is, assuming that all the facts are true for all j < i, we prove all the facts for i:

1. xi ≠ 0, so yi is well-defined.

2. ‖yi‖ = 1.

3. {b1, . . . , bi}, {x1, . . . , xi}, and {y1, . . . , yi} are each linearly independent sets of vectors which span the same subspace of H.
For j < i, we compute

⟨yj, yi⟩ = ⟨yj, xi⟩ / ‖xi‖ = (1/‖xi‖) ( ⟨yj, bi⟩ − ∑_{k<i} ⟨yk, bi⟩⟨yj, yk⟩ ) = (1/‖xi‖) ( ⟨yj, bi⟩ − ⟨yj, bi⟩ ) = 0.

The second-to-last equation comes from the fact that ⟨yj, yk⟩ = δjk for all j, k < i, which is part of the inductive hypothesis.
It turns out (we won't prove this) that given a basis b1, . . . , bn, there is only one list y1, . . . , yn satisfying items (2)–(5) above.
Exercise 3.4 Prove that applying the Gram-Schmidt procedure to a basis that is already orthonor-
mal just results in the same basis.
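The procedure transcribes almost line for line into code. Here is a sketch for real vectors (for complex vectors one would conjugate in the inner product); the starting basis is my own arbitrary example:

```python
import math

def inner(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))   # real vectors, so no conjugates needed

def gram_schmidt(bs):
    """The Gram-Schmidt procedure from the text, for real vectors."""
    ys = []
    for b in bs:
        # subtract off the projections onto the y's found so far
        coeffs = [inner(y, b) for y in ys]
        x = [bi - sum(c * y[i] for c, y in zip(coeffs, ys)) for i, bi in enumerate(b)]
        nx = math.sqrt(inner(x, x))   # nonzero as long as the b's are independent
        ys.append([xi / nx for xi in x])
    return ys

ys = gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
# the result is an orthonormal set: <y_i, y_j> = delta_ij
for i in range(3):
    for j in range(3):
        expected = 1.0 if i == j else 0.0
        assert math.isclose(inner(ys[i], ys[j]), expected, abs_tol=1e-12)
```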
Definition 3.5 If H and J are Hilbert spaces, we let L(H, J) denote the space of all linear maps
from H to J. We abbreviate L(H, H) by L(H), the space of all linear operators on H, with identity
element I. Note that L(H, J) is a vector space over C.
A map A ∈ L(H) is Hermitean (or self-adjoint) if A∗ = A. A map A is unitary if AA∗ = I (equivalently, A∗A = I).

• If A is Hermitean, then ⟨u, Av⟩ = ⟨Au, v⟩. This follows immediately from the fact that ⟨u, Av⟩ = ⟨A∗u, v⟩.
• If A is Hermitean and a is real, then aA is Hermitean.

• If A is Hermitean, then so is A∗.

• If A is unitary, then ⟨Au, Av⟩ = ⟨u, v⟩, that is, A preserves the scalar product. To see this, we just compute

⟨Au, Av⟩ = ⟨A∗Au, v⟩ = ⟨Iu, v⟩ = ⟨u, v⟩.
L(H) is a Hilbert space. In Definition 3.5, we mentioned that L(H, J) is a vector space over C. In
fact, its dimension is the product of the dimensions of H and of J: Suppose H has dimension n and
J has dimension m. Given orthonormal bases for each space, an element of L(H, J) corresponds to an m × n matrix. You can think of this matrix as a vector with mn components which just happen to
be arranged in a 2-dimensional array rather than a single column. The vector addition and scalar
multiplication operations on these matrices are componentwise, just as with vectors, so L(H, J)
has dimension mn.
There is a natural inner product that one can define on L(H, J) that makes it into an mn-
dimensional Hilbert space. For all A, B ∈ L(H, J), define

⟨A, B⟩ := tr(A∗B) .  (11)

This is known as the Hilbert-Schmidt inner product on L(H, J). It looks similar to the expression
u∗ v for the inner product of vectors u and v (Equation (7)), except that A∗ B is not a scalar but an
operator in L(H), and so we take the trace to get a scalar result.
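A small sketch of the Hilbert-Schmidt inner product (helper names are mine), checking that tr(A∗B) agrees with the entrywise inner product when the matrices are viewed as vectors in C⁴:

```python
def adjoint(A):
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def tr(A):
    return sum(A[i][i] for i in range(len(A)))

def hs_inner(A, B):
    """Hilbert-Schmidt inner product <A, B> = tr(A* B)."""
    return tr(mat_mul(adjoint(A), B))

A = [[1 + 1j, 0 + 0j],
     [2 + 0j, 1 - 1j]]
B = [[0 + 1j, 1 + 0j],
     [1 + 0j, 0 + 0j]]

# agrees with the entrywise inner product on C^4
flat = sum(A[i][j].conjugate() * B[i][j] for i in range(2) for j in range(2))
assert hs_inner(A, B) == flat
# conjugate symmetry
assert hs_inner(B, A) == hs_inner(A, B).conjugate()
```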
Exercise 3.6 Show that L(H, J), together with its Hilbert-Schmidt inner product, satisfies all the
axioms of a Hilbert space. [Hint: You can certainly just verify the axioms directly. Alternatively,
represent operators as matrices with respect to some fixed orthonormal bases of H and J, respec-
tively, then show that if A and B are m × n matrices, then tr(A∗ B) is the usual inner product of A
and B on Cmn , where we identify each matrix with the mn-dimensional vector of all its entries.]
The Hilbert-Schmidt inner product interacts nicely with composition of linear maps (or equiv-
alently, matrix multiplication).
Exercise 3.7 Let H, J, and K be Hilbert spaces. Verify directly that for any A ∈ L(H, K), B ∈
L(J, K), and C ∈ L(H, J),
⟨A, BC⟩ = ⟨B∗A, C⟩ = ⟨AC∗, B⟩ .  (12)
This means that you can move a right or left factor from one side of the inner product to the other,
provided you take its adjoint. [Hint: Use Exercise 2.8 along with basic properties of adjoints. You
may assume that A, B, and C are all matrices of appropriate dimensions.]
4 Week 2: Preliminaries
Exercise 4.1 Let b1 = (−3, 0, 4), b2 = (3, −1, 2), and b3 = (0, 1, −1). Perform the Gram-Schmidt
procedure above on {b1 , b2 , b3 } to find the corresponding {x1 , x2 , x3 } and {y1 , y2 , y3 }.
Dirac Notation. In what follows, we fix an n-dimensional Hilbert space H and some orthonormal
basis for it, so we can identify vectors with column vectors in the usual way. Recall that for column
vectors u, v ∈ H, we have

⟨u, v⟩ = u∗v .
Paul Dirac suggested a notation which somewhat reconciles the two sides of this equation: if we let |ψ⟩ denote the column vector v and we let ⟨ϕ| denote the row vector u∗, then ⟨u, v⟩ = u∗v = ⟨ϕ|ψ⟩ is just the usual multiplication of a row vector and a column vector (the two vertical bars overlap). Note how the product ⟨ϕ|ψ⟩ looks like ⟨u, v⟩ with the comma replaced by a vertical bar. This notation has become standard in quantum mechanics. We denote a (column) vector by |ψ⟩, where ψ is some label identifying it, and we denote its corresponding dual (row) vector by ⟨ψ| (thus ⟨ψ| = |ψ⟩∗ and vice versa: |ψ⟩ = ⟨ψ|∗). The choice of delimiters tells us whether we are talking about a column vector or a row vector. A vector of the form |ψ⟩ (i.e., a column vector) is called a ket vector. If |ϕ⟩ is another ket vector, its dual (a row vector) ⟨ϕ| = |ϕ⟩∗ is called a bra vector, so that the scalar ⟨ϕ|ψ⟩ can be called the bracket ("bra-ket") of |ϕ⟩ and |ψ⟩.
We’ll start using Dirac notation because the book uses it, although there are some times when
the notation just gets too clunky, and so then we will go back to using the “standard” notation.
We can combine kets and bras in other ways. For example, |ψihϕ| is a column vector on the left
multiplied by a row vector on the right (in standard notation, vu∗ , where u and v are as above).
This is then an n × n matrix, or considered another way, a linear operator H → H that takes a
vector |χi and maps it to the vector |ψihϕ|χi = (hϕ|χi)|ψi (that is, the vector |ψi multiplied by
the scalar hϕ|χi). In any case, combining bras and kets just amounts to the usual vector or matrix
multiplication.
As a special case, if {e1 , . . . , en } is the orthonormal basis for H that we have fixed, then, letting
|i⟩ := e_i for all 1 ≤ i ≤ n, we have, for all 1 ≤ i, j ≤ n,
\[
|i\rangle\langle j| = e_i e_j^* = E_{ij},
\]
where E_{ij} denotes the n × n matrix whose entry in the ith row and jth column is 1 and whose other entries are all 0. Notice that if A
is a linear map H → H whose corresponding matrix has entries aij , then by the equation above
we must have
\[
A = \sum_{i,j} a_{ij} E_{ij} = \sum_{i,j} a_{ij}\, |i\rangle\langle j|,
\]
where both indices in the summation run from 1 to n. In particular, the identity operator is given by
\[
I = \sum_{i=1}^{n} |i\rangle\langle i|.
\]
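The outer-product identities above are easy to check numerically. A minimal NumPy sketch (assuming NumPy, and using 0-based indices where the text uses 1-based):

```python
import numpy as np

n = 3
E = np.eye(n)
ket = lambda i: E[:, [i]]   # column vector |i>
bra = lambda i: E[[i], :]   # row vector <i|

# E_ij = |i><j| has a single 1, in row i and column j.
Eij = ket(1) @ bra(2)
assert Eij[1, 2] == 1 and np.count_nonzero(Eij) == 1

# Any matrix A decomposes as A = sum_ij a_ij |i><j|.
A = np.arange(9.0).reshape(3, 3)
A_rebuilt = sum(A[i, j] * (ket(i) @ bra(j)) for i in range(n) for j in range(n))
assert np.allclose(A, A_rebuilt)

# The identity is I = sum_i |i><i|.
I_rebuilt = sum(ket(i) @ bra(i) for i in range(n))
assert np.allclose(I_rebuilt, np.eye(n))
```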
Change of (Orthonormal) Basis. Let H be as before, and let {e1 , . . . , en } and {f1 , . . . , fn } be two
orthonormal bases for H. There is a unique linear map U ∈ L(H) mapping the first basis to the
second, i.e., Uei = fi for all 1 ≤ i ≤ n. Now for each 1 ≤ i, j ≤ n we have
\[
\langle e_i, U^* U e_j \rangle = \langle U e_i, U e_j \rangle = \langle f_i, f_j \rangle = \delta_{ij} = \langle e_i, e_j \rangle = \langle e_i, I e_j \rangle.
\]
Since the linear map U∗ U is uniquely determined by the quantities above, we must therefore have
U∗ U = I, and thus U is unitary.
Conversely, if U is unitary and {e1 , . . . , en } is an orthonormal basis, then {Ue1 , . . . , Uen } is also
an orthonormal basis, because U preserves the scalar product.
We conclude that the operators needed to change orthonormal bases in a Hilbert space are
exactly the unitary operators.
Unitary Conjugation. If A and B are two linear operators in L(H) (equivalently, two n × n-
matrices), then we say that A is unitarily conjugate to B if there exists a unitary U such that
B = UAU∗ . The relation “is unitarily conjugate to” is an equivalence relation on L(H), that is, it is
reflexive, symmetric, and transitive.
Unitary conjugation allows us to change orthonormal bases. Suppose {e1 , . . . , en } and {f1 , . . . , fn }
are two orthonormal bases for H and let U be the unique unitary operator such that Uei = fi for
all 1 6 i 6 n. Suppose that A is some linear operator on H. We want to compare the matrix entries
of A with respect to the two different bases. With respect to the first basis (the e-basis), the (i, j)th entry of the matrix of A is given by ⟨e_i, A e_j⟩ = e_i^* A e_j (or ⟨i|A|j⟩ using Dirac notation). With respect to the second basis (the f-basis), the same entry is ⟨f_i, A f_j⟩. Starting with this, we get
\[
\langle f_i, A f_j \rangle = \langle U e_i, A U e_j \rangle = \langle e_i, U^* A U e_j \rangle.
\]
The right-hand side is the (i, j)th entry of the matrix representing the operator U∗ AU with respect
to the e-basis.
To summarize, if M_A and M_A' are the matrices representing the operator A with respect to the e-basis and the f-basis, respectively, then
\[
M_A' = M_U^* \, M_A \, M_U,
\]
where M_U is the matrix representing the operator U with respect to the e-basis.
Thus, changing orthonormal basis amounts to unitary conjugation of the corresponding
matrices.
Back to Quantum Physics: The Double Slit Experiment. It’s been known since early in the
20th century that light comes in discrete packets (particles) called photons. People have observed
individual photons hitting a photoelectric detector (or a photographic plate) at specific times and
pinpoint locations, causing local electric currents in the detector (or dots to appear on the plate).
On the other hand, light also exhibits wavelike properties. In the double slit experiment, light
from a laser beam is shined on an opaque barrier with two small openings close to each other (on
the order of the wavelength of the light). A screen is placed on the other side of the barrier. What you
see on the screen are alternating bands of light and dark—a standard interference pattern caused
by the light waves from the two slits interfering constructively and destructively with each other.
This is easily visible to the naked eye. If you block one of the slits, then the interference pattern
goes away and you just see a smoothly contoured, glowing blob on the screen (that depends on
the width of the slit).
Here is a plausible (though ultimately wrong) explanation in terms of photons: the photons
somehow are changing phase in time, and the photons that go through the top slit are interfering
with the photons going through the bottom slit.
Let’s see why this is wrong. Now alter the experiment as follows: Make the light source
extremely dim, so that it emits on average only one photon per second, and replace the screen with
a photographic plate (or photodetector) that will register where each photon hits. The photons
appear to hit the plate at random places, but if you run the experiment a long time (thousands or
millions of photons), you see that, statistically, the distribution of photon hits resembles the same
wavy interference pattern as before. That is, the probability of a photon hitting any given location
is proportional to the intensity of the light at that location in the original experiment.
We can’t say the photons are interfering with each other, since one photon goes through long
before the next one comes. The only explanation is that each photon is somehow passing through
both slits at the same time and interfering with itself on the other side. This cannot be explained at
all by classical physics, which asserts that the photon, being a particle, must travel through either
the upper slit or the lower slit, but not both. Indeed, if you put detectors at both slits, the photon
will only be detected at (at most) one slit or the other, not both.
Another thing that classical physics cannot explain is the random behavior of the photons at
the plate. You can send two identical photons, of exactly the same frequency and moving in exactly
the same direction, and they will wind up at different locations at the plate. So the behavior of the
photons is not deterministic but inherently random.
Quantum mechanics is needed to explain both these phenomena as follows: Each photon does
indeed correspond to a wave that goes through both slits, but the amplitude of this wave at any
location is related to the probability of the photon being at that location. These waves interfere with
each other and cause the interference pattern in the statistical distribution of photons at the plate.
So the two hallmarks of quantum mechanics are: (i) nondeterminism (inherent randomness)
and (ii) interference of probabilities. More later.
5 Week 3: Unitary conjugation
Invariance under Unitary Conjugation: Trace and Determinant. If A and U are n × n matrices
and U is unitary, then by Equation (4) of Exercise 2.8,
\[
\operatorname{tr}(U A U^*) = \operatorname{tr}(U^* U A) = \operatorname{tr} A.
\]
In other words, the tr function is invariant under unitary conjugation, i.e., if matrices A and B are
unitarily conjugate, then their traces are equal. This means that the tr function is really a function
of the underlying operator and does not depend on which orthonormal basis you use to represent
the operator as a matrix. (In fact, it does not depend on any basis, orthonormal or otherwise.)
It’s worth looking at what the trace looks like in Dirac notation. If A is an operator and {e_1, . . . , e_n} is an orthonormal basis, then we know (letting |i⟩ = e_i for all i, as before) that [A]_ij = ⟨e_i, A e_j⟩ = ⟨i|A|j⟩ for the matrix of A with respect to this basis. So,
\[
\operatorname{tr} A = \sum_{i=1}^{n} [A]_{ii} = \sum_i \langle e_i, A e_i \rangle = \sum_i e_i^* A e_i = \sum_i \langle i|A|i\rangle , \tag{13}
\]
and this quantity does not depend on the particular orthonormal basis we choose.
Similarly, the determinant function det is also invariant under unitary conjugation. This follows
from the fact that det(AB) = det A det B and det(A−1 ) = (det A)−1 for any nonsingular A. For A
and U as above, we have
\[
\det(U A U^*) = \det U \cdot \det A \cdot \det(U^*) = \det U \cdot \det A \cdot (\det U)^{-1} = \det A.
\]
So like the trace, det is really a function of the operator and does not depend on the basis used to
represent the operator as a matrix.
Here are some other invariants under unitary conjugation. In each case, U is an arbitrary
unitary operator.
The adjoint. For any A, clearly (UAU∗ )∗ = UA∗ U∗ . (The adjoint of a conjugate is the conjugate
of the adjoint.)
Being Hermitean. If A is Hermitean, then (UAU∗ )∗ = UA∗ U∗ = UAU∗ , so UAU∗ is also Her-
mitean.
Being unitary. If A is unitary, then (UAU∗ )(UAU∗ )∗ = UAU∗ UA∗ U∗ = UAA∗ U∗ = UU∗ = I, so
UAU∗ is also unitary.
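These invariance properties can be spot-checked numerically. A small NumPy sketch, where (as one convenient assumption) a random unitary is obtained from the QR decomposition of a random complex matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# A random unitary U from the QR decomposition of a complex Gaussian matrix.
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
U, _ = np.linalg.qr(M)
assert np.allclose(U.conj().T @ U, np.eye(n))

A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = U @ A @ U.conj().T            # B is unitarily conjugate to A

# Trace and determinant are invariant under unitary conjugation.
assert np.allclose(np.trace(A), np.trace(B))
assert np.allclose(np.linalg.det(A), np.linalg.det(B))

# Conjugating a Hermitean matrix gives a Hermitean matrix.
H = A + A.conj().T
HB = U @ H @ U.conj().T
assert np.allclose(HB, HB.conj().T)
```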
Orthogonal Subspaces, Projection Operators. Projection operators are important for under-
standing how measurements are made in quantum mechanics. They are also important because
they can serve as building blocks for more general operators. We will spend some quality time
with them.
Again, let H be an n-dimensional Hilbert space, and let V, W ⊆ H be subspaces of H. V and
W are mutually orthogonal if hv, wi = 0 for every v ∈ V and w ∈ W.
Exercise 5.1 Show that if V and W are mutually orthogonal, then no nonzero vector can be in
V ∩ W.
There is a natural one-to-one correspondence between the subspaces of H and certain linear
operators on H known as projection operators.
Definition 5.2 An (orthogonal) projection operator or projector on H is a linear map P ∈ L(H) such that
1. P∗ = P, i.e., P is self-adjoint (Hermitean), and
2. P^2 = P, i.e., P is “idempotent.”
There are two trivial projection operators on H, namely, I (the identity) and 0 (the zero operator,
which maps every vector to 0). There are many nontrivial projection operators as well.
Exercise 5.4 Show that if P and Q are projection operators and PQ = 0, then QP = 0 as well, and
P + Q is a projection operator. [Hint: To show that QP = 0, take the adjoint of both sides of the
equation PQ = 0.]
Given a projection operator P on H, let V be the image of P, that is, V = img P := {Pv : v ∈ H}.
Then it is easy to check that V is a subspace of H, and we say that “P projects onto V.” Notice that
if u ∈ V then there is a v such that Pv = u, and so
Pu = PPv = Pv = u.
That is, P fixes every vector in V, and so clearly we also have V = {u ∈ H : Pu = u}.
Not only does P project onto V but it does so orthogonally. This means that P moves any vector
v perpendicularly onto V, or more precisely, hu, Pv − vi = 0 for any u ∈ V, where Pv − v is the vector
representing the net movement from v to Pv. To see that hu, Pv − vi = 0, we write u = Pw for some
w and just calculate:
hu, Pv − vi = hPw, Pv − vi = hPw, Pvi − hPw, vi = hP∗ Pw, vi − hPw, vi = hPw, vi − hPw, vi = 0.
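A numeric illustration of the calculation above, assuming NumPy. The projector is built as y_1 y_1^* + · · · + y_k y_k^* from an orthonormal set spanning V, a construction that also appears in the hint to Exercise 5.7:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 4, 2

# Orthonormal basis {y_1, ..., y_k} of a random k-dimensional subspace V,
# and the projector P = sum_i y_i y_i^*.
Y, _ = np.linalg.qr(rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k)))
P = Y @ Y.conj().T

# P is self-adjoint and idempotent.
assert np.allclose(P, P.conj().T)
assert np.allclose(P @ P, P)

# P moves any v perpendicularly onto V: <u, Pv - v> = 0 for every u in V.
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
u = P @ (rng.standard_normal(n) + 1j * rng.standard_normal(n))  # arbitrary u in V
assert np.isclose(np.vdot(u, P @ v - v), 0)
```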
Conversely, if V is any subspace of H, then there is a unique projection operator P that projects
orthogonally onto V as above. First I’ll show uniqueness: If P and Q are projectors that both
orthogonally project onto V, then for any v, w ∈ H we have
\[
\langle Pw, Pv \rangle = \langle P^* P w, v \rangle = \langle P P w, v \rangle = \langle Pw, v \rangle,
\]
and
\[
\langle Pw, Qv \rangle = \langle Q^* P w, v \rangle = \langle Q P w, v \rangle = \langle Pw, v \rangle.
\]
The last equation follows from the fact that Q fixes every vector in V, in particular, Q fixes Pw.
Putting these two facts together, we have
\[
\langle Pw, Pv - Qv \rangle = \langle Pw, Pv \rangle - \langle Pw, Qv \rangle = \langle Pw, v \rangle - \langle Pw, v \rangle = 0.
\]
Since w was chosen arbitrarily, this means that Pv − Qv is orthogonal to every vector of the form Pw, i.e., every vector in V. But Pv − Qv is itself in V because both Pv and Qv are in V. Thus Pv − Qv is orthogonal to itself, and this means that Pv − Qv = 0, i.e., Pv = Qv. Since v was also arbitrary, P = Q, which proves uniqueness.
For existence, let {b_1, . . . , b_k} be an orthonormal basis for V, and extend it to an orthonormal basis {b_1, . . . , b_n} for all of H. Let P be the operator whose matrix with respect to this basis satisfies
• [P]_ii = 1 for 1 ≤ i ≤ k,
• [P]_ij = 0 for all other pairs (i, j).
Thus P is given by a diagonal matrix where the first k diagonal entries are 1 and the rest are 0.
Clearly, P = P∗ and P2 = P, so P is a projector. Furthermore, P fixes each of the basis vectors
b1 , . . . , bk and so it fixes each vector in V. P annihilates all the other bk+1 , . . . , bn , and so Pv ∈ V
for all v ∈ H. Thus P projects orthogonally onto V.
Exercise 5.5 Let V be a subspace of H and let P be its corresponding projection operator. Show
that dim V = tr P. [Hint: Consider the matrix construction just above.]
Exercise 5.6 Suppose |ψi is a unit vector in H (i.e., hψ|ψi = 1). Show that |ψihψ| is a projection
operator. What subspace does it project onto? What is tr |ψihψ|?
Exercise 5.7 Find the 3 × 3 matrix for the projector P that projects orthogonally onto the two-
dimensional subspace of C3 spanned by v1 = (1, −1, 0) and v2 = (2, 0, i). P is the unique operator
satisfying: (i) P2 = P = P∗ , (ii) Pv1 = v1 , (iii) Pv2 = v2 , and (iv) tr P = 2. [Hint: If y1 and y2 are
orthogonal unit vectors, then y1 y∗1 + y2 y∗2 projects onto the subspace spanned by y1 and y2 . Use
Gram-Schmidt to find y1 and y2 given v1 and v2 . When you find P, check items (i)–(iv) above.]
Exercise 5.8 Let V and W be subspaces of H with corresponding projection operators P and Q,
respectively. Prove that V and W are mutually orthogonal if and only if PQ = 0. [Hint: For
the forward direction, consider kPQvk2 for any vector v ∈ H. For the reverse direction, consider
hPv, Qwi for any vectors v, w ∈ H, and move the P to the right-hand side of the bracket.]
If V is a subspace of H, we define the orthogonal complement of V (denoted V ⊥ ) to be
V ⊥ = {u ∈ H : (∀v ∈ V)[hu, vi = 0]}.
V ⊥ is clearly a subspace of H.
Exercise 5.9 Show that if V is a subspace of H with corresponding projection operator P, then I − P
is the projection operator corresponding to V ⊥ .
Exercise 5.10 Show that if V is a subspace of H, then H is the direct sum of V and V ⊥ (written
H = V ⊕ V ⊥ ), that is, every vector in H is the unique sum of a vector in V with a vector in V ⊥ .
[Hint: use the previous exercise and the fact that V ∩ V ⊥ = {0}.]
Definition 5.11 A complete set of orthogonal projectors, also called a decomposition of I, is a collection {P_i : i ∈ I} of nonzero projectors on H such that
1. P_i P_j = 0 for all i ≠ j, and
2. \sum_{i \in I} P_i = I.
Here, I is any finite set of distinct labels. We may have I = {1, . . . , k} for some k, but there are other
possibilities, including real numbers, or labels that are not numbers at all.
We will see later (Exercise 9.34) that condition (1) is actually redundant; it follows from condi-
tion (2).
Taking the trace of both sides of item (2), we get
\[
\sum_{i \in I} \operatorname{tr} P_i = \operatorname{tr} \sum_{i \in I} P_i = \operatorname{tr} I = n.
\]
Since each P_i ≠ 0, its trace is a positive integer (Exercise 5.5), so there can be at most n many
projection operators in any complete set, where n = dim H.
For each i ∈ I, let Vi be the subspace that Pi projects onto. By Exercise 5.8, the Vi are all
pairwise mutually orthogonal. Furthermore, the Vi together span all of H: for any v ∈ H,
\[
v = Iv = \sum_{i \in I} P_i v, \tag{14}
\]
but P_i v ∈ V_i for each i, so v is the sum of vectors from the V_i. Generalizing Exercise 5.10, one can show that H = \bigoplus_{i \in I} V_i is the direct sum of the V_i. That means that every v ∈ H is the sum of
unique vectors in the respective spaces Vi , and this sum is given by (14) above.
As a special case, if P projects onto a proper, nonzero subspace V of H, then {P, I − P} is a
complete set of projectors corresponding to the two subspaces V and V ⊥ .
Exercise 5.12 Let {Pi : i ∈ I} be a complete set of orthogonal projectors over H, and let v ∈ H be
any vector. Show by direct calculation that
\[
\|v\|^2 = \sum_{i \in I} \|P_i v\|^2 .
\]
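Here is a small NumPy illustration of a complete set of projectors (a numeric spot-check only; Exercise 5.12 asks for an algebraic proof):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4

# A complete set of projectors built from an orthonormal basis: split the
# basis vectors into groups and let each P_i project onto one group's span.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
groups = [[0], [1, 2], [3]]       # label set I = {0, 1, 2}
P = [sum(np.outer(Q[:, j], Q[:, j].conj()) for j in g) for g in groups]

# Condition (2): the projectors sum to the identity.
assert np.allclose(sum(P), np.eye(n))
# Condition (1): P_i P_j = 0 for i != j.
assert np.allclose(P[0] @ P[1], 0)

# The norm of any vector decomposes over the set: ||v||^2 = sum_i ||P_i v||^2.
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
assert np.isclose(np.linalg.norm(v)**2, sum(np.linalg.norm(Pi @ v)**2 for Pi in P))
```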
Exercise 5.13 Suppose P and Q are projection operators on H projecting onto subspaces V ⊆ H
and W ⊆ H, respectively. Show that if P and Q commute, that is, if PQ = QP, then PQ is a
projection operator projecting onto V ∩ W.
Fundamentals of Quantum Mechanics. We now know enough math to present the fundamental
principles of quantum mechanics. For now, I will abide by the Copenhagen interpretation of quan-
tum mechanics first put forward by Niels Bohr. This is the best-known interpretation and is easy
to work with, albeit somewhat unsatisfying philosophically. Another well-known interpretation
is the Everett interpretation, a.k.a. the many-worlds interpretation or the unitary interpretation.
More on that later. There are still other interpretations, but there are no conflicts between any of
these interpretations; they all use the same math and lead to the same predictions. The differences
are merely philosophical.
Physical Systems and States. A physical system is some part of nature, for example, the position
of an electron orbiting an atom, the electric field surrounding the earth, the speed of a train, etc.
The last two are “macroscopic,” dealing with big objects with lots of mass, momentum, and energy.
Although in principle quantum mechanics covers all these systems, it is most conveniently applied
to microscopic systems like the first.
The most basic principle of quantum mechanics relevant to us is that to every physical system
S there corresponds a Hilbert space H = HS , called the state space of S.1 At any given point in time,
the system is in some state, which for now we can define as a unit vector |ψi ∈ H. (We will revise
this definition later on, but our revision does not invalidate anything we discuss until then.) The
state of the system may change with time, depending on the forces (internal and external) applied
to the system. We may write the state of the system at time t as |ψ(t)i.
Time Evolution of an Isolated System. Let’s assume that our system S is isolated, i.e., it is not
interacting with any other systems. The state of S evolves in time, but this evolution is linear in the
following sense: For any two times t1 , t2 ∈ R, there is a linear operator U = U(t2 , t1 ) ∈ L(H) such
that if the system is in the state |ψ(t1)⟩ at time t1, then at time t2 the system will be in the state
\[
|\psi(t_2)\rangle = U\,|\psi(t_1)\rangle.
\]
The operator U only depends on the system (its internal forces) and on the times t1 and t2 , but not
on the particular state the system happens to be in. That is, the single operator U describes how
the system evolves from any state at t1 to the resulting state at t2 . Note that t1 and t2 are arbitrary;
t2 does not necessarily have to come after t1 .
Since U maps states to states, it must be norm-preserving. From this one can show that it must
preserve the scalar product. That is, U must be unitary. Here are some other basic, intuitive facts:
1. U(t, t) = I for any time t. (If no time elapses, then the state has no time to change.)
¹In general, H may be infinite dimensional. The systems we care about, however, are all bounded, which means they
correspond to finite dimensional spaces.
2. U(t1 , t2 ) = U(t2 , t1 )−1 = U(t2 , t1 )∗ for all times t1 , t2 . (Tracing the evolution of the system
backward in time should undo the changes made by running the system forward in time.)
3. U(t3 , t1 ) = U(t3 , t2 )U(t2 , t1 ) for all times t1 , t2 , t3 . (Running the system from t1 to t2 and then
from t2 to t3 has the same effect on the state as running the system from t1 to t3 . Recall that
operator composition reads from right to left.)
(Item (2) actually follows from items (1) and (3).) If the system S is known, then U(t2 , t1 ) can be
computed with arbitrary accuracy, at least in principle. In many simple cases, U(t2 , t1 ) is known
exactly, and can even be controlled precisely by manipulating the system S. Controlling U is crucial
to quantum computation. We’ll see specific examples a bit later.
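A toy illustration of these three facts, assuming NumPy. The rotation family below is just a convenient stand-in for a real evolution operator (a rotation matrix is real and orthogonal, hence unitary); it is not a physical Hamiltonian:

```python
import numpy as np

# Toy isolated system: take U(t2, t1) to be a rotation by angle (t2 - t1).
def U(t2, t1):
    a = t2 - t1
    return np.array([[np.cos(a), -np.sin(a)],
                     [np.sin(a),  np.cos(a)]])

t1, t2, t3 = 0.3, 1.1, 2.5

# 1. No elapsed time means no change: U(t, t) = I.
assert np.allclose(U(t1, t1), np.eye(2))
# 2. Running backward undoes running forward: U(t1, t2) = U(t2, t1)^(-1) = U(t2, t1)^*.
assert np.allclose(U(t1, t2), U(t2, t1).T)
# 3. Composition (right to left): U(t3, t1) = U(t3, t2) U(t2, t1).
assert np.allclose(U(t3, t1), U(t3, t2) @ U(t2, t1))
```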
Projective Measurement. Now and then, we’d like to get information about the state of our
system S. It turns out that quantum mechanics puts severe limitations on how much information
we can extract, and prevents us from extracting this information in a purely passive way.
The standard way of getting information about the state of a system is by making an observation,
also called a measurement. These are terms of art which unfortunately don’t bear much intuitive
resemblance to their every-day meanings. A typical (and very general) type of measurement is a
projective measurement.2 If H is the Hilbert space of system S, then a projective measurement on
S corresponds to a complete set {Pk : k ∈ I} of orthogonal projectors on H. The elements of I
are the possible outcomes of the measurement. If the system is in state |ψi when the measurement
is performed, then the measurement will produce exactly one of the possible outcomes randomly
such that each outcome k ∈ I is produced with probability
\[
\Pr[k] = \| P_k |\psi\rangle \|^2 = (P_k|\psi\rangle)^* \, P_k|\psi\rangle = \langle\psi| P_k P_k |\psi\rangle = \langle\psi| P_k |\psi\rangle . \tag{15}
\]
Furthermore, immediately after the measurement, the state of the system will be
\[
|\psi_k\rangle := \frac{P_k|\psi\rangle}{\| P_k|\psi\rangle \|}.
\]
• The outcome of the projective measurement is intrinsically random. You can prepare the
system S in the exact same state |ψi twice, perform the exact same projective measurement
²There are other, more “general” types of measurement, but these can actually be implemented using projective
measurements on larger systems, so these other measurements really aren’t more general than projective measurements.
both times, and get different outcomes. The only things that we can predict from our
experiments are the statistics of the outcomes. If we know the state |ψi of the system when
the measurement is performed, then in principle we can compute Pr[k] for each outcome k,
and then if we run the same experiment many times (say a million times), then we can expect
to see outcome k occur about a Pr[k] fraction of the time. This is indeed what happens.
• There can be only a finite, discrete set of possible outcomes associated with any projective measurement: no more than dim(H) of them (at least for bounded systems).
• The probabilities defined by (15) are certainly nonnegative, but we need to check that they
sum to 1. We have
\[
\sum_{k \in I} \Pr[k] = \sum_k \langle\psi|P_k|\psi\rangle = \langle\psi| \Big( \sum_k P_k \Big) |\psi\rangle = \langle\psi| I |\psi\rangle = \langle\psi|\psi\rangle = \| |\psi\rangle \|^2 = 1,
\]
as required.
• Suppose outcome k occurs, and then we immediately perform the same measurement again. The state at the second measurement is |ψ_k⟩ = P_k|ψ⟩/‖P_k|ψ⟩‖, so the probability of seeing outcome k the second time is
\[
\langle\psi_k| P_k |\psi_k\rangle = \frac{\langle\psi| P_k P_k P_k |\psi\rangle}{\| P_k|\psi\rangle \|^2} = \frac{\langle\psi| P_k |\psi\rangle}{\langle\psi| P_k |\psi\rangle} = 1.
\]
That is, we see the outcome k again with certainty, and the state immediately after the second
measurement is
\[
\frac{P_k |\psi_k\rangle}{\| P_k |\psi_k\rangle \|} = \frac{|\psi_k\rangle}{\| |\psi_k\rangle \|} = |\psi_k\rangle,
\]
unchanged from after the first measurement. So the first measurement changes the state to
be consistent with whatever the outcome is, so that repetitions of the same measurement will
always yield the same outcome (provided, of course, that the state does not evolve between
measurements).
• If |ψi is a state and θ ∈ R, then eiθ |ψi is also a state. The unit norm scalar eiθ is known as a
“phase factor.” Note that
1. if U is unitary, then obviously Ueiθ |ψi = eiθ U|ψi, and
2. for the projective measurement {Pk }k∈I above, the probability of seeing k when the
system is in state eiθ |ψi is
\[
\| P_k e^{i\theta}|\psi\rangle \|^2 = |e^{i\theta}|^2 \, \| P_k|\psi\rangle \|^2 = \| P_k|\psi\rangle \|^2 ,
\]
that is, the same for the state |ψi, and finally,
3. if outcome k occurs, then the state after the measurement is
\[
\frac{P_k e^{i\theta}|\psi\rangle}{\| P_k e^{i\theta}|\psi\rangle \|} = e^{i\theta} \, \frac{P_k|\psi\rangle}{\| P_k|\psi\rangle \|},
\]
the same post-measurement state as for |ψ⟩, up to the same phase factor.
This means that the phase factor just “goes along for the ride” and does not affect the statistics of any projective measurement (or any other type of measurement, either). The states |ψ⟩ and e^{iθ}|ψ⟩ are physically indistinguishable, and so we can choose overall phase factors arbitrarily in defining a state, or we are free to ignore them as we wish. More on this later.
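The measurement rule (15) is easy to simulate. A sketch assuming NumPy, for a single qubit measured in the basis {|0⟩, |1⟩}; the particular state, phase, and seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(4)

# Projective measurement of a qubit: P_0 = |0><0|, P_1 = |1><1|
# form a complete set of projectors.
P = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

# A sample state |psi> = alpha|0> + beta|1> (unit norm).
psi = np.array([0.6, 0.8j])
probs = [np.vdot(psi, Pk @ psi).real for Pk in P]   # Pr[k] = <psi|P_k|psi>
assert np.isclose(sum(probs), 1)

# Sampling outcomes reproduces the statistics Pr[0] = 0.36, Pr[1] = 0.64.
outcomes = rng.choice(2, size=100_000, p=probs)
assert abs(outcomes.mean() - probs[1]) < 0.01

# An overall phase factor e^{i theta} does not change the statistics.
phased = np.exp(1j * 0.7) * psi
probs2 = [np.vdot(phased, Pk @ phased).real for Pk in P]
assert np.allclose(probs, probs2)
```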
We’ll now see how this all plays out for a two-dimensional system.
A Perfect Example: Electron Spin. Rotating objects possess angular momentum. The angular
momentum of an object is a vector in R3 that depends on the distribution of mass in the object
and how the object is rotating. For any given object, the length of its angular momentum vector is
proportional to the speed of the rotation (in revolutions per minute, say), and the vector’s direction
is pointing (roughly) along the axis of rotation in the direction given by the “right hand rule”: a
disk rotating counterclockwise in the x, y-plane has its angular momentum vector pointing in the
positive z-direction. A Frisbee thrown by a right-handed person (using the usual backhand flip)
rotates clockwise when viewed from above, so its angular momentum vector points down toward
the ground.
If a rotating object carries a net electric charge, then it has a magnetic moment vector that is proportional to the angular momentum times the net charge. Shooting an object with a magnetic moment through a nonuniform magnetic field imparts a force to the object, causing it to deflect and change direction. The deflection force is along the axis given by the gradient of the magnetic field and is proportional to the component of the magnetic moment along that gradient axis. You can thus measure the component of the magnetic moment along the gradient axis by seeing the amount of deflection.
Electrons deflect when shot through a nonuniform magnetic field as well, so they possess
magnetic moment. This can only mean that they have angular momentum as well, even though,
being elementary particles, they have no constituent parts that can rotate around one another. This
is just one of the many bizarre aspects of the microscopic world.
In the Stern-Gerlach experiment, randomly oriented electrons are shot through a nonuniform magnetic field whose gradient is oriented in the +z-direction (vertically). According to classical
physics, we would expect the electrons to deflect by random amounts, causing a smooth up-down
spread in the beam. Instead, what we actually observe is the beam split into two sharp beams of
roughly equal intensity: one going up, the other going down (see Figure 1). So each electron only
goes up the same amount or down the same amount. This experiment amounts to a projective
measurement of the spin of an electron, at least in the z-direction. There are two possible outcomes:
spin-up and spin-down. It is natural then to model the physical system of electron spin as a two-dimensional Hilbert space, with an orthonormal basis {|↑⟩, |↓⟩}, where
\[
|\uparrow\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}
\]
is the spin-up state and
\[
|\downarrow\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}
\]
is the spin-down state. We may also write |↑⟩ and |↓⟩ as |+z⟩ and |−z⟩,
Figure 1: Stern-Gerlach experiment: The electron beam comes in from the left, passes through a
nonuniform field between the two probes, and splits into two beams. The field gradient is oriented
along the axis of the probes, which is here given by the +z-direction.
respectively, to make clear along what axis the spin is aligned. The projectors in the projective
measurement are then
\[
P_\uparrow = P_{+z} = |\uparrow\rangle\langle\uparrow| = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},
\]
which projects onto the space spanned by |↑⟩ and corresponds to the spin-up outcome, and
\[
P_\downarrow = P_{-z} = |\downarrow\rangle\langle\downarrow| = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix},
\]
which projects onto the space spanned by |↓i and corresponds to the spin-down outcome. As we’ll
see in a little bit, a two-dimensional Hilbert space actually suffices for modeling electron spin.
7 Week 4: Qubits
Qubits. In digital information processing, the basic unit of information is the bit, short for binary
digit. Each bit has two distinct states that we care about: 0 and 1. In quantum information
processing, we use bits as well, but we regard them as quantum systems that have two states |0i
and |1i that form an orthonormal basis for a two-dimensional Hilbert space. Such systems are
called quantum bits, or qubits for short. Any two-dimensional Hilbert space will do to model a qubit.
This is why it is useful to consider the electron spin example. In fact, electron spin is one proposed
way to implement a qubit: |↑i is identified with |0i and |↓i with |1i.3 We’ll return to the electron
spin example, but what we say applies generally to any system with a two-dimensional Hilbert
space (sometimes called a “two-level system”), which can then in principle be used to implement
a qubit. To emphasize this point, we’ll use |0i and |1i to stand for |↑i and |↓i, respectively, and we’ll
let the projectors P0 and P1 stand for P↑ and P↓ , respectively.
Back to Electron Spin. Using the Stern-Gerlach apparatus oriented in a particular direction, we
can prepare electrons to have spins in that direction. We simply retain one emerging beam and
discard the other. Figure 2 shows electrons being prepared to spin in one direction in the x, z-plane,
then measured in the +z-direction.
The general state of an electron spin is
\[
|\psi\rangle = \alpha|0\rangle + \beta|1\rangle
\]
for some α, β ∈ C.
That is, it is some linear combination of spin-up and spin-down. We would now like to determine
which linear combinations correspond to which spin directions (in 3-space). Since |ψi is a unit
vector, we have
\begin{align*}
1 = \langle\psi|\psi\rangle &= (\alpha^*\langle 0| + \beta^*\langle 1|)(\alpha|0\rangle + \beta|1\rangle) \\
&= \alpha^*\alpha\,\langle 0|0\rangle + \alpha^*\beta\,\langle 0|1\rangle + \beta^*\alpha\,\langle 1|0\rangle + \beta^*\beta\,\langle 1|1\rangle \\
&= |\alpha|^2 + |\beta|^2 .
\end{align*}
If we measure the spin in the z-direction, the probability of seeing |0⟩ (spin-up) is ⟨ψ|P_0|ψ⟩ = |α|². And similarly, the probability of seeing |1⟩ (spin-down) is |β|². Since phase factors don’t matter,
we can assume from now on that α ∈ R and α ≥ 0, because we can multiply |ψ⟩ by the right phase
factor, namely e−i arg(α) .
Now consider the state |↑θ i = α|0i + β|1i prepared by the apparatus on the left of Figure 2,
corresponding to a spin pointing at angle θ from the +z-axis in the +x direction (Cartesian co-
ordinates (sin θ, 0, cos θ), which has unit length). Here 0 6 θ 6 π. When it passes through the
vertical apparatus on the right, the beam splits into two beams whose intensities are proportional
3
Another system with a two-dimensional Hilbert space is photon polarization, where we can take as our basis the
state |↔i (horizontal polarization) and the state |li (vertical polarization). All other polarization states (e.g., slanted or
circular) are linear combinations of these two.
Figure 2: Electrons are prepared by the tilted apparatus on the left to spin at an angle θ from the
+z-axis. These are then fed into a vertical apparatus.
to their probabilities. According to classical mechanics, the average deflection is proportional to
the vertical component of the spin vector, i.e., cos θ. If quantum mechanics is to agree with classical
mechanics in the macroscopic limit, then the average deflection of the two beams must also be
cos θ. The deflection of the spin-up beam is +1, and the deflection of the spin-down beam is −1,
so the average deflection is
\[
\alpha^2\cdot(+1) + |\beta|^2\cdot(-1) = \alpha^2 - (1 - \alpha^2) = 2\alpha^2 - 1.
\]
This must be cos θ, so solving for α in terms of θ and remembering that α ≥ 0, we get
\[
\alpha = \sqrt{\frac{1 + \cos\theta}{2}} = \cos\frac{\theta}{2}.
\]
Then |β|² = 1 − α² = sin²(θ/2), and so
\[
\beta = e^{i\varphi} \sin\frac{\theta}{2}
\]
for some real ϕ with 0 ≤ ϕ < 2π. In experiments, these relative intensities are actually observed.
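A quick numeric check of these intensities, assuming NumPy (theta and phi are arbitrary sample values):

```python
import numpy as np

# The state prepared at angle theta from the +z axis (phase phi arbitrary).
theta, phi = 1.0, 0.4
psi = np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

P_up = np.diag([1.0, 0.0])        # P_0 = |0><0|
P_down = np.diag([0.0, 1.0])      # P_1 = |1><1|

pr_up = np.vdot(psi, P_up @ psi).real
pr_down = np.vdot(psi, P_down @ psi).real

# Beam intensities cos^2(theta/2) and sin^2(theta/2) ...
assert np.isclose(pr_up, np.cos(theta / 2) ** 2)
assert np.isclose(pr_down, np.sin(theta / 2) ** 2)
# ... with average deflection (+1) pr_up + (-1) pr_down = cos(theta).
assert np.isclose(pr_up - pr_down, np.cos(theta))
```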
It is worth mentioning at this point that for any α ≥ 0 and β ∈ C such that α² + |β|² = 1, there are 0 ≤ θ ≤ π and 0 ≤ ϕ < 2π such that
\[
\alpha = \cos\frac{\theta}{2}, \qquad \beta = e^{i\varphi} \sin\frac{\theta}{2},
\]
giving the general spin state as
\[
|\psi\rangle = \cos\frac{\theta}{2}\,|0\rangle + e^{i\varphi}\sin\frac{\theta}{2}\,|1\rangle.
\]
Furthermore, θ and ϕ are uniquely determined by |ψi except when α = 0 or β = 0, in which case
θ = π or θ = 0, respectively, but ϕ is completely undetermined.
Now look at the case where θ = π/2, that is, the spin is pointing in the +x direction (to the
right). We get
\[
|{+x}\rangle = |\uparrow_{\pi/2}\rangle = \cos\frac{\pi}{4}\,|0\rangle + e^{i\varphi}\sin\frac{\pi}{4}\,|1\rangle = \frac{|0\rangle + e^{i\varphi}|1\rangle}{\sqrt{2}} .
\]
We are free to adjust the phase factor of |1i to absorb the eiϕ above. That is, without changing the
physics, we redefine⁴
\[
|1\rangle := e^{i\varphi}|1\rangle.
\]
By the phase adjustment we now get the “spin-right” state
\[
|{+x}\rangle = |\uparrow_{\pi/2}\rangle = |{\rightarrow}\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}} .
\]
⁴Mathematicians may not like doing this, but physicists and computer scientists aren’t bothered by it.
The corresponding one-dimensional projector is
\[
P_{+x} = P_{\rightarrow} = |{+x}\rangle\langle{+x}| = \frac{|0\rangle + |1\rangle}{\sqrt{2}} \cdot \frac{\langle 0| + \langle 1|}{\sqrt{2}} = \frac{1}{2}\big( |0\rangle\langle 0| + |0\rangle\langle 1| + |1\rangle\langle 0| + |1\rangle\langle 1| \big),
\]
which has matrix form
\[
\frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} .
\]
Now we consider the state |+yi representing spin in the +y direction. A +y spin has no
+z-component, so if |+yi is measured along the z-axis, we get Pr[↑] = Pr[↓] = 1/2, as with |+xi.
Thus,
\[
|{+y}\rangle = \frac{|0\rangle + e^{i\varphi}|1\rangle}{\sqrt{2}} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ e^{i\varphi} \end{pmatrix},
\]
for some 0 6 ϕ < 2π. If we now measure a +y spin in the +x direction, we should again get equal
probabilities of spin-left and spin-right, since the spin is perpendicular to x. Thus we should have
\[
\frac{1}{2} = \Pr[\rightarrow] = \langle{+y}| P_{\rightarrow} |{+y}\rangle
= \frac{1}{4}\begin{pmatrix} 1 & e^{-i\varphi} \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ e^{i\varphi} \end{pmatrix}
= \frac{1 + \cos\varphi}{2} .
\]
So cos ϕ = 0, and it follows that ϕ ∈ {π/2, 3π/2} and so eiϕ = ±i. It does not matter which value
we choose; the math and physics is equivalent either way. So we’ll arbitrarily set ϕ := π/2, whence
we get
\[
|{+y}\rangle = \frac{|0\rangle + i|1\rangle}{\sqrt{2}} .
\]
The corresponding projector is
\[
P_{+y} = |{+y}\rangle\langle{+y}| = \frac{|0\rangle + i|1\rangle}{\sqrt{2}} \cdot \frac{\langle 0| - i\langle 1|}{\sqrt{2}} = \frac{1}{2}\big( |0\rangle\langle 0| - i|0\rangle\langle 1| + i|1\rangle\langle 0| + |1\rangle\langle 1| \big),
\]
which has matrix form
\[
\frac{1}{2}\begin{pmatrix} 1 & -i \\ i & 1 \end{pmatrix} .
\]
Let’s review:
\begin{align}
|{+x}\rangle &= \frac{|0\rangle + |1\rangle}{\sqrt{2}} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \tag{18} \\
|{+y}\rangle &= \frac{|0\rangle + i|1\rangle}{\sqrt{2}} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix}, \tag{19} \\
|{+z}\rangle &= |0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}. \tag{20}
\end{align}
The corresponding projectors are
1 1 1 1
P+x = |+xih+x| = = (I + X),
2 1 1 2
1 1 −i 1
P+y = |+yih+y| = = (I + Y),
2 i 1 2
1 0 1
P+z = |+zih+z| = = (I + Z),
0 0 2
where

X = σx = σ1 = 2P+x − I = [0 1; 1 0],  (21)
Y = σy = σ2 = 2P+y − I = [0 −i; i 0],  (22)
Z = σz = σ3 = 2P+z − I = [1 0; 0 −1].  (23)
X, Y, and Z are known as the Pauli spin matrices. More on them later.
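The defining identities above are easy to check numerically. Here is a quick sketch in Python (NumPy assumed; this is an illustrative aside, not part of the derivation):

```python
# Numerical sanity check of the Pauli matrices and the projector
# identities P_{+x} = (I + X)/2, etc.
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Each Pauli matrix squares to the identity and is traceless.
for P in (X, Y, Z):
    assert np.allclose(P @ P, I2)
    assert abs(np.trace(P)) < 1e-12

# The spin-right state (|0> + |1>)/sqrt(2) and its projector (I + X)/2.
plus_x = np.array([1, 1], dtype=complex) / np.sqrt(2)
P_plus_x = np.outer(plus_x, plus_x.conj())
assert np.allclose(P_plus_x, (I2 + X) / 2)
```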
Now consider a general spin state, written in terms of θ and ϕ:

|ψ⟩ = |↑θ,ϕ⟩ = cos(θ/2)|0⟩ + e^{iϕ} sin(θ/2)|1⟩ = [cos(θ/2); e^{iϕ} sin(θ/2)].

(Recall that 0 ≤ θ ≤ π and 0 ≤ ϕ < 2π are arbitrary.) The direction of this spin is given by a vector
s = (xs, ys, zs) in 3-space with Cartesian coordinates xs, ys, zs ∈ R. How do we find xs, ys, zs? We
know that these values are the average deflections observed when the spin is measured in the +x,
+y, or +z axes, respectively. So generalizing Equation (17), we must have

xs = ⟨ψ|X|ψ⟩ = sin θ cos ϕ,  (24)
ys = ⟨ψ|Y|ψ⟩ = sin θ sin ϕ,  (25)
zs = ⟨ψ|Z|ψ⟩ = cos θ.  (26)

Thus s is exactly the point on the unit sphere whose spherical coordinates are (θ, ϕ).⁵
Exercise 7.1 Verify Equations (24–26) using matrix multiplication and trig.
Exercise 7.2 What is the spin direction corresponding to the state (√3|0⟩ − |1⟩)/2? Express your
answer as simply as possible.
Exercise 7.3 What spin state corresponds to the direction s = (−2/3, 2/3, 1/3)? Express your
answer as simply as possible.

Exercise 7.4 (Very useful!) Show that if |ψ⟩ is a general spin state corresponding to the direction
s = (xs, ys, zs) as described above, then

|ψ⟩⟨ψ| = (1/2)(I + xs X + ys Y + zs Z).  (27)

The right-hand side is sometimes written as (1/2)(I + s · σ), abusing the dot product notation.
⁵ Each vector s on the unit sphere can be described using spherical coordinates, i.e., two angles θ and ϕ, where
0 ≤ θ ≤ π is the angle between s and the +z axis (the “latitude” of s, measured down from the North Pole), and
0 ≤ ϕ < 2π is the angle one would have to swivel the x,z-plane counterclockwise around the +z axis until it hits s (the
“longitude” of s, measured east of Greenwich, i.e., east of the x,z-plane). If s has spherical coordinates (θ, ϕ), then its
Cartesian coordinates are (sin θ cos ϕ, sin θ sin ϕ, cos θ).
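The correspondence between a spin state and its direction can be checked numerically. The sketch below (Python, NumPy assumed; the test angles are arbitrary) verifies that the averages ⟨ψ|X|ψ⟩, ⟨ψ|Y|ψ⟩, ⟨ψ|Z|ψ⟩ are the Cartesian coordinates (sin θ cos ϕ, sin θ sin ϕ, cos θ):

```python
# For |psi> = cos(t/2)|0> + e^{i*phi} sin(t/2)|1>, the three Pauli
# expectation values should be the Cartesian coordinates of s.
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

theta, phi = 1.1, 2.3  # arbitrary test angles
psi = np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

s = np.array([np.real(psi.conj() @ A @ psi) for A in (X, Y, Z)])
expected = np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])
assert np.allclose(s, expected)
```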
8 Week 4: Density operators
Density Operators. One problem with using a vector |ψ⟩ to represent a physical state is that the
vector carries more information than is physically relevant, namely, an overall phase factor. The
physically relevant portion of |ψ⟩ is really just the one-dimensional subspace that it spans (which
does not depend on any phase factors), or equivalently, the projector |ψ⟩⟨ψ| that orthogonally
projects onto that subspace. For this and other reasons, one may define the state of a system
to be a one-dimensional projection operator ρ = |ψ⟩⟨ψ| instead of a vector |ψ⟩. This alternate
view of states is known as the density operator formalism, and ρ is known as a density operator or
density matrix. Besides the advantage of discarding the physically irrelevant phase information,
this formalism has other advantages that we will see later when we discuss quantum information
theory. For many of the tasks at hand, however, either formalism will suffice, and we will use both
as is convenient.
We need to describe the two basic physical processes that we have discussed—time evolution
and projective measurement—in terms of the density operator formalism.
Time evolution of an isolated system. In the original formalism, time evolution is described by
a unitary operator U such that any state |ψ⟩ evolves to a state U|ψ⟩ in the given interval of
time. In the new density operator formalism, the state ρ = |ψ⟩⟨ψ| would evolve under U to
the new state

ρ′ = UρU∗.  (28)

To see why this is so, we merely observe that the new state should be |ϕ⟩⟨ϕ|, where |ϕ⟩ = U|ψ⟩.
We get

ρ′ = |ϕ⟩⟨ϕ| = U|ψ⟩⟨ψ|U∗ = UρU∗.
Projective measurement. Suppose we are given a complete set {Pk : k ∈ I} of projectors corre-
sponding to a projective measurement. In the original formalism, if the system is in state
|ψ⟩ before the measurement, then the probability of outcome k is ⟨ψ|Pk|ψ⟩. Since this prob-
ability is physically relevant (we can collect statistics over many identical experiments), we
had better get the same probability in the new formalism: when the state of the system is
ρ = |ψ⟩⟨ψ| before the measurement, the probability of outcome k is given by

Pr[k] = tr(Pk ρ) = ⟨Pk, ρ⟩,  (29)

where the right-hand side refers to the Hilbert-Schmidt inner product on L(H) (see Equa-
tion (11)). To see that this is the same as in the original formulation, we can use the form
of the trace given by Equation (13), where we choose an orthonormal basis {e1, ..., en} such
that e1 = |ψ⟩. Letting |i⟩ = ei for all i as before (and so |ψ⟩ = |1⟩), we then get

tr(Pk ρ) = Σ_{i=1}^{n} ⟨i|Pk ρ|i⟩ = Σ_i ⟨i|Pk|ψ⟩⟨ψ|i⟩ = Σ_i ⟨i|Pk|1⟩⟨1|i⟩ = ⟨1|Pk|1⟩ = ⟨ψ|Pk|ψ⟩,
which is the same as originally defined. Alternatively, we can use the commuting property
of the trace (Equation (4)) to get the same thing:

tr(Pk ρ) = tr(Pk |ψ⟩⟨ψ|) = tr(⟨ψ|Pk|ψ⟩) = ⟨ψ|Pk|ψ⟩.

That last equation holds because ⟨ψ|Pk|ψ⟩ is just a scalar (a 1 × 1 matrix). Assuming the
outcome is k, the state after the measurement should be ρk = |ψk⟩⟨ψk|, where |ψk⟩ =
Pk|ψ⟩/‖Pk|ψ⟩‖. This simplifies:

ρk = Pk|ψ⟩⟨ψ|Pk / ⟨ψ|Pk|ψ⟩ = Pk ρ Pk / tr(Pk ρ).  (30)

Note that tr(Pk ρ) = tr(Pk² ρ) = tr(Pk ρ Pk), so the denominator in (30), i.e., the probability of
getting the outcome k, is the trace of the numerator. (Obviously, ρk is undefined if Pr[k] = 0,
but if that’s the case, we’d never see outcome k.)
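As a numerical sanity check of the measurement rules in the two formalisms, the following sketch (Python, NumPy assumed; the state and projector are arbitrary choices) compares tr(Pk ρ) with ⟨ψ|Pk|ψ⟩ and the two post-measurement states:

```python
# Check that tr(P_k rho) reproduces <psi|P_k|psi>, and that
# P_k rho P_k / tr(P_k rho) matches |psi_k><psi_k|.
import numpy as np

psi = np.array([0.6, 0.8j])              # a unit vector in C^2
rho = np.outer(psi, psi.conj())          # rho = |psi><psi|

P0 = np.diag([1.0, 0.0]).astype(complex) # projector onto |0>
pr_trace = np.trace(P0 @ rho).real
pr_inner = (psi.conj() @ P0 @ psi).real
assert np.isclose(pr_trace, pr_inner)    # Pr[k] agrees in both formalisms

rho_k = P0 @ rho @ P0 / pr_trace         # density-operator update
psi_k = P0 @ psi / np.linalg.norm(P0 @ psi)
assert np.allclose(rho_k, np.outer(psi_k, psi_k.conj()))
```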
Exercise 8.1 Show that if |ψ1⟩ and |ψ2⟩ are unit vectors, and ρ1 = |ψ1⟩⟨ψ1| and ρ2 = |ψ2⟩⟨ψ2|, then

⟨ρ1, ρ2⟩ = tr(ρ1 ρ2) = |⟨ψ1|ψ2⟩|².
Properties of the Pauli Operators. The operators X, Y, Z defined in (21–23) play a prominent
role in quantum mechanics and quantum informatics. Here we’ll present their most important
properties in one place for ease of reference. All of these facts are easy to verify, and we leave that
for the exercises.
1. X² = Y² = Z² = I.
2. X, Y, and Z are Hermitean and unitary.
3. XY = iZ = −YX, YZ = iX = −ZY, ZX = iY = −XZ.
4. tr X = tr Y = tr Z = 0.
Note that there is a cyclic symmetry among the Pauli matrices. If we simultaneously substitute
X 7→ Y, Y 7→ Z, and Z 7→ X everywhere in the equations above, we get the same equations. We
won’t pursue it here, but you can use the Pauli operators to represent the quaternions H.
The four 2 × 2 matrices I, X, Y, Z (also denoted σ0, σ1, σ2, σ3, respectively) form a basis for the
space L(C²) of all operators over C² (i.e., 2 × 2 matrices over C). That is, for any 2 × 2 matrix A,
there are unique coefficients a0, a1, a2, a3 ∈ C such that

A = a0 I + a1 X + a2 Y + a3 Z = Σ_{i=0}^{3} ai σi.  (31)

The coefficients can often be found by inspection, but there is a brute force method to find them:
using the fact that

⟨σi, σj⟩ = 2δij,  (32)

we get ai = (1/2)⟨σi, A⟩ = (1/2) tr(σi A) for each 0 ≤ i ≤ 3.
Exercise 8.3 Verify Equation (32).
Exercise 8.4 Show that if A = xX + yY + zZ for real numbers x, y, z such that x² + y² + z² = 1, then
A² = I. Thus A is both Hermitean and unitary.
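The brute-force recipe for the Pauli coefficients can be tried out directly. A sketch (Python, NumPy assumed; the test matrix is an arbitrary choice):

```python
# Pauli-basis coefficients via a_i = (1/2) tr(sigma_i A), which follows
# from <sigma_i, sigma_j> = 2 delta_ij.
import numpy as np

sigma = [np.eye(2, dtype=complex),
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

A = np.array([[1 + 2j, 3], [4j, -5]], dtype=complex)  # arbitrary 2x2 matrix
a = [np.trace(s.conj().T @ A) / 2 for s in sigma]

# Reassemble A from its Pauli coefficients.
assert np.allclose(sum(ai * si for ai, si in zip(a, sigma)), A)
```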
Single-Qubit Unitary Operators. In this topic, we show that applying any unitary operator to
a one-qubit system amounts to a rigid rotation in R³, and conversely, any rigid rotation in R³
corresponds to a unitary operator. We’ve seen that a general one-qubit state can be written, up to
an overall phase factor, as

|ψ⟩ = |↑θ,ϕ⟩ = cos(θ/2)|0⟩ + e^{iϕ} sin(θ/2)|1⟩,

for some 0 ≤ θ ≤ π and some 0 ≤ ϕ < 2π, and that this state corresponds uniquely (and vice
versa) to the point s on the unit sphere in R³ with spherical coordinates (θ, ϕ) (and thus with
Cartesian coordinates (xs, ys, zs) = (sin θ cos ϕ, sin θ sin ϕ, cos θ)). (Think of s as the spin direction
of an electron, for example.) The unit sphere in question here is known as the Bloch sphere. We’ll
now show that the action of a unitary operator U on one-qubit states amounts to a rigid rotation
SU of the Bloch sphere.
It’s slightly more convenient to work in the density operator formalism, using Equation (27).
Given any one-qubit unitary operator U, we define the map SU from the Bloch sphere onto itself
as follows: For any point s = (xs, ys, zs) on the Bloch sphere (s is a vector in R³ of length 1), let

ρs = (1/2)(I + xs X + ys Y + zs Z)

be the corresponding one-qubit state, according to Equation (27). Then let

ρt = U ρs U∗

be the state obtained by evolving the system in state ρs by U. The state ρt can be written as

ρt = (1/2)(I + xt X + yt Y + zt Z),

for some unique t = (xt, yt, zt) on the Bloch sphere. We now define SU(s) to be this t.⁶
It is immediate from the definition that for unitaries U and V we have SUV = SU SV .
To show that SU rotates the sphere rigidly, we first show that SU preserves dot products of
vectors on the Bloch sphere, that is, SU (r) · SU (s) = r · s for any r and s on the Bloch sphere. This
implies that SU is a rigid map of the Bloch sphere onto itself, but it does not imply that SU is a
rotation, because SU might be orientation-reversing, e.g., a reflection. We’ll see that SU preserves
orientation (aka chirality, aka “handedness”), so that it must be a rotation.7
Let r = (r1, r2, r3) and s = (s1, s2, s3) be any two points on the Bloch sphere, with corresponding
states ρr = (1/2) Σ_{i=0}^{3} ri σi and ρs = (1/2) Σ_{j=0}^{3} sj σj as above, where we define r0 = s0 = 1. Recall
that the dot product of r and s is r · s = r1 s1 + r2 s2 + r3 s3. Let’s compute ⟨ρr, ρs⟩ using Equation (32):

⟨ρr, ρs⟩ = ⟨(1/2) Σ_{i=0}^{3} ri σi, (1/2) Σ_{j=0}^{3} sj σj⟩
= (1/4) Σ_{i,j} ri sj ⟨σi, σj⟩ = (1/4) Σ_{i,j} ri sj (2δij)
= (1/2) Σ_{i=0}^{3} ri si = (1/2)(1 + Σ_{i=1}^{3} ri si)
= (1 + r · s)/2,

so

r · s = 2⟨ρr, ρs⟩ − 1.  (33)
Since r and s were arbitrary, we should also have

SU(r) · SU(s) = 2⟨ρ_{SU(r)}, ρ_{SU(s)}⟩ − 1,

but now,

⟨ρ_{SU(r)}, ρ_{SU(s)}⟩ = ⟨U ρr U∗, U ρs U∗⟩ = tr(U ρr∗ U∗ U ρs U∗) = tr(ρr∗ ρs) = ⟨ρr, ρs⟩,

and hence SU(r) · SU(s) = 2⟨ρr, ρs⟩ − 1 = r · s, as desired.
Now is perhaps a good time to clear up some confusion that may arise about points on the
Bloch sphere. Letting |ψr⟩ and |ψs⟩ be such that ρr = |ψr⟩⟨ψr| and ρs = |ψs⟩⟨ψs|, then combining
Equation (33) with Exercise 8.1 above, we get

|⟨ψr|ψs⟩|² = ⟨ρr, ρs⟩ = (1 + r · s)/2.

Thus ⟨ψr|ψs⟩ = 0 iff r · s = −1. In other words, qubit states that are orthogonal in the Hilbert space
correspond to antipodal (opposite) points on the Bloch sphere. We kind of knew this already,
since the two possible outcomes of the Stern-Gerlach spin measurement (in any direction) are
opposite spins (e.g., |↑⟩ and |↓⟩), and must (as with any projective measurement) correspond to
orthogonal states.
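Orthogonality of antipodal states can also be seen numerically. A sketch (Python, NumPy assumed; the test angles are arbitrary):

```python
# States at antipodal Bloch points are orthogonal: the antipode of
# spherical coordinates (theta, phi) is (pi - theta, phi + pi).
import numpy as np

def ket(theta, phi):
    """|up_{theta,phi}> = cos(theta/2)|0> + e^{i*phi} sin(theta/2)|1>."""
    return np.array([np.cos(theta / 2),
                     np.exp(1j * phi) * np.sin(theta / 2)])

theta, phi = 0.7, 1.9
psi = ket(theta, phi)
psi_anti = ket(np.pi - theta, phi + np.pi)   # antipodal direction
assert abs(psi.conj() @ psi_anti) < 1e-12    # inner product vanishes
```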
Before showing that SU must preserve orientation, we’ll show that for any rigid rotation S of
the Bloch sphere, there is a unitary U such that S = SU. Geometrically, any rotation S of the unit
sphere can be decomposed into a sequence of three simple rotations:

1. a counterclockwise rotation Sz(ψ) about the +z axis through an angle ψ where 0 ≤ ψ < 2π,
2. followed by a counterclockwise rotation Sy(θ) about the +y axis through an angle θ where 0 ≤ θ ≤ π, and
3. another counterclockwise rotation Sz(ϕ), about the +z axis, this time through an angle ϕ
where 0 ≤ ϕ < 2π.
The last two rotations have the effect of moving the North Pole (i.e., the point (0, 0, 1)) to an
arbitrary point on the sphere (with spherical coordinates (θ, ϕ)) in a standard way. The only
remaining freedom left in choosing S is an initial rotation that fixes the North Pole, i.e., the first
rotation above. The three angles ϕ, θ, ψ are uniquely determined by S (except when θ = 0 or
θ = π), and are called the Euler angles of S.
So S = Sz(ϕ)Sy(θ)Sz(ψ), and so to implement S, we only need to find unitaries for rotations
around the +z and +y axes. For any angle ϕ, define

Rz(ϕ) = [e^{−iϕ/2} 0; 0 e^{iϕ/2}].  (34)

Rz(ϕ) is unitary, and for any state |↑θ,α⟩ we have

Rz(ϕ)|↑θ,α⟩ = e^{−iϕ/2}(cos(θ/2)|0⟩ + e^{i(α+ϕ)} sin(θ/2)|1⟩) ∝ |↑θ,α+ϕ⟩,

and so if U = Rz(ϕ), then SU = Sz(ϕ). (Here and elsewhere, we use the expression A ∝ B to mean
that A and B may differ only by an overall phase factor, i.e., there exists an angle ω ∈ R such that
A = e^{iω}B.) For any angle θ, define

Ry(θ) = [cos(θ/2) −sin(θ/2); sin(θ/2) cos(θ/2)].  (35)
Ry (θ) is unitary, and it is straightforward to show that if U = Ry (θ), then SU = Sy (θ). Thus any
rotation S can be realized as SU for some unitary U. Later, we will see a direct way of translating
between a 1-qubit unitary U and its corresponding rotation SU. For completeness, we define a
unitary corresponding to rotation of ϕ counterclockwise about the x-axis:

Rx(ϕ) = Ry(π/2) Rz(ϕ) Ry(−π/2) = [cos(ϕ/2) −i sin(ϕ/2); −i sin(ϕ/2) cos(ϕ/2)].  (36)
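The rotation unitaries (34)-(36) are easy to experiment with numerically. A sketch (Python, NumPy assumed; the angles are arbitrary) that checks unitarity of an Euler-angle product and the identity in Equation (36):

```python
# The rotation unitaries Rz, Ry, Rx and the Euler composition
# Rz(phi) Ry(theta) Rz(psi).
import numpy as np

def Rz(phi):
    return np.diag([np.exp(-1j * phi / 2), np.exp(1j * phi / 2)])

def Ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def Rx(phi):
    c, s = np.cos(phi / 2), np.sin(phi / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

phi, theta, psi = 0.4, 1.2, 2.5
U = Rz(phi) @ Ry(theta) @ Rz(psi)        # Euler-angle composition
assert np.allclose(U.conj().T @ U, np.eye(2))               # unitary
assert np.allclose(Rx(phi), Ry(np.pi / 2) @ Rz(phi) @ Ry(-np.pi / 2))
```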
Now to show that SU must preserve orientation, we show that the orientation-reversing map
M that maps each point (x, y, z) on the Bloch sphere to its antipodal point (−x, −y, −z) is not of
the form SU for any unitary U. This suffices, because if S is any orientation-reversing rigid map
of the Bloch sphere, then S⁻¹ is also rigid and orientation-reversing, which means that the map
T = MS⁻¹ is orientation-preserving and hence a rotation. Therefore, T = SV for some unitary V,
as we just showed. But then M = TS, and so if we assume that S = SW for some unitary W, we
then have M = SV SW = S_{VW}, a contradiction.
Suppose M = SU for some unitary U. Then, since U must reverse the directions of all spins,
we must have, for example, U|0⟩ = U|+z⟩ ∝ |−z⟩ = |1⟩ and U|1⟩ = U|−z⟩ ∝ |+z⟩ = |0⟩. Expressing
U as a matrix in the {|+z⟩, |−z⟩} basis, we must then have

U = [0 e^{iσ}; e^{iτ} 0]

for some σ, τ ∈ R. Now let ω := (τ − σ)/2 and consider the state

|ψ⟩ = (|0⟩ + e^{iω}|1⟩)/√2,

which corresponds to some point p on the equator of the Bloch sphere. We have

U|ψ⟩ = (e^{iσ}e^{iω}|0⟩ + e^{iτ}|1⟩)/√2 = e^{i(σ+ω)}(|0⟩ + e^{iω}|1⟩)/√2 ∝ |ψ⟩.

So U does not change the state |ψ⟩ more than by a phase factor, and thus SU leaves the point p
fixed, which means that SU ≠ M, a contradiction.⁸
Example. X, Y, and Z are unitary, so what are SX , SY , and SZ ? It’s easy to check that X = iRx (π),
Y = iRy (π), and Z = iRz (π), and so (since phase factors don’t matter) SX , SY , and SZ are rotations
about the x-, y-, and z-axes, respectively, through the angle π (half a revolution).
Exercise 8.5 Prove the claim that if U = Ry(θ), then SU = Sy(θ). [Hint: First check that
Ry(θ)|+y⟩ ∝ |+y⟩, and thus SU fixes the point (0, 1, 0) where the Bloch sphere intersects the +y-
axis. Then check that Ry(θ)|+z⟩ = Ry(θ)|0⟩ ∝ cos(θ/2)|0⟩ + sin(θ/2)|1⟩ = |↑θ,0⟩, so SU moves the
point (0, 0, 1) to the point (sin θ, 0, cos θ). Finally, check that Ry(θ)|+x⟩ = Ry(θ)|↑π/2,0⟩ ∝ cos(θ/2 +
π/4)|0⟩ + sin(θ/2 + π/4)|1⟩ = |↑θ+π/2,0⟩, so SU moves the point (1, 0, 0) to (cos θ, 0, −sin θ).]
⁸ This result has physical significance. It says that there is no single physical process that can reverse the spin of any
isolated electron.
A direct translation. Here we give (without proof) a direct way to pass between the 2 × 2 unitary
matrix U and the corresponding rotation SU (a 3 × 3 real matrix), and vice versa, using the Pauli
matrices. First we give some basic facts about rotations of Rⁿ in general and of R³ in particular
(some of which we’ve seen before). You can skip this list if you want.

1. An n × n matrix S with real entries gives a rigid (length- and angle-preserving) transformation
of Rⁿ iff

S^T S = I,  (37)

or equivalently, S S^T = I. (Here S^T denotes the transpose of S, and I denotes the n × n identity
matrix.) In this case, we also say that S is an orthogonal matrix. Notice that, since S is real,
S^T = S∗, and so S is orthogonal iff S is unitary.
2. Any orthogonal matrix has determinant ±1. If the determinant is +1, then the transformation
is orientation-preserving (i.e., a rotation); otherwise it is orientation-reversing.
3. Let S be a real n × n matrix. If S is a rotation of Rⁿ, then S = S^c ≠ 0. (Here, S^c denotes the
cofactor matrix of S.) The converse also holds if n > 2.
4. If n is odd, then every rotation S of Rⁿ has 1 as an eigenvalue. That is, S fixes some nonzero
vector n̂ ∈ Rⁿ. Thus if n = 3, then S moves points around some fixed axis (through the
vector n̂ ∈ R³) counterclockwise through some angle ψ ∈ [0, π]—when viewing the origin
from n̂. (If you want π < ψ < 2π, then this is just the same as a counterclockwise rotation
around −n̂ through angle 2π − ψ, which is in the interval [0, π].)
5. If S is a rotation of R³, then −1 ≤ tr S ≤ 3, and the angle of rotation (described above) is
given by

ψ = cos⁻¹((tr S − 1)/2).  (38)

In particular, tr S = 3 just when ψ = 0, that is, when S = I. Also, tr S = −1 just when ψ = π.
In either of these two special cases, S² = I, which implies that S is a symmetric matrix,
because S = S⁻¹ = S^T. If 0 < ψ < π (the general case), then S is not symmetric.
6. If S is as above and the angle of rotation ψ satisfies 0 < ψ < π, then the vector n̂ is unique
up to multiplication by a positive scalar. It can be chosen to be

n̂ = ([S]32 − [S]23, [S]13 − [S]31, [S]21 − [S]12).  (39)

The norm of this particular vector is ‖n̂‖ = √((1 + tr S)(3 − tr S)) = 2 sin ψ.
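Equations (38) and (39) can be exercised on a concrete rotation. A sketch (Python, NumPy assumed; the example is a rotation about +z through an arbitrary angle):

```python
# Extract the rotation angle (38) and axis (39) from a 3x3 rotation
# matrix; here a rotation about +z through 0.9 rad.
import numpy as np

a = 0.9
S = np.array([[np.cos(a), -np.sin(a), 0.0],
              [np.sin(a),  np.cos(a), 0.0],
              [0.0, 0.0, 1.0]])

psi = np.arccos((np.trace(S) - 1) / 2)          # Equation (38)
n = np.array([S[2, 1] - S[1, 2],                # Equation (39),
              S[0, 2] - S[2, 0],                # written 0-indexed
              S[1, 0] - S[0, 1]])
assert np.isclose(psi, a)
assert np.allclose(n / np.linalg.norm(n), [0.0, 0.0, 1.0])
assert np.isclose(np.linalg.norm(n), 2 * np.sin(psi))
```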
From unitaries to rotations. First, given a 2 × 2 unitary U, the corresponding rotation is given
by the matrix

SU = (1/2) [ ⟨X, UXU∗⟩  ⟨X, UYU∗⟩  ⟨X, UZU∗⟩
             ⟨Y, UXU∗⟩  ⟨Y, UYU∗⟩  ⟨Y, UZU∗⟩
             ⟨Z, UXU∗⟩  ⟨Z, UYU∗⟩  ⟨Z, UZU∗⟩ ]
   = (1/2) [ ⟨XU, UX⟩  ⟨XU, UY⟩  ⟨XU, UZ⟩
             ⟨YU, UX⟩  ⟨YU, UY⟩  ⟨YU, UZ⟩
             ⟨ZU, UX⟩  ⟨ZU, UY⟩  ⟨ZU, UZ⟩ ],

recalling that ⟨A, B⟩ = tr(A∗B) is the Hilbert-Schmidt inner product on L(C²). That is, for all
i, j ∈ {1, 2, 3},

[SU]ij = (1/2)⟨σi, U σj U∗⟩ = (1/2) tr(σi U σj U∗).
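The entrywise formula is convenient to compute with. A sketch (Python, NumPy assumed; the example unitary is an arbitrary Rz-type matrix):

```python
# Compute [S_U]_{ij} = (1/2) tr(sigma_i U sigma_j U*) and confirm that
# S_U is a rotation (orthogonal with determinant +1).
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = (X, Y, Z)

U = np.array([[np.exp(-0.35j), 0], [0, np.exp(0.35j)]])  # an Rz-type unitary

S = np.array([[0.5 * np.trace(si @ U @ sj @ U.conj().T).real
               for sj in paulis] for si in paulis])
assert np.allclose(S.T @ S, np.eye(3))      # orthogonal
assert np.isclose(np.linalg.det(S), 1.0)    # orientation-preserving
```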
Exercise 8.6 Find the 3 × 3 matrix SU where

U = (1/5)[3 −4i; −4i 3].

Also find n̂ (exact expression) and ψ (decimal approximation to three significant digits).
From rotations to unitaries. Now given some rotation S of R³ (S is a 3 × 3 real matrix), we find
a U such that S = SU. There are two cases:

• If tr S ≠ −1, then

U ∝ (1/(2√(1 + tr S))) [(1 + tr S)I + i(([S]23 − [S]32)X + ([S]31 − [S]13)Y + ([S]12 − [S]21)Z)]
  = (cos(ψ/2))I − i(n̂ · σ)/(4 cos(ψ/2)) = (cos(ψ/2))I − i sin(ψ/2)(m̂ · σ) = e^{−iψ(m̂·σ)/2},

where ψ and n̂ are given by Equations (38) and (39), respectively, and m̂ := n̂/‖n̂‖ is the
normalized version of n̂. Here we use the fact that √(1 + tr S) = 2 cos(ψ/2). I’ll explain the
last equation in the chain more fully next time. These expressions give the unique U with
positive trace such that SU = S (and in addition, det U = 1).
• If tr S = −1, then any one of the following three alternatives is a characterization of all U
such that S = SU, provided it is well-defined:

U ∝ (1 + [S]11)X + [S]21 Y + [S]31 Z,
U ∝ [S]12 X + (1 + [S]22)Y + [S]32 Z,
U ∝ [S]13 X + [S]23 Y + (1 + [S]33)Z.

The i-th expression above (for i ∈ {1, 2, 3}) is well-defined iff [S]ii ≠ −1. This is true for at
least one of the three for any rotation S of R³ with trace −1.
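The tr S ≠ −1 formula can be tested by a round trip: build U from S, then recover S via [SU]ij = (1/2) tr(σi U σj U∗). A sketch (Python, NumPy assumed; the example rotation is an arbitrary choice with tr S ≠ −1):

```python
# Reconstruct a unitary U from a rotation S (generic case tr S != -1)
# and verify the round trip S_U = S.
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

a = 0.7                                  # rotation about +z; tr S = 1 + 2 cos a
S = np.array([[np.cos(a), -np.sin(a), 0],
              [np.sin(a),  np.cos(a), 0],
              [0, 0, 1.0]])

t = np.trace(S)
U = ((1 + t) * I2 + 1j * ((S[1, 2] - S[2, 1]) * X +
                          (S[2, 0] - S[0, 2]) * Y +
                          (S[0, 1] - S[1, 0]) * Z)) / (2 * np.sqrt(1 + t))
assert np.allclose(U.conj().T @ U, I2)   # U is unitary

paulis = (X, Y, Z)
S_back = np.array([[0.5 * np.trace(si @ U @ sj @ U.conj().T).real
                    for sj in paulis] for si in paulis])
assert np.allclose(S_back, S)            # round trip recovers S
```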
9 Week 5: Linear Algebra: Exponential Map, Spectral Theorem, etc.
The Exponential Map (Again). Equation (2) defines e^z via a power series for all scalars z. We
can use the same power series to extend the definition to operators:

e^A = Σ_{k=0}^{∞} A^k / k!.  (40)

(A⁰ = I by convention.) If A is an operator or matrix, then so is e^A. The sum in (40) converges
absolutely⁹ for all A.
The exponential map has many useful properties. Here’s one of the most useful, which generalizes
the familiar rule that e^{z1+z2} = e^{z1} e^{z2} for scalars z1, z2.

Theorem. If A and B are operators (or square matrices) that commute with each other (AB = BA), then

e^{A+B} = e^A e^B.
Proof. This closely mirrors the standard proof for scalars. We manipulate the power series directly.
Since A commutes with B, we can expand and rearrange factors in the expression (A + B)^k to arrive
at an operator version of the Binomial Theorem:

(A + B)^k = Σ_{j=0}^{k} (k!/(j!(k − j)!)) A^j B^{k−j}.

Therefore,

e^{A+B} = Σ_k (A + B)^k / k! = Σ_k Σ_{j=0}^{k} A^j B^{k−j} / (j!(k − j)!)
= Σ_k Σ_{j,ℓ≥0, j+ℓ=k} A^j B^ℓ / (j! ℓ!)   (setting ℓ := k − j)
= Σ_{j=0}^{∞} Σ_{ℓ=0}^{∞} A^j B^ℓ / (j! ℓ!) = (Σ_{j=0}^{∞} A^j / j!)(Σ_{ℓ=0}^{∞} B^ℓ / ℓ!) = e^A e^B.  □

We’ll leave the other properties of e^A as exercises.
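The commuting-exponentials rule can be checked numerically with a truncated power series for e^A. A sketch (Python, NumPy assumed; the 40-term cutoff is an arbitrary choice that is ample for these small matrices):

```python
# Verify e^{A+B} = e^A e^B for commuting matrices, and see it fail for
# a noncommuting pair, using a truncated power series.
import numpy as np

def expm_series(A, terms=40):
    """Partial sum of sum_k A^k / k! (fine for small-norm matrices)."""
    result = np.eye(A.shape[0], dtype=complex)
    term = np.eye(A.shape[0], dtype=complex)
    for k in range(1, terms):
        term = term @ A / k
        result = result + term
    return result

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.3, 0.0], [0.0, 0.3]])   # a scalar matrix commutes with A
assert np.allclose(A @ B, B @ A)
assert np.allclose(expm_series(A + B), expm_series(A) @ expm_series(B))

C = np.array([[0.0, 0.0], [1.0, 0.0]])   # does NOT commute with A
assert not np.allclose(expm_series(A + C), expm_series(A) @ expm_series(C))
```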
⁹ We won’t delve deeply into what it means for an infinite sequence of operators A1, A2, ... to converge (absolutely
or otherwise). One easy way to express the notion of convergence (among several equivalent ways) is to say that there
exists an operator A such that for all vectors v, the sequence of vectors A1 v, A2 v, ... converges to Av. The operator A,
if it exists, must be unique, and we write A = lim_{n→∞} An. Convergence of an infinite series of operators is equivalent
to the convergence of the sequence of partial sums, as usual. Absolute convergence, which we don’t bother to define
here, implies that you can regroup and rearrange terms in the sum freely without worry.
Exercise 9.3 Verify the following for any operators or square matrices A and B and any θ ∈ R:

1. (e^A)∗ = e^{A∗}.
2. If B is invertible, then e^{BAB⁻¹} = B e^A B⁻¹.
3. If A² = I, then e^{iθA} = (cos θ)I + i(sin θ)A.
Exercise 9.4 (Challenging) Let n̂ = (x, y, z) ∈ R³ be such that x² + y² + z² = 1, and let A = xX + yY + zZ.
Then A² = I by Exercise 8.4. For angle ω ∈ R, define
Rn̂ (ω) = e−iωA/2 = (cos(ω/2))I − i(sin(ω/2))A.
Show that if U = Rn̂ (ω), then SU is a rotation of the Bloch sphere about the axis through n̂
counterclockwise through angle ω. [Hint: Observe that rotating around n̂ through angle ω is
equivalent to
1. rotating the sphere so that n̂ coincides with (0, 0, 1) on the +z-axis, then
2. rotating around the +z-axis counterclockwise through angle ω, then
3. undoing the rotation in item 1 above, which moves (0, 0, 1) back to n̂.
(Let (θ, ϕ) be the spherical coordinates of n̂. To achieve the first rotation, first rotate around +z
through angle −ϕ to bring n̂ into the x, z-plane, then rotate around +y through angle −θ.) Now
verify via direct matrix multiplication that
Rn̂ (ω) = Rz (ϕ)Ry (θ)Rz (ω)Ry (−θ)Rz (−ϕ).
This decomposition is known as the S3 parameterization of Rn̂ (ω).]
Upper Triangular Matrices and Schur Bases. In the next topic, we’ll be talking about basis-
independent properties of operators, but we will occasionally need to introduce an orthonormal
basis so that we can talk about matrices, and although all such bases are equivalent, some are more
convenient than others. If A ∈ L(H) is an operator, a Schur basis for A is an orthonormal basis
with respect to which A is represented by an upper triangular matrix, i.e., an n × n matrix M whose
entries below its diagonal are all zero: [M]ij = 0 if i > j. Upper triangular matrices have many
nice properties, so we’ll sometimes choose a Schur basis when it is convenient. Particularly in this
section, we will derive some facts about operators using a Schur basis. Theorem B.5 in Section B.2
shows that we can always choose a Schur basis:
Theorem 9.5 (Theorem B.5 in Section B.2) Every n × n matrix is unitarily conjugate to an upper
triangular matrix. That is, for every n × n matrix M, there is an upper triangular T and unitary U (both
n × n matrices) such that M = UT U∗ .
Thus a Schur basis always exists for any linear operator. The proof of Theorem B.5 uses the fact
that every operator has an eigenvalue, which we’ll discuss in the next topic.
One key property of an upper triangular matrix is that its determinant is just the product of its
diagonal entries: if T is upper triangular, then

det T = Π_{i=1}^{n} [T]ii.  (41)
Exercise 9.6 Show that if A and B are both upper triangular matrices, then so is AB, and for each
1 ≤ i ≤ n, we have [AB]ii = [A]ii [B]ii, that is, the diagonal entries just multiply individually.

Exercise 9.7 Show that if A is a nonsingular, upper triangular matrix, then A⁻¹ is upper triangular.
What are the diagonal entries of A⁻¹ in terms of those of A?

Exercise 9.8 Show that if A is upper triangular, then so is e^A, and we have [e^A]ii = e^{[A]ii} for all
1 ≤ i ≤ n. [Hint: Use the results of Exercise 9.6 and Equation (40) defining e^A.]
Exercise 9.9 (One of my favorites.) Show that if A is any operator, then det e^A = e^{tr A}. [Hint: Pick
a Schur basis for A, then use the previous exercise and Equation (41).]
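Exercise 9.9 can be spot-checked numerically before proving it. A sketch (Python, NumPy assumed, with a truncated series for e^A; the test matrix is arbitrary and not normal):

```python
# Numerical check of det(e^A) = e^{tr A} for an arbitrary matrix.
import numpy as np

def expm_series(A, terms=60):
    """Partial sum of sum_k A^k / k! (ample terms for a small matrix)."""
    result = np.eye(A.shape[0], dtype=complex)
    term = np.eye(A.shape[0], dtype=complex)
    for k in range(1, terms):
        term = term @ A / k
        result = result + term
    return result

A = np.array([[0.2, 1.0 + 0.5j], [0.0, -0.7j]])
assert np.isclose(np.linalg.det(expm_series(A)), np.exp(np.trace(A)))
```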
Lower triangular matrices are defined analogously and have similar properties. A matrix is
diagonal if it is both upper and lower triangular, i.e., all its nondiagonal entries are zero.
If det(A − λI) = 0, then A − λI is singular, and so it maps some nonzero vector v to 0, and so we have
(A − λI)v = 0, or equivalently, Av = λv. Thus v is an eigenvector of A with eigenvalue λ. Thus the
eigenvalues of A are exactly those scalars λ such that det(A − λI) = 0.
Let’s write A − λI in matrix form with respect to some (any) orthonormal basis. Setting
aij = [A]ij, we get

A − λI = [ a11 − λ   a12       ···   a1n
           a21       a22 − λ   ···   a2n
           ⋮          ⋮         ⋱     ⋮
           an1       an2       ···   ann − λ ],

where n = dim H. Fixing all the aij to be constant and considering λ to be a variable, one can
show that det(A − λI) is a polynomial in λ with degree n. This is the characteristic polynomial of A,¹⁰
and we denote it charA(λ). From our considerations above, the eigenvalues of A are precisely the
roots of the polynomial charA. Since C is algebraically closed (see the second lecture), charA has n
roots, and so A has exactly n eigenvalues, not necessarily all distinct. The (multi)set of eigenvalues
of A is known as the spectrum of A.
Exercise 9.10 We know that charA is basis-independent because it is defined in terms of basis-
independent things. Show directly that

det(UAU∗ − λI) = det(A − λI)

for any unitary U.
The fact that A has at least one eigenvalue is a key ingredient in the proof that A has a Schur
basis (Theorem B.5 in Section B.2), as well as in the proof of the Spectral Theorem, below. So now
that we know that a Schur basis for A really exists, let’s assume that we chose a Schur basis for
A above, and so aij = 0 for all i > j, and hence A − λI is also upper triangular. So taking the
determinant, which is just the product of the diagonal entries, we get

charA(λ) = det(A − λI) = Π_{i=1}^{n} (aii − λ).  (42)
From (42) it is clear that the eigenvalues of A—the roots of charA —are exactly a11 , . . . , ann . This
is true because we chose a basis making the matrix representing A upper triangular, but from this
we get two useful, basis-independent facts: If λ1 , . . . , λn are the eigenvalues of A counted with
multiplicities, then
• tr A = Σ_{i=1}^{n} λi, and
• det A = Π_{i=1}^{n} λi.
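Both facts are easy to confirm numerically. A sketch (Python, NumPy assumed; the test matrix is arbitrary):

```python
# tr A = sum of eigenvalues and det A = product of eigenvalues,
# counted with multiplicity.
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.5, -1.0, 3.0],
              [0.0, 1.0, 4.0]])
lam = np.linalg.eigvals(A)
assert np.isclose(lam.sum(), np.trace(A))
assert np.isclose(lam.prod(), np.linalg.det(A))
```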
¹⁰ The characteristic polynomial of A is often defined instead as det(λI − A), which, among other things, guarantees
that its leading coefficient is 1.
that its leading coefficient is 1. The two definitions coincide for even n, but for odd n, one is the negation of the other.
But in any case, both polynomials have the same roots, which is the important thing.
Some of the coefficients of the polynomial charA are familiar. If we expand (42) and group
together powers of (−λ), we get

charA(λ) = (−λ)^n + (a11 + a22 + ··· + ann)(−λ)^{n−1} + ··· + a11 a22 ··· ann  (43)
         = (−λ)^n + (tr A)(−λ)^{n−1} + ··· + det A.  (44)

The constant term is det A, which can also be seen by noting that this term is

charA(0) = det(A − 0I) = det A.
Exercise 9.11 Find the eigenvalues of the 2 × 2 matrix A = [3 −1; 4 −2]. Find eigenvectors
corresponding to each eigenvalue.
Theorem 9.12 (Spectral Theorem for Normal Operators) Every normal operator has an eigenbasis.
That is, if A ∈ L(H) is normal, then there is an orthonormal basis with respect to which A is represented
by a diagonal matrix whose diagonal elements are the eigenvalues of A.
In the rest of this section, we prove this theorem and explore some of its consequences. Sec-
tion B.2 in Appendix B includes a proof of the Spectral Theorem using a Schur basis for A.¹¹ The
proof of the Spectral Theorem we give here is independent from that and does not need a Schur
basis for A. We first prove some technical facts about normal matrices and operators.

Lemma 9.13 Let M = [A B; C D] be a normal matrix written in block form, where A and D are square.
Then

⟨B, B⟩ = ⟨C, C⟩.

Proof. Since M is normal, M∗M = MM∗. Equating the upper-left blocks of the two sides gives

A∗A + C∗C = AA∗ + BB∗.

Taking the trace of both sides and using the properties of the trace gives

tr A∗A + tr C∗C = tr AA∗ + tr BB∗ = tr A∗A + tr B∗B,

whence ⟨C, C⟩ = tr C∗C = tr B∗B = ⟨B, B⟩. □
Lemma 9.14 Let H be a Hilbert space. Suppose that R ∈ L(H) is normal and there is a subspace V ⊆ H
such that R maps V into V (i.e., R(V) ⊆ V). Then R maps V⊥ into V⊥, and R restricted to V (respectively
V⊥) is a normal operator in L(V) (respectively L(V⊥)).

Proof. Let n = dim(H) and let k = dim(V) (whence dim(V⊥) = n − k). We can assume that
1 ≤ k < n, otherwise the statement is trivial. We can choose an orthonormal basis B := {b1, ..., bn}
for H such that {b1, ..., bk} is an orthonormal basis for V and {bk+1, ..., bn} is an orthonormal
basis for V⊥. Letting M be the matrix of R with respect to B, we can write M in block form as

M = [A B; C D],
¹¹ In Theorem B.6, we show that if a matrix is both normal and upper triangular, then it is diagonal. Hence A is
represented in this basis as a diagonal matrix. That is, any Schur basis of a normal operator A is an eigenbasis of A. This
is an alternate proof of the Spectral Theorem for Normal Operators.
where A is a k × k matrix. For all 1 ≤ j ≤ k, we have Rbj ∈ V, so Rbj (represented by the jth
column of M) is a linear combination of b1, ..., bk only. This means that C = 0, which implies
⟨B, B⟩ = ⟨C, C⟩ = 0 by the previous lemma. Thus B = 0, and we have

M = [A 0; 0 D].

From this we get that R(V⊥) ⊆ V⊥, because for k < j ≤ n, the jth column of M, which represents
Rbj, is a linear combination of bk+1, ..., bn only. Furthermore,

[A∗A 0; 0 D∗D] = M∗M = MM∗ = [AA∗ 0; 0 DD∗],

and it follows by equating blocks that A and D are both normal matrices. □
We now have the tools in place to give a short, tidy proof of the Spectral Theorem, Theorem 9.12.

Proof. [Spectral Theorem, Theorem 9.12] The proof is by induction on dim(H). If dim(H) = 1,
then the statement follows from the fact that every nonzero vector is an eigenvector of A and hence
forms an eigenbasis for A. Now suppose that dim(H) = n > 1, and assume the statement holds for
proper subspaces of H. Let λ be an eigenvalue of A with some corresponding eigenvector b1 ∈ H.
Without loss of generality, we can assume ‖b1‖ = 1. Let V := {ab1 : a ∈ C} be the 1-dimensional
subspace of H spanned by b1. For any a ∈ C, we have Aab1 = aAb1 = aλb1 ∈ V, that is, A maps
V into V. By Lemma 9.14, A also maps V⊥ into V⊥, and its restriction A′ to V⊥ is a normal operator
in L(V⊥). Since dim(V⊥) = n − 1 < n, we apply the inductive hypothesis to get an eigenbasis
{b2, ..., bn} ⊆ V⊥ for A′. Then {b1, b2, ..., bn} is an eigenbasis for A. □
If U ∈ L(H) is unitary, then U is normal, and choosing an eigenbasis for U, we represent it as
a diagonal matrix D. For each 1 ≤ i ≤ n, let di = [D]ii. Since DD∗ = I (D is unitary), we have
|di|² = 1 for each i; that is, every eigenvalue of a unitary operator lies on the unit circle in C.
Lemma 9.15 If A ∈ L(H) is normal, and v1 and v2 are eigenvectors of A with distinct eigenvalues, then
⟨v1, v2⟩ = 0.

Proof. Say Av1 = λ1 v1 and Av2 = λ2 v2 with λ1 ≠ λ2, where without loss of generality ‖v1‖ = 1.
Let V be the 1-dimensional subspace spanned by v1 (so A maps V into V), and set

y := v2 − ⟨v1, v2⟩v1.

Then ⟨v1, y⟩ = 0 = ⟨y, v1⟩, and so ⟨y, av1⟩ = a⟨y, v1⟩ = 0 for all a ∈ C. Thus y ∈ V⊥. Lemma 9.14
states that A must then map V⊥ into V⊥. In particular, ⟨Ay, v1⟩ = ⟨v1, Ay⟩ = 0. We have

0 = ⟨v1, Ay⟩ = ⟨v1, A(v2 − ⟨v1, v2⟩v1)⟩ = λ2⟨v1, v2⟩ − λ1⟨v1, v2⟩ = (λ2 − λ1)⟨v1, v2⟩.
Since λ2 − λ1 ≠ 0, we must have ⟨v1, v2⟩ = 0. □
If A is an operator and λ is an eigenvalue of A, then we define the eigenspace of A with respect
to λ as
Eλ (A) = {v ∈ H : Av = λv}.
This is a subspace of H with positive dimension.
Corollary 9.16 If A is normal, then its eigenspaces are mutually orthogonal and span H. The dimension
of each eigenspace is the same as the multiplicity of the corresponding eigenvalue.
The following corollary is useful because it shows that any normal operator is a unique linear
combination of projectors that form a csop, thus revealing how projectors form the building blocks
of normal operators through their eigenvalues. It is essentially an alternate formulation of the
Spectral Theorem.
Corollary 9.17 If A is normal, then there is a unique set {(P1 , λ1 ), . . . , (Pk , λk )}, such that the λj ∈ C are
all distinct, the set {P1 , . . . , Pk } is a complete set of orthogonal projectors, and
A = λ1 P1 + λ2 P2 + · · · + λk Pk . (45)
Furthermore, λ1 , . . . , λk are the distinct eigenvalues of A, and each Pj orthogonally projects onto Eλj (A).
Proof. Suppose dim(H) = n. Let λ1, ..., λk be the distinct eigenvalues of A, and for all 1 ≤ j ≤ k
let Pj be the orthogonal projector onto Eλj(A). Choose an eigenbasis {b1, ..., bn} for A. For each
1 ≤ i ≤ n, let 1 ≤ ji ≤ k be such that Abi = λji bi. Then bi lies in the eigenspace Eλji(A). This
means that bi = Pji bi, and it follows that Pj bi = Pj Pji bi = 0 for all j ≠ ji. And so we have

A bi = λji bi = λji Pji bi = (Σ_{j=1}^{k} λj Pj) bi.
Thus both sides of Equation (45) act the same way on each bi , and so they must be equal. This
proves existence.
To show uniqueness, we show that for any decomposition

A = Σ_{j=1}^{ℓ} μj Qj,

where the μj are pairwise distinct scalars and {Q1, ..., Qℓ} is a complete set of orthogonal projectors,
it must be that the μj are all the eigenvalues of A with the respective Qj projecting onto the
corresponding eigenspaces (and so incidentally, ℓ = k). Fix a j such that 1 ≤ j ≤ ℓ, and notice that
AQj = μj Qj. For any v ∈ H, we then have AQj v = μj Qj v, and thus Qj v is an eigenvector of A
provided Qj v ≠ 0. Since Qj ≠ 0, such a v must exist, and this shows that μj is an eigenvalue of A
and Qj maps H into the corresponding eigenspace Eμj(A). This implies {μ1, ..., μℓ} ⊆ {λ1, ..., λk}.
We’ll be done if we show two things: (1) that Qj maps H onto Eμj(A); and (2) every eigenvalue of
A is in {μ1, ..., μℓ}. Let u ≠ 0 be any eigenvector of A with some eigenvalue λ.
We claim that Qj u = 0 for all j such that µj ≠ λ. We have

u = Iu = ∑_{j=1}^ℓ Qj u .    (46)

Fix j and suppose µj ≠ λ. Let v = Qj u. Then v ∈ Eµj (A) by what we showed about Qj above, and
if v ≠ 0, then v is an eigenvector of A with eigenvalue µj ≠ λ. But by Lemma 9.15 (eigenvectors of
a normal operator with distinct eigenvalues are orthogonal), we have

‖v‖² = ⟨Qj u, Qj u⟩ = ⟨u, Qj u⟩ = ⟨u, v⟩ = 0 ,

which implies 0 = v = Qj u, and that establishes the claim. Now if λ ≠ µj for all 1 ≤ j ≤ ℓ, then
u = 0 by Equation (46); contradiction. Hence λ ∈ {µ1 , . . . , µℓ }. Since λ is an arbitrary eigenvalue of
A, we have {λ1 , . . . , λk } = {µ1 , . . . , µℓ } (and since the members of each set are pairwise distinct, we
must also have k = ℓ). Now for any 1 ≤ j ≤ ℓ and for any u ∈ Eµj (A), Equation (46) and the claim
give u = Qj u, that is, Qj fixes pointwise all elements of Eµj (A). Thus Qj maps H onto Eµj (A). 2
The right-hand side of Equation (45) is called the spectral decomposition of A.

Exercise 9.18 Show that if A = λ1 P1 + λ2 P2 + · · · + λk Pk is the spectral decomposition of A, then
for every integer m ≥ 1,

A^m = λ1^m P1 + λ2^m P2 + · · · + λk^m Pk .

Exercise 9.19 Show that if A = λ1 P1 + λ2 P2 + · · · + λk Pk is the spectral decomposition of A, then

e^A = e^{λ1} P1 + e^{λ2} P2 + · · · + e^{λk} Pk .
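To make the spectral decomposition concrete, here is a small numerical sketch of mine (not from the notes; all names are my own) that recovers the projectors Pj of a Hermitean matrix from numpy's eigendecomposition and checks Equation (45) along with the power rule above.

```python
import numpy as np

# A normal (in fact Hermitean) operator on C^3 with a repeated eigenvalue.
A = np.array([[2, 1, 0],
              [1, 2, 0],
              [0, 0, 3]], dtype=complex)

evals, V = np.linalg.eigh(A)          # columns of V form an orthonormal eigenbasis

# Group equal eigenvalues and build the orthogonal projector onto each eigenspace.
decomp = []
for lam in np.unique(np.round(evals, 12)):
    cols = V[:, np.isclose(evals, lam)]
    P = cols @ cols.conj().T          # sum of |b><b| over the eigenbasis vectors
    decomp.append((P, lam))

# Spectral decomposition: A = sum_j lambda_j P_j  (Equation (45)).
A_rebuilt = sum(lam * P for P, lam in decomp)
assert np.allclose(A, A_rebuilt)

# The projectors form a csop: they sum to the identity.
total = sum(P for P, _ in decomp)
assert np.allclose(total, np.eye(3))

# Powers act on the eigenvalues only: A^5 = sum_j lambda_j^5 P_j.
assert np.allclose(np.linalg.matrix_power(A, 5),
                   sum(lam**5 * P for P, lam in decomp))
```

Here A has eigenvalues 1 and 3 (the latter with multiplicity two), so the loop produces exactly two projector/eigenvalue pairs.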
We’ll be dealing with normal operators almost exclusively from now on.
Exercise 9.20 We know that any Hermitean operator is normal with real eigenvalues. Prove the
converse: any normal operator with only real eigenvalues is Hermitean. [Hint: Use an eigenbasis.]
Exercise 9.21 We know that any unitary operator is normal with eigenvalues on the unit circle
in C. Prove the converse: any normal operator with all eigenvalues on the unit circle is unitary.
[Hint: Use an eigenbasis.]
55
Scalar Functions Applied to Operators. Let Ω ⊆ C be some set and suppose f : Ω → C is some
function mapping scalars to scalars. It is often natural and useful to extend the definition of f to
apply to operators A ∈ L(H), where H is a Hilbert space, with the results also being operators in
L(H). There are at least two situations where this can be done:
1. The value f(x) is expressible as a convergent power series about some point x0 ∈ Ω: for
every x ∈ Ω,

f(x) = ∑_{i=0}^∞ ci (x − x0 )^i .

In this case, we can define f(A) := ∑_{i=0}^∞ ci (A − x0 I)^i , provided the series converges.

2. The function f is arbitrary. For any normal operator A ∈ L(H) all of whose eigenvalues are
contained in Ω, we define

f(A) := ∑_{j=1}^k f(λj )Pj = f(λ1 )P1 + · · · + f(λk )Pk ,

where A = λ1 P1 + · · · + λk Pk is the spectral decomposition of A.
We’ve seen an example of item (1) above with the natural exponential map z 7→ ez defined on
all of C, and Exercises 9.18 and 9.19 show that this definition agrees with item (2) as well. In fact,
it can be shown that if both conditions (1) and (2) hold for some f and A, then the two definitions
will coincide. We will see an instance of item (2) below, when we take the square root of a positive
operator. It is often the case that a special property that f has with respect to scalars has an analogous (but
perhaps weaker) property when applied to operators. For example, e^{z+w} = e^z e^w for all z, w ∈ C,
and for operators A, B ∈ L(H) we have e^{A+B} = e^A e^B as well, provided A and B commute.
Here are two more general facts. The first says among other things that this concept is covariant
under unitary conjugation. The second applies to item (2) specifically.
Proposition 9.22 Let function f : Ω → C and operator A ∈ L(H) satisfy the conditions of either item (1)
or item (2) above. Then A and f(A) commute. Furthermore, for any unitary operator U ∈ L(H), we have
that f and UAU∗ also satisfy the same condition(s), and f(UAU∗ ) = Uf(A)U∗ .
Proposition 9.23 Suppose f : Ω → C and A ∈ L(H) satisfy the conditions of item (2) above. Then f(A)
is the unique operator in L(H) such that, for any v ∈ H and λ ∈ C, if v is an eigenvector of A with
eigenvalue λ, then v is an eigenvector of f(A) with eigenvalue f(λ).
Positive Operators.
Definition 9.24 An operator A ∈ L(H) is positive or positive semidefinite (written A ≥ 0) iff v∗ Av ≥ 0
for all v ∈ H. We say that A is strictly positive or positive definite (written A > 0) iff v∗ Av > 0 for all
nonzero v ∈ H.

Since u∗ Av = ⟨u, Av⟩ for all vectors u, v ∈ H, positivity of A is equivalent to ⟨v, Av⟩ ≥ 0 for
all v ∈ H, or in Dirac notation, ⟨ψ|A|ψ⟩ ≥ 0 for all |ψ⟩ ∈ H. Obviously, strict positivity implies
positivity.

For example, the zero operator 0 ∈ L(H) and the identity operator I ∈ L(H) are clearly positive:
v∗ 0v = 0 and v∗ Iv = v∗ v = ‖v‖² ≥ 0 for all v. In fact, I > 0 as well.

Exercise 9.25 Verify that if A ≥ 0 and B ≥ 0 are positive operators and a ≥ 0 is a nonnegative real
number, then A + B ≥ 0 and aA ≥ 0.
Positive operators play a huge role in the study of quantum information, so it is worth spending
some time with them.
Exercise 9.26 (A bit challenging) Show for any operator A ∈ L(H) that A is Hermitean if and only
if v∗ Av ∈ R for all v ∈ H. (Thus every positive operator is Hermitean and hence normal.) [Hint:
The forward direction is easy. For the reverse direction, consider the matrix elements of A with
respect to some orthonormal basis b1 , . . . , bn . Consider three types of cases:
1. v = bk for some k. What does this tell you about the diagonal elements [A]kk ?
2. v = bk + bj for some k ≠ j. This allows you to relate [A]kj and [A]jk in some way.
3. v = bk + ibj for the same k, j above. This allows you to relate [A]kj and [A]jk further.]
Exercise 9.27 Show that A ≥ 0 if and only if A is normal and all its eigenvalues are nonnegative
real numbers. (It follows that if A ≥ 0, then tr A ≥ 0.) [Hint: Use the previous exercise.]
Exercise 9.28 Show that if A ≥ 0 and tr A = 0, then A = 0. [Hint: Use the previous exercise.]
Exercise 9.29 Show that the zero operator is the only operator A satisfying A ≥ 0 and −A ≥ 0.
[Hint: Use the previous two exercises.]
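These characterizations are easy to experiment with numerically. The sketch below (my own, using numpy; not part of the notes) checks the criterion of Exercise 9.27 on a matrix of the form C∗C, which is always positive since v∗(C∗C)v = ‖Cv‖² ≥ 0.

```python
import numpy as np

def is_positive(A, tol=1e-10):
    """Check A >= 0 via Exercise 9.27: A is Hermitean (hence normal)
    with all eigenvalues nonnegative real numbers."""
    if not np.allclose(A, A.conj().T, atol=tol):
        return False
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

C = np.array([[1.0, 2.0],
              [0.0, 1.0]])
A = C.T @ C                     # v* A v = ||Cv||^2 >= 0, so A >= 0
assert is_positive(A)

# The zero and identity operators are positive.
assert is_positive(np.zeros((2, 2))) and is_positive(np.eye(2))

# -A is not positive (cf. Exercise 9.29: only 0 satisfies both).
assert not is_positive(-A)

# Exercise 9.25: sums and nonnegative scalings of positive operators are positive.
assert is_positive(A + np.eye(2)) and is_positive(2.5 * A)
```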
Exercise 9.30 Show that the following are equivalent for any operator A:
1. A ≥ 0.
You may have noticed that you can determine a lot about a normal operator by its spectrum.
This is not too surprising, because
• most properties we’ve been looking at of the underlying matrices are basis-invariant (i.e.,
invariant under unitary conjugation),
• the spectrum of a diagonal matrix is just the set of diagonal elements of the matrix.
Each entry in the following table is easily checked by representing the operator as a diagonal
matrix with respect to an eigenbasis.
A = λ1 P1 + · · · + λk Pk

uniquely according to Corollary 9.17. Since A ≥ 0, we have λj ≥ 0 for 1 ≤ j ≤ k. Now let

B := √λ1 P1 + · · · + √λk Pk .

B has eigenvalues √λ1 , . . . , √λk ≥ 0, so B ≥ 0. By Exercise 9.18, we get

B² = (√λ1 )² P1 + · · · + (√λk )² Pk = A.
To show uniqueness, suppose that B, C ≥ 0 such that B² = C² = A. Using Corollary 9.17 again,
decompose

B = µ1 P1 + · · · + µk Pk ,
C = ν1 Q1 + · · · + νℓ Qℓ .

So,

B² = µ1² P1 + · · · + µk² Pk = A = ν1² Q1 + · · · + νℓ² Qℓ = C² .

Note that the µj are distinct and nonnegative (same with the νj ), and therefore so are the µj²
(same with the νj² ). Then since the decomposition of A from Corollary 9.17 is unique, we must
have {(P1 , µ1² ), . . . , (Pk , µk² )} = {(Q1 , ν1² ), . . . , (Qℓ , νℓ² )}. Thus k = ℓ and {(P1 , µ1 ), . . . , (Pk , µk )} =
{(Q1 , ν1 ), . . . , (Qk , νk )}, because all the µj ≥ 0 and νj ≥ 0. So we must have B = C.
Notice that, since the same projectors are involved in the decompositions of A and B, it follows
that the eigenvectors of A and B coincide, and the corresponding eigenvalues of B are the square
roots of those of A.
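Here is the construction as numpy code (a sketch of mine, not from the notes): take the spectral decomposition of a positive A and replace each eigenvalue by its nonnegative square root.

```python
import numpy as np

def positive_sqrt(A):
    """The unique B >= 0 with B^2 = A, built from A's spectral decomposition."""
    evals, V = np.linalg.eigh(A)                   # A >= 0, so evals >= 0
    return (V * np.sqrt(np.clip(evals, 0, None))) @ V.conj().T

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                          # eigenvalues 1 and 3, so A > 0
B = positive_sqrt(A)

assert np.allclose(B @ B, A)                        # B^2 = A
assert np.allclose(B, B.conj().T)                   # B is Hermitean ...
assert np.all(np.linalg.eigvalsh(B) >= 0)           # ... with nonnegative eigenvalues

# Eigenvalues of B are the square roots of those of A.
assert np.allclose(np.linalg.eigvalsh(B), np.sqrt(np.linalg.eigvalsh(A)))
```

The `np.clip` guards against tiny negative eigenvalues produced by floating-point roundoff; mathematically the eigenvalues of a positive operator are already nonnegative.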
The square root function applied to positive operators that we’ve just defined is an example of
a scalar function applied to an operator that we discussed in the previous topic. The fact that any
positive operator has a (positive) square root is useful in many places. For example, we get the
following theorem:
Theorem 9.31 Let A and B be any positive operators over a Hilbert space H. Then ⟨A, B⟩ ≥ 0, with
equality holding if and only if AB = 0. (Recall that ⟨A, B⟩ = tr(A∗ B).)
Proposition 9.33 Let A be an n × n matrix and B an m × n matrix, for positive integers m and n. If
A > 0, then BAB∗ > 0. (Note that BAB∗ is an m × m matrix.)
Proof. Since A ≥ 0, it has a positive (hence Hermitean) square root √A. Then for any v,

v∗ BAB∗ v = v∗ B √A √A B∗ v = ⟨√A B∗ v, √A B∗ v⟩ = ‖√A B∗ v‖² ≥ 0 . 2
Exercise 9.35 Show that if A is any operator, then A ≥ 0 if and only if A = |A|.
Exercise 9.36 Show that if A is any normal operator, then the eigenvalues of |A| are the absolute
values of the eigenvalues of A.
Here we summarize many equivalent ways of characterizing positive operators into a single
proposition.
Proposition 9.37 Let H be an n-dimensional Hilbert space for some n > 0, and let A ∈ L(H) be an
operator (equivalently, an n × n matrix with respect to some standard basis {e1 , . . . , en } of H). The following
are equivalent:

1. A ≥ 0.

2. A is normal and all its eigenvalues are nonnegative real numbers.

3. A = |A|.

4. A = B² for some B ≥ 0.

5. A = C∗ C for some n × n matrix C.

6. ⟨F, A⟩ ≥ 0 for every F ≥ 0.

7. u∗ Au ≥ 0 for every unit vector u ∈ H.

8. There exist vectors v1 , . . . , vn ∈ H such that [A]ij = ⟨vi , vj ⟩ for all 1 ≤ i, j ≤ n.

Proof. (1) ⇔ (2) by Exercise 9.27. (1) ⇔ (3) by Exercise 9.35. It then suffices to show (3) ⇒ (4) ⇒
(5) ⇒ (1), (1) ⇒ (6) ⇒ (7) ⇒ (1), and (1) ⇒ (8) ⇒ (1). For (3) ⇒ (4), set B := √|A|. (4) ⇒ (5) is obvious. For
(5) ⇒ (1), we have, for any v ∈ H,

v∗ Av = v∗ C∗ Cv = ‖Cv‖² ≥ 0 ,

hence A ≥ 0. (1) ⇒ (6) follows from Theorem 9.31. We have (6) ⇒ (7) because uu∗ ≥ 0 (check
it!) and ⟨uu∗ , A⟩ = tr(uu∗ A) = u∗ Au. For (7) ⇒ (1), for any nonzero v ∈ H, let u := v/‖v‖. Then u is a unit vector, and we have
v∗ Av = ‖v‖² u∗ Au ≥ 0. For (1) ⇒ (8), set vk := Bek for all 1 ≤ k ≤ n, where B := √A; then
⟨vi , vj ⟩ = ⟨Bei , Bej ⟩ = ei∗ B∗ Bej = [A]ij . For (8) ⇒ (1), let v1 , . . . , vn ∈ H be given satisfying (8). Then for
any v ∈ H, we can write v = ∑_{i=1}^n xi ei for some x1 , . . . , xn ∈ C. Then

v∗ Av = ∑_{i,j} xi∗ xj (ei∗ Aej ) = ∑_{i,j} xi∗ xj [A]ij = ∑_{i,j} xi∗ xj ⟨vi , vj ⟩ = ⟨∑_i xi vi , ∑_j xj vj ⟩ = ⟨u, u⟩ ≥ 0

by the positive definiteness of ⟨·, ·⟩, where u := ∑_{k=1}^n xk vk . 2
Before leaving this topic, we define and give some basic properties of a binary relation ≤ on
L(H), or equivalently, on square matrices. This relation arises naturally from the notion of operator
positivity.

Definition 9.38 Let H be an n-dimensional Hilbert space and let A, B ∈ L(H) (equivalently, A and
B are n × n matrices). We say that A ≤ B iff B − A ≥ 0, i.e., iff B − A is a positive operator.
Most of Proposition 9.39, below, follows from the properties of positive operators we have
established above. For technical convenience, we state things in terms of matrices rather than
operators.
Proposition 9.39 Let n and m be positive integers, and let A and B be any n × n matrices.

2. If A ≤ B and B ≤ A, then A = B.

5. For any n × n matrix F, if A and B are Hermitean, A ≤ B, and F ≥ 0, then ⟨F, A⟩ ≤ ⟨F, B⟩ (and
both quantities are real).

6. If A and B are Hermitean and A ≤ B, then tr A ≤ tr B (and both quantities are real).

7. If A is a projector, then 0 ≤ A ≤ I.
Corollary 9.40 If A and B are operators, A ≤ I, and B ≥ 0, then tr(AB) ≤ tr B (and both quantities are
real).

Proof. Since I − A ≥ 0 and hence is Hermitean, it follows that A is Hermitean. We then have
tr B − tr(AB) = tr((I − A)B) = ⟨I − A, B⟩ ≥ 0 by Theorem 9.31, since I − A ≥ 0 and B ≥ 0. 2
Commuting Operators. In this topic, we’ll prove the fundamental result that commuting normal
operators always share a common eigenbasis, and so they are simultaneously diagonalizable. This
is a stronger version of the Spectral Theorem, which only deals with one normal operator.
Theorem 9.41 Let C be an arbitrary family12 of normal operators in L(H), any two of which commute,
i.e., AB = BA for all A, B ∈ C. Then there is an orthonormal basis B of H that is an eigenbasis for all
operators in C simultaneously.
To prove Theorem 9.41, we will use Lemma 9.14 paired with the following fundamental
property of commuting operators:
Lemma 9.42 Let A, B ∈ L(H) be commuting operators, and let E ⊆ H be any eigenspace of A. Then B
maps E into E.
Proof. Let E = Eλ (A) be the eigenspace of A corresponding to some eigenvalue λ of A. Then for
any v ∈ E, we have
ABv = BAv = B(λv) = λBv .
Thus either Bv = 0 or Bv is an eigenvector of A with eigenvalue λ. In either case, Bv ∈ E, which
proves the lemma. 2
Proof of Theorem 9.41. This proof is somewhat similar to that of the Spectral Theorem. We
proceed by induction on n = dim(H). If n = 1, then all operators in C are scalar multiples of the
identity operator, making H itself a common eigenspace of all operators in C; any single unit vector
in H then constitutes a common eigenbasis.
Now assume n > 1 and the theorem holds for any Hilbert space of dimension less than n. We
prove it true for dimension n by first finding at least one common eigenvector for all the operators
in C. We will then continue as in the proof of the Spectral Theorem. To find a common eigenvector,
we construct a finite, strictly descending chain of subspaces
H = E0 ⊃ E1 ⊃ E2 ⊃ · · · ⊃ Ek ,
where Ei is a proper subspace of Ei−1 for all 1 ≤ i ≤ k, and dim(Ek ) > 0. (Any such chain must
be finite, because the dimension decreases by at least 1 for each successive Ei .) We will do this in
such a way that all nonzero vectors in Ek are common eigenvectors of all the operators in C, that
is, all operators in C are multiples of the identity when restricted to Ek . For convenience, for each
0 ≤ i ≤ k, we also define Ci to be the set of all restrictions to Ei of operators in C. We will maintain
the invariant that every operator A ∈ Ci maps Ei into itself (that is, A ∈ L(Ei )), from which it
follows from Lemma 9.14 that A is a normal operator on Ei .
Now for the construction.13 First, set E0 := H, whence C0 := C. Note that the above invariant
holds trivially for i = 0. Then for i := 1, 2, 3, . . . in increasing order (until we stop), do the following:
• If all operators in Ci−1 are multiples of the identity on Ei−1 , then set k := i − 1 and STOP.
12
C need not be finite—or even countable.
13
The construction makes a series of arbitrary choices, so it is not unique.
• Otherwise, choose an operator A ∈ Ci−1 that is not a multiple of the identity, and let Ei
be any eigenspace of A. Note that such an Ei exists (as one does for any operator), that Ei
is a proper subspace of Ei−1 , and that dim(Ei ) > 0. Also note that every operator in Ci−1
commutes with A and thus maps Ei into itself by Lemma 9.42. From this one can see that
the invariant is maintained for i.
Once the construction stops, every operator in Ck is a multiple of the identity on Ek , so every
nonzero vector in Ek is a common eigenvector of all the operators in C. Choose any orthonormal
basis of Ek . Since each operator in C is normal and maps Ek into itself, it also maps Ek⊥ into itself
(Lemma 9.14), so we can apply the inductive hypothesis with space Ek⊥ and set of operators C
(restricted to Ek⊥ ) to obtain an orthonormal basis of Ek⊥ consisting of common eigenvectors. The
union of the two bases is then a common eigenbasis of H for all the operators in C. 2
Tensor Products and Combining Physical Systems. Suppose we have two physical systems
S and T with state spaces HS and HT , respectively, and we want to consider the two systems
together as a single system ST . What is the state space of ST ? Quantum mechanics says that
the state space of ST is completely determined by HS and HT via a construction called the tensor
product. We’ll first describe the tensor product of matrices, then we’ll discuss the tensor product
in a basis-independent way.
Let A be an m × n matrix and let B be an r × s matrix (m, n, r, s are arbitrary positive integers).
The tensor product of A and B (also called the outer product or the direct product or the Kronecker
product) is the mr × ns matrix given in block form by

          [ a11 B  a12 B  · · ·  a1n B ]
          [ a21 B  a22 B  · · ·  a2n B ]
  A ⊗ B = [   .      .     .       .   ]
          [ am1 B  am2 B  · · ·  amn B ]
We collect the standard, easily verifiable properties of the ⊗ operation here in one place.
Proposition 10.1 For any matrices A, B, C, D and scalars a, b ∈ C, the following equations hold provided
the operations involved are well-defined:
5. (A ⊗ B)(C ⊗ D) = AC ⊗ BD. (This is worth memorizing because we’ll use it all the time.)
6. (A ⊗ B)∗ = A∗ ⊗ B∗ .
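Properties 5 and 6 can be checked directly with numpy's `np.kron`, which implements exactly the block matrix above. A quick sketch of mine (the matrices are arbitrary random examples):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((4, 2))
C = rng.standard_normal((3, 2))
D = rng.standard_normal((2, 5))

# Property 5: (A ⊗ B)(C ⊗ D) = AC ⊗ BD.  Shapes: (8x6)(6x10) = (8x10).
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))

# Property 6: (A ⊗ B)* = A* ⊗ B*  (adjoint = conjugate transpose).
M = A + 1j * rng.standard_normal((2, 3))
N = B + 1j * rng.standard_normal((4, 2))
assert np.allclose(np.kron(M, N).conj().T, np.kron(M.conj().T, N.conj().T))
```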
Exercise 10.3 Show that if A and B are Hermitean (respectively, unitary), then A ⊗ B is Hermitean
(respectively, unitary).
If {e1 , . . . , em }, {f1 , . . . , fn }, and {g1 , . . . , gmn } are the standard bases of Cm , Cn , and Cmn ,
respectively, then

ei ⊗ fj = g(i−1)n+j .

From this it is easy to see that if {b1 , . . . , bm } and {c1 , . . . , cn } are any orthonormal bases for Cm and
Cn , respectively, then {bi ⊗ cj : 1 ≤ i ≤ m & 1 ≤ j ≤ n} is an orthonormal basis for Cmn . Indeed,
we have

⟨bi ⊗ cj , bk ⊗ cℓ ⟩ = ⟨bi , bk ⟩⟨cj , cℓ ⟩ = δik δjℓ ,

which is 1 if i = k and j = ℓ and is 0 otherwise.
This last bit suggests that we can define the tensor product in a basis-independent way, applied
to (abstract) vectors and operators. If H and J are Hilbert spaces, then we can define a Hilbert
space H ⊗ J (the tensor product of H and J) together with a bilinear map ⊗ : H × J → H ⊗ J,
mapping any pair of vectors u ∈ H and v ∈ J to a vector u ⊗ v ∈ H ⊗ J, such that if {b1 , . . . , bm } and
{c1 , . . . , cn } are orthonormal bases for H and J, respectively, then {bi ⊗ cj : 1 ≤ i ≤ m & 1 ≤ j ≤ n}
is an orthonormal basis for H ⊗ J. We’ll call such a basis a product basis. We won’t do it here, but
it can be shown that these two rules—bilinearity and the basis rule—define in essence the Hilbert
space H ⊗ J uniquely. Notice that the basis rule implies that the dimension of H ⊗ J is the product
of the dimensions of H and J.
It’s worth pointing out that not all vectors in H ⊗ J are of the form u ⊗ v for u ∈ H and v ∈ J. For
example, the column vector (1, 0, 0, 1) = e1 + e4 cannot be written as a single tensor product of
two 2-dimensional column vectors: if (a, b) ⊗ (c, d) = (ac, ad, bc, bd) = (1, 0, 0, 1), then ad = 0
forces a = 0 or d = 0, contradicting ac = bd = 1. It can, however, be written as the sum of two
tensor products:

(1, 0, 0, 1) = (1, 0) ⊗ (1, 0) + (0, 1) ⊗ (0, 1) .
In general a vector in H ⊗ J may not be a tensor product, but it is always a linear combination of
them (which is clear by our discussion about bases, above), i.e., the tensor products span the space
H ⊗ J.
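One way to see this computationally (a sketch of mine, not from the notes): reshape a vector in Cᵐ ⊗ Cⁿ into an m × n matrix. The vector equals a single tensor product u ⊗ v exactly when that matrix is the rank-1 outer product of u and v.

```python
import numpy as np

def tensor_rank(w, m, n):
    """Rank of the m x n matrix obtained by reshaping w in C^m (x) C^n.
    Rank 1 means w = u (x) v for some u, v; rank > 1 means it does not factor."""
    return np.linalg.matrix_rank(w.reshape(m, n))

e = np.eye(4)
w = e[0] + e[3]                     # (1, 0, 0, 1) = e1 + e4
assert tensor_rank(w, 2, 2) == 2    # not a single tensor product

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])
assert tensor_rank(np.kron(u, v), 2, 2) == 1   # u (x) v always has rank 1
```

The reshape works because the component of u ⊗ v at position (i − 1)n + j is uᵢvⱼ, i.e., entry (i, j) of the outer product matrix.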
We’re not done overloading the ⊗ symbol. Given the definition of H ⊗ J just described, we
can extend ⊗ to apply to operators as well as vectors. For example, we can extend it to a map
⊗ : L(H) × L(J) → L(H ⊗ J) by defining the action of an operator A ⊗ B on a vector u ⊗ v ∈ H ⊗ J:
(A ⊗ B)(u ⊗ v) = Au ⊗ Bv.
One can show that this definition is consistent, and since H ⊗ J is spanned by vectors of the form
u ⊗ v, this defines the operator A ⊗ B uniquely by linearity. We could define ⊗ on dual vectors and
other kinds of linear maps, e.g., mapping from one space to another space.
Picking orthonormal bases for H and J allows us to represent objects such as vectors, dual
vectors, operators, or what have you, in both spaces as matrices. When we do this, the abstract
and matrix-based notions of ⊗ completely coincide, as is the case with the other linear algebraic
constructs that we’ve seen, e.g., adjoint, trace, et cetera. This idea (that the two notions should
coincide) guides us in any further extensions of the ⊗ operation that we may wish to use.
Back to Combining Physical Systems. If S and T are physical systems with state spaces HS and
HT as before, then the state space of the combined system is HST = HS ⊗ HT . If |ϕiS ∈ HS is a
state of S and |ψiT ∈ HT is a state of T (we occasionally add subscripts to make clear which state
goes with which system), then |ϕiS ⊗ |ψiT is a state of ST , which we interpret as saying, “The
system S is in state |ϕiS , and the system T is in state |ψiT .” (We’ll often drop the ⊗ and write
|ϕiS ⊗ |ψiT simply as |ϕiS |ψiT , or even just |ϕ, ψi if the meaning is clear. The same holds for
bras as well as kets.) As we’ve seen, however, there can be states of ST that can’t be written as a
single tensor product, for example, the two-qubit state (|0⟩|0⟩ + |1⟩|1⟩)/√2. These states are called
entangled states, whereas states of the form |ϕiS |ψiT are called separable states or tensor product states.
More on this later.
How does this look in the density operator formalism? Easy answer: exactly the same, at least
for separable states. Let ρS = |ϕ⟩⟨ϕ| be the density operator corresponding to |ϕ⟩ of system S,
and let ρT = |ψ⟩⟨ψ| be the density operator corresponding to |ψ⟩ of system T (subscripts dropped).
Then the density operator for the combined system should be

ρST = (|ϕ⟩ ⊗ |ψ⟩)(⟨ϕ| ⊗ ⟨ψ|) = |ϕ⟩⟨ϕ| ⊗ |ψ⟩⟨ψ| = ρS ⊗ ρT .

So we take the tensor product of the density operators just as we would do with the vectors in the
original formulation. For the two-qubit entangled state example (|0⟩|0⟩ + |1⟩|1⟩)/√2 above, which
we abbreviate as (|00⟩ + |11⟩)/√2, the corresponding density operator is

      |00⟩ + |11⟩   ⟨00| + ⟨11|     1  [ 1 0 0 1 ]
  ρ = ─────────── · ───────────  =  ─  [ 0 0 0 0 ]
          √2            √2          2  [ 0 0 0 0 ]
                                       [ 1 0 0 1 ]
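The sketch below (mine, not from the notes) builds this density operator in numpy, confirms the matrix above, and checks the basic density-operator facts tr ρ = 1 and ρ ≥ 0.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

# |00>, |11> in C^2 (x) C^2, and the Bell state (|00> + |11>)/sqrt(2).
ket00 = np.kron(ket0, ket0)
ket11 = np.kron(ket1, ket1)
phi = (ket00 + ket11) / np.sqrt(2)

rho = np.outer(phi, phi.conj())          # |phi><phi|
expected = 0.5 * np.array([[1, 0, 0, 1],
                           [0, 0, 0, 0],
                           [0, 0, 0, 0],
                           [1, 0, 0, 1]])
assert np.allclose(rho, expected)
assert np.isclose(np.trace(rho), 1.0)              # unit trace
assert np.all(np.linalg.eigvalsh(rho) >= -1e-12)   # rho >= 0

# For a separable state, the combined density operator is rho_S (x) rho_T:
# |0><0| (x) |1><1| equals |01><01|, by the mixed-product property.
rho_sep = np.kron(np.outer(ket0, ket0), np.outer(ket1, ket1))
assert np.allclose(rho_sep, np.outer(np.kron(ket0, ket1), np.kron(ket0, ket1)))
```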
If S and T are isolated from each other (and the outside world), then each evolves in time
according to a unitary operator, say U for system S and V for system T . U and V are called local
operations. In this case, U ⊗ V is the unitary giving the time evolution of the combined system: for
tensor product state |ϕiS ⊗ |ψiT , we have (U ⊗ V)(|ϕiS ⊗ |ψiT ) = U|ϕiS ⊗ V|ψiT , which is again a
tensor product state. If S and T are brought together so that they interact, then the unitary giving
the evolution of the combined system ST might not be able to be written as a single tensor product
of unitaries for S and T respectively.
Exercise 10.4 Let HS and HT be Hilbert spaces, and let P1 , . . . , Pk ∈ L(HS ) be a complete set of
orthogonal projectors for HS . Show that P1 ⊗ I, . . . , Pk ⊗ I is a complete set of orthogonal projectors
for HS ⊗ HT , where I is the identity operator on HT . (The latter set represents a projective
measurement on the system S when viewed from the combined system ST .)
The No-Cloning Theorem. Quantum states cannot be duplicated in general. The following
theorem makes this precise.
Figure 3: Sample quantum circuit with two qubits. Time moves from left to right in the figure.
The gate H is applied first to the first qubit, then CNOT is applied to both qubits.
Theorem 10.5 (No-Cloning Theorem) Let H be a Hilbert space of dimension at least two, and let |0i ∈ H
be a fixed unit vector. There is no unitary operator U ∈ L(H ⊗ H) such that U|ψi|0i ∝ |ψi|ψi for every
unit vector |ψi ∈ H.
Proof. Suppose U exists as above, and let |ϕi, |ψi ∈ H be any two unit vectors. Since U is unitary,
we have
hϕ|ψi = hϕ|ψih0|0i
= (hϕ|h0|)(|ψi|0i)
= (hϕ|h0|)U∗ U(|ψi|0i)
= (U|ϕi|0i)∗ U(|ψi|0i)
∝ (hϕ|hϕ|)(|ψi|ψi)
= hϕ|ψi2 ,
and thus |hϕ|ψi| = |hϕ|ψi|2 , which implies |hϕ|ψi| is either 0 or 1, i.e., |ϕi and |ψi are either
orthogonal or collinear. But clearly we can choose |ϕi and |ψi such that this is not the case. 2
Quantum Circuits. The quantum circuit has become the de facto standard theoretical model of
quantum computation. It is equivalent to the other standard model—the quantum Turing machine,
or QTM—but it is easier to work with and represent visually. Quantum circuits are closely
analogous to classical Boolean circuits, and we’ll compare them occasionally.
A quantum circuit consists of some number of qubits, called a quantum register, represented by
horizontal wires. The qubits start in some designated state, representing the input to the circuit.
From time to time, we may act on one or more qubits in the circuit by applying a quantum gate,
which is just a unitary operator applied to the corresponding qubits. A typical circuit with a
two-qubit register is shown in Figure 3. To keep track, we number the qubits in the register from
top to bottom, so that the topmost qubit is the first, etc. At any given time, the register is in some
quantum state |ψi ∈ H ⊗ · · · ⊗ H = H⊗n , where H is here the state space of a single qubit, and n
is the number of qubits in the register. We choose an orthonormal basis for H⊗n by taking tensor
products of the individual one-qubit basis vectors |0i and |1i. We call this basis the computational
basis for the register. For example, a typical computational basis vector in H⊗5 is

|00101⟩ = |0⟩ ⊗ |0⟩ ⊗ |1⟩ ⊗ |0⟩ ⊗ |1⟩ .

In this state, the first, second, and fourth qubits are 0, and the third and fifth qubits are 1. The state
space of an n-qubit register has dimension 2^n, with computational basis vectors representing all
the 2^n possible values of n bits, listed in the usual binary order: |00 · · · 00⟩, |00 · · · 01⟩, |00 · · · 10⟩,
etc., through |11 · · · 11⟩.
In the circuit diagram, the state of the register evolves in time from left to right. In Figure 3,
for example, the first gate that is applied is the leftmost gate, i.e., the H gate applied to the first
qubit. Here, we are not using H as a variable to describe any one-qubit gate, but rather we use H
to denote a useful one-qubit gate, known as the Hadamard gate, given by

H = (1/√2) [ 1  1 ]
           [ 1 −1 ] .

Note that

H|0⟩ = (|0⟩ + |1⟩)/√2,
H|1⟩ = (|0⟩ − |1⟩)/√2,

or more succinctly,

H|b⟩ = (|0⟩ + (−1)^b |1⟩)/√2

for any b ∈ {0, 1}. Clearly, H = (X + Z)/√2 and H² = I. We also have H ∝ R_{(1,0,1)/√2}(π), and so
H rotates the Bloch sphere 180° around the line through (1, 0, 1), swapping the +z-axis with the
+x-axis.
Note that although it looks as if we are only applying H to the first qubit, we are really
transforming the state |ψ⟩ ∈ H ⊗ H of the entire two-qubit register via the unitary H ⊗ I, where I
is the one-qubit identity operator representing the fact that we are not acting on the second qubit.
Suppose that the initial state of the register is |00⟩. After the H gate is applied, the state becomes

|ψ1⟩ = (H ⊗ I)|00⟩ = (|00⟩ + |10⟩)/√2 .
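A quick numerical check of these Hadamard facts (my own sketch, not part of the notes):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

assert np.allclose(H, (X + Z) / np.sqrt(2))     # H = (X + Z)/sqrt(2)
assert np.allclose(H @ H, np.eye(2))            # H^2 = I

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
assert np.allclose(H @ ket0, (ket0 + ket1) / np.sqrt(2))
assert np.allclose(H @ ket1, (ket0 - ket1) / np.sqrt(2))

# Applying H to the first qubit of |00> really means applying H (x) I.
ket00 = np.kron(ket0, ket0)
psi1 = np.kron(H, np.eye(2)) @ ket00
assert np.allclose(psi1, (np.kron(ket0, ket0) + np.kron(ket1, ket0)) / np.sqrt(2))
```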
11 Week 6: Quantum gates
The next gate in Figure 3 is another very useful, two-qubit gate called a controlled NOT or C-NOT
gate, acting on both qubits. In a C-NOT gate, the small black dot connects to the control qubit (here,
the first qubit) and the ⊕ end connects to the target qubit. If the control is |0i, then the target does
not change; if the control is |1i, then the target’s Boolean value is flipped |0i ↔ |1i (logical NOT).
The control qubit is unchanged regardless. Here it is schematically for any a, b ∈ {0, 1} (here, ⊕
represents bitwise exclusive OR, i.e., bitwise addition modulo 2):
a ──●── a
b ──⊕── a ⊕ b
The matrix for the C-NOT gate above, with the first qubit being the control and the second being
the target, is

P0 ⊗ I + P1 ⊗ X = |0⟩⟨0| ⊗ I + |1⟩⟨1| ⊗ X = [ 1 0 0 0 ]
                                            [ 0 1 0 0 ]
                                            [ 0 0 0 1 ]
                                            [ 0 0 1 0 ] .
Here X is the usual Pauli X operator, which swaps 0 with 1, and hence represents logical NOT. If
If the control and target qubits were reversed, then the gate would be

I ⊗ P0 + X ⊗ P1 = [ P0 P1 ] = [ 1 0 0 0 ]
                  [ P1 P0 ]   [ 0 0 0 1 ]
                              [ 0 0 1 0 ]
                              [ 0 1 0 0 ] .
After the C-NOT gate is applied to the state |ψ1⟩ in Figure 3, the new and final state of the circuit is
(|00⟩ + |11⟩)/√2, the entangled state we saw earlier.
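The whole circuit of Figure 3 can be simulated in a few lines of numpy (a sketch of mine): build C-NOT as P0 ⊗ I + P1 ⊗ X and check that H on the first qubit followed by C-NOT turns |00⟩ into the entangled state (|00⟩ + |11⟩)/√2.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
P0 = np.diag([1.0, 0.0])       # |0><0|
P1 = np.diag([0.0, 1.0])       # |1><1|

CNOT = np.kron(P0, I2) + np.kron(P1, X)       # control = first qubit
assert np.allclose(CNOT, [[1, 0, 0, 0],
                          [0, 1, 0, 0],
                          [0, 0, 0, 1],
                          [0, 0, 1, 0]])

# Reversing control and target gives I (x) P0 + X (x) P1.
CNOT_rev = np.kron(I2, P0) + np.kron(X, P1)
assert np.allclose(CNOT_rev, [[1, 0, 0, 0],
                              [0, 0, 0, 1],
                              [0, 0, 1, 0],
                              [0, 1, 0, 0]])

# Figure 3: H on qubit 1, then C-NOT, starting from |00>.
ket00 = np.array([1.0, 0.0, 0.0, 0.0])
final = CNOT @ (np.kron(H, I2) @ ket00)
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
assert np.allclose(final, bell)
```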
Exercise 11.1 Verify that every permutation matrix is unitary.
The C-NOT gate is one example of a controlled gate. More generally, if U is a unitary gate on
k qubits, we can define the (k + 1)-qubit controlled U gate to be
C-U = P0 ⊗ I + P1 ⊗ U = [ I 0 ]
                        [ 0 U ] ,
where in this case the control qubit is the first qubit. The matrix would be different if the control
were not the first qubit, but the rule is the same in any case: If the control qubit is 0, then nothing
happens with the other (target) qubits. If the control is 1, then U is applied to the target qubits.
The control qubit is unchanged regardless. Here’s how we draw it in the case where U acts on a
single qubit:
The gate

S = [ 1 0 ]
    [ 0 i ]

is known as the phase gate. Note that S ∝ Rz (π/2) and that S² = Z. S rotates the Bloch sphere
counterclockwise about the +z-axis 90 degrees.
The gate

T = [ 1     0      ] = [ 1    0       ]
    [ 0  (1+i)/√2  ]   [ 0  e^{iπ/4}  ] .

For some obscure reason, this gate is known as the π/8 gate, maybe because

T ∝ Rz (π/4) = [ e^{−iπ/8}     0      ]
               [    0       e^{iπ/8}  ] .
We have T² = S, and T rotates the Bloch sphere counterclockwise 45° about the +z-axis. Notice
that T is the only one-qubit gate we’ve seen so far that does not map all axes to axes (i.e., x-, y-, and
z-axes) in the Bloch sphere. I’d call the three gates Z, S, and T conditional phase-shift gates: they leave
the Boolean value of the qubit unchanged while introducing various phase factors conditioned on
the qubit having Boolean value 1.
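These phase-gate relations check out numerically (my own sketch; the relation T ∝ Rz(π/4) is verified up to the global phase factor e^{iπ/8}):

```python
import numpy as np

Z = np.diag([1, -1]).astype(complex)
S = np.diag([1, 1j])                        # phase gate
T = np.diag([1, np.exp(1j * np.pi / 4)])    # "pi/8" gate

assert np.allclose(S @ S, Z)                # S^2 = Z
assert np.allclose(T @ T, S)                # T^2 = S
assert np.allclose((1 + 1j) / np.sqrt(2), np.exp(1j * np.pi / 4))

# T = e^{i pi/8} Rz(pi/4): a global phase times a z-rotation.
Rz = np.diag([np.exp(-1j * np.pi / 8), np.exp(1j * np.pi / 8)])
assert np.allclose(T, np.exp(1j * np.pi / 8) * Rz)
```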
Here’s another two-qubit classical gate, the SWAP gate:

SWAP = [ 1 0 0 0 ]
       [ 0 0 1 0 ]
       [ 0 1 0 0 ]
       [ 0 0 0 1 ] .

The first depiction is mine and other people’s; the second is the one the textbook uses. The SWAP
gate just exchanges the Boolean values of the two qubits it acts on, fixing |00⟩ and |11⟩ but mapping
|01⟩ to |10⟩ and vice versa.
[Hint: Rather than multiplying matrices, which can be time-consuming, just compare what the
two circuits do to the four possible basis states.]
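The hint's strategy of comparing circuits on the four basis states is easy to automate. The sketch below (mine, assuming the exercise concerns the standard identity that a SWAP gate equals three alternating C-NOT gates) does exactly that:

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])

CNOT = np.kron(P0, I2) + np.kron(P1, X)        # control: qubit 1
CNOT_rev = np.kron(I2, P0) + np.kron(X, P1)    # control: qubit 2
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=float)

# Circuit order CNOT, CNOT_rev, CNOT corresponds to this operator product.
three_cnots = CNOT @ CNOT_rev @ CNOT

# Compare the two circuits on each computational basis state |ab>.
for idx in range(4):
    basis = np.zeros(4)
    basis[idx] = 1.0
    assert np.allclose(three_cnots @ basis, SWAP @ basis)
```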
Exercise 11.4 This is a nonclassical exercise in several parts. It will help you to simplify circuits
by inspection, based on some circuit identities. It mirrors Exercises 4.13 and 4.17–4.20 on pages
177–180 of the text. An item may use previous items.
1. Verify directly that HXH = Z and that HZH = X (oh yes, and that HYH = −Y).
2. Verify that the two-qubit C-Z gate is symmetric with respect to its two qubits: a C-Z with the
first qubit as control and the second as target equals a C-Z with the roles reversed. What is
the matrix of this gate? The same is true for the C-S and C-T gates.

3. Verify that a controlled-UAU∗ gate is the same as a C-A gate conjugated by U on the target
qubit:

C-(UAU∗ ) = (I ⊗ U)(C-A)(I ⊗ U∗ ).

(Remember that in the expression UAU∗ , operators are applied from right to left, but in the
circuit, gates are applied from left to right.) [Hint: Consider separately the case when the
control qubit is |0⟩ and when it is |1⟩. To show equality of two linear operators generally, you
only need to show that they both act the same on the vectors of some basis.]
4. Construct a C-Z gate using a single C-NOT gate and two H gates. Similarly, construct a
C-NOT gate using a single C-Z gate and two H gates.
5. Verify that conjugating a C-NOT by H gates on both qubits (an H on each qubit before and
an H on each qubit after) yields the C-NOT with control and target reversed.
Note that gates acting on separate qubits commute, and so it doesn’t matter which of the
gates is applied first, and the order can be freely switched, provided that there are no gates
in between that connect the qubits together. You can think of the gates as being applied
simultaneously if you like.
Finally, we introduce a three-qubit classical gate known as the Toffoli gate, which is really a
controlled controlled NOT gate:
a a
b b
c c ⊕ (a ∧ b)
There are two control qubits and one target qubit. The control qubits are unchanged, and the target
is flipped if and only if both of the controls are 1.
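The Toffoli gate is just a permutation of the eight classical basis states, so its defining behavior can be checked exhaustively. A sketch of mine:

```python
import numpy as np
from itertools import product

def toffoli(a, b, c):
    """Controlled controlled NOT on classical bits: target c flips iff a = b = 1."""
    return a, b, c ^ (a & b)

for a, b, c in product([0, 1], repeat=3):
    out = toffoli(a, b, c)
    assert out[:2] == (a, b)                # controls unchanged
    assert out[2] == (c + a * b) % 2        # c XOR (a AND b)
    assert toffoli(*out) == (a, b, c)       # Toffoli is its own inverse

# As an 8 x 8 matrix it is a permutation matrix, hence unitary (Exercise 11.1).
M = np.zeros((8, 8))
for a, b, c in product([0, 1], repeat=3):
    col = (a << 2) | (b << 1) | c
    a2, b2, c2 = toffoli(a, b, c)
    row = (a2 << 2) | (b2 << 1) | c2
    M[row, col] = 1.0
assert np.allclose(M @ M.T, np.eye(8))      # M* M = I, so M is unitary
```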
Quantum Circuits Versus Boolean Circuits. Are quantum circuits with unitary gates as powerful
as classical Boolean circuits? You may have already noticed some similarities and differences
between the two circuit models:
• Both types of circuits carry bit values on wires which are acted on by gates.
• Quantum gates can create superpositions from basis states, but Boolean gates are classical,
mapping Boolean input values to definite Boolean output values.
• A Boolean gate may take some number of inputs (usually one or two), and has one output,
which can be freely copied into any number of wires, and thus the number of wires from
layer to layer may change. In quantum circuits, quantum gates are operators mapping the
state space into itself, and so each gate has the same number of outputs as inputs. Thus the
number of qubits never changes, and each qubit retains its identity throughout the circuit.
• Boolean gates may lose information from inputs to output, i.e., the input values are not
uniquely recoverable from the output value (e.g., an AND gate or an OR gate). Any quantum
unitary gate U can always be undone (at least theoretically) by applying U∗ immediately
before or afterwards. Thus quantum unitary gates are reversible, i.e., the input state is always
uniquely recoverable from the output state.
A quantum circuit can use classical gates, provided that they are reversible. Does this pose a
significant restriction on the power of quantum circuits to simulate classical computation? Not
really. Every classical Boolean circuit can be simulated reversibly. More precisely, we have the
following result:
Theorem 11.5 For every Boolean function f : {0, 1}n → {0, 1}m with n inputs and m outputs, there
is a reversible circuit C (equivalently, a quantum circuit using only classical gates) such that, for all
x = (x1 , . . . , xn ) ∈ {0, 1}^n and all y = (y1 , . . . , ym ) ∈ {0, 1}^m , the circuit C maps the inputs
x1 , . . . , xn , y1 , . . . , ym , 0, . . . , 0 to the outputs x1 , . . . , xn , y1 ⊕ z1 , . . . , ym ⊕ zm , 0, . . . , 0,
where (z1 , . . . , zm ) = f(x). Furthermore, C uses only X and Toffoli gates, and if Cf is a (classical) Boolean
circuit computing f using binary AND, OR, and unary NOT gates, then a description for C can be computed
from a description of Cf in polynomial time.
The circuit C acts on three quantum registers: the input qubits, whose initial values are
x1 , . . . , xn ; the output qubits (or target qubits), whose initial values are y1 , . . . , ym , and a set of
“work” qubits, called an ancilla, whose initial and final value is always 00 · · · 0. When all the
ancilla values are restored to 0 at the end of the circuit, we call this a clean circuit. The ancilla
is used for temporary storage of intermediate results. If the y1 , . . . , ym are all 0 initially, then
f(x) will appear as the final configuration of the output register. In quantum terms, if the initial
state is the basis state |xi ⊗ |yi ⊗ |0 · · · 0i = |x, y, 0 · · · 0i, then the final state is the basis state
|xi ⊗ |y ⊕ f(x)i ⊗ |0 · · · 0i = |x, y ⊕ f(x), 0 · · · 0i, where the three labels in the |·i represent the
contents of the three quantum registers. We often suppress the ancilla register and say that C takes
|x, yi to |x, y ⊕ f(x)i.
Note that C is clearly reversible. In fact, C is its own inverse. If we feed the output values on
the right as input values on the left, then C computes the original inputs as outputs.
We’ll only sketch a proof of Theorem 11.5. If Cf is a Boolean circuit computing f, we build C
by replacing each gate of Cf with one or more Toffoli gates. We replace NOT gates with Pauli X
gates and AND gates with
[Diagram: a Toffoli gate whose controls carry a and b and whose target is a fresh ancilla qubit 0, outputting a ∧ b.]
Here we use a fresh ancilla qubit for the target wire. If we need to copy the Boolean value
of a qubit, we can use
of a qubit, we can use
[Diagram: a Toffoli gate with first control a and target a fresh ancilla qubit 0, whose output is a copy of a.]
Here, we use a fresh ancilla qubit for the second control wire, and flip it from 0 to 1 with an X gate.
To replace an OR gate, we can first express it with AND and NOT gates according to De Morgan’s
laws, then replace the AND and NOT gates as above.
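These gate replacements are easy to check on classical bit tuples. Here is a minimal Python sketch of my own (the function names are hypothetical, not from the notes): Toffoli maps (a, b, c) to (a, b, c ⊕ (a ∧ b)), so with a fresh ancilla c = 0 it computes AND reversibly, and OR follows by De Morgan's laws.

```python
def toffoli(a, b, c):
    """Toffoli (CCNOT): flip the target c iff both controls are 1."""
    return a, b, c ^ (a & b)

def x_gate(a):
    """Pauli X acts as classical NOT on basis states."""
    return a ^ 1

def reversible_or(a, b, c):
    """OR via De Morgan: a OR b = NOT(NOT a AND NOT b), ancilla c = 0."""
    a, b = x_gate(a), x_gate(b)
    a, b, c = toffoli(a, b, c)
    c = x_gate(c)
    a, b = x_gate(a), x_gate(b)  # restore the inputs
    return a, b, c

for a in (0, 1):
    for b in (0, 1):
        assert toffoli(a, b, 0) == (a, b, a & b)
        assert reversible_or(a, b, 0) == (a, b, a | b)
```

Note that the inputs emerge unchanged in every case, as reversibility demands.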
Notice that the following one-gate circuit cleanly implements the C-NOT gate (the ancilla stays
0):
[Diagram: a Toffoli gate whose second control is an ancilla wire carrying 0 −X− 1 −•− 1 −X− 0.]
[Diagram: the circuit D takes inputs x1 , . . . , xn (passed through unchanged) and ancilla qubits initialized to 0, and outputs garbage values g1 , . . . , gk together with the intended outputs z1 , . . . , zm .]
Figure 4: A full implementation of the circuit C. Inputs and ancilla values are restored by undoing
the computation after copying the outputs to fresh qubits. The locations of the output register and
the ancilla are swapped for ease of display. The circuit implementing D∗ , the inverse of D is an
exact mirror image of the circuit for D. The values on the qubits intermediate between the D and
D∗ subcircuits, from top down, are g1 , . . . , gk , z1 , . . . , zm . A C-NOT gate connects each zi with the
qubit carrying yi . Some additional ancillæ (not shown) are used to implement the C-NOT gates
via Toffoli gates.
The intended outputs z1 , . . . , zm are somewhere on the right-hand side, and we show them below
the other qubits, which contain unused garbage values g1 , . . . , gk . This circuit, which implements
some unitary operator D, is reversible but may not be clean. We have to clean it up. First, we copy
the intended outputs onto fresh wires using C-NOT gates, then we undo the D computation by
applying the exact same gates as in D but in reverse order, taking note that both the Toffoli and X
gates are their own inverses. The final circuit is shown in Figure 4.
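The compute/copy/uncompute pattern of Figure 4 can be seen in miniature on classical bits. This is a sketch of my own (the name `clean_and` is hypothetical): D is a single Toffoli writing a ∧ b into the ancilla, the C-NOT copies it to the output qubit, and the mirror-image Toffoli D∗ restores the ancilla.

```python
def clean_and(a, b, y, anc=0):
    """Compute a AND b into output qubit y, leaving the ancilla clean."""
    anc ^= a & b   # D: a Toffoli writes the result into the ancilla
    y ^= anc       # copy the intended output with a C-NOT
    anc ^= a & b   # D*: the mirror-image Toffoli restores the ancilla to 0
    return a, b, y, anc

for a in (0, 1):
    for b in (0, 1):
        for y in (0, 1):
            assert clean_and(a, b, y) == (a, b, y ^ (a & b), 0)
```

The final assertion checks both that the output qubit receives y ⊕ (a ∧ b) and that the ancilla always returns to 0.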
Exercise 11.6 (Challenging because it’s long) The circuit below outputs 1 if and only if at least two
of x1 , x2 , x3 are 1. The three gates in the left column are AND gates; the other two are OR gates.
[Diagram: inputs x1 , x2 , x3 feed three AND gates (one for each pair of inputs), whose outputs feed two OR gates, producing maj(x1 , x2 , x3 ).]
Convert this circuit into a reversible circuit as in Theorem 11.5, above. Can you make any im-
provements to the construction?
Why Clean? We’d like to occasionally include one circuit as a subcircuit of another circuit. When
we do this, we want to ignore any additional ancilla qubits used by the subcircuit, considering them
“local” to the subcircuit, as we did in Figure 4 with the C-NOT gates. If we don’t restore the ancilla
qubits to their original values, then we can’t ignore them as we’d like. Some of the computation
will bleed into the unrestored ancilla qubits. This will be especially true with nonclassical quantum
circuits.
Let C be a circuit with unitary gates that acts on n input and output qubits, using m ancilla
qubits. Let H be the 2n -dimensional Hilbert space of the input/output qubits, and let A be the
2m -dimensional space of the ancilla. Then C is a unitary operator in L(H ⊗ A). If C is clean, then
it restores the ancilla to |0 · · · 0i, provided the ancilla started that way. Therefore, for every state
|ψin i ∈ H there is a unique state |ψout i ∈ H such that C(|ψin i ⊗ |0 · · · 0i) = |ψout i ⊗ |0 · · · 0i. Let
C′ : H → H be the mapping that takes any |ψin i to the corresponding |ψout i. C′ is clearly a linear
operator in L(H), and further, for any states |ψ1 i and |ψ2 i in H, we have
hψ1 |ψ2 i = (hψ1 | ⊗ h0 · · · 0|) C∗ C (|ψ2 i ⊗ |0 · · · 0i) = hψ1 |C′∗ C′ |ψ2 i.
Thus C′ preserves the inner product on H and so must be unitary. This justifies our suppressing
the ancilla when we use C as a new unitary “gate” in another circuit. We are really using C′ , which
C implements with its “private” ancilla. We can’t do this for a general unitary C ∈ L(H ⊗ A).
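We can check this numerically in a small case. The sketch below (my own; it uses the earlier C-NOT-via-Toffoli construction as the clean three-qubit circuit) builds C and verifies that its restriction to the ancilla-|0i subspace is the C-NOT gate and is unitary:

```python
import numpy as np

X = np.array([[0., 1.], [1., 0.]])
I = np.eye(2)

# Toffoli with controls on qubits 1 and 3 and target qubit 2.
# Basis index is 4*q1 + 2*q2 + q3, so it swaps |101> and |111>.
T = np.eye(8)
T[[5, 7]] = T[[7, 5]]

anc_X = np.kron(np.kron(I, I), X)   # X on the ancilla (third qubit)
C = anc_X @ T @ anc_X               # clean: the ancilla returns to |0>

# Restrict C to inputs/outputs with ancilla |0> (even basis indices):
idx = [0, 2, 4, 6]
Cp = C[np.ix_(idx, idx)]

CNOT = np.array([[1., 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])
assert np.allclose(Cp, CNOT)              # C' is the C-NOT gate
assert np.allclose(Cp @ Cp.T, np.eye(4))  # and C' is unitary
```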
Measurement gates. So far, we’ve only seen unitary gates, reflecting unitary evolution of the
qubit or qubits. To get any useful, classical information from a circuit, we must be able to make
measurements. At the very least, it is only reasonable that we should be able to measure the
(Boolean) value of a qubit, that is, we should be able to make a projective measurement {P0 , P1 }
of any qubit with respect to the computational basis. We represent such a measurement by the
one-qubit gate
(For those of you failing to appreciate the artistry of my iconography, the gate depicts an eye in
profile. In the book, this gate is depicted as the display of a gauge with a needle.) The incoming
qubit is measured projectively in the computational basis, and the classical result (a single bit) is
carried on the double wire to the right. If there are other qubits present in the system, then the
projective measurement is really {P0 ⊗ I, P1 ⊗ I}, where I is the identity operator applying to the
qubits not being measured (recall Exercise 10.4).
There are two uses for a qubit measurement. The first, obvious use is to read the answer from
the final state of a computation. The second is to control future operations in the circuit by using
the result of an intermediate measurement. For example, the result of a measurement may be used
to control another gate:
The U gate is applied to the second qubit if and only if the result of measuring the first qubit is 1.
Unlike a qubit, a classical bit can be duplicated freely and used to control many gates later in the
circuit.
Based on the discussion after Exercise 10.4, we may measure several different qubits at once,
since the actual chronological order of the measurements does not matter. Here’s a completely
typical example: we decide to measure qubits 2, 3, and 5 of an n-qubit system (where n ≥ 5,
obviously). The state |ψi of an n-qubit system can always be expressed as a linear combination of
basis states:
|ψi = Σx∈{0,1}n αx |xi. (47)
If we measure qubits 2,3, and 5 when the system is in state |ψi, what is the probability that we
will see, say, 101, i.e., 1 for qubit 2, 0 for qubit 3, and 1 for qubit 5? The corresponding projector is
P = I ⊗ P1 ⊗ P0 ⊗ I ⊗ P1 ⊗ I ⊗ I, where I is the single-qubit identity operator. The probability is then
Pr[101] = hψ|P|ψi = Σx : x2 x3 x5 =101 |αx |2 ,
where we are letting xj denote the jth bit of x. That is, we only retain those terms in the sum in (47)
in which the corresponding bits of x match the outcome. Upon seeing 101, the post-measurement
state will be
|ψpost i = P|ψi/√Pr[101] = (1/√Pr[101]) Σx : x2 x3 x5 =101 αx |xi.
We will often measure several qubits at once, so this example will come in handy.
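The example above can be simulated directly. The following sketch (my own, assuming n = 5 for compactness, so qubits 2, 3, and 5 are measured out of five) computes Pr[101] and the post-measurement state for a random 5-qubit state:

```python
import numpy as np

n = 5
rng = np.random.default_rng(0)
psi = rng.normal(size=2 ** n) + 1j * rng.normal(size=2 ** n)
psi /= np.linalg.norm(psi)          # a random normalized 5-qubit state

def bit(x, j):
    """The jth bit of x, counting qubits 1..n from the left."""
    return (x >> (n - j)) & 1

# Keep exactly the basis states with qubit 2 = 1, qubit 3 = 0, qubit 5 = 1:
idx = [x for x in range(2 ** n) if (bit(x, 2), bit(x, 3), bit(x, 5)) == (1, 0, 1)]
pr = sum(abs(psi[x]) ** 2 for x in idx)

post = np.zeros_like(psi)
post[idx] = psi[idx]                # apply the projector P
post /= np.sqrt(pr)                 # renormalize by sqrt(Pr[101])

assert 0 < pr < 1
assert np.isclose(np.linalg.norm(post), 1.0)
```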
Bell States and Quantum Teleportation. Recall the circuit of Figure 3. Let B be the two-qubit
unitary operator realized by this circuit. The four states obtained by applying B to the four
computational basis states are known as the Bell states and form the Bell basis:
|Φ+ i := B|00i = (|00i + |11i)/√2, (49)
|Ψ+ i := B|01i = (|01i + |10i)/√2, (50)
|Φ− i := B|10i = (|00i − |11i)/√2, (51)
|Ψ− i := B|11i = (|01i − |10i)/√2. (52)
These states are also called EPR states or EPR pairs. In a sense we will quantify later, these states
represent maximally entangled pairs of qubits. EPR is an acronym for Einstein, Podolsky, and
Rosen, who coauthored a paper describing apparent paradoxes in the rules of quantum mechanics
involving pairs of qubits in states such as these. Suppose a pair of electrons is prepared whose
spins are in one of the Bell states, say |Φ+ i. (There are actual physical processes that can do this.)
The electrons can then (theoretically) be separated by a great distance—the first taken by Alice to
a lab at UC Berkeley in California and the second taken by Bob to a lab at MIT in Massachusetts. If
Alice measures her spin first, she’ll see 0 or 1 with equal probability. Same with Bob if he measures
his spin first. But if Alice measures her spin first and sees, say, 0, then according to the standard
Copenhagen interpretation of quantum mechanics (which we are using), the state of the two spins
collapses to |00i, so if Bob measures his spin afterwards, he will see 0 with certainty. So Alice’s
measurement seems to affect Bob’s somehow. Einstein called this phenomenon “spooky action at
a distance.” We’ll talk about this more later, time permitting.
Philosophical problems aside, entangled pairs of qubits can be used in interesting and subtle
ways. One of the earliest discovered uses of EPR pairs is to teleport an unknown quantum state
across a distance using only classical communication, in a process called quantum teleportation. Sup-
pose Alice and Bob share two qubits in the state |Φ+ i as above, which may have been distributed
to them long ago. Suppose also that Alice has another qubit in some arbitrary, unknown state
|ψi = α|0i + β|1i. She wants Bob to have this state. She could mail her electron to Bob, but this won’t work because
the state |ψi of the electron is very delicate and will be destroyed if the package is bumped, screened
with X-rays, etc. Instead, she can transfer this state to Bob with only a phone call. No quantum
states need to be physically transported between Alice and Bob. Here’s how it works: The state of
the three qubits initially is
|ψi Φ+ = (α|0i + β|1i)(|00i + |11i)/√2 = (α|000i + α|011i + β|100i + β|111i)/√2. (53)
Alice possesses the first two qubits; Bob possesses the third. Alice applies the inverse B∗ of the
circuit of Figure 3:
[Diagram: B∗ is the mirror image of the circuit of Figure 3, that is, a C-NOT followed by an H on the first qubit.]
Figure 5: Quantum teleportation of a single qubit. Alice possesses the first qubit in some arbitrary,
unknown state |ψi. The second and third qubits are an EPR pair prepared in the state |Φ+ i
sometime in the past, with the second qubit given to Alice and the third to Bob. Alice applies B∗
to her two qubits, then measures both qubits, then communicates the results b1 , b2 ∈ {0, 1} of the
measurements to Bob. Bob uses this information to decide whether to apply Pauli X and Z gates
to his qubit.
to her two qubits. She then measures each qubit in the computational basis, getting Boolean values
b1 and b2 for the first and second qubits, respectively. She then calls Bob on the phone and tells
him the values she observed, i.e., b1 and b2 . Bob then does the following with his qubit (the third
qubit): (i) if b2 = 1, then Bob applies an X gate, otherwise he does nothing; then (ii) if b1 = 1, then
he applies a Z gate, otherwise he does nothing. At this point, Bob’s qubit will be in state |ψi. We
can illustrate the process by the circuit in Figure 5. Let’s check that Bob actually does wind up
with |ψi. It will make our work easier to first express the initial state of (53) using the Bell basis.
It’s easy to check that
|00i = (|Φ+ i + |Φ− i)/√2,
|01i = (|Ψ+ i + |Ψ− i)/√2,
|10i = (|Ψ+ i − |Ψ− i)/√2,
|11i = (|Φ+ i − |Φ− i)/√2,
so the initial state (53) can be rewritten as
(1/2) [α(|Φ+ i + |Φ− i)|0i + α(|Ψ+ i + |Ψ− i)|1i + β(|Ψ+ i − |Ψ− i)|0i + β(|Φ+ i − |Φ− i)|1i]
= (1/2) [|Φ+ i(α|0i + β|1i) + |Ψ+ i(α|1i + β|0i) + |Φ− i(α|0i − β|1i) + |Ψ− i(α|1i − β|0i)].
Going back to Equations (49–52) and applying B∗ to both sides, we see that B∗ maps |Φ+ i to |00i
and so on. So after Alice applies B∗ to her two qubits, the state becomes
(1/2) [|00i (α|0i + β|1i) + |01i (α|1i + β|0i) + |10i (α|0i − β|1i) + |11i (α|1i − β|0i)] . (54)
Now Alice measures her two qubits. She’ll get one of four possible values: 00, 01, 10, 11, all with
probability 1/4. For b1 , b2 ∈ {0, 1}, let |ψb1 b2 i be the state of the three qubits after the measurement,
Figure 6: Dense coding. The EPR pair is initially distributed between Alice and Bob, with Alice
getting the first qubit. The stuff above the dotted line belongs to Alice, and the rest belongs to Bob.
The qubit crosses the dotted line when Alice sends it to Bob.
assuming the result is b1 , b2 . By applying the corresponding projectors to the state in (54) and
normalizing, we get
|ψ00 i = |00i (α|0i + β|1i) , |ψ01 i = |01i (α|1i + β|0i) ,
|ψ10 i = |10i (α|0i − β|1i) , |ψ11 i = |11i (α|1i − β|0i) .
We see that Bob’s qubit is now in one of four possible states: |ψi, X|ψi, Z|ψi, or XZ|ψi, depending
on whether the values measured by Alice are 00, 01, 10, or 11, respectively. Now Bob simply uses
the information about b1 and b2 to undo the Pauli operators on his qubit, yielding |ψi in every
case.
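The whole protocol is easy to simulate with state vectors. Below is a sketch of my own, assuming the qubit ordering q1 q2 q3 and B = C-NOT · (H ⊗ I), which matches Equations (49–52):

```python
import numpy as np

H = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)
X = np.array([[0., 1.], [1., 0.]])
Z = np.array([[1., 0.], [0., -1.]])
I = np.eye(2)
CNOT = np.array([[1., 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])

B = CNOT @ np.kron(H, I)            # B|00> = |Phi+>, etc.
rng = np.random.default_rng(1)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)          # Alice's arbitrary unknown state

phi_plus = np.array([1., 0, 0, 1]) / np.sqrt(2)
state = np.kron(psi, phi_plus)          # |psi> (x) |Phi+>
state = np.kron(B.conj().T, I) @ state  # Alice applies B* to qubits 1, 2

for b1 in (0, 1):
    for b2 in (0, 1):
        # Condition on Alice measuring b1, b2 (each has probability 1/4):
        bob = state.reshape(4, 2)[2 * b1 + b2]
        bob = bob / np.linalg.norm(bob)
        if b2:
            bob = X @ bob               # Bob's correction for b2 ...
        if b1:
            bob = Z @ bob               # ... then for b1
        assert np.allclose(bob, psi)    # Bob winds up with |psi> exactly
```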
This scenario can be used to teleport an n-qubit state from Alice to Bob by teleporting each
qubit separately, just as above.
Note that Alice must tell Bob the values b1 and b2 so that Bob can recover |ψi reliably. This
means that quantum states cannot be teleported faster than the speed of light. Also note that after
the protocol is finished, Alice no longer possesses |ψi. She can’t, because that would violate the
No-Cloning Theorem. Finally, note that the EPR state that Alice and Bob shared before the protocol
no longer exists. It is used up, and can’t be used to teleport additional states. Thus, teleporting an
n-qubit state needs n separate EPR pairs.
Dense Coding. In quantum teleportation, with the help of an EPR pair, Alice can substitute
transmitting a qubit to Bob with transmitting two classical bits. There is a converse to this: with
the help of an EPR pair, Alice can substitute transmitting two classical bits to Bob with transmitting
a single qubit. This inverse trade-off is known as dense coding.
Figure 6 illustrates how dense coding works. Alice has two classical bits b1 and b2 that she
wants to communicate to Bob. She also shares an EPR pair with Bob in state |Φ+ i as before. If
b2 = 1, Alice applies X to her half of the EPR pair, otherwise she does nothing. Then, if b1 = 1, she
applies Z to her qubit, otherwise she does nothing. She then sends her qubit to Bob. Bob now has
both qubits. He applies B∗ to them then measures each of his qubits, seeing b1 and b2 as outcomes
with certainty.
Here are the four possible states of the two qubits when Alice sends her qubit to Bob, corre-
sponding to the four possible values of b1 b2 (here, I is the one-qubit identity operator):
|ψ00 i = (I ⊗ I)|Φ+ i = |Φ+ i,
|ψ01 i = (X ⊗ I)|Φ+ i = (|10i + |01i)/√2 = |Ψ+ i,
|ψ10 i = (Z ⊗ I)|Φ+ i = (|00i − |11i)/√2 = |Φ− i,
|ψ11 i = (ZX ⊗ I)|Φ+ i = (|01i − |10i)/√2 = |Ψ− i.
So Alice is just preparing one of the four Bell states. Thus when Bob applies B∗ to |ψb1 b2 i, he gets
|b1 b2 i, yielding b1 b2 upon measurement.
Note that, as before, the EPR pair is consumed in the process.
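Dense coding is equally easy to check numerically. Here is a sketch under the same conventions as in the teleportation example (B = C-NOT · (H ⊗ I), so B is real and B∗ is its transpose):

```python
import numpy as np

H = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)
X = np.array([[0., 1.], [1., 0.]])
Z = np.array([[1., 0.], [0., -1.]])
I = np.eye(2)
CNOT = np.array([[1., 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])
Bstar = (CNOT @ np.kron(H, I)).T    # B is real, so B* is its transpose

phi_plus = np.array([1., 0, 0, 1]) / np.sqrt(2)

for b1 in (0, 1):
    for b2 in (0, 1):
        state = phi_plus.copy()
        if b2:
            state = np.kron(X, I) @ state  # Alice encodes b2 ...
        if b1:
            state = np.kron(Z, I) @ state  # ... then b1, on her qubit only
        probs = np.abs(Bstar @ state) ** 2  # Bob applies B*, then measures
        assert np.isclose(probs[2 * b1 + b2], 1.0)  # he sees b1 b2 with certainty
```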
Exercise 12.2 Recall the two-qubit swap operator SWAP satisfying SWAP|ai|bi = |bi|ai for all
a, b ∈ {0, 1}. Show that the four Bell states are eigenvectors of SWAP. What are the corresponding
eigenvalues? For this and other reasons, the states |Φ+ i, |Φ− i, and |Ψ+ i are often called symmetric
states, triplet states, or spin-1 states, while the state |Ψ− i is often called the antisymmetric state, the
singlet state, or the spin-0 state.
13 Week 7: Basic quantum algorithms
Black-Box Problems. Many quantum algorithms solve what are called “black-box” problems.
Typically, we are given some Boolean function f : {0, 1}n → {0, 1}m and we want to answer some
question about the function as a whole, for example, “Is f constant?”, “Is f the zero function?”, “Is
f one-to-one?”, etc. We are allowed to feed an input x ∈ {0, 1}n to f and get back the output f(x).
The input x is called a query to f and f(x) is the query answer. Other than making queries to f, we
are not allowed to inspect f in any way, hence the black-box nature of the function. (A black-box
function f is sometimes called an oracle.) Generally, we would like to answer our question by
making as few queries to f as we can, since queries may be expensive.
In the context of quantum computing, the function f is most naturally given to us as a classical,
unitary gate Uf that acts on two quantum registers—the first with n qubits and the second with
m qubits—and behaves as follows for all x ∈ {0, 1}n and y ∈ {0, 1}m :
Uf |x, yi = |x, y ⊕ f(x)i.
This is reasonable, given the restriction that unitary quantum gates must be reversible. Uf is
called an f-gate. To solve a black-box problem involving f, we are allowed to build a quantum
circuit using f-gates—as well as the other usual unitary gates. Each occurrence of an f-gate in the
circuit counts as a query to f, so the number of queries is the number of f-gates in the circuit. The
difference between classical queries to f and quantum queries to f is that we can feed a superposition
of several classical inputs (basis states) into the f-gate, obtaining a corresponding superposition
of the results. We’ll see in a minute that we can use this idea, known as quantum parallelism, to get
more information out of f in fewer queries than any classical computation.
Deutsch’s Problem and the Deutsch-Jozsa Problem. The first indication that quantum compu-
tation may be strictly more powerful than classical computation came with a black-box problem
posed by David Deutsch: Given a one-bit Boolean function f : {0, 1} → {0, 1}, is f constant, that is,
is f(0) = f(1)? There are four possible functions {0, 1} → {0, 1}: the constant zero function, the con-
stant one function, the identity function, and the negation function. Deutsch’s task is to determine
whether f falls among the first two or the last two. Classically, it is clear that determining which is
the case requires two queries to f, since we need to know both f(0) and f(1). Quantally, however,
we can get by with only one query to f. Define
|+i := H|0i = (|0i + |1i)/√2, (55)
|−i := H|1i = (|0i − |1i)/√2, (56)
where H is the Hadamard gate. The states |+i and |−i correspond to the states |+xi and |−xi we
defined earlier when we were discussing the Bloch sphere. If we feed these states into Uf like so:
[Diagram: |+i fed into the first input of Uf and |−i into the second]
[Diagram: the first qubit |0i passes through H, then Uf , then H; the second qubit |0i passes through X, then H, then Uf .]
Figure 7: The full circuit for Deutsch’s problem. The second qubit is not used after it emerges from
the f-gate.
then the progression of states through the circuit from left to right is
|+i|−i = (|0i + |1i)(|0i − |1i)/2
= (|00i − |01i + |10i − |11i)/2
7→ (|0, f(0)i − |0, 1 ⊕ f(0)i + |1, f(1)i − |1, 1 ⊕ f(1)i)/2 (applying Uf )
=: |ψout i.
If f is constant, i.e., if f(0) = f(1) = y for some y ∈ {0, 1}, then
|ψout i = (|0, yi − |0, 1 ⊕ yi + |1, yi − |1, 1 ⊕ yi)/2
= (|0i + |1i)(|yi − |1 ⊕ yi)/2
= (−1)y (|0i + |1i)(|0i − |1i)/2
= (−1)y |+i|−i.
If f is not constant, i.e., if f(0) = y = 1 ⊕ f(1) for some y ∈ {0, 1}, then
|ψout i = (|0, yi − |0, 1 ⊕ yi + |1, 1 ⊕ yi − |1, yi)/2
= (|0i − |1i)(|yi − |1 ⊕ yi)/2
= (−1)y (|0i − |1i)(|0i − |1i)/2
= (−1)y |−i|−i.
Now suppose we apply the Hadamard gate H to the first qubit of |ψout i. We obtain
|φi := (H ⊗ I)|ψout i = ±|0i|−i if f is constant, and ±|1i|−i if f is not constant.
So now we measure the first qubit of |φi. We get 0 with certainty if f is constant, and we get 1
with certainty otherwise. We can prepare the initial state |+i|−i by applying two Hadamards and
a Pauli X gate. The full circuit is in Figure 7. We only use the f-gate once, but in superposition.
That is the key point.
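The one-query circuit of Figure 7 can be simulated in a few lines. This is a sketch of my own (the names `Uf` and `deutsch` are hypothetical); it checks all four one-bit functions:

```python
import numpy as np

H = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)
X = np.array([[0., 1.], [1., 0.]])
I = np.eye(2)

def Uf(f):
    """The 4x4 permutation matrix |x, y> -> |x, y XOR f(x)>."""
    U = np.zeros((4, 4))
    for x in (0, 1):
        for y in (0, 1):
            U[2 * x + (y ^ f(x)), 2 * x + y] = 1
    return U

def deutsch(f):
    state = np.zeros(4); state[0] = 1      # |00>
    state = np.kron(H, H @ X) @ state      # prepare |+>|->
    state = Uf(f) @ state                  # the single query
    state = np.kron(H, I) @ state          # H on the first qubit
    p1 = state[2] ** 2 + state[3] ** 2     # Pr[first qubit measures 1]
    return int(np.isclose(p1, 1.0))        # 1 iff f is not constant

assert deutsch(lambda x: 0) == 0
assert deutsch(lambda x: 1) == 0
assert deutsch(lambda x: x) == 1
assert deutsch(lambda x: 1 - x) == 1
```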
Deutsch and Jozsa generalized this idea to a function f : {0, 1}n → {0, 1} with n inputs and one
output. The corresponding (n + 1)-qubit Uf gate looks like
[Diagram: Uf maps the n-qubit register x and the single qubit y to x and y ⊕ f(x).]
We say that f is balanced if the number of inputs x such that f(x) = 0 is equal to the number of
inputs x such that f(x) = 1, namely, 2n−1 . The Deutsch-Jozsa problem is as follows: We are given f
as above as a black-box gate, and we know (we are promised) that f is either constant or balanced,
and we want to determine which is the case. Answering this question classically requires 2n−1 + 1
queries to f in the worst case, since it is possible that f is balanced but the first 2n−1 queries may
all yield the same answer. Quantally, we can do much better; one query to f suffices.
The set-up is similar to what we just did, but instead of using an (n + 1)-qubit f-gate directly,
it is easier to work with an n-qubit inversion f-gate If defined as follows for every x ∈ {0, 1}n :
If |xi := (−1)f(x) |xi.
That is, If leaves the values of the qubits alone but flips the sign iff f(x) = 1. We’ve defined If on
computational basis vectors. Since If is linear, this defines If on all vectors in the state space of n
qubits. If can be implemented cleanly (and easily) using Uf thus:
[Diagram: If on n qubits equals Uf with one extra ancilla qubit, prepared from |0i by an X then an H, and restored afterward by an H then an X.]
For any input state |xi where x ∈ {0, 1}n , the progression of states through the circuit from left to
right is
|x, 0i 7→ |x, 1i (applying X)
7→ |xi|−i (applying H)
7→ |xi(|f(x)i − |1 ⊕ f(x)i)/√2 (applying Uf )
= (−1)f(x) |xi(|0i − |1i)/√2
= (−1)f(x) |xi|−i
7→ (−1)f(x) |x, 1i (applying H)
7→ (−1)f(x) |x, 0i (applying X)
as advertised. Since only one f-gate is used to implement If , each occurrence of If in a circuit
amounts to one occurrence of Uf in the circuit.
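We can verify this construction numerically. The sketch below (my own, with an arbitrary test function f on n = 3 inputs) builds the operator X, H, Uf , H, X on the ancilla and checks that it acts as If on inputs |x, 0i:

```python
import numpy as np

H = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)
X = np.array([[0., 1.], [1., 0.]])

def Uf(f, n):
    """The permutation matrix |x, y> -> |x, y XOR f(x)> on n + 1 qubits."""
    U = np.zeros((2 ** (n + 1), 2 ** (n + 1)))
    for x in range(2 ** n):
        for y in (0, 1):
            U[2 * x + (y ^ f(x)), 2 * x + y] = 1
    return U

n = 3
f = lambda x: (x >> 1) & 1          # an arbitrary test function (my choice)
In = np.eye(2 ** n)

# Gates applied left to right on the ancilla: X, H, then Uf, then H, X.
circ = np.kron(In, X @ H) @ Uf(f, n) @ np.kron(In, H @ X)

for x in range(2 ** n):
    out = circ[:, 2 * x]            # the image of the basis state |x, 0>
    expect = np.zeros(2 ** (n + 1))
    expect[2 * x] = (-1) ** f(x)    # If |x>|0> = (-1)^f(x) |x>|0>
    assert np.allclose(out, expect)
```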
To determine whether f is constant or balanced, we use the following n-qubit circuit:
[Diagram: qubit lines |0i −H− If −H−, with vertical dots indicating that the pattern repeats for all n qubits, each measured at the end.]
The dots indicate that all n qubits start in state |0i, a Hadamard gate is applied to each qubit before
and after If , and all qubits are measured at the end. Before we view the progression of states, let’s
see what happens when we apply a column of n Hadamard gates all at once to n qubits in the state
|xi, for any x = x1 x2 · · · xn ∈ {0, 1}n . (We denote the n-fold Hadamard operator as H⊗n .) Noting
that, for all b ∈ {0, 1},
H|bi = (|0i + (−1)b |1i)/√2 = (1/√2) Σc∈{0,1} (−1)bc |ci,
we get
|xi 7→ H|x1 i ⊗ · · · ⊗ H|xn i (applying H⊗n )
= 2−n/2 (Σy1 ∈{0,1} (−1)x1 y1 |y1 i) ⊗ · · · ⊗ (Σyn ∈{0,1} (−1)xn yn |yn i)
= 2−n/2 Σy1 ∈{0,1} · · · Σyn ∈{0,1} (−1)x1 y1 +···+xn yn |y1 i ⊗ · · · ⊗ |yn i
= 2−n/2 Σy∈{0,1}n (−1)x·y |yi,
where x · y := x1 y1 + · · · + xn yn . Back in our circuit, the progression of states is
|00 · · · 0i 7→ 2−n/2 Σx∈{0,1}n |xi (applying H⊗n ; because (00 · · · 0) · x = 0) (57)
7→ 2−n/2 Σx∈{0,1}n (−1)f(x) |xi (applying If ) (58)
7→ 2−n Σx∈{0,1}n Σy∈{0,1}n (−1)f(x) (−1)x·y |yi (applying H⊗n ) (59)
= 2−n Σx,y∈{0,1}n (−1)f(x)+x·y |yi (60)
= 2−n Σy∈{0,1}n Σx∈{0,1}n (−1)f(x)+x·y |yi. (61)
Suppose first that f is constant, and we let |ψconst i denote this last state. Then (−1)f(x) = ±1 is
independent of x, and so we can bring it outside the sum:
|ψconst i = ±2−n Σy∈{0,1}n Σx∈{0,1}n (−1)x·y |yi
= ±2−n (Σx (−1)0 ) |0n i ± 2−n Σy≠0n Σx (−1)x·y |yi
= ±|0n i ± 2−n Σy≠0n Σx (−1)x·y |yi.
Since
1 = hψconst |ψconst i = 1 + 2−2n Σy≠0n |Σx (−1)x·y |2 ,
we must have Σx (−1)x·y = 0 for all y ≠ 0n ,14 and thus
|ψconst i = ±|0n i.
When we measure the qubits in state |ψconst i, we will see 0n with certainty.
Now suppose that f is balanced, and we let |ψbal i denote the state of (61). Again separating the
|0n i-term from the rest, we get
|ψbal i = 2−n Σx∈{0,1}n (−1)f(x) |0n i + 2−n Σy≠0n Σx∈{0,1}n (−1)f(x)+x·y |yi.
But f is balanced, and so Σx (−1)f(x) = 0, because each term contributes +1 for f(x) = 0 and −1
for f(x) = 1. Thus,
|ψbal i = 2−n Σy≠0n Σx∈{0,1}n (−1)f(x)+x·y |yi.
When we measure the qubits in state |ψbal i, we see 0n with probability zero. So we never see 0n ,
but instead we’ll see some random y ≠ 0n .
To summarize: when we measure the qubits, if we see 0n , then we know that f is constant; if
we see anything else, then we know that f is balanced.
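The one-query Deutsch-Jozsa circuit is also easy to simulate using the phase gate If directly. A sketch of my own (the function name `deutsch_jozsa` is hypothetical):

```python
import numpy as np

def deutsch_jozsa(f, n):
    """Return True iff the promised f: {0,1}^n -> {0,1} is constant,
    using the phase gate If: |x> -> (-1)^f(x) |x>."""
    H1 = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)
    Hn = H1
    for _ in range(n - 1):
        Hn = np.kron(Hn, H1)
    state = np.zeros(2 ** n); state[0] = 1         # |0...0>
    state = Hn @ state                             # H on every qubit
    state = state * np.array([(-1.) ** f(x) for x in range(2 ** n)])  # apply If
    state = Hn @ state                             # H on every qubit again
    return bool(np.isclose(state[0] ** 2, 1.0))    # Pr[seeing 0^n] is 0 or 1

n = 4
assert deutsch_jozsa(lambda x: 1, n)               # constant
assert not deutsch_jozsa(lambda x: x & 1, n)       # balanced (last bit of x)
```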
Exercise 13.1 (Challenging) Let f : {0, 1}n → {0, 1} be a Boolean function. We’ve implemented an
If gate using Uf and a few standard gates. Show how to implement Uf given a single If gate and
some standard gates. Thus Uf and If are computationally equivalent. Your circuit is allowed to
depend on the value of f(0n ), that is, you can have one circuit that works assuming f(0n ) = 0 and
another (slightly different) circuit that works assuming f(0n ) = 1. [Hint: Build a quantum circuit
with three registers: n input qubits; one output qubit; n ancilla qubits. Assume x ∈ {0, 1}n is the
classical input. Using one Hadamard and n Toffoli gates, convert the input state |x, y, 0n i into the
superposition
(|x, 0, 0n i + (−1)y |x, 1, xi)/√2.
Then feed the ancilla register into If , then undo what you did before applying If . What state do
you wind up with? What else do you need to do, if anything?]
14 Here’s another way to see that Σx (−1)x·y = 0 for all y ≠ 0n : If y ≠ 0n , then one of y’s bits is 1. For convenience,
let’s assume that the first bit of y is 1, and we let y′ be the rest of y. Then Σx (−1)x·y = Σx1 ∈{0,1} Σx′ ∈{0,1}n−1 (−1)x1 ·1+x′ ·y′ =
Σx1 (−1)x1 Σx′ (−1)x′ ·y′ = Σx′ (−1)x′ ·y′ − Σx′ (−1)x′ ·y′ = 0.
Exercise 13.2 Here are some more circuit equalities for you to verify. Remember that circuits
represent linear operators, and thus to show that two circuits are equal, it suffices to show that
they act the same on the vectors of some basis, e.g., the computational basis.
1. Check that the controlled-S gate equals a circuit built from C-NOT, T , and T ∗ gates (with a T
on the control qubit), and that the controlled-Rz (ϕ) gate equals the circuit that applies Rz (ϕ/2)
to the target, then a C-NOT, then Rz (−ϕ/2), then another C-NOT.
[Hint: If the control qubit on the right-hand side is 1, then XRz (−ϕ/2)XRz (ϕ/2) is applied
to the target qubit. Note that XRz (−ϕ/2)X = XeiϕZ/4 X = eiϕXZX/4 = e−iϕZ/4 = Rz (ϕ/2).
The second equation follows from Exercise 9.3(7).]
2. Check that the Toffoli gate equals a circuit built from C-NOT, T , and T ∗ gates together with
a controlled-S gate, with an H on the third qubit at each end.
Combining this with item 1 above gives a circuit implementing the Toffoli gate using only
C-NOT, H, T , and T ∗ gates (and we could do without T ∗ explicitly by using T 7 instead,
because T 8 = I). The Nielsen & Chuang textbook has a closely similar implementation of the
Toffoli gate on page 182, but it’s not optimal; it has one more gate than is necessary. [Hint: It
will help first to transform this equation into an equivalent one by applying H gates on the
third qubit to both sides of both circuits, i.e., unitarily conjugating both sides of the equation
by I ⊗ I ⊗ H. This has the effect of canceling out both the H gates on the right-hand circuit,
and the left-hand side becomes
the doubly controlled Z gate (a Z on the third qubit, controlled by the first two qubits),
which flips the overall sign of the state (i.e., gives an eiπ = −1 phase change) iff all three
qubits are 1. The advantage of doing this is that now nothing in the right-hand circuit creates
any superpositions; each gate maps a computational basis state to a computational basis
state, up to a phase factor. Now proceed by cases, considering the possible 0, 1-combinations
of the values of the three qubits, adding up the overall phase angles generated. You can
simplify the task further by noticing a few general facts:
• A 0 on the control qubit of a C-NOT gate eliminates the gate.
• Adjacent T and T ∗ gates on the same qubit cancel.
• Adjacent C-NOT gates with the same control and target qubits cancel.]
Exercise 13.3 (Challenging) This exercise is a puzzler that is best solved by finding the right series
of rotations of the Bloch sphere. Find a single-qubit unitary U such that
[Diagram: the controlled-H gate equals the circuit that applies U∗ to the target, then a C-NOT, then U.]
Furthermore, you are restricted to expressing U as the product of a sequence of operators, all of
which are either H or T . [Hint: You are trying to find a U such that UXU∗ = H. X gives a π-rotation
of the Bloch sphere about the x-axis, and H gives a π-rotation about the line ` through the point in
the x, z-plane halfway between the +x- and +z-axes, with spherical coordinates (π/4, 0) (Cartesian
coordinates (1/√2, 0, 1/√2)). So U must necessarily give a rotation that moves the x-axis to `, so
that U∗ (applied first) moves ` to the x-axis, then X (applied second) rotates π around the x-axis,
then U (applied last) moves the x-axis back to `, the net effect of all three being a π-rotation about
`. One possibility for U is a (−π/4)-rotation about the y-axis, but you must implement this using
just H and T , the latter giving a π/4-rotation about the z-axis.]
Simon’s Problem. The Deutsch-Jozsa problem is hard to decide classically, requiring exponen-
tially many (in n) queries to f. But there is a sense in which this problem is easy classically: if
we pick inputs to f at random and query f on those inputs, we quickly learn the right answer
with high probability. If we ever see f output different values, then we know for certain that f is
balanced, since it is nonconstant. On the other hand, if f is balanced and we make 100 random queries
to f, then the chances that f gives the same answer to all our queries are exceedingly small—2−99 .
So we have an efficient randomized algorithm for finding the answer: Make m uniformly and
independently random queries to f, where m is, say, 100. If the answers are all the same, output
“constant”; otherwise, output “balanced.” We will never output “balanced” incorrectly. We might
output “constant” incorrectly, but only with probability 21−m , i.e., exponentially small in m. This
algorithm runs in time polynomial in n and m.
As with classical computation, quantum circuits can simulate classical randomized compu-
tation. We won’t pursue that line further here, though. Instead, we’ll now see a black-box
problem—Simon’s problem—that
• can be solved efficiently with high probability on a quantum computer, but
• cannot be solved efficiently by a classical computer, even by a randomized algorithm that is
allowed a probability of error slightly below 1/2.
In Simon’s problem, we are given a black-box Boolean function f : {0, 1}n → {0, 1}m , for some
n ≤ m. We are also given the promise that there is an s ∈ {0, 1}n such that for all distinct
x, y ∈ {0, 1}n ,
f(x) = f(y) ⇐⇒ x ⊕ y = s.
This condition determines s uniquely: either s = 0n and f is one-to-one, or s ≠ 0n in which case f
is two-to-one with f(x) = f(x ⊕ s) for all x, and s is the unique nonzero input such that f(s) = f(0).
Our task is to find s.
The function f is given to us via the gate Uf as before, such that Uf |x, yi = |x, y ⊕ f(x)i for
all x ∈ {0, 1}n and y ∈ {0, 1}m . Consider the following quantum algorithm with two quantum
registers—an n-qubit input register and an m-qubit output register.
1. Initialize the two registers to the basis state |0n , 0m i.
2. Apply H⊗n to the first register.
3. Apply Uf to the two registers.
4. Apply H⊗n to the first register again, giving the state |ψout i = 2−n Σx,y (−1)x·y |y, f(x)i.
5. We now measure the first register (all n qubits), obtaining some value y ∈ {0, 1}n .
Exercise 14.1 Draw the quantum circuit implementing the algorithm above.
What y do we get in the last step? Note that f(x) = f(x ⊕ s) for all x, and that as x ranges
through all of {0, 1}n , so does x ⊕ s. Thus we can rewrite |ψout i as a split sum and combine terms
in pairs:
|ψout i = (|ψout i + |ψout i)/2
= 2−n−1 (Σx,y (−1)x·y |y, f(x)i + Σx,y (−1)(x⊕s)·y |y, f(x ⊕ s)i)
= 2−n−1 (Σx,y (−1)x·y |y, f(x)i + Σx,y (−1)(x⊕s)·y |y, f(x)i)
= 2−n−1 Σx,y [(−1)x·y + (−1)x·y+s·y ] |y, f(x)i
= 2−n−1 Σx,y (−1)x·y [1 + (−1)s·y ] |y, f(x)i
= 2−n Σx,y : s·y is even (−1)x·y |y, f(x)i.
The basis states |y, f(x)i for which s · y is odd cancel out, and we are left with a superposition of
only states where s · y is even, with probability amplitudes differing only by a phase factor. So in
Step 5 we will see an arbitrary such y ∈ {0, 1}n , uniformly at random. If s = 0n , then s · y is even
for all y, so each y ∈ {0, 1}n will be seen with probability 2−n . If s ≠ 0n , then s · y is even for exactly
half of the y ∈ {0, 1}n , each of which will be seen with probability 21−n .
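One round of the algorithm can be simulated directly; the sketch below (my own, with n = m = 3 and an arbitrary two-to-one f satisfying the promise) checks that every measured y satisfies s · y even:

```python
import numpy as np

def simon_round(f, n, m, rng):
    """One execution of the circuit; returns the measured y."""
    H1 = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)
    Hn = H1
    for _ in range(n - 1):
        Hn = np.kron(Hn, H1)
    state = np.zeros((2 ** n, 2 ** m))  # amplitudes, indexed (input, output)
    state[0, 0] = 1.0                   # |0^n, 0^m>
    state = Hn @ state                  # H on each qubit of the first register
    after = np.zeros_like(state)
    for x in range(2 ** n):             # Uf: |x, y> -> |x, y XOR f(x)>
        for y in range(2 ** m):
            after[x, y ^ f(x)] += state[x, y]
    state = Hn @ after                  # H on the first register again
    probs = (state ** 2).sum(axis=1)    # distribution of the measured y
    return rng.choice(2 ** n, p=probs)

n = m = 3
s = 0b101
f = lambda x: min(x, x ^ s)             # a two-to-one f with f(x) = f(x XOR s)

rng = np.random.default_rng(0)
for _ in range(20):
    y = simon_round(f, n, m, rng)
    assert bin(s & y).count("1") % 2 == 0   # s . y is always even
```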
How does this help us find s? If s ≠ 0n and we get some y in Step 5, then we know that
s · y is even, which eliminates half the possibilities for s. Repeating the algorithm will give us
some y 0 independent of y such that s · y 0 is even. This added constraint will most likely cut our
search space in half again. After repeated executions of the algorithm, we will get a series of
random constraints like this. After a modest number of repetitions, the constraints taken together
will uniquely determine s with high probability. To show this, we need a brief linear algebraic
digression, which will also help us when we discuss binary codes later.
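To preview that digression: recovering s from the collected constraints is just Gaussian elimination over Z2 . A sketch of my own (the y’s and s are n-bit integers, so ⊕ is ^ and the dot product is a popcount parity):

```python
def recover_s(ys, n):
    """Given constraints s . y = 0 (mod 2), return the unique nonzero s,
    or None if the constraints don't yet have rank n - 1."""
    pivots = {}                        # leading-bit position -> reduced row
    for y in ys:
        while y:
            lead = y.bit_length() - 1
            if lead in pivots:
                y ^= pivots[lead]      # eliminate the leading bit
            else:
                pivots[lead] = y
                break
    if len(pivots) != n - 1:
        return None
    free = next(j for j in range(n) if j not in pivots)
    s = 1 << free                      # set the one free coordinate to 1
    for lead in sorted(pivots):        # back-substitute, low bits first
        rest = pivots[lead] ^ (1 << lead)
        if bin(s & rest).count("1") % 2 == 1:
            s |= 1 << lead             # choose this bit of s to make s . y even
    return s

# With s = 0b101 (n = 3), the y's with s . y even are {000, 010, 101, 111}:
assert recover_s([0b010, 0b111, 0b101], 3) == 0b101
```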
Linear Algebra over Z2 . Until now, we’ve been dealing with vectors and operators with scalars
in C (and occasionally R). These are not the only two possible scalar domains (known in algebra as
fields) over which to do linear algebra. Another is the two-element field Z2 := {0, 1}, with addition
and multiplication defined thus:
+ | 0 1        × | 0 1
0 | 0 1        0 | 0 0
1 | 1 0        1 | 0 1
Addition and multiplication are the same as in Z, except that 1+1 = 0. Addition is also the same as
the XOR operator ⊕. The additive identity is 0 and the multiplicative identity is 1. Since x + x = 0
in Z2 for any x, the negation −x (additive inverse) of x is x itself. Thus subtraction is the same as
addition. Finally, note that for all x1 , . . . , xn ∈ Z2 , x1 + · · · + xn = 0 (in Z2 ) if and only if x1 + · · · + xn
(in Z) is even.
Column vectors, row vectors, and matrices over Z2 are defined just as over C, except that all
the entries are in Z2 and all scalar arithmetic is done in Z2 . We call these objects bit vectors and bit
matrices. We can identify binary strings in {0, 1}^n with bit vectors in Z2^n.
Most of the basic concepts of linear algebra can be extended to Z2 (indeed, any field). Matrix
addition and multiplication, trace and determinant of square matrices, and square matrix inversion
are defined completely analogously to the case of C. Same with vector spaces, subspaces, and
linear operators. All the basic results of linear algebra carry over to Z2 . For example,
• For any n, tr is a linear operator from the space of n × n matrices to Z2 , and tr(AB) = tr(BA)
for any conformant bit matrices A and B such that AB is square.
• det(AB) = (det A)(det B) for any square A and B, and A is invertible iff det A ≠ 0.
• charA (λ) = det(A − λI) as before. Its roots are the eigenvalues of A.
• Linear combination, linear (in)dependence, span, and the concept of a basis are the same as
before. Every bit vector space has a basis, and any two bases of the same space have the
same cardinality (the dimension of the space).
• The adjoint A∗ is defined as the transpose conjugate as before, but in Z2 we define 0∗ = 0
and 1∗ = 1, and so the adjoint is the same as the transpose in this case.
• The scalar product of two (column) bit vectors x and y is x∗ y = x · y, but here the result is in
Z2 , where 0 represents an even number of 1s in the sum and 1 represents an odd number of
1s. In all of our uses of the dot product of bit vectors, we’ve only cared about whether the
value was even or odd, so we’re not losing any utility here.
• Orthogonality can be defined in terms of the dot product as before, as well as mutually
orthogonal subspaces and the orthogonal complement V ⊥ of a subspace V of some bit
vector space A. If A has dimension n and V ⊆ A is a subspace of dimension k, then V ⊥ has
dimension n − k as before, and (V ⊥ )⊥ = V as before.
Not everything works the same over Z2 as over C. Here are some differences:
• An n-dimensional vector space over Z2 is finite, with exactly 2^n elements, one for each
possible linear combination of the basis vectors.
• There is no notion of “positive definite.” We can have x · x = 0 but x ≠ 0 (i.e., x has a positive
but even number of 1s). The norm of a vector cannot be defined in the same way as with C;
however, a useful norm-like quantity associated with each bit vector x is the number of 1s in
x, known as the Hamming weight of x and denoted wt(x).
• The concept of unit vector and orthonormal basis don’t work over Z2 like they do over C,
and there is no Gram-Schmidt procedure.
• Mutually orthogonal subspaces may have nonzero vectors in their intersection. Indeed, it
may be the case that V ⊆ V ⊥ for nontrivial V.
• Z2 is not algebraically closed. This means, for example, that a square matrix may not have
any eigenvectors or eigenvalues.
Exercise 14.3 Find the two 2 × 2 matrices over Z2 that have no eigenvalues or eigenvectors.
(Challenging) Prove that there are only two.
Let A be an m × n matrix (over any field F). The rank of A, denoted rank A, is the maximum
number of linearly independent columns of A (or rows—it does not matter). Equivalently, it is
the dimension of the span of the columns of A (or rows—it does not matter). An m × n matrix
A has full rank if rank A = min(m, n), and this is the highest rank an m × n matrix can have. A
square matrix is invertible if and only if it has full rank. The kernel of A, denoted ker A, is the set
of column vectors v ∈ F^n such that Av = 0. The kernel of A is a subspace of F^n. Its dimension is
known as the nullity of A. A standard theorem in linear algebra is that the sum of the rank and the
nullity of A is equal to the number of columns of A, i.e., n. The rank of any given bit matrix A is
easy to compute; you can use Gaussian elimination, for example. If the nullity of A is positive, it
is also easy to find a nonzero bit vector v such that Av = 0 (the right-hand side is the zero vector
(a bit vector)).
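Gaussian elimination over Z2 is particularly pleasant to code, since adding one row to another is a single XOR. Here is a sketch (my own helper, with rows stored as Python ints used as bit masks) that computes the rank of a bit matrix; the nullity then follows from rank + nullity = n:

```python
def rank_z2(rows, n):
    """Rank over Z2 of a bit matrix; rows are ints, n is the number of columns."""
    rows = list(rows)
    r = 0
    for col in reversed(range(n)):              # eliminate one column at a time
        pivot = next((i for i in range(r, len(rows))
                      if rows[i] >> col & 1), None)
        if pivot is None:
            continue                            # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i] >> col & 1:
                rows[i] ^= rows[r]              # row addition in Z2 is XOR
        r += 1
    return r

A = [0b110, 0b011, 0b101]    # third row is the Z2 sum of the first two
print(rank_z2(A, 3))         # 2, so the nullity is 3 - 2 = 1
```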
Back to Simon’s Problem. If we run the quantum algorithm above k times for some k ≥ n, we
get k independent, uniformly random vectors y1, . . . , yk ∈ Z2^n such that the following k linear
equations hold:
y1 · s = 0,  . . . ,  yk · s = 0.
Let A be the k × n bit matrix whose rows are the yi. Then the above can be expressed as the single
equation As = 0, where 0 denotes the zero vector in Z2^k. Thus, s ∈ ker A.
The whole solution to Simon’s problem is as follows: Run the algorithm above n times,
obtaining y1, . . . , yn ∈ Z2^n. Let A be the n × n bit matrix whose rows are the yi.
Several things need explaining here. For one thing, the algorithm may fail to find s, outputting
“I don’t know.” We’ll see that this is reasonably unlikely to happen. For another thing, if we find
that A is invertible in Step 2, then we know that s = A^{−1}0 = 0, so our output is correct. Finally, in
Step 3 we know that an s exists and is unique: the nullity of A is n − rank A = n − (n − 1) = 1,
so ker A is a one-dimensional space, which thus has 2^1 = 2 elements, one of which is the zero
vector. The final check is to determine which of these is the correct output. So if the algorithm does
output an answer, that answer is always correct. Such a randomized algorithm (with low failure
probability) is called a Las Vegas algorithm, as opposed to a Monte Carlo algorithm which is allowed
to give a wrong answer with low probability.
What are the chances of the algorithm failing? If the algorithm fails, then rank A < n − 1, which
certainly implies that the matrix formed from the first n − 1 rows of A has rank less than n − 1. So
if we bound the latter probability, we bound the probability of failure. For 1 ≤ k ≤ n, let Ak be
the bit matrix formed from the first k rows of A. Each row of A is a uniformly random bit vector
in the space S = {0, s}⊥, which has dimension n − 1 (if s ≠ 0) or n (if s = 0). Thus S has at least
2^{n−1} vectors. Consider the probability that rank An−1 = n − 1, i.e., that An−1 has full rank. This
is true iff all rows of An−1 are linearly independent, or equivalently, iff the Ak have full rank for
all 1 ≤ k ≤ n − 1. We can express this probability as a product of conditional probabilities:
Pr[rank An−1 = n − 1] = Pr[rank A1 = 1] · ∏_{k=2}^{n−1} Pr[rank Ak = k | rank Ak−1 = k − 1].
Clearly, rank A1 = 1 iff its row is a nonzero bit vector in S, and so
Pr[rank A1 = 1] = (|S| − 1)/|S| ≥ (2^{n−1} − 1)/2^{n−1} = 1 − 2^{1−n}.
Now what is Pr[rank Ak = k | rank Ak−1 = k − 1] for k > 1? If rank Ak−1 = k − 1, then the rows
of Ak−1 are linearly independent, and thus span a (k − 1)-dimensional subspace D ⊆ S that has
2^{k−1} elements. Assuming this, Ak will have full rank iff its last row is linearly independent of the
other rows, i.e., the last row is an element of S − D. Thus,
Pr[rank An−1 = n − 1] ≥ ∏_{k=1}^{n−1} (1 − 2^{k−n}) = ∏_{k=1}^{n−1} (1 − 2^{−k}) = pn−1 ,
where we define
pm := ∏_{k=1}^{m} (1 − 2^{−k})                (62)
for all m ≥ 0. Clearly, 1 = p0 > p1 > · · · > pn > · · · > 0, and it can be shown that if
p := lim_{m→∞} pm, then 1/4 < p < 1/3. Thus the chances are better than 1/4 that An−1 will have
full rank, and so the algorithm will fail with probability less than 3/4. This seems high, but if we
repeat the whole process r times independently, then the chances that we will fail on all r trials is
less than (3/4)^r, which goes to zero exponentially in r. The expected number of trials necessary to
succeed at least once is thus at most ∑_{r=1}^{∞} (r/4)(3/4)^{r−1} = 4.
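The bound 1/4 < p < 1/3 on the limit of the products in (62) is easy to check numerically; a quick sketch (the function name is mine):

```python
def p(m):
    """The partial product p_m = (1 - 1/2)(1 - 1/4)...(1 - 2**-m) of (62)."""
    prod = 1.0
    for k in range(1, m + 1):
        prod *= 1.0 - 2.0 ** -k
    return prod

print(p(5))                    # the partial products decrease ...
print(p(60))                   # ... toward roughly 0.28879
print(0.25 < p(60) < 1 / 3)    # True
```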
Shor’s Algorithm for Factoring. In the early 1990s, Peter Shor showed how to factor an integer
N on a quantum computer in time polynomial in lg N (which is roughly the number of bits needed
to represent N in binary). All known classical algorithms for factoring run exponentially slower
than this (with a somewhat liberal definition of “exponentially slower”). Although it has not
been shown that no fast classical factorization algorithm exists, it is widely believed that this is
the case (and RSA security depends on this being the case). Shor’s algorithm is the single most
important quantum algorithm to date, because of its implications for public key cryptography.
Using similar techniques, Shor also gave quantum algorithms for quickly solving the discrete
logarithm problem, which also has cryptographic (actually cryptanalytical) implications. To do
Shor’s algorithm correctly, we need a couple more mathematical detours.
Modular Arithmetic. If a and m are integers and m > 0, then we can divide a by m and get two
integer results—quotient and remainder. Put another way, there are unique integers q, r such that
0 ≤ r < m and a = qm + r. We let a mod m denote the number r. For any integer m > 1, we let
Zm = {0, 1, . . . , m − 1} = {a mod m : a ∈ Z}, and we define addition and multiplication in Zm just
as in Z except that we take the result mod m. Our previous discussion about Z2 is a special case of
this. Arithmetic in Zm resembles arithmetic in Z in several ways:
• Both operations are associative and commutative.
• A unique additive inverse (negation) −x ∈ Zm exists for each element x ∈ Zm , such that
x + (−x) = 0. In fact, −0 = 0, and −x = m − x if x ≠ 0. Clearly, −(−x) = x, and (−x)y = −xy
in Zm . Subtraction is defined as addition of the negation as usual: x − y = x + (−y).
• A multiplicative inverse (reciprocal) may or may not exist for any given element x ∈ Zm
(that is, a b ∈ Zm such that xb = 1 in Zm). If it does, it is unique and written x^{−1} or 1/x,
and we say that x is invertible or a unit. If x is a unit, then so is x^{−1}, and (x^{−1})^{−1} = x. 0 is
never a unit, but 1 and −1 are always units. Division can be defined as multiplication by the
reciprocal as usual, provided the denominator is a unit: x/y = x(1/y), provided 1/y exists.
We let Z∗m be the set of all units in Zm . Z has only two units—1 and −1—but Zm may have many
units other than ±1. The units of Zm are exactly those elements x that are relatively prime to m
(i.e., gcd(x, m) = 1). If m is prime, then all nonzero elements of Zm are units. In any case, Z∗m
contains 1 and is closed under multiplication and reciprocals, but not necessarily under addition.
Exercise 14.4 What is Z∗30 ? Pair the elements of Z∗30 with their multiplicative inverses.
For any x ∈ Z∗m we define the order of x in Z∗m to be the least r > 0 such that x^r = 1. Such an
r must exist: The elements of the sequence 1, x, x^2, x^3, . . . are all in Zm, which is finite, so by the
Pigeonhole Principle there must exist some 0 ≤ s < t such that x^s = x^t. Multiplying both sides
by x^{−s}, we get 1 = x^{−s}x^s = x^{−s}x^t = x^{t−s}, and incidentally, t − s > 0.
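The order can of course be found by brute force (a sketch, with my own function name); the point of Shor's algorithm, coming up, is that this naive loop takes time exponential in lg m in the worst case:

```python
from math import gcd

def order(x, m):
    """Least r > 0 with x**r = 1 in Z_m; requires gcd(x, m) == 1."""
    assert gcd(x, m) == 1
    r, y = 1, x % m
    while y != 1:          # keep multiplying by x until we return to 1
        y = y * x % m
        r += 1
    return r

print(order(7, 30))   # powers of 7 mod 30 are 7, 19, 13, 1, so the order is 4
```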
Factoring Reduces to Order Finding. Shor’s algorithm does not factor N directly. Instead it
solves the problem of finding the order of an element x ∈ Z∗N. This is enough, as we will now see.
Let N be a large composite integer, and let x be an element of Z∗N . Suppose that you had at
your disposal a black box into which you could feed x and N, and the box would promptly output
the order of x in Z∗N. Then you could use this box to find a nontrivial factor of N quickly and with
high probability via the following (classical!) Las Vegas algorithm:
3. If N = a^b for some integers a, b ≥ 2, then output a and quit. (To see that this can be done
quickly, note that if a, b ≥ 2 and a^b = N, then 2^b ≤ a^b = N and so 2 ≤ b ≤ lg N. For each b,
you can try finding an integer a such that a^b = N by binary search.)
4. (At this point, N is odd and not a power. This means that N has at least two distinct odd
prime factors, in particular, there are odd, coprime p, q > 1 such that N = pq.) Pick a random
x ∈ ZN .
5. Compute gcd(x, N) with the Euclidean Algorithm. If gcd(x, N) > 1, then output gcd(x, N)
and quit.
6. (At this point, x ∈ Z∗N .) Use the order-finding black box to find the order r of x in Z∗N .
Shor’s quantum algorithm provides the order-finding black box for this reduction.
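The perfect-power test of Step 3 can be sketched directly from its description: for each candidate exponent b up to lg N, binary-search for an integer base a (the function name is mine):

```python
def perfect_power(N):
    """Return (a, b) with a**b == N and a, b >= 2, or None if no such pair exists."""
    b = 2
    while 2 ** b <= N:               # only exponents b <= lg N can work
        lo, hi = 2, N
        while lo <= hi:              # binary search for the base a
            a = (lo + hi) // 2
            v = a ** b
            if v == N:
                return (a, b)
            if v < N:
                lo = a + 1
            else:
                hi = a - 1
        b += 1
    return None

print(perfect_power(3125))   # (5, 5), since 5**5 == 3125
print(perfect_power(15))     # None: 15 is composite but not a perfect power
```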
15 Week 8: Factoring and order finding (cont.)
This algorithm (really a randomized reduction of Factoring to Order Finding) is clearly efficient
(polynomial time in lg N), given black-box access to Order Finding. We need to check two things:
(i) the algorithm, if it does not give up, outputs a nontrivial factor of N, and (ii) the probability of
it giving up is not too big—at most 1 − ε for some constant ε, say.
Notation 15.1 For a, b ∈ Z, we let a | b mean that a divides b, or that b is a multiple of a; precisely,
there is a c ∈ Z such that ac = b. Clearly, if a > 0, then a | b iff b = 0 in Za. We write a ∤ b to
mean that a does not divide b.
Anything the algorithm outputs in Steps 2, 3, or 5 is clearly correct. The only other output
step is Step 9. We claim that gcd(y − 1, N) is a nontrivial factor of N: We have y ≠ −1 in ZN by
assumption, or equivalently, N ∤ y + 1. Also, y ≠ 1 in ZN, since otherwise x^{r/2} = 1 in ZN, which
contradicts the fact that r is the least such exponent. Thus N ∤ y − 1. Yet we have y^2 = x^r = 1 in
ZN, which means that N | y^2 − 1 = (y + 1)(y − 1). So N divides (y + 1)(y − 1) but neither of its
two factors. The only way this can happen is when y + 1 includes some, but not all, of the prime
factors of N, and likewise with y − 1. Thus 1 < gcd(y − 1, N) < N, and so we output a nontrivial
factor of N in Step 9.
The algorithm could give up in Steps 7 or 8. Giving up in Step 7 means that r is odd. We show
that at most half the elements of Z∗N have odd order, and so the algorithm gives up in Step 7 with
probability at most 1/2. In fact, we show that if x ∈ Z∗N has odd order r, then −x in ZN (which is
also in Z∗N ) has order 2r. So at least one element of each pair ±x has even order, and so we’re done
since Z∗N is made up of such disjoint pairs. First, we have

(−x)^{2r} = ((−x)^2)^r = (x^2)^r = (x^r)^2 = 1^2 = 1,

where all arithmetic is in ZN. So −x has order at most 2r. Now suppose that −x has order
s < 2r. Then 1 = (−x)^s = (−1)^s x^s. We must have s ≠ r, for otherwise this would become
1 = (−1)^r x^r = (−1)^r = −1, since r is odd (and since N > 2, we have 1 ≠ −1 in ZN). Now since
0 < s < 2r but s ≠ r, we must have x^s ≠ 1, and because (−1)^s x^s = 1, we cannot have (−1)^s = 1.
Thus (−1)^s = x^s = −1. But now,
• d(x +N y) = (x1 +p y1 , x2 +q y2 ).
• d(x ·N y) = (x1 ·p y1 , x2 ·q y2 ).
• d(1) = (1, 1).
• d(−1) = (−1, −1). More generally, d(−x) = (−x1 , −x2 ).
• x ∈ Z∗N if and only if x1 ∈ Z∗p and x2 ∈ Z∗q .
It turns out that d is a bijection from ZN to Zp × Zq . This is a consequence of the following classic
theorem in number theory:
Theorem 15.2 (Chinese Remainder Theorem (dyadic version)) Let p, q > 0 be coprime and let N =
pq. Define d : ZN → Zp × Zq by d(x) = (x mod p, x mod q). Then d is a bijection, i.e., for every
x1 ∈ Zp and x2 ∈ Zq , there exists a unique x ∈ ZN such that d(x) = (x1 , x2 ).
I’ll include the proof here for you to read on your own if you want, but I won’t present it in
class.
Proof. Set p̃ = p mod q and q̃ = q mod p. Since gcd(p, q) = 1, we also have gcd(p̃, q) =
gcd(p, q̃) = 1, and hence p̃ ∈ Z∗q and q̃ ∈ Z∗p. Let p̃^{−1} and q̃^{−1} be the reciprocals of p̃ in Z∗q and of
q̃ in Z∗p, respectively. Given any x1 ∈ Zp and x2 ∈ Zq, let x = (x1 q̃^{−1} q + x2 p̃^{−1} p) mod N (normal
arithmetic in Z). Clearly, x ∈ ZN. Then letting d(x) = (y1, y2), we get
y1 = [(x1 q̃^{−1} q + x2 p̃^{−1} p) mod N] mod p
   = (x1 q̃^{−1} q + x2 p̃^{−1} p) mod p
   = x1 q̃^{−1} q mod p
   = x1 q̃^{−1} q̃ mod p
   = x1 mod p
   = x1 ,
and similarly,
y2 = [(x1 q̃^{−1} q + x2 p̃^{−1} p) mod N] mod q
   = (x1 q̃^{−1} q + x2 p̃^{−1} p) mod q
   = x2 p̃^{−1} p mod q
   = x2 p̃^{−1} p̃ mod q
   = x2 mod q
   = x2 .
Thus d(x) = (x1 , x2 ), which proves that d is surjective. To see that d is injective, let x, y ∈ ZN be
such that d(x) = d(y) = (x1 , x2 ). Then d(x −N y) = (x1 −p x1 , x2 −q x2 ) = (0, 0), and so we have
(x − y) mod p = (x − y) mod q = 0, or equivalently, p | x − y and q | x − y. But since p and q are
coprime, we must have N | x − y, and so,
x = x mod N = y mod N = y,
which shows that d is an injection. 2
We won’t discuss it here, but given x1 , x2 , one can quickly (and classically) compute inverses in
Z∗n , and thus find the unique x such that d(x) = (x1 , x2 ), using the Extended Euclidean Algorithm.
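The reconstruction can be sketched directly from the formula in the proof, computing the needed reciprocals with the extended Euclidean algorithm (the function names are mine):

```python
def ext_gcd(a, b):
    """Return (g, u, v) with u*a + v*b == g == gcd(a, b)."""
    if b == 0:
        return (a, 1, 0)
    g, u, v = ext_gcd(b, a % b)
    return (g, v, u - (a // b) * v)

def crt(x1, p, x2, q):
    """The unique x in Z_{pq} with x = x1 mod p and x = x2 mod q (p, q coprime)."""
    g, u, v = ext_gcd(p, q)
    assert g == 1
    # u*p == 1 (mod q) and v*q == 1 (mod p), so the two terms below
    # play the roles of x2*p~^{-1}*p and x1*q~^{-1}*q in the proof
    return (x1 * v * q + x2 * u * p) % (p * q)

x = crt(2, 5, 3, 7)
print(x, x % 5, x % 7)   # 17 2 3
```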
Exercise 15.3 In this exercise, you will prove some standard results about the cardinality of Z∗n for
any integer n > 1. For any such n, the Euler totient function is defined as
ϕ(n) := |Z∗n |,
which is the number of elements of Zn that are relatively prime to n. By convention, ϕ(1) := 1.
1. Show that if a, b > 0 are coprime, then ϕ(ab) = ϕ(a)ϕ(b). [Hint: Show that the bijection
d defined in Theorem 15.2 above (with p = a and q = b) matches elements of Z∗ab with
elements of Z∗a × Z∗b and vice versa.]
2. Show that if n is some power of a prime p, then ϕ(n) = n(p − 1)/p. [Hint: An element
x ∈ Zn is relatively prime to n iff x is not a multiple of p.]
3. Conclude that if n = q1^{e1} q2^{e2} · · · qk^{ek} is the prime factorization of n, where q1 < q2 < · · · < qk
are all prime and e1, e2, . . . , ek > 0, then

ϕ(n) = ∏_{j=1}^{k} qj^{ej−1}(qj − 1).
Exercise 15.4 (Challenging) Show that ϕ(n)/n > 1/ lg n for all integers n > 1 except 2 and 6.
[Hint: For ` > 0, let n` be the product of the first ` primes. Using the inequality (64) above, show
that, for any ` > 0, if ϕ(n` )/n` > 1/ lg n` , then ϕ(n)/n > 1/ lg n for all n > n` . Then find an ` for
which the hypothesis is true.]
Back to the issue at hand. When y = x^{r/2} is computed in Step 8, we have y^2 = x^r = 1, and so y
is one of the square roots of 1 in ZN. Both 1 and −1 are square roots of 1 in ZN for any N, but in this
case (N = pq as above) there are at least two others. Whereas d(1) = (1, 1) and d(−1) = (−1, −1),
by the Chinese Remainder Theorem, there is an x ∈ ZN such that d(x) = (1, −1). By the bijective
nature of d, we have x ≠ ±1, and so x and −x are two additional square roots of 1 besides ±1.
There could be still others. We won’t prove it, but if x is chosen uniformly at random among those
elements of Z∗N with even order, then x^{r/2} is at least as likely to be one of the other square roots of
1 as ±1, where r is the order of x. Thus Step 8 gives up with probability at most 1/2.
So the whole reduction succeeds in outputting a nontrivial factor of N with probability at least
1/4. As with Simon’s algorithm, we can expect to run this reduction about four times to find such
a factor. Running it additional times decreases the likelihood of failure exponentially.
Geometric series. This elementary fact will be useful in what is to come.
Proposition 15.5 For any r ∈ C such that r ≠ 1, and for any integer n ≥ 0,

∑_{i=0}^{n−1} r^i = (r^n − 1)/(r − 1).
You can prove this by induction on n. If n = 0, then both sides are 0. Now assume the equation
holds for fixed n ≥ 0. Then

∑_{i=0}^{n} r^i = r^n + ∑_{i=0}^{n−1} r^i = r^n + (r^n − 1)/(r − 1) = (r^n(r − 1) + r^n − 1)/(r − 1) = (r^{n+1} − 1)/(r − 1).
The sum ∑_{i=0}^{n−1} r^i is called a finite geometric series with ratio r.
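A quick numerical sanity check of Proposition 15.5 for a complex ratio, using a root of unity since that is exactly the case we will need below:

```python
import cmath

r = cmath.exp(2j * cmath.pi / 5)      # a 5th root of unity, so r != 1
n = 5
lhs = sum(r ** i for i in range(n))
rhs = (r ** n - 1) / (r - 1)
print(abs(lhs - rhs) < 1e-12)         # True
print(abs(lhs) < 1e-12)               # True: the 5th roots of unity sum to 0
```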
The Quantum Fourier Transform. The Fourier transform is of fundamental importance in many
areas of science, math, and engineering. For example, it is used in signal processing to pick out
component frequencies in a periodic signal (and we will see how this applies to Shor’s order-
finding algorithm). The auditory canal inside your ear acts as a natural Fourier transformer,
allowing your brain to register different frequencies (of musical notes, say) inherent in the sound
waves entering the ear.
A quantum version of the Fourier transform, known as the quantum Fourier transform or QFT,
is a crucial ingredient in Shor’s algorithm.
Let m ≥ 1 be an integer. We define the m-dimensional discrete Fourier transform15 DFTm
as the linear map C^m → C^m that takes a vector x = (x0, . . . , xm−1) ∈ C^m and maps it to the vector
y = (y0, . . . , ym−1) ∈ C^m satisfying

yj = (1/√m) ∑_{k=0}^{m−1} e^{2πijk/m} xk

for all 0 ≤ j < m.16 Set ωm := e^{2πi/m}. Clearly, m is the least positive integer such that ωm^m = 1.
We call ωm the principal m-th root of unity. Note that ωm^a = ωm^{a mod m} for any a ∈ Z, so we can
consider the exponent of ωm to be an element of Zm.
The matrix corresponding to DFTm is the m × m matrix whose (j, k)th entry is [DFTm]jk =
ωm^{jk}/√m, for all 0 ≤ j, k < m, i.e., for all j, k ∈ Zm. (It will be more convenient for the time being
to start the indexing at zero rather than one.) In fact, DFTm is unitary, and it is worth seeing why
this is so. We check that (DFTm)∗DFTm has diagonal entries 1 and off-diagonal entries 0. For
general j, k, we have

[(DFTm)∗DFTm]jk = (1/m) ∑_{ℓ∈Zm} ωm^{−ℓj} ωm^{ℓk} = (1/m) ∑_{ℓ∈Zm} ωm^{ℓ(k−j)}.        (65)
15 There are continuous versions of the Fourier transform.
16 There is some variation in the definition of DFTm in different sources; for example, there may be a minus sign in the exponent of e, or there may be no factor 1/√m in front. The current definition is the most useful for us.
If j = k, then the right-hand side is (1/m) ∑_{ℓ∈Zm} 1 = 1. Now suppose j ≠ k. Then 0 < |k − j| < m,
and so ωm^d ≠ 1, where d = k − j. To see that the sum on the right-hand side of (65) is 0, notice that
it is a finite geometric series with ratio ωm^d ≠ 1, and so we have

∑_{ℓ∈Zm} (ωm^d)^ℓ = ((ωm^d)^m − 1)/(ωm^d − 1) = ((ωm^m)^d − 1)/(ωm^d − 1) = 0,

because (ωm^m)^d = 1^d = 1.
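The unitarity computation above is easy to confirm numerically; a sketch (helper names mine) that builds DFTm entrywise and checks (DFTm)∗DFTm = I:

```python
import cmath

def dft_matrix(m):
    """The m x m matrix with entries omega_m^{jk} / sqrt(m)."""
    w = cmath.exp(2j * cmath.pi / m)
    return [[w ** (j * k) / m ** 0.5 for k in range(m)] for j in range(m)]

m = 8
F = dft_matrix(m)
# entry (j, k) of (DFT_m)* DFT_m is sum_l conj(F[l][j]) * F[l][k]
gram = [[sum(F[l][j].conjugate() * F[l][k] for l in range(m))
         for k in range(m)] for j in range(m)]
ok = all(abs(gram[j][k] - (1 if j == k else 0)) < 1e-9
         for j in range(m) for k in range(m))
print(ok)   # True: diagonal entries 1, off-diagonal entries 0
```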
Naively applying DFTm to a vector in Cm requires Θ(m2 ) scalar arithmetic operations. A
much faster method, known as the Fast Fourier Transform (FFT), can do this with O(m lg m) scalar
arithmetic operations. The FFT was described by Cooley & Tukey in 1965, but the same idea can
be traced back to Gauss. It uses divide-and-conquer, and is easiest to describe when m is a power
of 2. The FFT is also easily parallelizable: it can be computed by an arithmetic circuit of width
m and depth lg m called a butterfly network. Because of this, the FFT has been rated as the second
most useful algorithm ever, second only to fast sorting. Besides its use in digital signal processing,
it is also used to implement the asymptotically fastest known algorithms, due to Schönhage &
Strassen, for multiplying integers and polynomials.
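The divide-and-conquer idea fits in a few lines. The sketch below (function name mine) is the radix-2 Cooley–Tukey recursion, using the same +2πi sign convention as our DFTm; it is unnormalized, so multiply the result by 1/√m to match the unitary convention used here:

```python
import cmath

def fft(x):
    """Radix-2 Cooley-Tukey FFT; len(x) must be a power of 2."""
    m = len(x)
    if m == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])            # two half-size subproblems
    out = [0j] * m
    for j in range(m // 2):
        t = cmath.exp(2j * cmath.pi * j / m) * odd[j]  # twiddle factor
        out[j] = even[j] + t
        out[j + m // 2] = even[j] - t
    return out

# agrees with the direct O(m^2) definition (up to the missing 1/sqrt(m))
x = [1.0, 2.0, 3.0, 4.0]
direct = [sum(cmath.exp(2j * cmath.pi * j * k / 4) * x[k] for k in range(4))
          for j in range(4)]
print(max(abs(a - b) for a, b in zip(fft(x), direct)) < 1e-9)   # True
```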
It was Shor who first showed that DFT_{2^n} could be implemented by a quantum circuit on n
qubits with size polynomial in n, and his idea is based on the Fast Fourier Transform. From now
on, the dimension will be a power of 2, so I’ll define the n-qubit quantum Fourier transform QFTn
to be DFT_{2^n}. For notational convenience, I’ll also define

en(x) := ω_{2^n}^x = e^{2πix/2^n}.
Before we describe a circuit for QFTn , we will sketch out and analyze Shor’s quantum algorithm
for order-finding, which is a Monte Carlo algorithm. This description and the QFTn circuit layout
later on are adapted with modifications from a paper by Cleve & Watrous in 2000.
1. Input: N > 1 and a ∈ Z∗N with a > 1. (The algorithm attempts to find the order of a in Z∗N.)
Let n = ⌈lg N⌉.
2. Initialize a 2n-qubit register and an n-qubit register in the state |0i|0i. Here we will label the
basis states of a register with nonnegative integers via their usual binary representations.
3. Apply a Hadamard gate to each qubit of the first register, obtaining the state

(H^{⊗2n} ⊗ I)|0⟩|0⟩ = (1/2^n) ∑_{x∈Z_{2^{2n}}} |x⟩|0⟩.
4. Apply a classical quantum circuit for modular exponentiation that sends |x⟩|0⟩ to |x⟩|a^x mod N⟩,
obtaining the state

|ϕ⟩ = (1/2^n) ∑_{x∈Z_{2^{2n}}} |x⟩|a^x mod N⟩.        (66)
(We can imagine that N and a are hard-coded into the circuit, which means that the circuit
must be built in a preprocessing step after the inputs N and a are known. Alternatively, we
can keep N and a in separate quantum registers that don’t change during the course of the
computation, then feed them into this circuit when they’re needed.)
5. (Optional) Measure the second register in the computational basis, obtaining some classical
value w ∈ ZN , which is ignored.17
7. Measure the first register (in the computational basis), obtaining some value y ∈ Z_{2^{2n}}. (This
ends the quantum part of the algorithm.)
(We are just finding a reasonably good rational approximation to y/2^{2n} that has small
denominator r. This can be done classically using continued fractions. See below.)
9. Classically compute a^r mod N. If the result is 1, then output r. Otherwise, give up.
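The modular exponentiation that the circuit in Step 4 computes in superposition is classically polynomial time in lg N via repeated squaring; a sketch (function name mine; Python's built-in three-argument pow does the same thing):

```python
def mod_exp(a, x, N):
    """Compute a**x mod N with O(lg x) multiplications by repeated squaring."""
    result = 1
    a %= N
    while x > 0:
        if x & 1:                    # this bit of x contributes a^(2^i)
            result = result * a % N
        a = a * a % N                # square for the next bit
        x >>= 1
    return result

print(mod_exp(7, 128, 15), pow(7, 128, 15))   # both equal 1
```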
Let R be the order of a in Z∗N . The whole key to proving that Shor’s algorithm works is to show
that in Step 9 the algorithm outputs R with high probability. First, we’ll show that a single run of
the algorithm above outputs R with probability at least 4/(π^2 n) − O(n^{−2}), and so if we run the
17 Since we ignore the result of the measurement, this step is entirely superfluous; the algorithm would do just as well without it. Including this step, however, collapses the state, which simplifies the analysis greatly and allows us to ignore the second register altogether.
algorithm about π^2 n/4 times, we will succeed with high probability. The actual single-run success
probability is usually much higher than 4/(π^2 n), but 4/(π^2 n) is a good enough approximate lower
bound, and it is easier to derive than a tighter lower bound. After the analysis, we’ll discuss how
the quantum Fourier transform and the (classical) continued fraction algorithm used in Step 8 are
implemented.
Shor’s algorithm, if it succeeds, will be guaranteed to output some r > 0 such that a^r = 1 in
ZN. It is possible—although very unlikely—that r is a multiple of R, but not equal to it. If we
run the algorithm until it succeeds k times and take the gcd of the k results, then the chances of
not getting R are at most (1 − 4/(π^2 n))^k, which decrease exponentially with k. If we only want
to find a nontrivial factor of N, then we use this algorithm to implement the black box in the
Factoring-to-Order-Finding reduction. As the next exercise shows, we don’t need to worry about
the value returned by the black box if it succeeds.
Exercise 16.1 Suppose that on input N and x ∈ Z∗N , the black box used in the reduction from
Factoring to Order Finding is only guaranteed to output some r with 0 < r < 2^{2n} such that x^r = 1
in ZN, where n = ⌈lg N⌉. Show how to modify the reduction slightly so that it succeeds with the
same probability as it did before when the box always outputted the order of x in Z∗N . [Hint: Let
R be the order of x in Z∗N . First, given any multiple r of R, show how to find an odd multiple of R
(that is, a number of the form cR where c is odd) that is no bigger than r. Second, show that the
probability of success of the reduction is the same if the black box returns some odd multiple of R.]
Analysis of Shor’s Algorithm. Let R be the order of a in Z∗N. Set Q := 2^{2n} and M := ⌊Q/R⌋. We
can express each x uniquely as qR + s with s ∈ ZR and note that, owing to the periodicity of
a^x mod N, the state |ϕ⟩ of (66) can be rewritten as

|ϕ⟩ = (1/√Q) ∑_{q∈ZM} ∑_{s∈ZR} |qR + s⟩|a^s mod N⟩ + O(2^{−n/2}).
Now when the second register is measured in the next step, we obtain some w = a^s mod N
corresponding to some unique s ∈ ZR. The state after this measurement then collapses to either
(1/√(M + 1)) ∑_{q∈Z_{M+1}} |qR + s⟩ |w⟩,
102
if 0 ≤ s < 2^{2n} mod R, or to

(1/√M) ∑_{q∈ZM} |qR + s⟩ |w⟩,

if 2^{2n} mod R ≤ s < R. It does not really matter which is the case, as the analysis is nearly identical
and the conclusions (particularly Corollary 16.4, below) are the same either way, so for simplicity,
we’ll assume the latter case applies.18 Also, the second register will no longer participate in the
algorithm, so we can ignore it from now on. To summarize, the post-measurement state of the first
register is then given as
|η⟩ = (1/√M) ∑_{q∈ZM} |qR + s⟩.        (68)
The next step of the algorithm applies QFT2n to this state to obtain
|ψ⟩ = QFT2n |η⟩ = (1/√M) ∑_{q∈ZM} QFT2n |qR + s⟩                       (69)
     = (1/√(QM)) ∑_{q∈ZM} ∑_{y∈Z_{2^{2n}}} e2n((qR + s)y) |y⟩          (70)
     = (1/√(QM)) ∑_y [ ∑_q e2n((qR + s)y) ] |y⟩                        (71)
     = (1/√(QM)) ∑_y e2n(sy) [ ∑_q e2n(qRy) ] |y⟩.                     (72)
We’ll show that Pr[y] spikes when y/Q is close to a multiple of 1/R, but first some intuition.
Permit me an acoustical analogy. Think of the column vector |η⟩ as a periodic signal with period R,
i.e., the entries at indices x = qR + s (for integral q) have value 1/√M, and all the other entries are
0. The “frequency” of this signal is then 1/R, and since the Fourier transform is good at picking out
frequencies, we’d expect to see a “spike” in the probability amplitude of the Fourier transformed
state |ψ⟩ of Equation (72) right around the frequencies 1/R, 2/R, 3/R, . . . , with 1/R being the
fundamental component of the signal and the others being overtones (higher harmonics). This is
exactly what happens, and it is the whole point of using the QFT. The larger the signal sample,
the sharper and narrower the spikes will be. We choose a sample of length Q, which is at least N^2,
giving us at least N^2/R > N periods of the function. This turns out to give us sufficiently sharp
spikes to approximate R with high probability.
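The spikes can be seen directly in a toy simulation. The sketch below (toy parameters and names mine; Q is far smaller than the N² the algorithm actually uses) builds the state of (68), applies the discrete Fourier transform by brute force, and lists the R most probable outcomes, which cluster near multiples of Q/R:

```python
import cmath

Q, R, s = 64, 5, 2
M = (Q - s - 1) // R + 1             # number of q with qR + s < Q
amp = [0.0] * Q
for q in range(M):
    amp[q * R + s] = 1 / M ** 0.5    # the state |eta> of (68)
# brute-force DFT_Q with the same +2*pi*i sign convention as the notes
psi = [sum(cmath.exp(2j * cmath.pi * j * k / Q) * amp[k] for k in range(Q))
       / Q ** 0.5 for j in range(Q)]
prob = [abs(a) ** 2 for a in psi]
top = sorted(range(Q), key=lambda y: -prob[y])[:R]
print(sorted(top))   # [0, 13, 26, 38, 51], near multiples of Q/R = 12.8
```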
18 For the former case, just substitute M + 1 for M in the analysis to follow.
We now concentrate on the scalar quantity

∑_{q∈ZM} e2n(qRy).        (73)

That is, s_y is the remainder of Ry divided by Q with least absolute value. We have |s_y| ≤ Q/2, and
in addition, s_y ≡ Ry (mod Q), and thus
Proof. Fix y and suppose that |s_y| ≤ R/2. If s_y = 0, then the claim clearly holds by (75), so assume
s_y ≠ 0. Starting from Equation (75) and using Exercise 2.5, we have

(e2n(M s_y) − 1)/(e2n(s_y) − 1) = (e2n(M s_y/2)[e2n(M s_y/2) − e2n(−M s_y/2)]) / (e2n(s_y/2)[e2n(s_y/2) − e2n(−s_y/2)])
where θ := πs_y/Q. Since we have

|M s_y| ≤ Q|s_y|/R ≤ Q/2,

we know that |θ| ≤ π/(2M) and |Mθ| ≤ π/2. This gives

|sin(Mθ)/sin θ| = sin|Mθ|/sin|θ| ≥ sin|Mθ|/|θ|.

It is readily checked that the function (sin x)/x is decreasing in the interval (0, π/2], so
as desired. 2
Pr[y] ≥ 4M/(Qπ^2) ≥ 4/(Rπ^2) − O(1/Q) = 4/(Rπ^2) − O(2^{−2n}).
So for each individual okay y ∈ ZQ , we get a relatively large (but still exponentially small)
probability of seeing that particular y. We’ll need the additional fact that there are many okay y.
The following claim is obvious, so we’ll give it without proof:
Claim 16.5 For every k ∈ ZR , there exists y ∈ ZQ such that Ry is in the closed interval [Qk − R/2, Qk +
R/2]. Each such y is okay.
These intervals are pairwise disjoint for different k. This means there are at least R many okay
y. By Corollary 16.4, the chances of finding an okay y in Step 7 of the algorithm are then
Pr[y is okay] = ∑_{y okay} Pr[y] ≥ R(4/(Rπ^2) − O(2^{−2n})) ≥ 4/π^2 − O(2^{−n}).
Let y be the value measured in Step 7, and suppose that y is okay. Then there is some least
ky ∈ Z such that

Qky − R/2 ≤ Ry ≤ Qky + R/2.        (76)

(Actually, ky is unique satisfying (76) because the intervals don’t overlap.) Dividing by QR and
rearranging, (76) becomes

|y/Q − ky/R| ≤ 1/(2Q) = 2^{−2n−1},
and so ky /R satisfies Equation (67). Now Step 8 produces the least k and r satisfying (67), so we
have two possible issues to address:
2. The k and r found in Step 8 satisfy k/r = ky /R, but r < R because the fraction ky /R is not in
lowest terms (k/r is always given in lowest terms).
It turns out that the first issue never arises. To see this, first notice that k/r and ky /R must be close
to each other, because they are both close to y/Q:
|ky/R − k/r| = |ky/R − y/Q + y/Q − k/r| ≤ |y/Q − ky/R| + |y/Q − k/r| ≤ 1/Q = 2^{−2n},        (77)
Proof. Let y ∈ ZQ be good. We have k/r = ky /R, since y is okay. But since both fractions are in
lowest terms, we must have k = ky and r = R. 2
Theorem 16.9 The probability that r = R is found in Step 8 is at least 4/(π^2 n) − O(n^{−2}).
Proof. By Claim 16.7, it suffices to show that a good y is found in Step 7 with probability at least
4/(π^2 n) − O(n^{−2}). By Claim 16.8 and Equation (64), there are at least

ϕ(R) ≥ R/(1 + lg R) ≥ R/(1 + lg N) ≥ R/(n + 1)

many good y. By Corollary 16.4, each good y occurs with probability at least about 4/(Rπ^2), and so

Pr[y is good] ≥ 4/(π^2 n) − O(n^{−2}).
2
There are some tricks to (modestly) boost the probability of success of Shor’s algorithm while
keeping the number of repetitions of the whole quantum computation to a minimum. For example,
if an okay y is returned in Step 8 that is not good, it may be that gcd(ky, R) is reasonably small,
in which case R is a small multiple of r. If you can only afford to run the quantum portion of the
computation once, then in Step 9, you could try computing a^r, a^{2r}, a^{3r}, . . . , a^{nr} (all mod N) and
return the least exponent yielding 1, if there is one. If not, you could try relaxing the distance
bound 2^{−2n−1} in (67) to something bigger, in the hope that the y you found, if not okay, is close
to okay (if y is not okay, it is more likely than a random value to be close to one that is). If you can
afford to run the quantum computation twice, obtaining r1 and r2 respectively in Step 8, then
taking the least common multiple lcm(r1, r2) yields R with much higher probability than you can
get by running the quantum computation just once. Using ideas like these, it can be shown that
one can boost the probability of finding R (using one run of the quantum part of the algorithm) to
a positive constant, independent of n.
This concludes the analysis of Shor’s algorithm. The only things left are (i) to show how the
QFT is implemented efficiently with a quantum circuit, and (ii) to describe how Step 8 is implemented
by a classical algorithm. We’ll take these in reverse order.
17 Week 9: Best rational approximations
The Continued Fraction Algorithm. The book illustrates continued fractions as part of the order-
finding algorithm, with Theorem 5.1 on page 229, and Box 5.3 on the next page. We actually don’t
need to talk about continued fractions explicitly. All we need is to find an efficient classical algorithm to implement Step 8, which we'll do directly now.
For any real numbers a < b, there are infinitely many rational numbers in the interval [a, b].
We want to find one with smallest denominator and numerator.
Definition 17.1 Let a, b ∈ R with 0 < a < b. Define d to be the least positive denominator of any
fraction in [a, b]. Now define n ∈ Z to be least such that n/d ∈ [a, b]. We call the fraction n/d the
simplest rational interpolant,19 or SRI, of a and b, and we denote it SRI(a, b).
The fraction k/r found in Step 8 is just SRI((2y − 1)/2^{2n+1}, (2y + 1)/2^{2n+1}).
Here is a simple, efficient, recursive algorithm to find SRI(a, b) for positive rational a < b.
Each step will include a comment explaining why it is correct.
SRI(a, b):
Input: Rational numbers a, b with 0 < a < b, each given in numerator/denominator form, where
both numerator and denominator are in binary.
Base Case: If a 6 1 6 b, then return 1 = 1/1. (Clearly, this is the simplest possible fraction!)
Recursive Case 1: If a > 1, then let q := ⌈a − 1⌉ and return q + SRI(a − q, b − q). (Obviously, shifting the interval [a, b] by an integral amount shifts the SRI by the same amount. Also note that q ≥ a/2—a fact that will be useful later.)
Recursive Case 2: Otherwise b < 1. Compute d′/n′ := SRI(1/b, 1/a) and return the reciprocal n′/d′. (We claim that if d′/n′ = SRI(1/b, 1/a), then n′/d′ = SRI(a, b). Let n/d = SRI(a, b). We show that n′/d′ = n/d. Since n/d ∈ [a, b], we clearly have d/n ∈ [1/b, 1/a], and so n′ ≤ n by minimality of n′. Similarly, since d′/n′ ∈ [1/b, 1/a], we have n′/d′ ∈ [a, b], and so d ≤ d′ by minimality of d. Thus we have n′/d′ ≤ n/d. Suppose n′/d′ < n/d. We have n′/d′ ≤ n′/d ≤ n/d, so n′/d ∈ [a, b] and d/n′ ∈ [1/b, 1/a]. We also have either n′/d′ < n′/d or n′/d < n/d. We can't have the latter, owing to the minimality of n. But we can't have the former, either, for otherwise d′/n′ > d/n′, and this contradicts the minimality of d′. Thus we must have n′/d′ = n/d, and so SRI(a, b) = 1/SRI(1/b, 1/a).)
19. I'm making this term up. I'm sure there must be an official name for it, but I haven't found what it is.
The comments suggest that the SRI algorithm is correct as long as it halts. It does halt, and
quickly, too. Let the original inputs be a = a0 = n0/d0 and b = b0 = N0/D0, given as fractions in
lowest terms (n0 , d0 , N0 , and D0 are all positive integers). Similarly, for 0 < k, let ak = nk /dk and
bk = Nk /Dk be respectively the first and second argument to the kth recursive call to SRI. We
consider the product Pk := nk dk Nk Dk and how it changes with k. If the kth recursive call occurs
in the second case, then the numerators and denominators are simply swapped, so Pk = Pk−1 .
If the kth recursive call occurs in the first case, then dk = dk−1 and Dk = Dk−1, but (letting q := ⌈ak−1 − 1⌉)
nk = nk−1 − q·dk−1 ≤ nk−1/2,
because q ≥ ak−1/2 = nk−1/(2dk−1), and
Nk = Nk−1 − q·Dk−1 < Nk−1.
Thus in this case, Pk < Pk−1/2. The two recursive cases alternate, so Pk
decreases by at least half with every other recursive call. Since Pk > 0, we must hit the base
case after at most 2 lg P0 = 2(lg n0 + lg d0 + lg N0 + lg D0 ) recursive calls. For each k > 0, lg Pk
approximates the size of the input (in bits) up to an additive constant, and this size never increases
from call to call, so the whole algorithm is clearly polynomial time.
Exercise 17.3 (Challenging) Using your favorite programming language, implement the SRI algorithm above. You can decide to accept either exact rational or floating point inputs.
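For concreteness, here is one way the SRI algorithm above might look in Python (a sketch along the lines of Exercise 17.3, using the standard library's exact rational type; the function name is my own):

```python
from fractions import Fraction
from math import ceil

def sri(a: Fraction, b: Fraction) -> Fraction:
    """Simplest rational interpolant SRI(a, b) for rationals 0 < a <= b,
    following the recursive algorithm described above."""
    assert 0 < a <= b
    if a <= 1 <= b:
        return Fraction(1)            # base case: 1/1 is the simplest fraction
    if a > 1:
        q = ceil(a - 1)               # shift the interval left into (0, 1]
        return q + sri(a - q, b - q)  # shifting by q shifts the SRI by q
    # otherwise b < 1: invert the interval and the answer
    return 1 / sri(1 / b, 1 / a)

# e.g., the simplest fraction in [0.27, 0.33] is 2/7
print(sri(Fraction(27, 100), Fraction(33, 100)))  # 2/7
```

Since `Fraction` always reduces to lowest terms, the result comes out in lowest terms automatically, just as Step 8 requires.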
We now turn to implementing the QFT with an efficient quantum circuit. Recall that
QFT_n|x⟩ = (1/2^{n/2}) Σ_{y∈Z_{2^n}} e_n(xy)|y⟩.
It was Peter Shor who first showed how to implement QFTn efficiently with a quantum circuit, in
the same paper as his factoring algorithm. The following recursive description is taken from Cleve
& Watrous (2000). When n = 1, you can easily check that QFT1 = H, i.e., the one-qubit Hadamard
gate. Now suppose that n > 1 and let 1 ≤ m < n be an integer. QFTn can be decomposed into a
circuit using QFTn−m , QFTm , and two other subcircuits, as shown in Figure 8. The Pn,m gate acts
on two numbers—an (n − m)-bit number x ∈ Z_{2^{n−m}} and an m-bit number y ∈ Z_{2^m}—such that
P_{n,m}|x⟩|y⟩ = e_n(xy)|x⟩|y⟩.
[Figure 8: QFT_n decomposed as QFT_{n−m} acting on the top n − m qubits, then P_{n,m} acting on all n qubits, then QFT_m acting on the bottom m qubits, followed by a qubit-permuting gate.]
The decomposition can be applied recursively, expanding QFT_{n−m} and QFT_m, as long as you keep track of which qubit is which and adjust the elementary gates accordingly.
Many recursive decompositions are possible, based on the choice of m at each stage. Shor’s
original circuit for QFTn is obtained by recursively decomposing with m = 1 throughout. A
smaller depth circuit is achieved by a divide-and-conquer approach, letting m be roughly n/2
each time.
Let's check that the decomposition of Figure 8 is correct. Given any n-bit number x ∈ Z_{2^n}, we split its binary representation into its n − m high-order bits x_h ∈ Z_{2^{n−m}} and its m low-order bits x_ℓ ∈ Z_{2^m}. So we have x = x_h·2^m + x_ℓ, and we may write the state |x⟩ as |x_h, x_ℓ⟩ or as |x_h⟩|x_ℓ⟩.
Applying QFTn to |xi gives
QFT_n|x⟩ = (1/2^{n/2}) Σ_{y∈Z_{2^n}} e_n(xy)|y⟩ = (1/2^{n/2}) Σ_y e_n((x_h 2^m + x_ℓ)y)|y⟩. (78)
Expressing each y as y_h 2^{n−m} + y_ℓ for unique y_h ∈ Z_{2^m} and y_ℓ ∈ Z_{2^{n−m}}, (78) becomes
(1/2^{n/2}) Σ_y e_n((x_h 2^m + x_ℓ)(y_h 2^{n−m} + y_ℓ))|y⟩ = (1/2^{n/2}) Σ_y e_{n−m}(x_h y_ℓ) e_m(x_ℓ y_h) e_n(x_ℓ y_ℓ)|y⟩. (79)
(Notice that there is no x_h y_h exponent, since it is multiplied by 2^n.) Now let's see what happens when the right-hand circuit of Figure 8 acts on |x⟩. We have
|x⟩ = |x_h⟩|x_ℓ⟩
  ↦ (QFT_{n−m})  (1/2^{(n−m)/2}) Σ_{y_ℓ∈Z_{2^{n−m}}} e_{n−m}(x_h y_ℓ) |y_ℓ⟩|x_ℓ⟩
  ↦ (P_{n,m})    (1/2^{(n−m)/2}) Σ_{y_ℓ} e_{n−m}(x_h y_ℓ) e_n(y_ℓ x_ℓ) |y_ℓ⟩|x_ℓ⟩
  ↦ (QFT_m)      (1/2^{n/2}) Σ_{y_ℓ} Σ_{y_h∈Z_{2^m}} e_{n−m}(x_h y_ℓ) e_n(y_ℓ x_ℓ) e_m(x_ℓ y_h) |y_ℓ⟩|y_h⟩
  ↦              (1/2^{n/2}) Σ_{y_ℓ} Σ_{y_h} e_{n−m}(x_h y_ℓ) e_n(y_ℓ x_ℓ) e_m(x_ℓ y_h) |y_h⟩|y_ℓ⟩
  =              (1/2^{n/2}) Σ_{y∈Z_{2^n}} e_{n−m}(x_h y_ℓ) e_m(x_ℓ y_h) e_n(x_ℓ y_ℓ) |y⟩,
where we set y := y_h 2^{n−m} + y_ℓ as before. The last arrow represents the action of the qubit-permuting gate. The final state is evidently the same as in (79), so the two circuits are equal.
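The key algebraic fact behind (79)—that the x_h y_h cross term carries no phase and the rest splits as shown—can be verified directly for small n and m (the sizes below are arbitrary choices for the check):

```python
import cmath

def e(k: int, t: int) -> complex:
    """e_k(t) = exp(2*pi*i*t / 2^k), as used throughout these notes."""
    return cmath.exp(2j * cmath.pi * t / 2 ** k)

n, m = 5, 2  # arbitrary small sizes
for x in range(2 ** n):
    xh, xl = divmod(x, 2 ** m)            # x = xh*2^m + xl
    for y in range(2 ** n):
        yh, yl = divmod(y, 2 ** (n - m))  # y = yh*2^(n-m) + yl
        lhs = e(n, x * y)
        rhs = e(n - m, xh * yl) * e(m, xl * yh) * e(n, xl * yl)
        assert abs(lhs - rhs) < 1e-9      # the e_n(xh*yh*2^n) factor is 1
print("split identity verified")
```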
Finally, we get to implementing the P_{n,m} gate. We'll implement P_{n,m} entirely using controlled phase-shift gates. For any θ ∈ R, define the phase-shift gate as
P(θ) := e^{πiθ} R_z(2πθ) = [[1, 0], [0, e^{2πiθ}]].
For example, I = P(1), Z = P(1/2), S = P(1/4), and T = P(1/8). For the controlled P(θ) gate—the C-P(θ) gate—we clearly have
C-P(θ) = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, e^{2πiθ}]].
Owing to the symmetry between the control and target qubits, we will display this gate as
where we place the value θ somewhere nearby. Our θ-values will always be of the form 2^{−k} for integers k ≥ 0. Notice that for any a, b ∈ Z₂,
C-P(θ)|a⟩|b⟩ = e^{2πiθ·ab}|a⟩|b⟩. (80)
It is easiest to think of Pn,m as acting on two quantum registers—the first with n − m qubits
and the second with m qubits. What gates do we need to implement Pn,m ? Let’s consider Pn,m
applied to the state |xi|yi = |x1 x2 · · · xn−m i|y1 y2 · · · ym i, where x1 , . . . , xn−m and y1 , . . . , ym are all
bits in Z2 . We have
x/2^{n−m} = 0.x₁x₂···x_{n−m} = Σ_{j=1}^{n−m} x_j 2^{−j}  and  y/2^m = 0.y₁y₂···y_m = Σ_{k=1}^{m} y_k 2^{−k},
where the “decimal” expansions are actually base 2. Multiplying these two quantities gives
xy/2^n = Σ_{j=1}^{n−m} Σ_{k=1}^{m} x_j y_k 2^{−j−k},
111
1/4 1/8 1/16 1/32
1
2
3
4
5
1
2
3
4
1/64 1/128 1/256 1/512
Figure 9: The circuit implementing P9,4 . C-P(θ) gates are grouped according to the values of θ.
Within each group, gates act on disjoint pairs of qubits, so they can form a single layer of gates
acting in parallel.
and so
e_n(xy) = exp(2πixy/2^n) = ∏_{j,k} exp(2πi·x_j y_k·2^{−j−k}) = ∏_{j,k} e_{j+k}(x_j y_k).
Recalling (80), notice that for each j and k, we can get the (j, k)th factor in the product above if we connect the jth qubit of the first register (carrying x_j) with the kth qubit of the second register (carrying y_k) with a C-P(2^{−j−k}) gate (which then acts on the state |x_j⟩|y_k⟩ to get an overall phase contribution of e_{j+k}(x_j y_k)). So to implement P_{n,m} we just need to do this for all 1 ≤ j ≤ n − m and all 1 ≤ k ≤ m. That's it. All these gates will combine to give the correct overall phase shift of e_n(xy). The order of the gates does not matter, because they all commute with each other (they are all diagonal matrices in the computational basis). For example, Figure 9 shows the P₉,₄ circuit.
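We can check numerically that the C-P(2^{−j−k}) gates described above multiply out to exactly the phase e_n(xy) on every basis state (n = 6, m = 3 below are arbitrary choices for the check):

```python
import cmath

def e(k: int, t: int) -> complex:
    """e_k(t) = exp(2*pi*i*t / 2^k)."""
    return cmath.exp(2j * cmath.pi * t / 2 ** k)

n, m = 6, 3  # first register: n - m = 3 qubits, second register: m = 3 qubits
for x in range(2 ** (n - m)):
    for y in range(2 ** m):
        # bits x_1 .. x_{n-m} (MSB first), so that x / 2^(n-m) = 0.x_1 x_2 ...
        xb = [(x >> (n - m - j)) & 1 for j in range(1, n - m + 1)]
        yb = [(y >> (m - k)) & 1 for k in range(1, m + 1)]
        phase = 1 + 0j
        for j in range(1, n - m + 1):
            for k in range(1, m + 1):
                phase *= e(j + k, xb[j - 1] * yb[k - 1])  # one C-P(2^-(j+k)) gate
        assert abs(phase - e(n, x * y)) < 1e-9            # total phase is e_n(xy)
print("P_{n,m} phase product verified")
```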
Exercise 17.4 Give two complete decompositions of QFT4 as circuits, the first using m = 1 through-
out, and the second using m = 2 for the initial decomposition. Both circuits should use only H
and C-P(2−k ) gates for k > 0. Do not cross wires except at the end of the entire circuit, that is, shift
any wire crossings in the recursive QFT circuits to the end of the overall circuit.
In Exercise 17.4, if you correctly shifted all the wire crossings to the end of the circuit, you may
have noticed that in the end you simply reverse the order of the qubits. That is not a coincidence;
an easy inductive argument shows that this must always be the case.
Exercise 17.5 (Challenging) Asymptotically, what is the size (number of elementary gates) of QFTn
when decomposed using m = 1 throughout (Shor’s circuit)? What is the size using the divide-
and-conquer method with m = n/2 throughout? The same questions for the depth (minimum
possible number of layers of gates acting on disjoint sets of qubits). Use big-O notation. In all
cases, you can ignore the qubit-permuting gates. [Hint: Find recurrence equations satisfied by the
size and the depth in each case.]
Actually, there is another way to implement Pn,m : Classically compute xy as an n-bit binary
integer, then for each k ∈ {1, 2, . . . , n}, send the kth qubit of the result through the gate P(2−k ).
There are fast parallel circuits for multiplication, with polynomial size and depth O(lg n). This
log-depth implementation of Pn,m together with the divide-and-conquer decomposition method
for QFTn give an O(n)-depth, polynomial-size circuit that exactly implements QFTn .
Exact versus Approximate. The QFTn circuit we described above for Shor’s algorithm blithely
uses C-P(2−k ) gates where k ranges between 2 and n. If Shor’s algorithm is to significantly
outperform the best classical factoring algorithms, then n must be on the order of 10³ and above, which means that we will be using gates that produce conditional phase shifts of 2π/2^{1000} or less.
No one in their right mind imagines that we could ever tune our instruments so precisely as to
produce so small a phase shift, which is required for any exact implementation of QFT1000 . The
bottom line is that implementing QFTn exactly for large n will just never be feasible.
Fortunately, an exact implementation is unnecessary for Shor’s algorithm or for any other
probabilistic quantum algorithm that uses the QFT. We can actually tolerate a lot of imprecision
in the implementation of the C-P(2^{−k}) gates. In fact, if k ≫ lg n, then C-P(2^{−k}) is close enough
to the identity operator that we can omit these gates entirely. The resulting circuit is much smaller
and produces a good approximation to QFTn that can be used in Shor’s algorithm. Good enough
so that the probability of finding R in Step 8 of the algorithm is at worst only slightly smaller than
with the exact implementation, thus requiring only a few more repetitions of the algorithm to
produce R with high probability.
In the next few topics, we’ll make this all quantitative. The concepts and techniques we
introduce are useful in other contexts. Before we do, we need a basic inequality known as the
Cauchy-Schwarz inequality.
The Cauchy-Schwarz inequality. We mentioned this inequality early in the course as proving
the triangle inequality for complex scalars, but this is the first time since then that we actually need
it. We’ll use it here to bound the effects of unitary errors in implementing a quantum circuit. We’ll
use it again in other contexts.
Theorem 18.1 (Cauchy-Schwarz Inequality) Let H be a Hilbert space. For any vectors u, v ∈ H,
|⟨u, v⟩| ≤ ‖u‖ · ‖v‖,
with equality if and only if u and v are linearly dependent.
Proof. There are many ways to prove this theorem. The Nielsen & Chuang textbook has a proof in
Box 2.1 on page 68, which we loosely paraphrase here. See Section B.1 of the background material
in Appendix B for another proof. Equality clearly holds if u and v are linearly dependent, since
then one vector is a scalar multiple of the other. So assume that u and v are linearly independent.
By the Gram-Schmidt procedure, we can find orthonormal vectors b₁, b₂ such that b₁ = u/‖u‖ and b₂ = (v − ⟨b₁, v⟩b₁)/‖v − ⟨b₁, v⟩b₁‖. We thus have
u = a·b₁,
v = c·b₁ + d·b₂,
where a = ‖u‖ > 0, c = ⟨b₁, v⟩, and d = ‖v − ⟨b₁, v⟩b₁‖ > 0 by linear independence. Then
‖u‖ · ‖v‖ = a(|c|² + d²)^{1/2} > a(|c|²)^{1/2} = a|c| = |ac| = |⟨ab₁, cb₁ + db₂⟩| = |⟨u, v⟩|. □
Exercise 18.2 Show that ‖u + v‖ ≤ ‖u‖ + ‖v‖ for any two vectors u, v ∈ H, with equality holding if and only if one is a nonnegative scalar times the other. This is another example of a triangle inequality. [Hint: Use Cauchy-Schwarz (Theorem 18.1) and the fact that Re[z] ≤ |z| for any z ∈ C.]
A Hilbert Space Is a Metric Space. For any two vectors u, v ∈ H, the Euclidean distance between
u and v is defined as
d(u, v) := ku − vk.
The function d satisfies the following axioms:
1. d(u, v) ≥ 0,
2. d(u, v) = 0 iff u = v,
3. d(u, v) = d(v, u),
4. d(u, w) ≤ d(u, v) + d(v, w).
These are the axioms for a metric on the set H. The last item is known as the triangle inequality, which can be seen as follows:
d(u, w) = ‖u − w‖ = ‖(u − v) + (v − w)‖ ≤ ‖u − v‖ + ‖v − w‖ = d(u, v) + d(v, w),
where the inequality follows from Exercise 18.2. All the other axioms are straightforward.
Suppose that you could run an ideal quantum algorithm to produce a state |ψi that you then
subject to some projective measurement. You would get certain probabilities for the various
possible outcomes. Suppose instead that you actually ran an imperfect implementation of the
algorithm and produced a state |ϕi that was close to |ψi in Euclidean distance, and you subjected
|ϕi to the same projective measurement. The next proposition shows that the probabilities of the
outcomes are close to those of the ideal situation.
Proposition 18.3 Let {P_a : a ∈ I} be some complete set of orthogonal projectors on H. Let u, v ∈ H be any two unit vectors, and let Pr_u[a] and Pr_v[a] be the probability of seeing outcome a ∈ I when measuring the state u and v respectively using this complete set. Then for every outcome a ∈ I,
|Pr_u[a] − Pr_v[a]| ≤ 2 d(u, v).
Proof. We have
|Pr_u[a] − Pr_v[a]| = |⟨u, P_a u⟩ − ⟨v, P_a v⟩|
 = |⟨u − v, P_a u⟩ + ⟨v, P_a(u − v)⟩|
 ≤ |⟨u − v, P_a u⟩| + |⟨v, P_a(u − v)⟩|
 ≤ ‖u − v‖·‖P_a u‖ + ‖v‖·‖P_a(u − v)‖
 ≤ 2‖u − v‖.
The second inequality is an application of Cauchy-Schwarz (Theorem 18.1); the third follows from the fact that ‖Pw‖ ≤ ‖w‖ = 1 for any projector P and unit vector w (see Exercise 5.12). □
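As a quick numerical illustration of Proposition 18.3 (not part of the proof), measuring two nearby unit vectors with the computational-basis projectors gives outcome probabilities |u_a|² and |v_a|² within 2‖u − v‖ of each other; the dimension and perturbation size below are arbitrary choices:

```python
import math, random

random.seed(1)

def normalize(v):
    nrm = math.sqrt(sum(abs(z) ** 2 for z in v))
    return [z / nrm for z in v]

dim = 4
# two nearby unit vectors in C^4 (random choices)
u = normalize([complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(dim)])
v = normalize([u[a] + 0.05 * complex(random.gauss(0, 1), random.gauss(0, 1))
               for a in range(dim)])
dist = math.sqrt(sum(abs(u[a] - v[a]) ** 2 for a in range(dim)))
# projectors onto the computational basis give Pr_u[a] = |u_a|^2
for a in range(dim):
    assert abs(abs(u[a]) ** 2 - abs(v[a]) ** 2) <= 2 * dist + 1e-12
print("outcome probabilities differ by at most 2*d(u, v)")
```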
The next definition extends the notion of distance to operators. Here we give one of many possible ways to do this.
Definition 18.4 For any operator A on H, define the operator norm of A as
‖A‖ := sup_{v∈H: ‖v‖=1} ‖Av‖ = sup_{v≠0} ‖Av‖/‖v‖.
Since H is finite dimensional, the supremum above is always achieved by some vector, i.e., there is always a unit vector v such that ‖A‖ = ‖Av‖. Here are some basic properties of the operator norm that follow quickly from the definition:
1. ‖A‖ ≥ 0, with equality iff A = 0.
2. ‖cA‖ = |c| · ‖A‖ for any scalar c ∈ C.
3. ‖Av‖ ≤ ‖A‖ · ‖v‖ for any v ∈ H.
4. ‖A + B‖ ≤ ‖A‖ + ‖B‖.
5. ‖AB‖ ≤ ‖A‖ · ‖B‖.
6. ‖U‖ = 1 for any unitary U.
7. ‖UAV‖ = ‖A‖ for any unitaries U and V.
8. ‖A‖ = ‖ |A| ‖.
Exercise 18.5 Verify each of these items, based on the definition of ‖·‖.
We can use the operator norm to define a metric d on L(H), just as we did with H:
d(A, B) := ‖A − B‖
for all A, B ∈ L(H).
Picking up on the last item above, we see that A has the same norm as |A|. Since |A| ≥ 0, there is an eigenbasis {b₁, . . . , b_n} of |A| with respect to which |A| = diag(λ₁, . . . , λ_n), where λ₁ ≥ · · · ≥ λ_n ≥ 0 are the eigenvalues of |A|. We claim that ‖A‖ = λ₁, i.e., ‖A‖ is the largest eigenvalue of |A|. To see why, let v = (v₁, . . . , v_n) be any unit column vector with respect to this basis {b_j}_{1≤j≤n}. Then we have
‖Av‖² = ⟨Av, Av⟩ = ⟨v, |A|²v⟩ = Σ_{j=1}^{n} λ_j² |v_j|² = Σ_j λ_j² a_j,
where we set a_j := |v_j|². We have a_j ≥ 0 for all 1 ≤ j ≤ n, and since v is a unit vector, we have Σ_j a_j = 1. So,
‖Av‖² = Σ_{j=1}^{n} λ_j² a_j
 = λ₁² a₁ + Σ_{j=2}^{n} λ_j² a_j
 = λ₁²(1 − Σ_{j=2}^{n} a_j) + Σ_{j=2}^{n} λ_j² a_j
 = λ₁² + Σ_{j=2}^{n} (λ_j² − λ₁²) a_j.
Since λ_j² − λ₁² ≤ 0 for all 2 ≤ j ≤ n, the right-hand side is clearly maximized by setting a₂ = · · · = a_n = 0 (and so a₁ = 1). So we must have ‖A‖ = ‖ |A| ‖ = ‖ |A| b₁‖ = λ₁ as claimed.
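The claim—‖A‖ is the largest eigenvalue of |A|, i.e., the largest singular value of A—can be illustrated numerically. The following sketch estimates ‖A‖ by power iteration on AᵀA for a real matrix; the example matrix and iteration count are my own choices:

```python
import math

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def op_norm(A, iters=200):
    """Estimate ||A|| for a real matrix A as the square root of the largest
    eigenvalue of A^T A, found by power iteration."""
    n = len(A[0])
    At = [[A[i][j] for i in range(len(A))] for j in range(n)]  # transpose
    v = [1.0] * n
    for _ in range(iters):
        w = matvec(At, matvec(A, v))              # w = (A^T A) v
        nrm = math.sqrt(sum(x * x for x in w))
        v = [x / nrm for x in w]
    w = matvec(At, matvec(A, v))
    lam = sum(v[i] * w[i] for i in range(n))      # Rayleigh quotient
    return math.sqrt(lam)

A = [[1.0, 1.0], [0.0, 1.0]]
print(op_norm(A))  # ~1.618..., the golden ratio = largest singular value of A
```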
The next property follows from the claim, above.
9. If A and B are operators (not necessarily over the same space), then kA ⊗ Bk = kAk · kBk. In
particular, kA ⊗ Ik = kAk and kI ⊗ Bk = kBk.
This property is useful when we take the norm of a single gate in a circuit. The unitary operator
corresponding to the action of the gate is generally of the form U ⊗ I, where U corresponds to the gate acting on the space of its own qubits, and the identity I acts on the qubits not involved with
the gate. Property 9 says that we can ignore the I when taking the norm of this operator.
To prove Property 9, we first prove that |A ⊗ B| = |A| ⊗ |B|. To show this, we only need to verify two things: (i) (|A| ⊗ |B|)² = (A ⊗ B)∗(A ⊗ B) and (ii) |A| ⊗ |B| ≥ 0. We leave (i) as an exercise. For (ii), we first pick eigenbases for |A| and |B|, respectively. Then if |A| = diag(λ₁, . . . , λ_n) with respect to the first basis and |B| = diag(μ₁, . . . , μ_m) with respect to the second, then with respect to the product of the two bases (itself an orthonormal basis), |A| ⊗ |B| is a diagonal matrix whose diagonal entries are λ_j μ_k for all 1 ≤ j ≤ n and 1 ≤ k ≤ m. Since all the λ_j and μ_k are nonnegative, the diagonal entries of |A| ⊗ |B| are all nonnegative. Hence, |A| ⊗ |B| ≥ 0, which proves (ii), and thus |A ⊗ B| = |A| ⊗ |B|. Now the largest eigenvalue of |A| ⊗ |B| is clearly λμ, where λ = max(λ₁, . . . , λ_n) = ‖A‖ and μ = max(μ₁, . . . , μ_m) = ‖B‖ by the claim. Since |A| ⊗ |B| = |A ⊗ B|, the product λμ is also the largest eigenvalue of |A ⊗ B|, and so using the claim again, we get
Property 9.
Exercise 18.7 Verify by direct calculation that (|A| ⊗ |B|)² = (A ⊗ B)∗(A ⊗ B).
While we’re on the subject, one more property of the operator norm will find use later on. If
you want, you can skip down to after the proof of Claim 18.8, below, and refer back to it later when
you need to.
Claim 18.8 For any operator A, the operators |A| and |A∗ | are unitarily conjugate, i.e., there is a unitary
operator U such that |A∗ | = U|A|U∗ .
Since unitarily conjugate operators have the same spectrum, Claim 18.8 implies that |A| and |A∗ |
have the same largest eigenvalue, i.e., kAk = kA∗ k. Claim 18.8 itself follows from a fundamental
decomposition theorem known as the polar decomposition. For a proof of this decomposition, see
Section B.3 in Appendix B. The polar decomposition is closely related (in fact, equivalent) to the
singular value decomposition, which is also proved in Section B.3.
Theorem 18.9 (Polar Decomposition, Theorem B.8 in Section B.3) For every operator A there is a
unitary U such that A = U|A|. In fact, |A| is the unique positive operator H such that A = UH for some
unitary U.
If z ∈ C is a scalar, then obviously z = u|z| for some u ∈ C with unit norm (i.e., a phase factor). Furthermore, |z| is the unique nonnegative real factor in any such decomposition, and if z ≠ 0 then u is unique as well. Theorem 18.9 generalizes this fact to operators in an analogous way. (If A is nonsingular (invertible), then U is unique as well: it can be easily shown that if A is nonsingular then |A| is nonsingular, whence U = A|A|⁻¹.)
Proof of Claim 18.8. Let A be an operator. By the polar decomposition (Theorem 18.9), there is a
unitary U such that A = U|A|. We have, using Exercise 9.32,
|A∗| = √(AA∗) = √(U|A|²U∗) = U√(|A|²)U∗ = U|A|U∗. □
Now we consider an arbitrary idealized quantum circuit C with m many unitary gates, which basically consists of a succession of unitary operators U₁, . . . , U_m applied to some initial state |init⟩, producing the state |ψ⟩ = U_m · · · U₁|init⟩, which is then projectively measured somehow.
implementing C we might implement each gate Uj imperfectly, getting some unitary Vj instead,
where hopefully, Vj is close to Uj . I will call this a unitary error. The actual circuit produces the
state |ψ 0 i = Vm · · · V1 |initi. Assuming d(Uj , Vj ) 6 ε for all 1 6 j 6 m, what can we say about
d(|ψi, |ψ 0 i)?
Classical calculations are often numerically unstable, and errors may compound multiplica-
tively. Fortunately for us, unitary errors only compound additively rather than multiplicatively,
so we can tolerate a fair amount of imperfection in our gates—only O(lg n) bits of precision per
gate for a circuit with a polynomially bounded (in n) number of gates.
Back to the question above. Using the basic properties of the operator norm listed above, we get
d(|ψ⟩, |ψ′⟩) = ‖(U_m · · · U₁ − V_m · · · V₁)|init⟩‖
 ≤ ‖U_m · · · U₁ − V_m · · · V₁‖ · ‖|init⟩‖
 = ‖U_m · · · U₁ − V_m · · · V₁‖.
The operator inside the ‖·‖ on the right can be expressed as a telescoping sum:
U_m · · · U₁ − V_m · · · V₁ = Σ_{k=1}^{m} U_m · · · U_{k+1}(U_k − V_k)V_{k−1} · · · V₁. (81)
Therefore,
‖U_m · · · U₁ − V_m · · · V₁‖ = ‖Σ_{k=1}^{m} U_m · · · U_{k+1}(U_k − V_k)V_{k−1} · · · V₁‖
 ≤ Σ_k ‖U_m · · · U_{k+1}(U_k − V_k)V_{k−1} · · · V₁‖
 = Σ_k ‖U_k − V_k‖
 ≤ Σ_k ε = mε,
and so d(|ψ⟩, |ψ′⟩) ≤ mε.
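Here is a small numerical illustration of the additive bound d(|ψ⟩, |ψ′⟩) ≤ mε, using a chain of one-qubit phase gates (diagonal unitaries, so all the operator norms are easy to compute exactly); the particular angles and perturbation are arbitrary choices for the sketch:

```python
import cmath

def pgate(theta):
    """P(theta) = diag(1, e^{2 pi i theta}), stored as its diagonal."""
    return [1, cmath.exp(2j * cmath.pi * theta)]

def compose(gates):
    """Product of diagonal gates: diagonals multiply entrywise."""
    d = [1 + 0j, 1 + 0j]
    for g in gates:
        d = [d[0] * g[0], d[1] * g[1]]
    return d

def dist(d1, d2):
    """Operator norm of a difference of diagonal operators = largest entry gap."""
    return max(abs(a - b) for a, b in zip(d1, d2))

m, delta = 50, 1e-4  # 50 gates, each angle off by delta (arbitrary choices)
ideal = [pgate(1 / 2 ** (j % 10 + 1)) for j in range(m)]
noisy = [pgate(1 / 2 ** (j % 10 + 1) + delta) for j in range(m)]
eps = max(dist(u, v) for u, v in zip(ideal, noisy))  # per-gate error
total = dist(compose(ideal), compose(noisy))         # end-to-end error
assert total <= m * eps + 1e-12                      # errors add; they don't multiply
```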
Suppose we want the probability of some outcome to differ from the ideal probability by no more than some δ > 0. Then by Proposition 18.3, it suffices that 2mε ≤ δ, or that
ε ≤ δ/(2m).
For example, the entire quantum circuit for Shor's algorithm has size polynomial in n—let's say at most cn^k gates for some constants c and k. (I'm not sure, but I believe that k ≤ 3. The dominant contribution is not the QFT but rather the classical modular exponentiation circuit.) The algorithm produces a good y (one that will lead to finding R) with probability at least 4/(π²n), ignoring a quadratically small correction term. We could settle instead for a success probability of at least 2/(π²n), say, which would require up to twice as many trials on average for success. But then, choosing δ := 4/(π²n) − 2/(π²n) = 2/(π²n), we could implement each gate to within an error (operator distance) of
ε_Shor := (2/(π²n))/(2cn^k) = 1/(π²cn^{k+1}) = Θ(n^{−k−1})
away from the ideal. This has major implications for the QFT part of the circuit. The QFT has size
Θ(n²), uses n Hadamard gates, and the rest of the gates are C-P(2^{−j}) gates, where 2 ≤ j ≤ n. (We
can do without the swap gates by keeping track of which qubit is which, and rearranging the bits
of the y value that we measure.) Note that for any θ ∈ R,
C-P(θ) − I = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, e^{2πiθ} − 1]] = 2ie^{iπθ} sin(πθ) · [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]].
It follows that
d(C-P(θ), I) = ‖C-P(θ) − I‖ = 2|sin(πθ)| ≤ 2πθ.
This means that if 2π·2^{−j} ≤ ε_Shor, or equivalently, if j ≥ lg(2π/ε_Shor) = (k + 1) lg n + O(1), then any C-P(2^{−j}) in the QFT circuit is close enough to I that we can just omit it. It's easy to see
that most of the QFT gates are like this and can be omitted, shrinking the QFT portion of the circuit
from quadratic size to linear size in n. This fact was first observed by Coppersmith.
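The bound d(C-P(θ), I) = 2|sin(πθ)| ≤ 2πθ is easy to check numerically, along with a rough count of how few phase angles survive a lg n-scale cutoff (the cutoff constant 8 below is an arbitrary illustration, not a tuned value):

```python
import cmath, math

def dist_to_identity(theta):
    """d(C-P(theta), I) = |e^{2 pi i theta} - 1|."""
    return abs(cmath.exp(2j * math.pi * theta) - 1)

n = 1000
for j in range(2, n + 1):
    theta = 2.0 ** (-j)
    d = dist_to_identity(theta)
    assert abs(d - 2 * abs(math.sin(math.pi * theta))) < 1e-12  # = 2|sin(pi theta)|
    assert d <= 2 * math.pi * theta + 1e-15                     # <= 2 pi theta

# keeping only gates with j up to a lg(n)-scale cutoff
cutoff = 8 * int(math.log2(n))
kept = sum(1 for j in range(2, n + 1) if j <= cutoff)
print(kept, "of", n - 1, "phase angles survive the cutoff")
```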
For n = 10³ and assuming k = 3, we can get by with implementing each gate with error O(n^{−4}), which is on the order of one part per trillion. This is still a very tall order, but unlike 2^{−1000} it is
at least close to the realm of sanity. Optimizing other aspects of Shor’s algorithm and its analysis
increases the error tolerance considerably.
19 Midterm Exam
Do all problems. Hand in your answers in class on Wednesday, March 28, just as you would a
homework problem set. The only difference between this exam and the homeworks is that you
may not discuss exam questions or answers with anyone inside or outside of class except me. It
goes without saying that if you do, you have cheated and I’ll have to summarily fail you, which is
my usual policy about cheating. I know you won’t, though, so I’ll sleep well at night.
All questions with Roman numerals carry equal weight, but may not be of equal difficulty.
Recall that for two vectors or operators a, b, we say that a ∝ b if there is a phase factor e^{iθ}, where θ ∈ R, such that a = e^{iθ}b.
I) (Rotating the Bloch sphere) Find a unit vector n̂ = (x, y, z) ∈ R3 on the Bloch sphere and an
angle ϕ ∈ [0, 2π) such that
where Rn̂ (ϕ) is defined in Exercise 9.4, and |+xi, |+yi, and |+zi are given by Equations (18–
20). Give the 2 × 2 matrix corresponding to your solution in the standard computational
basis, simplified as much as possible. What can you say about Rn̂ (ϕ)|+zi? There are exactly
two possible solutions to this problem.
II) (Phase factors and density operators) Let U and V be unitary operators over H. It is easy to
see that if U ∝ V, then UρU∗ = VρV ∗ for every state ρ. (Here, by “state” we mean a state in
the density operator formalism, i.e., a one-dimensional projection operator of the form |ψihψ|
for some unit vector |ψi.) Show the converse: If U and V are unitary and UρU∗ = VρV ∗ for
all states ρ, then U ∝ V. [Hint: Consider U and V in matrix form and show that every entry
of U is equal to the corresponding entry of V multiplied by the same phase factor. Use the
equation above for specific values of ρ. This technique is similar to that used in Exercise 9.26.]
III) (Tensor products of matrices) Let A be an arbitrary n × n matrix and let B be an arbitrary
m × m matrix.
(a) If A and B are both upper triangular, explain why A ⊗ B is also upper triangular.
(b) Suppose that A has eigenvalues λ1 , . . . , λn (with multiplicities), and that B has eigenval-
ues µ1 , . . . , µm (with multiplicities). Describe the eigenvalues of A ⊗ B. Note that here,
A and B are not necessarily upper triangular. [Hint: Use the previous item and things
we know about the eigenvalues of upper triangular matrices.]
IV) (Teleportation gone wrong) Alice and Bob think they are sharing a pair of qubits in the state
|Φ+ i, but instead the pair of qubits that they share is in one of the other three Bell states.
Suppose that they now attempt to do the standard one-qubit teleportation protocol to teleport
the state |ψi from Alice to Bob using this pair.
(a) Show that the state that Bob possesses at the end is, up to a phase factor, some Pauli
operator (X, Y, or Z) applied to |ψi. [Hint: You can save yourself a lot of calculation by
observing that the four Bell states are of the form (I ⊗ σ)|Φ+ i for σ ∈ {I, X, Z, XZ}.]
(b) Supposing Alice and Bob know that they share a pair of qubits in the state |Ψ− i, show
how they can alter their protocol to faithfully teleport |ψi. [Hint: Use the previous
item.]
V) (Finding a hidden linear function) Let s ∈ (Z₂)ⁿ be an unknown string, and let f : (Z₂)ⁿ → Z₂ be given by
f(x) = s · x,
where s · x = Σ_{j=1}^{n} s_j x_j mod 2 is the standard dot product of s and x over Z₂. Recall the inversion gate I_f such that
I_f|x⟩ = (−1)^{f(x)}|x⟩
for all x ∈ (Z₂)ⁿ. The following describes a circuit that uses I_f once to find s:
Do the following:
20 Week 10: Grover’s algorithm
Quantum Search. You are given an array A[1 . . . N] of N values, one of which is a recognizable
target value t. You want to find the position w of t in the list. The values are not necessarily sorted
or arranged in any particular way. Classically, the best you can do in the worst case is to probe
all A[j] for 1 6 j 6 N, and find the target on the last probe. On average, you will need about N/2
probes before finding the target with high probability.
With a quantum algorithm, you can find the target with (extremely) high probability using only O(√N) many probes, giving a quadratic speed-up. This result is due to Lov Grover, and is
known as Grover’s quantum search algorithm. It has many variants, but we only give the simplest
one here to give an idea of how it works.
We assume that N = 2n for some n and that we have a black-box Boolean function f : {0, 1}n →
{0, 1} available such that there is a unique w ∈ {0, 1}ⁿ such that f(w) = 1 and f(z) = 0 for all z ≠ w.
Think of f as the target detector. Our task is to find w.
We assume that we can use n-qubit I_f gates, where we recall that
I_f|x⟩ = (−1)^{f(x)}|x⟩
for all x ∈ {0, 1}ⁿ. In the present case, we have I_f|w⟩ = −|w⟩ and I_f|z⟩ = |z⟩ if z ≠ w. Note that given the promise
about f, we have If = diag(1, . . . , 1, −1, 1, . . . , 1), where the −1 occurs at position w. Thus,
If = I − 2|wihw|.
Each use of an If gate will count as a probe. We will also use the gate
I0 = I − 2|0n ih0n |,
which flips the sign of |0n i but leaves all other basis states alone. I0 can be implemented by an
O(n)-size O(lg n)-depth circuit using H, X, and CNOT gates. Finally we assume that we have
some n-qubit unitary U available such that hw|U|0n i , 0. Setting x := hw|U|0n i and by adjusting
U by a phase factor if necessary, we can assume that x > 0. The larger x is, the better. If we let
U = H^{⊗n} be a layer of n Hadamard gates, then we can get
x = ⟨w|U|0ⁿ⟩ = 2^{−n/2} Σ_{z∈{0,1}ⁿ} ⟨w|z⟩ = 2^{−n/2} = 1/√N.
It turns out that we can’t do better than this in the worst case. Grover’s algorithm now works as
follows:
1. Start with n qubits in the state |0ⁿ⟩.
2. Apply U to get the state |s⟩ = U|0ⁿ⟩. We call |s⟩ the start state. Note that x = ⟨w|s⟩ = ⟨s|w⟩ > 0.
We’ll assume that x < 1, or equivalently, that |si and |wi are linearly independent; otherwise,
|si ∝ |wi and we can skip the next step entirely. For U implemented with Hadamards as
above, this assumption clearly holds.
3. Apply G to |s⟩ ⌊π/(4 sin⁻¹ x)⌋ many times, where
G := −UI0 U∗ If
is known as the Grover iterate.
4. Measure the n qubits in the computational basis, obtaining a value y ∈ {0, 1}n .
We'll show that y = w with high probability. Note that if x = 1/√N, then ⌊π/(4 sin⁻¹ x)⌋ ≈ π/(4x) = Θ(√N), and so there are Θ(√N) many probes, since G consists of one probe.
We expand G:
G = −UI₀U∗I_f
 = −U(I − 2|0ⁿ⟩⟨0ⁿ|)U∗(I − 2|w⟩⟨w|)
 = (2U|0ⁿ⟩⟨0ⁿ|U∗ − I)(I − 2|w⟩⟨w|)
 = (2|s⟩⟨s| − I)(I − 2|w⟩⟨w|)
 = −I + 2|s⟩⟨s| + 2|w⟩⟨w| − 4x|s⟩⟨w|.
Applying the right-hand side to |si and |wi immediately gives us
G|s⟩ = (1 − 4x²)|s⟩ + 2x|w⟩,
G|w⟩ = −2x|s⟩ + |w⟩.
So we see that G|si and G|wi are both (real) linear combinations of |si and |wi. Thus G maps
the plane spanned by |si and |wi into itself, and all intermediate states of the algorithm lie in this
plane. Thus we can now restrict our attention to this two-dimensional subspace S.
Using Gram-Schmidt, we pick an orthonormal basis for S, with |w⟩ being one vector and |r⟩ := |r′⟩/‖|r′⟩‖ being the other, where |r′⟩ := |s⟩ − x|w⟩. We have
‖|r′⟩‖² = ⟨s|s⟩ − 2x⟨s|w⟩ + x²⟨w|w⟩ = 1 − x²,
and so
|r⟩ = (|s⟩ − x|w⟩)/√(1 − x²).
It is easily checked that ⟨r|w⟩ = 0. Let 0 < θ < π/2 be such that x = sin θ. Expressing |s⟩ in the {|r⟩, |w⟩} basis, we get
|s⟩ = √(1 − x²)|r⟩ + x|w⟩ = cos θ|r⟩ + sin θ|w⟩ = (cos θ, sin θ)ᵀ.
Let’s express G with respect to the same {|ri, |wi} basis. Note that restricted to the subspace S, the
identity I has the same effect as the orthogonal projector PS = |rihr| + |wihw| projecting onto S:
they both fix all vectors in S. It follows that, restricted to S,
G = −P_S + 2|s⟩⟨s| + 2|w⟩⟨w| − 4x|s⟩⟨w|
 = −|r⟩⟨r| − |w⟩⟨w| + 2(cos θ|r⟩ + sin θ|w⟩)(cos θ⟨r| + sin θ⟨w|) + 2|w⟩⟨w| − 4 sin θ(cos θ|r⟩ + sin θ|w⟩)⟨w|
 = (2cos²θ − 1)|r⟩⟨r| − 2 cos θ sin θ|r⟩⟨w| + 2 sin θ cos θ|w⟩⟨r| + (1 − 2sin²θ)|w⟩⟨w|
 = [[cos(2θ), −sin(2θ)], [sin(2θ), cos(2θ)]].
Geometrically, if we identify |ri with the point (1, 0) ∈ R2 and |wi with the point (0, 1) ∈ R2 , then
|si is the point in the first quadrant of the unit circle, forming angle θ with |ri. Also, G is seen to
give a counterclockwise rotation of the circle through angle 2θ. We want the state to wind up as
close to |wi as possible, which makes an angle π/2 with |ri. Applying G m times puts the state at
an angle (2m + 1)θ from |r⟩, so we solve
(2m + 1)θ = π/2 ⟺ m = π/(4θ) − 1/2 = π/(4 sin⁻¹ x) − 1/2.
Rounding to the nearest integer gives m = ⌊π/(4 sin⁻¹ x)⌋, which is the number of times we apply
G to |si. The final state is within an angle θ of |wi, so the probability of getting w as the result of the
measurement is at least cos2 θ = 1 − x2 = 1 − 2−n = 1 − 1/N (if x = 2−n/2 ), which is exponentially
close to 1.
Interestingly, if we apply G too many times, then we start drifting away from |wi and the
probability of getting w in the measurement will start going down again to about zero at 2m
applications, then it will oscillate back to one at about 3m, then close to zero again at 4m, et cetera.
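The rotation picture, including this oscillation, is easy to check numerically. The following sketch (an illustration, not part of the development above; the choice n = 10 is arbitrary) uses only the angle θ, with no state-vector simulation:

```python
import math

# Quick numerical check of the rotation picture for single-target Grover
# search. n = 10 is an arbitrary choice; N = 2**n and x = <w|s> = 2**(-n/2).
n = 10
x = 2 ** (-n / 2)
theta = math.asin(x)                      # x = sin(theta)
m = math.floor(math.pi / (4 * theta))     # optimal number of iterations

def success_prob(iters):
    # After `iters` applications of G the state makes angle (2*iters + 1)*theta
    # with |r>, so the amplitude on |w> is sin((2*iters + 1)*theta).
    return math.sin((2 * iters + 1) * theta) ** 2

print(m, success_prob(m))       # success probability very close to 1
print(success_prob(2 * m))      # overshooting: back down near 0
print(success_prob(3 * m))      # and up near 1 again
```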
Some Variants of Quantum Search. An obvious variant is to assume that f(z) = 1 for at most
one z, rather than for exactly one z. For this variant, one can run Grover’s algorithm just as before,
but check that the final result y is such that f(y) = 1, using one more probe of f. If not, then you
can conclude that f is the constant zero function, and you’d be wrong with exponentially small
probability.
Another variant is when there are exactly k many z such that f(z) = 1, where k is known, and
your job is to find the location of any one of them. This is the subject of the next exercise.
Exercise 20.1 (Somewhat challenging) Show that if there are exactly k many z such that f(z) = 1,
where 0 < k < 2^n is known, then one of the targets can be found with high probability using
O(√(N/k)) probes to f. [Hint: Let U = H^⊗n, let |si = U|0^n i = 2^(−n/2) Σ_{z∈{0,1}^n} |zi be the start
state, and let G = −UI₀U∗ If = −(I − 2|sihs|)If be the Grover iterate, all as before. Run Grover's
algorithm as before, applying G some number of times to |si. To see how many times to apply G:

1. Define the state |wi to be the uniform superposition of all target locations:

|wi := (1/√k) Σ_{z:f(z)=1} |zi.

2. Likewise, define the state |ri to be the superposition of all nontarget locations:

|ri := (1/√(2^n − k)) Σ_{z:f(z)=0} |zi.
3. Show that x := hw|si = √(k/2^n) = √(k/N).

4. Define 0 < θ < π/2 such that x = sin θ, just as before, and show that |si = cos θ|ri + sin θ|wi,
just as before.

5. Show that G|ri and G|wi are both real linear combinations of |ri and |wi, just as before. Note that G = −(I − 2|sihs|)If ≠ −(I − 2|sihs|)(I − 2|wihw|), so the calculation
must be a bit different from before. You might observe that If has the same effect as I − 2|wihw|
within the space spanned by |ri and |wi, but you can't use this fact until you establish that G
maps this space into itself. Better to just do the calculations above directly.

6. Conclude that G maps the space spanned by the orthonormal set {|ri, |wi} into itself, and its
matrix looks the same as before.

7. Conclude that ⌊π/(4θ)⌋ is the right number of applications of G, since measuring the qubits in
a state close to |wi returns some target location with high probability. Show that ⌊π/(4θ)⌋ =
Θ(√(N/k)).]
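As a numerical sanity check of the claimed iteration count (not a solution to the exercise), one can simulate Grover's algorithm with k targets directly on the 2^n-dimensional state vector; the parameters n = 8, k = 5 below are arbitrary choices:

```python
import math
import random

# Brute-force state-vector simulation of Grover's algorithm with k targets.
n, k = 8, 5
N = 2 ** n
targets = set(random.Random(0).sample(range(N), k))   # arbitrary target set

state = [1 / math.sqrt(N)] * N        # the start state |s>
s = list(state)                       # keep a copy of |s> for the reflection

theta = math.asin(math.sqrt(k / N))   # sin(theta) = sqrt(k/N)
m = math.floor(math.pi / (4 * theta))

for _ in range(m):
    # Apply I_f: flip the sign of every target amplitude.
    state = [(-a if z in targets else a) for z, a in enumerate(state)]
    # Apply -(I - 2|s><s|): reflect about |s>, then negate.
    ip = sum(si * ai for si, ai in zip(s, state))     # <s|state>
    state = [-(a - 2 * si * ip) for si, a in zip(s, state)]

p_target = sum(a * a for z, a in enumerate(state) if z in targets)
print(m, p_target)    # probability of measuring some target, close to 1
```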
A Lower Bound on Quantum Search. The number of probes to the function f in Grover's search
algorithm is asymptotically optimal. That is, no quantum algorithm can find a unique target in a
search space of size N with high probability using o(√N) probes. This bound is due to Bennett,
Bernstein, Brassard, and Vazirani, and predates Grover's algorithm. It is one of the earliest results
in the area of quantum query complexity.
Suppose we are given an arbitrary r-qubit quantum circuit C of unitary gates followed by an
n-qubit measurement in the computational basis. We assume that the initial state of the r qubits
is some fixed |0i, and that C may contain some number of n-qubit If gates, which allow it to make
queries to a Boolean function f : {0, 1}^n → {0, 1}. To prove a lower bound, our goal is to find some
f corresponding to a unique target w ∈ {0, 1}^n (i.e., f(w) = 1 and f(z) = 0 for all z ≠ w) such that
w is unlikely to be the final measurement result. The particular w that we choose will depend on
the circuit C.
Here’s the basic intuition. Suppose C contains some number of If gates. Just before one of
these gates is applied, the state of its input qubits is generally some superposition of states |zi with
z ∈ {0, 1}n . There are 2n many such z, and since the state is a unit vector, most of the corresponding
probability amplitudes must be close to zero. If the probability amplitude of some |wi is small,
then changing f(w) from 0 to 1 just flips the sign of this term in the superposition, which in turn
makes little difference to the overall state and is likely to go unnoticed. We want to choose w so
that this is true for all the If gates in C, as well as the final state of the measured qubits.
Now the details. This development is loosely adapted from pages 269–271 of the textbook,
except that, unlike the textbook, we do not implicitly assume that our circuit C has only n qubits.
Suppose that the circuit C has m many If gates, for some m > 0. For any f, the circuit C corresponds
to the unitary transformation Um If^(m) Um−1 If^(m−1) · · · U1 If^(1) U0 , where
• each If^(j) is the unitary operator corresponding to the jth If gate, acting on some sequence of
n of the r qubits,

• U0 represents all the unitary gates applied prior to If^(1),

• Um represents all the unitary gates applied after If^(m), and

• for all 0 < j < m, Uj represents all the unitary gates applied strictly in between If^(j) and
If^(j+1).
First run the circuit with f set to the constant zero function, so that every If gate acts as the
identity, and for each 0 ≤ j ≤ m let |ψ^(j)i := Uj Uj−1 · · · U1 U0 |0i be the state of the circuit just
after the application of Uj . For 0 ≤ j < m, we uniquely factor |ψ^(j)i as

|ψ^(j)i = Σ_{x∈{0,1}^n} |xi|βx^(j)i,

where the first ket in each term represents a basis state of the n qubits entering the (j + 1)st If gate,
and the second ket is a (not necessarily unit) vector representing the other r − n qubits. Likewise,
we uniquely factor |ψ^(m)i as

|ψ^(m)i = Σ_{x∈{0,1}^n} |xi|βx^(m)i,

where here the first ket in each term represents a basis state of the n qubits that are about to be
measured, and the second ket is a vector representing the r − n unmeasured qubits.
Since |ψ^(j)i is a state, we have, for all 0 ≤ j ≤ m,

1 = hψ^(j)|ψ^(j)i = Σ_{x∈{0,1}^n} hβx^(j)|βx^(j)i = Σ_x k|βx^(j)ik². (82)
Let w ∈ {0, 1}^n be arbitrary. Now we run C again with Iw gates. For 0 ≤ j ≤ m, define

|ϕw^(j)i := Uj Iw^(j) Uj−1 · · · U1 Iw^(1) U0 |0i

to be the state of the circuit just after the application of Uj . We claim that there are many values of
w for which |ϕw^(j)i does not differ too much from |ψ^(j)i, for any 1 ≤ j ≤ m. For each 0 ≤ j ≤ m,
define |ηw^(j)i := |ϕw^(j)i − |ψ^(j)i.
We want to show that enough of the vectors |ηw^(j)i have small norm. For each j, define

D(j) := Σ_{w∈{0,1}^n} k|ηw^(j)ik².

Claim: For all 0 ≤ j ≤ m, we have D(j) ≤ 4j².
Proof. We proceed by induction on j. For j = 0, we have |ϕw^(0)i = U0 |0i = |ψ^(0)i and thus
|ηw^(0)i = 0 for all w, and so the claim clearly holds. Now for the inductive case where 0 ≤ j < m,
we want to express |ηw^(j+1)i in terms of |ηw^(j)i. We have, for all w,
|ϕw^(j+1)i = Uj+1 Iw^(j+1) |ϕw^(j)i
= Uj+1 Iw^(j+1) ( |ψ^(j)i + |ηw^(j)i )
= Uj+1 Iw^(j+1) Σ_{x∈{0,1}^n} |xi|βx^(j)i + Uj+1 Iw^(j+1) |ηw^(j)i
= Uj+1 ( Σ_x (Iw |xi) ⊗ |βx^(j)i ) + Uj+1 Iw^(j+1) |ηw^(j)i
= Uj+1 ( Σ_x (|xi − 2|wihw|xi) ⊗ |βx^(j)i ) + Uj+1 Iw^(j+1) |ηw^(j)i
= Uj+1 |ψ^(j)i − 2Uj+1 |wi|βw^(j)i + Uj+1 Iw^(j+1) |ηw^(j)i
= |ψ^(j+1)i − 2Uj+1 |wi|βw^(j)i + Uj+1 Iw^(j+1) |ηw^(j)i.
Subtracting, we get

|ηw^(j+1)i = |ϕw^(j+1)i − |ψ^(j+1)i = Uj+1 ( Iw^(j+1)|ηw^(j)i − 2|wi|βw^(j)i ),

whence, since Uj+1 and Iw^(j+1) are unitary,

k|ηw^(j+1)ik² = k Iw^(j+1)|ηw^(j)i − 2|wi|βw^(j)i k² ≤ ( k|ηw^(j)ik + 2k|βw^(j)ik )².
Expanding and summing over w ∈ {0, 1}^n, we have

D(j+1) ≤ D(j) + 4 Σ_{w∈{0,1}^n} k|ηw^(j)ik · k|βw^(j)ik + 4 Σ_{w∈{0,1}^n} k|βw^(j)ik² = D(j) + 4hκ, λi + 4,

where we have used Equation (82) for the last term, and where κ and λ are 2^n-dimensional
column vectors whose entries, indexed by w, are k|ηw^(j)ik and k|βw^(j)ik, respectively. We can apply
Cauchy-Schwarz to hκ, λi:
hκ, λi = |hκ, λi| ≤ kκk · kλk = ( Σ_w k|ηw^(j)ik² )^(1/2) ( Σ_w k|βw^(j)ik² )^(1/2) = √(D(j)) · √1 = √(D(j)),
using (82) again. Plugging this in above and using the inductive hypothesis, we have

D(j+1) ≤ D(j) + 4√(D(j)) + 4 ≤ 4j² + 8j + 4 = 4(j + 1)²,

which completes the induction. 2

By the claim, D(m) ≤ 4m². Since Σ_w k|ηw^(m)ik² = D(m) and Σ_w k|βw^(m)ik² = 1 by (82), fewer
than half of the w ∈ {0, 1}^n can satisfy k|ηw^(m)ik² > 2D(m)/2^n, and fewer than half can satisfy
k|βw^(m)ik² > 2/2^n. Fix a w satisfying neither, so that k|ηw^(m)ik ≤ 2√2 m/2^(n/2) and
k|βw^(m)ik ≤ √2/2^(n/2). Uniquely factor

|ϕw^(m)i = Σ_{x∈{0,1}^n} |xi|γx^(m)i,

where (as with |ψ^(m)i) the first ket represents the n qubits that are about to be measured, and the
second ket represents the other qubits (and is not necessarily a unit vector). The probability of
seeing w as the outcome of the measurement when the state is |ϕw^(m)i is then Pr[w] = k|γw^(m)ik²,
but this value is quite small, provided m is not too large:
k|γw^(m)ik = k |γw^(m)i − |βw^(m)i + |βw^(m)i k
≤ k |γw^(m)i − |βw^(m)i k + √2/2^(n/2)
= k |wi ⊗ (|γw^(m)i − |βw^(m)i) k + √2/2^(n/2)
= k (|wihw| ⊗ I)(|ϕw^(m)i − |ψ^(m)i) k + √2/2^(n/2)
= k (|wihw| ⊗ I)|ηw^(m)i k + √2/2^(n/2)
≤ k|ηw^(m)ik + √2/2^(n/2)
≤ (2m + 1)√2 / 2^(n/2).
And so we get that Pr[w] ≤ (2m + 1)²/2^(n−1) = O(m²/2^n). So finally, if m = o(2^(n/2)), we have
Pr[w] = o(1), i.e., Pr[w] approaches zero as n gets large, and the circuit likely won't find w.
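Incidentally, the recurrence used in the induction is exactly tight: since D + 4√D + 4 = (√D + 2)², the worst case of the recurrence satisfies √(D(j+1)) = √(D(j)) + 2, so D(j) = (2j)² = 4j² exactly. A short check (an illustration, not part of the proof):

```python
import math

# The recurrence D(j+1) <= D(j) + 4*sqrt(D(j)) + 4 is tight: since
# D + 4*sqrt(D) + 4 = (sqrt(D) + 2)**2, the saturated recurrence has
# sqrt(D(j+1)) = sqrt(D(j)) + 2, i.e., D(j) = 4*j**2 exactly.
D = 0.0
for j in range(100):
    assert abs(D - 4 * j ** 2) < 1e-6    # matches the inductive bound exactly
    D = D + 4 * math.sqrt(D) + 4         # saturate the recurrence
print(D)    # D(100) = 40000.0
```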
22 Week 11: Quantum cryptography
Quantum Cryptographic Key Exchange. If Alice and Bob share knowledge of a secret string r
of random bits, then Alice can send a message m with the same number of bits as r to Bob over
a channel subject to eavesdropping with perfect secrecy, i.e., no third party (Eve), monitoring the
channel with no knowledge of r, can gain any knowledge about m whatsoever. This scheme,
known as a one-time pad, works as follows:
1. Alice computes c = m ⊕ r, the bitwise exclusive OR of m and r. The message m is called the
cleartext or plaintext, and c is called the ciphertext.
2. Alice transmits the ciphertext c to Bob over the channel, which we'll assume is publicly
readable, e.g., a newspaper or an internet bulletin board.
3. Bob gets c and computes m = c ⊕ r, thus recovering the cleartext m.
All Eve sees is c = m ⊕ r, and since she doesn’t know r which is assumed to be uniformly random,
the bits of c look completely random to her—all possible c’s are equally likely if all possible r’s are
equally likely. Hence the perfect secrecy.
It’s called a one-time pad for a reason: r cannot be reused to send another message. Suppose
Alice sends another message m 0 using the same r to transmit c 0 = m 0 ⊕ r. Then Eve can compute
c ⊕ c 0 = (m ⊕ r) ⊕ (m 0 ⊕ r) = m ⊕ m 0 ⊕ r ⊕ r = m ⊕ m 0 .
If m and m 0 are both uncompressed files of English text, then they have enough redundancy that
Eve can gain some knowledge of m and m 0 from their XOR, and likely can even decipher both m
and m 0 uniquely from m ⊕ m 0 if the messages are long enough.
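Both the pad itself and the two-time-pad leak are easy to demonstrate in code. In this sketch, messages and key are bytes rather than single bits (XOR acts bitwise either way), and the example messages are arbitrary:

```python
import secrets

# One-time pad, and the leak from reusing the pad.
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

m  = b"ATTACK AT DAWN"
m2 = b"RETREAT AT SIX"                # second message, same key: a mistake
r  = secrets.token_bytes(len(m))      # uniformly random pad

c, c2 = xor(m, r), xor(m2, r)
assert xor(c, r) == m                 # Bob recovers the cleartext
assert xor(c, c2) == xor(m, m2)       # Eve learns m XOR m2 without knowing r
```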
If r is short, say, only a few hundred bits long, then Alice can only transmit that many bits
in her message with a one-time pad. It is more practical instead for Alice and Bob to use r as the
key to some symmetric cipher by which they can communicate longer messages. Some commonly
used ciphers for electronic communications include the Advanced Encryption Standard (AES,
a.k.a. Rijndael), Blowfish, and 3DES. These ciphers are called symmetric because the same key r is
used by Alice to encrypt and by Bob to decrypt. These ciphers do not provide perfect secrecy in
the theoretical sense, but they are widely believed to be infeasible to crack.
We now return to the question of how Alice and Bob manage to share r securely in the first place.
If they spend any time together in a physically secure room, they can flip coins and generate an r.
In practice, though, it is often not possible for Alice and Bob ever to be together; they may not even know
each other (for example, Alice buys a book online from Bob, who is Barnes and Noble). This is the
problem of key exchange, and it is currently handled using some kind of public key cryptography
such as RSA, Diffie-Hellman, or El-Gamal. I won’t go into how public key crypto works here,
except to say that it relies for its security on the difficulty of performing certain number-theoretic
tasks, such as factoring (RSA) and computing discrete logarithms (Diffie-Hellman, El-Gamal). If
quantum computers are ever physically realized, then Shor’s algorithms for factoring and discrete
log could break current public key schemes.
A key-exchange protocol using quantum mechanics was proposed in 1984 by Charles Bennett
and Gilles Brassard. In this protocol, known as BB84, Alice sends a sequence of qubits to Bob
across an insecure quantum channel, subject to eavesdropping/tampering by Eve. Alice and
Bob then perform a series of checks, communicate through a public, nonquantum channel, and
in the end they share some secret random bits. The security of the protocol relies only on the
laws of physics and the faithfulness of the implementation, and not on the assumed difficulty
of certain tasks like factoring large numbers. The key intuition is that in quantum mechanics,
measuring a quantum system may unavoidably alter the system being measured. If Eve wants to
get information about the qubits being sent from Alice to Bob, she must perform a measurement,
which will disrupt the qubits enough to be detected by Alice and Bob with high probability. For
brevity, I will only describe the basic, simplistic, idealized, and unoptimized protocol here. There
are a number of technical issues (such as noise) that I won’t go into. A quick tutorial on quantum
cryptography by Jamie Ford at Dartmouth College can be found at https://www.cs.dartmouth.edu/~jford/crypto.html (this link now appears to be broken). There is an on-line simulation
of BB84 by Frederick Henle at http://fredhenle.net/bb84/demo.php. An extensive (though
now somewhat outdated) bibliography of quantum cryptography papers, started(?) by Gilles
Brassard (Université de Montréal) and maintained(?) by Claude Crépeau at McGill University, is
at https://www.cs.mcgill.ca/~crepeau/CRYPTO/Biblio-QC.html.
In the BB84 protocol, it is assumed that Alice and Bob share an insecure quantum channel, which
Alice will use to send qubits to Bob, and a classical information channel (such as a newspaper,
phone, or electronic bulletin board) that is public (anyone can monitor it) but reliable, in the sense
that any message that Alice and Bob send to each other along this channel reaches the recipient
without alteration, and it is impossible for a third party to send a message to Alice or Bob pretending
to be the other (i.e., it is forgery proof). The description of BB84 needs the following:
Definition 22.1 Let H be an n-dimensional Hilbert space, and let B = {b1 , . . . , bn } and C =
{c1 , . . . , cn } be two orthonormal bases for H. We say that B and C are mutually unbiased, or
complementary, if |hbi |cj i| = 1/√n for all 1 ≤ i, j ≤ n. A collection B1 , . . . , Bk of orthonormal bases
for H is mutually unbiased if each pair of bases in the collection is mutually unbiased.
The geometrical intuition is that B and C are mutually unbiased iff the “angle” between any
member of B and any member of C is always the same, up to a phase factor.
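As a concrete check of the one-qubit case used below, the following sketch verifies that the bases {|+zi, |−zi}, {|+xi, |−xi}, and {|+yi, |−yi} are pairwise mutually unbiased, writing each vector in its standard coordinates with respect to the z basis (an assumption of this sketch):

```python
import itertools
import math

# Verify that the three single-qubit bases are pairwise mutually unbiased:
# |<b|c>| = 1/sqrt(2) for b, c drawn from different bases.
s = 1 / math.sqrt(2)
z_basis = [(1, 0), (0, 1)]               # |+z>, |-z>
x_basis = [(s, s), (s, -s)]              # |+x>, |-x>
y_basis = [(s, s * 1j), (s, -s * 1j)]    # |+y>, |-y>

def inner(b, c):
    # <b|c>, conjugate-linear in the first argument
    return sum(bi.conjugate() * ci for bi, ci in zip(b, c))

for B, C in itertools.combinations([z_basis, x_basis, y_basis], 2):
    for b in B:
        for c in C:
            assert abs(abs(inner(b, c)) - s) < 1e-12
```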
Exercise 22.2 Show that if B and C are two orthonormal bases of an n-dimensional Hilbert space
such that |hb, ci| = |hb′ , c′ i| for any b, b′ ∈ B and c, c′ ∈ C, then 1/√n is the common value of
|hb, ci| for any b ∈ B and c ∈ C. [Hint: Express a vector from C as a linear combination of vectors
from B. What can you say about the coefficients?]
A d-dimensional Hilbert space cannot have a mutually unbiased collection of more than d + 1
orthonormal bases. If d is a power of a prime number, then d + 1 mutually unbiased bases can be
constructed, but it is an open problem to determine how many mutually unbiased bases there can
be when d is not a prime power. Even the case where d = 6 is open. Anyway, for the one-qubit
case where d = 2, the bases {|+xi, |−xi}, {|+yi, |−yi}, and {|+zi, |−zi} are mutually unbiased. (Other
collections of three mutually unbiased bases can be obtained from these three by applying some
unitary operator U to every vector (the same for all the vectors). Applying U does not change the
inner product of any pair of vectors.) BB84 uses two of these three, say, {|+zi, |−zi} and {|+xi, |−xi}.
We'll denote the first of these by ↕, consisting of spin-up (↑) and spin-down (↓) states, and the
second by ↔, consisting of spin-right (→) and spin-left (←) states. The two vectors of each basis
encode the two possible bit values: in the ↕ basis, ↑ encodes 0 and ↓ encodes 1; in the ↔ basis, →
encodes 0 and ← encodes 1. Here is the protocol:
Sending qubits. Alice and Bob repeat the following for j running from 1 to n, where n is some
large number. The random choices made at one iteration are independent of those made at
other iterations.
1. Alice chooses a bit bj ∈ {0, 1} uniformly at random. She also chooses Bj to be one of the
bases ↕ or ↔ uniformly at random, independent of bj . She prepares a qubit in a state
qj encoding the bit bj in the basis Bj (i.e., qj is either ↑ or → for bj = 0, and either ↓
or ← for bj = 1), and sends the qubit qj to Bob across the quantum channel.
2. Bob receives the qubit sent from Alice, chooses a basis Cj from {↕, ↔} uniformly at
random, and measures the qubit projectively using Cj , obtaining a bit value cj according
to the same encoding scheme described above.
This ends the quantum part of the protocol. All further communication between Alice and
Bob is classical and uses the public, classical channel.
Discarding uncorrelated bits. Alice and Bob announce over the classical channel which bases Bj
and Cj they used at each iteration, let C := {j : Bj = Cj }, and discard the bits bj and cj for
every j ∉ C. Note that if the quantum channel faithfully transmits all of Alice's
qubits to Bob unaltered, then bj = cj with certainty whenever Alice's basis was the same as
Bob's, i.e., whenever Bj = Cj ; otherwise bj and cj are completely uncorrelated (because ↕
and ↔ are mutually unbiased). The set C is expected to have size about k := n/2.
Security check. 1. Alice chooses a subset S ⊆ C uniformly at random (S stands for “security
check”). For example, she decides to put j into S with probability 1/2 independently
for each j ∈ C. The set S is expected to have size about k/2.
2. Alice sends S to Bob along with the value of bj for each j ∈ S.
3. Bob checks whether bj = cj for every j ∈ S. If so, he tells Alice that they can accept
the protocol, in which case, Alice and Bob respectively discard the bits bj and cj where
j ∈ S and retain the rest of the bits bj and cj for j ∈ C − S (about k/2 or about n/4 many
bits). On the other hand, if there are any discrepancies, then Bob tells Alice that they
should reject the protocol, in which case, all bits are discarded and they start over with
an entirely new run of the protocol.
Note that if the quantum channel is not tampered with, then Alice and Bob will accept the protocol.
Also notice that any third party monitoring the classical communication between Alice and Bob
knows nothing of the bits that Alice and Bob eventually retain. We’ll explain why there is a good
chance that Eve will be caught and the protocol rejected if she tries to eavesdrop on the quantum
channel during the initial qubit communication.
For technical simplicity, we will assume that there is only one way that Eve can eavesdrop on
the quantum channel: she can choose to measure some qubit in either of the bases ↕ or ↔, then
send along to Bob some qubit that she prepares based on her measurement. This is not a general
proof of security then, because Eve could do other things: measure a qubit in some arbitrary basis,
or even couple the qubit to another quantum system, let the combined system evolve, make a
measurement in the combined system, then send along some qubit to Bob based on that. She could
even make correlated measurements involving several of the sent qubits together. It takes some
work to show that Eve's chances of being caught are not significantly reduced by these more
general attacks, and we won't give the more general proof here.
If Eve happens to measure a qubit qj in the same basis Bj that Alice used, then this is very
good
for Eve: She knows the encoded bit with certainty, and the post-measurement state is still
qj , i.e., Eve did not alter it. So she can simply retransmit the post-measurement qubit to Bob. In
this case, if j ∈ S, then this qubit won’t provide any evidence of tampering; if j ∈ C − S, then Eve
knows one of the “secret” bits that Alice and Bob share, assuming they accept the protocol.
With probability 1/2, however, Eve measures qj in the wrong basis Bj′, the one other than Bj .
In this case, she gets a bit value uncorrelated with bj , but even worse (for Eve), her measurement
alters the qubit so as to lose any information about bj . She has to send a qubit to Bob, and at this
point she cannot tell that she has chosen the wrong basis, so the best she can do is what she did
before: resend the post-measurement qubit to Bob. If j ∈ C, then Bob will measure Eve's altered
qubit rj using Bj , and since rj is in the basis Bj′, which is mutually unbiased with Bj , Bob's
result cj will be completely random and uncorrelated with Alice's bj . If j ∈ S and bj ≠ cj , then
Eve is caught and the protocol is rejected.
To summarize, for each qubit qj that Eve decides to eavesdrop on, Eve will get caught
measuring the qubit if and only if
• she chooses the wrong basis (the one other than Bj ), and
• j ∈ C (i.e., Bob uses Bj to do his measurement and the bit is not discarded as uncorrelated),
and
• Bob measures a value cj ≠ bj , and
• j ∈ S (i.e., this is one of the bits Alice and Bob use for the security check).
Each of these four things happens with probability 1/2, conditioned on the event that the things
above it all happened. This makes the chance of Eve being caught on account of this qubit
(1/2)^4 = 1/16. If Eve decides to eavesdrop on qubits qj₁ , . . . , qjℓ for 1 ≤ j₁ < · · · < jℓ ≤ n, then
each of these gives her a 1/16 chance of being caught, independently of the others. The probability
of her not being caught is then

(1 − 1/16)^ℓ < e^(−ℓ/16),

which decreases exponentially in ℓ and is less than 1/e for ℓ > 16. So Eve cannot eavesdrop on
more than 16 qubits without a high probability of being caught. If n ≫ 16, this is a negligible
fraction of the roughly n/4 bits retained by Alice and Bob if they accept the protocol.
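The 1/16-per-qubit analysis can be checked by Monte-Carlo simulation. The sketch below models the simplified intercept-resend attack exactly as above (measuring in the wrong basis is modeled as yielding a uniformly random bit); ℓ = 16 intercepted qubits and the trial count are arbitrary choices:

```python
import random

# Monte-Carlo check of the intercept-resend analysis. Bases are 0 (up/down)
# and 1 (left/right).
def bb84_trial(rng, ell):
    for _ in range(ell):                          # one intercepted qubit each
        b = rng.randrange(2)                      # Alice's bit
        B = rng.randrange(2)                      # Alice's basis
        E = rng.randrange(2)                      # Eve's basis
        e = b if E == B else rng.randrange(2)     # Eve's result; resent as (E, e)
        C = rng.randrange(2)                      # Bob's basis
        c = e if C == E else rng.randrange(2)     # Bob's result
        if C == B and rng.randrange(2) == 0:      # j lands in C, then in S
            if c != b:
                return True                       # discrepancy: Eve is caught
    return False

rng = random.Random(1)
trials = 20000
caught = sum(bb84_trial(rng, 16) for _ in range(trials)) / trials
print(caught)     # close to 1 - (15/16)**16, about 0.64
```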
Exercise 22.3 Suppose that instead of the security check given above, Alice and Bob decide to do
the following alternate security check:

1. Alice and Bob each compute the parities b = ⊕_{j∈C} bj and c = ⊕_{j∈C} cj of their current
respective bits.

2. Alice sends b to Bob over the public channel.

3. If b ≠ c, then Alice and Bob reject the protocol and start over. Otherwise, they agree on some
j0 ∈ C (it doesn't matter which), discard bj0 and cj0 , and retain the rest of the bits bj and cj
for j ∈ C − {j0 } as their shared secret, accepting the protocol. [If they didn't discard one of
the bits, then someone monitoring the public channel would know the parity of Alice's and
Bob's shared bits. Discarding a bit removes this information.]
How many bits on average do Alice and Bob retain in this altered protocol, assuming they accept
it? What are Eve's chances of being caught if she eavesdrops on ℓ of the qubits, where ℓ > 0?
In practice, polarized photons are used as qubits for the quantum communication phase. Alice
may generate these photons by a light-emitting diode (LED) and can send them to Bob through
fiber optic cable. Photon polarization has a two-dimensional state space and so can serve as a
qubit. The three standard mutually unbiased bases for photon polarization are the rectilinear basis
(horizontal and vertical polarization), the diagonal basis (polarization at 45° and at 135°), and the
circular basis (left-circular and right-circular polarization).
Photons have the advantage that their polarization is easy to measure and is insensitive to certain
common sources of noise, e.g., stray electric and magnetic fields.
One technical problem is making sure that only one photon is sent at a time. Alice sends a
pulse of electric current through the LED, which emits light in a burst of coherent photons with
intensity (expected number of photons) proportional to the strength of the current. If more than
one photon is sent at a time (in identical quantum states), then Eve could conceivably catch one of
the photons and measure it, letting the other photon(s) go through to Bob as if nothing had been
tampered with. To reduce the probability of a multiphoton burst, the current Alice sends through
the LED must be exceedingly weak: about one tenth the energy of a single photon, say. Then the
expected number of photons sent each time is about 1/10. This means that about nine times out
of ten, no photons are emitted at all. If λ > 0 is the ratio of the energy of the current pulse to the
energy of a single photon (in this example, λ = 0.1), then the number of photons emitted in any
given burst satisfies a Poisson distribution with mean λ(?); that is, the probability that k photons
are emitted is

f(k, λ) = e^(−λ) λ^k / k! ,

where k is any nonnegative integer. If λ = 0.1, then f(0, λ) = e^(−λ) ≈ 0.9, which is the probability that
the LED emits no photons. The probability of getting a single photon is f(1, λ) = e^(−λ) λ ≈ 0.09.
The probability of two emitted photons is f(2, λ) = e^(−λ) λ²/2 ≈ 0.005, or about one twentieth the
probability of a single photon. More photons occur with rapidly diminishing probability. So if we
ignore the times when no photons are emitted (Bob tells Alice that he did not receive a photon),
the chance of multiple photons is small, about 1/20. The smaller λ is, the smaller this probability
will be, but the trade-off is that we have to wait longer for a single photon.
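The Poisson numbers above are easy to reproduce:

```python
import math

# Poisson photon statistics for a weak pulse with mean lambda = 0.1,
# as in the discussion above.
def f(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 0.1
p0, p1, p2 = f(0, lam), f(1, lam), f(2, lam)
print(p0)           # about 0.905: usually no photon at all
print(p1)           # about 0.090: exactly one photon
print(p2 / p1)      # 0.05: two photons are 1/20 as likely as one
```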
Of course, the quantum channel could also be subject to random, nonmalicious noise, which
would cause discrepancies between Alice’s and Bob’s bits. One subtlety is to make the protocol
tolerate a certain amount of noise but still detect malicious tampering with high probability.
We now start our discussion of quantum information. One of the major uses of quantum infor-
mation theory is to analyze how noise can disrupt a quantum computation and how to make the
computation resistant to it. The textbook discusses quantum information in earnest in Chapters 8–
12, with quantum error correction in Chapter 10 and quantum information theory in Chapter 12.
Quantum information is one of the textbook’s real strong points, and I will assume you will read
starting with Chapter 8. The lectures will fill in some background and reiterate points in the text.
In the next few topics, we will be using the density operator formalism almost exclusively. We
really have no choice about this once we generalize our notion of “state” to include mixed states.
Norms of Operators. Recall the definition of the Hilbert-Schmidt inner product on L(H) (Equation (11)). This suggests another way to define the norm of an operator:

kAk₂ := √(hA, Ai) = (tr(A∗A))^(1/2) .

This norm, known as the Euclidean norm, the L2-norm, or the Hilbert-Schmidt norm, satisfies all
ten of the properties satisfied by the operator norm of Definition 18.4 except property 4; in fact,
√
kIk2 = n, where n is the dimension of H. The Euclidean norm is one of a parameterized family
of norms defined on operators. For any real p ≥ 1, define the Lp-norm (also called the Schatten
p-norm) of an operator A to be

kAkp := (tr(|A|^p))^(1/p) = ( Σ_{j=1}^n sj^p )^(1/p) , (83)
where s1 , . . . , sn ≥ 0 are the eigenvalues of |A|, which are called the singular values of A. (If p is
not an integer, then technically, we have not yet defined |A|^p, because |A| is an operator. For now,
you can ignore the middle expression in the equation above and use the right-hand side for the
definition of kAkp .) For p = 2, we get the Euclidean norm. The L1 norm kAk1 = tr |A| is also called
the trace norm and is often useful. In addition, we could define the L∞ norm

kAk∞ := lim_{p→∞} kAkp = max_j sj ,

but this is precisely the operator norm kAk of Definition 18.4, because as p gets large, the largest
term in the sum in (83) starts to dominate.
Exercise 23.1 Show that if A is an operator on an n-dimensional space, and 1 ≤ p ≤ q are real
numbers, then kAkp ≥ kAkq . Also show that nkAk ≥ kAk₁ . Thus all these norms are within a
factor of n of each other. What is kIkp ? [Hint: For the first part, fix s1 , . . . , sn ≥ 0 and differentiate
the expression ( Σ_{j=1}^n sj^p )^(1/p) with respect to p, and show that the derivative is always negative or
zero.]
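A quick numerical check of the norm ordering claimed in the exercise, using NumPy (assumed available) on an arbitrary random matrix:

```python
import numpy as np

# Numerical check of the Schatten norms on an arbitrary random matrix.
# The singular values are the eigenvalues of |A|.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
s = np.linalg.svd(A, compute_uv=False)      # singular values s_1, ..., s_n

def schatten(p):
    return np.sum(s ** p) ** (1 / p)

norms = [schatten(p) for p in (1, 1.5, 2, 4, 16)]
# ||A||_p is nonincreasing in p, as the exercise claims:
assert all(a >= b - 1e-12 for a, b in zip(norms, norms[1:]))
assert abs(schatten(2) - np.linalg.norm(A)) < 1e-9   # Euclidean (Frobenius) norm
assert schatten(1) <= len(s) * s.max()               # trace norm <= n * ||A||
```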
POVMs. Let S be a physical system with state space HS . Often, we want to get some classical
information about the current state of S. We can perform a projective measurement on HS ,
obtaining various possible outcomes with various probabilities. This is not the only way to get
information about the state of S, however. We could instead couple the system S with another
system T in some known state in the state space HT , let the combined system ST evolve for a
while, then make a projective measurement of the combined system, i.e., on the space HS ⊗ HT .
This approach is more general and can get information that cannot be obtained by a projective
measurement on HS itself.
Recall that mathematically, a projective measurement on a Hilbert space H corresponds to a
complete set of orthogonal projectors {Pj : j ∈ I}, where I is the set of possible outcomes. We’ll
now relax this restriction a bit.
Definition 23.2 A positive operator-valued measure (POVM) on H is a collection {Mj : j ∈ I} of
positive operators on H, indexed by a set I of possible outcomes, such that Σ_{j∈I} Mj = I. Measuring
a state ρ with this POVM yields outcome j with probability Pr[j] := hMj , ρi = tr(Mj ρ).

Each Pr[j] is a nonnegative real, because Mj and ρ are both positive, and

Σ_{j∈I} Pr[j] = Σ_j hMj , ρi = hΣ_j Mj , ρi = hI, ρi = tr ρ = 1 .
So the Pr[j] do form a probability distribution. Note that the only properties of
ρ that we used here are that ρ ≥ 0 and that tr ρ = 1. This is important, because we are about
to expand our definition of "state" to include mixed states, which may no longer be projection
operators, but are still positive and have unit trace.
Notice that for a POVM, we don’t specify the post-measurement state. This is OK—quite often,
we don’t care what the post-measurement state is; we only care about the outcomes and their
statistics, and a POVM provides the most general means of measuring a quantum system if we
don’t care about the state after the measurement. We’ll show in a bit that a POVM is equivalent
to what we described above: coupling the system to another system, letting the combined system
evolve, then making a projective measurement on the combined system.
A projective measurement on H is just a special case of a POVM where the Mj form a complete
set of orthogonal projectors. To see this, we refer back to Equation (15): Pr[k] = hψ|Pk |ψi, where
Pr[k] is the probability of outcome k when measuring the system in state |ψi, and Pk is the
corresponding projector. Letting ρ := |ψihψ| and treating the scalar hψ|Pk |ψi as a 1 × 1 matrix, we
get
Pr[k] = hψ|Pk |ψi = tr hψ|Pk |ψi = tr (Pk |ψihψ|) = tr(Pk ρ) = hPk , ρi ,
which accords with Definition 23.2.
Mixed States.
Definition 23.4 Let A1 , . . . , Ak be scalars, vectors, operators, matrices, etc., all of the same type. A
convex linear combination of A1 , . . . , Ak is a value of the form

Σ_{i=1}^k pi Ai ,

where the pi are real scalars, each pi ≥ 0, and Σ_{i=1}^k pi = 1. In other words, p1 , . . . , pk form a
probability distribution.
Suppose Alice has a lab where she can prepare several states ρ1 = |ψ1 ihψ1 |, . . . , ρk = |ψk ihψk | ∈
L(H), and she flips coins and decides to prepare a state σ chosen at random from this set, where
each ρi is chosen with probability pi . She then sends the state σ she prepared to Bob, without
telling him what it is. What can Bob find out about the state σ that Alice sent him? He can, most
generally, perform a measurement corresponding to some POVM {Mj : j ∈ I}. The probability of
obtaining any outcome j, taken over both the POVM and Alice's random choice, is then

Pr[j] = Σ_{i=1}^k Pr[j | σ = ρi ] · Pr[σ = ρi ] = Σ_i hMj , ρi i pi = hMj , ρi ,
where ρ = Σ_{i=1}^k pi ρi is a convex combination of the ρi with the associated probabilities. So
all Bob can ever determine physically about Alice’s σ is given by the single operator ρ, which
is called a mixed state. By definition, a mixed state is any nontrivial convex linear combination
of one-dimensional projectors. By “nontrivial” we mean that all probabilities are strictly less
than 1, or equivalently, there are at least two probabilities that are nonzero. Mathematically, a
mixed state behaves in many ways much like a state of the form |ψihψ| for some unit vector |ψi
(i.e., a one-dimensional projector). It represents the state of a quantum system about which we
have incomplete information, or which we are not describing completely. Completely described
states, which up until now we have been dealing with exclusively, are of the form |ψihψ| for unit
vectors |ψi. From now on we will call these latter states pure states, and when we use the word
“state” unqualified, we will mean either a pure or mixed state. Both kinds of states are convex
combinations of pure states, trivial or otherwise. A mixed state is then some nontrivial probabilistic
mixture, or weighted average, of pure states.
Why are mixed states important? When we consider a quantum system that is not isolated
from its environment (which we must do when we consider quantum errors and decoherence),
then some information about the state of the system “bleeds” out into the environment, leaving
the system in a partially unknown state—even if the system started out in a pure state. We model
an incompletely known quantum state as a mixed state.
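To make the bookkeeping concrete, here is a minimal numpy sketch (an illustration, not part of the notes) that builds ρ for a small hypothetical ensemble and checks the two defining properties verified in the next paragraph (positivity and unit trace):

```python
import numpy as np

# Hypothetical ensemble (an assumption for illustration): Alice prepares |0> or
# |+> with probability 1/2 each.  The mixed state is rho = sum_i p_i |psi_i><psi_i|.
ket0 = np.array([1, 0], dtype=complex)
ketp = np.array([1, 1], dtype=complex) / np.sqrt(2)  # |+>

probs = [0.5, 0.5]
kets = [ket0, ketp]
rho = sum(p * np.outer(k, k.conj()) for p, k in zip(probs, kets))

# All Bob can ever learn is encoded in rho: for any POVM element M_j,
# Pr[j] = <M_j, rho> = tr(M_j rho).
assert np.isclose(np.trace(rho).real, 1.0)        # unit trace
assert np.all(np.linalg.eigvalsh(rho) >= -1e-12)  # positive semidefinite
```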
Let's verify that if ρ is any state (say, ρ = Σ_{i=1}^k pi ρi, where the pi form a probability distribution and the ρi are all pure states), we have ρ ≥ 0 and tr ρ = 1. For positivity, let v be any vector. Then

v*ρv = Σ_{i=1}^k pi v*ρi v ≥ 0,

because all the ρi are positive operators. Thus ρ ≥ 0. By linearity of the trace, we have

tr ρ = Σ_{i=1}^k pi tr ρi = Σ_i pi = 1,

because all the ρi have unit trace. The next proposition says that the converse of this is also true.
Proposition 23.5 If ρ ∈ L(H) is such that ρ > 0 and tr ρ = 1, then ρ is a convex linear combination of
one-dimensional projectors that project onto mutually orthogonal subspaces.
Proof. Suppose ρ ≥ 0 and tr ρ = 1. Since ρ is normal, it has an eigenbasis {|ψ1⟩, . . . , |ψn⟩}. With respect to this eigenbasis, ρ is represented by the matrix diag(p1, . . . , pn) for some p1, . . . , pn, and so ρ = Σ_{i=1}^n pi |ψi⟩⟨ψi|. Since ρ ≥ 0, all the pi are nonnegative real, and further 1 = tr ρ = Σ_i pi. So ρ is a convex combination of |ψ1⟩⟨ψ1|, . . . , |ψn⟩⟨ψn|, which project onto mutually orthogonal, one-dimensional subspaces. □
Thus we get the following two corollaries:
Corollary 23.6 An operator ρ ∈ L(H) is a state (i.e., a convex combination of one-dimensional projectors)
if and only if ρ is positive and has unit trace.
Corollary 23.7 An operator ρ ∈ L(H) is a state if and only if ρ is normal and its eigenvalues form a
probability distribution.
Measuring a mixed state with a POVM has exactly the same mathematical form as with a pure
state. Recall that the only two properties of the state ρ we used to show that the measurement
makes sense is that ρ > 0 and tr ρ = 1, both of which are true of any mixed state. Similarly,
unitary time evolution of a mixed state has exactly the same mathematical form as with a pure state. Indeed, if ρ = Σ_i pi ρi is some mixture of pure states, then evolving ρ via a unitary operator U should be equivalent to evolving each ρi by U and taking the same mixture of the results. By linearity, this gives

ρ = Σ_i pi ρi ↦ Σ_i pi (U ρi U*) = U (Σ_i pi ρi) U* = U ρ U*.
Finally, we won’t bother proving it, but Equations (29) and (30), which describe projective mea-
surements, are equally valid for mixed states ρ as well as for pure states.
Different probability distributions of pure states can yield the same state, but if they do, they
are physically indistinguishable, that is, no physical experiment can tell one distribution from the
other with positive probability. However, for any state ρ, there is a preferred mix of pure states that
yields ρ, namely, the “eigenstates” |ψ1 ihψ1 |, . . . , |ψn ihψn | used in the proof of Proposition 23.5,
with their respective eigenvalues as probabilities. The states are distinguished by the fact that
they are pairwise orthogonal. We will call this preferred probability distribution the eigenvalue
distribution of ρ.
It's time for an example. Alice may send Bob a single qubit in state |0⟩ with probability 1/2 and state |+⟩ = (|0⟩ + |1⟩)/√2 with probability 1/2. The resulting mixed state is

ρ = (1/2)|0⟩⟨0| + (1/2)|+⟩⟨+| = (1/4) [ 3 1 ; 1 1 ].

Let's find the eigenvalue distribution of ρ. One can easily check that an eigenbasis of this ρ consists of the states

|ψ1⟩ = (1/√(4 − 2√2)) (1, √2 − 1)^T with eigenvalue p1 = (2 + √2)/4,
|ψ2⟩ = (1/√(4 − 2√2)) (√2 − 1, −1)^T with eigenvalue p2 = (2 − √2)/4.

Thus ρ = p1|ψ1⟩⟨ψ1| + p2|ψ2⟩⟨ψ2|. So if Carol prepares |ψ1⟩⟨ψ1| with probability p1 = (2 + √2)/4 and |ψ2⟩⟨ψ2| with probability p2 = (2 − √2)/4, and ships her state to Bob, then Bob (who doesn't see who the sender is) can't tell with any advantage over guessing who sent him the state.
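One can confirm this eigenvalue distribution numerically; the following numpy sketch (an illustration, not part of the notes) diagonalizes ρ = (|0⟩⟨0| + |+⟩⟨+|)/2 and checks the eigenvalues against the closed forms above:

```python
import numpy as np

# rho = (|0><0| + |+><+|)/2 in the computational basis
rho = np.array([[0.75, 0.25],
                [0.25, 0.25]])

evals, evecs = np.linalg.eigh(rho)   # eigenvalues in ascending order
p2, p1 = evals                       # p2 = (2 - sqrt 2)/4, p1 = (2 + sqrt 2)/4
assert np.isclose(p1, (2 + np.sqrt(2)) / 4)
assert np.isclose(p2, (2 - np.sqrt(2)) / 4)

# Reassemble rho from its eigenvalue distribution: rho = p1 P1 + p2 P2
P1 = np.outer(evecs[:, 1], evecs[:, 1])
P2 = np.outer(evecs[:, 0], evecs[:, 0])
assert np.allclose(rho, p1 * P1 + p2 * P2)
```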
Exercise 23.8 Do a similar analysis as that above, this time assuming Alice sends (4|0i+3|1i)(4h0|+
3h1|)/25 with probability 1/2 and (4|0i − 3|1i)(4h0| − 3h1|)/25 with probability 1/2.
Exercise 23.9 Prove that any convex combination of states (pure or mixed) is a state.
One-Qubit States and the Bloch Sphere. Recall that we have a nice geometrical representation
of one-qubit pure states: for each one-qubit pure state ρ there correspond unique x, y, z ∈ R such
that x2 + y2 + z2 = 1 and ρ = (I + xX + yY + zZ)/2, and conversely, for any point (x, y, z) on the
unit sphere in R3 (Bloch sphere), the operator (I + xX + yY + zZ)/2 is a one-qubit pure state.
Can we characterize general one-qubit states in a similarly geometrical way? Yes. Let ρ = Σ_{i=1}^k pi ρi be any one-qubit state, where the ρi are one-qubit pure states and the pi form a probability distribution as usual. For 1 ≤ i ≤ k, let (xi, yi, zi) be the point on the Bloch sphere such that ρi = (I + xi X + yi Y + zi Z)/2. Then by linearity we have

ρ = Σ_{i=1}^k pi ρi = Σ_i pi (I + xi X + yi Y + zi Z)/2 = (I + xX + yY + zZ)/2,

where (x, y, z) := Σ_{i=1}^k pi (xi, yi, zi) ∈ R³. That is, ρ corresponds geometrically to the point (x, y, z) ∈ R³ that is the convex combination of all the points (xi, yi, zi), weighted by the same probabilities pi used to weight ρ in terms of the ρi. We note that

√(x² + y² + z²) = ‖(x, y, z)‖ ≤ Σ_{i=1}^k pi ‖(xi, yi, zi)‖ = Σ_i pi = 1,
and the inequality is strict iff there are at least two distinct points (xi , yi , zi ) on the sphere with
pi > 0. This means that the point (x, y, z) is somewhere on or inside the Bloch sphere. The surface
points of the Bloch sphere correspond to the pure states, and the points in the interior correspond
to mixed states. A one-qubit unitary U rotates a mixed state ρ in the interior just as it does points
on the surface of the sphere (it rotates all of R3 , in fact).
We can get some important facts about ρ based on its geometry. For example, if ρ = (I + xX + yY + zZ)/2, then let r = ‖(x, y, z)‖ = (x² + y² + z²)^{1/2} ≤ 1 be the distance from (x, y, z) to
the origin. Then the eigenvalues of ρ are (1 ± r)/2, and if r > 0, the corresponding eigenvectors
are the states corresponding to the antipodal points ±(x, y, z)/r on the surface of the sphere.
((I + (x/r)X + (y/r)Y + (z/r)Z)/2 has eigenvalue (1 + r)/2, while (I − (x/r)X − (y/r)Y − (z/r)Z)/2
has eigenvalue (1 − r)/2, which are the two probabilities in the eigenvalue distribution of ρ.) These
are the points where the line through (0, 0, 0) and (x, y, z) intersects the surface of the sphere.
(If r = 0, then (x, y, z) is the origin, ρ = I/2, and every nonzero vector is an eigenvector with
eigenvalue 1/2.)
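These geometric facts are easy to check numerically. Here is a small sketch (illustration only; the interior point (0.5, 0, 0.25) is an arbitrary choice) converting between one-qubit density matrices and Bloch-ball coordinates:

```python
import numpy as np

# Pauli matrices and the identity
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def bloch_point(rho):
    # x = tr(X rho), etc., since tr(X^2) = 2 and the Paulis are traceless
    return np.real([np.trace(X @ rho), np.trace(Y @ rho), np.trace(Z @ rho)])

def from_bloch(x, y, z):
    return (I2 + x * X + y * Y + z * Z) / 2

rho = from_bloch(0.5, 0.0, 0.25)   # an interior (mixed) point
x, y, z = bloch_point(rho)
r = np.sqrt(x**2 + y**2 + z**2)

# Eigenvalues are (1 - r)/2 and (1 + r)/2, as claimed above.
assert np.allclose(np.linalg.eigvalsh(rho), [(1 - r) / 2, (1 + r) / 2])
```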
Exercise 23.10 Prove all the assertions in the paragraph above. [Hint: You could certainly compute the eigenvectors and eigenvalues of ρ by brute force if you had to. Alternatively, you might note that if you let ρ1 = |ψ1⟩⟨ψ1| = (I + (x/r)X + (y/r)Y + (z/r)Z)/2 and ρ2 = |ψ2⟩⟨ψ2| = (I − (x/r)X − (y/r)Y − (z/r)Z)/2, then ⟨ψ1|ψ2⟩ = 0 because the corresponding points are antipodal, and further, ρ is a convex combination of ρ1 and ρ2. What are the coefficients of this combination in terms of r? What does the matrix of ρ look like in the {|ψ1⟩, |ψ2⟩} basis?]
24 Week 12: Quantum channels (quantum operations)
The Partial Trace. We sometimes have a system T that we are interested in coupling with another
system S that we are not interested in, producing an entangled state in the combined system ST .
This happens, for example, when a quantum computation (system T ), rather than proceeding in
perfect isolation, gets corrupted by an unintended interaction with its environment (system S),
e.g., a cosmic ray hitting the quantum computing device. Since we only care about system T , does
it make sense to ask, “what is the state of T ?” even though it is entangled with S? The partial trace
operator lets us do just that.
Let HS and HT be Hilbert spaces. There is a unique linear map trS : L(HS ⊗ HT) → L(HT) such that for every A ∈ L(HS) and B ∈ L(HT),

trS(A ⊗ B) = (tr A)B.    (84)

The map trS is an example of a partial trace. When we apply trS, we often say that we are tracing out the system S. There can be only one linear map satisfying (84), because L(HS ⊗ HT) = L(HS) ⊗ L(HT) is spanned by tensor products of operators. Suppose HS has dimension m and HT has dimension n. Suppose some operator C ∈ L(HS ⊗ HT) is written in block matrix form with respect to some product basis:

C = [ B11 B12 · · · B1m ; B21 B22 · · · B2m ; · · · ; Bm1 Bm2 · · · Bmm ],

where each block Bij is an n × n matrix. Then we can also write C uniquely as

C = Σ_{i,j=1}^m Eij ⊗ Bij,

where Eij is the m × m matrix whose (i, j)th entry is 1 and all other entries 0. The partial trace of C is then given in matrix form as

trS(C) = Σ_{i,j} (tr Eij) Bij = Σ_{i=1}^m Bii,

which is an n × n matrix.
The partial trace operator extends in a similar way to combinations of several systems at once.
Intuitively, tracing out a system is a bit like averaging over the system. In tensor algebra, the
partial trace operators and the (total) trace operator are called contractions.
If system ST is in some separable (i.e., tensor product) state ρ = ρS ⊗ ρT ∈ L(HS ⊗ HT ), where
ρS ∈ L(HS ) and ρT ∈ L(HT ) are states in S and T , respectively, then trS ρ = (tr ρS )ρT = ρT and
trT ρ = (tr ρT )ρS = ρS . So we can say unequivocally that the system S is in state trT ρ and the
system T is in state trS ρ. If ρ is entangled, then we can still say that system S is in state trT ρ and T
is in trS ρ, but now these two states (called reduced states) are mixed states, even if ρ itself is a pure
state. Thus by tracing out one or the other system, we will lose some information about the state
of the remaining system if the original combined state was entangled.
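In matrix terms, the partial trace is just a sum of diagonal blocks, which makes it a one-liner with numpy reshaping. A sketch (assuming, as in the block-matrix discussion above, that the S index is the leftmost tensor factor, with dim(HS) = m and dim(HT) = n):

```python
import numpy as np

def trace_out_S(C, m, n):
    # sum of the n x n diagonal blocks B_ii (blocks indexed by S)
    return C.reshape(m, n, m, n).trace(axis1=0, axis2=2)

def trace_out_T(C, m, n):
    # trace over the T index instead
    return C.reshape(m, n, m, n).trace(axis1=1, axis2=3)

m, n = 2, 3
rhoS = np.diag([0.25, 0.75])
rhoT = np.diag([0.5, 0.3, 0.2])
C = np.kron(rhoS, rhoT)            # separable state rho_S ⊗ rho_T

assert np.allclose(trace_out_S(C, m, n), rhoT)   # tr_S(rho_S ⊗ rho_T) = rho_T
assert np.allclose(trace_out_T(C, m, n), rhoS)   # tr_T(rho_S ⊗ rho_T) = rho_S
```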
Open Systems and Quantum Channels. A closed quantum system is one that does not interact
with the outside world. Closed systems evolve unitarily. An open quantum system does couple
with one or more other systems (collectively called the environment) that we wish to ignore. By
considering open systems, we will obtain a powerful formalism for describing what can happen
to a quantum system that may interact with its environment. This formalism, the formalism
of quantum channels (sometimes called quantum operations) is general enough to encompass
both unitary evolution and measurements. A quantum channel is a certain linear map that maps
states in one Hilbert space to states in another. All physical processes, including unitary evolu-
tion, measurements, etc., or any combination of these, are modeled mathematically as quantum
channels.
Definition 24.1 For Hilbert spaces H and J, we let T(H, J) denote the (Hilbert) space L(L(H), L(J))
of all linear maps from L(H) into L(J). We write T(H) to mean T(H, H). A map Φ ∈ T(H, J) is
sometimes called a superoperator.
Since a quantum state is an operator over a Hilbert space, all quantum channels are superop-
erators, mapping states of one Hilbert space H to states of another (or the same) Hilbert space J.
Not all superoperators are quantum channels, however. As we will optionally see below, there
are some simple conditions on a superoperator Φ that make it a quantum channel. One such
condition is that Φ must be trace-preserving, which is necessary so that Φ maps states to states,
all of which have unit trace. The other condition is that Φ be completely positive, a concept we will
discuss later. At first, we will only consider the case where H = J, i.e., where the channel maps
states to states in the same system, but later we will see how to generalize to arbitrary H and J.
Throughout this section, we will avoid Dirac notation, as it tends to get in the way.
There are a number of different, equivalent ways of representing a quantum channel. We
consider two here: the coupled-systems representation (aka the Stinespring representation), where we
include the environment then trace it out, and the operator-sum representation (aka the representation
by Kraus operators), where we simply apply operators to states in the system without mentioning
the environment. We’ll show that these two views are equivalent. The coupled-systems view is
more physically intuitive, while the operator-sum view is more mathematically convenient.
We now formally describe a quantum channel E on some system S according to the coupled-
systems view. We imagine S (state space HS of dimension n) in some state ρ. We now consider
another system T (whose state space HT has dimension N), in some known or prepared pure state σ, initially isolated from system S. The combined state of TS is then σ ⊗ ρ initially.^21 We now couple T and S together and let the combined system TS evolve according to some unitary operator U ∈ L(HT ⊗ HS), resulting in the state U(σ ⊗ ρ)U*. We now "forget" the system T by tracing it out, obtaining the final state

E(ρ) := trT(U(σ ⊗ ρ)U*).    (85)
Because all the components making up the definition of E in (85) are linear maps, E itself is linear,
mapping L(HS ) into L(HS ), and thus E ∈ T(HS ) is a superoperator. E depends implicitly on the
system T , its initial state σ, and U.
At first blush, the operator-sum formulation of the quantum channel E looks completely different. We pick some finite collection of operators K1, . . . , KN ∈ L(HS) (for some N ≥ 1) that are completely arbitrary except that we must have

Σ_{j=1}^N K*j Kj = I.    (87)

For any X ∈ L(HS), we then define

X′ = E(X) := Σ_{j=1}^N Kj X K*j ∈ L(HS).    (88)
Defined this way, the map E is evidently linear from L(HS) into itself, and so E ∈ T(HS) is
a superoperator, and it depends implicitly on the choice of K1 , . . . , KN , which are called Kraus
operators. We’ll show in a minute that the two definitions of E just described are equivalent.
Exercise 24.2 Verify that if ρ is a state (i.e., ρ > 0 and tr ρ = 1), then the operator ρ 0 = E(ρ) defined
by (88) is also a state.
The next exercise shows that quantum channels include unitary evolution.
Exercise 24.3 Show that unitary evolution of the system S through a unitary operator U ∈ L(HS )
is a legitimate quantum channel. Argue with respect to both views of quantum channels.
^21 The textbook puts the auxiliary system T on the right, whereas we put it on the left. The two ways are equivalent, but ours will be more consistent with a block matrix representation we'll use later when we prove equivalence of the two views.
For another example, suppose we make a projective measurement on the system S in state
ρ—using some complete set {P1 , . . . , Pk } of orthogonal projectors in L(HS )—but we don’t bother
to look at what the outcome of the measurement is. Then for all we know, the post-measurement
state of S will be a mixture of the post-measurement states corresponding to all the possible
outcomes, weighted by their probabilities. That is, using Equation (30), the state of S after this
"information-free" measurement should be^22

ρ′ = Σ_{j=1}^k Pr[j] · (Pj ρ P*j / Pr[j]) = Σ_j Pj ρ P*j.

This looks like the operator-sum representation of a quantum channel (Equation (88)), and indeed we have

I = Σ_{j=1}^k Pj = Σ_j Pj² = Σ_j P*j Pj,

because the Pj form a complete set of projectors. Thus P1, . . . , Pk satisfy Equation (87) to be Kraus operators, and this information-free measurement is a quantum channel.
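A quick numerical check of this example (a sketch, using the one-qubit computational-basis projectors as the hypothetical complete set of projectors):

```python
import numpy as np

# Complete set of orthogonal projectors on one qubit
P = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

# The projectors satisfy the completeness condition (Eq. (87))
assert np.allclose(sum(Pj.conj().T @ Pj for Pj in P), np.eye(2))

def channel(rho, kraus):
    # operator-sum action, Eq. (88)
    return sum(K @ rho @ K.conj().T for K in kraus)

plus = np.full((2, 2), 0.5)        # |+><+|
rho_out = channel(plus, P)         # measuring without looking kills coherences
assert np.allclose(rho_out, np.eye(2) / 2)
```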
Equivalence of the Coupled-Systems and Operator-Sum Representations. First we’ll show that
every quantum channel defined by the coupled system definition has an operator sum represen-
tation. Suppose that E ∈ T(HS ) is defined so that for all X ∈ L(HS ),
E(X) = trT (U(σ ⊗ X)U∗ ),
where T is a system with state space HT , σ ∈ L(HT ) is a pure state (1-dimensional projector), and
U ∈ L(HT ⊗ HS ) is unitary. Let n = dim(HS ) and let N = dim(HT ). We’ll pick a product basis for
HT ⊗ HS so that we can work directly with matrices. Let {e1 , . . . , eN } be an orthonormal basis for
HT and let {f1 , . . . , fn } be an orthonormal basis for HS . We can choose these bases arbitrarily, so
we’ll assume that σ = e1 e∗1 , i.e., σ projects onto the 1-dimensional subspace spanned by e1 . With
respect to the product basis {ei ⊗ fj : 1 ≤ i ≤ N, 1 ≤ j ≤ n}, the operator U can be written uniquely in block matrix form as

U = Σ_{a,b=1}^N Eab ⊗ Bab,

where each Bab is an n × n matrix, and each Eab := ea e*b is the N × N matrix whose (a, b)th entry is 1 and all the other entries are 0. Noting that Eab Ecd = ea e*b ec e*d = ⟨eb, ec⟩ ea e*d = δbc Ead and E*cd = Edc, we have
U(e1 e*1 ⊗ X)U* = Σ_{a,b,c,d=1}^N (Eab ⊗ Bab)(E11 ⊗ X)(Ecd ⊗ Bcd)*    (89)
= Σ_{a,b,c,d} Eab E11 Edc ⊗ Bab X B*cd    (90)
= Σ_{a,c} Eac ⊗ Ba1 X B*c1.    (91)

^22 A minor technical point: To be well-defined, the first sum in the next equation is really only over those j for which Pr[j] > 0. However, the second sum is over all j. The two sums are still equal, because if Pr[j] = tr(Pj ρ P*j) = 0 for some j, then Pj ρ P*j = 0 by Exercise 9.28.

Tracing out the first component of each tensor product, and using the fact that tr Eac = δac, we get

E(X) = trT(U(e1 e*1 ⊗ X)U*) = Σ_{a,c=1}^N (tr Eac) Ba1 X B*c1 = Σ_{a=1}^N Ba1 X B*a1,    (92)
which has the form of (88) if we let Ka := Ba1 . We’re done if (87) holds. Let IT and IS be the identity
operators in L(HT ) and L(HS ), respectively, and define IT S := IT ⊗ IS , which is the identity on
L(HT ⊗ HS ). Since U is unitary, we have
IT S = U*U = Σ_{a,b,c,d=1}^N (Eab ⊗ Bab)*(Ecd ⊗ Bcd)
= Σ_{a,b,c,d} Eba Ecd ⊗ B*ab Bcd
= Σ_{a,b,d} Ebd ⊗ B*ab Bad
= Σ_{b,d=1}^N Ebd ⊗ (Σ_a B*ab Bad).

We want to isolate what is in the parentheses for b = d = 1 and show that it is the identity IS. We can do this by multiplying both sides on the left and right with the Hermitean operator E11 ⊗ IS and then tracing out T. For the left-hand side, we get

trT((E11 ⊗ IS)IT S(E11 ⊗ IS)) = trT((E11 ⊗ IS)(IT ⊗ IS)(E11 ⊗ IS)) = trT(E11 ⊗ IS) = (tr E11)IS = IS.
For the right-hand side, using linearity of trT, we get

trT((E11 ⊗ IS)(Σ_{b,d=1}^N Ebd ⊗ Σ_{a=1}^N B*ab Bad)(E11 ⊗ IS))
= trT(Σ_{b,d=1}^N E11 Ebd E11 ⊗ Σ_a B*ab Bad)
= trT(Σ_{b,d=1}^N δ1b δd1 E11 ⊗ Σ_a B*ab Bad)
= Σ_{b,d=1}^N δ1b δd1 trT(E11 ⊗ Σ_a B*ab Bad)
= trT(E11 ⊗ Σ_a B*a1 Ba1)
= (tr E11) Σ_a B*a1 Ba1 = Σ_a B*a1 Ba1,

which means that (87) holds for Kraus operators B11, . . . , BN1, and we have a legitimate operator-sum representation of E.
We'll now show the other direction. Suppose we are given an operator-sum representation of E in the form of some collection K1, . . . , KN of Kraus operators such that Σ_{j=1}^N K*j Kj = IS. That is, E(X) = Σ_{a=1}^N Ka X K*a for all X ∈ L(HS). We want to find a coupled-systems representation of E. As before, we will fix some orthonormal basis {fj}, 1 ≤ j ≤ n, of HS, so that we can talk about matrices instead of operators. Define K to be the nN × n matrix

K = [ K1 ; K2 ; · · · ; KN ]

formed by stacking the Kj vertically. The condition that Σ_{j=1}^N K*j Kj = I can be written in block matrix form as

K*K = [ K*1 · · · K*N ] [ K1 ; · · · ; KN ] = IS.    (93)

Here we are multiplying an n × nN matrix on the left and an nN × n matrix on the right to get the n × n identity matrix. Consider the columns of K as nN-dimensional column vectors. Equation (93) is equivalent to saying that the columns of K form an orthonormal set. By Gram-Schmidt, we can take these column vectors as the first n vectors in an orthonormal basis for C^{nN}. We assemble these basis vectors as the columns of an nN × nN matrix U written in block form by

U = [ B11 B12 · · · B1N ; B21 B22 · · · B2N ; · · · ; BN1 BN2 · · · BNN ] = Σ_{a,b=1}^N Eab ⊗ Bab,

where each Bab is an n × n matrix, and the first n columns of U form K, i.e., Ka = Ba1 for 1 ≤ a ≤ N. The orthonormality of the columns of U is equivalent to the equation U*U = I, and so U is unitary. Now let HT be any N-dimensional Hilbert space, and fix an orthonormal basis {ei}, 1 ≤ i ≤ N, for HT. Then with respect to the product basis, U can be considered a unitary operator in L(HT ⊗ HS), and so now we follow the string of equations of (92) to see that E(X) = trT(U(e1 e*1 ⊗ X)U*) for any X ∈ L(HS). Indeed, for all X ∈ L(HS) we have

trT(U(e1 e*1 ⊗ X)U*) = Σ_{a,b,c,d=1}^N trT((Eab ⊗ Bab)(e1 e*1 ⊗ X)(Ecd ⊗ Bcd)*)
= Σ_{a,b,c,d} trT((Eab ⊗ Bab)(E11 ⊗ X)(Edc ⊗ B*cd))
= Σ_{a,b,c,d} trT(Eab E11 Edc ⊗ Bab X B*cd)
= Σ_{a,b,c,d} trT(δb1 δ1d Eac ⊗ Bab X B*cd)
= Σ_{a,c} trT(Eac ⊗ Ba1 X B*c1)
= Σ_{a,c} (tr Eac) Ba1 X B*c1
= Σ_{a,c} δac Ba1 X B*c1
= Σ_{a=1}^N Ba1 X B*a1 = Σ_{a=1}^N Ka X K*a = E(X).
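The whole equivalence can be exercised numerically. The sketch below (the Kraus pair is a hypothetical amplitude-damping-style example, not from the notes) stacks the Ka into the matrix K, completes its columns to a unitary U, and checks that tracing out T reproduces the operator sum:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Kraus pair on H_S (n = 2) satisfying sum_a K_a* K_a = I
K1 = np.array([[1.0, 0.0], [0.0, np.sqrt(0.5)]])
K2 = np.array([[0.0, np.sqrt(0.5)], [0.0, 0.0]])
kraus = [K1, K2]
n, N = 2, 2

K = np.vstack(kraus)                             # nN x n, orthonormal columns
assert np.allclose(K.conj().T @ K, np.eye(n))    # Eq. (93)

# Complete K's columns to an orthonormal basis of C^{nN} (via the null space
# of K*), so the first block-column of U is the stack of the K_a.
_, _, Vh = np.linalg.svd(K.conj().T)
U = np.hstack([K, Vh[n:].conj().T])
assert np.allclose(U.conj().T @ U, np.eye(n * N))  # U is unitary

def tr_T(C):
    # T is the left factor: sum the N diagonal n x n blocks
    return C.reshape(N, n, N, n).trace(axis1=0, axis2=2)

X = rng.normal(size=(n, n))
sigma = np.zeros((N, N)); sigma[0, 0] = 1.0        # e1 e1*
lhs = tr_T(U @ np.kron(sigma, X) @ U.conj().T)     # coupled-systems view
rhs = sum(Ka @ X @ Ka.conj().T for Ka in kraus)    # operator-sum view
assert np.allclose(lhs, rhs)
```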
A Normal Form for the Kraus Operators. The choices of Kraus operators in the operator-sum representation (88) of a quantum channel E are not unique. Neither is the form of the unitary U in the coupled-systems representation of (85). The freedom in the coupled-systems case can be seen as follows: Suppose A ∈ L(HT) and B ∈ L(HS) are any operators, and suppose V ∈ L(HT) is unitary. Then

trT((V ⊗ I)(A ⊗ B)(V ⊗ I)*) = trT(VAV* ⊗ B) = (tr(VAV*))B = (tr A)B = trT(A ⊗ B),

and so by linearity, trT((V ⊗ I)C(V ⊗ I)*) = trT(C) for every C ∈ L(HT ⊗ HS). In other words, if we eventually trace out the environment T, then it doesn't matter if we evolve T's state unitarily or not. Let's conjugate Equations (89–91) by V ⊗ I, where V is an N × N unitary matrix and I is the n × n identity matrix. Noting that V ⊗ I = Σ_{a,b=1}^N [V]ab Eab ⊗ I, we get

trT((V ⊗ I)U(e1 e*1 ⊗ X)U*(V ⊗ I)*) = trT(Σ_{a,c} V Eac V* ⊗ Ba1 X B*c1)
= Σ_{a,c,e} [V]ea [V]*ec Ba1 X B*c1
= Σ_e (Σ_a [V]ea Ba1) X (Σ_c [V]ec Bc1)*
= Σ_e K̃e X K̃*e,

where

K̃e := Σ_{a=1}^N [V]ea Ba1 = Σ_{a=1}^N [V]ea Ka    (94)

for all 1 ≤ e ≤ N. So these equations give us the effect of V on the Kraus operators.
Exercise 24.4 Show by direct calculation that if K1, . . . , KN ∈ L(HS) are operators such that Σ_{j=1}^N K*j Kj = I, and for all 1 ≤ j ≤ N we define K̃j := Σ_{a=1}^N [V]ja Ka for some fixed N × N unitary matrix V, then

Σ_{j=1}^N K̃*j K̃j = I.
So we are allowed to choose V to be any unitary matrix we want without affecting the quantum channel. We'll pick a specific V as follows: Given any set of Kraus operators K1, . . . , KN, let T be the N × N matrix whose (i, j)th entry is

[T]ij := ⟨Kj, Ki⟩ = tr(K*j Ki)

for 1 ≤ i, j ≤ N. Note that [T]ij = ⟨Kj, Ki⟩ = ⟨Ki, Kj⟩* = [T]*ji, and so T is a Hermitean matrix. We can therefore choose a unitary V such that VTV* = diag(λ1, . . . , λN) is diagonal. Since

⟨K̃i, K̃j⟩ = Σ_{a,b=1}^N [V]*ia [V]jb ⟨Ka, Kb⟩
= Σ_{a,b} [V]jb [T]ba [V*]ai
= [VT V*]ji
= λj δij,

the rotated Kraus operators K̃1, . . . , K̃N of (94) are pairwise orthogonal (those with λj = 0 are the zero operator and may be discarded). Hence we have a normal form for the operator-sum representation of a quantum channel: Any quantum channel on an n-dimensional state space may be represented by N ≤ n² many Kraus operators that are pairwise orthogonal.
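The normal form can be computed directly: diagonalize the Gram matrix T and rotate the Kraus operators by Eq. (94). A sketch (the starting Kraus pair is a hypothetical, deliberately non-orthogonal choice):

```python
import numpy as np

# A hypothetical, deliberately non-orthogonal Kraus pair (still satisfying (87))
A = np.array([[1.0, 0.0], [0.0, np.sqrt(0.5)]])
B = np.array([[0.0, np.sqrt(0.5)], [0.0, 0.0]])
K1, K2 = (A + B) / np.sqrt(2), (A - B) / np.sqrt(2)
kraus = [K1, K2]
N = len(kraus)

# Gram matrix [T]_ij = <K_j, K_i> = tr(K_j* K_i); it is Hermitean.
T = np.array([[np.trace(Kj.conj().T @ Ki) for Kj in kraus] for Ki in kraus])
lam, W = np.linalg.eigh(T)          # T = W diag(lam) W*, so take V := W*
V = W.conj().T

# Rotated operators K~_e = sum_a [V]_ea K_a  (Eq. (94))
tilde = [sum(V[e, a] * kraus[a] for a in range(N)) for e in range(N)]

# Pairwise orthogonal, and still a valid Kraus decomposition:
assert np.isclose(np.trace(tilde[0].conj().T @ tilde[1]), 0)
assert np.allclose(sum(Kt.conj().T @ Kt for Kt in tilde), np.eye(2))
```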
Exercise 24.5 Explain why the values λ1, . . . , λN above are all nonnegative reals.
Quantum Channels Between Different Hilbert Spaces. We have restricted our attention to
quantum channels of the form E : L(H) → L(H), that is, linear maps that map operators of a space
to operators of the same space. This restriction is unnecessary, and it is easy to imagine quantum
channels mapping states in one space to states in another. The partial trace operator itself is a
good example of such a thing. The operator-sum view is the easiest way to characterize these
more general quantum channels. We will satisfy ourselves with the following general definition,
without going into the details of why it is the best one. It certainly coincides with our previous
view in the case where the two spaces are the same.
Definition 24.6 Let H and J be Hilbert spaces. A quantum channel from H to J is a superoperator E ∈ T(H, J) such that there exists an integer N > 0 and linear maps Kj : H → J for 1 ≤ j ≤ N satisfying the completeness condition Σ_{j=1}^N K*j Kj = IH, such that for every X ∈ L(H),

E(X) = Σ_{j=1}^N Kj X K*j.

Here, IH denotes the identity map on H. As in the special case where H = J, the Kj are known as Kraus operators.
Recall that for any linear map K : H → J, the adjoint K∗ is a uniquely defined linear map from J
to H. E maps positive operators to positive operators, and the completeness condition guarantees
that E is trace-preserving.
General Measurements. I didn’t lecture on this in class. The textbook makes a rather significant
logical mistake in its discussion of quantum measurement, starting in Section 2.2.3 but carrying
over into Chapter 8. Except for alerting you to this mistake, this material is optional.
Postulate 3 on pages 84–85 (reformulated in terms of density operators on page 102) describes general measurements where some classical information may be obtained. In it, they describe a (general) quantum measurement on a system with state space H as an indexed collection {Mm}_{m∈I} of operators (called measurement operators) in L(H) satisfying Σ_{m∈I} M*m Mm = I. (Thus the Mm satisfy the same completeness condition as the Kraus operators did previously.) I use I here again to describe the set of possible outcomes. According to the postulate, when a system in state ρ is measured using {Mm}, the probability of seeing an outcome m ∈ I is given by ⟨M*m Mm, ρ⟩, and the post-measurement state assuming outcome m occurred is Mm ρ M*m / ⟨M*m Mm, ρ⟩.
All of this is fine except that it is not general enough. There are legitimate physical measure-
ments that do not take this form. The measurements described by the book are all guaranteed to
produce pure states after the measurement, assuming that a pure state was measured. There are,
however, more “imprecise” measurements that may yield mixed states after the measurement,
even if the pre-measurement state was pure.
As before with quantum channels, there are two equivalent views of a general measurement:
the coupled-systems view and the operator-sum view. The textbook gives the operator-sum view.
I’ll describe both views, pointing out how the true operator-sum view differs from the text, but I’ll
omit the proof of equivalence, which is very similar to what I did earlier with quantum channels.
If you want a chance to practice “index gymnastics” yourself, I’ll leave the details of the proof to
you as an exercise.
We’ll only consider finitary measurements here, i.e., measurements with only a finite set of
possible outcomes. One can generalize our analysis to infinitary measurements as well.
In the coupled-systems view, a general measurement on a system S with state space HS
proceeds as follows, assuming ρ is the pre-measurement state of S:
1. Prepare another system T with (finite dimensional) state space HT in some initial pure state
σ.
2. Couple T with S, and let the combined system evolve unitarily according to some unitary
U ∈ L(HT ⊗ HS ), producing the state U(σ ⊗ ρ)U∗ .
3. Perform a projective measurement on the system TS, using some complete set {P(m) : m ∈ I} of orthogonal projectors in L(HT ⊗ HS). I is the (finite) set of possible outcomes. By the usual rules, the probability of seeing any outcome m ∈ I is Pr[m] = ⟨P(m), U(σ ⊗ ρ)U*⟩.

4. Trace out the system T of the post-measurement state to obtain the post-measurement state of S:

ρS_m := trT(P(m) U(σ ⊗ ρ)(P(m) U)*) / tr(P(m) U(σ ⊗ ρ)(P(m) U)*).
In the operator-sum view, a general measurement M of system S is described by a (finite) set I of possible outcomes, and for each outcome m ∈ I a finite list M1^(m), . . . , MN^(m) of operators in L(HS),^23 all satisfying

Σ_{m∈I} Σ_{j=1}^N (Mj^(m))* Mj^(m) = I.

When a system in state ρ is measured, the probability of seeing outcome m ∈ I is

Pr[m] = ⟨M^(m), ρ⟩, where M^(m) := Σ_{j=1}^N (Mj^(m))* Mj^(m),    (95)

and the post-measurement state of S, assuming outcome m occurred, is

ρm := (Σ_{j=1}^N Mj^(m) ρ (Mj^(m))*) / Pr[m].    (96)
I'll finish with two remarks about Equations (95) and (96): First, it's easy to see that the operators {M^(m)}_{m∈I} of (95) form a POVM, and the converse is also true: any POVM arises from some general measurement where the post-measurement state is neglected. To see this, let {M^(m)}_{m∈I} be any (finitary) POVM. If we define measurement elements K^(m) := √(M^(m)) for each m ∈ I, then these elements form the operator-sum view of a generalized measurement, one operator per outcome, and the resulting outcome probabilities are the same as with the given POVM. Second, Postulate 3 only allows one operator per outcome, and so it is the special case of (96) where N = 1.
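The square-root construction in the first remark is easy to realize numerically via the eigendecomposition of each (positive) POVM element. A sketch with a hypothetical two-element POVM:

```python
import numpy as np

def psd_sqrt(M):
    # square root of a positive semidefinite matrix via eigendecomposition
    lam, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(lam, 0, None))) @ V.conj().T

M0 = np.array([[0.75, 0.25], [0.25, 0.25]])   # hypothetical POVM element
M1 = np.eye(2) - M0                            # its complement
K0, K1 = psd_sqrt(M0), psd_sqrt(M1)

# One measurement element per outcome: same statistics as the POVM itself,
# since tr(K rho K*) = tr(K*K rho) = tr(M rho).
rho = np.array([[0.5, 0.5], [0.5, 0.5]])       # |+><+|
for K, M in [(K0, M0), (K1, M1)]:
    assert np.isclose(np.trace(K @ rho @ K.conj().T).real,
                      np.trace(M @ rho).real)
assert np.allclose(K0.conj().T @ K0 + K1.conj().T @ K1, np.eye(2))
```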
Completely Positive Maps. This is another optional topic. We've seen that every (complete) quantum channel E maps states to states; equivalently, it has two properties:

1. E maps positive operators to positive operators (if X ≥ 0 then E(X) ≥ 0);

2. E is trace-preserving (tr E(X) = tr X for all X ∈ L(H)).

We'll see shortly that the converse does not hold. That is, there are linear maps satisfying (1) and (2) above that are not legitimate quantum channels according to Definition 24.6. To get a characterization, we need to strengthen (1) a bit. We say that E is positive if (1) holds, i.e., if E maps positive operators to positive operators. The stronger condition we need is that E be completely positive, a condition that we now explain.
^23 Actually, the lists could contain different numbers of operators, but we can assume they are all the same length by padding shorter lists with copies of the zero operator.
Quantum channels are linear maps, and we can form tensor products of these linear maps
just as we can with any linear maps. So, given two superoperators E ∈ T(H, J) and F ∈ T(K, M)
(H, J, K, and M are Hilbert spaces), we define E ⊗ F as usual to be the unique superoperator in
T(H ⊗ K, J ⊗ M) that takes A ⊗ B to E(A) ⊗ F(B) for every A ∈ L(H) and B ∈ L(K).
For every Hilbert space H we have the identity superoperator I ∈ T(H) defined by I(A) = A
for all A ∈ L(H). I is certainly a quantum channel, given by the single Kraus operator I ∈ L(H).
The next definition gives the strengthening of property (1) that we need:
Definition 24.7 Let H and J be Hilbert spaces. A superoperator E ∈ T(H, J) is completely positive
if for every Hilbert space K, the map I ⊗ E ∈ T(K ⊗ H, K ⊗ J) is positive, where I ∈ T(K) is the
identity map on L(K).
Taking K in Definition 24.7 to be the one-dimensional space C, we may identify L(K ⊗ H) with L(H) and L(K ⊗ J) with L(J), and under this identification I ⊗ E is just E. So if E is completely positive and X ∈ L(H) satisfies X ≥ 0, then (I ⊗ E)(X) ≥ 0, and thus E(X) ≥ 0. This means that E is positive. Therefore, complete positivity is at least as strong a condition as positivity.
It may be counterintuitive, but there are maps E that are positive but not completely positive.
Here’s a great example. Fix some orthonormal basis for H so that we can identify operators
on H with matrices. Now consider the transpose operator T that takes any square matrix to its
transpose (not the adjoint, just the transpose), i.e., T (A) = AT for any matrix A. With respect to
the chosen basis, we can think of T as a map from operators in L(H) to operators in L(H), and it
is clearly a linear map, and so T ∈ T(H). T is obviously trace-preserving, and it is also positive:
For any square matrix A, it is easily checked that if A > 0 then AT > 0 as well (if A is normal,
then so is AT , and both matrices have the same spectrum). T is not completely positive, however,
provided dim(H) > 2. Suppose H = K is the state space of a single qubit, and we fix the standard
computational basis {|0i, |1i} for H. Consider the matrix
1 0 0 1
1 0 0 0 0
A = Φ+ Φ+ =
> 0.
2 0 0 0 0
1 0 0 1
Applying I ⊗ T (sometimes called the partial transpose) to A means taking the transpose of each
2 × 2 block (the T part), but not rearranging the blocks at all (the I part). Thus,
1 0 0 0
1 0 0 1 0 = 1 SWAP,
(I ⊗ T )(A) =
2 0 1 0 0 2
0 0 0 1
where we recall that the two-qubit SWAP operator swaps the qubits, i.e., SWAP|ai|bi = |bi|ai for
any a, b ∈ {0, 1}. The eigenvalues of SWAP are 1, 1, 1, −1 (see Exercise 12.2), and so SWAP is not a
positive operator. This shows that I ⊗ T is not a positive map, and so T is not completely positive.
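The computation above can be replayed in a few lines of numpy (a sketch; the reshape/transpose trick implements the block-wise transpose):

```python
import numpy as np

# A = |Phi+><Phi+| for two qubits
A = np.zeros((4, 4))
A[0, 0] = A[0, 3] = A[3, 0] = A[3, 3] = 0.5
assert np.all(np.linalg.eigvalsh(A) >= -1e-12)    # A itself is positive

# Partial transpose I ⊗ T: transpose each 2x2 block (swap the second-qubit
# row/column indices) without moving the blocks.
PT = A.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)

SWAP = np.eye(4)[[0, 2, 1, 3]]
assert np.allclose(PT, SWAP / 2)                   # (I ⊗ T)(A) = SWAP/2
assert np.linalg.eigvalsh(PT).min() < 0            # so I ⊗ T is not positive
```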
Theorem 24.8 Let H and J be Hilbert spaces, and let E ∈ T(H, J) be a superoperator. E is a quantum channel (Definition 24.6) if and only if E is trace-preserving and completely positive.

Proof. First the forward direction. Let E be a quantum channel given in the operator-sum representation by Kraus operators K1, . . . , KN (linear maps from H to J) such that Σ_{j=1}^N K*j Kj = IH, where IH is the identity operator on H, and E(X) = Σ_j Kj X K*j for any X ∈ L(H). (Recall that each adjoint K*j is then a linear map from J back to H.) For Step 1, we show that E is positive: let X ∈ L(H) with X ≥ 0, set Y := E(X), and let v ∈ J be arbitrary. Writing uj := K*j v, we have

v*Yv = Σ_{j=1}^N v* Kj X K*j v = Σ_j (K*j v)* X (K*j v) = Σ_j u*j X uj ≥ 0

as desired, since X ≥ 0. Because E is an arbitrary quantum channel, this shows that every quantum channel is a positive map.
For Step 2, let K be any Hilbert space, let IK ∈ L(K) be the identity operator on K, and let I ∈ T(K) be the identity map on L(K). We want to show that I ⊗ E ∈ T(K ⊗ H, K ⊗ J) is a quantum channel, so we must come up with Kraus operators for I ⊗ E: For 1 ≤ j ≤ N, define Lj = IK ⊗ Kj. Each Lj is a linear map from K ⊗ H to K ⊗ J, i.e., Lj ∈ L(K ⊗ H, K ⊗ J), and for completeness, we have

Σ_{j=1}^N L*j Lj = Σ_j (IK ⊗ Kj)*(IK ⊗ Kj) = Σ_j IK ⊗ K*j Kj = IK ⊗ IH,
which is the identity map on K ⊗ H. Finally, if A ∈ L(K) and B ∈ L(H) are arbitrary operators
and we set C := A ⊗ B, we have
(I ⊗ E)(C) = (I ⊗ E)(A ⊗ B) = A ⊗ E(B) = ∑_{j=1}^N (I_K ⊗ K_j)(A ⊗ B)(I_K ⊗ K_j^*) = ∑_j L_jCL_j^*.    (97)
Both sides of Equation (97) are linear in C, and so (97) extends to arbitrary C ∈ L(K ⊗ H). This
shows that I ⊗ E is a quantum channel and hence positive, making E completely positive.
Now the reverse direction. Suppose E ∈ T(H, J) is trace-preserving and completely positive.
We need to come up with Kraus operators for E. Let n := dim(H), and let I be the identity map on
L(H). Fix an orthonormal basis {e₁, …, e_n} for H. Taking the product of this basis with itself, we
get a basis {e_{i,j} : 1 ≤ i, j ≤ n} for H ⊗ H, where for convenience we define e_{i,j} := e_i ⊗ e_j. Define
the vector
v := ∑_{i=1}^n e_{i,i} = ∑_i e_i ⊗ e_i ∈ H ⊗ H.
The operator vv∗ ∈ L(H ⊗ H) is clearly positive, and so by assumption, the operator
J := J(E) := (I ⊗ E)(vv∗ ) ∈ L(H ⊗ J)
is also positive. (We are letting K = H.) We have
vv^* = ∑_{i,j=1}^n (e_i ⊗ e_i)(e_j^* ⊗ e_j^*) = ∑_{i,j=1}^n e_ie_j^* ⊗ e_ie_j^* = ∑_{i,j} E_{ij} ⊗ E_{ij},
where E_{ij} := e_ie_j^*, and hence J = (I ⊗ E)(vv^*) = ∑_{i,j} E_{ij} ⊗ E(E_{ij}).
Because J ≥ 0, we can choose some eigenbasis {g₁, …, g_N} for J, where N := n² = dim(H ⊗ H).
This allows us to write
J = ∑_{k=1}^N λ_kg_kg_k^*,
where λ₁, …, λ_N ≥ 0 are the eigenvalues of J. For 1 ≤ k ≤ N, we can now define the Kraus
operator K_k ∈ L(H) by its matrix with respect to the {e_i} basis: for all 1 ≤ i, j ≤ n, define
[K_k]_{ij} := √λ_k ⟨e_{j,i}, g_k⟩.
We need to check that ∑_{k=1}^N K_k^*K_k = I (completeness) and that E(X) = ∑_{k=1}^N K_kXK_k^* for all
X ∈ L(H).
For completeness, fix some a, b ∈ {1, …, n}, and using the fact that E is trace-preserving,
compute
[∑_{k=1}^N K_k^*K_k]_{ab} = ∑_k ∑_{c=1}^n [K_k^*]_{ac}[K_k]_{cb} = ∑_k ∑_c [K_k]_{ca}^*[K_k]_{cb} = ∑_k ∑_c λ_k⟨e_{a,c}, g_k⟩^*⟨e_{b,c}, g_k⟩
= ∑_k ∑_c λ_k⟨e_{b,c}, g_k⟩⟨g_k, e_{a,c}⟩ = ∑_k ∑_c λ_k e_{b,c}^*g_kg_k^*e_{a,c}
= ∑_c e_{b,c}^*(∑_k λ_kg_kg_k^*)e_{a,c} = ∑_c e_{b,c}^*Je_{a,c} = ∑_c ∑_{i,j=1}^n e_{b,c}^*(E_{ij} ⊗ E(E_{ij}))e_{a,c}
= ∑_{c,i,j} (e_b^*E_{ij}e_a)(e_c^*(E(E_{ij}))e_c) = ∑_c e_c^*(E(E_{ba}))e_c = tr[E(E_{ba})] = tr E_{ba} = δ_{ab}.
24 It is not needed for the proof, but it is interesting to note that the matrix J contains complete information about E.
It includes all the n² matrices E(E_ij) laid out as n × n blocks in an n² × n² matrix. Since the E_ij form a basis for L(H),
E is completely determined by the matrices E(E_ij). J = J(E) is called the Choi representation of E. The condition J ≥ 0 is
actually equivalent to E being completely positive, for any superoperator E.
From this we get that ∑_{k=1}^N K_k^*K_k is the identity matrix. This shows completeness.
Now let X ∈ L(H) be arbitrary. Again, we compare matrix elements with respect to the {e_i}
basis. For any 1 ≤ a, b ≤ n, we have
[∑_{k=1}^N K_kXK_k^*]_{ab} = ∑_k ∑_{c,d=1}^n [K_k]_{ac}[X]_{cd}[K_k^*]_{db} = ∑_k ∑_{c,d=1}^n [X]_{cd}[K_k]_{ac}[K_k]_{bd}^*
= ∑_k ∑_{c,d} λ_k[X]_{cd}⟨e_{c,a}, g_k⟩⟨g_k, e_{d,b}⟩ = ∑_k ∑_{c,d} λ_k[X]_{cd}e_{c,a}^*g_kg_k^*e_{d,b}
= ∑_{c,d} [X]_{cd}e_{c,a}^*Je_{d,b}    (just as before)
= ∑_{c,d} [X]_{cd} ∑_{i,j=1}^n e_{c,a}^*(E_{ij} ⊗ E(E_{ij}))e_{d,b}
= ∑_{c,d,i,j} [X]_{cd}(e_c^*E_{ij}e_d)(e_a^*(E(E_{ij}))e_b) = ∑_{c,d} [X]_{cd}e_a^*(E(E_{cd}))e_b
= e_a^*(∑_{c,d} [X]_{cd}E(E_{cd}))e_b = e_a^*(E(∑_{c,d} [X]_{cd}E_{cd}))e_b
= e_a^*(E(X))e_b = [E(X)]_{ab}.
Thus E(X) = ∑_{k=1}^N K_kXK_k^*, as we wanted. □
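The construction in the proof can be exercised numerically. The sketch below (Python/NumPy; the one-qubit bit-flip test channel and p = 0.25 are illustrative choices, not part of the theorem) builds the Choi matrix J, eigendecomposes it, reads off Kraus operators via [K_k]_{ij} = √λ_k ⟨e_{j,i}, g_k⟩, and checks completeness and the operator-sum identity:

```python
import numpy as np

n = 2
p = 0.25                                  # hypothetical flip probability
sx = np.array([[0, 1], [1, 0]], dtype=complex)

def E(X):
    """Bit-flip test channel: E(X) = (1-p) X + p (sigma_x X sigma_x)."""
    return (1 - p) * X + p * (sx @ X @ sx)

# Choi matrix J(E) = sum_{i,j} E_ij (x) E(E_ij).
J = np.zeros((n * n, n * n), dtype=complex)
for i in range(n):
    for j in range(n):
        Eij = np.zeros((n, n), dtype=complex)
        Eij[i, j] = 1
        J += np.kron(Eij, E(Eij))

# Eigendecompose J and read off Kraus operators as in the proof.
lam, G = np.linalg.eigh(J)
kraus = []
for k in range(n * n):
    if lam[k] > 1e-12:                    # skip (numerically) zero eigenvalues
        g = G[:, k].reshape(n, n)         # g[j, i] = <e_{j,i}, g_k>
        kraus.append(np.sqrt(lam[k]) * g.T)

S = sum(K.conj().T @ K for K in kraus)    # completeness check
rng = np.random.default_rng(1)
Xrand = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Y = sum(K @ Xrand @ K.conj().T for K in kraus)
print(np.allclose(S, np.eye(n)), np.allclose(Y, E(Xrand)))  # True True
```

The recovered Kraus operators need not literally equal {√(1 − p) I, √p σ_x}; any two Kraus sets for the same channel are related by a unitary mixing, which is why the check is against the channel's action rather than the operators themselves.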
Exercise 24.9 (Optional) Show that the composition of two quantum channels is a quantum chan-
nel. That is, let E ∈ T(H, J) and F ∈ T(J, K) be quantum channels. Show that F ◦ E ∈ T(H, K) is a
quantum channel, where F ◦ E is defined as (F ◦ E)(X) := F(E(X)) for all X ∈ L(H).
Exercise 24.10 (Challenging, Optional) Show that the tensor product of two quantum channels is
a quantum channel. That is, let E ∈ T(H, J) and F ∈ T(K, M) be quantum channels. Show that
E ⊗ F ∈ T(H ⊗ K, J ⊗ M) is a quantum channel.
Definition 24.6 defines what are sometimes called complete quantum channels, and a general
quantum channel (not necessarily complete) is defined the same way, except that we replace the
completeness condition ∑_{j=1}^N K_j^*K_j = I_H with the looser condition ∑_{j=1}^N K_j^*K_j ≤ I_H. Incomplete
quantum channels are used to describe physical processes that may not happen with certainty, e.g.,
a general measurement that results in some outcome m.
Exercise 24.11 (Challenging, Optional) Show that a superoperator E ∈ T(H, J) is a general quan-
tum channel, as described above, if and only if (1) E is completely positive, and (2) for every state
(positive operator with unit trace) ρ ∈ L(H), we have 0 ≤ tr(E(ρ)) ≤ 1. The quantity tr(E(ρ)) is
interpreted as the probability that E actually occurs. [Hint: Set L := I_H − ∑_{j=1}^N K_j^*K_j, where the
K_j are the Kraus operators corresponding to E as above. Since L ≥ 0, you can define K_{N+1} := √L,
and then define E′(X) := ∑_{j=1}^{N+1} K_jXK_j^* for any X ∈ L(H). Notice that E′ is a complete quantum
channel and that E(X) = E′(X) − √L X √L. Also note that L ≤ I_H. Apply Theorem 24.8 to E′, and
use it to prove facts about E.]
Exercise 24.12 (Challenging, Optional) Show that the partial trace map is always a (complete)
quantum channel. [Hint: Let trH : L(H ⊗ J) → L(J) be a partial trace map. Note that by linearity,
trH ∈ T(H ⊗ J, J) is a superoperator. Fix orthonormal bases {e1 , . . . , en } and {f1 , . . . , fm } for H and
J, respectively, and for each j with 1 ≤ j ≤ n, define the Kraus operator K_j ∈ L(H ⊗ J, J) by
K_j := e_j^* ⊗ I_J = ∑_{k=1}^m e_j^* ⊗ f_kf_k^* = ∑_{k=1}^m f_k(e_j^* ⊗ f_k^*),
We might drop the subscript p if it is clear what probability distribution we are using.
If p and q are two probability distributions over the same sample space, we are interested in
measures of the similarity or difference between p and q. We’ll discuss two here: the trace distance
and the fidelity.
Definition 25.1 Let p and q be two probability distributions on the same sample space Ω. The
trace distance (also called the L1 distance or the Kolmogorov distance) between p and q is defined as
D(p, q) := (1/2) ∑_{a∈Ω} |p(a) − q(a)|.
It is easy to check that D satisfies the axioms for a metric on the set of all probability distributions
on Ω. These are:
1. D(p, q) ≥ 0,
2. D(p, q) = 0 iff p = q,
3. D(p, q) = D(q, p),
4. D(p, r) ≤ D(p, q) + D(q, r),
for any probability distributions p, q, r on Ω. Here’s another way of characterizing the trace
distance: for any probability distributions p and q on Ω,
D(p, q) = max_{S⊆Ω} |Pr_p[S] − Pr_q[S]| = max_{S⊆Ω} (Pr_p[S] − Pr_q[S]).    (98)
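Equation (98) can be checked by brute force on a small sample space. A Python sketch (the two distributions below are hypothetical):

```python
from itertools import chain, combinations

# Two hypothetical distributions on Omega = {0, 1, 2}.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.4, 0.4]

D = 0.5 * sum(abs(pa - qa) for pa, qa in zip(p, q))

# Brute-force the characterization D(p, q) = max over subsets S of (Pr_p[S] - Pr_q[S]).
omega = range(len(p))
subsets = chain.from_iterable(combinations(omega, r) for r in range(len(p) + 1))
best = max(sum(p[a] - q[a] for a in S) for S in subsets)
print(D, best)  # both equal 0.3 (up to rounding)
```

The maximizing subset is {a : p(a) > q(a)}, which is exactly why the maximum equals half the L¹ distance.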
The trace distance gauges the difference between two distributions p and q. The fidelity, on
the other hand, is a measure of their similarity; it is maximized when p = q.
Definition 25.3 Let p and q be two probability distributions on the same sample space Ω. The
fidelity of p and q is defined as
F(p, q) := ∑_{a∈Ω} √(p(a)q(a)).
F(p, q) can be seen as the dot product of two real unit vectors: the vector whose a'th entry is
√p(a) and the vector whose a'th entry is √q(a). Since these two vectors clearly have unit norm,
the fidelity is then the cosine of the angle between them. Thus we immediately get 0 ≤ F(p, q) ≤ 1,
with F(p, q) = 1 iff p = q.
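A minimal numerical check of the cosine interpretation (the distributions are hypothetical):

```python
import math

# Hypothetical distributions on a three-element sample space.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.4, 0.4]

F = sum(math.sqrt(pa * qa) for pa, qa in zip(p, q))

# The same number computed as a dot product of the two unit vectors
# (sqrt(p(a)))_a and (sqrt(q(a)))_a.
u = [math.sqrt(x) for x in p]
v = [math.sqrt(x) for x in q]
cos = sum(ua * va for ua, va in zip(u, v))
print(0 <= F <= 1, abs(F - cos) < 1e-12)  # True True
```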
Trace Distance and Fidelity of Operators. We’d like to extend these definitions to quantum
states, i.e., operators. A reasonable sanity check on the way we should define such an extension
would be to say that if ρ and σ are mixtures of the same set of pairwise orthogonal pure states with
(eigenvalue) probability distributions r and s, respectively, then D(ρ, σ) should be equal to D(r, s),
P
and F(ρ, σ) should be equal to F(r, s). Let’s see this in more detail. Suppose ρ = k j=1 rj ρj and
Pk
σ = j=1 sj ρj , where the pure states ρj project onto mutually orthogonal subspaces (equivalently,
ρi ρj = δij ρi for any i and j). Now consider the operator |ρ − σ|. We have
|ρ − σ| = √((ρ − σ)^*(ρ − σ)) = √((ρ − σ)²) = [(∑_{j=1}^k (r_j − s_j)ρ_j)²]^{1/2} = [∑_{j=1}^k (r_j − s_j)²ρ_j]^{1/2},
because the cross-terms (ρ_iρ_j for i ≠ j) all vanish when we expand the expression inside the square
brackets. Since the ρj project onto mutually orthogonal subspaces, we can choose an orthonormal
basis in which all the ρj are diagonal matrices simultaneously. Permuting the basis vectors if need
be, we can assume that each ρj (which is a one-dimensional projector) is given by the matrix Ejj .
Thus ∑_{j=1}^k (r_j − s_j)²ρ_j = ∑_j (r_j − s_j)²E_{jj} = diag[(r₁ − s₁)², (r₂ − s₂)², …, (r_k − s_k)², 0, …, 0]. To
take the square root of this matrix, we just take the square root of each diagonal entry, which gives
the matrix diag[|r1 − s1 |, |r2 − s2 |, . . . , |rk − sk |, 0, . . . , 0], and so this is |ρ − σ| in matrix form. Taking
one half of the trace of this gives
(1/2) tr|ρ − σ| = (1/2) ∑_{j=1}^k |r_j − s_j| = D(r, s).
This suggests that we can now define the trace distance D(A, B) for arbitrary operators A and B as
D(A, B) := (1/2) tr|A − B| = (1/2)‖A − B‖₁.
We can do something similar to define the fidelity of two arbitrary positive operators. I won’t
do the details here, but a reasonable definition is
F(A, B) := tr √(A^{1/2}BA^{1/2}) = ‖√B √A‖₁    (99)
for arbitrary operators A, B ≥ 0. It can be shown that F(A, B) = F(B, A), and if ρ and σ are states,
then 0 ≤ F(ρ, σ) ≤ 1 with F(ρ, σ) = 1 iff ρ = σ.
We do the same sanity check for F as we did for D, above. If ρ and σ are commuting states as
before, i.e., ρ = diag(r1 , . . . , rk , 0, . . . , 0) and σ = diag(s1 , . . . , sk , 0, . . . , 0) with respect to the same
orthonormal basis, then we have
F(ρ, σ) = tr diag(√(r₁s₁), …, √(r_ks_k), 0, …, 0) = ∑_{j=1}^k √(r_js_j) = F(r, s).
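Equation (99) is straightforward to evaluate numerically with an eigendecomposition-based matrix square root. The Python/NumPy sketch below (the diagonal states are hypothetical) repeats the sanity check above:

```python
import numpy as np

def psd_sqrt(M):
    """Square root of a positive semidefinite Hermitean matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

def fidelity(A, B):
    """F(A, B) = tr sqrt(A^{1/2} B A^{1/2}), as in Equation (99)."""
    rA = psd_sqrt(A)
    return np.trace(psd_sqrt(rA @ B @ rA)).real

# Commuting diagonal states: the operator fidelity should reduce to the
# classical fidelity of the eigenvalue distributions (values hypothetical).
r = np.array([0.5, 0.3, 0.2])
s = np.array([0.2, 0.4, 0.4])
rho, sigma = np.diag(r), np.diag(s)
print(np.isclose(fidelity(rho, sigma), np.sum(np.sqrt(r * s))),
      np.isclose(fidelity(rho, sigma), fidelity(sigma, rho)))  # True True
```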
Exercise 25.4 Show that kABk1 = kBAk1 for any Hermitean operators A and B. Thus the fidelity
function F of (99) is symmetric. [Hint: Use Property 10 of the norm, which says that kCk1 = kC∗ k1
for any operator C.]
Properties of the Trace Distance. The trace distance of operators has an alternate characterization
analogous to Equation (98). If A and B are operators, we say that A ≤ B if B − A ≥ 0. We'll show
that for any states ρ and σ,
D(ρ, σ) = max_{projectors P} tr(P(ρ − σ)) = max_{P≥0, ‖P‖=1} tr(P(ρ − σ)) = max_{0≤P≤I} tr(P(ρ − σ)),    (100)
where the three maxima are taken over all projectors P, all positive operators P of unit operator
norm (L^∞ norm), and all operators P such that 0 ≤ P ≤ I, respectively. Equation (100) has many
uses. We won’t bother to do it here, but it is straightforward to check—as a consequence of
Equation (100)—that D(ρ, σ) is the maximum probability difference of any outcome of a POVM
applied to ρ and to σ. The function D is also a metric on the set of all quantum states of a given
system, that is, it can be shown to satisfy the axioms for a metric on page 114, and (100) helps with
showing the triangle inequality for D.
Actually, we’ll show a result slightly more general than Equation (100):
Proposition 25.5 Suppose that A is a traceless Hermitean operator, i.e., tr A = 0 and A = A∗ . Let
λ1 , . . . , λn ∈ R be the eigenvalues of A (A acts on an n-dimensional space). The following quantities are all
equal:
1. (1/2)kAk1 ,
2. (1/2) tr |A|,
3. ∑_{i:λ_i>0} λ_i,
4. max_{projectors P} tr(PA),
5. max_{P≥0, ‖P‖=1} tr(PA),
6. max_{0≤P≤I} tr(PA).
Proof. Clearly (1) = (2), by the definition of the trace norm. Set p := ∑_{i:λ_i>0} λ_i and
q := ∑_{i:λ_i<0} λ_i. Since tr A = p + q = 0, we have q = −p, and so
tr|A| = ∑_{i=1}^n |λ_i| = ∑_{i:λ_i>0} λ_i − ∑_{i:λ_i<0} λ_i = p − q = 2p.
Thus (2) = p = (3).
The inequalities (3) ≤ (4) ≤ (5) ≤ (6) are pretty straightforward and we leave these as exercises.
It remains to show that (6) ≤ (3). Consider the expression max_{0≤P≤I} tr(PA) of (6). The
key insight is to show first that the maximum is achieved by some P that commutes with A (i.e.,
PA = AP). Once that fact is established, the rest is easy: we can pick a common eigenbasis for P
and A and look at diagonal matrices.
Suppose that 0 ≤ P ≤ I and that P does not commute with A. We will find an operator P′
such that 0 ≤ P′ ≤ I and tr(P′A) > tr(PA), and so the maximum is not achieved by P.25 Set
C := i(AP − PA). Note that C is Hermitean, because both P and A are, and C ≠ 0 by assumption.
(The quantity AP − PA, for any operators A and P, is called the commutator or the Lie bracket
(pronounced, “LEE”) of A and P, and is denoted by [A, P].) For any ε > 0, define
U_ε := e^{−iεC} = I − iεC + O(ε²).
Then U_ε is unitary by Item 4 of Exercise 9.3. The “O(ε²)” here denotes an operator (depending on
ε) whose norm (it doesn't matter which norm) is bounded by some positive constant times ε². We
now define
P′ := U_εPU_ε^*
25 We are tacitly assuming that the maximum is achieved by some P such that 0 ≤ P ≤ I. This is in fact true, and it
follows from concepts in topology that we won't go into here, namely, continuity and compactness.
for some ε > 0 that we will choose later. It is easy to check that 0 ≤ P′ ≤ I. Now we have
tr(P′A) = tr((I − iεC)P(I + iεC)A) + O(ε²) = tr(PA) − iε tr([C, P]A) + O(ε²) = tr(PA) + ε tr(C²) + O(ε²).
Now C² = C^*C ≥ 0, and since C ≠ 0, we must then have tr(C²) > 0, either by Exercise 9.28 or by
observing that tr(C^*C) = ⟨C, C⟩ > 0 (Hilbert-Schmidt inner product). Now we can choose ε small
enough so that ε tr(C²) strictly dominates the O(ε²) error term, yielding tr(P′A) > tr(PA). This
shows that the maximum value of tr(PA) is achieved only when P commutes with A, i.e., [A, P] = 0.
Finally suppose that 0 ≤ P ≤ I and that P commutes with A. Pick a common eigenbasis for
P and A so that, with respect to this basis, A = diag(λ₁, …, λ_n) and P = diag(μ₁, …, μ_n). Since
0 ≤ P ≤ I, we must have 0 ≤ μ₁, …, μ_n ≤ 1, but otherwise, we are free to choose the μ_j arbitrarily
(see the hint to Exercise 25.7, below). We now have
tr(PA) = ∑_{j=1}^n μ_jλ_j,
which is clearly maximized by taking μ_j := 1 when λ_j > 0 and μ_j := 0 otherwise. The maximum
value is then ∑_{j:λ_j>0} λ_j, which is (3). □
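A quick numerical check of Proposition 25.5 (Python/NumPy; the random 4 × 4 operator is an arbitrary test case). As the proof indicates, the maximum in items (4)-(6) is achieved by the projector onto the positive eigenspace of A:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random traceless Hermitean operator on a 4-dimensional space.
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (M + M.conj().T) / 2
A = A - (np.trace(A).real / 4) * np.eye(4)

lam, V = np.linalg.eigh(A)
half_trace_norm = 0.5 * np.sum(np.abs(lam))     # (1/2) ||A||_1 = (1/2) tr|A|
positive_part = np.sum(lam[lam > 0])            # sum of the positive eigenvalues

# Projector onto the positive eigenspace of A.
Vp = V[:, lam > 0]
Pmax = Vp @ Vp.conj().T
print(np.isclose(half_trace_norm, positive_part),
      np.isclose(np.trace(Pmax @ A).real, positive_part))  # True True
```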
Exercise 25.7 Prove (4) ≤ (5) ≤ (6) in Proposition 25.5, above. [Hint: The following easy facts are
useful for any operator P:
• 0 ≤ P ≤ I if and only if P is normal and all its eigenvalues are in the closed interval [0, 1] ⊆ R
(consider an eigenbasis for P).
• Recall (Exercise 9.35) that 0 ≤ P iff P = |P|, and thus if 0 ≤ P then ‖P‖ is the largest eigenvalue
of P itself.
For (4) ≤ (5), note that every nonzero projector P satisfies 0 ≤ P and ‖P‖ = 1. You need to treat
the case where P = 0 separately. (5) ≤ (6) is straightforward.]
In Proposition 25.13, below, I’ll mention one more interesting property of the trace distance:
it can never increase via a quantum channel. This says that all (complete) quantum channels are
contractive with respect to the metric D. So if no classical information is coming out of an open
quantum system, its dynamics tends to cause states to become less distinguishable, not more. This
is not necessarily the case with incomplete quantum channels, where some classical information
is obtained.
Lemma 25.10 (Jordan-Hahn decomposition) For any Hermitean operator A, there exist unique positive
operators Q and S such that QS = SQ = 0 and A = Q − S.
Proof. Define
Q := (|A| + A)/2,
S := (|A| − A)/2.
Evidently, A = Q − S; moreover,
QS = (1/4)(|A|² − |A|A + A|A| − A²) = (1/4)(|A|² − A²),
because A commutes with |A|. Since A is Hermitean, we have A2 = A∗ A = |A|2 , and hence, QS = 0.
A similar argument gives SQ = 0. It remains to show that Q and S are both positive. Let λ1 , . . . , λk
be the distinct eigenvalues of A. The λi are all real, since A is Hermitean. By Corollary 9.17, we
have a unique decomposition
A = λ1 P1 + · · · + λk Pk ,
where the P_j form a complete set of orthogonal projectors. Then
|A| = |λ₁|P₁ + ⋯ + |λ_k|P_k,
which immediately implies
Q = (1/2)((|λ₁| + λ₁)P₁ + ⋯ + (|λ_k| + λ_k)P_k) = ∑_{j:λ_j>0} λ_jP_j,
S = (1/2)((|λ₁| − λ₁)P₁ + ⋯ + (|λ_k| − λ_k)P_k) = ∑_{j:λ_j<0} (−λ_j)P_j.
All the coefficients above are nonnegative real numbers, and since all the Pi are positive, Q and S
must both be positive.
To prove uniqueness, suppose some positive operators Q and S satisfy the conditions of the
lemma. Since QS = 0 = SQ, Q and S commute with each other, and thus Q and S both commute
with Q − S = A. Since A, Q, and S are all normal operators, they share a common eigenbasis B by
Theorem 9.41. With respect to B, these three operators are represented by diagonal matrices:
A = diag(a₁, …, a_n),
Q = diag(q₁, …, q_n),
S = diag(s₁, …, s_n).
Since a_i = q_i − s_i with q_i, s_i ≥ 0 and q_is_i = 0 for each i, we must have q_i = max(a_i, 0) and
s_i = max(−a_i, 0). Thus Q and S are uniquely determined by A. □
Exercise 25.11 Show that if A, Q, and S are as in Lemma 25.10, then |A| = |Q − S| = |Q + S| = √(Q² + S²).
Exercise 25.12 Show that if A, Q, and S are as in Lemma 25.10, then tr |A| = tr Q+tr S. [Hint: Either
pick a common eigenbasis for Q and S or use the decomposition in the proof of Lemma 25.10.]
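The decomposition (and Exercise 25.12) is easy to verify numerically. A Python/NumPy sketch, with a random Hermitean A as an arbitrary test case:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                        # a random real Hermitean operator

def abs_op(A):
    """|A| for Hermitean A: replace each eigenvalue by its absolute value."""
    w, V = np.linalg.eigh(A)
    return (V * np.abs(w)) @ V.T

Q = (abs_op(A) + A) / 2
S = (abs_op(A) - A) / 2

print(np.allclose(A, Q - S),
      np.allclose(Q @ S, np.zeros((4, 4))),
      np.allclose(S @ Q, np.zeros((4, 4))),
      np.isclose(np.trace(abs_op(A)), np.trace(Q) + np.trace(S)))
```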
Proposition 25.13 Let E ∈ T(H, J) be a (complete, i.e., trace-preserving) quantum channel, and let ρ and
σ be states in L(H). Then D(E(ρ), E(σ)) ≤ D(ρ, σ).
Proof. The operator ρ − σ satisfies Lemma 25.10, so uniquely write ρ − σ = Q − S, where Q, S ≥ 0
and QS = SQ = 0. Now we have
D(ρ, σ) = (1/2) tr|ρ − σ| = (1/2)(tr Q + tr S)    (Exercise 25.12)
= tr Q    (because tr Q − tr S = tr(Q − S) = tr(ρ − σ) = 1 − 1 = 0)
= tr(E(Q)).    (E is trace-preserving)
Noticing that E(ρ) − E(σ) = E(ρ − σ) is a traceless Hermitean operator, we can choose a projector
P that maximizes tr(P(E(ρ) − E(σ))). Continuing on, we get
D(E(ρ), E(σ)) = tr(P(E(ρ) − E(σ))) = tr(P E(Q)) − tr(P E(S)) ≤ tr(P E(Q)) ≤ tr(E(Q)) = D(ρ, σ),
using the facts that E(Q), E(S) ≥ 0 (E is positive) and 0 ≤ P ≤ I. □
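Proposition 25.13 can be spot-checked numerically. The Python/NumPy sketch below uses a one-qubit bit-flip channel with a hypothetical p = 0.3 as the test channel and two random states:

```python
import numpy as np

def trace_distance(rho, sigma):
    """D(rho, sigma) = (1/2) tr |rho - sigma| for Hermitean arguments."""
    return 0.5 * np.sum(np.abs(np.linalg.eigvalsh(rho - sigma)))

def random_state(rng, n=2):
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    rho = M @ M.conj().T                 # positive by construction
    return rho / np.trace(rho).real      # normalize to unit trace

# Bit-flip channel with a hypothetical p = 0.3 as the test channel.
p = 0.3
X = np.array([[0, 1], [1, 0]], dtype=complex)
def E(rho):
    return (1 - p) * rho + p * (X @ rho @ X)

rng = np.random.default_rng(3)
rho, sigma = random_state(rng), random_state(rng)
print(trace_distance(E(rho), E(sigma)) <= trace_distance(rho, sigma) + 1e-12)  # True
```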
Properties of the Fidelity. An important special case of Equation (99) is when ρ = uu∗ is a pure
state (u is a unit vector). We may prepare a pure state ρ, then send it through a noisy quantum
channel (quantum channel E), producing a state σ at the other end. The fidelity F(ρ, σ) is a good
measure of how much the state was garbled in the transmission—the higher the fidelity, the less
garbling. For ρ = uu^* and any state σ, we have
F(uu^*, σ) = tr √(u(u^*σu)u^*) = √(u^*σu) tr(uu^*) = √(u^*σu),    (101)
noting that √(uu^*) = uu^*, which has unit trace. In Dirac notation, letting |ψ⟩ := u, this becomes
F(|ψ⟩⟨ψ|, σ) = √(⟨ψ|σ|ψ⟩).    (102)
There is a fact about the fidelity analogous with Proposition 25.13 regarding the trace distance.
We’ll state it without proof.
Proposition 25.14 Suppose E ∈ T(H, J) is a complete quantum channel. For any two states ρ, σ ∈ L(H),
F(E(ρ), E(σ)) ≥ F(ρ, σ).
Comparing Trace Distance and Fidelity. The trace distance and fidelity are roughly inter-
changeable as measures of distinctness/similarity. For pure states ρ and σ it can be shown that
D(ρ, σ) = √(1 − F(ρ, σ)²). For arbitrary states ρ and σ, it can be shown that
1 − F(ρ, σ) ≤ D(ρ, σ) ≤ √(1 − F(ρ, σ)²),
or equivalently,
1 − D(ρ, σ) ≤ F(ρ, σ) ≤ √(1 − D(ρ, σ)²).
These inequalities are known as the Fuchs-van de Graaf inequalities. So in most situations, it doesn’t
really matter which measure is used. The book uses the fidelity measure almost exclusively. The
inequalities above are illustrated in Figure 10.
Figure 10: For any two states ρ and σ, let x := D(ρ, σ) and y := F(ρ, σ). The Fuchs-van de Graaf
inequalities say that the point (x, y) must lie in the shaded region bounded by the line x+y = 1 and
the arc of the unit circle in the first quadrant. This region is symmetric with respect to reflection
through the line x = y.
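The Fuchs-van de Graaf region can also be sampled numerically over random states (Python/NumPy sketch; the 100 random one-qubit states are arbitrary test cases):

```python
import numpy as np

def psd_sqrt(M):
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

def trace_distance(rho, sigma):
    return 0.5 * np.sum(np.abs(np.linalg.eigvalsh(rho - sigma)))

def fidelity(rho, sigma):
    r = psd_sqrt(rho)
    return np.trace(psd_sqrt(r @ sigma @ r)).real

def random_state(rng, n=2):
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    rho = M @ M.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(4)
ok = True
for _ in range(100):
    rho, sigma = random_state(rng), random_state(rng)
    x, y = trace_distance(rho, sigma), fidelity(rho, sigma)
    # Check 1 - F <= D <= sqrt(1 - F^2) up to numerical slack.
    ok = ok and (1 - y <= x + 1e-9) and (x <= np.sqrt(max(1 - y * y, 0.0)) + 1e-9)
print(ok)  # True
```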
26 Week 13: Quantum error correction
Quantum Error Correction. In this topic, we’ll see ways to reduce the effects of noise in a quantum
channel, thereby increasing the fidelity between the input state to the channel and the output state.
First, we’ll see a typical scenario where this is done classically. Suppose Alice sends individual
bits to Bob across a channel that is noisy in the following sense: any bit b is flipped to the opposite
bit 1 − b with probability p, independent of the other bits. Such a channel is called the binary
symmetric channel or bit-flip channel, and is an often-used model of classical noise. We can assume
that p 6 1/2, because if p > 1/2, then Bob would do well to flip each bit he receives, making the
effective error probability 1 − p < 1/2 per bit sent. If p = 1/2, then all hope is lost; no information
at all can be carried by the bits; Bob receives independently random bits that are completely
uncorrelated with those that Alice sent. So we’ll assume that p < 1/2 from now on.
To reduce the chances of error per bit, Alice and Bob agree on a binary error-correcting code,
which is some mapping
0 7→ c0 ,
1 7→ c1 ,
where c0 and c1 are strings over the binary alphabet {0, 1} (binary strings) of equal length, called
codewords. Instead of sending each bit b, Alice sends cb instead, and Bob decodes what he receives
to (hopefully) recover b. An obvious error-correcting code is
0 7→ 000,
1 7→ 111,
which we’ll call the majority-of-3 code26 . Alice wants to send a bit b (the plaintext or cleartext) to Bob,
so she sends bbb across the channel. When Bob receives the possibly garbled string xyz of three
bits from Alice, he decodes xyz to get the bit c as follows:
c := majority(x, y, z).
The bit b was decoded successfully iff c = b. What is the failure probability, i.e., the probability
that c ≠ b due to unrecoverable errors? There will be a failure iff at least two of the three bits were
flipped in transit. Since each is flipped with probability p independent of the others, we have
Pr[failure] = 3p²(1 − p) + p³ = 3p² − 2p³.
The first term in the middle is the probability that exactly two of the three bits were flipped, and
the second term in the middle is the probability that all three bits were flipped. It is easy to see
that Pr[failure] < p if p < 1/2, and so this code reduces the probability of error per plaintext bit
from no encoding at all. Finally, note that Pr[failure] = O(p²) as p tends to 0, and so for tiny p ≪ 1,
the failure probability is reduced by a considerable factor.
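The failure probability is a one-liner to tabulate (the sample values of p are arbitrary):

```python
# Failure probability of the classical majority-of-3 code: at least two
# of the three independently flipped bits.
def fail(p):
    return 3 * p**2 * (1 - p) + p**3      # = 3p^2 - 2p^3

for p in [0.25, 0.1, 0.01]:
    print(p, fail(p))                     # fail(p) < p for every p < 1/2
```

At p = 0.01 the failure probability drops to 0.000298, illustrating the O(p²) improvement.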
26 This is an example of a repetition code.
Figure 11: The three-qubit quantum majority-of-3 code. An arbitrary one-qubit state |ψ⟩ = α|0⟩ +
β|1⟩ is encoded as |ψ_L⟩ = α|0_L⟩ + β|1_L⟩ = α|000⟩ + β|111⟩.
The Quantum Bit-Flip Channel. Now we can “quantize” the scheme above. Suppose Alice
sends qubits one at a time to Bob across a noisy quantum channel that we will call the quantum
bit-flip channel. In the quantum bit-flip channel, a Pauli X operator is applied to each transmitted
qubit with probability p < 1/2, independently for each qubit. The corresponding quantum
channel is thus
E(ρ) := (1 − p)ρ + pXρX, (103)
whose set of Kraus operators is {√(1 − p) I, √p X}.27 Suppose that Alice sends some unencoded
one-qubit pure state |ψ⟩⟨ψ| through the bit-flip channel E. Ideally, Bob wants to receive |ψ⟩⟨ψ|, but
in reality, Bob receives ρ′ := E(|ψ⟩⟨ψ|) = (1 − p)|ψ⟩⟨ψ| + pX|ψ⟩⟨ψ|X. The fidelity between Alice's
sent state and Bob's received state is, by Equation (102),
F(|ψ⟩⟨ψ|, ρ′) = √(⟨ψ|ρ′|ψ⟩) = √((1 − p) + p|⟨ψ|X|ψ⟩|²) ≥ √(1 − p),
with equality holding if |ψ⟩ = |0⟩ or |ψ⟩ = |1⟩. So the fidelity without encoding can be as low as
√(1 − p).
Now suppose that Alice and Bob employ a quantum version of the majority-of-3 code. Alice
encodes each plaintext qubit she sends to Bob as a three-qubit code state using the map
|0⟩ ↦ |0_L⟩ := |000⟩,
|1⟩ ↦ |1_L⟩ := |111⟩,
extended to all one-qubit pure states by linearity. Here the subscript “L” stands for “logical”—three
physical qubits are being used to encode one logical (uncoded) qubit. Figure 11 shows how Alice
encodes a single qubit in state |ψi := α|0i + β|1i as a three-qubit state |ψL i := α|0L i + β|1L i. |ψL i
lies in the code space, i.e., the two-dimensional subspace of the eight-dimensional Hilbert space of
three qubits spanned by |0L i and |1L i. The three qubits in state |ψL i are sent through the channel,
and (we assume) each qubit is subjected to the E of Equation (103) independently of the other two.
Thus the channel yields the output state
σ := E^{⊗3}(|ψ_L⟩⟨ψ_L|).
Bob receives σ and wants to decode it to (hopefully) recover |ψi. Now some issues arise that
aren’t a problem in the classical case. Most importantly, Bob cannot just measure the physical
qubits he receives, since this will destroy the superposition making up |ψi. In fact, Bob’s error
correction operation cannot give him any classical information about |ψi; any such information
would disrupt |ψi. Instead, Bob can measure what kind of error occurred (if any) and correct
the error directly without disturbing |ψi. The type of error is called the error syndrome. Bob’s
decoding is a two-step process: First, Bob will measure the error syndrome, i.e., which bit (if any)
was flipped, without gaining any knowledge of what the values of the bit were before and after.
Second, knowing which qubit was flipped, Bob applies an X gate to that qubit, and this will allow
him to recover |ψi with high probability.
To measure the error syndrome, Bob makes a four-outcome projective measurement on his
three received qubits using the four projectors
P₀ = |000⟩⟨000| + |111⟩⟨111|,
P₁ = |100⟩⟨100| + |011⟩⟨011|,
P₂ = |010⟩⟨010| + |101⟩⟨101|,
P₃ = |001⟩⟨001| + |110⟩⟨110|.
P0 , . . . , P3 form a complete set of projectors, and each Pj is a two-dimensional projector. P0 projects
onto the code space and corresponds to the outcome of either no qubits flipped or all three qubits
flipped. P1 corresponds to the outcome of either the first qubit flipped and the other two left alone,
or the other two flipped and the first left alone. P2 and P3 are similar for the second and third
qubits, respectively. Note that, whatever the state was before the syndrome measurement, the
post-measurement state is in one of the four subspaces projected onto by P0 , . . . , P3 , respectively.
Let j ∈ {0, 1, 2, 3} be the outcome of Bob’s syndrome measurement, above. After the measure-
ment, Bob tries to recover |ψL i as follows: if j = 0, then Bob assumes that no qubits were flipped
(which is way more likely than all three being flipped), and so he does nothing; if 1 6 j 6 3, then
Bob assumes that the jth qubit was flipped (which is somewhat more likely than the other two
being flipped), and so he flips the jth qubit back by applying an X gate to it. No matter what qubits
were flipped in the channel, Bob has a state in the code space after the correction. If at most one
qubit was flipped, then Bob has |ψL i, and the recovery is successful. If more than one qubit was
flipped, then Bob has the state XL |ψL i, where XL is some three-qubit operator that swaps |0L i with
|1L i, and the recovery failed. (Bob doesn’t know at this point whether he succeeded or failed.)
In a moment, we’ll see in detail how Bob can perform these steps, but once he has |ψL i—and
if he really wants to—he can convert |ψL i back into |ψi by applying the circuit
tr
tr
which is the inverse of Alice’s circuit of Figure 11. The two gates labeled “tr” are what I call trace
gates. A trace gate just signifies that a qubit is no longer useful and is to be ignored, i.e., traced
Figure 12: Bob’s error-recovery circuit for the quantum bit-flip channel. The middle three qubits
are what he receives from Alice, and the two outer qubits are ancillæ used in the syndrome
measurement.
out. So mathematically, a trace gate corresponds to a partial trace. Assuming the input state to the
circuit is |ψL i, the first qubit of the output will be in state |ψi. The traced-out qubits will both hold
|0i if the input state is in the code space. Bob usually does not want to recover |ψi by decoding
if the encoded state will be used for further computation, because those computations are more
fault-tolerant using the encoded state.
A circuit for Bob’s syndrome measurement and subsequent correction is shown in Figure 12.
Bob received the three-qubit state σ in the middle three qubits. His syndrome measurement is
split into two binary measurements: He first measures whether the first two of the three qubits
from Alice are different. The value b1 measured on the upper ancilla will be 1 iff they are, and 0
otherwise. In other words, b1 gives the parity (sum modulo 2) of the values of the first two qubits.
Similarly, the lower ancilla measurement is 1 iff the second two qubits are different. To correct the
state, Bob combines these two Boolean values to determine which qubit value, if any, is different
from the other two, and applies a classically controlled X gate to this qubit.
Exercise 26.1 Show mathematically that the syndrome measurement portion of Figure 12 is the
same as the projective measurement {P0 , P1 , P2 , P3 } described earlier. What values of b1 b2 corre-
spond to which Pj ?
Bob’s entire recovery process in Figure 12 can be described as a quantum channel R that maps
three-qubit states to three-qubit states: For input state σ, we have
R(σ) = P₀σP₀ + ∑_{j=1}^3 X_jP_jσP_jX_j,    (104)
where Xj is the Pauli X gate applied to the jth qubit. That is,
X1 = X ⊗ I ⊗ I,
X2 = I ⊗ X ⊗ I,
X3 = I ⊗ I ⊗ X.
Thus the state after Bob's recovery is
τ := R(σ) = R(E^{⊗3}(|ψ_L⟩⟨ψ_L|)).
To get a handle on what τ is, first notice that |ψ_L⟩⟨ψ_L| is a linear combination of operators of the form
|a_L⟩⟨b_L| = |aaa⟩⟨bbb| for a, b ∈ {0, 1}. By Equation (103) we have E(|a⟩⟨b|) = (1 − p)|a⟩⟨b| + p|ā⟩⟨b̄|,
where we let ā := 1 − a and b̄ := 1 − b. Then,
E^{⊗3}(|aaa⟩⟨bbb|) = [E(|a⟩⟨b|)]^{⊗3}
= [(1 − p)|a⟩⟨b| + p|ā⟩⟨b̄|]^{⊗3}
= (1 − p)³|aaa⟩⟨bbb|
+ (1 − p)²p (|āaa⟩⟨b̄bb| + |aāa⟩⟨bb̄b| + |aaā⟩⟨bbb̄|)
+ (1 − p)p² (|aāā⟩⟨bb̄b̄| + |āaā⟩⟨b̄bb̄| + |āāa⟩⟨b̄b̄b|)
+ p³|āāā⟩⟨b̄b̄b̄|.
(Alternatively, we can expand E^{⊗3}(ρ) for any three-qubit operator ρ to get an operator-sum ex-
pression for E^{⊗3}:
E^{⊗3}(ρ) = ∑_{j₁,j₂,j₃∈{1,2}} (K_{j₁} ⊗ K_{j₂} ⊗ K_{j₃}) ρ (K_{j₁} ⊗ K_{j₂} ⊗ K_{j₃})^*,
where K₁ := √(1 − p) I and K₂ := √p X are the Kraus operators of E;
then plug in |aaa⟩⟨bbb| for ρ to get the same expression for E^{⊗3}(|aaa⟩⟨bbb|).) Applying the R of
Equation (104) to E^{⊗3}(|aaa⟩⟨bbb|) above, we get, after much simplification,
R(E^{⊗3}(|a_L⟩⟨b_L|)) = (1 − 3p² + 2p³)|aaa⟩⟨bbb| + (3p² − 2p³)|āāā⟩⟨b̄b̄b̄|
= (1 − 3p² + 2p³)|a_L⟩⟨b_L| + (3p² − 2p³)X₁X₂X₃|a_L⟩⟨b_L|X₁X₂X₃.
Exercise 26.2 Verify this last equation. This may be tedious, but it is good practice.
Since this equation holds for all four bowtie operators |a_L⟩⟨b_L|, by linearity, we get
τ = (1 − 3p² + 2p³)|ψ_L⟩⟨ψ_L| + (3p² − 2p³)X₁X₂X₃|ψ_L⟩⟨ψ_L|X₁X₂X₃.
The first term represents Bob’s successful recovery of |ψL i, and this occurs with probability 1 −
3p2 + 2p3 , which is greater than 1 − p if p < 1/2. In fact, it is 1 − O(p2 ), which is significant if p is
small. For the fidelity, we get
F(|ψ_L⟩⟨ψ_L|, τ) = √(⟨ψ_L|τ|ψ_L⟩) ≥ √(1 − 3p² + 2p³) > √(1 − p),
and so the minimum fidelity of τ with |ψ_L⟩⟨ψ_L| is strictly greater than the minimum fidelity of σ
with |ψ_L⟩⟨ψ_L|. So, recovery improves the worst-case fidelity.
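The whole send-and-recover pipeline is small enough to simulate directly with density matrices. The Python/NumPy sketch below (the amplitudes a, b and p = 0.1 are hypothetical) applies E^{⊗3} and then R, and checks the claimed form of τ:

```python
import numpy as np
from itertools import product

I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])
p = 0.1                                   # hypothetical flip probability

def kron3(A, B, C):
    return np.kron(A, np.kron(B, C))

# Encoded state |psi_L> = a|000> + b|111> for a hypothetical |psi> = a|0> + b|1>.
a, b = 0.6, 0.8
psiL = np.zeros(8)
psiL[0], psiL[7] = a, b
rho = np.outer(psiL, psiL)

# E^{(x)3}: each qubit flipped independently with probability p (Equation (103)).
kraus1 = [np.sqrt(1 - p) * I2, np.sqrt(p) * X]
sigma = sum(kron3(K1, K2, K3) @ rho @ kron3(K1, K2, K3).T
            for K1, K2, K3 in product(kraus1, repeat=3))

# Bob's recovery channel R (Equation (104)).
def proj(i, j):
    P = np.zeros((8, 8))
    P[i, i] = P[j, j] = 1.0
    return P

P = [proj(0, 7), proj(4, 3), proj(2, 5), proj(1, 6)]   # P0..P3
Xs = [np.eye(8), kron3(X, I2, I2), kron3(I2, X, I2), kron3(I2, I2, X)]
tau = sum(Xs[j] @ P[j] @ sigma @ P[j] @ Xs[j] for j in range(4))

XXX = kron3(X, X, X)
expected = (1 - 3*p**2 + 2*p**3) * rho + (3*p**2 - 2*p**3) * (XXX @ rho @ XXX)
print(np.allclose(tau, expected))  # True
```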
The Quantum Phase-Flip Channel. Bit flips are not the only possible errors in a quantum
channel. Consider the one-qubit phase-flip channel given by
F(ρ) := (1 − p)ρ + pZρZ,
which applies a Pauli Z operator to the qubit (thus flipping the relative phase between |0⟩ and |1⟩
by a factor of −1) with probability p < 1/2.
This kind of channel has no classical analogue, but in a very real sense it is closely analogous
to the quantum bit-flip channel—the two channels are “unitarily conjugate” to each other via the
Hadamard H operator. Here's what I mean by that: Since HX = ZH and XH = HZ, we have
F(ρ) = (1 − p)ρ + pZρZ = H((1 − p)HρH + pXHρHX)H = H(E(HρH))H,
and similarly E(ρ) = H(F(HρH))H.28 In particular, sending an unencoded pure state |ψ⟩⟨ψ| through
F gives fidelity
√(⟨ψ|F(|ψ⟩⟨ψ|)|ψ⟩) = √((1 − p) + p|⟨ψ|Z|ψ⟩|²) ≥ √(1 − p),
with equality holding if |ψ⟩ = H|0⟩ or |ψ⟩ = H|1⟩. So the worst-case fidelity is the same as with the
bit-flip channel.
To get an error-correcting code for the phase-flip channel, we take our majority-of-3 construction
for the bit-flip channel and insert Hadamard gates in the right places. Recall that we’ve defined
|+i := H|0i and |−i := H|1i. Alice now encodes her one-qubit pure state |ψi = α|0i + β|1i as
|ψ′_L⟩ := H^{⊗3}(α|000⟩ + β|111⟩) = α|+++⟩ + β|−−−⟩ = α|0′_L⟩ + β|1′_L⟩,
where |0′_L⟩ := |+++⟩ and |1′_L⟩ := |−−−⟩. Bob measures the error syndrome with projectors
Q₀, Q₁, Q₂, Q₃, where each Q_j := H^{⊗3}P_jH^{⊗3}.
If Bob sees that the relative phase of one of the qubits is different from that of the other two, then
Bob assumes that the qubit’s phase was flipped and applies a Z gate to that qubit. The circuit for
doing all this is shown in Figure 14.
The quantum channel corresponding to Bob's whole procedure is given by
S(σ) := Q₀σQ₀ + ∑_{j=1}^3 Z_jQ_jσQ_jZ_j = H^{⊗3}P₀H^{⊗3}σH^{⊗3}P₀H^{⊗3} + ∑_{j=1}^3 Z_jH^{⊗3}P_jH^{⊗3}σH^{⊗3}P_jH^{⊗3}Z_j
28 Put more succinctly, U ∘ F = E ∘ U and U ∘ E = F ∘ U, where U is the one-qubit unitary quantum channel that maps
ρ ↦ HρH.
Figure 14: Bob’s error recovery procedure for the phase-flip channel.
= H^{⊗3}P₀H^{⊗3}σH^{⊗3}P₀H^{⊗3} + ∑_{j=1}^3 H^{⊗3}X_jP_jH^{⊗3}σH^{⊗3}P_jX_jH^{⊗3}
= H^{⊗3}(P₀(H^{⊗3}σH^{⊗3})P₀ + ∑_{j=1}^3 X_jP_j(H^{⊗3}σH^{⊗3})P_jX_j)H^{⊗3} = H^{⊗3}(R(H^{⊗3}σH^{⊗3}))H^{⊗3}
and thus
H⊗3 (S(σ))H⊗3 = R(H⊗3 σH⊗3 ) (105)
for any three-qubit operator σ, where R is the bit-flip recovery channel of Equation (104). That is,
S is unitarily conjugate to R via H⊗3 . In a similar fashion, we can get that
H⊗3 (F⊗3 (ρ))H⊗3 = E⊗3 (H⊗3 ρH⊗3 ) (106)
for any three-qubit operator ρ.
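Equation (105) can be verified numerically on an arbitrary three-qubit operator (Python/NumPy sketch; the P_j are the syndrome projectors from the bit-flip code above):

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])
Z = np.diag([1., -1.])
H = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)

def kron3(A, B, C):
    return np.kron(A, np.kron(B, C))

H3 = kron3(H, H, H)

def proj(i, j):
    P = np.zeros((8, 8))
    P[i, i] = P[j, j] = 1.0
    return P

P = [proj(0, 7), proj(4, 3), proj(2, 5), proj(1, 6)]   # bit-flip syndrome projectors
Q = [H3 @ Pj @ H3 for Pj in P]                         # Q_j = H^{(x)3} P_j H^{(x)3}
Xs = [np.eye(8), kron3(X, I2, I2), kron3(I2, X, I2), kron3(I2, I2, X)]
Zs = [np.eye(8), kron3(Z, I2, I2), kron3(I2, Z, I2), kron3(I2, I2, Z)]

def R(s):   # bit-flip recovery, Equation (104)
    return sum(Xs[j] @ P[j] @ s @ P[j] @ Xs[j] for j in range(4))

def S(s):   # phase-flip recovery
    return sum(Zs[j] @ Q[j] @ s @ Q[j] @ Zs[j] for j in range(4))

rng = np.random.default_rng(5)
sigma = rng.standard_normal((8, 8))     # an arbitrary three-qubit operator
print(np.allclose(H3 @ S(sigma) @ H3, R(H3 @ sigma @ H3)))  # True
```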
Letting τ′ := S(F^{⊗3}(|ψ′_L⟩⟨ψ′_L|)) and stringing these channels together, (105) and (106) give us
H^{⊗3}τ′H^{⊗3} = R(E^{⊗3}(H^{⊗3}|ψ′_L⟩⟨ψ′_L|H^{⊗3})) = R(E^{⊗3}(|ψ_L⟩⟨ψ_L|)) = τ,
Figure 15: The nine-qubit Shor code. This concatenates the phase-flip and bit-flip codes.
or equivalently,
τ′ = H^{⊗3}τH^{⊗3} = (1 − 3p² + 2p³)|ψ′_L⟩⟨ψ′_L| + (3p² − 2p³)Z₁Z₂Z₃|ψ′_L⟩⟨ψ′_L|Z₁Z₂Z₃.
Thus we get the same success probability here as with the bit-flip channel, and the fidelity is at
worst the same as it was then:
F(ψL0 ψL0 , τ 0 ) = √(hψL0 |τ 0 |ψL0 i) ≥ √(1 − 3p2 + 2p3 ) .
The Shor Code. We can combine the bit-flip and phase-flip error correcting codes above to correct
against both kinds of errors, even on the same qubit. As a bonus, we’ll show that the resulting
code corrects against arbitrary errors on a single qubit. A typical one-qubit channel that has all
three kinds of errors (bit flip, phase flip, and combined bit and phase flip) is called the depolarizing
channel, and it maps
ρ 7→ D(ρ) := (1 − p)ρ + (p/3)(XρX + ZρZ + ZXρXZ) = (1 − p)ρ + (p/3)(XρX + YρY + ZρZ). (107)
This channel leaves the qubit alone with probability 1 − p > 1/2 and produces each of the three
possible errors with the same probability p/3.
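As a quick numerical sanity check (not part of the notes; the matrices below are the standard representations), we can simulate the depolarizing channel of Equation (107) and verify that its Kraus operators √(1 − p) I and √(p/3) X, Y, Z satisfy the completeness relation, so that D is trace-preserving:

```python
import numpy as np

# Pauli matrices in the computational basis.
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def depolarize(rho, p):
    """The depolarizing channel D of Equation (107)."""
    return (1 - p) * rho + (p / 3) * (X @ rho @ X + Y @ rho @ Y + Z @ rho @ Z)

p = 0.1
# Kraus operators: sqrt(1-p) I and sqrt(p/3) X, Y, Z.
kraus = [np.sqrt(1 - p) * I] + [np.sqrt(p / 3) * K for K in (X, Y, Z)]
completeness = sum(K.conj().T @ K for K in kraus)
assert np.allclose(completeness, I)       # the channel is complete

rho = np.array([[0.75, 0.25], [0.25, 0.25]], dtype=complex)  # an arbitrary state
out = depolarize(rho, p)
assert np.isclose(np.trace(out).real, 1.0)  # trace is preserved
```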
To help correct against all three types of errors, Alice first encodes a single qubit using the three-
qubit phase-flip code of Figure 13, then she encodes each of the three qubits using the majority-of-3
code for the bit-flip channel, shown in Figure 11. The resulting encoding circuit, shown in Figure 15,
produces the nine-qubit Shor code, named after its inventor, Peter Shor. Such a code is called a
concatenated code, in that it combines (concatenates) two or more simpler codes into a single code.
Using the Shor code, Alice encodes a single qubit in state |ψi = α|0i + β|1i as the nine-qubit state
|ψS i := α|0S i + β|1S i, where
|0S i := (1/(2√2)) (|000i + |111i)⊗3 = |+L i⊗3 , (108)
|1S i := (1/(2√2)) (|000i − |111i)⊗3 = |−L i⊗3 , (109)
where we define the three-qubit states |+L i := (|000i + |111i)/√2 and |−L i := (|000i − |111i)/√2.
The nine qubits are naturally divided into three subblocks of three qubits each, which I’ll call
3-blocks. Alice sends Bob |ψS ihψS | through a channel (e.g., the depolarizing channel) that may
cause one of the three errors on each of the nine qubits with some probability independently of
the others. If more than one qubit is affected, then the recovery may not work, and so we hope
that the probability of this happening is low.
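A minimal numpy sketch of Equations (108) and (109): build the two nine-qubit code words and check that they are orthonormal. (This is my own illustration, not from the notes, and the variable names are mine.)

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

def kron_all(vs):
    """Tensor product of a list of state vectors."""
    out = np.array([1], dtype=complex)
    for v in vs:
        out = np.kron(out, v)
    return out

# The three-qubit blocks (|000> ± |111>)/sqrt(2).
plusL  = (kron_all([ket0] * 3) + kron_all([ket1] * 3)) / np.sqrt(2)
minusL = (kron_all([ket0] * 3) - kron_all([ket1] * 3)) / np.sqrt(2)

# The nine-qubit Shor code words of Equations (108) and (109).
zeroS = kron_all([plusL] * 3)
oneS  = kron_all([minusL] * 3)

assert np.isclose(np.linalg.norm(zeroS), 1.0)
assert np.isclose(np.linalg.norm(oneS), 1.0)
assert np.isclose(np.vdot(zeroS, oneS).real, 0.0)  # the code words are orthogonal
```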
Bob receives the nine-qubit state σ sent from Alice, and we’ll assume (with high probability)
that at most one of the nine qubits endured either a bit flip, phase flip, or both. For example,
suppose Alice sends |0S i = |+L i⊗3 to Bob. If the first qubit is bit-flipped en route, then Bob
receives (1/√2)(|100i + |011i) ⊗ |+L i⊗2 . If the first qubit is phase-flipped, then Bob gets the state
(1/√2)(|000i − |111i) ⊗ |+L i⊗2 = |−L i ⊗ |+L i⊗2 . (Note that a phase flip in a qubit contributes
an overall phase flip in its 3-block; phase flips on two different qubits in the same block would
cancel each other.) Finally, if the first qubit is bit-flipped and then phase-flipped, then Bob gets
(1/√2)(−|100i + |011i) ⊗ |+L i⊗2 .
To recover, Bob first applies the bit-flip error recovery operation R of Figure 12 and Equa-
tion (104) to each of the three 3-blocks independently. This will correct up to a single bit-flip error
within each 3-block. Crucially, this intrablock bit-flip recovery works regardless of whether there
was also a phase-flip in the 3-block. After Bob corrects bit flips within each 3-block, he must then
correct phase flips. He does this by comparing phase differences between adjacent 3-blocks, either
finding which 3-block’s phase doesn’t match the other two and flipping that 3-block’s phase back,
or else determining that the phases of the 3-blocks are all equal and nothing needs to be done.
A circuit that accomplishes this phase-flip recovery portion of the overall recovery is shown in
Figure 16. In essence, Bob’s procedure first applies a bank of H gates to turn phase flips into bit
flips, then it measures the bit flips (bit parity), then converts the state back into phase flips (more
H gates), which it can then correct with Z gates based on what bit flips were measured. To see that
this all works, define
|eveni := (1/2)(|000i + |011i + |101i + |110i),
|oddi := (1/2)(|100i + |010i + |001i + |111i).
|eveni is a superposition of all computational basis states of three qubits with an even number
of 1’s; |oddi is a superposition of the other computational basis states. One can readily check
that |eveni = H⊗3 |+L i and that |oddi = H⊗3 |−L i. Suppose for example that Alice sends |1S i =
|−L i|−L i|−L i to Bob (we are suppressing the ⊗ operator symbol for now), and there is a phase
flip on some qubit in the first 3-block (it doesn’t matter which). So after correcting bit flips, Bob’s
state is now |+L i|−L i|−L i, which feeds into the circuit of Figure 16. Applying the Hadamard gates
yields the state |eveni|oddi|oddi. As a result of the CNOTs, the upper ancilla’s bit value will flip
an even + odd = odd number of times, and so b1 = 1. The lower ancilla’s bit value will flip an
odd + odd = even number of times, and so b2 = 0. The next layer of Hadamards converts the
state back to |+L i|−L i|−L i, and then Bob recovers by applying a Z gate to the first qubit, yielding
|−L i|−L i|−L i = |1S i.
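The identities |eveni = H⊗3 |+L i and |oddi = H⊗3 |−L i used in this argument are easy to confirm numerically; a sketch (my own, with assumed variable names):

```python
import numpy as np

H1 = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
H3 = np.kron(np.kron(H1, H1), H1)          # H tensored with itself three times

def basis(bits):
    """Computational basis state of three qubits, e.g. basis('011')."""
    v = np.zeros(8, dtype=complex)
    v[int(bits, 2)] = 1
    return v

plusL  = (basis('000') + basis('111')) / np.sqrt(2)
minusL = (basis('000') - basis('111')) / np.sqrt(2)

even = (basis('000') + basis('011') + basis('101') + basis('110')) / 2
odd  = (basis('100') + basis('010') + basis('001') + basis('111')) / 2

assert np.allclose(H3 @ plusL, even)   # |even> = H^{⊗3} |+L>
assert np.allclose(H3 @ minusL, odd)   # |odd>  = H^{⊗3} |-L>
```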
Let’s look briefly at the quantum channels involved. Let T be the quantum channel corre-
sponding to Bob’s entire recovery procedure for the Shor code.
[Circuit diagram: a bank of nine Hadamard gates, parity measurements comparing adjacent 3-blocks to produce bits b1 and b2 , a second bank of nine Hadamards, and Z corrections on one qubit of each 3-block controlled by b1 ∧ ¬b2 , b1 ∧ b2 , and ¬b1 ∧ b2 .]
Figure 16: Recovering from a phase flip with the Shor code. When this circuit starts, Bob assumes
that he has already corrected any bit flip in a 3-block if there was one, and so the incoming state is
a linear combination of the eight states |±L i|±L i|±L i.
Exercise 26.3 (Challenging) Give an expression for T applied to an arbitrary nine-qubit state ρ.
Make your expression as succinct as possible but still mathematically precise. You may use the
following notations for operators without having to expand them:
• Let R0 , R1 , R2 , R3 represent the projectors for the measurement performed in Figure 16, corre-
sponding to the outcomes 00, 10, 11, and 01, respectively, for b1 b2 .
• For j ∈ {0, 1, 2}, let P0^(j) , P1^(j) , P2^(j) , P3^(j) be the projectors used for the bit-flip syndrome mea-
surement in the (j + 1)st 3-block, as described in the bit-flip channel discussion.
• For any single-qubit operator A and k ∈ {1, . . . , 9}, let Ak be the nine-qubit operator that
applies A to the kth qubit and leaves the other qubits alone.
Suppose the channel between Alice and Bob is the one-qubit depolarizing channel D of Equa-
tion (107). If Alice and Bob use the Shor code, then nine qubits will be transferred per single
plaintext qubit. For an arbitrary nine-qubit state ρ, the effect of D on ρ is then
D⊗9 (ρ) = (1 − p)9 ρ + (1 − p)8 (p/3) ∑_{j=1}^{9} (Xj ρXj + Yj ρYj + Zj ρZj ) + O(p2 ).
The terms hidden in the “O(p2 )” represent errors on two or more qubits, from which Bob
might not recover. Bob, however, can recover from any of the single-qubit errors shown in the
expression above, provided ρ is in the code space of the Shor code. That is, if ρ = |ψS ihψS | for some
one-qubit state |ψi, then we’ve shown that T(Xj ρXj ) = T(Yj ρYj ) = T(Zj ρZj ) = ρ for all 1 ≤ j ≤ 9,
and thus the final error-corrected state is
υ := T(D⊗9 (ρ)) = ((1 − p)9 + 9p(1 − p)8 )ρ + O(p2 ) = (1 − p)8 (1 + 8p)ρ + O(p2 ).
The hidden terms are all of the form KρK∗ = K|ψS ihψS |K∗ for some Kraus operator K, each of
which contributes hψS |K|ψS ihψS |K∗ |ψS i = |hψS |K|ψS i|2 ≥ 0, and thus the fidelity is at least
F(ρ, υ) = √(hψS |υ|ψS i) ≥ √((1 − p)8 (1 + 8p)) .
Exercise 26.4 Show that using the Shor code, Bob recovers from X2 X4 X9 Z1 Z2 Z3 Z4 Z5 Z7 Z9 .
Exercise 26.5 (Challenging) What is the worst-case fidelity of sending an unencoded one-qubit
pure state |ψihψ| through the depolarizing channel D? About how small does p have to be so that
the worst-case estimate of the fidelity for the Shor code, above, is better than this? A numerical
approximation will suffice.
27 Week 13: Error correction (cont.)
Quantum Error Correction: The General Theory. Here we want to determine, in the most
general terms that we can, when it is possible to recover from a noisy quantum channel through
the use of an error correcting code. Let H be the Hilbert space of states that are to be sent through
some noisy channel. We’ll assume that information sent through the channel is encoded into states
in some linear subspace C ⊆ H, i.e., the code space. Let P be the projector that projects orthogonally
onto C. We’ll assume that the noisy channel is modeled by some (possibly incomplete) quantum
error channel E ∈ T(H). For example, in our discussion of the Shor code, H is the 2^9 -dimensional
space of nine qubits, and C is the 2-dimensional subspace spanned by the vectors |0S i and |1S i; the
noisy channel E of interest may be the portion of the depolarizing channel D⊗9 in which at most
one qubit is affected. This channel sends a state ρ ∈ L(H) to
E(ρ) := (1 − p)9 ρ + (1 − p)8 (p/3) ∑_{j=1}^{9} (Xj ρXj + Yj ρYj + Zj ρZj ), (110)
and represents the portion of D⊗9 from which we know Bob can recover. Note that E is an
incomplete (non-trace-preserving) channel, because we are omitting the terms of D⊗9 where more
than one qubit is subjected to an error, and from which Bob may not be able to recover. The
incompleteness reflects the fact that this unrecoverability happens with nonzero probability.
Back to the general case. We’ll say that a quantum state ρ is in the code space C iff it is a convex
sum of pure states in C, i.e., ρ = ∑_i pi |ψi ihψi | where each |ψi i ∈ C. Equivalently, ρ is in the code
space iff ρ = PρP (equivalently, ρ = Pρ, or equivalently, ρ = ρP, by Exercise 27.1, below).
Exercise 27.1 Prove that the following are equivalent for any projector P and Hermitean operator
A: (1) A = PAP; (2) A = AP; (3) A = PA. [Hint: No decompositions are needed for any of
these—just simple substitutions and taking adjoints.]
The error channel E can be given in operator-sum form by some Kraus operators E1 , . . . , EN ∈
L(H) such that ∑_{j=1}^{N} E∗j Ej ≤ I (inequality because E is not necessarily complete), and
E(ρ) = ∑_{j=1}^{N} Ej ρE∗j
for any ρ ∈ L(H). Suppose that R ∈ T(H) is some (not necessarily complete) quantum channel
representing a recovery procedure. We will say that R successfully recovers from E if (R ◦ E)(ρ) = cρ
for any ρ in C, where c is a real constant depending on ρ, E, and R and satisfying 0 ≤ c ≤ 1. (If
E and R are both complete (hence trace-preserving), then we must have c = 1.) We will say that
E is recoverable if there exists an R that successfully recovers from E. The next theorem gives a
quantitative criterion for when an error channel is recoverable.
Theorem 27.2 Let E be a nonzero, possibly incomplete error channel on L(H) given by Kraus operators
E1 , . . . , EN ∈ L(H). Fix a code space C ⊆ H and let P be the projector projecting onto C. E is recoverable
(with respect to C) if and only if there exists an N × N matrix M such that, for all 1 ≤ i, j ≤ N,
PE∗i Ej P = [M]ij P. (111)
Further, if such an M exists, then M ≥ 0, tr M is the probability that E occurs given any state in C, and a
(complete) recovery channel R exists such that (R ◦ E)(ρ) = (tr M)ρ for any ρ in C.
I call Equation (111) the “peep” condition, because of the left-hand side. We will only be
interested in the backwards implication, giving sufficient conditions for E to be recoverable by
actually constructing a recovery procedure. So we won’t prove the forward implication (the
Nielsen & Chuang textbook does it). Here is some intuition about the forward implication:
Suppose E is recoverable. Let |ψi and |ϕi be any two states in the code space C. Then P|ψi = |ψi
and P|ϕi = |ϕi, and so for all i, j, we have hψ|E∗i Ej |ϕi = hψ|PE∗i Ej P|ϕi. The left-hand side is the
inner product of the vectors Ei |ψi and Ej |ϕi. Equation (111) is equivalent to saying that this value
is proportional to hψ|P|ϕi = hψ|ϕi, where the constant of proportionality ([M]ij ) depends only on
i and j and not on |ψi or |ϕi. One might expect that this is needed, so that the error operators do
not “distort” the code space, i.e., they preserve inner products up to a constant, because to recover,
we must apply a unitary operation to restore the code space undistorted, so that superpositions of
states in the code space are preserved up to a constant.
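Before turning to the proof, the peep condition can be illustrated numerically on the recoverable part of the three-qubit bit-flip channel (this anticipates Exercise 27.3 below; the code and the resulting matrix M are my own check, not from the notes):

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)

def op_on(A, k):
    """A acting on qubit k (0-indexed) of three qubits."""
    ops = [I2, I2, I2]
    ops[k] = A
    return np.kron(np.kron(ops[0], ops[1]), ops[2])

p = 0.2
# Kraus operators of the recoverable part E' of the 3-qubit bit-flip channel.
E = [(1 - p) ** 1.5 * np.eye(8, dtype=complex)] + \
    [(1 - p) * np.sqrt(p) * op_on(X, k) for k in range(3)]

# Code-space projector P = |000><000| + |111><111|.
P = np.zeros((8, 8), dtype=complex)
P[0, 0] = P[7, 7] = 1

# Verify the peep condition P Ei* Ej P = M[i,j] P and recover M.
M = np.zeros((4, 4), dtype=complex)
for i in range(4):
    for j in range(4):
        A = P @ E[i].conj().T @ E[j] @ P
        M[i, j] = np.trace(A) / np.trace(P)
        assert np.allclose(A, M[i, j] * P)   # each P Ei* Ej P is proportional to P

# M comes out diagonal, with the error probabilities on the diagonal.
assert np.allclose(M, np.diag([(1 - p) ** 3] + [(1 - p) ** 2 * p] * 3))
assert np.isclose(np.trace(M).real, (1 - p) ** 3 + 3 * (1 - p) ** 2 * p)
```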
Proof. [backward implication] Suppose that M exists satisfying (111) for all i, j. We can assume
that P ≠ 0; otherwise, the theorem is trivial. Taking the adjoint of each side of (111), we have, for
all 1 ≤ i, j ≤ N,
[M]∗ij P = PE∗j Ei P = [M]ji P,
and so [M]∗ij = [M]ji since P ≠ 0, which means that M is Hermitean. The next thing to do is to
simplify (111) by diagonalizing M. Since M is normal, there is an N × N unitary matrix U and
scalars d1 , . . . , dN ∈ R (the eigenvalues of M) such that U∗ MU = diag(d1 , . . . , dN ). For 1 ≤ k ≤ N,
define
Fk := ∑_{j=1}^{N} [U]jk Ej .
Then
∑_{k=1}^{N} F∗k Fk = ∑_{i,j=1}^{N} ( ∑_k [U]jk [U]∗ik ) E∗i Ej = ∑_{i,j} δji E∗i Ej = ∑_j E∗j Ej ≤ I,
∑_{k=1}^{N} Fk ρF∗k = ∑_{i,j=1}^{N} ( ∑_k [U]ik [U]∗jk ) Ei ρE∗j = ∑_{i,j} δij Ei ρE∗j = ∑_j Ej ρE∗j = E(ρ).
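The computation above says that remixing a set of Kraus operators by a unitary matrix yields the same channel. This is easy to confirm numerically for random operators (a sketch of mine, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_complex(shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

d, N = 2, 3
E = [rand_complex((d, d)) for _ in range(N)]        # arbitrary Kraus operators

# A random N x N unitary U (from a QR decomposition of a complex matrix).
U, _ = np.linalg.qr(rand_complex((N, N)))
# F_k := sum_j U_{jk} E_j, as in the proof.
F = [sum(U[j, k] * E[j] for j in range(N)) for k in range(N)]

A = rand_complex((d, d))
rho = A @ A.conj().T
rho /= np.trace(rho)                                # a random state

channel_E = sum(K @ rho @ K.conj().T for K in E)
channel_F = sum(K @ rho @ K.conj().T for K in F)
assert np.allclose(channel_E, channel_F)   # same channel, remixed Kraus operators
```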
Thus F1 , . . . , FN are also a set of Kraus operators for E. Now Equation (111) becomes, for all
1 ≤ k, ` ≤ N,
PF∗k F` P = ∑_{i,j=1}^{N} [U]∗ik [U]j` PE∗i Ej P = ∑_{i,j} [U∗ ]ki [M]ij [U]j` P = [U∗ MU]k` P = dk δk` P. (112)
In particular, setting ` = k and taking traces,
dk = tr(PF∗k Fk P)/ tr P = hFk P, Fk Pi/ tr P ≥ 0 .
This implies M ≥ 0. Also, for any state ρ in C, the probability that E actually occurs is given by
tr(E(ρ)) = tr(E(PρP)) = ∑_{k=1}^{N} tr(Fk PρPF∗k ) = ∑_k tr(PF∗k Fk Pρ) = ∑_k dk tr(Pρ) = ∑_k dk tr ρ = ∑_k dk = tr M .
Note that if dk = 0 for some k, then hFk P, Fk Pi = 0, and so Fk P = 0. This implies that if ρ is
any state in C, then Fk ρF∗k = Fk PρPF∗k = 0, and so this term is dropped from the operator-sum
expression for E(ρ). Since we only care about the behavior of E on states in C, we can effectively
ignore the cases where dk = 0 and assume instead that all the dk are positive.
By the Polar Decomposition (Theorem B.8 in Section B.3), for each 1 ≤ k ≤ N there is a unitary
Uk ∈ L(H) such that
Fk P = Uk |Fk P| = Uk √(PF∗k Fk P) = √dk Uk P. (113)
Uk rotates C to the subspace Ck that is the image of the projector Pk defined as
Pk := Uk PU∗k = Fk PU∗k / √dk . (114)
The crucial fact that makes E recoverable is that these Ck subspaces are mutually orthogonal:
Pk P` = 0 if k ≠ ` . (115)
To help see what’s going on, it’s worth seeing what happens when E is applied to some
pure state |ψihψ| with |ψi ∈ C. We have |ψi = P|ψi, and so by Equation (113) we have
E(|ψihψ|) = ∑_{k=1}^{N} Fk |ψihψ|F∗k = ∑_k Fk P|ψihψ|PF∗k = ∑_k dk Uk |ψihψ|U∗k = ∑_k dk |ψk ihψk |,
where |ψk i := Uk |ψi ∈ Ck for each k. So E(|ψihψ|) is a mixture of pure states |ψk i in the various
subspaces Ck . We can thus interpret E as mapping |ψi to |ψk i ∈ Ck with probability dk . Since the
Ck are mutually orthogonal, the |ψk i are pairwise orthogonal. To recover, we can first measure
to which Ck the state E(|ψihψ|) belongs. This projective measurement projects to one of the states
|ψk i = Uk |ψi, where k is the outcome of the measurement. Then to correct the error, we simply
apply U∗k to get U∗k |ψk i = |ψi.
Now we describe R formally. R consists of two stages: (1) measure the error syndrome
(i.e., “which Ck ?”), and (2) apply the appropriate (unitary) correction U∗k . By Equation (115),
the projectors P1 , . . . , PN form a set of orthogonal projectors. If this is not a complete set, i.e., if
∑_{k=1}^{N} Pk ≠ I, then we add one more projector PN+1 := I − ∑_{k=1}^{N} Pk to the set to make it complete.
Otherwise, we set PN+1 := 0. The syndrome measurement is then a projective measurement with
the Pk . (If the outcome is N + 1, which signifies “none of the above,” then we really don’t know
what to do, so we’ll give up and define UN+1 := I for completeness. If the state being measured is
the result of applying E to some state in the code space C, however, then outcome N + 1 will never
actually occur.)
So we define, for any σ ∈ L(H),
R(σ) := ∑_{k=1}^{N+1} U∗k Pk σPk Uk .
Thus R has Kraus operators U∗k Pk for 1 ≤ k ≤ N + 1. We first check that R is complete:
∑_{k=1}^{N+1} (U∗k Pk )∗ (U∗k Pk ) = ∑_{k=1}^{N+1} Pk Uk U∗k Pk = ∑_{k=1}^{N+1} Pk = I .
It remains to check that R successfully recovers from E for arbitrary states in C—not just pure
states. The following equation will make things easier: for all 1 ≤ k, ` ≤ N,
U∗k Pk F` P = U∗k Pk∗ F` P = U∗k Uk PF∗k F` P / √dk = PF∗k F` P / √dk = √dk δk` P , (116)
using Equations (112) and (114). Also, for 1 ≤ ` ≤ N, we have PN+1 P` = 0 by orthogonality, and
thus, using Equations (113) and (114),
U∗N+1 PN+1 F` P = PN+1 F` P = √d` PN+1 U` P = √d` PN+1 P` U` = 0 , (117)
and so Equation (116) holds for k = N + 1 as well.
So finally, if ρ is in C, we have, by Equations (116) and (117),
R(E(ρ)) = R(E(PρP)) = ∑_{k=1}^{N+1} ∑_{`=1}^{N} U∗k Pk F` PρPF∗` Pk Uk
= ∑_k ∑_` (U∗k Pk F` P) ρ (U∗k Pk F` P)∗
= ∑_k ∑_` (√dk δk` P) ρ (√dk δk` P)
= ( ∑_k ∑_` dk δk` ) PρP
= (tr M)ρ .
2
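The recovery construction in this proof can be exercised end to end on the three-qubit bit-flip code, where the Fk are already “diagonal” and the polar-decomposition unitaries come out as U0 = I and Uk = Xk . A numerical sketch under those assumptions (my own code, not from the notes):

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)

def op_on(A, k):
    """A acting on qubit k (0-indexed) of three qubits."""
    ops = [I2] * 3
    ops[k] = A
    return np.kron(np.kron(ops[0], ops[1]), ops[2])

p = 0.2
Id8 = np.eye(8, dtype=complex)
# Kraus operators F_k of the recoverable bit-flip channel, and the unitaries U_k
# from their polar decompositions: F_k P = sqrt(d_k) U_k P.
F = [(1 - p) ** 1.5 * Id8] + [(1 - p) * np.sqrt(p) * op_on(X, k) for k in range(3)]
U = [Id8] + [op_on(X, k) for k in range(3)]

P = np.zeros((8, 8), dtype=complex)
P[0, 0] = P[7, 7] = 1                          # code-space projector
Pk = [u @ P @ u.conj().T for u in U]           # syndrome projectors (Eq. 114)
assert np.allclose(sum(Pk), Id8)               # already complete, so P_{N+1} = 0

def E(rho):    # the recoverable part of the bit-flip channel
    return sum(K @ rho @ K.conj().T for K in F)

def R(sigma):  # the recovery channel constructed in the proof
    return sum(u.conj().T @ pk @ sigma @ pk @ u for u, pk in zip(U, Pk))

# A state in the code space: |psi> = 0.6|000> + 0.8|111>.
psi = np.zeros(8, dtype=complex)
psi[0], psi[7] = 0.6, 0.8
rho = np.outer(psi, psi.conj())

trM = (1 - p) ** 3 + 3 * (1 - p) ** 2 * p
assert np.allclose(R(E(rho)), trM * rho)       # (R ∘ E)(rho) = (tr M) rho
```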
Exercise 27.3 (Challenging) Recall the quantum bit-flip channel for a single qubit:
E(ρ) := (1 − p)ρ + pXρX.
Also recall the recoverable portion of the three-qubit bit-flip channel:
E 0 (ρ) = (1 − p)3 ρ + (1 − p)2 p ∑_{j=1}^{3} Xj ρXj .
Show directly that E 0 , with Kraus operators (1 − p)3/2 I, (1 − p)√p X1 , (1 − p)√p X2 , (1 − p)√p X3 ,
satisfies the peep condition (111) of Theorem 27.2, where C is the usual majority-of-3 code space
given by the projector P = |000ih000| + |111ih111|. What is the matrix M? What are the Pk and Uk ?
Is the R constructed by the Theorem the same as it was before?
Discretization of Errors. The great thing about the Shor code is that it can recover from an arbitrary
single-qubit error. There are many possible single-qubit errors, as there is a continuum of possible
one-qubit Kraus operators. Yet they are all corrected by the Shor code, with no additional work.
This happy fact follows from the following two general theorems:
Theorem 27.4 Suppose C ⊆ H is the code space for a quantum code, P is the projector projecting
orthogonally onto C, E ∈ T(H) is a not necessarily complete quantum error channel with Kraus operators
F1 , . . . , FN , and R ∈ T(H) is a quantum channel with Kraus operators R1 , . . . , RM such that, for any
1 ≤ j ≤ N there exists a scalar dj ≥ 0 such that
Rk Fj P = √dj δkj P (118)
for any 1 ≤ k ≤ M. Suppose also that G is an error channel whose Kraus operators G1 , . . . , GK are all
linear combinations of F1 , . . . , FN . Then R successfully recovers from G.
Proof. For all 1 ≤ ` ≤ K we have G` = ∑_{j=1}^{N} mj` Fj , for some scalars mj` . Using (118), we get
Rk G` P = ∑_{j=1}^{N} mj` Rk Fj P = ∑_j mj` √dj δkj P = mk` √dk P ,
and so, for any ρ in C,
R(G(ρ)) = R(G(PρP)) = ∑_{k=1}^{M} ∑_{`=1}^{K} (Rk G` P)ρ(Rk G` P)∗ = ∑_k ∑_` |mk` |2 dk PρP = cρ,
where c := ∑_{k=1}^{M} ∑_{`=1}^{K} |mk` |2 dk . Thus R successfully recovers from G given code space C. 2
k=1
Theorem 27.5 Suppose C ⊆ H is the code space for a quantum code, P is the projector projecting
orthogonally onto C, E ∈ T(H) is a not necessarily complete quantum error channel with Kraus operators
E1 , . . . , EN that satisfy the peep condition (111), i.e.,
PE∗i Ej P = [M]ij P
for all 1 ≤ i, j ≤ N, for some matrix M. Suppose also that G is an error channel whose Kraus operators
G1 , . . . , GK are all linear combinations of E1 , . . . , EN . Then the channel R constructed in the proof of
Theorem 27.2 to recover from E also successfully recovers from G, given code space C.
Proof. In the proof of Theorem 27.2 above, we chose new Kraus operators F1 , . . . , FN for E, where
Fk := ∑_{j=1}^{N} [U]jk Ej for all 1 ≤ k ≤ N, and where U is an N × N unitary matrix that diagonalizes M so
that there are real numbers d1 , . . . , dN ≥ 0 such that PF∗k F` P = dk δk` P for all 1 ≤ k, ` ≤ N. The Fk
are clearly linear combinations of the Ej , but the Ej are also linear combinations of the Fk ; indeed,
it is easily checked that Ej = ∑_{k=1}^{N} [U]∗jk Fk , using the unitarity of U. Thus the G` , being linear
combinations of the Ej , are linear combinations of the Fk as well.
The R we constructed in the proof of Theorem 27.2 has Kraus operators U∗k Pk for 1 ≤ k ≤ N + 1.
Setting Rk := U∗k Pk for all 1 ≤ k ≤ N + 1, Equations (116) and (117) say that
Rk Fj P = √dj δkj P
for all 1 ≤ k ≤ N + 1 and all 1 ≤ j ≤ N. This is exactly the discretization condition of Equation (118)
(with M = N + 1). Therefore, G, the Rk , and the Fj together satisfy the hypotheses of Theorem 27.4,
and so R successfully recovers from G by that theorem. 2
We can apply either Theorem 27.4 or Theorem 27.5 to the Shor code to show that Bob’s recovery
procedure can correct any single-qubit error. The key point is that the four Pauli operators I, X, Y, Z
form a basis for the space of all single-qubit operators, and so any single-qubit error channel has
Kraus operators that are linear combinations of the Pauli operators. Since Bob can recover from
any error of the form Xj , Yj , or Zj , for 1 ≤ j ≤ 9, in a way that satisfies Theorem 27.4, he can recover
from any linear combination of these—in particular, any error on any one of the nine qubits.
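The key point above, that I, X, Y, Z form a basis for all one-qubit operators, can be checked by expanding a random operator with the Hilbert–Schmidt coefficients tr(σA)/2 (a sketch of mine; the normalization 1/2 comes from tr(σ2 ) = 2 for each Pauli operator):

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = (I, X, Y, Z)

rng = np.random.default_rng(1)
# An arbitrary one-qubit operator (not even Hermitean).
A = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))

# Expansion coefficients in the Pauli basis: c_sigma = tr(sigma† A)/2.
coeffs = [np.trace(sigma.conj().T @ A) / 2 for sigma in paulis]
recon = sum(c * sigma for c, sigma in zip(coeffs, paulis))
assert np.allclose(recon, A)   # I, X, Y, Z span all 2x2 operators
```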
Exercise 27.6 (Challenging) Show that Bob’s recovery channel for the Shor code can recover from
any error on any one of the nine qubits. [Hint: By the preceding discussion, it only remains to
show that Bob’s recovery procedure satisfies the discretization condition of Equation (118) for the
recoverable portion of the depolarizing channel given by Equation (110).]
Figure 17: Implementing the C-NOT gate fault-tolerantly using the Shor code. The double slashes
on the left indicate that each line represents a multi-qubit register (nine qubits in this case). The
circuit maps |aS i|bS i to |aS i|(a ⊕ b)S i for all a, b ∈ {0, 1}.
Fault-Tolerant Quantum Computation. If a qubit is in an encoded state, such as with the Shor
code, then we can repeatedly apply an error-recovery operation to “restore the logic,” i.e., the
state of the logical qubit, assuming isolated errors in the physical qubits. Depending on the
implementation and frequency of the restore operations, we can maintain a logical qubit state
indefinitely with high probability. There is more to a quantum computation, however, than simply
maintaining qubits. We must apply quantum gates to them. A not-so-good way to apply a
quantum gate is to decode each qubit involved in the gate, then apply the gate on the unencoded
qubits, then re-encode the qubits. This is bad because qubits spend time unencoded and subject
to unrecoverable errors, defeating the whole purpose of error correction. A better way is to keep
all qubits in an encoded state always, never decoding them, so that we prepare, work with, save,
and measure qubits in their encoded states only. This practice is called fault-tolerant quantum
computation, and it works by replacing each gate of a standard, non-fault-tolerant quantum circuit
with a quantum mini-circuit that affects the state of the logical qubits in the same way the original
gate affects the state of its unencoded qubits.
With the Shor code as well as other quantum error-correcting codes, we can implement several
types of quantum gates fault-tolerantly. It can be shown that these codes can implement a family of
gates big enough to provide a basis for any feasible quantum computation (a so-called “universal”
family of gates). We will not do an exhaustive treatment here, but will at least show how to
implement the C-NOT and Pauli gates explicitly using the Shor code.
Figure 17 shows how to implement the C-NOT gate fault-tolerantly using the Shor code. Each
logical qubit is implemented by nine physical qubits.
Exercise 28.1 Verify that the circuit in Figure 17 really implements the C-NOT gate with respect to
the Shor code. That is, show that the circuit maps |aS i|bS i to |aS i|(a ⊕ b)S i for all a, b ∈ {0, 1}.
29 Week 15: Stabilizers, Entanglement, and Bell inequalities
29.1 Stabilizers
The stabilizer formalism gives us two things: (1) a convenient way of describing some quantum
states and some of the gates that act on them; and (2) a large family of quantum error-correcting
codes that are efficient and allow easy recovery from errors. The Shor code is an example of a
stabilizer code. We will see others.
The Pauli Group. For ease of reference, here we recall the four 1-qubit Pauli operators (matrices
are with respect to the computational basis):
I = [1 0; 0 1] , X = [0 1; 1 0] , Y = [0 −i; i 0] , Z = [1 0; 0 −1] ,
with matrix rows separated by semicolons.
The n-qubit Pauli group Πn is the set of all operators of the form g = α(σ1 ⊗ · · · ⊗ σn ), where
α ∈ {1, −1, i, −i} and each σj ∈ {I, X, Y, Z}. We call α =: coeff(g) the coefficient of g and
σ1 ⊗ · · · ⊗ σn =: princ(g) the principal part of g. If g = α(σ1 ⊗ · · · ⊗ σn ) and h = β(τ1 ⊗ · · · ⊗ τn )
are any two elements of Πn , then
gh = αβ(σ1 τ1 ⊗ · · · ⊗ σn τn )
and
g−1 = g∗ = α∗ (σ1 ⊗ · · · ⊗ σn ) .
Notice that, because of the commutation properties of the Pauli operators, g and h either commute
or anticommute, that is, either gh = hg or gh = −hg. The latter occurs just when there are an odd
number of positions with anticommuting Pauli components in g and h.
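A small numerical check of this commute-or-anticommute rule, using the principal parts XZY and YIZ from the worked example below; the parity-counting helper is my own:

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = {'I': I, 'X': X, 'Y': Y, 'Z': Z}

def tensor(names):
    """Tensor product of Pauli operators named by a string like 'XZY'."""
    out = np.array([[1]], dtype=complex)
    for nm in names:
        out = np.kron(out, paulis[nm])
    return out

def anticommuting_positions(g, h):
    """Count positions where the Pauli components of g and h anticommute."""
    return sum(1 for a, b in zip(g, h) if a != 'I' and b != 'I' and a != b)

g, h = 'XZY', 'YIZ'
G, H = tensor(g), tensor(h)
# gh = ±hg, with the sign set by the parity of the anticommuting positions.
sign = -1 if anticommuting_positions(g, h) % 2 else +1
assert np.allclose(G @ H, sign * H @ G)
```

Here two positions anticommute, so the sign is +1 and g and h commute, in agreement with the example below.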
A subgroup of Πn is any subset of Πn which is itself a group. If g1 , . . . , gk are elements of Πn ,
then the subgroup of Πn generated by S = {g1 , . . . , gk } is the smallest subgroup of Πn that includes
S. It is the closure of S∪{1} under multiplication and inverses, and we denote it hSi or hg1 , . . . , gk i.29
We say that S is a minimal set of generators if no proper subset of S generates the same subgroup.
29 The angle bracket notation h· · ·i we use here has nothing to do with the Hermitean inner product on a Hilbert
space. We have no occasion to use the latter meaning anywhere in this section.
Here is an example. Let n := 3 and let g := −i(X ⊗ Z ⊗ Y) = −iX1 Z2 Y3 and h := −(Y ⊗ I ⊗ Z) =
−Y1 Z3 . Then
coeff(g) = −i ,
princ(g) = X ⊗ Z ⊗ Y = X1 Z2 Y3 ,
coeff(h) = −1 ,
princ(h) = Y ⊗ I ⊗ Z = Y1 Z3 ,
gh = −i(Z ⊗ Z ⊗ X) = −iZ1 Z2 X3 ,
hg = −i(Z ⊗ Z ⊗ X) = −iZ1 Z2 X3 = gh ,
g2 = −1 ,
h2 = 1 .
Exercise 29.1 Redo the example above, this time where n = 4 and g = I ⊗ Z ⊗ Z ⊗ X and
h = i(Y ⊗ Y ⊗ X ⊗ Y).
Stabilizing Subgroups. There are subgroups of Πn that are of particular interest to us.
Definition 29.2 Let G be a subgroup of Πn . We will say that G is a stabilizing subgroup iff −1 ∉ G.
Exercise 29.4 Prove Lemma 29.3. [Hint: The last item follows easily from the second item and the
fact that any two elements of Πn either commute or anticommute.]
Exercise 29.5 List the elements of hg1 , g2 i, where g1 = X ⊗ Z and g2 = Z ⊗ X. Is hg1 , g2 i a stabilizing
subgroup of Π2 ? Explain.
Exercise 29.6 Let G be a stabilizing subgroup of Πn with a k-element minimal generating set
S = {g1 , . . . , gk }. Show that G has exactly 2k many elements, each one obtained by multiplying
together the elements of a different subset of S. That is,
G = { ∏_{g∈J} g : J ⊆ S } , (119)
and each choice of J yields a distinct product. [Hint: Use items 2 and 4 of Lemma 29.3.]
30 A group with this property is called commutative or abelian.
The previous exercise shows that any minimal generating set for G has size k. We will call k
the dimension of G.
Stabilizing Subgroups Acting on H. Fix n ≥ 1 as before, and let H = C^(2^n) be the n-qubit Hilbert
space with the usual computational basis. Notice that each element of Πn is an operator in L(H).
Let E ⊆ H be any nontrivial (i.e., positive-dimensional) subspace of H. The stabilizer of E in Πn ,
written Stab(E), is the set of all g ∈ Πn that fix E pointwise, that is, the set of all g ∈ Πn such that
gv = v for all v ∈ E. Stab(E) is a subgroup of Πn , and in fact it must be a stabilizing subgroup,
because −1 ∈ Πn maps any v to −v, and so does not fix anything except the zero vector.
Conversely, given a stabilizing subgroup G of Πn , we let EG be the subspace of H stabilized
by G, that is,
EG := {v ∈ H : (∀g ∈ G)[ gv = v ]} .
(If G is not stabilizing, then EG is evidently the trivial space {0}.) Clearly, G ⊆ Stab(EG ). We will
see shortly (Proposition 29.14, below) that the two groups are the same.
We can recast all this in terms of eigenvectors, eigenvalues, and projectors. If g ∈ Πn and
coeff(g) ∈ {1, −1}, then g2 = 1. This means that the only two eigenvalues of g are +1 and −1.
Obviously, the identity element 1 ∈ Πn only has eigenvalue +1. If g ≠ ±1, however, then g
has both eigenvalues, and in fact, dim E+1 (g) = dim E−1 (g) = 2n−1 . This is because, first of all,
princ(g) must have a Pauli operator σ ≠ I in at least one position, and since tr σ = 0 we then
have tr g = 0 (see the last item of Proposition 10.1). Secondly, tr g is the sum of g’s eigenvalues
with multiplicity. Thus +1 and −1 occur with the same multiplicity, which is then the common
dimension of the two eigenspaces of g. The projectors onto these two eigenspaces are seen to be
Pg± := (1 ± g)/2 (120)
(here as well, 1 ∈ Πn is the n-qubit identity operator).
Notice, by the way, that these projectors are sums over hgi = {1, g} and h−gi = {1, −g}, which
are both two-element (stabilizing) subgroups of Πn . More generally:
Lemma 29.7 Let G be a k-dimensional stabilizing subgroup of Πn , and let {g1 , . . . , gk } be a minimal
generating set for G. Then dim(EG ) = 2n−k , and the operator
PG := 2−k ∑_{g∈G} g (121)
is the orthogonal projector onto EG .
Proof. Write P := PG . First notice that, for any g ∈ G,
gP = g ( 2−k ∑_{h∈G} h ) = 2−k ∑_{h∈G} gh = 2−k ∑_{h∈G} h = P , (122)
the third equality following from the fact that for fixed g ∈ G, gh runs through the elements of G
as h runs through the elements of G.
P is Hermitean because all elements of G are Hermitean. Next, we check that P projects onto
EG by fixing every vector in EG and mapping every vector in H into EG (whence it follows that
P2 = P). For any v ∈ EG , we have
Pv = 2−k ∑_{g∈G} gv = 2−k ∑_{g∈G} v = v .
Next, for any w ∈ H and any g ∈ G, Equation (122) gives g(Pw) = (gP)w = Pw, and so Pw ∈ EG .
Also, tr P = 2−k ∑_{g∈G} tr g = 2−k tr 1 ,
because all the other elements of G besides 1 have zero trace. We have tr 1 = tr(I⊗n ) = 2n , and so
finally,
dim(EG ) = tr P = 2n−k .
2
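A numerical sketch of Lemma 29.7 for the stabilizing subgroup G = hZ1 Z2 , Z2 Z3 i of Π3 (my own example, not from the notes): PG is a Hermitean projector of rank 2^(n−k) = 2, and it fixes |000i and |111i.

```python
import numpy as np
from itertools import product

I = np.eye(2, dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def tensor(ops):
    out = np.array([[1]], dtype=complex)
    for A in ops:
        out = np.kron(out, A)
    return out

# G = <Z1 Z2, Z2 Z3>, a 2-dimensional stabilizing subgroup of Pi_3.
g1 = tensor([Z, Z, I])
g2 = tensor([I, Z, Z])

# All 2^k products over subsets of the generators (cf. Exercise 29.6).
group = [np.linalg.matrix_power(g1, a) @ np.linalg.matrix_power(g2, b)
         for a, b in product((0, 1), repeat=2)]

P = sum(group) / len(group)                        # P_G of Equation (121)
assert np.allclose(P @ P, P)                       # a projector...
assert np.allclose(P, P.conj().T)                  # ...that is Hermitean
assert np.isclose(np.trace(P).real, 2 ** (3 - 2))  # dim E_G = 2^{n-k} = 2

# E_G is spanned by |000> and |111>: P fixes both.
e000 = np.zeros(8); e000[0] = 1
e111 = np.zeros(8); e111[7] = 1
assert np.allclose(P @ e000, e000) and np.allclose(P @ e111, e111)
```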
The next exercise shows that we can also characterize EG directly in terms of a generating set
for G, using the Pg+ projectors.
Exercise 29.8 Let G be a k-dimensional stabilizing subgroup of Πn , and let {g1 , . . . , gk } be a minimal
generating set for G, as in the previous lemma. Show that PG = Pg+1 Pg+2 · · · Pg+k , where the Pg+i are
defined by Equation (120). [Hint: Expand the right-hand side and use Exercise 29.6.]
Let’s look at some more examples. Suppose n = 4 and G = hZ1 , Z2 , Z3 , Z4 i. This is a minimal
generating set for G with four elements, so dim(EG ) = 24−4 = 1. (Oh, and G is stabilizing.) What
is a vector in EG ? We have Z|0i = |0i, so |0000i ∈ EG . Thus EG is the 1-dimensional subspace
spanned by |0000i. Now suppose G = hZ1 , Z2 , −Z3 , Z4 i. Then you should check that EG is spanned
by |0010i. Can you generalize these observations?
Exercise 29.9 For any b1 , . . . , bn ∈ {0, 1}, let G := h(−1)b1 Z1 , (−1)b2 Z2 , . . . , (−1)bn Zn i. Show that
EG is the 1-dimensional subspace spanned by |b1 b2 · · · bn i.
Now suppose G = hX1 , X2 , X3 , X4 i. Since X|+i = |+i, we get that EG is the 1-dimensional space
spanned by |+i⊗4 .
Exercise 29.10 Suppose G = hX1 , X2 , −X3 , X4 i. Make a guess about EG and verify that your guess
is correct.
Exercise 29.11 Determine EG for each of the following stabilizing subgroups G of Π4 :
• G = hZ1 , Z2 , X3 , X4 i.
• G = hY1 , Y2 , Y3 , Y4 i.
• G = h−Z1 , Z1 Z2 , −Z1 Z2 Z3 , Z1 Z2 Z3 Z4 i.
Exercise 29.12 The last exercise suggests that the minimal generating set of a stabilizing G is not
unique (although they all have the same size). Find an alternate minimal generating set for the
group h−Z1 , Z1 Z2 , −Z1 Z2 Z3 , Z1 Z2 Z3 Z4 i of the last exercise.
Corollary 29.13 If G is a stabilizing subgroup of Πn , then dim(EG ) > 0 and G has dimension at most n.
Proposition 29.14 For any stabilizing subgroup G of Πn ,
G = Stab(EG ) .
Proof. We noticed before that G ⊆ Stab(EG ). For the reverse inclusion, let H := Stab(EG ). Then H
is stabilizing and EG ⊆ EH (since H fixes all of EG at least). Thus dim(EH ) ≥ dim(EG ). Let k and `
be the dimensions of G and H, respectively. We have dim(EG ) = 2n−k and dim(EH ) = 2n−` , and
hence ` ≤ k. But then G has cardinality 2k and H has cardinality 2` ≤ 2k , so we must have k = `
and G = H. 2
Not every subspace of H is of the form EG for some stabilizing G. There are infinitely many
subspaces of H but only finitely many stabilizing subgroups of Πn . For almost all subspaces E, we
have Stab(E) = {1}, but E{1} = H. The spaces of the form EG are particularly nice. For one thing,
we will use them as the code spaces for stabilizer error-correcting codes (see the topic, Stabilizer
Codes, below).
Given a stabilizing subgroup G ⊆ Πn and some h ∈ Πn that anticommutes with at least one
element of G, we end this topic with some results about how h “splits G in half.”
Lemma 29.15 Let G be a k-dimensional stabilizing subgroup of Πn and let h ∈ Πn be arbitrary. Let
C := {g ∈ G | gh = hg}. Then C is a subgroup of G, and if C ≠ G, then C has exactly 2k−1 elements (that
is, exactly half the elements of G).
Corollary 29.16 Let G, k, and h be as in Lemma 29.15, and assume that h anticommutes with at least one
element of G. Then PG hPG = 0.
Proof. Let C := {g ∈ G | gh = hg} as in the lemma, which implies that, since C ≠ G by assumption,
C and G − C have the same number of elements. Setting P := P_G and using Equation (122), we
compute
\begin{align*}
PhP &= 2^{-k}\sum_{g\in G} ghP
= 2^{-k}\Big(\sum_{g\in C} ghP + \sum_{g\in G-C} ghP\Big)
= 2^{-k}\Big(\sum_{g\in C} hgP - \sum_{g\in G-C} hgP\Big)\\
&= 2^{-k}\,h\Big(\sum_{g\in C} gP - \sum_{g\in G-C} gP\Big)
= 2^{-k}\,h\Big(\sum_{g\in C} P - \sum_{g\in G-C} P\Big) = 0\,. \qquad\Box
\end{align*}
The next lemma will be used to prove the Gottesman-Knill theorem, below.
Lemma 29.17 Let G, k, h, and C be as in Lemma 29.15, and suppose that coeff (h) = ±1 and neither h
nor −h is in G. Let hC := {hg | g ∈ C}. Then C ∪ hC is a stabilizing subgroup of Πn whose dimension is
k + 1 if C = G and k otherwise.
Proof. Let H := C ∪ hC. It is routine to check that H is a subgroup of Πn (using the fact that h² = 1)
and that C and hC have the same number of elements. Furthermore, C ∩ hC = ∅, for otherwise
there are g1, g2 ∈ C such that g1 = hg2, but then h = g1g2 ∈ G; contradiction.³¹ We conclude that
H is twice as big as C. If C = G, then H has 2^{k+1} many elements. Otherwise, C has 2^{k−1} many
elements by Lemma 29.15, whence H has exactly 2^k many elements.
Finally, we show that H is stabilizing, and thus has the appropriate dimension given its size.
We already have −1 ∉ C, since G is stabilizing, but we cannot have hg = −1 for any g ∈ C either;
otherwise, multiplying both sides by h gives −h = g ∈ G; contradiction. Therefore, −1 ∉ H. □
Connection to Linear Algebra Over Z2. There is an illuminating way of describing the principal
part of an element g ∈ Πn as a 2n-dimensional row vector over the 2-element field Z2. We
define two maps ϕx, ϕz : Πn → Z2^n as follows: if g = α(σ1 ⊗ ⋯ ⊗ σn), where α ∈ {±1, ±i} and
σi ∈ {I, X, Y, Z} for 1 ≤ i ≤ n, then define
\[
\varphi_x(g) := x_1 x_2 \cdots x_n \in \mathbb{Z}_2^n,
\qquad
\varphi_z(g) := z_1 z_2 \cdots z_n \in \mathbb{Z}_2^n,
\]
where for all 1 ≤ i ≤ n,
\[
x_i := \begin{cases} 1 & \text{if } \sigma_i \in \{X, Y\},\\ 0 & \text{if } \sigma_i \in \{I, Z\}, \end{cases}
\qquad
z_i := \begin{cases} 1 & \text{if } \sigma_i \in \{Z, Y\},\\ 0 & \text{if } \sigma_i \in \{I, X\}. \end{cases}
\]
Observe that ϕx (g) and ϕz (g) together uniquely determine princ(g) but ignore coeff(g) completely.
Most importantly, one should verify that for any g, h ∈ Πn ,
ϕx (gh) = ϕx (g) + ϕx (h) ,
ϕz (gh) = ϕz (g) + ϕz (h) ,
31 This is a basic result of group theory—cosets of a finite subgroup are all the same size and pairwise disjoint.
where the right-hand operations are both vector addition modulo 2.³² Now define
\[
\varphi(g) := \varphi_x(g)\,\varphi_z(g),
\]
the 2n-dimensional row vector obtained by concatenating ϕx(g) with ϕz(g).³³ For any g, h ∈ Πn,
we have ϕ(g) = ϕ(h) if and only if princ(g) = princ(h), and
\[
\varphi(gh) = \varphi(g) + \varphi(h)
\]
as well.
If G is a stabilizing subgroup of Πn, then ϕ is one-to-one when restricted to G (this follows
from Lemma 29.3(3)). For any g1, …, gk ∈ G, we can form the k × 2n matrix
\[
M := \begin{bmatrix} \varphi(g_1)\\ \varphi(g_2)\\ \vdots\\ \varphi(g_k) \end{bmatrix}
= \begin{bmatrix} \varphi_x(g_1) & \varphi_z(g_1)\\ \varphi_x(g_2) & \varphi_z(g_2)\\ \vdots & \vdots\\ \varphi_x(g_k) & \varphi_z(g_k) \end{bmatrix}
\]
whose rows are the vectors ϕ(gi) for 1 ≤ i ≤ k. Now assume G = ⟨g1, …, gk⟩. Then the vectors
ϕ(g) for g ∈ G are exactly the linear combinations (over Z2) of the rows of M. Furthermore,
{g1, …, gk} is a minimal generating set if and only if the rows of M are linearly independent over
Z2 (see Lemma 29.21 below). More generally, the dimension of G is equal to the rank of M.
The map ϕ can also easily tell us whether two given elements g, h ∈ Πn commute or anticommute.
We define the following inner product³⁴ of the row vectors ϕ(g) and ϕ(h):
\[
\varphi(g)\cdot\varphi(h) := \varphi_x(g)\cdot\varphi_z(h) + \varphi_z(g)\cdot\varphi_x(h) \pmod 2 .
\]
Then g and h commute if ϕ(g)·ϕ(h) = 0 and anticommute if ϕ(g)·ϕ(h) = 1.
Exercise 29.18 For the g and h of Exercise 29.1, give ϕ(g) and ϕ(h) as well as ϕ(g) · ϕ(h). Do the
same for the g and h of the example immediately preceding Exercise 29.1.
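The maps ϕx, ϕz and the commutation test are easy to implement directly. Here is a minimal Python sketch; it assumes the inner product above is the standard symplectic form, which is 1 exactly when the two Paulis anticommute.

```python
# Sketch: phi_x, phi_z, and the symplectic inner product over Z2.
# A Pauli principal part is a string over I, X, Y, Z, e.g. "XZY".

def phi(pauli):
    """phi(g): the concatenation phi_x(g) phi_z(g), as a list of bits."""
    x = [1 if s in "XY" else 0 for s in pauli]  # phi_x
    z = [1 if s in "ZY" else 0 for s in pauli]  # phi_z
    return x + z

def symplectic(u, v):
    """phi(g).phi(h) = phi_x(g).phi_z(h) + phi_z(g).phi_x(h)  (mod 2)."""
    n = len(u) // 2
    dot = sum(a * b for a, b in zip(u[:n], v[n:]))   # phi_x(g) . phi_z(h)
    dot += sum(a * b for a, b in zip(u[n:], v[:n]))  # phi_z(g) . phi_x(h)
    return dot % 2

print(symplectic(phi("XI"), phi("ZI")))  # 1: X and Z on the same qubit anticommute
print(symplectic(phi("XI"), phi("IZ")))  # 0: on different qubits they commute
```

Note that `phi` ignores the coefficient of g entirely, matching the observation above that ϕx and ϕz determine only princ(g).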
The next lemma gives a linear algebraic way to determine if given elements of Πn form a
minimal generating set for a stabilizing subgroup. The linear algebra is over Z2 .
32 Any map that preserves operations in this way is known as a group homomorphism.
33 This vector is sometimes written as ϕx(g) ⊕ ϕz(g), but we avoid that notation here as it may be confusing.
34 This is an example of what is known as a symplectic inner product.
Lemma 29.21 Let g1, …, gk ∈ Πn be arbitrary. Then {g1, …, gk} is a minimal generating set for a
stabilizing subgroup of Πn if and only if all three of the following hold:
1. g_i² = 1 for all 1 ≤ i ≤ k;
2. g_i g_j = g_j g_i for all 1 ≤ i, j ≤ k;
3. the vectors ϕ(g1), …, ϕ(gk) are linearly independent over Z2.
Proof. The forward direction comes immediately from Lemma 29.3, except for the third item,
which follows from Exercise 29.6: any linear dependence of the ϕ(gi ) would correspond to a
nonempty product of the gi equalling 1, contradicting the minimality of the generating set.
For the reverse direction, assume all three conditions hold. Let G := hg1 , . . . , gk i. To show that
G is stabilizing, suppose that −1 ∈ G. By commutativity (the second condition) and the fact that
g2i = 1 for all 1 6 i 6 k (the first condition), we must be able to write
\[
-1 = g_1^{e_1} \cdots g_k^{e_k},
\]
for some e1, …, ek ∈ {0, 1}. But then,
\[
0 = 0 \cdots 0 = \varphi(-1) = \varphi(g_1^{e_1} \cdots g_k^{e_k}) = e_1\varphi(g_1) + \cdots + e_k\varphi(g_k).
\]
So by the linear independence of {ϕ(g1), …, ϕ(gk)} (the third condition), we must have e1 = ⋯ =
ek = 0; but then, g_1^0 ⋯ g_k^0 = 1 ≠ −1. Contradiction. Thus G is stabilizing.
Finally, if {g1 , . . . , gk } were not minimal, then we could write some gj as a product of the other
gi , but then ϕ(gj ) is a linear combination of the other ϕ(gi ), contradicting linear independence. 2
Stabilizer Circuits and the Gottesman-Knill Theorem. Our first application of stabilizers is
to show the Gottesman-Knill theorem, which says that quantum circuits employing only H, S,
and C-NOT gates can be simulated efficiently (i.e., in polynomial time) on a classical computer.
We call these circuits stabilizer circuits. Initial states must be computational basis states, and all
measurements are computational basis measurements.35
As before, we let H := (C²)^{⊗n} be the n-qubit Hilbert space with the usual computational basis.
The whole idea is to keep track of the quantum state ρ = |ψihψ| inside an n-qubit circuit, not as
a superposition of basis states as we have been doing, but rather as a set of generators of an n-
dimensional stabilizing subgroup of Πn that stabilizes |ψi. When some gate U is applied, mapping
the state |ψi to state |ψ 0 i = U|ψi, we can easily update our generating set to that of a new subgroup
that stabilizes |ψ 0 i.
The gates H (Hadamard gate) and S (phase gate) applied to any qubit and the C-NOT gate
applied to any pair of qubits have a special property that makes the above possible: they normalize
Πn . A unitary operator U ∈ L(H) is said to normalize36 Πn iff, for any unitary operator g ∈ L(H),
g is in Πn if and only if UgU∗ ∈ Πn . The unitary operators that normalize Πn themselves form a
group, called the n-qubit Clifford group Cn . One can show that Cn is generated (up to an arbitrary
global phase) by the three types of operators mentioned above: H, S, and C-NOT, which are
sometimes called Clifford gates.
35 Improvements and generalizations to this theorem were made in a subsequent paper by Aaronson and Gottesman.
36 This term comes from group theory and has nothing to do with making a vector have unit length.
Exercise 29.22 The n-qubit Pauli group is clearly a subgroup of the n-qubit Clifford group, so we
can allow Pauli gates in a stabilizer circuit “for free.”
2. Show how the three Pauli operators X, Y, and Z (not necessarily in that order) can be written
as products of H and S gates only. [Hint: It may help to picture how these gates rotate the
Bloch sphere.]
We describe the classical simulation of a stabilizer circuit in three steps: (1) representing the
initial quantum state before the circuit is applied; (2) showing how to update this representation
as each Clifford gate of the circuit is applied; and (3) computing outcome probabilities and the
post-measurement state of a 1-qubit measurement in the computational basis. We will take these
in order, but first recall the projector P = PG of Equation (121) for an arbitrary stabilizing subgroup
G. If G has dimension n, then PG projects onto a subspace of dimension 2n−n = 1, in which case,
PG is a pure state that we can represent by a minimal generating set {g1 , . . . , gn } of G. This is how
we will represent states as the computation proceeds.
We assume the initial state being fed to the circuit is some computational basis state |ϕ0⟩ :=
|b1b2 ⋯ bn⟩, where each bj ∈ {0, 1}. In Exercise 29.9, you effectively showed that |ϕ0⟩⟨ϕ0| = P_G,
where G = ⟨(−1)^{b1}Z1, (−1)^{b2}Z2, …, (−1)^{bn}Zn⟩. So this is our representation of the initial basis
state of the circuit.
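In code, this initial representation is immediate. A small sketch, writing each generator as a (sign, string) pair (a representation chosen here for illustration):

```python
# Generators (-1)^{b_j} Z_j stabilizing the basis state |b1 b2 ... bn>.
# Each generator is a (sign, string) pair, e.g. (-1, "ZII") for -Z1.

def initial_generators(bits):
    n = len(bits)
    return [(-1 if b else 1, "I" * j + "Z" + "I" * (n - j - 1))
            for j, b in enumerate(bits)]

print(initial_generators([1, 1, 0]))
# [(-1, 'ZII'), (-1, 'IZI'), (1, 'IIZ')]  -- i.e., -Z1, -Z2, Z3 for |110>
```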
Now suppose that at some stage in the circuit's application the state is ρ = P_G for some
stabilizing G = ⟨g1, …, gn⟩ just before some Clifford gate U is applied. We claim that immediately
after U is applied, the new state ρ′ = UρU∗ is equal to P_{G′}, where
\[
G' := UGU^* = \{\,UgU^* \mid g \in G\,\}.
\]
To see the claim, first note that each UgjU∗ is in Πn for 1 ≤ j ≤ n, because U is a Clifford gate.
Next, notice that when multiplying terms of the form UgjU∗ together, the inner U's cancel, e.g.,
(Ug1U∗)(Ug2U∗) = Ug1g2U∗. This fact helps one to see that G′ = ⟨Ug1U∗, …, UgnU∗⟩. Next, we
can see that G′ must be a stabilizing subgroup of Πn, for otherwise, −1 ∈ G′, and it follows that
−1 = UgU∗ for some g ∈ G, whence g = U∗(−1)U = −1 ∈ G,
contradicting the fact that G is stabilizing. Now G′ evidently has the same number of elements
as G, which is 2^n, and so G′ has dimension n. Finally, let |ψ⟩ be any unit vector in E_G (and thus
ρ = |ψ⟩⟨ψ|) and let g be any element of G. Then
\[
(UgU^*)\,U|\psi\rangle = Ug|\psi\rangle = U|\psi\rangle.
\]
That is, U|ψ⟩ is fixed by UgU∗. Since any element g′ ∈ G′ can be written in this form, we get that
U|ψ⟩ ∈ E_{G′}. Thus ρ′ = U|ψ⟩⟨ψ|U∗ projects onto E_{G′} and so is equal to P_{G′}.
Having established the claim, it remains to compute UgjU∗ given gj, for 1 ≤ j ≤ n. This is
easy, given the limited choices for U. For example, if U = H1 and gj = Z ⊗ ⋯, then
\[
Ug_jU^* = HZH \otimes \cdots = X \otimes \cdots,
\]
where the omitted Pauli operators remain unchanged. This example generalizes to H, S, and
C-NOT acting on any of the qubits. The table below gives the results of the three Clifford gates
U conjugating Pauli gates σ. Here, 1 ≤ i, j ≤ n and i ≠ j, and we only include the cases where
UσU∗ ≠ σ. We could have omitted the Y-gates from the second column of the table because
Y = iXZ, and so conjugating Y is the same as conjugating Z followed by conjugating X and
inserting the global phase shift i = e^{iπ/2} (that is, UYU∗ = i(UXU∗)(UZU∗)). Recall that C-NOTi,j
has qubit i as the control and qubit j as the target.
U           σ     UσU∗
Hi          Xi    Zi
            Yi    −Yi
            Zi    Xi
Si          Xi    Yi
            Yi    −Xi
C-NOTi,j    Xi    XiXj
            Yi    YiXj
            Yj    ZiYj
            Zj    ZiZj
Exercise 29.23 Extend the table above to include entries for U = Xi , U = Yi , and U = Zi .
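The table translates directly into updates on the (ϕx, ϕz, sign) representation of a generator. Below is a sketch in that binary form. The H and S rules follow from the table; the C-NOT sign rule is taken from Aaronson and Gottesman's simulation paper (an assumption here, not derived above), and it reproduces the table's entries.

```python
# Conjugation of a Pauli, stored as bit lists x = phi_x, z = phi_z plus a
# sign bit r (r = 1 means coefficient -1), by the gates H, S, and C-NOT.

def conj_h(x, z, r, i):
    r ^= x[i] & z[i]            # Y -> -Y picks up a sign
    x[i], z[i] = z[i], x[i]     # X <-> Z
    return r

def conj_s(x, z, r, i):
    r ^= x[i] & z[i]            # Y -> -X picks up a sign
    z[i] ^= x[i]                # X -> Y
    return r

def conj_cnot(x, z, r, i, j):   # control i, target j
    # Sign rule from Aaronson-Gottesman's stabilizer-simulation paper.
    r ^= x[i] & z[j] & (x[j] ^ z[i] ^ 1)
    x[j] ^= x[i]                # X_i -> X_i X_j
    z[i] ^= z[j]                # Z_j -> Z_i Z_j
    return r

# Example: H conjugates X into Z with no sign change.
x, z, r = [1], [0], 0
r = conj_h(x, z, r, 0)
print(x, z, r)  # [0] [1] 0  -- i.e., +Z
```

Iterating these updates over all n generators after each gate is exactly the polynomial-time bookkeeping the Gottesman-Knill theorem relies on.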
Exercise 29.24 Consider the somewhat randomly chosen stabilizer circuit below:
[three-qubit circuit diagram; the extracted gate labels read "S H Z" on one wire, "H Y" on another,
and "H X S H" on the third, with the C-NOT gates lost in extraction]
Assuming the initial state is |110ih110|, give a set of three generators for the group stabilizing the
state after each C-NOT gate is applied. The initial state is stabilized by the generators
g1 = −Z ⊗ I ⊗ I = −Z1
g2 = −I ⊗ Z ⊗ I = −Z2
g3 = I⊗I⊗Z = Z3
Give the other sets of generators in the same format, but you can omit all the ⊗’s.
Finally, we handle measurements. Suppose the current state is ρ = P_G and qubit 1 is measured
in the computational basis (other qubits are handled similarly). The relevant projectors are
\[
P_{Z_1}^{+} = (1 + Z_1)/2 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\otimes I\otimes\cdots\otimes I
\quad\text{for outcome 0, and}\quad
P_{Z_1}^{-} = (1 - Z_1)/2 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\otimes I\otimes\cdots\otimes I
\]
for outcome 1 (see Equation (120)). The outcome probabilities are
\[
\Pr[0] = \langle P_{Z_1}^{+}, \rho\rangle = \operatorname{tr}\big(P_{Z_1}^{+}\rho\big) = \operatorname{tr}\big(P_{Z_1}^{+}\rho P_{Z_1}^{+}\big),
\qquad
\Pr[1] = \langle P_{Z_1}^{-}, \rho\rangle = \operatorname{tr}\big(P_{Z_1}^{-}\rho\big) = \operatorname{tr}\big(P_{Z_1}^{-}\rho P_{Z_1}^{-}\big),
\]
and the corresponding post-measurement states are
\[
\rho_0 := \Pr[0]^{-1}\, P_{Z_1}^{+}\rho P_{Z_1}^{+},
\qquad
\rho_1 := \Pr[1]^{-1}\, P_{Z_1}^{-}\rho P_{Z_1}^{-}.
\]
To compute P_{Z_1}^{±} ρ P_{Z_1}^{±}, first note that for g ∈ G, if gZ1 = Z1g, then g P_{Z_1}^{±} = P_{Z_1}^{±} g, but if gZ1 = −Z1g,
then g P_{Z_1}^{±} = P_{Z_1}^{∓} g. Also note that P_{Z_1}^{±} P_{Z_1}^{∓} = 0. From this you can perhaps see how this is going to
go—we split ρ = P_G into those terms that commute with Z1 and those that anticommute with Z1;
the latter terms disappear:
\begin{align}
P_{Z_1}^{\pm}\rho P_{Z_1}^{\pm}
&= 2^{-n}\sum_{g\in G} P_{Z_1}^{\pm}\, g\, P_{Z_1}^{\pm} \tag{123}\\
&= 2^{-n}\Bigg(\sum_{g\in G:\, gZ_1 = Z_1 g} P_{Z_1}^{\pm}\, g\, P_{Z_1}^{\pm} \;+ \sum_{g\in G:\, gZ_1 = -Z_1 g} P_{Z_1}^{\pm}\, g\, P_{Z_1}^{\pm}\Bigg) \tag{124}\\
&= 2^{-n}\Bigg(\sum_{g\in G:\, gZ_1 = Z_1 g} P_{Z_1}^{\pm} P_{Z_1}^{\pm}\, g \;+ \sum_{g\in G:\, gZ_1 = -Z_1 g} P_{Z_1}^{\pm} P_{Z_1}^{\mp}\, g\Bigg) \tag{125}\\
&= 2^{-n}\sum_{g\in G:\, gZ_1 = Z_1 g} P_{Z_1}^{\pm}\, g \tag{126}\\
&= 2^{-n-1}\sum_{g\in G:\, gZ_1 = Z_1 g} (1 \pm Z_1)\, g \;=\; 2^{-n-1}\Bigg(\sum_g g \;\pm\; \sum_g Z_1 g\Bigg), \tag{127}
\end{align}
where the unadorned sums in (127) are over those g ∈ G that commute with Z1.
Now we have to consider three cases: (1) Z1 is in G; (2) −Z1 is in G; and (3) neither Z1 nor −Z1 is
in G. These cases are mutually exclusive, because G is stabilizing.

Case 1: Z1 ∈ G. Then Z1 commutes with all elements of G, making both sums Σ_g g and Σ_g Z1g
run over all the elements of G and thus be equal. It follows that
\begin{align*}
P_{Z_1}^{+}\rho P_{Z_1}^{+} &= 2^{-n}\sum_{g\in G} g = \rho, & \Pr[0] &= \operatorname{tr}\rho = 1,\\
P_{Z_1}^{-}\rho P_{Z_1}^{-} &= 0, & \Pr[1] &= \operatorname{tr} 0 = 0.
\end{align*}
To summarize, the outcome is 0 with certainty and the post-measurement state is unchanged:
ρ0 = ρ. (ρ1 is undefined, because you cannot divide by 0.) We also have G0 = G, so there is
no need to change the stabilizing group representing the post-measurement state.
Case 2: −Z1 ∈ G. This is similar to the previous case (particularly, Z1 commutes with all of G)
except that now, Σ_{g∈G} g = Σ_{g∈G} (−Z1)g = −Σ_{g∈G} Z1g. This gives
\begin{align*}
P_{Z_1}^{+}\rho P_{Z_1}^{+} &= 0, & \Pr[0] &= \operatorname{tr} 0 = 0,\\
P_{Z_1}^{-}\rho P_{Z_1}^{-} &= 2^{-n}\sum_{g\in G} g = \rho, & \Pr[1] &= \operatorname{tr}\rho = 1.
\end{align*}
To summarize, the outcome is 1 with certainty and the post-measurement state is unchanged:
ρ1 = ρ. (ρ0 is undefined.) Again, we have G1 = G, so there is no need to change the stabilizing
group for the post-measurement state.
Case 3: neither Z1 nor −Z1 is in G. This is the hardest of the three cases to analyze. The probability
of outcome 0 is
\[
\Pr[0] = \operatorname{tr}\big(P_{Z_1}^{+}\rho P_{Z_1}^{+}\big) = 2^{-n-1}\Bigg(\sum_g \operatorname{tr} g \;+\; \sum_g \operatorname{tr}(Z_1 g)\Bigg),
\]
where both sums are over those g ∈ G that commute with Z1. We now show that the
second sum disappears entirely: for all g ∈ G we must have princ(Z1g) ≠ 1 (for otherwise,
princ(Z1) = princ(Z1gg) = princ(Z1g) princ(g) = princ(g), and so Z1 = ±g, contradicting
the fact that neither Z1 nor −Z1 is in G); thus tr(Z1g) = 0 for all g ∈ G. The only term that
survives in the first sum is g = 1; all others have zero trace. Thus
\[
\Pr[0] = 2^{-n-1}\operatorname{tr} 1 = 2^{-n-1}\operatorname{tr} I^{\otimes n} = 2^{-n-1}\, 2^n = \frac12 = \Pr[1].
\]
Thus outcomes 0 and 1 occur with equal odds. For the post-measurement states, we will see
that G0 and G1 differ from each other and from G. In fact, Z1 ∈ G0 and −Z1 ∈ G1. A full
analysis will use Lemma 29.25, below.
Lemma 29.25 Let G be an n-dimensional stabilizing subgroup of Πn and let h ∈ Πn be such that
coeff(h) = ±1 and neither h nor −h is in G. Let C := {g ∈ G | gh = hg} and hC := {hg | g ∈ C}. Then
C ≠ G and C ∪ hC is an n-dimensional stabilizing subgroup of Πn.

Proof. This all follows immediately from Lemma 29.17 (with k = n) provided we can show
that C ≠ G. Let H := C ∪ hC. H is a stabilizing subgroup of Πn by Lemma 29.17 of dimension
either n or n + 1 (the latter if C = G). By Corollary 29.13, no stabilizing subgroup of Πn can
have more than 2^n elements, and therefore H has dimension n. This implies that C ≠ G, since
|C| = |H|/2 = 2^{n−1} < 2^n = |G|.³⁷ □
We apply Lemma 29.25 twice—once with h = Z1 and again with h = −Z1 —to find the two
alternative post-measurement states (actually the groups G0 and G1 that stabilize them) in the case
where neither Z1 nor −Z1 is in G. Letting C ⊆ G be the set of all elements of G that commute
37 Here we use the vertical bars to indicate the cardinality of a set.
with Z1, we define G0 := C ∪ Z1C and G1 := C ∪ (−Z1)C. By the lemma, both are n-dimensional
stabilizing subgroups of Πn. We now verify that for b ∈ {0, 1}, P_{G_b} is the post-measurement state
given outcome b. From Equations (123–127), the lemma, and the fact that Pr[0] = Pr[1] = 1/2, we
have
\begin{align*}
\rho_0 &= 2P_{Z_1}^{+}\rho P_{Z_1}^{+} = 2^{-n}\sum_{g\in C}(g + Z_1 g) = 2^{-n}\sum_{g\in G_0} g = P_{G_0},\\
\rho_1 &= 2P_{Z_1}^{-}\rho P_{Z_1}^{-} = 2^{-n}\sum_{g\in C}(g - Z_1 g) = 2^{-n}\sum_{g\in C}\big(g + (-Z_1)g\big) = 2^{-n}\sum_{g\in G_1} g = P_{G_1}.
\end{align*}
There are two things left to do to complete the proof of the Gottesman-Knill theorem: (1) show
how to determine easily which of the three cases applies for a 1-qubit measurement; (2) in Case 3,
determine generators for G0 and G1 given generators for G.
We are given a minimal set of generators S := {g1 , . . . , gn } for G = Stab ρ, the group stabilizing
the pre-measurement state. We can easily distinguish Case 3 from the other two: by Lemma 29.25,
one of ±Z1 is in G if and only if Z1 commutes with all of G, if and only if Z1 commutes with all
of S. The latter can be easily checked: Z1 commutes with gj if and only if princ(gj ) = I ⊗ · · · or
princ(gj ) = Z ⊗ · · ·.
If Z1 commutes with all of S, then to determine which of Z1 or −Z1 is in G, we first find a
subset T ⊆ S whose elements multiply to ±Z1 , then we actually multiply these together to see
what we get—either Z1 or −Z1 . Finding T is essentially a problem in linear algebra over Z2 , via the
correspondence given in the previous topic. Using standard techniques of linear algebra (Gaussian
elimination, particularly), we can express ϕ(Z1 ) as a linear combination (actually a simple sum,
since the field is Z2 ) of elements from the set {ϕ(g1 ), . . . , ϕ(gn )}. Multiplying together those gj
such that ϕ(gj ) appears in the sum will yield ±Z1 .
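The Gaussian-elimination step is plain linear algebra over Z2. A minimal sketch (the representation and helper name are ours, chosen for illustration): each reduced row carries the set of original row indices it is a sum of, so the subset T can be read off directly.

```python
# Sketch: find a subset of rows (over Z2) summing to target, tracking
# which original rows combine into each reduced row.

def solve_gf2(rows, target):
    """Indices of a subset of rows summing (mod 2) to target, or None."""
    aug = [(r[:], {i}) for i, r in enumerate(rows)]  # (row, index set)
    t, t_idx = target[:], set()
    for col in range(len(target)):
        pivot = next((a for a in aug if a[0][col] == 1), None)
        if pivot is None:
            continue
        aug.remove(pivot)
        prow, pidx = pivot
        for row, idx in aug:                  # clear this column elsewhere
            if row[col] == 1:
                row[:] = [x ^ y for x, y in zip(row, prow)]
                idx.symmetric_difference_update(pidx)
        if t[col] == 1:                       # clear it in the target too
            t = [x ^ y for x, y in zip(t, prow)]
            t_idx.symmetric_difference_update(pidx)
    return sorted(t_idx) if not any(t) else None

# phi(Z1) and phi(Z2) for n = 2; their sum is phi(Z1 Z2).
rows = [[0, 0, 1, 0], [0, 0, 0, 1]]
print(solve_gf2(rows, [0, 0, 1, 1]))  # [0, 1]
print(solve_gf2(rows, [1, 0, 0, 0]))  # None: phi(X1) is not in the row space
```

Multiplying together the generators at the returned indices then yields either Z1 or −Z1, as described above.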
Finally, here is how to get generators for Gb for b ∈ {0, 1} in the case where neither Z1 nor
−Z1 is in G. Choose one of the generators that anticommutes with Z1 —suppose it is g1 (it doesn’t
matter which one it is). Replace g1 in the generating set with (−1)b Z1 (that is, Z1 if b = 0 and −Z1
if b = 1), then for any other generator gj that anticommutes with Z1 , replace gj by g1 gj . The result
is a generating set Sb for Gb .
To see why, first note that all elements of Sb commute with Z1 , and so they are all in C except
for ±Z1 itself. Thus Sb ⊆ Gb . It remains to show that all of Gb is generated by Sb . For this it
suffices to show that all of C is generated by Sb, because (−1)^b Z1 is also in Sb. Every element
g ∈ G is a unique product of distinct elements from S, say,
\[
g = g_{i_1} g_{i_2} \cdots g_{i_k},
\]
for some k and 1 ≤ i1 < i2 < ⋯ < ik ≤ n. Then g ∈ C (that is, g commutes with Z1) if and only
if Z1 anticommutes with an even number of the factors g_{i_j}: starting with Z1g = Z1 g_{i_1} g_{i_2} ⋯ g_{i_k},
transpose Z1 with g_{i_1}, then g_{i_2}, etc., until Z1 winds up on the far right. To keep all these expressions
equal, every time Z1 anticommutes with one of these factors, we must introduce a minus sign out
front, so the minus signs cancel just when there are an even number of such anticommutations.
Thus we can group the factors g_{i_j} ∉ C into adjacent pairs inside the product on the right-hand
side, above. But now each pair is obtainable from Sb. For example, if g3 and g5 are paired, then
g3g5 = g1²g3g5 = (g1g3)(g1g5), and both g1g3 and g1g5 are in Sb. The other unpaired factors are in
C and hence are already in Sb . This shows that every element of C is the product of factors from
Sb , which finishes the proof of the Gottesman-Knill theorem.
Remark. The only difference between S0 and S1 above is that S0 contains Z1 and S1 contains −Z1
instead. One can easily simulate any number of 1-qubit measurements in the circuit by maintaining
a single expression for Sb (involving b) after each successive measurement.
Exercise 29.26 Referring back to Exercise 29.24, suppose that after all the gates are applied, the
three qubits are measured in order. Give the probability of the outcomes of each measurement and
the corresponding post-measurement states. Subsequent measurements may depend on previous
results. Describe all possible sequences of outcomes and their probabilities.
Stabilizer Codes.
29.2 Entanglement
Suppose we have two physical systems with Hilbert spaces H and J. If the first system is prepared
in some pure state ρ ∈ L(H) and the second is independently prepared in a pure state σ ∈ L(J)—
and the two systems do not interact in any way—then the (pure) state of the combined system is
ρ ⊗ σ ∈ L(H) ⊗ L(J) ≅ L(H ⊗ J).³⁸ We call such a pure state separable between the two systems, or
a tensor product state. For pure states, being a tensor product state and being a separable state are
equivalent, but for general (not necessarily pure) states, the notion of separability is looser. We will
only consider pure states, and then only those of two combined systems—so-called “bipartite”
pure states.
L(H ⊗ J) contains lots of pure states that are not of the form ρ ⊗ σ as above. Such pure states
are said to be entangled. Roughly, two physical systems are in an entangled state when they are
correlated in a non-classical (uniquely quantum) way. The four Bell states are entangled states of
two single-qubit systems—maximally entangled, it will turn out. None of them can be written as
the tensor product of two single-qubit states.
Our first task is to find a way to quantify mathematically the amount of entanglement of a pure
state shared between two systems. For this we use the Schmidt decomposition.
Theorem 29.27 (Schmidt Decomposition) Let H and J be Hilbert spaces, and let u ∈ H ⊗ J be any
unit vector. There exist a unique integer r > 0 and unique positive values s1 ≥ ⋯ ≥ sr > 0 such that
there exist pairwise orthogonal unit vectors x1, …, xr ∈ H and y1, …, yr ∈ J such that Σ_{k=1}^{r} s_k² = 1 and
\[
u = \sum_{k=1}^{r} s_k\,(x_k \otimes y_k). \tag{128}
\]
In fact, {s1², …, sr²} is the multiset of nonzero eigenvalues of tr_J(uu∗) and of tr_H(uu∗).
38 The relation ≅ indicates that these two spaces are naturally isomorphic.
Proof. The Schmidt decomposition is really the singular value decomposition in disguise. Pick
some standard orthonormal bases e1 , . . . , en for H and f1 , . . . , fn for J. (We will assume that
dim(H) = dim(J) = n for technical convenience, but this is not necessary.) We expand u with
respect to the product basis {ei ⊗ fj }16i,j6n as
\[
u = \sum_{1\le i,j\le n} \alpha_{i,j}\,(e_i \otimes f_j).
\]
Let A be the n × n matrix with entries [A]_{ij} := α_{i,j}, and let A = VDW be a singular value
decomposition of A: V and W are unitary, and D is diagonal with the singular values
s1 ≥ ⋯ ≥ sr > 0 = s_{r+1} = ⋯ = s_n of A down its diagonal, so that [D]_{kℓ} = s_k δ_{kℓ}. For
1 ≤ k ≤ r, define x_k := Σ_i [V]_{ik} e_i ∈ H and y_k := Σ_j [W]_{kj} f_j ∈ J.
There are three things to check here. We first check that {x_k} and {y_k} are orthonormal sets of
vectors. Using the fact that V is unitary, we have, for all 1 ≤ k, ℓ ≤ r,
\[
\langle x_k, x_\ell\rangle = \sum_{i=1}^{n} \overline{[V]_{ik}}\,[V]_{i\ell} = \sum_{i=1}^{n} [V^*]_{ki}[V]_{i\ell} = [V^*V]_{k\ell} = [I]_{k\ell} = \delta_{k\ell}.
\]
A similar calculation shows that ⟨y_k, y_ℓ⟩ = δ_{kℓ}, using the unitarity of W. Thus both {x_k} and {y_k}
are orthonormal sets.
Second, we have
\begin{align*}
u &= \sum_{1\le i,j\le n} [A]_{ij}\,(e_i\otimes f_j) = \sum_{i,j} [VDW]_{ij}\,(e_i\otimes f_j)\\
&= \sum_{i,j}\,\sum_{1\le k,\ell\le n} [V]_{ik}[D]_{k\ell}[W]_{\ell j}\,(e_i\otimes f_j)\\
&= \sum_{i,j,k,\ell} s_k\,[V]_{ik}\,\delta_{k\ell}\,[W]_{\ell j}\,(e_i\otimes f_j)\\
&= \sum_{i,j,k} s_k\,[V]_{ik}[W]_{kj}\,(e_i\otimes f_j)\\
&= \sum_{k=1}^{r}\sum_{i,j} s_k\,[V]_{ik}[W]_{kj}\,(e_i\otimes f_j)\\
&= \sum_{k=1}^{r} s_k\Big(\sum_i [V]_{ik}\,e_i\Big)\otimes\Big(\sum_j [W]_{kj}\,f_j\Big)\\
&= \sum_{k=1}^{r} s_k\,(x_k\otimes y_k).
\end{align*}
Since u is a unit vector, we also have
\[
1 = u^*u = \sum_{k=1}^{r}\sum_{\ell=1}^{r} s_k s_\ell\,(x_k^* x_\ell)(y_k^* y_\ell) = \sum_{k=1}^{r} s_k^2,
\]
the last equation following from the fact that x_k^* x_ℓ = y_k^* y_ℓ = δ_{kℓ}.
Finally, we have
\begin{align*}
\operatorname{tr}_J(uu^*) &= \operatorname{tr}_J\Bigg(\Big(\sum_{k=1}^{r} s_k\,(x_k\otimes y_k)\Big)\Big(\sum_{\ell=1}^{r} s_\ell\,(x_\ell^*\otimes y_\ell^*)\Big)\Bigg)\\
&= \operatorname{tr}_J\Big(\sum_{k,\ell} s_k s_\ell\,(x_k x_\ell^*\otimes y_k y_\ell^*)\Big)\\
&= \sum_{k,\ell} s_k s_\ell\,\operatorname{tr}(y_k y_\ell^*)\,x_k x_\ell^*\\
&= \sum_{k=1}^{r} s_k^2\,x_k x_k^*,
\end{align*}
using tr(y_k y_ℓ^*) = y_ℓ^* y_k = δ_{kℓ}. The x_k x_k^* are rank-one projectors onto pairwise orthogonal
lines, so the nonzero eigenvalues of tr_J(uu^*) are exactly s_1², …, s_r²; a symmetric calculation gives
the same for tr_H(uu^*). □
Definition 29.28 (Shannon entropy) Given a probability distribution p = (p1, …, pn),³⁹ we define
the Shannon entropy of p to be the quantity
\[
H(p) = -\sum_{i=1}^{n} p_i \lg p_i, \tag{129}
\]
where 0 lg 0 = 0 by convention (alternately, we restrict the sum to those i for which pi > 0).
Intuitively, H(p) measures the information gained by learning the outcome of such an experiment,
averaged over all the possible outcomes. For any probability distribution p = (p1, …, pn) on n outcomes,
0 6 H(p) 6 lg n .
H(p) = 0 exactly when p is deterministic, i.e., pi = 1 for some i (and thus pj = 0 for all j , i).
H(p) = lg n if and only if p is the uniform distribution, i.e., pi = 1/n for all 1 6 i 6 n.
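Both boundary facts are easy to check numerically; a quick sketch:

```python
import math

# Shannon entropy with the 0 lg 0 = 0 convention.
def shannon_entropy(p):
    h = -sum(pi * math.log2(pi) for pi in p if pi > 0)
    return h + 0.0  # normalize -0.0 to 0.0

print(shannon_entropy([1.0, 0.0, 0.0]))  # 0.0: deterministic distribution
print(shannon_entropy([0.25] * 4))       # 2.0: uniform on 4 outcomes (lg 4)
```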
Shannon entropy has a quantum analogue. For any state ρ, we have ρ > 0 and tr ρ = 1, which
is equivalent to the spectrum of ρ being a probability distribution.
Definition 29.29 (Von Neumann entropy) Let H be a Hilbert space of dimension n > 0, let ρ ∈
L(H) be a state, and let λ := (λ1 , · · · , λn ) be the spectrum (vector of eigenvalues) of ρ, where
λ1 > · · · > λn . We define the von Neumann entropy of ρ as
H(ρ) = − tr(ρ lg ρ) .
If ρ > 0, then this expression makes perfect sense using the rule for applying a scalar function to a
normal operator that we discussed in Section 9. If one or more of ρ’s eigenvalues is zero, however,
then some care must be taken with this expression, just as we did when defining Shannon entropy
in the case where one or more of the probabilities was zero. In this case we can confine ρ to a
subspace on which it acts positive definitely. Let V := im(ρ) := {ρv | v ∈ H}, and let σ be ρ
restricted to V (V is a subspace of H). Then σ > 0, and so we can define H(ρ) := − tr(σ lg σ), and
this coincides with Definition 29.29.
Analogous to Shannon entropy, von Neumann entropy quantifies the amount of uncertainty
about a quantum state. We can view a pure state as one about which
we have complete information.
If ρ1, …, ρk are pure states that are pairwise orthogonal (that is, ⟨ρi, ρj⟩ = 0 for all i ≠ j), then
there is a projective measurement that can distinguish each ρi from the others with certainty.
(The projectors of this measurement are the ρi themselves, each ρi corresponding to outcome i,
possibly together with one additional projector P := I − Σ_{i=1}^{k} ρi for the outcome "none of the
above," assuming this projector is nonzero.)
above,” assuming this projector is nonzero.) Keeping with this view, a mixed state ρ ∈ L(H) can
be thought of as a state about which we have incomplete information, that is, our information about
ρ is only statistical. We can regard ρ as a probabilistic mixture of pairwise orthogonal pure states,
and mathematically, these component pure states can come from the spectral decomposition of ρ:
\[
\rho = \sum_{i=1}^{n} p_i\, u_i u_i^*,
\]
where the p_i are the eigenvalues of ρ and the u_i are corresponding pairwise orthogonal unit eigenvectors.
How does all this relate to entanglement? Given a pure state ρ := uu∗ , where u is a unit vector
in H ⊗ J, if we decide to ignore (by tracing out) one or the other system, then the reduced state
(i.e., the state of the remaining system) will be mixed if and only if ρ is entangled. This suggests a
natural quantitative measure of the amount of entanglement in ρ—the amount of uncertainty we
have about either of these reduced states, given by their von Neumann entropy. This quantity can
be computed directly from the Schmidt coefficients s1, …, sr of ρ:
\[
H(\operatorname{tr}_H(\rho)) = H(\operatorname{tr}_J(\rho)) = H(s_1^2, \ldots, s_r^2),
\]
noting that (s_1², …, s_r²) is a probability distribution by Theorem 29.27.
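Since the Schmidt decomposition is the singular value decomposition of the coefficient matrix, this entanglement measure takes only a few lines of numpy. A sketch, assuming the standard row-major reshape matches the expansion of u over the product basis {e_i ⊗ f_j}:

```python
import numpy as np

def schmidt_coefficients(u, dim_h, dim_j):
    """Singular values of the coefficient matrix A, where
    u = sum_ij A[i,j] (e_i tensor f_j)."""
    A = np.asarray(u, dtype=complex).reshape(dim_h, dim_j)
    s = np.linalg.svd(A, compute_uv=False)
    return s[s > 1e-12]          # keep only the nonzero coefficients

def entanglement_entropy(u, dim_h, dim_j):
    p = schmidt_coefficients(u, dim_h, dim_j) ** 2  # sums to 1 for unit u
    return float(-(p * np.log2(p)).sum()) + 0.0     # +0.0 normalizes -0.0

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)  # (|00> + |11>)/sqrt(2)
prod = np.array([1, 0, 0, 0])               # |00>, a tensor product state
print(entanglement_entropy(bell, 2, 2))     # ~1.0: maximally entangled
print(entanglement_entropy(prod, 2, 2))     # 0.0: no entanglement
```

A Bell state has r = 2 equal Schmidt coefficients (entropy lg 2 = 1), while any tensor product state has r = 1 (entropy 0), matching the discussion above.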
In 1935, Albert Einstein, Boris Podolsky, and Nathan Rosen (EPR from now on) published a paper
arguing that the laws of quantum mechanics, although correct to the best of anyone’s knowledge,
were not a complete description of nature. Their argument was based on two assumed principles,
which are commonly called “locality” and “realism”:
Locality. All physical influences act locally; put another way, there is no action at a distance. Object
A cannot directly influence a distant object B without some intervening continuum of local
influences connecting the two. For example, that two distant, oppositely charged particles
attract is not due to any direct influence between them but rather to each responding
(locally) to the other’s electromagnetic field, which permeates all of space. For another
example, according to general relativity, a massive object warps the spacetime around it so
that nearby objects move along curved paths, even though locally they are still moving in
straight lines.
Realism. A complete knowledge of the state of a physical system is, in principle, enough to predict
the outcome of every possible measurement of that system. For example, knowing the exact
trajectory of an asteroid now (as well as the gravitational forces acting on it) allows us to
predict where it will be a year from now, the accuracy of the prediction limited only by the
precision of the initial measurements and of our calculations.
If we prepare an electron spin in state |+i (i.e., |→i, spin-right) and we then measure its spin
in the vertical direction, we get an apparently uniformly random result: spin-up about half the
time, spin-down the rest of the time. The realistic view says that this apparent randomness is not
fundamental physics; it is instead an artifact of our incomplete understanding of the state of the
electron. There are aspects of the electron’s state that we don’t know about and that our current
theory is not accounting for—so called hidden variables—that predetermine its vertical spin before
we measure it. If we think we are preparing each electron in the same state |+i but getting different
results measuring the spin vertically, then the electrons really aren’t in the same state to begin
with, and our theory is not (as yet) adequate to account for that difference. A complete description
of a physical state would determine all measurement outcomes; nature is not inherently random.
(Einstein: “God does not play dice.”) This is the realist view of physics.
In a modification⁴⁰ of EPR's argument, we consider a system of two spin-1/2 particles, created
together in a lab then separated from each other by an arbitrary distance. The particles are created
40 This modification is close to one due to David Bohm in 1951.
in a closed system with zero net angular momentum in any direction; conservation of angular
momentum then requires that the two spins always be measured as opposite, resulting in a net spin
of zero. Quantum mechanics dictates that the pair is in the entangled state |Ψ−⟩ = (|↑↓⟩ − |↓↑⟩)/√2—
called the spin singlet state—which has the property that if we measure the spin of each particle
in the same direction, we will always get opposite results, regardless of the direction chosen.
Now imagine the two particles being moved very far apart (even lightyears apart), Alice having
one and Bob the other. According to quantum mechanics, if Alice or Bob measures their spin in
the vertical direction they will see ↑ or ↓ uniformly at random, but, if, say, Alice measures her
spin and sees ↑, then Bob must see ↓, and vice versa, although he interprets his own result as
being uniformly random. According to the prevailing interpretation of quantum mechanics (the
so-called Copenhagen interpretation), when Alice measures her spin and sees ↑, say, the state of
the system “collapses” to |↑↓i, ensuring that Bob will subsequently measure ↓. This interpretation
appears to violate local realism: Alice’s measurement result alters the system’s state, thus magically
influencing Bob’s measurement, even though Bob is in another galaxy and (even worse), light may
not have time to travel from one measurement event to the other (the two events are spacelike
separated). Einstein criticized this interpretation as “spooky action at a distance.”
EPR posited an alternative, local realist interpretation of this scenario. When the two particles
were first created together in the lab, hidden variables were fixed locally between the two particles,
predisposing Alice’s particle to result in ↑ when measured, and Bob’s particle to result in ↓.
These hidden parameters perfectly correlated the two particles when they were in the same place
(locality), then were carried with the particles as they were separated; the measurement results
were not random, but were predetermined by these hidden variables (realism). Entanglement and
the subsequent state collapse are fictions; Alice was always going to see ↑, say, and Bob ↓, the die
being cast when the particles were created in the same place.
In the following decades, the EPR argument was taken less as physics than as philosophy,
since there seemed to be no experiment that could confirm or refute it (quantum mechanics versus
local realism). Then in 1964, John Bell showed how EPR’s interpretation has direct physical conse-
quences. He proposed a clever physics experiment that could test the EPR hypothesis. Bell showed
that local realism implies that statistical correlations between measurements of spatially separated
systems must satisfy certain constraints—now known as Bell inequalities—whereas quantum me-
chanics predicts that these constraints are violated. By taking a large enough sample of runs of
the same experiment and gathering the statistics, one could either confirm or refute (based on
statistical evidence, at least) the local realist interpretation.
A number of different Bell inequalities are now known, and we consider two in depth below.
Several experiments have been performed to test these inequalities. Although doubts have been
raised from time to time about statistical loopholes allowing for a local realist interpretation of
some of the experimental results, the overwhelming evidence at this point is that nature violates
the Bell inequalities. Local hidden-variable theories are thus refuted, and there is every reason to
think that quantum mechanics offers a complete description of reality. A good philosophical discussion of
the EPR paradox can be found online in the Stanford Encyclopedia of Philosophy (https://plato.
stanford.edu/entries/qt-epr/).
We give two examples of Bell inequalities in this section, showing how the laws of quantum
mechanics violate each. Each is cast in terms of a nonlocal game, which we now describe. A nonlocal
game is a cooperative game played by two parties, Alice and Bob, and a referee. The referee first
produces a pair (s, t) of values, probabilistically drawn from some finite set. The ref gives s to
Alice and t to Bob. Alice then produces a value a and Bob a value b, which they send back to the
referee. Then Alice and Bob win if the tuple (s, t, a, b) satisfies a certain finite condition (which
depends on the type of game being played); otherwise, they lose. Alice and Bob can get together
beforehand and share any information, randomness, strategies, etc. that they want, but they are
not allowed to communicate with each other from the time they receive s and t until the time they
send a and b to the ref.
We say that Alice and Bob employ a classical strategy if what they share beforehand is purely
classical information, including shared randomness. They employ a quantum strategy if, in ad-
dition, they share an entangled quantum state beforehand. We define the value of a particular
strategy to be the overall probability of Alice and Bob winning using that strategy.
In each of the two example games we discuss below, there is a quantum strategy whose value
is strictly higher than that of the optimal classical strategy.41 These facts imply violations of the
corresponding Bell inequalities: the local realist interpretation dictates that classical strategies are
the only ones available to Alice and Bob.
Nonlocal games and their limitations are explored extensively in a paper by Cleve, Høyer,
Toner, and Watrous (https://arxiv.org/abs/quant-ph/0404076).
The CHSH game. In this game, based on a Bell-type inequality discovered by Clauser, Horne,
Shimony, and Holt, the referee chooses values s, t ∈ {0, 1} uniformly at random and independently,
so that each pair (s, t) occurs with probability 1/4. Then Alice and Bob produce values a and b in
{+1, −1}, respectively. Alice and Bob win if and only if ab = (−1)^{st}; more prosaically, if s = t = 1,
then Alice and Bob win iff a ≠ b, and otherwise, they win iff a = b. We will show that using any
classical strategy, Alice and Bob can win with probability no greater than 3/4 = 0.75, but using a
quantum strategy, they can win with probability cos²(π/8) = (2 + √2)/4 ≈ 0.85.
First, we consider the case where Alice and Bob employ a (classical) deterministic strategy, that
is, Alice computes a := A(s), where A : {0, 1} → {+1, −1} is a function she chooses beforehand, and
in a similar fashion Bob computes b := B(t) for some B : {0, 1} → {+1, −1} of his choosing. There
are four such functions: the constant +1 function, the constant −1 function, the function mapping
x 7→ (−1)x , and the function mapping x 7→ (−1)1+x . The following table gives, for each possible
pair of functions A and B, the value of ab given each of the four possible choices of (s, t):
[Table: the 4 × 4 array of 2 × 2 matrices M(A, B), one for each pair of functions (A, B), omitted here.]
Each boxed entry of the table is a 2 × 2 matrix with entries ±1. For brevity, we write “+” for +1
and “−” for −1. The rows of each matrix are indexed by the value of s (0 then 1) and the columns
similarly by t. It follows that each matrix is of the form

    M(A, B) := [ A(0) ] [ B(0)  B(1) ] = [ A(0)B(0)  A(0)B(1) ]
               [ A(1) ]                  [ A(1)B(0)  A(1)B(1) ]

for the given choice of functions A and B. The matrix giving (−1)^{st} (and hence the winning values
for ab) is

    W := [ +  + ]
         [ +  − ].
If Alice and Bob choose functions A and B, respectively, then they win for each particular (s, t)
if the corresponding entry in M(A, B) matches that of W. By inspecting the table above, one
observes that for each choice of A and B, the matrix M(A, B) differs from W in at least one of the
four entries. Since each combination (s, t) occurs with probability 1/4, the probability that they win
is at most 1 − 1/4 = 3/4, no matter which A and B they choose. (Alice and Bob can achieve this optimal
probability by always outputting a = b = +1, for example. Other strategies are optimal as well.)
There are two useful things to note here that will also apply to the classical case in the next
game we consider:
1. Each M(A, B) (considered as a matrix over R) is the product of a column vector with a row
vector, and as such, has rank 1. On the other hand, we observe that W is nonsingular, of
rank 2. This is an alternate, more succinct way of seeing that M(A, B) ≠ W for all A and B.
2. Deterministic strategies (also called pure strategies) are not the only classical strategies
available to Alice and Bob. They could instead employ a mixed strategy wherein they choose their
A and B at random according to some arbitrary joint probability distribution. In this case,
however, their probability of winning is then a convex combination of their winning prob-
abilities using pure strategies, and so cannot exceed that of the best pure strategy. Thus we
only need to consider pure strategies to get an upper bound for all classical strategies.
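The classical bound of 3/4 can be confirmed by brute force. Here is a short Python sketch (illustrative only, not part of the notes) that enumerates all 16 pairs of pure strategies and takes the maximum winning probability:

```python
from itertools import product

# Each function {0,1} -> {+1,-1} is encoded as the pair (f(0), f(1)).
FUNCS = list(product([+1, -1], repeat=2))

def chsh_win_prob(A, B):
    """Winning probability of the pure strategy (A, B): on input (s, t),
    Alice outputs A[s] and Bob outputs B[t]; they win iff A[s]*B[t] == (-1)**(s*t)."""
    wins = sum(1 for s in (0, 1) for t in (0, 1)
               if A[s] * B[t] == (-1) ** (s * t))
    return wins / 4  # each (s, t) occurs with probability 1/4

best = max(chsh_win_prob(A, B) for A in FUNCS for B in FUNCS)
```

Running this gives best = 0.75, matching the rank argument above; the all-ones strategy attains it.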
We now turn to Alice’s and Bob’s quantum strategy. Before receiving s and t from the referee,
Alice and Bob share an EPR pair, i.e., the 2-qubit state

    |ψ⟩ := |Φ+⟩ = (|00⟩ + |11⟩)/√2 = (1/√2)(1, 0, 0, 1)ᵀ,

the first qubit possessed by Alice, the second by Bob. It is best to conceive of each qubit as a spin-1/2
particle. After receiving s, Alice measures her particle’s spin in a certain direction depending on
the value of s. After receiving t, Bob measures his particle’s spin in a certain direction depending
on the value of t. If Alice sees spin-up (↑), then she sends a := +1 to the referee; otherwise (if
spin-down (↓)), she sends a := −1. Bob computes b using the same method; the only difference
between Alice and Bob is which directions they choose to measure their respective spins.
Both spin measurements are in the x, z-plane. Generally, for any angle θ, a projective measure-
ment of a spin in the direction having cartesian coordinates (sin θ, 0, cos θ) (that is, clockwise from
the upward direction through angle θ) corresponds to the csop

    P↑(θ) = (1/2)(I + (sin θ)X + (cos θ)Z),    (131)
    P↓(θ) = (1/2)(I − (sin θ)X − (cos θ)Z),    (132)
where I is the 2 × 2 identity matrix and X and Z are the usual Pauli matrices. Upon receiving s,
Alice chooses angle θs to measure her spin. Upon receiving t, Bob chooses angle ϕt to measure
his spin. The angles θs and ϕt are given by the following tables:
    Alice:  θ0 = 0,  θ1 = π/2        Bob:  ϕ0 = π/4,  ϕ1 = −π/4

[Figure: Alice's measurement directions θ0 and θ1 and Bob's directions ϕ0 and ϕ1, drawn in the x, z-plane.]
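As a quick sanity check (not in the original notes) that Equations (131) and (132) really define a csop, the following Python sketch verifies that P↑(θ) + P↓(θ) = I, that each is idempotent, and that their product vanishes, for an arbitrary angle:

```python
import math

# 2x2 real matrices as nested lists; just enough arithmetic for the check.
def mat_add(M, N):
    return [[M[i][j] + N[i][j] for j in range(2)] for i in range(2)]

def mat_mul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def scal(c, M):
    return [[c * M[i][j] for j in range(2)] for i in range(2)]

I2 = [[1.0, 0.0], [0.0, 1.0]]
X = [[0.0, 1.0], [1.0, 0.0]]
Z = [[1.0, 0.0], [0.0, -1.0]]

def P_up(theta):
    # Equation (131): (1/2)(I + (sin θ)X + (cos θ)Z)
    return scal(0.5, mat_add(I2, mat_add(scal(math.sin(theta), X), scal(math.cos(theta), Z))))

def P_down(theta):
    # Equation (132): (1/2)(I − (sin θ)X − (cos θ)Z)
    return scal(0.5, mat_add(I2, mat_add(scal(-math.sin(theta), X), scal(-math.cos(theta), Z))))

theta = 0.37  # an arbitrary test angle
up, down = P_up(theta), P_down(theta)
# up + down = I, up and down are idempotent, and up*down = 0,
# so {P↑(θ), P↓(θ)} is a complete set of orthogonal projectors.
```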
To find the winning probability, we must compute, for any combination (s, t), the probability
that a = b. Generally, if Alice measures her spin with angle θ and Bob measures his spin with
angle ϕ, then the combined measurement corresponds to the 4-outcome csop
{P↑ (θ) ⊗ P↑ (ϕ), P↑ (θ) ⊗ P↓ (ϕ), P↓ (θ) ⊗ P↑ (ϕ), P↓ (θ) ⊗ P↓ (ϕ)} .
Let C ⊗ D be any of these four projectors (or let C and D be any 2 × 2 matrices generally). We have
a handy formula for the probability of obtaining the corresponding outcome given state |ψ⟩:

    ⟨ψ|(C ⊗ D)|ψ⟩ = (1/2) (1, 0, 0, 1) (C ⊗ D) (1, 0, 0, 1)ᵀ
                  = (1/2)(c11 d11 + c12 d12 + c21 d21 + c22 d22)
                  = (1/2) tr(Cᵀ D),

where cij and dij denote the entries of C and D, respectively.
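The trace formula above is easy to spot-check numerically. The following Python sketch (an illustration, not part of the notes) compares ⟨ψ|(C ⊗ D)|ψ⟩ with (1/2) tr(CᵀD) for random real 2 × 2 matrices:

```python
import random

def kron(C, D):
    """4x4 Kronecker product of two 2x2 matrices."""
    K = [[0.0] * 4 for _ in range(4)]
    for i in range(2):
        for j in range(2):
            for k in range(2):
                for l in range(2):
                    K[2 * i + k][2 * j + l] = C[i][j] * D[k][l]
    return K

def quad_form(v, M):
    """<v| M |v> for a real 4-vector v and a real 4x4 matrix M."""
    return sum(v[i] * M[i][j] * v[j] for i in range(4) for j in range(4))

random.seed(1)
C = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
D = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]

s = 2 ** -0.5
psi = [s, 0.0, 0.0, s]  # the EPR pair (|00> + |11>)/sqrt(2) as a 4-vector
lhs = quad_form(psi, kron(C, D))  # <psi|(C ⊗ D)|psi>
rhs = 0.5 * sum(C[i][j] * D[i][j] for i in range(2) for j in range(2))  # (1/2) tr(Cᵀ D)
```

The two quantities agree to machine precision; note that tr(CᵀD) is just the entrywise sum Σᵢⱼ cᵢⱼdᵢⱼ.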
Applying this formula to the projectors given by (131) and (132), and noting that I, X, and Z are all
symmetric and that X and Z have zero trace, we have
    Pr[↑↑] = (1/8) tr((I + (sin θ)X + (cos θ)Z)(I + (sin ϕ)X + (cos ϕ)Z)) = (1 + sin θ sin ϕ + cos θ cos ϕ)/4,
    Pr[↓↓] = (1/8) tr((I − (sin θ)X − (cos θ)Z)(I − (sin ϕ)X − (cos ϕ)Z)) = (1 + sin θ sin ϕ + cos θ cos ϕ)/4.

(There is no need for us to compute Pr[↑↓] or Pr[↓↑].) Thus

    Pr[a = b] = Pr[↑↑] + Pr[↓↓] = (1 + sin θ sin ϕ + cos θ cos ϕ)/2 = (1 + cos(θ − ϕ))/2 = cos²((θ − ϕ)/2).
If |θs − ϕt | is small, then Pr[a = b] is close to 1; if |θs − ϕt | is close to π, then Pr[a = b] is close to 0.
As the next picture illustrates, Alice’s and Bob’s measurements are chosen so that |θs − ϕt | is close
to π if and only if s = t = 1:
[Figure: the four measurement directions θ0, θ1, ϕ0, ϕ1 drawn in the plane, showing that θ1 and ϕ1 are nearly opposite while every other pair of directions is close together.]
If s = t = 1, then |θs − ϕt| = 3π/4; otherwise, |θs − ϕt| = π/4. Finally, we can compute Alice's
and Bob's winning probability:

    Pr[win] = (3/4) cos²(π/8) + (1/4) sin²(3π/8) = cos²(π/8) = (2 + √2)/4 ≈ 0.85,

since sin(3π/8) = cos(π/8).
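The quantum value can be checked numerically from the formula Pr[a = b] = cos²((θ − ϕ)/2) and the angle tables above. A small Python sketch (illustrative only):

```python
import math

theta = {0: 0.0, 1: math.pi / 2}         # Alice's measurement angles
phi = {0: math.pi / 4, 1: -math.pi / 4}  # Bob's measurement angles

def pr_equal(s, t):
    # Pr[a = b] = cos^2((theta_s - phi_t)/2) for the shared EPR pair
    return math.cos((theta[s] - phi[t]) / 2) ** 2

# They win iff a = b, except on (s, t) = (1, 1), where they win iff a != b.
win = sum((1 - pr_equal(s, t)) if s == t == 1 else pr_equal(s, t)
          for s in (0, 1) for t in (0, 1)) / 4
```

This yields win = cos²(π/8) ≈ 0.8536, beating the classical bound of 0.75.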
The Mermin game. This nonlocal game is based on a Bell inequality violation found by David
Mermin, which in turn is based on work by him and Asher Peres. In this game, the referee chooses
values s, t ∈ {0, 1, 2} uniformly at random and independently, so that each pair (s, t) is chosen with
probability 1/9. Then, as with the CHSH game above, Alice and Bob produce values a and b in
{+1, −1}, respectively, but now Alice and Bob win if and only if ab = (−1)^{δst}, where δst = 1 if s = t
and δst = 0 otherwise. In words, if s = t, then Alice and Bob win iff a ≠ b, and if s ≠ t, they win
iff a = b. We will show that using any classical strategy, Alice and Bob can win with probability
no greater than 7/9 ≈ 0.778, but using a quantum strategy, they can win with probability 5/6 ≈ 0.833.
First, the limits of any classical strategy. As we noted in the discussion of the CHSH game
above, we need only consider pure (deterministic) strategies for Alice and Bob. Any such pure
strategy consists of two functions A, B : {0, 1, 2} → {+1, −1}, one used by Alice and the other used
by Bob. After Alice receives s, she outputs a := A(s), and similarly, Bob outputs b := B(t) upon
receiving t.
The winning value of ab for every (s, t)-combination is given by the following 3 × 3 matrix:

    W := ((−1)^{δst}) = [ −  +  + ]
                        [ +  −  + ]
                        [ +  +  − ],

where we again use “+” to mean +1 and “−” to mean −1. For each choice of A and B, the 3 × 3
matrix giving the ab-values is

    M(A, B) := [ A(0) ]
               [ A(1) ] [ B(0)  B(1)  B(2) ].
               [ A(2) ]
For a given A and B, the winning probability is 1/9 times the number of entries of M(A, B) that equal
the corresponding entries of W. There are 2³ = 8 choices for each of A and B, making 64 matrices
M(A, B) in all. Rather than making an exhaustive table as we did for the CHSH game, we note that
M(A, B) always has rank 1, whereas it is easily checked that W has rank 3. Thus M(A, B) ≠ W for
all A and B, and furthermore, changing any single entry of W still leaves two linearly independent
columns (at least), resulting in a matrix with rank ≥ 2. Thus M(A, B) must differ from W in at least
two places, giving a winning probability of at most 1 − 2/9 = 7/9. (Alice and Bob can achieve this
probability by letting A be any nonconstant function and letting B := −A.)
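The bound of 7/9, and the fact that it is attained, can again be verified by exhaustive search over the 64 pure strategies; a Python sketch (not part of the notes):

```python
from itertools import product

# Each function {0,1,2} -> {+1,-1} is encoded as the triple (f(0), f(1), f(2)).
FUNCS = list(product([+1, -1], repeat=3))

def mermin_win_prob(A, B):
    """On input (s, t), they win iff A[s]*B[t] equals -1 when s == t and +1 otherwise."""
    wins = sum(1 for s in range(3) for t in range(3)
               if A[s] * B[t] == (-1 if s == t else 1))
    return wins / 9  # each (s, t) occurs with probability 1/9

best = max(mermin_win_prob(A, B) for A in FUNCS for B in FUNCS)
```

The maximum is exactly 7/9, achieved for example by A = (+1, +1, −1) and B = −A.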
For the quantum strategy, Alice and Bob share a pair of qubits in the Bell state

    |χ⟩ := |Ψ−⟩ = (|01⟩ − |10⟩)/√2.

Again, think of Alice and Bob each having a spin-1/2 particle. After receiving s ∈ {0, 1, 2} from the
referee, Alice projectively measures her spin using the csop
    Alice:  { P↑(2πs/3), P↓(2πs/3) },

where P↑(θ) and P↓(θ) are defined by Equations (131) and (132) above for all θ ∈ R. Bob makes a
similar measurement, except based on t:

    Bob:  { P↑(2πt/3), P↓(2πt/3) }.
Here are the three possible spin directions for each of Alice's and Bob's measurements:

[Figure: the three measurement directions, at angles 0, 2π/3, and 4π/3 from the upward direction, shown for Alice (indexed by s) and for Bob (indexed by t).]
If Alice sees spin-up (↑), then she outputs a := +1; if spin-down (↓), then she outputs a := −1.
Likewise, if Bob sees spin-up (↑), he outputs b := +1; if spin-down (↓), he outputs b := −1.
Generally, if Alice measures using angle θ and Bob measures using angle ϕ, then the probability
of getting the same outcome is

    Pr[a = b] = Pr[↑↑] + Pr[↓↓] = ⟨χ|(P↑(θ) ⊗ P↑(ϕ))|χ⟩ + ⟨χ|(P↓(θ) ⊗ P↓(ϕ))|χ⟩ = sin²((θ − ϕ)/2),

where verifying the last equation is left as an exercise (Exercise 29.30, below). If s = t, then Alice
and Bob measure their spins in the same direction, giving Pr[a = b] = sin² 0 = 0, hence a ≠ b with
certainty. If s ≠ t, they measure their spins in directions differing by an angle of 2π/3 (in either
direction), and thus Pr[a = b] = sin²(π/3) = 3/4 in this case. Therefore, the winning probability is

    Pr[win] = (3/9)·1 + (6/9)·(3/4) = 1/3 + 1/2 = 5/6.
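As with CHSH, the quantum value 5/6 can be checked numerically from the formula Pr[a = b] = sin²((θ − ϕ)/2); a brief Python sketch (illustrative only):

```python
import math

def pr_equal(s, t):
    # Pr[a = b] = sin^2((theta - phi)/2) with theta = 2*pi*s/3 and phi = 2*pi*t/3
    return math.sin(math.pi * (s - t) / 3) ** 2

# Win iff a != b when s = t, and iff a = b when s != t; each pair has probability 1/9.
win = sum((1 - pr_equal(s, t)) if s == t else pr_equal(s, t)
          for s in range(3) for t in range(3)) / 9
```

This gives win = 5/6 ≈ 0.8333, beating the classical bound of 7/9 ≈ 0.778.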
A Final Exam
I) (A linear algebraic inequality) Suppose that A is any operator. Show that A ≥ 0 if and only if
tr(PA) ≥ 0 for all projectors P. EXTRA CREDIT: Show that A ≥ 0 if and only if tr(PA) ≤ tr A
for all projectors P. [Hint: The extra credit statement is a corollary of the previous statement.]
II) (A circuit identity) Look at Bob’s phase-error recovery circuit for the Shor code in Figure 16.
Show that the following alternative circuit does exactly the same thing:
[Circuit diagram: two ancillas, each prepared as |0⟩, passed through H gates, coupled to the code block, passed through H again, and measured to give syndrome bits b1 and b2; the corrections shown are conditioned on b1 ∧ ¬b2, b1 ∧ b2, and ¬b1 ∧ b2.]
Find a similar alternative for Bob’s phase-error recovery circuit in Figure 14.
III) (a) Show that if V is any unitary operator, then there exists a (not necessarily unique)
unitary U such that U² = V. [Hint: All unitary operators are normal.]
(b) Find a two-qubit unitary U such that U² = SWAP. The U that you find should fix the
vectors |00⟩ and |11⟩.
This U is sometimes written as √SWAP. It can be shown that √SWAP, among many other
two-qubit gates, is (by itself) universal for quantum computation. Also, there is currently
some hope of implementing it flexibly using superconducting Josephson junctions.
IV) (Generalized Pauli gates and the QFT) For n > 0, let Xn and Zn be n-qubit unitary operators
such that, for all x ∈ Z_{2^n},

    Xn|x⟩ = |(x + 1) mod 2^n⟩    and    Zn|x⟩ = en(x)|x⟩,

recalling that en(x) := exp(2πix/2^n). Xn and Zn are n-qubit generalizations of the Pauli X
and Z gates, respectively.
(a) What are Xn* Zn Xn and Zn* Xn Zn? (Just show how each behaves on |x⟩ for x ∈ Z_{2^n}.)
(b) Draw an n-qubit quantum circuit that implements Zn using only single-qubit condi-
tional phase-shift gates P(θ) for various θ.
(c) Show that Xn and Zn are unitarily conjugate via QFTn.
(d) What are the eigenvalues and eigenvectors of Xn?
V) (The Schmidt Decomposition) You may either take the following on faith or read a proof of it in
the textbook. (The Schmidt Decomposition is actually just the Singular Value Decomposition
(Theorem B.9 in Section B.3) in disguise.)
Theorem A.1 (Schmidt Decomposition) Let H and J be Hilbert spaces, and let |ψ⟩ ∈ H ⊗ J be
any unit vector. There exists an integer k > 0, pairwise orthogonal unit vectors |e1⟩, . . . , |ek⟩ ∈ H
and |f1⟩, . . . , |fk⟩ ∈ J, and positive values λ1 ≥ · · · ≥ λk > 0 such that Σ_{j=1}^k λj² = 1 and

    |ψ⟩ = Σ_{j=1}^k λj (|ej⟩ ⊗ |fj⟩).    (133)

The vectors |e1⟩, . . . , |ek⟩ and |f1⟩, . . . , |fk⟩ are known collectively as a Schmidt basis for |ψ⟩,
although they may not span their respective spaces. The λj are called (the) Schmidt coefficients
for |ψ⟩, and k is called the Schmidt number of |ψ⟩.
(a) Give full Schmidt decompositions for the Bell states |Φ+⟩ := (|00⟩ + |11⟩)/√2 and
VI) (Logical Pauli gates for the Shor code) Recall the nine-qubit Shor code defined by Equa-
tions (108) and (109).
(a) Show that the operator Z1 Z2 Z3 Z4 Z5 Z6 Z7 Z8 Z9 (i.e., a Pauli Z gate applied to each of
the nine qubits) implements the logical Pauli X gate XS, such that XS|0S⟩ = |1S⟩ and
XS|1S⟩ = |0S⟩.
(b) Find an operator that implements the logical Pauli Z gate ZS, such that ZS|0S⟩ = |0S⟩
and ZS|1S⟩ = −|1S⟩.
B Background Results
Abstract
These results are background to the course CSCE 790S/CSCE 790B, Quantum Computation
and Information (Spring 2007 and Fall 2011). Each result, or group of related results, is roughly
one page long.
Theorem B.1 (Cauchy–Schwarz Inequality) For all real numbers a1, . . . , an and b1, . . . , bn,

    |Σ_{i=1}^n ai bi| ≤ (Σ_{i=1}^n ai²)^{1/2} (Σ_{i=1}^n bi²)^{1/2},    (134)

with equality holding iff the two vectors (a1, . . . , an) and (b1, . . . , bn) are linearly dependent.
Proof. There are many, many ways of proving this. Here is a direct calculation. We have
    0 ≤ Σ_{1≤i<j≤n} (ai bj − aj bi)²
      = Σ_{i<j} [ai bj (ai bj − aj bi) − aj bi (ai bj − aj bi)]
      = Σ_{i<j} [ai bj (ai bj − aj bi) + aj bi (aj bi − ai bj)]
      = Σ_{i<j} ai bj (ai bj − aj bi) + Σ_{i<j} aj bi (aj bi − ai bj)
      = Σ_{i<j} ai bj (ai bj − aj bi) + Σ_{j<i} ai bj (ai bj − aj bi)
      = Σ_{i≠j} ai bj (ai bj − aj bi)
      = Σ_{i,j} ai bj (ai bj − aj bi)
      = Σ_{i,j} ai² bj² − Σ_{i,j} ai bi aj bj
      = (Σ_{i=1}^n ai²)(Σ_{j=1}^n bj²) − (Σ_{i=1}^n ai bi)².
Adding (Σ_i ai bi)² to both sides then taking the square root of both sides (noting that the square
root function is strictly monotone increasing) yields the inequality (134). Clearly, equality holds
above iff ai bj − aj bi = 0 for all i < j, or equivalently, ai bj = aj bi for all i < j. It is not hard to
check that this condition is equivalent to (a1, . . . , an) and (b1, . . . , bn) being linearly dependent. 2
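A quick numeric illustration of (134) and its equality condition, as a Python sketch (not part of the notes):

```python
import math
import random

def cs_gap(a, b):
    """Right-hand side of (134) minus the left; nonnegative by Cauchy-Schwarz."""
    lhs = abs(sum(x * y for x, y in zip(a, b)))
    rhs = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return rhs - lhs

random.seed(0)
a = [random.uniform(-1, 1) for _ in range(5)]
b = [random.uniform(-1, 1) for _ in range(5)]
gap_generic = cs_gap(a, b)                     # strictly positive for generic vectors
gap_dependent = cs_gap(a, [3 * x for x in a])  # zero when b = 3a (linear dependence)
```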
Note that (134) still holds if we remove the absolute value delimiters from the left-hand side.
In that case, equality holds iff there exists a λ ≥ 0 such that either (a1, . . . , an) = λ(b1, . . . , bn) or
(b1, . . . , bn) = λ(a1, . . . , an).
Corollary B.2 (Triangle Inequality for Complex Numbers) For any z, w ∈ C, |z + w| ≤ |z| + |w|.

Proof. We have

    |z + w|² = |z|² + 2 Re(z w̄) + |w|² ≤ |z|² + 2|z||w| + |w|² = (|z| + |w|)²,

where Re(z w̄) ≤ |z||w| follows from (134) applied to the vectors (Re z, Im z) and (Re w, Im w).
Taking square roots gives the result. 2
Theorem B.5 (Schur Triangular Form) For every n × n matrix M, there exists a unitary U and an
upper triangular T (both n × n matrices) such that M = UT U∗ .
Proof. We prove this by induction on n. The n = 1 case is trivial. Now supposing the theorem
holds for n ≥ 1, we prove it holds for n + 1. Let M be any (n + 1) × (n + 1) matrix. We let A be the
linear operator on Cn+1 whose matrix is M with respect to some orthonormal basis. A has some
eigenvalue λ with corresponding unit eigenvector v. Using the Gram-Schmidt procedure, we can
find an orthonormal basis {y1 , . . . , yn+1 } for Cn+1 such that y1 = v. With respect to this basis, the
matrix for A looks like

    N = [ λ   w* ]
        [ 0   N′ ],

where w is some vector in Cⁿ and N′ is an n × n matrix. Since M and N represent the same
operator with respect to different orthonormal bases, they must be unitarily conjugate, i.e., there is
a unitary V such that M = VNV*. N′ is an n × n matrix, so we apply the inductive hypothesis to
get a unitary W′ and an upper triangular T′ (both n × n matrices) such that N′ = W′T′W′*. Now
we can factor N:
    N = [ λ   w*      ] = [ 1   0  ] [ λ   w*W′ ] [ 1   0   ] = W T W*,
        [ 0   W′T′W′* ]   [ 0   W′ ] [ 0   T′   ] [ 0   W′* ]

where

    W = [ 1   0  ]   and   T = [ λ   w*W′ ]
        [ 0   W′ ]             [ 0   T′   ].
T is clearly upper triangular, and it's easily checked that WW* = I, using the fact that W′ is unitary.
Thus W is unitary, and we get M = VNV* = VWTW*V* = UTU*, where U = VW is unitary. 2
A Schur basis for an operator A is an orthonormal basis that gives an upper triangular matrix
for A.
Theorem B.6 If an n × n matrix A is both upper triangular and normal, then A is diagonal.
Proof. Suppose that A is upper triangular and normal, but not diagonal. Then there is some i < j
such that [A]ij ≠ 0. Let j be least such that there exists i < j such that [A]ij ≠ 0. For this i and j, we
get

    [AA*]ii = Σ_{k=1}^n [A]ik [A*]ki = Σ_{k=1}^n [A]ik [A]ik* = Σ_{k=1}^n |[A]ik|² ≥ |[A]ii|² + |[A]ij|² > |[A]ii|².

On the other hand, [A]ki = 0 for k > i by upper-triangularity, and [A]ki = 0 for k < i by the
minimality of j (since i < j). Hence [A*A]ii = Σ_{k=1}^n |[A]ki|² = |[A]ii|² < [AA*]ii, contradicting
the normality of A. 2
Corollary B.7 (Spectral Theorem for Normal Operators) Every normal matrix is unitarily conjugate
to a diagonal matrix. Equivalently, every normal operator has an orthonormal eigenbasis.
B.3 The Polar and Singular Value Decompositions
Theorem B.8 (Polar Decomposition) For every n × n matrix A there is an n × n unitary matrix U
and a unique n × n matrix H such that H ≥ 0 and A = UH. In fact, H = |A|.
Now existence. Let {e1, . . . , en} be the standard orthonormal basis for Cⁿ. We first prove the
special case where |A| is the diagonal matrix diag(s1, s2, . . . , sn) for some real values s1 ≥ s2 ≥
· · · ≥ sn ≥ 0. Let 0 ≤ k ≤ n be largest such that sk > 0 (k = 0 if |A| = 0). Thus we have

    |A| = [ D   0 ]
          [ 0   0 ],

where D is the k × k nonsingular matrix diag(s1, . . . , sk). If j > k, then |A|ej = 0, and thus
0 = ‖|A|ej‖² = ⟨ej, |A|²ej⟩ = ⟨ej, A*Aej⟩ = ⟨Aej, Aej⟩ = ‖Aej‖², and so
Aej = 0. This means that A = [B 0], where B is some n × k matrix, and the last n − k columns
of A are 0. We have

    [ B*B  0 ] = [B 0]* [B 0] = A*A = |A|² = [ D²  0 ]
    [ 0    0 ]                               [ 0   0 ],
Now for the general case. Since |A| ≥ 0 (and hence normal), there is a unitary V such that
V|A|V* = diag(s1, . . . , sn) for some real values s1 ≥ · · · ≥ sn ≥ 0. Since

    V|A|V* = V√(A*A)V* = √(VA*AV*) = √((VAV*)*(VAV*)) = |VAV*|,

we see that VAV* satisfies the special case, above, and so there is a unitary U such that VAV* =
U|VAV*| = UV|A|V*. It follows that A = V*(VAV*)V = (V*UV)|A|, where V*UV is unitary. 2
Theorem B.9 (Singular Value Decomposition) For any n × n matrix A there exist unique real values
s1 ≥ s2 ≥ · · · ≥ sn ≥ 0 such that there exist n × n unitary matrices V, W with A = VDW, where
D = diag(s1, . . . , sn). Furthermore, s1, . . . , sn are the eigenvalues of |A|.
and so the diagonal entries of D must be the eigenvalues of |A|. For existence, the Polar Decom-
position gives a unitary U such that A = U|A|. Since |A| ≥ 0 (and hence is normal), there exists
a unitary Y such that |A| = YDY*, where D = diag(s1, . . . , sn) for some s1 ≥ · · · ≥ sn ≥ 0. Then
A = U|A| = UYDY*. Setting V := UY and W := Y* proves the theorem. 2
Proof. We start with an integral approximation. The theorem clearly holds for n = 1, so assume
n ≥ 2. Since the log function is concave downward, we claim that for all i such that 2 ≤ i ≤ n,

    (log i + log(i − 1))/2 ≤ ∫_{i−1}^{i} log x dx ≤ log i − 1/(2i).    (138)
The left-hand side is the area of the trapezoid T1 formed by the points (i − 1, 0), (i, 0), (i, log i), (i −
1, log(i − 1)), and the right-hand side is the area of the trapezoid T2 formed by the points (i −
1, 0), (i, 0), (i, log i), (i − 1, log i − 1/i). Note that T2 ’s upper edge is the tangent line to the curve
y = log x at the point (i, log i). By concavity of log, the region under the curve y = log x in the
interval [i − 1, i] contains T1 and is contained in T2 , hence the inequalities (138).
Now note that log(n!) = Σ_{i=1}^n log i = Σ_{i=2}^n log i. Summing (138) from i = 2 to n and
simplifying, we get

    log(n!) − (log n)/2 ≤ ∫_1^n log x dx = n log n − n + 1 ≤ log(n!) − (1/2) Σ_{i=2}^n 1/i,    (139)
using the closed form ∫ log x dx = x log x − x + C. The sum on the right-hand side of (139) is the
Harmonic series, which satisfies another integral approximation:

    Σ_{i=2}^n 1/i ≥ ∫_2^n dx/x = log n − log 2.    (140)
as desired. 2
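The bounds (139) and (140) are easy to confirm numerically for a moderate n; the following Python sketch (illustrative only) computes each side directly:

```python
import math

n = 50
log_fact = sum(math.log(i) for i in range(2, n + 1))  # log(n!)
integral = n * math.log(n) - n + 1                    # the integral of log x over [1, n]
harmonic_tail = sum(1 / i for i in range(2, n + 1))   # sum_{i=2}^n 1/i

lower = log_fact - math.log(n) / 2    # left-hand side of (139)
upper = log_fact - harmonic_tail / 2  # right-hand side of (139)
# By (139): lower <= integral <= upper; by (140): harmonic_tail >= log n - log 2.
```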
We only consider random variables that are real-valued and over discrete sample spaces. If X is
such a random variable, then we let E[X] and var[X] respectively denote the expected value (mean)
of X and the variance of X.
Theorem B.12 (Markov's Inequality) Let X be a random variable with finite mean, and suppose X ≥ 0.
For every real c > 0,

    Pr[X ≥ c] ≤ E[X]/c.
Theorem B.13 (Chebyshev's Inequality) Let X be a random variable with finite mean µ and variance σ²,
and let a > 0 be real. Then

    Pr[ |X − µ| ≥ a ] ≤ σ²/a².

Proof. We invoke Markov's Inequality with the random variable Y = (X − µ)², letting c = a².
Note that Y ≥ 0, E[Y] = σ², and Pr[ |X − µ| ≥ a ] = Pr[Y ≥ a²]. 2
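Both inequalities can be checked exactly on a small discrete example; here is a Python sketch (not part of the notes) for one roll of a fair six-sided die:

```python
# Exact check of Markov's and Chebyshev's inequalities for a fair die roll.
outcomes = [1, 2, 3, 4, 5, 6]
p = 1 / 6
mean = sum(x * p for x in outcomes)               # E[X] = 3.5
var = sum((x - mean) ** 2 * p for x in outcomes)  # var[X] = 35/12

c = 5.0
pr_ge_c = sum(p for x in outcomes if x >= c)      # Pr[X >= 5] = 1/3
markov_bound = mean / c                           # E[X]/c = 0.7

a = 2.0
pr_dev = sum(p for x in outcomes if abs(x - mean) >= a)  # Pr[|X - mean| >= 2] = 1/3
cheb_bound = var / a ** 2                                # variance / a^2 = 35/48
```

Both computed probabilities fall below their respective bounds, as the theorems guarantee.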
Let p = (p1, p2, . . .) and q = (q1, q2, . . .) be two probability distributions over some (finite or
infinite) discrete sample space {1, 2, . . .}. The relative entropy of q with respect to p is defined as

    D(p ‖ q) = Σ_i pi lg(pi/qi),    (142)

where the sum is taken over all i such that pi > 0. If qi = 0 and pi > 0 for some i, then
D(p ‖ q) = ∞. Otherwise, the sum in (142) may or may not converge, but we always have the
following regardless:
Theorem B.14 D(p ‖ q) ≥ 0.

Proof. We use the fact that log x ≤ x − 1 for all x > 0, with equality holding iff x = 1. We have
    D(p ‖ q) = Σ_i pi lg(pi/qi)
             = −Σ_i pi lg(qi/pi)
             = −(1/log 2) Σ_i pi log(qi/pi)
             ≥ −(1/log 2) Σ_i pi (qi/pi − 1)
             = (1/log 2) Σ_i (pi − qi)
             = (1/log 2) (1 − Σ_i qi)
             ≥ 0. 2
If (p, 1 − p) and (q, 1 − q) are binary distributions, then we abbreviate D((p, 1 − p) ‖ (q, 1 − q)) by
d(p ‖ q), and we call d(· ‖ ·) the binary relative entropy function. Note that by (143), d(p ‖ 1/2) =
1 − h(p).
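The nonnegativity of D(p ‖ q) and the identity d(p ‖ 1/2) = 1 − h(p) are easy to spot-check; a Python sketch (illustrative only):

```python
import math

def D(p, q):
    """Relative entropy D(p || q) in bits, as in (142); sum over i with p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def h(x):
    """Binary entropy h(x) = -x lg x - (1 - x) lg(1 - x)."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

p, q = [0.5, 0.3, 0.2], [0.2, 0.5, 0.3]
positive = D(p, q)                  # strictly positive since p != q
zero = D(p, p)                      # zero when the distributions coincide
binary = D([0.3, 0.7], [0.5, 0.5])  # d(0.3 || 1/2), which should equal 1 - h(0.3)
```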
It might be necessary to read Section B.6 before this one. In this section we give an upper bound
on the left tail of the cumulative distribution function for the binomial distribution.

Let 0 < p < 1 and let n > 0 be an integer. In this section, we give an upper bound for the
sum Σ_{i=0}^t (n choose i) p^i (1 − p)^{n−i}, where t ≤ pn. [For example, this sum is the probability of getting at
most t heads among n flips of a p-biased coin (i.e., n identical Bernoulli trials with bias p). The
expected number of heads among n flips is pn, and we want to show that the probability of getting
significantly fewer than pn heads diminishes exponentially with n.]
Theorem B.15 Let n be a positive integer. Let 0 < p < 1 be arbitrary, and set q = 1 − p. If t is an integer
such that 0 ≤ t ≤ pn, then

    Σ_{i=0}^t (n choose i) p^i q^{n−i} ≤ 2^{−n·d(t/n ‖ p)}.    (144)
Proof. If t = 0, then d(t/n ‖ p) = d(0 ‖ p) = −lg q, and so both sides of (144) equal qⁿ and the
inequality is satisfied.
Now suppose 0 < t ≤ pn. Set λ = t/n, and let µ = 1 − λ. Note that 0 < λ ≤ p < 1 and
0 < q ≤ µ < 1. Define

    C = p^t q^{n−t} / (λ^t µ^{n−t}).
For any 0 ≤ i ≤ t, we have

    p^i q^{n−i} = C (q/p)^{t−i} λ^t µ^{n−t} ≤ C (µ/λ)^{t−i} λ^t µ^{n−t} = C λ^i µ^{n−i}.
    Σ_{i=0}^t (n choose i) p^i q^{n−i} ≤ C Σ_{i=0}^t (n choose i) λ^i µ^{n−i} ≤ C Σ_{i=0}^n (n choose i) λ^i µ^{n−i} = C (λ + µ)^n = C.