Course Notes
Stephen A. Fenner∗
Computer Science and Engineering Department
University of South Carolina
September 7, 2021
Abstract
These notes are mainly for me to lecture with, but you may find them useful to see what was
covered when. All exercises are due one week from when they are assigned. These notes are
subject to change during the semester. The date shown above is the date of the latest version.
∗ Columbia, SC 29208 USA. E-mail: [email protected]. This material is based upon work supported by the National
Science Foundation under Grant Nos. CCF-0515269 and CCF-0915948. Any opinions, findings and conclusions or
recommendations expressed in this material are those of the author and do not necessarily reflect the views of the
National Science Foundation (NSF).
Contents

1 Week 1: Overview  8
  Brief, vague history of quantum mechanics, informatics, and the combination of the two.  8
  Implementations of Quantum Computers (the Bad News).  9
  Implementations of Quantum Cryptography (the Good News).  9
2 Week 1: Preliminaries  9
  Just Enough Linear Algebra to Understand Just Enough Quantum Mechanics.  9
  The Complex Numbers.  10
  The Exponential Map.  10
  Vector Spaces.  11
  Matrices.  12
  Adding and Multiplying Matrices.  12
  The Identity Matrix.  13
  Nonsingular Matrices.  13
  Determinant.  13
  Trace.  14
  Hilbert Spaces.  14
  Example.  15
  Orthogonality and Normality.  15
3 Week 2: Preliminaries  17
  Linear Transformations and Matrices.  17
  Adjoints.  18
  Polarization Identities.  19
  Gram-Schmidt Orthonormalization.  19
  Hermitean and Unitary Operators.  20
  L(H) is a Hilbert space.  21
4 Week 2: Preliminaries  22
  Dirac Notation.  22
  Change of (Orthonormal) Basis.  23
  Unitary Conjugation.  23
  Back to Quantum Physics: The Double Slit Experiment.  24
5 Week 3: Unitary conjugation  25
  Invariance under Unitary Conjugation: Trace and Determinant.  25
  Orthogonal Subspaces, Projection Operators.  25
  Fundamentals of Quantum Mechanics.  29
  Physical Systems and States.  29
  Time Evolution of an Isolated System.  29
  Projective Measurement.  30
7 Week 4: Qubits  34
  Qubits.  34
  Back to Electron Spin.  34
  Quantum Circuits.  67
  A Hilbert Space Is a Metric Space.  114
26 Week 13: Quantum error correction  164
  Quantum Error Correction.  164
  The Quantum Bit-Flip Channel.  165
  The Quantum Phase-Flip Channel.  169
  The Shor Code.  171
B.6 Relative Entropy  217
B.7 A Standard Tail Inequality  218
1 Week 1: Overview
Brief, vague history of quantum mechanics, informatics, and the combination of the two.
Quantum Theory The foundations of quantum mechanics were established “by committee”:
Niels Bohr, Albert Einstein, Werner Heisenberg, Erwin Schrödinger, Max Planck, Louis
de Broglie, Max Born, John von Neumann, Paul A.M. Dirac, Wolfgang Pauli, and others
over the first half of the 20th century. The theory provides extremely accurate descriptions
of the world at the atomic and subatomic levels, where “classical” (i.e., Newtonian) physics
and electrodynamics break down. Examples: stability of atoms, black body radiation, sharp
spectral absorption lines, etc.
Informatics Broadly, this is the study of all aspects of information—its storage, transmission, and
manipulation (i.e., computation). It includes what is commonly called Computer Science in
the US, as well as Information Theory. Foundations of Computer Science were laid at about
the same time as quantum mechanics by Gottlob Frege, David Hilbert, Alonzo Church,
Haskell Curry, Kurt Gödel, John Barkley Rosser, Alan Turing, Jacques Herbrand, Emil Post,
Stephen Kleene and others, who were developing a formal notion of “algorithm” or “effec-
tive procedure” to understand problems in the foundations of mathematics. Foundations of
computability culminated in the Church-Turing thesis. Largely independently, the field of Information Theory started in 1948 with Claude Shannon's paper, "A Mathematical Theory of
Communication.” Information theory deals with quantifying information and understand-
ing how it can be stored and transmitted, both securely and otherwise. Shannon defined
the notion of information entropy, somewhat analogously to physical entropy, and proved
engineering-related results about compression and noisy transmission that are in common
use today.
Quantum Information and Computation The physicist Richard Feynman first suggested the idea
of a quantum computer and what it could be used for. Charles Bennett (1973) showed that
reversible computation (with no heat dissipation or entropy increase) was possible at least
in principle. Paul Benioff (80s) showed how quantum dynamics could be used to simulate
classical (reversible) computation, David Deutsch (80s) defined the Quantum Turing Ma-
chine (QTM) and quantum circuits as theoretical models of a quantum computer. Further
foundational work was done by Bernstein & Vazirani, Yao, and others (quantum complexity
theory). Bennett and Gilles Brassard (1984) proposed a scheme for unconditionally secure
cryptographic key exchange based on quantum mechanical principles, using polarized pho-
tons. Deutsch & Jozsa and Simon (early 90s) gave “toy” problems on which quantum
computers performed provably better than classical ones. A big breakthrough came in the
mid 1990s when Peter Shor showed how a quantum computer can factor large integers
quickly (1994), as well as compute discrete logarithms (these would break the security of
most public key encryption schemes in use today). Grover (1996) proposed a completely
different quantum algorithm to quadratically speed up list search. Calderbank & Shor and
Steane (1996) showed that good quantum error-correcting codes exist and that fault-tolerant
quantum computation is possible. This led to the threshold theorem (D. Aharonov, A. Yu. Kitaev(?)), which states that there is a constant ε0 > 0 (current rough estimates are around 10⁻⁴) such that if the noise associated with each gate can be kept below ε0, then any quantum computation can be carried out with arbitrarily small probability of error. This theorem shows that noise is not a fundamental impediment to quantum computation.
Implementations of Quantum Computers (the Bad News). There are several proposals for
physical devices implementing the elements of quantum computation. Each has its own strengths
and weaknesses. In recent years, ion traps look the most promising. We’re still far off from a
viable, scalable, robust prototype.
Nuclear Magnetic Resonance (NMR) Quantum bits are nuclei of atoms (hydrogen?) arranged
on an organic molecule. The value of the bit is given by the spin of the nucleus. Nuclear
spins can be controlled by electromagnetic pulses of the right frequency and duration. Main
advantage: spins are well shielded from the outside by the electron clouds surrounding
them, so they stay coherent for a long time. Main disadvantage: since the nuclei need to be
on the same molecule to control the distances between them, NMR does not scale well. Homay
Valafar will talk about NMR toward the end of the course.
Ions in traps Qubits are ions kept equally spaced in a row (a couple of inches apart) by an
oscillating electric field. Laser pulses can control the states of the ions.
Quantum dots Qubits are particles (electrons?) kept in nanoscopic wells on the surface of a silicon
chip. Main advantage: easy to control and fabricate (solid state). Main disadvantage: short
decoherence times.
Optical schemes Qubits are polarized photons traveling through mirrors, lenses, crystals, and the
vacuum. Main advantages: photons don’t decay and their polarizations are easy to measure;
computation is at the speed of light. Main disadvantage: hard to get photons to interact with
each other.
Superconducting/Josephson junctions I don’t know much about this, except that it presumably
needs temperatures close to absolute zero.
Implementations of Quantum Cryptography (the Good News). Quantum crypto not only
works in the real world, but works just fine on fiber optic networks already in place. British
Telecom (mid 1990s?) demonstrated the BB84 quantum key exchange protocol using cable laid
across Lake Geneva in Switzerland. I believe the scheme has also been demonstrated to work
with photons through the air over modest distances (a few kilometers?). It is now feasible to use
the fiber optic cable already in place to implement quantum crypto in the network of a major city
(New York banks are already using it(?)). It still won’t work over really large distances without
classical repeaters (“quantum amplification” is theoretically impossible).
2 Week 1: Preliminaries
Just Enough Linear Algebra to Understand Just Enough Quantum Mechanics. We let Z denote
the set of integers, Q denote the set of rational numbers, R denote the set of real numbers, and C
denote the set of complex numbers.
The Complex Numbers. C is the set of all numbers of the form z = x + iy, where x, y ∈ R and
i2 = −1. We often represent z as the point (x, y) in the plane. The complex conjugate (or adjoint) of z
is
z∗ = z̄ = x − iy.

Note that x = (z + z∗)/2 is the real part of z (ℜ(z)). Similarly, y = (z − z∗)/2i is the imaginary part of z (ℑ(z)). The norm or absolute value of z is

|z| = √(z∗z) = √(x² + y²) ≥ 0,
with equality holding iff z = 0. If z1 , z2 ∈ C, it’s easy to check that |z1 z2 | = |z1 | · |z2 |. It’s not quite
so easy to check that
|z1 + z2| ≤ |z1| + |z2| ,  (1)
but see Corollary B.2 in Section B.1 for a proof. (1) is an example of a triangle inequality.
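These arithmetic facts are easy to sanity-check with Python's built-in complex type. A small illustrative sketch (the particular values here are arbitrary, my own choices):

```python
import math

z1 = 3 - 7j          # Python's complex type: j plays the role of i
z2 = -1 + 2j

# |z1 z2| = |z1| |z2| (up to floating-point rounding)
assert math.isclose(abs(z1 * z2), abs(z1) * abs(z2))

# the triangle inequality (1): |z1 + z2| <= |z1| + |z2|
assert abs(z1 + z2) <= abs(z1) + abs(z2)

# |z|^2 = z* z, with the conjugate z* given by .conjugate()
assert math.isclose(abs(z1) ** 2, (z1.conjugate() * z1).real)
```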
Exercise 2.1 Let z := 3 − 7i and w := −1 + 2i. Find (a) z + w, (b) zw, (c) |z|, (d) z∗, and (e) 1/w. Express each answer in the form x + iy for real x, y.

Exercise 2.2 Check that (z1z2)∗ = z∗1z∗2 and (z1 + z2)∗ = z∗1 + z∗2 and (−z1)∗ = −z∗1 for all z1, z2 ∈ C.
If z ≠ 0, then the argument of z (arg(z)) is defined as the angle that z makes with the positive real axis. Our convention will be that 0 ≤ arg(z) < 2π. It is known that arg(z1z2) = arg(z1) + arg(z2) up to a multiple of 2π.
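Python's cmath.phase returns an angle in (−π, π], so a small wrapper is needed to match our convention. An illustrative sketch (the helper name arg is my own):

```python
import cmath
import math

def arg(z):
    """Argument of z normalized to [0, 2*pi), per our convention."""
    return cmath.phase(z) % (2 * math.pi)   # phase(z) lies in (-pi, pi]

z1 = 1 + 1j       # arg = pi/4
z2 = -1 + 0j      # arg = pi

assert math.isclose(arg(z1), math.pi / 4)
assert math.isclose(arg(z2), math.pi)

# arg(z1 z2) = arg(z1) + arg(z2) up to a multiple of 2*pi
diff = (arg(z1 * z2) - arg(z1) - arg(z2)) % (2 * math.pi)
assert math.isclose(diff, 0, abs_tol=1e-12) or math.isclose(diff, 2 * math.pi)
```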
The real numbers R form a subset of C consisting of those complex numbers with 0 imaginary
part, namely,
R = {z ∈ C : z = z∗ }.
The unit circle in C is the set of all z of unit norm, i.e., {z ∈ C : |z| = 1}.
C is an algebraically closed field. That is, every polynomial of positive degree with coefficients
in C has a root in C, in fact n of them, where n is the degree of the polynomial. This is equivalent
to saying that every polynomial over C is a product of linear (i.e., degree 1) factors. This fact is
known as the Fundamental Theorem of Algebra.
Every polynomial over R can be factored into real polynomial factors of degrees 1 and 2. This
implies that any odd-degree real polynomial has at least one real root.
The Exponential Map. For any z, we can define ez = exp(z) by the usual power series:
e^z = 1 + z + z²/2! + z³/3! + · · · + z^k/k! + · · · ,  (2)
which converges for all z.
Here are some essential properties of the exponential map on C:
• e^0 = 1.
• e^{−z} = 1/e^z.
• e^z ≠ 0.
Exercise 2.4 Using Euler's formula (from the previous exercise), find e^{iπ/2} and e^{−iπ/3}. Express each answer in the form x + iy for real x, y.
By Exercise 2.3, we have e^z = e^x(cos y + i sin y). The unit circle is the set {e^{iθ} : θ ∈ R}.
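One can watch the series (2) converge and spot-check Euler's formula numerically. An illustrative sketch (exp_series is my own helper name):

```python
import cmath
import math

def exp_series(z, terms=60):
    """Partial sum of the power series (2) for e^z."""
    total, term = 0j, 1 + 0j
    for k in range(terms):
        total += term
        term *= z / (k + 1)    # z^k/k!  ->  z^(k+1)/(k+1)!
    return total

z = 0.3 + 1.2j
assert cmath.isclose(exp_series(z), cmath.exp(z))

# Euler's formula: e^(i*theta) lies on the unit circle
theta = 2.0
w = cmath.exp(1j * theta)
assert math.isclose(abs(w), 1.0)
assert cmath.isclose(w, complex(math.cos(theta), math.sin(theta)))
```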
Vector Spaces. We’ll deal with finite dimensional vector spaces only. Much of quantum mechan-
ics requires infinite dimensional spaces, but thankfully, the QM that relates to information and
computation only requires finite dimensions. So all our vector spaces are finite dimensional.
Our vector spaces will usually be over C, the field of complex numbers, but sometimes they
will be over R (i.e., real vector spaces), and when we do information theory, we will need to look at bit vectors (vectors in spaces over the two-element field Z2 = {0, 1}).
In a vector space, vectors can be added to each other and multiplied by scalars, obeying the
usual rules. If V is an n-dimensional vector space and B = {b1 , . . . , bn } is a basis for V, then every
v ∈ V is written as a linear combination of basis vectors:
v = a1 b1 + · · · + an bn ,
where a1 , . . . , an are unique scalars. Thus we can identify the vector v with the n-tuple
a1
..
. ,
an
which we may also write as (a1 , . . . , an ). Under this identification, vector addition and scalar
multiplication are componentwise.
The vector (0, . . . , 0) is the zero vector, denoted by 0.
Matrices. For integers m, n > 0, an m × n matrix is a rectangular array of scalars with m rows
and n columns. If A is such a matrix and 1 ≤ i ≤ m and 1 ≤ j ≤ n, we denote the (i, j)th entry of
A (i.e., the scalar in the ith row and jth column) as [A]ij or A[i, j]. The former notation is useful if
the matrix is given by a more complicated expression.
A matrix A is upper triangular if [A]ij = 0 whenever i > j. A is lower triangular if [A]ij = 0
whenever i < j. A is triangular if A is either upper or lower triangular. If A is both upper and
lower triangular, then we can say that A is a diagonal matrix. In this case, all nonzero entries of
A must lie on the main diagonal. Triangular matrices have some nice properties that make them
simple to work with in some cases.
Adding and Multiplying Matrices. Given positive integers m and n and m × n matrices A and
B, we can define the matrix sum A + B to be the unique m × n matrix satisfying

[A + B]ij = [A]ij + [B]ij

for all 1 ≤ i ≤ m and 1 ≤ j ≤ n. That is, one just adds corresponding entries in A and B for
the corresponding entry in the sum. For this to be well defined, A and B must have the same
dimensions, in which case we say that A and B are conformant (for matrix addition); otherwise,
A + B is undefined. If k is a scalar, we can define the scalar multiplication of k with A as the unique
m × n matrix kA satisfying
[kA]ij = k[A]ij
for all i and j as above. One just multiplies each entry of A by k to get the corresponding entry of
kA. One can also write Ak for the same matrix. As you may expect, we write −A for (−1)A and
write A − B for A + (−B). If k ≠ 0, we can also write A/k for (1/k)A.
For positive integers m, n, and s, suppose A is an m × s matrix and B is an s × n matrix. Then
we define the matrix product of A and B as the unique m × n matrix AB satisfying
[AB]ij = ∑_{k=1}^{s} [A]ik [B]kj

for all 1 ≤ i ≤ m and 1 ≤ j ≤ n. Note that the number of columns of A must equal the number of
rows of B for the product to be well-defined, in which case we say that A and B are conformant (for
matrix multiplication).
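The definition translates directly into code. A minimal sketch in pure Python (mat_mul is my own helper name; matrices are lists of rows):

```python
def mat_mul(A, B):
    """Product of an m x s matrix A and an s x n matrix B, per the definition."""
    m, s, n = len(A), len(B), len(B[0])
    assert all(len(row) == s for row in A), "A and B are not conformant"
    # [AB]_ij = sum over k of [A]_ik [B]_kj
    return [[sum(A[i][k] * B[k][j] for k in range(s)) for j in range(n)]
            for i in range(m)]

A = [[1, 2, 0],
     [0, 1, 1]]          # 2 x 3
B = [[1, 0],
     [2, 1],
     [3, 4]]             # 3 x 2
assert mat_mul(A, B) == [[5, 2], [5, 5]]   # result is 2 x 2
```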
Most of the usual laws of addition and multiplication of scalars extend to matrices. In each
identity below, we use A, B, C to stand for arbitrary matrices, I for a unit matrix, and k and ℓ for
any scalars. For each identity, one side is well-defined if and only if the other side is well-defined.
Commutativity of matrix +: A + B = B + A.
Matrix negation: A − A = 0.
Distributivity of scalar × over matrix +: k(A + B) = kA + kB.
There is no commutative law for matrix multiplication; that is, it is not generally true that AB = BA
for matrices A and B, even if both sides are well-defined. If this equation does hold, then we say
that A and B commute.
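A quick numerical illustration of noncommutativity (the example matrices are my own arbitrary choices):

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 1],
     [0, 1]]
B = [[1, 0],
     [1, 1]]
assert mat_mul(A, B) != mat_mul(B, A)      # A and B do not commute

I = [[1, 0],
     [0, 1]]
assert mat_mul(A, I) == A and mat_mul(I, A) == A   # but every matrix commutes with I
```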
Exercise 2.6 Find two 2 × 2 matrices A and B such that AB = 0 (the zero matrix), but BA ≠ 0.
The Identity Matrix. For any n, the n × n identity matrix or unit matrix In has 1’s on its main
diagonal and 0’s everywhere off the diagonal, so
[In]ij = δij = {1 if i = j; 0 if i ≠ j}.

This equation also defines the expression δij, which is called the Kronecker delta. For example,

I3 =
1 0 0
0 1 0
0 0 1 .
Unit matrices have the property that for any matrix A (say, p × q),
Ip A = AIq = A .
We may drop the subscript (I instead of In ) if the dimension is clear from the context.
Nonsingular Matrices. A square matrix A is nonsingular or invertible iff there exists a matrix B of
the same dimensions such that AB = BA = I. Such a B, if it exists, is uniquely determined by A
and is denoted A−1 . In this case, it is of course true that B is also nonsingular and that B−1 = A.
Determinant. For an n × n matrix A, the determinant of A, denoted det A, is a scalar value that
depends on the entries and their positions inside A. A compact expression for the determinant is
beyond the scope of this course, and besides, we won’t deal with it very much, except to define
eigenvalues and eigenvectors. But at least for the record we can say (without proof) that the map
det mapping n × n matrices to scalars is the unique map satisfying the following two properties:
1. det(AB) = (det A)(det B) for all n × n matrices A and B.
One fundamental fact about the determinant is that a matrix A is nonsingular if and only if det A ≠ 0.
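For 2 × 2 matrices everything can be made fully explicit; the following sketch (det2 and inv2 are my own helper names) checks that a matrix with nonzero determinant indeed has an inverse:

```python
def det2(A):
    """Determinant of a 2 x 2 matrix: ad - bc."""
    (a, b), (c, d) = A
    return a * d - b * c

def inv2(A):
    """Inverse of a nonsingular 2 x 2 matrix, via the classical adjugate formula."""
    (a, b), (c, d) = A
    D = det2(A)
    assert D != 0, "singular matrix"
    return [[d / D, -b / D], [-c / D, a / D]]

A = [[2, 1],
     [5, 3]]
assert det2(A) == 1          # nonzero, so A is nonsingular
Ainv = inv2(A)
# check A * A^{-1} = I
prod = [[sum(A[i][k] * Ainv[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
assert prod == [[1, 0], [0, 1]]
```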
Trace. If A is an n × n matrix, the trace of A (denoted tr A) is defined as the sum of all the diagonal
elements of A, i.e.,
tr A = ∑_{i=1}^{n} [A]ii .
The trace has three fundamental properties:

2. tr(A + aB) = tr A + a tr B, for n × n matrices A and B and scalar a. (The trace is linear.)

3. tr(AB) = tr(BA) whenever both AB and BA are defined.

In fact, tr is the only function from n × n matrices to scalars that satisfies (1)–(3) above.
Exercise 2.8 Show that for any integers m, n ≥ 1, if A is an n × m matrix and B is an m × n matrix, then

tr(AB) = tr(BA) .  (4)
This verifies item (3) above about the trace. We will use this fact frequently.
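Equation (4) is easy to spot-check numerically, even for non-square A and B (an illustrative sketch; helper names are mine):

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def tr(A):
    """Trace: the sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
B = [[1, 0],
     [0, 1],
     [1, 1]]             # 3 x 2
# AB is 2 x 2 and BA is 3 x 3, yet their traces agree, as in (4)
assert tr(mat_mul(A, B)) == tr(mat_mul(B, A))
```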
Hilbert Spaces. A vector space H over C is a Hilbert space if it has a scalar product ⟨·, ·⟩ : H × H → C that behaves as follows for all u, v, w ∈ H and a ∈ C:

1. ⟨u, v + aw⟩ = ⟨u, v⟩ + a⟨u, w⟩ (linearity in the second argument);

2. ⟨u, v⟩ = ⟨v, u⟩∗ (conjugate symmetry);

3. ⟨u, u⟩ ≥ 0, with equality if and only if u = 0 (positive definiteness).

Note that (2) implies that ⟨u, u⟩ ∈ R, so (3) merely asserts that it can't be negative. Also note that (1) and (2) imply that ⟨v + aw, u⟩ = ⟨v, u⟩ + a∗⟨w, u⟩, i.e., ⟨·, ·⟩ is conjugate linear in the first argument. Such a scalar product is called a Hermitean form or a Hermitean inner product.

The norm of a vector u ∈ H is defined as ‖u‖ = √⟨u, u⟩. Note that by (3), ‖0‖ = 0 and ‖u‖ > 0 if u ≠ 0.
Exercise 2.9 Show that for any u ∈ H and any a ∈ C, ‖au‖ = |a| ‖u‖.
Example. We consider the vector space Cn of all n-tuples of complex numbers (for some n > 0),
where vector addition and scalar multiplication are componentwise, i.e.,
(u1, . . . , un) + (v1, . . . , vn) = (u1 + v1, . . . , un + vn)   and   a(u1, . . . , un) = (au1, . . . , aun).
We define the Hermitean inner product for all vectors u = (u1 , . . . , un ) and v = (v1 , . . . , vn ) as
⟨u, v⟩ = u∗1 v1 + · · · + u∗n vn = ∑_{i=1}^{n} u∗i vi .
i=1
In this example, u and v can be expressed as linear combinations over the “standard” basis
{e1 , . . . , en }, where
ei = (0, . . . , 0, 1, 0, . . . , 0),  (5)

where the 1 occurs in the ith position.
Exercise 2.10 Check that the three properties of a Hermitean form are satisfied in this example.
Note that if we restrict the ui and vi to be real numbers, then this is just the familiar dot product
of two real vectors. Also note that in this example,
‖u‖ = √⟨u, u⟩ = √(u∗1 u1 + · · · + u∗n un) = √(|u1|² + · · · + |un|²) .
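The inner product and norm on Cn translate directly into code using Python's complex type. An illustrative sketch (inner and norm are my own helper names):

```python
import math

def inner(u, v):
    """Hermitean inner product on C^n: <u, v> = sum_i u_i^* v_i."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

def norm(u):
    return math.sqrt(inner(u, u).real)

u = [1 + 1j, 2 - 1j]
v = [0 + 1j, 3 + 0j]

# conjugate symmetry: <u, v> = <v, u>*
assert inner(u, v) == inner(v, u).conjugate()

# conjugate linearity in the first argument: <a u, v> = a* <u, v>
a = 2 - 3j
assert abs(inner([a * ui for ui in u], v) - a.conjugate() * inner(u, v)) < 1e-12

# ||u||^2 = |u1|^2 + |u2|^2
assert math.isclose(norm(u) ** 2, abs(u[0]) ** 2 + abs(u[1]) ** 2)
```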
Orthogonality and Normality. In a genuine sense, the example above is the only example that
really matters. First some more definitions. Two vectors u, v in a Hilbert space H are orthogonal or perpendicular if ⟨u, v⟩ = 0. A vector u is a normal or a unit vector if ‖u‖ = 1. A set of vectors v1, . . . , vk ∈ H is an orthogonal set if different vectors are orthogonal. The set is an orthonormal set if, in addition, each vector is a unit vector. That is, for all 1 ≤ i, j ≤ k, we have

⟨vi, vj⟩ = δij = {1 if i = j; 0 if i ≠ j}.
A basis for H is an orthonormal basis if it is an orthonormal set. Orthonormal bases are special
and have nice properties that make them preferable to other bases. From now on we will assume
that all our bases (for Hilbert spaces) are orthonormal unless I say otherwise, and I won’t.
In the example above, e1 , . . . , en clearly form an orthonormal basis. We’ll see later that every
Hilbert space has an orthonormal basis—lots of them, in fact. But let’s get back to our example. If
we fix an orthonormal basis B = {β1 , . . . , βn } for a Hilbert space H, then we can write two vectors
u, v ∈ H in terms of B as
u = ∑_{i=1}^{n} ui βi   and   v = ∑_{j=1}^{n} vj βj .

Then

⟨u, v⟩ = ⟨u1β1 + · · · + unβn, v⟩
  = ∑_{i=1}^{n} u∗i ⟨βi, v⟩   (conjugate linearity in the first argument)
  = ∑_i u∗i ⟨βi, v1β1 + · · · + vnβn⟩
  = ∑_i u∗i ∑_{j=1}^{n} vj ⟨βi, βj⟩   (linearity in the second argument)
  = ∑_{i,j} u∗i vj ⟨βi, βj⟩
  = ∑_{i,j} u∗i vj δij   (the basis is orthonormal)
  = ∑_{i=1}^{n} u∗i vi .
In other words, ⟨u, v⟩ is exactly the quantity of our example above, if we identify u with the tuple
(u1 , . . . , un ) ∈ Cn and v with the tuple (v1 , . . . , vn ) ∈ Cn .
Exercise 2.11 Show that any orthogonal set of nonzero vectors is linearly independent. [Hint: Let
v be any linear combination of such vectors, and consider ⟨v, v⟩. You'll need the fact that ⟨·, ·⟩ is
positive definite.]
3 Week 2: Preliminaries
Linear Transformations and Matrices. Let U and V be vector spaces. A linear map is a function
T : U → V such that, for all vectors u, v ∈ U and scalar a,
T (u + av) = T u + aT v.
The vector addition and scalar multiplication on the left-hand side is in U, and the right-hand side
is in V. If {α1 , . . . , αn } is a basis for U and {β1 , . . . , βm } is a basis for V, then T can be expressed
uniquely in matrix form with respect to these bases: For each 1 ≤ j ≤ n, we write Tαj uniquely as
a linear combination of the βi :
Tαj = ∑_{i=1}^{m} aij βi ,  (6)
where each aij is a scalar. Now let A be the m × n matrix whose (i, j)th entry is aij . Expressing
any u ∈ U with respect to the first basis (of U) as
u = ∑_{j=1}^{n} uj αj = (u1, . . . , un),
we get
Tu = T ( ∑_{j=1}^{n} uj αj )
  = ∑_{j=1}^{n} uj Tαj   (by linearity)
  = ∑_j uj ∑_{i=1}^{m} aij βi   (by (6))
  = ∑_{i} ( ∑_{j} aij uj ) βi
  = ( ∑_j a1j uj , . . . , ∑_j amj uj )
  = A (u1, . . . , un) ,
expressed with respect to the second basis (of V). Thus applying T to a vector u amounts to
multiplying the corresponding matrix on the left with the corresponding column vector on the
right.
Conversely, given bases for U and for V, an m × n matrix defines a unique linear map T whose
action on a vector u is given above.
Thus, linear maps and matrices are interchangeable, given bases for the requisite spaces.
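Here is a small sketch of the correspondence (the map T below is my own arbitrary example): the columns of the matrix A are the coordinate vectors of the Tαj as in (6), and applying T agrees with multiplying by A.

```python
def T(u):
    """An example linear map C^2 -> C^3 (my own choice, for illustration)."""
    x, y = u
    return [x + 2 * y, 3 * y, x]

# Columns of A are the images of the standard basis vectors, as in (6)
e = [[1, 0], [0, 1]]
cols = [T(ej) for ej in e]
A = [[cols[j][i] for j in range(2)] for i in range(3)]   # A[i][j] = a_ij

def apply(A, u):
    """Matrix-vector multiplication."""
    return [sum(A[i][j] * u[j] for j in range(len(u))) for i in range(len(A))]

u = [2 + 1j, -1 + 0j]
assert apply(A, u) == T(u)    # multiplying by A reproduces the action of T
```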
Linear maps (with the same domain and codomain) can be added and multiplied by scalars
thus:
(T1 + T2 )u = T1 u + T2 u,
(aT )u = a(T u).
The two equations above define T1 + T2 and aT respectively (a a scalar) by showing how they map
an arbitrary vector u. This makes the set of all such linear maps a vector space in its own right.
If U and V are Hilbert spaces and the {αj } and {βi } are orthonormal bases, then each entry aij
can be expressed as a scalar product in V:
⟨βi, Tαj⟩ = ⟨βi, a1jβ1 + · · · + amjβm⟩ = ∑_{k=1}^{m} akj ⟨βi, βk⟩ = aij .
One upshot of this is that a linear map T is completely determined by the quantities ⟨βi, Tαj⟩ for all i and j.
Adjoints. If A is an m × n matrix, its adjoint (or conjugate transpose) A∗ is the n × m matrix obtained by transposing A and conjugating each entry, i.e., [A∗]ij = ([A]ji)∗. Some basic properties:

1. (A∗)∗ = A.
2. (A + aB)∗ = A∗ + a∗ B∗ . (Here, A and B have the same dimensions, and a ∈ C.)
3. (AB)∗ = B∗ A∗ .
In particular, if u is a column vector (an n × 1 matrix), then u∗ is a row vector (i.e., a 1 × n matrix), called the dual vector of u. If u = (u1, . . . , un) and
v = (v1 , . . . , vn ) are vectors in some Hilbert space, expressed with respect to an orthonormal basis
{α1 , . . . , αn }, then by our previous example we have
⟨u, v⟩ = ∑_{i=1}^{n} u∗i vi = u∗v .  (7)
i=1
Here, we identify the 1 × 1 matrix u∗ v with the scalar comprising its sole entry.
If H and J are Hilbert spaces and T : H → J is linear, then there exists a unique linear map
T∗ : J → H such that for all u ∈ H and v ∈ J,

⟨v, Tu⟩ = ⟨T∗v, u⟩.
Note that the left-hand side is the scalar product in J, and the right-hand side is the scalar product
in H.
If we pick any orthonormal bases for H and J, then these two definitions of the adjoint coincide.
That is, if T is represented by the matrix A, then T ∗ is represented by the matrix A∗ .
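For matrices, then, the adjoint is the conjugate transpose, and the defining identity ⟨v, Tu⟩ = ⟨T∗v, u⟩ can be spot-checked numerically. An illustrative sketch (helper names are mine):

```python
def inner(u, v):
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

def apply(A, u):
    return [sum(A[i][j] * u[j] for j in range(len(u))) for i in range(len(A))]

def adjoint(A):
    """Conjugate transpose: [A*]_ij = ([A]_ji)*."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

A = [[1 + 1j, 2 + 0j, 0 + 1j],
     [0 + 0j, 3 + 0j, 1 - 1j]]         # a 2 x 3 matrix, i.e., a map C^3 -> C^2
u = [1 + 0j, 1j, 2 + 0j]               # u in C^3
v = [1 - 1j, 2 + 0j]                   # v in C^2

# <v, A u> = <A* v, u>
assert abs(inner(v, apply(A, u)) - inner(apply(adjoint(A), v), u)) < 1e-12
```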
Polarization Identities. Let H and J be Hilbert spaces and A, B : H → J linear maps. We have the following easily verifiable polarization identity: For every x, y ∈ H,
⟨Ax, By⟩ = (1/4) ∑_{k=0}^{3} (−i)^k ⟨A(x + i^k y), B(x + i^k y)⟩ .  (8)
Equation (8) has a number of interesting special cases. Here are two: If A = B, then we have
⟨Ax, Ay⟩ = (1/4) ∑_{k=0}^{3} (−i)^k ⟨A(x + i^k y), A(x + i^k y)⟩ = (1/4) ∑_{k=0}^{3} (−i)^k ‖A(x + i^k y)‖² ,  (9)

and if A = B = I, then

⟨x, y⟩ = (1/4) ∑_{k=0}^{3} (−i)^k ⟨x + i^k y, x + i^k y⟩ = (1/4) ∑_{k=0}^{3} (−i)^k ‖x + i^k y‖² .  (10)
Equation (10) is significant because it shows that the inner product on H is completely determined
by the norm itself. Equation (9) implies that if an operator A preserves norms, it must also preserve
inner products.
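Identity (10) is easy to verify numerically for particular vectors. An illustrative sketch (the vectors are my own arbitrary choices):

```python
def inner(u, v):
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

def norm_sq(u):
    return inner(u, u).real

x = [1 + 2j, 0 - 1j]
y = [2 - 1j, 1 + 1j]

# Equation (10): <x, y> = (1/4) * sum_{k=0}^{3} (-i)^k * ||x + i^k y||^2
rhs = sum((-1j) ** k * norm_sq([xi + (1j ** k) * yi for xi, yi in zip(x, y)])
          for k in range(4)) / 4
assert abs(inner(x, y) - rhs) < 1e-12
```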
Gram-Schmidt Orthonormalization. We prefer orthonormal bases for our Hilbert spaces. Here
we show that they actually exist, and in abundance. Let H be an n-dimensional Hilbert space and
let {b1 , . . . , bn } be any basis (not necessarily orthonormal) for H. For i = 1 to n in order, define
xi = bi − ∑_{k=1}^{i−1} ⟨yk, bi⟩ yk ,
yi = xi / ‖xi‖ .
This is known as the Gram-Schmidt procedure. We’ll see that {y1 , . . . , yn } is an orthonormal basis.
It's not obvious that the yi are even well-defined, since we need to establish that ‖xi‖ in the denominator is nonzero. We can prove the following facts simultaneously by induction on i for 1 ≤ i ≤ n; that is, assuming that all the facts are true for all j < i, we prove all the facts for i:

1. xi ≠ 0, so yi is well-defined.

2. ‖yi‖ = 1.

3. {b1, . . . , bi}, {x1, . . . , xi}, and {y1, . . . , yi} are each linearly independent sets of vectors which span the same subspace of H.
For j < i, we compute

⟨yj, yi⟩ = ⟨yj, xi⟩ / ‖xi‖ = (1/‖xi‖) ( ⟨yj, bi⟩ − ∑_{k<i} ⟨yk, bi⟩⟨yj, yk⟩ ) = (1/‖xi‖) ( ⟨yj, bi⟩ − ⟨yj, bi⟩ ) = 0.

The second-to-last equation comes from the fact that ⟨yj, yk⟩ = δjk for all j, k < i, which is part of the inductive hypothesis.
It turns out (we won't prove this) that given a basis b1, . . . , bn, there is only one list y1, . . . , yn satisfying items (2)–(5) above.
Exercise 3.4 Prove that applying the Gram-Schmidt procedure to a basis that is already orthonor-
mal just results in the same basis.
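The procedure transcribes almost line for line into code. Here is a sketch for real vectors (for complex vectors one would conjugate in the inner product); the starting basis is my own arbitrary example:

```python
import math

def inner(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))   # real vectors, so no conjugates needed

def gram_schmidt(bs):
    """The Gram-Schmidt procedure from the text, for real vectors."""
    ys = []
    for b in bs:
        # subtract off the projections onto the y's found so far
        coeffs = [inner(y, b) for y in ys]
        x = [bi - sum(c * y[i] for c, y in zip(coeffs, ys)) for i, bi in enumerate(b)]
        nx = math.sqrt(inner(x, x))   # nonzero as long as the b's are independent
        ys.append([xi / nx for xi in x])
    return ys

ys = gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
# the result is an orthonormal set: <y_i, y_j> = delta_ij
for i in range(3):
    for j in range(3):
        expected = 1.0 if i == j else 0.0
        assert math.isclose(inner(ys[i], ys[j]), expected, abs_tol=1e-12)
```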
Definition 3.5 If H and J are Hilbert spaces, we let L(H, J) denote the space of all linear maps
from H to J. We abbreviate L(H, H) by L(H), the space of all linear operators on H, with identity
element I. Note that L(H, J) is a vector space over C.
A map A ∈ L(H) is Hermitean (or self-adjoint) if A∗ = A. A map A is unitary if AA∗ = I (equivalently, A∗A = I).

• If A is Hermitean, then ⟨u, Av⟩ = ⟨Au, v⟩. This follows immediately from the fact that ⟨u, Av⟩ = ⟨A∗u, v⟩.
• If A is Hermitean and a is real, then aA is Hermitean.

• If A is Hermitean, then so is A∗.

• If A is unitary, then ⟨Au, Av⟩ = ⟨u, v⟩, that is, A preserves the scalar product. To see this, we just compute

⟨Au, Av⟩ = ⟨A∗Au, v⟩ = ⟨Iu, v⟩ = ⟨u, v⟩.
L(H) is a Hilbert space. In Definition 3.5, we mentioned that L(H, J) is a vector space over C. In
fact, its dimension is the product of the dimensions of H and of J: Suppose H has dimension n and
J has dimension m. Given orthonormal bases for each space, an element of L(H, J) corresponds to an m × n matrix. You can think of this matrix as a vector with mn components which just happen to
be arranged in a 2-dimensional array rather than a single column. The vector addition and scalar
multiplication operations on these matrices are componentwise, just as with vectors, so L(H, J)
has dimension mn.
There is a natural inner product that one can define on L(H, J) that makes it into an mn-
dimensional Hilbert space. For all A, B ∈ L(H, J), define

⟨A, B⟩ := tr(A∗B) .  (11)

This is known as the Hilbert-Schmidt inner product on L(H, J). It looks similar to the expression
u∗ v for the inner product of vectors u and v (Equation (7)), except that A∗ B is not a scalar but an
operator in L(H), and so we take the trace to get a scalar result.
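A small sketch of the Hilbert-Schmidt inner product (helper names are mine), checking that tr(A∗B) agrees with the entrywise inner product when the matrices are viewed as vectors in C⁴:

```python
def adjoint(A):
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def tr(A):
    return sum(A[i][i] for i in range(len(A)))

def hs_inner(A, B):
    """Hilbert-Schmidt inner product <A, B> = tr(A* B)."""
    return tr(mat_mul(adjoint(A), B))

A = [[1 + 1j, 0 + 0j],
     [2 + 0j, 1 - 1j]]
B = [[0 + 1j, 1 + 0j],
     [1 + 0j, 0 + 0j]]

# agrees with the entrywise inner product on C^4
flat = sum(A[i][j].conjugate() * B[i][j] for i in range(2) for j in range(2))
assert hs_inner(A, B) == flat
# conjugate symmetry
assert hs_inner(B, A) == hs_inner(A, B).conjugate()
```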
Exercise 3.6 Show that L(H, J), together with its Hilbert-Schmidt inner product, satisfies all the
axioms of a Hilbert space. [Hint: You can certainly just verify the axioms directly. Alternatively,
represent operators as matrices with respect to some fixed orthonormal bases of H and J, respec-
tively, then show that if A and B are m × n matrices, then tr(A∗ B) is the usual inner product of A
and B on Cmn , where we identify each matrix with the mn-dimensional vector of all its entries.]
The Hilbert-Schmidt inner product interacts nicely with composition of linear maps (or equiv-
alently, matrix multiplication).
Exercise 3.7 Let H, J, and K be Hilbert spaces. Verify directly that for any A ∈ L(H, K), B ∈
L(J, K), and C ∈ L(H, J),
⟨A, BC⟩ = ⟨B∗A, C⟩ = ⟨AC∗, B⟩ .  (12)
This means that you can move a right or left factor from one side of the inner product to the other,
provided you take its adjoint. [Hint: Use Exercise 2.8 along with basic properties of adjoints. You
may assume that A, B, and C are all matrices of appropriate dimensions.]
4 Week 2: Preliminaries
Exercise 4.1 Let b1 = (−3, 0, 4), b2 = (3, −1, 2), and b3 = (0, 1, −1). Perform the Gram-Schmidt
procedure above on {b1 , b2 , b3 } to find the corresponding {x1 , x2 , x3 } and {y1 , y2 , y3 }.
Dirac Notation. In what follows, we fix an n-dimensional Hilbert space H and some orthonormal
basis for it, so we can identify vectors with column vectors in the usual way. Recall that for column
vectors u, v ∈ H, we have

⟨u, v⟩ = u∗v .
Paul Dirac suggested a notation which somewhat reconciles the two sides of this equation: if we let |ψ⟩ denote the column vector v and we let ⟨ϕ| denote the row vector u∗, then ⟨u, v⟩ = u∗v = ⟨ϕ|ψ⟩ is just the usual multiplication of a row vector and a column vector (the two vertical bars overlap). Note how the product ⟨ϕ|ψ⟩ looks like ⟨u, v⟩ with the comma replaced by a vertical bar. This notation has become standard in quantum mechanics. We denote a (column) vector by |ψ⟩, where ψ is some label identifying it, and we denote its corresponding dual (row) vector by ⟨ψ| (thus ⟨ψ| = |ψ⟩∗ and vice versa: |ψ⟩ = ⟨ψ|∗). The choice of delimiters tells us whether we are talking about a column vector or a row vector. A vector of the form |ψ⟩ (i.e., a column vector) is called a ket vector. If |ϕ⟩ is another ket vector, its dual (a row vector) ⟨ϕ| = |ϕ⟩∗ is called a bra vector, so that the scalar ⟨ϕ|ψ⟩ can be called the bracket ("bra-ket") of |ϕ⟩ and |ψ⟩.
We’ll start using Dirac notation because the book uses it, although there are some times when
the notation just gets too clunky, and so then we will go back to using the “standard” notation.
We can combine kets and bras in other ways. For example, |ψihϕ| is a column vector on the left
multiplied by a row vector on the right (in standard notation, vu∗ , where u and v are as above).
This is then an n × n matrix, or considered another way, a linear operator H → H that takes a
vector |χi and maps it to the vector |ψihϕ|χi = (hϕ|χi)|ψi (that is, the vector |ψi multiplied by
the scalar hϕ|χi). In any case, combining bras and kets just amounts to the usual vector or matrix
multiplication.
As a special case, if {e1 , . . . , en } is the orthonormal basis for H that we have fixed, then, letting
|i⟩ := e_i for all 1 ≤ i ≤ n, we have, for all 1 ≤ i, j ≤ n,
\[
|i\rangle\langle j| = e_i e_j^* = E_{ij},
\]
where E_{ij} denotes the n × n matrix whose entry in the ith row and jth column is 1 and whose other entries are all 0. Notice that if A
is a linear map H → H whose corresponding matrix has entries aij , then by the equation above
we must have
\[
A = \sum_{i,j} a_{ij} E_{ij} = \sum_{i,j} a_{ij}\, |i\rangle\langle j|,
\]
where both indices in the summation run from 1 to n. In particular, the identity operator is given by
\[
I = \sum_{i=1}^{n} |i\rangle\langle i|.
\]
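The outer-product identities above are easy to check numerically. A minimal NumPy sketch (assuming NumPy, and using 0-based indices where the text uses 1-based):

```python
import numpy as np

n = 3
E = np.eye(n)
ket = lambda i: E[:, [i]]   # column vector |i>
bra = lambda i: E[[i], :]   # row vector <i|

# E_ij = |i><j| has a single 1, in row i and column j.
Eij = ket(1) @ bra(2)
assert Eij[1, 2] == 1 and np.count_nonzero(Eij) == 1

# Any matrix A decomposes as A = sum_ij a_ij |i><j|.
A = np.arange(9.0).reshape(3, 3)
A_rebuilt = sum(A[i, j] * (ket(i) @ bra(j)) for i in range(n) for j in range(n))
assert np.allclose(A, A_rebuilt)

# The identity is I = sum_i |i><i|.
I_rebuilt = sum(ket(i) @ bra(i) for i in range(n))
assert np.allclose(I_rebuilt, np.eye(n))
```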
Change of (Orthonormal) Basis. Let H be as before, and let {e1 , . . . , en } and {f1 , . . . , fn } be two
orthonormal bases for H. There is a unique linear map U ∈ L(H) mapping the first basis to the
second, i.e., Uei = fi for all 1 ≤ i ≤ n. Now for each 1 ≤ i, j ≤ n we have
\[
\langle e_i, U^* U e_j \rangle = \langle U e_i, U e_j \rangle = \langle f_i, f_j \rangle = \delta_{ij} = \langle e_i, e_j \rangle = \langle e_i, I e_j \rangle.
\]
Since the linear map U∗ U is uniquely determined by the quantities above, we must therefore have
U∗ U = I, and thus U is unitary.
Conversely, if U is unitary and {e1 , . . . , en } is an orthonormal basis, then {Ue1 , . . . , Uen } is also
an orthonormal basis, because U preserves the scalar product.
We conclude that the operators needed to change orthonormal bases in a Hilbert space are
exactly the unitary operators.
Unitary Conjugation. If A and B are two linear operators in L(H) (equivalently, two n × n-
matrices), then we say that A is unitarily conjugate to B if there exists a unitary U such that
B = UAU∗ . The relation “is unitarily conjugate to” is an equivalence relation on L(H), that is, it is
reflexive, symmetric, and transitive.
Unitary conjugation allows us to change orthonormal bases. Suppose {e1 , . . . , en } and {f1 , . . . , fn }
are two orthonormal bases for H and let U be the unique unitary operator such that Uei = fi for
all 1 6 i 6 n. Suppose that A is some linear operator on H. We want to compare the matrix entries
of A with respect to the two different bases. With respect to the first basis (the e-basis), the (i, j)th entry of the matrix of A is given by ⟨e_i, A e_j⟩ = e_i^* A e_j (or ⟨i|A|j⟩ using Dirac notation). With respect to the second basis (the f-basis), the same entry is ⟨f_i, A f_j⟩. Starting with this, we get
\[
\langle f_i, A f_j \rangle = \langle U e_i, A U e_j \rangle = \langle e_i, U^* A U e_j \rangle.
\]
The right-hand side is the (i, j)th entry of the matrix representing the operator U∗ AU with respect
to the e-basis.
To summarize, if M_A and M_A' are the matrices representing the operator A with respect to the e-basis and the f-basis, respectively, then
\[
M_A' = M_U^* \, M_A \, M_U,
\]
where M_U is the matrix representing the operator U with respect to the e-basis.
Thus, changing orthonormal basis amounts to unitary conjugation of the corresponding
matrices.
Back to Quantum Physics: The Double Slit Experiment. It’s been known since early in the
20th century that light comes in discrete packets (particles) called photons. People have observed
individual photons hitting a photoelectric detector (or a photographic plate) at specific times and
pinpoint locations, causing local electric currents in the detector (or dots to appear on the plate).
On the other hand, light also exhibits wavelike properties. In the double slit experiment, light
from a laser beam is shined on an opaque barrier with two small openings close to each other (on
the order of the wavelength of the light). A screen is placed on the other side of the barrier. What you
see on the screen are alternating bands of light and dark—a standard interference pattern caused
by the light waves from the two slits interfering constructively and destructively with each other.
This is easily visible to the naked eye. If you block one of the slits, then the interference pattern
goes away and you just see a smoothly contoured, glowing blob on the screen (that depends on
the width of the slit).
Here is a plausible (though ultimately wrong) explanation in terms of photons: the photons
somehow are changing phase in time, and the photons that go through the top slit are interfering
with the photons going through the bottom slit.
Let’s see why this is wrong. Now alter the experiment as follows: Make the light source
extremely dim, so that it emits on average only one photon per second, and replace the screen with
a photographic plate (or photodetector) that will register where each photon hits. The photons
appear to hit the plate at random places, but if you run the experiment a long time (thousands or
millions of photons), you see that, statistically, the distribution of photon hits resembles the same
wavy interference pattern as before. That is, the probability of a photon hitting any given location
is proportional to the intensity of the light at that location in the original experiment.
We can’t say the photons are interfering with each other, since one photon goes through long
before the next one comes. The only explanation is that each photon is somehow passing through
both slits at the same time and interfering with itself on the other side. This cannot be explained at
all by classical physics, which asserts that the photon, being a particle, must travel through either
the upper slit or the lower slit, but not both. Indeed, if you put detectors at both slits, the photon
will only be detected at (at most) one slit or the other, not both.
Another thing that classical physics cannot explain is the random behavior of the photons at
the plate. You can send two identical photons, of exactly the same frequency and moving in exactly
the same direction, and they will wind up at different locations at the plate. So the behavior of the
photons is not deterministic but inherently random.
Quantum mechanics is needed to explain both these phenomena as follows: Each photon does
indeed correspond to a wave that goes through both slits, but the amplitude of this wave at any
location is related to the probability of the photon being at that location. These waves interfere with
each other and cause the interference pattern in the statistical distribution of photons at the plate.
So the two hallmarks of quantum mechanics are: (i) nondeterminism (inherent randomness)
and (ii) interference of probabilities. More later.
5 Week 3: Unitary conjugation
Invariance under Unitary Conjugation: Trace and Determinant. If A and U are n × n matrices
and U is unitary, then by Equation (4) of Exercise 2.8,
\[
\operatorname{tr}(U A U^*) = \operatorname{tr}(U^* U A) = \operatorname{tr} A.
\]
In other words, the tr function is invariant under unitary conjugation, i.e., if matrices A and B are
unitarily conjugate, then their traces are equal. This means that the tr function is really a function
of the underlying operator and does not depend on which orthonormal basis you use to represent
the operator as a matrix. (In fact, it does not depend on any basis, orthonormal or otherwise.)
It’s worth looking at what the trace looks like in Dirac notation. If A is an operator and {e_1, . . . , e_n} is an orthonormal basis, then we know (letting |i⟩ = e_i for all i, as before) that [A]_ij = ⟨e_i, A e_j⟩ = ⟨i|A|j⟩ for the matrix of A with respect to this basis. So,
\[
\operatorname{tr} A = \sum_{i=1}^{n} [A]_{ii} = \sum_i \langle e_i, A e_i \rangle = \sum_i e_i^* A e_i = \sum_i \langle i|A|i\rangle , \tag{13}
\]
and this quantity does not depend on the particular orthonormal basis we choose.
Similarly, the determinant function det is also invariant under unitary conjugation. This follows
from the fact that det(AB) = det A det B and det(A−1 ) = (det A)−1 for any nonsingular A. For A
and U as above, we have
\[
\det(U A U^*) = \det U \cdot \det A \cdot \det(U^*) = \det U \cdot \det A \cdot (\det U)^{-1} = \det A.
\]
So like the trace, det is really a function of the operator and does not depend on the basis used to
represent the operator as a matrix.
Here are some other invariants under unitary conjugation. In each case, U is an arbitrary
unitary operator.
The adjoint. For any A, clearly (UAU∗ )∗ = UA∗ U∗ . (The adjoint of a conjugate is the conjugate
of the adjoint.)
Being Hermitean. If A is Hermitean, then (UAU∗ )∗ = UA∗ U∗ = UAU∗ , so UAU∗ is also Her-
mitean.
Being unitary. If A is unitary, then (UAU∗ )(UAU∗ )∗ = UAU∗ UA∗ U∗ = UAA∗ U∗ = UU∗ = I, so
UAU∗ is also unitary.
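These invariance properties can be spot-checked numerically. A small NumPy sketch, where (as one convenient assumption) a random unitary is obtained from the QR decomposition of a random complex matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# A random unitary U from the QR decomposition of a complex Gaussian matrix.
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
U, _ = np.linalg.qr(M)
assert np.allclose(U.conj().T @ U, np.eye(n))

A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = U @ A @ U.conj().T            # B is unitarily conjugate to A

# Trace and determinant are invariant under unitary conjugation.
assert np.allclose(np.trace(A), np.trace(B))
assert np.allclose(np.linalg.det(A), np.linalg.det(B))

# Conjugating a Hermitean matrix gives a Hermitean matrix.
H = A + A.conj().T
HB = U @ H @ U.conj().T
assert np.allclose(HB, HB.conj().T)
```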
Orthogonal Subspaces, Projection Operators. Projection operators are important for under-
standing how measurements are made in quantum mechanics. They are also important because
they can serve as building blocks for more general operators. We will spend some quality time
with them.
Again, let H be an n-dimensional Hilbert space, and let V, W ⊆ H be subspaces of H. V and
W are mutually orthogonal if hv, wi = 0 for every v ∈ V and w ∈ W.
Exercise 5.1 Show that if V and W are mutually orthogonal, then no nonzero vector can be in
V ∩ W.
There is a natural one-to-one correspondence between the subspaces of H and certain linear
operators on H known as projection operators.
Definition 5.2 An (orthogonal) projection operator or projector on H is a linear map P ∈ L(H) such that
1. P∗ = P, i.e., P is self-adjoint (Hermitean), and
2. P^2 = P, i.e., P is “idempotent.”
There are two trivial projection operators on H, namely, I (the identity) and 0 (the zero operator,
which maps every vector to 0). There are many nontrivial projection operators as well.
Exercise 5.4 Show that if P and Q are projection operators and PQ = 0, then QP = 0 as well, and
P + Q is a projection operator. [Hint: To show that QP = 0, take the adjoint of both sides of the
equation PQ = 0.]
Given a projection operator P on H, let V be the image of P, that is, V = img P := {Pv : v ∈ H}.
Then it is easy to check that V is a subspace of H, and we say that “P projects onto V.” Notice that
if u ∈ V then there is a v such that Pv = u, and so
Pu = PPv = Pv = u.
That is, P fixes every vector in V, and so clearly we also have V = {u ∈ H : Pu = u}.
Not only does P project onto V but it does so orthogonally. This means that P moves any vector
v perpendicularly onto V, or more precisely, hu, Pv − vi = 0 for any u ∈ V, where Pv − v is the vector
representing the net movement from v to Pv. To see that hu, Pv − vi = 0, we write u = Pw for some
w and just calculate:
hu, Pv − vi = hPw, Pv − vi = hPw, Pvi − hPw, vi = hP∗ Pw, vi − hPw, vi = hPw, vi − hPw, vi = 0.
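A numeric illustration of the calculation above, assuming NumPy. The projector is built as y_1 y_1^* + · · · + y_k y_k^* from an orthonormal set spanning V, a construction that also appears in the hint to Exercise 5.7:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 4, 2

# Orthonormal basis {y_1, ..., y_k} of a random k-dimensional subspace V,
# and the projector P = sum_i y_i y_i^*.
Y, _ = np.linalg.qr(rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k)))
P = Y @ Y.conj().T

# P is self-adjoint and idempotent.
assert np.allclose(P, P.conj().T)
assert np.allclose(P @ P, P)

# P moves any v perpendicularly onto V: <u, Pv - v> = 0 for every u in V.
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
u = P @ (rng.standard_normal(n) + 1j * rng.standard_normal(n))  # arbitrary u in V
assert np.isclose(np.vdot(u, P @ v - v), 0)
```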
Conversely, if V is any subspace of H, then there is a unique projection operator P that projects
orthogonally onto V as above. First I’ll show uniqueness: If P and Q are projectors that both
orthogonally project onto V, then for any v, w ∈ H we have
\[
\langle Pw, Pv \rangle = \langle P^* P w, v \rangle = \langle P P w, v \rangle = \langle Pw, v \rangle,
\]
and
\[
\langle Pw, Qv \rangle = \langle Q^* P w, v \rangle = \langle Q P w, v \rangle = \langle Pw, v \rangle.
\]
The last equation follows from the fact that Q fixes every vector in V, in particular, Q fixes Pw.
Putting these two facts together, we have
\[
\langle Pw, Pv - Qv \rangle = \langle Pw, Pv \rangle - \langle Pw, Qv \rangle = \langle Pw, v \rangle - \langle Pw, v \rangle = 0.
\]
Since w was chosen arbitrarily, this means that Pv − Qv is orthogonal to every vector of the form Pw, i.e., every vector in V. But Pv − Qv is itself in V because both Pv and Qv are in V. Thus Pv − Qv is orthogonal to itself, and this means that Pv − Qv = 0, i.e., Pv = Qv. Since v was also arbitrary, P = Q, which proves uniqueness.
For existence, let {b_1, . . . , b_k} be an orthonormal basis for V, and extend it to an orthonormal basis {b_1, . . . , b_n} for all of H. Let P be the operator whose matrix with respect to this basis satisfies
• [P]_ii = 1 for 1 ≤ i ≤ k,
• [P]_ij = 0 for all other pairs (i, j).
Thus P is given by a diagonal matrix where the first k diagonal entries are 1 and the rest are 0.
Clearly, P = P∗ and P2 = P, so P is a projector. Furthermore, P fixes each of the basis vectors
b1 , . . . , bk and so it fixes each vector in V. P annihilates all the other bk+1 , . . . , bn , and so Pv ∈ V
for all v ∈ H. Thus P projects orthogonally onto V.
Exercise 5.5 Let V be a subspace of H and let P be its corresponding projection operator. Show
that dim V = tr P. [Hint: Consider the matrix construction just above.]
Exercise 5.6 Suppose |ψi is a unit vector in H (i.e., hψ|ψi = 1). Show that |ψihψ| is a projection
operator. What subspace does it project onto? What is tr |ψihψ|?
Exercise 5.7 Find the 3 × 3 matrix for the projector P that projects orthogonally onto the two-
dimensional subspace of C3 spanned by v1 = (1, −1, 0) and v2 = (2, 0, i). P is the unique operator
satisfying: (i) P2 = P = P∗ , (ii) Pv1 = v1 , (iii) Pv2 = v2 , and (iv) tr P = 2. [Hint: If y1 and y2 are
orthogonal unit vectors, then y1 y∗1 + y2 y∗2 projects onto the subspace spanned by y1 and y2 . Use
Gram-Schmidt to find y1 and y2 given v1 and v2 . When you find P, check items (i)–(iv) above.]
Exercise 5.8 Let V and W be subspaces of H with corresponding projection operators P and Q,
respectively. Prove that V and W are mutually orthogonal if and only if PQ = 0. [Hint: For
the forward direction, consider kPQvk2 for any vector v ∈ H. For the reverse direction, consider
hPv, Qwi for any vectors v, w ∈ H, and move the P to the right-hand side of the bracket.]
If V is a subspace of H, we define the orthogonal complement of V (denoted V ⊥ ) to be
V ⊥ = {u ∈ H : (∀v ∈ V)[hu, vi = 0]}.
V ⊥ is clearly a subspace of H.
Exercise 5.9 Show that if V is a subspace of H with corresponding projection operator P, then I − P
is the projection operator corresponding to V ⊥ .
Exercise 5.10 Show that if V is a subspace of H, then H is the direct sum of V and V ⊥ (written
H = V ⊕ V ⊥ ), that is, every vector in H is the unique sum of a vector in V with a vector in V ⊥ .
[Hint: use the previous exercise and the fact that V ∩ V ⊥ = {0}.]
Definition 5.11 A complete set of orthogonal projectors, also called a decomposition of I, is a collection {P_i : i ∈ I} of nonzero projectors on H such that
1. P_i P_j = 0 for all i ≠ j, and
2. \sum_{i \in I} P_i = I.
Here, I is any finite set of distinct labels. We may have I = {1, . . . , k} for some k, but there are other
possibilities, including real numbers, or labels that are not numbers at all.
We will see later (Exercise 9.34) that condition (1) is actually redundant; it follows from condi-
tion (2).
Taking the trace of both sides of item (2), we get
\[
\sum_{i \in I} \operatorname{tr} P_i = \operatorname{tr} \sum_{i \in I} P_i = \operatorname{tr} I = n.
\]
Since each P_i ≠ 0, its trace is a positive integer (Exercise 5.5), so there can be at most n many
projection operators in any complete set, where n = dim H.
For each i ∈ I, let Vi be the subspace that Pi projects onto. By Exercise 5.8, the Vi are all
pairwise mutually orthogonal. Furthermore, the Vi together span all of H: for any v ∈ H,
\[
v = Iv = \sum_{i \in I} P_i v, \tag{14}
\]
but P_i v ∈ V_i for each i, so v is the sum of vectors from the V_i. Generalizing Exercise 5.10, one can show that H = \bigoplus_{i \in I} V_i is the direct sum of the V_i. That means that every v ∈ H is the sum of
unique vectors in the respective spaces Vi , and this sum is given by (14) above.
As a special case, if P projects onto a proper, nonzero subspace V of H, then {P, I − P} is a
complete set of projectors corresponding to the two subspaces V and V ⊥ .
Exercise 5.12 Let {Pi : i ∈ I} be a complete set of orthogonal projectors over H, and let v ∈ H be
any vector. Show by direct calculation that
\[
\|v\|^2 = \sum_{i \in I} \|P_i v\|^2 .
\]
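Here is a small NumPy illustration of a complete set of projectors (a numeric spot-check only; Exercise 5.12 asks for an algebraic proof):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4

# A complete set of projectors built from an orthonormal basis: split the
# basis vectors into groups and let each P_i project onto one group's span.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
groups = [[0], [1, 2], [3]]       # label set I = {0, 1, 2}
P = [sum(np.outer(Q[:, j], Q[:, j].conj()) for j in g) for g in groups]

# Condition (2): the projectors sum to the identity.
assert np.allclose(sum(P), np.eye(n))
# Condition (1): P_i P_j = 0 for i != j.
assert np.allclose(P[0] @ P[1], 0)

# The norm of any vector decomposes over the set: ||v||^2 = sum_i ||P_i v||^2.
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
assert np.isclose(np.linalg.norm(v)**2, sum(np.linalg.norm(Pi @ v)**2 for Pi in P))
```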
Exercise 5.13 Suppose P and Q are projection operators on H projecting onto subspaces V ⊆ H
and W ⊆ H, respectively. Show that if P and Q commute, that is, if PQ = QP, then PQ is a
projection operator projecting onto V ∩ W.
Fundamentals of Quantum Mechanics. We now know enough math to present the fundamental
principles of quantum mechanics. For now, I will abide by the Copenhagen interpretation of quan-
tum mechanics first put forward by Niels Bohr. This is the best-known interpretation and is easy
to work with, albeit somewhat unsatisfying philosophically. Another well-known interpretation
is the Everett interpretation, a.k.a. the many-worlds interpretation or the unitary interpretation.
More on that later. There are still other interpretations, but there are no conflicts between any of
these interpretations; they all use the same math and lead to the same predictions. The differences
are merely philosophical.
Physical Systems and States. A physical system is some part of nature, for example, the position
of an electron orbiting an atom, the electric field surrounding the earth, the speed of a train, etc.
The last two are “macroscopic,” dealing with big objects with lots of mass, momentum, and energy.
Although in principle quantum mechanics covers all these systems, it is most conveniently applied
to microscopic systems like the first.
The most basic principle of quantum mechanics relevant to us is that to every physical system
S there corresponds a Hilbert space H = HS , called the state space of S.1 At any given point in time,
the system is in some state, which for now we can define as a unit vector |ψi ∈ H. (We will revise
this definition later on, but our revision does not invalidate anything we discuss until then.) The
state of the system may change with time, depending on the forces (internal and external) applied
to the system. We may write the state of the system at time t as |ψ(t)i.
Time Evolution of an Isolated System. Let’s assume that our system S is isolated, i.e., it is not
interacting with any other systems. The state of S evolves in time, but this evolution is linear in the
following sense: For any two times t1 , t2 ∈ R, there is a linear operator U = U(t2 , t1 ) ∈ L(H) such
that if the system is in the state |ψ(t1)⟩ at time t1, then at time t2 the system will be in the state
\[
|\psi(t_2)\rangle = U\,|\psi(t_1)\rangle.
\]
The operator U only depends on the system (its internal forces) and on the times t1 and t2 , but not
on the particular state the system happens to be in. That is, the single operator U describes how
the system evolves from any state at t1 to the resulting state at t2 . Note that t1 and t2 are arbitrary;
t2 does not necessarily have to come after t1 .
Since U maps states to states, it must be norm-preserving. From this one can show that it must
preserve the scalar product. That is, U must be unitary. Here are some other basic, intuitive facts:
1. U(t, t) = I for any time t. (If no time elapses, then the state has no time to change.)
¹In general, H may be infinite dimensional. The systems we care about, however, are all bounded, which means they
correspond to finite dimensional spaces.
2. U(t1 , t2 ) = U(t2 , t1 )−1 = U(t2 , t1 )∗ for all times t1 , t2 . (Tracing the evolution of the system
backward in time should undo the changes made by running the system forward in time.)
3. U(t3 , t1 ) = U(t3 , t2 )U(t2 , t1 ) for all times t1 , t2 , t3 . (Running the system from t1 to t2 and then
from t2 to t3 has the same effect on the state as running the system from t1 to t3 . Recall that
operator composition reads from right to left.)
(Item (2) actually follows from items (1) and (3).) If the system S is known, then U(t2 , t1 ) can be
computed with arbitrary accuracy, at least in principle. In many simple cases, U(t2 , t1 ) is known
exactly, and can even be controlled precisely by manipulating the system S. Controlling U is crucial
to quantum computation. We’ll see specific examples a bit later.
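A toy illustration of these three facts, assuming NumPy. The rotation family below is just a convenient stand-in for a real evolution operator (a rotation matrix is real and orthogonal, hence unitary); it is not a physical Hamiltonian:

```python
import numpy as np

# Toy isolated system: take U(t2, t1) to be a rotation by angle (t2 - t1).
def U(t2, t1):
    a = t2 - t1
    return np.array([[np.cos(a), -np.sin(a)],
                     [np.sin(a),  np.cos(a)]])

t1, t2, t3 = 0.3, 1.1, 2.5

# 1. No elapsed time means no change: U(t, t) = I.
assert np.allclose(U(t1, t1), np.eye(2))
# 2. Running backward undoes running forward: U(t1, t2) = U(t2, t1)^(-1) = U(t2, t1)^*.
assert np.allclose(U(t1, t2), U(t2, t1).T)
# 3. Composition (right to left): U(t3, t1) = U(t3, t2) U(t2, t1).
assert np.allclose(U(t3, t1), U(t3, t2) @ U(t2, t1))
```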
Projective Measurement. Now and then, we’d like to get information about the state of our
system S. It turns out that quantum mechanics puts severe limitations on how much information
we can extract, and prevents us from extracting this information in a purely passive way.
The standard way of getting information about the state of a system is by making an observation,
also called a measurement. These are terms of art which unfortunately don’t bear much intuitive
resemblance to their every-day meanings. A typical (and very general) type of measurement is a
projective measurement.2 If H is the Hilbert space of system S, then a projective measurement on
S corresponds to a complete set {Pk : k ∈ I} of orthogonal projectors on H. The elements of I
are the possible outcomes of the measurement. If the system is in state |ψi when the measurement
is performed, then the measurement will produce exactly one of the possible outcomes randomly
such that each outcome k ∈ I is produced with probability
\[
\Pr[k] = \| P_k |\psi\rangle \|^2 = (P_k|\psi\rangle)^* \, P_k|\psi\rangle = \langle\psi| P_k P_k |\psi\rangle = \langle\psi| P_k |\psi\rangle . \tag{15}
\]
Furthermore, immediately after the measurement, the state of the system will be
\[
|\psi_k\rangle := \frac{P_k|\psi\rangle}{\| P_k|\psi\rangle \|}.
\]
• The outcome of the projective measurement is intrinsically random. You can prepare the
system S in the exact same state |ψi twice, perform the exact same projective measurement
²There are other, more “general” types of measurement, but these can actually be implemented using projective
measurements on larger systems, so these other measurements really aren’t more general than projective measurements.
both times, and get different outcomes. The only things that we can predict from our
experiments are the statistics of the outcomes. If we know the state |ψi of the system when
the measurement is performed, then in principle we can compute Pr[k] for each outcome k,
and then if we run the same experiment many times (say a million times), then we can expect
to see outcome k occur about a Pr[k] fraction of the time. This is indeed what happens.
• There can be only a finite, discrete set of possible outcomes associated with any projective measurement: no more than dim(H) of them (at least for bounded systems).
• The probabilities defined by (15) are certainly nonnegative, but we need to check that they
sum to 1. We have
\[
\sum_{k \in I} \Pr[k] = \sum_k \langle\psi|P_k|\psi\rangle = \langle\psi| \Big( \sum_k P_k \Big) |\psi\rangle = \langle\psi| I |\psi\rangle = \langle\psi|\psi\rangle = \| |\psi\rangle \|^2 = 1,
\]
as required.
• Suppose outcome k occurs, and then we immediately perform the same measurement again. The state at the second measurement is |ψ_k⟩ = P_k|ψ⟩/‖P_k|ψ⟩‖, so the probability of seeing outcome k the second time is
\[
\langle\psi_k| P_k |\psi_k\rangle = \frac{\langle\psi| P_k P_k P_k |\psi\rangle}{\| P_k|\psi\rangle \|^2} = \frac{\langle\psi| P_k |\psi\rangle}{\langle\psi| P_k |\psi\rangle} = 1.
\]
That is, we see the outcome k again with certainty, and the state immediately after the second
measurement is
\[
\frac{P_k |\psi_k\rangle}{\| P_k |\psi_k\rangle \|} = \frac{|\psi_k\rangle}{\| |\psi_k\rangle \|} = |\psi_k\rangle,
\]
unchanged from after the first measurement. So the first measurement changes the state to
be consistent with whatever the outcome is, so that repetitions of the same measurement will
always yield the same outcome (provided, of course, that the state does not evolve between
measurements).
• If |ψi is a state and θ ∈ R, then eiθ |ψi is also a state. The unit norm scalar eiθ is known as a
“phase factor.” Note that
1. if U is unitary, then obviously Ueiθ |ψi = eiθ U|ψi, and
2. for the projective measurement {Pk }k∈I above, the probability of seeing k when the
system is in state eiθ |ψi is
\[
\| P_k e^{i\theta}|\psi\rangle \|^2 = |e^{i\theta}|^2 \, \| P_k|\psi\rangle \|^2 = \| P_k|\psi\rangle \|^2 ,
\]
that is, the same for the state |ψi, and finally,
3. if outcome k occurs, then the state after the measurement is
\[
\frac{P_k e^{i\theta}|\psi\rangle}{\| P_k e^{i\theta}|\psi\rangle \|} = e^{i\theta} \, \frac{P_k|\psi\rangle}{\| P_k|\psi\rangle \|},
\]
the same post-measurement state as for |ψ⟩, up to the same phase factor.
This means that the phase factor just “goes along for the ride” and does not affect the statistics of any projective measurement (or any other type of measurement, either). The states |ψ⟩ and e^{iθ}|ψ⟩ are physically indistinguishable, and so we can choose overall phase factors arbitrarily in defining a state, or we are free to ignore them as we wish. More on this later.
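The measurement rule (15) is easy to simulate. A sketch assuming NumPy, for a single qubit measured in the basis {|0⟩, |1⟩}; the particular state, phase, and seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(4)

# Projective measurement of a qubit: P_0 = |0><0|, P_1 = |1><1|
# form a complete set of projectors.
P = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

# A sample state |psi> = alpha|0> + beta|1> (unit norm).
psi = np.array([0.6, 0.8j])
probs = [np.vdot(psi, Pk @ psi).real for Pk in P]   # Pr[k] = <psi|P_k|psi>
assert np.isclose(sum(probs), 1)

# Sampling outcomes reproduces the statistics Pr[0] = 0.36, Pr[1] = 0.64.
outcomes = rng.choice(2, size=100_000, p=probs)
assert abs(outcomes.mean() - probs[1]) < 0.01

# An overall phase factor e^{i theta} does not change the statistics.
phased = np.exp(1j * 0.7) * psi
probs2 = [np.vdot(phased, Pk @ phased).real for Pk in P]
assert np.allclose(probs, probs2)
```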
We’ll now see how this all plays out for a two-dimensional system.
A Perfect Example: Electron Spin. Rotating objects possess angular momentum. The angular
momentum of an object is a vector in R3 that depends on the distribution of mass in the object
and how the object is rotating. For any given object, the length of its angular momentum vector is
proportional to the speed of the rotation (in revolutions per minute, say), and the vector’s direction
is pointing (roughly) along the axis of rotation in the direction given by the “right hand rule”: a
disk rotating counterclockwise in the x, y-plane has its angular momentum vector pointing in the
positive z-direction. A Frisbee thrown by a right-handed person (using the usual backhand flip)
rotates clockwise when viewed from above, so its angular momentum vector points down toward
the ground.
If a rotating object carries a net electric charge, then it has a magnetic moment vector that is proportional to the angular momentum times the net charge. Shooting an object with a magnetic moment through a nonuniform magnetic field imparts a force to the object, causing it to deflect and change direction. The deflection force is along the axis given by the gradient of the magnetic field and is proportional to the component of the magnetic moment along that gradient axis. You can thus measure the component of the magnetic moment along the gradient axis by seeing the amount of deflection.
Electrons deflect when shot through a nonuniform magnetic field as well, so they possess
magnetic moment. This can only mean that they have angular momentum as well, even though,
being elementary particles, they have no constituent parts that can rotate around one another. This
is just one of the many bizarre aspects of the microscopic world.
In the Stern-Gerlach experiment, randomly oriented electrons are shot through a nonuniform magnetic field whose gradient is oriented in the +z-direction (vertically). According to classical
physics, we would expect the electrons to deflect by random amounts, causing a smooth up-down
spread in the beam. Instead, what we actually observe is the beam split into two sharp beams of
roughly equal intensity: one going up, the other going down (see Figure 1). So each electron only
goes up the same amount or down the same amount. This experiment amounts to a projective
measurement of the spin of an electron, at least in the z-direction. There are two possible outcomes:
spin-up and spin-down. It is natural then to model the physical system of electron spin as a two-dimensional Hilbert space, with an orthonormal basis {|↑⟩, |↓⟩}, where
\[
|\uparrow\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}
\]
is the spin-up state and
\[
|\downarrow\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}
\]
is the spin-down state. We may also write |↑⟩ and |↓⟩ as |+z⟩ and |−z⟩,
Figure 1: Stern-Gerlach experiment: The electron beam comes in from the left, passes through a
nonuniform field between the two probes, and splits into two beams. The field gradient is oriented
along the axis of the probes, which is here given by the +z-direction.
respectively, to make clear along what axis the spin is aligned. The projectors in the projective
measurement are then
\[
P_\uparrow = P_{+z} = |\uparrow\rangle\langle\uparrow| = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},
\]
which projects onto the space spanned by |↑⟩ and corresponds to the spin-up outcome, and
\[
P_\downarrow = P_{-z} = |\downarrow\rangle\langle\downarrow| = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix},
\]
which projects onto the space spanned by |↓i and corresponds to the spin-down outcome. As we’ll
see in a little bit, a two-dimensional Hilbert space actually suffices for modeling electron spin.
7 Week 4: Qubits
Qubits. In digital information processing, the basic unit of information is the bit, short for binary
digit. Each bit has two distinct states that we care about: 0 and 1. In quantum information
processing, we use bits as well, but we regard them as quantum systems that have two states |0i
and |1i that form an orthonormal basis for a two-dimensional Hilbert space. Such systems are
called quantum bits, or qubits for short. Any two-dimensional Hilbert space will do to model a qubit.
This is why it is useful to consider the electron spin example. In fact, electron spin is one proposed
way to implement a qubit: |↑i is identified with |0i and |↓i with |1i.3 We’ll return to the electron
spin example, but what we say applies generally to any system with a two-dimensional Hilbert
space (sometimes called a “two-level system”), which can then in principle be used to implement
a qubit. To emphasize this point, we’ll use |0i and |1i to stand for |↑i and |↓i, respectively, and we’ll
let the projectors P0 and P1 stand for P↑ and P↓ , respectively.
Back to Electron Spin. Using the Stern-Gerlach apparatus oriented in a particular direction, we
can prepare electrons to have spins in that direction. We simply retain one emerging beam and
discard the other. Figure 2 shows electrons being prepared to spin in one direction in the x, z-plane,
then measured in the +z-direction.
The general state of an electron spin is
\[
|\psi\rangle = \alpha|0\rangle + \beta|1\rangle
\]
for some α, β ∈ C.
That is, it is some linear combination of spin-up and spin-down. We would now like to determine
which linear combinations correspond to which spin directions (in 3-space). Since |ψi is a unit
vector, we have
\begin{align*}
1 = \langle\psi|\psi\rangle &= (\alpha^*\langle 0| + \beta^*\langle 1|)(\alpha|0\rangle + \beta|1\rangle) \\
&= \alpha^*\alpha\,\langle 0|0\rangle + \alpha^*\beta\,\langle 0|1\rangle + \beta^*\alpha\,\langle 1|0\rangle + \beta^*\beta\,\langle 1|1\rangle \\
&= |\alpha|^2 + |\beta|^2 .
\end{align*}
If we measure the spin in the z-direction, the probability of seeing |0⟩ (spin-up) is ⟨ψ|P_0|ψ⟩ = |α|². And similarly, the probability of seeing |1⟩ (spin-down) is |β|². Since phase factors don’t matter,
we can assume from now on that α ∈ R and α ≥ 0, because we can multiply |ψ⟩ by the right phase
factor, namely e−i arg(α) .
Now consider the state |↑θ i = α|0i + β|1i prepared by the apparatus on the left of Figure 2,
corresponding to a spin pointing at angle θ from the +z-axis in the +x direction (Cartesian co-
ordinates (sin θ, 0, cos θ), which has unit length). Here 0 6 θ 6 π. When it passes through the
vertical apparatus on the right, the beam splits into two beams whose intensities are proportional
3
Another system with a two-dimensional Hilbert space is photon polarization, where we can take as our basis the
state |↔i (horizontal polarization) and the state |li (vertical polarization). All other polarization states (e.g., slanted or
circular) are linear combinations of these two.
Figure 2: Electrons are prepared by the tilted apparatus on the left to spin at an angle θ from the
+z-axis. These are then fed into a vertical apparatus.
to their probabilities. According to classical mechanics, the average deflection is proportional to
the vertical component of the spin vector, i.e., cos θ. If quantum mechanics is to agree with classical
mechanics in the macroscopic limit, then the average deflection of the two beams must also be
cos θ. The deflection of the spin-up beam is +1, and the deflection of the spin-down beam is −1,
so the average deflection is
\[
\alpha^2\cdot(+1) + |\beta|^2\cdot(-1) = \alpha^2 - (1 - \alpha^2) = 2\alpha^2 - 1.
\]
This must be cos θ, so solving for α in terms of θ and remembering that α ≥ 0, we get
\[
\alpha = \sqrt{\frac{1 + \cos\theta}{2}} = \cos\frac{\theta}{2}.
\]
Then |β|² = 1 − α² = sin²(θ/2), and so
\[
\beta = e^{i\varphi} \sin\frac{\theta}{2}
\]
for some real ϕ with 0 ≤ ϕ < 2π. In experiments, these relative intensities are actually observed.
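A quick numeric check of these intensities, assuming NumPy (theta and phi are arbitrary sample values):

```python
import numpy as np

# The state prepared at angle theta from the +z axis (phase phi arbitrary).
theta, phi = 1.0, 0.4
psi = np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

P_up = np.diag([1.0, 0.0])        # P_0 = |0><0|
P_down = np.diag([0.0, 1.0])      # P_1 = |1><1|

pr_up = np.vdot(psi, P_up @ psi).real
pr_down = np.vdot(psi, P_down @ psi).real

# Beam intensities cos^2(theta/2) and sin^2(theta/2) ...
assert np.isclose(pr_up, np.cos(theta / 2) ** 2)
assert np.isclose(pr_down, np.sin(theta / 2) ** 2)
# ... with average deflection (+1) pr_up + (-1) pr_down = cos(theta).
assert np.isclose(pr_up - pr_down, np.cos(theta))
```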
It is worth mentioning at this point that for any α ≥ 0 and β ∈ C such that α² + |β|² = 1, there are 0 ≤ θ ≤ π and 0 ≤ ϕ < 2π such that
\[
\alpha = \cos\frac{\theta}{2}, \qquad \beta = e^{i\varphi} \sin\frac{\theta}{2},
\]
giving the general spin state as
\[
|\psi\rangle = \cos\frac{\theta}{2}\,|0\rangle + e^{i\varphi}\sin\frac{\theta}{2}\,|1\rangle.
\]
Furthermore, θ and ϕ are uniquely determined by |ψi except when α = 0 or β = 0, in which case
θ = π or θ = 0, respectively, but ϕ is completely undetermined.
Now look at the case where θ = π/2, that is, the spin is pointing in the +x direction (to the
right). We get
\[
|{+x}\rangle = |\uparrow_{\pi/2}\rangle = \cos\frac{\pi}{4}\,|0\rangle + e^{i\varphi}\sin\frac{\pi}{4}\,|1\rangle = \frac{|0\rangle + e^{i\varphi}|1\rangle}{\sqrt{2}} .
\]
We are free to adjust the phase factor of |1i to absorb the eiϕ above. That is, without changing the
physics, we redefine⁴
\[
|1\rangle := e^{i\varphi}|1\rangle.
\]
By the phase adjustment we now get the “spin-right” state
\[
|{+x}\rangle = |\uparrow_{\pi/2}\rangle = |{\rightarrow}\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}} .
\]
⁴Mathematicians may not like doing this, but physicists and computer scientists aren’t bothered by it.
The corresponding one-dimensional projector is
\[
P_{+x} = P_{\rightarrow} = |{+x}\rangle\langle{+x}| = \frac{|0\rangle + |1\rangle}{\sqrt{2}} \cdot \frac{\langle 0| + \langle 1|}{\sqrt{2}} = \frac{1}{2}\big( |0\rangle\langle 0| + |0\rangle\langle 1| + |1\rangle\langle 0| + |1\rangle\langle 1| \big),
\]
which has matrix form
\[
\frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} .
\]
Now we consider the state |+yi representing spin in the +y direction. A +y spin has no
+z-component, so if |+yi is measured along the z-axis, we get Pr[↑] = Pr[↓] = 1/2, as with |+xi.
Thus,
\[
|{+y}\rangle = \frac{|0\rangle + e^{i\varphi}|1\rangle}{\sqrt{2}} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ e^{i\varphi} \end{pmatrix},
\]
for some 0 6 ϕ < 2π. If we now measure a +y spin in the +x direction, we should again get equal
probabilities of spin-left and spin-right, since the spin is perpendicular to x. Thus we should have
\[
\frac{1}{2} = \Pr[\rightarrow] = \langle{+y}| P_{\rightarrow} |{+y}\rangle
= \frac{1}{4}\begin{pmatrix} 1 & e^{-i\varphi} \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ e^{i\varphi} \end{pmatrix}
= \frac{1 + \cos\varphi}{2} .
\]
So cos ϕ = 0, and it follows that ϕ ∈ {π/2, 3π/2} and so eiϕ = ±i. It does not matter which value
we choose; the math and physics is equivalent either way. So we’ll arbitrarily set ϕ := π/2, whence
we get
\[
|{+y}\rangle = \frac{|0\rangle + i|1\rangle}{\sqrt{2}} .
\]
The corresponding projector is
\[
P_{+y} = |{+y}\rangle\langle{+y}| = \frac{|0\rangle + i|1\rangle}{\sqrt{2}} \cdot \frac{\langle 0| - i\langle 1|}{\sqrt{2}} = \frac{1}{2}\big( |0\rangle\langle 0| - i|0\rangle\langle 1| + i|1\rangle\langle 0| + |1\rangle\langle 1| \big),
\]
which has matrix form
\[
\frac{1}{2}\begin{pmatrix} 1 & -i \\ i & 1 \end{pmatrix} .
\]
Let’s review:
\begin{align}
|{+x}\rangle &= \frac{|0\rangle + |1\rangle}{\sqrt{2}} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \tag{18} \\
|{+y}\rangle &= \frac{|0\rangle + i|1\rangle}{\sqrt{2}} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix}, \tag{19} \\
|{+z}\rangle &= |0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}. \tag{20}
\end{align}
The corresponding projectors are
1 1 1 1
P+x = |+xih+x| = = (I + X),
2 1 1 2
1 1 −i 1
P+y = |+yih+y| = = (I + Y),
2 i 1 2
1 0 1
P+z = |+zih+z| = = (I + Z),
0 0 2
where

X = σx = σ1 = 2P+x − I = [0 1; 1 0],  (21)
Y = σy = σ2 = 2P+y − I = [0 −i; i 0],  (22)
Z = σz = σ3 = 2P+z − I = [1 0; 0 −1].  (23)
X, Y, and Z are known as the Pauli spin matrices. More on them later.
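The defining identities above are easy to check numerically. Here is a quick sketch in Python (NumPy assumed; this is an illustrative aside, not part of the derivation):

```python
# Numerical sanity check of the Pauli matrices and the projector
# identities P_{+x} = (I + X)/2, etc.
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Each Pauli matrix squares to the identity and is traceless.
for P in (X, Y, Z):
    assert np.allclose(P @ P, I2)
    assert abs(np.trace(P)) < 1e-12

# The spin-right state (|0> + |1>)/sqrt(2) and its projector (I + X)/2.
plus_x = np.array([1, 1], dtype=complex) / np.sqrt(2)
P_plus_x = np.outer(plus_x, plus_x.conj())
assert np.allclose(P_plus_x, (I2 + X) / 2)
```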
Now consider a general spin state, written in terms of θ and ϕ:

|ψ⟩ = |↑θ,ϕ⟩ = cos(θ/2)|0⟩ + e^{iϕ} sin(θ/2)|1⟩ = [cos(θ/2); e^{iϕ} sin(θ/2)].

(Recall that 0 ≤ θ ≤ π and 0 ≤ ϕ < 2π are arbitrary.) The direction of this spin is given by a vector
s = (xs, ys, zs) in 3-space with Cartesian coordinates xs, ys, zs ∈ R. How do we find xs, ys, zs? We
know that these values are the average deflections observed when the spin is measured in the +x,
+y, or +z axes, respectively. So generalizing Equation (17), we must have

xs = ⟨ψ|X|ψ⟩ = sin θ cos ϕ,  (24)
ys = ⟨ψ|Y|ψ⟩ = sin θ sin ϕ,  (25)
zs = ⟨ψ|Z|ψ⟩ = cos θ.  (26)

Thus s is exactly the point on the unit sphere whose spherical coordinates are (θ, ϕ).⁵
Exercise 7.1 Verify Equations (24–26) using matrix multiplication and trig.
Exercise 7.2 What is the spin direction corresponding to the state (√3|0⟩ − |1⟩)/2? Express your
answer as simply as possible.
Exercise 7.3 What spin state corresponds to the direction s = (−2/3, 2/3, 1/3)? Express your
answer as simply as possible.

Exercise 7.4 (Very useful!) Show that if |ψ⟩ is a general spin state corresponding to the direction
s = (xs, ys, zs) as described above, then

|ψ⟩⟨ψ| = (1/2)(I + xs X + ys Y + zs Z).  (27)

The right-hand side is sometimes written as (1/2)(I + s · σ), abusing the dot product notation.
⁵ Each vector s on the unit sphere can be described using spherical coordinates, i.e., two angles θ and ϕ, where
0 ≤ θ ≤ π is the angle between s and the +z axis (the “latitude” of s, measured down from the North Pole), and
0 ≤ ϕ < 2π is the angle one would have to swivel the x,z-plane counterclockwise around the +z axis until it hits s (the
“longitude” of s, measured east of Greenwich, i.e., east of the x,z-plane). If s has spherical coordinates (θ, ϕ), then its
Cartesian coordinates are (sin θ cos ϕ, sin θ sin ϕ, cos θ).
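The correspondence between a spin state and its direction can be checked numerically. The sketch below (Python, NumPy assumed; the test angles are arbitrary) verifies that the averages ⟨ψ|X|ψ⟩, ⟨ψ|Y|ψ⟩, ⟨ψ|Z|ψ⟩ are the Cartesian coordinates (sin θ cos ϕ, sin θ sin ϕ, cos θ):

```python
# For |psi> = cos(t/2)|0> + e^{i*phi} sin(t/2)|1>, the three Pauli
# expectation values should be the Cartesian coordinates of s.
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

theta, phi = 1.1, 2.3  # arbitrary test angles
psi = np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

s = np.array([np.real(psi.conj() @ A @ psi) for A in (X, Y, Z)])
expected = np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])
assert np.allclose(s, expected)
```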
8 Week 4: Density operators
Density Operators. One problem with using a vector |ψ⟩ to represent a physical state is that the
vector carries more information than is physically relevant, namely, an overall phase factor. The
physically relevant portion of |ψ⟩ is really just the one-dimensional subspace that it spans (which
does not depend on any phase factors), or equivalently, the projector |ψ⟩⟨ψ| that orthogonally
projects onto that subspace. For this and other reasons, one may define the state of a system
to be a one-dimensional projection operator ρ = |ψ⟩⟨ψ| instead of a vector |ψ⟩. This alternate
view of states is known as the density operator formalism, and ρ is known as a density operator or
density matrix. Besides the advantage of discarding the physically irrelevant phase information,
this formalism has other advantages that we will see later when we discuss quantum information
theory. For many of the tasks at hand, however, either formalism will suffice, and we will use both
as is convenient.
We need to describe the two basic physical processes that we have discussed—time evolution
and projective measurement—in terms of the density operator formalism.
Time evolution of an isolated system. In the original formalism, time evolution is described by
a unitary operator U such that any state |ψ⟩ evolves to a state U|ψ⟩ in the given interval of
time. In the new density operator formalism, the state ρ = |ψ⟩⟨ψ| would evolve under U to
the new state

ρ′ = UρU∗.  (28)

To see why this is so, we merely observe that the new state should be |ϕ⟩⟨ϕ|, where |ϕ⟩ = U|ψ⟩.
We get

ρ′ = |ϕ⟩⟨ϕ| = U|ψ⟩⟨ψ|U∗ = UρU∗.
Projective measurement. Suppose we are given a complete set {Pk : k ∈ I} of projectors corre-
sponding to a projective measurement. In the original formalism, if the system is in state
|ψ⟩ before the measurement, then the probability of outcome k is ⟨ψ|Pk|ψ⟩. Since this prob-
ability is physically relevant (we can collect statistics over many identical experiments), we
had better get the same probability in the new formalism: when the state of the system is
ρ = |ψ⟩⟨ψ| before the measurement, the probability of outcome k is given by

Pr[k] = tr(Pk ρ) = ⟨Pk, ρ⟩,  (29)

where the right-hand side refers to the Hilbert-Schmidt inner product on L(H) (see Equa-
tion (11)). To see that this is the same as in the original formulation, we can use the form
of the trace given by Equation (13), where we choose an orthonormal basis {e1, ..., en} such
that e1 = |ψ⟩. Letting |i⟩ = ei for all i as before (and so |ψ⟩ = |1⟩), we then get

tr(Pk ρ) = Σ_{i=1}^{n} ⟨i|Pk ρ|i⟩ = Σ_i ⟨i|Pk|ψ⟩⟨ψ|i⟩ = Σ_i ⟨i|Pk|1⟩⟨1|i⟩ = ⟨1|Pk|1⟩ = ⟨ψ|Pk|ψ⟩,
which is the same as originally defined. Alternatively, we can use the commuting property
of the trace (Equation (4)) to get the same thing:

tr(Pk ρ) = tr(Pk |ψ⟩⟨ψ|) = tr(⟨ψ|Pk|ψ⟩) = ⟨ψ|Pk|ψ⟩.

That last equation holds because ⟨ψ|Pk|ψ⟩ is just a scalar (a 1 × 1 matrix). Assuming the
outcome is k, the state after the measurement should be ρk = |ψk⟩⟨ψk|, where |ψk⟩ =
Pk|ψ⟩/‖Pk|ψ⟩‖. This simplifies:

ρk = Pk|ψ⟩⟨ψ|Pk / ⟨ψ|Pk|ψ⟩ = Pk ρ Pk / tr(Pk ρ).  (30)

Note that tr(Pk ρ) = tr(Pk² ρ) = tr(Pk ρ Pk), so the denominator in (30), i.e., the probability of
getting the outcome k, is the trace of the numerator. (Obviously, ρk is undefined if Pr[k] = 0,
but if that’s the case, we’d never see outcome k.)
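As a numerical sanity check of the measurement rules in the two formalisms, the following sketch (Python, NumPy assumed; the state and projector are arbitrary choices) compares tr(Pk ρ) with ⟨ψ|Pk|ψ⟩ and the two post-measurement states:

```python
# Check that tr(P_k rho) reproduces <psi|P_k|psi>, and that
# P_k rho P_k / tr(P_k rho) matches |psi_k><psi_k|.
import numpy as np

psi = np.array([0.6, 0.8j])              # a unit vector in C^2
rho = np.outer(psi, psi.conj())          # rho = |psi><psi|

P0 = np.diag([1.0, 0.0]).astype(complex) # projector onto |0>
pr_trace = np.trace(P0 @ rho).real
pr_inner = (psi.conj() @ P0 @ psi).real
assert np.isclose(pr_trace, pr_inner)    # Pr[k] agrees in both formalisms

rho_k = P0 @ rho @ P0 / pr_trace         # density-operator update
psi_k = P0 @ psi / np.linalg.norm(P0 @ psi)
assert np.allclose(rho_k, np.outer(psi_k, psi_k.conj()))
```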
Exercise 8.1 Show that if |ψ1⟩ and |ψ2⟩ are unit vectors, and ρ1 = |ψ1⟩⟨ψ1| and ρ2 = |ψ2⟩⟨ψ2|, then

⟨ρ1, ρ2⟩ = tr(ρ1 ρ2) = |⟨ψ1|ψ2⟩|².
Properties of the Pauli Operators. The operators X, Y, Z defined in (21–23) play a prominent
role in quantum mechanics and quantum informatics. Here we’ll present their most important
properties in one place for ease of reference. All of these facts are easy to verify, and we leave that
for the exercises.
1. X² = Y² = Z² = I.
2. X, Y, and Z are Hermitean and unitary.
3. XY = iZ = −YX, YZ = iX = −ZY, ZX = iY = −XZ.
4. tr X = tr Y = tr Z = 0.
Note that there is a cyclic symmetry among the Pauli matrices. If we simultaneously substitute
X 7→ Y, Y 7→ Z, and Z 7→ X everywhere in the equations above, we get the same equations. We
won’t pursue it here, but you can use the Pauli operators to represent the quaternions H.
The four 2 × 2 matrices I, X, Y, Z (also denoted σ0, σ1, σ2, σ3, respectively) form a basis for the
space L(C²) of all operators over C² (i.e., 2 × 2 matrices over C). That is, for any 2 × 2 matrix A,
there are unique coefficients a0, a1, a2, a3 ∈ C such that

A = a0 I + a1 X + a2 Y + a3 Z = Σ_{i=0}^{3} ai σi.  (31)

The coefficients can often be found by inspection, but there is a brute force method to find them:
using the fact that

⟨σi, σj⟩ = 2δij,  (32)

we get ai = (1/2)⟨σi, A⟩ = (1/2) tr(σi A) for each 0 ≤ i ≤ 3.
Exercise 8.3 Verify Equation (32).
Exercise 8.4 Show that if A = xX + yY + zZ for real numbers x, y, z such that x² + y² + z² = 1, then
A² = I. Thus A is both Hermitean and unitary.
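The brute-force recipe for the Pauli coefficients can be tried out directly. A sketch (Python, NumPy assumed; the test matrix is an arbitrary choice):

```python
# Pauli-basis coefficients via a_i = (1/2) tr(sigma_i A), which follows
# from <sigma_i, sigma_j> = 2 delta_ij.
import numpy as np

sigma = [np.eye(2, dtype=complex),
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

A = np.array([[1 + 2j, 3], [4j, -5]], dtype=complex)  # arbitrary 2x2 matrix
a = [np.trace(s.conj().T @ A) / 2 for s in sigma]

# Reassemble A from its Pauli coefficients.
assert np.allclose(sum(ai * si for ai, si in zip(a, sigma)), A)
```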
Single-Qubit Unitary Operators. In this topic, we show that applying any unitary operator to
a one-qubit system amounts to a rigid rotation in R³, and conversely, any rigid rotation in R³
corresponds to a unitary operator. We’ve seen that a general one-qubit state can be written, up to
an overall phase factor, as

|ψ⟩ = |↑θ,ϕ⟩ = cos(θ/2)|0⟩ + e^{iϕ} sin(θ/2)|1⟩,

for some 0 ≤ θ ≤ π and some 0 ≤ ϕ < 2π, and that this state corresponds uniquely (and vice
versa) to the point s on the unit sphere in R³ with spherical coordinates (θ, ϕ) (and thus with
Cartesian coordinates (xs, ys, zs) = (sin θ cos ϕ, sin θ sin ϕ, cos θ)). (Think of s as the spin direction
of an electron, for example.) The unit sphere in question here is known as the Bloch sphere. We’ll
now show that the action of a unitary operator U on one-qubit states amounts to a rigid rotation
SU of the Bloch sphere.
It’s slightly more convenient to work in the density operator formalism, using Equation (27).
Given any one-qubit unitary operator U, we define the map SU from the Bloch sphere onto itself
as follows: For any point s = (xs, ys, zs) on the Bloch sphere (s is a vector in R³ of length 1), let

ρs = (1/2)(I + xs X + ys Y + zs Z)

be the corresponding one-qubit state, according to Equation (27). Then let

ρt = U ρs U∗

be the state obtained by evolving the system in state ρs by U. The state ρt can be written as

ρt = (1/2)(I + xt X + yt Y + zt Z),

for some unique t = (xt, yt, zt) on the Bloch sphere. We now define SU(s) to be this t.⁶
It is immediate from the definition that for unitaries U and V we have SUV = SU SV .
To show that SU rotates the sphere rigidly, we first show that SU preserves dot products of
vectors on the Bloch sphere, that is, SU (r) · SU (s) = r · s for any r and s on the Bloch sphere. This
implies that SU is a rigid map of the Bloch sphere onto itself, but it does not imply that SU is a
rotation, because SU might be orientation-reversing, e.g., a reflection. We’ll see that SU preserves
orientation (aka chirality, aka “handedness”), so that it must be a rotation.7
Let r = (r1, r2, r3) and s = (s1, s2, s3) be any two points on the Bloch sphere, with corresponding
states ρr = (1/2) Σ_{i=0}^{3} ri σi and ρs = (1/2) Σ_{j=0}^{3} sj σj as above, where we define r0 = s0 = 1. Recall
that the dot product of r and s is r · s = r1 s1 + r2 s2 + r3 s3. Let’s compute ⟨ρr, ρs⟩ using Equation (32):

⟨ρr, ρs⟩ = ⟨(1/2) Σ_{i=0}^{3} ri σi, (1/2) Σ_{j=0}^{3} sj σj⟩
= (1/4) Σ_{i,j} ri sj ⟨σi, σj⟩ = (1/4) Σ_{i,j} ri sj (2δij)
= (1/2) Σ_{i=0}^{3} ri si = (1/2)(1 + Σ_{i=1}^{3} ri si)
= (1 + r · s)/2,

so

r · s = 2⟨ρr, ρs⟩ − 1.  (33)
Since r and s were arbitrary, we should also have

SU(r) · SU(s) = 2⟨ρ_{SU(r)}, ρ_{SU(s)}⟩ − 1,

but now,

⟨ρ_{SU(r)}, ρ_{SU(s)}⟩ = ⟨U ρr U∗, U ρs U∗⟩ = tr(U ρr∗ U∗ U ρs U∗) = tr(ρr∗ ρs) = ⟨ρr, ρs⟩,

and hence SU(r) · SU(s) = 2⟨ρr, ρs⟩ − 1 = r · s, as desired.
Now is perhaps a good time to clear up some confusion that may arise about points on the
Bloch sphere. Letting |ψr⟩ and |ψs⟩ be such that ρr = |ψr⟩⟨ψr| and ρs = |ψs⟩⟨ψs|, then combining
Equation (33) with Exercise 8.1 above, we get

|⟨ψr|ψs⟩|² = ⟨ρr, ρs⟩ = (1 + r · s)/2.

Thus ⟨ψr|ψs⟩ = 0 iff r · s = −1. In other words, qubit states that are orthogonal in the Hilbert space
correspond to antipodal (opposite) points on the Bloch sphere. We kind of knew this already,
since the two possible outcomes of the Stern-Gerlach spin measurement (in any direction) are
opposite spins (e.g., |↑⟩ and |↓⟩), and must (as with any projective measurement) correspond to
orthogonal states.
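Orthogonality of antipodal states can also be seen numerically. A sketch (Python, NumPy assumed; the test angles are arbitrary):

```python
# States at antipodal Bloch points are orthogonal: the antipode of
# spherical coordinates (theta, phi) is (pi - theta, phi + pi).
import numpy as np

def ket(theta, phi):
    """|up_{theta,phi}> = cos(theta/2)|0> + e^{i*phi} sin(theta/2)|1>."""
    return np.array([np.cos(theta / 2),
                     np.exp(1j * phi) * np.sin(theta / 2)])

theta, phi = 0.7, 1.9
psi = ket(theta, phi)
psi_anti = ket(np.pi - theta, phi + np.pi)   # antipodal direction
assert abs(psi.conj() @ psi_anti) < 1e-12    # inner product vanishes
```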
Before showing that SU must preserve orientation, we’ll show that for any rigid rotation S of
the Bloch sphere, there is a unitary U such that S = SU. Geometrically, any rotation S of the unit
sphere can be decomposed into a sequence of three simple rotations:

1. a counterclockwise rotation Sz(ψ) about the +z axis through an angle ψ where 0 ≤ ψ < 2π,
2. followed by a counterclockwise rotation Sy(θ) about the +y axis through an angle θ where 0 ≤ θ ≤ π, and
3. another counterclockwise rotation Sz(ϕ), about the +z axis, this time through an angle ϕ
where 0 ≤ ϕ < 2π.
The last two rotations have the effect of moving the North Pole (i.e., the point (0, 0, 1)) to an
arbitrary point on the sphere (with spherical coordinates (θ, ϕ)) in a standard way. The only
remaining freedom left in choosing S is an initial rotation that fixes the North Pole, i.e., the first
rotation above. The three angles ϕ, θ, ψ are uniquely determined by S (except when θ = 0 or
θ = π), and are called the Euler angles of S.
So S = Sz(ϕ)Sy(θ)Sz(ψ), and so to implement S, we only need to find unitaries for rotations
around the +z and +y axes. For any angle ϕ, define

Rz(ϕ) = [e^{−iϕ/2} 0; 0 e^{iϕ/2}].  (34)

Rz(ϕ) is unitary, and for any state |↑θ,α⟩ we have

Rz(ϕ)|↑θ,α⟩ = e^{−iϕ/2}(cos(θ/2)|0⟩ + e^{i(α+ϕ)} sin(θ/2)|1⟩) ∝ |↑θ,α+ϕ⟩,

and so if U = Rz(ϕ), then SU = Sz(ϕ). (Here and elsewhere, we use the expression A ∝ B to mean
that A and B may differ only by an overall phase factor, i.e., there exists an angle ω ∈ R such that
A = e^{iω}B.) For any angle θ, define

Ry(θ) = [cos(θ/2) −sin(θ/2); sin(θ/2) cos(θ/2)].  (35)
Ry (θ) is unitary, and it is straightforward to show that if U = Ry (θ), then SU = Sy (θ). Thus any
rotation S can be realized as SU for some unitary U. Later, we will see a direct way of translating
between a 1-qubit unitary U and its corresponding rotation SU. For completeness, we define a
unitary corresponding to rotation of ϕ counterclockwise about the x-axis:

Rx(ϕ) = Ry(π/2) Rz(ϕ) Ry(−π/2) = [cos(ϕ/2) −i sin(ϕ/2); −i sin(ϕ/2) cos(ϕ/2)].  (36)
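The rotation unitaries (34)-(36) are easy to experiment with numerically. A sketch (Python, NumPy assumed; the angles are arbitrary) that checks unitarity of an Euler-angle product and the identity in Equation (36):

```python
# The rotation unitaries Rz, Ry, Rx and the Euler composition
# Rz(phi) Ry(theta) Rz(psi).
import numpy as np

def Rz(phi):
    return np.diag([np.exp(-1j * phi / 2), np.exp(1j * phi / 2)])

def Ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def Rx(phi):
    c, s = np.cos(phi / 2), np.sin(phi / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

phi, theta, psi = 0.4, 1.2, 2.5
U = Rz(phi) @ Ry(theta) @ Rz(psi)        # Euler-angle composition
assert np.allclose(U.conj().T @ U, np.eye(2))               # unitary
assert np.allclose(Rx(phi), Ry(np.pi / 2) @ Rz(phi) @ Ry(-np.pi / 2))
```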
Now to show that SU must preserve orientation, we show that the orientation-reversing map
M that maps each point (x, y, z) on the Bloch sphere to its antipodal point (−x, −y, −z) is not of
the form SU for any unitary U. This suffices, because if S is any orientation-reversing rigid map
of the Bloch sphere, then S⁻¹ is also rigid and orientation-reversing, which means that the map
T = MS⁻¹ is orientation-preserving and hence a rotation. Therefore, T = SV for some unitary V,
as we just showed. But then M = TS, and so if we assume that S = SW for some unitary W, we
then have M = SV SW = S_{VW}, a contradiction.
Suppose M = SU for some unitary U. Then, since U must reverse the directions of all spins,
we must have, for example, U|0⟩ = U|+z⟩ ∝ |−z⟩ = |1⟩ and U|1⟩ = U|−z⟩ ∝ |+z⟩ = |0⟩. Expressing
U as a matrix in the {|+z⟩, |−z⟩} basis, we must then have

U = [0 e^{iσ}; e^{iτ} 0]

for some σ, τ ∈ R. Now let ω := (τ − σ)/2 and consider the state

|ψ⟩ = (|0⟩ + e^{iω}|1⟩)/√2,

which corresponds to some point p on the equator of the Bloch sphere. We have

U|ψ⟩ = (e^{iσ}e^{iω}|0⟩ + e^{iτ}|1⟩)/√2 = e^{i(σ+ω)}(|0⟩ + e^{iω}|1⟩)/√2 ∝ |ψ⟩.

So U does not change the state |ψ⟩ more than by a phase factor, and thus SU leaves the point p
fixed, which means that SU ≠ M, a contradiction.⁸
Example. X, Y, and Z are unitary, so what are SX , SY , and SZ ? It’s easy to check that X = iRx (π),
Y = iRy (π), and Z = iRz (π), and so (since phase factors don’t matter) SX , SY , and SZ are rotations
about the x-, y-, and z-axes, respectively, through the angle π (half a revolution).
Exercise 8.5 Prove the claim that if U = Ry(θ), then SU = Sy(θ). [Hint: First check that
Ry(θ)|+y⟩ ∝ |+y⟩, and thus SU fixes the point (0, 1, 0) where the Bloch sphere intersects the +y-
axis. Then check that Ry(θ)|+z⟩ = Ry(θ)|0⟩ ∝ cos(θ/2)|0⟩ + sin(θ/2)|1⟩ = |↑θ,0⟩, so SU moves the
point (0, 0, 1) to the point (sin θ, 0, cos θ). Finally, check that Ry(θ)|+x⟩ = Ry(θ)|↑π/2,0⟩ ∝ cos(θ/2 +
π/4)|0⟩ + sin(θ/2 + π/4)|1⟩ = |↑θ+π/2,0⟩, so SU moves the point (1, 0, 0) to (cos θ, 0, −sin θ).]
⁸ This result has physical significance. It says that there is no single physical process that can reverse the spin of any
isolated electron.
A direct translation. Here we give (without proof) a direct way to pass between the 2 × 2 unitary
matrix U and the corresponding rotation SU (a 3 × 3 real matrix), and vice versa, using the Pauli
matrices. First we give some basic facts about rotations of Rⁿ in general and of R³ in particular
(some of which we’ve seen before). You can skip this list if you want.

1. An n × n matrix S with real entries gives a rigid (length- and angle-preserving) transformation
of Rⁿ iff

S^T S = I,  (37)

or equivalently, S S^T = I. (Here S^T denotes the transpose of S, and I denotes the n × n identity
matrix.) In this case, we also say that S is an orthogonal matrix. Notice that, since S is real,
S^T = S∗, and so S is orthogonal iff S is unitary.
2. Any orthogonal matrix has determinant ±1. If the determinant is +1, then the transformation
is orientation-preserving (i.e., a rotation); otherwise it is orientation-reversing.
3. Let S be a real n × n matrix. If S is a rotation of Rⁿ, then S = S^c ≠ 0. (Here, S^c denotes the
cofactor matrix of S.) The converse also holds if n > 2.
4. If n is odd, then every rotation S of Rⁿ has 1 as an eigenvalue. That is, S fixes some nonzero
vector n̂ ∈ Rⁿ. Thus if n = 3, then S moves points around some fixed axis (through the
vector n̂ ∈ R³) counterclockwise through some angle ψ ∈ [0, π]—when viewing the origin
from n̂. (If you want π < ψ < 2π, then this is just the same as a counterclockwise rotation
around −n̂ through angle 2π − ψ, which is in the interval [0, π].)
5. If S is a rotation of R³, then −1 ≤ tr S ≤ 3, and the angle of rotation (described above) is
given by

ψ = cos⁻¹((tr S − 1)/2).  (38)

In particular, tr S = 3 just when ψ = 0, that is, when S = I. Also, tr S = −1 just when ψ = π.
In either of these two special cases, S² = I, which implies that S is a symmetric matrix,
because S = S⁻¹ = S^T. If 0 < ψ < π (the general case), then S is not symmetric.
6. If S is as above and the angle of rotation ψ satisfies 0 < ψ < π, then the vector n̂ is unique
up to multiplication by a positive scalar. It can be chosen to be

n̂ = ([S]32 − [S]23, [S]13 − [S]31, [S]21 − [S]12).  (39)

The norm of this particular vector is ‖n̂‖ = √((1 + tr S)(3 − tr S)) = 2 sin ψ.
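Equations (38) and (39) can be exercised on a concrete rotation. A sketch (Python, NumPy assumed; the example is a rotation about +z through an arbitrary angle):

```python
# Extract the rotation angle (38) and axis (39) from a 3x3 rotation
# matrix; here a rotation about +z through 0.9 rad.
import numpy as np

a = 0.9
S = np.array([[np.cos(a), -np.sin(a), 0.0],
              [np.sin(a),  np.cos(a), 0.0],
              [0.0, 0.0, 1.0]])

psi = np.arccos((np.trace(S) - 1) / 2)          # Equation (38)
n = np.array([S[2, 1] - S[1, 2],                # Equation (39),
              S[0, 2] - S[2, 0],                # written 0-indexed
              S[1, 0] - S[0, 1]])
assert np.isclose(psi, a)
assert np.allclose(n / np.linalg.norm(n), [0.0, 0.0, 1.0])
assert np.isclose(np.linalg.norm(n), 2 * np.sin(psi))
```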
From unitaries to rotations. First, given a 2 × 2 unitary U, the corresponding rotation is given
by the matrix

SU = (1/2) [ ⟨X, UXU∗⟩  ⟨X, UYU∗⟩  ⟨X, UZU∗⟩
             ⟨Y, UXU∗⟩  ⟨Y, UYU∗⟩  ⟨Y, UZU∗⟩
             ⟨Z, UXU∗⟩  ⟨Z, UYU∗⟩  ⟨Z, UZU∗⟩ ]
   = (1/2) [ ⟨XU, UX⟩  ⟨XU, UY⟩  ⟨XU, UZ⟩
             ⟨YU, UX⟩  ⟨YU, UY⟩  ⟨YU, UZ⟩
             ⟨ZU, UX⟩  ⟨ZU, UY⟩  ⟨ZU, UZ⟩ ],

recalling that ⟨A, B⟩ = tr(A∗B) is the Hilbert-Schmidt inner product on L(C²). That is, for all
i, j ∈ {1, 2, 3},

[SU]ij = (1/2)⟨σi, U σj U∗⟩ = (1/2) tr(σi U σj U∗).
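The entrywise formula is convenient to compute with. A sketch (Python, NumPy assumed; the example unitary is an arbitrary Rz-type matrix):

```python
# Compute [S_U]_{ij} = (1/2) tr(sigma_i U sigma_j U*) and confirm that
# S_U is a rotation (orthogonal with determinant +1).
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = (X, Y, Z)

U = np.array([[np.exp(-0.35j), 0], [0, np.exp(0.35j)]])  # an Rz-type unitary

S = np.array([[0.5 * np.trace(si @ U @ sj @ U.conj().T).real
               for sj in paulis] for si in paulis])
assert np.allclose(S.T @ S, np.eye(3))      # orthogonal
assert np.isclose(np.linalg.det(S), 1.0)    # orientation-preserving
```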
Exercise 8.6 Find the 3 × 3 matrix SU where

U = (1/5)[3 −4i; −4i 3].

Also find n̂ (exact expression) and ψ (decimal approximation to three significant digits).
From rotations to unitaries. Now given some rotation S of R³ (S is a 3 × 3 real matrix), we find
a U such that S = SU. There are two cases:

• If tr S ≠ −1, then

U ∝ (1/(2√(1 + tr S))) [(1 + tr S)I + i(([S]23 − [S]32)X + ([S]31 − [S]13)Y + ([S]12 − [S]21)Z)]
  = (cos(ψ/2))I − i(n̂ · σ)/(4 cos(ψ/2)) = (cos(ψ/2))I − i sin(ψ/2)(m̂ · σ) = e^{−iψ(m̂·σ)/2},

where ψ and n̂ are given by Equations (38) and (39), respectively, and m̂ := n̂/‖n̂‖ is the
normalized version of n̂. Here we use the fact that √(1 + tr S) = 2 cos(ψ/2). I’ll explain the
last equation in the chain more fully next time. These expressions give the unique U with
positive trace such that SU = S (and in addition, det U = 1).
• If tr S = −1, then any one of the following three alternatives is a characterization of all U
such that S = SU, provided it is well-defined:

U ∝ (1 + [S]11)X + [S]21 Y + [S]31 Z,
U ∝ [S]12 X + (1 + [S]22)Y + [S]32 Z,
U ∝ [S]13 X + [S]23 Y + (1 + [S]33)Z.

The i-th expression above (for i ∈ {1, 2, 3}) is well-defined iff [S]ii ≠ −1. This is true for at
least one of the three for any rotation S of R³ with trace −1.
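The tr S ≠ −1 formula can be tested by a round trip: build U from S, then recover S via [SU]ij = (1/2) tr(σi U σj U∗). A sketch (Python, NumPy assumed; the example rotation is an arbitrary choice with tr S ≠ −1):

```python
# Reconstruct a unitary U from a rotation S (generic case tr S != -1)
# and verify the round trip S_U = S.
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

a = 0.7                                  # rotation about +z; tr S = 1 + 2 cos a
S = np.array([[np.cos(a), -np.sin(a), 0],
              [np.sin(a),  np.cos(a), 0],
              [0, 0, 1.0]])

t = np.trace(S)
U = ((1 + t) * I2 + 1j * ((S[1, 2] - S[2, 1]) * X +
                          (S[2, 0] - S[0, 2]) * Y +
                          (S[0, 1] - S[1, 0]) * Z)) / (2 * np.sqrt(1 + t))
assert np.allclose(U.conj().T @ U, I2)   # U is unitary

paulis = (X, Y, Z)
S_back = np.array([[0.5 * np.trace(si @ U @ sj @ U.conj().T).real
                    for sj in paulis] for si in paulis])
assert np.allclose(S_back, S)            # round trip recovers S
```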
9 Week 5: Linear Algebra: Exponential Map, Spectral Theorem, etc.
The Exponential Map (Again). Equation (2) defines e^z via a power series for all scalars z. We
can use the same power series to extend the definition to operators:

e^A = Σ_{k=0}^{∞} A^k / k!.  (40)

(A⁰ = I by convention.) If A is an operator or matrix, then so is e^A. The sum in (40) converges
absolutely⁹ for all A.
The exponential map has many useful properties. Here’s one of the most useful, which generalizes
the familiar rule that e^{z1+z2} = e^{z1} e^{z2} for scalars z1, z2.

Theorem. If A and B are operators (or square matrices) that commute with each other (AB = BA), then

e^{A+B} = e^A e^B.
Proof. This closely mirrors the standard proof for scalars. We manipulate the power series directly.
Since A commutes with B, we can expand and rearrange factors in the expression (A + B)^k to arrive
at an operator version of the Binomial Theorem:

(A + B)^k = Σ_{j=0}^{k} (k!/(j!(k − j)!)) A^j B^{k−j}.

Therefore,

e^{A+B} = Σ_k (A + B)^k / k! = Σ_k Σ_{j=0}^{k} A^j B^{k−j} / (j!(k − j)!)
= Σ_k Σ_{j,ℓ≥0, j+ℓ=k} A^j B^ℓ / (j! ℓ!)   (setting ℓ := k − j)
= Σ_{j=0}^{∞} Σ_{ℓ=0}^{∞} A^j B^ℓ / (j! ℓ!) = (Σ_{j=0}^{∞} A^j / j!)(Σ_{ℓ=0}^{∞} B^ℓ / ℓ!) = e^A e^B.  □

We’ll leave the other properties of e^A as exercises.
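The commuting-exponentials rule can be checked numerically with a truncated power series for e^A. A sketch (Python, NumPy assumed; the 40-term cutoff is an arbitrary choice that is ample for these small matrices):

```python
# Verify e^{A+B} = e^A e^B for commuting matrices, and see it fail for
# a noncommuting pair, using a truncated power series.
import numpy as np

def expm_series(A, terms=40):
    """Partial sum of sum_k A^k / k! (fine for small-norm matrices)."""
    result = np.eye(A.shape[0], dtype=complex)
    term = np.eye(A.shape[0], dtype=complex)
    for k in range(1, terms):
        term = term @ A / k
        result = result + term
    return result

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.3, 0.0], [0.0, 0.3]])   # a scalar matrix commutes with A
assert np.allclose(A @ B, B @ A)
assert np.allclose(expm_series(A + B), expm_series(A) @ expm_series(B))

C = np.array([[0.0, 0.0], [1.0, 0.0]])   # does NOT commute with A
assert not np.allclose(expm_series(A + C), expm_series(A) @ expm_series(C))
```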
⁹ We won’t delve deeply into what it means for an infinite sequence of operators A1, A2, ... to converge (absolutely
or otherwise). One easy way to express the notion of convergence (among several equivalent ways) is to say that there
exists an operator A such that for all vectors v, the sequence of vectors A1 v, A2 v, ... converges to Av. The operator A,
if it exists, must be unique, and we write A = lim_{n→∞} An. Convergence of an infinite series of operators is equivalent
to the convergence of the sequence of partial sums, as usual. Absolute convergence, which we don’t bother to define
here, implies that you can regroup and rearrange terms in the sum freely without worry.
Exercise 9.3 Verify the following for any operators or square matrices A and B and any θ ∈ R:

1. (e^A)∗ = e^{A∗}.
2. If B is invertible, then e^{BAB⁻¹} = B e^A B⁻¹.
3. If A² = I, then e^{iθA} = (cos θ)I + i(sin θ)A.
Exercise 9.4 (Challenging) Let n̂ = (x, y, z) ∈ R³ be such that x² + y² + z² = 1, and let A = xX + yY + zZ.
Then A² = I by Exercise 8.4. For angle ω ∈ R, define
Rn̂ (ω) = e−iωA/2 = (cos(ω/2))I − i(sin(ω/2))A.
Show that if U = Rn̂ (ω), then SU is a rotation of the Bloch sphere about the axis through n̂
counterclockwise through angle ω. [Hint: Observe that rotating around n̂ through angle ω is
equivalent to
1. rotating the sphere so that n̂ coincides with (0, 0, 1) on the +z-axis, then
2. rotating around the +z-axis counterclockwise through angle ω, then
3. undoing the rotation in item 1 above, which moves (0, 0, 1) back to n̂.
(Let (θ, ϕ) be the spherical coordinates of n̂. To achieve the first rotation, first rotate around +z
through angle −ϕ to bring n̂ into the x, z-plane, then rotate around +y through angle −θ.) Now
verify via direct matrix multiplication that
Rn̂ (ω) = Rz (ϕ)Ry (θ)Rz (ω)Ry (−θ)Rz (−ϕ).
This decomposition is known as the S3 parameterization of Rn̂ (ω).]
Upper Triangular Matrices and Schur Bases. In the next topic, we’ll be talking about basis-
independent properties of operators, but we will occasionally need to introduce an orthonormal
basis so that we can talk about matrices, and although all such bases are equivalent, some are more
convenient than others. If A ∈ L(H) is an operator, a Schur basis for A is an orthonormal basis
with respect to which A is represented by an upper triangular matrix, i.e., an n × n matrix M whose
entries below its diagonal are all zero: [M]ij = 0 if i > j. Upper triangular matrices have many
nice properties, so we’ll sometimes choose a Schur basis when it is convenient. Particularly in this
section, we will derive some facts about operators using a Schur basis. Theorem B.5 in Section B.2
shows that we can always choose a Schur basis:
Theorem 9.5 (Theorem B.5 in Section B.2) Every n × n matrix is unitarily conjugate to an upper
triangular matrix. That is, for every n × n matrix M, there is an upper triangular T and unitary U (both
n × n matrices) such that M = UT U∗ .
Thus a Schur basis always exists for any linear operator. The proof of Theorem B.5 uses the fact
that every operator has an eigenvalue, which we’ll discuss in the next topic.
One key property of an upper triangular matrix is that its determinant is just the product of its
diagonal entries: if T is upper triangular, then

det T = Π_{i=1}^{n} [T]ii.  (41)
Exercise 9.6 Show that if A and B are both upper triangular matrices, then so is AB, and for each
1 ≤ i ≤ n, we have [AB]ii = [A]ii [B]ii, that is, the diagonal entries just multiply individually.

Exercise 9.7 Show that if A is a nonsingular, upper triangular matrix, then A⁻¹ is upper triangular.
What are the diagonal entries of A⁻¹ in terms of those of A?

Exercise 9.8 Show that if A is upper triangular, then so is e^A, and we have [e^A]ii = e^{[A]ii} for all
1 ≤ i ≤ n. [Hint: Use the results of Exercise 9.6 and Equation (40) defining e^A.]
Exercise 9.9 (One of my favorites.) Show that if A is any operator, then det e^A = e^{tr A}. [Hint: Pick
a Schur basis for A, then use the previous exercise and Equation (41).]
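Exercise 9.9 can be spot-checked numerically before proving it. A sketch (Python, NumPy assumed, with a truncated series for e^A; the test matrix is arbitrary and not normal):

```python
# Numerical check of det(e^A) = e^{tr A} for an arbitrary matrix.
import numpy as np

def expm_series(A, terms=60):
    """Partial sum of sum_k A^k / k! (ample terms for a small matrix)."""
    result = np.eye(A.shape[0], dtype=complex)
    term = np.eye(A.shape[0], dtype=complex)
    for k in range(1, terms):
        term = term @ A / k
        result = result + term
    return result

A = np.array([[0.2, 1.0 + 0.5j], [0.0, -0.7j]])
assert np.isclose(np.linalg.det(expm_series(A)), np.exp(np.trace(A)))
```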
Lower triangular matrices are defined analogously and have similar properties. A matrix is
diagonal if it is both upper and lower triangular, i.e., all its nondiagonal entries are zero.
If det(A − λI) = 0, then A − λI is singular, and so it maps some nonzero vector v to 0, and so we have
(A − λI)v = 0, or equivalently, Av = λv. Thus v is an eigenvector of A with eigenvalue λ. Thus the
eigenvalues of A are exactly those scalars λ such that det(A − λI) = 0.
Let’s write A − λI in matrix form with respect to some (any) orthonormal basis. Setting
aij = [A]ij, we get

A − λI = [ a11 − λ   a12       ···   a1n
           a21       a22 − λ   ···   a2n
           ⋮          ⋮         ⋱     ⋮
           an1       an2       ···   ann − λ ],

where n = dim H. Fixing all the aij to be constant and considering λ to be a variable, one can
show that det(A − λI) is a polynomial in λ with degree n. This is the characteristic polynomial of A,¹⁰
and we denote it charA(λ). From our considerations above, the eigenvalues of A are precisely the
roots of the polynomial charA. Since C is algebraically closed (see the second lecture), charA has n
roots, and so A has exactly n eigenvalues, not necessarily all distinct. The (multi)set of eigenvalues
of A is known as the spectrum of A.
Exercise 9.10 We know that charA is basis-independent because it is defined in terms of basis-
independent things. Show directly that

det(UAU∗ − λI) = det(A − λI)

for any unitary U.
The fact that A has at least one eigenvalue is a key ingredient in the proof that A has a Schur
basis (Theorem B.5 in Section B.2), as well as in the proof of the Spectral Theorem, below. So now
that we know that a Schur basis for A really exists, let’s assume that we chose a Schur basis for
A above, and so aij = 0 for all i > j, and hence A − λI is also upper triangular. So taking the
determinant, which is just the product of the diagonal entries, we get

charA(λ) = det(A − λI) = Π_{i=1}^{n} (aii − λ).  (42)
From (42) it is clear that the eigenvalues of A—the roots of charA —are exactly a11 , . . . , ann . This
is true because we chose a basis making the matrix representing A upper triangular, but from this
we get two useful, basis-independent facts: If λ1 , . . . , λn are the eigenvalues of A counted with
multiplicities, then
• tr A = Σ_{i=1}^{n} λi, and
• det A = Π_{i=1}^{n} λi.
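Both facts are easy to confirm numerically. A sketch (Python, NumPy assumed; the test matrix is arbitrary):

```python
# tr A = sum of eigenvalues and det A = product of eigenvalues,
# counted with multiplicity.
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.5, -1.0, 3.0],
              [0.0, 1.0, 4.0]])
lam = np.linalg.eigvals(A)
assert np.isclose(lam.sum(), np.trace(A))
assert np.isclose(lam.prod(), np.linalg.det(A))
```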
¹⁰ The characteristic polynomial of A is often defined instead as det(λI − A), which, among other things, guarantees
that its leading coefficient is 1.
that its leading coefficient is 1. The two definitions coincide for even n, but for odd n, one is the negation of the other.
But in any case, both polynomials have the same roots, which is the important thing.
Some of the coefficients of the polynomial charA are familiar. If we expand (42) and group
together powers of (−λ), we get

charA(λ) = (−λ)^n + (a11 + a22 + ··· + ann)(−λ)^{n−1} + ··· + a11 a22 ··· ann  (43)
         = (−λ)^n + (tr A)(−λ)^{n−1} + ··· + det A.  (44)

The constant term is det A, which can also be seen by noting that this term is

charA(0) = det(A − 0I) = det A.
Exercise 9.11 Find the eigenvalues of the 2 × 2 matrix A = [3 −1; 4 −2]. Find eigenvectors
corresponding to each eigenvalue.
Theorem 9.12 (Spectral Theorem for Normal Operators) Every normal operator has an eigenbasis.
That is, if A ∈ L(H) is normal, then there is an orthonormal basis with respect to which A is represented
by a diagonal matrix whose diagonal elements are the eigenvalues of A.
In the rest of this section, we prove this theorem and explore some of its consequences. Sec-
tion B.2 in Appendix B includes a proof of the Spectral Theorem using a Schur basis for A.¹¹ The
proof of the Spectral Theorem we give here is independent from that and does not need a Schur
basis for A. We first prove some technical facts about normal matrices and operators.

Lemma 9.13 Let M = [A B; C D] be a normal matrix written in block form, where A and D are square.
Then

⟨B, B⟩ = ⟨C, C⟩.

Proof. Since M is normal, M∗M = MM∗. Equating the upper-left blocks of the two sides gives

A∗A + C∗C = AA∗ + BB∗.

Taking the trace of both sides and using the properties of the trace gives

tr A∗A + tr C∗C = tr AA∗ + tr BB∗ = tr A∗A + tr B∗B,

whence ⟨C, C⟩ = tr C∗C = tr B∗B = ⟨B, B⟩. □
Lemma 9.14 Let H be a Hilbert space. Suppose that R ∈ L(H) is normal and there is a subspace V ⊆ H
such that R maps V into V (i.e., R(V) ⊆ V). Then R maps V⊥ into V⊥, and R restricted to V (respectively
V⊥) is a normal operator in L(V) (respectively L(V⊥)).

Proof. Let n = dim(H) and let k = dim(V) (whence dim(V⊥) = n − k). We can assume that
1 ≤ k < n, otherwise the statement is trivial. We can choose an orthonormal basis B := {b1, ..., bn}
for H such that {b1, ..., bk} is an orthonormal basis for V and {bk+1, ..., bn} is an orthonormal
basis for V⊥. Letting M be the matrix of R with respect to B, we can write M in block form as

M = [A B; C D],
¹¹ In Theorem B.6, we show that if a matrix is both normal and upper triangular, then it is diagonal. Hence A is
represented in this basis as a diagonal matrix. That is, any Schur basis of a normal operator A is an eigenbasis of A. This
is an alternate proof of the Spectral Theorem for Normal Operators.
where A is a k × k matrix. For all 1 ≤ j ≤ k, we have Rbj ∈ V, so Rbj (represented by the jth
column of M) is a linear combination of b1, ..., bk only. This means that C = 0, which implies
⟨B, B⟩ = ⟨C, C⟩ = 0 by the previous lemma. Thus B = 0, and we have

M = [A 0; 0 D].

From this we get that R(V⊥) ⊆ V⊥, because for k < j ≤ n, the jth column of M, which represents
Rbj, is a linear combination of bk+1, ..., bn only. Furthermore,

[A∗A 0; 0 D∗D] = M∗M = MM∗ = [AA∗ 0; 0 DD∗],

and it follows by equating blocks that A and D are both normal matrices. □
We now have the tools in place to give a short, tidy proof of the Spectral Theorem, Theorem 9.12.

Proof. [Spectral Theorem, Theorem 9.12] The proof is by induction on dim(H). If dim(H) = 1,
then the statement follows from the fact that every nonzero vector is an eigenvector of A and hence
forms an eigenbasis for A. Now suppose that dim(H) = n > 1, and assume the statement holds for
proper subspaces of H. Let λ be an eigenvalue of A with some corresponding eigenvector b1 ∈ H.
Without loss of generality, we can assume ‖b1‖ = 1. Let V := {ab1 : a ∈ C} be the 1-dimensional
subspace of H spanned by b1. For any a ∈ C, we have Aab1 = aAb1 = aλb1 ∈ V, that is, A maps
V into V. By Lemma 9.14, A also maps V⊥ into V⊥, and its restriction A′ to V⊥ is a normal operator
in L(V⊥). Since dim(V⊥) = n − 1 < n, we apply the inductive hypothesis to get an eigenbasis
{b2, ..., bn} ⊆ V⊥ for A′. Then {b1, b2, ..., bn} is an eigenbasis for A. □
If U ∈ L(H) is unitary, then U is normal, and choosing an eigenbasis for U, we represent it as
a diagonal matrix D. For each 1 ≤ i ≤ n, let di = [D]ii. Since DD∗ = I (D is unitary), we have
|di|² = 1 for each i; that is, every eigenvalue of a unitary operator lies on the unit circle in C.
Lemma 9.15 If A ∈ L(H) is normal, and v1 and v2 are eigenvectors of A with distinct eigenvalues, then
⟨v1, v2⟩ = 0.

Proof. Say Av1 = λ1 v1 and Av2 = λ2 v2 with λ1 ≠ λ2, where without loss of generality ‖v1‖ = 1.
Let V be the 1-dimensional subspace spanned by v1 (so A maps V into V), and set

y := v2 − ⟨v1, v2⟩v1.

Then ⟨v1, y⟩ = 0 = ⟨y, v1⟩, and so ⟨y, av1⟩ = a⟨y, v1⟩ = 0 for all a ∈ C. Thus y ∈ V⊥. Lemma 9.14
states that A must then map V⊥ into V⊥. In particular, ⟨Ay, v1⟩ = ⟨v1, Ay⟩ = 0. We have

0 = ⟨v1, Ay⟩ = ⟨v1, A(v2 − ⟨v1, v2⟩v1)⟩ = λ2⟨v1, v2⟩ − λ1⟨v1, v2⟩ = (λ2 − λ1)⟨v1, v2⟩.
Since λ2 − λ1 ≠ 0, we must have ⟨v1, v2⟩ = 0. □
If A is an operator and λ is an eigenvalue of A, then we define the eigenspace of A with respect
to λ as
Eλ (A) = {v ∈ H : Av = λv}.
This is a subspace of H with positive dimension.
Corollary 9.16 If A is normal, then its eigenspaces are mutually orthogonal and span H. The dimension
of each eigenspace is the same as the multiplicity of the corresponding eigenvalue.
The following corollary is useful because it shows that any normal operator is a unique linear
combination of projectors that form a csop, thus revealing how projectors form the building blocks
of normal operators through their eigenvalues. It is essentially an alternate formulation of the
Spectral Theorem.
Corollary 9.17 If A is normal, then there is a unique set {(P1 , λ1 ), . . . , (Pk , λk )}, such that the λj ∈ C are
all distinct, the set {P1 , . . . , Pk } is a complete set of orthogonal projectors, and
A = λ1 P1 + λ2 P2 + · · · + λk Pk . (45)
Furthermore, λ1 , . . . , λk are the distinct eigenvalues of A, and each Pj orthogonally projects onto Eλj (A).
Proof. Suppose dim(H) = n. Let λ1, ..., λk be the distinct eigenvalues of A, and for all 1 ≤ j ≤ k
let Pj be the orthogonal projector onto Eλj(A). Choose an eigenbasis {b1, ..., bn} for A. For each
1 ≤ i ≤ n, let 1 ≤ ji ≤ k be such that Abi = λji bi. Then bi lies in the eigenspace Eλji(A). This
means that bi = Pji bi, and it follows that Pj bi = Pj Pji bi = 0 for all j ≠ ji. And so we have

A bi = λji bi = λji Pji bi = (Σ_{j=1}^{k} λj Pj) bi.
Thus both sides of Equation (45) act the same way on each bi , and so they must be equal. This
proves existence.
To show uniqueness, we show that for any decomposition

A = Σ_{j=1}^{ℓ} μj Qj,

where the μj are pairwise distinct scalars and {Q1, ..., Qℓ} is a complete set of orthogonal projectors,
it must be that the μj are all the eigenvalues of A with the respective Qj projecting onto the
corresponding eigenspaces (and so incidentally, ℓ = k). Fix a j such that 1 ≤ j ≤ ℓ, and notice that
AQj = μj Qj. For any v ∈ H, we then have AQj v = μj Qj v, and thus Qj v is an eigenvector of A
provided Qj v ≠ 0. Since Qj ≠ 0, such a v must exist, and this shows that μj is an eigenvalue of A
and Qj maps H into the corresponding eigenspace Eμj(A). This implies {μ1, ..., μℓ} ⊆ {λ1, ..., λk}.
We’ll be done if we show two things: (1) that Qj maps H onto Eμj(A); and (2) every eigenvalue of
A is in {μ1, ..., μℓ}. Let u ≠ 0 be any eigenvector of A with some eigenvalue λ.
We claim that Qj u = 0 for all j such that µj ≠ λ. We have

u = Iu = ∑_{j=1}^ℓ Qj u .    (46)

Fix j and suppose µj ≠ λ. Let v = Qj u. Then v ∈ Eµj (A) by what we showed about Qj above, and
if v ≠ 0, then v is an eigenvector of A with eigenvalue µj ≠ λ. But by Lemma 9.15 (eigenvectors of
a normal operator with distinct eigenvalues are orthogonal), we have

‖v‖² = ⟨Qj u, Qj u⟩ = ⟨u, Qj u⟩ = ⟨u, v⟩ = 0 ,

which implies 0 = v = Qj u, and that establishes the claim. Now if λ ≠ µj for all 1 ≤ j ≤ ℓ, then
u = 0 by Equation (46); contradiction. Hence λ ∈ {µ1 , . . . , µℓ }. Since λ is an arbitrary eigenvalue of
A, we have {λ1 , . . . , λk } = {µ1 , . . . , µℓ } (and since the members of each set are pairwise distinct, we
must also have k = ℓ). Now for any 1 ≤ j ≤ ℓ and for any u ∈ Eµj (A), Equation (46) and the claim
give u = Qj u, that is, Qj fixes pointwise all elements of Eµj (A). Thus Qj maps H onto Eµj (A). 2
The right-hand side of Equation (45) is called the spectral decomposition of A.

Exercise 9.18 Show that if A = λ1 P1 + λ2 P2 + · · · + λk Pk is the spectral decomposition of A, then
for every integer m ≥ 1,

A^m = λ1^m P1 + λ2^m P2 + · · · + λk^m Pk .

Exercise 9.19 Show that if A = λ1 P1 + λ2 P2 + · · · + λk Pk is the spectral decomposition of A, then

e^A = e^{λ1} P1 + e^{λ2} P2 + · · · + e^{λk} Pk .
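To make the spectral decomposition concrete, here is a small numerical sketch of mine (not from the notes; all names are my own) that recovers the projectors Pj of a Hermitean matrix from numpy's eigendecomposition and checks Equation (45) along with the power rule above.

```python
import numpy as np

# A normal (in fact Hermitean) operator on C^3 with a repeated eigenvalue.
A = np.array([[2, 1, 0],
              [1, 2, 0],
              [0, 0, 3]], dtype=complex)

evals, V = np.linalg.eigh(A)          # columns of V form an orthonormal eigenbasis

# Group equal eigenvalues and build the orthogonal projector onto each eigenspace.
decomp = []
for lam in np.unique(np.round(evals, 12)):
    cols = V[:, np.isclose(evals, lam)]
    P = cols @ cols.conj().T          # sum of |b><b| over the eigenbasis vectors
    decomp.append((P, lam))

# Spectral decomposition: A = sum_j lambda_j P_j  (Equation (45)).
A_rebuilt = sum(lam * P for P, lam in decomp)
assert np.allclose(A, A_rebuilt)

# The projectors form a csop: they sum to the identity.
total = sum(P for P, _ in decomp)
assert np.allclose(total, np.eye(3))

# Powers act on the eigenvalues only: A^5 = sum_j lambda_j^5 P_j.
assert np.allclose(np.linalg.matrix_power(A, 5),
                   sum(lam**5 * P for P, lam in decomp))
```

Here A has eigenvalues 1 and 3 (the latter with multiplicity two), so the loop produces exactly two projector/eigenvalue pairs.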
We’ll be dealing with normal operators almost exclusively from now on.
Exercise 9.20 We know that any Hermitean operator is normal with real eigenvalues. Prove the
converse: any normal operator with only real eigenvalues is Hermitean. [Hint: Use an eigenbasis.]
Exercise 9.21 We know that any unitary operator is normal with eigenvalues on the unit circle
in C. Prove the converse: any normal operator with all eigenvalues on the unit circle is unitary.
[Hint: Use an eigenbasis.]
55
Scalar Functions Applied to Operators. Let Ω ⊆ C be some set and suppose f : Ω → C is some
function mapping scalars to scalars. It is often natural and useful to extend the definition of f to
apply to operators A ∈ L(H), where H is a Hilbert space, with the results also being operators in
L(H). There are at least two situations where this can be done:
1. The value f(x) is expressible as a convergent power series about some point x0 ∈ Ω: for
every x ∈ Ω,

f(x) = ∑_{i=0}^∞ ci (x − x0 )^i .

In this case, we can define f(A) := ∑_{i=0}^∞ ci (A − x0 I)^i , provided the series converges.

2. The function f is arbitrary. For any normal operator A ∈ L(H) all of whose eigenvalues are
contained in Ω, we define

f(A) := ∑_{j=1}^k f(λj )Pj = f(λ1 )P1 + · · · + f(λk )Pk ,

where A = λ1 P1 + · · · + λk Pk is the spectral decomposition of A.
We’ve seen an example of item (1) above with the natural exponential map z 7→ ez defined on
all of C, and Exercises 9.18 and 9.19 show that this definition agrees with item (2) as well. In fact,
it can be shown that if both conditions (1) and (2) hold for some f and A, then the two definitions
will coincide. We will see an instance of item (2) below, when we take the square root of a positive
operator. It is often the case that a special property that f has with respect to scalars has an analogous (but
perhaps weaker) property when applied to operators. For example, e^{z+w} = e^z e^w for all z, w ∈ C,
and for operators A, B ∈ L(H) we have e^{A+B} = e^A e^B as well, provided A and B commute.
Here are two more general facts. The first says among other things that this concept is covariant
under unitary conjugation. The second applies to item (2) specifically.
Proposition 9.22 Let function f : Ω → C and operator A ∈ L(H) satisfy the conditions of either item (1)
or item (2) above. Then A and f(A) commute. Furthermore, for any unitary operator U ∈ L(H), we have
that f and UAU∗ also satisfy the same condition(s), and f(UAU∗ ) = Uf(A)U∗ .
Proposition 9.23 Suppose f : Ω → C and A ∈ L(H) satisfy the conditions of item (2) above. Then f(A)
is the unique operator in L(H) such that, for any v ∈ H and λ ∈ C, if v is an eigenvector of A with
eigenvalue λ, then v is an eigenvector of f(A) with eigenvalue f(λ).
Positive Operators.
Definition 9.24 An operator A ∈ L(H) is positive or positive semidefinite (written A ≥ 0) iff v∗ Av ≥ 0
for all v ∈ H. We say that A is strictly positive or positive definite (written A > 0) iff v∗ Av > 0 for all
nonzero v ∈ H.

Since u∗ Av = ⟨u, Av⟩ for all vectors u, v ∈ H, positivity of A is equivalent to ⟨v, Av⟩ ≥ 0 for
all v ∈ H, or in Dirac notation, ⟨ψ|A|ψ⟩ ≥ 0 for all |ψ⟩ ∈ H. Obviously, strict positivity implies
positivity.

For example, the zero operator 0 ∈ L(H) and the identity operator I ∈ L(H) are clearly positive:
v∗ 0v = 0 and v∗ Iv = v∗ v = ‖v‖² ≥ 0 for all v. In fact, I > 0 as well.

Exercise 9.25 Verify that if A ≥ 0 and B ≥ 0 are positive operators and a ≥ 0 is a nonnegative real
number, then A + B ≥ 0 and aA ≥ 0.
Positive operators play a huge role in the study of quantum information, so it is worth spending
some time with them.
Exercise 9.26 (A bit challenging) Show for any operator A ∈ L(H) that A is Hermitean if and only
if v∗ Av ∈ R for all v ∈ H. (Thus every positive operator is Hermitean and hence normal.) [Hint:
The forward direction is easy. For the reverse direction, consider the matrix elements of A with
respect to some orthonormal basis b1 , . . . , bn . Consider three types of cases:
1. v = bk for some k. What does this tell you about the diagonal elements [A]kk ?
2. v = bk + bj for some k ≠ j. This allows you to relate [A]kj and [A]jk in some way.
3. v = bk + ibj for the same k, j above. This allows you to relate [A]kj and [A]jk further.]
Exercise 9.27 Show that A ≥ 0 if and only if A is normal and all its eigenvalues are nonnegative
real numbers. (It follows that if A ≥ 0, then tr A ≥ 0.) [Hint: Use the previous exercise.]
Exercise 9.28 Show that if A ≥ 0 and tr A = 0, then A = 0. [Hint: Use the previous exercise.]
Exercise 9.29 Show that the zero operator is the only operator A satisfying A ≥ 0 and −A ≥ 0.
[Hint: Use the previous two exercises.]
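These characterizations are easy to experiment with numerically. The sketch below (my own, using numpy; not part of the notes) checks the criterion of Exercise 9.27 on a matrix of the form C∗C, which is always positive since v∗(C∗C)v = ‖Cv‖² ≥ 0.

```python
import numpy as np

def is_positive(A, tol=1e-10):
    """Check A >= 0 via Exercise 9.27: A is Hermitean (hence normal)
    with all eigenvalues nonnegative real numbers."""
    if not np.allclose(A, A.conj().T, atol=tol):
        return False
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

C = np.array([[1.0, 2.0],
              [0.0, 1.0]])
A = C.T @ C                     # v* A v = ||Cv||^2 >= 0, so A >= 0
assert is_positive(A)

# The zero and identity operators are positive.
assert is_positive(np.zeros((2, 2))) and is_positive(np.eye(2))

# -A is not positive (cf. Exercise 9.29: only 0 satisfies both).
assert not is_positive(-A)

# Exercise 9.25: sums and nonnegative scalings of positive operators are positive.
assert is_positive(A + np.eye(2)) and is_positive(2.5 * A)
```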
Exercise 9.30 Show that the following are equivalent for any operator A:
1. A ≥ 0.
You may have noticed that you can determine a lot about a normal operator by its spectrum.
This is not too surprising, because
• most properties we’ve been looking at of the underlying matrices are basis-invariant (i.e.,
invariant under unitary conjugation),
• the spectrum of a diagonal matrix is just the set of diagonal elements of the matrix.
Each entry in the following table is easily checked by representing the operator as a diagonal
matrix with respect to an eigenbasis.
A = λ1 P1 + · · · + λk Pk

uniquely according to Corollary 9.17. Since A ≥ 0, we have λj ≥ 0 for 1 ≤ j ≤ k. Now let

B := √λ1 P1 + · · · + √λk Pk .

B has eigenvalues √λ1 , . . . , √λk ≥ 0, so B ≥ 0. By Exercise 9.18, we get

B² = (√λ1 )² P1 + · · · + (√λk )² Pk = A.
To show uniqueness, suppose that B, C ≥ 0 such that B² = C² = A. Using Corollary 9.17 again,
decompose

B = µ1 P1 + · · · + µk Pk ,
C = ν1 Q1 + · · · + νℓ Qℓ .

So,

B² = µ1² P1 + · · · + µk² Pk = A = ν1² Q1 + · · · + νℓ² Qℓ = C² .

Note that the µj are distinct and nonnegative (same with the νj ), and therefore so are the µj²
(same with the νj² ). Then since the decomposition of A from Corollary 9.17 is unique, we must
have {(P1 , µ1² ), . . . , (Pk , µk² )} = {(Q1 , ν1² ), . . . , (Qℓ , νℓ² )}. Thus k = ℓ and {(P1 , µ1 ), . . . , (Pk , µk )} =
{(Q1 , ν1 ), . . . , (Qk , νk )}, because all the µj ≥ 0 and νj ≥ 0. So we must have B = C.
Notice that, since the same projectors are involved in the decompositions of A and B, it follows
that the eigenvectors of A and B coincide, and the corresponding eigenvalues of B are the square
roots of those of A.
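Here is the construction as numpy code (a sketch of mine, not from the notes): take the spectral decomposition of a positive A and replace each eigenvalue by its nonnegative square root.

```python
import numpy as np

def positive_sqrt(A):
    """The unique B >= 0 with B^2 = A, built from A's spectral decomposition."""
    evals, V = np.linalg.eigh(A)                   # A >= 0, so evals >= 0
    return (V * np.sqrt(np.clip(evals, 0, None))) @ V.conj().T

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                          # eigenvalues 1 and 3, so A > 0
B = positive_sqrt(A)

assert np.allclose(B @ B, A)                        # B^2 = A
assert np.allclose(B, B.conj().T)                   # B is Hermitean ...
assert np.all(np.linalg.eigvalsh(B) >= 0)           # ... with nonnegative eigenvalues

# Eigenvalues of B are the square roots of those of A.
assert np.allclose(np.linalg.eigvalsh(B), np.sqrt(np.linalg.eigvalsh(A)))
```

The `np.clip` guards against tiny negative eigenvalues produced by floating-point roundoff; mathematically the eigenvalues of a positive operator are already nonnegative.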
The square root function applied to positive operators that we’ve just defined is an example of
a scalar function applied to an operator that we discussed in the previous topic. The fact that any
positive operator has a (positive) square root is useful in many places. For example, we get the
following theorem:
Theorem 9.31 Let A and B be any positive operators over a Hilbert space H. Then ⟨A, B⟩ ≥ 0, with
equality holding if and only if AB = 0. (Recall that ⟨A, B⟩ = tr(A∗ B).)
Proposition 9.33 Let A be an n × n matrix and B an m × n matrix, for positive integers m and n. If
A > 0, then BAB∗ > 0. (Note that BAB∗ is an m × m matrix.)
Proof. Since A ≥ 0, it has a positive (hence Hermitean) square root √A. Then for any v,

v∗ BAB∗ v = v∗ B √A √A B∗ v = ⟨√A B∗ v, √A B∗ v⟩ = ‖√A B∗ v‖² ≥ 0 . 2
Exercise 9.35 Show that if A is any operator, then A ≥ 0 if and only if A = |A|.
Exercise 9.36 Show that if A is any normal operator, then the eigenvalues of |A| are the absolute
values of the eigenvalues of A.
Here we summarize many equivalent ways of characterizing positive operators into a single
proposition.
Proposition 9.37 Let H be an n-dimensional Hilbert space for some n > 0, and let A ∈ L(H) be an
operator (equivalently, an n × n matrix with respect to some standard basis {e1 , . . . , en } of H). The following
are equivalent:

1. A ≥ 0.

2. A is normal and all its eigenvalues are nonnegative real numbers.

3. A = |A|.

4. A = B² for some B ≥ 0.

5. A = C∗ C for some n × n matrix C.

6. ⟨F, A⟩ ≥ 0 for every F ≥ 0.

7. u∗ Au ≥ 0 for every unit vector u ∈ H.

8. There exist vectors v1 , . . . , vn ∈ H such that [A]ij = ⟨vi , vj ⟩ for all 1 ≤ i, j ≤ n.

Proof. (1) ⇔ (2) by Exercise 9.27. (1) ⇔ (3) by Exercise 9.35. It then suffices to show (3) ⇒ (4) ⇒
(5) ⇒ (1), (1) ⇒ (6) ⇒ (7) ⇒ (1), and (1) ⇒ (8) ⇒ (1). For (3) ⇒ (4), set B := √|A|. (4) ⇒ (5) is obvious. For
(5) ⇒ (1), we have, for any v ∈ H,

v∗ Av = v∗ C∗ Cv = ‖Cv‖² ≥ 0 ,

hence A ≥ 0. (1) ⇒ (6) follows from Theorem 9.31. We have (6) ⇒ (7) because uu∗ ≥ 0 (check
it!) and ⟨uu∗ , A⟩ = tr(uu∗ A) = u∗ Au. For (7) ⇒ (1), for any nonzero v ∈ H, let u := v/‖v‖. Then u is a unit vector, and we have
v∗ Av = ‖v‖² u∗ Au ≥ 0. For (1) ⇒ (8), set vk := Bek for all 1 ≤ k ≤ n, where B := √A; then
⟨vi , vj ⟩ = ⟨Bei , Bej ⟩ = ei∗ B∗ Bej = [A]ij . For (8) ⇒ (1), let v1 , . . . , vn ∈ H be given satisfying (8). Then for
any v ∈ H, we can write v = ∑_{i=1}^n xi ei for some x1 , . . . , xn ∈ C. Then

v∗ Av = ∑_{i,j} xi∗ xj (ei∗ Aej ) = ∑_{i,j} xi∗ xj [A]ij = ∑_{i,j} xi∗ xj ⟨vi , vj ⟩ = ⟨∑_i xi vi , ∑_j xj vj ⟩ = ⟨u, u⟩ ≥ 0

by the positive definiteness of ⟨·, ·⟩, where u := ∑_{k=1}^n xk vk . 2
Before leaving this topic, we define and give some basic properties of a binary relation ≤ on
L(H), or equivalently, on square matrices. This relation arises naturally from the notion of operator
positivity.

Definition 9.38 Let H be an n-dimensional Hilbert space and let A, B ∈ L(H) (equivalently, A and
B are n × n matrices). We say that A ≤ B iff B − A ≥ 0, i.e., iff B − A is a positive operator.
Most of Proposition 9.39, below, follows from the properties of positive operators we have
established above. For technical convenience, we state things in terms of matrices rather than
operators.
Proposition 9.39 Let n and m be positive integers, and let A and B be any n × n matrices.

2. If A ≤ B and B ≤ A, then A = B.

5. For any n × n matrix F, if A and B are Hermitean, A ≤ B, and F ≥ 0, then ⟨F, A⟩ ≤ ⟨F, B⟩ (and
both quantities are real).

6. If A and B are Hermitean and A ≤ B, then tr A ≤ tr B (and both quantities are real).

7. If A is a projector, then 0 ≤ A ≤ I.
Corollary 9.40 If A and B are operators, A ≤ I, and B ≥ 0, then tr(AB) ≤ tr B (and both quantities are
real).

Proof. Since I − A ≥ 0 and hence is Hermitean, it follows that A is Hermitean. We then have
tr B − tr(AB) = tr((I − A)B) = ⟨I − A, B⟩ ≥ 0 by Theorem 9.31, since I − A ≥ 0 and B ≥ 0. 2
Commuting Operators. In this topic, we’ll prove the fundamental result that commuting normal
operators always share a common eigenbasis, and so they are simultaneously diagonalizable. This
is a stronger version of the Spectral Theorem, which only deals with one normal operator.
Theorem 9.41 Let C be an arbitrary family12 of normal operators in L(H), any two of which commute,
i.e., AB = BA for all A, B ∈ C. Then there is an orthonormal basis B of H that is an eigenbasis for all
operators in C simultaneously.
To prove Theorem 9.41, we will use Lemma 9.14 paired with the following fundamental
property of commuting operators:
Lemma 9.42 Let A, B ∈ L(H) be commuting operators, and let E ⊆ H be any eigenspace of A. Then B
maps E into E.
Proof. Let E = Eλ (A) be the eigenspace of A corresponding to some eigenvalue λ of A. Then for
any v ∈ E, we have
ABv = BAv = B(λv) = λBv .
Thus either Bv = 0 or Bv is an eigenvector of A with eigenvalue λ. In either case, Bv ∈ E, which
proves the lemma. 2
Proof of Theorem 9.41. This proof is somewhat similar to that of the Spectral Theorem. We
proceed by induction on n = dim(H). If n = 1, then all operators in C are scalar multiples of the
identity operator, making H itself a common eigenspace of all operators in C; any single unit vector
in H then constitutes a common eigenbasis.
Now assume n > 1 and the theorem holds for any Hilbert space of dimension less than n. We
prove it true for dimension n by first finding at least one common eigenvector for all the operators
in C. We will then continue as in the proof of the Spectral Theorem. To find a common eigenvector,
we construct a finite, strictly descending chain of subspaces
H = E0 ⊃ E1 ⊃ E2 ⊃ · · · ⊃ Ek ,
where Ei is a proper subspace of Ei−1 for all 1 ≤ i ≤ k, and dim(Ek ) > 0. (Any such chain must
be finite, because the dimension decreases by at least 1 for each successive Ei .) We will do this in
such a way that all nonzero vectors in Ek are common eigenvectors of all the operators in C, that
is, all operators in C are multiples of the identity when restricted to Ek . For convenience, for each
0 ≤ i ≤ k, we also define Ci to be the set of all restrictions to Ei of operators in C. We will maintain
the invariant that every operator A ∈ Ci maps Ei into itself (that is, A ∈ L(Ei )), from which it
follows from Lemma 9.14 that A is a normal operator on Ei .
Now for the construction.13 First, set E0 := H, whence C0 := C. Note that the above invariant
holds trivially for i = 0. Then for i := 1, 2, 3, . . . in increasing order (until we stop), do the following:
• If all operators in Ci−1 are multiples of the identity on Ei−1 , then set k := i − 1 and STOP.
12
C need not be finite—or even countable.
13
The construction makes a series of arbitrary choices, so it is not unique.
• Otherwise, choose an operator A ∈ Ci−1 that is not a multiple of the identity, and let Ei
be any eigenspace of A. Note that such an Ei exists (as one does for any operator), that Ei
is a proper subspace of Ei−1 , and that dim(Ei ) > 0. Also note that every operator in Ci−1
commutes with A and thus maps Ei into itself by Lemma 9.42. From this one can see that
the invariant is maintained for i.
Once the construction stops, every operator in Ck is a multiple of the identity on Ek , so every
nonzero vector in Ek is a common eigenvector of all the operators in C. Choose any orthonormal
basis of Ek . Since each operator in C is normal and maps Ek into itself, it also maps Ek⊥ into itself
(Lemma 9.14), so we can apply the inductive hypothesis with space Ek⊥ and set of operators C
(restricted to Ek⊥ ) to obtain an orthonormal basis of Ek⊥ consisting of common eigenvectors. The
union of the two bases is then a common eigenbasis of H for all the operators in C. 2
Tensor Products and Combining Physical Systems. Suppose we have two physical systems
S and T with state spaces HS and HT , respectively, and we want to consider the two systems
together as a single system ST . What is the state space of ST ? Quantum mechanics says that
the state space of ST is completely determined by HS and HT via a construction called the tensor
product. We’ll first describe the tensor product of matrices, then we’ll discuss the tensor product
in a basis-independent way.
Let A be an m × n matrix and let B be an r × s matrix (m, n, r, s are arbitrary positive integers).
The tensor product of A and B (also called the outer product or the direct product or the Kronecker
product) is the mr × ns matrix given in block form by

          [ a11 B  a12 B  · · ·  a1n B ]
          [ a21 B  a22 B  · · ·  a2n B ]
  A ⊗ B = [   .      .     .       .   ]
          [ am1 B  am2 B  · · ·  amn B ]
We collect the standard, easily verifiable properties of the ⊗ operation here in one place.
Proposition 10.1 For any matrices A, B, C, D and scalars a, b ∈ C, the following equations hold provided
the operations involved are well-defined:
5. (A ⊗ B)(C ⊗ D) = AC ⊗ BD. (This is worth memorizing because we’ll use it all the time.)
6. (A ⊗ B)∗ = A∗ ⊗ B∗ .
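Properties 5 and 6 can be checked directly with numpy's `np.kron`, which implements exactly the block matrix above. A quick sketch of mine (the matrices are arbitrary random examples):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((4, 2))
C = rng.standard_normal((3, 2))
D = rng.standard_normal((2, 5))

# Property 5: (A ⊗ B)(C ⊗ D) = AC ⊗ BD.  Shapes: (8x6)(6x10) = (8x10).
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))

# Property 6: (A ⊗ B)* = A* ⊗ B*  (adjoint = conjugate transpose).
M = A + 1j * rng.standard_normal((2, 3))
N = B + 1j * rng.standard_normal((4, 2))
assert np.allclose(np.kron(M, N).conj().T, np.kron(M.conj().T, N.conj().T))
```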
Exercise 10.3 Show that if A and B are Hermitean (respectively, unitary), then A ⊗ B is Hermitean
(respectively, unitary).
If {e1 , . . . , em }, {f1 , . . . , fn }, and {g1 , . . . , gmn } are the standard bases of Cm , Cn , and Cmn ,
respectively, then

ei ⊗ fj = g(i−1)n+j .

From this it is easy to see that if {b1 , . . . , bm } and {c1 , . . . , cn } are any orthonormal bases for Cm and
Cn , respectively, then {bi ⊗ cj : 1 ≤ i ≤ m & 1 ≤ j ≤ n} is an orthonormal basis for Cmn . Indeed,
we have

⟨bi ⊗ cj , bk ⊗ cℓ ⟩ = ⟨bi , bk ⟩⟨cj , cℓ ⟩ = δik δjℓ ,

which is 1 if i = k and j = ℓ and is 0 otherwise.
This last bit suggests that we can define the tensor product in a basis-independent way, applied
to (abstract) vectors and operators. If H and J are Hilbert spaces, then we can define a Hilbert
space H ⊗ J (the tensor product of H and J) together with a bilinear map ⊗ : H × J → H ⊗ J,
mapping any pair of vectors u ∈ H and v ∈ J to a vector u ⊗ v ∈ H ⊗ J, such that if {b1 , . . . , bm } and
{c1 , . . . , cn } are orthonormal bases for H and J, respectively, then {bi ⊗ cj : 1 ≤ i ≤ m & 1 ≤ j ≤ n}
is an orthonormal basis for H ⊗ J. We’ll call such a basis a product basis. We won’t do it here, but
it can be shown that these two rules—bilinearity and the basis rule—define in essence the Hilbert
space H ⊗ J uniquely. Notice that the basis rule implies that the dimension of H ⊗ J is the product
of the dimensions of H and J.
It’s worth pointing out that not all vectors in H ⊗ J are of the form u ⊗ v for u ∈ H and v ∈ J. For
example, the column vector (1, 0, 0, 1) = e1 + e4 cannot be written as a single tensor product of
two 2-dimensional column vectors: if (a, b) ⊗ (c, d) = (ac, ad, bc, bd) = (1, 0, 0, 1), then ad = 0
forces a = 0 or d = 0, contradicting ac = bd = 1. It can, however, be written as the sum of two
tensor products:

(1, 0, 0, 1) = (1, 0) ⊗ (1, 0) + (0, 1) ⊗ (0, 1) .
In general a vector in H ⊗ J may not be a tensor product, but it is always a linear combination of
them (which is clear by our discussion about bases, above), i.e., the tensor products span the space
H ⊗ J.
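One way to see this computationally (a sketch of mine, not from the notes): reshape a vector in Cᵐ ⊗ Cⁿ into an m × n matrix. The vector equals a single tensor product u ⊗ v exactly when that matrix is the rank-1 outer product of u and v.

```python
import numpy as np

def tensor_rank(w, m, n):
    """Rank of the m x n matrix obtained by reshaping w in C^m (x) C^n.
    Rank 1 means w = u (x) v for some u, v; rank > 1 means it does not factor."""
    return np.linalg.matrix_rank(w.reshape(m, n))

e = np.eye(4)
w = e[0] + e[3]                     # (1, 0, 0, 1) = e1 + e4
assert tensor_rank(w, 2, 2) == 2    # not a single tensor product

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])
assert tensor_rank(np.kron(u, v), 2, 2) == 1   # u (x) v always has rank 1
```

The reshape works because the component of u ⊗ v at position (i − 1)n + j is uᵢvⱼ, i.e., entry (i, j) of the outer product matrix.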
We’re not done overloading the ⊗ symbol. Given the definition of H ⊗ J just described, we
can extend ⊗ to apply to operators as well as vectors. For example, we can extend it to a map
⊗ : L(H) × L(J) → L(H ⊗ J) by defining the action of an operator A ⊗ B on a vector u ⊗ v ∈ H ⊗ J:
(A ⊗ B)(u ⊗ v) = Au ⊗ Bv.
One can show that this definition is consistent, and since H ⊗ J is spanned by vectors of the form
u ⊗ v, this defines the operator A ⊗ B uniquely by linearity. We could define ⊗ on dual vectors and
other kinds of linear maps, e.g., mapping from one space to another space.
Picking orthonormal bases for H and J allows us to represent objects such as vectors, dual
vectors, operators, or what have you, in both spaces as matrices. When we do this, the abstract
and matrix-based notions of ⊗ completely coincide, as is the case with the other linear algebraic
constructs that we’ve seen, e.g., adjoint, trace, et cetera. This idea (that the two notions should
coincide) guides us in any further extensions of the ⊗ operation that we may wish to use.
Back to Combining Physical Systems. If S and T are physical systems with state spaces HS and
HT as before, then the state space of the combined system is HST = HS ⊗ HT . If |ϕiS ∈ HS is a
state of S and |ψiT ∈ HT is a state of T (we occasionally add subscripts to make clear which state
goes with which system), then |ϕiS ⊗ |ψiT is a state of ST , which we interpret as saying, “The
system S is in state |ϕiS , and the system T is in state |ψiT .” (We’ll often drop the ⊗ and write
|ϕiS ⊗ |ψiT simply as |ϕiS |ψiT , or even just |ϕ, ψi if the meaning is clear. The same holds for
bras as well as kets.) As we’ve seen, however, there can be states of ST that can’t be written as a
single tensor product, for example, the two-qubit state (|0⟩|0⟩ + |1⟩|1⟩)/√2. These states are called
entangled states, whereas states of the form |ϕiS |ψiT are called separable states or tensor product states.
More on this later.
How does this look in the density operator formalism? Easy answer: exactly the same, at least
for separable states. Let ρS = |ϕ⟩⟨ϕ| be the density operator corresponding to |ϕ⟩ of system S,
and let ρT = |ψ⟩⟨ψ| be the density operator corresponding to |ψ⟩ of system T (subscripts dropped).
Then the density operator for the combined system should be

ρST = (|ϕ⟩ ⊗ |ψ⟩)(⟨ϕ| ⊗ ⟨ψ|) = |ϕ⟩⟨ϕ| ⊗ |ψ⟩⟨ψ| = ρS ⊗ ρT .

So we take the tensor product of the density operators just as we would do with the vectors in the
original formulation. For the two-qubit entangled state example (|0⟩|0⟩ + |1⟩|1⟩)/√2 above, which
we abbreviate as (|00⟩ + |11⟩)/√2, the corresponding density operator is

      |00⟩ + |11⟩   ⟨00| + ⟨11|     1  [ 1 0 0 1 ]
  ρ = ─────────── · ───────────  =  ─  [ 0 0 0 0 ]
          √2            √2          2  [ 0 0 0 0 ]
                                       [ 1 0 0 1 ]
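The sketch below (mine, not from the notes) builds this density operator in numpy, confirms the matrix above, and checks the basic density-operator facts tr ρ = 1 and ρ ≥ 0.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

# |00>, |11> in C^2 (x) C^2, and the Bell state (|00> + |11>)/sqrt(2).
ket00 = np.kron(ket0, ket0)
ket11 = np.kron(ket1, ket1)
phi = (ket00 + ket11) / np.sqrt(2)

rho = np.outer(phi, phi.conj())          # |phi><phi|
expected = 0.5 * np.array([[1, 0, 0, 1],
                           [0, 0, 0, 0],
                           [0, 0, 0, 0],
                           [1, 0, 0, 1]])
assert np.allclose(rho, expected)
assert np.isclose(np.trace(rho), 1.0)              # unit trace
assert np.all(np.linalg.eigvalsh(rho) >= -1e-12)   # rho >= 0

# For a separable state, the combined density operator is rho_S (x) rho_T:
# |0><0| (x) |1><1| equals |01><01|, by the mixed-product property.
rho_sep = np.kron(np.outer(ket0, ket0), np.outer(ket1, ket1))
assert np.allclose(rho_sep, np.outer(np.kron(ket0, ket1), np.kron(ket0, ket1)))
```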
If S and T are isolated from each other (and the outside world), then each evolves in time
according to a unitary operator, say U for system S and V for system T . U and V are called local
operations. In this case, U ⊗ V is the unitary giving the time evolution of the combined system: for
tensor product state |ϕiS ⊗ |ψiT , we have (U ⊗ V)(|ϕiS ⊗ |ψiT ) = U|ϕiS ⊗ V|ψiT , which is again a
tensor product state. If S and T are brought together so that they interact, then the unitary giving
the evolution of the combined system ST might not be able to be written as a single tensor product
of unitaries for S and T respectively.
Exercise 10.4 Let HS and HT be Hilbert spaces, and let P1 , . . . , Pk ∈ L(HS ) be a complete set of
orthogonal projectors for HS . Show that P1 ⊗ I, . . . , Pk ⊗ I is a complete set of orthogonal projectors
for HS ⊗ HT , where I is the identity operator on HT . (The latter set represents a projective
measurement on the system S when viewed from the combined system ST .)
The No-Cloning Theorem. Quantum states cannot be duplicated in general. The following
theorem makes this precise.
Figure 3: Sample quantum circuit with two qubits. Time moves from left to right in the figure.
The gate H is applied first to the first qubit, then CNOT is applied to both qubits.
Theorem 10.5 (No-Cloning Theorem) Let H be a Hilbert space of dimension at least two, and let |0i ∈ H
be a fixed unit vector. There is no unitary operator U ∈ L(H ⊗ H) such that U|ψi|0i ∝ |ψi|ψi for every
unit vector |ψi ∈ H.
Proof. Suppose U exists as above, and let |ϕi, |ψi ∈ H be any two unit vectors. Since U is unitary,
we have
hϕ|ψi = hϕ|ψih0|0i
= (hϕ|h0|)(|ψi|0i)
= (hϕ|h0|)U∗ U(|ψi|0i)
= (U|ϕi|0i)∗ U(|ψi|0i)
∝ (hϕ|hϕ|)(|ψi|ψi)
= hϕ|ψi2 ,
and thus |hϕ|ψi| = |hϕ|ψi|2 , which implies |hϕ|ψi| is either 0 or 1, i.e., |ϕi and |ψi are either
orthogonal or collinear. But clearly we can choose |ϕi and |ψi such that this is not the case. 2
Quantum Circuits. The quantum circuit has become the de facto standard theoretical model of
quantum computation. It is equivalent to the other standard model—the quantum Turing machine,
or QTM—but it is easier to work with and represent visually. Quantum circuits are closely
analogous to classical Boolean circuits, and we’ll compare them occasionally.
A quantum circuit consists of some number of qubits, called a quantum register, represented by
horizontal wires. The qubits start in some designated state, representing the input to the circuit.
From time to time, we may act on one or more qubits in the circuit by applying a quantum gate,
which is just a unitary operator applied to the corresponding qubits. A typical circuit with a
two-qubit register is shown in Figure 3. To keep track, we number the qubits in the register from
top to bottom, so that the topmost qubit is the first, etc. At any given time, the register is in some
quantum state |ψi ∈ H ⊗ · · · ⊗ H = H⊗n , where H is here the state space of a single qubit, and n
is the number of qubits in the register. We choose an orthonormal basis for H⊗n by taking tensor
products of the individual one-qubit basis vectors |0i and |1i. We call this basis the computational
basis for the register. For example, a typical computational basis vector in H⊗5 is

|00101⟩ = |0⟩ ⊗ |0⟩ ⊗ |1⟩ ⊗ |0⟩ ⊗ |1⟩ .

In this state, the first, second, and fourth qubits are 0, and the third and fifth qubits are 1. The state
space of an n-qubit register has dimension 2^n, with computational basis vectors representing all
the 2^n possible values of n bits, listed in the usual binary order: |00 · · · 00⟩, |00 · · · 01⟩, |00 · · · 10⟩,
etc., through |11 · · · 11⟩.
In the circuit diagram, the state of the register evolves in time from left to right. In Figure 3,
for example, the first gate that is applied is the leftmost gate, i.e., the H gate applied to the first
qubit. Here, we are not using H as a variable to describe any one-qubit gate, but rather we use H
to denote a useful one-qubit gate, known as the Hadamard gate, given by

H = (1/√2) [ 1  1 ]
           [ 1 −1 ] .

Note that

H|0⟩ = (|0⟩ + |1⟩)/√2,
H|1⟩ = (|0⟩ − |1⟩)/√2,

or more succinctly,

H|b⟩ = (|0⟩ + (−1)^b |1⟩)/√2

for any b ∈ {0, 1}. Clearly, H = (X + Z)/√2 and H² = I. We also have H ∝ R_{(1,0,1)/√2}(π), and so
H rotates the Bloch sphere 180° around the line through (1, 0, 1), swapping the +z-axis with the
+x-axis.
Note that although it looks as if we are only applying H to the first qubit, we are really
transforming the state |ψ⟩ ∈ H ⊗ H of the entire two-qubit register via the unitary H ⊗ I, where I
is the one-qubit identity operator representing the fact that we are not acting on the second qubit.
Suppose that the initial state of the register is |00⟩. After the H gate is applied, the state becomes

|ψ1⟩ = (H ⊗ I)|00⟩ = (|00⟩ + |10⟩)/√2 .
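A quick numerical check of these Hadamard facts (my own sketch, not part of the notes):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

assert np.allclose(H, (X + Z) / np.sqrt(2))     # H = (X + Z)/sqrt(2)
assert np.allclose(H @ H, np.eye(2))            # H^2 = I

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
assert np.allclose(H @ ket0, (ket0 + ket1) / np.sqrt(2))
assert np.allclose(H @ ket1, (ket0 - ket1) / np.sqrt(2))

# Applying H to the first qubit of |00> really means applying H (x) I.
ket00 = np.kron(ket0, ket0)
psi1 = np.kron(H, np.eye(2)) @ ket00
assert np.allclose(psi1, (np.kron(ket0, ket0) + np.kron(ket1, ket0)) / np.sqrt(2))
```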
11 Week 6: Quantum gates
The next gate in Figure 3 is another very useful, two-qubit gate called a controlled NOT or C-NOT
gate, acting on both qubits. In a C-NOT gate, the small black dot connects to the control qubit (here,
the first qubit) and the ⊕ end connects to the target qubit. If the control is |0i, then the target does
not change; if the control is |1i, then the target’s Boolean value is flipped |0i ↔ |1i (logical NOT).
The control qubit is unchanged regardless. Here it is schematically for any a, b ∈ {0, 1} (here, ⊕
represents bitwise exclusive OR, i.e., bitwise addition modulo 2):
a ──●── a
b ──⊕── a ⊕ b
The matrix for the C-NOT gate above, with the first qubit being the control and the second being
the target, is

P0 ⊗ I + P1 ⊗ X = |0⟩⟨0| ⊗ I + |1⟩⟨1| ⊗ X = [ 1 0 0 0 ]
                                            [ 0 1 0 0 ]
                                            [ 0 0 0 1 ]
                                            [ 0 0 1 0 ] .
Here X is the usual Pauli X operator, which swaps 0 with 1, and hence represents logical NOT. If
If the control and target qubits were reversed, then the gate would be

I ⊗ P0 + X ⊗ P1 = [ P0 P1 ] = [ 1 0 0 0 ]
                  [ P1 P0 ]   [ 0 0 0 1 ]
                              [ 0 0 1 0 ]
                              [ 0 1 0 0 ] .
After the C-NOT gate is applied to the state |ψ1⟩ in Figure 3, the new and final state of the circuit is
(|00⟩ + |11⟩)/√2, the entangled state we saw earlier.
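The whole circuit of Figure 3 can be simulated in a few lines of numpy (a sketch of mine): build C-NOT as P0 ⊗ I + P1 ⊗ X and check that H on the first qubit followed by C-NOT turns |00⟩ into the entangled state (|00⟩ + |11⟩)/√2.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
P0 = np.diag([1.0, 0.0])       # |0><0|
P1 = np.diag([0.0, 1.0])       # |1><1|

CNOT = np.kron(P0, I2) + np.kron(P1, X)       # control = first qubit
assert np.allclose(CNOT, [[1, 0, 0, 0],
                          [0, 1, 0, 0],
                          [0, 0, 0, 1],
                          [0, 0, 1, 0]])

# Reversing control and target gives I (x) P0 + X (x) P1.
CNOT_rev = np.kron(I2, P0) + np.kron(X, P1)
assert np.allclose(CNOT_rev, [[1, 0, 0, 0],
                              [0, 0, 0, 1],
                              [0, 0, 1, 0],
                              [0, 1, 0, 0]])

# Figure 3: H on qubit 1, then C-NOT, starting from |00>.
ket00 = np.array([1.0, 0.0, 0.0, 0.0])
final = CNOT @ (np.kron(H, I2) @ ket00)
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
assert np.allclose(final, bell)
```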
Exercise 11.1 Verify that every permutation matrix is unitary.
The C-NOT gate is one example of a controlled gate. More generally, if U is a unitary gate on
k qubits, we can define the (k + 1)-qubit controlled U gate to be
C-U = P0 ⊗ I + P1 ⊗ U = [ I 0 ]
                        [ 0 U ] ,
where in this case the control qubit is the first qubit. The matrix would be different if the control
were not the first qubit, but the rule is the same in any case: If the control qubit is 0, then nothing
happens with the other (target) qubits. If the control is 1, then U is applied to the target qubits.
The control qubit is unchanged regardless. Here’s how we draw it in the case where U acts on a
single qubit:
The gate

S = [ 1 0 ]
    [ 0 i ]

is known as the phase gate. Note that S ∝ Rz (π/2) and that S² = Z. S rotates the Bloch sphere
counterclockwise about the +z-axis 90 degrees.
The gate

T = [ 1     0      ] = [ 1    0       ]
    [ 0  (1+i)/√2  ]   [ 0  e^{iπ/4}  ] .

For some obscure reason, this gate is known as the π/8 gate, maybe because

T ∝ Rz (π/4) = [ e^{−iπ/8}     0      ]
               [    0       e^{iπ/8}  ] .
We have T² = S, and T rotates the Bloch sphere counterclockwise 45° about the +z-axis. Notice
that T is the only one-qubit gate we’ve seen so far that does not map all axes to axes (i.e., x-, y-, and
z-axes) in the Bloch sphere. I’d call the three gates Z, S, and T conditional phase-shift gates: they leave
the Boolean value of the qubit unchanged while introducing various phase factors conditioned on
the qubit having Boolean value 1.
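These phase-gate relations check out numerically (my own sketch; the relation T ∝ Rz(π/4) is verified up to the global phase factor e^{iπ/8}):

```python
import numpy as np

Z = np.diag([1, -1]).astype(complex)
S = np.diag([1, 1j])                        # phase gate
T = np.diag([1, np.exp(1j * np.pi / 4)])    # "pi/8" gate

assert np.allclose(S @ S, Z)                # S^2 = Z
assert np.allclose(T @ T, S)                # T^2 = S
assert np.allclose((1 + 1j) / np.sqrt(2), np.exp(1j * np.pi / 4))

# T = e^{i pi/8} Rz(pi/4): a global phase times a z-rotation.
Rz = np.diag([np.exp(-1j * np.pi / 8), np.exp(1j * np.pi / 8)])
assert np.allclose(T, np.exp(1j * np.pi / 8) * Rz)
```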
Here’s another two-qubit classical gate, the SWAP gate:

SWAP = [ 1 0 0 0 ]
       [ 0 0 1 0 ]
       [ 0 1 0 0 ]
       [ 0 0 0 1 ] .

The first depiction is mine and other people’s; the second is the one the textbook uses. The SWAP
gate just exchanges the Boolean values of the two qubits it acts on, fixing |00⟩ and |11⟩ but mapping
|01⟩ to |10⟩ and vice versa.
[Hint: Rather than multiplying matrices, which can be time-consuming, just compare what the
two circuits do to the four possible basis states.]
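The hint's strategy of comparing circuits on the four basis states is easy to automate. The sketch below (mine, assuming the exercise concerns the standard identity that a SWAP gate equals three alternating C-NOT gates) does exactly that:

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])

CNOT = np.kron(P0, I2) + np.kron(P1, X)        # control: qubit 1
CNOT_rev = np.kron(I2, P0) + np.kron(X, P1)    # control: qubit 2
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=float)

# Circuit order CNOT, CNOT_rev, CNOT corresponds to this operator product.
three_cnots = CNOT @ CNOT_rev @ CNOT

# Compare the two circuits on each computational basis state |ab>.
for idx in range(4):
    basis = np.zeros(4)
    basis[idx] = 1.0
    assert np.allclose(three_cnots @ basis, SWAP @ basis)
```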
Exercise 11.4 This is a nonclassical exercise in several parts. It will help you to simplify circuits
by inspection, based on some circuit identities. It mirrors Exercises 4.13 and 4.17–4.20 on pages
177–180 of the text. An item may use previous items.
1. Verify directly that HXH = Z and that HZH = X (oh yes, and that HYH = −Y).
2. Verify that the two-qubit C-Z gate is symmetric with respect to its two qubits: a C-Z with the
first qubit as control and the second as target equals a C-Z with the roles reversed. What is
the matrix of this gate? The same is true for the C-S and C-T gates.

3. Verify that a controlled-UAU∗ gate is the same as a C-A gate conjugated by U on the target
qubit:

C-(UAU∗ ) = (I ⊗ U)(C-A)(I ⊗ U∗ ).

(Remember that in the expression UAU∗ , operators are applied from right to left, but in the
circuit, gates are applied from left to right.) [Hint: Consider separately the case when the
control qubit is |0⟩ and when it is |1⟩. To show equality of two linear operators generally, you
only need to show that they both act the same on the vectors of some basis.]
4. Construct a C-Z gate using a single C-NOT gate and two H gates. Similarly, construct a
C-NOT gate using a single C-Z gate and two H gates.
5. Verify that conjugating a C-NOT by H gates on both qubits (an H on each qubit before and
an H on each qubit after) yields the C-NOT with control and target reversed.
Note that gates acting on separate qubits commute, and so it doesn’t matter which of the
gates is applied first, and the order can be freely switched, provided that there are no gates
in between that connect the qubits together. You can think of the gates as being applied
simultaneously if you like.
Finally, we introduce a three-qubit classical gate known as the Toffoli gate, which is really a
controlled controlled NOT gate:
a a
b b
c c ⊕ (a ∧ b)
There are two control qubits and one target qubit. The control qubits are unchanged, and the target
is flipped if and only if both of the controls are 1.
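The Toffoli gate is just a permutation of the eight classical basis states, so its defining behavior can be checked exhaustively. A sketch of mine:

```python
import numpy as np
from itertools import product

def toffoli(a, b, c):
    """Controlled controlled NOT on classical bits: target c flips iff a = b = 1."""
    return a, b, c ^ (a & b)

for a, b, c in product([0, 1], repeat=3):
    out = toffoli(a, b, c)
    assert out[:2] == (a, b)                # controls unchanged
    assert out[2] == (c + a * b) % 2        # c XOR (a AND b)
    assert toffoli(*out) == (a, b, c)       # Toffoli is its own inverse

# As an 8 x 8 matrix it is a permutation matrix, hence unitary (Exercise 11.1).
M = np.zeros((8, 8))
for a, b, c in product([0, 1], repeat=3):
    col = (a << 2) | (b << 1) | c
    a2, b2, c2 = toffoli(a, b, c)
    row = (a2 << 2) | (b2 << 1) | c2
    M[row, col] = 1.0
assert np.allclose(M @ M.T, np.eye(8))      # M* M = I, so M is unitary
```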
Quantum Circuits Versus Boolean Circuits. Are quantum circuits with unitary gates as powerful
as classical Boolean circuits? You may have already noticed some similarities and differences
between the two circuit models:
• Both types of circuits carry bit values on wires which are acted on by gates.
• Quantum gates can create superpositions from basis states, but Boolean gates are classical,
mapping Boolean input values to definite Boolean output values.
• A Boolean gate may take some number of inputs (usually one or two), and has one output,
which can be freely copied into any number of wires, and thus the number of wires from
layer to layer may change. In quantum circuits, quantum gates are operators mapping the
state space into itself, and so each gate has the same number of outputs as inputs. Thus the
number of qubits never changes, and each qubit retains its identity throughout the circuit.
• Boolean gates may lose information from inputs to output, i.e., the input values are not
uniquely recoverable from the output value (e.g., an AND gate or an OR gate). Any quantum
unitary gate U can always be undone (at least theoretically) by applying U∗ immediately
before or afterwards. Thus quantum unitary gates are reversible, i.e., the input state is always
uniquely recoverable from the output state.
A quantum circuit can use classical gates, provided that they are reversible. Does this pose a
significant restriction on the power of quantum circuits to simulate classical computation? Not
really. Every classical Boolean circuit can be simulated reversibly. More precisely, we have the
following result:
Theorem 11.5 For every Boolean function f : {0, 1}n → {0, 1}m with n inputs and m outputs, there
is a reversible circuit C (equivalently, a quantum circuit using only classical gates) such that, for all
x = (x1 , . . . , xn ) ∈ {0, 1}^n and all y = (y1 , . . . , ym ) ∈ {0, 1}^m , the circuit C maps the inputs
x1 , . . . , xn , y1 , . . . , ym , 0, . . . , 0 to the outputs x1 , . . . , xn , y1 ⊕ z1 , . . . , ym ⊕ zm , 0, . . . , 0,
where (z1 , . . . , zm ) = f(x). Furthermore, C uses only X and Toffoli gates, and if Cf is a (classical) Boolean
circuit computing f using binary AND, OR, and unary NOT gates, then a description for C can be computed
from a description of Cf in polynomial time.
The circuit C acts on three quantum registers: the input qubits, whose initial values are
x1 , . . . , xn ; the output qubits (or target qubits), whose initial values are y1 , . . . , ym , and a set of
“work” qubits, called an ancilla, whose initial and final value is always 00 · · · 0. When all the
ancilla values are restored to 0 at the end of the circuit, we call this a clean circuit. The ancilla
is used for temporary storage of intermediate results. If the y1 , . . . , ym are all 0 initially, then
f(x) will appear as the final configuration of the output register. In quantum terms, if the initial
state is the basis state |xi ⊗ |yi ⊗ |0 · · · 0i = |x, y, 0 · · · 0i, then the final state is the basis state
|xi ⊗ |y ⊕ f(x)i ⊗ |0 · · · 0i = |x, y ⊕ f(x), 0 · · · 0i, where the three labels in the |·i represent the
contents of the three quantum registers. We often suppress the ancilla register and say that C takes
|x, yi to |x, y ⊕ f(x)i.
Note that C is clearly reversible. In fact, C is its own inverse. If we feed the output values on
the right as input values on the left, then C computes the original inputs as outputs.
We’ll only sketch a proof of Theorem 11.5. If Cf is a Boolean circuit computing f, we build C
by replacing each gate of Cf with one or more Toffoli gates. We replace NOT gates with Pauli X
gates and AND gates with
[Diagram: a Toffoli gate whose controls carry a and b and whose target is a fresh ancilla qubit 0, outputting a ∧ b.]
Here we use a fresh ancilla qubit for the target wire. If we need to copy the Boolean value
of a qubit, we can use
of a qubit, we can use
[Diagram: a Toffoli gate with first control a and target a fresh ancilla qubit 0, whose output is a copy of a.]
Here, we use a fresh ancilla qubit for the second control wire, and flip it from 0 to 1 with an X gate.
To replace an OR gate, we can first express it with AND and NOT gates according to De Morgan’s
laws, then replace the AND and NOT gates as above.
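These gate replacements are easy to check on classical bit tuples. Here is a minimal Python sketch of my own (the function names are hypothetical, not from the notes): Toffoli maps (a, b, c) to (a, b, c ⊕ (a ∧ b)), so with a fresh ancilla c = 0 it computes AND reversibly, and OR follows by De Morgan's laws.

```python
def toffoli(a, b, c):
    """Toffoli (CCNOT): flip the target c iff both controls are 1."""
    return a, b, c ^ (a & b)

def x_gate(a):
    """Pauli X acts as classical NOT on basis states."""
    return a ^ 1

def reversible_or(a, b, c):
    """OR via De Morgan: a OR b = NOT(NOT a AND NOT b), ancilla c = 0."""
    a, b = x_gate(a), x_gate(b)
    a, b, c = toffoli(a, b, c)
    c = x_gate(c)
    a, b = x_gate(a), x_gate(b)  # restore the inputs
    return a, b, c

for a in (0, 1):
    for b in (0, 1):
        assert toffoli(a, b, 0) == (a, b, a & b)
        assert reversible_or(a, b, 0) == (a, b, a | b)
```

Note that the inputs emerge unchanged in every case, as reversibility demands.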
Notice that the following one-gate circuit cleanly implements the C-NOT gate (the ancilla stays
0):
[Diagram: a Toffoli gate whose second control is an ancilla wire carrying 0 −X− 1 −•− 1 −X− 0.]
[Diagram: the circuit D takes inputs x1 , . . . , xn (passed through unchanged) and ancilla qubits initialized to 0, and outputs garbage values g1 , . . . , gk together with the intended outputs z1 , . . . , zm .]
Figure 4: A full implementation of the circuit C. Inputs and ancilla values are restored by undoing
the computation after copying the outputs to fresh qubits. The locations of the output register and
the ancilla are swapped for ease of display. The circuit implementing D∗ , the inverse of D is an
exact mirror image of the circuit for D. The values on the qubits intermediate between the D and
D∗ subcircuits, from top down, are g1 , . . . , gk , z1 , . . . , zm . A C-NOT gate connects each zi with the
qubit carrying yi . Some additional ancillæ (not shown) are used to implement the C-NOT gates
via Toffoli gates.
The intended outputs z1 , . . . , zm are somewhere on the right-hand side, and we show them below
the other qubits, which contain unused garbage values g1 , . . . , gk . This circuit, which implements
some unitary operator D, is reversible but may not be clean. We have to clean it up. First, we copy
the intended outputs onto fresh wires using C-NOT gates, then we undo the D computation by
applying the exact same gates as in D but in reverse order, taking note that both the Toffoli and X
gates are their own inverses. The final circuit is shown in Figure 4.
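The compute/copy/uncompute pattern of Figure 4 can be seen in miniature on classical bits. This is a sketch of my own (the name `clean_and` is hypothetical): D is a single Toffoli writing a ∧ b into the ancilla, the C-NOT copies it to the output qubit, and the mirror-image Toffoli D∗ restores the ancilla.

```python
def clean_and(a, b, y, anc=0):
    """Compute a AND b into output qubit y, leaving the ancilla clean."""
    anc ^= a & b   # D: a Toffoli writes the result into the ancilla
    y ^= anc       # copy the intended output with a C-NOT
    anc ^= a & b   # D*: the mirror-image Toffoli restores the ancilla to 0
    return a, b, y, anc

for a in (0, 1):
    for b in (0, 1):
        for y in (0, 1):
            assert clean_and(a, b, y) == (a, b, y ^ (a & b), 0)
```

The final assertion checks both that the output qubit receives y ⊕ (a ∧ b) and that the ancilla always returns to 0.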
Exercise 11.6 (Challenging because it’s long) The circuit below outputs 1 if and only if at least two
of x1 , x2 , x3 are 1. The three gates in the left column are AND gates; the other two are OR gates.
[Diagram: inputs x1 , x2 , x3 feed three AND gates (one for each pair of inputs), whose outputs feed two OR gates, producing maj(x1 , x2 , x3 ).]
Convert this circuit into a reversible circuit as in Theorem 11.5, above. Can you make any im-
provements to the construction?
Why Clean? We’d like to occasionally include one circuit as a subcircuit of another circuit. When
we do this, we want to ignore any additional ancilla qubits used by the subcircuit, considering them
“local” to the subcircuit, as we did in Figure 4 with the C-NOT gates. If we don’t restore the ancilla
qubits to their original values, then we can’t ignore them as we’d like. Some of the computation
will bleed into the unrestored ancilla qubits. This will be especially true with nonclassical quantum
circuits.
Let C be a circuit with unitary gates that acts on n input and output qubits, using m ancilla
qubits. Let H be the 2n -dimensional Hilbert space of the input/output qubits, and let A be the
2m -dimensional space of the ancilla. Then C is a unitary operator in L(H ⊗ A). If C is clean, then
it restores the ancilla to |0 · · · 0i, provided the ancilla started that way. Therefore, for every state
|ψin i ∈ H there is a unique state |ψout i ∈ H such that C(|ψin i ⊗ |0 · · · 0i) = |ψout i ⊗ |0 · · · 0i. Let
C′ : H → H be the mapping that takes any |ψin i to the corresponding |ψout i. C′ is clearly a linear
operator in L(H), and further, for any states |ψ1 i and |ψ2 i in H, we have
hψ1 |ψ2 i = (hψ1 | ⊗ h0 · · · 0|) C∗ C (|ψ2 i ⊗ |0 · · · 0i) = hψ1 |C′∗ C′ |ψ2 i.
Thus C′ preserves the inner product on H and so must be unitary. This justifies our suppressing
the ancilla when we use C as a new unitary “gate” in another circuit. We are really using C′ , which
C implements with its “private” ancilla. We can’t do this for a general unitary C ∈ L(H ⊗ A).
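We can check this numerically in a small case. The sketch below (my own; it uses the earlier C-NOT-via-Toffoli construction as the clean three-qubit circuit) builds C and verifies that its restriction to the ancilla-|0i subspace is the C-NOT gate and is unitary:

```python
import numpy as np

X = np.array([[0., 1.], [1., 0.]])
I = np.eye(2)

# Toffoli with controls on qubits 1 and 3 and target qubit 2.
# Basis index is 4*q1 + 2*q2 + q3, so it swaps |101> and |111>.
T = np.eye(8)
T[[5, 7]] = T[[7, 5]]

anc_X = np.kron(np.kron(I, I), X)   # X on the ancilla (third qubit)
C = anc_X @ T @ anc_X               # clean: the ancilla returns to |0>

# Restrict C to inputs/outputs with ancilla |0> (even basis indices):
idx = [0, 2, 4, 6]
Cp = C[np.ix_(idx, idx)]

CNOT = np.array([[1., 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])
assert np.allclose(Cp, CNOT)              # C' is the C-NOT gate
assert np.allclose(Cp @ Cp.T, np.eye(4))  # and C' is unitary
```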
Measurement gates. So far, we’ve only seen unitary gates, reflecting unitary evolution of the
qubit or qubits. To get any useful, classical information from a circuit, we must be able to make
measurements. At the very least, it is only reasonable that we should be able to measure the
(Boolean) value of a qubit, that is, we should be able to make a projective measurement {P0 , P1 }
of any qubit with respect to the computational basis. We represent such a measurement by the
one-qubit gate
(For those of you failing to appreciate the artistry of my iconography, the gate depicts an eye in
profile. In the book, this gate is depicted as the display of a gauge with a needle.) The incoming
qubit is measured projectively in the computational basis, and the classical result (a single bit) is
carried on the double wire to the right. If there are other qubits present in the system, then the
projective measurement is really {P0 ⊗ I, P1 ⊗ I}, where I is the identity operator applying to the
qubits not being measured (recall Exercise 10.4).
There are two uses for a qubit measurement. The first, obvious use is to read the answer from
the final state of a computation. The second is to control future operations in the circuit by using
the result of an intermediate measurement. For example, the result of a measurement may be used
to control another gate:
The U gate is applied to the second qubit if and only if the result of measuring the first qubit is 1.
Unlike a qubit, a classical bit can be duplicated freely and used to control many gates later in the
circuit.
Based on the discussion after Exercise 10.4, we may measure several different qubits at once,
since the actual chronological order of the measurements does not matter. Here’s a completely
typical example: we decide to measure qubits 2, 3, and 5 of an n-qubit system (where n ≥ 5,
obviously). The state |ψi of an n-qubit system can always be expressed as a linear combination of
basis states:
|ψi = Σx∈{0,1}n αx |xi. (47)
If we measure qubits 2,3, and 5 when the system is in state |ψi, what is the probability that we
will see, say, 101, i.e., 1 for qubit 2, 0 for qubit 3, and 1 for qubit 5? The corresponding projector is
P = I ⊗ P1 ⊗ P0 ⊗ I ⊗ P1 ⊗ I ⊗ I, where I is the single-qubit identity operator. The probability is then
Pr[101] = hψ|P|ψi = Σx : x2 x3 x5 =101 |αx |2 ,
where we are letting xj denote the jth bit of x. That is, we only retain those terms in the sum in (47)
in which the corresponding bits of x match the outcome. Upon seeing 101, the post-measurement
state will be
|ψpost i = P|ψi/√Pr[101] = (1/√Pr[101]) Σx : x2 x3 x5 =101 αx |xi.
We will often measure several qubits at once, so this example will come in handy.
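The example above can be simulated directly. The following sketch (my own, assuming n = 5 for compactness, so qubits 2, 3, and 5 are measured out of five) computes Pr[101] and the post-measurement state for a random 5-qubit state:

```python
import numpy as np

n = 5
rng = np.random.default_rng(0)
psi = rng.normal(size=2 ** n) + 1j * rng.normal(size=2 ** n)
psi /= np.linalg.norm(psi)          # a random normalized 5-qubit state

def bit(x, j):
    """The jth bit of x, counting qubits 1..n from the left."""
    return (x >> (n - j)) & 1

# Keep exactly the basis states with qubit 2 = 1, qubit 3 = 0, qubit 5 = 1:
idx = [x for x in range(2 ** n) if (bit(x, 2), bit(x, 3), bit(x, 5)) == (1, 0, 1)]
pr = sum(abs(psi[x]) ** 2 for x in idx)

post = np.zeros_like(psi)
post[idx] = psi[idx]                # apply the projector P
post /= np.sqrt(pr)                 # renormalize by sqrt(Pr[101])

assert 0 < pr < 1
assert np.isclose(np.linalg.norm(post), 1.0)
```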
Bell States and Quantum Teleportation. Recall the circuit of Figure 3. Let B be the two-qubit
unitary operator realized by this circuit. The four states obtained by applying B to the four
computational basis states are known as the Bell states and form the Bell basis:
|Φ+ i := B|00i = (|00i + |11i)/√2, (49)
|Ψ+ i := B|01i = (|01i + |10i)/√2, (50)
|Φ− i := B|10i = (|00i − |11i)/√2, (51)
|Ψ− i := B|11i = (|01i − |10i)/√2. (52)
These states are also called EPR states or EPR pairs. In a sense we will quantify later, these states
represent maximally entangled pairs of qubits. EPR is an acronym for Einstein, Podolsky, and
Rosen, who coauthored a paper describing apparent paradoxes in the rules of quantum mechanics
involving pairs of qubits in states such as these. Suppose a pair of electrons is prepared whose
spins are in one of the Bell states, say |Φ+ i. (There are actual physical processes that can do this.)
The electrons can then (theoretically) be separated by a great distance—the first taken by Alice to
a lab at UC Berkeley in California and the second taken by Bob to a lab at MIT in Massachusetts. If
Alice measures her spin first, she’ll see 0 or 1 with equal probability. Same with Bob if he measures
his spin first. But if Alice measures her spin first and sees, say, 0, then according to the standard
Copenhagen interpretation of quantum mechanics (which we are using), the state of the two spins
collapses to |00i, so if Bob measures his spin afterwards, he will see 0 with certainty. So Alice’s
measurement seems to affect Bob’s somehow. Einstein called this phenomenon “spooky action at
a distance.” We’ll talk about this more later, time permitting.
Philosophical problems aside, entangled pairs of qubits can be used in interesting and subtle
ways. One of the earliest discovered uses of EPR pairs is to teleport an unknown quantum state
across a distance using only classical communication, in a process called quantum teleportation. Sup-
pose Alice and Bob share two qubits in the state |Φ+ i as above, which may have been distributed
to them long ago. Suppose also that Alice has another qubit in some arbitrary, unknown state
|ψi = α|0i + β|1i. She wants Bob to have this state. She could mail her electron to Bob, but this won’t work because
the state |ψi of the electron is very delicate and will be destroyed if the package is bumped, screened
with X-rays, etc. Instead, she can transfer this state to Bob with only a phone call. No quantum
states need to be physically transported between Alice and Bob. Here’s how it works: The state of
the three qubits initially is
|ψi Φ+ = (α|0i + β|1i)(|00i + |11i)/√2 = (α|000i + α|011i + β|100i + β|111i)/√2. (53)
Alice possesses the first two qubits; Bob possesses the third. Alice applies the inverse B∗ of the
circuit of Figure 3:
[Diagram: B∗ is the mirror image of the circuit of Figure 3, that is, a C-NOT followed by an H on the first qubit.]
Figure 5: Quantum teleportation of a single qubit. Alice possesses the first qubit in some arbitrary,
unknown state |ψi. The second and third qubits are an EPR pair prepared in the state |Φ+ i
sometime in the past, with the second qubit given to Alice and the third to Bob. Alice applies B∗
to her two qubits, then measures both qubits, then communicates the results b1 , b2 ∈ {0, 1} of the
measurements to Bob. Bob uses this information to decide whether to apply Pauli X and Z gates
to his qubit.
to her two qubits. She then measures each qubit in the computational basis, getting Boolean values
b1 and b2 for the first and second qubits, respectively. She then calls Bob on the phone and tells
him the values she observed, i.e., b1 and b2 . Bob then does the following with his qubit (the third
qubit): (i) if b2 = 1, then Bob applies an X gate, otherwise he does nothing; then (ii) if b1 = 1, then
he applies a Z gate, otherwise he does nothing. At this point, Bob’s qubit will be in state |ψi. We
can illustrate the process by the circuit in Figure 5. Let’s check that Bob actually does wind up
with |ψi. It will make our work easier to first express the initial state of (53) using the Bell basis.
It’s easy to check that
|00i = (|Φ+ i + |Φ− i)/√2,
|01i = (|Ψ+ i + |Ψ− i)/√2,
|10i = (|Ψ+ i − |Ψ− i)/√2,
|11i = (|Φ+ i − |Φ− i)/√2,
so the initial state (53) can be rewritten as
(1/2) [α(|Φ+ i + |Φ− i)|0i + α(|Ψ+ i + |Ψ− i)|1i + β(|Ψ+ i − |Ψ− i)|0i + β(|Φ+ i − |Φ− i)|1i]
= (1/2) [|Φ+ i(α|0i + β|1i) + |Ψ+ i(α|1i + β|0i) + |Φ− i(α|0i − β|1i) + |Ψ− i(α|1i − β|0i)].
Going back to Equations (49–52) and applying B∗ to both sides, we see that B∗ maps |Φ+ i to |00i
and so on. So after Alice applies B∗ to her two qubits, the state becomes
(1/2) [|00i (α|0i + β|1i) + |01i (α|1i + β|0i) + |10i (α|0i − β|1i) + |11i (α|1i − β|0i)] . (54)
Now Alice measures her two qubits. She’ll get one of four possible values: 00, 01, 10, 11, all with
probability 1/4. For b1 , b2 ∈ {0, 1}, let |ψb1 b2 i be the state of the three qubits after the measurement,
Figure 6: Dense coding. The EPR pair is initially distributed between Alice and Bob, with Alice
getting the first qubit. The stuff above the dotted line belongs to Alice, and the rest belongs to Bob.
The qubit crosses the dotted line when Alice sends it to Bob.
assuming the result is b1 , b2 . By applying the corresponding projectors to the state in (54) and
normalizing, we get
|ψ00 i = |00i (α|0i + β|1i) , |ψ01 i = |01i (α|1i + β|0i) ,
|ψ10 i = |10i (α|0i − β|1i) , |ψ11 i = |11i (α|1i − β|0i) .
We see that Bob’s qubit is now in one of four possible states: |ψi, X|ψi, Z|ψi, or XZ|ψi, depending
on whether the values measured by Alice are 00, 01, 10, or 11, respectively. Now Bob simply uses
the information about b1 and b2 to undo the Pauli operators on his qubit, yielding |ψi in every
case.
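The whole protocol is easy to simulate with state vectors. Below is a sketch of my own, assuming the qubit ordering q1 q2 q3 and B = C-NOT · (H ⊗ I), which matches Equations (49–52):

```python
import numpy as np

H = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)
X = np.array([[0., 1.], [1., 0.]])
Z = np.array([[1., 0.], [0., -1.]])
I = np.eye(2)
CNOT = np.array([[1., 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])

B = CNOT @ np.kron(H, I)            # B|00> = |Phi+>, etc.
rng = np.random.default_rng(1)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)          # Alice's arbitrary unknown state

phi_plus = np.array([1., 0, 0, 1]) / np.sqrt(2)
state = np.kron(psi, phi_plus)          # |psi> (x) |Phi+>
state = np.kron(B.conj().T, I) @ state  # Alice applies B* to qubits 1, 2

for b1 in (0, 1):
    for b2 in (0, 1):
        # Condition on Alice measuring b1, b2 (each has probability 1/4):
        bob = state.reshape(4, 2)[2 * b1 + b2]
        bob = bob / np.linalg.norm(bob)
        if b2:
            bob = X @ bob               # Bob's correction for b2 ...
        if b1:
            bob = Z @ bob               # ... then for b1
        assert np.allclose(bob, psi)    # Bob winds up with |psi> exactly
```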
This scenario can be used to teleport an n-qubit state from Alice to Bob by teleporting each
qubit separately, just as above.
Note that Alice must tell Bob the values b1 and b2 so that Bob can recover |ψi reliably. This
means that quantum states cannot be teleported faster than the speed of light. Also note that after
the protocol is finished, Alice no longer possesses |ψi. She can’t, because that would violate the
No-Cloning Theorem. Finally, note that the EPR state that Alice and Bob shared before the protocol
no longer exists. It is used up, and can’t be used to teleport additional states. Thus, teleporting an
n-qubit state needs n separate EPR pairs.
Dense Coding. In quantum teleportation, with the help of an EPR pair, Alice can substitute
transmitting a qubit to Bob with transmitting two classical bits. There is a converse to this: with
the help of an EPR pair, Alice can substitute transmitting two classical bits to Bob with transmitting
a single qubit. This inverse trade-off is known as dense coding.
Figure 6 illustrates how dense coding works. Alice has two classical bits b1 and b2 that she
wants to communicate to Bob. She also shares an EPR pair with Bob in state |Φ+ i as before. If
b2 = 1, Alice applies X to her half of the EPR pair, otherwise she does nothing. Then, if b1 = 1, she
applies Z to her qubit, otherwise she does nothing. She then sends her qubit to Bob. Bob now has
both qubits. He applies B∗ to them then measures each of his qubits, seeing b1 and b2 as outcomes
with certainty.
Here are the four possible states of the two qubits when Alice sends her qubit to Bob, corre-
sponding to the four possible values of b1 b2 (here, I is the one-qubit identity operator):
|ψ00 i = (I ⊗ I)|Φ+ i = |Φ+ i,
|ψ01 i = (X ⊗ I)|Φ+ i = (|10i + |01i)/√2 = |Ψ+ i,
|ψ10 i = (Z ⊗ I)|Φ+ i = (|00i − |11i)/√2 = |Φ− i,
|ψ11 i = (ZX ⊗ I)|Φ+ i = (|01i − |10i)/√2 = |Ψ− i.
So Alice is just preparing one of the four Bell states. Thus when Bob applies B∗ to |ψb1 b2 i, he gets
|b1 b2 i, yielding b1 b2 upon measurement.
Note that, as before, the EPR pair is consumed in the process.
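Dense coding is equally easy to check numerically. Here is a sketch under the same conventions as in the teleportation example (B = C-NOT · (H ⊗ I), so B is real and B∗ is its transpose):

```python
import numpy as np

H = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)
X = np.array([[0., 1.], [1., 0.]])
Z = np.array([[1., 0.], [0., -1.]])
I = np.eye(2)
CNOT = np.array([[1., 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])
Bstar = (CNOT @ np.kron(H, I)).T    # B is real, so B* is its transpose

phi_plus = np.array([1., 0, 0, 1]) / np.sqrt(2)

for b1 in (0, 1):
    for b2 in (0, 1):
        state = phi_plus.copy()
        if b2:
            state = np.kron(X, I) @ state  # Alice encodes b2 ...
        if b1:
            state = np.kron(Z, I) @ state  # ... then b1, on her qubit only
        probs = np.abs(Bstar @ state) ** 2  # Bob applies B*, then measures
        assert np.isclose(probs[2 * b1 + b2], 1.0)  # he sees b1 b2 with certainty
```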
Exercise 12.2 Recall the two-qubit swap operator SWAP satisfying SWAP|ai|bi = |bi|ai for all
a, b ∈ {0, 1}. Show that the four Bell states are eigenvectors of SWAP. What are the corresponding
eigenvalues? For this and other reasons, the states |Φ+ i, |Φ− i, and |Ψ+ i are often called symmetric
states, triplet states, or spin-1 states, while the state |Ψ− i is often called the antisymmetric state, the
singlet state, or the spin-0 state.
13 Week 7: Basic quantum algorithms
Black-Box Problems. Many quantum algorithms solve what are called “black-box” problems.
Typically, we are given some Boolean function f : {0, 1}n → {0, 1}m and we want to answer some
question about the function as a whole, for example, “Is f constant?”, “Is f the zero function?”, “Is
f one-to-one?”, etc. We are allowed to feed an input x ∈ {0, 1}n to f and get back the output f(x).
The input x is called a query to f and f(x) is the query answer. Other than making queries to f, we
are not allowed to inspect f in any way, hence the black-box nature of the function. (A black-box
function f is sometimes called an oracle.) Generally, we would like to answer our question by
making as few queries to f as we can, since queries may be expensive.
In the context of quantum computing, the function f is most naturally given to us as a classical,
unitary gate Uf that acts on two quantum registers—the first with n qubits and the second with
m qubits—and behaves as follows for all x ∈ {0, 1}n and y ∈ {0, 1}m :
Uf |x, yi = |x, y ⊕ f(x)i.
This is reasonable, given the restriction that unitary quantum gates must be reversible. Uf is
called an f-gate. To solve a black-box problem involving f, we are allowed to build a quantum
circuit using f-gates—as well as the other usual unitary gates. Each occurrence of an f-gate in the
circuit counts as a query to f, so the number of queries is the number of f-gates in the circuit. The
difference between classical queries to f and quantum queries to f is that we can feed a superposition
of several classical inputs (basis states) into the f-gate, obtaining a corresponding superposition
of the results. We’ll see in a minute that we can use this idea, known as quantum parallelism, to get
more information out of f in fewer queries than any classical computation.
Deutsch’s Problem and the Deutsch-Jozsa Problem. The first indication that quantum compu-
tation may be strictly more powerful than classical computation came with a black-box problem
posed by David Deutsch: Given a one-bit Boolean function f : {0, 1} → {0, 1}, is f constant, that is,
is f(0) = f(1)? There are four possible functions {0, 1} → {0, 1}: the constant zero function, the con-
stant one function, the identity function, and the negation function. Deutsch’s task is to determine
whether f falls among the first two or the last two. Classically, it is clear that determining which is
the case requires two queries to f, since we need to know both f(0) and f(1). Quantally, however,
we can get by with only one query to f. Define
|+i := H|0i = (|0i + |1i)/√2, (55)
|−i := H|1i = (|0i − |1i)/√2, (56)
where H is the Hadamard gate. The states |+i and |−i correspond to the states |+xi and |−xi we
defined earlier when we were discussing the Bloch sphere. If we feed these states into Uf like so:
[Diagram: |+i fed into the first input of Uf and |−i into the second]
[Diagram: the first qubit |0i passes through H, then Uf , then H; the second qubit |0i passes through X, then H, then Uf .]
Figure 7: The full circuit for Deutsch’s problem. The second qubit is not used after it emerges from
the f-gate.
then the progression of states through the circuit from left to right is
|+i|−i = (|0i + |1i)(|0i − |1i)/2
= (|00i − |01i + |10i − |11i)/2
7→ (|0, f(0)i − |0, 1 ⊕ f(0)i + |1, f(1)i − |1, 1 ⊕ f(1)i)/2 (applying Uf )
=: |ψout i.
If f is constant, i.e., if f(0) = f(1) = y for some y ∈ {0, 1}, then
|ψout i = (|0, yi − |0, 1 ⊕ yi + |1, yi − |1, 1 ⊕ yi)/2
= (|0i + |1i)(|yi − |1 ⊕ yi)/2
= (−1)y (|0i + |1i)(|0i − |1i)/2
= (−1)y |+i|−i.
If f is not constant, i.e., if f(0) = y = 1 ⊕ f(1) for some y ∈ {0, 1}, then
|ψout i = (|0, yi − |0, 1 ⊕ yi + |1, 1 ⊕ yi − |1, yi)/2
= (|0i − |1i)(|yi − |1 ⊕ yi)/2
= (−1)y (|0i − |1i)(|0i − |1i)/2
= (−1)y |−i|−i.
Now suppose we apply the Hadamard gate H to the first qubit of |ψout i. We obtain
|φi := (H ⊗ I)|ψout i = ±|0i|−i if f is constant, and ±|1i|−i if f is not constant.
So now we measure the first qubit of |φi. We get 0 with certainty if f is constant, and we get 1
with certainty otherwise. We can prepare the initial state |+i|−i by applying two Hadamards and
a Pauli X gate. The full circuit is in Figure 7. We only use the f-gate once, but in superposition.
That is the key point.
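The one-query circuit of Figure 7 can be simulated in a few lines. This is a sketch of my own (the names `Uf` and `deutsch` are hypothetical); it checks all four one-bit functions:

```python
import numpy as np

H = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)
X = np.array([[0., 1.], [1., 0.]])
I = np.eye(2)

def Uf(f):
    """The 4x4 permutation matrix |x, y> -> |x, y XOR f(x)>."""
    U = np.zeros((4, 4))
    for x in (0, 1):
        for y in (0, 1):
            U[2 * x + (y ^ f(x)), 2 * x + y] = 1
    return U

def deutsch(f):
    state = np.zeros(4); state[0] = 1      # |00>
    state = np.kron(H, H @ X) @ state      # prepare |+>|->
    state = Uf(f) @ state                  # the single query
    state = np.kron(H, I) @ state          # H on the first qubit
    p1 = state[2] ** 2 + state[3] ** 2     # Pr[first qubit measures 1]
    return int(np.isclose(p1, 1.0))        # 1 iff f is not constant

assert deutsch(lambda x: 0) == 0
assert deutsch(lambda x: 1) == 0
assert deutsch(lambda x: x) == 1
assert deutsch(lambda x: 1 - x) == 1
```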
Deutsch and Jozsa generalized this idea to a function f : {0, 1}n → {0, 1} with n inputs and one
output. The corresponding (n + 1)-qubit Uf gate looks like
[Diagram: Uf maps the n-qubit register x and the single qubit y to x and y ⊕ f(x).]
We say that f is balanced if the number of inputs x such that f(x) = 0 is equal to the number of
inputs x such that f(x) = 1, namely, 2n−1 . The Deutsch-Jozsa problem is as follows: We are given f
as above as a black-box gate, and we know (we are promised) that f is either constant or balanced,
and we want to determine which is the case. Answering this question classically requires 2n−1 + 1
queries to f in the worst case, since it is possible that f is balanced but the first 2n−1 queries may
all yield the same answer. Quantally, we can do much better; one query to f suffices.
The set-up is similar to what we just did, but instead of using an (n + 1)-qubit f-gate directly,
it is easier to work with an n-qubit inversion f-gate If defined as follows for every x ∈ {0, 1}n :
If |xi := (−1)f(x) |xi.
That is, If leaves the values of the qubits alone but flips the sign iff f(x) = 1. We’ve defined If on
computational basis vectors. Since If is linear, this defines If on all vectors in the state space of n
qubits. If can be implemented cleanly (and easily) using Uf thus:
[Diagram: If on n qubits equals Uf with one extra ancilla qubit, prepared from |0i by an X then an H, and restored afterward by an H then an X.]
For any input state |xi where x ∈ {0, 1}n , the progression of states through the circuit from left to
right is
|x, 0i 7→ |x, 1i (applying X)
7→ |xi|−i (applying H)
7→ |xi(|f(x)i − |1 ⊕ f(x)i)/√2 (applying Uf )
= (−1)f(x) |xi(|0i − |1i)/√2
= (−1)f(x) |xi|−i
7→ (−1)f(x) |x, 1i (applying H)
7→ (−1)f(x) |x, 0i (applying X)
as advertised. Since only one f-gate is used to implement If , each occurrence of If in a circuit
amounts to one occurrence of Uf in the circuit.
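We can verify this construction numerically. The sketch below (my own, with an arbitrary test function f on n = 3 inputs) builds the operator X, H, Uf , H, X on the ancilla and checks that it acts as If on inputs |x, 0i:

```python
import numpy as np

H = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)
X = np.array([[0., 1.], [1., 0.]])

def Uf(f, n):
    """The permutation matrix |x, y> -> |x, y XOR f(x)> on n + 1 qubits."""
    U = np.zeros((2 ** (n + 1), 2 ** (n + 1)))
    for x in range(2 ** n):
        for y in (0, 1):
            U[2 * x + (y ^ f(x)), 2 * x + y] = 1
    return U

n = 3
f = lambda x: (x >> 1) & 1          # an arbitrary test function (my choice)
In = np.eye(2 ** n)

# Gates applied left to right on the ancilla: X, H, then Uf, then H, X.
circ = np.kron(In, X @ H) @ Uf(f, n) @ np.kron(In, H @ X)

for x in range(2 ** n):
    out = circ[:, 2 * x]            # the image of the basis state |x, 0>
    expect = np.zeros(2 ** (n + 1))
    expect[2 * x] = (-1) ** f(x)    # If |x>|0> = (-1)^f(x) |x>|0>
    assert np.allclose(out, expect)
```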
To determine whether f is constant or balanced, we use the following n-qubit circuit:
[Diagram: qubit lines |0i −H− If −H−, with vertical dots indicating that the pattern repeats for all n qubits, each measured at the end.]
The dots indicate that all n qubits start in state |0i, a Hadamard gate is applied to each qubit before
and after If , and all qubits are measured at the end. Before we view the progression of states, let’s
see what happens when we apply a column of n Hadamard gates all at once to n qubits in the state
|xi, for any x = x1 x2 · · · xn ∈ {0, 1}n . (We denote the n-fold Hadamard operator as H⊗n .) Noting
that, for all b ∈ {0, 1},
H|bi = (|0i + (−1)b |1i)/√2 = (1/√2) Σc∈{0,1} (−1)bc |ci,
we get
|xi 7→ H|x1 i ⊗ · · · ⊗ H|xn i (applying H⊗n )
= 2−n/2 (Σy1 ∈{0,1} (−1)x1 y1 |y1 i) ⊗ · · · ⊗ (Σyn ∈{0,1} (−1)xn yn |yn i)
= 2−n/2 Σy1 ∈{0,1} · · · Σyn ∈{0,1} (−1)x1 y1 +···+xn yn |y1 i ⊗ · · · ⊗ |yn i
= 2−n/2 Σy∈{0,1}n (−1)x·y |yi,
where x · y := x1 y1 + · · · + xn yn . Back in our circuit, the progression of states is
|00 · · · 0i 7→ 2−n/2 Σx∈{0,1}n |xi (applying H⊗n ; because (00 · · · 0) · x = 0) (57)
7→ 2−n/2 Σx∈{0,1}n (−1)f(x) |xi (applying If ) (58)
7→ 2−n Σx∈{0,1}n Σy∈{0,1}n (−1)f(x) (−1)x·y |yi (applying H⊗n ) (59)
= 2−n Σx,y∈{0,1}n (−1)f(x)+x·y |yi (60)
= 2−n Σy∈{0,1}n Σx∈{0,1}n (−1)f(x)+x·y |yi. (61)
Suppose first that f is constant, and we let |ψconst i denote this last state. Then (−1)f(x) = ±1 is
independent of x, and so we can bring it outside the sum:
|ψconst i = ±2−n Σy∈{0,1}n Σx∈{0,1}n (−1)x·y |yi
= ±2−n (Σx (−1)0 ) |0n i ± 2−n Σy≠0n Σx (−1)x·y |yi
= ±|0n i ± 2−n Σy≠0n Σx (−1)x·y |yi.
Since
1 = hψconst |ψconst i = 1 + 2−2n Σy≠0n |Σx (−1)x·y |2 ,
we must have Σx (−1)x·y = 0 for all y ≠ 0n ,14 and thus
|ψconst i = ±|0n i.
When we measure the qubits in state |ψconst i, we will see 0n with certainty.
Now suppose that f is balanced, and we let |ψbal i denote the state of (61). Again separating the
|0n i-term from the rest, we get
|ψbal i = 2−n Σx∈{0,1}n (−1)f(x) |0n i + 2−n Σy≠0n Σx∈{0,1}n (−1)f(x)+x·y |yi.
But f is balanced, and so Σx (−1)f(x) = 0, because each term contributes +1 for f(x) = 0 and −1
for f(x) = 1. Thus,
|ψbal i = 2−n Σy≠0n Σx∈{0,1}n (−1)f(x)+x·y |yi.
When we measure the qubits in state |ψbal i, we see 0n with probability zero. So we never see 0n ,
but instead we’ll see some random y ≠ 0n .
To summarize: when we measure the qubits, if we see 0n , then we know that f is constant; if
we see anything else, then we know that f is balanced.
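The one-query Deutsch-Jozsa circuit is also easy to simulate using the phase gate If directly. A sketch of my own (the function name `deutsch_jozsa` is hypothetical):

```python
import numpy as np

def deutsch_jozsa(f, n):
    """Return True iff the promised f: {0,1}^n -> {0,1} is constant,
    using the phase gate If: |x> -> (-1)^f(x) |x>."""
    H1 = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)
    Hn = H1
    for _ in range(n - 1):
        Hn = np.kron(Hn, H1)
    state = np.zeros(2 ** n); state[0] = 1         # |0...0>
    state = Hn @ state                             # H on every qubit
    state = state * np.array([(-1.) ** f(x) for x in range(2 ** n)])  # apply If
    state = Hn @ state                             # H on every qubit again
    return bool(np.isclose(state[0] ** 2, 1.0))    # Pr[seeing 0^n] is 0 or 1

n = 4
assert deutsch_jozsa(lambda x: 1, n)               # constant
assert not deutsch_jozsa(lambda x: x & 1, n)       # balanced (last bit of x)
```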
Exercise 13.1 (Challenging) Let f : {0, 1}n → {0, 1} be a Boolean function. We’ve implemented an
If gate using Uf and a few standard gates. Show how to implement Uf given a single If gate and
some standard gates. Thus Uf and If are computationally equivalent. Your circuit is allowed to
depend on the value of f(0n ), that is, you can have one circuit that works assuming f(0n ) = 0 and
another (slightly different) circuit that works assuming f(0n ) = 1. [Hint: Build a quantum circuit
with three registers: n input qubits; one output qubit; n ancilla qubits. Assume x ∈ {0, 1}n is the
classical input. Using one Hadamard and n Toffoli gates, convert the input state |x, y, 0n i into the
superposition
(|x, 0, 0n i + (−1)y |x, 1, xi)/√2.
Then feed the ancilla register into If , then undo what you did before applying If . What state do
you wind up with? What else do you need to do, if anything?]
14 Here’s another way to see that Σx (−1)x·y = 0 for all y ≠ 0n : If y ≠ 0n , then one of y’s bits is 1. For convenience,
let’s assume that the first bit of y is 1, and we let y′ be the rest of y. Then Σx (−1)x·y = Σx1 ∈{0,1} Σx′ ∈{0,1}n−1 (−1)x1 ·1+x′ ·y′ =
Σx1 (−1)x1 Σx′ (−1)x′ ·y′ = Σx′ (−1)x′ ·y′ − Σx′ (−1)x′ ·y′ = 0.
Exercise 13.2 Here are some more circuit equalities for you to verify. Remember that circuits
represent linear operators, and thus to show that two circuits are equal, it suffices to show that
they act the same on the vectors of some basis, e.g., the computational basis.
1. Check that the controlled-S gate equals a circuit built from C-NOT, T , and T ∗ gates (with a T
on the control qubit), and that the controlled-Rz (ϕ) gate equals the circuit that applies Rz (ϕ/2)
to the target, then a C-NOT, then Rz (−ϕ/2), then another C-NOT.
[Hint: If the control qubit on the right-hand side is 1, then XRz (−ϕ/2)XRz (ϕ/2) is applied
to the target qubit. Note that XRz (−ϕ/2)X = XeiϕZ/4 X = eiϕXZX/4 = e−iϕZ/4 = Rz (ϕ/2).
The second equation follows from Exercise 9.3(7).]
2. Check that the Toffoli gate equals a circuit built from C-NOT, T , and T ∗ gates together with
a controlled-S gate, with an H on the third qubit at each end.
Combining this with item 1 above gives a circuit implementing the Toffoli gate using only
C-NOT, H, T , and T ∗ gates (and we could do without T ∗ explicitly by using T 7 instead,
because T 8 = I). The Nielsen & Chuang textbook has a closely similar implementation of the
Toffoli gate on page 182, but it’s not optimal; it has one more gate than is necessary. [Hint: It
will help first to transform this equation into an equivalent one by applying H gates on the
third qubit to both sides of both circuits, i.e., unitarily conjugating both sides of the equation
by I ⊗ I ⊗ H. This has the effect of canceling out both the H gates on the right-hand circuit,
and the left-hand side becomes
the doubly controlled Z gate (a Z on the third qubit, controlled by the first two qubits),
which flips the overall sign of the state (i.e., gives an eiπ = −1 phase change) iff all three
qubits are 1. The advantage of doing this is that now nothing in the right-hand circuit creates
any superpositions; each gate maps a computational basis state to a computational basis
state, up to a phase factor. Now proceed by cases, considering the possible 0, 1-combinations
of the values of the three qubits, adding up the overall phase angles generated. You can
simplify the task further by noticing a few general facts:
• A 0 on the control qubit of a C-NOT gate eliminates the gate.
• Adjacent T and T ∗ gates on the same qubit cancel.
• Adjacent C-NOT gates with the same control and target qubits cancel.]
Exercise 13.3 (Challenging) This exercise is a puzzler that is best solved by finding the right series
of rotations of the Bloch sphere. Find a single-qubit unitary U such that
[Diagram: the controlled-H gate equals the circuit that applies U∗ to the target, then a C-NOT, then U.]
Furthermore, you are restricted to expressing U as the product of a sequence of operators, all of
which are either H or T . [Hint: You are trying to find a U such that UXU∗ = H. X gives a π-rotation
of the Bloch sphere about the x-axis, and H gives a π-rotation about the line ` through the point in
the x, z-plane halfway between the +x- and +z-axes, with spherical coordinates (π/4, 0) (Cartesian
coordinates (1/√2, 0, 1/√2)). So U must necessarily give a rotation that moves the x-axis to `, so
that U∗ (applied first) moves ` to the x-axis, then X (applied second) rotates π around the x-axis,
then U (applied last) moves the x-axis back to `, the net effect of all three being a π-rotation about
`. One possibility for U is a (−π/4)-rotation about the y-axis, but you must implement this using
just H and T , the latter giving a π/4-rotation about the z-axis.]
Simon’s Problem. The Deutsch-Jozsa problem is hard to decide classically, requiring exponen-
tially many (in n) queries to f. But there is a sense in which this problem is easy classically: if
we pick inputs to f at random and query f on those inputs, we quickly learn the right answer
with high probability. If we ever see f output different values, then we know for certain that f is
balanced, since it is nonconstant. On the other hand, if f is balanced and we make 100 random queries
to f, then the chances that f gives the same answer to all our queries are exceedingly small—2−99 .
So we have an efficient randomized algorithm for finding the answer: Make m uniformly and
independently random queries to f, where m is, say, 100. If the answers are all the same, output
“constant”; otherwise, output “balanced.” We will never output “balanced” incorrectly. We might
output “constant” incorrectly, but only with probability 21−m , i.e., exponentially small in m. This
algorithm runs in time polynomial in n and m.
As with classical computation, quantum circuits can simulate classical randomized compu-
tation. We won’t pursue that line further here, though. Instead, we’ll now see a black-box
problem—Simon’s problem—that
• can be solved efficiently with high probability on a quantum computer, but
• cannot be solved efficiently by a classical computer, even by a randomized algorithm that is
allowed a probability of error slightly below 1/2.
In Simon’s problem, we are given a black-box Boolean function f : {0, 1}n → {0, 1}m , for some
n ≤ m. We are also given the promise that there is an s ∈ {0, 1}n such that for all distinct
x, y ∈ {0, 1}n ,
f(x) = f(y) ⇐⇒ x ⊕ y = s.
This condition determines s uniquely: either s = 0n and f is one-to-one, or s ≠ 0n in which case f
is two-to-one with f(x) = f(x ⊕ s) for all x, and s is the unique nonzero input such that f(s) = f(0).
Our task is to find s.
The function f is given to us via the gate Uf as before, such that Uf |x, yi = |x, y ⊕ f(x)i for
all x ∈ {0, 1}n and y ∈ {0, 1}m . Consider the following quantum algorithm with two quantum
registers—an n-qubit input register and an m-qubit output register.
1. Initialize the two registers to the basis state |0n , 0m i.
2. Apply H⊗n to the first register.
3. Apply Uf to the two registers.
4. Apply H⊗n to the first register again, giving the state |ψout i = 2−n Σx,y (−1)x·y |y, f(x)i.
5. We now measure the first register (all n qubits), obtaining some value y ∈ {0, 1}n .
Exercise 14.1 Draw the quantum circuit implementing the algorithm above.
What y do we get in the last step? Note that f(x) = f(x ⊕ s) for all x, and that as x ranges
through all of {0, 1}n , so does x ⊕ s. Thus we can rewrite |ψout i as a split sum and combine terms
in pairs:
|ψout i = (|ψout i + |ψout i)/2
= 2−n−1 (Σx,y (−1)x·y |y, f(x)i + Σx,y (−1)(x⊕s)·y |y, f(x ⊕ s)i)
= 2−n−1 (Σx,y (−1)x·y |y, f(x)i + Σx,y (−1)(x⊕s)·y |y, f(x)i)
= 2−n−1 Σx,y [(−1)x·y + (−1)x·y+s·y ] |y, f(x)i
= 2−n−1 Σx,y (−1)x·y [1 + (−1)s·y ] |y, f(x)i
= 2−n Σx,y : s·y is even (−1)x·y |y, f(x)i.
The basis states |y, f(x)i for which s · y is odd cancel out, and we are left with a superposition of
only states where s · y is even, with probability amplitudes differing only by a phase factor. So in
Step 5 we will see an arbitrary such y ∈ {0, 1}n , uniformly at random. If s = 0n , then s · y is even
for all y, so each y ∈ {0, 1}n will be seen with probability 2−n . If s ≠ 0n , then s · y is even for exactly
half of the y ∈ {0, 1}n , each of which will be seen with probability 21−n .
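One round of the algorithm can be simulated directly; the sketch below (my own, with n = m = 3 and an arbitrary two-to-one f satisfying the promise) checks that every measured y satisfies s · y even:

```python
import numpy as np

def simon_round(f, n, m, rng):
    """One execution of the circuit; returns the measured y."""
    H1 = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)
    Hn = H1
    for _ in range(n - 1):
        Hn = np.kron(Hn, H1)
    state = np.zeros((2 ** n, 2 ** m))  # amplitudes, indexed (input, output)
    state[0, 0] = 1.0                   # |0^n, 0^m>
    state = Hn @ state                  # H on each qubit of the first register
    after = np.zeros_like(state)
    for x in range(2 ** n):             # Uf: |x, y> -> |x, y XOR f(x)>
        for y in range(2 ** m):
            after[x, y ^ f(x)] += state[x, y]
    state = Hn @ after                  # H on the first register again
    probs = (state ** 2).sum(axis=1)    # distribution of the measured y
    return rng.choice(2 ** n, p=probs)

n = m = 3
s = 0b101
f = lambda x: min(x, x ^ s)             # a two-to-one f with f(x) = f(x XOR s)

rng = np.random.default_rng(0)
for _ in range(20):
    y = simon_round(f, n, m, rng)
    assert bin(s & y).count("1") % 2 == 0   # s . y is always even
```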
How does this help us find s? If s ≠ 0n and we get some y in Step 5, then we know that
s · y is even, which eliminates half the possibilities for s. Repeating the algorithm will give us
some y 0 independent of y such that s · y 0 is even. This added constraint will most likely cut our
search space in half again. After repeated executions of the algorithm, we will get a series of
random constraints like this. After a modest number of repetitions, the constraints taken together
will uniquely determine s with high probability. To show this, we need a brief linear algebraic
digression, which will also help us when we discuss binary codes later.
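To preview that digression: recovering s from the collected constraints is just Gaussian elimination over Z2 . A sketch of my own (the y’s and s are n-bit integers, so ⊕ is ^ and the dot product is a popcount parity):

```python
def recover_s(ys, n):
    """Given constraints s . y = 0 (mod 2), return the unique nonzero s,
    or None if the constraints don't yet have rank n - 1."""
    pivots = {}                        # leading-bit position -> reduced row
    for y in ys:
        while y:
            lead = y.bit_length() - 1
            if lead in pivots:
                y ^= pivots[lead]      # eliminate the leading bit
            else:
                pivots[lead] = y
                break
    if len(pivots) != n - 1:
        return None
    free = next(j for j in range(n) if j not in pivots)
    s = 1 << free                      # set the one free coordinate to 1
    for lead in sorted(pivots):        # back-substitute, low bits first
        rest = pivots[lead] ^ (1 << lead)
        if bin(s & rest).count("1") % 2 == 1:
            s |= 1 << lead             # choose this bit of s to make s . y even
    return s

# With s = 0b101 (n = 3), the y's with s . y even are {000, 010, 101, 111}:
assert recover_s([0b010, 0b111, 0b101], 3) == 0b101
```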
Linear Algebra over Z2 . Until now, we’ve been dealing with vectors and operators with scalars
in C (and occasionally R). These are not the only two possible scalar domains (known in algebra as
fields) over which to do linear algebra. Another is the two-element field Z2 := {0, 1}, with addition
and multiplication defined thus:
+ | 0 1        × | 0 1
0 | 0 1        0 | 0 0
1 | 1 0        1 | 0 1
Addition and multiplication are the same as in Z, except that 1+1 = 0. Addition is also the same as
the XOR operator ⊕. The additive identity is 0 and the multiplicative identity is 1. Since x + x = 0
in Z2 for any x, the negation −x (additive inverse) of x is x itself. Thus subtraction is the same as
addition. Finally, note that for all x1 , . . . , xn ∈ Z2 , x1 + · · · + xn = 0 (in Z2 ) if and only if x1 + · · · + xn
(in Z) is even.
Column vectors, row vectors, and matrices over Z2 are defined just as over C, except that all
the entries are in Z2 and all scalar arithmetic is done in Z2 . We call these objects bit vectors and bit
matrices. We can identify binary strings in {0, 1}^n with bit vectors in Z2^n.
Most of the basic concepts of linear algebra can be extended to Z2 (indeed, any field). Matrix
addition and multiplication, trace and determinant of square matrices, and square matrix inversion
are defined completely analogously to the case of C. Same with vector spaces, subspaces, and
linear operators. All the basic results of linear algebra carry over to Z2 . For example,
• For any n, tr is a linear operator from the space of n × n matrices to Z2 , and tr(AB) = tr(BA)
for any conformant bit matrices A and B such that AB is square.
• det(AB) = (det A)(det B) for any square A and B, and A is invertible iff det A ≠ 0.
• charA (λ) = det(A − λI) as before. Its roots are the eigenvalues of A.
• Linear combination, linear (in)dependence, span, and the concept of a basis are the same as
before. Every bit vector space has a basis, and any two bases of the same space have the
same cardinality (the dimension of the space).
• The adjoint A∗ is defined as the transpose conjugate as before, but in Z2 we define 0∗ = 0
and 1∗ = 1, and so the adjoint is the same as the transpose in this case.
• The scalar product of two (column) bit vectors x and y is x∗ y = x · y, but here the result is in
Z2 , where 0 represents an even number of 1s in the sum and 1 represents an odd number of
1s. In all of our uses of the dot product of bit vectors, we’ve only cared about whether the
value was even or odd, so we’re not losing any utility here.
• Orthogonality can be defined in terms of the dot product as before, as well as mutually
orthogonal subspaces and the orthogonal complement V ⊥ of a subspace V of some bit
vector space A. If A has dimension n and V ⊆ A is a subspace of dimension k, then V ⊥ has
dimension n − k as before, and (V ⊥ )⊥ = V as before.
Not everything works the same over Z2 as over C. Here are some differences:
• An n-dimensional vector space over Z2 is finite, with exactly 2^n elements, one for each
possible linear combination of the basis vectors.
• There is no notion of “positive definite.” We can have x · x = 0 but x ≠ 0 (i.e., x has a positive
but even number of 1s). The norm of a vector cannot be defined in the same way as with C;
however, a useful norm-like quantity associated with each bit vector x is the number of 1s in
x, known as the Hamming weight of x and denoted wt(x).
• The concept of unit vector and orthonormal basis don’t work over Z2 like they do over C,
and there is no Gram-Schmidt procedure.
• Mutually orthogonal subspaces may have nonzero vectors in their intersection. Indeed, it
may be the case that V ⊆ V ⊥ for nontrivial V.
• Z2 is not algebraically closed. This means, for example, that a square matrix may not have
any eigenvectors or eigenvalues.
Exercise 14.3 Find the two 2 × 2 matrices over Z2 that have no eigenvalues or eigenvectors.
(Challenging) Prove that there are only two.
Let A be an m × n matrix (over any field F). The rank of A, denoted rank A, is the maximum
number of linearly independent columns of A (or rows—it does not matter). Equivalently, it is
the dimension of the span of the columns of A (or rows—it does not matter). An m × n matrix
A has full rank if rank A = min(m, n), and this is the highest rank an m × n matrix can have. A
square matrix is invertible if and only if it has full rank. The kernel of A, denoted ker A, is the set
of column vectors v ∈ F^n such that Av = 0. The kernel of A is a subspace of F^n. Its dimension is
known as the nullity of A. A standard theorem in linear algebra is that the sum of the rank and the
nullity of A is equal to the number of columns of A, i.e., n. The rank of any given bit matrix A is
easy to compute; you can use Gaussian elimination, for example. If the nullity of A is positive, it
is also easy to find a nonzero bit vector v such that Av = 0 (the right-hand side is the zero vector
(a bit vector)).
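Gaussian elimination over Z2 is particularly pleasant to code, since adding one row to another is a single XOR. Here is a sketch (my own helper, with rows stored as Python ints used as bit masks) that computes the rank of a bit matrix; the nullity then follows from rank + nullity = n:

```python
def rank_z2(rows, n):
    """Rank over Z2 of a bit matrix; rows are ints, n is the number of columns."""
    rows = list(rows)
    r = 0
    for col in reversed(range(n)):              # eliminate one column at a time
        pivot = next((i for i in range(r, len(rows))
                      if rows[i] >> col & 1), None)
        if pivot is None:
            continue                            # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i] >> col & 1:
                rows[i] ^= rows[r]              # row addition in Z2 is XOR
        r += 1
    return r

A = [0b110, 0b011, 0b101]    # third row is the Z2 sum of the first two
print(rank_z2(A, 3))         # 2, so the nullity is 3 - 2 = 1
```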
Back to Simon’s Problem. If we run the quantum algorithm above k times for some k ≥ n, we
get k independent, uniformly random vectors y1, . . . , yk ∈ Z2^n such that the following k linear
equations hold:
y1 · s = 0,  . . . ,  yk · s = 0.
Let A be the k × n bit matrix whose rows are the yi. Then the above can be expressed as the single
equation As = 0, where 0 denotes the zero vector in Z2^k. Thus, s ∈ ker A.
The whole solution to Simon’s problem is as follows: Run the algorithm above n times,
obtaining y1, . . . , yn ∈ Z2^n. Let A be the n × n bit matrix whose rows are the yi.
Several things need explaining here. For one thing, the algorithm may fail to find s, outputting
“I don’t know.” We’ll see that this is reasonably unlikely to happen. For another thing, if we find
that A is invertible in Step 2, then we know that s = A^{−1}0 = 0, so our output is correct. Finally, in
Step 3 we know that an s exists and is unique: the nullity of A is n − rank A = n − (n − 1) = 1,
so ker A is a one-dimensional space, which thus has 2^1 = 2 elements, one of which is the zero
vector. The final check is to determine which of these is the correct output. So if the algorithm does
output an answer, that answer is always correct. Such a randomized algorithm (with low failure
probability) is called a Las Vegas algorithm, as opposed to a Monte Carlo algorithm which is allowed
to give a wrong answer with low probability.
What are the chances of the algorithm failing? If the algorithm fails, then rank A < n − 1, which
certainly implies that the matrix formed from the first n − 1 rows of A has rank less than n − 1. So
if we bound the latter probability, we bound the probability of failure. For 1 ≤ k ≤ n, let Ak be
the bit matrix formed from the first k rows of A. Each row of A is a uniformly random bit vector
in the space S = {0, s}⊥, which has dimension n − 1 (if s ≠ 0) or n (if s = 0). Thus S has at least
2^{n−1} vectors. Consider the probability that rank An−1 = n − 1, i.e., that An−1 has full rank. This
is true iff all rows of An−1 are linearly independent, or equivalently, iff the Ak have full rank for
all 1 ≤ k ≤ n − 1. We can express this probability as a product of conditional probabilities:
Pr[rank An−1 = n − 1] = Pr[rank A1 = 1] · ∏_{k=2}^{n−1} Pr[rank Ak = k | rank Ak−1 = k − 1].
Clearly, rank A1 = 1 iff its row is a nonzero bit vector in S, and so
Pr[rank A1 = 1] = (|S| − 1)/|S| ≥ (2^{n−1} − 1)/2^{n−1} = 1 − 2^{1−n}.
Now what is Pr[rank Ak = k | rank Ak−1 = k − 1] for k > 1? If rank Ak−1 = k − 1, then the rows
of Ak−1 are linearly independent, and thus span a (k − 1)-dimensional subspace D ⊆ S that has
2^{k−1} elements. Assuming this, Ak will have full rank iff its last row is linearly independent of the
other rows, i.e., the last row is an element of S − D. Thus,
Pr[rank An−1 = n − 1] ≥ ∏_{k=1}^{n−1} (1 − 2^{k−n}) = ∏_{k=1}^{n−1} (1 − 2^{−k}) = pn−1 ,
where we define
pm := ∏_{k=1}^{m} (1 − 2^{−k})                (62)
for all m ≥ 0. Clearly, 1 = p0 > p1 > · · · > pn > · · · > 0, and it can be shown that if
p := lim_{m→∞} pm, then 1/4 < p < 1/3. Thus the chances are better than 1/4 that An−1 will have
full rank, and so the algorithm will fail with probability less than 3/4. This seems high, but if we
repeat the whole process r times independently, then the chances that we will fail on all r trials is
less than (3/4)^r, which goes to zero exponentially in r. The expected number of trials necessary to
succeed at least once is thus at most ∑_{r=1}^{∞} (r/4)(3/4)^{r−1} = 4.
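The bound 1/4 < p < 1/3 on the limit of the products in (62) is easy to check numerically; a quick sketch (the function name is mine):

```python
def p(m):
    """The partial product p_m = (1 - 1/2)(1 - 1/4)...(1 - 2**-m) of (62)."""
    prod = 1.0
    for k in range(1, m + 1):
        prod *= 1.0 - 2.0 ** -k
    return prod

print(p(5))                    # the partial products decrease ...
print(p(60))                   # ... toward roughly 0.28879
print(0.25 < p(60) < 1 / 3)    # True
```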
Shor’s Algorithm for Factoring. In the early 1990s, Peter Shor showed how to factor an integer
N on a quantum computer in time polynomial in lg N (which is roughly the number of bits needed
to represent N in binary). All known classical algorithms for factoring run exponentially slower
than this (with a somewhat liberal definition of “exponentially slower”). Although it has not
been shown that no fast classical factorization algorithm exists, it is widely believed that this is
the case (and RSA security depends on this being the case). Shor’s algorithm is the single most
important quantum algorithm to date, because of its implications for public key cryptography.
Using similar techniques, Shor also gave quantum algorithms for quickly solving the discrete
logarithm problem, which also has cryptographic (actually cryptanalytical) implications. To do
Shor’s algorithm correctly, we need a couple more mathematical detours.
Modular Arithmetic. If a and m are integers and m > 0, then we can divide a by m and get two
integer results—quotient and remainder. Put another way, there are unique integers q, r such that
0 ≤ r < m and a = qm + r. We let a mod m denote the number r. For any integer m > 1, we let
Zm = {0, 1, . . . , m − 1} = {a mod m : a ∈ Z}, and we define addition and multiplication in Zm just
as in Z except that we take the result mod m. Our previous discussion about Z2 is a special case of
this. Arithmetic in Zm resembles arithmetic in Z in several ways:
• Both operations are associative and commutative.
• A unique additive inverse (negation) −x ∈ Zm exists for each element x ∈ Zm , such that
x + (−x) = 0. In fact, −0 = 0, and −x = m − x if x ≠ 0. Clearly, −(−x) = x, and (−x)y = −xy
in Zm . Subtraction is defined as addition of the negation as usual: x − y = x + (−y).
• A multiplicative inverse (reciprocal) may or may not exist for any given element x ∈ Zm
(that is, a b ∈ Zm such that xb = 1 in Zm). If it does, it is unique and written x^{−1} or 1/x,
and we say that x is invertible or a unit. If x is a unit, then so is x^{−1}, and (x^{−1})^{−1} = x. 0 is
never a unit, but 1 and −1 are always units. Division can be defined as multiplication by the
reciprocal as usual, provided the denominator is a unit: x/y = x(1/y), provided 1/y exists.
We let Z∗m be the set of all units in Zm . Z has only two units—1 and −1—but Zm may have many
units other than ±1. The units of Zm are exactly those elements x that are relatively prime to m
(i.e., gcd(x, m) = 1). If m is prime, then all nonzero elements of Zm are units. In any case, Z∗m
contains 1 and is closed under multiplication and reciprocals, but not necessarily under addition.
Exercise 14.4 What is Z∗30 ? Pair the elements of Z∗30 with their multiplicative inverses.
For any x ∈ Z∗m we define the order of x in Z∗m to be the least r > 0 such that x^r = 1. Such an
r must exist: The elements of the sequence 1, x, x^2, x^3, . . . are all in Zm, which is finite, so by the
Pigeonhole Principle there must exist some 0 ≤ s < t such that x^s = x^t. Multiplying both sides
by x^{−s}, we get 1 = x^{−s}x^s = x^{−s}x^t = x^{t−s}, and incidentally, t − s > 0.
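The order can of course be found by brute force (a sketch, with my own function name); the point of Shor's algorithm, coming up, is that this naive loop takes time exponential in lg m in the worst case:

```python
from math import gcd

def order(x, m):
    """Least r > 0 with x**r = 1 in Z_m; requires gcd(x, m) == 1."""
    assert gcd(x, m) == 1
    r, y = 1, x % m
    while y != 1:          # keep multiplying by x until we return to 1
        y = y * x % m
        r += 1
    return r

print(order(7, 30))   # powers of 7 mod 30 are 7, 19, 13, 1, so the order is 4
```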
Factoring Reduces to Order Finding. Shor’s algorithm does not factor N directly. Instead it
solves the problem of finding the order of an element x ∈ Z∗N. This is enough, as we will now see.
Let N be a large composite integer, and let x be an element of Z∗N . Suppose that you had at
your disposal a black box into which you could feed x and N, and the box would promptly output
the order of x in Z∗N. Then you could use this box to find a nontrivial factor of N quickly and with
high probability via the following (classical!) Las Vegas algorithm:
3. If N = a^b for some integers a, b ≥ 2, then output a and quit. (To see that this can be done
quickly, note that if a, b ≥ 2 and a^b = N, then 2^b ≤ a^b = N and so 2 ≤ b ≤ lg N. For each b,
you can try finding an integer a such that a^b = N by binary search.)
4. (At this point, N is odd and not a power. This means that N has at least two distinct odd
prime factors, in particular, there are odd, coprime p, q > 1 such that N = pq.) Pick a random
x ∈ ZN .
5. Compute gcd(x, N) with the Euclidean Algorithm. If gcd(x, N) > 1, then output gcd(x, N)
and quit.
6. (At this point, x ∈ Z∗N .) Use the order-finding black box to find the order r of x in Z∗N .
Shor’s quantum algorithm provides the order-finding black box for this reduction.
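The perfect-power test of Step 3 can be sketched directly from its description: for each candidate exponent b up to lg N, binary-search for an integer base a (the function name is mine):

```python
def perfect_power(N):
    """Return (a, b) with a**b == N and a, b >= 2, or None if no such pair exists."""
    b = 2
    while 2 ** b <= N:               # only exponents b <= lg N can work
        lo, hi = 2, N
        while lo <= hi:              # binary search for the base a
            a = (lo + hi) // 2
            v = a ** b
            if v == N:
                return (a, b)
            if v < N:
                lo = a + 1
            else:
                hi = a - 1
        b += 1
    return None

print(perfect_power(3125))   # (5, 5), since 5**5 == 3125
print(perfect_power(15))     # None: 15 is composite but not a perfect power
```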
15 Week 8: Factoring and order finding (cont.)
This algorithm (really a randomized reduction of Factoring to Order Finding) is clearly efficient
(polynomial time in lg N), given black-box access to Order Finding. We need to check two things:
(i) the algorithm, if it does not give up, outputs a nontrivial factor of N, and (ii) the probability of
it giving up is not too big—at most 1 − ε for some constant ε, say.
Notation 15.1 For a, b ∈ Z, we let a | b mean that a divides b, or that b is a multiple of a; precisely,
there is a c ∈ Z such that ac = b. Clearly, if a > 0, then a | b iff b = 0 in Za. We write a ∤ b to
mean that a does not divide b.
Anything the algorithm outputs in Steps 2, 3, or 5 is clearly correct. The only other output
step is Step 9. We claim that gcd(y − 1, N) is a nontrivial factor of N: We have y ≠ −1 in ZN by
assumption, or equivalently, N ∤ y + 1. Also, y ≠ 1 in ZN, since otherwise x^{r/2} = 1 in ZN, which
contradicts the fact that r is the least such exponent. Thus N ∤ y − 1. Yet we have y^2 = x^r = 1 in
ZN, which means that N | y^2 − 1 = (y + 1)(y − 1). So N divides (y + 1)(y − 1) but neither of its
two factors. The only way this can happen is when y + 1 includes some, but not all, of the prime
factors of N, and likewise with y − 1. Thus 1 < gcd(y − 1, N) < N, and so we output a nontrivial
factor of N in Step 9.
The algorithm could give up in Steps 7 or 8. Giving up in Step 7 means that r is odd. We show
that at most half the elements of Z∗N have odd order, and so the algorithm gives up in Step 7 with
probability at most 1/2. In fact, we show that if x ∈ Z∗N has odd order r, then −x in ZN (which is
also in Z∗N ) has order 2r. So at least one element of each pair ±x has even order, and so we’re done
since Z∗N is made up of such disjoint pairs. First, we have

(−x)^{2r} = ((−x)^2)^r = (x^2)^r = (x^r)^2 = 1^2 = 1,

where all arithmetic is in ZN. So −x has order at most 2r. Now suppose that −x has order
s < 2r. Then 1 = (−x)^s = (−1)^s x^s. We must have s ≠ r, for otherwise this would become
1 = (−1)^r x^r = (−1)^r = −1, since r is odd (and since N > 2, we have 1 ≠ −1 in ZN). Now since
0 < s < 2r but s ≠ r, we must have x^s ≠ 1, and because (−1)^s x^s = 1, we cannot have (−1)^s = 1.
Thus (−1)^s = x^s = −1. But now,
• d(x +N y) = (x1 +p y1 , x2 +q y2 ).
• d(x ·N y) = (x1 ·p y1 , x2 ·q y2 ).
• d(1) = (1, 1).
• d(−1) = (−1, −1). More generally, d(−x) = (−x1 , −x2 ).
• x ∈ Z∗N if and only if x1 ∈ Z∗p and x2 ∈ Z∗q .
It turns out that d is a bijection from ZN to Zp × Zq . This is a consequence of the following classic
theorem in number theory:
Theorem 15.2 (Chinese Remainder Theorem (dyadic version)) Let p, q > 0 be coprime and let N =
pq. Define d : ZN → Zp × Zq by d(x) = (x mod p, x mod q). Then d is a bijection, i.e., for every
x1 ∈ Zp and x2 ∈ Zq , there exists a unique x ∈ ZN such that d(x) = (x1 , x2 ).
I’ll include the proof here for you to read on your own if you want, but I won’t present it in
class.
Proof. Set p̃ = p mod q and q̃ = q mod p. Since gcd(p, q) = 1, we also have gcd(p̃, q) =
gcd(p, q̃) = 1, and hence p̃ ∈ Z∗q and q̃ ∈ Z∗p. Let p̃^{−1} and q̃^{−1} be the reciprocals of p̃ in Z∗q and of
q̃ in Z∗p, respectively. Given any x1 ∈ Zp and x2 ∈ Zq, let x = (x1 q̃^{−1} q + x2 p̃^{−1} p) mod N (normal
arithmetic in Z). Clearly, x ∈ ZN. Then letting d(x) = (y1, y2), we get
y1 = [(x1 q̃^{−1} q + x2 p̃^{−1} p) mod N] mod p
   = (x1 q̃^{−1} q + x2 p̃^{−1} p) mod p
   = x1 q̃^{−1} q mod p
   = x1 q̃^{−1} q̃ mod p
   = x1 mod p
   = x1 ,
and similarly,
y2 = [(x1 q̃^{−1} q + x2 p̃^{−1} p) mod N] mod q
   = (x1 q̃^{−1} q + x2 p̃^{−1} p) mod q
   = x2 p̃^{−1} p mod q
   = x2 p̃^{−1} p̃ mod q
   = x2 mod q
   = x2 .
Thus d(x) = (x1 , x2 ), which proves that d is surjective. To see that d is injective, let x, y ∈ ZN be
such that d(x) = d(y) = (x1 , x2 ). Then d(x −N y) = (x1 −p x1 , x2 −q x2 ) = (0, 0), and so we have
(x − y) mod p = (x − y) mod q = 0, or equivalently, p | x − y and q | x − y. But since p and q are
coprime, we must have N | x − y, and so,
x = x mod N = y mod N = y,
which shows that d is an injection. 2
We won’t discuss it here, but given x1 , x2 , one can quickly (and classically) compute inverses in
Z∗n , and thus find the unique x such that d(x) = (x1 , x2 ), using the Extended Euclidean Algorithm.
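The reconstruction can be sketched directly from the formula in the proof, computing the needed reciprocals with the extended Euclidean algorithm (the function names are mine):

```python
def ext_gcd(a, b):
    """Return (g, u, v) with u*a + v*b == g == gcd(a, b)."""
    if b == 0:
        return (a, 1, 0)
    g, u, v = ext_gcd(b, a % b)
    return (g, v, u - (a // b) * v)

def crt(x1, p, x2, q):
    """The unique x in Z_{pq} with x = x1 mod p and x = x2 mod q (p, q coprime)."""
    g, u, v = ext_gcd(p, q)
    assert g == 1
    # u*p == 1 (mod q) and v*q == 1 (mod p), so the two terms below
    # play the roles of x2*p~^{-1}*p and x1*q~^{-1}*q in the proof
    return (x1 * v * q + x2 * u * p) % (p * q)

x = crt(2, 5, 3, 7)
print(x, x % 5, x % 7)   # 17 2 3
```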
Exercise 15.3 In this exercise, you will prove some standard results about the cardinality of Z∗n for
any integer n > 1. For any such n, the Euler totient function is defined as
ϕ(n) := |Z∗n |,
which is the number of elements of Zn that are relatively prime to n. By convention, ϕ(1) := 1.
1. Show that if a, b > 0 are coprime, then ϕ(ab) = ϕ(a)ϕ(b). [Hint: Show that the bijection
d defined in Theorem 15.2 above (with p = a and q = b) matches elements of Z∗ab with
elements of Z∗a × Z∗b and vice versa.]
2. Show that if n is some power of a prime p, then ϕ(n) = n(p − 1)/p. [Hint: An element
x ∈ Zn is relatively prime to n iff x is not a multiple of p.]
3. Conclude that if n = q1^{e1} q2^{e2} · · · qk^{ek} is the prime factorization of n, where q1 < q2 < · · · < qk
are all prime and e1, e2, . . . , ek > 0, then

ϕ(n) = ∏_{j=1}^{k} qj^{ej−1}(qj − 1).
Exercise 15.4 (Challenging) Show that ϕ(n)/n > 1/ lg n for all integers n > 1 except 2 and 6.
[Hint: For ` > 0, let n` be the product of the first ` primes. Using the inequality (64) above, show
that, for any ` > 0, if ϕ(n` )/n` > 1/ lg n` , then ϕ(n)/n > 1/ lg n for all n > n` . Then find an ` for
which the hypothesis is true.]
Back to the issue at hand. When y = x^{r/2} is computed in Step 8, we have y^2 = x^r = 1, and so y
is one of the square roots of 1 in ZN. Both 1 and −1 are square roots of 1 in ZN for any N, but in this
case (N = pq as above) there are at least two others. Whereas d(1) = (1, 1) and d(−1) = (−1, −1),
by the Chinese Remainder Theorem, there is an x ∈ ZN such that d(x) = (1, −1). By the bijective
nature of d, we have x ≠ ±1, and so x and −x are two additional square roots of 1 besides ±1.
There could be still others. We won’t prove it, but if x is chosen uniformly at random among those
elements of Z∗N with even order, then x^{r/2} is at least as likely to be one of the other square roots of
1 as ±1, where r is the order of x. Thus Step 8 gives up with probability at most 1/2.
So the whole reduction succeeds in outputting a nontrivial factor of N with probability at least
1/4. As with Simon’s algorithm, we can expect to run this reduction about four times to find such
a factor. Running it additional times decreases the likelihood of failure exponentially.
Geometric series. This elementary fact will be useful in what is to come.
Proposition 15.5 For any r ∈ C such that r ≠ 1, and for any integer n ≥ 0,

∑_{i=0}^{n−1} r^i = (r^n − 1)/(r − 1).
You can prove this by induction on n. If n = 0, then both sides are 0. Now assume the equation
holds for fixed n ≥ 0. Then

∑_{i=0}^{n} r^i = r^n + ∑_{i=0}^{n−1} r^i = r^n + (r^n − 1)/(r − 1) = (r^n(r − 1) + r^n − 1)/(r − 1) = (r^{n+1} − 1)/(r − 1).
The sum ∑_{i=0}^{n−1} r^i is called a finite geometric series with ratio r.
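A quick numerical sanity check of Proposition 15.5 for a complex ratio, using a root of unity since that is exactly the case we will need below:

```python
import cmath

r = cmath.exp(2j * cmath.pi / 5)      # a 5th root of unity, so r != 1
n = 5
lhs = sum(r ** i for i in range(n))
rhs = (r ** n - 1) / (r - 1)
print(abs(lhs - rhs) < 1e-12)         # True
print(abs(lhs) < 1e-12)               # True: the 5th roots of unity sum to 0
```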
The Quantum Fourier Transform. The Fourier transform is of fundamental importance in many
areas of science, math, and engineering. For example, it is used in signal processing to pick out
component frequencies in a periodic signal (and we will see how this applies to Shor’s order-
finding algorithm). The auditory canal inside your ear acts as a natural Fourier transformer,
allowing your brain to register different frequencies (of musical notes, say) inherent in the sound
waves entering the ear.
A quantum version of the Fourier transform, known as the quantum Fourier transform or QFT,
is a crucial ingredient in Shor’s algorithm.
Let m ≥ 1 be an integer. We define the m-dimensional discrete Fourier transform15 DFTm
as the linear map C^m → C^m that takes a vector x = (x0, . . . , xm−1) ∈ C^m and maps it to the vector
y = (y0, . . . , ym−1) ∈ C^m satisfying

yj = (1/√m) ∑_{k=0}^{m−1} e^{2πijk/m} xk

for all 0 ≤ j < m.16 Set ωm := e^{2πi/m}. Clearly, m is the least positive integer such that ωm^m = 1.
We call ωm the principal m-th root of unity. Note that ωm^a = ωm^{a mod m} for any a ∈ Z, so we can
consider the exponent of ωm to be an element of Zm.
The matrix corresponding to DFTm is the m × m matrix whose (j, k)th entry is [DFTm]jk =
ωm^{jk}/√m, for all 0 ≤ j, k < m, i.e., for all j, k ∈ Zm. (It will be more convenient for the time being
to start the indexing at zero rather than one.) In fact, DFTm is unitary, and it is worth seeing why
this is so. We check that (DFTm)∗DFTm has diagonal entries 1 and off-diagonal entries 0. For
general j, k, we have

[(DFTm)∗DFTm]jk = (1/m) ∑_{ℓ∈Zm} ωm^{−ℓj} ωm^{ℓk} = (1/m) ∑_{ℓ∈Zm} ωm^{ℓ(k−j)}.        (65)
15 There are continuous versions of the Fourier transform.
16 There is some variation in the definition of DFTm in different sources; for example, there may be a minus sign in the exponent of e, or there may be no factor 1/√m in front. The current definition is the most useful for us.
If j = k, then the right-hand side is (1/m) ∑_{ℓ∈Zm} 1 = 1. Now suppose j ≠ k. Then 0 < |k − j| < m,
and so ωm^d ≠ 1, where d = k − j. To see that the sum on the right-hand side of (65) is 0, notice that
it is a finite geometric series with ratio ωm^d ≠ 1, and so we have

∑_{ℓ∈Zm} (ωm^d)^ℓ = ((ωm^d)^m − 1)/(ωm^d − 1) = ((ωm^m)^d − 1)/(ωm^d − 1) = 0,

because (ωm^m)^d = 1^d = 1.
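The unitarity computation above is easy to confirm numerically; a sketch (helper names mine) that builds DFTm entrywise and checks (DFTm)∗DFTm = I:

```python
import cmath

def dft_matrix(m):
    """The m x m matrix with entries omega_m^{jk} / sqrt(m)."""
    w = cmath.exp(2j * cmath.pi / m)
    return [[w ** (j * k) / m ** 0.5 for k in range(m)] for j in range(m)]

m = 8
F = dft_matrix(m)
# entry (j, k) of (DFT_m)* DFT_m is sum_l conj(F[l][j]) * F[l][k]
gram = [[sum(F[l][j].conjugate() * F[l][k] for l in range(m))
         for k in range(m)] for j in range(m)]
ok = all(abs(gram[j][k] - (1 if j == k else 0)) < 1e-9
         for j in range(m) for k in range(m))
print(ok)   # True: diagonal entries 1, off-diagonal entries 0
```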
Naively applying DFTm to a vector in Cm requires Θ(m2 ) scalar arithmetic operations. A
much faster method, known as the Fast Fourier Transform (FFT), can do this with O(m lg m) scalar
arithmetic operations. The FFT was described by Cooley & Tukey in 1965, but the same idea can
be traced back to Gauss. It uses divide-and-conquer, and is easiest to describe when m is a power
of 2. The FFT is also easily parallelizable: it can be computed by an arithmetic circuit of width
m and depth lg m called a butterfly network. Because of this, the FFT has been rated as the second
most useful algorithm ever, second only to fast sorting. Besides its use in digital signal processing,
it is also used to implement the asymptotically fastest known algorithms, due to Schönhage &
Strassen, for multiplying integers and polynomials.
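The divide-and-conquer idea fits in a few lines. The sketch below (function name mine) is the radix-2 Cooley–Tukey recursion, using the same +2πi sign convention as our DFTm; it is unnormalized, so multiply the result by 1/√m to match the unitary convention used here:

```python
import cmath

def fft(x):
    """Radix-2 Cooley-Tukey FFT; len(x) must be a power of 2."""
    m = len(x)
    if m == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])            # two half-size subproblems
    out = [0j] * m
    for j in range(m // 2):
        t = cmath.exp(2j * cmath.pi * j / m) * odd[j]  # twiddle factor
        out[j] = even[j] + t
        out[j + m // 2] = even[j] - t
    return out

# agrees with the direct O(m^2) definition (up to the missing 1/sqrt(m))
x = [1.0, 2.0, 3.0, 4.0]
direct = [sum(cmath.exp(2j * cmath.pi * j * k / 4) * x[k] for k in range(4))
          for j in range(4)]
print(max(abs(a - b) for a, b in zip(fft(x), direct)) < 1e-9)   # True
```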
It was Shor who first showed that DFT_{2^n} could be implemented by a quantum circuit on n
qubits with size polynomial in n, and his idea is based on the Fast Fourier Transform. From now
on, the dimension will be a power of 2, so I’ll define the n-qubit quantum Fourier transform QFTn
to be DFT_{2^n}. For notational convenience, I’ll also define

en(x) := ω_{2^n}^x = e^{2πix/2^n}.
Before we describe a circuit for QFTn , we will sketch out and analyze Shor’s quantum algorithm
for order-finding, which is a Monte Carlo algorithm. This description and the QFTn circuit layout
later on are adapted with modifications from a paper by Cleve & Watrous in 2000.
1. Input: N > 1 and a ∈ Z∗N with a > 1. (The algorithm attempts to find the order of a in Z∗N.)
Let n = ⌈lg N⌉.
2. Initialize a 2n-qubit register and an n-qubit register in the state |0i|0i. Here we will label the
basis states of a register with nonnegative integers via their usual binary representations.
3. Apply a Hadamard gate to each qubit of the first register, obtaining the state

(H^{⊗2n} ⊗ I)|0⟩|0⟩ = (1/2^n) ∑_{x∈Z_{2^{2n}}} |x⟩|0⟩.
4. Apply a classical quantum circuit for modular exponentiation that sends |x⟩|0⟩ to |x⟩|a^x mod N⟩,
obtaining the state

|ϕ⟩ = (1/2^n) ∑_{x∈Z_{2^{2n}}} |x⟩|a^x mod N⟩.        (66)
(We can imagine that N and a are hard-coded into the circuit, which means that the circuit
must be built in a preprocessing step after the inputs N and a are known. Alternatively, we
can keep N and a in separate quantum registers that don’t change during the course of the
computation, then feed them into this circuit when they’re needed.)
5. (Optional) Measure the second register in the computational basis, obtaining some classical
value w ∈ ZN , which is ignored.17
7. Measure the first register (in the computational basis), obtaining some value y ∈ Z_{2^{2n}}. (This
ends the quantum part of the algorithm.)
(We are just finding a reasonably good rational approximation to y/2^{2n} that has small
denominator r. This can be done classically using continued fractions. See below.)
9. Classically compute a^r mod N. If the result is 1, then output r. Otherwise, give up.
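The modular exponentiation that the circuit in Step 4 computes in superposition is classically polynomial time in lg N via repeated squaring; a sketch (function name mine; Python's built-in three-argument pow does the same thing):

```python
def mod_exp(a, x, N):
    """Compute a**x mod N with O(lg x) multiplications by repeated squaring."""
    result = 1
    a %= N
    while x > 0:
        if x & 1:                    # this bit of x contributes a^(2^i)
            result = result * a % N
        a = a * a % N                # square for the next bit
        x >>= 1
    return result

print(mod_exp(7, 128, 15), pow(7, 128, 15))   # both equal 1
```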
Let R be the order of a in Z∗N . The whole key to proving that Shor’s algorithm works is to show
that in Step 9 the algorithm outputs R with high probability. First, we’ll show that a single run of
the algorithm above outputs R with probability at least 4/(π^2 n) − O(n^{−2}), and so if we run the
17 Since we ignore the result of the measurement, this step is entirely superfluous; the algorithm would do just as well without it. Including this step, however, collapses the state, which simplifies the analysis greatly and allows us to ignore the second register altogether.
algorithm about π^2 n/4 times, we will succeed with high probability. The actual single-run success
probability is usually much higher than 4/(π^2 n), but 4/(π^2 n) is a good enough approximate lower
bound, and it is easier to derive than a tighter lower bound. After the analysis, we’ll discuss how
the quantum Fourier transform and the (classical) continued fraction algorithm used in Step 8 are
implemented.
Shor’s algorithm, if it succeeds, will be guaranteed to output some r > 0 such that a^r = 1 in
ZN. It is possible—although very unlikely—that r is a multiple of R, but not equal to it. If we
run the algorithm until it succeeds k times and take the gcd of the k results, then the chances of
not getting R are at most (1 − 4/(π^2 n))^k, which decrease exponentially with k. If we only want
to find a nontrivial factor of N, then we use this algorithm to implement the black box in the
Factoring-to-Order-Finding reduction. As the next exercise shows, we don’t need to worry about
the value returned by the black box if it succeeds.
Exercise 16.1 Suppose that on input N and x ∈ Z∗N , the black box used in the reduction from
Factoring to Order Finding is only guaranteed to output some r with 0 < r < 2^{2n} such that x^r = 1
in ZN, where n = ⌈lg N⌉. Show how to modify the reduction slightly so that it succeeds with the
same probability as it did before when the box always outputted the order of x in Z∗N . [Hint: Let
R be the order of x in Z∗N . First, given any multiple r of R, show how to find an odd multiple of R
(that is, a number of the form cR where c is odd) that is no bigger than r. Second, show that the
probability of success of the reduction is the same if the black box returns some odd multiple of R.]
Analysis of Shor’s Algorithm. Let R be the order of a in Z∗N. Set Q := 2^{2n} and M := ⌊Q/R⌋. We
can express each x uniquely as qR + s with s ∈ ZR and note that, owing to the periodicity of
a^x mod N, the state |ϕ⟩ of (66) can be rewritten as

|ϕ⟩ = (1/√Q) ∑_{q∈ZM} ∑_{s∈ZR} |qR + s⟩|a^s mod N⟩ + O(2^{−n/2}).
Now when the second register is measured in the next step, we obtain some w = a^s mod N
corresponding to some unique s ∈ ZR. The state after this measurement then collapses to either
(1/√(M + 1)) ∑_{q∈Z_{M+1}} |qR + s⟩ |w⟩,
102
if 0 ≤ s < 2^{2n} mod R, or to

(1/√M) ∑_{q∈ZM} |qR + s⟩ |w⟩,

if 2^{2n} mod R ≤ s < R. It does not really matter which is the case, as the analysis is nearly identical
and the conclusions (particularly Corollary 16.4, below) are the same either way, so for simplicity,
we’ll assume the latter case applies.18 Also, the second register will no longer participate in the
algorithm, so we can ignore it from now on. To summarize, the post-measurement state of the first
register is then given as
|η⟩ = (1/√M) ∑_{q∈ZM} |qR + s⟩.        (68)
The next step of the algorithm applies QFT2n to this state to obtain
|ψ⟩ = QFT2n |η⟩ = (1/√M) ∑_{q∈ZM} QFT2n |qR + s⟩                       (69)
     = (1/√(QM)) ∑_{q∈ZM} ∑_{y∈Z_{2^{2n}}} e2n((qR + s)y) |y⟩          (70)
     = (1/√(QM)) ∑_y [ ∑_q e2n((qR + s)y) ] |y⟩                        (71)
     = (1/√(QM)) ∑_y e2n(sy) [ ∑_q e2n(qRy) ] |y⟩.                     (72)
We’ll show that Pr[y] spikes when y/Q is close to a multiple of 1/R, but first some intuition.
Permit me an acoustical analogy. Think of the column vector |η⟩ as a periodic signal with period R,
i.e., the entries at indices x = qR + s (for integral q) have value 1/√M, and all the other entries are
0. The “frequency” of this signal is then 1/R, and since the Fourier transform is good at picking out
frequencies, we’d expect to see a “spike” in the probability amplitude of the Fourier transformed
state |ψ⟩ of Equation (72) right around the frequencies 1/R, 2/R, 3/R, . . . , with 1/R being the
fundamental component of the signal and the others being overtones (higher harmonics). This is
exactly what happens, and it is the whole point of using the QFT. The larger the signal sample,
the sharper and narrower the spikes will be. We choose a sample of length Q, which is at least N^2,
giving us at least N^2/R > N periods of the function. This turns out to give us sufficiently sharp
spikes to approximate R with high probability.
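The spikes can be seen directly in a toy simulation. The sketch below (toy parameters and names mine; Q is far smaller than the N² the algorithm actually uses) builds the state of (68), applies the discrete Fourier transform by brute force, and lists the R most probable outcomes, which cluster near multiples of Q/R:

```python
import cmath

Q, R, s = 64, 5, 2
M = (Q - s - 1) // R + 1             # number of q with qR + s < Q
amp = [0.0] * Q
for q in range(M):
    amp[q * R + s] = 1 / M ** 0.5    # the state |eta> of (68)
# brute-force DFT_Q with the same +2*pi*i sign convention as the notes
psi = [sum(cmath.exp(2j * cmath.pi * j * k / Q) * amp[k] for k in range(Q))
       / Q ** 0.5 for j in range(Q)]
prob = [abs(a) ** 2 for a in psi]
top = sorted(range(Q), key=lambda y: -prob[y])[:R]
print(sorted(top))   # [0, 13, 26, 38, 51], near multiples of Q/R = 12.8
```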
18 For the former case, just substitute M + 1 for M in the analysis to follow.
We now concentrate on the scalar quantity

∑_{q∈ZM} e2n(qRy).        (73)

That is, s_y is the remainder of Ry divided by Q with least absolute value. We have |s_y| ≤ Q/2, and
in addition, s_y ≡ Ry (mod Q), and thus
Proof. Fix y and suppose that |s_y| ≤ R/2. If s_y = 0, then the claim clearly holds by (75), so assume
s_y ≠ 0. Starting from Equation (75) and using Exercise 2.5, we have

(e2n(M s_y) − 1)/(e2n(s_y) − 1) = (e2n(M s_y/2)[e2n(M s_y/2) − e2n(−M s_y/2)]) / (e2n(s_y/2)[e2n(s_y/2) − e2n(−s_y/2)])
where θ := πs_y/Q. Since we have

|M s_y| ≤ Q|s_y|/R ≤ Q/2,

we know that |θ| ≤ π/(2M) and |Mθ| ≤ π/2. This gives

|sin(Mθ)/sin θ| = sin|Mθ|/sin|θ| ≥ sin|Mθ|/|θ|.

It is readily checked that the function (sin x)/x is decreasing in the interval (0, π/2], so
as desired. 2
Pr[y] ≥ 4M/(Qπ^2) ≥ 4/(Rπ^2) − O(1/Q) = 4/(Rπ^2) − O(2^{−2n}).
So for each individual okay y ∈ ZQ , we get a relatively large (but still exponentially small)
probability of seeing that particular y. We’ll need the additional fact that there are many okay y.
The following claim is obvious, so we’ll give it without proof:
Claim 16.5 For every k ∈ ZR , there exists y ∈ ZQ such that Ry is in the closed interval [Qk − R/2, Qk +
R/2]. Each such y is okay.
These intervals are pairwise disjoint for different k. This means there are at least R many okay
y. By Corollary 16.4, the chances of finding an okay y in Step 7 of the algorithm are then
Pr[y is okay] = ∑_{y okay} Pr[y] ≥ R(4/(Rπ^2) − O(2^{−2n})) ≥ 4/π^2 − O(2^{−n}).
Let y be the value measured in Step 7, and suppose that y is okay. Then there is some least
ky ∈ Z such that

Qky − R/2 ≤ Ry ≤ Qky + R/2.        (76)

(Actually, ky is unique satisfying (76) because the intervals don’t overlap.) Dividing by QR and
rearranging, (76) becomes

|y/Q − ky/R| ≤ 1/(2Q) = 2^{−2n−1},
and so ky /R satisfies Equation (67). Now Step 8 produces the least k and r satisfying (67), so we
have two possible issues to address:
2. The k and r found in Step 8 satisfy k/r = ky /R, but r < R because the fraction ky /R is not in
lowest terms (k/r is always given in lowest terms).
It turns out that the first issue never arises. To see this, first notice that k/r and ky /R must be close
to each other, because they are both close to y/Q:
|ky/R − k/r| = |ky/R − y/Q + y/Q − k/r| ≤ |y/Q − ky/R| + |y/Q − k/r| ≤ 1/Q = 2^{−2n},        (77)
Proof. Let y ∈ ZQ be good. We have k/r = ky /R, since y is okay. But since both fractions are in
lowest terms, we must have k = ky and r = R. 2
Theorem 16.9 The probability that r = R is found in Step 8 is at least 4/(π^2 n) − O(n^{−2}).
Proof. By Claim 16.7, it suffices to show that a good y is found in Step 7 with probability at least
4/(π^2 n) − O(n^{−2}). By Claim 16.8 and Equation (64), there are at least

ϕ(R) ≥ R/(1 + lg R) ≥ R/(1 + lg N) ≥ R/(n + 1)

many good y. By Corollary 16.4, each good y occurs with probability at least about 4/(Rπ^2), and so

Pr[y is good] ≥ 4/(π^2 n) − O(n^{−2}).
2
There are some tricks to (modestly) boost the probability of success of Shor’s algorithm while
keeping the number of repetitions of the whole quantum computation to a minimum. For example,
if an okay y is returned in Step 8 that is not good, it may be that gcd(ky, R) is reasonably small,
in which case R is a small multiple of r. If you can only afford to run the quantum portion of the
computation once, then in Step 9, you could try computing a^r, a^{2r}, a^{3r}, . . . , a^{nr} (all mod N) and
return the least exponent yielding 1, if there is one. If not, you could try relaxing the distance
bound 2^{−2n−1} in (67) to something bigger, in the hope that the y you found, if not okay, is close
to okay (if y is not okay, it is more likely than a random value to be close to one that is). If you can
afford to run the quantum computation twice, obtaining r1 and r2 respectively in Step 8, then
taking the least common multiple lcm(r1, r2) yields R with much higher probability than you can
get by running the quantum computation just once. Using ideas like these, it can be shown that
one can boost the probability of finding R (using one run of the quantum part of the algorithm) to
a positive constant, independent of n.
This concludes the analysis of Shor’s algorithm. The only things left are (i) to show how the
QFT is implemented efficiently with a quantum circuit, and (ii) to describe how Step 8 is implemented
by a classical algorithm. We’ll take these in reverse order.
17 Week 9: Best rational approximations
The Continued Fraction Algorithm. The book illustrates continued fractions as part of the order-
finding algorithm, with Theorem 5.1 on page 229, and Box 5.3 on the next page. We actually don’t
need to talk about continued fractions explicitly. All we need is to find an efficient classical algorithm to implement Step 8, which we'll do directly now.
For any real numbers a < b, there are infinitely many rational numbers in the interval [a, b].
We want to find one with smallest denominator and numerator.
Definition 17.1 Let a, b ∈ R with 0 < a < b. Define d to be the least positive denominator of any
fraction in [a, b]. Now define n ∈ Z to be least such that n/d ∈ [a, b]. We call the fraction n/d the
simplest rational interpolant,19 or SRI, of a and b, and we denote it SRI(a, b).
The fraction k/r found in Step 8 is just SRI((2y − 1)/2^{2n+1}, (2y + 1)/2^{2n+1}).
Here is a simple, efficient, recursive algorithm to find SRI(a, b) for positive rational a < b.
Each step will include a comment explaining why it is correct.
SRI(a, b):
Input: Rational numbers a, b with 0 < a < b, each given in numerator/denominator form, where
both numerator and denominator are in binary.
Base Case: If a 6 1 6 b, then return 1 = 1/1. (Clearly, this is the simplest possible fraction!)
Recursive Case 1: If a > 1, then let q := ⌈a − 1⌉ and return q + SRI(a − q, b − q). (Obviously, shifting the interval [a, b] by an integral amount shifts the SRI by the same amount. Also note that q ≥ a/2—a fact that will be useful later.)
Recursive Case 2: Otherwise b < 1. Compute d′/n′ := SRI(1/b, 1/a) and return the reciprocal n′/d′. (We claim that if d′/n′ = SRI(1/b, 1/a), then n′/d′ = SRI(a, b). Let n/d = SRI(a, b). We show that n′/d′ = n/d. Since n/d ∈ [a, b], we clearly have d/n ∈ [1/b, 1/a], and so n′ ≤ n by minimality of n′. Similarly, since d′/n′ ∈ [1/b, 1/a], we have n′/d′ ∈ [a, b], and so d ≤ d′ by minimality of d. Thus we have n′/d′ ≤ n/d. Suppose n′/d′ < n/d. We have n′/d′ ≤ n′/d ≤ n/d, so n′/d ∈ [a, b] and d/n′ ∈ [1/b, 1/a]. We also have either n′/d′ < n′/d or n′/d < n/d. We can't have the latter, owing to the minimality of n. But we can't have the former, either, for otherwise d′/n′ > d/n′, and this contradicts the minimality of d′. Thus we must have n′/d′ = n/d, and so SRI(a, b) = 1/SRI(1/b, 1/a).)
19. I'm making this term up. I'm sure there must be an official name for it, but I haven't found what it is.
The comments suggest that the SRI algorithm is correct as long as it halts. It does halt, and
quickly, too. Let the original inputs be a = a0 = n0/d0 and b = b0 = N0/D0, given as fractions in
lowest terms (n0 , d0 , N0 , and D0 are all positive integers). Similarly, for 0 < k, let ak = nk /dk and
bk = Nk /Dk be respectively the first and second argument to the kth recursive call to SRI. We
consider the product Pk := nk dk Nk Dk and how it changes with k. If the kth recursive call occurs
in the second case, then the numerators and denominators are simply swapped, so Pk = Pk−1 .
If the kth recursive call occurs in the first case, then dk = dk−1 and Dk = Dk−1, but (letting q := ⌈ak−1 − 1⌉)
nk = nk−1 − q·dk−1 ≤ nk−1/2,
because q ≥ ak−1/2 = nk−1/(2dk−1), and
Nk = Nk−1 − q·Dk−1 < Nk−1.
Thus in this case, Pk < Pk−1/2. The two recursive cases alternate, so Pk
decreases by at least half with every other recursive call. Since Pk > 0, we must hit the base
case after at most 2 lg P0 = 2(lg n0 + lg d0 + lg N0 + lg D0 ) recursive calls. For each k > 0, lg Pk
approximates the size of the input (in bits) up to an additive constant, and this size never increases
from call to call, so the whole algorithm is clearly polynomial time.
Exercise 17.3 (Challenging) Using your favorite programming language, implement the SRI algorithm above. You can decide to accept either exact rational or floating point inputs.
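For concreteness, here is one way the SRI algorithm above might look in Python (a sketch along the lines of Exercise 17.3, using the standard library's exact rational type; the function name is my own):

```python
from fractions import Fraction
from math import ceil

def sri(a: Fraction, b: Fraction) -> Fraction:
    """Simplest rational interpolant SRI(a, b) for rationals 0 < a <= b,
    following the recursive algorithm described above."""
    assert 0 < a <= b
    if a <= 1 <= b:
        return Fraction(1)            # base case: 1/1 is the simplest fraction
    if a > 1:
        q = ceil(a - 1)               # shift the interval left into (0, 1]
        return q + sri(a - q, b - q)  # shifting by q shifts the SRI by q
    # otherwise b < 1: invert the interval and the answer
    return 1 / sri(1 / b, 1 / a)

# e.g., the simplest fraction in [0.27, 0.33] is 2/7
print(sri(Fraction(27, 100), Fraction(33, 100)))  # 2/7
```

Since `Fraction` always reduces to lowest terms, the result comes out in lowest terms automatically, just as Step 8 requires.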
We now turn to implementing the QFT with an efficient quantum circuit. Recall that
QFT_n|x⟩ = (1/2^{n/2}) Σ_{y∈Z_{2^n}} e_n(xy)|y⟩.
It was Peter Shor who first showed how to implement QFTn efficiently with a quantum circuit, in
the same paper as his factoring algorithm. The following recursive description is taken from Cleve
& Watrous (2000). When n = 1, you can easily check that QFT1 = H, i.e., the one-qubit Hadamard
gate. Now suppose that n > 1 and let 1 ≤ m < n be an integer. QFTn can be decomposed into a
circuit using QFTn−m , QFTm , and two other subcircuits, as shown in Figure 8. The Pn,m gate acts
on two numbers—an (n − m)-bit number x ∈ Z_{2^{n−m}} and an m-bit number y ∈ Z_{2^m}—such that
P_{n,m}|x⟩|y⟩ = e_n(xy)|x⟩|y⟩.
[Figure 8: QFT_n decomposed as QFT_{n−m} acting on the top n − m qubits, then P_{n,m} acting on all n qubits, then QFT_m acting on the bottom m qubits, followed by a qubit-permuting gate.]
The decomposition can be applied recursively, expanding QFT_{n−m} and QFT_m, as long as you keep track of which qubit is which and adjust the elementary gates accordingly.
Many recursive decompositions are possible, based on the choice of m at each stage. Shor’s
original circuit for QFTn is obtained by recursively decomposing with m = 1 throughout. A
smaller depth circuit is achieved by a divide-and-conquer approach, letting m be roughly n/2
each time.
Let's check that the decomposition of Figure 8 is correct. Given any n-bit number x ∈ Z_{2^n}, we split its binary representation into its n − m high-order bits x_h ∈ Z_{2^{n−m}} and its m low-order bits x_ℓ ∈ Z_{2^m}. So we have x = x_h·2^m + x_ℓ, and we may write the state |x⟩ as |x_h, x_ℓ⟩ or as |x_h⟩|x_ℓ⟩.
Applying QFTn to |xi gives
QFT_n|x⟩ = (1/2^{n/2}) Σ_{y∈Z_{2^n}} e_n(xy)|y⟩ = (1/2^{n/2}) Σ_y e_n((x_h 2^m + x_ℓ)y)|y⟩. (78)
Expressing each y as y_h 2^{n−m} + y_ℓ for unique y_h ∈ Z_{2^m} and y_ℓ ∈ Z_{2^{n−m}}, (78) becomes
(1/2^{n/2}) Σ_y e_n((x_h 2^m + x_ℓ)(y_h 2^{n−m} + y_ℓ))|y⟩ = (1/2^{n/2}) Σ_y e_{n−m}(x_h y_ℓ) e_m(x_ℓ y_h) e_n(x_ℓ y_ℓ)|y⟩. (79)
(Notice that there is no x_h y_h exponent, since it is multiplied by 2^n.) Now let's see what happens when the right-hand circuit of Figure 8 acts on |x⟩. We have
|x⟩ = |x_h⟩|x_ℓ⟩
  ↦ (QFT_{n−m})  (1/2^{(n−m)/2}) Σ_{y_ℓ∈Z_{2^{n−m}}} e_{n−m}(x_h y_ℓ) |y_ℓ⟩|x_ℓ⟩
  ↦ (P_{n,m})    (1/2^{(n−m)/2}) Σ_{y_ℓ} e_{n−m}(x_h y_ℓ) e_n(y_ℓ x_ℓ) |y_ℓ⟩|x_ℓ⟩
  ↦ (QFT_m)      (1/2^{n/2}) Σ_{y_ℓ} Σ_{y_h∈Z_{2^m}} e_{n−m}(x_h y_ℓ) e_n(y_ℓ x_ℓ) e_m(x_ℓ y_h) |y_ℓ⟩|y_h⟩
  ↦              (1/2^{n/2}) Σ_{y_ℓ} Σ_{y_h} e_{n−m}(x_h y_ℓ) e_n(y_ℓ x_ℓ) e_m(x_ℓ y_h) |y_h⟩|y_ℓ⟩
  =              (1/2^{n/2}) Σ_{y∈Z_{2^n}} e_{n−m}(x_h y_ℓ) e_m(x_ℓ y_h) e_n(x_ℓ y_ℓ) |y⟩,
where we set y := y_h 2^{n−m} + y_ℓ as before. The last arrow represents the action of the qubit-permuting gate. The final state is evidently the same as in (79), so the two circuits are equal.
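The key algebraic fact behind (79)—that the x_h y_h cross term carries no phase and the rest splits as shown—can be verified directly for small n and m (the sizes below are arbitrary choices for the check):

```python
import cmath

def e(k: int, t: int) -> complex:
    """e_k(t) = exp(2*pi*i*t / 2^k), as used throughout these notes."""
    return cmath.exp(2j * cmath.pi * t / 2 ** k)

n, m = 5, 2  # arbitrary small sizes
for x in range(2 ** n):
    xh, xl = divmod(x, 2 ** m)            # x = xh*2^m + xl
    for y in range(2 ** n):
        yh, yl = divmod(y, 2 ** (n - m))  # y = yh*2^(n-m) + yl
        lhs = e(n, x * y)
        rhs = e(n - m, xh * yl) * e(m, xl * yh) * e(n, xl * yl)
        assert abs(lhs - rhs) < 1e-9      # the e_n(xh*yh*2^n) factor is 1
print("split identity verified")
```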
Finally, we get to implementing the P_{n,m} gate. We'll implement P_{n,m} entirely using controlled phase-shift gates. For any θ ∈ R, define the phase-shift gate as
P(θ) := e^{πiθ} R_z(2πθ) = [[1, 0], [0, e^{2πiθ}]].
For example, I = P(1), Z = P(1/2), S = P(1/4), and T = P(1/8). For the controlled P(θ) gate—the C-P(θ) gate—we clearly have
C-P(θ) = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, e^{2πiθ}]].
Owing to the symmetry between the control and target qubits, we will display this gate as
where we place the value θ somewhere nearby. Our θ-values will always be of the form 2^{−k} for integers k ≥ 0. Notice that for any a, b ∈ Z₂,
C-P(θ)|a⟩|b⟩ = e^{2πiθ·ab}|a⟩|b⟩. (80)
It is easiest to think of Pn,m as acting on two quantum registers—the first with n − m qubits
and the second with m qubits. What gates do we need to implement Pn,m ? Let’s consider Pn,m
applied to the state |xi|yi = |x1 x2 · · · xn−m i|y1 y2 · · · ym i, where x1 , . . . , xn−m and y1 , . . . , ym are all
bits in Z2 . We have
x/2^{n−m} = 0.x₁x₂···x_{n−m} = Σ_{j=1}^{n−m} x_j 2^{−j}  and  y/2^m = 0.y₁y₂···y_m = Σ_{k=1}^{m} y_k 2^{−k},
where the “decimal” expansions are actually base 2. Multiplying these two quantities gives
xy/2^n = Σ_{j=1}^{n−m} Σ_{k=1}^{m} x_j y_k 2^{−j−k},
111
1/4 1/8 1/16 1/32
1
2
3
4
5
1
2
3
4
1/64 1/128 1/256 1/512
Figure 9: The circuit implementing P9,4 . C-P(θ) gates are grouped according to the values of θ.
Within each group, gates act on disjoint pairs of qubits, so they can form a single layer of gates
acting in parallel.
and so
e_n(xy) = exp(2πixy/2^n) = ∏_{j,k} exp(2πi·x_j y_k·2^{−j−k}) = ∏_{j,k} e_{j+k}(x_j y_k).
Recalling (80), notice that for each j and k, we can get the (j, k)th factor in the product above if we connect the jth qubit of the first register (carrying x_j) with the kth qubit of the second register (carrying y_k) with a C-P(2^{−j−k}) gate (which then acts on the state |x_j⟩|y_k⟩ to get an overall phase contribution of e_{j+k}(x_j y_k)). So to implement P_{n,m} we just need to do this for all 1 ≤ j ≤ n − m and all 1 ≤ k ≤ m. That's it. All these gates will combine to give the correct overall phase shift of e_n(xy). The order of the gates does not matter, because they all commute with each other (they are all diagonal matrices in the computational basis). For example, Figure 9 shows the P₉,₄ circuit.
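We can check numerically that the C-P(2^{−j−k}) gates described above multiply out to exactly the phase e_n(xy) on every basis state (n = 6, m = 3 below are arbitrary choices for the check):

```python
import cmath

def e(k: int, t: int) -> complex:
    """e_k(t) = exp(2*pi*i*t / 2^k)."""
    return cmath.exp(2j * cmath.pi * t / 2 ** k)

n, m = 6, 3  # first register: n - m = 3 qubits, second register: m = 3 qubits
for x in range(2 ** (n - m)):
    for y in range(2 ** m):
        # bits x_1 .. x_{n-m} (MSB first), so that x / 2^(n-m) = 0.x_1 x_2 ...
        xb = [(x >> (n - m - j)) & 1 for j in range(1, n - m + 1)]
        yb = [(y >> (m - k)) & 1 for k in range(1, m + 1)]
        phase = 1 + 0j
        for j in range(1, n - m + 1):
            for k in range(1, m + 1):
                phase *= e(j + k, xb[j - 1] * yb[k - 1])  # one C-P(2^-(j+k)) gate
        assert abs(phase - e(n, x * y)) < 1e-9            # total phase is e_n(xy)
print("P_{n,m} phase product verified")
```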
Exercise 17.4 Give two complete decompositions of QFT4 as circuits, the first using m = 1 through-
out, and the second using m = 2 for the initial decomposition. Both circuits should use only H
and C-P(2−k ) gates for k > 0. Do not cross wires except at the end of the entire circuit, that is, shift
any wire crossings in the recursive QFT circuits to the end of the overall circuit.
In Exercise 17.4, if you correctly shifted all the wire crossings to the end of the circuit, you may
have noticed that in the end you simply reverse the order of the qubits. That is not a coincidence;
an easy inductive argument shows that this must always be the case.
Exercise 17.5 (Challenging) Asymptotically, what is the size (number of elementary gates) of QFTn
when decomposed using m = 1 throughout (Shor’s circuit)? What is the size using the divide-
and-conquer method with m = n/2 throughout? The same questions for the depth (minimum
possible number of layers of gates acting on disjoint sets of qubits). Use big-O notation. In all
cases, you can ignore the qubit-permuting gates. [Hint: Find recurrence equations satisfied by the
size and the depth in each case.]
Actually, there is another way to implement Pn,m : Classically compute xy as an n-bit binary
integer, then for each k ∈ {1, 2, . . . , n}, send the kth qubit of the result through the gate P(2−k ).
There are fast parallel circuits for multiplication, with polynomial size and depth O(lg n). This
log-depth implementation of Pn,m together with the divide-and-conquer decomposition method
for QFTn give an O(n)-depth, polynomial-size circuit that exactly implements QFTn .
Exact versus Approximate. The QFTn circuit we described above for Shor’s algorithm blithely
uses C-P(2−k ) gates where k ranges between 2 and n. If Shor’s algorithm is to significantly
outperform the best classical factoring algorithms, then n must be on the order of 10³ and above, which means that we will be using gates that produce conditional phase shifts of 2π/2^{1000} or less.
No one in their right mind imagines that we could ever tune our instruments so precisely as to
produce so small a phase shift, which is required for any exact implementation of QFT1000 . The
bottom line is that implementing QFTn exactly for large n will just never be feasible.
Fortunately, an exact implementation is unnecessary for Shor’s algorithm or for any other
probabilistic quantum algorithm that uses the QFT. We can actually tolerate a lot of imprecision
in the implementation of the C-P(2^{−k}) gates. In fact, if k ≫ lg n, then C-P(2^{−k}) is close enough
to the identity operator that we can omit these gates entirely. The resulting circuit is much smaller
and produces a good approximation to QFTn that can be used in Shor’s algorithm. Good enough
so that the probability of finding R in Step 8 of the algorithm is at worst only slightly smaller than
with the exact implementation, thus requiring only a few more repetitions of the algorithm to
produce R with high probability.
In the next few topics, we’ll make this all quantitative. The concepts and techniques we
introduce are useful in other contexts. Before we do, we need a basic inequality known as the
Cauchy-Schwarz inequality.
The Cauchy-Schwarz inequality. We mentioned this inequality early in the course as proving
the triangle inequality for complex scalars, but this is the first time since then that we actually need
it. We’ll use it here to bound the effects of unitary errors in implementing a quantum circuit. We’ll
use it again in other contexts.
Theorem 18.1 (Cauchy-Schwarz Inequality) Let H be a Hilbert space. For any vectors u, v ∈ H,
|⟨u, v⟩| ≤ ‖u‖ · ‖v‖,
with equality if and only if u and v are linearly dependent.
Proof. There are many ways to prove this theorem. The Nielsen & Chuang textbook has a proof in
Box 2.1 on page 68, which we loosely paraphrase here. See Section B.1 of the background material
in Appendix B for another proof. Equality clearly holds if u and v are linearly dependent, since
then one vector is a scalar multiple of the other. So assume that u and v are linearly independent.
By the Gram-Schmidt procedure, we can find orthonormal vectors b₁, b₂ such that b₁ = u/‖u‖ and b₂ = (v − ⟨b₁, v⟩b₁)/‖v − ⟨b₁, v⟩b₁‖. We thus have
u = a·b₁,
v = c·b₁ + d·b₂,
where a = ‖u‖ > 0, c = ⟨b₁, v⟩, and d = ‖v − ⟨b₁, v⟩b₁‖ > 0 by linear independence. Then
‖u‖ · ‖v‖ = a(|c|² + d²)^{1/2} > a(|c|²)^{1/2} = a|c| = |ac| = |⟨ab₁, cb₁ + db₂⟩| = |⟨u, v⟩|. □
Exercise 18.2 Show that ‖u + v‖ ≤ ‖u‖ + ‖v‖ for any two vectors u, v ∈ H, with equality holding if and only if one is a nonnegative scalar times the other. This is another example of a triangle inequality. [Hint: Use Cauchy-Schwarz (Theorem 18.1) and the fact that Re[z] ≤ |z| for any z ∈ C.]
A Hilbert Space Is a Metric Space. For any two vectors u, v ∈ H, the Euclidean distance between
u and v is defined as
d(u, v) := ku − vk.
The function d satisfies the following axioms:
1. d(u, v) ≥ 0,
2. d(u, v) = 0 iff u = v,
3. d(u, v) = d(v, u),
4. d(u, w) ≤ d(u, v) + d(v, w).
These are the axioms for a metric on the set H. The last item is known as the triangle inequality, which can be seen as follows:
d(u, w) = ‖u − w‖ = ‖(u − v) + (v − w)‖ ≤ ‖u − v‖ + ‖v − w‖ = d(u, v) + d(v, w),
where the inequality follows from Exercise 18.2. All the other axioms are straightforward.
Suppose that you could run an ideal quantum algorithm to produce a state |ψi that you then
subject to some projective measurement. You would get certain probabilities for the various
possible outcomes. Suppose instead that you actually ran an imperfect implementation of the
algorithm and produced a state |ϕi that was close to |ψi in Euclidean distance, and you subjected
|ϕi to the same projective measurement. The next proposition shows that the probabilities of the
outcomes are close to those of the ideal situation.
Proposition 18.3 Let {P_a : a ∈ I} be some complete set of orthogonal projectors on H. Let u, v ∈ H be any two unit vectors, and let Pr_u[a] and Pr_v[a] be the probability of seeing outcome a ∈ I when measuring the state u and v respectively using this complete set. Then for every outcome a ∈ I,
|Pr_u[a] − Pr_v[a]| ≤ 2 d(u, v).
Proof. We have
|Pr_u[a] − Pr_v[a]| = |⟨u, P_a u⟩ − ⟨v, P_a v⟩|
 = |⟨u − v, P_a u⟩ + ⟨v, P_a(u − v)⟩|
 ≤ |⟨u − v, P_a u⟩| + |⟨v, P_a(u − v)⟩|
 ≤ ‖u − v‖·‖P_a u‖ + ‖v‖·‖P_a(u − v)‖
 ≤ 2‖u − v‖.
The second inequality is an application of Cauchy-Schwarz (Theorem 18.1); the third follows from the fact that ‖Pw‖ ≤ ‖w‖ = 1 for any projector P and unit vector w (see Exercise 5.12). □
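As a quick numerical illustration of Proposition 18.3 (not part of the proof), measuring two nearby unit vectors with the computational-basis projectors gives outcome probabilities |u_a|² and |v_a|² within 2‖u − v‖ of each other; the dimension and perturbation size below are arbitrary choices:

```python
import math, random

random.seed(1)

def normalize(v):
    nrm = math.sqrt(sum(abs(z) ** 2 for z in v))
    return [z / nrm for z in v]

dim = 4
# two nearby unit vectors in C^4 (random choices)
u = normalize([complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(dim)])
v = normalize([u[a] + 0.05 * complex(random.gauss(0, 1), random.gauss(0, 1))
               for a in range(dim)])
dist = math.sqrt(sum(abs(u[a] - v[a]) ** 2 for a in range(dim)))
# projectors onto the computational basis give Pr_u[a] = |u_a|^2
for a in range(dim):
    assert abs(abs(u[a]) ** 2 - abs(v[a]) ** 2) <= 2 * dist + 1e-12
print("outcome probabilities differ by at most 2*d(u, v)")
```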
The next definition extends the notion of distance to operators. Here we give one of many possible ways to do this.
Definition 18.4 For any operator A on H, define the operator norm of A as
‖A‖ := sup_{v∈H: ‖v‖=1} ‖Av‖ = sup_{v≠0} ‖Av‖/‖v‖.
Since H is finite dimensional, the supremum above is always achieved by some vector, i.e., there is always a unit vector v such that ‖A‖ = ‖Av‖. Here are some basic properties of the operator norm that follow quickly from the definition:
1. ‖A‖ ≥ 0, with equality iff A = 0.
2. ‖cA‖ = |c| · ‖A‖ for any scalar c ∈ C.
3. ‖Av‖ ≤ ‖A‖ · ‖v‖ for any v ∈ H.
4. ‖A + B‖ ≤ ‖A‖ + ‖B‖.
5. ‖AB‖ ≤ ‖A‖ · ‖B‖.
6. ‖U‖ = 1 for any unitary U.
7. ‖UAV‖ = ‖A‖ for any unitaries U and V.
8. ‖A‖ = ‖ |A| ‖.
Exercise 18.5 Verify each of these items, based on the definition of ‖·‖.
We can use the operator norm to define a metric d on L(H), just as we did with H:
d(A, B) := ‖A − B‖
for all A, B ∈ L(H).
Picking up on the last item above, we see that A has the same norm as |A|. Since |A| ≥ 0, there is an eigenbasis {b₁, . . . , b_n} of |A| with respect to which |A| = diag(λ₁, . . . , λ_n), where λ₁ ≥ · · · ≥ λ_n ≥ 0 are the eigenvalues of |A|. We claim that ‖A‖ = λ₁, i.e., ‖A‖ is the largest eigenvalue of |A|. To see why, let v = (v₁, . . . , v_n) be any unit column vector with respect to this basis {b_j}_{1≤j≤n}. Then we have
‖Av‖² = ⟨Av, Av⟩ = ⟨v, |A|²v⟩ = Σ_{j=1}^{n} λ_j² |v_j|² = Σ_j λ_j² a_j,
where we set a_j := |v_j|². We have a_j ≥ 0 for all 1 ≤ j ≤ n, and since v is a unit vector, we have Σ_j a_j = 1. So,
‖Av‖² = Σ_{j=1}^{n} λ_j² a_j
 = λ₁² a₁ + Σ_{j=2}^{n} λ_j² a_j
 = λ₁²(1 − Σ_{j=2}^{n} a_j) + Σ_{j=2}^{n} λ_j² a_j
 = λ₁² + Σ_{j=2}^{n} (λ_j² − λ₁²) a_j.
Since λ_j² − λ₁² ≤ 0 for all 2 ≤ j ≤ n, the right-hand side is clearly maximized by setting a₂ = · · · = a_n = 0 (and so a₁ = 1). So we must have ‖A‖ = ‖ |A| ‖ = ‖ |A| b₁‖ = λ₁ as claimed.
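The claim—‖A‖ is the largest eigenvalue of |A|, i.e., the largest singular value of A—can be illustrated numerically. The following sketch estimates ‖A‖ by power iteration on AᵀA for a real matrix; the example matrix and iteration count are my own choices:

```python
import math

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def op_norm(A, iters=200):
    """Estimate ||A|| for a real matrix A as the square root of the largest
    eigenvalue of A^T A, found by power iteration."""
    n = len(A[0])
    At = [[A[i][j] for i in range(len(A))] for j in range(n)]  # transpose
    v = [1.0] * n
    for _ in range(iters):
        w = matvec(At, matvec(A, v))              # w = (A^T A) v
        nrm = math.sqrt(sum(x * x for x in w))
        v = [x / nrm for x in w]
    w = matvec(At, matvec(A, v))
    lam = sum(v[i] * w[i] for i in range(n))      # Rayleigh quotient
    return math.sqrt(lam)

A = [[1.0, 1.0], [0.0, 1.0]]
print(op_norm(A))  # ~1.618..., the golden ratio = largest singular value of A
```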
The next property follows from the claim, above.
9. If A and B are operators (not necessarily over the same space), then kA ⊗ Bk = kAk · kBk. In
particular, kA ⊗ Ik = kAk and kI ⊗ Bk = kBk.
This property is useful when we take the norm of a single gate in a circuit. The unitary operator
corresponding to the action of the gate is generally of the form U ⊗ I, where U corresponds to the gate acting on the space of its own qubits, and the identity I acts on the qubits not involved with
the gate. Property 9 says that we can ignore the I when taking the norm of this operator.
To prove Property 9, we first prove that |A ⊗ B| = |A| ⊗ |B|. To show this, we only need to verify two things: (i) (|A| ⊗ |B|)² = (A ⊗ B)∗(A ⊗ B) and (ii) |A| ⊗ |B| ≥ 0. We leave (i) as an exercise. For (ii), we first pick eigenbases for |A| and |B|, respectively. Then if |A| = diag(λ₁, . . . , λ_n) with respect to the first basis and |B| = diag(μ₁, . . . , μ_m) with respect to the second, then with respect to the product of the two bases (itself an orthonormal basis), |A| ⊗ |B| is a diagonal matrix whose diagonal entries are λ_j μ_k for all 1 ≤ j ≤ n and 1 ≤ k ≤ m. Since all the λ_j and μ_k are nonnegative, the diagonal entries of |A| ⊗ |B| are all nonnegative. Hence, |A| ⊗ |B| ≥ 0, which proves (ii), and thus |A ⊗ B| = |A| ⊗ |B|. Now the largest eigenvalue of |A| ⊗ |B| is clearly λμ, where λ = max(λ₁, . . . , λ_n) = ‖A‖ and μ = max(μ₁, . . . , μ_m) = ‖B‖ by the claim. Since |A| ⊗ |B| = |A ⊗ B|, the product λμ is also the largest eigenvalue of |A ⊗ B|, and so using the claim again, we get
Property 9.
Exercise 18.7 Verify by direct calculation that (|A| ⊗ |B|)² = (A ⊗ B)∗(A ⊗ B).
While we’re on the subject, one more property of the operator norm will find use later on. If
you want, you can skip down to after the proof of Claim 18.8, below, and refer back to it later when
you need to.
Claim 18.8 For any operator A, the operators |A| and |A∗ | are unitarily conjugate, i.e., there is a unitary
operator U such that |A∗ | = U|A|U∗ .
Since unitarily conjugate operators have the same spectrum, Claim 18.8 implies that |A| and |A∗ |
have the same largest eigenvalue, i.e., kAk = kA∗ k. Claim 18.8 itself follows from a fundamental
decomposition theorem known as the polar decomposition. For a proof of this decomposition, see
Section B.3 in Appendix B. The polar decomposition is closely related (in fact, equivalent) to the
singular value decomposition, which is also proved in Section B.3.
Theorem 18.9 (Polar Decomposition, Theorem B.8 in Section B.3) For every operator A there is a
unitary U such that A = U|A|. In fact, |A| is the unique positive operator H such that A = UH for some
unitary U.
If z ∈ C is a scalar, then obviously z = u|z| for some u ∈ C with unit norm (i.e., a phase factor). Furthermore, |z| is the unique nonnegative real factor in any such decomposition, and if z ≠ 0 then u is unique as well. Theorem 18.9 generalizes this fact to operators in an analogous way. (If A is nonsingular (invertible), then U is unique as well: it can be easily shown that if A is nonsingular then |A| is nonsingular, whence U = A|A|⁻¹.)
Proof of Claim 18.8. Let A be an operator. By the polar decomposition (Theorem 18.9), there is a
unitary U such that A = U|A|. We have, using Exercise 9.32,
|A∗| = √(AA∗) = √(U|A|²U∗) = U√(|A|²)U∗ = U|A|U∗. □
Now we consider an arbitrary idealized quantum circuit C with m many unitary gates, which basically consists of a succession of unitary operators U₁, . . . , U_m applied to some initial state |init⟩, producing the state |ψ⟩ = U_m · · · U₁|init⟩, which is then projectively measured somehow.
implementing C we might implement each gate Uj imperfectly, getting some unitary Vj instead,
where hopefully, Vj is close to Uj . I will call this a unitary error. The actual circuit produces the
state |ψ 0 i = Vm · · · V1 |initi. Assuming d(Uj , Vj ) 6 ε for all 1 6 j 6 m, what can we say about
d(|ψi, |ψ 0 i)?
Classical calculations are often numerically unstable, and errors may compound multiplica-
tively. Fortunately for us, unitary errors only compound additively rather than multiplicatively,
so we can tolerate a fair amount of imperfection in our gates—only O(lg n) bits of precision per
gate for a circuit with a polynomially bounded (in n) number of gates.
Back to the question above. Using the basic properties of the operator norm listed above, we get
d(|ψ⟩, |ψ′⟩) = ‖(U_m · · · U₁ − V_m · · · V₁)|init⟩‖
 ≤ ‖U_m · · · U₁ − V_m · · · V₁‖ · ‖|init⟩‖
 = ‖U_m · · · U₁ − V_m · · · V₁‖.
The operator inside the ‖·‖ on the right can be expressed as a telescoping sum:
U_m · · · U₁ − V_m · · · V₁ = Σ_{k=1}^{m} U_m · · · U_{k+1}(U_k − V_k)V_{k−1} · · · V₁. (81)
Therefore,
‖U_m · · · U₁ − V_m · · · V₁‖ = ‖Σ_{k=1}^{m} U_m · · · U_{k+1}(U_k − V_k)V_{k−1} · · · V₁‖
 ≤ Σ_k ‖U_m · · · U_{k+1}(U_k − V_k)V_{k−1} · · · V₁‖
 = Σ_k ‖U_k − V_k‖
 ≤ Σ_k ε = mε,
and so d(|ψ⟩, |ψ′⟩) ≤ mε.
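Here is a small numerical illustration of the additive bound d(|ψ⟩, |ψ′⟩) ≤ mε, using a chain of one-qubit phase gates (diagonal unitaries, so all the operator norms are easy to compute exactly); the particular angles and perturbation are arbitrary choices for the sketch:

```python
import cmath

def pgate(theta):
    """P(theta) = diag(1, e^{2 pi i theta}), stored as its diagonal."""
    return [1, cmath.exp(2j * cmath.pi * theta)]

def compose(gates):
    """Product of diagonal gates: diagonals multiply entrywise."""
    d = [1 + 0j, 1 + 0j]
    for g in gates:
        d = [d[0] * g[0], d[1] * g[1]]
    return d

def dist(d1, d2):
    """Operator norm of a difference of diagonal operators = largest entry gap."""
    return max(abs(a - b) for a, b in zip(d1, d2))

m, delta = 50, 1e-4  # 50 gates, each angle off by delta (arbitrary choices)
ideal = [pgate(1 / 2 ** (j % 10 + 1)) for j in range(m)]
noisy = [pgate(1 / 2 ** (j % 10 + 1) + delta) for j in range(m)]
eps = max(dist(u, v) for u, v in zip(ideal, noisy))  # per-gate error
total = dist(compose(ideal), compose(noisy))         # end-to-end error
assert total <= m * eps + 1e-12                      # errors add; they don't multiply
```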
Suppose we want the probability of some outcome to differ from the ideal probability by no more than some δ > 0. Then by Proposition 18.3, it suffices that 2mε ≤ δ, or that
ε ≤ δ/(2m).
For example, the entire quantum circuit for Shor's algorithm has size polynomial in n—let's say at most cn^k gates for some constants c and k. (I'm not sure, but I believe that k ≤ 3. The dominant contribution is not the QFT but rather the classical modular exponentiation circuit.) The algorithm produces a good y (one that will lead to finding R) with probability at least 4/(π²n), ignoring a quadratically small correction term. We could settle instead for a success probability of at least 2/(π²n), say, which would require up to twice as many trials on average for success. But then, choosing δ := 4/(π²n) − 2/(π²n) = 2/(π²n), we could implement each gate to within an error (operator distance) of
ε_Shor := (2/(π²n))/(2cn^k) = 1/(π²cn^{k+1}) = Θ(n^{−k−1})
away from the ideal. This has major implications for the QFT part of the circuit. The QFT has size
Θ(n²), uses n Hadamard gates, and the rest of the gates are C-P(2^{−j}) gates, where 2 ≤ j ≤ n. (We
can do without the swap gates by keeping track of which qubit is which, and rearranging the bits
of the y value that we measure.) Note that for any θ ∈ R,
C-P(θ) − I = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, e^{2πiθ} − 1]] = 2ie^{iπθ} sin(πθ) · [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]].
It follows that
d(C-P(θ), I) = ‖C-P(θ) − I‖ = 2|sin(πθ)| ≤ 2πθ.
This means that if 2π·2^{−j} ≤ ε_Shor, or equivalently, if j ≥ lg(2π/ε_Shor) = (k + 1) lg n + O(1), then any C-P(2^{−j}) in the QFT circuit is close enough to I that we can just omit it. It's easy to see
that most of the QFT gates are like this and can be omitted, shrinking the QFT portion of the circuit
from quadratic size to linear size in n. This fact was first observed by Coppersmith.
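The bound d(C-P(θ), I) = 2|sin(πθ)| ≤ 2πθ is easy to check numerically, along with a rough count of how few phase angles survive a lg n-scale cutoff (the cutoff constant 8 below is an arbitrary illustration, not a tuned value):

```python
import cmath, math

def dist_to_identity(theta):
    """d(C-P(theta), I) = |e^{2 pi i theta} - 1|."""
    return abs(cmath.exp(2j * math.pi * theta) - 1)

n = 1000
for j in range(2, n + 1):
    theta = 2.0 ** (-j)
    d = dist_to_identity(theta)
    assert abs(d - 2 * abs(math.sin(math.pi * theta))) < 1e-12  # = 2|sin(pi theta)|
    assert d <= 2 * math.pi * theta + 1e-15                     # <= 2 pi theta

# keeping only gates with j up to a lg(n)-scale cutoff
cutoff = 8 * int(math.log2(n))
kept = sum(1 for j in range(2, n + 1) if j <= cutoff)
print(kept, "of", n - 1, "phase angles survive the cutoff")
```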
For n = 10³ and assuming k = 3, we can get by with implementing each gate with error O(n^{−4}), which is on the order of one part per trillion. This is still a very tall order, but unlike 2^{−1000} it is
at least close to the realm of sanity. Optimizing other aspects of Shor’s algorithm and its analysis
increases the error tolerance considerably.
19 Midterm Exam
Do all problems. Hand in your answers in class on Wednesday, March 28, just as you would a
homework problem set. The only difference between this exam and the homeworks is that you
may not discuss exam questions or answers with anyone inside or outside of class except me. It
goes without saying that if you do, you have cheated and I’ll have to summarily fail you, which is
my usual policy about cheating. I know you won’t, though, so I’ll sleep well at night.
All questions with Roman numerals carry equal weight, but may not be of equal difficulty.
Recall that for two vectors or operators a, b, we say that a ∝ b if there is a phase factor e^{iθ}, where θ ∈ R, such that a = e^{iθ}b.
I) (Rotating the Bloch sphere) Find a unit vector n̂ = (x, y, z) ∈ R3 on the Bloch sphere and an
angle ϕ ∈ [0, 2π) such that
where Rn̂ (ϕ) is defined in Exercise 9.4, and |+xi, |+yi, and |+zi are given by Equations (18–
20). Give the 2 × 2 matrix corresponding to your solution in the standard computational
basis, simplified as much as possible. What can you say about Rn̂ (ϕ)|+zi? There are exactly
two possible solutions to this problem.
II) (Phase factors and density operators) Let U and V be unitary operators over H. It is easy to
see that if U ∝ V, then UρU∗ = VρV ∗ for every state ρ. (Here, by “state” we mean a state in
the density operator formalism, i.e., a one-dimensional projection operator of the form |ψihψ|
for some unit vector |ψi.) Show the converse: If U and V are unitary and UρU∗ = VρV ∗ for
all states ρ, then U ∝ V. [Hint: Consider U and V in matrix form and show that every entry
of U is equal to the corresponding entry of V multiplied by the same phase factor. Use the
equation above for specific values of ρ. This technique is similar to that used in Exercise 9.26.]
III) (Tensor products of matrices) Let A be an arbitrary n × n matrix and let B be an arbitrary
m × m matrix.
(a) If A and B are both upper triangular, explain why A ⊗ B is also upper triangular.
(b) Suppose that A has eigenvalues λ1 , . . . , λn (with multiplicities), and that B has eigenval-
ues µ1 , . . . , µm (with multiplicities). Describe the eigenvalues of A ⊗ B. Note that here,
A and B are not necessarily upper triangular. [Hint: Use the previous item and things
we know about the eigenvalues of upper triangular matrices.]
IV) (Teleportation gone wrong) Alice and Bob think they are sharing a pair of qubits in the state
|Φ+ i, but instead the pair of qubits that they share is in one of the other three Bell states.
Suppose that they now attempt to do the standard one-qubit teleportation protocol to teleport
the state |ψi from Alice to Bob using this pair.
(a) Show that the state that Bob possesses at the end is, up to a phase factor, some Pauli
operator (X, Y, or Z) applied to |ψi. [Hint: You can save yourself a lot of calculation by
observing that the four Bell states are of the form (I ⊗ σ)|Φ+ i for σ ∈ {I, X, Z, XZ}.]
(b) Supposing Alice and Bob know that they share a pair of qubits in the state |Ψ− i, show
how they can alter their protocol to faithfully teleport |ψi. [Hint: Use the previous
item.]
V) (Finding a hidden linear function) Let s ∈ (Z₂)ⁿ be an unknown string, and let f : (Z₂)ⁿ → Z₂ be given by
f(x) = s · x,
where s · x = Σ_{j=1}^{n} s_j x_j mod 2 is the standard dot product of s and x over Z₂. Recall the inversion gate I_f such that
I_f|x⟩ = (−1)^{f(x)}|x⟩
for all x ∈ (Z₂)ⁿ. The following describes a circuit that uses I_f once to find s:
Do the following:
20 Week 10: Grover’s algorithm
Quantum Search. You are given an array A[1 . . . N] of N values, one of which is a recognizable
target value t. You want to find the position w of t in the list. The values are not necessarily sorted
or arranged in any particular way. Classically, the best you can do in the worst case is to probe
all A[j] for 1 6 j 6 N, and find the target on the last probe. On average, you will need about N/2
probes before finding the target with high probability.
With a quantum algorithm, you can find the target with (extremely) high probability using only O(√N) many probes, giving a quadratic speed-up. This result is due to Lov Grover, and is
known as Grover’s quantum search algorithm. It has many variants, but we only give the simplest
one here to give an idea of how it works.
We assume that N = 2n for some n and that we have a black-box Boolean function f : {0, 1}n →
{0, 1} available such that there is a unique w ∈ {0, 1}ⁿ such that f(w) = 1 and f(z) = 0 for all z ≠ w.
Think of f as the target detector. Our task is to find w.
We assume that we can use n-qubit I_f gates, where we recall that
I_f|x⟩ = (−1)^{f(x)}|x⟩
for all x ∈ {0, 1}ⁿ. In the present case, we have I_f|w⟩ = −|w⟩ and I_f|z⟩ = |z⟩ if z ≠ w. Note that given the promise
about f, we have If = diag(1, . . . , 1, −1, 1, . . . , 1), where the −1 occurs at position w. Thus,
If = I − 2|wihw|.
Each use of an If gate will count as a probe. We will also use the gate
I0 = I − 2|0n ih0n |,
which flips the sign of |0n i but leaves all other basis states alone. I0 can be implemented by an
O(n)-size O(lg n)-depth circuit using H, X, and CNOT gates. Finally we assume that we have
some n-qubit unitary U available such that hw|U|0n i , 0. Setting x := hw|U|0n i and by adjusting
U by a phase factor if necessary, we can assume that x > 0. The larger x is, the better. If we let
U = H^{⊗n} be a layer of n Hadamard gates, then we can get
x = ⟨w|U|0ⁿ⟩ = 2^{−n/2} Σ_{z∈{0,1}ⁿ} ⟨w|z⟩ = 2^{−n/2} = 1/√N.
It turns out that we can’t do better than this in the worst case. Grover’s algorithm now works as
follows:
1. Start with n qubits in the state |0ⁿ⟩.
2. Apply U to get the state |s⟩ = U|0ⁿ⟩. We call |s⟩ the start state. Note that x = ⟨w|s⟩ = ⟨s|w⟩ > 0.
We’ll assume that x < 1, or equivalently, that |si and |wi are linearly independent; otherwise,
|si ∝ |wi and we can skip the next step entirely. For U implemented with Hadamards as
above, this assumption clearly holds.
3. Apply G to |s⟩ ⌊π/(4 sin⁻¹ x)⌋ many times, where
G := −UI0 U∗ If
is known as the Grover iterate.
4. Measure the n qubits in the computational basis, obtaining a value y ∈ {0, 1}n .
We'll show that y = w with high probability. Note that if x = 1/√N, then ⌊π/(4 sin⁻¹ x)⌋ ≈ π/(4x) = Θ(√N), and so there are Θ(√N) many probes, since G consists of one probe.
We expand G:
G = −UI₀U∗I_f
 = −U(I − 2|0ⁿ⟩⟨0ⁿ|)U∗(I − 2|w⟩⟨w|)
 = (2U|0ⁿ⟩⟨0ⁿ|U∗ − I)(I − 2|w⟩⟨w|)
 = (2|s⟩⟨s| − I)(I − 2|w⟩⟨w|)
 = −I + 2|s⟩⟨s| + 2|w⟩⟨w| − 4x|s⟩⟨w|.
Applying the right-hand side to |si and |wi immediately gives us
G|s⟩ = (1 − 4x²)|s⟩ + 2x|w⟩,
G|w⟩ = −2x|s⟩ + |w⟩.
So we see that G|si and G|wi are both (real) linear combinations of |si and |wi. Thus G maps
the plane spanned by |si and |wi into itself, and all intermediate states of the algorithm lie in this
plane. Thus we can now restrict our attention to this two-dimensional subspace S.
Using Gram-Schmidt, we pick an orthonormal basis for S, with |w⟩ being one vector and |r⟩ := |r′⟩/‖|r′⟩‖ being the other, where |r′⟩ := |s⟩ − x|w⟩. We have
‖|r′⟩‖² = ⟨s|s⟩ − 2x⟨s|w⟩ + x²⟨w|w⟩ = 1 − x²,
and so
|r⟩ = (|s⟩ − x|w⟩)/√(1 − x²).
It is easily checked that ⟨r|w⟩ = 0. Let 0 < θ < π/2 be such that x = sin θ. Expressing |s⟩ in the {|r⟩, |w⟩} basis, we get
|s⟩ = √(1 − x²)|r⟩ + x|w⟩ = cos θ|r⟩ + sin θ|w⟩ = (cos θ, sin θ)ᵀ.
Let’s express G with respect to the same {|ri, |wi} basis. Note that restricted to the subspace S, the
identity I has the same effect as the orthogonal projector PS = |rihr| + |wihw| projecting onto S:
they both fix all vectors in S. It follows that, restricted to S,
G = −P_S + 2|s⟩⟨s| + 2|w⟩⟨w| − 4x|s⟩⟨w|
 = −|r⟩⟨r| − |w⟩⟨w| + 2(cos θ|r⟩ + sin θ|w⟩)(cos θ⟨r| + sin θ⟨w|) + 2|w⟩⟨w| − 4 sin θ(cos θ|r⟩ + sin θ|w⟩)⟨w|
 = (2cos²θ − 1)|r⟩⟨r| − 2 cos θ sin θ|r⟩⟨w| + 2 sin θ cos θ|w⟩⟨r| + (1 − 2sin²θ)|w⟩⟨w|
 = [[cos(2θ), −sin(2θ)], [sin(2θ), cos(2θ)]].
Geometrically, if we identify |ri with the point (1, 0) ∈ R2 and |wi with the point (0, 1) ∈ R2 , then
|si is the point in the first quadrant of the unit circle, forming angle θ with |ri. Also, G is seen to
give a counterclockwise rotation of the circle through angle 2θ. We want the state to wind up as
close to |wi as possible, which makes an angle π/2 with |ri. Applying G m times puts the state at
an angle (2m + 1)θ from |r⟩, so we solve
(2m + 1)θ = π/2 ⟺ m = π/(4θ) − 1/2 = π/(4 sin⁻¹ x) − 1/2.
Rounding to the nearest integer gives m = ⌊π/(4 sin⁻¹ x)⌋, which is the number of times we apply
G to |si. The final state is within an angle θ of |wi, so the probability of getting w as the result of the
measurement is at least cos2 θ = 1 − x2 = 1 − 2−n = 1 − 1/N (if x = 2−n/2 ), which is exponentially
close to 1.
Interestingly, if we apply G too many times, then we start drifting away from |wi and the
probability of getting w in the measurement will start going down again to about zero at 2m
applications, then it will oscillate back to one at about 3m, then close to zero again at 4m, et cetera.
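The rotation picture, including this oscillation, is easy to check numerically. The following sketch (an illustration, not part of the development above; the choice n = 10 is arbitrary) uses only the angle θ, with no state-vector simulation:

```python
import math

# Quick numerical check of the rotation picture for single-target Grover
# search. n = 10 is an arbitrary choice; N = 2**n and x = <w|s> = 2**(-n/2).
n = 10
x = 2 ** (-n / 2)
theta = math.asin(x)                      # x = sin(theta)
m = math.floor(math.pi / (4 * theta))     # optimal number of iterations

def success_prob(iters):
    # After `iters` applications of G the state makes angle (2*iters + 1)*theta
    # with |r>, so the amplitude on |w> is sin((2*iters + 1)*theta).
    return math.sin((2 * iters + 1) * theta) ** 2

print(m, success_prob(m))       # success probability very close to 1
print(success_prob(2 * m))      # overshooting: back down near 0
print(success_prob(3 * m))      # and up near 1 again
```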
Some Variants of Quantum Search. An obvious variant is to assume that f(z) = 1 for at most
one z, rather than for exactly one z. For this variant, one can run Grover’s algorithm just as before,
but check that the final result y is such that f(y) = 1, using one more probe of f. If not, then you
can conclude that f is the constant zero function, and you’d be wrong with exponentially small
probability.
Another variant is when there are exactly k many z such that f(z) = 1, where k is known, and
your job is to find the location of any one of them. This is the subject of the next exercise.
Exercise 20.1 (Somewhat challenging) Show that if there are exactly k many z such that f(z) = 1,
where 0 < k < 2^n is known, then one of the targets can be found with high probability using
O(√(N/k)) probes to f. [Hint: Let U = H^⊗n, let |si = U|0^n i = 2^(−n/2) Σ_{z∈{0,1}^n} |zi be the start
state, and let G = −UI₀U∗ If = −(I − 2|sihs|)If be the Grover iterate, all as before. Run Grover's
algorithm as before, applying G some number of times to |si. To see how many times to apply G:

1. Define the state |wi to be the uniform superposition of all target locations:

|wi := (1/√k) Σ_{z:f(z)=1} |zi.

2. Likewise, define the state |ri to be the superposition of all nontarget locations:

|ri := (1/√(2^n − k)) Σ_{z:f(z)=0} |zi.
3. Show that x := hw|si = √(k/2^n) = √(k/N).

4. Define 0 < θ < π/2 such that x = sin θ, just as before, and show that |si = cos θ|ri + sin θ|wi,
just as before.

5. Show that G|ri and G|wi are both real linear combinations of |ri and |wi, just as before. Note that G = −(I − 2|sihs|)If ≠ −(I − 2|sihs|)(I − 2|wihw|), so the calculation
must be a bit different from before. You might observe that If has the same effect as I − 2|wihw|
within the space spanned by |ri and |wi, but you can't use this fact until you establish that G
maps this space into itself. Better to just do the calculations above directly.

6. Conclude that G maps the space spanned by the orthonormal set {|ri, |wi} into itself, and its
matrix looks the same as before.

7. Conclude that ⌊π/(4θ)⌋ is the right number of applications of G, since measuring the qubits in
a state close to |wi returns some target location with high probability. Show that ⌊π/(4θ)⌋ =
Θ(√(N/k)).]
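As a numerical sanity check of the claimed iteration count (not a solution to the exercise), one can simulate Grover's algorithm with k targets directly on the 2^n-dimensional state vector; the parameters n = 8, k = 5 below are arbitrary choices:

```python
import math
import random

# Brute-force state-vector simulation of Grover's algorithm with k targets.
n, k = 8, 5
N = 2 ** n
targets = set(random.Random(0).sample(range(N), k))   # arbitrary target set

state = [1 / math.sqrt(N)] * N        # the start state |s>
s = list(state)                       # keep a copy of |s> for the reflection

theta = math.asin(math.sqrt(k / N))   # sin(theta) = sqrt(k/N)
m = math.floor(math.pi / (4 * theta))

for _ in range(m):
    # Apply I_f: flip the sign of every target amplitude.
    state = [(-a if z in targets else a) for z, a in enumerate(state)]
    # Apply -(I - 2|s><s|): reflect about |s>, then negate.
    ip = sum(si * ai for si, ai in zip(s, state))     # <s|state>
    state = [-(a - 2 * si * ip) for si, a in zip(s, state)]

p_target = sum(a * a for z, a in enumerate(state) if z in targets)
print(m, p_target)    # probability of measuring some target, close to 1
```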
A Lower Bound on Quantum Search. The number of probes to the function f in Grover's search
algorithm is asymptotically optimal. That is, no quantum algorithm can find a unique target in a
search space of size N with high probability using o(√N) probes. This bound is due to Bennett,
Bernstein, Brassard, and Vazirani, and predates Grover's algorithm. It is one of the earliest results
in the area of quantum query complexity.
Suppose we are given an arbitrary r-qubit quantum circuit C of unitary gates followed by an
n-qubit measurement in the computational basis. We assume that the initial state of the r qubits
is some fixed |0i, and that C may contain some number of n-qubit If gates, which allow it to make
queries to a Boolean function f : {0, 1}^n → {0, 1}. To prove a lower bound, our goal is to find some
f corresponding to a unique target w ∈ {0, 1}^n (i.e., f(w) = 1 and f(z) = 0 for all z ≠ w) such that
w is unlikely to be the final measurement result. The particular w that we choose will depend on
the circuit C.
Here’s the basic intuition. Suppose C contains some number of If gates. Just before one of
these gates is applied, the state of its input qubits is generally some superposition of states |zi with
z ∈ {0, 1}n . There are 2n many such z, and since the state is a unit vector, most of the corresponding
probability amplitudes must be close to zero. If the probability amplitude of some |wi is small,
then changing f(w) from 0 to 1 just flips the sign of this term in the superposition, which in turn
makes little difference to the overall state and is likely to go unnoticed. We want to choose w so
that this is true for all the If gates in C, as well as the final state of the measured qubits.
Now the details. This development is loosely adapted from pages 269–271 of the textbook,
except that, unlike the textbook, we do not implicitly assume that our circuit C has only n qubits.
Suppose that the circuit C has m many If gates, for some m > 0. For any f, the circuit C corresponds
to the unitary transformation Um If^(m) Um−1 If^(m−1) · · · U1 If^(1) U0 , where
• each If^(j) is the unitary operator corresponding to the jth If gate, acting on some sequence of
n of the r qubits,

• U0 represents all the unitary gates applied prior to If^(1),

• Um represents all the unitary gates applied after If^(m), and

• for all 0 < j < m, Uj represents all the unitary gates applied strictly in between If^(j) and
If^(j+1).
First run the circuit with f set to the constant zero function, so that every If gate acts as the
identity, and for each 0 ≤ j ≤ m let |ψ^(j)i := Uj Uj−1 · · · U1 U0 |0i be the state of the circuit just
after the application of Uj . For 0 ≤ j < m, we uniquely factor |ψ^(j)i as

|ψ^(j)i = Σ_{x∈{0,1}^n} |xi|βx^(j)i,

where the first ket in each term represents a basis state of the n qubits entering the (j + 1)st If gate,
and the second ket is a (not necessarily unit) vector representing the other r − n qubits. Likewise,
we uniquely factor |ψ^(m)i as

|ψ^(m)i = Σ_{x∈{0,1}^n} |xi|βx^(m)i,

where here the first ket in each term represents a basis state of the n qubits that are about to be
measured, and the second ket is a vector representing the r − n unmeasured qubits.
Since |ψ^(j)i is a state, we have, for all 0 ≤ j ≤ m,

1 = hψ^(j)|ψ^(j)i = Σ_{x∈{0,1}^n} hβx^(j)|βx^(j)i = Σ_x k|βx^(j)ik². (82)
Let w ∈ {0, 1}^n be arbitrary. Now we run C again with Iw gates. For 0 ≤ j ≤ m, define

|ϕw^(j)i := Uj Iw^(j) Uj−1 · · · U1 Iw^(1) U0 |0i

to be the state of the circuit just after the application of Uj . We claim that there are many values of
w for which |ϕw^(j)i does not differ too much from |ψ^(j)i, for any 1 ≤ j ≤ m. For each 0 ≤ j ≤ m,
define |ηw^(j)i := |ϕw^(j)i − |ψ^(j)i.
We want to show that enough of the vectors |ηw^(j)i have small norm. For each j, define

D(j) := Σ_{w∈{0,1}^n} k|ηw^(j)ik².

Claim: For all 0 ≤ j ≤ m, we have D(j) ≤ 4j².
Proof. We proceed by induction on j. For j = 0, we have |ϕw^(0)i = U0 |0i = |ψ^(0)i and thus
|ηw^(0)i = 0 for all w, and so the claim clearly holds. Now for the inductive case where 0 ≤ j < m,
we want to express |ηw^(j+1)i in terms of |ηw^(j)i. We have, for all w,
|ϕw^(j+1)i = Uj+1 Iw^(j+1) |ϕw^(j)i
= Uj+1 Iw^(j+1) ( |ψ^(j)i + |ηw^(j)i )
= Uj+1 Iw^(j+1) Σ_{x∈{0,1}^n} |xi|βx^(j)i + Uj+1 Iw^(j+1) |ηw^(j)i
= Uj+1 ( Σ_x (Iw |xi) ⊗ |βx^(j)i ) + Uj+1 Iw^(j+1) |ηw^(j)i
= Uj+1 ( Σ_x (|xi − 2|wihw|xi) ⊗ |βx^(j)i ) + Uj+1 Iw^(j+1) |ηw^(j)i
= Uj+1 |ψ^(j)i − 2Uj+1 |wi|βw^(j)i + Uj+1 Iw^(j+1) |ηw^(j)i
= |ψ^(j+1)i − 2Uj+1 |wi|βw^(j)i + Uj+1 Iw^(j+1) |ηw^(j)i.
Subtracting, we get

|ηw^(j+1)i = |ϕw^(j+1)i − |ψ^(j+1)i = Uj+1 ( Iw^(j+1)|ηw^(j)i − 2|wi|βw^(j)i ),

whence, since Uj+1 and Iw^(j+1) are unitary,

k|ηw^(j+1)ik² = k Iw^(j+1)|ηw^(j)i − 2|wi|βw^(j)i k² ≤ ( k|ηw^(j)ik + 2k|βw^(j)ik )².
Expanding and summing over w ∈ {0, 1}^n, we have

D(j+1) ≤ D(j) + 4 Σ_{w∈{0,1}^n} k|ηw^(j)ik · k|βw^(j)ik + 4 Σ_{w∈{0,1}^n} k|βw^(j)ik² = D(j) + 4hκ, λi + 4,

where we have used Equation (82) for the last term, and where κ and λ are 2^n-dimensional
column vectors whose entries, indexed by w, are k|ηw^(j)ik and k|βw^(j)ik, respectively. We can apply
Cauchy-Schwarz to hκ, λi:
hκ, λi = |hκ, λi| ≤ kκk · kλk = ( Σ_w k|ηw^(j)ik² )^(1/2) ( Σ_w k|βw^(j)ik² )^(1/2) = √(D(j)) · √1 = √(D(j)),
using (82) again. Plugging this in above and using the inductive hypothesis, we have

D(j+1) ≤ D(j) + 4√(D(j)) + 4 ≤ 4j² + 8j + 4 = 4(j + 1)²,

which completes the induction. 2

By the claim, D(m) ≤ 4m². Since Σ_w k|ηw^(m)ik² = D(m) and Σ_w k|βw^(m)ik² = 1 by (82), fewer
than half of the w ∈ {0, 1}^n can satisfy k|ηw^(m)ik² > 2D(m)/2^n, and fewer than half can satisfy
k|βw^(m)ik² > 2/2^n. Fix a w satisfying neither, so that k|ηw^(m)ik ≤ 2√2 m/2^(n/2) and
k|βw^(m)ik ≤ √2/2^(n/2). Uniquely factor

|ϕw^(m)i = Σ_{x∈{0,1}^n} |xi|γx^(m)i,

where (as with |ψ^(m)i) the first ket represents the n qubits that are about to be measured, and the
second ket represents the other qubits (and is not necessarily a unit vector). The probability of
seeing w as the outcome of the measurement when the state is |ϕw^(m)i is then Pr[w] = k|γw^(m)ik²,
but this value is quite small, provided m is not too large:
k|γw^(m)ik = k |γw^(m)i − |βw^(m)i + |βw^(m)i k
≤ k |γw^(m)i − |βw^(m)i k + √2/2^(n/2)
= k |wi ⊗ (|γw^(m)i − |βw^(m)i) k + √2/2^(n/2)
= k (|wihw| ⊗ I)(|ϕw^(m)i − |ψ^(m)i) k + √2/2^(n/2)
= k (|wihw| ⊗ I)|ηw^(m)i k + √2/2^(n/2)
≤ k|ηw^(m)ik + √2/2^(n/2)
≤ (2m + 1)√2 / 2^(n/2).
And so we get that Pr[w] ≤ (2m + 1)²/2^(n−1) = O(m²/2^n). So finally, if m = o(2^(n/2)), we have
Pr[w] = o(1), i.e., Pr[w] approaches zero as n gets large, and the circuit likely won't find w.
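Incidentally, the recurrence used in the induction is exactly tight: since D + 4√D + 4 = (√D + 2)², the worst case of the recurrence satisfies √(D(j+1)) = √(D(j)) + 2, so D(j) = (2j)² = 4j² exactly. A short check (an illustration, not part of the proof):

```python
import math

# The recurrence D(j+1) <= D(j) + 4*sqrt(D(j)) + 4 is tight: since
# D + 4*sqrt(D) + 4 = (sqrt(D) + 2)**2, the saturated recurrence has
# sqrt(D(j+1)) = sqrt(D(j)) + 2, i.e., D(j) = 4*j**2 exactly.
D = 0.0
for j in range(100):
    assert abs(D - 4 * j ** 2) < 1e-6    # matches the inductive bound exactly
    D = D + 4 * math.sqrt(D) + 4         # saturate the recurrence
print(D)    # D(100) = 40000.0
```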
22 Week 11: Quantum cryptography
Quantum Cryptographic Key Exchange. If Alice and Bob share knowledge of a secret string r
of random bits, then Alice can send a message m with the same number of bits as r to Bob over
a channel subject to eavesdropping with perfect secrecy, i.e., no third party (Eve), monitoring the
channel with no knowledge of r, can gain any knowledge about m whatsoever. This scheme,
known as a one-time pad, works as follows:
1. Alice computes c = m ⊕ r, the bitwise exclusive OR of m and r. The message m is called the
cleartext or plaintext, and c is called the ciphertext.
2. Alice transmits the ciphertext c to Bob over the channel, which we'll assume is publicly
readable, e.g., a newspaper or an internet bulletin board.
3. Bob gets c and computes m = c ⊕ r, thus recovering the cleartext m.
All Eve sees is c = m ⊕ r, and since she doesn’t know r which is assumed to be uniformly random,
the bits of c look completely random to her—all possible c’s are equally likely if all possible r’s are
equally likely. Hence the perfect secrecy.
It’s called a one-time pad for a reason: r cannot be reused to send another message. Suppose
Alice sends another message m 0 using the same r to transmit c 0 = m 0 ⊕ r. Then Eve can compute
c ⊕ c 0 = (m ⊕ r) ⊕ (m 0 ⊕ r) = m ⊕ m 0 ⊕ r ⊕ r = m ⊕ m 0 .
If m and m 0 are both uncompressed files of English text, then they have enough redundancy that
Eve can gain some knowledge of m and m 0 from their XOR, and likely can even decipher both m
and m 0 uniquely from m ⊕ m 0 if the messages are long enough.
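Both the pad itself and the two-time-pad leak are easy to demonstrate in code. In this sketch, messages and key are bytes rather than single bits (XOR acts bitwise either way), and the example messages are arbitrary:

```python
import secrets

# One-time pad, and the leak from reusing the pad.
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

m  = b"ATTACK AT DAWN"
m2 = b"RETREAT AT SIX"                # second message, same key: a mistake
r  = secrets.token_bytes(len(m))      # uniformly random pad

c, c2 = xor(m, r), xor(m2, r)
assert xor(c, r) == m                 # Bob recovers the cleartext
assert xor(c, c2) == xor(m, m2)       # Eve learns m XOR m2 without knowing r
```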
If r is short, say, only a few hundred bits long, then Alice can only transmit that many bits
in her message with a one-time pad. It is more practical instead for Alice and Bob to use r as the
key to some symmetric cipher by which they can communicate longer messages. Some commonly
used ciphers for electronic communications include the Advanced Encryption Standard (AES,
a.k.a. Rijndael), Blowfish, and 3DES. These ciphers are called symmetric because the same key r is
used by Alice to encrypt and by Bob to decrypt. These ciphers do not provide perfect secrecy in
the theoretical sense, but they are widely believed to be infeasible to crack.
We now return to the question of how Alice and Bob manage to share r securely in the first place.
If they spend any time together in a physically secure room, they can flip coins and generate an r.
In practice, though, it is often not possible for Alice and Bob ever to be together; they may not even know
each other (for example, Alice buys a book online from Bob, who is Barnes and Noble). This is the
problem of key exchange, and it is currently handled using some kind of public key cryptography
such as RSA, Diffie-Hellman, or El-Gamal. I won’t go into how public key crypto works here,
except to say that it relies for its security on the difficulty of performing certain number-theoretic
tasks, such as factoring (RSA) and computing discrete logarithms (Diffie-Hellman, El-Gamal). If
quantum computers are ever physically realized, then Shor’s algorithms for factoring and discrete
log could break current public key schemes.
A key-exchange protocol using quantum mechanics was proposed in 1984 by Charles Bennett
and Gilles Brassard. In this protocol, known as BB84, Alice sends a sequence of qubits to Bob
across an insecure quantum channel, subject to eavesdropping/tampering by Eve. Alice and
Bob then perform a series of checks, communicate through a public, nonquantum channel, and
in the end they share some secret random bits. The security of the protocol relies only on the
laws of physics and the faithfulness of the implementation, and not on the assumed difficulty
of certain tasks like factoring large numbers. The key intuition is that in quantum mechanics,
measuring a quantum system may unavoidably alter the system being measured. If Eve wants to
get information about the qubits being sent from Alice to Bob, she must perform a measurement,
which will disrupt the qubits enough to be detected by Alice and Bob with high probability. For
brevity, I will only describe the basic, simplistic, idealized, and unoptimized protocol here. There
are a number of technical issues (such as noise) that I won’t go into. A quick tutorial on quantum
cryptography by Jamie Ford at Dartmouth College can be found at https://www.cs.dartmouth.edu/~jford/crypto.html (this link now appears to be broken). There is an on-line simulation
of BB84 by Frederick Henle at http://fredhenle.net/bb84/demo.php. An extensive (though
now somewhat outdated) bibliography of quantum cryptography papers, started(?) by Gilles
Brassard (Université de Montréal) and maintained(?) by Claude Crépeau at McGill University, is
at https://www.cs.mcgill.ca/~crepeau/CRYPTO/Biblio-QC.html.
In the BB84 protocol, it is assumed that Alice and Bob share an insecure quantum channel, which
Alice will use to send qubits to Bob, and a classical information channel (such as a newspaper,
phone, or electronic bulletin board) that is public (anyone can monitor it) but reliable, in the sense
that any message that Alice and Bob send to each other along this channel reaches the recipient
without alteration, and it is impossible for a third party to send a message to Alice or Bob pretending
to be the other (i.e., it is forgery proof). The description of BB84 needs the following:
Definition 22.1 Let H be an n-dimensional Hilbert space, and let B = {b1 , . . . , bn } and C =
{c1 , . . . , cn } be two orthonormal bases for H. We say that B and C are mutually unbiased, or
complementary, if |hbi |cj i| = 1/√n for all 1 ≤ i, j ≤ n. A collection B1 , . . . , Bk of orthonormal bases
for H is mutually unbiased if each pair of bases in the collection is mutually unbiased.
The geometrical intuition is that B and C are mutually unbiased iff the “angle” between any
member of B and any member of C is always the same, up to a phase factor.
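As a concrete check of the one-qubit case used below, the following sketch verifies that the bases {|+zi, |−zi}, {|+xi, |−xi}, and {|+yi, |−yi} are pairwise mutually unbiased, writing each vector in its standard coordinates with respect to the z basis (an assumption of this sketch):

```python
import itertools
import math

# Verify that the three single-qubit bases are pairwise mutually unbiased:
# |<b|c>| = 1/sqrt(2) for b, c drawn from different bases.
s = 1 / math.sqrt(2)
z_basis = [(1, 0), (0, 1)]               # |+z>, |-z>
x_basis = [(s, s), (s, -s)]              # |+x>, |-x>
y_basis = [(s, s * 1j), (s, -s * 1j)]    # |+y>, |-y>

def inner(b, c):
    # <b|c>, conjugate-linear in the first argument
    return sum(bi.conjugate() * ci for bi, ci in zip(b, c))

for B, C in itertools.combinations([z_basis, x_basis, y_basis], 2):
    for b in B:
        for c in C:
            assert abs(abs(inner(b, c)) - s) < 1e-12
```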
Exercise 22.2 Show that if B and C are two orthonormal bases of an n-dimensional Hilbert space
such that |hb, ci| = |hb′ , c′ i| for any b, b′ ∈ B and c, c′ ∈ C, then 1/√n is the common value of
|hb, ci| for any b ∈ B and c ∈ C. [Hint: Express a vector from C as a linear combination of vectors
from B. What can you say about the coefficients?]
A d-dimensional Hilbert space cannot have a mutually unbiased collection of more than d + 1
orthonormal bases. If d is a power of a prime number, then d + 1 mutually unbiased bases can be
constructed, but it is an open problem to determine how many mutually unbiased bases there can
be when d is not a prime power. Even the case where d = 6 is open. Anyway, for the one-qubit
case where d = 2, the bases {|+xi, |−xi}, {|+yi, |−yi}, and {|+zi, |−zi} are mutually unbiased. (Other
collections of three mutually unbiased bases can be obtained from these three by applying some
unitary operator U to every vector (the same for all the vectors). Applying U does not change the
inner product of any pair of vectors.) BB84 uses two of these three, say, {|+zi, |−zi} and {|+xi, |−xi}.
We'll denote the first of these by ↕, consisting of spin-up (↑) and spin-down (↓) states, and the
second by ↔, consisting of spin-right (→) and spin-left (←) states. The two vectors of each basis
encode the two possible bit values: in the ↕ basis, ↑ encodes 0 and ↓ encodes 1; in the ↔ basis, →
encodes 0 and ← encodes 1. Here is the protocol:
Sending qubits. Alice and Bob repeat the following for j running from 1 to n, where n is some
large number. The random choices made at one iteration are independent of those made at
other iterations.
1. Alice chooses a bit bj ∈ {0, 1} uniformly at random. She also chooses Bj to be one of the
bases ↕ or ↔ uniformly at random, independent of bj . She prepares a qubit in a state
qj encoding the bit bj in the basis Bj (i.e., qj is either ↑ or → for bj = 0, and either ↓
or ← for bj = 1), and sends the qubit qj to Bob across the quantum channel.
2. Bob receives the qubit sent from Alice, chooses a basis Cj from {↕, ↔} uniformly at
random, and measures the qubit projectively using Cj , obtaining a bit value cj according
to the same encoding scheme described above.
This ends the quantum part of the protocol. All further communication between Alice and
Bob is classical and uses the public, classical channel.
Discarding uncorrelated bits. Alice and Bob announce over the classical channel which bases Bj
and Cj they used at each iteration, let C := {j : Bj = Cj }, and discard the bits bj and cj for
every j ∉ C. Note that if the quantum channel faithfully transmits all of Alice's
qubits to Bob unaltered, then bj = cj with certainty whenever Alice's basis was the same as
Bob's, i.e., whenever Bj = Cj ; otherwise bj and cj are completely uncorrelated (because ↕
and ↔ are mutually unbiased). The set C is expected to have size about k := n/2.
Security check. 1. Alice chooses a subset S ⊆ C uniformly at random (S stands for “security
check”). For example, she decides to put j into S with probability 1/2 independently
for each j ∈ C. The set S is expected to have size about k/2.
2. Alice sends S to Bob along with the value of bj for each j ∈ S.
3. Bob checks whether bj = cj for every j ∈ S. If so, he tells Alice that they can accept
the protocol, in which case, Alice and Bob respectively discard the bits bj and cj where
j ∈ S and retain the rest of the bits bj and cj for j ∈ C − S (about k/2 or about n/4 many
bits). On the other hand, if there are any discrepancies, then Bob tells Alice that they
should reject the protocol, in which case, all bits are discarded and they start over with
an entirely new run of the protocol.
Note that if the quantum channel is not tampered with, then Alice and Bob will accept the protocol.
Also notice that any third party monitoring the classical communication between Alice and Bob
knows nothing of the bits that Alice and Bob eventually retain. We’ll explain why there is a good
chance that Eve will be caught and the protocol rejected if she tries to eavesdrop on the quantum
channel during the initial qubit communication.
For technical simplicity, we will assume that there is only one way that Eve can eavesdrop on
the quantum channel: she can choose to measure some qubit in either of the bases ↕ or ↔, then
send along to Bob some qubit that she prepares based on her measurement. This is not a general
proof of security then, because Eve could do other things: measure a qubit in some arbitrary basis,
or even couple the qubit to another quantum system, let the combined system evolve, make a
measurement in the combined system, then send along some qubit to Bob based on that. She could
even make correlated measurements involving several of the sent qubits together. It takes some
work to show that Eve's chances of being caught are not significantly reduced by these more
general attacks, and we won't give the more general proof here.
If Eve happens to measure a qubit qj in the same basis Bj that Alice used, then this is very
good
for Eve: She knows the encoded bit with certainty, and the post-measurement state is still
qj , i.e., Eve did not alter it. So she can simply retransmit the post-measurement qubit to Bob. In
this case, if j ∈ S, then this qubit won’t provide any evidence of tampering; if j ∈ C − S, then Eve
knows one of the “secret” bits that Alice and Bob share, assuming they accept the protocol.
With probability 1/2, however, Eve measures qj in the wrong basis Bj′, the one other than Bj .
In this case, she gets a bit value uncorrelated with bj , but even worse (for Eve), her measurement
alters the qubit so as to lose any information about bj . She has to send a qubit to Bob, and at this
point she cannot tell that she has chosen the wrong basis, so the best she can do is what she did
before: resend the post-measurement qubit to Bob. If j ∈ C, then Bob will measure Eve's altered
qubit rj using Bj , and since rj is in the basis Bj′, which is mutually unbiased with Bj , Bob's
result cj will be completely random and uncorrelated with Alice's bj . If j ∈ S and bj ≠ cj , then
Eve is caught and the protocol is rejected.
To summarize, for each qubit qj that Eve decides to eavesdrop on, Eve will get caught
measuring the qubit if and only if
• she chooses the wrong basis (the one other than Bj ), and
• j ∈ C (i.e., Bob uses Bj to do his measurement and the bit is not discarded as uncorrelated),
and
• Bob measures a value cj ≠ bj , and
• j ∈ S (i.e., this is one of the bits Alice and Bob use for the security check).
Each of these four things happens with probability 1/2, conditioned on the event that the things
above it all happened. This makes the chance of Eve being caught on account of this qubit
(1/2)^4 = 1/16. If Eve decides to eavesdrop on qubits qj₁ , . . . , qjℓ for 1 ≤ j₁ < · · · < jℓ ≤ n, then
each of these gives her a 1/16 chance of being caught, independently of the others. The probability
of her not being caught is then

(1 − 1/16)^ℓ < e^(−ℓ/16),

which decreases exponentially in ℓ and is less than 1/e for ℓ > 16. So Eve cannot eavesdrop on
more than 16 qubits without a high probability of being caught. If n ≫ 16, this is a negligible
fraction of the roughly n/4 bits retained by Alice and Bob if they accept the protocol.
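The 1/16-per-qubit analysis can be checked by Monte-Carlo simulation. The sketch below models the simplified intercept-resend attack exactly as above (measuring in the wrong basis is modeled as yielding a uniformly random bit); ℓ = 16 intercepted qubits and the trial count are arbitrary choices:

```python
import random

# Monte-Carlo check of the intercept-resend analysis. Bases are 0 (up/down)
# and 1 (left/right).
def bb84_trial(rng, ell):
    for _ in range(ell):                          # one intercepted qubit each
        b = rng.randrange(2)                      # Alice's bit
        B = rng.randrange(2)                      # Alice's basis
        E = rng.randrange(2)                      # Eve's basis
        e = b if E == B else rng.randrange(2)     # Eve's result; resent as (E, e)
        C = rng.randrange(2)                      # Bob's basis
        c = e if C == E else rng.randrange(2)     # Bob's result
        if C == B and rng.randrange(2) == 0:      # j lands in C, then in S
            if c != b:
                return True                       # discrepancy: Eve is caught
    return False

rng = random.Random(1)
trials = 20000
caught = sum(bb84_trial(rng, 16) for _ in range(trials)) / trials
print(caught)     # close to 1 - (15/16)**16, about 0.64
```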
Exercise 22.3 Suppose that instead of the security check given above, Alice and Bob decide to do
the following alternate security check:

1. Alice and Bob each compute the parities b = ⊕_{j∈C} bj and c = ⊕_{j∈C} cj of their current
respective bits.

2. Alice sends b to Bob over the public channel.

3. If b ≠ c, then Alice and Bob reject the protocol and start over. Otherwise, they agree on some
j0 ∈ C (it doesn't matter which), discard bj0 and cj0 , and retain the rest of the bits bj and cj
for j ∈ C − {j0 } as their shared secret, accepting the protocol. [If they didn't discard one of
the bits, then someone monitoring the public channel would know the parity of Alice's and
Bob's shared bits. Discarding a bit removes this information.]
How many bits on average do Alice and Bob retain in this altered protocol, assuming they accept
it? What are Eve's chances of being caught if she eavesdrops on ℓ of the qubits, where ℓ > 0?
In practice, polarized photons are used as qubits for the quantum communication phase. Alice
may generate these photons by a light-emitting diode (LED) and can send them to Bob through
fiber optic cable. Photon polarization has a two-dimensional state space and so can serve as a
qubit. The three standard mutually unbiased bases for photon polarization are the rectilinear basis
(horizontal and vertical polarization), the diagonal basis (polarization at 45° and at 135°), and the
circular basis (left-circular and right-circular polarization).
Photons have the advantage that their polarization is easy to measure and is insensitive to certain
common sources of noise, e.g., stray electric and magnetic fields.
One technical problem is making sure that only one photon is sent at a time. Alice sends a
pulse of electric current through the LED, which emits light in a burst of coherent photons with
intensity (expected number of photons) proportional to the strength of the current. If more than
one photon is sent at a time (in identical quantum states), then Eve could conceivably catch one of
the photons and measure it, letting the other photon(s) go through to Bob as if nothing had been
tampered with. To reduce the probability of a multiphoton burst, the current Alice sends through
the LED must be exceedingly weak: about one tenth the energy of a single photon, say. Then the
expected number of photons sent each time is about 1/10. This means that about nine times out
of ten, no photons are emitted at all. If λ > 0 is the ratio of the energy of the current pulse to the
energy of a single photon (in this example, λ = 0.1), then the number of photons emitted in any
given burst satisfies a Poisson distribution with mean λ(?); that is, the probability that k photons
are emitted is

f(k, λ) = e^(−λ) λ^k / k! ,

where k is any nonnegative integer. If λ = 0.1, then f(0, λ) = e^(−λ) ≈ 0.9, which is the probability that
the LED emits no photons. The probability of getting a single photon is f(1, λ) = e^(−λ) λ ≈ 0.09.
The probability of two emitted photons is f(2, λ) = e^(−λ) λ²/2 ≈ 0.005, or about one twentieth the
probability of a single photon. More photons occur with rapidly diminishing probability. So if we
ignore the times when no photons are emitted (Bob tells Alice that he did not receive a photon),
the chance of multiple photons is small, about 1/20. The smaller λ is, the smaller this probability
will be, but the trade-off is that we have to wait longer for a single photon.
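The Poisson numbers above are easy to reproduce:

```python
import math

# Poisson photon statistics for a weak pulse with mean lambda = 0.1,
# as in the discussion above.
def f(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 0.1
p0, p1, p2 = f(0, lam), f(1, lam), f(2, lam)
print(p0)           # about 0.905: usually no photon at all
print(p1)           # about 0.090: exactly one photon
print(p2 / p1)      # 0.05: two photons are 1/20 as likely as one
```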
Of course, the quantum channel could also be subject to random, nonmalicious noise, which
would cause discrepancies between Alice’s and Bob’s bits. One subtlety is to make the protocol
tolerate a certain amount of noise but still detect malicious tampering with high probability.
We now start our discussion of quantum information. One of the major uses of quantum infor-
mation theory is to analyze how noise can disrupt a quantum computation and how to make the
computation resistant to it. The textbook discusses quantum information in earnest in Chapters 8–
12, with quantum error correction in Chapter 10 and quantum information theory in Chapter 12.
Quantum information is one of the textbook’s real strong points, and I will assume you will read
starting with Chapter 8. The lectures will fill in some background and reiterate points in the text.
In the next few topics, we will be using the density operator formalism almost exclusively. We
really have no choice about this once we generalize our notion of “state” to include mixed states.
Norms of Operators. Recall the definition of the Hilbert-Schmidt inner product on L(H) (Equation (11)). This suggests another way to define the norm of an operator:

kAk₂ := √(hA, Ai) = (tr(A∗A))^(1/2) .

This norm, known as the Euclidean norm, the L2-norm, or the Hilbert-Schmidt norm, satisfies all
ten of the properties satisfied by the operator norm of Definition 18.4 except property 4; in fact,
√
kIk2 = n, where n is the dimension of H. The Euclidean norm is one of a parameterized family
of norms defined on operators. For any real p ≥ 1, define the Lp-norm (also called the Schatten
p-norm) of an operator A to be

kAkp := (tr(|A|^p))^(1/p) = ( Σ_{j=1}^n sj^p )^(1/p) , (83)
where s1 , . . . , sn ≥ 0 are the eigenvalues of |A|, which are called the singular values of A. (If p is
not an integer, then technically, we have not yet defined |A|^p, because |A| is an operator. For now,
you can ignore the middle expression in the equation above and use the right-hand side for the
definition of kAkp .) For p = 2, we get the Euclidean norm. The L1 norm kAk1 = tr |A| is also called
the trace norm and is often useful. In addition, we could define the L∞ norm

kAk∞ := lim_{p→∞} kAkp = max_j sj ,

but this is precisely the operator norm kAk of Definition 18.4, because as p gets large, the largest
term in the sum in (83) starts to dominate.
Exercise 23.1 Show that if A is an operator on an n-dimensional space, and 1 ≤ p ≤ q are real
numbers, then kAkp ≥ kAkq . Also show that nkAk ≥ kAk₁ . Thus all these norms are within a
factor of n of each other. What is kIkp ? [Hint: For the first part, fix s1 , . . . , sn ≥ 0 and differentiate
the expression ( Σ_{j=1}^n sj^p )^(1/p) with respect to p, and show that the derivative is always negative or
zero.]
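A quick numerical check of the norm ordering claimed in the exercise, using NumPy (assumed available) on an arbitrary random matrix:

```python
import numpy as np

# Numerical check of the Schatten norms on an arbitrary random matrix.
# The singular values are the eigenvalues of |A|.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
s = np.linalg.svd(A, compute_uv=False)      # singular values s_1, ..., s_n

def schatten(p):
    return np.sum(s ** p) ** (1 / p)

norms = [schatten(p) for p in (1, 1.5, 2, 4, 16)]
# ||A||_p is nonincreasing in p, as the exercise claims:
assert all(a >= b - 1e-12 for a, b in zip(norms, norms[1:]))
assert abs(schatten(2) - np.linalg.norm(A)) < 1e-9   # Euclidean (Frobenius) norm
assert schatten(1) <= len(s) * s.max()               # trace norm <= n * ||A||
```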
POVMs. Let S be a physical system with state space HS . Often, we want to get some classical
information about the current state of S. We can perform a projective measurement on HS ,
obtaining various possible outcomes with various probabilities. This is not the only way to get
information about the state of S, however. We could instead couple the system S with another
system T in some known state in the state space HT , let the combined system ST evolve for a
while, then make a projective measurement of the combined system, i.e., on the space HS ⊗ HT .
This approach is more general and can get information that cannot be obtained by a projective
measurement on HS itself.
Recall that mathematically, a projective measurement on a Hilbert space H corresponds to a
complete set of orthogonal projectors {Pj : j ∈ I}, where I is the set of possible outcomes. We’ll
now relax this restriction a bit.
Definition 23.2 A positive operator-valued measure (POVM) on H is a collection {Mj : j ∈ I} of
positive operators on H, indexed by a set I of possible outcomes, such that Σ_{j∈I} Mj = I. Measuring
a state ρ with this POVM yields outcome j with probability Pr[j] := hMj , ρi = tr(Mj ρ).

Each Pr[j] is a nonnegative real, because Mj and ρ are both positive, and

Σ_{j∈I} Pr[j] = Σ_j hMj , ρi = hΣ_j Mj , ρi = hI, ρi = tr ρ = 1 .
So the Pr[j] do form a probability distribution. Note that the only properties of
ρ that we used here are that ρ ≥ 0 and that tr ρ = 1. This is important, because we are about
to expand our definition of "state" to include mixed states, which may no longer be projection
operators, but are still positive and have unit trace.
Notice that for a POVM, we don’t specify the post-measurement state. This is OK—quite often,
we don’t care what the post-measurement state is; we only care about the outcomes and their
statistics, and a POVM provides the most general means of measuring a quantum system if we
don’t care about the state after the measurement. We’ll show in a bit that a POVM is equivalent
to what we described above: coupling the system to another system, letting the combined system
evolve, then making a projective measurement on the combined system.
A projective measurement on H is just a special case of a POVM where the Mj form a complete
set of orthogonal projectors. To see this, we refer back to Equation (15): Pr[k] = hψ|Pk |ψi, where
Pr[k] is the probability of outcome k when measuring the system in state |ψi, and Pk is the
corresponding projector. Letting ρ := |ψihψ| and treating the scalar hψ|Pk |ψi as a 1 × 1 matrix, we
get
Pr[k] = hψ|Pk |ψi = tr hψ|Pk |ψi = tr (Pk |ψihψ|) = tr(Pk ρ) = hPk , ρi ,
which accords with Definition 23.2.
Mixed States.
Definition 23.4 Let A1 , . . . , Ak be scalars, vectors, operators, matrices, etc., all of the same type. A
convex linear combination of A1 , . . . , Ak is a value of the form

Σ_{i=1}^k pi Ai ,

where the pi are real scalars, each pi ≥ 0, and Σ_{i=1}^k pi = 1. In other words, p1 , . . . , pk form a
probability distribution.
Suppose Alice has a lab where she can prepare several states ρ1 = |ψ1 ihψ1 |, . . . , ρk = |ψk ihψk | ∈
L(H), and she flips coins and decides to prepare a state σ chosen at random from this set, where
each ρi is chosen with probability pi . She then sends the state σ she prepared to Bob, without
telling him what it is. What can Bob find out about the state σ that Alice sent him? He can, most
generally, perform a measurement corresponding to some POVM {Mj : j ∈ I}. The probability of
obtaining any outcome j, taken over both the POVM and Alice's random choice, is then

Pr[j] = Σ_{i=1}^k Pr[j | σ = ρi ] · Pr[σ = ρi ] = Σ_i hMj , ρi i pi = hMj , ρi ,
where ρ = Σ_{i=1}^k pi ρi is a convex combination of the ρi with the associated probabilities. So
all Bob can ever determine physically about Alice’s σ is given by the single operator ρ, which
is called a mixed state. By definition, a mixed state is any nontrivial convex linear combination
of one-dimensional projectors. By “nontrivial” we mean that all probabilities are strictly less
than 1, or equivalently, there are at least two probabilities that are nonzero. Mathematically, a
mixed state behaves in many ways much like a state of the form |ψihψ| for some unit vector |ψi
(i.e., a one-dimensional projector). It represents the state of a quantum system about which we
have incomplete information, or which we are not describing completely. Completely described
states, which up until now we have been dealing with exclusively, are of the form |ψihψ| for unit
vectors |ψi. From now on we will call these latter states pure states, and when we use the word
“state” unqualified, we will mean either a pure or mixed state. Both kinds of states are convex
combinations of pure states, trivial or otherwise. A mixed state is then some nontrivial probabilistic
mixture, or weighted average, of pure states.
Why are mixed states important? When we consider a quantum system that is not isolated
from its environment (which we must do when we consider quantum errors and decoherence),
then some information about the state of the system “bleeds” out into the environment, leaving
the system in a partially unknown state—even if the system started out in a pure state. We model
an incompletely known quantum state as a mixed state.
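To make the bookkeeping concrete, here is a minimal numpy sketch (an illustration, not part of the notes) that builds ρ for a small hypothetical ensemble and checks the two defining properties verified in the next paragraph (positivity and unit trace):

```python
import numpy as np

# Hypothetical ensemble (an assumption for illustration): Alice prepares |0> or
# |+> with probability 1/2 each.  The mixed state is rho = sum_i p_i |psi_i><psi_i|.
ket0 = np.array([1, 0], dtype=complex)
ketp = np.array([1, 1], dtype=complex) / np.sqrt(2)  # |+>

probs = [0.5, 0.5]
kets = [ket0, ketp]
rho = sum(p * np.outer(k, k.conj()) for p, k in zip(probs, kets))

# All Bob can ever learn is encoded in rho: for any POVM element M_j,
# Pr[j] = <M_j, rho> = tr(M_j rho).
assert np.isclose(np.trace(rho).real, 1.0)        # unit trace
assert np.all(np.linalg.eigvalsh(rho) >= -1e-12)  # positive semidefinite
```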
Let's verify that if ρ is any state (say, ρ = Σ_{i=1}^k pi ρi, where the pi form a probability distribution and the ρi are all pure states), we have ρ ≥ 0 and tr ρ = 1. For positivity, let v be any vector. Then

v*ρv = Σ_{i=1}^k pi v*ρi v ≥ 0,

because all the ρi are positive operators. Thus ρ ≥ 0. By linearity of the trace, we have

tr ρ = Σ_{i=1}^k pi tr ρi = Σ_i pi = 1,

because all the ρi have unit trace. The next proposition says that the converse of this is also true.
Proposition 23.5 If ρ ∈ L(H) is such that ρ > 0 and tr ρ = 1, then ρ is a convex linear combination of
one-dimensional projectors that project onto mutually orthogonal subspaces.
Proof. Suppose ρ ≥ 0 and tr ρ = 1. Since ρ is normal, it has an eigenbasis {|ψ1⟩, . . . , |ψn⟩}. With respect to this eigenbasis, ρ is represented by the matrix diag(p1, . . . , pn) for some p1, . . . , pn, and so ρ = Σ_{i=1}^n pi |ψi⟩⟨ψi|. Since ρ ≥ 0, all the pi are nonnegative real, and further 1 = tr ρ = Σ_i pi. So ρ is a convex combination of |ψ1⟩⟨ψ1|, . . . , |ψn⟩⟨ψn|, which project onto mutually orthogonal, one-dimensional subspaces. □
Thus we get the following two corollaries:
Corollary 23.6 An operator ρ ∈ L(H) is a state (i.e., a convex combination of one-dimensional projectors)
if and only if ρ is positive and has unit trace.
Corollary 23.7 An operator ρ ∈ L(H) is a state if and only if ρ is normal and its eigenvalues form a
probability distribution.
Measuring a mixed state with a POVM has exactly the same mathematical form as with a pure
state. Recall that the only two properties of the state ρ we used to show that the measurement
makes sense is that ρ > 0 and tr ρ = 1, both of which are true of any mixed state. Similarly,
unitary time evolution of a mixed state has exactly the same mathematical form as with a pure state. Indeed, if ρ = Σ_i pi ρi is some mixture of pure states, then evolving ρ via a unitary operator U should be equivalent to evolving each ρi by U and taking the same mixture of the results. By linearity, this gives

ρ = Σ_i pi ρi ↦ Σ_i pi (U ρi U*) = U (Σ_i pi ρi) U* = U ρ U*.
Finally, we won’t bother proving it, but Equations (29) and (30), which describe projective mea-
surements, are equally valid for mixed states ρ as well as for pure states.
Different probability distributions of pure states can yield the same state, but if they do, they
are physically indistinguishable, that is, no physical experiment can tell one distribution from the
other with positive probability. However, for any state ρ, there is a preferred mix of pure states that
yields ρ, namely, the “eigenstates” |ψ1 ihψ1 |, . . . , |ψn ihψn | used in the proof of Proposition 23.5,
with their respective eigenvalues as probabilities. The states are distinguished by the fact that
they are pairwise orthogonal. We will call this preferred probability distribution the eigenvalue
distribution of ρ.
It's time for an example. Alice may send Bob a single qubit in state |0⟩ with probability 1/2 and state |+⟩ = (|0⟩ + |1⟩)/√2 with probability 1/2. The resulting mixed state is

ρ = (1/2)|0⟩⟨0| + (1/2)|+⟩⟨+| = (1/4) [ 3 1 ; 1 1 ].

Let's find the eigenvalue distribution of ρ. One can easily check that an eigenbasis of this ρ consists of the states

|ψ1⟩ = (1/√(4 − 2√2)) (1, √2 − 1)^T with eigenvalue p1 = (2 + √2)/4,
|ψ2⟩ = (1/√(4 − 2√2)) (√2 − 1, −1)^T with eigenvalue p2 = (2 − √2)/4.

Thus ρ = p1|ψ1⟩⟨ψ1| + p2|ψ2⟩⟨ψ2|. So if Carol prepares |ψ1⟩⟨ψ1| with probability p1 = (2 + √2)/4 and |ψ2⟩⟨ψ2| with probability p2 = (2 − √2)/4, and ships her state to Bob, then Bob (who doesn't see who the sender is) can't tell with any advantage over guessing who sent him the state.
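One can confirm this eigenvalue distribution numerically; the following numpy sketch (an illustration, not part of the notes) diagonalizes ρ = (|0⟩⟨0| + |+⟩⟨+|)/2 and checks the eigenvalues against the closed forms above:

```python
import numpy as np

# rho = (|0><0| + |+><+|)/2 in the computational basis
rho = np.array([[0.75, 0.25],
                [0.25, 0.25]])

evals, evecs = np.linalg.eigh(rho)   # eigenvalues in ascending order
p2, p1 = evals                       # p2 = (2 - sqrt 2)/4, p1 = (2 + sqrt 2)/4
assert np.isclose(p1, (2 + np.sqrt(2)) / 4)
assert np.isclose(p2, (2 - np.sqrt(2)) / 4)

# Reassemble rho from its eigenvalue distribution: rho = p1 P1 + p2 P2
P1 = np.outer(evecs[:, 1], evecs[:, 1])
P2 = np.outer(evecs[:, 0], evecs[:, 0])
assert np.allclose(rho, p1 * P1 + p2 * P2)
```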
Exercise 23.8 Do a similar analysis as that above, this time assuming Alice sends (4|0i+3|1i)(4h0|+
3h1|)/25 with probability 1/2 and (4|0i − 3|1i)(4h0| − 3h1|)/25 with probability 1/2.
Exercise 23.9 Prove that any convex combination of states (pure or mixed) is a state.
One-Qubit States and the Bloch Sphere. Recall that we have a nice geometrical representation
of one-qubit pure states: for each one-qubit pure state ρ there correspond unique x, y, z ∈ R such
that x2 + y2 + z2 = 1 and ρ = (I + xX + yY + zZ)/2, and conversely, for any point (x, y, z) on the
unit sphere in R3 (Bloch sphere), the operator (I + xX + yY + zZ)/2 is a one-qubit pure state.
Can we characterize general one-qubit states in a similarly geometrical way? Yes. Let ρ = Σ_{i=1}^k pi ρi be any one-qubit state, where the ρi are one-qubit pure states and the pi form a probability distribution as usual. For 1 ≤ i ≤ k, let (xi, yi, zi) be the point on the Bloch sphere such that ρi = (I + xi X + yi Y + zi Z)/2. Then by linearity we have

ρ = Σ_{i=1}^k pi ρi = Σ_i pi (I + xi X + yi Y + zi Z)/2 = (I + xX + yY + zZ)/2,

where (x, y, z) := Σ_{i=1}^k pi (xi, yi, zi) ∈ R³. That is, ρ corresponds geometrically to the point (x, y, z) ∈ R³ that is the convex combination of all the points (xi, yi, zi), weighted by the same probabilities pi used to weight ρ in terms of the ρi. We note that

√(x² + y² + z²) = ‖(x, y, z)‖ ≤ Σ_{i=1}^k pi ‖(xi, yi, zi)‖ = Σ_i pi = 1,
and the inequality is strict iff there are at least two distinct points (xi , yi , zi ) on the sphere with
pi > 0. This means that the point (x, y, z) is somewhere on or inside the Bloch sphere. The surface
points of the Bloch sphere correspond to the pure states, and the points in the interior correspond
to mixed states. A one-qubit unitary U rotates a mixed state ρ in the interior just as it does points
on the surface of the sphere (it rotates all of R3 , in fact).
We can get some important facts about ρ based on its geometry. For example, if ρ = (I + xX + yY + zZ)/2, then let r = ‖(x, y, z)‖ = (x² + y² + z²)^{1/2} ≤ 1 be the distance from (x, y, z) to
the origin. Then the eigenvalues of ρ are (1 ± r)/2, and if r > 0, the corresponding eigenvectors
are the states corresponding to the antipodal points ±(x, y, z)/r on the surface of the sphere.
((I + (x/r)X + (y/r)Y + (z/r)Z)/2 has eigenvalue (1 + r)/2, while (I − (x/r)X − (y/r)Y − (z/r)Z)/2
has eigenvalue (1 − r)/2, which are the two probabilities in the eigenvalue distribution of ρ.) These
are the points where the line through (0, 0, 0) and (x, y, z) intersects the surface of the sphere.
(If r = 0, then (x, y, z) is the origin, ρ = I/2, and every nonzero vector is an eigenvector with
eigenvalue 1/2.)
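These geometric facts are easy to check numerically. Here is a small sketch (illustration only; the interior point (0.5, 0, 0.25) is an arbitrary choice) converting between one-qubit density matrices and Bloch-ball coordinates:

```python
import numpy as np

# Pauli matrices and the identity
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def bloch_point(rho):
    # x = tr(X rho), etc., since tr(X^2) = 2 and the Paulis are traceless
    return np.real([np.trace(X @ rho), np.trace(Y @ rho), np.trace(Z @ rho)])

def from_bloch(x, y, z):
    return (I2 + x * X + y * Y + z * Z) / 2

rho = from_bloch(0.5, 0.0, 0.25)   # an interior (mixed) point
x, y, z = bloch_point(rho)
r = np.sqrt(x**2 + y**2 + z**2)

# Eigenvalues are (1 - r)/2 and (1 + r)/2, as claimed above.
assert np.allclose(np.linalg.eigvalsh(rho), [(1 - r) / 2, (1 + r) / 2])
```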
Exercise 23.10 Prove all the assertions in the paragraph above. [Hint: You could certainly compute the eigenvectors and eigenvalues of ρ by brute force if you had to. Alternatively, you might note that if you let ρ1 = |ψ1⟩⟨ψ1| = (I + (x/r)X + (y/r)Y + (z/r)Z)/2 and ρ2 = |ψ2⟩⟨ψ2| = (I − (x/r)X − (y/r)Y − (z/r)Z)/2, then ⟨ψ1|ψ2⟩ = 0 because the corresponding points are antipodal, and further, ρ is a convex combination of ρ1 and ρ2. What are the coefficients of this combination in terms of r? What does the matrix of ρ look like in the {|ψ1⟩, |ψ2⟩} basis?]
24 Week 12: Quantum channels (quantum operations)
The Partial Trace. We sometimes have a system T that we are interested in coupling with another
system S that we are not interested in, producing an entangled state in the combined system ST .
This happens, for example, when a quantum computation (system T ), rather than proceeding in
perfect isolation, gets corrupted by an unintended interaction with its environment (system S),
e.g., a cosmic ray hitting the quantum computing device. Since we only care about system T , does
it make sense to ask, “what is the state of T ?” even though it is entangled with S? The partial trace
operator lets us do just that.
Let HS and HT be Hilbert spaces. There is a unique linear map trS : L(HS ⊗ HT) → L(HT) such that for every A ∈ L(HS) and B ∈ L(HT),

trS(A ⊗ B) = (tr A)B.    (84)

The map trS is an example of a partial trace. When we apply trS, we often say that we are tracing out the system S. There can be only one linear map satisfying (84), because L(HS ⊗ HT) = L(HS) ⊗ L(HT) is spanned by tensor products of operators. Suppose HS has dimension m and HT has dimension n. Suppose some operator C ∈ L(HS ⊗ HT) is written in block matrix form with respect to some product basis:

C = [ B11 B12 · · · B1m ; B21 B22 · · · B2m ; · · · ; Bm1 Bm2 · · · Bmm ],

where each block Bij is an n × n matrix. Then we can also write C uniquely as

C = Σ_{i,j=1}^m Eij ⊗ Bij,

where Eij is the m × m matrix whose (i, j)th entry is 1 and all other entries 0. The partial trace of C is then given in matrix form as

trS(C) = Σ_{i,j} (tr Eij) Bij = Σ_{i=1}^m Bii,

which is an n × n matrix.
The partial trace operator extends in a similar way to combinations of several systems at once.
Intuitively, tracing out a system is a bit like averaging over the system. In tensor algebra, the
partial trace operators and the (total) trace operator are called contractions.
If system ST is in some separable (i.e., tensor product) state ρ = ρS ⊗ ρT ∈ L(HS ⊗ HT ), where
ρS ∈ L(HS ) and ρT ∈ L(HT ) are states in S and T , respectively, then trS ρ = (tr ρS )ρT = ρT and
trT ρ = (tr ρT )ρS = ρS . So we can say unequivocally that the system S is in state trT ρ and the
system T is in state trS ρ. If ρ is entangled, then we can still say that system S is in state trT ρ and T
is in trS ρ, but now these two states (called reduced states) are mixed states, even if ρ itself is a pure
state. Thus by tracing out one or the other system, we will lose some information about the state
of the remaining system if the original combined state was entangled.
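In matrix terms, the partial trace is just a sum of diagonal blocks, which makes it a one-liner with numpy reshaping. A sketch (assuming, as in the block-matrix discussion above, that the S index is the leftmost tensor factor, with dim(HS) = m and dim(HT) = n):

```python
import numpy as np

def trace_out_S(C, m, n):
    # sum of the n x n diagonal blocks B_ii (blocks indexed by S)
    return C.reshape(m, n, m, n).trace(axis1=0, axis2=2)

def trace_out_T(C, m, n):
    # trace over the T index instead
    return C.reshape(m, n, m, n).trace(axis1=1, axis2=3)

m, n = 2, 3
rhoS = np.diag([0.25, 0.75])
rhoT = np.diag([0.5, 0.3, 0.2])
C = np.kron(rhoS, rhoT)            # separable state rho_S ⊗ rho_T

assert np.allclose(trace_out_S(C, m, n), rhoT)   # tr_S(rho_S ⊗ rho_T) = rho_T
assert np.allclose(trace_out_T(C, m, n), rhoS)   # tr_T(rho_S ⊗ rho_T) = rho_S
```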
Open Systems and Quantum Channels. A closed quantum system is one that does not interact
with the outside world. Closed systems evolve unitarily. An open quantum system does couple
with one or more other systems (collectively called the environment) that we wish to ignore. By
considering open systems, we will obtain a powerful formalism for describing what can happen
to a quantum system that may interact with its environment. This formalism, the formalism
of quantum channels (sometimes called quantum operations) is general enough to encompass
both unitary evolution and measurements. A quantum channel is a certain linear map that maps
states in one Hilbert space to states in another. All physical processes, including unitary evolu-
tion, measurements, etc., or any combination of these, are modeled mathematically as quantum
channels.
Definition 24.1 For Hilbert spaces H and J, we let T(H, J) denote the (Hilbert) space L(L(H), L(J))
of all linear maps from L(H) into L(J). We write T(H) to mean T(H, H). A map Φ ∈ T(H, J) is
sometimes called a superoperator.
Since a quantum state is an operator over a Hilbert space, all quantum channels are superop-
erators, mapping states of one Hilbert space H to states of another (or the same) Hilbert space J.
Not all superoperators are quantum channels, however. As we will optionally see below, there
are some simple conditions on a superoperator Φ that make it a quantum channel. One such
condition is that Φ must be trace-preserving, which is necessary so that Φ maps states to states,
all of which have unit trace. The other condition is that Φ be completely positive, a concept we will
discuss later. At first, we will only consider the case where H = J, i.e., where the channel maps
states to states in the same system, but later we will see how to generalize to arbitrary H and J.
Throughout this section, we will avoid Dirac notation, as it tends to get in the way.
There are a number of different, equivalent ways of representing a quantum channel. We
consider two here: the coupled-systems representation (aka the Stinespring representation), where we
include the environment then trace it out, and the operator-sum representation (aka the representation
by Kraus operators), where we simply apply operators to states in the system without mentioning
the environment. We’ll show that these two views are equivalent. The coupled-systems view is
more physically intuitive, while the operator-sum view is more mathematically convenient.
We now formally describe a quantum channel E on some system S according to the coupled-
systems view. We imagine S (state space HS of dimension n) in some state ρ. We now consider
another system T (whose state space HT has dimension N), in some known or prepared pure state σ, initially isolated from system S. The combined state of TS is then σ ⊗ ρ initially.^21 We now couple T and S together and let the combined system TS evolve according to some unitary operator U ∈ L(HT ⊗ HS), resulting in the state U(σ ⊗ ρ)U*. We now "forget" the system T by tracing it out, obtaining the final state

E(ρ) := trT(U(σ ⊗ ρ)U*).    (85)
Because all the components making up the definition of E in (85) are linear maps, E itself is linear,
mapping L(HS ) into L(HS ), and thus E ∈ T(HS ) is a superoperator. E depends implicitly on the
system T , its initial state σ, and U.
At first blush, the operator-sum formulation of the quantum channel E looks completely different. We pick some finite collection of operators K1, . . . , KN ∈ L(HS) (for some N ≥ 1) that are completely arbitrary except that we must have

Σ_{j=1}^N K*j Kj = I.    (87)

For any X ∈ L(HS), we then define

X′ = E(X) := Σ_{j=1}^N Kj X K*j ∈ L(HS).    (88)
Defined this way, the map E is evidently linear from L(HS) into itself, and so E ∈ T(HS) is
a superoperator, and it depends implicitly on the choice of K1 , . . . , KN , which are called Kraus
operators. We’ll show in a minute that the two definitions of E just described are equivalent.
Exercise 24.2 Verify that if ρ is a state (i.e., ρ > 0 and tr ρ = 1), then the operator ρ 0 = E(ρ) defined
by (88) is also a state.
The next exercise shows that quantum channels include unitary evolution.
Exercise 24.3 Show that unitary evolution of the system S through a unitary operator U ∈ L(HS )
is a legitimate quantum channel. Argue with respect to both views of quantum channels.
^21 The textbook puts the auxiliary system T on the right, whereas we put it on the left. The two ways are equivalent, but ours will be more consistent with a block matrix representation we'll use later when we prove equivalence of the two views.
For another example, suppose we make a projective measurement on the system S in state
ρ—using some complete set {P1 , . . . , Pk } of orthogonal projectors in L(HS )—but we don’t bother
to look at what the outcome of the measurement is. Then for all we know, the post-measurement
state of S will be a mixture of the post-measurement states corresponding to all the possible
outcomes, weighted by their probabilities. That is, using Equation (30), the state of S after this
"information-free" measurement should be^22

ρ′ = Σ_{j=1}^k Pr[j] · (Pj ρ P*j / Pr[j]) = Σ_j Pj ρ P*j.

This looks like the operator-sum representation of a quantum channel (Equation (88)), and indeed we have

I = Σ_{j=1}^k Pj = Σ_j Pj² = Σ_j P*j Pj,

because the Pj form a complete set of projectors. Thus P1, . . . , Pk satisfy Equation (87) to be Kraus operators, and this information-free measurement is a quantum channel.
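A quick numerical check of this example (a sketch, using the one-qubit computational-basis projectors as the hypothetical complete set of projectors):

```python
import numpy as np

# Complete set of orthogonal projectors on one qubit
P = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

# The projectors satisfy the completeness condition (Eq. (87))
assert np.allclose(sum(Pj.conj().T @ Pj for Pj in P), np.eye(2))

def channel(rho, kraus):
    # operator-sum action, Eq. (88)
    return sum(K @ rho @ K.conj().T for K in kraus)

plus = np.full((2, 2), 0.5)        # |+><+|
rho_out = channel(plus, P)         # measuring without looking kills coherences
assert np.allclose(rho_out, np.eye(2) / 2)
```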
Equivalence of the Coupled-Systems and Operator-Sum Representations. First we’ll show that
every quantum channel defined by the coupled system definition has an operator sum represen-
tation. Suppose that E ∈ T(HS ) is defined so that for all X ∈ L(HS ),
E(X) = trT (U(σ ⊗ X)U∗ ),
where T is a system with state space HT , σ ∈ L(HT ) is a pure state (1-dimensional projector), and
U ∈ L(HT ⊗ HS ) is unitary. Let n = dim(HS ) and let N = dim(HT ). We’ll pick a product basis for
HT ⊗ HS so that we can work directly with matrices. Let {e1 , . . . , eN } be an orthonormal basis for
HT and let {f1 , . . . , fn } be an orthonormal basis for HS . We can choose these bases arbitrarily, so
we’ll assume that σ = e1 e∗1 , i.e., σ projects onto the 1-dimensional subspace spanned by e1 . With
respect to the product basis {ei ⊗ fj : 1 ≤ i ≤ N, 1 ≤ j ≤ n}, the operator U can be written uniquely in block matrix form as

U = Σ_{a,b=1}^N Eab ⊗ Bab,

where each Bab is an n × n matrix, and each Eab := ea e*b is the N × N matrix whose (a, b)th entry is 1 and all the other entries are 0. Noting that Eab Ecd = ea e*b ec e*d = ⟨eb, ec⟩ ea e*d = δbc Ead and E*cd = Edc, we have
U(e1 e*1 ⊗ X)U* = Σ_{a,b,c,d=1}^N (Eab ⊗ Bab)(E11 ⊗ X)(Ecd ⊗ Bcd)*    (89)
= Σ_{a,b,c,d} Eab E11 Edc ⊗ Bab X B*cd    (90)
= Σ_{a,c} Eac ⊗ Ba1 X B*c1.    (91)

^22 A minor technical point: To be well-defined, the first sum in the next equation is really only over those j for which Pr[j] > 0. However, the second sum is over all j. The two sums are still equal, because if Pr[j] = tr(Pj ρ P*j) = 0 for some j, then Pj ρ P*j = 0 by Exercise 9.28.

Tracing out the first component of each tensor product, and using the fact that tr Eac = δac, we get

E(X) = trT(U(e1 e*1 ⊗ X)U*) = Σ_{a,c=1}^N (tr Eac) Ba1 X B*c1 = Σ_{a=1}^N Ba1 X B*a1,    (92)
which has the form of (88) if we let Ka := Ba1 . We’re done if (87) holds. Let IT and IS be the identity
operators in L(HT ) and L(HS ), respectively, and define IT S := IT ⊗ IS , which is the identity on
L(HT ⊗ HS ). Since U is unitary, we have
IT S = U*U = Σ_{a,b,c,d=1}^N (Eab ⊗ Bab)*(Ecd ⊗ Bcd)
= Σ_{a,b,c,d} Eba Ecd ⊗ B*ab Bcd
= Σ_{a,b,d} Ebd ⊗ B*ab Bad
= Σ_{b,d=1}^N Ebd ⊗ (Σ_a B*ab Bad).

We want to isolate what is in the parentheses for b = d = 1 and show that it is the identity IS. We can do this by multiplying both sides on the left and right with the Hermitean operator E11 ⊗ IS and then tracing out T. For the left-hand side, we get

trT((E11 ⊗ IS)IT S(E11 ⊗ IS)) = trT((E11 ⊗ IS)(IT ⊗ IS)(E11 ⊗ IS)) = trT(E11 ⊗ IS) = (tr E11)IS = IS.
For the right-hand side, using linearity of trT, we get

trT((E11 ⊗ IS)(Σ_{b,d=1}^N Ebd ⊗ Σ_{a=1}^N B*ab Bad)(E11 ⊗ IS))
= trT(Σ_{b,d=1}^N E11 Ebd E11 ⊗ Σ_a B*ab Bad)
= trT(Σ_{b,d=1}^N δ1b δd1 E11 ⊗ Σ_a B*ab Bad)
= Σ_{b,d=1}^N δ1b δd1 trT(E11 ⊗ Σ_a B*ab Bad)
= trT(E11 ⊗ Σ_a B*a1 Ba1)
= (tr E11) Σ_a B*a1 Ba1 = Σ_a B*a1 Ba1,

which means that (87) holds for Kraus operators B11, . . . , BN1, and we have a legitimate operator-sum representation of E.
We'll now show the other direction. Suppose we are given an operator-sum representation of E in the form of some collection K1, . . . , KN of Kraus operators such that Σ_{j=1}^N K*j Kj = IS. That is, E(X) = Σ_{a=1}^N Ka X K*a for all X ∈ L(HS). We want to find a coupled-systems representation of E. As before, we will fix some orthonormal basis {fj}, 1 ≤ j ≤ n, of HS, so that we can talk about matrices instead of operators. Define K to be the nN × n matrix

K = [ K1 ; K2 ; · · · ; KN ]

formed by stacking the Kj vertically. The condition that Σ_{j=1}^N K*j Kj = I can be written in block matrix form as

K*K = [ K*1 · · · K*N ] [ K1 ; · · · ; KN ] = IS.    (93)

Here we are multiplying an n × nN matrix on the left and an nN × n matrix on the right to get the n × n identity matrix. Consider the columns of K as nN-dimensional column vectors. Equation (93) is equivalent to saying that the columns of K form an orthonormal set. By Gram-Schmidt, we can take these column vectors as the first n vectors in an orthonormal basis for C^{nN}. We assemble these basis vectors as the columns of an nN × nN matrix U written in block form by

U = [ B11 B12 · · · B1N ; B21 B22 · · · B2N ; · · · ; BN1 BN2 · · · BNN ] = Σ_{a,b=1}^N Eab ⊗ Bab,

where each Bab is an n × n matrix, and the first n columns of U form K, i.e., Ka = Ba1 for 1 ≤ a ≤ N. The orthonormality of the columns of U is equivalent to the equation U*U = I, and so U is unitary. Now let HT be any N-dimensional Hilbert space, and fix an orthonormal basis {ei}, 1 ≤ i ≤ N, for HT. Then with respect to the product basis, U can be considered a unitary operator in L(HT ⊗ HS), and so now we follow the string of equations of (92) to see that E(X) = trT(U(e1 e*1 ⊗ X)U*) for any X ∈ L(HS). Indeed, for all X ∈ L(HS) we have

trT(U(e1 e*1 ⊗ X)U*) = Σ_{a,b,c,d=1}^N trT((Eab ⊗ Bab)(e1 e*1 ⊗ X)(Ecd ⊗ Bcd)*)
= Σ_{a,b,c,d} trT((Eab ⊗ Bab)(E11 ⊗ X)(Edc ⊗ B*cd))
= Σ_{a,b,c,d} trT(Eab E11 Edc ⊗ Bab X B*cd)
= Σ_{a,b,c,d} trT(δb1 δ1d Eac ⊗ Bab X B*cd)
= Σ_{a,c} trT(Eac ⊗ Ba1 X B*c1)
= Σ_{a,c} (tr Eac) Ba1 X B*c1
= Σ_{a,c} δac Ba1 X B*c1
= Σ_{a=1}^N Ba1 X B*a1 = Σ_{a=1}^N Ka X K*a = E(X).
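The whole equivalence can be exercised numerically. The sketch below (the Kraus pair is a hypothetical amplitude-damping-style example, not from the notes) stacks the Ka into the matrix K, completes its columns to a unitary U, and checks that tracing out T reproduces the operator sum:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Kraus pair on H_S (n = 2) satisfying sum_a K_a* K_a = I
K1 = np.array([[1.0, 0.0], [0.0, np.sqrt(0.5)]])
K2 = np.array([[0.0, np.sqrt(0.5)], [0.0, 0.0]])
kraus = [K1, K2]
n, N = 2, 2

K = np.vstack(kraus)                             # nN x n, orthonormal columns
assert np.allclose(K.conj().T @ K, np.eye(n))    # Eq. (93)

# Complete K's columns to an orthonormal basis of C^{nN} (via the null space
# of K*), so the first block-column of U is the stack of the K_a.
_, _, Vh = np.linalg.svd(K.conj().T)
U = np.hstack([K, Vh[n:].conj().T])
assert np.allclose(U.conj().T @ U, np.eye(n * N))  # U is unitary

def tr_T(C):
    # T is the left factor: sum the N diagonal n x n blocks
    return C.reshape(N, n, N, n).trace(axis1=0, axis2=2)

X = rng.normal(size=(n, n))
sigma = np.zeros((N, N)); sigma[0, 0] = 1.0        # e1 e1*
lhs = tr_T(U @ np.kron(sigma, X) @ U.conj().T)     # coupled-systems view
rhs = sum(Ka @ X @ Ka.conj().T for Ka in kraus)    # operator-sum view
assert np.allclose(lhs, rhs)
```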
A Normal Form for the Kraus Operators. The choices of Kraus operators in the operator-sum representation (88) of a quantum channel E are not unique. Neither is the form of the unitary U in the coupled-systems representation of (85). The freedom in the coupled-systems case can be seen as follows: Suppose A ∈ L(HT) and B ∈ L(HS) are any operators, and suppose V ∈ L(HT) is unitary. Then

trT((V ⊗ I)(A ⊗ B)(V ⊗ I)*) = trT(VAV* ⊗ B) = (tr(VAV*))B = (tr A)B = trT(A ⊗ B),

and so by linearity, trT((V ⊗ I)C(V ⊗ I)*) = trT(C) for every C ∈ L(HT ⊗ HS). In other words, if we eventually trace out the environment T, then it doesn't matter if we evolve T's state unitarily or not. Let's conjugate Equations (89–91) by V ⊗ I, where V is an N × N unitary matrix and I is the n × n identity matrix. Noting that V ⊗ I = Σ_{a,b=1}^N [V]ab Eab ⊗ I, we get

trT((V ⊗ I)U(e1 e*1 ⊗ X)U*(V ⊗ I)*) = trT(Σ_{a,c} V Eac V* ⊗ Ba1 X B*c1)
= Σ_{a,c,e} [V]ea [V]*ec Ba1 X B*c1
= Σ_e (Σ_a [V]ea Ba1) X (Σ_c [V]ec Bc1)*
= Σ_e K̃e X K̃*e,

where

K̃e := Σ_{a=1}^N [V]ea Ba1 = Σ_{a=1}^N [V]ea Ka    (94)

for all 1 ≤ e ≤ N. So these equations give us the effect of V on the Kraus operators.
Exercise 24.4 Show by direct calculation that if K1, . . . , KN ∈ L(HS) are operators such that Σ_{j=1}^N K*j Kj = I, and for all 1 ≤ j ≤ N we define K̃j := Σ_{a=1}^N [V]ja Ka for some fixed N × N unitary matrix V, then

Σ_{j=1}^N K̃*j K̃j = I.
So we are allowed to choose V to be any unitary matrix we want without affecting the quantum channel. We'll pick a specific V as follows: Given any set of Kraus operators K1, . . . , KN, let T be the N × N matrix whose (i, j)th entry is

[T]ij := ⟨Kj, Ki⟩ = tr(K*j Ki)

for 1 ≤ i, j ≤ N. Note that [T]ij = ⟨Kj, Ki⟩ = ⟨Ki, Kj⟩* = [T]*ji, and so T is a Hermitean matrix. We can therefore choose a unitary V such that VTV* = diag(λ1, . . . , λN) is diagonal. Since

⟨K̃i, K̃j⟩ = Σ_{a,b=1}^N [V]*ia [V]jb ⟨Ka, Kb⟩
= Σ_{a,b} [V]jb [T]ba [V*]ai
= [VT V*]ji
= λj δij,

the rotated Kraus operators K̃1, . . . , K̃N of (94) are pairwise orthogonal (those with λj = 0 are the zero operator and may be discarded). Hence we have a normal form for the operator-sum representation of a quantum channel: Any quantum channel on an n-dimensional state space may be represented by N ≤ n² many Kraus operators that are pairwise orthogonal.
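The normal form can be computed directly: diagonalize the Gram matrix T and rotate the Kraus operators by Eq. (94). A sketch (the starting Kraus pair is a hypothetical, deliberately non-orthogonal choice):

```python
import numpy as np

# A hypothetical, deliberately non-orthogonal Kraus pair (still satisfying (87))
A = np.array([[1.0, 0.0], [0.0, np.sqrt(0.5)]])
B = np.array([[0.0, np.sqrt(0.5)], [0.0, 0.0]])
K1, K2 = (A + B) / np.sqrt(2), (A - B) / np.sqrt(2)
kraus = [K1, K2]
N = len(kraus)

# Gram matrix [T]_ij = <K_j, K_i> = tr(K_j* K_i); it is Hermitean.
T = np.array([[np.trace(Kj.conj().T @ Ki) for Kj in kraus] for Ki in kraus])
lam, W = np.linalg.eigh(T)          # T = W diag(lam) W*, so take V := W*
V = W.conj().T

# Rotated operators K~_e = sum_a [V]_ea K_a  (Eq. (94))
tilde = [sum(V[e, a] * kraus[a] for a in range(N)) for e in range(N)]

# Pairwise orthogonal, and still a valid Kraus decomposition:
assert np.isclose(np.trace(tilde[0].conj().T @ tilde[1]), 0)
assert np.allclose(sum(Kt.conj().T @ Kt for Kt in tilde), np.eye(2))
```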
Exercise 24.5 Explain why the values λ1, . . . , λN above are all nonnegative reals.
Quantum Channels Between Different Hilbert Spaces. We have restricted our attention to
quantum channels of the form E : L(H) → L(H), that is, linear maps that map operators of a space
to operators of the same space. This restriction is unnecessary, and it is easy to imagine quantum
channels mapping states in one space to states in another. The partial trace operator itself is a
good example of such a thing. The operator-sum view is the easiest way to characterize these
more general quantum channels. We will satisfy ourselves with the following general definition,
without going into the details of why it is the best one. It certainly coincides with our previous
view in the case where the two spaces are the same.
Definition 24.6 Let H and J be Hilbert spaces. A quantum channel from H to J is a superoperator E ∈ T(H, J) such that there exists an integer N > 0 and linear maps Kj : H → J for 1 ≤ j ≤ N satisfying the completeness condition Σ_{j=1}^N K*j Kj = IH, such that for every X ∈ L(H),

E(X) = Σ_{j=1}^N Kj X K*j.

Here, IH denotes the identity map on H. As in the special case where H = J, the Kj are known as Kraus operators.
Recall that for any linear map K : H → J, the adjoint K∗ is a uniquely defined linear map from J
to H. E maps positive operators to positive operators, and the completeness condition guarantees
that E is trace-preserving.
General Measurements. I didn’t lecture on this in class. The textbook makes a rather significant
logical mistake in its discussion of quantum measurement, starting in Section 2.2.3 but carrying
over into Chapter 8. Except for alerting you to this mistake, this material is optional.
Postulate 3 on pages 84–85 (reformulated in terms of density operators on page 102) describes general measurements where some classical information may be obtained. In it, they describe a (general) quantum measurement on a system with state space H as an indexed collection {Mm}_{m∈I} of operators (called measurement operators) in L(H) satisfying Σ_{m∈I} M*m Mm = I. (Thus the Mm satisfy the same completeness condition as the Kraus operators did previously.) I use I here again to describe the set of possible outcomes. According to the postulate, when a system in state ρ is measured using {Mm}, the probability of seeing an outcome m ∈ I is given by ⟨M*m Mm, ρ⟩, and the post-measurement state assuming outcome m occurred is Mm ρ M*m / ⟨M*m Mm, ρ⟩.
All of this is fine except that it is not general enough. There are legitimate physical measure-
ments that do not take this form. The measurements described by the book are all guaranteed to
produce pure states after the measurement, assuming that a pure state was measured. There are,
however, more “imprecise” measurements that may yield mixed states after the measurement,
even if the pre-measurement state was pure.
As before with quantum channels, there are two equivalent views of a general measurement:
the coupled-systems view and the operator-sum view. The textbook gives the operator-sum view.
I’ll describe both views, pointing out how the true operator-sum view differs from the text, but I’ll
omit the proof of equivalence, which is very similar to what I did earlier with quantum channels.
If you want a chance to practice “index gymnastics” yourself, I’ll leave the details of the proof to
you as an exercise.
We’ll only consider finitary measurements here, i.e., measurements with only a finite set of
possible outcomes. One can generalize our analysis to infinitary measurements as well.
In the coupled-systems view, a general measurement on a system S with state space HS
proceeds as follows, assuming ρ is the pre-measurement state of S:
1. Prepare another system T with (finite dimensional) state space HT in some initial pure state
σ.
2. Couple T with S, and let the combined system evolve unitarily according to some unitary
U ∈ L(HT ⊗ HS ), producing the state U(σ ⊗ ρ)U∗ .
3. Perform a projective measurement on the system TS, using some complete set {P(m) : m ∈ I} of orthogonal projectors in L(HT ⊗ HS). I is the (finite) set of possible outcomes. By the usual rules, the probability of seeing any outcome m ∈ I is Pr[m] = ⟨P(m), U(σ ⊗ ρ)U*⟩.

4. Trace out the system T of the post-measurement state to obtain the post-measurement state of S:

ρS_m := trT(P(m) U(σ ⊗ ρ)(P(m) U)*) / tr(P(m) U(σ ⊗ ρ)(P(m) U)*).
In the operator-sum view, a general measurement M of system S is described by a (finite) set I of possible outcomes, and for each outcome m ∈ I a finite list M1^(m), . . . , MN^(m) of operators in L(HS),^23 all satisfying

Σ_{m∈I} Σ_{j=1}^N (Mj^(m))* Mj^(m) = I.

When a system in state ρ is measured, the probability of seeing outcome m ∈ I is

Pr[m] = ⟨M^(m), ρ⟩, where M^(m) := Σ_{j=1}^N (Mj^(m))* Mj^(m),    (95)

and the post-measurement state of S, assuming outcome m occurred, is

ρm := (Σ_{j=1}^N Mj^(m) ρ (Mj^(m))*) / Pr[m].    (96)
I'll finish with two remarks about Equations (95) and (96): First, it's easy to see that the operators {M^(m)}_{m∈I} of (95) form a POVM, and the converse is also true: any POVM arises from some general measurement where the post-measurement state is neglected. To see this, let {M^(m)}_{m∈I} be any (finitary) POVM. If we define measurement elements K^(m) := √(M^(m)) for each m ∈ I, then these elements form the operator-sum view of a generalized measurement, one operator per outcome, and the resulting outcome probabilities are the same as with the given POVM. Second, Postulate 3 only allows one operator per outcome, and so it is the special case of (96) where N = 1.
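The square-root construction in the first remark is easy to realize numerically via the eigendecomposition of each (positive) POVM element. A sketch with a hypothetical two-element POVM:

```python
import numpy as np

def psd_sqrt(M):
    # square root of a positive semidefinite matrix via eigendecomposition
    lam, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(lam, 0, None))) @ V.conj().T

M0 = np.array([[0.75, 0.25], [0.25, 0.25]])   # hypothetical POVM element
M1 = np.eye(2) - M0                            # its complement
K0, K1 = psd_sqrt(M0), psd_sqrt(M1)

# One measurement element per outcome: same statistics as the POVM itself,
# since tr(K rho K*) = tr(K*K rho) = tr(M rho).
rho = np.array([[0.5, 0.5], [0.5, 0.5]])       # |+><+|
for K, M in [(K0, M0), (K1, M1)]:
    assert np.isclose(np.trace(K @ rho @ K.conj().T).real,
                      np.trace(M @ rho).real)
assert np.allclose(K0.conj().T @ K0 + K1.conj().T @ K1, np.eye(2))
```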
Completely Positive Maps. This is another optional topic. We've seen that every (complete) quantum channel E maps states to states; equivalently, it has two properties:

1. E maps positive operators to positive operators (if X ≥ 0 then E(X) ≥ 0);

2. E is trace-preserving (tr E(X) = tr X for all X ∈ L(H)).

We'll see shortly that the converse does not hold. That is, there are linear maps satisfying (1) and (2) above that are not legitimate quantum channels according to Definition 24.6. To get a characterization, we need to strengthen (1) a bit. We say that E is positive if (1) holds, i.e., if E maps positive operators to positive operators. The stronger condition we need is that E be completely positive, a condition that we now explain.
^23 Actually, the lists could contain different numbers of operators, but we can assume they are all the same length by padding shorter lists with copies of the zero operator.
Quantum channels are linear maps, and we can form tensor products of these linear maps
just as we can with any linear maps. So, given two superoperators E ∈ T(H, J) and F ∈ T(K, M)
(H, J, K, and M are Hilbert spaces), we define E ⊗ F as usual to be the unique superoperator in
T(H ⊗ K, J ⊗ M) that takes A ⊗ B to E(A) ⊗ F(B) for every A ∈ L(H) and B ∈ L(K).
For every Hilbert space H we have the identity superoperator I ∈ T(H) defined by I(A) = A
for all A ∈ L(H). I is certainly a quantum channel, given by the single Kraus operator I ∈ L(H).
The next definition gives the strengthening of property (1) that we need:
Definition 24.7 Let H and J be Hilbert spaces. A superoperator E ∈ T(H, J) is completely positive
if for every Hilbert space K, the map I ⊗ E ∈ T(K ⊗ H, K ⊗ J) is positive, where I ∈ T(K) is the
identity map on L(K).
Taking K in Definition 24.7 to be the one-dimensional space C, we may identify L(K ⊗ H) with L(H) and L(K ⊗ J) with L(J), and under this identification I ⊗ E is just E. So if E is completely positive and X ∈ L(H) satisfies X ≥ 0, then (I ⊗ E)(X) ≥ 0, and thus E(X) ≥ 0. This means that E is positive. Therefore, complete positivity is at least as strong a condition as positivity.
It may be counterintuitive, but there are maps E that are positive but not completely positive.
Here’s a great example. Fix some orthonormal basis for H so that we can identify operators
on H with matrices. Now consider the transpose operator T that takes any square matrix to its
transpose (not the adjoint, just the transpose), i.e., T (A) = AT for any matrix A. With respect to
the chosen basis, we can think of T as a map from operators in L(H) to operators in L(H), and it
is clearly a linear map, and so T ∈ T(H). T is obviously trace-preserving, and it is also positive:
For any square matrix A, it is easily checked that if A > 0 then AT > 0 as well (if A is normal,
then so is AT , and both matrices have the same spectrum). T is not completely positive, however,
provided dim(H) > 2. Suppose H = K is the state space of a single qubit, and we fix the standard
computational basis {|0i, |1i} for H. Consider the matrix
1 0 0 1
1 0 0 0 0
A = Φ+ Φ+ =
> 0.
2 0 0 0 0
1 0 0 1
Applying I ⊗ T (sometimes called the partial transpose) to A means taking the transpose of each
2 × 2 block (the T part), but not rearranging the blocks at all (the I part). Thus,
1 0 0 0
1 0 0 1 0 = 1 SWAP,
(I ⊗ T )(A) =
2 0 1 0 0 2
0 0 0 1
where we recall that the two-qubit SWAP operator swaps the qubits, i.e., SWAP|ai|bi = |bi|ai for
any a, b ∈ {0, 1}. The eigenvalues of SWAP are 1, 1, 1, −1 (see Exercise 12.2), and so SWAP is not a
positive operator. This shows that I ⊗ T is not a positive map, and so T is not completely positive.
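The computation above can be replayed in a few lines of numpy (a sketch; the reshape/transpose trick implements the block-wise transpose):

```python
import numpy as np

# A = |Phi+><Phi+| for two qubits
A = np.zeros((4, 4))
A[0, 0] = A[0, 3] = A[3, 0] = A[3, 3] = 0.5
assert np.all(np.linalg.eigvalsh(A) >= -1e-12)    # A itself is positive

# Partial transpose I ⊗ T: transpose each 2x2 block (swap the second-qubit
# row/column indices) without moving the blocks.
PT = A.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)

SWAP = np.eye(4)[[0, 2, 1, 3]]
assert np.allclose(PT, SWAP / 2)                   # (I ⊗ T)(A) = SWAP/2
assert np.linalg.eigvalsh(PT).min() < 0            # so I ⊗ T is not positive
```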
Theorem 24.8 Let H and J be Hilbert spaces, and let E ∈ T(H, J) be a superoperator. E is a quantum channel (Definition 24.6) if and only if E is trace-preserving and completely positive.

Proof. First the forward direction. Let E be a quantum channel given in the operator-sum representation by Kraus operators K1, . . . , KN (linear maps from H to J) such that Σ_{j=1}^N K*j Kj = IH, where IH is the identity operator on H, and E(X) = Σ_j Kj X K*j for any X ∈ L(H). (Recall that each adjoint K*j is then a linear map from J back to H.) For Step 1, we show that E is positive: let X ∈ L(H) with X ≥ 0, set Y := E(X), and let v ∈ J be arbitrary. Writing uj := K*j v, we have

v*Yv = Σ_{j=1}^N v* Kj X K*j v = Σ_j (K*j v)* X (K*j v) = Σ_j u*j X uj ≥ 0

as desired, since X ≥ 0. Because E is an arbitrary quantum channel, this shows that every quantum channel is a positive map.
For Step 2, let K be any Hilbert space, let IK ∈ L(K) be the identity operator on K, and let I ∈ T(K) be the identity map on L(K). We want to show that I ⊗ E ∈ T(K ⊗ H, K ⊗ J) is a quantum channel, so we must come up with Kraus operators for I ⊗ E: For 1 ≤ j ≤ N, define Lj = IK ⊗ Kj. Each Lj is a linear map from K ⊗ H to K ⊗ J, i.e., Lj ∈ L(K ⊗ H, K ⊗ J), and for completeness, we have

Σ_{j=1}^N L*j Lj = Σ_j (IK ⊗ Kj)*(IK ⊗ Kj) = Σ_j IK ⊗ K*j Kj = IK ⊗ IH,
which is the identity map on K ⊗ H. Finally, if A ∈ L(K) and B ∈ L(H) are arbitrary operators
and we set C := A ⊗ B, we have
(I ⊗ E)(C) = (I ⊗ E)(A ⊗ B) = A ⊗ E(B) = ∑_{j=1}^N (I_K ⊗ K_j)(A ⊗ B)(I_K ⊗ K_j^*) = ∑_j L_jCL_j^*.    (97)
Both sides of Equation (97) are linear in C, and so (97) extends to arbitrary C ∈ L(K ⊗ H). This
shows that I ⊗ E is a quantum channel and hence positive, making E completely positive.
Now the reverse direction. Suppose E ∈ T(H, J) is trace-preserving and completely positive.
We need to come up with Kraus operators for E. Let n := dim(H), and let I be the identity map on
L(H). Fix an orthonormal basis {e₁, …, e_n} for H. Taking the product of this basis with itself, we
get a basis {e_{i,j} : 1 ≤ i, j ≤ n} for H ⊗ H, where for convenience we define e_{i,j} := e_i ⊗ e_j. Define
the vector
v := ∑_{i=1}^n e_{i,i} = ∑_i e_i ⊗ e_i ∈ H ⊗ H.
The operator vv∗ ∈ L(H ⊗ H) is clearly positive, and so by assumption, the operator
J := J(E) := (I ⊗ E)(vv∗ ) ∈ L(H ⊗ J)
is also positive. (We are letting K = H.) We have
vv^* = ∑_{i,j=1}^n (e_i ⊗ e_i)(e_j^* ⊗ e_j^*) = ∑_{i,j=1}^n e_ie_j^* ⊗ e_ie_j^* = ∑_{i,j} E_{ij} ⊗ E_{ij},
where E_{ij} := e_ie_j^*, and hence J = (I ⊗ E)(vv^*) = ∑_{i,j} E_{ij} ⊗ E(E_{ij}).
Because J ≥ 0, we can choose some eigenbasis {g₁, …, g_N} for J, where N := n² = dim(H ⊗ H).
This allows us to write
J = ∑_{k=1}^N λ_kg_kg_k^*,
where λ₁, …, λ_N ≥ 0 are the eigenvalues of J. For 1 ≤ k ≤ N, we can now define the Kraus
operator K_k ∈ L(H) by its matrix with respect to the {e_i} basis: for all 1 ≤ i, j ≤ n, define
[K_k]_{ij} := √λ_k ⟨e_{j,i}, g_k⟩.
We need to check that ∑_{k=1}^N K_k^*K_k = I (completeness) and that E(X) = ∑_{k=1}^N K_kXK_k^* for all
X ∈ L(H).
For completeness, fix some a, b ∈ {1, …, n}, and using the fact that E is trace-preserving,
compute
[∑_{k=1}^N K_k^*K_k]_{ab} = ∑_k ∑_{c=1}^n [K_k^*]_{ac}[K_k]_{cb} = ∑_k ∑_c [K_k]_{ca}^*[K_k]_{cb} = ∑_k ∑_c λ_k⟨e_{a,c}, g_k⟩^*⟨e_{b,c}, g_k⟩
= ∑_k ∑_c λ_k⟨e_{b,c}, g_k⟩⟨g_k, e_{a,c}⟩ = ∑_k ∑_c λ_k e_{b,c}^*g_kg_k^*e_{a,c}
= ∑_c e_{b,c}^*(∑_k λ_kg_kg_k^*)e_{a,c} = ∑_c e_{b,c}^*Je_{a,c} = ∑_c ∑_{i,j=1}^n e_{b,c}^*(E_{ij} ⊗ E(E_{ij}))e_{a,c}
= ∑_{c,i,j} (e_b^*E_{ij}e_a)(e_c^*(E(E_{ij}))e_c) = ∑_c e_c^*(E(E_{ba}))e_c = tr[E(E_{ba})] = tr E_{ba} = δ_{ab}.
24 It is not needed for the proof, but it is interesting to note that the matrix J contains complete information about E.
It includes all the n² matrices E(E_ij) laid out as n × n blocks in an n² × n² matrix. Since the E_ij form a basis for L(H),
E is completely determined by the matrices E(E_ij). J = J(E) is called the Choi representation of E. The condition J ≥ 0 is
actually equivalent to E being completely positive, for any superoperator E.
From this we get that ∑_{k=1}^N K_k^*K_k is the identity matrix. This shows completeness.
Now let X ∈ L(H) be arbitrary. Again, we compare matrix elements with respect to the {e_i}
basis. For any 1 ≤ a, b ≤ n, we have
[∑_{k=1}^N K_kXK_k^*]_{ab} = ∑_k ∑_{c,d=1}^n [K_k]_{ac}[X]_{cd}[K_k^*]_{db} = ∑_k ∑_{c,d=1}^n [X]_{cd}[K_k]_{ac}[K_k]_{bd}^*
= ∑_k ∑_{c,d} λ_k[X]_{cd}⟨e_{c,a}, g_k⟩⟨g_k, e_{d,b}⟩ = ∑_k ∑_{c,d} λ_k[X]_{cd}e_{c,a}^*g_kg_k^*e_{d,b}
= ∑_{c,d} [X]_{cd}e_{c,a}^*Je_{d,b}    (just as before)
= ∑_{c,d} [X]_{cd} ∑_{i,j=1}^n e_{c,a}^*(E_{ij} ⊗ E(E_{ij}))e_{d,b}
= ∑_{c,d,i,j} [X]_{cd}(e_c^*E_{ij}e_d)(e_a^*(E(E_{ij}))e_b) = ∑_{c,d} [X]_{cd}e_a^*(E(E_{cd}))e_b
= e_a^*(∑_{c,d} [X]_{cd}E(E_{cd}))e_b = e_a^*(E(∑_{c,d} [X]_{cd}E_{cd}))e_b
= e_a^*(E(X))e_b = [E(X)]_{ab}.
Thus E(X) = ∑_{k=1}^N K_kXK_k^*, as we wanted. □
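The construction in the proof can be exercised numerically. The sketch below (Python/NumPy; the one-qubit bit-flip test channel and p = 0.25 are illustrative choices, not part of the theorem) builds the Choi matrix J, eigendecomposes it, reads off Kraus operators via [K_k]_{ij} = √λ_k ⟨e_{j,i}, g_k⟩, and checks completeness and the operator-sum identity:

```python
import numpy as np

n = 2
p = 0.25                                  # hypothetical flip probability
sx = np.array([[0, 1], [1, 0]], dtype=complex)

def E(X):
    """Bit-flip test channel: E(X) = (1-p) X + p (sigma_x X sigma_x)."""
    return (1 - p) * X + p * (sx @ X @ sx)

# Choi matrix J(E) = sum_{i,j} E_ij (x) E(E_ij).
J = np.zeros((n * n, n * n), dtype=complex)
for i in range(n):
    for j in range(n):
        Eij = np.zeros((n, n), dtype=complex)
        Eij[i, j] = 1
        J += np.kron(Eij, E(Eij))

# Eigendecompose J and read off Kraus operators as in the proof.
lam, G = np.linalg.eigh(J)
kraus = []
for k in range(n * n):
    if lam[k] > 1e-12:                    # skip (numerically) zero eigenvalues
        g = G[:, k].reshape(n, n)         # g[j, i] = <e_{j,i}, g_k>
        kraus.append(np.sqrt(lam[k]) * g.T)

S = sum(K.conj().T @ K for K in kraus)    # completeness check
rng = np.random.default_rng(1)
Xrand = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Y = sum(K @ Xrand @ K.conj().T for K in kraus)
print(np.allclose(S, np.eye(n)), np.allclose(Y, E(Xrand)))  # True True
```

The recovered Kraus operators need not literally equal {√(1 − p) I, √p σ_x}; any two Kraus sets for the same channel are related by a unitary mixing, which is why the check is against the channel's action rather than the operators themselves.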
Exercise 24.9 (Optional) Show that the composition of two quantum channels is a quantum chan-
nel. That is, let E ∈ T(H, J) and F ∈ T(J, K) be quantum channels. Show that F ◦ E ∈ T(H, K) is a
quantum channel, where F ◦ E is defined as (F ◦ E)(X) := F(E(X)) for all X ∈ L(H).
Exercise 24.10 (Challenging, Optional) Show that the tensor product of two quantum channels is
a quantum channel. That is, let E ∈ T(H, J) and F ∈ T(K, M) be quantum channels. Show that
E ⊗ F ∈ T(H ⊗ K, J ⊗ M) is a quantum channel.
Definition 24.6 defines what are sometimes called complete quantum channels, and a general
quantum channel (not necessarily complete) is defined the same way, except that we replace the
completeness condition ∑_{j=1}^N K_j^*K_j = I_H with the looser condition ∑_{j=1}^N K_j^*K_j ≤ I_H. Incomplete
quantum channels are used to describe physical processes that may not happen with certainty, e.g.,
a general measurement that results in some outcome m.
Exercise 24.11 (Challenging, Optional) Show that a superoperator E ∈ T(H, J) is a general quan-
tum channel, as described above, if and only if (1) E is completely positive, and (2) for every state
(positive operator with unit trace) ρ ∈ L(H), we have 0 ≤ tr(E(ρ)) ≤ 1. The quantity tr(E(ρ)) is
interpreted as the probability that E actually occurs. [Hint: Set L := I_H − ∑_{j=1}^N K_j^*K_j, where the
K_j are the Kraus operators corresponding to E as above. Since L ≥ 0, you can define K_{N+1} := √L,
and then define E′(X) := ∑_{j=1}^{N+1} K_jXK_j^* for any X ∈ L(H). Notice that E′ is a complete quantum
channel and that E(X) = E′(X) − √L X √L. Also note that L ≤ I_H. Apply Theorem 24.8 to E′, and
use it to prove facts about E.]
Exercise 24.12 (Challenging, Optional) Show that the partial trace map is always a (complete)
quantum channel. [Hint: Let trH : L(H ⊗ J) → L(J) be a partial trace map. Note that by linearity,
trH ∈ T(H ⊗ J, J) is a superoperator. Fix orthonormal bases {e1 , . . . , en } and {f1 , . . . , fm } for H and
J, respectively, and for each j with 1 ≤ j ≤ n, define the Kraus operator K_j ∈ L(H ⊗ J, J) by
K_j := e_j^* ⊗ I_J = ∑_{k=1}^m e_j^* ⊗ f_kf_k^* = ∑_{k=1}^m f_k(e_j^* ⊗ f_k^*),
We might drop the subscript p if it is clear what probability distribution we are using.
If p and q are two probability distributions over the same sample space, we are interested in
measures of the similarity or difference between p and q. We’ll discuss two here: the trace distance
and the fidelity.
Definition 25.1 Let p and q be two probability distributions on the same sample space Ω. The
trace distance (also called the L1 distance or the Kolmogorov distance) between p and q is defined as
D(p, q) := (1/2) ∑_{a∈Ω} |p(a) − q(a)|.
It is easy to check that D satisfies the axioms for a metric on the set of all probability distributions
on Ω. These are:
1. D(p, q) ≥ 0,
2. D(p, q) = 0 iff p = q,
3. D(p, q) = D(q, p),
4. D(p, r) ≤ D(p, q) + D(q, r),
for any probability distributions p, q, r on Ω. Here’s another way of characterizing the trace
distance: for any probability distributions p and q on Ω,
D(p, q) = max_{S⊆Ω} |Pr_p[S] − Pr_q[S]| = max_{S⊆Ω} (Pr_p[S] − Pr_q[S]).    (98)
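Equation (98) can be checked by brute force on a small sample space. A Python sketch (the two distributions below are hypothetical):

```python
from itertools import chain, combinations

# Two hypothetical distributions on Omega = {0, 1, 2}.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.4, 0.4]

D = 0.5 * sum(abs(pa - qa) for pa, qa in zip(p, q))

# Brute-force the characterization D(p, q) = max over subsets S of (Pr_p[S] - Pr_q[S]).
omega = range(len(p))
subsets = chain.from_iterable(combinations(omega, r) for r in range(len(p) + 1))
best = max(sum(p[a] - q[a] for a in S) for S in subsets)
print(D, best)  # both equal 0.3 (up to rounding)
```

The maximizing subset is {a : p(a) > q(a)}, which is exactly why the maximum equals half the L¹ distance.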
The trace distance gauges the difference between two distributions p and q. The fidelity, on
the other hand, is a measure of their similarity; it is maximized when p = q.
Definition 25.3 Let p and q be two probability distributions on the same sample space Ω. The
fidelity of p and q is defined as
F(p, q) := ∑_{a∈Ω} √(p(a)q(a)).
F(p, q) can be seen as the dot product of two real unit vectors: the vector whose a'th entry is
√p(a) and the vector whose a'th entry is √q(a). Since these two vectors clearly have unit norm,
the fidelity is then the cosine of the angle between them. Thus we immediately get 0 ≤ F(p, q) ≤ 1,
with F(p, q) = 1 iff p = q.
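A minimal numerical check of the cosine interpretation (the distributions are hypothetical):

```python
import math

# Hypothetical distributions on a three-element sample space.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.4, 0.4]

F = sum(math.sqrt(pa * qa) for pa, qa in zip(p, q))

# The same number computed as a dot product of the two unit vectors
# (sqrt(p(a)))_a and (sqrt(q(a)))_a.
u = [math.sqrt(x) for x in p]
v = [math.sqrt(x) for x in q]
cos = sum(ua * va for ua, va in zip(u, v))
print(0 <= F <= 1, abs(F - cos) < 1e-12)  # True True
```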
Trace Distance and Fidelity of Operators. We’d like to extend these definitions to quantum
states, i.e., operators. A reasonable sanity check on the way we should define such an extension
would be to say that if ρ and σ are mixtures of the same set of pairwise orthogonal pure states with
(eigenvalue) probability distributions r and s, respectively, then D(ρ, σ) should be equal to D(r, s),
P
and F(ρ, σ) should be equal to F(r, s). Let’s see this in more detail. Suppose ρ = k j=1 rj ρj and
Pk
σ = j=1 sj ρj , where the pure states ρj project onto mutually orthogonal subspaces (equivalently,
ρi ρj = δij ρi for any i and j). Now consider the operator |ρ − σ|. We have
|ρ − σ| = √((ρ − σ)^*(ρ − σ)) = √((ρ − σ)²) = [(∑_{j=1}^k (r_j − s_j)ρ_j)²]^{1/2} = [∑_{j=1}^k (r_j − s_j)²ρ_j]^{1/2},
because the cross-terms (ρ_iρ_j for i ≠ j) all vanish when we expand the expression inside the square
brackets. Since the ρj project onto mutually orthogonal subspaces, we can choose an orthonormal
basis in which all the ρj are diagonal matrices simultaneously. Permuting the basis vectors if need
be, we can assume that each ρj (which is a one-dimensional projector) is given by the matrix Ejj .
Thus ∑_{j=1}^k (r_j − s_j)²ρ_j = ∑_j (r_j − s_j)²E_{jj} = diag[(r₁ − s₁)², (r₂ − s₂)², …, (r_k − s_k)², 0, …, 0]. To
take the square root of this matrix, we just take the square root of each diagonal entry, which gives
the matrix diag[|r1 − s1 |, |r2 − s2 |, . . . , |rk − sk |, 0, . . . , 0], and so this is |ρ − σ| in matrix form. Taking
one half of the trace of this gives
(1/2) tr|ρ − σ| = (1/2) ∑_{j=1}^k |r_j − s_j| = D(r, s).
This suggests that we can now define the trace distance D(A, B) for arbitrary operators A and B as
D(A, B) := (1/2) tr|A − B| = (1/2)‖A − B‖₁.
We can do something similar to define the fidelity of two arbitrary positive operators. I won’t
do the details here, but a reasonable definition is
F(A, B) := tr √(A^{1/2}BA^{1/2}) = ‖√B √A‖₁    (99)
for arbitrary operators A, B ≥ 0. It can be shown that F(A, B) = F(B, A), and if ρ and σ are states,
then 0 ≤ F(ρ, σ) ≤ 1 with F(ρ, σ) = 1 iff ρ = σ.
We do the same sanity check for F as we did for D, above. If ρ and σ are commuting states as
before, i.e., ρ = diag(r1 , . . . , rk , 0, . . . , 0) and σ = diag(s1 , . . . , sk , 0, . . . , 0) with respect to the same
orthonormal basis, then we have
F(ρ, σ) = tr diag(√(r₁s₁), …, √(r_ks_k), 0, …, 0) = ∑_{j=1}^k √(r_js_j) = F(r, s).
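Equation (99) is straightforward to evaluate numerically with an eigendecomposition-based matrix square root. The Python/NumPy sketch below (the diagonal states are hypothetical) repeats the sanity check above:

```python
import numpy as np

def psd_sqrt(M):
    """Square root of a positive semidefinite Hermitean matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

def fidelity(A, B):
    """F(A, B) = tr sqrt(A^{1/2} B A^{1/2}), as in Equation (99)."""
    rA = psd_sqrt(A)
    return np.trace(psd_sqrt(rA @ B @ rA)).real

# Commuting diagonal states: the operator fidelity should reduce to the
# classical fidelity of the eigenvalue distributions (values hypothetical).
r = np.array([0.5, 0.3, 0.2])
s = np.array([0.2, 0.4, 0.4])
rho, sigma = np.diag(r), np.diag(s)
print(np.isclose(fidelity(rho, sigma), np.sum(np.sqrt(r * s))),
      np.isclose(fidelity(rho, sigma), fidelity(sigma, rho)))  # True True
```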
Exercise 25.4 Show that kABk1 = kBAk1 for any Hermitean operators A and B. Thus the fidelity
function F of (99) is symmetric. [Hint: Use Property 10 of the norm, which says that kCk1 = kC∗ k1
for any operator C.]
Properties of the Trace Distance. The trace distance of operators has an alternate characterization
analogous to Equation (98). If A and B are operators, we say that A ≤ B if B − A ≥ 0. We'll show
that for any states ρ and σ,
D(ρ, σ) = max_{projectors P} tr(P(ρ − σ)) = max_{P≥0, ‖P‖=1} tr(P(ρ − σ)) = max_{0≤P≤I} tr(P(ρ − σ)),    (100)
where the three maxima are taken over all projectors P, all positive operators P of unit operator
norm (L^∞ norm), and all operators P such that 0 ≤ P ≤ I, respectively. Equation (100) has many
uses. We won’t bother to do it here, but it is straightforward to check—as a consequence of
Equation (100)—that D(ρ, σ) is the maximum probability difference of any outcome of a POVM
applied to ρ and to σ. The function D is also a metric on the set of all quantum states of a given
system, that is, it can be shown to satisfy the axioms for a metric on page 114, and (100) helps with
showing the triangle inequality for D.
Actually, we’ll show a result slightly more general than Equation (100):
Proposition 25.5 Suppose that A is a traceless Hermitean operator, i.e., tr A = 0 and A = A∗ . Let
λ1 , . . . , λn ∈ R be the eigenvalues of A (A acts on an n-dimensional space). The following quantities are all
equal:
1. (1/2)kAk1 ,
2. (1/2) tr |A|,
3. ∑_{i:λ_i>0} λ_i,
4. max_{projectors P} tr(PA),
5. max_{P≥0, ‖P‖=1} tr(PA),
6. max_{0≤P≤I} tr(PA).
Proof. Clearly (1) = (2), by the definition of the trace norm. Set p := ∑_{i:λ_i>0} λ_i and
q := ∑_{i:λ_i<0} λ_i. Since tr A = p + q = 0, we have q = −p, and so
tr|A| = ∑_{i=1}^n |λ_i| = ∑_{i:λ_i>0} λ_i − ∑_{i:λ_i<0} λ_i = p − q = 2p.
Thus (2) = p = (3).
The inequalities (3) ≤ (4) ≤ (5) ≤ (6) are pretty straightforward and we leave these as exercises.
It remains to show that (6) ≤ (3). Consider the expression max_{0≤P≤I} tr(PA) of (6). The
key insight is to show first that the maximum is achieved by some P that commutes with A (i.e.,
PA = AP). Once that fact is established, the rest is easy: we can pick a common eigenbasis for P
and A and look at diagonal matrices.
Suppose that 0 ≤ P ≤ I and that P does not commute with A. We will find an operator P′
such that 0 ≤ P′ ≤ I and tr(P′A) > tr(PA), and so the maximum is not achieved by P.25 Set
C := i(AP − PA). Note that C is Hermitean, because both P and A are, and C ≠ 0 by assumption.
(The quantity AP − PA, for any operators A and P, is called the commutator or the Lie bracket
(pronounced, “LEE”) of A and P, and is denoted by [A, P].) For any ε > 0, define
U_ε := e^{−iεC} = I − iεC + O(ε²).
Then U_ε is unitary by Item 4 of Exercise 9.3. The “O(ε²)” here denotes an operator (depending on
ε) whose norm (it doesn't matter which norm) is bounded by some positive constant times ε². We
now define
P′ := U_εPU_ε^*
25 We are tacitly assuming that the maximum is achieved by some P such that 0 ≤ P ≤ I. This is in fact true, and it
follows from concepts in topology that we won't go into here, namely, continuity and compactness.
for some ε > 0 that we will choose later. It is easy to check that 0 ≤ P′ ≤ I. Now we have
tr(P′A) = tr((I − iεC)P(I + iεC)A) + O(ε²) = tr(PA) − iε tr([C, P]A) + O(ε²) = tr(PA) + ε tr(C²) + O(ε²).
Now C² = C^*C ≥ 0, and since C ≠ 0, we must then have tr(C²) > 0, either by Exercise 9.28 or by
observing that tr(C^*C) = ⟨C, C⟩ > 0 (Hilbert-Schmidt inner product). Now we can choose ε small
enough so that ε tr(C²) strictly dominates the O(ε²) error term, yielding tr(P′A) > tr(PA). This
shows that the maximum value of tr(PA) is achieved only when P commutes with A, i.e., [A, P] = 0.
Finally suppose that 0 ≤ P ≤ I and that P commutes with A. Pick a common eigenbasis for
P and A so that, with respect to this basis, A = diag(λ₁, …, λ_n) and P = diag(μ₁, …, μ_n). Since
0 ≤ P ≤ I, we must have 0 ≤ μ₁, …, μ_n ≤ 1, but otherwise, we are free to choose the μ_j arbitrarily
(see the hint to Exercise 25.7, below). We now have
tr(PA) = ∑_{j=1}^n μ_jλ_j,
which is clearly maximized by taking μ_j := 1 when λ_j > 0 and μ_j := 0 otherwise. The maximum
value is then ∑_{j:λ_j>0} λ_j, which is (3). □
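A quick numerical check of Proposition 25.5 (Python/NumPy; the random 4 × 4 operator is an arbitrary test case). As the proof indicates, the maximum in items (4)-(6) is achieved by the projector onto the positive eigenspace of A:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random traceless Hermitean operator on a 4-dimensional space.
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (M + M.conj().T) / 2
A = A - (np.trace(A).real / 4) * np.eye(4)

lam, V = np.linalg.eigh(A)
half_trace_norm = 0.5 * np.sum(np.abs(lam))     # (1/2) ||A||_1 = (1/2) tr|A|
positive_part = np.sum(lam[lam > 0])            # sum of the positive eigenvalues

# Projector onto the positive eigenspace of A.
Vp = V[:, lam > 0]
Pmax = Vp @ Vp.conj().T
print(np.isclose(half_trace_norm, positive_part),
      np.isclose(np.trace(Pmax @ A).real, positive_part))  # True True
```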
Exercise 25.7 Prove (4) ≤ (5) ≤ (6) in Proposition 25.5, above. [Hint: The following easy facts are
useful for any operator P:
• 0 ≤ P ≤ I if and only if P is normal and all its eigenvalues are in the closed interval [0, 1] ⊆ R
(consider an eigenbasis for P).
• Recall (Exercise 9.35) that 0 ≤ P iff P = |P|, and thus if 0 ≤ P then ‖P‖ is the largest eigenvalue
of P itself.
For (4) ≤ (5), note that every nonzero projector P satisfies 0 ≤ P and ‖P‖ = 1. You need to treat
the case where P = 0 separately. (5) ≤ (6) is straightforward.]
In Proposition 25.13, below, I’ll mention one more interesting property of the trace distance:
it can never increase via a quantum channel. This says that all (complete) quantum channels are
contractive with respect to the metric D. So if no classical information is coming out of an open
quantum system, its dynamics tends to cause states to become less distinguishable, not more. This
is not necessarily the case with incomplete quantum channels, where some classical information
is obtained.
Lemma 25.10 (Jordan-Hahn decomposition) For any Hermitean operator A, there exist unique positive
operators Q and S such that QS = SQ = 0 and A = Q − S.
Proof. Define
Q := (|A| + A)/2,
S := (|A| − A)/2.
Evidently, A = Q − S; moreover,
QS = (1/4)(|A|² − |A|A + A|A| − A²) = (1/4)(|A|² − A²),
because A commutes with |A|. Since A is Hermitean, we have A2 = A∗ A = |A|2 , and hence, QS = 0.
A similar argument gives SQ = 0. It remains to show that Q and S are both positive. Let λ1 , . . . , λk
be the distinct eigenvalues of A. The λi are all real, since A is Hermitean. By Corollary 9.17, we
have a unique decomposition
A = λ1 P1 + · · · + λk Pk ,
where the P_j form a complete set of orthogonal projectors. Then
|A| = |λ₁|P₁ + ⋯ + |λ_k|P_k,
which immediately implies
Q = (1/2)((|λ₁| + λ₁)P₁ + ⋯ + (|λ_k| + λ_k)P_k) = ∑_{j:λ_j>0} λ_jP_j,
S = (1/2)((|λ₁| − λ₁)P₁ + ⋯ + (|λ_k| − λ_k)P_k) = ∑_{j:λ_j<0} (−λ_j)P_j.
All the coefficients above are nonnegative real numbers, and since all the Pi are positive, Q and S
must both be positive.
To prove uniqueness, suppose some positive operators Q and S satisfy the conditions of the
lemma. Since QS = 0 = SQ, Q and S commute with each other, and thus Q and S both commute
with Q − S = A. Since A, Q, and S are all normal operators, they share a common eigenbasis B by
Theorem 9.41. With respect to B, these three operators are represented by diagonal matrices:
A = diag(a₁, …, a_n),
Q = diag(q₁, …, q_n),
S = diag(s₁, …, s_n).
Since a_i = q_i − s_i with q_i, s_i ≥ 0 and q_is_i = 0 for each i, we must have q_i = max(a_i, 0) and
s_i = max(−a_i, 0). Thus Q and S are uniquely determined by A. □
Exercise 25.11 Show that if A, Q, and S are as in Lemma 25.10, then |A| = |Q − S| = |Q + S| = √(Q² + S²).
Exercise 25.12 Show that if A, Q, and S are as in Lemma 25.10, then tr |A| = tr Q+tr S. [Hint: Either
pick a common eigenbasis for Q and S or use the decomposition in the proof of Lemma 25.10.]
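The decomposition (and Exercise 25.12) is easy to verify numerically. A Python/NumPy sketch, with a random Hermitean A as an arbitrary test case:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                        # a random real Hermitean operator

def abs_op(A):
    """|A| for Hermitean A: replace each eigenvalue by its absolute value."""
    w, V = np.linalg.eigh(A)
    return (V * np.abs(w)) @ V.T

Q = (abs_op(A) + A) / 2
S = (abs_op(A) - A) / 2

print(np.allclose(A, Q - S),
      np.allclose(Q @ S, np.zeros((4, 4))),
      np.allclose(S @ Q, np.zeros((4, 4))),
      np.isclose(np.trace(abs_op(A)), np.trace(Q) + np.trace(S)))
```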
Proposition 25.13 Let E ∈ T(H, J) be a (complete, i.e., trace-preserving) quantum channel, and let ρ and
σ be states in L(H). Then D(E(ρ), E(σ)) ≤ D(ρ, σ).
Proof. The operator ρ − σ satisfies Lemma 25.10, so uniquely write ρ − σ = Q − S, where Q, S ≥ 0
and QS = SQ = 0. Now we have
D(ρ, σ) = (1/2) tr|ρ − σ| = (1/2)(tr Q + tr S)    (Exercise 25.12)
= tr Q    (because tr Q − tr S = tr(Q − S) = tr(ρ − σ) = 1 − 1 = 0)
= tr(E(Q)).    (E is trace-preserving)
Noticing that E(ρ) − E(σ) = E(ρ − σ) is a traceless Hermitean operator, we can choose a projector
P that maximizes tr(P(E(ρ) − E(σ))). Continuing on, we get
D(E(ρ), E(σ)) = tr(P(E(ρ) − E(σ))) = tr(P E(Q)) − tr(P E(S)) ≤ tr(P E(Q)) ≤ tr(E(Q)) = D(ρ, σ),
using the facts that E(Q), E(S) ≥ 0 (E is positive) and 0 ≤ P ≤ I. □
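Proposition 25.13 can be spot-checked numerically. The Python/NumPy sketch below uses a one-qubit bit-flip channel with a hypothetical p = 0.3 as the test channel and two random states:

```python
import numpy as np

def trace_distance(rho, sigma):
    """D(rho, sigma) = (1/2) tr |rho - sigma| for Hermitean arguments."""
    return 0.5 * np.sum(np.abs(np.linalg.eigvalsh(rho - sigma)))

def random_state(rng, n=2):
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    rho = M @ M.conj().T                 # positive by construction
    return rho / np.trace(rho).real      # normalize to unit trace

# Bit-flip channel with a hypothetical p = 0.3 as the test channel.
p = 0.3
X = np.array([[0, 1], [1, 0]], dtype=complex)
def E(rho):
    return (1 - p) * rho + p * (X @ rho @ X)

rng = np.random.default_rng(3)
rho, sigma = random_state(rng), random_state(rng)
print(trace_distance(E(rho), E(sigma)) <= trace_distance(rho, sigma) + 1e-12)  # True
```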
Properties of the Fidelity. An important special case of Equation (99) is when ρ = uu∗ is a pure
state (u is a unit vector). We may prepare a pure state ρ, then send it through a noisy quantum
channel (quantum channel E), producing a state σ at the other end. The fidelity F(ρ, σ) is a good
measure of how much the state was garbled in the transmission—the higher the fidelity, the less
garbling. For ρ = uu^* and any state σ, we have
F(uu^*, σ) = tr √(u(u^*σu)u^*) = √(u^*σu) tr(uu^*) = √(u^*σu),    (101)
noting that √(uu^*) = uu^*, which has unit trace. In Dirac notation, letting |ψ⟩ := u, this becomes
F(|ψ⟩⟨ψ|, σ) = √(⟨ψ|σ|ψ⟩).    (102)
There is a fact about the fidelity analogous with Proposition 25.13 regarding the trace distance.
We’ll state it without proof.
Proposition 25.14 Suppose E ∈ T(H, J) is a complete quantum channel. For any two states ρ, σ ∈ L(H),
F(E(ρ), E(σ)) ≥ F(ρ, σ).
Comparing Trace Distance and Fidelity. The trace distance and fidelity are roughly inter-
changeable as measures of distinctness/similarity. For pure states ρ and σ it can be shown that
D(ρ, σ) = √(1 − F(ρ, σ)²). For arbitrary states ρ and σ, it can be shown that
1 − F(ρ, σ) ≤ D(ρ, σ) ≤ √(1 − F(ρ, σ)²),
or equivalently,
1 − D(ρ, σ) ≤ F(ρ, σ) ≤ √(1 − D(ρ, σ)²).
These inequalities are known as the Fuchs-van de Graaf inequalities. So in most situations, it doesn’t
really matter which measure is used. The book uses the fidelity measure almost exclusively. The
inequalities above are illustrated in Figure 10.
Figure 10: For any two states ρ and σ, let x := D(ρ, σ) and y := F(ρ, σ). The Fuchs-van de Graaf
inequalities say that the point (x, y) must lie in the shaded region bounded by the line x+y = 1 and
the arc of the unit circle in the first quadrant. This region is symmetric with respect to reflection
through the line x = y.
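The Fuchs-van de Graaf region can also be sampled numerically over random states (Python/NumPy sketch; the 100 random one-qubit states are arbitrary test cases):

```python
import numpy as np

def psd_sqrt(M):
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

def trace_distance(rho, sigma):
    return 0.5 * np.sum(np.abs(np.linalg.eigvalsh(rho - sigma)))

def fidelity(rho, sigma):
    r = psd_sqrt(rho)
    return np.trace(psd_sqrt(r @ sigma @ r)).real

def random_state(rng, n=2):
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    rho = M @ M.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(4)
ok = True
for _ in range(100):
    rho, sigma = random_state(rng), random_state(rng)
    x, y = trace_distance(rho, sigma), fidelity(rho, sigma)
    # Check 1 - F <= D <= sqrt(1 - F^2) up to numerical slack.
    ok = ok and (1 - y <= x + 1e-9) and (x <= np.sqrt(max(1 - y * y, 0.0)) + 1e-9)
print(ok)  # True
```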
26 Week 13: Quantum error correction
Quantum Error Correction. In this topic, we’ll see ways to reduce the effects of noise in a quantum
channel, thereby increasing the fidelity between the input state to the channel and the output state.
First, we’ll see a typical scenario where this is done classically. Suppose Alice sends individual
bits to Bob across a channel that is noisy in the following sense: any bit b is flipped to the opposite
bit 1 − b with probability p, independent of the other bits. Such a channel is called the binary
symmetric channel or bit-flip channel, and is an often-used model of classical noise. We can assume
that p 6 1/2, because if p > 1/2, then Bob would do well to flip each bit he receives, making the
effective error probability 1 − p < 1/2 per bit sent. If p = 1/2, then all hope is lost; no information
at all can be carried by the bits; Bob receives independently random bits that are completely
uncorrelated with those that Alice sent. So we’ll assume that p < 1/2 from now on.
To reduce the chances of error per bit, Alice and Bob agree on a binary error-correcting code,
which is some mapping
0 7→ c0 ,
1 7→ c1 ,
where c0 and c1 are strings over the binary alphabet {0, 1} (binary strings) of equal length, called
codewords. Instead of sending each bit b, Alice sends cb instead, and Bob decodes what he receives
to (hopefully) recover b. An obvious error-correcting code is
0 7→ 000,
1 7→ 111,
which we’ll call the majority-of-3 code26 . Alice wants to send a bit b (the plaintext or cleartext) to Bob,
so she sends bbb across the channel. When Bob receives the possibly garbled string xyz of three
bits from Alice, he decodes xyz to get the bit c as follows:
c := majority(x, y, z).
The bit b was decoded successfully iff c = b. What is the failure probability, i.e., the probability
that c ≠ b due to unrecoverable errors? There will be a failure iff at least two of the three bits were
flipped in transit. Since each is flipped with probability p independent of the others, we have
Pr[failure] = 3p²(1 − p) + p³ = 3p² − 2p³.
The first term in the middle is the probability that exactly two of the three bits were flipped, and
the second term in the middle is the probability that all three bits were flipped. It is easy to see
that Pr[failure] < p if p < 1/2, and so this code reduces the probability of error per plaintext bit
from no encoding at all. Finally, note that Pr[failure] = O(p²) as p tends to 0, and so for tiny p ≪ 1,
the failure probability is reduced by a considerable factor.
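The failure probability is a one-liner to tabulate (the sample values of p are arbitrary):

```python
# Failure probability of the classical majority-of-3 code: at least two
# of the three independently flipped bits.
def fail(p):
    return 3 * p**2 * (1 - p) + p**3      # = 3p^2 - 2p^3

for p in [0.25, 0.1, 0.01]:
    print(p, fail(p))                     # fail(p) < p for every p < 1/2
```

At p = 0.01 the failure probability drops to 0.000298, illustrating the O(p²) improvement.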
26 This is an example of a repetition code.
Figure 11: The three-qubit quantum majority-of-3 code. An arbitrary one-qubit state |ψ⟩ = α|0⟩ +
β|1⟩ is encoded as |ψ_L⟩ = α|0_L⟩ + β|1_L⟩ = α|000⟩ + β|111⟩.
The Quantum Bit-Flip Channel. Now we can “quantize” the scheme above. Suppose Alice
sends qubits one at a time to Bob across a noisy quantum channel that we will call the quantum
bit-flip channel. In the quantum bit-flip channel, a Pauli X operator is applied to each transmitted
qubit with probability p < 1/2, independently for each qubit. The corresponding quantum
channel is thus
E(ρ) := (1 − p)ρ + pXρX, (103)
whose set of Kraus operators is {√(1 − p) I, √p X}.27 Suppose that Alice sends some unencoded
one-qubit pure state |ψ⟩⟨ψ| through the bit-flip channel E. Ideally, Bob wants to receive |ψ⟩⟨ψ|, but
in reality, Bob receives ρ′ := E(|ψ⟩⟨ψ|) = (1 − p)|ψ⟩⟨ψ| + pX|ψ⟩⟨ψ|X. The fidelity between Alice's
sent state and Bob's received state is, by Equation (102),
F(|ψ⟩⟨ψ|, ρ′) = √(⟨ψ|ρ′|ψ⟩) = √((1 − p) + p|⟨ψ|X|ψ⟩|²) ≥ √(1 − p),
with equality holding if |ψ⟩ = |0⟩ or |ψ⟩ = |1⟩. So the fidelity without encoding can be as low as
√(1 − p).
Now suppose that Alice and Bob employ a quantum version of the majority-of-3 code. Alice
encodes each plaintext qubit she sends to Bob as a three-qubit code state using the map
|0⟩ ↦ |0_L⟩ := |000⟩,
|1⟩ ↦ |1_L⟩ := |111⟩,
extended to all one-qubit pure states by linearity. Here the subscript “L” stands for “logical”—three
physical qubits are being used to encode one logical (uncoded) qubit. Figure 11 shows how Alice
encodes a single qubit in state |ψi := α|0i + β|1i as a three-qubit state |ψL i := α|0L i + β|1L i. |ψL i
lies in the code space, i.e., the two-dimensional subspace of the eight-dimensional Hilbert space of
three qubits spanned by |0L i and |1L i. The three qubits in state |ψL i are sent through the channel,
and (we assume) each qubit is subjected to the E of Equation (103) independently of the other two.
Thus the channel yields the output state
σ := E^{⊗3}(|ψ_L⟩⟨ψ_L|).
Bob receives σ and wants to decode it to (hopefully) recover |ψi. Now some issues arise that
aren’t a problem in the classical case. Most importantly, Bob cannot just measure the physical
qubits he receives, since this will destroy the superposition making up |ψi. In fact, Bob’s error
correction operation cannot give him any classical information about |ψi; any such information
would disrupt |ψi. Instead, Bob can measure what kind of error occurred (if any) and correct
the error directly without disturbing |ψi. The type of error is called the error syndrome. Bob’s
decoding is a two-step process: First, Bob will measure the error syndrome, i.e., which bit (if any)
was flipped, without gaining any knowledge of what the values of the bit were before and after.
Second, knowing which qubit was flipped, Bob applies an X gate to that qubit, and this will allow
him to recover |ψi with high probability.
To measure the error syndrome, Bob makes a four-outcome projective measurement on his
three received qubits using the four projectors
P₀ = |000⟩⟨000| + |111⟩⟨111|,
P₁ = |100⟩⟨100| + |011⟩⟨011|,
P₂ = |010⟩⟨010| + |101⟩⟨101|,
P₃ = |001⟩⟨001| + |110⟩⟨110|.
P0 , . . . , P3 form a complete set of projectors, and each Pj is a two-dimensional projector. P0 projects
onto the code space and corresponds to the outcome of either no qubits flipped or all three qubits
flipped. P1 corresponds to the outcome of either the first qubit flipped and the other two left alone,
or the other two flipped and the first left alone. P2 and P3 are similar for the second and third
qubits, respectively. Note that, whatever the state was before the syndrome measurement, the
post-measurement state is in one of the four subspaces projected onto by P0 , . . . , P3 , respectively.
Let j ∈ {0, 1, 2, 3} be the outcome of Bob’s syndrome measurement, above. After the measure-
ment, Bob tries to recover |ψL i as follows: if j = 0, then Bob assumes that no qubits were flipped
(which is way more likely than all three being flipped), and so he does nothing; if 1 6 j 6 3, then
Bob assumes that the jth qubit was flipped (which is somewhat more likely than the other two
being flipped), and so he flips the jth qubit back by applying an X gate to it. No matter what qubits
were flipped in the channel, Bob has a state in the code space after the correction. If at most one
qubit was flipped, then Bob has |ψL i, and the recovery is successful. If more than one qubit was
flipped, then Bob has the state XL |ψL i, where XL is some three-qubit operator that swaps |0L i with
|1L i, and the recovery failed. (Bob doesn’t know at this point whether he succeeded or failed.)
In a moment, we’ll see in detail how Bob can perform these steps, but once he has |ψL i—and
if he really wants to—he can convert |ψL i back into |ψi by applying the circuit
tr
tr
which is the inverse of Alice’s circuit of Figure 11. The two gates labeled “tr” are what I call trace
gates. A trace gate just signifies that a qubit is no longer useful and is to be ignored, i.e., traced
Figure 12: Bob’s error-recovery circuit for the quantum bit-flip channel. The middle three qubits
are what he receives from Alice, and the two outer qubits are ancillæ used in the syndrome
measurement.
out. So mathematically, a trace gate corresponds to a partial trace. Assuming the input state to the
circuit is |ψL i, the first qubit of the output will be in state |ψi. The traced-out qubits will both hold
|0i if the input state is in the code space. Bob usually does not want to recover |ψi by decoding
if the encoded state will be used for further computation, because those computations are more
fault-tolerant using the encoded state.
A circuit for Bob’s syndrome measurement and subsequent correction is shown in Figure 12.
Bob received the three-qubit state σ in the middle three qubits. His syndrome measurement is
split into two binary measurements: He first measures whether the first two of the three qubits
from Alice are different. The value b1 measured on the upper ancilla will be 1 iff they are, and 0
otherwise. In other words, b1 gives the parity (sum modulo 2) of the values of the first two qubits.
Similarly, the lower ancilla measurement is 1 iff the second two qubits are different. To correct the
state, Bob combines these two Boolean values to determine which qubit value, if any, is different
from the other two, and applies a classically controlled X gate to this qubit.
Exercise 26.1 Show mathematically that the syndrome measurement portion of Figure 12 is the
same as the projective measurement {P0 , P1 , P2 , P3 } described earlier. What values of b1 b2 corre-
spond to which Pj ?
Bob’s entire recovery process in Figure 12 can be described as a quantum channel R that maps
three-qubit states to three-qubit states: For input state σ, we have
R(σ) = P₀σP₀ + ∑_{j=1}^3 X_jP_jσP_jX_j,    (104)
where Xj is the Pauli X gate applied to the jth qubit. That is,
X1 = X ⊗ I ⊗ I,
X2 = I ⊗ X ⊗ I,
X3 = I ⊗ I ⊗ X.
Thus the state after Bob's recovery is
τ := R(σ) = R(E^{⊗3}(|ψ_L⟩⟨ψ_L|)).
To get a handle on what τ is, first notice that |ψ_L⟩⟨ψ_L| is a linear combination of operators of the form
|a_L⟩⟨b_L| = |aaa⟩⟨bbb| for a, b ∈ {0, 1}. By Equation (103) we have E(|a⟩⟨b|) = (1 − p)|a⟩⟨b| + p|ā⟩⟨b̄|,
where we let ā := 1 − a and b̄ := 1 − b. Then,
E^{⊗3}(|aaa⟩⟨bbb|) = [E(|a⟩⟨b|)]^{⊗3}
= [(1 − p)|a⟩⟨b| + p|ā⟩⟨b̄|]^{⊗3}
= (1 − p)³|aaa⟩⟨bbb|
+ (1 − p)²p (|āaa⟩⟨b̄bb| + |aāa⟩⟨bb̄b| + |aaā⟩⟨bbb̄|)
+ (1 − p)p² (|aāā⟩⟨bb̄b̄| + |āaā⟩⟨b̄bb̄| + |āāa⟩⟨b̄b̄b|)
+ p³|āāā⟩⟨b̄b̄b̄|.
(Alternatively, we can expand E^{⊗3}(ρ) for any three-qubit operator ρ to get an operator-sum ex-
pression for E^{⊗3}:
E^{⊗3}(ρ) = ∑_{j₁,j₂,j₃∈{1,2}} (K_{j₁} ⊗ K_{j₂} ⊗ K_{j₃}) ρ (K_{j₁} ⊗ K_{j₂} ⊗ K_{j₃})^*,
where K₁ := √(1 − p) I and K₂ := √p X are the Kraus operators of E;
then plug in |aaa⟩⟨bbb| for ρ to get the same expression for E^{⊗3}(|aaa⟩⟨bbb|).) Applying the R of
Equation (104) to E^{⊗3}(|aaa⟩⟨bbb|) above, we get, after much simplification,
R(E^{⊗3}(|a_L⟩⟨b_L|)) = (1 − 3p² + 2p³)|aaa⟩⟨bbb| + (3p² − 2p³)|āāā⟩⟨b̄b̄b̄|
= (1 − 3p² + 2p³)|a_L⟩⟨b_L| + (3p² − 2p³)X₁X₂X₃|a_L⟩⟨b_L|X₁X₂X₃.
Exercise 26.2 Verify this last equation. This may be tedious, but it is good practice.
Since this equation holds for all four bowtie operators |a_L⟩⟨b_L|, by linearity, we get
τ = (1 − 3p² + 2p³)|ψ_L⟩⟨ψ_L| + (3p² − 2p³)X₁X₂X₃|ψ_L⟩⟨ψ_L|X₁X₂X₃.
The first term represents Bob’s successful recovery of |ψL i, and this occurs with probability 1 −
3p2 + 2p3 , which is greater than 1 − p if p < 1/2. In fact, it is 1 − O(p2 ), which is significant if p is
small. For the fidelity, we get
F(|ψ_L⟩⟨ψ_L|, τ) = √(⟨ψ_L|τ|ψ_L⟩) ≥ √(1 − 3p² + 2p³) > √(1 − p),
and so the minimum fidelity of τ with |ψ_L⟩⟨ψ_L| is strictly greater than the minimum fidelity of σ
with |ψ_L⟩⟨ψ_L|. So, recovery improves the worst-case fidelity.
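The whole send-and-recover pipeline is small enough to simulate directly with density matrices. The Python/NumPy sketch below (the amplitudes a, b and p = 0.1 are hypothetical) applies E^{⊗3} and then R, and checks the claimed form of τ:

```python
import numpy as np
from itertools import product

I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])
p = 0.1                                   # hypothetical flip probability

def kron3(A, B, C):
    return np.kron(A, np.kron(B, C))

# Encoded state |psi_L> = a|000> + b|111> for a hypothetical |psi> = a|0> + b|1>.
a, b = 0.6, 0.8
psiL = np.zeros(8)
psiL[0], psiL[7] = a, b
rho = np.outer(psiL, psiL)

# E^{(x)3}: each qubit flipped independently with probability p (Equation (103)).
kraus1 = [np.sqrt(1 - p) * I2, np.sqrt(p) * X]
sigma = sum(kron3(K1, K2, K3) @ rho @ kron3(K1, K2, K3).T
            for K1, K2, K3 in product(kraus1, repeat=3))

# Bob's recovery channel R (Equation (104)).
def proj(i, j):
    P = np.zeros((8, 8))
    P[i, i] = P[j, j] = 1.0
    return P

P = [proj(0, 7), proj(4, 3), proj(2, 5), proj(1, 6)]   # P0..P3
Xs = [np.eye(8), kron3(X, I2, I2), kron3(I2, X, I2), kron3(I2, I2, X)]
tau = sum(Xs[j] @ P[j] @ sigma @ P[j] @ Xs[j] for j in range(4))

XXX = kron3(X, X, X)
expected = (1 - 3*p**2 + 2*p**3) * rho + (3*p**2 - 2*p**3) * (XXX @ rho @ XXX)
print(np.allclose(tau, expected))  # True
```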
The Quantum Phase-Flip Channel. Bit flips are not the only possible errors in a quantum
channel. Consider the one-qubit phase-flip channel given by
F(ρ) := (1 − p)ρ + pZρZ,
which applies a Pauli Z operator to the qubit (thus flipping the relative phase between |0⟩ and |1⟩
by a factor of −1) with probability p < 1/2.
This kind of channel has no classical analogue, but in a very real sense it is closely analogous
to the quantum bit-flip channel—the two channels are “unitarily conjugate” to each other via the
Hadamard H operator. Here's what I mean by that: Since HX = ZH and XH = HZ, we have
F(ρ) = (1 − p)ρ + pZρZ = H((1 − p)HρH + pXHρHX)H = H(E(HρH))H,
and similarly E(ρ) = H(F(HρH))H.28 In particular, sending an unencoded pure state |ψ⟩⟨ψ| through
F gives fidelity
√(⟨ψ|F(|ψ⟩⟨ψ|)|ψ⟩) = √((1 − p) + p|⟨ψ|Z|ψ⟩|²) ≥ √(1 − p),
with equality holding if |ψ⟩ = H|0⟩ or |ψ⟩ = H|1⟩. So the worst-case fidelity is the same as with the
bit-flip channel.
To get an error-correcting code for the phase-flip channel, we take our majority-of-3 construction
for the bit-flip channel and insert Hadamard gates in the right places. Recall that we’ve defined
|+i := H|0i and |−i := H|1i. Alice now encodes her one-qubit pure state |ψi = α|0i + β|1i as
|ψ′_L⟩ := H^{⊗3}(α|000⟩ + β|111⟩) = α|+++⟩ + β|−−−⟩ = α|0′_L⟩ + β|1′_L⟩,
where |0′_L⟩ := |+++⟩ and |1′_L⟩ := |−−−⟩. Bob measures the error syndrome with projectors
Q₀, Q₁, Q₂, Q₃, where each Q_j := H^{⊗3}P_jH^{⊗3}.
If Bob sees that the relative phase of one of the qubits is different from that of the other two, then
Bob assumes that the qubit’s phase was flipped and applies a Z gate to that qubit. The circuit for
doing all this is shown in Figure 14.
The quantum channel corresponding to Bob's whole procedure is given by
S(σ) := Q₀σQ₀ + ∑_{j=1}^3 Z_jQ_jσQ_jZ_j = H^{⊗3}P₀H^{⊗3}σH^{⊗3}P₀H^{⊗3} + ∑_{j=1}^3 Z_jH^{⊗3}P_jH^{⊗3}σH^{⊗3}P_jH^{⊗3}Z_j
28 Put more succinctly, U ∘ F = E ∘ U and U ∘ E = F ∘ U, where U is the one-qubit unitary quantum channel that maps
ρ ↦ HρH.
Figure 14: Bob’s error recovery procedure for the phase-flip channel.
= H^{⊗3}P₀H^{⊗3}σH^{⊗3}P₀H^{⊗3} + ∑_{j=1}^3 H^{⊗3}X_jP_jH^{⊗3}σH^{⊗3}P_jX_jH^{⊗3}
= H^{⊗3}(P₀(H^{⊗3}σH^{⊗3})P₀ + ∑_{j=1}^3 X_jP_j(H^{⊗3}σH^{⊗3})P_jX_j)H^{⊗3} = H^{⊗3}(R(H^{⊗3}σH^{⊗3}))H^{⊗3}
and thus
H⊗3 (S(σ))H⊗3 = R(H⊗3 σH⊗3 ) (105)
for any three-qubit operator σ, where R is the bit-flip recovery channel of Equation (104). That is,
S is unitarily conjugate to R via H⊗3 . In a similar fashion, we can get that
H⊗3 (F⊗3 (ρ))H⊗3 = E⊗3 (H⊗3 ρH⊗3 ) (106)
for any three-qubit operator ρ.
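Equation (105) can be verified numerically on an arbitrary three-qubit operator (Python/NumPy sketch; the P_j are the syndrome projectors from the bit-flip code above):

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])
Z = np.diag([1., -1.])
H = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)

def kron3(A, B, C):
    return np.kron(A, np.kron(B, C))

H3 = kron3(H, H, H)

def proj(i, j):
    P = np.zeros((8, 8))
    P[i, i] = P[j, j] = 1.0
    return P

P = [proj(0, 7), proj(4, 3), proj(2, 5), proj(1, 6)]   # bit-flip syndrome projectors
Q = [H3 @ Pj @ H3 for Pj in P]                         # Q_j = H^{(x)3} P_j H^{(x)3}
Xs = [np.eye(8), kron3(X, I2, I2), kron3(I2, X, I2), kron3(I2, I2, X)]
Zs = [np.eye(8), kron3(Z, I2, I2), kron3(I2, Z, I2), kron3(I2, I2, Z)]

def R(s):   # bit-flip recovery, Equation (104)
    return sum(Xs[j] @ P[j] @ s @ P[j] @ Xs[j] for j in range(4))

def S(s):   # phase-flip recovery
    return sum(Zs[j] @ Q[j] @ s @ Q[j] @ Zs[j] for j in range(4))

rng = np.random.default_rng(5)
sigma = rng.standard_normal((8, 8))     # an arbitrary three-qubit operator
print(np.allclose(H3 @ S(sigma) @ H3, R(H3 @ sigma @ H3)))  # True
```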
Letting τ′ := S(F^{⊗3}(|ψ′_L⟩⟨ψ′_L|)) and stringing these channels together, (105) and (106) give us
H^{⊗3}τ′H^{⊗3} = R(E^{⊗3}(H^{⊗3}|ψ′_L⟩⟨ψ′_L|H^{⊗3})) = R(E^{⊗3}(|ψ_L⟩⟨ψ_L|)) = τ,
Figure 15: The nine-qubit Shor code. This concatenates the phase-flip and bit-flip codes.
or equivalently,
τ′ = H^{⊗3}τH^{⊗3} = (1 − 3p² + 2p³)|ψ′_L⟩⟨ψ′_L| + (3p² − 2p³)Z₁Z₂Z₃|ψ′_L⟩⟨ψ′_L|Z₁Z₂Z₃.
Thus we get the same success probability here as with the bit-flip channel, and the fidelity is at
worst the same as it was then:
F(ψL0 ψL0 , τ 0 ) = √(hψL0 |τ 0 |ψL0 i) ≥ √(1 − 3p2 + 2p3 ) .
The Shor Code. We can combine the bit-flip and phase-flip error correcting codes above to correct
against both kinds of errors, even on the same qubit. As a bonus, we’ll show that the resulting
code corrects against arbitrary errors on a single qubit. A typical one-qubit channel that has all
three kinds of errors (bit flip, phase flip, and combined bit and phase flip) is called the depolarizing
channel, and it maps
ρ 7→ D(ρ) := (1 − p)ρ + (p/3)(XρX + ZρZ + ZXρXZ) = (1 − p)ρ + (p/3)(XρX + YρY + ZρZ). (107)
This channel leaves the qubit alone with probability 1 − p > 1/2 and produces each of the three
possible errors with the same probability p/3.
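As a quick numerical sanity check (not part of the notes; the matrices below are the standard representations), we can simulate the depolarizing channel of Equation (107) and verify that its Kraus operators √(1 − p) I and √(p/3) X, Y, Z satisfy the completeness relation, so that D is trace-preserving:

```python
import numpy as np

# Pauli matrices in the computational basis.
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def depolarize(rho, p):
    """The depolarizing channel D of Equation (107)."""
    return (1 - p) * rho + (p / 3) * (X @ rho @ X + Y @ rho @ Y + Z @ rho @ Z)

p = 0.1
# Kraus operators: sqrt(1-p) I and sqrt(p/3) X, Y, Z.
kraus = [np.sqrt(1 - p) * I] + [np.sqrt(p / 3) * K for K in (X, Y, Z)]
completeness = sum(K.conj().T @ K for K in kraus)
assert np.allclose(completeness, I)       # the channel is complete

rho = np.array([[0.75, 0.25], [0.25, 0.25]], dtype=complex)  # an arbitrary state
out = depolarize(rho, p)
assert np.isclose(np.trace(out).real, 1.0)  # trace is preserved
```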
To help correct against all three types of errors, Alice first encodes a single qubit using the three-
qubit phase-flip code of Figure 13, then she encodes each of the three qubits using the majority-of-3
code for the bit-flip channel, shown in Figure 11. The resulting encoding circuit, shown in Figure 15,
produces the nine-qubit Shor code, named after its inventor, Peter Shor. Such a code is called a
concatenated code, in that it combines (concatenates) two or more simpler codes into a single code.
Using the Shor code, Alice encodes a single qubit in state |ψi = α|0i + β|1i as the nine-qubit state
|ψS i := α|0S i + β|1S i, where
|0S i := (1/(2√2)) (|000i + |111i)⊗3 = |+L i⊗3 , (108)
|1S i := (1/(2√2)) (|000i − |111i)⊗3 = |−L i⊗3 , (109)
where we define the three-qubit states |+L i := (|000i + |111i)/√2 and |−L i := (|000i − |111i)/√2.
The nine qubits are naturally divided into three subblocks of three qubits each, which I’ll call
3-blocks. Alice sends Bob |ψS ihψS | through a channel (e.g., the depolarizing channel) that may
cause one of the three errors on each of the nine qubits with some probability independently of
the others. If more than one qubit is affected, then the recovery may not work, and so we hope
that the probability of this happening is low.
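A minimal numpy sketch of Equations (108) and (109): build the two nine-qubit code words and check that they are orthonormal. (This is my own illustration, not from the notes, and the variable names are mine.)

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

def kron_all(vs):
    """Tensor product of a list of state vectors."""
    out = np.array([1], dtype=complex)
    for v in vs:
        out = np.kron(out, v)
    return out

# The three-qubit blocks (|000> ± |111>)/sqrt(2).
plusL  = (kron_all([ket0] * 3) + kron_all([ket1] * 3)) / np.sqrt(2)
minusL = (kron_all([ket0] * 3) - kron_all([ket1] * 3)) / np.sqrt(2)

# The nine-qubit Shor code words of Equations (108) and (109).
zeroS = kron_all([plusL] * 3)
oneS  = kron_all([minusL] * 3)

assert np.isclose(np.linalg.norm(zeroS), 1.0)
assert np.isclose(np.linalg.norm(oneS), 1.0)
assert np.isclose(np.vdot(zeroS, oneS).real, 0.0)  # the code words are orthogonal
```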
Bob receives the nine-qubit state σ sent from Alice, and we’ll assume (with high probability)
that at most one of the nine qubits endured either a bit flip, phase flip, or both. For example,
suppose Alice sends |0S i = |+L i⊗3 to Bob. If the first qubit is bit-flipped en route, then Bob
receives (1/√2)(|100i + |011i) ⊗ |+L i⊗2 . If the first qubit is phase-flipped, then Bob gets the state
(1/√2)(|000i − |111i) ⊗ |+L i⊗2 = |−L i ⊗ |+L i⊗2 . (Note that a phase flip in a qubit contributes
an overall phase flip in its 3-block; phase flips on two different qubits in the same block would
cancel each other.) Finally, if the first qubit is bit-flipped and then phase-flipped, then Bob gets
(1/√2)(−|100i + |011i) ⊗ |+L i⊗2 .
To recover, Bob first applies the bit-flip error recovery operation R of Figure 12 and Equa-
tion (104) to each of the three 3-blocks independently. This will correct up to a single bit-flip error
within each 3-block. Crucially, this intrablock bit-flip recovery works regardless of whether there
was also a phase-flip in the 3-block. After Bob corrects bit flips within each 3-block, he must then
correct phase flips. He does this by comparing phase differences between adjacent 3-blocks, either
finding which 3-block’s phase doesn’t match the other two and flipping that 3-block’s phase back,
or else determining that the phases of the 3-blocks are all equal and nothing needs to be done.
A circuit that accomplishes this phase-flip recovery portion of the overall recovery is shown in
Figure 16. In essence, Bob’s procedure first applies a bank of H gates to turn phase flips into bit
flips, then it measures the bit flips (bit parity), then converts the state back into phase flips (more
H gates), which it can then correct with Z gates based on what bit flips were measured. To see that
this all works, define
|eveni := (1/2)(|000i + |011i + |101i + |110i),
|oddi := (1/2)(|100i + |010i + |001i + |111i).
|eveni is a superposition of all computational basis states of three qubits with an even number
of 1’s; |oddi is a superposition of the other computational basis states. One can readily check
that |eveni = H⊗3 |+L i and that |oddi = H⊗3 |−L i. Suppose for example that Alice sends |1S i =
|−L i|−L i|−L i to Bob (we are suppressing the ⊗ operator symbol for now), and there is a phase
flip on some qubit in the first 3-block (it doesn’t matter which). So after correcting bit flips, Bob’s
state is now |+L i|−L i|−L i, which feeds into the circuit of Figure 16. Applying the Hadamard gates
yields the state |eveni|oddi|oddi. As a result of the CNOTs, the upper ancilla’s bit value will flip
an even + odd = odd number of times, and so b1 = 1. The lower ancilla’s bit value will flip an
odd + odd = even number of times, and so b2 = 0. The next layer of Hadamards converts the
state back to |+L i|−L i|−L i, and then Bob recovers by applying a Z gate to the first qubit, yielding
|−L i|−L i|−L i = |1S i.
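The identities |eveni = H⊗3 |+L i and |oddi = H⊗3 |−L i used in this argument are easy to confirm numerically; a sketch (my own, with assumed variable names):

```python
import numpy as np

H1 = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
H3 = np.kron(np.kron(H1, H1), H1)          # H tensored with itself three times

def basis(bits):
    """Computational basis state of three qubits, e.g. basis('011')."""
    v = np.zeros(8, dtype=complex)
    v[int(bits, 2)] = 1
    return v

plusL  = (basis('000') + basis('111')) / np.sqrt(2)
minusL = (basis('000') - basis('111')) / np.sqrt(2)

even = (basis('000') + basis('011') + basis('101') + basis('110')) / 2
odd  = (basis('100') + basis('010') + basis('001') + basis('111')) / 2

assert np.allclose(H3 @ plusL, even)   # |even> = H^{⊗3} |+L>
assert np.allclose(H3 @ minusL, odd)   # |odd>  = H^{⊗3} |-L>
```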
Let’s look briefly at the quantum channels involved. Let T be the quantum channel corre-
sponding to Bob’s entire recovery procedure for the Shor code.
[Circuit diagram: a bank of nine Hadamard gates, parity measurements comparing adjacent 3-blocks to produce bits b1 and b2 , a second bank of nine Hadamards, and Z corrections on one qubit of each 3-block controlled by b1 ∧ ¬b2 , b1 ∧ b2 , and ¬b1 ∧ b2 .]
Figure 16: Recovering from a phase flip with the Shor code. When this circuit starts, Bob assumes
that he has already corrected any bit flip in a 3-block if there was one, and so the incoming state is
a linear combination of the eight states |±L i|±L i|±L i.
Exercise 26.3 (Challenging) Give an expression for T applied to an arbitrary nine-qubit state ρ.
Make your expression as succinct as possible but still mathematically precise. You may use the
following notations for operators without having to expand them:
• Let R0 , R1 , R2 , R3 represent the projectors for the measurement performed in Figure 16, corre-
sponding to the outcomes 00, 10, 11, and 01, respectively, for b1 b2 .
• For j ∈ {0, 1, 2}, let P0^(j) , P1^(j) , P2^(j) , P3^(j) be the projectors used for the bit-flip syndrome mea-
surement in the (j + 1)st 3-block, as described in the bit-flip channel discussion.
• For any single-qubit operator A and k ∈ {1, . . . , 9}, let Ak be the nine-qubit operator that
applies A to the kth qubit and leaves the other qubits alone.
Suppose the channel between Alice and Bob is the one-qubit depolarizing channel D of Equa-
tion (107). If Alice and Bob use the Shor code, then nine qubits will be transferred per single
plaintext qubit. For an arbitrary nine-qubit state ρ, the effect of D on ρ is then
D⊗9 (ρ) = (1 − p)9 ρ + (1 − p)8 (p/3) ∑_{j=1}^{9} (Xj ρXj + Yj ρYj + Zj ρZj ) + O(p2 ).
The terms hidden in the “O(p2 )” represent errors on two or more qubits, from which Bob
might not recover. Bob, however, can recover from any of the single-qubit errors shown in the
expression above, provided ρ is in the code space of the Shor code. That is, if ρ = |ψS ihψS | for some
one-qubit state |ψi, then we’ve shown that T(Xj ρXj ) = T(Yj ρYj ) = T(Zj ρZj ) = ρ for all 1 ≤ j ≤ 9,
and thus the final error-corrected state is
υ := T(D⊗9 (ρ)) = ((1 − p)9 + 9p(1 − p)8 )ρ + O(p2 ) = (1 − p)8 (1 + 8p)ρ + O(p2 ).
The hidden terms are all of the form KρK∗ = K|ψS ihψS |K∗ for some Kraus operator K, each of
which contributes hψS |K|ψS ihψS |K∗ |ψS i = |hψS |K|ψS i|2 ≥ 0, and thus the fidelity is at least
F(ρ, υ) = √(hψS |υ|ψS i) ≥ √((1 − p)8 (1 + 8p)) .
Exercise 26.4 Show that using the Shor code, Bob recovers from X2 X4 X9 Z1 Z2 Z3 Z4 Z5 Z7 Z9 .
Exercise 26.5 (Challenging) What is the worst-case fidelity of sending an unencoded one-qubit
pure state |ψihψ| through the depolarizing channel D? About how small does p have to be so that
the worst-case estimate of the fidelity for the Shor code, above, is better than this? A numerical
approximation will suffice.
27 Week 13: Error correction (cont.)
Quantum Error Correction: The General Theory. Here we want to determine, in the most
general terms that we can, when it is possible to recover from a noisy quantum channel through
the use of an error correcting code. Let H be the Hilbert space of states that are to be sent through
some noisy channel. We’ll assume that information sent through the channel is encoded into states
in some linear subspace C ⊆ H, i.e., the code space. Let P be the projector that projects orthogonally
onto C. We’ll assume that the noisy channel is modeled by some (possibly incomplete) quantum
error channel E ∈ T(H). For example, in our discussion of the Shor code, H is the 2^9 -dimensional
space of nine qubits, and C is the 2-dimensional subspace spanned by the vectors |0S i and |1S i; the
noisy channel E of interest may be the portion of the depolarizing channel D⊗9 in which at most
one qubit is affected. This channel sends a state ρ ∈ L(H) to
E(ρ) := (1 − p)9 ρ + (1 − p)8 (p/3) ∑_{j=1}^{9} (Xj ρXj + Yj ρYj + Zj ρZj ), (110)
and represents the portion of D⊗9 from which we know Bob can recover. Note that E is an
incomplete (non-trace-preserving) channel, because we are omitting the terms of D⊗9 where more
than one qubit is subjected to an error, and from which Bob may not be able to recover. The
incompleteness reflects the fact that this unrecoverability happens with nonzero probability.
Back to the general case. We’ll say that a quantum state ρ is in the code space C iff it is a convex
sum of pure states in C, i.e., ρ = ∑_i pi |ψi ihψi | where each |ψi i ∈ C. Equivalently, ρ is in the code
space iff ρ = PρP (equivalently, ρ = Pρ, or equivalently, ρ = ρP, by Exercise 27.1, below).
Exercise 27.1 Prove that the following are equivalent for any projector P and Hermitean operator
A: (1) A = PAP; (2) A = AP; (3) A = PA. [Hint: No decompositions are needed for any of
these—just simple substitutions and taking adjoints.]
The error channel E can be given in operator-sum form by some Kraus operators E1 , . . . , EN ∈
L(H) such that ∑_{j=1}^{N} E∗j Ej ≤ I (inequality because E is not necessarily complete), and
E(ρ) = ∑_{j=1}^{N} Ej ρE∗j
for any ρ ∈ L(H). Suppose that R ∈ T(H) is some (not necessarily complete) quantum channel
representing a recovery procedure. We will say that R successfully recovers from E if (R ◦ E)(ρ) = cρ
for any ρ in C, where c is a real constant depending on ρ, E, and R and satisfying 0 ≤ c ≤ 1. (If
E and R are both complete (hence trace-preserving), then we must have c = 1.) We will say that
E is recoverable if there exists an R that successfully recovers from E. The next theorem gives a
quantitative criterion for when an error channel is recoverable.
Theorem 27.2 Let E be a nonzero, possibly incomplete error channel on L(H) given by Kraus operators
E1 , . . . , EN ∈ L(H). Fix a code space C ⊆ H and let P be the projector projecting onto C. E is recoverable
(with respect to C) if and only if there exists an N × N matrix M such that, for all 1 ≤ i, j ≤ N,
PE∗i Ej P = [M]ij P. (111)
Further, if such an M exists, then M ≥ 0, tr M is the probability that E occurs given any state in C, and a
(complete) recovery channel R exists such that (R ◦ E)(ρ) = (tr M)ρ for any ρ in C.
I call Equation (111) the “peep” condition, because of the left-hand side. We will only be
interested in the backwards implication, giving sufficient conditions for E to be recoverable by
actually constructing a recovery procedure. So we won’t prove the forward implication (the
Nielsen & Chuang textbook does it). Here is some intuition about the forward implication:
Suppose E is recoverable. Let |ψi and |ϕi be any two states in the code space C. Then P|ψi = |ψi
and P|ϕi = |ϕi, and so for all i, j, we have hψ|E∗i Ej |ϕi = hψ|PE∗i Ej P|ϕi. The left-hand side is the
inner product of the vectors Ei |ψi and Ej |ϕi. Equation (111) is equivalent to saying that this value
is proportional to hψ|P|ϕi = hψ|ϕi, where the constant of proportionality ([M]ij ) depends only on
i and j and not on |ψi or |ϕi. One might expect that this is needed, so that the error operators do
not “distort” the code space, i.e., they preserve inner products up to a constant, because to recover,
we must apply a unitary operation to restore the code space undistorted, so that superpositions of
states in the code space are preserved up to a constant.
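Before turning to the proof, the peep condition can be illustrated numerically on the recoverable part of the three-qubit bit-flip channel (this anticipates Exercise 27.3 below; the code and the resulting matrix M are my own check, not from the notes):

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)

def op_on(A, k):
    """A acting on qubit k (0-indexed) of three qubits."""
    ops = [I2, I2, I2]
    ops[k] = A
    return np.kron(np.kron(ops[0], ops[1]), ops[2])

p = 0.2
# Kraus operators of the recoverable part E' of the 3-qubit bit-flip channel.
E = [(1 - p) ** 1.5 * np.eye(8, dtype=complex)] + \
    [(1 - p) * np.sqrt(p) * op_on(X, k) for k in range(3)]

# Code-space projector P = |000><000| + |111><111|.
P = np.zeros((8, 8), dtype=complex)
P[0, 0] = P[7, 7] = 1

# Verify the peep condition P Ei* Ej P = M[i,j] P and recover M.
M = np.zeros((4, 4), dtype=complex)
for i in range(4):
    for j in range(4):
        A = P @ E[i].conj().T @ E[j] @ P
        M[i, j] = np.trace(A) / np.trace(P)
        assert np.allclose(A, M[i, j] * P)   # each P Ei* Ej P is proportional to P

# M comes out diagonal, with the error probabilities on the diagonal.
assert np.allclose(M, np.diag([(1 - p) ** 3] + [(1 - p) ** 2 * p] * 3))
assert np.isclose(np.trace(M).real, (1 - p) ** 3 + 3 * (1 - p) ** 2 * p)
```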
Proof. [backward implication] Suppose that M exists satisfying (111) for all i, j. We can assume
that P ≠ 0; otherwise, the theorem is trivial. Taking the adjoint of each side of (111), we have, for
all 1 ≤ i, j ≤ N,
[M]∗ij P = PE∗j Ei P = [M]ji P,
and so [M]∗ij = [M]ji since P ≠ 0, which means that M is Hermitean. The next thing to do is to
simplify (111) by diagonalizing M. Since M is normal, there is an N × N unitary matrix U and
scalars d1 , . . . , dN ∈ R (the eigenvalues of M) such that U∗ MU = diag(d1 , . . . , dN ). For 1 ≤ k ≤ N,
define
Fk := ∑_{j=1}^{N} [U]jk Ej .
Then
∑_{k=1}^{N} F∗k Fk = ∑_{i,j=1}^{N} ( ∑_k [U]jk [U]∗ik ) E∗i Ej = ∑_{i,j} δji E∗i Ej = ∑_j E∗j Ej ≤ I,
∑_{k=1}^{N} Fk ρF∗k = ∑_{i,j=1}^{N} ( ∑_k [U]ik [U]∗jk ) Ei ρE∗j = ∑_{i,j} δij Ei ρE∗j = ∑_j Ej ρE∗j = E(ρ).
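The computation above says that remixing a set of Kraus operators by a unitary matrix yields the same channel. This is easy to confirm numerically for random operators (a sketch of mine, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_complex(shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

d, N = 2, 3
E = [rand_complex((d, d)) for _ in range(N)]        # arbitrary Kraus operators

# A random N x N unitary U (from a QR decomposition of a complex matrix).
U, _ = np.linalg.qr(rand_complex((N, N)))
# F_k := sum_j U_{jk} E_j, as in the proof.
F = [sum(U[j, k] * E[j] for j in range(N)) for k in range(N)]

A = rand_complex((d, d))
rho = A @ A.conj().T
rho /= np.trace(rho)                                # a random state

channel_E = sum(K @ rho @ K.conj().T for K in E)
channel_F = sum(K @ rho @ K.conj().T for K in F)
assert np.allclose(channel_E, channel_F)   # same channel, remixed Kraus operators
```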
Thus F1 , . . . , FN are also a set of Kraus operators for E. Now Equation (111) becomes, for all
1 ≤ k, ` ≤ N,
PF∗k F` P = ∑_{i,j=1}^{N} [U]∗ik [U]j` PE∗i Ej P = ∑_{i,j} [U∗ ]ki [M]ij [U]j` P = [U∗ MU]k` P = dk δk` P. (112)
In particular, setting ` = k and taking traces,
dk = tr(PF∗k Fk P)/ tr P = hFk P, Fk Pi/ tr P ≥ 0 .
This implies M ≥ 0. Also, for any state ρ in C, the probability that E actually occurs is given by
tr(E(ρ)) = tr(E(PρP)) = ∑_{k=1}^{N} tr(Fk PρPF∗k ) = ∑_k tr(PF∗k Fk Pρ) = ∑_k dk tr(Pρ) = ∑_k dk tr ρ = ∑_k dk = tr M .
Note that if dk = 0 for some k, then hFk P, Fk Pi = 0, and so Fk P = 0. This implies that if ρ is
any state in C, then Fk ρF∗k = Fk PρPF∗k = 0, and so this term is dropped from the operator-sum
expression for E(ρ). Since we only care about the behavior of E on states in C, we can effectively
ignore the cases where dk = 0 and assume instead that all the dk are positive.
By the Polar Decomposition (Theorem B.8 in Section B.3), for each 1 ≤ k ≤ N there is a unitary
Uk ∈ L(H) such that
Fk P = Uk |Fk P| = Uk √(PF∗k Fk P) = √dk Uk P. (113)
Uk rotates C to the subspace Ck that is the image of the projector Pk defined as
Pk := Uk PU∗k = Fk PU∗k / √dk . (114)
The crucial fact that makes E recoverable is that these Ck subspaces are mutually orthogonal:
Pk P` = 0 if k ≠ ` . (115)
To help see what’s going on, it’s worth seeing what happens when E is applied to some
pure state |ψihψ| with |ψi ∈ C. We have |ψi = P|ψi, and so by Equation (113) we have
E(|ψihψ|) = ∑_{k=1}^{N} Fk |ψihψ|F∗k = ∑_k Fk P|ψihψ|PF∗k = ∑_k dk Uk |ψihψ|U∗k = ∑_k dk |ψk ihψk |,
where |ψk i := Uk |ψi ∈ Ck for each k. So E(|ψihψ|) is a mixture of pure states |ψk i in the various
subspaces Ck . We can thus interpret E as mapping |ψi to |ψk i ∈ Ck with probability dk . Since the
Ck are mutually orthogonal, the |ψk i are pairwise orthogonal. To recover, we can first measure
to which Ck the state E(|ψihψ|) belongs. This projective measurement projects to one of the states
|ψk i = Uk |ψi, where k is the outcome of the measurement. Then to correct the error, we simply
apply U∗k to get U∗k |ψk i = |ψi.
Now we describe R formally. R consists of two stages: (1) measure the error syndrome
(i.e., “which Ck ?”), and (2) apply the appropriate (unitary) correction U∗k . By Equation (115),
the projectors P1 , . . . , PN form a set of orthogonal projectors. If this is not a complete set, i.e., if
∑_{k=1}^{N} Pk ≠ I, then we add one more projector PN+1 := I − ∑_{k=1}^{N} Pk to the set to make it complete.
Otherwise, we set PN+1 := 0. The syndrome measurement is then a projective measurement with
the Pk . (If the outcome is N + 1, which signifies “none of the above,” then we really don’t know
what to do, so we’ll give up and define UN+1 := I for completeness. If the state being measured is
the result of applying E to some state in the code space C, however, then outcome N + 1 will never
actually occur.)
So we define, for any σ ∈ L(H),
R(σ) := ∑_{k=1}^{N+1} U∗k Pk σPk Uk .
Thus R has Kraus operators U∗k Pk for 1 ≤ k ≤ N + 1. We first check that R is complete:
∑_{k=1}^{N+1} (U∗k Pk )∗ (U∗k Pk ) = ∑_{k=1}^{N+1} Pk Uk U∗k Pk = ∑_{k=1}^{N+1} Pk = I .
It remains to check that R successfully recovers from E for arbitrary states in C—not just pure
states. The following equation will make things easier: for all 1 ≤ k, ` ≤ N,
U∗k Pk F` P = U∗k Pk∗ F` P = U∗k Uk PF∗k F` P / √dk = PF∗k F` P / √dk = √dk δk` P , (116)
using Equations (112) and (114). Also, for 1 ≤ ` ≤ N, we have PN+1 P` = 0 by orthogonality, and
thus, using Equations (113) and (114),
U∗N+1 PN+1 F` P = PN+1 F` P = √d` PN+1 U` P = √d` PN+1 P` U` = 0 , (117)
and so Equation (116) holds for k = N + 1 as well.
So finally, if ρ is in C, we have, by Equations (116) and (117),
R(E(ρ)) = R(E(PρP)) = ∑_{k=1}^{N+1} ∑_{`=1}^{N} U∗k Pk F` PρPF∗` Pk Uk
= ∑_k ∑_` (U∗k Pk F` P) ρ (U∗k Pk F` P)∗
= ∑_k ∑_` (√dk δk` P) ρ (√dk δk` P)
= ( ∑_k ∑_` dk δk` ) PρP
= (tr M)ρ .
2
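The recovery construction in this proof can be exercised end to end on the three-qubit bit-flip code, where the Fk are already “diagonal” and the polar-decomposition unitaries come out as U0 = I and Uk = Xk . A numerical sketch under those assumptions (my own code, not from the notes):

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)

def op_on(A, k):
    """A acting on qubit k (0-indexed) of three qubits."""
    ops = [I2] * 3
    ops[k] = A
    return np.kron(np.kron(ops[0], ops[1]), ops[2])

p = 0.2
Id8 = np.eye(8, dtype=complex)
# Kraus operators F_k of the recoverable bit-flip channel, and the unitaries U_k
# from their polar decompositions: F_k P = sqrt(d_k) U_k P.
F = [(1 - p) ** 1.5 * Id8] + [(1 - p) * np.sqrt(p) * op_on(X, k) for k in range(3)]
U = [Id8] + [op_on(X, k) for k in range(3)]

P = np.zeros((8, 8), dtype=complex)
P[0, 0] = P[7, 7] = 1                          # code-space projector
Pk = [u @ P @ u.conj().T for u in U]           # syndrome projectors (Eq. 114)
assert np.allclose(sum(Pk), Id8)               # already complete, so P_{N+1} = 0

def E(rho):    # the recoverable part of the bit-flip channel
    return sum(K @ rho @ K.conj().T for K in F)

def R(sigma):  # the recovery channel constructed in the proof
    return sum(u.conj().T @ pk @ sigma @ pk @ u for u, pk in zip(U, Pk))

# A state in the code space: |psi> = 0.6|000> + 0.8|111>.
psi = np.zeros(8, dtype=complex)
psi[0], psi[7] = 0.6, 0.8
rho = np.outer(psi, psi.conj())

trM = (1 - p) ** 3 + 3 * (1 - p) ** 2 * p
assert np.allclose(R(E(rho)), trM * rho)       # (R ∘ E)(rho) = (tr M) rho
```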
Exercise 27.3 (Challenging) Recall the quantum bit-flip channel for a single qubit:
E(ρ) := (1 − p)ρ + pXρX.
Also recall the recoverable portion of the three-qubit bit-flip channel:
E 0 (ρ) = (1 − p)3 ρ + (1 − p)2 p ∑_{j=1}^{3} Xj ρXj .
Show directly that E 0 , with Kraus operators (1 − p)3/2 I, (1 − p)√p X1 , (1 − p)√p X2 , (1 − p)√p X3 ,
satisfies the peep condition (111) of Theorem 27.2, where C is the usual majority-of-3 code space
given by the projector P = |000ih000| + |111ih111|. What is the matrix M? What are the Pk and Uk ?
Is the R constructed by the Theorem the same as it was before?
Discretization of Errors. The great thing about the Shor code is that it can recover from an arbitrary
single-qubit error. There are many possible single-qubit errors, as there is a continuum of possible
one-qubit Kraus operators. Yet they are all corrected by the Shor code, with no additional work.
This happy fact follows from the following two general theorems:
Theorem 27.4 Suppose C ⊆ H is the code space for a quantum code, P is the projector projecting
orthogonally onto C, E ∈ T(H) is a not necessarily complete quantum error channel with Kraus operators
F1 , . . . , FN , and R ∈ T(H) is a quantum channel with Kraus operators R1 , . . . , RM such that, for any
1 ≤ j ≤ N there exists a scalar dj ≥ 0 such that
Rk Fj P = √dj δkj P (118)
for any 1 ≤ k ≤ M. Suppose also that G is an error channel whose Kraus operators G1 , . . . , GK are all
linear combinations of F1 , . . . , FN . Then R successfully recovers from G.
Proof. For all 1 ≤ ` ≤ K we have G` = ∑_{j=1}^{N} mj` Fj , for some scalars mj` . Using (118), we get
Rk G` P = ∑_{j=1}^{N} mj` Rk Fj P = ∑_j mj` √dj δkj P = mk` √dk P ,
and so, for any ρ in C,
R(G(ρ)) = R(G(PρP)) = ∑_{k=1}^{M} ∑_{`=1}^{K} (Rk G` P)ρ(Rk G` P)∗ = ∑_k ∑_` |mk` |2 dk PρP = cρ,
where c := ∑_{k=1}^{M} ∑_{`=1}^{K} |mk` |2 dk . Thus R successfully recovers from G given code space C. 2
k=1
Theorem 27.5 Suppose C ⊆ H is the code space for a quantum code, P is the projector projecting
orthogonally onto C, E ∈ T(H) is a not necessarily complete quantum error channel with Kraus operators
E1 , . . . , EN that satisfy the peep condition (111), i.e.,
PE∗i Ej P = [M]ij P
for all 1 ≤ i, j ≤ N, for some matrix M. Suppose also that G is an error channel whose Kraus operators
G1 , . . . , GK are all linear combinations of E1 , . . . , EN . Then the channel R constructed in the proof of
Theorem 27.2 to recover from E also successfully recovers from G, given code space C.
Proof. In the proof of Theorem 27.2 above, we chose new Kraus operators F1 , . . . , FN for E, where
Fk := ∑_{j=1}^{N} [U]jk Ej for all 1 ≤ k ≤ N, and where U is an N × N unitary matrix that diagonalizes M so
that there are real numbers d1 , . . . , dN ≥ 0 such that PF∗k F` P = dk δk` P for all 1 ≤ k, ` ≤ N. The Fk
are clearly linear combinations of the Ej , but the Ej are also linear combinations of the Fk ; indeed,
it is easily checked that Ej = ∑_{k=1}^{N} [U]∗jk Fk , using the unitarity of U. Thus the G` , being linear
combinations of the Ej , are linear combinations of the Fk as well.
The R we constructed in the proof of Theorem 27.2 has Kraus operators U∗k Pk for 1 ≤ k ≤ N + 1.
Setting Rk := U∗k Pk for all 1 ≤ k ≤ N + 1, Equations (116) and (117) say that
Rk Fj P = √dj δkj P
for all 1 ≤ k ≤ N + 1 and all 1 ≤ j ≤ N. This is exactly the discretization condition of Equation (118)
(with M = N + 1). Therefore, G, the Rk , and the Fj together satisfy the hypotheses of Theorem 27.4,
and so R successfully recovers from G by that theorem. 2
We can apply either Theorem 27.4 or Theorem 27.5 to the Shor code to show that Bob’s recovery
procedure can correct any single-qubit error. The key point is that the four Pauli operators I, X, Y, Z
form a basis for the space of all single-qubit operators, and so any single-qubit error channel has
Kraus operators that are linear combinations of the Pauli operators. Since Bob can recover from
any error of the form Xj , Yj , or Zj , for 1 ≤ j ≤ 9, in a way that satisfies Theorem 27.4, he can recover
from any linear combination of these—in particular, any error on any one of the nine qubits.
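The key point above, that I, X, Y, Z form a basis for all one-qubit operators, can be checked by expanding a random operator with the Hilbert–Schmidt coefficients tr(σA)/2 (a sketch of mine; the normalization 1/2 comes from tr(σ2 ) = 2 for each Pauli operator):

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = (I, X, Y, Z)

rng = np.random.default_rng(1)
# An arbitrary one-qubit operator (not even Hermitean).
A = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))

# Expansion coefficients in the Pauli basis: c_sigma = tr(sigma† A)/2.
coeffs = [np.trace(sigma.conj().T @ A) / 2 for sigma in paulis]
recon = sum(c * sigma for c, sigma in zip(coeffs, paulis))
assert np.allclose(recon, A)   # I, X, Y, Z span all 2x2 operators
```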
Exercise 27.6 (Challenging) Show that Bob’s recovery channel for the Shor code can recover from
any error on any one of the nine qubits. [Hint: By the preceding discussion, it only remains to
show that Bob’s recovery procedure satisfies the discretization condition of Equation (118) for the
recoverable portion of the depolarizing channel given by Equation (110).]
Figure 17: Implementing the C-NOT gate fault-tolerantly using the Shor code. The double slashes
on the left indicate that each line represents a multi-qubit register (nine qubits in this case). The
circuit maps |aS i|bS i to |aS i|(a ⊕ b)S i for all a, b ∈ {0, 1}.
Fault-Tolerant Quantum Computation. If a qubit is in an encoded state, such as with the Shor
code, then we can repeatedly apply an error-recovery operation to “restore the logic,” i.e., the
state of the logical qubit, assuming isolated errors in the physical qubits. Depending on the
implementation and frequency of the restore operations, we can maintain a logical qubit state
indefinitely with high probability. There is more to a quantum computation, however, than simply
maintaining qubits. We must apply quantum gates to them. A not-so-good way to apply a
quantum gate is to decode each qubit involved in the gate, then apply the gate on the unencoded
qubits, then re-encode the qubits. This is bad because qubits spend time unencoded and subject
to unrecoverable errors, defeating the whole purpose of error correction. A better way is to keep
all qubits in an encoded state always, never decoding them, so that we prepare, work with, save,
and measure qubits in their encoded states only. This practice is called fault-tolerant quantum
computation, and it works by replacing each gate of a standard, non-fault-tolerant quantum circuit
with a quantum mini-circuit that affects the state of the logical qubits in the same way the original
gate affects the state of its unencoded qubits.
With the Shor code as well as other quantum error-correcting codes, we can implement several
types of quantum gates fault-tolerantly. It can be shown that these codes can implement a family of
gates big enough to provide a basis for any feasible quantum computation (a so-called “universal”
family of gates). We will not do an exhaustive treatment here, but will at least show how to
implement the C-NOT and Pauli gates explicitly using the Shor code.
Figure 17 shows how to implement the C-NOT gate fault-tolerantly using the Shor code. Each
logical qubit is implemented by nine physical qubits.
Exercise 28.1 Verify that the circuit in Figure 17 really implements the C-NOT gate with respect to
the Shor code. That is, show that the circuit maps |aS i|bS i to |aS i|(a ⊕ b)S i for all a, b ∈ {0, 1}.
29 Week 15: Stabilizers, Entanglement, and Bell inequalities
29.1 Stabilizers
The stabilizer formalism gives us two things: (1) a convenient way of describing some quantum
states and some of the gates that act on them; and (2) a large family of quantum error-correcting
codes that are efficient and allow easy recovery from errors. The Shor code is an example of a
stabilizer code. We will see others.
The Pauli Group. For ease of reference, here we recall the four 1-qubit Pauli operators (matrices
are with respect to the computational basis):
I = [1 0; 0 1] , X = [0 1; 1 0] , Y = [0 −i; i 0] , Z = [1 0; 0 −1] ,
with matrix rows separated by semicolons.
The n-qubit Pauli group Πn is the set of all operators of the form g = α(σ1 ⊗ · · · ⊗ σn ), where
α ∈ {1, −1, i, −i} and each σj ∈ {I, X, Y, Z}. We call α =: coeff(g) the coefficient of g and
σ1 ⊗ · · · ⊗ σn =: princ(g) the principal part of g. If g = α(σ1 ⊗ · · · ⊗ σn ) and h = β(τ1 ⊗ · · · ⊗ τn )
are any two elements of Πn , then
gh = αβ(σ1 τ1 ⊗ · · · ⊗ σn τn )
and
g−1 = g∗ = α∗ (σ1 ⊗ · · · ⊗ σn ) .
Notice that, because of the commutation properties of the Pauli operators, g and h either commute
or anticommute, that is, either gh = hg or gh = −hg. The latter occurs just when there are an odd
number of positions with anticommuting Pauli components in g and h.
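A small numerical check of this commute-or-anticommute rule, using the principal parts XZY and YIZ from the worked example below; the parity-counting helper is my own:

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = {'I': I, 'X': X, 'Y': Y, 'Z': Z}

def tensor(names):
    """Tensor product of Pauli operators named by a string like 'XZY'."""
    out = np.array([[1]], dtype=complex)
    for nm in names:
        out = np.kron(out, paulis[nm])
    return out

def anticommuting_positions(g, h):
    """Count positions where the Pauli components of g and h anticommute."""
    return sum(1 for a, b in zip(g, h) if a != 'I' and b != 'I' and a != b)

g, h = 'XZY', 'YIZ'
G, H = tensor(g), tensor(h)
# gh = ±hg, with the sign set by the parity of the anticommuting positions.
sign = -1 if anticommuting_positions(g, h) % 2 else +1
assert np.allclose(G @ H, sign * H @ G)
```

Here two positions anticommute, so the sign is +1 and g and h commute, in agreement with the example below.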
A subgroup of Πn is any subset of Πn which is itself a group. If g1 , . . . , gk are elements of Πn ,
then the subgroup of Πn generated by S = {g1 , . . . , gk } is the smallest subgroup of Πn that includes
S. It is the closure of S∪{1} under multiplication and inverses, and we denote it hSi or hg1 , . . . , gk i.29
We say that S is a minimal set of generators if no proper subset of S generates the same subgroup.
29 The angle bracket notation h· · ·i we use here has nothing to do with the Hermitean inner product on a Hilbert
space. We have no occasion to use the latter meaning anywhere in this section.
Here is an example. Let n := 3 and let g := −i(X ⊗ Z ⊗ Y) = −iX1 Z2 Y3 and h := −(Y ⊗ I ⊗ Z) =
−Y1 Z3 . Then
coeff(g) = −i ,
princ(g) = X ⊗ Z ⊗ Y = X1 Z2 Y3 ,
coeff(h) = −1 ,
princ(h) = Y ⊗ I ⊗ Z = Y1 Z3 ,
gh = −i(Z ⊗ Z ⊗ X) = −iZ1 Z2 X3 ,
hg = −i(Z ⊗ Z ⊗ X) = −iZ1 Z2 X3 = gh ,
g2 = −1 ,
h2 = 1 .
Exercise 29.1 Redo the example above, this time where n = 4 and g = I ⊗ Z ⊗ Z ⊗ X and
h = i(Y ⊗ Y ⊗ X ⊗ Y).
Stabilizing Subgroups. There are subgroups of Πn that are of particular interest to us.
Definition 29.2 Let G be a subgroup of Πn . We will say that G is a stabilizing subgroup iff −1 ∉ G.
Exercise 29.4 Prove Lemma 29.3. [Hint: The last item follows easily from the second item and the
fact that any two elements of Πn either commute or anticommute.]
Exercise 29.5 List the elements of hg1 , g2 i, where g1 = X ⊗ Z and g2 = Z ⊗ X. Is hg1 , g2 i a stabilizing
subgroup of Π2 ? Explain.
Exercise 29.6 Let G be a stabilizing subgroup of Πn with a k-element minimal generating set
S = {g1 , . . . , gk }. Show that G has exactly 2k many elements, each one obtained by multiplying
together the elements of a different subset of S. That is,
G = { ∏_{g∈J} g : J ⊆ S } , (119)
and each choice of J yields a distinct product. [Hint: Use items 2 and 4 of Lemma 29.3.]
30 A group with this property is called commutative or abelian.
The previous exercise shows that any minimal generating set for G has size k. We will call k
the dimension of G.
Stabilizing Subgroups Acting on H. Fix n ≥ 1 as before, and let H = C^(2^n) be the n-qubit Hilbert
space with the usual computational basis. Notice that each element of Πn is an operator in L(H).
Let E ⊆ H be any nontrivial (i.e., positive-dimensional) subspace of H. The stabilizer of E in Πn ,
written Stab(E), is the set of all g ∈ Πn that fix E pointwise, that is, the set of all g ∈ Πn such that
gv = v for all v ∈ E. Stab(E) is a subgroup of Πn , and in fact it must be a stabilizing subgroup,
because −1 ∈ Πn maps any v to −v, and so does not fix anything except the zero vector.
Conversely, given a stabilizing subgroup G of Πn , we let EG be the subspace of H stabilized
by G, that is,
EG := {v ∈ H : (∀g ∈ G)[ gv = v ]} .
(If G is not stabilizing, then EG is evidently the trivial space {0}.) Clearly, G ⊆ Stab(EG ). We will
see shortly (Proposition 29.14, below) that the two groups are the same.
We can recast all this in terms of eigenvectors, eigenvalues, and projectors. If g ∈ Πn and
coeff(g) ∈ {1, −1}, then g2 = 1. This means that the only two eigenvalues of g are +1 and −1.
Obviously, the identity element 1 ∈ Πn only has eigenvalue +1. If g ≠ ±1, however, then g
has both eigenvalues, and in fact, dim E+1 (g) = dim E−1 (g) = 2n−1 . This is because, first of all,
princ(g) must have a Pauli operator σ ≠ I in at least one position, and since tr σ = 0 we then
have tr g = 0 (see the last item of Proposition 10.1). Secondly, tr g is the sum of g’s eigenvalues
with multiplicity. Thus +1 and −1 occur with the same multiplicity, which is then the common
dimension of the two eigenspaces of g. The projectors onto these two eigenspaces are seen to be
Pg± := (1 ± g)/2 (120)
(here as well, 1 ∈ Πn is the n-qubit identity operator).
Notice, by the way, that these projectors are sums over hgi = {1, g} and h−gi = {1, −g}, which
are both two-element (stabilizing) subgroups of Πn . More generally:
Lemma 29.7 Let G be a k-dimensional stabilizing subgroup of Πn , and let {g1 , . . . , gk } be a minimal
generating set for G. Then dim(EG ) = 2n−k , and the operator
PG := 2−k ∑_{g∈G} g (121)
is the orthogonal projector onto EG .
Proof. Write P := PG . First notice that, for any g ∈ G,
gP = g ( 2−k ∑_{h∈G} h ) = 2−k ∑_{h∈G} gh = 2−k ∑_{h∈G} h = P , (122)
the third equality following from the fact that for fixed g ∈ G, gh runs through the elements of G
as h runs through the elements of G.
P is Hermitean because all elements of G are Hermitean. Next, we check that P projects onto
EG by fixing every vector in EG and mapping every vector in H into EG (whence it follows that
P2 = P). For any v ∈ EG , we have
Pv = 2−k ∑_{g∈G} gv = 2−k ∑_{g∈G} v = v .
Next, for any w ∈ H and any g ∈ G, Equation (122) gives g(Pw) = (gP)w = Pw, and so Pw ∈ EG .
Also, tr P = 2−k ∑_{g∈G} tr g = 2−k tr 1 ,
because all the other elements of G besides 1 have zero trace. We have tr 1 = tr(I⊗n ) = 2n , and so
finally,
dim(EG ) = tr P = 2n−k .
2
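A numerical sketch of Lemma 29.7 for the stabilizing subgroup G = hZ1 Z2 , Z2 Z3 i of Π3 (my own example, not from the notes): PG is a Hermitean projector of rank 2^(n−k) = 2, and it fixes |000i and |111i.

```python
import numpy as np
from itertools import product

I = np.eye(2, dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def tensor(ops):
    out = np.array([[1]], dtype=complex)
    for A in ops:
        out = np.kron(out, A)
    return out

# G = <Z1 Z2, Z2 Z3>, a 2-dimensional stabilizing subgroup of Pi_3.
g1 = tensor([Z, Z, I])
g2 = tensor([I, Z, Z])

# All 2^k products over subsets of the generators (cf. Exercise 29.6).
group = [np.linalg.matrix_power(g1, a) @ np.linalg.matrix_power(g2, b)
         for a, b in product((0, 1), repeat=2)]

P = sum(group) / len(group)                        # P_G of Equation (121)
assert np.allclose(P @ P, P)                       # a projector...
assert np.allclose(P, P.conj().T)                  # ...that is Hermitean
assert np.isclose(np.trace(P).real, 2 ** (3 - 2))  # dim E_G = 2^{n-k} = 2

# E_G is spanned by |000> and |111>: P fixes both.
e000 = np.zeros(8); e000[0] = 1
e111 = np.zeros(8); e111[7] = 1
assert np.allclose(P @ e000, e000) and np.allclose(P @ e111, e111)
```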
The next exercise shows that we can also characterize EG directly in terms of a generating set
for G, using the Pg+ projectors.
Exercise 29.8 Let G be a k-dimensional stabilizing subgroup of Πn , and let {g1 , . . . , gk } be a minimal
generating set for G, as in the previous lemma. Show that PG = Pg+1 Pg+2 · · · Pg+k , where the Pg+i are
defined by Equation (120). [Hint: Expand the right-hand side and use Exercise 29.6.]
Let’s look at some more examples. Suppose n = 4 and G = hZ1 , Z2 , Z3 , Z4 i. This is a minimal
generating set for G with four elements, so dim(EG ) = 24−4 = 1. (Oh, and G is stabilizing.) What
is a vector in EG ? We have Z|0i = |0i, so |0000i ∈ EG . Thus EG is the 1-dimensional subspace
spanned by |0000i. Now suppose G = hZ1 , Z2 , −Z3 , Z4 i. Then you should check that EG is spanned
by |0010i. Can you generalize these observations?
Exercise 29.9 For any b1 , . . . , bn ∈ {0, 1}, let G := h(−1)b1 Z1 , (−1)b2 Z2 , . . . , (−1)bn Zn i. Show that
EG is the 1-dimensional subspace spanned by |b1 b2 · · · bn i.
Now suppose G = hX1 , X2 , X3 , X4 i. Since X|+i = |+i, we get that EG is the 1-dimensional space
spanned by |+i⊗4 .
Exercise 29.10 Suppose G = hX1 , X2 , −X3 , X4 i. Make a guess about EG and verify that your guess
is correct.
Exercise 29.11 Determine EG for each of the following stabilizing subgroups G of Π4 :
• G = hZ1 , Z2 , X3 , X4 i.
• G = hY1 , Y2 , Y3 , Y4 i.
• G = h−Z1 , Z1 Z2 , −Z1 Z2 Z3 , Z1 Z2 Z3 Z4 i.
Exercise 29.12 The last exercise suggests that the minimal generating set of a stabilizing G is not
unique (although they all have the same size). Find an alternate minimal generating set for the
group h−Z1 , Z1 Z2 , −Z1 Z2 Z3 , Z1 Z2 Z3 Z4 i of the last exercise.
Corollary 29.13 If G is a stabilizing subgroup of Πn , then dim(EG ) > 0 and G has dimension at most n.
Proposition 29.14 For any stabilizing subgroup G of Πn ,
G = Stab(EG ) .
Proof. We noticed before that G ⊆ Stab(EG ). For the reverse inclusion, let H := Stab(EG ). Then H
is stabilizing and EG ⊆ EH (since H fixes all of EG at least). Thus dim(EH ) ≥ dim(EG ). Let k and `
be the dimensions of G and H, respectively. We have dim(EG ) = 2n−k and dim(EH ) = 2n−` , and
hence ` ≤ k. But then G has cardinality 2k and H has cardinality 2` ≤ 2k , so we must have k = `
and G = H. 2
Not every subspace of H is of the form EG for some stabilizing G. There are infinitely many
subspaces of H but only finitely many stabilizing subgroups of Πn . For almost all subspaces E, we
have Stab(E) = {1}, but E{1} = H. The spaces of the form EG are particularly nice. For one thing,
we will use them as the code spaces for stabilizer error-correcting codes (see the topic, Stabilizer
Codes, below).
Given a stabilizing subgroup G ⊆ Πn and some h ∈ Πn that anticommutes with at least one
element of G, we end this topic with some results about how h “splits G in half.”
Lemma 29.15 Let G be a k-dimensional stabilizing subgroup of Πn and let h ∈ Πn be arbitrary. Let
C := {g ∈ G | gh = hg}. Then C is a subgroup of G, and if C ≠ G, then C has exactly 2k−1 elements (that
is, exactly half the elements of G).
Corollary 29.16 Let G, k, and h be as in Lemma 29.15, and assume that h anticommutes with at least one
element of G. Then PG hPG = 0.
Proof. Let C := {g ∈ G | gh = hg} as in the lemma, which implies that, since C ≠ G by assumption,
C and G − C have the same number of elements. Setting P := P_G and using Equation (122), we
compute
\begin{align*}
PhP &= 2^{-k}\sum_{g\in G} ghP
= 2^{-k}\Big(\sum_{g\in C} ghP + \sum_{g\in G-C} ghP\Big)
= 2^{-k}\Big(\sum_{g\in C} hgP - \sum_{g\in G-C} hgP\Big)\\
&= 2^{-k}\,h\Big(\sum_{g\in C} gP - \sum_{g\in G-C} gP\Big)
= 2^{-k}\,h\Big(\sum_{g\in C} P - \sum_{g\in G-C} P\Big) = 0\,. \qquad\Box
\end{align*}
The next lemma will be used to prove the Gottesman-Knill theorem, below.
Lemma 29.17 Let G, k, h, and C be as in Lemma 29.15, and suppose that coeff (h) = ±1 and neither h
nor −h is in G. Let hC := {hg | g ∈ C}. Then C ∪ hC is a stabilizing subgroup of Πn whose dimension is
k + 1 if C = G and k otherwise.
Proof. Let H := C ∪ hC. It is routine to check that H is a subgroup of Πn (using the fact that h² = 1)
and that C and hC have the same number of elements. Furthermore, C ∩ hC = ∅, for otherwise
there are g1, g2 ∈ C such that g1 = hg2, but then h = g1g2 ∈ G; contradiction.³¹ We conclude that
H is twice as big as C. If C = G, then H has 2^{k+1} many elements. Otherwise, C has 2^{k−1} many
elements by Lemma 29.15, whence H has exactly 2^k many elements.
Finally, we show that H is stabilizing, and thus has the appropriate dimension given its size.
We already have −1 ∉ C, since G is stabilizing, but we cannot have hg = −1 for any g ∈ C either;
otherwise, multiplying both sides by h gives −h = g ∈ G; contradiction. Therefore, −1 ∉ H. □
Connection to Linear Algebra Over Z2. There is an illuminating way of describing the principal
part of an element g ∈ Πn as a 2n-dimensional row vector over the 2-element field Z2. We
define two maps ϕx, ϕz : Πn → Z2^n as follows: if g = α(σ1 ⊗ ⋯ ⊗ σn), where α ∈ {±1, ±i} and
σi ∈ {I, X, Y, Z} for 1 ≤ i ≤ n, then define
\[
\varphi_x(g) := x_1 x_2 \cdots x_n \in \mathbb{Z}_2^n,
\qquad
\varphi_z(g) := z_1 z_2 \cdots z_n \in \mathbb{Z}_2^n,
\]
where for all 1 ≤ i ≤ n,
\[
x_i := \begin{cases} 1 & \text{if } \sigma_i \in \{X, Y\},\\ 0 & \text{if } \sigma_i \in \{I, Z\}, \end{cases}
\qquad
z_i := \begin{cases} 1 & \text{if } \sigma_i \in \{Z, Y\},\\ 0 & \text{if } \sigma_i \in \{I, X\}. \end{cases}
\]
Observe that ϕx (g) and ϕz (g) together uniquely determine princ(g) but ignore coeff(g) completely.
Most importantly, one should verify that for any g, h ∈ Πn ,
ϕx (gh) = ϕx (g) + ϕx (h) ,
ϕz (gh) = ϕz (g) + ϕz (h) ,
31 This is a basic result of group theory—cosets of a finite subgroup are all the same size and pairwise disjoint.
where the right-hand operations are both vector addition modulo 2.³² Now define
\[
\varphi(g) := \varphi_x(g)\,\varphi_z(g),
\]
the 2n-dimensional row vector obtained by concatenating ϕx(g) with ϕz(g).³³ For any g, h ∈ Πn,
we have ϕ(g) = ϕ(h) if and only if princ(g) = princ(h), and
\[
\varphi(gh) = \varphi(g) + \varphi(h)
\]
as well.
If G is a stabilizing subgroup of Πn, then ϕ is one-to-one when restricted to G (this follows
from Lemma 29.3(3)). For any g1, …, gk ∈ G, we can form the k × 2n matrix
\[
M := \begin{bmatrix} \varphi(g_1)\\ \varphi(g_2)\\ \vdots\\ \varphi(g_k) \end{bmatrix}
= \begin{bmatrix} \varphi_x(g_1) & \varphi_z(g_1)\\ \varphi_x(g_2) & \varphi_z(g_2)\\ \vdots & \vdots\\ \varphi_x(g_k) & \varphi_z(g_k) \end{bmatrix}
\]
whose rows are the vectors ϕ(gi) for 1 ≤ i ≤ k. Now assume G = ⟨g1, …, gk⟩. Then the vectors
ϕ(g) for g ∈ G are exactly the linear combinations (over Z2) of the rows of M. Furthermore,
{g1, …, gk} is a minimal generating set if and only if the rows of M are linearly independent over
Z2 (see Lemma 29.21 below). More generally, the dimension of G is equal to the rank of M.
The map ϕ can also easily tell us whether two given elements g, h ∈ Πn commute or anticommute.
We define the following inner product³⁴ of the row vectors ϕ(g) and ϕ(h):
\[
\varphi(g)\cdot\varphi(h) := \varphi_x(g)\cdot\varphi_z(h) + \varphi_z(g)\cdot\varphi_x(h) \pmod 2 .
\]
Then g and h commute if ϕ(g)·ϕ(h) = 0 and anticommute if ϕ(g)·ϕ(h) = 1.
Exercise 29.18 For the g and h of Exercise 29.1, give ϕ(g) and ϕ(h) as well as ϕ(g) · ϕ(h). Do the
same for the g and h of the example immediately preceding Exercise 29.1.
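The maps ϕx, ϕz and the commutation test are easy to implement directly. Here is a minimal Python sketch; it assumes the inner product above is the standard symplectic form, which is 1 exactly when the two Paulis anticommute.

```python
# Sketch: phi_x, phi_z, and the symplectic inner product over Z2.
# A Pauli principal part is a string over I, X, Y, Z, e.g. "XZY".

def phi(pauli):
    """phi(g): the concatenation phi_x(g) phi_z(g), as a list of bits."""
    x = [1 if s in "XY" else 0 for s in pauli]  # phi_x
    z = [1 if s in "ZY" else 0 for s in pauli]  # phi_z
    return x + z

def symplectic(u, v):
    """phi(g).phi(h) = phi_x(g).phi_z(h) + phi_z(g).phi_x(h)  (mod 2)."""
    n = len(u) // 2
    dot = sum(a * b for a, b in zip(u[:n], v[n:]))   # phi_x(g) . phi_z(h)
    dot += sum(a * b for a, b in zip(u[n:], v[:n]))  # phi_z(g) . phi_x(h)
    return dot % 2

print(symplectic(phi("XI"), phi("ZI")))  # 1: X and Z on the same qubit anticommute
print(symplectic(phi("XI"), phi("IZ")))  # 0: on different qubits they commute
```

Note that `phi` ignores the coefficient of g entirely, matching the observation above that ϕx and ϕz determine only princ(g).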
The next lemma gives a linear algebraic way to determine if given elements of Πn form a
minimal generating set for a stabilizing subgroup. The linear algebra is over Z2 .
32 Any map that preserves operations in this way is known as a group homomorphism.
33 This vector is sometimes written as ϕx(g) ⊕ ϕz(g), but we avoid that notation here as it may be confusing.
34 This is an example of what is known as a symplectic inner product.
Lemma 29.21 Let g1, …, gk ∈ Πn be arbitrary. Then {g1, …, gk} is a minimal generating set for a
stabilizing subgroup of Πn if and only if all three of the following hold:
1. g_i² = 1 for all 1 ≤ i ≤ k;
2. g_i g_j = g_j g_i for all 1 ≤ i, j ≤ k;
3. the vectors ϕ(g1), …, ϕ(gk) are linearly independent over Z2.
Proof. The forward direction comes immediately from Lemma 29.3, except for the third item,
which follows from Exercise 29.6: any linear dependence of the ϕ(gi ) would correspond to a
nonempty product of the gi equalling 1, contradicting the minimality of the generating set.
For the reverse direction, assume all three conditions hold. Let G := hg1 , . . . , gk i. To show that
G is stabilizing, suppose that −1 ∈ G. By commutativity (the second condition) and the fact that
g2i = 1 for all 1 6 i 6 k (the first condition), we must be able to write
\[
-1 = g_1^{e_1} \cdots g_k^{e_k},
\]
for some e1, …, ek ∈ {0, 1}. But then,
\[
0 = 0 \cdots 0 = \varphi(-1) = \varphi(g_1^{e_1} \cdots g_k^{e_k}) = e_1\varphi(g_1) + \cdots + e_k\varphi(g_k).
\]
So by the linear independence of {ϕ(g1), …, ϕ(gk)} (the third condition), we must have e1 = ⋯ =
ek = 0; but then, g_1^0 ⋯ g_k^0 = 1 ≠ −1. Contradiction. Thus G is stabilizing.
Finally, if {g1 , . . . , gk } were not minimal, then we could write some gj as a product of the other
gi , but then ϕ(gj ) is a linear combination of the other ϕ(gi ), contradicting linear independence. 2
Stabilizer Circuits and the Gottesman-Knill Theorem. Our first application of stabilizers is
to show the Gottesman-Knill theorem, which says that quantum circuits employing only H, S,
and C-NOT gates can be simulated efficiently (i.e., in polynomial time) on a classical computer.
We call these circuits stabilizer circuits. Initial states must be computational basis states, and all
measurements are computational basis measurements.35
As before, we let H := (C²)^{⊗n} be the n-qubit Hilbert space with the usual computational basis.
The whole idea is to keep track of the quantum state ρ = |ψihψ| inside an n-qubit circuit, not as
a superposition of basis states as we have been doing, but rather as a set of generators of an n-
dimensional stabilizing subgroup of Πn that stabilizes |ψi. When some gate U is applied, mapping
the state |ψi to state |ψ 0 i = U|ψi, we can easily update our generating set to that of a new subgroup
that stabilizes |ψ 0 i.
The gates H (Hadamard gate) and S (phase gate) applied to any qubit and the C-NOT gate
applied to any pair of qubits have a special property that makes the above possible: they normalize
Πn . A unitary operator U ∈ L(H) is said to normalize36 Πn iff, for any unitary operator g ∈ L(H),
g is in Πn if and only if UgU∗ ∈ Πn . The unitary operators that normalize Πn themselves form a
group, called the n-qubit Clifford group Cn . One can show that Cn is generated (up to an arbitrary
global phase) by the three types of operators mentioned above: H, S, and C-NOT, which are
sometimes called Clifford gates.
35 Improvements and generalizations to this theorem were made in a subsequent paper by Aaronson and Gottesman.
36 This term comes from group theory and has nothing to do with making a vector have unit length.
Exercise 29.22 The n-qubit Pauli group is clearly a subgroup of the n-qubit Clifford group, so we
can allow Pauli gates in a stabilizer circuit “for free.”
2. Show how the three Pauli operators X, Y, and Z (not necessarily in that order) can be written
as products of H and S gates only. [Hint: It may help to picture how these gates rotate the
Bloch sphere.]
We describe the classical simulation of a stabilizer circuit in three steps: (1) representing the
initial quantum state before the circuit is applied; (2) showing how to update this representation
as each Clifford gate of the circuit is applied; and (3) computing outcome probabilities and the
post-measurement state of a 1-qubit measurement in the computational basis. We will take these
in order, but first recall the projector P = PG of Equation (121) for an arbitrary stabilizing subgroup
G. If G has dimension n, then PG projects onto a subspace of dimension 2n−n = 1, in which case,
PG is a pure state that we can represent by a minimal generating set {g1 , . . . , gn } of G. This is how
we will represent states as the computation proceeds.
We assume the initial state being fed to the circuit is some computational basis state |ϕ0⟩ :=
|b1b2 ⋯ bn⟩, where each bj ∈ {0, 1}. In Exercise 29.9, you effectively showed that |ϕ0⟩⟨ϕ0| = P_G,
where G = ⟨(−1)^{b1}Z1, (−1)^{b2}Z2, …, (−1)^{bn}Zn⟩. So this is our representation of the initial basis
state of the circuit.
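In code, this initial representation is immediate. A small sketch, writing each generator as a (sign, string) pair (a representation chosen here for illustration):

```python
# Generators (-1)^{b_j} Z_j stabilizing the basis state |b1 b2 ... bn>.
# Each generator is a (sign, string) pair, e.g. (-1, "ZII") for -Z1.

def initial_generators(bits):
    n = len(bits)
    return [(-1 if b else 1, "I" * j + "Z" + "I" * (n - j - 1))
            for j, b in enumerate(bits)]

print(initial_generators([1, 1, 0]))
# [(-1, 'ZII'), (-1, 'IZI'), (1, 'IIZ')]  -- i.e., -Z1, -Z2, Z3 for |110>
```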
Now suppose that at some stage in the circuit's application the state is ρ = P_G for some
stabilizing G = ⟨g1, …, gn⟩ just before some Clifford gate U is applied. We claim that immediately
after U is applied, the new state ρ′ = UρU∗ is equal to P_{G′}, where
\[
G' := UGU^* = \{\,UgU^* \mid g \in G\,\}.
\]
To see the claim, first note that each UgjU∗ is in Πn for 1 ≤ j ≤ n, because U is a Clifford gate.
Next, notice that when multiplying terms of the form UgjU∗ together, the inner U's cancel, e.g.,
(Ug1U∗)(Ug2U∗) = Ug1g2U∗. This fact helps one to see that G′ = ⟨Ug1U∗, …, UgnU∗⟩. Next, we
can see that G′ must be a stabilizing subgroup of Πn, for otherwise, −1 ∈ G′, and it follows that
−1 = UgU∗ for some g ∈ G, whence g = U∗(−1)U = −1 ∈ G,
contradicting the fact that G is stabilizing. Now G′ evidently has the same number of elements
as G, which is 2^n, and so G′ has dimension n. Finally, let |ψ⟩ be any unit vector in E_G (and thus
ρ = |ψ⟩⟨ψ|) and let g be any element of G. Then
\[
(UgU^*)\,U|\psi\rangle = Ug|\psi\rangle = U|\psi\rangle.
\]
That is, U|ψ⟩ is fixed by UgU∗. Since any element g′ ∈ G′ can be written in this form, we get that
U|ψ⟩ ∈ E_{G′}. Thus ρ′ = U|ψ⟩⟨ψ|U∗ projects onto E_{G′} and so is equal to P_{G′}.
Having established the claim, it remains to compute UgjU∗ given gj, for 1 ≤ j ≤ n. This is
easy, given the limited choices for U. For example, if U = H1 and gj = Z ⊗ ⋯, then
\[
Ug_jU^* = HZH \otimes \cdots = X \otimes \cdots,
\]
where the omitted Pauli operators remain unchanged. This example generalizes to H, S, and
C-NOT acting on any of the qubits. The table below gives the results of the three Clifford gates
U conjugating Pauli gates σ. Here, 1 ≤ i, j ≤ n and i ≠ j, and we only include the cases where
UσU∗ ≠ σ. We could have omitted the Y-gates from the second column of the table because
Y = iXZ, and so conjugating Y is the same as conjugating Z followed by conjugating X and
inserting the global phase shift i = e^{iπ/2} (that is, UYU∗ = i(UXU∗)(UZU∗)). Recall that C-NOTi,j
has qubit i as the control and qubit j as the target.
U           σ     UσU∗
Hi          Xi    Zi
            Yi    −Yi
            Zi    Xi
Si          Xi    Yi
            Yi    −Xi
C-NOTi,j    Xi    XiXj
            Yi    YiXj
            Yj    ZiYj
            Zj    ZiZj
Exercise 29.23 Extend the table above to include entries for U = Xi , U = Yi , and U = Zi .
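The table translates directly into updates on the (ϕx, ϕz, sign) representation of a generator. Below is a sketch in that binary form. The H and S rules follow from the table; the C-NOT sign rule is taken from Aaronson and Gottesman's simulation paper (an assumption here, not derived above), and it reproduces the table's entries.

```python
# Conjugation of a Pauli, stored as bit lists x = phi_x, z = phi_z plus a
# sign bit r (r = 1 means coefficient -1), by the gates H, S, and C-NOT.

def conj_h(x, z, r, i):
    r ^= x[i] & z[i]            # Y -> -Y picks up a sign
    x[i], z[i] = z[i], x[i]     # X <-> Z
    return r

def conj_s(x, z, r, i):
    r ^= x[i] & z[i]            # Y -> -X picks up a sign
    z[i] ^= x[i]                # X -> Y
    return r

def conj_cnot(x, z, r, i, j):   # control i, target j
    # Sign rule from Aaronson-Gottesman's stabilizer-simulation paper.
    r ^= x[i] & z[j] & (x[j] ^ z[i] ^ 1)
    x[j] ^= x[i]                # X_i -> X_i X_j
    z[i] ^= z[j]                # Z_j -> Z_i Z_j
    return r

# Example: H conjugates X into Z with no sign change.
x, z, r = [1], [0], 0
r = conj_h(x, z, r, 0)
print(x, z, r)  # [0] [1] 0  -- i.e., +Z
```

Iterating these updates over all n generators after each gate is exactly the polynomial-time bookkeeping the Gottesman-Knill theorem relies on.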
Exercise 29.24 Consider the somewhat randomly chosen stabilizer circuit below:
[three-qubit circuit diagram; the extracted gate labels read "S H Z" on one wire, "H Y" on another,
and "H X S H" on the third, with the C-NOT gates lost in extraction]
Assuming the initial state is |110ih110|, give a set of three generators for the group stabilizing the
state after each C-NOT gate is applied. The initial state is stabilized by the generators
g1 = −Z ⊗ I ⊗ I = −Z1
g2 = −I ⊗ Z ⊗ I = −Z2
g3 = I⊗I⊗Z = Z3
Give the other sets of generators in the same format, but you can omit all the ⊗’s.
Finally, we handle measurements. Suppose the current state is ρ = P_G and qubit 1 is measured
in the computational basis (other qubits are handled similarly). The relevant projectors are
\[
P_{Z_1}^{+} = (1 + Z_1)/2 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\otimes I\otimes\cdots\otimes I
\quad\text{for outcome 0, and}\quad
P_{Z_1}^{-} = (1 - Z_1)/2 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\otimes I\otimes\cdots\otimes I
\]
for outcome 1 (see Equation (120)). The outcome probabilities are
\[
\Pr[0] = \langle P_{Z_1}^{+}, \rho\rangle = \operatorname{tr}\big(P_{Z_1}^{+}\rho\big) = \operatorname{tr}\big(P_{Z_1}^{+}\rho P_{Z_1}^{+}\big),
\qquad
\Pr[1] = \langle P_{Z_1}^{-}, \rho\rangle = \operatorname{tr}\big(P_{Z_1}^{-}\rho\big) = \operatorname{tr}\big(P_{Z_1}^{-}\rho P_{Z_1}^{-}\big),
\]
and the corresponding post-measurement states are
\[
\rho_0 := \Pr[0]^{-1}\, P_{Z_1}^{+}\rho P_{Z_1}^{+},
\qquad
\rho_1 := \Pr[1]^{-1}\, P_{Z_1}^{-}\rho P_{Z_1}^{-}.
\]
To compute P_{Z_1}^{±} ρ P_{Z_1}^{±}, first note that for g ∈ G, if gZ1 = Z1g, then g P_{Z_1}^{±} = P_{Z_1}^{±} g, but if gZ1 = −Z1g,
then g P_{Z_1}^{±} = P_{Z_1}^{∓} g. Also note that P_{Z_1}^{±} P_{Z_1}^{∓} = 0. From this you can perhaps see how this is going to
go—we split ρ = P_G into those terms that commute with Z1 and those that anticommute with Z1;
the latter terms disappear:
\begin{align}
P_{Z_1}^{\pm}\rho P_{Z_1}^{\pm}
&= 2^{-n}\sum_{g\in G} P_{Z_1}^{\pm}\, g\, P_{Z_1}^{\pm} \tag{123}\\
&= 2^{-n}\Bigg(\sum_{g\in G:\, gZ_1 = Z_1 g} P_{Z_1}^{\pm}\, g\, P_{Z_1}^{\pm} \;+ \sum_{g\in G:\, gZ_1 = -Z_1 g} P_{Z_1}^{\pm}\, g\, P_{Z_1}^{\pm}\Bigg) \tag{124}\\
&= 2^{-n}\Bigg(\sum_{g\in G:\, gZ_1 = Z_1 g} P_{Z_1}^{\pm} P_{Z_1}^{\pm}\, g \;+ \sum_{g\in G:\, gZ_1 = -Z_1 g} P_{Z_1}^{\pm} P_{Z_1}^{\mp}\, g\Bigg) \tag{125}\\
&= 2^{-n}\sum_{g\in G:\, gZ_1 = Z_1 g} P_{Z_1}^{\pm}\, g \tag{126}\\
&= 2^{-n-1}\sum_{g\in G:\, gZ_1 = Z_1 g} (1 \pm Z_1)\, g \;=\; 2^{-n-1}\Bigg(\sum_g g \;\pm\; \sum_g Z_1 g\Bigg), \tag{127}
\end{align}
where the unadorned sums in (127) are over those g ∈ G that commute with Z1.
Now we have to consider three cases: (1) Z1 is in G; (2) −Z1 is in G; and (3) neither Z1 nor −Z1 is
in G. These cases are mutually exclusive, because G is stabilizing.

Case 1: Z1 ∈ G. Then Z1 commutes with all elements of G, making both sums Σ_g g and Σ_g Z1g
run over all the elements of G and thus be equal. It follows that
\begin{align*}
P_{Z_1}^{+}\rho P_{Z_1}^{+} &= 2^{-n}\sum_{g\in G} g = \rho, & \Pr[0] &= \operatorname{tr}\rho = 1,\\
P_{Z_1}^{-}\rho P_{Z_1}^{-} &= 0, & \Pr[1] &= \operatorname{tr} 0 = 0.
\end{align*}
To summarize, the outcome is 0 with certainty and the post-measurement state is unchanged:
ρ0 = ρ. (ρ1 is undefined, because you cannot divide by 0.) We also have G0 = G, so there is
no need to change the stabilizing group representing the post-measurement state.
Case 2: −Z1 ∈ G. This is similar to the previous case (particularly, Z1 commutes with all of G)
except that now, Σ_{g∈G} g = Σ_{g∈G} (−Z1)g = −Σ_{g∈G} Z1g. This gives
\begin{align*}
P_{Z_1}^{+}\rho P_{Z_1}^{+} &= 0, & \Pr[0] &= \operatorname{tr} 0 = 0,\\
P_{Z_1}^{-}\rho P_{Z_1}^{-} &= 2^{-n}\sum_{g\in G} g = \rho, & \Pr[1] &= \operatorname{tr}\rho = 1.
\end{align*}
To summarize, the outcome is 1 with certainty and the post-measurement state is unchanged:
ρ1 = ρ. (ρ0 is undefined.) Again, we have G1 = G, so there is no need to change the stabilizing
group for the post-measurement state.
Case 3: neither Z1 nor −Z1 is in G. This is the hardest of the three cases to analyze. The probability
of outcome 0 is
\[
\Pr[0] = \operatorname{tr}\big(P_{Z_1}^{+}\rho P_{Z_1}^{+}\big) = 2^{-n-1}\Bigg(\sum_g \operatorname{tr} g \;+\; \sum_g \operatorname{tr}(Z_1 g)\Bigg),
\]
where both sums are over those g ∈ G that commute with Z1. We now show that the
second sum disappears entirely: for all g ∈ G we must have princ(Z1g) ≠ 1 (for otherwise,
princ(Z1) = princ(Z1gg) = princ(Z1g) princ(g) = princ(g), and so Z1 = ±g, contradicting
the fact that neither Z1 nor −Z1 is in G); thus tr(Z1g) = 0 for all g ∈ G. The only term that
survives in the first sum is g = 1; all others have zero trace. Thus
\[
\Pr[0] = 2^{-n-1}\operatorname{tr} 1 = 2^{-n-1}\operatorname{tr} I^{\otimes n} = 2^{-n-1}\, 2^n = \frac12 = \Pr[1].
\]
Thus outcomes 0 and 1 occur with equal odds. For the post-measurement states, we will see
that G0 and G1 differ from each other and from G. In fact, Z1 ∈ G0 and −Z1 ∈ G1. A full
analysis will use Lemma 29.25, below.
Lemma 29.25 Let G be an n-dimensional stabilizing subgroup of Πn and let h ∈ Πn be such that
coeff(h) = ±1 and neither h nor −h is in G. Let C := {g ∈ G | gh = hg} and hC := {hg | g ∈ C}. Then
C ≠ G and C ∪ hC is an n-dimensional stabilizing subgroup of Πn.

Proof. This all follows immediately from Lemma 29.17 (with k = n) provided we can show
that C ≠ G. Let H := C ∪ hC. H is a stabilizing subgroup of Πn by Lemma 29.17 of dimension
either n or n + 1 (the latter if C = G). By Corollary 29.13, no stabilizing subgroup of Πn can
have more than 2^n elements, and therefore H has dimension n. This implies that C ≠ G, since
|C| = |H|/2 = 2^{n−1} < 2^n = |G|.³⁷ □
We apply Lemma 29.25 twice—once with h = Z1 and again with h = −Z1 —to find the two
alternative post-measurement states (actually the groups G0 and G1 that stabilize them) in the case
where neither Z1 nor −Z1 is in G. Letting C ⊆ G be the set of all elements of G that commute
37 Here we use the vertical bars to indicate the cardinality of a set.
with Z1, we define G0 := C ∪ Z1C and G1 := C ∪ (−Z1)C. By the lemma, both are n-dimensional
stabilizing subgroups of Πn. We now verify that for b ∈ {0, 1}, P_{G_b} is the post-measurement state
given outcome b. From Equations (123–127), the lemma, and the fact that Pr[0] = Pr[1] = 1/2, we
have
\begin{align*}
\rho_0 &= 2P_{Z_1}^{+}\rho P_{Z_1}^{+} = 2^{-n}\sum_{g\in C}(g + Z_1 g) = 2^{-n}\sum_{g\in G_0} g = P_{G_0},\\
\rho_1 &= 2P_{Z_1}^{-}\rho P_{Z_1}^{-} = 2^{-n}\sum_{g\in C}(g - Z_1 g) = 2^{-n}\sum_{g\in C}\big(g + (-Z_1)g\big) = 2^{-n}\sum_{g\in G_1} g = P_{G_1}.
\end{align*}
There are two things left to do to complete the proof of the Gottesman-Knill theorem: (1) show
how to determine easily which of the three cases applies for a 1-qubit measurement; (2) in Case 3,
determine generators for G0 and G1 given generators for G.
We are given a minimal set of generators S := {g1 , . . . , gn } for G = Stab ρ, the group stabilizing
the pre-measurement state. We can easily distinguish Case 3 from the other two: by Lemma 29.25,
one of ±Z1 is in G if and only if Z1 commutes with all of G, if and only if Z1 commutes with all
of S. The latter can be easily checked: Z1 commutes with gj if and only if princ(gj ) = I ⊗ · · · or
princ(gj ) = Z ⊗ · · ·.
If Z1 commutes with all of S, then to determine which of Z1 or −Z1 is in G, we first find a
subset T ⊆ S whose elements multiply to ±Z1 , then we actually multiply these together to see
what we get—either Z1 or −Z1 . Finding T is essentially a problem in linear algebra over Z2 , via the
correspondence given in the previous topic. Using standard techniques of linear algebra (Gaussian
elimination, particularly), we can express ϕ(Z1 ) as a linear combination (actually a simple sum,
since the field is Z2 ) of elements from the set {ϕ(g1 ), . . . , ϕ(gn )}. Multiplying together those gj
such that ϕ(gj ) appears in the sum will yield ±Z1 .
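The Gaussian-elimination step is plain linear algebra over Z2. A minimal sketch (the representation and helper name are ours, chosen for illustration): each reduced row carries the set of original row indices it is a sum of, so the subset T can be read off directly.

```python
# Sketch: find a subset of rows (over Z2) summing to target, tracking
# which original rows combine into each reduced row.

def solve_gf2(rows, target):
    """Indices of a subset of rows summing (mod 2) to target, or None."""
    aug = [(r[:], {i}) for i, r in enumerate(rows)]  # (row, index set)
    t, t_idx = target[:], set()
    for col in range(len(target)):
        pivot = next((a for a in aug if a[0][col] == 1), None)
        if pivot is None:
            continue
        aug.remove(pivot)
        prow, pidx = pivot
        for row, idx in aug:                  # clear this column elsewhere
            if row[col] == 1:
                row[:] = [x ^ y for x, y in zip(row, prow)]
                idx.symmetric_difference_update(pidx)
        if t[col] == 1:                       # clear it in the target too
            t = [x ^ y for x, y in zip(t, prow)]
            t_idx.symmetric_difference_update(pidx)
    return sorted(t_idx) if not any(t) else None

# phi(Z1) and phi(Z2) for n = 2; their sum is phi(Z1 Z2).
rows = [[0, 0, 1, 0], [0, 0, 0, 1]]
print(solve_gf2(rows, [0, 0, 1, 1]))  # [0, 1]
print(solve_gf2(rows, [1, 0, 0, 0]))  # None: phi(X1) is not in the row space
```

Multiplying together the generators at the returned indices then yields either Z1 or −Z1, as described above.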
Finally, here is how to get generators for Gb for b ∈ {0, 1} in the case where neither Z1 nor
−Z1 is in G. Choose one of the generators that anticommutes with Z1 —suppose it is g1 (it doesn’t
matter which one it is). Replace g1 in the generating set with (−1)b Z1 (that is, Z1 if b = 0 and −Z1
if b = 1), then for any other generator gj that anticommutes with Z1 , replace gj by g1 gj . The result
is a generating set Sb for Gb .
To see why, first note that all elements of Sb commute with Z1 , and so they are all in C except
for ±Z1 itself. Thus Sb ⊆ Gb . It remains to show that all of Gb is generated by Sb . For this it
suffices to show that all of C is generated by Sb, because (−1)^b Z1 is also in Sb. Every element
g ∈ G is a unique product of distinct elements from S, say,
\[
g = g_{i_1} g_{i_2} \cdots g_{i_k},
\]
for some k and 1 ≤ i1 < i2 < ⋯ < ik ≤ n. Then g ∈ C (that is, g commutes with Z1) if and only
if Z1 anticommutes with an even number of the factors g_{i_j}: starting with Z1g = Z1 g_{i_1} g_{i_2} ⋯ g_{i_k},
transpose Z1 with g_{i_1}, then g_{i_2}, etc., until Z1 winds up on the far right. To keep all these expressions
equal, every time Z1 anticommutes with one of these factors, we must introduce a minus sign out
front, so the minus signs cancel just when there are an even number of such anticommutations.
Thus we can group the factors g_{i_j} ∉ C into adjacent pairs inside the product on the right-hand
side, above. But now each pair is obtainable from Sb. For example, if g3 and g5 are paired, then
g3g5 = g1²g3g5 = (g1g3)(g1g5), and both g1g3 and g1g5 are in Sb. The other unpaired factors are in
C and hence are already in Sb . This shows that every element of C is the product of factors from
Sb , which finishes the proof of the Gottesman-Knill theorem.
Remark. The only difference between S0 and S1 above is that S0 contains Z1 and S1 contains −Z1
instead. One can easily simulate any number of 1-qubit measurements in the circuit by maintaining
a single expression for Sb (involving b) after each successive measurement.
Exercise 29.26 Referring back to Exercise 29.24, suppose that after all the gates are applied, the
three qubits are measured in order. Give the probability of the outcomes of each measurement and
the corresponding post-measurement states. Subsequent measurements may depend on previous
results. Describe all possible sequences of outcomes and their probabilities.
Stabilizer Codes.
29.2 Entanglement
Suppose we have two physical systems with Hilbert spaces H and J. If the first system is prepared
in some pure state ρ ∈ L(H) and the second is independently prepared in a pure state σ ∈ L(J)—
and the two systems do not interact in any way—then the (pure) state of the combined system is
ρ ⊗ σ ∈ L(H) ⊗ L(J) ≅ L(H ⊗ J).³⁸ We call such a pure state separable between the two systems, or
a tensor product state. For pure states, being a tensor product state and being a separable state are
equivalent, but for general (not necessarily pure) states, the notion of separability is looser. We will
only consider pure states, and then only those of two combined systems—so-called “bipartite”
pure states.
L(H ⊗ J) contains lots of pure states that are not of the form ρ ⊗ σ as above. Such pure states
are said to be entangled. Roughly, two physical systems are in an entangled state when they are
correlated in a non-classical (uniquely quantum) way. The four Bell states are entangled states of
two single-qubit systems—maximally entangled, it will turn out. None of them can be written as
the tensor product of two single-qubit states.
Our first task is to find a way to quantify mathematically the amount of entanglement of a pure
state shared between two systems. For this we use the Schmidt decomposition.
Theorem 29.27 (Schmidt Decomposition) Let H and J be Hilbert spaces, and let u ∈ H ⊗ J be any
unit vector. There exist a unique integer r > 0 and unique positive values s1 ≥ ⋯ ≥ sr > 0 such that
there exist pairwise orthogonal unit vectors x1, …, xr ∈ H and y1, …, yr ∈ J such that Σ_{k=1}^{r} s_k² = 1 and
\[
u = \sum_{k=1}^{r} s_k\,(x_k \otimes y_k). \tag{128}
\]
In fact, {s1², …, sr²} is the multiset of nonzero eigenvalues of tr_J(uu∗) and of tr_H(uu∗).
38 The relation ≅ indicates that these two spaces are naturally isomorphic.
Proof. The Schmidt decomposition is really the singular value decomposition in disguise. Pick
some standard orthonormal bases e1 , . . . , en for H and f1 , . . . , fn for J. (We will assume that
dim(H) = dim(J) = n for technical convenience, but this is not necessary.) We expand u with
respect to the product basis {ei ⊗ fj }16i,j6n as
\[
u = \sum_{1\le i,j\le n} \alpha_{i,j}\,(e_i \otimes f_j).
\]
Let A be the n × n matrix with entries [A]_{ij} := α_{i,j}, and let A = VDW be a singular value
decomposition of A: V and W are unitary, and D is diagonal with the singular values
s1 ≥ ⋯ ≥ sr > 0 = s_{r+1} = ⋯ = s_n of A down its diagonal, so that [D]_{kℓ} = s_k δ_{kℓ}. For
1 ≤ k ≤ r, define x_k := Σ_i [V]_{ik} e_i ∈ H and y_k := Σ_j [W]_{kj} f_j ∈ J.
There are three things to check here. We first check that {x_k} and {y_k} are orthonormal sets of
vectors. Using the fact that V is unitary, we have, for all 1 ≤ k, ℓ ≤ r,
\[
\langle x_k, x_\ell\rangle = \sum_{i=1}^{n} \overline{[V]_{ik}}\,[V]_{i\ell} = \sum_{i=1}^{n} [V^*]_{ki}[V]_{i\ell} = [V^*V]_{k\ell} = [I]_{k\ell} = \delta_{k\ell}.
\]
A similar calculation shows that ⟨y_k, y_ℓ⟩ = δ_{kℓ}, using the unitarity of W. Thus both {x_k} and {y_k}
are orthonormal sets.
Second, we have
\begin{align*}
u &= \sum_{1\le i,j\le n} [A]_{ij}\,(e_i\otimes f_j) = \sum_{i,j} [VDW]_{ij}\,(e_i\otimes f_j)\\
&= \sum_{i,j}\,\sum_{1\le k,\ell\le n} [V]_{ik}[D]_{k\ell}[W]_{\ell j}\,(e_i\otimes f_j)\\
&= \sum_{i,j,k,\ell} s_k\,[V]_{ik}\,\delta_{k\ell}\,[W]_{\ell j}\,(e_i\otimes f_j)\\
&= \sum_{i,j,k} s_k\,[V]_{ik}[W]_{kj}\,(e_i\otimes f_j)\\
&= \sum_{k=1}^{r}\sum_{i,j} s_k\,[V]_{ik}[W]_{kj}\,(e_i\otimes f_j)\\
&= \sum_{k=1}^{r} s_k\Big(\sum_i [V]_{ik}\,e_i\Big)\otimes\Big(\sum_j [W]_{kj}\,f_j\Big)\\
&= \sum_{k=1}^{r} s_k\,(x_k\otimes y_k).
\end{align*}
Since u is a unit vector, we also have
\[
1 = u^*u = \sum_{k=1}^{r}\sum_{\ell=1}^{r} s_k s_\ell\,(x_k^* x_\ell)(y_k^* y_\ell) = \sum_{k=1}^{r} s_k^2,
\]
the last equation following from the fact that x_k^* x_ℓ = y_k^* y_ℓ = δ_{kℓ}.
Finally, we have
\begin{align*}
\operatorname{tr}_J(uu^*) &= \operatorname{tr}_J\Bigg(\Big(\sum_{k=1}^{r} s_k\,(x_k\otimes y_k)\Big)\Big(\sum_{\ell=1}^{r} s_\ell\,(x_\ell^*\otimes y_\ell^*)\Big)\Bigg)\\
&= \operatorname{tr}_J\Big(\sum_{k,\ell} s_k s_\ell\,(x_k x_\ell^*\otimes y_k y_\ell^*)\Big)\\
&= \sum_{k,\ell} s_k s_\ell\,\operatorname{tr}(y_k y_\ell^*)\,x_k x_\ell^*\\
&= \sum_{k=1}^{r} s_k^2\,x_k x_k^*,
\end{align*}
using tr(y_k y_ℓ^*) = y_ℓ^* y_k = δ_{kℓ}. The x_k x_k^* are rank-one projectors onto pairwise orthogonal
lines, so the nonzero eigenvalues of tr_J(uu^*) are exactly s_1², …, s_r²; a symmetric calculation gives
the same for tr_H(uu^*). □
Definition 29.28 (Shannon entropy) Given a probability distribution p = (p1, …, pn),³⁹ we define
the Shannon entropy of p to be the quantity
\[
H(p) = -\sum_{i=1}^{n} p_i \lg p_i, \tag{129}
\]
where 0 lg 0 = 0 by convention (alternately, we restrict the sum to those i for which pi > 0).
Intuitively, H(p) measures the information gained by learning the outcome of such an experiment,
averaged over all the possible outcomes. For any probability distribution p = (p1, …, pn) on n outcomes,
0 6 H(p) 6 lg n .
H(p) = 0 exactly when p is deterministic, i.e., pi = 1 for some i (and thus pj = 0 for all j , i).
H(p) = lg n if and only if p is the uniform distribution, i.e., pi = 1/n for all 1 6 i 6 n.
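Both boundary facts are easy to check numerically; a quick sketch:

```python
import math

# Shannon entropy with the 0 lg 0 = 0 convention.
def shannon_entropy(p):
    h = -sum(pi * math.log2(pi) for pi in p if pi > 0)
    return h + 0.0  # normalize -0.0 to 0.0

print(shannon_entropy([1.0, 0.0, 0.0]))  # 0.0: deterministic distribution
print(shannon_entropy([0.25] * 4))       # 2.0: uniform on 4 outcomes (lg 4)
```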
Shannon entropy has a quantum analogue. For any state ρ, we have ρ > 0 and tr ρ = 1, which
is equivalent to the spectrum of ρ being a probability distribution.
Definition 29.29 (Von Neumann entropy) Let H be a Hilbert space of dimension n > 0, let ρ ∈
L(H) be a state, and let λ := (λ1 , · · · , λn ) be the spectrum (vector of eigenvalues) of ρ, where
λ1 > · · · > λn . We define the von Neumann entropy of ρ as
H(ρ) = − tr(ρ lg ρ) .
If ρ > 0, then this expression makes perfect sense using the rule for applying a scalar function to a
normal operator that we discussed in Section 9. If one or more of ρ’s eigenvalues is zero, however,
then some care must be taken with this expression, just as we did when defining Shannon entropy
in the case where one or more of the probabilities was zero. In this case we can confine ρ to a
subspace on which it acts positive definitely. Let V := im(ρ) := {ρv | v ∈ H}, and let σ be ρ
restricted to V (V is a subspace of H). Then σ > 0, and so we can define H(ρ) := − tr(σ lg σ), and
this coincides with Definition 29.29.
Analogous to Shannon entropy, von Neumann entropy quantifies the amount of uncertainty
about a quantum state. We can view a pure state as one about which
we have complete information.
If ρ1, …, ρk are pure states that are pairwise orthogonal (that is, ⟨ρi, ρj⟩ = 0 for all i ≠ j), then
there is a projective measurement that can distinguish each ρi from the others with certainty.
(The projectors of this measurement are the ρi themselves, each ρi corresponding to outcome i,
possibly together with one additional projector P := I − Σ_{i=1}^{k} ρi for the outcome "none of the
above," assuming this projector is nonzero.)
above,” assuming this projector is nonzero.) Keeping with this view, a mixed state ρ ∈ L(H) can
be thought of as a state about which we have incomplete information, that is, our information about
ρ is only statistical. We can regard ρ as a probabilistic mixture of pairwise orthogonal pure states,
and mathematically, these component pure states can come from the spectral decomposition of ρ:
\[
\rho = \sum_{i=1}^{n} p_i\, u_i u_i^*,
\]
where the p_i are the eigenvalues of ρ and the u_i are corresponding pairwise orthogonal unit eigenvectors.
How does all this relate to entanglement? Given a pure state ρ := uu∗ , where u is a unit vector
in H ⊗ J, if we decide to ignore (by tracing out) one or the other system, then the reduced state
(i.e., the state of the remaining system) will be mixed if and only if ρ is entangled. This suggests a
natural quantitative measure of the amount of entanglement in ρ—the amount of uncertainty we
have about either of these reduced states, given by their von Neumann entropy. This quantity can
be computed directly from the Schmidt coefficients s1, …, sr of ρ:
\[
H(\operatorname{tr}_H(\rho)) = H(\operatorname{tr}_J(\rho)) = H(s_1^2, \ldots, s_r^2),
\]
noting that (s_1², …, s_r²) is a probability distribution by Theorem 29.27.
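Since the Schmidt decomposition is the singular value decomposition of the coefficient matrix, this entanglement measure takes only a few lines of numpy. A sketch, assuming the standard row-major reshape matches the expansion of u over the product basis {e_i ⊗ f_j}:

```python
import numpy as np

def schmidt_coefficients(u, dim_h, dim_j):
    """Singular values of the coefficient matrix A, where
    u = sum_ij A[i,j] (e_i tensor f_j)."""
    A = np.asarray(u, dtype=complex).reshape(dim_h, dim_j)
    s = np.linalg.svd(A, compute_uv=False)
    return s[s > 1e-12]          # keep only the nonzero coefficients

def entanglement_entropy(u, dim_h, dim_j):
    p = schmidt_coefficients(u, dim_h, dim_j) ** 2  # sums to 1 for unit u
    return float(-(p * np.log2(p)).sum()) + 0.0     # +0.0 normalizes -0.0

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)  # (|00> + |11>)/sqrt(2)
prod = np.array([1, 0, 0, 0])               # |00>, a tensor product state
print(entanglement_entropy(bell, 2, 2))     # ~1.0: maximally entangled
print(entanglement_entropy(prod, 2, 2))     # 0.0: no entanglement
```

A Bell state has r = 2 equal Schmidt coefficients (entropy lg 2 = 1), while any tensor product state has r = 1 (entropy 0), matching the discussion above.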
In 1935, Albert Einstein, Boris Podolsky, and Nathan Rosen (EPR from now on) published a paper
arguing that the laws of quantum mechanics, although correct to the best of anyone’s knowledge,
were not a complete description of nature. Their argument was based on two assumed principles,
which are commonly called “locality” and “realism”:
Locality. All physical influences act locally; put another way, there is no action at a distance. Object
A cannot directly influence a distant object B without some intervening continuum of local
influences connecting the two. For example, that two distant, oppositely charged particles
attract is not due to any direct influence between them but rather to each responding
(locally) to the other’s electromagnetic field, which permeates all of space. For another
example, according to general relativity, a massive object warps the spacetime around it so
that nearby objects move along curved paths, even though locally they are still moving in
straight lines.
Realism. A complete knowledge of the state of a physical system is, in principle, enough to predict
the outcome of every possible measurement of that system. For example, knowing the exact
trajectory of an asteroid now (as well as the gravitational forces acting on it) allows us to
predict where it will be a year from now, the accuracy of the prediction limited only by the
precision of the initial measurements and of our calculations.
If we prepare an electron spin in state |+i (i.e., |→i, spin-right) and we then measure its spin
in the vertical direction, we get an apparently uniformly random result: spin-up about half the
time, spin-down the rest of the time. The realistic view says that this apparent randomness is not
fundamental physics; it is instead an artifact of our incomplete understanding of the state of the
electron. There are aspects of the electron’s state that we don’t know about and that our current
theory is not accounting for—so called hidden variables—that predetermine its vertical spin before
we measure it. If we think we are preparing each electron in the same state |+i but getting different
results measuring the spin vertically, then the electrons really aren’t in the same state to begin
with, and our theory is not (as yet) adequate to account for that difference. A complete description
of a physical state would determine all measurement outcomes; nature is not inherently random.
(Einstein: “God does not play dice.”) This is the realist view of physics.
In a modification⁴⁰ of EPR's argument, we consider a system of two spin-1/2 particles, created
together in a lab then separated from each other by an arbitrary distance. The particles are created
40 This modification is close to one due to David Bohm in 1951.
in a closed system with zero net angular momentum in any direction; conservation of angular
momentum then requires that the two spins always be measured as opposite, resulting in a net spin
of zero. Quantum mechanics dictates that the pair is in the entangled state |Ψ−⟩ = (|↑↓⟩ − |↓↑⟩)/√2—
called the spin singlet state—which has the property that if we measure the spin of each particle
in the same direction, we will always get opposite results, regardless of the direction chosen.
Now imagine the two particles being moved very far apart (even lightyears apart), Alice having
one and Bob the other. According to quantum mechanics, if Alice or Bob measures their spin in
the vertical direction they will see ↑ or ↓ uniformly at random, but, if, say, Alice measures her
spin and sees ↑, then Bob must see ↓, and vice versa, although he interprets his own result as
being uniformly random. According to the prevailing interpretation of quantum mechanics (the
so-called Copenhagen interpretation), when Alice measures her spin and sees ↑, say, the state of
the system “collapses” to |↑↓i, ensuring that Bob will subsequently measure ↓. This interpretation
appears to violate local realism: Alice’s measurement result alters the system’s state, thus magically
influencing Bob’s measurement, even though Bob is in another galaxy and (even worse), light may
not have time to travel from one measurement event to the other (the two events are spacelike
separated). Einstein criticized this interpretation as “spooky action at a distance.”
EPR posited an alternative, local realist interpretation of this scenario. When the two particles
were first created together in the lab, hidden variables were fixed locally between the two particles,
predisposing Alice’s particle to result in ↑ when measured, and Bob’s particle to result in ↓.
These hidden parameters perfectly correlated the two particles when they were in the same place
(locality), then were carried with the particles as they were separated; the measurement results
were not random, but were predetermined by these hidden variables (realism). Entanglement and
the subsequent state collapse are fictions; Alice was always going to see ↑, say, and Bob ↓, the die
being cast when the particles were created in the same place.
In the following decades, the EPR argument was taken less as physics than as philosophy,
since there seemed to be no experiment that could confirm or refute it (quantum mechanics versus
local realism). Then in 1964, John Bell showed how EPR’s interpretation has direct physical conse-
quences. He proposed a clever physics experiment that could test the EPR hypothesis. Bell showed
that local realism implies that statistical correlations between measurements of spatially separated
systems must satisfy certain constraints—now known as Bell inequalities—whereas quantum me-
chanics predicts that these constraints are violated. By taking a large enough sample of runs of
the same experiment and gathering the statistics, one could either confirm or refute (based on
statistical evidence, at least) the local realist interpretation.
A number of different Bell inequalities are now known, and we consider two in depth below.
Several experiments have been performed to test these inequalities. Although doubts have been
raised from time to time about statistical loopholes allowing for a local realist interpretation of
some of the experimental results, the overwhelming evidence at this point is that nature violates
the Bell inequalities. Local hidden-variable theories are thus refuted, and there is every reason to
think that quantum mechanics offers a complete description of reality. A good philosophical discussion of
the EPR paradox can be found online in the Stanford Encyclopedia of Philosophy (https://plato.
stanford.edu/entries/qt-epr/).
We give two examples of Bell inequalities in this section, showing how the laws of quantum
mechanics violate each. Each is cast in terms of a nonlocal game, which we now describe. A nonlocal
game is a cooperative game played by two parties, Alice and Bob, and a referee. The referee first
produces a pair (s, t) of values, probabilistically drawn from some finite set. The ref gives s to
Alice and t to Bob. Alice then produces a value a and Bob a value b, which they send back to the
referee. Then Alice and Bob win if the tuple (s, t, a, b) satisfies a certain finite condition (which
depends on the type of game being played); otherwise, they lose. Alice and Bob can get together
beforehand and share any information, randomness, strategies, etc. that they want, but they are
not allowed to communicate with each other from the time they receive s and t until the time they
send a and b to the ref.
We say that Alice and Bob employ a classical strategy if what they share beforehand is purely
classical information, including shared randomness. They employ a quantum strategy if, in ad-
dition, they share an entangled quantum state beforehand. We define the value of a particular
strategy to be the overall probability of Alice and Bob winning using that strategy.
In each of the two example games we discuss below, there is a quantum strategy whose value
is strictly higher than that of the optimal classical strategy.41 These facts imply violations of the
corresponding Bell inequalities: the local realist interpretation dictates that classical strategies are
the only ones available to Alice and Bob.
Nonlocal games and their limitations are explored extensively in a paper by Cleve, Høyer,
Toner, and Watrous (https://arxiv.org/abs/quant-ph/0404076).
The CHSH game. In this game, based on a Bell-type inequality discovered by Clauser, Horne,
Shimony, and Holt, the referee chooses values s, t ∈ {0, 1} uniformly at random and independently,
so that each pair (s, t) occurs with probability 1/4. Then Alice and Bob produce values a and b in
{+1, −1}, respectively. Alice and Bob win if and only if ab = (−1)^{st}; more prosaically, if s = t = 1,
then Alice and Bob win iff a ≠ b, and otherwise, they win iff a = b. We will show that using any
classical strategy, Alice and Bob can win with probability no greater than 3/4 = 0.75, but using a
quantum strategy, they can win with probability cos²(π/8) = (2 + √2)/4 ≈ 0.85.
First, we consider the case where Alice and Bob employ a (classical) deterministic strategy, that
is, Alice computes a := A(s), where A : {0, 1} → {+1, −1} is a function she chooses beforehand, and
in a similar fashion Bob computes b := B(t) for some B : {0, 1} → {+1, −1} of his choosing. There
are four such functions: the constant +1 function, the constant −1 function, the function mapping
x 7→ (−1)x , and the function mapping x 7→ (−1)1+x . The following table gives, for each possible
pair of functions A and B, the value of ab given each of the four possible choices of (s, t):
[Table: the 4 × 4 array of 2 × 2 matrices M(A, B), one for each pair of functions (A, B), omitted here.]
Each boxed entry of the table is a 2 × 2 matrix with entries ±1. For brevity, we write “+” for +1
and “−” for −1. The rows of each matrix are indexed by the value of s (0 then 1) and the columns
similarly by t. It follows that each matrix is of the form

    M(A, B) := [ A(0) ] [ B(0)  B(1) ] = [ A(0)B(0)  A(0)B(1) ]
               [ A(1) ]                  [ A(1)B(0)  A(1)B(1) ]

for the given choice of functions A and B. The matrix giving (−1)^{st} (and hence the winning values
for ab) is

    W := [ +  + ]
         [ +  − ].
If Alice and Bob choose functions A and B, respectively, then they win for each particular (s, t)
if the corresponding entry in M(A, B) matches that of W. By inspecting the table above, one
observes that for each choice of A and B, the matrix M(A, B) differs from W in at least one of the
four entries. Since each combination (s, t) occurs with probability 1/4, the probability that they win
is at most 1 − 1/4 = 3/4, no matter which A and B they choose. (Alice and Bob can achieve this optimal
probability by always outputting a = b = +1, for example. Other strategies are optimal as well.)
There are two useful things to note here that will also apply to the classical case in the next
game we consider:
1. Each M(A, B) (considered as a matrix over R) is the product of a column vector with a row
vector, and as such, has rank 1. On the other hand, we observe that W is nonsingular, of
rank 2. This is an alternate, more succinct way of seeing that M(A, B) ≠ W for all A and B.
2. Deterministic strategies (also called pure strategies) are not the only classical strategies
available to Alice and Bob. They could instead employ a mixed strategy wherein they choose their
A and B at random according to some arbitrary joint probability distribution. In this case,
however, their probability of winning is then a convex combination of their winning prob-
abilities using pure strategies, and so cannot exceed that of the best pure strategy. Thus we
only need to consider pure strategies to get an upper bound for all classical strategies.
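The classical bound of 3/4 can be confirmed by brute force. Here is a short Python sketch (illustrative only, not part of the notes) that enumerates all 16 pairs of pure strategies and takes the maximum winning probability:

```python
from itertools import product

# Each function {0,1} -> {+1,-1} is encoded as the pair (f(0), f(1)).
FUNCS = list(product([+1, -1], repeat=2))

def chsh_win_prob(A, B):
    """Winning probability of the pure strategy (A, B): on input (s, t),
    Alice outputs A[s] and Bob outputs B[t]; they win iff A[s]*B[t] == (-1)**(s*t)."""
    wins = sum(1 for s in (0, 1) for t in (0, 1)
               if A[s] * B[t] == (-1) ** (s * t))
    return wins / 4  # each (s, t) occurs with probability 1/4

best = max(chsh_win_prob(A, B) for A in FUNCS for B in FUNCS)
```

Running this gives best = 0.75, matching the rank argument above; the all-ones strategy attains it.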
We now turn to Alice’s and Bob’s quantum strategy. Before receiving s and t from the referee,
Alice and Bob share an EPR pair, i.e., the 2-qubit state

    |ψ⟩ := |Φ+⟩ = (|00⟩ + |11⟩)/√2 = (1/√2)(1, 0, 0, 1)ᵀ,

the first qubit possessed by Alice, the second by Bob. It is best to conceive of each qubit as a spin-1/2
particle. After receiving s, Alice measures her particle’s spin in a certain direction depending on
the value of s. After receiving t, Bob measures his particle’s spin in a certain direction depending
on the value of t. If Alice sees spin-up (↑), then she sends a := +1 to the referee; otherwise (if
spin-down (↓)), she sends a := −1. Bob computes b using the same method; the only difference
between Alice and Bob is which directions they choose to measure their respective spins.
Both spin measurements are in the x, z-plane. Generally, for any angle θ, a projective measure-
ment of a spin in the direction having cartesian coordinates (sin θ, 0, cos θ) (that is, clockwise from
the upward direction through angle θ) corresponds to the csop

    P↑(θ) = (1/2)(I + (sin θ)X + (cos θ)Z),    (131)
    P↓(θ) = (1/2)(I − (sin θ)X − (cos θ)Z),    (132)
where I is the 2 × 2 identity matrix and X and Z are the usual Pauli matrices. Upon receiving s,
Alice chooses angle θs to measure her spin. Upon receiving t, Bob chooses angle ϕt to measure
his spin. The angles θs and ϕt are given by the following tables:
    Alice:  θ0 = 0,  θ1 = π/2        Bob:  ϕ0 = π/4,  ϕ1 = −π/4

[Figure: Alice's measurement directions θ0 and θ1 and Bob's directions ϕ0 and ϕ1, drawn in the x, z-plane.]
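As a quick sanity check (not in the original notes) that Equations (131) and (132) really define a csop, the following Python sketch verifies that P↑(θ) + P↓(θ) = I, that each is idempotent, and that their product vanishes, for an arbitrary angle:

```python
import math

# 2x2 real matrices as nested lists; just enough arithmetic for the check.
def mat_add(M, N):
    return [[M[i][j] + N[i][j] for j in range(2)] for i in range(2)]

def mat_mul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def scal(c, M):
    return [[c * M[i][j] for j in range(2)] for i in range(2)]

I2 = [[1.0, 0.0], [0.0, 1.0]]
X = [[0.0, 1.0], [1.0, 0.0]]
Z = [[1.0, 0.0], [0.0, -1.0]]

def P_up(theta):
    # Equation (131): (1/2)(I + (sin θ)X + (cos θ)Z)
    return scal(0.5, mat_add(I2, mat_add(scal(math.sin(theta), X), scal(math.cos(theta), Z))))

def P_down(theta):
    # Equation (132): (1/2)(I − (sin θ)X − (cos θ)Z)
    return scal(0.5, mat_add(I2, mat_add(scal(-math.sin(theta), X), scal(-math.cos(theta), Z))))

theta = 0.37  # an arbitrary test angle
up, down = P_up(theta), P_down(theta)
# up + down = I, up and down are idempotent, and up*down = 0,
# so {P↑(θ), P↓(θ)} is a complete set of orthogonal projectors.
```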
To find the winning probability, we must compute, for any combination (s, t), the probability
that a = b. Generally, if Alice measures her spin with angle θ and Bob measures his spin with
angle ϕ, then the combined measurement corresponds to the 4-outcome csop
{P↑ (θ) ⊗ P↑ (ϕ), P↑ (θ) ⊗ P↓ (ϕ), P↓ (θ) ⊗ P↑ (ϕ), P↓ (θ) ⊗ P↓ (ϕ)} .
Let C ⊗ D be any of these four projectors (or let C and D be any 2 × 2 matrices generally). We have
a handy formula for the probability of obtaining the corresponding outcome given state |ψ⟩:

    ⟨ψ|(C ⊗ D)|ψ⟩ = (1/2) (1, 0, 0, 1) (C ⊗ D) (1, 0, 0, 1)ᵀ
                  = (1/2)(c11 d11 + c12 d12 + c21 d21 + c22 d22)
                  = (1/2) tr(Cᵀ D),

where cij and dij denote the entries of C and D, respectively.
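The trace formula above is easy to spot-check numerically. The following Python sketch (an illustration, not part of the notes) compares ⟨ψ|(C ⊗ D)|ψ⟩ with (1/2) tr(CᵀD) for random real 2 × 2 matrices:

```python
import random

def kron(C, D):
    """4x4 Kronecker product of two 2x2 matrices."""
    K = [[0.0] * 4 for _ in range(4)]
    for i in range(2):
        for j in range(2):
            for k in range(2):
                for l in range(2):
                    K[2 * i + k][2 * j + l] = C[i][j] * D[k][l]
    return K

def quad_form(v, M):
    """<v| M |v> for a real 4-vector v and a real 4x4 matrix M."""
    return sum(v[i] * M[i][j] * v[j] for i in range(4) for j in range(4))

random.seed(1)
C = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
D = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]

s = 2 ** -0.5
psi = [s, 0.0, 0.0, s]  # the EPR pair (|00> + |11>)/sqrt(2) as a 4-vector
lhs = quad_form(psi, kron(C, D))  # <psi|(C ⊗ D)|psi>
rhs = 0.5 * sum(C[i][j] * D[i][j] for i in range(2) for j in range(2))  # (1/2) tr(Cᵀ D)
```

The two quantities agree to machine precision; note that tr(CᵀD) is just the entrywise sum Σᵢⱼ cᵢⱼdᵢⱼ.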
Applying this formula to the projectors given by (131) and (132), and noting that I, X, and Z are all
symmetric and that X and Z have zero trace, we have
    Pr[↑↑] = (1/8) tr((I + (sin θ)X + (cos θ)Z)(I + (sin ϕ)X + (cos ϕ)Z)) = (1 + sin θ sin ϕ + cos θ cos ϕ)/4,
    Pr[↓↓] = (1/8) tr((I − (sin θ)X − (cos θ)Z)(I − (sin ϕ)X − (cos ϕ)Z)) = (1 + sin θ sin ϕ + cos θ cos ϕ)/4.

(There is no need for us to compute Pr[↑↓] or Pr[↓↑].) Thus

    Pr[a = b] = Pr[↑↑] + Pr[↓↓] = (1 + sin θ sin ϕ + cos θ cos ϕ)/2 = (1 + cos(θ − ϕ))/2 = cos²((θ − ϕ)/2).
If |θs − ϕt | is small, then Pr[a = b] is close to 1; if |θs − ϕt | is close to π, then Pr[a = b] is close to 0.
As the next picture illustrates, Alice’s and Bob’s measurements are chosen so that |θs − ϕt | is close
to π if and only if s = t = 1:
[Figure: the four measurement directions θ0, θ1, ϕ0, ϕ1 drawn in the plane, showing that θ1 and ϕ1 are nearly opposite while every other pair of directions is close together.]
If s = t = 1, then |θs − ϕt| = 3π/4; otherwise, |θs − ϕt| = π/4. Finally, we can compute Alice's
and Bob's winning probability:

    Pr[win] = (3/4) cos²(π/8) + (1/4) sin²(3π/8) = cos²(π/8) = (2 + √2)/4 ≈ 0.85,

since sin(3π/8) = cos(π/8).
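The quantum value can be checked numerically from the formula Pr[a = b] = cos²((θ − ϕ)/2) and the angle tables above. A small Python sketch (illustrative only):

```python
import math

theta = {0: 0.0, 1: math.pi / 2}         # Alice's measurement angles
phi = {0: math.pi / 4, 1: -math.pi / 4}  # Bob's measurement angles

def pr_equal(s, t):
    # Pr[a = b] = cos^2((theta_s - phi_t)/2) for the shared EPR pair
    return math.cos((theta[s] - phi[t]) / 2) ** 2

# They win iff a = b, except on (s, t) = (1, 1), where they win iff a != b.
win = sum((1 - pr_equal(s, t)) if s == t == 1 else pr_equal(s, t)
          for s in (0, 1) for t in (0, 1)) / 4
```

This yields win = cos²(π/8) ≈ 0.8536, beating the classical bound of 0.75.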
The Mermin game. This nonlocal game is based on a Bell inequality violation found by David
Mermin, which in turn is based on work by him and Asher Peres. In this game, the referee chooses
values s, t ∈ {0, 1, 2} uniformly at random and independently, so that each pair (s, t) is chosen with
probability 1/9. Then, as with the CHSH game above, Alice and Bob produce values a and b in
{+1, −1}, respectively, but now Alice and Bob win if and only if ab = (−1)^{δst}, where δst = 1 if s = t
and δst = 0 otherwise. In words, if s = t, then Alice and Bob win iff a ≠ b, and if s ≠ t, they win
iff a = b. We will show that using any classical strategy, Alice and Bob can win with probability
no greater than 7/9 ≈ 0.778, but using a quantum strategy, they can win with probability 5/6 ≈ 0.833.
First, the limits of any classical strategy. As we noted in the discussion of the CHSH game
above, we need only consider pure (deterministic) strategies for Alice and Bob. Any such pure
strategy consists of two functions A, B : {0, 1, 2} → {+1, −1}, one used by Alice and the other used
by Bob. After Alice receives s, she outputs a := A(s), and similarly, Bob outputs b := B(t) upon
receiving t.
The winning value of ab for every (s, t)-combination is given by the following 3 × 3 matrix:

    W := ((−1)^{δst}) = [ −  +  + ]
                        [ +  −  + ]
                        [ +  +  − ],

where we again use “+” to mean +1 and “−” to mean −1. For each choice of A and B, the 3 × 3
matrix giving the ab-values is

    M(A, B) := [ A(0) ]
               [ A(1) ] [ B(0)  B(1)  B(2) ].
               [ A(2) ]
For a given A and B, the winning probability is 1/9 times the number of entries of M(A, B) that equal
the corresponding entries of W. There are 2³ = 8 choices for each of A and B, making 64 matrices
M(A, B) in all. Rather than making an exhaustive table as we did for the CHSH game, we note that
M(A, B) always has rank 1, whereas it is easily checked that W has rank 3. Thus M(A, B) ≠ W for
all A and B, and furthermore, changing any single entry of W still leaves two linearly independent
columns (at least), resulting in a matrix with rank ≥ 2. Thus M(A, B) must differ from W in at least
two places, giving a winning probability of at most 1 − 2/9 = 7/9. (Alice and Bob can achieve this
probability by letting A be any nonconstant function and letting B := −A.)
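The bound of 7/9, and the fact that it is attained, can again be verified by exhaustive search over the 64 pure strategies; a Python sketch (not part of the notes):

```python
from itertools import product

# Each function {0,1,2} -> {+1,-1} is encoded as the triple (f(0), f(1), f(2)).
FUNCS = list(product([+1, -1], repeat=3))

def mermin_win_prob(A, B):
    """On input (s, t), they win iff A[s]*B[t] equals -1 when s == t and +1 otherwise."""
    wins = sum(1 for s in range(3) for t in range(3)
               if A[s] * B[t] == (-1 if s == t else 1))
    return wins / 9  # each (s, t) occurs with probability 1/9

best = max(mermin_win_prob(A, B) for A in FUNCS for B in FUNCS)
```

The maximum is exactly 7/9, achieved for example by A = (+1, +1, −1) and B = −A.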
For the quantum strategy, Alice and Bob share a pair of qubits in the Bell state

    |χ⟩ := |Ψ−⟩ = (|01⟩ − |10⟩)/√2.

Again, think of Alice and Bob each having a spin-1/2 particle. After receiving s ∈ {0, 1, 2} from the
referee, Alice projectively measures her spin using the csop
    Alice:  { P↑(2πs/3), P↓(2πs/3) },

where P↑(θ) and P↓(θ) are defined by Equations (131) and (132) above for all θ ∈ R. Bob makes a
similar measurement, except based on t:

    Bob:  { P↑(2πt/3), P↓(2πt/3) }.
Here are the three possible spin directions for each of Alice's and Bob's measurements:

[Figure: the three measurement directions, at angles 0, 2π/3, and 4π/3 from the upward direction, shown for Alice (indexed by s) and for Bob (indexed by t).]
If Alice sees spin-up (↑), then she outputs a := +1; if spin-down (↓), then she outputs a := −1.
Likewise, if Bob sees spin-up (↑), he outputs b := +1; if spin-down (↓), he outputs b := −1.
Generally, if Alice measures using angle θ and Bob measures using angle ϕ, then the probability
of getting the same outcome is

    Pr[a = b] = Pr[↑↑] + Pr[↓↓] = ⟨χ|(P↑(θ) ⊗ P↑(ϕ))|χ⟩ + ⟨χ|(P↓(θ) ⊗ P↓(ϕ))|χ⟩ = sin²((θ − ϕ)/2),

where verifying the last equation is left as an exercise (Exercise 29.30, below). If s = t, then Alice
and Bob measure their spins in the same direction, giving Pr[a = b] = sin² 0 = 0, hence a ≠ b with
certainty. If s ≠ t, they measure their spins in directions differing by an angle of 2π/3 (in either
direction), and thus Pr[a = b] = sin²(π/3) = 3/4 in this case. Therefore, the winning probability is

    Pr[win] = (3/9)·1 + (6/9)·(3/4) = 1/3 + 1/2 = 5/6.
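As with CHSH, the quantum value 5/6 can be checked numerically from the formula Pr[a = b] = sin²((θ − ϕ)/2); a brief Python sketch (illustrative only):

```python
import math

def pr_equal(s, t):
    # Pr[a = b] = sin^2((theta - phi)/2) with theta = 2*pi*s/3 and phi = 2*pi*t/3
    return math.sin(math.pi * (s - t) / 3) ** 2

# Win iff a != b when s = t, and iff a = b when s != t; each pair has probability 1/9.
win = sum((1 - pr_equal(s, t)) if s == t else pr_equal(s, t)
          for s in range(3) for t in range(3)) / 9
```

This gives win = 5/6 ≈ 0.8333, beating the classical bound of 7/9 ≈ 0.778.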
A Final Exam
I) (A linear algebraic inequality) Suppose that A is any operator. Show that A ≥ 0 if and only if
tr(PA) ≥ 0 for all projectors P. EXTRA CREDIT: Show that A ≥ 0 if and only if tr(PA) ≤ tr A
for all projectors P. [Hint: The extra credit statement is a corollary of the previous statement.]
II) (A circuit identity) Look at Bob’s phase-error recovery circuit for the Shor code in Figure 16.
Show that the following alternative circuit does exactly the same thing:
[Circuit diagram: two ancillas, each prepared as |0⟩, passed through H gates, coupled to the code block, passed through H again, and measured to give syndrome bits b1 and b2; the corrections shown are conditioned on b1 ∧ ¬b2, b1 ∧ b2, and ¬b1 ∧ b2.]
Find a similar alternative for Bob’s phase-error recovery circuit in Figure 14.
III) (a) Show that if V is any unitary operator, then there exists a (not necessarily unique)
unitary U such that U² = V. [Hint: All unitary operators are normal.]
(b) Find a two-qubit unitary U such that U² = SWAP. The U that you find should fix the
vectors |00⟩ and |11⟩.
This U is sometimes written as √SWAP. It can be shown that √SWAP, among many other
two-qubit gates, is (by itself) universal for quantum computation. Also, there is currently
some hope of implementing it flexibly using superconducting Josephson junctions.
IV) (Generalized Pauli gates and the QFT) For n > 0, let Xn and Zn be n-qubit unitary operators
such that, for all x ∈ Z_{2^n},

    Xn|x⟩ = |(x + 1) mod 2^n⟩    and    Zn|x⟩ = en(x)|x⟩,

recalling that en(x) := exp(2πix/2^n). Xn and Zn are n-qubit generalizations of the Pauli X
and Z gates, respectively.
(a) What are Xn* Zn Xn and Zn* Xn Zn? (Just show how each behaves on |x⟩ for x ∈ Z_{2^n}.)
(b) Draw an n-qubit quantum circuit that implements Zn using only single-qubit condi-
tional phase-shift gates P(θ) for various θ.
(c) Show that Xn and Zn are unitarily conjugate via QFTn.
(d) What are the eigenvalues and eigenvectors of Xn?
V) (The Schmidt Decomposition) You may either take the following on faith or read a proof of it in
the textbook. (The Schmidt Decomposition is actually just the Singular Value Decomposition
(Theorem B.9 in Section B.3) in disguise.)
Theorem A.1 (Schmidt Decomposition) Let H and J be Hilbert spaces, and let |ψ⟩ ∈ H ⊗ J be
any unit vector. There exists an integer k > 0, pairwise orthogonal unit vectors |e1⟩, . . . , |ek⟩ ∈ H
and |f1⟩, . . . , |fk⟩ ∈ J, and positive values λ1 ≥ · · · ≥ λk > 0 such that Σ_{j=1}^k λj² = 1 and

    |ψ⟩ = Σ_{j=1}^k λj (|ej⟩ ⊗ |fj⟩).    (133)

The vectors |e1⟩, . . . , |ek⟩ and |f1⟩, . . . , |fk⟩ are known collectively as a Schmidt basis for |ψ⟩,
although they may not span their respective spaces. The λj are called (the) Schmidt coefficients
for |ψ⟩, and k is called the Schmidt number of |ψ⟩.
(a) Give full Schmidt decompositions for the Bell states |Φ+⟩ := (|00⟩ + |11⟩)/√2 and
VI) (Logical Pauli gates for the Shor code) Recall the nine-qubit Shor code defined by Equa-
tions (108) and (109).
(a) Show that the operator Z1 Z2 Z3 Z4 Z5 Z6 Z7 Z8 Z9 (i.e., a Pauli Z gate applied to each of
the nine qubits) implements the logical Pauli X gate XS, such that XS|0S⟩ = |1S⟩ and
XS|1S⟩ = |0S⟩.
(b) Find an operator that implements the logical Pauli Z gate ZS, such that ZS|0S⟩ = |0S⟩
and ZS|1S⟩ = −|1S⟩.
B Background Results
Abstract
These results are background to the course CSCE 790S/CSCE 790B, Quantum Computation
and Information (Spring 2007 and Fall 2011). Each result, or group of related results, is roughly
one page long.
Theorem B.1 (Cauchy–Schwarz Inequality) For all real numbers a1, . . . , an and b1, . . . , bn,

    |Σ_{i=1}^n ai bi| ≤ (Σ_{i=1}^n ai²)^{1/2} (Σ_{i=1}^n bi²)^{1/2},    (134)

with equality holding iff the two vectors (a1, . . . , an) and (b1, . . . , bn) are linearly dependent.
Proof. There are many, many ways of proving this. Here is a direct calculation. We have
    0 ≤ Σ_{1≤i<j≤n} (ai bj − aj bi)²
      = Σ_{i<j} [ai bj (ai bj − aj bi) − aj bi (ai bj − aj bi)]
      = Σ_{i<j} [ai bj (ai bj − aj bi) + aj bi (aj bi − ai bj)]
      = Σ_{i<j} ai bj (ai bj − aj bi) + Σ_{i<j} aj bi (aj bi − ai bj)
      = Σ_{i<j} ai bj (ai bj − aj bi) + Σ_{j<i} ai bj (ai bj − aj bi)
      = Σ_{i≠j} ai bj (ai bj − aj bi)
      = Σ_{i,j} ai bj (ai bj − aj bi)
      = Σ_{i,j} ai² bj² − Σ_{i,j} ai bi aj bj
      = (Σ_{i=1}^n ai²)(Σ_{j=1}^n bj²) − (Σ_{i=1}^n ai bi)².
Adding (Σ_i ai bi)² to both sides then taking the square root of both sides (noting that the square
root function is strictly monotone increasing) yields the inequality (134). Clearly, equality holds
above iff ai bj − aj bi = 0 for all i < j, or equivalently, ai bj = aj bi for all i < j. It is not hard to
check that this condition is equivalent to (a1, . . . , an) and (b1, . . . , bn) being linearly dependent. 2
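A quick numeric illustration of (134) and its equality condition, as a Python sketch (not part of the notes):

```python
import math
import random

def cs_gap(a, b):
    """Right-hand side of (134) minus the left; nonnegative by Cauchy-Schwarz."""
    lhs = abs(sum(x * y for x, y in zip(a, b)))
    rhs = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return rhs - lhs

random.seed(0)
a = [random.uniform(-1, 1) for _ in range(5)]
b = [random.uniform(-1, 1) for _ in range(5)]
gap_generic = cs_gap(a, b)                     # strictly positive for generic vectors
gap_dependent = cs_gap(a, [3 * x for x in a])  # zero when b = 3a (linear dependence)
```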
Note that (134) still holds if we remove the absolute value delimiters from the left-hand side.
In that case, equality holds iff there exists a λ ≥ 0 such that either (a1, . . . , an) = λ(b1, . . . , bn) or
(b1, . . . , bn) = λ(a1, . . . , an).
Corollary B.2 (Triangle Inequality for Complex Numbers) For any z, w ∈ C, |z + w| ≤ |z| + |w|.

Proof. We have

    |z + w|² = |z|² + 2 Re(z w̄) + |w|² ≤ |z|² + 2|z||w| + |w|² = (|z| + |w|)²,

where Re(z w̄) ≤ |z||w| follows from (134) applied to the vectors (Re z, Im z) and (Re w, Im w).
Taking square roots gives the result. 2
Theorem B.5 (Schur Triangular Form) For every n × n matrix M, there exists a unitary U and an
upper triangular T (both n × n matrices) such that M = UT U∗ .
Proof. We prove this by induction on n. The n = 1 case is trivial. Now supposing the theorem
holds for n ≥ 1, we prove it holds for n + 1. Let M be any (n + 1) × (n + 1) matrix. We let A be the
linear operator on Cn+1 whose matrix is M with respect to some orthonormal basis. A has some
eigenvalue λ with corresponding unit eigenvector v. Using the Gram-Schmidt procedure, we can
find an orthonormal basis {y1 , . . . , yn+1 } for Cn+1 such that y1 = v. With respect to this basis, the
matrix for A looks like

    N = [ λ   w* ]
        [ 0   N′ ],

where w is some vector in Cⁿ and N′ is an n × n matrix. Since M and N represent the same
operator with respect to different orthonormal bases, they must be unitarily conjugate, i.e., there is
a unitary V such that M = VNV*. N′ is an n × n matrix, so we apply the inductive hypothesis to
get a unitary W′ and an upper triangular T′ (both n × n matrices) such that N′ = W′T′W′*. Now
we can factor N:
    N = [ λ   w*      ] = [ 1   0  ] [ λ   w*W′ ] [ 1   0   ] = W T W*,
        [ 0   W′T′W′* ]   [ 0   W′ ] [ 0   T′   ] [ 0   W′* ]

where

    W = [ 1   0  ]   and   T = [ λ   w*W′ ]
        [ 0   W′ ]             [ 0   T′   ].
T is clearly upper triangular, and it's easily checked that WW* = I, using the fact that W′ is unitary.
Thus W is unitary, and we get M = VNV* = VWTW*V* = UTU*, where U = VW is unitary. 2
A Schur basis for an operator A is an orthonormal basis that gives an upper triangular matrix
for A.
Theorem B.6 If an n × n matrix A is both upper triangular and normal, then A is diagonal.
Proof. Suppose that A is upper triangular and normal, but not diagonal. Then there is some i < j
such that [A]ij ≠ 0. Let j be least such that there exists i < j such that [A]ij ≠ 0. For this i and j, we
get

    [AA*]ii = Σ_{k=1}^n [A]ik [A*]ki = Σ_{k=1}^n [A]ik [A]ik* = Σ_{k=1}^n |[A]ik|² ≥ |[A]ii|² + |[A]ij|² > |[A]ii|².

On the other hand, [A]ki = 0 for k > i by upper-triangularity, and [A]ki = 0 for k < i by the
minimality of j (since i < j). Hence [A*A]ii = Σ_{k=1}^n |[A]ki|² = |[A]ii|² < [AA*]ii, contradicting
the normality of A. 2
Corollary B.7 (Spectral Theorem for Normal Operators) Every normal matrix is unitarily conjugate
to a diagonal matrix. Equivalently, every normal operator has an orthonormal eigenbasis.
B.3 The Polar and Singular Value Decompositions
Theorem B.8 (Polar Decomposition) For every n × n matrix A there is an n × n unitary matrix U
and a unique n × n matrix H such that H ≥ 0 and A = UH. In fact, H = |A|.
Now existence. Let {e1, . . . , en} be the standard orthonormal basis for Cⁿ. We first prove the
special case where |A| is the diagonal matrix diag(s1, s2, . . . , sn) for some real values s1 ≥ s2 ≥
· · · ≥ sn ≥ 0. Let 0 ≤ k ≤ n be largest such that sk > 0 (k = 0 if |A| = 0). Thus we have

    |A| = [ D   0 ]
          [ 0   0 ],

where D is the k × k nonsingular matrix diag(s1, . . . , sk). If j > k, then |A|ej = 0, and thus
0 = ‖|A|ej‖² = ⟨ej, |A|²ej⟩ = ⟨ej, A*Aej⟩ = ⟨Aej, Aej⟩ = ‖Aej‖², and so
Aej = 0. This means that A = [B 0], where B is some n × k matrix, and the last n − k columns
of A are 0. We have

    [ B*B  0 ] = [B 0]* [B 0] = A*A = |A|² = [ D²  0 ]
    [ 0    0 ]                               [ 0   0 ],
Now for the general case. Since |A| ≥ 0 (and hence normal), there is a unitary V such that
V|A|V* = diag(s1, . . . , sn) for some real values s1 ≥ · · · ≥ sn ≥ 0. Since

    V|A|V* = V√(A*A)V* = √(VA*AV*) = √((VAV*)*(VAV*)) = |VAV*|,

we see that VAV* satisfies the special case, above, and so there is a unitary U such that VAV* =
U|VAV*| = UV|A|V*. It follows that A = V*(VAV*)V = (V*UV)|A|, where V*UV is unitary. 2
Theorem B.9 (Singular Value Decomposition) For any n × n matrix A there exist unique real values
s1 ≥ s2 ≥ · · · ≥ sn ≥ 0 such that there exist n × n unitary matrices V, W with A = VDW, where
D = diag(s1, . . . , sn). Furthermore, s1, . . . , sn are the eigenvalues of |A|.
and so the diagonal entries of D must be the eigenvalues of |A|. For existence, the Polar Decom-
position gives a unitary U such that A = U|A|. Since |A| ≥ 0 (and hence is normal), there exists
a unitary Y such that |A| = YDY*, where D = diag(s1, . . . , sn) for some s1 ≥ · · · ≥ sn ≥ 0. Then
A = U|A| = UYDY*. Setting V := UY and W := Y* proves the theorem. 2
Proof. We start with an integral approximation. The theorem clearly holds for n = 1, so assume
n ≥ 2. Since the log function is concave downward, we claim that for all i such that 2 ≤ i ≤ n,

    (log i + log(i − 1))/2 ≤ ∫_{i−1}^{i} log x dx ≤ log i − 1/(2i).    (138)
The left-hand side is the area of the trapezoid T1 formed by the points (i − 1, 0), (i, 0), (i, log i), (i −
1, log(i − 1)), and the right-hand side is the area of the trapezoid T2 formed by the points (i −
1, 0), (i, 0), (i, log i), (i − 1, log i − 1/i). Note that T2 ’s upper edge is the tangent line to the curve
y = log x at the point (i, log i). By concavity of log, the region under the curve y = log x in the
interval [i − 1, i] contains T1 and is contained in T2 , hence the inequalities (138).
Now note that log(n!) = Σ_{i=1}^n log i = Σ_{i=2}^n log i. Summing (138) from i = 2 to n and
simplifying, we get

    log(n!) − (log n)/2 ≤ ∫_1^n log x dx = n log n − n + 1 ≤ log(n!) − (1/2) Σ_{i=2}^n 1/i,    (139)
using the closed form ∫ log x dx = x log x − x + C. The sum on the right-hand side of (139) is the
Harmonic series, which satisfies another integral approximation:

    Σ_{i=2}^n 1/i ≥ ∫_2^n dx/x = log n − log 2.    (140)
as desired. 2
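The bounds (139) and (140) are easy to confirm numerically for a moderate n; the following Python sketch (illustrative only) computes each side directly:

```python
import math

n = 50
log_fact = sum(math.log(i) for i in range(2, n + 1))  # log(n!)
integral = n * math.log(n) - n + 1                    # the integral of log x over [1, n]
harmonic_tail = sum(1 / i for i in range(2, n + 1))   # sum_{i=2}^n 1/i

lower = log_fact - math.log(n) / 2    # left-hand side of (139)
upper = log_fact - harmonic_tail / 2  # right-hand side of (139)
# By (139): lower <= integral <= upper; by (140): harmonic_tail >= log n - log 2.
```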
We only consider random variables that are real-valued and over discrete sample spaces. If X is
such a random variable, then we let E[X] and var[X] respectively denote the expected value (mean)
of X and the variance of X.
Theorem B.12 (Markov's Inequality) Let X be a random variable with finite mean, and suppose X ≥ 0.
For every real c > 0,

    Pr[X ≥ c] ≤ E[X]/c.
Theorem B.13 (Chebyshev's Inequality) Let X be a random variable with finite mean µ and variance σ²,
and let a > 0 be real. Then

    Pr[ |X − µ| ≥ a ] ≤ σ²/a².

Proof. We invoke Markov's Inequality with the random variable Y = (X − µ)², letting c = a².
Note that Y ≥ 0, E[Y] = σ², and Pr[ |X − µ| ≥ a ] = Pr[Y ≥ a²]. 2
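Both inequalities can be checked exactly on a small discrete example; here is a Python sketch (not part of the notes) for one roll of a fair six-sided die:

```python
# Exact check of Markov's and Chebyshev's inequalities for a fair die roll.
outcomes = [1, 2, 3, 4, 5, 6]
p = 1 / 6
mean = sum(x * p for x in outcomes)               # E[X] = 3.5
var = sum((x - mean) ** 2 * p for x in outcomes)  # var[X] = 35/12

c = 5.0
pr_ge_c = sum(p for x in outcomes if x >= c)      # Pr[X >= 5] = 1/3
markov_bound = mean / c                           # E[X]/c = 0.7

a = 2.0
pr_dev = sum(p for x in outcomes if abs(x - mean) >= a)  # Pr[|X - mean| >= 2] = 1/3
cheb_bound = var / a ** 2                                # variance / a^2 = 35/48
```

Both computed probabilities fall below their respective bounds, as the theorems guarantee.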
Let p = (p1, p2, . . .) and q = (q1, q2, . . .) be two probability distributions over some (finite or
infinite) discrete sample space {1, 2, . . .}. The relative entropy of q with respect to p is defined as

    D(p ‖ q) = Σ_i pi lg(pi/qi),    (142)

where the sum is taken over all i such that pi > 0. If qi = 0 and pi > 0 for some i, then
D(p ‖ q) = ∞. Otherwise, the sum in (142) may or may not converge, but we always have the
following regardless:
Theorem B.14 D(p ‖ q) ≥ 0.

Proof. We use the fact that log x ≤ x − 1 for all x > 0, with equality holding iff x = 1. We have
    D(p ‖ q) = Σ_i pi lg(pi/qi)
             = −Σ_i pi lg(qi/pi)
             = −(1/log 2) Σ_i pi log(qi/pi)
             ≥ −(1/log 2) Σ_i pi (qi/pi − 1)
             = (1/log 2) Σ_i (pi − qi)
             = (1/log 2) (1 − Σ_i qi)
             ≥ 0. 2
If (p, 1 − p) and (q, 1 − q) are binary distributions, then we abbreviate D((p, 1 − p) ‖ (q, 1 − q)) by
d(p ‖ q), and we call d(· ‖ ·) the binary relative entropy function. Note that by (143), d(p ‖ 1/2) =
1 − h(p).
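The nonnegativity of D(p ‖ q) and the identity d(p ‖ 1/2) = 1 − h(p) are easy to spot-check; a Python sketch (illustrative only):

```python
import math

def D(p, q):
    """Relative entropy D(p || q) in bits, as in (142); sum over i with p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def h(x):
    """Binary entropy h(x) = -x lg x - (1 - x) lg(1 - x)."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

p, q = [0.5, 0.3, 0.2], [0.2, 0.5, 0.3]
positive = D(p, q)                  # strictly positive since p != q
zero = D(p, p)                      # zero when the distributions coincide
binary = D([0.3, 0.7], [0.5, 0.5])  # d(0.3 || 1/2), which should equal 1 - h(0.3)
```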
It might be necessary to read Section B.6 before this one. In this section we give an upper bound
on the left tail of the cumulative distribution function for the binomial distribution.

Let 0 < p < 1 and let n > 0 be an integer. In this section, we give an upper bound for the
sum Σ_{i=0}^t (n choose i) p^i (1 − p)^{n−i}, where t ≤ pn. [For example, this sum is the probability of getting at
most t heads among n flips of a p-biased coin (i.e., n identical Bernoulli trials with bias p). The
expected number of heads among n flips is pn, and we want to show that the probability of getting
significantly fewer than pn heads diminishes exponentially with n.]
Theorem B.15 Let n be a positive integer. Let 0 < p < 1 be arbitrary, and set q = 1 − p. If t is an integer
such that 0 ≤ t ≤ pn, then

    Σ_{i=0}^t (n choose i) p^i q^{n−i} ≤ 2^{−n·d(t/n ‖ p)}.    (144)
Proof. If t = 0, then d(t/n ‖ p) = d(0 ‖ p) = −lg q, and so both sides of (144) equal qⁿ and the
inequality is satisfied.
Now suppose 0 < t ≤ pn. Set λ = t/n, and let µ = 1 − λ. Note that 0 < λ ≤ p < 1 and
0 < q ≤ µ < 1. Define

    C = p^t q^{n−t} / (λ^t µ^{n−t}).
For any 0 ≤ i ≤ t, we have

    p^i q^{n−i} = C (q/p)^{t−i} λ^t µ^{n−t} ≤ C (µ/λ)^{t−i} λ^t µ^{n−t} = C λ^i µ^{n−i}.
    Σ_{i=0}^t (n choose i) p^i q^{n−i} ≤ C Σ_{i=0}^t (n choose i) λ^i µ^{n−i} ≤ C Σ_{i=0}^n (n choose i) λ^i µ^{n−i} = C (λ + µ)^n = C.