Preskill Quantum computing
can be implemented in an ion trap with altogether 5 laser pulses. The conditional excitation of a phonon, Eq. (1.35), has been demonstrated experimentally, for a single trapped ion, by the NIST group.
One big drawback of the ion trap computer is that it is an intrinsically slow device. Its speed is ultimately limited by the energy-time uncertainty relation. Since the uncertainty in the energy of the laser photons should be small compared to the characteristic vibrational splitting ν, each laser pulse should last a time long compared to ν⁻¹. In practice, ν is likely to be of order 100 kHz.
1.9.3 NMR
A third (dark horse) hardware scheme has sprung up in the past year, and has leapfrogged over the ion trap and cavity QED to take the current lead in coherent quantum processing. The new scheme uses nuclear magnetic resonance (NMR) technology. Now qubits are carried by certain nuclear spins in a particular molecule. Each spin can either be aligned (|↑⟩ = |0⟩) or antialigned (|↓⟩ = |1⟩) with an applied constant magnetic field. The spins take a long time to relax or decohere, so the qubits can be stored for a reasonable time.
We can also turn on a pulsed rotating magnetic field with frequency ω (where ω is the energy splitting between the spin-up and spin-down states), and induce Rabi oscillations of the spin. By timing the pulse suitably, we can perform a desired unitary transformation on a single spin (just as in our discussion of the ion trap). All the spins in the molecule are exposed to the rotating magnetic field, but only those on resonance respond.
Furthermore, the spins have dipole-dipole interactions, and this coupling can be exploited to perform a gate. The splitting between |↑⟩ and |↓⟩ for one spin actually depends on the state of neighboring spins. So whether a driving pulse is on resonance to tip the spin over is conditioned on the state of another spin.
All this has been known to chemists for decades. Yet it was only in the past year that Gershenfeld and Chuang, and independently Cory, Fahmy, and Havel, pointed out that NMR provides a useful implementation of quantum computation. This was not obvious for several reasons. Most importantly, NMR systems are very hot. The typical temperature of the spins (room temperature, say) might be of order a million times larger than the energy splitting between |0⟩ and |1⟩. This means that the quantum state of our computer (the spins in a single molecule) is very noisy -- it is subject to strong random thermal fluctuations. This noise will disguise the quantum information. Furthermore, we actually perform our processing not on a single molecule, but on a macroscopic sample containing of order 10²³ "computers," and the signal we read out of this device is actually averaged over this ensemble. But quantum algorithms are probabilistic, because of the randomness of quantum measurement. Hence averaging over the ensemble is not equivalent to running the computation on a single device; averaging may obscure the results.
Gershenfeld and Chuang, and Cory, Fahmy, and Havel, explained how to overcome these difficulties. They described how "effective pure states" can be prepared, manipulated, and monitored by performing suitable operations on the thermal ensemble. The idea is to arrange for the fluctuating properties of the molecule to average out when the signal is detected, so that only the underlying coherent properties are measured. They also pointed out that some quantum algorithms (including Shor's factoring algorithm) can be cast in a deterministic form (so that at least a large fraction of the computers give the same answer); then averaging over many computations will not spoil the result.
Quite recently, NMR methods have been used to prepare a maximally entangled state of three qubits, which had never been achieved before.
Clearly, quantum computing hardware is in its infancy. Existing hardware
will need to be scaled up by many orders of magnitude (both in the number of
stored qubits, and the number of gates that can be applied) before ambitious
computations can be attempted. In the case of the NMR method, there is
a particularly serious limitation that arises as a matter of principle, because
the ratio of the coherent signal to the background declines exponentially with
the number of spins per molecule. In practice, it will be very challenging to
perform an NMR quantum computation with more than of order 10 qubits.
Probably, if quantum computers are eventually to become practical devices, new ideas about how to construct quantum hardware will be needed.
1.10 Summary
This concludes our introductory overview to quantum computation. We
have seen that three converging factors have combined to make this subject
exciting.
1. Quantum computers can solve hard problems. It seems that a new classification of complexity has been erected, a classification better founded on the fundamental laws of physics than traditional complexity theory. (But it remains to characterize more precisely the class of problems for which quantum computers have a big advantage over classical computers.)
2. Quantum errors can be corrected. With suitable coding methods, we can protect a complicated quantum system from the debilitating effects of decoherence. We may never see an actual cat that is half dead and half alive, but perhaps we can prepare and preserve an encoded cat that is half dead and half alive.
3. Quantum hardware can be constructed. We are privileged to be
witnessing the dawn of the age of coherent manipulation of quantum
information in the laboratory.
Our aim, in this course, will be to deepen our understanding of points
(1), (2), and (3).
Chapter 2
Foundations I: States and
Ensembles
2.1 Axioms of quantum mechanics
For a few lectures I have been talking about quantum this and that, but I have never defined what quantum theory is. It is time to correct that omission.
Quantum theory is a mathematical model of the physical world. To characterize the model, we need to specify how it will represent: states, observables, measurements, dynamics.
1. States. A state is a complete description of a physical system. In
quantum mechanics, a state is a ray in a Hilbert space.
What is a Hilbert space?
a) It is a vector space over the complex numbers C. Vectors will be denoted |ψ⟩ (Dirac's ket notation).
b) It has an inner product ⟨ψ|φ⟩ that maps an ordered pair of vectors to C, defined by the properties
(i) Positivity: ⟨ψ|ψ⟩ > 0 for |ψ⟩ ≠ 0
(ii) Linearity: ⟨φ|(a|ψ₁⟩ + b|ψ₂⟩) = a⟨φ|ψ₁⟩ + b⟨φ|ψ₂⟩
(iii) Skew symmetry: ⟨φ|ψ⟩ = ⟨ψ|φ⟩*
c) It is complete in the norm ‖ψ‖ = ⟨ψ|ψ⟩^(1/2)
(Completeness is an important proviso in infinite-dimensional function spaces, since it will ensure the convergence of certain eigenfunction expansions -- e.g., Fourier analysis. But mostly we'll be content to work with finite-dimensional inner product spaces.)
What is a ray? It is an equivalence class of vectors that differ by multiplication by a nonzero complex scalar. We can choose a representative of this class (for any nonvanishing vector) to have unit norm
⟨ψ|ψ⟩ = 1.    (2.1)
We will also say that |ψ⟩ and e^{iα}|ψ⟩ describe the same physical state, where |e^{iα}| = 1.
(Note that every ray corresponds to a possible state, so that given two states |φ⟩, |ψ⟩, we can form another as a|φ⟩ + b|ψ⟩ (the "superposition principle"). The relative phase in this superposition is physically significant; we identify a|φ⟩ + b|ψ⟩ with e^{iα}(a|φ⟩ + b|ψ⟩) but not with a|φ⟩ + e^{iα}b|ψ⟩.)
2. Observables. An observable is a property of a physical system that in principle can be measured. In quantum mechanics, an observable is a self-adjoint operator. An operator is a linear map taking vectors to vectors,
A : |ψ⟩ → A|ψ⟩,   A(a|ψ₁⟩ + b|ψ₂⟩) = aA|ψ₁⟩ + bA|ψ₂⟩.    (2.2)
The adjoint of the operator A is defined by
⟨φ|Aψ⟩ = ⟨A†φ|ψ⟩,    (2.3)
for all vectors |φ⟩, |ψ⟩ (where here I have denoted A|ψ⟩ as |Aψ⟩). A is self-adjoint if A = A†.
If A and B are self-adjoint, then so is A + B (because (A + B)† = A† + B†), but (AB)† = B†A†, so AB is self-adjoint only if A and B commute. Note that AB + BA and i(AB − BA) are always self-adjoint if A and B are.
A self-adjoint operator in a Hilbert space H has a spectral representation -- its eigenstates form a complete orthonormal basis in H. We can express a self-adjoint operator A as
A = Σ_n a_n P_n.    (2.4)
Here each a_n is an eigenvalue of A, and P_n is the corresponding orthogonal projection onto the space of eigenvectors with eigenvalue a_n. (If a_n is nondegenerate, then P_n = |n⟩⟨n|; it is the projection onto the corresponding eigenvector.) The P_n's satisfy
P_n P_m = δ_{nm} P_n,   P_n† = P_n.    (2.5)
(For unbounded operators in an infinite-dimensional space, the definition of self-adjoint and the statement of the spectral theorem are more subtle, but this need not concern us.)
3. Measurement. In quantum mechanics, the numerical outcome of a measurement of the observable A is an eigenvalue of A; right after the measurement, the quantum state is an eigenstate of A with the measured eigenvalue. If the quantum state just prior to the measurement is |ψ⟩, then the outcome a_n is obtained with probability
Prob(a_n) = ‖P_n|ψ⟩‖² = ⟨ψ|P_n|ψ⟩.    (2.6)
If the outcome a_n is attained, then the (normalized) quantum state becomes
P_n|ψ⟩ / (⟨ψ|P_n|ψ⟩)^{1/2}.    (2.7)
(Note that if the measurement is immediately repeated, then according to this rule the same outcome is attained again, with probability one.)
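To make the measurement axiom concrete, here is a minimal numerical sketch (my addition, not part of the notes) of eqs. (2.6) and (2.7) for a single qubit in Python/numpy; the observable σ₃ and the state are arbitrary choices for illustration.

```python
import numpy as np

# Observable A = sum_n a_n P_n on a qubit: take A = sigma_3, with projectors
# P_up = |0><0|, P_down = |1><1| and eigenvalues +1, -1.
P = [np.array([[1, 0], [0, 0]]), np.array([[0, 0], [0, 1]])]
a = [+1, -1]

# An arbitrary normalized state |psi> = a|0> + b|1> (illustrative choice).
psi = np.array([0.6, 0.8j])

for an, Pn in zip(a, P):
    prob = np.vdot(psi, Pn @ psi).real                 # Prob(a_n) = <psi|P_n|psi>, eq. (2.6)
    post = Pn @ psi / np.sqrt(prob)                    # collapsed state, eq. (2.7)
    print(f"outcome {an:+d}: probability {prob:.2f}, post-measurement state {post}")
```

Running this prints probabilities 0.36 and 0.64, and the post-measurement state is the corresponding eigenvector, consistent with the repeatability remark above.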
4. Dynamics. Time evolution of a quantum state is unitary; it is generated by a self-adjoint operator, called the Hamiltonian of the system. In the Schrödinger picture of dynamics, the vector describing the system moves in time as governed by the Schrödinger equation
(d/dt)|ψ(t)⟩ = −iH|ψ(t)⟩,    (2.8)
where H is the Hamiltonian. We may reexpress this equation, to first order in the infinitesimal quantity dt, as
|ψ(t + dt)⟩ = (1 − iH dt)|ψ(t)⟩.    (2.9)
The operator U(dt) ≡ 1 − iH dt is unitary; because H is self-adjoint it satisfies U†U = 1 to linear order in dt. Since a product of unitary operators is unitary, time evolution over a finite interval is also unitary,
|ψ(t)⟩ = U(t)|ψ(0)⟩.    (2.10)
In the case where H is t-independent, we may write U = e^{−itH}.
This completes the mathematical formulation of quantum mechanics. We immediately notice some curious features. One oddity is that the Schrödinger equation is linear, while we are accustomed to nonlinear dynamical equations in classical physics. This property seems to beg for an explanation. But far more curious is the mysterious dualism; there are two quite distinct ways for a quantum state to change. On the one hand there is unitary evolution, which is deterministic. If we specify |ψ(0)⟩, the theory predicts the state |ψ(t)⟩ at a later time.
But on the other hand there is measurement, which is probabilistic. The theory does not make definite predictions about the measurement outcomes; it only assigns probabilities to the various alternatives. This is troubling, because it is unclear why the measurement process should be governed by different physical laws than other processes.
Beginning students of quantum mechanics, when first exposed to these rules, are often told not to ask "why?" There is much wisdom in this advice. But I believe that it can be useful to ask why. In future lectures, we will return to this disconcerting dualism between unitary evolution and measurement, and will seek a resolution.
2.2.1 Spin-1/2
First of all, the coefficients a and b in eq. (2.11) encode more than just the probabilities of the outcomes of a measurement in the {|0⟩, |1⟩} basis. In particular, the relative phase of a and b also has physical significance.
For a physicist, it is natural to interpret eq. (2.11) as the spin state of an object with spin-1/2 (like an electron). Then |0⟩ and |1⟩ are the spin up (|↑⟩) and spin down (|↓⟩) states along a particular axis such as the z-axis. The two real numbers characterizing the qubit (the complex numbers a and b, modulo the normalization and overall phase) describe the orientation of the spin in three-dimensional space (the polar angle θ and the azimuthal angle φ).
We cannot go deeply here into the theory of symmetry in quantum mechanics, but we will briefly recall some elements of the theory that will prove useful to us. A symmetry is a transformation that acts on a state of a system,
yet leaves all observable properties of the system unchanged. In quantum mechanics, observations are measurements of self-adjoint operators. If A is measured in the state |ψ⟩, then the outcome |a⟩ (an eigenvector of A) occurs with probability |⟨a|ψ⟩|². A symmetry should leave these probabilities unchanged (when we "rotate" both the system and the apparatus).
A symmetry, then, is a mapping of vectors in Hilbert space
|ψ⟩ → |ψ′⟩,    (2.12)
that preserves the absolute values of inner products
|⟨φ|ψ⟩| = |⟨φ′|ψ′⟩|,    (2.13)
for all |φ⟩ and |ψ⟩. According to a famous theorem due to Wigner, a mapping with this property can always be chosen (by adopting suitable phase conventions) to be either unitary or antiunitary. The antiunitary alternative, while important for discrete symmetries, can be excluded for continuous symmetries. Then the symmetry acts as
|ψ⟩ → |ψ′⟩ = U|ψ⟩,    (2.14)
where U is unitary (and in particular, linear).
Symmetries form a group: a symmetry transformation can be inverted, and the product of two symmetries is a symmetry. For each symmetry operation R acting on our physical system, there is a corresponding unitary transformation U(R). Multiplication of these unitary operators must respect the group multiplication law of the symmetries -- applying R₁ ∘ R₂ should be equivalent to first applying R₂ and subsequently R₁. Thus we demand
U(R₁)U(R₂) = Phase(R₁, R₂) U(R₁ ∘ R₂).    (2.15)
The phase is permitted in eq. (2.15) because quantum states are rays; we need only demand that U(R₁ ∘ R₂) act the same way as U(R₁)U(R₂) on rays, not on vectors. U(R) provides a unitary representation (up to a phase) of the symmetry group.
So far, our concept of symmetry has no connection with dynamics. Usually, we demand of a symmetry that it respect the dynamical evolution of the system. This means that it should not matter whether we first transform the system and then evolve it, or first evolve it and then transform it. In other words, the diagram
                  dynamics
     Initial ---------------> Final
        |                        |
     rotation                 rotation
        |                        |
        v         dynamics       v
   New Initial --------------> New Final
is commutative. This means that the time evolution operator e^{−itH} should commute with the symmetry transformation U(R):
U(R) e^{−itH} = e^{−itH} U(R),    (2.16)
and expanding to linear order in t we obtain
U(R) H = H U(R).    (2.17)
For a continuous symmetry, we can choose R infinitesimally close to the identity, R = I + εT, and then U is close to 1,
U = 1 − iεQ + O(ε²).    (2.18)
From the unitarity of U (to order ε) it follows that Q is an observable, Q = Q†. Expanding eq. (2.17) to linear order in ε we find
[Q, H] = 0;    (2.19)
the observable Q commutes with the Hamiltonian.
Eq. (2.19) is a conservation law. It says, for example, that if we prepare an eigenstate of Q, then time evolution governed by the Schrödinger equation will preserve the eigenstate. We have seen that symmetries imply conservation laws. Conversely, given a conserved quantity Q satisfying eq. (2.19) we can construct the corresponding symmetry transformations. Finite transformations can be built as a product of many infinitesimal ones
R = (1 + (θ/N) T)^N  ⇒  U(R) = (1 + i(θ/N) Q)^N → e^{iθQ}    (2.20)
(taking the limit N → ∞). Once we have decided how infinitesimal symmetry transformations are represented by unitary operators, then it is also
determined how finite transformations are represented, for these can be built as a product of infinitesimal transformations. We say that Q is the generator of the symmetry.
Let us briefly recall how this general theory applies to spatial rotations and angular momentum. An infinitesimal rotation by dθ about the axis specified by the unit vector n̂ = (n₁, n₂, n₃) can be expressed as
R(n̂, dθ) = I − i dθ n̂·J⃗,    (2.21)
where (J₁, J₂, J₃) are the components of the angular momentum. A finite rotation is expressed as
R(n̂, θ) = exp(−iθ n̂·J⃗).    (2.22)
Rotations about distinct axes don't commute. From elementary properties of rotations, we find the commutation relations
[J_k, J_ℓ] = i ε_{kℓm} J_m,    (2.23)
where ε_{kℓm} is the totally antisymmetric tensor with ε₁₂₃ = 1, and repeated indices are summed. To implement rotations on a quantum system, we find self-adjoint operators J₁, J₂, J₃ in Hilbert space that satisfy these relations.
The "defining" representation of the rotation group is three dimensional, but the simplest nontrivial irreducible representation is two dimensional, given by
J_k = (1/2) σ_k,    (2.24)
where
σ₁ = ( 0  1 ; 1  0 ),   σ₂ = ( 0  −i ; i  0 ),   σ₃ = ( 1  0 ; 0  −1 )    (2.25)
are the Pauli matrices (rows separated by semicolons). This is the unique two-dimensional irreducible representation, up to a unitary change of basis. Since the eigenvalues of J_k are ±1/2, we call this the spin-1/2 representation. (By identifying J⃗ as the angular momentum, we have implicitly chosen units with ℏ = 1.)
The Pauli matrices also have the properties of being mutually anticommuting and squaring to the identity,
σ_k σ_ℓ + σ_ℓ σ_k = 2δ_{kℓ} 1.    (2.26)
So we see that (n̂·σ⃗)² = n_k n_ℓ σ_k σ_ℓ = n_k n_k 1 = 1. By expanding the exponential series, we see that finite rotations are represented as
U(n̂, θ) = e^{−i(θ/2) n̂·σ⃗} = 1 cos(θ/2) − i n̂·σ⃗ sin(θ/2).    (2.27)
The most general 2 × 2 unitary matrix with determinant 1 can be expressed in this form. Thus, we are entitled to think of a qubit as the state of a spin-1/2 object, and an arbitrary unitary transformation acting on the state (aside from a possible rotation of the overall phase) is a rotation of the spin.
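Eq. (2.27) is easy to verify numerically; the following sketch (my addition, with an arbitrarily chosen axis and angle) compares the matrix exponential with the closed form, and also exhibits the curious 2π rotation discussed next.

```python
import numpy as np
from scipy.linalg import expm

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)

theta = 1.2                                              # arbitrary rotation angle
n = np.array([1.0, 2.0, 2.0]); n /= np.linalg.norm(n)    # arbitrary unit axis
n_dot_sigma = n[0]*s1 + n[1]*s2 + n[2]*s3

U_exp = expm(-1j * (theta / 2) * n_dot_sigma)
U_closed = np.cos(theta/2) * np.eye(2) - 1j * np.sin(theta/2) * n_dot_sigma
print(np.allclose(U_exp, U_closed))                      # True: eq. (2.27)
print(np.allclose(expm(-1j * np.pi * n_dot_sigma), -np.eye(2)))  # a 2*pi rotation gives -1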
A peculiar property of the representation U(n̂, θ) is that it is double-valued. In particular a rotation by 2π about any axis is represented nontrivially:
U(n̂, θ = 2π) = −1.    (2.28)
Our representation of the rotation group is really a representation "up to a sign"
U(R₁)U(R₂) = ±U(R₁ ∘ R₂).    (2.29)
But as already noted, this is acceptable, because the group multiplication is respected on rays, though not on vectors. These double-valued representations of the rotation group are called spinor representations. (The existence of spinors follows from a topological property of the group -- it is not simply connected.)
While it is true that a rotation by 2π has no detectable effect on a spin-1/2 object, it would be wrong to conclude that the spinor property has no observable consequences. Suppose I have a machine that acts on a pair of spins. If the first spin is up, it does nothing, but if the first spin is down, it rotates the second spin by 2π. Now let the machine act when the first spin is in a superposition of up and down. Then
(1/√2)(|↑⟩₁ + |↓⟩₁) |↑⟩₂  →  (1/√2)(|↑⟩₁ − |↓⟩₁) |↑⟩₂.    (2.30)
While there is no detectable effect on the second spin, the state of the first has flipped to an orthogonal state, which is very much observable.
In a rotated frame of reference, a rotation R(n̂, θ) becomes a rotation through the same angle but about a rotated axis. It follows that the three components of angular momentum transform under rotations as a vector:
U(R) J_k U(R)† = R_{kℓ} J_ℓ.    (2.31)
Thus, if a state |m⟩ is an eigenstate of J₃,
J₃|m⟩ = m|m⟩,    (2.32)
then U(R)|m⟩ is an eigenstate of RJ₃ with the same eigenvalue:
RJ₃ (U(R)|m⟩) = U(R) J₃ U(R)† U(R)|m⟩ = U(R) J₃ |m⟩ = m (U(R)|m⟩).    (2.33)
Therefore, we can construct eigenstates of angular momentum along the axis n̂ = (sinθ cosφ, sinθ sinφ, cosθ) by applying a rotation through θ, about the axis n̂′ = (−sinφ, cosφ, 0), to a J₃ eigenstate. For our spin-1/2 representation, this rotation is
exp(−i(θ/2) n̂′·σ⃗) = exp[ (θ/2) ( 0  −e^{−iφ} ; e^{iφ}  0 ) ]
                  = ( cos(θ/2)  −e^{−iφ} sin(θ/2) ; e^{iφ} sin(θ/2)  cos(θ/2) ),    (2.34)
and applying it to ( 1 ; 0 ), the J₃ eigenstate with eigenvalue 1/2, we obtain
|ψ(θ, φ)⟩ = ( e^{−iφ/2} cos(θ/2) ; e^{iφ/2} sin(θ/2) )    (2.35)
(up to an overall phase). We can check directly that this is an eigenstate of
n̂·σ⃗ = ( cosθ  e^{−iφ} sinθ ; e^{iφ} sinθ  −cosθ ),    (2.36)
with eigenvalue one. So we have seen that eq. (2.11) with a = e^{−iφ/2} cos(θ/2), b = e^{iφ/2} sin(θ/2) can be interpreted as a spin pointing in the (θ, φ) direction.
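The "check directly" step can also be done numerically; a small sketch (my addition, with arbitrarily chosen angles) verifying that |ψ(θ, φ)⟩ of eq. (2.35) is an eigenstate of n̂·σ⃗ with eigenvalue one, and that eq. (2.37) below holds:

```python
import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)

theta, phi = 0.7, 2.1                                    # arbitrary direction (theta, phi)
psi = np.array([np.exp(-1j*phi/2) * np.cos(theta/2),
                np.exp(+1j*phi/2) * np.sin(theta/2)])    # eq. (2.35)
n = np.array([np.sin(theta)*np.cos(phi), np.sin(theta)*np.sin(phi), np.cos(theta)])
n_dot_sigma = n[0]*s1 + n[1]*s2 + n[2]*s3                # eq. (2.36)

print(np.allclose(n_dot_sigma @ psi, psi))               # True: eigenvalue +1
print(np.isclose(np.vdot(psi, s3 @ psi).real, np.cos(theta)))  # eq. (2.37)
```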
We noted that we cannot determine a and b with a single measurement. Furthermore, even with many identical copies of the state, we cannot completely determine the state by measuring each copy only along the z-axis. This would enable us to estimate |a| and |b|, but we would learn nothing about the relative phase of a and b. Equivalently, we would find the component of the spin along the z-axis,
⟨ψ(θ, φ)|σ₃|ψ(θ, φ)⟩ = cos²(θ/2) − sin²(θ/2) = cosθ,    (2.37)
but we would not learn about the component in the x-y plane. The problem of determining |ψ⟩ by measuring the spin is equivalent to determining the unit vector n̂ by measuring its components along various axes. Altogether, measurements along three different axes are required. E.g., from ⟨σ₃⟩ and ⟨σ₁⟩ we can determine n₃ and n₁, but the sign of n₂ remains undetermined. Measuring ⟨σ₂⟩ would remove this remaining ambiguity.
Of course, if we are permitted to rotate the spin, then only measurements along the z-axis will suffice. That is, measuring a spin along the n̂ axis is equivalent to first applying a rotation that rotates the n̂ axis to the axis ẑ, and then measuring along ẑ.
In the special case θ = π/2 and φ = 0 (the x̂-axis) our spin state is
|↑_x⟩ = (1/√2)(|↑_z⟩ + |↓_z⟩)    (2.38)
("spin-up along the x-axis"). The orthogonal state ("spin down along the x-axis") is
|↓_x⟩ = (1/√2)(|↑_z⟩ − |↓_z⟩).    (2.39)
For either of these states, if we measure the spin along the z-axis, we will obtain |↑_z⟩ with probability 1/2 and |↓_z⟩ with probability 1/2.
Now consider the combination
(1/√2)(|↑_x⟩ + |↓_x⟩).    (2.40)
This state has the property that, if we measure the spin along the x-axis, we obtain |↑_x⟩ or |↓_x⟩, each with probability 1/2. Now we may ask, what if we measure the state in eq. (2.40) along the z-axis?
If these were probabilistic classical bits, the answer would be obvious. The state in eq. (2.40) is in one of two states, and for each of the two, the probability is 1/2 for pointing up or down along the z-axis. So of course we should find up with probability 1/2 when we measure along the z-axis.
But not so for qubits! By adding eq. (2.38) and eq. (2.39), we see that the state in eq. (2.40) is really |↑_z⟩ in disguise. When we measure along the z-axis, we always find |↑_z⟩, never |↓_z⟩.
We see that for qubits, as opposed to probabilistic classical bits, probabilities can add in unexpected ways. This is, in its simplest guise, the phenomenon called "quantum interference," an important feature of quantum information.
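A tiny numerical illustration of this interference effect (my addition): writing |↑_x⟩ and |↓_x⟩ in the z basis, their equally weighted superposition is |↑_z⟩, so a z measurement gives "up" with probability one.

```python
import numpy as np

up_z, down_z = np.array([1.0, 0.0]), np.array([0.0, 1.0])
up_x = (up_z + down_z) / np.sqrt(2)        # eq. (2.38)
down_x = (up_z - down_z) / np.sqrt(2)      # eq. (2.39)

state = (up_x + down_x) / np.sqrt(2)       # eq. (2.40)
prob_up_z = abs(np.vdot(up_z, state))**2
print(prob_up_z)                           # 1.0 -- the "equal mixture" of x states is |up_z>
```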
It should be emphasized that, while this formal equivalence with a spin-1/2 object applies to any two-level quantum system, of course not every two-level system transforms as a spinor under rotations!
with eigenvalues ±1. Because the eigenvalues are ±1 (not ±1/2) we say that the photon has spin-1.
In this context, the quantum interference phenomenon can be described this way: Suppose that we have a polarization analyzer that allows only one of the two linear photon polarizations to pass through. Then an x or y polarized photon has probability 1/2 of getting through a 45° rotated polarizer, and a 45° polarized photon has probability 1/2 of getting through an x or y analyzer. But an x photon never passes through a y analyzer. If we put a 45° rotated analyzer in between an x and a y analyzer, then half of the photons make it through each analyzer. But if we remove the analyzer in the middle no photons make it through the y analyzer.
A device can be constructed easily that rotates the linear polarization of a photon, and so applies the transformation Eq. (2.41) to our qubit. As noted, this is not the most general possible unitary transformation. But if we also have a device that alters the relative phase of the two orthogonal linear polarization states,
|x⟩ → e^{iω/2}|x⟩,
|y⟩ → e^{−iω/2}|y⟩,    (2.45)
the two devices can be employed together to apply an arbitrary 2 × 2 unitary transformation (of determinant 1) to the photon polarization state.
where 0 < p_a ≤ 1 and Σ_a p_a = 1. If the state is not pure, there are two or more terms in this sum, and ρ² ≠ ρ; in fact, tr ρ² = Σ p_a² < Σ p_a = 1. We say that ρ is an incoherent superposition of the states {|ψ_a⟩}; incoherent meaning that the relative phases of the |ψ_a⟩ are experimentally inaccessible.
Since the expectation value of any observable M acting on the subsystem can be expressed as
⟨M⟩ = tr(Mρ) = Σ_a p_a ⟨ψ_a|M|ψ_a⟩,    (2.61)
= (1/2)(1 + n̂·σ⃗),    (2.68)
where n̂ = (sinθ cosφ, sinθ sinφ, cosθ). One nice property of the Bloch parametrization of the pure states is that while |ψ(θ, φ)⟩ has an arbitrary overall phase that has no physical significance, there is no phase ambiguity in the density matrix ρ(θ, φ) = |ψ(θ, φ)⟩⟨ψ(θ, φ)|; all the parameters in ρ have a physical meaning.
From the property
(1/2) tr(σ_i σ_j) = δ_{ij}    (2.69)
we see that
⟨n̂·σ⃗⟩_{P⃗} = tr( n̂·σ⃗ ρ(P⃗) ) = n̂·P⃗.    (2.70)
Thus the vector P⃗ in Eq. (2.62) parametrizes the polarization of the spin. If there are many identically prepared systems at our disposal, we can determine P⃗ (and hence the complete density matrix ρ(P⃗)) by measuring ⟨n̂·σ⃗⟩ along each of three linearly independent axes.
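A short sketch of this tomographic reconstruction (my addition, not from the notes): given an arbitrarily chosen single-qubit density matrix, the three numbers P_i = ⟨σ_i⟩ determine ρ completely.

```python
import numpy as np

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]

# An arbitrary single-qubit density matrix (mixed state, chosen for illustration).
rho = np.array([[0.7, 0.2 - 0.1j],
                [0.2 + 0.1j, 0.3]])

P = np.array([np.trace(rho @ s).real for s in sigma])    # P_i = <sigma_i> = tr(rho sigma_i)
rho_reconstructed = 0.5 * (np.eye(2) + sum(P[i] * sigma[i] for i in range(3)))
print(P, np.allclose(rho, rho_reconstructed))            # recovers rho exactly
```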
2.4.1 Entanglement
With any bipartite pure state |ψ⟩_AB we may associate a positive integer, the Schmidt number, which is the number of nonzero eigenvalues of ρ_A (or ρ_B) and hence the number of terms in the Schmidt decomposition of |ψ⟩_AB. In terms of this quantity, we can define what it means for a bipartite pure state to be entangled: |ψ⟩_AB is entangled (or nonseparable) if its Schmidt number is greater than one; otherwise, it is separable (or unentangled). Thus, a separable bipartite pure state is a direct product of pure states in H_A and H_B,
|ψ⟩_AB = |φ⟩_A ⊗ |χ⟩_B;    (2.92)
then the reduced density matrices ρ_A = |φ⟩_A A⟨φ| and ρ_B = |χ⟩_B B⟨χ| are pure. Any state that cannot be expressed as such a direct product is entangled; then ρ_A and ρ_B are mixed states.
One of our main goals this term will be to understand better the significance of entanglement. It is not strictly correct to say that subsystems A and B are uncorrelated if |ψ⟩_AB is separable; after all, the two spins in the separable state
|↑⟩_A ⊗ |↑⟩_B    (2.93)
are surely correlated -- they are both pointing in the same direction. But the correlations between A and B in an entangled state have a different character than those in a separable state. Perhaps the critical difference is that entanglement cannot be created locally. The only way to entangle A and B is for the two subsystems to directly interact with one another.
We can prepare the state eq. (2.93) without allowing spins A and B to ever come into contact with one another. We need only send a (classical!) message to two preparers (Alice and Bob) telling both of them to prepare a spin pointing along the z-axis. But the only way to turn the state eq. (2.93) into an entangled state like
(1/√2)(|↑⟩_A ⊗ |↑⟩_B + |↓⟩_A ⊗ |↓⟩_B)    (2.94)
is to apply a collective unitary transformation to the state. Local unitary transformations of the form U_A ⊗ U_B, and local measurements performed by Alice or Bob, cannot increase the Schmidt number of the two-qubit state, no matter how much Alice and Bob discuss what they do. To entangle two qubits, we must bring them together and allow them to interact.
As we will discuss later, it is also possible to make the distinction between
entangled and separable bipartite mixed states. We will also discuss various
ways in which local operations can modify the form of entanglement, and
some ways that entanglement can be put to use.
2.6 Summary
Axioms. The arena of quantum mechanics is a Hilbert space H. The
fundamental assumptions are:
(1) A state is a ray in H.
(2) An observable is a self-adjoint operator on H.
(3) A measurement is an orthogonal projection.
(4) Time evolution is unitary.
Density operator. But if we confine our attention to only a portion of a larger quantum system, assumptions (1)-(4) need not be satisfied. In particular, a quantum state is described not by a ray, but by a density operator ρ, a nonnegative operator with unit trace. The density operator is pure (and the state can be described by a ray) if ρ² = ρ; otherwise, the state is mixed. An observable M has expectation value tr(Mρ) in this state.
Qubit. A quantum system with a two-dimensional Hilbert space is called a qubit. The general density matrix of a qubit is
ρ(P⃗) = (1/2)(1 + P⃗·σ⃗),    (2.120)
where P⃗ is a three-component vector of length |P⃗| ≤ 1. Pure states have |P⃗| = 1.
Schmidt decomposition. For any quantum system divided into two parts A and B (a bipartite system), the Hilbert space is a tensor product H_A ⊗ H_B. For any pure state |ψ⟩_AB of a bipartite system, there are orthonormal bases {|i⟩_A} for H_A and {|i′⟩_B} for H_B such that
|ψ⟩_AB = Σ_i √p_i |i⟩_A ⊗ |i′⟩_B.    (2.121)
Eq. (2.121) is called the Schmidt decomposition of |ψ⟩_AB. In a bipartite pure state, subsystems A and B separately are described by density operators ρ_A and ρ_B; it follows from eq. (2.121) that ρ_A and ρ_B have the same nonvanishing eigenvalues (the p_i's). The number of nonvanishing eigenvalues is called the Schmidt number of |ψ⟩_AB. A bipartite pure state is said to be entangled if its Schmidt number is greater than one.
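In practice the Schmidt decomposition can be obtained from the singular value decomposition of the matrix of amplitudes ψ_{ij}; here is a minimal sketch (my addition, with a randomly chosen state) that also confirms that ρ_A and ρ_B share the same nonzero spectrum.

```python
import numpy as np

dA, dB = 2, 3
rng = np.random.default_rng(1)

# Amplitudes psi[i, j] of an arbitrary bipartite state sum_ij psi_ij |i>_A |j>_B.
psi = rng.normal(size=(dA, dB)) + 1j * rng.normal(size=(dA, dB))
psi /= np.linalg.norm(psi)

U, s, Vh = np.linalg.svd(psi, full_matrices=False)
p = s**2                                   # Schmidt coefficients p_i (descending)
schmidt_number = np.sum(p > 1e-12)

rho_A = psi @ psi.conj().T                 # tr_B |psi><psi|
rho_B = psi.T @ psi.conj()                 # tr_A |psi><psi|
print(p, schmidt_number)
print(np.allclose(np.sort(np.linalg.eigvalsh(rho_A))[::-1], p))           # same spectrum
print(np.allclose(np.sort(np.linalg.eigvalsh(rho_B))[::-1][:len(p)], p))  # same nonzero spectrum
```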
Ensembles. The density operators on a Hilbert space form a convex set, and the pure states are the extremal points of the set. A mixed state of a system A can be prepared as an ensemble of pure states in many different ways, all of which are experimentally indistinguishable if we observe system A alone. Given any mixed state ρ_A of system A, any preparation of ρ_A as an ensemble of pure states can be realized in principle by performing a
measurement in another system B with which A is entangled. In fact, given many such preparations of ρ_A, there is a single entangled state of A and B such that any one of these preparations can be realized by measuring a suitable observable in B (the GHJW theorem). By measuring in system B and reporting the measurement outcome to system A, we can extract from the mixture a pure state chosen from one of the ensembles.
2.7 Exercises
2.1 Fidelity of a random guess
A single qubit (spin-1/2 object) is in an unknown pure state |ψ⟩, selected at random from an ensemble uniformly distributed over the Bloch sphere. We guess at random that the state is |φ⟩. On the average, what is the fidelity F of our guess, defined by
F ≡ |⟨φ|ψ⟩|².    (2.122)
2.2 Fidelity after measurement
After randomly selecting a one-qubit pure state as in the previous problem, we perform a measurement of the spin along the ẑ-axis. This measurement prepares a state described by the density matrix
ρ = P↑ ⟨ψ|P↑|ψ⟩ + P↓ ⟨ψ|P↓|ψ⟩    (2.123)
(where P↑,↓ denote the projections onto the spin-up and spin-down states along the ẑ-axis). On the average, with what fidelity
F ≡ ⟨ψ|ρ|ψ⟩    (2.124)
does this density matrix represent the initial state |ψ⟩? (The improvement in F compared to the answer to the previous problem is a crude measure of how much we learned by making the measurement.)
2.3 Schmidt decomposition
For the two-qubit state
|ψ⟩ = (1/√2) |↑⟩_A ⊗ ( (1/2)|↑⟩_B + (√3/2)|↓⟩_B ) + (1/√2) |↓⟩_A ⊗ ( (√3/2)|↑⟩_B + (1/2)|↓⟩_B ),    (2.125)
a. Compute ρ_A = tr_B(|ψ⟩⟨ψ|) and ρ_B = tr_A(|ψ⟩⟨ψ|).
b. Find the Schmidt decomposition of |ψ⟩.
2.4 Tripartite pure state
Is there a Schmidt decomposition for an arbitrary tripartite pure state? That is, if |ψ⟩_ABC is an arbitrary vector in H_A ⊗ H_B ⊗ H_C, can we find orthonormal bases {|i⟩_A}, {|i⟩_B}, {|i⟩_C} such that
|ψ⟩_ABC = Σ_i √p_i |i⟩_A ⊗ |i⟩_B ⊗ |i⟩_C ?    (2.126)
Explain your answer.
2.5 Quantum correlations in a mixed state
Consider a density matrix for two qubits
ρ = (1/8) 1 + (1/2) |ψ⁻⟩⟨ψ⁻|,    (2.127)
where 1 denotes the 4 × 4 unit matrix, and
|ψ⁻⟩ = (1/√2)(|↑⟩|↓⟩ − |↓⟩|↑⟩).    (2.128)
Suppose we measure the first spin along the n̂ axis and the second spin along the m̂ axis, where n̂·m̂ = cosθ. What is the probability that both spins are "spin-up" along their respective axes?
Chapter 3
Foundations II: Measurement
and Evolution
3.1 Orthogonal Measurement and Beyond
3.1.1 Orthogonal Measurements
We would like to examine the properties of the generalized measurements that can be realized on system A by performing orthogonal measurements on a larger system that contains A. But first we will briefly consider how (orthogonal) measurements of an arbitrary observable can be achieved in principle, following the classic treatment of Von Neumann.
To measure an observable M, we will modify the Hamiltonian of the world
by turning on a coupling between that observable and a \pointer" variable
that will serve as the apparatus. The coupling establishes entanglement
between the eigenstates of the observable and the distinguishable states of the
pointer, so that we can prepare an eigenstate of the observable by \observing"
the pointer.
Of course, this is not a fully satisfying model of measurement because we
have not explained how it is possible to measure the pointer. Von Neumann's
attitude was that one can see that it is possible in principle to correlate
the state of a microscopic quantum system with the value of a macroscopic
classical variable, and we may take it for granted that we can perceive the
value of the classical variable. A more complete explanation is desirable and
possible; we will return to this issue later.
We may think of the pointer as a particle that propagates freely apart
from its tunable coupling to the quantum system being measured. Since we intend to measure the position of the pointer, it should be prepared initially in a wavepacket state that is narrow in position space, but not too narrow, because a very narrow wave packet will spread too rapidly. If the initial width of the wave packet is Δx, then the uncertainty in its velocity will be of order Δv = Δp/m ≈ ℏ/(mΔx), so that after a time t, the wavepacket will spread to a width
Δx(t) ≈ Δx + ℏt/(mΔx),    (3.1)
which is minimized for [Δx(t)]² ≈ [Δx]² ≈ ℏt/m. Therefore, if the experiment takes a time t, the resolution we can achieve for the final position of the pointer is limited by
Δx ≳ (Δx)_SQL ≈ √(ℏt/m),    (3.2)
the "standard quantum limit." We will choose our pointer to be sufficiently heavy that this limitation is not serious.
The Hamiltonian describing the coupling of the quantum system to the pointer has the form
H = H₀ + P²/2m + λ M P,    (3.3)
where P²/2m is the Hamiltonian of the free pointer particle (which we will henceforth ignore on the grounds that the pointer is so heavy that spreading of its wavepacket may be neglected), H₀ is the unperturbed Hamiltonian of the system to be measured, and λ is a coupling constant that we are able to turn on and off as desired. The observable to be measured, M, is coupled to the momentum P of the pointer.
If M does not commute with H₀, then we have to worry about how the observable evolves during the course of the measurement. To simplify the analysis, let us suppose that either [M, H₀] = 0, or else the measurement is carried out quickly enough that the free evolution of the system can be neglected during the measurement procedure. Then the Hamiltonian can be approximated as H ≈ λ M P (where of course [M, P] = 0 because M is an observable of the system and P is an observable of the pointer), and the time evolution operator is
U(t) ≈ exp[−iλt M P].    (3.4)
Expanding in the basis in which M is diagonal,
M = Σ_a |a⟩ M_a ⟨a|,    (3.5)
we express U(t) as
U(t) = Σ_a |a⟩ exp[−iλt M_a P] ⟨a|.    (3.6)
Now we recall that P generates a translation of the position of the pointer: P = −i d/dx in the position representation, so that e^{−ix₀P} = exp(−x₀ d/dx), and by Taylor expanding,
e^{−ix₀P} ψ(x) = ψ(x − x₀);    (3.7)
in other words, e^{−ix₀P} acting on a wavepacket translates the wavepacket by x₀. We see that if our quantum system starts in a superposition of M eigenstates, initially unentangled with the position-space wavepacket |ψ(x)⟩ of the pointer, then after time t the quantum state has evolved to
U(t) ( Σ_a α_a |a⟩ ⊗ |ψ(x)⟩ ) = Σ_a α_a |a⟩ ⊗ |ψ(x − λt M_a)⟩;    (3.8)
the position of the pointer is now correlated with the value of the observable M. If the pointer wavepacket is narrow enough for us to resolve all values of the M_a that occur (Δx ≲ λt ΔM_a), then when we observe the position of the pointer (never mind how!) we will prepare an eigenstate of the observable. With probability |α_a|², we will detect that the pointer has shifted its position by λt M_a, in which case we will have prepared the M eigenstate |a⟩. In the end, then, we conclude that the initial state |φ⟩ of the quantum system is projected to |a⟩ with probability |⟨a|φ⟩|². This is Von Neumann's model of orthogonal measurement.
The classic example is the Stern-Gerlach apparatus. To measure σ₃ for a spin-1/2 object, we allow the object to pass through a region of inhomogeneous magnetic field
B₃ = λz.    (3.9)
The magnetic moment of the object is μσ⃗, and the coupling induced by the magnetic field is
H = −λμ z σ₃.    (3.10)
In this case σ₃ is the observable to be measured, coupled to the position z rather than the momentum of the pointer, but that's all right because z generates a translation of P_z, and so the coupling imparts an impulse to the pointer. We can perceive whether the object is pushed up or down, and so project out the spin state |↑_z⟩ or |↓_z⟩. Of course, by rotating the magnet, we can measure the observable n̂·σ⃗ instead.
Our discussion of the quantum eraser has cautioned us that establishing the entangled state eq. (3.8) is not sufficient to explain why the measurement procedure prepares an eigenstate of M. In principle, the measurement of the pointer could project out a peculiar superposition of position eigenstates, and so prepare the quantum system in a superposition of M eigenstates. To achieve a deeper understanding of the measurement process, we will need to explain why the position eigenstate basis of the pointer enjoys a privileged status over other possible bases.
If indeed we can couple any observable to a pointer as just described, and we can observe the pointer, then we can perform any conceivable orthogonal projection in Hilbert space. Given a set of operators {E_a} such that
E_a = E_a†,   E_a E_b = δ_{ab} E_a,   Σ_a E_a = 1,    (3.11)
we can carry out a measurement procedure that will take a pure state |ψ⟩⟨ψ| to
E_a|ψ⟩⟨ψ|E_a / ⟨ψ|E_a|ψ⟩    (3.12)
with probability
Prob(a) = ⟨ψ|E_a|ψ⟩.    (3.13)
The measurement outcomes can be described by a density matrix obtained by summing over all possible outcomes weighted by the probability of that outcome (rather than by choosing one particular outcome), in which case the measurement modifies the initial pure state according to
|ψ⟩⟨ψ| → Σ_a E_a|ψ⟩⟨ψ|E_a.    (3.14)
This is the ensemble of pure states describing the measurement outcomes -- it is the description we would use if we knew a measurement had been performed, but we did not know the result. Hence, the initial pure state has become a mixed state unless the initial state happened to be an eigenstate of the observable being measured. If the initial state before the measurement were a mixed state with density matrix ρ, then by expressing ρ as an ensemble of pure states we find that the effect of the measurement is
ρ → Σ_a E_a ρ E_a.    (3.15)
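Here is a small numerical sketch of eqs. (3.12)-(3.15) for a qubit (my addition; the projectors and the input state are arbitrary choices): the outcome probabilities, the conditional post-measurement states, and the decohered density matrix of eq. (3.14).

```python
import numpy as np

# Complete set of orthogonal projectors {E_a} on a qubit (z-basis, as an example).
E = [np.array([[1, 0], [0, 0]]), np.array([[0, 0], [0, 1]])]

psi = np.array([0.6, 0.8])                                  # arbitrary pure state
rho = np.outer(psi, psi.conj())

probs = [np.vdot(psi, Ea @ psi).real for Ea in E]           # eq. (3.13)
post = [Ea @ rho @ Ea / p for Ea, p in zip(E, probs)]       # eq. (3.12)
rho_after = sum(Ea @ rho @ Ea for Ea in E)                  # eq. (3.14): outcome unknown
print(probs)        # [0.36, 0.64]
print(rho_after)    # diagonal: the initial coherences are destroyed
```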
Now let's change our perspective on eq. (3.28). Interpret the (ψ̃_a)_i's not as n ≥ N vectors in an N-dimensional space, but rather as N ≤ n vectors (ψ̃ᵀ_i)_a in an n-dimensional space. Then eq. (3.28) becomes the statement that these N vectors form an orthonormal set. Naturally, it is possible to extend these vectors to an orthonormal basis for the n-dimensional space. In other words, there is an n × n matrix u_{ai}, with u_{ai} = (ψ̃_a)_i for i = 1, 2, ..., N, such that
Σ_a u*_{ai} u_{aj} = δ_{ij}.    (3.29)
3.2 Superoperators
3.2.1 The operator-sum representation
We now proceed to the next step of our program of understanding the behavior of one part of a bipartite quantum system. We have seen that a pure state of the bipartite system may behave like a mixed state when we observe subsystem A alone, and that an orthogonal measurement of the bipartite system may be a (nonorthogonal) POVM on A alone. Next we ask, if a state of the bipartite system undergoes unitary evolution, how do we describe the evolution of A alone?
Suppose that the initial density matrix of the bipartite system is a tensor product state of the form
ρ_A ⊗ |0⟩_B B⟨0|;    (3.64)
system A has density matrix ρ_A, and system B is assumed to be in a pure state that we have designated |0⟩_B. The bipartite system evolves for a finite time, governed by the unitary time evolution operator
U_AB (ρ_A ⊗ |0⟩_B B⟨0|) U_AB†.    (3.65)
Now we perform the partial trace over H_B to find the final density matrix of system A,
ρ′_A = tr_B [ U_AB (ρ_A ⊗ |0⟩_B B⟨0|) U_AB† ]
     = Σ_μ B⟨μ|U_AB|0⟩_B ρ_A B⟨0|U_AB†|μ⟩_B,    (3.66)
where {|μ⟩_B} is an orthonormal basis for H_B and B⟨μ|U_AB|0⟩_B is an operator acting on H_A. (If {|i⟩_A ⊗ |μ⟩_B} is an orthonormal basis for H_A ⊗ H_B, then B⟨μ|U_AB|ν⟩_B denotes the operator whose matrix elements are
A⟨i| ( B⟨μ|U_AB|ν⟩_B ) |j⟩_A
= ( A⟨i| ⊗ B⟨μ| ) U_AB ( |j⟩_A ⊗ |ν⟩_B ).)    (3.67)
If we denote
M_μ = B⟨μ|U_AB|0⟩_B,    (3.68)
then we may express ρ′_A as
$(ρ_A) ≡ ρ′_A = Σ_μ M_μ ρ_A M_μ†.    (3.69)
It follows from the unitarity of U_AB that the M_μ's satisfy the property
Σ_μ M_μ† M_μ = Σ_μ B⟨0|U_AB†|μ⟩_B B⟨μ|U_AB|0⟩_B = B⟨0|U_AB† U_AB|0⟩_B = 1_A.    (3.70)
Eq. (3.69) defines a linear map $ that takes linear operators to linear operators. Such a map, if the property in eq. (3.70) is satisfied, is called a superoperator, and eq. (3.69) is called the operator sum representation (or Kraus representation) of the superoperator. A superoperator can be regarded as a linear map that takes density operators to density operators, because it follows from eq. (3.69) and eq. (3.70) that ρ′_A is a density matrix if ρ_A is:
(1) ρ′_A is hermitian: ρ′_A† = Σ_μ M_μ ρ_A† M_μ† = ρ′_A.
(2) ρ′_A has unit trace: tr ρ′_A = Σ_μ tr(ρ_A M_μ† M_μ) = tr ρ_A = 1.
(3) ρ′_A is positive: A⟨ψ|ρ′_A|ψ⟩_A = Σ_μ (⟨ψ|M_μ) ρ_A (M_μ†|ψ⟩) ≥ 0.
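The whole construction is easy to check numerically. The sketch below (my addition) draws a random unitary on H_A ⊗ H_B, reads off the Kraus operators M_μ = B⟨μ|U_AB|0⟩_B of eq. (3.68), and verifies eq. (3.70) and the agreement between eqs. (3.66) and (3.69); the dimensions and the input state are arbitrary choices.

```python
import numpy as np

dA, dB = 2, 3
rng = np.random.default_rng(0)

# A random unitary U_AB on H_A (x) H_B (QR of a random complex matrix).
X = rng.normal(size=(dA*dB, dA*dB)) + 1j * rng.normal(size=(dA*dB, dA*dB))
U, _ = np.linalg.qr(X)
U4 = U.reshape(dA, dB, dA, dB)                    # indices [i, mu, j, nu]

# Kraus operators M_mu = <mu|_B U_AB |0>_B, eq. (3.68).
M = [U4[:, mu, :, 0] for mu in range(dB)]
print(np.allclose(sum(m.conj().T @ m for m in M), np.eye(dA)))   # eq. (3.70)

# Compare the operator-sum map, eq. (3.69), with the unitary representation, eq. (3.66).
rho_A = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)        # arbitrary input state
e0 = np.zeros((dB, dB)); e0[0, 0] = 1.0                          # |0><0| on H_B
rho_AB = U @ np.kron(rho_A, e0) @ U.conj().T
rho_out_unitary = np.einsum('imjm->ij', rho_AB.reshape(dA, dB, dA, dB))  # tr_B
rho_out_kraus = sum(m @ rho_A @ m.conj().T for m in M)
print(np.allclose(rho_out_unitary, rho_out_kraus))               # True
```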
We showed that the operator sum representation in eq. (3.69) follows from the "unitary representation" in eq. (3.66). But furthermore, given the operator sum representation of a superoperator, it is always possible to construct a corresponding unitary representation. We choose H_B to be a Hilbert space whose dimension is at least as large as the number of terms in the operator sum. If |φ⟩_A is any vector in H_A, the {|μ⟩_B} are orthonormal states in H_B, and |0⟩_B is some normalized state in H_B, define the action of U_AB by
U_AB (|φ⟩_A ⊗ |0⟩_B) = Σ_μ M_μ|φ⟩_A ⊗ |μ⟩_B.    (3.71)
This action is inner product preserving:
( Σ_ν A⟨φ₂|M_ν† ⊗ B⟨ν| ) ( Σ_μ M_μ|φ₁⟩_A ⊗ |μ⟩_B )
    = A⟨φ₂| Σ_μ M_μ† M_μ |φ₁⟩_A = A⟨φ₂|φ₁⟩_A;    (3.72)
therefore, U_AB can be extended to a unitary operator acting on all of H_A ⊗ H_B. Taking the partial trace we find
tr_B [ U_AB (|φ⟩_A ⊗ |0⟩_B)( A⟨φ| ⊗ B⟨0| ) U_AB† ]
    = Σ_μ M_μ (|φ⟩_A A⟨φ|) M_μ†.    (3.73)
Since any ρ_A can be expressed as an ensemble of pure states, we recover the operator sum representation acting on an arbitrary ρ_A.
It is clear that the operator sum representation of a given superoperator $ is not unique. We can perform the partial trace in any basis we please. If we use the basis { B⟨μ′| = Σ_ν U_{μν} B⟨ν| } then we obtain the representation
$(ρ_A) = Σ_μ N_μ ρ_A N_μ†,    (3.74)
where N_μ = Σ_ν U_{μν} M_ν. We will see shortly that any two operator-sum representations of the same superoperator are always related this way.
Superoperators are important because they provide us with a formalism for discussing the general theory of decoherence, the evolution of pure states into mixed states. Unitary evolution of ρ_A is the special case in which there is only one term in the operator sum. If there are two or more terms, then there are pure initial states of H_A that become entangled with H_B under evolution governed by U_AB. That is, if the operators M₁ and M₂ appearing in the operator sum are linearly independent, then there is a vector |φ⟩_A such that |φ̃₁⟩_A = M₁|φ⟩_A and |φ̃₂⟩_A = M₂|φ⟩_A are linearly independent, so that the state |φ̃₁⟩_A ⊗ |1⟩_B + |φ̃₂⟩_A ⊗ |2⟩_B + ... has Schmidt number greater than one. Therefore, the pure state |φ⟩_A A⟨φ| evolves to the mixed final state ρ′_A.
Two superoperators $₁ and $₂ can be composed to obtain another superoperator $₂ ∘ $₁; if $₁ describes evolution from yesterday to today, and $₂
describes evolution from today to tomorrow, then $₂ ∘ $₁ describes the evolution from yesterday to tomorrow. But is the inverse of a superoperator also a superoperator; that is, is there a superoperator that describes the evolution from today to yesterday? In fact, you will show in a homework exercise that a superoperator is invertible only if it is unitary.
Unitary evolution operators form a group, but superoperators define a dynamical semigroup. When decoherence occurs, there is an arrow of time; even at the microscopic level, one can tell the difference between a movie that runs forwards and one running backwards. Decoherence causes an irrevocable loss of quantum information -- once the (dead) cat is out of the bag, we can't put it back in again.
3.2.2 Linearity
Now we will broaden our viewpoint a bit and consider the essential properties that should be satisfied by any "reasonable" time evolution law for density matrices. We will see that any such law admits an operator-sum representation, so in a sense the dynamical behavior we extracted by considering part of a bipartite system is actually the most general possible.
A mapping $ : ρ → ρ′ that takes an initial density matrix ρ to a final density matrix ρ′ is a mapping of operators to operators that satisfies
(1) $ preserves hermiticity: ρ′ hermitian if ρ is.
(2) $ is trace preserving: tr ρ′ = 1 if tr ρ = 1.
(3) $ is positive: ρ′ is nonnegative if ρ is.
It is also customary to assume
(0) $ is linear.
While (1), (2), and (3) really are necessary if ρ′ is to be a density matrix, (0) is more open to question. Why linearity?
One possible answer is that nonlinear evolution of the density matrix would be hard to reconcile with any ensemble interpretation. If
$(ρ(λ)) ≡ $(λρ₁ + (1 − λ)ρ₂) = λ$(ρ₁) + (1 − λ)$(ρ₂),    (3.75)
then time evolution is faithful to the probabilistic interpretation of ρ(λ): either (with probability λ) ρ₁ was initially prepared and evolved to $(ρ₁), or (with probability 1 − λ) ρ₂ was initially prepared and evolved to $(ρ₂). But a nonlinear $ typically has consequences that are seemingly paradoxical. Consider, for example, a single qubit evolving according to
$(ρ) = exp[iπσ₁ tr(ρσ₁)] ρ exp[−iπσ₁ tr(ρσ₁)].    (3.76)
One can easily check that $ is positive and trace-preserving. Suppose that the initial density matrix is ρ = (1/2)1, realized as the ensemble
ρ = (1/2)|↑_z⟩⟨↑_z| + (1/2)|↓_z⟩⟨↓_z|.    (3.77)
Since tr(ρσ₁) = 0, the evolution of ρ is trivial, and both representatives of the ensemble are unchanged. If the spin was prepared as |↑_z⟩, it remains in the state |↑_z⟩.
But now imagine that, immediately after preparing the ensemble, we do nothing if the state has been prepared as |↑_z⟩, but we rotate it to |↑_x⟩ if it has been prepared as |↓_z⟩. The density matrix is now
ρ′ = (1/2)|↑_z⟩⟨↑_z| + (1/2)|↑_x⟩⟨↑_x|,    (3.78)
so that tr(ρ′σ₁) = 1/2. Under evolution governed by $, this becomes $(ρ′) = σ₁ρ′σ₁. In this case then, if the spin was prepared as |↑_z⟩, it evolves to the orthogonal state |↓_z⟩.
The state initially prepared as |↑_z⟩ evolves differently under these two scenarios. But what is the difference between the two cases? The difference was that if the spin was initially prepared as |↓_z⟩, we took different actions: doing nothing in case (1) but rotating the spin in case (2). Yet we have found that the spin behaves differently in the two cases, even if it was initially prepared as |↑_z⟩!
We are accustomed to saying that ρ describes two (or more) different alternative pure state preparations, only one of which is actually realized each time we prepare a qubit. But we have found that what happens if we prepare |↑_z⟩ actually depends on what we would have done if we had prepared |↓_z⟩ instead. It is no longer sensible, apparently, to regard the two possible preparations as mutually exclusive alternatives. Evolution of the alternatives actually depends on the other alternatives that supposedly were not realized.
Joe Polchinski has called this phenomenon the "Everett phone," because the different "branches of the wave function" seem to be able to "communicate" with one another.
Nonlinear evolution of the density matrix, then, can have strange, perhaps even absurd, consequences. Even so, the argument that nonlinear evolution should be excluded is not completely compelling. Indeed Jim Hartle has argued that there are versions of "generalized quantum mechanics" in which nonlinear evolution is permitted, yet a consistent probability interpretation can be salvaged. Nevertheless, we will follow tradition here and demand that $ be linear.
(where q_μ > 0, Σ_μ q_μ = 1, and each |Φ̃_μ⟩, like |ψ̃⟩_AB, is normalized so that ⟨Φ̃_μ|Φ̃_μ⟩ = N). Invoking the relative-state method, we have
$_A(|φ⟩_A A⟨φ|) = B⟨φ*| ($_A ⊗ I_B)(|ψ̃⟩_AB AB⟨ψ̃|) |φ*⟩_B
    = Σ_μ q_μ B⟨φ*|Φ̃_μ⟩_AB AB⟨Φ̃_μ|φ*⟩_B.    (3.100)
Now we are almost done; we define an operator M_μ on H_A by
M_μ : |φ⟩_A → √q_μ B⟨φ*|Φ̃_μ⟩_AB.    (3.101)
We can check that:
1. M_μ is linear, because the map |φ⟩_A → |φ*⟩_B is antilinear.
2. $_A(|φ⟩_A A⟨φ|) = Σ_μ M_μ (|φ⟩_A A⟨φ|) M_μ†, for any pure state |φ⟩_A ∈ H_A.
3. $_A(ρ_A) = Σ_μ M_μ ρ_A M_μ† for any density matrix ρ_A, because ρ_A can be expressed as an ensemble of pure states, and $_A is linear.
4. Σ_μ M_μ† M_μ = 1_A, because $_A is trace preserving for any ρ_A.
Thus, we have constructed an operator-sum representation of $_A.
Put succinctly, the argument went as follows. Because $_A is completely positive, $_A ⊗ I_B takes a maximally entangled density matrix on H_A ⊗ H_B to another density matrix. This density matrix can be expressed as an ensemble of pure states. With each of these pure states in H_A ⊗ H_B, we may associate (via the relative-state method) a term in the operator sum.
Viewing the operator-sum representation this way, we may quickly establish two important corollaries:
How many Kraus operators? Each M_μ is associated with a state |Φ̃_μ⟩ in the ensemble representation of ρ̃_AB. Since ρ̃_AB has a rank at most N² (where N = dim H_A), $_A always has an operator-sum representation with at most N² Kraus operators.
How ambiguous? We remarked earlier that the Kraus operators
N_a = Σ_μ M_μ U_{μa}    (3.102)
(where U_{μa} is unitary) represent the same superoperator $ as the M_μ's. Now we can see that any two Kraus representations of $ must always be related in this way. (If there are more N_a's than M_μ's, then it is understood that some zero operators are added to the M_μ's so that the two operator sets have the same cardinality.) This property may be viewed as a consequence of the GHJW theorem.
The relative-state construction described above established a one-to-one correspondence between ensemble representations of the (unnormalized) density matrix ($_A ⊗ I_B)(|ψ̃⟩_AB AB⟨ψ̃|) and operator-sum representations of $_A. (We explicitly described how to proceed from the ensemble representation to the operator sum, but we can clearly go the other way, too: If
$_A(|i⟩_A A⟨j|) = Σ_μ M_μ |i⟩_A A⟨j| M_μ†,    (3.103)
then
($_A ⊗ I_B)(|ψ̃⟩_AB AB⟨ψ̃|) = Σ_{i,j} Σ_μ (M_μ|i⟩_A ⊗ |i′⟩_B)(A⟨j|M_μ† ⊗ B⟨j′|)
    = Σ_μ q_μ |Φ̃_μ⟩_AB AB⟨Φ̃_μ|,    (3.104)
where
√q_μ |Φ̃_μ⟩_AB = Σ_i M_μ|i⟩_A ⊗ |i′⟩_B. )    (3.105)
Now consider two such ensembles (or, correspondingly, two operator-sum representations of $_A), {√q_μ |Φ̃_μ⟩_AB} and {√p_a |Ψ̃_a⟩_AB}. For each ensemble, there is a corresponding "purification" in H_AB ⊗ H_C:
Σ_μ √q_μ |Φ̃_μ⟩_AB ⊗ |γ_μ⟩_C,
Σ_a √p_a |Ψ̃_a⟩_AB ⊗ |α_a⟩_C,    (3.106)
where {|γ_μ⟩_C} and {|α_a⟩_C} are two different orthonormal sets in H_C. The GHJW theorem asserts that these two purifications are related by 1_AB ⊗ U′_C, a unitary transformation on H_C. Therefore,
Σ_a √p_a |Ψ̃_a⟩_AB ⊗ |α_a⟩_C = Σ_μ √q_μ |Φ̃_μ⟩_AB ⊗ U′_C|γ_μ⟩_C
    = Σ_{μ,a} √q_μ |Φ̃_μ⟩_AB ⊗ U_{μa}|α_a⟩_C,    (3.107)
where, to establish the second equality, we note that the orthonormal bases {|γ_μ⟩_C} and {|α_a⟩_C} are related by a unitary transformation, and that a product of unitary transformations is unitary. We conclude that
√p_a |Ψ̃_a⟩_AB = Σ_μ √q_μ |Φ̃_μ⟩_AB U_{μa},    (3.108)
(where U_{μa} is unitary) from which follows
N_a = Σ_μ M_μ U_{μa}.    (3.109)
Remark. Since we have already established that we can proceed from an operator-sum representation of $ to a unitary representation, we have now found that any "reasonable" evolution law for density operators on H_A can
be realized by a unitary transformation U_AB that acts on H_A ⊗ H_B according to
U_AB : |ψ⟩_A ⊗ |0⟩_B → Σ_μ M_μ|ψ⟩_A ⊗ |μ⟩_B.    (3.110)
Is this result surprising? Perhaps it is. We may interpret a superoperator as describing the evolution of a system (A) that interacts with its environment (B). The general states of system plus environment are entangled states. But in eq. (3.110), we have assumed an initial state of A and B that is unentangled. Apparently, though a real system is bound to be entangled with its surroundings, for the purpose of describing the evolution of its density matrix there is no loss of generality if we imagine that there is no pre-existing entanglement when we begin to track the evolution!
Remark: The operator-sum representation provides a very convenient way to express any completely positive $. But a positive $ does not admit such a representation if it is not completely positive. As far as I know, there is no convenient way, comparable to the Kraus representation, to express the most general positive $.
If an error occurs, then |ψ⟩ evolves to an ensemble of the three states σ₁|ψ⟩, σ₂|ψ⟩, σ₃|ψ⟩, all occurring with equal likelihood.
Unitary representation
The depolarizing channel can be represented by a unitary operator acting on H_A ⊗ H_E, where H_E has dimension 4. (I am calling it H_E here to encourage you to think of the auxiliary system as the environment.) The unitary operator U_AE acts as
U_AE : |ψ⟩_A ⊗ |0⟩_E → √(1 − p) |ψ⟩_A ⊗ |0⟩_E + √(p/3) [ σ₁|ψ⟩_A ⊗ |1⟩_E + σ₂|ψ⟩_A ⊗ |2⟩_E + σ₃|ψ⟩_A ⊗ |3⟩_E ].    (3.111)
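Reading off the Kraus operators from eq. (3.111) gives M₀ = √(1−p) 1 and M_k = √(p/3) σ_k. A short numerical sketch (my addition; p and the input polarization are arbitrary choices) checks the normalization condition and shows that the polarization vector shrinks by the factor 1 − 4p/3, anticipating the Bloch-sphere picture below.

```python
import numpy as np

p = 0.3                                               # error probability (arbitrary choice)
I2 = np.eye(2, dtype=complex)
s = [np.array([[0, 1], [1, 0]], dtype=complex),
     np.array([[0, -1j], [1j, 0]]),
     np.array([[1, 0], [0, -1]], dtype=complex)]

# Kraus operators read off from eq. (3.111): M_0 = sqrt(1-p) 1, M_k = sqrt(p/3) sigma_k.
M = [np.sqrt(1 - p) * I2] + [np.sqrt(p / 3) * sk for sk in s]
print(np.allclose(sum(m.conj().T @ m for m in M), I2))            # sum M^dag M = 1

# Apply the channel to rho = (1/2)(1 + P.sigma) and read off the output polarization.
P = np.array([0.3, -0.5, 0.6])                                    # arbitrary Bloch vector
rho = 0.5 * (I2 + sum(P[k] * s[k] for k in range(3)))
rho_out = sum(m @ rho @ m.conj().T for m in M)
P_out = np.array([np.trace(rho_out @ sk).real for sk in s])
print(np.allclose(P_out, (1 - 4*p/3) * P))                        # polarization shrinks uniformly
```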
Bloch-sphere representation
This will be worked out in a homework exercise.
Interpretation
We might interpret the phase-damping channel as describing a heavy "classical" particle (e.g., an interstellar dust grain) interacting with a background gas of light particles (e.g., the 3 K microwave photons). We can imagine that the dust is initially prepared in a superposition of position eigenstates |ψ⟩ = (1/√2)(|x⟩ + |−x⟩) (or more generally a superposition of position-space wavepackets with little overlap). We might be able to monitor the behavior of the dust particle, but it is hopeless to keep track of the quantum state of all the photons that scatter from the particle; for our purposes, the quantum state of the particle is described by the density matrix ρ obtained by tracing over the photon degrees of freedom.
Our analysis of the phase damping channel indicates that if photons are scattered by the dust particle at a rate Γ, then the off-diagonal terms in ρ decay like exp(−Γt), and so become completely negligible for t ≫ Γ⁻¹. At that point, the coherence of the superposition of position eigenstates is completely lost -- there is no chance that we can recombine the wavepackets and induce them to interfere. (If we attempt to do a double-slit interference pattern with dust grains, we will not see any interference pattern if it takes a time t ≫ Γ⁻¹ for the grain to travel from the source to the screen.)
The dust grain is heavy. Because of its large inertia, its state of motion is little affected by the scattered photons. Thus, there are two disparate time scales relevant to its dynamics. On the one hand, there is a damping time scale, the time for a significant amount of the particle's momentum to be transferred to the photons; this is a long time if the particle is heavy. On the other hand, there is the decoherence time scale. In this model, the time scale for decoherence is of order Γ⁻¹, the time for a single photon to be scattered by the dust grain, which is far shorter than the damping time scale. For a
macroscopic object, decoherence is fast.
As we have already noted, the phase-damping channel picks out a preferred basis for decoherence, which in our "interpretation" we have assumed to be the position-eigenstate basis. Physically, decoherence prefers the spatially localized states of the dust grain because the interactions of photons and grains are localized in space. Grains in distinguishable positions tend to scatter the photons of the environment into mutually orthogonal states.
Even if the separation between the "grains" were so small that it could not be resolved very well by the scattered photons, the decoherence process would still work in a similar way. Perhaps photons that scatter off grains at positions $x$ and $-x$ are not mutually orthogonal, but instead have an overlap
$$
\langle \gamma_+ | \gamma_- \rangle = 1 - \varepsilon, \qquad \varepsilon \ll 1. \qquad (3.128)
$$
The phase-damping channel would still describe this situation, but with $p$ replaced by $p\varepsilon$ (if $p$ is still the probability of a scattering event). Thus, the decoherence rate would become $\Gamma_{\rm dec} = \varepsilon\,\Gamma_{\rm scat}$, where $\Gamma_{\rm scat}$ is the scattering rate (see the homework).
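A minimal numerical sketch (my own, with assumed values of p, the overlap deficit, and the scattering rate): after the mean number of scattering events accumulated by time t, the off-diagonal element of $\rho$ is suppressed by $(1 - p\varepsilon)^{\Gamma_{\rm scat} t} \approx e^{-\varepsilon \Gamma_{\rm scat} t}$, which is the statement $\Gamma_{\rm dec} = \varepsilon\,\Gamma_{\rm scat}$.

    import numpy as np

    p, eps = 1.0, 0.01            # per-event scattering probability, overlap deficit (assumed)
    gamma_scat = 1.0              # scattering rate, arbitrary units
    t = np.linspace(0, 300, 4)
    n_events = gamma_scat * t     # mean number of scattering events by time t

    print((1 - p * eps) ** n_events)        # actual suppression of the off-diagonal term
    print(np.exp(-eps * gamma_scat * t))    # exp(-Gamma_dec t), Gamma_dec = eps * Gamma_scat
    # the two agree to a few per cent for small eps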
The intuition we distill from this simple model applies to a vast variety of physical situations. A coherent superposition of macroscopically distinguishable states of a "heavy" object decoheres very rapidly compared to its damping rate. The spatial locality of the interactions of the system with its environment gives rise to a preferred "local" basis for decoherence. Presumably, the same principles would apply to the decoherence of a "cat state" $\frac{1}{\sqrt{2}}(|{\rm dead}\rangle + |{\rm alive}\rangle)$, since "deadness" and "aliveness" can be distinguished by localized probes.
9 The nth level of excitation of the oscillator may be interpreted as a state of n noninteracting particles; the rate is $n\Gamma$ because any one of the n particles can decay.
10 This model extends our discussion of the amplitude-damping channel to a damped oscillator rather than a damped qubit.
$$
= \Gamma\, {\rm tr}\Big(\tfrac{1}{2}[a^\dagger, a]\, a\, \rho_I\Big) = -\frac{\Gamma}{2}\, {\rm tr}(a\rho_I) = -\frac{\Gamma}{2}\langle \tilde a \rangle. \qquad (3.161)
$$
Integrating this equation, we obtain
$$
\langle \tilde a(t)\rangle = e^{-\Gamma t/2}\, \langle \tilde a(0)\rangle. \qquad (3.162)
$$
Similarly, the occupation number of the oscillator $n \equiv a^\dagger a = \tilde a^\dagger \tilde a$ decays according to
$$
\frac{d}{dt}\langle n\rangle = \frac{d}{dt}\langle \tilde a^\dagger \tilde a\rangle = {\rm tr}\,(a^\dagger a\, \dot\rho_I)
= \Gamma\, {\rm tr}\Big(a^\dagger a\, a\rho_I a^\dagger - \tfrac{1}{2} a^\dagger a\, a^\dagger a\, \rho_I - \tfrac{1}{2} a^\dagger a\, \rho_I\, a^\dagger a\Big)
= \Gamma\, {\rm tr}\big(a^\dagger [a^\dagger, a]\, a\, \rho_I\big) = -\Gamma\, {\rm tr}(a^\dagger a \rho_I) = -\Gamma \langle n\rangle, \qquad (3.163)
$$
which integrates to
$$
\langle n(t)\rangle = e^{-\Gamma t}\, \langle n(0)\rangle. \qquad (3.164)
$$
Thus $\Gamma$ is the damping rate of the oscillator. We can interpret the nth excitation state of the oscillator as a state of n noninteracting particles, each with a decay probability $\Gamma$ per unit time; hence eq. (3.164) is just the exponential law satisfied by the population of decaying particles.
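A short numerical check of eq. (3.164) (my own sketch, assuming a Fock space truncated at 20 levels, an initial n = 3 Fock state, and simple Euler integration of the zero-temperature master equation of eq. (3.210)):

    import numpy as np

    N, Gamma, dt, steps = 20, 1.0, 1e-3, 2000
    a = np.diag(np.sqrt(np.arange(1, N)), k=1)      # truncated annihilation operator
    ad = a.conj().T
    num = ad @ a

    rho = np.zeros((N, N)); rho[3, 3] = 1.0         # start in the n = 3 Fock state (arbitrary)
    for _ in range(steps):                          # Euler step of the Lindblad equation
        rho = rho + dt * Gamma * (a @ rho @ ad - 0.5 * (num @ rho + rho @ num))

    t = steps * dt
    print(np.trace(num @ rho).real, 3 * np.exp(-Gamma * t))   # both ~ 3*exp(-2) ~ 0.406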
More interesting is what the master equation tells us about decoherence. The details of that analysis will be a homework exercise. But we will analyze here a simpler problem: an oscillator undergoing phase damping.
$$
|{\rm No~decay}\rangle_{\rm atom}\, |{\rm Alive}\rangle_{\rm cat}\, |{\rm Know~it's~Alive}\rangle_{\rm me}, \qquad {\rm Prob} = \tfrac{1}{2}. \qquad (3.178)
$$
This describes two alternatives, but for either alternative, I am certain
about the health of the cat. I never see a cat that is half alive and half dead.
(I am in an eigenstate of the \certainty operator," in accord with experience.)
By assuming that the wave function describes reality and that all evo-
lution is unitary, we are led to the \many-worlds interpretation" of quan-
tum theory. In this picture, each time there is a \measurement," the wave
function of the universe \splits" into two branches, corresponding to the
two possible outcomes. After many measurements, there are many branches
(many worlds), all with an equal claim to describing reality. This prolifera-
tion of worlds seems like an ironic consequence of our program to develop the
most economical possible description. But we ourselves follow one particular
branch, and for the purpose of predicting what we will see in the next instant,
the many other branches are of no consequence. The proliferation of worlds
comes at no cost to us. The \many worlds" may seem weird, but should
we be surprised if a complete description of reality, something completely
foreign to our experience, seems weird to us?
By including ourselves in the reality described by the wave function, we
have understood why we perceive a de nite outcome to a measurement, but
there is still a further question: how does the concept of probability enter
into this (deterministic) formalism? This question remains troubling, for to
answer it we must be prepared to state what is meant by \probability."
The word \probability" is used in two rather di erent senses. Sometimes
probability means frequency. We say the probability of a coin coming up
heads is 1=2 if we expect, as we toss the coin many times, the number of
heads divided by the total number of tosses to converge to 1=2. (This is a
tricky concept though; even if the probability is 1=2, the coin still might come
up heads a trillion times in a row.) In rigorous mathematical discussions,
probability theory often seems to be a branch of measure theory { it concerns
the properties of in nite sequences.
But in everyday life, and also in quantum theory, probabilities typically
are not frequencies. When we make a measurement, we do not repeat it
an in nite number of times on identically prepared systems. In the Everett
viewpoint, or in cosmology, there is just one universe, not many identically
prepared ones.
So what is a probability? In practice, it is a number that quantifies the plausibility of a proposition given a state of knowledge. Perhaps surprisingly, this view can be made the basis of a well-defined mathematical theory, sometimes called the "Bayesian" view of probability. The term "Bayesian" reflects the way probability theory is typically used (both in science and in everyday life): to test a hypothesis given some observed data. Hypothesis testing is carried out using Bayes's rule for conditional probability
$$
P(A_0|B) = P(B|A_0)\, P(A_0) / P(B). \qquad (3.179)
$$
For example, suppose that $A_0$ is the preparation of a particular quantum state, and $B$ is a particular outcome of a measurement of the state. We have made the measurement (obtaining $B$) and now we want to infer how the state was prepared (compute $P(A_0|B)$). Quantum mechanics allows us to compute $P(B|A_0)$. But it does not tell us $P(A_0)$ (or $P(B)$). We have to make a guess of $P(A_0)$, which is possible if we adopt a "principle of indifference": if we have no knowledge that $A_i$ is more or less likely than $A_j$, we assume $P(A_i) = P(A_j)$. Once an ensemble of preparations is chosen, we can compute
$$
P(B) = \sum_i P(B|A_i)\, P(A_i), \qquad (3.180)
$$
and so obtain $P(A_0|B)$ by applying Bayes's rule.
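A toy numerical illustration of this inference (my own example, not from the notes): two equally likely preparations, spin up along z or spin up along x, and a $\sigma_3$ measurement that yields the outcome "up".

    # Hypothetical example of Bayes's rule (eq. 3.179) with a principle-of-indifference prior
    priors = {"up_z": 0.5, "up_x": 0.5}
    likelihood_up = {"up_z": 1.0, "up_x": 0.5}   # P("up" | preparation), from quantum mechanics

    p_up = sum(likelihood_up[a] * priors[a] for a in priors)           # eq. (3.180)
    posterior = {a: likelihood_up[a] * priors[a] / p_up for a in priors}
    print(posterior)   # {'up_z': 2/3, 'up_x': 1/3}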
But if our attitude is that probability theory quantifies plausibility given a state of knowledge, we are obligated to ask "whose state of knowledge?" To recover an objective theory, we must interpret probability in quantum theory not as a prediction based on our actual state of knowledge, but rather as a prediction based on the most complete possible knowledge about the quantum state. If we prepare $|\uparrow_x\rangle$ and measure $\sigma_3$, then we say that the result is $|\uparrow_z\rangle$ with probability 1/2, not because that is the best prediction we can make based on what we know, but because it is the best prediction anyone can make, no matter how much they know. It is in this sense that the outcome is truly random; it cannot be predicted with certainty even when our knowledge is complete (in contrast to the pseudo-randomness that arises in classical physics because our knowledge is incomplete).
So how, now, are we to extract probabilities from Everett's deterministic
universe? Probabilities arise because we (a part of the system) cannot predict
our future with certainty. I know the formalism, I know the Hamiltonian and
wave function of the universe, I know my branch of the wave function. Now
I am about to look at the cat. A second from now, I will either be certain
that the cat is dead or certain that it is alive. Yet even with all I
know, I cannot predict the future. Even with complete knowledge about the
present, I cannot say what my state of knowledge will be after I look at the
cat. The best I can do is assign probabilities to the outcomes. So, while the
wave function of the universe is deterministic I, as a part of the system, can
do no better than making probabilistic predictions.
Of course, as already noted, decoherence is a crucial part of this story.
We may consistently assign probabilities to the alternatives Dead and Alive
only if there is no (or at least negligible) possibility of interference among the
alternatives. Probabilities make sense only when we can identify an exhaus-
tive set of mutually exclusive alternatives. Since the issue is really whether
interference might arise at a later time, we cannot decide whether probabil-
ity theory applies by considering a quantum state at a xed time; we must
examine a set of mutually exclusive (coarse-grained) histories, or sequences
of events. There is a sophisticated technology (\decoherence functionals")
for adjudicating whether the various histories decohere to a sucient extent
for probabilities to be sensibly assigned.
So the Everett viewpoint can be reconciled with the quantum indeter-
minism that we observe, but there is still a troubling gap in the picture, at
least as far as I can tell. I am about to look at the cat, and I know that the
density matrix a second from now will be
We can compute
$$
\langle \psi_x^{(N)} | \bar\sigma_3 | \psi_x^{(N)} \rangle = 0,
$$
$$
\langle \psi_x^{(N)} | \left(\bar\sigma_3\right)^2 | \psi_x^{(N)} \rangle
= \frac{1}{N^2} \sum_{i,j} \langle \psi_x^{(N)} | \sigma_3^{(i)} \sigma_3^{(j)} | \psi_x^{(N)} \rangle
= \frac{1}{N^2} \sum_{i,j} \delta_{ij} = \frac{N}{N^2} = \frac{1}{N}. \qquad (3.186)
$$
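A quick numerical check (my own, assuming the state is the product state with every spin pointing along +x): the expectation of the averaged operator $\bar\sigma_3 = \frac{1}{N}\sum_i \sigma_3^{(i)}$ vanishes and its variance is 1/N.

    import numpy as np
    from functools import reduce

    N = 4
    sz = np.diag([1.0, -1.0])
    up_x = np.array([1.0, 1.0]) / np.sqrt(2)
    psi = reduce(np.kron, [up_x] * N)               # |up_x> tensored N times

    def sz_on(i):                                   # sigma_3 acting on spin i
        ops = [np.eye(2)] * N
        ops[i] = sz
        return reduce(np.kron, ops)

    sbar = sum(sz_on(i) for i in range(N)) / N      # averaged operator
    print(psi @ sbar @ psi)                         # 0
    print(psi @ sbar @ sbar @ psi)                  # 1/N = 0.25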
3.7 Summary
POVM. If we restrict our attention to a subspace of a larger Hilbert space, then an orthogonal (Von Neumann) measurement performed on the larger space cannot in general be described as an orthogonal measurement on the subspace. Rather, it is a generalized measurement or POVM; the outcome $a$ occurs with a probability
$$
{\rm Prob}(a) = {\rm tr}\,(F_a \rho), \qquad (3.191)
$$
where $\rho$ is the density matrix of the subsystem, each $F_a$ is a positive hermitian operator, and the $F_a$'s satisfy
$$
\sum_a F_a = 1. \qquad (3.192)
$$
A POVM in $\mathcal{H}_A$ can be realized as a unitary transformation on the tensor product $\mathcal{H}_A \otimes \mathcal{H}_B$, followed by an orthogonal measurement in $\mathcal{H}_B$.
Superoperator. Unitary evolution on $\mathcal{H}_A \otimes \mathcal{H}_B$ will not in general appear to be unitary if we restrict our attention to $\mathcal{H}_A$ alone. Rather, evolution in $\mathcal{H}_A$ will be described by a superoperator (which can be inverted by another superoperator only if it is unitary). A general superoperator \$ has an operator-sum (Kraus) representation:
$$
\$ : \rho \to \$(\rho) = \sum_\mu M_\mu\, \rho\, M_\mu^\dagger, \qquad (3.193)
$$
where
$$
\sum_\mu M_\mu^\dagger M_\mu = 1. \qquad (3.194)
$$
In fact, any reasonable (linear and completely positive) mapping of density matrices to density matrices has unitary and operator-sum representations.
Decoherence. Decoherence, the decay of quantum information due to the interaction of a system with its environment, can be described by a superoperator. If the environment frequently "scatters" off the system, and the state of the environment is not monitored, then off-diagonal terms in the density matrix of the system decay rapidly in a preferred basis (typically a spatially localized basis selected by the nature of the coupling of the system to the environment). The time scale for decoherence is set by the scattering rate, which may be much larger than the damping rate for the system.
Master Equation. When the relevant dynamical time scale of an open quantum system is long compared to the time for the environment to "forget" quantum information, the evolution of the system is effectively local in time (the Markovian approximation). Much as general unitary evolution is generated by a Hamiltonian, a general Markovian superoperator is generated by a Lindbladian $\mathcal{L}$ as described by the master equation:
$$
\dot\rho = \mathcal{L}[\rho] = -i[H, \rho] + \sum_\mu \left( L_\mu \rho L_\mu^\dagger - \tfrac{1}{2} L_\mu^\dagger L_\mu \rho - \tfrac{1}{2} \rho L_\mu^\dagger L_\mu \right). \qquad (3.195)
$$
Here each Lindblad operator (or quantum jump operator) $L_\mu$ represents a "quantum jump" that could in principle be detected if we monitored the environment faithfully. By solving the master equation, we can compute the decoherence rate of an open system.
3.8 Exercises
3.1 Realization of a POVM
Consider the POVM defined by the four positive operators
$$
P_1 = \tfrac{1}{2}\,|\!\uparrow_{\hat z}\rangle\langle\uparrow_{\hat z}\!|, \quad
P_2 = \tfrac{1}{2}\,|\!\downarrow_{\hat z}\rangle\langle\downarrow_{\hat z}\!|, \quad
P_3 = \tfrac{1}{2}\,|\!\uparrow_{\hat x}\rangle\langle\uparrow_{\hat x}\!|, \quad
P_4 = \tfrac{1}{2}\,|\!\downarrow_{\hat x}\rangle\langle\downarrow_{\hat x}\!|. \qquad (3.196)
$$
Show how this POVM can be realized as an orthogonal measurement
in a two-qubit Hilbert space, if one ancilla spin is introduced.
3.2 Invertibility of superoperators
The purpose of this exercise is to show that a superoperator is invertible only if it is unitary. Recall that any superoperator has an operator-sum representation; it acts on a pure state as
$$
\mathcal{M}(|\psi\rangle\langle\psi|) = \sum_\mu M_\mu |\psi\rangle\langle\psi| M_\mu^\dagger, \qquad (3.197)
$$
where $\sum_\mu M_\mu^\dagger M_\mu = 1$. Another superoperator $\mathcal{N}$ is said to be the inverse of $\mathcal{M}$ if $\mathcal{N} \circ \mathcal{M} = \mathcal{I}$, or
$$
\sum_{\mu, a} N_a M_\mu |\psi\rangle\langle\psi| M_\mu^\dagger N_a^\dagger = |\psi\rangle\langle\psi|, \qquad (3.198)
$$
for any $|\psi\rangle$. It follows that
$$
\sum_{\mu, a} |\langle\psi| N_a M_\mu |\psi\rangle|^2 = 1. \qquad (3.199)
$$
$$
M_2 = \sqrt{p}\; \tfrac{1}{2}\left( 1 - \sigma_3 \right). \qquad (3.203)
$$
a) Find an alternative representation using only two Kraus operators $N_0, N_1$.
b) Find a unitary $3\times 3$ matrix $U_{\mu a}$ such that your Kraus operators found in (a) (augmented by $N_2 = 0$) are related to $M_{0,1,2}$ by
$$
M_\mu = \sum_a U_{\mu a} N_a. \qquad (3.204)
$$
c) Consider a single-qubit channel with a unitary representation
$$
|0\rangle_A \otimes |0\rangle_E \to \sqrt{1-p}\; |0\rangle_A \otimes |0\rangle_E + \sqrt{p}\; |0\rangle_A \otimes |\gamma_0\rangle_E,
$$
$$
|1\rangle_A \otimes |0\rangle_E \to \sqrt{1-p}\; |1\rangle_A \otimes |0\rangle_E + \sqrt{p}\; |1\rangle_A \otimes |\gamma_1\rangle_E, \qquad (3.205)
$$
where $|\gamma_0\rangle_E$ and $|\gamma_1\rangle_E$ are normalized states, both orthogonal to $|0\rangle_E$, that satisfy
$$
{}_E\langle \gamma_0 | \gamma_1 \rangle_E = 1 - \varepsilon, \qquad 0 < \varepsilon < 1. \qquad (3.206)
$$
Show that this is again the phase-damping channel, and find its operator-sum representation with two Kraus operators.
d) Suppose that the channel in (c) describes what happens to the qubit when a single photon scatters from it. Find the decoherence rate $\Gamma_{\rm decoh}$ in terms of the scattering rate $\Gamma_{\rm scatt}$.
3.6 Decoherence on the Bloch sphere
Parametrize the density matrix of a single qubit as
$$
\rho = \tfrac{1}{2}\left( 1 + \vec{P} \cdot \vec{\sigma} \right). \qquad (3.207)
$$
a) Describe what happens to $\vec{P}$ under the action of the phase-damping channel.
b) Describe what happens to $\vec{P}$ under the action of the amplitude-damping channel defined by the Kraus operators
$$
M_0 = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1-p} \end{pmatrix}, \qquad
M_1 = \begin{pmatrix} 0 & \sqrt{p} \\ 0 & 0 \end{pmatrix}. \qquad (3.208)
$$
c) The same for the "two-Pauli channel":
$$
M_0 = \sqrt{1-p}\; 1, \qquad
M_1 = \sqrt{\tfrac{p}{2}}\; \sigma_1, \qquad
M_2 = \sqrt{\tfrac{p}{2}}\; \sigma_3. \qquad (3.209)
$$
3.7 Decoherence of the damped oscillator
We saw in class that, for an oscillator that can emit quanta into a zero-temperature reservoir, the interaction-picture density matrix $\rho_I(t)$ of the oscillator obeys the master equation
$$
\dot\rho_I = \Gamma \left( a \rho_I a^\dagger - \tfrac{1}{2}\, a^\dagger a\, \rho_I - \tfrac{1}{2}\, \rho_I\, a^\dagger a \right), \qquad (3.210)
$$
where $a$ is the annihilation operator of the oscillator.
a) Consider the quantity
$$
X(\lambda, t) = {\rm tr}\left[ \rho_I(t)\, e^{\lambda a^\dagger} e^{-\lambda^* a} \right] \qquad (3.211)
$$
(where $\lambda$ is a complex number). Use the master equation to derive and solve a differential equation for $X(\lambda, t)$. You should find
$$
X(\lambda, t) = X(\lambda', 0), \qquad (3.212)
$$
where $\lambda'$ is a function of $\lambda$, $\Gamma$, and $t$. What is this function $\lambda'(\lambda, \Gamma, t)$?
b) Now suppose that a "cat state" of the oscillator is prepared at $t = 0$:
$$
|{\rm cat}\rangle = \frac{1}{\sqrt{2}}\left( |\alpha_1\rangle + |\alpha_2\rangle \right), \qquad (3.213)
$$
where $|\alpha\rangle$ denotes the coherent state
$$
|\alpha\rangle = e^{-|\alpha|^2/2}\, e^{\alpha a^\dagger} |0\rangle. \qquad (3.214)
$$
Use the result of (a) to infer the density matrix at a later time $t$. Assuming $\Gamma t \ll 1$, at what rate do the off-diagonal terms in $\rho$ decay (in this coherent-state basis)?
Chapter 4
Quantum Entanglement
4.1 Nonseparability of EPR pairs
4.1.1 Hidden quantum information
The deep ways that quantum information differs from classical information involve the properties, implications, and uses of quantum entanglement. Recall from §2.4.1 that a bipartite pure state is entangled if its Schmidt number is greater than one. Entangled states are interesting because they exhibit correlations that have no classical analog. We will begin the study of these correlations in this chapter.
Recall, for example, the maximally entangled state of two qubits defined in §3.4.1:
$$
|\phi^+\rangle_{AB} = \frac{1}{\sqrt{2}}\left( |00\rangle_{AB} + |11\rangle_{AB} \right). \qquad (4.1)
$$
"Maximally entangled" means that when we trace over qubit $B$ to find the density operator $\rho_A$ of qubit $A$, we obtain a multiple of the identity operator
$$
\rho_A = {\rm tr}_B\left( |\phi^+\rangle_{AB}\; {}_{AB}\langle \phi^+| \right) = \tfrac{1}{2}\, 1_A \qquad (4.2)
$$
(and similarly $\rho_B = \tfrac{1}{2}\, 1_B$). This means that if we measure spin $A$ along any axis, the result is completely random; we find spin up with probability 1/2 and spin down with probability 1/2. Therefore, if we perform any local measurement of $A$ or $B$, we acquire no information about the preparation of the state; instead we merely generate a random bit. This situation contrasts
sharply with the case of a single qubit in a pure state; there we can store a bit by preparing, say, either $|\uparrow_{\hat n}\rangle$ or $|\downarrow_{\hat n}\rangle$, and we can recover that bit reliably by measuring along the $\hat n$-axis. With two qubits, we ought to be able to store two bits, but in the state $|\phi^+\rangle_{AB}$ this information is hidden; at least, we can't acquire it by measuring $A$ or $B$.
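A short numpy check of eq. (4.2) (my own illustration): tracing qubit B out of $|\phi^+\rangle$ leaves the maximally mixed state on A, so no local measurement reveals the preparation.

    import numpy as np

    phi_plus = np.zeros(4); phi_plus[0] = phi_plus[3] = 1 / np.sqrt(2)   # (|00> + |11>)/sqrt(2)
    rho_AB = np.outer(phi_plus, phi_plus)

    # partial trace over qubit B: reshape to (A, B, A', B') and trace the B indices
    rho_A = np.trace(rho_AB.reshape(2, 2, 2, 2), axis1=1, axis2=3)
    print(rho_A)     # [[0.5, 0], [0, 0.5]]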
In fact, $|\phi^+\rangle$ is one member of a basis of four mutually orthogonal states for the two qubits, all of which are maximally entangled: the basis
$$
|\phi^\pm\rangle = \frac{1}{\sqrt{2}}\left( |00\rangle \pm |11\rangle \right), \qquad
|\psi^\pm\rangle = \frac{1}{\sqrt{2}}\left( |01\rangle \pm |10\rangle \right), \qquad (4.3)
$$
introduced in §3.4.1. We can choose to prepare one of these four states, thus encoding two bits in the state of the two-qubit system. One bit is the parity bit ($\phi$ or $\psi$): are the two spins aligned or antialigned? The other is the phase bit ($+$ or $-$): which superposition was chosen of the two states of like parity. Of course, we can recover the information by performing an orthogonal measurement that projects onto the $\{|\phi^+\rangle, |\phi^-\rangle, |\psi^+\rangle, |\psi^-\rangle\}$ basis. But if the two qubits are distantly separated, we cannot acquire this information locally; that is, by measuring $A$ or measuring $B$.
What we can do locally is manipulate this information. Suppose that Alice has access to qubit $A$, but not qubit $B$. She may apply $\sigma_3$ to her qubit, flipping the relative phase of $|0\rangle_A$ and $|1\rangle_A$. This action flips the phase bit stored in the entangled state:
$$
|\phi^+\rangle \leftrightarrow |\phi^-\rangle, \qquad |\psi^+\rangle \leftrightarrow |\psi^-\rangle. \qquad (4.4)
$$
On the other hand, she can apply $\sigma_1$, which flips her spin ($|0\rangle_A \leftrightarrow |1\rangle_A$), and also flips the parity bit of the entangled state:
$$
|\phi^+\rangle \leftrightarrow |\psi^+\rangle, \qquad |\phi^-\rangle \leftrightarrow -|\psi^-\rangle. \qquad (4.5)
$$
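A minimal numpy check of eqs. (4.4) and (4.5) (my own illustration): applying $\sigma_3$ or $\sigma_1$ to Alice's qubit alone toggles the phase bit or the parity bit of the Bell state.

    import numpy as np

    sx = np.array([[0, 1], [1, 0]]); sz = np.diag([1, -1]); I2 = np.eye(2)

    def bell(parity, sign):           # |phi^+->  (parity 0) and |psi^+-> (parity 1)
        v = np.zeros(4)
        v[0 if parity == 0 else 1] = 1 / np.sqrt(2)
        v[3 if parity == 0 else 2] = sign / np.sqrt(2)
        return v

    phi_plus, phi_minus = bell(0, +1), bell(0, -1)
    psi_plus = bell(1, +1)

    print(np.allclose(np.kron(sz, I2) @ phi_plus, phi_minus))   # sigma_3 on A flips the phase bit
    print(np.allclose(np.kron(sx, I2) @ phi_plus, psi_plus))    # sigma_1 on A flips the parity bit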
Bob can manipulate the entangled state similarly. In fact, as we discussed in §2.4, either Alice or Bob can perform a local unitary transformation that changes one maximally entangled state to any other maximally entangled state.1 What their local unitary transformations cannot do is alter $\rho_A = \rho_B = \tfrac{1}{2}\,1$; the information they are manipulating is information that neither one can read.
But now suppose that Alice and Bob are able to exchange (classical) messages about their measurement outcomes; together, then, they can learn about how their measurements are correlated. The entangled basis states are conveniently characterized as the simultaneous eigenstates of two commuting observables:
$$
\sigma_1^{(A)} \sigma_1^{(B)}, \qquad \sigma_3^{(A)} \sigma_3^{(B)}; \qquad (4.6)
$$
the eigenvalue of $\sigma_3^{(A)}\sigma_3^{(B)}$ is the parity bit, and the eigenvalue of $\sigma_1^{(A)}\sigma_1^{(B)}$ is the phase bit. Since these operators commute, they can in principle be measured simultaneously. But they cannot be measured simultaneously if Alice and Bob perform localized measurements. Alice and Bob could both choose to measure their spins along the z-axis, preparing a simultaneous eigenstate of $\sigma_3^{(A)}$ and $\sigma_3^{(B)}$. Since $\sigma_3^{(A)}$ and $\sigma_3^{(B)}$ both commute with the parity operator $\sigma_3^{(A)}\sigma_3^{(B)}$, their orthogonal measurements do not disturb the parity bit, and they can combine their results to infer the parity bit. But $\sigma_3^{(A)}$ and $\sigma_3^{(B)}$ do not commute with the phase operator $\sigma_1^{(A)}\sigma_1^{(B)}$, so their measurement disturbs the phase bit. On the other hand, they could both choose to measure their spins along the x-axis; then they would learn the phase bit at the cost of disturbing the parity bit. But they can't have it both ways. To have any hope of acquiring the parity bit without disturbing the phase bit, they would need to learn about the product $\sigma_3^{(A)}\sigma_3^{(B)}$ without finding out anything about $\sigma_3^{(A)}$ and $\sigma_3^{(B)}$ separately. That cannot be done locally.
Now let us bring Alice and Bob together, so that they can operate on their qubits jointly. How might they acquire both the parity bit and the phase bit of their pair? By applying an appropriate unitary transformation, they can rotate the entangled basis $\{|\phi^\pm\rangle, |\psi^\pm\rangle\}$ to the unentangled basis $\{|00\rangle, |01\rangle, |10\rangle, |11\rangle\}$. Then they can measure qubits $A$ and $B$ separately to acquire the bits they seek. How is this transformation constructed?
1 But of course, this does not suffice to perform an arbitrary unitary transformation on the four-dimensional space $\mathcal{H}_A \otimes \mathcal{H}_B$, which contains states that are not maximally entangled. The maximally entangled states are not a subspace; a superposition of maximally entangled states typically is not maximally entangled.
This is a good time to introduce notation that will be used heavily later in the course, the quantum circuit notation. Qubits are denoted by horizontal lines, and the single-qubit unitary transformation $U$ is denoted by a box on the qubit's line:

    ----[ U ]----

A particular single-qubit unitary we will find useful is the Hadamard transform
$$
H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \frac{1}{\sqrt{2}}\left( \sigma_1 + \sigma_3 \right), \qquad (4.7)
$$
which has the properties
$$
H^2 = 1, \qquad (4.8)
$$
and
$$
H \sigma_1 H = \sigma_3, \qquad H \sigma_3 H = \sigma_1. \qquad (4.9)
$$
(We can envision $H$, up to an overall phase, as a $\theta = \pi$ rotation about the axis $\hat n = \frac{1}{\sqrt{2}}(\hat n_1 + \hat n_3)$ that rotates $\hat x$ to $\hat z$ and vice versa.) The circuit

    ----[ H ]----*----
                 |
    ------------(+)---

(to be read from left to right) represents the product of $H$ applied to the first qubit followed by CNOT with the first bit as the source and the second bit as the target. It is straightforward to see that this circuit transforms the
standard basis to the entangled basis,
$$
\begin{aligned}
|00\rangle &\to \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle)|0\rangle \to |\phi^+\rangle, \\
|01\rangle &\to \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle)|1\rangle \to |\psi^+\rangle, \\
|10\rangle &\to \tfrac{1}{\sqrt{2}}(|0\rangle - |1\rangle)|0\rangle \to |\phi^-\rangle, \\
|11\rangle &\to \tfrac{1}{\sqrt{2}}(|0\rangle - |1\rangle)|1\rangle \to |\psi^-\rangle, \qquad (4.13)
\end{aligned}
$$
so that the first bit becomes the phase bit in the entangled basis, and the second bit becomes the parity bit.
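Here is a small numpy sketch (my own check, not part of the notes) of eq. (4.13): apply H to the first qubit, then CNOT with the first qubit as control, to each computational basis state and compare with the Bell basis.

    import numpy as np

    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])
    circuit = CNOT @ np.kron(H, np.eye(2))            # H on qubit 1, then CNOT

    s = 1 / np.sqrt(2)
    bell = {"phi+": [s, 0, 0, s], "psi+": [0, s, s, 0],
            "phi-": [s, 0, 0, -s], "psi-": [0, s, -s, 0]}
    for col, name in enumerate(["phi+", "psi+", "phi-", "psi-"]):
        basis_state = np.eye(4)[col]                  # |00>, |01>, |10>, |11>
        assert np.allclose(circuit @ basis_state, bell[name])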
Similarly, we can invert the transformation by running the circuit back-
wards (since both CNOT and H square to the identity); if we apply the
inverted circuit to an entangled state, and then measure both bits, we can
learn the value of both the phase bit and the parity bit.
Of course, $H$ acts on only one of the qubits; the "nonlocal" part of our circuit is the controlled-NOT gate: this is the operation that establishes or removes entanglement. If we could only perform an "interstellar CNOT," we would be able to create entanglement among distantly separated pairs, or extract the information encoded in entanglement. But we can't. To do its job, the CNOT gate must act on its target without revealing the value of its source. Local operations and classical communication will not suffice.
4.1.4 Photons
Experiments that test the Bell inequality are done with entangled photons, not with spin-$\frac{1}{2}$ objects. What are the quantum-mechanical predictions for photons?
Suppose, for example, that an excited atom emits two photons that come out back to back, with vanishing angular momentum and even parity. If $|x\rangle$ and $|y\rangle$ are horizontal and vertical linear polarization states of the photon,
then we have seen that
$$
|+\rangle = \frac{1}{\sqrt{2}}\left( |x\rangle + i|y\rangle \right), \qquad
|-\rangle = \frac{1}{\sqrt{2}}\left( i|x\rangle + |y\rangle \right) \qquad (4.23)
$$
are the eigenstates of helicity (angular momentum along the axis of propagation $\hat z$). For two photons, one propagating in the $+\hat z$ direction and the other in the $-\hat z$ direction, the states
$$
|+\rangle_A |-\rangle_B, \qquad |-\rangle_A |+\rangle_B \qquad (4.24)
$$
are invariant under rotations about $\hat z$. (The photons have opposite values of $J_z$, but the same helicity, since they are propagating in opposite directions.)
Under a reflection in the $y$-$z$ plane, the polarization states are modified according to
$$
|x\rangle \to -|x\rangle, \qquad |+\rangle \to +i|-\rangle,
$$
$$
|y\rangle \to |y\rangle, \qquad |-\rangle \to -i|+\rangle; \qquad (4.25)
$$
therefore, the parity eigenstates are the entangled states
$$
\frac{1}{\sqrt{2}}\left( |+\rangle_A |-\rangle_B \pm |-\rangle_A |+\rangle_B \right). \qquad (4.26)
$$
The state with $J_z = 0$ and even parity, then, expressed in terms of the linear polarization states, is
$$
-\frac{i}{\sqrt{2}}\left( |+\rangle_A |-\rangle_B + |-\rangle_A |+\rangle_B \right)
= \frac{1}{\sqrt{2}}\left( |xx\rangle_{AB} + |yy\rangle_{AB} \right) = |\phi^+\rangle_{AB}. \qquad (4.27)
$$
Because of invariance under rotations about z^, the state has this form irre-
spective of how we orient the x and y axes.
We can use a polarization analyzer to measure the linear polarization of
either photon along any axis in the xy plane. Let jx()i and jy()i denote
the linear polarization eigenstates along axes rotated by angle $\theta$ relative to the canonical $x$ and $y$ axes. We may define an operator (the analog of $\vec\sigma \cdot \hat n$)
$$
\tau(\theta) = |x(\theta)\rangle\langle x(\theta)| - |y(\theta)\rangle\langle y(\theta)|, \qquad (4.28)
$$
which has these polarization states as eigenstates with respective eigenvalues $\pm 1$. Since
$$
|x(\theta)\rangle = \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}, \qquad
|y(\theta)\rangle = \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix} \qquad (4.29)
$$
in the $|x\rangle, |y\rangle$ basis, we can easily compute the expectation value
$$
{}_{AB}\langle \phi^+ | \tau^{(A)}(\theta_1)\, \tau^{(B)}(\theta_2) | \phi^+\rangle_{AB}. \qquad (4.30)
$$
Using rotational invariance:
$$
= {}_{AB}\langle \phi^+ | \tau^{(A)}(0)\, \tau^{(B)}(\theta_2 - \theta_1) | \phi^+\rangle_{AB}
= \tfrac{1}{2}\, {}_B\langle x | \tau^{(B)}(\theta_2 - \theta_1) | x\rangle_B
- \tfrac{1}{2}\, {}_B\langle y | \tau^{(B)}(\theta_2 - \theta_1) | y\rangle_B
$$
$$
= \cos^2(\theta_2 - \theta_1) - \sin^2(\theta_2 - \theta_1) = \cos[2(\theta_2 - \theta_1)]. \qquad (4.31)
$$
(For spin-$\frac{1}{2}$ objects, we would obtain
$$
{}_{AB}\langle \phi^+ | (\vec\sigma^{(A)} \cdot \hat n_1)\, (\vec\sigma^{(B)} \cdot \hat n_2) | \phi^+\rangle_{AB} = \hat n_1 \cdot \hat n_2 = \cos(\theta_2 - \theta_1); \qquad (4.32)
$$
the argument of the cosine is different than in the case of photons, because the half angle $\theta/2$ appears in the formula analogous to eq. (4.29).)
$$
b = \left( \vec\sigma^{(B)} \cdot \hat n_2 \right) = \begin{pmatrix} \cos\theta_2 & \sin\theta_2 \\ \sin\theta_2 & -\cos\theta_2 \end{pmatrix}, \qquad (4.61)
$$
so that quantum mechanics predicts
$$
\langle ab \rangle = \langle \phi | ab | \phi \rangle
= \cos\theta_1 \cos\theta_2 + 2\alpha\beta\, \sin\theta_1 \sin\theta_2 \qquad (4.62)
$$
(and we recover $\cos(\theta_1 - \theta_2)$ in the maximally entangled case $\alpha = \beta = 1/\sqrt{2}$).
Now let us consider, for simplicity, the (nonoptimal!) special case
$$
\theta_A = 0, \qquad \theta_{A'} = \frac{\pi}{2}, \qquad \theta_{B'} = -\theta_B, \qquad (4.63)
$$
so that the quantum predictions are:
$$
\langle ab \rangle = \cos\theta_B = \langle ab' \rangle,
$$
$$
\langle a'b \rangle = 2\alpha\beta \sin\theta_B = -\langle a'b' \rangle. \qquad (4.64)
$$
Plugging into the CHSH inequality, we obtain
$$
|\cos\theta_B - 2\alpha\beta \sin\theta_B| \le 1, \qquad (4.65)
$$
and we easily see that violations occur for $\theta_B$ close to 0 or $\pi$. Expanding to linear order in $\theta_B$, the left-hand side is
$$
\simeq 1 - 2\alpha\beta\, \theta_B, \qquad (4.66)
$$
which surely exceeds 1 for $\theta_B$ negative and small.
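A quick numerical illustration (my own, with arbitrarily chosen $\alpha$, $\beta$ and a small negative $\theta_B$) that the partially entangled state violates eq. (4.65):

    import numpy as np

    alpha = 0.9                                 # arbitrary nonmaximal entanglement
    beta = np.sqrt(1 - alpha**2)
    theta_B = -0.1                              # small negative angle

    lhs = abs(np.cos(theta_B) - 2 * alpha * beta * np.sin(theta_B))
    print(lhs)                                  # ~ 1.07 > 1: the inequality is violated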
We have shown, then, that any entangled pure state of two qubits violates
some Bell inequality. It is not hard to generalize the argument to an arbitrary
bipartite pure state. For bipartite pure states, then, \entangled" is equivalent
to \Bell-inequality violating." For bipartite mixed states, however, we will
see shortly that the situation is more subtle.
4.2 Uses of Entanglement
After Bell's work, quantum entanglement became a subject of intensive study
among those interested in the foundations of quantum theory. But more
recently (starting less than ten years ago), entanglement has come to be
viewed not just as a tool for exposing the weirdness of quantum mechanics,
but as a potentially valuable resource. By exploiting entangled quantum
states, we can perform tasks that are otherwise dicult or impossible.
$$
= H(X, Y) - H(Y), \qquad (5.16)
$$
and similarly
$$
H(Y|X) \equiv \langle -\log p(y|x) \rangle = \left\langle -\log \frac{p(x, y)}{p(x)} \right\rangle = H(X, Y) - H(X). \qquad (5.17)
$$
We may interpret $H(X|Y)$, then, as the number of additional bits per letter needed to specify both $x$ and $y$ once $y$ is known. Obviously, then, this quantity cannot be negative.
The information about $X$ that I gain when I learn $Y$ is quantified by how much the number of bits per letter needed to specify $X$ is reduced when $Y$ is known. Thus
$$
I(X; Y) \equiv H(X) - H(X|Y) = H(X) + H(Y) - H(X, Y) = H(Y) - H(Y|X). \qquad (5.18)
$$
$I(X; Y)$ is called the mutual information. It is obviously symmetric under interchange of $X$ and $Y$; I find out as much about $X$ by learning $Y$ as about $Y$
by learning $X$. Learning $Y$ can never reduce my knowledge of $X$, so $I(X; Y)$ is obviously nonnegative. (The inequalities $H(X) \ge H(X|Y) \ge 0$ are easily proved using the convexity of the log function; see for example Elements of Information Theory by T. Cover and J. Thomas.)
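A small Python sketch (my own, for a hypothetical joint distribution of two binary variables) computing the entropies and the mutual information of eq. (5.18):

    import numpy as np

    p_xy = np.array([[0.4, 0.1],       # hypothetical joint distribution p(x, y)
                     [0.1, 0.4]])

    def H(p):                          # Shannon entropy in bits
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    H_X, H_Y, H_XY = H(p_xy.sum(axis=1)), H(p_xy.sum(axis=0)), H(p_xy)
    print(H_X + H_Y - H_XY)            # I(X;Y) ~ 0.278 bits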
Of course, if X and Y are completely uncorrelated, we have p(x; y) =
p(x)p(y), and
C is called the channel capacity and depends only on the conditional probabilities $p(y|x)$ that define the channel.
We have now shown that any rate $R < C$ is attainable, but is it possible for $R$ to exceed $C$ (with the error probability still approaching 0 for large $n$)? To show that $C$ is an upper bound on the rate may seem more subtle in the general case than for the binary symmetric channel; the probability of error is different for different letters, and we are free to exploit this in the design of our code. However, we may reason as follows:
Suppose we have chosen $2^{nR}$ strings of $n$ letters as our codewords. Consider a probability distribution (denoted $\tilde X^n$) in which each codeword occurs with equal probability ($= 2^{-nR}$). Evidently, then,
$$
H(\tilde X^n) = nR. \qquad (5.33)
$$
Sending the codewords through the channel, we obtain a probability distribution $\tilde Y^n$ of output states.
Because we assume that the channel acts on each letter independently, the conditional probability for a string of $n$ letters factorizes:
$$
p(y_1 y_2 \cdots y_n | x_1 x_2 \cdots x_n) = p(y_1|x_1)\, p(y_2|x_2) \cdots p(y_n|x_n), \qquad (5.34)
$$
and it follows that the conditional entropy satisfies
$$
H(\tilde Y^n | \tilde X^n) = \langle -\log p(y^n | x^n) \rangle = \sum_i \langle -\log p(y_i | x_i) \rangle = \sum_i H(\tilde Y_i | \tilde X_i), \qquad (5.35)
$$
where $\tilde X_i$ and $\tilde Y_i$ are the marginal probability distributions for the $i$th letter determined by our distribution on the codewords. Recall that we also know that $H(X, Y) \le H(X) + H(Y)$, or
$$
H(\tilde Y^n) \le \sum_i H(\tilde Y_i). \qquad (5.36)
$$
It follows that
$$
I(\tilde Y^n; \tilde X^n) = H(\tilde Y^n) - H(\tilde Y^n | \tilde X^n)
\le \sum_i \left( H(\tilde Y_i) - H(\tilde Y_i | \tilde X_i) \right)
= \sum_i I(\tilde Y_i; \tilde X_i) \le nC; \qquad (5.37)
$$
the mutual information of the messages sent and received is bounded above by the sum of the mutual information per letter, and the mutual information for each letter is bounded above by the capacity (because $C$ is defined as the maximum of $I(X; Y)$).
Recalling the symmetry of mutual information, we have
$$
I(\tilde X^n; \tilde Y^n) = H(\tilde X^n) - H(\tilde X^n | \tilde Y^n) = nR - H(\tilde X^n | \tilde Y^n) \le nC. \qquad (5.38)
$$
Now, if we can decode reliably as $n \to \infty$, this means that the input codeword is completely determined by the signal received, or that the conditional entropy of the input (per letter) must get small:
$$
\frac{1}{n} H(\tilde X^n | \tilde Y^n) \to 0. \qquad (5.39)
$$
If errorless transmission is possible, then, eq. (5.38) becomes
$$
R \le C, \qquad (5.40)
$$
in the limit $n \to \infty$. The rate cannot exceed the capacity. (Remember that the conditional entropy, unlike the mutual information, is not symmetric. Indeed $(1/n) H(\tilde Y^n | \tilde X^n)$ does not become small, because the channel introduces uncertainty about what message will be received. But if we can decode accurately, there is no uncertainty about what codeword was sent, once the signal has been received.)
We have now shown that the capacity $C$ is the highest rate of communication through the noisy channel that can be attained, where the probability of error goes to zero as the number of letters in the message goes to infinity. This is Shannon's noisy channel coding theorem.
Of course the method we have used to show that $R = C$ is asymptotically attainable (averaging over random codes) is not very constructive. Since a random code has no structure or pattern, encoding and decoding would be quite unwieldy (we require an exponentially large code book). Nevertheless, the theorem is important and useful, because it tells us what is in principle attainable, and furthermore, what is not attainable, even in principle. Also, since $I(X; Y)$ is a concave function of $X = \{x, p(x)\}$ (with $\{p(y|x)\}$ fixed), it has a unique local maximum, and $C$ can often be computed (at least numerically) for channels of interest.
where $E'$ denotes the orthogonal projection onto the subspace $\Lambda'$. The average fidelity therefore obeys
$$
\bar F = \sum_i p_i F_i \le \sum_i p_i \langle \varphi_i | E' | \varphi_i \rangle = {\rm tr}\,(\rho^n E'). \qquad (5.102)
$$
But since $E'$ projects onto a space of dimension $2^{n(S-\delta)}$, ${\rm tr}(\rho^n E')$ can be no larger than the sum of the $2^{n(S-\delta)}$ largest eigenvalues of $\rho^n$. It follows from the properties of typical subspaces that this sum becomes as small as we please; for $n$ large enough,
$$
\bar F \le {\rm tr}\,(\rho^n E') < \varepsilon. \qquad (5.103)
$$
Thus we have shown that, if we attempt to compress to $S - \delta$ qubits per letter, then the fidelity inevitably becomes poor for $n$ sufficiently large. We conclude, then, that $S(\rho)$ qubits per letter is the optimal compression of the quantum information that can be attained if we are to obtain good fidelity as $n$ goes to infinity. This is Schumacher's noiseless quantum coding theorem.
The above argument applies to any conceivable encoding scheme, but only to a restricted class of decoding schemes (unitary decodings). A more general decoding scheme can certainly be contemplated, described by a superoperator. More technology is then required to prove that better compression than $S$ qubits per letter is not possible. But the conclusion is the same. The point is that $n(S - \delta)$ qubits are not sufficient to distinguish all of the typical states.
To summarize, there is a close analogy between Shannon's noiseless cod-
ing theorem and Schumacher's noiseless quantum coding theorem. In the
classical case, nearly all long messages are typical sequences, so we can code
only these and still have a small probability of error. In the quantum case,
nearly all long messages have nearly unit overlap with the typical subspace,
so we can code only the typical subspace and still achieve good delity.
In fact, Alice could send e ectively classical information to Bob|the
string x1x2 xn encoded in mutually orthogonal quantum states|and Bob
could then follow these classical instructions to reconstruct Alice's state.
By this means, they could achieve high- delity compression to H (X ) bits|
or qubits|per letter. But if the letters are drawn from an ensemble of
nonorthogonal pure states, this amount of compression is not optimal; some
of the classical information about the preparation of the state has become re-
dundant, because the nonorthogonal states cannot be perfectly distinguished.
Thus Schumacher coding can go further, achieving optimal compression to
S () qubits per letter. The information has been packaged more eciently,
but at a price|Bob has received what Alice intended, but Bob can't know
what he has. In contrast to the classical case, Bob can't make any measure-
ment that is certain to decipher Alice's message correctly. An attempt to
read the message will unavoidably disturb it.
a spin-$\frac{1}{2}$ object points in one of three directions that are symmetrically distributed in the xz-plane. Each state has a priori probability $\frac{1}{3}$. Evidently, Alice's "signal states" are nonorthogonal:
$$
\langle \varphi_1 | \varphi_2 \rangle = \langle \varphi_1 | \varphi_3 \rangle = \langle \varphi_2 | \varphi_3 \rangle = -\frac{1}{2}. \qquad (5.150)
$$
Bob's task is to find out as much as he can about what Alice prepared by making a suitable measurement. The density matrix of Alice's ensemble is
$$
\rho = \frac{1}{3}\left( |\varphi_1\rangle\langle\varphi_1| + |\varphi_2\rangle\langle\varphi_2| + |\varphi_3\rangle\langle\varphi_3| \right) = \frac{1}{2}\, 1, \qquad (5.151)
$$
which has $S(\rho) = 1$. Therefore, the Holevo bound tells us that the mutual information of Alice's preparation and Bob's measurement outcome cannot exceed 1 bit.
In fact, though, the accessible information is considerably less than the one bit allowed by the Holevo bound. In this case, Alice's ensemble has enough symmetry that it is not hard to guess the optimal measurement. Bob may choose a POVM with three outcomes, where
$$
F_a = \frac{2}{3}\left( 1 - |\varphi_a\rangle\langle\varphi_a| \right), \qquad a = 1, 2, 3; \qquad (5.152)
$$
we see that
$$
p(a|b) = \langle \varphi_b | F_a | \varphi_b \rangle = \begin{cases} 0 & a = b, \\ \tfrac{1}{2} & a \ne b. \end{cases} \qquad (5.153)
$$
Therefore, the measurement outcome $a$ excludes the possibility that Alice prepared $a$, but leaves equal a posteriori probabilities ($p = \frac{1}{2}$) for the other two states. Bob's information gain is
$$
I = H(X) - H(X|Y) = \log_2 3 - 1 = .58496. \qquad (5.154)
$$
To show that this measurement is really optimal, we may appeal to a variation on a theorem of Davies, which assures us that an optimal POVM can be chosen with three $F_a$'s that share the same three-fold symmetry as the three states in the input ensemble. This result restricts the possible POVMs enough so that we can check that eq. (5.152) is optimal with an explicit calculation. Hence we have found that the ensemble $\mathcal{E} = \{|\varphi_a\rangle, p_a = \frac{1}{3}\}$ has accessible information
$$
{\rm Acc}(\mathcal{E}) = \log_2 \frac{3}{2} = .58496\ldots \qquad (5.155)
$$
The Holevo bound is not saturated.
Now suppose that Alice has enough cash so that she can afford to send two qubits to Bob, where again each qubit is drawn from the ensemble $\mathcal{E}$. The obvious thing for Alice to do is prepare one of the nine states
$$
|\varphi_a\rangle |\varphi_b\rangle, \qquad a, b = 1, 2, 3, \qquad (5.156)
$$
each with $p_{ab} = 1/9$. Then Bob's best strategy is to perform the POVM eq. (5.152) on each of the two qubits, achieving a mutual information of .58496 bits per qubit, as before.
But Alice and Bob are determined to do better. After discussing the problem with A. Peres and W. Wootters, they decide on a different strategy. Alice will prepare one of three two-qubit states
$$
|\Phi_a\rangle = |\varphi_a\rangle |\varphi_a\rangle, \qquad a = 1, 2, 3, \qquad (5.157)
$$
each occurring with a priori probability $p_a = 1/3$. Considered one qubit at a time, Alice's choice is governed by the ensemble $\mathcal{E}$, but now her two qubits have (classical) correlations: both are prepared the same way.
The three $|\Phi_a\rangle$'s are linearly independent, and so span a three-dimensional subspace of the four-dimensional two-qubit Hilbert space. In a homework exercise, you will show that the density matrix
$$
\rho = \frac{1}{3}\left( \sum_{a=1}^{3} |\Phi_a\rangle\langle\Phi_a| \right) \qquad (5.158)
$$
has the nonzero eigenvalues $1/2, 1/4, 1/4$, so that
$$
S(\rho) = -\frac{1}{2}\log\frac{1}{2} - 2\left( \frac{1}{4}\log\frac{1}{4} \right) = \frac{3}{2}. \qquad (5.159)
$$
The Holevo bound requires that the accessible information per qubit is less than $3/4$ bit. This would at least be consistent with the possibility that we can exceed the .58496 bits per qubit attained by the nine-state method.
Naively, it may seem that Alice won't be able to convey as much classical information to Bob if she chooses to send one of only three possible states instead of nine. But on further reflection, this conclusion is not obvious. True, Alice has fewer signals to choose from, but the signals are more distinguishable; we have
$$
\langle \Phi_a | \Phi_b \rangle = \frac{1}{4}, \qquad a \ne b, \qquad (5.160)
$$
instead of eq. (5.150). It is up to Bob to exploit this improved distinguishability in his choice of measurement. In particular, Bob will find it advantageous to perform collective measurements on the two qubits instead of measuring them one at a time.
It is no longer obvious what Bob's optimal measurement will be. But Bob can invoke a general procedure that, while not guaranteed optimal, is usually at least pretty good. We'll call the POVM constructed by this procedure a "pretty good measurement" (or PGM).
Consider some collection of vectors $|\tilde\Phi_a\rangle$ that are not assumed to be orthogonal or normalized. We want to devise a POVM that can distinguish these vectors reasonably well. Let us first construct
$$
G = \sum_a |\tilde\Phi_a\rangle\langle\tilde\Phi_a|. \qquad (5.161)
$$
This is a positive operator on the space spanned by the $|\tilde\Phi_a\rangle$'s. Therefore, on that subspace, $G$ has an inverse $G^{-1}$, and that inverse has a positive square root $G^{-1/2}$. Now we define
$$
F_a = G^{-1/2} |\tilde\Phi_a\rangle\langle\tilde\Phi_a| G^{-1/2}, \qquad (5.162)
$$
and we see that
$$
\sum_a F_a = G^{-1/2} \left( \sum_a |\tilde\Phi_a\rangle\langle\tilde\Phi_a| \right) G^{-1/2} = G^{-1/2}\, G\, G^{-1/2} = 1, \qquad (5.163)
$$
on the span of the $|\tilde\Phi_a\rangle$'s. If necessary, we can augment these $F_a$'s with one more positive operator, the projection $F_0$ onto the orthogonal complement of the span of the $|\tilde\Phi_a\rangle$'s, and so construct a POVM. This POVM is the PGM associated with the vectors $|\tilde\Phi_a\rangle$.
In the special case where the $|\tilde\Phi_a\rangle$'s are orthogonal,
$$
|\tilde\Phi_a\rangle = \sqrt{\lambda_a}\, |\Phi_a\rangle \qquad (5.164)
$$
(where the $|\Phi_a\rangle$'s are orthonormal), we have
$$
F_a = \sum_{b,c} \left( |\Phi_b\rangle \lambda_b^{-1/2} \langle\Phi_b| \right) \left( \lambda_a |\Phi_a\rangle\langle\Phi_a| \right) \left( |\Phi_c\rangle \lambda_c^{-1/2} \langle\Phi_c| \right) = |\Phi_a\rangle\langle\Phi_a|; \qquad (5.165)
$$
this is the orthogonal measurement that perfectly distinguishes the $|\Phi_a\rangle$'s and so clearly is optimal. If the $|\tilde\Phi_a\rangle$'s are linearly independent but not orthogonal, then the PGM is again an orthogonal measurement (because $n$ one-dimensional operators in an $n$-dimensional space can constitute a POVM only if mutually orthogonal), but in that case the measurement may not be optimal.
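A small numpy sketch (mine, not from the notes) of the PGM construction of eqs. (5.161)-(5.163), applied to three hypothetical nonorthogonal qubit states; it checks that the resulting operators sum to the identity on their span (here the full space, so no extra projector F_0 is needed).

    import numpy as np

    # three "trine" qubit states, 120 degrees apart on the Bloch sphere (illustrative choice)
    angles = [0, 2 * np.pi / 3, 4 * np.pi / 3]
    vecs = [np.array([np.cos(t / 2), np.sin(t / 2)]) for t in angles]

    G = sum(np.outer(v, v) for v in vecs)                 # eq. (5.161)
    w, U = np.linalg.eigh(G)                              # G is positive on the span
    G_inv_sqrt = U @ np.diag(w ** -0.5) @ U.T             # G^(-1/2)

    F = [G_inv_sqrt @ np.outer(v, v) @ G_inv_sqrt for v in vecs]   # eq. (5.162)
    print(np.allclose(sum(F), np.eye(2)))                 # True: eq. (5.163)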
In the homework, you'll construct the PGM for the vectors $|\Phi_a\rangle$ in eq. (5.157), and you'll show that
$$
p(a|a) = \langle \Phi_a | F_a | \Phi_a \rangle = \frac{1}{3}\left( 1 + \frac{1}{\sqrt{2}} \right)^2 = .971405,
$$
$$
p(b|a) = \langle \Phi_a | F_b | \Phi_a \rangle = \frac{1}{6}\left( 1 - \frac{1}{\sqrt{2}} \right)^2 = .0142977 \qquad (5.166)
$$
(for $b \ne a$). It follows that the conditional entropy of the input is
$$
H(X|Y) = .215893, \qquad (5.167)
$$
and since $H(X) = \log_2 3 = 1.58496$, the information gain is
$$
I = H(X) - H(X|Y) = 1.36907, \qquad (5.168)
$$
a mutual information of .684535 bits per qubit. Thus, the improved distinguishability of Alice's signals has indeed paid off; we have exceeded the
.58496 bits that can be extracted from a single qubit. We still didn't saturate
the Holevo bound (I < 1:5 in this case), but we came a lot closer than before.
This example, rst described by Peres and Wootters, teaches some useful
lessons. First, Alice is able to convey more information to Bob by \pruning"
her set of codewords. She is better o choosing among fewer signals that
are more distinguishable than more signals that are less distinguishable. An
alphabet of three letters encodes more than an alphabet of nine letters.
Second, Bob is able to read more of the information if he performs a
collective measurement instead of measuring each qubit separately. His opti-
mal orthogonal measurement projects Alice's signal onto a basis of entangled
states.
The PGM described here is \optimal" in the sense that it gives the best
information gain of any known measurement. Most likely, this is really the
highest I that can be achieved with any measurement, but I have not proved
it.
5.7 Exercises
5.1 Distinguishing nonorthogonal states.
Alice has prepared a single qubit in one of the two (nonorthogonal) states
$$
|u\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad
|v\rangle = \begin{pmatrix} \cos\frac{\theta}{2} \\ \sin\frac{\theta}{2} \end{pmatrix}, \qquad (5.222)
$$
where $0 < \theta < \pi$. Bob knows the value of $\theta$, but he has no idea whether Alice prepared $|u\rangle$ or $|v\rangle$, and he is to perform a measurement to learn what he can about Alice's preparation.
Bob considers three possible measurements:
a) An orthogonal measurement with
$$
E_1 = |u\rangle\langle u|, \qquad E_2 = 1 - |u\rangle\langle u|. \qquad (5.223)
$$
(In this case, if Bob obtains outcome 2, he knows that Alice must have prepared $|v\rangle$.)
b) A three-outcome POVM with
$$
F_1 = A\left( 1 - |u\rangle\langle u| \right), \qquad
F_2 = A\left( 1 - |v\rangle\langle v| \right),
$$
$$
F_3 = (1 - 2A)\, 1 + A\left( |u\rangle\langle u| + |v\rangle\langle v| \right), \qquad (5.224)
$$
where $A$ has the largest value consistent with positivity of $F_3$. (In this case, Bob determines the preparation unambiguously if he obtains outcomes 1 or 2, but learns nothing from outcome 3.)
c) An orthogonal measurement with
$$
E_1 = |w\rangle\langle w|, \qquad E_2 = 1 - |w\rangle\langle w|, \qquad (5.225)
$$
where
$$
|w\rangle = \begin{pmatrix} \cos\left[ \frac{1}{2}\left( \frac{\theta}{2} + \frac{\pi}{2} \right) \right] \\ \sin\left[ \frac{1}{2}\left( \frac{\theta}{2} + \frac{\pi}{2} \right) \right] \end{pmatrix}. \qquad (5.226)
$$
(In this case $E_1$ and $E_2$ are projections onto the spin states that are oriented in the $x$-$z$ plane normal to the axis that bisects the orientations of $|u\rangle$ and $|v\rangle$.)
Find Bob's average information gain $I(\theta)$ (the mutual information of the preparation and the measurement outcome) in all three cases, and plot all three as a function of $\theta$. Which measurement should Bob choose?
5.2 Relative entropy.
The relative entropy $S(\rho|\sigma)$ of two density matrices $\rho$ and $\sigma$ is defined by
$$
S(\rho|\sigma) = {\rm tr}\,\rho\left( \log\rho - \log\sigma \right). \qquad (5.227)
$$
You will show that $S(\rho|\sigma)$ is nonnegative, and derive some consequences of this property.
a) A differentiable real-valued function of a real variable is concave if
$$
f(y) - f(x) \le (y - x) f'(x), \qquad (5.228)
$$
for all $x$ and $y$. Show that if $a$ and $b$ are observables, and $f$ is concave, then
$$
{\rm tr}\left( f(b) - f(a) \right) \le {\rm tr}\left[ (b - a) f'(a) \right]. \qquad (5.229)
$$
b) Show that $f(x) = -x\log x$ is concave for $x > 0$.
c) Use (a) and (b) to show $S(\rho|\sigma) \ge 0$ for any two density matrices $\rho$ and $\sigma$.
d) Use nonnegativity of $S(\rho|\sigma)$ to show that if $\rho$ has its support on a space of dimension $D$, then
$$
S(\rho) \le \log D. \qquad (5.230)
$$
e) Use nonnegativity of relative entropy to prove the subadditivity of entropy
$$
S(\rho_{AB}) \le S(\rho_A) + S(\rho_B). \qquad (5.231)
$$
[Hint: Consider the relative entropy of $\rho_A \otimes \rho_B$ and $\rho_{AB}$.]
f) Use subadditivity to prove the concavity of the entropy:
$$
S\left( \sum_i \lambda_i \rho_i \right) \ge \sum_i \lambda_i S(\rho_i), \qquad (5.232)
$$
where the $\lambda_i$'s are positive real numbers summing to one. [Hint: Apply subadditivity to
$$
\rho_{AB} = \sum_i \lambda_i\, (\rho_i)_A \otimes \left( |e_i\rangle\langle e_i| \right)_B.\ ] \qquad (5.233)
$$
g) Use subadditivity to prove the triangle inequality (also called the Araki-Lieb inequality):
$$
S(\rho_{AB}) \ge |S(\rho_A) - S(\rho_B)|. \qquad (5.234)
$$
[Hint: Consider a purification of $\rho_{AB}$; that is, construct a pure state $|\psi\rangle$ such that $\rho_{AB} = {\rm tr}_C\, |\psi\rangle\langle\psi|$. Then apply subadditivity to $\rho_{BC}$.]
5.3 Lindblad-Uhlmann monotonicity.
According to a theorem proved by Lindblad and by Uhlmann, relative entropy on $\mathcal{H}_A \otimes \mathcal{H}_B$ has a property called monotonicity:
$$
S(\rho_A|\sigma_A) \le S(\rho_{AB}|\sigma_{AB}); \qquad (5.235)
$$
the relative entropy of two density matrices on a system $AB$ cannot be less than the induced relative entropy on the subsystem $A$.
a) Use Lindblad-Uhlmann monotonicity to prove the strong subadditivity property of the Von Neumann entropy. [Hint: On a tripartite system $ABC$, consider the relative entropy of $\rho_{ABC}$ and $\rho_A \otimes \rho_{BC}$.]
b) Use Lindblad-Uhlmann monotonicity to show that the action of a superoperator cannot increase relative entropy, that is,
$$
S(\$\rho\,|\,\$\sigma) \le S(\rho|\sigma), \qquad (5.236)
$$
where \$ is any superoperator (completely positive map). [Hint: Recall that any superoperator has a unitary representation.]
c) Show that it follows from (b) that a superoperator cannot increase the Holevo information of an ensemble $\mathcal{E} = \{\rho_x, p_x\}$ of mixed states:
$$
\chi(\$(\mathcal{E})) \le \chi(\mathcal{E}), \qquad (5.237)
$$
where
$$
\chi(\mathcal{E}) = S\left( \sum_x p_x \rho_x \right) - \sum_x p_x S(\rho_x). \qquad (5.238)
$$
5.4 The Peres-Wootters POVM.
Consider the Peres-Wootters information source described in §5.4.2 of the lecture notes. It prepares one of the three states
$$
|\Phi_a\rangle = |\varphi_a\rangle |\varphi_a\rangle, \qquad a = 1, 2, 3, \qquad (5.239)
$$
each occurring with a priori probability $\frac{1}{3}$, where the $|\varphi_a\rangle$'s are defined in eq. (5.149).
a) Express the density matrix
$$
\rho = \frac{1}{3}\left( \sum_a |\Phi_a\rangle\langle\Phi_a| \right) \qquad (5.240)
$$
in terms of the Bell basis of maximally entangled states $\{|\phi^\pm\rangle, |\psi^\pm\rangle\}$, and compute $S(\rho)$.
b) For the three vectors $|\Phi_a\rangle$, $a = 1, 2, 3$, construct the "pretty good measurement" defined in eq. (5.162). (Again, expand the $|\Phi_a\rangle$'s in the Bell basis.) In this case, the PGM is an orthogonal measurement. Express the elements of the PGM basis in terms of the Bell basis.
c) Compute the mutual information of the PGM outcome and the prepara-
tion.
5.5 Teleportation with mixed states.
An operational way to define entanglement is that an entangled state can be used to teleport an unknown quantum state with better fidelity than could be achieved with local operations and classical communication only. In this exercise, you will show that there are mixed states that are entangled in this sense, yet do not violate any Bell inequality. Hence, for mixed states (in contrast to pure states) "entangled" and "Bell-inequality-violating" are not equivalent.
Consider a "noisy" entangled pair with density matrix
$$
\rho(\lambda) = (1 - \lambda)\, |\psi^-\rangle\langle\psi^-| + \frac{\lambda}{4}\, 1. \qquad (5.241)
$$
a) Find the fidelity $F$ that can be attained if the state $\rho(\lambda)$ is used to teleport a qubit from Alice to Bob. [Hint: Recall that you showed in an earlier exercise that a "random guess" has fidelity $F = \frac{1}{2}$.]
b) For what values of $\lambda$ is the fidelity found in (a) better than what can be achieved if Alice measures her qubit and sends a classical message to Bob? [Hint: Earlier, you showed that $F = 2/3$ can be achieved if Alice measures her qubit. In fact this is the best possible $F$ attainable with classical communication.]
c) Compute
$$
{\rm Prob}(\uparrow_{\hat n} \uparrow_{\hat m}) \equiv {\rm tr}\left( E_A(\hat n)\, E_B(\hat m)\, \rho(\lambda) \right), \qquad (5.242)
$$
where $E_A(\hat n)$ is the projection of Alice's qubit onto $|\uparrow_{\hat n}\rangle$ and $E_B(\hat m)$ is the projection of Bob's qubit onto $|\uparrow_{\hat m}\rangle$.
d) Consider the case $\lambda = 1/2$. Show that in this case the state $\rho(\lambda)$ violates no Bell inequalities. Hint: It suffices to construct a local hidden variable model that correctly reproduces the spin correlations found in (c), for $\lambda = 1/2$. Suppose that the hidden variable $\hat\lambda$ is uniformly distributed on the unit sphere, and that there are functions $f_A$ and $f_B$ such that
$$
{\rm Prob}_A(\uparrow_{\hat n}) = f_A(\hat\lambda \cdot \hat n), \qquad {\rm Prob}_B(\uparrow_{\hat m}) = f_B(\hat\lambda \cdot \hat m). \qquad (5.243)
$$
The problem is to find $f_A$ and $f_B$ (where $0 \le f_{A,B} \le 1$) with the properties
$$
\int_{\hat\lambda} f_A(\hat\lambda \cdot \hat n) = 1/2, \qquad
\int_{\hat\lambda} f_B(\hat\lambda \cdot \hat m) = 1/2,
$$
$$
\int_{\hat\lambda} f_A(\hat\lambda \cdot \hat n)\, f_B(\hat\lambda \cdot \hat m) = {\rm Prob}(\uparrow_{\hat n} \uparrow_{\hat m}). \qquad (5.244)
$$
Chapter 6
Quantum Computation
6.1 Classical Circuits
The concept of a quantum computer was introduced in Chapter 1. Here we
will specify our model of quantum computation more precisely, and we will
point out some basic properties of the model. But before we explain what a
quantum computer does, perhaps we should say what a classical computer
does.
Then
$$
f(x) = f^{(1)}(x) \vee f^{(2)}(x) \vee f^{(3)}(x) \vee \ldots; \qquad (6.5)
$$
$f$ is the logical OR ($\vee$) of all the $f^{(a)}$'s. In binary arithmetic the $\vee$ operation of two bits may be represented
$$
x \vee y = x + y - x\cdot y; \qquad (6.6)
$$
it has the value 0 if $x$ and $y$ are both zero, and the value 1 otherwise.
Now consider the evaluation of $f^{(a)}$. In the case where $x^{(a)} = 111\ldots 1$, we may write
$$
f^{(a)}(x) = x_1 \wedge x_2 \wedge x_3 \ldots \wedge x_n; \qquad (6.7)
$$
it is the logical AND ($\wedge$) of all $n$ bits. In binary arithmetic, the AND is the product
$$
x \wedge y = x \cdot y. \qquad (6.8)
$$
For any other $x^{(a)}$, $f^{(a)}$ is again obtained as the AND of $n$ bits, but where the NOT ($\neg$) operation is first applied to each $x_i$ such that $x_i^{(a)} = 0$; for example
    x ----*---- x
          |
    y ---(+)--- x (+) y

This gate flips the second bit if the first is 1, and does nothing if the first bit is 0 (hence the name controlled-NOT). Its square is trivial; that is, it inverts itself. Of course, this gate performs a NOT on the second bit if the first bit is set to 1, and it performs the copy operation if $y$ is initially set to zero:
$$
{\rm XOR}: (x, 0) \mapsto (x, x). \qquad (6.34)
$$
With the circuit

    x ---*---(+)---*--- y
         |    |    |
    y --(+)---*---(+)-- x

constructed from three XORs, we can swap two bits: $(x, y) \to (y, x)$.
The Toffoli gate $\theta^{(3)}$ flips the third bit if the first two are 1 and does nothing otherwise. Like the XOR gate, it is its own inverse.
Unlike the reversible 2-bit gates, the Toffoli gate serves as a universal gate for Boolean logic, if we can provide fixed input bits and ignore output bits. If $z$ is initially 1, then $x \uparrow y = 1 - xy$ appears in the third output: we can perform NAND. If we fix $x = 1$, the Toffoli gate functions like an XOR gate, and we can use it to copy.
The Toffoli gate $\theta^{(3)}$ is universal in the sense that we can build a circuit to compute any reversible function using Toffoli gates alone (if we can fix input bits and ignore output bits). It will be instructive to show this directly, without relying on our earlier argument that NAND/NOT is universal for Boolean functions. In fact, we can show the following: From the NOT gate
and the Toffoli gate $\theta^{(3)}$, we can construct any invertible function on $n$ bits, provided we have one extra bit of scratchpad space available.
The first step is to show that from the three-bit Toffoli gate $\theta^{(3)}$ we can construct an $n$-bit Toffoli gate $\theta^{(n)}$ that acts as
$$
(x_1, x_2, \ldots, x_{n-1}, y) \to (x_1, x_2, \ldots, x_{n-1}, y \oplus x_1 x_2 \cdots x_{n-1}). \qquad (6.40)
$$
The construction requires one extra bit of scratch space. For example, we construct $\theta^{(4)}$ from $\theta^{(3)}$'s with the circuit

    x1 ---*---------*--- x1
          |         |
    x2 ---*---------*--- x2
          |         |
    0 ---(+)---*---(+)-- 0
               |
    x3 --------*-------- x3
               |
    y ---------(+)------ y (+) x1 x2 x3

The purpose of the last $\theta^{(3)}$ gate is to reset the scratch bit back to its original value zero. Actually, with one more gate we can obtain an implementation of $\theta^{(4)}$ that works irrespective of the initial value of the scratch bit:

    x1 ---*--------*-------- x1
          |        |
    x2 ---*--------*-------- x2
          |        |
    w ---(+)--*---(+)--*---- w
              |        |
    x3 -------*--------*---- x3
              |        |
    y --------(+)------(+)-- y (+) x1 x2 x3
Again, we can eliminate the last gate if we don't mind flipping the value of the scratch bit.
We can see that the scratch bit really is necessary, because $\theta^{(4)}$ is an odd permutation (in fact a transposition) of the $2^4$ 4-bit strings: it transposes 1111 and 1110. But $\theta^{(3)}$ acting on any three of the four bits is an even permutation; e.g., acting on the last three bits it transposes 0111 with 0110, and 1111 with 1110. Since a product of even permutations is also even, we cannot obtain $\theta^{(4)}$ as a product of $\theta^{(3)}$'s that act on four bits only.
The construction of $\theta^{(4)}$ from four $\theta^{(3)}$'s generalizes immediately to the construction of $\theta^{(n)}$ from two $\theta^{(n-1)}$'s and two $\theta^{(3)}$'s (just expand $x_1$ to several control bits in the above diagram). Iterating the construction, we obtain $\theta^{(n)}$ from a circuit with $2^{n-2} + 2^{n-3} - 2$ $\theta^{(3)}$'s. Furthermore, just one bit of scratch space is sufficient.2 (When we need to construct $\theta^{(k)}$, any available extra bit will do, since the circuit returns the scratch bit to its original value.) The next step is to note that, by conjugating $\theta^{(n)}$ with NOT gates, we can in effect modify the value of the control string that "triggers" the gate. For example, the circuit
example, the circuit
x1 gs g
x2 s
x3 gs g
y g
ips the value of y if x1x2x3 = 010, and it acts trivially otherwise. Thus
this circuit transposes the two strings 0100 and 0101. In like fashion, with
(n) and NOT gates, we can devise a circuit that transposes any two n-bit
strings that di er in only one bit. (The location of the bit where they di er
is chosen to be the target of the (n) gate.)
But in fact a transposition that exchanges any two n-bit strings can be
expressed as a product of transpositions that interchange strings that di er
in only one bit. If a0 and as are two strings that are Hamming distance s
apart (di er in s places), then there is a chain
a0; a1; a2; a3; : : : ; as; (6.41)
such that each string in the chain is Hamming distance one from its neighbors.
Therefore, each of the transpositions
(a0a1); (a1a2); (a2a3); : : : (as;1as); (6.42)
2 With more scratch space, we can build (n) from (3) 's much more eciently | see
the exercises.
can be implemented as a $\theta^{(n)}$ gate conjugated by NOT gates. By composing transpositions we find
$$
(a_0 a_s) = (a_{s-1} a_s)(a_{s-2} a_{s-1}) \cdots (a_2 a_3)(a_1 a_2)(a_0 a_1)(a_1 a_2)(a_2 a_3) \cdots (a_{s-2} a_{s-1})(a_{s-1} a_s); \qquad (6.43)
$$
we can construct the Hamming-distance-$s$ transposition from $2s - 1$ Hamming-distance-one transpositions. It follows that we can construct $(a_0 a_s)$ from $\theta^{(n)}$'s and NOT gates.
Finally, since every permutation is a product of transpositions, we have shown that every invertible function on $n$ bits (every permutation on $n$-bit strings) is a product of $\theta^{(3)}$'s and NOTs, using just one bit of scratch space.
Of course, a NOT can be performed with a $\theta^{(3)}$ gate if we fix two input bits at 1. Thus the Toffoli gate $\theta^{(3)}$ is universal for reversible computation, if we can fix input bits and discard output bits.
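A tiny Python sketch (my own illustration) of the fixed-input trick just described: the Toffoli gate acts as NAND when the target is fixed to 1, and as COPY when the first control is fixed to 1 and the target to 0.

    def toffoli(x, y, z):
        """Reversible Toffoli gate: flips z iff x = y = 1."""
        return x, y, z ^ (x & y)

    # NAND from Toffoli: fix the target bit to 1
    for x in (0, 1):
        for y in (0, 1):
            assert toffoli(x, y, 1)[2] == 1 - (x & y)      # x NAND y

    # COPY from Toffoli: fix the first control to 1 and the target to 0
    for y in (0, 1):
        assert toffoli(1, y, 0)[2] == y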
builds the Fredkin gate from four switch gates (two running forward and two
running backward). Time delays needed to maintain synchronization are not
explicitly shown.
In the billiard ball computer, the switch gate is constructed with two reflectors, such that (in the case $x = y = 1$) two moving balls collide twice. The trajectories of the balls in this case are:

A ball labeled $x$ emerges from the gate along the same trajectory (and at the same time) regardless of whether the other ball is present. But for $x = 1$, the position of the other ball (if present) is shifted down compared to its final position for $x = 0$; this is a switch gate. Since we can perform a switch gate, we can construct a Fredkin gate, and implement universal reversible logic with a billiard ball computer.
An evident weakness of the billiard-ball scheme is that initial errors in the positions and velocities of the balls will accumulate rapidly, and the computer will eventually fail. As we noted in Chapter 1 (and Landauer has insistently pointed out), a similar problem will afflict any proposed scheme for dissipationless computation. To control errors we must be able to compress the phase space of the device, which will necessarily be a dissipative process.
6.2.1 Accuracy
Let's discuss the issue of accuracy. We imagine that we wish to implement a computation in which the quantum gates $U_1, U_2, \ldots, U_T$ are applied sequentially to the initial state $|\varphi_0\rangle$. The state prepared by our ideal quantum circuit is
$$
|\varphi_T\rangle = U_T U_{T-1} \cdots U_2 U_1 |\varphi_0\rangle. \qquad (6.60)
$$
But in fact our gates do not have perfect accuracy. When we attempt to apply the unitary transformation $U_t$, we instead apply some "nearby" unitary transformation $\tilde U_t$. (Of course, this is not the most general type of error that we might contemplate; the unitary $U_t$ might be replaced by a superoperator. Considerations similar to those below would apply in that case, but for now we confine our attention to "unitary errors.")
The errors cause the actual state of the computer to wander away from the ideal state. How far does it wander? Let $|\varphi_t\rangle$ denote the ideal state after $t$ quantum gates are applied, so that
$$
|\varphi_t\rangle = U_t |\varphi_{t-1}\rangle. \qquad (6.61)
$$
But if we apply the actual transformation $\tilde U_t$, then
$$
\tilde U_t |\varphi_{t-1}\rangle = |\varphi_t\rangle + |E_t\rangle, \qquad (6.62)
$$
where
$$
|E_t\rangle = \left( \tilde U_t - U_t \right) |\varphi_{t-1}\rangle \qquad (6.63)
$$
is an unnormalized vector. If $|\tilde\varphi_t\rangle$ denotes the actual state after $t$ steps, then we have
$$
|\tilde\varphi_1\rangle = |\varphi_1\rangle + |E_1\rangle,
$$
$$
|\tilde\varphi_2\rangle = \tilde U_2 |\tilde\varphi_1\rangle = |\varphi_2\rangle + |E_2\rangle + \tilde U_2 |E_1\rangle, \qquad (6.64)
$$
and so forth; we ultimately obtain
$$
|\tilde\varphi_T\rangle = |\varphi_T\rangle + |E_T\rangle + \tilde U_T |E_{T-1}\rangle + \tilde U_T \tilde U_{T-1} |E_{T-2}\rangle + \ldots + \tilde U_T \tilde U_{T-1} \cdots \tilde U_2 |E_1\rangle. \qquad (6.65)
$$
Thus we have expressed the difference between $|\tilde\varphi_T\rangle$ and $|\varphi_T\rangle$ as a sum of $T$ remainder terms. The worst case, yielding the largest deviation of $|\tilde\varphi_T\rangle$ from $|\varphi_T\rangle$, occurs if all remainder terms line up in the same direction, so that the errors interfere constructively. Therefore, we conclude that
$$
\|\, |\tilde\varphi_T\rangle - |\varphi_T\rangle \,\| \le \|\, |E_T\rangle \,\| + \|\, |E_{T-1}\rangle \,\| + \ldots + \|\, |E_2\rangle \,\| + \|\, |E_1\rangle \,\|, \qquad (6.66)
$$
where we have used the property $\| U |E_i\rangle \| = \|\, |E_i\rangle \,\|$ for any unitary $U$.
Let $\| A \|_{\sup}$ denote the sup norm of the operator $A$, that is, the maximum modulus of an eigenvalue of $A$. We then have
$$
\|\, |E_t\rangle \,\| = \| \left( \tilde U_t - U_t \right) |\varphi_{t-1}\rangle \| \le \| \tilde U_t - U_t \|_{\sup} \qquad (6.67)
$$
(since $|\varphi_{t-1}\rangle$ is normalized). Now suppose that, for each value of $t$, the error in our quantum gate is bounded by
$$
\| \tilde U_t - U_t \|_{\sup} < \varepsilon. \qquad (6.68)
$$
Then after $T$ quantum gates are applied, we have
$$
\|\, |\tilde\varphi_T\rangle - |\varphi_T\rangle \,\| < T\varepsilon; \qquad (6.69)
$$
in this sense, the accumulated error in the state grows linearly with the length of the computation.
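A small numpy sketch (my own, with randomly chosen small unitary errors, not from the notes) illustrating eq. (6.69): the distance between the ideal and noisy states stays below Tε.

    import numpy as np
    from scipy.linalg import expm

    rng = np.random.default_rng(0)
    d, T, eps = 4, 50, 1e-3          # dimension, circuit length, per-gate error (arbitrary)

    psi_ideal = psi_noisy = np.eye(d)[0].astype(complex)
    for _ in range(T):
        U, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))  # random ideal gate
        H = rng.normal(size=(d, d)); H = (H + H.T) / 2                               # error generator
        U_err = expm(1j * eps * H / np.linalg.norm(H, 2)) @ U     # ||U_err - U||_sup <= eps
        psi_ideal, psi_noisy = U @ psi_ideal, U_err @ psi_noisy

    print(np.linalg.norm(psi_noisy - psi_ideal), "<", T * eps)    # linear bound of eq. (6.69)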
The distance bounded in eq. (6.68) can equivalently be expressed as $\| W_t - \mathbf{1} \|_{\sup}$, where $W_t = \tilde U_t U_t^\dagger$. Since $W_t$ is unitary, each of its eigenvalues is a phase $e^{i\theta}$, and the corresponding eigenvalue of $W_t - \mathbf{1}$ has modulus
$$|e^{i\theta} - 1| = (2 - 2\cos\theta)^{1/2}, \qquad (6.70)$$
so that eq. (6.68) is the requirement that each eigenvalue satisfies
$$\cos\theta > 1 - \varepsilon^2/2 \qquad (6.71)$$
(or $|\theta| < \varepsilon$, for $\varepsilon$ small). The origin of eq. (6.69) is clear. In each time step, $|\tilde\varphi\rangle$ rotates relative to $|\varphi\rangle$ by (at worst) an angle of order $\varepsilon$, and the distance between the vectors increases by at most of order $\varepsilon$.
How much accuracy is good enough? In the final step of our computation, we perform an orthogonal measurement, and the probability of outcome $a$, in the ideal case, is
$$P(a) = |\langle a|\varphi_T\rangle|^2. \qquad (6.72)$$
Because of the errors, the actual probability is
$$\tilde P(a) = |\langle a|\tilde\varphi_T\rangle|^2. \qquad (6.73)$$
If the actual vector is close to the ideal vector, then the probability distributions are close, too. If we sum over an orthonormal basis $\{|a\rangle\}$, we have
$$\sum_a |\tilde P(a) - P(a)| \le 2\,\| |\tilde\varphi_T\rangle - |\varphi_T\rangle \|, \qquad (6.74)$$
as you will show in a homework exercise. Therefore, if we keep $T\varepsilon$ fixed (and small) as $T$ gets large, the error in the probability distribution also remains fixed. In particular, if we have designed a quantum algorithm that solves a decision problem correctly with probability greater than $\frac{1}{2} + \delta$ (in the ideal case), then we can achieve success probability greater than $\frac{1}{2}$ with our noisy gates, if we can perform the gates with an accuracy $T\varepsilon < O(\delta)$. A quantum circuit family in the BQP class can really solve hard problems, as long as we can improve the accuracy of the gates linearly with the computation size $T$.
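The linear accumulation of eq. (6.69) is easy to see numerically. The following sketch (an illustration of mine, not part of the notes; it assumes only numpy is available) applies T ideal gates and T "nearby" noisy gates to the same initial state and compares the final deviation with the bound T*eps.

import numpy as np

def random_unitary(dim, rng):
    # Haar-like random unitary from the QR decomposition of a Gaussian matrix
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def perturb(u, eps, rng):
    # a nearby unitary exp(i*eps*h) u with ||exp(i*eps*h) - 1||_sup <= eps
    dim = u.shape[0]
    h = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    h = (h + h.conj().T) / 2
    h /= np.linalg.norm(h, 2)                 # normalize so the kick angle is eps
    vals, vecs = np.linalg.eigh(h)
    kick = (vecs * np.exp(1j * eps * vals)) @ vecs.conj().T
    return kick @ u

rng = np.random.default_rng(0)
dim, T, eps = 8, 50, 1e-3
ideal = np.zeros(dim, dtype=complex); ideal[0] = 1.0
noisy = ideal.copy()
for _ in range(T):
    u = random_unitary(dim, rng)
    ideal = u @ ideal
    noisy = perturb(u, eps, rng) @ noisy
print(np.linalg.norm(noisy - ideal), "<=", T * eps)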
$$U' = P\,U\,P^{-1}$$
that applies R to the third qubit if the first two qubits have the value 1; otherwise it acts trivially. Here
$$R = -i\,R_x(\theta) = (-i)\exp\!\left(i\frac{\theta}{2}\sigma_x\right) = (-i)\left(\cos\frac{\theta}{2} + i\sigma_x\sin\frac{\theta}{2}\right) \qquad (6.89)$$
is, up to a phase, a rotation by $\theta$ about the x-axis, where $\theta$ is a particular angle incommensurate with $\pi$.
The nth power of the Deutsch gate is the controlled-controlled-$R^n$. In particular, $R^4 = R_x(4\theta)$, so that all one-qubit transformations generated by $\sigma_x$ are reachable by integer powers of R. Furthermore, the $(4n+1)$st power is
$$(-i)\left[\cos\frac{(4n+1)\theta}{2} + i\sigma_x\sin\frac{(4n+1)\theta}{2}\right], \qquad (6.90)$$
{ Figure {
denotes the controlled-U gate (the $2\times 2$ unitary U is applied to the second qubit if the first qubit is 1; otherwise the gate acts trivially), then a controlled-controlled-$U^2$ gate is obtained from the circuit
{ Figure: circuit identity constructing controlled-controlled-$U^2$ from controlled-$U$, controlled-$U^\dagger$, and CNOT gates {
(for $\varepsilon$ sufficiently small) we can reach any $e^{i\alpha A}$ to within distance $\varepsilon$ with $e^{inA}$, for some integer n of order $\varepsilon^{-2k}$. We also know that we can obtain transformations $\{e^{iA_a}\}$ where the $A_a$'s span the full $U(2^k)$ Lie algebra, using circuits of fixed size (independent of $\varepsilon$). We may then approach any $\exp(i\sum_a \alpha_a A_a)$ as in eq. (6.87), also with polynomial convergence.
In principle, we should be able to do much better, reaching a desired k-qubit unitary within distance $\varepsilon$ using just poly$(\log(\varepsilon^{-1}))$ quantum gates. Since the number of size-T circuits that we can construct acting on k qubits is exponential in T, and the circuits fill $U(2^k)$ roughly uniformly, there should be a size-T circuit reaching within a distance of order $e^{-T}$ of any point in $U(2^k)$. However, it might be a computationally hard problem classically to work out the circuit that comes exponentially close to the unitary we are trying to reach. Therefore, it would be dishonest to rely on this more efficient construction in an asymptotic analysis of quantum complexity.
{ Figure: Deutsch's circuit: the inputs $|0\rangle$ and $|1\rangle$ are each Hadamard-transformed, $U_f$ is applied, a final Hadamard acts on the first qubit, and the first qubit is measured {
7 The term "oracle" signifies that the box responds to a query immediately; that is, the time it takes the box to operate is not included in the complexity analysis.
Here H denotes the Hadamard transform
$$H: |x\rangle \to \frac{1}{\sqrt 2}\sum_y (-1)^{xy}|y\rangle; \qquad (6.107)$$
or
$$H: |0\rangle \to \frac{1}{\sqrt 2}(|0\rangle + |1\rangle),\qquad |1\rangle \to \frac{1}{\sqrt 2}(|0\rangle - |1\rangle); \qquad (6.108)$$
that is, H is the $2\times 2$ matrix
$$H = \begin{pmatrix} \frac{1}{\sqrt 2} & \frac{1}{\sqrt 2} \\ \frac{1}{\sqrt 2} & -\frac{1}{\sqrt 2} \end{pmatrix}. \qquad (6.109)$$
The circuit takes the input $|0\rangle|1\rangle$ to
$$|0\rangle|1\rangle \to \frac{1}{2}(|0\rangle + |1\rangle)(|0\rangle - |1\rangle)$$
$$\to \frac{1}{2}\left[(-1)^{f(0)}|0\rangle + (-1)^{f(1)}|1\rangle\right](|0\rangle - |1\rangle)$$
$$\to \frac{1}{2}\left[\left((-1)^{f(0)} + (-1)^{f(1)}\right)|0\rangle + \left((-1)^{f(0)} - (-1)^{f(1)}\right)|1\rangle\right]\frac{1}{\sqrt 2}(|0\rangle - |1\rangle). \qquad (6.110)$$
Then when we measure the first qubit, we find the outcome $|0\rangle$ with probability one if $f(0) = f(1)$ (constant function) and the outcome $|1\rangle$ with probability one if $f(0) \ne f(1)$ (balanced function).
A quantum computer enjoys an advantage over a classical computer be-
cause it can invoke quantum parallelism. Because we input a superposition
of j0i and j1i, the output is sensitive to both the values of f (0) and f (1),
even though we ran the box just once.
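A minimal simulation makes the point concrete. The sketch below (my own illustration, not from the notes) builds the two-qubit circuit of eq. (6.110) in numpy and checks that the first-qubit measurement distinguishes constant from balanced with a single call to the oracle.

import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def deutsch_outcome(f):
    """Probability of measuring |1> on the first qubit."""
    state = np.kron([1, 0], [0, 1]).astype(complex)   # |0>|1>, index = 2x + y
    state = np.kron(H, H) @ state                      # Hadamard both qubits
    U_f = np.zeros((4, 4))                             # |x>|y> -> |x>|y XOR f(x)>
    for x in (0, 1):
        for y in (0, 1):
            U_f[2 * x + (y ^ f(x)), 2 * x + y] = 1
    state = U_f @ state
    state = np.kron(H, np.eye(2)) @ state              # Hadamard the first qubit
    return sum(abs(state[i]) ** 2 for i in (2, 3))     # first qubit = 1

print(deutsch_outcome(lambda x: 0))    # constant -> 0.0
print(deutsch_outcome(lambda x: x))    # balanced -> 1.0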
Deutsch–Jozsa problem. Now we'll consider some generalizations of Deutsch's problem. We will continue to assume that we are to analyze a quantum black box ("quantum oracle"). But in the hope of learning something about complexity, we will imagine that we have a family of black boxes,
with variable input size. We are interested in how the time needed to find out what is inside the box scales with the size of the input (where "time" is measured by how many times we query the box).
In the Deutsch–Jozsa problem, we are presented with a quantum black box that computes a function taking n bits to 1,
$$f: \{0,1\}^n \to \{0,1\}, \qquad (6.111)$$
and we have it on good authority that f is either constant ($f(x) = c$ for all x) or balanced ($f(x) = 0$ for exactly half of the possible input values). We are to solve the decision problem: Is f constant or balanced?
In fact, we can solve this problem, too, accessing the box only once, using the same circuit as for Deutsch's problem (but with x expanded from one bit to n bits). We note that if we apply n Hadamard gates in parallel to n qubits,
$$H^{(n)} = H \otimes H \otimes \cdots \otimes H, \qquad (6.112)$$
then the n-qubit state transforms as
$$H^{(n)}: |x\rangle \to \prod_{i=1}^{n}\left(\frac{1}{\sqrt 2}\sum_{y_i\in\{0,1\}}(-1)^{x_i y_i}|y_i\rangle\right) \equiv \frac{1}{2^{n/2}}\sum_{y=0}^{2^n-1}(-1)^{x\cdot y}|y\rangle, \qquad (6.113)$$
where x, y represent n-bit strings, and $x\cdot y$ denotes the bitwise AND (or mod 2 scalar product)
$$x\cdot y = (x_1\wedge y_1)\oplus(x_2\wedge y_2)\oplus\cdots\oplus(x_n\wedge y_n). \qquad (6.114)$$
Acting on the input $(|0\rangle)^n|1\rangle$, the action of the circuit is
$$(|0\rangle)^n|1\rangle \to \left(\frac{1}{2^{n/2}}\sum_{x=0}^{2^n-1}|x\rangle\right)\frac{1}{\sqrt 2}(|0\rangle - |1\rangle)$$
$$\to \left(\frac{1}{2^{n/2}}\sum_{x=0}^{2^n-1}(-1)^{f(x)}|x\rangle\right)\frac{1}{\sqrt 2}(|0\rangle - |1\rangle)$$
$$\to \left(\frac{1}{2^{n}}\sum_{x=0}^{2^n-1}\sum_{y=0}^{2^n-1}(-1)^{f(x)}(-1)^{x\cdot y}|y\rangle\right)\frac{1}{\sqrt 2}(|0\rangle - |1\rangle). \qquad (6.115)$$
Now let us evaluate the sum
$$\frac{1}{2^n}\sum_{x=0}^{2^n-1}(-1)^{f(x)}(-1)^{x\cdot y}. \qquad (6.116)$$
If f is a constant function, this sum is $(-1)^{f(0)}\delta_{y,0}$: all of the amplitude is concentrated on $y = 0$. But if the function is balanced, then for $y = 0$ the sum becomes $\frac{1}{2^n}\sum_x(-1)^{f(x)} = 0$ (because half of the terms are (+1) and half are (−1)). Therefore, the probability of obtaining the measurement outcome $|y = 0\rangle$ is zero.
We conclude that one query of the quantum oracle suffices to distinguish constant and balanced functions with 100% confidence. The measurement result y = 0 means constant; any other result means balanced.
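The y = 0 amplitude is easy to check directly, since the first register ends up in the state of eq. (6.115). The sketch below (illustrative only; the choice of constant and balanced functions is mine) computes that amplitude for an n-bit oracle.

import numpy as np

def dj_prob_y0(f, n):
    """Probability of measuring y = 0...0 on the first register."""
    N = 2 ** n
    amp = np.array([(-1) ** f(x) for x in range(N)], dtype=float) / np.sqrt(N)
    # after H^(n): amplitude of |0...0> is (1/sqrt(N)) * sum_x amp[x]
    return (amp.sum() / np.sqrt(N)) ** 2

n = 4
constant = lambda x: 1
balanced = lambda x: bin(x).count("1") % 2      # parity is balanced
print(dj_prob_y0(constant, n))   # -> 1.0
print(dj_prob_y0(balanced, n))   # -> 0.0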
So quantum computation solves this problem neatly, but is the problem really hard classically? If we are restricted to classical input states $|x\rangle$, we can query the oracle repeatedly, choosing the input x at random (without replacement) each time. Once we obtain distinct outputs for two different queries, we have determined that the function is balanced (not constant). But if the function is in fact constant, we will not be certain it is constant until we have submitted $2^{n-1}+1$ queries and have obtained the same response every time. In contrast, the quantum computation gives a definite response in only one go. So in this sense (if we demand absolute certainty) the classical calculation requires a number of queries exponential in n, while the quantum computation does not, and we might therefore claim an exponential quantum speedup.
But perhaps it is not reasonable to demand absolute certainty of the classical computation (particularly since any real quantum computer will be susceptible to errors, so that the quantum computer will also be unable to attain absolute certainty). Suppose we are satisfied to guess balanced or constant, with a probability of success
$$P(\text{success}) > 1 - \varepsilon. \qquad (6.119)$$
If the function is actually balanced, then if we make k queries, the probability of getting the same response every time is $p = 2^{-(k-1)}$. If after receiving the
same response k consecutive times we guess that the function is constant, then a quick Bayesian analysis shows that the probability that our guess is wrong is $\frac{1}{2^{k-1}+1}$ (assuming that balanced and constant are a priori equally probable). So if we guess after k queries, the probability of a wrong guess is
$$1 - P(\text{success}) = \frac{1}{2^{k-1}(2^{k-1}+1)}. \qquad (6.120)$$
Therefore, we can achieve success probability $1 - \varepsilon$ for $\varepsilon^{-1} = 2^{k-1}(2^{k-1}+1)$, or $k \simeq \frac{1}{2}\log\frac{1}{\varepsilon}$. Since we can reach an exponentially good success probability with a polynomial number of trials, it is not really fair to say that the problem is hard.
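The Bayesian step quoted above can be checked with exact arithmetic. The sketch below (an illustration under my reading of the argument: equal priors, and the probability $2^{-(k-1)}$ of k identical responses for a balanced function) computes the posterior probability that the function is balanced after k identical responses, which matches $1/(2^{k-1}+1)$.

from fractions import Fraction

def posterior_balanced(k):
    p_same_if_constant = Fraction(1)                 # a constant function always repeats
    p_same_if_balanced = Fraction(1, 2 ** (k - 1))   # as quoted in the text
    prior = Fraction(1, 2)
    return (p_same_if_balanced * prior) / (
        p_same_if_balanced * prior + p_same_if_constant * prior)

for k in (2, 4, 8):
    print(k, posterior_balanced(k), Fraction(1, 2 ** (k - 1) + 1))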
Bernstein–Vazirani problem. Exactly the same circuit can be used to solve another variation on the Deutsch–Jozsa problem. Let's suppose that
our quantum black box computes one of the functions $f_a$, where
$$f_a(x) = a\cdot x, \qquad (6.121)$$
and a is an n-bit string. Our job is to determine a.
The quantum algorithm can solve this problem with certainty, given just one (n-qubit) quantum query. For this particular function, the quantum state in eq. (6.115) becomes
$$\frac{1}{2^n}\sum_{x=0}^{2^n-1}\sum_{y=0}^{2^n-1}(-1)^{a\cdot x}(-1)^{x\cdot y}|y\rangle. \qquad (6.122)$$
But in fact
$$\frac{1}{2^n}\sum_{x=0}^{2^n-1}(-1)^{a\cdot x}(-1)^{x\cdot y} = \delta_{a,y}, \qquad (6.123)$$
so this state is $|a\rangle$. We can execute the circuit once and measure the n-qubit register, finding the n-bit string a with probability one.
If only classical queries are allowed, we acquire only one bit of information from each query, and it takes n queries to determine the value of a. Therefore, we have a clear separation between the quantum and classical difficulty of the problem. Even so, this example does not probe the relation of BPP to BQP, because the classical problem is not hard. The number of queries required classically is only linear in the input size, not exponential.
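The single-query recovery of a is easy to verify numerically. In the sketch below (mine; the phase-oracle formulation is equivalent to the circuit above), the register is Hadamard-transformed, picks up the phases $(-1)^{a\cdot x}$, is Hadamard-transformed again, and ends up exactly in $|a\rangle$.

import numpy as np

def hadamard_n(n):
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    out = np.array([[1.0]])
    for _ in range(n):
        out = np.kron(out, H)
    return out

def bernstein_vazirani(a, n):
    N = 2 ** n
    state = np.zeros(N); state[0] = 1.0
    state = hadamard_n(n) @ state
    phases = np.array([(-1) ** bin(a & x).count("1") for x in range(N)])
    state = phases * state                   # oracle: |x> -> (-1)^{a.x}|x>
    state = hadamard_n(n) @ state
    return int(np.argmax(np.abs(state)))     # the outcome with probability one

print(bernstein_vazirani(a=0b1011, n=4))     # -> 11 (= 0b1011)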
Simon's problem. Bernstein and Vazirani managed to formulate a variation on the above problem that is hard classically, and so established for the first time a "relativized" separation between quantum and classical complexity. We will find it more instructive to consider a simpler example proposed somewhat later by Daniel Simon.
Once again we are presented with a quantum black box, and this time we are assured that the box computes a function
$$f: \{0,1\}^n \to \{0,1\}^n \qquad (6.124)$$
that is 2-to-1. Furthermore, the function has a "period" given by the n-bit string a; that is,
$$f(x) = f(y)\quad\text{iff}\quad y = x\oplus a, \qquad (6.125)$$
where $\oplus$ here denotes the bitwise XOR operation. (So a is the period if we regard x as taking values in $(\mathbb{Z}_2)^n$ rather than $\mathbb{Z}_{2^n}$.) This is all we know about f. Our job is to determine the value of a.
Classically this problem is hard. We need to query the oracle an exponentially large number of times to have any reasonable probability of finding a. We don't learn anything until we are fortunate enough to choose two queries x and y that happen to satisfy $x\oplus y = a$. Suppose, for example, that we choose $2^{n/4}$ queries. The number of pairs of queries is less than $(2^{n/4})^2$, and for each pair $\{x, y\}$, the probability that $x\oplus y = a$ is $2^{-n}$. Therefore, the probability of successfully finding a is less than
$$2^{-n}(2^{n/4})^2 = 2^{-n/2}; \qquad (6.126)$$
even with exponentially many queries, the success probability is exponentially small.
If we wish, we can frame the question as a decision problem: Either f is a 1-to-1 function, or it is 2-to-1 with some randomly chosen period a, each occurring with an a priori probability $\frac{1}{2}$. We are to determine whether the function is 1-to-1 or 2-to-1. Then, after $2^{n/4}$ classical queries, our probability of making a correct guess is
$$P(\text{success}) < \frac{1}{2} + \frac{1}{2^{n/2}}, \qquad (6.127)$$
which does not remain bounded away from $\frac{1}{2}$ as n gets large.
But with quantum queries the problem is easy! The circuit we use is essentially the same as above, but now both registers are expanded to n qubits. We prepare the equally weighted superposition of all n-bit strings (by acting on $|0\rangle$ with $H^{(n)}$), and then we query the oracle:
$$U_f: \left(\sum_{x=0}^{2^n-1}|x\rangle\right)|0\rangle \to \sum_{x=0}^{2^n-1}|x\rangle|f(x)\rangle. \qquad (6.128)$$
Now we measure the second register. (This step is not actually necessary, but I include it here for the sake of pedagogical clarity.) The measurement outcome is selected at random from the $2^{n-1}$ possible values of f(x), each occurring equiprobably. Suppose the outcome is $f(x_0)$. Then because both $x_0$ and $x_0\oplus a$, and only these values, are mapped by f to $f(x_0)$, we have prepared the state
$$\frac{1}{\sqrt 2}(|x_0\rangle + |x_0\oplus a\rangle) \qquad (6.129)$$
in the first register.
Now we want to extract some information about a. Clearly it would do us no good to measure the register (in the computational basis) at this point. We would obtain either the outcome $x_0$ or $x_0\oplus a$, each occurring with probability $\frac{1}{2}$, but neither outcome would reveal anything about the value of a.
But suppose we apply the Hadamard transform $H^{(n)}$ to the register before we measure:
$$H^{(n)}: \frac{1}{\sqrt 2}(|x_0\rangle + |x_0\oplus a\rangle) \to \frac{1}{2^{(n+1)/2}}\sum_{y=0}^{2^n-1}\left[(-1)^{x_0\cdot y} + (-1)^{(x_0\oplus a)\cdot y}\right]|y\rangle = \frac{1}{2^{(n-1)/2}}\sum_{a\cdot y = 0}(-1)^{x_0\cdot y}|y\rangle. \qquad (6.130)$$
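The content of eq. (6.130) is that every measurement outcome y satisfies $a\cdot y = 0$, so each run yields one linear constraint on a. The sketch below (my own numerical check; the choice of $x_0$ and a is arbitrary) prepares the post-measurement state, applies $H^{(n)}$, and verifies that the output distribution is supported only on strings orthogonal to a.

import numpy as np

def simon_outcome_distribution(x0, a, n):
    N = 2 ** n
    state = np.zeros(N); state[x0] = state[x0 ^ a] = 1 / np.sqrt(2)
    # n-qubit Hadamard transform via the (-1)^{x.y} kernel
    Hn = np.array([[(-1) ** bin(x & y).count("1") for x in range(N)]
                   for y in range(N)]) / np.sqrt(N)
    return np.abs(Hn @ state) ** 2

n, a, x0 = 4, 0b0110, 0b1010
probs = simon_outcome_distribution(x0, a, n)
support = [y for y in range(2 ** n) if probs[y] > 1e-12]
print(all(bin(a & y).count("1") % 2 == 0 for y in support))   # -> True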
$$\simeq \frac{1}{\sqrt N} \qquad (6.149)$$
for N large; if we choose
$$T = \frac{\pi}{4}\sqrt{N}\,\big(1 + O(N^{-1/2})\big), \qquad (6.150)$$
$$U_s = 2|s\rangle\langle s| - \mathbf{1}, \qquad (6.155)$$
that reflects a vector about the axis defined by the vector $|s\rangle$. How do we build this transformation efficiently from quantum gates? Since $|s\rangle = H^{(n)}|0\rangle$, where $H^{(n)}$ is the bitwise Hadamard transformation, we may write
$$U_s = H^{(n)}\big(2|0\rangle\langle 0| - \mathbf{1}\big)H^{(n)}, \qquad (6.156)$$
so it will suffice to construct a reflection about the axis $|0\rangle$. We can easily build this reflection from an n-bit Toffoli gate $\theta^{(n)}$.
Recall that
$$H\sigma_x H = \sigma_z, \qquad (6.157)$$
{ Figure: circuit constructing the reflection about $|0\rangle$ from an n-bit Toffoli gate and Hadamards {
Since each term in the polynomial $P_{\rm even}^{(2T)}(\tilde X)$ contains at most 2T of the $\tilde X_i$'s, we can invoke the identity
$$\sum_{\tilde X_i\in\{1,-1\}}\tilde X_i = 0 \qquad (6.185)$$
to see that the sum in eq. (6.184) must vanish if N > 2T. We conclude that
$$\sum_{{\rm par}(\tilde X)=1}P_{\rm even}^{(2T)}(\tilde X) = \sum_{{\rm par}(\tilde X)=-1}P_{\rm even}^{(2T)}(\tilde X); \qquad (6.186)$$
hence, for T < N/2, we are just as likely to guess "even" when the actual PARITY($\tilde X$) is odd as when it is even (on average). Our quantum algorithm
fails to tell us anything about the value of PARITY($\tilde X$); that is, averaged over the (a priori equally likely) possible values of $X_i$, we are just as likely to be right as wrong.
We can also show, by exhibiting an explicit algorithm (exercise), that N/2 queries (assuming N even) are sufficient to determine PARITY (either probabilistically or deterministically). In a sense, then, we can achieve a factor of 2 speedup compared to classical queries. But that is the best we can do.
6.9 Periodicity
So far, the one case for which we have found an exponential separation be-
tween the speed of a quantum algorithm and the speed of the corresponding
15 R. Cleve et al., "Quantum Entanglement and the Communication Complexity of the Inner Product Function," quant-ph/9708019; W. van Dam et al., "Multiparty Quantum Communication Complexity," quant-ph/9710054.
classical algorithm is the case of Simon's problem. Simon's algorithm exploits quantum parallelism to speed up the search for the period of a function. Its success encourages us to seek other quantum algorithms designed for other kinds of period finding.
Simon studied periodic functions taking values in $(\mathbb{Z}_2)^n$. For that purpose the n-bit Hadamard transform $H^{(n)}$ was a powerful tool. If we wish instead to study periodic functions taking values in $\mathbb{Z}_{2^n}$, the (discrete) Fourier transform will be a tool of comparable power.
The moral of Simon's problem is that, while finding needles in a haystack may be difficult, finding periodically spaced needles in a haystack can be far easier. For example, if we scatter a photon off of a periodic array of needles, the photon is likely to be scattered in one of a set of preferred directions, where the Bragg scattering condition is satisfied. These preferred directions depend on the spacing between the needles, so by scattering just one photon, we can already collect some useful information about the spacing. We should further explore the implications of this metaphor for the construction of efficient quantum algorithms.
So imagine a quantum oracle that computes a function
$$f: \{0,1\}^n \to \{0,1\}^m \qquad (6.192)$$
that has an unknown period r, where r is a positive integer satisfying
$$1 \ll r \ll 2^n. \qquad (6.193)$$
That is,
$$f(x) = f(x + mr), \qquad (6.194)$$
where m is any integer such that x and x + mr lie in $\{0, 1, 2, \ldots, 2^n - 1\}$.
We are to find the period r. Classically, this problem is hard. If r is, say, of order $2^{n/2}$, we will need to query the oracle of order $2^{n/4}$ times before we are likely to find two values of x that are mapped to the same value of f(x), and hence learn something about r. But we will see that there is a quantum algorithm that finds r in time poly(n).
Even if we know how to compute the function f(x) efficiently, it may be a hard problem to determine its period. Our quantum algorithm can be applied to finding, in poly(n) time, the period of any function that we can compute in poly(n) time. Efficient period finding allows us to efficiently
solve a variety of (apparently) hard problems, such as factoring an integer,
or evaluating a discrete logarithm.
The key idea underlying quantum period finding is that the Fourier transform can be evaluated by an efficient quantum circuit (as discovered by Peter Shor). The quantum Fourier transform (QFT) exploits the power of quantum parallelism to achieve an exponential speedup of the well-known (classical) fast Fourier transform (FFT). Since the FFT has such a wide variety of applications, perhaps the QFT will also come into widespread use someday.
{ Figure: the three-qubit QFT circuit, built from Hadamard gates and controlled-$R_1$, controlled-$R_2$ rotations acting on $|x_2\rangle, |x_1\rangle, |x_0\rangle$ and producing outputs $|y_2\rangle, |y_1\rangle, |y_0\rangle$ {
does the job (but note that the order of the bits has been reversed in the output). Each Hadamard gate acts as
$$H: |x_k\rangle \to \frac{1}{\sqrt 2}\left(|0\rangle + e^{2\pi i(.x_k)}|1\rangle\right). \qquad (6.221)$$
The other contributions to the relative phase of $|0\rangle$ and $|1\rangle$ in the kth qubit are provided by the two-qubit conditional rotations, where
$$R_d = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/2^d} \end{pmatrix}, \qquad (6.222)$$
and $d = (k - j)$ is the "distance" between the qubits.
In the case n = 3, the QFT is constructed from three H gates and three controlled-R gates. For general n, the obvious generalization of this circuit requires n H gates and $\binom{n}{2} = \frac{1}{2}n(n-1)$ controlled-R's. A two-qubit gate is applied to each pair of qubits, again with controlled relative phase $\pi/2^d$, where d is the "distance" between the qubits. Thus the circuit family that implements the QFT has a size of order $(\log N)^2$.
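The gate count can be checked by assembling the circuit explicitly. The sketch below (my own numpy illustration; the qubit-ordering and bit-reversal conventions are assumptions of this code, not taken from the notes) builds the n-qubit QFT from n Hadamards and $\binom{n}{2}$ controlled phase rotations and compares it with the DFT matrix.

import numpy as np

def apply_1q(gate, q, state, n):
    """Apply a single-qubit gate to qubit q (qubit 0 = least significant)."""
    psi = state.reshape([2] * n)
    psi = np.moveaxis(psi, n - 1 - q, 0)
    psi = np.tensordot(gate, psi, axes=1)
    return np.moveaxis(psi, 0, n - 1 - q).reshape(-1)

def apply_cphase(control, target, phase, state, n):
    """Multiply amplitudes with control = target = 1 by exp(i*phase)."""
    out = state.copy()
    for x in range(len(state)):
        if (x >> control) & 1 and (x >> target) & 1:
            out[x] *= np.exp(1j * phase)
    return out

def qft_circuit(n):
    N = 2 ** n
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    U = np.eye(N, dtype=complex)
    for col in range(N):
        state = U[:, col]
        for k in reversed(range(n)):
            state = apply_1q(H, k, state, n)
            for j in reversed(range(k)):
                state = apply_cphase(j, k, np.pi / 2 ** (k - j), state, n)
        # undo the bit reversal noted in the text
        state = state.reshape([2] * n).transpose(list(reversed(range(n)))).reshape(-1)
        U[:, col] = state
    return U

n = 3; N = 2 ** n
dft = np.array([[np.exp(2j * np.pi * x * y / N) for x in range(N)]
                for y in range(N)]) / np.sqrt(N)
print(np.allclose(qft_circuit(n), dft))     # -> True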
We can reduce the circuit complexity to linear in log N if we are willing to settle for an implementation of fixed accuracy, because the two-qubit gates acting on distantly separated qubits contribute only exponentially small phases. If we drop the gates acting on pairs with distance greater than m, then each term in eq. (6.217) is replaced by an approximation to m bits of accuracy; the total error in $xy/2^n$ is certainly no worse than $n\,2^{-m}$, so we can achieve accuracy $\varepsilon$ in $xy/2^n$ with $m \sim \log(n/\varepsilon)$. If we retain only the gates acting on qubit pairs with distance m or less, then the circuit size is $mn \sim n\log(n/\varepsilon)$.
In fact, if we are going to measure in the computational basis immediately after implementing the QFT (or its inverse), a further simplification is possible: no two-qubit gates are needed at all! We first remark that the controlled-$R_d$ gate acts symmetrically on the two qubits: it acts trivially on $|00\rangle, |01\rangle$, and $|10\rangle$, and modifies the phase of $|11\rangle$ by $e^{i\pi/2^d}$. Thus, we can interchange the "control" and "target" bits without modifying the gate. With this change, our circuit for the 3-qubit QFT can be redrawn as:
{ Figure: the 3-qubit QFT redrawn with the controls and targets of the conditional rotations interchanged {
Once we have measured $|y_0\rangle$, we know the value of the control bit in the controlled-$R_1$ gate that acted on the first two qubits. Therefore, we will obtain the same probability distribution of measurement outcomes if, instead of applying controlled-$R_1$ and then measuring, we instead measure $y_0$ first, and then apply $(R_1)^{y_0}$ to the next qubit, conditioned on the outcome of the measurement of the first qubit. Similarly, we can replace the controlled-$R_1$ and controlled-$R_2$ gates acting on the third qubit by the single-qubit rotation
$$(R_2)^{y_0}(R_1)^{y_1} \qquad (6.223)$$
(that is, a rotation with relative phase $\pi(.y_1y_0)$) after the values of $y_1$ and $y_0$ have been measured.
Altogether then, if we are going to measure after performing the QFT, only n Hadamard gates and n − 1 single-qubit rotations are needed to implement it. The QFT is remarkably simple!
6.10 Factoring
6.10.1 Factoring as period finding
What does the factoring problem (finding the prime factors of a large composite positive integer) have to do with periodicity? There is a well-known
(randomized) reduction of factoring to determining the period of a function.
Although this reduction is not directly related to quantum computing, we
will discuss it here for completeness, and because the prospect of using a
quantum computer as a factoring engine has generated so much excitement.
Suppose we want to find a factor of the n-bit number N. Select pseudo-randomly a < N, and compute the greatest common divisor GCD(a, N), which can be done efficiently (in a time of order $(\log N)^3$) using the Euclidean algorithm. If GCD(a, N) ≠ 1 then the GCD is a nontrivial factor of N, and we are done. So suppose GCD(a, N) = 1.
[Aside: The Euclidean algorithm. To compute GCD($N_1, N_2$) (for $N_1 > N_2$), first divide $N_1$ by $N_2$, obtaining remainder $R_1$. Then divide $N_2$ by $R_1$, obtaining remainder $R_2$. Divide $R_1$ by $R_2$, etc., until the remainder is 0. The last nonzero remainder is $R = $ GCD($N_1, N_2$). To see that the algorithm works, just note that (1) R divides all previous remainders and hence also $N_1$ and $N_2$, and (2) any number that divides $N_1$ and $N_2$ will also divide all remainders, including R. A number that divides both $N_1$ and $N_2$, and also is divided by any number that divides both $N_1$ and $N_2$, must be GCD($N_1, N_2$). To see how long the Euclidean algorithm takes, note that
$$R_j = qR_{j+1} + R_{j+2}, \qquad (6.224)$$
where $q \ge 1$ and $R_{j+2} < R_{j+1}$; therefore $R_{j+2} < \frac{1}{2}R_j$. Two divisions reduce the remainder by at least a factor of 2, so no more than $2\log N_1$ divisions are required, with each division using $O((\log N)^2)$ elementary operations; the total number of operations is $O((\log N)^3)$.]
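A compact sketch of the aside, keeping count of the divisions so that the $2\log_2 N_1$ bound can be seen in practice (illustrative code of mine):

def euclid_gcd(n1, n2):
    divisions = 0
    while n2 != 0:
        n1, n2 = n2, n1 % n2      # replace (N1, N2) by (N2, remainder)
        divisions += 1
    return n1, divisions

print(euclid_gcd(2 ** 61 - 1, 2 ** 31 - 1))   # coprime: gcd 1, few divisions
print(euclid_gcd(12378, 3054))                # gcd 6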
The numbers a < N coprime to N (having no common factor with N) form a finite group under multiplication mod N. [Why? We need to establish that each element a has an inverse. But for given a < N coprime to N, each ab (mod N) is distinct, as b ranges over all b < N coprime to N.16 Therefore, for some b, we must have $ab \equiv 1$ (mod N); hence the inverse of a exists.] Each element a of this finite group has a finite order r, the smallest positive integer such that
$$a^r \equiv 1 \pmod N. \qquad (6.225)$$
16 If N divides $ab - ab'$, it must divide $b - b'$.
The order of a mod N is the period of the function
$$f_{N,a}(x) = a^x \pmod N. \qquad (6.226)$$
We know there is an efficient quantum algorithm that can find the period of a function; therefore, if we can compute $f_{N,a}$ efficiently, we can find the order of a efficiently.
Computing $f_{N,a}$ may look difficult at first, since the exponent x can be very large. But if $x < 2^m$ and we express x as a binary expansion
$$x = x_{m-1}\,2^{m-1} + x_{m-2}\,2^{m-2} + \cdots + x_0, \qquad (6.227)$$
we have
$$a^x \pmod N = (a^{2^{m-1}})^{x_{m-1}}(a^{2^{m-2}})^{x_{m-2}}\cdots(a)^{x_0} \pmod N. \qquad (6.228)$$
Each $a^{2^j}$ has a large exponent, but can be computed efficiently by a classical computer, using repeated squaring
$$a^{2^j} \pmod N = (a^{2^{j-1}})^2 \pmod N. \qquad (6.229)$$
So only m − 1 (classical) mod N multiplications are needed to assemble a table of all the $a^{2^j}$'s.
The computation of $a^x \pmod N$ is carried out by executing a routine:
INPUT 1
For j = 0 to m − 1: if $x_j = 1$, MULTIPLY by $a^{2^j}$.
This routine requires at most m mod N multiplications, each requiring of order $(\log N)^2$ elementary operations.17 Since r < N, we will have a reasonable chance of success at extracting the period if we choose $m \simeq 2\log N$. Hence, the computation of $f_{N,a}$ can be carried out by a circuit family of size $O((\log N)^3)$. Schematically, the circuit has the structure:
17 Using tricks for performing efficient multiplication of very large numbers, the number of elementary operations can be reduced to $O(\log N\,\log\log N\,\log\log\log N)$; thus, asymptotically for large N, a circuit family with size $O(\log^2 N\,\log\log N\,\log\log\log N)$ can compute $f_{N,a}$.
{ Figure: controlled multiplications by $a$, $a^2$, $a^4$, controlled by the bits $x_0, x_1, x_2$, acting on a register initialized to $|1\rangle$ {
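A classical sketch of the routine above (illustrative code of mine): build the table of $a^{2^j}$ by repeated squaring, then multiply in the entries selected by the bits of x.

def mod_exp(a, x, N):
    m = x.bit_length()
    table = [a % N]                     # table[j] = a^(2^j) mod N
    for j in range(1, m):
        table.append(table[-1] ** 2 % N)
    result = 1                          # "INPUT 1"
    for j in range(m):
        if (x >> j) & 1:                # if x_j = 1, MULTIPLY by a^(2^j)
            result = result * table[j] % N
    return result

print(mod_exp(7, 2 ** 40 + 3, 15))      # agrees with pow(7, 2**40 + 3, 15)
print(pow(7, 2 ** 40 + 3, 15))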
6.10.2 RSA
Does anyone care whether factoring is easy or hard? Well, yes, some people
do.
The presumed difficulty of factoring is the basis of the security of the widely used RSA18 scheme for public key cryptography, which you may have used yourself if you have ever sent your credit card number over the internet.
The idea behind public key cryptography is to avoid the need to exchange
a secret key (which might be intercepted and copied) between the parties
that want to communicate. The enciphering key is public knowledge. But
using the enciphering key to infer the deciphering key involves a prohibitively
difficult computation. Therefore, Bob can send the enciphering key to Alice and everyone else, but only Bob will be able to decode the message that Alice (or anyone else) encodes using the key. Encoding is a "one-way function"
that is easy to compute but very hard to invert.
18For Rivest, Shamir, and Adleman
(Of course, Alice and Bob could have avoided the need to exchange the
public key if they had decided on a private key in their previous clandestine
meeting. For example, they could have agreed to use a long random string
as a one-time pad for encoding and decoding. But perhaps Alice and Bob
never anticipated that they would someday need to communicate privately.
Or perhaps they did agree in advance to use a one-time pad, but they have now used up their private key, and they are loath to reuse it for fear that an eavesdropper might then be able to break their code. Now they are too far apart to safely exchange a new private key; public key cryptography appears to be their most secure option.)
To construct the public key Bob chooses two large prime numbers p and q. But he does not publicly reveal their values. Instead he computes the product
$$N = pq. \qquad (6.239)$$
Since Bob knows the prime factorization of N, he also knows the value of the Euler function $\varphi(N)$, the number of numbers less than N that are coprime with N. In the case of a product of two primes it is
$$\varphi(N) = N - p - q + 1 = (p-1)(q-1) \qquad (6.240)$$
(only multiples of p and q share a factor with N). It is easy to find $\varphi(N)$ if you know the prime factorization of N, but it is hard if you know only N.
Bob then pseudo-randomly selects $e < \varphi(N)$ that is coprime with $\varphi(N)$. He reveals to Alice (and anyone else who is listening) the value of N and e, but nothing else.
Alice converts her message to ASCII, a number a < N. She encodes the message by computing
$$b = f(a) = a^e \pmod N, \qquad (6.241)$$
which she can do quickly by repeated squaring. How does Bob decode the message?
Suppose that a is coprime to N (which is overwhelmingly likely if p and q are very large; anyway Alice can check before she encodes). Then
$$a^{\varphi(N)} \equiv 1 \pmod N \qquad (6.242)$$
(Euler's theorem). This is so because the numbers less than N and coprime to N form a group (of order $\varphi(N)$) under mod N multiplication. The order of
any group element must divide the order of the group (the powers of a form a subgroup). Since GCD$(e, \varphi(N)) = 1$, we know that e has a multiplicative inverse $d = e^{-1}$ mod $\varphi(N)$:
$$ed \equiv 1 \pmod{\varphi(N)}. \qquad (6.243)$$
The value of d is Bob's closely guarded secret; he uses it to decode by computing:
$$f^{-1}(b) = b^d \pmod N = a^{ed} \pmod N = a\cdot\big(a^{\varphi(N)}\big)^{\rm integer} \pmod N = a \pmod N. \qquad (6.244)$$
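A toy illustration of the scheme just described, with artificially small primes (real keys use primes hundreds of digits long); the particular numbers are mine, and the modular inverse uses Python 3.8+'s built-in pow.

from math import gcd

p, q = 61, 53
N = p * q                       # public modulus, eq. (6.239)
phi = (p - 1) * (q - 1)         # Euler function, eq. (6.240)

e = 17                          # public exponent, coprime with phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)             # Bob's secret: d = e^{-1} mod phi(N)

a = 1234                        # Alice's message, a < N and coprime to N
b = pow(a, e, N)                # encode: b = a^e mod N, eq. (6.241)
print(pow(b, d, N) == a)        # decode: b^d mod N recovers a, eq. (6.244)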
[Aside: How does Bob compute $d = e^{-1}$? The multiplicative inverse is a byproduct of carrying out the Euclidean algorithm to compute GCD$(e, \varphi(N)) = 1$. Tracing the chain of remainders from the bottom up, starting with $R_n = 1$:
$$1 = R_n = R_{n-2} - q_{n-1}R_{n-1},$$
$$R_{n-1} = R_{n-3} - q_{n-2}R_{n-2},$$
$$R_{n-2} = R_{n-4} - q_{n-3}R_{n-3},$$
$$\text{etc.}\ldots \qquad (6.245)$$
(where the $q_j$'s are the quotients), so that
$$1 = (1 + q_{n-1}q_{n-2})R_{n-2} - q_{n-1}R_{n-3},$$
$$1 = \big(-q_{n-1} - q_{n-3}(1 + q_{n-1}q_{n-2})\big)R_{n-3} + (1 + q_{n-1}q_{n-2})R_{n-4},$$
$$\text{etc.}\ldots \qquad (6.246)$$
Continuing, we can express 1 as a linear combination of any two successive remainders; eventually we work our way up to
$$1 = d\cdot e + q\cdot\varphi(N), \qquad (6.247)$$
and identify d as $e^{-1} \pmod{\varphi(N)}$.]
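An equivalent iterative form of this back-substitution (a sketch of mine, tracking only the coefficient of e as the remainders are generated) produces the same d:

def inverse_mod(e, phi):
    r_prev, r = e, phi
    s_prev, s = 1, 0            # invariant: r_prev = s_prev * e (mod phi)
    while r != 0:
        quotient = r_prev // r
        r_prev, r = r, r_prev - quotient * r
        s_prev, s = s, s_prev - quotient * s
    assert r_prev == 1, "e and phi must be coprime"
    return s_prev % phi

e, phi = 17, 3120
d = inverse_mod(e, phi)
print(d, (d * e) % phi)          # -> d, 1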
Of course, if Eve has a superfast factoring engine, the RSA scheme is insecure. She factors N, finds $\varphi(N)$, and quickly computes d. In fact, she does not really need to factor N; it is sufficient to compute the order modulo N of the encoded message $a^e$ (mod N). Since e is coprime with $\varphi(N)$, the order of $a^e$ (mod N) is the same as the order of a (both elements generate the same orbit, or cyclic subgroup). Once the order Ord(a) is known, Eve computes $\tilde d$ such that
$$\tilde d e \equiv 1 \pmod{{\rm Ord}(a)}, \qquad (6.248)$$
so that
$$(a^e)^{\tilde d} \equiv a\cdot\big(a^{{\rm Ord}(a)}\big)^{\rm integer} \pmod N \equiv a \pmod N, \qquad (6.249)$$
and Eve can decipher the message. If our only concern is to defeat RSA, we run the Shor algorithm to find $r = {\rm Ord}(a^e)$, and we needn't worry about whether we can use r to extract a factor of N or not.
How important are such prospective cryptographic applications of quantum computing? When fast quantum computers are readily available, concerned parties can stop using RSA, or can use longer keys to stay a step ahead of contemporary technology. However, people with secrets sometimes want their messages to remain confidential for a while (30 years?). They may not be satisfied by longer keys if they are not confident about the pace of future technological advances.
And if they shun RSA, what will they use instead? Not so many suitable
one-way functions are known, and others besides RSA are (or may be) vul-
nerable to a quantum attack. So there really is a lot at stake. If fast large
scale quantum computers become available, the cryptographic implications
may be far reaching.
But while quantum theory taketh away, quantum theory also giveth; quantum computers may compromise public key schemes, but also offer an alternative: secure quantum key distribution, as discussed in Chapter 4.
$$\text{Prob}(1) = \left|\tfrac{1}{2}(1 - \lambda)\right|^2 = \sin^2(\pi\phi), \qquad (6.255)$$
where $\lambda = e^{2\pi i\phi}$.
As we have discussed previously (for example in connection with Deutsch's problem), this procedure distinguishes with certainty between the eigenvalues $\lambda = 1$ ($\phi = 0$) and $\lambda = -1$ ($\phi = 1/2$). But other possible values of $\lambda$ can also be distinguished, albeit with less statistical confidence. For example, suppose the state on which U acts is a superposition of U eigenstates
$$\alpha_1|\lambda_1\rangle + \alpha_2|\lambda_2\rangle. \qquad (6.256)$$
And suppose we execute the above circuit n times, with n distinct control bits. We thus prepare the state
$$\alpha_1|\lambda_1\rangle\left(\frac{1+\lambda_1}{2}|0\rangle + \frac{1-\lambda_1}{2}|1\rangle\right)^{\otimes n} + \alpha_2|\lambda_2\rangle\left(\frac{1+\lambda_2}{2}|0\rangle + \frac{1-\lambda_2}{2}|1\rangle\right)^{\otimes n}. \qquad (6.257)$$
If $\lambda_1 \ne \lambda_2$, the overlap between the two states of the n control bits is exponentially small for large n; by measuring the control bits, we can perform the orthogonal projection onto the $\{|\lambda_1\rangle, |\lambda_2\rangle\}$ basis, at least to an excellent approximation.
If we use enough control bits, we have a large enough sample to measure Prob(0) $= \frac{1}{2}(1 + \cos 2\pi\phi)$ with reasonable statistical confidence. By executing a controlled-$(iU)$, we can also measure $\frac{1}{2}(1 + \sin 2\pi\phi)$, which suffices to determine $\phi$ modulo an integer.
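A quick Monte Carlo sketch (mine, purely illustrative) of this statistical strategy: sample the two probabilities quoted above and reconstruct $\phi$ from the estimated cosine and sine.

import numpy as np

rng = np.random.default_rng(1)
phi_true, shots = 0.3217, 100000

p_cos = (1 + np.cos(2 * np.pi * phi_true)) / 2      # controlled-U statistics
p_sin = (1 + np.sin(2 * np.pi * phi_true)) / 2      # controlled-(iU) statistics
cos_est = 2 * rng.binomial(shots, p_cos) / shots - 1
sin_est = 2 * rng.binomial(shots, p_sin) / shots - 1

phi_est = (np.arctan2(sin_est, cos_est) / (2 * np.pi)) % 1.0
print(phi_true, round(phi_est, 4))    # accuracy ~ 1/sqrt(shots)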
However, in the factoring algorithm, we need to measure the phase of $e^{2\pi ik/r}$ to exponential accuracy, which seems to require an exponential number of trials. Suppose, though, that we can efficiently compute high powers of U (as is the case for $U_a$), such as
$$U^{2^j}. \qquad (6.258)$$
By applying the above procedure to measurement of $U^{2^j}$, we determine
$$\exp(2\pi i\,2^j\phi), \qquad (6.259)$$
where $e^{2\pi i\phi}$ is an eigenvalue of U. Hence, measuring $U^{2^j}$ to one bit of accuracy is equivalent to measuring the jth bit of the eigenvalue of U.
We can use this phase estimation procedure for order finding, and hence factorization. We invert eq. (6.253) to obtain
$$|x_0\rangle = \frac{1}{\sqrt r}\sum_{k=0}^{r-1}|\lambda_k\rangle; \qquad (6.260)$$
each computational basis state (for $x_0 \ne 0$) is an equally weighted superposition of r eigenstates of $U_a$.
Measuring the eigenvalue, we obtain $\lambda_k = e^{2\pi ik/r}$, with k selected from $\{0, 1, \ldots, r-1\}$ equiprobably. If $r < 2^n$, we measure to 2n bits of precision to determine k/r. In principle, we can carry out this procedure in a computer that stores fewer qubits than we would need to evaluate the QFT, because we can attack just one bit of k/r at a time.
But it is instructive to imagine that we incorporate the QFT into this
phase estimation procedure. Suppose the circuit
{ Figure: control qubits, each prepared by a Hadamard, acting as controls of $U$, $U^2$, $U^4$ applied to the target state $|\lambda\rangle$, leaving each control in $\frac{1}{\sqrt 2}(|0\rangle + \lambda^{2^j}|1\rangle)$ {
acts on the eigenstate $|\lambda\rangle$ of the unitary transformation U. The conditional U prepares $\frac{1}{\sqrt 2}(|0\rangle + \lambda|1\rangle)$, the conditional $U^2$ prepares $\frac{1}{\sqrt 2}(|0\rangle + \lambda^2|1\rangle)$, the conditional $U^4$ prepares $\frac{1}{\sqrt 2}(|0\rangle + \lambda^4|1\rangle)$, and so on. We could perform a Hadamard and measure each of these qubits to sample the probability distribution governed by the jth bit of $\phi$, where $\lambda = e^{2\pi i\phi}$. But a more efficient method is to note that the state prepared by the circuit is
$$\frac{1}{2^{m/2}}\sum_{y=0}^{2^m-1}e^{2\pi i\phi y}|y\rangle. \qquad (6.261)$$
A better way to learn the value of $\phi$ is to perform the QFT$^{(m)}$, not the Hadamard $H^{(m)}$, before we measure.
Considering the case m = 3 for clarity, the circuit that prepares and then Fourier analyzes the state
$$\frac{1}{\sqrt 8}\sum_{y=0}^{7}e^{2\pi i\phi y}|y\rangle \qquad (6.262)$$
is
{ Figure: the m = 3 circuit: Hadamards and controlled-$U$, $U^2$, $U^4$, followed by the Fourier analysis gates (Hadamards and conditional rotations) and measurement of $\tilde y_0, \tilde y_1, \tilde y_2$ {
This circuit very nearly carries out our strategy for phase estimation outlined above, but with a significant modification. Before we execute the final Hadamard transformation and measurement of $\tilde y_1$ and $\tilde y_2$, some conditional phase rotations are performed. It is those phase rotations that distinguish the QFT$^{(3)}$ from the Hadamard transform $H^{(3)}$, and they strongly enhance the reliability with which we can extract the value of $\phi$.
We can understand better what the conditional rotations are doing if we suppose that $\phi = k/8$, for $k \in \{0, 1, 2, \ldots, 7\}$; in that case, we know that the Fourier transform will generate the output $\tilde y = k$ with probability one. We may express k as the binary expansion
$$k = k_2k_1k_0 \equiv k_2\cdot 4 + k_1\cdot 2 + k_0. \qquad (6.263)$$
In fact, the circuit for the least significant bit $\tilde y_0$ of the Fourier transform is precisely Kitaev's measurement circuit applied to the unitary $U^4$, whose eigenvalue is
$$(e^{2\pi i\phi})^4 = e^{i\pi k} = e^{i\pi k_0} = \pm 1. \qquad (6.264)$$
The measurement circuit distinguishes eigenvalues $\pm 1$ perfectly, so that $\tilde y_0 = k_0$.
The circuit for the next bit $\tilde y_1$ is almost the measurement circuit for $U^2$, with eigenvalue
$$(e^{2\pi i\phi})^2 = e^{i\pi k/2} = e^{i\pi(k_1.k_0)}, \qquad (6.265)$$
except that the conditional phase rotation has been inserted, which multiplies the phase by $\exp[-i\pi(.k_0)]$, resulting in $e^{i\pi k_1}$. Again, applying a Hadamard followed by measurement, we obtain the outcome $\tilde y_1 = k_1$ with certainty.
Similarly, the circuit for $\tilde y_2$ measures the eigenvalue
$$e^{2\pi i\phi} = e^{i\pi k/4} = e^{i\pi(k_2.k_1k_0)}, \qquad (6.266)$$
except that the conditional rotation removes $e^{i\pi(.k_1k_0)}$, so that the outcome is $\tilde y_2 = k_2$ with certainty.
Thus, the QFT implements the phase estimation routine with maximal cleverness. We measure the less significant bits of $\phi$ first, and we exploit the information gained in the measurements to improve the reliability of our estimate of the more significant bits. Keeping this interpretation in mind, you will find it easy to remember the circuit for the QFT$^{(n)}$!
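This "measure the low bits first, then correct" reading can be simulated one control qubit at a time. The sketch below (my own illustration of the idea for $\phi = k/8$; the sign convention of the correction is an assumption of this code) reproduces $\tilde y = k$ with certainty.

import numpy as np

def semiclassical_phase_estimation(phi, m=3, rng=np.random.default_rng(2)):
    bits = []                                    # k_0, k_1, ... (LSB first)
    for s in range(m):
        # controlled-U^(2^(m-1-s)) kicks the phase 2*pi*phi*2^(m-1-s) onto |1>
        alpha = 2 * np.pi * phi * 2 ** (m - 1 - s)
        # remove the binary-fraction contribution of the bits already measured
        correction = sum(b / 2 ** (j + 1) for j, b in enumerate(reversed(bits)))
        alpha -= np.pi * correction
        prob1 = np.sin(alpha / 2) ** 2           # Hadamard, then measure
        bits.append(int(rng.random() < prob1))
    return sum(b << j for j, b in enumerate(bits))

for k in range(8):
    print(k, semiclassical_phase_estimation(k / 8))   # each line prints k, k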
c) Suppose that $\rho = |\psi\rangle\langle\psi|$ and $\tilde\rho = |\tilde\psi\rangle\langle\tilde\psi|$ are pure states. Use (b) to show that
$$\sum_a |P_a - \tilde P_a| \le 2\,\| |\psi\rangle - |\tilde\psi\rangle \|. \qquad (6.275)$$
{ Figure {
and finally measure the ancilla qubit. If qubits 1 and 2 are in a state with $Z_1Z_2 = -1$ (either $|0\rangle_1|1\rangle_2$ or $|1\rangle_1|0\rangle_2$), then the ancilla qubit will flip once and the measurement outcome will be $|1\rangle$. But if qubits 1 and 2 are in a state with $Z_1Z_2 = 1$ (either $|0\rangle_1|0\rangle_2$ or $|1\rangle_1|1\rangle_2$), then the ancilla qubit will flip either twice or not at all, and the measurement outcome will be $|0\rangle$.
Similarly, the two-qubit operators
$$Z_4Z_5,\quad Z_7Z_8,\quad Z_5Z_6,\quad Z_8Z_9 \qquad (7.3)$$
can be measured to diagnose bit flip errors in the other two clusters of three qubits.
A three-qubit code would suffice to protect against a single bit flip. The reason the 3-qubit clusters are repeated three times is to protect against
phase errors as well. Suppose now that a phase error
$$|\psi\rangle \to Z|\psi\rangle \qquad (7.4)$$
occurs acting on one of the nine qubits. We can diagnose in which cluster the phase error occurred by measuring the two six-qubit observables
$$X_1X_2X_3X_4X_5X_6,\qquad X_4X_5X_6X_7X_8X_9. \qquad (7.5)$$
The logical basis states $|\bar 0\rangle$ and $|\bar 1\rangle$ are both eigenstates with eigenvalue one of these observables. A phase error acting on any one of the qubits in a particular cluster will change the value of XXX in that cluster relative to the other two; the location of the change can be identified by measuring the observables in eq. (7.5). Once the affected cluster is identified, we can reverse the error by applying Z to one of the qubits in that cluster.
How do we measure the six-qubit observable $X_1X_2X_3X_4X_5X_6$? Notice that if its control qubit is initially in the state $\frac{1}{\sqrt 2}(|0\rangle + |1\rangle)$, and its target is an eigenstate of X (that is, of NOT), then a controlled-NOT acts according to
$$\text{CNOT}: \frac{1}{\sqrt 2}(|0\rangle + |1\rangle)\otimes|x\rangle \to \frac{1}{\sqrt 2}(|0\rangle + (-1)^x|1\rangle)\otimes|x\rangle; \qquad (7.6)$$
it acts trivially if the target is the X = 1 (x = 0) state, and it flips the control if the target is the X = −1 (x = 1) state. To measure a product of X's, then, we execute the circuit
{ Figure {
and then measure the ancilla in the $\frac{1}{\sqrt 2}(|0\rangle \pm |1\rangle)$ basis.
We see that a single error acting on any one of the nine qubits in the block will cause no irrevocable damage. But if two bit flips occur in a single cluster of three qubits, then the encoded information will be damaged. For example, if the first two qubits in a cluster both flip, we will misdiagnose the error and attempt to recover by flipping the third. In all, the errors, together with our
mistaken recovery attempt, apply the operator $X_1X_2X_3$ to the code block. Since $|\bar 0\rangle$ and $|\bar 1\rangle$ are eigenstates of $X_1X_2X_3$ with distinct eigenvalues, the effect of two bit flips in a single cluster is a phase error in the encoded qubit:
$$X_1X_2X_3:\ a|\bar 0\rangle + b|\bar 1\rangle \to a|\bar 0\rangle - b|\bar 1\rangle. \qquad (7.7)$$
The encoded information will also be damaged if phase errors occur in two different clusters. Then we will introduce a phase error into the third cluster in our misguided attempt at recovery, so that altogether $Z_1Z_4Z_7$ will have been applied, which flips the encoded qubit:
$$Z_1Z_4Z_7:\ a|\bar 0\rangle + b|\bar 1\rangle \to a|\bar 1\rangle + b|\bar 0\rangle. \qquad (7.8)$$
If the likelihood of an error is small enough, and if the errors acting on distinct qubits are not strongly correlated, then using the nine-qubit code will allow us to preserve our unknown qubit more reliably than if we had not bothered to encode it at all. Suppose, for example, that the environment acts on each of the nine qubits, independently subjecting it to the depolarizing channel described in Chapter 3, with error probability p. Then a bit flip occurs with probability $\frac{2}{3}p$, and a phase flip with probability $\frac{2}{3}p$. (The probability that both occur is $\frac{1}{3}p$.) We can see that the probability of a phase error affecting the logical qubit is bounded above by $4p^2$, and the probability of a bit flip error is bounded above by $12p^2$. The total error probability is no worse than $16p^2$; this is an improvement over the error probability p for an unprotected qubit, provided that $p < 1/16$.
Of course, in this analysis we have implicitly assumed that encoding, decoding, error syndrome measurement, and recovery are all performed flawlessly. In Chapter 8 we will examine the more realistic case in which errors occur during these operations.
where now the states $|\mu\rangle_E$ are elements of an orthonormal basis for the environment, and the matrices $M_\mu$ are linear combinations of the Pauli operators
$E_a$ contained in $\mathcal{E}$, satisfying the operator-sum normalization condition
$$\sum_\mu M_\mu^\dagger M_\mu = \mathbf{1}. \qquad (7.21)$$
The error can be reversed by a recovery superoperator if there exist operators $R_\nu$ such that
$$\sum_\nu R_\nu^\dagger R_\nu = \mathbf{1}, \qquad (7.22)$$
and
$$\sum R_\nu M_\mu\,|i\rangle\otimes|\mu\rangle_E\otimes|\,\cdot\,\rangle_A\ \cdots$$
{ Figure {
(7.51)
The fidelity of the recovered state therefore satisfies
$$F \ge \langle\psi|\boldsymbol{\rho}'_{\rm GOOD}|\psi\rangle = \| |s\rangle_{EA} \|^2 = \| |{\rm GOOD}'\rangle \|^2. \qquad (7.52)$$
Furthermore, since the recovery operation is unitary, we have $\| |{\rm GOOD}'\rangle \| = \| |{\rm GOOD}\rangle \|$, and hence
$$F \ge \| |{\rm GOOD}\rangle \|^2 = \Big\| \sum_{E_a\in\mathcal{E}} E_a|\psi\rangle\otimes|e_a\rangle_E \Big\|^2. \qquad (7.53)$$
In general, though, $|{\rm BAD}\rangle$ need not be orthogonal to $|{\rm GOOD}\rangle$, so that $|{\rm BAD}'\rangle$ need not be orthogonal to $|{\rm GOOD}'\rangle$. Then $|{\rm BAD}'\rangle$ might have a component along $|{\rm GOOD}'\rangle$ that interferes destructively with $|{\rm GOOD}'\rangle$ and so reduces the fidelity. We can still obtain a lower bound on the fidelity in this more general case by resolving $|{\rm BAD}'\rangle$ into a component along $|{\rm GOOD}'\rangle$ and an orthogonal component, as
$$|{\rm BAD}'\rangle = |{\rm BAD}'_\parallel\rangle + |{\rm BAD}'_\perp\rangle. \qquad (7.54)$$
Then reasoning just as above we obtain
$$F \ge \| |{\rm GOOD}'\rangle + |{\rm BAD}'_\parallel\rangle \|^2. \qquad (7.55)$$
Of course, since both the error operation and the recovery operation are unitary acting on data, environment, and ancilla, the complete state $|{\rm GOOD}'\rangle + |{\rm BAD}'\rangle$ is normalized, or
$$\| |{\rm GOOD}'\rangle + |{\rm BAD}'_\parallel\rangle \|^2 + \| |{\rm BAD}'_\perp\rangle \|^2 = 1, \qquad (7.56)$$
and eq. (7.55) becomes
$$F \ge 1 - \| |{\rm BAD}'_\perp\rangle \|^2. \qquad (7.57)$$
Finally, the norm of $|{\rm BAD}'_\perp\rangle$ cannot exceed the norm of $|{\rm BAD}'\rangle$, and we conclude that
$$1 - F \le \| |{\rm BAD}'\rangle \|^2 = \| |{\rm BAD}\rangle \|^2 = \Big\| \sum_{E_b\notin\mathcal{E}} E_b|\psi\rangle\otimes|e_b\rangle_E \Big\|^2. \qquad (7.58)$$
This is our general bound on the "failure probability" of the recovery operation. The result eq. (7.53) then follows in the special case where $|{\rm GOOD}\rangle$ and $|{\rm BAD}\rangle$ are orthogonal states.
where each $\alpha_i \in \{0,1\}$, and addition is modulo 2. We may say that the length-n vector $v(\alpha_1\ldots\alpha_k)$ encodes the k-bit message $\alpha = (\alpha_1, \ldots, \alpha_k)$.
The k basis vectors $v_1, \ldots, v_k$ may be assembled into a $k\times n$ matrix
$$G = \begin{pmatrix} v_1 \\ \vdots \\ v_k \end{pmatrix}, \qquad (7.67)$$
called the generator matrix of the code. Then in matrix notation, eq. (7.66) can be rewritten as
$$v(\alpha) = \alpha G; \qquad (7.68)$$
the matrix G, acting to the left, encodes the message $\alpha$.
An alternative way to characterize the k-dimensional code subspace of $F_2^n$ is to specify $n - k$ linear constraints. There is an $(n-k)\times n$ matrix H such that
$$Hv = 0 \qquad (7.69)$$
for all those and only those vectors v in the code C. This matrix H is called the parity check matrix of the code C. The rows of H are $n - k$ linearly independent vectors, and the code space is the space of vectors that are orthogonal to all of these vectors. Orthogonality is defined with respect to the mod 2 bitwise inner product; two length-n binary strings are orthogonal if they "collide" (both take the value 1) at an even number of locations. Note that
$$HG^T = 0, \qquad (7.70)$$
where $G^T$ is the transpose of G; the rows of G are orthogonal to the rows of H.
For a classical bit, the only kind of error is a bit flip. An error occurring in an n-bit string can be characterized by an n-component vector e, where the 1's in e mark the locations where errors occur. When afflicted by the error e, the string v becomes
$$v \to v + e. \qquad (7.71)$$
Errors can be detected by applying the parity check matrix. If v is a codeword, then
$$H(v + e) = Hv + He = He. \qquad (7.72)$$
He is called the syndrome of the error e. Denote by $\mathcal{E}$ the set of errors $\{e_i\}$ that we wish to be able to correct. Error recovery will be possible if and only if all errors $e_i$ have distinct syndromes. If this is the case, we can unambiguously diagnose the error given the syndrome He, and we may then recover by flipping the bits specified by e, as in
$$v + e \to (v + e) + e = v. \qquad (7.73)$$
On the other hand, if $He_1 = He_2$ for $e_1 \ne e_2$, then we may misinterpret an $e_1$ error as an $e_2$ error; our attempt at recovery then has the effect
$$v + e_1 \to v + (e_1 + e_2) \ne v. \qquad (7.74)$$
The recovered message $v + e_1 + e_2$ lies in the code, but it differs from the intended message v; the encoded information has been damaged.
The distance d of a code C is the minimum weight of any nonzero vector $v \in C$, where the weight is the number of 1's in the string v. A linear code with distance $d = 2t + 1$ can correct t errors: the code assigns a distinct syndrome to each $e \in \mathcal{E}$, where $\mathcal{E}$ contains all vectors of weight t or less. This is so because, if $He_1 = He_2$, then
$$0 = He_1 + He_2 = H(e_1 + e_2), \qquad (7.75)$$
and therefore $e_1 + e_2 \in C$. But if $e_1$ and $e_2$ are unequal and each has weight no larger than t, then the weight of $e_1 + e_2$ is greater than zero and no larger than 2t. Since $d = 2t + 1$, there is no such vector in C. Hence $He_1$ and $He_2$ cannot be equal.
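Syndrome decoding is easy to see in code. The sketch below (mine; the particular parity check matrix is one common choice for the classical [7,4,3] Hamming code) computes the syndrome He and corrects a single bit flip, since the syndrome, read as a binary number, points at the flipped position.

import numpy as np

H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])

def syndrome(word):
    return H @ word % 2

def correct_single_error(word):
    s = syndrome(word)
    position = s[0] * 4 + s[1] * 2 + s[2]      # syndrome as a binary number
    if position != 0:
        word = word.copy()
        word[position - 1] ^= 1                # flip the indicated bit
    return word

codeword = np.array([1, 0, 1, 0, 1, 0, 1])     # satisfies H v = 0
corrupted = codeword.copy(); corrupted[4] ^= 1
print(syndrome(codeword), syndrome(corrupted)) # [0 0 0], then the error syndrome
print(np.array_equal(correct_single_error(corrupted), codeword))   # -> True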
A useful concept in classical coding theory is that of the dual code. We have seen that the $k\times n$ generator matrix G and the $(n-k)\times n$ parity check matrix H of a code C are related by $HG^T = 0$. Taking the transpose, it follows that $GH^T = 0$. Thus we may regard $H^T$ as the generator and G as the parity check of an $(n-k)$-dimensional code, which is denoted $C^\perp$ and called the dual of C. In other words, $C^\perp$ is the orthogonal complement of C in $F_2^n$. A vector is self-orthogonal if it has even weight, so it is possible for C and $C^\perp$ to intersect. A code contains its dual if all of its codewords have even weight and are mutually orthogonal. If $n = 2k$ it is possible that $C = C^\perp$, in which case C is said to be self-dual.
An identity relating the code C and its dual $C^\perp$ will prove useful in the
following section:
$$\sum_{v\in C}(-1)^{v\cdot u} = \begin{cases} 2^k & u\in C^\perp \\ 0 & u\notin C^\perp. \end{cases} \qquad (7.76)$$
The nontrivial content of the identity is the statement that the sum vanishes for $u \notin C^\perp$. This readily follows from the familiar identity
$$\sum_{v\in\{0,1\}^k}(-1)^{v\cdot w} = 0,\qquad w \ne 0. \qquad (7.77)$$
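A small brute-force check of eq. (7.76) for the [7,4] Hamming code (illustrative code of mine; the generator and parity check matrices are one standard choice, with the dual code taken to be the row space of H):

import numpy as np
from itertools import product

G = np.array([[1, 0, 0, 0, 0, 1, 1],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 1, 1, 0],
              [0, 0, 0, 1, 1, 1, 1]])
H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])

codewords = [np.array(a) @ G % 2 for a in product([0, 1], repeat=4)]
dual = {tuple(np.array(b) @ H % 2) for b in product([0, 1], repeat=3)}

def char_sum(u):
    return sum((-1) ** int(v @ u % 2) for v in codewords)

print(all(char_sum(np.array(u)) == (16 if u in dual else 0)
          for u in product([0, 1], repeat=7)))      # -> True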
{ Figure {
Separate syndromes are measured to diagnose the bit flip errors and the phase errors. An important special case of the CSS construction arises when a code C contains its dual $C^\perp$. Then we may choose $C_1 = C$ and $C_2 = C^\perp \subseteq C$; the C parity check is computed in both the F basis and the P basis to determine the two syndromes.
$$|\bar 1\rangle_F = \frac{1}{\sqrt 8}\sum_{{\rm odd}\ v\,\in\,{\rm Hamming}}|v\rangle. \qquad (7.93)$$
Since both $|\bar 0\rangle$ and $|\bar 1\rangle$ are superpositions of Hamming codewords, bit flips can be diagnosed in this basis by performing an H parity check. In the Hadamard-rotated basis, these codewords become
$$H^{(7)}: |\bar 0\rangle_F \to |\bar 0\rangle_P \equiv \frac{1}{4}\sum_{v\in{\rm Hamming}}|v\rangle = \frac{1}{\sqrt 2}(|\bar 0\rangle_F + |\bar 1\rangle_F),$$
$$\phantom{H^{(7)}:}\ |\bar 1\rangle_F \to |\bar 1\rangle_P \equiv \frac{1}{4}\sum_{v\in{\rm Hamming}}(-1)^{{\rm wt}(v)}|v\rangle = \frac{1}{\sqrt 2}(|\bar 0\rangle_F - |\bar 1\rangle_F). \qquad (7.94)$$
In this basis as well, the states are superpositions of Hamming codewords, so that bit flips in the P basis (phase flips in the original basis) can again be diagnosed with an H parity check. (We note in passing that for this code, performing the bitwise Hadamard transformation also implements a Hadamard rotation on the encoded data, a point that will be relevant to our discussion of fault-tolerant quantum computation in the next chapter.)
Steane's quantum code can correct a single bit flip and a single phase flip on any one of the seven qubits in the block. But recovery will fail if two different qubits both undergo either bit flips or phase flips. If $e_1$ and $e_2$ are two distinct weight-one strings, then $He_1 + He_2$ is a sum of two distinct columns of H, and hence a third column of H (all seven of the nontrivial strings of length 3 appear as columns of H). Therefore, there is another weight-one string $e_3$ such that $He_1 + He_2 = He_3$, or
$$H(e_1 + e_2 + e_3) = 0; \qquad (7.95)$$
thus $e_1 + e_2 + e_3$ is a weight-3 word in the Hamming code. We will interpret the syndrome $He_3$ as an indication that the error $v \to v + e_3$ has arisen, and we will attempt to recover by applying the operation $v \to v + e_3$. Altogether
then, the effect of the two bit flip errors and our faulty attempt at recovery will be to add $e_1 + e_2 + e_3$ (an odd-weight Hamming codeword) to the data, which will induce a flip of the encoded qubit:
$$|\bar 0\rangle_F \leftrightarrow |\bar 1\rangle_F. \qquad (7.96)$$
Similarly, two phase flips in the F basis are two bit flips in the P basis, which (after the botched recovery) induce on the encoded qubit
$$|\bar 0\rangle_P \leftrightarrow |\bar 1\rangle_P, \qquad (7.97)$$
or equivalently
$$|\bar 0\rangle_F \to |\bar 0\rangle_F,\qquad |\bar 1\rangle_F \to -|\bar 1\rangle_F, \qquad (7.98)$$
a phase flip of the encoded qubit in the F basis. If there is one bit flip and one phase flip (either on the same qubit or on different qubits), then recovery will be successful.
{ Figure {
If we append $|00\rangle$ to each of those two sub-blocks, then the original block has spawned two offspring, each with two located errors. If we were able to correct the two located errors in each of the offspring, we would obtain two identical copies of the parent block; we would have cloned an unknown quantum state, which is impossible. Therefore, no [[4, 1, 3]] quantum code can exist. We conclude that n = 5 is the minimal block size of a quantum code that corrects one error, whether the code is degenerate or not.
The same reasoning shows that an $[[n, k\ge 1, d]]$ code can exist only for
$$n > 2(d - 1). \qquad (7.102)$$
$$X_1X_2X_3X_4X_5X_6,\qquad X_4X_5X_6X_7X_8X_9. \qquad (7.136)$$
In the notation of eq. (7.134) these become
$$\tilde H = \left(\begin{array}{c|c}
\begin{matrix}110\,000\,000\\ 011\,000\,000\\ 000\,110\,000\\ 000\,011\,000\\ 000\,000\,110\\ 000\,000\,011\end{matrix} & 0\\[1ex]
0 & \begin{matrix}111\,111\,000\\ 000\,111\,111\end{matrix}
\end{array}\right).$$
(b) The seven-qubit code. This [[7, 1, 3]] code has six stabilizer generators, which can be expressed as
$$\tilde H = \begin{pmatrix} H_{\rm Ham} & 0 \\ 0 & H_{\rm Ham} \end{pmatrix}, \qquad (7.137)$$
where $H_{\rm Ham}$ is the $3\times 7$ parity-check matrix of the classical [7,4,3] Hamming code. The three check operators
$$M_1 = Z_1Z_3Z_5Z_7,\quad M_2 = Z_2Z_3Z_6Z_7,\quad M_3 = Z_4Z_5Z_6Z_7 \qquad (7.138)$$
detect the bit flips, and the three check operators
$$M_4 = X_1X_3X_5X_7,\quad M_5 = X_2X_3X_6X_7,\quad M_6 = X_4X_5X_6X_7 \qquad (7.139)$$
detect the phase errors. The space with $M_1 = M_2 = M_3 = 1$ is spanned by the codewords that satisfy the Hamming parity check. Recalling that a Hadamard change of basis interchanges Z and X, we see that the space with $M_4 = M_5 = M_6 = 1$ is spanned by codewords that satisfy the Hamming parity check in the Hadamard-rotated basis. Indeed, we constructed the seven-qubit code by demanding that the Hamming parity check be satisfied in both bases. The generators commute because the Hamming code contains its dual code; i.e., each row of $H_{\rm Ham}$ satisfies the Hamming parity check.
(c) CSS codes. Recall that whenever an [n, k, d] classical code C contains its dual code $C^\perp$, we can perform the CSS construction to obtain an $[[n, 2k - n, d]]$ quantum code. The stabilizer of this code can be written as
$$\tilde H = \begin{pmatrix} H & 0 \\ 0 & H \end{pmatrix}, \qquad (7.140)$$
where H is the $(n-k)\times n$ parity check matrix of C. As for the seven-qubit code, the stabilizer generators commute because C contains $C^\perp$, and the code subspace is spanned by states that satisfy the H parity check in both the F-basis and the P-basis. Equivalently, codewords obey the H parity check and are invariant under
$$|v\rangle \to |v + w\rangle, \qquad (7.141)$$
where $w \in C^\perp$.
(d) More general CSS codes. Consider, more generally, a stabilizer whose generators can each be chosen to be either a product of Z's ($\alpha|0$) or a product of X's ($0|\beta$). Then the generators have the form
$$\tilde H = \begin{pmatrix} H_Z & 0 \\ 0 & H_X \end{pmatrix}. \qquad (7.142)$$
Now, what condition must $H_X$ and $H_Z$ satisfy if the Z-generators and X-generators are to commute? Since the Z's must collide with the X's an even number of times, we have
$$H_XH_Z^T = H_ZH_X^T = 0. \qquad (7.143)$$
But this is just the requirement that the dual $C_X^\perp$ of the code whose parity check is $H_X$ be contained in the code $C_Z$ whose parity check is $H_Z$. In other words, this QECC fits into the CSS framework, with
$$C_2 = C_X^\perp \subseteq C_1 = C_Z. \qquad (7.144)$$
So we may characterize CSS codes as those and only those for which the stabilizer has generators of the form eq. (7.142).
However, there is a caveat. The code defined by eq. (7.142) will be nondegenerate if errors are restricted to weight less than $d = \min(d_Z, d_X)$ (where $d_Z$ is the distance of $C_Z$, and $d_X$ the distance of $C_X$). But the true distance of the QECC could exceed d. For example, the 9-qubit code is in this generalized sense a CSS code. But in that case the classical code $C_X$ has distance 1, reflecting that, e.g., $Z_1Z_2$ is contained in the stabilizer. Nevertheless, the distance of the CSS code is d = 3, since no weight-2 Pauli operator lies in $S^\perp\setminus S$.
(a) The 9-qubit code. As we have discussed previously, the logical operators can be chosen to be
$$\bar Z = X_1X_2X_3,\qquad \bar X = Z_1Z_4Z_7. \qquad (7.146)$$
These anticommute with one another (an X and a Z collide at position 1), commute with the stabilizer generators, and are independent of the generators (no element of the stabilizer contains three X's or three Z's).
{ Figure {
The Hadamard rotations on the first and fourth qubits rotate $M_1$ to the tensor product of Z's, ZZZZI, and the CNOTs then imprint the value of this operator on the ancilla. The final Hadamard rotations return the encoded block to the standard code subspace. Circuits for measuring $M_{2,3,4}$ are obtained from the above by cyclically permuting the five qubits in the code block.
What about encoding? We want to construct a unitary transformation
$$U_{\rm encode}: |0000\rangle\otimes(a|0\rangle + b|1\rangle) \to a|\bar 0\rangle + b|\bar 1\rangle. \qquad (7.158)$$
We have already seen that $|00000\rangle$ is a $\bar Z = 1$ eigenstate, and that $|00001\rangle$ is a $\bar Z = -1$ eigenstate. Therefore (up to normalization)
$$a|\bar 0\rangle + b|\bar 1\rangle = \left(\sum_{M\in S}M\right)|0000\rangle\otimes(a|0\rangle + b|1\rangle). \qquad (7.159)$$
So we need to figure out how to construct a circuit that applies $\sum_M M$ to an initial state.
Since the generators are independent, each element of the stabilizer can be expressed as a product of generators in a unique way, and we may therefore rewrite the sum as
$$\sum_{M\in S}M = (\mathbf{1} + M_4)(\mathbf{1} + M_3)(\mathbf{1} + M_2)(\mathbf{1} + M_1). \qquad (7.160)$$
Now to proceed further it is convenient to express the stabilizer in an alternative form. Note that we have the freedom to replace the generator $M_i$ by $M_iM_j$ without changing the stabilizer. This replacement is equivalent to adding the jth row to the ith row in the matrix $\tilde H$. With such row operations, we can perform a Gaussian elimination on the $4\times 5$ matrix $H_X$, and so obtain the new presentation for the stabilizer
$$\tilde H' = \left(\begin{array}{c|c}
11011 & 10001\\
00110 & 01001\\
11000 & 00101\\
10111 & 00011
\end{array}\right), \qquad (7.161)$$
or
$$M_1 = Y\,Z\,I\,Z\,Y,\quad M_2 = I\,X\,Z\,Z\,X,\quad M_3 = Z\,Z\,X\,I\,X,\quad M_4 = Z\,I\,Z\,Y\,Y. \qquad (7.162)$$
In this form $M_i$ applies an X (flip) only to qubits i and 5 in the block. Adopting this form for the stabilizer, we can apply $\frac{1}{\sqrt 2}(\mathbf{1} + M_1)$ to a state $|0, z_2, z_3, z_4, z_5\rangle$ by executing the circuit
{ Figure {
The Hadamard prepares $\frac{1}{\sqrt 2}(|0\rangle + |1\rangle)$. If the first qubit is $|0\rangle$, the other operations don't do anything, so $\mathbf{1}$ is applied. But if the first qubit is $|1\rangle$, then X has been applied to this qubit, and the other gates in the circuit apply
ZZIZY, conditioned on the first qubit being $|1\rangle$. Hence, $YZIZY = M_1$ has been applied. Similar circuits can be constructed that apply $\frac{1}{\sqrt 2}(\mathbf{1} + M_2)$ to $|z_1, 0, z_3, z_4, z_5\rangle$, and so forth. Apart from the Hadamard gates, each of these circuits applies only Z's and conditional Z's to qubits 1 through 4; these qubits never flip. (It was to ensure this that we performed the Gaussian elimination on $H_X$.) Therefore, we can construct our encoding circuit as
{ Figure {
{ Figure {
There are $2^{2^m}$ such functions, forming what we may regard as a binary vector space of dimension $2^m$. It will be useful to have a basis for this space. Recall (§6.1) that any Boolean function has a disjunctive normal form. Since the NOT of a bit x is $1 - x$, and the OR of two bits x and y can be expressed as
$$x \vee y \equiv x + y - xy, \qquad (7.216)$$
$$\$^{(n)} = \$\otimes\cdots\otimes\$, \qquad (7.243)$$
and yet can still be decoded with high fidelity.
The rate of a code is defined as
$$R = \frac{\log(\dim\mathcal{H}_{\rm code})}{\log(\dim\mathcal{H}^{(n)})}; \qquad (7.244)$$
this is the number of encoded qubits carried per qubit of the block. The quantum channel capacity $Q(\$)$ of the superoperator \$ is the
maximum asymptotic rate at which quantum information can be sent over the channel with arbitrarily good fidelity. That is, $Q(\$)$ is the largest number such that for any $R < Q(\$)$ and any $\varepsilon > 0$, there is a code $\mathcal{H}_{\rm code}^{(n)}$ with rate at least R, such that for any $|\psi\rangle\in\mathcal{H}_{\rm code}^{(n)}$, the state $\rho$ recovered after $|\psi\rangle$ passes through $\$^{(n)}$ has fidelity
$$F = \langle\psi|\rho|\psi\rangle > 1 - \varepsilon. \qquad (7.245)$$
Thus, $Q(\$)$ is a quantum version of the capacity defined by Shannon for a classical noisy channel. As we have already seen in Chapter 5, this $Q(\$)$ is not the only sort of capacity that can be associated with a quantum channel. It is also of considerable interest to ask about $C(\$)$, the maximum rate at which classical information can be transmitted through a quantum channel with arbitrarily small probability of error. A formal answer to this question was formulated in §5.4, but only for a restricted class of possible encoding schemes; the general answer is still unknown. The quantum channel capacity $Q(\$)$ is even less well understood than the classical capacity $C(\$)$ of a quantum channel. Note that $Q(\$)$ is not the same thing as the maximum asymptotic rate k/n that can be achieved by "good" [[n, k, d]] QECC's with positive d/n. In the case of the quantum channel capacity we need not insist that the code correct any possible distribution of pn errors, as long as the errors that cannot be corrected become highly atypical for n large.
Here we will mostly limit the discussion to two interesting examples of
quantum channels acting on a single qubit | the quantum erasure channel
(for which Q is exactly known), and the depolarizing channel (for which Q
is still unknown, but useful upper and lower bounds can be derived).
What are these channels? In the case of the quantum erasure chan-
nel, a qubit transmitted through the channel either arrives intact, or (with
probability p) becomes lost and is never received. We can find a unitary rep-
resentation of this channel by embedding the qubit in the three-dimensional
Hilbert space of a qutrit, with orthonormal basis {|0⟩, |1⟩, |2⟩}. The channel
acts according to
|0⟩ ⊗ |0⟩_E → √(1−p) |0⟩ ⊗ |0⟩_E + √p |2⟩ ⊗ |1⟩_E ,
|1⟩ ⊗ |0⟩_E → √(1−p) |1⟩ ⊗ |0⟩_E + √p |2⟩ ⊗ |2⟩_E ,   (7.246)
where {|0⟩_E, |1⟩_E, |2⟩_E} are mutually orthogonal states of the environment.
The receiver can measure the observable |2⟩⟨2| to determine whether the
qubit is undamaged or has been "erased."
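Numerically, the map (7.246) is easy to verify to be an isometry. The following sketch (my own illustration; the array layout and variable names are not from the notes) builds the corresponding 9 × 2 isometry, checks that it preserves inner products, and traces out the environment to recover the erasure behavior:

import numpy as np

p = 0.3                      # erasure probability (arbitrary choice)
dS, dE = 3, 3                # system qutrit, three environment states

# isometry V: C^2 -> C^3 (x) C^3, columns are the images of |0> and |1>
V = np.zeros((dS * dE, 2))
def idx(s, e):               # index of |s> (x) |e>_E
    return s * dE + e

V[idx(0, 0), 0] = np.sqrt(1 - p)   # |0>|0>_E -> sqrt(1-p)|0>|0>_E + sqrt(p)|2>|1>_E
V[idx(2, 1), 0] = np.sqrt(p)
V[idx(1, 0), 1] = np.sqrt(1 - p)   # |1>|0>_E -> sqrt(1-p)|1>|0>_E + sqrt(p)|2>|2>_E
V[idx(2, 2), 1] = np.sqrt(p)

assert np.allclose(V.T @ V, np.eye(2))   # V^dagger V = 1: the channel is trace preserving

# tracing out the environment gives the Kraus operators of the erasure channel
kraus = [V.reshape(dS, dE, 2)[:, e, :] for e in range(dE)]
rho = np.array([[0.5, 0.5], [0.5, 0.5]])          # an arbitrary qubit state
rho_out = sum(K @ rho @ K.conj().T for K in kraus)
print(np.round(rho_out, 3))   # weight p has moved onto the "erased" state |2>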
The depolarizing channel (with error probability p) was discussed at
length in §3.4.1. We see that, for p ≤ 3/4, we may describe the fate of
a qubit transmitted through the channel this way: with probability 1 − q
(where q = 4p/3), the qubit arrives undamaged, and with probability q it is
destroyed, in which case it is described by the random density matrix ½ 1.
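As a reminder of where this rewriting comes from (a one-line check of mine, assuming the convention of §3.4.1 for the depolarizing channel):

ρ → (1 − p) ρ + (p/3)(XρX + YρY + ZρZ)
  = (1 − 4p/3) ρ + (2p/3) 1        [using XρX + YρY + ZρZ = 2 tr(ρ) 1 − ρ and tr ρ = 1]
  = (1 − q) ρ + q · ½ 1 ,   with q = 4p/3 .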
Both the erasure channel and the depolarizing channel destroy a qubit
with a specified probability. The crucial difference between the two channels
is that in the case of the erasure channel, the receiver knows which qubits
have been destroyed; in the case of the depolarizing channel, the damaged
qubits carry no identifying marks, which makes recovery more challenging.
Of course, for both channels, the sender has no way to know ahead of time
which qubits will be obliterated.
{ Figure {
We observe that the erasure channel can be realized if Alice sends a qubit
to Bob, and a third party Charlie decides at random to either steal the
qubit (with probability p) or allow the qubit to pass unscathed to Bob (with
probability 1 − p).
If Alice sends a large number n of qubits, then about (1 − p)n reach Bob,
and pn are intercepted by Charlie. Hence for p > 1/2, Charlie winds up in
possession of more qubits than Bob, and if Bob can recover the quantum
information encoded by Alice, then certainly Charlie can as well. Therefore,
if Q(p) > 0 for p > 1/2, Bob and Charlie can clone the unknown encoded
quantum states sent by Alice, which is impossible. (Strictly speaking, they
can clone with fidelity F = 1 − ε, for any ε > 0.) We conclude that Q(p) = 0
for p > 1/2.
To obtain a bound on Q(p) in the case p < 1/2, we will appeal to the
following lemma. Suppose that Alice and Bob are connected by both a
perfect noiseless channel and a noisy channel with capacity Q > 0. And
suppose that Alice sends m qubits over the perfect channel and n qubits
over the noisy channel. Then the number r of encoded qubits that Bob may
recover with arbitrarily high fidelity must satisfy
r ≤ m + Qn .   (7.247)
We derive this inequality by noting that Alice and Bob can simulate the m
qubits sent over the perfect channel by sending m/Q qubits over the noisy channel,
and so achieve a rate
R = r/(m/Q + n) = [r/(m + Qn)] · Q   (7.248)
over the noisy channel. Were r to exceed m + Qn, this rate R would exceed
the capacity, a contradiction. Therefore eq. (7.247) is satisfied.
Now consider the erasure channel with error probability p1, and suppose
Q(p1) > 0. Then we can bound Q(p2) for p2 ≤ p1 by
Q(p2) ≤ 1 − p2/p1 + (p2/p1) Q(p1) .   (7.249)
(In other words, if we plot Q(p) in the (p, Q) plane, and we draw a straight line
segment from any point (p1, Q(p1)) on the plot to the point (p = 0, Q = 1), then
the curve Q(p) must lie on or below the segment in the interval 0 ≤ p ≤ p1; if
Q(p) is twice differentiable, then its second derivative cannot be positive.) To
obtain this bound, imagine that Alice sends n qubits to Bob, knowing ahead
of time that n(1 − p2/p1) specified qubits will arrive safely. The remaining
n(p2/p1) qubits are erased with probability p1. Therefore, Alice and Bob are
using both a perfect channel and an erasure channel with erasure probability
p1; eq. (7.247) holds, and the rate R they can attain is bounded by
R ≤ 1 − p2/p1 + (p2/p1) Q(p1) .   (7.250)
On the other hand, for n large, altogether about np2 qubits are erased, and
(1 − p2)n arrive safely. Thus Alice and Bob have an erasure channel with
erasure probability p2 , except that they have the additional advantage of
knowing ahead of time that some of the qubits that Alice sends are invul-
nerable to erasure. With this information, they can be no worse off than
without it; eq. (7.249) then follows. The same bound applies to the depolar-
izing channel as well.
Now, the result Q(p) = 0 for p > 1/2 can be combined with eq. (7.249).
We conclude that the curve Q(p) must lie on or below the straight line
connecting the points (p = 0, Q = 1) and (p = 1/2, Q = 0), or
Q(p) ≤ 1 − 2p ,   0 ≤ p ≤ 1/2 .   (7.251)
In fact, there are stabilizer codes that actually attain the rate 1 − 2p for
0 ≤ p ≤ 1/2. We can see this by borrowing an idea from Claude Shannon,
and averaging over random stabilizer codes. Imagine choosing, in succession,
altogether n − k stabilizer generators. Each is selected from among the
4^n Pauli operators, where all have equal a priori probability, except that
each generator is required to commute with all generators chosen in previous
rounds.
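A minimal sketch of this random construction (my own illustration, using the symplectic representation of Pauli operators and rejection sampling; the names are hypothetical):

import random

def random_commuting_generators(n, k, seed=1):
    # Pick n-k random n-qubit Pauli operators (as X/Z bit vectors) that mutually commute.
    rng = random.Random(seed)
    gens = []
    while len(gens) < n - k:
        x = [rng.randint(0, 1) for _ in range(n)]
        z = [rng.randint(0, 1) for _ in range(n)]
        # keep the candidate only if its symplectic product with every earlier choice vanishes
        ok = all(
            (sum(a * b for a, b in zip(x, z2)) + sum(a * b for a, b in zip(z, x2))) % 2 == 0
            for x2, z2 in gens)
        if ok:
            gens.append((x, z))
    return gens

gens = random_commuting_generators(n=10, k=2)
print(len(gens), "mutually commuting generators chosen")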
Now Alice uses this stabilizer code to encode an arbitrary quantum state
in the 2^k-dimensional code subspace, and sends the n qubits to Bob over an
erasure channel with erasure probability p. Will Bob be able to recover the
state sent by Alice?
Bob replaces each erased qubit by a qubit in the state |0⟩, and then
proceeds to measure all n − k stabilizer generators. From this syndrome
measurement, he hopes to infer the Pauli operator E acting on the replaced
qubits. Once E is known, he can apply E† to recover a perfect duplicate
of the state sent by Alice. For n large, the number of qubits that Bob must
replace is about pn, and he will recover successfully if there is a unique Pauli
operator E that can produce the syndrome that he finds. If more than one
Pauli operator acting on the replaced qubits has this same syndrome, then
recovery may fail.
How likely is failure? Since there are about pn replaced qubits, there are
about 4^{pn} Pauli operators with support on these qubits. Furthermore, for any
particular Pauli operator E, a random stabilizer code generates a random
syndrome — each stabilizer generator has probability 1/2 of commuting with
E, and probability 1/2 of anti-commuting with E. Therefore, the probability
that two Pauli operators have the same syndrome is (1/2)^{n−k}.
There is at least one particular Pauli operator acting on the replaced
qubits that has the syndrome found by Bob. But the probability that an-
other Pauli operator has this same syndrome (and hence the probability of
a recovery failure) is no worse than
P_fail ≤ 4^{pn} (1/2)^{n−k} = 2^{−n(1−2p−R)} ,   (7.252)
where R = k/n is the rate. Eq. (7.252) bounds the failure probability if
we average over all stabilizer codes with rate R; it follows that at least one
particular stabilizer code must exist whose failure probability also satis es
the bound.
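To get a feel for the numbers in eq. (7.252), here is a small evaluation of the bound (my own aside; the parameters are chosen arbitrarily):

import math

def pfail_bound(n, p, R):
    # upper bound 2^{-n(1-2p-R)} on the average failure probability, eq. (7.252)
    return 2.0 ** (-n * (1 - 2 * p - R))

p, R = 0.1, 0.5            # erasure probability and rate, with R < 1 - 2p = 0.8
for n in (50, 100, 500, 1000):
    print(n, pfail_bound(n, p, R))
# the bound decays exponentially in n whenever R < 1 - 2p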
For that particular code, P_fail gets arbitrarily small as n → ∞, for any rate
R strictly less than 1 − 2p. Therefore R = 1 − 2p is asymptotically
attainable; combining this result with the inequality eq. (7.251) we obtain
the capacity of the quantum erasure channel:
Q(p) = 1 − 2p ,   0 ≤ p ≤ 1/2 .   (7.253)
If we wanted assurance that a distinct syndrome could be assigned to
all ways of damaging pn erased qubits, then we would require an [[n,k,d]]
quantum code with distance d > pn. Our Gilbert–Varshamov bound of §7.14
guarantees the existence of such a code for
R < 1 − H2(p) − p log2 3 .   (7.254)
This rate can be achieved by a code that recovers from any of the possible
ways of erasing up to pn qubits. It lies strictly below the capacity for p > 0,
because to achieve high average fidelity, it suffices to be able to correct the
typical erasures, rather than all possible erasures.
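A short numerical comparison of the Gilbert–Varshamov rate (7.254) with the erasure capacity (7.253) makes the gap explicit (my own illustration; the function names are not from the notes):

import math

def H2(x):
    # binary Shannon entropy
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def gv_rate(p):
    # rate guaranteed by the quantum Gilbert-Varshamov bound, eq. (7.254)
    return 1 - H2(p) - p * math.log2(3)

for p in (0.05, 0.10, 0.15, 0.18):
    print(f"p={p:.2f}  GV rate={gv_rate(p):.3f}  erasure capacity={1 - 2*p:.3f}")
# for each p > 0 sampled, the GV rate falls strictly below the capacity 1 - 2p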
{ Figure {
This modified channel consists (as shown) of: first the inner encoder, then
propagation through the original noisy channel, and finally inner decoding
and inner recovery. The rate that can be attained through the original chan-
nel, via concatenated coding, is the same as the rate that can be attained
through the modi ed channel, via random coding.
Specifically, suppose that the inner code is an m-qubit repetition code,
with stabilizer
Z_1 Z_2 , Z_1 Z_3 , Z_1 Z_4 , . . . , Z_1 Z_m .   (7.266)
This is not much of a quantum code; it has distance 1, since it is insensi-
tive to phase errors — each Z_j commutes with the stabilizer. But in the
present context its important feature is its high degeneracy: all Z_i errors are
equivalent.
The encoding (and decoding) circuit for the repetition code consists of
just m − 1 CNOT's, so our composite channel looks like (in the case m = 3)
{ Figure {
where $ denotes the original noisy channel. (We have also suppressed the
final recovery step of the decoding; e.g., if the measured qubits both read
1, we should flip the data qubit. In fact, to simplify the analysis of the
composite channel, we will dispense with this step.)
Since we recall that a CNOT propagates bit flips forward (from control
to target) and phase flips backward (from target to control), we see that for
each possible measurement outcome of the auxiliary qubits, the composite
channel is a Pauli channel. If we imagine that this measurement of the m − 1
inner block qubits is performed for each of the n qubits of the outer block,
then Pauli channels act independently on each of the n qubits, but the chan-
nels acting on different qubits have different parameters (error probabilities
p_I^{(i)}, p_X^{(i)}, p_Y^{(i)}, p_Z^{(i)} for the ith qubit). Now the number of typical error operators
acting on the n qubits is
2^{Σ_{i=1}^{n} H_i} ,   (7.267)
where
H_i = H(p_I^{(i)}, p_X^{(i)}, p_Y^{(i)}, p_Z^{(i)})   (7.268)
is the Shannon entropy of the Pauli channel acting on the ith qubit. By the
law of large numbers, we will have
Σ_{i=1}^{n} H_i = n⟨H⟩ ,   (7.269)
for large n, where ⟨H⟩ is the Shannon entropy, averaged over the 2^{m−1} pos-
sible classical outcomes of the measurement of the extra qubits of the inner
code. Therefore, the rate that can be attained by the random outer code is
R = (1 − ⟨H⟩)/m ,   (7.270)
(we divide by m, because the concatenated code has a length m times longer
than the random code).
Shor and Smolin discovered that there are repetition codes (values of m)
for which, in a suitable range of p, 1 − ⟨H⟩ is positive while 1 − H2(p) − p log2 3
is negative. In this range, then, the capacity Q(p) is nonzero, showing that
the lower bound eq. (7.262) is not tight.
A nonvanishing asymptotic rate is attainable through random coding for
1 − H2(p) − p log2 3 > 0, or p < p_max ≃ .18929. If a random outer code is
concatenated with a 5-qubit inner repetition code (m = 5 turns out to be the
optimal choice), then 1 − ⟨H⟩ > 0 for p < p′_max ≃ .19036; the maximum error
probability for which a nonzero rate is attainable increases by about 0.6%.
It is not obvious that the concatenated code should outperform the random
code in this range of error probability, though as we have indicated, it might
have been expected because of the (phase) degeneracy of the repetition code.
Nor is it obvious that m = 5 should be the best choice, but this can be
verified by an explicit calculation of ⟨H⟩.7
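The threshold p_max quoted above is just the root of 1 − H2(p) − p log2 3, the random-coding lower bound of eq. (7.262); a few lines of bisection (my own check, not part of the notes) reproduce the value:

import math

def hashing_rate(p):
    # random-coding lower bound 1 - H2(p) - p*log2(3) on Q(p) for the depolarizing channel
    H2 = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return 1 - H2 - p * math.log2(3)

lo, hi = 0.1, 0.25           # the rate is positive at p = 0.1 and negative at p = 0.25
for _ in range(60):          # bisect down to machine precision
    mid = 0.5 * (lo + hi)
    if hashing_rate(mid) > 0:
        lo = mid
    else:
        hi = mid
print(round(lo, 5))          # ~0.18929, as quoted in the text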
The depolarizing channel is one of the very simplest of quantum chan-
nels. Yet even for this case, the problem of characterizing and calculating
the capacity is largely unsolved. This example illustrates that, due to the
possibility of degenerate coding, the capacity problem is considerably more
subtle for quantum channels than for classical channels.
We have seen that (if the errors are well described by the depolarizing
channel), quantum information can be recovered from a quantum memory
with arbitrarily high fidelity, as long as the probability of error per qubit is
less than 19%. This is an improvement relative to the 10% error rate that
we found could be handled by concatenation of the [[5,1,3]] code. In fact
[[n,k,d]] codes that can recover from any distribution of up to pn errors do
not exist for p > 1/6, according to the Rains bound. Nonzero capacity is
possible for error rates between 16.7% and 19% because it is sufficient for the
QECC to be able to correct the typical errors rather than all possible errors.
However, the claim that recovery is possible even if 19% of the qubits
sustain damage is highly misleading in an important respect. This result
applies if encoding, decoding, and recovery can be executed flawlessly. But
these operations are actually very intricate quantum computations that in
practice will certainly be susceptible to error. We will not fully understand
how well coding can protect quantum information from harm until we have
learned to design an error recovery protocol that is robust even if the execu-
tion of the protocol is flawed. Such fault-tolerant protocols will be developed
in Chapter 8.
7 In fact, a very slight further improvement can be achieved by concatenating a random
code with the 25-qubit generalized Shor code described in the exercises; then a nonzero
rate is attainable for p < p′′_max ≃ .19056 (another 0.1% better than the maximum tolerable
error probability).
7.18 Exercises
7.1 Phase error-correcting code
a) Construct stabilizer generators for an n = 3, k = 1 code that can
correct a single bit flip; that is, ensure that recovery is possible for
any of the errors in the set E = {III, XII, IXI, IIX}. Find
an orthonormal basis for the two-dimensional code subspace.
b) Construct stabilizer generators for an n = 3, k = 1 code that can
correct a single phase error; that is, ensure that recovery is possible
for any of the errors in the set E = {III, ZII, IZI, IIZ}. Find
an orthonormal basis for the two-dimensional code subspace.
7.2 Error-detecting codes
a) Construct stabilizer generators for an [[n,k,d]] = [[3,0,2]] quantum
code. With this code, we can detect any single-qubit error. Find
the encoded state. (Does it look familiar?)
b) Two QECC's C1 and C2 (with the same length n) are equivalent
if a permutation of qubits, combined with single-qubit unitary
transformations, transforms the code subspace of C1 to that of
C2. Are all [[3,0,2]] stabilizer codes equivalent?
c) Does a [[3,1,2]] stabilizer code exist?
7.3 Maximal entanglement
Consider the [[5,1,3]] quantum code, whose stabilizer generators are
M_1 = XZZXI, and M_{2,3,4} obtained by cyclic permutations of M_1,
and choose the encoded operation Z̄ to be Z̄ = ZZZZZ. From the
encoded states |0̄⟩ with Z̄|0̄⟩ = |0̄⟩ and |1̄⟩ with Z̄|1̄⟩ = −|1̄⟩, construct
the n = 6, k = 0 code whose encoded state is
(1/√2) (|0̄⟩ ⊗ |0̄⟩ + |1̄⟩ ⊗ |1̄⟩) .   (7.271)
a) Construct a set of stabilizer generators for this n = 6, k = 0 code.
b) Find the distance of this code. (Recall that for a k = 0 code, the
distance is defined as the minimum weight of any element of the
stabilizer.)
c) Find ρ^{(3)}, the density matrix that is obtained if three qubits are
selected and the remaining three are traced out.
7.4 Codewords and nonlocality
For the [[5,1,3]] code with stabilizer generators and logical operators as
in the preceding problem,
a) Express Z̄ as a weight-3 Pauli operator, a tensor product of I's,
X's, and Z's (no Y's). Note that because the code is cyclic,
all cyclic permutations of your expression are equivalent ways to
represent Z̄.
b) Use the Einstein locality assumption (local hidden variables) to pre-
dict a relation between the five (cyclically related) observables
found in (a) and the observable ZZZZZ. Is this relation among
observables satisfied for the state |0̄⟩?
c) What would Einstein say?
7.5 Generalized Shor code
For integer m ≥ 2, consider the n = m², k = 1 generalization of Shor's
nine-qubit code, with code subspace spanned by the two states:
|0̄⟩ = (|000 . . . 0⟩ + |111 . . . 1⟩)^{⊗m} ,
|1̄⟩ = (|000 . . . 0⟩ − |111 . . . 1⟩)^{⊗m} .   (7.272)
a) Construct stabilizer generators for this code, and construct the log-
ical operations Z̄ and X̄ such that
Z̄|0̄⟩ = |0̄⟩ ,  X̄|0̄⟩ = |1̄⟩ ,
Z̄|1̄⟩ = −|1̄⟩ ,  X̄|1̄⟩ = |0̄⟩ .   (7.273)
b) What is the distance of this code?
c) Suppose that m is odd, and suppose that each of the n = m² qubits
is subjected to the depolarizing channel with error probability p.
How well does this code protect the encoded qubit? Specifically,
(i) estimate the probability, to leading nontrivial order in p, of a
logical bit-flip error |0̄⟩ ↔ |1̄⟩, and (ii) estimate the probability,
to leading nontrivial order in p, of a logical phase error |0̄⟩ → |0̄⟩,
|1̄⟩ → −|1̄⟩.
d) Consider the asymptotic behavior of your answer to (c) for m large.
What condition on p should be satisfied for the code to provide
good protection against (i) bit flips and (ii) phase errors, in the
n → ∞ limit?
7.6 Encoding circuits
For an [[n,k,d]] quantum code, an encoding transformation is a unitary
U that acts as
U : |ψ⟩ ⊗ |0⟩^{⊗(n−k)} → |ψ̄⟩ ,   (7.274)
where |ψ⟩ is an arbitrary k-qubit state, and |ψ̄⟩ is the corresponding
encoded state. Design a quantum circuit that implements the encoding
transformation for
a) Shor's [[9,1,3]] code.
b) Steane's [[7,1,3]] code.
7.7 Shortening a quantum code
a) Consider a binary [[n,k,d]] stabilizer code. Show that it is possible
to choose the n − k stabilizer generators so that at most two act
nontrivially on the last qubit. (That is, the remaining n − k − 2
generators apply I to the last qubit.)
b) These n − k − 2 stabilizer generators that apply I to the last qubit will
still commute and are still independent if we drop the last qubit.
Hence they are the generators for a code with length n − 1 and k + 1
encoded qubits. Show that if the original code is nondegenerate,
then the distance of the shortened code is at least d − 1. (Hint:
First show that if there is a weight-t element of the (n ; 1)-qubit
Pauli group that commutes with the stabilizer of the shortened
code, then there is an element of the n-qubit Pauli group of weight
at most t + 1 that commutes with the stabilizer of the original
code.)
c) Apply the code-shortening procedure of (a) and (b) to the [[5,1,3]]
QECC. Do you recognize the code that results? (Hint: It may
be helpful to exploit the freedom to perform a change of basis on
some of the qubits.)
7.8 Codes for qudits
A qudit is a d-dimensional quantum system. The Pauli operators
I, X, Y, Z acting on qubits can be generalized to qudits as follows.
Let {|0⟩, |1⟩, . . . , |d − 1⟩} denote an orthonormal basis for the Hilbert
space of a single qudit. Define the operators:
X : |j⟩ → |j + 1 (mod d)⟩ ,
Z : |j⟩ → ω^j |j⟩ ,   (7.275)
where ω = exp(2πi/d). Then the d × d Pauli operators E_{r,s} are
E_{r,s} ≡ X^r Z^s ,   r, s = 0, 1, . . . , d − 1 .   (7.276)
a) Are the E_{r,s}'s a basis for the space of operators acting on a qudit?
Are they unitary? Evaluate tr(E†_{r,s} E_{t,u}).
b) The Pauli operators obey
E_{r,s} E_{t,u} = η(r, s; t, u) E_{t,u} E_{r,s} ,   (7.277)
where η(r, s; t, u) is a phase. Evaluate this phase.
The n-fold tensor products of these qudit Pauli operators form a group
G_n^{(d)} of order d^{2n+1} (and if we mod out its d-element center, we obtain
the group Ḡ_n^{(d)} of order d^{2n}). To construct a stabilizer code for qudits,
we choose an abelian subgroup of G_n^{(d)} with n − k generators; the code
subspace is the simultaneous eigenspace with eigenvalue one of these generators.
If d is prime, then the code subspace has dimension d^k: k logical qudits
are encoded in a block of n qudits.
c) Explain how the dimension might be different if d is not prime.
(Hint: Consider the case d = 4 and n = 1.)
7.9 Syndrome measurement for qudits
Errors on qudits are diagnosed by measuring the stabilizer generators.
For this purpose, we may invoke the two-qudit gate SUM (which gen-
eralizes the controlled-NOT), acting as
SUM : |j⟩ ⊗ |k⟩ → |j⟩ ⊗ |k + j (mod d)⟩ .   (7.278)
a) Describe a quantum circuit containing SUM gates that can be exe-
cuted to measure an n-qudit observable of the form
⊗_a Z_a^{s_a} .   (7.279)
If d is prime, then for each r, s = 0, 1, 2, . . . , d − 1, there is a single-qudit
unitary operator U_{r,s} such that
U_{r,s} E_{r,s} U†_{r,s} = Z .   (7.280)
b) Describe a quantum circuit containing SUM gates and U_{r,s} gates
that can be executed to measure an arbitrary element of G_n^{(d)} of
the form
⊗_a E_{r_a, s_a} .   (7.281)
7.10 Error-detecting codes for qudits
A qudit with d = 3 is called a qutrit. Consider a qutrit stabilizer
code with length n = 3 and k = 1 encoded qutrit defined by the two
stabilizer generators
ZZZ ,  XXX .   (7.282)
a) Do the generators commute?
b) Find the distance of this code.
c) In terms of the orthonormal basis {|0⟩, |1⟩, |2⟩} for the qutrit, write
out explicitly an orthonormal basis for the three-dimensional code
subspace.
d) Construct the stabilizer generators for an n = 3m qutrit code (where
m is any positive integer), with k = n − 2, that can detect one
error.
e) Construct the stabilizer generators for a qudit code that detects one
error, with parameters n = d, k = d − 2.
7.11 Error-correcting code for qudits
Consider an n = 5, k = 1 qudit stabilizer code with stabilizer generators
X Z Z^{-1} X^{-1} I
I X Z Z^{-1} X^{-1}
X^{-1} I X Z Z^{-1}
Z^{-1} X^{-1} I X Z   (7.283)
(the second, third, and fourth generators are obtained from the first by
a cyclic permutation of the qudits).
a) Find the order of each generator. Are the generators really in-
dependent? Do they commute? Is the fifth cyclic permutation
Z Z^{-1} X^{-1} I X independent of the rest?
b) Find the distance of this code. Is the code nondegenerate?
c) Construct the encoded operations X̄ and Z̄, each expressed as an
operator of weight 3. (Be sure to check that these operators obey
the right commutation relations for any value of d.)
Lecture Notes for Physics 219:
Quantum Computation
John Preskill
California Institute of Technology
14 June 2004
9 Topological quantum computation
9.1 Anyons, anyone?
∗ Two interesting approaches to realizing nonabelian anyons — using superconduct-
ing junction arrays and using cold atoms trapped in optical lattices — have been
discussed in the recent literature.
The relative sign in the superposition flips, but this has no detectable
physical effects, since all observables are block diagonal in the (−1)^F
basis.
Similarly, in two dimensions, the shift in the angular momentum spec-
trum e^{−2πiJ} = e^{iθ} has no unacceptable physical consequences if there is
a phase generated when one of the two objects is rotated by 2π. Thus the
connection between spin and statistics continues to hold, in a form that
is a natural generalization of the connection that applies to bosons and
fermions.
The origin of this connection is fairly clear in our flux-charge composite
model, but in fact it holds much more generally. Why? Reading textbooks
on relativistic quantum field theory, one can easily get the impression that
the spin-statistics connection is founded on Lorentz invariance, and has
something to do with the properties of the complexified Lorentz group.
Actually, this impression is quite misleading. All that is essential for a
spin-statistics connection to hold is the existence of antiparticles. Special
relativity is not an essential ingredient.
Consider an anyon, characterized by the phase θ, and suppose that this
particle has a corresponding antiparticle. This means that the particle
and its antiparticle, when combined, have trivial quantum numbers (in
particular, zero angular momentum) and therefore that there are physical
processes in which particle-antiparticle pairs can be created and annihi-
lated. Draw a world line in spacetime that represents a process in which
two particle-antiparticle pairs are created (one pair on the left and the
other pair on the right), the particle from the pair on the right is ex-
changed in a counterclockwise sense with the particle from the pair on
the left, and then both pairs reannihilate. (The world line has an orien-
tation; if directed forward in time it represents a particle, and if directed
backward in time it represents an antiparticle.) Turning our diagram 90◦ ,
we obtain a depiction of a process in which a single particle-antiparticle
pair is created, the particle and antiparticle are exchanged in a clock-
wise sense, and then the pair reannihilates. Turning it 90◦ yet again, we
have a process in which two pairs are created and the antiparticle from
the pair on the right is exchanged, in a counterclockwise sense, with the
antiparticle from the pair on the left, before reannihilation.
R_{aa} = R_{aā}^{−1} = R_{āā} .   (9.6)
If a is an anyon with exchange phase eiθ , then its antiparticle ā also has
the same exchange phase. Furthermore, when a and ā are exchanged
counterclockwise, the phase acquired is e−iθ .
These conclusions are unsurprising when we interpret them from the
perspective of our flux-charge composite model of anyons. The antipar-
ticle of the object with flux Φ and charge q has flux −Φ and charge −q.
Hence, when we exchange two antiparticles, the minus signs cancel and
the effect is the same as though the particles were exchanged. But if we
exchange a particle and an antiparticle, then the relative sign of charge
and flux results in the exchange phase e−iqΦ = e−iθ .
But what is the connection between these observations about statistics
and the spin? Continuing to contemplate the same spacetime diagram, let
us consider its implications regarding the orientation of the particles. For
keeping track of the orientation, it is convenient to envision the particle
world line not as a thread but as a ribbon in spacetime. I claim that our
process can be smoothly deformed to one in which a particle-antiparticle
pair is created, the particle is rotated counterclockwise by 2π, and then
the pair reannihilates. A convenient way to verify this assertion is to take
off your belt (or borrow a friend’s). The buckle at one end specifies an
orientation; point your thumb toward the buckle, and following the right-
hand rule, twist the belt by 2π before rebuckling it. You should be able
to check that you can lay out the belt to match the spacetime diagram for
any of the exchange processes described earlier, and also for the process
in which the particle rotates by 2π.
Thus, in a topological sense, rotating a particle counterclockwise by 2π
is really the same thing as exchanging two particles in a counterclockwise
sense (or exchanging particle and antiparticle in a clockwise sense), which
provides a satisfying explanation for a general spin-statistics connection.†
I emphasize again that this argument invokes processes in which particle-
antiparticle pairs are created and annihilated, and therefore the existence
of antiparticles is an essential prerequisite for it to apply.
† Actually, this discussion has been oversimplified. Though it is adequate for abelian
anyons, we will see that it must be amended for nonabelian anyons, because Rab has
more than one eigenvalue in the nonabelian case. Similarly, the discussion in the next
section of “combining anyons” will need to be elaborated because, in the nonabelian
case, more than one kind of composite anyon can be obtained when two anyons are
fused together.
Suppose that a is an anyon with exchange phase eiθ , and that we build
a “molecule” from n of these a anyons. What phase is acquired under a
counterclockwise exchange of the two molecules?
The answer is clear in our flux-charge composite model. Each of the n
charges in one molecule acquires a phase e^{iθ/2} when transported half way
around each of the n fluxes in the other molecule. Altogether then, 2n²
factors of the phase e^{iθ/2} are generated, resulting in the total phase
e^{iθ_n} = e^{i n² θ} .   (9.7)
Said another way, the phase e^{iθ} occurs altogether n² times because in
effect n anyons in one molecule are being exchanged with n anyons in
the other molecule. Contrary to what we might have naively expected, if
we split a fermion (say) into two identical constituents, the constituents
have, not an exchange phase of √−1 = i, but rather (e^{iπ})^{1/4} = e^{iπ/4}.
This behavior is compatible with the spin-statistics connection: the
angular momentum J_n of the n-anyon molecule satisfies
e^{−2πiJ_n} = e^{−2πi n² J} = e^{i n² θ} .   (9.8)
e−2πiJn = e−2πin = ein . (9.8)
and this orbital angular momentum combines additively with the spin S
to produce the total angular momentum
−2πJ = −2πL−2πS = 2θ+2θ+ 2π(integer) = 4θ+ 2π(integer) . (9.12)
What if, on the other hand, we build a molecule āa from an anyon a
and its antiparticle ā? Then, as we’ve seen, the spin S has the same value
as for the aa molecule. But the exchange phase has the opposite value, so
that the noninteger part of the orbital angular momentum is −2πL = −2θ
instead of −2πL = 2θ, and the total angular momentum J = L + S is
an integer. This property is necessary, of course, if the āa pair is to be
able to annihilate without leaving behind an object that carries nontrivial
angular momentum.
which is sometimes called the Yang-Baxter relation. You can verify the
Yang-Baxter relation by drawing the two braids σ1 σ2 σ1 and σ2 σ1 σ2 on
a piece of paper, and observing that both describe a process in which
the particles initially in positions 1 and 3 are exchanged counterclockwise
about the particle labeled 2, which stays fixed — i.e., these are topologi-
cally equivalent braids.
V1 V2
V2 V1
V1 V2
length later on) that there is more to a model of anyons than a mere rep-
resentation of the braid group. In our flux tube model of abelian anyons,
we were able to describe not only the effects of an exchange of anyons, but
also the types of particles that can be obtained when two or more anyons
are combined together. Likewise, in a general anyon model, the anyons
are of various types, and the model incorporates “fusion rules” that spec-
ify what types can be obtained when two anyons of particular types are
combined. Nontrivial consistency conditions arise because fusion is asso-
ciate (fusing a with b and then fusing the result with c is equivalent to
fusing b with c and then fusing the result with a), and because the fusion
rules must be consistent with the braiding rules. Though these consis-
tency conditions are highly restrictive, many solutions exist, and hence
many different models of nonabelian anyons are realizable in principle.
θ = πp/q , (9.18)
where q and p (p < 2q) are positive integers with no common factor. Then
we conclude that T1 must have at least q distinct eigenvalues; T1 acting
on α generates an orbit with q distinct values:
α + (2πp/q) k (mod 2π) ,   k = 0, 1, 2, . . . , q − 1 .   (9.19)
Since T1 commutes with H, on the torus the ground state of our anyonic
system (indeed, any energy eigenstate) must have a degeneracy that is an
integer multiple of q. Indeed, generically (barring further symmetries or
accidental degeneracies), the degeneracy is expected to be exactly q.
For a two-dimensional surface with genus g (a sphere with g “handles”),
the degree of this topological degeneracy becomes q^g, because there are
operators analogous to T1 and T2 associated with each of the g handles,
and all of the T1 -like operators can be simultaneously diagonalized. Fur-
thermore, we can apply a similar argument to a finite planar medium if
single anyons can be created and destroyed at the edges of the system. For
example, consider an annulus in which anyons can appear or disappear
at the inner and outer edges. Then we could define the unitary opera-
tor T1 as describing a process in which an anyon winds counterclockwise
around the annulus, and a unitary operator T2 as describing a process in
which an anyon appears at the outer edge, propagates to the inner edge,
and disappears. These operators T1 and T2 have the same commutator
as the corresponding operators defined on the torus, and so we conclude
as before that the ground state on the annulus is q-fold degenerate for
θ = πp/q. For a disc with h holes, there is an operator analogous to
T1 that winds an anyon counterclockwise around each of the holes, and
an operator analogous to T2 that propagates an anyon from the outer
boundary of the disk to the edge of the hole; thus the degeneracy is q^h.
‡ If you are familiar with Euclidean path integral methods, you'll find it easy to verify
that in the leading semiclassical approximation the amplitude A for such a tunneling
process in which the anyon propagates a distance L has the form A = C e^{−L/L_0},
where C is a constant and L_0 = h̄ (2m∗∆)^{−1/2}; here h̄ is Planck's constant and m∗
is the effective mass of the anyon, defined so that the kinetic energy of an anyon
traveling at speed v is (1/2) m∗ v².
both arising from processes in which world lines of charges and fluxons link
once with one another. Thus T1,S and T2,S can be diagonalized simulta-
neously, and can be regarded as the encoded Pauli operators Z̄1 and Z̄2
acting on two protected qubits. The operator T2,P , which commutes with
Z̄1 and anticommutes with Z̄2 , can be regarded as the encoded X̄1 , and
similarly T1,P is the encoded X̄2 .
On the torus, the degeneracy of the four ground states is exact for
the ideal Hamiltonian we constructed (the particles have infinite effective
masses). Weak local perturbations will break the degeneracy, but only
by an amount that gets exponentially small as the linear size L of the
torus increases. To be concrete, suppose the perturbation is a uniform
“magnetic field” pointing in the ẑ direction, coupling to the magnetic
moments of the qubits:
H = −h Σ_i Z_i .   (9.22)
Because of the nonzero energy gap, for the purpose of computing in per-
turbation theory the leading contribution to the splitting of the degen-
eracy, it suffices to consider the effect of the perturbation in the four-
dimensional subspace spanned by the ground states of the unperturbed
system. In the toric code, the operators with nontrivial matrix elements
in this subspace are those such that Z ’s act on links that form a closed
loop that wraps around the torus (or X ’s act on links whose dual links
form a closed loop that wraps around the torus). For an L × L lattice on
the torus, the minimal length of such a closed loop is L; therefore nonva-
nishing matrix elements do not arise in perturbation theory until the Lth
order, and are suppressed by hL . Thus, for small h and large L, memory
errors due to quantum fluctuations occur only with exponentially small
amplitude.
The matrix elements D^R_ij(a) are measurable in principle, for example by
conducting interference experiments in which a beam of calibrated charges
can pass on either side of the flux. (The phase of the complex number
D^R_ij(a) determines the magnitude of the shift of the interference fringes,
and the modulus of D^R_ij(a) determines the visibility of the fringes.) Thus
once we have chosen a standard basis for the charges, we can use the
charges to attach labels (elements of G) to all fluxes. The flux labels
are unambiguous as long as the representation R is faithful, and barring
any group automorphisms (which create ambiguities that we are free to
resolve however we please).
However, the group elements that we attach to the fluxes depend on our
conventions. Suppose I am presented with k fluxons (particles that carry
flux), and that I use my standard charges to measure the flux of each
particle. I assign group elements a1 , a2 , . . . , ak ∈ G to the k fluxons. You
are then asked to measure the flux, to verify my assignments. But your
standard charges differ from mine, because they have been surreptitiously
transported around another flux (one that I would label with g ∈ G).
Therefore you will assign the group elements ga1 g −1 , ga2g −1 , . . ., gak g −1
to the k fluxons; our assignments differ by an overall conjugation by g.
The moral of this story is that the assignment of group elements to
fluxons is inherently ambiguous and has no invariant meaning. But be-
cause the valid assignments of group elements to fluxons differ only by
conjugation by some element g ∈ G, the conjugacy class of the flux in
G does have an invariant meaning on which all observers will agree. In-
deed, even if we fix our conventions at the charge bureau of standards, the
group element that we assign to a particular fluxon may change if that
fluxon takes part in a physical process in which it braids with other flux-
ons. For that reason, the fluxons belonging to the same conjugacy class
should all be regarded as indistinguishable particles, even though they
come in many varieties (one for each representative of the class) that can
be distinguished when we make measurements at a particular time and
place: The fluxons are nonabelian anyons.
{ Figure: paths based at the point x0 {
It follows that the effect of transporting a charge around the path α, after
the exchange, is equivalent to the effect of transport around the path
αβα−1 , before the exchange; similarly, the effect of transport around β,
after the exchange, is the same as the effect of transport around α before.
We conclude that the braid operator R representing a counterclockwise
Thus, if the two fluxons are exchanged three times, they swap positions
(the number of exchanges is odd), yet the labeling of the state is unmod-
ified. This observation means that there can be quantum interference
between the “direct” and “exchange” scattering of two fluxons that carry
distinct labels in the same conjugacy class, reinforcing the notion that
fluxes carrying conjugate labels ought to be regarded as indistinguishable
particles.
Since the braid operator acting on pairs of two-cycle fluxes satisfies
R³ = I, its eigenvalues are third roots of unity. For example, by taking
linear combinations of the three states with total flux (123), we obtain
the R eigenstates
where ω = e^{2πi/3}.
Although a pair of fluxes |a, a−1 with trivial total flux has trivial braid-
ing properties, it is interesting for another reason — it carries charge. The
way to detect the charge of an object is to carry a flux b around the ob-
ject (counterclockwise); this modifies the object by the action of DR (b) for
some representation R of G. If the charge is zero then the representation
is trivial — D(b) = I for all b ∈ G. But if we carry flux b counterclockwise
around the state |a, a−1 , the state transforms as
where |α| denotes the order of α. A pair of fluxons in the class α that can
be created in a local process must not carry any conserved charges and
therefore must be in the state |0; α⟩. Other linear combinations orthogonal
to |0; α⟩ carry nonzero charge. This charge carried by a pair of fluxons can
be detected by other fluxons, yet oddly the charge cannot be localized on
the core of either particle in the pair. Rather it is a collective property of
the pair. If two fluxons with a nonzero total charge are brought together,
complete annihilation of the pair will be forbidden by charge conservation,
even though the total flux is zero.
where
χ_R(a) = Σ_i D^R_ii(a) = tr D^R(a)   (9.41)
is the character of the representation R, evaluated at a. In fact, the
character (a trace) is unchanged by conjugation — it takes the same value
for all a ∈ α. Therefore, eq. (9.40) is also the probability that the pair of
chargeons has zero total charge when one chargeon (initially a member
of a pair in the state |0; R⟩) winds around one fluxon (initially a member
of a pair in the state |0; α⟩). Of course, since the total charge of all four
particles is zero and charge is conserved, after the winding the two pairs
have opposite charges — if the pair of chargeons has total charge R , then
the pair of fluxons must have total charge R̄ , combined with R to give
trivial total charge. A pair of particles with zero total charge and flux can
annihilate, leaving no stable particle behind, while a pair with nonzero
charge will be unable to annihilate completely. We conclude, then, that
if the world lines of a fluxon pair and a chargeon pair link once, the
probability that both pairs will be able to annihilate is given by eq. (9.40).
This probability is less than one, provided that the representation R
is not one dimensional and the class α is not represented trivially. Thus
the linking of the world lines induces an exchange of charge between the
two pairs.
For example, in the case where α is the two-cycle class of G = S3 and
R = [2] (the two-dimensional irreducible representation of S3 ), we see
from eq. (9.37) that χ[2](α) = 0. Therefore, charge is transferred with
certainty; after the winding, both the fluxon pair and the chargeon pair
transform as R = [2].
Since the sum over the dimension squared for all irreducible representa-
tions of a finite group is the order of the group, and the order of the
normalizer N (α) is |G|/|α|, we obtain
D² = Σ_α |α| · |G| = |G|² ;   (9.44)
We have already noted that the fusion of two two-cycle fluxes can yield
either a trivial total flux or a three-cycle flux, and that the charge of the
composite with trivial total flux can be either [+] or [2]. If the total flux
is a three-cycle, then the charge eigenstates are just the braid operator
eigenstates that we constructed in eq. (9.33).
For a system of two anyons, why should the eigenstates of the total
charge also be eigenstates of the braid operator? We can understand this
connection more generally by thinking about the angular momentum of
the two-anyon composite object. The monodromy operator R² captures
the effect of winding one particle counterclockwise around another. This
winding is almost the same thing as rotating the composite system coun-
terclockwise by 2π, except that the rotation of the composite system also
rotates both of the constituents. We can compensate for the rotation of
the constituents by following the counterclockwise rotation of the compos-
ite by a clockwise rotation of the constituents. Therefore, the monodromy
operator can be expressed as
which is less than one if the flux ab−1 is not the identity (assuming that the
representation R is not one-dimensional and represents ab−1 nontrivially).
Thus, if annihilation of the chargeon pair does not occur, we know for sure
that a and b are distinct fluxes, and each time annihilation does occur,
it becomes increasingly likely that a and b are equal. By repeating this
procedure a modest number of times, we can draw a conclusion about
whether a and b are the same, with high statistical confidence.
This procedure allows us to sort the fluxon pairs into bins, where each
pair in a bin has the same flux. If a bin contains n pairs, its state is, in
general, a mixture of states of the form
Σ_{a∈G} ψ_a (|a, a−1⟩)^{⊗n} .   (9.50)
By discarding just one pair in the bin, each such state becomes a mixture
Σ_{a∈G} ρ_a (|a, a−1⟩⟨a, a−1|)^{⊗(n−1)} ;   (9.51)
we may regard each bin as containing (n − 1) pairs, all with the same
definite flux, but where that flux is as yet unknown.
Which bin is which? We want to label the bins with elements of G. To
arrive at a consistent labeling, we withdraw fluxon pairs from three dif-
ferent bins. Suppose the three pairs are |a, a−1⟩, |b, b−1⟩, and |c, c−1⟩, and
that we want to check whether c = ab. We create a chargeon-antichargeon
pair, carry the chargeon around a closed path that encloses the first mem-
ber of the first fluxon pair, the first member of the second fluxon pair,
and second member of the third fluxon pair, and observe whether the
reunited chargeon pair annihilates or not. Since the total flux enclosed
by the chargeon’s path is abc−1 , by repeating this procedure we can de-
termine with high statistical confidence whether ab and c are the same.
Such observations allow us to label the bins in some manner that is consis-
tent with the group composition rule. This labeling is unique apart from
group automorphisms (and ambiguities arising from any automorphisms
may be resolved arbitrarily).
Once the flux bureau of standards is established, we can use it to mea-
sure the unknown flux of an unlabeled pair. If the state of the pair to
be measured is |d, d−1⟩, we can withdraw the labeled pair |a, a−1⟩ from
a bin, and use chargeon pairs to measure the flux ad−1 . By repeating
this procedure with other labeled fluxes, we can eventually determine the
value of the flux d, realizing a projective measurement of the flux.
For a simulation of a quantum circuit using fluxons, we will need to
perform logic gates that act upon the value of the flux. The basic gate we
will use is realized by winding counterclockwise a fluxon pair with state
|a, a−1⟩ around the first member of another fluxon pair with state |b, b−1⟩.
Since the |a, a−1⟩ pair has trivial total flux, the |b, b−1⟩ pair is unaffected
by this procedure. But since in effect the flux b travels counterclockwise
about both members of the pair whose initial state was |a, a−1⟩, this pair
is transformed as
|a, a−1⟩ → |bab−1, ba−1b−1⟩ .   (9.52)
We will refer to this operation as the conjugation gate acting on the fluxon
pair.
To summarize what has been said so far, our primitive and derived
capabilities allow us to: (1) Perform a projective flux measurement, (2)
perform a destructive measurement that determines whether or not the
flux and charge of a pair is trivial, and (3) execute a conjugation gate.
Now we must discuss how to simulate a quantum circuit using these ca-
pabilities.
The next step is to decide how to encode qubits using fluxons. Ap-
propriate encodings can be chosen in many ways; we will stick to one
particular choice that illustrates the key ideas — namely we will encode a
qubit by using a pair of fluxons, where the total flux of the pair is trivial.
We select two noncommuting elements a, b ∈ G, where b² = e, and choose
a computational basis for the qubit
The crucial point is that a single isolated fluxon with flux a looks iden-
tical to a fluxon with the conjugate flux bab−1 . Therefore, if the two
fluxons in a pair are kept far apart from one another, local interactions
with the environment will not cause a superposition of the states |0̄⟩ and
|1̄⟩ to decohere. The quantum information is protected from damage be-
cause it is stored nonlocally, by exploiting a topological degeneracy of the
states where the fluxon and antifluxon are pinned to fixed and distantly
separated positions.
However, in contrast with the topological degeneracy that arises in
systems with abelian anyons, this protected qubit can be measured rela-
tively easily, without resorting to delicate interferometric procedures that
extract Aharonov-Bohm phases. We have already described how to mea-
sure flux using previously calibrated fluxons; therefore we can perform
a projective measurement of the encoded Pauli operator Z̄ (a projection
onto the basis {|0̄⟩, |1̄⟩}). We can also measure the complementary Pauli
operator X̄, albeit destructively and imperfectly. The X̄ eigenstates are
|±⟩ = (1/√2) (|0̄⟩ ± |1̄⟩) ≡ (1/√2) (|a, a−1⟩ ± |bab−1, ba−1b−1⟩) ;   (9.54)
where α is the conjugacy class that contains a. On the other hand, the
state |+⟩ has a nonzero overlap with |0; α⟩
Therefore, if the two members of the fluxon pair are brought together,
complete annihilation is impossible if the state of the pair is |−⟩, and
occurs with probability Prob(0) = 2/|α| if the state is |+⟩.
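(For instance, putting in a number not spelled out here: if α is the two-cycle class of S3, which contains |α| = 3 elements, a pair prepared in |+⟩ annihilates completely with probability 2/3, while a pair in |−⟩ never does.)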
Note that it is also possible to prepare a fluxon pair in the state |+⟩.
One way to do that is to create a pair in the state |0; α⟩. If α contains
only the two elements a and bab−1 we are done. Otherwise, we compare
the newly created pair with calibrated pairs in each of the states |c, c−1⟩,
where c ∈ α and c is distinct from both a and bab−1. If the pair fails to
match any of these |c, c−1⟩ pairs, its state must be |+⟩.
To go further, we need to characterize the computational power of the
conjugation gate. Let us use a more compact notation, in which the
state |x, x−1⟩ of a fluxon pair is simply denoted |x⟩, and consider the
transformations of the state |x, y, z⟩ that can be built from conjugation
gates. By winding the third pair through the first, either counterclockwise
or clockwise, we can execute the gates
and by winding the third pair through the second, either counterclockwise
or clockwise, we can execute
furthermore, by borrowing a pair with flux |c⟩ from the bureau of stan-
dards, we can execute
where the function f (x, y) can be expressed in product form — that is,
as a finite product of group elements, where the elements appearing in
the product may be the inputs x and y, their inverses x−1 and y −1 , or
constant elements of G, each of which may appear in the product any
number of times.
What are the functions f (x, y) that can be expressed in this form?
The answer depends on the structure of the group G, but the following
characterization will suffice for our purposes. Recall that a subgroup H
of a finite group G is normal if for any h ∈ H and any g ∈ G, ghg −1 ∈ H,
and recall that a finite group G is said to be simple if G has no normal
subgroups other than G itself and the trivial group {e}. It turns out that
if G is a simple nonabelian finite group, then any function f (x, y) can be
expressed in product form. In the computer science literature, a closely
related result is often called Barrington’s theorem.
In particular, then, if the group G is a nonabelian simple group, there
is a function f realizable in product form such that
f (a, a) = f (a, bab−1) = f (bab−1 , a) = e , f (bab−1 , bab−1) = b . (9.61)
Thus for x, y, z ∈ {a, bab−1}, the action eq. (9.60) causes the flux of the
third pair to “flip” if and only if x = y = bab−1 ; we have constructed
from our elementary operations a Toffoli gate in the computational ba-
sis. Therefore, conjugation gates suffice for universal reversible classical
computation acting on the standard basis states.
The nonabelian simple group of minimal order is A5 , the group of even
permutations of five objects, with |A5 | = 60. Therefore, one concrete
realization of universal classical computation using conjugation gates is
obtained by choosing a to be the three-cycle element a = (345) ∈ A5 , and
b to be the product of two-cycles b = (12)(34) ∈ A5 , so that bab−1 = (435).
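To make the conjugation-gate arithmetic concrete, here is a small sketch (mine, not the text's) that represents group elements as permutations of {1, . . . , 5} and checks the quoted A5 elements:

# a = (345), b = (12)(34), as in the text; permutations act on {1,...,5}
def perm_from_cycles(cycles, n=5):
    p = list(range(n + 1))            # p[i] = image of i; index 0 unused
    for cyc in cycles:
        for i, x in enumerate(cyc):
            p[x] = cyc[(i + 1) % len(cyc)]
    return tuple(p)

def compose(p, q):                     # (p o q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = list(range(len(p)))
    for i in range(1, len(p)):
        inv[p[i]] = i
    return tuple(inv)

def conjugation_gate(x, y):
    # eq. (9.52): the pair carrying flux x is conjugated by the flux y, x -> y x y^-1
    return compose(compose(y, x), inverse(y))

a = perm_from_cycles([(3, 4, 5)])
b = perm_from_cycles([(1, 2), (3, 4)])
assert conjugation_gate(a, b) == perm_from_cycles([(4, 3, 5)])   # bab^-1 = (435)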
With this judicious choice of the group G, we achieve a topological real-
ization of universal classical computation, but how can we go still further,
to realize universal quantum computation? We have the ability to prepare
computational basis states, to measure in the computational basis, and
to execute Toffoli gates, but these tools are entirely classical. The only
nonclassical tricks at our disposal are the ability to prepare X̄ = 1 eigen-
states, and the ability to perform an imperfect destructive measurement
of X̄. Fortunately, these additional capabilities are sufficient.
In our previous discussions of quantum fault tolerance, we have noted
that if we can do the classical gates Toffoli and CNOT, it suffices for
universal quantum computation to be able to apply each of the Pauli op-
erators X, Y , and Z, and to be able to perform projective measurements
of each of X, Y , and Z. We already know how to apply the classical
gate X and to measure Z (that is, project onto the computational basis).
Projective measurement of X and Y , and execution of Z, are still missing
from our repertoire. (Of course, if we can apply X and Z, we can also
apply their product ZX = iY .)
CNOT : XI → XX , (9.62)
where the first qubit is the control and the second qubit is the target of
the CNOT. Therefore, CNOT gates, together with the ability to prepare
X = 1 eigenstates and to perform destructive measurements of X, suffice
to realize projective measurements of X. We can prepare an ancilla qubit
in the X = 1 eigenstate, perform a CNOT with the ancilla as control
and the data to be measured as target, and then measure the ancilla
destructively. The measurement prepares the data in an eigenstate of X,
whose eigenvalue matches the outcome of the measurement of the ancilla.
In our case, the destructive measurement is not fully reliable, but we
can repeat the measurement multiple times. Each time we prepare and
measure a fresh ancilla, and after a few repetitions, we have acceptable
statistical confidence in the inferred outcome of the measurement.
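A minimal numerical sketch of this indirect X measurement (my own illustration; ideal gates and a perfect destructive measurement of the ancilla are assumed):

import numpy as np

X = np.array([[0., 1.], [1., 0.]])
plus  = np.array([1., 1.]) / np.sqrt(2)     # X = +1 ancilla state
minus = np.array([1., -1.]) / np.sqrt(2)    # X = -1

# CNOT with the ancilla (first factor) as control and the data (second) as target
CNOT = np.array([[1., 0., 0., 0.],
                 [0., 1., 0., 0.],
                 [0., 0., 0., 1.],
                 [0., 0., 1., 0.]])

data = np.array([0.8, 0.6])                  # an arbitrary data qubit
state = CNOT @ np.kron(plus, data)           # couple the X = +1 ancilla to the data

for outcome, anc in ((+1, plus), (-1, minus)):
    bra = np.kron(anc, np.eye(2))            # <anc| (x) I, shape (2, 4)
    post = bra @ state                       # unnormalized data state given this outcome
    prob = post @ post
    post = post / np.sqrt(prob)
    # the data qubit is left in the X eigenstate matching the ancilla outcome
    assert np.allclose(X @ post, outcome * post)
    print(f"outcome X = {outcome:+d} with probability {prob:.2f}")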
Now that we can measure X projectively, we can prepare X = −1
eigenstates as well as X = 1 eigenstates (for example, we follow a Z mea-
surement with an X measurement until we eventually obtain the outcome
X = −1). Then, by performing a CNOT gate whose target is an X = −1
eigenstate, we can realize the Pauli operator Z acting on the control qubit.
It only remains to show that a measurement of Y can be realized.
Measurement of Y seems problematic at first, since our physical capa-
bilities have not provided any means to distinguish between Y = 1 and
Y = −1 eigenstates (that is, between a state ψ and its complex conjugate
ψ ∗ ). However, this ambiguity actually poses no serious difficulty, because
it makes no difference how the ambiguity is resolved. Were we to replace
measurement of Y by measurement of −Y in our simulation of a unitary
transformation U , the effect would be that U ∗ is simulated instead; this
replacement would not alter the probability distributions of outcomes for
measurements in the standard computational basis.
To be explicit, we can formulate a protocol for measuring Y by noting
first that applying a Toffoli gate whose target qubit is an X = −1 eigen-
state realizes the controlled-phase gate Λ(Z) acting on the two control
qubits. By composing this gate with the CNOT gate Λ(X), we obtain
the gate Λ(iY ) acting as
where the first qubit is the control and the second is the target. Now
suppose that my trusted friend gives me just one qubit that he assures
me has been prepared in the state |Y = 1⟩. I know how to prepare
|X = 1⟩ states myself and I can execute Λ(iY ) gates; therefore since a
Λ(iY ) gate with |Y = 1⟩ as its target transforms |X = 1⟩ to |Y = 1⟩, I
can make many copies of the |Y = 1⟩ state I obtained from my friend.
When I wish to measure Y , I apply the inverse of Λ(iY ), whose target is
the qubit to be measured, and whose control is one of my Y = 1 states;
then I perform an X measurement of the ancilla to read out the result of
the Y measurement of the other qubit.
What if my friend lies to me, and gives me a copy of the state |Y = −1⟩
instead? Then I'll make many copies of the |Y = −1⟩ state, and I will
be measuring −Y when I think I am measuring Y . My simulation will
work just the same as before; I'll actually be simulating the complex
conjugate of the ideal circuit, but that won't change the final outcome of
the quantum computation. If my friend flipped a coin to decide whether
to give me the |Y = 1⟩ state or the |Y = −1⟩ state, this too would have no
effect on the fidelity of my simulation. Therefore, it turns out I don't
need my friend's help at all — instead of using the |Y = 1⟩ state I would
have received from him, I may use the random state ρ = I/2 (an equally
weighted mixture of |Y = 1⟩ and |Y = −1⟩, which I know how to prepare
myself).
This completes the demonstration that we can simulate a quantum cir-
cuit efficiently and fault tolerantly using the fluxons and chargeons of
a nonabelian superconductor, at least in the case where G is a simple
nonabelian finite group.§ Viewed as a whole, including all state prepara-
tion and calibration of fluxes, the simulation can be described this way:
Many pairs of anyons (fluxons and chargeons) are prepared, the anyon
world lines follow a particular braid, and pairs of anyons are fused to see
whether they will annihilate. The simulation is nondeterministic in the
sense that the actual braid executed by the anyons depends on the out-
comes of measurements performed (via fusion) during the course of the
simulation. It is robust if the temperature is low compared to the energy
gap, and if particles are kept sufficiently far apart from one another (ex-
cept when pairs are being created and fused), to suppress the exchange
of virtual anyons. Small deformations in the world lines of the particles
have no effect on the outcome of the computation, as long as the braiding
of the particles is in the correct topological class.
§ Mochon has shown that universal quantum computation is possible for a larger class
of groups.
1. A list of particle types. The types are labels that specify the possible
values of the conserved charge that a particle can carry.
2. Rules for fusing and splitting, which specify the possible values of the
charge that can be obtained when two particles of known charge
are combined together, and the possible ways in which the charge
carried by a single particle can be split into two parts.
3. Rules for braiding, which specify what happens when two particles are
exchanged (or when one particle is rotated by 2π).
9.12.1 Labels
I will use Latin letters {a, b, c, . . .} for the labels that distinguish different
types of particles. (For the case of the nonabelian superconductor, the
label was (α, R(α)), specifying a conjugacy class and an irreducible rep-
resentation of the normalizer of the class, but now our notation will be
more compact). We will assume that the set of possible labels is finite.
The symbol a represents the value of the conserved charge carried by the
particle. When particles with charges a and b are combined, the possible
charges of the composite are encoded in the fusion rules

a × b = Σ_c N_{ab}^c c ,

where each N_{ab}^c is a nonnegative integer and the sum is over the complete
set of labels. Note that a, b and c are labels, not vector spaces; the
product on the left-hand side is not a tensor product and the sum on
the right-hand side is not a direct sum. Rather, the fusion rules can be
regarded as an abstract relation on the label set that maps the ordered
triple (a, b; c) to N_{ab}^c. This relation is symmetric in a and b (a × b = b × a)
— the possible charges of the composite do not depend on whether a is on
the left or the right. Read backwards, the fusion rules specify the possible
ways for the charge c to split into two parts with charges a and b.
If N_{ab}^c = 0, then charge c cannot be obtained when we combine a and
b. If N_{ab}^c = 1, then c can be obtained — in a unique way. If N_{ab}^c > 1, then
c can be obtained in N_{ab}^c distinguishable ways. The notion that
fusing two charges can yield a third charge in more than one possible way
should be familiar from group representation theory. For example, the
rule governing the fusion of two octet representations of SU(3) is
8 × 8 = 1 + 8 + 8 + 10 + \overline{10} + 27 ,     (9.66)

so that N_{88}^8 = 2. We emphasize again, however, that while the fusion
rules for group representations can be interpreted as a decomposition of a
tensor product of vector spaces as a direct sum of vector spaces, in general
the fusion rules in an anyon model have no such interpretation.
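To make this concrete, here is a minimal sketch in Python using the two-label rule 1 × 1 = 0 + 1 (the Fibonacci rule that appears later in this chapter); it checks the symmetry a × b = b × a, and also the associativity of the total multiplicities, Σ_e N_{ab}^e N_{ec}^d = Σ_f N_{bc}^f N_{af}^d, a consistency property that any fusion rule must satisfy:

import itertools

labels = (0, 1)                                 # 0 is the trivial charge
N = {(a, b, c): 0 for a in labels for b in labels for c in labels}
for abc in [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1)]:
    N[abc] = 1                                  # the nonzero multiplicities for 1 x 1 = 0 + 1

# symmetry of fusion: a x b = b x a
assert all(N[a, b, c] == N[b, a, c] for (a, b, c) in N)

# associativity of the total fusion multiplicities
for a, b, c, d in itertools.product(labels, repeat=4):
    assert (sum(N[a, b, e] * N[e, c, d] for e in labels)
            == sum(N[b, c, f] * N[a, f, d] for f in labels))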
The N_{ab}^c distinguishable ways that c can arise by fusing a and b can
be regarded as the orthonormal basis elements of a vector space, the fusion
space V_{ab}^c ; we denote this basis

{ |ab; c, µ⟩ ,  µ = 1, 2, . . . , N_{ab}^c } .     (9.67)
It is quite convenient to introduce a graphical notation for the fusion basis
states:
[Diagrams: the fusion state |ab; c, µ⟩ is drawn as a trivalent vertex with incoming
lines labeled a and b and an outgoing line labeled c; the dual ⟨ab; c, µ| is the
reflected vertex. Further diagrams express the orthonormality and completeness of
the basis {|ab; c, µ⟩}.]
There are some natural isomorphisms among fusion spaces. First of all,
V_{ab}^c ≅ V_{ba}^c ; these vector spaces are associated with different labelings of
the two particles (if a ≠ b) and so should be regarded as distinct, but they
are isomorphic spaces because fusion is symmetric. We may also “raise
and lower indices” of a fusion space by replacing a label by its conjugate,
e.g.,

V_{ab}^c ≅ V_{ac̄}^{b̄} ≅ V_{abc̄}^{1} ≅ V_{a}^{b̄c} ≅ V_{c̄}^{āb̄} ≅ · · · ;     (9.70)
in the diagrammatic notation, we have the freedom to reverse the sense
of a line while conjugating the line’s label. The space V_{abc̄}^1 , represented
as a diagram with three incoming lines, is the space spanned by the dis-
tinguishable ways to obtain the trivial total charge 1 when fusing three
particles with labels a, b, c̄.
The charge 1 deserves its name because it fuses trivially with other
particles:
a × 1 = a .     (9.71)

Because of the isomorphism V_{a1}^a ≅ V_{aā}^1 , we conclude that ā is the unique
label that can fuse with a to yield 1, and that this fusion can occur in
only one way. Similarly, V_{a1}^a ≅ V_1^{aā} means that pairs of particles created
out of the vacuum have conjugate charges.
An anyon model is nonabelian if
dim V_{ab}^c = N_{ab}^c ≥ 2     (9.72)

for at least some pair of labels ab; otherwise the model is abelian. In an
abelian model, any two particles fuse in a unique way, but in a nonabelian
model, there are some pairs of particles that can fuse in more than one
way, and there is a Hilbert space of two or more dimensions spanned by
these distinguishable states. We will refer to this space as the “topological
Hilbert space.” When a pair of particles is exchanged counterclockwise, the
braiding operator R maps the fusion space V_{ba}^c to the fusion space V_{ab}^c .
If we choose canonical bases {|ba; c, µ⟩} and {|ab; c, µ⟩} for these two
spaces, R can be expressed as the unitary matrix
R : |ba; c, µ⟩ → Σ_{µ'} |ab; c, µ'⟩ (R_{ab}^c)_{µ'µ} ;     (9.74)

note that R may have a nontrivial action on the fusion states. When
we represent the action of R diagrammatically, it is convenient to fix the
positions of the labels a and b on the incoming lines, and twist the lines
counterclockwise as they move toward the fusion vertex — the graph
with twisted lines represents the state in V_{ab}^c obtained by applying R to
|ba; c, µ⟩, which can be expanded in terms of the canonical basis for V_{ab}^c :
[Diagram: the R-move. The vertex |ba; c, µ⟩ with its incoming lines twisted
counterclockwise equals Σ_{µ'} (R_{ab}^c)_{µ'µ} times the untwisted vertex |ab; c, µ'⟩.]
Correspondingly, there are two natural orthonormal bases for V_{abc}^d (the
fusion space of three particles a, b, c with total charge d), which we may
denote
|(ab)c → d; eµν⟩ ≡ |ab; e, µ⟩ ⊗ |ec; d, ν⟩ ,
|a(bc) → d; e'µ'ν'⟩ ≡ |ae'; d, ν'⟩ ⊗ |bc; e', µ'⟩ ,     (9.80)

and which are related by a unitary transformation F :

|(ab)c → d; eµν⟩ = Σ_{e'µ'ν'} |a(bc) → d; e'µ'ν'⟩ (F_{abc}^d)_{e'µ'ν', eµν} .     (9.81)
[Diagram: the F-move, relating the two fusion orderings of three particles a, b, c
with total charge d.]
The unitary matrices F_{abc}^d are sometimes called fusion matrices; however,
rather than risk causing confusion between F and the fusion rules N_{ab}^c ,
I will just call F the F-matrix.
Note that the space V_{a_1 a_2 ··· a_n}^c of n anyons with total charge c does not
have a natural decomposition as a tensor product of subsystems associated
with the localized particles; rather, we
have expressed it as a direct sum of many tensor products. For nonabelian
anyons, its dimension
dim V_{a_1 a_2 a_3 ··· a_n}^c ≡ N_{a_1 a_2 a_3 ··· a_n}^c = Σ_{b_1, b_2, b_3, ..., b_{n-2}} N_{a_1 a_2}^{b_1} N_{b_1 a_3}^{b_2} N_{b_2 a_4}^{b_3} · · · N_{b_{n-2} a_n}^{c}     (9.83)
A standard basis for this space is

{ |a_1 a_2; b_1, µ_1⟩ ⊗ |b_1 a_3; b_2, µ_2⟩ ⊗ · · · ⊗ |b_{n-3} a_{n-1}; b_{n-2}, µ_{n-2}⟩ ⊗ |b_{n-2} a_n; c, µ_{n-1}⟩ } ,     (9.84)
or in diagrammatic notation:
[Diagram: the standard fusion tree, with external lines a_1, a_2, ..., a_n, internal
lines b_1, b_2, ..., b_{n-2}, vertex labels µ_1, ..., µ_{n-1}, and total charge c.]
[Diagram: braiding two neighboring particles in the standard basis is accomplished
by an F-move, followed by an R-move, followed by the inverse F-move.]
To reduce the number of subscripts, we will call this space V_{acb}^d , which is
transformed by the exchange as

B : V_{acb}^d → V_{abc}^d .     (9.86)

Let us express the action of B in terms of the standard bases for the two
spaces V_{acb}^d and V_{abc}^d .
[Diagram: the B-move, expanding the braided fusion tree in V_{acb}^d in terms of the
standard basis of V_{abc}^d .]
To avoid cluttering the equations, I suppress the labels for the fusion
space basis elements (it is obvious where they should go). Hence we write
B|(ac)b → d; e⟩ = Σ_f B|a(cb) → d; f⟩ (F_{acb}^d)_{fe}
              = Σ_f |a(bc) → d; f⟩ R_f^{bc} (F_{acb}^d)_{fe}
              = Σ_{f,g} |(ab)c → d; g⟩ (F_{abc}^d)^{-1}_{gf} R_f^{bc} (F_{acb}^d)_{fe} ,     (9.87)
or

B : |(ac)b → d; e⟩ → Σ_g |(ab)c → d; g⟩ (B_{abc}^d)_{ge} ,     (9.88)

where

(B_{abc}^d)_{ge} = Σ_f (F_{abc}^d)^{-1}_{gf} R_f^{bc} (F_{acb}^d)_{fe} .     (9.89)
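Equation (9.89) is straightforward to mechanize. The sketch below (Python with NumPy; the function and its arguments are hypothetical placeholders, not notation from the text) assembles the braid matrix from a pair of F-matrices and the braiding eigenvalues:

import numpy as np

def b_matrix(F_abc, R_bc, F_acb):
    """Assemble (B_abc^d)_{ge} = sum_f (F_abc^d)^{-1}_{gf} R_f^{bc} (F_acb^d)_{fe}, as in eq. (9.89).

    F_abc, F_acb : unitary F-matrices, indexed by the intermediate charge,
    R_bc         : the braiding eigenvalues R_f^{bc}, one for each intermediate charge f.
    """
    return np.linalg.inv(F_abc) @ np.diag(R_bc) @ F_acb

Because the F-matrices are unitary and the R's are phases, the B-matrix obtained this way is automatically unitary.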
9.13 Simulating anyons with a quantum circuit

Each qudit used in the simulation carries the Hilbert space
H_d = ⊕_{a,b,c} V_{abc}^1 .     (9.92)
Here, a, b, c are summed over the complete label set of the model (which
we have assumed is finite), so that Hd contains all the possible fusion
states of three particles, and the dimension d of Hd is
d = Σ_{a,b,c} N_{abc}^1 .     (9.93)
[Diagrams: the action of the braid matrix B_{aeb}^d and of the F-matrix F_{abe}^d on
the fusion-tree basis of the simulated qudits.]
we have separated the sum over g into the component for which (be) fuses
to 1, plus the remainder. After the F -move (which is just a particular
two-qudit unitary gate), we can sample the probability that (be) fuses to
1 by performing a projective measurement of the second qudit in the basis
{|b, ḡ, e⟩}, and recording whether g = 1.
This completes our demonstration that a quantum circuit can simulate
efficiently a topological quantum computer.
In the Fibonacci model there are just two labels, 0 (the trivial charge) and 1,
and the only nontrivial fusion rule is 1 × 1 = 0 + 1: when two anyons are brought
together they either annihilate, or fuse to become a single anyon. The model is
nonabelian because two anyons can fuse in two distinguishable ways.
Consider the standard basis for the Hilbert space V_{11···1}^b of n anyons, where
each basis element describes a distinguishable way in which the n anyons
could fuse to give total charge b ∈ {0, 1}. If the two anyons furthest to
the left were fused first, the resulting charge could be 0 or 1; this charge
could then fuse with the third anyon, yielding a total charge of 0 or 1,
and so on. Finally, the last anyon fuses with the total charge of the first
n − 1 anyons to give the total charge b. Altogether n − 2 intermediate
charges b1 , b2, b3 , . . . bn−2 appear in this description of the fusion process;
thus the corresponding basis element can be designated with a binary
string of length n − 2. If the total charge is 0, the result of fusing the
first n − 1 anyons must be 1; counting the allowed strings, one finds that
the number N_n^0 of basis elements with total charge 0 obeys the Fibonacci
recursion

N_n^0 = N_{n-1}^0 + N_{n-2}^0 .     (9.97)
n     =  1  2  3  4  5  6  7   8   9  ...
N_n^0 =  0  1  1  2  3  5  8  13  21  ...     (9.98)
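The recursion and the table are easy to reproduce by propagating the intermediate charge from left to right (a short Python sketch):

counts = {0: 0, 1: 1}                        # a single anyon carries charge 1
table = [counts[0]]
for _ in range(8):                           # add one anyon at a time, up to n = 9
    counts = {0: counts[1],                  # total charge 0 requires the previous charge to be 1
              1: counts[0] + counts[1]}      # total charge 1 can follow either previous charge
    table.append(counts[0])
print(table)                                 # [0, 1, 1, 2, 3, 5, 8, 13, 21], as in (9.98)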
[Diagram: world lines of aā pairs that are created and then annihilated, either as
two separate closed loops or with partners swapped into a single loop; the single
loop carries a relative factor of 1/d_a.]
If two pairs are created and then each pair annihilates immediately, the
world lines of the pairs form two closed loops, and |R| counts the number
of distinct “colors” that propagate around each loop. But if the particle
from each pair annihilates the antiparticle from the other pair, there is
only one closed loop and therefore one sum over colors; if we normalize
the process on the left to unity, the amplitude for the process on the right
is suppressed by a factor of 1/|R|. To say the same thing in an equation,
the normalized state of an RR̄ pair is
|RR̄⟩ = (1/√|R|) Σ_i |i⟩ ⊗ |ī⟩ ,     (9.99)

where {|i⟩} denotes an orthonormal basis for R and {|ī⟩} is a basis for R̄.
Suppose that two pairs |RR̄⟩ and |R'R̄'⟩ are created; if the pairs are fused
after swapping partners, the amplitude for annihilation is

⟨RR̄, R'R̄' | RR̄', R'R̄⟩ = (1/|R|²) Σ_{i,i',j,j'} ⟨jj̄, j'j̄' | iī', i'ī⟩
                        = (1/|R|²) Σ_{i,i',j,j'} δ_{ji} δ_{ji'} δ_{j'i'} δ_{j'i}
                        = (1/|R|²) Σ_i δ_{ii} = 1/|R| .     (9.100)
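The same 1/|R| suppression can be checked numerically; in this Python/NumPy sketch |R| = 3, and for simplicity the conjugate basis {|ī⟩} is identified with {|i⟩}:

import numpy as np

d = 3                                            # |R|
pair = np.eye(d).reshape(d * d) / np.sqrt(d)     # (1/sqrt|R|) sum_i |i>|i>
psi = np.kron(pair, pair).reshape(d, d, d, d)    # particles 1,2,3,4 paired as (1,2) and (3,4)
swapped = np.einsum('il,kj->ijkl', np.eye(d), np.eye(d)) / d   # paired as (1,4) and (3,2)
print(np.einsum('ijkl,ijkl->', psi, swapped), 1 / d)           # both 1/3, as in eq. (9.100)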
fusing the particle from each pair with the antiparticle from the neighbor-
ing pair. However, each zigzag reduces the amplitude by another factor
of 1/d_a . We can compensate for these factors of 1/d_a if we weight each
pair creation or annihilation event by a factor of √d_a . With this new
convention, we can bend the world line of a particle forward or backward
in time without paying any penalty:
[Diagram: an a world line with a zigzag; with a factor of √d_a attached to each
creation and annihilation event, the zigzag equals the straight world line.]
[Diagram: a closed a loop beside a closed b loop, with value d_a d_b, expanded via
the completeness of the fusion basis as a sum over the channels c (and vertices µ)
in which a and b fuse and then split again; each closed c loop contributes d_c.]
Evaluating both sides of the diagrammatic identity gives

d_a d_b = Σ_c N_{ab}^c d_c ,

which says that the vector d⃗ of quantum dimensions is an eigenvector, with
eigenvalue d_a, of the matrix N_a whose matrix elements are (N_a)_b^c = N_{ab}^c .
Furthermore, since N_a has nonnegative entries and all components of d⃗ are
positive, d_a is the largest eigenvalue of N_a and is nondegenerate. (This
simple observation is sometimes called the Perron-Frobenius theorem.)
For n anyons, each with label a, the topological Hilbert space V_{aaa···a}^b for
the sector with total charge b has dimension

N_{aaa···a}^b = Σ_{b_1, b_2, ..., b_{n-2}} N_{aa}^{b_1} N_{ab_1}^{b_2} N_{ab_2}^{b_3} · · · N_{ab_{n-2}}^{b} = ⟨b| (N_a)^{n-1} |a⟩ .     (9.103)
Since d_a is the largest eigenvalue of N_a, for the purpose of estimating this
dimension at large n we may write

N_a = |v⟩ d_a ⟨v| + · · · ,     (9.104)

where

|v⟩ = (1/D) Σ_c d_c |c⟩ ,     D ≡ ( Σ_c d_c² )^{1/2} ;     (9.105)

hence

N_{aaa···a}^b = ⟨b| (N_a)^{n-1} |a⟩ ≈ d_a^n d_b / D² + · · · ,     (9.106)

where the ellipsis represents terms that are exponentially suppressed for
large n. We see that the quantum dimension d_a controls the rate of growth
of the n-particle Hilbert space for anyons of type a.
Because the label 0 with trivial charge fuses trivially, we have d0 = 1. In
the case of the Fibonacci model, it follows from the fusion rule 1×1 = 0+1
that d_1² = 1 + d_1 , which is solved by d_1 = φ as we found earlier; therefore
D² = d_0² + d_1² = 1 + φ² = 2 + φ. Our formula becomes

N_{111···1}^0 ≈ φ^n / (2 + φ) ,     (9.107)
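These statements are easy to verify numerically for the Fibonacci fusion rule (a Python/NumPy sketch):

import numpy as np

N1 = np.array([[0, 1],
               [1, 1]])                            # (N_1)_b^c = N_{1b}^c, labels ordered (0, 1)
phi = (1 + np.sqrt(5)) / 2
d1 = np.linalg.eigvalsh(N1).max()                  # the Perron-Frobenius eigenvalue
print(d1, phi)                                     # both 1.6180...

D2 = 1 + d1 ** 2                                   # D^2 = d_0^2 + d_1^2 = 2 + phi
n = 20
exact = np.linalg.matrix_power(N1, n - 1)[0, 1]    # <0| (N_1)^(n-1) |1> = N^0_{11...1}
print(exact, phi ** n / D2)                        # 4181 versus 4181.0000...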
[Diagram: the normalized amplitude for anyons a and b to fuse into the channel c,
with the loops weighted by the quantum dimensions.] Reading off the diagram, the
probability that a and b fuse to total charge c is

p(ab → c) = N_{ab}^c d_c / (d_a d_b) .

If anyons are produced in a random process with a stable probability distribution
{p_a} over the labels, that distribution must reproduce itself under pairwise
fusion, Σ_{a,b} p_a p_b p(ab → c) = p_c .
Using

Σ_a N_{ab}^c d_a = Σ_a N_{bc̄}^{ā} d_ā = d_b d_c̄ = d_b d_c ,     (9.111)

we can easily verify that this condition is satisfied by

p_a = d_a² / D² .     (9.112)
We conclude that if anyons are created in a random process, those carrying
labels with larger quantum dimension are more likely to be produced, in
keeping with the property that anyons with larger dimension have more
quantum states.
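For the Fibonacci model this conclusion is easy to check directly (a Python sketch, using the fusion probability p(ab → c) = N_{ab}^c d_c /(d_a d_b) and the stability condition quoted above, both of which are assumptions of this sketch):

phi = (1 + 5 ** 0.5) / 2
d = {0: 1.0, 1: phi}                                 # quantum dimensions
D2 = sum(x * x for x in d.values())                  # D^2 = 2 + phi
N = {(a, b, c): 0 for a in d for b in d for c in d}
for abc in [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1)]:
    N[abc] = 1

p = {a: d[a] ** 2 / D2 for a in d}                   # p_a = d_a^2 / D^2, eq. (9.112)
for c in d:
    reproduced = sum(p[a] * p[b] * N[a, b, c] * d[c] / (d[a] * d[b])
                     for a in d for b in d)
    assert abs(reproduced - p[c]) < 1e-12            # the distribution reproduces itself under fusion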
9.16 Pentagon and hexagon equations
[Pentagon diagram: five bases for the fusion of particles 1, 2, 3, 4 into total
charge 5, connected by two F-moves across the top of the pentagon and three
F-moves across the bottom.]
The basis shown furthest to the left in this pentagon diagram is the “left
standard basis” {|left; a, b⟩}, in which particles 1 and 2 are fused first,
the resulting charge a is fused with particle 3 to yield charge b, and then
finally b is fused with particle 4 to yield the total charge 5. The basis
shown furthest to the right is the “right standard basis” {|right; c, d⟩}, in
which the particles are fused from right to left instead of left to right.
Across the top of the pentagon, these two bases are related by two F -moves,
and we find

|left; a, b⟩ = Σ_{c,d} |right; c, d⟩ (F_{12c}^5)_{da} (F_{a34}^5)_{cb} .     (9.113)

Across the bottom of the pentagon, the bases are related by three F -moves,
and we find
|left; a, b⟩ = Σ_{c,d,e} |right; c, d⟩ (F_{234}^d)_{ce} (F_{1e4}^5)_{db} (F_{123}^b)_{ea} .     (9.114)
Equating our two expressions for |left; a, b⟩, we obtain the pentagon equation:

(F_{12c}^5)_{da} (F_{a34}^5)_{cb} = Σ_e (F_{234}^d)_{ce} (F_{1e4}^5)_{db} (F_{123}^b)_{ea} .     (9.115)
Another nontrivial consistency condition is found by considering the
various ways that three particles can fuse:
[Hexagon diagram: six bases for the fusion of particles 1, 2, 3 into total charge 4,
connected by the sequence of moves F, R, F across the top of the hexagon and
R, F, R across the bottom.]
The basis {|left; a⟩} furthest to the left in this hexagon diagram is obtained
if the particles are arranged in the order 123, and particles 1 and 2 are
fused first, while the basis {|right; c⟩} furthest to the right is obtained if
the particles are arranged in order 231, and particles 1 and 3 are fused
first. Across the top of the hexagon, the two bases are related by the
sequence of moves F RF :
|left; a⟩ = Σ_{b,c} |right; c⟩ (F_{231}^4)_{cb} R_4^{1b} (F_{123}^4)_{ba} .     (9.116)
Across the bottom of the hexagon, the bases are related by the sequence
of moves RFR, and we find

|left; a⟩ = Σ_c |right; c⟩ R_c^{13} (F_{213}^4)_{ca} R_a^{12} .     (9.117)
Equating our two expressions for |left; a⟩, we obtain the hexagon equation:

R_c^{13} (F_{213}^4)_{ca} R_a^{12} = Σ_b (F_{231}^4)_{cb} R_4^{1b} (F_{123}^4)_{ba} .     (9.118)
A beautiful theorem, which I will not prove here, says that there are
no further conditions that must be imposed to ensure the consistency of
braiding and fusing. That is, for any choice of an initial and final basis
for n anyons, all sequences of R-moves and F -moves that take the initial
basis to the final basis yield the same isomorphism, provided that the
pentagon equation and hexagon equation are satisfied. This theorem is
an instance of the MacLane coherence theorem, a fundamental result in
category theory. The pentagon and hexagon equations together are called
the Moore-Seiberg polynomial equations — their relevance to physics was
first appreciated in studies of (1+1)-dimensional conformal field theory
during the 1980’s.
A solution to the polynomial equations defines a viable anyon model.
Therefore, there is a systematic procedure for constructing anyon models:
τ² + τ = 1 .     (9.123)
The only other solution is the complex conjugate of this one; this second
solution really describes the same model, but with clockwise and coun-
terclockwise braiding interchanged. Therefore, an anyon model with the
Fibonacci fusion rule really does exist, and it is essentially unique.
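In a commonly used gauge (an assumption here, since the derivation is not reproduced above), the only nontrivial F-matrix of the Fibonacci model is

F = [[ τ, √τ ], [ √τ, −τ ]] ,    with τ² + τ = 1 ,

while every F-move involving the trivial label equals 1. The following Python sketch checks that this assignment satisfies the pentagon equation (9.115) for every admissible labeling:

import itertools
import math

labels = (0, 1)                                  # 0 = trivial charge, 1 = Fibonacci anyon

def N(a, b, c):
    """Fusion multiplicity N_ab^c for the rule 1 x 1 = 0 + 1."""
    return 1 if (a, b, c) in {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1)} else 0

tau = (math.sqrt(5.0) - 1.0) / 2.0               # the positive solution of tau^2 + tau = 1
F_nontrivial = [[tau, math.sqrt(tau)],
                [math.sqrt(tau), -tau]]          # (F^1_{111})_{ef}, with e, f in {0, 1}

def F(a, b, c, d, e, f):
    """F-symbol (F^d_{abc})_{ef}; zero unless both fusion trees are admissible."""
    if not (N(a, b, e) and N(e, c, d) and N(b, c, f) and N(a, f, d)):
        return 0.0
    if 0 in (a, b, c) or d == 0:
        return 1.0                               # F-moves involving the trivial charge are trivial
    return F_nontrivial[e][f]                    # the single nontrivial block, a = b = c = d = 1

# pentagon equation (9.115):
# (F^5_{12c})_{da} (F^5_{a34})_{cb} = sum_e (F^d_{234})_{ce} (F^5_{1e4})_{db} (F^b_{123})_{ea}
for p1, p2, p3, p4, p5, a, b, c, d in itertools.product(labels, repeat=9):
    lhs = F(p1, p2, c, p5, d, a) * F(a, p3, p4, p5, c, b)
    rhs = sum(F(p2, p3, p4, d, c, e) * F(p1, e, p4, p5, d, b) * F(p1, p2, p3, b, e, a)
              for e in labels)
    assert abs(lhs - rhs) < 1e-12
print("pentagon equation satisfied for the Fibonacci F-matrix")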
9.18 Epilogue
That is as far as I got in class. I will mention briefly here a few other
topics that I might have covered if I had not run out of time.
9.18.2 S-matrix
The modular S-matrix of an anyon model can be defined in terms of two
anyon world lines that form a Hopf link:
[Diagram: S_ab equals 1/D times the amplitude for two closed anyon world lines,
labeled a and b, that form a Hopf link.]
Here D is the total quantum dimension of the model, and we have used
the normalization where unlinked loops would have the value d_a d_b ; then
the matrix Sab is symmetric and unitary. In abelian anyon models, the
Hopf link arose in our discussion of topological degeneracy, where we
characterized how the vacuum state of an anyon model on the torus is
affected when an anyon is transported around one of the cycles of the
torus. The S-matrix has a similar interpretation in the nonabelian case.
By elementary reasoning, S can be related to the fusion rules:
(N_a)_b^c = Σ_d S_{bd} (S_{ad} / S_{1d}) (S^{-1})_{dc} ;     (9.133)
that is, the S-matrix simultaneously diagonalizes all the matrices {N_a}
(the Verlinde relation). Note that it follows from the definition that
S_{1a} = d_a /D.
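For the Fibonacci model the S-matrix takes the explicit form S = (1/D) [[1, φ], [φ, −1]] in a standard convention (this particular matrix is an assumption of the sketch, not something derived above), and the Verlinde relation can then be checked in a few lines of Python/NumPy:

import numpy as np

phi = (1 + np.sqrt(5)) / 2
D = np.sqrt(2 + phi)
S = np.array([[1, phi],
              [phi, -1]]) / D                        # Fibonacci S-matrix (standard convention)

assert np.allclose(S @ S.T, np.eye(2))               # symmetric and unitary (here real orthogonal)
assert np.allclose(S[0], np.array([1, phi]) / D)     # S_{1a} = d_a / D

N1 = np.array([[0, 1],
               [1, 1]])                              # fusion matrix of the nontrivial label
Lam = np.diag(S[1] / S[0])                           # diag(S_{ad}/S_{1d}) for a = 1
assert np.allclose(N1, S @ Lam @ np.linalg.inv(S))   # eq. (9.133), the Verlinde relation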
The chiral central charge c_− of the edge theory satisfies

e^{2πi c_− /8} = (1/D) Σ_a d_a² e^{2πiJ_a} ,

where the sum is over the complete label set of the anyon model, and
e^{2πiJ_a} = R_1^{aā} is the topological spin of the label a. This expression re-
lates the quantity c− , characteristic of the edge theory, to the quantum
dimensions and topological spins of the bulk theory, but determines c−
only modulo 8. Therefore, at least in principle, there can be multiple edge
theories corresponding to a single theory of anyons in the bulk.
9.19 Bibliographical notes

was discussed in [4, 5]. My discussion of the universal gate set is based
on [6], where more general models are also discussed. Other schemes,
which make more extensive use of electric charges and are universal
for smaller groups (like S3), are described in [7].
Diagrammatic methods, like those I used in the discussion of the quan-
tum dimension, are extensively applied to derive properties of anyons in
[8]. The role of the polynomial equations (pentagon and hexagon equa-
tions) in (1+1)-dimensional conformal field theory is discussed in [9].
Simulation of anyons using a quantum circuit is discussed in [10]. Simu-
lation of a universal quantum computer using the anyons of the SU(2)k=3
Chern-Simons theory is discussed in [11]. That the Yang-Lee model is
also universal was pointed out in [12].
I did not discuss physical implementations in my lectures, but I list a
few relevant references here anyway: Ideas about realizing abelian and
nonabelian anyons using superconducting Josephson-junction arrays are
discussed in [13]. A spin model with nearest-neighbor interactions that
has nonabelian anyons (though not ones that are computationally univer-
sal) is proposed and solved in [14], and a proposal for realizing this model
using cold atoms trapped in an optical lattice is described in [15]. Some
ideas about realizing the (computationally universal) SU(2)k=3 model in
a system of interacting electrons are discussed in [16].
Much of my understanding of the theory of computing with nonabelian
anyons was derived from many helpful discussions with Alexei Kitaev.
References