Qcnotes 1
miniaturization.
2. Making use of quantum effects allows one to speed up certain computations enormously
(sometimes exponentially), and even enables some things that are impossible for classical
computers. The main purpose of these lecture notes is to explain these advantages of quantum
computing (algorithms, crypto, etc.) in detail.
3. Finally, one might say that the main goal of theoretical computer science is to “study the
power and limitations of the strongest-possible computational devices that Nature allows
us.” Since our current understanding of Nature is quantum mechanical, theoretical computer
science should arguably be studying the power of quantum computers, not classical ones.
Before limiting ourselves to theory, let us say a few words about practice: to what extent will
quantum computers ever be built? At this point in time, it is just too early to tell. The first
small 2-qubit quantum computer was built in 1997 and in 2001 a 5-qubit quantum computer was
used to successfully factor the number 15 [240]. Since then, experimental progress on a number
of different technologies has been steady but slow. The most advanced implementations currently
use superconducting qubits and ion-trap qubits. The largest quantum computation done at the
time of writing is Google’s “quantum supremacy” experiment on 53 qubits [30], which performs
a complicated (but rather useless) sampling task that appears to be no longer simulatable in a
reasonable amount of time on even the largest existing classical supercomputer.
The practical problems facing physical realizations of quantum computers seem formidable. The
problems of noise and decoherence have to some extent been solved in theory by the discovery of
quantum error-correcting codes and fault-tolerant computing (see, e.g., Chapter 20 in these notes),
but these problems are by no means solved in practice. On the other hand, we should realize
that the field of physical realization of quantum computing is still in its infancy and that classical
computing had to face and solve many formidable technical problems as well—interestingly, often
these problems were even of the same nature as those now faced by quantum computing (e.g.,
noise-reduction and error-correction). Moreover, while the difficulties facing the implementation
of a full quantum computer may seem daunting, more limited applications involving quantum
communication have already been implemented with some success, for example teleportation (which
is the process of sending qubits using entanglement and classical communication), and versions of
BB84 quantum key distribution are nowadays even commercially available.
Even if the theory of quantum computing never materializes to a real large-scale physical com-
puter, quantum-mechanical computers are still an extremely interesting idea which will bear fruit
in other areas than practical fast computing. On the physics side, it may improve our understand-
ing of quantum mechanics. The emerging theories of entanglement and of Hamiltonian complexity
have already done this to some extent. On the computer science side, the theory of quantum
computation generalizes and enriches classical complexity theory and may help resolve some of its
problems (see Section 15.3 for an example).
1.2.1 Superposition
Consider some physical system that can be in N different, mutually exclusive classical states.
Because we will typically start counting from 0 in these notes, we call these states $|0\rangle, |1\rangle, \ldots, |N-1\rangle$.
Roughly, by a “classical” state we mean a state in which the system can be found if we observe it.
A pure quantum state (usually just called state) $|\phi\rangle$ is a superposition of classical states, written
$$|\phi\rangle = \alpha_0|0\rangle + \alpha_1|1\rangle + \cdots + \alpha_{N-1}|N-1\rangle.$$
Here $\alpha_i$ is a complex number that is called the amplitude of $|i\rangle$ in $|\phi\rangle$. Intuitively, a system in
quantum state $|\phi\rangle$ is “in all classical states at the same time,” each state having a certain amplitude.
It is in state $|0\rangle$ with amplitude $\alpha_0$, in state $|1\rangle$ with amplitude $\alpha_1$, and so on. Mathematically,
the states $|0\rangle, \ldots, |N-1\rangle$ form an orthonormal basis of an $N$-dimensional Hilbert space (i.e., an
$N$-dimensional vector space equipped with an inner product). A quantum state $|\phi\rangle$ is a vector in
this space, usually written as an $N$-dimensional column vector of its amplitudes:
$$|\phi\rangle = \begin{pmatrix} \alpha_0 \\ \vdots \\ \alpha_{N-1} \end{pmatrix}.$$
Such a vector is sometimes called a “ket.” Its conjugate transpose is the following row vector,
sometimes called a “bra”:
$$\langle\phi| = \left(\alpha_0^*, \ldots, \alpha_{N-1}^*\right).$$
The reason for this terminology (often called “Dirac notation” after Paul Dirac) is that an inner
product $\langle\phi|\psi\rangle$ between two states corresponds to the dot product between a bra and a ket vector
(“bracket”): $\langle\phi|\psi\rangle = \langle\phi| \cdot |\psi\rangle$.
We can combine different Hilbert spaces using tensor product: if $|0\rangle, \ldots, |N-1\rangle$ are an orthonormal
basis of space $H_A$ and $|0\rangle, \ldots, |M-1\rangle$ are an orthonormal basis of space $H_B$, then the
tensor product space $H = H_A \otimes H_B$ is an $NM$-dimensional space spanned by the set of states
$\{|i\rangle \otimes |j\rangle \mid i \in \{0, \ldots, N-1\}, j \in \{0, \ldots, M-1\}\}$. An arbitrary state in $H$ is of the form
$\sum_{i=0}^{N-1} \sum_{j=0}^{M-1} \alpha_{ij} |i\rangle \otimes |j\rangle$. Such a state is called bipartite. Similarly we can have tripartite states
that “live” in a Hilbert space that is the tensor product of three smaller Hilbert spaces, etc.
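As a small illustration of the bra-ket inner product and of the tensor product, here is a minimal Python sketch. The helper names `inner` and `tensor` and the example amplitudes are assumptions chosen for illustration; states are represented simply as lists of amplitudes.

```python
def inner(phi, psi):
    """Inner product <phi|psi>: conjugate the bra's amplitudes, then take the dot product."""
    return sum(complex(a).conjugate() * b for a, b in zip(phi, psi))

def tensor(phi_a, phi_b):
    """Amplitudes of |phi_a> (x) |phi_b>; the entry for |i>|j> sits at index i*M + j."""
    return [a * b for a in phi_a for b in phi_b]

# Hypothetical example states with N = 2 and M = 3.
phi_a = [0.6, 0.8j]           # 0.6|0> + 0.8i|1> in H_A
phi_b = [0.0, 1.0, 0.0]       # basis state |1> in H_B
joint = tensor(phi_a, phi_b)  # a product state in the 6-dimensional space H_A (x) H_B

print(len(joint))                # number of amplitudes, N*M
print(abs(inner(joint, joint)))  # squared norm of the joint state
```

Note that a product state like `joint` is only a special case: a general bipartite state is any norm-1 list of $NM$ amplitudes $\alpha_{ij}$.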
There are two things we can do with a quantum state: measure it or let it evolve unitarily
without measuring it. We will deal with measurement first.
1.2.2 Measurement
Measurement in the computational basis
Suppose we measure state $|\phi\rangle$. We cannot “see” a superposition itself, but only classical states.
Accordingly, if we measure state $|\phi\rangle$ we will see one and only one classical state $|j\rangle$. Which specific
$|j\rangle$ will we see? This is not determined in advance; the only thing we can say is that we will
see state $|j\rangle$ with probability $|\alpha_j|^2$, which is the squared norm of the corresponding amplitude $\alpha_j$.
This is known as “Born’s rule.” Accordingly, observing a quantum state induces a probability
distribution on the classical states, given by the squared norms of the amplitudes. This implies
$\sum_{j=0}^{N-1} |\alpha_j|^2 = 1$, so the vector of amplitudes has (Euclidean) norm 1. If we measure $|\phi\rangle$ and get
outcome $j$ as a result,¹ then $|\phi\rangle$ itself has “disappeared,” and all that is left is $|j\rangle$. In other words,
observing $|\phi\rangle$ “collapses” the quantum superposition $|\phi\rangle$ to the classical state $|j\rangle$ that we saw, and
all “information” that might have been contained in the amplitudes $\alpha_i$ is gone. Note that the
probabilities of the various measurement outcomes are exactly the same when we measure $|\phi\rangle$ or
when we measure state $e^{i\theta}|\phi\rangle$; because of this we sometimes say that the “global phase” $e^{i\theta}$ has no
physical significance.
1
Don’t use the ambiguous phrase “we measure $j$” in this case, since it’s not clear in that phrasing whether $|j\rangle$ is
the state you’re applying the measurement to, or the outcome of the measurement.
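Born’s rule and the irrelevance of the global phase can be checked numerically. The following is a minimal Python sketch; the function names and the example state are assumptions for illustration, and the random sampling only mimics the statistics of a measurement.

```python
import cmath
import random

def outcome_probabilities(amplitudes):
    """Born's rule: outcome j has probability |alpha_j|^2."""
    return [abs(a) ** 2 for a in amplitudes]

def measure(amplitudes, rng=random):
    """Sample an outcome j and return it together with the collapsed state |j>."""
    probs = outcome_probabilities(amplitudes)
    j = rng.choices(range(len(amplitudes)), weights=probs)[0]
    collapsed = [1.0 if k == j else 0.0 for k in range(len(amplitudes))]
    return j, collapsed

phi = [0.5, 0.5j, -0.5, 0.5]                  # hypothetical state with N = 4
rotated = [cmath.exp(0.7j) * a for a in phi]  # the same state with a global phase e^{0.7i}

print(outcome_probabilities(phi))
print(outcome_probabilities(rotated))  # same distribution, up to rounding
```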
Projective measurement
For most of the topics in these notes, the above “measurement in the computational (or standard)
basis” suffices. However, somewhat more general kinds of measurement than the above are possible
and sometimes useful. The remainder of this subsection may be skipped on a first reading, but will
become more relevant in the later parts of these notes, starting from Chapter 15.
A projective measurement on some space, with $m$ possible outcomes, is a collection of projectors
$P_1, \ldots, P_m$ that all act on that same space and that sum to identity, $\sum_{j=1}^m P_j = I$.² These projectors
are then pairwise orthogonal, meaning that $P_i P_j = 0$ if $i \neq j$. The projector $P_j$ projects on some
subspace $V_j$ of the total Hilbert space $V$, and every state $|\phi\rangle \in V$ can be decomposed in a unique
way as $|\phi\rangle = \sum_{j=1}^m |\phi_j\rangle$, with $|\phi_j\rangle = P_j|\phi\rangle \in V_j$. Because the projectors are orthogonal, the
subspaces $V_j$ are orthogonal as well, as are the states $|\phi_j\rangle$. When we apply this measurement to
the pure state $|\phi\rangle$, then we will get outcome $j$ with probability $\||\phi_j\rangle\|^2 = \mathrm{Tr}(P_j|\phi\rangle\langle\phi|) = \langle\phi|P_j|\phi\rangle$
and the measured state will then “collapse” to the new state $|\phi_j\rangle / \||\phi_j\rangle\| = P_j|\phi\rangle / \|P_j|\phi\rangle\|$.³ The
probabilities sum to 1 thanks to our assumption that $\sum_{j=1}^m P_j = I$ and the fact that trace is a
linear function:
$$\sum_{j=1}^m \mathrm{Tr}(P_j|\phi\rangle\langle\phi|) = \mathrm{Tr}\Big(\Big(\sum_{j=1}^m P_j\Big)|\phi\rangle\langle\phi|\Big) = \mathrm{Tr}(|\phi\rangle\langle\phi|) = \langle\phi|\phi\rangle = 1.$$
Note carefully that we cannot choose which $P_j$ will be applied to the state but can only give a
probability distribution. However, if the state $|\phi\rangle$ that we measure lies fully within one of the
subspaces $V_j$, then the measurement outcome will be that $j$ with certainty.
For example, a measurement in the computational basis on an $N$-dimensional state is the specific
projective measurement where $m = N$ and $P_j = |j\rangle\langle j|$. That is, $P_j$ projects onto the computational
basis state $|j\rangle$ and the corresponding subspace $V_j \subseteq V$ is the 1-dimensional subspace spanned by
$|j\rangle$. Consider the state $|\phi\rangle = \sum_{j=0}^{N-1} \alpha_j|j\rangle$. Note that $P_j|\phi\rangle = \alpha_j|j\rangle$, so applying our measurement
to $|\phi\rangle$ will give outcome $j$ with probability $\|\alpha_j|j\rangle\|^2 = |\alpha_j|^2$, and in that case the state collapses
to $\alpha_j|j\rangle / \|\alpha_j|j\rangle\| = \frac{\alpha_j}{|\alpha_j|}|j\rangle$. The norm-1 factor $\frac{\alpha_j}{|\alpha_j|}$ may be disregarded because it has no physical
significance, so we end up with the state $|j\rangle$ as we saw before.
Instead of the standard orthonormal basis, with basis states $|0\rangle, \ldots, |N-1\rangle$, we may consider
any other orthonormal basis $B$ of states $|\psi_0\rangle, \ldots, |\psi_{N-1}\rangle$, and consider the projective measurement
defined by the projectors $P_j = |\psi_j\rangle\langle\psi_j|$. This is called “measuring in basis $B$.” Applying this
measurement to state $|\phi\rangle$ gives outcome $j$ with probability $\langle\phi|P_j|\phi\rangle = |\langle\phi|\psi_j\rangle|^2$. Note that if $|\phi\rangle$
equals one of the basis vectors $|\psi_j\rangle$, then the measurement gives that outcome $j$ with probability 1.
In the previous two examples the projectors had rank 1 (i.e., project on 1-dimensional subspaces),
but this is not necessary. For example, a measurement that distinguishes between $|j\rangle$
with $j < N/2$ and $|j\rangle$ with $j \geq N/2$ corresponds to the two projectors $P_1 = \sum_{j<N/2} |j\rangle\langle j|$ and
$P_2 = \sum_{j \geq N/2} |j\rangle\langle j|$, each of rank $N/2$ (assume $N$ is even and $N \geq 4$). Applying this measurement to the state
$|\phi\rangle = \frac{1}{\sqrt{3}}|1\rangle + \sqrt{\frac{2}{3}}|N-1\rangle$ gives outcome 1 with probability $\|P_1|\phi\rangle\|^2 = 1/3$, in which case the state
collapses to $|1\rangle$. It gives outcome 2 with probability $\|P_2|\phi\rangle\|^2 = 2/3$, the state then collapses to $|N-1\rangle$.
2
The $m$ projectors together form one measurement; don’t use the word “measurement” for individual $P_j$s.
3
Don’t confuse the outcome of the measurement, which is the label $j$ of the projector $P_j$ that was applied, and
the post-measurement state, which is $P_j|\phi\rangle / \|P_j|\phi\rangle\|$.
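The two-outcome example above can be worked through numerically. This is a minimal Python sketch under stated assumptions: $N = 4$, the second basis state in the superposition is taken to be the last one of the range $0, \ldots, N-1$ (index 3), and each projector is represented by the set of basis indices it keeps rather than by a matrix.

```python
import math

def project(state, indices):
    """Apply the projector sum_{j in indices} |j><j| by zeroing all other amplitudes."""
    return [a if j in indices else 0.0 for j, a in enumerate(state)]

def norm(v):
    return math.sqrt(sum(abs(a) ** 2 for a in v))

N = 4                                              # assumed dimension (even, at least 4)
phi = [0.0, 1 / math.sqrt(3), 0.0, math.sqrt(2 / 3)]
subspaces = {1: set(range(N // 2)),                # P_1 keeps j < N/2
             2: set(range(N // 2, N))}             # P_2 keeps j >= N/2

results = {}
for outcome, indices in subspaces.items():
    projected = project(phi, indices)
    probability = norm(projected) ** 2             # ||P_j |phi>||^2
    post_state = [a / norm(projected) for a in projected]
    results[outcome] = (probability, post_state)
    print(outcome, probability, post_state)
```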
Observables
A projective measurement with projectors $P_1, \ldots, P_m$ and associated distinct outcomes $\lambda_1, \ldots, \lambda_m \in \mathbb{R}$
can be written as one matrix $M = \sum_{i=1}^m \lambda_i P_i$, which is called an observable. This is a succinct
way of writing down the projective measurement in one matrix, and has the added advantage that
the expected value of the outcome can be easily calculated: if we are measuring a state $|\phi\rangle$, then
the probability of outcome $\lambda_i$ is $\|P_i|\phi\rangle\|^2 = \mathrm{Tr}(P_i|\phi\rangle\langle\phi|)$, so the expected value of the outcome is
$\sum_{i=1}^m \lambda_i \mathrm{Tr}(P_i|\phi\rangle\langle\phi|) = \mathrm{Tr}(\sum_{i=1}^m \lambda_i P_i|\phi\rangle\langle\phi|) = \mathrm{Tr}(M|\phi\rangle\langle\phi|)$. Note that $M$ is Hermitian: $M = M^*$.
Conversely, since every Hermitian $M$ has a spectral decomposition $M = \sum_{i=1}^m \lambda_i P_i$, there is a direct
correspondence between observables and Hermitian matrices.
The Pauli matrices $I, X, Y, Z$ (see Appendix A.9) are examples of 2-dimensional observables,
with eigenvalues $\pm 1$. For example, $Z = |0\rangle\langle 0| - |1\rangle\langle 1|$ corresponds to measurement in the computational
basis (with measurement outcomes $+1$ and $-1$ for $|0\rangle$ and $|1\rangle$, respectively).
Suppose we have a bipartite state. An observable $A$ on the first part of the state corresponds
to an observable $A \otimes I$ on the bipartite state. Similarly, an observable $B$ on the second part of the
state corresponds to an observable $I \otimes B$ on the bipartite state. Separately measuring observables
$A$ and $B$ on the two parts of a bipartite state is different from measuring the joint observable
$A \otimes B$: the separate measurements give one outcome each, while the joint measurement gives
only one outcome, and the distribution on the post-measurement state may be different. What is
true, however, is that the measurement statistics of the product of outcomes is the same as the
measurement statistics of the outcome of the joint measurement. For example consider the case
when $A = B = Z$ (these correspond to measurement in the computational basis), and the bipartite
state is $|\phi\rangle = \frac{1}{\sqrt{2}}(|0\rangle \otimes |0\rangle + |1\rangle \otimes |1\rangle)$. With the separate measurements, the outcomes will be
$++$ or $--$ (note that in both cases the product of the two outcomes is $+1$) and the state $|\phi\rangle$ will
collapse to either $|0\rangle \otimes |0\rangle$ or $|1\rangle \otimes |1\rangle$. Yet $|\phi\rangle$ remains undisturbed by a joint measurement with
$\pm 1$-valued observable $Z \otimes Z$, because $|\phi\rangle$ is a $+1$-eigenstate of $Z \otimes Z$.
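The claims about $Z \otimes Z$ can be verified with a few lines of Python. This is a minimal sketch using plain nested lists as matrices; the helper names `kron`, `apply` and `inner` are assumptions for illustration. It computes the expected value $\mathrm{Tr}(M|\phi\rangle\langle\phi|) = \langle\phi|M|\phi\rangle$ for the joint observable and, for comparison, for $Z \otimes I$.

```python
import math

def kron(A, B):
    """Kronecker (tensor) product of two matrices given as nested lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def inner(u, v):
    """Inner product <u|v>."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

Z = [[1, 0], [0, -1]]
I2 = [[1, 0], [0, 1]]
s = 1 / math.sqrt(2)
phi = [s, 0, 0, s]                     # (|00> + |11>)/sqrt(2)

ZZ = kron(Z, Z)
after_ZZ = apply(ZZ, phi)              # equals phi: a +1-eigenstate of Z (x) Z
expect_ZZ = inner(phi, after_ZZ)       # expected joint outcome
expect_ZI = inner(phi, apply(kron(Z, I2), phi))  # expected outcome on the first qubit alone

print(expect_ZZ, expect_ZI)
```

The joint outcome is always $+1$, while the outcome on the first qubit alone averages to 0, matching the description of the outcomes $++$ and $--$ occurring equally often.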
POVM measurement
If we only care about the final probability distribution on the $m$ outcomes, not about the resulting
post-measurement state, then the most general type of measurement we can do is a so-called
positive-operator-valued measure (POVM). This is specified by $m$ positive semidefinite (psd) matrices
$E_1, \ldots, E_m$ that sum to identity. When measuring a state $|\phi\rangle$, the probability of outcome
$i$ is given by $\mathrm{Tr}(E_i|\phi\rangle\langle\phi|)$. A projective measurement is the special case of a POVM where the
measurement elements $E_i$ are projectors.⁴
There are situations where a POVM can do things a projective measurement cannot do.⁵ For
example, suppose you have a state in a 2-dimensional space, and you know it is either in state $|0\rangle$
or in state $|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$. These two states are not orthogonal, so there is no measurement
that distinguishes them perfectly. However, there is a POVM measurement that never makes a
mistake, but sometimes gives another outcome 2, meaning “I don’t know.” That is, you would like
to do a measurement with three possible outcomes: 0, 1, and 2, such that:
• If the state is $|0\rangle$, then you get correct outcome 0 with probability 1/4, and outcome 2 with
probability 3/4, but never get incorrect outcome 1.
• If the state is $|+\rangle$, then you get correct outcome 1 with probability 1/4, and outcome 2 with
probability 3/4, but never get incorrect outcome 0.
You cannot achieve this with a projective measurement on the qubit, but the following 3-outcome
POVM does the job:
You can check that $E_0, E_1, E_2$ are psd and add up to identity, so they form a valid POVM. None of
the 3 matrices is a projector. The success probability 1/4 can be improved further, see Exercise 9.
4
Note that if $E_i$ is a projector, then $\mathrm{Tr}(E_i|\phi\rangle\langle\phi|) = \mathrm{Tr}(E_i^2|\phi\rangle\langle\phi|) = \mathrm{Tr}(E_i|\phi\rangle\langle\phi|E_i) = \|E_i|\phi\rangle\|^2$, using the fact
that $E_i = E_i^2$ and the cyclic property of the trace. These equalities can fail if $E_i$ is psd but not a projector.
5
Even though POVMs strictly generalize projective measurements, one can show that every POVM can be “simulated”
by a projective measurement on a slightly larger space that yields the exact same probability distribution
over measurement outcomes (this follows from Neumark’s theorem).
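The matrices $E_0, E_1, E_2$ themselves are not shown in this text. As a sketch, one choice that is consistent with the probabilities listed in the two bullets is $E_0 = \frac{1}{2}|-\rangle\langle -|$ with $|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$, $E_1 = \frac{1}{2}|1\rangle\langle 1|$ and $E_2 = I - E_0 - E_1$; this particular choice is an assumption, and the Python below only checks that it has the required properties.

```python
import math

def expectation(E, v):
    """<v|E|v> for a real 2x2 matrix E and a real 2-vector v, i.e. Tr(E|v><v|)."""
    return sum(v[r] * E[r][c] * v[c] for r in range(2) for c in range(2))

def is_psd_2x2(E, tol=1e-12):
    """A real symmetric 2x2 matrix is psd iff its trace and determinant are nonnegative."""
    trace = E[0][0] + E[1][1]
    det = E[0][0] * E[1][1] - E[0][1] * E[1][0]
    return abs(E[0][1] - E[1][0]) <= tol and trace >= -tol and det >= -tol

# Assumed POVM elements (not taken from the text above).
E0 = [[0.25, -0.25], [-0.25, 0.25]]   # (1/2)|-><-|
E1 = [[0.0, 0.0], [0.0, 0.5]]         # (1/2)|1><1|
E2 = [[(1.0 if r == c else 0.0) - E0[r][c] - E1[r][c] for c in range(2)] for r in range(2)]

zero = [1.0, 0.0]
plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]
probs_zero = [expectation(E, zero) for E in (E0, E1, E2)]  # outcomes 0, 1, 2 on |0>
probs_plus = [expectation(E, plus) for E in (E0, E1, E2)]  # outcomes 0, 1, 2 on |+>

print(probs_zero)
print(probs_plus)
print(all(is_psd_2x2(E) for E in (E0, E1, E2)))
```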
Quantum mechanics only allows linear operations to be applied to quantum states. What this
means is: if we view a state like $|\phi\rangle$ as an $N$-dimensional vector $(\alpha_0, \ldots, \alpha_{N-1})^T$, then applying an
operation that changes $|\phi\rangle$ to $|\psi\rangle$ corresponds to multiplying $|\phi\rangle$ with an $N \times N$ complex-valued
matrix $U$:
$$U \begin{pmatrix} \alpha_0 \\ \vdots \\ \alpha_{N-1} \end{pmatrix} = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_{N-1} \end{pmatrix}.$$
Note that by linearity we have $|\psi\rangle = U|\phi\rangle = U(\sum_i \alpha_i|i\rangle) = \sum_i \alpha_i U|i\rangle$.
Because measuring $|\psi\rangle$ should also give a probability distribution, we have the constraint
$\sum_{j=0}^{N-1} |\beta_j|^2 = 1$ on the new state. This implies that the operation $U$ must preserve the norm
of vectors, and hence must be a unitary transformation (often just called “a unitary”). A matrix
$U$ is unitary if its inverse $U^{-1}$ equals its conjugate transpose $U^*$. This is equivalent to saying that
$U$ always maps a vector of norm 1 to a vector of norm 1. Because a unitary transformation always
has an inverse, it follows that any (non-measuring) operation on quantum states must be reversible:
by applying $U^{-1}$ we can always “undo” the action of $U$, and nothing is lost in the process. On the
other hand, a measurement is clearly non-reversible, because we cannot reconstruct $|\phi\rangle$ from the
observed classical state $|j\rangle$.
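The defining property $U^{-1} = U^*$, norm preservation, and reversibility can all be checked directly. The following minimal Python sketch uses a hypothetical $2 \times 2$ unitary and a hypothetical non-unitary matrix; the helper names are assumptions for illustration.

```python
import math

def dagger(U):
    """Conjugate transpose U^*."""
    n = len(U)
    return [[complex(U[r][c]).conjugate() for r in range(n)] for c in range(n)]

def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]

def apply(U, v):
    return [sum(u * x for u, x in zip(row, v)) for row in U]

def is_unitary(U, tol=1e-12):
    """Check U^* U = I entry by entry."""
    n = len(U)
    product = matmul(dagger(U), U)
    return all(abs(product[r][c] - (1 if r == c else 0)) <= tol
               for r in range(n) for c in range(n))

s = 1 / math.sqrt(2)
U = [[s, 1j * s], [1j * s, s]]   # hypothetical example of a unitary
shear = [[1, 1], [0, 1]]         # invertible, but does not preserve norms

phi = [0.6, 0.8j]
psi = apply(U, phi)              # |psi> = U|phi>
undone = apply(dagger(U), psi)   # applying U^{-1} = U^* recovers |phi>

print(is_unitary(U), is_unitary(shear))
print(sum(abs(b) ** 2 for b in psi))
```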
1.3 Qubits and quantum memory
In classical computation, the unit of information is a bit, which can be 0 or 1. In quantum computation,
this unit is a quantum bit (qubit), which is a superposition of 0 and 1. Consider a system
with 2 basis states, call them $|0\rangle$ and $|1\rangle$. We identify these basis states with the two orthogonal
vectors $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$, respectively. A single qubit can be in any superposition
Since $\langle x_k|y_k\rangle = \delta_{x_k,y_k}$, we see that basis states $|x\rangle$ and $|y\rangle$ will be orthogonal as soon as there is at
least one position $k$ at which the bits of $x$ and $y$ differ.
A quantum register of $n$ qubits can be in any superposition⁶
$$\alpha_0|0\rangle + \alpha_1|1\rangle + \cdots + \alpha_{2^n-1}|2^n-1\rangle, \quad \sum_{j=0}^{2^n-1} |\alpha_j|^2 = 1.$$
Measuring this in the computational basis, we obtain the $n$-bit state $|j\rangle$ with probability $|\alpha_j|^2$.
Measuring just the first qubit of a state would correspond to the projective measurement that
has the two projectors $P_0 = |0\rangle\langle 0| \otimes I_{2^{n-1}}$ and $P_1 = |1\rangle\langle 1| \otimes I_{2^{n-1}}$. For example, applying this
measurement to the state $\frac{1}{\sqrt{3}}|0\rangle|\phi\rangle + \sqrt{\frac{2}{3}}|1\rangle|\psi\rangle$ gives outcome 0 with probability 1/3; the state then
becomes $|0\rangle|\phi\rangle$. We get outcome 1 with probability 2/3; the state then becomes $|1\rangle|\psi\rangle$. Similarly,
measuring the first $n$ qubits of an $(n + m)$-qubit state in the computational basis corresponds to
the projective measurement that has $2^n$ projectors $P_j = |j\rangle\langle j| \otimes I_{2^m}$ for $j \in \{0, 1\}^n$.
An important property that deserves to be mentioned is entanglement, which refers to quantum
correlations between different qubits. For instance, consider a 2-qubit register that is in the state
$$\frac{1}{\sqrt{2}}|00\rangle + \frac{1}{\sqrt{2}}|11\rangle.$$
Such 2-qubit states are sometimes called EPR-pairs in honor of Einstein, Podolsky, and Rosen [106],
who examined such states and their seemingly paradoxical properties. Initially neither of the two
qubits has a classical value $|0\rangle$ or $|1\rangle$. However, if we measure the first qubit and observe, say, a
$|0\rangle$, then the whole state collapses to $|00\rangle$. Thus observing the first qubit immediately fixes also
the second, unobserved qubit to a classical value. Since the two qubits that make up the register
may be far apart, this example illustrates some of the non-local effects that quantum systems can
exhibit. In general, a bipartite state $|\phi\rangle$ is called entangled if it cannot be written as a tensor
product $|\phi_A\rangle \otimes |\phi_B\rangle$ where $|\phi_A\rangle$ lives in the first space and $|\phi_B\rangle$ lives in the second.⁷
6
Don’t call such a multi-qubit state or register a “qubit” or an “n-qubit”: the term “qubit” only refers to the
state of a 2-dimensional system. You can use “n-qubit” as an adjective but not as a noun.
At this point, a comparison with classical probability distributions may be helpful. Suppose
we have two probability spaces, $A$ and $B$, the first with $2^n$ possible outcomes, the second with $2^m$
possible outcomes. A probability distribution on the first space can be described by $2^n$ numbers
(nonnegative reals summing to 1; actually there are only $2^n - 1$ degrees of freedom here) and a
distribution on the second by $2^m$ numbers. Accordingly, a product distribution on the joint space
can be described by $2^n + 2^m$ numbers. However, an arbitrary (non-product) distribution on the
joint space takes $2^{n+m}$ real numbers, since there are $2^{n+m}$ possible outcomes in total. Analogously,
an $n$-qubit state $|\phi_A\rangle$ can be described by $2^n$ numbers (complex numbers whose squared moduli
sum to 1), an $m$-qubit state $|\phi_B\rangle$ by $2^m$ numbers, and their tensor product $|\phi_A\rangle \otimes |\phi_B\rangle$ by $2^n + 2^m$
numbers. However, an arbitrary (possibly entangled) state in the joint space takes $2^{n+m}$ numbers,
since it lives in a $2^{n+m}$-dimensional space. We see that the number of parameters required to
describe quantum states is the same as the number of parameters needed to describe probability
distributions. Also note the analogy between statistical independence⁸ of two random variables $A$
and $B$ and non-entanglement of the product state $|\phi_A\rangle \otimes |\phi_B\rangle$. However, despite the similarities
between probabilities and amplitudes, quantum states are much more powerful than distributions,
because amplitudes may have negative (or even complex) parts which can lead to interference
effects. Amplitudes only become probabilities when we square them. The art of quantum computing
is to use these special properties for interesting computational purposes.
$$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$
7
We often omit the tensor product symbol for such unentangled states, abbreviating $|\phi_A\rangle \otimes |\phi_B\rangle$ to $|\phi_A\rangle|\phi_B\rangle$ (you
shouldn’t abbreviate this further to $|\phi_A\phi_B\rangle$ though, unless both $|\phi_A\rangle$ and $|\phi_B\rangle$ are computational basis states). Note
that there cannot be ambiguity between tensor product and the usual matrix product in this abbreviation, because
both $|\phi_A\rangle$ and $|\phi_B\rangle$ are column vectors and hence their matrix product wouldn’t even be well-defined (the dimensions
“don’t fit”).
8
Two random variables $A$ and $B$ are independent if their joint probability distribution can be written as a product
of individual distributions for $A$ and for $B$: $\Pr[A = a \wedge B = b] = \Pr[A = a] \cdot \Pr[B = b]$ for all possible values $a, b$.
Another important 1-qubit gate is the phase gate $R_\phi$, which merely rotates the phase of the $|1\rangle$-state
by an angle $\phi$:
$$R_\phi|0\rangle = |0\rangle$$
$$R_\phi|1\rangle = e^{i\phi}|1\rangle$$
This corresponds to the unitary matrix
$$R_\phi = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\phi} \end{pmatrix}.$$
Note that $Z$ is a special case of this: $Z = R_\pi$, because $e^{i\pi} = -1$. The $R_{\pi/4}$-gate is often just called
the $T$-gate.
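Possibly the most important 1-qubit gate is the Hadamard transform, specified by:
$$H|0\rangle = \frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle$$
$$H|1\rangle = \frac{1}{\sqrt{2}}|0\rangle - \frac{1}{\sqrt{2}}|1\rangle$$
As a unitary matrix, this is represented as
$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$
If we apply $H$ to initial state $|0\rangle$ and then measure, we have equal probability of observing $|0\rangle$ or
$|1\rangle$. Similarly, applying $H$ to $|1\rangle$ and observing gives equal probability of $|0\rangle$ or $|1\rangle$. However, if we
apply $H$ to the superposition $\frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle$ then we obtain
$$H\Big(\frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle\Big) = \frac{1}{\sqrt{2}}H|0\rangle + \frac{1}{\sqrt{2}}H|1\rangle = \frac{1}{2}(|0\rangle + |1\rangle) + \frac{1}{2}(|0\rangle - |1\rangle) = |0\rangle.$$
The positive and negative amplitudes for $|1\rangle$ have canceled each other out! This effect is called
interference, and is analogous to interference patterns between light or sound waves.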
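The interference effect can be reproduced numerically. This is a minimal Python sketch (the helper name `apply` is an assumption): it applies the Hadamard matrix once to $|0\rangle$, giving a uniform outcome distribution, and then a second time, after which the amplitudes for $|1\rangle$ cancel.

```python
import math

def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]

once = apply(H, [1.0, 0.0])   # H|0> = (|0> + |1>)/sqrt(2)
twice = apply(H, once)        # H applied to that superposition

print([abs(a) ** 2 for a in once])    # outcome probabilities after one Hadamard
print([abs(a) ** 2 for a in twice])   # outcome probabilities after two Hadamards
```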
An example of a 2-qubit gate is the controlled-not gate CNOT. It negates the second bit of its
input if the first bit is 1, and does nothing if the first bit is 0:
$$\mathrm{CNOT}|0\rangle|b\rangle = |0\rangle|b\rangle$$
$$\mathrm{CNOT}|1\rangle|b\rangle = |1\rangle|1-b\rangle$$
The first qubit is called the control qubit, the second the target qubit. In matrix form, this is
$$\mathrm{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$
More generally, if $U$ is some $n$-qubit unitary matrix, then the controlled-$U$ operation corresponds
to the following $2^{n+1} \times 2^{n+1}$ unitary matrix:
$$\begin{pmatrix} I & 0 \\ 0 & U \end{pmatrix},$$
where $I$ is the $2^n$-dimensional identity matrix and the two 0s denote $2^n \times 2^n$ all-0 matrices.
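The block structure of a controlled-$U$ matrix is easy to build explicitly. The following Python sketch (the function name `controlled` is an assumption) assembles the block matrix from any square matrix and recovers CNOT as the controlled version of the bitflip $X$.

```python
def controlled(U):
    """Return the block matrix [[I, 0], [0, U]] for a square matrix U (nested lists)."""
    d = len(U)
    top = [[1 if r == c else 0 for c in range(d)] + [0] * d for r in range(d)]
    bottom = [[0] * d + list(row) for row in U]
    return top + bottom

X = [[0, 1], [1, 0]]
CNOT = controlled(X)                # control on the first qubit, X on the second
double_controlled = controlled(CNOT)  # an 8x8 matrix: X controlled by two qubits

for row in CNOT:
    print(row)
```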
1.5 Example: quantum teleportation
In the next chapter we will look in more detail at how we can use and combine such elementary
gates, but as an example we will here already explain teleportation [48]. Suppose there are two
parties, Alice and Bob. Alice has a qubit $\alpha_0|0\rangle + \alpha_1|1\rangle$ that she wants to send to Bob via a classical
channel. Without further resources this would be impossible, because the amplitudes $\alpha_0, \alpha_1$ may
require an infinite number of bits of precision to write them down exactly. However, suppose Alice
also shares an EPR-pair
$$\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$$
with Bob (say Alice holds the first qubit and Bob the second). Initially, their joint state is
$$(\alpha_0|0\rangle + \alpha_1|1\rangle) \otimes \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle).$$
The first two qubits belong to Alice, the third to Bob. Alice performs a CNOT on her two qubits
and then a Hadamard transform on her first qubit. Their joint 3-qubit state can now be written as
$$\begin{aligned}
&\tfrac{1}{2}|00\rangle(\alpha_0|0\rangle + \alpha_1|1\rangle) \; + \\
&\tfrac{1}{2}|01\rangle(\alpha_0|1\rangle + \alpha_1|0\rangle) \; + \\
&\tfrac{1}{2}|10\rangle(\alpha_0|0\rangle - \alpha_1|1\rangle) \; + \\
&\tfrac{1}{2}\underbrace{|11\rangle}_{\text{Alice}} \underbrace{(\alpha_0|1\rangle - \alpha_1|0\rangle)}_{\text{Bob}}.
\end{aligned}$$
Alice then measures her two qubits in the computational basis and sends the result (2 random
classical bits $ab$) to Bob over a classical channel. Bob now knows which transformation he must
do on his qubit in order to regain the qubit $\alpha_0|0\rangle + \alpha_1|1\rangle$. First, if $b = 1$ then he applies a bitflip
($X$-gate) on his qubit; second if $a = 1$ then he applies a phaseflip ($Z$-gate). For instance, if Alice
sent $ab = 11$, then Bob knows that his qubit is $\alpha_0|1\rangle - \alpha_1|0\rangle$. A bitflip followed by a phaseflip
will give him Alice’s original qubit $\alpha_0|0\rangle + \alpha_1|1\rangle$. In fact, if Alice’s qubit had been entangled with
some other qubits, then teleportation preserves this entanglement: Bob then receives a qubit that
is entangled in the same way as Alice’s original qubit was.
Note that the qubit on Alice’s side has been destroyed: teleporting moves a qubit from Alice to
Bob, rather than copying it. In fact, copying an unknown qubit is impossible [249], see Exercise 10.
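The teleportation protocol can be simulated on a classical computer for a small example. The Python sketch below is an illustration under stated assumptions: the amplitudes 0.6 and 0.8i are hypothetical, the 3-qubit state is stored as a list of 8 amplitudes with the leftmost qubit as the most significant bit, and instead of sampling Alice's measurement it conditions on each of the four possible outcomes $ab$ in turn.

```python
import math

def apply_1q(state, gate, q, n=3):
    """Apply a 2x2 gate to qubit q (0 = leftmost) of an n-qubit state vector."""
    out = [0j] * len(state)
    shift = n - 1 - q
    for idx, amp in enumerate(state):
        bit = (idx >> shift) & 1
        for new_bit in (0, 1):
            new_idx = idx ^ ((bit ^ new_bit) << shift)
            out[new_idx] += gate[new_bit][bit] * amp
    return out

def apply_cnot(state, control, target, n=3):
    """Flip the target bit of every basis state whose control bit is 1."""
    out = [0j] * len(state)
    for idx, amp in enumerate(state):
        if (idx >> (n - 1 - control)) & 1:
            idx ^= 1 << (n - 1 - target)
        out[idx] += amp
    return out

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
a0, a1 = 0.6, 0.8j                      # hypothetical amplitudes of Alice's qubit

# Qubit 0 is Alice's qubit; qubits 1 and 2 are the EPR-pair (Alice, Bob).
state = [0j] * 8
for q0, amp in ((0, a0), (1, a1)):
    state[4 * q0 + 0] += amp * s        # |q0>|00>
    state[4 * q0 + 3] += amp * s        # |q0>|11>

state = apply_cnot(state, 0, 1)         # Alice: CNOT on her two qubits
state = apply_1q(state, H, 0)           # Alice: Hadamard on her first qubit

def bob_after(a, b):
    """Bob's qubit after Alice obtains outcome ab and Bob applies his corrections."""
    # Each outcome has probability 1/4, so renormalizing multiplies by 2.
    qubit = [2 * state[4 * a + 2 * b + c] for c in (0, 1)]
    if b == 1:
        qubit = [qubit[1], qubit[0]]    # bitflip X
    if a == 1:
        qubit = [qubit[0], -qubit[1]]   # phaseflip Z
    return qubit

for a in (0, 1):
    for b in (0, 1):
        print(a, b, bob_after(a, b))
```

In all four cases Bob ends up with the amplitudes of Alice's original qubit.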
Exercises
1. (a) What is the inner product between the real vectors (0, 1, 0, 1) and (0, 1, 1, 1)?
(b) What is the inner product between the states $|0101\rangle$ and $|0111\rangle$?
2. Compute the result of applying a Hadamard transform to both qubits of $|0\rangle \otimes |1\rangle$ in two ways
(the first way using tensor product of vectors, the second using tensor product of matrices),
and show that the two results are equal:
3. Show that a bitflip operation, preceded and followed by Hadamard transforms, equals a
phaseflip operation: HXH = Z.
4. Show that surrounding a CNOT gate with Hadamard gates switches the role of the control-bit
and target-bit of the CNOT: (H ⊗ H)CNOT(H ⊗ H) is the 2-qubit gate where the second
bit controls whether the first bit is negated (i.e., flipped).
5. Simplify the following: $(\langle 0| \otimes I)(\alpha_{00}|00\rangle + \alpha_{01}|01\rangle + \alpha_{10}|10\rangle + \alpha_{11}|11\rangle)$.
6. Prove that an EPR-pair $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ is an entangled state, i.e., that it cannot be written
as the tensor product of two separate qubits.
7. Suppose we have the state $\frac{1}{\sqrt{2}}(|0\rangle|\phi\rangle + |1\rangle|\psi\rangle)$, where $|\phi\rangle$ and $|\psi\rangle$ are unknown normalized
quantum states with the same number of qubits. Suppose we apply a Hadamard gate to the
first qubit and then measure that first qubit in the computational basis. Give the probability
of measurement outcome 1, as a function of the states $|\phi\rangle$ and $|\psi\rangle$.
8. Give the 2-outcome projective measurement on a 2-qubit space that measures the parity (i.e.,
sum modulo 2) of 2-bit basis states. Also give the corresponding observable.
10. (H) Prove the quantum no-cloning theorem: there does not exist a 2-qubit unitary $U$ that
maps
$$|\phi\rangle|0\rangle \mapsto |\phi\rangle|\phi\rangle$$
for every qubit $|\phi\rangle$.
11. Show that unitaries cannot “delete” information: there is no 1-qubit unitary $U$ that maps
$|\phi\rangle \mapsto |0\rangle$ for every 1-qubit state $|\phi\rangle$.
12. Suppose Alice and Bob are not entangled. If Alice sends a qubit to Bob, then this can
give Bob at most one bit of information about Alice.⁹ However, if they share an EPR-pair,
$|\psi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$, then they can transmit two classical bits by sending one qubit over the
channel; this is called superdense coding. This exercise will show how this works.
(a) They start with a shared EPR-pair, $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$. Alice has classical bits $a$ and $b$.
Suppose she does an $X$-gate on her half of the EPR-pair if $a = 1$, followed by a $Z$-gate
if $b = 1$ (she does both if $ab = 11$, and neither if $ab = 00$). Write the resulting 2-qubit
state for the four different cases that $ab$ could take.
(b) Suppose Alice sends her half of the state to Bob, who now has two qubits. Show that
Bob can determine both $a$ and $b$ from his state, using Hadamard and CNOT gates,
followed by a measurement in the computational basis.
(c) Suppose Alice applies one of the 4 Pauli matrices to her qubit and then sends that qubit
to Bob. Give the 4 projectors of a 4-outcome projective measurement that Bob could
do on his 2 qubits to find out which Pauli matrix Alice actually applied.
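14. Let $\theta \in [0, 2\pi)$, $U_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$, $|\phi\rangle = U_\theta|0\rangle$ and $|\phi^\perp\rangle = U_\theta|1\rangle$.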
Chapter 2
successfully outputs the right answer $f(x)$ with probability at least 2/3 for every $x$ (probability
taken over the values of the random bits). Randomized circuits are equal in power to randomized
Turing machines: a language $L$ can be decided by a uniformly polynomial randomized circuit
family iff $L \in \mathrm{BPP}$, where BPP (“Bounded-error Probabilistic Polynomial time”) is the class of
languages that can efficiently be recognized by randomized Turing machines with success probability
at least 2/3. Because we can efficiently reduce the error probability of randomized algorithms (see
Appendix B.2), the particular value 2/3 doesn’t really matter here and may be replaced by any
fixed constant in (1/2, 1).
$$\frac{1}{\sqrt{2^n}} \sum_{j \in \{0,1\}^n} |j\rangle,$$
which is a superposition of all $n$-bit strings. More generally, if we apply $H^{\otimes n}$ to an initial state $|i\rangle$,
with $i \in \{0, 1\}^n$, we obtain
$$H^{\otimes n}|i\rangle = \frac{1}{\sqrt{2^n}} \sum_{j \in \{0,1\}^n} (-1)^{i \cdot j}|j\rangle, \qquad (2.1)$$
where $i \cdot j = \sum_{k=1}^n i_k j_k$ denotes the inner product of the $n$-bit strings $i, j \in \{0, 1\}^n$. For example:
$$H^{\otimes 2}|01\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle) \otimes \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) = \frac{1}{2} \sum_{j \in \{0,1\}^2} (-1)^{01 \cdot j}|j\rangle.$$
Note that Hadamard happens to be its own inverse (it’s unitary and Hermitian, hence $H = H^* = H^{-1}$),
so applying it once more on the right-hand side of the above equation would give us back
$|01\rangle$. The $n$-fold Hadamard transform will be very useful for quantum algorithms.
As in the classical case, a quantum circuit is a finite directed acyclic graph of input nodes,
gates, and output nodes. There are n nodes that contain the input (as classical bits); in addition
we may have some more input nodes that are initially $|0\rangle$ (“workspace”). The internal nodes of the
quantum circuit are quantum gates that each operate on at most two or three qubits of the state.
The gates in the circuit transform the initial state vector into a final state, which will generally
be a superposition. We measure some or all qubits of this final state in the computational basis
in order to (probabilistically) obtain a classical output to the algorithm. We can think of the
measurement of one qubit in the computational basis as one special type of gate. We may assume
without much loss of generality that such measurements only happen at the very end of the circuit
(see Exercise 7).
What about the more general kinds of measurements discussed in Section 1.2.2? If we want
to apply such a measurement in the circuit model, we will have to implement it using a circuit
of elementary gates followed by a measurement in the computational basis. For example, suppose
projectors $P_0$ and $P_1$ form a 2-outcome projective measurement on an $n$-qubit space ($P_0 + P_1 = I_{2^n}$).
Assume for simplicity that $P_0$ and $P_1$ both have rank $2^n/2$. Then there exists a unitary $U$ that
maps an $n$-qubit state $|\phi\rangle$ to a state whose first qubit is $|0\rangle$ whenever $P_0|\phi\rangle = |\phi\rangle$, and that maps
$n$-qubit $|\psi\rangle$ to a state whose first qubit is $|1\rangle$ whenever $P_1|\psi\rangle = |\psi\rangle$. We can now implement the
projective measurement by first applying a circuit that implements $U$, and then measuring (in the
computational basis) the first qubit of the resulting state. The minimal-size circuit to implement
$U$ could be very large (i.e., expensive) if the projective measurement is complicated, but that is
how it should be.
To draw quantum circuits, the convention is to let time progress from left to right: we start with
the initial state on the left. Each qubit is pictured as a horizontal wire, and the circuit prescribes
which gates are to be applied to which wires. Single-qubit gates like X and H just act on one
wire, while multi-qubit gates such as the CNOT act on multiple wires simultaneously.3 When one
qubit “controls” the application of a gate to another qubit, then the controlling wire is drawn with
a dot linked vertically to the gate that is applied to the target qubit. This happens for instance
with the CNOT, where the applied single-qubit gate is X, usually drawn as ‘⊕’ in a circuit picture
(similarly, the Toffoli gate is drawn in a circuit with a dot on the two control wires and an ‘⊕’ on
the target wire). Figure 2.1 gives a simple example on two qubits, initially in basis state $|00\rangle$: first
apply H to the 1st qubit, then CNOT to both qubits (with the first qubit acting as the control),
and then Z to the last qubit. The resulting state is $\frac{1}{\sqrt{2}}(|00\rangle - |11\rangle)$.
|0⟩ ──H──•───────
|0⟩ ─────⊕───Z───
Figure 2.1: Simple circuit for turning $|00\rangle$ into an entangled state
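The circuit of Figure 2.1 is small enough to check by direct matrix multiplication. The following is a minimal sketch, assuming NumPy and the basis ordering $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ with the first qubit as the most significant bit; the variable names are ours.

```python
import numpy as np

I2 = np.eye(2)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
Z = np.array([[1, 0], [0, -1]])
# CNOT with the first qubit as control; basis order |00>, |01>, |10>, |11>.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

state = np.array([1.0, 0.0, 0.0, 0.0])   # initial basis state |00>
state = np.kron(H, I2) @ state           # H on the first qubit
state = CNOT @ state                     # CNOT, first qubit is the control
state = np.kron(I2, Z) @ state           # Z on the last qubit

# The state claimed in the text: (|00> - |11>) / sqrt(2).
expected = np.array([1.0, 0.0, 0.0, -1.0]) / np.sqrt(2)
matches = np.allclose(state, expected)
```

Note that a gate acting on one wire becomes a tensor product with the identity on the other wire, which is what `np.kron` computes here.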
Note that if we have a circuit for unitary U, it is very easy to find a circuit for the inverse $U^{-1}$
with the same complexity: just reverse the order of the gates, and take the inverse of each gate.
For example, if $U = U_1 U_2 U_3$, then $U^{-1} = U_3^{-1} U_2^{-1} U_1^{-1}$.
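The reversal rule can be checked numerically. This is only a sketch under our own assumptions: the three gates are random 2-qubit unitaries generated by a hypothetical helper `random_unitary` (via a QR decomposition), and NumPy is used for the linear algebra.

```python
import numpy as np

def random_unitary(dim, rng):
    """Random unitary taken from the QR decomposition of a complex Gaussian matrix."""
    m = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(m)
    return q

rng = np.random.default_rng(1)
U1, U2, U3 = (random_unitary(4, rng) for _ in range(3))

U = U1 @ U2 @ U3
# Reverse the order and invert each gate; the inverse of a unitary is its
# conjugate transpose.
U_inv = U3.conj().T @ U2.conj().T @ U1.conj().T

is_inverse = np.allclose(U_inv @ U, np.eye(4)) and np.allclose(U @ U_inv, np.eye(4))
```

The inverse circuit uses exactly as many gates as the original, which is why the two have the same complexity.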
Footnote 3: Note that the number of wires (qubits) going into a unitary must equal the number of wires going out because
a unitary is always invertible (reversible). This differs from the case of classical circuits, where non-reversible gates
like AND have more wires going in than out.
In analogy to the classical class BPP, we will define BQP (“Bounded-error Quantum Poly-
nomial time”) as the class of languages that can efficiently be computed with success probability
at least 2/3 by (a family of) quantum circuits whose size grows at most polynomially with the
input length. We will study this quantum complexity class and its relation with various classical
complexity classes in more detail in Chapter 13.
We applied U just once, but the final superposition contains f(z) for all $2^n$ input values z! However,
by itself this is not very useful and does not give more than classical randomization, since observing
the final superposition will give just one uniformly random $|z\rangle|f(z)\rangle$ and all other information will
be lost. As we will see below, quantum parallelism needs to be combined with the effects of
interference and entanglement in order to get something that is better than classical.
2.4 The early algorithms
The two best-known successes of quantum algorithms so far are Shor’s factoring algorithm from
1994 [228] and Grover’s search algorithm from 1996 [125], which will be explained in later chapters.
Here we describe some of the earlier quantum algorithms that preceded Shor’s and Grover’s.
Virtually all quantum algorithms work with queries in some form or other. We will explain
this model here. It may look contrived at first, but it will eventually lead smoothly to Shor’s and
Grover’s algorithms. We should, however, emphasize that the query complexity model differs from
the standard model described above, because the input is now given as a “black-box” (also some-
times called an “oracle”). This means that the exponential quantum-classical separations that we
describe below do not by themselves give exponential quantum-classical separations in the standard
circuit model (the same applies to Simon’s algorithm in the next chapter).
To explain the query setting, consider an N-bit input $x = (x_0, \ldots, x_{N-1}) \in \{0,1\}^N$. Usually we
will have $N = 2^n$, so that we can address bit $x_i$ using an n-bit index i. One can think of the input
as an N-bit memory which we can access at any point of our choice (a “Random Access Memory”
or RAM). For example, a memory of N = 1024 bits can be indexed by addresses $i \in \{0,1\}^{10}$ of
n = 10 bits each. A memory access is via a so-called “black-box,” which is equipped to output the
bit $x_i$ on input i. As a quantum operation, this is the following unitary mapping on n + 1 qubits:
$$O_x : |i, 0\rangle \mapsto |i, x_i\rangle.$$
The first n qubits of the state are called the address bits (or address register), while the (n + 1)st
qubit is called the target bit (footnote 4). Since this mapping must be unitary, we also have to specify what
happens if the initial value of the target bit is 1. Therefore we actually let $O_x$ be the following
unitary transformation:
$$O_x : |i, b\rangle \mapsto |i, b \oplus x_i\rangle,$$
where $i \in \{0,1\}^n$, $b \in \{0,1\}$, and ⊕ denotes exclusive-or (addition modulo 2). In matrix representation,
this $O_x$ is now a permutation matrix and hence unitary. Note that a quantum computer can
apply $O_x$ on a superposition of various i, something a classical computer cannot do. One application
of this black-box is called a query, and counting the required number of queries to compute
this or that function of x is something we will do a lot in the first half of these notes.
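To make the matrix picture concrete, here is a minimal sketch that builds $O_x$ for a small input and checks that it is a permutation matrix. The function name `standard_query`, the example input, and the indexing convention (basis state $|i, b\rangle$ stored at position 2i + b) are our own assumptions.

```python
import numpy as np

def standard_query(x):
    """Matrix of O_x: |i, b> -> |i, b XOR x_i> for an N-bit string x."""
    N = len(x)
    O = np.zeros((2 * N, 2 * N))
    for i in range(N):
        for b in (0, 1):
            # Basis state |i, b> has index 2*i + b (target bit is least significant).
            O[2 * i + (b ^ x[i]), 2 * i + b] = 1.0
    return O

x = [0, 1, 1, 0]            # N = 4 bits, so n = 2 address bits
N = len(x)
O_x = standard_query(x)

# Exactly one 1 in every row and column, hence a permutation matrix and unitary.
is_permutation = bool(np.all(O_x.sum(axis=0) == 1) and np.all(O_x.sum(axis=1) == 1))
is_unitary = np.allclose(O_x.T @ O_x, np.eye(2 * N))

# One query on a uniform superposition of all addresses, with target bit 0.
superposition = np.zeros(2 * N)
superposition[0::2] = 1 / np.sqrt(N)     # amplitudes of the states |i, 0>
after = O_x @ superposition              # amplitudes now sit on the states |i, x_i>
```

The matrix has dimension 2N = $2^{n+1}$, so it acts on n + 1 qubits, not on N qubits.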
Given the ability to make a query of the above type, we can also make a query of the form
$|i\rangle \mapsto (-1)^{x_i}|i\rangle$ by setting the target bit to the state $|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) = H|1\rangle$:
$$O_x(|i\rangle|-\rangle) = |i\rangle \frac{1}{\sqrt{2}}(|x_i\rangle - |1 - x_i\rangle) = (-1)^{x_i}|i\rangle|-\rangle.$$
This ±-kind of query puts the output variable in the phase of the state: if $x_i$ is 1 then we get
a −1 in the phase of basis state $|i\rangle$; if $x_i = 0$ then nothing happens to $|i\rangle$ (footnote 5). This “phase-query”
or “phase-oracle” is sometimes more convenient than the standard type of query. We denote the
corresponding n-qubit unitary transformation by $O_{x,\pm}$.
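The phase kick-back identity can be verified numerically on an arbitrary superposition of addresses. The sketch below is illustrative only: the helper `standard_query`, the example input, and the convention that $|i, b\rangle$ sits at index 2i + b are our assumptions.

```python
import numpy as np

def standard_query(x):
    """Matrix of O_x: |i, b> -> |i, b XOR x_i>; basis state |i, b> has index 2*i + b."""
    N = len(x)
    O = np.zeros((2 * N, 2 * N))
    for i in range(N):
        for b in (0, 1):
            O[2 * i + (b ^ x[i]), 2 * i + b] = 1.0
    return O

x = [1, 0, 1, 1]
N = len(x)
O_x = standard_query(x)
O_phase = np.diag([(-1.0) ** xi for xi in x])   # the phase query on the address register

minus = np.array([1.0, -1.0]) / np.sqrt(2)      # |-> = H|1>
plus = np.array([1.0, 1.0]) / np.sqrt(2)        # |+> = H|0>

# A random real superposition over the N addresses.
rng = np.random.default_rng(2)
v = rng.normal(size=N)
v /= np.linalg.norm(v)

# With target |->, the standard query acts as the phase query on the address register.
kickback_holds = np.allclose(O_x @ np.kron(v, minus), np.kron(O_phase @ v, minus))
# With target |+>, the standard query does nothing at all.
plus_unchanged = np.allclose(O_x @ np.kron(v, plus), np.kron(v, plus))
```

In both cases the target qubit is left unchanged and unentangled, so it can be ignored after the query.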
Footnote 4: It is a common rookie mistake to confuse the N bits of x with the n address bits; don’t fall for this!
Footnote 5: This is sometimes called the “phase kick-back trick.” Note that for $|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$, we have $O_x(|i\rangle|+\rangle) =
|i\rangle|+\rangle$ irrespective of what x is. This allows us to control on which part of the state a phase-query is applied: we put
the control qubit in state $|-\rangle$ for indices i where we want to apply the phase-query, and in state $|+\rangle$ for the indices
where we do not want to apply a phase-query.
2.4.1 Deutsch-Jozsa
Deutsch-Jozsa problem [99]:
For $N = 2^n$, we are given $x \in \{0,1\}^N$ such that either
(1) all $x_i$ have the same value (“constant”), or
(2) N/2 of the $x_i$ are 0 and N/2 are 1 (“balanced”).
The goal is to find out whether x is constant or balanced.
The algorithm of Deutsch and Jozsa is as follows. We start in the n-qubit zero state $|0^n\rangle$, apply
a Hadamard transform to each qubit, apply a query (in its ±-form), apply another Hadamard to
each qubit, and then measure the final state. As a unitary transformation, the algorithm would be
$H^{\otimes n} O_{x,\pm} H^{\otimes n}$. We have drawn the corresponding quantum circuit in Figure 2.2 (where time again
progresses from left to right). Note that the number of wires going into the query is n, not N; the
basis states on this sequence of wires specify an n-bit address.
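[Figure 2.2: circuit with one wire per qubit, each starting in $|0\rangle$: a Hadamard on every wire, then the query $O_{x,\pm}$, then another Hadamard on every wire, followed by a measurement]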
Let us follow the state through these operations. Initially we have the state $|0^n\rangle$. By Equation (2.1)
on page 14, after the first Hadamard transforms we have obtained the uniform superposition
of all i:
$$\frac{1}{\sqrt{2^n}} \sum_{i \in \{0,1\}^n} |i\rangle.$$
The $O_{x,\pm}$-query turns this into
$$\frac{1}{\sqrt{2^n}} \sum_{i \in \{0,1\}^n} (-1)^{x_i} |i\rangle.$$
Applying the second batch of Hadamards gives (again by Equation (2.1)) the final superposition
$$\frac{1}{2^n} \sum_{i \in \{0,1\}^n} (-1)^{x_i} \sum_{j \in \{0,1\}^n} (-1)^{i \cdot j} |j\rangle,$$
where $i \cdot j = \sum_{k=1}^{n} i_k j_k$ as before. Since $i \cdot 0^n = 0$ for all $i \in \{0,1\}^n$, we see that the amplitude of
the $|0^n\rangle$-state in the final superposition is
$$\frac{1}{2^n} \sum_{i \in \{0,1\}^n} (-1)^{x_i} = \begin{cases} 1 & \text{if } x_i = 0 \text{ for all } i, \\ -1 & \text{if } x_i = 1 \text{ for all } i, \\ 0 & \text{if } x \text{ is balanced.} \end{cases}$$
Hence the final observation will yield $|0^n\rangle$ if x is constant and will yield some other state if x
is balanced. Accordingly, the Deutsch-Jozsa problem can be solved with certainty using only 1
quantum query and O(n) other operations (the original solution of Deutsch and Jozsa used 2
queries; the 1-query solution is from [91]).
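The whole algorithm can be simulated classically for small n by multiplying out $H^{\otimes n} O_{x,\pm} H^{\otimes n}$ on the state vector. The following is a minimal sketch, assuming NumPy, an input of length N = $2^n$ with n ≥ 1 that satisfies the promise, and function names of our own choosing. The simulation takes time exponential in n, so it illustrates the algorithm rather than its efficiency.

```python
import numpy as np
from functools import reduce

def hadamard_n(n):
    """The 2^n x 2^n matrix of a Hadamard applied to each of n qubits."""
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    return reduce(np.kron, [H] * n)

def deutsch_jozsa(x):
    """Return 'constant' or 'balanced' for a promise input x of length N = 2^n."""
    N = len(x)
    n = N.bit_length() - 1
    Hn = hadamard_n(n)
    state = np.zeros(N)
    state[0] = 1.0                                          # start in |0^n>
    state = Hn @ state                                      # uniform superposition
    state = np.array([(-1.0) ** xi for xi in x]) * state    # one phase query
    state = Hn @ state                                      # second batch of Hadamards
    prob_zero = abs(state[0]) ** 2                          # Pr[observing |0^n>]
    # Under the promise, prob_zero is exactly 1 (constant) or 0 (balanced).
    return "constant" if np.isclose(prob_zero, 1.0) else "balanced"

result_constant = deutsch_jozsa([1] * 8)
result_balanced = deutsch_jozsa([0, 1, 1, 0, 1, 0, 0, 1])
```

The phase query is applied as an entrywise multiplication because $O_{x,\pm}$ is diagonal in the computational basis.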
In contrast, it is easy to see that any classical deterministic algorithm needs at least N/2 + 1
queries: if it has made only N/2 queries and seen only 0s, the correct output is still undetermined.
However, a classical algorithm can solve this problem efficiently if we allow a small error probability:
just query x at two random positions, output “constant” if those bits are the same and “balanced”
if they are different. This algorithm outputs the correct answer with probability 1 if x is constant
and outputs the correct answer with probability 1/2 if x is balanced. Repeating this test k times with
fresh random positions, and outputting “balanced” as soon as some pair of bits differs, reduces the
error probability on balanced inputs to at most $2^{-k}$ using 2k queries. Thus the quantum-classical
separation of this problem only holds if we consider algorithms without error probability.
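The success probabilities of the two-position test can be computed exactly by enumerating all pairs of positions. This sketch assumes that the two positions are drawn independently and uniformly (so they may coincide); the function name is hypothetical.

```python
from itertools import product

def two_query_success_probability(x):
    """Exact success probability of the two-random-position test on input x."""
    N = len(x)
    is_constant = len(set(x)) == 1
    correct = 0
    # Enumerate all N^2 equally likely pairs of independently drawn positions.
    for i, j in product(range(N), repeat=2):
        says_constant = (x[i] == x[j])
        correct += (says_constant == is_constant)
    return correct / (N * N)

p_constant = two_query_success_probability([0] * 8)
p_balanced = two_query_success_probability([0, 1, 1, 0, 1, 0, 0, 1])
```

On a constant input the test can never see two different bits, so all of the error is one-sided and falls on balanced inputs.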
2.4.2 Bernstein-Vazirani
Bernstein-Vazirani problem [53]:
For $N = 2^n$, we are given $x \in \{0,1\}^N$ with the property that there is some unknown $a \in \{0,1\}^n$
such that $x_i = (i \cdot a) \bmod 2$. The goal is to find a.
The Bernstein-Vazirani algorithm is exactly the same as the Deutsch-Jozsa algorithm, but now
the final observation miraculously yields a. Since $(-1)^{x_i} = (-1)^{(i \cdot a) \bmod 2} = (-1)^{i \cdot a}$, we can write
the state obtained after the query as:
$$\frac{1}{\sqrt{2^n}} \sum_{i \in \{0,1\}^n} (-1)^{x_i} |i\rangle = \frac{1}{\sqrt{2^n}} \sum_{i \in \{0,1\}^n} (-1)^{i \cdot a} |i\rangle.$$
Since Hadamard is its own inverse, from Equation (2.1) we can see that applying a Hadamard to
each qubit of the above state will turn it into the classical state $|a\rangle$. This solves the
Bernstein-Vazirani problem with 1 query and O(n) other operations. In contrast, any classical algorithm
(even a randomized one with small error probability) needs to ask n queries for information-theoretic
reasons: the final answer consists of n bits and one classical query gives at most 1 bit of information.
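A small state-vector simulation shows the final state collapsing onto $|a\rangle$. This is only a sketch under our own assumptions: NumPy, n ≥ 1, the helper names `make_input` and `bernstein_vazirani`, and the convention that the first bit of an index is its most significant bit.

```python
import numpy as np
from functools import reduce

def make_input(a):
    """Build x in {0,1}^N with x_i = (i . a) mod 2, for an n-bit list a."""
    n = len(a)
    x = []
    for i in range(2 ** n):
        bits = [(i >> (n - 1 - k)) & 1 for k in range(n)]   # bits of i, most significant first
        x.append(sum(ik * ak for ik, ak in zip(bits, a)) % 2)
    return x

def bernstein_vazirani(x):
    """Recover a from x using one simulated phase query."""
    N = len(x)
    n = N.bit_length() - 1
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    Hn = reduce(np.kron, [H] * n)
    state = np.full(N, 1 / np.sqrt(N))                      # H on each qubit of |0^n>
    state = np.array([(-1.0) ** xi for xi in x]) * state    # one phase query
    state = Hn @ state                                      # final state is the basis state |a>
    j = int(np.argmax(np.abs(state)))                       # the only index with nonzero amplitude
    return [(j >> (n - 1 - k)) & 1 for k in range(n)]

a = [1, 0, 1]
recovered = bernstein_vazirani(make_input(a))
```

Reading off the index with `argmax` stands in for the measurement, which here yields a with probability 1.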
Bernstein and Vazirani also defined a recursive version of this problem, which can be solved
exactly by a quantum algorithm in poly(n) steps, but for which every classical randomized algorithm
needs $n^{\Omega(\log n)}$ steps.
Exercises
1. Is the controlled-NOT operation C Hermitian? Determine $C^{-1}$.
2. Construct a CNOT from two Hadamard gates and one controlled-Z (the controlled-Z gate
maps $|11\rangle \mapsto -|11\rangle$ and acts like the identity on the other basis states).
3. A SWAP-gate interchanges two qubits: it maps basis state $|a, b\rangle$ to $|b, a\rangle$. Implement a SWAP-
gate using a few CNOTs (when using a CNOT, you’re allowed to use either of the 2 bits as
the control, but be explicit about this).
4. Show that every 1-qubit unitary with real entries can be written as a rotation matrix, possibly
preceded and followed by Z-gates. In other words, show that for every 2 × 2 real unitary U,
there exist signs $s_1, s_2, s_3 \in \{1, -1\}$ and angle $\theta \in [0, 2\pi)$ such that
$$U = s_1 \begin{pmatrix} 1 & 0 \\ 0 & s_2 \end{pmatrix} \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & s_3 \end{pmatrix}.$$
5. Let U be a 1-qubit unitary that we would like to implement in a controlled way, i.e., we want
to implement a map $|c\rangle|b\rangle \mapsto |c\rangle U^c|b\rangle$ for all $c, b \in \{0,1\}$ (here $U^0 = I$ and $U^1 = U$). One can
show there exist 1-qubit unitaries A, B, and C, such that ABC = I and AXBXC = U (X
is the NOT-gate); you may assume this without proof. Give a circuit that acts on two qubits
and implements a controlled-U gate, using CNOTs and (uncontrolled) A, B, and C gates.
6. (H) Let C be a given quantum circuit consisting of T many gates, which may be CNOTs and
single-qubit gates. Show that we can implement C in a controlled way using O(T ) Toffoli
gates, CNOTs and single-qubit gates, and no auxiliary qubits other than the controlling qubit.
7. (H) It is possible to avoid doing any intermediate measurements in a quantum circuit, using
one auxiliary qubit for each 1-qubit measurement that needs to be delayed until the end of
the computation. Show how.
8. (a) Give a circuit that maps $|0^n, b\rangle \mapsto |0^n, 1 - b\rangle$ for $b \in \{0,1\}$, and that maps $|i, b\rangle \mapsto
|i, b\rangle$ whenever $i \in \{0,1\}^n \setminus \{0^n\}$. You are allowed to use every type of elementary gate
mentioned in the lecture notes (incl. Toffoli gates), as well as auxiliary qubits that are
initially $|0\rangle$ and that should be put back to $|0\rangle$ at the end of the computation.
(b) Suppose we can make queries of the type $|i, b\rangle \mapsto |i, b \oplus x_i\rangle$ to input $x \in \{0,1\}^N$, with
$N = 2^n$. Let $x'$ be the input x with its first bit flipped (e.g., if x = 0110 then $x'$ = 1110).
Give a circuit that implements a query to $x'$. Your circuit may use one query to x.
(c) Give a circuit that implements a query to an input $x''$ that is obtained from x (analogously
to (b)) by setting its first bit to 0. Your circuit may use one query to x.
9. In Section 2.4 we showed that a standard query, which maps $|i, b\rangle \mapsto |i, b \oplus x_i\rangle$ (where
$i \in \{0, \ldots, N-1\}$ and $b \in \{0,1\}$), can be used to implement a phase-query to x, i.e., one of
the type $|i\rangle \mapsto (-1)^{x_i}|i\rangle$ (this is an uncontrolled phase-query).
(a) Show that a standard query can be implemented using one controlled phase-query to x
(which maps $|c, i\rangle \mapsto (-1)^{c x_i}|c, i\rangle$, so the phase is added only if the control bit is c = 1),
and possibly some auxiliary qubits and other gates.
(b) Can you also implement a standard query using one or more uncontrolled phase-queries
to x, and possibly some auxiliary qubits and other gates? If yes, show how. If no, prove
why not.
10. (a) Suppose we run the 1-qubit circuit $H O_{x,\pm} H$ on initial state $|0\rangle$ and then measure (in
the computational basis). What is the probability distribution on the output bit, as a
function of x?
(b) Now suppose the query leaves some workspace in a second qubit, which is initially $|0\rangle$:
$$O'_{x,\pm} : |b, 0\rangle \mapsto (-1)^{x_b}|b, b\rangle \text{ for } b \in \{0,1\}.$$
Suppose we just ignore the workspace and run the algorithm of (a) on the first qubit
with $O'_{x,\pm}$ instead of $O_{x,\pm}$ (and $H \otimes I$ instead of H, and initial state $|00\rangle$). What is now
the probability distribution on the output bit (i.e., if we measure the first of the two
bits)?
Comment: This exercise illustrates why it’s important to “clean up” (i.e., set back to $|0\rangle$) workspace
qubits of some subroutine before running it on a superposition of inputs: the unintended entanglement
between the address and workspace registers can thwart the intended interference effects.
11. Give a randomized classical algorithm (i.e., one that can flip coins during its operation) that
makes only two queries to x, and decides the Deutsch-Jozsa problem with success probability
at least 2/3 on every possible input. A high-level description is enough, no need to write out
the classical circuit.
13. (H) Let $N = 2^n$. A parity query to input $x \in \{0,1\}^N$ corresponds to the (N + 1)-qubit
unitary map $Q_x : |y, b\rangle \mapsto |y, b \oplus (x \cdot y)\rangle$, where $x \cdot y = \sum_{i=0}^{N-1} x_i y_i \bmod 2$. For a fixed function
$f : \{0,1\}^N \to \{0,1\}$, give a quantum algorithm that computes f(x) using only one such query
(i.e., one application of $Q_x$), and as many elementary gates as you want. You do not need to
give the circuit in full detail; an informal description of the algorithm is good enough.