QUANTUM INFORMATION
AND COMPUTATION
Lecture notes
Richard Jozsa, DAMTP Cambridge
[email protected]
CONTENTS
1 Introduction: why quantum computation and information? ··· 3
2 Principles of quantum mechanics and the Dirac bra-ket notation ··· 7
2.1 Quantum states and operations ··· 7
2.2 Quantum measurements ··· 13
2.3 Some basic unitary operations for qubits ··· 18
2.4 An aside: superposition and quantum interference ··· 20
3 Quantum states as information carriers ··· 23
3.1 The no cloning theorem ··· 24
3.2 Distinguishing non-orthogonal states ··· 27
3.3 The no-signalling principle ··· 30
3.4 Quantum dense coding ··· 34
4 Quantum teleportation ··· 35
5 Quantum cryptography – BB84 quantum key distribution ··· 39
6 Basics of classical computation and complexity ··· 47
6.1 Query complexity and promise problems ··· 50
7 Circuit model of quantum computation ··· 51
8 The Deutsch-Jozsa algorithm ··· 55
8.1 Simon’s algorithm ··· 59
9 Quantum Fourier transform and periodicities ··· 60
9.1 QFT mod N ··· 60
9.2 Periodicity determination ··· 61
9.3 Efficient implementation of QFT ··· 64
10 Quantum algorithms for search problems ··· 68
10.1 The class NP and search problems ··· 68
10.2 Grover’s quantum searching algorithm ··· 71
11 Shor’s quantum factoring algorithm ··· 78
11.1 Factoring as a periodicity problem ··· 78
11.2 Computing the period r of f (k) = ak mod N ··· 80
11.3 Getting r from a good c value ··· 83
11.4 Assessing the complexity of Shor’s algorithm ··· 87
Some useful references:
M. Nielsen and I. Chuang, "Quantum Computation and Quantum Information". CUP.
B. Schumacher and M. Westmoreland, “Quantum processes, systems and information”.
CUP 2010.
S. Loepp and W. Wootters, "Protecting information: from classical error correction to quantum cryptography". Academic Press 2006.
John Preskill’s notes for Caltech course on quantum computation.
Available at http://www.theory.caltech.edu/people/preskill/ph229/notes/book.ps
1 Introduction: why quantum computation and information?
More generally we see that:
the possibilities and limitations of information storage, processing (i.e. computation) and
communication must all rest on the laws of physics and cannot be determined by abstract
thought or mathematics alone!
Thus there must be a deep fundamental connection between physics and computation,
and that is why we need quantum information and computation. Or as Feynman put it
more succinctly: because “Nature isn’t classical, dammit...”. And indeed it can be argued
that our conventional generally accepted model of computation (in any of its equivalent
forms, e.g. based on Turing machines or viewing computations in terms of Boolean gates
etc.) amounts to the computational possibilities allowed by the laws of classical physics.
Quantum physics differs dramatically from classical physics in the way it represents the
physical world and the kinds of processes that it allows (as we’ll see in detail shortly).
Quantum computation and information is the study of the possible applications and
exploitation of these novel quantum features in issues of information representation,
computation, computational complexity, cryptography and communication. The subject
is a remarkable synthesis of theoretical computer science, classical information theory
and cryptography with quantum physics, promising a series of benefits of much practical
significance, beyond the remit (even in principle) of conventional (classical) computing
and information technology.
The subject emerged in the mid-1980’s and it is currently one of the most active areas
of all scientific research internationally. Here we'll just briefly highlight some of the key areas in which it promises benefits:
Computing power - computational complexity issues.
As we’ll see later in the course, a quantum computer cannot compute any computational
task that’s not already computable in principle on a classical computer. However the key
issue here is not ‘computability in principle’ but ‘computability in practice’ i.e. that of
computational hardness. In computational complexity theory the ‘hardness’ of a compu-
tational task is measured by the amount of computational resources needed to compute
it; the resources considered are ‘time’ i.e. the number of computational steps, and ‘space’
i.e. the amount of computer memory workspace needed.
For example, think of the task of factoring a given integer with n digits, and how hard this
is to do, as a function of n. Here n is called the ‘input size’ for the computation. For any
proposed algorithm we ask: does the time (i.e. number of steps) grow polynomially with
n (so-called poly-time algorithms) or exponentially with n (faster than any polynomial
growth), as n gets larger and larger. Poly-time algorithms are regarded as “feasible in
practice” whereas a task with only exponential time algorithms, while computable in
principle, is regarded as infeasible, or effectively uncomputable, in practice – because for
relatively modest input sizes, the time required would exceed any reasonably available
period (e.g. exceeding the age of the universe).
This is where quantum computing has a major impact: we’ll see that the formalism of
quantum theory leads to new kinds of “non-classical” modes of computation (new kinds
of computational steps for information processing), providing remarkable new possibil-
ities for computational algorithms. In some cases these possibilities are able to cross
the boundary between poly-time and exponential time algorithms i.e. there are some
computational tasks for which no known classical poly-time algorithm exists but which
can be solved in poly-time on a quantum computer i.e. these tasks, which are effectively
uncomputable in practice on a classical computer, become computable in practice on a
quantum computer.
The most famous example is the computational task of integer factorisation. In classical
computation there is no known algorithm that runs in polynomial time (in the number of
digits) but in 1994 Peter Shor discovered a poly-time quantum algorithm for factorisation.
We emphasise that this exponential speedup in time is achieved not by an increase in
clock speed of steps on the computer, but by exploiting entirely new (quantum) kinds
of computational steps (and needing exponentially fewer of them) that are simply not
available to classical computers.
We’ll see a variety of examples of such quantum computational benefits (including fac-
toring) in the second half of the course.
Communication and security issues - quantum states as information carriers.
Intrinsically quantum (i.e. non-classical) features of quantum states (including the possi-
bilities of quantum superposition, entanglement and principles of quantum measurement
theory) can be exploited to provide novel possibilities (beyond what’s achievable with
classical physics) for information communication and security. These include the so-
called process of quantum teleportation, and a variety of important cryptographic issues
such as the ability to implement provably secure communication. In this course we’ll
discuss quantum teleportation, the Bennett-Brassard quantum scheme for secure com-
munication, and some further features of quantum states when viewed as information
carriers.
Technological issues.
This section, written in 2018, is now out of date.
For some accounts of recent developments see for example (clickable links):
https://www.nature.com/articles/s41586-019-1666-5
https://www.nature.com/articles/d41586-019-03213-z
https://quantum-journal.org/papers/q-2018-08-06-79/pdf/
https://en.wikipedia.org/wiki/Quantum_supremacy
Historically in computer science (before the advent of quantum computing) there was a
phenomenon known as Moore’s law viz. that since 1965 there has been a steady rate of
miniaturisation of computer components, by approximately a factor of 4 every 3.5 years.
With this trend we have now effectively reached the atomic scale where classical physics
fails completely and quantum effects are dominant – components begin to malfunction
in ‘bizarre’ quantum ways. To deal with this, we could either aim to re-design our
components to stamp out the new effects to provide the same functionality as before, or
else we could embrace the new quantum effects, aiming to exploit them in new kinds of
computational ways. Our discussion of computational complexity above shows that the
latter is surely the way to go!
However this involves immense technological challenges: it turns out that quantum states
and processes are intrinsically more fragile and difficult to control cleanly, than their clas-
sical counterparts. Inspired by the theoretical technological possibilities on offer, in recent
years there has been a huge effort devoted to developing the needed quantum technologi-
cal capability. To date, some quantum cryptographic protocols have been implemented
(including secure communication), even to the level of being commercially available.
However these require quantum processing of only relatively small systems (thus within
our current quantum technological capabilities) and similarly some quantum algorithms
on very small input instances have been demonstrated. But the ultimate ‘holy grail’
of a working scalable universal quantum computer is currently beyond our quantum
technological capability. Some of the world’s leading information technology companies
(including IBM, Google, Microsoft) have mounted huge research and development efforts
with just that aim. They have announced their expectation of having, for the first time, a working quantum computer in 2018 that can reliably process around 50 qubits of quantum information (a qubit being the quantum analogue of a classical bit). This may not
sound like a large number e.g. what useful computation can you carry out with just 50
classical bits!? But this comparison is completely misguided: astonishingly it turns out
that the extraordinary possibilities of quantum effects (that we’ll see) imply that even
a quantum computer of only around 50 qubits should already be able to perform some
computational tasks that are beyond the dedicated use of all classical computing power
on earth today.
In the popular science press, quantum computing (and quantum information technol-
ogy more broadly) has been the subject of some sensationalisation. In that spirit one
could indeed say that (in view of the above expected technological developments) it is
now widely accepted that we are presently on the cusp of the next major revolution in
technology, in a venerable tradition that perhaps began with the stone age, and later the
iron age, continuing to steam and mechanical engineering, electrical, then electronic, and
now quantum technology (including more broadly nano-technology), for the first time
embracing fully the technological possibilities of quantum physics.
2 Principles of quantum mechanics and the Dirac bra-ket notation
We will begin by setting out the basic principles of quantum mechanics (as four basic
principles (QM1) - (QM4)) while simultaneously introducing and explaining the formal-
ism of Dirac notation, which we will use to express their mathematical content.
Dirac notation is nothing more than an alternative notation for basic linear algebra which
is widely used in quantum mechanics. We will use it for essentially all aspects of this
course so it will be important for you to master it at the outset!
Thus in components, bra vectors are always written as row vectors: for a ket |v⟩ = a|0⟩ + b|1⟩ the corresponding bra is ⟨v| = (a* b*). If |w⟩ = c|0⟩ + d|1⟩ is another ket then the inner product of |v⟩ and |w⟩ is written by juxtaposing brackets:
$$\langle v|w\rangle = |v\rangle^\dagger\, |w\rangle = (a^*\ \ b^*)\begin{pmatrix} c\\ d\end{pmatrix} = a^*c + b^*d.$$
Indeed the whole Dirac notation formalism is motivated by the bracket notation (v, w)
for inner products commonly used in mathematics, hence the terms “bra” and “ket”
vectors, giving the inner product as a “bra-ket”. Orthonormality of the basis {|0i , |1i} is
equivalent to the condition hi|ji = δij (the Kronecker delta). In more abstract terms, bra
vectors are a notation for elements of the dual space V ∗ viz. hv| is the linear functional
whose value on any ket |wi is the inner product hv|wi.
Dirac notation: tensor products of vectors
If V and W are vector spaces of dimensions m and n with bases {|e1⟩, …, |em⟩} and {|f1⟩, …, |fn⟩} respectively, then the tensor product space V ⊗ W has dimension mn and can be regarded as consisting of all formal linear combinations of the symbols |e_i⟩ ⊗ |f_j⟩ for i = 1, …, m and j = 1, …, n. There is a natural bilinear embedding V × W → V ⊗ W defined as follows. If |α⟩ = Σ_i a_i|e_i⟩ and |β⟩ = Σ_j b_j|f_j⟩ are general vectors in V and W respectively then
$$(|\alpha\rangle, |\beta\rangle) \;\mapsto\; |\alpha\rangle\otimes|\beta\rangle = \sum_{ij} a_i b_j\, |e_i\rangle\otimes|f_j\rangle \qquad (*)$$
obtained by formally "multiplying out" the brackets in (Σ_i a_i|e_i⟩)(Σ_j b_j|f_j⟩). Alternatively, this is just the outer product of tensors with components a_i and b_j, giving a tensor T_ij = a_i b_j in the larger vector space of two-index tensors.
Product vectors and entangled vectors
Any vector |αi ⊗ |βi in V ⊗ W is called a product vector. We often write the product
vector |αi ⊗ |βi simply as |αi |βi (omitting the ⊗). The mapping in eqn. (*) above is not
surjective – vectors in V ⊗ W that are not product vectors are called entangled vectors.
We will see that in the formalism of quantum mechanics, physical states are represented
by vectors of unit length, and entangled states will play a very important role in quantum
computation and information. We will introduce and define them again later, and have
a lot more to say about them in due course!
We will be mostly concerned with tensor products of the two-dimensional space V2 with itself (multiple times). For the k-fold tensor power we write ⊗^k V2 = V2 ⊗ … ⊗ V2, which is a space of dimension 2^k with basis |i_1⟩ ⊗ … ⊗ |i_k⟩ (i_1, …, i_k = 0, 1) labelled by the 2^k k-bit strings i_1…i_k. We often write |i_1⟩ ⊗ … ⊗ |i_k⟩ simply as |i_1…i_k⟩. This basis is also called the computational (or standard) basis of ⊗^k V2.
Example. For k = 2, if |v⟩ = a|0⟩ + b|1⟩ and |w⟩ = c|0⟩ + d|1⟩, we have |v⟩|w⟩ ∈ V2 ⊗ V2. By formal multiplication we get
$$|v\rangle\otimes|w\rangle = (a|0\rangle + b|1\rangle)(c|0\rangle + d|1\rangle) = ac|00\rangle + ad|01\rangle + bc|10\rangle + bd|11\rangle$$
and in terms of components we have
$$|v\rangle\otimes|w\rangle = \begin{pmatrix} a\\ b\end{pmatrix}\otimes\begin{pmatrix} c\\ d\end{pmatrix} = \begin{pmatrix} a\begin{pmatrix} c\\ d\end{pmatrix}\\[4pt] b\begin{pmatrix} c\\ d\end{pmatrix}\end{pmatrix} = \begin{pmatrix} ac\\ ad\\ bc\\ bd\end{pmatrix}.$$
Note how the last expression gives the pattern for how to get the final tensor product
components from those of the individual vectors: take each numerical component of the
first vector in turn and “expand it up (doubling it to two numbers)” by multiplying it
by the components of the second vector taken in order, and then list all these in order
in a column. Note also that this illustrates that the tensor product is not commutative, i.e. |v⟩ ⊗ |w⟩ ≠ |w⟩ ⊗ |v⟩ in general.
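The component pattern above is exactly the Kronecker product of the two component vectors, so it can be checked numerically, for instance with numpy (a small sketch with arbitrary sample amplitudes):

```python
import numpy as np

# Components of |v> = a|0> + b|1> and |w> = c|0> + d|1> (arbitrary sample values)
a, b = 0.6, 0.8j
c, d = 1 / np.sqrt(2), 1 / np.sqrt(2)
v = np.array([a, b])
w = np.array([c, d])

# np.kron gives the components (ac, ad, bc, bd) in the basis |00>, |01>, |10>, |11>
print(np.kron(v, w))
# The tensor product is not commutative: |v>|w> and |w>|v> generally differ
print(np.allclose(np.kron(v, w), np.kron(w, v)))   # False here
```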
Example. The vector |00⟩ + |11⟩ ∈ V2 ⊗ V2 is entangled, i.e. it is not a product vector. To see this in an elementary way, suppose that it is a product, i.e.
$$|00\rangle + |11\rangle = (a|0\rangle + b|1\rangle)(c|0\rangle + d|1\rangle) = ac|00\rangle + ad|01\rangle + bc|10\rangle + bd|11\rangle$$
for some a, b, c, d. Then comparing the first and last expressions, we must have ad = 0 (and also bc = 0), so either a = 0 or d = 0. Thus respectively either |00⟩ or |11⟩ has coefficient zero too, which is a contradiction.
This argument may be generalised to show that an arbitrary vector α|00⟩ + β|01⟩ + γ|10⟩ + δ|11⟩ ∈ V2 ⊗ V2 is entangled iff αδ − βγ ≠ 0 (see exercise sheet 1). But beware: this simple single-equation characterisation no longer suffices if the component spaces have dimension greater than 2. Indeed in that general case, a vector Σ_{i,j} A_ij |i⟩|j⟩ is a product vector iff the matrix A_ij of coefficients has rank one.
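The rank-one criterion lends itself to a simple numerical test; a minimal sketch (using numpy, with the coefficient matrix written in the computational basis) might look like:

```python
import numpy as np

def is_product(A, tol=1e-12):
    """A[i, j] are the coefficients of sum_ij A_ij |i>|j>; product vector iff rank(A) == 1."""
    return np.linalg.matrix_rank(A, tol=tol) == 1

bell = np.array([[1, 0],
                 [0, 1]])        # |00> + |11>: rank 2, so entangled
prod = np.array([[1, 1],
                 [1, 1]])        # (|0>+|1>)(|0>+|1>): rank 1, so a product
print(is_product(bell), is_product(prod))   # False True
```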
Inner product in V ⊗ W
The inner products on V and W give a natural inner product on V ⊗ W defined “slot-
wise” (the slots being the component spaces). Thus for product vectors we have the
inner product of |α1 i |β1 i with |α2 i |β2 i being hα1 |α2 ihβ1 |β2 i. This extends to general
(entangled) vectors by linearity since general vectors are always linear combinations of
product vectors (e.g. of the product basis vectors |e_i⟩|f_j⟩). More explicitly (and here using the summation convention for repeated indices), if |A⟩ = a_ij |e_i⟩|f_j⟩ and |B⟩ = b_ij |e_i⟩|f_j⟩ are two vectors in V ⊗ W then
$$\langle A|B\rangle = a_{ij}^*\,\langle e_i|\langle f_j|\;\big(\, b_{pq}\, |e_p\rangle|f_q\rangle\,\big) = a_{ij}^* b_{pq}\,\langle e_i|e_p\rangle\langle f_j|f_q\rangle = a_{ij}^* b_{ij}$$
where we have used the basis orthonormality relations ⟨e_i|e_p⟩ = δ_ip and ⟨f_j|f_q⟩ = δ_jq. Thus inner products are calculated by contracting indices on the vector components (relative to an orthonormal product basis), with a complex conjugation for the left vector. Later (preceding (QM4) below) we will introduce a useful 'partial inner product' construction for |A⟩ ∈ ⊗^m V2 and |B⟩ ∈ ⊗^n V2 with m ≤ n, in which the (complex conjugated) components of |A⟩ are contracted against a specified set of m indices of |B⟩ to give a vector in ⊗^(n−m) V2.
We often write the bra vector of a product ket |α⟩|β⟩ ∈ V ⊗ W as ⟨β|⟨α| ("reflecting" the symbols) with the order of the spaces reversed. But however we write it, it will generally be important to remain aware of which vector space corresponds to which slot. If needed, we can make this explicit by using subscripts to denote the names of the spaces, e.g. writing the bra vector of |α⟩_V |β⟩_W as _W⟨β| _V⟨α| or _V⟨α| _W⟨β|.
Quantum principles (QM1) and (QM2)
Our description of quantum mechanics below may at first sight look a little different from
standard textbook presentations but in fact it’s equivalent. Here we focus on quantum
mechanics of physical systems with finite dimensional state spaces (multi-qubit systems,
cf below) and unitary matrices representing finite time evolutions, whereas quantum
physics textbooks traditionally begin with the infinite dimensional case viz. wavefunc-
tions, and Schrödinger’s wave equation giving infinitesimal time evolution via a Hamilto-
nian. We will also emphasise ab initio the quantum measurement formalism, which will
be of crucial significance for us.
(QM1) (physical states): the states of any (isolated) physical system are represented
by unit vectors in a complex vector space with an inner product.
By slight abuse of terminology we will often say that “a system has state space V (of
some dimension d)” when its states are the unit vectors in the vector space V .
The simplest non-trivial quantum system has a 2 dimensional vector space. Choosing
a pair of orthonormal vectors and labelling them |0i and |1i, the general state can be
written
$$|\psi\rangle = a|0\rangle + b|1\rangle \qquad |a|^2 + |b|^2 = 1.$$
We say that |ψi is a superposition of states |0i and |1i with amplitudes a and b.
Qubits: any quantum system, with a 2 dimensional state space and a chosen orthonormal
basis (which we write as {|0i , |1i}) is called a qubit. The basis states |0i , |1i are called
computational basis states or standard basis states. They will be used to represent the
two corresponding classical bit values as qubit states, and then general qubit states can
be thought of as superpositions of the classical bit values (cf later for how we can think of
a superposition as a kind of “simultaneous or parallel physical existence” of bit values 0
and 1). There are many real physical systems that can embody the structure of a qubit,
for example the spin of an electron, the polarisation of a photon, superpositions of two
selected energy levels in an atom etc.
Example. For a single qubit, the orthonormal states |0i and |1i give a quantum repre-
sentation of the classical bit values 0 and 1. Another pair of orthonormal states that we
will frequently encounter in applications is the following pair, labelled by plus and minus
signs:
$$|+\rangle = \tfrac{1}{\sqrt 2}(|0\rangle + |1\rangle) \qquad |-\rangle = \tfrac{1}{\sqrt 2}(|0\rangle - |1\rangle).$$
They are “equally weighted superpositions” in the sense that the squared amplitudes of
0 and 1 are equal in each state. The basis {|+i , |−i} is called the conjugate basis (and
the states themselves are called the conjugate basis states).
(QM2) (composite systems): if system S1 has state space V and system S2 has state
space W then the joint system obtained by taking S1 and S2 together, has states given
by arbitrary unit vectors in the tensor product space V ⊗ W .
Fundamental example: (n qubits) A system comprising n qubits thus has state space ⊗^n V2 of dimension 2^n. An n-qubit state |ψ⟩ is called a product state if it is the product of n single-qubit states |ψ⟩ = |v_1⟩|v_2⟩…|v_n⟩, and |ψ⟩ is called entangled if it is not a product state.
As we've mentioned previously, the computational basis or standard basis for n qubits is given by the tensor products of |0⟩'s and |1⟩'s in each slot, giving the 2^n orthonormal vectors |i_1⟩|i_2⟩…|i_n⟩ where each of i_1, …, i_n is 0 or 1. Thus the basis vectors are labelled by n-bit strings and we often write |i_1⟩|i_2⟩…|i_n⟩ simply as |i_1 i_2…i_n⟩.
We note the significant fact that as the number of qubits grows linearly, the full state
description (given as the full list of amplitudes) grows exponentially in its complexity.
However the description of any product state grows only linearly with n (each successive
|vi i is described by two further amplitudes) so this exponential complexity of state de-
scription is intimately related to the phenomenon of entanglement that arises for tensor
products of spaces. With this in mind, it is especially interesting to contrast (QM2) with
its classical counterpart – for classical physics, the state space of a composite system is
the cartesian product of the state spaces of the constituent parts. Thus if classical sys-
tem S requires K parameters for its state description then a composite of n such systems
will require only nK parameters i.e. a linear growth of description, in contrast to the
exponential growth for quantum systems.
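The contrast can be made concrete with a little arithmetic (the counts below are simply the numbers of complex amplitudes in each description):

```python
# 2^n amplitudes for a general n-qubit state versus 2n for a product state |v1>...|vn>
for n in (10, 50, 100):
    print(f"n = {n:3d}: general state: {2 ** n} amplitudes, product state: {2 * n}")
```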
Dirac notation: linear maps
To illustrate the notation and formalism, we’ll consider the case of linear maps on V2 and
its tensor powers. With |v⟩ = a|0⟩ + b|1⟩ and |w⟩ = c|0⟩ + d|1⟩ in V2, standard matrix multiplication for the "ket-bra" product gives
$$M = |v\rangle\langle w| = \begin{pmatrix} a\\ b\end{pmatrix}(c^*\ \ d^*) = \begin{pmatrix} ac^* & ad^*\\ bc^* & bd^*\end{pmatrix}. \qquad (1)$$
Projection operators
An important special case of eq. (1) is when |vi = |wi and |vi is normalised (i.e.
hv|vi = 1). Then Πv = |vi hv| is the operator of projection onto |vi, satisfying Πv Πv = Πv .
The latter property can be seen very neatly in Dirac notation: Πv Πv = (|vi hv|)(|vi hv|) =
|vi hv|vi hv| = |vi hv| = Πv as hv|vi = 1. If |ai is any vector orthogonal to |vi then
Πv |ai = |vi hv|ai = 0. It then easily follows that for any vector |xi, Πv |xi is the vector
obtained by projection of |xi into the one dimensional subspace spanned by |vi. Similarly
for any vector space W (of any dimension), if |wi is any normalised vector in W , then
Πw = |wi hw| is the linear operation of projection into the one-dimensional subspace
spanned by |wi.
More generally, if E is any linear subspace of a vector space V and {|e1 i , . . . , |ed i} is any
orthonormal basis of E (which thus has dimension d), then ΠE = |e1 i he1 | + . . . + |ed i hed |
is the operator of projection into E. This property is easily checked by extending the
given basis of E to a full orthonormal basis of the whole space V . Then by writing any
vector |ψi in V in terms of this basis we readily see that ΠE |ψi is indeed its projection
into E.
Finally we point out a possible notational confusion: if |xi = A |vi then the corresponding
bra vector is given by hx| = (A |vi)† = |vi† A† = hv| A† . This follows from the fact that
taking adjoints of matrix products reverses the product order (M N )† = N † M † . Thus for
example in the bra-ket inner product construction we can write ha| M |bi as ha|xi or as
hy|bi where |xi = M |bi but |yi = M † |ai (so hy| = ha| M ) i.e. the central M in ha| M |bi
acts as M if viewed as acting to the right, but acts as M † if viewed as acting to the left
i.e. on the ket |ai before it is turned into a bra vector.
Tensor products of maps
If
$$A = \begin{pmatrix} a & b\\ c & d\end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} p & q\\ r & s\end{pmatrix}$$
are linear maps on V2, then the tensor product of maps A ⊗ B : V2 ⊗ V2 → V2 ⊗ V2 is defined by its action on the basis: |i⟩|j⟩ → A|i⟩ ⊗ B|j⟩ for i and j being 0, 1. Extending this linearly defines A ⊗ B on general vectors in V2 ⊗ V2. In particular for product vectors we get (A ⊗ B)(|v⟩|w⟩) = A|v⟩ ⊗ B|w⟩.
The 4 × 4 matrix of components of A ⊗ B has a simple block form, as can be seen by writing down its action on basis states in components (giving the columns of the matrix of A ⊗ B). We get the following pattern (similar to our previous pattern for components of tensor products of vectors):
$$A\otimes B = \begin{pmatrix} aB & bB\\ cB & dB\end{pmatrix} = \begin{pmatrix} ap & aq & bp & bq\\ ar & as & br & bs\\ cp & cq & dp & dq\\ cr & cs & dr & ds\end{pmatrix}.$$
Important special cases of tensor product maps are A ⊗ I and I ⊗ A, being the action
of A on the first (resp. second) component space of V2 ⊗ V2 , leaving the other space
“unaffected”.
Example: for |ψ⟩ = |00⟩ + |11⟩ and A as above, we have
$$(A\otimes I)|\psi\rangle = (A|0\rangle)|0\rangle + (A|1\rangle)|1\rangle = (a|0\rangle + c|1\rangle)|0\rangle + (b|0\rangle + d|1\rangle)|1\rangle = a|00\rangle + b|01\rangle + c|10\rangle + d|11\rangle.$$
On the other hand (I ⊗ A)|ψ⟩ = |0⟩(A|0⟩) + |1⟩(A|1⟩), giving a|00⟩ + c|01⟩ + b|10⟩ + d|11⟩, which is different.
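Both the block pattern and the example above can be reproduced with numpy's kron (a small sketch with sample matrix entries):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])                  # sample entries a, b, c, d
B = np.array([[5, 6],
              [7, 8]])                  # sample entries p, q, r, s
I = np.eye(2)

print(np.kron(A, B))                    # the block pattern [[aB, bB], [cB, dB]]

psi = np.array([1, 0, 0, 1])            # |00> + |11> (unnormalised)
print(np.kron(A, I) @ psi)              # [a, b, c, d]: a|00> + b|01> + c|10> + d|11>
print(np.kron(I, A) @ psi)              # [a, c, b, d]: a|00> + c|01> + b|10> + d|11>
```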
Quantum principle (QM3)
(QM3) (physical evolution of quantum systems): any physical (finite time) evo-
lution of an (isolated) quantum system is represented by a unitary operation on the
corresponding vector space of states.
Recall that a linear operation U on any vector space is unitary if its matrix has U −1 = U †
(where dagger is conjugate transpose). We have the following equivalent characterisations
(useful for recognising unitary operations). U is unitary:
iff U maps an orthonormal basis to an orthonormal set of vectors;
iff the columns (or rows) of the matrix of U form an orthonormal set of vectors.
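These characterisations are easy to check numerically; for instance, for the Hadamard matrix H introduced in section 2.3 below (a minimal numpy sketch):

```python
import numpy as np

H = np.array([[1, 1],
              [1, -1]]) / np.sqrt(2)

print(np.allclose(H.conj().T @ H, np.eye(2)))       # U†U = I
print(np.isclose(np.linalg.norm(H[:, 0]), 1.0),     # columns have unit length ...
      np.isclose(np.vdot(H[:, 0], H[:, 1]), 0.0))   # ... and are mutually orthogonal
```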
Below (after (QM4)) we will introduce a number of particular unitary operations on one
and two qubits that will be frequently used.
Dirac notation: partial inner products for vectors in V ⊗ W
For expressing our final quantum principle (QM4) it will be useful to introduce a ‘partial
inner product’ operation on tensor product spaces. Any ket |vi ∈ V defines a linear map
V ⊗ W → W which we call “partial inner product with |vi”. It is defined on the basis of
V ⊗ W by the formula |ei i |fj i → hv|ei i |fj i, and on general vectors in V ⊗ W by linear
extension of its basis action. Similarly for any |wi ∈ W we get a partial inner product
mapping V ⊗ W to V . If V and W are instances of the same space (e.g. we will often
have them both being V2 ) then it is important to specify (e.g. with a subscript label
on the kets) which of the two spaces is supporting the bra-ket construction of the inner
product.
Example. For |v⟩ ∈ V and |ξ⟩ ∈ V ⊗ V we can form the partial inner product on the first or the second space. To make the position explicit we'll introduce subscripts to label the slots, writing V ⊗ V as V(1) ⊗ V(2), and writing 1⟨v|ξ⟩12 ∈ V(2) for the partial inner product on the first slot, and 2⟨v|ξ⟩12 ∈ V(1) for the partial inner product on the second slot.
Thus for example, if V = V2 and |ξ⟩ = a|00⟩ + b|01⟩ + c|10⟩ + d|11⟩ then the orthonormality relations ⟨i|j⟩ = δ_ij give 1⟨0|ξ⟩12 = a|0⟩ + b|1⟩ and 2⟨0|ξ⟩12 = a|0⟩ + c|1⟩, i.e. we just pick out the terms of |ξ⟩ that contain 0 in the first, respectively second, slot.
The partial inner product operation is in fact just the familiar operation of contraction of tensor indices (with a complex conjugation). In index notation (with components always relative to an orthonormal product basis), if |ξ⟩ ∈ V ⊗ V and |v⟩ ∈ V have components ξ_ij and v_k respectively, then the partial inner products of |v⟩ with |ξ⟩ on the two slots are respectively v_i* ξ_ij and v_j* ξ_ij (with the summation convention applied, i.e. repeated indices are summed).
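In components the partial inner product is therefore a single index contraction, which can be carried out, for example, with numpy's einsum (a sketch reproducing the example above):

```python
import numpy as np

a, b, c, d = 0.1, 0.2, 0.3, 0.4
xi = np.array([[a, b],
               [c, d]])              # components xi[i, j] of |xi> = a|00> + b|01> + c|10> + d|11>
v = np.array([1.0, 0.0])             # |v> = |0>

print(np.einsum('i,ij->j', v.conj(), xi))   # contraction on slot 1: [a, b], i.e. a|0> + b|1>
print(np.einsum('j,ij->i', v.conj(), xi))   # contraction on slot 2: [a, c], i.e. a|0> + c|1>
```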
2.2 Quantum measurements
The way in which information can be extracted from a quantum state is fundamentally different from the classical case – quantum measurements generally have only probabilistic outcomes, they are
“invasive”, generally unavoidably corrupting the input state, and they reveal only a
rather small amount of information about the (now irrevocably corrupted) input state
identity. Furthermore the (probabilistic) change of state in a quantum measurement is
(unlike normal time evolution) not a unitary process. Here we outline the associated
mathematical formalism, which is at least, easy to apply.
The basic Born rule – complete projective (or von Neumann) measurements
Suppose we are given a (single physical instance of a) quantum state for a system with state space V of dimension n. Let B = {|e_1⟩, …, |e_n⟩} be any orthonormal basis of V and write |ψ⟩ = Σ_i a_i|e_i⟩. Then we can make a quantum measurement of |ψ⟩ relative to the basis B. This is sometimes called a (complete) von Neumann measurement or projective measurement. The possible outcomes are j = 1, …, n, corresponding to the basis states |e_j⟩. The probability of obtaining outcome j is
$$\mathrm{pr}(j) = |\langle e_j|\psi\rangle|^2 = |a_j|^2.$$
If outcome j is seen then after the measurement the state is no longer |ψi but has been
“collapsed” to |ψafter i = |ej i i.e. the basis state corresponding to the seen outcome.
Stated alternatively: the probability is the squared length of the projection of |ψi onto
the basis state, and the post-measurement state is that projected vector, renormalised
to have length 1. Since |ψafter i = |ej i corresponding to the seen outcome j, if we were to
apply the measurement again we will simply see the same j with certainty, and not be
able to sample the probability distribution pi = |ai |2 again.
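As a small illustration of the rule, one might simulate a complete projective measurement numerically along the following lines (a sketch; the basis and sample state are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def measure(psi, basis):
    """Complete projective measurement of |psi> relative to an orthonormal basis.
    Returns the outcome index j and the collapsed post-measurement state |e_j>."""
    amps = np.array([np.vdot(e, psi) for e in basis])   # <e_j|psi>
    probs = np.abs(amps) ** 2                           # Born rule probabilities |a_j|^2
    j = rng.choice(len(basis), p=probs / probs.sum())
    return j, basis[j]

psi = np.array([np.sqrt(0.25), np.sqrt(0.75)])          # outcome 0 with prob 1/4, outcome 1 with prob 3/4
standard_basis = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(measure(psi, standard_basis))
```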
The qualifier “complete” in “complete projective measurement” refers to the fact that
the projections here are into one-dimensional orthogonal subspaces (defined by the or-
thonormal basis). The notion of incomplete projective measurement generalises this to
arbitrary decompositions of the state space into orthogonal subspaces (of arbitrary di-
mension, summing to the dimension of the full space).
Incomplete projective measurements
Let {E_1, …, E_d} be any decomposition of the state space V into d mutually orthogonal subspaces, i.e. V is the direct sum E_1 ⊕ … ⊕ E_d. Let Π_i be the operation of projection into E_i. Thus Π_iΠ_i = Π_i (property of any projection operator) and by orthogonality we have Π_iΠ_j = 0 for all i ≠ j. Then the incomplete measurement of any state |ψ⟩ relative to the orthogonal decomposition {E_1, …, E_d} is the following quantum operation: the measurement outcomes i = 1, …, d have probabilities given by the squared length of the projection of |ψ⟩ into E_i:
$$\mathrm{prob}(i) = \langle\psi|\Pi_i\Pi_i|\psi\rangle = \langle\psi|\Pi_i|\psi\rangle$$
and the post-measurement state |ψ_i⟩ for outcome i is the ("collapsed") projected vector renormalised to unit length:
$$|\psi_i\rangle = \Pi_i|\psi\rangle / \sqrt{\mathrm{prob}(i)}.$$
A complete projective measurement is thus clearly a special case in which all the sub-
spaces have dimension one. Any incomplete measurement (with orthogonal decomposi-
tion {E1 , . . . , Ed }) can be refined to a complete one by choosing an orthonormal basis of
the state space that is consistent with the Ei ’s i.e. each Ei is spanned by a subset of the
basis vectors. Then by performing this complete measurement (instead of the incomplete
one) we can recover the output probabilities of the incomplete measurement outcomes by
summing all the probabilities corresponding to basis vectors in each subspace Ei . How-
ever the post-measurement states will be different for the incomplete measurement and
its refinement.
Example. (parity measurement). The parity of a 2-bit string b1 b2 is the mod 2 sum
b1 ⊕b2 . The parity measurement on two qubits is the incomplete measurement on the four
dimensional state space with two outcomes (labelled 0 and 1), which on the computational
basis states corresponds to the parity of the state label. Thus the corresponding orthog-
onal decomposition is E_0 = span{|00⟩, |11⟩} and E_1 = span{|01⟩, |10⟩}. Upon measurement, the state |ψ⟩ = a|00⟩ + b|01⟩ + c|10⟩ + d|11⟩ will give outcome 0 with probability p_0 = |a|² + |d|², and the post-measurement state would then be |ψ_0⟩ = (a|00⟩ + d|11⟩)/√p_0.
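The same numbers can be obtained directly from the projectors onto E_0 and E_1 (a minimal sketch; the sample amplitudes are arbitrary):

```python
import numpy as np

# Computational basis order |00>, |01>, |10>, |11>
P0 = np.diag([1.0, 0.0, 0.0, 1.0])        # projector onto E0 = span{|00>, |11>}
P1 = np.diag([0.0, 1.0, 1.0, 0.0])        # projector onto E1 = span{|01>, |10>}

psi = np.array([0.1, 0.2, 0.3, np.sqrt(1 - 0.14)])   # a, b, c, d with unit norm

p0 = np.vdot(psi, P0 @ psi).real           # = |a|^2 + |d|^2
post0 = (P0 @ psi) / np.sqrt(p0)           # (a|00> + d|11>)/sqrt(p0)
print(p0, post0)
```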
Measurement of “quantum observables”
In textbooks we often read of measurement of a quantum observable O which is just a
slight variation of the above notion of complete or incomplete projective measurement.
A quantum observable is defined to be a Hermitian operator O on V . Recall that a Her-
mitian matrix always has real eigenvalues λi and the corresponding eigenspaces Λi (with
dimension given by the multiplicity of the corresponding eigenvalue) give an orthogonal
decomposition of V . The measurement of the quantum observable O is then just the
incomplete measurement relative to that orthogonal eigenspace decomposition of V and
in which we label the outcomes by the corresponding eigenvalue λi (rather than just i or
some other label). If the eigenvalues of O are all non-degenerate, then the measurement
will be a complete projective measurement. For the purposes of obtaining information
about the state identity, the actual choice of naming of the distinct outcomes is of no
real consequence, so we prefer to base our notion of quantum measurement on the under-
lying orthogonal decomposition of the state space rather than the hermitian observable
itself. However, for other purposes the physical observable is important: physical imple-
mentation of a measurement involves a physical interaction between the system and a
“measuring apparatus” and if for example, the basic |0i and |1i states of our qubit phys-
ically are spin-Z eigenstates or photon polarisations or two chosen energy levels
in a Calcium atom (with corresponding quantum observables being spin, polarisation or
energy respectively), this knowledge will have a crucial effect on how the measurement
interaction for a standard basis measurement is actually implemented.
The extended Born rule
We will often consider measurement of only some part of a composite system, which
is in fact just a particular kind of incomplete measurement. The associated formalism
for probabilities and post-measurement states is called the extended Born rule and we
give an explicit description here (as it will be often used). Suppose |ψi is a quantum
state of a composite system S1S2 with state space V ⊗ W. Let B = {|e_1⟩, …, |e_n⟩} be an orthonormal basis of V. Note that |ψ⟩ can be expanded uniquely as |ψ⟩ = Σ_i |e_i⟩|ξ_i⟩ with the |ξ_i⟩ being vectors in W (not generally normalised nor orthogonal). Indeed orthonormality of the basis gives the |ξ_k⟩'s as the partial inner products |ξ_k⟩ = ⟨e_k|ψ⟩. Alternatively if {|f_1⟩, …, |f_m⟩} is a basis of W then, writing |ψ⟩ = Σ_{ij} a_ij |e_i⟩|f_j⟩ in the product basis of V ⊗ W, we see that |ξ_k⟩ = Σ_j a_kj |f_j⟩, i.e. we just pick out all terms of |ψ⟩ that involve |e_k⟩ in the V-slot.
Now we can make a measurement of |ψ⟩ ∈ V ⊗ W relative to the basis B of V. This amounts to an incomplete measurement in V ⊗ W relative to its decomposition into mutually orthogonal subspaces E_i given by E_i = span{|e_i⟩ ⊗ |w⟩ : |w⟩ ∈ W}. The
extended Born rule asserts the following:
(a) the probability of outcome k = 1, . . . , n is pr(k) = hξk |ξk i i.e. the squared length of
the partial inner product hek |ψi;
(b) if the outcome k is seen then the post-measurement state is the product state
$$|\psi_{\rm after}\rangle = |e_k\rangle|\xi_k\rangle / \sqrt{\mathrm{pr}(k)}.$$
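Concretely, writing the joint state as a matrix of coefficients a_kj over the product basis, the rule amounts to the following small computation (a numpy sketch for two qubits with arbitrary sample amplitudes):

```python
import numpy as np

# a[k, j] are the coefficients of |psi> = sum_kj a_kj |e_k>|f_j>; here V = W = V2
a = np.array([[0.5, 0.5],
              [0.5, -0.5]])               # |psi> = (|00> + |01> + |10> - |11>)/2

for k in range(2):
    xi_k = a[k, :]                         # partial inner product <e_k|psi>, a vector in W
    p_k = np.vdot(xi_k, xi_k).real         # pr(k) = <xi_k|xi_k>
    print(k, p_k, xi_k / np.sqrt(p_k))     # outcome, probability, collapsed state of W
```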
A fundamental consequence of these measurement rules is that any quantum measurement can distinguish with certainty only states that lie in orthogonal subspaces of the state space. Consequently two states are reliably physi-
cally distinguishable iff the corresponding vectors are orthogonal. Here distinguishability
means that there is a measurement which respectively outputs two distinct results, say
0 or 1, with certainty when applied respectively to the two states. We will explore conse-
quences of this important non-classical feature much more later! – but we emphasise here
that in contrast, in classical physics any two different states of a system are in principle
distinguishable.
Remark
If |ψi is an n-dimensional state then for all the measurements we have described, there
are at most n outcomes. But we can associate properties to |ψi with more than n
outcomes by enlarging the space as follows. We introduce a second quantum system
(called an ancilla) of any dimension m, in some fixed state |Ai independent of |ψi. Then
performing a complete projective measurement on the joint system |ψi |Ai we will obtain
an outcome depending only on |ψi and having mn possible values. Such measurements
have the curious physical feature that generally there will be no states |ψi having any
one of the outcomes with certainty. (why?)
Remark (optional)
In the most general form of quantum measurement theory there is a notion of measure-
ment that is more general than the projective measurements we have described. It is
called a positive operator valued measurement (or POVM) and it can have any number
of outcomes. We will not describe the formalism of POVMs in this course but just say
that it can be shown (the so-called Stinespring dilation theorem) that any such POVM
is equivalent to a projective measurement in an enlarged space, constructed exactly as in
the previous remark, and as such, does not provide any genuinely new features that are
not already accessible by projective measurements.
Remark (optional): global and relative phases.
If |v⟩ is any unit vector then the states |v⟩ and e^{iα}|v⟩ will have the same measurement probabilities (for any basis or orthogonal decomposition), independent of α (since probabilities always depend on squared moduli of amplitudes). Also under unitary (hence linear) evolution the phase e^{iα} just persists unchanged as a scalar multiplier. Here α is called a global phase. Thus |v⟩ and e^{iα}|v⟩ represent identical physical situations and in (QM1) we should (more correctly) have said that states of a physical system correspond to unit vectors up to an (irrelevant) global phase. Note also that the projection operator Π_v = |v⟩⟨v| is independent of the choice of global phase for |v⟩ and hence it can also be used to represent distinct physical states uniquely (without the global phase ambiguity).
On the other hand θ in |0⟩ + e^{iθ}|1⟩ is called a relative phase and it is a crucially important
parameter for the qubit state. Indeed for example, we can think of any unitary operation
as evolving |0i and |1i separately and combining the results with relative phase θ which
will affect the way that the two terms interfere (cf below). A notable illustrative example
is the pair of states |+⟩ = (1/√2)(|0⟩ + |1⟩) and |−⟩ = (1/√2)(|0⟩ − |1⟩). These differ only by a
relative phase of π but they’re easily seen to be orthogonal, so can be distinguished with
certainty by a suitable measurement.
2.3 Some basic unitary operations for qubits
Unitary operations on qubits are also called quantum gates. Matrices given below are
always relative to the standard basis |0i , |1i. The following notations for commonly
occurring gates will be used throughout the course, and you should memorise them.
One-qubit gates
Hadamard gate:
$$H = \frac{1}{\sqrt 2}\begin{pmatrix} 1 & 1\\ 1 & -1\end{pmatrix}.$$
Pauli gates X = σ_x, Y = σ_y, Z = σ_z:
$$X = \begin{pmatrix} 0 & 1\\ 1 & 0\end{pmatrix} \qquad Y = \begin{pmatrix} 0 & -i\\ i & 0\end{pmatrix} \qquad Z = \begin{pmatrix} 1 & 0\\ 0 & -1\end{pmatrix}.$$
X is sometimes also called the (quantum) NOT-gate because it interchanges the kets |0⟩ and |1⟩, i.e. it effects the classical NOT operation on the label.
Note that {|+⟩, |−⟩} is the X-eigenbasis and {|0⟩, |1⟩} is the Z-eigenbasis (in each case corresponding to eigenvalues +1 and −1 respectively). We also have the formulas
$$X^2 = Y^2 = Z^2 = I \qquad\text{and}\qquad XY = iZ, \quad YZ = iX, \quad ZX = iY$$
(noting the cyclic shift of x, y, z labels in the latter set). Note that the matrices I, σ_x, σ_y, σ_z are all Hermitian as well as unitary (which is an unusual coincidence). Finally we have the
Phase gate:
$$P_\theta = \begin{pmatrix} 1 & 0\\ 0 & e^{i\theta}\end{pmatrix}.$$
Two-qubit gates
Controlled-X (or controlled-NOT) gate:
$$CX = \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0\end{pmatrix} = \begin{pmatrix} I & 0\\ 0 & X\end{pmatrix}.$$
For the four basis states we can compactly write CX|i⟩|j⟩ = |i⟩|i ⊕ j⟩ where ⊕ denotes addition modulo 2. Note that for any 1-qubit state |α⟩ we have
$$CX\, |0\rangle|\alpha\rangle = |0\rangle|\alpha\rangle \qquad CX\, |1\rangle|\alpha\rangle = |1\rangle\,(X|\alpha\rangle)$$
i.e. CX applies X to the second qubit if the first is set to "1" and acts as the identity if the first is set to "0" (and extends by linearity if the first qubit is in a superposition of the two values etc.). Accordingly the first qubit is called the control qubit and the second is called the target qubit. Note that we get a different 2-qubit gate if we interchange the qubit roles of control and target. Thus if there is any ambiguity as to which qubit is to be the control and which the target, we introduce labels (say 1 and 2) for the two-qubit system, writing CX_12 or CX_21 with the first subscript always denoting the control qubit and the second the target qubit. Thus for example CX_12 |0⟩_1|1⟩_2 = |0⟩_1|1⟩_2 whereas CX_21 |0⟩_1|1⟩_2 = |1⟩_1|1⟩_2.
The controlled-Z gate:
$$CZ = \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & -1\end{pmatrix} = \begin{pmatrix} I & 0\\ 0 & Z\end{pmatrix}$$
i.e. as for CX, CZ applies Z to the second qubit controlled by the state of the first qubit.
Note that despite this asymmetrical description, CZ (unlike CX) is actually symmetric
in its action on the two qubits.
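Both controlled gates can be written as |0⟩⟨0| ⊗ I + |1⟩⟨1| ⊗ U (control on the first qubit), which makes the asymmetry of CX and the symmetry of CZ easy to verify numerically (a numpy sketch):

```python
import numpy as np

I = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])    # |0><0| and |1><1|

CX12 = np.kron(P0, I) + np.kron(P1, X)               # control qubit 1, target qubit 2
CX21 = np.kron(I, P0) + np.kron(X, P1)               # control qubit 2, target qubit 1
CZ = np.kron(P0, I) + np.kron(P1, Z)

print(np.allclose(CX12, CX21))                            # False: swapping roles gives a different gate
print(np.allclose(CZ, np.kron(I, P0) + np.kron(Z, P1)))   # True: CZ is symmetric in the two qubits
```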
Example. Although quantum operations on computational basis states generally do not correspond to just Boolean functions on their labels (e.g. the Z and H gates), they can serve to reveal new relationships between operations that do (like the CNOT operation), with these relations not being expressible within the formalism of Boolean algebra. For example we saw that CNOT_12 and CNOT_21 were different operations, corresponding to two different Boolean functions of the basis state labels. We can now verify that
$$(H\otimes H)\, CX_{12}\, (H\otimes H) = CX_{21}$$
i.e. we can reverse the control/target roles of the two bits by applying H to each bit vector both before and after the CNOT action. This relation is not possible to achieve with any classical 1-bit Boolean operations before and after the CNOTs.
The validity of this relation can be readily checked e.g. by computing the actions of the
gates on each basis state in turn, and we omit the details here (which you can easily
provide).
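For completeness, here is one way to carry out that check by brute force (a numpy sketch comparing the two 4 × 4 matrices):

```python
import numpy as np

I = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])

CX12 = np.kron(P0, I) + np.kron(P1, X)
CX21 = np.kron(I, P0) + np.kron(X, P1)
HH = np.kron(H, H)

print(np.allclose(HH @ CX12 @ HH, CX21))   # True
```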
2.4 An aside: Superposition and quantum interference
Recall the Hadamard gate H, which maps |0⟩ to (1/√2)(|0⟩ + |1⟩) and |1⟩ to (1/√2)(|0⟩ − |1⟩). H is also its own inverse, so H applied to |ψ_0⟩ = (1/√2)(|0⟩ + |1⟩) gives |0⟩. Hence if we prepare a qubit in state |ψ_0⟩, apply H and then measure it, we will see outcome 0 with certainty. On the other hand if we prepare either |0⟩ or |1⟩ and conduct the same process, then in each case we will see outcomes 0 and 1 with probability half each, i.e. the equal probabilistic mixture of |0⟩ and |1⟩ will behave differently from the equal superposition state. In an intuitive sense we think of |ψ_0⟩ = (1/√2)(|0⟩ + |1⟩) as "simultaneously having both |0⟩ and |1⟩ present" rather than just a probabilistic choice of one or the other. When we apply H to |ψ_0⟩, the |0⟩ term (starting with amplitude a = 1/√2) is transformed to |0⟩ with amplitude c = 1/√2 and to |1⟩ with amplitude c, whereas the |1⟩ term (starting with amplitude a = 1/√2 also) is transformed to |0⟩ with amplitude c and to |1⟩ with amplitude −c. Thus the total amplitude to go from |ψ_0⟩ to |0⟩ is made up of two "paths": ac + ac = 1, which add. On the other hand the total amplitude to go to |1⟩ also has two contributions but these cancel: ac + a(−c) = 0. In the first case we say that the two paths interfere constructively whereas in the latter case we say that they interfere destructively.
These notions of “interfering paths” form the basis of Feynman’s sum-over-paths
description of quantum mechanics which we now briefly outline. To illustrate the
formalism consider the simple quantum process of applying H twice to |0i:
Applying H to |0⟩ gives (1/√2)(|0⟩ + |1⟩); applying H again to each term gives
$$\tfrac{1}{\sqrt 2}\Big(\tfrac{1}{\sqrt 2}(|0\rangle + |1\rangle)\Big) + \tfrac{1}{\sqrt 2}\Big(\tfrac{1}{\sqrt 2}(|0\rangle - |1\rangle)\Big) = |0\rangle.$$
We think of this as a process of "transitions between basis states |0⟩ and |1⟩ with prescribed amplitudes", and we depict it as a branching tree, just like a probabilistic process but with the branches labelled by amplitudes, not probabilities:
The tree has root |0⟩. At the first step (the first H) there are branches from |0⟩ to |0⟩ and to |1⟩, each labelled by amplitude 1/√2. At the second step (the second H), from |0⟩ there are branches to |0⟩ and to |1⟩ each with amplitude 1/√2, while from |1⟩ there are branches to |0⟩ with amplitude 1/√2 and to |1⟩ with amplitude −1/√2.
The rules for accumulating amplitudes are just like those for probabilities:
(i) each path has an amplitude given by the product of numbers along the path;
(ii) each final basis state has an amplitude given by the sum over all paths from the
start to it;
(iii) the probability for the transition from initial |0i to a final basis state is the modulus
square of the sum over all paths to it.
This is the Feynman sum-over-paths formulation of quantum processes. It turns out to
be an alternative equivalent description of the calculations involved in multiplying gate
matrices and applying the Born rule for a measurement on the final state of the process.
As an illustration, in our example above there are two paths from initial |0⟩ to final |0⟩ and two paths to final |1⟩. For final |0⟩ the two paths interfere constructively: |(1/√2)(1/√2) + (1/√2)(1/√2)|² = 1, whereas for final |1⟩ the two paths interfere totally destructively: |(1/√2)(1/√2) − (1/√2)(1/√2)|² = 0, and this transition is forbidden.
For probabilistic trees we can simulate the whole process by walking through some single
path of the tree so long as we make an appropriate probabilistic choice of direction at each
node along the way. But for quantum amplitude trees this is not possible! In the above
example there are individual paths with non-zero amplitude from initial |0⟩ to final |1⟩, yet this transition is forbidden. The process "needs to know about" all paths and how they interfere!
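The two descriptions can be compared in a small simulation: summing the branch amplitudes over all paths reproduces exactly what multiplying the gate matrices gives (a sketch for sequences of single-qubit gates):

```python
import numpy as np
from itertools import product

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)

def amplitude_by_paths(gates, start, end):
    """Sum, over all choices of intermediate basis states, of the product of branch amplitudes."""
    total = 0.0
    for mid in product(range(2), repeat=len(gates) - 1):
        path = (start,) + mid + (end,)
        amp = 1.0
        for k, U in enumerate(gates):
            amp *= U[path[k + 1], path[k]]      # branch amplitude <path[k+1]| U |path[k]>
        total += amp
    return total

print(amplitude_by_paths([H, H], 0, 0))         # 1.0: constructive interference
print(amplitude_by_paths([H, H], 0, 1))         # 0.0: destructive interference
print((H @ H)[:, 0])                            # same amplitudes from matrix multiplication
```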
Just as in the example above, any quantum process of applying a sequence of unitary
gates to an initial starting state of say, n qubits in state |0i . . . |0i can be represented as
a branching tree of amplitudes. The nodes are now labelled by n-bit strings organised
in columns, such that each column contains each n-bit string once. The quantum state
at the k th stage of the process corresponds to the nodes in the k th column of the tree.
The lines connecting column k to column k + 1 are labelled by the amplitudes of the k th
unitary gate acting on the corresponding basis state in the k th column. The Feynman
sum-over-paths rules then correctly reproduce the calculation of matrix multiplication of
the corresponding unitary matrices to determine the amplitudes in the final state.
An important feature of the n qubit process (with depth say of order n too) is that the
tree will generally have exponentially many nodes and exponentially many paths from the
root to any final node. Suppose we are given a description of the process i.e. the sequence
of operations to be applied, that generates the tree. If it were a classical probabilistic
branching tree, we would still be able to sample the output distribution quite easily – by
just “walking through the tree” once, along a single path, making probabilistic choices
at each node as we go. But for a tree of quantum amplitudes, because of interference as
above, to sample the output distribution by classical means (given the description of the
process), it does not suffice to choose some single pass through the tree – we need to know
about all (generally exponentially many!) paths from the root to the chosen final node,
in order to be able to sample its probability distribution. On the other hand the quantum
process itself perform only one single physical pass through the tree, albeit in quantum
superposition over all the nodes at each depth. In this sense the quantum physical
process achieves something that requires massive classical effort to mimic or simulate
i.e. sampling the output distribution by quantum physical means requires exponentially
less physical effort (physically just one quantum pass) than by classical means (needing
to know the results for exponentially many passes). This feature, expressed intuitively
here, is in fact a key effect for the power of quantum computing (compared to classical
computing) that we’ll be studying in the second half of the course.
3 Quantum states as information carriers
3.1 The no-cloning theorem
We now give our first tantalising property of quantum information which is not shared
by classical information: quantum information cannot be copied or ‘cloned’ !
The copying of classical information is a very familiar process (e.g photocopying) and it is
common to view the possibility of copying as obvious and unremarkable. In fundamental
terms, given some classical information represented in the state of a classical physical
system A (e.g. text ink and paper page) we can take another similar system B, initially
in some fixed (“blank”) state independent of A (blank sheet of paper). We can then
carry out a physical operation (operation of the photocopier) on the joint system AB to
reliably measure or “read” the state of A (without any corruption) and evolve the state
of B to that measured state, to achieve cloning.
Now let us consider similar ideas for quantum information to establish what we’d mean
by quantum copying or cloning. Our process will involve three quantum subsystems A,
B and M. A will contain our quantum information to be copied; B is a system (with state
space of same dimension as A) which should finally contain the copy that we seek; M
will represent any extra “machinery” or physical objects that are needed in the cloning
process (like the photocopy machine for classical copying).
Initially A will contain |ψi and B will contain some standard “blank” state denoted |0i;
M will also be in some fixed starting state, denoted |M0 i representing the initial “ready”
state of the machine and any materials that it may require. A crucially important point
here is that the initial state |0i |M0 i of BM should be independent of |ψi.
The cloning process will be a fixed quantum physical evolution of ABM achieving the following transformation:
$$|\psi\rangle_A\, |0\rangle_B\, |M_0\rangle_M \;\longrightarrow\; |\psi\rangle_A\, |\psi\rangle_B\, |M_\psi\rangle_M$$
i.e. the quantum information is copied into B (while also remaining intact in A) and the final state |M_ψ⟩ of M is allowed to be arbitrary and may even depend on |ψ⟩ (as indicated).
The cloning process may be required to work for all states |ψi of A or alternatively only
for some restricted subset of states.
No-cloning theorem. Let S be any set of states of A that contains at least one non-
orthogonal pair of states. Then no unitary cloning process exists that achieves cloning
for all states in S.
Remark.
The no-cloning theorem actually remains true for arbitrary prospective cloning processes,
not just unitary ones i.e. even if the further operations of (Ancilla) and (Measure) are
allowed to be included. If (Measure) is used then all probabilistic branches are required
to lead to perfect cloning. The use of these extra operations can in fact be reduced to
the fully unitary case, and in this course we will prove only the unitary case.
For (Ancilla) it is clear that we can just include any needed ancilla into the initial state
of M to be used when required.
The case of (Measure) is rather more complicated (optional!):
suppose that during a process we make a (generally incomplete) measurement with d
outcomes i = 1, . . . , d and subsequently apply unitary operations Ui that can even be
chosen adaptively to depend on the earlier measurement outcome. Then it can be shown
that this gives a final result that is totally equivalent to the following fully unitary sub-
stitute:
Let C be an extra ancilla system with d-dimensional state space and orthonormal basis
|ii labelled by the d measurement outcomes. Let Λi for i = 1, . . . , d be the orthogonal
subspaces of ABM corresponding to the outcomes of the (generally incomplete) mea-
surement. Let {|ki}k∈K be an orthonormal basis of ABM (labelled by elements of some
set K) that is consistent with the orthogonal decomposition into the Λi ’s i.e. each Λi is
spanned by a subset of the |ki’s. And for each k let i(k) denote the subspace label (i.e.
measurement outcome) to which the basis ket |ki belongs.
Let |c_0⟩ be any fixed chosen state of C and finally consider the mapping V on the states of CABM achieving the following. If Σ_k a_k|k⟩ is any state of ABM then
$$|c_0\rangle \sum_k a_k |k\rangle \;\longrightarrow\; \sum_k a_k\, |i(k)\rangle\, |k\rangle.$$
One can check that this mapping, depending only on the description of the measurement,
is unitary. In words, for any state of ABM written in the |ki basis, V attaches an extra
label (in the ancilla state) to each basis state corresponding to its corresponding mea-
surement outcome. We can view C physically as a pointer system for the measurement
and the state on RHS above represents the superposition of all measurement outcomes
(after collecting all terms with the same i(k) for each measurement outcome i) having
the associated measurement result attached as an extra label i.e. it represents the result
of the measurement before probabilistic collapse to some one outcome occurs.
Finally back to our task: to replace the measurement on ABM by a unitary process, we
adjoin the ancilla C and we replace the measurement by the unitary operation V above.
Subsequent unitary operations Ui on ABM that depended on the measurement outcome
are then also replaced by the single larger controlled unitary operation on CABM , the
“multiple controlled operation” (with C being a d-dimensional control register) that for
each i, applies U_i to ABM when C is in state |i⟩, with linear extension to general superpo-
sition states of CABM . Then we can readily check that if the original process (with the
intermediate measurement) achieved cloning (for all possible measurement outcomes)
then our new (now fully unitary) process will also achieve cloning, thus reducing the
intermediate measurement case to the unitary case.
Proof of the no-cloning theorem (for unitary processes)
Let |ξ⟩ and |η⟩ be two distinct non-orthogonal states in S. Then the cloning process must do both the following evolutions:
$$|\xi\rangle_A |0\rangle_B |M_0\rangle_M \longrightarrow |\xi\rangle_A |\xi\rangle_B |M_\xi\rangle_M$$
$$|\eta\rangle_A |0\rangle_B |M_0\rangle_M \longrightarrow |\eta\rangle_A |\eta\rangle_B |M_\eta\rangle_M$$
(for some possibly different states |M_ξ⟩ and |M_η⟩ of M). Now, any unitary process preserves inner products so the inner product of the two initial states must equal that of the two final states:
$$\langle\xi|\eta\rangle\,\langle 0|0\rangle\,\langle M_0|M_0\rangle = \langle\xi|\eta\rangle\,\langle\xi|\eta\rangle\,\langle M_\xi|M_\eta\rangle. \qquad (*)$$
Taking absolute values in eqn. (*) and using ⟨0|0⟩ = ⟨M_0|M_0⟩ = 1 we get
$$|\langle\xi|\eta\rangle| = |\langle\xi|\eta\rangle|^2\, |\langle M_\xi|M_\eta\rangle|.$$
Since |ξ⟩ ≠ |η⟩ and |ξ⟩ is not orthogonal to |η⟩ we have 0 < |⟨ξ|η⟩| < 1, and cancelling it gives
$$1 = |\langle\xi|\eta\rangle|\, |\langle M_\xi|M_\eta\rangle|$$
which is a contradiction since |⟨ξ|η⟩| < 1 and |⟨M_ξ|M_η⟩| ≤ 1.
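The key step – that a joint unitary evolution cannot change the overlap, while cloning must – can be illustrated numerically for the AB part of the state (the machine factor ⟨M_ξ|M_η⟩ can only reduce the right-hand side further):

```python
import numpy as np

xi = np.array([1.0, 0.0])                     # |xi> = |0>
eta = np.array([1.0, 1.0]) / np.sqrt(2)       # |eta> = |+>, non-orthogonal to |xi>

overlap_before = abs(np.vdot(xi, eta))                              # |<xi|eta>|
overlap_after = abs(np.vdot(np.kron(xi, xi), np.kron(eta, eta)))    # |<xi|eta>|^2
print(overlap_before, overlap_after)          # 0.707... vs 0.5, but unitarity forbids any change
```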
Example. (cloning and superluminal signalling)
The no-cloning theorem was proved in 1982 independently by D. Dieks and by W. Wootters & W. Zurek. The theorem also appears (at least implicitly) in earlier work of J. Park from 1970, but this went largely unnoticed until recently. The 1982 work arose in response to a proposal by N. Herbert for a method of superluminal (in fact instantaneous) communication using quantum methods. If Herbert's scheme were correct it would have cast serious doubt on quantum theory as an acceptable physical theory! Fortunately there was an error in Herbert's argument – he had taken for granted, without justification or discussion, that quantum states could be cloned!
Herbert's method was as follows. Suppose Alice and Bob are distantly separated and they share the entangled state
$$|\phi^+\rangle = \tfrac{1}{\sqrt 2}\big(|0\rangle_A|0\rangle_B + |1\rangle_A|1\rangle_B\big) = \tfrac{1}{\sqrt 2}\big(|+\rangle_A|+\rangle_B + |-\rangle_A|-\rangle_B\big).$$
Alice wants to communicate a yes/no decision instantaneously to Bob at noon. She does
the following.
Alice’s action:
at noon, for ‘yes’ she measures her qubit in the standard (i.e. Pauli Z) basis {|0i , |1i},
and for ‘no’ she measures her qubit in the X basis {|+i , |−i}.
According to the Born rule, after her ‘yes’ action, Bob’s qubit will be in state |0i or |1i
with 50/50 probability; and after her ‘no’ action, Bob’s qubit will be in state |+i or |−i
with 50/50 probability (as is immediately clear from the second formula for |φ+ i above).
Fact:
these ‘yes’ and ‘no’ preparations of Bob’s qubit are completely indistinguishable by any
local action (measurement) on Bob’s qubit i.e. they each give exactly the same probability
distribution of outcomes for any measurement (and also in fact the same as that for Alice
having done no action at noon!) Indeed if Π_i is the projection operator for outcome i of a measurement by Bob, then in the 'yes' case, his probability of seeing i is
$$\mathrm{prob}_{\rm yes}(i) = \tfrac12\langle 0|\Pi_i\Pi_i|0\rangle + \tfrac12\langle 1|\Pi_i\Pi_i|1\rangle = \mathrm{trace}\,\Big(\Pi_i\, \frac{|0\rangle\langle 0| + |1\rangle\langle 1|}{2}\Big)$$
(since Π_i² = Π_i and ⟨A|B⟩ = trace(|B⟩⟨A|)). On the other hand, in the 'no' case we similarly get
$$\mathrm{prob}_{\rm no}(i) = \mathrm{trace}\,\Big(\Pi_i\, \frac{|+\rangle\langle +| + |-\rangle\langle -|}{2}\Big).$$
Now (as can easily be checked by computing the 2 × 2 matrices of components) we have
$$\frac{|0\rangle\langle 0| + |1\rangle\langle 1|}{2} = \frac{|+\rangle\langle +| + |-\rangle\langle -|}{2} = \frac{I}{2}$$
so prob_yes(i) = prob_no(i) for every outcome of any measurement that Bob can perform.
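This operator identity is a one-line check in numpy:

```python
import numpy as np

ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
plus, minus = (ket0 + ket1) / np.sqrt(2), (ket0 - ket1) / np.sqrt(2)

mix_yes = (np.outer(ket0, ket0) + np.outer(ket1, ket1)) / 2
mix_no = (np.outer(plus, plus) + np.outer(minus, minus)) / 2
print(np.allclose(mix_yes, mix_no), np.allclose(mix_yes, np.eye(2) / 2))   # True True
```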
3.2 Distinguishing non-orthogonal states
Suppose we are given an unknown quantum state |ψ⟩ that is promised to be one of
two non-orthogonal states |αi i for i = 0, 1, and we wish to determine which one it is
i.e. the value of subscript i. We have seen that this is impossible to do with certainty
(which would then e.g. also provide a method for cloning the states!) but can we still
obtain some information about i, and then, how much? Since quantum measurement
outputs are generally probabilistic we ask if we can identify the state while allowing
some probability of error in the answer, or failure of the process. For example we could
just do nothing with the state and randomly guess i = 0 or 1, which would always be
correct with probability half. But we can do better than this by performing a quantum
measurement on |ψi to guide our answer.
More formally we consider a state estimation process of the following general kind. Given
|ψi we first adjoin an ancillary system in some fixed state |Ai (independent of |ψi) which
has the effect of enlarging the state space that we can work in. Then we apply a unitary
operation to the joint system and finally a measurement with two outcomes labelled 0
and 1 corresponding to our guess of |ψi having been |α0 i or |α1 i.
Remark. (optional)
Using the constructions outlined in the remark after the statement of the no-cloning
theorem, it can be shown that the triple of operations above viz. (Ancilla) followed
by (Unitary) followed by (Measure) is in fact equivalent to any general process i.e any
arbitrary sequence of our three basic operations. Indeed the construction in the previous
Remark can be used to replace intermediate measurements by unitary operations and all
measurements can be postponed to a single all-encompassing measurement at the end.
We will omit the technical details here.
We can simplify the mathematical description of our process as follows. Adjoining the
ancilla state |Ai amounts to just converting the discrimination problem from |α0 i vs.
|α1 i to |α0 i |Ai vs. |α1 i |Ai i.e. just another example of two non-orthogonal states,
which in fact even have the same inner product. Secondly, applying a unitary U to any
state |ξ⟩ before a measurement with orthogonal projectors Π₀ and Π₁ is equivalent to
performing only a measurement with the U-rotated orthogonal projectors Π′_i = U†Π_iU,
as these give the same outcome probabilities:
prob(i) = ⟨ξ| U† Π_i U |ξ⟩ = ⟨ξ| Π′_i |ξ⟩.
Hence we can recast our state estimation process as just a single measurement: given one
of two possible non-orthogonal states |α0 i and |α1 i (which are now the |αi i’s with the
ancilla adjoined), perform a single two outcome measurement with projectors Π0 and Π1
(which are now the U -rotated versions of the original measurement).
Some measurements will be better than others by providing the correct answers with
higher probability. To formalise this we introduce a definition of success probability PS
for the process to quantify how good it is, and we’ll seek to optimise this. In the absence
of any prior knowledge about which of the two states |α0 i or |α1 i we will receive we
assume a prior probability of half for each. Then the success probability is defined by
P_S = ½ prob{process outputs 0 given |α₀⟩ was sent} + ½ prob{process outputs 1 given |α₁⟩ was sent}
which by the Born rule becomes
P_S = ½ ( ⟨α₀| Π₀ |α₀⟩ + ⟨α₁| Π₁ |α₁⟩ ).
Since Π₀ + Π₁ = I (the identity operator on the full space) we have Π₁ = I − Π₀ and so
P_S = ½ + ½ trace Π₀ ( |α₀⟩⟨α₀| − |α₁⟩⟨α₁| ).   (∗)
The optimal choice of measurement {Π0 , I − Π0 } will be the one that maximises PS (for
the given known pair of states |αi i).
To explicitly identify the optimal measurement let's look in detail at the operator
D = |α₀⟩⟨α₀| − |α₁⟩⟨α₁|
appearing in eq. (*). We may work in the 2-dimensional subspace spanned by |α₀⟩ and |α₁⟩,
choosing an orthonormal basis of it whose first vector is |α₀⟩ itself, so that |α₀⟩ and |α₁⟩
have components (1, 0) and (c₀, c₁) respectively, with |c₀|² + |c₁|² = 1. So
D = ( 1 0 )  −  ( |c₀|²   c₀c₁* )  =  (  |c₁|²   −c₀c₁* )
    ( 0 0 )     ( c₁c₀*   |c₁|² )     ( −c₁c₀*  −|c₁|²  ).
A straightforward calculation shows that det(D − δI) = 0 has solutions δ = ±|c₁| =
±sin θ, where we have introduced θ defined by cos θ = |⟨α₀|α₁⟩| = |c₀|. Writing |p⟩ and |m⟩
for the (orthonormal) eigenvectors of D belonging to the eigenvalues +δ and −δ respectively,
we have the spectral decomposition D = δ ( |p⟩⟨p| − |m⟩⟨m| ) with δ = sin θ.
Now, finally returning to our formula eq. (*) for P_S and substituting our expression for
D, we get
P_S = ½ + (δ/2) trace Π₀ ( |p⟩⟨p| − |m⟩⟨m| )
    = ½ + (δ/2) ( ⟨p| Π₀ |p⟩ − ⟨m| Π₀ |m⟩ ).
For any projector Π and state |ξ⟩ we have 0 ≤ ⟨ξ|Π|ξ⟩ ≤ 1 (since ⟨ξ|Π|ξ⟩ is the squared
length of the projected state Π|ξ⟩). Thus we see that P_S achieves its maximum value
of (1 + δ)/2 if Π₀ is chosen to be the projector onto any subspace that contains |p⟩ (so
Π₀|p⟩ = |p⟩) and is orthogonal to |m⟩ (so Π₀|m⟩ = 0); and then Π₁ = I − Π₀ will have
Π₁|m⟩ = |m⟩. Such a choice of Π₀ is always possible since |p⟩ and |m⟩ are always orthogonal.
In particular for example if |α0 i and |α1 i are qubit states then we can just work entirely in
their two dimensional space and an optimal measurement to discriminate them will be the
measurement relative to the eigenbasis of the hermitian operator D = |α0 i hα0 |−|α1 i hα1 |.
The achievable optimal success probability above is known as the Helstrom-Holevo bound
for discriminating pure states.
In summary, we have proven
Theorem (Helstrom-Holevo bound for pure states): Given one of two equally likely
states |α₀⟩ and |α₁⟩ with |⟨α₀|α₁⟩| = cos θ, the probability P_S of correctly identifying the
state by any quantum measurement process is bounded by P_S ≤ ½(1 + sin θ) and the
bound is tight (i.e. achieved by a particular choice of measurement).
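For concreteness, here is a small numerical check (ours, not from the notes; Python/numpy, with an arbitrarily chosen example angle) that measuring in the eigenbasis of D = |α₀⟩⟨α₀| − |α₁⟩⟨α₁| attains P_S = ½(1 + sin θ):

import numpy as np

phi = 0.3   # an arbitrary example angle; here |<a0|a1>| = cos(2*phi)
a0 = np.array([np.cos(phi), np.sin(phi)], dtype=complex)
a1 = np.array([np.cos(phi), -np.sin(phi)], dtype=complex)

D = np.outer(a0, a0.conj()) - np.outer(a1, a1.conj())
evals, evecs = np.linalg.eigh(D)          # ascending eigenvalues: -sin(theta), +sin(theta)
p = evecs[:, 1]                           # eigenvector for +delta

Pi0 = np.outer(p, p.conj())               # project onto the +delta eigenvector
Pi1 = np.eye(2) - Pi0
PS = 0.5 * (a0.conj() @ Pi0 @ a0 + a1.conj() @ Pi1 @ a1).real

cos_theta = abs(a0.conj() @ a1)
print(PS, 0.5 * (1 + np.sqrt(1 - cos_theta**2)))   # the two numbers agree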
Remark (unambiguous state discrimination)
Finally we mention that by “changing the rules of the game” we can formulate other
interesting kinds of state discrimination tasks, such as so-called unambiguous state dis-
crimination. For this task we are again given an unknown one of two states |α₀⟩ and |α₁⟩
and we want to construct a quantum measurement process with three outcomes called 0,
1 and 'fail' with the following properties:
(i) if measurement outcome 0 occurs then the state was certainly |α0 i;
(ii) if measurement outcome 1 occurs then the state was certainly |α1 i;
(iii) if measurement outcome ‘fail’ occurs then our process has failed and we have gener-
ally irretrievably lost all information about the given state.
In this scenario we would seek to minimise the average probability of obtaining the third
outcome.
Both scenarios fall short of providing reliably perfect discrimination of non-orthogonal
states albeit in interestingly different ways: in the first we always get an answer 0 or
1 with the caveat that it may be incorrect, whereas in the second the 0 and 1 answers
are always certainly correct but the catch now is that the process sometimes fails, and
destroys the state (so we cannot try again!) See exercise sheet 1 for an example of an
actual unambiguous state discrimination process.
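As an optional illustration (ours, not the notes', and not necessarily the process on the exercise sheet): a numerical sketch of one unambiguous discrimination strategy for two symmetric qubit states. We describe the three-outcome process directly by positive operators E₀, E₁, E_fail summing to I, with outcome probabilities ⟨ψ|E_i|ψ⟩; such a generalised measurement can in principle be realised by the Ancilla–Unitary–Measure recipe described earlier.

import numpy as np

phi = 0.3
a0 = np.array([np.cos(phi), np.sin(phi)], dtype=complex)
a1 = np.array([np.cos(phi), -np.sin(phi)], dtype=complex)
s = abs(a0.conj() @ a1)                          # overlap |<a0|a1>|

I2 = np.eye(2)
E0 = (I2 - np.outer(a1, a1.conj())) / (1 + s)    # never fires on |a1>
E1 = (I2 - np.outer(a0, a0.conj())) / (1 + s)    # never fires on |a0>
Efail = I2 - E0 - E1

def probs(psi):
    return [float((psi.conj() @ E @ psi).real) for E in (E0, E1, Efail)]

print(probs(a0))   # approximately [1-s, 0, s]: outcome 1 never occurs when |a0> was sent
print(probs(a1))   # approximately [0, 1-s, s]: outcome 0 never occurs when |a1> was sent
print(np.all(np.linalg.eigvalsh(Efail) >= -1e-12))   # Efail is a valid (positive) operator

For these two states each input fails with probability s = |⟨α₀|α₁⟩|, while any non-fail outcome identifies the state with certainty.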
3.3 The no-signalling principle
Consider two parties Alice and Bob separated distantly in space, each holding their
own local quantum systems A and B respectively. Suppose they possess a (generally
entangled) joint quantum state |φ⟩_AB of the joint system (sometimes called a bipartite
state since there are two subsystems). They each have access only to their respective part
of the bipartite state, which they can manipulate locally by quantum actions. Suppose
Alice performs a complete measurement on her subsystem A. According to the Born rule,
for each measurement outcome the state of Bob’s system will change instantaneously. If
Bob could notice this change then they would be able to communicate instantaneously!
Example.
Suppose that the shared bipartite state is the entangled state
|φ⁺⟩_AB = (1/√2) ( |00⟩_AB + |11⟩_AB ).
If Alice performs a standard basis measurement on A then Bob’s system will be “col-
lapsed” into pure state |0i or |1i corresponding to Alice’s outcome, each occurring with
probability half. Before the measurement however, B was not in a pure state (but was
part of an entangled state). Can Bob notice this change by just local actions on his sub-
system? Note also that he does not know Alice’s outcome - acquisition of that knowledge
would require communication from Alice!
Suppose Bob performs a complete measurement in basis {|b_i⟩ : i = 0, 1}. By the Born
rule, after Alice's measurement he will get
prob(i) = ½ |⟨0|b_i⟩|² + ½ |⟨1|b_i⟩|² = ½ ⟨b_i|b_i⟩ = ½
(where we have averaged over Alice's two possible outcomes using their probabilities of
half each). However before Alice's measurement he would have had
prob(i) = ‖ _B⟨b_i|φ⁺⟩_AB ‖² = ½ too.
Thus even though each individual outcome of Alice’s measurement will give noticeably
different probabilities of i for Bob (viz. |h0|bi i|2 and |h1|bi i|2 ), if Bob does not know
Alice’s outcome he must average over their probabilities and his ability to notice any
change is lost!
This turns out to be true in full generality: for any bipartite state |φiAB , no local actions
by Alice can be noticed by Bob locally i.e. for any local measurement by Bob, the output
probability distribution is always unaffected by any local action by Alice. This is the
quantum no signalling principle. It seems bizarrely remarkable that quantum theory ap-
pears to involve non-local effects (at the level of state descriptions viz. post-measurement
states arising from local actions on composite systems) yet the full quantum formalism
conspires to prevent us from being able to harness this nonlocality for communication!
We now give a more formal formulation and proof of the no-signalling principle.
Local operations on a composite system
(loc-Unitary): a local unitary operation U by Alice resp. Bob on a bipartite system is
mathematically represented as the operator UA ⊗ IB resp. IA ⊗ UB on the full state of
AB (and here I is the identity operation). Note that any two local unitary operations
on disjoint subsystems always commute (as (U ⊗ I)(I ⊗ V ) = (I ⊗ V )(U ⊗ I) = U ⊗ V ).
(loc-Ancilla): Alice and Bob can adjoin local ancillary systems A′ and B′ which simply
enlarge their locally held systems.
(loc-Measure): Let HA , HB and HAB = HA ⊗ HB denote the state spaces of A, B and
AB respectively. If Alice performs a (generally incomplete) local measurement on A
corresponding to the decomposition of HA into the orthogonal subspaces Ea ’s labelled
by outcomes a, then mathematically on the full state space this is represented by the
measurement with the orthogonal decomposition Ea ⊗ HB ’s of HAB . In particular even
a complete measurement on A will be an incomplete measurement of the full system.
If Fb ’s are the orthogonal subspaces of a local measurement by Bob on B with outcomes
b, then one can check from the Born rule that the joint probabilities prob(a, b) obtained
from performing both measurements is independent of whether A or B goes first, or
whether they measure ‘simultaneously’, corresponding to the global measurement with
orthogonal subspaces Ea ⊗ Fb for all pairs (a, b).
No-signalling theorem: Suppose Alice and Bob have access to subsystems A and B
respectively of a joint state |φiAB . Then Alice cannot convey any information to Bob by
performing local operations i.e. no local action by Alice can change the output probability
distribution of any local quantum process by Bob.
Proof: Consider first the basic case of Bob performing a complete measurement on B
relative to a basis {|b⟩} labelled by the outcomes b. We use this basis of B to express
|φ⟩_AB as
|φ⟩_AB = Σ_b |ξ_b⟩_A |b⟩_B .   (∗)
Here the |ξ_b⟩_A 's are subnormalised states of A given by the partial inner products
|ξ_b⟩_A = _B⟨b|φ⟩_AB .
They are sometimes called conditional states of A given B's measurement outcomes b,
or relative states of A, relative to basis states |b⟩ of B. By the Born rule, the probability
of Bob seeing outcome b in the absence of any local operation by Alice will be the squared
length:
prob(b) = ⟨ξ_b|ξ_b⟩ = ‖ _B⟨b|φ⟩_AB ‖².
Now suppose Alice performs a complete measurement relative to basis |a⟩'s (labelled by
outcomes a) and subsequently Bob performs his measurement above. Then the joint
probabilities are given by
prob(a, b) = | ( _A⟨a| _B⟨b| ) |φ⟩_AB |² = | ⟨a|ξ_b⟩ |²
(which actually holds regardless of which time order the local measurements are per-
formed in). Then the marginal probability distribution for b i.e. the distribution that
Bob will see, is
prob(b) = Σ_a prob(a, b) = Σ_a |⟨a|ξ_b⟩|²
        = Σ_a ⟨ξ_b|a⟩⟨a|ξ_b⟩
        = ⟨ξ_b| { Σ_a |a⟩⟨a| } |ξ_b⟩ .
But Σ_a |a⟩⟨a| = I_A as {|a⟩} is a basis, so prob(b) = ⟨ξ_b|ξ_b⟩, which is the same (cf above)
as in the case of Alice not having done anything on her side.
This argument may be readily generalised to Alice and/or Bob performing incomplete
measurements too, e.g. using the fact that for any orthogonal decomposition {E_i} of a state
space we have Σ_i E_i = I. We'll omit the details here.
If Alice performs a local unitary U i.e. U_A ⊗ I_B is applied to |φ⟩_AB, then by eq. (*) above
this just changes the conditional states to |ξ′_b⟩ = U|ξ_b⟩ and we have ⟨ξ′_b|ξ′_b⟩ = ⟨ξ_b|ξ_b⟩ so
Bob's probabilities are again unchanged.
Finally if Alice and Bob include extra local ancillas this just has the effect of enlarging
their local spaces and the above arguments go through unchanged in this albeit enlarged
scenario.
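The theorem is easy to test numerically. The following sketch (ours; Python/numpy, with arbitrary example bases) checks that Bob's outcome distribution for a random two-qubit state is the same whether or not Alice measures:

import numpy as np
rng = np.random.default_rng(0)

# random normalised two-qubit state, stored as a 2x2 matrix of amplitudes phi[a, b]
phi = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
phi /= np.linalg.norm(phi)

def basis(theta):
    """An orthonormal qubit basis parametrised by an angle (rows are the basis vectors)."""
    return np.array([[np.cos(theta), np.sin(theta)],
                     [-np.sin(theta), np.cos(theta)]])

B_bob = basis(0.4)          # Bob's measurement basis
A_alice = basis(1.1)        # Alice's measurement basis

# Bob's probabilities if Alice does nothing: prob(b) = || <b|_B phi ||^2
p_no_alice = [np.linalg.norm(phi @ b.conj())**2 for b in B_bob]

# Bob's marginal probabilities after Alice measures: prob(b) = sum_a |<a|<b| phi>|^2
p_with_alice = [sum(abs(a.conj() @ phi @ b.conj())**2 for a in A_alice) for b in B_bob]

print(np.allclose(p_no_alice, p_with_alice))   # True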
Remark (communication complexity)
The fact that quantum theory appears to include non-local effects has a long history
going back at least to the iconically influential 1935 ‘EPR’ paper by A. Einstein, B.
Podolsky and N. Rosen. Later in the 1960’s J. Bell introduced what are now known as
the Bell inequalities, providing a much simplified and experimentally accessible way of
demonstrating the non-local effects. But (perhaps because of the no-signalling property)
these effects were largely ignored by “serious physicists” and viewed as just an awkward
curiosity or inconvenience. Then early in the 1990’s something remarkable happened: it
was realised that if Alice and Bob shared entangled states and were also allowed classical
communication too, then although entanglement by itself cannot provide communication
(by no-signalling) it can nevertheless greatly assist (when used alongside classical commu-
nication) by greatly reducing the amount of classical communication needed to achieve
some distributed tasks, involving inputs from both Alice and Bob. In some cases the
amount of classical communication could be reduced by a massive exponential amount
at the expense of a modest consumption of entangled states used alongside. As a result,
a whole new research area, called quantum communication complexity was born.
The basic scenario is the following: Alice and Bob possess separate n-bit strings x and y
and they wish to compute some joint function f (x, y) of both strings. Clearly they’ll need
to communicate (e.g. at least the results of some intermediate local calculations) and n
bits of communication each way always suffice (they just exchange the information of
their strings, and each can then compute f locally). It is now known that for some f's, if
Alice and Bob share some entanglement (e.g. some |φ⁺⟩ states), then f can be computed
using exponentially less classical communication than is possible by any method involving
just classical communication (and for other f's entanglement provides no help at all).
Thus a communication network having shared entanglement along its connections (a
‘quantum internet’) can solve some distributed computing tasks with exponentially less
classical bit traffic across the network. Correspondingly entanglement is now recognised
to be a preciously valuable communication resource. In so-called quantum teleportation
(cf below) we’ll see another communication use for entanglement, this time for the task
of communicating quantum states.
3.4 Quantum dense coding
We have seen that an individual qubit (e.g. if received as a quantum message) can reliably
encode only a single classical bit, corresponding to having a maximum of two mutually
orthogonal states. Quantum dense coding is a way of doubling this information capacity:
a receiver can reliably extract two classical bits from a received single qubit if he is already
in possession of another qubit with which the newly received qubit had previously been
entangled. The protocol is very simple, and based on an important orthonormal set of
2-qubit states, the so-called Bell states, which we first introduce.
Bell states
The Bell states (named after the physicist J. S. Bell) are the following four orthonormal
entangled states of two qubits (usually denoted in the literature as |φ±⟩ and |ψ±⟩):
|φ±⟩ = (1/√2) ( |00⟩ ± |11⟩ )
|ψ±⟩ = (1/√2) ( |01⟩ ± |10⟩ ).
This notation is standard and you should memorise it. Orthonormality is easily checked
directly. The 2-qubit basis B = {|φ+ i , |φ− i , |ψ + i , |ψ − i} is called the Bell basis. A
quantum measurement on two qubits relative to this basis is called a Bell measurement.
In physical terms, if |0i and |1i are the spin +1/2 and spin −1/2 states of a spin half
particle (say in the Z direction) then |ψ − i is the spin-zero singlet state of two spin half
particles and the other three Bell states span the 3-dimensional spin-1 triplet space.
In terms of our basic 1-qubit operators I, X, Y, Z it is again easy to check that:
|φ+ i = I ⊗ I |φ+ i
|φ− i = Z ⊗ I |φ+ i = I ⊗ Z |φ+ i
|ψ + i = X ⊗ I |φ+ i = I ⊗ X |φ+ i
|ψ − i = Y ⊗ I |φ+ i = −I ⊗ Y |φ+ i
and that for example, |φ+ i can be prepared from a simple computational basis state via
|φ+ i = (CX)(H ⊗ I) |00i.
The quantum dense coding protocol
Alice and Bob (distantly separated in space) each possess one qubit of a |φ+ i state. In
order to reliably communicate two classical bits to Bob by sending him only a single
qubit, Alice first locally applies the operation I, Z, X or Y to her qubit, to represent
the messages 00, 01, 10 or 11 respectively, and then sends her qubit over to Bob. On
receiving Alice's qubit Bob simply performs a Bell measurement on the two qubits which
he now holds, to reliably read out Alice's 2-bit message.
We will see the Bell measurement again below when we discuss quantum teleportation.
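A short simulation (our own illustration in Python/numpy) of the protocol: Alice's four local operations turn the shared |φ⁺⟩ into the four Bell states, which Bob's Bell measurement distinguishes perfectly:

import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)    # (|00>+|11>)/sqrt(2)

# Bell basis vectors in the order phi+, phi-, psi+, psi-
bell = [np.array(v, dtype=complex) / np.sqrt(2) for v in
        ([1, 0, 0, 1], [1, 0, 0, -1], [0, 1, 1, 0], [0, 1, -1, 0])]

for msg, U in zip(['00', '01', '10', '11'], [I, Z, X, Y]):
    state = np.kron(U, I) @ phi_plus            # Alice acts only on her own qubit
    probs = [abs(b.conj() @ state)**2 for b in bell]
    print(msg, np.round(probs, 3))              # exactly one Bell outcome has probability 1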
4 Quantum teleportation
Consider again our participants Alice and Bob who are distantly separated in space, and
suppose each possesses one qubit of the entangled Bell state
|φ⁺⟩ = (1/√2) ( |00⟩ + |11⟩ ).
Suppose Alice has another qubit in some state |αi and she wants to transfer this qubit
state to Bob. How can she achieve this transfer? She may not even know the identity
of the state |αi and according to quantum measurement theory she is unable to learn
more than a small amount of information about it before totally destroying it! She can
place the (physical system embodying the) qubit state in a “box” and physically carry it
across to Bob. But is there any other way? What if the space region in between A and
B is a hostile and dangerous place?
Quantum teleportation provides an alternative method for state transfer that utilises
the entanglement in the state |φ+ i. As we’ll see precisely in a moment, but speaking
intuitively now, the state transfer from A to B is achieved “without the state having to
pass through the space in between” in the following sense: the transference is unaffected
by any physical process whatever that takes place in the intervening space. Note that
this is also a feature of the entanglement of |φ+ i: although quantum theory appears to
imply the existence of some kind of “non-local connection” between entangled particles
(e.g. reflected in correlations between local measurement results, cf exercise sheet 1), the
entanglement (“non-local connection”) itself remains entirely unaffected by any physical
process occurring in the space in between; it can change only by physical actions on the
particles themselves.
Let qubit 1, qubit 2 and qubit 3 denote respectively Alice’s input qubit (in state |αi),
Alice’s qubit of |φ+ i and Bob’s qubit of |φ+ i. Using subscripts to label the qubits the
starting state can be written |αi1 |φ+ i23 with 1,2 in A’s possession and 3 in B’s possession.
If
|α⟩ = a|0⟩ + b|1⟩
then we explicitly have
|α⟩₁ |φ⁺⟩₂₃ = (1/√2) ( a|000⟩ + a|011⟩ + b|100⟩ + b|111⟩ ).   (2)
The protocol for quantum teleportation comprises the following five steps (i) to (v).
(i) Alice applies CX to her two qubits 1 and 2 (with 1 being the control and 2 the
target).
(ii) Alice applies H to her qubit 1.
(iii) Alice measures her two qubits (in the computational basis) to obtain a 2-bit string
00, 01, 10 or 11.
Note that the sequence (i), (ii), (iii) is equivalent to Alice just performing a Bell mea-
surement on the two qubits – indeed the unitary operations in (i) and (ii) simply serve to
rotate the Bell basis into the computational basis of the two qubits (as is easily checked).
By calculating the effect of these three operations on eq. (2) we see that each 2-bit string
is obtained with equal probability of 1/4 (irrespective of the values of a and b, recalling
that |a|² + |b|² = 1). Furthermore after the measurements in (iii) we have the following
post-measurement states (as you should calculate):
outcome 00: |00⟩₁₂ ( a|0⟩ + b|1⟩ )₃
outcome 01: |01⟩₁₂ ( a|1⟩ + b|0⟩ )₃
outcome 10: |10⟩₁₂ ( a|0⟩ − b|1⟩ )₃
outcome 11: |11⟩₁₂ ( a|1⟩ − b|0⟩ )₃
i.e. Bob's qubit 3 is now disentangled from 1,2 and it is in a state that's a fixed transform
of |α⟩, the choice of transform depending only on the measurement outcome and not on
the identity of |α⟩ (i.e. not on the a, b values). In fact if the measurement outcome is ij
then Bob's qubit will be "collapsed" into the state X^j Z^i |α⟩.
(iv) Alice sends the 2-bit measurement outcome ij to Bob (i.e. she sends him 2 bits of
classical information).
(v) On receiving ij Bob applies the unitary operation Z^i X^j (i.e. the inverse of X^j Z^i) to
his qubit which is then guaranteed to be in state |α⟩.
This completes the teleportation of |αi from Alice to Bob.
Note that no remnant of any information about |αi remains with Alice. After stage (iii)
she is left with only a 2-bit string that has always been chosen uniformly at random
(independent of |αi) and the ‘original’ state |αi is always totally destroyed. Thus the
teleportation process is fully consistent with the no-cloning theorem, as indeed it must
be.
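For readers who like to see the algebra checked: below is a small numerical sketch (ours, in Python/numpy) of steps (i)–(v), confirming that for every measurement outcome ij Bob's corrected qubit is exactly |α⟩:

import numpy as np

a, b = 0.6, 0.8j                               # any amplitudes with |a|^2 + |b|^2 = 1
alpha = np.array([a, b], dtype=complex)

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
CX = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
state = np.kron(alpha, phi_plus)               # qubits 1,2 with Alice, qubit 3 with Bob

state = np.kron(CX, I) @ state                 # (i)  CX on qubits 1,2
state = np.kron(np.kron(H, I), I) @ state      # (ii) H on qubit 1

state = state.reshape(2, 2, 2)                 # amplitudes indexed by (qubit1, qubit2, qubit3)
for i in range(2):                             # (iii) Alice's measurement outcome ij
    for j in range(2):
        bob = state[i, j, :]
        bob = bob / np.linalg.norm(bob)        # Bob's post-measurement qubit 3
        Zi = np.linalg.matrix_power(Z, i)
        Xj = np.linalg.matrix_power(X, j)
        corrected = Zi @ Xj @ bob              # (iv),(v) Bob applies Z^i X^j on learning ij
        print(i, j, round(abs(alpha.conj() @ corrected), 6))   # always 1.0: Bob holds |alpha>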
The whole protocol is shown diagrammatically in a spacetime diagram in figure 1. In fig-
ure 2 we give an alternative depiction of the protocol as a network of quantum gates. This
representation is perhaps more pertinent to computation (rather than communication),
using teleportation to transfer qubits between different parts of a quantum memory.
[Figure 1 (spacetime diagram, axes SPACE and TIME) appears here; see the caption below.]
Figure 1: Quantum teleportation. The figure shows a spacetime diagram with the en-
tangled state |φ⁺⟩ = (1/√2)(|00⟩ + |11⟩) created at t₀ and subsequently distributed to A and
B. At time t₁ Alice performs a Bell measurement on the joint state of her input qubit
and her qubit from |φ⁺⟩ and sends the outcome ij to Bob. On reception at time t₂, Bob
applies Z^i X^j to his particle which is then guaranteed to be in the same state as Alice's
original input qubit.
[Figure 2 (quantum network diagram: |α⟩ and Alice's half of |φ⁺⟩ pass through CX and H, Alice's two qubits are measured giving ij, and Z^i X^j is applied to Bob's qubit, which emerges in state |α⟩); see the caption below.]
Figure 2: A quantum network diagram for teleportation. The diagram is read from left
to right. Horizontal lines represent qubits in a quantum memory. As a result of the above
sequence of operations the qubit state is transferred from the top line to the bottom line.
We conclude this section with a few further remarks about the teleportation process (for
possible discussion in lectures).
• Unlike “star-trek” teleportation, the physical system embodying |αi is not transferred
from A to B. Only the “information” of the state’s identity is transferred, residing finally
in a new physical system i.e. the system that was initially Bob’s half of |φ+ i.
• Before A’s measurements in (iii) Bob’s qubit has preparation: “the right half of |φ+ i”.
After A’s measurement Bob’s qubit has preparation: “ one of the four states |αi, Z |αi,
X |αi or XZ |αi chosen uniformly at random”. It can be shown (see exercise sheet 1)
that for any measurement process on Bob’s qubit, these two preparations give identical
probability distributions of outcomes so Bob cannot notice any change at all in his qubit’s
behaviour as a result of A’s measurements. This is just another example of the no-
signalling principle in action. Bob can reliably create the qubit state |αi only after
receiving the ij message from Alice.
• Figure 1 highlights one of the most enigmatic features of the quantum teleportation
process. The question is this: Alice succeeds in transferring the quantum state |αi to Bob
by sending him just two bits of classical information. Clearly these two bits are vastly
inadequate to specify the state (whose description depends on continuous parameters) so
“how does the remaining information get across to Bob?” What carries it? What route
does it take? Usually when information is transferred from one location to another, it
requires a channel for its transmission! But in figure 1 there is clearly another route
connecting Alice to Bob (apart from the channel carrying the two classical bits) and it
does indeed carry a qubit – it runs backwards in time from Alice to the creation of the
|φ⁺⟩ state and then forwards in time to Bob. Hence it is tempting to assert that most
of the quantum information of |αi was propagated along this route, firstly backwards in
time and then forwards to Bob! In this view, at times between t0 and t1 this part of
the state’s information was already well on its way to Bob even though Alice had not
yet performed her measurement! Such statements may appear paradoxical but further
consideration (maybe elaborate in lectures?) shows that, as an interpretation, this view
is actually entirely consistent and sound. Whether or not you accept this view as a
correct description of what actually happens in the real physical world, is only a matter
of personal preference!
• To add here – include some mention of practical implementations of teleportation, and
maybe discuss the question: “can we teleport a human?”
5 Quantum cryptography –
BB84 quantum key distribution
Introduction
The use of general quantum states to represent information and encode messages would
appear, from what we have seen, to have some significant drawbacks compared to the
use of classical states! Firstly (a): a received unknown quantum state cannot be reliably
identified (unless it has been promised to belong to a specified orthonormal set) so the
receiver cannot reliably read the message. Secondly (b): any attempt to read the message
(in the context of general signal states) results in only partial information and is always
accompanied by irrevocable corruption (“measurement collapse”) of the quantum state
and correspondingly, part of the sent message is necessarily irretrievably destroyed.
Remarkably these seemingly negative features can be used to positive effect to provide
valuable benefits for some cryptographic and information security issues, which in some
cases turn out to be impossible to achieve with classical signals. For example, intuitively
here, in communication between distant parties, (b) implies that any attempted eaves-
dropping on the message en route must leave a mark on the signal which can then in
principle be detected by actions of the receiver in (public) discussion with the sender. It
turns out that this can be used to provide communication that’s provably secure against
eavesdropping. Classical messages on the other hand can always be read en route and
sent on to the receiver perfectly intact. Also it turns out (cf more below) that the effects
of (a) for the communicators can be circumvented by a suitably clever (non-obvious)
protocol involving further (public) discussion between them.
It is now known that quantum effects can provide benefits for a wide variety of cryp-
tographic tasks beyond just secure communication. The associated subject of quantum
cryptography is currently a highly active and flourishing area of scientific research world-
wide, with evident huge practical and theoretical significance for modern society which
is becoming increasingly reliant on secure information technology. In this course we will
consider only one cryptographic issue, but perhaps the most fundamental of all – that of
secure communication between two spatially separated parties traditionally named Alice
(A) and Bob (B) with eavesdropper Eve (E). We will discuss the Bennett-Brassard pro-
tocol (the so-called BB84 protocol) for quantum key distribution which provides a means
of communication that’s provably secure against eavesdropping. The protocol itself is
relatively simple to describe but the proof of unconditional security against general at-
tacks is very involved and technical (beyond the scope of this course). Below we will
describe the protocol and be content with making some remarks about its security in
some restricted situations.
How can we communicate securely?
The issue of secure communication has a long history going back thousands of years.
Circa 100 BC Julius Caesar used a cipher in which the letters of the text were simply
shifted forward by three places in the alphabet. A more elaborate version of this kind of
encryption method (subsequently historically used in a variety of contexts) is to apply
some more general chosen fixed permutation of the alphabet, known securely only to
the sender and receiver. However such schemes are insecure (against suitably intelligent
adversaries): they can be broken, for example, by compiling a table of the symbols in the
cryptotext with their occurring frequencies, and comparing this to a similar table derived
from typical texts in the language.
With the development of mathematics (particularly number theory, abstract algebra,
group theory, coding theory, computational complexity theory etc.) a variety of more
sophisticated (classical) schemes for secure communication were invented but none of
these apart from the one time pad (which in turn requires a method of key distribution)
is provably secure. We will discuss the one time pad below as it is also an underpinning
ingredient for quantum key distribution (QKD) schemes such as BB84. QKD schemes
will be able to circumvent shortcomings of classical key distribution schemes which render
them unsuitable in many common situations (cf below).
Remark (on public key cryptosystems).
Amongst more sophisticated schemes in common use today are the so-called public key
crypto systems (Diffie-Hellman scheme, RSA, elliptic curve cryptography). The secu-
rity of these schemes is not absolute but relies on (unproven but widely believed) com-
putational hardness assumptions i.e. a belief that certain computational tasks while
computable in principle require so much computing time that they are effectively un-
computable in practice. For example given two large prime numbers p and q (of say
hundreds of digits each) it is easy to compute their product (e.g. using a very large sheet
of paper and careful long multiplication we could even probably do it by hand over a
rainy weekend). But conversely, given a composite number N (similarly having hundreds
of digits) there is no known “fast algorithm” to factorise it. In fact for N having just
several hundreds of digits, and using our best known classical factoring algorithms with
all the classical computing power on earth today, it would generally take longer than the
age of the universe to complete the task! These kinds of issues are the subject of computa-
tional complexity theory which we’ll see more of in the second half of the course (quantum
computation and quantum algorithms). More precisely here, the computational task of
multiplication of n-digit integers can be completed in a number of steps growing polyno-
mially (quadratically) with n, whereas our best classical factoring algorithms for n digit
integer factorisation require an exponential (super-polynomial) number of steps, expo-
nential in the cube root of n, and this exponential versus polynomial growth in number
of digits makes that latter task effectively uncomputable in practice for modest sizes of
n. Public key cryptosystems exploit such asymmetric (assumed) computational hardness
properties of various tasks to provide security. They also have the remarkable and very useful
feature that the communicating parties do not need any prior secret shared information
known only to them (such as knowledge of the permutation in a Caesar-like cypher) so,
for example, they need never have previously met. However there are significant draw-
backs:
(a) it has not been proven that faster classical algorithms for the tasks do not exist, and
one may yet be discovered in the future, e.g. a factoring algorithm that requires only a
polynomially growing number of steps to complete.
(b) quantum computation provides entirely new (non-classical) modes of computing con-
sistent with the laws of physics (as we’ll see in the second half of the course). These
modes lead to new kinds of algorithms which can be used to solve some computational
problems exponentially faster than any known classical method. And coincidentally,
known tasks of this kind include those on which public key cryptosystems are based.
So public key crypto systems in common use today could be readily broken if we had a
working quantum computer! The most famous such algorithm is Shor’s quantum algo-
rithm for integer factorisation (and also computation of discrete logarithms in number
theory) discovered by Peter Shor in 1994, which achieves these tasks in a number of
(quantum-) computational steps growing only polynomially with n i.e rendering them
feasible in practice.
Thus on the one hand quantum physics (via Shor’s and other quantum algorithms)
allows the breaking of some classical cryptosystems that are not known to be classically
breakable, while on the other hand, via quantum key distribution protocols, it offers a
method for provably secure communication.
The one time pad
We assume that our message is a bit string M of length n (without loss of generality
e.g. we could represent letters of the alphabet and some punctuation symbols as distinct
5-bit strings).
For the one time pad Alice and Bob need to share a secret private key K which is a
uniformly random bit string of the same length n as the message, and which is known
only to them.
Alice encrypts her message by adding K to M . Here addition is addition mod 2 and it is
carried out separately at each bit position of the strings. This produces the cryptotext
C = M ⊕ K which she sends to Bob (over a public classical channel).
Bob receives C and computes C ⊕ K = M ⊕ K ⊕ K = M (with the last equality since
0 ⊕ 0 = 1 ⊕ 1 = 0) thus decrypting the message.
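In code the whole scheme is one line of XOR each way; a tiny sketch (ours, in Python):

import secrets

def xor(s, t):
    return [a ^ b for a, b in zip(s, t)]

M = [0, 1, 1, 0, 1, 0, 1]                   # the message bits
K = [secrets.randbelow(2) for _ in M]       # uniformly random key of the same length, used once
C = xor(M, K)                               # cryptotext sent over the public channel
print(xor(C, K) == M)                       # True: Bob recovers M since K XOR K = 0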
Features and remarks
If K is uniformly distributed amongst bit strings then so is C. Thus any potential
eavesdropper Eve can learn nothing about M (apart from its length) by looking at C.
Hence this scheme cannot be broken, a feature that can be proven more formally in the
context of classical information theory, introduced by Claude Shannon in the 1940s.
It is important for security that the key K is used only once (hence the name "one time
pad"): e.g. if it were used twice, to generate C₁ = M₁ ⊕ K and C₂ = M₂ ⊕ K, then
C₁ ⊕ C₂ = M₁ ⊕ M₂, which would generally contain information about M₁ and M₂ and
(with C₁ and C₂ available) about K too.
Thus the scheme is rather inefficient in its ongoing needed secret resource, with Alice
and Bob needing fresh secret key of length equal to that of each subsequent message.
But given that, all seems fine and the key question is: how can Alice and Bob acquire
their secret key? It is impossible for two parties to classically generate a secure private
key over a public channel. Thus they would need to meet (and carry away e.g. a private
one time pad book, for later use) or else use a trusted intermediary to distribute the key.
Each of these has evident limitations and potential significant security risks. Quantum
key distribution (QKD) provides a method for Alice and Bob to generate a shared secret
key over public classical and quantum channels without the need to meet or to use a
trusted intermediary.
In QKD schemes the quantum signals are used to generate the shared secret key rather
than to encode the message itself, which is subsequently communicated using the classical
one time pad scheme. A variety of quantum key distribution schemes have been proposed
including:
(1) BB84 (C. Bennett and G. Brassard 1984), uses four qubit signal states that include
non-orthogonal pairs;
(2) B92 (C. Bennett 1992), uses only two (non-orthogonal) quantum signal states;
(3) E91 (A. Ekert 1991), uses (one qubit of) an entangled pair of qubits in place of the
signal states of BB84, which are later created using local measurements by Alice and Bob;
and others. We will discuss only BB84.
The BB84 quantum key distribution protocol
We assume that Alice and Bob can communicate over a public classical channel and they
can also send qubits over a quantum channel. Eve also has access to these channels and
she wants to acquire information without being detected, about the secret key that Alice
and Bob will generate. The bottom line will be that this will be impossible for Eve to
achieve by any means whatever consistent with the laws of physics.
For quantum transmissions we will use the following four qubit states
|ψ₀₀⟩ = |0⟩
|ψ₁₀⟩ = |1⟩
|ψ₀₁⟩ = |+⟩ = (1/√2) ( |0⟩ + |1⟩ )
|ψ₁₁⟩ = |−⟩ = (1/√2) ( |0⟩ − |1⟩ )
making up two orthonormal qubit bases viz. B0 = {|ψ00 i , |ψ10 i} and B1 = {|ψ01 i , |ψ11 i}.
These are the computational basis (or Pauli Z eigenbasis), and the diagonal basis (or
Pauli X eigenbasis). These bases are called mutually unbiassed since if any state of one
basis is measured in the other basis, the outcomes are always equally likely.
We now give the BB84 protocol as a series of steps:
BB84 Step1:
Alice generates two uniformly random binary strings X = x₁x₂...x_m and Y = y₁y₂...y_m,
x_i, y_i ∈ {0, 1}. Then she prepares m qubits in the states |ψ_{x_i y_i}⟩ for i = 1, ..., m and
sends them to Bob over the quantum channel.
When Bob receives the m qubits they may no longer be in the states |ψxi yi i that Alice sent,
since the quantum channel may have been noisy or eavesdropping may have occurred.
To understand how the protocol works let us imagine first that there is no eavesdropping
and that the channel is perfectly noiseless i.e. Bob receives precisely the states |ψxi yi i
that Alice sent.
BB84 Step 2:
Bob chooses a uniformly random bit string Y′ = y′₁y′₂...y′_m and measures the i-th received
qubit in basis B_{y′_i} to get a result x′_i, i.e. y′_i is Bob's guess at Alice's choice of encoding
basis and x′_i is his guess at A's bit value x_i. Let X′ = x′₁x′₂...x′_m be the string of Bob's
measurement outcomes. Note that if y′_i = y_i (i.e. Bob correctly guessed Alice's encoding
basis) then x′_i = x_i and he will learn her message bit correctly with certainty. But if
y′_i ≠ y_i then x′_i is completely uncorrelated with x_i (recalling the mutually unbiassed
relationship between the bases).
BB84 Step 3:
After the completion of Step 2 Alice and Bob publicly reveal and compare their choice
of bases i.e. their strings Y and Y′ (but they do not reveal the strings X and X′!). They
discard all bits x_i and x′_i for which y_i ≠ y′_i, leaving shorter strings of expected length m/2.
Call these strings X̃ and X̃′. Under our assumptions of no noise and no eavesdropping
in the quantum channel, these bit strings would provide the desired outcome of a shared
secret key.
[In lecture do an example of Steps 1 to 3.]
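A small classical simulation (ours, in Python/numpy) of Steps 1 to 3 for a noiseless channel with no eavesdropper; a wrong-basis measurement is modelled as a uniformly random outcome:

import numpy as np
rng = np.random.default_rng(1)

m = 20
X = rng.integers(0, 2, m)        # Alice's bit values x_i
Y = rng.integers(0, 2, m)        # Alice's basis choices y_i (0 = Z basis, 1 = X basis)
Yp = rng.integers(0, 2, m)       # Bob's basis guesses y'_i

# Bob's outcome: the correct bit when bases match, otherwise a uniformly random bit
# (the mutually unbiassed property of the two bases)
Xp = np.where(Yp == Y, X, rng.integers(0, 2, m))

keep = (Yp == Y)                 # Step 3: publicly compare bases, keep the matching positions
X_tilde, Xp_tilde = X[keep], Xp[keep]
print(len(X_tilde), "sifted bits;", "all agree:", np.array_equal(X_tilde, Xp_tilde))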
In reality there will always be some noise in transmissions and we need to deal with
possible eavesdropping too. To address these issues the BB84 protocol concludes with
the following steps 4 and 5, which we discuss further below. These are entirely issues
and techniques from classical cryptography.
BB84 Step 4 (information reconciliation):
Alice and Bob want next to estimate the bit error rate i.e. the proportion of bits in X̃′
that are not equal to the corresponding bits of X̃. To do this they publicly compare a
random sample of their strings (say half of their bits chosen at random positions), and then discard all the
announced bits. They assume that the remaining bits have about the same proportion
of errors as those checked. Next they want to correct these remaining errors (albeit at
unknown positions) to obtain two strings that agree in a high percentage of positions
with high probability. Remarkably this can be done (at the expense of sacrificing some
more bits) without giving everything away, if the bit error rate is not too large (in fact
less than about 11%).
BB84 Step 5 (privacy amplification):
From the estimated bit error rate Alice and Bob can estimate the maximum amount of
information that an eavesdropper is likely to have obtained about the remaining bits.
From this information estimate they use techniques of so-called privacy amplification
from classical cryptography to replace their strings by even shorter strings about which
the eavesdropper can have practically no knowledge whatever (with high probability).
This concludes the BB84 quantum key distribution protocol.
Further remarks about information reconciliation and privacy amplification
A rigorous treatment of the details of Steps 4 and 5 requires much further technical
development from classical information theory, the theory of error correcting codes and
classical cryptography. A full treatment is beyond the scope of this course and here we
will only draw attention to some of the essential ideas.
In Step 5 the bit error rate provides an upper bound on the amount of information that an
eavesdropper can have gained because (as we have previously discussed) non-orthogonal
states cannot be reliably distinguished and any attempt to acquire information about the
state identity certainly involves irreversible state disturbance. In the full theory one can
prove information disturbance tradeoff relations that quantify the intuition that more
information gain is necessarily accompanied by more disturbance. As a consequence
of this fundamental property of quantum information, the amount of Eve’s acquired
information is reflected in the bit error rate. Of course noise in the channel also generates
bit errors but Alice and Bob can reliably upper bound Eve’s information by simply
assuming that the whole error rate arose from eavesdropping.
There are many ways in which Eve could attempt to acquire information, such as:
(a) the intercept-resend attack: Eve can intercept each transmitted qubit separately,
measure it in some chosen basis to acquire some information about it, and then send on
the post-measurement state to Bob.
(b) general coherent attack: much more generally and complicatedly, Eve can introduce
an auxiliary (possibly very large) probe quantum system E of her own and unitarily
interact E with many of the passing qubits. Finally she can measure E to acquire infor-
mation, which now can be joint information about many of the qubits. Her measurement
here can even be postponed until after she overhears Alice and Bob’s public discussions
in Steps 2,3,4,5 and chosen in response to what she hears.
(c) Note that the standard classical strategy for eavesdropping on classical bits viz. read-
ing them and retaining a copy, and then sending them on perfectly intact, is not available
for our quantum state bit encodings because of the no-cloning theorem and the use of
non-orthogonal states in the set of encoding states!
Remarkably the privacy amplification techniques of classical information theory can be
shown to provide security against any possible eavesdropping strategy that’s consistent
with the laws of physics.
Example. (an intercept-resend attack)
Assume that the quantum channel is noiseless but Eve intercepts each passing qubit and
measures it in the so-called Breidbart basis:
|α₀⟩ = cos(π/8) |0⟩ + sin(π/8) |1⟩
|α₁⟩ = −sin(π/8) |0⟩ + cos(π/8) |1⟩ .
This is a good choice of basis as it lies “midway” between the two BB84 encoding bases:
the squared overlaps of |α0 i with the two states |0i and |+i used to encode bit value
0 are equal (being cos2 π/8) and similarly for |α1 i with |1i and |−i. For any other
choice of basis one of these four overlaps will be smaller and intuitively the |αi i’s thus
provide the best (most parallel) simultaneous approximations to the two non-orthogonal
states used to encode each bit value i; and Eve will learn each bit of X̃ with probability
cos2 π/8 ≈ 0.85.
Let us compute the bit error rate in the strings X̃ and X̃′ arising from Eve's intervention.
For each of the four encoding states (used with probability 1/4) we compute prob(x ≠ x′).
Let us denote measurement outcomes by the corresponding basis states and write for
example, prob(B gets |1i | |α0 i) to denote the probability that B’s measurement result
is 1 given that he received a qubit in state |α0 i etc.
For state |0i sent by Alice in basis B0 to encode bit value 0, we know that Bob will
measure in basis B0 (i.e. same bases for the tilde strings) but in between Eve will have
measured in the Breidbart basis. Thus for this case
prob(x′ ≠ x) = prob(B gets |1⟩ | A sent |0⟩)
= prob(E gets |α₀⟩ | |0⟩) · prob(B gets |1⟩ | |α₀⟩) + prob(E gets |α₁⟩ | |0⟩) · prob(B gets |1⟩ | |α₁⟩)
= |⟨α₀|0⟩|² |⟨1|α₀⟩|² + |⟨α₁|0⟩|² |⟨1|α₁⟩|²
= cos²(π/8) sin²(π/8) + sin²(π/8) cos²(π/8)
= 1/4 .
The other three encoding states similarly give the same result. Thus the eavesdropping
will result in a disturbance amounting to a bit error rate of 25%. This is in fact the
minimal value over all choices of Eve’s basis (cf exercise sheet 2).
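The two numbers quoted above are easy to reproduce numerically; a short check (ours, in Python/numpy):

import numpy as np

c, s = np.cos(np.pi / 8), np.sin(np.pi / 8)
alpha = [np.array([c, s]), np.array([-s, c])]            # Breidbart basis |alpha_0>, |alpha_1>

# the four BB84 signal states |psi_xy>, keyed by (bit x, basis y)
signal = {(0, 0): np.array([1.0, 0.0]), (1, 0): np.array([0.0, 1.0]),
          (0, 1): np.array([1.0, 1.0]) / np.sqrt(2), (1, 1): np.array([1.0, -1.0]) / np.sqrt(2)}
basis = {0: [np.array([1.0, 0.0]), np.array([0.0, 1.0])],
         1: [np.array([1.0, 1.0]) / np.sqrt(2), np.array([1.0, -1.0]) / np.sqrt(2)]}

err, eve_ok = 0.0, 0.0
for (x, y), psi in signal.items():                        # each signal state used with prob 1/4
    for e, a in enumerate(alpha):                         # Eve's outcome, and the state she resends
        p_eve = abs(a @ psi) ** 2
        eve_ok += 0.25 * p_eve * (e == x)                 # Eve's guess e equals Alice's bit x
        p_flip = abs(basis[y][1 - x] @ a) ** 2            # Bob (measuring in basis y) gets the wrong bit
        err += 0.25 * p_eve * p_flip

print(round(err, 4), round(eve_ok, 4))                    # 0.25 and cos^2(pi/8) ~ 0.8536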
Having estimated the bit error rate in step 4, Alice and Bob now perform information
reconciliation to correct the errors (albeit in unknown positions!) in their remaining
strings. This can be achieved using techniques from the theory of error correcting codes
(that we will not discuss in this course) or other methods from classical cryptography.
(Discuss a method in lectures if time permits?)
Information reconciliation leaks further limited information to Eve and the final result is
a pair of shorter strings that are guaranteed to be equal in each position with high prob-
ability, but about which Eve may still have information. Finally privacy amplification is
performed to produce two final strings of still shorter length, about which is guaranteed
with high probability to have no significant information at all.
Eve may have different kinds of information about the string. For example she may know
some specific bits, or parities of some subsets of bits, or some other Boolean function of
bits, or perhaps probabilistic information e.g. that a particular bit or Boolean function
has probability 2/3 of being 0 etc. It is thus prima facie remarkable that privacy amplifi-
cation can be done at all, without publicly revealing the whole string. Here’s an example
to illustrate how it can be achieved in a very simple case.
Example. (optional)
Note first that if Eve knows a bit value x but not y (i.e. y is uniformly random to her)
then the Boolean sum x ⊕ y will be uniformly random to her. More generally she will
have no knowledge of the parity of a subset of bits if she knows only some (and not all)
of the bits (and nothing more).
Now suppose Alice and Bob share a 3-bit string x = x1 x2 x3 and that Eve knows at most
one of the bits and nothing else. Then consider the 2-bit string y = y1 y2 constructed as
parities of some subsets of the x’s:
y1 = x1 ⊕ x3 y2 = x2 ⊕ x3 .
We claim that Eve then knows nothing about the (shorter) string y. Indeed listing all
eight possibilities for x and their corresponding y’s we see that if we select the four table
entries corresponding to any fixed value of any chosen xi , the corresponding four y’s are
always 00, 01, 10 and 11. Thus if Eve knows any single bit of x (and even has knowledge
of Alice and Bob’s formulas for the yi ’s) she will know nothing about y.
The calculation of the yi ’s from the xj ’s can be written as a Boolean matrix multiplication
y = Gx viz.
( y₁ )   ( 1 0 1 ) ( x₁ )
( y₂ ) = ( 0 1 1 ) ( x₂ )
                   ( x₃ )
with the rows of the matrix G corresponding to the subsets of xj ’s whose parities give
the yi ’s. More generally in coding theory one can prove that if Eve knows k bits (and
nothing else) of an n bit string then a Boolean matrix G (i.e. with (0, 1) entries) mapping
n bit strings to m bit strings (n > m) will produce secret y’s iff the minimum weight of
the code generated by G is strictly greater than k.
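The eight-row table mentioned above is easily enumerated; a small sketch (ours, in Python) confirming that y is uniformly distributed given any single known bit of x:

from itertools import product
from collections import Counter

def y_of(x):
    return (x[0] ^ x[2], x[1] ^ x[2])          # y1 = x1 + x3, y2 = x2 + x3 (mod 2)

for i in range(3):                             # Eve knows bit x_{i+1} ...
    for v in (0, 1):                           # ... and its value v
        ys = Counter(y_of(x) for x in product((0, 1), repeat=3) if x[i] == v)
        print(f"known x{i+1}={v}:", dict(ys))  # each of the four y values occurs exactly once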
These ideas can be extended to cover all possible kinds of information that Eve may
have, by using the cryptographic method called universal hashing, to produce a short,
very secret, string y from a longer but less secret string x. One way to achieve this is to
select a random Boolean matrix G of size m × n (with m < n determined by the bit error
rate, that characterises the extent of Eve’s possible information). We can also think of
the choice of G as that of m random subsets of the n bit string x. Then it can be shown
that with high probability Eve will have no significant knowledge of the parities of these
subsets.
We have here really just drawn attention to the ideas of information reconciliation and
privacy amplification, and their significance for the BB84 QKD protocol. Readers inter-
ested in more details should consult the (extensive) classical cryptography literature.
Practical implementation of quantum key distribution
To be briefly discussed in lectures.
6 Basic notions of classical computation
and computational complexity
We begin by setting down the basic notions of classical computation which will later be
readily generalised to provide a precise definition of quantum computation and associated
notions of quantum computational complexity classes.
Computational tasks:
The input to a computation will always be taken to be a bit string. The input size is
the number of bits in the bit string. For example if the input is 0110101 then the input
size is 7. Note that strings from any other alphabet can always be encoded as bit strings
(with only a linear overhead in the length of the string). For example decimal integers
are conventionally represented via their binary representation.
A computational task is not just a single task such as “is 10111 prime?” (where we are
interpreting the bit string as an integer in binary) but a whole family of similar tasks such
as “given an n-bit string A (for any n), is A prime?” The output of a computation is also
a bit string. If this is a single bit (with values variously called 0/1 or “accept/reject” or
“yes/no”) then the computational task is called a decision problem. For computational
complexity considerations (cf later), we will be especially interested in how various kinds
of computational resources (principally time – number of steps, or space – amount of
memory needed) grow as a function of input size n.
Let B = B₁ = {0, 1} and let B_n denote the set of all n-bit strings. Let B* denote the set
of all n-bit strings, for all n, i.e. B* = ∪_{n=1}^∞ B_n. A subset L of B* is called a language.
Thus a decision problem corresponds to the recognition of a language viz. those strings
for which the answer is “yes” or “accept” or 1, denoting membership of L. For example
primality testing as above is the decision problem of recognising the language L ⊆ B ∗
where L is the subset of all bit strings that represent prime numbers in binary. More
general computational tasks have outputs that are bit strings of length > 1. For example
the task FACTOR(x) with input bit string x is required to output a bit string y which
is a (non-trivial) factor of x, or output 1 if x is prime.
Circuit model of classical computation:
There are various possible ways of defining what is meant by a “computation” e.g. the
Turing machine model, the circuit (or gate array) model, cellular automata etc. Although
these look quite different, they can all be shown to be equivalent for the purposes of
assessing the complexity of obtaining the answer for a computational task. Here we will
discuss only the circuit model, as it provides the easiest passage to a notion of quantum
computation.
For each n the computation with inputs of size n begins with the input string x = b1 . . . bn
extended with a number of extra bits all set to 0 viz. b1 . . . bn 00 . . . 0. These latter bits
provide “extra working space” that may be needed in the course of the computation. A
computational step is the application of a designated Boolean operation (or Boolean
gate) to designated bits, thus updating the total bit string. These elementary steps should
be fixed operations and for example, not become more complicated with increasing n. We
restrict the Boolean gates to be AND, OR or NOT. It can be shown that these operations
are universal i.e. any Boolean function f : B_m → B_n at all can be constructed by the
sequential application of just these simple operations. The output of the computation is
the value of some designated subset of bits after the final step.
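As a toy illustration (ours, in Python) of universality: XOR, which is not itself one of our allowed gates, built from AND, OR and NOT alone:

def NOT(a): return 1 - a
def AND(a, b): return a & b
def OR(a, b): return a | b

def XOR(a, b):
    # x XOR y = (x OR y) AND NOT(x AND y)
    return AND(OR(a, b), NOT(AND(a, b)))

print([XOR(a, b) for a in (0, 1) for b in (0, 1)])   # [0, 1, 1, 0]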
Then for each input size n we have a so-called circuit Cn which is just a prescribed
sequence of computational steps. C_n depends only on n and not on the particular
input x of size n. In total we have a circuit family (C₁, C₂, ..., C_n, ...). We think
of Cn as “the computer program” or algorithm for inputs of size n. (There is actually
an extra technical subtlety here that we will just gloss over: we also require that the
descriptions of the circuits Cn should be generated in a suitably simple computational
way as a function of n, giving a so-called uniform circuit family. This prevents us from
“cheating” by coding the answer of some hard computational problem into the changing
structure of Cn with n.)
Randomised classical computations:
It is useful to extend our model of classical computation to also incorporate classical
probabilistic choices (for later comparison with outputs of quantum measurements, that
are generally probabilistic). This is done in the circuit model as follows: for input b1 . . . bn
we extend the starting string b1 . . . bn 00 . . . 0 to b1 . . . bn r1 . . . rk 00 . . . 0 where r1 . . . rk is a
sequence of bits each of which is set to 0 or 1 uniformly at random. If the computation
is repeated with the same input b1 . . . bn the random bits will generally be different.
The output is now a sample from a probability distribution over all possible output
strings, which is generated by the uniformly random choice of r₁...r_k. (Thus any output
probability must always have the form a/2^k for some integer a ≤ 2^k.) Then in specific
computational algorithms we normally require the output to be correct “with suitably
high probability”, specified according to some desired criteria. This formalism with
random input bits can be used to implement probabilistic choices of gates. For example
suppose we wish to apply either AND or OR at some point, chosen with probability half.
Consider the 3-bit gate whose action is as follows: if the first bit is 0 (resp. 1) apply OR
(resp. AND) to the last two bits. Then we use this gate with a random input to the first
bit.
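A sketch (ours, in Python) of this construction:

import random

def controlled_gate(r, a, b):
    """The 3-bit gate: if the control r is 0 apply OR to (a,b), if r is 1 apply AND,
    writing the result into the last bit."""
    return (r, a, (a | b) if r == 0 else (a & b))

def random_or_and(a, b):
    r = random.randint(0, 1)            # a uniformly random input bit
    return controlled_gate(r, a, b)[2]  # OR(a,b) or AND(a,b), each with probability 1/2

print([random_or_and(1, 0) for _ in range(10)])   # a mix of 1's (OR chosen) and 0's (AND chosen)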
Polynomial time complexity classes P and BPP
In computational complexity theory a fundamental issue is the time complexity of algo-
rithms: how many steps (in the worst case) does the algorithm require for any input of
size n? In the circuit model the number of steps on inputs of size n is taken to mean
the total number of gates in the circuit Cn i.e. the size of the circuit Cn . Let T (n) be
the size of Cn , which we also interpret as a measure of the run time of the algorithm
as a function of input size n. We are especially interested in the question of whether
T(n) is bounded by a polynomial function of n (i.e. is T(n) < cn^k for all large n for
some positive constants k, c?) or else, does T(n) grow faster than any polynomial (e.g.
exponential functions such as T(n) = 2^n or 2^√n or n^(log n) have this property).
Remark. (Notations for growth rates in computer science (CS) literature.)
For a positive function T(n) we write T(n) = O(f(n)) if there are positive constants c
and n₀ such that T(n) ≤ cf(n) for all n > n₀, i.e. "T grows no faster than f". We
write T(n) = O(poly(n)) if T(n) = O(n^k) for some constant k, i.e. T grows at most
polynomially with n.
Note that this use of big-O is slightly different from common usage in say calculus, where
instead of n → ∞ we consider x → 0 e.g. writing e^x = 1 + x + O(x²).
In CS usage if T(n) is O(n²) then it is also O(n³) but in calculus O(x²) terms are not
also O(x³).
In the CS literature we also commonly find other notations: we write T (n) = Ω(f (n)) to
mean T (n) ≥ cf (n) for all n > some n0 and some positive constant c, i.e. “T grows at
least as fast as f ”; and we write T (n) = Θ(f (n)) to mean c2 f (n) ≤ T (n) ≤ c1 f (n) for
all n > some n0 and positive constants c1 , c2 , i.e. T is both O(f (n)) and Ω(f (n)), i.e. “T
grows at rate f ”.
In this course we will use only the big-O notation (and not Ω or Θ).
Although computations with any run time T (n) are computable in principle, poly time
computations are regarded as “tractable” or “computable in practice”. The term ef-
ficient algorithm is also synonymous with poly time algorithm. If T (n) is not
polynomially bounded then the computation is regarded as “intractable” or “not com-
putable in practice” as the physical resource of time will, for fairly small n values, exceed
sensibly available limits (e.g. running time on any available computer may exceed the
age of the universe).
We have the following standard terminology for some classes of languages (or sometimes
these terms are applied to algorithms themselves, that satisfy the stated conditions):
P (“poly time”):
class of all languages for which the membership problem has a classical algorithm that
runs in polynomial time and gives the correct answer with certainty.
BPP (“bounded error probabilistic poly time”):
class of all languages whose membership problem has a classical randomised algorithm
that runs in poly time and gives the correct answer with probability at least 2/3 for every
input.
The class BPP is generally viewed as the mathematical formalisation of “decision prob-
lems that are feasible on a classical computer”.
Example: Let the problem FACTOR(N, M) be the following: given an integer N of n digits and M < N, decide if N has a non-trivial factor less than M or not. The fastest known classical algorithm runs in time exp(O(n^{1/3}(log n)^{2/3})) i.e. more than exponential in the cube root of the input size. Thus this problem is not known to be in BPP.
Remark (about the definition of BPP)
We have required the output to be correct with probability 2/3. However it may be shown that "2/3" here may be replaced by any other number 1 − ε that's strictly greater than half without changing the content of the class i.e. if there is a poly time algorithm for a problem that succeeds with probability 1/2 + δ (for any chosen δ > 0, however small) then there is also a poly time algorithm that succeeds with probability 0.500001 or 0.99999 or indeed 1 − ε for any 0 < ε < 1/2 (however small). This result relies on the following fact, sometimes called the amplification lemma (proved using the Chernoff bound for repeated
Bernoulli trials, cf Nielsen and Chuang p154 for a simple proof): if we have an algorithm for a decision problem that works correctly with probability 1/2 + δ then consider repeating the algorithm K times and taking the majority vote of all K answers as our final answer. Then this answer is correct with probability at least 1 − exp(−2δ^2 K), approaching 1 exponentially fast in K. Thus given any ε > 0 this probability will exceed 1 − ε for some constant K, and if the original algorithm had poly running time T(n) then our K-repetition majority vote strategy has running time KT(n) which is still polynomial in n.
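To make the amplification lemma concrete, here is a small Python sketch (my own illustration, with hypothetical parameters) that boosts a 1/2 + δ algorithm by majority voting; empirically the boosted success rate comfortably exceeds the Chernoff lower bound 1 − exp(−2δ²K).

```python
import random, math

def noisy_decide(p_correct):
    """A hypothetical one-shot algorithm that answers correctly with probability p_correct."""
    return random.random() < p_correct

def majority_vote(p_correct, K):
    """Run the one-shot algorithm K times and take the majority answer."""
    return sum(noisy_decide(p_correct) for _ in range(K)) > K / 2

delta, K, trials = 0.05, 201, 20000
boosted = sum(majority_vote(0.5 + delta, K) for _ in range(trials)) / trials
print(boosted, 1 - math.exp(-2 * delta**2 * K))   # empirical success rate vs the Chernoff lower bound
```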
Remark. (Optional. Polynomial space complexity.)
If we replace the computational resource of time (i.e. number of gates or elementary
computational steps) by that of space (i.e. amount of memory or number of bits needed
to perform the computation) then we obtain the complexity class PSPACE, of all de-
cision problems that can be solved within a polynomially bounded amount of space (as
a function of input size) and no imposed restriction on time. It is easy to see that we
have the inclusions P ⊆ BPP ⊆ PSPACE. Indeed any poly time computation occurs
in poly space since poly many one- and two-bit gates can act on at most poly many
bits in total. Similarly in any randomised poly time computation, for each fixed choice
of the random bits, we can perform the associated computation in poly space. Then
doing this sequentially in turn (re-using the same poly space allocation) for each of the
exponentially many choices of the random bits, we can keep a running total of accept
and reject answers, and thus get BPP ⊆ PSPACE.
Astonishingly(!?) it is not known whether any of the preceding inclusions are equalities
or strict inclusions!
In quantum computation (cf later) and the study of its properties relative to classical
computation, there is another computational scenario that is often considered. This is
the formalism of “black box promise problems” with an associated measure of complexity
called “query complexity”.
In this scenario, instead of being given an input bit string of some length n, we are
given as input a black box or oracle that computes some (here Boolean, but sometimes
more general) function f : Bm → Bn . We can query the black box by giving it inputs
and this is the only access we have to the function and its values. No other use of
the box is allowed. In particular we cannot “look inside it” to see its actual operation
and learn information about the function f . Thus, at the start, it is unknown exactly
which function f is, but there is often an a priori promise on f i.e. some stated a priori
restriction on the possible form of f . Our task is to determine some desired property of
f e.g. some feature of the set of all values of f . We want to achieve this by querying
the box the least possible number of times. In our circuits in addition to our usual gates
we may use the black box as a gate, each use counting as just one step of computation.
The query complexity of such an algorithm is simply the number of times that the
oracle is used (as a function of its “size” e.g. as measured by m + n). In addition to the
query complexity we may also be interested in the total time complexity, counting also
the number of gates used to process the answers to the queries in addition to merely the
number of queries themselves.
Example 1 The following are examples of black box promise problems that will be espe-
cially relevant in this course.
The “balanced versus constant” problem
Input: a black box for a Boolean function f : Bn → B (one bit output).
Promise: f is either (a) a constant function (f (x) = 0 for all x or f (x) = 1 for all x)
or (b) a “balanced” function in the sense that f(x) = 0 resp. 1 for exactly half of the 2^n inputs x.
Problem: Determine whether f is balanced or constant. We could ask for the answer
to be correct with certainty or merely with some probability, say 0.99 in every case.
Boolean satisfiability
Input: a black box for a Boolean function f : Bn → B.
Promise: no restriction on the form of f .
Problem: determine whether there is an input x such that f (x) = 1.
Search
Input: a black box for a Boolean function f : Bn → B.
Promise: There is a unique x such that f (x) = 1.
Problem: find this special x.
Periodicity
Input: a black box for a function f : Zn → Zn
(where Zn denotes the set of integers mod n).
Promise: f is periodic i.e. there is a least r such that f (x + r) = f (x) for all x (and +
here denotes addition mod n).
Problem: find the period r.
In each case we are interested in how the minimum number of queries grows as a function
of the natural parameter n (for quantum versus classical algorithms).
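The following Python sketch (not from the notes; the marked item is a hypothetical choice) shows the black-box point of view in code: the algorithm may only call the oracle, and its query complexity is simply the number of such calls.

```python
class Oracle:
    """Black-box wrapper around an unknown function f, counting how often it is queried."""
    def __init__(self, f):
        self._f = f
        self.queries = 0

    def __call__(self, x):
        self.queries += 1
        return self._f(x)

# e.g. the Search problem: find the unique x with f(x) = 1 by classical exhaustive querying
n = 4
box = Oracle(lambda x: 1 if x == 11 else 0)      # hypothetical hidden marked item x = 11
found = next(x for x in range(2 ** n) if box(x) == 1)
print(found, box.queries)                         # 11 12  (12 queries were needed)
```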
(in the computational basis) on a specified subset of the qubits (this being part of the
description of Cn ).
Remark: More generally we could allow measurements along the way (rather than only
at the end) and allow the choice of subsequent gates to depend on the measurement
outcomes. However it can be shown that this further generality adds nothing extra:
any such circuit can be re-expressed as an equivalent circuit in which measurements are
performed at the end only.
A quantum computation or quantum algorithm is defined by a (uniform) family of quan-
tum circuits (C1 , C2 , . . .).
Classical or quantum circuits can be depicted pictorially as a circuit diagram. Each input
bit or qubit is represented by a horizontal line running across the diagram, which is read
from left to right. The applied gates are represented by labelled boxes (or other symbols
attached to the relevant lines), read in order from left to right.
Universal sets of quantum gates
In classical computation we restrict our circuits to be composed of gates chosen from
a (small) universal set that act on only a few bits each. One such choice is the set
{NOT, AND, OR}. Actually OR may even be deleted from this set since b1 OR b2 =
NOT(NOT(b1 ) AND NOT(b2 )).
Remark (optional): It may be shown that no set of 2-bit reversible gates is universal (see Preskill p241-2) but there are 3-bit reversible gates G that are universal even just by themselves i.e. any reversible Boolean function may be constructed as a circuit of G's alone, so long as we have available constant extra inputs set to 0 or 1. Two examples of such gates (assuming our starting bit string can also have bits set to 1 in the extra working space) are the Fredkin gate F(0 b2 b3) = 0 b2 b3 and F(1 b2 b3) = 1 b3 b2, i.e. a controlled SWAP, controlled by the value of the first bit, and the Toffoli gate Toff(0 b2 b3) = 0 b2 b3 and Toff(1 b2 b3) = 1 CX(b2 b3), i.e. a controlled-controlled-X gate in which X is applied to bit 3 iff the first two bits are both 1 and Toff is the identity otherwise.
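As a quick sanity check of these definitions, here is a small Python sketch (illustration only) of the two gates as reversible functions on bit triples, together with the standard trick of computing AND with a work bit set to 0.

```python
def toffoli(b1, b2, b3):
    """Toffoli: flip b3 iff b1 = b2 = 1 (controlled-controlled-X)."""
    return b1, b2, b3 ^ (b1 & b2)

def fredkin(b1, b2, b3):
    """Fredkin: swap b2 and b3 iff b1 = 1 (controlled-SWAP)."""
    return (b1, b3, b2) if b1 == 1 else (b1, b2, b3)

# AND from Toffoli using a constant work bit 0: the third output bit is b1 AND b2
print(toffoli(1, 1, 0))                                  # (1, 1, 1)
# both gates are self-inverse (hence reversible)
print(toffoli(*toffoli(1, 0, 1)) == (1, 0, 1))           # True
print(fredkin(*fredkin(1, 0, 1)) == (1, 0, 1))           # True
```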
Approximately universal sets of quantum gates
In the quantum case all gates are reversible (unitary) by definition and there are similar
universality results but the situation is a little more complicated: quantum gates are
parameterised by continuous parameters (in contrast to classical gates which form a dis-
crete set) so no finite set can generate them all exactly via (even unboundedly large) finite
circuits. But many small finite sets of quantum gates are still approximately universal in
the sense that they can generate any unitary gate to within any prescribed accuracy ε > 0.
Such approximations (for suitably small ) will suffice for all our purposes and for clarity
of discussion we will generally ignore this issue of approximability and just allow use of
any exact gate that we need.
More precisely, introduce a notion of closeness of unitary operators U and V (on the same space) by defining ||U − V|| ≤ ε to mean that max ||U|ψ⟩ − V|ψ⟩|| ≤ ε. Here the maximum is taken over all normalised vectors |ψ⟩ and ||..|| in the maximum is the usual length of vectors. Then a set of quantum gates (acting on qubits) is defined to be approximately universal if for any unitary W on any number n of qubits and any ε > 0 there is a circuit C of the given gates whose overall unitary action (also denoted C) satisfies ||W − C|| ≤ ε. The set is called exactly universal if we can take ε = 0 in the preceding condition.
For either exactly or approximately universal sets of gates, the size of the circuit C for W will generally be exponential in the number of qubits n on which W acts (but in some important special cases of W it can be poly-sized, e.g. notably for the quantum Fourier transform on n qubits, cf later!). Another important issue is how the size of C grows with decreasing accuracy parameter ε. Here we just quote a fundamental result: the Solovay-Kitaev theorem asserts that if G is an approximately universal set of gates, then (under some further mild technical conditions on G) the size of C can be taken to be bounded by poly(log(1/ε)) as a function of the accuracy parameter, i.e. polynomial in the number of digits log(1/ε) of accuracy (for a proof see e.g. the appendix in Nielsen and Chuang). The degree of the polynomial p here depends (generally exponentially) on the number of qubits n of W, but for fixed n, p does not depend on the gate W being approximated.
Remark (optional)
Some examples of approximately universal sets of quantum gates are the following:
{CX, all 1-qubit gates},   {CX, H, T}  where T = diag(1, e^{iπ/4}),   and   {Toffoli 3-qubit gate, H},
the latter actually being universal for all gates with real entries, which can be shown to suffice for full universal quantum computation i.e. for any quantum circuit there is a corresponding circuit comprising only real gates, that generates the same output probability distribution for any computational basis input. The infinite set {CX, all 1-qubit gates} is actually exactly universal too (with continuous parameters provided by the 1-qubit gates). For more details and proofs see Nielsen and Chuang §4.5.
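The distance ||U − V|| defined above equals the largest singular value of U − V, so it is easy to compute numerically. A small numpy sketch (my own illustration) for two of the gates just named:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
T = np.array([[1, 0], [0, np.exp(1j * np.pi / 4)]])

def distance(U, V):
    """max over unit |psi> of ||(U - V)|psi>|| = largest singular value of U - V."""
    return np.linalg.norm(U - V, 2)

# T is a "small" phase rotation: T^8 equals the identity exactly, while T itself is not close to I
print(distance(np.linalg.matrix_power(T, 8), np.eye(2)))   # ~ 0 (up to rounding)
print(distance(T, np.eye(2)))                               # = |e^{i pi/4} - 1| ≈ 0.765
```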
Polynomial time quantum computations and BQP
The complexity class BQP (bounded error quantum polynomial time) is defined as a
direct generalisation of BPP viz. BQP is the class of languages L such that there is
a polynomial time quantum algorithm for deciding membership of L i.e. for each input
size n we have a quantum circuit Cn whose size is bounded by poly(n) and for any input
string the output answer is correct with probability at least 2/3.
BQP is our mathematical formalisation of “computations that are feasible on a quantum
computer”. From the definitions it can be shown that BPP ⊆ BQP (any poly sized
classical circuit can be replaced by an equivalent circuit of classical reversible gates,
still of poly size and the latter is also a quantum circuit albeit comprising gates that
preserve the computational basis as a set). Thus with these computational definitions
the question “Is quantum computing more powerful than classical computing?” can
be expressed formally as “Is BQP strictly larger than BPP?”. This question remains
unsolved although it is generally believed that the classes are unequal. For example the
decision problem FACTOR(N, M) (viz. does N have a nontrivial factor less than M?) is in BQP (as we'll see in detail later) but it is not known to be in BPP (although we have no proof that it is not in BPP!).
More generally we will be especially interested in any kind of computational task that
can demonstrate any kind of computational resource benefit (especially an exponential
benefit) for solution by quantum vs. classical computation. Historically the notion
of query complexity and promise problems that we introduced above, provided the first
source of such examples, and we’ll consider some of them as our first quantum algorithms
below. But before giving explicit algorithms we need a further result about black boxes
(oracles) in the context of quantum vs. classical computations.
Reversible version of any Boolean function
If f : Bm → Bn is any Boolean function it can be expressed in an equivalent reversible
form f˜ : Bm+n → Bm+n as follows. We introduce an addition operation, denoted ⊕, for
n-bit strings: if b = b1 . . . bn and c = c1 . . . cn then b ⊕ c = (b1 ⊕ c1 ) . . . (bn ⊕ cn ) i.e. b ⊕ c
is the n-bit string obtained by adding mod 2, the corresponding bits of b and c in each
slot separately. For example 011 ⊕ 110 = 101. Note that for any n-bit string we have
b ⊕ b = 0 . . . 0 where 0 . . . 0 denotes the n-bits string of all zeroes.
Now for any f : Bm → Bn define f˜ : Bm+n → Bm+n by
f˜(b, c) = (b, c ⊕ f (b)) for any m-bit string b and any n-bit string c.
Note that f˜ is easily computable if we can compute f and the (simple) addition operation
⊕ on bit strings. Conversely given f˜ we can easily recover f (b) for any b by setting
c = 0 . . . 0 and looking at the last n bits of output of f˜.
Furthermore we have the key property: for any f, f̃ is a reversible (i.e. invertible) function on m + n bits. In fact f̃ is always self-inverse i.e. f̃ applied twice is the identity operation (an easy consequence of the fact that b ⊕ b = 00 . . . 0 for any bit string b).
It should be intuitively clear that any classical algorithm using an oracle for f can be
equally well performed using an oracle for the reversible version f˜ instead. In quantum
computation, gates are always reversible (unitary) by definition so we will always use
(a quantum version of) f˜ for any oracle problem involving f . More specifically the
quantum oracle for any Boolean function f : Bn → Bm will be the quantum gate
denoted Uf on n + m qubits, defined by its action on basis states as follows:
Uf : |x⟩ |y⟩ → |x⟩ |y ⊕ f(x)⟩   for x ∈ Bn and y ∈ Bm,
i.e. Uf acts exactly like the classical function f̃ on the labels (x, y) ∈ Bn+m of the
computational basis states (and it acts on arbitrary states of n + m qubits by linear
extension). We sometimes refer to the n-qubit register |xi and the m-qubit register |yi
as the input and output registers respectively.
Remark. Uf as defined above is always guaranteed to be a unitary operation. Indeed if
g : Bk → Bk is any reversible Boolean function on k bits then it is just a permutation of
all k-bit strings. Hence the linear map V on k qubits defined by V |i1 . . . ik i = |g(i1 . . . ik )i
will be represented by a permutation matrix in the computational basis i.e. each column
is all 0’s with a single 1 entry, and different columns have the 1 entry in different rows,
so V is unitary.
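To make this concrete, here is a short numpy sketch (illustration, with a hypothetical f) that builds Uf as a permutation matrix in the computational basis, taking the basis index of |x⟩|y⟩ to be x·2^m + y, and checks that it is self-inverse and unitary.

```python
import numpy as np

def U_f(f, n, m):
    """Permutation matrix of the oracle U_f |x>|y> = |x>|y XOR f(x)> on n + m qubits."""
    dim = 2 ** (n + m)
    U = np.zeros((dim, dim))
    for x in range(2 ** n):
        for y in range(2 ** m):
            U[x * 2 ** m + (y ^ f(x)), x * 2 ** m + y] = 1.0
    return U

f = lambda x: x % 2                     # a hypothetical 2-bit -> 1-bit Boolean function
U = U_f(f, n=2, m=1)
print(np.allclose(U @ U, np.eye(8)))    # self-inverse: True
print(np.allclose(U.T @ U, np.eye(8)))  # unitary (a real permutation matrix): True
```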
Computation by quantum parallelism
Note that as a quantum operation (in contrast to classical oracles) Uf can act on (jointly)
superposed inputs of both registers. Indeed if we set the input register to an equal
superposition of all 2^n possible n-bit strings we get (by linearity)
Uf : (1/√2^n) Σ_{all x} |x⟩ |0⟩ → |f⟩ ≡ (1/√2^n) Σ_{all x} |x⟩ |f(x)⟩
i.e. in one run of Uf we obtain a final state which depends on all of the function
values. Such a computation on superposed inputs is called computation by quantum
parallelism. By further quantum processing and measurement on the state |f i we are
able to obtain “global” information about the nature of the function f (e.g. determine
some joint properties of all the values) with just one run of Uf , and these properties
may be difficult to get classically without many classical evaluations of f (as each such
evaluation reveals only one further value). This simple idea of running computations
in quantum superposition is a powerful ingredient in quantum vs. classical algorithms.
In Appendix 1 (at the end of the notes) we discuss some further issues relating to the
interpretation of superpositions in quantum computation.
It is instructive to consider more explicitly how we can actually create the input state
of a uniform superposition over all x values that is needed in the above process. Recall
that H|0⟩ = (1/√2)(|0⟩ + |1⟩) so if we apply H to each of n qubits initially in state |0⟩ and multiply out all the state tensor products, we get
H ⊗ . . . ⊗ H (|0⟩ . . . |0⟩) = (1/√2^n) (|0⟩ + |1⟩) . . . (|0⟩ + |1⟩)
= (1/√2^n) Σ_{x1,x2,...,xn = 0,1} |x1⟩ |x2⟩ . . . |xn⟩ = (1/√2^n) Σ_{x∈Bn} |x⟩ .
An important feature of this process (recalling the fundamental significance of poly vs.
exponential growth in complexity theory) is that we have created a superposition of
exponentially many (viz. 2n ) terms with only a linear number of elementary operations
viz. application of H just n times.
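In a state-vector simulation this construction is just n Kronecker products; a minimal numpy sketch (illustration only):

```python
import numpy as np

n = 3
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
zero = np.array([1.0, 0.0])            # the 1-qubit state |0>

# apply H to each of n qubits in |0>: n elementary gates produce all 2^n amplitudes
state = np.array([1.0])
for _ in range(n):
    state = np.kron(state, H @ zero)
print(state)                           # 8 equal amplitudes, each 1/sqrt(2^n) ≈ 0.3536
```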
Our first (and historically the first, from 1992) example of an exponential benefit of
quantum over classical computation is a quantum algorithm (the so-called DJ algorithm)
for the “balanced vs. constant” black-box promise problem, which we re-iterate here:
The “balanced versus constant” problem
Input: a black box for a Boolean function f : Bn → B (one bit output).
Promise: f is either (a) a constant function (f (x) = 0 for all x or f (x) = 1 for all x)
or (b) a “balanced” function in the sense that f(x) = 0 resp. 1 for exactly half of the 2^n inputs x.
Problem: Determine (with certainty) whether f is balanced or constant.
(At the end we will discuss the bounded error version of the problem i.e. requiring the
correct solution only with some high probability, 0.999 say).
A little thought (hmm...) shows that classically 2^n/2 + 1 queries (i.e. exponentially many) are necessary and sufficient to solve the problem with certainty in the worst case. Sufficiency is clear (why?). For necessity, suppose we have a deterministic classical
algorithm that purports to solve this problem with certainty in every case, for any f satisfying the promise, while making K ≤ 2^n/2 queries. Here the choice of query may even adaptively depend in any way on the results of previous queries too.
A devious adversary (who is supposed to hold a function f) can force this algorithm to fail as follows (thus showing necessity): when the algorithm is applied to him, he actually hasn't a priori chosen his function f yet but simply answers 0 for all queries. At the end his function has been fixed on K inputs, but if K ≤ 2^n/2 he is still free to complete the definition of his function to be either constant or balanced, and have it contradict whatever conclusion the algorithm reached.
Similarly for any probabilistic classical algorithm whose final output is required still to be
correct with certainty (although probabilistic choices may be used along the way), each
of the probabilistic branches of the algorithm must work with certainty themselves and
the above argument applies to them, showing again that the number of queries (on any
probabilistic branch) must be at least 2^n/2 + 1.
We now show that in the quantum scenario, just one query suffices (with O(n) extra processing steps) in every case! Our quantum black box is Uf : |x⟩|y⟩ → |x⟩|y ⊕ f(x)⟩ as above, where the input register |x⟩ comprises n qubits and the output register |y⟩ comprises a single qubit. We assume that initially all (n + 1) qubits are in standard state |0⟩. We begin by constructing an equal superposition of all n-bit strings in the input register (as described above) and (surprisingly!) set the output register to the state |−⟩ = (1/√2)(|0⟩ − |1⟩). The latter is achieved by applying X and then H to the output qubit, initially in state |0⟩. Thus we have the (n + 1)-qubit state
( (1/√2^n) Σ_{x∈Bn} |x⟩ ) |−⟩ .
Now consider the action of Uf on a single term |x⟩ |−⟩:
Uf : |x⟩ (1/√2)(|0⟩ − |1⟩) → |x⟩ (1/√2)(|0 ⊕ f(x)⟩ − |1 ⊕ f(x)⟩) = (−1)^{f(x)} |x⟩ |−⟩
i.e. we just get a minus sign on |x⟩ if f(x) = 1 and no change if f(x) = 0. This process of obtaining the (−1)^{f(x)} sign is sometimes called "phase kickback". Hence on the full superposition we get
Uf : ( (1/√2^n) Σ_{x∈Bn} |x⟩ ) |−⟩ −→ ( (1/√2^n) Σ_{x∈Bn} (−1)^{f(x)} |x⟩ ) |−⟩ .
This is a product state of the n-qubit input and single qubit output registers. Discarding
the last (output) qubit we get the n-qubit state
|f⟩ ≡ (1/√2^n) Σ_{x∈Bn} (−1)^{f(x)} |x⟩ .   (4)
If f is balanced then the sum in eq. (4) contains an equal number of plus and minus terms, with minus signs sprinkled in some unknown locations along the 2^n terms. But if we take the inner product of |f⟩ with Σ_x |x⟩ we simply add up all the coefficients in |f⟩ (as ⟨x|y⟩ = 0 if x ≠ y and = 1 if x = y) and wherever the minus signs occur, the total sum is always zero i.e. if f is balanced then |f⟩ is orthogonal to Σ_x |x⟩. Hence if we apply the unitary operation Hn (which preserves inner products), Hn|f⟩ will be orthogonal to Hn((1/√2^n) Σ_x |x⟩) = |0 . . . 0⟩. Hence Hn|f⟩ must have the form Σ_{x≠0...0} a_x |x⟩, having the all-zero term absent. (Note also that if f is constant then |f⟩ = ±(1/√2^n) Σ_x |x⟩ so Hn|f⟩ = ±|0 . . . 0⟩ with certainty.) Generally Hn|f⟩ will be a superposition of many basis states but for some special balanced functions f it can have the form |x⟩ ≠ |0 . . . 0⟩ comprising a single basis state. See exercise sheet 3 (the Bernstein-Vazirani problem) for more details.
In view of the above discussion, having constructed |f i (for our given black box) we
apply Hn and measure the n qubits in the computational basis. If the result is 0 . . . 0
then f was certainly constant and if the result is any non-zero string x1 . . . xn ≠ 0 . . . 0
then f was certainly balanced. Hence we have solved the problem with one query to f
and (3n + 2) further operations: (n + 1) H’s and one X to make the input state for Uf ,
n H’s on |f i and n single qubit measurements to get the classical output string.
[Figure: circuit diagram for the DJ algorithm. Each of the n input qubits |0⟩ is passed through H; all n + 1 qubits enter Uf (the last qubit having first been prepared as |−⟩ by X then H); then H is applied to each of the first n qubits, which are measured to give x1 . . . xn, and the last qubit is discarded.]
The state |A⟩ just before Uf is the equal superposition state (1/√2^n) Σ_{x∈Bn} |x⟩ in the first n qubits and |−⟩ in the last qubit. The action of Uf then produces the state |f⟩ as in the text above.
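A direct state-vector simulation of the algorithm is easy to write. The numpy sketch below (my own illustration; the test functions are hypothetical) uses the phase-kickback form directly — after the oracle the n-qubit state is |f⟩ of eq. (4) — then applies Hn and reads off the probability of the all-zero outcome, which is 1 for constant f and 0 for balanced f.

```python
import numpy as np

def deutsch_jozsa_is_constant(f, n):
    """Simulate DJ: prepare |f> = 2^{-n/2} sum_x (-1)^f(x) |x>, apply H^(tensor n),
    and check whether the all-zero outcome occurs (probability 1 iff f is constant)."""
    N = 2 ** n
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    Hn = np.array([[1.0]])
    for _ in range(n):
        Hn = np.kron(Hn, H)
    state = np.array([(-1) ** f(x) for x in range(N)]) / np.sqrt(N)
    prob_all_zero = abs((Hn @ state)[0]) ** 2
    return prob_all_zero > 0.5

constant_f = lambda x: 1
balanced_f = lambda x: bin(x).count("1") % 2        # parity: a balanced function
print(deutsch_jozsa_is_constant(constant_f, 4))      # True
print(deutsch_jozsa_is_constant(balanced_f, 4))      # False
```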
The balanced versus constant problem with bounded error
Suppose we tolerate some error i.e. require our algorithm to correctly distinguish balanced versus constant functions only with probability > 1 − ε for some ε > 0. Then the above (single query) algorithm still works (as it has ε = 0) but there is now a classical (randomised) algorithm that solves the problem with only a constant number of queries (depending on ε as O(log 1/ε), for any n and for any fixed ε > 0). Thus we lose the all-interesting exponential gap between classical and quantum query complexities in this bounded error scenario. The classical algorithm is the following: we pick K x values,
each chosen independently uniformly at random and evaluate the corresponding f values.
If they are all 0 or all 1, output “f is constant”. If we get at least one instance of each of
0 and 1, output “f is balanced”. Clearly the second output must always be correct (as a
constant function can never output both values). But the first output (“f is constant”)
can be erroneous. Suppose f is a balanced function. Then each random value f(x) has probability half to be 0 or 1. So the probability that K random values are all 0 or all 1 is 2/2^K = 1/2^{K−1}. This is < ε if 1/2^{K−1} < ε i.e. K > log(1/ε) + 1, i.e. K = O(log 1/ε) suffices to guarantee error probability < ε in every case, for all n.
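The classical procedure just described is only a few lines of Python (illustration only, with a hypothetical test function):

```python
import random

def classical_test(f, n, K):
    """Sample K uniformly random inputs; output 'constant' iff all sampled values agree.
    Errs only on balanced f, with probability 2/2^K = 1/2^(K-1)."""
    values = {f(random.randrange(2 ** n)) for _ in range(K)}
    return "constant" if len(values) == 1 else "balanced"

balanced_f = lambda x: x & 1                    # a hypothetical balanced function (last bit of x)
print(classical_test(balanced_f, n=20, K=30))   # "balanced", except with probability ~ 2^-29
```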
So does the above prove conclusively that quantum computation can be exponentially
more powerful (in terms of time complexity) than classical computation? We point out
two important shortcomings in this claim.
(i) The first weakness is that if we allow any level of error in the result, however small,
we lose the exponential separation between classical and quantum algorithm running
times (as described in the previous paragraph). But the exactly-zero error scenario in
computation is an unrealistic idealisation and for realistic computation we should always
accept some (suitably small) level of error – physical computers never work perfectly and
in the quantum case for example, gates depend on continuous parameters that cannot
be physically set with infinite precision. However this weakness can be fully addressed:
there exist other black box promise problems for which a provable exponential separation
exists between classical and quantum query complexity even in the presence of error. An
example is the so-called Simon’s quantum algorithm, which we will outline below.
(ii) A second (more serious) issue is the fact that the DJ problem is only a black box
problem (with the black box’s interior workings being inaccessible to us) rather than
a straightforward “standard” computational task with a bit string as input, and no
“hidden” ingredients. To convert it to a standard task we would want a class of Boolean
functions fn : Bn → B such that the balanced/constant decision is hard classically (e.g.
takes exponential time in n) even if we have full access to a description of the function
e.g. a formula for it or a circuit Cn that computes fn . Note that even a constant function
can be presented to us in such a perversely complicated way that its trivial action is
hard to recognise! Alas, no such (provably hard) class of Boolean function descriptions
is known.
So, are there any “standard” computational tasks for which we can prove the existence
of an exponential speed-up for quantum versus classical computation? No such absolute
proofs are known but the difficulty seems to be largely within the classical theory: even
though many problems have only exponential-time known classical algorithms, they can-
not be proven to be hard classically i.e. we cannot prove that no poly-time algorithm
exists (that we have not yet discovered!) – a glaring instance of this classical theory
shortcoming is the notorious fact that it is unproven that the class NP (cf later for more
about this class) or even PSPACE, is strictly larger than P. (PSPACE intuitively is
the class of languages that can be decided with an algorithm that uses a polynomial
amount of space or memory, and can thus generally run for exponential time). However
there are problems which are believed to be hard for classical computation (i.e. no clas-
sical poly-time algorithm, even with bounded error, is known despite much effort) for
which poly-time quantum algorithms do exist. A centrally important such problem is in-
teger factorisation. Below we will describe Shor’s polynomial time quantum algorithm for
factorisation after we introduce the fundamentally important construct of the quantum
Fourier transform, which is at the heart of the workings of Shor’s algorithm.
9 The quantum Fourier transform and periodicities
The quantum Fourier transform (QFT) can be viewed as a generalisation of the Hadamard
operation to dimensions N > 2. Later we will be especially interested in N = 2^n i.e. the QFT on an n-qubit space. As a pure mathematical construction it is the same as
the so-called discrete Fourier transform which is widely used in digital signal and image
processing. It is a unitary matrix that arises naturally in a wide variety of mathematical
situations so it fits well into the quantum formalism, providing a bridge between a quan-
tum operation and certain mathematical problems. In fact QFT is at the heart of most
known quantum algorithms that provide a significant speedup over classical computation.
Let HN denote a state space with an orthonormal basis (the computational basis)
|0i , |1i , . . . , |N − 1i labelled by ZN . The quantum Fourier transform (QFT) modulo N ,
denoted QFTN (or just QFT when N is clear) is the unitary transform on HN defined
by:
QFT : |x⟩ → (1/√N) Σ_{y=0}^{N−1} exp(2πi xy/N) |y⟩   (6)
(note: here we are labelling rows and columns from 0 to N − 1 in ZN rather than 1 to N.) If ω = e^{2πi/N} is the primitive N-th root of unity then the matrix elements are all powers of ω (divided by √N) following a simple pattern:
• The initial row and column always contain only 1’s.
• Each row (or column) is a geometric sequence. The k-th row (or column) for k = 0, . . . , N − 1 is the sequence of powers of ω^k (starting with power 0 up to power N − 1).
Remark: Note that QFT4 is different from H ⊗ H and generally QFT_{2^n} differs from Hn = H ⊗ . . . ⊗ H. However in group representation theory a Fourier transform can be
defined on any group, which embraces both of these constructs as special cases – on a
set of 4 elements there are two (non-isomorphic) group structures viz. Z2 × Z2 and Z4
(addition of integers mod 4), and then H ⊗ H and QFT4 are respectively the Fourier
transforms for these two different group structures. In this course QFTN will always
mean “Fourier transform on the group ZN ”, as defined above in eq. (6).
Many properties of QFT, including the fact that it is unitary, follow from a basic algebraic
fact about roots of unity and geometric series. Recall the formula for the sum of any geometric series:
1 + α + α^2 + . . . + α^{N−1} = (1 − α^N)/(1 − α)   if α ≠ 1,
1 + α + α^2 + . . . + α^{N−1} = N   if α = 1.
Then setting α = ω^K (for some chosen K) we have α = 1 iff K is a multiple of N. Thus
1 + ω^K + ω^{2K} + . . . + ω^{(N−1)K} = N if K is a multiple of N, and = 0 if K is not a multiple of N.   (7)
Now to see that QFT is unitary, consider the (a, b)-th element of the matrix product QFT†QFT. This is the sum of "the a-th row of QFT† lined up against the b-th column of QFT", which is just the geometric series with α = ω^{b−a}, divided by N. So using eq. (7) we get 0/N = 0 if b ≠ a and we get N/N = 1 if b = a i.e. QFT†QFT is the identity matrix and QFT is unitary.
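A direct numpy construction of the matrix of QFT_N (my own illustration) confirms the unitarity just argued, and also the earlier remark that QFT4 differs from H ⊗ H:

```python
import numpy as np

def qft_matrix(N):
    """The N x N matrix of QFT mod N: entry (y, x) = omega^(x*y)/sqrt(N), omega = e^{2 pi i/N}."""
    omega = np.exp(2j * np.pi / N)
    return np.array([[omega ** (x * y) for x in range(N)] for y in range(N)]) / np.sqrt(N)

F8 = qft_matrix(8)
print(np.allclose(F8.conj().T @ F8, np.eye(8)))     # unitary: True
H = qft_matrix(2)                                   # QFT mod 2 is just the Hadamard gate H
print(np.allclose(qft_matrix(4), np.kron(H, H)))    # is QFT4 equal to H (x) H ?  False
```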
Here |per⟩ = (1/√A) Σ_{j=0}^{A−1} |x0 + jr⟩ (with N = Ar), where 0 ≤ x0 ≤ r − 1 has been chosen uniformly at random (by the extended Born rule, since each possible value y of f occurs the same number A of times i.e. once in each period.) If we measure the register of |per⟩ we will see x0 + j0 r where j0 has been picked
uniformly at random too. Thus we have a random period (the j0th period) and a random
element in it (determined by x0 ) i.e. overall we get a random number between 0 and
N − 1, giving no information about r at all. Nevertheless the state |peri seems to contain
the information of r!
The resolution of this problem is to use the Fourier transform which is known even
in classical image processing, to be able to pick up periodicities in a periodic pattern
irrespective of an overall random shift of the pattern (e.g. the x0 in |peri). Applying
QFT to |peri we get (using eq. (6) with x replaced by x0 + jr, and summing over j):
QFT |per⟩ = (1/√(NA)) Σ_{j=0}^{A−1} Σ_{y=0}^{N−1} ω^{(x0+jr)y} |y⟩ = (1/√(NA)) Σ_{y=0}^{N−1} ω^{x0 y} [ Σ_{j=0}^{A−1} ω^{jry} ] |y⟩ .   (8)
(In the last equality we have reversed the order of summation and factored out the j-independent ω^{x0 y} terms). Which labels y appear here with nonzero amplitude? Look at the square-bracketed coefficient of |y⟩ in eq. (8). It is a geometric series with powers of α = e^{2πiry/N} = (e^{2πi/A})^y summed from power 0 to power A − 1. According to eq. (7) (now applied with A taking the role of N there) this sum is zero whenever y is not a multiple of A and the sum is A otherwise, i.e. only multiples of A = N/r survive as y values:
Σ_{j=0}^{A−1} ω^{jry} = A if y = kN/r for k = 0, . . . , r − 1, and = 0 otherwise,
and
QFT |per⟩ = √(A/N) Σ_{k=0}^{r−1} ω^{x0 (kN/r)} |kN/r⟩ .
The random shift x0 has been eliminated from the labels and now occurs only in a pure
phase ω x0 kN/r (whose modulus squared is 1), and the periodicity of the ket labels has
been “inverted” from r to A = N/r. Since measurement probabilities are squared moduli
of the amplitudes, these probabilities are now independent of x0 and depend only on N
(known) and r (to be determined). This is represented schematically in the following
diagram.
[Figure: schematic plots of the measurement probabilities against the label values. Before the QFT the probability is concentrated on the labels x0, x0 + r, x0 + 2r, . . . (spacing r, with the random offset x0); after the QFT it is concentrated on the labels 0, N/r, 2N/r, . . . (spacing N/r, with no offset).]
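This "inversion of the periodicity" can also be checked numerically. A small numpy sketch (illustration with hypothetical values N = 16, r = 4, x0 = 3) builds |per⟩, applies the QFT matrix, and finds all the probability sitting on multiples of N/r, independently of x0:

```python
import numpy as np

N, r, x0 = 16, 4, 3                       # hypothetical example with r dividing N
A = N // r
omega = np.exp(2j * np.pi / N)
F = np.array([[omega ** (x * y) for x in range(N)] for y in range(N)]) / np.sqrt(N)

per = np.zeros(N, dtype=complex)
per[[x0 + j * r for j in range(A)]] = 1 / np.sqrt(A)    # |per> = (1/sqrt(A)) sum_j |x0 + j r>

probs = np.abs(F @ per) ** 2
print(np.nonzero(probs > 1e-12)[0])       # labels with nonzero probability: [ 0  4  8 12]
print(np.round(probs[probs > 1e-12], 3))  # each occurs with probability 1/r = 0.25
```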
If we now measure the label we will obtain a value c which is a multiple k0 N/r of N/r where 0 ≤ k0 ≤ r − 1 has been chosen uniformly at random. Thus c = k0 N/r so
k0/r = c/N .
Here c and N are known and k0 is unknown and random, so how do we get r out of this? If (by some good fortune!) k0 was coprime to r we could cancel c/N down to lowest terms and read off r as the denominator. If k0 is not coprime to r then this procedure will deliver a denominator r′ that is smaller than the correct r, so f(x) ≠ f(x + r′) for any x. Thus in our process we check the output r value by evaluating f(0) and f(r) and accepting r as the correct period iff these are equal.
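The classical post-processing of the measured value c is simple; here is a Python sketch (illustration, with a hypothetical periodic function) using exact fraction cancellation:

```python
from fractions import Fraction

def candidate_period(c, N, f):
    """Cancel c/N to lowest terms, read the denominator as a candidate period,
    and accept it only if f(candidate) == f(0) (the check described in the text)."""
    if c == 0:
        return None                            # uninformative outcome; rerun the algorithm
    cand = Fraction(c, N).denominator
    return cand if f(cand) == f(0) else None

N, r = 24, 6
f = lambda k: k % r                            # hypothetical function of period 6 on Z_24
print(candidate_period(3 * (N // r), N, f))    # k0 = 3 shares a factor with r: rejected -> None
print(candidate_period(5 * (N // r), N, f))    # k0 = 5 coprime to r: returns the true period 6
```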
But k0 was chosen at random so what is the chance of getting this good fortune of
coprimality? We’ll use (without proof) the following theorem from number theory:
Theorem 1 (Coprimality theorem) The number of integers less than r that are coprime
to r grows as O(r/ log log r) with increasing r. Hence if k0 < r is chosen at random
prob(k0 coprime to r) ≈ O((r/ log log r)/r) = O(1/ log log r).
Thus if we repeat the whole process O(log log r) < O(log log N ) times we will obtain a
coprime k0 in at least one case with a constant level of probability. Here we have used
the following fact from probability theory:
Lemma 1 If a single trial has success probability p and we repeat the trial M times independently then for any constant 0 < 1 − ε < 1:
prob(at least one success in M trials) > 1 − ε   if M = (− log ε)/p .
Proof of lemma We have that the probability of at least one success in M runs = 1 − prob(all runs fail) = 1 − (1 − p)^M. Then 1 − (1 − p)^M = 1 − ε if M = (− log ε)/(− log(1 − p)). Next use the fact that p < − log(1 − p) for all 0 < p < 1 to see that M < (− log ε)/p i.e. M = O(1/p) repetitions suffice.
In each round we query f three times (once at the start to make |f i and twice more at the
end to check the output r) so we use O(log log N ) queries in all. We also need to apply the
“large” unitary gate QFTN (which grows with N ) and we show in the next section that
this may be implemented in O((log N)^2) elementary steps. The remaining operations
are all familiar arithmetic operations on integers of size O(N ) (such as cancelling c/N
down to lowest form) that are all well known to be computable in polynomial time i.e.
poly(log N) steps. Thus we succeed in determining the period with any constant level 1 − ε of probability with O(log log N) queries and O(poly(log N)) further computational steps.
We have described above the quantum algorithm for periodicity determination, for pe-
riodic functions on Zn which will form the core of Shor’s efficient quantum algorithm
for integer factorisation (cf below). But the basic problem of periodicity determination
may be mathematically generalised in a natural way from Zn to an arbitrary group G as
the so-called hidden subgroup problem (beyond the scope of this course). This formalism
leads to a class of further important quantum algorithms of which Simon’s algorithm and
the above Zn case are special cases.
arithmetic of n bit strings that we used earlier!)
and we want to insert the expression for xy/2n from eq. (9) into the exponential.
Since eq. (9) is a sum over the different yi's, the exponential will be a product of these terms and hence the sum Σ_{y0,...,yn−1} splits up into a product of single-index sums (Σ_{y0})(Σ_{y1}) . . . (Σ_{yn−1}), so we get
Σ_y exp(2πi xy/2^n) |y⟩ = Σ_y exp(2πi xy/2^n) |yn−1⟩ |yn−2⟩ . . . |y0⟩
= (|0⟩ + e^{2πi(.x0)} |1⟩) (|0⟩ + e^{2πi(.x1 x0)} |1⟩) . . . (|0⟩ + e^{2πi(.xn−1 ...x0)} |1⟩) .   (10)
Hence QFT|x⟩ is the product of corresponding 1-qubit states obtained by taking each bracket with a 1/√2 normalising factor.
This factorisation is the key to building our QFT circuit. It should map each basis (product) state |xn−1⟩ . . . |x0⟩ into the corresponding product state given in eq. (10). Before we start note that the Hadamard operation can be expressed in our binary fractional notation as
H |x⟩ = (1/√2) ( |0⟩ + e^{2πi(.x)} |1⟩ ) .
Indeed if x = 0 resp. 1 then .x is 0 resp. 1/2 so e^{2πi(.x)} is 1 resp. −1, as required.
To see how the QFT circuit actually works, let’s look at the example of N = 8 i.e. n = 3.
We want a circuit that transforms |x2 i |x1 i |x0 i to the following states in these three
registers (called y2 , y1 , y0 at the output):
y2 register: (1/√2)(|0⟩ + e^{2πi(.x0)} |1⟩)  ⊗  y1 register: (1/√2)(|0⟩ + e^{2πi(.x1 x0)} |1⟩)  ⊗  y0 register: (1/√2)(|0⟩ + e^{2πi(.x2 x1 x0)} |1⟩).
In addition to H we will use the 1-qubit phase gate Rd defined by Rd |0⟩ = |0⟩ and Rd |1⟩ = e^{2πi(.00...01)} |1⟩, where the binary digit 1 in the last exponential is (d + 1) places to the right of the dot (i.e. Rd |1⟩ = e^{2πi/2^{d+1}} |1⟩).
The controlled-Rd gate, denoted C-Rd, acts on two qubits and is defined by the following actions
C-Rd |0⟩ |ψ⟩ = |0⟩ |ψ⟩   C-Rd |1⟩ |ψ⟩ = |1⟩ Rd |ψ⟩
for any 1-qubit state |ψ⟩. Diagrammatically this will be denoted as a box labelled Rd on the target line, joined by a vertical line to a dot on the control line.
[Figure: QFT circuit for n = 3. Stage 1: H on the |x2⟩ line, then C-R1 (control x1) and C-R2 (control x0) acting on that line. Stage 2: H on the |x1⟩ line, then C-R1 (control x0) acting on that line. Stage 3: H on the |x0⟩ line. Finally the qubit order is reversed by SWAPs, giving the output lines |y2⟩, |y1⟩, |y0⟩.]
For N = 8 = 2^3 we use 3 Hadamard gates (one in each stage) and 2 + 1 controlled phase gates (in stages 1 and 2 respectively). For general N = 2^n we would use n Hadamard gates (one in each of n stages) and (n − 1) + (n − 2) + . . . + 2 + 1 = n(n − 1)/2 controlled phase gates (in stages 1, 2, . . . , n − 1 respectively). Overall we have O(n^2) = O((log N)^2) gates for QFT mod N. (In this accounting we have ignored the final swap operation to reverse the order of qubits, but this requires only a further O(n) 2-qubit SWAP gates to implement).
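The n = 3 circuit can be checked directly against the matrix of eq. (6). In the numpy sketch below (my own illustration; qubit 2 is taken as the most significant bit of the basis-state index) the controlled-Rd gates are diagonal, multiplying |x⟩ by e^{2πi/2^{d+1}} exactly when both the control and target bits are 1.

```python
import numpy as np

n, N = 3, 8
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def on_qubit(U, q):
    """Apply the 1-qubit gate U to qubit q (qubit n-1 = most significant index bit)."""
    ops = [U if k == q else np.eye(2) for k in reversed(range(n))]
    M = ops[0]
    for op in ops[1:]:
        M = np.kron(M, op)
    return M

def c_phase(d, q1, q2):
    """Controlled-R_d: phase e^{2 pi i / 2^(d+1)} iff bits q1 and q2 of the label are both 1."""
    phase = np.exp(2j * np.pi / 2 ** (d + 1))
    return np.diag([phase if (x >> q1) & (x >> q2) & 1 else 1.0 for x in range(N)])

def swap02():
    """Reverse the qubit order: exchange bits 0 and 2 of the label."""
    S = np.zeros((N, N))
    for x in range(N):
        b = [(x >> q) & 1 for q in range(n)]
        S[(b[0] << 2) | (b[1] << 1) | b[2], x] = 1.0
    return S

# stages 1-3 of the circuit, then the final swap (matrices act right-to-left)
C = (swap02() @ on_qubit(H, 0)
     @ c_phase(1, 0, 1) @ on_qubit(H, 1)
     @ c_phase(2, 0, 2) @ c_phase(1, 1, 2) @ on_qubit(H, 2))

omega = np.exp(2j * np.pi / N)
F = np.array([[omega ** (x * y) for x in range(N)] for y in range(N)]) / np.sqrt(N)
print(np.allclose(C, F))   # 3 H's, 3 controlled phases and a swap reproduce QFT mod 8: True
```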
10 Quantum algorithms for search problems
We can intuitively think of NP as comprising problems that are “hard to solve” (i.e. no
poly time algorithm known) but if a solution (or certificate of a solution) is given then
its correctness can be “easily verified” (i.e. in poly time). Typically we are faced with a
search over an exponentially large space of candidates seeking a “good” candidate, and
given any candidate it is easy to check if it is good or not.
Definition of NP
NP (“nondeterministic poly time”):
a language is in NP if it has a poly-time verifier V . A verifier V for a language L is a
computation with two inputs w and c such that:
(i) if w ∈ L then for some c, V (w, c) halts with “accept”. Any such ‘good’ c is called a
certificate of membership for w;
(ii) if w ∉ L then for all c, V(w, c) halts with “reject”.
V is a poly-time verifier if for all inputs (w, c) V runs in poly(n) time where n is the size
of w. (Note that in this case c need only be poly(n) long too, since any single step of a
computation can access only a constant number of new bits).
Intuitively (i) and (ii) say that you can certify membership of L (viz. (i)) in such a way
that you cannot be tricked into accepting false w’s (viz. (ii)) and checking of certificates
can be done quickly/efficiently. Note the asymmetry – we are required to certify only
membership, but not non-membership.
Alternative definition of NP
Imagine a computer that operates “nondeterministically” i.e. instead of sequentially
implementing the steps of a single algorithm, at each step the computer duplicates itself
and branches into two computational paths performing two steps (possibly the same)
that are performed simultaneously in parallel (in contrast to a probabilistic choice of
one or other step). Thus after m steps we have 2^m computers performing computations
in parallel. We require that all paths eventually halt (with “accept” or “reject”) and
the running time of this nondeterministic computation is defined to be the length of the
longest path.
The computation is defined to:
(i) accept its input if at least one path accepts; and
(ii) reject its input if all paths reject,
(so all inputs are either accepted or rejected as (ii) is the negation of (i)). Then we have:
Proposition: NP is the class of languages that are decided by a nondeterministic
computation with polynomial running time.
[Optional exercise: prove the proposition – given such a nondeterministic computation
and input w, what is the verifier and certificate (if w ∈ L)? Conversely, given a verifier,
what is the corresponding nondeterministic computation with acceptance conditions as
above?]
Note that this notion of computation is “non-physical” for complexity considerations, in
the following sense: although we have just a polynomial running time, we generally need
to invest an exponential amount of physical resources to actually implement it viz. an
exponential number of computers all running simultaneously, or alternatively, a single
computer with exponential running time - being used to do an exponential number of
computations i.e. all the paths, in succession.
The satisfiability problem SAT: given a Boolean formula φ(x1 , . . . , xn ) with n vari-
ables and single bit output, we want to decide if there is an assignment x1 = b1 , . . . , xn =
bn with φ(b1 , . . . , bn ) = 1. Any such assignment is called a satisfying assignment for φ. A
brute force evaluation of all 2^n possible assignments will surely decide this problem but this generally takes exponential (O(2^n)) time. More formally, if we encode the formula as
a bit string using some specified representation of its basic symbols, each as a bit string,
then inputs of size m could have O(m) variables and hence the brute force algorithm
runs in exponential time.
It is not known whether SAT is in P or not but it is easily seen to be in NP – if φ
is satisfiable then the certificate c is any actual satisfying assignment and the verifier
V (φ, c) simply evaluates φ(c) to check that it is 1. Clearly if φ is unsatisfiable we cannot
be tricked into accepting it by this procedure!
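In code the NP structure of SAT is transparent: verification is one evaluation of φ, while the only obvious way to decide satisfiability is exhaustive search over all 2^n assignments. A Python sketch (illustration; the formula is a hypothetical example):

```python
from itertools import product

def verify(phi, certificate):
    """Poly-time verifier: accept iff the certificate is a satisfying assignment of phi."""
    return phi(*certificate) == 1

def brute_force_sat(phi, n):
    """Exhaustive search over all 2^n assignments -- exponential time in general."""
    for assignment in product((0, 1), repeat=n):
        if verify(phi, assignment):
            return assignment
    return None

# hypothetical formula: (x1 OR x2) AND ((NOT x1) OR x3)
phi = lambda x1, x2, x3: (x1 | x2) & ((1 - x1) | x3)
print(brute_force_sat(phi, 3))     # a satisfying assignment, e.g. (0, 1, 0)
```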
Relation to searching: SAT illustrates a fundamental connection between NP and
search problems – for any φ(x1 , . . . , xn ) we have an exponentially large number of can-
didate assignments (possible certificates) and we want to know if a “good” (satisfying)
assignment exists. Although it is not clear how to locate a good candidate “quickly”, if
we are given any prospective candidate we can check quickly if it is good or not. This is
a general feature of very many practical problems e.g. scheduling/timetabling tasks, or
more general simultaneous constraint satisfaction problems.
From the definitions we have the series of inclusions P ⊆ NP ⊆ PSPACE and P ⊆
BPP ⊆ PSPACE, but it is not known whether either of NP and BPP is contained in
the other or not. The most notoriously famous open problem of complexity theory is the
question of whether P is equal to NP or not.
The unstructured search problem
Suppose we are given a large database with N items and we wish to locate a particular
item. We assume that the database is entirely unstructured or unsorted but given any
item we can easily check whether or not it is the one we seek. Our algorithm should
locate the item with some constant level of probability (half say) independent of the
size N . Each access to the database is called a query and we normally regard it as one
computational step.
For classical computation we may argue that O(N ) queries will be necessary and suffi-
cient: the good item has completely unknown location; if we examine an item and find it
bad, we gain no further information about the location of the good item (beyond the fact
that it is not the current one). Hence if we examine m items the probability of seeing
the good one is p = m/N so we must have m = O(N ) to have p constant.
For quantum computation we will see that O(√N) queries are sufficient (and in fact that number is necessary too) to locate the good item i.e. we get a quadratic speedup
over classical search. This speedup does not cross the polynomial vs. exponential divide
(the “holy grail” of complexity theory) but it is still viewed as significant in situations
where exhaustive search is the best known classical algorithm. At first sight we might
have naively expected an exponential quantum speedup here: suppose N = 2^n and recall that a quantum algorithm can easily access 2^n items in superposition (by use of only n = log N Hadamard operations) so we can look up the “goodness” of all items in
superposition, with just one query! We may then hope that we could manipulate the
resulting quantum state to efficiently reveal the good item. But the above-quoted result
shows that this hope cannot be realised. Intuitively the good item occurs with only an
exponentially small amplitude in the total superposition. If the item were re-located at
another place (or reclassified as bad) then the corresponding quantum state would differ
only by an exponentially small amount in the space of quantum states and it will thus
be very difficult to reliably distinguish by any physical process.
Databases are often actually structured, in a way that can facilitate the search. As an
example suppose our N items are labelled by the numbers 1 to N and we seek a particular one
labelled k. Unstructured search (requiring O(N ) queries) corresponds to the database
containing the numbers in some unknown random order. But if the items are structured
by being presented in numerical order, then we can locate k with only O(log N ) queries
(in fact exactly 1×log N queries) using a binary search procedure: each query of a middle
item eliminates an entire half of the remaining database. This kind of structured search
is common in practice e.g. the lexicographic ordering of names in a large phone book
facilitating search for a given person’s number. But suppose we were given a person’s
number and asked to determine their name. Then we would be faced with an essentially
unstructured search requiring a lot more time!
In the following we will consider quantum algorithms for only unstructured search, in particular Grover's quantum searching algorithm which achieves this search in O(√N) queries. The issue of understanding which kinds of structure in a database can provide a
good benefit for quantum versus classical computation is still largely open and a topic of
current research. (One interesting known result is that in the case of a linearly ordered
database (such as the phone book above) any quantum algorithm still requires O(log N )
queries but the actual number of queries now is k log N with k strictly less than 1).
10.2 Grover’s Quantum Searching Algorithm
For any unit vector |α⟩ let Π_|α⟩ = |α⟩⟨α| denote the projection onto |α⟩, and let I_|α⟩ = I − 2|α⟩⟨α|, which (with I denoting the identity operation) is the operation of reflection in the subspace that is orthogonal to |α⟩ (i.e. vectors in that subspace are left unchanged and general vectors have their component along |α⟩ reversed in sign). For any unitary operator U it is easy to check that
U Π_|α⟩ U† = Π_{U|α⟩}   and   U I_|α⟩ U† = I_{U|α⟩} .   (12)
Example.
In the space of a single qubit let |α⊥⟩ be any chosen unit vector orthogonal to |α⟩. Then any ket vector may be uniquely expressed as |v⟩ = x |α⟩ + y |α⊥⟩ and then I_|α⟩ |v⟩ = −x |α⟩ + y |α⊥⟩.
[Figure: the oracle Uf as a 2-register gate mapping |x⟩ |y⟩ to |x⟩ |y ⊕ f(x)⟩.]
The assumption that the database is unstructured is formalised here as the standard
oracle idealisation that we have no access to the internal workings of Uf – it operates
as a “black box” on the input and output registers, telling us only if the queried item is
good or not.
Instead of using Uf we will generally use a closely related operation denoted Ix0 on n
qubits. It is defined by
I_{x0} |x⟩ = |x⟩ if x ≠ x0,   I_{x0} |x0⟩ = − |x0⟩   (14)
i.e. I_{x0} simply inverts the amplitude of the |x0⟩ component and so I_{x0} is just the reflection operator I_{|x0⟩} defined above. If x0 is the n-bit string 00 . . . 0 then I_{x0} will be written simply as I0.
A black box which performs I_{x0} may be simply constructed from Uf by just setting the output register to (1/√2)(|0⟩ − |1⟩). Then the action of Uf leaves the output register in this state and effects I_{x0} on the input register (this is the phase kickback construction used in the DJ algorithm).
Our searching problem becomes the following: we are given a black box which computes
Ix0 for some n bit string x0 and we want to determine the value of x0 using the least
number of queries to the box.
We will work in a space of n qubits with a standard basis {|xi} labelled by n-bit strings
x. Let Bn denote the space of all n-qubit states. Let Hn = H ⊗ . . . ⊗ H acting on Bn
denote the application of H to each of the n qubits separately.
Grover’s quantum searching algorithm operates as follows. Having no initial information
about x0 we begin with the state
|ψ0⟩ = Hn |0 . . . 0⟩ = (1/√2^n) Σ_x |x⟩   (15)
which is an equal superposition of all possible x0 values. Consider the compound operator
Q, called the Grover iteration operator, defined by
Q = −Hn I0 Hn I_{x0} .
Note that all amplitudes in |ψ0 i and all matrix elements of Q are real numbers so to
analyse Q we will be able to use the geometrical interpretations of the projection and
reflection operators described above in terms of real (rather than complex) Euclidean
geometry.
In the next section we will explain the structure of Q and show that it has a simple
geometrical interpretation:
(Q1): In the plane P(x0) spanned by (the initially unknown) |x0⟩ and |ψ0⟩, Q is rotation through angle 2α where sin α = 1/√N.
Thus by repeatedly applying Q to the starting state |ψ0⟩ in P(x0) we may rotate it around near to |x0⟩ and then determine x0 with high probability by a measurement in the standard basis. For large N, |x0⟩ and |ψ0⟩ are almost orthogonal and 2α ≈ 2 sin α = 2/√N. Thus about (π/4)√N iterations will be needed. Each application of Q uses one evaluation of I_{x0} and hence of Uf so O(√N) evaluations are required, representing a square root speedup over the O(N) evaluations needed for a classical unstructured search. More precisely we have ⟨x0|ψ0⟩ = 1/√N so the number of iterations needed is the integer nearest to (arccos(1/√N))/(2 arcsin(1/√N)) (which is independent of x0).
Example: searching for “one in four”.
A simple striking example is the case of N = 4 in which sin α = 1/2 and Q is a rotation through π/3. The initial state is |ψ0⟩ = (1/2)(|00⟩ + |01⟩ + |10⟩ + |11⟩) and for any marked x0 the angle between |x0⟩ and |ψ0⟩ is precisely π/3 too. Hence after one application of Q i.e. just one query, we will learn the position of any single marked item in a set of four with certainty!
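Since −Hn I0 Hn = 2|ψ0⟩⟨ψ0| − I acts on real amplitude vectors as "inversion about the mean", a Grover iteration is two lines of numpy. The sketch below (my own illustration, with hypothetical marked items) reproduces the one-in-four example and the (π/4)√N behaviour for larger N:

```python
import numpy as np

def grover_probs(n, x0, iterations):
    """Apply Q = -(H_n I_0 H_n) I_{x0} repeatedly to |psi_0> and return outcome probabilities."""
    N = 2 ** n
    psi = np.full(N, 1 / np.sqrt(N))       # |psi_0> = H_n |0...0>
    for _ in range(iterations):
        psi[x0] = -psi[x0]                  # I_{x0}: flip the sign of the marked amplitude
        psi = 2 * psi.mean() - psi          # 2|psi_0><psi_0| - I: inversion about the mean
    return np.abs(psi) ** 2

# one-in-four: a single iteration (one query) locates the marked item with certainty
print(np.round(grover_probs(2, x0=3, iterations=1), 6))       # [0. 0. 0. 1.]
# N = 1024: about (pi/4) sqrt(N) ≈ 25 iterations give success probability close to 1
print(grover_probs(10, x0=123, iterations=25)[123])           # close to 1
```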
The iteration operator Q – reflections and rotations
Using eq. (12) (and noting that H = H†) the Grover iteration operator can be written
Q = −Hn I0 Hn I_{x0} = −I_{Hn|0...0⟩} I_{|x0⟩} = −I_{|ψ0⟩} I_{|x0⟩} .
Now for any |αi and |vi we have I|αi |vi = |vi−2hα|vi |αi i.e. |vi is modified by a multiple
of |αi. Hence if |vi is in the (real) plane P(x0 ) spanned by |x0 i and |ψ0 i = Hn |0 . . . 0i
then both I|x0 i |vi and I|ψ0 i |vi will be in P(x0 ) too – indeed I|x0 i and I|ψ0 i within this
plane are just reflections in the lines perpendicular to |x0 i and |ψ0 i respectively. Hence
Q also preserves the plane P(x0 ) and its action is given by the following fact of Euclidean
geometry.
Lemma: let M1 and M2 be two mirror lines in the Euclidean plane IR2 intersecting at a
point O and let θ be the angle in the plane from M1 to M2 (cf figure below). Then the
operation of reflection in M1 followed by reflection in M2 is just (anticlockwise) rotation
by angle 2θ about the point O.
[Figure: two mirror lines M1 and M2 in the plane meeting at the point O, with θ the angle from M1 to M2.]
Proof of lemma: this is immediate, for example, from standard matrix expressions for
rotations and reflections in R2 .
Using the lemma we see that the action of I_{Hn|0⟩} I_{|x0⟩} = −Q in P(x0) is a rotation through 2β where cos β = ⟨x0| Hn |0⟩ = 1/√N. For large N, β ≈ π/2 and we have a rotation of almost π. It would be possible to use this large rotation as the basis of the quantum searching algorithm but we prefer a smaller incremental motion. We could use the operator (I_{Hn|0⟩} I_{|x0⟩})^2 but there is another solution, explaining the occurrence of the minus sign in the definition of Q:
Lemma: for any vector |v⟩ in a 2-dimensional real space we have
−I_{|v⟩} = I_{|v⊥⟩}
so the number of iterations is independent of x0 too.
Optimality
Grover's algorithm achieves unstructured search for a unique good item with about (π/4)√N queries. Is it possible to invent an even more ingenious quantum algorithm that uses
fewer queries? Alas the answer is no. We’ll just state (without proof):
Theorem Any quantum algorithm that achieves the search for a unique good item in an unstructured database of size N (with any constant level of probability, say half) must use O(√N) queries.
Even more, it can be shown that (π/4)(1 − ε)√N queries for any ε > 0 are insufficient, so Grover's algorithm is optimal in a tight sense.
Searching with multiple good items
Suppose our search space contains r ≥ 1 good items and we wish to find any one such
item. Consider first the case that r is known. In this case we’ll see that our previous
algorithm still works; we just need to modify the number of iterations in a way that
depends on r.
Let the good items be denoted x1 , . . . , xr so now f (xi ) = 1 for i = 1, . . . , r and f (x) = 0
for all other x’s. Using the same construction that gave Ix0 from Uf in the case of a
single good item, we obtain the operator IG (where G stands for “good”) with action:
IG |x⟩ = |x⟩ if x ≠ x1, . . . , xr   and   IG |x⟩ = − |x⟩ if x = x1, . . . , xr ,
and the corresponding Grover iteration operator is
QG = −Hn I0 Hn IG = −I_{|ψ0⟩} IG .
Let
|ψG⟩ = (1/√r) Σ_{i=1}^{r} |xi⟩
be the equal superposition of all good items. We can separate out the good and bad parts of the full equal superposition |ψ0⟩ writing:
|ψ0⟩ = (1/√N) Σ_{all x} |x⟩ = (√r/√N) |ψG⟩ + (√(N−r)/√N) |ψB⟩   (17)
where |ψB⟩ = (1/√(N−r)) Σ_{bad x} |x⟩ is the equal superposition of all bad items and |ψG⟩ and |ψB⟩ are orthogonal states. Note that we can write IG = I − 2 Σ_{i=1}^{r} |xi⟩⟨xi| which (on arbitrary vectors) is not of the form I_|α⟩ for any single vector |α⟩. But we still have:
Theorem: let PG be the plane spanned by |ψ0 i and |ψG i. Then the action of QG
preserves this plane and within PG this action is rotation through angle 2α where
sin α = ⟨ψ0|ψG⟩ = √(r/N) .
Proof: Clearly I|ψ0 i preserves PG since acting on any |ψi it just subtracts a multiple of
|ψ0 i. For IG we note that by eq. (17), PG can also be characterised as the plane spanned
by the orthogonal states |ψG i and |ψB i. Now IG |ψG i = − |ψG i and IG |ψB i = |ψB i so
for any state |ψi = a |ψG i + b |ψB i in PG the action of IG is to subtract a multiple of
|ψG i i.e. the result lies in the plane too. This also shows that within PG , IG coincides
with the operation I_{|ψG⟩} and QG = −I_{|ψ0⟩} I_{|ψG⟩} = I_{|ψ0⊥⟩} I_{|ψG⟩}. Hence exactly as before, QG is a rotation through angle 2α where α is the angle between |ψ0⊥⟩ and |ψG⟩, i.e. sin α = ⟨ψ0|ψG⟩ = √(r/N).
Now suppose that we start with |ψ0⟩ and repeatedly apply QG. The angle between |ψ0⟩ and |ψG⟩ is β where cos β = ⟨ψ0|ψG⟩ = √(r/N). Each application of QG is a rotation through 2α where sin α = √(r/N) so we need β/(2α) = (arccos √(r/N))/(2 arcsin √(r/N)) iterations to move |ψ0⟩ very close to |ψG⟩. If r << N then |ψ0⟩ and |ψG⟩ are almost orthogonal (β ≈ π/2) and α ≈ sin α = √(r/N) so we need about (π/4)√(N/r) iterations.
The basic technique and use of the operator QG in the above result can be generalised
to give the so-called principle of amplitude amplification (which we won’t discuss in this
course).
Searching with an unknown number of good items
Optional, not required for exam purposes.
We can also adapt the algorithm to work in the case that r is unknown. The apparent
difficulty is the following: if we start with |ψ0 i and repeatedly apply the operator Q (in
either case r = 1 or r > 1) we just rotate the state round and round in the plane of |ψ0 i
and |ψG i. The trick is to know when to stop i.e. when the state lines up closely with
|ψG i in this plane. But if r is unknown then the rotation angle 2α of Q is unknown!
To illustrate the way around this problem we'll consider only the case where the unknown r is very small, r << N. (General r values can be addressed by a more complicated argument along similar lines). We choose a number K uniformly at random in the range 0 < K < (π/4)√N, apply K iterations of Q, measure the final state and test if the result is good or not. For r << N each iteration is a rotation through the small angle 2α ≈ 2√(r/N), i.e. we have chosen a random total rotation angle in the range 0 to (π/2)√r, i.e. a range of √r quadrants. Equivalently we can choose one of the √r quadrants at random and then a random angle in it. Now think of |ψ0⟩ as the x-axis direction and |ψG⟩ as the y-axis direction (recalling that these states are almost orthogonal for r << N). If the final rotation angle is within ±45° of the y axis then the final state |ψ⟩ has |⟨ψ|ψG⟩|^2 ≥ cos^2 45° = 1/2 i.e. we have probability at least half of seeing a good item in our final measurement. Now for every quadrant, half the angles are within ±45° of the y axis so our randomised procedure above, using O(√N) queries, will locate a good item with probability at least 1/4. Repeating the
whole procedure a constant number of times, say M = 10 times, thus still using O(√N) queries, we will fail to locate a good item only with tiny probability (3/4)^M = (3/4)^{10}.
This case of unknown r is directly relevant to the consideration of computational tasks in NP, where rather than locating a good item we want instead to know whether a good item exists or not. Consider for example the task SAT: given a Boolean function f, does it have a satisfying assignment or not? f will generally have some unknown number $r \geq 0$ of satisfying assignments. We run the above randomised version of Grover's algorithm, say 10 times, checking each output x to see if $f(x) = 1$ or not. If they all fail we conclude that f is not satisfiable, which will be correct with high probability $1 - (3/4)^{10}$. In this way Grover's algorithm can be applied to any NP problem to provide a quadratic speedup over classical exhaustive search.
11 Shor’s quantum factoring algorithm
We will now describe Shor's quantum factoring algorithm. Given an integer N with $n = \log N$ digits this algorithm will output a factor $1 < K < N$ (or output N if N is a prime) with any chosen constant level of probability $1 - \epsilon$, and the algorithm will run in polynomial time $O(n^3)$. Currently the best known classical algorithm (the so-called number field sieve algorithm) runs in time $e^{O(n^{1/3}(\log n)^{2/3})}$, i.e. there is no known polynomial time classical algorithm for this task.
We’ll begin by first describing some pure mathematics (number theory) – involving no
quantum ingredients at all – showing how to convert the problem of factoring N into
a problem of periodicity determination. Then we’ll use our quantum period finding
algorithm to achieve the task of factorisation. We’ll encounter (and deal with) a technical
complication: our function will be periodic on the infinite set Z of all integers so for
computational purposes we need to truncate this down to a finite size ZM for some M
(suitably large, depending on N ). Since we do not know the period at the outset the
restricted function will not be exactly periodic on ZM : the “last” period will generally
be incomplete (as M is not generally an exact multiple of the period). But we’ll see
that if M is sufficiently large (in fact M = O(N 2 ) will suffice) then there will be enough
complete periods so that the single “corrupted” period has only a negligible effect on our
period finding algorithm. We will also always choose M to be a power of 2 to be able to
use our explicit circuit for QFT mod M for such M ’s.
Let N with n = log N digits denote the integer that we wish to factorise. We start by
choosing 1 < a < N at random. Next using Euclid’s algorithm (which is a poly-time
algorithm) we compute the greatest common divisor b = gcd(a, N ). If b > 1 we are
finished. Thus suppose b = 1 i.e. a and N are coprime. We will use:
Theorem 2 (Euler's theorem): If a and N are coprime then there is a least power $1 < r < N$ such that $a^r \equiv 1 \bmod N$. r is called the order of a mod N.
We omit the proof which may be found in most standard texts on number theory.
Now consider the powers of a as a function of the index, i.e. the modular exponential function:
\[ f : \mathbb{Z} \to \mathbb{Z}_N \qquad f(k) = a^k \bmod N \qquad (18) \]
Clearly $f(k_1 + k_2) = f(k_1)f(k_2)$ and by Euler's theorem $f(r) = 1$, so $f(k + r) = f(k)$ for all k, i.e. f is periodic with period r. Also since r is the least integer with $f(r) = 1$, we see that f must be one-to-one within each period.
Next suppose we can find r. (We will use our quantum period finding algorithm for this.) Suppose r comes out to be even. Then
\[ a^r - 1 = (a^{r/2} - 1)(a^{r/2} + 1) \equiv 0 \bmod N \]
i.e.
\[ N \text{ exactly divides the product } (a^{r/2} - 1)(a^{r/2} + 1) \qquad (19) \]
(and knowing r we can calculate each of these terms in poly(n) time).
We know N does not divide $a^{r/2} - 1$ (since r was the least power x such that $a^x - 1$ is divisible by N). Thus if N does not divide $a^{r/2} + 1$, i.e. if $a^{r/2} \not\equiv -1 \bmod N$, then in eq. (19) N must partly divide into $a^{r/2} - 1$ and partly into $a^{r/2} + 1$. Hence using Euclid's algorithm again, we compute $\gcd(a^{r/2} \pm 1, N)$, which will be factors of N.
All this works provided r is even and $a^{r/2} \not\equiv -1 \bmod N$. How likely is this, given that a was chosen at random? We quote the following theorem.
Theorem 3 Suppose N is odd and not a power of a prime. If $a < N$ is chosen uniformly at random with $\gcd(a, N) = 1$ then $\mathrm{Prob}(r$ is even and $a^{r/2} \not\equiv -1 \bmod N) \geq 1/2$.
For a proof of this result see Preskill’s notes page 307 et seq., Nielsen/Chuang appendix
4.3 or A. Ekert and R. Jozsa, Reviews of Modern Physics, vol 68, p733-753 1996, appendix
B.
Hence for any N which is odd and not a prime power, we will obtain a factor with probability at least half. Given any candidate factor we can check it (in poly(n) time) by test division into N. Thus repeating the process, say 10 times, we will fail to get a factor only with tiny probability $1/2^{10}$; more generally we succeed with any desired probability $1 - \epsilon$ after $\log_2 1/\epsilon$ repetitions.
Example 2 Consider N = 15 and choose a = 7. Then a direct calculation shows that the function $f(k) = 7^k \bmod 15$ for $k = 0, 1, 2, \ldots$ has values 1, 7, 4, 13, 1, 7, 4, 13, ... so r = 4. Thus $7^4 - 1 = (7^2 - 1)(7^2 + 1) = (48)(50)$ is divisible by 15, and computing $\gcd(15, 48) = 3$ and $\gcd(15, 50) = 5$ gives non-trivial factors of 15.
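A quick numerical check of Example 2 (an illustration of my own, using brute-force order finding that is feasible only for tiny N):

    from math import gcd

    N, a = 15, 7
    r = 1
    while pow(a, r, N) != 1:          # find the order r of a mod N by direct search
        r += 1
    x = pow(a, r // 2, N)             # a^{r/2} mod N = 4, which is not -1 mod 15
    print(r, gcd(x - 1, N), gcd(x + 1, N))    # 4 3 5

Note that $\gcd(a^{r/2} \pm 1, N) = \gcd((a^{r/2} \bmod N) \pm 1, N)$, so the gcds can be taken after reducing mod N.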
All of this works if N is neither even nor a prime power. So how do we recognise and treat these latter cases? If N is even (which is easy to recognise!) we immediately have a factor 2 and we are finished. If $N = p^l$ is a prime power then we can identify this case and find p using a further result, lemma 2, which we quote without proof. Running the algorithm of lemma 2 on any N will output some number $c_0$ and we can check if it divides N or not. If N was a prime power $p^l$ then $c_0$ will be p.
Summarizing the process so far: given N we proceed as follows.
(i) Is N even? If so, output 2 and stop.
(ii) Run the algorithm of lemma 2, test divide the output and stop if a factor of N is
obtained.
(iii) If N is neither even nor a prime power choose $1 < a < N$ at random and compute $s = \gcd(a, N)$. If $s \neq 1$ output s and stop.
(iv) If $s = 1$ find the period r of $f(k) = a^k \bmod N$. (We will achieve this with any desired level of constant probability $1 - \epsilon$ using the quantum algorithm described in the next section.)
(v) If r is odd, go back to (iii). If r is even compute $t = \gcd(a^{r/2} + 1, N)$, so by definition t is a factor of N. If $t \neq 1, N$ output t. If $t = 1$ or N go back to (iii) and try again.
According to theorem 3 any run of (iv) and (v) will output a factor with probability $> 1/2$, so K repetitions of looping back to (iii) will all fail only with probability $< 1/2^K$, which can be made as small as we like.
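For orientation, here is the whole classical control structure (i)-(v) as a Python sketch of my own. The quantum period-finding step (iv) is replaced by brute-force order finding, which is exponential and only usable for tiny N; the prime-power test is done by taking integer roots; and the sketch assumes N is composite (in practice a primality test would be run first).

    import random
    from math import gcd

    def find_order(a, N):
        # stand-in for the quantum period-finding subroutine of step (iv)
        r = 1
        while pow(a, r, N) != 1:
            r += 1
        return r

    def prime_power_root(N):
        # stand-in for the poly-time prime-power test of step (ii):
        # if N = c^l for some l >= 2, return c, else return None
        for l in range(2, N.bit_length() + 1):
            c = round(N ** (1.0 / l))
            for cand in (c - 1, c, c + 1):
                if cand >= 2 and cand ** l == N:
                    return cand
        return None

    def factor(N):
        # assumes N > 3 is composite
        if N % 2 == 0:                         # step (i)
            return 2
        c0 = prime_power_root(N)               # step (ii)
        if c0 is not None and N % c0 == 0:
            return c0
        while True:                            # steps (iii)-(v)
            a = random.randrange(2, N)
            s = gcd(a, N)
            if s != 1:
                return s
            r = find_order(a, N)               # the quantum subroutine in the real algorithm
            if r % 2 == 0:
                t = gcd(pow(a, r // 2, N) + 1, N)
                if t not in (1, N):
                    return t

    print(factor(39))                          # outputs 3 or 13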
Let r denote the (as yet unknown) period of $f(k) = a^k \bmod N$ on the infinite domain $\mathbb{Z}$. We will work on the finite domain $D = \{0, 1, \ldots, 2^m - 1\}$ where $2^m$ is the least power of 2 greater than $N^2$ (see later for the reason for this choice). Let $2^m = Br + b$ with $1 < b < r$, i.e. the domain D contains B full periods and only the initial part up to b of the next period. Using a standard application of computation by quantum parallelism we manufacture the state $\frac{1}{\sqrt{2^m}}\sum_{x\in D} |x\rangle|f(x)\rangle$ and measure the second register to obtain some value $y_0 = f(x_0)$ with $0 \leq x_0 < r$. In the first register we get the state
\[ |\mathrm{per}\rangle = \frac{1}{\sqrt{A}}\sum_{k=0}^{A-1} |x_0 + kr\rangle \]
where
\[ A = \begin{cases} B + 1 = \lfloor 2^m/r \rfloor + 1 & \text{if } x_0 < b \\ B = \lfloor 2^m/r \rfloor & \text{if } x_0 \geq b. \end{cases} \qquad (20) \]
Let
\[ \mathrm{QFT}_{2^m} |\mathrm{per}\rangle = \sum_{c=0}^{2^m-1} \tilde f(c)\, |c\rangle. \]
Writing $\omega = e^{2\pi i/2^m}$ we have
\[ \tilde f(c) = \frac{1}{\sqrt{A}\sqrt{2^m}} \sum_{k=0}^{A-1} \omega^{c(x_0+kr)} = \frac{\omega^{cx_0}}{\sqrt{A}\sqrt{2^m}} \left[ \sum_{k=0}^{A-1} \omega^{crk} \right]. \]
As before (as in eq. (8), where c was called y) the square bracket is a geometric series with ratio $\alpha = \omega^{cr}$ and we have
\[ [\ldots] = 1 + \alpha + \alpha^2 + \ldots + \alpha^{A-1} = \begin{cases} \frac{1-\alpha^A}{1-\alpha} & \text{for } \alpha \neq 1 \\ A & \text{for } \alpha = 1. \end{cases} \]
Let's look more closely at the ratio $\alpha = e^{2\pi i cr/2^m}$. Previously we had r dividing the denominator $2^m$ exactly and $2^m/r = A$, so if $\alpha \neq 1$ then $\alpha$ was an Ath root of unity and the geometric series summed to zero in all these cases. The only c values that survived were the exact multiples of $A = 2^m/r$, having $\alpha = 1$. There were r such multiples, each with equal amplitude magnitude $\frac{1}{\sqrt{r}}$.
In the present case r does not generally divide $2^m$ exactly, so $\alpha$ is not an Ath root of unity and we don't get a lot of "exactly zero" amplitudes for the $|c\rangle$'s! However we aim to show that a measurement on $\mathrm{QFT}|\mathrm{per}\rangle$ will yield an integer c value which is close to a multiple of $2^m/r$ with suitably high probability.
Consider the r multiples of $2^m/r$ (which are now not necessarily integers!):
\[ 0,\ \frac{2^m}{r},\ 2\left(\frac{2^m}{r}\right),\ \ldots,\ (r-1)\left(\frac{2^m}{r}\right). \]
Each of these is within half of a unique nearest integer. Note that $k(2^m/r)$ can never be exactly half way between two integers since $r < N$ and $2^m > N^2$, so (using 2's in $2^m$) all factors of 2 can be cancelled out of the denominator r. Thus we consider c values (r of them) such that
\[ \left| c - k\,\frac{2^m}{r} \right| < \frac{1}{2} \qquad k = 0, 1, \ldots, (r-1). \qquad (21) \]
In the previous case of exact periodicity (where $2^m/r$ was an integer) each of these c values appeared with probability 1/r and all other c values had probability zero. Here we will show that although the other c values will generally have non-zero probabilities, the special ones in eq. (21) still have probability at least $\gamma/r$ for a constant $\gamma$.
(Figure: $|\tilde f(c)|$ plotted against c, showing r peaks spaced approximately $2^m/r$ apart; (a) exact periodicity, (b) inexact periodicity.)
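The inexact case can be checked numerically. The following numpy calculation (an illustration of my own, with arbitrary small values $2^m = 2048$, $r = 12$, $x_0 = 3$) builds $|\mathrm{per}\rangle$, Fourier transforms it and adds up the probability carried by the r special c values of eq. (21):

    import numpy as np

    M, r, x0 = 2048, 12, 3                  # 2^m, period, measured offset (illustrative values)
    A = len(range(x0, M, r))                # number of terms |x0 + kr> in the domain
    per = np.zeros(M, dtype=complex)
    per[x0::r] = 1 / np.sqrt(A)             # the state |per>
    probs = np.abs(np.fft.fft(per) / np.sqrt(M)) ** 2   # measurement probabilities on QFT|per>

    special = [round(k * M / r) for k in range(r)]      # c values nearest to multiples of 2^m/r
    print(sum(probs[c] for c in special))               # > 4/pi^2 ~ 0.405 (in fact much larger)

(The sign convention of np.fft.fft differs from our QFT but the probabilities $|\tilde f(c)|^2$ are unaffected.)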
Theorem 4 Suppose we measure the label in $\mathrm{QFT}|\mathrm{per}\rangle$. For each $k = 0, \ldots, r-1$ let $c_k$ be the unique integer with $|c_k - k\,\frac{2^m}{r}| < \frac{1}{2}$. Then $\mathrm{prob}(c_k) > \gamma/r$ where $\gamma \approx 4/\pi^2$.
Proof: (optional) For any c we have
\[ \mathrm{prob}(c) = |\tilde f(c)|^2 = \frac{1}{A\,2^m} \left| \frac{1-\alpha^A}{1-\alpha} \right|^2. \]
11.3 Getting r from a good c value
and $c/2^m$ is a known fraction. We claim that there is at most one fraction $k'/r'$ with denominator $r'$ less than N satisfying eq. (26). Hence for given $c/2^m$, eq. (26) determines k/r uniquely. To prove this claim suppose that two distinct fractions $k'/r'$ and $k''/r''$ both lie within $1/(2N^2)$ of $c/2^m$. Then
\[ \left| \frac{k'}{r'} - \frac{k''}{r''} \right| = \frac{|k'r'' - r'k''|}{r'r''} \geq \frac{1}{r'r''} > \frac{1}{N^2}. \qquad (27) \]
But $k'/r'$ and $k''/r''$ are both within $1/(2N^2)$ of $c/2^m$, so they must be within $1/N^2$ of each other, contradicting eq. (27). Hence there is at most one k/r with $r < N$ satisfying eq. (26).
This result is the reason why we chose $2^m$ to be greater than $N^2$: it guarantees that the bound on the RHS of eq. (26) is $< 1/(2N^2)$, and then k/r is uniquely determined from $c/2^m$.
There are $O(N^2)$ such fractions to try. We find that there is only one, viz. $a/b = 5/12$, that satisfies eq. (28):
\[ \left| \frac{a}{b} - \frac{853}{2048} \right| = 0.000163 < \frac{1}{2^{12}} = 0.000244. \]
This result is consistent with $k = 5$ and $r = 12$ and also with $k = 10$ and $r = 24$. But our theory also guarantees that k is coprime to r with "reasonable" probability, which in this case sets $r = 12$. We can then verify that $7^{12}$ is indeed congruent to 1 mod 39 and $7^x$ for all $x < 12$ is not congruent to 1, so $r = 12$ is the correct period.
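This exhaustive check is easily scripted (an illustration of my own, using the bound $1/2^{12}$ quoted above):

    from fractions import Fraction

    target, N = Fraction(853, 2048), 39
    close = {Fraction(k, r) for r in range(1, N) for k in range(r + 1)
             if abs(Fraction(k, r) - target) < Fraction(1, 2**12)}
    print(close)        # {Fraction(5, 12)}: a single rational value, as claimed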
So far we have that k/r is uniquely determined by $c/2^m$, but how do we actually compute k/r from $c/2^m$? In the above example we were able to try out all candidate fractions $k'/r'$ with denominator less than N. But there are generally $O(N^2)$ such fractions to try, so this method of seeking the unique one is not efficient, requiring at least $O(N^2)$ steps, which is exponential in $n = \log N$!
To obtain an efficient (i.e. poly(n) time) method we invoke the elegant mathematical:
Theory of continued fractions
Any rational number s/t (with $s < t$) may be expressed as a so-called continued fraction:
\[ \frac{s}{t} = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\cdots + \cfrac{1}{a_l}}}} \qquad (29) \]
where $a_1, \ldots, a_l$ are positive integers. To do this we begin by writing $s/t = 1/(t/s)$. Since $s < t$ we have $t/s = a_1 + s_1/t_1$ with $a_1 \geq 1$ and $s_1 < t_1 = s$, and so
\[ \frac{s}{t} = \frac{1}{a_1 + \frac{s_1}{t_1}}. \]
Continuing in this way we get a sequence of integers $a_k$, $s_k$ and $t_k$. Note that $s_k < t_k$ and $t_{k+1}$ is always given by $s_k$. Hence the sequence $t_k$ of denominators is a strictly decreasing sequence of non-negative integers, so the process must always terminate, after some number l of iterations, giving the expression in eq. (29).
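The iteration just described is only a few lines of code (a sketch of my own); for 853/2048 it returns [2, 2, 2, 42, 4], as in Example 4 below.

    def cf_expand(s, t):
        """Continued fraction [a1, ..., al] of s/t, assuming 0 < s < t."""
        a = []
        while s != 0:
            a.append(t // s)       # integer part of t/s
            s, t = t % s, s        # next pair: s_k < t_k = previous s
        return a

    print(cf_expand(853, 2048))    # [2, 2, 2, 42, 4]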
To avoid the cumbersome "fractions of fractions" notation in eq. (29) we will write
\[ \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\cdots + \cfrac{1}{a_l}}}} = [a_1, a_2, \ldots, a_l]. \qquad (30) \]
For each $k = 1, \ldots, l$ we can truncate the fraction in (30) at the kth level to get a sequence of rational numbers
\[ \frac{p_1}{q_1} = [a_1] = \frac{1}{a_1}, \qquad \frac{p_2}{q_2} = [a_1, a_2] = \cfrac{1}{a_1 + \cfrac{1}{a_2}} = \frac{a_2}{a_1 a_2 + 1}, \qquad \cdots \]
\[ \frac{p_k}{q_k} = [a_1, \ldots, a_k], \qquad \ldots \qquad \frac{p_l}{q_l} = [a_1, \ldots, a_l] = \frac{s}{t}. \]
$p_k/q_k$ is called the kth convergent of the continued fraction of s/t.
Continued fractions enjoy the following tantalising properties.
Lemma 3 Let $a_1, \ldots, a_l$ be any positive numbers (not necessarily integers here). Set $p_0 = 0$, $q_0 = 1$, $p_1 = 1$ and $q_1 = a_1$.
(a) Then $[a_1, \ldots, a_k] = p_k/q_k$ where
\[ p_k = a_k p_{k-1} + p_{k-2} \qquad q_k = a_k q_{k-1} + q_{k-2} \qquad k \geq 2. \qquad (31) \]
Note that if the $a_k$'s are integers then so are the $p_k$'s and $q_k$'s.
(b) $q_k p_{k-1} - p_k q_{k-1} = (-1)^k$ for $k \geq 1$.
(c) If $a_1, \ldots, a_l$ are integers then $\gcd(p_k, q_k) = 1$ for $k \geq 1$.
Theorem 5 Consider the continued fraction $s/t = [a_1, \ldots, a_l]$. Let $p_k/q_k = [a_1, \ldots, a_k]$ be the kth convergent for $k = 1, \ldots, l$. If s and t (cancelled to lowest terms) are m-bit integers then the length l of the continued fraction is O(m), and this continued fraction together with its convergents can be calculated in time $O(m^3)$.
Theorem 6 Let $0 < x < 1$ be a rational number and suppose that p/q is a rational number such that
\[ \left| x - \frac{p}{q} \right| < \frac{1}{2q^2}. \]
Then p/q is a convergent of the continued fraction of x.
Proof (optional):
Let $p/q = [a_1, \ldots, a_n]$ be the CF expansion of p/q with convergents $p_j/q_j$, so $p_n/q_n = p/q$. Introduce $\delta$ defined by
\[ x = \frac{p_n}{q_n} + \frac{\delta}{2q_n^2} \qquad (32) \]
so $|\delta| < 1$. We aim to show that the CF of x is an extension of the CF of p/q, i.e. we want to construct $\lambda$ rational so that $x = [a_1, \ldots, a_n, \lambda]$. In view of lemma 3(a) define $\lambda$ by $x = (\lambda p_n + p_{n-1})/(\lambda q_n + q_{n-1})$. Using eq. (32) to replace x we get
\[ \lambda = 2\,\frac{q_n p_{n-1} - p_n q_{n-1}}{\delta} - \frac{q_{n-1}}{q_n}. \]
By lemma 3(b), $q_n p_{n-1} - p_n q_{n-1} = (-1)^n$. We may assume that this is the same as the sign of $\delta$, since if it is the opposite sign then from the start write $p/q = [a_1, \ldots, a_n - 1, 1]$, so the value of n is increased by 1 and the sign is flipped. Thus without loss of generality we can assume that $(q_n p_{n-1} - p_n q_{n-1})/\delta$ is positive and so
\[ \lambda = \frac{2}{\delta} - \frac{q_{n-1}}{q_n} > 2 - 1 = 1 \]
(as $|\delta| < 1$ and $q_{n-1} < q_n$). Next let $\lambda = b_0 + \lambda'$ where $b_0$ is the integer part and $0 < \lambda' < 1$, and write $\lambda' = [b_1, \ldots, b_m]$. So $x = [a_1, \ldots, a_n, \lambda] = [a_1, \ldots, a_n, b_0, b_1, \ldots, b_m]$, i.e. p/q is a convergent of the CF of x as required. (In the last argument we also used the easily proven fact that the CF expansion of any number is unique, except for the above trick of splitting 1 off from the last term, i.e. if $[a_1, \ldots, a_n] = [b_1, \ldots, b_m]$ and $a_n, b_m \neq 1$ then $m = n$ and $a_i = b_i$.)
Remark: Theorem 6 actually remains true for irrational x too. For an irrational number the continued fraction development does not terminate – we get an infinitely long continued fraction and a corresponding infinite sequence of rational convergents $p_k/q_k$, $k = 1, 2, \ldots$. This sequence provides an efficient method of computing excellent rational approximations to an irrational number, recalling that $q_k$ grows exponentially with k and (by theorem 6) $q_k$ determines the accuracy of the approximation.
Now let us return to our problem of getting r from the knowledge of c and $2^m$ satisfying eq. (26):
\[ \left| \frac{c}{2^m} - \frac{k}{r} \right| < \frac{1}{2N^2} \quad \text{and} \quad r < N. \]
We know that there is (at most) a unique such fraction k/r, and according to theorem 6 (applicable since $1/(2N^2) < 1/(2r^2)$ as $r < N$) this fraction must be a convergent of the continued fraction of $c/2^m$. Since $2^m = O(N^2)$ we have that c and $2^m$ are O(n)-bit integers and the computation of all the convergents can be performed in time $O(n^3)$. So we do this computation and finally check through the list of O(n) convergents to find the unique one satisfying eq. (26), and read off r as its denominator.
Example 4 (Continuation of example 3.)
Suppose we have obtained $c = 853$ with $2^m = 2^{11} = 2048$. We develop 853/2048 as a continued fraction:
\[ \frac{853}{2048} = \frac{1}{2048/853}; \quad \frac{2048}{853} = 2 + \frac{342}{853}; \quad \frac{853}{342} = 2 + \frac{169}{342}; \quad \frac{342}{169} = 2 + \frac{4}{169}; \quad \frac{169}{4} = 42 + \frac{1}{4}; \quad \frac{4}{1} = 4 + 0 \]
so
\[ \frac{853}{2048} = [2, 2, 2, 42, 4]. \]
The convergents are
\[ [2] = \frac{1}{2}; \quad [2, 2] = \frac{2}{5}; \quad [2, 2, 2] = \frac{5}{12}; \quad [2, 2, 2, 42] = \frac{212}{509}; \quad [2, 2, 2, 42, 4] = \frac{853}{2048}. \]
Checking these five fractions we find only 5/12 as being within $1/2^{12}$ of 853/2048 and having denominator $< 39$.
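Reusing cf_expand from above, the whole recovery step is a short computation (again a sketch of my own; the answer r = 12 is the correct period here because k = 5 happens to be coprime to r):

    from fractions import Fraction

    def convergents(a):
        # convergents p_k/q_k of [a1, ..., al], built with the recursion of lemma 3(a)
        p0, q0, p1, q1 = 0, 1, 1, a[0]
        convs = [Fraction(p1, q1)]
        for ak in a[1:]:
            p0, q0, p1, q1 = p1, q1, ak * p1 + p0, ak * q1 + q0
            convs.append(Fraction(p1, q1))
        return convs

    def candidate_period(c, M, N):
        # denominator of the unique convergent k/r of c/M with r < N and
        # |c/M - k/r| < 1/(2 N^2), as in eq. (26); None if there is no such convergent
        for frac in convergents(cf_expand(c, M)):
            if frac.denominator < N and abs(Fraction(c, M) - frac) < Fraction(1, 2 * N * N):
                return frac.denominator
        return None

    print(candidate_period(853, 2048, 39))    # 12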
In appendix 11.4 we will reconsider all the ingredients of Shor’s quantum factoring algo-
rithm and assess its polynomial time complexity in more detail.
Remark
There exist algorithms for integer multiplication that are faster than $O(n^2)$ time, running in time $O(n \log n \log\log n)$, so the above $O(n^3)$ can be improved to $O(n^2 \log n \log\log n)$.
Next we perform measurements on the output register of O(n) qubits, i.e. O(n) single qubit measurements. Then we apply QFT mod $2^m$ to obtain the state $\mathrm{QFT}|\mathrm{per}\rangle$. We have seen in section 9.3 that QFT mod $2^m$ may be implemented in $O(m^2) = O(n^2)$ steps.
Remark
There is a further subtle issue here. To implement QFT mod $2^m$ we will need controlled-$R_d$ gates (cf. eq. (11)) with smaller and smaller phases $e^{i\pi/2^d}$ for $d = O(m)$, which potentially involves an implementational cost that grows with m. However it can be shown that we can neglect these gates for very small phases, giving an inexact but still suitably good approximation to QFT for the factoring algorithm to work, and still have implementational cost $O(n^2)$.
Next we measure the state $\mathrm{QFT}|\mathrm{per}\rangle$ (O(n) single qubit measurements again) to obtain the value that we called c in section 11.3. Thus to get such a value the number of steps is $O(n^2 \log n \log\log n) + O(n) + O(n^2) + O(n) = O(n^2 \log n \log\log n)$. To get the period r we need c to be a "good" c value, i.e. $c/2^m$ is close to a multiple k/r of 1/r where k is coprime to r. To achieve this with a constant level of probability, $O(\log\log N) = O(\log n)$ repetitions of the above process suffice, i.e. $O(n^2 (\log n)^2 \log\log n)$ steps in all.
Remark
Actually it may be shown that a constant number of repetitions suffices here (instead
of O(log n)) to determine r. Suppose that in two repetitions we obtain k1 /r and k2 /r
with neither k1 nor k2 coprime to r. Then we will determine r1 and r2 which are the
denominators of k1 /r and k2 /r cancelled to lowest terms i.e. r1 and r2 will be randomly
chosen factors of r. Then, according to a further theorem of number theory, if we compute
the least common multiple r̃ of r1 and r2 we will have r̃ = r with probability at least
1/4.
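The probability bound in this remark is easy to probe numerically (an illustration of my own, with an arbitrary example period r):

    import random
    from math import gcd

    r, trials = 360, 10000
    ok = 0
    for _ in range(trials):
        k1, k2 = random.randrange(r), random.randrange(r)
        r1, r2 = r // gcd(k1, r), r // gcd(k2, r)   # denominators of k1/r, k2/r in lowest terms
        ok += (r1 * r2 // gcd(r1, r2) == r)         # does lcm(r1, r2) recover r?
    print(ok / trials)                              # empirically well above 1/4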
To get r from c we use the (classical) continued fractions algorithm, which requires $O(n^3)$ steps. Finally, to obtain our factor of N we (classically) compute $t = \gcd(a^{r/2} + 1, N)$ using Euclid's algorithm, which requires $O(n^3)$ steps for n-digit integers. If r was odd, or r is even but $t = 1$, then we go back to the start. But we saw that the good case "r is even and $t \neq 1$" will occur with any fixed constant level of probability $1 - \epsilon$ after a constant number $O(\log 1/\epsilon)$ of such repetitions.
Hence the time complexity of the entire algorithm is $O(n^3)$ (or actually slightly better with optimised algorithms and a more careful analysis). It is amusing to note that the "bottlenecks" of the algorithm's performance, i.e. the sections requiring the highest degree polynomial running times, are actually the classical processing sections and not the novel quantum parts!