Quantum Computing, 2nd Ed.
Quantum
Computing
Second Edition
Springer
Mika Hirvensalo
University of Turku
Department of Mathematics
20014 Turku
Finland
[email protected]
Series Editors
G. Rozenberg (Managing Editor)
[email protected]
Th. Bäck, J. N. Kok, H. P. Spaink
Leiden Center for Natural Computing, Leiden University
Niels Bohrweg 1, 2333 CA Leiden, The Netherlands
A. E. Eiben
Vrije Universiteit Amsterdam
ACM Computing Classification (1998): F.1-2, G.1.2, G.3, H.1.1, I.1.2, J.2
ISBN 978-3-642-07383-0
This work is subject to copyright. All rights are reserved, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data
banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag Berlin Heidelberg GmbH.
Violations are liable for prosecution under the German Copyright Law.
springeronline.com
© Springer-Verlag Berlin Heidelberg 2004
Originally published by Springer-Verlag Berlin Heidelberg New York in 2004
Softcover reprint of the hardcover 2nd edition 2004
The use of general descriptive names, registered names, trademarks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
Cover design: KünkelLopka, Heidelberg
Typesetting: Computer to film from author's data
Printed on acid-free paper 45/3142PS - 543210
Preface to the Second Edition
After the first edition of this book was published, I received much positive
feedback from the readers. It was very helpful to have all those comments sug-
gesting improvements and corrections. In many cases, it was suggested that
more aspects of quantum information would be welcome. Unfortunately, I
am afraid that an attempt to cover such a broad area as quantum informa-
tion theory would make this book too scattered to be helpful for educational
purposes.
On the other hand, I admit that some aspects of quantum information
should be discussed. The first edition already contained the so-called No-
Cloning Theorem. In this edition, I have added a stronger version of the
aforementioned theorem due to R. Jozsa, a variant which also covers the
no-deleting principle. Moreover, in this edition, I have added some famous
protocols, such as quantum teleportation.
The response to the first edition strongly supports the idea that the main
function of this book should be educational, and I have not included further
aspects of quantum information theory here. For further reading, I suggest
[43] by Josef Gruska and [62] by Michael A. Nielsen and Isaac L. Chuang.
Chapter 1, especially Section 1.4, includes the most basic knowledge for
the presentation of quantum systems relevant to quantum computation.
The basic properties of quantum information are introduced in Chapter 2.
This chapter also includes interesting protocols: quantum teleportation and
superdense coding.
Chapter 3 is divided as follows: Turing machines, as well as their probabilistic counterparts, are introduced in Section 3.1 as traditional, uniform
models of computation. For the reader interested in quantum computing but
having little knowledge of the theory of computation, this section was de-
signed to also include the basic definitions of complexity theory. Section 3.1
is intended for the reader who has a solid background in quantum mechanics,
but little previous knowledge on classical computation theory. The reader
who is well aware of the theory of computation may skip Section 3.1: for such a reader, the knowledge in Chapter 2 and Section 3.2 (excluding the first subsection) is sufficient to follow this book. In Section 3.2, we present Boolean and quantum circuits (as an extension of the concept of reversible circuits)
The twentieth century witnessed the birth of revolutionary ideas in the physical sciences. These ideas began to shake the traditional view of the universe dating back to the days of Newton, even to the days of Galileo. Albert Einstein is usually identified as the creator of the relativity theory, a theory that is used to model the behavior of the huge macrosystems of astronomy. Another new view of the physical world was supplied by quantum physics, which turned out to be successful in describing phenomena in the microworld, the behavior of particles of atomic size.
Even though the first ideas of automatic information processing are quite old, I feel justified in saying that the twentieth century also witnessed the birth of computer science. As a mathematician, by the term "computer science", I mean the more theoretical parts of this vast research area, such as
the theory of formal languages, automata theory, complexity theory, and al-
gorithm design. I hope that readers who are used to a more flexible concept of
"computer science" will forgive me. The idea of a computational device was
crystallized into a mathematical form as a Turing machine by Alan Turing
in the 1930s. Since then, the growth of computer science has been immense,
but many problems in newer areas such as complexity theory are still waiting
for a solution.
Since the very first electronic computers were built, computer technology has grown rapidly. An observation by Gordon Moore in 1965 laid the foundations for what became known as "Moore's Law" - that computer processing power doubles every eighteen months. How far can this technical progress go? How efficient can we make computers? In light of present knowledge, it seems unfair even to attempt to give an answer to these questions, but some estimate can be given. By naively extrapolating Moore's Law into the future, we learn that sooner or later, each bit of information should be encoded by a physical system of subatomic size! Several decades ago, such an idea would have seemed somewhat absurd, but it does not seem so anymore. In fact, a system of seven bits encoded subatomically has already been implemented [51]. These small systems can no longer be described by classical physics; rather, quantum physical effects must be taken into consideration.
When thinking again about the formalization of a computer as a Turing machine, rewriting system, or some other classical model of computation, one
realizes that the concept of information is usually based on strings over a finite
alphabet. This strongly reflects the idea of classical physics in the following
sense: each member of a string can be represented by a physical system
(storing the members in the memory of an electronic computer, writing them
on sand, etc.) that can be in a certain state, i.e., contain a character of the
alphabet. Moreover, we should be able to identify different states reliably.
That is, we should be able to make an observation in such a way that we
become convinced that the system under observation represents a certain
character.
In this book, we typically identify the alphabet and the distinguishable
states of a physical system that represent the information. These identifi-
able states are called basis states. In quantum physical microsystems, there
are also basis states that can be identified and, therefore, we could use such
microsystems to represent information. But, unlike the systems of classical
physics, these microsystems are also able to exist in a superposition of basis
states, which, informally speaking, means that the state of such a system can
also be a combination of basis states. We will call the information represented
by such microsystems quantum information. One may argue that in classical physics it is also possible to speak about combinations of basis states: we can prepare a mixed state, which is essentially a probability distribution over the basis states. But there is a difference between the superpositions of quantum physics and the probability distributions of classical physics: due to the interference effects, the superpositions cannot be interpreted as mixtures (probability distributions) of the basis states.
Richard Feynman [38] pointed out in 1982 that it appears to be extremely difficult to simulate efficiently, using an ordinary computer, how a quantum physical system evolves in time. He also demonstrated that, if we had a computer that runs according to the laws of quantum physics, then this simulation could be carried out efficiently. Thus, he actually suggested that a quantum computer could be essentially more efficient than any traditional one.
Therefore, it is an interesting challenge to study quantum computation,
the theory of computation in which traditional information is replaced by
its quantum physical counterpart. Are quantum computers more powerful
than traditional ones? If so, what are the problems that can be solved more
efficiently by using a quantum computer? These questions are still waiting
for answers.
The purpose of this book is to provide a good introduction to quantum computation for beginners, as well as a clear presentation of the most important presently known results for more advanced readers. The latter purpose
also includes providing a bridge (from a mathematician's point of view) be-
tween quantum mechanics and the theory of computation: it is not only my
personal experience that the language used in research articles on these topics
is completely different.
1. Introduction.............................................. 1
1.1 A Brief History of Quantum Computation . . . . . . . . . . . . . . . . . 1
1.2 Classical Physics ....................................... 2
1.3 Probabilistic Systems ................................... 4
1.4 Quantum Mechanics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
Feynman also suggested that this slowdown could be avoided by using a com-
puter running according to the laws of quantum physics. This idea suggests, at
least implicitly, that a quantum computer could operate exponentially faster
than a deterministic classical one. In [38], Feynman also addressed the prob-
lem of simulating a quantum physical system with a probabilistic computer
but due to interference phenomena, it appears to be a difficult problem.
Quantum mechanical computation models were also constructed by Be-
nioff [7] in 1982, but Deutsch argued in [31] that Benioff's model can be
perfectly simulated by an ordinary computer. In 1985 in his notable paper
[31], Deutsch was the first to establish a solid ground for the theory of quantum computation by introducing a fully quantum model for computation and
giving the description of a universal quantum computer. Later, Deutsch also
defined quantum networks in [32]. The construction of a universal quantum Turing machine was improved by Bernstein and Vazirani in [13], where the
1 One should give the coordinates and the momentum of each particle with required precision.
important when learning about the differences between quantum and classical
computation.
At its very core, physics is ultimately an empirical science in the sense that a physical theory can be regarded as valid only if the theory agrees with empirical observations. Therefore, it is not surprising that the concept of observables² has great importance in the physical sciences. There are observables associated with a physical system, like position and momentum, to mention a few. The description of a physical system is called the state of the system.
Example 1.2.1. Assume that we would like to describe the mechanics of a single particle X in a closed region of space. The observables used for the system description are the position and the momentum. Thus, we can, under a fixed coordinate system, express the state of the system as a vector x = (x_1, x_2, x_3, p_1, p_2, p_3) ∈ ℝ⁶, where (x_1, x_2, x_3) describes the position and (p_1, p_2, p_3) the momentum.
As the particle moves, the state of the system changes in time. The way
in which classical mechanics describes the time evolution of the state is given
by the Hamiltonian equations of motion:

  d/dt x_i = ∂H/∂p_i,   (1.1)

  d/dt p_i = −∂H/∂x_i,   (1.2)
but (1.1) is only the probability distribution over the states x_i.
Example 1.3.1. Tossing a fair coin will give head h or tail t with a probability of 1/2. According to classical mechanics, we may think that, in principle, perfect knowledge about the coin and all circumstances connected to the tossing would allow us to determine the outcome with certainty. However, in practice it is impossible to handle all these circumstances, and the notation (1/2)[h] + (1/2)[t] for the mixed state of a fair coin reflects our lack of information about the system.
Example 1.3.2. Let us assume that the time evolution of a system with pure states x_1, ..., x_n also depends on an auxiliary system with pure states h and t, such that the compound system state (x_i, h) evolves during a fixed time interval into (x_{h(i)}, h) and (x_i, t) into (x_{t(i)}, t), where h, t : {1, ..., n} → {1, ..., n} are some functions. The auxiliary system with states h and t can thus be interpreted as a control system which indicates how the original one should behave.
Let us then consider the control system in a mixed state p_1[h] + p_2[t] (if p_1, p_2 ≠ 1/2, we may call the control system a biased coin). The compound state can then be written as a mixture
If the auxiliary system no longer interferes with the first one, we can ignore
it and write the state of the first system as
(1.3)
3 In classical computation, generating random bits is a very complicated issue. For further discussion on this topic, consult Section 11.3 of [64].
such that p_{1i} + p_{2i} + ... + p_{ni} = 1 for each i. In (1.3), p_{ji} is the probability that the system state x_i evolves into x_j. Notice that we have now also made the time discrete in order to simplify the mathematical framework, and that this actually is well suited to the computational aspects: we can assume that we have instantaneous descriptions of the system between short time intervals, and that during each interval, the system undergoes the time evolution (1.3).
Of course, the time evolution does not always need to be the same, but
rather may depend on the particular interval. Considering this probabilistic
time evolution, the notation (1.1) appears to be very handy: during a time
interval, a distribution
(1.4)
evolves into

p_1(p_{11}[x_1] + ... + p_{n1}[x_n]) + ... + p_n(p_{1n}[x_1] + ... + p_{nn}[x_n])
= (p_{11}p_1 + ... + p_{1n}p_n)[x_1] + ... + (p_{n1}p_1 + ... + p_{nn}p_n)[x_n]
= p'_1[x_1] + p'_2[x_2] + ... + p'_n[x_n],
where we have denoted p'_i = p_{i1}p_1 + ... + p_{in}p_n. The probabilities p_i and p'_i are thus related by
  ( p'_1 )   ( p_{11} p_{12} ... p_{1n} ) ( p_1 )
  ( p'_2 ) = ( p_{21} p_{22} ... p_{2n} ) ( p_2 )   (1.5)
  (  ...  )   (  ...    ...   ...   ...  ) ( ... )
  ( p'_n )   ( p_{n1} p_{n2} ... p_{nn} ) ( p_n )
Notice that the matrix in (1.5) has non-negative entries and p_{1i} + p_{2i} + ... + p_{ni} = 1 for each i, which guarantees that p'_1 + p'_2 + ... + p'_n = p_1 + p_2 + ... + p_n.
A matrix with this property is called a Markov matrix. A probabilistic system
with a time evolution described above is called a Markov chain.
Notice that a distribution (1.4) can be considered as a vector with non-negative coordinates that sum up to 1 in an n-dimensional real vector space having [x_1], ..., [x_n] as basis vectors. The set of all mixed states (distributions) is a convex set,⁴ having the pure states as extremals. Unlike in the representation of quantum systems, the fact that the mixed states are elements of a vector space is not very important. However, it may be convenient to describe the probabilistic time evolution as a Markov mapping, i.e., a linear mapping which preserves the property that the coordinates are non-negative and sum up to 1. For an introduction to Markov chains, see [75], for instance.
4 A set S of vectors is convex if, for all x_1, x_2 ∈ S and all p_1, p_2 ≥ 0 such that p_1 + p_2 = 1, also p_1x_1 + p_2x_2 ∈ S. An element x ∈ S of a convex set is an extremal if x = p_1x_1 + p_2x_2 implies that either p_1 = 0 or p_2 = 0.
1.4 Quantum Mechanics
(1.6)
(1.7)
where α_i are complex numbers called the amplitudes (with respect to the chosen basis), and the requirement of unit length means that |α_1|² + |α_2|² + ... + |α_n|² = 1.
It should be immediately emphasized that the choice of the orthonormal basis {|x_1⟩, ..., |x_n⟩} is arbitrary, but any such fixed basis refers to a physical observable which can take n values. To simplify the framework, we do not associate any numerical values with the observables in this section; we merely say that the system can have properties x_1, ..., x_n. Numerical values associated with the observables are handled in Section 8.3.2. The amplitudes α_i induce a probability distribution in the following way: the probability that a system in a state (1.7) is seen to have property x_i is |α_i|² (we also say that the
5 A finite-dimensional vector space over complex numbers is an example of a Hilbert
space.
probability that x_i is observed is |α_i|²). The basis vectors |x_i⟩ are called the basis states, and (1.7) is referred to as a superposition of basis states. The mapping ψ(x_i) = α_i is called the wave function with respect to the basis |x_1⟩, ..., |x_n⟩.
Even though this state vector formalism is simpler than the general one, it has one inconvenient feature: if |x⟩, |y⟩ ∈ H_n are any states that satisfy |x⟩ = e^{iθ}|y⟩, we say that states |x⟩ and |y⟩ are equivalent. Clearly, equivalent states induce the same probability distribution over basis states (with respect to any chosen basis), and we usually identify equivalent states.
where the amplitudes α_1, ..., α_n and α'_1, ..., α'_n are related by

(1.8)

and |α'_1|² + ... + |α'_n|² = |α_1|² + ... + |α_n|². It turns out that the matrices satisfying this requirement are unitary matrices (see Section 8.3.1 for details).
Unitarity of a matrix A means that the transpose complex conjugate of A, denoted by A*, is the inverse matrix of A. Matrix A* is also called the adjoint matrix⁶ of A. By saying that the time evolution of quantum systems is unitary, several authors mean that the evolution of quantum systems is determined by unitary matrices. Notice that unitary time evolution has very
6 Among physics-oriented authors, there is also a widespread tradition of denoting the adjoint matrix by A†.
(1.9)
(1.10)
of a quantum system formally resemble each other very closely, and therefore it is quite natural that (1.10) can be seen as a generalization of (1.9). At first glance, the interpretation that (1.10) induces the probability distribution such that |α_i|² is the probability of observing x_i may seem like only a technical difference. Can (1.10) also be interpreted to represent our ignorance, such that the system in state (1.10) is actually in some state |x_i⟩ with a probability of |α_i|²? The answer is absolutely no. A fundamental difference can be found even by recalling that the orthonormal basis of H_n can be chosen freely. We can, in fact, choose an orthonormal basis |x'_1⟩, ..., |x'_n⟩ such that

  α_1|x_1⟩ + ... + α_n|x_n⟩ = |x'_1⟩,

so, with respect to the new basis, the state of the system is simply |x'_1⟩. The new basis refers to another physical observable, and with respect to this observable, the system may have some of the properties x'_1, ..., x'_n. But in the state |x'_1⟩, the system is seen to have property x'_1 with a probability of 1.
Example 1.4.2. Consider a quantum system with two basis states t and h
(of course this system is the same as a quantum bit, only the terminology
is different for illustration). Here we call this system a quantum coin. We
consider a time evolution
  |h⟩ ↦ (1/√2)|h⟩ + (1/√2)|t⟩,
  |t⟩ ↦ (1/√2)|h⟩ − (1/√2)|t⟩,
which we here call a fair coin toss (verify that this time evolution is unitary). Notice that, beginning with either state |h⟩ or |t⟩, the state after the toss is one of the above states on the right-hand side, and that both of them have the property that h and t will each be observed with a probability of 1/2. Imagine then that we begin with state |h⟩ and perform the fair coin toss twice. After the first toss, the state is as above, but if we do not observe the system, the
state will be |h⟩ again after the second toss (verify). The phenomenon that t cannot be observed after the second toss clearly cannot take place in any probabilistic system.
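Both verifications requested above (unitarity of the toss, and the return to |h⟩ after two tosses) can be checked numerically. A minimal numpy sketch, encoding |h⟩ = (1, 0)^T and |t⟩ = (0, 1)^T (the encoding and variable names are our own choice):

```python
import numpy as np

s = 1 / np.sqrt(2)
# The fair coin toss: |h> -> s|h> + s|t>,  |t> -> s|h> - s|t>.
T = np.array([[s,  s],
              [s, -s]])

assert np.allclose(T.conj().T @ T, np.eye(2))  # the toss is unitary

h = np.array([1.0, 0.0])                       # start in state |h>

after_one = T @ h
print(np.abs(after_one) ** 2)                  # [0.5 0.5]: h and t equally likely

after_two = T @ after_one
print(np.abs(after_two) ** 2)                  # [1. 0.]: t can never be observed
```

The second application undoes the first (T·T is the identity), which is exactly the interference phenomenon the text describes: the amplitudes of |t⟩ cancel.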
sentations (1.11) is the tensor product of the state spaces of the subsystems, and now we will briefly explain what is meant by a tensor product.
Let H_n and H_m be two vector spaces with bases {x_1, ..., x_n} and {y_1, ..., y_m}. The tensor product of spaces H_n and H_m is denoted by H_n ⊗ H_m. Space H_n ⊗ H_m has the ordered pairs (x_i, y_j) as a basis; thus H_n ⊗ H_m has dimension mn. We also denote (x_i, y_j) = x_i ⊗ y_j and say that x_i ⊗ y_j is the tensor product of basis vectors x_i and y_j. Tensor products of vectors other than basis vectors are defined by requiring that the product is bilinear:

  (α_1 x_1 + ... + α_n x_n) ⊗ (β_1 y_1 + ... + β_m y_m) = Σ_{i=1}^n Σ_{j=1}^m α_i β_j x_i ⊗ y_j.

Since the vectors x_i ⊗ y_j form the basis of H_n ⊗ H_m, the notion of the tensor product of vectors is perfectly well established, but notice carefully that the tensor product is not commutative. In connection with representing quantum systems, we usually omit the symbol ⊗ and use notations closer to the original idea of regarding x_i ⊗ y_j as a pair (x_i, y_j):
(1.12)
These states are called superpositions of basis states, and they induce a probability distribution such that, when one observes (1.12), a property x_i is seen with probability |α_i|².
It must first be noted that, despite the title, quantum information theory is not the main topic of this chapter. Instead, we will merely concentrate on descriptional differences between presenting information by using classical and quantum systems. In fact, in this chapter, we will present the fundamentals of quantum information processing. A reader well aware of basic linear algebra will presumably have no difficulties in following this section, but for a reader feeling some uncertainty, we recommend consulting Sections 8.1, 8.2, and the initial part of Section 8.3. Moreover, the basic notions of linear algebra are outlined in Section 9.3.
(2.1)
that has a unit length, i.e., |c_0|² + |c_1|² = 1. Numbers c_0 and c_1 are called the amplitudes of |0⟩ and |1⟩, respectively.
We say that an observation of a quantum bit in state (2.1) will give 0 or 1 as an outcome with probabilities of |c_0|² and |c_1|², respectively.
is unitary, i.e.,

  ( a b ) ( a* c* )   ( 1 0 )
  ( c d ) ( b* d* ) = ( 0 1 ).
Remark 2.1.2. In the coordinate representation, (2.2) can be written as
Notation a* stands for the complex conjugate of the complex number a. Notation A* will also be used for different purposes, but this should cause no misunderstandings; the meaning of a particular *-symbol should be clear from the context. In what follows, the notation (a, b)^T is used to indicate transposition, i.e.,
Example 2.1.1. Let us use the coordinate representation |0⟩ = (1, 0)^T and |1⟩ = (0, 1)^T. Then the unitary matrix

  M¬ = ( 0 1 )
       ( 1 0 )

defines an action M¬|0⟩ = |1⟩, M¬|1⟩ = |0⟩. A unary quantum gate defined by M¬ is called a quantum not-gate.
  √M¬ = (1/2) ( 1+i  1−i )
              ( 1−i  1+i ).

Since |(1+i)/2|² = |(1−i)/2|² = 1/2, observation of (2.4) and (2.5) will give 0 or 1 as the outcome, both with a probability of 1/2. Because √M¬√M¬ = M¬, the gate √M¬ is called the square root of the not-gate.
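The defining properties of √M¬ are easy to verify numerically; a minimal numpy sketch (the variable names are ours):

```python
import numpy as np

NOT = np.array([[0, 1],
                [1, 0]], dtype=complex)

# The square root of the not-gate, as given in the text.
sqrt_NOT = 0.5 * np.array([[1 + 1j, 1 - 1j],
                           [1 - 1j, 1 + 1j]])

assert np.allclose(sqrt_NOT.conj().T @ sqrt_NOT, np.eye(2))  # unitary
assert np.allclose(sqrt_NOT @ sqrt_NOT, NOT)                 # its square is NOT

zero = np.array([1, 0], dtype=complex)
probs = np.abs(sqrt_NOT @ zero) ** 2
print(probs)   # [0.5 0.5]: observing gives 0 or 1 with probability 1/2
```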
The action of W_2 is

  W_2|0⟩ = (1/√2)|0⟩ + (1/√2)|1⟩,
  W_2|1⟩ = (1/√2)|0⟩ − (1/√2)|1⟩.
Matrix W_2 is called a Walsh matrix, Hadamard matrix, or Hadamard-Walsh matrix, and it will eventually prove to be very useful. A very important feature of quantum gates, already mentioned implicitly, is that they are linear, and therefore it suffices to know their action on the basis states. For example, if a Hadamard-Walsh gate acts on the state (1/√2)(|0⟩ + |1⟩), the outcome is

  (1/√2)W_2|0⟩ + (1/√2)W_2|1⟩ = (1/2)(|0⟩ + |1⟩) + (1/2)(|0⟩ − |1⟩) = |0⟩.

Notice that the above equality reveals that W_2W_2|0⟩ = |0⟩. Similarly, we can verify that W_2W_2|1⟩ = |1⟩.
What happens to the state |0⟩ when the Hadamard-Walsh gate W_2 is applied to it twice is a very interesting matter. Figure 2.1 contains a scheme of that event.
The top row of the figure contains the initial state |0⟩. When applied once, W_2 "splits" the state |0⟩ into states |0⟩ and |1⟩; both will be present with amplitude 1/√2. This is depicted in the figure's middle row. The second application of W_2 "splits" |0⟩ as before, but |1⟩ is split slightly differently; |1⟩
  |0⟩
  (1/√2)|0⟩ + (1/√2)|1⟩
Fig. 2.1. The Hadamard-Walsh gate applied twice. The left-hand side depicts how the application of W_2 operates on the states, whereas the corresponding states are written on the right side.
occurs with amplitude −1/√2. The bottom row of the figure describes the final state. Now, the amplitudes in the bottom row can be computed by following the path from top to bottom and multiplying all the amplitudes occurring in the path. For example, the amplitude of the left-most |0⟩ in the bottom row is (1/√2) · (1/√2) = 1/2, whereas the amplitude of the right-most |1⟩ is (1/√2) · (−1/√2) = −1/2. When computing the outcome, we add up all the states in the lower right corner and get |0⟩ as the outcome.
The effect that the amplitudes of the states |0⟩ sum to more than any of the summands is called constructive interference. On the other hand, the cancellation of the states |1⟩ is referred to as destructive interference.
  F = ( 1  0 )
      ( 0 −1 ).

Then F acts as F|0⟩ = |0⟩, F|1⟩ = −|1⟩. Gate F is an example of the unary quantum gates called phase flips. In general, phase flips are of the form
rules of probabilities will apply. This means that an observation of the first (resp., second) qubit will give 0 or 1 with probabilities of |c_0|² + |c_1|² and |c_2|² + |c_3|² (resp., |c_0|² + |c_2|² and |c_1|² + |c_3|²).
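These marginal probabilities can be computed directly from the amplitude vector; a small numpy sketch with example amplitudes of our own choosing:

```python
import numpy as np

# A two-qubit state c0|00> + c1|01> + c2|10> + c3|11> (example amplitudes).
c = np.array([0.5, 0.5j, -0.5, 0.5], dtype=complex)
assert np.isclose(np.sum(np.abs(c) ** 2), 1.0)   # unit length

p = np.abs(c) ** 2   # probabilities of the four basis states

# First qubit: 0 with |c0|^2 + |c1|^2, 1 with |c2|^2 + |c3|^2.
first = np.array([p[0] + p[1], p[2] + p[3]])
# Second qubit: 0 with |c0|^2 + |c2|^2, 1 with |c1|^2 + |c3|^2.
second = np.array([p[0] + p[2], p[1] + p[3]])

print(first)    # [0.5 0.5]
print(second)   # [0.5 0.5]
```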
Remark 2.2.1. Notice that here the tensor product of vectors does not commute: |0⟩|1⟩ ≠ |1⟩|0⟩. We use the linear ordering (writing from left to right) to address the qubits individually.
  (1/2)(|0⟩|0⟩ + |0⟩|1⟩ + |1⟩|0⟩ + |1⟩|1⟩) = (1/√2)(|0⟩ + |1⟩) ⊗ (1/√2)(|0⟩ + |1⟩),

as is easily verified. On the other hand, the state (1/√2)(|00⟩ + |11⟩) is entangled. To see this, assume, on the contrary, that

  (1/√2)(|00⟩ + |11⟩) = (a_0|0⟩ + a_1|1⟩)(b_0|0⟩ + b_1|1⟩)
                      = a_0b_0|00⟩ + a_0b_1|01⟩ + a_1b_0|10⟩ + a_1b_1|11⟩

for some complex numbers a_0, a_1, b_0, and b_1. But then a_0b_1 = a_1b_0 = 0 and a_0b_0 = a_1b_1 = 1/√2, which is absurd.
Remark 2.2.2. If two qubits are in the entangled state (1/√2)(|00⟩ + |11⟩), then observing one of them will give 0 or 1, both with a probability of 1/2, but it is not possible to observe different values on the two qubits. It is interesting to notice that experiments have shown that this correlation can remain even if the qubits are spatially separated by more than 10 km [86]. This distant correlation opens opportunities for quantum cryptography and quantum communication protocols. A pair of qubits in state (1/√2)(|00⟩ + |11⟩) is called an EPR pair.¹ Notice that if both qubits of the EPR pair are run through a Hadamard gate, the resulting state is again (1/√2)(|00⟩ + |11⟩), so it is impossible to give an ignorance interpretation for the EPR pair. EPR pairs are extremely important in many quantum protocols, and we will discuss their usefulness later.
  M_cnot = ( 1 0 0 0 )
           ( 0 1 0 0 )
           ( 0 0 0 1 )
           ( 0 0 1 0 )

defines a unitary mapping, whose action on the basis states is M_cnot|00⟩ = |00⟩, M_cnot|01⟩ = |01⟩, M_cnot|10⟩ = |11⟩, M_cnot|11⟩ = |10⟩. Gate M_cnot is called controlled not, since the second qubit (the target qubit) is flipped if and only if the first (the control qubit) is 1.
Other examples of multiqubit states and of their gates will be given after the
following important definition.
Definition 2.2.2. The tensor product, also called the Kronecker product, of r × s and t × u matrices
is an rt × su matrix defined as
1 EPR stands for Einstein, Podolsky, and Rosen, who first regarded the distant
correlation as the source of a paradox of quantum physics [88].
  |0⟩ ⊗ |0⟩ = (1, 0, 0, 0)^T,   |0⟩ ⊗ |1⟩ = (0, 1, 0, 0)^T,
  |1⟩ ⊗ |0⟩ = (0, 0, 1, 0)^T,   and |1⟩ ⊗ |1⟩ = (0, 0, 0, 1)^T.

Similarly, we can get coordinate representations of triples of qubits, etc. Notice that the above coordinate representations agree with those of Definition 2.2.1, and that this is not a mere coincidence!
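The coordinate representations of the two-qubit basis states can be checked with numpy's Kronecker product; a minimal sketch (the variable names are ours):

```python
import numpy as np

zero = np.array([1, 0])   # |0> = (1, 0)^T
one = np.array([0, 1])    # |1> = (0, 1)^T

# Coordinate representations of the two-qubit basis states
# as Kronecker products of the one-qubit basis states.
print(np.kron(zero, zero))   # [1 0 0 0], i.e., |00>
print(np.kron(zero, one))    # [0 1 0 0], i.e., |01>
print(np.kron(one, zero))    # [0 0 1 0], i.e., |10>
print(np.kron(one, one))     # [0 0 0 1], i.e., |11>

# The tensor product is not commutative: |01> != |10>.
assert not np.array_equal(np.kron(zero, one), np.kron(one, zero))
```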
Example 2.2.4. Let M_1 = M_2 = W_2 be the Hadamard matrix of the previous section. Acting on both qubits with a Hadamard-Walsh matrix can be seen as a binary quantum gate, whose matrix is defined by

  W_4 = W_2 ⊗ W_2 = (1/2) ( 1  1  1  1 )
                          ( 1 −1  1 −1 )
                          ( 1  1 −1 −1 )
                          ( 1 −1 −1  1 ),

and the action of W_4 can be written as
But the action of M_cnot turns the decomposable state (2.8) into the entangled state (1/√2)(|00⟩ + |11⟩). Because M_cnot introduces entanglement, it cannot be a tensor product of two unary quantum gates.
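The claim that M_cnot creates entanglement can be replayed numerically: starting from |00⟩, a Hadamard-Walsh gate on the first qubit produces a decomposable state, and M_cnot then turns it into the EPR state. A sketch (our own variable names):

```python
import numpy as np

s = 1 / np.sqrt(2)
W2 = np.array([[s, s],
               [s, -s]])
I2 = np.eye(2)

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

ket00 = np.array([1.0, 0, 0, 0])

# Hadamard-Walsh on the first qubit only: the decomposable state
# (|0> + |1>)/sqrt(2) tensored with |0>.
state = np.kron(W2, I2) @ ket00

# Controlled not entangles the two qubits.
bell = CNOT @ state
print(bell)   # approx [0.707, 0, 0, 0.707], i.e., (|00> + |11>)/sqrt(2)
```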
where |c_0|² + |c_1|² + ... + |c_{2^m−1}|² = 1. That is to say, a general description of a compound of m two-state quantum systems requires 2^m complex numbers. Hence, the description size of the system is exponential in its physical size.
The time evolution of an m-qubit system is determined by unitary mappings in H_{2^m}. The size of the matrix of such a mapping is 2^m × 2^m, also exponential in the physical size of the system.² A more detailed explanation of quantum register processing is provided in Section 3.2.3.
So far, we have been talking about qubits, but everything also generalizes to n-ary quantum digits: if we have an alphabet A = {a_1, ..., a_n}, we can
2 It should not be any more surprising that Feynman found the effective deterministic simulation of a quantum system difficult. Due to the interference effects, it also seems to be difficult to simulate a quantum system efficiently with a probabilistic computer.
identify the letters with the basis states |a_1⟩, ..., |a_n⟩ of an n-level quantum system. We say that such a basis is a quantum representation of alphabet A. The key features of quantum representations were already listed at the end of the introductory chapter:
• A set of n elements can be identified with the vectors of an orthonormal basis of an n-dimensional complex vector space H_n. We call H_n a state space. When we have a fixed basis |a_1⟩, ..., |a_n⟩, we call these vectors basis states. The basis that we choose to fix is usually called a computational basis.
• A general state of a quantum system is a unit-length vector in the state space. If α_1|a_1⟩ + ... + α_n|a_n⟩ is a state, then the system is seen in state a_i with a probability of |α_i|².
• The state space of a compound system consisting of two subsystems is the tensor product of the subsystem state spaces.
• In this formalism, the state transformations are length-preserving linear mappings. It can be shown that these mappings are exactly the unitary mappings in the state space.
In the above list, we have the operations that can be done with quantum systems (under this chosen formalism). We now present a somewhat surprising result called the "No-Cloning Theorem", due to W. K. Wootters and W. H. Zurek [91]. Consider a quantum system having n basis states |a_1⟩, ..., |a_n⟩. Let us denote the state space by H_n and specify that the state |a_1⟩ is a "blank sheet state". A unitary mapping in H_n ⊗ H_n is called a quantum copy machine if, for any state |x⟩ ∈ H_n,
The No-Cloning Theorem thus states that there is no allowed operation (unitary mapping) that would produce a copy of an arbitrary quantum state. Notice also that in the above proof, we did not use unitarity; only the linearity of the time-evolution mapping was needed. If, however, we are satisfied with cloning only the basis states, there is a solution: let I = {1, ..., n} be the set of indices. The partially defined mapping f : I × I → I × I, f(i, 1) = (i, i), is clearly injective, so we can complete the definition (in many ways) such that f becomes a permutation of I × I and still satisfies f(i, 1) = (i, i). Let f(i, j) = (i′, j′). Then the linear mapping defined by U |a_i⟩ |a_j⟩ = |a_{i′}⟩ |a_{j′}⟩ is a permutation of the basis vectors of H_n ⊗ H_n, and any such permutation is unitary, as is easily verified. Moreover, U(|a_i⟩ |a_1⟩) = |a_i⟩ |a_i⟩, so U is a copy machine operation on basis vectors.
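As a concrete sketch of this construction (using numpy, with 0-based indices so that the blank sheet state |a_1⟩ becomes index 0, and an arbitrarily chosen dimension n = 3), one can complete f to a permutation, form the corresponding permutation matrix on H_n ⊗ H_n, and check that it copies every basis state while, by linearity, failing to clone a superposition:

```python
import numpy as np

n = 3  # dimension of H_n; an arbitrary choice for this sketch

# Complete the partial map f(i, 0) = (i, i) to a permutation of I x I
# (0-based indices: the "blank sheet state" |a_1> is index 0).
f = {(i, 0): (i, i) for i in range(n)}
targets = [(i, j) for i in range(n) for j in range(n) if i != j]
sources = [(i, j) for i in range(n) for j in range(1, n)]
f.update(zip(sources, targets))

# U permutes the basis vectors |a_i>|a_j> of H_n (x) H_n.
U = np.zeros((n * n, n * n))
for (i, j), (k, l) in f.items():
    U[n * k + l, n * i + j] = 1.0

# A permutation of an orthonormal basis is unitary...
assert np.allclose(U @ U.T, np.eye(n * n))

# ...and it copies every basis state: U |a_i>|a_1> = |a_i>|a_i>.
e = np.eye(n)
for i in range(n):
    assert np.allclose(U @ np.kron(e[i], e[0]), np.kron(e[i], e[i]))

# By linearity, a superposition is NOT cloned (No-Cloning Theorem):
psi = (e[0] + e[1]) / np.sqrt(2)
assert not np.allclose(U @ np.kron(psi, e[0]), np.kron(psi, psi))
```

The final assertion is exactly the restriction of the No-Cloning Theorem to this U: linearity forces U(|ψ⟩|a_1⟩) to be a sum of terms |a_i⟩|a_i⟩, an entangled state rather than |ψ⟩|ψ⟩.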
Remark 2.3.2. Notice that, when defining the unitary mapping U(|a_i⟩ |a_1⟩) = |a_i⟩ |a_i⟩, we do not need to assume that the second system looks like the first one; the only thing we need is that the second system must have at least as many basis states as the first one. We could, therefore, also define a unitary mapping U |a_i⟩ |b_1⟩ = |a_i⟩ |b_i⟩, where |b_1⟩, ..., |b_m⟩ (m ≥ n) are the basis states of some other system. What is interesting here is that we could regard the second system as a measurement apparatus designed to observe the state of the first system. Thus, |b_1⟩ could be interpreted as the "initial pointer state", and U as a "measurement interaction". Measurements that can be described in this fashion are called von Neumann–Lüders measurements. The measurement interaction is also discussed in Section 8.4.3.
2.4 Observation
So far, we have tacitly assumed that quantum systems are used for proba-
bilistic information processing according to the following scheme:
• The system is first prepared in an initial basis state.
• Next, the unitary information processing is carried out.
• Finally, we observe the system to see the outcome.
What is missing in the above procedure is that we have not considered the possibility of making an intermediate observation³ during unitary processing. In other words, we have not discussed the effect of observation upon the quantum system state. For a more systematic treatment of observation, the
³ In physics literature, the term "observation" is usually replaced with "measurement".
reader is advised to study Section 8.3.2. For now, we will present, in a simplified way, the most widely used method for handling state changes during an observation procedure. Suppose that a system is in state

Σ_{i=1}^n Σ_{j=1}^m a_ij |x_i⟩ |y_j⟩ (2.9)

and the first system is observed with outcome x_k (notice that the probability of observing x_k is P(k) = Σ_{j=1}^m |a_kj|²). The projection postulate⁴ now implies that the postobservation state of the whole system is

(1/√P(k)) Σ_{j=1}^m a_kj |x_k⟩ |y_j⟩. (2.10)
In other words, the initial state (2.9) of the system is projected to the subspace that corresponds to the observed state and renormalized to the unit length. It is now worth strongly emphasizing that the state evolution from (2.9) to (2.10) given by the projection postulate is not consistent with the unitary time evolution, since unitary evolution is always reversible; but there is no way to recover state (2.9) from (2.10). In fact, no explanation for the observation process which is consistent with quantum mechanics (using a certain interpretation) has ever been discovered. The difficulty arising when trying to find such an explanation is usually referred to as the measurement paradox of quantum physics.
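The projection postulate can be sketched numerically as follows (a toy example with hypothetical dimensions n = 3, m = 4 and random complex amplitudes a_ij):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# A random bipartite state: amplitudes a[i, j] on |x_i>|y_j>.
n, m = 3, 4
a = rng.normal(size=(n, m)) + 1j * rng.normal(size=(n, m))
a /= np.linalg.norm(a)          # normalize to unit length

# P(k) = sum_j |a_kj|^2 is the probability of observing x_k on system 1.
P = np.sum(np.abs(a) ** 2, axis=1)
assert np.isclose(P.sum(), 1.0)

# Projection postulate: outcome k keeps only the |x_k>|y_j> components
# and renormalizes them to unit length, as in (2.10).
k = 1
post = np.zeros_like(a)
post[k, :] = a[k, :] / np.sqrt(P[k])
assert np.isclose(np.linalg.norm(post), 1.0)

# Note: the collapse a -> post discards the rows i != k, so it is not
# injective, let alone unitary: (2.9) cannot be recovered from (2.10).
```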
However, instead of having intermediate observations that cause a "collapse of a state vector" from (2.9) to (2.10), we can have an auxiliary system with unitary measurement interaction (basis state copy machine) |x_i⟩ |x_1⟩ ↦ |x_i⟩ |x_i⟩. That is, we replace the collapse (2.9) ↦ (2.10) by a measurement interaction, which turns state

(Σ_{i=1}^n Σ_{j=1}^m a_ij |x_i⟩ |y_j⟩) |x_1⟩ = Σ_{i=1}^n Σ_{j=1}^m a_ij |x_i⟩ |y_j⟩ |x_1⟩ (2.11)

into

Σ_{i=1}^n Σ_{j=1}^m a_ij |x_i⟩ |y_j⟩ |x_i⟩. (2.12)
⁴ The projection postulate can be seen as an ad hoc explanation for the state transform during the measurement process.
Even though the transformation from (2.11) to (2.12) appears different from (2.9) ↦ (2.10) at first glance, it has many similar features. In fact, using notation P(k) = Σ_{j=1}^m |a_kj|² again, we can rewrite (2.12) as

Σ_{k=1}^n ((1/√P(k)) Σ_{j=1}^m a_kj |x_k⟩ |y_j⟩) √P(k) |x_k⟩. (2.13)

Let us now interpret (2.12) (and (2.13)): the third system which we introduced shatters the whole system into n orthogonal subspaces, which are, for any k ∈ {1, ..., n}, spanned by the vectors |x_i⟩ |y_j⟩ |x_k⟩, i ∈ {1, ..., n}, j ∈ {1, ..., m}. The left multipliers of the vectors √P(k) |x_k⟩ are unit-length vectors exactly the same as in (2.10). Moreover, the probability of observing x_k in the right-most system of (2.12) is P(k). Therefore, it should be clear that, if quantum information processing continues independently of the third system, then the final probability distribution is the same if operation (2.9) ↦ (2.10) is replaced with operation (2.11) ↦ (2.12). For this reason, we will not consider intermediate observations in this book, but may refer to them only as notational simplifications of procedure (2.11) ↦ (2.12).
In any case, it is an undeniable fact that the observation procedure always disturbs the original system. If it becomes necessary to refer to a quantum system after observation, we will mainly adopt the projection postulate or even assume that the whole system is lost after observation.
a|0⟩ + b|1⟩, (2.14)
but state (2.14) is unknown to Alice. Now Alice wants to send state (2.14)
to Bob.
One, at least theoretical, possibility is to send Bob the whole two-state
quantum system that is in state (2.14). We say that there is a quantum
channel from Alice to Bob if this is possible. Similarly, if Alice can send
classical bits to Bob, we say that there is a classical channel from Alice to
Bob.
Now, we assume that there is no quantum channel from Alice to Bob, but a classical one exists. Alice's task, to send state (2.14) to Bob, appears quite impossible. Notice that state (2.14) is unknown to Alice, so she cannot send Bob instructions for constructing (2.14).
Alice may try, for instance, to observe her state and then send Bob the outcome, 0 or 1. This attempt fails immediately if both a and b are nonzero; Bob cannot reconstruct state (2.14). Alice cannot make many observations either, because her observation always disturbs her qubit. Alice cannot make copies of her qubit for many observations, since copying of an unknown quantum state is impossible, as shown in Section 2.3.1. If an unlimited number of observations were allowed to Alice, she could get arbitrarily precise approximations of the probabilities of seeing 0 and 1 as outcomes, which is to say that she could get arbitrarily good approximations of the numbers |a|² and |b|². But even the exact values of |a|² and |b|² are not enough for Bob to reconstruct state (2.14). This can be noticed immediately, for those values are identical for the states
(1/√2) |0⟩ + (1/√2) |1⟩ (2.15)

and

(1/√2) |0⟩ − (1/√2) |1⟩. (2.16)
Nevertheless, states (2.15) and (2.16) can behave quite differently, as seen by applying a Hadamard–Walsh gate (see Section 2.2) to both states: the outcomes will be |0⟩ and |1⟩, respectively (see Exercise 1).
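A short numerical check of this distinction (here W₂, the Hadamard–Walsh gate, is written as the matrix H):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # the Hadamard-Walsh gate W2

plus  = np.array([1, 1]) / np.sqrt(2)    # state (2.15)
minus = np.array([1, -1]) / np.sqrt(2)   # state (2.16)

# Observed directly, both states give 0 and 1 with probability 1/2 each...
assert np.allclose(np.abs(plus) ** 2, np.abs(minus) ** 2)

# ...but after the Hadamard-Walsh gate they are perfectly distinguishable.
assert np.allclose(H @ plus,  [1, 0])    # outcome |0> with certainty
assert np.allclose(H @ minus, [0, 1])    # outcome |1> with certainty
```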
In fact, it is impossible for Alice to send her quantum bit to Bob by only
using a classical channel. To see this, it is enough to notice that classical
information can be perfectly cloned: if there were a way to reconstruct state
(2.14) from some classical information (which state (2.14) itself determines),
then we would be able to make an unlimited number of reconstructions of
(2.14). But this means that we would be able to create a quantum copy
machine, and that was already proven impossible.
On the other hand, if Alice and Bob initially share an EPR pair, there is a way to execute the required task. This protocol, introduced in [11], is called quantum teleportation.⁵
We will now describe the teleportation protocol. The basic assumption is
that Alice and Bob have two qubits in EPR state
(1/√2) |00⟩ + (1/√2) |11⟩. (2.17)
Notations are chosen such that the qubit on the left-hand side belongs to
Alice and the right-hand side qubit belongs to Bob. In addition to qubits
(2.17), Alice has her qubit to be teleported in state
⁵ One could argue that to create an EPR pair, Alice and Bob should have a quantum channel. On the other hand, Alice and Bob could have a supply of EPR pairs generated when they last met.
a|0⟩ + b|1⟩. (2.18)
Recall that only the right-most qubit belongs to Bob and that Alice has full access to the two qubits on the left.
Teleportation protocol
1. Alice performs the controlled not operation on her qubits, using the leftmost one as the control qubit (see Example 2.2.2). State (2.18) then becomes

(a/√2) |000⟩ + (a/√2) |011⟩ + (b/√2) |110⟩ + (b/√2) |101⟩. (2.19)

2. Alice applies the Hadamard–Walsh transform to the leftmost qubit.
3. Now Alice observes her two qubits. As the outcome, she sees 00, 01, 10, and 11, each with a probability of 1/4. For brevity, we will refer to the projection postulate and summarize the resulting state (depending on the outcome) in the following table.
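The whole protocol can be checked numerically. The sketch below uses arbitrary illustrative amplitudes a, b, and the standard correction step in which Bob applies Z^{m₁} X^{m₂} depending on Alice's outcome m₁m₂; it verifies that every outcome occurs with probability 1/4 and that Bob always ends up holding a|0⟩ + b|1⟩:

```python
import numpy as np

a, b = 0.6, 0.8j                 # an arbitrary "unknown" state a|0> + b|1>
psi = np.array([a, b])

# Qubit order: Alice's unknown qubit, Alice's EPR half, Bob's EPR half.
epr = np.array([1, 0, 0, 1]) / np.sqrt(2)        # (|00> + |11>)/sqrt(2)
state = np.kron(psi, epr)

I2 = np.eye(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

state = np.kron(CNOT, I2) @ state                # step 1: controlled not
state = np.kron(np.kron(H, I2), I2) @ state      # step 2: Hadamard-Walsh

# Step 3: Alice observes outcome m1 m2; Bob corrects with Z^m1 X^m2.
for m1 in (0, 1):
    for m2 in (0, 1):
        idx = 4 * m1 + 2 * m2
        bob = state[idx:idx + 2]                 # Bob's (unnormalized) qubit
        p = np.linalg.norm(bob) ** 2
        assert np.isclose(p, 0.25)               # each outcome has prob 1/4
        bob = bob / np.sqrt(p)
        fix = np.linalg.matrix_power(Z, m1) @ np.linalg.matrix_power(X, m2)
        assert np.allclose(fix @ bob, psi)       # Bob recovers a|0> + b|1>
```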
with Bob. Again we assume that the qubit marked on the left is Alice's, and the other one is Bob's.
B = (1/√2) ·
    (  1   0   0   1 )
    (  0   1   1   0 )
    (  1   0   0  −1 )
    (  0  −1   1   0 )     (2.23)

(verify that matrix B is unitary). It is then easy to verify, by direct calculation, that the following table (extending the previous one) is correct:
b₁ b₂   State after Alice's operations   State after Bob's operation
0  0    (1/√2)|00⟩ + (1/√2)|11⟩          |00⟩
0  1    (1/√2)|10⟩ + (1/√2)|01⟩          |01⟩
1  0    (1/√2)|00⟩ − (1/√2)|11⟩          |10⟩
1  1    (1/√2)|10⟩ − (1/√2)|01⟩          |11⟩
4. Bob observes his qubits. The table above shows that the bits b1 and b2
are recovered faithfully.
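The table, and the unitarity of B, can be checked directly. The matrix below is one reconstruction of (2.23) consistent with the table (basis order |00⟩, |01⟩, |10⟩, |11⟩); the check confirms that applying it to each tabulated state yields exactly |b₁b₂⟩:

```python
import numpy as np

B = np.array([[1,  0, 0,  1],
              [0,  1, 1,  0],
              [1,  0, 0, -1],
              [0, -1, 1,  0]]) / np.sqrt(2)

# B is unitary (Exercise 2).
assert np.allclose(B @ B.conj().T, np.eye(4))

# States after Alice's operations, indexed by the bits (b1, b2).
s = 1 / np.sqrt(2)
table = {
    (0, 0): np.array([s, 0, 0, s]),     # (|00> + |11>)/sqrt(2)
    (0, 1): np.array([0, s, s, 0]),     # (|10> + |01>)/sqrt(2)
    (1, 0): np.array([s, 0, 0, -s]),    # (|00> - |11>)/sqrt(2)
    (1, 1): np.array([0, -s, s, 0]),    # (|10> - |01>)/sqrt(2)
}

# Bob applies B and observes: the outcome is exactly b1 b2.
for (b1, b2), vec in table.items():
    out = B @ vec
    k = int(np.argmax(np.abs(out)))
    assert np.isclose(abs(out[k]), 1.0)       # outcome is certain
    assert (k >> 1, k & 1) == (b1, b2)        # and equals b1 b2
```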
Remark 2.6.1. Regarding Alice's action in the beginning, it is sufficient and necessary to force the qubits into orthonormal states (recall from the beginning of Section 2.2 that the states |00⟩, |01⟩, |10⟩, and |11⟩ form an orthonormal set). Whenever this can be done, there always exists a unitary mapping that transforms these orthonormal states to basis states.
2.7 Exercises
1. Compute W₂W₂. Conclude that W₂ applied to state (2.15) (resp., (2.16)) yields |0⟩ (resp., |1⟩).
2. Verify that matrix (2.23) is unitary.
3. Devices for Computation
The reason for calling the Turing machine of the above definition deterministic is due to the form of the transition function δ, which describes the dynamics (the computation) of the machine. Later we will introduce some other forms of Turing machines, but for now, we will give an interpretation for the above definition, mainly using the notations of [64]. As a configuration of a Turing machine we understand a triplet (q₁, x, y), where q₁ ∈ Q and x, y ∈ A*. For a configuration c = (q₁, x, y), we say that the Turing machine is in state q₁ and that the tape contains word xy. If word y also begins with letter a₁, we say that the machine is scanning letter a₁ or reading letter a₁. To describe the dynamics determined by the transition function δ, we write x = w₁a and y = a₁w₂, where a, a₁ ∈ A and w₁, w₂ ∈ A*. Thus, c can be written as c = (q₁, w₁a, a₁w₂), and the transition function δ also defines a transition rule from one configuration to another in a very natural way: if δ(q₁, a₁) = (q₂, a₂, d), then a configuration
can be transformed to
Value δ(q₁, a₁, q₂, a₂, d) is interpreted as the probability that, when the machine in state q₁ reads symbol a₁, it will print a₂, enter the state q₂, and move the head in direction d ∈ {−1, 0, 1}.
Definition 3.1.4. A probabilistic Turing machine M over alphabet A is a sixtuple (Q, A, δ, q₀, q_a, q_r), where q₀, q_a, q_r ∈ Q are the initial, accepting, and rejecting states, respectively. It is required that, for all (q₁, a₁) ∈ Q × A,

Σ_{(q₂, a₂, d) ∈ Q × A × {−1, 0, 1}} δ(q₁, a₁, q₂, a₂, d) = 1.
From this time on, to avoid fundamental difficulties in the notion of computability, we will agree that all the values of δ(q₁, a₁, q₂, a₂, d) are rational.¹
The computation of a probabilistic Turing machine is not so straightforward a concept as the deterministic computation. We say that the configuration c = (q₁, w₁a, a₁w₂) yields (or is followed by) any configuration
¹ This agreement can, of course, be criticized, but the feasibility of a machine working with arbitrary real number probabilities is also questionable.
λ₁c₁ + ... + λ_m c_m (3.2)

over the basis configurations c₁, ..., c_m. As in the introductory chapter, we can define a vector space, which we now call the configuration space, that has all of the potential basis configurations as the basis vectors, and a general configuration (3.2) is a vector in the configuration space having non-negative coordinates that sum up to 1. On the other hand, there is now an essentially more complicated feature compared to the introductory chapter: there is a countable infinity of basis vectors, so our configuration space is countably infinite-dimensional.
Regarding this latter representation of total configurations of a probabilistic Turing machine, the reader may already guess how to introduce the notion of quantum Turing machines, but we will still continue with probabilistic computations and study some acceptance models.
When talking about time-bounded computation in connection with probabilistic Turing machines, we usually assume that all the computations have the same length. In other words, we assume that all the computations are
synchronized so well that they reach a halting configuration at the same time, so that all the branches of the computation tree have the same length. We can thus regard a probabilistic Turing machine as a facility for computing the probability distribution of outcomes. For instance, if the purpose of a particular probabilistic machine is to solve a decision problem, then some of the computations may end up in an accepting state, but some may also end up in a rejecting state. What then do we mean by saying that a probabilistic machine accepts a string or a language? In fact, there are several different choices, some of which are given in the following definitions:
• Class NP is the family of languages S that can be accepted in polynomial time by some probabilistic Turing machine M in the following sense: a word w is in the language S if and only if M accepts w with nonzero probability (see Remark 3.1.2).
• Class RP is the family of languages S that can be accepted in polynomial time with some probabilistic Turing machine M in the following way: if w ∈ S, then M accepts w with a probability of at least 1/2, but if w ∉ S, then M always rejects w.
• Class coRP is the class of languages consisting exactly of the complements of those in RP. Notice that coRP is not the complement of RP among all the languages.
• We define ZPP = RP ∩ coRP.
• Class BPP is the family of languages S that are accepted by a probabilistic Turing machine M such that if w ∈ S, then M accepts with a probability of at least 2/3, and if w ∉ S, then M rejects with a probability of at least 2/3.
Remark 3.1.2. The definition of the class NP (standing for nondeterministic polynomial time) given here is not the usual one. A much more traditional definition would be given by using the notion of a nondeterministic Turing machine, which can be obtained from the probabilistic ones by ignoring the probabilities. In other words, a nondeterministic Turing machine has a transition relation δ ⊆ Q × A × Q × A × {−1, 0, 1}, which tells whether it is possible for a configuration c to yield another configuration c′. More precisely, the fact that (q₁, a₁, q₂, a₂, d) ∈ δ means that, if the machine is in state q₁ scanning symbol a₁, then it is possible to replace a₁ with a₂, move the head in direction d, and enter state q₂. However, this model resembles the probabilistic computation very closely: indeed, the notion of "computing c′ from c in t steps with nonzero probability" is replaced with the notion that "there is a possibility to compute c′ from c in t steps", which does not seem to make any difference. The acceptance model for nondeterministic Turing machines also looks like the one that we defined for probabilistic NP-machines. A word w is accepted if and only if it is possible to reach an accepting final configuration in polynomial time.
It can also be argued that the class NP does not correspond very well to our intuition of practical computation. For example, if each configuration yields two distinct ones, each with a probability of 1/2, and the computation
lasts t steps, there are 2^t final configurations, each computed from the initial one with a probability of 1/2^t. However, we say that the machine accepts a word if and only if at least one of these final configurations is accepting, but it may happen that only one final configuration is accepting, and we cannot distinguish the acceptance probabilities 1/2^t (accepting) and 0 (rejecting) practically without running the machine Ω(2^t) times. But this would usually make the computation last exponential time since, if the machine reads the whole input, then t ≥ n, where n is the length of the input. By its very definition, each deterministic Turing machine is also a probabilistic one (always working with a probability of 1), and therefore P ⊆ NP. However, it is a long-standing open problem in theoretical computer science whether P ≠ NP.
Remark 3.1.4. If a language S belongs to ZPP, then there are Monte Carlo Turing machines M₁ and M₂ accepting S and the complement of S, respectively. By combining these two machines, we obtain an algorithm that can be repeatedly used to make the correct decision with certainty. For if M₁ gives the answer w ∈ S one time, it surely means that w ∈ S, since M₁ has no false positives. If M₂ gives an answer w ∈ A* \ S, then we know that w ∉ S, since M₂ has no false positives either. In both cases, the probability of not getting a definite answer is at most 1/2, so by repeating the procedure of running both machines k times, we obtain with a probability of at least 1 − 1/2^k the certainty that either w ∈ S or w ∉ S. Thus, we can say that a ZPP-algorithm works like a deterministic algorithm whose expected running time is polynomial. Notation ZPP stands for Zero error Probability in Polynomial time. A ZPP-algorithm is called a Las Vegas algorithm.
Remark 3.1.5. The definition of BPP merely contains the idea that we accept languages or, to put it in other words, that we solve decision problems using a probabilistic Turing machine that is required to give a correct answer with a probability that is larger than 1/2. Notation BPP stands for Bounded error Probability in Polynomial time. The constant 2/3 is arbitrary, and any C ∈ (1/2, 1) would define the same class. This is because we can efficiently increase the success probability by defining a Turing machine that runs the same computation several times, and then taking the majority of results as the answer. It is also widely believed that the class BPP is the class of
problems that are efficiently solvable. This belief is known as the Extended Church–Turing thesis.
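The majority-vote amplification can be illustrated with a toy simulation (independent biased coin flips standing in for runs of the machine; all parameters here are illustrative, not from the text):

```python
import random

def majority_error(p_correct, runs, trials=20000, seed=1):
    """Estimate the probability that the majority of `runs` independent
    executions, each correct with probability p_correct, is wrong."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        correct = sum(rng.random() < p_correct for _ in range(runs))
        if correct <= runs // 2:       # majority of the runs is wrong
            errors += 1
    return errors / trials

# A single run errs with probability about 1/3, but the majority of 31
# runs errs only rarely; this is why any C in (1/2, 1) defines the
# same class BPP.
assert majority_error(2 / 3, 1) > 0.25
assert majority_error(2 / 3, 31) < 0.05
```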
(3.4)
where q ∈ Q and x_i, y_i ∈ A*. For a configuration (3.4), we say that q is the current state, and that x_i y_i is the content of the ith tape. Let x_i = v_i a_i and y_i = b_i w_i. Then we say that b_i is the currently scanned symbol on the ith tape. If
such that δ(q₁, a₁, q₂, a₂, d) gives the amplitude that whenever the machine is in state q₁ scanning symbol a₁, it will replace a₁ with a₂, enter state q₂, and move the head in direction d ∈ {−1, 0, 1}.
(3.5)
computation in the next chapter. The reason for this choice is that, compared to QTMs, the quantum circuit model is much more straightforward to use for describing quantum algorithms. In fact, all of the quantum algorithms that are studied in Chapters 4–6 are given by using the quantum circuit formalism.
We end this section by examining a couple of properties of quantum Turing machines. The first thing to consider is that the transition of a QTM determines a unitary and, therefore, a reversible time evolution in the configuration space. However, ordinary Turing machines can be irreversible, too.⁵ The question is how powerful the reversible computation is. Can we design a reversible Turing machine that performs each particular computational task? A positive answer to this question was first given by Lecerf in [53], who intended to demonstrate that the Post Correspondence Problem⁶ remains undecidable even for injective morphisms. Lecerf's constructions were later extended by Ruohonen [78], but Bennett in [8] was the first to give a model for reversible computation simulating the original, possibly irreversible, computation with constant slowdown but possibly with a huge increase in the space consumed (see also [46]).
Bennett's work was at least partially motivated by a thermodynamic problem: according to Landauer [52], an irreversible overwriting of a bit causes at least kT ln 2 joules of energy dissipation.⁷ This theoretical lower bound can always be avoided by using reversible computation, which is possible according to the results of Lecerf and Bennett.
Bennett's construction of a reversible Turing machine uses a three-tape Turing machine with input tape, history tape, and output tape. Reversibility is obtained by simulating the original machine on the input tape, thereby writing down the history of the computation, i.e., the transition rules that have been used so far, onto the history tape. When the machine stops, the output is copied from the input tape to the empty output tape, and the computation is run backward (also a reversible procedure) to erase the history tape for future use. The amount of space this construction consumes is proportional to the computation time of the original computation, but by applying the erasure of the history tape recursively, the space requirement can be reduced even to O(s(n) log t(n)), at the same time using time O(t(n) log t(n)), where s and t are the original space and time consumption [9]. For space/time trade-offs for reversible computing, see also [54].
⁵ We call an ordinary Turing machine reversible if each configuration admits a unique predecessor.
⁶ The Post Correspondence Problem, or PCP for short, was among the first computational problems that were shown to be undecidable. The undecidability of the PCP was established by Emil Post in [71]. The importance of the PCP lies in the simple combinatorial formulation of the problem; the PCP is very useful for establishing other undecidability results and studying the boundary between decidability and undecidability.
⁷ Here k = 1.380658 · 10⁻²³ J/K is Boltzmann's constant, and T is the absolute temperature.
3.2 Circuits
Example 3.2.1. The Boolean circuit in Figure 3.1 computes the function f : 𝔽₂² → 𝔽₂² defined by f(0, 0) = (0, 0), f(0, 1) = (0, 1), f(1, 0) = (0, 1), and f(1, 1) = (1, 0). Thus, f(x₁, x₂) = (y₁, y₂), where y₂ is x₁ + x₂ modulo 2, and y₁ is the carry bit.
[Fig. 3.1. A Boolean circuit computing f, with inputs x₁, x₂ and outputs y₁, y₂]
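The tabulated values identify f as a one-bit half adder: the carry bit is the conjunction of the inputs, and the sum modulo 2 is their exclusive or. A direct sketch:

```python
def f(x1, x2):
    """f of Example 3.2.1: y1 is the carry bit, y2 is x1 + x2 modulo 2."""
    y1 = x1 & x2     # carry = AND
    y2 = x1 ^ x2     # sum modulo 2 = XOR
    return (y1, y2)

# Exactly the table of values given in the example.
assert f(0, 0) == (0, 0)
assert f(0, 1) == (0, 1)
assert f(1, 0) == (0, 1)
assert f(1, 1) == (1, 0)
```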
⁸ 𝔽₂ stands for the binary field with two elements 0 and 1, the addition and the multiplication defined by 0 + 0 = 1 + 1 = 0, 0 + 1 = 1, 0 · 0 = 0 · 1 = 0, 1 · 1 = 1. Thus −1 = 1 and 1⁻¹ = 1 in 𝔽₂.
⁹ Notations x₁ ∧ x₂ and x₁ ∨ x₂ are also used instead of ∧(x₁, x₂) and ∨(x₁, x₂).
[Figure: a reversible circuit with inputs x₁, x₂, x₃ and outputs y₁, y₂, y₃]
where (y₁, ..., y_n) = C(x₁, ..., x_n). Moreover, in this construction we also allow bit permutations that swap any two bits:
Toffoli gates, controlled not-gates, and the constant 0. But using the constant 1, the Toffoli gates can also be used to replace the not- and controlled not-gates: T(1, 1, x₁) = (1, 1, ¬x₁) and T(1, x₁, x₂) = (1, x₁, x₁ ⊕ x₂), so the Toffoli gates with constants 0 and 1 are sufficient to simulate any Boolean circuit with gates ∧, ∨, and ¬. Since Boolean circuits can be used to build up an arbitrary function 𝔽₂^m → 𝔽₂^n, we have obtained the following theorem.
Remark 3.2.1. A more straightforward proof for the above theorem was given by Toffoli in [87]. It is interesting to note that there are universal two-qubit gates for quantum computation [33], but it can be shown that there are no universal reversible two-bit gates. In fact, there are only 4! = 24 reversible two-bit gates, and all of them are linear, i.e., they can all be expressed as T(x) = Ax + b, where b ∈ 𝔽₂² and A is an invertible matrix over the binary field. Thus, any function composed of them is also linear, but there are also nonlinear reversible gates, such as the Toffoli gate.
We identify again the bit strings x ∈ 𝔽₂^m and an orthogonal basis {|x⟩ | x ∈ 𝔽₂^m} of a 2^m-dimensional Hilbert space H_{2^m}. To represent linear mappings 𝔽₂^m → 𝔽₂^m, we adopt the coordinate representation |x⟩ = e_i = (0, ..., 1, ..., 0)^T, where e_i is a column vector having zeroes elsewhere but 1 in the ith position, if the components of x = (x₁, ..., x_m) form a binary representation of the number i − 1.
A reversible gate f on m bits is a permutation of 𝔽₂^m, so any reversible gate also defines a linear mapping in H_{2^m}. There is a 2^m × 2^m permutation matrix M(f)¹⁰ representing this mapping: M(f)_ij = 1 if f(e_j) = e_i, and M(f)_ij = 0 otherwise.
Example 3.2.3. In 𝔽₂³, we denote |000⟩ = (1, 0, ..., 0)^T, |001⟩ = (0, 1, ..., 0)^T, ..., |111⟩ = (0, 0, ..., 1)^T. The matrix representation of the Toffoli gate is

M(T) = ( 1 0 0 0 0 0 0 0 )
       ( 0 1 0 0 0 0 0 0 )
       ( 0 0 1 0 0 0 0 0 )
       ( 0 0 0 1 0 0 0 0 )
       ( 0 0 0 0 1 0 0 0 )
       ( 0 0 0 0 0 1 0 0 )
       ( 0 0 0 0 0 0 0 1 )
       ( 0 0 0 0 0 0 1 0 ),

i.e., the identity matrix except that the basis vectors |110⟩ and |111⟩ are interchanged.
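As a quick check of this representation, one can build M(T) directly from the rule M(f)_ij = 1 iff f maps the jth basis string to the ith:

```python
import numpy as np

def toffoli(x1, x2, x3):
    # T(x1, x2, x3) = (x1, x2, x3 XOR (x1 AND x2))
    return (x1, x2, x3 ^ (x1 & x2))

# Build M(T): column j holds the image of the jth basis vector.
M = np.zeros((8, 8))
for j in range(8):
    bits = ((j >> 2) & 1, (j >> 1) & 1, j & 1)
    y = toffoli(*bits)
    i = (y[0] << 2) | (y[1] << 1) | y[2]
    M[i, j] = 1.0

# M(T) is the identity except that |110> and |111> are swapped.
E = np.eye(8)
E[[6, 7]] = E[[7, 6]]
assert np.allclose(M, E)

# As a permutation matrix, M(T) is unitary; it is even self-inverse.
assert np.allclose(M @ M, np.eye(8))
```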
Since reversible circuits are also quantum circuits, we have already discovered the fact that whatever is computable by a Boolean circuit is also computable by a quantum circuit. It is also interesting to compare the computational power of polynomial-size quantum circuits (that is, quantum circuits containing polynomially many quantum gates) and polynomial-time QTMs. A. Yao [92] has shown that their computational powers coincide.
It would also be interesting to know which kinds of gates are needed for quantum computing. The very first answer to that was given by David Deutsch, who demonstrated that there exists a three-qubit universal gate for quantum computing [32].¹¹
It has turned out that the controlled not-gate (see Example 2.2.2) plays
a most important role in quantum computing.
Theorem 3.2.3 ([5]). All quantum circuits can be constructed by using only
controlled not-gates and unary gates.
Remark 3.2.2. Even though 2-qubit gates are enough for quantum comput-
ing, 2-bit gates are not enough for classical reversible computing. This is quite
easy to see; recall Remark 3.2.1.
Theorem 3.2.4 ([80]). All quantum circuits can be constructed (in the approximate sense) by using only Toffoli gates and Hadamard–Walsh gates.
In the forthcoming chapters, we will mainly use the quantum circuit formalism for representing quantum algorithms. To conclude this section, we mention a very important theorem of Solovay and Kitaev [50].
Theorem 3.2.5. Assume that S is a finite set of unary quantum gates that can approximate any unary quantum gate up to an arbitrary precision. There exists a constant C depending on S only and c ≤ 4 such that any unary quantum gate can be approximated up to precision ε by using at most C log^c(1/ε) gates from set S.
The above theorem, together with Theorem 3.2.3, implies that an n-gate quantum circuit can be simulated by using O(n log^c(n/ε)) gates from a universal set.
4. Fast Factorization
Let G = {g₁, g₂, ..., g_n} be an abelian group (we will use the additive notation) and {χ₁, χ₂, ..., χ_n} the characters of G (see Section 9.2). The functions f : G → ℂ form a complex vector space V; addition and scalar multiplication are defined pointwise. If f₁, f₂ ∈ V, then the standard inner product (see Section 9.3) of f₁ and f₂ is defined by

⟨f₁ | f₂⟩ = Σ_{k=1}^n f₁*(g_k) f₂(g_k).

Each f ∈ V can be written in the form

f = c₁B₁ + ... + c_nB_n,

where c_i are complex numbers called the Fourier coefficients of f. The discrete Fourier transform of f ∈ V is another function f̂ ∈ V defined by f̂(g_i) = c_i. Since the functions B_i form an orthonormal basis, we see easily that c_i = ⟨B_i | f⟩, so
(4.1)
(4.3)
(4.6)
and hence it is clear that in the basis |g_i⟩, the matrix of the QFT is
(4.7)
(4.9)
Let us study a special case G = 𝔽₂^m. All the characters of the additive group of 𝔽₂^m are
The elements x = (x₁, x₂, ..., x_m) ∈ 𝔽₂^m have a very natural quantum representation by m qubits:
Thus, it follows from Lemma 4.1.1 that it suffices to find the QFT on 𝔽₂, and the m-fold tensor product of that mapping will perform the QFT on 𝔽₂^m. However, the QFT on 𝔽₂ with representation {|0⟩, |1⟩} is defined by
|0⟩ ↦ (1/√2)(|0⟩ + |1⟩),
|1⟩ ↦ (1/√2)(|0⟩ − |1⟩),

and the matrix of this mapping is

H = (1/√2) ( 1   1 )
           ( 1  −1 ).    (4.10)
For any m, the matrix H_m is also called a Hadamard matrix. The QFT (4.10) is also called a Hadamard transform, Walsh transform, or Hadamard–Walsh transform.
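A small numerical check: the m-fold tensor product H_m = H ⊗ ··· ⊗ H has entries (−1)^{x·y}/√2^m (here with m = 3), and it squares to the identity:

```python
import numpy as np
from functools import reduce

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def hadamard(m):
    """The m-fold tensor product H (x) ... (x) H, i.e. the QFT on F_2^m."""
    return reduce(np.kron, [H] * m)

H3 = hadamard(3)
for x in range(8):
    for y in range(8):
        dot = bin(x & y).count("1") % 2   # the inner product x.y in F_2^m
        assert np.isclose(H3[y, x], (-1) ** dot / np.sqrt(8))

# H_m is unitary and its own inverse (compare Exercise 1: W2 W2 = I).
assert np.allclose(H3 @ H3, np.eye(8))
```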
where x and y are some representatives of the cosets (see Sections 9.1.3, 9.1.4, and 9.2). To simplify the notations, we will denote any coset k + nℤ by the number k. Using these notations,
|0⟩, |1⟩, |2⟩, ..., |n − 1⟩

{|0⟩, |1⟩, ..., |n₁ − 1⟩} and {|0⟩, |1⟩, ..., |n₂ − 1⟩}

of ℤ_{n₁} and ℤ_{n₂}, and we also have the routines for the mappings

(4.12)

= (1/√2^m) (|0⟩ + e^{2πix/2} |1⟩)(|0⟩ + e^{2πix/2²} |1⟩) ··· (|0⟩ + e^{2πix/2^m} |1⟩) (4.13)
4.1 Quantum Fourier Transform 55
Proof. Representing |y⟩ = |y′b⟩ = |y′⟩ |b⟩, where y′ consists of the m − 1 most significant bits and b is the least significant bit of y, we can divide the sum into two parts:
(4.14)
(4.15)
exp(πi/2¹), ..., exp(πi/2^{m−2}), exp(πi/2^{m−1})

conditionally. That is, for each l ∈ {1, 2, ..., m − 1}, a phase factor exp(πi/2^{m−l}) is introduced to the mth bit if and only if the mth and lth qubits are both 1. This procedure will yield the state
The same procedure will be applied from right to left to the qubits at locations m − 1, ..., 2, and 1: for a qubit at location l, we first perform a Hadamard transform to get the qubit in state
and then a phase factor

exp(πi/2^{l−k})

is introduced conditionally if and only if the lth and kth qubits are both 1. That can be achieved by the mapping

|0⟩ |0⟩ ↦ |0⟩ |0⟩
|0⟩ |1⟩ ↦ |0⟩ |1⟩
|1⟩ |0⟩ ↦ |1⟩ |0⟩
|1⟩ |1⟩ ↦ e^{πi/2^{l−k}} |1⟩ |1⟩,

which acts on the lth and kth qubits. The matrix of the mapping can be written as
Φ_{k,l} = ( 1  0  0  0 )
          ( 0  1  0  0 )
          ( 0  0  1  0 )
          ( 0  0  0  e^{πi/2^{l−k}} ).    (4.16)
Ignoring the swapping (x_{m−1}, x_{m−2}, ..., x₀) ↦ (x₀, x₁, ..., x_{m−1}), this procedure results in the network in Figure 4.1 with m(m + 1)/2 gates.
In Figure 4.1, the subindices of the gates Φ are omitted for typographical reasons. The Φ-gates are (from left to right) Φ_{m−1,m−2}, Φ_{m−1,m−3}, Φ_{m−1,0}, Φ_{m−2,m−3}, Φ_{m−2,0}, and Φ_{m−3,0}.
[Fig. 4.1. The QFT network on input qubits x_{m−1}, x_{m−2}, x_{m−3}, ..., x₀, with outputs y₁, y₂, ..., y_{m−1}]
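The network can be checked numerically: composing the Hadamard and conditional phase gates in the order described yields, up to the ignored bit-reversal swap, the QFT matrix with entries e^{2πixy/2^m}/√2^m. A sketch with m = 3 (qubit 0 is the least significant):

```python
import numpy as np

def qft_network(m):
    """Compose Hadamard and conditional-phase gates as in Fig. 4.1."""
    N = 2 ** m
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

    def hadamard_on(q):
        ops = [np.eye(2)] * m
        ops[m - 1 - q] = H              # kron order: most significant first
        G = ops[0]
        for op in ops[1:]:
            G = np.kron(G, op)
        return G

    def phase(k, l):                    # the gate Phi_{k,l} of (4.16)
        G = np.eye(N, dtype=complex)
        for idx in range(N):
            if (idx >> k) & 1 and (idx >> l) & 1:
                G[idx, idx] = np.exp(1j * np.pi / 2 ** (l - k))
        return G

    U = np.eye(N, dtype=complex)
    for l in range(m - 1, -1, -1):      # qubits m-1, ..., 1, 0
        U = hadamard_on(l) @ U
        for k in range(l - 1, -1, -1):
            U = phase(k, l) @ U         # m(m+1)/2 gates in total
    return U

m, N = 3, 8
F = np.array([[np.exp(2j * np.pi * x * y / N) for x in range(N)]
              for y in range(N)]) / np.sqrt(N)

# Undo the ignored swap: the output bit strings come out reversed.
rev = [int(format(y, "03b")[::-1], 2) for y in range(N)]
P = np.zeros((N, N))
for y in range(N):
    P[y, rev[y]] = 1.0
assert np.allclose(P @ qft_network(m), F)
```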
The naive method for computing the Fourier transform is to use formula (4.17) straightforwardly to find all the elements in (4.19), and the time complexity of this method is O((2^m)²), as one can easily see.
A significant improvement is obtained by using the fast Fourier transform (FFT), whose core can be expressed in a decomposition
which very closely resembles the decomposition of Lemma 4.1.3, and essentially states that the vector (4.19) can be computed by combining two Fourier transforms in ℤ_{2^{m−1}}. The time complexity obtained by recursively applying the above decomposition is O(m2^m), significantly better than that of the naive method.
The problem of computing the QFT is quite different: in a very typical situation we have, instead of (4.18), a quantum superposition
and the QFT operates on the coefficients of (4.20). At the same time, the physical representation size of (4.20) is small; in system (4.20) there are only m qubits, yet there are 2^m coefficients c_i. Earlier we learned that the QFT in ℤ_{2^m} can be done in time O(m²) (the Hadamard–Walsh transform can be done even in time O(m)), which is exponentially separate from the classical counterparts of the Fourier transform. But the major difference is that the physical representation sizes of (4.18) and (4.20) are also exponentially separate, the first one taking Ω(2^m) bits and the latter one m qubits. Later, we will learn that the key idea behind many interesting fast quantum algorithms is the use of quantum parallelism to convey some information of interest into the coefficients of (4.20) and then compute the QFT rapidly.
Given two prime numbers p and q, it is an easy task to compute the product n = pq. The naive algorithm already has quadratic time complexity O(max{|p|, |q|}²), but, by using more sophisticated methods, even performance O(max{|p|, |q|}^{1+ε}) is reachable for any ε > 0 (see [28]). On the other hand, the inverse problem, factorization, seems to be extremely hard to solve.
a^r ≡ 1 (mod n),
which means that n divides a^r − 1. Of course this does not yet offer us a method for extracting a nontrivial factor of n, but if r is even, we can easily factorize a^r − 1:
a^r − 1 = (a^{r/2} − 1)(a^{r/2} + 1).
Now n cannot divide a^{r/2} − 1, for that would mean that
a^{r/2} ≡ 1 (mod n),
which would be absurd by the very definition of order. But it may still happen that n just divides a^{r/2} + 1 and does not share any factor with a^{r/2} − 1. In this case, the factor given by Euclid's algorithm would also be n. On the other hand, n | a^{r/2} + 1 means that
a^{r/2} ≡ −1 (mod n),
i.e., a^{r/2} is a nontrivial square root of 1 modulo n. In the next section we will use elementary number theory to show that, for a randomly chosen (with uniform distribution) element a ∈ Z_n^*, the probability that r = ord(a) is even and that a^{r/2} ≢ −1 (mod n) is at least 1/2. Consequently, assuming that r = ord(a) could be rapidly extracted, we could, with a reasonable probability, find a nontrivial factor of n.
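The classical part of this reduction is easy to sketch in code; the function name and the control flow are mine, not the book's:

```python
from math import gcd

def factor_from_order(n, a, r):
    """Try to extract a nontrivial factor of n from r = ord_n(a).
    Succeeds when r is even and a^(r/2) is not congruent to -1 (mod n)."""
    if r % 2 == 1:
        return None                      # odd order: no luck
    x = pow(a, r // 2, n)                # a^(r/2) mod n
    if x == n - 1:
        return None                      # a^(r/2) = -1 (mod n): no luck
    d = gcd(x - 1, n)                    # Euclid's algorithm on a^(r/2) - 1
    return d if 1 < d < n else gcd(x + 1, n)

# n = 15, a = 7 has order 4, and 7^2 = 49 = 4 (mod 15)
print(factor_from_order(15, 7, 4))       # prints 3
```

Here gcd(4 − 1, 15) = 3 immediately yields the factor 3 of 15.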
as claimed. □
We can use the previous lemma to estimate the probability of having an odd order.

Lemma 4.2.2. The probability that r = ord(a) is odd for a uniformly chosen a ∈ Z_n^* is at most 1/2^k.

Proof. Let (a_1, ..., a_k) be the decomposition (4.24) of an element a ∈ Z_n^*. Let r_i = ord_{p_i^{α_i}}(a_i). Since r = lcm{r_1, ..., r_k} (Exercise 4), r is odd if and only if each r_i is odd.
Putting s = 0 in Lemma 4.2.1, we learn that the probability of having a random a_i ∈ Z_{p_i^{α_i}}^* with odd order is at most 1/2. Therefore, the probability P_1 of having odd r is at most
P_1 ≤ (1/2)^k = 1/2^k,
as claimed. □
What about the probability that a^{r/2} ≡ −1 (mod n)? It turns out that Lemma 4.2.1 is useful for estimating that probability, too.

Lemma 4.2.3. Let n = p_1^{α_1} ··· p_k^{α_k} be the prime decomposition of an odd n and k ≥ 2. If r = ord_n(a) is even, then the probability that a^{r/2} ≡ −1 (mod n) is at most 1/2^{k−1}.
for each i. Let r_i = ord_{p_i^{α_i}}(a_i), so r = lcm{r_1, ..., r_k}. We write r = 2^s t and r_i = 2^{s_i} t_i, where 2 ∤ t and 2 ∤ t_i. From the fact that r_i | r for each i, it follows that s_i ≤ s, but the congruences (4.25) can only hold if s_i = s for every i. For if s_i < s for some i, then also r_i | r/2 (the assumption k ≥ 2 is needed here!), which implies that
a_i^{r/2} ≡ 1 (mod p_i^{α_i}).   (4.26)
But (4.26) together with (4.25) gives 1 ≡ −1 (mod p_i^{α_i}), which is absurd since p_i ≠ 2.
Therefore, the probability P_2 that a^{r/2} ≡ −1 (mod n) is at most the probability that s_i = s for each i, which is
P_2 = Σ_{l=0}^∞ P(s_1 = l) ∏_{i=2}^k P(s_i = l) ≤ Σ_{l=0}^∞ P(s_1 = l) ∏_{i=2}^k (1/2) = 1/2^{k−1}
by Lemma 4.2.1. □
For our purposes, the result of the previous lemma would be sufficient. On the other hand, it would be theoretically interesting to see if the result could be improved. In fact, this is the case:

Lemma 4.2.5. Let all the notations be as before. Then the probability that r = ord_n(a) is odd or a^{r/2} ≡ −1 (mod n) is at most 1/2^{k−1}.

Proof. Recalling the previous notations, the result follows directly from the observation that r is odd if and only if each s_i = 0, and that if a^{r/2} ≡ −1 (mod n) happens, then all numbers s_i are equal. The former event is a subcase of the latter, so we may conclude that
P(r is odd or a^{r/2} ≡ −1 (mod n)) ≤ P(all numbers s_i are equal) ≤ 1/2^{k−1}.
The latter inequality follows directly from that of Lemma 4.2.3. □
4.2 Shor's Algorithm for Factoring Numbers 63
Corollary 4.2.1. Let all the notations be as in Lemma 4.2.4. The probability that r is even and a^{r/2} ≢ −1 (mod n) is at least 9/16.

Remark 4.2.2. By studying the group Z_{21}^* we can see that the probability limit of the previous lemma is optimal.
By Lemma 4.2.4 we know that, for a randomly (and uniformly) chosen a ∈ Z_n^*, the probability that r = ord_n(a) is even and a^{r/2} ≢ −1 (mod n) is at least 9/16 for any odd n having at least two distinct prime factors. It was already discussed in Section 4.2.1 that the knowledge of the order of such an element a allows us to efficiently find a nontrivial factor of n. Thus, a procedure for computing the order would provide an efficient probabilistic method for finding the factors. Therefore, we have a good reason to believe that finding r = ord_n(a) is computationally a very difficult task.
It is clear that finding r = ord_n(a) reduces to finding the period of the function f : Z → Z_n defined by
f(k) = a^k (mod n).   (4.27)
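For small n, the period (4.27) can of course be found by brute force; the point of Shor's algorithm is that a quantum computer finds it in polynomial time, while this classical sketch (the helper name is mine) takes Θ(r) multiplications:

```python
def ord_n(a, n):
    """Brute-force order finding: the period of k -> a^k (mod n),
    assuming gcd(a, n) = 1 so that the order exists."""
    x, r = a % n, 1
    while x != 1:
        x = x * a % n
        r += 1
    return r

print(ord_n(7, 15))   # prints 4: 7^4 = 2401 = 1 (mod 15)
```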
(1/√m) Σ_{k=0}^{m−1} |k⟩ |a^k⟩ = (1/√m) Σ_{l=0}^{r−1} Σ_{q=0}^{s_l} |qr + l⟩ |a^l⟩,   (4.29)
where s_l is the greatest integer for which s_l r + l < m. It is clear that s_l cannot vary very much: we always have m/r − 1 − l/r ≤ s_l < m/r − l/r.
3. Compute the inverse QFT in Z_m to get
(1/√m) Σ_{l=0}^{r−1} Σ_{q=0}^{s_l} (1/√m) Σ_{p=0}^{m−1} e^{2πip(qr+l)/m} |p⟩ |a^l⟩
= (1/m) Σ_{l=0}^{r−1} Σ_{p=0}^{m−1} e^{2πipl/m} (Σ_{q=0}^{s_l} e^{2πipqr/m}) |p⟩ |a^l⟩.   (4.31)
(1/√m) Σ_{k=0}^{m−1} |k⟩ |0⟩.
(1/4) (|0⟩|1⟩ + |1⟩|7⟩ + |2⟩|4⟩ + |3⟩|13⟩ + |4⟩|1⟩ + ... + |15⟩|13⟩)
= (1/4) ((|0⟩ + |4⟩ + |8⟩ + |12⟩) |1⟩
+ (|1⟩ + |5⟩ + |9⟩ + |13⟩) |7⟩
+ (|2⟩ + |6⟩ + |10⟩ + |14⟩) |4⟩
+ (|3⟩ + |7⟩ + |11⟩ + |15⟩) |13⟩).
4.3 The Correctness Probability 65
P(p) = Σ_{l=0}^{r−1} |(1/m) e^{2πipl/m} Σ_{q=0}^{s_l} e^{2πipqr/m}|^2 = (1/m^2) Σ_{l=0}^{r−1} |Σ_{q=0}^{s_l} e^{2πipqr/m}|^2.   (4.32)
The sum in (4.33) runs over all of the characters of Z_{m/r} evaluated at p. Thus (see Section 9.2.1),
Σ_{q=0}^{m/r−1} e^{2πipqr/m} = { m/r, if p ≡ 0 in Z_{m/r},
                              { 0, otherwise,
and therefore
Thus, in the case r | m, observation of (4.31) can give only some p in the set {0 · m/r, 1 · m/r, ..., (r − 1) · m/r}, any such with a probability of 1/r. Now that we know m and have learned p = d · m/r by observing (4.31), we may try to find r by canceling p/m = d/r into an irreducible fraction using Euclid's algorithm. Unfortunately, this works for certain only if gcd(d, r) = 1; in the case gcd(d, r) > 1 we will just get a factor of r as the denominator of p/m = d/r. Fortunately, we can show, by using number-theoretical results, that the probability of having gcd(d, r) = 1 does not converge to zero too fast.

Example 4.3.1. In Example 4.2.2, the period r = 4 divided m = 16, and the only elements that could be observed were 0, 4, 8, 12, the multiples of 4 = 16/4. However, 0 and 8 did not give the period, since the multipliers 0 and 2 share a factor with 4.

If |p/m − d/r| is small enough and gcd(d, r) = 1, then d/r is a convergent of p/m, and all the convergents of p/m can be found efficiently by using Euclid's algorithm (see Section 9.4.2).
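In Python, this convergent computation can be sketched with the standard library: `Fraction.limit_denominator` walks exactly the continued-fraction convergents mentioned above (the function name `recover_order` is mine):

```python
from fractions import Fraction

def recover_order(p, m, n):
    """Recover a candidate for r from an observed p with p/m close to d/r:
    take the best approximation of p/m with denominator below n."""
    # limit_denominator searches along the continued-fraction
    # convergents of p/m, i.e., it performs Euclid's algorithm
    return Fraction(p, m).limit_denominator(n - 1).denominator

# Example 4.2.2: m = 16, r = 4; observing p = 12 gives d/r = 3/4
print(recover_order(12, 16, 15))   # prints 4
```

Observing p = 8 instead would give 8/16 = 1/2, i.e., only a factor of r, matching the caveat about gcd(d, r) > 1.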
We will now fix m in such a way that the continued fraction method will apply. For any integer d ∈ {0, 1, ..., r − 1}, there is always a unique integer p such that the inequality
−1/2 < p − d·m/r ≤ 1/2   (4.34)
holds, and then
|p/m − d/r| ≤ 1/(2m) ≤ 1/(2n^2) < 1/(2r^2)
for some d.
for any fixed d by the periodicity of sin^2. Since we now assume that x = pr − dm takes values in [−r/2, r/2], we will estimate
f(x) = sin^2(π(s_l + 1)x/m) / sin^2(πx/m).
It is not difficult to show that f(x) is an even function, taking its maximum (s_l + 1)^2 at x = 0 and its minima in [−r/2, r/2] at the end points ±r/2. Therefore,
f(x) ≥ sin^2(πr(s_l + 1)/(2m)) / sin^2(πr/(2m))
and further
f(x) ≥ (4/π^2)(m/r)^2 (1 − (πr/(2m))^2).   (4.37)
The factor (4.39) in (4.38) tends to 1 as r/m → 0, and for n ≥ 100, (4.39) is already greater than 0.9999. Thus, the probability of observing a fixed p such that −1/2 < p − d·m/r ≤ 1/2 is at least 0.4/r for any n ≥ 100.
But there are exactly r values p ∈ Z_m such that −1/2 < p − d·m/r ≤ 1/2; namely, the nearest integers to d·m/r for each d ∈ {0, 1, ..., r − 1}. Therefore, the probability of seeing some of them is at least 0.4 if n ≥ 100. □
r/φ(r) < e^γ log log r + 2.50637/(log log r).

Lemma 4.3.2. For r ≥ 19, the probability that, for a uniformly chosen d ∈ {0, 1, ..., r − 1}, gcd(d, r) = 1 holds is at least 1/(4 log log n).

Therefore,
φ(r)/r > 1/(4 log log r) ≥ 1/(4 log log n),
• The probability that observing (4.31) will give a p such that |p − d·m/r| ≤ 1/2 is at least 0.4 (Lemma 4.3.1).
• The probability that gcd(d, r) = 1 is at least 1/(4 log log n) (Lemma 4.3.2).

Lemma 4.3.3. The probability that the quantum algorithm finds the order of an element of Z_n^* is at least
(1/20) · 1/(log log n).
Remark 4.3.1. It was already mentioned in Section 4.1.4 that many interesting quantum algorithms are based on conveying some information of interest into the coefficients of a quantum superposition and then applying a fast QFT. The period finding in Shor's factoring algorithm is also based on this method, as we can see: to use quantum parallelism, it prepared a superposition
(1/√m) Σ_{k=0}^{m−1} |k⟩ |0⟩.
The information of interest, namely the period, was moved into the coefficients by computing k ↦ a^k (mod n) to get
(1/√m) Σ_{k=0}^{m−1} |k⟩ |a^k⟩ = (1/√m) Σ_{l=0}^{r−1} Σ_{q=0}^{s_l} |qr + l⟩ |a^l⟩.   (4.41)
Step 2 can be done in time O(ℓ(n)^3).² For step 3, we first check whether a has order less than 19, which can be done in time O(ℓ(n)^3). Then we use the Hadamard-Walsh transform in Z_m, which can be done in time O(ℓ(m)) = O(ℓ(n)). After that, computing a^k (mod n) can be done in time O(ℓ(m)ℓ(n)^2) = O(ℓ(n)^3). The QFT in Z_m will be done in O(ℓ(m)^2) steps, and finally the computation of the convergents can be done in time O(ℓ(n)^3). Step 4 can also be done in time O(ℓ(n)^3).
The overall complexity of the above algorithm is therefore O(ℓ(n)^3), but the success probability is only guaranteed to be at least Ω(1/(log log n)) = Ω(1/log ℓ(n)). Thus, by running the above algorithm O(log ℓ(n)) times, we obtain a method that extracts a nontrivial factor of n in time O(ℓ(n)^3 log ℓ(n)) with high probability.
4.4 Exercises
f(g) = { 1, if g = g_i,
       { 0, otherwise.
If f is viewed as a superposition (see the connection between formulae (4.3) and (4.4)), which is the state of H corresponding to f?
² Recall that the notation ℓ(n) stands for the length of the number n; that is, the number of digits needed to represent n. This notation, of course, depends on the particular number system chosen, but in different systems (excluding the unary system) the length differs only by a multiplicative constant, and this difference can be embedded into the O-notation.
be the function given by F((k_1, k_2)) = a_1 n_2 k_1 + a_2 n_1 k_2, where a_1 (resp. a_2), given by the Chinese Remainder Theorem, is the multiplicative inverse of n_2 (resp. n_1) modulo n_1 (resp. n_2). Show that F is an isomorphism.
3. a) Let n and k be fixed natural numbers. Devise a polynomial-time algorithm that decides whether n = x^k for some integer x.
b) Based on the above algorithm, devise a polynomial-time algorithm which tells whether a given natural number n is a nontrivial power of another number.
4. Let n = p_1^{α_1} ··· p_k^{α_k} be the prime factorization of n and a ∈ Z_n^*. Let (a_1, ..., a_k) be the decomposition of a given by the Chinese Remainder Theorem and r_i = ord_{p_i^{α_i}}(a_i). Show that ord_n(a) = lcm{r_1, ..., r_k}.
5. a) Prove that |e^{ix} − 1|^2 = 4 sin^2(x/2).
b) Prove that
5.1.1 Preliminaries
The results presented here will also apply if F_2 is replaced with any finite field F, so we will, for a short moment, describe them in a more general form. More mathematical background can be found in Section 9.3.
Here we temporarily use another notation for the inner product: if x = (x_1, ..., x_m) and y = (y_1, ..., y_m) are elements of F^m, we denote their inner product by
S = Σ_{h∈H} (−1)^{h·y} = Σ_{h∈H} (−1)^{(h_1+h)·y} = (−1)^{h_1·y} Σ_{h∈H} (−1)^{h·y} = (−1)^{h_1·y} S = −S,
hence S = 0. □
For a subgroup H of F^m, let T be some set that consists of exactly one element from each coset (see Section 9.1.2) of H. Such a set is called a transversal of H in F^m, and it is clear that |T| = [F^m : H] = |F^m|/|H| = 2^m/|H|. It is also easy to verify that, since F^m = H ⊕ H^⊥, the equation |F^m| = |H|·|H^⊥| must hold.
2. Compute p to get
(1/√(2^m)) Σ_{x∈F_2^m} |x⟩ |p(x)⟩ = (1/√(2^m)) Σ_{t∈T} Σ_{x∈H} |t + x⟩ |p(t)⟩.   (5.3)
Lemma 5.1.2. Let t ≤ d and y_1, ..., y_t be randomly chosen vectors, with uniform distribution, in a d-dimensional vector space over F_2. Then the probability that y_1, ..., y_t are linearly independent is at least 1/4.
Proof. The cardinality of a d-dimensional vector space over F_2 is 2^d. Therefore, the probability that {y_1} is a linearly independent set is (2^d − 1)/2^d, since only choosing y_1 = 0 makes {y_1} dependent. Suppose now that y_1, ..., y_{i−1} have been chosen in such a way that S = {y_1, ..., y_{i−1}} is a linearly independent set. Now S generates a subspace of dimension i − 1. Hence, there are 2^{i−1} choices for y_i that make S ∪ {y_i} linearly dependent. Thus, the probability that the set {y_1, ..., y_t} is an independent set is
p = (2^d − 2^0)/2^d · (2^d − 2^1)/2^d ··· (2^d − 2^{t−1})/2^d ≥ (1 − 2^{−1})(1 − 2^{−2}) ··· (1 − 2^{−t}).   (5.4)
Then
1/(2p) ≤ ∏_{i=2}^{t} (1 + 1/(2^i − 1)).
5.1 Generalized Simon's Algorithm 77
Now
ln(1/(2p)) ≤ Σ_{i=2}^{t} ln(1 + 1/(2^i − 1)) ≤ Σ_{i=2}^{t} 1/(2^i − 1) ≤ Σ_{i=2}^{t} (4/3)·(1/2^i) ≤ (4/3) Σ_{i=2}^{∞} 1/2^i = (4/3)·(1/2) = 2/3,
so
1/(2p) ≤ e^{2/3} < 2, hence p > 1/4. □
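The bound of Lemma 5.1.2 is easy to check by a quick Monte Carlo experiment (the elimination routine and all parameters are mine; vectors over F_2 are represented as integer bit masks):

```python
import random

def independent_over_f2(vectors):
    """Check linear independence over F_2; vectors are integer bit masks.
    Standard XOR-basis insertion: reduce each vector by the basis built
    so far, and reject it if it reduces to zero."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)      # clears b's top bit from v when set
        if v == 0:
            return False           # v lies in the span of earlier vectors
        basis.append(v)
    return True

random.seed(1)
d, trials = 8, 20000
hits = sum(independent_over_f2([random.randrange(2 ** d) for _ in range(d)])
           for _ in range(trials))
print(hits / trials)               # near 0.29, safely above the 1/4 bound
```

The exact probability for t = d = 8 is the product (1 − 2^{−1})···(1 − 2^{−8}) ≈ 0.29, consistent with the lemma.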
Remark 5.1.1. Notice that if we build another algorithm (call it A') which repeats Algorithm A whenever the output is "failure", then A' cannot give us a wrong answer, but we cannot find an a priori upper bound for the number of repeats it has to make. Instead, we can say that, on average, the number of repeats is at most four. Such an algorithm is called a Las Vegas algorithm, cf. Section 3.1.2. An interesting method for finding the basis in polynomial time with certainty is described in [18].
5.2 Examples
We will now demonstrate how the hidden subgroup problem can be applied
to computational problems. These examples are due to [59].
This problem lies at the very heart of Shor's factorization algorithm: Let n be a large integer and a another integer such that gcd(a, n) = 1. Denote r = ord_n(a), i.e., r is the least positive integer that satisfies a^r ≡ 1 (mod n). In this problem, we have the group Z, and the hidden subgroup is rZ, whose generator r should be found.
The function p(x) = a^x + nZ satisfies Simon's promise, since
p(x) = p(y)
⟺ a^x + nZ = a^y + nZ
⟺ n | (a^x − a^y)
⟺ r | (x − y)
⟺ x + rZ = y + rZ.
Because Z is an infinite group, we cannot directly solve this problem by using the algorithm of the previous section, but using the finite groups Z_{2^l} instead of Z already gives us approximations that are good enough, as shown in the previous chapter.
That the function
p(x, y) = g^x a^{−y}
satisfies Simon's promise can be verified easily:
80 5. Finding the Hidden Subgroup
5.3 Exercises
Let us consider a children's game called hiding the key: one player hides the
key in the house, and the others try to find it. The one who has hidden the key
is permanently advising the others by using phrases like "freezing", "cold",
"warm", and "hot", depending on how elose the seekers are to the hiding
place. Without this advice, the game would obviously last much longer. Or,
can you develop a strategy for finding the key without searching through the
entire house?
There are many problems in computer science that closely resemble searching
for a tiny key in a large house. We will shortly discuss the problem of finding
a solution for an NP-complete 3-satisfiability problem: we are given a propositional expression in conjunctive normal form; each clause is a disjunction of three literals (a Boolean variable or a negation of a Boolean variable). In the original form of the problem, the task is to find a satisfying truth assignment, if any such exists. Let us then imagine an advisor who always has enough power to tell at once whether or not a given Boolean expression has a satisfying truth assignment. If such an advisor were provided, finding a satisfying assignment would no longer be a difficult problem: we could just substitute 1 for some variable and ask the advisor whether or not the resulting Boolean expression with fewer variables had the demanded assignment. If our choice was incorrect, we would flip the substituted value and proceed recursively.
Unfortunately, the advisor's problem of telling whether a satisfying valuation exists is an NP-complete problem, and in light of our present knowledge, it is very unlikely that there would be a fast solution to this problem in the real world. But let us continue our thought experiment by assuming that somebody, let us call him a verifier, knows a satisfying assignment for a Boolean expression but is not willing to tell it to us. Quite surprisingly, there exist so-called zero-knowledge protocols (see [79] for more details), which the verifier can use to guarantee that he really knows a satisfying valuation without revealing even a single bit of his knowledge. Thus, it is possible that
84 6. Grover's Search Algorithm
we are quite sure that a satisfying truth assignment exists, yet we do not have
a clue what it might be! The obvious strategy to find a satisfying assignment
is to search through all of the possible assignments. But if there are n Boolean
variables in the expression, there are 2n assignments, and because we do not
have a supernatural advisor, our task seems quite hopeless for large n, at
least in the general case. In fact, no faster method than an exhaustive search
is known in the general case.
Remark 6.1.1. The most effective classical procedure for solving generic NP-complete problems seems to be the one described by Uwe Schöning [83].
and our search problem is to find an x ∈ F_2^n such that f(x) = 1 (if any such x exists).
Notice that, with the assumption that f is computable in polynomial time, this model is enough to represent NP problems because it includes the NP-complete 3-satisfiability problem. This model also applies [41] to an unordered database search, where we have to find a specific item in a huge database. Here the database consists of 2^n items, and f is the function giving a value of 1 to the required item and 0 to the rest, thereby telling us whether the item under investigation was the required one.
Another simplification, a huge one this time, is to assume that f is a so-called blackbox function, i.e., we do not know how to compute f, but we can query f, and the value is returned to us instantly in one computational step. With that assumption, we will demonstrate in the next chapter how to derive a lower bound for the number of queries to f needed to find an item x such that f(x) = 1. In the next chapter, we will, in fact, present a general strategy for finding lower bounds for the number of queries concerning other goals, too.
Now we will study probabilistic search, i.e., we will omit the requirement that the search strategy always gives a value x such that f(x) = 1 and require only that it does this with a nonvanishing probability. This means that the search algorithm will give the required x with at least some constant probability 0 < p ≤ 1 for any n. This can be seen as a natural generalization of the search which with certainty returns the required item.
Remark 6.1.2. Provided that the probabilistic search strategy is rapid, we use very standard argumentation to say that we can find x rapidly in practice: the probability that, after m attempts, we have not found x is at most (1 − p)^m ≤ e^{−pm}, which is smaller than any given ε > 0 when m > (1/p) ln(1/ε). Thus, we
6.1 Search Problems 85
can reduce the error probability 1 − p to any other positive constant ε just by repeating the original search a constant number of times.
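This standard repetition argument is easy to check numerically, with arbitrary example values p = 0.3 and ε = 10^{-6}:

```python
from math import exp, log

p, eps = 0.3, 1e-6                     # per-attempt success, target error
m = int(log(1 / eps) / p) + 1          # m > ln(1/eps)/p repetitions suffice
# (1-p)^m <= e^(-pm) because 1 - p <= e^(-p)
assert (1 - p) ** m <= exp(-p * m) < eps
print(m)                               # prints 47 for these values
```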
f_y(x) = { 1, if x = y,   (6.1)
         { 0, otherwise.
If we draw distinct elements x_1, ..., x_k ∈ F_2^n with uniform distribution, the probability of finding y is k/N, so we would need at least pN queries to find y with a probability of at least p. Using nonuniform distributions will not offer any relief:
Proof. Let f_y be as in (6.1) and P_y(k) be the probability that A_{f_y} returns y using k queries. By assumption, P_y(k) ≥ p, and we will demonstrate that there is some y ∈ F_2^n such that P_y(k) ≤ (k + 1)/N.
First, by using induction, we show that
Σ_{y∈F_2^n} P_y(k) ≤ k + 1.
If k = 0, then A_{f_y} gives any x ∈ F_2^n with some probability p_x, and thus
Σ_{y∈F_2^n} P_y(0) = Σ_{y∈F_2^n} p_y = 1.
Assume then, as the induction hypothesis, that
Σ_{y∈F_2^n} P_y(k − 1) ≤ k.
On the kth query, A_{f_y} queries f(y) with some probability q_y, and therefore P_y(k) ≤ P_y(k − 1) + q_y. Thus,
Σ_{y∈F_2^n} P_y(k) ≤ Σ_{y∈F_2^n} P_y(k − 1) + Σ_{y∈F_2^n} q_y ≤ k + 1.
Because there are N = 2^n different choices for y, there must exist one with
P_y(k) ≤ (k + 1)/N.   □
where ⊕ means addition modulo 2 or, in other words, the exclusive-or operation. We can now easily see that (6.2) defines a unitary mapping: it merely permutes the basis vectors. One could prepare the state
(1/√(2^n)) Σ_{x∈F_2^n} |x⟩ |0⟩   (6.3)
beginning with |0⟩ |0⟩ and using the Hadamard-Walsh transform H_n (cf. Section 4.1.2). After that, one could make a single query Q_{f_y} to obtain the state
(1/√(2^n)) Σ_{x∈F_2^n} |x⟩ |0 ⊕ f_y(x)⟩ = (1/√(2^n)) Σ_{x∈F_2^n} |x⟩ |f_y(x)⟩.   (6.4)
But if we now observe the last qubit of state (6.4), we would see 1 (and after that, observing the first register, get the required y) only with a probability of 1/2^n, so we would not gain any advantage over just guessing y.
But quantum search can improve on the probabilistic search, as we will now demonstrate. Having state (6.3), we could flip the target bit (to |1⟩), and then apply the Hadamard transform to that bit to get a state
(1/√(2^n)) Σ_{x∈F_2^n} |x⟩ (1/√2)(|0⟩ − |1⟩) = (1/√(2^{n+1})) (Σ_{x∈F_2^n} |x⟩ |0⟩ − Σ_{x∈F_2^n} |x⟩ |1⟩).   (6.5)
If x ≠ y, then Q_{f_y} |x⟩ |0⟩ = |x⟩ |0⟩ and Q_{f_y} |x⟩ |1⟩ = |x⟩ |1⟩, but Q_{f_y} |y⟩ |0⟩ = |y⟩ |1⟩ and Q_{f_y} |y⟩ |1⟩ = |y⟩ |0⟩; so, querying f_y by the query operator Q_{f_y} on state (6.5) would give us a state
(1/√(2^n)) Σ_{x∈F_2^n} (−1)^{f_y(x)} |x⟩ · (1/√2)(|0⟩ − |1⟩).   (6.7)
So far we have seen that, applying the query operator only once on a quantum superposition (6.4), we get state (6.7), ignoring the target qubit. We continue as follows: we write (6.7) in the form
(1/√(2^n)) (Σ_{x∈F_2^n} |x⟩ − 2|y⟩),
and apply the Hadamard transform H_n to get
|0⟩ − (2/2^n) Σ_{x∈F_2^n} (−1)^{x·y} |x⟩.   (6.8)
= (1/2) ((2/√(2^n)) Σ_{x∈F_2^n} (−1)^{x·y} |x⟩ − |0⟩) |0⟩,   (6.11)
and the renormalized states (6.12) and (6.13), respectively.
Finally, observing (6.12) and (6.13) can give us y; keeping in mind the probabilities of seeing states (6.10) and (6.11), we find that the total probability of observing y is approximately 2.5 times better than 2/2^n, the best we could get by a randomized algorithm that queries f_y only once (see the proof of Lemma 6.1.1).
6.2 Grover's Amplification Method 89
A query operator Q_{f_y}, which is used to call for values of f_y, uses n qubits for the source register and 1 for the target. We will also need a quantum operator R_n defined on n qubits and operating as R_n |0⟩ = −|0⟩ and R_n |x⟩ = |x⟩ if x ≠ 0. If we index the rows and columns of a matrix by elements of F_2^n (ordered as binary numbers), we can easily express R_n as a 2^n × 2^n matrix,
R_n = ( −1 0 0 ... 0 )
      (  0 1 0 ... 0 )
      (  0 0 1 ... 0 )   (6.14)
      (  .........   )
      (  0 0 0 ... 1 )
But we require that the operations on a quantum circuit should be local, i.e., there should be a fixed upper bound on the number of qubits on which each gate operates. Let us, therefore, pay some attention to how to decompose (6.14) into local operations.
The decomposition can obviously be made by making use of the function f_0 : F_2^n → F_2, which is 1 at 0 and 0 everywhere else. As discussed in the earlier section, a quantum circuit F_n for this function can be constructed by using O(n) simple quantum gates (which operate on at most three qubits) and some, say k_n, ancilla qubits. To summarize: we can obtain a quantum circuit F_n on n + k_n qubits operating as
F_n |x⟩ |b⟩ |0^{k_n}⟩ = |x⟩ |b ⊕ f_0(x)⟩ |0^{k_n}⟩.
Using first the reverse circuit F_n^{−1} and then H_2 on the target qubit will give us the modified query operator, again omitting the ancilla qubit.
(H_nR_nH_n)_{xy} = Σ_{z∈F_2^n} (H_nR_n)_{xz} (H_n)_{zy}   (6.17)
= (1/2^n) Σ_{z∈F_2^n} (−1)^{x·z} (R_n)_{zz} (−1)^{z·y}
= (1/2^n) (−2 + Σ_{z∈F_2^n} (−1)^{(x+y)·z})
= { −2/2^n, if x ≠ y,
  { 1 − 2/2^n, if x = y.
Hence
H_nR_nH_n = ( 1 − 2/2^n   −2/2^n   ...    −2/2^n  )
            (  −2/2^n   1 − 2/2^n  ...    −2/2^n  )   (6.18)
            (   ...        ...     ...     ...    )
            (  −2/2^n     −2/2^n   ...  1 − 2/2^n ),
ψ = (1/√(2^n)) Σ_{x∈F_2^n} |x⟩.
Thus, using the notations of Chapter 8, we can write P = |ψ⟩⟨ψ|, but this notation is not crucial here. It is more essential that representation (6.18) gives us an easy method for finding the effect of −H_nR_nH_n on a general superposition
Σ_{x∈F_2^n} c_x |x⟩.   (6.19)
Writing A = (1/2^n) Σ_{x∈F_2^n} c_x for the average of the coefficients, we see that
Σ_{x∈F_2^n} c_x |x⟩ = A Σ_{x∈F_2^n} |x⟩ + Σ_{x∈F_2^n} (c_x − A) |x⟩   (6.20)
is the decomposition of (6.19) into two orthogonal vectors: the first belongs to the subspace spanned by ψ; the second belongs to the orthogonal complement of that subspace. In fact, it is clear that the first summand of the right-hand side of (6.20) belongs to the subspace generated by ψ, and the orthogonality
of the summands is easy to verify:
(A Σ_{x∈F_2^n} |x⟩ | Σ_{y∈F_2^n} (c_y − A) |y⟩) = Σ_{x∈F_2^n} Σ_{y∈F_2^n} A*(c_y − A) ⟨x | y⟩ = A* Σ_{x∈F_2^n} (c_x − A) = 0.
and the effect of −H_nR_nH_n on superposition (6.19) is to map each coefficient c_x into 2A − c_x, i.e., to invert each coefficient about the average A. In particular, the coefficient −1/√(2^n) of |y⟩ is mapped into
2A + 1/√(2^n) ≈ 3/√(2^n).
Thus, an observation would give us y = (0, ..., 0, 1, 0, 1) with a probability ≈ 9/2^n by just a single query. This is approximately 4.5 times better than a classical randomized search can do.
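The whole single-query procedure — a sign flip on the marked element followed by inversion about the average — can be checked with a plain state-vector sketch (the parameters and the index y are arbitrary choices of mine):

```python
n = 10
N = 2 ** n
y = 5                                  # the marked element (arbitrary)
amps = [1 / N ** 0.5] * N              # uniform superposition (6.3)
amps[y] = -amps[y]                     # the query flips the sign of |y>
A = sum(amps) / N                      # average amplitude
amps = [2 * A - c for c in amps]       # inversion about the average
print(N * amps[y] ** 2)                # close to 9: probability ~ 9/N
```

The marked amplitude becomes 2A + 1/√N = (3 − 4/N)/√N, so the success probability is (3 − 4/N)^2/N ≈ 9/N, while the inversion leaves the state normalized.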
In this section, we will find out the effect of iterative use of the mapping G_n = −H_nR_nH_nV_f, but instead of a blackbox function f : F_2^n → F_2 that assumes only one solution, we will study a general f having k solutions (recall that here a solution means a vector x ∈ F_2^n such that f(x) = 1).
Let the notation T ⊆ F_2^n stand for the set of solutions, and F = F_2^n \ T for the set of non-solutions. Thus |T| = k and |F| = 2^n − k. Assume that after r iteration steps, the state is
t_r Σ_{x∈T} |x⟩ + l_r Σ_{x∈F} |x⟩.
One step of G_n then gives the recursion
t_{r+1} = (1 − 2k/2^n) t_r + (2(2^n − k)/2^n) l_r,
l_{r+1} = −(2k/2^n) t_r + (1 − 2k/2^n) l_r,   (6.24)
and the condition
k t_r^2 + (2^n − k) l_r^2 = 1   (6.25)
must hold for each r. That is to say, each point (t_r, l_r) lies in the ellipse defined by equation (6.25). Therefore, we can write
t_r = (1/√k) sin θ_r,
l_r = (1/√(2^n − k)) cos θ_r.
The boundary condition gives us that sin^2 θ_0 = k/2^n, and it is easy to verify that the solution of the recursion is
t_r = (1/√k) sin(rω + θ_0),
l_r = (1/√(2^n − k)) cos(rω + θ_0),
where θ_0 ∈ [0, π/2] and ω ∈ [0, π] are chosen in such a way that sin^2 θ_0 = k/2^n and cos ω = 1 − 2k/2^n. In fact, then cos ω = 1 − 2 sin^2 θ_0 = cos 2θ_0, so we have obtained the following lemma.

Lemma 6.2.1. The solution of (6.24) with the boundary conditions t_0 = l_0 = 1/√(2^n) is
t_r = (1/√k) sin((2r + 1)θ_0),
l_r = (1/√(2^n − k)) cos((2r + 1)θ_0).   (6.27)
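The recursion and the closed form of Lemma 6.2.1 can be checked against each other numerically: one Grover step flips the sign of the k solution amplitudes and then inverts all amplitudes about their average (all parameters below are arbitrary choices of mine):

```python
from math import asin, sin, cos, sqrt

n, k = 10, 7                        # 2^n items, k of them solutions
N = 2 ** n
t = l = 1 / sqrt(N)                 # boundary condition t_0 = l_0
theta0 = asin(sqrt(k / N))          # sin^2(theta_0) = k / 2^n
for r in range(1, 9):
    # one Grover step: V_f flips the k solution amplitudes, then
    # -H_n R_n H_n inverts every amplitude about the average
    t, l = ((1 - 2 * k / N) * t + 2 * (N - k) / N * l,
            -2 * k / N * t + (1 - 2 * k / N) * l)
    assert abs(t - sin((2 * r + 1) * theta0) / sqrt(k)) < 1e-9
    assert abs(l - cos((2 * r + 1) * theta0) / sqrt(N - k)) < 1e-9
print("recursion matches closed form (6.27)")
```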
We would like to find a suitable value for r to maximize the probability of finding a solution. Since there are k solutions, the probability of seeing one solution is
k t_r^2 = sin^2((2r + 1)θ_0).   (6.28)
This probability is 1 exactly when
(2r + 1)θ_0 = π/2 ⟺ r = −1/2 + π/(4θ_0),
Theorem 6.2.1. Let f : F_2^n → F_2 be such that there are k elements x ∈ F_2^n satisfying f(x) = 1. Assume that 0 < k ≤ (3/4)·2^n,¹ and let θ_0 ∈ [0, π/3] be chosen such that sin^2 θ_0 = k/2^n ≤ 3/4. After ⌊π/(4θ_0)⌋ iterations of G_n on an initial superposition
(1/√(2^n)) Σ_{x∈F_2^n} |x⟩,
a solution is found with a probability of at least 1/4.

Proof. The probability of seeing a desired element is given by sin^2((2r + 1)θ_0), and we just saw that r = −1/2 + π/(4θ_0) would give a probability of 1. Thus, we only have to estimate the error when −1/2 + π/(4θ_0) is replaced by ⌊π/(4θ_0)⌋. Clearly
Thus, for k = 2^n/4 we can find a solution with certainty using G_n only once, which is clearly impossible using any classical search strategy.
In a typical situation we, unfortunately, do not know the value of k in advance. In the next section, we present a simplified version of a method due to M. Boyer, G. Brassard, P. Høyer and A. Tapp [17] in order to find the required element even if k is not known.
Lemma 6.3.1.
Σ_{r=0}^{m−1} cos((2r + 1)α) = sin(2mα)/(2 sin α).

If r ∈ [0, m − 1] is chosen uniformly, the probability P_m of seeing a solution satisfies
P_m = 1/2 − sin(4mθ_0)/(4m sin(2θ_0)).
Proof. In the previous section we saw that, after r iterations of G_n, the probability of seeing a solution is sin^2((2r + 1)θ_0). Thus, if r ∈ [0, m − 1] is chosen uniformly, then the probability of seeing a solution is
P_m = (1/m) Σ_{r=0}^{m−1} sin^2((2r + 1)θ_0)
= (1/2m) Σ_{r=0}^{m−1} (1 − cos((2r + 1)·2θ_0))
= 1/2 − sin(4mθ_0)/(4m sin(2θ_0))
according to Lemma 6.3.1. □
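The averaging identity used in this proof is easy to verify numerically (the parameters below are arbitrary):

```python
from math import asin, sin, sqrt

N, k, m = 2 ** 8, 3, 17                        # arbitrary parameters
theta0 = asin(sqrt(k / N))                     # sin^2(theta_0) = k / N
avg = sum(sin((2 * r + 1) * theta0) ** 2 for r in range(m)) / m
formula = 0.5 - sin(4 * m * theta0) / (4 * m * sin(2 * theta0))
print(abs(avg - formula))                      # rounding error only
```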
Remark 6.3.1. If m ≥ 1/sin(2θ_0), then sin(4mθ_0)/(4m sin(2θ_0)) ≤ 1/4, and hence P_m ≥ 1/4. Moreover,
1/sin(2θ_0) = 1/(2 sin θ_0 cos θ_0) = 2^n/(2√(k(2^n − k))) ≤ √(2^n/k) ≤ √(2^n),
6.3 Utilizing Grover's Search Method 99
so it suffices to choose
m ≥ √(2^n/k) ≥ 1/sin(2θ_0).
Remark 6.3.2. Since the above algorithm is guaranteed to work with a probability of at least 1/4, we can say that on average the solution can be found after four attempts. In each attempt, the number of queries to f is at most √(2^n).
A linear improvement of the above algorithm can be obtained by employing the methods of [17]. In any case, the above algorithm results in the following theorem.

Theorem 6.3.1. By using a quantum circuit making O(√(2^n)) queries to a blackbox function f, one can decide with nonvanishing correctness probability whether there is an element x ∈ F_2^n such that f(x) = 1.
In the next chapter we will present a clever technique formulated by R. Beals, H. Buhrman, R. Cleve, M. Mosca, and R. de Wolf [6], which will allow us to demonstrate that the above algorithm is, in fact, optimal up to
a constant factor. Any quantum algorithm that can discover, with nonvanishing probability, whether a solution to f exists uses Ω(√(2^n)) queries to f. This result concerns blackbox functions, so it does not imply a complexity lower bound for computable functions (see Remark 6.1.3).
On the other hand, the blackbox function f can be replaced with a computable function to get the following theorem.

Theorem 6.3.2. By using a quantum circuit, any problem in NP can be solved with nonvanishing correctness probability in time O(√(2^n) p(n)), where p is a polynomial depending on the particular problem.

In the above theorem, polynomial p(n) is essentially the time that the nondeterministic computation needs to solve the problem (see Section 3.1.2).
Grover's search algorithm has been cunningly used for several purposes. To mention a few examples, see [19] for a quantum counting algorithm and [35] for a minimum finding algorithm. In [17] the authors outline a generalized search algorithm for sets other than F_2^n.

Remark 6.3.3. Recently, L. Grover and T. Rudolph have argued [42] that many algorithms based on the "standard quantum search method" fail to be nontrivial in the sense that the speedup provided could be obtained just as well by dividing the search space into suitable parts, and then performing the "standard search" in parallel.
7. Complexity Lower Bounds for Quantum
Circuits
has some property or not (for instance, in Section 6.1.2 we were interested in whether there was an element y ∈ F_2^n such that f(y) = 1). This quantum circuit can then be seen as a device which is given the vector (f(0), f(1), ..., f(2^n − 1)) (the values of f) as an input, and which outputs "yes" or "no" according to whether f has the property or not. In other words, the quantum circuit can then be seen as a device for computing a {0, 1}-valued function on the input vector (f(0), f(1), ..., f(2^n − 1)) (unknown to us), which is to say that the circuit is a device for computing a Boolean function on 2^n input variables f(0), f(1), ..., f(2^n − 1).
Although a lot of things about Boolean functions are unknown to us, fortunately, we do know many things about them; see [64] for discussion. It turns out that the polynomial representations of Boolean functions will give us the desired information about the number of blackbox queries needed to find out some property of f. The bound will be expressed in terms of the polynomial representation degree for quantum circuits that exactly (with a probability of 1) compute the property (Boolean
way: the value of the function X_K on a vector x = (x_0, x_1, ..., x_{N−1}) is the product x_{k_1}x_{k_2} ··· x_{k_l}, which is interpreted as the number 0 or 1. This may require some explanation: x is an element in F_2^N, and each of its components x_i is an element of the binary field F_2. Thus, the product x_{k_1}x_{k_2} ··· x_{k_l} also belongs to F_2, but embedding F_2 in C, we may interpret the product as also belonging to C. It is natural to call the functions X_K monomials and to denote X_∅ = 1 (an empty product having no factors at all is interpreted as 1). If |K| = l, the degree of a monomial X_K = X_{k_1}X_{k_2} ··· X_{k_l} is defined to be deg X_K = l. A linear combination of monomials X_{K_1}, ..., X_{K_B} is naturally denoted by
P = c_1X_{K_1} + c_2X_{K_2} + ... + c_BX_{K_B}   (7.2)
and is called a polynomial. The degree deg P of polynomial (7.2) is defined as the highest degree of a monomial occurring in (7.2), except in the case of the zero polynomial P = 0, whose degree we symbolically define to be −∞.
7.2 Polynomial Representations 103
A reader may now wonder why we consider such a simple concept as a polynomial so carefully. The problematic point is that, in a finite field, the same function can be defined by many different polynomials. For example, both elements of the binary field F_2 satisfy the equation x^2 = x, and thus the non-constant polynomial x^2 − x + 1 behaves like the constant polynomial 1.
We will, therefore, define the product of monomials X_K and X_L by consid-
ering X_K and X_L as functions F_2^N → F_2. It is plain to see that this results
in the definition X_K X_L = X_{K∪L}. The product of two polynomials is defined as
usual: if

P_1 = Σ_{i=1}^{B_1} c_i X_{K_i}   and   P_2 = Σ_{j=1}^{B_2} d_j X_{L_j},

then

P_1 P_2 = Σ_{i=1}^{B_1} Σ_{j=1}^{B_2} c_i d_j X_{K_i} X_{L_j}.
The product differs slightly from the ordinary product of polynomials. Take,
for example, K = {0, 1} and L = {1, 2}. Then
F(x) = F(π(x))
holds.
Clearly, any linear combination of symmetric functions is again symmetric;
thus, symmetric functions form a subspace W ⊆ V. We will now find a basis
for W. Let P be a nonzero polynomial representing a symmetric function.
Then P can be represented as a sum of homogeneous polynomials.²
2 A homogeneous polynomial is a linear combination of monomials having the same
degree.
where the sum is taken over all subsets of {0, 1, ..., N - 1} having cardi-
nality i. Because Q_i is invariant under variable permutations, and because
the representation as a polynomial is unique, we must have that c_{k_1, k_2, ..., k_i} = c_i
independently of the choice of {k_1, k_2, ..., k_i}. That is to say, the symmetric poly-
nomials
V_0 = 1,
V_1 = X_0 + X_1 + ... + X_{N-1},
V_2 = X_0 X_1 + X_0 X_2 + ... + X_{N-2} X_{N-1}
which tells us that V_i(x) = binom(k, i) if i ≤ k, and V_i(x) = 0 if i > k. This leads us
to an important observation which will be used later.
P_w(wt(x)) = P(x)
for each x ∈ F_2^N.
Proof. Because P is symmetric, it can be represented as
P = c_0 V_0 + c_1 V_1 + ... + c_d V_d.
The claim follows from the fact that V_i(x) = binom(wt(x), i) is a polynomial of wt(x)
having degree i. □
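The identity V_i(x) = binom(wt(x), i) can be checked directly on small vectors; the following sketch (ours, assuming Python with the standard library) evaluates the elementary symmetric polynomials over the reals on a 0/1 vector:

```python
from itertools import combinations
from math import comb

def V(i, x):
    """Elementary symmetric polynomial V_i evaluated on a 0/1 vector x."""
    # Each i-subset K contributes the product of the chosen coordinates.
    return sum(all(x[k] for k in K) for K in combinations(range(len(x)), i))

x = (1, 0, 1, 1, 0, 1)                 # weight wt(x) = 4
for i in range(len(x) + 1):
    assert V(i, x) == comb(4, i)       # V_i(x) = binom(wt(x), i)
```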
There are many complexity measures for a Boolean function: the number of
Boolean gates ¬, ∧, and ∨ needed to implement the function (Section 3.2.1)
and decision tree complexity, to mention a few. For our purposes, the ideal
complexity measure is the degree of the multivariate polynomial representing
the function. We follow the notation of [6] in the definition below.
|B(x) - P(x)| ≤ 1/3
for each x ∈ F_2^N.
Clearly, there may be many polynomials which approximate a single Boolean
function B, but we are interested in the minimum degree of an approximating
polynomial. Therefore, following [6], we give the following definition.
Definition 7.2.4. The polynomial approximation degree of a Boolean func-
tion B is
Γ(B) = min{ |2k - N + 1| : B_w(k) ≠ B_w(k + 1) and 0 ≤ k ≤ N - 1 }.
Thus, Γ(B) measures how close to weight N/2 the function B_w changes value: if
B_w(k) ≠ B_w(k + 1) for some k approximately N/2, then Γ(B) is low. Recall
that B_w(k) ≠ B_w(k + 1) means that, if wt(x) = k, then the value B(x) changes
if we flip one more coordinate of x to 1.
Example 7.2.1. Consider the function OR, whose only change occurs when the
weight of the argument increases from 0 to 1. Thus, Γ(OR) = N - 1. Similarly,
we see that the only jump for the function AND occurs when the weight of x
increases from N - 1 to N. Therefore, we have that Γ(AND) = N - 1.
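The example can be verified by brute force from the definition. The helper below is our own sketch (not code from the text): it computes Γ(B) straight from the weight function B_w of a symmetric Boolean function.

```python
def gamma(B_w, N):
    """Gamma(B) from the weight function B_w of a symmetric Boolean function."""
    return min(abs(2 * k - N + 1)
               for k in range(N) if B_w(k) != B_w(k + 1))

N = 8
OR_w = lambda k: int(k >= 1)     # OR is 1 as soon as the weight is positive
AND_w = lambda k: int(k == N)    # AND is 1 only at full weight N
assert gamma(OR_w, N) == N - 1   # the only jump is at k = 0
assert gamma(AND_w, N) == N - 1  # the only jump is at k = N - 1
```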
We are now ready to present the idea of [6], which connects quantum circuits
computing Boolean functions and polynomials representing and approximat-
ing Boolean functions.
We consider blackbox functions f : F_2^n → F_2, and, as earlier, quantum
blackbox queries are modelled by using a query operator Q_f operating as
There may also be more qubits for other computational purposes, but, without
violating generality, we may assume that a blackbox query takes place on fixed
qubits. Therefore, we can assume that a general state of a quantum circuit
computing a property of f is a superposition of states of the form
of some number, say m, of qubits, which are needed for other computational
purposes. There are 2^{n+m+1} different states (7.3), and we can expand the
definition of Q_f to include all states (7.3) by operating on |w⟩ as an identity
operator; we will just denote the enlarged operator again by Q_f:
Remark 7.3.2. Notice that the proof of Lemma 7.3.1 does not utilize the fact
that each U_i is unitary; rather, only linearity is needed. In fact, if we
could use nonlinear operators as well, the reader may easily verify that we could
have much faster growth in the representation degree of the coefficients.
The following theorem from [6] finally connects quantum circuits and
polynomial representations.
Theorem 7.3.1. Let N = 2^n, f : F_2^n → F_2 an arbitrary blackbox function,
and Q a quantum circuit that computes a Boolean function B on the N variables
X_0 = f(0), X_1 = f(1), ..., X_{N-1} = f(N - 1).
1. If Q computes B with a probability of 1, using T queries to f, then
T ≥ deg(B)/2.
2. If Q computes B with a correctness probability of at least 2/3 using T
queries to f, then T ≥ deg~(B)/2, where deg~(B) is the polynomial
approximation degree of B (Definition 7.2.4).
where the sum is taken over all those states where the right-most bit of w
is 1. By considering separately the real and imaginary parts, we see that
P(X_0, ..., X_{N-1}) can, in fact, be represented as a polynomial in the variables
X_0, ..., X_{N-1} with real coefficients and degree at most 2T.
If Q computes B exactly, then (7.9) is 1 if and only if B is 1, and, therefore,
P(X_0, ..., X_{N-1}) = B. Thus, 2T ≥ deg(P) = deg(B), and (1) follows. If Q
computes B with a probability of at least 2/3, then (7.9) is at most 1/3 apart from
1 when B is 1, and similarly, (7.9) is at most 1/3 apart from 0 when B is 0. This
means that (7.9) is a polynomial approximating B, so 2T ≥ deg(P) ≥ deg~(B),
and (2) follows. □
We can apply the main theorem (Theorem 7.3.1) of the previous section by
taking B = OR to arrive at the following theorem.
The above theorem shows that the method of using Grover's search algo-
rithm to decide whether a solution for f exists is optimal up to a constant
factor. Similar results can be derived for other functions whose representation
degree is known.
Proof. Take B = PARITY, apply Paturi's theorem (Theorem 7.2.1), and use
reasoning similar to that in the above proof. □
Remark 7.3.3. In [6], the authors also derive results analogous to the previous
ones for so-called Las Vegas quantum circuits, which must always give the
correct answer but can sometimes be "ignorant", i.e., give the answer
"I do not know", with a probability of at most 1/2. Moreover, in [6] it is shown
that if a quantum circuit computes a Boolean function B on variables X_0,
X_1, ..., X_{N-1} using T queries to f with a nonvanishing error probability,
then there is a classical deterministic algorithm that computes B exactly and
uses O(T^6) queries to f. Moreover, if B is symmetric, then O(T^6) can even
be replaced with O(T^2).
Remark 7.3.4. It can be shown that only a vanishing fraction of Boolean func-
tions on N variables have representation degree lower than N, and that the
same holds true for the approximation degree [2]. Thus, for almost any B,
Ω(N) queries to f are required to compute B on f(0), f(1), ..., f(N - 1), even
with a bounded error probability.
Remark 7.3.5. With a little effort, we can apply the results presented in
this section to oracle Turing machine computation: for almost all oracles X,
NP^X is not included in BQP^X. However, this does not imply that NP is
not included in BQP, but it offers some reason to believe so. On the other
hand, blackbox functions are the most "unstructured" examples that one can
have; in [29] (see also [21]), W. van Dam gives an example of breaking the
blackbox lower bound by replacing an arbitrary f with a more structured
function.
Remark 7.3.6. The lower bound given by the polynomial degree is not tight: A. Am-
bainis has proved the existence of a Boolean function that has degree M but
quantum query complexity Ω(M^{1.321...}). For the construction of the function
and the lower bound method using the quantum adversary technique, see [3].
8. Appendix A: Quantum Physics
(8.4)
over all the configurations x_i. In the above mixed state, p_1 + ... + p_n = 1,
and the system can be seen in state x_i with a probability of p_i. A quantum
mechanical counterpart of this system is called an n-level quantum system. To
describe such a system, we choose a basis |x_1⟩, ..., |x_n⟩ of an n-dimensional
Hilbert space H_n, and a general state of an n-level quantum system is de-
scribed by a vector
(8.5)
5 "There are not two worlds, one of light and waves, one of matter and corpuscles.
There is only a single universe. Some of its properties can be accounted for by
wave theory, and others by the corpuscular theory." (A citation from the lecture
given by Louis de Broglie at the Nobel Prize award ceremony in 1929.)
6 In formal language theory, we also call a finite set representing information an
alphabet.
If the system starts in state |0⟩ or |1⟩ and undergoes the time evolution, the
probability of observing 0 or 1 will be 1/2 in both cases. On the other hand,
if the system starts in state |0⟩ and undergoes the time evolution twice, the
state will be
(1/√2)((1/√2)|0⟩ + (1/√2)|1⟩) + (1/√2)((1/√2)|0⟩ - (1/√2)|1⟩)
= (1/2)|0⟩ + (1/2)|1⟩ + (1/2)|0⟩ - (1/2)|1⟩ = |0⟩,
and the probability of observing 0 becomes 1 again. The effect that the ampli-
tudes of |1⟩ cancel each other is called destructive interference, and the effect
that the coefficients of |0⟩ amplify each other is called constructive interfer-
ence. Destructive interference cannot occur in the evolution of the probability
distribution (8.4), since all the coefficients are always nonnegative real num-
bers. A probabilistic counterpart to the quantum time evolution would be
[0] ↦ (1/2)[0] + (1/2)[1],
[1] ↦ (1/2)[0] + (1/2)[1],
but the double time evolution beginning with state [0] would give state
(1/2)[0] + (1/2)[1] as the outcome.
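The contrast between the two evolutions can be simulated directly. The sketch below (ours, assuming NumPy) applies the quantum evolution matrix twice to |0⟩ and the stochastic matrix twice to [0]:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)    # quantum time evolution
M = np.array([[0.5, 0.5], [0.5, 0.5]])          # probabilistic counterpart

ket0 = np.array([1.0, 0.0])
assert np.allclose(H @ (H @ ket0), [1.0, 0.0])  # amplitudes of |1> cancel
assert np.allclose(M @ (M @ ket0), [0.5, 0.5])  # probabilities stay mixed
```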
On the other hand, each inner product is induced by some orthonormal basis
as in (8.6). For, if (· | ·) stands for an arbitrary inner product, we may
use the Gram-Schmidt process (see Section 9.3) to find a basis {b_1, ..., b_n}
orthonormal with respect to (· | ·). Then
(x | y) = (x_1 b_1 + ... + x_n b_n | y_1 b_1 + ... + y_n b_n)
= x_1^* y_1 + ... + x_n^* y_n.
The inner product induces a vector norm in a very natural way:
‖x‖ = √(x | x).
Informally speaking, the completeness of the vector space H means that
there are enough vectors in H, i.e., at least one for each limit process: if
lim_{m,n→∞} ‖x_m - x_n‖ = 0,
there exists a vector x ∈ H such that lim_{n→∞} ‖x_n - x‖ = 0.
8.2.2 Operators
(B + T)x = Bx + Tx
and
(aB)x = a(Bx)
for each x ∈ H.
Definition 8.2.2. For any operator T : H → H, the adjoint operator T* is
defined by the requirement
(x | Ty) = (T*x | y)
for all x, y ∈ H.
Remark 8.2.1. With a fixed basis {e_1, ..., e_n} of H_n, any operator T can be
represented as an n × n matrix over the field of complex numbers. It is not
difficult to see that the matrix representing the adjoint operator T* is the
transposed complex conjugate of the matrix representing T.
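The defining property (x | Ty) = (T*x | y) can be checked numerically. In the sketch below (ours, assuming NumPy; `np.vdot` conjugates its first argument, matching the convention that the inner product is conjugate-linear in the first slot), the adjoint is taken as the conjugate transpose:

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
T_adj = T.conj().T                   # transposed complex conjugate

x = rng.normal(size=3) + 1j * rng.normal(size=3)
y = rng.normal(size=3) + 1j * rng.normal(size=3)

# (x | Ty) = (T*x | y)
assert np.isclose(np.vdot(x, T @ y), np.vdot(T_adj @ x, y))
```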
8.2 Mathematical Framework for Quantum Theory 119
If {x_1, ..., x_n} and {y_1, ..., y_n} are orthonormal bases of H_n, then it
can be shown that

Σ_{i=1}^n (x_i | Tx_i) = Σ_{i=1}^n (y_i | Ty_i);   (8.7)

see Exercise 1.
By (8.7), the notion of trace is well defined. Moreover, it is clear that the
trace is linear. Notice also that, in the matrix representation of T, the trace
is the sum of the diagonal elements.
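Since any unitary Q carries one orthonormal basis to another, (8.7) amounts to the matrix identity Tr(T) = Tr(Q*TQ), which a short check confirms (a NumPy sketch of ours, using QR to obtain a random unitary):

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))

# Q*TQ is the matrix of T in the orthonormal basis given by Q's columns.
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))
assert np.isclose(np.trace(T), np.trace(Q.conj().T @ T @ Q))
```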
Proof. Assume that λ ≠ λ', Ax = λx, and Ax' = λ'x'. Since λ and λ' are
real by the previous lemma,
Lemma 8.2.8. Let {x_1, ..., x_k} and {y_1, ..., y_k} be two sets of vectors in
H. If (x_i | x_j) = (y_i | y_j) for each i and j, then there is a unitary mapping
U : H → H such that y_i = Ux_i.
Proof. Let W be the subspace of H generated by the vectors x_1, ..., x_k. There
exists a subset of {x_1, ..., x_k} which forms a basis of W. Without loss of
generality, we may assume that this subset is {x_1, ..., x_{k'}} for some k' ≤ k.
Now we define a mapping U : W → H by Ux_i = y_i for each i ∈ {1, ..., k'}
and extend this into a linear mapping in the only possible way.
We will first show that y_1, ..., y_{k'} is a basis of Im(U). Clearly those vec-
tors generate Im(U), so it remains to show that they are linearly independent.
For that purpose, we assume that
(8.9)
for some coefficients a_1, ..., a_{k'}. For any i ∈ {1, ..., k'}, we compute the
inner product of (8.9) with y_i, thus getting
(8.10)
(8.12)
means that
(x_j | x_i) = c_1^{(i)} (x_j | x_1) + ... + c_{k'}^{(i)} (x_j | x_{k'}).   (8.13)
(8.14)
|x⟩⟨y| z = (y | z) x.
It is plain to see that, if ‖x‖ = 1, then |x⟩⟨x| is the projection onto the
one-dimensional subspace generated by x.
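Both facts are easy to confirm with outer products (a NumPy sketch of ours, not from the text): |x⟩⟨y| acts on z as (y | z)x, and |x⟩⟨x| is idempotent for a unit vector x.

```python
import numpy as np

x = np.array([1.0, 0.0])                  # a unit-length vector
y = np.array([0.6, 0.8j])
z = np.array([1.0 + 1.0j, 2.0])

ket_bra = np.outer(x, y.conj())           # matrix of |x><y|
assert np.allclose(ket_bra @ z, np.vdot(y, z) * x)   # |x><y| z = (y | z) x

P = np.outer(x, x.conj())                 # |x><x| for unit x
assert np.allclose(P @ P, P)              # idempotent, hence a projection
```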
Since this holds for each z ∈ H_n, we have that |Ax⟩⟨By| = A |x⟩⟨y| B*.
(x | Ty) = (Tx | y) = λ^* (x | y) = 0,
which means that Ty ∈ W⊥ as well. Therefore, we may study the restrictions
T : W → W and T : W⊥ → W⊥ and apply the induction hypothesis to find
an orthonormal basis for W and W⊥ consisting of eigenvectors of T. Since
H_n = W ⊕ W⊥, the union of these bases satisfies the claim. □
eigenvalues. The numbers λ_i are real by Theorem 8.2.1, but they are not neces-
sarily distinct. The set of eigenvalues is called the spectrum, and it can be easily
verified (Exercise 3) that
(8.16)
(8.17)
where P_1, ..., P_{n'} are the projections onto the eigenspaces of λ_1', ..., λ_{n'}'. It
is easy to see that the spectral representation (8.17) is unique.⁸
Recall that all the eigenvectors belonging to distinct eigenvalues are or-
thogonal, which implies that, in representation (8.17), all the projections are
projections onto mutually orthogonal subspaces. Therefore, P_i P_j = 0 whenever
i ≠ j. It follows that, if p is a polynomial, then
We also generalize (8.18): if f : ℝ → ℂ is any function and (8.17) is the
spectral representation of T, we define
(8.20)
( 0 1 ; 1 0 ) = 1 · ( 1/2 1/2 ; 1/2 1/2 ) + (-1) · ( 1/2 -1/2 ; -1/2 1/2 ).
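The same decomposition can be recovered numerically: `np.linalg.eigh` (for self-adjoint matrices) returns the eigenvalues ±1 of the NOT matrix with orthonormal eigenvectors, from which the projections are rebuilt (a sketch of ours, assuming NumPy):

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])    # the NOT matrix
evals, evecs = np.linalg.eigh(X)          # eigh handles self-adjoint matrices

# Rebuild X as a sum of eigenvalue * projection terms.
rebuilt = sum(lam * np.outer(v, v.conj()) for lam, v in zip(evals, evecs.T))
assert np.allclose(sorted(evals), [-1.0, 1.0])
assert np.allclose(rebuilt, X)
```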
By definition,
which is to say that e^{iT} is unitary. We will now show that each unitary
mapping can be represented as e^{iT}, where T is a self-adjoint operator. To do
this, we first derive an auxiliary result which also has independent interest.
Lemma 8.2.9. Self-adjoint operators A and B commute if and only if A
and B share an orthonormal eigenvector basis.
Proof. If {a_1, ..., a_n} is a set of orthonormal eigenvectors of both A and B,
then, by using the corresponding eigenvalues, we can write
and
Hence,
Assume, then, that AB = BA. Let λ_1, ..., λ_h be all the distinct eigenvalues
of A, and let a_1^{(k)}, ..., a_{n_k}^{(k)} be orthonormal eigenvectors belonging to λ_k. For
any a_i^{(k)}, we have

A B a_i^{(k)} = B A a_i^{(k)} = B λ_k a_i^{(k)} = λ_k B a_i^{(k)},
(x | x) = (U*Ux | x) = (Ux | Ux) = (λx | λx) = |λ|² (x | x),
and it follows that |λ| = 1.
We decompose U into "real and imaginary parts" by writing U = A + iB,
where A = (1/2)(U + U*) and B = (1/(2i))(U - U*). Note that A and B are now
self-adjoint, commuting operators. According to the previous lemma, we have
spectral representations
and
Since the eigenvalues of U are of absolute value 1, it follows that the eigen-
values of A and B, i.e., the numbers λ_i and μ_i, have absolute values of at most
1. But since A and B are self-adjoint, the numbers λ_i and μ_i are also real. Thus,
U can be written as
where the λ_j and μ_j are real numbers in the interval [-1, 1]. Because the eigen-
values of U have absolute value 1, we must have λ_j² + μ_j² = 1. Thus, there
exists a unique θ_j ∈ [0, 2π) for each j such that λ_j = cos θ_j and μ_j = sin θ_j.
It follows that λ_j + iμ_j = e^{iθ_j}, and U can be expressed as U = e^{iH}, where
Representation

W_2 = (1/√2) ( 1 1 ; 1 -1 )

and
W_2 = 1 · |x_1⟩⟨x_1| + (-1) · |x_{-1}⟩⟨x_{-1}|,
which can also be written as W_2 = e^{i·0} |x_1⟩⟨x_1| + e^{iπ} |x_{-1}⟩⟨x_{-1}|, so
W_2 = e^{iT}, where
R_θ = ( cos θ  -sin θ ; sin θ  cos θ )
be the rotation matrix. Clearly, R_θ is unitary. Now that we know that unitary
matrices also have eigenvectors forming an orthonormal basis, we can find
them directly without seeking a decomposition R_θ = A + iB. It is an easy
task to verify that the eigenvalues of R_θ are e^{±iθ}, and the corresponding
eigenvectors can be chosen as x_+ = (1/√2)(i, 1)^T and x_- = (1/√2)(-i, 1)^T. The
corresponding projections are given by
H_θ = ( 0  iθ ; -iθ  0 ).
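That R_θ = e^{iH_θ} can be verified by exponentiating H_θ through its spectral representation, as in (8.19). The following sketch (ours, assuming NumPy) checks the claim for one value of θ:

```python
import numpy as np

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
H = np.array([[0, 1j * theta], [-1j * theta, 0]])    # self-adjoint H_theta

# e^{iH} via the spectral representation of H
evals, evecs = np.linalg.eigh(H)
exp_iH = (evecs * np.exp(1j * evals)) @ evecs.conj().T
assert np.allclose(exp_iH, R)
```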
Example 8.2.6. Let α and β be real numbers. The phase shift matrix
is unitary, as is easily verified (notice that these matrices are closely related
to phase flips; see Example 2.1.4). The spectral decomposition is now trivial:
Condition 4 is still unused. Its place is found after the following theorem by
M. Stone; see [65].
Theorem 8.3.1. If each U_t satisfies Conditions 1-4, then there is a unique self-adjoint
operator H such that
U_t = e^{-itH}.
Recall that the exponential function of a self-adjoint operator A can be de-
fined by using the spectral representation (8.19), or even as the series

e^A = I + A + (1/2!) A² + (1/3!) A³ + ... .
It is clear that the definitions coincide in a finite-dimensional H_n. By Stone's
theorem, the time evolution can be expressed as
x(t) = e^{-itH} x(0),
from which we get
8.3.2 Observables
We again study a general state
(8.23)
The basis states were introduced in connection with the distinguishable prop-
erties which we are interested in, and a general state, i.e., a superposition of
basis states, induces a probability distribution on the basis states: a basis
state x_i is observed with a probability of |a_i|². This will be generalized as
follows:
In the above definition, the inequality m ≤ n must, of course, hold. We equip the
subspaces E_i with distinct real number "labels" θ_1, ..., θ_m. For each vector
x ∈ H_n, there is a unique representation
such that x_i ∈ E_i. Instead of observing the spaces E_i, we can talk about observ-
ing the labels θ_i: we say that by observing E, the value θ_i will be seen with a
probability of ‖x_i‖².¹²
Example 8.3.2. The notion of observing the basis states x_i is a special case:
the observable E can be defined as
and the probability of observing the label θ_i when the system state is |x⟩ is
given by
12 The original starting point is, of course, controversial: instead of talking about
observing subspaces, one talks about measuring some physical quantity and ob-
taining a real number as the outcome. However, here it seems to be logically
more consistent to introduce these quantity values as labels of subspaces.
8.3 Quantum States as Hilbert Space Vectors 133
1 = (x | Ix).
Thus, we may extend E to be defined on sets of real numbers by setting
E({θ_i, θ_j}) = E(θ_i) + E(θ_j) if θ_i ≠ θ_j. We can easily see that, as a mapping
from subsets of ℝ to L(H), E satisfies the following conditions:
1. For each X ⊆ ℝ, E(X) is a projection.
2. E(ℝ) = I.
3. E(∪X_i) = Σ E(X_i) for a disjoint sequence X_i of sets.
A mapping E that satisfies the above conditions is called a projection-valued
measure.¹³ Viewing an observable as a projection-valued measure represents
an alternative way of thinking from Definition 8.3.1. In that definition, we
merely equipped each subspace with a real number, a label, which is thought
to be observed instead of the actual subspace. Here we associate with a set of
labels (real numbers) a projection which defines a subspace. If X is a set of
real numbers, then (x | E(X)x) is just the probability of seeing a number in
X.
It follows from Condition 3 that, if X and Y are disjoint sets, then the
corresponding projections E(X) and E(Y) project onto mutually orthogonal
subspaces. In fact, since E(X)(H_n) is a subspace of E(X ∪ Y)(H_n), we must
have E(X)E(X ∪ Y) = E(X) and, therefore, E(X) = E(X)E(X ∪ Y) =
E(X) + E(X)E(Y), so E(X)E(Y) = 0.
The third and perhaps most traditional view of observables can
be achieved by regarding an observable as a self-adjoint operator
equipped with a real number label θ_i. This point of view most closely
resembles the original idea of thinking about a quantum state as a gen-
eralization of a probability distribution. In fact, observing the state of a
quantum system can be seen as learning the value of an observable, which
is defined as a collection of one-dimensional subspaces spanned by the basis
states.
• An observable can be seen as a projection-valued measure E which maps
a set of labels to a projection that defines a subspace. This viewpoint
offers us a method for generalizing the notion of an observable into a positive
operator-valued measure.
• The traditional view of an observable is to define an observable as a self-
adjoint operator by the spectral representation
All these viewpoints are logically quite equal. In what follows, we will use all
of them, choosing the one which is best suited for each situation.
Proof. Notice first that, for self-adjoint A and B, [A, B]* = -[A, B], so the
commutator of A and B can be written as [A, B] = iG, where G = -i[A, B]
is self-adjoint. If μ_A and μ_B are the expected values of A and B, respectively,
we get
Var_x(A) Var_x(B) = ‖(A - μ_A)x‖² ‖(B - μ_B)x‖²
≥ |((A - μ_A)x | (B - μ_B)x)|²
as claimed. □
Remark 8.3.2. A classical example of noncommuting observables is posi-
tion and momentum. It turns out that the commutator of these two observ-
ables is a homothety, i.e., multiplication by a nonzero constant, so Lemma
8.3.2 demonstrates that, in any state, the product of the variances of position
and momentum has a positive lower bound. This is known as Heisenberg's
uncertainty principle.
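For a concrete finite-dimensional instance, take the Pauli matrices σ_x and σ_y, whose commutator is 2iσ_z. The sketch below (ours, assuming NumPy; the matrices and the random state are our illustration, not the book's) checks the Robertson-type consequence Var_x(A)Var_x(B) ≥ (1/4)|(x | [A, B]x)|² of the bound above:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])

rng = np.random.default_rng(2)
x = rng.normal(size=2) + 1j * rng.normal(size=2)
x /= np.linalg.norm(x)                     # a random unit state

def var(A, x):
    mean = np.vdot(x, A @ x).real          # expected value of A in state x
    d = (A - mean * np.eye(2)) @ x
    return np.vdot(d, d).real

comm = sx @ sy - sy @ sx                   # [sigma_x, sigma_y] = 2i sigma_z
lower = 0.25 * abs(np.vdot(x, comm @ x)) ** 2
assert var(sx, x) * var(sy, x) >= lower - 1e-12
```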
The uncertainty principle (Theorem 8.3.2) was criticized by D. Deutsch
[30] on the grounds that the lower bound provided by Theorem 8.3.2 is not
generally fixed, but depends on the state x. Deutsch himself in [30] gave
a state-independent lower bound for the sum of the uncertainties of two
observables. Here we present an improved version of this bound due to
Maassen and Uffink [56].
We say that a sequence (p_1, ..., p_n) of real numbers is a probability dis-
tribution if p_i ≥ 0 and p_1 + ... + p_n = 1. For a probability distribution
P = (p_1, ..., p_n), the Shannon entropy of the distribution is defined to be
(8.26)
We claim that

lim_{r→0} (1/r) log ( Σ_{i=1}^n p_i^{r+1} ) = Σ_{i=1}^n p_i log p_i.   (8.27)

In fact, since Σ_{i=1}^n p_i = 1, both the numerator and the denominator of the left-hand side
of (8.27) tend to 0 as r does. Therefore, L'Hôpital's rule applies, and

lim_{r→0} (1/r) log ( Σ_{i=1}^n p_i^{r+1} ) = lim_{r→0} ( Σ_{i=1}^n p_i^{r+1} log p_i ) / ( Σ_{i=1}^n p_i^{r+1} ) = Σ_{i=1}^n p_i log p_i.
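The limit (8.27) can be checked numerically for a fixed distribution by taking r small (a Python sketch of ours; log is the natural logarithm here):

```python
import numpy as np

p = np.array([0.5, 0.25, 0.125, 0.125])
target = np.sum(p * np.log(p))            # sum p_i log p_i (natural log)

for r in (1e-3, 1e-5):
    approx = np.log(np.sum(p ** (r + 1))) / r
    assert abs(approx - target) < 1e-2    # converges to the claimed limit
```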
and that the claim is equivalent to exp(-H(P) - H(Q)) ≤ c², which can be
written as

Π_{i=1}^n p_i^{p_i} · Π_{i=1}^n q_i^{q_i} ≤ c².   (8.28)
( Σ_{i=1}^n |(b_i | x)|^{a'} )^{1/a'} ≤ c^{(2-a)/a} ( Σ_{i=1}^n |(a_i | x)|^{a} )^{1/a}.   (8.29)
Since (8.29) can also be raised to some positive power k, we will try to fix the
values so that (8.29) begins to resemble the right-hand side of (8.26).
Therefore, we will search for numbers r, s, and k such that a' = 2(s + 1),
k/a' = 1/s, a = 2(r + 1), k/a = -1/r, and k(2 - a)/a = 2. To satisfy 1 ≤ a ≤ 2, we
must choose r ∈ [-1/2, 0]. A simple calculation shows that choosing s = -r/(2r + 1)
and k = -(2r + 2)/r will suffice. Thus, (8.29) can be rewritten as
Remark 8.3.3. The lower bound of Theorem 8.3.3 does not depend on the
system state x, but it may depend on the representation of the observables
and
14 Riesz's theorem is a special case of Stein's interpolation formula. The elementary
proof given by Riesz [73] is based on Hölder's inequality.
A_i = Σ_{|x⟩∈B_i} c_x^{(i)} |x⟩⟨x|.
Now we do not care very much about the labels c_x^{(i)}; they can be fixed in any
manner such that both sequences (c_x^{(i)}) consist of 2^m distinct real numbers.
This, in fact, is to regard an observable as a decomposition of H_{2^m} into
2^m mutually orthogonal one-dimensional subspaces. For any |x⟩ ∈ B_1 and
|x'⟩ ∈ B_2, we have
|(x | x')| = 1/√(2^m),
as is easily verified. Thus, the entropic uncertainty relation gives
That is, the sum of the uncertainties of observing an m-qubit state in the bases
B_1 and B_2 is at least m bits! On the other hand, the uncertainty
of a single observation naturally cannot be more than m bits: an observation
with respect to any basis will give us a sequence x = x_1 ... x_m ∈ {0, 1}^m with
some probability p_x such that Σ_{x∈F_2^m} p_x = 1. The Shannon entropy is
given by
-Σ_{x∈F_2^m} p_x log p_x,
which achieves its maximum at p_x = 1/2^m for each x ∈ {0, 1}^m. Thus, the
maximum uncertainty is m bits.
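For the computational and Hadamard bases, the bound can be tested on random states; with m qubits, c = 2^{-m/2}, so the two base-2 entropies must sum to at least m (a NumPy sketch of ours, not from the text):

```python
import numpy as np

m = 2
H1 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
Hm = np.kron(H1, H1)                       # change to the 2-qubit Hadamard basis

rng = np.random.default_rng(3)
x = rng.normal(size=2 ** m) + 1j * rng.normal(size=2 ** m)
x /= np.linalg.norm(x)                     # a random 2-qubit state

def entropy(probs):
    probs = probs[probs > 1e-15]           # ignore vanishing outcomes
    return -np.sum(probs * np.log2(probs))

P = np.abs(x) ** 2                         # computational-basis probabilities
Q = np.abs(Hm @ x) ** 2                    # Hadamard-basis probabilities
assert entropy(P) + entropy(Q) >= m - 1e-9
```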
8.4 Quantum States as Operators 139
Inspired by the word "bracket", we will call the row vectors ⟨x| bra-vectors
and the column vectors |y⟩ ket-vectors.¹⁵
Remark 8.4.1. A coordinate-independent notion of dual vectors can be
achieved by using the theorem of Fréchet and F. Riesz. Any vector x ∈ H_n
defines a linear function f_x : H_n → ℂ by f_x(y) = (x | y).
define the density matrix belonging to x by
(8.32)
The spectral representation (8.32) in fact tells us that each state (density
matrix) is a linear combination of one-dimensional projections, which are
alternative notations for quantum states defined as unit-length
vectors. We will call the states corresponding to one-dimensional projections
vector states or pure states. States that are not pure are referred to as mixed.
More information about density matrices can be easily obtained: using
the eigenvector basis, it is trivial to see that λ_1 + ... + λ_n = Tr(ρ) = 1. Since
ρ is positive, we also see that λ_i = (x_i | ρx_i) ≥ 0. Thus, we can say that each
but the converse is not so clear, for there is one unfortunate feature in the
spectral decomposition (8.32). Recall that (8.32) is unique if and only if ρ
has distinct eigenvalues. For instance, if λ_1 = λ_2 = λ, then it is easy to verify
that we can always replace
with
where
x_1' = x_1 cos a - x_2 sin a,
x_2' = x_1 sin a + x_2 cos a,
and a is a real number (Exercise 5). Therefore, we cannot generally say,
without additional information, that a density matrix represents a probability
distribution of orthogonal state vectors.
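The ambiguity is easy to exhibit numerically: for equal eigenvalues λ_1 = λ_2 = λ, the rotated pair x_1', x_2' yields exactly the same density matrix (a NumPy sketch of ours, with an arbitrary angle a):

```python
import numpy as np

x1 = np.array([1.0, 0.0])
x2 = np.array([0.0, 1.0])
lam, a = 0.5, 0.3                          # equal eigenvalues, arbitrary angle

x1p = np.cos(a) * x1 - np.sin(a) * x2
x2p = np.sin(a) * x1 + np.cos(a) * x2

rho = lam * np.outer(x1, x1) + lam * np.outer(x2, x2)
rho_p = lam * np.outer(x1p, x1p) + lam * np.outer(x2p, x2p)
assert np.allclose(rho, rho_p)             # two different mixtures, one state
```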
We describe the time evolution of the generalized states briefly at first.
Afterwards, in Section 8.4.4, we will study the issue of describing the time
evolution in more detail. Here we merely use decomposition (8.32) straight-
forwardly: for a pure state |x⟩⟨x| there is a unitary mapping U(t) such
that
(see Exercise 6). By extending this linearly to mixed states ρ, we have
ρ(t) = U(t) ρ(0) U(t)*,
where U(t) is the time-evolution mapping of the system. By using the representa-
tion U(t) = e^{-itH}, it is quite easy to verify (Exercise 7) that the Schrödinger
equation takes the form
In this section, we will demonstrate how to fit the concept of observables into
general quantum states. For that purpose, we will first regard an observable
as a self-adjoint operator A with spectral representation
where each E(θ_i) = |x_i⟩⟨x_i| is a projection such that the vectors x_1, ..., x_n
form an orthonormal basis of H_n. If the quantum system is in a state x (pure
state |x⟩⟨x|), we can write x = c_1 x_1 + ... + c_n x_n, where c_i = (x_i | x). Thus,
the probability of observing a label θ_i is
(x | E(θ_i)x) = ( Σ_{j=1}^n (x_j | x) x_j | E(θ_i)x )
= Σ_{j=1}^n (x_j | E(θ_i) (x | x_j) x)
= Σ_{j=1}^n (x_j | E(θ_i) |x⟩⟨x| x_j) = Tr(E(θ_i) |x⟩⟨x|).   (8.34)
Equation (8.34) will be the basis for handling observations on mixed
states: any state T can be expressed as a convex combination
(8.35)
of pure states |x_i⟩⟨x_i|, and we use the linearity of the trace to join the concept
of observables together with the notion of states as self-adjoint mappings. Let
T be a state of a quantum system and E an observable regarded
as a projection-valued measure. The probability that a real number (label) in
set X is observed is Tr(E(X)T) = Tr(TE(X)). Notice that, since Tr(T) = 1
and E(X) is a projection, we always have 0 ≤ Tr(TE(X)) ≤ 1. This can
be immediately seen by using the linearity of the trace and the spectral
representation (8.35) for T:
Tr(TE(X)) = Σ_{i=1}^n λ_i Tr(|x_i⟩⟨x_i| E(X)) = Σ_{i=1}^n λ_i (x_i | E(X)x_i).

The inequalities 0 ≤ Tr(TE(X)) ≤ 1 now directly follow from the facts that
0 ≤ (x_i | E(X)x_i) ≤ 1, λ_i ≥ 0, and Σ_{i=1}^n λ_i = 1.
Example 8.4.1. Let E = {E_1, ..., E_m} be an observable (a collection of mu-
tually orthogonal subspaces), as in Definition 8.3.1, and P_i the projection
onto E_i. If, moreover, θ_1, ..., θ_m ∈ ℝ are the values associated with the
subspaces, we can define a projection-valued measure E by E(θ_i) = P_i. Then
A = θ_1 P_1 + ... + θ_m P_m = θ_1 E(θ_1) + ... + θ_m E(θ_m), and the probability
that the measurement of observable E will give θ_i as an outcome is given by
Tr(TP_i) = Tr(TE(θ_i)).
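A small numerical instance of the rule Tr(TP_i): mix |0⟩⟨0| and |+⟩⟨+| with weights 3/4 and 1/4, and measure the projection onto |1⟩ (a NumPy sketch; the particular state is our illustration, not the book's):

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
plus = (ket0 + ket1) / np.sqrt(2)

# T = 3/4 |0><0| + 1/4 |+><+| ; measure the projection onto |1>
T = 0.75 * np.outer(ket0, ket0) + 0.25 * np.outer(plus, plus)
P1 = np.outer(ket1, ket1)

prob = np.trace(T @ P1).real
assert abs(prob - 0.125) < 1e-12           # 3/4 * 0 + 1/4 * 1/2
```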
μ(P) = Tr(TP).
The proof of Gleason's theorem can be found in [65], but no simple proof of
the theorem is known.
We will demonstrate here how an analogous statement can be reached in
the finite-dimensional case if, instead of projections, all self-adjoint mappings
are allowed. The proof of the theorem below is from [61].
μ(S) = Tr(TS)
Proof. Recalling that Re(c) = (1/2)(c + c*) and Im(c) = (1/(2i))(c - c*), the statement
of the lemma follows by direct calculation. □
S = Σ_{r=1}^n Σ_{s=1}^n (x_r | Sx_s) |x_r⟩⟨x_s|
= Σ_{r=1}^n (x_r | Sx_r) |x_r⟩⟨x_r| + Σ_{r<s} (x_r | Sx_s) |x_r⟩⟨x_s| + Σ_{r>s} (x_r | Sx_s) |x_r⟩⟨x_s|;
changing the roles of r and s in the last term, we have that
S = Σ_{r=1}^n (x_r | Sx_r) |x_r⟩⟨x_r| + Σ_{r<s} (x_r | Sx_s) |x_r⟩⟨x_s| + Σ_{r<s} (x_s | Sx_r) |x_s⟩⟨x_r|
= Σ_{r=1}^n (x_r | Sx_r) |x_r⟩⟨x_r|
+ Σ_{r<s} ( (x_r | Sx_s) |x_r⟩⟨x_s| + (x_r | Sx_s)^* |x_s⟩⟨x_r| ).   (8.36)
By applying Lemma 8.4.1 to (8.36), this can be rewritten as
S = Σ_{r=1}^n (x_r | Sx_r) |x_r⟩⟨x_r|
+ Σ_{r<s} Re(x_r | Sx_s) ( |x_r⟩⟨x_s| + |x_s⟩⟨x_r| )
+ Σ_{r<s} Im(x_r | Sx_s) ( i|x_r⟩⟨x_s| - i|x_s⟩⟨x_r| ).
By denoting A_r = |x_r⟩⟨x_r|, B_rs = |x_r⟩⟨x_s| + |x_s⟩⟨x_r|, and C_rs = i|x_r⟩⟨x_s|
- i|x_s⟩⟨x_r| (the mappings A_r are clearly self-adjoint, and by Lemma 8.4.2 all
the mappings B_rs and C_rs are also self-adjoint), the above equality becomes

S = Σ_{r=1}^n (x_r | Sx_r) A_r + Σ_{r<s} Re(x_r | Sx_s) B_rs + Σ_{r<s} Im(x_r | Sx_s) C_rs.
Because all the coefficients in the above sums are real, and the mappings
are self-adjoint, the assumption implies that

μ(S) = Σ_{r=1}^n (x_r | Sx_r) μ(A_r) + Σ_{r<s} Re(x_r | Sx_s) μ(B_rs) + Σ_{r<s} Im(x_r | Sx_s) μ(C_rs).

This can be written as
Σ_{r=1}^n Σ_{s=1}^n (x_s | Tx_r)(x_r | Sx_s)
= Σ_{s=1}^n (x_s | T ( Σ_{r=1}^n |x_r⟩⟨x_r| ) Sx_s)
= Σ_{s=1}^n (x_s | TSx_s) = Tr(TS).
To finish the proof, we have to show that T is a state, that is, a self-adjoint,
positive, unit-trace mapping.
We have already noticed that T is self-adjoint. The trace of T can be
easily found by summing the diagonal elements:

Tr(T) = Σ_{r=1}^n T_rr = Σ_{r=1}^n μ(A_r) = μ( Σ_{r=1}^n A_r ) = μ(I) = 1.
For the positivity of T, we notice that, if x ∈ H_n has unit length, then
The second-to-last equality follows from the fact that it is always possible to
choose an orthonormal basis containing x. Therefore,
Remark 8.4.3. Paul Busch introduced a result [22] which states that proba-
bility measures defined on effects¹⁶ can be expressed analogously to Gleason's
theorem. The result of Busch is based on extending the measure μ, originally
defined on effects only, into a linear functional defined on all self-adjoint op-
erators. The previous theorem can then be applied.
16 An effect E is an operator in L_s(H_n) satisfying 0 ≤ E ≤ I.
Now we will study the description of compound systems in more detail. Let
H_n and H_m be the state spaces of two distinguishable quantum systems.
As discussed before, the state space of the compound system is the tensor
product H_n ⊗ H_m. A state of the compound system is a self-adjoint mapping
on H_n ⊗ H_m. Such a state can be composed of the states of the subsystems in the
following way: if T_1 and T_2 are states (also viewed as self-adjoint mappings)
of H_n and H_m, then the tensor product T_1 ⊗ T_2 defines a self-adjoint mapping
on H_n ⊗ H_m, as is easily seen (T_1 ⊗ T_2 is defined by
for the basis states and extended to H_n ⊗ H_m by using linearity requirements).
One can also easily verify that the matrix of T_1 ⊗ T_2 is the tensor product
of the matrices of T_1 and T_2. As well, observables of the subsystems make up an
observable of the compound system: if A_1 and A_2 are observables of the subsystems
(now seen as unit-trace positive self-adjoint mappings), then A_1 ⊗ A_2 is a
positive, unit-trace self-adjoint mapping on H_n ⊗ H_m.
The identity mapping I : H → H is also a positive, unit-trace self-adjoint
mapping and, therefore, defines an observable. However, this observable is
most uninformative in the following sense: the corresponding projection-
valued measure is defined by
E_I(X) = I if 1 ∈ X, and E_I(X) = 0 otherwise,
so the probability distribution associated with E_I is trivial: Tr(TE_I(X)) is 1
if 1 ∈ X, and 0 otherwise. Notice also that I has only one eigenvalue, 1, but all the
vectors of H are eigenvectors of I. This strongly reflects on the corresponding
decomposition of H into orthogonal subspaces: the decomposition has only
one component, H itself. This corresponds to the idea that, by measuring
the observable I, we are not observing any nontrivial property. In fact, if T_1 and
T_2 are the states of H_n and H_m and A_1 and A_2 are some observables, then
it is easy to see that
Tr((T_1 ⊗ T_2)(A_1 ⊗ I)) = Tr(T_1 A_1),
Tr((T_1 ⊗ T_2)(I ⊗ A_2)) = Tr(T_2 A_2).
That is, observing the compound observable A₁ ⊗ I (resp., I ⊗ A₂) corresponds to observing only the first (resp., second) system. Based on this idea, we will define the substates of a compound system.
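The two trace identities above can be verified numerically. The sketch below is an illustration (the random-state helper and the chosen observable are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_state(n):
    # A random density matrix: positive, self-adjoint, unit trace.
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = a @ a.conj().T
    return rho / np.trace(rho)

T1, T2 = random_state(2), random_state(3)
A1 = np.diag([1.0, -1.0])   # an arbitrary observable on the first subsystem

# Tr((T1 (x) T2)(A1 (x) I)) = Tr(T1 A1): measuring A1 (x) I ignores the second system.
lhs = np.trace(np.kron(T1, T2) @ np.kron(A1, np.eye(3)))
rhs = np.trace(T1 @ A1)
print(np.isclose(lhs, rhs))   # True
```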
which gives

T₁ = Σ_{i=1}^{n} Σ_{j=1}^{n} Σ_{k=1}^{m} ⟨xᵢ ⊗ yₖ | T(xⱼ ⊗ yₖ)⟩ |xᵢ⟩⟨xⱼ|     (8.44)
   = T₁′.
Tr(T₁A) = Tr(T₁′A) for each self-adjoint A. By choosing A as a projection |x⟩⟨x|, we have that ⟨x | (T₁ − T₁′)x⟩ = 0 for any unit-length x. The mapping T₁ − T₁′ is also self-adjoint, so there is an orthonormal basis of H_n consisting of eigenvectors of T₁ − T₁′. If all the eigenvalues of T₁ − T₁′ are 0, then T₁ − T₁′ = 0, and the proof is complete. In the opposite case, there is a nonzero eigenvalue λ₁ of T₁ − T₁′ and a unit-length eigenvector x₁ belonging to λ₁. Then ⟨x₁ | (T₁ − T₁′)x₁⟩ = λ₁ ≠ 0, a contradiction. □
Let us write T₁ = Tr_{H_m}(T) for the state obtained by tracing over H_m and, similarly, T₂ = Tr_{H_n}(T) for the state that we get by tracing over H_n. By collecting all the facts in the previous proof, we see that

T₁ = Σ_{i=1}^{n} Σ_{j=1}^{n} Σ_{k=1}^{m} ⟨xᵢ ⊗ yₖ | T(xⱼ ⊗ yₖ)⟩ |xᵢ⟩⟨xⱼ|.     (8.45)

Similarly,

T₂ = Σ_{i=1}^{m} Σ_{j=1}^{m} Σ_{k=1}^{n} ⟨xₖ ⊗ yᵢ | T(xₖ ⊗ yⱼ)⟩ |yᵢ⟩⟨yⱼ|.     (8.46)

It is easy to see that the tracing-over operation is linear: Tr_{H_n}(αS + βT) = α Tr_{H_n}(S) + β Tr_{H_n}(T).
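Formulas (8.45) and (8.46) translate directly into code. The sketch below (function names are ours) computes both partial traces and checks them on a product state, where tracing over must return the factors:

```python
import numpy as np

def trace_over_second(T, n, m):
    # (T1)_{ij} = sum_k <x_i (x) y_k | T (x_j (x) y_k)>, cf. (8.45)
    return np.einsum('ikjk->ij', T.reshape(n, m, n, m))

def trace_over_first(T, n, m):
    # (T2)_{ij} = sum_k <x_k (x) y_i | T (x_k (x) y_j)>, cf. (8.46)
    return np.einsum('kikj->ij', T.reshape(n, m, n, m))

T1 = np.diag([0.25, 0.75])
T2 = np.array([[0.5, 0.5], [0.5, 0.5]])
T = np.kron(T1, T2)
print(np.allclose(trace_over_second(T, 2, 2), T1))  # True
print(np.allclose(trace_over_first(T, 2, 2), T2))   # True
```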
Example 8.4.2. Let the notation be as in the previous lemma. If S is a state of the compound system H_n ⊗ H_m, then S is, in particular, a linear mapping H_n ⊗ H_m → H_n ⊗ H_m. Therefore, S can be uniquely represented as

S = Σ_{r=1}^{n} Σ_{t=1}^{m} Σ_{s=1}^{n} Σ_{u=1}^{m} s_{rstu} |x_r ⊗ y_t⟩⟨x_s ⊗ y_u|.     (8.47)

It is easy to verify that |x_r ⊗ y_t⟩⟨x_s ⊗ y_u| = |x_r⟩⟨x_s| ⊗ |y_t⟩⟨y_u|, so we can write

S = Σ_{r=1}^{n} Σ_{s=1}^{n} |x_r⟩⟨x_s| ⊗ Σ_{t=1}^{m} Σ_{u=1}^{m} s_{rstu} |y_t⟩⟨y_u|
  = Σ_{r=1}^{n} Σ_{s=1}^{n} |x_r⟩⟨x_s| ⊗ S_{rs},     (8.48)

where each S_{rs} ∈ L(H_m) (recall that the notation L(H_m) stands for the linear mappings H_m → H_m). It is worth noticing that the latest expression for S is actually a decomposition of an nm × nm matrix (8.47) into an n × n block matrix having the m × m matrices S_{rs} as entries.
Substituting (8.48) in (8.45), we get, after some calculation, that

T₁ = Σ_{r=1}^{n} Σ_{s=1}^{n} Tr(S_{rs}) |x_r⟩⟨x_s|.

For the pure state determined by the unit-length vector y = (x₁ ⊗ y₁ + x₂ ⊗ y₂)/√2, the density matrix is

                ( 1 0 0 1 )
M_{|y⟩⟨y|} = ½  ( 0 0 0 0 )
                ( 0 0 0 0 )
                ( 1 0 0 1 )

and the density matrices corresponding to the subsystem states S′ are

M_{S′} = ½ ( 1 0 )
           ( 0 1 ),

which agrees with the previous example. On the other hand, if we try to reconstruct the state of H₂ ⊗ H₂ from the states S′, we find out that

                  ( 1 0 0 0 )
M_{S′} ⊗ M_{S′} = ¼ ( 0 1 0 0 )
                  ( 0 0 1 0 )
                  ( 0 0 0 1 ),

which is different from M_{|y⟩⟨y|}. This demonstrates that the subsystem states S′ obtained by tracing over are not enough to determine the state of the compound system.
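The failure just observed can be reproduced numerically for the state y = (|00⟩ + |11⟩)/√2. This is an illustration, not part of the original text:

```python
import numpy as np

y = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)   # (|00> + |11>)/sqrt(2)
M = np.outer(y, y.conj())                                # density matrix |y><y|

# Partial trace over the second factor, as in (8.45).
S1 = np.einsum('ikjk->ij', M.reshape(2, 2, 2, 2))

print(np.allclose(S1, np.eye(2) / 2))     # True: each substate is I/2
print(np.allclose(np.kron(S1, S1), M))    # False: the correlations are lost
```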
Even though the state of the compound system perfectly determines the
subsystem states, there is no way to obtain the whole system state from the
partial systems without additional information.
Example 8.4.4. Let the notation be as before. If z ∈ H_n ⊗ H_m is a unit-length vector, then |z⟩⟨z| corresponds to a pure state of the system H_n ⊗ H_m. For each pair (x, z) ∈ H_n × (H_n ⊗ H_m), there exists a unique vector y ∈ H_m such that

⟨y′ | y⟩ = ⟨x ⊗ y′ | z⟩

for each y′ ∈ H_m. In fact, if y₁ and y₂ were two such vectors, then ⟨y′ | y₁ − y₂⟩ = 0 for each y′ ∈ H_m, and y₁ = y₂ follows immediately. On the other hand,

y = Σ_{k=1}^{m} ⟨x ⊗ yₖ | z⟩ yₖ

clearly satisfies the required condition. We define a mapping (·,·) : H_n × (H_n ⊗ H_m) → H_m by

(x, z) = Σ_{k=1}^{m} ⟨x ⊗ yₖ | z⟩ yₖ.

Clearly (·,·) has properties that resemble the inner product, such as linearity with respect to the second component and antilinearity¹⁷ with respect to the first component.
Now, substituting S = |z⟩⟨z| in (8.45), we see that

S₂ = Σ_{k=1}^{n} |(xₖ, z)⟩⟨(xₖ, z)|.     (8.49)

Comparing (8.49) with the equation

Tr(|z⟩⟨z|) = Σ_{k=1}^{n} ⟨xₖ | |z⟩⟨z| xₖ⟩ = Σ_{k=1}^{n} ⟨xₖ | z⟩⟨z | xₖ⟩

somehow explains the name "tracing over". Notice also that if
z = Σ_{i=1}^{n} Σ_{j=1}^{m} c_{ij} xᵢ ⊗ bⱼ,

then we can write

z = Σ_{i=1}^{n} xᵢ ⊗ yᵢ′,     (8.50)

where

yᵢ′ = Σ_{j=1}^{m} c_{ij} bⱼ.

On the other hand, we can write

z = Σ_{j=1}^{m} xⱼ′ ⊗ bⱼ,     (8.51)

where

xⱼ′ = Σ_{i=1}^{n} c_{ij} xᵢ.
which means that y₁, ..., yₙ are eigenvectors of T₂ belonging to the eigenvalues λ₁, ..., λₙ. □

c_{ij} = λᵢ, if i = j, and c_{ij} = 0 otherwise.
Remark 8.4.4. The above lemma states that any state of a quantum system can be interpreted as a result of tracing over a pure state of a larger system. If T is a mixed state, a pure state |z⟩⟨z| such that T = Tr_{H_m}(|z⟩⟨z|) is called a purification of T.

(8.55)

for each i. □
The proof of the following theorem, the first structure theorem, is due to [25].
W = Σ_{r=1}^{n} Σ_{s=1}^{n} |y_r⟩⟨y_s| ⊗ A_{rs}

is positive, then also

⟨z | (I_m ⊗ V)(W) z⟩ = Σ_{i=1}^{n²} ⟨zᵢ | W zᵢ⟩ ≥ 0,

because W is positive.
For the other direction, let V be a completely positive mapping. Then, in particular, I_n ⊗ V should be positive. For clarity, we will denote the basis of one copy of H_n by {y₁, ..., yₙ} and that of the other copy by {x₁, ..., xₙ}. The mapping W ∈ L(H_n ⊗ H_n), defined by
W = Σ_{r=1}^{n} Σ_{s=1}^{n} |y_r⟩⟨y_s| ⊗ |x_r⟩⟨x_s|
  = Σ_{r=1}^{n} Σ_{s=1}^{n} |y_r ⊗ x_r⟩⟨y_s ⊗ x_s|

If

v = Σ_{j=1}^{n} Σ_{i=1}^{n} v_{ij} yⱼ ⊗ xᵢ

is a vector in H_n ⊗ H_n, we associate with v a mapping V_v ∈ L(H_n) by

V(A) = Σ_{i=1}^{n²} Vᵢ A Vᵢ*

holds for each A ∈ L(H_n). □
Another quite useful criterion for complete positivity can be gathered from the proof of the previous theorem:

Theorem 8.4.6. A linear mapping V : L(H_n) → L(H_n) is completely positive if and only if

T = Σ_{r=1}^{n} Σ_{s=1}^{n} |y_r⟩⟨y_s| ⊗ V(|x_r⟩⟨x_s|)

is a positive mapping H_n ⊗ H_n → H_n ⊗ H_n.

Proof. If V is a completely positive mapping, then T is positive by the very definition, since

Σ_{r=1}^{n} Σ_{s=1}^{n} |y_r⟩⟨y_s| ⊗ |x_r⟩⟨x_s|

is positive. On the other hand, if T is positive, then a representation

V(A) = Σ_{i=1}^{n²} Vᵢ A Vᵢ*

can be found as in the proof of the previous theorem. □
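Theorem 8.4.6 gives a finite test for complete positivity. As a sketch (the function names are ours), one can build the matrix T for a given mapping V and inspect its eigenvalues; the transpose map is the standard example of a positive mapping that fails the test:

```python
import numpy as np

def criterion_matrix(V, n):
    # T = sum_{r,s} |y_r><y_s| (x) V(|x_r><x_s|), as in Theorem 8.4.6.
    E = np.eye(n)
    blocks = [[V(np.outer(E[:, r], E[:, s])) for s in range(n)] for r in range(n)]
    return np.block(blocks)

transpose = lambda A: A.T          # positive, but not completely positive
T = criterion_matrix(transpose, 2)
eigs = np.linalg.eigvalsh(T)
print(sorted(eigs.round(6)))       # [-1.0, 1.0, 1.0, 1.0]: T is not positive
```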
Recall that we also required that V should preserve the trace.

Lemma 8.4.6. Let V : L(H_n) → L(H_n) be a completely positive mapping represented as V(A) = Σ_{i=1}^{n²} Vᵢ A Vᵢ*. Then V preserves the trace if and only if Σ_{i=1}^{n²} Vᵢ*Vᵢ = I.

Proof. Since

I = Σ_{r=1}^{n} |x_r⟩⟨x_r|,

we have

Tr(V(|x_l⟩⟨x_k|)) = Σ_{i=1}^{n²} Σ_{r=1}^{n} ⟨x_r | Vᵢ x_l⟩⟨x_k | Vᵢ* x_r⟩ = Σ_{i=1}^{n²} ⟨x_k | Vᵢ*Vᵢ x_l⟩.

If Σ_{i=1}^{n²} Vᵢ*Vᵢ = I, the last expression equals ⟨x_k | x_l⟩ = Tr(|x_l⟩⟨x_k|); i.e., V preserves the trace of the mappings |x_l⟩⟨x_k|. Since V and the trace are linear and the mappings |x_l⟩⟨x_k| generate the whole space L(H_n), V preserves all traces. On the other hand, if V preserves all traces, then Σ_{i=1}^{n²} Vᵢ*Vᵢ = I, since any linear mapping A ∈ L(H_n) is determined by the values ⟨x_k | A x_l⟩. □
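Lemma 8.4.6 can be checked on a concrete set of operators Vᵢ; the phase-flip channel below is a standard example (our choice, not from the text):

```python
import numpy as np

p = 0.3                                   # an arbitrary flip probability
V1 = np.sqrt(1 - p) * np.eye(2)
V2 = np.sqrt(p) * np.diag([1.0, -1.0])

# The condition sum_i Vi* Vi = I of Lemma 8.4.6 holds ...
print(np.allclose(V1.conj().T @ V1 + V2.conj().T @ V2, np.eye(2)))  # True

# ... and, accordingly, V preserves the trace of an arbitrary state.
rho = np.array([[0.6, 0.2], [0.2, 0.4]])
out = V1 @ rho @ V1.conj().T + V2 @ rho @ V2.conj().T
print(np.isclose(np.trace(out), np.trace(rho)))  # True
```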
can be rewritten as
Proof. Assume first that V(A) = TrH n2(U(A®B)U*). Let {Xl,"" Xn } and
{Yb ... , Yn 2 } be orthonormal bases of H n and H n2, respectively. We write
U* in the form
n n
U* = LL Ixr)(xsl ®U:s
r=ls=l
and define Vk E L(Hn ) as
n n
Vk = LL(UjiYk 1 b) IXi)(xjl·
i=1 j=1
Now, if
n n
A = LLaij IXi)(Xj I,
i=1 j=l
a direct calculation gives
n n n n
VkAVk* = L L L L ( xr 1 AXt)(U:iYk IIb)(bl Ut'jYk) IXi)(xjl· (8.61)
i=1 j=l r=l t=1
On the other hand, (8.45) gives that
Tr Hn2 (U(A ® B)U*)
k=1i=1j=1r=1t=1
n2
TrHn 2(U(A®B)U*) = LVkAVk*.
k=1
V(A) = Σ_{i=1}^{n²} Vᵢ A Vᵢ*,

Σ_{i=1}^{n²} Vᵢ Vᵢ* = I.

= Σ_{i=1}^{n²} ⟨x₁ | Vᵢ*Vᵢ x₂⟩ = ⟨x₁ | x₂⟩
Denoting

H_n ⊗ b = {x ⊗ b | x ∈ H_n} ⊆ H_n ⊗ H_{n²},

we see that U : H_n ⊗ b → H_n ⊗ H_{n²} is a linear mapping that preserves inner products. It follows that U can be extended (usually, in many ways) to a unitary mapping U′ ∈ L(H_n ⊗ H_{n²}). We fix one extension and denote it again by U. If x is any unit-length vector, then

U(|x⟩⟨x| ⊗ |b⟩⟨b|)U* = U|x ⊗ b⟩⟨x ⊗ b|U* = |U(x ⊗ b)⟩⟨U(x ⊗ b)|.

Using Example 8.4.4, it is easy to see that

Tr_{H_{n²}}(|U(x ⊗ b)⟩⟨U(x ⊗ b)|) = V(|x⟩⟨x|).

Hence, with U defined as in (8.62) and B = |b⟩⟨b|, the claim holds at least for all pure states |x⟩⟨x|. Since the linear mappings

A ↦ Tr_{H_{n²}}(U(A ⊗ B)U*)

and A ↦ V(A) agree on all pure states, and all states can be expressed as convex combinations of pure states, the mappings must be equal. □
Remark 8.4.8. The above theorem can be interpreted as follows: There exists a physical operation that produces copies of pure states |x₁⟩, ..., |xₖ⟩ (with extra information p₁, ..., pₖ such that pᵢ belongs to state |xᵢ⟩) if and only if there exists a physical operation that produces state |xᵢ⟩ from ancillary information pᵢ. Choosing p₁ = ... = pₖ = p, we get the classical no-cloning theorem of Wootters and Zurek (Theorem 2.3.1).
(8.65)

for each i, j ∈ {1, ..., k}. Now, by Lemma 8.2.8, we know that there is a unitary mapping U′ ∈ L(H_n ⊗ H_m ⊗ H_m ⊗ H_e) such that

U′(0 ⊗ yᵢ ⊗ e) = xᵢ ⊗ zᵢ

for each i ∈ {1, ..., k}. By tracing over H_e and the other copy of H_m, we get the desired mapping V′. □
which is essentially only a swap between the environment state and the second copy of the state xᵢ. In this case, the state xᵢ remains in the environment, and it can be recovered perfectly.
On the other hand, if the projection postulate is adopted, then deletion is possible, as pointed out in [48]. For deletion, it is then enough to observe the second copy of the state |xᵢ⟩|xᵢ⟩ and then to swap the resulting state into |0⟩.
Theorem 8.4.9 (No-Deleting Principle). Assume that x₁, ..., xₖ ∈ H_n and 0 ∈ H_n are as above. If there exists a completely positive mapping erasing the second copy of xᵢ, i.e., a mapping for which

Proof. Assume that there is a completely positive mapping for which the assumption holds. By Theorem 8.4.7, there is a space H_e, vectors e, e₁, ..., eₖ ∈ H_n, and a unitary U ∈ L(H_n ⊗ H_n ⊗ H_e) that satisfies

or as

(8.66)

Lemma 8.2.8 states that there is a unitary mapping U′ ∈ L(H_n ⊗ H_e) such that

for each i ∈ {1, ..., k}. As a unitary mapping, U′ also has an inverse, and this implies that xᵢ can be recovered from the environment. □
8.5 Exercises
holds. Conclude that if ‖Ax‖ = ‖x‖ for any x ∈ H_n, then also ⟨Ax | Ay⟩ = ⟨x | y⟩ for each x, y ∈ H_n.
5. Prove that

where

x₁′ = x₁ cos α − x₂ sin α,
x₂′ = x₁ sin α + x₂ cos α,

and α is any real number.
6. Prove that |Ax⟩⟨By| = A|x⟩⟨y|B* for each x, y ∈ H_n and any operators A, B : H_n → H_n.
7. Derive the generalized Schrödinger equation

i (d/dt) ρ(t) = [H, ρ(t)]

from the equation

ρ(t) = U(t)ρ(0)U(t)*.
The purpose of this chapter is to introduce the reader to the basic mathematical notions used in this book.
A group G is a set equipped with a mapping G × G → G, i.e., with a rule that unambiguously describes how to create one element of G out of an ordered pair of given ones. This operation is frequently called the multiplication or the addition and is denoted by g = g₁g₂ or g = g₁ + g₂, respectively. Moreover, there is a special required element in G, which is called the unit element or the neutral element; it is usually denoted by 1 in the multiplicative notation and by 0 in the additive notation.
Furthermore, there is one more required operation, inversion, that sends any element g into its inverse element g⁻¹ (resp. opposite element −g when additive notations are used). Finally, the group operations are required to obey the following group axioms (using multiplicative notations):
1. For all elements g₁, g₂, and g₃, g₁(g₂g₃) = (g₁g₂)g₃.
2. For any element g, g1 = 1g = g.
3. For any element g, gg⁻¹ = g⁻¹g = 1.
If a group G also satisfies
4. For all elements g₁ and g₂, g₁g₂ = g₂g₁,
then G is called an abelian group or a commutative group.
To be precise, instead of speaking about group G, we should say that (G, ·, ⁻¹, 1) is a group. Here G is the set of group elements; · stands for the multiplication, ⁻¹ stands for the inversion, and 1 is the unit element. However, if there is no danger of confusion, we will just use the notation G for the group, as well as for the underlying set.
Example 9.1.1. The integers ℤ form an abelian group having addition as the group operation, 0 as the neutral element, and the mapping n ↦ −n as the inversion. On the other hand, the natural numbers

N = {1, 2, 3, ...}

do not form a group with respect to these operations, since N is not closed under the inversion n ↦ −n, and there is no neutral element in N.
Example 9.1.2. The nonzero complex numbers form an abelian group with respect to multiplication. The neutral element is 1, and the inversion is the mapping z ↦ z⁻¹.
Example 9.1.3. The set of n × n matrices over ℂ that have a nonzero determinant constitutes a group with respect to matrix multiplication. This group, denoted by GLₙ(ℂ) and called the general linear group over ℂ, is not abelian unless n = 1, when the group is essentially the same as in the previous example.
Owing to group axiom 1, it makes sense to omit the parentheses and write just g₁g₂g₃ = g₁(g₂g₃) = (g₁g₂)g₃. This clearly generalizes to products of more than three elements. A special case is a product g···g (k times), which we will denote by gᵏ (in the additive notations it would be kg). We also define g⁰ = 1 and g⁻ᵏ = (g⁻¹)ᵏ (0g = 0, −kg = k(−g) when written additively).
gH = {gh | h ∈ H}     (9.1)

is called a coset of H (determined by g).
Simple but useful observations can easily be made; the first is that each element g ∈ G belongs to some coset, for instance to gH. This is true because H, as a subgroup, contains the neutral element 1, and therefore g = g1 ∈ gH. This observation tells us that the cosets of a subgroup H cover the whole group G. Notice also that, for any h ∈ H, we have hH = H, because H, being a group, is closed under multiplication and each element h₁ ∈ H appears in hH (h₁ = h(h⁻¹h₁)).
Other observations are given in the following lemmata.

Lemma 9.1.1. g₁H = g₂H if and only if g₁⁻¹g₂ ∈ H.

¹ The unit element can be seen as a nullary operation.
Remark 9.1.1. Notice that g₁⁻¹g₂ ∈ H if and only if g₂⁻¹g₁ ∈ H. This is true because H contains all inverses of the elements in H and (g₁⁻¹g₂)⁻¹ = g₂⁻¹g₁. In the additive notations, the condition g₁⁻¹g₂ ∈ H would be written as −g₁ + g₂ ∈ H.

Proof. By the very definition (9.1), it is clear that each coset can have at most m elements. If some coset gH has fewer than m elements, then for some hᵢ ≠ hⱼ we must have ghᵢ = ghⱼ. Multiplication by g⁻¹, however, gives hᵢ = hⱼ, which is a contradiction. □
(g₁H)(g₂H) = (g₁g₂)H.     (9.2)

But, as far as we know, two distinct elements can define the same coset: g₁H = g₂H may hold even if g₁ ≠ g₂. Can the coset product ever fail to be well-defined, i.e., can the product depend on the chosen representatives g₁ and g₂? The answer is no:
Lemma 9.1.4. If H is a normal subgroup, g₁H = g₁′H and g₂H = g₂′H, then also (g₁g₂)H = (g₁′g₂′)H.

Proof. By the assumption and Lemma 9.1.1, g₁⁻¹g₁′ = h₁ ∈ H and g₂⁻¹g₂′ = h₂ ∈ H. But since H is normal, we have

(g₁g₂)⁻¹(g₁′g₂′) = g₂⁻¹g₁⁻¹g₁′g₂′ = g₂⁻¹h₁g₂′ = g₂⁻¹h₁g₂g₂⁻¹g₂′ = (g₂⁻¹h₁g₂)h₂ ∈ H.

The conclusion that (g₂⁻¹h₁g₂)h₂ ∈ H is due to the fact that g₂⁻¹h₁g₂ is in H because H is normal. Therefore, by Lemma 9.1.1, (g₁g₂)H = (g₁′g₂′)H. □
(g) = {g⁰, g¹, g², ..., g^{k−1}}.     (9.4)

Proof. Since |G| is divisible by k = ord(g), we can write |G| = k·l for some integer l, and then g^{|G|} = g^{k·l} = (gᵏ)ˡ = 1ˡ = 1. □
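In ℤₙ*, the corollary g^{|G|} = 1 becomes Euler's theorem k^{φ(n)} ≡ 1 (mod n). A quick check (an illustration, not from the text):

```python
from math import gcd

n = 15
group = [k for k in range(1, n) if gcd(k, n) == 1]   # elements of Z_15^*
order = len(group)                                    # |G| = phi(15) = 8
print(order)                                          # 8
print(all(pow(k, order, n) == 1 for k in group))      # True: g^|G| = 1
```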
Consider again the group ℤₙ. Although the group operation is the coset addition, it is also easy to see that the coset product

ℤₙ* consists of exactly the cosets k + nℤ such that gcd(k, n) = 1 (notice that this property is independent of the representative k chosen). For that purpose, we have to find an inverse for each such coset, which will be an easy task after the following lemma.
If gcd(k, n) = 1, we can use the previous lemma to find the inverse of the coset k + nℤ: let a and b be integers such that ak + bn = 1. We claim that the coset a + nℤ is the required multiplicative inverse. This is easily verified: since ak − 1 is divisible by n, we have ak + nℤ = 1 + nℤ and therefore (a + nℤ)(k + nℤ) = 1 + nℤ.
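The inverse computation described above can be sketched with the extended Euclidean algorithm (the helper function below is ours):

```python
def extended_gcd(a, b):
    # Returns (g, x, y) with a*x + b*y = g = gcd(a, b).
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y

k, n = 7, 15
g, a, b = extended_gcd(k, n)        # a*k + b*n = 1
print(g)                            # 1: the inverse exists
print((a * k) % n)                  # 1: a + nZ inverts k + nZ
```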
How many elements are there in the group ℤₙ*? Since all the cosets of nℤ can be represented as k + nℤ, where k ranges over the set {0, ..., n − 1}, we have to find out how many of those k values satisfy the extra condition gcd(k, n) = 1. The number of such values of k is denoted by φ(n) = |ℤₙ*| and is called Euler's φ-function.
Let us first consider the case where n = pᵐ is a prime power. Then, only the numbers 0p, 1p, 2p, ..., (p^{m−1} − 1)p in the set {0, 1, ..., pᵐ − 1} do not satisfy the condition gcd(k, pᵐ) = 1. Therefore, φ(pᵐ) = pᵐ − p^{m−1}. Especially, φ(p) = p − 1 for prime numbers p.
Assume now that n = n₁···n_r, where the numbers nᵢ are pairwise coprime, i.e., gcd(nᵢ, nⱼ) = 1 whenever i ≠ j. We will demonstrate that then φ(n) = φ(n₁)···φ(n_r) and, for that purpose, we present a well-known result:
Proof. Let mᵢ = n/nᵢ = n₁···n_{i−1}n_{i+1}···n_r. Then, clearly, gcd(mᵢ, nᵢ) = 1 and, according to Lemma 9.1.5, aᵢmᵢ + bᵢnᵢ = 1 for some integers aᵢ and bᵢ. Let

k = k₁a₁m₁ + ··· + k_r a_r m_r.

Then

k − kᵢ = Σ_{j≠i} kⱼaⱼmⱼ + kᵢ(aᵢmᵢ − 1)

is divisible by nᵢ, since each mⱼ with j ≠ i is divisible as well, and aᵢmᵢ − 1 is also divisible by nᵢ because aᵢmᵢ + bᵢnᵢ = 1. It follows that k + nᵢℤ = kᵢ + nᵢℤ, which proves the existence of such a coset k + nℤ. If also k′ + nᵢℤ = kᵢ + nᵢℤ for each i, then k′ − k is divisible by each nᵢ and, since the numbers nᵢ are coprime, k′ − k is also divisible by n = n₁···n_r. Hence, k′ + nℤ = k + nℤ. □
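The constructive proof above is itself an algorithm. A sketch (function name is ours; `pow(m, -1, n)` needs Python 3.8 or later):

```python
def crt(residues, moduli):
    # k = sum_i k_i * a_i * m_i (mod n), with m_i = n / n_i and
    # a_i * m_i = 1 (mod n_i), exactly as in the proof above.
    n = 1
    for ni in moduli:
        n *= ni
    k = 0
    for ki, ni in zip(residues, moduli):
        mi = n // ni
        ai = pow(mi, -1, ni)        # a_i from a_i*m_i + b_i*n_i = 1
        k += ki * ai * mi
    return k % n

k = crt([2, 3, 2], [3, 5, 7])
print(k)                     # 23
print(k % 3, k % 5, k % 7)   # 2 3 2
```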
Clearly, any coset k + nℤ ∈ ℤₙ* defines r cosets k + nᵢℤ ∈ ℤ*_{nᵢ}. But, in accordance with the Chinese Remainder Theorem, all the cosets kᵢ + nᵢℤ ∈ ℤ*_{nᵢ} are obtained in this way. To show that, we must demonstrate that, if gcd(kᵢ, nᵢ) = 1 for each i, then the k which is given by the Chinese Remainder Theorem also satisfies gcd(k, n) = 1. But this is straightforward: if gcd(k, n) > 1, then also d = gcd(k′, nᵢ) > 1 for some i and some k′ such that k = k′k″. Then, however, k′k″ + nᵢℤ = k + nᵢℤ ∉ ℤ*_{nᵢ}. It follows that

and therefore,

φ(n₁)···φ(n_r) = |ℤ*_{n₁}|···|ℤ*_{n_r}| = |ℤ*_{n₁} × ··· × ℤ*_{n_r}| = |ℤₙ*| = φ(n).
Now we are ready to count the cardinality of ℤₙ*: let n = p₁^{k₁}···p_r^{k_r} be the prime factorization of n, with pᵢ ≠ pⱼ whenever i ≠ j. Then

φ(n) = φ(p₁^{k₁})···φ(p_r^{k_r}) = (p₁^{k₁} − p₁^{k₁−1})···(p_r^{k_r} − p_r^{k_r−1}).
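The formula can be cross-checked against the definition of φ as a count (both functions below are our own sketches):

```python
from math import gcd

def phi_count(n):
    # phi(n) = number of k in {0, ..., n-1} with gcd(k, n) = 1.
    return sum(1 for k in range(n) if gcd(k, n) == 1)

def phi_formula(n):
    # phi(p1^k1 ... pr^kr) = (p1^k1 - p1^(k1-1)) ... (pr^kr - pr^(kr-1)).
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            pk = 1
            while n % p == 0:
                n //= p
                pk *= p
            result *= pk - pk // p
        p += 1
    if n > 1:
        result *= n - 1          # a remaining prime factor to the first power
    return result

print(phi_formula(360), phi_count(360))   # 96 96
```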
Proof. First notice that Im(f) is always a subgroup of H and that Ker(f) is a normal subgroup of G, so the factor group G/Ker(f) can be defined. We will denote K = Ker(f) for short and define a function F : G/K → Im(f) by F(gK) = f(g). The first thing to be verified is that F is well-defined, i.e., that the value of F does not depend on the choice of the coset representative g. But this is straightforward: if g₁K = g₂K, then by Lemma 9.1.1, g₁⁻¹g₂ ∈ K and, by the definition of K, f(g₁⁻¹g₂) = 1, which implies that f(g₁) = f(g₂) and F(g₁K) = f(g₁) = f(g₂) = F(g₂K). It is clear that F is a group morphism. The injectivity can be seen as follows: if F(g₁K) = F(g₂K), then by definition, f(g₁) = f(g₂), hence f(g₁⁻¹g₂) = 1, which means that g₁⁻¹g₂ ∈ K. But this is to say, by Lemma 9.1.1, that g₁K = g₂K. It is clear that F is surjective; hence F is an isomorphism. □
Let G and G′ be two groups. We can give the Cartesian product G × G′ a group structure by defining

It is easy to see that G × G′ becomes a group with (1, 1′) as the neutral element³ and (g, g′) ↦ (g⁻¹, g′⁻¹) as the inversion. The group G × G′ is called the (outer) direct product of G and G′. The direct product is essentially commutative, since G × G′ and G′ × G are isomorphic; an isomorphism is given by (g, g′) ↦ (g′, g). Also, (G₁ × G₂) × G₃ ≅ G₁ × (G₂ × G₃) (an isomorphism is given by ((g₁, g₂), g₃) ↦ (g₁, (g₂, g₃))), so we may as well write this product as G₁ × G₂ × G₃. This generalizes naturally to direct products of any number of groups.
If a group G has subgroups H₁ and H₂ such that G is isomorphic to H₁ × H₂, we say that G is the (inner) direct product of the subgroups H₁ and H₂. We then also write G = H₁ × H₂. Notice also that in this case, the subgroup H₁ (and also H₂) is normal. To see this, we identify, with some abuse of notation, G with H₁ × H₂ and H₁ with {(h, 1) | h ∈ H₁}. Then
as required.
The following lemma will make the notion of the factor group more natural:

Lemma 9.1.7. If G = H₁ × H₂ (inner), then G/H₁ ≅ H₂.
Example 9.1.5. Consider again the group ℤₙ*, where n = p₁^{k₁}···p_r^{k_r} is the prime decomposition of n. It can be shown that the mapping

It is easy to see that any cyclic group of order n is isomorphic to ℤₙ, the additive group of integers modulo n, so we can consider ℤₙ as a "prototype" when studying cyclic groups. For any fixed y ∈ ℤ we define a mapping χ_y : ℤ → ℂ by

χ_y(x) = e^{2πixy/n}.
Then

χ_y(x + z) = e^{2πiy(x+z)/n} = e^{2πixy/n} e^{2πizy/n} = χ_y(x)χ_y(z),

which means that each χ_y is, in fact, a character of ℤₙ. Moreover, if y and z are representatives of distinct cosets modulo n, then χ_y and χ_z are also different. Namely, if χ_y = χ_z, then especially χ_y(1) = χ_z(1), i.e., e^{2πiy/n} = e^{2πiz/n}, which implies that y = z + k·n for some integer k. But then y and z represent the same coset, a contradiction.
It is straightforward to see that χ_aχ_b = χ_{a+b}, so the characters of ℤₙ also form a cyclic group of order n, generated by χ₁, for instance. This can be summarized as follows: the character group of ℤₙ is isomorphic to ℤₙ; hence, the character group of any cyclic group is isomorphic to the group itself.
⁴ An nth root of unity is a complex number x satisfying xⁿ = 1.
The fact that the dual group of a cyclic group is isomorphic to the group itself can be generalized: a well-known theorem states that any finite abelian group G can be expressed as a direct sum⁵ (or as a direct product, if the group operation is thought of as multiplication) of cyclic groups Gᵢ:

g = g₁ + ··· + g_m,

where gᵢ ∈ Gᵢ. This allows us to define a function χ : G → ℂ \ {0} by

(9.7)
Example 9.2.2. Consider 𝔽₂ᵐ, the m-dimensional vector space over the binary field. Each element in the additive group of 𝔽₂ᵐ has order 2, so the group 𝔽₂ᵐ has a simple decomposition:

(9.8)
so the vector space V evidently has dimension n. The natural basis of V is given by e₁ = (1, 0, ..., 0), e₂ = (0, 1, ..., 0), ..., eₙ = (0, 0, ..., 1). In other words, eᵢ is a function G → ℂ defined as

eᵢ(gⱼ) = 1, if i = j, and eᵢ(gⱼ) = 0 otherwise.

The standard inner product in the space V is defined by

(f | h) = Σ_{i=1}^{n} f*(gᵢ)h(gᵢ),     (9.9)

and the inner product also induces a norm in the very natural way: ‖h‖ = √(h | h).
The basis {e₁, ..., eₙ} is clearly orthogonal with respect to the standard inner product. Another orthogonal basis is the character basis:

Lemma 9.2.2. If χᵢ and χⱼ are characters of G, then

(χᵢ | χⱼ) = 0, if i ≠ j, and (χᵢ | χⱼ) = n, if i = j.     (9.10)

Proof. For any character χ and any g ∈ G,

1 = |χ(g)|² = χ*(g)χ(g),

which implies that χ*(g) = χ(g)⁻¹ for any g ∈ G. Then

(χᵢ | χⱼ) = Σ_{k=1}^{n} χᵢ*(gₖ)χⱼ(gₖ).

If i = j, then χᵢ⁻¹χⱼ is the principal character, and the claim for i = j follows immediately. On the other hand, if i ≠ j, then χ = χᵢ⁻¹χⱼ is a nontrivial character of G, and it suffices to show that

S = Σ_{k=1}^{n} χ(gₖ) = 0.

Since the characters are orthogonal and there are n = |G| of them, they also form a basis. By scaling the characters to the unit norm, we obtain an orthonormal basis B = {β₁, ..., βₙ}, where

βᵢ = (1/√n) χᵢ.
which actually states that X*X = nI, which implies that X⁻¹ = (1/n)X*. But since any matrix commutes with its inverse, we also have XX* = nI, which can be written as

(XX*)ᵢⱼ = n, if i = j, and (XX*)ᵢⱼ = 0, if i ≠ j,

or as

Σ_{k=1}^{n} χₖ(gᵢ)χₖ*(gⱼ) = n, if i = j, and 0, if i ≠ j.     (9.11)
Equations (9.10) and (9.11) are known as the orthogonality relations of characters. Notice also that, by choosing χⱼ as the principal character in (9.10) and gⱼ as the neutral element in (9.11), we get a useful corollary:
Corollary 9.2.1.

Σ_{y∈ℤₙ} e^{2πixy/n} = n, if x = 0, and 0 otherwise,

Σ_{y∈𝔽₂ᵐ} (−1)^{x·y} = 2ᵐ, if x = 0, and 0 otherwise.
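Both sums of Corollary 9.2.1 can be verified directly for small n and m. The helpers below are our own sketch:

```python
import cmath
from itertools import product

def zn_sum(x, n):
    # sum over y in Z_n of e^{2 pi i x y / n}
    return sum(cmath.exp(2j * cmath.pi * x * y / n) for y in range(n))

def f2m_sum(x, m):
    # sum over y in F_2^m of (-1)^{x.y}
    return sum((-1) ** sum(a * b for a, b in zip(x, y))
               for y in product([0, 1], repeat=m))

print(abs(zn_sum(0, 6)))          # 6.0
print(abs(zn_sum(2, 6)) < 1e-9)   # True: the sum vanishes for x != 0
print(f2m_sum((0, 0, 0), 3))      # 8
print(f2m_sum((1, 0, 1), 3))      # 0
```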
We now have all the tools for defining the discrete Fourier transform: any element f ∈ V (recall that V is the vector space of functions G → ℂ) has a unique representation with respect to the basis B = {(1/√n)χ₁, ..., (1/√n)χₙ}:

(9.14)

(9.15)

By its very definition, it is clear that (f + h)^ = f̂ + ĥ and (cf)^ = c·f̂ for any functions f, h and any c ∈ ℂ.
and that the matrix appearing on the right-hand side of (9.16) is X*, the transpose complex conjugate of the matrix X defined as Xᵢⱼ = χⱼ(gᵢ).⁶ By multiplying (9.16) by X we get

(9.17)

(9.18)
(f̂)^(x) = (1/√(2ᵐ)) Σ_{y∈𝔽₂ᵐ} (−1)^{x·y} f̂(y) = f(x);

that is, over 𝔽₂ᵐ the Fourier transform is its own inverse.
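The transform over 𝔽₂ᵐ (the Walsh–Hadamard transform) and its self-inverse property are easy to check numerically; the helper below is our own sketch, with elements of 𝔽₂ᵐ encoded as integers:

```python
import math

def hadamard_transform(f):
    # \hat f(x) = 2^{-m/2} sum_{y in F_2^m} (-1)^{x.y} f(y),
    # with x.y computed as the parity of the bitwise AND x & y.
    N = len(f)
    dot = lambda x, y: bin(x & y).count("1")
    return [sum((-1) ** dot(x, y) * f[y] for y in range(N)) / math.sqrt(N)
            for x in range(N)]

f = [1.0, 0.0, 0.0, 0.0]          # delta function on F_2^2
F = hadamard_transform(f)
print(F)                           # [0.5, 0.5, 0.5, 0.5]
print(hadamard_transform(F))       # [1.0, 0.0, 0.0, 0.0]: self-inverse
```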
⁶ Equation (9.16) makes Parseval's identity even clearer: the matrices (1/√n)X and (1/√n)X* are unitary and, therefore, they preserve the norm in V.
We will now give an illustration of how powerfully the Fourier transform can extract information about periodicity.
Let f : G → ℂ be a function with period p ∈ G, i.e., f(g + p) = f(g) for any g ∈ G. Then

f̂(gᵢ) = (1/√n) Σ_{k=1}^{n} χᵢ*(gₖ) f(gₖ) = (1/√n) Σ_{k=1}^{n} χᵢ(gₖ + p − p)* f(gₖ + p)
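For G = ℤₙ with p dividing n, the periodicity of f forces f̂ to be supported on the multiples of n/p. A quick numerical illustration (the choice n = 12, p = 4 is ours):

```python
import cmath
import math

n, p = 12, 4
f = [1.0 if k % p == 0 else 0.0 for k in range(n)]    # f has period p

# \hat f(i) = n^{-1/2} sum_k chi_i(k)^* f(k), with chi_i(k) = e^{2 pi i i k / n}
fhat = [sum(cmath.exp(-2j * cmath.pi * i * k / n) * f[k] for k in range(n))
        / math.sqrt(n) for i in range(n)]

peaks = [i for i, v in enumerate(fhat) if abs(v) > 1e-9]
print(peaks)    # [0, 3, 6, 9]: exactly the multiples of n/p = 3
```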
9.3.1 Preliminaries

1. c₁(c₂x) = (c₁c₂)x.
2. (c₁ + c₂)x = c₁x + c₂x.
3. c(x₁ + x₂) = cx₁ + cx₂.
4. 1x = x.

Again, to be precise, we should talk about the vector space (V, +, 0, −, ℂ, ·) instead of the space V, but to simplify the notations we identify the space and the underlying set V. The elements of V are called vectors.
Example 9.3.1. Let V = ℂⁿ, the set of n-tuples over the complex numbers. The set ℂⁿ equipped with the sum

is convergent, also constitutes a vector space over ℂ. The sum and scalar product are again defined pointwise: for functions x and y, (x + y)(n) = x(n) + y(n) and (cx)(n) = cx(n). To simplify the notations, we also write xₙ = x(n). To verify that the sum of two elements stays in the space, we also have to check that the series Σ_{n=1}^{∞} |xₙ + yₙ|² is convergent whenever Σ_{n=1}^{∞} |xₙ|² and Σ_{n=1}^{∞} |yₙ|² are convergent. For this purpose, we can use the estimates

|x + y|² = |x|² + 2Re(x*y) + |y|² ≤ |x|² + 2|x||y| + |y|² ≤ 2|x|² + 2|y|²

for each summand. The vector space of this example is denoted by L₂(ℂ). Notice that the domain ℕ in the definition could be replaced with any denumerable set. If, instead of ℕ, there is a finite domain {1, ..., n}, the convergence condition would be unnecessary and the space would become essentially the same as ℂⁿ.
Definition 9.3.1. A linear combination of vectors x₁, ..., xₙ is a finite sum c₁x₁ + ··· + cₙxₙ. A set S ⊂ V is called linearly independent if c₁x₁ + ··· + cₙxₙ = 0 implies c₁ = ··· = cₙ = 0 whenever x₁, ..., xₙ ∈ S. A set that is not linearly independent is linearly dependent.

Definition 9.3.2. For any set S ⊆ V, L(S) is the set of all linear combinations of vectors in S. By its very definition, the set L(S) is a vector space contained in V. We say that L(S) is generated by S, and we also call L(S) the span of S.
Definition 9.3.3. A set B ⊂ V is a basis of V if V = L(B) and B is linearly independent.

It can be shown that all the bases have the same cardinality (see Exercise 2). This justifies the following definition.

Definition 9.3.4. The dimension dim(V) of a vector space V is the cardinality of a basis of V.

Example 9.3.3. The vectors e₁ = (1, 0, ..., 0), e₂ = (0, 1, ..., 0), ..., eₙ = (0, 0, ..., 1) form a basis of ℂⁿ. This is the so-called natural basis. Thus ℂⁿ has dimension n.

eᵢ(n) = 1, if i = n, and eᵢ(n) = 0, if i ≠ n,

for each i ∈ ℕ. It is clear that the set ℰ = {e₁, e₂, e₃, ...} is linearly independent, but it is not a basis of L₂(ℂ) in the sense of Definition 9.3.3, because there are vectors in L₂(ℂ) that cannot be expressed as a linear combination of vectors in ℰ. This is because, according to Definition 9.3.1, a linear combination is a finite sum of vectors (Exercise 4).
Definition 9.3.5. A subset W ⊆ V is a subspace of V if W is a subgroup of V that is closed under scalar multiplication.

Example 9.3.5. Consider the n-dimensional vector space ℂⁿ. Set

It is straightforward to see that W is a subspace of L₂(ℂ) and that the set ℰ of Example 9.3.4 is a basis of W.
A vector space equipped with an inner product is also called an inner product space.

Example 9.3.1. For vectors x = (x₁, ..., xₙ) and y = (y₁, ..., yₙ) in ℂⁿ the formula

(9.19)

and the series Σ_{n=1}^{∞} |xₙ|² and Σ_{n=1}^{∞} |yₙ|² converge by the very definition of L₂(ℂ).
An orthogonal set that does not contain the zero vector 0 is always linearly independent: for, if S is such a set and

0 ≤ (x | x) − (y | x)(x | y)/(y | y),

and the claim follows. □
Definition 9.3.8. A norm on a vector space is a mapping V → ℝ, x ↦ ‖x‖, such that, for all vectors x and y, and c ∈ ℂ, we have
1. ‖x‖ ≥ 0, and ‖x‖ = 0 if and only if x = 0.
2. ‖cx‖ = |c|‖x‖.
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖.

A vector space equipped with a norm is also called a normed space. Any inner product induces a norm by

‖x‖ = √(x | x),

so an inner product space is always a normed space as well. To verify that √(x | x) is a norm, it is easy to check that conditions 1 and 2 are fulfilled; for condition 3, we use the Cauchy–Schwarz inequality, which can also be stated as |(x | y)| ≤ ‖x‖‖y‖. Thus,

‖x + y‖² = (x + y | x + y) = ‖x‖² + 2Re(x | y) + ‖y‖²
         ≤ ‖x‖² + 2|(x | y)| + ‖y‖² ≤ ‖x‖² + 2‖x‖‖y‖ + ‖y‖² = (‖x‖ + ‖y‖)².
lim_{m,n→∞} ‖xₘ − xₙ‖ = 0,
It is clear that this method always gives gcd(x, y) correctly. However, the
method is not quite efficient: for instance, in computing gcd(x, 1) it proceeds
as
where 0 ≤ rᵢ < rᵢ₋₁, we have rᵢ = rᵢ₋₂ − dᵢrᵢ₋₁ < rᵢ₋₂ − dᵢrᵢ. The estimate rᵢ₋₂ > (1 + dᵢ)rᵢ ≥ 2rᵢ follows easily. Therefore,

x > y > r₁ > 2r₃ > 4r₅ > ··· > 2ⁱ r_{2i+1}.

If n is odd, we have x > 2^{(n−1)/2} rₙ ≥ 2^{(n−1)/2}. For an even n, we also have x > 2^{(n−2)/2} r_{n−1} > 2^{(n−2)/2} rₙ, so the inequality x > 2^{(n−2)/2} holds in any case. It follows that f(x) = ⌊log₁₀ x⌋ > log₁₀ x − 1 > ((n − 2)/2) log₁₀ 2 − 1. Therefore, n < (2/log₁₀ 2)(f(x) + 1) + 2 = O(f(x)). As a conclusion, we have that, for num-
263/189 = 1 + 1/(2 + 1/(1 + 1/(1 + 1/(4 + 1/8)))).     (9.25)

Expression (9.25) is an example of a finite continued fraction.
It is clear that this procedure can be done for any positive rational number a = x/y ≥ 1 in its lowest terms.
For other values of a, there is a unique way to express a = a₀ + β, where a₀ ∈ ℤ and β ∈ [0, 1). If β ≠ 0, this can also be written as

a = a₀ + 1/a₁, where a₁ = 1/β > 1.

By applying the same procedure recursively to a₁, we get an expansion

a = a₀ + 1/(a₁ + 1/(a₂ + ···)).     (9.26)
exists for each sequence (9.29). We will soon find an answer to that question.
Let (aᵢ) be a sequence as in (9.29). The nth convergent of the sequence (aᵢ) is defined to be pₙ/qₙ = [a₀, ..., aₙ]. A simple calculation shows that

p₀/q₀ = a₀/1,
p₁/q₁ = (a₀a₁ + 1)/a₁,
p₂/q₂ = (a₂p₁ + p₀)/(a₂q₁ + q₀).

Continuing this way, we see that pₙ and qₙ are polynomials that depend on a₀, a₁, ..., aₙ. We are now interested in finding the polynomials pₙ and qₙ, so we will regard the aᵢ as indeterminates, assuming that they do not take any specific integer values. After calculating the first few pₙ and qₙ, we may guess that there are general recursion formulae for the polynomials pₙ and qₙ.
Lemma 9.4.3. For each n ≥ 2,

pₙ = aₙpₙ₋₁ + pₙ₋₂,     (9.31)
qₙ = aₙqₙ₋₁ + qₙ₋₂.     (9.32)
Proof. By induction on n. The formulae (9.31) and (9.32) are true for n = 2, as we saw before. Assume then that they hold for the numbers {2, ..., n}. Then,

pₙ₊₁/qₙ₊₁ = [a₀, ..., aₙ₋₁, aₙ, aₙ₊₁] = [a₀, ..., aₙ₋₁, aₙ + 1/aₙ₊₁]
          = ((aₙ + 1/aₙ₊₁)pₙ₋₁ + pₙ₋₂) / ((aₙ + 1/aₙ₊₁)qₙ₋₁ + qₙ₋₂)
If al, a2, ... are natural numbers, then it follows by (9.32) that qn >
qn-l + qn-2 and the inequality qn 2: F n , where F n is the nth Fibonacci
number,7 ean be proved by induetion. In the sequel we will use notation Pn
and qn also for the Pn and qn evaluated at {ao, al, ... , an}. The meaning of
individual notation Pn or qn will be clear by the eontext.
Using the reeursion formulae, it is also easy to prove the following lemma:
Lemma 9.4.4. For any n ≥ 1, p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1}.

From the previous lemma it also follows that the convergents \frac{p_n}{q_n} are always
in their lowest terms: for if d divides both p_n and q_n, then d also divides 1.
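The recursion formulae (9.31) and (9.32) and the identity of Lemma 9.4.4 are easy to check numerically. A small Python sketch (seeding the recursion with the formal values p_{-1} = 1, q_{-1} = 0, a common convention not spelled out above):

```python
def convergents(quotients):
    """List of (p_n, q_n) computed by p_n = a_n p_{n-1} + p_{n-2},
    q_n = a_n q_{n-1} + q_{n-2}."""
    p_prev, q_prev = 1, 0                 # formal p_{-1}, q_{-1}
    p, q = quotients[0], 1                # p_0 = a_0, q_0 = 1
    result = [(p, q)]
    for a_n in quotients[1:]:
        p, p_prev = a_n * p + p_prev, p
        q, q_prev = a_n * q + q_prev, q
        result.append((p, q))
    return result

cs = convergents([2, 1, 1, 4, 8])         # expansion (9.25)
assert cs[-1] == (189, 74)                # the last convergent is the number itself
# Lemma 9.4.4: p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1}
for n in range(1, len(cs)):
    assert cs[n][0] * cs[n-1][1] - cs[n-1][0] * cs[n][1] == (-1) ** (n - 1)
```

For (9.25) the convergents come out as 2/1, 3/1, 5/2, 23/9 and 189/74.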
Lemma 9.4.4 has even more interesting consequences. By multiplying

p_{n-1} q_{n-2} - p_{n-2} q_{n-1} = (-1)^{n-2}    (9.33)

by a_n and using the recursion formulae (9.31) and (9.32) once more, we see
that

p_n q_{n-2} - p_{n-2} q_n = (-1)^n a_n.    (9.34)

Dividing Lemma 9.4.4 by q_n q_{n-1} and (9.34) by q_n q_{n-2} gives

\frac{p_n}{q_n} - \frac{p_{n-1}}{q_{n-1}} = \frac{(-1)^{n-1}}{q_n q_{n-1}}    (9.35)

and

\frac{p_n}{q_n} - \frac{p_{n-2}}{q_{n-2}} = \frac{(-1)^n a_n}{q_n q_{n-2}}.    (9.36)
\left| a - \frac{p_n}{q_n} \right| \le \frac{1}{q_n q_{n+1}}.    (9.38)

In fact,

\left| \frac{355}{113} - \pi \right| \le \frac{1}{113 \cdot 33102} = 0.000000267342\ldots,

since 355/113 and 103993/33102 are consecutive convergents of π.
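The bound for 355/113 is easy to confirm numerically (the float `math.pi` stands in for π; its own error, about 10^{-16}, is negligible at this scale):

```python
import math

error = abs(355 / 113 - math.pi)   # |355/113 - pi|, about 2.668e-7
bound = 1 / (113 * 33102)          # 1/(q_n q_{n+1}), about 2.673e-7
assert error < bound
```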
Remark 9.4.1. Inequality (9.38) tells us that, for an irrational a, the conver-
gents \frac{p_n}{q_n} give infinitely many rational approximations \frac{p}{q} with gcd(p, q) = 1
of a so precise that

\left| a - \frac{p}{q} \right| < \frac{1}{q^2}.    (9.39)

This has further number-theoretic interest, since the rational numbers have only
finitely many rational approximations (9.39) such that gcd(p, q) = 1. This is
true because, if a = \frac{r}{s} with s > 0, then for q \ge s and \frac{p}{q} \ne \frac{r}{s},

\left| \frac{r}{s} - \frac{p}{q} \right| = \frac{|rq - ps|}{sq} \ge \frac{1}{sq} \ge \frac{1}{q^2}.

For 0 < q < s there are clearly only finitely many approximations \frac{p}{q} that
satisfy (9.39) and gcd(p, q) = 1.
Thus, the irrational numbers can be characterized by the property that
they have infinitely many rational approximations (9.39). The continued frac-
tion expansion also gives finitely many rational approximations (9.39) for
rational numbers. On the other hand, we will show in a moment (Theorem
9.4.3) that approximations which are good enough are convergents.
It can even be shown that the convergents to a number a are the "best"
rational approximations of a in the following sense. For the proof of the
following theorem, consult [44].
whenever n ≥ 2.
Now we will show that approximations which are good enough are con-
vergents. To do that, we must first derive an auxiliary result. Let a =
[a_0, a_1, a_2, \ldots] be a continued fraction expansion of a. We define a'_{n+1} =
[a_{n+1}, a_{n+2}, \ldots], and this gives us a formal expression

a = \frac{a'_{n+1} p_n + p_{n-1}}{a'_{n+1} q_n + q_{n-1}}    (9.40)

with p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1}, just like in the proof of Lemma 9.4.3. But
also any representation looking "enough" like (9.40) implies that \frac{p_n}{q_n} and \frac{p_{n-1}}{q_{n-1}}
are convergents to a. More precisely:
Theorem 9.4.2. Let P, Q, P', and Q' be integers such that Q > Q' > 0
and PQ' - QP' = ±1. If also a' ≥ 1 and

a = \frac{a' P + P'}{a' Q + Q'},    (9.41)

then \frac{P'}{Q'} and \frac{P}{Q} are two consecutive convergents to a.
or even as

(9.43)

which can be valid only if Q' = q_{n-1}. By (9.43) it also follows that P' = p_{n-1}.
Hence,

(9.44)

just like in the proof of Lemma 9.4.3. Since a' ≥ 1, there is a continued
fraction expansion a' = [a_{n+1}, a_{n+2}, \ldots] such that a_{n+1} ≥ 1. □
Theorem 9.4.3. If

\left| a - \frac{p}{q} \right| \le \frac{1}{2q^2},

then \frac{p}{q} is a convergent to the continued fraction expansion of a.
Proof. By assumption,

\frac{p}{q} - a = \frac{\varepsilon\theta}{q^2}

for some 0 < \theta \le \frac{1}{2} and \varepsilon = \pm 1. Let \frac{p}{q} = [a_0, a_1, \ldots, a_n] be the continued
fraction expansion for \frac{p}{q}. Again by Lemma 9.4.2, we may assume that \varepsilon =
(-1)^{n-1}. We define a' by requiring

a = \frac{a' p_n + p_{n-1}}{a' q_n + q_{n-1}},

where \frac{p_n}{q_n} = \frac{p}{q} and \frac{p_{n-1}}{q_{n-1}} are the last and the second-last convergents to \frac{p}{q}.
Thus, we have

\frac{p_n}{q_n} - a = \frac{p_n q_{n-1} - p_{n-1} q_n}{q_n (a' q_n + q_{n-1})} = \frac{(-1)^{n-1}}{q_n (a' q_n + q_{n-1})},

and comparing this with the assumption gives a' q_n + q_{n-1} = \frac{q_n}{\theta}. Thus,

a' = \frac{1}{\theta} - \frac{q_{n-1}}{q_n} > 2 - 1 = 1,

and by Theorem 9.4.2, \frac{p}{q} = \frac{p_n}{q_n} is a convergent to a. □
9.5 Shannon Entropy and Information

The purpose of this section is to present, following [20], the very elemen-
tary concepts of information theory. In the beginning, we concentrate on
information represented by binary digits: bits.
9.5.1 Entropy
By using a single bit we can represent two different configurations: the bit is
either 0 or 1. Using two bits, four different configurations are possible, three
bits allow eight configurations, etc. Inspired by these initial observations, we
say that the elementary binary entropy of a set with n elements is log2 n. The
elementary binary entropy, thus, approximately measures the length of bit
strings which we need to label all the elements of the set.
Remark 9.5.1. Consider the problem of specifying one single element out of
n given elements such that the probability of being the outcome is \frac{1}{n} for
each element. The elementary binary entropy log_2 n thus measures the initial
uncertainty in bits, i.e., to specify the outcome, we have to gain log_2 n bits
of information. In other words, we have to obtain the binary string of log_2 n
bits that labels the outcome.
On the other hand, using an alphabet of 3 symbols, we would say that the
elementary ternary uncertainty of a set of n elements is log_3 n. To get a more
unified approach, we define the elementary entropy of a set of n elements to
be

H(n) = K \log n,
strings where the ith letter occurs k_i = p_i l times. Moreover, the distribu-
tion of such strings tends to the uniform distribution as l tends to infinity. The
elementary entropy of the set of such strings, divided by the string length l, is

\frac{K}{l} \log \frac{l!}{k_1! \cdots k_n!} = \frac{K}{l} (\log l! - \log k_1! - \cdots - \log k_n!)

= -K \sum_{i=1}^{n} \frac{k_i}{l} \log \frac{k_i}{l} + O\Big(\frac{\log l}{l}\Big)

= -K \sum_{i=1}^{n} p_i \log p_i + O\Big(\frac{\log l}{l}\Big)

by Stirling's formula \log l! = l \log l - l + O(\log l).
By letting l \to \infty, the last term tends to 0. This encourages the following
definition.
Definition 9.5.1. Suppose that the elements of an n-element set occur with
probabilities p_1, \ldots, p_n. The Shannon entropy of the distribution p_1, \ldots, p_n is
defined to be

H(p_1, \ldots, p_n) = -K \sum_{i=1}^{n} p_i \log p_i.
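A direct implementation of the Shannon entropy H(p_1, …, p_n) = −Σ p_i log₂ p_i in bits (i.e., taking K = 1/ln 2, so that log becomes log₂; terms with p_i = 0 contribute nothing):

```python
from math import log2

def shannon_entropy(probs):
    """H(p_1, ..., p_n) = -sum_i p_i log2(p_i), in bits (K = 1/ln 2)."""
    assert abs(sum(probs) - 1) < 1e-12, "probabilities must sum to 1"
    return -sum(p * log2(p) for p in probs if p > 0)

assert shannon_entropy([1/8] * 8) == 3.0    # uniform: the elementary entropy log2(8)
assert shannon_entropy([0.9, 0.1]) < 1.0    # a biased coin carries less than one bit
```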
For any natural number n, we define f(n) = H(\frac{1}{n}, \ldots, \frac{1}{n}). We will demon-
strate that f(s^m) = m f(s). Symmetry of H and condition 3 imply that

H(\frac{1}{s^m}, \ldots, \frac{1}{s^m}) = H(\frac{1}{s}, \ldots, \frac{1}{s}) + s \cdot \frac{1}{s} H(\frac{1}{s^{m-1}}, \ldots, \frac{1}{s^{m-1}}),

which is to say that f(s^m) = f(s) + f(s^{m-1}). The claim f(s^m) = m f(s) follows
now by induction. Since f is strictly increasing and non-negative by 2, it
follows that f(s) = K log s for some positive constant K. Indeed, given s and
n, choose m such that s^m \le 2^n < s^{m+1}. Because f is strictly increasing, we
have also f(s^m) \le f(2^n) < f(s^{m+1}), which implies that

m f(s) \le n f(2) < (m+1) f(s).

Since n can be chosen arbitrarily great, we must have f(s) = \frac{f(2)}{\log 2} \log s =
K \log s. The constant K = \frac{f(2)}{\log 2} is positive, since f is strictly increasing and
f(1) = f(1^0) = 0 \cdot f(1) = 0.
Let p_1, \ldots, p_n be rational numbers such that p_1 + \cdots + p_n = 1. We
can assume that p_i = \frac{m_i}{N}, where N = m_1 + \cdots + m_n. Using 3 we get

K \log N = H(\frac{1}{N}, \ldots, \frac{1}{N})
= H(\underbrace{p_1 \cdot \tfrac{1}{m_1}, \ldots, p_1 \cdot \tfrac{1}{m_1}}_{m_1}, \underbrace{p_2 \cdot \tfrac{1}{m_2}, \ldots}_{m_2}, \ldots, \underbrace{\ldots, p_n \cdot \tfrac{1}{m_n}}_{m_n})

= H(p_1, \ldots, p_n) + \sum_{i=1}^{n} p_i H(\tfrac{1}{m_i}, \ldots, \tfrac{1}{m_i}) = H(p_1, \ldots, p_n) + K \sum_{i=1}^{n} p_i \log m_i.

Hence,

H(p_1, \ldots, p_n) = -K \Big( \sum_{i=1}^{n} p_i \log m_i - \log N \Big)

= -K \Big( \sum_{i=1}^{n} p_i \log m_i - \sum_{i=1}^{n} p_i \log N \Big)

= -K \sum_{i=1}^{n} p_i \log \frac{m_i}{N} = -K \sum_{i=1}^{n} p_i \log p_i,

which demonstrates that the theorem is true for rational probabilities. Since
H is continuous, the result extends to real numbers straightforwardly. □
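The grouping computation above can be checked numerically; a sketch with the hypothetical choice m = (1, 2, 5), N = 8 and K = 1/ln 2, so that H becomes entropy in bits and K log m_i becomes log₂ m_i:

```python
from math import log2, isclose

def entropy_bits(probs):
    """H(p_1, ..., p_n) = -sum_i p_i log2(p_i)  (Definition 9.5.1 with K = 1/ln 2)."""
    return -sum(p * log2(p) for p in probs if p > 0)

m = [1, 2, 5]
N = sum(m)                                  # N = m_1 + ... + m_n = 8
p = [m_i / N for m_i in m]                  # rational probabilities p_i = m_i/N
# grouping: log2(N) = H(p_1, ..., p_n) + sum_i p_i * log2(m_i)
lhs = log2(N)
rhs = entropy_bits(p) + sum(p_i * log2(m_i) for p_i, m_i in zip(p, m))
assert isclose(lhs, rhs)
```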
9.5.2 Information
Remark 9.5.7. By saying that "Y is an observable on H_n" we mean the fol-
lowing: there is a collection {E_1, \ldots, E_m} of mutually orthogonal subspaces
of H_n such that H_n = E_1 \oplus \cdots \oplus E_m (see Definition 8.3.1), and each subspace
E_i has label i. Y is then defined as the random variable which gets value i if
and only if the observable {E_1, \ldots, E_m} gets value i.
Remark 9.5.8. The above theorem holds even if Y is allowed to be a
POVM rather than a typical observable, defined as in Definition 8.3.1.
9.6 Exercises
1. Prove that a set S ⊆ V is linearly dependent if and only if for some
element x ∈ S, x ∈ L(S \ {x}).
2. Show that if B and B' are two bases of H_n, then necessarily |B| = |B'|.
3. a) Let n ≥ 2 and x_1, x_2 ∈ H_n. Show that y ∈ H_n can be chosen in
such a way that L(x_1, y) = L(x_1, x_2) and ⟨x_1 | y⟩ = 0.
b) Generalize a) into a procedure for finding an orthonormal basis of
H_n (the procedure is called the Gram-Schmidt method).
4. Prove that the function f : N → C defined by f(n) = \frac{1}{n} is in L_2(C) but
cannot be expressed as a linear combination of E = {e_1, e_2, e_3, \ldots}.
5. Prove the parallelogram rule: in an inner product space V, the equation
34. David Deutsch, Richard Jozsa: Rapid solutions of problems by quantum com-
putation, Proceedings of the Royal Society of London A 439, 553-558 (1992).
35. Christoph Dürr and Peter Høyer: A quantum algorithm for finding the minimum.
Electronically available at quant-ph/9607014.
36. Mark Ettinger and Peter Høyer: On quantum algorithms for noncommutative
hidden subgroups, Proceedings of the 16th Annual Symposium on Theoretical
Aspects of Computer Science - STACS 99, Lecture Notes in Computer Science
1563, 478-487, Springer (1999). Electronically available at quant-ph/9807029.
37. Edward Farhi, Jeffrey Goldstone, Sam Gutmann, and Michael Sipser: A limit
on the speed of quantum computation in determining parity, Physical Review
Letters 81:5, 5442-5444 (1998). Electronically available at quant-ph/9802045.
38. Richard P. Feynman: Simulating physics with computers, International Journal
of Theoretical Physics 21:6/7, 467-488 (1982).
39. A. Furusawa, J. L. Sørensen, S. L. Braunstein, C. A. Fuchs, H. J. Kimble, E.
S. Polzik: Unconditional quantum teleportation, Science 282, 706-709 (1998).
40. Daniel Gottesman: The Heisenberg Representation of Quantum Computers.
Electronically available at quant-ph/9807006.
41. Lov K. Grover: A fast quantum-mechanical algorithm for database search, Pro-
ceedings of the 28th Annual ACM Symposium on the Theory of Computing -
STOC, 212-219 (1996). Electronically available at quant-ph/9605043.
42. Lov K. Grover and Terry Rudolph: How significant are the known collision and
element distinctness quantum algorithms? Electronically available at quant-
ph/0306017.
43. Josef Gruska: Quantum Computing, McGraw-Hill (1999).
44. G. H. Hardy and E. M. Wright: An introduction to the theory of numbers, 4th
ed. with corrections, Clarendon Press, Oxford (1971).
45. Mika Hirvensalo: On quantum computation, Ph.Lic. Thesis, University of
Turku, 1997.
46. Mika Hirvensalo: The reversibility in quantum computation theory, Proceed-
ings of the 3rd International Conference Developments in Language Theory -
DLT'97, Ed.: Symeon Bozapalidis, Aristotle University of Thessaloniki, 203-
210 (1997).
47. A. S. Holevo: Statistical Problems in Quantum Physics, Proceedings of the
Second Japan-USSR Symposium on Probability Theory, Eds.: G. Murayama
and J. V. Prokhorov, Springer, 104-109 (1973).
48. Richard Jozsa: A stronger no-cloning theorem. Electronically available at quant-
ph/0204153.
49. Loo Keng Hua: Introduction to number theory, Springer-Verlag, 1982.
50. A. Y. Kitaev: Quantum computation: algorithms and error correction, Russian
Mathematical Surveys 52:6, 1191-1249 (1997).
51. E. Knill, R. Laflamme, R. Martinez, C.-H. Tseng: An algorithmic benchmark
for quantum information processing, Nature 404: 368-370 (2000).
52. Rolf Landauer: Irreversibility and heat generation in the computing process,
IBM Journal of Research and Development 5, 183-191 (1961).
53. M. Y. Lecerf: Récursive insolubilité de l'équation générale de diagonalisation
de deux monomorphismes de monoïdes libres φx = ψx, Comptes Rendus de
l'Académie des Sciences 257, 2940-2943 (1963).
54. Ming Li, John Tromp and Paul Vitányi: Reversible simulation of irreversible
computation, Physica D 120:1/2, 168-176 (1998). Electronically available at
quant-ph/9703009.
55. Seth Lloyd: A potentially realizable quantum computer, Science 261, 1569-1571
(1993).
78. Keijo Ruohonen: Reversible machines and Post's correspondence problem for
biprefix morphisms, EIK - Journal of Information Processing and Cybernetics
21:12, 579-595 (1985).
79. Arto Salomaa: Public-key cryptography, Texts in Theoretical Computer Science
- An EATCS Series, 2nd ed., Springer (1996).
80. Yaoyun Shi: Both Toffoli and controlled-NOT need little help to do universal
quantum computation. Quantum Information and Computation 3:1, 84-92
(2003). Electronically available at quant-ph/0205115.
81. Peter W. Shor: Algorithms for quantum computation: discrete log and factoring,
Proceedings of the 35th annual IEEE Symposium on Foundations of Computer
Science - FOCS, 20-22 (1994).
82. Peter W. Shor: Scheme for reducing decoherence in quantum computer memory,
Physical Review A 52:4, 2493-2496 (1995).
83. Uwe Schöning: A probabilistic algorithm for k-SAT based on limited local search
and restart, Algorithmica 32, 615-623 (2002).
84. Daniel R. Simon: On the power of quantum computation, Proceedings of the
35th annual IEEE Symposium on Foundations of Computer Science - FOCS,
116-123 (1994).
85. Douglas R. Stinson: Cryptography - Theory and practice, CRC Press Series on
Discrete Mathematics and Its Applications, CRC Press, Boca Raton (1995).
86. W. Tittel, J. Brendel, H. Zbinden, N. Gisin: Violation of Bell inequalities by
photons more than 10 km apart, Physical Review Letters 81:17, 3563-3566,
(1998). Electronically available at quant-ph/9806043.
87. Tommaso Toffoli: Bicontinuous extensions of invertible combinatorial functions,
Mathematical Systems Theory 14, 13-23 (1981).
88. B. L. van der Waerden: Sources of quantum mechanics, North-Holland (1967).
89. C. P. Williams and S. H. Clearwater: Explorations in quantum computing,
Springer (1998).
90. C. P. Williams and S. H. Clearwater: Ultimate zero and one. Computing at the
quantum frontier, Springer (2000).
91. William K. Wootters, Wojciech H. Zurek: A single quantum cannot be cloned,
Nature 299, 802-803 (1982).
92. Andrew Chi-Chih Yao: Quantum circuit complexity, Proceedings of the 34th
annual IEEE Symposium on Foundations of Computer Science - FOCS, 352-
361 (1993).
Index