
Natural Computing Series

Series Editors: G. Rozenberg, Th. Bäck, A.E. Eiben, J.N. Kok, H.P. Spaink
Leiden Center for Natural Computing

Advisory Board: S. Amari, G. Brassard, M. Conrad, K.A. De Jong,
C.C.A.M. Gielen, T. Head, L. Kari, L. Landweber, T. Martinetz,
Z. Michalewicz, M.C. Mozer, E. Oja, G. Paun, J. Reif, H. Rubin,
A. Salomaa, M. Schoenauer, H.-P. Schwefel, C. Torras, D. Whitley,
E. Winfree, J.M. Zurada

Springer-Verlag Berlin Heidelberg GmbH
Mika Hirvensalo

Quantum
Computing

Second Edition

Springer
Mika Hirvensalo
University of Turku
Department of Mathematics
20014 Turku
Finland
[email protected]

Series Editors
G. Rozenberg (Managing Editor)
[email protected]
Th. Bäck, J.N. Kok, H.P. Spaink
Leiden Center for Natural Computing, Leiden University
Niels Bohrweg 1, 2333 CA Leiden, The Netherlands
A.E.Eiben
Vrije Universiteit Amsterdam

Library of Congress Cataloging-in-Publication Data


Hirvensalo, Mika, 1972-
Quantum computing / M. Hirvensalo.
p. cm. - (Natural computing series)
Includes bibliographical references and index.
ISBN 978-3-642-07383-0    ISBN 978-3-662-09636-9 (eBook)
DOI 10.1007/978-3-662-09636-9
1. Quantum computers. I. Title. II. Series.
QA76.889.H57 2003    004.1-dc22    2003066405

ACM Computing Classification (1998): F.1-2, G.1.2, G.3, H.1.1, I.1.2, J.2
ISBN 978-3-642-07383-0
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag Berlin Heidelberg GmbH. Violations are liable for prosecution under the German Copyright Law.

springeronline.com
© Springer-Verlag Berlin Heidelberg 2004
Originally published by Springer-Verlag Berlin Heidelberg New York in 2004
Softcover reprint of the hardcover 2nd edition 2004

The use of general descriptive names, registered names, trademarks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
Cover design: KünkelLopka, Heidelberg
Typesetting: Computer to film from author's data
Printed on acid-free paper 45/3142PS - 543210
Preface to the Second Edition

After the first edition of this book was published, I received much positive feedback from the readers. It was very helpful to have all those comments suggesting improvements and corrections. In many cases, it was suggested that more aspects on quantum information would be welcome. Unfortunately, I am afraid that an attempt to cover such a broad area as quantum information theory would make this book too scattered to be helpful for educational purposes.
On the other hand, I admit that some aspects of quantum information should be discussed. The first edition already contained the so-called No-Cloning Theorem. In this edition, I have added a stronger version of the aforementioned theorem due to R. Jozsa, a variant which also covers the no-deleting principle. Moreover, in this edition, I have added some famous protocols, such as quantum teleportation.
The response to the first edition strongly supports the idea that the main function of this book should be educational, and I have not included further aspects of quantum information theory here. For further reading, I suggest [43] by Josef Gruska and [62] by Michael A. Nielsen and Isaac L. Chuang.
Chapter 1, especially Section 1.4, includes the most basic knowledge for the presentation of quantum systems relevant to quantum computation.
The basic properties of quantum information are introduced in Chapter 2. This chapter also includes interesting protocols: quantum teleportation and superdense coding.
Chapter 3 is divided as follows: Turing machines, as well as their probabilistic counterparts, are introduced in Section 3.1 as traditional, uniform models of computation. For the reader interested in quantum computing but having little knowledge of the theory of computation, this section was designed to also include the basic definitions of complexity theory. Section 3.1 is intended for the reader who has a solid background in quantum mechanics, but little previous knowledge of classical computation theory. The reader who is well aware of the theory of computation may skip Section 3.1: for such a reader the knowledge in Chapter 2 and Section 3.2 (excluding the first subsection) is sufficient to follow this book. In Section 3.2, we present Boolean and quantum circuits (as an extension of the concept of reversible circuits) as models of computation. Because of their descriptional simplicity, circuits are used throughout this book to present quantum algorithms.
Chapter 4 is devoted to a complete representation of Shor's famous factorization algorithm. The instructions in Chapter 4 will help the reader to choose which sections may, according to the reader's background, be skipped. Chapter 5 is closely connected to Chapter 4, and can be seen as a representation of a structural but not straightforward extension of Shor's algorithm.
Chapter 6 was written to introduce Grover's method for obtaining a quadratic speedup in quantum computation (with respect to classical computation) in its basic form, whereas in Chapter 7 we present a method for obtaining lower bounds for quantum computation in a restricted quantum circuit model.
Chapters 8 and 9 are appendices intended for the beginner, but Chapter 8 is also suitable for the reader who has a strong background in computer science and is interested in quantum computation. Chapter 9 is composed of various different topics in mathematics, since it has already turned out that, in the area of quantum computation, many mathematical disciplines, seemingly separate from each other, are useful. Moreover, my personal experience is that a basic education in computer science and physics very seldom covers all the areas in Chapter 9.

Acknowledgements. That this second edition ever emerged is due to Ingeborg Mayer. I wish to thank her very much for her complaisant way of cooperating. Thanks also go to all the readers of the first edition who sent me suggestions about corrections and improvements.
Turku, Finland
October 2003 Mika Hirvensalo
From the Preface to the First Edition

The twentieth century witnessed the birth of revolutionary ideas in the physical sciences. These ideas began to shake the traditional view of the universe dating back to the days of Newton, even to the days of Galileo. Albert Einstein is usually identified as the creator of the relativity theory, a theory that is used to model the behavior of the huge macrosystems of astronomy. Another new view of the physical world was supplied by quantum physics, which turned out to be successful in describing phenomena in the microworld, the behavior of particles of atomic size.
Even though the first ideas of automatic information processing are quite old, I feel justified in saying that the twentieth century also witnessed the birth of computer science. As a mathematician, by the term "computer science", I mean the more theoretical parts of this vast research area, such as the theory of formal languages, automata theory, complexity theory, and algorithm design. I hope that readers who are used to a more flexible concept of "computer science" will forgive me. The idea of a computational device was crystallized into a mathematical form as a Turing machine by Alan Turing in the 1930s. Since then, the growth of computer science has been immense, but many problems in newer areas such as complexity theory are still waiting for a solution.
Since the very first electronic computers were built, computer technology has grown rapidly. An observation by Gordon Moore in 1965 laid the foundations for what became known as "Moore's Law" - that computer processing power doubles every eighteen months. How far can this technical progress go? How efficient can we make computers? In light of the present knowledge, it seems unfair even to attempt to give an answer to these questions, but some estimate can be given. By naively extrapolating Moore's Law into the future, we learn that sooner or later, each bit of information should be encoded by a physical system of subatomic size! Several decades ago such an idea would have seemed somewhat absurd, but it does not seem so anymore. In fact, a system of seven bits encoded subatomically has already been implemented [51]. These small systems can no longer be described by classical physics; rather, quantum physical effects must be taken into consideration.
When thinking again about the formalization of a computer as a Turing machine, rewriting system, or some other classical model of computation, one realizes that the concept of information is usually based on strings over a finite alphabet. This strongly reflects the idea of classical physics in the following sense: each member of a string can be represented by a physical system (storing the members in the memory of an electronic computer, writing them on sand, etc.) that can be in a certain state, i.e., contain a character of the alphabet. Moreover, we should be able to identify different states reliably. That is, we should be able to make an observation in such a way that we become convinced that the system under observation represents a certain character.
In this book, we typically identify the alphabet and the distinguishable states of a physical system that represent the information. These identifiable states are called basis states. In quantum physical microsystems, there are also basis states that can be identified and, therefore, we could use such microsystems to represent information. But, unlike the systems of classical physics, these microsystems are also able to exist in a superposition of basis states, which, informally speaking, means that the state of such a system can also be a combination of basis states. We will call the information represented by such microsystems quantum information. One may argue that in classical physics it is also possible to speak about combinations of basis states: we can prepare a mixed state which is essentially a probability distribution of the basis states. But there is a difference between the superpositions of quantum physics and the probability distributions of classical physics: due to the interference effects, the superpositions cannot be interpreted as mixtures (probability distributions) of the basis states.
Richard Feynman [38] pointed out in 1982 that it appears to be extremely difficult to simulate efficiently, using an ordinary computer, how a quantum physical system evolves in time. He also demonstrated that, if we had a computer that runs according to the laws of quantum physics, then this simulation could be made efficiently. Thus, he actually suggested that a quantum computer could be essentially more efficient than any traditional one.
Therefore, it is an interesting challenge to study quantum computation, the theory of computation in which traditional information is replaced by its quantum physical counterpart. Are quantum computers more powerful than traditional ones? If so, what are the problems that can be solved more efficiently by using a quantum computer? These questions are still waiting for answers.
The purpose of this book is to provide a good introduction to quantum computation for beginners, as well as a clear presentation of the most important presently known results for more advanced readers. The latter purpose also includes providing a bridge (from a mathematician's point of view) between quantum mechanics and the theory of computation: it is not only my personal experience that the language used in research articles on these topics is completely different.
This book concentrates mainly on quantum algorithms, but other interesting topics, such as quantum information theory, quantum communication, quantum error-correcting, and quantum cryptography, are not covered. Therefore, for additional reading, we can warmly recommend [43] by Josef Gruska. Book [43] also contains a large number of references to works on quantum computing. A reader who is more oriented to physics may also see [89] and [90] by C. P. Williams and S. H. Clearwater. It may also be useful to follow the Los Alamos preprint archive at http://xxx.lanl.gov/archive/quant-ph to learn about the new developments in quantum computing. The Los Alamos preprint archive contains a large number of articles on quantum computing since 1994, and includes many articles referred to in this book.

Acknowledgements. My warmest thanks go to Professors Grzegorz Rozenberg and Arto Salomaa for encouraging me to write this book, and to Professor Juhani Karhumäki and Turku Centre for Computer Science for providing excellent working conditions during the writing period. This work has been supported by the Academy of Finland under grants 14047 and 44087. I am also indebted to Docent P. J. Lahti, V. Halava, and M. Rönkä for their careful revision work on parts of this book. Thanks also go to Dr. Hans Wössner for a fruitful cooperation.

Turku, Finland, February 2001                              Mika Hirvensalo
Contents

1. Introduction ................................................ 1
   1.1 A Brief History of Quantum Computation .................. 1
   1.2 Classical Physics ....................................... 2
   1.3 Probabilistic Systems ................................... 4
   1.4 Quantum Mechanics ....................................... 7

2. Quantum Information ....................................... 13
   2.1 Quantum Bits ........................................... 13
   2.2 Quantum Registers ...................................... 17
   2.3 No-Cloning Theorem ..................................... 20
   2.4 Observation ............................................ 22
   2.5 Quantum Teleportation .................................. 24
   2.6 Superdense Coding ...................................... 27
   2.7 Exercises .............................................. 28

3. Devices for Computation ................................... 29
   3.1 Uniform Computation .................................... 29
       3.1.1 Turing Machines .................................. 29
       3.1.2 Probabilistic Turing Machines .................... 32
       3.1.3 Multitape Turing Machines ........................ 36
       3.1.4 Quantum Turing Machines .......................... 37
   3.2 Circuits ............................................... 41
       3.2.1 Boolean Circuits ................................. 41
       3.2.2 Reversible Circuits .............................. 43
       3.2.3 Quantum Circuits ................................. 45

4. Fast Factorization ........................................ 49
   4.1 Quantum Fourier Transform .............................. 49
       4.1.1 General Framework ................................ 49
       4.1.2 Hadamard-Walsh Transform ......................... 51
       4.1.3 Quantum Fourier Transform in Z_n ................. 52
       4.1.4 Complexity Remarks ............................... 57
   4.2 Shor's Algorithm for Factoring Numbers ................. 58
       4.2.1 From Periods to Factoring ........................ 58
       4.2.2 Orders of the Elements in Z_n .................... 60
       4.2.3 Finding the Period ............................... 63
   4.3 The Correctness Probability ............................ 65
       4.3.1 The Easy Case .................................... 65
       4.3.2 The General Case ................................. 66
       4.3.3 The Complexity of Shor's Factorization Algorithm . 69
   4.4 Exercises .............................................. 70

5. Finding the Hidden Subgroup ............................... 73
   5.1 Generalized Simon's Algorithm .......................... 74
       5.1.1 Preliminaries .................................... 74
       5.1.2 The Algorithms ................................... 75
   5.2 Examples ............................................... 79
       5.2.1 Finding the Order ................................ 79
       5.2.2 Discrete Logarithm ............................... 79
       5.2.3 Simon's Original Problem ......................... 80
   5.3 Exercises .............................................. 80

6. Grover's Search Algorithm ................................. 83
   6.1 Search Problems ........................................ 83
       6.1.1 Satisfiability Problem ........................... 83
       6.1.2 Probabilistic Search ............................. 84
       6.1.3 Quantum Search with One Query .................... 86
   6.2 Grover's Amplification Method .......................... 89
       6.2.1 Quantum Operators for Grover's Search Algorithm .. 89
       6.2.2 Amplitude Amplification .......................... 90
       6.2.3 Analysis of Amplification Method ................. 94
   6.3 Utilizing Grover's Search Method ....................... 97
       6.3.1 Searching with Unknown Number of Solutions ....... 97

7. Complexity Lower Bounds for Quantum Circuits ............. 101
   7.1 General Idea .......................................... 101
   7.2 Polynomial Representations ............................ 102
       7.2.1 Preliminaries ................................... 102
       7.2.2 Bounds for the Representation Degrees ........... 106
   7.3 Quantum Circuit Lower Bound ........................... 108
       7.3.1 General Lower Bound ............................. 108
       7.3.2 Some Examples ................................... 111

8. Appendix A: Quantum Physics .............................. 113
   8.1 A Brief History of Quantum Theory ..................... 113
   8.2 Mathematical Framework for Quantum Theory ............. 115
       8.2.1 Hilbert Spaces .................................. 117
       8.2.2 Operators ....................................... 118
       8.2.3 Spectral Representation of Self-Adjoint Operators 122
       8.2.4 Spectral Representation of Unitary Operators .... 125
   8.3 Quantum States as Hilbert Space Vectors ............... 129
       8.3.1 Quantum Time Evolution .......................... 130
       8.3.2 Observables ..................................... 131
       8.3.3 The Uncertainty Principles ...................... 134
   8.4 Quantum States as Operators ........................... 139
       8.4.1 Density Matrices ................................ 139
       8.4.2 Observables and Mixed States .................... 142
       8.4.3 Subsystem States ................................ 147
       8.4.4 More on Time Evolution .......................... 154
       8.4.5 Representation Theorems ......................... 156
       8.4.6 Jozsa's Theorem of Cloning and Deleting ......... 163
   8.5 Exercises ............................................. 166

9. Appendix B: Mathematical Background ...................... 167
   9.1 Group Theory .......................................... 167
       9.1.1 Preliminaries ................................... 167
       9.1.2 Subgroups, Cosets ............................... 168
       9.1.3 Factor Groups ................................... 169
       9.1.4 Group Z_n^* ..................................... 171
       9.1.5 Group Morphisms ................................. 173
       9.1.6 Direct Product .................................. 175
   9.2 Fourier Transforms .................................... 176
       9.2.1 Characters of Abelian Groups .................... 176
       9.2.2 Orthogonality of the Characters ................. 178
       9.2.3 Discrete Fourier Transform ...................... 180
       9.2.4 The Inverse Fourier Transform ................... 182
       9.2.5 Fourier Transform and Periodicity ............... 183
   9.3 Linear Algebra ........................................ 183
       9.3.1 Preliminaries ................................... 183
       9.3.2 Inner Product ................................... 186
   9.4 Number Theory ......................................... 189
       9.4.1 Euclid's Algorithm .............................. 189
       9.4.2 Continued Fractions ............................. 190
   9.5 Shannon Entropy and Information ....................... 197
       9.5.1 Entropy ......................................... 197
       9.5.2 Information ..................................... 201
       9.5.3 The Holevo Bound ................................ 203
   9.6 Exercises ............................................. 204

References .................................................. 205

Index ....................................................... 211


1. Introduction

1.1 A Brief History of Quantum Computation

In connection with computational complexity, it could be stated that the theory of quantum computation was launched in the beginning of the 1980s. A most famous physicist, Nobel Prize winner Richard P. Feynman, proposed in his article [38], which appeared in 1982, that a quantum physical system of R particles cannot be simulated by an ordinary computer without an exponential slowdown in the efficiency of the simulation. However, a system of R particles in classical physics can be simulated well with only a polynomial slowdown. The main reason for this is that the description size of a particle system is linear in R in classical physics,¹ but exponential in R according to quantum physics (in Section 1.4 we will learn about the quantum physics description). Feynman himself expressed:

But the full description of quantum mechanics for a large system with R particles is given by a function ψ(x_1, x_2, ..., x_R, t) which we call the amplitude to find the particles x_1, ..., x_R, and therefore, because it has too many variables, it cannot be simulated with a normal computer with a number of elements proportional to R or proportional to N. [38]

Feynman also suggested that this slowdown could be avoided by using a computer running according to the laws of quantum physics. This idea suggests, at least implicitly, that a quantum computer could operate exponentially faster than a deterministic classical one. In [38], Feynman also addressed the problem of simulating a quantum physical system with a probabilistic computer, but due to interference phenomena, it appears to be a difficult problem.
Quantum mechanical computation models were also constructed by Benioff [7] in 1982, but Deutsch argued in [31] that Benioff's model can be perfectly simulated by an ordinary computer. In 1985, in his notable paper [31], Deutsch was the first to establish a solid ground for the theory of quantum computation by introducing a fully quantum model for computation and giving the description of a universal quantum computer. Later, Deutsch also defined quantum networks in [32]. The construction of a universal quantum Turing machine was improved by Bernstein and Vazirani in [13], where the authors show how to construct a universal quantum Turing machine capable of simulating any other quantum Turing machine with polynomial efficiency.

¹ One should give the coordinates and the momentum of each particle with required precision.
After the pioneering work of D. Deutsch, quantum computation still remained a marginal curiosity in the theory of computation until 1994, when Peter W. Shor introduced his celebrated quantum algorithms for factoring integers and extracting discrete logarithms in polynomial time [81]. The importance of Shor's algorithm for finding factors is well known: the reliability of the famous RSA cryptosystem designed for secret communications is based on the assumption that the factoring of large integers will remain an intractable problem, but Shor demonstrated that this is not true if one could build a quantum computer.
However, the theory is far more developed than the practice: a large-scale quantum computer has not been built yet. A major difficulty arises from two somewhat contradictory requirements. On the one hand, the computer memory consisting of a microscopically small quantum system must be isolated as perfectly as possible to prevent destructive interaction with the environment. On the other hand, the "quantum processing unit" cannot be totally isolated, since the computation must carry on, and a "supervisor" should ensure that the quantum system evolves in the desired way. In principle, the problem that non-controllable errors occur is not a new one: in classical information theory we consider a noisy channel that may corrupt the messages. The task of the receiver is to extract the correct information from the distorted message without any additional information transmission. The classical theory of error-correcting codes [57] addresses this problem, and the elementary result of this theory, originally due to Claude Shannon, could be expressed as follows: for a reasonably erratic channel, there is a coding system of messages which allows us to decrease the error probability in the transmission as much as we want.
It was first believed that a corresponding scheme for quantum computations was impossible even in theory, mainly because of the No-Cloning Theorem [91], which states that quantum information cannot be duplicated exactly. However, Shor demonstrated in his article [82] how we can construct error-correcting schemes for quantum computers, thus establishing the theory of quantum error-correcting codes. To learn more about this theory, the reader may consult [24], for example. In article [72], J. Preskill emphasizes that quantum error-correcting codes may someday lead to the construction of a large-scale quantum computer, but this remains to be seen.

1.2 Classical Physics

Physics, as we can understand it today, is the theory of overall nature. This theory is naturally too broad to be profoundly accessed in a brief moment, so we are satisfied just to point out some essential features that will be important when learning about the differences between quantum and classical computation.
At its very core, physics is ultimately an empirical science in the sense that a physical theory can be regarded as valid only if the theory agrees with empirical observations. Therefore, it is not surprising that the concept of observables² has great importance in the physical sciences. There are observables associated with a physical system, like position and momentum, to mention a few. The description of a physical system is called the state of the system.
Example 1.2.1. Assume that we would like to describe the mechanics of a single particle X in a closed region of space. The observables used for the system description are the position and the momentum. Thus, we can, under a fixed coordinate system, express the state of the system as a vector x = (x_1, x_2, x_3, p_1, p_2, p_3) ∈ R^6, where (x_1, x_2, x_3) describes the position and (p_1, p_2, p_3) the momentum.
As the particle moves, the state of the system changes in time. The way in which classical mechanics describes the time evolution of the state is given by the Hamiltonian equations of motion:

    d/dt x_i = ∂H/∂p_i,    d/dt p_i = −∂H/∂x_i,

where H = H(x_1, x_2, x_3, p_1, p_2, p_3) is an observable called the Hamiltonian function of the system.
Suppose now that another particle Y is inserted into our system. Then the full description of the system is given as a vector (x, y) ∈ R^6 × R^6, where x is as before, and y describes the position and the momentum of the second particle.
To build the mathematical description of a physical system, it is sufficient to assume that all the observables take their values in a set whose cardinality does not exceed the cardinality of the real number system. Therefore, we can assume that the observables take real number values; if it is natural to think that a certain observable takes values in some other set A, we can always replace the elements of A with suitable real numbers. Keeping this agreement in mind, we can list the properties of a description of a physical system.

• A physical system is described by a state vector (also called a state) x ∈ R^k for some k. The set of states is called the phase space of the system. The state is the full description of the dynamic properties of interest. This means that the state completely describes the properties whose development we actually want to describe. The other physical properties, such as electrical charge, temperature, etc., are not included in the state if they do not alter or do not affect the properties of interest. It should be emphasized here that, in the previous example, we were interested in describing the mechanics of a particle, so the properties of interest are its position and momentum. However, it may be argued that if we investigate a particle in an electrical field, the electrical charge of the particle definitely affects its behaviour: the greater the charge, the greater the acceleration. But in this case, the charge is seen as a constant property of the system and is not encoded in the state.
• The state depends on time, so instead of x we should actually write x(t) or x_t. If the system is regular enough, as classical mechanics is, a state also determines the future states (and also the past states). That is, we can find a time-dependency which is a function U_t such that x(t) = U_t(x(0)). This U_t, of course, depends on the system itself, as well as on the constant properties of the system. In our first example, U_t is determined by the Hamiltonian via the Newtonian equations of motion.
• If two systems are described by states x and y, the state of the compound system consisting of both systems can be written as (x, y). That is, the description of the compound system is a Cartesian product of the subsystems.

² In classical mechanics, observables are usually called dynamic variables.

1.3 Probabilistic Systems

In order to be able to comment on some characteristic features of the representation of quantum systems, we first study the representation of a probabilistic system. A system admitting a probabilistic nature means that we do not know for certain the state of the system, but we do know the probability distribution of the states. In other words, we know that the system is in states x_1, ..., x_n with probabilities p_1, ..., p_n that sum up to 1 (if there is a continuum of possible states as in Example 1.2.1, we should have a probability distribution that integrates up to 1, but for the sake of simplicity, we will study here only systems having finitely many states). The notation

    p_1 [x_1] + p_2 [x_2] + ... + p_n [x_n],   (1.1)

where p_i ≥ 0 and p_1 + ... + p_n = 1, stands for a probability distribution, meaning that the system is in state x_i with probability p_i. We also call distribution (1.1) a mixed state. Hereafter, states x_i are called pure states.
Now make a careful distinction between the notations: Expression (1.1) does not mean the expectation value (also called the average) of the state

    p_1 x_1 + p_2 x_2 + ... + p_n x_n,   (1.2)

but (1.1) is only the probability distribution over states x_i.

Example 1.3.1. Tossing a fair coin will give head h or tail t with a probability of 1/2. According to classical mechanics, we may think that, in principle, perfect knowledge about the coin and all circumstances connected to the tossing would allow us to determine the outcome with certainty. However, in practice it is impossible to handle all these circumstances, and the notation (1/2)[h] + (1/2)[t] for the mixed state of a fair coin reflects our lack of information about the system.

Using an auxiliary system, we can easily introduce a probabilistic nature for any system. In fact, this is what we usually do in connection with probabilistic algorithms: the algorithm itself is typically thought to be strictly deterministic, but it utilizes random bits, which are supposed to be freely available.³

Example 1.3.2. Let us assume that the time evolution of a system with pure states x_1, ..., x_n also depends on an auxiliary system with pure states h and t, such that the compound system state (x_i, h) evolves during a fixed time interval into (x_h(i), h) and (x_i, t) into (x_t(i), t), where h, t : {1, ..., n} → {1, ..., n} are some functions. The auxiliary system with states h and t can thus be interpreted as a control system which indicates how the original one should behave.
Let us then consider the control system in a mixed state p_1 [h] + p_2 [t] (if p_1, p_2 ≠ 1/2, we may call the control system a biased coin). The compound state can then be written as a mixture

    p_1 [(x_i, h)] + p_2 [(x_i, t)],

which evolves into a mixed state

    p_1 [(x_h(i), h)] + p_2 [(x_t(i), t)].

If the auxiliary system no longer interferes with the first one, we can ignore it and write the state of the first system as

    p_1 [x_h(i)] + p_2 [x_t(i)].

A control system in a mixed state is called a randomizer. Supposing that randomizers are always available, we may even assume that the system under consideration evolves probabilistically, and ignore the randomizer.

Using a more general concept of randomizer, we can achieve notational and conceptual simplification by assuming that the time evolution of a system is not a deterministic procedure, but develops each state x_i into a distribution

    p_{1i} [x_1] + p_{2i} [x_2] + ... + p_{ni} [x_n]   (1.3)

³ In classical computation, generating random bits is a very complicated issue. For further discussion on this topic, consult section 11.3 of [64].

such that p_{1i} + p_{2i} + ... + p_{ni} = 1 for each i. In (1.3), p_{ji} is the probability that the system state x_i evolves into x_j. Notice that we have now also made the time discrete in order to simplify the mathematical framework, and that this actually is well-suited to the computational aspects: we can assume that we have instantaneous descriptions of the system between short time intervals, and that during each interval, the system undergoes the time evolution (1.3). Of course, the time evolution does not always need to be the same, but rather may depend on the particular interval. Considering this probabilistic time evolution, the notation (1.1) appears to be very handy: during a time interval, a distribution

    p_1 [x_1] + p_2 [x_2] + ... + p_n [x_n]   (1.4)

evolves into

    p_1 (p_{11} [x_1] + ... + p_{n1} [x_n]) + ... + p_n (p_{1n} [x_1] + ... + p_{nn} [x_n])
      = (p_{11} p_1 + ... + p_{1n} p_n) [x_1] + ... + (p_{n1} p_1 + ... + p_{nn} p_n) [x_n]
      = p'_1 [x_1] + p'_2 [x_2] + ... + p'_n [x_n],

where we have denoted p'_i = p_{i1} p_1 + ... + p_{in} p_n. The probabilities p_i and p'_i are thus related by

    (p'_1)   (p_{11} p_{12} ... p_{1n}) (p_1)
    (p'_2) = (p_{21} p_{22} ... p_{2n}) (p_2)   (1.5)
    ( ... )  (  ...    ...        ... ) (...)
    (p'_n)   (p_{n1} p_{n2} ... p_{nn}) (p_n)

Notice that the matrix in (1.5) has non-negative entries and p_{1i} + p_{2i} + ... + p_{ni} = 1 for each i, which guarantees that p'_1 + p'_2 + ... + p'_n = p_1 + p_2 + ... + p_n. A matrix with this property is called a Markov matrix. A probabilistic system with a time evolution described above is called a Markov chain.
Notice that a distribution (1.4) can be considered as a vector with non-negative coordinates that sum up to 1 in an n-dimensional real vector space having [x_1], ..., [x_n] as basis vectors. The set of all mixed states (distributions) is a convex set,⁴ having the pure states as extremals. Unlike in the representation of quantum systems, the fact that the mixed states are elements of a vector space is not very important. However, it may be convenient to describe the probabilistic time evolution as a Markov mapping, i.e., a linear mapping which preserves the property that the coordinates are non-negative and sum up to 1. For an introduction to Markov chains, see [75], for instance.
⁴ A set S of vectors is convex if for all x_1, x_2 ∈ S and all p_1, p_2 ≥ 0 such that p_1 + p_2 = 1, also p_1 x_1 + p_2 x_2 ∈ S. An element x ∈ S of a convex set is an extremal if x = p_1 x_1 + p_2 x_2 implies that either p_1 = 0 or p_2 = 0.

1.4 Quantum Mechanics

In the beginning of the twentieth century, experiments on atoms and radiation physics gave strong support to the idea that there are physical systems that cannot be satisfactorily described even by using the Markov chain representation of the previous section. In Section 8.1 we will discuss these experiments in more detail, but here in the introductory chapter we are satisfied with only presenting the mathematical description of a quantum mechanical system. We would like to emphasize that this representation based on so-called state vectors which is studied here is not the most general representation of quantum systems. A general Hilbert space formalism of quantum mechanics is presented in Section 8.4. The advantage of a representation using only state vectors is that it is mathematically simpler than the more general one. To explain the terminology, we usually use "quantum mechanics" to mean the mathematical structure that describes "quantum physics", which can be understood more generally.
The quantum mechanical description of a physical system looks very much like the probabilistic representation

    p_1 [x_1] + p_2 [x_2] + ... + p_n [x_n],   (1.6)

but still differs essentially from (1.6).


In fact, in quantum mechanics a state of an n-level system is depicted as a unit-length vector in an n-dimensional complex vector space H_n (see Section 9.3).⁵ We call H_n the state space of the system. To illustrate, let us choose an orthonormal basis {|x_1⟩, ..., |x_n⟩} for the state space H_n (we assume here that n is a finite number). This strange "ket"-notation |x⟩ is originally due to P. Dirac, and its usefulness will become apparent later. Now, any state of our quantum system can then be written as

    α_1 |x_1⟩ + α_2 |x_2⟩ + ... + α_n |x_n⟩,   (1.7)

where the α_i are complex numbers called the amplitudes (with respect to the chosen basis), and the requirement of unit length means that |α_1|² + |α_2|² + ... + |α_n|² = 1.
It should be immediately emphasized that the choice of the orthonormal basis {|x_1⟩, ..., |x_n⟩} is arbitrary but, for any such fixed basis, refers to a physical observable which can take n values. To simplify the framework, we do not associate any numerical values with the observables in this section, but we merely say that the system can have properties x_1, ..., x_n. Numerical values associated to the observables are handled in Section 8.3.2. The amplitudes α_i induce a probability distribution in the following way: the probability that a system in a state (1.7) is seen to have property x_i is |α_i|² (we also say that the probability that x_i is observed is |α_i|²). The basis vectors |x_i⟩ are called the basis states, and (1.7) is referred to as a superposition of basis states. The mapping ψ(x_i) = α_i is called the wave function with respect to the basis |x_1⟩, ..., |x_n⟩.
Even though this state vector formalism is simpler than the general one, it has one inconvenient feature: if |x⟩, |y⟩ ∈ H_n are any states that satisfy |x⟩ = e^{iθ} |y⟩, we say that states |x⟩ and |y⟩ are equivalent. Clearly equivalent states induce the same probability distribution over basis states (with respect to any chosen basis), and we usually identify equivalent states.

⁵ A finite-dimensional vector space over complex numbers is an example of a Hilbert space.

Example 1.4.1. A two-level quantum mechanical system can be used to represent a bit, and such a system is called a quantum bit. For such a system we choose an orthonormal basis {|0⟩, |1⟩}, and a general state of the system is

    α_0 |0⟩ + α_1 |1⟩,

where |α_0|² + |α_1|² = 1. The above superposition induces a probability distribution such that the probabilities that the system is seen to have properties 0 and 1 are |α_0|² and |α_1|², respectively.
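To illustrate the probability rule, the following minimal Python/NumPy sketch (with arbitrarily chosen amplitudes) computes the induced distribution of a quantum bit and samples one observation outcome:

    import numpy as np

    # Arbitrary amplitudes with |a0|^2 + |a1|^2 = 1.
    state = np.array([3 / 5, 4j / 5])

    probs = np.abs(state) ** 2       # probabilities of observing 0 and 1
    probs = probs / probs.sum()      # guard against floating-point rounding
    print(probs)                     # [0.36 0.64]

    outcome = np.random.choice([0, 1], p=probs)  # simulate one observation
    print(outcome)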
What about the time evolution of quantum systems? The time evolution of a probabilistic system via Markov matrices is replaced by matrices with complex number entries that preserve the quantity |α_1|² + ... + |α_n|². Thus, a quantum system in a state

    α_1 |x_1⟩ + ... + α_n |x_n⟩

evolves during a time interval into the state

    α'_1 |x_1⟩ + ... + α'_n |x_n⟩,

where the amplitudes α_1, ..., α_n and α'_1, ..., α'_n are related by

    (α'_1)   (u_{11} ... u_{1n}) (α_1)
    ( ... ) = (  ...        ... ) (...)   (1.8)
    (α'_n)   (u_{n1} ... u_{nn}) (α_n)

and |α'_1|² + ... + |α'_n|² = |α_1|² + ... + |α_n|². It turns out that the matrices satisfying this requirement are unitary matrices (see Section 8.3.1 for details). Unitarity of a matrix A means that the transpose complex conjugate of A, denoted by A*, is the inverse matrix of A. Matrix A* is also called the adjoint matrix⁶ of A. By saying that the time evolution of quantum systems is unitary, several authors mean that the evolution of quantum systems is determined by unitary matrices. Notice that unitary time evolution has very interesting consequences: unitarity especially means that the time evolution of a quantum system is invertible. In fact, (α_1, ..., α_n) can be perfectly recovered from (α'_1, ..., α'_n), since the matrix in (1.8) has an inverse.

⁶ Among physics-oriented authors, there is also a widespread tradition of denoting the adjoint matrix by A†.
We interrupt the description of quantum systems for a moment to discuss the differences between probabilistic and quantum systems. As we have said before, a mixed state (probability distribution)

    p_1 [x_1] + ... + p_n [x_n]   (1.9)

of a probabilistic system and a superposition

    α_1 |x_1⟩ + ... + α_n |x_n⟩   (1.10)

of a quantum system formally resemble each other very closely, and therefore it is quite natural to ask whether (1.10) can be seen as a generalization of (1.9). At first glance, the interpretation that (1.10) induces the probability distribution such that |α_i|² is the probability of observing x_i may seem like only a technical difference. Can (1.10) also be interpreted to represent our ignorance, such that the system in state (1.10) is actually in some state |x_i⟩ with a probability of |α_i|²? The answer is absolutely no. A fundamental difference can be found even by recalling that the orthonormal basis of H_n can be chosen freely. We can, in fact, choose an orthonormal basis |x'_1⟩, ..., |x'_n⟩ such that

    α_1 |x_1⟩ + ... + α_n |x_n⟩ = |x'_1⟩,

so, with respect to the new basis, the state of the system is simply |x'_1⟩. The new basis refers to another physical observable, and with respect to this observable, the system may have some of the properties x'_1, ..., x'_n. But in the state |x'_1⟩, the system is seen to have property x'_1 with a probability of 1.

Example 1.4.2. Consider a quantum system with two basis states t and h (of course this system is the same as a quantum bit; only the terminology is different for illustration). Here we call this system a quantum coin. We consider a time evolution

    |h⟩ ↦ (1/√2) |h⟩ + (1/√2) |t⟩,
    |t⟩ ↦ (1/√2) |h⟩ − (1/√2) |t⟩,

which we here call a fair coin toss (verify that the time evolution is unitary). Notice that, beginning with either state |h⟩ or |t⟩, the state after the toss is one of the above states on the right-hand side, and that both of them have the property that h and t will both be observed with a probability of 1/2. Imagine then that we begin with state |h⟩ and perform the fair coin toss twice. After the first toss, the state is as above, but if we do not observe the system, the state will be |h⟩ again after the second toss (verify). The phenomenon that t cannot be observed after the second toss clearly cannot take place in any probabilistic system.
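The two tosses are easy to verify numerically. A minimal Python/NumPy sketch, under the coordinate convention |h⟩ = (1, 0)^T and |t⟩ = (0, 1)^T:

    import numpy as np

    # The fair coin toss of Example 1.4.2 as a unitary matrix.
    toss = np.array([[1,  1],
                     [1, -1]]) / np.sqrt(2)

    h = np.array([1.0, 0.0])
    after_one = toss @ h            # (1/sqrt(2))|h> + (1/sqrt(2))|t>
    after_two = toss @ after_one    # interference restores |h> exactly

    print(np.abs(after_one) ** 2)   # [0.5 0.5]: h and t equally likely
    print(after_two)                # [1. 0.]: t can no longer be observed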

Remark 1.4.1. In quantum mechanics, a state (1.10) is a pure state: it contains the maximal information of the system properties. The mixed states of quantum systems are discussed in Section 8.4.

Let us now continue the description of quantum systems by introducing two systems with basis states |x_1⟩, ..., |x_n⟩ and |y_1⟩, ..., |y_m⟩, respectively. As in the case of classical and probabilistic systems, the basis states of the compound system can be thought of as pairs (|x_i⟩, |y_j⟩). It is natural to represent the general states of the compound system as

    ∑_{i=1}^{n} ∑_{j=1}^{m} α_{ij} |x_i, y_j⟩,   (1.11)

where ∑_{i=1}^{n} ∑_{j=1}^{m} |α_{ij}|² = 1. A natural mathematical framework for representations (1.11) is the tensor product of the state spaces of the subsystems, and now we will briefly explain what is meant by a tensor product.
Let H_n and H_m be two vector spaces with bases {x_1, ..., x_n} and {y_1, ..., y_m}. The tensor product of spaces H_n and H_m is denoted by H_n ⊗ H_m. Space H_n ⊗ H_m has the ordered pairs (x_i, y_j) as a basis; thus H_n ⊗ H_m has dimension mn. We also denote (x_i, y_j) = x_i ⊗ y_j and say that x_i ⊗ y_j is the tensor product of basis vectors x_i and y_j. Tensor products of vectors other than basis vectors are defined by requiring that the product is bilinear:

    (∑_{i=1}^{n} α_i x_i) ⊗ (∑_{j=1}^{m} β_j y_j) = ∑_{i=1}^{n} ∑_{j=1}^{m} α_i β_j x_i ⊗ y_j.

Since the x_i ⊗ y_j form the basis of H_n ⊗ H_m, the notion of the tensor product of vectors is perfectly well established, but notice carefully that the tensor product is not commutative. In connection with representing quantum systems, we usually omit the symbol ⊗ and use notations closer to the original idea of regarding x_i ⊗ y_j as a pair (x_i, y_j):

    |x_i⟩ ⊗ |y_j⟩ = |x_i⟩ |y_j⟩ = |x_i, y_j⟩ = |x_i y_j⟩.

If (1.11) can be represented as

    ∑_{i=1}^{n} ∑_{j=1}^{m} α_{ij} |x_i, y_j⟩ = (∑_{i=1}^{n} α_i |x_i⟩) ⊗ (∑_{j=1}^{m} β_j |y_j⟩),

we say that the compound system is in a decomposable state. Otherwise, the system state is entangled. It is plain to verify that the notion "decomposable state" is independent of the bases chosen for the spaces.
Since H_l ⊗ (H_m ⊗ H_n) is clearly isomorphic to (H_l ⊗ H_m) ⊗ H_n, we can omit the parentheses and refer to this tensor product as H_l ⊗ H_m ⊗ H_n. Thus, we can inductively define the tensor products of more than two spaces.

Example 1.4.3. A system of m quantum bits is described by an m-fold tensor product of the two-dimensional Hilbert space H_2. Thus the system has basis states |x_1⟩ |x_2⟩ ... |x_m⟩, where (x_1, ..., x_m) ∈ {0, 1}^m. The dimension of this space is 2^m, and the system of m qubits is referred to as the quantum register of length m.
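In coordinates, a basis state of a register is an iterated Kronecker product of the single-qubit vectors; a minimal Python/NumPy sketch, under the convention |0⟩ = (1, 0)^T and |1⟩ = (0, 1)^T:

    import numpy as np

    ket0 = np.array([1, 0])
    ket1 = np.array([0, 1])

    # The basis state |0>|1>|1> of a length-3 register: its coordinate vector
    # in the 2^3 = 8-dimensional space has a single 1 at position 011 = 3.
    ket011 = np.kron(np.kron(ket0, ket1), ket1)
    print(ket011)    # [0 0 0 1 0 0 0 0]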

We conclude this section by listing the key features of a finite-state quantum system in the state-vector formalism.
• The basis states of an n-level quantum system form an orthonormal basis of the system state space, the Hilbert space H_n. This basis can be chosen freely, and it refers to a particular observable.
• A state of a quantum system is a unit-length vector

    α_1 |x_1⟩ + ... + α_n |x_n⟩.   (1.12)

These states are called superpositions of basis states, and they induce a probability distribution such that when one observes (1.12), a property x_i is seen with probability |α_i|².
• If two quantum systems are depicted by using state spaces H_n and H_m, then the state space of the compound system consisting of these two subsystems is described by the tensor product H_m ⊗ H_n.
• The states of quantum systems change via unitary transformations.
2. Quantum Information

It must first be noted that, despite the title, quantum information theory is not the main topic of this chapter. Instead, we will merely concentrate on descriptional differences between presenting information by using classical and quantum systems. In fact, in this chapter, we will present the fundamentals of quantum information processing. A reader well aware of basic linear algebra will presumably have no difficulties in following this section, but for a reader feeling some uncertainty, we recommend consulting Sections 8.1, 8.2, and the initial part of Section 8.3. Moreover, the basic notions of linear algebra are outlined in Section 9.3.

2.1 Quantum Bits


Definition 2.1.1. A quantum bit, qubit for short, is a two-level quantum system. Because there should not be any danger of confusion, we also say that the two-dimensional Hilbert space H_2 is a quantum bit. Space H_2 is equipped with a fixed basis B = {|0⟩, |1⟩}, a so-called computational basis. States |0⟩ and |1⟩ are also called basis states.

A general state of a single quantum bit is a vector

    c_0 |0⟩ + c_1 |1⟩   (2.1)

that has unit length, i.e., |c_0|² + |c_1|² = 1. Numbers c_0 and c_1 are called the amplitudes of |0⟩ and |1⟩, respectively.
We say that an observation of a quantum bit in state (2.1) will give 0 or 1 as an outcome with probabilities |c_0|² and |c_1|², respectively.

Remark 2.1.1. The coordinate representation for quantum bits is chosen as |0⟩ = (1, 0)^T and |1⟩ = (0, 1)^T. The examples below concerning quantum gates assume that this representation is used.

Definition 2.1.2. An operation on a qubit, called a unary quantum gate, is a unitary mapping U : H_2 → H_2.

In other words, a unary quantum gate defines a linear operation

    |0⟩ ↦ a |0⟩ + b |1⟩,   (2.2)
    |1⟩ ↦ c |0⟩ + d |1⟩,   (2.3)

such that the matrix

    M = ( a  c )
        ( b  d )

is unitary, i.e.,

    ( a  c ) ( a*  b* )   ( 1  0 )
    ( b  d ) ( c*  d* ) = ( 0  1 ).
Remark 2.1.2. In the coordinate representation, (2.2) can be written as

    ( a  c ) ( 1 )   ( a )
    ( b  d ) ( 0 ) = ( b ),

and (2.3) can be written as

    ( a  c ) ( 0 )   ( c )
    ( b  d ) ( 1 ) = ( d ).

Notation a* stands for the complex conjugate of the complex number a. Notation A* will also be used for different purposes, but this should cause no misunderstandings; the meaning of a particular *-symbol should be clear from the context. In what follows, notation (a, b)^T is used to indicate transposition, i.e., the column vector with entries a and b.

Example 2.1.1. Let us use the coordinate representation |0⟩ = (1, 0)^T and |1⟩ = (0, 1)^T. Then the unitary matrix

    M_¬ = ( 0  1 )
          ( 1  0 )

defines an action M_¬ |0⟩ = |1⟩, M_¬ |1⟩ = |0⟩. A unary quantum gate defined by M_¬ is called a quantum not-gate.

Example 2.1.2. Let us examine a quantum gate defined by the unitary matrix

    √M_¬ = ( (1+i)/2  (1−i)/2 )
           ( (1−i)/2  (1+i)/2 ).

The action of √M_¬ is

    √M_¬ |0⟩ = (1+i)/2 |0⟩ + (1−i)/2 |1⟩,   (2.4)
    √M_¬ |1⟩ = (1−i)/2 |0⟩ + (1+i)/2 |1⟩.   (2.5)

Since

    |(1+i)/2|² = |(1−i)/2|² = 1/2,

observation of (2.4) and (2.5) will give 0 or 1 as the outcome, both with a probability of 1/2. Because √M_¬ √M_¬ = M_¬, gate √M_¬ is called the square root of the not-gate.
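Both claims, the unitarity of √M_¬ and the identity √M_¬ √M_¬ = M_¬, can be checked with a few lines of Python/NumPy; a minimal sketch:

    import numpy as np

    sqrt_not = np.array([[1 + 1j, 1 - 1j],
                         [1 - 1j, 1 + 1j]]) / 2

    # Unitarity: the adjoint (conjugate transpose) is the inverse.
    print(np.allclose(sqrt_not @ sqrt_not.conj().T, np.eye(2)))   # True

    # Squaring the gate yields the quantum not-gate M_not.
    print((sqrt_not @ sqrt_not).real)                             # [[0. 1.], [1. 0.]]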

Example 2.1.3. Let us study a quantum gate W_2 defined by the matrix

    W_2 = (1/√2) ( 1   1 )
                 ( 1  −1 ).

The action of W_2 is

    W_2 |0⟩ = (1/√2) |0⟩ + (1/√2) |1⟩,
    W_2 |1⟩ = (1/√2) |0⟩ − (1/√2) |1⟩.

Matrix W_2 is called a Walsh matrix, Hadamard matrix, or Hadamard-Walsh matrix and will eventually appear to be very useful. A very important feature of quantum gates that has already been mentioned implicitly is that they are linear, and therefore it suffices to know their action on the basis states. For example, if a Hadamard-Walsh gate acts on a state (1/√2)(|0⟩ + |1⟩), the outcome is

    W_2 (1/√2)(|0⟩ + |1⟩) = (1/√2)(W_2 |0⟩ + W_2 |1⟩)
                          = (1/2)(|0⟩ + |1⟩) + (1/2)(|0⟩ − |1⟩) = |0⟩.

Notice that the above equality reveals that W_2 W_2 |0⟩ = |0⟩. Similarly, we can verify that W_2 W_2 |1⟩ = |1⟩.
What happens to state |0⟩ when the Hadamard-Walsh gate W_2 is applied to it twice is a very interesting matter. Figure 2.1 contains a scheme of that event.
The top row of the figure contains the initial state |0⟩. When applied once, W_2 "splits" the state |0⟩ into states |0⟩ and |1⟩; both will be present with amplitude 1/√2. This is depicted in the figure's middle row. The second application of W_2 "splits" |0⟩ as before, but |1⟩ is split slightly differently: |1⟩ occurs with amplitude −1/√2.
[Figure 2.1 shows a two-level tree: the initial state |0⟩ on the top row; the middle row (1/√2)|0⟩ + (1/√2)|1⟩ after one application of W_2; and, after the second application, the bottom row (1/2)|0⟩ + (1/2)|1⟩ + (1/2)|0⟩ − (1/2)|1⟩.]

Fig. 2.1. Hadamard-Walsh gate twice. The left-hand side depicts how the application of W_2 operates on states, whereas the corresponding states are written on the right side.

The bottom row of the figure describes the final state. Now, the amplitudes in the bottom row can be computed by following the path from top to bottom and multiplying all the amplitudes occurring in the path. For example, the amplitude of the left-most |0⟩ in the bottom row is (1/√2) · (1/√2) = 1/2, whereas the amplitude of the right-most |1⟩ is (1/√2) · (−1/√2) = −1/2. When computing the outcome, we add up all the states in the lower right corner and get |0⟩ as the outcome.
The effect that the amplitudes of the states |0⟩ sum to more than any of the summands is called constructive interference. On the other hand, the cancellation of the states |1⟩ is referred to as destructive interference.

Remark 2.1.3. The efficiency of quantum algorithms is based on the interference phenomenon. In the forthcoming chapters, we will learn about the most important quantum algorithms in more detail.

Example 2.1.4. Let F be defined as

    F = ( 1   0 )
        ( 0  −1 ).

Then F acts as F |0⟩ = |0⟩, F |1⟩ = −|1⟩. Gate F is an example of the unary quantum gates called phase flips. In general, phase flips are of the form

    F_θ = ( 1   0      )
          ( 0   e^{iθ} )

for a real θ (here we have denoted F_π by F).


2.2 Quantum Registers

A system of two quantum bits is a four-dimensional Hilbert space H_4 = H_2 ⊗ H_2 having orthonormal basis {|0⟩|0⟩, |0⟩|1⟩, |1⟩|0⟩, |1⟩|1⟩}. We also write |0⟩|0⟩ = |00⟩, |0⟩|1⟩ = |01⟩, etc. A state of a two-qubit system is a unit-length vector

    c_0 |00⟩ + c_1 |01⟩ + c_2 |10⟩ + c_3 |11⟩,   (2.6)

so it is required that |c_0|² + |c_1|² + |c_2|² + |c_3|² = 1.


Observation of a two-qubit system in state (2.6) will give 00, 01, 10, and 11 as an outcome with probabilities |c_0|², |c_1|², |c_2|², and |c_3|², respectively. On the other hand, if we choose to observe only one of the qubits, the standard rules of probability will apply. This means that an observation of the first (resp., second) qubit will give 0 or 1 with probabilities |c_0|² + |c_1|² and |c_2|² + |c_3|² (resp., |c_0|² + |c_2|² and |c_1|² + |c_3|²).
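These marginal probabilities are straightforward to evaluate; a minimal Python/NumPy sketch with invented amplitudes:

    import numpy as np

    # Amplitudes (c0, c1, c2, c3) of |00>, |01>, |10>, |11>, chosen so that
    # |c0|^2 + |c1|^2 + |c2|^2 + |c3|^2 = 1.
    c = np.array([0.5, 0.5j, -0.5, 0.5j])
    probs = np.abs(c) ** 2

    p_first_0 = probs[0] + probs[1]     # first qubit observed as 0
    p_second_0 = probs[0] + probs[2]    # second qubit observed as 0
    print(p_first_0, p_second_0)        # 0.5 0.5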
Remark 2.2.1. Notice that here the tensor product of vectors does not commute: |0⟩|1⟩ ≠ |1⟩|0⟩. We use linear ordering (write from left to right) to address the qubits individually.

Recall that the state z ∈ H_4 of a two-qubit system is decomposable if z can be written as a product of states in H_2, z = x ⊗ y. A state that is not decomposable is entangled.

Example 2.2.1. State (1/2)(|00⟩ + |01⟩ + |10⟩ + |11⟩) is decomposable, since

    (1/2)(|0⟩|0⟩ + |0⟩|1⟩ + |1⟩|0⟩ + |1⟩|1⟩) = (1/√2)(|0⟩ + |1⟩) ⊗ (1/√2)(|0⟩ + |1⟩),

as is easily verified. On the other hand, state (1/√2)(|00⟩ + |11⟩) is entangled. To see this, assume on the contrary that

    (1/√2)(|00⟩ + |11⟩) = (a_0 |0⟩ + a_1 |1⟩)(b_0 |0⟩ + b_1 |1⟩)
                        = a_0 b_0 |00⟩ + a_0 b_1 |01⟩ + a_1 b_0 |10⟩ + a_1 b_1 |11⟩

for some complex numbers a_0, a_1, b_0, and b_1. But then a_0 b_0 = a_1 b_1 = 1/√2 while a_0 b_1 = a_1 b_0 = 0, which is absurd.
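The contradiction above generalizes to a simple criterion: c_0|00⟩ + c_1|01⟩ + c_2|10⟩ + c_3|11⟩ is decomposable exactly when c_0 c_3 = c_1 c_2, i.e., when the 2 × 2 matrix of amplitudes has rank 1. A minimal Python/NumPy sketch of this test:

    import numpy as np

    def is_decomposable(c, tol=1e-12):
        # c0*c3 - c1*c2 is the determinant of the amplitude matrix
        # ((c0, c1), (c2, c3)); it vanishes exactly for product states.
        c = np.asarray(c, dtype=complex)
        return abs(c[0] * c[3] - c[1] * c[2]) < tol

    print(is_decomposable([0.5, 0.5, 0.5, 0.5]))                  # True
    print(is_decomposable(np.array([1, 0, 0, 1]) / np.sqrt(2)))   # False: entangled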

Remark 2.2.2. If two qubits are in an entangled state (1/√2)(|00⟩ + |11⟩), then observing one of them will give 0 or 1, both with a probability of 1/2, but it is not possible to observe different values on the qubits. It is interesting to notice that experiments have shown that this correlation can remain even if the qubits are spatially separated by more than 10 km [86]. This distant correlation opens opportunities for quantum cryptography and quantum communication protocols. A pair of qubits in state (1/√2)(|00⟩ + |11⟩) is called an EPR pair.¹ Notice that if both qubits of the EPR pair are run through a Hadamard gate, the resulting state is again (1/√2)(|00⟩ + |11⟩), so it is impossible to give an ignorance interpretation for the EPR pair. EPR pairs are extremely important in many quantum protocols, and we will discuss their usefulness later.

¹ EPR stands for Einstein, Podolsky, and Rosen, who first regarded the distant correlation as the source of a paradox of quantum physics [88].

Definition 2.2.1. A binary quantum gate is a unitary mapping H_4 → H_4.

To define the operation of a binary quantum gate, we use the coordinate representation |00⟩ = (1, 0, 0, 0)^T, |01⟩ = (0, 1, 0, 0)^T, |10⟩ = (0, 0, 1, 0)^T, and |11⟩ = (0, 0, 0, 1)^T.
Example 2.2.2. Matrix

    M_cnot = ( 1  0  0  0 )
             ( 0  1  0  0 )
             ( 0  0  0  1 )
             ( 0  0  1  0 )

defines a unitary mapping whose action on the basis states is M_cnot |00⟩ = |00⟩, M_cnot |01⟩ = |01⟩, M_cnot |10⟩ = |11⟩, M_cnot |11⟩ = |10⟩. Gate M_cnot is called controlled not, since the second qubit (the target qubit) is flipped if and only if the first qubit (the control qubit) is 1.

Other examples of multiqubit states and their gates will be given after the following important definition.

Definition 2.2.2. The tensor product, also called the Kronecker product, of an r × s matrix A and a t × u matrix B,

    A = ( a_{11} a_{12} ... a_{1s} )        B = ( b_{11} b_{12} ... b_{1u} )
        ( a_{21} a_{22} ... a_{2s} )  and       ( b_{21} b_{22} ... b_{2u} )
        (   ...    ...        ...  )            (   ...    ...        ...  )
        ( a_{r1} a_{r2} ... a_{rs} )            ( b_{t1} b_{t2} ... b_{tu} ),

is the rt × su matrix defined as

    A ⊗ B = ( a_{11}B a_{12}B ... a_{1s}B )
            ( a_{21}B a_{22}B ... a_{2s}B )
            (    ...     ...         ...  )
            ( a_{r1}B a_{r2}B ... a_{rs}B ).

If M_1 and M_2 are 2 × 2 matrices that describe unary quantum gates, then it is easy to verify that the joint action of M_1 on the first qubit and M_2 on the second one is described by the matrix M_1 ⊗ M_2. This also generalizes to quantum systems of any size: if matrices M_1 and M_2 define unitary mappings on Hilbert spaces H_n and H_m, then the nm × nm matrix M_1 ⊗ M_2 defines a unitary mapping on the space H_n ⊗ H_m. The action of M_1 ⊗ M_2 is exactly the same as the action of M_1 on H_n, followed by the action of M_2 on H_m (or vice versa). In particular, the mapping M_1 ⊗ M_2 cannot introduce entanglement between the systems H_n and H_m.
Example 2.2.3. Using the Kronecker product of the previous definition, the coordinate representations $|0\rangle = \begin{pmatrix}1\\0\end{pmatrix}$ and $|1\rangle = \begin{pmatrix}0\\1\end{pmatrix}$ induce the coordinate representations for $|0\rangle|0\rangle = |0\rangle \otimes |0\rangle$, $|0\rangle|1\rangle$, $|1\rangle|0\rangle$, and $|1\rangle|1\rangle$ in a very natural way:

$$\begin{pmatrix}1\\0\end{pmatrix} \otimes \begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}1\\0\\0\\0\end{pmatrix}, \quad \begin{pmatrix}1\\0\end{pmatrix} \otimes \begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}0\\1\\0\\0\end{pmatrix}, \quad \begin{pmatrix}0\\1\end{pmatrix} \otimes \begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}0\\0\\1\\0\end{pmatrix}, \quad \text{and} \quad \begin{pmatrix}0\\1\end{pmatrix} \otimes \begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}0\\0\\0\\1\end{pmatrix}.$$

Similarly, we can get coordinate representations of triples of qubits, etc. Notice that the above coordinate representations agree with those of Definition 2.2.1 and that it is not a mere coincidence!
Example 2.2.4. Let $M_1 = M_2 = W_2$ be the Hadamard matrix of the previous section. Action on both qubits with a Hadamard-Walsh matrix can be seen as a binary quantum gate, whose matrix is defined by

$$W_4 = W_2 \otimes W_2 = \frac{1}{2}\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix},$$

and the action of $W_4$ can be written as

$$W_4|x_0x_1\rangle = \frac{1}{2}(|0\rangle + (-1)^{x_0}|1\rangle)(|0\rangle + (-1)^{x_1}|1\rangle) = \frac{1}{2}\big(|00\rangle + (-1)^{x_1}|01\rangle + (-1)^{x_0}|10\rangle + (-1)^{x_0+x_1}|11\rangle\big) \qquad (2.7)$$

for any $x_0, x_1 \in \{0, 1\}$.
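The identity $W_4 = W_2 \otimes W_2$ and the action (2.7) are easy to verify with a Kronecker-product routine; a minimal sketch of ours using numpy's `kron`:

```python
import numpy as np

W2 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard matrix
W4 = np.kron(W2, W2)                            # the Kronecker product W2 (x) W2

print(np.round(2 * W4))                          # the +-1 pattern of the matrix above
print(np.allclose(W4 @ W4.conj().T, np.eye(4)))  # W4 is unitary

# Action on |x0 x1> = |01>: by (2.7) this should be
# (|00> - |01> + |10> - |11>)/2.
e01 = np.array([0, 1, 0, 0])
print(W4 @ e01)                                  # [ 0.5 -0.5  0.5 -0.5]
```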


Remark 2.2.3. Note that state (2.7) is decomposable. This is, of course, natural since the initial state $|x_0x_1\rangle$ is decomposable, and also $W_4$ decomposes into a product of unary gates, $W_4 = W_2 \otimes W_2$. On the other hand, $M_{\mathrm{cnot}}$ cannot be expressed as a tensor product of $2 \times 2$ matrices. This could, of course, be seen by assuming the contrary, as in Example 2.2.1, but we can use another argument. Consider a two-qubit system in an initial state $|00\rangle$. The action of the Hadamard-Walsh gate on the first qubit transforms this into a state

$$\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)|0\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |10\rangle). \qquad (2.8)$$

But the action of $M_{\mathrm{cnot}}$ turns the decomposable state (2.8) into the entangled state $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$. Because $M_{\mathrm{cnot}}$ introduces entanglement, it cannot be a tensor product of two unary quantum gates.
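The two steps of this argument can be replayed numerically (a sketch of ours; $W_2$ and $M_{\mathrm{cnot}}$ as above, $I$ the $2 \times 2$ identity):

```python
import numpy as np

W2 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)
M_cnot = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                   [0, 0, 0, 1], [0, 0, 1, 0]])

state = np.array([1, 0, 0, 0])          # |00>
state = np.kron(W2, I2) @ state         # Hadamard on the first qubit: state (2.8)
print(state)                            # [0.707, 0, 0.707, 0]
state = M_cnot @ state                  # entangled: (|00> + |11>)/sqrt(2)
print(state)                            # [0.707, 0, 0, 0.707]
```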

By a quantum register of length $m$, we understand an ordered system of $m$ qubits. The state space of such a system is the $m$-fold tensor product $H_{2^m} = H_2 \otimes \cdots \otimes H_2$, and the basis states are $\{|x\rangle \mid x \in \{0,1\}^m\}$. Identifying $x = x_1 \ldots x_m$ with the binary representation of a number, we can also say that the basis states of an $m$-qubit register are $\{|a\rangle \mid a \in \{0, 1, \ldots, 2^m - 1\}\}$.

A peculiar feature of the exponential packing density associated with quantum systems can be plainly seen here: the general state of a system of $m$ quantum bits is

$$c_0|0\rangle + c_1|1\rangle + \cdots + c_{2^m-1}|2^m - 1\rangle,$$

where $|c_0|^2 + |c_1|^2 + \cdots + |c_{2^m-1}|^2 = 1$. That is to say, a general description of a system of $m$ two-state quantum systems requires $2^m$ complex numbers. Hence, the description size of the system is exponential in its physical size.

The time evolution of an $m$-qubit system is determined by unitary mappings in $H_{2^m}$. The size of the matrix of such a mapping is $2^m \times 2^m$, also exponential in the physical size of the system.² A more detailed explanation of quantum register processing is provided in Section 3.2.3.

² It should not be any more surprising that Feynman found the effective deterministic simulation of a quantum system difficult. Due to the interference effects, it also seems to be difficult to simulate a quantum system efficiently with a probabilistic computer.

2.3 No-Cloning Theorem

So far, we have been talking about qubits, but everything also generalizes to $n$-ary quantum digits: if we have an alphabet $A = \{a_1, \ldots, a_n\}$, we can identify the letters with the basis states $|a_1\rangle, \ldots, |a_n\rangle$ of an $n$-level quantum system. We say that such a basis is a quantum representation of alphabet $A$. The key features of quantum representations are already listed at the end of the introductory chapter:
• A set of $n$ elements can be identified with the vectors of an orthonormal basis of an $n$-dimensional complex vector space $H_n$. We call $H_n$ a state space. When we have a fixed basis $|a_1\rangle, \ldots, |a_n\rangle$, we call these vectors basis states. Also, the basis that we choose to fix is usually called a computational basis.
• A general state of a quantum system is a unit-length vector in the state space. If $a_1|a_1\rangle + \cdots + a_n|a_n\rangle$ is a state, then the system is seen in state $a_i$ with a probability of $|a_i|^2$.
• The state space of a compound system consisting of two subsystems is the tensor product of the subsystem state spaces.
• In this formalism, the state transformations are length-preserving linear mappings. It can be shown that these mappings are exactly the unitary mappings in the state space.
In the above list, we have operations that can be done with quantum systems (under this chosen formalism). We now present a somewhat surprising result called the "No-Cloning Theorem" due to W. K. Wootters and W. H. Zurek [91]. Consider a quantum system having $n$ basis states $|a_1\rangle, \ldots, |a_n\rangle$. Let us denote the state space by $H_n$ and specify that the state $|a_1\rangle$ is a "blank sheet state". A unitary mapping $U$ in $H_n \otimes H_n$ is called a quantum copy machine if, for any state $|x\rangle \in H_n$,

$$U(|x\rangle|a_1\rangle) = |x\rangle|x\rangle.$$
Theorem 2.3.1 (No-Cloning Theorem). For $n > 1$, there is no quantum copy machine.

Proof. Assume that a quantum copy machine $U$ exists, even if $n > 1$. Because $n > 1$, there are two orthogonal states $|a_1\rangle$ and $|a_2\rangle$. We should have $U(|a_1\rangle|a_1\rangle) = |a_1\rangle|a_1\rangle$ and $U(|a_2\rangle|a_1\rangle) = |a_2\rangle|a_2\rangle$, and also

$$U\Big(\frac{1}{\sqrt{2}}(|a_1\rangle + |a_2\rangle)|a_1\rangle\Big) = \Big(\frac{1}{\sqrt{2}}(|a_1\rangle + |a_2\rangle)\Big)\Big(\frac{1}{\sqrt{2}}(|a_1\rangle + |a_2\rangle)\Big) = \frac{1}{2}(|a_1\rangle|a_1\rangle + |a_1\rangle|a_2\rangle + |a_2\rangle|a_1\rangle + |a_2\rangle|a_2\rangle).$$

But since $U$ is linear,

$$U\Big(\frac{1}{\sqrt{2}}(|a_1\rangle + |a_2\rangle)|a_1\rangle\Big) = \frac{1}{\sqrt{2}}U(|a_1\rangle|a_1\rangle) + \frac{1}{\sqrt{2}}U(|a_2\rangle|a_1\rangle) = \frac{1}{\sqrt{2}}|a_1\rangle|a_1\rangle + \frac{1}{\sqrt{2}}|a_2\rangle|a_2\rangle.$$

The above two representations for $U(\frac{1}{\sqrt{2}}(|a_1\rangle + |a_2\rangle)|a_1\rangle)$ do not coincide by the very definition of a tensor product, a contradiction. □

The No-Cloning Theorem thus states that there is no allowed operation (unitary mapping) that would produce a copy of an arbitrary quantum state. Notice also that in the above proof, we did not use unitarity; only the linearity of the time-evolution mapping was needed. If, however, we are satisfied with cloning only the basis states, there is a solution: let $I = \{1, \ldots, n\}$ be the set of indices. The partially defined mapping $f : I \times I \to I \times I$, $f(i,1) = (i,i)$, is clearly injective, so we can complete the definition (in many ways) such that $f$ becomes a permutation of $I \times I$ and still satisfies $f(i,1) = (i,i)$. Let $f(i,j) = (i',j')$. Then the linear mapping defined by $U|a_i\rangle|a_j\rangle = |a_{i'}\rangle|a_{j'}\rangle$ is a permutation of the basis vectors of $H_n \otimes H_n$, and any such permutation is unitary, as is easily verified. Moreover, $U(|a_i\rangle|a_1\rangle) = |a_i\rangle|a_i\rangle$, so $U$ is a copy machine operation on basis vectors.
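For qubits ($n = 2$, blank state $|0\rangle$), the basis-state copy machine just constructed is exactly the controlled-not gate of Example 2.2.2, and the contradiction in the proof can be watched numerically: applied to a superposition, the copier produces an entangled state instead of two independent copies. A sketch of ours:

```python
import numpy as np

M_cnot = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                   [0, 0, 0, 1], [0, 0, 1, 0]])  # copies basis states

blank = np.array([1, 0])                      # |0>, the "blank sheet"
plus = np.array([1, 1]) / np.sqrt(2)          # (|0> + |1>)/sqrt(2)

copied = M_cnot @ np.kron(plus, blank)        # what linearity forces
true_clone = np.kron(plus, plus)              # what cloning would require
print(copied)       # (|00> + |11>)/sqrt(2)  -- entangled
print(true_clone)   # (|00> + |01> + |10> + |11>)/2
print(np.allclose(copied, true_clone))        # False: no cloning
```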

Remark 2.3.1. A. K. Pati and S. L. Braunstein introduced a principle complementary to the no-cloning principle [66]. We will return to this no-deleting principle in Section 8.4.6, where we also represent a stronger variant of the No-Cloning Theorem due to R. Jozsa.

Remark 2.3.2. Notice that, when defining the unitary mapping $U(|a_i\rangle|a_1\rangle) = |a_i\rangle|a_i\rangle$, we do not need to assume that the second system looks like the first one; the only thing we need is that the second system must have at least as many basis states as the first one. We could, therefore, also define a unitary mapping $U|a_i\rangle|b_1\rangle = |a_i\rangle|b_i\rangle$, where $|b_1\rangle, \ldots, |b_m\rangle$ ($m \geq n$) are the basis states of some other system. What is interesting here is that we could regard the second system as a measurement apparatus designed to observe the state of the first system. Thus, $|b_1\rangle$ could be interpreted as the "initial pointer state", and $U$ as a "measurement interaction". Measurements that can be described in this fashion are called von Neumann-Lüders measurements. The measurement interaction is also discussed in Section 8.4.3.

2.4 Observation

So far, we have tacitly assumed that quantum systems are used for probabilistic information processing according to the following scheme:
• The system is first prepared in an initial basis state.
• Next, the unitary information processing is carried out.
• Finally, we observe the system to see the outcome.

What is missing in the above procedure is that we have not considered the possibility of making an intermediate observation³ during unitary processing. In other words, we have not discussed the effect of observation upon the quantum system state. For a more systematic treatment of observation, the reader is advised to study Section 8.3.2. For now, we will present, in a simplified way, the most widely used method for handling state changes during an observation procedure.

³ In physics literature, the term "observation" is usually replaced with "measurement".

Suppose that a system in state

$$\alpha_1|x_1\rangle + \cdots + \alpha_n|x_n\rangle$$

is observed with $x_k$ as the outcome. According to the projection postulate,⁴ the state of the system after observation is $|x_k\rangle$. Suppose, then, that we have
a compound system in state

$$\sum_{i=1}^{n}\sum_{j=1}^{m} \alpha_{ij}|x_i\rangle|y_j\rangle \qquad (2.9)$$

and the first system is observed with outcome $x_k$ (notice that the probability of observing $x_k$ is $P(k) = \sum_{j=1}^{m}|\alpha_{kj}|^2$). The projection postulate now implies that the postobservation state of the whole system is

$$\frac{1}{\sqrt{P(k)}}\,|x_k\rangle\sum_{j=1}^{m}\alpha_{kj}|y_j\rangle. \qquad (2.10)$$

In other words, the initial state (2.9) of the system is projected to the subspace that corresponds to the observed state and renormalized to the unit length. It is now worth strongly emphasizing that the state evolution from (2.9) to (2.10) given by the projection postulate is not consistent with the unitary time evolution: unitary evolution is always reversible, but there is no way to recover state (2.9) from (2.10). In fact, no explanation for the observation process which is consistent with quantum mechanics (using a certain interpretation) has ever been discovered. The difficulty arising when trying to find such an explanation is usually referred to as the measurement paradox of quantum physics.
However, instead of having intermediate observations that cause a "collapse of a state vector" from (2.9) to (2.10), we can have an auxiliary system with a unitary measurement interaction (basis state copy machine) $|x_i\rangle|x_1\rangle \mapsto |x_i\rangle|x_i\rangle$. That is, we replace the collapse (2.9) $\mapsto$ (2.10) by a measurement interaction, which turns state

$$\Big(\sum_{i=1}^{n}\sum_{j=1}^{m}\alpha_{ij}|x_i\rangle|y_j\rangle\Big)|x_1\rangle = \sum_{i=1}^{n}\sum_{j=1}^{m}\alpha_{ij}|x_i\rangle|y_j\rangle|x_1\rangle \qquad (2.11)$$

into

$$\sum_{i=1}^{n}\sum_{j=1}^{m}\alpha_{ij}|x_i\rangle|y_j\rangle|x_i\rangle. \qquad (2.12)$$

⁴ The projection postulate can be seen as an ad hoc explanation for the state transform during the measurement process.

Even though the transformation from (2.11) to (2.12) appears different from (2.9) $\mapsto$ (2.10) at first glance, it has many similar features. In fact, using the notation $P(k) = \sum_{j=1}^{m}|\alpha_{kj}|^2$ again, we can rewrite (2.12) as

$$\sum_{k=1}^{n}\sqrt{P(k)}\Big(\frac{1}{\sqrt{P(k)}}\sum_{j=1}^{m}\alpha_{kj}|x_k\rangle|y_j\rangle\Big)|x_k\rangle. \qquad (2.13)$$

Let us now interpret (2.12) (and (2.13)): the third system which we introduced shatters the whole system into $n$ orthogonal subspaces which are, for any $k \in \{1, \ldots, n\}$, spanned by vectors $|x_i\rangle|y_j\rangle|x_k\rangle$, $i \in \{1, \ldots, n\}$, $j \in \{1, \ldots, m\}$. The left multipliers of vectors $\sqrt{P(k)}|x_k\rangle$ are unit-length vectors, exactly the same as in (2.10). Moreover, the probability of observing $x_k$ in the right-most system of (2.12) is $P(k)$. Therefore, it should be clear that, if quantum information processing continues independently of the third system, then the final probability distribution is the same if operation (2.9) $\mapsto$ (2.10) is replaced with operation (2.11) $\mapsto$ (2.12). For this reason, we will not consider intermediate observations in this book, but may refer to them only as notational simplifications of procedure (2.11) $\mapsto$ (2.12).
Anyway, it is an undenied fact that the observation procedure always disturbs the original system. If it becomes necessary to refer to a quantum system after observation, we will mainly adopt the projection postulate or even assume that the whole system is lost after observation.
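The collapse (2.9) $\mapsto$ (2.10) is simple to simulate for a two-qubit system; the sketch below (ours; function and variable names are illustrative) samples an outcome for the first qubit and returns the renormalized postobservation state:

```python
import numpy as np

def observe_first_qubit(state, rng=np.random.default_rng()):
    """state: amplitudes (c00, c01, c10, c11). Observes the first qubit,
    projects onto the observed subspace, and renormalizes, as in (2.10)."""
    c = state.reshape(2, 2)                 # c[i, j] multiplies |i>|j>
    probs = (np.abs(c) ** 2).sum(axis=1)    # P(0), P(1) for the first qubit
    k = rng.choice(2, p=probs)              # sample the outcome
    post = np.zeros_like(c)
    post[k] = c[k] / np.sqrt(probs[k])      # project and renormalize
    return k, post.reshape(4)

epr = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
outcome, post = observe_first_qubit(epr)
print(outcome, post)   # outcome 0 -> |00>, outcome 1 -> |11>
```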

2.5 Quantum Teleportation

In the cryptography community, there is a tradition to call two communicating parties not only by letters A and B but by the names Alice and Bob. Since this convention brings some life to the text, we will also use it here.

Assume now that Alice has a single qubit in state

$$a|0\rangle + b|1\rangle, \qquad (2.14)$$

but state (2.14) is unknown to Alice. Now Alice wants to send state (2.14) to Bob.
One, at least theoretical, possibility is to send Bob the whole two-state quantum system that is in state (2.14). We say that there is a quantum channel from Alice to Bob if this is possible. Similarly, if Alice can send classical bits to Bob, we say that there is a classical channel from Alice to Bob.

Now, we assume that there is no quantum channel from Alice to Bob, but a classical one exists. Alice's task, to send state (2.14) to Bob, appears quite impossible. Notice that state (2.14) is unknown to Alice, so she cannot send Bob instructions for constructing (2.14).

Alice may try, for instance, to observe her state and then send Bob the outcome, 0 or 1. This attempt fails immediately if both $a$ and $b$ are nonzero; Bob cannot reconstruct state (2.14). Alice cannot make many observations either, because her observation always disturbs her qubit. Alice cannot make copies of her qubit for many observations, since copying of an unknown quantum state is impossible, as shown in Section 2.3. If an unlimited number of observations were allowed to Alice, she could get arbitrarily precise approximations of the probabilities of seeing 0 and 1 as outcomes, which is to say that she could get arbitrarily good approximations of the numbers $|a|^2$ and $|b|^2$. But even the exact values of $|a|^2$ and $|b|^2$ are not enough for Bob to reconstruct state (2.14). This can be noticed immediately, for those values are identical for states

$$\frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle \qquad (2.15)$$

and

$$\frac{1}{\sqrt{2}}|0\rangle - \frac{1}{\sqrt{2}}|1\rangle. \qquad (2.16)$$

Nevertheless, states (2.15) and (2.16) can behave quite differently, as seen by applying a Hadamard-Walsh gate (see Section 2.2) on both states: the outcomes will be $|0\rangle$ and $|1\rangle$, respectively (see Exercise 1).
In fact, it is impossible for Alice to send her quantum bit to Bob by only
using a classical channel. To see this, it is enough to notice that classical
information can be perfectly cloned: if there were a way to reconstruct state
(2.14) from some classical information (which state (2.14) itself determines),
then we would be able to make an unlimited number of reconstructions of
(2.14). But this means that we would be able to create a quantum copy
machine, and that was already proven impossible.
On the other hand, if Alice and Bob initially share an EPR pair, there is a way to execute the required task. This protocol, introduced in [11], is called quantum teleportation.⁵

⁵ One could argue that to create an EPR pair, Alice and Bob should have a quantum channel. On the other hand, Alice and Bob could have a supply of EPR pairs generated when they last met.

We will now describe the teleportation protocol. The basic assumption is that Alice and Bob have two qubits in the EPR state

$$\frac{1}{\sqrt{2}}|00\rangle + \frac{1}{\sqrt{2}}|11\rangle. \qquad (2.17)$$

Notations are chosen such that the qubit on the left-hand side belongs to Alice and the right-hand-side qubit belongs to Bob. In addition to the qubits (2.17), Alice has her qubit to be teleported in state

$$a|0\rangle + b|1\rangle.$$

The compound state of all of the qubits will be denoted as

$$(a|0\rangle + b|1\rangle)\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle) = \frac{a}{\sqrt{2}}|0\rangle|00\rangle + \frac{a}{\sqrt{2}}|0\rangle|11\rangle + \frac{b}{\sqrt{2}}|1\rangle|00\rangle + \frac{b}{\sqrt{2}}|1\rangle|11\rangle = \frac{a}{\sqrt{2}}|000\rangle + \frac{a}{\sqrt{2}}|011\rangle + \frac{b}{\sqrt{2}}|100\rangle + \frac{b}{\sqrt{2}}|111\rangle. \qquad (2.18)$$

Recall that only the right-most qubit belongs to Bob and that Alice has full access to the two qubits on the left.

Teleportation protocol

1. Alice performs the controlled-not operation on her qubits, using the left-most one as the control qubit (see Example 2.2.2). State (2.18) then becomes

$$\frac{a}{\sqrt{2}}|000\rangle + \frac{a}{\sqrt{2}}|011\rangle + \frac{b}{\sqrt{2}}|110\rangle + \frac{b}{\sqrt{2}}|101\rangle. \qquad (2.19)$$

2. Alice's next action is to make a Hadamard-Walsh transformation (recall Example 2.1.3) on the left-most qubit. The result is

$$\frac{a}{\sqrt{2}}\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)|00\rangle + \frac{a}{\sqrt{2}}\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)|11\rangle + \frac{b}{\sqrt{2}}\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)|10\rangle + \frac{b}{\sqrt{2}}\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)|01\rangle,$$

which can be written as

$$\frac{a}{2}|000\rangle + \frac{a}{2}|100\rangle + \frac{a}{2}|011\rangle + \frac{a}{2}|111\rangle + \frac{b}{2}|010\rangle - \frac{b}{2}|110\rangle + \frac{b}{2}|001\rangle - \frac{b}{2}|101\rangle. \qquad (2.20)$$

Taking apart Bob's qubit, state (2.20) can clearly be rewritten as

$$\frac{1}{2}|00\rangle\, a|0\rangle + \frac{1}{2}|10\rangle\, a|0\rangle + \frac{1}{2}|01\rangle\, a|1\rangle + \frac{1}{2}|11\rangle\, a|1\rangle + \frac{1}{2}|01\rangle\, b|0\rangle - \frac{1}{2}|11\rangle\, b|0\rangle + \frac{1}{2}|00\rangle\, b|1\rangle - \frac{1}{2}|10\rangle\, b|1\rangle$$

and, moreover, as

$$\frac{1}{2}|00\rangle(a|0\rangle + b|1\rangle) + \frac{1}{2}|01\rangle(a|1\rangle + b|0\rangle) + \frac{1}{2}|10\rangle(a|0\rangle - b|1\rangle) + \frac{1}{2}|11\rangle(a|1\rangle - b|0\rangle). \qquad (2.21)$$
3. Now Alice observes her two qubits. As the outcome, she sees 00, 01, 10, and 11, each with a probability of $\frac{1}{4}$. For brevity, we will refer to the projection postulate and summarize the resulting state (depending on the outcome) in the following table.

Alice's observation   Postobservation state
00                    $|00\rangle(a|0\rangle + b|1\rangle)$
01                    $|01\rangle(a|1\rangle + b|0\rangle)$
10                    $|10\rangle(a|0\rangle - b|1\rangle)$
11                    $|11\rangle(a|1\rangle - b|0\rangle)$

Recall now that the two left-most qubits are Alice's, whereas the right-most one belongs to Bob.
4. Alice sends Bob her observation result (two classical bits).
5. Bob does the following:
• If Alice's bits are 00, Bob does nothing. His qubit is already in state $a|0\rangle + b|1\rangle$.
• If Alice sent 01, Bob performs the not-operation on his qubit, thus obtaining the desired state.
• If Alice sent 10, Bob performs the phase-flip operation (see Example 2.1.4) on his qubit. Again, the resulting state is $a|0\rangle + b|1\rangle$.
• If Alice's bits were 11, Bob first performs the not-operation and after that the phase flip.
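The whole protocol fits in a few lines of numpy; the following sketch (ours) pushes an arbitrary qubit state through steps 1-5 and checks that Bob ends up with $a|0\rangle + b|1\rangle$ for every possible observation outcome:

```python
import numpy as np

W2 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]])          # not-operation
Z = np.array([[1, 0], [0, -1]])         # phase flip
I2 = np.eye(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]])

a, b = 0.6, 0.8j                        # hypothetical qubit to teleport
psi = np.array([a, b])
epr = np.array([1, 0, 0, 1]) / np.sqrt(2)

state = np.kron(psi, epr)               # state (2.18); order: Alice, Alice, Bob
state = np.kron(CNOT, I2) @ state       # step 1: cnot on Alice's two qubits
state = np.kron(W2, np.eye(4)) @ state  # step 2: Hadamard on the leftmost qubit

t = state.reshape(4, 2)                 # t[k] = Bob's (unnormalized) state
for k in range(4):                      # steps 3-5 for each outcome k
    m1, m2 = k >> 1, k & 1              # Alice's two classical bits
    bob = t[k] / np.linalg.norm(t[k])   # projection postulate
    if m2: bob = X @ bob                # not-operation, needed for bits x1
    if m1: bob = Z @ bob                # phase flip, needed for bits 1x
    print(m1, m2, np.allclose(bob, psi))  # True for every outcome
```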

Remark 2.5.1. It should be emphasized here that in the quantum teleportation described above, no physical qubit is transmitted, just the state of the qubit. It is also worth noticing that the quantum state is transmitted, not copied; the original state held by Alice is destroyed in the protocol.

Remark 2.5.2. The first experimental realizations of teleportation were reported in [15] and [16]. Shortly after, [39] reported an extended version of those first ones.

2.6 Superdense Coding

Superdense coding, introduced in [12], is the complementary action to quantum teleportation. Initially, Alice and Bob share an EPR pair, and there is a quantum channel from Alice to Bob. Now Alice wants to send classical bits to Bob. Superdense coding is a protocol where Alice sends one quantum bit to Bob, but the amount of transmitted information is two classical bits.

In theory, superdense coding is achieved as follows: Alice has two classical bits, $b_1$ and $b_2$, and Alice shares an EPR pair

$$\frac{1}{\sqrt{2}}|00\rangle + \frac{1}{\sqrt{2}}|11\rangle \qquad (2.22)$$

with Bob. Again we assume that the qubit marked on the left is Alice's, and the other one is Bob's.

Superdense coding protocol

1. If $b_1 = 1$, then Alice performs the phase flip on her qubit. If $b_2 = 1$, then she also performs the not-operation on her qubit. What becomes of state (2.22) is summarized in the following table.

$b_1$ $b_2$   State after Alice's operations
0  0          $\frac{1}{\sqrt{2}}|00\rangle + \frac{1}{\sqrt{2}}|11\rangle$
0  1          $\frac{1}{\sqrt{2}}|10\rangle + \frac{1}{\sqrt{2}}|01\rangle$
1  0          $\frac{1}{\sqrt{2}}|00\rangle - \frac{1}{\sqrt{2}}|11\rangle$
1  1          $\frac{1}{\sqrt{2}}|10\rangle - \frac{1}{\sqrt{2}}|01\rangle$

2. Alice sends her qubit to Bob.
3. Now that Bob has access to both qubits, he runs them both through a two-qubit gate $B$ defined by the matrix

$$B = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & -1 \\ 0 & -1 & 1 & 0 \end{pmatrix} \qquad (2.23)$$

(verify that matrix $B$ is unitary). It is then easy to verify, by direct calculation, that the following table (extending the previous one) is correct:

$b_1$ $b_2$   State after Alice's operations                                        State after Bob's operation
0  0          $\frac{1}{\sqrt{2}}|00\rangle + \frac{1}{\sqrt{2}}|11\rangle$          $|00\rangle$
0  1          $\frac{1}{\sqrt{2}}|10\rangle + \frac{1}{\sqrt{2}}|01\rangle$          $|01\rangle$
1  0          $\frac{1}{\sqrt{2}}|00\rangle - \frac{1}{\sqrt{2}}|11\rangle$          $|10\rangle$
1  1          $\frac{1}{\sqrt{2}}|10\rangle - \frac{1}{\sqrt{2}}|01\rangle$          $|11\rangle$

4. Bob observes his qubits. The table above shows that the bits $b_1$ and $b_2$ are recovered faithfully.
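The protocol can likewise be checked by direct calculation; a sketch of ours, with $Z$ denoting the phase flip and $X$ the not-operation:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]])          # not-operation
Z = np.array([[1, 0], [0, -1]])         # phase flip
I2 = np.eye(2)
B = np.array([[1, 0, 0, 1], [0, 1, 1, 0],
              [1, 0, 0, -1], [0, -1, 1, 0]]) / np.sqrt(2)   # gate (2.23)

epr = np.array([1, 0, 0, 1]) / np.sqrt(2)
for b1 in (0, 1):
    for b2 in (0, 1):
        A = I2                          # Alice's operation on her qubit
        if b1: A = Z @ A                # phase flip first
        if b2: A = X @ A                # then the not-operation
        state = np.kron(A, I2) @ epr    # Alice acts on the left qubit
        state = B @ state               # Bob's gate after receiving it
        print(b1, b2, np.argmax(np.abs(state)))   # decodes to 2*b1 + b2
```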
Remark 2.6.1. Regarding Alice's action in the beginning, it is sufficient and necessary to force the qubits into orthonormal states (recall from the beginning of Section 2.2 that states $|00\rangle$, $|01\rangle$, $|10\rangle$, and $|11\rangle$ form an orthonormal set). Whenever this can be done, there always exists a unitary mapping that transforms these orthonormal states to basis states.

2.7 Exercises
1. Compute $W_2W_2$. Conclude that $W_2$ applied to state (2.15) (resp., (2.16)) yields $|0\rangle$ (resp., $|1\rangle$).
2. Verify that matrix (2.23) is unitary.
3. Devices for Computation

To study computational processes, we have to fix a computational device first. In this chapter, we study Turing machines and circuits as models of computation. We use the standard notations of formal language theory and represent these notations briefly now. An alphabet is any set $A$. The elements of an alphabet $A$ are called letters. The concatenation of sets $A$ and $B$ is a set $AB$ consisting of strings formed of any element of $A$ followed by any element of $B$. Especially, $A^k$ is the set of strings of length $k$ over $A$. These strings are also called words. The concatenation $w_1w_2$ of words $w_1$ and $w_2$ is just the word $w_1$ followed by $w_2$. The length of a word $w$ is denoted by $|w|$ or $\ell(w)$ and defined as the number of the letters that constitute $w$. We also define $A^0$ as the set that contains only the empty word $\varepsilon$ that has no letters, and $A^* = A^0 \cup A^1 \cup A^2 \cup \cdots$ is the set of all words over $A$. Mathematically speaking, $A^*$ is the free monoid generated by the elements of $A$, having the concatenation as the monoid operation and $\varepsilon$ as the unit element.
The attribute "uniform" is usually reserved for those models of computation where the length of the input is not fixed but the input can be arbitrarily long. For instance, Turing machines can be regarded as uniform models of computation. On the other hand, circuits (which are handled in Section 3.2) are not uniform, since circuits must be constructed for each input size.

3.1 Uniform Computation

3.1.1 Turing Machines

The very classical model to describe computational processes is the Turing machine, or TM for short. To give a description of a Turing machine, we fix a finite set of basic information units called an alphabet $A$, a finite set of internal control states $Q$, and a transition function

$$\delta : Q \times A \to Q \times A \times \{-1, 0, 1\}. \qquad (3.1)$$

Definition 3.1.1. A (deterministic) Turing machine $M$ over alphabet $A$ is a sixtuple $(Q, A, \delta, q_0, q_a, q_r)$, where $q_0, q_a, q_r \in Q$ are the initial, accepting, and rejecting states respectively, and $\delta$ is as above.

The reason for calling the Turing machine of the above definition deterministic is due to the form of the transition $\delta$, which describes the dynamics (the computation) of the machine. Later we will introduce some other forms of Turing machines, but for now, we will give an interpretation for the above definition, mainly using the notations of [64]. As a configuration of a Turing machine we understand a triplet $(q_1, x, y)$, where $q_1 \in Q$ and $x, y \in A^*$. For a configuration $c = (q_1, x, y)$, we say that the Turing machine is in state $q_1$ and that the tape contains the word $xy$. If the word $y$ begins with letter $a_1$, we say that the machine is scanning letter $a_1$ or reading letter $a_1$. To describe the dynamics determined by the transition function $\delta$, we write $x = w_1a$ and $y = a_1w_2$, where $a, a_1 \in A$ and $w_1, w_2 \in A^*$. Thus, $c$ can be written as $c = (q_1, w_1a, a_1w_2)$, and the transition $\delta$ also defines a transition rule from one configuration to another in a very natural way: if $\delta(q_1, a_1) = (q_2, a_2, d)$, then the configuration

$$c = (q_1, w_1a, a_1w_2)$$

can be transformed to

$$(q_2, w_1, aa_2w_2), \quad (q_2, w_1a, a_2w_2), \quad \text{or} \quad (q_2, w_1aa_2, w_2),$$

depending on whether $d = -1$, $d = 0$, or $d = 1$ respectively. This means that, under the transition $\delta(q_1, a_1) = (q_2, a_2, d)$, the symbol $a_1$ being scanned is replaced with $a_2$ (we say that the machine prints $a_2$ to replace $a_1$); the machine enters state $q_2$, and begins to scan the symbol to the left of the previously scanned symbol, continues scanning the same location, or scans the symbol to the right of the previously scanned symbol, depending on $d \in \{-1, 0, 1\}$. We also say that the Turing machine's read-write head moves to the left, remains stationary, or moves to the right.

If $\delta$ defines a transition from $c$ to $c'$, we write $c \vdash c'$ and say that $c'$ is a successor of $c$, $c$ yields $c'$, and that $c$ is followed by $c'$. The transition from $c$ to $c'$ is called a computational step. The anomalous cases $c = (q_1, w, \varepsilon)$ and $c = (q_1, \varepsilon, w)$ (recall that $\varepsilon$ was defined to be the empty word having no letters at all) are treated by introducing a blank symbol $\sqcup$, extending the definition of $\delta$, and replacing $(q_1, w, \varepsilon)$ (resp. $(q_1, \varepsilon, w)$) with $(q_1, w, \sqcup)$ (resp. $(q_1, \sqcup, w)$) if necessary.

A computation of a Turing machine with input $w \in A^*$ is defined as a sequence of configurations $c_0, c_1, c_2, \ldots$ such that $c_0 = (q_0, \varepsilon, w)$ and $c_i \vdash c_{i+1}$ for each $i$. We say that the computation halts if some $c_i$ has no successors, or if the state symbol of the configuration $c_i$ is either $q_a$ or $q_r$. In the former case, the computation is accepting, and in the latter case it is rejecting.
If a computation of a Turing machine $T$ beginning with configuration $(q_0, \varepsilon, w)$ leads into a halting configuration $(q, w_1, w_2)$ in $t$ computational steps, we say that $T$ computes $w_1w_2$ from the input word $w$ in time $t$. Thus, a Turing machine can be seen as a device computing (partial) functions $A^* \to A^*$; but we can also ignore the output $w_1w_2$ and just say that the Turing machine $T$ accepts the input $w$ if it halts in an accepting state, and that $T$ rejects the input $w$ if it either does not halt or halts in a rejecting state. Thus, a Turing machine can also be seen as an acceptor that classifies the words into those which are accepted and those which are rejected.
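As an illustration (ours, not part of the text), the configuration semantics above can be animated with a few lines of code; the toy machine below flips every bit of its input and then accepts:

```python
# A minimal simulator for the configuration semantics above (a sketch).
BLANK = "_"

def step(delta, config):
    q, x, y = config                    # configuration (q, x, y): tape = xy
    y = y or BLANK                      # anomalous case: extend with a blank
    a1, w2 = y[0], y[1:]                # the scanned letter is y's first
    q2, a2, d = delta[(q, a1)]
    if d == 0:
        return (q2, x, a2 + w2)
    if d == 1:
        return (q2, x + a2, w2)
    x = x or BLANK                      # anomalous case on the left edge
    return (q2, x[:-1], x[-1] + a2 + w2)

# delta(q0, 0) = (q0, 1, +1), delta(q0, 1) = (q0, 0, +1),
# delta(q0, blank) = (qa, blank, 0): flip every bit, then accept.
delta = {("q0", "0"): ("q0", "1", 1),
         ("q0", "1"): ("q0", "0", 1),
         ("q0", BLANK): ("qa", BLANK, 0)}

config, t = ("q0", "", "1011"), 0
while config[0] not in ("qa", "qr"):
    config, t = step(delta, config), t + 1
print(config, "in", t, "steps")   # the tape now holds the flipped word 0100
```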

Definition 3.1.2. A set $S$ of words over $A$ is a recursively enumerable language if there is a Turing machine $T$ such that $T$ accepts $w$ if and only if $w \in S$.

Definition 3.1.3. A set $S$ is a recursive language if there is a Turing machine $T$ such that each computation of $T$ halts, and $T$ accepts $w$ if and only if $w \in S$.

The families of recursively enumerable and recursive languages are denoted by RE and R respectively. By definition, $\mathbf{R} \subseteq \mathbf{RE}$, but it is a well-known fact that also $\mathbf{R} \neq \mathbf{RE}$; see [64], for instance.
It is a widespread belief that Turing machines capture the notion of algorithmic computability: whatever is algorithmically computable is also computable by a Turing machine. This strong belief should actually be called a hypothesis, which is known as the Church-Turing thesis. In this book we establish the notion of algorithm by using Turing machines. Turing machines provide a clear notion for computational resources like the time and the space used during computation. Moreover, by using Turing machines, it is easy to introduce the notions of probabilistic, nondeterministic, and quantum computation. On the other hand, it is worth arguing that, for almost all practical purposes, the notion of the Turing machine is clumsy: even to describe very simple algorithms by using Turing machines requires a lengthy and fairly non-intuitive description of the transition rules. For the above reasons, we will represent the fundamental concepts of computability by using Turing machines, but we will also use more sophisticated notions of algorithms, including Boolean circuits, quantum circuits, and even pseudo-programming languages.
The above definitions of recursive and recursively enumerable languages represent two classes of algorithmic decision problems. When a problem is a decision problem, it means that we are given a word $w$ as an input and we should decide whether $w$ belongs to some particular language or not. It is clear that, for recursive languages, the Turing machine accepting the particular language offers an algorithm for solving this problem. However, for recursively enumerable languages the corresponding Turing machine does not provide a good algorithmic solution. The reason for this is that, even though the machine halts when the input $w$ belongs to $S$, we do not know how long the computation lasts on individual input words $w$. Thus, we do not know how long we should wait until we know that the machine will not halt, i.e., the decision $w \notin S$ cannot be made. We call all decision problems inside R algorithmically solvable, recursively solvable, or decidable. The decision problems outside R are called recursively unsolvable or undecidable.

The classification into recursively solvable and unsolvable problems is nowadays far too rough. A typical way to introduce some refinement is to consider the computational time required to solve a particular problem. Therefore, we now define some basic concepts connected to measuring the computation time. In complexity theory, we are usually interested in measuring the computation time only up to some multiplicative constant, and for that purpose, the following notations are useful:

Let $f$ and $g$ be functions $\mathbb{N} \to \mathbb{N}$. We write $f = O(g)$ if there are constants $c > 0$ and $n_0 \in \mathbb{N}$ such that $f(n) \leq cg(n)$ whenever $n \geq n_0$. If $f(n) \geq cg(n)$ whenever $n \geq n_0$, we write $f = \Omega(g)$. If $f = O(g)$ and $f = \Omega(g)$, then we write $f = \Theta(g)$ and say that $f$ and $g$ are of the same order.

Let $M$ be a Turing machine that halts on each input, and let $T(n)$ be the maximum computation time on inputs having length $n$. We say that $T(n)$ is the time complexity function of $M$. A Turing machine $M$ is a polynomial-time Turing machine if its time complexity function satisfies $T(n) = O(n^k)$ for some constant $k$. The family of decision problems that can be solved by polynomial-time Turing machines is denoted by P. Using widespread jargon, we say that the decision problems in P are tractable, while the other ones are intractable.

3.1.2 Probabilistic Turing Machines

We can modify the definition of a deterministic Turing machine to provide a model for probabilistic computation. The necessary modification is to replace the transition function with a transition probability distribution

$$\delta : Q \times A \times Q \times A \times \{-1, 0, 1\} \to [0, 1].$$

The value $\delta(q_1, a_1, q_2, a_2, d)$ is interpreted as the probability that, when the machine in state $q_1$ reads symbol $a_1$, it will print $a_2$, enter the state $q_2$, and move the head to direction $d \in \{-1, 0, 1\}$.

Definition 3.1.4. A probabilistic Turing machine $M$ over alphabet $A$ is a sixtuple $(Q, A, \delta, q_0, q_a, q_r)$, where $q_0, q_a, q_r \in Q$ are the initial, accepting, and rejecting states respectively. It is required that, for all $(q_1, a_1) \in Q \times A$,

$$\sum_{(q_2, a_2, d) \in Q \times A \times \{-1, 0, 1\}} \delta(q_1, a_1, q_2, a_2, d) = 1.$$

From this time on, to avoid fundamental difficulties in the notion of computability, we will agree that all the values of $\delta(q_1, a_1, q_2, a_2, d)$ are rational.¹

¹ This agreement can, of course, be criticized, but the feasibility of a machine working with arbitrary real number probabilities is also questionable.

The computation of a probabilistic Turing machine is not so straightforward a concept as the deterministic computation. We say that the configuration $c = (q_1, w_1a, a_1w_2)$ yields (or is followed by) any configuration $c' = (q_2, w_1, aa_2w_2)$ (resp. $c' = (q_2, w_1a, a_2w_2)$ and $c' = (q_2, w_1aa_2, w_2)$) with probability $p = \delta(q_1, a_1, q_2, a_2, -1)$ (resp. with probability $p = \delta(q_1, a_1, q_2, a_2, 0)$ and $p = \delta(q_1, a_1, q_2, a_2, 1)$). If $c$ yields $c'$ with probability $p$, we write $c \vdash_p c'$. Let $c_0, c_1, \ldots, c_t$ be a sequence of configurations such that $c_i \vdash_{p_{i+1}} c_{i+1}$ for each $i$. Then we say that $c_t$ is computed from $c_0$ in $t$ steps with probability $p_1p_2\cdots p_t$. If $p_1p_2\cdots p_t \neq 0$, we also say that $c_0 \vdash_{p_1} c_1 \vdash_{p_2} \cdots \vdash_{p_t} c_t$ is a computation of a probabilistic Turing machine.
Remark 3.1.1. A reader who thinks that the notion of probabilistic computation based on probabilistic Turing machines does not correspond very well to the idea of an algorithm utilizing random bits is perfectly right! In many ways a simpler model for probabilistic computations could have been given by using nondeterministic Turing machines. The reason for presenting this model is to make the traditional definition of a quantum Turing machine (see e.g. [13]) clearer.

Unlike in a deterministic computation, a single configuration can now yield several different configurations with probabilities that sum up to 1. Thus, the total computation of a probabilistic Turing machine can be seen as a tree having configurations as nodes. The initial configuration is the root, and each node $c$ has the configurations followed by $c$ as descendants. Thus, a computation of the probabilistic machine in question is just a single branch in the total computation tree. All computations can be expressed in the terms of the probabilistic systems we considered in the introductory chapter: assume that, taking $t$ steps, a probabilistic Turing machine starting from an initial configuration computes configurations $c_1, \ldots, c_m$ with probabilities $p_1, \ldots, p_m$ such that $p_1 + \cdots + p_m = 1$. We can then say that the total configuration of a probabilistic machine at time $t$ is a probability distribution

$$p_1c_1 + p_2c_2 + \cdots + p_mc_m \qquad (3.2)$$

over the basis configurations $c_1, \ldots, c_m$. As in the introductory chapter, we can define a vector space, which we now call the configuration space, that has all of the potential basis configurations as the basis vectors, and a general configuration (3.2) is a vector in the configuration space having non-negative coordinates that sum up to 1. On the other hand, there is now an essentially more complicated feature compared to the introductory chapter: there is a countable infinity of basis vectors, so our configuration space is countably infinite-dimensional.

Regarding this latter representation of total configurations of a probabilistic Turing machine, the reader may already guess how to introduce the notion of quantum Turing machines, but we will still continue with probabilistic computations and study some acceptance models.
When talking about time-bounded computation in connection with probabilistic Turing machines, we usually assume that all the computations have the same length. In other words, we assume that all the computations are synchronized so well that they reach a halting configuration at the same time, so that all the branches of the computation tree have the same length. We can thus regard a probabilistic Turing machine as a facility for computing the probability distribution of outcomes. For instance, if the purpose of a particular probabilistic machine is to solve a decision problem, then some of the computations may end up in an accepting state, but some may also end up in a rejecting state. What then do we mean by saying that a probabilistic machine accepts a string or a language? In fact, there are several different choices, some of which are given in the following definitions:
• Class NP is the family of languages $S$ that can be accepted in polynomial time by some probabilistic Turing machine $M$ in the following sense: a word $w$ is in the language $S$ if and only if $M$ accepts $w$ with nonzero probability (see Remark 3.1.2).
• Class RP is the family of languages $S$ that can be accepted in polynomial time with some probabilistic Turing machine $M$ in the following way: if $w \in S$, then $M$ accepts $w$ with a probability of at least $\frac{1}{2}$, but if $w \notin S$, then $M$ always rejects $w$.
• Class coRP is the class of languages consisting exactly of the complements of those in RP. Notice that coRP is not the complement of RP among all the languages.
• We define ZPP = RP $\cap$ coRP.
• Class BPP is the family of languages $S$ that are accepted in polynomial time by a probabilistic Turing machine $M$ such that if $w \in S$, then $M$ accepts with a probability of at least $\frac{2}{3}$, and if $w \notin S$, then $M$ rejects with a probability of at least $\frac{2}{3}$.
Remark 3.1.2. The definition of the class NP (standing for nondeterministic polynomial time) given here is not the usual one. A much more traditional definition would be given by using the notion of a nondeterministic Turing machine, which can be obtained from the probabilistic ones by ignoring the probabilities. In other words, a nondeterministic Turing machine has a transition relation $\delta \subseteq Q \times A \times Q \times A \times \{-1, 0, 1\}$, which tells whether it is possible for a configuration $c$ to yield another configuration $c'$. More precisely, the fact that $(q_1, a_1, q_2, a_2, d) \in \delta$ means that, if the machine is in state $q_1$ scanning symbol $a_1$, then it is possible to replace $a_1$ with $a_2$, move the head to direction $d$, and enter state $q_2$. However, this model resembles the probabilistic computation very closely: indeed, the notion of "computing $c'$ from $c$ in $t$ steps with nonzero probability" is replaced with the notion that "there is a possibility to compute $c'$ from $c$ in $t$ steps", which does not seem to make any difference. The acceptance model for nondeterministic Turing machines also looks like the one that we defined for probabilistic NP-machines. A word $w$ is accepted if and only if it is possible to reach an accepting final configuration in polynomial time.

It can also be argued that the class NP does not correspond very well to our intuition of practical computation. For example, if each configuration yields two distinct ones, each with a probability of $\frac{1}{2}$, and the computation lasts $t$ steps, there are $2^t$ final configurations, each computed from the initial one with a probability of $\frac{1}{2^t}$. However, we say that the machine accepts a word if and only if at least one of these final configurations is accepting, but it may happen that only one final configuration is accepting, and we cannot distinguish the acceptance probabilities $\frac{1}{2^t}$ (accepting) and 0 (rejecting) practically without running the machine $\Omega(2^t)$ times. But this would usually make the computation last exponential time since, if the machine reads the whole input, then $t \geq n$, where $n$ is the length of the input. By its very definition, each deterministic Turing machine is also a probabilistic one (always working with a probability of 1), and therefore $\mathbf{P} \subseteq \mathbf{NP}$. However, it is a long-standing open problem in theoretical computer science whether $\mathbf{P} \neq \mathbf{NP}$.

Remark 3.1.3. Class RP (randomized polynomial time) corresponds more closely to the notion of realizable effective computation. If $w \notin S$, then this fact is revealed by any computation. On the other hand, if $w \in S$, we learn this with a probability of at least $\frac{1}{2}$. This means that an RP-Turing machine has no false positives, but it may give a false negative, however, with a probability of less than $\frac{1}{2}$. By repeating the whole computation $k$ times, the probability of making the false decision $w \notin S$ can thus be reduced to at most $\frac{1}{2^k}$. A probabilistic Turing machine with RP-style acceptance is also called a Monte Carlo Turing machine.
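The repetition argument is easy to see in a toy simulation (ours); a hypothetical Monte Carlo machine is modeled as a coin that detects $w \in S$ with probability exactly $\frac{1}{2}$, the worst case the definition allows:

```python
import random

def monte_carlo_run():
    """One run of a hypothetical RP-machine on an input w in S:
    it accepts with probability 1/2 and never accepts falsely."""
    return random.random() < 0.5

def repeated(k):
    """Accept if any of k independent runs accepts."""
    return any(monte_carlo_run() for _ in range(k))

trials = 100_000
for k in (1, 2, 5, 10):
    false_neg = sum(not repeated(k) for _ in range(trials)) / trials
    print(f"k={k:2d}: false negatives ~ {false_neg:.4f} (bound 1/2^k = {0.5**k:.4f})")
```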

Remark 3.1.4. If a language $S$ belongs to ZPP, then there are Monte Carlo Turing machines $M_1$ and $M_2$ accepting $S$ and the complement of $S$ respectively. By combining these two machines, we obtain an algorithm that can be repeatedly used to make the correct decision with certainty. For if $M_1$ gives the answer $w \in S$ one time, it surely means that $w \in S$, since $M_1$ has no false positives. If $M_2$ gives an answer $w \in A^* \setminus S$, then we know that $w \notin S$, since $M_2$ has no false positives either. In both cases, the probability of a false answer is at most $\frac{1}{2}$, so by repeating the procedure of running both machines $k$ times, we obtain, with a probability of at least $1 - \frac{1}{2^k}$, the certainty that either $w \in S$ or $w \notin S$. Thus, we can say that a ZPP-algorithm works like a deterministic algorithm whose expected running time is polynomial. Notation ZPP stands for Zero error Probability in Polynomial time. A ZPP-algorithm is also called a Las Vegas algorithm.

Remark 3.1.5. The definition of BPP merely contains the idea that we accept languages or, to put it in other words, that we solve decision problems using a probabilistic Turing machine that is required to give a correct answer with a probability that is larger than $\frac{1}{2}$. Notation BPP stands for Bounded error Probability in Polynomial time. The constant $\frac{2}{3}$ is arbitrary, and any $c \in (\frac{1}{2}, 1)$ would define the same class. This is because we can efficiently increase the success probability by defining a Turing machine that runs the same computation several times and then takes the majority of results as the answer. It is also widely believed that the class BPP is the class of problems that are efficiently solvable. This belief is known as the Extended Church-Turing thesis.

Earlier in this section it was mentioned that, according to the Church-Turing thesis, Turing machines are capable of capturing the notion of computability. However, it seems that probabilistic Turing machines with any of the above acceptance models also fit very well into the notion of "algorithmic computability". Is it true that any language which is accepted by a probabilistic Turing machine, regardless of the acceptance mode, can also be accepted by an ordinary Turing machine? The answer is yes, and it can be justified as follows: since the probabilities were required to be rational, we can always simulate the computation tree of a probabilistic Turing machine with an ordinary one that sequentially performs all possible computations and computes the probabilities associated to them. In other words, knowledge of the probabilities on how a probabilistic algorithm works also allows us to compute the probability distribution deterministically.²

² The restriction to rational probabilities plays an essential role here. This restriction can be criticized, but from the practical point of view, rational probabilities should be sufficient. Some authors allow all "computable numbers" as probabilities, but this would introduce more problems, such as: Which numbers are computable? Which is the computational model to "compute these numbers"? Can the comparisons of "computable numbers" (needed, e.g., to decide whether the acceptance probability is at least $\frac{1}{2}$) be made efficiently?

Therefore, the notion of probabilistic computation does not shatter the border between decidability and undecidability. However, any deterministic Turing machine can also be seen as a probabilistic machine having only one choice for each transition. Therefore, it is clear that

$$\mathbf{P} \subseteq \mathbf{RP}, \quad \mathbf{P} \subseteq \mathbf{ZPP}, \quad \text{and} \quad \mathbf{P} \subseteq \mathbf{BPP}. \qquad (3.3)$$

The simulation of a probabilistic machine by a deterministic one is done by sequentially computing all the possible computations of the probabilistic machine. Since the number of different computations may be exponential (recall Remark 3.1.2), the simulation may also take exponential time. Hence, we can ask whether probabilistic computation is more efficient than deterministic computation, i.e., whether some of the inclusions (3.3) are strict. Or is it true that there is a cunning way to imitate a probabilistic computation deterministically such that the simulation time would not grow exponentially? The answers to such questions are unknown so far.

3.1.3 Multitape Turing Machines

A straightforward generalization of a Turing machine is a multitape Turing machine. This model of computation becomes apparent when speaking about space-bounded computation.

Definition 3.1.5. A (deterministic) Turing machine with $k$ tapes over an alphabet $A$ is a sixtuple $(Q, A, \delta, q_0, q_a, q_r)$, where $q_0, q_a, q_r \in Q$ are the initial, accepting, and rejecting states respectively, and $\delta : Q \times A^k \to Q \times (A \times \{-1, 0, 1\})^k$ is the transition function.

A configuration of a $k$-tape Turing machine is a $(2k+1)$-tuple

$$(q, x_1, y_1, x_2, y_2, \ldots, x_k, y_k), \qquad (3.4)$$

where $q \in Q$ and $x_i, y_i \in A^*$. For a configuration (3.4), we say that $q$ is the current state and that $x_iy_i$ is the content of the $i$th tape. Let $x_i = v_ia_i$ and $y_i = b_iw_i$. Then we say that $b_i$ is the currently scanned symbol on the $i$th tape. If

$$\delta(q, b_1, b_2, \ldots, b_k) = (r, (c_1, d_1), (c_2, d_2), \ldots, (c_k, d_k)),$$

where $r \in Q$, $c_i \in A$, and $d_i \in \{-1, 0, 1\}$, then the configuration $c$ yields $c' = (r, x_1', y_1', x_2', y_2', \ldots, x_k', y_k')$, where $(x_i', y_i') = (v_i, a_ic_iw_i)$ if $d_i = -1$, $(x_i', y_i') = (v_ia_i, c_iw_i)$ if $d_i = 0$, and $(x_i', y_i') = (v_ia_ic_i, w_i)$ if $d_i = 1$.
The concepts associated with computing, such as acceptance, rejection, computational time, etc., are defined for multitape Turing machines exactly in the same fashion as for ordinary Turing machines. But when talking about the space consumed by the computation, it may seem fair to ignore the space occupied by the input and the space needed for writing the output, and merely talk about the space consumed by the computation itself. By using multitape machines, this is achieved as follows: we choose $k > 2$ and identify the first tape as an input tape and the last tape as an output tape. The machine must never change the contents of the input tape, and the head position on the output tape can never move to the left. If, using such a model for computation, one reaches a final configuration $c = (q, u_1, v_1, u_2, v_2, \ldots, u_k, v_k)$, then the space required during the computation is defined to be $\sum_{i=2}^{k-1}|u_iv_i|$.

Again, we may ask whether the computational capacity of multitape Turing machines is greater than that of ordinary ones. This time, we can even guarantee that the computational capacity of multitape machines is very close to the capacity of a single-tape machine: it is left as an exercise to show that whatever is computable by a multitape machine in time $t$ is also computable by a single-tape Turing machine in time $O(t^2)$.

3.1.4 Quantum Turing Machines

To provide a model for quantum computation, one can consider a straightforward generalization of a probabilistic Turing machine, replacing the probabilities with transition amplitudes. Thus, there should be a transition amplitude function

$$\delta : Q \times A \times Q \times A \times \{-1, 0, 1\} \to \mathbb{C}$$

such that $\delta(q_1, a_1, q_2, a_2, d)$ gives the amplitude that, whenever the machine is in state $q_1$ scanning symbol $a_1$, it will replace $a_1$ with $a_2$, enter state $q_2$, and move the head to direction $d \in \{-1, 0, 1\}$.

Definition 3.1.6. A quantum Turing machine (QTM) over alphabet $A$ is a sixtuple $(Q, A, \delta, q_0, q_a, q_r)$, where $q_0, q_a, q_r \in Q$ are the initial, accepting, and rejecting states. The transition amplitude function must satisfy

$$\sum_{(q_2, a_2, d) \in Q \times A \times \{-1, 0, 1\}} |\delta(q_1, a_1, q_2, a_2, d)|^2 = 1$$

for each $(q_1, a_1) \in Q \times A$.

Notions such as "configuration yields another" are straightforward generalizations of those associated with probabilistic computation. We can also generalize the acceptance models NP, RP, ZPP, and BPP to correspond to quantum computation. For any family F of languages accepted by some classical model of computation (deterministic or probabilistic), we usually write QF to stand for the corresponding class in quantum computation. However, there is already quite a well-established tradition to denote the quantum counterpart of BPP³ not by QBPP but by BQP, and the quantum counterpart of P⁴ is usually denoted by EQP, to stand for Exact acceptation by a Quantum computer in Polynomial time. The quantum counterpart of NP is usually denoted by NQP.

³ The quantum counterpart of BPP is the family of languages accepted by a polynomial-time quantum Turing machine with a correctness probability of at least $\frac{2}{3}$.
⁴ The family of languages accepted by a polynomial-time quantum Turing machine with a probability of 1.
As in probabilistic computation, a general configuration of a quantum Turing machine can be seen as a combination

$$\alpha_1c_1 + \alpha_2c_2 + \cdots + \alpha_mc_m \qquad (3.5)$$

of basis configurations. Taking the basis configurations as an orthonormal basis of an infinite-dimensional vector space, we see that the general configurations, superpositions, are merely the unit-length vectors of the configuration space. Moreover, the transition amplitude function determines a linear mapping $M_\delta$ in the state space. From the theory of quantum mechanics (to be precise: from the formalism that we are using) there now arises a further requirement on the transition amplitude function $\delta$: the linear mapping $M_\delta$ in the configuration space determined by $\delta$ should be unitary.

It turns out that the unitarity of the mapping $M_\delta$ can be determined by local conditions on the transition amplitude function (see [13], [45], and [63]), but the conditions are quite technical, and we will not represent them here. Instead, we will study quantum circuits as a model for quantum computation in the next chapter. The reason for this choice is that, compared to QTMs, the quantum circuit model is much more straightforward to use for describing quantum algorithms. In fact, all of the quantum algorithms that are studied in Chapters 4-6 are given by using the quantum circuit formalism.
We end this section by examining a couple of properties of quantum Turing machines. The first thing to consider is that the transition of a QTM determines a unitary and, therefore, a reversible time evolution in the configuration space. However, ordinary Turing machines can be irreversible, too.⁵ The question is how powerful the reversible computation is. Can we design a reversible Turing machine that performs each particular computational task? A positive answer to this question was first given by Lecerf in [53], who intended to demonstrate that the Post Correspondence Problem⁶ remains undecidable even for injective morphisms. Lecerf's constructions were later extended by Ruohonen [78], but Bennett in [8] was the first to give a model for reversible computation simulating the original, possibly irreversible, computation with constant slowdown but possibly with a huge increase in the space consumed (see also [46]).

Bennett's work was at least partially motivated by a thermodynamic problem: according to Landauer [52], an irreversible overwriting of a bit causes at least $kT\ln 2$ joules of energy dissipation.⁷ This theoretical lower bound can always be ignored by using reversible computation, which is possible according to the results of Lecerf and Bennett.

Bennett's construction of a reversible Turing machine uses a three-tape Turing machine with an input tape, a history tape, and an output tape. Reversibility is obtained by simulating the original machine on the input tape, thereby writing down the history of the computation, i.e., the transition rules that have been used so far, onto the history tape. When the machine stops, the output is copied from the input tape to the empty output tape, and the computation is run backward (also a reversible procedure) to erase the history tape for future use. The amount of space this construction consumes is proportional to the computation time of the original computation, but by applying the erasure of the history tape recursively, the space requirement can be reduced even to $O(s(n)\log t(n))$, at the same time using time $O(t(n)\log t(n))$, where $s$ and $t$ are the original space and time consumption [9]. For space/time trade-offs for reversible computing, see also [54].

⁵ We call an ordinary Turing machine reversible if each configuration admits a unique predecessor.
⁶ The Post Correspondence Problem, or PCP for short, was among the first computational problems that were shown to be undecidable. The undecidability of the PCP was established by Emil Post in [71]. The importance of the PCP lies in the simple combinatorial formulation of the problem; the PCP is very useful for establishing other undecidability results and studying the boundary between decidability and undecidability.
⁷ Here $k = 1.380658 \cdot 10^{-23}$ J/K is Boltzmann's constant, and $T$ is the absolute temperature.

Thus, it is established that whatever is computable by a Turing machine is also computable by a reversible Turing machine, which can be seen as a quantum Turing machine as well. Therefore, QTMs are at least as powerful as ordinary Turing machines. The next question which arises is whether QTMs can do something that the ordinary Turing machines cannot. The answer to this question is no; the reason is the same as for probabilistic Turing machines: we can use an ordinary Turing machine to simulate the computation of a QTM by remembering all the coexisting configurations in a superposition and their amplitudes. Arguing like this, we have to assume that all the amplitudes can be written in a way which can be described finitely. For example, it may be required that all the transition amplitudes can be written as $x + yi$, where $x$ and $y$ are rational. This assumption can, of course, be criticized, but the feasibility of a model having arbitrary complex number amplitudes is also questionable. It is, however, true that, in some operations on quantum systems, nonrational amplitudes, such as $\frac{1}{\sqrt{2}}$, would be very natural. Therefore, in practice, we will also use amplitudes $x + yi$ where $x$ and $y$ are not rational, but only if there is an "efficient" way to find rational approximations for $x$ and $y$. It is left to the reader to verify that, in each example in this book, the amplitudes not expressible by rational numbers can be well approximated by amplitudes which can be expressed by using rational numbers.

We have good reasons to believe (as did Feynman) that QTMs cannot be efficiently simulated by ordinary ones, not even by probabilistic ones. One major reason to believe this lies in Shor's factoring algorithm [81]: there is a polynomial-time, probabilistic quantum algorithm for factoring integers, but no classical polynomial-time algorithm for that purpose has been found, not even a probabilistic one, despite the huge effort.
The next and final issue connected to QTMs in this section is the existence of a universal quantum Turing machine. We say that a Turing machine is universal if it can simulate the computation of any other Turing machine in the following sense: the description (the transition rules) and the input of an arbitrary Turing machine $M$ are given to the universal machine $U$ suitably encoded, and the universal machine $U$ performs the computation of $M$ on the encoded string. We could also say that the universal Turing machine $U$ is programmable so as to perform any Turing machine computation. It can be shown that a universal Turing machine with 5 states having alphabet size 5 exists; see [74]. In [31], D. Deutsch proved the existence of a universal quantum Turing machine capable of simulating any other QTM with arbitrary precision, but Deutsch did not consider the efficiency of the simulation. Bernstein and Vazirani [13] supplied a construction of a universal QTM, which simulates any other Turing machine such that, given the description of a QTM $M$, a natural number $T$, and $\varepsilon > 0$, the universal machine can simulate $M$ with precision $\varepsilon$ for $T$ steps in time polynomial in $T$ and $\frac{1}{\varepsilon}$.

3.2 Circuits

3.2.1 Boolean Circuits

Recall that a Turing machine can be regarded as a facility for computing a partially defined function $f : A^* \to A^*$. We will now fix $A = \mathbb{F}_2 = \{0, 1\}$ to be the binary alphabet⁸ and consider the Boolean circuits computing functions $\{0,1\}^n \to \{0,1\}^m$. As the basic elements of these circuits we choose functions $\wedge : \mathbb{F}_2^2 \to \mathbb{F}_2$ (the logical and-gate) defined by $\wedge(x_1, x_2) = 1$ if and only if $x_1 = x_2 = 1$, $\vee : \mathbb{F}_2^2 \to \mathbb{F}_2$ (the logical or-gate) defined by $\vee(x_1, x_2) = 0$ if and only if $x_1 = x_2 = 0$, and the logical not-gate $\mathbb{F}_2 \to \mathbb{F}_2$ defined by $\neg x = 1 - x$.⁹

⁸ $\mathbb{F}_2$ stands for the binary field with two elements 0 and 1, the addition and the multiplication defined by $0 + 0 = 1 + 1 = 0$, $0 + 1 = 1$, $0 \cdot 0 = 0 \cdot 1 = 0$, $1 \cdot 1 = 1$. Thus $-1 = 1$ and $1^{-1} = 1$ in $\mathbb{F}_2$.
⁹ Notations $x_1 \wedge x_2$ and $x_1 \vee x_2$ are also used instead of $\wedge(x_1, x_2)$ and $\vee(x_1, x_2)$.

A Boolean circuit is an acyclic, directed graph whose nodes are labelled either with input variables, output variables, or logical gates $\wedge$, $\vee$, and $\neg$. An input variable node has no incoming arrows, while an output variable node has no outgoing arrows but exactly one incoming arrow. Nodes $\wedge$, $\vee$, and $\neg$ have 2, 2, and 1 incoming arrows respectively. In connection with Boolean circuits, the arrows of the graph are also called wires. The number of the nodes of a circuit is called the complexity of the Boolean circuit.

A Boolean circuit with $n$ input variables $x_1, \ldots, x_n$ and $m$ output variables $y_1, \ldots, y_m$ naturally defines a function $\mathbb{F}_2^n \to \mathbb{F}_2^m$: the input is encoded by giving the input variables values 0 and 1, and each gate computes a primitive function $\wedge$, $\vee$, or $\neg$. The value of the function is given in the sequence of the output variables $y_1, \ldots, y_m$.

Example 3.2.1. The Boolean circuit in Figure 3.1 computes the function $f : \mathbb{F}_2^2 \to \mathbb{F}_2^2$ defined by $f(0,0) = (0,0)$, $f(0,1) = (0,1)$, $f(1,0) = (0,1)$, and $f(1,1) = (1,0)$. Thus, $f(x_1, x_2) = (y_1, y_2)$, where $y_2$ is $x_1 + x_2$ modulo 2, and $y_1$ is the carry bit.

Fig. 3.1. A Boolean circuit computing the sum of bits $x_1$ and $x_2$
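For experimentation, the function of Example 3.2.1 can be realized with the three primitive gates; the wiring below is one possible choice of ours and need not coincide with the wiring of Figure 3.1:

```python
AND = lambda x, y: x & y
OR = lambda x, y: x | y
NOT = lambda x: 1 - x

def half_adder(x1, x2):
    """One possible circuit for f: carry = x1 AND x2,
    sum = (x1 OR x2) AND NOT(x1 AND x2)."""
    carry = AND(x1, x2)
    total = AND(OR(x1, x2), NOT(carry))
    return carry, total              # matches f(1,1) = (1,0), etc.

for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), "->", half_adder(x1, x2))
```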

Emil Post has proven a powerful theorem characterizing the complete sets of truth functions [70], which implies that all functions $\mathbb{F}_2^n \to \mathbb{F}_2^m$ can be computed by using Boolean circuits with gates $\wedge$, $\vee$, and $\neg$. A set $S$ of gates is called universal if all functions $\mathbb{F}_2^n \to \mathbb{F}_2$ can be constructed by using the gates in $S$. In what follows, we will demonstrate that $S = \{\wedge, \vee, \neg\}$ is a universal set of gates. The fact that all functions $\mathbb{F}_2^n \to \mathbb{F}_2^m$ are also computable follows from the observation that one can design any function $f : \mathbb{F}_2^n \to \mathbb{F}_2^m$ using $m$ functions which each compute a single bit.

First, it is easy to verify that $\vee(x_1, \vee(x_2, x_3))$ equals 0 if and only if $x_1 = x_2 = x_3 = 0$. For short, we denote $\vee(x_1, \vee(x_2, x_3)) = x_1 \vee x_2 \vee x_3$. Clearly this generalizes to any number of variables: using only function $\vee$ on two variables, it is possible to construct a function $x_1 \vee x_2 \vee \cdots \vee x_n$, which takes value 0 if and only if all the variables are 0. Similarly, using only $\wedge$ on two variables, one can construct function $x_1 \wedge x_2 \wedge \cdots \wedge x_n$, which takes value 1 if and only if all the variables are 1. We define a function $M_a$ for each $a = (a_1, \ldots, a_n) \in \mathbb{F}_2^n$ by

$$M_a(x_1, \ldots, x_n) = \varphi_1(x_1) \wedge \varphi_2(x_2) \wedge \cdots \wedge \varphi_n(x_n),$$

where $\varphi_i(x_i) = \neg x_i$ if $a_i = 0$, and $\varphi_i(x_i) = x_i$ if $a_i = 1$. Thus, functions $M_a$ can be constructed by using only $\wedge$ and $\neg$. Moreover, $M_a(x_1, \ldots, x_n) = 1$ if and only if $\varphi_i(x_i) = 1$ for each $i$; but by the choice of $\varphi_i$, $\varphi_i(x_i) = 1$ if and only if $x_i = a_i$. It follows that $M_a$ is the characteristic function of the singleton $\{a\}$: $M_a(x) = 1$ if and only if $x = a$. Using the characteristic functions $M_a$, it is easy to build any function $f$:

$$f = M_{a_1} \vee M_{a_2} \vee \cdots \vee M_{a_k},$$

where $a_1, a_2, \ldots, a_k$ are exactly the elements of $\mathbb{F}_2^n$ for which $f$ takes a value of 1.
Boolean circuits thus provide a facility for computing functions F2' -+ Fr,
where the number of input variables is fixed. Let us then consider a function
I: {O, 1}* -+ {O, I} defined on any binary string. Let In be the restrietion of
I on {O, l}n. For each n there is a Boolean circuit Cn with n input variables
and one output variable computing In, and we say that Co, Cl, C2 , C3 , •.• , is
a lamily 01 Boolean circuits computing I. A family of Boolean circuits having
only one output node can be also used to recognize languages over the binary
alphabet: we regard a binary string with length n as accepted if the output
of C n is 1, otherwise that string is rejected.
So far we have demonstrated that, for an arbitrary function In: F2' -+ F2
there is a circuit computing In. Thus, for an arbitrary binary language L there
is a family Co, Cl, C 2 , ••• , accepting L. Does this violate Church-Turing's
thesis? There are only numerably many Turing machines, so there are only
numerably many binary languages that can be accepted by Turing machines.
But all binary languages (there are non-numerably many such languages) can
3.2 Circuits 43

be accepted by Boolean circuit families. Is the notion of a circuit family a


stronger algorithmic device than a Turing machine? The answer is that the
notion of a circuit family is not an algorithmic device at all. The reason is
that the way in which we constructed the circuit Cn requires exact knowledge
of the values of fn. In other words, without knowing the values of f, we do
not know how to construct the circuit family Co, Cl, C2 , ..• ; we can only
state that such a family exists but, without the construction, we cannot say
that the circuit family Co, Cl, C 2 , •.• is an algorithmic device.
We say that a language L has uniformly polynomial circuits if there exists
a Turing machine M that on input In (n consecutive l's) outputs the graph
of circuit Cn using space O(logn), and the family Co, Cl, C2 , ••• accepts L.
For the proof of the following theorem, consult [64].
Theorem 3.2.1. A language L has uniformly polynomial circuits if and only
if L E P.
Since our main concern in this book is polynomial-time quantum computa-
tion, we will use quantum circuit formalism which is analogous to the uni-
formly polynomial circuits.

3.2.2 Reversible Circuits

In Section 2 we introduced unary and binary quantum gates as unitary map-


pings. This is a generalization of the not ion of reversible gates, so let us now
study some properties of reversible gates. A reversible gate on m bits is a
permutation on lFr, so there are m inputs and output bits. Clearly there are
(2 m )! reversible gates on m bits.

Example 3.2.2. Function T : lF? -+ lF?, T(Xl,X2,X3) = (Xl,X2,XlX2 - X3)


defines a reversible gate on three bits. This gate is called the Toffoli gate.
The Toffoli gate does not change bits Xl and X2, but computes the not-
operation on X3 if and only if Xl = X2 = 1. The symbol in Figure 3.2 is used
to signify the Toffoli gate.

Xl _ _....._ _ yl

X2 _ _....._ _ y2

X3 _ _~--y3

Fig. 3.2. The Toffoli gate


44 3. Devices für Cümputatiün

A reversible circuit is a permutation on lF~ composed of reversible gates. Thus,


we malm no fundamental distinction between reversible gates and reversible
circuits, but the only difIerence is contextual: we usually assurne that there
is a fixed set of reversible gates, and an arbitrarily large reversible circuit can
be built from them.
We call a set R of reversible gates universal if any reversible circuit
C : lF~ -7 lF~ can be constructed using the gates in R, constants, and
some workspace. This means that, using the gates in R, we can construct
apermutation f : lF~+m -7 lF~+m such that there is a fixed constant vector
(Cl, ... , Gm) E lFr such that

where (Yl, ... , Yn) = C(Xl, ... , x n ). Moreover, in this construction we also
allow bit permutations that swap any two bits:

but these bit permutations could, of course, be avoided by requiring only


that f simulates C in such a way that the constants are scattered among
the variables in fixed places, and that the output can be read on fixed bit
positions. If a universal set R consists of a single gate, this gate is said to be
universal. When a universal set R of reversible gates is fixed, we again say
that the complexity of a circuit C (with respect to the set R) is the number
of the gates in C.
Earlier we demonstrated that a Boolean circuit constructed of irreversible
gates /\, V, and ' can compute any function lF~ -7 lFr, which especially
implies that, by using these gates, we can build any reversible circuit, as
weIl. Recalling that any Thring machine can also be simulated by a reversible
Thring machine, it is no longer surprising that all Boolean circuits can be
simulated by using only reversible gates. In fact:
• Not-gates are already reversible, so we can use them directly.
• And-gates are irreversible, and therefore we simulate any such by using a
TofIoli gate T(xl, X2, X3) = (Xl, X2, X1X2 - X3). Notice that T(xl, X2, 0) =
(Xl,X2,X1X2), so, using the constant 0 on the third component, a TofIoli
gate computes the logical and of variables Xl and X2.
• Since Xl V X2 = .( 'Xl /\ 'X2), we can replace all the V-gates in a Boolean
circuit by /\- and .-gates.
• The fanout (multiple wires leaving agate) is simulated by a controlled
not-gate C : lF~ -7 lF~, which is apermutation defined by C(Xl, X2) =
(Xl, Xl - X2), so the second bit is negated if and only if the first bit is one.
Again, C(Xl,O) = (Xl, Xl)' so, by using the constant 0 we can duplicate
the first bit Xl.
Because of the above properties, we can construct a reversible circuit for each
Boolean circuit simulating the original one. This contruction uses not-gates,
3.2 Circuits 45

Toffoli gates, controlled not-gates and the constant O. But using the constant
1, the Toffoli gates can also be used to replace the not- and controlled not-
gates: T(I, 1, Xl) = (1,1, --,xd and T(I, Xl, X2) = (1, Xl, Xl -X2), so the Toffoli
gates with constants 0 and 1 are sufficient to simulate any Boolean circuit
with gates /\, V and --'. Since Boolean circuits can be used to build up an
arbitrary function !F2 --7 !Fr, we have obtained the following theorem.

Theorem 3.2.2. A Toffoli gate is a universal reversible gate.

Remark 3.2.1. A more straight forward proof for the above theorem was given
by Toffoli in [87]. It is interesting to note that there are universal two-qubit
gates for quantum computation, [33] but it can be shown that there are no
universal reversible two-bit gates. In fact, there are only 4! = 24 reversible
two-bit gates, and all of them are linear, Le., they can all be expressed as
T(x) = Ax + b, where b E !F~ and A is an invertible matrix over the binary
field. Thus, any function composed of them is also linear, but there are also
nonlinear reversible gates, such as the Toffoli gate.

3.2.3 Quantum Circuits

We identify again the bit strings x E !Fr and an orthogonal basis {Ix) I
x E !Fr} of a 2m -dimensional Hilbert space H 2",. To represent linear map-
pings !Fr --7 !Fr, we adopt the coordinate representation Ix) = ei =
(0, ... , 1, ... ,of, where ei is a column vector having zero es elsewhere but
1 in the ith position, if the components of x = (x!, ... , x m ) form a binary
representation of number i-I.
A reversible gate f on m bits is apermutation of !Fr, so any reversible
gate also defines a linear mapping in H 2m. There is a 2m X 2m permutation
matrix M(f) 10 representing this mapping, M(f)ij = 1, if f(ej) = ei, and
M(f)ij = 0 otherwise.

Example 3.2.3. In!F~, we denote 1000) = (1,0, ... ,O)T, 1001) = (0,1, ... ,O)T,
... , 1111) = (0,0, ... , I)T. The matrix representation of the Toffoli gate is

M(T) =

where I 2 is the 2 x 2 identity matrix and M~ is the matrix of the not-gate


(Example 2.1.1).

Quantum gates generally are introduced as straight forward generaliza-


tions of reversible gates.
10 A permutation matrix is a matrix having entries 0 and 1, exactly one 1 in each
row and column.
46 3. Devices for Computation

Definition 3.2.1. A quantum gate on m qubits is a unitary mapping in


H2 0 ... 0 H 2 (m times), which operates on a fixed number (independent of
m) of qubits.
Because M (f) 'ij = 1 if and only if f (ei) = e j, M (f) * represents the inverse
permutation to f. Therefore, a permutation matrix is always unitary, and
reversible gates are special cases of quantum gates. The not ion of a quantum
circuit is obtained from that of a reversible circuit by replacing the reversible
gates by quantum gates. The only difference between quantum gates and
quantum circuits here is again contextual: we require that a quantum circuit
must be composed of quantum gates, such that each gate operates only on a
bounded number of qubits (the bound is the same for each gate).

Definition 3.2.2. A quantum circuit on m qubits is a unitary mapping on


H 2 m, which can be represented as a concatenation of a finite set of quantum
gates.

Since reversible circuits are also quantum circuits, we have already dis-
covered the fact that whatever is computable by a Boolean circuit is also
computable by a quantum circuit. It is also interesting to compare the compu-
tational power of polynomial-size quantum circuits (that is, quantum circuits
containing polynomially many quantum gates) and polynomial-time QTMs.
A. Yao [92] has shown that their computational powers coincide.
It would be also interesting to know which kinds of gates are needed
for quantum computing. The very first answer to that was given by David
Deutsch, who demonstrated that there exists a three-qubit universal gate for
quantum computing [32].11
It has turned out that the controlled not-gate (see Example 2.2.2) plays
a most important role in quantum computing.
Theorem 3.2.3 ([5]). All quantum circuits can be constructed by using only
controlled not-gates and unary gates.

Remark 3.2.2. Even though 2-qubit gates are enough for quantum comput-
ing, 2-bit gates are not enough for classical reversible computing. This is quite
easy to see; recall Remark 3.2.1.

Remark 3.2.3. It is known that to implement quantum computing, only real


numbers are needed [13] and that quantum computing can be regarded as
"discrete" [13]. The theorem below gives even more information. Even though
these important consequences of Shi's theorem [80] were known before, it
expresses many things in a very compact form.
11 By a universal gate, we mean here a quantum gate that can be used to approx-
imate all quantum networks. Moreover, usually it is assumed that some ancilla
qubits can be used. The values of those qubits are set previously in some fixed
manner, and the values of those qubits are ignored when reading the result.
3.2 Circuits 47

Theorem 3.2.4 ([80]). Aiz quantum circuits can be constructed (in the ap-
proximate sense) by using only ToJJoli gates and Hadamard- Walsh gates.

Remark 3.2.4. By the so-called Gottesman-Knill theorem [40], all quantum


circuits consisting only of Hadamard-Walsh gates and of controlled not-gates,
can be simulated with only polynomial loss of efficiency by using classical
circuits.

In the forthcoming chapters, we will use mainly the quantum circuit for-
malism for representing quantum algorithms. To conclude this section, we
mention a very important thorem of Solovay and Kitaev. [50]
Theorem 3.2.5. Assume that S is a finite set of unary quantum gates that
can approximate any unary quantum gate up to an arbitrary precision. There
exists a constant C depending on S only and c ~ 4 such that any unary quan-
tum gate can be approximated up to precision f by using at most C 10gC( ~)
gates from set S.
The above theorem, together with Theorem 3.2.3, implies that an n-gate
quantum circuit can be simulated by using O( n 10gC (:;)) gates from a univer-
sal set.
4. Fast Factorization

In this chapter we represent Shor's quantum algorithm for factoring integers.


Shor's algorithm can be better understood after studying quantum Fourier
transforms. The issues related to Fourier transforms and other mathematical
details are handled in Chapter 9, but areader having asolid mathematical
knowledge of these concepts is advised to ignore the references to Chapter 9.

4.1 Quantum Fourier Transform

4.1.1 General Framework

Let G = {gI, g2, ... , gn} be an abelian group (we will use the additive nota-
tions) and {Xl, X2,"" Xn} the characters of G (see Section 9.2). The func-
tions f : G --+ C form a complex vector space V, addition and scalar multipli-
cation are defined pointwise. If h, 12 E V, then the standard inner product
(see Section 9.3) of hand 12 is defined by
n
(!I 112) = L f;(gk)12(gk).
k=l

Any inner product induces a norm by Ilfll = vTTTf). In Section 9.2 it is


demonstrated that the functions Bi = JnXi form an orthonormal basis of
the vector space, so each f E V can be represented as

where Ci are complex numbers called the Fourier coefficients of f. The discrete
Fourier transform of fE V is another function [ E V defined by [(gi) = Ci'
Since the functions Bi form an orthonormal basis, we see easily that Ci
(Bi 11), so

(4.1)

The Fourier transform satisfies Parseval 's identity


50 4. Fast Factorization

IIAI = 11/11, (4.2)

which will be important in the sequel.


Let H be a finite quantum system capable of representing the elements of
G. This means that {Ig) I g E G} is an orthonormal basis of some quantum
system H. To obtain the matrix representations for linear mappings, we use
the coordinate representation Igi) = ei = (0, ... , 1, ... ,O)T (all coordinates 0
except 1 in the ith position; see Section 9.3). The general states oft he system
are the unit-Iength linear combinations of the basis states. Thus, a general
state

(4.3)

of H can be seen as a mapping

I: G -+ C, where I(gi) = Ci and 11I11 = 1, (4.4)

and vice versa, each mapping (4.4) defines astate of H.


Definition 4.1.1. The quantum Fourier translorm (QFT) is the operation
n n
L I(gi) Igi) L legi) Igi) .
I-t (4.5)
i=1 i=1

In other words, QFT is just the ordinary Fourier transform of a function


I: G -+ C determined by the state (4.3) via (4.4). It is clear by the formula
(4.1) that the QFT (4.5) is linear, but by Parseval's identity (4.2), QFT is
even a unitary operation in H. As seen in Exercise 1, for basis vectors Igi)
the operation (4.5) becomes

(4.6)

and hence it is clear that in basis Igi), the matrix ofthe QFT is

(4.7)

What kind of quantum circuit is needed to implement (4.6)? The problem


which arises is that, in a typical situation, n = IGI = dim(H) is large, but
usually H is a tensor product of smaller spaces. However, quantum circuit
operations were required to be local, not affecting a large number of quantum
digits at the same time. In other words, the problem is how we can decompose
the QFT matrix (4.7) into a tensor product of small matrices, or into a
4.1 Quantum Fourier Transform 51

product offew matrices that can be expressed as a tensor product of matrices


operating only on some small subspaces.
To approach this problem, let us assurne that G = U E9 V is a direct sum
of the subgroups U and V. Let r = IUI and s = lVI, hence IGI = rs. Let
also U and V be represented by some quantum systems H u and H v with
orthonormal bases

respectively. Then, the tensor product H u 0Hv represents G in a very natural


way: each gE G can be uniquely expressed as 9 = U + v, where u E U and
v E V, so we represent 9 = u + v by lu) Iv). Since we have a decomposition
{] = fj x V, all the characters of G can be written as
Xij(g) = Xij(U + v) = xf (u)XY (v),
where xf and XY are characters of U and V respectively (see Section 9.2.1),
and

(i,j) E {I, ... ,r} x {l, ... ,s}.


Thus, the Fourier transforms can also be decomposed:
Lemma 4.1.1 (Fourier transform decomposition). Let G = U E9 V be
a direct product of subgroups U and V and {lu) Iv) Iu EU, v E V} a quantum
representation of the elements of G. Then

IUi) IVj) ~ (~ t(xf (Uk))* IUk) ) (~ t(xY (Vl))* lVI) ) (4.8)


k=1 1=1
1
L L X:j (Uk + VI) IUk) lVI)
r B

= ~ (4.9)
yrs
k=11=1

is the Fourier tmnsform on G.


The decomposition of the Fourier transform may be applied recursively to
groups U and V. 1t is also interesting to notice that the state (4.9) is decom-
posable.

4.1.2 Hadamard-Walsh Transform

Let us study a special case G = !Fr. All the characters of the additive group
of!Fr are

where y E !Fr. To implement the corresponding QFT we have to find a


quantum circuit that performs the operation
52 4. Fast Factorization

Ix) r--+ 2: (-l)""Y Iy) .


yElF;"

The elements x = (Xl, X2, ... , X m ) E F2" have a very natural quantum repre-
sentation by m qubits:

which corresponds exactly to the algebraic decomposition

Thus, it follows from Lemma 4.1.1 that it suffices to find QFT on F 2 , and the
m-fold tensor product of that mapping will perform QFT on F2". However,
QFT on F 2 with representation {IO) ,11l} is defined by
1
10) r--+ yI2(IO) + 11)),
1
11) r--+ yI2(IO) -11)),

which can be implemented by the Hadamard matrix (which is also denoted


by W 2 )

H = yI2
1 (1 1) 1 -1 .

Thus, we have obtained the following result:


Lemma 4.1.2. LetHm = H@···@H (m times). Thenforlx) = lXI)" ·Ix m )

(4.10)

For any m, the matrix H m is also called a Hadamard matrix. QFT (4.10)
is also called a Hadamard transform, Walsh transform, or Hadamard- Walsh
transform.

4.1.3 Quantum Fourier Transform in Zn

All the characters of group Zn are


27rixy
Xy(x) = e n ,

where x and y are some representatives of cosets (See Sections 9.1.3, 9.1.4,
and 9.2). To simplify the notations, we will denote any coset k+nZ by number
the k. Using these notations,
4.1 Quantum Fourier Transform 53

Zn={0,1,2, ... ,n-1},

where addition is computed modulo n. This notation will be used hereafter.


The corresponding QFT on Zn is the operation
n-l
Ix) --+ L e- 2oc~xy Iy) . (4.11)
y=o
Now we have the problem that, unlike for lFr, there is no evident way to give
a quantum representation for the elements of Zn in such a way that the basis

10),11),12), ... , In - 1)

could be represented as a tensor product of two smaller bases representing


some subsystems of Zn. If, however, we know some factorization n = nln2
such that gcd(nl, n2) = 1, the Chinese Remainder Theorem (Theorem 9.1.2)
offers us a decomposition: according to the Chinese Remainder Theorem,
there is a bijection

given by F((k l , k 2 )) = aln2kl + a2nlk2, where al (resp. a2) is the multi-


plicative inverse of n2 (resp. nd modulo nl (resp. n2). By Exercise 2, the
mapping Fis, in fact, an isomorphism. We also notice that, since al (resp.
a2) has multiplicative inverse, the mapping k l f-t alk l (resp. k 2 f-t a2k2) is a
permutation of Zn, (resp. Zn2)' and this is all we need for decomposing the
QFT (4.11).
Assurne that the QFTs are available for Zn, and Zn2' i.e., we have quan-
tum representations

{IO) , 11), ... , Inl - I)} and {IO) , 11) , ... , In2 - I)}

of Zn, and Zn2 and we also have the routines for mappings

We use the quantum representation Ik) = Ik l ) Ik 2) for the elements of Zn


given by the Chinese Remainder Theorem with k = aln2kl + a2nlk2. By
constructing a quantum circuit that performs the multiplications (k l , k 2 ) f-t
(alkl,a2k2) and concatenating that with the QFT circuits on both compo-
nents, we get
54 4. Fast Factorization

This decomposition can be applied recursively to nl and n2.


Notiee carefully that the previous decomposition requires some factoriza-
tion n = n1n2 such that gcd(nl,n2) = 1, but generally the problem is to find
a nontrivial factorization of n. Shor's factorization algorithm makes use of
the inverse QFT on Z2"" and since there is no coprime factorization for 2m ,
we have to be ready to offer something else. Fortunately, we can also obtain
some more knowledge on QTFs on groups Z2"'.
Since the inverse Fourier transform in Zn is quite symmetrie to (4.11),
2'7l'"ixy • • 2?Tixy • • •
(e- n 1S replaced w1th e n ), choosmg Wh1Ch one to study 1S a matter of
personal preference. In what follows, we will learn how to implement the in-
verse Fourier transform on Z2"'. Group Z2'" has a very natural representation
by using m qubits: an element x = Xm_12m-1 + Xm_22m-2 + ... + X12 + xo,
where each Xi E {O, 1}, is represented as

How do to implement the inverse QFT

(4.12)

with a quantum circuit? To approach this problem, we can first make an


observation whieh should no longer be a surprise more after the previous
decomposition results: The superposition on the right-hand side of (4.12) is
decomposable.
Lemma 4.1.3.

y=o
= (10) +e~ 11))(10) +e* 11))···(10) +e 2 ;';':i 11)) (4.13)
4.1 Quantum Fourier Transform 55

Proof. Representing Iy) = Iy'b) = Iy') Ib), where y' are the m - 1 most sig-
nificant bits and b the least significant bit of y, we can divide the sum into
two parts:

y=o y'=o y'=o


2m _l 2m _l
L L
- 1 - 1

e 2"2i J."'u Iy) 10) + e 2"2"1,"'Y e ~ Iy) 11)


y=o y=o
2m - 1 _l
= L e~;;'i':~ ly)(IO) +e 2 ;.i': 1 11)).
y=o
The claim now follows by induction. D

Decomposition (4.13) also gives us a clue on how to compute transform (4.12)


by using a quantum circuit. The lth phase of 11) in (4.13) depends only on
bits xo, Xl, ... , XI-1 and can be written as
7fi(2 m- 1Xm _1 + 2m - 2 x m _2 + ... + 2X1 + xo))
exp ( 21- 1
_ (7fi(2 1- 1XI_1 + 21- 2XI_2 + ... + 2X1 + xo))
- exp 21- 1
7fiXI_1
= exp ( - - - ) exp (7fiXI_2
- - ) ... exp (7fiXl
- - ) exp (7fixO
- -)
2 0 12 1 2 2- 1 1 2-
7fiXI-2
= (_1)X 1- 1 exp ( - - ) .. ·exp (7fiXl
- - ) exp (7fixO
--)
21 21- 2 21- 1
We can now describe a quantum circuit that performs operation (4.12): given
a quantum representation

(4.14)

we swap the qubits to get the reverse binary representation

(4.15)

In practice, this swapping is not necessary, since we could operate as weIl


with the inverse binary representations; we included the swapping here just
for the sake of completeness.
On state (4.15) we proceed from right to left as follows: Hadamard trans-
form on the mth (the right-most) qubit gives

and we complete the phase (_1)X m - l to


56 4. Fast Factorization

by using phase rotations

ni ni ni
exp( 21 ), ... ,exp( 2 m - 2 )' exp( 2m - I )

conditionally. That is, for each 1 E {I, 2, ... ,m -I}, a phase factor exp( 2!~1)
is introduced to the mth bit if and only if mth and lth qubits are both l.
This procedure will yield state

The same procedure will be applied from right to left to qubits at locations
m - 1, ... , 2, and 1: for a qubit at location 1 we first perform a Hadamard
transform to get the qubit in state

~(IO) + (_1?1-1 11)),


then, for each k in range 1 - 1, 1 - 2, ... , 1, we introduce phase factor

ni
exp( 21- k )

conditionally if and only if the lth and kth qubit are both 1. That can be
achieved by mapping
10) 10) f-t 10) 10)
10) 11) f-t 10) 11)
11) 10) f-t 11) 10)
11) 11) f-t e""#T 11) 11),
which acts on the lth and kth qubits. The matrix of the mapping can be
written as

1000 )
0100
cl>k,l = ( 0 0 1 ~i •
(4.16)
OOOe2T=

Ignoring the swapping (Xm-l, Xm -2, ... , xo) f-t (Xo, Xl, ... , xm-d, this pro-
cedure results in the network in Figure 4.1 with ~m(m + 1) gates.
In Figure 4.1, the subindices of the gates cl> are omitted for typographical
reasons. The 4>-gates are (from left to right) cl>m-l.m-2, cl>m-l,m-3, cl>m-l,O,
cl>m-2,m-3, cl>m-2,O, and cl>m-3,O.
4.1 Quantum Fourier Transform 57

Xm-l ~

X m -2 Yl

X m -3 Y2

Xo Ym-l

Fig. 4.1. Quantum network for QFT on Z2m

4.1.4 Complexity Remarks

By the traditional discrete Fourier transform one usually understands the


Fourier transform in Zn given by
n-l
[(x) = 2: e- 2~~XY f(y). (4.17)
y=ü

For practical reasons, n is typically chosen as n = 2m. Fourier transform


(4.17) can be used to approximate the continuous Fourier transform, and has
therefore tremendous importance in physics and engineering.
By computing (4.17), we understand that we are given vector

(1(0), f(l), ... , f(2m - 1)) (4.18)

as the input data, and the required output is

([(0), [(1), ... , [(2m - 1)). (4.19)

The naive method for computing the Fourier transform is to use formula
(4.17) straightforwardly to find all elements in (4.19), and the time complexity
given by this method is O((2m)2), as one can easily see.
A significant improvement is obtained by using fast Fourier transform
(FFT), whose core can be expressed in a decomposition
58 4. Fast Factorization

which very closely resembles the decomposition of Lemma 4.1.3, and essen-
tially states that the vector (4.19) can be computed by combining two Fourier
transforms in Z2m-1. The time complexity obtained by recursively apply-
ing the above decomposition is O(m2 m ), significantly better than the naive
method.
The problem of computing QFT is quite different: in a very typical situ-
ation we have, instead of (4.18), a quantum superposition

Co 10) + Cl 11) + ... + C2"'_112 m - 1) (4.20)

and QFT operates on the coefficients of (4.20). At the same time, the physical
representation size of (4.20) is small; in system (4.20) there are only m qubits,
yet there are 2m coefficients Ci. Earlier we learned that QFT in Z2'" can be
done in time O(m 2 ) (Hadamard-Walsh transform can be done even in time
O(m)), which is exponentially separate from the classical counterparts of
Fourier transform. But the major difference is that the physical representation
sizes of (4.18) and (4.20) are also exponentially separate, the first one taking
il(2 m ) bits and the latter one m qubits. Later, we will learn that the key
idea behind many interesting fast quantum algorithms is the use of quantum
parallelism to convey some information of interest into the coefficients of
(4.20) and then compute QFT rapidly.

4.2 Shor's Algorithm for Factoring Numbers

4.2.1 From Periods to Factoring

Given two prime numbers p and q, it is an easy task to compute the prod-
uct n = pq. The naive algorithm already has quadratic time complexity
O(max{lpl ,lqIP), but, by using more sophisticated methods, even perfor-
mance O(max{lpl , IqIP+f) is reachable for any E > 0 (see [28]). On the other
hand, the inverse problem, factorization, seems to be extremely hard to solve.

Remark 4.2.1. A deterministic polynomial-time algorithm for recognizing


prime numbers has been recently discovered [1].

Nowadays it is a widespread belief that there is no efficient classical algo-


rithm for solving the factorization problem: given n = pq, a product of two
primes, find out p and q. This problem has great importance in cryptography:
the reliability of the famous public-key cryptosystem RSA is based on the
assumption that no efficient algorithm exists for factorization.
In 1994, Peter W. Shor [81] introduced a probabilistic polynomial-time
quantum algorithm for factorization, which will be presented in this chapter.
First we show how to reduce the factoring to finding orders of elements in
Zn. Let
4.2 Shor's Algorithm for Factoring Numbers 59

n = p~l ... p~k (4.21)

be the (unknown) prime factorization of an odd number (the powers of 2 can


be easily recognized and canceled). We will also assume that k ~ 2, i.e., n
has at least two different prime factors. In fact, there is a polynomial-time
algorithm for recognizing powers (see Exercise 3), so we may as weH assume
that n is not a prime power. We can be satisfied if we can find any non-trivial
factor of n rapidly, since the process can be recursively applied to aH of the
factors in order to uncover the whole factorization (4.21) swiftly.
Let us randomly choose, with uniform probability, an element a E Zn,
a =I- 1. If d = gcd(a, n) > 1, then dis a nontrivial factor of n and our task
is complete. If d = 1, assume that we could somehow extract r = ordn(a)
(ordn(a) means the order of element a in Zn, see Sections 9.1.3 and 9.1.4 for
the number-theoretic notions in this section). Then

aT == 1 (mod n),

which means that n divides a T - 1. Of course this does not yet offer us a
method for extracting a nontrivial factor of n, but if r is even, we can easily
factorize a T - 1:

aT - 1 = (a~ -l)(a~ + 1).


Now, since n divides aT -1, n must share a factor with a~ -1 or with a~ + 1
(or with both), and this factor can be extracted by using Euclid's algorithm.
How could we guarantee that the factor of n obtained in this way is
nontrivial? Namely, if n divides a~ ± 1, then Euclid's algorithm would only
give n as the greatest common divisor. Fortunately, it is not so likely that n
divides a~ ± 1, as we will soon demostrate. First, n I (a~ - 1) would imply
that

a~ == 1 (mod n),
which would be absurd by the very definition of order. But it may still happen
that n just divides a~ + 1, and does not share any factor with a~ -1. In this
case, the factar given by Euclid's algorithm would also be n. On the other
hand, n I d + 1 means that

a~ == -1 (mod n),
i.e., a~ is a nontrivial square root of 1 modulo n. In the next section we will use
elementary number theory to show that, for a randomly chosen (with uniform
distribution) element a E Z~, the probability that r = ord(a) is even, and
that a~ t=
-1 (mod n) is at least !.Consequently, assuming that r = ord(a)
could be rapidly extracted, we could, with a reasonable probability, find a
nontrivial factor of n.
60 4. Fast Factorization

There may be still some concern about the magnitude of numbers a li ± 1. Is


it possible that these numbers would be so large that the procedure could be
inefficient anyway? Fortunately, it is easy to answer this question: divisibility
by n is a periodic property with period n; so, once r = ord(a) is found, it
suffices to find a li ±1 modulo n (for modular exponentiation, rapid algorithms
are known).
We may now focus on the problem of finding r = ord(a). Since n is odd
but not a prime power, group Z~ is never cyclic, but each element a E Z~
has an order r = ord(a) smaller than ~rp(n) < ~. By the definition of the
order,
a l +sr == al (mod n)
for any integer s, which implies that the function f : Z --+ Zn defined by
f(k) = ak (mod n) (4.22)
has r as the period. We will demonstrate in Section 4.2.3 that it is possible
to find this period rapidly by using QFT.
Example 4.2.1. 15 = 3·5 is the smallest number that can be factorized by
the above method.
Zi5 = {1,2,4, 7,8,11, 13, 14},
and the orders of the elements are 0, 4, 2, 4, 4, 2, 4, 2 respectively. The
only element for which a li == -1 (mod 15) is a = 14 == -1 (mod 15). If,
for example, a = 7, then 15 I 74 - 1 and 72 - 1 == 3 (mod 15), 72 + 1 == 5
(mod 15). We get 3 = gcd(3, 15) and 5 = gcd(5, 15) as nontrivial factors of
15.

4.2.2 Orders of the Elements in Zn


Let n = p~l ... p~k be the prime factorization of an odd n, and a E Z~ a
randomly (and uniformly) chosen element. The Chinese Remainder Theorem
gives us a useful decomposition
(4.23)

of Z~ as a direct product of cyclic groups. Recall that the cardinality of each


factor is given by IZp:i I = rp(p~i) = p~i-l(Pi - 1) (see Section 9.1.4). Note
especially that each Zp:i has an even cardinality. For the decomposition of
an element a E Z~ we will use notation
(4.24)

and the order of an element a in Z;e will be denoted by ordpe (a).


Because of the decomposition (4.24), to choose a random element a E Z~
is to choose krandom elements as in (4.24) and vice versa. The following
technicallemma will be quite useful for the probability estimations.
4.2 8hor's Algorithm for Factoring Numbers 61

Lemma 4.2.1. Let i.p(pe) = 2uv, where u 2: 1, 2 t v and s 2: 0 a fixed integer.


Then the probability that a randomly (and uniformly) chosen element a E Z;,
has an order of form 2S t with 2 t t is at most ~.

Proof. If s > u, then the probability in quest ion is 0, so we may assume


s ::::: u. Let g be a generator of Z;,.
Then Z;e
= {gO, gl , ... , g2 v-l }, and
U

so an order ofform 2S t occurs if and only if j = 2u - s w, where 2 t w. In the set


{O, 1,2, ... ,2 u v - I} there are exactly 2s v multiples of 2u - s , namely O· 2u - s ,
1· 2u - s , ... , and (2 S v - 1)2 U - S , but only half of them have an odd multiplier.
Therefore, the required probability is

as claimed. D

We can use the previous lemma to estimate the probability of having an odd
order.
Lemma 4.2.2. The probability that r = ord(a) is odd for a uniformly chosen
a E Z~ is at most ~.

Proof. Let (al, ... , ak) be the decomposition (4.24) of an element a E Z~.
Let ri = ordp:i (ai). Since r = lcm{rl' ... ' rk} (Exercise 4), r is odd if and
only if each ri is odd.
Putting s = 0 in Lemma 4.2.1, we learn that the probability of having a
random ai E Zp:i with odd order is at most ~. Therefore, the probability PI
of having odd r is at most

as claimed. D

What about the probability that a~ == -1 (mod n)? It turns out that
Lemma 4.2.1 is useful for estimating that probability, too.

Lemma 4.2.3. Let n = p~l ... p~k be the prime decomposition of an odd n
and k 2: 2. If r = ordn (a) is even, then the probability that a ~ == -1 (mod n)
is at most 2 kl_ 1 .

Proof. Congruence a~ == -1 (mod n) implies that

a~ == -1 (mod p:i) (4.25)


62 4. Fast Factorization

for each i. Let ri = ordp~i (ai), so r = lcm{rl"'" rd. We write r = 2St and
ri = 2Siti, where 2 f t and 2 f ti' From the fact that ri I r for each i, it follows
that Si ::; S, but the congruencies (4.25) can only hold if Si = S for every i.
For if Si < S for some i, then also ri I ~ (the assumption k 2:: 2 is needed
here!), which implies that

(4.26)
But (4.26) together with (4.25) gives 1 == -1 (mod p~i), which is absurd since
Pi =1= 2.
Therefore, the probability P2 that d == -1 (mod n) is at most the prob-
ability that Si = S for each i, which is

L P (SI = l) rr P(Si = sd
00 k
P2 =
1=0 i=2

rr
k 00

= P(Si = SI) L P(SI = l)


i=2 1=0

= rr
k

i=2
P(Si = s) ::; 2k- 1
1

by Lemma 4.2.l. D

Combining Lemmata 4.2.2 and 4.2.3, we get the following lemma.


Lemma 4.2.4. Let n = p~l ... p~k be the prime factorization of an odd n
with k 2:: 2. Then, for a random a E IZ~ (chosen uniformly), the probability
that r = ordn(a) is even and a f -t- -1 (mod n) is at least

For our purposes, the result of the previous lemma would be sufficient. On
the other hand, it would be theoretically interesting to see if the result could
be improved. In fact, this is the case:
Lemma 4.2.5. Let all the notations be as before. Then the probability that
r = ordn(a) is odd or a f == -1 (mod n) is at most 2k~1'
Proof. Recalling the previous notations, the result follows directly from the
observation that r is odd if and only if each Si = 0, and that if a f == -1
(mod n) happens, then all numbers Si are equal. The former event is a subcase
of the latter, so we may conclude that
P(r is odd or a f == -1 (mod n))
1
::; P(all numbers Si are equal)::; 2 k - 1'

The latter inequality follows directly from that one of Lemma 4.2.3. D
4.2 Shor's Algorithm for Factoring Numbers 63

Corollary 4.2.1. Let all the notations be as in Lemma 4.2.4. The probability
that that r is even and a ~ t= -1 (mod n) is at least ~.

Remark 4.2.2. By studying the group Z21 we can see that the probability
limit of the previous lemma is optimal.

4.2.3 Finding the Period

By Lemma 4.2.4 we know that, for a randomly (and uniformly) chosen a E Z~,
the probability that r = ordn (a) is even and a ~ t= -1 is at least 196 for any
odd n having at least two distinct prime factors. It was already discussed in
Section 4.2.1 that the knowledge about the order of such an element a allows
us to find efficiently a nontrivial factor of n. Thus, a procedure for comput-
ing the order would provide an efficient probabilistic method for finding the
factors. Therefore, we have a good reason to believe that finding r = ordn (a)
is computationally a very difficult task.
It is dear that finding r = ordn (a) reduces to finding the period of function
f : Z -+ Zn defined by
f(k) = ak (mod n). (4.27)

In fact, if the period of fis r, then aT = aO == 1 (mod n).


It is a well-known fact that Fourier transform can be used for extract-
ing information about the periodicity (see Section 9.2.5), and we will show
that, by using QFT, the period of (4.27) can be found with non-negligible
probability.
Of course, we cannot compute the QFT of (4.27) on whole Z, so we choose
a domain Zm = {O, 1, ... ,m - 1} with m > n large enough to give room for
the period to appear. We will also choose m = 21 for two reasons: the first is
that then there is a natural quantum representation of elements of Zm by l
qubits; and the second is that we have already learned in Section 4.1.3 how
to compute the QFT in Z2t. We will fix this l later.
The procedure requires quantum representation of Zm and Zn (the latter
can be replaced by some set induding Zn). The first takes l qubits and the
other takes at most l qubits. For the representation, notation Ix) Iy), where
x E Zm, y E Zn, will be used.

Quantum Algorithm for Finding the Orders


1. The first stage is to prepare a superposition

)in L
m-l

Ik) 10) (4.28)


k=O

by beginning with 10) 10) and using Hadamard-Walsh transform in Zm.


64 4. Fast Factorization

2. The second step is to compute k H a k (mod n) to get

(4.29)

Since the function k H a k (mod n) has period r, (4.29) can be written


as
1
; ; :; L L
r-l SI

Iqr + I) jal) , (4.30)


yrn 1=0 q=O

where SI is the greatest integer for which sIr + l < m. It is clear that Si
cannot vary very much: we always have !!!:r - 1 - !:.r. -< SI < !!!:
r
- !:...
r
3. Compute the inverse QFT on 2 m to get

1
-~~-~e
~~ 1 ~ 2 .. ip(qr+l)
m Ip)
I0,I)
Viii 1=0 q=O Viii p=O
r-1 m-I s
= -1 '"' ~ e~
~ '"' m ~e ~
'"' m ) IaI).
Ip (4.31 )
m l=O p=O q=O

4. Observe the quantum representation of 2 m in superposition (4.31) to get


some p E 2 m .
5. Find the convergents ~ oft he continuous fraction expansion (see Section
9.4.2) of .E..
m
using Euclid's algorithm and output the smallest qi such that
a qi == 1 (mod n), if such qi exists.
In the next section we will estimate the correctness probability ofthis method.
Example 4.2.2. Let n = 15 and a = 7 be the element whose per iod is to be
found. Let us choose m = 16.
1. The first step is to prepare
15

~L Ik) 10).
k=O

2. Computation of k H 7k (mod 15) gives

~ (1 0) 11) + 11) 17) + 12) 14) + 13) 113 ) + 14) 11) + ... + 115) 113) )
= ~((IO) + 14) + 18) + 112») 11)
+( 11) + 15) + 19) + 113) ) 17)
+( 12) + 16) + 110) + 114) ) 14)
+( 13) + 17) + 111) + 115 ») 113 »).
4.3 The Correctness Probability 65

3. The inverse QFT in Z16 yields

~ (( 10) + 14) + 18) + 112) ) 11)


+( 10) + i 14) - 18) - i 112) ) 17)
+( 10) -14) + 18) -112) ) 14)
+( 10) - i 14) -18) + i 112)) 113)).
4. The probabilities to observe elements 0, 4, 8, and 12 are ~ for each.
5. The only convergent of 26 is just ~, which does not give the period. The
convergents of 1~ are ~ and ~, and the latter one gives the correct period
4. Observation of 8 does not give the period either, since the convergents
of 186 are ~ and !'
but 12 gives, the convergents of ~~ are ~, and t, i.

4.3 The Correctness Probability

4.3.1 The Easy Case

The probability of seeing a particular pE Zm when observing (4.31) is

() L m 1 ~
Pp= r-l - e1m
Le
SI
2
2 .. ipqr 1
m =-
1 L Le
r-
1
1 SI
2
2 .. ipqr 1
m (4.32)
2 m2
1=0 q=O 1=0 q=O

We will show that, with a non-negligible probability, the value p received


observing (4.31) will give us the period r.
To get some insight into how to evaluate the probability (4.32), we will
first assume that r happens to divide m. Recalling that SI is the greatest
integer such that sir + l < m, and that 0 ~ l < r, we see that SI = -1 for z;.
each l. In this case, (4.32) becomes
2
mlr-l
P(P) = ~ ""'~
L....J e ""fr (4.33)
m2 q=O

The sum in (4.33) runs over all of the characters of Zmlr evaluated at p. Thus
(see Section 9.2.1),
mlr-l
L ~ ==
emIr { ill.,
r
..
If P = 0 m Zmlr
q=O
o otherwise,

and therefore

P( ) = { ~, if P = d . z;. for some d


p 0, otherwise.
66 4. Fast Factorization

Thus, in the case r I m, observation of (4.31) can give only some p in the
set {O· ~,1 . ~, ... , (r - 1) . ~}, any such with a probability of ~. Now
that we know m and have learned p = d· ~ by observing (4.31), we may
try to find r by canceling -;. = ~ into an irreducible fraction using Euclid's
algorithm. Unfortunately, this works for certain only if gcd(d, r) = 1; in case
gcd( d, r) > 1 we will just get a factor of r as the nominator of .;;, = ~.
Fortunately, we can show, by using the number-theoretical results, that the
probability of having gcd(d, r) = 1 does not converge to zero too fast.

Example 4.3.1. In Example 4.2.2, the period r = 4 divided m = 16, and the
only elements that could be observed were 0, 4, 8, 12, the multiples of 4 = 146.
However, 0 and 8 did not give the period, since the multipliers 0 and 2 share
a factor 'with 4.

4.3.2 The General Case

Remembering that m = 21, it is clear that we cannot always be lucky enough


to have aperiod r dividing m. We can, however, ask what is the probability
of observing a p which is close to a multiple of ~. For if

is small enough and gcd(d, r) = 1, then ~ is a convergent of .;;" and all the
convergents of -;. can be found efficiently by using Euclid's algorithm. (see
Section 9.4.2).
We will now fix m in such a way that the continued fraction method will
apply. For any integer d E {O, 1, ... ,r - 1}, there is always a unique integer
p such that the inequality

1 m 1
-- <p-d- <- (4.34)
2 r - 2

holds. Now choosing m such that n 2 ::; m < 2n 2 guarantees that

Imp dl 1 1 1
-:;: ::; 2m ::; 2n 2 < 2r 2 '

which implies by Theorem 9.4.3 that if gcd(d, r) = 1, then ~ is a convergent


to .1!...
m
But there are only r integers p in Zm that satisfy (4.34), one for each
d E {O, 1, ... ,r - 1}. What is the probability of observing one? In other
words, we should estimate the probability of observing p which satisfies
r
Ipr - dml ::; 2" (4.35)

for some d.
4.3 The Cürrectness Probability 67

Lemma 4.3.1. For n ~ 100, the observation of (4.31) will give a p E /E m


such that Ipr - dml :::; ~ with a probability of not less than ~.

Proof. Evaluating (4.32) (Exercise 5), we get


1 r-I . 2 7rpr(sl+l)
P(P) = _ '""' sm m
m2 ~ sin 2 !S1!I.
l=O m
1 r-I . 2 7r(pr-dm)(sl+l)
_ '""' sm m
- m2 ~ . 2 7r(pr-dm)
1=0 sln m

für any fixed d by the periodicity of sin 2. Since we now assume that x =
pr - dm takes values in [- ~, H we will estimate

sin2 n(sl+l)
f(x) = . 2:'
sm - m

It is not difficult to show that f(x) is an even function, taking the maximum
(SI + 1)2 at x = 0 and minima in [-~, ~l at the end points ±~. Therefore,

x) > sm
• 2 7rr (
+ 1)
f(
2m SI
- . 2 -7rr
sm 2m

Since SI is the greatest integer such that sir + l < m,


7f r 7fr 7f r
-(1- -) < - ( S I + 1) < -(1 + -),
2 m 2m 2 m
and
sin 2 .!!:(1-.1:..)
f( x) >
-
2
. 2 7rr
m
• (4.36)
sm -2m

By the choice of m ~ n 2 , fn is negligible, and we can use estimations sin x :::; x


and sin 2 ~(1 + x) ~ 1 - (~X)2 (Exercise 6) to get

4 -
f(x)~-
7f2
(m)
r
2( 7f r 2)
1-(--)
2m
, (4.37)

which implies that

P(p) ~ -4- 1 ( 1 - (--)


7f r 2) . (4.38)
7f2 r 2m
Factor

(4.39)
68 4. Fast Factorization

in (4.38) tends to 1 as fn --+ 0, and for n 2 100, (4.39) is already greater than
0.9999. Thus, the probability of observing a fixed p such that - ~ < P - d~ ::;
~ is at least ~ ~ for any n 2 100.
But there are exactly r such values p E Zm that -~ < p - d~ ::; ~;
namely, the nearest integers to d~ for each d E {O, 1, ... ,r - I}. Therefore,
the probability of seeing some is at least ~ if n 2 100. D

We have learned that an individual p E Zm such that


1 m 1
- - <p-d- <- (4.40)
2 r - 2

is observed with a prob ability of at least ir'


Therefore, any corresponding
d E {O, 1,2, ... ,r - I} is also obtained with a prob ability of at least Weir'
should finally estimate what the probability is that for such d gcd( d, r) = 1
also holds.
The prob ability that gcd( d, r) = 1 for a given element d E {O, 1, ... , r - I}
is clearly 'P~), which can be estimated by using the following result.
Theorem 4.3.1 ([76)). For r 2 3,

r T 2.50637
ip r < e log log r + 1og Iogr '
-(-)

where 'Y = 0.5772156649 ... is Euler's constant. 1

Lemma 4.3.2. FoT' T' 2: 19, the probability that, foT' a uniformly chosen d E
{O, 1, ... ,r -I}, gcd(d,r) = 1 holds is at least 41og~ogn'

Proof. It directly follows from Theorem 4.3.1 that for r 2 19,


r
ip(r) < 4 log log r.

Therefore,

ip(r) 1 1
-- > > -----:---
r 4 log log r 4 log log n '

which was claimed. D

We combine the following facts to get Lemma (4.3.3):


• The probability that, for a randomly (and uniformly) chosen a E Zn, the
order r = ordn(a) is even and a~ =t -1 (mod n) is at least ~ (Lemma
4.2.4).
1 Euler's constant is defined by'Y = lim n --+ CXl (l + ~ + ~ + ... ~ -logn). In [76) it
is even shown that in the above theorem, 2.50637 can be replaced with 2.5 for
all numbers T' but T' = 2 ·3· 5 . 7 . 11 . 13· 17· 19 . 23.
4.3 The Correctness Probability 69

• The probability that observing (4.31) will give a p such that Ip - dr;!' I < !
is at least ~ (Lemma 4.3.1) .
• The probability that gcd(d, r) = 1 is at least 41og\ogn (Lemma 4.3.2).
Lemma 4.3.3. The probability that the quantum algorithm finds the order
of an element of Zn is at least
1 1
20 loglogn'
Remark 4.3.1. It was already mentioned in Section 4.1.4 that many interest-
ing quantum algorithms are based on conveying some information of interest
into the coefficients of quantum superposition and then applying fast QFT.
The period finding in Shor's factoring algorithm is also based on this method,
as we can see: to use quantum parallelism, it prepared a superposition
1 rn-I
Vm L Ik) 10).
k=O

The information of interest, namely, the period, was moved into the coeffi-
cients by computing k t-+ a k (mod n) to get

Jm L
rn-I
Ik) la k )
k=O
1 r - I SI
= Vm L L Iqr+l) lai).
1=0 q=O

In the above superposition, each lai) has


SI

Llqr+l) (4.41)
q=o

as the left multiplier. But in (4.41), a superposition of basis vectors Ix),


x E Zrn, already contains information about the period in the coefficients:
the coefficient of Ix) is nonzero if and only if x = qr + l, so the coefficient
sequence of (4.41) also has period r, and this period is to be extracted by
using QFT.
Notice that the elements a l are distinct for l E {O, 1, ... ,r -I}, and hence
it is guaranteed that the left multipliers (4.41) do not interfere. Thus, we can
say that the computation k t-+ a k was the operative step in conveying the
periodicity to the superposition's coefficients.

4.3.3 The Complexity of Shor's Factorization Algorithm

We can summarize the previous sections in the following algorithm:


70 4. Fast Factorization

Shor's quantum factorization algorithm


Input: An odd number n that has at least two distinct prime factors.
Output: A nontrivial factor of n.

1. Choose an arbitrary a E {I, 2, ... n - I}.


2. Compute d = gcd(a, n) by using Euclid's algorithm. If d > 1, output d
and stop.
3. Compute number r by using the quantum algorithm in Section 4.2.3
(m = 21 is chosen such that n 2 :::; m < 2n 2). If ar =1= 1 (mod n) or r is
odd or a~ == -1 (mod n), output "failure" and stop.
4. Compute d± = gcd( n, a ~ ± 1) by using Euclid's algorithm. Output num-
bers d± and stop.

Step 2 can be done in time O(C(n)3).2 For step 3 we first check whether
a has order less than 19, which can be done in time O(C(n)3). Then
we use Hadamard-Walsh transform in Zm, which can be done in time
O(C(m)) = O(C(n)). After that, computing ak (mod n) can be done in time
O(C(m)C(n)2) = O(C(n)3). QFT in Zm will be done in O(C(m)2) steps, and
finally the computation ofthe convergents can be done in time O(C(n)3). Step
4 can also be done in time O(C(n)3).
The overall complexity of the above algorithm is therefore O(C(n)3),
but the success probability is only guaranteed to be at least 57(-1og -) =
-11ogn
57Cog~(n)). Thus, by running the above algorithm O(log(C(n))) times, we ob-
tain a method that extracts a nontrivial factor of n in time O(C(n)3log C(n))
with high probability.

4.4 Excercises

1. Let G be an abelian group. Find the Fourier transform of f : G ---+ <C


defined by

f() I, if g = gi,
g = { O,otherwise.
If fis viewed as a superposition (see connection between formulae (4.3)
and (4.4)), which is the state of H corresponding to f?
2 Recall that the notation f(n) stands for the length of number n; that is, the
number of the digits needed to represent n. This notation, of course, depends on
a partitular number system chosen, but in different systems (excluding the unary
system) the length differs only by a multiplicative constant, and this difference
can be embedded into the O-notation.
4.4 Excercises 71

be the function given by F«k l , k 2)) = aln 2kl + a2nlk2, where al (resp.
a2) given by the Chinese Remainder Theorem, is the multiplicative in-
verse of n2 (resp. nl) modulo nl (resp. n2). Show that F is an isomor-
phism.
3. a) Let n and k be fixed natural numbers. Device a polynomial-time algo-
rithm that decides whether n = x k for some integer x.
b)Based on the above algorithm, device a polynomial-time algorithm
which tells whether a given natural number n is a nontrivial power of
another number.
4. Let n = p~l ... p~k be the prime factorization of n and a E Z~. Let
(al, ... , ak) be the decomposition of a given by the Chinese Remainder
Theorem and r i = ordp ;; (ai). Show that ordn (a) = lern{rl, ... , rd.
5. a)Prove that le ix_11 2 = 4sin2~.
b) Prove that

6. Use Taylor series to show that sin2 ~(1 + x) 2: 1- (~x)2.


5. Finding the Hidden Subgroup

In this brief chapter we present a quantum algorithm, which can be seen as


a generalization of Shor's algorithm. We can here explain, in a more struc-
tural manner, why quantum computers can speed up some computational
problems. The so-called Simon's hidden subgroup problem can be stated in
a general form as follows [18]:

Input: A finite abelian group G and function p : G -+ R, where R is a finite


set.
Promise: There exists a nontrivial subgroup H ::; G such that p is constant
and distinct on each coset of H.
Output: A generating set for H.

The function p is said to fulfill Simon's promise with respect to subgroup


H. If h EH, then elements 9 and 9 + h belong to the same coset of H (see
Section 9.1 for details), and since p is constant on the cosets of H, we have
that p(g + h) = p(g). We also say that pis H-periodic.
In Section 5.2 we will see that some interesting computation problems can
be reduced to solving the hidden subgroup problem. In a typical example of
this problem, IGI is so large that an exhaustive search for the generators of
H would be hopeless, but the representation size 1 of an individual element
gE G is only 8(log IGI). Moreover, G can be typicaHy described by a smaH
number of generators. Here we will consider the description size of G, which
is usually of order (log IGl)k for some fixed k as the size of the input. It is
also usually assumed that function p : G -+ R can be computed rapidly (in
polynomial time) with respect to log IGI.
For better illustration we will present the generalized form of Simon's
quantum algorithm on lFr
for solving the hidden subgroup problem. Note
also that in group lFrany subgroup is a subspace as weH, so instead of
asking for a generating set, we could ask for a basis. By using the Gauss-
Jordan elimination method one can efficiently find a basis when a generating
set is known (see Exercise 4).
1 The elements of G can be represented as binary strings, for instance.
74 5. Finding the Hidden Subgroup

5.1 Generalized Simon's Algorithm

In this section we will show how to solve Simon's subgroup problem on G =


F2', the addivite group of m-dimensional vector space over binary field F 2 =
{0,1}.

5.1.1 Preliminaries

The results represented here will also apply if F 2 is replaced with any finite
field F, so we will, for a short moment, describe them in a more general form.
More mathematical background can be found in Section 9.3.
Here we temporarily use another notation for the inner product: if x =
(XI, ... , x m ) and y = (Yl, ... , Ym) are elements of Fm, we denote their inner
product by

x . y = X1Yl + ... + XmYm.


An element y is said to be orthogonal to H if y . h = 0 for each h E H.
It is easy to verify that elements orthogonal to H form a subgroup (even a
subspace ) of Fm, which is denoted by H J... The importance of the following
simple lemma will become clear in amoment.
Lemma 5.1.1. For any y E Fm,

""' (_l)h.Y = {IHI, ify E HJ.., (5.1)


L 0, otherwzse.
hEH

Proof. This is analogous to proving the orthogonality of characters (Section


9.2.2). If y E HJ.., then h· y = 0 for each h E H, and the claim is obvious.
If y ~ H J.., then there exists an element h 1 E H such that h 1 . y -=I- 0, and
therefore

hEH hEH
= (_1)h 1 ·Y L(-l)h· Y = (_1)h 1 · Y S,
hEH
hence S = o. D

Let H E Fm be a subspace having dimension d. If X is any generating


subset of H, we can, by Exercise (4), efficiently find a basis of H. Any d x m
matrix M over F, whose rows are a basis of H, is called a generator matrix of
H. The name "generator matrix" is weIl justified by Exercise (1). It follows
from Exercises (2), (3), (4), and (5) that, once a generator matrix of H is
given, we can efficiently compute a generator matrix of HJ... Moreover, since
H = (HJ..)J.. (verify), it follows that in order to find a generating set for H,
it suffices to find a generating set for H J.. .
5.1 Generalized Simon's Algorithm 75

For a subgroup H of lFm , let T be some set that consists of exactly one ele-
ment from each coset (see Section 9.1.2) of H. Such a set is called a transversal
of H in lFm and it is clear that ITI = [lFm : HJ = PB"; I = 2;; . It is also easy to
verify that, since lFm = HEB HJ..., the equation IlFml = IHI·IHJ...I must hold.

5.1.2 The Algorithms

We will use m qubits to represent the elements of lFr and approximately


log2[lFr : HJ = log2i;;1 = m -log2lHI qubits for representing the values of
the function p : lF 2 -+ R. Here the description size of G = lFr is only m, and
we assurne that p is computable in polynomial time with respect to the input
size.
Using only function p, we can find a set of generators for H. It can be
shown [18J that if no other knowledge on p is given, i.e., p is given as a
blackbox function (see Section 6.1.2), solving this problem using any classical
algorithm requires exponential time, even when allowing a bounded error
probability. On the other hand, we will now demonstrate that this problem
can be solved in polynomial time by using a quantum circuit.
The algorithm for finding a basis of H consists of finding a basis Y of H J... ,
then computing the basis of H. As mentioned earlier, the latter stage can be
efficiently performed by using a classical computer. The problem of finding a
basis for H J... would be easier if we knew the dimension of H J... in advance, as
we will see. In that case, the algorithm could be described as follows.
Algorithm A:
Finding the Basis (Dimension Known)
1. If d = dirn H J... = 0, output 0 and stop.
2. Use the Algorithm B below to choose the set Y of d elements of HJ...
uniformly.
3. Use the Gauss-Jordan elimination method (Exercise 4) to check whether
Y is an independent set. If Y is independent, output Y and stop; other-
wise, give "failure" as output.
In amoment, we will see that the second step of the above algorithm will
l
give a basis of HJ... with a probability of at least (Lemma 5.1.2). Before that,
we will describe how to perform that step rapidly by using a quantum circuit.
It is worth mentioning that, in the above algorithm, a quantum computer is
needed only to produce a set of elements of H J... .
Algorithm B:
Choosing Elements Uniformly
1. Using the Hadamard-Walsh transform on $\mathbb{F}_2^m$, prepare
$$\frac{1}{\sqrt{2^m}} \sum_{x \in \mathbb{F}_2^m} |x\rangle\, |0\rangle. \qquad (5.2)$$

2. Compute $p$ to get
$$\frac{1}{\sqrt{2^m}} \sum_{x \in \mathbb{F}_2^m} |x\rangle\, |p(x)\rangle = \frac{1}{\sqrt{2^m}} \sum_{t \in T} \sum_{x \in H} |t + x\rangle\, |p(t)\rangle. \qquad (5.3)$$

3. Use the Hadamard-Walsh transform on $\mathbb{F}_2^m$ to get
$$\frac{1}{\sqrt{2^m}} \sum_{t \in T} \sum_{x \in H} \frac{1}{\sqrt{2^m}} \sum_{y \in \mathbb{F}_2^m} (-1)^{(t+x) \cdot y} |y\rangle\, |p(t)\rangle
= \frac{1}{2^m} \sum_{t \in T} \sum_{y \in \mathbb{F}_2^m} (-1)^{t \cdot y} \sum_{x \in H} (-1)^{x \cdot y} |y\rangle\, |p(t)\rangle
= \frac{|H|}{2^m} \sum_{t \in T} \sum_{y \in H^\perp} (-1)^{t \cdot y} |y\rangle\, |p(t)\rangle.$$
4. Make an observation to get an element $y \in H^\perp$.


In step 3, the equality between the last and the second-last formulae
follows from Lemma 5.1.1. It is easy to see that the probability of
observing a particular $y \in H^\perp$ is
$$\sum_{t \in T} \Big( \frac{|H|}{2^m} \Big)^2 = \frac{2^m}{|H|} \cdot \frac{|H|^2}{2^{2m}} = \frac{|H|}{2^m} = \frac{1}{|H^\perp|},$$
so Algorithm B works correctly: the elements of $H^\perp$ are drawn uniformly. It
is also clear that the above quantum algorithm runs in polynomial time.
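The amplitudes appearing in steps 1–3 can be reproduced by a direct classical simulation. The sketch below does this for Simon's original setting $H = \{0, s\}$ on $m = 3$ qubits ($s$ and $p$ are toy choices) and confirms that exactly the elements of $H^\perp$ appear, each with probability $1/|H^\perp|$:

```python
# Classical simulation of Algorithm B for Simon's problem on m = 3 qubits;
# the hidden subgroup is H = {0, s} with a toy shift s, and p is constant
# exactly on the cosets {x, x + s}.
from itertools import product

m, s = 3, 0b101
p = {x: min(x, x ^ s) for x in range(2 ** m)}

def dot(a, b):
    return bin(a & b).count("1") % 2

# Amplitudes after step 3: (1/2^m) sum_{x,y} (-1)^{x.y} |y>|p(x)>
amp = {}
for x, y in product(range(2 ** m), repeat=2):
    key = (y, p[x])
    amp[key] = amp.get(key, 0.0) + (-1) ** dot(x, y) / 2 ** m

probs = {}
for (y, _), a in amp.items():
    probs[y] = probs.get(y, 0.0) + a ** 2
for y in range(2 ** m):
    if probs.get(y, 0) > 1e-12:
        print(f"y = {y:03b}, prob = {probs[y]:.3f}, y.s = {dot(y, s)}")
# Only y with y.s = 0, i.e., y in H-perp, appear, each with probability 1/4.
```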
For the second step of Algorithm A, we will now analyze the probability
that, when choosing $d$ elements uniformly from a $d$-dimensional subspace, the
chosen vectors are independent. The analysis is straightforward, as seen in the
following lemma.

Lemma 5.1.2. Let $t \le d$ and $y_1, \ldots, y_t$ be randomly chosen vectors, with
uniform distribution, in a $d$-dimensional vector space over $\mathbb{F}_2$. Then the prob-
ability that $y_1, \ldots, y_t$ are linearly independent is at least $\frac{1}{4}$.
Proof. The cardinality of a $d$-dimensional vector space over $\mathbb{F}_2$ is $2^d$. There-
fore, the probability that $\{y_1\}$ is a linearly independent set is $\frac{2^d - 1}{2^d}$, since only
choosing $y_1 = 0$ makes $\{y_1\}$ dependent. Suppose now that $y_1, \ldots, y_{i-1}$ have
been chosen in such a way that $S = \{y_1, \ldots, y_{i-1}\}$ is a linearly independent
set. Now $S$ generates a subspace of dimension $i - 1$. Hence, there are $2^{i-1}$
choices for $y_i$ that make $S \cup \{y_i\}$ linearly dependent. Thus, the probability
that the set $\{y_1, \ldots, y_t\}$ is an independent set is
$$p = \frac{2^d - 2^0}{2^d} \cdot \frac{2^d - 2^1}{2^d} \cdots \frac{2^d - 2^{t-1}}{2^d}. \qquad (5.4)$$
Then, since $t \le d$, each factor satisfies $1 - 2^{i-1-d} \ge 1 - 2^{i-1-t}$, so
$p \ge \prod_{i=1}^{t} (1 - 2^{-i}) = \frac{1}{2} \prod_{i=2}^{t} (1 - 2^{-i})$, and hence
$$\frac{1}{2p} \le \prod_{i=2}^{t} \Big( 1 + \frac{1}{2^i - 1} \Big).$$
Now
$$\ln \frac{1}{2p} \le \sum_{i=2}^{t} \ln \Big( 1 + \frac{1}{2^i - 1} \Big) \le \sum_{i=2}^{t} \frac{1}{2^i - 1} \le \frac{4}{3} \sum_{i=2}^{t} \frac{1}{2^i} < \frac{4}{3} \sum_{i=2}^{\infty} \frac{1}{2^i} = \frac{4}{3} \cdot \frac{1}{2} = \frac{2}{3},$$
so $\frac{1}{2p} < e^{2/3} < 2$, hence $p > \frac{1}{4}$. $\Box$
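A quick Monte Carlo experiment agrees with the bound of Lemma 5.1.2; the sketch below estimates the independence probability for $t = d$ random vectors:

```python
# Monte Carlo check of Lemma 5.1.2: d random vectors in F_2^d are
# independent with probability prod_{i=1}^d (1 - 2^{-i}) > 1/4.
import random

def rank_mod2(vectors):
    """Rank over F_2 of vectors encoded as integers."""
    basis = []                        # kept sorted by decreasing top bit
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)         # xor exactly when b's top bit is in v
        if v:
            basis.append(v)
            basis.sort(reverse=True)
    return len(basis)

d, trials = 8, 100_000
hits = sum(rank_mod2([random.randrange(2 ** d) for _ in range(d)]) == d
           for _ in range(trials))
exact = 1.0
for i in range(1, d + 1):
    exact *= 1 - 2 ** -i
print(f"empirical {hits / trials:.4f}, exact {exact:.4f}, both above 0.25")
```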

Remark 5.1.1. Notice that if we build another algorithm (call it A$'$) which
repeats Algorithm A whenever the output is "failure", then A$'$ cannot give
us a wrong answer, but we cannot find an a priori upper bound for the number
of repeats it has to make. Instead, we can say that, on average, the number
of repeats is at most four. Such an algorithm is called a Las Vegas algorithm,
cf. Section 3.1.2. An interesting method for finding the basis in polynomial
time with certainty is described in [18].

Remark 5.1.2. Algorithm B for choosing an element in $H^\perp$ resembles the
period-finding algorithm of Section 4.2.3. The first step was to prepare a
quantum superposition of all the elements of $\mathbb{F}_2^m$. Then, by utilizing quan-
tum parallelism, the function $p$ was computed simultaneously on all elements of
$\mathbb{F}_2^m$. This was the operative step to convey the information of interest to the su-
perposition coefficients, and this information was extracted by using the QFT.
Notice that since $p$ is different on distinct cosets, all the vectors $|p(t)\rangle$ are
orthogonal and, therefore, the states
$$\sum_{y \in H^\perp} (-1)^{t \cdot y} |y\rangle\, |p(t)\rangle$$
with different values of $p(t)$ do not interfere.

To conclude this section, we describe how to find the basis of $H^\perp$ when
$d = \dim H^\perp$ is not known in advance. Because of Algorithm A, this is done
if we can describe an algorithm which can find out the dimension of $H^\perp$.
The dimension of $H^\perp$ can be found by utilizing Algorithm A (which uses
Algorithm B as a subprocedure). By Lemma 5.1.2 we see that, if we choose
uniformly $t \le d$ elements from a subspace of dimension $d$, then the probability
that the chosen elements are independent is at least $\frac{1}{4}$. We utilize this as
follows: if $D = \dim H^\perp$ is the (unknown) dimension of $H^\perp$, we just guess that
the dimension of $H^\perp$ is $d$ ($0 \le d \le m$), and run Algorithm A (with the $d$ we
guessed). If $d \le D$, we obtain, according to Lemma 5.1.2, a set of $d$ linearly
independent vectors with a probability of at least $\frac{1}{4}$. Once we get such a set
of vectors, we can tell, with certainty, that the guess $d \le D$ was correct (the
number of linearly independent vectors that a vector space can contain is at
most the dimension of the space).
On the other hand, if $d > D$, the set of $d$ vectors we obtain can never be
linearly independent. After repeating Algorithm A a number of times, we can
be convinced that our guess $d > D$ was incorrect. In any case, Algorithm A can
always be used to decide whether $d \le D$ holds with a correctness probability
of at least $\frac{1}{4}$.
We will now describe how to find $D$, the dimension of $H^\perp$. To make the
description easier, we will regard all other elements ($D$, $H$, and $p$) as fixed;
only the interval $I$ where the dimension may be found is mentioned as an
input.
For an interval $I = \{k, k+1, \ldots, k+r\}$ we use the notation $M(I) = k + \lfloor \frac{r}{2} \rfloor$ for
the "midpoint" of $I$, $B(I) = \{k, k+1, \ldots, M(I)+1\}$ for the "initial part", and
$T(I) = \{M(I), \ldots, k+r\}$ for the "final part" of $I$. In the following algorithm,
$D$ always stands for the true (unknown) dimension of $H^\perp$ and, initially, the
input of the following algorithm is the interval $I = \{0, 1, \ldots, m\}$.
Algorithm C:
Determining the Dimension
1. If $|I| = 1$, output the unique element of $I$ and stop.
2. If $|I| > 1$, run Algorithm A to decide if $D \le M(I)$. If the answer is "yes",
apply this algorithm with input $B(I)$; otherwise, apply this algorithm
with input $T(I)$.
Algorithm C thus finds the dimension $D$ by recursively cutting the interval
$\{0, 1, \ldots, m\}$ into two approximately equally long parts, and using Algorithm
A to decide which one is the part containing $D$. It is therefore clear that
Algorithm C stops after $\Theta(\log_2 m)$ recursive calls. Since, when running Algorithm
C, we have to make only $k = \Theta(\log_2 m)$ decisions based on Algorithm A,
which has a correctness probability of at least $\frac{1}{4}$, we find that Algorithm
C works with a correctness probability of $p \ge (\frac{1}{4})^k = \Theta(\frac{1}{4^{\log_2 m}}) = \Theta(\frac{1}{m^2})$.
Therefore, repeating Algorithm C $O(m^2)$ times, we get an algorithm which
gives the dimension $D$ with a nonvanishing correctness probability.
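The following sketch mimics Algorithm C as a binary search with an unreliable test; Algorithm A is mocked by a coin flip with the one-sided behaviour described above (the true dimension is known only to the mock, for illustration):

```python
# Sketch of Algorithm C as a binary search with an unreliable test.
# Algorithm A is mocked: with a too-large guess it can never produce an
# independent set; otherwise it succeeds with probability >= 1/4.
import random

D_true, m = 5, 10                         # D_true is hidden from the search

def algorithm_A(d, repeats=40):
    """True iff d independent vectors were obtained (certifies d <= D)."""
    if d > D_true:
        return False
    return any(random.random() < 0.25 for _ in range(repeats))

def algorithm_C(lo, hi):
    while hi > lo:
        mid = (lo + hi) // 2
        if algorithm_A(mid + 1):          # independent set of mid+1 vectors
            lo = mid + 1                  # certifies D >= mid + 1
        else:
            hi = mid                      # most likely D <= mid
    return lo

print("estimated dimension:", algorithm_C(0, m))
```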
It is left as an exercise to analyze the running time of Algorithms A, B,
and C to conclude that the generators of a hidden subgroup can be found in
polynomial time provided that p can be computed in polynomial time.

Remark 5.1.3. It is an open problem whether the hidden subgroup prob-
lem can be solved, by using a quantum computer, in polynomial time for
non-abelian groups as well. For progress in this direction, see [36] and the
references therein. The hidden subgroup problem for non-abelian groups is of
special interest, since the graph isomorphism problem can be reduced to it.

5.2 Examples

We will now demonstrate how the hidden subgroup problem can be applied
to computational problems. These examples are due to [59].

5.2.1 Finding the Order

This problem lies at the very heart of Shor's factorization algorithm: let $n$
be a large integer and $a$ another integer such that $\gcd(a, n) = 1$. Denote
$r = \mathrm{ord}_n(a)$, i.e., $r$ is the least positive integer that satisfies $a^r \equiv 1 \pmod{n}$.
In this problem, we have the group $\mathbb{Z}$, and the hidden subgroup is $r\mathbb{Z}$, whose
generator $r$ should be found.
The function $p(x) = a^x + n\mathbb{Z}$ satisfies Simon's promise, since
$$p(x) = p(y) \iff a^x + n\mathbb{Z} = a^y + n\mathbb{Z} \iff n \mid (a^x - a^y) \iff r \mid (x - y) \iff x + r\mathbb{Z} = y + r\mathbb{Z}.$$
Because $\mathbb{Z}$ is an infinite group, we cannot directly solve this problem by using
the algorithm of the previous section, but using finite groups $\mathbb{Z}_{2^l}$ instead of
$\mathbb{Z}$ already gives us approximations that are good enough, as shown in the
previous chapter.
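The promise is easy to verify numerically for small parameters; a quick sketch with illustrative $n$ and $a$:

```python
# Sanity check of the promise p(x) = p(y) <=> r | (x - y),
# with small illustrative numbers n and a.
from math import gcd

n, a = 21, 2
assert gcd(a, n) == 1
r = next(k for k in range(1, n + 1) if pow(a, k, n) == 1)   # r = ord_n(a)

for x in range(50):
    for y in range(50):
        assert (pow(a, x, n) == pow(a, y, n)) == ((x - y) % r == 0)
print(f"ord_{n}({a}) = {r}; promise verified on a 50 x 50 grid")
```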

5.2.2 Discrete Logarithm

If $F$ is a cyclic group of order $q$, then there is a generator $g \in F$ such that
$F = \{1 = g^0, g^1, g^2, \ldots, g^{q-1}\}$ (see Section 9.1 for the notions of group theory).
Hence, if we are given a generator $g$ of $F$ and another element $a \in F$, then
$a$ can be uniquely expressed as $a = g^r$, where $r \in \{0, 1, 2, \ldots, q-1\}$. The
discrete logarithm problem is to find $r$.
Since $g^q = 1$ (and $q$ is the least positive number with this property),
we have that $g^{r_1} = g^{r_2}$ if and only if $r_1 \equiv r_2 \pmod{q}$. Hence, instead of
regarding the exponent of $g$ as an integer, we can regard it as an element of
$\mathbb{Z}_q$ as well. In fact, the mapping $\mathbb{Z}_q \to F$, $r + q\mathbb{Z} \mapsto g^r$ is an isomorphism.
We can reduce the discrete logarithm to finding the hidden subgroup as
follows: if $a = g^r \in F$, let $G = \mathbb{Z}_q \times \mathbb{Z}_q$. The hidden subgroup $H = \{(r\alpha, \alpha) \mid \alpha \in \mathbb{Z}_q\}$ is generated by the element $(r, 1)$. That the function $p : \mathbb{Z}_q \times \mathbb{Z}_q \to F$ defined as
$$p(x, y) = g^x a^{-y}$$
satisfies Simon's promise can be verified easily:

$$p(x_1, y_1) = p(x_2, y_2) \iff g^{x_1} a^{-y_1} = g^{x_2} a^{-y_2} \iff g^{x_1 - r y_1} = g^{x_2 - r y_2}.$$
But the last equation is true if and only if $x_1 - r y_1 = x_2 - r y_2$ (when regarded
as elements of $\mathbb{Z}_q$), which, in turn, is true if and only if $x_1 - x_2 = r(y_1 - y_2)$.
This is equivalent to the condition $(x_1, y_1) + H = (x_2, y_2) + H$.
On the other hand, there may be other generators of $H$ than $(r, 1)$, but
using a number-theoretic argumentation similar to that used in Section 4.3.2,
we can see that the discrete logarithm can be found with a nonvanishing
probability in polynomial time.
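A toy instance makes the reduction concrete. The sketch below uses the cyclic group generated by a primitive root modulo 23 (all parameters are illustrative) and checks that $p$ is constant exactly on the cosets of $H$:

```python
# A toy instance of the reduction: F = <g> inside (Z/23Z)*, q = 22,
# with an illustrative secret exponent r (all parameters are toy choices).
p_mod, g, q, r = 23, 5, 22, 13
a = pow(g, r, p_mod)

def p_func(x, y):
    """p(x, y) = g^x a^{-y} in F."""
    return pow(g, x, p_mod) * pow(a, (-y) % q, p_mod) % p_mod

H = {(r * alpha % q, alpha) for alpha in range(q)}    # generated by (r, 1)
for x in range(q):
    for y in range(q):
        assert (p_func(x, y) == p_func(0, 0)) == ((x, y) in H)
print("p is constant exactly on the cosets of H")
```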

Remark 5.2.1. The reliability of the U.S. Digital Signature Algorithm is based on
the assumption that the discrete logarithm problem in finite fields remains
intractable on classical computers [60].

5.2.3 Simon's Original Problem

Simon [84] originally formulated his problem as follows:

Input: An integer $m \ge 1$ and a function $p : \mathbb{F}_2^m \to R$, where $R$ is a finite set.
Promise: There exists a nonzero element $s \in \mathbb{F}_2^m$ such that for all $x, y \in \mathbb{F}_2^m$,
$p(x) = p(y)$ if and only if $x = y$ or $x = y + s$.
Output: Element $s$.

It is plain to see that choosing $G$ as $\mathbb{F}_2^m$ and $H = \{0, s\}$ as the subspace
generated by $s$ makes this problem a special case of the more general problem
presented at the beginning of this chapter.
Simon's problem is of historical interest, since it was the first one providing
an example of a problem that can be solved, with nonvanishing probability,
in polynomial time by using a quantum computer, but which requires expo-
nential time on any classical algorithm if $p$ is considered a blackbox function
(see Section 6.1.2).

5.3 Exercises

1. Show that if $M$ is a generator matrix of $H$, then $H = \{xM \mid x \in \mathbb{F}_2^d\}$.
2. Show that if $M$ is a generator matrix of $H$ and $N$ is a $d' \times m$ matrix
such that $M N^T = 0$, then the subspace generated by the rows of $N$ is
orthogonal to $H$.
3. The elementary row operations on a matrix over a field $\mathbb{F}$ are:
a) Swapping two rows, $X_i \leftrightarrow X_j$.
b) Multiplying a row by a nonzero element of $\mathbb{F}$, $X_i \mapsto cX_i$, $c \neq 0$.
c) Adding to a row another row multiplied by any element of $\mathbb{F}$, $X_i \mapsto X_i + cX_j$.
Show that if $M'$ is obtained from $M$ by using the elementary row oper-
ations, then the rows of $M$ and $M'$ generate exactly the same subspace.
4. A matrix is in a reduced row-echelon form if:
a) For any row containing a nonzero entry, the first such entry is 1. The first
nonzero entry in a row is called the pivot.
b) The rows containing only zero entries, if there are such rows, are the
last ones.
c) The pivot of each row is on the right-hand side of the pivot of the
row above.
d) Each column containing a pivot of a row has zeros everywhere else.
If only conditions a), b), and c) are satisfied, we say that the matrix is
in row-echelon form.
Prove the following Gauss-Jordan elimination theorem: each $k \times m$ matrix
can be transformed into a reduced row-echelon form by using $O(k^3 m)$
field operations. Moreover, the nonzero rows of a row-echelon form matrix
are linearly independent.
5. Prove that if $M = (I_d \mid M')$ is a generator matrix of $H$ (so $M'$ is a
$d \times (m - d)$ matrix), then $N = (-M'^T \mid I_{m-d})$ is a generator matrix of $H^\perp$.
6. Grover's Search Algorithm

6.1 Search Problems

Let us consider a children's game called hiding the key: one player hides the
key in the house, and the others try to find it. The one who has hidden the key
is permanently advising the others by using phrases like "freezing", "cold",
"warm", and "hot", depending on how close the seekers are to the hiding
place. Without this advice, the game would obviously last much longer. Or,
can you develop a strategy for finding the key without searching through the
entire house?

6.1.1 Satisfiability Problem

There are many problems in computer science that closely resemble searching
for a tiny key in a large house. We will shortly discuss the problem of finding
a solution for an NP-complete 3-satisfiability problem: we are given a propo-
sitional expression in conjunctive normal form; each clause is a disjunction
of three literals (a Boolean variable or a negation of a Boolean variable). In
the original form of the problem, the task is to find a satisfying truth as-
signment, if any such exists. Let us then imagine an advisor who always has
enough power to tell at once whether or not a given Boolean expression has a
satisfying truth assignment. If such an advisor were provided, finding a sat-
isfying assignment would no longer be a difficult problem: we could just substitute 1
for some variable and ask the advisor whether or not the resulting Boolean
expression with fewer variables had the demanded assignment. If our choice was
incorrect, we would flip the substituted value and proceed recursively.
Unfortunately, the advisor's problem of telling whether a satisfying valuation
exists is an NP-complete problem, and in light of our present knowledge, it
is very unlikely that there would be a fast solution to this problem in the
real world. But let us continue our thought experiment by assuming that
somebody, let us call him a verifier, knows a satisfying assignment for a
Boolean expression but is not willing to tell it to us. Quite surprisingly,
there exist so-called zero-knowledge protocols (see [79] for more details), which
the verifier can use to guarantee that he really knows a satisfying valuation
without revealing even a single bit of his knowledge. Thus, it is possible that

we are quite sure that a satisfying truth assignment exists, yet we do not have
a clue what it might be! The obvious strategy to find a satisfying assignment
is to search through all of the possible assignments. But if there are $n$ Boolean
variables in the expression, there are $2^n$ assignments, and because we do not
have a supernatural advisor, our task seems quite hopeless for large $n$, at
least in the general case. In fact, no method faster than an exhaustive search
is known in the general case.
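The advisor argument above is the familiar self-reducibility of satisfiability; the sketch below simulates the advisor by brute force, purely for illustration:

```python
# Sketch of the advisor argument: with a decision oracle for satisfiability,
# a satisfying assignment is found with n oracle calls. Here the "advisor"
# is exhaustive search, so this is illustrative only.
from itertools import product

def sat(clauses, n, fixed):
    """Advisor: is there a satisfying assignment extending 'fixed'?"""
    for bits in product((0, 1), repeat=n):
        if all(bits[i] == v for i, v in fixed.items()):
            if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
                return True
    return False

def find_assignment(clauses, n):
    fixed = {}
    for i in range(n):                       # substitute variables one by one
        fixed[i] = 1
        if not sat(clauses, n, fixed):       # wrong guess: flip the value
            fixed[i] = 0
    return fixed

# (x1 v x2 v ~x3) & (~x1 v x3 v x4) & (~x2 v ~x3 v ~x4), signed literals
clauses = [(1, 2, -3), (-1, 3, 4), (-2, -3, -4)]
print(find_assignment(clauses, 4))
```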

Remark 6.1.1. The most effective classical procedure for solving generic NP-
complete problems seems to be the one described by Uwe Schöning [83].

6.1.2 Probabilistic Search

In what follows, we will slightly generalize our search problem. Instead of
seeking a satisfying assignment for a Boolean expression, we will generally
talk about functions
$$f : \mathbb{F}_2^n \to \mathbb{F}_2,$$
and our search problem is to find an $x \in \mathbb{F}_2^n$ such that $f(x) = 1$ (if any such
$x$ exists).
Notice that, with the assumption that $f$ is computable in polynomial
time, this model is enough to represent NP problems, because it includes the
NP-complete 3-satisfiability problem. This model is also suited [41] to an
unordered database search, where we have to find a specific item in a huge
database. Here the database consists of $2^n$ items, and $f$ is the function giving
a value of 1 to the required item and 0 to the rest, hereby telling us whether
the item under investigation was the required one.
Another simplification, a huge one this time, is to assume that $f$ is a so-
called blackbox function, i.e., we do not know how to compute $f$, but we can
query $f$, and the value is returned to us instantly in one computational step.
With that assumption, we will demonstrate in the next chapter how to derive
a lower bound for the number of queries to $f$ needed to find an item $x$ such that
$f(x) = 1$. In the next chapter, we will, in fact, present a general strategy for
finding lower bounds for the number of queries concerning other goals, too.
Now we will study probabilistic search, i.e., we will omit the requirement
that the search strategy always gives a value $x$ such that $f(x) = 1$, and require
that it does this only with a nonvanishing probability. This means that the search
algorithm will give the required $x$ with at least some constant probability
$0 < p \le 1$ for any $n$. This can be seen as a natural generalization of the
search which returns the required item with certainty.

Remark 6.1.2. Provided that the probabilistic search strategy is rapid, we use
very standard argumentation to say that we can find $x$ rapidly in practice: the
probability that, after $m$ attempts, we have not found $x$ is at most $(1-p)^m \le e^{-pm}$, which is smaller than any given $\epsilon > 0$ when $m > \frac{\ln(1/\epsilon)}{p}$. Thus, we
can reduce the failure probability to any other positive constant $\epsilon$ just by
repeating the original search a constant number of times.

It is easy to see that any classical deterministic search algorithm which
always gives an $x$ such that $f(x) = 1$ (if such an $x$ exists) will require $N = 2^n$
queries to $f$: if some algorithm makes fewer than $N$ queries, we can modify $f$
so that the answer is no longer correct.

Remark 6.1.3. In the above argumentation it is, of course, essential that
we handle only blackbox functions. If we are, instead, given an algorithm
for computing $f$, it is very difficult to say how easily this algorithm itself
will give information about the $x$ to be found. In fact, the question about the
computational work required to recover $x$ from the algorithm for $f$ lies at
the very heart of the difficult open problem of whether P $\neq$ NP.

We cannot do much better than a deterministic search by allowing ran-
domization: fix $y \in \mathbb{F}_2^n$, and consider a blackbox function $f_y$ defined by
$$f_y(x) = \begin{cases} 1, & \text{if } x = y, \\ 0, & \text{otherwise.} \end{cases} \qquad (6.1)$$
If we draw distinct elements $x_1, \ldots, x_k \in \mathbb{F}_2^n$ with uniform distribution, the
probability of finding $y$ is $\frac{k}{N}$, so we would need at least $pN$ queries to find $y$
with a probability of at least $p$. Using nonuniform distributions will not offer
any relief:

Lemma 6.1.1. Let $N = 2^n$ and $f$ be a blackbox function. Assume that $A_f$
is a probabilistic algorithm that makes queries to $f$ and returns an element
$x \in \mathbb{F}_2^n$. If, for any nonconstant $f$, the probability that $f(x) = 1$ is at least
$p > 0$, then there is an $f$ such that $A_f$ makes at least $pN - 1$ queries.

Proof. Let $f_y$ be as in (6.1) and $P_y(k)$ be the probability that $A_{f_y}$ returns
$y$ using $k$ queries. By assumption $P_y(k) \ge p$, and we will demonstrate that
there is some $y \in \mathbb{F}_2^n$ such that $P_y(k) \le \frac{k+1}{N}$.
First, by using induction, we show that
$$\sum_{y \in \mathbb{F}_2^n} P_y(k) \le k + 1.$$
If $k = 0$, then $A_{f_y}$ gives any $x \in \mathbb{F}_2^n$ with some probability $p_x$ that does not
depend on $y$ (no queries have been made), and thus
$$\sum_{y \in \mathbb{F}_2^n} P_y(0) = \sum_{y \in \mathbb{F}_2^n} p_y = 1.$$
Assume, then, that
$$\sum_{y \in \mathbb{F}_2^n} P_y(k - 1) \le k.$$
On the $k$th query, $A_{f_y}$ queries $f(y)$ with some probability $q_y$, and therefore
$P_y(k) \le P_y(k-1) + q_y$. Thus,
$$\sum_{y \in \mathbb{F}_2^n} P_y(k) \le \sum_{y \in \mathbb{F}_2^n} P_y(k-1) + \sum_{y \in \mathbb{F}_2^n} q_y \le k + 1.$$
Because there are $N = 2^n$ different choices for $y$, there must exist one with
$$P_y(k) \le \frac{k+1}{N}.$$
It follows that $\frac{k+1}{N} \ge P_y(k) \ge p$, so $k \ge pN - 1$. $\Box$

6.1.3 Quantum Search with One Query

Let us continue studying blackbox functions $f : \mathbb{F}_2^n \to \mathbb{F}_2$. The natural
question arising now is whether we can devise a faster search method on
a quantum computer, using quantum parallelism to query many values of
a blackbox function simultaneously.
The very first thing we need to fix is the notion of a quantum blackbox
function. The notion we follow here is widely accepted: let $x \in \mathbb{F}_2^n$. In order
to make a blackbox function query $f(x)$ on a quantum computer, we will
utilize a source register $|x\rangle$ ($n$ qubits) and a target qubit $|b\rangle$. A query operator
$Q_f$ is a linear mapping defined by
$$Q_f |x\rangle |b\rangle = |x\rangle |b \oplus f(x)\rangle, \qquad (6.2)$$
where $\oplus$ means addition modulo 2 or, in other words, the exclusive-or operation.
We can now easily see that (6.2) defines a unitary mapping. In fact, the vector
set
$$B = \{ |x\rangle |b\rangle \mid x \in \mathbb{F}_2^n,\ b \in \mathbb{F}_2 \}$$
spans a $2^{n+1}$-dimensional Hilbert space. The mapping $Q_f$ operates as a permu-
tation on $B$, and it is a well-known fact that any such mapping is unitary.
Moreover, since
$$Q_f Q_f |x\rangle |b\rangle = |x\rangle |b \oplus f(x) \oplus f(x)\rangle = |x\rangle |b\rangle,$$
the inverse of $Q_f$ is $Q_f$ itself.
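For small $n$, the operator $Q_f$ can be written down explicitly as a permutation matrix; the following sketch checks that it is unitary and its own inverse ($f$ is the function $f_y$ of (6.1) with an arbitrary $y$):

```python
# Constructing the query operator Q_f as an explicit 2^(n+1) x 2^(n+1)
# permutation matrix and checking it is unitary and self-inverse.
import numpy as np

n = 3
y = 0b101
f = lambda x: int(x == y)                 # the blackbox f_y of (6.1)

dim = 2 ** (n + 1)                        # basis states |x>|b>, index 2x + b
Qf = np.zeros((dim, dim))
for x in range(2 ** n):
    for b in range(2):
        Qf[2 * x + (b ^ f(x)), 2 * x + b] = 1   # |x>|b> -> |x>|b + f(x)>

assert np.allclose(Qf @ Qf.T, np.eye(dim))      # unitary (real permutation)
assert np.allclose(Qf @ Qf, np.eye(dim))        # Q_f is its own inverse
```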
Let us now fix $y \in \mathbb{F}_2^n$ and try to devise a quantum search algorithm for
the function $f_y$ defined in (6.1). The very first idea is to generate the state
$$\frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} |x\rangle |0\rangle \qquad (6.3)$$
beginning with $|0\rangle |0\rangle$ and using the Hadamard-Walsh transform $H_n$ (cf. Section
4.1.2). After that, one could make a single query $Q_{f_y}$ to obtain the state
$$\frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} |x\rangle |0 \oplus f_y(x)\rangle = \frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} |x\rangle |f_y(x)\rangle. \qquad (6.4)$$

But if we now observe the last qubit of state (6.4), we would see 1 (and,
after that, observing the first register, get the required $y$) only with a probability
of $\frac{1}{2^n}$, so we would not gain any advantage over just guessing $y$.
But quantum search can improve the probabilistic search, as we will now
demonstrate. Having state (6.3), we could flip the target bit (to $|1\rangle$), and
then apply the Hadamard transform to that bit to get the state
$$\frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} |x\rangle \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) = \frac{1}{\sqrt{2^{n+1}}} \Big( \sum_{x \in \mathbb{F}_2^n} |x\rangle |0\rangle - \sum_{x \in \mathbb{F}_2^n} |x\rangle |1\rangle \Big). \qquad (6.5)$$

If $x \neq y$, then $Q_{f_y} |x\rangle |0\rangle = |x\rangle |0\rangle$ and $Q_{f_y} |x\rangle |1\rangle = |x\rangle |1\rangle$, but $Q_{f_y} |y\rangle |0\rangle = |y\rangle |1\rangle$ and $Q_{f_y} |y\rangle |1\rangle = |y\rangle |0\rangle$; so, querying $f_y$ by the query operator $Q_{f_y}$
on state (6.5) gives us the state
$$\frac{1}{\sqrt{2^{n+1}}} \Big( \sum_{x \neq y} |x\rangle |0\rangle + |y\rangle |1\rangle - \sum_{x \neq y} |x\rangle |1\rangle - |y\rangle |0\rangle \Big)
= \frac{1}{\sqrt{2^{n+1}}} \Big( \sum_{x \neq y} |x\rangle (|0\rangle - |1\rangle) + |y\rangle (|1\rangle - |0\rangle) \Big)
= \frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} (-1)^{f_y(x)} |x\rangle \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle). \qquad (6.6)$$
Notice that preparing the target bit in the superposition $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$ before
applying the query operator was used to encode the value $f_y(x)$ in the sign
$(-1)^{f_y(x)}$. This is a very basic technique in quantum computation. In the
continuation we will not need the target bit anymore, and instead of (6.6) we
will write only
$$\frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} (-1)^{f_y(x)} |x\rangle. \qquad (6.7)$$

So far we have seen that, applying the query operator only once to the quantum
superposition (6.4), we get state (6.7), ignoring the target qubit. We continue
as follows: we write (6.7) in the form
$$\frac{1}{\sqrt{2^n}} \Big( \sum_{x \in \mathbb{F}_2^n} |x\rangle - 2|y\rangle \Big),$$
and apply the Hadamard transform $H_n$ to get
$$|0\rangle - \frac{2}{2^n} \sum_{x \in \mathbb{F}_2^n} (-1)^{x \cdot y} |x\rangle. \qquad (6.8)$$

In superposition (6.8), we separate $|0\rangle$ from all other states by designing
a function $f_0 : \mathbb{F}_2^n \to \mathbb{F}_2$ which takes the value 1 at 0 and 0 for all other
$x \in \mathbb{F}_2^n$. Such a function can be constructed by using $O(n)$ Boolean gates
and, through the considerations of Section 3.2.3, it is also possible to construct
$f_0$ by using $O(n)$ quantum gates, using several ancilla qubits. We will store
the value of $f_0$ in an additional qubit to obtain the state
$$\Big( 1 - \frac{2}{2^n} \Big) |0\rangle |1\rangle - \frac{2}{2^n} \sum_{x \neq 0} (-1)^{x \cdot y} |x\rangle |0\rangle. \qquad (6.9)$$

Now, observation of the last qubit of (6.9) will result in
$$|0\rangle |1\rangle \qquad (6.10)$$
with a probability of $(1 - \frac{2}{2^n})^2 = 1 - \frac{4}{2^n} + \frac{4}{4^n}$, and in
$$\frac{1}{\sqrt{2^n - 1}} \sum_{x \neq 0} (-1)^{x \cdot y} |x\rangle |0\rangle = \frac{1}{\sqrt{2^n - 1}} \Big( \sum_{x \in \mathbb{F}_2^n} (-1)^{x \cdot y} |x\rangle - |0\rangle \Big) |0\rangle \qquad (6.11)$$
with a probability of $\frac{4}{2^n} - \frac{4}{4^n}$.


Applying the Hadamard transform $H_n$ once more to states (6.10) and (6.11)
yields (ignoring the last qubit)
$$\frac{1}{\sqrt{2^n}} \sum_{z \in \mathbb{F}_2^n} |z\rangle \qquad (6.12)$$
and
$$\frac{1}{\sqrt{2^n - 1}} \Big( \sqrt{2^n}\, |y\rangle - \frac{1}{\sqrt{2^n}} \sum_{z \in \mathbb{F}_2^n} |z\rangle \Big), \qquad (6.13)$$
respectively.
Finally, observing (6.12) and (6.13) will give us $y$ with probabilities $\frac{1}{2^n}$
and $\frac{2^n - 1}{2^n}$, respectively. Keeping in mind the probabilities of seeing states (6.10)
and (6.11), we find that the total probability of observing $y$ is
$$\Big( 1 - \frac{4}{2^n} + \frac{4}{4^n} \Big) \frac{1}{2^n} + \Big( \frac{4}{2^n} - \frac{4}{4^n} \Big) \frac{2^n - 1}{2^n} \approx \frac{5}{2^n},$$
which is approximately 2.5 times better than $\frac{2}{2^n}$, the best we could get by
a randomized algorithm that queries $f_y$ only once (see the proof of Lemma
6.1.1).
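The whole one-query procedure is easy to simulate classically; the sketch below reproduces the success probability of roughly $\frac{5}{2^n}$ for an illustrative $n$ and $y$:

```python
# Simulating the one-query search: the total success probability should be
# roughly 5/2^n, about 2.5 times the classical 2/2^n (n and y illustrative).
import numpy as np

n = 6
N = 2 ** n
y = 13

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
Hn = H
for _ in range(n - 1):
    Hn = np.kron(Hn, H)
signs = np.where(np.arange(N) == y, -1.0, 1.0)

state = Hn @ np.eye(N)[0]            # uniform superposition (6.3)
state = signs * state                # sign-encoded query, state (6.7)
state = Hn @ state                   # state (6.8)

p0 = state[0] ** 2                   # probability of branch (6.10)
rest = state.copy()
rest[0] = 0.0
rest /= np.linalg.norm(rest)         # normalized state (6.11)
branch0 = Hn @ np.eye(N)[0]          # (6.12): uniform again
branch1 = Hn @ rest                  # (6.13)
total = p0 * branch0[y] ** 2 + (1 - p0) * branch1[y] ** 2
print(f"P(success) = {total:.5f},  5/2^n = {5 / N:.5f}")
```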

Soon we will see that, by using a quantum circuit, it is still possible
to improve this probability. The next section is devoted to Lov Grover's
ingenious idea of using quantum operations to amplify the probability of
finding the required element.

6.2 Grover's Amplification Method


We will still use the blackbox function $f_y : \mathbb{F}_2^n \to \mathbb{F}_2$ from the previous section as
an example. The task is to find $y$ by querying $f_y$, and we will follow the
method of Lov Grover to demonstrate that, using a quantum circuit, we can
find $y$ with nonvanishing probability by using $O(\sqrt{2^n})$ queries. We will also
show how to generalize this method to all blackbox functions.

6.2.1 Quantum Operators for Grover's Search Algorithm

A query operator $Q_{f_y}$, which is used to call for values of $f_y$, uses $n$ qubits for
the source register and 1 for the target. We will also need a quantum operator
$R_n$ defined on $n$ qubits and operating as $R_n |0\rangle = -|0\rangle$ and $R_n |x\rangle = |x\rangle$
if $x \neq 0$. If we index the rows and columns of a matrix by the elements of $\mathbb{F}_2^n$
(ordered as binary numbers), we can easily express $R_n$ as a $2^n \times 2^n$ matrix,
$$R_n = \begin{pmatrix} -1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}. \qquad (6.14)$$
But we require that the operations in a quantum circuit should be local, i.e.,
there should be a fixed upper bound on the number of qubits on which each gate
operates. Let us, therefore, pay some attention to how to decompose (6.14)
into local operations.
The decomposition can obviously be made by making use of the function
$f_0 : \mathbb{F}_2^n \to \mathbb{F}_2$, which is 1 at 0 and 0 everywhere else. As discussed in the
earlier section, a quantum circuit $F_n$ for this function can be constructed by
using $O(n)$ simple quantum gates (which operate on at most three qubits)
and some, say $k_n$, ancilla qubits. To summarize: we can obtain a quantum
circuit $F_n$ on $n + k_n$ qubits operating as
$$F_n |x\rangle |0\rangle |0\rangle = |x\rangle |f_0(x)\rangle |a_x\rangle. \qquad (6.15)$$

Using this, we can proceed as follows: we begin with
$$|x\rangle |1\rangle |0\rangle,$$
and then operate with $H_2$ on the target qubit to get
$$|x\rangle \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) |0\rangle.$$
After this, we call $F_n$ to get
$$(-1)^{f_0(x)} |x\rangle \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) |a_x\rangle.$$
Using first the reverse circuit $F_n^{-1}$ and then $H_2$ on the target qubit will give
us
$$(-1)^{f_0(x)} |x\rangle |1\rangle |0\rangle.$$
Composing all of this together, we get an operation
$$|x\rangle |1\rangle |0\rangle \mapsto (-1)^{f_0(x)} |x\rangle |1\rangle |0\rangle,$$
which is the required operation $R_n$ with some ancilla qubits. As usual, we
will not write down the ancilla explicitly.
The other tool needed for the Grover search is to encode the value $f(x)$ in
the sign, but this can be easily achieved by preparing the target bit in a
superposition:
$$Q_f |x\rangle \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) = (-1)^{f(x)} |x\rangle \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle). \qquad (6.16)$$
Instead of (6.16), we will introduce the notation
$$V_f |x\rangle = (-1)^{f(x)} |x\rangle$$
for the modified query operator, again omitting the ancilla qubit.

6.2.2 Amplitude Amplification

Grover's method for finding an element $y \in \mathbb{F}_2^n$ that satisfies $f(y) = 1$ (we
will call such a $y$ a solution hereafter) can be viewed as iterative amplitude
amplification. The basic element is the quantum operator
$$G_n = -H_n R_n H_n V_f \qquad (6.17)$$
working on $n$ qubits, which represent the elements $x \in \mathbb{F}_2^n$. In the above defini-
tion, $V_f$ is the modified query operator working as $V_f |x\rangle = (-1)^{f(x)} |x\rangle$, $H_n$
is the Hadamard transform, and $R_n$ is the operator which reverses the sign
of $|0\rangle$: $R_n |0\rangle = -|0\rangle$ and $R_n |x\rangle = |x\rangle$ for $x \neq 0$.
It is interesting to see that $H_n R_n H_n$ can be written in quite a simple
form as a $2^n \times 2^n$ matrix: if we let the elements $x \in \mathbb{F}_2^n$ index the rows and
the columns, we find that, since
$$H_n |x\rangle = \frac{1}{\sqrt{2^n}} \sum_{y \in \mathbb{F}_2^n} (-1)^{x \cdot y} |y\rangle,$$
the element of $H_n$ at location $(x, y)$ is $\frac{1}{\sqrt{2^n}} (-1)^{x \cdot y}$. Writing generally $A_{xy}$
for the element of a $2^n \times 2^n$ matrix $A$ at location $(x, y)$, we see (recall the
matrix representation of $R_n$ in the previous section) that
$$(H_n R_n H_n)_{xy} = \sum_{z, w \in \mathbb{F}_2^n} (H_n)_{xz} (R_n)_{zw} (H_n)_{wy}
= \frac{1}{2^n} \sum_{z \in \mathbb{F}_2^n} (-1)^{x \cdot z} (R_n)_{zz} (-1)^{z \cdot y}
= \frac{1}{2^n} \Big( -2 + \sum_{z \in \mathbb{F}_2^n} (-1)^{(x+y) \cdot z} \Big)
= \begin{cases} -\frac{2}{2^n}, & \text{if } x \neq y, \\ 1 - \frac{2}{2^n}, & \text{if } x = y. \end{cases}$$

Thus $H_n R_n H_n$ has the matrix representation
$$H_n R_n H_n = \begin{pmatrix} 1 - \frac{2}{2^n} & -\frac{2}{2^n} & \cdots & -\frac{2}{2^n} \\ -\frac{2}{2^n} & 1 - \frac{2}{2^n} & \cdots & -\frac{2}{2^n} \\ \vdots & & \ddots & \vdots \\ -\frac{2}{2^n} & -\frac{2}{2^n} & \cdots & 1 - \frac{2}{2^n} \end{pmatrix},$$
which can also be expressed as
$$H_n R_n H_n = I - 2P, \qquad (6.18)$$
where $I$ is the $2^n \times 2^n$ identity matrix and $P$ is a $2^n \times 2^n$ projection matrix whose
every entry is $\frac{1}{2^n}$. In fact, it is quite an easy task to verify that $P$ represents
a projection onto the one-dimensional subspace generated by the vector
$$\psi = \frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} |x\rangle.$$
Thus, using the notations of Chapter 8, we can write $P = |\psi\rangle\langle\psi|$, but this
notation is not crucial here. It is more essential that representation (6.18)
gives us an easy method for finding the effect of $-H_n R_n H_n$ on a general
superposition
$$\sum_{x \in \mathbb{F}_2^n} c_x |x\rangle. \qquad (6.19)$$

Writing
$$A = \frac{1}{2^n} \sum_{x \in \mathbb{F}_2^n} c_x$$
(the average of the amplitudes), we find that
$$\sum_{x \in \mathbb{F}_2^n} c_x |x\rangle = A \sum_{x \in \mathbb{F}_2^n} |x\rangle + \sum_{x \in \mathbb{F}_2^n} (c_x - A) |x\rangle \qquad (6.20)$$
is the decomposition of (6.19) into two orthogonal vectors: the first belongs to
the subspace spanned by $\psi$; the second belongs to the orthogonal complement
of that subspace. In fact, it is clear that the first summand of the right-hand
side of (6.20) belongs to the subspace generated by $\psi$, and the orthogonality
of the summands is easy to verify:
$$\Big( A \sum_{x \in \mathbb{F}_2^n} |x\rangle \;\Big|\; \sum_{y \in \mathbb{F}_2^n} (c_y - A) |y\rangle \Big)
= \sum_{x \in \mathbb{F}_2^n} \sum_{y \in \mathbb{F}_2^n} A^* (c_y - A) \langle x | y \rangle
= A^* \sum_{x \in \mathbb{F}_2^n} c_x - \sum_{x \in \mathbb{F}_2^n} A^* A
= A^* 2^n A - 2^n A^* A = 0.$$
Therefore,
$$P \sum_{x \in \mathbb{F}_2^n} c_x |x\rangle = A \sum_{x \in \mathbb{F}_2^n} |x\rangle,$$
and
$$-H_n R_n H_n \sum_{x \in \mathbb{F}_2^n} c_x |x\rangle = (2P - I) \sum_{x \in \mathbb{F}_2^n} c_x |x\rangle
= 2A \sum_{x \in \mathbb{F}_2^n} |x\rangle - \sum_{x \in \mathbb{F}_2^n} c_x |x\rangle
= \sum_{x \in \mathbb{F}_2^n} (2A - c_x) |x\rangle. \qquad (6.21)$$

Expression (6.21) explains why the operation $-H_n R_n H_n$ is also called inversion
about average: it operates on a single amplitude $c_x$ by multiplying it by $-1$
and adding two times the average.
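Identity (6.21) can be verified directly with matrices; a minimal numpy check on random amplitudes:

```python
# Checking identity (6.21): -H_n R_n H_n maps amplitude c_x to 2A - c_x,
# where A is the average amplitude.
import numpy as np

n = 3
N = 2 ** n
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
Hn = H
for _ in range(n - 1):
    Hn = np.kron(Hn, H)
Rn = np.eye(N)
Rn[0, 0] = -1

c = np.random.randn(N) + 1j * np.random.randn(N)   # arbitrary amplitudes
lhs = -(Hn @ Rn @ Hn) @ c
rhs = 2 * c.mean() - c                              # inversion about average
assert np.allclose(lhs, rhs)
```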
We use the following example to explain how $G_n = -H_n R_n H_n V_f$ is used
to amplify the desired amplitude in order to find the solution (recall that the
solution is an element $y \in \mathbb{F}_2^n$ which satisfies $f(y) = 1$). In this example, we
consider a function $f_5 : \mathbb{F}_2^n \to \mathbb{F}_2$, which takes the value 1 at $y = (0, \ldots, 0, 1, 0, 1)$
and is 0 everywhere else. The search begins with the superposition
$$\frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} |x\rangle$$
having uniform amplitudes. Figure 6.1 depicts this superposition, with $c_0 = c_1 = \cdots = c_{2^n - 1} = \frac{1}{\sqrt{2^n}}$.

Fig. 6.1. Amplitudes of the initial configuration.

The modified query operator $V_{f_5}$ flips the signs of all those amplitudes that
are coefficients of a vector $|x\rangle$ satisfying $f_5(x) = 1$. In this example,
$y = (0, \ldots, 0, 1, 0, 1)$ is the only one having this property, and $c_5$ becomes
$-\frac{1}{\sqrt{2^n}}$. This is depicted in Figure 6.2.

Fig. 6.2. Amplitudes after one query.

In the case of Figure 6.2, the average of the amplitudes is
$$A = \frac{2^n - 2}{2^n} \cdot \frac{1}{\sqrt{2^n}},$$
which is still very close to $\frac{1}{\sqrt{2^n}}$. Thus, the inversion-about-average operator
$-H_n R_n H_n$ will perform the transformation
$$\frac{1}{\sqrt{2^n}} \mapsto 2A - \frac{1}{\sqrt{2^n}} \approx \frac{1}{\sqrt{2^n}}, \qquad
-\frac{1}{\sqrt{2^n}} \mapsto 2A + \frac{1}{\sqrt{2^n}} \approx \frac{3}{\sqrt{2^n}},$$
which is illustrated in Figure 6.3.

Fig. 6.3. Amplitudes after $-H_n R_n H_n V_f$.

It is thus possible to find the required $y = (0, \ldots, 0, 1, 0, 1)$ with a probability
of approximately $\frac{9}{2^n}$ by just a single query. This is approximately 4.5 times
better than a classical randomized search can do.

6.2.3 Analysis of Amplification Method

In this section, we will find out the effect of the iterative use of the mapping $G_n = -H_n R_n H_n V_f$, but instead of a blackbox function $f : \mathbb{F}_2^n \to \mathbb{F}_2$ that admits
only one solution, we will study a general $f$ having $k$ solutions (recall that
here a solution means a vector $x \in \mathbb{F}_2^n$ such that $f(x) = 1$).
Let the notation $T \subseteq \mathbb{F}_2^n$ stand for the set of solutions, and $F = \mathbb{F}_2^n \setminus T$
for the set of non-solutions. Thus $|T| = k$ and $|F| = 2^n - k$. Assume that
after $r$ iteration steps, the state is
$$t_r \sum_{x \in T} |x\rangle + f_r \sum_{x \in F} |x\rangle. \qquad (6.22)$$

The modified query operator $V_f$ will then give us
$$-t_r \sum_{x \in T} |x\rangle + f_r \sum_{x \in F} |x\rangle, \qquad (6.23)$$
and the average of the amplitudes of (6.23) is
$$A = \frac{1}{2^n} \big( (2^n - k) f_r - k t_r \big).$$
The operator $-H_n R_n H_n$ will, therefore, transform (6.23) into
$$t_{r+1} \sum_{x \in T} |x\rangle + f_{r+1} \sum_{x \in F} |x\rangle,$$
where $t_{r+1} = 2A + t_r = (1 - \frac{2k}{2^n}) t_r + (2 - \frac{2k}{2^n}) f_r$ and $f_{r+1} = 2A - f_r = -\frac{2k}{2^n} t_r + (1 - \frac{2k}{2^n}) f_r$. Collecting all of this together, we find that $G_n = -H_n R_n H_n V_f$
operates on (6.22) as the transformation
$$\begin{pmatrix} t_{r+1} \\ f_{r+1} \end{pmatrix} = \begin{pmatrix} 1 - \frac{2k}{2^n} & 2 - \frac{2k}{2^n} \\ -\frac{2k}{2^n} & 1 - \frac{2k}{2^n} \end{pmatrix} \begin{pmatrix} t_r \\ f_r \end{pmatrix}. \qquad (6.24)$$

Therefore, to find out the effect of $G_n$ on the uniformly weighted initial super-
position
$$\frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} |x\rangle,$$
we have to solve (6.24) with the boundary conditions $t_0 = f_0 = \frac{1}{\sqrt{2^n}}$. This can be
done as follows: it is clear that $t_r$ and $f_r$ are real and, since $G_n$ is a unitary
operator, the equation
$$k t_r^2 + (2^n - k) f_r^2 = 1 \qquad (6.25)$$
must hold for each $r$. That is to say, each point $(t_r, f_r)$ lies on the ellipse
defined by equation (6.25). Therefore, we can write
$$t_r = \frac{1}{\sqrt{k}} \sin \theta_r, \qquad f_r = \frac{1}{\sqrt{2^n - k}} \cos \theta_r$$
for some number $\theta_r$. Recursion (6.24) can then be rewritten as
$$\begin{cases} \sin \theta_{r+1} = \big( 1 - \frac{2k}{2^n} \big) \sin \theta_r + \frac{2}{2^n} \sqrt{2^n - k} \sqrt{k} \cos \theta_r, \\ \cos \theta_{r+1} = -\frac{2}{2^n} \sqrt{2^n - k} \sqrt{k} \sin \theta_r + \big( 1 - \frac{2k}{2^n} \big) \cos \theta_r. \end{cases} \qquad (6.26)$$
Recall that $k$ is the number of elements in $\mathbb{F}_2^n$ that satisfy $f(y) = 1$; hence,
$1 - \frac{2k}{2^n} \in [-1, 1]$. Therefore, we can choose $\omega \in [0, \pi]$ such that $\cos \omega = 1 - \frac{2k}{2^n}$.
Then $\sin \omega = \frac{2}{2^n} \sqrt{2^n - k} \sqrt{k}$, and (6.26) becomes
$$\begin{cases} \sin \theta_{r+1} = \sin(\theta_r + \omega), \\ \cos \theta_{r+1} = \cos(\theta_r + \omega). \end{cases}$$
The boundary condition gives us that $\sin^2 \theta_0 = \frac{k}{2^n}$, and it is easy to verify
that the solution of the recursion is
$$t_r = \frac{1}{\sqrt{k}} \sin(r\omega + \theta_0), \qquad f_r = \frac{1}{\sqrt{2^n - k}} \cos(r\omega + \theta_0),$$
where $\theta_0 \in [0, \pi/2]$ and $\omega \in [0, \pi]$ are chosen in such a way that $\sin^2 \theta_0 = \frac{k}{2^n}$
and $\cos \omega = 1 - \frac{2k}{2^n}$. In fact, then $\cos \omega = 1 - 2\sin^2 \theta_0 = \cos 2\theta_0$, so we have
obtained the following lemma.
Lemma 6.2.1. The solution of (6.24) with boundary conditions $t_0 = f_0 = \frac{1}{\sqrt{2^n}}$
is
$$t_r = \frac{1}{\sqrt{k}} \sin((2r+1)\theta_0), \qquad f_r = \frac{1}{\sqrt{2^n - k}} \cos((2r+1)\theta_0), \qquad (6.27)$$
where $\theta_0 \in [0, \pi/2]$ is determined by $\sin^2 \theta_0 = \frac{k}{2^n}$.

where 00 E [0,11"/2] is determined by sin 2 00 = 2";.'

We would like to find a suitable value for r to maximize the probability for
finding a solution. Since there are k solutions, the probability of seeing one
solution is

(6.28)

We would then like to find a non-negative r as small as possible such that


sin 2«2r + 1)00) is as elose to 1 as possible; Le., the least positive integer r
such that (2r + 1)00 is as elose to ~ as possible.
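The closed form (6.27) is easily confirmed by iterating recursion (6.24) numerically; a sketch with illustrative $n$ and $k$:

```python
# Verifying the closed form (6.27): iterating G_n on the uniform
# superposition and comparing with sin((2r+1) theta_0).
import numpy as np

n, k = 8, 5
N = 2 ** n
theta0 = np.arcsin(np.sqrt(k / N))

t, f = 1 / np.sqrt(N), 1 / np.sqrt(N)
for r in range(1, 20):
    A = ((N - k) * f - k * t) / N          # average after the sign flip
    t, f = 2 * A + t, 2 * A - f            # one Grover iteration (6.24)
    assert np.isclose(k * t ** 2, np.sin((2 * r + 1) * theta0) ** 2)
print("success probability after r iterations matches sin^2((2r+1)theta_0)")
```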
We notice that
$$(2r+1)\theta_0 = \frac{\pi}{2} \iff r = -\frac{1}{2} + \frac{\pi}{4\theta_0},$$
and, since $\theta_0 \ge \sin \theta_0 = \sqrt{\frac{k}{2^n}}$, we have
$$r \le -\frac{1}{2} + \frac{\pi}{4} \sqrt{\frac{2^n}{k}},$$
and therefore after approximately
$$\frac{\pi}{4} \sqrt{\frac{2^n}{k}}$$
iterations, the probability of seeing a desired element $y \in \mathbb{F}_2^n$ is quite close
to 1. To be more precise:

Theorem 6.2.1. Let $f : \mathbb{F}_2^n \to \mathbb{F}_2$ be such that there are $k$ elements $x \in \mathbb{F}_2^n$
satisfying $f(x) = 1$. Assume that $0 < k \le \frac{3}{4} \cdot 2^n$,¹ and let $\theta_0 \in (0, \pi/3]$ be
chosen such that $\sin^2 \theta_0 = \frac{k}{2^n} \le \frac{3}{4}$. After $\lfloor \frac{\pi}{4\theta_0} \rfloor$ iterations of $G_n$ on the initial
superposition
$$\frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} |x\rangle,$$
the probability of seeing a solution is at least $\frac{1}{4}$.

Proof. The probability of seeing a desired element is given by $\sin^2((2r+1)\theta_0)$,
and we just saw that $r = -\frac{1}{2} + \frac{\pi}{4\theta_0}$ would give a probability of 1. Thus, we
only have to estimate the error when $-\frac{1}{2} + \frac{\pi}{4\theta_0}$ is replaced by $\lfloor \frac{\pi}{4\theta_0} \rfloor$.
Clearly
$$\Big\lfloor \frac{\pi}{4\theta_0} \Big\rfloor = -\frac{1}{2} + \frac{\pi}{4\theta_0} + \delta$$
for some $\delta$ with $|\delta| \le \frac{1}{2}$. Therefore,
$$\Big( 2\Big\lfloor \frac{\pi}{4\theta_0} \Big\rfloor + 1 \Big) \theta_0 = \frac{\pi}{2} + 2\delta\theta_0,$$
so the distance of $(2\lfloor \frac{\pi}{4\theta_0} \rfloor + 1)\theta_0$ from $\frac{\pi}{2}$ is $|2\delta\theta_0| \le \theta_0 \le \frac{\pi}{3}$. It follows that
$$\sin^2\Big( \Big( 2\Big\lfloor \frac{\pi}{4\theta_0} \Big\rfloor + 1 \Big)\theta_0 \Big) \ge \sin^2\Big( \frac{\pi}{2} - \frac{\pi}{3} \Big) = \frac{1}{4}. \qquad \Box$$

¹ If $k > \frac{3}{4} \cdot 2^n$, we can find a desired solution $x$ with a probability of at least $\frac{3}{4}$
just by guessing, and if $k = 0$, then $G_n$ does not alter the initial superposition
at all.

Grover's method for quantum search is summarized in the following algo-
rithm:

Grover's Search Algorithm
Input: A blackbox function $f : \mathbb{F}_2^n \to \mathbb{F}_2$ and $k = |\{x \in \mathbb{F}_2^n \mid f(x) = 1\}|$.
Output: An $y \in \mathbb{F}_2^n$ such that $f(y) = 1$ (a solution), if such an element
exists.
1. If $k > \frac{3}{4} \cdot 2^n$, choose $y \in \mathbb{F}_2^n$ with uniform probability and stop.
2. Otherwise compute $r = \lfloor \frac{\pi}{4\theta_0} \rfloor$, where $\theta_0 \in [0, \pi/3]$ is determined by
$\sin^2 \theta_0 = \frac{k}{2^n}$.
3. Prepare the initial superposition
$$\frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} |x\rangle$$
by using the Hadamard transform $H_n$.
4. Apply the operator $G_n$ $r$ times.
5. Observe to get some $y \in \mathbb{F}_2^n$.

If $k > \frac{3}{4} \cdot 2^n$, then the first step clearly produces a correct output with
a probability of at least $\frac{3}{4}$. Otherwise, according to Theorem 6.2.1, the
algorithm gives a solution with a probability of at least $\frac{1}{4}$. In each case, we
can find a solution with nonvanishing probability.
If $k = 1$ and $n$ is large, then $\lfloor \frac{\pi}{4\theta_0} \rfloor \approx \frac{\pi}{4} \sqrt{2^n}$, so using $O(\sqrt{2^n})$ queries to
$f$ we can find a solution with nonvanishing probability, which is essentially
better than any classical randomized algorithm can do.
The case $k = \frac{1}{4} \cdot 2^n$ is also very interesting. Then $\sin^2 \theta_0 = \frac{1}{4}$, so $\theta_0 = \frac{\pi}{6}$.
According to (6.28), the probability of seeing a solution after one single
iteration of $G_n$ is
$$\sin^2(3\theta_0) = \sin^2 \frac{\pi}{2} = 1.$$
Thus for $k = \frac{1}{4} \cdot 2^n$ we can find a solution with certainty using $G_n$ only once,
which is clearly impossible using any classical search strategy.
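For illustration, the complete algorithm (steps 2–5) can be simulated on a classical computer by tracking the $2^n$ amplitudes directly; the solution set below is an arbitrary toy choice:

```python
# A direct amplitude simulation of Grover's search algorithm (steps 2-5);
# n and the solution set are toy choices.
import numpy as np

n = 7
N = 2 ** n
solutions = {3, 77, 100}
k = len(solutions)

signs = np.array([-1.0 if x in solutions else 1.0 for x in range(N)])
theta0 = np.arcsin(np.sqrt(k / N))
r = int(np.pi / (4 * theta0))             # step 2

state = np.full(N, 1 / np.sqrt(N))        # step 3
for _ in range(r):                        # step 4: G_n = -H_n R_n H_n V_f
    state = signs * state                 # V_f flips the solution signs
    state = 2 * state.mean() - state      # inversion about average
print(f"r = {r}, P(solution) = {sum(state[x] ** 2 for x in solutions):.4f}")
```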
In a typical situation we, unfortunately, do not know the value of $k$ in
advance. In the next section, we present a simplified version of a method due
to M. Boyer, G. Brassard, P. Høyer, and A. Tapp [17] for finding the
required element even if $k$ is not known.

6.3 Utilizing Grover's Search Method


6.3.1 Searching with Unknown Number of Solutions
We begin with an elementary lemma, whose proof is left as an exercise.

Lemma 6.3.1. For any real $\alpha$ and any positive integer $m$,
$$\sum_{r=0}^{m-1} \cos((2r+1)\alpha) = \frac{\sin(2m\alpha)}{2\sin\alpha}.$$

The following important lemma can be found in [17].

Lemma 6.3.2. Let $f : \mathbb{F}_2^n \to \mathbb{F}_2$ be a blackbox function with $k \le \frac{3}{4} \cdot 2^n$ solutions
and $\theta_0 \in [0, \frac{\pi}{3}]$ defined by the equation $\sin^2 \theta_0 = \frac{k}{2^n}$. Let $m$ be any positive integer
and $r \in [0, m-1]$ chosen with uniform distribution. If $G_n$ is applied to the initial
superposition
$$\frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} |x\rangle$$
$r$ times, then the probability of seeing a solution is
$$P_m = \frac{1}{2} - \frac{\sin(4m\theta_0)}{4m\sin(2\theta_0)}.$$
Proof. In the previous section we saw that, after $r$ iterations of $G_n$, the
probability of seeing a solution is $\sin^2((2r+1)\theta_0)$. Thus, if $r \in [0, m-1]$ is
chosen uniformly, then the probability of seeing a solution is
$$P_m = \frac{1}{m} \sum_{r=0}^{m-1} \sin^2((2r+1)\theta_0)
= \frac{1}{2m} \sum_{r=0}^{m-1} \big( 1 - \cos((2r+1) \cdot 2\theta_0) \big)
= \frac{1}{2} - \frac{\sin(4m\theta_0)}{4m\sin(2\theta_0)},$$
according to Lemma 6.3.1. $\Box$
Remark 6.3.1. If $m \ge \frac{1}{\sin(2\theta_0)}$, then
$$\sin(4m\theta_0) \le 1 = \frac{1}{\sin(2\theta_0)} \sin(2\theta_0) \le m \sin(2\theta_0),$$
and therefore $\frac{\sin(4m\theta_0)}{4m\sin(2\theta_0)} \le \frac{1}{4}$. The above lemma implies, then, that $P_m \ge \frac{1}{4}$.
This means that if $m$ is large enough, then applying $G_n$ $r$ times, where $r \in [0, m-1]$ is chosen uniformly, will yield a solution with a probability
of at least $\frac{1}{4}$.
Assume now that the unknown number $k$ satisfies $0 < k \le \frac{3}{4} \cdot 2^n$. Then
$$\frac{1}{\sin(2\theta_0)} = \frac{1}{2\sin\theta_0\cos\theta_0} = \frac{2^n}{2\sqrt{k(2^n - k)}} \le \frac{\sqrt{2^n}}{\sqrt{k}} \le \sqrt{2^n},$$
so choosing $m \ge \sqrt{2^n}$ we certainly have $m \ge \frac{1}{\sin(2\theta_0)}$. This leads to the
following algorithm:

Quantum Search Algorithm
Input: A blackbox function $f : \mathbb{F}_2^n \to \mathbb{F}_2$.
Output: Any $y \in \mathbb{F}_2^n$ such that $f(y) = 1$ (a solution), if such an element
exists.
1. Pick an element $x \in \mathbb{F}_2^n$ randomly with uniform distribution. If $f(x) = 1$,
then output $x$ and stop.
2. Otherwise, let $m = \lfloor \sqrt{2^n} \rfloor + 1$ and choose an integer $r \in [0, m-1]$
uniformly.
3. Prepare the initial superposition
$$\frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} |x\rangle$$
by using the Hadamard transform $H_n$, and apply $G_n$ $r$ times.
4. Observe to get some $y \in \mathbb{F}_2^n$.
Using the previous lemmata, the correctness probability of this algorithm
is easy to analyze: let $k$ be the number of solutions of $f$. If $k > \frac{3}{4} \cdot 2^n$, then the
algorithm will output a solution after the first instruction with a probability
of at least $\frac{3}{4}$. On the other hand, if $k \le \frac{3}{4} \cdot 2^n$, then
$$m \ge \sqrt{2^n} \ge \frac{1}{\sin(2\theta_0)},$$
and, according to Remark 6.3.1, the probability of seeing a solution is at least
$\frac{1}{4}$ anyway.

Remark 6.3.2. Since the above algorithm is guaranteed to work with a prob-
ability of at least $\frac{1}{4}$, we can say that, on average, the solution can be found
after four attempts. In each attempt, the number of queries to $f$ is at most
$\sqrt{2^n}$.

A linear improvement of the above algorithm can be obtained by employing
the methods of [17]. In any case, the above algorithm results in the following
theorem.
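The behaviour promised by Lemma 6.3.2 and Remark 6.3.1 can be observed in simulation; in the sketch below, $k$ is hidden inside the solution set and never used by the algorithm:

```python
# Simulating the unknown-k search: with m > sqrt(2^n) and a uniformly
# random iteration count r, a solution is observed with probability
# at least 1/4 (Lemma 6.3.2 and Remark 6.3.1).
import numpy as np

rng = np.random.default_rng(1)
n = 8
N = 2 ** n
solutions = set(rng.choice(N, size=10, replace=False).tolist())
signs = np.array([-1.0 if x in solutions else 1.0 for x in range(N)])
m = int(np.sqrt(N)) + 1

hits, trials = 0, 2000
for _ in range(trials):
    r = rng.integers(m)                   # step 2: random iteration count
    state = np.full(N, 1 / np.sqrt(N))    # step 3: uniform superposition
    for _ in range(r):
        state = 2 * (signs * state).mean() - signs * state   # one G_n step
    probs = state ** 2
    y = rng.choice(N, p=probs / probs.sum())                 # step 4: observe
    hits += y in solutions
print(f"empirical success rate {hits / trials:.3f} (at least 0.25 expected)")
```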
Theorem 6.3.1. By using a quantum circuit making $O(\sqrt{2^n})$ queries to a
blackbox function $f$, one can decide with nonvanishing correctness probability
whether there is an element $x \in \mathbb{F}_2^n$ such that $f(x) = 1$.

In the next chapter, we will present a clever technique formulated by
R. Beals, H. Buhrman, R. Cleve, M. Mosca, and R. de Wolf [6], which will
allow us to demonstrate that the above algorithm is, in fact, optimal up to
a constant factor. Any quantum algorithm that can discover, with nonvan-
ishing probability, whether a solution to $f$ exists uses $\Omega(\sqrt{2^n})$ queries to $f$.
This result concerns blackbox functions, so it does not imply a complexity
lower bound for computable functions (see Remark 6.1.3).
On the other hand, the blackbox function $f$ can be replaced with a com-
putable function to get the following theorem.
Theorem 6.3.2. By using a quantum circuit, any problem in NP can be
solved with nonvanishing correctness probability in time $O(\sqrt{2^n}\, p(n))$, where
$p$ is a polynomial depending on the particular problem.
In the above theorem, the polynomial $p(n)$ is essentially the time that the nondeterministic
computation needs to solve the problem (see Section 3.1.2).
Grover's search algorithm has been cunningly used for several purposes.
To mention a few examples, see [19] for a quantum counting algorithm, and
[35] for a minimum finding algorithm. In [17] the authors outline a generalized
search algorithm for sets other than $\mathbb{F}_2^n$.

Remark 6.3.3. Recently, L. Grover and T. Rudolph have argued [42] that
many algorithms based on the "standard quantum search method" fail to be
nontrivial in the sense that the speedup provided could be obtained just as
well by dividing the search space into suitable parts, and then performing
the "standard search" in parallel.
7. Complexity Lower Bounds for Quantum Circuits

Unfortunately, the most interesting questions related to quantum complexity
theory, for example, "Is NP contained in BQP or not?", are extremely diffi-
cult and far beyond current knowledge. In this chapter, we will mention some
complexity-theoretical aspects of quantum computation which may somehow
illustrate these difficult questions.

7.1 General Idea

A general technique for deriving complexity-theoretical lower bounds for
quantum circuits using blackbox function queries was introduced by R. Beals,
H. Buhrman, R. Cleve, M. Mosca, and R. de Wolf in [6]. Before going into the
details, let us briefly discuss the idea informally. To simplify some notations,
we will sometimes interpret an element $x \in \mathbb{F}_2^n$ as a binary number in the interval
$[0, 2^n - 1]$.
Assume that a quantum circuit finds out if an arbitrary blackbox function
$$f : \mathbb{F}_2^n \to \mathbb{F}_2$$
has some property or not (for instance, in Section 6.1.2 we were inter-
ested in whether there was an element $y \in \mathbb{F}_2^n$ such that $f(y) = 1$).
This quantum circuit can then be seen as a device which is given the vector
$(f(0), f(1), \ldots, f(2^n - 1))$ (the values of $f$) as an input, and it gives output
"yes" or "no" according to whether $f$ has the property or not. In other words, the
quantum circuit can be seen as a device for computing a $\{0, 1\}$-valued
function on the input vector $(f(0), f(1), \ldots, f(2^n - 1))$ (unknown to us), which
is to say that the circuit is a device for computing a Boolean function on $2^n$
input variables $f(0), f(1), \ldots, f(2^n - 1)$.
Although a lot of things about Boolean functions are unknown to us, fortu-
nately we do know many things about them; see [64] for discussion. It turns
out that the polynomial representations of Boolean functions will give us the
desired information about the number of blackbox queries needed to find out
some property of $f$. The bound
will be expressed in terms of the polynomial representation degree for quantum
circuits that exactly (with a probability of 1) compute the property (Boolean
function), and in terms of the polynomial approximation degree for circuits that
have a high probability of finding that property. We will see that the num-
ber of queries to the blackbox function bounds from above the representation
degree of the Boolean function computable with a particular quantum circuit.
The lower bound for the number of necessary queries is thus established by
finding the representation degree of the Boolean function to be computed: the
quantum circuit must make enough queries to reach the representation de-
gree. Because the polynomial representations and approximations play such
a key role, we will devote the following two sections to studying them.

7.2 Polynomial Representations


7.2.1 Preliminaries
In this section, we present the basic facts about polynomial representations
of Boolean functions and symmetric polynomials. A reader who is already
familiar with these topics may skip this section.
Let $N = 2^n$. As discussed in Sections 4.1.1 and 9.2.2, the functions
$$F : \mathbb{F}_2^N \to \mathbb{C}$$
form a $2^N$-dimensional vector space $V$ over the complex numbers. The idea is
that Boolean functions on $N$ variables can be viewed as elements of $V$: they
are functions defined on $N$ binary variables taking values in the two-element
set $\{0, 1\}$, which can be embedded in $\mathbb{C}$.
For each subset $K = \{k_1, k_2, \ldots, k_l\} \subseteq \{0, 1, \ldots, N-1\}$, we consider a
function
$$X_K : \mathbb{F}_2^N \to \mathbb{C}, \qquad (7.1)$$
which we denote also by $X_K = X_{k_1} X_{k_2} \cdots X_{k_l}$ and define in the most natural
way: the value of the function $X_K$ on a vector $x = (x_0, x_1, \ldots, x_{N-1})$ is the
product $x_{k_1} x_{k_2} \cdots x_{k_l}$, which is interpreted as the number 0 or 1. This may
require some explanation: $x$ is an element of $\mathbb{F}_2^N$, and each of its components
$x_i$ is an element of the binary field $\mathbb{F}_2$. Thus, the product $x_{k_1} x_{k_2} \cdots x_{k_l}$
also belongs to $\mathbb{F}_2$, but embedding $\mathbb{F}_2$ in $\mathbb{C}$, we may interpret the product
as also belonging to $\mathbb{C}$. It is natural to call the functions $X_K$ monomials and to
denote $X_\emptyset = 1$ (an empty product having no factors at all is interpreted as
1). If $|K| = l$, the degree of a monomial $X_K = X_{k_1} X_{k_2} \cdots X_{k_l}$ is defined to be
$\deg X_K = l$. A linear combination of monomials $X_{K_1}, \ldots, X_{K_s}$ is naturally
denoted by
$$P = c_1 X_{K_1} + \cdots + c_s X_{K_s} \qquad (7.2)$$
and is called a polynomial. The degree $\deg P$ of polynomial (7.2) is defined
as the highest degree of a monomial occurring in (7.2), except in the case of
the zero polynomial $P = 0$, whose degree we symbolically define to be $-\infty$.

A reader may now wonder why we consider such a simple concept
as a polynomial so precisely. The problematic point is that, over a finite field,
the same function can be defined by many different polynomials. For example,
both elements of the binary field $\mathbb{F}_2$ satisfy the equation $x^2 = x$ and, thus, the non-
constant polynomial $x^2 - x + 1$ behaves like the constant polynomial 1.
We will, therefore, define the product of monomials $X_K$ and $X_L$ by consid-
ering $X_K$ and $X_L$ as functions $\mathbb{F}_2^N \to \mathbb{F}_2$. It is plain to see that this results
in the definition $X_K X_L = X_{K \cup L}$. The product of two polynomials is defined as
usual: if
$$P_1 = \sum_{i=1}^{s} c_i X_{K_i} \quad \text{and} \quad P_2 = \sum_{j=1}^{t} d_j X_{L_j}$$
are polynomials, we define
$$P_1 P_2 = \sum_{i=1}^{s} \sum_{j=1}^{t} c_i d_j X_{K_i} X_{L_j}.$$

The product differs slightly from the ordinary product of polynomials. Take,
for example, $K = \{0, 1\}$ and $L = \{1, 2\}$. Then
$$X_K X_L = X_0 X_1 X_1 X_2 = X_0 X_1 X_2 = X_{\{0,1,2\}},$$
but if $X_0 X_1$ and $X_1 X_2$ were seen as "ordinary polynomials" over $\mathbb{F}_2$, the
product would have been $X_0 X_1^2 X_2$. In fact, the functions behave just
like ordinary polynomials, except that all powers above 1 must be reduced to 1.¹

Lemma 7.2.1. The monomials form a basis of $V$.

Proof. For any $y = (y_0, y_1, \ldots, y_{N-1}) \in \mathbb{F}_2^N$, define a polynomial
$$P_y = \prod_{i:\, y_i = 1} X_i \prod_{i:\, y_i = 0} (1 - X_i).$$
By expanding the product, we get a representation of $P_y$ as a sum of
monomials, and for any $x = (x_0, x_1, \ldots, x_{N-1}) \in \mathbb{F}_2^N$,
$$P_y(x) = 1 \iff x_i = y_i \text{ for each } i \iff x = y,$$
while $P_y(x) = 0$ otherwise,

¹ For a reader who is aware of algebraic structures: the polynomial functions con-
sidered here form a ring isomorphic to $\mathbb{F}_2[X_0, X_1, \ldots, X_{N-1}]/I$, where $I$ is the
ideal generated by all polynomials $X_i^2 - X_i$.

which is to say that $P_y$ is the characteristic function of the singleton set $\{y\}$.
But because we can express each characteristic function $\mathbb{F}_2^N \to \mathbb{C}$ as a sum
of monomials, we can surely express any function $\mathbb{F}_2^N \to \mathbb{C}$ as a linear com-
bination of monomials (see Section 9.2.2). Therefore, the monomials generate $V$,
and since there are
$$\binom{N}{0} + \binom{N}{1} + \cdots + \binom{N}{N} = 2^N = \dim V$$
different monomials, they are also linearly independent. $\Box$
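The construction of the proof can be replayed in code: the sketch below expands $P_y$ into monomials for a small $N$ and an arbitrary $y$, and checks that it represents the characteristic function of $\{y\}$:

```python
# Replaying the proof of Lemma 7.2.1: expanding P_y into monomials for a
# small N and an arbitrary y, and checking that it is the characteristic
# function of {y}.
from itertools import combinations, product

N = 4
y = (1, 0, 1, 1)
ones = [i for i in range(N) if y[i] == 1]
zeros = [i for i in range(N) if y[i] == 0]

# P_y = prod_{y_i=1} X_i * prod_{y_i=0} (1 - X_i); expanding the second
# product over subsets S of 'zeros' gives coefficient (-1)^|S| for the
# monomial X_{ones u S}.
coeffs = {}
for r in range(len(zeros) + 1):
    for S in combinations(zeros, r):
        K = tuple(sorted(ones + list(S)))
        coeffs[K] = coeffs.get(K, 0) + (-1) ** len(S)

def P_y(x):
    return sum(c * all(x[i] for i in K) for K, c in coeffs.items())

assert all(P_y(x) == (x == y) for x in product((0, 1), repeat=N))
print("P_y is the characteristic function of {y};",
      len(coeffs), "monomials in its expansion")
```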


It follows from the above lemma that each function $\mathbb{F}_2^N \to \mathbb{C}$ can be
uniquely expressed as a linear combination of monomials (7.1) with complex
coefficients. Clearly, all Boolean functions with $N$ variables are included. In
fact, we could have derived such a representation for Boolean
functions also directly: Boolean variables can be represented by monomials
$X_0, \ldots, X_{N-1}$, and if $P_1$ and $P_2$ are polynomial representations of some
Boolean functions, then $\neg P_1$ can be represented as $1 - P_1$, $P_1 \wedge P_2$ as $P_1 P_2$,
and $P_1 \vee P_2$ as $1 - (1 - P_1)(1 - P_2)$. Since all Boolean functions can be
constructed by using $\neg$, $\wedge$, and $\vee$ (see Section 3.2.1), we have the following
lemma.
Lemma 7.2.2. Boolean functions on $N$ variables have a unique representa-
tion as a multivariate polynomial $P(X_0, X_1, \ldots, X_{N-1})$ having real coefficients
and degree at most $N$.
We must keep in mind that "polynomial" in the above lemma is interpreted
in such a way that $X^k = X$ for each $k \ge 1$.
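The closure rules above give a direct (if exponential) way to compute polynomial representations. In the sketch below, polynomials are dictionaries from monomials (sets of variable indices) to coefficients, so the reduction $X^2 = X$ is automatic; the majority function of three variables is an illustrative example:

```python
# Building polynomial representations of Boolean functions from the rules
# above: not P -> 1 - P, and -> product, or -> 1 - (1 - P)(1 - Q).
def pmul(P, Q):
    R = {}
    for K, c in P.items():
        for L, d in Q.items():
            M = K | L                       # X_K X_L = X_{K u L}
            R[M] = R.get(M, 0) + c * d
    return {K: c for K, c in R.items() if c}

def padd(P, Q, sign=1):
    R = dict(P)
    for K, c in Q.items():
        R[K] = R.get(K, 0) + sign * c
    return {K: c for K, c in R.items() if c}

ONE = {frozenset(): 1}
var = lambda i: {frozenset([i]): 1}
NOT = lambda P: padd(ONE, P, -1)
AND = pmul
OR = lambda P, Q: NOT(pmul(NOT(P), NOT(Q)))

# majority of three variables: (x0 & x1) | (x0 & x2) | (x1 & x2)
maj = OR(OR(AND(var(0), var(1)), AND(var(0), var(2))), AND(var(1), var(2)))
print(maj)   # X0X1 + X0X2 + X1X2 - 2*X0X1X2, degree 3
```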
In the continuation, we will mainly concentrate on symmetric Boolean
functions, which form a subclass of the symmetric functions $F : \mathbb{F}_2^N \to \mathbb{C}$.
Symmetric functions are formally defined as follows:
Definition 7.2.1. Let $x = (x_0, x_1, \ldots, x_{N-1}) \in \mathbb{F}_2^N$ be any vector and
$\pi : \{0, 1, \ldots, N-1\} \to \{0, 1, \ldots, N-1\}$ any permutation. Denote $\pi(x) = (x_{\pi(0)}, x_{\pi(1)}, \ldots, x_{\pi(N-1)})$. A function $F : \mathbb{F}_2^N \to \mathbb{C}$ is symmetric if for any
vector $x \in \mathbb{F}_2^N$ and for any permutation $\pi \in S_N$,
$$F(x) = F(\pi(x))$$
holds.
Clearly, any linear combination of symmetric functions is again symmetric;
thus, the symmetric functions form a subspace $W \subseteq V$. We will now find a basis
for $W$. Let $P$ be a nonzero polynomial representing a symmetric function.
Then $P$ can be represented as a sum of homogeneous polynomials²
$$P = Q_0 + Q_1 + \cdots + Q_N$$
such that $\deg Q_i = i$. Because $P$ itself was assumed to be symmetric, each $Q_i$
must also be a symmetric polynomial (otherwise, by permuting the variables,
we could get a different polynomial representing the same function as $P$ does,
which is impossible). Thus, we can write
$$Q_i = \sum_{\{k_1, k_2, \ldots, k_i\}} c_{k_1, k_2, \ldots, k_i} X_{k_1} X_{k_2} \cdots X_{k_i},$$
where the sum is taken over all subsets of $\{0, 1, \ldots, N-1\}$ having cardi-
nality $i$. Because $Q_i$ is invariant under variable permutations, and because the
representation as a polynomial is unique, we must have that $c_{k_1, k_2, \ldots, k_i} = c_i$ is
independent of the choice of $\{k_1, k_2, \ldots, k_i\}$. That is to say, the symmetric poly-
nomials
$$\begin{aligned} V_0 &= 1, \\ V_1 &= X_0 + X_1 + \cdots + X_{N-1}, \\ V_2 &= X_0 X_1 + X_0 X_2 + \cdots + X_{N-2} X_{N-1}, \\ &\;\;\vdots \\ V_N &= X_0 X_1 \cdots X_{N-1} \end{aligned}$$
generate $W$. Because they have different degrees, they are also linearly inde-
pendent.

² A homogeneous polynomial is a linear combination of monomials having the same
degree.
The following triviality is worth mentioning: the value of a symmetric
function $F(x)$ depends only on the Hamming weight of $x$, which is given by
$$\mathrm{wt}(x) = |\{i \mid x_i = 1\}|.$$
We can even quite easily find out the value of $V_i(x)$ when $\mathrm{wt}(x) = k$: consider
the polynomial
$$p(T) = (T - X_0)(T - X_1) \cdots (T - X_{N-1}).$$
Since $p$ is invariant under any permutation of $X_0, X_1, \ldots, X_{N-1}$, the coeffi-
cients of $p(T)$ are symmetric polynomials in $X_0, X_1, \ldots, X_{N-1}$. In fact, by
expanding $p$, we find out that the coefficient of $T^{N-i}$ is exactly $(-1)^i V_i$. On
the other hand, by substituting 1 for $k$ variables $X_j$ and 0 for the rest, $p(T)$
becomes
$$p(T) = (T - 1)^k\, T^{N-k},$$
which tells us that $V_i(x) = \binom{k}{i}$ if $i \le k$, and $V_i(x) = 0$ if $i > k$. This leads us
to an important observation which will be used later.

Lemma 7.2.3. If $P$ is a symmetric polynomial of $N$ variables having com-
plex (resp. real) coefficients and degree $d$, then there is a polynomial $P_w$ in
a single variable with complex (resp. real) coefficients such that
$$P_w(\mathrm{wt}(x)) = P(x)$$
for each $x \in \mathbb{F}_2^N$.
Proof. Because $P$ is symmetric, it can be represented as
$$P = c_0 V_0 + c_1 V_1 + \cdots + c_d V_d.$$
The claim follows from the fact that $V_i(x) = \binom{\mathrm{wt}(x)}{i}$ is a polynomial in $\mathrm{wt}(x)$
having degree $i$. $\Box$

7.2.2 Bounds for the Representation Degrees

There are many complexity measures for a Boolean function: the number of
Boolean gates $\neg$, $\wedge$, and $\vee$ needed to implement the function (Section 3.2.1)
and decision tree complexity, to mention a few. For our purposes, the ideal
complexity measure is the degree of the multivariate polynomial representing
the function. We follow the notations of [6] in the definition below.

Definition 7.2.2. Let $B : \mathbb{F}_2^N \to \{0, 1\}$ be a Boolean function on $N$ vari-
ables. The polynomial representation degree $\deg(B)$ of $B$ is the degree of the
unique multivariate polynomial representing $B$.

For future use, we will define polynomial approximations of a Boolean func-
tion. Since the interesting results about approximation pertain to polynomials
with real coefficients, we will also restrict ourselves to polynomials with real
coefficients.
Definition 7.2.3. A multivariate polynomial $P$ having real coefficients ap-
proximates a Boolean function $B : \mathbb{F}_2^N \to \{0, 1\}$ if
$$|B(x) - P(x)| \le \frac{1}{3}$$
for each $x \in \mathbb{F}_2^N$.
Clearly, there may be many polynomials which approximate a single Boolean
function $B$, but we are interested in the minimum degree of an approximating
polynomial. Therefore, following [6], we give the following definition.
Definition 7.2.4. The polynomial approximation degree of a Boolean func-
tion $B$ is
$$\widetilde{\deg}(B) = \min\{\deg(P) \mid P \text{ is a polynomial approximating } B\}.$$


It is a nontrivial task to find bounds for the polynomial representation
degree, but we can derive a general bound for symmetric Boolean functions
(for more bounds, see [6] and the references within). The following lemma
can be found in [6]:
Lemma 7.2.4. Let $B : \mathbb{F}_2^N \to \{0, 1\}$ be a non-constant symmetric Boolean
function. Then $\deg(B) \ge N/2 + 1$.

Proof. Let $P$ be the symmetric multivariate polynomial of degree $\deg(B)$ that rep-
resents $B$. By Lemma 7.2.3, there is a single-variate polynomial $P_w$ such that
$P_w(\mathrm{wt}(x)) = P(x)$ for each $x \in \mathbb{F}_2^N$. As a representation of a Boolean function
$B$, $P(x)$ is either 0 or 1, and since there are $N + 1$ possible Hamming weights
(from 0 to $N$) and $P$ is not constant, $P_w(\mathrm{wt}(x)) = 0$ for at least $N/2 + 1$
integers $\mathrm{wt}(x)$ or $P_w(\mathrm{wt}(x)) = 1$ for at least $N/2 + 1$ integers $\mathrm{wt}(x)$. In the former
case, we conclude that $\deg(P_w) \ge N/2 + 1$, and in the latter, we conclude
that $\deg(P_w) = \deg(P_w - 1) \ge N/2 + 1$. Altogether, we find that
$$\deg(B) = \deg(P) = \deg(P_w) \ge N/2 + 1. \qquad \Box$$
For some particular functions, we can, of course, get better results than
the previous lemma gives. Consider, for instance, the function $\mathrm{AND} : \mathbb{F}_2^N \to \{0, 1\}$ defined as $\mathrm{AND}(x) = 1$ if and only if $\mathrm{wt}(x) = N$. Clearly AND is a
symmetric function, and the corresponding single-variate polynomial $\mathrm{AND}_w$
representing AND has $N$ zeros $\{0, 1, \ldots, N-1\}$. Therefore, $\deg(\mathrm{AND}) = N$.
Using a similar argument, we also find that $\deg(\mathrm{OR}) = N$, where OR is the
function taking the value 0 if and only if $\mathrm{wt}(x) = 0$.
It appears much more difficult to find bounds for the degree of an ap-
proximating polynomial. R. Paturi has proved [67] the following theorem,
which characterizes the degree of a polynomial approximating symmetric
Boolean functions. To state Paturi's theorem, we fix some notations. Let
$B : \mathbb{F}_2^N \to \{0, 1\}$ be a symmetric Boolean function. Then $B$ can be repre-
sented as a single-variate polynomial $B_w(\mathrm{wt}(x)) = B(x)$ depending only on
the Hamming weight of $x$. Let
$$\Gamma(B) = \min\{ |2k - N + 1| \;:\; B_w(k) \neq B_w(k+1) \text{ and } 0 \le k \le N-1 \}.$$
Thus, $\Gamma(B)$ measures how close to weight $N/2$ the function $B_w$ changes value: if
$B_w(k) \neq B_w(k+1)$ for some $k$ approximately $N/2$, then $\Gamma(B)$ is low. Recall
that $B_w(k) \neq B_w(k+1)$ means that, if $\mathrm{wt}(x) = k$, then the value $B(x)$ changes
if we flip one more coordinate of $x$ to 1.
Example 7.2.1. Consider the function OR, whose only change occurs when the
weight of the argument increases from 0 to 1. Thus, $\Gamma(\mathrm{OR}) = N - 1$. Similarly,
we see that the only jump for the function AND occurs when the weight of $x$
increases from $N - 1$ to $N$. Therefore, we have that $\Gamma(\mathrm{AND}) = N - 1$.

Example 7.2.2. Let $\mathrm{PARITY} : \mathbb{F}_2^N \to \{0, 1\}$ be the function defined as
$\mathrm{PARITY}(x) = 1$ if $\mathrm{wt}(x)$ is divisible by 2 and 0 otherwise. The function PARITY
always changes its value when the Hamming weight of $x$ increases. For our
choice $N = 2^n$, $N$ is even, so $\Gamma(\mathrm{PARITY}) = 1$, but the theorem also holds if $N$
is odd, and then $\Gamma(\mathrm{PARITY}) = 0$.
The function $\mathrm{MAJORITY} : \mathbb{F}_2^N \to \{0, 1\}$ is defined to take the value 0 if
$\mathrm{wt}(x) < N/2$, and 1 if $\mathrm{wt}(x) \ge N/2$. For even $N$, the only jump occurs when
the weight of $x$ passes from $N/2 - 1$ to $N/2$, and for odd $N$ when $\mathrm{wt}(x)$
increases from $(N-1)/2$ to $(N+1)/2$; so, $\Gamma(\mathrm{MAJORITY}) = 1$ for even $N$
and 0 for odd $N$.

Theorem 7.2.1 (R. Paturi, [67]). Let $B : \mathbb{F}_2^N \to \{0, 1\}$ be a non-constant
symmetric Boolean function. Then
$$\widetilde{\deg}(B) = \Theta\big( \sqrt{N (N - \Gamma(B))} \big).$$

Earlier, we saw that $\deg(\mathrm{OR}) = \deg(\mathrm{AND}) = N$, which means that
an exact representation of OR and AND requires a polynomial of degree
$N$. On the other hand, by using the previous examples and Paturi's theorem,
we find that a polynomial approximating these functions may have an essen-
tially lower degree: $\widetilde{\deg}(\mathrm{OR}) = \Theta(\sqrt{N})$ and $\widetilde{\deg}(\mathrm{AND}) = \Theta(\sqrt{N})$. However,
$\widetilde{\deg}(\mathrm{PARITY}) = \Theta(N)$ and $\widetilde{\deg}(\mathrm{MAJORITY}) = \Theta(N)$, so even an approxi-
mating polynomial for these two latter functions has degree $\Omega(N)$.

7.3 Quantum Circuit Lower Bound

7.3.1 General Lower Bound

We are now ready to present the idea of [6], which connects quantum circuits computing Boolean functions with polynomials representing and approximating Boolean functions.
We consider blackbox functions f : F_2^n → F_2 and, as earlier, quantum blackbox queries are modelled by using a query operator Q_f operating as

Q_f |x⟩|b⟩ = |x⟩|b ⊕ f(x)⟩.

There may also be more qubits for other computational purposes but, without violating generality, we may assume that a blackbox query takes place on fixed qubits. Therefore, we can assume that a general state of a quantum circuit computing a property of f is a superposition of states of the form

|x⟩|b⟩|w⟩, (7.3)

where the first register has n qubits, which are used as the source bits of a blackbox query, where b ∈ F_2 is the target qubit of the query, and where w is a string of some number, say m, of qubits, which are needed for other computational purposes. There are 2^{n+m+1} different states (7.3), and we can expand the definition of Q_f to include all states (7.3) by operating on |w⟩ as the identity operator; we will denote the enlarged operator again by Q_f:

Q_f |x⟩|b⟩|w⟩ = |x⟩|b ⊕ f(x)⟩|w⟩.
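Restricted to the registers |x⟩|b⟩, the operator Q_f is just a permutation matrix and can be written down explicitly. The following numpy sketch is my own illustration (the helper query_operator is a hypothetical name, not the book's notation):

  import numpy as np

  def query_operator(f, n):
      # Q_f on n + 1 qubits: |x>|b> -> |x>|b XOR f(x)>
      dim = 2 ** (n + 1)
      Q = np.zeros((dim, dim))
      for x in range(2 ** n):
          for b in range(2):
              Q[2 * x + (b ^ f(x)), 2 * x + b] = 1.0
      return Q

  f = lambda x: int(x == 3)              # a blackbox with one solution
  Q = query_operator(f, n=2)
  assert np.allclose(Q @ Q, np.eye(8))   # querying twice is the identity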


We begin by finding the eigenvectors of Q_f. It is easy to verify that, if s is either 0 or 1, then

Q_f(|x⟩|0⟩|w⟩ + (-1)^s |x⟩|1⟩|w⟩)
= (-1)^{f(x)·s} (|x⟩|0⟩|w⟩ + (-1)^s |x⟩|1⟩|w⟩), (7.4)

which means that

|x⟩|0⟩|w⟩ + (-1)^s |x⟩|1⟩|w⟩ (7.5)

is an eigenvector of Q_f belonging to the eigenvalue (-1)^{f(x)·s}. On the other hand, vector (7.5) can be chosen in 2^{n+m+1} different ways, and it is simple to verify that these vectors form an orthogonal set, so (7.5) gives all the eigenvectors of Q_f. It is clear that if the quantum circuit uses the query operator Q_f, the state of the quantum circuit depends on the values of f, but equation (7.4) tells us the dependence in a more explicit manner: Q_f introduces a factor (-1)^{f(x)·s}, which has a polynomial representation 1 - 2sf(x). The next lemma from [6] expresses this dependency more precisely.
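The eigenvalue equation (7.4) can also be checked numerically; the sketch below (again my illustration) reuses query_operator from the previous sketch and ignores the |w⟩ register:

  import itertools
  import numpy as np

  f = lambda x: int(x == 3)
  n = 2
  Q = query_operator(f, n)               # from the previous sketch

  for x, s in itertools.product(range(2 ** n), (0, 1)):
      v = np.zeros(2 ** (n + 1))
      v[2 * x] = 1.0                     # |x>(|0> + (-1)^s |1>)
      v[2 * x + 1] = (-1.0) ** s
      assert np.allclose(Q @ v, (-1.0) ** (f(x) * s) * v)
  print('all eigenvalue checks (7.4) passed')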
Lemma 7.3.1. Let Q be a quantum circuit that makes T queries to a blackbox function f : F_2^n → F_2. Let N = 2^n and identify each binary string in F_2^n with the number it represents. Then the final state of the circuit is a superposition of states (7.3) whose coefficients are functions of the values X_0 = f(0), X_1 = f(1), ..., X_{N-1} = f(N - 1). These functions have a polynomial representation degree of at most T.

Proof. The computation of Q can be viewed as a sequence of unitary transformations

U_0, Q_f, U_1, Q_f, U_2, ..., U_{T-1}, Q_f, U_T

in a complex vector space spanned by all states (7.3). The operators U_i are fixed, but Q_f, of course, depends on the particular blackbox function f, so the coefficients are also functions of X_0, X_1, ..., X_{N-1}. At the beginning of the previous section, we learned that each such function has a unique polynomial representation. We prove the claim on the degree by induction. Before the first query, the state of the circuit is clearly a superposition of states (7.3) with constant coefficients. Assume, then, that after k queries and after applying the operator U_k, the circuit state is a superposition

Σ_{x∈F_2^n} Σ_{b∈F_2} Σ_{w∈F_2^m} P_(x,b,w) |x⟩|b⟩|w⟩, (7.6)
where each P_(x,b,w) is a polynomial in the variables X_0, X_1, ..., X_{N-1} having degree at most k. The (k + 1)th query will translate |x⟩|0⟩|w⟩ into |x⟩|1⟩|w⟩ and vice versa if X_x = f(x) = 1, but this translation will not happen if X_x = f(x) = 0 (again, we interpret x as the binary representation of an integer in [0, N - 1]). This is to say that a partial sum

P_(x,0,w) |x⟩|0⟩|w⟩ + P_(x,1,w) |x⟩|1⟩|w⟩ (7.7)

in (7.6) will transform into

(X_x P_(x,1,w) + (1 - X_x)P_(x,0,w)) |x⟩|0⟩|w⟩
+ (X_x P_(x,0,w) + (1 - X_x)P_(x,1,w)) |x⟩|1⟩|w⟩. (7.8)

Using the assumption that deg P_(x,b,w) ≤ k, (7.8) directly tells us that, after k + 1 queries to Q_f, the coefficients are polynomials of degree at most k + 1. The operator U_{k+1} produces a linear combination of these coefficient functions, which cannot increase the representation degrees. □
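The induction step can also be observed in a simulation. The following sympy sketch is my own illustration of Lemma 7.3.1 (the fixed unitaries U_i are replaced by one arbitrarily chosen orthogonal matrix): it propagates the symbolic blackbox values X_0, ..., X_{N-1} through the update (7.7)→(7.8) and prints the maximal representation degree of the coefficients, which never exceeds the number of queries made.

  import numpy as np
  from sympy import Matrix, expand, symbols, total_degree

  n = 2
  dim = 2 ** (n + 1)
  X = symbols(f'X0:{2 ** n}')

  def reduce_poly(p):
      # X_i takes values in {0, 1}, so X_i**2 = X_i in the representation
      p = expand(p)
      for xi in X:
          p = p.subs(xi ** 2, xi)
      return expand(p)

  def apply_query(state):
      out = list(state)
      for x in range(2 ** n):
          P0, P1 = state[2 * x], state[2 * x + 1]
          out[2 * x] = reduce_poly(X[x] * P1 + (1 - X[x]) * P0)       # (7.8)
          out[2 * x + 1] = reduce_poly(X[x] * P0 + (1 - X[x]) * P1)
      return out

  rng = np.random.default_rng(0)
  U, _ = np.linalg.qr(rng.normal(size=(dim, dim)))   # a fixed real unitary
  U = Matrix(U.tolist())

  state = [1] + [0] * (dim - 1)
  for T in range(1, 4):
      state = [reduce_poly(p) for p in (U * Matrix(apply_query(state)))]
      d = max(total_degree(p) if p.free_symbols else 0 for p in state)
      print(f'after {T} queries: max coefficient degree {d}')   # <= T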

Remark 7.3.1. The transformation from (7.7) to (7.8) caused by Q_f is, of course, exactly the same as (7.4), only expressed in a different basis. By utilizing basis (7.5), Farhi et al. [37] have, independently from [6], derived a lower bound for the number of quantum queries needed to approximate PARITY.

Remark 7.3.2. Notice that the proof of Lemma 7.3.1 does not utilize the fact that each U_i is unitary; only the linearity is needed. In fact, if we could use nonlinear operators as well, the reader may easily verify that we could have a much faster growth in the representation degree of the coefficients.

The following theorem from [6] finally connects quantum circuits and polynomial representations.

Theorem 7.3.1. Let N = 2^n, let f : F_2^n → F_2 be an arbitrary blackbox function, and let Q be a quantum circuit that computes a Boolean function B on the N variables X_0 = f(0), X_1 = f(1), ..., X_{N-1} = f(N - 1).
1. If Q computes B with a probability of 1, using T queries to f, then T ≥ deg(B)/2.
2. If Q computes B with a correctness probability of at least 2/3, using T queries to f, then T ≥ deg̃(B)/2.

Proof. Without loss of generality, we may assume that Q gives the value of B by setting the right-most qubit to B(X_0, X_1, ..., X_{N-1}), which is then observed. By Lemma 7.3.1, the final state of Q is a superposition

Σ_{x∈F_2^n} Σ_{b∈F_2} Σ_{w∈F_2^m} P_(x,b,w) |x⟩|b⟩|w⟩,

where each coefficient P_(x,b,w) is a function of X_0, ..., X_{N-1} which has a polynomial representation with a degree of at most T. The probability of seeing 1 on the right-most qubit is given by

P(X_0, ..., X_{N-1}) = Σ |P_(x,b,w)|^2, (7.9)

where the sum is taken over all those states where the right-most bit of w is 1. By considering separately the real and imaginary parts, we see that P(X_0, ..., X_{N-1}) can, in fact, be represented as a polynomial in the variables X_0, ..., X_{N-1} with real coefficients and degree at most 2T.
If Q computes B exactly, then (7.9) is 1 if and only if B is 1 and, therefore, P(X_0, ..., X_{N-1}) = B. Thus, 2T ≥ deg(P) = deg(B), and (1) follows. If Q computes B with a probability of at least 2/3, then (7.9) is at most 1/3 apart from 1 when B is 1 and, similarly, (7.9) is at most 1/3 apart from 0 when B is 0. This means that (7.9) is a polynomial approximating B, so 2T ≥ deg(P) ≥ deg̃(B), and (2) follows. □

7.3.2 Some Examples

We can apply the main theorem (Theorem 7.3.1) of the previous section by taking B = OR to arrive at the following theorem.

Theorem 7.3.2. If Q is a quantum algorithm that makes T queries to a blackbox function f and decides, with a correctness probability of at least 2/3, whether there is an element x ∈ F_2^n such that f(x) = 1, then T ≥ c√(2^n), where c is a constant.

Proof. To decide whether f(x) = 1 for some x ∈ F_2^n is to compute the function OR on the values f(0), f(1), ..., f(2^n - 1). According to Theorem 7.2.1, OR on N = 2^n variables has an approximation degree of Ω(√N), so, by Theorem 7.3.1, computing OR on N = 2^n variables (with a bounded probability of error) requires Ω(√N) = Ω(√(2^n)) queries to f. □

The above theorem shows that the method of using Grover's search algorithm to decide whether a solution for f exists is optimal up to a constant factor. Similar results can be derived for other functions whose representation degree is known.

Theorem 7.3.3. Deciding whether a blackbox function f : F_2^n → F_2 has an even number of solutions requires Ω(2^n) queries to f for any quantum circuit.

Proof. Take B = PARITY, apply Paturi's theorem (Theorem 7.2.1), and use reasoning similar to that in the above proof. □

Remark 7.3.3. In [6], the authors also derive results analogous to the previous ones for so-called Las Vegas quantum circuits, which must always give the correct answer but can sometimes be "ignorant", i.e., give the answer "I do not know", with a probability of at most 1/2. Moreover, in [6] it is shown that, if a quantum circuit computes a Boolean function B on variables X_0, X_1, ..., X_{N-1} using T queries to f with a nonvanishing error probability, then there is a classical deterministic algorithm that computes B exactly and uses O(T^6) queries to f. Moreover, if B is symmetric, then O(T^6) can even be replaced with O(T^2).

Remark 7.3.4. It can be shown that only a vanishing ratio of the Boolean functions on N variables have a representation degree lower than N, and that the same holds true for the approximation degree [2]. Thus, for almost any B, it requires Ω(N) queries to f to compute B on f(0), f(1), ..., f(N - 1), even with a bounded error probability.

Remark 7.3.5. With a little effort, we can utilize the results presented in this section for oracle Turing machine computation: for almost all oracles X, NP^X is not included in BQP^X. However, this does not imply that NP is not included in BQP, but it offers some reason to believe so. On the other hand, blackbox functions are the most "unstructured" examples that one can have; in [29] (see also [21]), W. van Dam gives an example of breaking the blackbox lower bound by replacing an arbitrary f with a more structured function.

Remark 7.3.6. The lower bound given by the polynomial degree is not tight: A. Ambainis has proved the existence of a Boolean function that has degree M but quantum query complexity Ω(M^{1.321...}). For the construction of the function and for the lower bound method using the quantum adversary technique, see [3].
8. Appendix A: Quantum Physics

8.1 A Brief History of Quantum Theory


We may say that quantum mechanics was born at the beginning of the twentieth century, when experiments on atoms, molecules, and radiation physics could not be explained by classical physics. As an example, we mention the quantum hypothesis of Max Planck in 1900¹ in connection with the radiation of a black body.
A black body is a hypothetical object that is able to emit and absorb electromagnetic radiation of all possible wavelengths. Although we cannot manufacture ideal black bodies, for experimental purposes, a small hole in an enclosure is a good approximation of a black body. Theoretical considerations revealed that the radiation intensity at different frequencies² should be the same for all black bodies. In the nineteenth century, there were two contradictory theories which described how the curve depicting this intensity should look: the first one was Wien's radiation law; the second one was the Rayleigh-Jeans law of radiation. Both of them failed to predict the observed frequency spectrum, i.e., the distribution describing how the intensity of the radiation depends on the frequency.
The spectrum of the radiation became understandable when Planck published his radiation law [88]. An especially remarkable feature of Planck's work was that his law of radiation was derived under the assumption that electromagnetic radiation is emitted in discrete quanta whose energy E is proportional to the frequency ν:

E = hν. (8.1)

The constant h was eventually uncovered by studying the experimental data. This famous constant is called Planck's constant, and its value is approximately given by

h = 6.62608 · 10^{-34} Js. (8.2)

¹ M. Planck introduced his ideas in a lecture at the German Physical Society on December 14, 1900. As a general reference to early works on quantum physics, we mention [88].
² We could just as well talk about the wavelength λ instead of the frequency ν; these quantities are related by the formula νλ = c, where c is the vacuum speed of light.

Planck's quantum hypothesis contains a somewhat disturbing idea: since the days of Galileo and Newton, there had been a debate in the scientific community on whether light should be regarded as a particle flow or as waves. This discussion seemed to cease in the nineteenth century, mainly for the following reasons. First, at the beginning of the nineteenth century, Young demonstrated with his famous two-slit experiment that light inherently has an undulatory nature. The second reason was that research at the end of the nineteenth century showed that light is, in fact, electromagnetic radiation, which is better described as waves. The research mentioned was mainly carried out by Maxwell and Hertz.
At least implicitly, Planck again called upon the old question about the nature of light: the explanation of black body radiation, which finally agreed with observations to within the measurement precision, was derived under the hypothesis that light is made of discrete quanta.
Later, in 1905, Albert Einstein explained the photoelectric effect, basing the explanation on Planck's quantum hypothesis [88]. The photoelectric effect means that negatively charged metal loses its electrical charge when it is exposed to electromagnetic radiation of a certain frequency. It was not understood why the photoelectric effect so fundamentally depended on the frequency of the radiation and not on the intensity. In fact, reducing the intensity will slow down the effect, but cutting down the frequency will completely stop the effect. Einstein pointed out that, under Planck's hypothesis (8.1), this dependency on frequency becomes quite natural: the radiation should be regarded as a stream of particles whose energy is proportional to the frequency. It should then be quite clear that increasing the frequency of the radiation will also increase the impact of the radiation quanta on the electrons. When the impact becomes large enough, the electrons fly off and the negative charge disappears.³ Einstein also introduced the notion of the photon, a light quantum.⁴
Planck's radiation law and Einstein's explanation of the photoelectric effect opened the door for a novel idea: electromagnetic radiation, which was traditionally treated as a wave movement, possesses some characteristic features of a particle flow.
The experimental energy spectrum of the hydrogen atom was explained by Niels Bohr [88] up to the measurement precision of 1912. The model concerning hydrogen atoms introduced by Bohr provided some evidence that electrons, which had been considered as particles, may have wave-like characteristics as well. More evidence of the wave-like characteristics of electrons came to light in the interference experiment carried out by C. J. Davisson and L. H. Germer in 1927.
³ It should be emphasized here that the explanation given by Einstein is represented only as a historical remark. According to modern quantum physics, the photoelectric effect can also be explained by assuming the undulatory nature of light and the quantum mechanical nature of the matter.
⁴ In 1921, Albert Einstein was given the Nobel prize for the explanation of the photoelectric effect. The theory of relativity was not mentioned as a reason for the Nobel prize.
Inspired by the previous results, Louis de Broglie introduced a general hypothesis in 1924: particles may also be described as waves⁵ whose wavelength λ can be uncovered when the momentum p is known:

λ = h/p, (8.3)

where h is Planck's constant (8.2).
Other famous physicists who have greatly influenced the development of quantum physics in the form we know it today are W. Heisenberg, M. Born, P. Jordan, W. Pauli, and P. A. M. Dirac.

8.2 Mathematical Framework for Quantum Theory


In this section, we shall first introduce the formalism of quantum mechanics in a basic form based on state vectors. Later, we will also study a more general formalism based on self-adjoint operators, and we will see that the state vector formalism can be produced as a restriction of the more general formalism. The advantage of the state vector formalism is that it is mathematically simpler than the more general one.
In connection with quantum computation, we are primarily interested in representing a finite set⁶ by using a quantum mechanical system. Therefore, we will make another significant mathematical simplification by assuming that all the quantum systems handled in this chapter are finite-dimensional, unless explicitly stated otherwise.
In the introductory chapter, we stated that we can describe a probabilistic system (with a finite phase space) by using a probability distribution

p_1[x_1] + ... + p_n[x_n] (8.4)

over all the configurations x_i. In the above mixed state, p_1 + ... + p_n = 1, and the system can be seen in state x_i with a probability of p_i. A quantum mechanical counterpart of this system is called an n-level quantum system. To describe such a system, we choose a basis |x_1⟩, ..., |x_n⟩ of an n-dimensional Hilbert space H_n, and a general state of an n-level quantum system is described by a vector

a_1 |x_1⟩ + ... + a_n |x_n⟩, (8.5)

⁵ "There are not two worlds, one of light and waves, one of matter and corpuscles. There is only a single universe. Some of its properties can be accounted for by wave theory, and others by the corpuscular theory." (A citation from the lecture given by Louis de Broglie at the Nobel prize award ceremony in 1929.)
⁶ In formal language theory, we also call a finite set representing information an alphabet.

where each a_i is a complex number called the amplitude of x_i, and |a_1|^2 + ... + |a_n|^2 = 1. The basis |x_1⟩, ..., |x_n⟩ refers to an observable that can have some values, but for a moment we will ignore the values and say that the system may have properties x_1, ..., x_n (with respect to the basis chosen). The probability that the system is seen to have property x_i is |a_i|^2. Representation (8.5) has some properties that (8.4) does not have. For example, the time evolution of (8.4) sends each basis vector again into a combination of all the basis vectors with non-negative coefficients that sum up to 1. On the other hand, (8.5) may evolve in a very different way, as the basis states can also cancel each other.
Example 8.2.1. Consider a quantum physical description of a two-state system having states 0 and 1. Assume also that the time evolution of the system (during a fixed time interval) is given by

|0⟩ ↦ (1/√2)|0⟩ + (1/√2)|1⟩,
|1⟩ ↦ (1/√2)|0⟩ - (1/√2)|1⟩.

If the system starts in state |0⟩ or |1⟩ and undergoes the time evolution, the probability of observing 0 or 1 will be 1/2 in both cases. On the other hand, if the system starts in state |0⟩ and undergoes the time evolution twice, the state will be

(1/√2)((1/√2)|0⟩ + (1/√2)|1⟩) + (1/√2)((1/√2)|0⟩ - (1/√2)|1⟩)
= (1/2)|0⟩ + (1/2)|1⟩ + (1/2)|0⟩ - (1/2)|1⟩ = |0⟩,

and the probability of observing 0 becomes 1 again. The effect that the amplitudes of |1⟩ cancel each other is called destructive interference, and the effect that the coefficients of |0⟩ amplify each other is called constructive interference. Destructive interference cannot occur in the evolution of the probability distribution (8.4), since all the coefficients are always non-negative real numbers. A probabilistic counterpart to the quantum time evolution would be

[0] ↦ (1/2)[0] + (1/2)[1],
[1] ↦ (1/2)[0] + (1/2)[1],

but the double time evolution beginning with state [0] would give the state (1/2)[0] + (1/2)[1] as the outcome.
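In matrix form, the time evolution above is the Hadamard matrix, and both the interference phenomenon and its probabilistic counterpart fit into a few numpy lines (my illustration):

  import numpy as np

  H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # quantum evolution
  S = np.array([[0.5, 0.5], [0.5, 0.5]])         # probabilistic counterpart
  ket0 = np.array([1.0, 0.0])

  print(H @ H @ ket0)   # [1. 0.]   -- destructive interference restores |0>
  print(S @ S @ ket0)   # [0.5 0.5] -- the distribution stays mixed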

Amplitude distribution (8.5) can naturally be interpreted as a unit-length vector of a Hilbert space. Therefore, the following sections are devoted to the study of Hilbert spaces (see also Section 9.3).

8.2.1 Hilbert Spaces

A finite-dimensional Hilbert space H is a complete (completeness will be defined later) vector space over the complex numbers which is equipped with an inner product H × H → ℂ, (x, y) ↦ ⟨x | y⟩. All n-dimensional Hilbert spaces are isomorphic, and we can, therefore, denote any such space by H_n. An inner product is required to satisfy the following axioms for all x, y, z ∈ H and c_1, c_2 ∈ ℂ:
1. ⟨x | y⟩ = ⟨y | x⟩*.
2. ⟨x | x⟩ ≥ 0, and ⟨x | x⟩ = 0 if and only if x = 0.
3. ⟨x | c_1 y + c_2 z⟩ = c_1 ⟨x | y⟩ + c_2 ⟨x | z⟩.
If E = {e_1, ..., e_n} is an orthonormal basis of H, each vector x ∈ H can be represented as x = x_1 e_1 + ... + x_n e_n. With a fixed basis E, we also use the coordinate representation x = (x_1, ..., x_n). Using this representation, one can easily see that an orthonormal basis induces an inner product by

⟨x | y⟩ = x_1* y_1 + ... + x_n* y_n. (8.6)

On the other hand, each inner product is induced by some orthonormal basis as in (8.6). For, if ⟨· | ·⟩ stands for an arbitrary inner product, we may use the Gram-Schmidt process (see Section 9.3) to find a basis {b_1, ..., b_n} orthonormal with respect to ⟨· | ·⟩. Then

⟨x | y⟩ = ⟨x_1 b_1 + ... + x_n b_n | y_1 b_1 + ... + y_n b_n⟩ = x_1* y_1 + ... + x_n* y_n.
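As a concrete companion (my illustration, assuming numpy), here is a minimal Gram-Schmidt sketch; note that np.vdot conjugates its first argument, matching the convention of axioms 1 and 3:

  import numpy as np

  def gram_schmidt(vectors):
      basis = []
      for v in vectors:
          w = v.astype(complex)
          for b in basis:
              w = w - np.vdot(b, w) * b     # subtract the projection on b
          basis.append(w / np.linalg.norm(w))
      return np.array(basis)

  B = gram_schmidt(np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1]]))
  print(np.round(B @ B.conj().T, 10))       # the identity: rows orthonormal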
The inner product induces a vector norm in a very natural way:

‖x‖ = √⟨x | x⟩.

Informally speaking, the completeness of the vector space H means that there are enough vectors in H, i.e., at least one for each limit process:

Definition 8.2.1. A vector space H is complete if, for each vector sequence (x_i) such that

lim_{m,n→∞} ‖x_m - x_n‖ = 0,

there exists a vector x ∈ H such that

lim_{n→∞} ‖x_n - x‖ = 0.

An important geometric property of Hilbert spaces is that each subspace W ⊆ H which is also a Hilbert space⁷ has an orthogonal complement. The proofs of the following lemmata are left as exercises.
⁷ In finite-dimensional Hilbert spaces, all the subspaces are also Hilbert spaces.

Lemma 8.2.1. Let W be a subspace of H. Then the set of vectors

W^⊥ = {y ∈ H | ⟨y | x⟩ = 0 whenever x ∈ W}

is also a subspace of H, which is called the orthogonal complement of W.

Lemma 8.2.2. If W is a subspace of a finite-dimensional Hilbert space H, then H = W ⊕ W^⊥.

8.2.2 Operators

The modern description of quantum mechanics is profoundly based on linear mappings. In the following sections, we present the features of linear mappings that are most essential for quantum mechanics. Because we concentrate mainly on finite-level quantum systems, the vector spaces treated hereafter will be assumed to have finite dimension, unless explicitly stated otherwise.
Let us begin with some terminology: a linear mapping H → H is called an operator. The set of operators on H is denoted by L(H). For an operator T, we define the norm of the operator by

‖T‖ = sup_{‖x‖=1} ‖Tx‖.

A nonzero vector x ∈ H is an eigenvector of T belonging to the eigenvalue λ ∈ ℂ if Tx = λx.
The set of operators also forms a vector space if the sum and scalar multiplication are defined in the natural way: let B, T ∈ L(H) and a ∈ ℂ. Then B + T and aB are operators in L(H) defined as

(B + T)x = Bx + Tx

and

(aB)x = a(Bx)

for each x ∈ H.
Definition 8.2.2. For any operator T : H → H, the adjoint operator T* is defined by the requirement

⟨x | Ty⟩ = ⟨T*x | y⟩

for all x, y ∈ H.

Remark 8.2.1. With a fixed basis {e_1, ..., e_n} of H_n, any operator T can be represented as an n × n matrix over the field of complex numbers. It is not difficult to see that the matrix representing the adjoint operator T* is the transposed complex conjugate of the matrix representing T.
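This remark is easy to test numerically; the following numpy lines (my illustration) verify ⟨x | Ty⟩ = ⟨T*x | y⟩ when T* is taken to be the conjugate transpose:

  import numpy as np

  rng = np.random.default_rng(1)
  T = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
  x = rng.normal(size=3) + 1j * rng.normal(size=3)
  y = rng.normal(size=3) + 1j * rng.normal(size=3)

  T_adj = T.conj().T
  print(np.isclose(np.vdot(x, T @ y), np.vdot(T_adj @ x, y)))   # True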

If {x_1, ..., x_n} and {y_1, ..., y_n} are orthonormal bases of H_n, then it can be shown that

Σ_{i=1}^n ⟨x_i | Tx_i⟩ = Σ_{i=1}^n ⟨y_i | Ty_i⟩; (8.7)

see Exercise 1.

Definition 8.2.3. For an orthonormal basis {x_1, ..., x_n}, the quantity

Tr(T) = Σ_{i=1}^n ⟨x_i | Tx_i⟩

is called the trace of the operator T.

By (8.7), the notion of the trace is well-defined. Moreover, it is clear that the trace is linear. Notice also that, in the matrix representation of T, the trace is the sum of the diagonal elements.

Definition 8.2.4. An operator T is self-adjoint if T* = T. An operator T is unitary if T* = T^{-1}.

The following simple lemmata will be used frequently.

Lemma 8.2.3. A self-adjoint operator has real eigenvalues.

Proof. If Ax = λx, then

λ*⟨x | x⟩ = ⟨λx | x⟩ = ⟨Ax | x⟩ = ⟨x | Ax⟩ = λ⟨x | x⟩.

Since x ≠ 0 as an eigenvector, it follows that λ* = λ. □

Lemma 8.2.4. The eigenvectors of a self-adjoint operator belonging to distinct eigenvalues are orthogonal.

Proof. Assume that λ ≠ λ', Ax = λx, and Ax' = λ'x'. Since λ and λ' are real by the previous lemma,

λ'⟨x' | x⟩ = ⟨Ax' | x⟩ = ⟨x' | Ax⟩ = λ⟨x' | x⟩,

and therefore ⟨x' | x⟩ = 0. □

Definition 8.2.5. A self-adjoint operator T is positive if ⟨x | Tx⟩ ≥ 0 for each x ∈ H.

Remark 8.2.2. A partial order on the set of operators can be introduced by defining T ≥ S if and only if T - S is positive.

If W is a subspace of H and H = W ⊕ W^⊥, then each vector x ∈ H can be uniquely represented as x = x_W + x_{W^⊥}, where x_W ∈ W and x_{W^⊥} ∈ W^⊥. It is quite easy to verify that the mapping P_W defined by P_W(x_W + x_{W^⊥}) = x_W is a self-adjoint linear mapping, called the projection onto the subspace W. Clearly, P_W^2 = P_W. On the other hand, it can be shown that each self-adjoint P such that P^2 = P is a projection onto some subspace of H (Exercise 2). The set of projections in L(H) is denoted by P(H). Notice that a projection can never increase the norm:

‖P_W x‖ = ‖x_W‖ ≤ √(‖x_W‖^2 + ‖x_{W^⊥}‖^2) = ‖x_W + x_{W^⊥}‖ = ‖x‖.

The second-last equality, which is called Pythagoras' theorem, follows from the fact that x_W and x_{W^⊥} are orthogonal. Projections play an important role in the theory of self-adjoint operators, and so we will study them in more detail in the next section.
Since unitary operators also play a great role in quantum theory, we will study some of their properties.

Lemma 8.2.5. A unitary operator U : H → H preserves the inner products, that is, ⟨Ux | Uy⟩ = ⟨x | y⟩.

Proof. This follows directly from the definition of the adjoint operator and unitarity:

⟨Ux | Uy⟩ = ⟨U*Ux | y⟩ = ⟨U^{-1}Ux | y⟩ = ⟨x | y⟩. □

Corollary 8.2.1. Unitary operators preserve norms, i.e., ‖Ux‖ = ‖x‖.

A statement converse to the previous lemma also holds:

Lemma 8.2.6. Operators U : H → H that preserve inner products are unitary.

Proof. Assume that ⟨Ux | Uy⟩ = ⟨x | y⟩ for each x, y ∈ H. Then, especially, ‖Ux‖ = ‖x‖ for each x ∈ H, which implies that U is injective. Since H was assumed finite-dimensional, U is surjective, too. Therefore U^{-1} exists, and

⟨x | U^{-1}y⟩ = ⟨Ux | UU^{-1}y⟩ = ⟨Ux | y⟩,

which means that U* = U^{-1}, i.e., U is unitary. □

Lemma 8.2.7. Operators U : H → H that preserve norms are unitary.

Proof. The polarization equation

⟨x | y⟩ = (1/4) Σ_{k=0}^{3} i^k ⟨y + i^k x | y + i^k x⟩ (8.8)

is easy to verify by straightforward calculation (Exercise 4). From (8.8), it follows directly that operators which preserve norms also preserve inner products. Now, the claim follows by Lemma 8.2.6. □

We can even strengthen Lemma 8.2.6 a little bit:

Lemma 8.2.8. Let {x_1, ..., x_k} and {y_1, ..., y_k} be two sets of vectors in H. If ⟨x_i | x_j⟩ = ⟨y_i | y_j⟩ for each i and j, then there is a unitary mapping U : H → H such that y_i = Ux_i.

Proof. Let W be the subspace of H generated by the vectors x_1, ..., x_k. There exists a subset of {x_1, ..., x_k} which forms a basis of W. Without loss of generality, we may assume that this subset is {x_1, ..., x_k'} for some k' ≤ k. Now we define a mapping U : W → H by Ux_i = y_i for each i ∈ {1, ..., k'} and extend this into a linear mapping in the only possible way.
We will first show that y_1, ..., y_k' is a basis of Im(U). Clearly, those vectors generate Im(U), so it remains to show that they are linearly independent. For that purpose, we assume that

a_1 y_1 + ... + a_k' y_k' = 0 (8.9)

for some coefficients a_1, ..., a_k'. For any i ∈ {1, ..., k'}, we compute the inner product of (8.9) with y_i, thus getting

a_1 ⟨y_i | y_1⟩ + ... + a_k' ⟨y_i | y_k'⟩ = 0. (8.10)

On the other hand, by the assumption, (8.10) can be written as

a_1 ⟨x_i | x_1⟩ + ... + a_k' ⟨x_i | x_k'⟩ = 0. (8.11)

Equation (8.11) states that

⟨x_i | a_1 x_1 + ... + a_k' x_k'⟩ = 0

for each i ∈ {1, ..., k'}, which implies that the vector

a_1 x_1 + ... + a_k' x_k' (8.12)

is orthogonal to all the vectors in the space W. Since vector (8.12) itself belongs to W, (8.12) must be the zero vector. Therefore, a_1 = ... = a_k' = 0, and it follows that y_1, ..., y_k' forms a basis of Im(U).
It follows directly that U : W → H is injective: the equation

U(a_1 x_1 + ... + a_k' x_k') = 0

means that

a_1 y_1 + ... + a_k' y_k' = 0,

and this implies that a_1 = ... = a_k' = 0.
Now, having a bijection U : W → Im(U), we extend it to the whole space H as follows: Let {z_1, ..., z_r} be an orthonormal basis of W^⊥ and {z_1', ..., z_r'} an orthonormal basis of Im(U)^⊥.

Z~} an orthonormal basis of Im(W)-L. Defining the extension of U so that


U(Zi) = z~ for each i E {I, ... , r}, mapping U : H -+ H obtained in this way
is clearly bijective. To prove that U is unitary, by Lemma 8.2.7, it remains to
show that U preserves the inner products. For this purpose, it is sufficient (by
linearity) to show that U preserves all the inner products of the basis vectors
{Xl,'" ,Xk',ZI,··· ,Zr}. This is quite easy: (UXi I UXj) = (Yi I Yj) = (Xi I
Xj) for each i,j E {l, ... ,k' }. Second, (UXi I Uz j ) = (Yi I zD = 0 = (Xi I
Zi), and finally, (UZi I Uz j ) = (z~ I zj) = bij = (Zi I Zj).
Finally, we have to show that Yi = UXi for each i E {I, ... , k}. If i ~ k ' ,
there is nothing to prove, so we may assume that i > k'o Now that Xl, ... ,
Xk' form a basis of W, each vector Xi E W can be uniquely written as

It follows directly that

(Xj I Xi) = Cl(i) (Xj I Xl) + ... + Ck(i)' (Xj I Xk')' (8.13)

By the assumption, (8.13) can be rewritten as


(i) (i)
(Yj I Yi) = cl (Yj I YI) + ... + Ck' (Yj I Yk')'
or as

(8.14)

But 8.14 says that


0)
(Yj I Yi - (Cl YI + ... + Ck0)' Yk'») = 0
for each Yj. Now we can conclude as earlier: vector Yi - (cii)YI + ... + C~i)Yk')
is orthogonal to the subspace generated by vectors YI, ... , Yk. Because it
belongs to that space, we must conclude that it is the zero vector. Therefore,

8.2.3 Spectral Representation of Self-Adjoint Operators

For any vectors x, y ∈ H_n, we define a linear mapping |x⟩⟨y| : H_n → H_n by

|x⟩⟨y| z = ⟨y | z⟩ x.

It is plain to see that, if ‖x‖ = 1, then |x⟩⟨x| is the projection onto the one-dimensional subspace generated by x.

Remark 8.2.3. If A and B ∈ L(H_n), then

|Ax⟩⟨By| z = ⟨By | z⟩ Ax = A ⟨y | B*z⟩ x = A |x⟩⟨y| B*z.

Since this holds for each z ∈ H_n, we have that |Ax⟩⟨By| = A |x⟩⟨y| B*.

Remark 8.2.4. The adjoint operator of |x⟩⟨y| can be easily found: (|x⟩⟨y|)* = |y⟩⟨x|.

Remark 8.2.5. If {x_1, ..., x_n} is an orthonormal basis of H_n, then the matrix representation of the mapping |x_i⟩⟨x_j| is simply the matrix having 0s elsewhere but 1 at the intersection of the ith row and the jth column. It follows directly that the mappings |x_i⟩⟨x_j| generate the whole space L(H_n). In fact, if T ∈ L(H_n), then

T = Σ_{r=1}^n Σ_{s=1}^n ⟨x_r | Tx_s⟩ |x_r⟩⟨x_s| (8.15)

gives the matrix representation of T in the basis {x_1, ..., x_n}. It is easy to see that the mappings |x_i⟩⟨x_j| are linearly independent, and therefore the dimension of L(H_n) is n^2.

To introduce the spectral decomposition of a self-adjoint operator T, we present the following well-known result:

Theorem 8.2.1. Let T : H_n → H_n be a self-adjoint operator. There exists an orthonormal basis of H_n that consists of eigenvectors of T.

Proof. The proof is by induction on n = dim H_n. A one-dimensional H_1 is generated by some single vector e_1 of unit length. Clearly Te_1 = λe_1 for some λ ∈ ℂ, so the claim is obvious. If n > 1, let λ be an eigenvalue of T, and let W be the eigenspace of λ, that is, the subspace of H_n generated by the eigenvectors belonging to λ. Any linear combination of the eigenvectors belonging to λ is again an eigenvector belonging to λ, which implies that W is closed under T. But also the orthogonal complement W^⊥ is closed under T: in fact, take any x ∈ W and y ∈ W^⊥. Then,

⟨x | Ty⟩ = ⟨Tx | y⟩ = λ*⟨x | y⟩ = 0,

which means that Ty ∈ W^⊥ as well. Therefore, we may study the restrictions T : W → W and T : W^⊥ → W^⊥ and apply the induction hypothesis to find orthonormal bases for W and W^⊥ consisting of eigenvectors of T. Since H_n = W ⊕ W^⊥, the union of these bases satisfies the claim. □

We can now introduce the spectral representation, a powerful tool in the study of self-adjoint mappings. Let x_1, ..., x_n be orthonormal eigenvectors of a self-adjoint operator T : H_n → H_n, and let λ_1, ..., λ_n be the corresponding eigenvalues. The numbers λ_i are real by Lemma 8.2.3, but they are not necessarily distinct. The set of the eigenvalues is called the spectrum, and it can be easily verified (Exercise 3) that

T = λ_1 |x_1⟩⟨x_1| + ... + λ_n |x_n⟩⟨x_n|. (8.16)

Decomposition (8.16) is called a spectral representation of T. If T has multiple eigenvalues, we say that T is degenerate; otherwise, T is nondegenerate. The spectral representation (8.16) of a nondegenerate T is unique, as is easily verified. Notice that we do not claim that the vectors x_i are unique, only that the projections |x_i⟩⟨x_i| are. If T is degenerate, we can collect the multiple eigenvalues in (8.16) as common factors to obtain a representation

T = λ_1' P_1 + ... + λ_n'' P_n', (8.17)

where P_1, ..., P_n' are the projections onto the eigenspaces of λ_1', ..., λ_n''. It is easy to see that the spectral representation (8.17) is unique.⁸
Recall that all the eigenvectors belonging to distinct eigenvalues are orthogonal, which implies that, in representation (8.17), all the projections are projections onto mutually orthogonal subspaces. Therefore, P_i P_j = 0 whenever i ≠ j. It follows that, if p is a polynomial, then

p(T) = p(λ_1')P_1 + ... + p(λ_n'')P_n'. (8.18)

We also generalize (8.18): if f : ℝ → ℂ is any function and (8.17) is the spectral representation of T, we define

f(T) = f(λ_1')P_1 + ... + f(λ_n'')P_n'. (8.19)

⁸ Some authors call only the unique representation (8.17) a spectral representation, not representation (8.16).

Example 8.2.2. The identity operator I ∈ L(H_n) is defined as Ix = x for each x ∈ H_n. The operator I is trivially self-adjoint, and its only eigenvalue is 1. Moreover, the eigenspace of I is clearly the whole space H_n. To find a spectral representation of I, it therefore suffices to fix any orthonormal basis {x_1, ..., x_n}, and then we know that

I = |x_1⟩⟨x_1| + ... + |x_n⟩⟨x_n|. (8.20)

Equation (8.20) could, of course, also be verified straightforwardly.

Example 8.2.3. The matrix

M = ( 0 1 )
    ( 1 0 )

defines a self-adjoint mapping in H_2, as is easily verified. Matrix M is nondegenerate, having 1 and -1 as eigenvalues. The corresponding orthonormal eigenvectors can be chosen, for example, as x_1 = (1/√2)(1, 1)^T and x_{-1} = (1/√2)(1, -1)^T. The matrix representations of both projections |x_{±1}⟩⟨x_{±1}| can be easily uncovered; they are

( 1/2 1/2 )       (  1/2 -1/2 )
( 1/2 1/2 )  and  ( -1/2  1/2 ),

respectively. Thus, the spectral representation of M is given by

( 0 1 )        ( 1/2 1/2 )         (  1/2 -1/2 )
( 1 0 )  = 1 · ( 1/2 1/2 )  - 1 ·  ( -1/2  1/2 ).

Now we can find the square root of M:

√M = √1 · ( 1/2 1/2 )  + √(-1) · (  1/2 -1/2 )
          ( 1/2 1/2 )            ( -1/2  1/2 ).

Choosing √1 = 1 and √(-1) = i, we obtain the matrix of Example 2.1.1. By fixing the square roots above in all possible ways, we get four different matrices X that satisfy the condition X^2 = M.
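Definition (8.19) translates directly into a few lines of numpy (my illustration; np.linalg.eigh is the standard eigendecomposition for self-adjoint matrices): build the spectral projections of M and assemble a square root from them.

  import numpy as np

  M = np.array([[0.0, 1.0], [1.0, 0.0]])
  vals, vecs = np.linalg.eigh(M)           # eigenvalues -1 and 1

  sqrtM = np.zeros((2, 2), dtype=complex)
  for lam, v in zip(vals, vecs.T):
      P = np.outer(v, v.conj())            # the projection |v><v|
      sqrtM += np.sqrt(complex(lam)) * P   # f(T) = sum of f(lambda_i) P_i

  print(np.round(sqrtM @ sqrtM, 10))       # recovers M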

8.2.4 Spectral Representation of Unitary Operators

Let us study the function e^{iT}, defined for a self-adjoint operator T given by the spectral representation

T = θ_1 P_1 + ... + θ_m P_m.

By definition,

e^{iT} = e^{iθ_1} P_1 + ... + e^{iθ_m} P_m,

and we notice that

e^{iT} (e^{iT})* = e^{iθ_1} e^{-iθ_1} P_1 + ... + e^{iθ_m} e^{-iθ_m} P_m = P_1 + ... + P_m = I,

which is to say that e^{iT} is unitary. We will now show that each unitary mapping can be represented as e^{iT}, where T is a self-adjoint operator. To do this, we first derive an auxiliary result which also has independent interest.

Lemma 8.2.9. Self-adjoint operators A and B commute if and only if A and B share an orthonormal eigenvector basis.
Proof. If {a_1, ..., a_n} is a set of orthonormal eigenvectors of both A and B, then, by using the corresponding eigenvalues, we can write

A = λ_1 |a_1⟩⟨a_1| + ... + λ_n |a_n⟩⟨a_n|

and

B = μ_1 |a_1⟩⟨a_1| + ... + μ_n |a_n⟩⟨a_n|.

Hence,

AB = λ_1 μ_1 |a_1⟩⟨a_1| + ... + λ_n μ_n |a_n⟩⟨a_n| = BA.

Assume, then, that AB = BA. Let λ_1, ..., λ_h be all the distinct eigenvalues of A, and let a_1^(k), ..., a_{m_k}^(k) be orthonormal eigenvectors belonging to λ_k. For any a_i^(k), we have

ABa_i^(k) = BAa_i^(k) = Bλ_k a_i^(k) = λ_k Ba_i^(k),

i.e., Ba_i^(k) is also an eigenvector of A belonging to the eigenvalue λ_k. Therefore, the eigenspace of λ_k is closed under B. We denote this subspace by W_k. As a restriction of a self-adjoint operator, clearly B : W_k → W_k is also self-adjoint. Therefore, B has m_k eigenvectors b_1^(k), ..., b_{m_k}^(k) that form an orthonormal basis of W_k. By finding such a basis for each W_k, we obtain an orthonormal eigenvector system such that
• each b_i^(k) is an eigenvector of B and A (for A, b_i^(k) is an eigenvector belonging to λ_k);
• if i ≠ j, then b_i^(k) and b_j^(k) are orthogonal, since they belong to an orthonormal basis generating W_k;
• if k ≠ k', then b_i^(k) and b_j^(k') are orthogonal, since they are eigenvectors of A belonging to the distinct eigenvalues λ_k and λ_k'.
Thus, the system which is obtained is an orthonormal basis of H_n but also a set of eigenvectors of both A and B. □

Now, let U be a unitary operator. We notice, first, that the eigenvalues of U have absolute value 1. To verify this, let x be an eigenvector belonging to the eigenvalue λ, i.e., Ux = λx. Then,

⟨x | x⟩ = ⟨x | U*Ux⟩ = ⟨Ux | Ux⟩ = ⟨λx | λx⟩ = |λ|^2 ⟨x | x⟩,

and it follows that |λ| = 1.
We decompose U into "real and imaginary parts" by writing U = A + iB, where A = (1/2)(U + U*) and B = (1/(2i))(U - U*). Note that A and B are now self-adjoint, commuting operators. According to the previous lemma, we have spectral representations

A = λ_1 |a_1⟩⟨a_1| + ... + λ_n |a_n⟩⟨a_n|

and

B = μ_1 |a_1⟩⟨a_1| + ... + μ_n |a_n⟩⟨a_n|

with respect to a shared orthonormal eigenvector basis {a_1, ..., a_n}.

Since the eigenvalues of U are of absolute value 1, it follows that the eigenvalues of A and B, i.e., the numbers λ_i and μ_i, have absolute values of at most 1. But since A and B are self-adjoint, the numbers λ_i and μ_i are also real. Thus, U can be written as

U = (λ_1 + iμ_1) |a_1⟩⟨a_1| + ... + (λ_n + iμ_n) |a_n⟩⟨a_n|,

where λ_j and μ_j are real numbers in the interval [-1, 1]. Because the eigenvalues of U have absolute value 1, we must have λ_j^2 + μ_j^2 = 1. Thus, there exists a unique θ_j ∈ [0, 2π) for each j such that λ_j = cos θ_j and μ_j = sin θ_j. It follows that λ_j + iμ_j = e^{iθ_j}, and U can be expressed as U = e^{iH}, where

H = θ_1 |a_1⟩⟨a_1| + ... + θ_n |a_n⟩⟨a_n|.

The representation

U = e^{iθ_1} |a_1⟩⟨a_1| + ... + e^{iθ_n} |a_n⟩⟨a_n|

is called the spectral representation of the unitary operator U. The operator H (or -H) is sometimes called the Hamilton operator which induces U. Notice also that the way we derived the spectral representation of a unitary operator gives us, as a byproduct, the knowledge that a unitary matrix has eigenvectors that form an orthonormal basis of H_n.
The spectral representation of a unitary operator could, of course, have been derived directly, without using the decomposition U = A + iB. Anyway, the spectral representation is useful when decomposing a unitary matrix into a product of simple unitary matrices. Notice also that if T_1 and T_2 commute, then clearly e^{i(T_1+T_2)} = e^{iT_1} e^{iT_2}.
In the following examples, we will illustrate how the unitary matrices and
the Hamiltonians that induce them are related.

Example 8.2.4. The Hadamard-Walsh matrix

W_2 = (1/√2) ( 1  1 )
             ( 1 -1 )

is unitary but also self-adjoint. Thus, the decomposition W_2 = A + iB is just W_2 itself. Matrix W_2 has two eigenvalues, -1 and 1, and the corresponding orthonormal eigenvectors can be chosen as

x_{-1} = (1/√(4 - 2√2)) (1 - √2, 1)^T and x_1 = (1/√(4 + 2√2)) (1 + √2, 1)^T.

The corresponding projections can be expressed as

|x_{-1}⟩⟨x_{-1}| = (1/4) ( 2 - √2   -√2   )
                         (  -√2    2 + √2 )

and

|x_1⟩⟨x_1| = (1/4) ( 2 + √2    √2   )
                   (   √2    2 - √2 ).

The spectral representation of the self-adjoint mapping W_2 is thus given by

W_2 = 1 · |x_1⟩⟨x_1| + (-1) · |x_{-1}⟩⟨x_{-1}|,

which can also be written as W_2 = e^{i·0} |x_1⟩⟨x_1| + e^{iπ} |x_{-1}⟩⟨x_{-1}|, so W_2 = e^{iT}, where

T = π |x_{-1}⟩⟨x_{-1}| = (π/4) ( 2 - √2   -√2   )
                               (  -√2    2 + √2 ).
Example 8.2.5. Let θ be a real number and let

R_θ = ( cos θ  -sin θ )
      ( sin θ   cos θ )

be the rotation matrix. Clearly, R_θ is unitary. Now that we know that unitary matrices also have eigenvectors forming an orthonormal basis, we can find them directly, without seeking a decomposition R_θ = A + iB. It is an easy task to verify that the eigenvalues of R_θ are e^{±iθ}, and the corresponding eigenvectors can be chosen as x_+ = (1/√2)(i, 1)^T and x_- = (1/√2)(-i, 1)^T. The corresponding projections are given by

|x_+⟩⟨x_+| = (1/2) ( 1  i )       |x_-⟩⟨x_-| = (1/2) ( 1 -i )
                   ( -i 1 )  and                     ( i  1 ),

so R_θ = e^{iH_θ}, where

H_θ = θ |x_+⟩⟨x_+| - θ |x_-⟩⟨x_-| = (  0   iθ )
                                    ( -iθ   0 ).
Example 8.2.6. Let α and β be real numbers. The phase shift matrix

P_{α,β} = ( e^{iα}   0    )
          (   0    e^{iβ} )

is unitary, as is easily verified (notice that these matrices are closely related to phase flips; see Example 2.1.4). The spectral decomposition is now trivial:

P_{α,β} = e^{iα} ( 1 0 ) + e^{iβ} ( 0 0 )
                 ( 0 0 )          ( 0 1 ),

so P_{α,β} = e^{iH_{α,β}}, where

H_{α,β} = ( α 0 )
          ( 0 β ).

8.3 Quantum States as Hilbert Space Vectors


Let us return to the study of finite-level quantum systems. The mathematical description of such a system is based on an n-dimensional Hilbert space H_n. We choose an orthonormal basis {|x_1⟩, ..., |x_n⟩} and call the vectors |x_i⟩ basis states. A state of the system is a unit-length vector in H_n. States other than the basis ones are called superpositions of the basis states. Two states |x⟩ and |y⟩ are equivalent if |x⟩ = e^{iθ} |y⟩ for some real θ. Hereafter, we will regard equivalent states as equal.
A general state

a_1 |x_1⟩ + ... + a_n |x_n⟩ (8.21)

determines a probability distribution: if the system in state (8.21) is observed, the system will be seen in a basis state |x_i⟩ with a probability of |a_i|^2. The coefficients a_i are called amplitudes. We will study the observations in more detail in Section 8.3.2.
For now, we present a mathematical description of compound quantum systems: suppose we have n- and m-state distinguishable systems⁹ with bases {|x_1⟩, ..., |x_n⟩} of H_n and {|y_1⟩, ..., |y_m⟩} of H_m, respectively. The compound system which is made of these subsystems is described by the tensor product H_n ⊗ H_m ≅ H_nm. The basis states of H_n ⊗ H_m are

|x_i⟩ ⊗ |y_j⟩,

where i ∈ {1, ..., n} and j ∈ {1, ..., m}. We also use the notations |x_i⟩ ⊗ |y_j⟩ = |x_i⟩|y_j⟩ = |x_i, y_j⟩. A general state |z⟩ of the compound system is a unit-length vector in H_nm. We say that a state |z⟩ is decomposable if

|z⟩ = |x⟩|y⟩

for some states |x⟩ ∈ H_n and |y⟩ ∈ H_m. A state that is not decomposable is entangled. The inner product in the space H_n ⊗ H_m is defined by

⟨x_1 ⊗ y_1 | x_2 ⊗ y_2⟩ = ⟨x_1 | x_2⟩⟨y_1 | y_2⟩.
Example 8.3.1. A two-level quantum system is called a qubit. The orthonormal basis vectors chosen for H_2 are usually denoted by |0⟩ and |1⟩ and are identified with the logical 0 and 1. A compound system of m qubits is called a quantum register of length m and is described by a Hilbert space of dimension 2^m with the basis

{|x_1⟩ ⊗ ... ⊗ |x_m⟩ | x_i ∈ {0, 1}}.

We can identify the binary sequence x_1, ..., x_m with the number x_1·2^{m-1} + x_2·2^{m-2} + ... + x_{m-1}·2 + x_m. Using this identification, the basis of H_{2^m} can be expressed as

{|0⟩, |1⟩, ..., |2^m - 1⟩}.

⁹ The requirement that the systems are distinguishable is essential here. A compound system of two identical subsystems has a different description.
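Compound states are built with the Kronecker product and, for two qubits, decomposability can be tested by reshaping the amplitudes into a 2 × 2 matrix and computing its rank (the Schmidt rank). The following numpy sketch is my own illustration, not part of the book's formalism:

  import numpy as np

  ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

  product = np.kron((ket0 + ket1) / np.sqrt(2), ket0)              # |x>|y>
  bell = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)

  for z in (product, bell):
      rank = np.linalg.matrix_rank(z.reshape(2, 2))
      print('entangled' if rank > 1 else 'decomposable')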

8.3.1 Quantum Time Evolution


Our next task is to describe how quantum systems change in time. For that purpose, we will assume, as is usually done, that there is some function U_t : H_n → H_n which depends on the time t and describes the time evolution of the system.¹⁰ In other words, by denoting the state of the system at time t by x(t),

x(t) = U_t x(0).

We will now list some requirements that are usually put on the time-evolution mapping U_t. The very first requirement put on U_t is that it should map unit-length states to unit-length states; that is, it should preserve the norm:
1. For each t ∈ ℝ and each x ∈ H_n, ‖U_t x‖ = ‖x‖.
Notice that the above requirement is quite mathematical: it just states that, if the coefficients of x determine a probability distribution as in (8.21), then the coefficients of U_t x should do so as well. The second requirement, on the other hand, is that the superposition (8.21) should strongly resemble the probability distribution, so strongly that each U_t operates on (8.21) in such a way that each basis state evolves independently:
2. For each t, U_t(a_1 |x_1⟩ + ... + a_n |x_n⟩) = a_1 U_t |x_1⟩ + ... + a_n U_t |x_n⟩. This means that each U_t is linear.
We could seriously argue about the above requirement,¹¹ but that is not the purpose of the present book. The next requirement is that U_t can always be decomposed:
3. For all t_1 and t_2 ∈ ℝ, U_{t_1+t_2} = U_{t_1} U_{t_2}.
Finally, we require that the time evolution must be smooth:
4. For each t_0 ∈ ℝ, lim_{t→t_0} U_t x(0) = lim_{t→t_0} x(t) = x(t_0).
From the above requirements, we can derive the following characteristic result.

Lemma 8.3.1. A time-evolution mapping U_t satisfying 1-3 is a unitary operator.

Proof. Condition 2 states that U_t is an operator, and from condition 1 it follows directly that each U_t is injective. Requirement 3 implies that U_0 = U_0^2. But U_0 |x⟩ = U_0^2 |x⟩ implies, by the injectivity, that |x⟩ = U_0 |x⟩ for any |x⟩, which is to say that U_0 is the identity mapping. By again using requirement 3, we see that |x⟩ = U_t U_{-t} |x⟩, i.e., each U_t is also surjective. According to requirement 1, each U_t preserves the norm, and by Lemma 8.2.7 it follows that each U_t is unitary. □

¹⁰ This assumption is referred to as the causality principle.
¹¹ If a state (8.21) is not just a generalization of a probability distribution, why should it evolve like one?

Condition 4 is still unused. Its place is found after the following theorem by M. Stone; see [65].

Theorem 8.3.1. If each U_t satisfies 1-4, then there is a unique self-adjoint operator H such that

U_t = e^{-itH}.

Recall that the exponential function of a self-adjoint operator A can be defined by using the spectral representation (8.19), or even as the series

e^A = I + A + (1/2!)A^2 + (1/3!)A^3 + ....

It is clear that the definitions coincide in a finite-dimensional H_n. By Stone's theorem, the time evolution can be expressed as

x(t) = e^{-itH} x(0),

from which we get

(d/dt) x(t) = -iHe^{-itH} x(0) = -iH x(t)

by component-wise differentiation. This can also be written as

i (d/dt) x(t) = H x(t). (8.22)

Differential equation (8.22) is known as the abstract Schrödinger equation, and the operator H is called the Hamilton operator of the quantum system. In a typical situation, the Hamilton operator (which represents the total energy of the system) is known, and the Schrödinger equation is used to determine the energy eigenstates, which form an example of the basis states of a quantum system.

Remark 8.3.1. The time evolution described by the operators U_t is continuous but, in connection with computational aspects, we are interested in the system state at the discrete time points t_1, t_2, t_3, .... Therefore, we will regard the quantum system evolution as a sequence of unit-length vectors x, U_1x, U_2U_1x, U_3U_2U_1x, ..., where each U_i is unitary.
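The discrete picture is easy to experiment with; a small scipy sketch (my illustration, with an arbitrarily chosen self-adjoint H) checks requirements 1 and 3 for U_t = e^{-itH}:

  import numpy as np
  from scipy.linalg import expm

  H = np.array([[1.0, 0.5], [0.5, -1.0]])   # any self-adjoint H will do

  def U(t):
      return expm(-1j * t * H)

  x0 = np.array([1.0, 0.0])
  print(np.isclose(np.linalg.norm(U(0.7) @ x0), 1.0))   # requirement 1
  print(np.allclose(U(0.3) @ U(0.4), U(0.7)))           # requirement 3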

8.3.2 Observables
We again study a general state

a_1 |x_1⟩ + ... + a_n |x_n⟩. (8.23)

The basis states were introduced in connection with the distinguishable properties which we are interested in, and a general state, i.e., a superposition of basis states, induces a probability distribution on the basis states: a basis state x_i is observed with a probability of |a_i|^2. This will be generalized as follows:

Definition 8.3.1. An observable E = {E_1, ..., E_m} is a collection of mutually orthogonal subspaces of H_n such that

H_n = E_1 ⊕ ... ⊕ E_m.

In the above definition, the inequality m ≤ n must, of course, hold. We equip the subspaces E_i with distinct real number "labels" θ_1, ..., θ_m. For each vector x ∈ H_n, there is a unique representation

x = x_1 + ... + x_m (8.24)

such that x_i ∈ E_i. Instead of observing the spaces E_i, we can talk about observing the labels θ_i: we say that, by observing E, the value θ_i will be seen with a probability of ‖x_i‖^2.¹²

Example 8.3.2. The notion of observing the basis states x_i is a special case: the observable E can be defined as

E = {E_1, ..., E_n},

where each E_i is the one-dimensional subspace spanned by x_i. We can, for instance, equip E_i with the label i. If the system is in state (8.23), the value i is observed with a probability of

|a_i|^2.

Another view of the observables can be achieved in the following way: if H_n = E_1 ⊕ ... ⊕ E_m, let P_{E_i} be the projection onto the subspace E_i. Thus, we may also think of observables as (sometimes partially defined) mappings E : ℝ → L(H) that associate a projection E(θ_i) = P_{E_i} to each label θ_i. Following the notation (8.24),

P_{E_i} x = x_i,

and the probability of observing the label θ_i when the system state is |x⟩ is given by

‖P_{E_i} x‖^2 = ⟨P_{E_i} x | P_{E_i} x⟩ = ⟨x | E(θ_i)x⟩.

The mapping E : ℝ → L(H) has other interesting properties: the probability of observing either θ_i or θ_j is clearly

⟨x | E(θ_i)x⟩ + ⟨x | E(θ_j)x⟩ = ⟨x | (E(θ_i) + E(θ_j))x⟩,

¹² The original starting point is, of course, controversial: instead of talking about observing subspaces, one talks about measuring some physical quantity and obtaining a real number as the outcome. However, here it seems to be logically more consistent to introduce these quantity values as labels of subspaces.

and the probability that some θ_i is observed is

⟨x | (E(θ_1) + ... + E(θ_m))x⟩ = ⟨x | Ix⟩ = 1.

Thus, we may extend E to be defined on sets of real numbers by setting E({θ_i, θ_j}) = E(θ_i) + E(θ_j) if θ_i ≠ θ_j. We can easily see that, as a mapping from subsets of ℝ to L(H), E satisfies the following conditions:
1. For each X ⊆ ℝ, E(X) is a projection.
2. E(ℝ) = I.
3. E(∪X_i) = Σ E(X_i) for a disjoint sequence X_i of sets.
A mapping E that satisfies the above conditions is called a projection-valued measure.¹³ Viewing an observable as a projection-valued measure represents an alternative way of thinking compared with Definition 8.3.1. In that definition, we merely equipped each subspace with a real number, a label, which is thought to be observed instead of the actual subspace. Here we associate a projection which defines a subspace to a set of labels (real numbers). If X is a set of real numbers, then ⟨x | E(X)x⟩ is just the probability of seeing a number in the set X.
It follows from condition 3 that, if X and Y are disjoint sets, then the corresponding projections E(X) and E(Y) project onto mutually orthogonal subspaces. In fact, since E(X)(H_n) is a subspace of E(X ∪ Y)(H_n), we must have E(X)E(X ∪ Y) = E(X) and, therefore, E(X) = E(X)E(X ∪ Y) = E(X) + E(X)E(Y), so E(X)E(Y) = 0.
¹³ The notion of a projection-valued measure is not, in general, defined on all subsets of ℝ, but on a σ-algebra which has enough "regular features". Typically, the Borel subsets of ℝ will suffice. However, we are now interested in finite-level quantum systems, so there are only finitely many real numbers associated with each observable.
The third and perhaps the most traditional view of the observables can be achieved by regarding an observable as a self-adjoint operator

A = θ_1 E(θ_1) + ... + θ_m E(θ_m),

where E(θ_i) = E({θ_i}) are projections onto mutually orthogonal subspaces. Recalling that the probability of observing θ_i when the system is in state x is ⟨x | E(θ_i)x⟩, we can compute the expected value of an observable A in state x. The expected value is

𝔼_x(A) = θ_1 ⟨x | E(θ_1)x⟩ + ... + θ_m ⟨x | E(θ_m)x⟩ = ⟨x | (θ_1 E(θ_1) + ... + θ_m E(θ_m))x⟩ = ⟨x | Ax⟩.

It now seems to be a good moment to collect together all the viewpoints on observables which we have so far.
• An observable can be defined as a collection of mutually orthogonal subspaces {E_1, ..., E_m} such that H = E_1 ⊕ ... ⊕ E_m. Each subspace is equipped with a real number label θ_i. This point of view most closely resembles the original idea of thinking about a quantum state as a generalization of a probability distribution. In fact, observing the state of a quantum system can be seen as learning the value of an observable which is defined as a collection of one-dimensional subspaces spanned by the basis states.
• An observable can be seen as a projection-valued measure E which maps a set of labels to a projection that defines a subspace. This viewpoint offers us a method for generalizing the notion of an observable into a positive operator-valued measure.
• The traditional view of an observable is to define it as a self-adjoint operator via the spectral representation

A = θ_1 E(θ_1) + ... + θ_m E(θ_m).

All these viewpoints are logically quite equal. In what follows, we will use all of them, choosing the one which is best suited for each situation.
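As a numerical companion to the last viewpoint (my illustration), the following numpy lines compute the outcome probabilities ⟨x | E(θ_i)x⟩ of an observable and check that their θ-weighted sum is the expected value ⟨x | Ax⟩:

  import numpy as np

  A = np.array([[0.0, 1.0], [1.0, 0.0]])       # an observable, labels -1, 1
  vals, vecs = np.linalg.eigh(A)

  x = np.array([np.sqrt(0.8), np.sqrt(0.2)])   # a unit-length state
  probs = [abs(np.vdot(v, x)) ** 2 for v in vecs.T]
  print(dict(zip(np.round(vals, 3), np.round(probs, 3))))   # {-1: 0.1, 1: 0.9}

  expected = sum(lam * p for lam, p in zip(vals, probs))
  print(np.isclose(expected, np.vdot(x, A @ x).real))       # True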

8.3.3 The Uncertainty Principles

In this section, we first present an interesting classical result connected to the simultaneous measurement of two observables. We will view the observables as self-adjoint mappings. We say that two observables A and B are commutative if AB = BA. We also define the commutator of A and B as the mapping [A, B] = AB - BA. Recall that, for a given state x, the expected value of the observable A is given by 𝔼_x(A) = ⟨x | Ax⟩. For short, we denote μ = 𝔼_x(A). The variance of A (in state x) is the expected value of the observable (A - μ)^2. Notice that, if A is self-adjoint, then (A - μ)^2 is also self-adjoint and, thus, it also defines an observable. With these notations, the variance can be expressed as

Var_x(A) = 𝔼_x((A - μ)^2) = ⟨x | (A - μ)^2 x⟩ = ⟨(A - μ)x | (A - μ)x⟩ = ‖(A - μ)x‖^2. (8.25)

Theorem 8.3.2 (The Uncertainty Principle). For any observables A, B and any state x,

Var_x(A) Var_x(B) ≥ (1/4) |⟨x | [A, B]x⟩|^2.

Proof. Notice first that, for self-adjoint A and B, [A, B]* = -[A, B], so the commutator of A and B can be written as [A, B] = iC, where C = -i[A, B] is self-adjoint. If μ_A and μ_B are the expected values of A and B, respectively, we get

Var_x(A) Var_x(B) = ‖(A - μ_A)x‖^2 ‖(B - μ_B)x‖^2 ≥ |⟨(A - μ_A)x | (B - μ_B)x⟩|^2,

using representation (8.25) and the Cauchy-Schwarz inequality. To shorten the notations, we write A_1 = A - μ_A and B_1 = B - μ_B. Clearly, A_1 and B_1 are also self-adjoint, and it is a straightforward task to verify that [A_1, B_1] = [A, B] and that A_1B_1 + B_1A_1 is self-adjoint. Thus,

|⟨A_1x | B_1x⟩|^2 = |⟨x | A_1B_1x⟩|^2
= |⟨x | (1/2)(A_1B_1 + B_1A_1)x + (1/2)(A_1B_1 - B_1A_1)x⟩|^2
= (1/4) |⟨x | (A_1B_1 + B_1A_1)x⟩ + ⟨x | [A, B]x⟩|^2
= (1/4) |⟨x | (A_1B_1 + B_1A_1)x⟩ + i⟨x | Cx⟩|^2,

where C = -i[A, B] is a self-adjoint operator. Because of the self-adjointness of the above operators, both inner products are real. Therefore,

Var_x(A) Var_x(B) ≥ (1/4)(⟨x | (A_1B_1 + B_1A_1)x⟩^2 + ⟨x | Cx⟩^2) ≥ (1/4)⟨x | Cx⟩^2,

which can also be written as

Var_x(A) Var_x(B) ≥ (1/4) |⟨x | [A, B]x⟩|^2,

as claimed. □
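Theorem 8.3.2 is easy to test numerically; the following numpy sketch (my illustration) draws a random state and two random self-adjoint 2 × 2 observables and checks the inequality:

  import numpy as np

  rng = np.random.default_rng(3)

  def random_selfadjoint():
      M = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
      return (M + M.conj().T) / 2

  A, B = random_selfadjoint(), random_selfadjoint()
  x = rng.normal(size=2) + 1j * rng.normal(size=2)
  x /= np.linalg.norm(x)

  def var(obs):
      mu = np.vdot(x, obs @ x).real
      return np.linalg.norm((obs - mu * np.eye(2)) @ x) ** 2

  comm = A @ B - B @ A
  print(var(A) * var(B) >= abs(np.vdot(x, comm @ x)) ** 2 / 4 - 1e-12)  # True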
Remark 8.3.2. A classical example of noncommutative observables is given by position and momentum. It turns out that the commutator of these two observables is a homothety, i.e., multiplication by a nonzero constant, so Theorem 8.3.2 demonstrates that, in any state, the product of the variances of position and momentum has a positive lower bound. This is known as Heisenberg's uncertainty principle.
The uncertainty principle (Theorem 8.3.2) was criticized by D. Deutsch [30] on the grounds that the lower bound provided by Theorem 8.3.2 is not generally fixed, but depends on the state x. Deutsch himself in [30] gave a state-independent lower bound for the sum of the uncertainties of two observables. Here we present an improved version of this bound courtesy of Maassen and Uffink [56].
We say that a sequence (p_1, ..., p_n) of real numbers is a probability distribution if p_i ≥ 0 and p_1 + ... + p_n = 1. For a probability distribution P = (p_1, ..., p_n), the Shannon entropy of the distribution is defined to be

H(P) = -(p_1 log p_1 + ... + p_n log p_n),

where 0 · log 0 is defined to be 0. For basic properties of Shannon entropy, see Section 9.5.

Let A and B be some observables of an n-state quantum system (again regarded as self-adjoint operators). As discussed before, there are orthonormal bases of H_n consisting of the eigenvectors of A and B. By denoting these bases by {a_1, ..., a_n} and {b_1, ..., b_n}, we get spectral representations of A and B:

A = λ_1 |a_1⟩⟨a_1| + ... + λ_n |a_n⟩⟨a_n|,
B = μ_1 |b_1⟩⟨b_1| + ... + μ_n |b_n⟩⟨b_n|,

where λ_i and μ_i are the eigenvalues of A and B, respectively.
With respect to a fixed state |x⟩ ∈ H_n, the observables A and B define probability distributions P = (p_1, ..., p_n) and Q = (q_1, ..., q_n) that describe the probabilities of seeing the labels λ_1, ..., λ_n and μ_1, ..., μ_n. Notice that the projections E(λ_i) and E(μ_i) are |a_i⟩⟨a_i| and |b_i⟩⟨b_i|, respectively. Therefore, by measuring the observable A when the system is in state |x⟩, the label λ_i can be observed with a probability of

p_i = |⟨a_i | x⟩|^2,

and, by measuring B, the label μ_i is seen with a probability of q_i = |⟨b_i | x⟩|^2. The probability distributions P and Q obey the following state-independent law:

Theorem 8.3.3 (Entropic Uncertainty Relation).

H(P) + H(Q) ≥ -2 log c,

where c = max_{i,j} |⟨a_i | b_j⟩|.


Proof. The expressions H(P) and H(Q) seem somewhat difficult to estimate, so we should find something to replace them. Let r > 0 and study the expression

(1/r) log(Σ_{i=1}^n p_i^{r+1}). (8.26)

We claim that

lim_{r→0} (1/r) log(Σ_{i=1}^n p_i^{r+1}) = Σ_{i=1}^n p_i log p_i. (8.27)

In fact, since Σ_{i=1}^n p_i = 1, the numerator and the denominator on the left-hand side of (8.27) tend to 0 as r does. Therefore, l'Hospital's rule applies, and

lim_{r→0} log(Σ_{i=1}^n p_i^{r+1}) / r = lim_{r→0} (Σ_{i=1}^n p_i^{r+1} log p_i) / (Σ_{i=1}^n p_i^{r+1}) = Σ_{i=1}^n p_i log p_i.

Notice also that (8.27) implies that

lim_{r→0} (Σ_{i=1}^n p_i^{r+1})^{1/r} = exp(Σ_{i=1}^n p_i log p_i) = exp(-H(P)),
8.3 Quantum States as Hilbert Space Vectors 137

and that the claim is equivalent to exp(-H(P) - H(Q)) ::; c2 , which can be
written as

rr rr
n

i=1
pfi
n

i=1
q'!i ::; c2 . (8.28)

Our strategy is to replace the products in (8.28) with expressions similar to


those ones in (8.26), estimate them, and obtain (8.28) letting r --+ O.
To estimate (8.26), we use a theorem by M. Riesz [73]:14 If T : H n --+ H n
is a unitary matrix, y = Tx, and 1 ::; a ::; 2, then

(tIYkla~l )a~l ::;c2~a(tlxila)~,


i=1 i=1

where c = max ITijl.


By choosing Xi = (ai I x), Tij = (bi I aj), we have Yi = (bi I x). Clearly,
T is unitary. We can now write the theorem in form
n
(I: I(bi Ix) Ia~
i=1
1 )
a-l

-a
n
(I: I(ai I x W
i=1
r 1
a ::; C 2;;a . (8.29)

Since (8.29) can also be raised to some positive power k, we will try to fix
values such that (8.29) would begin to resemble the right-hand side of (8.26).
Therefore, we will search for numbers r, s, and k such that a~1 = 2(s + 1),
ka-;;l = ~, a = 2(r + 1), -~ = ~, and k 2 -;;a = 2. To satisfy 1 :::: a :::: 2, we
must choose r E [- ~,O]. A simple calculation shows that choosing s = - 2r~1 '
k = - 2r:2 will suffice. Thus, (8.29) can be rewritten as

and, by letting r --+ 0, we obtain the desired estimation. D

Remark 8.3.3. The lower bound of Theorem 8.3.3 does not depend on the
system state x, but it may depend on the representation of observables

and
14 Riesz's theorem is a special case of Stein's interpolation formula. The elementary
proof given by Riesz [73] is based on Hölder's inequality.
138 8. Appendix A: Quantum Physics

The independence of the representation is obtained if both observables are


non-degenerate. Then, all the vectors ai and bi are determined uniquely up to
a multiplicative factor having absolute values of 1; hence c = max I(bi I aj)1 is
also uniquely determined. Notiee also that the degeneration of an observable
means that there are less than n values to be observed, so naturally the
entropy decreases.
Example 8.3.3. Consider a system consisting of m qubits. One orthonormal
basis of H 2 m is the familiar computational basis B l = {Ixl ... x m ) I Xi E
{a, I}}, and another one is B 2 = {H Ix) I Ix) E B l } whieh is obtained by
applying Hadamard-Walsh transform (see Section 4.1.2 or 9.2.3)

H Ix) = _1_ L (_I)m. y Iy) .


...;zm yElF!{'
on B l . We will study observables Al and A2 whieh are determined by B l
and B 2 ,

A = L c~) Ix)(xl·
mEB;

Now we do not care very much about the labels c~); they can be fixed in any
manner such that both sequences ~) consist of 2m distinct real numbers.
This, in fact, is to regard an observable as a decomposition of H 2 m into
2m mutually orthogonal one-dimensional subspaces. For any Ix) E B l and
Ix') E B2, we have

I(x I x')1 = ~,
y2 m
as is easily verified. Thus, the entropie uncertainty relation gives

That is, the sum of the uncertainties of observing an m-qubit state in bases
B l and B 2 mutually is at least m bits! On the other hand, the uncertainty
of a single observation cannot naturally be more than m bits: an observation
with respect to any basis will give us a sequence x = Xl •.. X m E {a, I}m with
some probabilities of Pm such that EmElF!{' Pm = 1. The Shannon entropy is
given by

-L Pm log Pm ,
mElF!{'

which achieves the maximum at Pm = 2;" for each x E {a, I}m. Thus, the
maximum uncertainty is m bits.
8.4 Quantum States as Operators 139

The lower bound of the uncertainty principle (Theorem 8.3.2) depends


on the commutator [A, B] and on the state Ix). If [A, B] = 0, then the
lower is always trivial. Does a similar result hold also for the entropie un-
certainty relation? The answer is yes: if A and B commute, then, according
to Lemma 8.2.9, A and B share an orthonormal eigenvector basis. But then
c = max I(bi I aj) I = 1 and the lower bound is trivial.

8.4 Quantum States as Operators


Let us study a system of two qubits in state

~(IO) 10) + 11) 11)). (8.30)

In the introductory section we learned that this state is entangled; there is no


way to write (8.30) as a product of two single-qubit states. This also means
that the representation of quantum systems as Hilbert state vectors does not
offer us any vocabulary to speak about the state of the first qubit if the
compound system is in state (8.30). In this section, we will generalize the
not ion of a quantum state.

8.4.1 Density Matrices

Let {bI, ... , bn } be a fixed basis of H n . In this section, the coordinate


representation of a vector x = Xl bl + ... + xnbn is interpreted as a col-
umn vector (Xl, ... ,xnf, and we define the dual vector of Ix) as a row
vector (xl = (xi,···, x~). Notice that, for any x = x1b1 + ... xnbn and
Y = y1b 1 + ... + Ynbn, (xIIY) as a matrix product looks like

Inspired by the word "bracket", we will call the row vectors (xl bra-vectors
and column vectors Iy) ket-vectors. 15
Remark 8.4.1. A coordinate independent notion of dual vectors can be
achieved by using the theorem of Frechet and F. Riesz. Any vector x E H n
defines a linear function Im : H n ----7 C by

Im(Y) = (x I y). (8.31)

A linear function I : H n ----7 C is called a functional. Functionals form a


vector space H~, a so-called dual space of H n , where addition and scalar
15 This terminology was first used by P. A. M. Dirac.
140 8. Appendix A: Quantum Physics

multiplication are defined pointwise. The Fh~chet-Riesz's theorem states that,


for any functional f : H n -+ C, there is a unique vector x E H n such that
f = fm. According to this point of view, the dual vector of xis the functional
fm that satisfies (8.31).

For any state (vector of unit length) x = xlb l + ... + xnbn E H n , we

(jJ
define the density matrix belonging to x by

Ix) 0 (xl ~ 0 (X;, x;, ... ,x~)


IXll2 Xlx2 •.. XlX~)
X2 X * *
l IX2 12 •.. X2Xn
- (
·· .
.. . .. ... .
·
XnXl* X n X 2* ... IX n 12

Remark 8.4.2. For density matrices, there is also a coordinate-independent


viewpoint: it is not difficult to verify that Ix) ® (xl is a matrix representing
a projection onto the one-dimensional subspace generated by x, and we have
already used notation Ix)(x I to stand for such a projection. To shorten the
notations, we will omit ® and write Ix) ® (xl =1 x) (x I hereafter. Recall also
that we called states Ix) and lXI) equal, if x = ei8xl for some real number
(), but the density matrices are robust against factor eilJ, i.e., I Xl)(Xl 1=1
X)(X I. On the other hand, for a one-dimensional subspace W, we can pick
a unit-Iength vector x which generates W. This vector is unique up to the
multiplicative constant eilJ , and the projection onto W is 1x)(x I. Thus, there
is a bijective correspondence between the states and the density matrices
belonging to unit vectors x.

We list three important properties of density matrices Ix) (x lasfollows:


• Since Ilxll = 1, matrices Ix) (x I have unit trace (recall that the trace is the
sum of the diagonal elements).
• As projections, matrices Ix) (x I are self-adjoint.
• Matrices Ix)(xl are positive, since (lx)(xI)2 =Ix)(xl.
Using these properties, we generalize the notion of astate:
Definition 8.4.1. Astate of a quantum system is a positive, unit-trace self-
adjoint operator in H n . Such an operator is called a density operator.
Hereafter, any matrix representing astate is called a density matrix as weIl.
Spectral representation offers a powerful tool for studying the properties of
the density matrices.
8.4 Quantum States as Operators 141

If p is a density matrix having eigenvalues Al, ... , An, then there is an


orthonormal set {Xl,"" X n } of eigenvectors of p. Thus, p has a spectral
representation

(8.32)

The spectral representation (8.32) in fact tells us that each state (density
matrix) is a linear combination of one-dimensional projections, which are
alternative notations for quantum states which are defined as unit-length
vectors. We will call the states corresponding to one-dimensional projections
vector states or pure states. States that are not pure are referred to as mixed.
More information about density matrices can be easily obtained: using
the eigenvector basis, it is trivial to see that Al + ... + An = Tr(p) = 1. Since
p is positive, we also see that Ai = (Xi pXi) 2': O. Thus, we can say that each
1

mixed state is a convex combination of pure states.


Thus, a general density matrix looks very much like a probability dis-
tribution of orthogonal pure states. If fact, it is true that each probability
distribution (PI, ... ,Pn) of mutually orthogonal vector states Xl, ... , X n al-
ways defines a density matrix

but the converse is not so dear. For there is one unfortunate feature in the
spectral decomposition (8.32). Recall that (8.32) is unique if and only if p
has distinct eigenvalues. For instance, if Al = A2 = A, then it is easy to verify
that we can always replace

with

where
,
Xl = Xl cosa - X2
.
SIna,
x~ = xlsina+x2cosa,
and a is areal number (Exercise 5). Therefore, we cannot generally say,
without additional information, that a density matrix represents a prob ability
distribution of orthogonal state vectors.
We describe the time evolution of the generalized states briefly at first.
Afterwards, in Section 8.4.4, we will study the issue of describing the time
evolution in more detail. Here we merely use decomposition (8.32) straight-
forwardly: for a pure state 1 x) (x 1 there is a unitary mapping U(t) such
that

1x(t))(x(t) 1=1 U(t)X(O)) (U(t)x(O) 1= U(t) 1x(O))(x(O) 1U(t)* (8.33)


142 8. Appendix A: Quantum Physics

(see Exercise 6). By extending this linearly for mixed states p, we have

p(t) = U(t)p(O)U(t)*,
where U(t) is the time-evolution mapping ofthe system. By using representa-
tion U(t) = e- itH , it is quite easy to verify (Exercise 7) that the Schrödinger
equation takes the form

i !p(t) = [H, p(t)].

8.4.2 Observables and Mixed States

In this section we will demonstrate how to fit the concept of observables into
general quantum states. For that purpose, we will first regard an observable
as a self-adjoint operator A with spectral representation

where each E((}i) =1 Xi)(Xi 1is a projection such that the vectors Xl, ... , Xn
form an orthonormal basis of H n . If the quantum system is in astate X (pure
state 1 x)(x 1), we can write X = CiX1 + ... + Cnx n , where Ci = (Xi 1 X). Thus,
the probability of observing a label (}i is
n
(x 1E((}i)X) = \ 2: (Xj 1x)Xj 1E((}i)X)
j=1
n
= 2: \Xj 1 E((}i)(X 1 Xj)X)
j=l
n
= 2:<Xj 1E((}i) Ix)(xi Xj)
j=l
(8.34)
Equation 8.34 will be the basis for handling the observations on mixed
states: Any state T can be expressed as a convex combination

(8.35)

of pure states 1Xi) (Xi I, and we use the linearity of the trace to join the concept
of observables together with the not ion of states as self-adjoint mappings. Let
T be astate of a quantum system and E an observable which is considered
as a projection-valued measure. The probability that areal number (label) in
set X is observed is Tr(E(X)T) = Tr(TE(X)). Notice that, since Tr(T) = 1
and E(X) is a projection, we always have 0 :s; Tr(TE(X)) :s; 1. This can
be immediately seen by using the linearity of the trace and the spectral
representation (8.35) for T:
8.4 Quantum States as Operators 143
n n
Tr(TE(X» = LAiTr(lxi)(xil E(X» = LAi(xi I E(X)Xi).
i=1 i=1
Inequalities 0 :S Tr(TE(X)) :S 1 now directly follow from the facts that
o :S (Xi I E(X)Xi) :S 1, Ai ~ 0, and that L:~=I Ai = 1.
Example 8.4.1. Let E = {EI' ... ' E m } be an observable (a collection of mu-
tually orthogonal subspaces), as in Definition 8.3.1, and Pi the projection
onto Ei. If, moreover, BI, ... , Bm E lR are the values associated with the
subspaces, we can define a projection-valued measure E by E(Bi ) = Pi. Then
A = BIPI + ... + BmPm = BIE(Bd + ... + BmE(Bm ), and the probability
that the measurement of observable E will give Bi as an outcome is given by
Tr(TPi ) = Tr(TE(B i )).

As mentioned above, any state Tinduces a probability distribution on


the set of projections in Hilbert space H n by jLT(P) = Tr(TP). If n ~ 3,
the converse also holds. For the statement of the theorem below, recall that
P(H) is the set of projections in L(H).

Theorem 8.4.1 (A. Gleason, 1957). If dimH ~ 3, then each probability


measure jL : P(H) -+ [0,1] satisfying
1. jL(O) = 0, jL(I) = 1
2. jL(L:i Pi) = L:i jL(Pi )
for each sequence of orthogonal projections Pi is generated via astate T
(depending only on jL) by

jL(P) = Tr(TP).
The proof of Gleason's theorem can be found in [65], but no simple proof of
that theorem is known.
We will demonstrate here how an analogous statement can be reached in
a finite-dimensional case, if instead of projections, all self-adjoint mappings
are allowed. The proof of the theorem below is from [61].

Theorem 8.4.2. Let Ls(Hn ) be the set of self-adjoint operators and jL


Ls(Hn ) -+ lR a mapping which satisfies
1. jL(I) = 1
2. jL(1 x)(x I) ~ 0 for each unit-lenght XE H n
3. jL(L:i aiSi) = L:i aijL(Si) for each sequence of self-adjoint operators Si E
Ls(Hn ) and real numbers ai.
Then there is astate T depending only on jL such that

jL(S) = Tr(TS)

for each self-adjoint operator S.


144 8. Appendix A: Quantum Physics

Before the proof, we will list some simple auxiliary results.


Lemma 8.4.1.

c Ixr)(xs 1 +c* Ixr}{x s 1


= Re(c) ( Ixr}{xsl + Ixs)(xrl) + Im(c) (i Ixr}{xsl-i Ixs)(xrl),
where Re(c) and Im(c) are the real and imaginary parts of c, respectively.

Proof. Recalling that Re(c) = !(c+c*) and Im(c) = i(c-c*), the statement
of the lemma follows by direct calculation. D

Lemma 8.4.2. Mappings B rs =1 xr)(x s 1+ 1xs)(x r 1 and Crs = i 1Xr)(X s 1


-i 1 Xs)(X r 1 are self-adjoint. Moreover, B sr = B rs and C sr = -Crs·

Proof. By direct calculation, recalling Remark 8.2.4. D

Proof {Proof of Theorem 8.4.2}. Let S E Ls(Hn ). By fixing an orthonormal


basis {Xl, ... , x n } of H n we get a representation (Recall Remark 8.2.5).
n n

r=ls=l
n
= Z)xr 1 Sxr ) Ixr}{xrl + L(xr 1 SXs) Ixr)(xsl
r=l
n
= ~)Xr 1 Sxr ) Ixr}{xr 1
r=l

r<s r>s
changing the role of rand s in the last term we have that
n
S = L(xr 1 Sxr ) Ixr)(xr 1
r=l

r<s r<s
n
= L(xr 1 Sxr ) Ixr)(xrl
r=l
+ L ((xr 1 SXs) Ixr}{xsl +(xr 1 SXs)* IXs)(xrl) (8.36)
r<s
By applying Lemma 8.4.1 to (8.36), this can be rewritten as
8.4 Quantum States as Operators 145
n
S = ~)Xr 1 Sxr) Ixr)(xr 1
r=l

r<s
+ LIm(x r 1SX s) (i Ixr)(xsl-i Ixs)(xrl)
r<s
By denoting Ar =1 Xr)(Xr I, Brs =1 Xr)(X s 1+ 1Xs)(Xr I, and Crs = i 1Xr)(X s 1
-i 1 Xs)(X r 1 (mappings Ar are clearly self-adjoint and by Lemma 8.4.2 all
the mappings B rs and Crs are also self-adjoint), the above equality becomes
n
S = L(xr 1Sxr)Ar + LRe(xr 1Sxs)Brs + LIm(xr 1SXs)Crs .
r=l r<s r<s
Because all the coefficients in the above sums are real, and the mappings
self-adjoint, the assumption implies that
n
M(S) = L(xr 1 SXr)M(Ar)
r=l

r<s r<s
This can be written as
n
M(S) = L(xr 1 SXr)M(Ar)
r=l

r<s r<s
n
= L(xr 1 SXr)M(Ar)
r=l

Equations B rs = B sr and Crs = -Csr imply that

Denoting Trr = M(Ar), Tsr = ~M(Brs) - ~M(Crs) we have obviously that


T;s = T sr , so we can define a self-adjoint mapping T E L(Hn ) by
146 8. Appendix A: Quantum Physics
n n
T= L L Trs Ixr)(xsl. (8.37)
r=1s=1
Equation 8.37 implies directly that T rs = (x r 1 Tx s ) and that
n n
J-l(8) = LLTsr(xr 18x 8 )

r=1s=1
n n
= LL(xs 1Txr)(xr 18x 8)

r=1s=1
n n
= L L(x 8 11 Txr)(xr 18x 8)

r=1s=1
n n
= ~)xs 1TL Ixr)(xrl 8x 8 )
s=1 r=1
n
= L(xs 1 T8x s) = n(T8).
s=1
To finish the proof, we have to show that T is astate. That is, a self-adjoint,
unit-trace positive mapping.
We have already noticed that T is self-adjoint. The trace of T can be
easily found by summing over the diagonal elements:
n n n
n(T) = LTrr = LJ-l(Ar ) = J-l(LAr ) = J-l(I) = 1.
r=1 r=1 r=1
For the positivity of T, we notice that if XE H n has unit length, then

J-l(lx)(xl) = n(T Ix)(xl) = (x 1T Ix)(xl x) = (x 1Tx).

The second-last equality follows from the fact that it is always possible to
choose an orthonormal basis containing x. Therefore,

(x 1Tx) = J-l(lx)(xl) ~ 0 (8.38)

for each unit-length vector x E H n . If Y i:- 0 is not of unit length, then


x = nirrY is, and substituting this in (8.38) yields the desired result. 0

Remark 8.4.3. Paul Busch introduced a result [22] which states that proba-
bility measures defined on effects16 can be expressed analogously to Gleason's
theorem. The result of Busch is based on extending the measure J-l orginally
defined on effects only into a linear functional defined on all self-adjoint op-
erators. The previous theorem can then be applied.
16 Effect Eis an operator in Ls(Hn } satisfying 0 ~ E ~ I.
8.4 QuantumStates as Operators 147

8.4.3 Subsystem States

Now we will study the description of compound systems in more detail. Let
H n and H m be the state spaces of two distinguishable quantum systems.
As discussed before, the state space of a compound system is the tensor
product H n ~ H m . Astate of the compound system is a self-adjoint mapping
on H n ~ H m . Such astate can be comprised of states of subsystems in the
following way: if Tl and T 2 are states (also viewed as self-adjoint mappings)
of H n and H m , then the tensor product Tl ~T2 defines a self-adjoint mapping
on H n ~ H m , as is easily seen (Tl ~ T 2 is defined by

for the basis states and extended to Hn~Hm by using linearity requirements).
One can also easily verify that the matrix of Tl ~ T 2 is the tensor product
of the matrices of Tl and T 2 . As weIl, observables of subsystems make up an
observable of a compound system: if Al and A 2 are observables of subsystems
(now seen as unit-trace positive self-adjoint mappings), then Al ~ A 2 is a
positive, unit-trace self-adjoint mapping in H n ~ H m .
The identity mapping I: H --+ H is also a positive, unit-trace self-adjoint
mapping and, therefore, defines an observable. However, this observable is
most uninformative in the following sense: the corresponding projection-
valued measure is defined by

E (X) = {I, if 1 E X,
I 0, otherwise,

°
so the probability distribution associated with EI is trivial: Tr(TEi(X)) is 1
if 1 EX, otherwise. Notice also that I has only one eigenvalue 1, but all the
vectors of H are eigenvectors of I. This strongly reflects on the corresponding
decomposition of H into orthogonal subspaces: the decomposition has only
one component, Hitself. This corresponds to the idea that, by measuring
observable I, we are not observing any nontrivial property. In fact, if Tl and
T 2 are the states of H n and H m and Al and A 2 are some observables, then
it is easy to see that
Tr((TI ~T2)(AI ~I)) = Tr(TIA I ),
Tr( (Tl ~ T 2)(I ~ A 2)) = Tr(T2A 2).
That is, observing the compound observable Al ~I (resp., I ~A2) corresponds
to observing only the first (resp., second) system. Based on this idea, we will
define the substates of a compound system.

Definition 8.4.2. Let T be astate of a compound system with state space


H n ~ Hm- The states of the subsystems are unit-trace positive self-adjoint
mappings Tl : H n --+ H n and T 2 : H m --+ H m such that for any observables
Al and A 2 ,
148 8. Appendix A: Quantum Physics

Tr(TlA l ) = Tr(T(A l l8l I)), (8.39)


Tr(T2 A 2 ) = Tr(T(I l8l A 2 )). (8.40)
We say that Tl is obtained by tracing over H m , and T 2 by tracing over H n .
We can, of course, be skeptical about the existence and uniqueness of Tl and
T 2 , but, fortunately, they are always uniquely determined.
Theorem 8.4.3. For each state T there are unique states Tl and T 2 satis-
jying (8.39) and (8.40) jor alt observables Al and A 2 .
Proof. We will show only that Tl exists and is unique; the proof for T2 is
symmetrical. We will first search for a mapping Tl that will satisfy
Tr(TlA) = Tr(T(A l8l I)) (8.41)
for any linear mapping A : H n -+ H n . For that purpose, we fix orthonormal
bases {Xl,"" X n } and {Yl,"" Ym} of Hn and H m , respectively. Using these
bases, (8.41) can be rewritten as
n n m

2: (Xi! TlAXi) = 2: 2: (Xi l8l Yj ! T(A l8l I) (Xi l8l Yj))


i=l i=l j=l
n m
(8.42)
i=l j=l

Recall that Tl can be represented as


n n
Tl = E E Cij 1 Xi)(Xj I, (8.43)
i=l j=l

where Cij are complex numbers (Remark 8.2.5).


By choosing A =!Xk)(Xl! and substituting (8.43) in (8.42), we obtain
m
Clk = 2:(Xl l8l Yj ! T(Xk l8l Yj)),
j=l

which gives
n n m
Tl = 2: 2: 2:(Xi I8lYk! T(Xj I8lYk)) !xi)(Xjl (8.44)
i=l j=l k=l

as a candidate for Tl' In fact, mapping (8.44) is self-adjoint:


n n m

i=l j=l k=l


n n m
= EE2:(XjI8lYk !T(XiI8lYk)) !xj)(xil
i=l j=l k=l

= Tl'
8.4 Quantum States as Operators 149

By choosing A = I, we also see that Tr(TI ) = 1. What about the positivity


of Tl? Analogously to (8.34),

(x 1 Tlx) = Tr(TI Ix)(xl) = Tr(T(lx)(xl Q9I)) ~ 0,


so Tl is also positive. This proves the existence of the required state Tl.
Assume then, on the contrary, that there is another state T{ such that

°
Tr(TIA) = Tr(T{ A) for each self-adjoint A. By choosing A as a projection
1x)(x I, we have that (x 1 (Tl - T{)x) = for any unit-Iength x. Mapping
Tl - T{ is also self-adjoint, so there is an orthonormal basis of H n consisting of
eigenvectors of Tl - T{. If all the eigenvalues of Tl - T{ are 0, then Tl - T{ = 0,
and the proof is complete. In the opposite case, there is a nonzero eigenvalue
Al of Tl - T{ and a unit-Iength eigenvector Xl belonging to Al. Then,

a contradiction. D

Let us write Tl = TrH".. (T) for the state obtained by tracing over H m
and, similarly, T 2 = Tr Hn (T) for the state that we get by tracing over H n .
By collecting all facts in the previous proof, we see that
n n m
Tl = LLL(XiQ9Yk 1 T(XjQ9Yk) IXi)(Xjl· (8.45)
i=l j=l k=l
Similarly,
m m n
T2 = L L L(Xk Q9 Yi 1T(Xk Q9 Yj) IYi)(Yj I· (8.46)
i=l j=l k=l
It is easy to see that the tracing-over operation is linear: Tr H n (a8 + ßT) =
aTr Hn (8) + ßTr Hn (T).
Example 8.4.2. Let the notation be as in the previous lemma. If 8 is astate
of the compound system H n Q9 H m, then 8 is, in particular, a linear mapping
H n Q9 H m -+ H n Q9 H m . Therefore, 8 can be uniquely represented as
n m n m
8= LLLLSrstu IXr Q9Yt)(xs Q9Yul. (8.47)
r=lt=ls=lu=l
It is easy to verify that 1Xr Q9 Yt)(x s Q9 Yu 1=1 xr)(xsl Q9 1Yt)(Yu I, so we can
write
n n m m
8 = L L 1xr)(xsl Q9 L L Srstu 1Yt)(Yu 1
r=ls=l t=l u=l
n n
= L: L: 1 xr)(xsl Q98rs , (8.48)
r=ls=l
150 8. Appendix A: Quantum Physics

where each Srs E L(Hm ) (recall that the notation L(Hm ) stands for the
linear mappings H m -+ H m ). It is worth noticing that the latest expression
for S is actually a decomposition of an nm x nm matrix (8.47) into an n x n
block matrix having m x m matrices Srs as entries.
Substituting (8.48) in (8.45), we get, after some calculation, that
n m m n

r=l i=l j=l r=l

The above expression can be interpreted as follows: if the density matrix


corresponding to the state S is viewed as an n x n block matrix with m x m
blocks Srs, then the density matrix corresponding to the state of system S2
can be obtained by summing together all the diagonal blocks Srr.

Example 8.4.3. Let us investigate the compound system H 2 @ H 2 of two


qubits. We choose {IOO) , 101) , 110) , 111)} as the orthonormal basis of H 2 @H2 .
Consider the (pure) state y = ~(IOO) + 111)) of the compound system.
By using (8.45), we see that the states of the subsystems are

S' = ~ 10)(01 +~ 11)(11·


By using the familiar coordinate representation 100) = (1,0,0, O)T, 101) =
(O,l,O,O)T, 110) = (0,0,1, O)T, and 111) = (0,0,0, 1)T, the coordinate repre-
sentation of y is (~, 0, 0, ~)T and it is easy to see that the density matrix
corresponding to the pure state Iy) (y I is

1001)
0000 1
(
Mty)(Y!="2 0000
1001
and the density matrices correspoding to states S' are

Ms' ="21 (10)


01 '

which agrees with the previous example. On the other hand, if we try to
reconstruct the state of H 2 @ H 2 from states S', we find out that

1 1000)
0100
Ms' @ Ms' = 4" ( 0 0 1 0 '
0001
which is different from Mty)(y!. This demonstrates that subsystem states S'
obtained by tracing over are not enough to determine the state 01 the com-
pound system.
8.4 Quantum States as Operators 151

Even though the state of the compound system perfectly determines the
subsystem states, there is no way to obtain the whole system state from the
partial systems without additional information.
Example 8.4.4. Let the notation be as before. If z E H n Q9Hm is a unit-Iength
vector, then 1 z) (z 1 corresponds to a pure state of system H n Q9 H m . For each
pair (x, z) E H n x H n Q9 H m , there exists a unique vector Y E H m such that

(Y' 1 y) = (x Q9 y' 1 z)
for each Y' E H m . In fact, if Y1 and Y2 were two such vectors, then (Y' 1
Y1 - Y2) = 0 for each Y' E H m , and Y1 = Y2 follows immediately. On the
other hand,
m
Y = L (x Q9 Yk 1 Z)Yk
k=l
clearly satisfies the condition required. We define a mapping (., .) : H n x H n Q9
H m -tHm by
m
(x,z) = L(XQ9Yk 1 Z)Yk.
k=l
Clearly (.,.) has properties that resemble the inner product, such as linearity
with respect to the second component and antilinearity 17 with respect to the
first component.
Now, substituting 8 =1 z) (z 1in (8.45), we see that
n
8 2 = L I(Xk,Z))((Xk,Z)I· (8.49)
k=l
Comparing (8.49) with the equation
n n
Tr(lz)(zl) = L(Xk Ilz)(zl Xk) = L(Xk 1z)(z 1Xk)
k=l k=l
somehow explains the name "tracing over". Notice also that if
n m n m

Z = L L CijXi Q9 Yj = LXi Q9 I>ijYj,


i=l j=l i=l j=l
then clearly (Xk, z) = 2:;:1 CkjYj, and 8 2 becomes
n m m
82 =L ILCkjYj)(LCkjYjl.
k=l j=l j=l
17 Here antilinearity, as usual, means that (0:1:1':1 +0:2:1':2, z) = 0:;:(:1':1, z) +0:;(:1':2, z).
152 8. Appendix A: Quantum Physics

Theorem 8.4.4 (Schmidt decomposition). Assume that n :::; m. Let z E


H n ® H m be a unit-length vector and Tl = Tr H ", (I z)(z I) and T 2 = Tr Hn (I
z)(z I) the subsystem states in L(Hn ) and L(Hm ), respectively. Let also {Xl,
... , x n } be an orthonormal basis of H n consisting of eigenvectors ofT1 and

a spectral representation of Tl. Then,

z = ~X1 ® Y1 + ... + J>:Xn ® Yn,


where {Yi 1 Ai =I- O} is a set of orthonormal eigenvectors ofT2 (not necessarily
a basis of H m ).
ProoJ. Let {bI, ... , bm } be an orthonormal basis of H m . Then, vectors xi®bj
form an orthonormal basis of H n ® H m , and z can be represented as
n m n
Z = I: I: CijXi ® bj = I: Xi ® Y~, (8.50)
i=l j=l i=l

where
m
Y~ = I: cijbj .
j=l
On the other hand, we can write
m
Z = Lxj ®bj , (8.51)
j=l
where
n

xj = LCijXi.
i=l

By applying the previous example to (8.51), we get


m m n n

k=l k=l i=l j=l


m n n

= LLLcikcjk IXi)(xj I· (8.52)


k=l i=l j=l
On the other hand,
m m

k=l 1=1
m m m

= L L cjkCil(bk b1) = cjk cik , 1 L


k=11=1 k=l
8.4 Quantum States as Operators 153

so (8.52) can be rewritten as


n n
Tl = LL(yj 1 y~) IXi)(Xj I· (8.53)
i=l j=l

Now that Tl also possesses the spectral representation


n
Tl = LAi IXi)(Xil
i=l

and representation (8.53) is unique, we must conclude that

(Yj'I Yi') = { 0Ai, otherwise.


if j = i

Substituting y~ = AYi to (8.50) gives the desired decomposition, and


clearly {Yi 1 Ai =f. O} is an orthonormal set. Applying Example 8.4.3 to
decomposition
n
z= LAxQ9Yi
i=l

now directly yields


n
T2 = LAi 1Yi)(Yi I,
i=l

which means that Yl, ... , Yn are eigenvectors ofT2 belonging to eigenvalues
AI, ... , An. 0

Lemma 8.4.3 (Purification). Let T E L(Hn ) be an arbitrary state oi a


quantum system. There exists an integer m and a unit-length vector z E
H n Q9 H m such that T = TrH m (I z) (z 1).
Proof. Let

be a spectral representation of T. The previous theorem and Example 8.4.3


give us a clue how to construct z: we choose n = m and let Yb ... , Yn be an
orthonormal basis of another copy of H n . We define

Notice that since T is a positive mapping, all eigenvalues Ai are nonnegative,


and the square roots can be therefore defined. The length of z can also be
easily computed:
154 8. Appendix A: Quantum Physics

because T has unit trace.


Now z fits in the previous example by setting

Aifi=j
{
Cij = 0 otherwise.

By applying the result of the previous example, we see directly that


n n
TrHm(lz)(zl) = L 1v:x,;Xk) (v:x,;Xk 1= LAk IXk)(Xkl=T.
k=l k=l

Remark 8.4.4. The above lemma states that any state of a quantum system
can be interpreted as a result of tracing over a pure state of a larger system.
If T is a mixed state, a pure state 1z) (z 1such that T = Tr Hm (I z) (z I) is
called a purijication of T.

8.4.4 More on Time Evolution

We already know how to interprete abstract mathematical concepts such as


state (positive, unit-trace self-adjoint operator) and observable (can be seen
as a self-adjoint operator). Since these concepts refer to physical objects, it is
equally important to know how the description of a system changes in time.
In other words, we should know how to describe the dynamic evolution of
a system. We have already outlined in Section 8.3.1 how time evolution can
be described by the state vector formalism. Here, we will discuss the time
evolution of the general formalism in more detail.
In the so-called Schrödinger picture, the state of a system may change in
time but the observables remain, whereas in the Heisenberg picture the ob-
servables change. It turns out that both pictures are mathematically equiva-
lent, and here we will adopt the Schrödinger picture.
We will also adopt, as in Section 8.3.1, the causality principle, which teIls
us that the state of a system at some time t can be recovered from the state
at a given time t = O. Mathematically speaking, this means that if the state
of the system H n at t = 0 is So, there exists a function Vi : L(Hn ) -+ L(Hn )
such that the state at a given time t is St = Vi(So).
It follows that each Vi should preserve self-adjointness, trace, and positiv-
ity. A requirement potentially subject to more criticism is that each Vi must
be linear.
It is also quite common to require that each Vi is completely positive. To
define the concept of a completely positive linear mapping L(Hn ) -+ L(Hn ),
we consider first some basic properties of tensor products.
8.4 Quantum States as Operators 155

For some natural number m ~ 1, let H m be a Hilbert space with or-


thonormal basis {Yb ... , Ym}. If {Xb . .. ,xn } is also an orthonormal basis
of H n , then

{Yi Q9 Xj Ii E {1, ... ,m}, j E {1, ... ,n}}

is a natural choice for an orthonormal basis of H m Q9 H n . We will keep this


notation throughout this section.
Each linear mapping W E L(Hm Q9 H n ) can be uniquely represented as
m m
W = LL IYr)(Ysl Q9Ars , (8.54)
r=ls=l

where each Ars E L(Hn ) (see Example 8.4.2). Whenever W is represented


as in (8.54), then knowing each Ars uniquely determines Wand vice versa,
since

and any linear mapping in A kl E L(Hn ) is completely determined by the


values (Xi I AklXj). In fact, this is already quite clear from the fact that
(8.54) actually represents an mn x mn matrix W as an m x m matrix with
n x n matrices as entries.
Now, if V : L(Hn ) --+ L(Hn ) is a linear mapping and W is as in (8.54),
we define another mapping Im Q9 W : L(Hm Q9 H n ) --+ L(Hm Q9 H n ) by
m m

(Im Q9 V)(W) = LL IYr)(Ys I Q9V(Ars ).


r=ls=l

Definition 8.4.3. Mapping V: L(Hn ) --+ L(Hn ) is completely positive if for


any m ~ 1, the extended mapping Im Q9 V preserves positivity.

Remark 8.4.5. The intuitive meaning of the notion "completely positive" is


the foHowing: instead of merely regarding T as astate of the quantum system
H n , we could as weH introduce an "environment" H m and regard Im Q9 T as
the state of the compound system H m Q9 H n . The requirement that V should
be completely positive means that the time evolution on a compound system
must preserve positivity as weIl.

'frace-preserving completely positive mappings, which now will be as-


sumed as general time evolution operations of quantum systems, are also
known as superoperators. If the time evolution of a quantum system can be
described by unitary mappings, we caH the system closed. Otherwise, a quan-
tum system is said to be open.
Because of their conceptual importance, we will devote the next section
to the study of completely positive mappings.
156 8. Appendix A: Quantum Physics

8.4.5 Representation Theorems

We will derive two representation theorems of completely positive mappings.


For that purpose, we begin with some auxiliary results.
Lemma 8.4.4. A positive mapping A E L(Hn ) is self-adjoint.

Proof. The positivity of A means that (x lAx) 2 0 for each x E H n .


Specifically, (x lAx) is real, hence (x lAx) = (Ax I x) for each x E H n .
On the other hand, also (x lAx) = (A*x I x) for each x E H n . Writing
B = i(A - A*) we have that (Bx I x) = 0 for any x E H n . Since B
is self-adjoint, there exists an orthonormal basis {Xl, ... , x n } consisting of
eigenvectors of Band

see spectral representation (8.16). But then

for each i E {I, ... , n}, and therefore B o. Equality A A* follows


immediately. D

Lemma 8.4.5. A mapping A E L(Hn ) is positive if and only if there is an


orthonormal basis {Xl, ... , x n } of H n and nonnegative numbers Al, ... , An
such that

(8.55)

Proof. Assume that A has representation (8.55). Each xE H n can be written


as

and then (x lAx) = Allal1 2 + ... + An lan l2 20, hence A is positive.


On the other hand, if A is positive, then A is also self-adjoint by the
previous lemma. Using the spectral representation, we see that there is an
orthonormal {Xl, ... , x n } basis of H n such that

for some real numbers Al, ... , An. Since A is positive,

for each i. D

The proof of the following theorem, the first structure theorem, is due to
[25].
8.4 Quantum States as Operators 157

Theorem 8.4.5. Linear mapping V: L(Hn ) -+ L(Hn ) is completely positive


if and only if there exists n 2 mappings Vi E L(Hn ) such that

V(A) = L ViAVi* (8.56)


i=l

for each A E L(Hn ).

Remark 8.4.6. In the quantum computation community, (8.56) is frequently


referred as to the Kraus representation of a completely positive mapping.

Proof. If V has representation (8.56), we should show that whenever


m m

W = LL IYr)(Ysl Q9Ars
r=ls=l
is positive, then also

r=ls=l i=l

is positive. This is done by straight forward computation: Any Z E H m Q9 H n


can be represented in the form

where N :::; n 2 and each Ya E H m and X a E H n . Writing


N
Zi = LYa Q9 Vi*x a,
a=l

it is easy to check that

n2
(zi (Im Q9 V)(W)z) = L(Zi I Wzi) ~ 0,
i=l

because W is positive.
For the other direction, let V be a completely positive mapping. Then
particularly, In Q9 V should be positive. For clarity, we will denote the basis of
one copy of H n by {Yl, ... ,Yn} and that of the other copy by {Xl, ... ,Xn }.
Mapping W E L(Hn Q9 H n ), defined by
158 8. Appendix A: Quantum Physics
n n
W = LL IYr)(Ysll8J Ixr)(xsl
r=ls=l
n n
= L LI Yr l8J Xr)(Ys l8J Xs 1
r=ls=l
n n

= 1 LYr l8J xr)(LYs l8J XS I,


,·=1 s=l
is clearly positive, since it can be directly read from the last representation
that W is a constant multiple of a one-dimensional projection. Since V is
completely positive, it follows that
n n
(In I8JV)(W) = LL IYr)(YsII8JV(lxr)(xs l) (8.57)
r=ls=l
is a positive mapping. By Lemma (8.4.5), there exists an orthonormal basis
{VI, ... ,Vn 2} of H n l8J H n and positive numbers Al, ... , An 2 such that
n n
(In I8JV)(W) = LL IYr)(YsII8JV(lxr )(xs l) = LAi IVi)(Vil,
r=ls=l i=l

which can also be written as

(In l8J V) (W) = L h/~Vi) (v,\;Vi 1 . (8.58)


i=l

If
n n
V = LLVijYj I8JXi
j=li=l
is a vector in H n l8J H n , we associate with va mapping Vv E L(Hn ) by
n n

Vv = L L Vij 1 Xi) (Xj 1 . (8.59)


i=l j=l

Then, a straightforward computation gives


n n

Iv)(vl= LL IYr)(YsII8JVv Ixr)(xsl Vv *· (8.60)


r=ls=l
Associating to each AVi in (8.58), mapping V; as in (8.59), and using
(8.60) we see that
8.4 Quantum States as Operators 159
n2 n n
(In®V)(W) = LLL IYr)(Ysl ®Vi Ixr)(xsl Vi*
i=1 r=1 s=1
n n
= LL IYr)(Ysl ® LVi Ixr)(Xsl Vi*,
r=1s=1 i=1
which, together with (8.57), gives for each pair r, s,

V(lxr)(xsl) = LVi Ixr)(xsl Vi*.


i=1
Since mappings IXr)(X s I span the whole space L(Hn ), equation

V(A) = L ViAVi*
i=1
holds for each A E L(Hn ). o
Another quite useful criterion for complete positiveness can be gathered
from the proof of the previous theorem:
Theorem 8.4.6. A linear mapping V : L(Hn ) -+ L(Hn ) is completely posi-
tive if and only if
n n
T= LL IYr)(Ysl ®V(lxr)(xsl)
r=1s=1
is a positive mapping H n ® H n -+ H n ® H n .
Proof. If V E L(Hn ) is a completely positive mapping, then T is positive by
the very definition, since
n n
LL IYr)(Ysl ® Ixr)(xsl
r=1s=1
is positive. On the other hand, if T is positive, then a representation
n2
V(A) = L ViAVi*
i=1
can be found as in the proof of the previous theorem. o
Recall that we also required that V should preserve the trace.
Lemma 8.4.6. Let V : L(Hn ) -+ L(Hn ) be a completely positive mapping
n2
represented as V(A) = 2:i=1 Vi AVi * . Then V preserves the troce if and only
2
if2:~=1 Vi*Vi = I.
160 8. Appendix A: Quantum Physics

Praof. Let {Xl,"" Xn} be an orthonormal basis of H n . Since


n

1= L Ixr)(xrl,
r=l

by direct calculation we find that


n2

(Xk L 1 Vi*Vixl)
i=l

i=l r=l

i=l r=l

i=l r=l
n2

= Tr( LVii XI)(Xk 1Vi*) = Tr(V(1 XI)(Xk I))·


i=l
2
If I:~=l Vi*Vi = I, then

Le., V preserves the trace of the mappings 1XI)(Xk I. Since V and the trace
are linear and the mappings 1 Xl) (Xk 1 generate the whole space L(Hn ),
V preserves all traces. On the other hand, if V preserves all traces, then
2
I:~=l Vi*Vi = I, since any linear mapping A E L(Hn ) is determined by the
values (Xk AXI)' 1 D

The second structure theorem for completely positive mappings connects


them to unitary mappings and offers an interesting interpretation.
Theorem 8.4.7. A linear mapping V : L(Hn ) -+ L(Hn ) is completely ------
positive and trace preserving if and only if there exists a unitary mapping
U E L(Hn 181 H n2) and a pure state B =Ib)(bl of system L(Hn2) such that

V(A) = TrH n 2 (U(A 181 B)U*)

for each A E L(Hn ).

Remark 8.4.7. A unitary time evolution

1x)(x If---tl Ux)(Ux 1


8.4 Quantum States as Operators 161

can be rewritten as

Ix)(xIH U Ix)(xi U*,


so the above theorem states that any completely positive mapping L(Hn ) -+
L(Hn ) can be interpreted as a unitary time evolution in a larger system.

Proof. Assume first that V(A) = TrH n2(U(A®B)U*). Let {Xl,"" Xn } and
{Yb ... , Yn 2 } be orthonormal bases of H n and H n2, respectively. We write
U* in the form
n n
U* = LL Ixr)(xsl ®U:s
r=ls=l
and define Vk E L(Hn ) as
n n
Vk = LL(UjiYk 1 b) IXi)(xjl·
i=1 j=1
Now, if
n n
A = LLaij IXi)(Xj I,
i=1 j=l
a direct calculation gives
n n n n
VkAVk* = L L L L ( xr 1 AXt)(U:iYk IIb)(bl Ut'jYk) IXi)(xjl· (8.61)
i=1 j=l r=l t=1
On the other hand, (8.45) gives that
Tr Hn2 (U(A ® B)U*)

i=1 j=l k=l


n2 n n n n
=L L L L L (x r ® U:iYk 1 AXt ® BUt'jYk) 1 Xi) (Xj 1

k=1i=1j=1r=1t=1

= L L L L L ( Xr 1 AXt)(U:iYk IIb)(bl Ut'jYk) IXi)(Xjl,


k=l i=l j=1 r=l t=l
which, using (8.61), can be written as

n2
TrHn 2(U(A®B)U*) = LVkAVk*.
k=1
162 8. Appendix A: Quantum Physics

Thus, by Theorem (8.4.5), V is completely positive. To see that V is trace


preserving, notice that a unitary mapping U converts an orthonormal basis
B to another orthonormal basis B', so
Tr(UTU*) = L (z' I UTU*z') = L (U*z' I TU*z')
z'EB z'EB'

= L (U-lZ' I TU-lz') = L (z I Tz) = Tr(T).


z'EB' zEB

Since "tracing over" Tr H n 2 : L(Hn ® H n 2) -+ L(Hn ), T J--t Tl = Tr H n 2 (T) is


defined by condition

for each projection P E L(Hn ), we can use P = In to obtain


Tr(Tr Hn2 (U(A ® B)U*))
= Tr(U(A ® B)U*) = Tr(A ® B) = Tr(A)Tr(B) = Tr(A).
On the other hand, let V : L(Hn ) -+ L(Hn ) be a completely positive and
trace-preserving linear mapping. By Theorem 8.4.5, V can be represented as

V(A) = LViAVi*,
i=l

where each Vi E L(Hn ) and

LViVi* = I.
i=l

Fix an arbitrary vector b E H n 2 of unit length. For any x E H n , we define


n2

U(x®b) = LVix®Yi' (8.62)


i=l

Clearly U is linear with respect to x. If xI, X2 E H n , a direct computation


gives
n2 n2
(U(XI ® b) I U(X2 ® b)) = L L(Vixl ® Yi I Vj X2 ® Yj)
i=l j=l

i=l j=l i=l

n2
= L(XI I Vi*Vi x 2) = (Xl I X2)
i=l
8.4 Quantum States as Operators 163

Denoting
H n ® b = {x ® b I x E H n } <;;;: H n ® H n 2,
we see that U : H n ® b --t H n ® H n 2 is a linear mapping that preserves inner
products. It follows that U can be extended (usually, in many ways) to a
unitary mapping U' E L(Hn ® H n 2). We fix one extension and denote that
again by U. If x is any unit-length vector, then,
U(I x)(x I ® Ib)(bl)U* = U Ix ® b)(x ® bl U*
= IU(x®b))(U(x®b)1

i=l i=l
Using Example 8.4.4, it is easy to see that

i=l

i=l
= V(lx)(xl)·
Hence with U defined as in (8.62) and B =1 b)(b I, the claim holds at least
for all pure states Ix)(x I. Since linear mappings
A r--+ TrHn2 (U(A ® B)U*)
and A r--+ V (A) agree on all pure states and all states can be expressed as
convex combinations of pure states, the mappings must be equal. D

8.4.6 Jozsa's Theorem of Cloning and Deleting


Recently, Richard Jozsa introduced a stronger variant of the no-cloning prin-
ciple [48]. Jozsa's theorem can also be extended to cover the no-deleting
principle by Pati and Braunstein [66].
Theorem 8.4.8 (Jozsa's No-Cloning Theorem). Let Xl, ... , Xk be unit
vectors in H n such that (Xi I Xj) i=- 0 for each pair i, j. Let 0 E H n also
be a unit vector of H n and PI, ... , Pk some states in L(Hm ). There exists a
completely positive mapping V E (Hn ® H n ® H m ) such that

V( IXi)(Xil ® 10)(01 ®Pi) =IXi)(Xil ® IXi)(xil ®p~


for each i E {I, ... , k} if and only if there exists a completely positive mapping
V' E L(Hn ® H m ) such that

V'( 10)(01 ®Pi) =IXi)(Xil ®p~'


for each i E {l, ... , k}.
164 8. Appendix A: Quantum Physics

Remark 8.4.8. The above theorem can be interpreted as follows: There exists
a physical operation that pro duces copies of pure states IXk), •.• , IXk) (with
extra information Pb ... , Pk such that Pi belongs to state lXi)) if and only
if there exists a physical operation that pro duces state lXi) from ancillary
information Pi. Choosing PI = ... = Pk = p, we get the classical no-cloning
theorem of Wootters and Zurek (Theorem 2.3.1).

Proof. If mapping V' E L(Hn 0 H m ) satisfying

exists, then the desired mapping V could be easily obtained as an extension.


Assurne then that mapping V as stated in the theorem exists. By Lemma
8.4.3, each state Pi E H m can be represented as a partial trace over a pure
state determined by some vector Yi E H m 0 H m . Moreover, by Theorem
8.4.7, there exists an environment He such that V can be interpreted as a
unitary mapping in space H n 0 H n 0 H m 0 H m 0 He:

where e E He and Zi E H m 0 H m 0 He. Now that U is unitary, we have by


Lemma 8.2.5, that

(Xi 000 Yi 0 el Xj 000 Yj 0 e) = (Xi 0 Xi 0 Zi I Xj 0 Xj 0 Zj) (8.63)

for each i, j E {I, ... ,k}. Equation 8.63 can be written as

(Xi I Xj)(O 0 Yi 0 el 00 Yj 0 e) = (Xi I Xj)(Xi 0 Zi I Xj 0 Zj). (8.64)


By the assumption, (Xi I Xj) =I- 0 always, and (8.64) yields

(8.65)

for each i,j E {I, ... , k}. Now, by Lemma 8.2.8, we know that there is a
unitary mapping U' E L(Hn 0 H m 0 H m 0 He) such that

U'(O 0 Yi 0 e) = Xi 0 Zi

for each i E {I, ... , k}. By tracing over He and the other copy of H m , we get
the desired mapping V'. 0

The no-deleting principle by Pati and Braunstein [66] is the following:


If Xl, ..• , Xk contains no orthogonal states, then no physical operation can
delete another copy of Xi. That is, operation

is impossible in a sense that will be clarified in the statement of the next


theorem.
8.4 Quantum States as Operators 165

In fact, the deletion operation is not impossible: in [66], it is emphasized


that we can, for instance, include some environment and just define operation

which is essentially only a swap between the environment state and the second
copy of the state Xi. In this case, the state Xi remains in the environment,
and it can be recovered perfectly.
On the other hand, if the projection postulate is adopted, then deletion
is possible, as pointed out in [48]. For deletion, then, it is enough to observe
the second copy of state lXi) lXi) and then to swap the resulting state into
10).
Theorem 8.4.9 (No-Deleting Principle). Assume that Xl, ... , Xk E H n
and 0 E H n are as above. 1f there exists a completely positive mapping erasing
the second copy of Xi, i.e., a mapping for which

holds, then vectors Xi can be recovered perfectly from the environment.

Proof. Assurne that there is a completely positive mapping for which the
assumption holds. By Theorem 8.4.7, there is aspace He, vectors e, el, ... ,
ek E H n , and a unitary U E L(Hn ()9 H n ()9 He) that satisfies

U(Xi ()9 Xi ()9 e) = Xi ()9 0 ()9 ei·

By Lemma 8.2.5, for each i, j E {I, ... ,k},

which can be written as

or as

(8.66)

Lemma 8.2.8 states that there is a unitary mapping U' E L(Hn ()9 He) such
that

U'(Xi ()9 e) = 0 ()9 ei

for each i E {I, ... , k}. As a unitary mapping, U' also has an inverse, and
this implies that Xi can be recovered from the environment. 0
166 8. Appendix A: Quantum Physics

8.5 Exercises

1. Show that the trace function


n
Tr(A) = L (Xi I AXi)
i=l

is independent of the chosen orthonormal basis {Xl, ... , x n }.


2. Show that if P is a self-adjoint operator H n -+ H n and p 2 = P, then
there is a subspace W E H n such that P = Pw .
3. If AI, ... , An are eigenvalues of a self-adjoint operator A and Xl, ... , X n
the corresponding orthonormal set of eigenvectors, verify that

4. Verify that the polarization equation


3
(X I y) = ~L ik(y + ikx I y + ikx)
k=O

holds. Conclude that if IIAxl1 = Ilxll for any X E H n , then also (Ax I
Ay) = (X I y) for each X, y E H n .
5. Prove that

where
X~ = Xl cosa - X2 sina,
x~ = Xl sina + X2 cosa,
and a is any real number.
6. Prove that I Ax)(By 1= A I x)(y I B* for each x, y E H n and any
operators A, B : H n -+ H n .
7. Derive the generalized Schrödinger equation

.d
zdtP(t) = [H, p(t)]

with equation

p(t) = U(t)p(O)U(t)*

and representation U(t) = e- itH as starting points.


9. Appendix B: Mathematical Background

The purpose of this chapter is to introduce the reader to the basic mathe-
matical not ions used in this book.

9.1 Group Theory


9.1.1 Preliminaries

A group G is a set equipped with mapping G x G ----t G, i.e., with a rule that
unambiguously describes how to create one element of Gout of an ordered
pair of given ones. This operation is frequently called the multiplication or the
addition and is denoted by g = g1g2 or g = g1 + g2 respectively. Moreover,
there is a special required element in G, which is called the unit element
or the neutral element and it is usually denoted by 1 in the multiplicative
notation, and 0 in the additive notation.
Furthermore, there is one more required operation, inversion, that sends
any element of g into its inverse element g-1 (resp. opposite element -g
when additive notations are used). Finally, the group operations are required
to obey the following group axioms (using multiplicative notations):
1. For all elements g1, g2, and g3, g1(g2g3) = (g1g2)g3.
2. For any element g, gl = 19 = g.
3. For any element g, gg-1 = g-1 g = 1.
If a group G also satisfies
4. For all elements g1 and g2, g1g2 = g2g1,
then G is caHed an abelian group or a commutative group.
To be precise, instead of speaking about group G, we should say that
(G,', -\ 1) is a group. Here G is a set of group elements; . stands for the
multiplication; -1 stands for the inversion, and 1 is the unit element. However,
if there is no danger of confusion, we will just use the notation G for the group,
as weH as for the underlying set.
Example 9.1.1. Integers Z form an abelian group having addition as the
group operation, 0 as the neutral element and mapping n f-t -n as the
inversion. On the other hand. natural numbers
168 9. Appendix B: Mathematical Background

N = {1, 2, 3, ... }

do not form a group with respect to these operations, since N is not closed
under the inversion n t-+ -n, and there is no neutral element in N.

Example 9.1.2. Nonzero complex numbers form an abelian group with re-
spect to multiplication. The neutral element is 1, and the inversion is the
mapping z t-+ Z-1.

Example 9.1.3. The set of n x n matrices over C that have a nonzero deter-
minant constitute a group with respect to matrix multiplication. This group,
denoted by GLn(C) and called the general linear group over C, is not abelian
unless n = 1, when the group is essentially the same as in the previous
example.

Regarding group axiom 1, it makes sense to ornit the parenthesis and write
just g1g2g3 = g1 (g2g3) = (g1g2)g3' This clearly generalizes to the products
of more than three elements. A special case is a product g ... 9 (k times),
which we will denote by gk (in the additive notations it would be kg). We
also define gO = 1 and g-k = (g-1)k (Og = 0, -kg = k( -g) when written
additively).

9.1.2 Subgroups, Cosets

A subgroup H of a group G is a group contained in G in the following way:


H is contained in G as a set, and the group operations (product, inversion,
and the unit element 1 ) of H are only those of Gwhich are restricted to H.
If His a subgroup of G, we write H:::; G.
Let us now suppose that a group G (now written multiplicatively) has H
as a subgroup. Then, for each gE G, the set

gH = {gh I h E H} (9.1)
is called a coset of H (determined by g).
Simple but useful observations can easily be made; the first is that each
element 9 E G belongs to some coset, for instance in gH. This is true because
H, as a subgroup, contains the neutral element 1 and therefore 9 = gl E gH.
This observation teIls us that the cosets of a subgroup H cover the whole
group G. Notice that, especially for any h E H, we have hH = H because
H, being a group, is closed under multiplication and each element h 1 E H
appears in hH (h 1 = h(h- 1h 1)).
Other observations are given in the following lemmata.
Lemma 9.1.1. g1H = 92H if and only if g1 1g2 EH.
1 The unit element can be seen as a nullary operation.
9.1 Group Theory 169

Proof. If 91H = 92H, then by necessity 92 = 91h for some h E H. Hence,


91 192= h E H. On the other hand, if 91 192 = h E H, then 92 = 91h and so
92H = 91hH = 91H. 0

Remark 9.1.1. Notice that 91 192 EH if and only if 9;;1 91 EH. This is true
because H contains all inverses of the elements in Hand (91 1g2) -1 = 9;;1 91 .
In the additive notations, the condition 91 192 E H would be written as
-91 + g2 EH.

Lemma 9.1.2. Let H be a finite sub9roup 01 G. Then all the cosets 01 H


have m = IHI elements.

Prool. By the very definition of (9.1), it is clear that each coset can have
at most m elements. If some coset 9H has less than m elements, then for
some h i # hj we must have gh i = 9hj. Multiplication by 9-1, however, gives
h i = h j , which is a contradiction. 0

Definition 9.1.1. 11 91H = 92H, we say that 91 is con9ruent to 92 modulo


H.

Lemma 9.1.3. 11 91H # 92H, then also 91H n 92H = 0.


Proof. Assume, on the contrary, that 91H # 92H have a common element
9. Since 9 E 91H n 92H, we must have 9 = 91h1 = 92h2 for some elements
h1, h2 EH. But then 91 = 92h2hl\ and glH = 92h2hl1 H = 92h2H = 92H,
a contradiction. 0

If G is finite and H ~ G, we call the number of the cosets of H the index


of H in G and denote this number by [G : Hj. Since the cosets cover G, we
know by Lemma 9.1.3 that the group G is a disjoint union of [G : Hj cosets
of H, which all have the same cardinality IHI by Lemma 9.1.2. Thus, we have
obtained the following theorem
Theorem 9.1.1 (Lagrange). IGI = [G : Hj·IHI.
Even though Lagrange's theorem was easy to derive, it has very deep implica-
tion: the group structure, which does not look very demanding at first glance,
is complicated enough to heavily restrict the cardinalities of the subgroups;
the subgroup cardinality IHI must always divide IGI. It follows that a group
G with a prime number cardinality can only have the trivial subgroups, the
one consisting of only the unit element and the group G itself.

9.1.3 Factor Groups

A subgroup H ~ G is called normal if, for any 9 E G and h E H, always


ghg- 1 E H. Notice that all the subgroups of an abelian group are normal.
For anormal subgroup H ~ G we define the product 01 cosets 91H and 92H
by
170 9. Appendix B: Mathematical Background

(9.2)
But, as far as we know, two distinct elements can define the same coset:
91H = 92H may hold even if 91 =I 92. Can the coset product ever not be
well-defined, i.e., that the product would depend on the representatives 91
and 92 which are chosen? The answer is no:
Lemma 9.1.4. If H is anormal subgroup, g1H = g~H and g2H = 9~H,
then also (g1g2)H = (g~g~)H.
Proof. By assumption and Lemma 9.1.1, g11g~ = h 1 EH and g:;1g~ = h 2 E
H. But since H is normal, we have
( 91g2 ) -1 ( 9192
I ')
= g2-1 g1-1 91g2
I I
= g2-1h 192I
= 92-1h 19292-1 g2 = g2-1h 192 h 2 E
I H.
The conclusion that g:;1h192h2 EH is due to the fact that 9:;1h 192 is in H
because H is normal. Therefore, by Lemma 9.1.1, (g1g2)H = (9~9~)H. D

The coset product offers us a method of defining the factor group G j H:


Definition 9.1.2. Let G be a group and H ::::: G anormal subgroup. The
factor group G j H has the cosets of H as the group elements and the coset
product as the group operation. The neutral element of G j H is 1H = Hand
the inverse of a coset gH is 9- 1 H. The factor group is also called the quotient
group.
Using the above terminology, IGjHI = [G: H].
Example 9.1.4. Consider Z, the additive group of integers and a fixed integer
n. Then, all the integers divisible by n form a subgroup
nZ = { ... - 3n, -2n, -n, 0, n, 2n, 3n, ... },
as is easily verified. Any coset of the subgroup nZ looks like
k + nZ = { ... k - 3n, k - 2n, k - n, k, k + n, k + 2n, k + 3n ... }
and two integers kl, k 2 are congruent modulo nZ if and only if k 1 + nZ =
k 2 + nZ which, by Lemma 9.1.1, holds if and only if k 2 - k 1 E nZ, i.e, k 2 - k 1
is divisible by n. The factor group Zj(nZ) is usually denoted by Zn.
Definition 9.1.3. If integers k 1 and k 2 are congruent modulo nZ, we also
say that k 1 and k 2 are congruent modulo n and denote k 1 == k 2 (mod n).
Let us pay attention to the subgroups of a finite group G of special type.
Pick any element 9 E G and consider the set
{ 9 0 ,g 1 ,9 2 , ....
} (9.3)
The set (9.3) is contained in G and must, therefore, be finite. It follows that,
for some i > j, equality 9i = 9i holds, and multiplication by (9i )-1 gives
9 i - i = 1. This supplies the motivation to the following definition.
9.1 Group Theory 171

Definition 9.1.4. Let G be a finite group and let 9 E G be any element.


The smallest positive integer k such that gk = 1 is called the order of 9 and
denoted by k = ord(g).
It follows that, if k = ord(g) , then g/+km = g/ gkm = g/1 m = g/ for
any integer m. Moreover, it is easy to see that the set (9.3) is an abelian
subgroup of G. In fact, the set (9.3) is clearly closed under multiplication
and the inverse of an element g/ can be expressed as g-l+km, where m is any
integer. We say that set (9.3) is a cyclic subgroup of G genemted by 9 and
denote that by

{ 0 1 2
(g=g,g,g,
) ... ,g k-l} . (9.4)

According to Lagrange's theorem, k = ord(g) = l(g)1 always divides IGI.


Thus, we have obtained
Corollary 9.1.1. For each gE G, we have glGI = 1.

Proof. Since IGI is divisible by k = ord(g), we can write IGI = k·l for some
integer I, and then glGI = gk./ = 1/ = 1. 0

9.1.4 Group Z;'

Consider again the group Zn. Although the group operation is the coset
addition, it is also easy to see that the coset product

is a well-defined concept. In fact, suppose that k 1 + nZ = k~ + nZ and


k 2 + nZ = k~ + nZ. Then k1 - k~ and k 2 - k~ are divisible by n, and also
k 1 k 2 -k~ k~ = (k 1 -kDk2+k~ (k 2 -k~) is divisible by n. Therefore, k 1 k 2+nZ =
k~k~ + nZ.
Coset 1 + nZ is clearly a neutral element with respect to multiplication,
but the cosets k + nZ do not form a multiplicative group, the reason being
that the cosets k + nZ such that gcd(k, n) > 1 do not have any inverse. To
see this, assume that k + nZ has an inverse k' + nZ, i.e., kk' + nZ = 1 + nZ.
This means then that kk' - 1 is divisible by n and, hence. also divisible by
gcd(n, k). But then 1 would also be divisible by gcd(n, k), which is absurd
because gcd( n, k) > 1.
To remove the problem of non-invertible elements, we define Z~ to be
the set of all the cosets k + nZ that have a multiplicative inverse; thus, Z~
becomes a multiplicative group. 2 Previously, we saw that all cosets k + nZ
with gcd(k, n) > 1 do not belong to Z~ and next we will demonstrate that
2 To be more algebraic, the group Z~ is the unit group of ring Zn. The fact that nZ
is an ideal of ring Z automatically implies that the coset product is a well-defined
concept.
172 9. Appendix B: Mathematical Background

IZ~consists exactly of cosets k + nlZ such that gcd(k, n) = 1 (Notice that this
property is independent of the representative k chosen). For that purpose we
have to find an inverse for each such coset, which will be an easy task after
the following lemma.

Lemma 9.1.5 (Bezout's identity). For any natural numbers x, y there are integers a and b such that ax + by = gcd(x, y).

Proof. By induction on M = max{x, y}. If M = 1, then necessarily x = y = 1 and 2·1 - 1·1 = 1 = gcd(1, 1). For the general case we can assume, without loss of generality, that M = x > y. Because gcd(x - y, y) is a divisor of x - y and y, it also divides x, so gcd(x - y, y) ≤ gcd(x, y). Similarly, gcd(x, y) ≤ gcd(x - y, y) and, hence, gcd(x - y, y) = gcd(x, y). We apply the induction hypothesis to the pair (x - y, y) to get numbers a' and b' such that

a'(x - y) + b'y = gcd(x - y, y) = gcd(x, y),

which gives the required numbers a = a', b = b' - a'. □

If gcd(k, n) = 1, we can use the previous lemma to find the inverse of the coset k + nZ: let a and b be integers such that ak + bn = 1. We claim that the coset a + nZ is the required multiplicative inverse. But this is easily verified: since ak - 1 is divisible by n, we have ak + nZ = 1 + nZ and therefore (a + nZ)(k + nZ) = 1 + nZ.
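The Bezout coefficients, and hence the inverse, are effectively computable. The following Python sketch (an illustration added here; it uses the division-based recursion of Euclid's algorithm from Section 9.4.1 rather than the subtraction step of the proof above) computes a, b and the inverse of k + nZ:

    def bezout(x, y):
        # Return (g, a, b) with a*x + b*y = g = gcd(x, y) (Lemma 9.1.5).
        if y == 0:
            return (x, 1, 0)
        g, a, b = bezout(y, x % y)
        # From g = a*y + b*(x % y) and x % y = x - (x // y)*y:
        return (g, b, a - (x // y) * b)

    def inverse_mod(k, n):
        # Inverse of the coset k + nZ in Z_n^*; exists exactly when gcd(k, n) = 1.
        g, a, _ = bezout(k % n, n)
        if g != 1:
            raise ValueError("no inverse: gcd(k, n) > 1")
        return a % n

    assert (7 * inverse_mod(7, 40)) % 40 == 1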
How many elements are there in the group Z_n^*? Since all the cosets of nZ can be represented as k + nZ, where k ranges over the set {0, ..., n - 1}, we have to find out how many of those k values satisfy the extra condition gcd(k, n) = 1. The number of such values of k is denoted by φ(n) = |Z_n^*| and is called Euler's φ-function.
Let us first consider the case where n = p^m is a prime power. Then, only the numbers 0·p, 1·p, 2·p, ..., (p^{m-1} - 1)·p in the set {0, 1, ..., p^m - 1} do not satisfy the condition gcd(k, p^m) = 1. Therefore, φ(p^m) = p^m - p^{m-1}. Especially φ(p) = p - 1 for prime numbers.
Assume now that n = n_1 ⋯ n_r, where the numbers n_i are pairwise coprime, i.e., gcd(n_i, n_j) = 1 whenever i ≠ j. We will demonstrate that then φ(n) = φ(n_1) ⋯ φ(n_r) and, for that purpose, we present a well-known result:

Theorem 9.1.2 (Chinese Remainder Theorem). Let n = n_1 ⋯ n_r with gcd(n_i, n_j) = 1 when i ≠ j. Then, for given cosets k_i + n_iZ, i ∈ {1, ..., r}, there exists a unique coset k + nZ of nZ such that k + n_iZ = k_i + n_iZ for each i ∈ {1, ..., r}.

Proof. Let m_i = n/n_i = n_1 ⋯ n_{i-1}n_{i+1} ⋯ n_r. Then, clearly gcd(m_i, n_i) = 1 and, according to Lemma 9.1.5, a_im_i + b_in_i = 1 for some integers a_i and b_i. Let

k = a_1m_1k_1 + ⋯ + a_rm_rk_r.

Then

k - k_i = a_1m_1k_1 + ⋯ + (a_im_i - 1)k_i + ⋯ + a_rm_rk_r

is divisible by n_i, since each m_j with j ≠ i is divisible as well, and a_im_i - 1 is also divisible by n_i because a_im_i + b_in_i = 1. It follows that k + n_iZ = k_i + n_iZ, which proves the existence of such a coset k + nZ. If also k' + n_iZ = k_i + n_iZ for each i, then k' - k is divisible by each n_i and, since the numbers n_i are coprime, k' - k is also divisible by n = n_1 ⋯ n_r. Hence, k' + nZ = k + nZ. □
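The proof is constructive, and the construction is easy to carry out in code. Here is a minimal Python sketch (added for illustration; it assumes pairwise coprime moduli and uses Python's built-in modular inverse pow(m, -1, n), available from Python 3.8 on) computing the representative k = a_1m_1k_1 + ⋯ + a_rm_rk_r of the proof:

    from math import prod

    def crt(residues, moduli):
        # k = sum of a_i m_i k_i with m_i = n/n_i and a_i m_i = 1 (mod n_i).
        n = prod(moduli)
        k = 0
        for k_i, n_i in zip(residues, moduli):
            m_i = n // n_i
            a_i = pow(m_i, -1, n_i)   # exists since gcd(m_i, n_i) = 1
            k += a_i * m_i * k_i
        return k % n

    # k = 2 (mod 3), k = 3 (mod 5), k = 2 (mod 7) has the unique solution 23 (mod 105).
    assert crt([2, 3, 2], [3, 5, 7]) == 23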
Clearly, any coset k + nZ ∈ Z_n^* defines r cosets k + n_iZ ∈ Z_{n_i}^*. But in accordance with the Chinese Remainder Theorem, all the cosets k_i + n_iZ ∈ Z_{n_i}^* are obtained in this way. To show that, we must demonstrate that, if we have gcd(k_i, n_i) = 1 for each i, then the k which is given by the Chinese Remainder Theorem also satisfies gcd(k, n) = 1. But this is straightforward: if gcd(k, n) > 1, then also d = gcd(k', n_i) > 1 for some i and some k' such that k = k'k''. Then, however, k'k'' + n_iZ = k + n_iZ ∉ Z_{n_i}^*. It follows that

|Z_{n_1}^* × ⋯ × Z_{n_r}^*| = |Z_n^*|

and therefore,

φ(n_1) ⋯ φ(n_r) = |Z_{n_1}^*| ⋯ |Z_{n_r}^*| = |Z_{n_1}^* × ⋯ × Z_{n_r}^*| = |Z_n^*| = φ(n).

Now we are ready to count the cardinality of Z_n^*: let n = p_1^{m_1} ⋯ p_r^{m_r} be the prime factorization of n, p_i ≠ p_j whenever i ≠ j. Then

φ(n) = φ(p_1^{m_1}) ⋯ φ(p_r^{m_r}) = (p_1^{m_1} - p_1^{m_1-1}) ⋯ (p_r^{m_r} - p_r^{m_r-1})
     = p_1^{m_1}(1 - 1/p_1) ⋯ p_r^{m_r}(1 - 1/p_r) = n(1 - 1/p_1) ⋯ (1 - 1/p_r).
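For illustration (again a sketch added here, not part of the original text), the formula above translates directly into a short Python routine; the cross-check against the definition |Z_n^*| = #{k : gcd(k, n) = 1} is a useful sanity test.

    from math import gcd

    def euler_phi(n):
        # phi(n) = n * product over distinct primes p | n of (1 - 1/p).
        result, m, p = n, n, 2
        while p * p <= m:
            if m % p == 0:
                result -= result // p        # multiply result by (1 - 1/p)
                while m % p == 0:
                    m //= p
            p += 1
        if m > 1:                            # a remaining prime factor > sqrt(n)
            result -= result // m
        return result

    for n in range(1, 200):
        assert euler_phi(n) == sum(1 for k in range(n) if gcd(k, n) == 1)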
By expressing Corollary 9.1.1 on Z_n^* using the notation of Definition 9.1.3, we have the following corollary.

Corollary 9.1.2. If gcd(a, n) = 1, then a^{φ(n)} ≡ 1 (mod n).

This result is known as Euler's theorem. If n = p is a prime, then φ(p) = p - 1 and Euler's theorem can be formulated as

a^{p-1} ≡ 1 (mod p).   (9.5)

Congruence (9.5) is known as Fermat's little theorem.

9.1.5 Group Morphisms

Let G and H be two groups written multiplicatively.



Definition 9.1.5. A mapping f : G → H such that f(g_1g_2) = f(g_1)f(g_2) for all g_1, g_2 ∈ G is called a group morphism from G to H.

It follows that f(1) = f(1·1) = f(1)f(1) and, hence, f(1) = 1. Moreover, 1 = f(1) = f(gg^{-1}) = f(g)f(g^{-1}), which implies that f(g^{-1}) = f(g)^{-1}. By induction it also follows that f(g^k) = f(g)^k for any integer k.

Definition 9.1.6. Let f : G → H be a group morphism.
1. The set Ker(f) = f^{-1}(1) = {g ∈ G | f(g) = 1} is called the kernel of f.
2. The set Im(f) = f(G) = {f(g) | g ∈ G} is called the image or the range of f.

If f(k_1) = f(k_2) = 1, then also f(k_1k_2) = 1 and f(k_1^{-1}) = 1, i.e., Ker(f) is closed under multiplication and inversion, which means that Ker(f) is a subgroup of G. In fact, if k ∈ Ker(f), then also

f(gkg^{-1}) = f(g)f(k)f(g)^{-1} = f(g)f(g)^{-1} = 1,

so Ker(f) is even a normal subgroup. Similarly, it can be seen that Im(f) is a subgroup of H, not necessarily normal.
A morphism f : G → H is injective if g_1 ≠ g_2 implies f(g_1) ≠ f(g_2), surjective if f(G) = H, and bijective if it is both injective and surjective. Injective, surjective, and bijective morphisms are called monomorphisms, epimorphisms, and isomorphisms, respectively.
Two groups G and G' are called isomorphic, denoted by G ≅ G', if there is an isomorphism f : G → G'. Notice that two isomorphic groups differ only in notation (any element g is replaced with f(g)).
An interesting property is given in the following lemma:

Lemma 9.1.6. Let f : G → H be a group morphism. Then the factor group G/Ker(f) is isomorphic to Im(f).

Proof. First notice that Im(f) is always a subgroup of H and that Ker(f) is a normal subgroup of G, so the factor group G/Ker(f) can be defined. We will denote K = Ker(f) for short and define a function F : G/K → Im(f) by F(gK) = f(g). The first thing to be verified is that F is well-defined, i.e., the value of F does not depend on the choice of the coset representative g. But this is straightforward: if g_1K = g_2K, then by Lemma 9.1.1, g_1^{-1}g_2 ∈ K and, by the definition of K, f(g_1^{-1}g_2) = 1, which implies that f(g_1) = f(g_2) and F(g_1K) = f(g_1) = f(g_2) = F(g_2K). It is clear that F is a group morphism. The injectivity can be seen as follows: if F(g_1K) = F(g_2K), then by definition, f(g_1) = f(g_2), hence f(g_1^{-1}g_2) = 1, which means that g_1^{-1}g_2 ∈ K. But this is to say, by Lemma 9.1.1, that g_1K = g_2K. It is clear that F is surjective, hence F is an isomorphism. □

9.1.6 Direct Product

Let G and G' be two groups. We can give the Cartesian product G × G' a group structure by defining

(g_1, g_1')(g_2, g_2') = (g_1g_2, g_1'g_2').

It is easy to see that G × G' becomes a group with (1, 1') as the neutral element³ and (g, g') ↦ (g^{-1}, g'^{-1}) as the inversion. The group G × G' is called the (outer) direct product of G and G'. The direct product is essentially commutative, since G × G' and G' × G are isomorphic; an isomorphism is given by (g, g') ↦ (g', g). Also, (G_1 × G_2) × G_3 ≅ G_1 × (G_2 × G_3) (an isomorphism is given by ((g_1, g_2), g_3) ↦ (g_1, (g_2, g_3))), so we may as well write this product as G_1 × G_2 × G_3. This generalizes naturally to the direct products of any number of groups.
If a group G has subgroups H_1 and H_2 such that G is isomorphic to H_1 × H_2, we say that G is the (inner) direct product of subgroups H_1 and H_2. We then also write G = H_1 × H_2. Notice also that in this case, the subgroup H_1 (and also H_2) is normal. To see this, we identify, with some abuse of notation, G with H_1 × H_2 and H_1 with {(h, 1) | h ∈ H_1}. Then

(g_1, g_2)(h, 1)(g_1, g_2)^{-1} = (g_1hg_1^{-1}, 1) ∈ H_1,

as required.
The following lemma will make the notion of the factor group more natural:

Lemma 9.1.7. If G = H_1 × H_2 (inner), then G/H_1 ≅ H_2.

Proof. Since G = H_1 × H_2, there is an isomorphism f : G → H_1 × H_2 which gives each g ∈ G a representation f(g) = (h_1, h_2), where h_1 ∈ H_1 and h_2 ∈ H_2. The mapping p : H_1 × H_2 → H_2 defined by p(h_1, h_2) = h_2 is evidently a surjective morphism, called the projection onto H_2. Because f is an isomorphism, the composed mapping pf : G → H_2 is also a surjective morphism. Now p(f(g)) = 1 if and only if f(g) = (h_1, 1), which happens if and only if g ∈ H_1. This means that Ker(pf) = H_1, so by Lemma 9.1.6 we have G/H_1 ≅ H_2. □

Example 9.1.5. Consider again the group Z_n^*, where n = p_1^{m_1} ⋯ p_r^{m_r} is the prime decomposition of n. It can be shown that the mapping

k + nZ ↦ (k + p_1^{m_1}Z, ..., k + p_r^{m_r}Z), Z_n^* → Z_{p_1^{m_1}}^* × ⋯ × Z_{p_r^{m_r}}^*,

given by the Chinese Remainder Theorem is an isomorphism.

³ Here 1 and 1' are the neutral elements of G and G', respectively.

9.2 Fourier Transforms

9.2.1 Characters of Abelian Groups

Let G be a finite abelian group written additively. A character of G is a morphism χ : G → ℂ \ {0}, i.e., each character satisfies the condition χ(g_1 + g_2) = χ(g_1)χ(g_2) for any elements g_1 and g_2 ∈ G. It follows that χ(0) = 1. Denoting n = |G|, we also have by Corollary 9.1.1 that χ(g)^n = χ(ng) = χ(0) = 1, so any character value is an nth root of unity.⁴
An interesting property of characters is that for two characters χ_1, χ_2, we can define the product character χ_1χ_2 : G → ℂ by χ_1χ_2(g) = χ_1(g)χ_2(g). Moreover, it is easy to see that, with respect to this product, the characters also form an abelian group called the character group or the dual group of G and denoted by Ĝ. The neutral element χ_0 of the dual group is called the principal character or the trivial character and is defined by χ_0(g) = 1 for each element g ∈ G. The inverse of a character χ is the character χ^{-1} defined by χ^{-1}(g) = χ(g)^{-1}.

Example 9.2.1. Let us determine the characters of a cyclic group

G = {g, 2g, ..., (n - 1)g, ng = 0}.

It is easy to see that any cyclic group of order n is isomorphic to Z_n, the additive group of integers modulo n, so we can consider Z_n as a "prototype" when studying cyclic groups. For any fixed y ∈ Z we define the mapping χ_y : Z → ℂ by

χ_y(x) = e^{2πixy/n}.

Since e^{2πi} = 1, χ_y has period n, so we can, in fact, consider χ_y as a mapping Z_n → ℂ. Moreover, since χ_y(x) = χ_x(y), we can also assume that y ∈ Z_n instead of y ∈ Z. Now

χ_y(x + z) = e^{2πiy(x+z)/n} = e^{2πixy/n} e^{2πizy/n} = χ_y(x)χ_y(z),

which means that each χ_y is, in fact, a character of Z_n. Moreover, if y and z are representatives of distinct cosets modulo n, then also χ_y and χ_z are different. Namely, if χ_y = χ_z, then especially χ_y(1) = χ_z(1), i.e., e^{2πiy/n} = e^{2πiz/n}, which implies that y = z + k·n for some integer k. But then y and z represent the same coset, a contradiction.
It is straightforward to see that χ_aχ_b = χ_{a+b}, so the characters of Z_n also form a cyclic group of order n, generated by χ_1, for instance. This can be summarized as follows: the character group of Z_n is isomorphic to Z_n; hence, the character group of any cyclic group is isomorphic to the group itself.

⁴ An nth root of unity is a complex number x satisfying x^n = 1.
4 A nth root of unity is a complex number x satisfying x n = 1.

The fact that the dual group of a cyclic group is isomorphic to the group itself can be generalized: a well-known theorem states that any finite abelian group G can be expressed as a direct sum⁵ (or as a direct product, if the group operation is thought of as multiplication) of cyclic groups G_i:

G = G_1 ⊕ ⋯ ⊕ G_m.   (9.6)
Lemma 9.2.1. Let G be an abelian group as above. Then Ĝ ≅ G.

Proof. By Example 9.2.1 we know that for each i, Ĝ_i ≅ G_i, so it suffices to demonstrate that Ĝ ≅ Ĝ_1 × ⋯ × Ĝ_m. For that purpose, let χ_1, ..., χ_m be some characters of G_1, ..., G_m. Since G = G_1 ⊕ ⋯ ⊕ G_m, each element g ∈ G can be uniquely expressed as

g = g_1 + ⋯ + g_m,

where g_i ∈ G_i. This allows us to define a function χ : G → ℂ \ {0} by

χ(g) = χ_1(g_1) ⋯ χ_m(g_m).   (9.7)

It is now easy to see that χ is a character of G. Moreover, if χ_i' ≠ χ_i, then there is an element g_i ∈ G_i such that χ_i'(g_i) ≠ χ_i(g_i) and, hence,

χ'(g_i) = χ_1(0) ⋯ χ_i'(g_i) ⋯ χ_m(0) ≠ χ_1(0) ⋯ χ_i(g_i) ⋯ χ_m(0) = χ(g_i),

which means that the characters of G defined by Equation (9.7) are all different for different choices of χ_1, ..., χ_m.
On the other hand, all the characters of G can be expressed as in (9.7). For, if χ is a character of G, we can define χ_i by restricting χ to G_i. It is easy to see that each χ_i is a character of G_i and that χ = χ_1 ⋯ χ_m. □

As an application, we will find some important characters.

Example 9.2.2. Consider F_2^m, the m-dimensional vector space over the binary field. Each element in the additive group of F_2^m has order 2, so the group F_2^m has a simple decomposition:

F_2^m = F_2 ⊕ ⋯ ⊕ F_2   (m components).

Now it suffices to determine the characters of F_2, since by Lemma 9.2.1, the characters of F_2^m are just the m-fold products of the characters of F_2. But the characters of F_2 = Z_2 were already found in Example 9.2.1: they are

χ_y(x) = e^{2πixy/2} = (-1)^{xy}

⁵ The group G is a direct sum of subgroups G_1, ..., G_m if each element of G has a unique representation g = g_1 + ⋯ + g_m, where g_i ∈ G_i.

for y ∈ {0, 1}. Therefore, any character of F_2^m can be written as

χ_y(x) = (-1)^{x·y},

where x·y = x_1y_1 + ⋯ + x_my_m is the standard inner product of the vectors x = (x_1, ..., x_m) and y = (y_1, ..., y_m).

9.2.2 Orthogonality of the Characters

Let G = {g_1, ..., g_n} be a finite abelian group. The functions f : G → ℂ clearly form a vector space V over ℂ, with addition and scalar multiplication defined pointwise: (f + h)(g) = f(g) + h(g) and (c·f)(g) = c·f(g) (see also Section 9.3). We also have an alternative way of thinking about V: each function f : G → ℂ can be considered as an n-tuple

(f(g_1), f(g_2), ..., f(g_n)),   (9.8)

so the vector space V evidently has dimension n. The natural basis of V is given by e_1 = (1, 0, ..., 0), e_2 = (0, 1, ..., 0), ..., e_n = (0, 0, ..., 1). In other words, e_i is a function G → ℂ defined as

e_i(g_j) = 1, if i = j, and 0 otherwise.

The standard inner product in the space V is defined by

(f | h) = Σ_{i=1}^n f*(g_i)h(g_i),   (9.9)

and the inner product also induces a norm in a very natural way:

||h|| = √(h | h).

The basis {e_1, ..., e_n} is clearly orthogonal with respect to the standard inner product. Another orthogonal basis is the character basis:

Lemma 9.2.2. If χ_i and χ_j are characters of G, then

(χ_i | χ_j) = 0, if i ≠ j, and n, if i = j.   (9.10)

Proof. First, it is worth noticing that

1 = |χ(g)|² = χ*(g)χ(g),

which implies that χ*(g) = χ(g)^{-1} for any g ∈ G. Then

(χ_i | χ_j) = Σ_{k=1}^n χ_i*(g_k)χ_j(g_k) = Σ_{k=1}^n χ_i^{-1}(g_k)χ_j(g_k) = Σ_{k=1}^n (χ_i^{-1}χ_j)(g_k).

If i = j, then χ_i^{-1}χ_j is the principal character and the claim for i = j follows immediately. On the other hand, if i ≠ j, then χ = χ_i^{-1}χ_j is a nontrivial character of G and it suffices to show that

S = Σ_{k=1}^n χ(g_k) = 0

for a nontrivial character χ. Because of nontriviality, there exists an element g ∈ G such that χ(g) ≠ 1. Furthermore, the mapping g_i ↦ g + g_i is a permutation of G for any fixed g, so

S = Σ_{k=1}^n χ(g_k) = Σ_{k=1}^n χ(g + g_k) = χ(g) Σ_{k=1}^n χ(g_k) = χ(g)S.

The claim S = 0 now follows from the equation (1 - χ(g))S = 0. □

Since the characters are orthogonal and there are n = |G| of them, they also form a basis. By scaling the characters to the unit norm, we obtain an orthonormal basis B = {B_1, ..., B_n}, where

B_i = (1/√n)χ_i.

Other interesting features of the characters can be easily deduced: let us define a matrix X ∈ ℂ^{n×n} by

X_ij = χ_j(g_i).

By denoting the transposed complex conjugate of X by X*, so that (X*)_ij = χ_i*(g_j), we get

(X*X)_ij = Σ_{k=1}^n (X*)_ik X_kj = Σ_{k=1}^n χ_i*(g_k)χ_j(g_k) = (χ_i | χ_j) = n, if i = j, and 0, if i ≠ j,

which actually states that X*X = nI, which implies that X^{-1} = (1/n)X*. But since any matrix commutes with its inverse, we also have XX* = nI, which can be written as

(XX*)_ij = n, if i = j, and 0, if i ≠ j,

or as

Σ_{k=1}^n χ_k(g_i)χ_k*(g_j) = n, if i = j, and 0, if i ≠ j.   (9.11)

Equations (9.10) and (9.11) are known as the orthogonality relations of characters. Notice also that, by choosing χ_j as the principal character in (9.10) and g_j as the neutral element in (9.11), we get a useful corollary:

Corollary 9.2.1.

Σ_{k=1}^n χ(g_k) = n, if χ is the principal character, and 0 otherwise.   (9.12)

Σ_{k=1}^n χ_k(g) = n, if g is the neutral element, and 0 otherwise.   (9.13)

In the groups Z_n and F_2^m, we have a nice symmetry: χ_i(g_j) = χ_j(g_i). Therefore, the above formulae can be contracted to one formula in both groups. In Z_n it becomes

Σ_{y∈Z_n} e^{2πixy/n} = n, if x = 0, and 0 otherwise,

and in F_2^m we have

Σ_{y∈F_2^m} (-1)^{x·y} = 2^m, if x = 0, and 0 otherwise.
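Both contracted formulae are easy to verify numerically; the following Python fragment (illustrative only, with a freely chosen n) checks the Z_n version:

    import cmath

    def character_sum(x, n):
        # Sum of chi_x over Z_n: n when x = 0 (mod n), and 0 otherwise.
        return sum(cmath.exp(2j * cmath.pi * x * y / n) for y in range(n))

    n = 8
    for x in range(n):
        expected = n if x % n == 0 else 0
        assert abs(character_sum(x, n) - expected) < 1e-9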

9.2.3 Discrete Fourier Transform

We have now all the tools for defining the discrete Fourier transform: any element f ∈ V (recall that V is the vector space of functions G → ℂ) has a unique representation with respect to the basis B = {(1/√n)χ_1, ..., (1/√n)χ_n}:

f = f̂_1B_1 + ⋯ + f̂_nB_n.   (9.14)

Definition 9.2.1. The function f̂ : G → ℂ defined by f̂(g_i) = f̂_i, where f̂_i is the coefficient of B_i in the representation (9.14), is called the discrete Fourier transform of f.

Because B is an orthonormal basis, we can easily extract any coefficient f̂_i by calculating the inner product of B_i and (9.14):

(B_i | f) = Σ_{j=1}^n (B_i | f̂_jB_j) = Σ_{j=1}^n f̂_j(B_i | B_j) = f̂_i.

Thus, the Fourier transform can be written as

f̂(g_i) = (1/√n) Σ_{k=1}^n χ_i*(g_k)f(g_k).   (9.15)

By its very definition, it is clear that (f + h)^ = f̂ + ĥ and (cf)^ = cf̂ for any functions f, h and any c ∈ ℂ.

Example 9.2.3. An interesting corollary of the orthogonality relation (9.11) can be easily derived:

||f̂||² = (f̂ | f̂) = Σ_{i=1}^n f̂*(g_i)f̂(g_i)
      = (1/n) Σ_{i=1}^n Σ_{k=1}^n Σ_{l=1}^n χ_i(g_k)f*(g_k)χ_i*(g_l)f(g_l)
      = Σ_{k=1}^n f*(g_k)f(g_k) = ||f||².

The equation ||f̂|| = ||f|| thus obtained is known as Parseval's identity.

Example 9.2.4. Let G = Z_n. As we saw in Example 9.2.1, the characters of Z_n are all of the form

χ_y(x) = e^{2πixy/n}

and, therefore, the Fourier transform of a function f : Z_n → ℂ takes the form

f̂(x) = (1/√n) Σ_{y∈Z_n} e^{-2πixy/n} f(y).

Example 9.2.5. The characters of the group G = F_2^m are

χ_y(x) = (-1)^{x·y},

so the Fourier transform of f : F_2^m → ℂ looks like

f̂(x) = (1/√(2^m)) Σ_{y∈F_2^m} (-1)^{x·y} f(y).

This Fourier transform in F_2^m is also called a Hadamard transform, Walsh transform, or Hadamard-Walsh transform.
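As an illustration of the two examples (a sketch added here, not from the original text; both routines are the naive O(n²) sums, not fast transforms), the transforms can be written in Python as follows:

    import cmath

    def dft(f):
        # Fourier transform over Z_n with the 1/sqrt(n) normalization of (9.15).
        n = len(f)
        return [sum(cmath.exp(-2j * cmath.pi * x * y / n) * f[y] for y in range(n))
                / n ** 0.5 for x in range(n)]

    def hadamard_walsh(f):
        # Fourier transform over F_2^m; len(f) must be 2**m, and x.y is the
        # parity of the bitwise AND of the index vectors.
        n = len(f)
        return [sum((-1) ** bin(x & y).count("1") * f[y] for y in range(n))
                / n ** 0.5 for x in range(n)]

    # Parseval's identity (Example 9.2.3) holds for both transforms:
    f = [1.0, 2.0, 0.0, -1.0]
    assert abs(sum(abs(c) ** 2 for c in dft(f)) - 6.0) < 1e-9
    assert abs(sum(abs(c) ** 2 for c in hadamard_walsh(f)) - 6.0) < 1e-9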

9.2.4 The Inverse Fourier Transform

Notice that Equation (9.15) can be rewritten in matrix form as

(f̂(g_1), ..., f̂(g_n))^T = (1/√n) X* (f(g_1), ..., f(g_n))^T,   (9.16)

and that the matrix appearing on the right-hand side of (9.16) is X*, the transposed complex conjugate of the matrix X defined as X_ij = χ_j(g_i).⁶ By multiplying (9.16) by (1/√n)X we get

(f(g_1), ..., f(g_n))^T = (1/√n) X (f̂(g_1), ..., f̂(g_n))^T,   (9.17)

which gives the idea for the following definition.

Definition 9.2.2. Let f : G → ℂ be a function. The inverse Fourier transform of f is defined to be

f̌(g_i) = (1/√n) Σ_{k=1}^n χ_k(g_i)f(g_k).   (9.18)

Keeping equations (9.16) and (9.17) in mind, it is clear that (f̂)ˇ = (f̌)^ = f.


Example 9.2.6. In Z_n we have χ_x(y) = χ_y(x), so the inverse Fourier transform is quite symmetric to the Fourier transform:

f̌(x) = (1/√n) Σ_{y∈Z_n} e^{2πixy/n} f(y).

In the group F_2^m the symmetry is perfect:

f̌(x) = (1/√(2^m)) Σ_{y∈F_2^m} (-1)^{x·y} f(y) = f̂(x).

⁶ Equation (9.16) makes Parseval's identity even clearer: the matrices (1/√n)X and (1/√n)X* are unitary and, therefore, they preserve the norm in V.

9.2.5 Fourier Transform and Periodicity

We will now give an illustration of how powerfully the Fourier transform can extract information about periodicity.
Let f : G → ℂ be a function with period p ∈ G, i.e., f(g + p) = f(g) for any g ∈ G. Then

f̂(g_i) = (1/√n) Σ_{k=1}^n χ_i*(g_k)f(g_k)
       = (1/√n) Σ_{k=1}^n χ_i*(g_k + p - p)f(g_k + p)
       = χ_i*(-p) (1/√n) Σ_{k=1}^n χ_i*(g_k + p)f(g_k + p)
       = χ_i*(-p)f̂(g_i),

which implies that f̂(g_i) = 0 whenever χ_i*(-p) = χ_i(-p)^{-1} = χ_i(p) ≠ 1.

Example 9.2.7. In the group Z_n the above equation takes the form

f̂(x) = e^{-2πixp/n} f̂(x),

which means that f̂(x) = 0 whenever e^{-2πixp/n} ≠ 1, which happens exactly when xp is not divisible by n. If also gcd(p, n) = 1, then xp is divisible by n if and only if x is.
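A quick numerical experiment (illustrative, with freely chosen parameters) makes the peaking visible: a function on Z_20 with period 5 has a Fourier transform supported exactly on the multiples of 20/5 = 4. This peaked structure is precisely what quantum period-finding exploits.

    import cmath

    n, p = 20, 5
    f = [complex((y % p) + 1) for y in range(n)]       # period p: f(y + p) = f(y)

    fhat = [sum(cmath.exp(-2j * cmath.pi * x * y / n) * f[y] for y in range(n))
            / n ** 0.5 for x in range(n)]

    support = [x for x in range(n) if abs(fhat[x]) > 1e-9]
    assert support == [0, 4, 8, 12, 16]                # exactly the x with xp = 0 (mod n)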

9.3 Linear Algebra

9.3.1 Preliminaries

A vector space over the complex numbers is an abelian group V (usually written additively) that is equipped with a scalar multiplication, which is a mapping ℂ × V → V. The scalar multiplication is usually denoted by (c, x) ↦ cx. It is required that the following vector space axioms be fulfilled for all c, c_1, c_2 ∈ ℂ and x, x_1, x_2 ∈ V:

1. c_1(c_2x) = (c_1c_2)x.
2. (c_1 + c_2)x = c_1x + c_2x.
3. c(x_1 + x_2) = cx_1 + cx_2.
4. 1x = x.

Again, to be precise, we should talk about the vector space (V, +, 0, -, ℂ, ·) instead of the space V, but to simplify the notation we identify the space and the underlying set V. The elements of V are called vectors.

Example 9.3.1. Let V = ℂ^n, the set of n-tuples over the complex numbers. The set ℂ^n equipped with the sum

(x_1, ..., x_n) + (y_1, ..., y_n) = (x_1 + y_1, ..., x_n + y_n),

zero element (0, ..., 0), and inversion

(x_1, ..., x_n) ↦ (-x_1, ..., -x_n)

is evidently an abelian group. This set equipped with the scalar multiplication

c(x_1, ..., x_n) = (cx_1, ..., cx_n)

becomes a vector space, as is easily verified.

Example 9.3.2. The set of functions x : ℕ → ℂ for which the series

Σ_{n=1}^∞ |x(n)|²

is convergent also constitutes a vector space over ℂ. The sum and scalar product are again defined pointwise: for functions x and y, (x + y)(n) = x(n) + y(n) and (cx)(n) = cx(n). To simplify the notation, we also write x_n = x(n). To verify that the sum of two elements stays in the space, we also have to check that the series Σ_{n=1}^∞ |x_n + y_n|² is convergent whenever Σ_{n=1}^∞ |x_n|² and Σ_{n=1}^∞ |y_n|² are convergent. For this purpose, we can use the estimate

|x + y|² = |x|² + 2Re(x*y) + |y|² ≤ |x|² + 2|x||y| + |y|² ≤ 2|x|² + 2|y|²

for each summand. The vector space of this example is denoted by L_2(ℂ). Notice that the domain ℕ in the definition could be replaced with any denumerable set. If, instead of ℕ, there is a finite domain {1, ..., n}, the convergence condition would be unnecessary and the space would become essentially the same as ℂ^n.
Definition 9.3.1. A linear combination of vectors x_1, ..., x_n is a finite sum c_1x_1 + ⋯ + c_nx_n. A set S ⊆ V is called linearly independent if

c_1x_1 + ⋯ + c_nx_n = 0

implies c_1 = ⋯ = c_n = 0 whenever x_1, ..., x_n ∈ S. A set that is not linearly independent is linearly dependent.

Definition 9.3.2. For any set S ⊆ V, L(S) is the set of all linear combinations of vectors in S. By its very definition, the set L(S) is a vector space contained in V. We say that L(S) is generated by S and also call L(S) the span of S.

The proof of the following lemma is left as an exercise.

Lemma 9.3.1. A set S ⊆ V is linearly dependent if and only if some vector x ∈ S can be represented as a linear combination of vectors in S \ {x}.

In light of the previous lemma, a linearly dependent set contains some "unnecessary" elements in the sense that they are not needed when making linear combinations. For, if x ∈ S can be expressed as a linear combination of S \ {x}, then clearly L(S \ {x}) = L(S) (cf. Exercise 1).

Definition 9.3.3. A set B ⊆ V is a basis of V if V = L(B) and B is linearly independent.

It can be shown that all the bases have the same cardinality (see Exercise 2). This gives the following definition.

Definition 9.3.4. The dimension dim(V) of a vector space V is the cardinality of a basis of V.

Example 9.3.3. Vectors e_1 = (1, 0, ..., 0), e_2 = (0, 1, ..., 0), ..., e_n = (0, 0, ..., 1) form a basis of ℂ^n, the so-called natural basis. Thus ℂ^n has dimension n.

Example 9.3.4. As in the previous example, we could define e_i ∈ L_2(ℂ) by

e_i(n) = 1, if i = n, and 0, if i ≠ n

for each i ∈ ℕ. It is clear that the set E = {e_1, e_2, e_3, ...} is linearly independent, but it is not a basis of L_2(ℂ) in the sense of Definition 9.3.3, because there are vectors in L_2(ℂ) that cannot be expressed as a linear combination of vectors in E. This is because, according to Definition 9.3.1, a linear combination is a finite sum of vectors (Exercise 4).

Definition 9.3.5. A subset W ⊆ V is a subspace of V if W is a subgroup of V that is closed under scalar multiplication.

Example 9.3.5. Consider the n-dimensional vector space ℂ^n. The set

W = {(x_1, ..., x_{n-1}, 0) | x_i ∈ ℂ}

is clearly a subspace of ℂ^n having e_1, ..., e_{n-1} as a basis. Hence, dim(W) = n - 1.

Example 9.3.6. Let

W = {x ∈ L_2(ℂ) | x_n ≠ 0 only for finitely many n ∈ ℕ}.

It is straightforward to see that W is a subspace of L_2(ℂ) and that the set E of Example 9.3.4 is a basis of W.

9.3.2 Inner Product

A natural way to introduce geometry into a complex vector space V (a vector space over ℂ) is to define an inner product.

Definition 9.3.6. An inner product on V is a mapping V × V → ℂ, (x, y) ↦ (x | y), that satisfies the following conditions for any c ∈ ℂ and any x, y, and z ∈ V:

1. (x | y) = (y | x)*.
2. (x | x) ≥ 0, and (x | x) = 0 if and only if x = 0.
3. (x | c_1y + c_2z) = c_1(x | y) + c_2(x | z).

In axiom 1 and hereafter, * means complex conjugation. Notice that by axiom 1 it follows that (x | x) is always real, so axiom 2 makes sense. From axioms 1 and 3 it follows that

(c_1y + c_2z | x) = c_1*(y | x) + c_2*(z | x).

A vector space equipped with an inner product is also called an inner product space.
Example 9.3.7. For vectors x = (x_1, ..., x_n) and y = (y_1, ..., y_n) in ℂ^n the formula

(x | y) = x_1*y_1 + ⋯ + x_n*y_n

defines an inner product, as is easily verified. An inner product of x and y ∈ L_2(ℂ) can be defined by the formula

(x | y) = Σ_{n=1}^∞ x_n*y_n,   (9.19)

since the series (9.19) is convergent. This is because

|x_n*y_n| ≤ (1/2)(|x_n|² + |y_n|²),

and the series Σ_{n=1}^∞ |x_n|² and Σ_{n=1}^∞ |y_n|² converge by the very definition of L_2(ℂ).

Definition 9.3.7. Vectors x and y are orthogonal if (x | y) = 0. Two subsets S_1 ⊆ V and S_2 ⊆ V are mutually orthogonal if all x_1 ∈ S_1 and x_2 ∈ S_2 are orthogonal. A set S ⊆ V is orthogonal if x_1 ∈ S and x_2 ∈ S are orthogonal whenever x_1 ≠ x_2.

An orthogonal set that does not contain the zero vector 0 is always linearly independent: for, if S is such a set and

c_1x_1 + ⋯ + c_nx_n = 0

is a linear combination of vectors in S, then c_i(x_i | x_i) = (x_i | c_1x_1 + ⋯ + c_nx_n) = (x_i | 0) = 0, which implies c_i = 0 since x_i ≠ 0.

Example 9.3.8. Let W ⊆ V be a subspace. Then also

W^⊥ = {x ∈ V | (x | y) = 0 for all y ∈ W}

is a subspace of V, the so-called orthogonal complement of W.

Lemma 9.3.2 (Cauchy-Schwarz inequality). For any vectors x, y ∈ V,

|(x | y)|² ≤ (x | x)(y | y).

Proof. If y = 0, then (x | 0) = (x | 0 + 0) = (x | 0) + (x | 0), so (x | 0) = 0 and the claim holds with equality. If y ≠ 0, we can define λ = -(y | x)/(y | y). Then, by the inner product axiom 2,

0 ≤ (x + λy | x + λy) = (x | x) + λ(x | y) + λ*(y | x) + |λ|²(y | y),

which can also be written as

0 ≤ (x | x) - (y | x)(x | y)/(y | y),

and the claim follows. □
Definition 9.3.8. A norm on a vector space is a mapping V → ℝ, x ↦ ||x||, such that, for all vectors x and y and c ∈ ℂ, we have

1. ||x|| ≥ 0, and ||x|| = 0 if and only if x = 0.
2. ||cx|| = |c| ||x||.
3. ||x + y|| ≤ ||x|| + ||y||.

A vector space equipped with a norm is also called a normed space. Any inner product induces a norm by

||x|| = √(x | x),

so an inner product space is always a normed space as well. To verify that √(x | x) is a norm, it is easy to check that conditions 1 and 2 are fulfilled; and for condition 3, we use the Cauchy-Schwarz inequality, which can also be stated as |(x | y)| ≤ ||x|| ||y||. Thus,

||x + y||² = (x + y | x + y) = ||x||² + 2Re(x | y) + ||y||²
          ≤ ||x||² + 2|(x | y)| + ||y||² ≤ ||x||² + 2||x|| ||y|| + ||y||²
          = (||x|| + ||y||)².

Any norm can be used to define a distance on V:

d(x, y) = ||x - y||.

One can easily verify that d : V × V → ℝ satisfies the axioms of a distance function: for each x, y, and z ∈ V,

1. d(x, y) = d(y, x) ≥ 0.
2. d(x, y) = 0 if and only if x = y.
3. d(x, z) ≤ d(x, y) + d(y, z).

A characteristic property of a normed space that is also an inner product space is that

||x + y||² + ||x - y||² = 2||x||² + 2||y||²   (9.20)

holds for any x, y ∈ V (see Exercise 5). Equation (9.20) is called the parallelogram rule.
Definition 9.3.9. Let V be a normed space. If, for each sequence of vectors x_1, x_2, ... such that

lim_{m,n→∞} ||x_m - x_n|| = 0,

there exists a vector x such that

lim_{n→∞} ||x_n - x|| = 0,

we say that V is complete.

A class of vector spaces having great importance in quantum physics and in quantum computation is introduced in the following definition.

Definition 9.3.10. An inner product space V is a Hilbert space if V is complete with respect to the norm induced by the inner product.

Example 9.3.9. Both ℂ^n and L_2(ℂ) are Hilbert spaces, but the subspace W of L_2(ℂ) in Example 9.3.6 is not (cf. Exercise 6).

Definition 9.3.11. A vector space V is a direct sum of subspaces W_1 and W_2, denoted by V = W_1 ⊕ W_2, if V is a direct sum of the subgroups W_1 and W_2.

The next lemma shows that Hilbert spaces are structurally well-behaved:

Lemma 9.3.3. If a subspace W of a Hilbert space V is itself a Hilbert space, then V = W ⊕ W^⊥ (see Example 9.3.8).
Definition 9.3.12. Let V and W be complex vector spaces. A mapping f : V → W is a vector space morphism if

f(c_1x + c_2y) = c_1f(x) + c_2f(y)

for each x, y ∈ V and c_1, c_2 ∈ ℂ. Vector space morphisms are also called linear mappings or operators.

9.4 Number Theory

9.4.1 Euclid's Algorithm

It should be emphasized that the proof of Lemma 9.1.5 already supplies a recursive method for computing the greatest common divisor of two given natural numbers:

gcd(x, y) = gcd(x - y, y), if x > y; gcd(x, y - x), if x < y; x, if x = y.

It is clear that this method always gives gcd(x, y) correctly. However, the method is not quite efficient: for instance, in computing gcd(x, 1) it proceeds as

gcd(x, 1) = gcd(x - 1, 1) = gcd(x - 2, 1) = ⋯ = gcd(1, 1) = 1,

so it takes x recursive calls to perform this computation. Keeping in mind that the decimal representation of x has length ℓ(x) = ⌊log_10 x⌋, the computational time is proportional to 10^{ℓ(x)}, i.e., the current algorithm is exponential in ℓ(x).
A more efficient method is given by Euclid's algorithm, whose core is given
in the following lemma.

Lemma 9.4.1. Given natural numbers x > y, there are unique integers d and r such that x = dy + r, where 0 ≤ r < y. The numbers d and r can be found by using ℓ(x)² elementary arithmetic operations (multiplication, addition, and subtraction of two digits).
Euclid's algorithm is based on consecutive application of the previous lemma: given x > y, we can find representations

x = d_1y + r_1,   (9.21)
y = d_2r_1 + r_2,   (9.22)
r_1 = d_3r_2 + r_3,
...
r_{n-2} = d_nr_{n-1} + r_n,   (9.23)
r_{n-1} = d_{n+1}r_n.   (9.24)

The above procedure certainly terminates with some r_{n+1} = 0, since r_1, r_2, ... is a strictly descending sequence of non-negative numbers. We claim that gcd(x, y) equals r_n, the last nonzero remainder in this procedure. First, by (9.24), r_n divides r_{n-1}. By (9.23), r_n also divides r_{n-2}. Continuing this way, we see that r_n divides all the numbers r_i and also x and y. Therefore, r_n is a common divisor of x and y. Assume, then, that there is another number s that divides both x and y. By (9.21), s divides r_1. By (9.22), s divides
s that divides both x and y. By (9.21), s divides rl. By (9.22), s divides

r_2 and, again continuing this reasoning, we see that s eventually divides r_n. Therefore r_n is the greatest common divisor of x and y.
How rapid is this method? Since each r_i < x, we know that each iteration step can be done in time proportional to ℓ(x)². Thus, it suffices to analyze how many times we need to iterate in order to reach r_{n+1} = 0. Because one single iteration step gives an equation

r_{i-2} = d_ir_{i-1} + r_i,

where 0 ≤ r_i < r_{i-1}, we have r_i = r_{i-2} - d_ir_{i-1} < r_{i-2} - d_ir_i. The estimate r_{i-2} > (1 + d_i)r_i ≥ 2r_i follows easily. Therefore,

x > y > r_1 > 2r_3 > 4r_5 > ⋯ > 2^i r_{2i+1}.

If n is odd, we have x > 2^{(n-1)/2} r_n ≥ 2^{(n-1)/2}. For an even n, we also have x > 2^{(n-2)/2} r_{n-1} > 2^{(n-2)/2} r_n, so the inequality x > 2^{(n-2)/2} holds in any case. It follows that ℓ(x) = ⌊log_10 x⌋ > log_10 x - 1 > ((n - 2)/2) log_10 2 - 1. Therefore, n < (2/log_10 2)(ℓ(x) + 1) + 2 = O(ℓ(x)). As a conclusion, we have that, for numbers x > y, gcd(x, y) can be computed using O(ℓ(x)³) elementary arithmetic operations.
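In code, the division form of Euclid's algorithm is just a few lines; the sketch below (illustrative, in Python) performs one application of Lemma 9.4.1 per loop iteration:

    def euclid_gcd(x, y):
        # Each iteration replaces (x, y) by (y, r) with x = d*y + r, 0 <= r < y.
        while y != 0:
            x, y = y, x % y
        return x

    assert euclid_gcd(263, 189) == 1    # cf. Example 9.4.1 below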

9.4.2 Continued Fractions

Example 9.4.1. Let us apply Euclid's algorithm to the pair (263, 189):

263 = 1·189 + 74,
189 = 2·74 + 41,
74 = 1·41 + 33,
41 = 1·33 + 8,
33 = 4·8 + 1,

and the algorithm terminates at the next round, giving gcd(263, 189) = 1. By divisions we obtain

263/189 = 1 + 74/189,
189/74 = 2 + 41/74,
74/41 = 1 + 33/41,
41/33 = 1 + 8/33,
33/8 = 4 + 1/8,

a set of equations such that the fraction occurring on the right-hand side is the inverse of the fraction on the left-hand side of the next equation. We can thus combine these equations to get a representation

263/189 = 1 + 1/(2 + 1/(1 + 1/(1 + 1/(4 + 1/8)))).   (9.25)

Expression (9.25) is an example of a finite continued fraction.
It is clear that this procedure can be done for any positive rational number α = x/y ≥ 1 in lowest terms.
For other values of α, there is a unique way to express α = a_0 + β, where a_0 ∈ Z and β ∈ [0, 1). If β ≠ 0, this can also be written as

α = a_0 + 1/α_1, where α_1 = 1/β > 1.

By applying the same procedure recursively to α_1, we get an expansion

α = a_0 + 1/(a_1 + 1/(a_2 + ⋯)),   (9.26)

where a_0 ∈ Z, and a_1, a_2, ... are natural numbers. If α is an irrational number, then the sequence a_0, a_1, a_2, ... never stops (otherwise α would be a rational number). Mainly for typographical reasons, we write (9.26) as

α = [a_0, a_1, a_2, ...], a_0 ∈ Z, a_i ∈ ℕ for i ≥ 1,   (9.27)

and say that (9.27) is the continued fraction expansion of α. It is clear by its very construction that each irrational α has a unique infinite continued fraction expansion (9.27).
On the other hand, all rational numbers r have a finite continued fraction expansion

r = [a_0, a_1, ..., a_n], a_0 ∈ Z, a_i ∈ ℕ for 1 ≤ i ≤ n,   (9.28)

that can be found by using Euclid's algorithm. But the expansion (9.28) is never unique, as is shown by the following lemma.

Lemma 9.4.2. If r has an expansion (9.28) of odd length, then r also has an expansion (9.28) of even length, and vice versa.

Proof. This follows straightforwardly from the identity

[a_0, a_1, ..., a_{n-1}, a_n] = [a_0, a_1, ..., a_{n-1} + 1/a_n].

If a_n ≥ 2, then [a_0, a_1, ..., a_n] = [a_0, a_1, ..., a_n - 1, 1]. If a_n = 1, then [a_0, a_1, ..., a_{n-1}, 1] = [a_0, a_1, ..., a_{n-2}, a_{n-1} + 1]. □

Example 9.4.2. The expansion (9.25) can be written in two ways:

263/189 = 1 + 1/(2 + 1/(1 + 1/(1 + 1/(4 + 1/8))))
        = 1 + 1/(2 + 1/(1 + 1/(1 + 1/(4 + 1/(7 + 1/1))))),

which illustrates how we can either lengthen or shorten any finite continued fraction expansion by 1. It can, however, be shown that if a finite continued fraction expansion is required to end with a_n > 1, then the representation is unique.
Clearly, each finite continued fraction represents some rational number. But even though we now know that each irrational number has a unique representation as an infinite continued fraction, we cannot tell a priori whether each sequence

a_0, a_1, a_2, ..., where a_0 ∈ Z and a_i ∈ ℕ when i ≥ 1,   (9.29)

regarded as a continued fraction, represents an irrational number. In other words, we do not yet know whether the limit

lim_{n→∞} [a_0, a_1, ..., a_n]   (9.30)

exists for each sequence (9.29). We will soon find an answer to that question.
Let (a_i) be a sequence as in (9.29). The nth convergent of the sequence (a_i) is defined to be p_n/q_n = [a_0, ..., a_n]. A simple calculation shows that

p_0/q_0 = a_0/1,
p_1/q_1 = (a_0a_1 + 1)/a_1,
p_2/q_2 = (a_2p_1 + p_0)/(a_2q_1 + q_0).

Continuing this way, we see that p_n and q_n are polynomials that depend on a_0, a_1, ..., a_n. We are now interested in finding the polynomials p_n and q_n, so we will regard the a_i as indeterminates, assuming that they do not take any specific integer values. After calculating some first p_n and q_n, we may guess that there are general recursion formulae for the polynomials p_n and q_n.

Lemma 9.4.3. The formulae

p_n = a_np_{n-1} + p_{n-2},   (9.31)
q_n = a_nq_{n-1} + q_{n-2}   (9.32)

hold for each n ≥ 2.

Proof. By induction on n. The formulae (9.31) and (9.32) are true for n = 2, as we saw before. Assume then that they hold for the numbers {2, ..., n}. Then

p_{n+1}/q_{n+1} = [a_0, ..., a_{n-1}, a_n, a_{n+1}] = [a_0, ..., a_{n-1}, a_n + 1/a_{n+1}]
= ((a_n + 1/a_{n+1})p_{n-1} + p_{n-2}) / ((a_n + 1/a_{n+1})q_{n-1} + q_{n-2})
= (a_{n+1}(a_np_{n-1} + p_{n-2}) + p_{n-1}) / (a_{n+1}(a_nq_{n-1} + q_{n-2}) + q_{n-1})
= (a_{n+1}p_n + p_{n-1}) / (a_{n+1}q_n + q_{n-1}),

so the formulae (9.31) and (9.32) are valid for each n. □

If a_1, a_2, ... are natural numbers, then it follows by (9.32) that q_n ≥ q_{n-1} + q_{n-2}, and the inequality q_n ≥ F_n, where F_n is the nth Fibonacci number,⁷ can be proved by induction. In the sequel we will use the notations p_n and q_n also for the polynomials p_n and q_n evaluated at {a_0, a_1, ..., a_n}. The meaning of the individual notation p_n or q_n will be clear from the context.
Using the recursion formulae, it is also easy to prove the following lemma:

Lemma 9.4.4. For any n ≥ 1, p_nq_{n-1} - p_{n-1}q_n = (-1)^{n-1}.

From the previous lemma it also follows that the convergents p_n/q_n are always in lowest terms: for if d divides both p_n and q_n, then d also divides 1.
Lemma 9.4.4 has even more interesting consequences: by multiplying

p_nq_{n-1} - p_{n-1}q_n = (-1)^{n-1}   (9.33)

by a_n and using the recursion formulae (9.31) and (9.32) once more, we see that

p_nq_{n-2} - p_{n-2}q_n = (-1)^n a_n   (9.34)

whenever n ≥ 2. Equations (9.33) and (9.34) can be rewritten as

p_n/q_n - p_{n-1}/q_{n-1} = (-1)^{n-1}/(q_nq_{n-1})   (9.35)

and

p_n/q_n - p_{n-2}/q_{n-2} = (-1)^n a_n/(q_nq_{n-2}).   (9.36)

Equation (9.35) implies the inequality

⁷ Fibonacci numbers F_n are defined by F_0 = F_1 = 1 and by F_n = F_{n-1} + F_{n-2}.

p_{2n}/q_{2n} < p_{2n-1}/q_{2n-1},

which, together with (9.36), implies that

p_{2n-2}/q_{2n-2} < p_{2n}/q_{2n} < p_{2n-1}/q_{2n-1} < p_{2n-3}/q_{2n-3}.   (9.37)

Inequalities (9.37) show that the sequence of even convergents is strictly increasing but bounded above by the sequence of odd convergents, which is strictly decreasing. It follows by (9.35), and by the fact that q_n tends to infinity (recall that q_n ≥ F_n), that both of the sequences converge to a common limit α. Therefore, each sequence (9.29) represents some irrational number α. Moreover, since the limit α is always between two consecutive convergents, (9.35) implies that

|α - p_n/q_n| ≤ 1/(q_nq_{n+1}) < 1/q_n².   (9.38)

Example 9.4.3. The continued fraction expansion of π begins with

π = [3, 7, 15, 1, 292, 1, 1, ...].

Since q_4 = 113 and q_5 = 292·q_4 + q_3 = 33102, p_4/q_4 = 355/113 is a very good approximation of π with a relatively small denominator. By (9.38),

|355/113 - π| ≤ 1/(113·33102) = 0.000000267342....

In fact,

|355/113 - π| = 0.000000266764...,

so the estimate given by (9.38) is also quite precise.

Remark 9.4.1. Inequality (9.38) tells us that, for an irrational α, the convergents p_n/q_n give infinitely many rational approximations p/q with gcd(p, q) = 1 of α so precise that

|α - p/q| < 1/q².   (9.39)

This has further number-theoretic interest, since the rational numbers have only finitely many rational approximations (9.39) such that gcd(p, q) = 1. This is true because, if α = r/s with s > 0, then for q ≥ s and p/q ≠ r/s,

|p/q - r/s| = |ps - qr|/(qs) ≥ 1/(qs) ≥ 1/q².

For 0 < q < s there are clearly only finitely many approximations p/q that satisfy (9.39) and gcd(p, q) = 1.
Thus, the irrational numbers can be characterized by the property that they have infinitely many rational approximations (9.39). The continued fraction expansion gives only finitely many rational approximations (9.39) for rational numbers. On the other hand, we will show in a moment (Theorem 9.4.3) that approximations which are good enough are convergents.

It can even be shown that the convergents to a number α are the "best" rational approximations of α in the following sense. For the proof of the following theorem, consult [44].

Theorem 9.4.1. Let p_n/q_n be a convergent to α. If 0 < q ≤ q_n and p/q ≠ p_n/q_n, then

|q_nα - p_n| < |qα - p|

whenever n ≥ 2.

Now we will show that approximations which are good enough are convergents. To do that, we must first derive an auxiliary result. Let α = [a_0, a_1, a_2, ...] be a continued fraction expansion of α. We define α_{n+1} = [a_{n+1}, a_{n+2}, ...], and this gives us a formal expression

α = [a_0, a_1, ..., a_n, α_{n+1}].

This expression is not necessarily a continued fraction in the sense that α_{n+1} need not be a natural number, but anyway it gives us a representation

α = (α_{n+1}p_n + p_{n-1}) / (α_{n+1}q_n + q_{n-1})   (9.40)

with p_nq_{n-1} - p_{n-1}q_n = (-1)^{n-1}, just like in the proof of Lemma 9.4.3. But also any representation looking "enough" like (9.40) implies that p_n/q_n and p_{n-1}/q_{n-1} are convergents to α. More precisely:

Theorem 9.4.2. Let P, Q, P', and Q' be integers such that Q > Q' > 0 and PQ' - QP' = ±1. If also α' ≥ 1 and

α = (α'P + P') / (α'Q + Q'),

then P'/Q' = p_{n-1}/q_{n-1} and P/Q = p_n/q_n are convergents of a continued fraction expansion of α.

Proof. In any case, we have a finite continued fraction expansion for

P/Q = [a_0, a_1, ..., a_n].   (9.41)

By Lemma 9.4.2, we can choose n to be even or odd, so we can assume, without loss of generality, that

PQ' - QP' = (-1)^{n-1}.   (9.42)

Also by Lemma 9.4.4, we have gcd(p_n, q_n) = 1 and, similarly, gcd(P, Q) = 1. Since Q and q_n have the same signs (Q is positive by assumption), it follows by (9.41) that P = p_n and Q = q_n. Using this knowledge and Lemma 9.4.4, equation (9.42) can be rewritten as

p_nQ' - q_nP' = p_nq_{n-1} - q_np_{n-1},

or even as

p_n(Q' - q_{n-1}) = q_n(P' - p_{n-1}).   (9.43)

Since gcd(p_n, q_n) = 1, it follows by (9.43) that q_n divides Q' - q_{n-1}. On the other hand, Q' - q_{n-1} < Q' < q_n, since q_n = Q > Q' > 0 by assumption. Also q_{n-1} - Q' < q_{n-1} < q_n and, by combining these observations, we see that

|Q' - q_{n-1}| < q_n,

which can be valid only if Q' = q_{n-1}. By (9.43) it also follows that P' = p_{n-1}. Hence,

α = (α'p_n + p_{n-1}) / (α'q_n + q_{n-1}),

which can also be written as

α = [a_0, a_1, ..., a_n, α'],   (9.44)

just like in the proof of Lemma 9.4.3. Since α' ≥ 1, there is a continued fraction expansion α' = [a_{n+1}, a_{n+2}, ...] such that a_{n+1} ≥ 1. □

Now we are ready to conclude this section by

Theorem 9.4.3. If

0 < |p/q - α| ≤ 1/(2q²),

then p/q is a convergent to the continued fraction expansion of α.

Proof. By assumption,

p/q - α = σθ/q²

for some 0 < θ ≤ 1/2 and σ = ±1. Let p/q = [a_0, a_1, ..., a_n] be the continued fraction expansion of p/q. Again by Lemma 9.4.2, we may assume that σ = (-1)^{n-1}. We define α' by

α = (α'p_n + p_{n-1}) / (α'q_n + q_{n-1}),

where p_n/q_n = p/q and p_{n-1}/q_{n-1} are the last and the second-last convergents to p/q. Thus, we have

(-1)^{n-1}θ/q_n² = p_n/q_n - α = (-1)^{n-1} / (q_n(α'q_n + q_{n-1})).

Thus,

α' = 1/θ - q_{n-1}/q_n > 2 - 1 = 1.

By Theorem 9.4.2, p_{n-1}/q_{n-1} and p_n/q_n = p/q are consecutive convergents to α. □
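Computationally, the quotients a_i fall out of Euclid's algorithm and the convergents follow from the recursions (9.31) and (9.32); convergents computed this way also appear in the classical post-processing of Shor's factoring algorithm. A minimal Python sketch (added for illustration) is:

    from fractions import Fraction

    def quotients(p, q):
        # Continued fraction coefficients [a_0, a_1, ...] of p/q via Euclid's algorithm.
        result = []
        while q != 0:
            result.append(p // q)
            p, q = q, p % q
        return result

    def convergents(a):
        # p_n/q_n from the recursions (9.31) and (9.32), starting from
        # p_{-1} = 1, q_{-1} = 0, p_0 = a_0, q_0 = 1.
        p_prev, p, q_prev, q = 1, a[0], 0, 1
        result = [Fraction(p, q)]
        for a_n in a[1:]:
            p, p_prev = a_n * p + p_prev, p
            q, q_prev = a_n * q + q_prev, q
            result.append(Fraction(p, q))
        return result

    assert quotients(263, 189) == [1, 2, 1, 1, 4, 8]          # Example 9.4.1
    assert convergents([1, 2, 1, 1, 4, 8])[-1] == Fraction(263, 189)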

9.5 Shannon Entropy and Information

The purpose of this section is to present, following [20], the very elementary concepts of information theory. In the beginning, we concentrate on information represented by binary digits: bits.

9.5.1 Entropy

By using a single bit we can represent two different configurations: the bit is either 0 or 1. Using two bits, four different configurations are possible, three bits allow eight configurations, etc. Inspired by these initial observations, we say that the elementary binary entropy of a set with n elements is log_2 n. The elementary binary entropy thus approximately measures the length of the bit strings which we need to label all the elements of the set.

Remark 9.5.1. Consider the problem of specifying one single element out of n given elements such that the probability of being the outcome is 1/n for each element. The elementary binary entropy log_2 n thus measures the initial uncertainty in bits, i.e., to specify the outcome, we have to gain log_2 n bits of information. In other words, we have to obtain the binary string of log_2 n bits that labels the outcome.

On the other hand, using an alphabet of 3 symbols, we would say that the elementary ternary uncertainty of a set of n elements is log_3 n. To get a more unified approach, we define the elementary entropy of a set of n elements to be

H(n) = K log n,

where K is a constant and log is the natural logarithm. Choosing K = 1/log 2 gives the elementary binary entropy, K = 1/log 3 gives the ternary, etc.
However, it feels well justified to say that the information needed on average is less than K log n if the elements appear as outcomes with a non-uniform probability distribution. For example, if we have a priori knowledge that some element does not appear at all, we can say that the information needed, i.e., the initial uncertainty of the outcome, is at most K log(n - 1). How do we deal with a set of n elements, each appearing with a known probability p_i?
Pi?
Assume that we choose l times an element of a set having n elements with the uniform probability distribution. This experiment can also be viewed as specifying a string of length l over an n-letter alphabet. There are n^l such strings, and since the letters appear with equal probabilities, the strings do as well. Thus, the elementary entropy of the string set is

H(n^l) = K log n^l = l·K log n = l·H(n),

which is exactly what we could expect: the elementary entropy of strings of length l is l times the elementary entropy of a single letter. We use a similar idea to handle non-uniform probability distributions.
Let us choose l elements of an n-element set having probability distribution p_1, ..., p_n. With large enough l, the ith letter in any such string should appear approximately k_i = p_il times. Let us study a simplified case where each k_i = p_il is an integer. A simple combinatorial calculation shows that there are exactly

l! / (k_1! ⋯ k_n!)

strings where the ith letter occurs k_i = p_il times. Moreover, the distribution of such strings tends to the uniform distribution as l tends to infinity. The elementary entropy of the set of such strings is

K log(l! / (k_1! ⋯ k_n!)) = K(log l! - log k_1! - ⋯ - log k_n!).

By using log k! = k log k - k + O(log k) and l = k_1 + ⋯ + k_n, we can write the average entropy (per letter) as

H(p_1, ..., p_n) = (1/l)·K(l log l - l + O(log l) - Σ_{i=1}^n (k_i log k_i - k_i + O(log k_i)))
= K·(1/l)(Σ_{i=1}^n k_i log l - Σ_{i=1}^n k_i log k_i + O(log l))
= -K Σ_{i=1}^n (k_i/l) log(k_i/l) + O((log l)/l)
= -K Σ_{i=1}^n p_i log p_i + O((log l)/l).

By letting l → ∞, the last term tends to 0. This encourages the following definition.

Definition 9.5.1. Suppose that the elements of an n-element set occur with probabilities p_1, ..., p_n. The Shannon entropy of the distribution p_1, ..., p_n is defined to be

H(p_1, ..., p_n) = -K(p_1 log p_1 + ⋯ + p_n log p_n).

Remark 9.5.2. Because lim_{p→0} p log p = 0, we define 0·log 0 = 0. Clearly, H(p_1, ..., p_n) ≥ 0.

Remark 9.5.3. If K = 1/log 2 in the above definition, we talk about binary Shannon entropy. The special case p_i = 1/n for each i yields H(n) = -Kn·(1/n)·log(1/n) = K log n, again giving the elementary entropy, which we could now call the uniform entropy as well.
Remark 9.5.4. Recall that the elementary binary entropy measures how many bits of information we have to receive in order to specify an element in a set of n elements with the uniform probability distribution. In other words, it measures the uncertainty of the outcome in bits. On the other hand, the binary Shannon entropy measures the average uncertainty in bits in a set with a probability distribution p_1, ..., p_n.

Since log is a concave function,⁸ it follows easily that for any probability distribution p_1, ..., p_n, the inequality

⁸ A real function f is concave in an interval I if the graph of f is above the chord connecting any points f(x_1) and f(x_2), x_1, x_2 ∈ I. Formally, a concave function f must satisfy f(λx_1 + (1 - λ)x_2) ≥ λf(x_1) + (1 - λ)f(x_2) for λ ∈ [0, 1] and x_1, x_2 ∈ I.

p_1 log x_1 + ⋯ + p_n log x_n ≤ log(p_1x_1 + ⋯ + p_nx_n)

holds true. Therefore,

H(p_1, ..., p_n) = p_1 log p_1^{-1} + ⋯ + p_n log p_n^{-1}
≤ log(p_1p_1^{-1} + ⋯ + p_np_n^{-1}) = log n.

Notice that the Shannon entropy H(p_1, ..., p_n) also attains the above upper bound at p_1 = ⋯ = p_n = 1/n. Moreover, Shannon entropy has the following properties:

1. H(p_1, ..., p_n) is a symmetric continuous function.
2. H(1/n, ..., 1/n) is a non-negative, strictly increasing function of the variable n.
3. If (p_1, ..., p_n) and (q_1, ..., q_m) are probability distributions, then
H(p_1, ..., p_n) + p_nH(q_1, ..., q_m) = H(p_1, ..., p_{n-1}, p_nq_1, ..., p_nq_m).
We will make the definition of Shannon entropy even more natural by showing that the converse also holds:

Theorem 9.5.1. If H(p_1, ..., p_n) satisfies conditions 1-3 above, then H(p_1, ..., p_n) = -K(p_1 log p_1 + ⋯ + p_n log p_n) for some positive constant K.
Proof. We will first prove an auxiliary result that will be of great help: if f is a strictly increasing, non-negative function defined on the natural numbers such that f(s^m) = mf(s), then f(s) = K log s for some positive constant K. To prove this, we take an arbitrary natural number n and fix m such that s^m ≤ 2^n < s^{m+1}. Then

m/n ≤ log 2 / log s < m/n + 1/n.   (9.45)

Because f is strictly increasing, we also have f(s^m) ≤ f(2^n) < f(s^{m+1}), which implies that

m/n ≤ f(2)/f(s) < m/n + 1/n.   (9.46)

Inequalities (9.45) and (9.46) together imply that

|f(2)/f(s) - log 2/log s| < 1/n.

Since n can be chosen arbitrarily large, we must have f(s) = (f(2)/log 2)·log s = K log s. The constant K = f(2)/log 2 is positive, since f is strictly increasing and f(1) = f(1^0) = 0·f(1) = 0.
For any natural number n, we define f(n) = H(1/n, ..., 1/n). We will demonstrate that f(s^m) = mf(s). The symmetry of H and condition 3 imply that

H(1/s^m, ..., 1/s^m) = H(1/s, ..., 1/s) + s·(1/s)·H(1/s^{m-1}, ..., 1/s^{m-1}),

which is to say that f(s^m) = f(s) + f(s^{m-1}). The claim f(s^m) = mf(s) now follows by induction. Since f is strictly increasing and non-negative by 2, it follows that f(s) = K log s for some positive constant K.
Let p_1, ..., p_n be some rational numbers such that p_1 + ⋯ + p_n = 1. We can assume that p_i = m_i/N, where N = m_1 + ⋯ + m_n. Using 3 we get

f(N) = H(1/N, ..., 1/N)
= H(p_1·(1/m_1), ..., p_1·(1/m_1), p_2·(1/m_2), ..., p_n·(1/m_n))
= H(p_1, ..., p_n) + p_1f(m_1) + ⋯ + p_nf(m_n).

Hence,

H(p_1, ..., p_n) = f(N) - Σ_{i=1}^n p_i f(m_i)
= -K(Σ_{i=1}^n p_i log m_i - log N)
= -K Σ_{i=1}^n p_i (log m_i - log N) = -K Σ_{i=1}^n p_i log(m_i/N)
= -K Σ_{i=1}^n p_i log p_i,

which demonstrates that the theorem is true for rational probabilities. Since H is continuous, the result extends to real numbers straightforwardly. □

9.5.2 Information

We will hereafter ignore the constant K in the definition of entropy.
Let X = {x_1, ..., x_n} be a set of n elements, and p(x_1), ..., p(x_n) the corresponding probabilities of the elements to occur. Similarly, let Y be a set with m elements with a probability distribution p(y_1), ..., p(y_m). Such sets are here identified with discrete random variables: we regard X as a variable which has n potential values x_1, ..., x_n, each such value with probability p(x_i).
The joint entropy of sets (random variables) X and Y is simply defined as the entropy of the set X × Y, where the corresponding probability distribution is the joint distribution p(x_i, y_j):
H(X, Y) = -Σ_{i=1}^n Σ_{j=1}^m p(x_i, y_j) log p(x_i, y_j).

The conditional entropy of X, when the value of Y is known to be y_j, is defined by merely replacing the distribution p(x_i) with the conditional probability distribution p(x_i | y_j):

H(X | y_j) = -Σ_{i=1}^n p(x_i | y_j) log p(x_i | y_j).

The conditional entropy of X when Y is known is defined as the expected value

H(X | Y) = Σ_{j=1}^m p(y_j)H(X | y_j).
By using the concavity of the function log, one can quite easily prove the following lemma.

Lemma 9.5.1. H(X | Y) ≤ H(X).

Remark 9.5.5. The above lemma is quite natural: it states that the knowledge of Y cannot increase the uncertainty about X.

Conditional and joint entropy are connected by the following lemma, which has a straightforward proof if one keeps in mind that p(x_i, y_j) = p(x_i | y_j)p(y_j).

Lemma 9.5.2. H(X | Y) = H(X, Y) - H(Y).

Finally, we have all the necessary tools for defining the information of X when Y is known.

Definition 9.5.2. I(X | Y) = H(X) - H(X | Y).

Definition 9.5.2 contains the natural idea that the knowledge that Y can provide about X is merely the uncertainty of X minus the uncertainty of X provided Y is known. According to Lemma 9.5.1, I(X | Y) ≥ 0 always, and trivially I(X | Y) ≤ H(X).
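For illustration, the quantities above are straightforward to compute from a joint distribution; the following Python sketch (added here, using Lemma 9.5.2 in the form I(X|Y) = H(X) + H(Y) - H(X, Y) and the binary choice K = 1/log 2) is one way to do it:

    from math import log2

    def H(dist):
        # Binary Shannon entropy; the convention 0*log 0 = 0 is handled by the filter.
        return -sum(p * log2(p) for p in dist if p > 0)

    def information(joint):
        # I(X|Y) = H(X) - H(X|Y) = H(X) + H(Y) - H(X, Y) by Lemma 9.5.2.
        p_x = [sum(row) for row in joint]
        p_y = [sum(col) for col in zip(*joint)]
        return H(p_x) + H(p_y) - H([p for row in joint for p in row])

    # Independent X and Y: knowing Y provides no information about X.
    assert abs(information([[0.25, 0.25], [0.25, 0.25]])) < 1e-12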
To end this section, we list some properties of entropy and information. The proofs of the following lemmata are left as exercises.

Lemma 9.5.3. H(X, Y) ≤ H(X) + H(Y).

Lemma 9.5.4. Information is symmetric: I(X | Y) = I(Y | X).

The above lemma justifies the terminology mutual information of X and Y.
For the following lemma, we need some extra terminology: if X and Y are random variables, we say that X (randomly) depends on Y if there are conditional probabilities p(x_i | y_j) such that

p(x_i) = Σ_{j=1}^m p(x_i | y_j)p(y_j).

Lemma 9.5.5. Let X be a random variable that depends on Y, which is a random variable depending on Z. If X is conditionally independent of Z, then I(Z | X) ≤ I(Z | Y).

9.5.3 The Holevo Bound

Let ρ ∈ L(H_n) be a state (or density matrix) of an n-dimensional quantum system. We know (see Section 8.4.1) that ρ has a spectral representation

ρ = λ_1|x_1⟩⟨x_1| + ⋯ + λ_n|x_n⟩⟨x_n|,

where each λ_i satisfies λ_i ≥ 0 and λ_1 + ⋯ + λ_n = 1. The von Neumann entropy of the state ρ is defined by

S(ρ) = -Tr(ρ log ρ).

According to (8.19), we define ρ log ρ as

ρ log ρ = λ_1 log λ_1 |x_1⟩⟨x_1| + ⋯ + λ_n log λ_n |x_n⟩⟨x_n|,

and therefore

S(ρ) = -Tr(ρ log ρ) = -(λ_1 log λ_1 + ⋯ + λ_n log λ_n).

That is to say, the von Neumann entropy of a quantum state ρ ∈ L(H_n) is exactly the Shannon entropy of the distribution (λ_1, ..., λ_n).

Remark 9.5.6. From the definition it follows easily that the von Neumann entropy of a pure state is always 0.
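As a small illustration (a sketch added here, assuming the NumPy library is available), the von Neumann entropy can be computed from the eigenvalues of the density matrix; natural logarithms are used, as in the text:

    import numpy as np

    def von_neumann_entropy(rho):
        # S(rho) = -(lambda_1 log lambda_1 + ... + lambda_n log lambda_n).
        eigenvalues = np.linalg.eigvalsh(rho)
        return -sum(l * np.log(l) for l in eigenvalues if l > 1e-12)

    pure = np.array([[1.0, 0.0], [0.0, 0.0]])     # a pure state: entropy 0
    mixed = np.eye(2) / 2                         # maximally mixed qubit: entropy log 2
    assert abs(von_neumann_entropy(pure)) < 1e-9
    assert abs(von_neumann_entropy(mixed) - np.log(2)) < 1e-9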
The Holevo bound establishes an upper bound on the accessible information of quantum systems. For the statement of Holevo's theorem, we assume that there is a source which produces quantum states ρ_1, ..., ρ_n ∈ L(H_n) with probabilities p_1, ..., p_n. We define X to be the random variable that determines which one of the states ρ_i was produced. That is, the value of X is i if and only if the source produces ρ_i. Let also Y be an observable on H_n. Then the following theorem holds.

Theorem 9.5.2 (The Holevo Bound, [47]).

I(X | Y) ≤ S(Σ_{i=1}^n p_iρ_i) - Σ_{i=1}^n p_iS(ρ_i).

Remark 9.5.7. By saying that "Y is an observable on H_n" we mean the following: there is a collection {E_1, ..., E_m} of mutually orthogonal subspaces of H_n such that H_n = E_1 ⊕ ⋯ ⊕ E_m (see Definition 8.3.1), and each subspace E_i has label i. Y is then defined as the random variable which gets value i if and only if the observable {E_1, ..., E_m} gets value i.

Remark 9.5.8. The above theorem holds even if Y is allowed to be a POVM rather than a typical observable defined as in Definition 8.3.1.

9.6 Exercises

1. Prove that a set S ⊆ V is linearly dependent if and only if for some element x ∈ S, x ∈ L(S \ {x}).
2. Show that if B and B' are two bases of H_n, then necessarily |B| = |B'|.
3. a) Let n ≥ 2 and x_1, x_2 ∈ H_n. Show that y ∈ H_n can be chosen in such a way that L(x_1, y) = L(x_1, x_2) and (x_1 | y) = 0.
   b) Generalize a) into a procedure for finding an orthonormal basis of H_n (the procedure is called the Gram-Schmidt method).
4. Prove that the function f : ℕ → ℂ defined by f(n) = 1/n is in L_2(ℂ) but cannot be expressed as a linear combination of E = {e_1, e_2, e_3, ...}.
5. Prove the parallelogram rule: in an inner product space V, the equation

||x + y||² + ||x - y||² = 2||x||² + 2||y||²

holds for any x, y ∈ V.
6. Show that the subspace W of L_2(ℂ) generated by the vectors {e_1, e_2, e_3, ...} is not a Hilbert space (cf. Example 9.3.4).
7. Prove Lemma 9.4.1.
8. Prove Lemma 9.4.4.
9. Prove Lemmata 9.5.3 and 9.5.4.
10. Use the properties of the function log to prove Lemma 9.5.5.
References

1. Manindra Agarwal, Neeraj Kayal, and Nitin Saxena: PRIMES is in P. Elec-


tronically available at http://www.cse.iitk.ac.in/primality.pdf.
2. Andris Ambainis: A note on quantum black-box complexity 01 almost all Boolean
functions, Information Processing Letters 71, 5-7 (1999). Electronically avail-
able at quant-ph/9811080. 9
3. Andris Ambainis: Polynomial degree vs. quantum query complexity. Electroni-
cally available at quant-ph/0305028.
4. E. Bach and J. Shallit: Computational number theory, MIT Press (1996).
5. Adriano Barenco, Charles H. Bennett, Richard Cleve, David P. DiVincenzo,
Norman Margolus, Peter Shor, Tycho Sleator, John Smolin, and Harald Wein-
furter: Elementary gates lor quantum computation, Physical Review A 52:5,
3457-3467 (1995). Electronically available at quant-ph/9503016.
6. R. Beals, H. Buhrman, R. Cleve, M. Mosca, and R. de Wolf: Quantum lower
bounds by polynomials, Proceedings of the 39th annual IEEE Symposium on
Foundations of Computer Science - FOCS, 352-361 (1998). Electronically avail-
able at quant-ph/9802049.
7. P. A. Benioff: Quantum mechanical Hamiltonian models 01 discrete processes
that erase their own histories: application to Turing machines, International
Journal of Theoretical Physics 21:3/4, 177-202, (1982).
8. Charles H. Bennett: Logical reversibility 01 computation, IBM Journal of Re-
search and Development 17, 525-532 (1973).
9. Charles H. Bennett: Time/space trade-offs lor reversible computation, SIAM
Journal of Computing 18, 766-776 (1989).
10. Charles H. Bennett, Ethan Bernstein, Gilles Brassard, and Umesh V. Vazirani:
Strengths and weaknesses 01 quantum computation, SIAM Journal of Comput-
ing 26:5, 1510-1523 (1997). Electronically available at quant-ph/9701001.
11. Charles H. Bennett, Gilles Brassard, Claude Crépeau, Richard Jozsa, Asher
Peres, William K. Wootters: Teleporting an unknown quantum state via dual
classical and Einstein-Podolsky-Rosen channels. Physical Review Letters 70,
1895-1899 (1993).
12. Charles H. Bennett, Stephen J. Wiesner: Communication via one- and two-
particle operators on Einstein-Podolsky-Rosen states. Physical Review Letters
69:20, 2881-2884 (1992).
13. Ethan Bernstein and Umesh Vazirani: Quantum complexity theory, SIAM
Journal on Computing 26:5, 1411-1473 (1997).
14. Andre Berthiaume and Gilles Brassard: Oracle quantum computing, Proceed-
ings of the Workshop on Physics and Computation - PhysComp'92, IEEE
Press, 195-199 (1992).

9 Code "quant-ph/9811080" refers to http://xxx.lanl.gov/abs/quant-ph/9811080


at Los Alarnos preprint archive.

15. D. Boschi, S. Branca, F. De Martini, L. Hardy, S. Popescu: Experimental
realization of teleporting an unknown pure quantum state via dual classical
and Einstein-Podolsky-Rosen channels, Physical Review Letters 80:6, 1121-1125
(1998). Electronically available at quant-ph/9710013.
16. Dik Bouwmeester, Jian-Wei Pan, Klaus Mattle, Manfred Eibl, Harald Wein-
furter, Anton Zeilinger: Experimental quantum teleportation, Nature 390, 575-
579 (1997).
17. Michel Boyer, Gilles Brassard, Peter Høyer, and Alain Tapp: Tight bounds
on quantum searching, Fourth Workshop on Physics and Computation -
PhysComp'96, Eds.: T. Toffoli, M. Biafore, J. Leão, New England Complex
Systems Institute, 36-43 (1996). Electronically available at quant-ph/9605034.
18. Gilles Brassard and Peter Høyer: An exact quantum polynomial-time algorithm
for Simon's problem, Proceedings of the 1997 Israeli Symposium on Theory of
Computing and Systems - ISTCS'97, 12-23 (1997). Electronically available at
quant-ph/9704027.
19. Gilles Brassard, Peter Høyer, and Alain Tapp: Quantum counting, Automata,
Languages and Programming, Proceedings of the 25th International Collo-
quium, ICALP'98, Lecture Notes in Computer Science 1443, 820-831, Springer
(1998). Electronically available at quant-ph/9805082.
20. Leon Brillouin: Science and information theory, 2nd edition, Academic Press
(1967).
21. Harry Buhrman and Wim van Dam: Quantum Bounded Query Complexity,
Proceedings of the 14th Annual IEEE Conference on Computational Complex-
ity - CoCo'99, 149-157 (1999). Electronically available at quant-ph/9903035.
22. Paul Busch: Quantum states and generalized observables: a simple proof of Glea-
son's theorem. Electronically available at quant-ph/9909073.
23. Paul Busch, Pekka J. Lahti, and P. Mittelstaedt: The quantum theory of mea-
surement, Springer-Verlag, 1991.
24. A. R. Calderbank, Peter W. Shor: Good quantum error-correcting codes exist,
Physical Review A 54, 1098-1105 (1996). Electronically available at quant-
ph/9512032.
25. Man-Duen Choi: Completely positive linear maps on complex matrices, Linear
Algebra and its Applications 10, 285-290 (1975).
26. Isaac L. Chuang, Lieven M. K. Vandersypen, Xinlan Zhou, Debbie W. Leung,
and Seth Lloyd: Experimental realization of a quantum algorithm, Nature 393,
143-146 (1998). Electronically available at quant-ph/9801037.
27. Juan I. Cirac and Peter Zoller: Quantum computations with cold trapped ions,
Physical Review Letters 74:20, 4091-4094 (1995).
28. Henri Cohen: A course in computational algebraic number theory, Graduate
Texts in Mathematics 138, Springer (1993), 4th printing 2000.
29. Wim van Dam: Two classical queries versus one quantum query. Electronically
available at quant-ph/9806090.
30. David Deutsch: Uncertainty in quantum measurements, Physical Review Let-
ters 50:9, 631-633 (1983).
31. David Deutsch: Quantum theory, the Church- Turing principle and the universal
quantum computer, Proceedings of the Royal Society of London A 400, 97-117
(1985).
32. David Deutsch: Quantum computational networks, Proceedings of the Royal
Society of London A 425, 73-90 (1989).
33. David Deutsch, Adriano Barenco, and Artur Ekert: Universality in quantum
computation, Proceedings of the Royal Society of London A 449, 669-677
(1995). Electronically available at quant-ph/9505018.

34. David Deutsch, Richard Jozsa: Rapid solution of problems by quantum com-
putation, Proceedings of the Royal Society of London A 439, 553-558 (1992).
35. Christoph Dürr and Peter Høyer: A quantum algorithm for finding the minimum.
Electronically available at quant-ph/9607014.
36. Mark Ettinger and Peter Høyer: On quantum algorithms for noncommutative
hidden subgroups, Proceedings of the 16th Annual Symposium on Theoretical
Aspects of Computer Science - STACS 99, Lecture Notes in Computer Science
1563, 478-487, Springer (1999). Electronically available at quant-ph/9807029.
37. Edward Farhi, Jeffrey Goldstone, Sam Gutmann, and Michael Sipser: A limit
on the speed of quantum computation in determining parity, Physical Review
Letters 81:5, 5442-5444 (1998). Electronically available at quant-ph/9802045.
38. Richard P. Feynman: Simulating physics with computers, International Journal
of Theoretical Physics 21:6/7, 467-488 (1982).
39. A. Furusawa, J. L. Sørensen, S. L. Braunstein, C. A. Fuchs, H. J. Kimble, E.
S. Polzik: Unconditional quantum teleportation, Science 282, 706-709 (1998).
40. Daniel Gottesman: The Heisenberg Representation of Quantum Computers.
Electronically available at quant-ph/9807006.
41. Lov K. Grover: A fast quantum-mechanical algorithm for database search, Pro-
ceedings of the 28th Annual ACM Symposium on the Theory of Computing -
STOC, 212-219 (1996). Electronically available at quant-ph/9605043.
42. Lov K. Grover and Terry Rudolph: How significant are the known collision and
element distinctness quantum algorithms? Electronically available at quant-
ph/0306017.
43. Jozef Gruska: Quantum Computing, McGraw-Hill (1999).
44. G. H. Hardy and E. M. Wright: An introduction to the theory of numbers, 4th
ed with corrections, Clarendon Press, Oxford (1971).
45. Mika Hirvensalo: On quantum computation, Ph.Lic. Thesis, University of
Turku, 1997.
46. Mika Hirvensalo: The reversibility in quantum computation theory, Proceed-
ings of the 3rd International Conference Developments in Language Theory -
DLT'97, Ed.: Symeon Bozapalidis, Aristotle University of Thessaloniki, 203-
210 (1997).
47. A. S. Holevo: Statistical Problems in Quantum Physics, Proceedings of the
Second Japan-USSR Symposium on Probability Theory, Eds.: G. Maruyama
and Yu. V. Prokhorov, Springer, 104-109 (1973).
48. Richard Jozsa: A stronger no-cloning theorem. Electronically available at quant-
ph/0204153.
49. Loo Keng Hua: Introduction to number theory, Springer-Verlag, 1982.
50. A. Y. Kitaev: Quantum computations: algorithms and error correction, Russian
Mathematical Surveys 52:6, 1191-1249 (1997).
51. E. Knill, R. Laflamme, R. Martinez, C.-H. Tseng: An algorithmic benchmark
for quantum information processing, Nature 404: 368-370 (2000).
52. Rolf Landauer: Irreversibility and heat generation in the computing process,
IBM Journal of Research and Development 5, 183-191 (1961).
53. M. Y. Lecerf: Récursive insolubilité de l'équation générale de diagonalisation
de deux monomorphismes de monoïdes libres φx = ψx, Comptes Rendus de
l'Académie des Sciences 257, 2940-2943 (1963).
54. Ming Li, John Tromp and Paul Vitanyi: Reversible simulation of irreversible
computation, Physica D 120:1/2, 168-176 (1998). Electronically available at
quant-ph/9703009.
55. Seth Lloyd: A potentially realizable quantum computer, Science 261, 1569-1571
(1993).

56. Hans Maassen and J. B. M. Uffink: Generalized entropic uncertainty relations,
Physical Review Letters 60:12, 1103-1106 (1988).
57. F. J. MacWilliams and Neil J. A. Sloane: The theory of error-correcting codes,
North-Holland (1981).
58. Gary L. Miller: Riemann's hypothesis and tests for primality, Journal of Com-
puter and System Sciences 13, 300-317 (1976).
59. Michele Mosca and Artur Ekert: The hidden subgroup problem and eigenvalue
estimation on a quantum computer, Quantum Computing and Quantum Com-
munications, Proceedings of the 1st NASA International Conference, Lecture
Notes in Computer Science 1509, 174-188, Springer (1998). Electronically avail-
able at quant-ph/9903071.
60. A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone: Handbook of applied
cryptography, CRC Press Series on Discrete Mathematics and Its Applications,
CRC Press (1997).
61. John von Neumann: Mathematical foundations of quantum mechanics, Prince-
ton University Press, translated from the German edition by Robert T. Beyer
(1955).
62. Michael A. Nielsen and Isaac L. Chuang: Quantum Computation and Quantum
Information, Cambridge University Press (2000).
63. Masanao Ozawa: Quantum Turing machines: local transitions, preparation,
measurement, and halting problem, Quantum Communication, Computing, and
Measurement 2, Eds.: Prem Kumar, G. Mauro D'Ariano, and Osamu Hi-
rota, Kluwer, New York, 241-248 (2000). Electronically available at quant-
ph/9809038.
64. Christos H. Papadimitriou: Computational complexity, Addison-Wesley (1994).
65. K. R. Parthasarathy: An introduction to quantum stochastic calculus, Birk-
häuser, Basel (1992).
66. A.K. Pati, S. L. Braunstein: Impossibility of deleting an unknown quantum
state, Nature 404, 164-165 (2000).
67. R. Paturi: On the degree of polynomials that approximate symmetric Boolean
functions, Proceedings of the 28th Annual ACM Symposium on the Theory of
Computing - STOC, 468-474 (1992).
68. Max Planck: Annalen der Physik 1, 69, 1900; Verhandlg. dtsch. phys. Ges., 2,
202; Verhandlg. dtsch. phys. Ges. 2, 237; Annalen der Physik 4, 553, 1901.
69. M. B. Plenio and P. L. Knight: Realistic lower bounds for the factorization time
of large numbers on a quantum computer, Physical Review A 53:5, 2986-2990
(1996). Electronically available at quant-ph/9512001.
70. E. L. Post: The two-valued iterative systems of mathematical logic, Princeton
University Press (1941).
71. E. L. Post: A variant of a recursively unsolvable problem, Bulletin of the Amer-
ican Mathematical Society 52, 264-268 (1946).
72. John Preskill: Robust solutions to hard problems, Nature 391, 631-632 (1998).
73. Marcel Riesz: Sur les maxima des formes bilinéaires et sur les fonctionnelles
linéaires, Acta Mathematica 49, 465-497 (1926).
74. Yurii Rogozhin: On the notion of universality and small universal Turing ma-
chines, Theoretical Computer Science 168, 215-240 (1996).
75. Sheldon M. Ross: Introduction to probability models, 4th edition, Academic
Press (1985).
76. J. Barkley Rosser and Lowell Schoenfeld: Approximate formulas for some func-
tions of prime numbers, Illinois Journal of Mathematics 6:1, 64-94 (1962).
77. Walter Rudin: Functional Analysis, 2nd edition, McGraw-Hill (1991).

78. Keijo Ruohonen: Reversible machines and Post's correspondence problem for
biprefix morphisms, EIK - Journal of Information Processing and Cybernetics
21:12, 579-595 (1985).
79. Arto Salomaa: Public-key cryptography, Texts in Theoretical Computer Science
- An EATCS Series, 2nd ed., Springer (1996).
80. Yaoyun Shi: Both Toffoli and controlled-NOT need little help to do univer-
sal quantum computation. Quantum Information and Computation 3:1, 84-92
(2003). Electronically available at quant-ph/0205115.
81. Peter W. Shor: Algorithms for quantum computation: discrete log and factoring,
Proceedings of the 35th annual IEEE Symposium on Foundations of Computer
Science - FOCS, 20-22 (1994).
82. Peter W. Shor: Scheme for reducing decoherence in quantum computer memory,
Physical Review A 52:4, 2493-2496 (1995).
83. Uwe Schöning: A probabilistic algorithm for k-SAT based on limited local search
and restart, Algorithmica 32, 615-623 (2002).
84. Daniel R. Simon: On the power of quantum computation, Proceedings of the
35th annual IEEE Symposium on Foundations of Computer Science - FOCS,
116-123 (1994).
85. Douglas R. Stinson: Cryptography - Theory and practice, CRC Press Series on
Discrete Mathematics and Its Applications, CRC Press, Boca Raton (1995).
86. W. Tittel, J. Brendel, H. Zbinden, N. Gisin: Violation of Bell inequalities by
photons more than 10 km apart, Physical Review Letters 81:17, 3563-3566,
(1998). Electronically available at quant-ph/9806043.
87. Tommaso Toffoli: Bicontinuous extensions of invertible combinatorial functions,
Mathematical Systems Theory 14, 13-23 (1981).
88. B. L. van der Waerden: Sources of quantum mechanics, North-Holland (1967).
89. C. P. Williams and S. H. Clearwater: Explorations in quantum computing,
Springer (1998).
90. C. P. Williams and S. H. Clearwater: Ultimate zero and one. Computing at the
quantum frontier, Springer (2000).
91. William K. Wootters, Wojciech H. Zurek: A single quantum cannot be cloned,
Nature 299, 802-803 (1982).
92. Andrew Chi-Chih Yao: Quantum circuit complexity, Proceedings of the 34th
annual IEEE Symposium on Foundations of Computer Science - FOCS, 352-
361 (1993).
Index

BPP, 34-36, 38
BQP, 38
EQP, 38
NP, 34, 38
NQP, 38
P, 32, 36
R, 31
RE, 31
RP, 34-36, 38
ZPP, 34-36, 38
accepting computation, 30
adjoint matrix, 8
adjoint operator, 118
algorithm, 31
alphabet, 29
Ambainis, Andris, 112
amplitude, 7, 13, 116, 129
amplitude amplification, 90
basis, 185
basis state, VIII, 8, 13, 129
Beals, R., 99, 101
Benioff, Paul, 1
Bennett, Charles, 39
Bernstein, Ethan, 1, 40
Bezout's identity, 172
bijective morphism, 174
binary alphabet, 41
binary quantum gate, 18
black body, 113
blackbox function, 84-86
Bohr, Niels, 114
Boltzmann's constant, 39
Boolean circuit, 31, 41
Born, Max, 115
Boyer, M., 97
bra-vector, 139
Brassard, G., 97
Braunstein, S. L., 22, 164
Buhrman, H., 99, 101
Busch, Paul, 146
Cauchy-Schwarz inequality, 187
causality principle, 130, 154
channel, classical, 24
channel, quantum, 24
character, 176
character group, 176
Chinese Remainder Theorem, 172
Chuang, Isaac, V
Church-Turing thesis, 31, 42
circuit, 29
Cleve, R., 99, 101
cloning, 22
commutator, 134
complete, 117, 188
completely positive, 154, 155
compound systems, 129
computation, 30
computational basis, 13
computational step, 30
concatenation, 29
concave function, 199
conditional entropy, 202
configuration, 30, 37
configuration space, 33
congruence, 169
constructive interference, 16, 116
continued fraction, 191
controlled not, 18
convergent, 192
convex combination, 141
coordinate representation, 13
coset, 168
coset product, 171
cyclic group, 176
Davisson, C. J., 115
de Broglie, Louis, 115
de Wolf, R., 99, 101
decidability, 31
decision problem, 31
decomposable state, 11, 17, 20, 129
degenerate, 124
density matrix, 139, 140
density operator, 140
destructive interference, 16, 116
deterministic Turing machine, 29
Deutsch, David, 1, 2, 40, 135
dimension, 185
Dirac, Paul, 115
direct product, 175, 177
direct sum, 177, 188
discrete Fourier transform, 180
discrete logarithm, 79
distance, 188
distance function, 188
dual group, 176
dual space, 139
effect, 146
eigenspace, 123
eigenvector, 118
Einstein, Albert, 114
elementary row operations, 80
entangled state, 11, 17, 129
entropic uncertainty relation, 136
entropy, 197
entropy, von Neumann, 203
epimorphism, 174
EPR pair, 18, 25
equivalent states, 129
Euclid's algorithm, 189
Euler's φ-function, 172
Euler's constant, 68
Euler's theorem, 173
expected value, 133
factor group, 170
fast Fourier transform, 57
Fermat's little theorem, 173
Feynman, Richard, 1, 20, 40
Fibonacci numbers, 193
Fourier transform, 49, 50, 57, 176, 180-183
Fourier transform decomposition, 51
Fréchet, Maurice, 139
functional, 139
Galilei, Galileo, 114
Gauss-Jordan elimination, 81
general linear group, 168
generator matrix, 74
Germer, L. H., 115
Gleason's theorem, 143
Gram-Schmidt method, 204
group axioms, 167
group morphism, 174
group theory, 167
Grover's search algorithm, 83, 97
Grover, Lov, 83, 89, 90, 100
Gruska, Jozef, V
Hadamard matrix, 52
Hadamard transform, 52, 181
Hadamard-Walsh matrix, 15, 127
Hadamard-Walsh transform, 15, 26, 51, 52, 181
halting computation, 30
Hamilton operator, 127, 131
Hamming weight, 105
Heisenberg picture, 154
Heisenberg's uncertainty principle, 135
Heisenberg, Werner, 115
Hertz, 114
hidden subgroup, 73
Hilbert space, 7, 11, 13, 17, 21, 117, 188
Holevo bound, 203
Høyer, P., 97
identity operator, 124
image, 174
index, 169
information, VIII, 197
injective morphism, 174
inner product, 178, 186
inner product space, 186
interference, 16
internal control, 29
inverse element, 167
inverse Fourier transform, 182
inversion, 167
inversion about average, 92
Jordan, Paul, 115
Jozsa, Richard, V, 22, 163
kernel, 174
ket-vector, 139
Kraus representation, 157
Kronecker product, 18
Lagrange's theorem, 169
Landauer, Rolf, 39
Las Vegas algorithm, 35
Lecerf, Yves, 39
letter, 29
linear combination, 184
linear dependence, 184
linear mapping, 188
literal, 83
Maassen, Hans, 135
MacWilliams, F. J., 2
Markov chain, 6
Markov matrix, 6
matrix representation, 123
Maxwell, James Clerk, 114
measurement, 22
measurement paradox, 23
mixed state, 4, 141
monomorphism, 174
Monte Carlo algorithm, 35
Mosca, M., 99, 101
multitape Turing machine, 36
natural basis, 185
natural numbers, 167
neutral element, 167
Newton, Isaac, 114
Nielsen, Michael, V
No-Cloning Theorem, 21, 163, 164
No-Deleting Principle, 22, 163-165
nondeterministic Turing machine, 34
norm, 118, 178, 187
normal subgroup, 169
normed space, 187
observable, 3, 132, 142
observation, VIII, 13, 17, 22
operator, 118, 188
opposite element, 167
order, 171
orthogonal complement, 118, 187
orthogonality, 186
orthogonality relations, 180
parallelogram rule, 188
Parseval's identity, 50, 181
Pati, A. K., 22, 164
Paturi, R., 107
Pauli, Wolfgang, 115
period, 183
permutation matrix, 45
phase flip, 16
phase shift matrix, 128
phase space, 3
photoelectric effect, 114
photon, 114
pivot, 81
Planck's constant, 113
Planck, Max, 113
polarization equation, 120
positive operator, 119
positive operator-valued measure, 134
Post, Emil, 39, 42
Preskill, John, 2
principal character, 176
probabilistic Turing machine, 32
probability distribution, 135
product character, 176
projection, 120, 175
projection-valued measure, 133, 143
pure state, 4, 141
purification, 153, 154
quantum bit, 8, 13
quantum channel, 24
quantum circuit, 45, 46
quantum computation, VIII
quantum computer, VIII
quantum entropy, 203
quantum Fourier transform, 49, 50
quantum gate, binary, 18
quantum gate, unary, 13
quantum information, VIII
quantum physics, VII
quantum register, 17, 20, 129
quantum system, closed, 155
quantum system, open, 155
quantum time evolution, 130
quantum Turing machines, 37
qubit, 13, 129
query operator, modified, 90
quotient group, 170
random variable, 201
range, 174
Rayleigh, John, 113
recursive solvability, 31
reduced row-echelon form, 81
rejecting computation, 30
relativity theory, VII
reversible circuits, 43
reversible gate, 43
Riesz, Frigyes, 139
Riesz, Marcel, 137
rotation matrix, 128
row-echelon form, 81
Rudolph, T., 100
Ruohonen, Keijo, 39
scalar multiplication, 183
Schöning, Uwe, 84
Schrödinger equation, 131, 142
Schrödinger picture, 154
search problems, 83
self-adjoint operator, 115, 119, 122
Shannon entropy, 135, 197, 199
Shannon, Claude, 2
Shi, Yaoyun, 46
Shor, Peter, 2, 40, 58
Simon's promise, 73
Simon, Daniel, 73, 80
Solovay-Kitaev theorem, 47
span, 184
spectral representation, 124, 125, 127
spectrum, 124
state, VIII, 3, 7, 13, 17, 129, 140
state space, 7
state vector, 3, 115
state, mixed, 142
Stone, M., 131
subgroup, 168
subspace, 185
successor, 30
superdense coding, 27, 28
superoperator, 155
superposition, VIII, 8, 38, 129
surjective morphism, 174
Tapp, A., 97
teleportation, 24-26
tensor product, 10, 18
time complexity, 32
time evolution, 154
Toffoli gate, 43
Toffoli, Tommaso, 45
trace, 119
tracing over, 148
tractability, 32
transition function, 29
transversal, 75
trivial character, 176
trivial subgroup, 169
Turing machine, VII, 29, 31
Uffink, J., 135
unary quantum gate, 13
uncertainty principle, 134
undecidability, 31
uniform computation, 29
unit element, 167
unitary, 120
unitary matrix, 8
unitary operator, 119
universal Turing machine, 40
van Dam, Wim, 112
variance, 134
Vazirani, Umesh, 1, 40
vector space, 183
vector space axioms, 183
vector states, 141
von Neumann entropy, 203
von Neumann-Lüders measurement, 22
Walsh transform, 52, 181
wave function, 8
Wien, Wilhelm, 113
Wootters, W. K., 21
word, 29
Yao, Andrew, 46
Young, 114
zero-knowledge protocol, 83
Zurek, W. H., 21