
Lectures on Quantum Computation,

Quantum Error Correcting Codes and

Information Theory

by

K. R. Parthasarathy
Indian Statistical Institute
(Delhi Centre)

Notes by
Amitava Bhattacharya
Tata Institute of Fundamental Research, Mumbai
Preface

These notes were prepared by Amitava Bhattacharya on a course of


lectures I gave at the Tata Institute of Fundamental Research (Mumbai)
in the months of April 2001 and February 2002. I am grateful to my
colleagues at the TIFR, in general, and Professor Parimala Raman in
particular, for providing me a receptive and enthusiastic audience and
showering on me their warm hospitality. I thank Professor Jaikumar for
his valuable criticism and insight and several fruitful conversations in
enhancing my understanding of the subject. Finally, I express my warm
appreciation of the tremendous effort put in by Amitava Bhattacharya
for the preparation of these notes and their LATEX files.

Financial support from the Indian National Science Academy (in


the form of C. V. Raman Professorship), TIFR (Mumbai) and Indian
Statistical Institute (Delhi Centre) is gratefully acknowledged.

K. R. Parthasarathy
Delhi
June 2001
Contents

1 Quantum Probability
  1.1 Classical Versus Quantum Probability Theory
  1.2 Three Distinguishing Features
  1.3 Measurements: von Neumann’s Collapse Postulate
  1.4 Dirac Notation
      1.4.1 Qubits

2 Quantum Gates and Circuits
  2.1 Gates in n-qubit Hilbert Spaces
  2.2 Quantum Gates
      2.2.1 One qubit gates
      2.2.2 Two qubit gates
      2.2.3 Three qubit gates
      2.2.4 Basic rotations
  2.3 Some Simple Circuits
      2.3.1 Quantum teleportation
      2.3.2 Superdense coding: quantum communication through EPR pairs
      2.3.3 A generalization of “communication through EPR states”
      2.3.4 Deutsch algorithm
      2.3.5 Arithmetical operations on a quantum computer

3 Universal Quantum Gates
  3.1 CNOT and Single Qubit Gates are Universal
  3.2 Appendix

4 The Fourier Transform and an Application
  4.1 Quantum Fourier Transform
  4.2 Phase Estimation
  4.3 Analysis of the Phase Estimation Circuit

5 Order Finding
  5.1 The Order Finding Algorithm
  Appendix 1: Classical reversible computation
  Appendix 2: Efficient implementation of the controlled U^{2^j} operation
  Appendix 3: Continued fraction algorithm
  Appendix 4: Estimating ϕ(r)/r

6 Shor’s Algorithm
  6.1 Factoring to Order Finding

7 Quantum Error Correcting Codes
  7.1 Knill–Laflamme Theorem
  7.2 Some Definitions
      7.2.1 Invariants
      7.2.2 What is a t-error correcting quantum code?
      7.2.3 A good basis for E_t
  7.3 Examples
      7.3.1 A generalized Shor code
      7.3.2 Specialization to A = {0, 1}, m = 3, n = 3
      7.3.3 Laflamme code
      7.3.4 Hadamard–Steane quantum code
      7.3.5 Codes based on Bush matrices
      7.3.6 Quantum codes from BCH codes

8 Classical Information Theory
  8.1 Entropy as information
      8.1.1 What is information?
  8.2 A Theorem of Shannon
  8.3 Stationary Source

9 Quantum Information Theory
  9.1 von Neumann Entropy
  9.2 Properties of von Neumann Entropy

Bibliography
Lecture 1

Quantum Probability
At the International Congress of Mathematicians held in Berlin, Peter Shor
presented a new algorithm for factoring numbers on a quantum computer. In this
series of lectures, we shall study the areas of quantum computation (in-
cluding Shor’s algorithm), quantum error correcting codes and quantum
information theory.

1.1 Classical Versus Quantum Probability


Theory
We begin by comparing classical probability and quantum probabil-
ity. In classical probability theory (since Kolmogorov’s 1933 mono-
graph [11]), we have a sample space, a set of events, a set of random
variables, and distributions. In quantum probability (as formulated in
von Neumann’s 1932 book [14]), we have a state space (which is a Hilbert
space) instead of a sample space; events, random variables and distribu-
tions are then represented as operators on this space. We now recall the
definitions of these notions in classical probability and formally define
the analogous concepts in quantum probability. In our discussion we
will be concerned only with finite classical probability spaces, and their
quantum analogues—finite dimensional Hilbert spaces.


Spaces

1.1 The sample space Ω: This is a finite set, say {1, 2, . . . , n}.

1.2 The state space H: This is a complex Hilbert space of dimension n.

Events

1.3 The set of events F_Ω: This is the set of all subsets of Ω. F_Ω is a
Boolean algebra with the union (∪) operation for ‘or’ and the intersection
(∩) operation for ‘and’. In particular, we have

E ∩ (F1 ∪ F2) = (E ∩ F1) ∪ (E ∩ F2).

1.4 The set of events P(H): This is the set of all orthogonal projections
in H. An element E ∈ P(H) is called an event. Here, instead of ‘∪’ we have
the max (∨) operation, and instead of ‘∩’ the min (∧) operation. Note,
however, that E ∧ (F1 ∨ F2) is not always equal to (E ∧ F1) ∨ (E ∧ F2).
(They are equal if E, F1, F2 commute with each other.)

Random variables and observables

1.5 The set of random variables B_Ω: This is the set of all complex-valued
functions on Ω. The elements of B_Ω are called random variables. B_Ω is an
Abelian C*-algebra under the operations

(αf)(ω) = α f(ω);
(f + g)(ω) = f(ω) + g(ω);
(f · g)(ω) = f(ω) g(ω);
f*(ω) = f†(ω) = conj(f(ω)).

Here, α ∈ C, f, g ∈ B_Ω, and ‘conj’ stands for complex conjugation. The
random variable 1 (defined by 1(ω) = 1) is the unit in this algebra.

With each event E ∈ F_Ω we associate the indicator random variable 1_E
defined by

1_E(ω) = 1 if ω ∈ E; 0 otherwise.

For a random variable f, let Sp(f) = f(Ω). Then, f can be written as the
following linear combination of indicator random variables:

f = Σ_{λ ∈ Sp(f)} λ 1_{f⁻¹({λ})},

so that

1_{f⁻¹({λ})} · 1_{f⁻¹({λ′})} = 0 for λ ≠ λ′;
Σ_{λ ∈ Sp(f)} 1_{f⁻¹({λ})} = 1.

Similarly, we have

f^r = Σ_{λ ∈ Sp(f)} λ^r 1_{f⁻¹({λ})},

and, in general, for a function ϕ : C → C, we have the random variable

ϕ(f) = Σ_{λ ∈ Sp(f)} ϕ(λ) 1_{f⁻¹({λ})}.

Later, we will be mainly interested in real-valued random variables, that
is, random variables f with Sp(f) ⊆ R (or f† = f).

1.6 The set of observables B(H): This is the (non-Abelian) C*-algebra of
all operators on H, with ‘+’ and ‘·’ defined as usual, and X* defined to be
the adjoint of X. We will use X† instead of X*. The identity projection I
is the unit in this algebra.

We say that an observable X is real-valued if X† = X, that is, if X is
Hermitian. For such an observable, we define Sp(X) to be the set of
eigenvalues of X. Since X is Hermitian, Sp(X) ⊆ R, and by the spectral
theorem we can write X as

X = Σ_{λ ∈ Sp(X)} λ E_λ,

where E_λ is the projection on the subspace {u : Xu = λu} and

E_λ E_λ′ = 0 for λ, λ′ ∈ Sp(X), λ ≠ λ′;
Σ_{λ ∈ Sp(X)} E_λ = I.

Similarly, we have

X^r = Σ_{λ ∈ Sp(X)} λ^r E_λ,

and in general, for a function ϕ : R → R, we have

ϕ(X) = Σ_{λ ∈ Sp(X)} ϕ(λ) E_λ.
Distributions and states

1.7 A distribution p: This is a function from F_Ω to R, determined by n
real numbers p1, p2, . . . , pn, satisfying

p_i ≥ 0;  Σ_{i=1}^{n} p_i = 1.

The probability of the event E ∈ F_Ω (under the distribution p) is

Pr(E; p) = Σ_{i ∈ E} p_i.

When there is no confusion we write Pr(E) instead of Pr(E; p). We will
identify p with the sequence (p1, p2, . . . , pn). The probability that a
random variable f takes the value λ ∈ R is

Pr(f = λ) = Pr(f⁻¹({λ}));

thus, a real-valued random variable f has a distribution on the real line
with mass Pr(f⁻¹({λ})) at λ ∈ R.

1.8 A state ρ: In quantum probability, we have a state ρ instead of the
distribution p. A state is a non-negative definite operator on H with
Tr ρ = 1. The probability of the event E ∈ P(H) in the state ρ is defined
to be Tr ρE, and the probability that the real-valued observable X takes
the value λ is

Pr(X = λ) = Tr ρE_λ if λ ∈ Sp(X); 0 otherwise.

Thus, a real-valued observable X has a distribution on the real line with
mass Tr ρE_λ at λ ∈ R.

Expectation, moments, variance

The expectation of a random variable f is

E_p f = Σ_{ω ∈ Ω} f(ω) p_ω.

The r-th moment of f is the expectation of f^r, that is,

E_p f^r = Σ_{ω ∈ Ω} (f(ω))^r p_ω = Σ_{λ ∈ Sp(f)} λ^r Pr(f⁻¹({λ})),

and the characteristic function of f is the expectation of the
complex-valued random variable e^{itf}, that is,

E_p e^{itf} = Σ_{λ ∈ Sp(f)} e^{itλ} Pr(f⁻¹({λ})).

The variance of a real-valued random variable f is

var(f) = E_p (f − E_p f)² ≥ 0.

Note that

var(f) = E_p f² − (E_p f)²;

also, var(f) = 0 if and only if all the mass in the distribution of f is
concentrated at E_p f.

The expectation of an observable X in the state ρ is

E_ρ X = Tr ρX.

The map X ↦ E_ρ X has the following properties:

(1) it is linear;
(2) E_ρ X†X ≥ 0, for all X ∈ B(H);
(3) E_ρ I = 1.

The r-th moment of X is the expectation of X^r; if X is real-valued, then
using the spectral decomposition, we can write

E_ρ X^r = Σ_{λ ∈ Sp(X)} λ^r Tr ρE_λ.

The characteristic function of the real-valued observable X is the
expectation of the observable e^{itX}. The variance of a (real-valued)
observable X is

var(X) = Tr ρ(X − Tr ρX)² = Tr ρX² − (Tr ρX)² ≥ 0.

The variance of X vanishes if and only if the distribution of X is
concentrated at the point Tr ρX. This is equivalent to the property that
the range of the operator ρ is contained in the eigensubspace of X with
eigenvalue Tr ρX.

Extreme points

1.9 The set of distributions: The set of all probability distributions on
Ω is a compact convex set (a Choquet simplex) with exactly n extreme points
δ_j (j = 1, 2, . . . , n), where δ_j is determined by

δ_j({ω}) = 1 if ω = j; 0 otherwise.

If P = δ_j, then every random variable has a degenerate distribution under
P: the distribution of the random variable f is concentrated on the point
f(j).

1.10 The set of states: The set of all states in H is a convex set. Let ρ
be a state. Since ρ is non-negative definite, its eigenvalues are
non-negative reals, and we can write

ρ = Σ_{λ ∈ Sp(ρ)} λ E_λ;

since Tr ρ = 1, we have

Σ_{λ ∈ Sp(ρ)} λ · dim(E_λ) = 1.

The projection E_λ can, in turn, be written as a sum of one-dimensional
projections:

E_λ = Σ_{i=1}^{dim(E_λ)} E_{λ,i}.

Then,

ρ = Σ_{λ ∈ Sp(ρ)} Σ_{i=1}^{dim(E_λ)} λ E_{λ,i}.

Proposition 1.1.1 A one-dimensional projection cannot be written as a
non-trivial convex combination of states.

Thus, the extreme points of the convex set of states are precisely the
one-dimensional projections. Let ρ be the extreme state corresponding to
the one-dimensional projection on the ray Cu (where ‖u‖ = 1). Then, the
expectation m of the observable X is

m = Tr uu†X = u†Xu = ⟨u, Xu⟩,

and

var(X) = Tr uu†(X − m)² = ‖(X − m)u‖².

Thus, var(X) = 0 if and only if u is an eigenvector of X. So, even for
this extreme state, not all observables have degenerate distributions:
degeneracy of the state does not kill the uncertainty of the observables!

The product

1.11 Product spaces: If there are two statistical systems described by
classical probability spaces (Ω1, p1) and (Ω2, p2) respectively, then the
probability space (Ω1 × Ω2, p1 × p2), determined by

Pr({(i, j)}; p1 × p2) = Pr({i}; p1) Pr({j}; p2),

describes the two independent systems as a single system.

1.12 Product spaces: If (H1, ρ1) and (H2, ρ2) are two quantum systems,
then the quantum system with state space H1 ⊗ H2 and state ρ1 ⊗ ρ2 (which
is a non-negative definite operator of unit trace on H1 ⊗ H2) describes
the two independent quantum systems as a single system.

Dynamics

1.13 Reversible dynamics in Ω: This is determined by a bijective
transformation T : Ω → Ω. Then,

f ⇝ f ∘ T (for random variables);
P ⇝ P ∘ T⁻¹ (for distributions).

1.14 Reversible dynamics in H: This is determined by a unitary operator
U : H → H. Then, we have the dynamics of Heisenberg, X ⇝ U†XU for
X ∈ B(H), and of Schrödinger, ρ ⇝ UρU† for the state ρ.

1.2 Three Distinguishing Features


We now state the first distinguishing feature.

Proposition 1.2.1 Let E and F be projections in H such that EF ≠ FE.
Then, E ∨ F ≤ E + F is false.

Proof Suppose E ∨ F ≤ E + F. Then, 0 ≤ E ∨ F − E ≤ F. Since F is a
projection, any operator A with 0 ≤ A ≤ F satisfies A = FAF and hence
commutes with F. So,

F(E ∨ F − E) = (E ∨ F − E)F.

Since F ≤ E ∨ F gives F(E ∨ F) = (E ∨ F)F = F, this yields FE = EF, a
contradiction. □

Corollary 1.2.2 Suppose E and F are projections such that EF ≠ FE.
Then, for some state ρ, the inequality Tr ρ(E ∨ F) ≤ Tr ρE + Tr ρF is
false.

Proof By the above proposition, E ∨ F ≤ E + F is false; that is, there
exists a unit vector u such that

⟨u, (E ∨ F)u⟩ > ⟨u, Eu⟩ + ⟨u, Fu⟩.

Choose ρ to be the one-dimensional projection on the ray Cu. Then,

Tr(E ∨ F)ρ = ⟨u, (E ∨ F)u⟩,  Tr Eρ = ⟨u, Eu⟩,  Tr Fρ = ⟨u, Fu⟩. □

The second distinguishing feature is:

Proposition 1.2.3 (Heisenberg’s inequality) Let X and Y be observables
and let ρ be a state in H. Assume Tr ρX = Tr ρY = 0. Then,

var_ρ(X) var_ρ(Y) ≥ (Tr ρ ½{X, Y})² + (Tr ρ ½ i[X, Y])²
                  ≥ ¼ (Tr ρ i[X, Y])²,

where

{X, Y} = XY + YX; and
[X, Y] = XY − YX.

Proof For z ∈ C, we have

Tr ρ(X + zY)†(X + zY) ≥ 0.

If z = re^{iθ}, this reads

r² Tr ρY² + 2r ℜ(e^{−iθ} Tr ρYX) + Tr ρX² ≥ 0.

The left-hand side is a degree-two polynomial in the variable r. Since it
is always non-negative, it can have at most one real root, so its
discriminant is non-positive. Thus, for all θ,

(Tr ρX²)(Tr ρY²) ≥ (ℜ e^{−iθ} Tr ρYX)²
                 = (cos θ · Tr ρ (XY + YX)/2 + sin θ · Tr ρ i(XY − YX)/2)²
                 = (x cos θ + y sin θ)²,

where x = Tr ρ ½{X, Y} and y = Tr ρ ½ i[X, Y]. Note that the right-hand
side is maximal when cos θ = x/√(x² + y²) and sin θ = y/√(x² + y²), where
it takes the value x² + y², and the proposition follows. □
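The inequality can be checked numerically. The sketch below (a NumPy illustration; the state ρ and the observables are randomly generated, which is an assumption for the demo) centres two random Hermitian observables so that Tr ρX = Tr ρY = 0 and verifies var(X) var(Y) ≥ x² + y²:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

def rand_hermitian(n):
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (A + A.conj().T) / 2

# A random state rho: non-negative definite with unit trace
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
rho = B @ B.conj().T
rho /= np.trace(rho).real

# Observables centred so that Tr rho X = Tr rho Y = 0
X = rand_hermitian(n); X = X - np.trace(rho @ X).real * np.eye(n)
Y = rand_hermitian(n); Y = Y - np.trace(rho @ Y).real * np.eye(n)

var_X = np.trace(rho @ X @ X).real
var_Y = np.trace(rho @ Y @ Y).real
x = np.trace(rho @ (X @ Y + Y @ X)).real / 2        # Tr rho {X,Y}/2
y = np.trace(rho @ (1j * (X @ Y - Y @ X))).real / 2  # Tr rho i[X,Y]/2

assert var_X * var_Y >= x**2 + y**2 - 1e-9   # Heisenberg's inequality
assert x**2 + y**2 >= y**2                   # hence >= (Tr rho i[X,Y])^2 / 4
```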
Now we state the third distinguishing feature:
Extremal states (one-dimensional projections) are called pure states.
The set of all pure states in an n-dimensional complex Hilbert space is
a manifold of dimension 2n − 2. (The set of all extremal probability
distributions on a sample space of n points has cardinality n).

1.3 Measurements: von Neumann’s Collapse


Postulate
Suppose X is an observable (i.e. a Hermitian operator) with spectral
decomposition

X = Σ_{λ ∈ Sp(X)} λ E_λ.

Then, the measurement of X in the quantum state ρ yields the value λ with
probability Tr ρE_λ. If the observed value is λ, then the state collapses
to

ρ̃_λ = E_λ ρ E_λ / Tr ρE_λ.

The collapsed state ρ̃_λ has its support in the subspace E_λ(H).
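As a small illustration of the collapse postulate (a sketch, not taken from the notes: it measures the Pauli observable Z, defined in Lecture 2, in the state ρ = |+⟩⟨+| with |+⟩ = (|0⟩ + |1⟩)/√2):

```python
import numpy as np

# State rho = |+><+| with |+> = (|0> + |1>)/sqrt(2)
plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho = np.outer(plus, plus.conj())

# Observable Z = (+1) E_{+1} + (-1) E_{-1}, spectral projections E_lambda
E = {+1: np.diag([1.0, 0.0]), -1: np.diag([0.0, 1.0])}

# Pr(X = lambda) = Tr rho E_lambda, and the collapsed state E rho E / Tr rho E
probs = {lam: np.trace(rho @ P).real for lam, P in E.items()}
collapsed = {lam: E[lam] @ rho @ E[lam] / probs[lam] for lam in E}

# Each outcome occurs with probability 1/2 ...
assert abs(probs[+1] - 0.5) < 1e-12 and abs(probs[-1] - 0.5) < 1e-12
# ... and the collapsed state is supported in the corresponding eigenspace
assert np.allclose(collapsed[+1], np.diag([1.0, 0.0]))
assert np.allclose(collapsed[-1], np.diag([0.0, 1.0]))
```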

1.4 Dirac Notation


Elements of the Hilbert space H are called ket vectors and denoted by |u⟩.
Elements of the dual space H* are called bra vectors and denoted by ⟨u|.
The bra ⟨u| evaluated on the ket |v⟩ is the bracket ⟨u|v⟩, the scalar
product between u, v as elements of H.

The operator |u⟩⟨v| is defined by

|u⟩⟨v| (|w⟩) = ⟨v|w⟩ |u⟩.

It is a rank-one operator when u and v are non-zero. We have

Tr |u⟩⟨v| = ⟨v|u⟩;
(|u⟩⟨v|)† = |v⟩⟨u|;
|u1⟩⟨v1| |u2⟩⟨v2| · · · |un⟩⟨vn| = (⟨v1|u2⟩ ⟨v2|u3⟩ · · · ⟨v_{n−1}|un⟩) |u1⟩⟨vn|.

The scalar product ⟨u|v⟩ is anti-linear (conjugate-linear) in the first
variable and linear in the second variable.

1.4.1 Qubits
The Hilbert space h = C², with scalar product ⟨(a, b)ᵀ, (c, d)ᵀ⟩ = āc + b̄d,
is called a 1-qubit Hilbert space. Let

|0⟩ = (1, 0)ᵀ and |1⟩ = (0, 1)ᵀ.

Then

(a, b)ᵀ = a|0⟩ + b|1⟩,

and the ket vectors |0⟩ and |1⟩ form an orthonormal basis for h.

The Hilbert space h^{⊗n} = (C²)^{⊗n} is called the n-qubit Hilbert space.
If x1 x2 · · · xn is an n-length word from the binary alphabet {0, 1}, we let

|x1 x2 · · · xn⟩ = |x1⟩|x2⟩ · · · |xn⟩ = |x1⟩ ⊗ |x2⟩ ⊗ · · · ⊗ |xn⟩ = |x⟩,

where x = x1 · 2^{n−1} + x2 · 2^{n−2} + · · · + x_{n−1} · 2 + xn (that is,
as x1 x2 . . . xn varies over all n-length words, the integer x varies in
the range {0, 1, . . . , 2ⁿ − 1}).
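The correspondence between binary words and basis indices can be sketched as follows (the helper name ket_index is ours, introduced for illustration):

```python
def ket_index(bits):
    """Map the word x1 x2 ... xn to x = x1*2^(n-1) + ... + x_{n-1}*2 + xn."""
    x = 0
    for b in bits:
        x = 2 * x + b  # shift previous bits left, append the next one
    return x

assert ket_index([1, 0, 1]) == 5    # |101> is |5> in the 3-qubit space
assert ket_index([0, 0, 0]) == 0
assert ket_index([1] * 4) == 15     # |1111> is |15>
```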
Lecture 2

Quantum Gates and Circuits

2.1 Gates in n-qubit Hilbert Spaces


In ordinary (classical) computers, information is passed through a clas-
sical channel. Logic gates (like AND, OR, NOT) operate on these chan-
nels. Likewise, in a quantum computer, information is passed through a
quantum channel and it is operated upon by quantum gates. A quantum
gate is a unitary operator U in a (finite dimensional) Hilbert Space H.
Not all the classical gates are reversible (for example if a AND b = 0,
there are three possible values for the ordered pair (a, b)). On the con-
trary, all quantum gates are reversible.
If a gate U acts on an n-qubit Hilbert space H we depict it as in
Figure 2.1. If U acts on a single qubit it is represented pictorially as
shown in Figure 2.2.

Figure 2.1: A quantum circuit.

Figure 2.2: A gate U acting on a single qubit.

If the input is |ui and it passes through the gate U , then the output
is written as U |ui.


Any unitary operator U which acts on a single qubit can be written, in the
computational basis consisting of |0⟩ and |1⟩, as

U = e^{iα} [[a, b], [−b̄, ā]],

where |a|² + |b|² = 1. The action of the unitary operator U on the basis
states can be computed as shown below:

U|0⟩ = e^{iα} (a|0⟩ − b̄|1⟩),  U|1⟩ = e^{iα} (b|0⟩ + ā|1⟩).

By measurement on the n-qubit register of a quantum computer we usually
mean measuring the observable

X = Σ_{j=0}^{2ⁿ−1} j |j⟩⟨j|,

and it is indicated in circuits by the ammeter symbol, as in Figure 2.1.


Since by measuring we get two quantities, namely a classical value
and a (collapsed) quantum state, pictorially it is indicated by a dou-
ble line, as in Figure 2.1. The output consists of a value of X in the
range {0, 1, . . . , 2n − 1}, where the probability of the event {X = j} is
|hj|U |ui|2 , and a collapsed basis state |ji, where j is the observed value.
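The measurement statistics just described can be sketched numerically (a NumPy illustration; the unitary is randomly generated via QR decomposition, an arbitrary choice for the demo):

```python
import numpy as np

n = 2
dim = 2 ** n
rng = np.random.default_rng(1)

# A random unitary U obtained from the QR decomposition of a random matrix
A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
U, _ = np.linalg.qr(A)

u = np.zeros(dim); u[0] = 1.0       # input register |u> = |00>
probs = np.abs(U @ u) ** 2          # Pr(X = j) = |<j|U|u>|^2

assert abs(probs.sum() - 1) < 1e-12  # a genuine probability distribution
assert np.all(probs >= 0)
```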
As an example, let us simulate a Markov chain using a quantum
circuit. Consider the circuit in Figure 2.3.

Figure 2.3: A quantum circuit to simulate a Markov chain.

After each measurement, the observed classical parts j1, j2, . . . take
values in the space {0, 1, 2, . . . , 2ⁿ − 1} with the following properties:

Pr({j1}) = |⟨j1|U1|v⟩|²,  0 ≤ j1 ≤ 2ⁿ − 1;
Pr({j2 | j1}) = |⟨j2|U2|j1⟩|²,  0 ≤ j2 ≤ 2ⁿ − 1;
. . .
Pr({jk | j_{k−1}, j_{k−2}, . . . , j1}) = |⟨jk|Uk|j_{k−1}⟩|²,  0 ≤ jk ≤ 2ⁿ − 1;
. . .

Thus, we have simulated a classical Markov chain with state space


{0, 1, 2, . . . , 2n − 1}. The drawback here is that we need a separate
unitary operator for each of the 2n possible outcomes of the measure-
ment.
Problem Given a doubly stochastic matrix P = (p_ij) of size n × n, does
there exist a unitary matrix U = (u_ij) such that |u_ij|² = p_ij for all
i, j ∈ {1, 2, . . . , n}?
Existence of such a matrix will result in simplification of the quantum
circuit for simulating a Markov chain.

2.2 Quantum Gates


2.2.1 One qubit gates
In classical computing, the only interesting one-bit gate is the NOT gate.
In the quantum world, we have many 1-qubit gates. Some of them are
given below.

1. Pauli gates: There are three such gates and they are denoted by
X, Y, Z. The unitary matrices of X, Y, Z in the computational
basis are given by
X = [[0, 1], [1, 0]],  Y = [[0, −i], [i, 0]],  Z = [[1, 0], [0, −1]].

The unitary matrix X is also called the not gate because X|0i = |1i
and X|1i = |0i.
These gates are called Pauli gates because the unitary matrices
corresponding to these operators are the Pauli matrices σ1, σ2, σ3 of
quantum mechanics. Pauli matrices are the basic spin observables taking
values ±1. X, Y, Z are Hermitian, X² = Y² = Z² = I, and X, Y, Z
anticommute with each other, i.e. XY + YX = 0.
2. Hadamard gate: The unitary matrix corresponding to the Hadamard gate is

H = (1/√2) [[1, 1], [1, −1]].

In this case, H|0⟩ = (|0⟩ + |1⟩)/√2 and H|1⟩ = (|0⟩ − |1⟩)/√2. Its n-fold
tensor product H^{⊗n} is the Hadamard gate on n qubits, satisfying

H^{⊗n}|00 . . . 0⟩ = 2^{−n/2} Σ_{x ∈ {0,1}ⁿ} |x⟩,

and more generally

H^{⊗n}|x⟩ = 2^{−n/2} Σ_{y ∈ {0,1}ⁿ} (−1)^{x·y} |y⟩,

where x · y = x1 y1 + x2 y2 + · · · + xn yn.
£ ¤
3. Phase gate: The unitary matrix for this gate is

S = [[1, 0], [0, i]].

This gate changes the phase of the ket vector |1⟩ by i, so that |1⟩
becomes i|1⟩, and leaves the ket vector |0⟩ fixed.

4. π/8 gate: The unitary matrix for this gate is

T = [[1, 0], [0, e^{iπ/4}]] = e^{iπ/8} [[e^{−iπ/8}, 0], [0, e^{iπ/8}]].

This gate changes the phase of |1⟩ by e^{iπ/4}. (The name π/8 comes from
the factored form, in which the diagonal phases are e^{∓iπ/8}.)
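The algebraic facts above are easy to verify numerically. The following sketch (assuming NumPy) checks that the Pauli matrices are Hermitian, square to I and anticommute, and that H^{⊗n} has the stated (−1)^{x·y} amplitudes:

```python
import numpy as np
from functools import reduce

X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]])
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

# Pauli gates: Hermitian, square to the identity, mutually anticommuting
for P in (X, Y, Z):
    assert np.allclose(P, P.conj().T)
    assert np.allclose(P @ P, np.eye(2))
for P, Q in ((X, Y), (Y, Z), (Z, X)):
    assert np.allclose(P @ Q + Q @ P, 0)

# H^{tensor n}|x> = 2^{-n/2} sum_y (-1)^{x.y} |y>
n = 3
Hn = reduce(np.kron, [H] * n)
for x in range(2 ** n):
    # popcount of x & y is x.y mod 2
    signs = [(-1) ** bin(x & y).count("1") for y in range(2 ** n)]
    assert np.allclose(Hn[:, x], np.array(signs) * 2 ** (-n / 2))
```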

2.2.2 Two qubit gates


1. Controlled NOT: This gate (Figure 2.4 ) acts as a NOT gate on the
second qubit (target qubit) if the first qubit (control qubit) is in
the computational basis state |1i. So the vectors |01i and |00i are
unaltered, while the vector |10i gets modified into |11i and vice
versa.
The unitary matrix for this gate is

T = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]].
2.2. Quantum Gates 15

Figure 2.4: Two qubit gates. A CNOT gate and a SWAP gate.

The gate could also negate the content of the first qubit depending
on the second qubit. Such a gate will have a different unitary ma-
trix. The essential point is that a qubit can get negated depending
on a control qubit. The control qubit will always be denoted by a
solid dot in pictures.

2. Swap gate:
This gate (Figure 2.4) swaps the contents of the two qubits. Be-
cause the vectors |00i and |11i are symmetric, they are unaltered,
while the vector |01i gets mapped to |10i and vice versa.
The unitary matrix for this gate is

P = [[1, 0, 0, 0],
     [0, 0, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1]].

Exercise 2.2.1 Prove that the two circuits given in Figure 2.5
are equivalent.

Figure 2.5: Swap gate as a composition of three CNOT gates.

Solution To check the equivalence of the circuits on the left hand


side and right hand side we compute how the circuit on the right
hand side acts on the basis state |a, bi.

|a, bi → |a, a⊕bi → |a⊕(a⊕b), a⊕bi = |b, a⊕bi → |b, (a⊕b)⊕bi = |b, ai.
16 Lecture 2. Quantum Gates and Circuits

3. Controlled unitary: This is just like the controlled NOT, but in-
stead of negating the target qubit, we perform the unitary trans-
form prescribed by the matrix U (only if the control qubit is in
state |1i). It is represented schematically as shown in the first
diagram of Figure 2.6.
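The identity of Exercise 2.2.1 can also be checked at the level of matrices (a NumPy sketch; basis order |00⟩, |01⟩, |10⟩, |11⟩, with CNOT_21 denoting a CNOT whose control is the second qubit):

```python
import numpy as np

# CNOT with control on the first qubit (the matrix T in the text)
CNOT_12 = np.array([[1, 0, 0, 0],
                    [0, 1, 0, 0],
                    [0, 0, 0, 1],
                    [0, 0, 1, 0]])
# CNOT with control on the second qubit, target the first
CNOT_21 = np.array([[1, 0, 0, 0],
                    [0, 0, 0, 1],
                    [0, 0, 1, 0],
                    [0, 1, 0, 0]])
# The swap gate (the matrix P in the text)
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]])

# Three alternating CNOTs realise the swap: |a,b> -> |a,a+b> -> |b,a+b> -> |b,a>
assert np.array_equal(CNOT_12 @ CNOT_21 @ CNOT_12, SWAP)
```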

2.2.3 Three qubit gates

Figure 2.6: A controlled unitary gate, Toffoli gate and a Fredkin gate.

1. Toffoli gate: This (as in second diagram of Figure 2.6) is a double


controlled NOT gate. The only computational basis vectors which
get changed are |110i and |111i. The corresponding unitary matrix
is

U = [[1, 0, 0, 0, 0, 0, 0, 0],
     [0, 1, 0, 0, 0, 0, 0, 0],
     [0, 0, 1, 0, 0, 0, 0, 0],
     [0, 0, 0, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0, 0, 0, 1],
     [0, 0, 0, 0, 0, 0, 1, 0]].
2. Fredkin gate: This is a controlled swap gate (last diagram of
Figure 2.6). The corresponding unitary matrix is

U = [[1, 0, 0, 0, 0, 0, 0, 0],
     [0, 1, 0, 0, 0, 0, 0, 0],
     [0, 0, 1, 0, 0, 0, 0, 0],
     [0, 0, 0, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 1, 0],
     [0, 0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0, 0, 0, 1]].

2.2.4 Basic rotations

We describe in this part some basic rotation gates, each acting on a
single qubit. The basic rotation operators, which induce rotation by an
angle θ about the x, y and z axes respectively, are denoted by Rx(θ),
Ry(θ) and Rz(θ), and they are defined by the following equations:

Rx(θ) = [[cos(θ/2), −i sin(θ/2)], [−i sin(θ/2), cos(θ/2)]]
      = e^{−iθX/2} = cos(θ/2) I − i sin(θ/2) X;

Ry(θ) = [[cos(θ/2), −sin(θ/2)], [sin(θ/2), cos(θ/2)]]
      = e^{−iθY/2} = cos(θ/2) I − i sin(θ/2) Y;

Rz(θ) = [[e^{−iθ/2}, 0], [0, e^{iθ/2}]]
      = e^{−iθZ/2} = cos(θ/2) I − i sin(θ/2) Z.

More generally, R_n̂(θ) = cos(θ/2) I − i sin(θ/2) (n̂x X + n̂y Y + n̂z Z) is
the matrix corresponding to rotation by an angle θ about the axis with
direction cosines (n̂x, n̂y, n̂z).

Theorem 2.2.2 (Euler) Every 2 × 2 unitary matrix U can be expressed as

U = e^{iα} [[e^{−i(β+δ)/2} cos(γ/2), −e^{−i(β−δ)/2} sin(γ/2)],
            [e^{i(β−δ)/2} sin(γ/2),  e^{i(β+δ)/2} cos(γ/2)]]
  = e^{iα} Rz(β) Ry(γ) Rz(δ).

Corollary 2.2.3 Every 2 × 2 unitary matrix U can be expressed as

U = e^{iα} AXBXC,

where A, B and C are 2 × 2 unitary operators and ABC = I.

Proof By Theorem 2.2.2 we can write

U = e^{iα} Rz(β) Ry(γ) Rz(δ).

Set

A = Rz(β) Ry(γ/2),  B = Ry(−γ/2) Rz(−(β + δ)/2),  C = Rz((δ − β)/2).

Using XRy(θ)X = Ry(−θ) and XRz(θ)X = Rz(−θ), it is easy to check that A,
B and C satisfy the required conditions. □
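The decomposition in the proof can be verified numerically (a NumPy sketch; the angles are arbitrarily chosen for the demonstration):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]])

def Rz(t):
    return np.array([[np.exp(-1j * t / 2), 0], [0, np.exp(1j * t / 2)]])

def Ry(t):
    return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                     [np.sin(t / 2),  np.cos(t / 2)]])

alpha, beta, gamma, delta = 0.3, 1.1, 0.7, -0.4   # arbitrary angles
U = np.exp(1j * alpha) * Rz(beta) @ Ry(gamma) @ Rz(delta)

A = Rz(beta) @ Ry(gamma / 2)
B = Ry(-gamma / 2) @ Rz(-(beta + delta) / 2)
C = Rz((delta - beta) / 2)

assert np.allclose(A @ B @ C, np.eye(2))                   # ABC = I
assert np.allclose(np.exp(1j * alpha) * A @ X @ B @ X @ C, U)  # U = e^{ia} AXBXC
```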

Corollary 2.2.4 In Figure 2.7, the circuit on the left-hand side is
equivalent to the circuit on the right-hand side if AXBXC = e^{−iα}U,
ABC = I and

D = [[1, 0], [0, e^{iα}]].

Figure 2.7: Circuit implementing the controlled-U operation for a single
qubit gate U. Here α, A, B and C satisfy U = e^{iα} AXBXC and ABC = I.

Proof The equivalence of the circuits can be verified by checking how


the computational basis states evolve.

|0i|ui → |0iC|ui → |0iBC|ui → |0iABC|ui → D|0iABC|ui = |0i|ui.


|1i|ui → |1iC|ui → |1iXC|ui → |1iBXC|ui → |1iXBXC|ui
→ D|1iAXBXC|ui = eiα |1ie−iα U |ui = |1iU |ui.

Corollary 2.2.5 In Figure 2.8, the circuit on the left-hand side is
equivalent to the circuit on the right-hand side if V² = U.

Figure 2.8: Circuit for the C²(U) gate. V is any unitary operator
satisfying V² = U. The special case V = (1 − i)(I + iX)/2 corresponds to
the Toffoli gate.

Proof
|00i|ui → |00i|ui.
|01i|ui → |01iV |ui → |01iV † V |ui = |01iI|ui = |01i|ui.
|10i|ui → |11i|ui → |11iV † |ui → |10iV † |ui → |10iV V † |ui = |10i|ui.
|11i|ui → |11iV |ui → |10iV |ui → |11iV |ui → |11iV V |ui = |11iU |ui.
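The construction can be verified by multiplying out the matrices (a NumPy sketch; qubit order |q1 q2 q3⟩ with q3 the target, and the special case V = (1 − i)(I + iX)/2, for which V² = X and the circuit realizes the Toffoli gate):

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
V = (1 - 1j) * (np.eye(2) + 1j * X) / 2        # satisfies V @ V == X
P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])

def kron(*ms):
    out = np.eye(1)
    for m in ms:
        out = np.kron(out, m)
    return out

# Gates on three qubits |q1 q2 q3>, with q3 the target
cV_23   = kron(I2, P0, I2) + kron(I2, P1, V)             # V controlled by q2
cVd_23  = kron(I2, P0, I2) + kron(I2, P1, V.conj().T)    # V† controlled by q2
cV_13   = kron(P0, I2, I2) + kron(P1, I2, V)             # V controlled by q1
cnot_12 = kron(P0, I2, I2) + kron(P1, X, I2)             # CNOT q1 -> q2

# Rightmost factor acts first, matching the proof above
circuit = cV_13 @ cnot_12 @ cVd_23 @ cnot_12 @ cV_23

toffoli = np.eye(8)
toffoli[[6, 7]] = toffoli[[7, 6]]   # swap |110> and |111>

assert np.allclose(V @ V, X)
assert np.allclose(circuit, toffoli)
```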

Corollary 2.2.6 A Toffoli gate can be expressed as a composition of


controlled NOT’s and 1-qubit gates.

Proof Follows from the previous two corollaries.


¤

Exercise 2.2.7 Derive and verify that the circuit on the right hand side
of Figure 2.9 is a correct realization of the Toffoli gate using controlled
NOT and single qubit gates.

Figure 2.9: Implementation of the Toffoli gate using Hadamard, phase,
controlled NOT and π/8 gates.

2.3 Some Simple Circuits


2.3.1 Quantum teleportation
In quantum teleportation, Alice (the sender) can send a qubit to Bob (the
receiver) without using a quantum communication channel. In order to
achieve this, Alice and Bob together generate an EPR pair, i.e.
(|00⟩ + |11⟩)/√2, and share one qubit each.

Suppose Alice wants to send an unknown qubit |ψ⟩ = α|0⟩ + β|1⟩. She cannot
determine its state by measurement, because she has only one copy of it.
Even if Alice knew the state of the qubit |ψ⟩, transmitting it exactly
through a classical channel would not be possible, since the amplitudes α
and β are arbitrary complex numbers. But by making use of the EPR pair,
Alice can send the qubit |ψ⟩ to Bob just by sending two additional
classical bits of information.

Figure 2.10: Circuit used by Alice and Bob.

To accomplish the task Alice makes a circuit as shown in Figure 2.10.


Alice has access to the top two qubits. So all operations Alice does
involve only the top two qubits.
The initial state of the system is

|ψ0⟩ = |ψ⟩ (|00⟩ + |11⟩)/√2 = (1/√2) [α|0⟩(|00⟩ + |11⟩) + β|1⟩(|00⟩ + |11⟩)].

After the first CNOT gate the state of the system is

|ψ1⟩ = (1/√2) [α|0⟩(|00⟩ + |11⟩) + β|1⟩(|10⟩ + |01⟩)].

After she sends the first qubit through the Hadamard gate the state of
the system is

|ψ2⟩ = (1/2) [α(|0⟩ + |1⟩)(|00⟩ + |11⟩) + β(|0⟩ − |1⟩)(|10⟩ + |01⟩)].

Collecting the first two qubits, the state |ψ2⟩ can be re-written as

|ψ2⟩ = (1/2) [|00⟩(α|0⟩ + β|1⟩) + |01⟩(α|1⟩ + β|0⟩)
            + |10⟩(α|0⟩ − β|1⟩) + |11⟩(α|1⟩ − β|0⟩)].

When Alice makes a measurement on the two qubits she can control,
the state of Bob’s qubit is completely determined by the results of Alice’s
measurement on her first two qubits. Hence if Alice sends the results of
her measurement to Bob, he can apply appropriate gates on the qubit he
can access and get the state |ψi. The action of Bob can be summarized
as in the table below.

Alice measures | State of Bob’s qubit | Gates needed to get |ψ⟩
00             | α|0⟩ + β|1⟩          | I
01             | α|1⟩ + β|0⟩          | X
10             | α|0⟩ − β|1⟩          | Z
11             | α|1⟩ − β|0⟩          | ZX

Thus, the state of the first qubit |ψi is transferred to the third qubit
which is with Bob. The above algorithm implies that one shared EPR
pair and two classical bits of communication is a resource at least equal
to one qubit of quantum communication.
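The whole protocol can be simulated on a state vector (a NumPy sketch; the “unknown” qubit is randomly generated for the demonstration, and the correction gates are taken from the table above):

```python
import numpy as np

rng = np.random.default_rng(2)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)                  # the unknown qubit alpha|0> + beta|1>

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.diag([1.0, -1.0])
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])

epr = np.array([1, 0, 0, 1]) / np.sqrt(2)   # (|00> + |11>)/sqrt(2)
state = np.kron(psi, epr)                   # |psi_0>, qubit order |q1 q2 q3>

cnot_12 = np.kron(P0, np.kron(I2, I2)) + np.kron(P1, np.kron(X, I2))
state = np.kron(H, np.eye(4)) @ (cnot_12 @ state)   # |psi_2>

# For each outcome m = 2*m1 + m2 of Alice's measurement, Bob applies
# the correction gate from the table and recovers |psi>
fix = {0: I2, 1: X, 2: Z, 3: Z @ X}
overlaps = []
for m in range(4):
    bob = state.reshape(4, 2)[m]            # Bob's (unnormalised) qubit
    bob = bob / np.linalg.norm(bob)
    corrected = fix[m] @ bob
    overlaps.append(abs(np.vdot(corrected, psi)))

assert all(abs(o - 1) < 1e-9 for o in overlaps)  # |psi> recovered every time
```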

2.3.2 Superdense coding: quantum communication


through EPR pairs
If Alice and Bob initially share an EPR pair, Alice can send Bob two
bits of classical information by passing a single qubit as follows. Alice
makes a circuit as shown in Figure 2.11.

Figure 2.11: Circuit used by Alice and Bob; the initial shared state is
|ϕ0⟩ = (|00⟩ + |11⟩)/√2.



Alice selects the gate G according to the bits she wants to send. She
selects a gate according to the table below and applies it to the qubit
she possesses before transmitting it to Bob.

Bits to be sent | Gate to be used | Bob receives
00              | I               | (|00⟩ + |11⟩)/√2
01              | Z               | (|00⟩ − |11⟩)/√2
10              | X               | (|10⟩ + |01⟩)/√2
11              | iY              | (|01⟩ − |10⟩)/√2

The four possible states that Bob can receive are the so-called Bell
states or EPR pairs which constitute the Bell basis. Since the Bell states
form an orthogonal basis, they can be distinguished by measuring in the
appropriate basis. Hence when Bob receives the qubit sent by Alice he
has both the qubits. Then he does a measurement in the Bell basis and
finds out the message Alice wanted to send. In classical computation it is
impossible to send two bits of information by just passing a single bit.
So a qubit can carry more than one bit of classical information.
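Superdense coding can likewise be checked numerically (a NumPy sketch): applying Alice’s four gates to her half of the EPR pair produces the four Bell states, whose Gram matrix is the identity, so Bob can distinguish them perfectly.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0])
epr = np.array([1, 0, 0, 1]) / np.sqrt(2)   # (|00> + |11>)/sqrt(2)

# Alice's encoding, acting only on her (first) qubit
encode = {'00': I2, '01': Z, '10': X, '11': 1j * Y}
received = {bits: np.kron(G, I2) @ epr for bits, G in encode.items()}

# The four received states form the orthonormal Bell basis
keys = sorted(received)
gram = np.array([[np.vdot(received[a], received[b]) for b in keys]
                 for a in keys])
assert np.allclose(gram, np.eye(4))
```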

2.3.3 A generalization of “communication through EPR


states”
Let F be a finite abelian group of order n, for example (Z/2Z)^k with n = 2^k . Let F̂ denote its character group. Define the Hilbert space H = L^2 (F ) to be the space of functions from F to C under the standard inner product. The characteristic functions of elements of the group F , 1{x} where x ∈ F , form the standard orthonormal basis for H. Define |xi = 1{x} . For a ∈ F and α ∈ F̂ , define unitary operators Ua and Vα on H by

    (Ua f )(x) = f (x + a),    (Vα f )(x) = α(x) f (x),    f ∈ H, x ∈ F.

Ua can be thought of as translation by the group element a and Vα can be thought of as multiplication by the character α. For (a, α) ∈ F × F̂ , define the Weyl operator Wa,α = Ua Vα . It is a unitary operator.
2.3. Some Simple Circuits 23

Exercise 2.3.1 Wa,α Wb,β = α(b)Wa+b,αβ . i.e. the Wa,α form a projec-
tive unitary representation of the group F × F̂ . The term projective is
used to refer to the fact that the unitary operators Wa,α form a repre-
sentation of F × F̂ upto multiplication by a complex scalar (the number
α(b)) of modulus unity.

Exercise 2.3.2 Show that the only linear operators which commute
with Wa,α for all (a, α) ∈ F × F̂ , are the scalars. Hence, the Wa,α ’s
form an irreducible projective representation of the group F × F̂ , i.e.
the only subspaces of H which are invariant under every Wa,α are the
zero subspace and H itself.

Exercise 2.3.3 Show that the operators {Wa,α }(a,α)∈F ×F̂ are linearly
independent. Thus, they span the space B(H) of (bounded) linear op-
erators on H.

Exercise 2.3.4 Show that Wa,α † = α(a)W−a,ᾱ . Show also that Tr Wa,α = n if a = 0 and α is the trivial character, where n = |F |; otherwise Tr Wa,α = 0. Hence, prove that Tr Wa,α † Wb,β = nδ(a,α),(b,β) .
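The trace identities of Exercise 2.3.4 can be verified numerically for a small cyclic group. The sketch below takes F = Z/5Z with characters α_m (x) = e^{2πimx/5} (the choice n = 5 is arbitrary and illustrative):

```python
import numpy as np

n = 5
w = np.exp(2j * np.pi / n)

def U(a):
    # (U_a f)(x) = f(x + a), as a permutation matrix on C^n.
    M = np.zeros((n, n), dtype=complex)
    for x in range(n):
        M[x, (x + a) % n] = 1
    return M

def V(m):
    # (V_alpha f)(x) = alpha(x) f(x), with alpha(x) = w^{m x}.
    return np.diag([w ** (m * x) for x in range(n)])

def W(a, m):
    return U(a) @ V(m)

# Tr W_{a,alpha} = n only for (a, alpha) = (0, trivial); otherwise 0.
traces_ok = all(
    np.isclose(np.trace(W(a, m)), n if (a, m) == (0, 0) else 0)
    for a in range(n) for m in range(n)
)

# Orthogonality: Tr W(a,m)^dagger W(b,l) = n * delta_{(a,m),(b,l)}.
orth_ok = all(
    np.isclose(np.trace(W(a, m).conj().T @ W(b, l)),
               n if (a, m) == (b, l) else 0)
    for a in range(n) for m in range(n)
    for b in range(n) for l in range(n)
)
```

In particular, the n^2 Weyl operators are pairwise orthogonal in the trace inner product, which is one way to see their linear independence (Exercise 2.3.3).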

Exercise 2.3.5 Define

    |ψ0 i = (1/√n) Σ_{x∈F} |xi|xi.

Also define |(a, α)i = (Wa,α ⊗ I)|ψ0 i, where I is the identity operator on H. Then, {|(a, α)i}(a,α)∈F ×F̂ is an orthonormal basis for H ⊗ H.

Enumerate (a, α) as f (a, α) ∈ {1, 2, . . . , n^2 }, in some order. Define the Hermitian measurement operator

    X = Σ_{(a,α)∈F ×F̂} f (a, α) |(a, α)ih(a, α)| .

|ψ0 i is the entangled state which Alice and Bob share. Alice holds the
first log n qubits of the state while Bob holds the other log n qubits. To
send a message m ∈ [n^2 ], Alice applies the unitary transformation Wa,α ,
where f (a, α) = m, on her qubits. She then sends her qubits to Bob,
who then applies the measurement X on the 2 log n qubits which he now
has. The outcome of the measurement is m, which is exactly what Alice
intended to send. Thus Alice has communicated 2 log n classical bits of
information using only log n qubits of quantum communication.

Figure 2.12: Circuit used by Alice and Bob. Alice applies Wa,α to her log n qubits of the shared state |ψ0 i and sends them to Bob, who then measures X on all 2 log n qubits.

Exercise 2.3.6 In the case where F = Z/2Z, this reduces to communicating two classical bits at a time using one qubit, by the usual superdense coding technique!

2.3.4 Deutsch’s algorithm

This algorithm enables us to find out whether a function f : {0, 1} → {0, 1} is a constant function or not, by computing the function only once. In the classical theory of computation we must evaluate the function twice before making such a conclusion.
Corresponding to the function f we consider the unitary operator Uf , where Uf |xyi = |xi|y ⊕ f (x)i, x, y ∈ {0, 1}. The circuit for implementing the algorithm is shown in Figure 2.13.

Figure 2.13: Circuit for implementing Deutsch’s algorithm. The qubits |0i and |1i each pass through a Hadamard gate, then Uf : |xyi → |xi|y ⊕ f (x)i is applied, and finally H acts on the first qubit; |ψ0 i, |ψ1 i, |ψ2 i, |ψ3 i denote the intermediate states.



We follow the evolution of the circuit in Figure 2.13.

    |ψ0 i = |01i,
    |ψ1 i = (1/2)(|0i + |1i)(|0i − |1i).

Observe that Uf |xi (|0i − |1i)/√2 = (−1)^{f (x)} |xi (|0i − |1i)/√2. Hence

    |ψ2 i = ±(1/2)(|0i + |1i)(|0i − |1i)   if f (0) = f (1);
            ±(1/2)(|0i − |1i)(|0i − |1i)   otherwise.

    |ψ3 i = ±|0i (|0i − |1i)/√2   if f (0) = f (1);
            ±|1i (|0i − |1i)/√2   otherwise;

that is, |ψ3 i = ±|f (0) ⊕ f (1)i (|0i − |1i)/√2. Thus, by measuring the first bit we get the outcome f (0) ⊕ f (1) and the collapsed state ±|f (0) ⊕ f (1)i (|0i − |1i)/√2. In this algorithm, both superposition and interference of quantum states are exploited.
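The algorithm can be simulated with 4 × 4 matrices: the measured first qubit always equals f (0) ⊕ f (1), so one evaluation of Uf distinguishes constant from balanced functions. A NumPy sketch (function names are illustrative):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def Uf(f):
    # U_f |x>|y> = |x>|y xor f(x)> as a 4x4 permutation matrix.
    M = np.zeros((4, 4))
    for x in (0, 1):
        for y in (0, 1):
            M[2 * x + (y ^ f(x)), 2 * x + y] = 1
    return M

def deutsch(f):
    psi = np.kron(H, H) @ np.array([0, 1, 0, 0])   # H(x)H applied to |01>
    psi = np.kron(H, np.eye(2)) @ (Uf(f) @ psi)    # U_f, then H on qubit 1
    # Probability that the first qubit reads 1 (here exactly 0 or 1).
    p1 = abs(psi[2]) ** 2 + abs(psi[3]) ** 2
    return int(round(p1))                           # equals f(0) xor f(1)

results = {name: deutsch(f) for name, f in {
    "const0": lambda x: 0, "const1": lambda x: 1,
    "id": lambda x: x, "not": lambda x: 1 - x}.items()}
```

The two constant functions give outcome 0 and the two balanced ones give outcome 1.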

2.3.5 Arithmetical operations on a quantum computer


We now see how addition may be performed on a quantum computer. Let x, y be two (n + 1)-bit integers. Then we have

    x     = an an−1 . . . a0
    y     = bn bn−1 . . . b0
    x + y = cn sn sn−1 . . . s0

and

    x′       = an−1 an−2 . . . a0
    y′       = bn−1 bn−2 . . . b0
    x′ + y′  = cn−1 sn−1 sn−2 . . . s0 .

Note that s0 , s1 , . . . , sn−1 are the same in both these additions. Also,

    (cn , sn ) = (an bn ⊕ cn−1 (an ⊕ bn ), an ⊕ bn ⊕ cn−1 ).

Note that the Toffoli gate sends |abci → |abi|c ⊕ abi.



Figure 2.14: Circuit for adding two single bit numbers with carry. It maps |cn−1 i|an i|bn i|di to |cn−1 i|an i|an ⊕ bn ⊕ cn−1 i|d ⊕ an bn ⊕ cn−1 (an ⊕ bn )i.

Consider a subroutine for adding two single bit numbers with carry.
The circuit for this subroutine is shown in Figure 2.14.
If we measure the last two qubits in the circuit in Figure 2.14, we
get the outputs {sn }, {cn } and the collapsed states |sn i, |cn i provided
d = 0. Hence, using this subroutine we can add two n-bit numbers.
Addition:
We would like to count the number of Toffoli and CNOT gates used
by the circuit as a measure of complexity. Suppose αn Toffoli and βn
CNOT gates are used for adding two n-bit numbers. Then

αn+1 = αn + 2, βn+1 = βn + 2
=⇒ αn = α1 + 2(n − 1), βn = β1 + 2(n − 1).

Consider the circuit in Figure 2.15.


Figure 2.15: Circuit for adding two single bit numbers without carry. It maps |a0 i|b0 i|di to |a0 i|a0 ⊕ b0 i|d ⊕ a0 b0 i; the last two outputs are s0 and c0 when d = 0.

Thus, α1 = 1 and β1 = 1. This implies αn = βn = 2n − 1. So by this method of adding two n-bit numbers we need 2n − 1 Toffoli and 2n − 1 CNOT gates. The circuit for adding two n-bit numbers is shown in Figure 2.16.
Figure 2.16: Circuit for adding two n bit numbers without carry. The 1-bit ADD subroutine is applied successively to |ak i|bk i|0i for k = 0, 1, . . . , n − 1, producing |ak i|sk i|ck i, with each carry ck feeding the next stage.

Subtraction:
To evaluate a − b, where a, b are two n-bit numbers, add a and 2^n − b to get a + 2^n − b = en en−1 . . . e0 . Note that 2^n − b can be easily computed using only CNOT gates. If en = 0, then a − b = −(1 ⊕ en−1 )(1 ⊕ en−2 ) · · · (1 ⊕ e0 ). If en = 1, then a − b = en−1 en−2 . . . e0 .
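The one-bit ADD relations above translate directly into classical code; the sketch below ripples the carry exactly as the circuit of Figure 2.16 does (the bit width and helper names are illustrative choices, not from the notes):

```python
def add_bit(c_in, a, b):
    # One-bit full adder, as computed by the Toffoli/CNOT subroutine:
    # s = a xor b xor c_in,  c_out = a.b xor c_in.(a xor b).
    s = a ^ b ^ c_in
    c_out = (a & b) ^ (c_in & (a ^ b))
    return s, c_out

def ripple_add(abits, bbits):
    # abits, bbits: little-endian lists of 0/1 of equal length n.
    s, c = [], 0
    for a, b in zip(abits, bbits):
        bit, c = add_bit(c, a, b)
        s.append(bit)
    return s + [c]          # n sum bits plus the final carry

def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))

n = 6
ok = all(from_bits(ripple_add(to_bits(x, n), to_bits(y, n))) == x + y
         for x in range(2 ** n) for y in range(2 ** n))
```

Exhaustively checking all pairs of 6-bit numbers confirms the carry recursion.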

Exercise 2.3.7 Count the number of gates required in the above sub-
traction algorithm.

Exercise 2.3.8 Devise a circuit for addition (mod N ), multiplication and division.
Lecture 3

Universal Quantum Gates

3.1 CNOT and Single Qubit Gates are Universal
In classical computation the AND, OR and NOT gates are universal
which means that any boolean function can be realized using only these
three gates. In this lecture, we prove the quantum analogue of this
theorem. We show that any unitary transformation in an n-qubit Hilbert
space can be approximated by compositions of Hadamard, CNOT, phase
and π/8 gates to any desired degree of accuracy. We proceed by proving
two propositions from which the theorem immediately follows.

Lemma 3.1.1 Any n × n unitary matrix U can be expressed as a product of at most one phase factor and n(n − 1)/2 unitary matrices, each of which acts on a 2-dimensional coordinate plane.
 
Proof Let U = [uij ], 1 ≤ i, j ≤ n. If u21 = 0, do nothing. Otherwise, left multiply by a unitary matrix U1 which acts as the 2 × 2 block

    ( ᾱ  β̄ )
    ( −β  α )

on the first two coordinates and as the identity In−2 on the remaining ones, where −βu11 + αu21 = 0 and |α|^2 + |β|^2 = 1. Solving we get

    α = u11 / √(|u11 |^2 + |u21 |^2 )   and   β = u21 / √(|u11 |^2 + |u21 |^2 ).


Now consider M1 = U1 U . The (2, 1) entry of M1 is 0. If M1 (3, 1) is 0, we do nothing. Otherwise we left multiply by a unitary U2 acting in the (1, 3) plane to make the entry (3, 1) in the resulting matrix 0. Continuing this way we get

    Un−1 Un−2 · · · U1 U = [ v11 v12 . . . v1n ; 0 v22 . . . v2n ; . . . ; 0 vn2 . . . vnn ],

where |v11 | = 1. Orthogonality between the 1st and any other column shows that v12 = v13 = · · · = v1n = 0. Thus

    v11 ^{−1} Un−1 Un−2 · · · U1 U = [ 1 0 ; 0 W ],

where W is an (n − 1) × (n − 1) unitary matrix. The same procedure is repeated for the reduced matrix W . We repeat these operations till we get the identity matrix I. Pooling the phase factors we get e^{iα} Um Um−1 · · · U1 U = I where m ≤ n(n − 1)/2. It is to be noted that each Uj is an element of SU (2) acting in a two dimensional subspace. Transferring the Uj ’s to the right we get U = e^{iα} U1 † U2 † · · · Um † .
¤
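The elimination scheme in this proof can be coded directly: repeatedly left-multiply by two-level unitaries until only a diagonal of phases remains. A NumPy sketch (the random test matrix, seed and tolerance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_unitary(n):
    # QR of a complex Gaussian matrix, with column phases normalized.
    Q, R = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    return Q * (np.diag(R) / abs(np.diag(R)))

def two_level_factors(U):
    # Left-multiply by two-level unitaries until U becomes diagonal,
    # following the elimination scheme of Lemma 3.1.1.
    n = U.shape[0]
    M = U.astype(complex).copy()
    factors = []
    for col in range(n - 1):
        for row in range(col + 1, n):
            if np.isclose(M[row, col], 0):
                continue
            u, v = M[col, col], M[row, col]
            r = np.sqrt(abs(u) ** 2 + abs(v) ** 2)
            T = np.eye(n, dtype=complex)
            T[col, col], T[col, row] = u.conj() / r, v.conj() / r
            T[row, col], T[row, row] = -v / r, u / r
            factors.append(T)
            M = T @ M          # kills the (row, col) entry
    return factors, M          # M is now diagonal with unit-modulus entries

U = random_unitary(4)
factors, D = two_level_factors(U)
prod = np.eye(4, dtype=complex)
for T in factors:
    prod = T @ prod
# product(factors) @ U = D, so U = product(factors)^dagger @ D.
ok = np.allclose(prod @ U, D) and np.allclose(np.diag(np.diag(D)), D)
```

For a 4 × 4 unitary at most 4·3/2 = 6 two-level factors are produced, matching the bound in the lemma.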

Lemma 3.1.2 Any matrix U ∈ SU (2) acting in a 2-dimensional sub-


space can be realized using single qubit and r-controlled 1-qubit gates.

Proof Consider H = (C^2 )^{⊗n} with computational basis {|xi : x ∈ {0, 1}^n }. Consider a pair x, y which differ in exactly one place, say i:

    |xi = |ai|0i|bi,    |yi = |ai|1i|bi,

with a and b being words of length i − 1 and n − i respectively.

A unitary matrix U in the two dimensional plane spanned by |xi and |yi which leaves the other kets |zi fixed can be expressed as in Figure 3.1, with U replaced by the 2 × 2 block Ũ = ( α β ; −β̄ ᾱ ), where |α|^2 + |β|^2 = 1.

Figure 3.1: A generalized controlled U operation on n-qubits. Figure 3.2: A generalized controlled NOT operation on n-qubits.

Suppose now x and y differ in r places. Then we can construct a sequence

    x = x^(0) , x^(1) , x^(2) , . . . , x^(r−1) , x^(r) = y

of n length words such that x^(i) and x^(i+1) differ in exactly one position for all i = 0, 1, 2, . . . , r − 1. Let x, x^(1) differ at position j1 ; let x^(1) , x^(2) differ at position j2 ; and so on, with x^(r−1) , x^(r) differing at position jr .
Now a controlled NOT gate (it is not the CNOT gate) is applied on x with the j1 bit as target and the remaining n − 1 bits as control bits. The NOT gate acts on the j1 bit if the first bit is x1 , the second bit is x2 and so on. This can be implemented with X (NOT) and CNOT gates as shown in the Figures 3.2 and 3.3.

Figure 3.3: Realizing a generalized controlled operation.

We follow this by a controlled NOT on x^(1) with j2 as the target bit and the remaining n − 1 as the control bits. After continuing this up to x^(r−1) , we apply Ũ . Then we just do the reverse of the controlled NOT operations. This implements Ũ in the plane generated by |xi and |yi keeping all |zi fixed where z differs from both x and y.
Figure 3.3 shows how a generalized controlled 1-qubit gate can be re-
alized using 1-qubit gates and r-controlled 1-qubit gate. This completes
the proof.
¤

Lemma 3.1.3 If n ≥ 2, then an n-controlled 1-qubit gate can be realized


by (n − 1)-controlled 1-qubit gates.

Proof Let U = V^2 where U, V ∈ SU (2). Then we see that the two circuits in Figure 3.4 are equivalent.
¤

Figure 3.4: n-controlled 1-qubit gate as a composition of five (n − 1)-controlled 1-qubit gates (using V , V † and V ).

Exercise 3.1.4 Show that in Figure 3.4 the circuit on the left hand side
is equivalent to the circuit on the right hand side.

Lemma 3.1.5 A controlled 1-qubit gate can be realized using CNOT


and single qubit gates.
Proof Let U = e^{iα} AXBXC, where ABC = I, and let D = ( 1 0 ; 0 e^{iα} ). Then from Corollary 2.2.4 we know that the two circuits in Figure 3.5 are equivalent.

Proposition 3.1.6 Any arbitrary unitary matrix on an n-dimensional Hilbert space can be realized using phase, single qubit and CNOT gates.

Proof The proof follows from Lemma 3.1.1, Lemma 3.1.2, Lemma 3.1.3
and Lemma 3.1.5.
¤

Figure 3.5: Controlled 1-qubit gate as a composition of two CNOT and four 1-qubit gates.

Proposition 3.1.7 The group generated by H and e^{−iπZ/8} is dense in SU (2).

Proof H^2 = I, HZH = X, HY H = −Y , He^{−iπZ/8} H = e^{−iπX/8} and

    e^{−iπZ/8} e^{−iπX/8} = cos^2 (π/8) I − i sin(π/8) { cos(π/8)(X + Z) + sin(π/8) Y }
                          = R~n (α),

where cos α = cos^2 (π/8) and ~n = (cos(π/8), sin(π/8), cos(π/8)) / √(1 + cos^2 (π/8)). Similarly,

    He^{−iπZ/8} e^{−iπX/8} H = cos^2 (π/8) I − i sin(π/8) { cos(π/8)(X + Z) − sin(π/8) Y }
                             = Rm~ (α),

where m~ = (cos(π/8), − sin(π/8), cos(π/8)) / √(1 + cos^2 (π/8)). Now we need the following lemma.

Lemma 3.1.8 If cos α = cos^2 (π/8), then α is an irrational multiple of π.

Proof See Appendix.


¤
Any R~n (θ) can be approximated as closely as we want by a suitable power of R~n (α) because α is an irrational multiple of π. Similarly, any Rm~ (φ) can be approximated by a suitable power of Rm~ (α).
Since ~n and m~ are two linearly independent unit vectors, any U ∈ SU (2) can be written as U = e^{iψ} R~n (θ1 )Rm~ (θ2 )R~n (θ3 ). This is an immediate consequence of Euler’s theorem (Theorem 2.2.2). This completes the proof.
¤
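Density can be illustrated (though of course not proved) by brute force: short words in H and e^{−iπZ/8} already come very close to other gates. The sketch below searches all words up to length 8 for the X gate, comparing up to a global phase (the target and word-length cutoff are illustrative choices):

```python
import numpy as np
from itertools import product

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
T = np.diag([np.exp(-1j * np.pi / 8), np.exp(1j * np.pi / 8)])  # e^{-i pi Z/8}

def dist_up_to_phase(A, B):
    # Operator-norm distance after removing a global phase from A.
    tr = np.trace(A.conj().T @ B)
    phase = tr / abs(tr) if abs(tr) > 1e-12 else 1.0
    return np.linalg.norm(A * phase - B, 2)

target = np.array([[0, 1], [1, 0]])   # the X gate

best = min(
    dist_up_to_phase(np.linalg.multi_dot(w) if len(w) > 1 else w[0], target)
    for L in range(1, 9) for w in product([H, T], repeat=L)
)
```

Indeed T^4 = −iZ, so the length-6 word H T T T T H equals X up to the phase −i, and the search finds distance essentially zero.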

Now we are ready for the main theorem.

Theorem 3.1.9 The subgroup generated by the Hadamard gate H, the phase gate, CNOT and the π/8 gate is dense in the unitary group U (2).

Proof Immediate from Proposition 3.1.6 and Proposition 3.1.7.


¤

3.2 Appendix
In this section we first give all the definitions and results needed to
prove Lemma 3.1.8. The proofs which are routine are left out. The
reader may refer to [3, 8] for a comprehensive treatment. We start with
a few definitions.
A nonzero ring R with 1 ≠ 0 is called an integral domain if it has no zero divisors. In other words, it has the property that for A, B ∈ R, if AB = 0, then A = 0 or B = 0.
An ideal is called principal if it is generated by a single element.
An integral domain in which every ideal is principal is called a prin-
cipal ideal domain.

Exercise 3.2.1 Show that for any field k, k[x] is a principal ideal do-
main.
An element P (≠ 0, 1) of an integral domain R is called prime if the following is true: if P divides a product of two elements of R, then it divides one of the factors.
A nonconstant polynomial P ∈ F[x] is called irreducible if, whenever it is written as a product of two polynomials P1 , P2 ∈ F[x], either P1 or P2 is a constant.
A polynomial is called monic if the coefficient of the leading term
is 1.
A polynomial a0 + a1 x + · · · + an xn in Z[x] is called primitive if g.c.d.
(|a0 | , . . . , |an |) = 1 and an > 0.

Remark 3.2.2 Every nonzero polynomial P ∈ Q[x] can be written as


a product P = cP0 , where c is a rational number and P0 is a primitive
polynomial in Z[x]. Note that this expression for P is unique and the
polynomial P has integer coefficients if and only if c is an integer. In
that case |c| is the g.c.d. of the coefficients of P and c and the leading
coefficient of P have the same sign.

The rational number c which appears in Remark 3.2.2 is called the


content of P . If P has integer coefficients, then the content divides P
in Z[x]. Also, P is primitive if and only if its content is 1.

Lemma 3.2.3 Let ϕ : R −→ R′ be a ring homomorphism. Then for any element α ∈ R′ , there is a unique homomorphism Φ : R[x] −→ R′ which agrees with the map ϕ on constant polynomials and sends x to α.

Let Fp = Z/pZ. Lemma 3.2.3 gives us a homomorphism Z[x] → Fp [x]. This homomorphism sends a polynomial P = am x^m + · · · + a0 to its residue P̄ = ām x^m + · · · + ā0 modulo p.

Theorem 3.2.4 (Gauss’s Lemma) A product of primitive polynomi-


als in Z[x] is primitive.

Proof Let P and Q be two primitive polynomials in Z[x] and let R be their product. Obviously the leading coefficient of R is positive. To show that R is primitive, it is enough to show that no prime integer p divides all the coefficients of R. Consider the homomorphism Z[x] −→ Fp [x] defined above. Since P is primitive, its coefficients are not all divisible by p. So P̄ ≠ 0. Similarly, Q̄ ≠ 0. Since the polynomial ring Fp [x] is an integral domain, R̄ = P̄ Q̄ ≠ 0. Therefore p does not divide one of the coefficients of R. This implies that R is primitive.
¤

Proposition 3.2.5 1. Let F , G be polynomials in Q[x], and let F0 ,


G0 be the associated primitive polynomials in Z[x]. If F divides G
in Q[x], then F0 divides G0 in Z[x].
2. Let F, G ∈ Z[x] such that F is primitive and G is divisible by F in
Q[x], say G = F Q, with Q ∈ Q[x]. Then Q ∈ Z[x], and hence F
divides G in Z[x].
3. Let F , G be polynomials in Z[x]. If they have a common noncon-
stant factor in Q[x], then they have such a factor in Z[x] too.

Proof To prove (1), we may clear denominators so that F and G become primitive. Then (1) is a consequence of (2). To prove (2), by Remark 3.2.2 we can write Q = cQ0 , where Q0 is primitive and c ∈ Q. By Gauss’s Lemma, F Q0 is primitive, and the equation G = c(F Q0 ) shows that F Q0 is the primitive polynomial associated to G, so that c is the content of G. Since G ∈ Z[x], it follows that c ∈ Z, hence that Q ∈ Z[x]. Now let us prove (3). Suppose that F , G have a common factor H in Q[x]. We may assume that H is primitive, and then by (2) H divides both F and G in Z[x].
¤

Corollary 3.2.6 If a nonconstant polynomial F is irreducible in Z[x],


then it is irreducible in Q[x].

Proposition 3.2.7 Let F be an integer polynomial with positive leading


coefficient. Then F is irreducible in Z[x] if and only if either
1. F is a prime integer, or
2. F is a primitive polynomial which is irreducible in Q[x].

Proof Suppose that F is irreducible. As in Remark 3.2.2, we may


write F = cF0 , where F0 is primitive. Since F is irreducible, this cannot
be a proper factorization. So either c or F0 is 1. If F0 = 1, then F is
constant, and to be irreducible a constant polynomial must be a prime
integer. The converse is trivial.
¤

Lemma 3.2.8 In a principal ideal domain, an irreducible element is


prime.

Proof Let R be a principal ideal domain and F be an irreducible element in R. Let F |GH, G, H ∈ R. We assume that F ∤ G. Then the ideal generated by F and G is R (why?). Thus we may write F1 F + G1 G = 1 for some F1 , G1 ∈ R. This implies F1 F H + G1 GH = H. Hence F |H. This shows that F is prime.
¤

Theorem 3.2.9 Every irreducible element of Z[x] is a prime element.

Proof Let F be irreducible, and suppose F divides GH, where G,


H ∈ Z[x].
Case 1: F = p is a prime integer. Write G = cG0 and H = dH0 as in
Remark 3.2.2. Then G0 H0 is primitive, and hence some coefficient a of
G0 H0 is not divisible by p. But since p divides GH, the corresponding
coefficient, which is cda, is divisible by p. Hence p divides c or d, so p
divides G or H.

Case 2: F is a primitive polynomial which is irreducible in Q[x]. By


Lemma 3.2.8, F is a prime element of Q[x]. Hence F divides G or H in
Q[x]. By Proposition 3.2.5, F divides G or H in Z[x].
¤

Lemma 3.2.10 Let F = an x^n + · · · + a0 ∈ Z[x] be an integer polynomial, and let p be a prime integer which does not divide an . If the residue F̄ of F modulo p is irreducible, then F is irreducible in Q[x].

Proof This follows from the natural homomorphism Z[x] −→ Fp [x] (see Lemma 3.2.3). We may assume that F is primitive. Since p does not divide an , the degrees of F̄ and F are equal. If F factors in Q[x], then it also factors in Z[x] by Corollary 3.2.6. Let F = GH be a proper factorization in Z[x]. Since F is primitive, G and H have positive degree. Since deg F̄ = deg F and F̄ = Ḡ H̄, it follows that deg Ḡ = deg G and deg H̄ = deg H, hence that F̄ = Ḡ H̄ is a proper factorization, which shows that F̄ is reducible, a contradiction.
¤

Theorem 3.2.11 (Eisenstein criterion) Let F = an xn + · · · + a0 ∈


Z[x] be an integer polynomial, and let p be a prime integer. Suppose that
the coefficients of F satisfy the following conditions:

1. p does not divide an ;


2. p divides the other coefficients an−1 , . . . , a0 ;
3. p2 does not divide a0 .

Then F is irreducible in Q[x]. If F is primitive, it is irreducible in Z[x].

Proof Assume F satisfies the hypothesis. Let F̄ denote the residue of F modulo p. The conditions (1) and (2) imply that F̄ = ān x^n and that ān ≠ 0. If F is reducible in Q[x], then it will factor in Z[x] into factors of positive degree, say F = GH. Then Ḡ and H̄ divide ān x^n , and hence each of these polynomials is a monomial. Therefore all coefficients of G and of H, except the highest ones, are divisible by p. Let the constant coefficients of G, H be b0 , c0 . Then the constant coefficient of F is a0 = b0 c0 . Since p divides b0 and c0 , it follows that p^2 divides a0 , which contradicts (3). This shows that F is irreducible. The last assertion follows from Proposition 3.2.7.
¤

Corollary 3.2.12 Let p be a prime. Then the polynomial f (x) = xp−1 +


xp−2 + · · · + x + 1 is irreducible in Q[x]. (Such polynomials are called
cyclotomic polynomials, and their roots are the pth roots of unity.)

Proof First note that (x − 1)f (x) = x^p − 1. Now substituting x = y + 1 into this equation we get

    y f (y + 1) = (y + 1)^p − 1 = y^p + C(p, 1) y^{p−1} + · · · + C(p, p − 1) y,

where C(p, i) = p(p − 1) · · · (p − i + 1)/i! denotes the binomial coefficient. If 0 < i < p, then the prime p is not a factor of i!, so i! divides the product (p − 1) · · · (p − i + 1). This implies that C(p, i) is divisible by p. Dividing the expansion of y f (y + 1) by y shows that f (y + 1) satisfies the Eisenstein criterion and hence it is an irreducible polynomial. This implies that f (x) is also irreducible.
¤
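The divisibility argument is easy to check by machine for a particular prime; the sketch below verifies the three Eisenstein conditions for the coefficients of f (y + 1) when p = 7 (the choice of prime and the helper name are illustrative):

```python
from math import comb

def eisenstein(coeffs, p):
    # coeffs = [a_0, a_1, ..., a_n]; checks the three Eisenstein conditions.
    a0, an = coeffs[0], coeffs[-1]
    return (an % p != 0
            and all(a % p == 0 for a in coeffs[:-1])
            and a0 % (p * p) != 0)

p = 7
# y*f(y+1) = (y+1)^p - 1, so the coefficient of y^i in f(y+1) is C(p, i+1).
shifted = [comb(p, i + 1) for i in range(p)]   # constant term ... leading term
ok = eisenstein(shifted, p)
```

Here the constant term is C(7, 1) = 7, divisible by 7 but not 49, the leading term is C(7, 7) = 1, and every intermediate coefficient is a multiple of 7.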

Theorem 3.2.13 If cos α = cos^2 (π/8), then α is an irrational multiple of π.

Before proceeding to the proof of this theorem we shall establish a


lemma.

Lemma 3.2.14 Let λ = α/π, where α is as in Theorem 3.2.13. Then β = e^{2iπλ} is a root of the irreducible monic polynomial mβ = x^4 + x^3 + (1/4)x^2 + x + 1 (over Q[x]).

Proof Let mβ be the irreducible monic polynomial which has β as one of its roots. Note that sin 2πλ is not equal to zero. This means mβ has a complex root. Since its coefficients are rational it must also have the root β̄. Thus, mβ must be divisible by x^2 − 2Re{β}x + 1. Elementary computation shows that

    2Re{β} = −1/2 + √2.

So mβ is divisible by p(x) = x^2 − (√2 − 1/2)x + 1. Since p(x) has irrational coefficients and mβ has rational coefficients, mβ must have another irrational root, say δ. This implies mβ has another quadratic factor with real coefficients. This means that deg(mβ ) ≥ 4. Consider the polynomial p0 (x) = x^2 + (√2 + 1/2)x + 1. Multiplying p(x) and p0 (x) we get x^4 + x^3 + (1/4)x^2 + x + 1. From the construction β is a root of the polynomial

    mβ = x^4 + x^3 + (1/4)x^2 + x + 1,

which has no rational roots.
¤

Proof of Theorem 3.2.13 Note that the polynomial mβ (x) is not cyclotomic. Let us assume that λ is rational, say λ = p/q. Then β = e^{2πip/q} is a root of the cyclotomic polynomial

    Φq (x) = x^{q−1} + x^{q−2} + · · · + x + 1.

But Φq (x) = ∏p|q Φp (x), where p is prime. By Corollary 3.2.12 and Theorem 3.2.9 we know this is a prime factorization of Φq (x). Since mβ (x) is the minimal irreducible polynomial of β and Z[x] is a unique factorization domain (follows from Theorem 3.2.9), mβ (x) is prime. Thus, mβ (x) must divide Φq (x). Hence, mβ (x) must be a cyclotomic polynomial, a contradiction.
¤
Lecture 4

The Fourier Transform and an Application

4.1 Quantum Fourier Transform


The quantum Fourier transform F on a finite dimensional Hilbert space H of dimension N is defined as the linear operator whose action on an orthonormal basis |0i, . . . , |N − 1i is given by

    F |ji = (1/√N ) Σ_{k=0}^{N −1} e^{2πijk/N } |ki.

It can be easily verified that F defined as above is a unitary operator and the matrix of the transformation is M (F ) = [ujk ], where ujk = (1/√N ) e^{2πijk/N } .

Theorem 4.1.1 Let the dimension of the Hilbert space H be 2^n . Then the quantum Fourier transform F also has the following product representation. Let j = j1 2^{n−1} + j2 2^{n−2} + · · · + jn−1 2 + jn . Then

    F |ji = F |j1 j2 . . . jn i = 2^{−n/2} (|0i + e^{2πi 0.jn } |1i)(|0i + e^{2πi 0.jn−1 jn } |1i) · · · (|0i + e^{2πi 0.j1 j2 ...jn } |1i).

Proof

    F |ji = 2^{−n/2} Σ_{k=0}^{2^n −1} e^{2πijk/2^n } |ki
          = 2^{−n/2} Σ_{k1 =0}^{1} Σ_{k2 =0}^{1} · · · Σ_{kn =0}^{1} e^{2πij(k1 /2 + k2 /2^2 + ··· + kn /2^n )} |k1 k2 . . . kn i
          = 2^{−n/2} Σ_{k1 ,k2 ,...,kn } ⊗_{l=1}^{n} e^{2πijkl /2^l } |kl i
          = 2^{−n/2} ⊗_{l=1}^{n} (|0i + e^{2πij/2^l } |1i).

Since

    j/2^l = integer + jn−(l−1) /2 + · · · + jn−1 /2^{l−1} + jn /2^l ,

this gives

    F |ji = 2^{−n/2} ⊗_{l=1}^{n} (|0i + e^{2πi 0.jn−(l−1) jn−(l−2) ...jn } |1i)
          = 2^{−n/2} (|0i + e^{2πi 0.jn } |1i)(|0i + e^{2πi 0.jn−1 jn } |1i) · · · (|0i + e^{2πi 0.j1 j2 ...jn } |1i).

The circuit for implementing the Fourier transform on n-qubits is shown in Figure 4.1.

Figure 4.1: Efficient circuit for the quantum Fourier transform: the qubit |jk i passes through a Hadamard gate followed by the controlled rotations R2 , . . . , Rn−k+1 . The output on the k-th qubit from top is (1/√2)(|0i + e^{2πi 0.jn−k+1 ...jn } |1i). The correctness of the circuit follows from Theorem 4.1.1.

In Figure 4.1, H represents the Hadamard gate and the unitary transform corresponding to the gate Rk is

    Rk = ( 1 0 ; 0 e^{2πi/2^k } ).

From the product representation it is easy to see that this circuit does compute the Fourier transform. To see how the circuit works we consider the input state |j1 j2 . . . jn i and check how the system evolves. After the first Hadamard gate the state is

    (H|j1 i)|j2 j3 . . . jn i = (1/√2)(|0i + e^{2πi j1 /2} |1i)|j2 j3 . . . jn i.

After the controlled R2 gate acting on the first qubit the state is

    (R2 H|j1 i)|j2 j3 . . . jn i = (1/√2)(|0i + e^{2πi(j1 /2 + j2 /2^2 )} |1i)|j2 j3 . . . jn i.

Hence, after the sequence of the controlled Rk ’s on the first qubit, the state is

    (Rn Rn−1 . . . R2 H|j1 i)|j2 j3 . . . jn i = (1/√2)(|0i + e^{2πi(j1 /2 + j2 /2^2 + ··· + jn /2^n )} |1i)|j2 j3 . . . jn i
                                            = (1/√2)(|0i + e^{2πi 0.j1 j2 ...jn } |1i)|j2 j3 . . . jn i.

Similarly, we can compute the action on the other qubits. The final state of the system is

    2^{−n/2} (|0i + e^{2πi 0.j1 j2 ...jn } |1i)(|0i + e^{2πi 0.j2 ...jn } |1i) . . . (|0i + e^{2πi 0.jn } |1i).
Now, if we perform the swap operations, i.e. interchange the order of the qubits, we get

    2^{−n/2} (|0i + e^{2πi 0.jn } |1i)(|0i + e^{2πi 0.jn−1 jn } |1i) . . . (|0i + e^{2πi 0.j1 j2 ...jn } |1i),

which is exactly the quantum Fourier transform applied to |ji. The number of Hadamard gates used is n and the number of controlled rotation gates used is n(n − 1)/2. In the end at most ⌊n/2⌋ swap gates are used. Therefore, this circuit uses Θ(n^2 ) gates. The best classical algorithm to compute the Fourier transform on 2^n elements takes Θ(2^n log(2^n )) gates. Thus computing the Fourier transform with classical gates takes exponentially more time than computing the quantum Fourier transform on a quantum computer.
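Theorem 4.1.1 can be confirmed numerically by comparing the N × N Fourier matrix with the product form, column by column. A NumPy sketch for n = 4 qubits (the qubit count and function names are illustrative):

```python
import numpy as np

def qft_matrix(N):
    # u_{jk} = e^{2 pi i j k / N} / sqrt(N)
    j, k = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.exp(2j * np.pi * j * k / N) / np.sqrt(N)

def qft_product_form(j, n):
    # Product representation of Theorem 4.1.1 for the basis state |j>.
    jbits = [(j >> (n - 1 - i)) & 1 for i in range(n)]   # [j1, ..., jn]
    state = np.array([1.0 + 0j])
    for t in range(1, n + 1):
        # t-th factor from the left carries the phase 0.j_{n-t+1}...j_n.
        frac = sum(jbits[n - t + m - 1] / 2 ** m for m in range(1, t + 1))
        state = np.kron(state, np.array([1, np.exp(2j * np.pi * frac)])
                        / np.sqrt(2))
    return state

n = 4
F = qft_matrix(2 ** n)
ok = all(np.allclose(F[:, j], qft_product_form(j, n)) for j in range(2 ** n))
```

Every column of the Fourier matrix agrees with the corresponding tensor product of single-qubit factors.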

Remark 4.1.2 This fact cannot be exploited very well because it is


not possible to get access to the amplitudes in a quantum computer
by measurements. Moreover, it is very difficult to obtain the initial
state whose Fourier transform is to be computed. But quantum Fourier
transform makes phase estimation “easy” which enables us to factor an
integer efficiently in a quantum computer.

4.2 Phase Estimation


Let U be a unitary operator with eigenvector |ui and eigenvalue e^{2πiϕ} where 0 ≤ ϕ < 1. If |ui and the controlled U^{2^j} gates are available, then using the Fourier transform one can efficiently estimate the phase ϕ. The circuit for the first stage of the phase estimation is shown below.

Figure 4.2: First stage of the phase estimation circuit. Each of the t qubits |0i passes through a Hadamard gate and then controls one of the gates U^{2^0} , U^{2^1} , . . . , U^{2^{t−1}} acting on |ui; the k-th control qubit from the top is left in the state |0i + e^{2πi(2^{t−k} ϕ)} |1i. Normalization factors of 1/√2 have been omitted on the right side.

In the second stage of the phase estimation inverse Fourier transform


is applied on some selected qubits and a measurement is done on those
qubits in the computational basis. It will be shown that this yields a
good estimate of the phase.
The first stage of the phase estimation uses two registers. The first
register contains t qubits all in the state |0i and the second register
contains n qubits in the state |ui. The number of qubits t in the first
register is chosen according to the accuracy and the probability of success
required in the phase estimation procedure.

The final state after the first stage is

    2^{−t/2} (|0i + e^{2πi2^{t−1} ϕ} |1i)(|0i + e^{2πi2^{t−2} ϕ} |1i) · · · (|0i + e^{2πi2^0 ϕ} |1i)|ui
        = 2^{−t/2} Σ_{k=0}^{2^t −1} e^{2πiϕk} |ki|ui.

In the second stage inverse Fourier transform is applied on the first


register (the first t qubits). This gives us a good estimate of ϕ. A
schematic of the circuit is shown in Figure 4.3. To get a rough idea why
this is true we consider the case when ϕ can be expressed exactly in t
bits (in binary) ϕ = 0 · ϕ1 ϕ2 . . . ϕt . In this case the final state after stage
one can be written as
1
t (|0i + e2πi0·ϕt |1i)(|0i + e2πi0·ϕt−1 ϕt |1i) . . . (|0i + e2πi0·ϕ1 ϕ2 ...ϕt |1i)|ui.
2 2

If we look at the product representation of the Fourier transform it is


immediate that the above expression is the Fourier transform of the state
|ϕ1 ϕ2 . . . ϕt i. Hence measurement in the computational basis after the
inverse Fourier transform will give the exact value of ϕ. If ϕ cannot be represented in t bits, the observed value after measurement will be some ϕ̃. In the next section we analyze how good ϕ̃ is as an estimate of ϕ.

4.3 Analysis of the Phase Estimation Circuit


In this section we assume that 2^t ϕ is not an integer. We follow the evolution of the state |0i . . . |0i|ui (with t zeros) in the circuit depicted in Figure 4.3:

    |0i . . . |0i|ui → 2^{−t/2} Σ_{j=0}^{2^t −1} |ji|ui
                    → 2^{−t/2} Σ_j e^{2πijϕ} |ji|ui
                    → 2^{−t} Σ_{j,k} e^{−2πijk/2^t + 2πijϕ} |ki|ui
                    = 2^{−t} Σ_k [ (1 − e^{2πi(ϕ−k/2^t )2^t }) / (1 − e^{2πi(ϕ−k/2^t )} ) ] |ki|ui.

Figure 4.3: The schematic for the overall phase estimation circuit: the t qubits |0i pass through H^{⊗t} and control the powers U^j applied to |ui, after which the inverse Fourier transform FT† is applied to the first register.

Hence measurement of the first register (the first t qubits) produces a random variable x with values in X = {0, 1, . . . , 2^t − 1} with

    Pr(x = k) = 2^{−2t} | (1 − e^{2πi(ϕ−k/2^t )2^t }) / (1 − e^{2πi(ϕ−k/2^t )} ) |^2
              = 2^{−2t} sin^2 (π(ϕ − k/2^t )2^t ) / sin^2 (π(ϕ − k/2^t )).

If the observed value is k, then k/2^t is the desired estimate for ϕ. Let a = ⌊2^t ϕ⌋, d = ϕ − a/2^t and let δ be a positive integer < 2^{t−1} . We set δ/2^t to be the desired tolerance of error in the estimate of ϕ. In other words, the observed value of the random variable x should lie in

    Xδ = {a − δ + 1 (mod 2^t ), a − δ + 2 (mod 2^t ), . . . , a + δ (mod 2^t )}.

We now obtain a lower bound for the probability Pr(Xδ ). We will need the following elementary fact which we leave as an exercise.

Exercise 4.3.1 Show that for any θ ∈ (0, π/2], sin θ/θ ≥ 2/π.

Note that for −2^{t−1} < k ≤ 2^{t−1} ,

    Pr(x = a + k (mod 2^t )) = 2^{−2t} sin^2 (π(ϕ − (a + k)/2^t )2^t ) / sin^2 (π(ϕ − (a + k)/2^t ))
                             = 2^{−2t} sin^2 (π(2^t d − k)) / sin^2 (π(d − k/2^t ))
                             ≤ 2^{−2t} / sin^2 (π(d − k/2^t ))
                             ≤ 1/(4(k − 2^t d)^2 )   (follows from Exercise 4.3.1).

Now we are ready to give the bound.

    Pr(X − Xδ ) = Σ_{j=−2^{t−1} +1}^{−δ} Pr(x = a + j (mod 2^t )) + Σ_{j=δ+1}^{2^{t−1}} Pr(x = a + j (mod 2^t ))
                ≤ (1/4) [ Σ_{j=−2^{t−1} +1}^{−δ} 1/(j − 2^t d)^2 + Σ_{j=δ+1}^{2^{t−1}} 1/(j − 2^t d)^2 ]
                < (1/4) [ Σ_{j=−2^{t−1} +1}^{−δ} 1/j^2 + Σ_{j=δ+1}^{2^{t−1}} 1/(j − 1)^2 ]   (since 0 < 2^t d < 1)
                = (1/2) Σ_{j=δ}^{2^{t−1} −1} 1/j^2
                < (1/2) ∫_{δ−1}^{∞} dy/y^2
                = 1/(2(δ − 1)).
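The distribution Pr(x = k) and the tail bound 1/(2(δ − 1)) can be checked numerically; the sketch below uses an arbitrary phase ϕ with no exact t-bit expansion (t = 8 and δ = 4 are illustrative choices):

```python
import numpy as np

t = 8
phi = 0.3141592            # a phase with no exact t-bit binary expansion
ks = np.arange(2 ** t)

# Pr(x = k) from the analysis above (valid since 2^t * phi is not an integer).
num = np.sin(np.pi * (phi - ks / 2 ** t) * 2 ** t) ** 2
den = np.sin(np.pi * (phi - ks / 2 ** t)) ** 2
pr = num / (2 ** (2 * t) * den)

a = int(np.floor(2 ** t * phi))
delta = 4
# Outcomes within tolerance delta/2^t of phi (indices taken mod 2^t).
good = {(a + j) % 2 ** t for j in range(-delta + 1, delta + 1)}
p_bad = 1.0 - sum(pr[k] for k in good)

ok = np.isclose(pr.sum(), 1.0) and p_bad <= 1 / (2 * (delta - 1))
```

The probabilities sum to 1, and the mass outside Xδ indeed stays below 1/(2(δ − 1)).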

To approximate ϕ up to the first r bits (where r < t) in the binary expansion, we need to choose δ ≤ 2^{t−r} − 1. If we use t = r + p qubits in the first register of the phase estimation circuit, the probability of obtaining an estimate of the phase within the desired error margin is at least 1 − 1/(2(2^p − 2)). Let 1 − ε be the probability of obtaining an estimate within the desired tolerance of error. Then

    ε ≥ 1/(2(2^p − 2))  ⇒  p ≥ log( 2 + 1/(2ε) ).

Hence, if the desired accuracy is r and the required probability of suc-


cessfully getting such an estimate is 1 − ², then we need to choose
µ ¶
1
t ≥ r + log 2 + .

It is easy to see that the phase-estimation circuit uses polynomially many


gates.
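This bookkeeping is easy to check numerically; a minimal sketch (the function name is ours, not from the notes):

```python
import math

def qubits_needed(r, eps):
    """First-register size t = r + ceil(log2(2 + 1/(2*eps))) so that the
    first r bits of the phase are correct with probability >= 1 - eps,
    per the error analysis above."""
    return r + math.ceil(math.log2(2 + 1 / (2 * eps)))

# A 25% failure budget costs only two extra qubits:
assert qubits_needed(10, 0.25) == 12
```

The striking feature is that the number of extra qubits grows only logarithmically in 1/\epsilon.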
Lecture 5

Order Finding

5.1 The Order Finding Algorithm


For any two positive integers x, y denote their greatest common divisor
(GCD) by (x, y). For any positive integer N let Z∗N denote the set
{x | x ∈ N, (x, N ) = 1}. Under multiplication modulo N , Z∗N is an
abelian group. Let ϕ(N ) be the order of this group. Then ϕ(·) is called
the Eulers’s ϕ function. The order of an element x ∈ Z∗N is defined to
be the smallest positive integer r satisfying xr = 1 (mod N ). In the
classical model of computation finding the order of an element in Z∗N is
considered to be a hard problem. Using the phase estimation procedure
of quantum computation we shall demonstrate how one can determine
the order of an element with high probability using only a polynomial
number of gates.
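Classically, the order can be found by brute-force multiplication, which may take on the order of N steps; a toy sketch (function name ours) of the quantity the quantum algorithm computes with polynomially many gates:

```python
from math import gcd

def order(x, N):
    """The smallest r >= 1 with x**r = 1 (mod N); requires (x, N) = 1."""
    assert gcd(x, N) == 1
    r, y = 1, x % N
    while y != 1:
        y = (y * x) % N
        r += 1
    return r

assert order(2, 15) == 4      # powers of 2 mod 15: 2, 4, 8, 1
```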
To solve the problem of order finding using a quantum computer we
first translate the problem into a problem concerning unitary operators
as follows.
Let N be an L bit number, so that

N = 2^{j_0} + 2^{j_1} + 2^{j_2} + \cdots + 2^{j_{k-1}}, \quad 0 \le j_0 < j_1 < j_2 < \cdots < j_{k-1} < L.

Let the Hilbert space generated by L qubits be denoted by H = (\mathbb{C}^2)^{\otimes L}.
Fix an element x \in Z^*_N. We define a unitary operator U in H by

U|y\rangle = |xy \,(\mathrm{mod}\ N)\rangle  if 0 \le y < N;
U|y\rangle = |y\rangle                       if N \le y \le 2^L - 1.

It is to be noted that if |x y_1 \,(\mathrm{mod}\ N)\rangle = |x y_2 \,(\mathrm{mod}\ N)\rangle for 0 \le
y_1 < y_2 < N, then x(y_2 - y_1) \equiv 0 \,(\mathrm{mod}\ N). But the GCD of x
and N is 1, so N \mid (y_2 - y_1), which is impossible. This means U is a
permutation matrix and hence unitary.


Let

|u_s\rangle = \frac{1}{\sqrt{r}} \sum_{k=0}^{r-1} e^{-2\pi i sk/r} \, |x^k \,(\mathrm{mod}\ N)\rangle.

We observe that

U|u_s\rangle = \frac{1}{\sqrt{r}} \sum_{k=0}^{r-1} e^{-2\pi i sk/r} \, |x^{k+1} \,(\mathrm{mod}\ N)\rangle
            = e^{2\pi i s/r} \, \frac{1}{\sqrt{r}} \sum_{k=0}^{r-1} e^{-2\pi i sk/r} \, |x^k \,(\mathrm{mod}\ N)\rangle.

Thus |u_s\rangle is an eigenvector of the unitary matrix U with correspond-
ing eigenvalue e^{2\pi i s/r}, for all s \in \{0, 1, 2, \ldots, r - 1\}.
Now if we use the phase estimation algorithm we will get enough in-
formation to obtain the order r. But in order to be able to use phase
estimation we must be able to implement the controlled U^{2^j} operations
efficiently. The other requirement is that we must be able to prepare
the eigenvectors accurately.

The controlled U^{2^j} operations can be implemented using O(L^3) gates
as outlined in Appendix 2. But the second requirement seems impossible
because we need to know r in order to prepare the eigenstates. This
problem can be solved by observing that

\frac{1}{\sqrt{r}} \sum_{s=0}^{r-1} |u_s\rangle = |1\rangle.
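Both the eigenvalue relation and this superposition identity can be verified numerically for x = 2, N = 15 (where r = 4); a minimal sketch in plain Python:

```python
import cmath

x, N = 2, 15
r = 4  # the order of 2 mod 15, since 2^4 = 16 = 1 (mod 15)

# |u_s> = (1/sqrt(r)) sum_k e^{-2 pi i s k / r} |x^k mod N>, as a vector in C^N
def u(s):
    vec = [0j] * N
    for k in range(r):
        vec[pow(x, k, N)] += cmath.exp(-2j * cmath.pi * s * k / r) / r**0.5
    return vec

# U acts by |y> -> |x*y mod N>; check U|u_s> = e^{2 pi i s / r} |u_s>
for s in range(r):
    Uu = [0j] * N
    for y, amp in enumerate(u(s)):
        Uu[(x * y) % N] += amp
    eig = cmath.exp(2j * cmath.pi * s / r)
    assert all(abs(Uu[y] - eig * u(s)[y]) < 1e-12 for y in range(N))

# (1/sqrt(r)) sum_s |u_s> collapses to the computational basis state |1>
avg = [sum(u(s)[y] for s in range(r)) / r**0.5 for y in range(N)]
assert all(abs(avg[y] - (1.0 if y == 1 else 0.0)) < 1e-12 for y in range(N))
```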

Thus in the phase estimation procedure, if we set the number of qubits
in the first register to t = 2L + 1 + \lceil \log(2 + \frac{1}{2\epsilon}) \rceil and the L qubits in the
second register to the state |1\rangle, then for each s \in \{0, 1, \ldots, r - 1\} we will
get an estimate of the phase \tilde{\varphi} \approx \frac{s}{r} correct up to the first 2L + 1 bits
with probability at least \frac{1-\epsilon}{r}. The circuit is shown in Figure 5.1.
It can be checked that if in the phase estimation circuit we feed in
the superposition of eigenstates

|u\rangle = \sum_{s=0}^{r-1} c_s |u_s\rangle, \quad \text{where } \sum_{s=0}^{r-1} |c_s|^2 = 1,

then the output state before measurement will be

\sum_{s,k} c_s \, \frac{1}{2^t} \, \frac{1 - e^{2\pi i(\varphi_s - k/2^t)2^t}}{1 - e^{2\pi i(\varphi_s - k/2^t)}} \, |k\rangle |u_s\rangle.

[Figure 5.1: Quantum circuit for the order finding algorithm. Register 1
(t qubits) is acted on by H^{\otimes t} and, at the end, by the inverse Fourier
transform FT^\dagger; Register 2 (L qubits) carries the controlled x^j mod N
operations. The first register is initialized to the state |0\rangle and the
second register to the state |1\rangle.]

Hence on measuring the first t qubits we will get the value of the phase
\varphi_s correct up to 2L + 1 bits with probability at least |c_s|^2 (1 - \epsilon).

Now our job is to extract the value of r from the estimated phase.
We know the phase \tilde{\varphi} \approx \frac{s}{r} correct up to 2L + 1 places. If this estimate
is close enough, we should be able to get r, because we know that \tilde{\varphi}
approximates a ratio of two bounded integers. This task is accomplished
efficiently using the following result from number theory.

Theorem 5.1.1 If \frac{s}{r} is a rational number such that \left| \frac{s}{r} - \tilde{\varphi} \right| \le \frac{1}{2r^2},
then \frac{s}{r} is a convergent of the continued fraction for \tilde{\varphi} and hence can be
efficiently computed using the continued fraction algorithm.

Proof See Appendix 3.
¤
We know that |\frac{s}{r} - \tilde{\varphi}| \le 2^{-(2L+1)} \le \frac{1}{2r^2}, since r \le N \le 2^L. So if
we now use the continued fraction algorithm we will get the fraction \frac{s'}{r'}
which is equal to \frac{s}{r} with (s', r') = 1. Thus if s and r are relatively prime
then we get the order of the element x. We know that the number of
positive integers less than r and relatively prime to r is at least \frac{0.1 \, r \log\log r}{\log r} (see
Appendix 4). The order finding algorithm fails if the phase estimation
algorithm gives a bad estimate or if s and r share a common factor. The
probability that the first failure does not occur is at least (1 - \epsilon) and that
the second does not occur is at least \frac{0.1 \log\log N}{\log N}. Hence if we repeat the algorithm O(L) times
we will find the order with probability greater than 1 - \delta for any fixed
\delta \in (0, 1]. Note that the algorithm presented here can be implemented
with O(L^4) gates.

The algorithm can be summarized as follows.

Inputs: Relatively prime integers N and x.
Output: Order of x.
Runtime: O(L^4).

Procedure:

Initialize: Set "current smallest" equal to N.

1. Prepare U_{(x,N)}, the equivalent sequence of controlled U^{2^j}
   operations.
2. |0\rangle|1\rangle                                         Initial state.
3. \to \frac{1}{\sqrt{2^t}} \sum_{j=0}^{2^t-1} |j\rangle|1\rangle                  Create superposition.
4. \to \frac{1}{\sqrt{2^t}} \sum_{j=0}^{2^t-1} |j\rangle|x^j \,(\mathrm{mod}\ N)\rangle     Apply U_{(x,N)}.
   \approx \frac{1}{\sqrt{r 2^t}} \sum_{s=0}^{r-1} \sum_{j=0}^{2^t-1} e^{2\pi i sj/r} |j\rangle|u_s\rangle
5. \to \frac{1}{\sqrt{r}} \sum_{s=0}^{r-1} |\tilde{\varphi}_s\rangle|u_s\rangle         Apply inverse FT to first register.
6. Measure the first register to obtain \tilde{\varphi}.
7. Get the denominators of all convergents of \tilde{\varphi}, using
   Theorem 5.1.2 of Appendix 3.
8. For all integers i obtained in Step 7, check whether x^i = 1 \,(\mathrm{mod}\ N)
   and keep the smallest of them.
9. Update "current smallest".
10. Repeat steps 1 to 9 O(\log N) times.
11. Return "current smallest". With high probability this is the order.

Appendix 1: Classical reversible computation


All quantum gates are reversible (i.e., from the output we can uniquely
recover the input). But classical gates like AND and OR are not
reversible, so a quantum circuit cannot exist for such a gate.
However, by adding a few extra wires we can obtain a gate which is
reversible and on whose specified wires the required function appears. This is
called a reversible classical gate. If the 'size' of the circuit is measured by
the number of 'wires', then this procedure uses only a constant multiple
of the number of wires used in the earlier classical circuit. The latter
gate can be implemented using a quantum gate. Reversible classical
gates can be built using the Fredkin gate (see Figure 5.2).

[Figure 5.2: Fredkin gate (controlled swap), with inputs x, y, c and
outputs x', y', c'.]

If we set x to 0 then x' will be y \wedge c, which is the AND gate. If we
set x = 0 and y = 1 then we get c on x' and \neg c on y'. Thus we get
both NOT and FANOUT gates. CNOT can also be used to copy classical
bits. In the process of constructing functional equivalents of the classical
gates using quantum gates some extra wires have been introduced. The
outputs of these wires are called junk. But if the junk is some arbitrary
function of the input then the circuit may not behave as a quantum gate
for the function f(x). So instead of junk output we would like to
have some fixed output on the extra wires. This model is called clean
computation. This can be done as shown in Figures 5.3, 5.4 and 5.5.
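The gate's behavior is easy to tabulate; a small sketch (function name ours) confirming the AND, NOT and FANOUT constructions above:

```python
def fredkin(x, y, c):
    """Controlled swap: if the control bit c is 1, swap x and y."""
    return (y, x, c) if c == 1 else (x, y, c)

# AND: with x = 0, the first output wire carries y AND c
assert all(fredkin(0, y, c)[0] == (y & c) for y in (0, 1) for c in (0, 1))

# NOT and FANOUT: with x = 0, y = 1, the first two outputs are (c, NOT c)
assert all(fredkin(0, 1, c)[:2] == (c, 1 - c) for c in (0, 1))

# Reversibility: applying the gate twice recovers the input
assert all(fredkin(*fredkin(x, y, c)) == (x, y, c)
           for x in (0, 1) for y in (0, 1) for c in (0, 1))
```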

[Figure 5.3: A reversible gate: input bits and clean bits go in; output
bits, junk bits and clean bits come out.]


[Figure 5.4: Clean computation. Computing x \mapsto \langle x, f(x)\rangle using the
circuits C and C^{-1}, with the ancillary clean bits restored to 0's.]

Appendix 2: Efficient implementation of controlled U^{2^j} operations

To compute the sequence of controlled U^{2^j} operations we have to com-
pute the transformation

|z\rangle|y\rangle \to |z\rangle \, U^{z_t 2^{t-1}} \cdots U^{z_1 2^0} |y\rangle
          = |z\rangle \, |x^{z_t 2^{t-1}} \times \cdots \times x^{z_1 2^0} y \,(\mathrm{mod}\ N)\rangle
          = |z\rangle \, |x^z y \,(\mathrm{mod}\ N)\rangle.

Thus the sequence of controlled U^{2^j} operations is equivalent to mul-
tiplying the content of the second register by the modular exponential
x^z \,(\mathrm{mod}\ N), where z is the content of the first register. This can be
computed using clean reversible computation (see Appendix 1).
This is achieved by first reversibly computing the function x^z \,(\mathrm{mod}\ N)
in a third register and then multiplying the contents of the third and
the second registers, after which each qubit in the third register is
restored to the state |0\rangle. The task is accomplished in two stages. In the
first stage we compute x^{2^j} \,(\mathrm{mod}\ N) for all j \in \{1, 2, \ldots, t-1\} by successively
squaring x \,(\mathrm{mod}\ N), where t = 2L + 1 + \lceil \log(2 + \frac{1}{2\epsilon}) \rceil = O(L).

[Figure 5.5: Computing a bijective function f by clean computation,
using the circuits C_f and C_f^{-1}.]

Each multiplication uses at most O(L^2) gates (indeed an
O(L \log L \log\log L) algorithm using the FFT is known; see [2, 10]) and
there are t - 1 such multiplications. Hence in this step at most O(L^3)
gates are used. In the second stage we compute x^z \,(\mathrm{mod}\ N) using the
identity

x^z \,(\mathrm{mod}\ N) = (x^{z_t 2^{t-1}} \,(\mathrm{mod}\ N))(x^{z_{t-1} 2^{t-2}} \,(\mathrm{mod}\ N)) \cdots (x^{z_1 2^0} \,(\mathrm{mod}\ N)).

Clearly this operation also uses at most O(L^3) gates. Hence using O(L^3)
gates we compute the transformation |z\rangle|y\rangle \to |z\rangle|x^z y \,(\mathrm{mod}\ N)\rangle.
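The two stages above amount to ordinary modular exponentiation by repeated squaring; a classical sketch (function name ours; Python's built-in pow(x, z, N) does the same thing):

```python
def mod_exp(x, z, N):
    """x**z mod N in the two stages described above: square x repeatedly
    to get x^(2^j) mod N, then multiply the powers selected by the
    binary digits z_j of z."""
    squares = []           # stage 1: x^(2^j) mod N for each bit position j
    s = x % N
    while (1 << len(squares)) <= z or not squares:
        squares.append(s)
        s = (s * s) % N
    result = 1             # stage 2: multiply the selected powers
    for j, sq in enumerate(squares):
        if (z >> j) & 1:
            result = (result * sq) % N
    return result

assert mod_exp(7, 13, 15) == pow(7, 13, 15)
```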

Appendix 3: Continued fraction algorithm

A finite continued fraction in n + 1 variables is defined as

a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_n}}}}.

For convenience it is also written as [a_0, a_1, \ldots, a_n]. The nth convergent
of a continued fraction [a_0, a_1, \ldots, a_N] is defined as [a_0, a_1, \ldots, a_n] for
n \le N.
The nth convergent is easily computed by the following theorem.

Theorem 5.1.2 If p_n and q_n are defined by

p_0 = a_0, \quad p_1 = a_1 a_0 + 1, \quad p_n = a_n p_{n-1} + p_{n-2} \ \text{ for } 2 \le n \le N,
q_0 = 1, \quad q_1 = a_1, \quad q_n = a_n q_{n-1} + q_{n-2} \ \text{ for } 2 \le n \le N,

then [a_0, a_1, \ldots, a_n] = \frac{p_n}{q_n}.

Proof We prove by induction. It is easy to check the base cases
n = 1, 2.
Induction Hypothesis: The conclusion holds for 1 ≤ n ≤ m.

Induction step:

[a_0, a_1, \ldots, a_m, a_{m+1}] = \left[ a_0, a_1, \ldots, a_{m-1}, a_m + \frac{1}{a_{m+1}} \right]
  = \frac{\left(a_m + \frac{1}{a_{m+1}}\right) p_{m-1} + p_{m-2}}{\left(a_m + \frac{1}{a_{m+1}}\right) q_{m-1} + q_{m-2}}
  = \frac{a_{m+1}(a_m p_{m-1} + p_{m-2}) + p_{m-1}}{a_{m+1}(a_m q_{m-1} + q_{m-2}) + q_{m-1}}
  = \frac{a_{m+1} p_m + p_{m-1}}{a_{m+1} q_m + q_{m-1}}
  = \frac{p_{m+1}}{q_{m+1}}. ¤

Theorem 5.1.3 The sequences p_n and q_n satisfy the relation

p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1}.

Proof We use induction. The result is true for the base cases n = 1, 2.
Assume the result is true for all integers less than n. Then

p_n q_{n-1} - p_{n-1} q_n = (a_n p_{n-1} + p_{n-2}) q_{n-1} - p_{n-1} (a_n q_{n-1} + q_{n-2})
                          = -(p_{n-1} q_{n-2} - p_{n-2} q_{n-1})
                          = -(-1)^{n-2} = (-1)^{n-1}.

This completes the proof.
¤
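The recurrence of Theorem 5.1.2 is easy to run forward; a small sketch (function name ours) that also checks the alternating determinant identity p_n q_{n-1} - p_{n-1} q_n = \pm 1 on the expansion [3, 7, 15, 1] of 355/113:

```python
def convergents(a):
    """Convergents (p_n, q_n) of [a_0, a_1, ...] via the recurrences
    p_n = a_n p_{n-1} + p_{n-2}, q_n = a_n q_{n-1} + q_{n-2}."""
    p = [a[0], a[1] * a[0] + 1]
    q = [1, a[1]]
    for n in range(2, len(a)):
        p.append(a[n] * p[-1] + p[-2])
        q.append(a[n] * q[-1] + q[-2])
    return list(zip(p, q))

cs = convergents([3, 7, 15, 1])        # the expansion of 355/113
assert cs[-1] == (355, 113)
# successive convergents are unimodular: p_n q_{n-1} - p_{n-1} q_n = +-1
for n in range(1, len(cs)):
    assert cs[n][0] * cs[n-1][1] - cs[n-1][0] * cs[n][1] in (1, -1)
```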
Let x be a real number. Then the system of equations

x = a_0 + \alpha_0, \quad a_0 \in \mathbb{Z}, \ \alpha_0 \in [0, 1),
\frac{1}{\alpha_0} = a_1 + \alpha_1, \quad a_1 \in \mathbb{Z}, \ \alpha_1 \in [0, 1),
\frac{1}{\alpha_1} = a_2 + \alpha_2, \quad a_2 \in \mathbb{Z}, \ \alpha_2 \in [0, 1),
\vdots

is called the continued fraction algorithm. The algorithm continues as
long as \alpha_n \neq 0 and stops as soon as \alpha_n = 0.

It is easy to see that if the algorithm terminates in N + 1 steps then
x = [a_0, a_1, \ldots, a_N] and hence is rational. But the converse of this is
also true.

Theorem 5.1.4 Any rational number can be represented by a finite con-
tinued fraction.

Proof Let x = \frac{h}{k}. Then from the continued fraction algorithm we get
the following system of equations:

h = a_0 k + k_1 \quad (0 < k_1 < k),
k = a_1 k_1 + k_2 \quad (0 < k_2 < k_1),
\vdots

We observe that k > k_1 > k_2 > \cdots. Hence the algorithm must terminate.
Also, this is exactly Euclid's GCD algorithm. Hence its complexity
is O((\log(h + k))^3) (see [5, 10]).
¤
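Since the expansion is exactly Euclid's algorithm, both the expansion and the convergent denominators needed in Step 7 of the order-finding algorithm take only a few lines (function names ours):

```python
def contfrac(h, k):
    """Continued fraction [a_0, a_1, ...] of h/k -- precisely Euclid's
    GCD algorithm, as the proof of Theorem 5.1.4 observes."""
    a = []
    while k:
        a.append(h // k)
        h, k = k, h % k
    return a

def convergent_denominators(h, k):
    """Denominators q_n of the convergents of h/k; in order finding
    these are the candidates for the order r."""
    qs, q_prev, q_prev2 = [], 0, 1       # seeds q_{-1} = 0, q_{-2} = 1
    for a_n in contfrac(h, k):
        q_prev, q_prev2 = a_n * q_prev + q_prev2, q_prev
        qs.append(q_prev)
    return qs

# The phase estimate 1365/4096 is within 2^-12 of s/r = 1/3, and the
# true denominator r = 3 indeed appears among the candidates:
assert 3 in convergent_denominators(1365, 4096)
```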

Theorem 5.1.5 If x is representable by a simple continued fraction with
an odd (even) number of convergents, it is also representable by one with
an even (odd) number of convergents.

Proof Let x = [a_0, \ldots, a_n]. If a_n \ge 2, then
[a_0, \ldots, a_n] = [a_0, \ldots, a_{n-1}, a_n - 1, 1]. If a_n = 1, then
[a_0, \ldots, a_{n-1}, 1] = [a_0, \ldots, a_{n-2}, a_{n-1} + 1].
¤

Theorem 5.1.6 Let x be a rational number and p and q two integers
such that

\left| \frac{p}{q} - x \right| \le \frac{1}{2q^2}.

Then \frac{p}{q} is a convergent of the continued fraction for x.

Proof Let [a_0, \ldots, a_n] be the continued fraction expansion of \frac{p}{q}. From
Theorem 5.1.5 it follows that without loss of generality we may assume
n to be even. Let p_i and q_i be defined as in Theorem 5.1.2, so that
\frac{p_n}{q_n} = \frac{p}{q} is the nth convergent. Let \delta be defined by the equation

x = \frac{p_n}{q_n} + \frac{\delta}{2 q_n^2}.

Then |\delta| \le 1. Let

\lambda = \frac{2 (q_n p_{n-1} - p_n q_{n-1})}{\delta} - \frac{q_{n-1}}{q_n}.

The definition of \lambda ensures that the equation

x = \frac{\lambda p_n + p_{n-1}}{\lambda q_n + q_{n-1}}

is satisfied. Hence x = [a_0, \ldots, a_n, \lambda]. By Theorem 5.1.3 we get

\lambda = \frac{2}{\delta} - \frac{q_{n-1}}{q_n} > 2 - 1 = 1, \quad \text{since } q_n > q_{n-1}.

This implies that \lambda is a rational number greater than 1 and it has a finite
continued fraction, say [b_0, \ldots, b_m]. Hence x = [a_0, \ldots, a_n, b_0, \ldots, b_m].
Thus \frac{p}{q} is a convergent of x.
¤

Appendix 4: Estimating \frac{\varphi(r)}{r}

Lemma 5.1.7 The ratio \frac{\varphi(r)}{r} is at least \frac{\log\log r}{10 \log r} for r \ge 16.

Proof Let r = \prod_{i=1}^{a} p_i^{\alpha_i} \prod_{j=1}^{b} q_j^{\beta_j}, where

p_1 < p_2 < \cdots < p_a \le \frac{2 \log r}{\log\log r} < q_1 < q_2 < \cdots < q_b.

Then

\varphi(r) = \prod_{i=1}^{a} (p_i - 1) p_i^{\alpha_i - 1} \prod_{j=1}^{b} (q_j - 1) q_j^{\beta_j - 1}.

Note that q_1^b \le r. This implies b \le \frac{\log r}{\log q_1}. Since q_1 > \frac{2 \log r}{\log\log r}, we
have b \le \frac{\log r}{\log\log r - \log\log\log r + \log 2}.

Hence,

\frac{\varphi(r)}{r} = \frac{\prod_{i=1}^{a} (p_i - 1) p_i^{\alpha_i - 1} \prod_{j=1}^{b} (q_j - 1) q_j^{\beta_j - 1}}{\prod_{i=1}^{a} p_i^{\alpha_i} \prod_{j=1}^{b} q_j^{\beta_j}}
  = \prod_{i=1}^{a} \left( \frac{p_i - 1}{p_i} \right) \prod_{j=1}^{b} \left( 1 - \frac{1}{q_j} \right)
  > \prod_{i=2}^{\frac{2 \log r}{\log\log r}} \left( \frac{i - 1}{i} \right) \prod_{j=1}^{b} \left( 1 - \frac{1}{q_j} \right)
  = \frac{\log\log r}{2 \log r} \prod_{j=1}^{b} \left( 1 - \frac{1}{q_j} \right)
  > \frac{\log\log r}{2 \log r} \left( 1 - \frac{\log\log r}{2 \log r} \right)^b
  > \frac{\log\log r}{2 \log r} \left( 1 - b \, \frac{\log\log r}{2 \log r} \right)
  = \frac{\log\log r}{2 \log r} \left[ 1 - \frac{\log\log r}{2 \log r} \cdot \frac{\log r}{\log\log r - \log\log\log r + \log 2} \right]
  = \frac{\log\log r}{2 \log r} \cdot \frac{1 - 2E}{2(1 - E)} \quad \text{where } E = \frac{\log\log\log r - \log 2}{\log\log r}
  > \frac{\log\log r}{2 \log r} \cdot \frac{1 - 2E}{2}
  > \frac{\log\log r}{10 \log r} \quad \text{for } r \ge 16.

¤
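The bound of Lemma 5.1.7 is easy to test numerically (brute-force \varphi, natural logarithms; names ours):

```python
from math import gcd, log

def phi(r):
    """Euler's phi function by brute force (fine for small r)."""
    return sum(1 for x in range(1, r + 1) if gcd(x, r) == 1)

# phi(r)/r >= log log r / (10 log r) holds comfortably on this range;
# the worst cases are products of many small primes, e.g. r = 210
for r in range(16, 1000):
    assert phi(r) / r >= log(log(r)) / (10 * log(r))
```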
In fact the following sharper result is true.

Theorem 5.1.8 \liminf_{n \to \infty} \frac{\varphi(n) \log\log n}{n} = e^{-\gamma}, where \gamma is Euler's constant.
¤

The interested reader may look up Hardy and Wright [6] for the
proof.
Lecture 6

Shor’s Algorithm

6.1 Factoring to Order Finding


Lemma 6.1.1 Let N be an odd number with prime factorization
p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_m^{\alpha_m}, m \ge 2. Let

A = \{x \in Z^*_N \mid \mathrm{ord}(x) \text{ is odd, or } \mathrm{ord}(x) \text{ is even and } x^{\mathrm{ord}(x)/2} = -1\},

where \mathrm{ord}(x) = \min\{i \ge 1 \mid x^i = 1\}. If x is chosen at random from
Z^*_N, then

\Pr_{x \in Z^*_N}(x \in A) \le \frac{1}{2^{m-1}}.

Proof^1 Let |Z^*_N| = \varphi(N) = 2^\ell s, where s is odd (note \ell \ge 2). Let V
be the set of square roots of 1 in Z^*_N.

Lemma 6.1.2 (a) If \mathrm{ord}(x) is odd, then x^s = 1.

(b) If \mathrm{ord}(x) is even, then x^{2^i s} \in V - \{1\} for some i \in \{0, 1, \ldots, \ell - 1\}.

(c) If \mathrm{ord}(x) is even and x^{\mathrm{ord}(x)/2} = -1, then x^{2^i s} = -1 for some
    i \in \{0, 1, \ldots, \ell - 1\}.

Proof (a) Since x \in Z^*_N, we have \mathrm{ord}(x) \mid \varphi(N). Since \mathrm{ord}(x) is odd,
\mathrm{ord}(x) \mid s.

----
^1 Our proof is based on the proof of correctness of Miller's primality test in Kozen's
book [12, page 206]. Nielsen and Chuang [4, Theorem A4.13, page 634] give a bound
of 2^{-m}. Their bound is not correct: for N = 21 = 3 \times 7, we have |Z^*_N| = 12 and
|A| = 6. Then \frac{|A|}{|Z^*_N|} = \frac{1}{2} \not\le 2^{-2}.


(b) and (c) Let \mathrm{ord}(x) = 2^{\ell'} s' (where \ell' \ge 1 and s' is odd). Then
\mathrm{ord}(x) \mid 2^{\ell'} s, but \mathrm{ord}(x) \nmid 2^{\ell'-1} s. Hence x^{2^{\ell'-1} s} \in V - \{1\}. Now, if
x^{\mathrm{ord}(x)/2} = -1, then x^{2^{\ell'-1} s'} = -1, and hence x^{2^{\ell'-1} s} = (-1)^{s/s'} = -1,
since s/s' is an odd integer.
¤
For i = 0, 1, \ldots, \ell - 1 and v \in V, let S_{i,v} \stackrel{\Delta}{=} \{x \in Z^*_N : x^{2^i s} = v\}. By
Lemma 6.1.2, we have

A \subseteq S_{0,1} \cup \bigcup_{i=0}^{\ell-1} S_{i,-1};                           (6.1.3)

and

Z^*_N = S_{0,1} \cup \bigcup_{i=0}^{\ell-1} \bigcup_{v \in V - \{1\}} S_{i,v}.      (6.1.4)

Lemma 6.1.5 All the sets appearing on the right hand side of (6.1.4)
are disjoint.

Proof Consider two such sets S_{i,v} and S_{j,w} appearing above. If i = j
then v \ne w and these sets are disjoint by definition. Hence, suppose
i < j; this implies that w \ne 1. But for each x \in S_{i,v}, we have
x^{2^{i+1} s} = v^2 = 1. This implies that x^{2^j s} = 1 \ne w, and therefore
x \notin S_{j,w}.
¤

To prove that |A| \le 2^{-m+1} |Z^*_N|, we will use the isomorphism

Z^*_N \to Z^*_{p_1^{\alpha_1}} \times Z^*_{p_2^{\alpha_2}} \times \cdots \times Z^*_{p_m^{\alpha_m}};
j \mapsto (j \,(\mathrm{mod}\ p_1^{\alpha_1}), \, j \,(\mathrm{mod}\ p_2^{\alpha_2}), \, \ldots, \, j \,(\mathrm{mod}\ p_m^{\alpha_m})),

which follows from the Chinese remainder theorem.

Since p_i is odd, 1 \ne -1 \,(\mathrm{mod}\ p_i^{\alpha_i}) for i = 1, 2, \ldots, m, and the
2^m elements of W = \{+1, -1\}^m correspond to square roots of 1 in
Z^*_N; of these, the only trivial square roots are 1 = (1, 1, \ldots, 1) and
-1 = (-1, -1, \ldots, -1).

Lemma 6.1.6

|S_{0,1}| = |S_{0,-1}|;                                                    (6.1.7)
|S_{j,-1}| = |S_{j,w}|, \ \text{for } w \in W \text{ and } j = 0, 1, \ldots, \ell - 1.   (6.1.8)

Proof To see (6.1.7), observe that x \in S_{0,1} if and only if x^s = 1, if
and only if (-x)^s = -1, if and only if -x \in S_{0,-1}.

To prove (6.1.8), fix j and w. We first show that if S_{j,-1} \ne \emptyset, then
S_{j,w} \ne \emptyset. For, suppose b = (b_1, b_2, \ldots, b_m) \in S_{j,-1}. Then consider
c \in Z^*_{p_1^{\alpha_1}} \times Z^*_{p_2^{\alpha_2}} \times \cdots \times Z^*_{p_m^{\alpha_m}}, defined by

c_i = 1 \ \text{if } w_i = 1; \qquad c_i = b_i \ \text{if } w_i = -1.

Clearly c^{2^j s} = w, so S_{j,w} \ne \emptyset. Furthermore, the map x \mapsto c b^{-1} x is a
bijection between S_{j,-1} and S_{j,w}. Hence |S_{j,-1}| = |S_{j,w}|.
¤

Since |W| = 2^m, from (6.1.3), (6.1.4) and Lemma 6.1.6 we obtain

2^{m-1} |S_{0,1} \cup S_{0,-1}| = \left| \bigcup_{w \in W} S_{0,w} \right|,

and, for i = 0, 1, 2, \ldots, \ell - 1,

(2^m - 1) |S_{i,-1}| = \left| \bigcup_{w \in W - \{1\}} S_{i,w} \right|,

which implies

2^{m-1} |A| \le 2^{m-1} \left| S_{0,1} \cup \bigcup_{i=0}^{\ell-1} S_{i,-1} \right|
            \le \left| S_{0,1} \cup \bigcup_{i=0}^{\ell-1} \bigcup_{w \in W - \{1\}} S_{i,w} \right|
            \le \left| S_{0,1} \cup \bigcup_{i=0}^{\ell-1} \bigcup_{v \in V - \{1\}} S_{i,v} \right|
            = |Z^*_N|.
¤
Lemma 6.1.1 is the main tool for analyzing Shor's factoring algo-
rithm. The crucial observation is that if we can get a nontrivial square
root of unity, then we can find a nontrivial factor of N using Euclid's
GCD algorithm. Lemma 6.1.1 tells us that if we randomly pick a
number x less than N and coprime to N and look at its order, then with
probability greater than 1 - \frac{1}{2^{m-1}} the order is even and we get a nontrivial
square root of unity by raising x to the power \mathrm{ord}(x)/2. The lemma holds
if N is odd and has at least two distinct prime factors. But a classical
polynomial time algorithm exists for finding the prime which divides N
if N is a prime power. So this gives us a polynomial time factoring
algorithm. So far it is not known whether classical computers can factor
a number N in polynomial time, even if randomness is allowed. Below
is Shor's factoring algorithm.
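The crucial observation can be exercised classically on a toy modulus (function name ours; in Shor's algorithm the square root y comes from the quantum order-finding step):

```python
from math import gcd

def factor_from_sqrt(y, N):
    """Given a nontrivial square root y of 1 mod N (y^2 = 1, y != +-1),
    return a nontrivial factor of N via Euclid's GCD algorithm."""
    assert (y * y) % N == 1 and y % N not in (1, N - 1)
    d = gcd(y - 1, N)
    assert 1 < d < N        # N | (y-1)(y+1) but divides neither factor
    return d

# N = 15: ord(7) = 4, and 7^(4/2) = 49 = 4 (mod 15) is a nontrivial
# square root of unity, which immediately yields a factor:
assert factor_from_sqrt(4, 15) == 3
```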

Shor’s factoring algorithm.

Input N

1) If N is even, return 2.

2) Use quantum order finding algorithm to find the order of 2. If


ord(2) = N − 1, conclude N is prime and stop.

3) Check if N is of the form pα , α > 1 by the subroutine Prime-power.

4) Pick a random element x, 1 < x < N.

5) If (x, N) > 1, return the factor (x, N).

6) Use quantum order finding algorithm to find the order of x.

7) If ord(x) is odd then abort.


ord(x)
8) If x 2 = −1 (mod N ) then abort.
ord(x)
9) Get a nontrivial square root of 1 (mod N ), by setting y ← x 2 .

10) Use Euclid’s G.C.D. algorithm to find the greatest common divisor
of (y − 1, N ) and (y + 1, N ). Return the nontrivial numbers.

Output: With high probability it gives a divisor of N or tells if N is


prime.

Subroutine: Prime-power

Input: Integer N.

1) Compute y = \log_2 N.

2) For all i \in \{2, 3, \ldots, \lfloor \log_2 N \rfloor\} compute x_i = y / i.

3) Find the integer u_i with u_i \le 2^{x_i} < u_i + 1 for all such i.

4) Check whether u_i \mid N or (u_i + 1) \mid N for all such i. If any one of
   these numbers divides N, say u, then return u. Else fail.

Output: If N is a prime power p^\alpha, the subroutine Prime-power
returns p. If N is not a prime power it fails to produce any output.
It terminates in O((\log N)^3) steps. The most costly operation
in the factoring algorithm is order finding. Since order finding
takes O((\log N)^4) time, the time taken by this factoring
algorithm is also O((\log N)^4).
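A classical sketch of the subroutine (names ours; we scan the exponents from the largest downward, so that a prime power N = p^\alpha yields the prime p itself rather than some intermediate power):

```python
def prime_power(N):
    """If N = p^a for some integer base p > 1 and a > 1, return the
    smallest such base (the prime, when N is a prime power); else None."""
    L = N.bit_length()
    for i in range(L, 1, -1):           # try the largest exponent first
        u = round(N ** (1.0 / i))
        for cand in (u - 1, u, u + 1):  # guard against float rounding
            if cand > 1 and cand ** i == N:
                return cand
    return None

assert prime_power(3 ** 5) == 3
assert prime_power(15) is None
```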

Remark 6.1.9 Step 1) just checks whether the number N is divisible
by 2. Step 2) checks whether N is prime, and Step 3) whether N is a
prime power. So after Step 3) Lemma 6.1.1 is applicable.

The probability of success in Shor's algorithm is greater than the
probability of success in order finding multiplied by the probability that
the chosen element x is not in the set A of Lemma 6.1.1. The running
time of the algorithm is O((\log N)^4). Thus, by running the algorithm
only a constant number of times we can get probability of success greater
than 1 - \epsilon for any fixed \epsilon > 0.

Exercise 6.1.10 Find a randomized polynomial time algorithm for fac-


toring an integer N , if ϕ(N ) is known.
Lecture 7

Quantum Error Correcting Codes

7.1 Knill Laflamme Theorem


The mathematical theory of communication of messages through a quan-
tum information channel is based on the following three basic principles.
1) Messages can be encoded as states and transmitted through quan-
tum channels.
2) The output state may not be the same as the input state due to
presence of noise in the channel.
3) There is a collection of “good” states which when transmitted
through the noisy channel leads to output states from which the
input state can be recovered with no error or with a small margin
of error.
The aim is to identify the set of good states for a given model of the
noisy channel and to give the decoding procedure.

[Figure 7.1: A model of a noisy quantum channel: the input state \rho
passes through the channel, noise acts, and the output state is T(\rho).]

Let H be a finite dimensional complex Hilbert space. We assume
that there is a linear space E \subset B(H), called the error space, such that

for any input state \rho on H the output state T(\rho) always has the form

T(\rho) = \sum_j L_j \rho L_j^\dagger                                       (7.1.1)

where L_j belongs to E for every j (see Figure 7.1). If the same input
state is transmitted again the operators L_j may be completely different.
But they always come from the error space E and satisfy the equation

\mathrm{Tr}\left( \sum_{j=1}^{k} L_j^\dagger L_j \, \rho \right) = 1.       (7.1.2)

The L_j's may or may not depend on the density matrix \rho which is
transmitted through the noisy channel.

Definition 7.1.3 A state \rho is said to have its support in a subspace
S \subset H if \mathrm{Tr}\, \rho E^S = 1, where E^S is the orthogonal projection on S.

This means that if we choose an orthonormal basis e_1, \ldots, e_k, e_{k+1}, \ldots, e_N
for H such that e_1, e_2, \ldots, e_k is an orthonormal basis for S, then the
matrix of \rho in this basis has the form

\begin{pmatrix} \tilde{\rho} & 0 \\ 0 & 0 \end{pmatrix}

where \tilde{\rho} is a k \times k matrix. To recover the input state at the output of
the channel we apply a recovery operation R of the form

R(T(\rho)) = \sum_j M_j T(\rho) M_j^\dagger, \qquad \sum_j M_j^\dagger M_j = I.

It would be desirable to have R(T(\rho)) = \rho for all \rho, whenever the
L's are from E and they act on \rho as in (7.1.1). Of course this is too
ambitious. We would like to achieve this pleasant situation at least for
all \rho with support in some 'large' subspace C \subset H. Then we can encode
messages in terms of states from C and recover them with the help of the
decoding operation R. The idea is formalized in the following definition.

Definition 7.1.4 A subspace C \subset H is called an E-correcting quantum
code if there exist operators M_1, M_2, \ldots, M_k such that for every \rho with
support in C and any L_1, L_2, \ldots, L_l \in E with \mathrm{Tr}(\sum_j L_j^\dagger L_j \rho) = 1, one
has

\sum_{i,j} M_i L_j \rho L_j^\dagger M_i^\dagger = \rho.

Remark 7.1.5 Now consider |u\rangle \in C. Then |u\rangle\langle u| has support in C.
Consider the equations

\sum_{i,j} M_i L_j |u\rangle\langle u| L_j^\dagger M_i^\dagger = |u\rangle\langle u|

and

\langle u| \left( \sum_j L_j^\dagger L_j \right) |u\rangle = 1.

Choose any |v\rangle \in H such that \langle u \mid v \rangle = 0. Then we have

\sum_{i,j} |\langle v| M_i L_j |u\rangle|^2 = 0.                  (7.1.6)

This is true if and only if \langle v| M_i L_j |u\rangle = 0 for all |v\rangle \in \{|u\rangle\}^\perp and
every i, j. Thus

M_i L_j |u\rangle = c_{ij}(u) |u\rangle \quad \text{for all } |u\rangle \in C,

for some scalar c_{ij}(u). Since M_i L_j is an operator and C is a subspace,
this can happen if and only if

M_i L \big|_C = \lambda_i(L) \, I \big|_C \quad \text{for all } L \in E.

We state this as a proposition.

Proposition 7.1.7 A subspace C \subset H is an E-correcting quantum code
if and only if there exist operators M_1, M_2, \ldots, M_k in H such that
\sum_i M_i^\dagger M_i = I and M_i L \big|_C = \lambda_i(L) \, I \big|_C for all L \in E.

We would like to have a characterization of the quantum code C
without involving the M_i's, that is, a condition entirely in terms of C
and E. This is achieved by the following remarkable criterion due to
Knill and Laflamme.

Theorem 7.1.8 (Knill and Laflamme) A subspace C with an ortho-
normal basis \psi_0, \psi_1, \ldots, \psi_{k-1} is an E-correcting quantum code if and
only if

1. \langle \psi_i | L_1^\dagger L_2 | \psi_j \rangle = 0 for all i \ne j and all L_1, L_2 \in E;

2. \langle \psi_i | L_1^\dagger L_2 | \psi_i \rangle is independent of i = 0, 1, \ldots, k - 1.

Proof Necessity: By Proposition 7.1.7 we know that there must
exist recovery operators R_1, \ldots, R_l satisfying the equations \sum_i R_i^\dagger R_i = I
and R_i L \psi = \lambda_i(L) \psi, \psi \in C, L \in E. Let L_1, L_2 \in E. Then

\langle \psi_i | L_1^\dagger L_2 | \psi_j \rangle = \langle \psi_i | L_1^\dagger \left( \sum_r R_r^\dagger R_r \right) L_2 | \psi_j \rangle
  = \sum_r \overline{\lambda_r(L_1)} \lambda_r(L_2) \langle \psi_i \mid \psi_j \rangle
  = \left( \sum_r \overline{\lambda_r(L_1)} \lambda_r(L_2) \right) \delta_{ij}.

Sufficiency: Let the conditions (1) and (2) hold. Consider the sub-
spaces E\psi_0, E\psi_1, \ldots, E\psi_{k-1}. It can be verified that the correspondence
L\psi_i \to L\psi_j, for all L \in E, is a scalar product preserving map. So we can
write the following table.

\psi_0            \psi_1            \cdots   \psi_j            \cdots   \psi_{k-1}
E\psi_0           E\psi_1           \cdots   E\psi_j           \cdots   E\psi_{k-1}
\varphi_0^0       \varphi_1^0       \cdots   \varphi_j^0       \cdots   \varphi_{k-1}^0
\vdots            \vdots                     \vdots                     \vdots
\varphi_0^{l-1}   \varphi_1^{l-1}   \cdots   \varphi_j^{l-1}   \cdots   \varphi_{k-1}^{l-1}

Here \varphi_0^0, \varphi_0^1, \ldots, \varphi_0^{l-1} is an orthonormal basis for the subspace E\psi_0. The
map L\psi_0 \to L\psi_j, for any L \in E, is a unitary isomorphism between
the subspaces E\psi_0 and E\psi_j. So \dim E\psi_j = l for all j \in \{0, 1, \ldots, k-1\}
and there exists a global unitary operator U_j satisfying U_j \varphi_0^i = \varphi_j^i,
i = 0, 1, \ldots, l - 1. Since by the first condition \langle L_1 \psi_i \mid L_2 \psi_j \rangle = 0 for
L_1, L_2 \in E and i \ne j, the subspaces E\psi_j, j = 0, 1, \ldots, k-1, are mutually
orthogonal. Let E_i be the projection on the span of the ith row in the
array \{\varphi_j^i\}. Now we define a unitary operator V^{(i)} satisfying V^{(i)} \varphi_j^i = \psi_j
for i = 0, 1, \ldots, l - 1.

Let R_i = V^{(i)} E_i for i = 0, 1, \ldots, l - 1 and R_l = E_l, the projection on
\{\varphi_j^i, 0 \le i \le l-1, 0 \le j \le k-1\}^\perp. It can be verified that \sum_{i=0}^{l} R_i^\dagger R_i = I.

Now consider any \psi = c_0 \psi_0 + c_1 \psi_1 + \cdots + c_{k-1} \psi_{k-1} in C. Then

L\psi = c_0 L\psi_0 + c_1 L\psi_1 + \cdots + c_{k-1} L\psi_{k-1}
     = c_0 L\psi_0 + c_1 U_1 L\psi_0 + \cdots + c_{k-1} U_{k-1} L\psi_0.

Let

L\psi_0 = \alpha_0(L) \varphi_0^0 + \alpha_1(L) \varphi_0^1 + \cdots + \alpha_{l-1}(L) \varphi_0^{l-1}.

Then we have

U_j L\psi_0 = \alpha_0(L) \varphi_j^0 + \alpha_1(L) \varphi_j^1 + \cdots + \alpha_{l-1}(L) \varphi_j^{l-1}
\Rightarrow E_i U_j L\psi_0 = \alpha_i(L) \varphi_j^i
\Rightarrow V^{(i)} E_i U_j L\psi_0 = \alpha_i(L) \psi_j.

That is,

R_i U_j L\psi_0 = \alpha_i(L) \psi_j \quad \text{for } i = 0, 1, \ldots, l - 1,
E_l U_j L\psi_0 = 0 = R_l U_j L\psi_0.

Thus we have

R_i L\psi = c_0 \alpha_i(L) \psi_0 + c_1 \alpha_i(L) \psi_1 + \cdots + c_{k-1} \alpha_i(L) \psi_{k-1}
         = \alpha_i(L) \psi \quad \text{for } i \in \{0, 1, \ldots, l - 1\}, \text{ and}
R_l L\psi = 0.

That is, R_i L \big|_C = \alpha_i(L) \, I \big|_C, where \alpha_l(L) = 0.
¤

Example 7.1.9 Let G be a finite group with identity element e and
H = L^2(G), the Hilbert space of complex valued functions on G with

\langle f_1, f_2 \rangle = \sum_{x \in G} \overline{f_1(x)} f_2(x).

Let E \subset G be called the error set and C \subset G the code set. Let
E = \mathrm{lin}\{L_x \mid x \in E\}, where (L_a f)(x) = f(a^{-1} x) and lin denotes linear
span, and let C = \mathrm{lin}\{1_{\{c\}} \mid c \in C\}. It can be verified that L_a 1_{\{b\}} = 1_{\{ab\}}.

If c_1 \ne c_2, then

\langle 1_{\{c_1\}}, L_x^\dagger L_y 1_{\{c_2\}} \rangle = \langle 1_{\{c_1\}}, 1_{\{x^{-1} y c_2\}} \rangle,

which vanishes whenever x^{-1} y c_2 \ne c_1, i.e., whenever x^{-1} y \ne c_1 c_2^{-1};
this holds for all x, y \in E and all distinct c_1, c_2 \in C provided
E^{-1} E \cap C C^{-1} = \{e\}. Also,

\langle 1_{\{c\}}, L_x^\dagger L_y 1_{\{c\}} \rangle = 1 \ \text{if } x = y, \ \text{and } 0 \ \text{otherwise}.

Thus \langle 1_{\{c\}}, L_x^\dagger L_y 1_{\{c\}} \rangle is independent of c. Hence by the Knill-
Laflamme theorem we see that C is an E-correcting quantum code if
E^{-1} E \cap C C^{-1} = \{e\}.

[Figure 7.2: A model of a noisy classical channel. The input c \in C
produces the output xc \in Ec, where x \in E.]

Consider the model of a noisy classical channel shown in Figure 7.2.
If E^{-1} E \cap C C^{-1} = \{e\}, then for all distinct c_1, c_2, E c_1 \cap E c_2 = \emptyset. So
C is an E-correcting classical code: if the output falls in the set Ec, the
message is decoded as c.

For example, set G = Z_2^3, where Z_2 = \{0, 1\} with addition mod 2.
Let the error set E be \{100, 010, 001\} and the code set C be \{000, 111\}.
Then E - E = \{000, 110, 011, 101\} and C - C = C = \{000, 111\}, implying
(E - E) \cap (C - C) = \{000\}.
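The Knill-Laflamme conditions for this example can be checked mechanically; a small sketch for G = Z_2^3 with the error and code sets above (all names ours):

```python
from itertools import product

G = list(product((0, 1), repeat=3))           # the group Z_2^3
add = lambda a, b: tuple((u + v) % 2 for u, v in zip(a, b))

E = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]         # error set
C = [(0, 0, 0), (1, 1, 1)]                    # code set (repetition code)

# In L^2(Z_2^3), L_a 1_{b} = 1_{a+b}, so
# <L_x 1_{c1}, L_y 1_{c2}> = [x + c1 == y + c2]
def ip(x, c1, y, c2):
    return 1 if add(x, c1) == add(y, c2) else 0

# Knill-Laflamme condition 1: off-diagonal terms vanish
assert all(ip(x, C[0], y, C[1]) == 0 for x in E for y in E)
# Condition 2: diagonal terms do not depend on the codeword
assert all(ip(x, C[0], y, C[0]) == ip(x, C[1], y, C[1])
           for x in E for y in E)
```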

In order to formulate our next proposition we introduce some nota-
tion. Let A be a finite abelian group with operation +, null element 0
and character group \hat{A}. In the Hilbert space H = L^2(A) of complex val-
ued functions on A we define the unitary operators U_a, a \in A, and V_\alpha, \alpha \in \hat{A},
by (U_a f)(x) = f(x + a), (V_\alpha f)(x) = \alpha(x) f(x). Then we have the Weyl
commutation rules:

U_a U_b = U_{a+b}, \quad V_\alpha V_\beta = V_{\alpha\beta}, \quad U_a V_\alpha = \alpha(a) V_\alpha U_a.

Let E \subset A, F \subset \hat{A} and let

E(E, F) = \mathrm{lin}\{U_a V_\alpha \mid a \in E, \alpha \in F\}.

Our aim is to construct a quantum code which is E(E, F)-correcting by
using subgroups C_1 \subset C_2 \subset A. To this end, for any subgroup C \subset A,
we define

C^\perp = \{\alpha \mid \alpha \in \hat{A}, \ \alpha(x) = 1 \text{ for all } x \in C\}.

C^\perp is called the annihilator of C. We have C_1^\perp \supset C_2^\perp. Clearly C_1^\perp, C_2^\perp
are subgroups of the character group \hat{A} under multiplication. Suppose

(E - E) \cap C_2 = \{0\},
F^{-1} F \cap C_1^\perp \subseteq C_2^\perp,

and let S be a cross section for C_2 / C_1, in the sense that S \subset C_2
and C_2 = \cup_{a \in S} (C_1 + a) is a coset decomposition (or partition) of C_2 by
C_1-cosets. Note that

S^\perp = \{\alpha \mid \alpha \in \hat{A}, \ \alpha(a) = 1 \text{ for all } a \in S\}

is a subgroup of \hat{A}. One may view C_2 as a classical E-correcting group
code in A. Define

\psi_a(x) = (\#C_1)^{-1/2} \, 1_{C_1 + a}(x), \quad a \in S.

Theorem 7.1.10 \mathrm{lin}\{\psi_a \mid a \in S\} is an E(E, F)-correcting quantum
code of dimension \frac{\#C_2}{\#C_1}.

Proof Note that \langle \psi_{a_1} \mid \psi_{a_2} \rangle = \delta_{a_1 a_2}, a_1, a_2 \in S. It is enough to verify
the Knill-Laflamme conditions for

L_1 = U_{a_1} V_{\alpha_1}, \quad L_2 = U_{a_2} V_{\alpha_2}, \quad a_1, a_2 \in E, \ \alpha_1, \alpha_2 \in F.

Then by the Weyl commutation rules we have

L_1^\dagger L_2 = \alpha_1(a_2 - a_1) \, U_{a_2 - a_1} V_{\alpha_1^{-1} \alpha_2}, \quad a_2 - a_1 \in E - E, \ \alpha_1^{-1} \alpha_2 \in F^{-1} F.

Let a_1, a_2 \in S, a_1 \ne a_2. We have for a \in E - E, \alpha \in F^{-1} F,

\langle \psi_{a_1} | U_a V_\alpha | \psi_{a_2} \rangle = (\#C_1)^{-1} \sum_{x \in A} 1_{C_1 + a_1 + a}(x) \, \alpha(x) \, 1_{C_1 + a_2}(x).   (7.1.11)

The x-th term in the summation on the right side of (7.1.11) is nonzero
only if x \in (C_1 + a_1 + a) \cap (C_1 + a_2), which implies the existence of
x_1, x_2 \in C_1 such that

x_1 + a_1 + a = x_2 + a_2, \ \text{i.e., } a = (x_2 - x_1) + (a_2 - a_1).    (7.1.12)

In (7.1.12) both x_2 - x_1 and a_2 - a_1 belong to C_2, so a \in C_2; but a also
belongs to E - E, and by hypothesis (E - E) \cap C_2 = \{0\}. Thus the x-th
term vanishes if a \ne 0.

Now consider the case a = 0. Then for a_1, a_2 \in S, a_1 \ne a_2, C_1 + a_1
and C_1 + a_2 are two disjoint cosets and therefore the right hand side of
(7.1.11) vanishes once again. In other words,

\langle \psi_{a_1} | U_a V_\alpha | \psi_{a_2} \rangle = 0 \quad \text{for all } a_1 \ne a_2, \ a \in E - E, \ \alpha \in F^{-1} F.

Now let us consider the case a_1 = a_2 = b \in S. Then the left hand side
of (7.1.11) is equal to

(\#C_1)^{-1} \sum_{x \in A} 1_{C_1 + b + a}(x) \, 1_{C_1 + b}(x) \, \alpha(x).   (7.1.13)

The x-th term is nonzero only if

x \in (C_1 + b + a) \cap (C_1 + b) \implies (C_1 + a) \cap C_1 \ne \emptyset
  \implies a \in C_1 \cap (E - E)
  \implies a = 0.

Thus the expression (7.1.13) vanishes if a \ne 0. If a = 0 then (7.1.13) is
equal to

(\#C_1)^{-1} \sum_{x \in A} 1_{C_1 + b}(x) \, \alpha(x) = (\#C_1)^{-1} \alpha(b) \sum_{x \in C_1} \alpha(x).

If \alpha \notin C_1^\perp then \alpha restricts to a nontrivial character of C_1 and by Schur
orthogonality the right hand side vanishes. If \alpha \in C_1^\perp, then

\alpha \in C_1^\perp \cap F^{-1} F \implies \alpha \in C_2^\perp \implies \alpha(b) = 1.

Thus the expression (7.1.13) is independent of b. In other words, the
Knill-Laflamme conditions are fulfilled for the orthonormal set
\{\psi_a \mid a \in S\}.
¤

Theorem 7.1.14 Let C_1 \subset C_2 \subset A be subgroups. Consider the sub-
groups C_2^\perp \subset C_1^\perp \subset \hat{A} and the coset decomposition C_1^\perp = \cup_{\alpha \in \tilde{S}} C_2^\perp \alpha
with respect to a cross section \tilde{S}. Define

\psi_\alpha = (\#C_2)^{-1/2} \, 1_{C_2} \, \alpha, \quad \alpha \in \tilde{S}.

Let E \subset A, F \subset \hat{A} be such that (E - E) \cap C_2 = \{0\} and
F^{-1} F \cap C_1^\perp \subset C_2^\perp. Then \mathrm{lin}\{\psi_\alpha \mid \alpha \in \tilde{S}\} is an E(E, F)-correcting
quantum code of dimension (\#C_2)/(\#C_1).

Proof Let b \in E - E, \beta \in F^{-1} F, \alpha_1, \alpha_2 \in \tilde{S}. Then

\langle \psi_{\alpha_1} | U_b V_\beta | \psi_{\alpha_2} \rangle = (\#C_2)^{-1} \sum_x 1_{C_2 + b}(x) \, \overline{\alpha_1(x)} \, \alpha_2(x) \, 1_{C_2}(x) \, \beta(x) \, \alpha_1(b).   (7.1.15)

If the x-th term on the right hand side of equation (7.1.15) is nonzero,
then (C_2 + b) \cap C_2 \ne \emptyset \implies b \in C_2 \cap (E - E) \implies b = 0. Thus
the right hand side of equation (7.1.15) vanishes whenever b \ne 0, for any
\alpha_1, \alpha_2 \in \tilde{S}. Let b = 0. Then the right hand side of equation (7.1.15) is

(\#C_2)^{-1} \sum_{x \in C_2} \overline{\alpha_1(x)} \, \alpha_2(x) \, \beta(x).   (7.1.16)

If \alpha_1 = \alpha_2 = \alpha \in \tilde{S} this becomes (\#C_2)^{-1} \sum_{x \in C_2} \beta(x), which is inde-
pendent of \alpha \in \tilde{S}. So we consider the case b = 0, \alpha_1 \ne \alpha_2, \alpha_1, \alpha_2 \in \tilde{S}.
Then the expression (7.1.16) is nonzero only if \overline{\alpha_1} \alpha_2 \beta \in C_2^\perp. Since
\overline{\alpha_1} \alpha_2 \in C_1^\perp, this implies \beta \in C_1^\perp \cap F^{-1} F, so by hypothesis \beta is in C_2^\perp.
This in turn implies \overline{\alpha_1} \alpha_2 \in C_2^\perp, i.e., \alpha_1 and \alpha_2 lie in the same coset
of C_2^\perp in C_1^\perp, which is impossible. So the expression (7.1.16) must be
equal to zero. In other words, the Knill-Laflamme conditions are fulfilled.
¤

7.2 Some Definitions


7.2.1 Invariants
Let C be an E-correcting quantum code with recovery operators
R1, . . . , Rl. Suppose U is a unitary operator such that U E U⁻¹ ⊆ E.
Define Sj = U Rj U⁻¹. We have Rj Lψ = λj(L)ψ, where ψ ∈ C and
L ∈ E. Since L̃ = U⁻¹LU is an element of E we have

Sj L U ψ = U Rj U⁻¹ L U ψ = U Rj L̃ ψ = λj(L̃) U ψ.

In other words, if C is an error correcting quantum code with recovery
operators R1, R2, . . . , Rl then for any unitary U satisfying U E U∗ ⊆ E,
U(C) is also E-correcting with recovery operators S1, S2, . . . , Sl, where
Sj = U Rj U⁻¹ for all j.

Definition 7.2.1 Two E-correcting quantum codes C1 , C2 , are said to


be equivalent if and only if there exists a unitary operator U , satisfying
U EU ∗ ⊆ E, such that U (C1 ) = C2 .

Remark 7.2.2 Finding invariants for the equivalence of E-correcting


quantum codes is an important problem in the development of the sub-
ject.
Let A be a finite set, called an alphabet, of cardinality N. An element
x in Aⁿ is called a word of length n. A word x is also written as
(x1, x2, . . . , xn). C ⊂ Aⁿ is called an (n, M, d)A code if #C = M and

min_{x,y∈C, x≠y} d(x, y) = d.

Here d(x, y) = #{i | xi ≠ yi}. This is also known as the Hamming
distance between x and y.
If A is an abelian group with + as its addition and 0 its null element
then

w(x) = #{i | xi ≠ 0},  x = (x1, x2, . . . , xn),

is called the weight of x. If C ⊂ Aⁿ is a subgroup with

d = min_{x≠0, x∈C} w(x),  #C = M,

then C is called an (n, M, d)A group code, and it is denoted by ⟨n, M, d⟩A.
If A is the additive group of a finite field Fq of q elements (q = pᵐ, for
some prime p) and C ⊂ Fqⁿ is a linear subspace of the n-dimensional
vector space Fqⁿ over Fq and d = min_{x≠0} w(x), then C is called a linear
code over Fq with minimum distance d and written as an [n, k, d]q code,
where k = dim C. When q = 2, it is simply called an [n, k, d] code
(binary code).
An ⟨n, M, d⟩A code is t-error correcting when t = ⌊(d − 1)/2⌋.
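These definitions are easy to exercise in code. The sketch below (an illustration only; the function names are ours, not from the notes) computes the Hamming distance, the weight, and the error-correcting radius t = ⌊(d − 1)/2⌋ for the binary repetition code {000, 111}:

```python
# Hamming distance, weight and minimum distance for a small code.
# Illustration only; names are ours, not from the lecture notes.

def hamming(x, y):
    """d(x, y) = #{i | x_i != y_i}."""
    return sum(a != b for a, b in zip(x, y))

def weight(x):
    """w(x) = #{i | x_i != 0}, for an abelian-group alphabet with null element 0."""
    return sum(a != 0 for a in x)

def min_distance(code):
    """min d(x, y) over distinct pairs x, y in the code."""
    code = list(code)
    return min(hamming(x, y)
               for i, x in enumerate(code)
               for y in code[i + 1:])

# The binary repetition code {000, 111} is a (3, 2, 3) code,
# hence t = (d - 1) // 2 = 1-error correcting.
C = [(0, 0, 0), (1, 1, 1)]
d = min_distance(C)
t = (d - 1) // 2
print(d, t)  # 3 1
```

Since the repetition code is a subgroup of Z2³, its minimum distance also equals its minimum nonzero weight, as noted above for group codes.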

7.2.2 What is a t–error correcting quantum code?


Let G be a Hilbert space of finite dimension and H = G⊗ⁿ its n-fold
tensor product. A typical example is G = C², so that H is an n-qubit
Hilbert space. Consider all operators in H of the form

X = X1 ⊗ X2 ⊗ · · · ⊗ Xn,

where #{i | Xi ≠ I} ≤ t.

Denote by Et the linear span of all such operators. An element X ∈ Et


is called an error operator of strength at most t. An Et -correcting quan-
tum code C ⊂ H is called a t–error correcting quantum code.

Remark 7.2.3 In an n–qubit quantum computer, if errors affect at


most t wires among the n wires, they can be corrected by a t–error
correcting quantum code.

7.2.3 A good basis for Et


We now construct a “good basis” for Et ⊂ B(H). Suppose dim G = N .
Consider any abelian group A of cardinality N and identify G with
L2(A). We define the unitary operators Ua, Vα and W(a,α) as follows:

(Ua f)(x) = f(x + a),  a ∈ A,
(Vα f)(x) = α(x) f(x),  f ∈ L2(A), α ∈ Â,  and  W(a,α) = Ua Vα.

Then we have

W(a,α) W(b,β) = α(b) W(a+b,αβ)  and  Tr W(a,α)† W(b,β) = N δa,b δα,β.

The family {W(a,α) | (a, α) ∈ A × Â} is irreducible and the set

{ N^(−1/2) W(a,α) | (a, α) ∈ A × Â }

is an orthonormal basis for the Hilbert space B(G) with scalar product
⟨X, Y⟩ = Tr X†Y, X, Y ∈ B(G). For (a, α) ∈ Aⁿ × Âⁿ (≅ (A × Â)ⁿ)
define W(a,α) = W(a1,α1) ⊗ W(a2,α2) ⊗ · · · ⊗ W(an,αn), so that

W(a,α) W(b,β) = ( Π_{i=1}^{n} αi(bi) ) W(a+b,αβ)

and

{ N^(−n/2) W(a,α) | (a, α) ∈ Aⁿ × Âⁿ }

is an orthonormal basis for B(H) = B(G⊗ⁿ). Define

w(a, α) = #{i | (ai, αi) ≠ (0, 1)},

the weight of (a, α) in the abelian group (A × Â)ⁿ.



Then
{W(a,α) | w(a, α) ≤ t}
is a linear basis for the subspace Et .
A subspace C ⊂ G⊗ⁿ is called a quantum code of minimum distance d,
if C has an orthonormal basis ψ1, ψ2, . . . , ψk satisfying

1. ⟨ψi |W(a,α) |ψj ⟩ = 0, i ≠ j, w(a, α) ≤ d,

2. ⟨ψi |W(a,α) |ψi ⟩ is independent of i whenever w(a, α) ≤ d,

3. Either condition (1) or condition (2) is false for some (a, α) with
w(a, α) = d + 1.

Such a quantum code is ⌊(d − 1)/2⌋-error correcting. We call it an [[n, k, d]]A
quantum code.
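For A = Z_N these Weyl operators are concrete N × N matrices: Ua is a cyclic shift and Vα a diagonal matrix of N-th roots of unity (for N = 2 they reduce to the Pauli matrices X and Z). The numerical sketch below (our own construction, not from the notes) checks the orthogonality relation Tr W(a,α)† W(b,β) = N δa,b δα,β:

```python
# Weyl operators on L2(Z_N): (U_a f)(x) = f(x + a), (V_k f)(x) = w^{kx} f(x),
# with w = exp(2*pi*i/N).  Illustration only; conventions are ours.
import numpy as np

N = 3
w = np.exp(2j * np.pi / N)

def U(a):
    # matrix of f -> f(. + a) in the standard basis of delta functions
    return np.array([[1.0 if (y - x) % N == a % N else 0.0
                      for y in range(N)] for x in range(N)])

def V(k):
    # the character x -> w^{kx} acting by multiplication
    return np.diag([w ** (k * x) for x in range(N)])

def W(a, k):
    return U(a) @ V(k)

# Orthogonality: Tr W(a,k)^dagger W(b,l) = N * delta_{a,b} * delta_{k,l}
ok = all(
    abs(np.trace(W(a, k).conj().T @ W(b, l))
        - (N if (a, k) == (b, l) else 0)) < 1e-9
    for a in range(N) for k in range(N)
    for b in range(N) for l in range(N)
)
print(ok)  # True
```

Dividing each W(a, k) by √N therefore yields an orthonormal basis of the N² complex matrices, exactly as stated above for B(G).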

7.3 Examples
7.3.1 A generalized Shor code
We begin with a few definitions. Let A be a finite abelian group with
binary operation + and identity element 0. Let Â denote its character
group. Let H be the Hilbert space L2(A)⊗ⁿ. Let Ua and Vα denote the
Weyl operators. Let Cn ⊂ Aⁿ be a t-error correcting (d(Cn) ≥ 2t + 1)
group code of length n with alphabet A. Let Dn,m ⊂ (Ĉn)ᵐ be a t-error
correcting group code with alphabet Ĉn of length m.
An element in Dn,m is denoted by χ. Sometimes we also denote by
χ the m-tuple (χ1, χ2, . . . , χm), where each χi is in Ĉn. Define

fα(x) = (#Cn)^(−1/2) α(x) if x ∈ Cn;  fα(x) = 0 otherwise.

Let Fχ = fχ1 ⊗ fχ2 ⊗ · · · ⊗ fχm, where χ is in Dn,m.

Theorem 7.3.1 {Fχ | χ ∈ Dn,m} is a t-error correcting quantum code
in L2(A)⊗ᵐⁿ ≅ L2(Aᵐⁿ).

Proof Let (a, α) ∈ Aᵐⁿ × Âᵐⁿ such that w(a, α) ≤ 2t. We have

⟨Fβ |Ua Vα |Fγ ⟩ = Σ_{x∈Aᵐⁿ} Π_{j=1}^{m} f̄βj(x(j) − a(j)) fγj(x(j)) α(x).   (7.3.2)

Note that w(a) ≤ 2t in Aᵐⁿ and w(α) ≤ 2t in Âᵐⁿ.

Case 1: Let a ≠ 0. Then a(j) ≠ 0 for some j = j0, and w(a) ≤ 2t
implies w(a(j0)) ≤ 2t. Then (Cn + a(j0)) ∩ Cn = ∅. So every summand in
the right hand side of equation (7.3.2) vanishes.

Case 2: Let a = 0. Then the right hand side of equation (7.3.2) reduces to

(#Cn)⁻ᵐ Σ_{x∈Cnᵐ} β̄(x) γ(x) α(x).

Let β ≠ γ, β, γ ∈ Dn,m. Then β̄γ ∈ Dn,m (a group code), and
w(β̄γ) ≥ 2t + 1. Since w(α) ≤ 2t, the restriction α|_{Cnᵐ} has weight ≤ 2t.
So β̄ γ α|_{Cnᵐ} is nontrivial. By the Schur orthogonality relations the right
hand side of equation (7.3.2) is equal to 0.
We now consider the case β = γ. Then the right hand side of equation
(7.3.2) reduces to (#Cn)⁻ᵐ Σ_{x∈Cnᵐ} α(x), which is independent of β.
Thus the Knill-Laflamme conditions are fulfilled.
□

7.3.2 Specialization to A = {0, 1}, m = 3, n = 3


Design of a 9-qubit, 1-error correcting, 2-dimensional code.

C3 = {000, 111}.

Ĉ3 has two elements,

χ1(000) = χ1(111) = 1 (identity character) and
χ2(000) = +1, χ2(111) = −1.

fχ1 = (1/√2)(|000⟩ + |111⟩)
fχ2 = (1/√2)(|000⟩ − |111⟩)

D3,3 = {(χ1, χ1, χ1), (χ2, χ2, χ2)}

Fχ1χ1χ1 = (fχ1)⊗³
Fχ2χ2χ2 = (fχ2)⊗³.

Thus, we encode 0 as Fχ1χ1χ1 and 1 as Fχ2χ2χ2. The circuit for
implementing the code is shown in Figure 7.3. This code is called the Shor
code.

[Figure 7.3: Circuit for encoding the Shor code.]
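A quick numerical sketch (our own construction, following the definitions above) builds the two Shor codewords as vectors in (C²)⊗⁹ and checks that they are orthonormal:

```python
# Build the two Shor codewords (f_{chi_i})^{tensor 3} as 2^9-dimensional
# vectors.  Illustration only; indexing conventions are ours.
import numpy as np

ket = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}

def basis_state(bits):
    v = np.array([1.0])
    for b in bits:
        v = np.kron(v, ket[b])
    return v

f1 = (basis_state([0, 0, 0]) + basis_state([1, 1, 1])) / np.sqrt(2)
f2 = (basis_state([0, 0, 0]) - basis_state([1, 1, 1])) / np.sqrt(2)

F1 = np.kron(np.kron(f1, f1), f1)  # encodes 0
F2 = np.kron(np.kron(f2, f2), f2)  # encodes 1

print(np.dot(F1, F1), np.dot(F2, F2), np.dot(F1, F2))  # 1.0 1.0 0.0
```

Orthogonality is inherited from ⟨fχ1|fχ2⟩ = 0 factor by factor, so ⟨F1|F2⟩ = ⟨fχ1|fχ2⟩³ = 0.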

7.3.3 Laflamme code


Laflamme found the following 5-qubit 1-error correcting quantum code.
0 ↦ |ψ0⟩ = (1/4){(|00000⟩+|11000⟩+|01100⟩+|00110⟩+|00011⟩+|10001⟩)
            − (|01010⟩+|00101⟩+|10010⟩+|01001⟩+|10100⟩)
            − (|11110⟩+|01111⟩+|10111⟩+|11011⟩+|11101⟩)}

1 ↦ |ψ1⟩ = (1/4){(|11111⟩+|00111⟩+|10011⟩+|11001⟩+|11100⟩+|01110⟩)
            − (|10101⟩+|11010⟩+|01101⟩+|10110⟩+|01011⟩)
            − (|00001⟩+|10000⟩+|01000⟩+|00100⟩+|00010⟩)}.

The code can also be written in the following way. Let a0 = a1 + a2 +
a3 + a4 + x (mod 2). Then

x ↦ |ψx⟩ = (1/4) Σ_{a1,a2,a3,a4∈Z2} (−1)^(a0a2+a1a3+a2a4+a3a0+a4a1) |a0⟩|a1 a2 a3 a4⟩.

This observation allows us to construct a simple circuit for implementing


the Laflamme code. The circuit for the Laflamme code is shown in
Figure 7.4.

[Figure 7.4: Circuit for encoding the Laflamme code.]
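The closed-form sign formula above can be checked numerically. The sketch below (our own code, not from the notes) generates |ψ0⟩ and |ψ1⟩ from the formula and verifies orthonormality together with a few of the amplitudes in the explicit listing:

```python
# Generate the 5-qubit Laflamme codewords from the sign formula
#   |psi_x> = (1/4) sum (-1)^(a0a2+a1a3+a2a4+a3a0+a4a1) |a0 a1 a2 a3 a4>,
# with a0 = a1+a2+a3+a4+x (mod 2).  Illustration only.
import itertools
import numpy as np

def psi(x):
    v = np.zeros(32)
    for a1, a2, a3, a4 in itertools.product([0, 1], repeat=4):
        a0 = (a1 + a2 + a3 + a4 + x) % 2
        sign = (-1) ** (a0*a2 + a1*a3 + a2*a4 + a3*a0 + a4*a1)
        index = int(f"{a0}{a1}{a2}{a3}{a4}", 2)
        v[index] = sign / 4.0
    return v

psi0, psi1 = psi(0), psi(1)

# Orthonormal, and amplitudes agree with the explicit listing:
# +1/4 on |00000>, -1/4 on |01010> in |psi_0>.
print(np.dot(psi0, psi0), np.dot(psi0, psi1))  # 1.0 0.0
print(psi0[0b00000] * 4, psi0[0b01010] * 4)    # 1.0 -1.0
```

Orthogonality of ψ0 and ψ1 is immediate here because the two supports lie in complementary parity classes of the five bits.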

7.3.4 Hadamard-Steane quantum code

Consider the Table 7.1. The ij th entry, for i, j > 1, is the inner prod-
uct of the ith entry in the first row and j th entry in the first column,
computed over the field F2 .
The portion inside the box is the Hadamard [7, 3, 4] simplex code. Let

000 001 010 011 100 101 110 111


000 0 0 0 0 0 0 0 0
001 0 1 0 1 0 1 0 1
010 0 0 1 1 0 0 1 1
011 0 1 1 0 0 1 1 0
100 0 0 0 0 1 1 1 1
101 0 1 0 1 1 0 1 0
110 0 0 1 1 1 1 0 0
111 0 1 1 0 1 0 0 1

Table 7.1: Inner products over F2.

C be the set of all row vectors of the boxed portion, so that each element
of C is a word of length 7. Define

|ψ0⟩ = (1/(2√2)) Σ_{x∈C} |x⟩  and  |ψ1⟩ = (1/(2√2)) Σ_{x∈C+(1,1,1,1,1,1,1)} |x⟩.

Then lin{|ψ0⟩, |ψ1⟩} is a 7-qubit single error correcting quantum code.
Note that C ∪ (C + (1, 1, 1, 1, 1, 1, 1)) is a group code of minimum dis-
tance 3.
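The classical facts used here are easy to verify by machine. The sketch below (our own code) builds C from the inner products y·x over F2 restricted to the seven nonzero columns, so that the words have length 7 matching the seven-qubit states, and checks that C ∪ (C + 1) has minimum distance 3:

```python
# Rows of Table 7.1 restricted to the seven nonzero columns x in F_2^3:
# C = { (y.x mod 2)_x : y in F_2^3 }, a [7,3,4] simplex-type code.
# Illustration only.
import itertools

cols = [x for x in itertools.product([0, 1], repeat=3) if any(x)]  # 7 columns

def dot(y, x):
    return sum(a * b for a, b in zip(y, x)) % 2

C = {tuple(dot(y, x) for x in cols)
     for y in itertools.product([0, 1], repeat=3)}
G = C | {tuple((c + 1) % 2 for c in w) for w in C}  # C union (C + all-ones)

min_wt = min(sum(w) for w in G if any(w))  # min distance of the group code
print(len(C), len(G), min_wt)  # 8 16 3
```

Every nonzero word of C has weight 4, while the complemented words have weights 3 and 7, which is where the minimum distance 3 comes from.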
Permute the columns to the order 4 6 7 1 2 3 5 in the table above.
Then the enumerated rows can be expressed as

(x1 x2 x3 x1 + x2 x1 + x3 x2 + x3 x1 + x2 + x3 )

where x1 , x2 , x3 vary in F2 . In other words we have expressed the code


as a parity check code with the first three positions for messages and
the last four as parity checks. Then the Hadamard-Steane code can be
expressed as
|ψa⟩ = Σ_{x1,x2,x3} |x1+a, x2+a, x3+a, x1+x2+a, x1+x3+a, x2+x3+a, x1+x2+x3+a⟩

where a ∈ {0, 1}. Put y1 = x1 + x3 + a, y2 = x2 + x3 + a, y3 =
x1 + x2 + x3 + a. Then

|ψa⟩ = Σ_{y1,y2,y3∈{0,1}} |y2+y3+a, y1+y3+a, y1+y2+y3, y1+y2+a, y1, y2, y3⟩.

[Figure 7.5: Circuit implementing the Steane-Hadamard code.]

This shows that the code can be implemented by the circuit shown in
Figure 7.5.

Exercise 7.3.3 Verify directly the Knill-Laflamme conditions for
{|ψ0⟩, |ψ1⟩}, for single error correction.

7.3.5 Codes based on Bush matrices


Let Fq = {a1, a2, . . . , aq} be the field of q = pᵐ elements, where p is
prime.
Let P(t, q) = {all polynomials of degree ≤ t with coefficients from Fq},
a linear space of dimension t + 1 over Fq, so that #P(t, q) = q^(t+1).
We enumerate the elements of P(t − 1, q), t − 1 ≤ q, as ϕ0, ϕ1, . . . ,
ϕN−1, where N = qᵗ, and construct the matrix Bt of order qᵗ × q as follows:

a1 a2 ··· aj ··· aq
ϕ0 = 0 0 0 ··· 0 ··· 0
.. .. .. .. ..
. . . ··· . ··· .
ϕi ϕi (a1 ) ϕi (a2 ) ··· ϕi (aj ) ··· ϕi (aq )
.. .. .. .. ..
. . . ··· . ··· .
ϕN −1 ϕN −1 (a1 ) ϕN −1 (a2 ) ··· ϕN −1 (aj ) ··· ϕN −1 (aq )

Denote the linear space of all the row vectors in Bt also by Bt .



Proposition 7.3.4 Bt is a linear code of minimum distance q − t + 1.

Proof Consider the i-th row in Bt , i 6= 0. ϕi is a nonzero polynomial


of degree ≤ t − 1. So ϕi has at most t − 1 zeroes. Thus, the weight of
this row ≥ q − t + 1. On the other hand consider the polynomial

ϕ(x) = (x − a1 )(x − a2 ) · · · (x − at−1 ).

Its zeros are exactly a1 , a2 , . . . at−1 . Thus, the weight of the correspond-
ing row is q − t + 1.
□
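For a prime q the code Bt can be generated and Proposition 7.3.4 checked by brute force. A sketch (our own notation, small parameters chosen for illustration):

```python
# B_t over F_q, q prime: rows (phi(a_1), ..., phi(a_q)) for all polynomials
# phi of degree <= t-1.  Minimum nonzero weight should be q - t + 1.
# Illustration only.
import itertools

q, t = 5, 3  # small prime field, polynomials of degree <= 2

def evaluate(coeffs, x):
    # evaluate c_0 + c_1 x + ... over F_q
    return sum(c * x ** k for k, c in enumerate(coeffs)) % q

rows = {tuple(evaluate(c, x) for x in range(q))
        for c in itertools.product(range(q), repeat=t)}

min_wt = min(sum(v != 0 for v in row) for row in rows if any(row))
print(len(rows), min_wt)  # 125 3
```

Here q − t + 1 = 3, attained by a polynomial such as (x − 1)(x − 2), whose row vanishes at exactly t − 1 = 2 points, matching the proof above.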
Corollary 7.3.5 Bt is a ⌊(q − t)/2⌋-error correcting group code.

If Et is the Hamming sphere of radius ⌊(q − t)/2⌋ with (0, . . . , 0) as center
in Fq^q then (Et − Et) ∩ Bt = {0}.

Proposition 7.3.6 Let α ∈ Bt⊥ ⊂ (F̂q)^q. If α ≠ 1, then w(α) ≥ t + 1.
Thus Bt⊥ is a ⌊t/2⌋-error correcting group code. If Ft is the Hamming
sphere of radius ⌊t/2⌋ then Ft⁻¹Ft ∩ Bt⊥ = {1}.

Proof Suppose w(α) = r, where 0 < r ≤ t. Let α = (α1, α2, . . . , αq),
αi ∈ F̂q, with αi ≠ 1 if and only if i ∈ {i1 < i2 < · · · < ir}. Write
bj = aij, j = 1, 2, . . . , r. For arbitrary c1, c2, . . . , cr in Fq consider the
Lagrange polynomial (for interpolation)

ϕ(x) = Σ_{j=1}^{r} cj [ Π_{k≠j} (x − bk) ] / [ Π_{k≠j} (bj − bk) ].

Then ϕ is a polynomial of degree r − 1 (≤ t − 1) and ϕ(bj) = cj,
j = 1, 2, . . . , r. Corresponding to ϕ there is a row in Bt. Evaluating α
on this row we get

α(ϕ(a1), ϕ(a2), . . . , ϕ(aq)) = Π_{j=1}^{r} αij(cj) = 1,

since α ∈ Bt⊥. Since the cj's are arbitrary, we have αij = 1 for all j =
1, 2, . . . , r, a contradiction.
□

We can now use Theorem 7.1.10 and Theorem 7.1.14 in the case
C1 ⊂ C2 ⊂ A^q, A = Fq as an additive group, C1 = Bt0, C2 = Bt,
0 < t0 < t < q. Then Bt = Bt0 ⊕ S, where S consists of all polynomials
of the form

s(x) = x^(t0) (a0 + a1 x + · · · + a_(t−t0−1) x^(t−t0−1)).

For any polynomial ϕ consider the state |ϕ⟩ = |ϕ(a1) ϕ(a2) . . . ϕ(aq)⟩.
For any s ∈ S define

ψs = q^(−t0/2) Σ_{ϕ∈P(t0−1,q)} |ϕ + s⟩.

Then Ct,t0 = lin{ψs | s ∈ S} is a quantum code with dim Ct,t0 = q^(t−t0),
which can correct ⌊(q − t)/2⌋ ∧ ⌊t0/2⌋ errors.

Remark 7.3.7 Choose t = ⌊θq⌋, t0 = ⌊θ0 q⌋, 0 < θ0 < θ < 1. Then, as
q → ∞, we have

(log dim Ct,t0)/(log dim H) = (t − t0)/q = (⌊θq⌋ − ⌊θ0 q⌋)/q → θ − θ0.

Therefore,

(# errors corrected)/(# qubits) ≥ (⌊(1 − θ)q/2⌋ ∧ ⌊θ0 q/2⌋)/q → (1 − θ)/2 ∧ θ0/2

as q → ∞.
For θ = 3/4, θ0 = 1/4 we get θ − θ0 = 1/2 and (1 − θ)/2 ∧ θ0/2 = 1/8. It
means 50% of the qubits are used for sending the messages, 50% for
error checking and up to 12½% errors can be corrected.

7.3.6 Quantum codes from BCH codes


In this example we use the celebrated BCH (Bose-Chaudhuri-Hocquen-
hem) codes to construct a quantum code. We begin with a few facts
from classical coding theory. Let Fnq be a vector space over the finite
field Fq with q = pm , where p is a prime. Choose and fix a primitive
element α of Fqn .
Let σ be a cyclic permutation defined by

σ(a0 , . . . , an−1 ) 7→ (an−1 , a0 , . . . , an−2 ).



Then a subspace C ⊂ Fnq invariant under the cyclic permutation σ is


called a cyclic code of length n. For every word w = (w0 , . . . , wn−1 ) ∈ Fnq
we associate the word polynomial w(x) = w0 + w1 x + · · · + wn−1 xn−1 . If
w is in C it is called the code word polynomial. Let Rn = Fq [x]/(xn − 1).
Then Rn can be viewed as a vector space over Fq and it is isomorphic
to Fqⁿ. Under the identification w ↦ w(x) the image C* of a cyclic code
C in Rn is an ideal with a single generator polynomial gC. Without loss
of generality we may assume gC to be monic and therefore unique. It is
known that gC is a divisor of xⁿ − 1. If deg(gC) = k then dim C = n − k.
If gC has a string of successive powers α^a, α^(a+1), . . . , α^(a+b−2) as its roots
and 0 ≤ a < a + b − 2 ≤ qⁿ − 2, then d(C) ≥ b (where d(C) is the
minimum distance of C). For any cyclic code denote

C⊥ = {x | x · y = x1 y1 + · · · + xn yn = 0 for all y ∈ C}.

Then C⊥ is also a cyclic code, called the dual of C.


Conversely if g is a divisor of xⁿ − 1 then there exists a unique
cyclic code Cg generated by g. Suppose xⁿ − 1 = gh where g(x) =
a0 + a1 x + · · · + a_(k−1) x^(k−1) + x^k and h(x) = b0 + b1 x + · · · +
b_(n−k−1) x^(n−k−1) + x^(n−k), so that a0 b0 = −1. Define
h̃ = b0⁻¹ (1 + b_(n−k−1) x + · · · + b0 x^(n−k)). If h
has a string of successive powers α^l, α^(l+1), . . . , α^(l+m−2) as its roots then
so does the polynomial h̃, which can be written as

h̃ = (−1)^(n−k) (β1 · · · β_(n−k))⁻¹ (1 − β1 x) · · · (1 − β_(n−k) x)

where β1, . . . , β_(n−k) are the roots of h in F_(qⁿ). It is known that C⊥ = Ch̃
and therefore it follows that d(C⊥) ≥ m. (For complete proofs we refer
to [15, 7, 13].)
Let xⁿ − 1 = g1 g2 g3, d(Cg1) = d1, d(Cg3) = d3. Note that C⊥_(g1g2) = C_(g̃3).
By Theorem 7.1.10 we get a quantum code C of dimension
(#Cg1)/(#Cg1g2) = q^(deg g2). If Cg1 and Cg3 are respectively t1- and
t3-error correcting codes then C can correct min(t1, t3) errors.
Lecture 8

Classical Information Theory

8.1 Entropy as information


8.1.1 What is information?
Let us consider a simple statistical experiment of observing a random
variable X, which takes one of the values x1, x2, . . . , xn with respective
probabilities p1, . . . , pn (pi ≥ 0 for all i and Σi pi = 1). When we observe
X we gain some information because the uncertainty regarding its value
is eliminated. So the information gained is the uncertainty eliminated.
We wish to have a mathematical model which gives us a measure of this
information gained. A function which measures this information gained
or the uncertainty associated with a statistical experiment must depend
only on the probabilities pi and it should be symmetric. This is based on
the intuition that changing the names of the outcomes does not change
the uncertainty associated with the random variable X.
The desirable properties of a function H which measures the uncer-
tainty associated with a statistical experiment are listed below.

1) For each fixed n, H(p1 , p2 , . . . , pn ; n) is a nonnegative symmetric


function of p1 , p2 , . . ., pn .
2) H(1/2, 1/2; 2) = 1. This is to fix the scale of the measurement. One
can look at the information obtained by performing one of the
simplest statistical experiments, that is, tossing an unbiased coin
and observing the outcome. An outcome of this experiment is said
to give one unit of information.
3) H(p1 , p2 , . . . , pn ; n) = 0 if and only if one of the pi ’s is 1. This cor-
responds to the case when there is no uncertainty in the outcome
of the experiment.


4) Let X and Y be two independent statistical experiments. Let XY


denote the experiment where the experiments X and Y are per-
formed together and the output is the ordered pair of the outcomes
of X and Y . Then H(XY ) = H(X) + H(Y ).
5) H(p1, p2, . . . , pn; n) attains its maximum when pi = 1/n for all
i ∈ {1, 2, . . . , n}. That is, we gain maximum information when all
possible outcomes are equally likely.
6) H(p1 , p2 , . . . , pn , 0; n + 1) = H(p1 , p2 , . . . , pn ; n).
7) H(p1 , p2 , . . . , pn ; n) is continuous in p1 , . . . pn . This is a natural
condition because we would like to say that, if two statistical ex-
periments have the same number of possible outcomes and their
associated probabilities are close, then the information contained
in each of them should also be close.
Let H0 = − Σ_{j=1}^{n} pj log2 pj. This function is also known as the
entropy function. It can be verified that this function satisfies all the
above desired properties.
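A direct implementation of H0 (our own sketch, not part of the notes) confirms the normalization H(1/2, 1/2) = 1, the vanishing at degenerate distributions, the maximum log2 n at the uniform distribution, and additivity for independent experiments:

```python
# Shannon entropy H(p) = -sum p_j log2 p_j, with the convention 0 log 0 = 0.
# Illustration only; function names are ours.
from math import log2

def H(*p):
    return -sum(x * log2(x) for x in p if x > 0)

print(H(0.5, 0.5))                 # 1.0  (property 2)
print(H(1.0, 0.0))                 # 0.0  (property 3)
print(H(0.25, 0.25, 0.25, 0.25))   # 2.0  (uniform: log2 4, property 5)

# Property 4: for independent X, Y the joint distribution is the
# product distribution, and entropies add.
px, py = (0.3, 0.7), (0.2, 0.8)
joint = [a * b for a in px for b in py]
print(abs(H(*joint) - (H(*px) + H(*py))) < 1e-12)  # True
```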
Let X, Y be two statistical experiments in which the outcomes of X
and Y are x1 , . . . , xn and y1 , . . . , ym respectively. Suppose

Pr(X = xi ) = pi , Pr(Y = yj | X = xi ) = qij , Pr(Y = yj ) = qj .

Then Pr(X = xi, Y = yj) = pi qij. Let H(qi1, . . . , qim) = Hi(Y). We
define the conditional entropy as H(Y | X) = Σ_{i=1}^{n} pi Hi(Y), i.e. the
entropy of Y on knowing X.

Exercise 8.1.1 Verify that H0 defined earlier satisfies the following


equality.

H0 (XY ) = H0 (X) + H0 (Y | X). (8.1.2)

This can be interpreted as follows: The total information obtained


by performing the experiments X and Y together is equal to the sum of
the information obtained by performing X and the information left in
Y after knowing the outcome of X.
This seems to be a reasonable property that the function H should
have. Note that Property 4) is a special case of equation (8.1.2). If we
replace Property 4) by the hypothesis, H(XY ) = H(X) + H(Y | X)
then there is a unique function which satisfies all the above properties.
Hence H0 is the only candidate as a measure of entropy. From now on-
wards we use H to denote the measure of entropy and H(p) to denote

H(p1 , p2 , . . . , pn ; n). If a random variable X has a probability distribu-


tion p, we sometimes write H(p) instead of H(X).

Note 8.1.3 If Property 4) is not changed then there can be other func-
tions which satisfy properties 1) to 7). See [1] for other measures of
entropy.

The entropy function H has several important properties. Some of


them are listed in the following exercises.

Exercise 8.1.4 Show that H(XY ) ≥ H(X).

Mutual information H(X : Y ) of two statistical experiments is de-


fined as H(X : Y ) = H(X) + H(Y ) − H(XY ) = H(X) − H(X | Y ). It
is the information about X gained by observing Y .

Exercise 8.1.5 Show that H(Y : X) ≥ 0, where X and Y are two


statistical experiments.

Exercise 8.1.6 Let X, Y, Z be three statistical experiments. Then show


that the inequality H(X | Y ) ≥ H(X | Y Z) holds.

Exercise 8.1.7 (Subadditivity) Show that H(XY ) ≤ H(X) + H(Y ),


where X and Y are two statistical experiments.

Exercise 8.1.8 (Strong subadditivity) Show that

H(XY Z) + H(Y ) ≤ H(XY ) + H(Y Z),

where X, Y and Z are three statistical experiments. Equality holds if


and only if {Z, Y, X} is a Markov chain.

The following identity is also very useful.

Theorem 8.1.9 (Chain rule for conditional entropy)

H(X1 , . . . , Xn | Y ) =
H(X1 | Y ) + H(X2 | Y X1 ) + · · · + H(Xn | Y X1 · · · Xn−1 ).

Proof We prove by induction.


Base case: n = 2.
H(X1 X2 | Y ) = H(X1 X2 Y ) − H(Y )
= H(X1 X2 Y ) − H(X1 Y ) + H(X1 Y ) − H(Y )
= H(X2 | X1 Y ) + H(X1 | Y )
= H(X1 | Y ) + H(X2 | X1 Y ).
Induction hypothesis: For all n ∈ {2, 3, . . . k}

H(X1 , . . . , Xn | Y ) =
H(X1 | Y ) + H(X2 | Y X1 ) + · · · + H(Xn | Y X1 · · · Xn−1 ).
Induction step:

H(X1 , . . . , Xk+1 |Y ) = H(X1 |Y )+H(X2 . . . Xk+1 | Y X1 ) (by base case)


= H(X1 | Y ) + H(X2 | Y X1 ) + · · · +
H(Xk+1 | Y X1 . . . Xk ) (by induction hypothesis).

Exercise 8.1.10 (Data processing inequality) Let X → Y → Z be


a Markov chain. Then H(X) ≥ H(X : Y ) ≥ H(X : Z).

Exercise 8.1.11 (Data pipeline inequality) Let X → Y → Z be a


Markov chain. Then H(Z) ≥ H(Z : Y ) ≥ H(Z : X).

8.2 A Theorem of Shannon


Let A be an alphabet of size N . Denote by S(A) the free semigroup
generated by A. Any element W ∈ S(A) can be expressed as W =
ai1 ai2 . . . ain , where aij ∈ A for each j. We say that W is a word of length
n. Let B be another alphabet, say of size M . Any map C : A → S(B) is
called a code and any word in the image of C is called a codeword. Extend
C to a map C̃ : S(A) → S(B) by putting C̃(W ) = C̃(ai1 ai2 . . . ain ) =
C(ai1 )C(ai2 ) . . . C(ain ). We say that C is uniquely decipherable if C̃ is
injective (or one to one). C is called an irreducible code if no code
word of C is an extension of another code word. An irreducible code is
uniquely decipherable. Indeed, in such a case we can recover a word W
in S(A) from its image C̃(W ) by just reading C̃(W ) left to right.

Theorem 8.2.1 Let A = {a1 , . . . , aN } and B = {b1 , . . . , bM } be two


alphabets. Let C : A → S(B) be an irreducible code. Let the lengths
of the words C(a1 ), C(a2 ), . . . , C(aN ), be n1 , n2 , . . . , nN , respectively.
Then

M −n1 + M −n2 + · · · + M −nN ≤ 1. (8.2.2)

Conversely, if n1 , n2 , . . . , nN are nonnegative integers satisfying this in-


equality then there exists an irreducible code C : A → S(B) such that
C(ai ) has length ni for each i = 1, 2, . . . , N.

Proof Let C : A → S(B) be an irreducible code with L = maxi ni .


Denote by wi the number of code words of length i.
Necessity: Since there can be at most M words of length 1 we have
w1 ≤ M . Since C is irreducible, words of length 2 which are extensions
of the code words of length 1 cannot appear in the image of C. This
gives w2 ≤ M 2 − w1 M .
Continuing this way we get wL ≤ M^L − w1 M^(L−1) − · · · − wL−1 M.
The last inequality can be rewritten as

w1 M⁻¹ + w2 M⁻² + · · · + wL M⁻ᴸ ≤ 1.        (8.2.3)

Sufficiency: We pick any w1 words of length 1. Then we pick any w2


words of length 2 which are not extensions of the w1 words of length 1
already picked. This is possible because inequality (8.2.3) is satisfied.
This way we keep picking words of required lengths.
□
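The sufficiency argument above is constructive: pick words of each length in increasing order, always avoiding extensions of words already chosen. A sketch for M = 2 (our own code; not the notes' construction verbatim):

```python
# Greedy construction of an irreducible (prefix-free) binary code with
# prescribed word lengths, possible iff sum M^(-n_i) <= 1 (Kraft).
# Illustration only.
from fractions import Fraction
from itertools import product

def irreducible_code(lengths, M=2):
    assert sum(Fraction(1, M ** n) for n in lengths) <= 1, "Kraft violated"
    chosen = []
    for n in sorted(lengths):
        for w in product(range(M), repeat=n):
            # reject w if it extends an already chosen code word
            if not any(w[:len(c)] == c for c in chosen):
                chosen.append(w)
                break
    return chosen

code = irreducible_code([1, 2, 3, 3])
print(code)  # a prefix-free code with word lengths 1, 2, 3, 3
```

Because no code word is a prefix of another, a concatenation of code words can be deciphered by reading left to right, which is exactly the unique decipherability of irreducible codes noted earlier.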

Suppose the letters ai, i = 1, 2, . . . , N of the alphabet A are picked
with probabilities pi, i = 1, 2, . . . , N respectively. Then the expected
length of the code is Σ_{i=1}^{N} pi ni, where ni is the length of C(ai).
Let

qj = M^(−nj) / Σ_{i=1}^{N} M^(−ni)   and   ℓ(C) = Σ_{i=1}^{N} pi ni.

By using the inequality "arithmetic mean is greater than or equal to
geometric mean" we get

Π_j (qj/pj)^(pj) ≤ Σ_j pj (qj/pj) = Σ_j qj = 1.

Taking logarithms on both sides and using (8.2.3), we get

ℓ(C) ≥ − Σ_i pi log2 pi / log2 M.

Hence the average length of an irreducible code must be at least
− Σ_i pi log2 pi / log2 M.
Let nj be an integer between − log2 pj / log2 M and − log2 pj / log2 M + 1
for all j ∈ {1, 2, . . . , N}. Then Σ_j M^(−nj) ≤ Σ_j pj ≤ 1. By the above dis-
cussion we know that an irreducible code C′ exists with the length of C′(aj)
equal to nj. The expected length of this code, ℓ(C′) (= Σ_j nj pj),
satisfies

− Σ_j pj log2 pj / log2 M ≤ ℓ(C′) ≤ − Σ_j pj log2 pj / log2 M + 1.
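For M = 2 the integer choice nj = ⌈−log2 pj⌉ realizes these bounds. A quick numerical sketch (our own code) checks both Kraft's inequality and the entropy bounds on the expected length:

```python
# Shannon code lengths n_j = ceil(-log2 p_j) satisfy Kraft's inequality
# and give H(p) <= expected length < H(p) + 1 (here M = 2).
# Illustration only.
from math import ceil, log2

p = [0.5, 0.25, 0.125, 0.125]
n = [ceil(-log2(pj)) for pj in p]

kraft = sum(2 ** (-nj) for nj in n)
avg_len = sum(pj * nj for pj, nj in zip(p, n))
H = -sum(pj * log2(pj) for pj in p)

print(n)                      # [1, 2, 3, 3]
print(kraft <= 1)             # True
print(H <= avg_len < H + 1)   # True
```

For this dyadic distribution the expected length equals the entropy exactly (1.75 bits), the case where the lower bound is attained.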
Theorem 8.2.4 (Sardinas-Patterson, 1953) Let A = {a1, . . . , aN}
and B = {b1, . . . , bM} be two alphabets. Let C : A → S(B) be a uniquely
decipherable code. Let the lengths of the words C(a1), C(a2), . . . , C(aN)
be n1, n2, . . . , nN respectively. Then Σ_{j=1}^{N} M^(−nj) ≤ 1.

Proof Let wj = #{i | ni = j}. Then the desired inequality can be
rewritten as

Σ_{j=1}^{L} wj M^(−j) ≤ 1,  where L = max(n1, n2, . . . , nN).

Let Q(x) = Σ_{j=1}^{L} wj x^j and let N(k) denote the number of B-words
of length k obtained by concatenating code words. Then we have the
following recursive relation:

N(k) = w1 N(k − 1) + w2 N(k − 2) + · · · + wL N(k − L),   (8.2.5)

where N(0) = 1 and N(j) = 0 if j < 0. Consider the formal power
series F(x) = Σ_{k=0}^{∞} N(k) x^k. Since C̃ is injective we know that
N(k) ≤ M^k. Hence the formal series converges in the case |x| < M⁻¹.
From (8.2.5) we have F(x) − 1 = Q(x)F(x), so F(x) = 1/(1 − Q(x)).
F(x) is analytic in the disc |x| < M⁻¹ and 1 − Q(x) > 0 when |x| < M⁻¹.
Therefore, by continuity we have Q(M⁻¹) ≤ 1. This is the required
inequality.
□

Corollary 8.2.6 Let A and B be as in Theorem 8.2.1. Suppose the
letters a1, a2, . . . , aN are picked with probabilities p1, p2, . . . , pN respec-
tively. Then for any uniquely decipherable code C from A to S(B) one
has ℓ(C) ≥ − Σ_i pi log2 pi / log2 M.
2

Thus, Theorem 8.2.4 implies that corresponding to any uniquely de-


cipherable code C : A → S(B) with length of code words n1 , n2 , . . . , nN
there exists an irreducible code C 0 : A → S(B) with lengths of code
words n1 , n2 , . . . , nN .

Remark 8.2.7 Suppose an i.i.d. sequence X1 , X2 , . . . of letters from A


comes from a source with Pr(Xj = ai ) = pi . Then Pr((X1 X2 . . . Xn ) =
(ai1 ai2 . . . ain )) = pi1 pi2 . . . pin and H(X1 X2 . . . Xn ) = nH(p1 , . . . , pN ).
Now consider blocks of length n. The new alphabet is Aⁿ. Encode
C : a ↦ C(a), where a = ai1 ai2 . . . ain and C(a) ∈ S(B), in a uniquely
decipherable form, so that the following inequalities hold:

nH(p1, p2, . . . , pN)/log2 M ≤ Σ_a p(a) ℓ(C(a)) < nH(p1, p2, . . . , pN)/log2 M + 1.

This implies

| Σ_a p(a) ℓ(C(a)) / n − H(p1, p2, . . . , pN)/log2 M | < 1/n.   (8.2.8)

In this block encoding procedure, the expected length of an encoded
block is

ℓ(C) = Σ_a p(a) ℓ(C(a)).

The ratio of the expected length of an encoded block to the size of a
block, namely Σ_a p(a) ℓ(C(a)) / n, is called the compression coefficient.
Equation (8.2.8) tells us that, as n increases, the compression coefficient
tends to H(p1, p2, . . . , pN)/log2 M.

8.3 Stationary Source


We consider a discrete information source I which outputs elements
xn ∈ A, n = 0, ±1, ±2, . . . where A is a finite alphabet. Thus a ‘possible
life history’ of the output can be expressed as a bilateral sequence

x = (. . . , x−1 , x0 , x1 , x2 , . . .), xn ∈ A. (8.3.1)

Any set of the form

{x | x ∈ A^Z, x_(t1) = a1, . . . , x_(tn) = an} = [a1 . . . an]_(t1,t2,...,tn)

is called cylinder with base a1 , a2 , . . . , an at times t1 < t2 < · · · < tn .


Consider the smallest σ-algebra FA containing such cylinders. Any prob-
ability measure µ on the Borel space (AZ , FA ) is uniquely determined
by the values of µ on the cylinders. The probability space (AZ , FA , µ)
is called a discrete time random process.
Consider the shift transformation T : AZ → AZ defined by T x = y
where yn = xn−1 for all n ∈ Z. If the probability measure µ is invariant
under T we say that (AZ , FA , µ) is a stationary information source and
we denote it by [A, µ]. (From now onwards when there is no chance of
confusion we may use the notation µ(E) to denote Pr(E, µ).) For such
a source

µ([a1 a2 . . . an ]t1 ,t2 ,...,tn ) = µ([a1 a2 . . . an ]t1 +1,t2 +1,...,tn +1 ).

The information emitted by such a source during the time period t,


t + 1, . . ., t + n − 1 is also the information emitted during the period 0,
1, . . ., n − 1 and is given by
X
Hn (µ) = − µ(C) log µ(C).
C

where the summation is over all cylinders based on a0 , a1 , . . . , an−1 at


times 0, 1, 2, . . . , n − 1, aj varying in A. We call Hn(µ)/n the rate at
which information is generated by the source during [0, n − 1]. Our next
result shows that this rate converges to a limit as n → ∞.
Theorem 8.3.2 For any stationary source [A, µ] the sequence Hn(µ)/n
monotonically decreases to a limit H(µ).

Proof For any a0 , a1 , . . . , an−1 ∈ A we write

[a0 a1 . . . an−1 ] = [a0 a1 . . . an−1 ]0,1,2,...,n−1 .

Consider the output during [0, n − 1] as a random variable. Then we can


express

Hn+1 (µ) = −E(log µ[x−n , x−(n−1) , . . . , x0 ])


Hn (µ) = −E(log µ[x−n , x−(n−1) , . . . , x−1 ])

where the expectation is with respect to µ. We now show that the


sequence Hn+1 (µ) − Hn (µ) is monotonic decreasing. Let A, B and C
be schemes determined by the cylinders [x0 ], [x−n , x−(n−1) , . . . , x−1 ] and

[x−(n+1) ] respectively. Then the joint scheme BC is given by the cylinder


[x−(n+1) , x−n , . . . , x−1 ]. Then we have

H(A | B) = Hn+1 (µ) − Hn (µ) and


H(A | BC) = Hn+2 (µ) − Hn+1 (µ).

By using the fact H(A | BC) ≤ H(A | B) we get

Hn+2 (µ) − Hn+1 (µ) ≤ Hn+1 (µ) − Hn (µ).

Also H2(µ) ≤ 2H1(µ). Thus the sequence H1(µ), H2(µ) − H1(µ), . . . ,
Hn(µ) − Hn−1(µ), . . . is monotonic decreasing. Since

Hn(µ)/n = [H1(µ) + (H2(µ) − H1(µ)) + · · · + (Hn(µ) − Hn−1(µ))]/n,

it follows that Hn(µ)/n is monotonic decreasing. But Hn(µ)/n is bounded
from below. Hence limn→∞ Hn(µ)/n exists.
□
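For a concrete stationary source the quantities Hn(µ)/n can be computed exactly from cylinder probabilities and the monotone decrease observed. The sketch below uses a two-state stationary Markov chain of our own choosing (not from the notes):

```python
# H_n(mu)/n for a stationary 2-state Markov chain: compute the entropy of
# all length-n cylinder probabilities and check monotone decrease.
# Illustration only.
from itertools import product
from math import log2

P = [[0.9, 0.1], [0.4, 0.6]]   # row-stochastic transition matrix
pi = [0.8, 0.2]                # stationary distribution: pi P = pi

def cylinder_prob(word):
    p = pi[word[0]]
    for a, b in zip(word, word[1:]):
        p *= P[a][b]
    return p

def H_n(n):
    return -sum(p * log2(p)
                for w in product([0, 1], repeat=n)
                for p in [cylinder_prob(w)] if p > 0)

rates = [H_n(n) / n for n in range(1, 7)]
print(all(r2 <= r1 + 1e-12 for r1, r2 in zip(rates, rates[1:])))  # True
```

For a Markov source the increments Hn+1 − Hn are constant from n = 1 onwards, so the rate converges quickly to the conditional entropy of one step given the previous symbol.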
Lecture 9

Quantum Information Theory

9.1 von Neumann Entropy


Following the exposition of quantum probability in chapter 1 we now
replace the classical sample space Ω = {1, 2, . . . , n} by a complex Hilbert
space H of dimension n and the probability distribution p1 , p2 . . . pn on
Ω by a state ρ, i.e., a nonnegative definite operator ρ of unit trace.
Following von Neumann we define the entropy of a quantum state ρ by
the expression S(ρ) = − Tr(ρ log ρ) where the logarithm is with respect
to the base 2 and it is understood that the function x log x is defined to
be 0 whenever x = 0. We call S(ρ) the von Neumann entropy of ρ. If
λ1, λ2, . . . , λn are the eigenvalues of ρ (inclusive of multiplicity) we have

S(ρ) = − Σ_i λi log λi.        (9.1.1)

If ρ is the diagonal matrix diag(p1, . . . , pn), with p = (p1, . . . , pn) a prob-
ability distribution, then S(ρ) = H(p) = − Σ_i pi log pi.
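Numerically, S(ρ) is obtained from the eigenvalues of ρ via (9.1.1). A sketch (our own code):

```python
# von Neumann entropy S(rho) = -Tr(rho log2 rho) via eigenvalues.
# Illustration only.
import numpy as np

def S(rho):
    eigs = np.linalg.eigvalsh(rho)
    return float(-sum(x * np.log2(x) for x in eigs if x > 1e-12))

# Diagonal state: S reduces to the Shannon entropy H(p).
print(S(np.diag([0.5, 0.5])))   # 1.0
# Pure state |0><0|: zero entropy.
print(S(np.diag([1.0, 0.0])))   # 0.0
# Maximally mixed state in dimension d = 4: log2 4 = 2.
print(S(np.eye(4) / 4))         # 2.0
```

The three printed values illustrate the extremes of Property 1) below: S = 0 exactly at pure states and S = log2 d exactly at ρ = d⁻¹I.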

9.2 Properties of von Neumann Entropy


Property 1) 0 ≤ S(ρ) ≤ log2 d, where d is the dimension of the Hilbert
space H. S(ρ) = 0 if and only if ρ is pure, i.e., ρ = |ψ⟩⟨ψ| for some unit
vector |ψ⟩ in H. S(ρ) = log2 d if and only if ρ = d⁻¹I.
Property 2) For any unitary operator U , S(U ρ U † ) = S(ρ).
Property 3) For any pure state |ψ⟩, S(|ψ⟩⟨ψ|) = 0.
Note that Property 3) is already contained in Property 1).
Suppose HA ⊗ HB describes the Hilbert space of a composite quan-
tum system whose constituents are systems A and B with their states

97
98 Lecture 9. Quantum Information Theory

coming from the Hilbert spaces HA and HB respectively. For any op-
erator X on H we define two operators X A and X B on HA and HB
respectively by
⟨u|X^A|v⟩ = Σ_j ⟨u ⊗ fj |X|v ⊗ fj ⟩        (9.2.1)

⟨u′|X^B|v′⟩ = Σ_i ⟨ei ⊗ u′|X|ei ⊗ v′⟩        (9.2.2)

for all u, v ∈ H_A, u′, v′ ∈ H_B, where {e_i}, {f_j} are orthonormal bases in H_A, H_B respectively. Note that the right sides of (9.2.1) and (9.2.2) are sesquilinear forms on H_A and H_B, and therefore the operators X^A and X^B are uniquely defined. Simple algebra shows that X^A and X^B are independent of the choice of orthonormal bases in H_A and H_B. We write X^A = Tr_B X, X^B = Tr_A X. Tr_A and Tr_B are called the operators of relative trace on the operator variable X. Note that Tr X^A = Tr X^B = Tr X. If X is nonnegative definite so are X^A and X^B. In particular, for any state ρ of the composite system, ρ^A and ρ^B are states on H_A and H_B respectively. We call them the marginal states of ρ.
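The relative traces Tr_B and Tr_A are straightforward to realize numerically; a minimal sketch (the function `partial_trace` and its argument names are our own):

```python
import numpy as np

def partial_trace(rho, dA, dB, keep):
    """Relative trace of a state rho on H_A (x) H_B with dim H_A = dA,
    dim H_B = dB.  keep='A' returns rho^A = Tr_B rho; keep='B' returns
    rho^B = Tr_A rho."""
    r = rho.reshape(dA, dB, dA, dB)        # indices (i, j; i', j')
    if keep == 'A':
        return np.einsum('ijkj->ik', r)    # sum over the H_B index
    return np.einsum('ijil->jl', r)        # sum over the H_A index

# Marginal states of the Bell state (|00> + |11>)/sqrt(2) on C^2 (x) C^2:
bell = np.zeros(4); bell[0] = bell[3] = 1 / np.sqrt(2)
rho = np.outer(bell, bell)
rhoA = partial_trace(rho, 2, 2, 'A')       # both marginals equal I/2
rhoB = partial_trace(rho, 2, 2, 'B')
```

Note that Tr ρ^A = Tr ρ^B = Tr ρ = 1, as asserted above.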
Let |i_A⟩, |j_B⟩, i = 1, 2, . . . , m; j = 1, 2, . . . , n be orthonormal bases for H_A, H_B respectively. Then {|i_A⟩|j_B⟩, 1 ≤ i ≤ m, 1 ≤ j ≤ n} is an orthonormal basis for H = H_{AB} = H_A ⊗ H_B and hence any joint pure state |ψ⟩ can be expressed as

|ψ⟩ = Σ_{i,j} a_{ij} |i_A⟩|j_B⟩.        (9.2.3)

The m × n matrix A = [a_{ij}] can be expressed as

[a_{ij}] = U [ D 0 ; 0 0 ] V

where U is a unitary matrix of order m × m, V is a unitary matrix of order n × n and D = diag(s_1, s_2, . . . , s_r), s_1 ≥ s_2 ≥ · · · ≥ s_r > 0, r being the rank of [a_{ij}]. It follows that s_1, s_2, . . . , s_r are the positive eigenvalues of the matrices √(A†A) and √(AA†), called the singular values of A.

Define the vectors

|α_A^i⟩ = Σ_{k=1}^m u_{ki} |k_A⟩,   1 ≤ i ≤ m,
|β_B^j⟩ = Σ_{l=1}^n v_{jl} |l_B⟩,   1 ≤ j ≤ n,

where U = [u_{ki}], V = [v_{jl}]. Then (9.2.3) becomes

|ψ⟩ = Σ_{i=1}^r s_i |α_A^i⟩|β_B^i⟩.        (9.2.4)

Here |α_A^1⟩, |α_A^2⟩, . . . , |α_A^r⟩ and |β_B^1⟩, |β_B^2⟩, . . . , |β_B^r⟩ are orthonormal sets in H_A and H_B of the same cardinality and s_1, s_2, . . . , s_r are the singular values of A. The decomposition of |ψ⟩ in the form (9.2.4) is called the Schmidt decomposition of |ψ⟩.
Property 4) Let |Ω⟩⟨Ω| be a pure state for the composite system AB and let ρ^A and ρ^B be its marginal states. Then S(ρ^A) = S(ρ^B).

Proof By the Schmidt decomposition there exist orthonormal states |i_A⟩ for system A and orthonormal states |i_B⟩ for system B such that |Ω⟩ = Σ_i λ_i |i_A⟩|i_B⟩, where the λ_i are nonnegative real numbers satisfying Σ_i λ_i² = 1. So we can write |Ω⟩⟨Ω| = Σ_{i,j} λ_i λ_j |i_A⟩⟨j_A| ⊗ |i_B⟩⟨j_B|. Thus ρ^A = Σ_i λ_i² |i_A⟩⟨i_A| and ρ^B = Σ_i λ_i² |i_B⟩⟨i_B|. Hence the eigenvalues of ρ^A and ρ^B are the same, and by (9.1.1) we have S(ρ^A) = S(ρ^B).
□
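Property 4) and the Schmidt decomposition can be checked numerically: for a pure state with coefficient matrix A = [a_{ij}] as in (9.2.3), the marginal states are AA† and AᵀĀ, and both have spectrum {s_i²} given by the singular values s_i of A (all names below are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 5
A = rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))
A /= np.linalg.norm(A)                 # normalize: sum_ij |a_ij|^2 = 1

s = np.linalg.svd(A, compute_uv=False) # Schmidt (singular) values of A
rhoA = A @ A.conj().T                  # Tr_B |psi><psi|
rhoB = A.T @ A.conj()                  # Tr_A |psi><psi|

def S(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

SA, SB = S(rhoA), S(rhoB)              # equal, by Property 4)
H_schmidt = float(-np.sum(s**2 * np.log2(s**2)))
```

Both marginal entropies coincide with the Shannon entropy of the squared Schmidt coefficients.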
Property 5) Let ρ_1, ρ_2, . . . , ρ_n be states with mutually orthogonal supports and let p = (p_1, p_2, . . . , p_n) be a probability distribution. Then

S(Σ_i p_i ρ_i) = H(p) + Σ_i p_i S(ρ_i),

where H(p) = − Σ_i p_i log p_i.

Proof Let λ_i^j and |e_i^j⟩ be the eigenvalues and corresponding eigenvectors of ρ_i. Then Σ_i p_i ρ_i has eigenvalues p_i λ_i^j with respective eigenvectors |e_i^j⟩. Thus,

S(Σ_i p_i ρ_i) = − Σ_{i,j} p_i λ_i^j log(p_i λ_i^j)
             = − Σ_i p_i log p_i − Σ_i p_i Σ_j λ_i^j log λ_i^j
             = H(p) + Σ_i p_i S(ρ_i).
□

An immediate consequence of Property 5) is the following.

Corollary 9.2.5 (Joint entropy theorem) Let p = (p_1, p_2, . . . , p_n) be a probability distribution, {|i⟩, i = 1, 2, . . . , n} an orthonormal set of states in H_A and {ρ_i, i = 1, 2, . . . , n} a set of density operators in H_B. Then

S(Σ_i p_i |i⟩⟨i| ⊗ ρ_i) = H(p) + Σ_i p_i S(ρ_i).

Property 6) The following theorem shows that the correspondence ρ → S(ρ) is continuous. For any matrix ρ, by |ρ| we mean the positive square root of ρ†ρ, and for two positive semidefinite matrices ρ and σ we define the trace distance between them as Tr |ρ − σ|. Now we are ready to state Fannes' inequality.

Theorem 9.2.6 (Fannes' inequality) Suppose ρ and σ are density matrices such that the trace distance between them satisfies Tr |ρ − σ| < 1/e. Then |S(ρ) − S(σ)| ≤ Tr |ρ − σ| log d + η(Tr |ρ − σ|), where d is the dimension of the Hilbert space and η(x) = −x log x.

Proof Let r_1 ≥ · · · ≥ r_d and s_1 ≥ · · · ≥ s_d be the eigenvalues of ρ and σ respectively. By the spectral decomposition we can write ρ − σ = Q − R, where Q and R are positive operators with orthogonal supports, so Tr |ρ − σ| = Tr R + Tr Q. Defining V = R + ρ = Q + σ, we get Tr |ρ − σ| = Tr R + Tr Q = Tr(2V) − Tr ρ − Tr σ. Let t_1 ≥ · · · ≥ t_d be the eigenvalues of V. By the variational principle for the i-th eigenvalue it follows that t_i ≥ max(r_i, s_i). Hence 2t_i ≥ r_i + s_i + |r_i − s_i| and

Tr |ρ − σ| ≥ Σ_i |r_i − s_i|.        (9.2.7)

When |r − s| ≤ 1/e, it follows from the mean value theorem that |η(r) − η(s)| ≤ η(|r − s|). Since |r_i − s_i| ≤ 1/e for all i, it follows that

|S(ρ) − S(σ)| = |Σ_i (η(r_i) − η(s_i))| ≤ Σ_i η(|r_i − s_i|).

Setting ∆ = Σ_i |r_i − s_i| and observing that

η(|r_i − s_i|) = ∆ η(|r_i − s_i|/∆) − |r_i − s_i| log ∆,

we obtain

|S(ρ) − S(σ)| ≤ ∆ Σ_i η(|r_i − s_i|/∆) + η(∆) ≤ ∆ log d + η(∆).

By (9.2.7) and the monotonicity of η(·) on the interval [0, 1/e], we get

|S(ρ) − S(σ)| ≤ Tr |ρ − σ| log d + η(Tr |ρ − σ|).
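A quick numerical sanity check of Fannes' inequality on a pair of nearby random states (the state generator below is our own construction, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)

def rand_state(d):
    """A random density matrix G G^dagger / Tr(G G^dagger)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def S(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

d = 4
rho = rand_state(d)
sigma = 0.99 * rho + 0.01 * rand_state(d)      # a state close to rho

# Trace distance Tr|rho - sigma| = sum of |eigenvalues| of the difference.
T = float(np.abs(np.linalg.eigvalsh(rho - sigma)).sum())
eta = lambda x: -x * np.log2(x)
bound = T * np.log2(d) + eta(T)                # right side of Fannes
gap = abs(S(rho) - S(sigma))                   # left side
```

By construction T ≤ 0.02 < 1/e, so the hypothesis of the theorem is met.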

Property 7) For any two quantum states ρ, σ we define the relative entropy S(ρ||σ) of ρ with respect to σ by

S(ρ||σ) = Tr ρ log ρ − Tr ρ log σ   if supp ρ ⊂ supp σ,
        = ∞                          otherwise.        (9.2.8)

Theorem 9.2.9 (Klein's inequality) S(ρ||σ) ≥ 0, where equality holds if and only if ρ = σ.

Proof Let the spectral decompositions of the states ρ and σ be given by ρ = Σ_i p_i |i⟩⟨i|, σ = Σ_j q_j |j⟩⟨j|. Then we have

S(ρ||σ) = Σ_i p_i log p_i − Σ_i ⟨i|ρ log σ|i⟩
        = Σ_i p_i log p_i − Σ_{i,j} p_i |⟨i|j⟩|² log q_j.

We may assume S(ρ||σ) to be finite. Since − log x is a convex function in the interval [0, 1] and Σ_j |⟨i|j⟩|² = 1, we have

− Σ_j |⟨i|j⟩|² log q_j ≥ − log Σ_j |⟨i|j⟩|² q_j.

Putting r_i = Σ_j |⟨i|j⟩|² q_j and observing that Σ_i r_i = 1, we have

S(ρ||σ) ≥ − Σ_i p_i log (r_i/p_i) ≥ 0.
□
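Klein's inequality can be verified numerically for full-rank states, where S(ρ||σ) is finite; the matrix logarithm is taken via the spectral theorem (helper names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)

def rand_state(d):
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real            # full rank almost surely

def logm2(rho):
    """log2 of a strictly positive matrix via its spectral decomposition."""
    ev, U = np.linalg.eigh(rho)
    return U @ np.diag(np.log2(ev)) @ U.conj().T

def rel_entropy(rho, sigma):
    """S(rho||sigma) = Tr rho log rho - Tr rho log sigma."""
    return float(np.trace(rho @ (logm2(rho) - logm2(sigma))).real)

d = 3
rho, sigma = rand_state(d), rand_state(d)
S_rs = rel_entropy(rho, sigma)   # > 0, since rho != sigma
S_rr = rel_entropy(rho, rho)     # = 0, the equality case
```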

Property 8) Let ρAB be a state in HA ⊗HB with marginal states ρA and


ρB . We denote by S(A), S(B) and S(AB) the von Neumann entropy of
ρA , ρB and ρAB respectively. The quantum mutual information of the
systems A and B is defined as S(A : B) = S(A) + S(B) − S(AB).

Theorem 9.2.10 S(A : B) ≥ 0.

Proof Observe that

S(A) = − Tr ρ^A log ρ^A = − Tr ρ^{AB} log(ρ^A ⊗ I_B).

Substituting in the expression for S(A : B) we get

S(A : B) = − Tr ρ^{AB} (log(ρ^A ⊗ I_B) + log(I_A ⊗ ρ^B)) + Tr ρ^{AB} log ρ^{AB}
         = S(ρ^{AB} || ρ^A ⊗ ρ^B)
         ≥ 0.

Let ρAB be a state in HA ⊗ HB with marginal states ρA and ρB .


The conditional entropy of the state ρA given the state ρB is defined as
S(A | B) = S(AB) − S(B). Note that the state ρ^{AB} may be a pure state while the state ρ^B is mixed; for instance, if ρ^{AB} is a pure entangled state then S(AB) = 0 while S(B) > 0. So S(A | B) can be less than zero.
Property 9) Let A be a quantum system with Hilbert space H_A. By a projective measurement we mean a family of projection operators P_1, P_2, . . . , P_n in H_A satisfying Σ_{i=1}^n P_i = I. When such a measurement is made in a state ρ the outcome of the measurement is j with probability Tr ρP_j. According to the collapse postulate 1.3, if the outcome is j the state collapses to P_j ρ P_j / Tr ρP_j. Thus the post-measurement state, ignoring the individual outcome, is

Σ_j (Tr ρP_j) · (P_j ρ P_j / Tr ρP_j) = Σ_j P_j ρ P_j.

Theorem 9.2.11 Let ρ be the state of a quantum system, let P_1, . . . , P_n be a projective measurement and let ρ′ = Σ_j P_j ρ P_j. Then S(ρ′) ≥ S(ρ) and equality holds if and only if ρ′ = ρ.

Proof Since ρ′ commutes with each P_j, we have P_j (log ρ′) = (log ρ′) P_j and therefore

0 ≤ S(ρ||ρ′)
  = Tr ρ log ρ − Tr ρ log ρ′
  = Tr ρ log ρ − Tr (Σ_i P_i) ρ log ρ′
  = Tr ρ log ρ − Σ_j Tr P_j ρ (log ρ′) P_j
  = Tr ρ log ρ − Σ_j Tr P_j ρ P_j (log ρ′)
  = S(ρ′) − S(ρ).
□
By a generalized measurement we mean a set of operators L_1, . . . , L_n satisfying Σ_{i=1}^n L_i†L_i = I. If ρ is a state in which such a generalized measurement is made, the probability of the outcome i is Tr ρL_i†L_i and the post-measurement state is L_i ρ L_i† / Tr ρL_i†L_i. Thus the post-measurement state, ignoring the individual outcome, is

Σ_i (Tr ρL_i†L_i) · (L_i ρ L_i† / Tr ρL_i†L_i) = Σ_i L_i ρ L_i†.

Remark 9.2.12 A generalized measurement may decrease the entropy.

Example 9.2.13 Let L_1 = |0⟩⟨0| and L_2 = |0⟩⟨1|. Note that L_1†L_1 + L_2†L_2 = I. Let ρ = p|0⟩⟨0| + (1 − p)|1⟩⟨1|. Then

S(ρ) = −p log p − (1 − p) log(1 − p).

Let ρ be measured using the measurement operators L_1 and L_2. The resulting state is ρ′ = L_1ρL_1† + L_2ρL_2† = |0⟩⟨0|. This implies S(ρ′) = 0.
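Example 9.2.13 in code (the value p = 0.3 is chosen arbitrarily for illustration):

```python
import numpy as np

L1 = np.array([[1.0, 0.0], [0.0, 0.0]])   # |0><0|
L2 = np.array([[0.0, 1.0], [0.0, 0.0]])   # |0><1|

p = 0.3
rho = np.diag([p, 1 - p])
rho_post = L1 @ rho @ L1.T + L2 @ rho @ L2.T   # = |0><0|

def S(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

S_before = S(rho)        # -p log p - (1-p) log(1-p) > 0
S_after = S(rho_post)    # 0: the generalized measurement decreased entropy
```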
Property 10)

Theorem 9.2.14 Let ρAB be a state in HA ⊗ HB with marginal states


ρA and ρB . Then the following inequalities hold.
1) S(AB) ≤ S(A) + S(B),
2) S(AB) ≥ |S(A) − S(B)|.

The first inequality is known as the sub-additivity inequality for the


von Neumann entropy. The second is known as the triangle inequality
or the Araki-Lieb inequality.

Proof The first inequality follows immediately from Klein's inequality (Theorem 9.2.9) in the form S(ρ) ≤ − Tr ρ log σ. Let ρ = ρ^{AB} and σ = ρ^A ⊗ ρ^B. Then

− Tr(ρ log σ) = − Tr ρ^{AB} (log(ρ^A ⊗ I_B) + log(I_A ⊗ ρ^B))
             = − Tr(ρ^A log ρ^A) − Tr(ρ^B log ρ^B)
             = S(A) + S(B).

Therefore we have S(AB) ≤ S(A) + S(B). From Klein's theorem it follows that equality holds if and only if ρ^{AB} = ρ^A ⊗ ρ^B.
To prove the triangle inequality, we introduce a reference system R such that ρ^{ABR} is a pure state in H_A ⊗ H_B ⊗ H_R. Then by sub-additivity we have S(R) + S(A) ≥ S(AR). Since ρ^{ABR} is a pure state we have S(AR) = S(B) and S(R) = S(AB). Substituting we get S(AB) ≥ S(B) − S(A). By symmetry we get the second inequality.
□
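Both inequalities of Theorem 9.2.14 can be checked on a random mixed state of a 2 × 3 composite system (helper names and the state generator are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
dA, dB = 2, 3
D = dA * dB

G = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
rhoAB = G @ G.conj().T
rhoAB /= np.trace(rhoAB).real

r = rhoAB.reshape(dA, dB, dA, dB)
rhoA = np.einsum('ijkj->ik', r)            # Tr_B
rhoB = np.einsum('ijil->jl', r)            # Tr_A

def S(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

S_AB, S_A, S_B = S(rhoAB), S(rhoA), S(rhoB)
subadditivity = S_A + S_B - S_AB           # >= 0
araki_lieb = S_AB - abs(S_A - S_B)         # >= 0
```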
Exercise 9.2.15 Let ρ^{AB} = Σ_i λ_i |i⟩⟨i| be the spectral decomposition of ρ^{AB}. Show that S(AB) = S(B) − S(A) if and only if the operators ρ_i^A = Tr_B(|i⟩⟨i|) have a common eigenbasis and the operators ρ_i^B = Tr_A(|i⟩⟨i|) have orthogonal supports.

Property 11) S(ρ) is concave in ρ.

Theorem 9.2.16 Let ρ_1, ρ_2, . . . , ρ_n be states and let p = (p_1, p_2, . . . , p_n) be a probability distribution. Then

S(Σ_i p_i ρ_i) ≥ Σ_i p_i S(ρ_i).

Proof Let the ρ_i be states in H_A. Consider an auxiliary Hilbert space H_B with an orthonormal basis |i⟩ corresponding to the index i of the density operators ρ_i. Let a joint state on H_A ⊗ H_B be defined by

ρ^{AB} = Σ_i p_i ρ_i ⊗ |i⟩⟨i|.

Note that S(AB) = H(p) + Σ_i p_i S(ρ_i), by the joint entropy theorem (Corollary 9.2.5). Moreover,

ρ^A = Σ_i p_i ρ_i   ⇒   S(ρ^A) = S(Σ_i p_i ρ_i),
ρ^B = Σ_i p_i |i⟩⟨i|   ⇒   S(ρ^B) = H(p).

By subadditivity we have S(ρ^A) + S(ρ^B) ≥ S(ρ^{AB}). Substituting we get S(Σ_i p_i ρ_i) + H(p) ≥ H(p) + Σ_i p_i S(ρ_i).
□

Property 12)

Theorem 9.2.17 Σ_i p_i S(ρ_i) ≤ S(Σ_i p_i ρ_i) ≤ H(p) + Σ_i p_i S(ρ_i).

Proof The first inequality is Property 11). For the second, first consider the case when ρ_i = |ψ_i⟩⟨ψ_i| for all i. Let the ρ_i be states in H_A and let H_B be an auxiliary Hilbert space with an orthonormal basis |i⟩ corresponding to the index i of the probabilities p_i. Let ρ^{AB} = |AB⟩⟨AB| where |AB⟩ = Σ_i √p_i |ψ_i⟩|i⟩. In other words,

ρ^{AB} = Σ_{i,j} √(p_i p_j) |ψ_i⟩⟨ψ_j| ⊗ |i⟩⟨j|.

Since ρ^{AB} is a pure state we have S(A) = S(B) = S(Σ_i p_i |ψ_i⟩⟨ψ_i|). After performing a projective measurement on the state ρ^B in the |i⟩ basis, the state of the system is ρ^{B′} = Σ_i p_i |i⟩⟨i|. But projective measurements never decrease entropy (Theorem 9.2.11), so S(B) ≤ S(ρ^{B′}) = H(p); using the fact S(ρ_i) = 0 we get S(A) ≤ H(p) + Σ_i p_i S(ρ_i). Note that equality holds if and only if ρ^{B′} = ρ^B, and this occurs if and only if the |ψ_i⟩ are mutually orthogonal.

Now we can prove the mixed state case. Let ρ_i = Σ_j p_{ij} |e_j^i⟩⟨e_j^i| be a spectral decomposition of the state ρ_i. Let ρ = Σ_{i,j} p_i p_{ij} |e_j^i⟩⟨e_j^i|. Applying the result for the pure state case and observing that Σ_j p_{ij} = 1 for all i, we get

S(ρ) ≤ − Σ_{i,j} p_i p_{ij} log(p_i p_{ij})
     = − Σ_i p_i log p_i − Σ_i p_i Σ_j p_{ij} log p_{ij}
     = H(p) + Σ_i p_i S(ρ_i).
□

The sub-additivity and the triangle inequality for two quantum sys-
tems can be extended to three systems. This gives rise to a very impor-
tant and useful result, known as the strong sub-additivity. The proof
given here depends on a deep mathematical result known as Lieb’s the-
orem.
Let A, B be bounded operator variables on a Hilbert space H. Suppose the pair (A, B) varies in a convex set C. A map f : C → R is said to be jointly convex if

f(λA_1 + (1 − λ)A_2, λB_1 + (1 − λ)B_2) ≤ λ f(A_1, B_1) + (1 − λ) f(A_2, B_2)

for all 0 ≤ λ ≤ 1 and (A_i, B_i) ∈ C, i = 1, 2.


Now we are ready to state the next property.

Property 13)

Theorem 9.2.18 Relative entropy is jointly convex in its arguments.

Let H_1 and H_2 be two finite dimensional Hilbert spaces. Let α be a map from B(H_1) to B(H_2) which satisfies α(X†X) ≥ α(X)†α(X); in our case α will be a ∗-homomorphism. Let T_i, S_i, i ∈ {1, 2}, be positive operators on H_i; the index i indicates the Hilbert space H_i. To prove Theorem 9.2.18 we need the following lemma, also known as Lieb's inequality.

Lemma 9.2.19 If Tr XT_1 ≥ Tr α(X)T_2 and Tr XS_1 ≥ Tr α(X)S_2 for every nonnegative definite X, and T_i, i = 1, 2, are invertible, then

Tr α(X†) S_2^t α(X) T_2^{1−t} ≤ Tr X† S_1^t X T_1^{1−t}.        (9.2.20)

Observe that (9.2.20) is true when the parameter t is equal to 1 or 0. We need to show that the conclusion of (9.2.20) holds even when t is a real number in the range (0, 1); so Lieb's inequality is an interpolation inequality. To prove Lieb's inequality we need the following results.

Lemma 9.2.21 For 0 < t < 1 and x > 0,

x^t = (1/β(t, 1 − t)) ∫_0^∞ [λ^{t−1} − λ^t (λ + x)^{−1}] dλ.        (9.2.22)

Proof Since λ^{t−1} − λ^t(λ + x)^{−1} = λ^{t−1} x (λ + x)^{−1}, we perform the substitution 1 + λ/x = 1/u, so that λ = x(1 − u)/u, x(λ + x)^{−1} = u and dλ = −(x/u²) du. Then

(1/β(t, 1 − t)) ∫_0^∞ [λ^{t−1} − λ^t (λ + x)^{−1}] dλ
  = (1/β(t, 1 − t)) ∫_0^1 x^{t−1} ((1 − u)/u)^{t−1} u (x/u²) du
  = (x^t / β(t, 1 − t)) ∫_0^1 (1 − u)^{t−1} u^{−t} du
  = x^t.
□

Lemma 9.2.23 Let 0 < t < 1 and let A, B be two positive operators such that A ≤ B. Then A^t ≤ B^t.

Proof
A ≤ B ⇒ (λ + A)^{−1} ≥ (λ + B)^{−1}
     ⇒ λ^t (λ + A)^{−1} ≥ λ^t (λ + B)^{−1}
     ⇒ λ^{t−1} − λ^t (λ + A)^{−1} ≤ λ^{t−1} − λ^t (λ + B)^{−1}.

Thus by the spectral theorem and Lemma 9.2.21 we have A^t ≤ B^t.
□
Lemma 9.2.24 Let A = [A_{11} A_{12}; A_{21} A_{22}] be a strictly positive definite matrix where A_{11} and A_{22} are square matrices. Then A_{11} and A_{22} are also strictly positive definite and

(A^{−1})_{11} ≥ A_{11}^{−1}.

Proof Note that

A^{−1} = [ (A_{11} − A_{12}A_{22}^{−1}A_{21})^{−1} ,   −(A_{11} − A_{12}A_{22}^{−1}A_{21})^{−1} A_{12} A_{22}^{−1} ;
           −(A_{22} − A_{21}A_{11}^{−1}A_{12})^{−1} A_{21} A_{11}^{−1} ,   (A_{22} − A_{21}A_{11}^{−1}A_{12})^{−1} ].

Therefore (A^{−1})_{11} = (A_{11} − A_{12}A_{22}^{−1}A_{21})^{−1}. Since A_{12}A_{22}^{−1}A_{21} is a positive operator we have A_{11} − A_{12}A_{22}^{−1}A_{21} ≤ A_{11}, and hence (A^{−1})_{11} ≥ A_{11}^{−1}.
□

Lemma 9.2.25 Let X be a positive operator in a finite dimensional Hilbert space H_0 and let V be a contraction map on H_0. Then (V†XV)^t ≥ V†X^tV for 0 < t < 1.

Proof Observe that the lemma is true when V is unitary. Let

U = [ V , √(I − V V†) ; −√(I − V†V) , V† ].

Note that, since V is a contraction map, √(I − V V†) and √(I − V†V) are well defined and U is unitary.
Let P be the map P : H_0 ⊕ H_0 → H_0 which is projection onto the first coordinate. Then V = P U P|_{H_0}. By Lemma 9.2.24 we have

(λI_{H_0} + V†XV)^{−1} = (λI_{H_0} + P U† P X P U P|_{H_0})^{−1}
  ≤ P (λI + U† P X P U)^{−1} P|_{H_0}
  = P U† (λ^{−1} P^⊥ + P (λ + X)^{−1} P) U P|_{H_0}
  = λ^{−1} P U† (I − P) U P|_{H_0} + V† (λ + X)^{−1} V
  = λ^{−1} (I − V†V) + V† (λ + X)^{−1} V.

This implies

(1/β(t, 1 − t)) ∫_0^∞ [λ^{t−1} − λ^t (λI + V†XV)^{−1}] dλ
  ≥ (1/β(t, 1 − t)) ∫_0^∞ [λ^{t−1} − λ^t (λ^{−1}(I − V†V) + V†(λ + X)^{−1}V)] dλ.

Since λ^{t−1} − λ^{t−1}(I − V†V) = λ^{t−1} V†V, the right side equals

(1/β(t, 1 − t)) ∫_0^∞ V† [λ^{t−1} − λ^t (λ + X)^{−1}] V dλ = V† X^t V.

By applying Lemma 9.2.21 to the left side we get (V†XV)^t ≥ V†X^tV. This completes the proof.
□

Remark 9.2.26 Lemma 9.2.25 holds even when the contraction V is


from one Hilbert space H1 to another Hilbert space H2 and X is a
positive operator in H2 . In this case the operator U of the proof is from
H1 ⊕ H2 to H2 ⊕ H1 .

We look upon B(H_1) and B(H_2) as Hilbert spaces with the scalar product between two operators defined as ⟨X, Y⟩ = Tr X†Y. Define V : B(H_1) → B(H_2) by V(X T_1^{1/2}) = α(X) T_2^{1/2}.

Lemma 9.2.27 V is a contraction map.

Proof
||α(X) T_2^{1/2}||² = Tr T_2^{1/2} α(X)† α(X) T_2^{1/2}
  ≤ Tr α(X†X) T_2 ≤ Tr X†X T_1
  = Tr T_1^{1/2} X† X T_1^{1/2}
  = ||X T_1^{1/2}||².

Hence the assertion is true.
□
Assume that T_1 and T_2 are invertible and put ∆_t X = S_1^t X T_1^{−t} and D_t Y = S_2^t Y T_2^{−t}. Note that ∆_t ∆_s = ∆_{t+s} and D_t D_s = D_{s+t} for s, t ≥ 0. Furthermore,

⟨X T_1^{1/2} | ∆_t | X T_1^{1/2}⟩ = Tr T_1^{1/2} X† S_1^t X T_1^{(1/2)−t}
  = Tr (X† S_1^t X) T_1^{1−t}
  ≥ 0,

and similarly ⟨Y T_2^{1/2} | D_t | Y T_2^{1/2}⟩ ≥ 0.
Hence ∆_t and D_t are positive operator semigroups and in particular ∆_t = ∆_1^t and D_t = D_1^t.
Lemma 9.2.28 ⟨X T_1^{1/2} | ∆_1 | X T_1^{1/2}⟩ ≥ ⟨X T_1^{1/2} | V† D_1 V | X T_1^{1/2}⟩.

Proof
⟨X T_1^{1/2} | ∆_1 | X T_1^{1/2}⟩ = Tr T_1^{1/2} X† S_1 X T_1^{−1/2}
  = Tr X† S_1 X
  = Tr X X† S_1
  ≥ Tr α(X X†) S_2
  ≥ Tr α(X) α(X†) S_2
  = Tr T_2^{1/2} α(X)† S_2 α(X) T_2^{−1/2}
  = ⟨X T_1^{1/2} | V† D_1 V | X T_1^{1/2}⟩.
□

Proof of Lemma 9.2.19 From Lemmas 9.2.28, 9.2.23 and 9.2.25 it follows that

∆_1 ≥ V† D_1 V
⇒ ∆_t = ∆_1^t ≥ (V† D_1 V)^t
       ≥ V† D_1^t V   (true since V is a contraction map)
       = V† D_t V.

By expanding one can verify that the inequality

⟨X T_1^{1/2} | ∆_t | X T_1^{1/2}⟩ ≥ ⟨α(X) T_2^{1/2} | D_t | α(X) T_2^{1/2}⟩

is the same as (9.2.20).
□
Proof of Theorem 9.2.18 Let H_2 = H ⊕ H and α(X) = [X 0; 0 X]. For 0 < λ < 1 define S_1, T_1, S_2 and T_2 as follows: S_1 = λρ_1 + (1 − λ)ρ_2, T_1 = λσ_1 + (1 − λ)σ_2,

S_2 = [λρ_1 0; 0 (1 − λ)ρ_2]   and   T_2 = [λσ_1 0; 0 (1 − λ)σ_2],

where σ_1 and σ_2 are invertible. Then

Tr α(X) S_2 = λ Tr ρ_1 X + (1 − λ) Tr ρ_2 X = Tr S_1 X,   and
Tr α(X) T_2 = λ Tr σ_1 X + (1 − λ) Tr σ_2 X = Tr T_1 X.

Applying (9.2.20) with X = I we get Tr S_2^t T_2^{1−t} ≤ Tr S_1^t T_1^{1−t}. Since Tr S_2 = Tr S_1 = 1, dividing by 1 − t and letting t → 1 gives

lim_{t→1} (1 − Tr S_2^t T_2^{1−t})/(1 − t) ≥ lim_{t→1} (1 − Tr S_1^t T_1^{1−t})/(1 − t),

i.e.

(d/dt) Tr S_2^t T_2^{1−t} |_{t=1} ≥ (d/dt) Tr S_1^t T_1^{1−t} |_{t=1},
Tr S_2 log S_2 − Tr S_2 log T_2 ≥ Tr S_1 log S_1 − Tr S_1 log T_1.

That is,

Tr[λρ_1 log λρ_1 + (1−λ)ρ_2 log (1−λ)ρ_2 − λρ_1 log λσ_1 − (1−λ)ρ_2 log (1−λ)σ_2]
  ≥ S(λρ_1 + (1−λ)ρ_2 || λσ_1 + (1−λ)σ_2).

Since the log λ and log(1 − λ) contributions on the left cancel, this reads λS(ρ_1||σ_1) + (1−λ)S(ρ_2||σ_2) ≥ S(λρ_1 + (1−λ)ρ_2 || λσ_1 + (1−λ)σ_2).

Property 14) Let ρAB be a state in HA ⊗ HB with marginal states


ρA and ρB . Then the conditional entropy is concave in the state ρAB of
HA ⊗ HB .

Proof Let d be the dimension of H_A. Then

S(ρ^{AB} || (I/d) ⊗ ρ^B) = −S(AB) − Tr ρ^{AB} log((I/d) ⊗ ρ^B)
                        = −S(AB) − Tr(ρ^B log ρ^B) + log d
                        = −S(A | B) + log d.

Since both arguments on the left are affine in ρ^{AB}, concavity of S(A | B) follows from the joint convexity of the relative entropy.
□

Property 15)

Theorem 9.2.29 (Strong subadditivity) For any three quantum sys-


tems, A, B, C, the following inequalities hold.

1) S(A) + S(B) ≤ S(AC) + S(BC).


2) S(ABC) + S(B) ≤ S(AB) + S(BC).

Proof To prove 1), we define a function T(ρ^{ABC}) as follows:

T(ρ^{ABC}) = S(A) + S(B) − S(AC) − S(BC) = −S(C | A) − S(C | B).

Let ρ^{ABC} = Σ_i p_i |i⟩⟨i| be a spectral decomposition of ρ^{ABC}. From the concavity of the conditional entropy we see that T(ρ^{ABC}) is a convex function of ρ^{ABC}. From the convexity of T we have

T(ρ^{ABC}) ≤ Σ_i p_i T(|i⟩⟨i|).

But T(|i⟩⟨i|) = 0, as for a pure state S(AC) = S(B) and S(BC) = S(A). This implies T(ρ^{ABC}) ≤ 0. Thus

S(A) + S(B) − S(AC) − S(BC) ≤ 0.


To prove 2) we introduce an auxiliary system R purifying the system


ABC so that the joint state ρABCR is pure. Then using 1) we get

S(R) + S(B) ≤ S(RC) + S(BC).

Since ABCR is a pure state, we have, S(R) = S(ABC) and S(RC) =


S(AB). Substituting we get

S(ABC) + S(B) ≤ S(AB) + S(BC).

Property 16) S(A : BC) ≥ S(A : B).

Proof Using the second part of Property 15) we have

S(A : BC) − S(A : B) = S(A) + S(BC) − S(ABC) − [S(A) + S(B) − S(AB)]
                     = S(BC) + S(AB) − S(ABC) − S(B)
                     ≥ 0.

Let H be the Hilbert space of a finite level quantum system. Recall that by a generalized measurement we mean a finite collection of operators {L_1, L_2, . . . , L_k} satisfying the relation Σ_i L_i†L_i = I. The set {1, 2, . . . , k} is the collection of the possible outcomes of the measurement and if the state of the system at the time of measurement is ρ then the probability p_i of the outcome i is given by p_i = Tr L_i ρ L_i† = Tr ρ L_i†L_i. If the outcome of the measurement is i, then the state of the system collapses to

ρ_i = L_i ρ L_i† / p_i.

Thus the post-measurement state is expected to be Σ_i p_i ρ_i = Σ_i L_i ρ L_i†. The map E defined by

E(ρ) = Σ_i L_i ρ L_i†        (9.2.30)

on the set of states is called a quantum operation.


If we choose and fix an orthonormal basis in H and express the operators L_i as matrices in this basis, the condition Σ_i L_i†L_i = I can be interpreted as the property that the columns of the kd × d matrix

[L_1; L_2; . . . ; L_k]   (the blocks L_i stacked one below another)

constitute an orthonormal set of vectors. The length of each column vector is kd, where d is the dimension of the Hilbert space H. Extend this set of orthonormal vectors into an orthonormal basis for H ⊗ C^k and construct a unitary matrix U of order kd × kd whose first block column is this stack, i.e.

U = [L_1 · · · ; L_2 · · · ; . . . ; L_k · · · ],

viewed as a block matrix in which each block is a d × d matrix. Define |0⟩ = (1, 0, . . . , 0)ᵀ in C^k, so that for any state ρ in H,

M = ρ ⊗ |0⟩⟨0| = [ρ 0 · · · 0; 0 0 · · · 0; . . . ; 0 0 · · · 0]

as a state in H ⊗ C^k. Then U M U† is the block matrix whose (i, j)-th block is L_i ρ L_j†:

U M U† = [L_i ρ L_j†]_{1 ≤ i, j ≤ k}.

Thus we have Tr_{C^k} U(ρ ⊗ |0⟩⟨0|)U† = Σ_{i=1}^k L_i ρ L_i† = E(ρ), where E(ρ) is defined as in (9.2.30). We summarize our discussion in the form of a lemma.

Lemma 9.2.31 Let E be a quantum operation on the states of a quantum system with Hilbert space H determined by a generalized measurement {L_i, 1 ≤ i ≤ k}. Then there exists a pure state |0⟩ of an auxiliary system with a Hilbert space K of dimension k and a unitary operator U on H ⊗ K satisfying the property E(ρ) = Tr_K U(ρ ⊗ |0⟩⟨0|)U† for every state ρ in H.
Property 17) Let AB be a composite system with Hilbert space H_{AB} = H_A ⊗ H_B and let E be a quantum operation on B determined by the generalized measurement {L_i, 1 ≤ i ≤ k} in H_B. Then id ⊗ E is a quantum operation on AB determined by the generalized measurement {I_A ⊗ L_i, 1 ≤ i ≤ k}. If ρ^{AB} is any state in H_{AB} = H_A ⊗ H_B and ρ^{A′B′} = (id ⊗ E)(ρ^{AB}), then S(A′ : B′) ≤ S(A : B).

Proof Following Lemma 9.2.31, we construct an auxiliary system C with Hilbert space H_C, a pure state |0⟩ in H_C and a unitary operator U on H_B ⊗ H_C so that

E(ρ^B) = Σ_i L_i ρ^B L_i† = Tr_C U(ρ^B ⊗ |0⟩⟨0|)U†.

Define Ũ = I_A ⊗ U. Let ρ^{ABC} = ρ^{AB} ⊗ |0⟩⟨0| and ρ^{A′B′C′} = Ũ ρ^{ABC} Ũ†. Then for the marginal states we have ρ^{A′} = ρ^A, ρ^{B′C′} = U ρ^{BC} U† and therefore S(A′) = S(A), S(B′C′) = S(BC). Thus using Property 16), we get

S(A : B) = S(A) + S(B) − S(AB)
         = S(A) + S(BC) − S(ABC)
         = S(A′) + S(B′C′) − S(A′B′C′)
         = S(A′ : B′C′)
         ≥ S(A′ : B′).
□
Property 18) Holevo Bound
Consider an information source in which messages x from a finite set X come with probability p(x). We denote this probability distribution by p. The information obtained from such a source is given by

H(X) = − Σ_{x∈X} p(x) log_2 p(x).

Now suppose the message x is encoded as a quantum state ρ_x in a Hilbert space H. In order to decode the message we make a generalized measurement {L_y, y ∈ Y} where Σ_{y∈Y} L_y†L_y = I. Given that the message x

came from the source, or equivalently, that the state of the quantum system is the encoded state ρ_x, the probability of the measurement value y is given by Pr(y | x) = Tr L_y ρ_x L_y†. Thus the joint probability Pr(x, y),
that x is the message and y is the measurement outcome, is given by

Pr(x, y) = p(x) Pr(y | x) = p(x) Tr ρx L†y Ly .

Thus we obtain a classical joint system XY described by this probability


distribution in the space X × Y . The information gained from the gen-
eralized measurement about the source X is measured by the quantity
H(X) + H(Y ) − H(XY ) (see [9]). Our next result puts an upper bound
on the information thus gained.

Theorem 9.2.32 (Holevo, 1973)

H(X) + H(Y) − H(XY) ≤ S(Σ_x p(x) ρ_x) − Σ_x p(x) S(ρ_x).

Proof Let {|x⟩, x ∈ X}, {|y⟩, y ∈ Y} be orthonormal bases in Hilbert spaces H_X, H_Y of dimensions #X, #Y respectively. Denote by H_Z the Hilbert space of the encoded states {ρ_x, x ∈ X}. Consider the Hilbert space H_{XZY} = H_X ⊗ H_Z ⊗ H_Y. Choose and fix an element 0 in Y and define the joint state

ρ^{XZY} = Σ_x p(x) |x⟩⟨x| ⊗ ρ_x ⊗ |0⟩⟨0|.

In the Hilbert space H_{ZY} consider the generalized measurement determined by {√E_y ⊗ U_y, y ∈ Y}, where E_y = L_y†L_y and U_y is any unitary operator in H_Y satisfying U_y|0⟩ = |y⟩. Such a measurement gives an operation E on the states of the system ZY and the operation id ⊗ E satisfies

(id ⊗ E)(ρ^{XZY}) = Σ_{x∈X, y∈Y} p(x) |x⟩⟨x| ⊗ √E_y ρ_x √E_y ⊗ |y⟩⟨y| = ρ^{X′Z′Y′},

say. By Property 17) we have S(X : Z) = S(X : ZY) ≥ S(X′ : Z′Y′). By Property 16),

S(X : Z) ≥ S(X′ : Y′).        (9.2.33)



Since ρ^{XZ} = Σ_x p(x) |x⟩⟨x| ⊗ ρ_x, we have from the joint entropy theorem S(XZ) = H(p) + Σ_x p(x)S(ρ_x). Furthermore,

ρ^X = Σ_x p(x) |x⟩⟨x|,   S(X) = H(p) = H(X),
ρ^Z = Σ_x p(x) ρ_x,      S(Z) = S(ρ^Z),        (9.2.34)
S(X : Z) = S(Σ_x p(x) ρ_x) − Σ_x p(x) S(ρ_x).

On the other hand,

ρ^{X′Z′Y′} = Σ_{x,y} p(x) |x⟩⟨x| ⊗ √E_y ρ_x √E_y ⊗ |y⟩⟨y|,
ρ^{X′} = Σ_x p(x) |x⟩⟨x|,
ρ^{Y′} = Σ_{x,y} p(x) (Tr ρ_x E_y) |y⟩⟨y|,
ρ^{X′Y′} = Σ_{x,y} p(x) (Tr ρ_x E_y) |x⟩⟨x| ⊗ |y⟩⟨y|.

Thus,

S(X′ : Y′) = H(X) + H(Y) − H(XY).        (9.2.35)

Combining (9.2.33), (9.2.34) and (9.2.35) we get the required result.
□
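The Holevo bound in a concrete instance: encode a uniformly distributed bit as |0⟩ or |+⟩ (an ensemble chosen purely for illustration) and measure in the computational basis; the information gained, H(X) + H(Y) − H(XY), stays below χ = S(Σ p(x)ρ_x) − Σ p(x)S(ρ_x):

```python
import numpy as np

def S(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

p = np.array([0.5, 0.5])
psi0 = np.array([1.0, 0.0])                      # |0>
psip = np.array([1.0, 1.0]) / np.sqrt(2)         # |+>
rhos = [np.outer(psi0, psi0), np.outer(psip, psip)]

rho_bar = p[0] * rhos[0] + p[1] * rhos[1]
chi = S(rho_bar) - sum(pi * S(r) for pi, r in zip(p, rhos))

# Projective measurement {|0><0|, |1><1|}: Pr(y|x) = <y|rho_x|y>.
Pyx = np.array([[r[0, 0], r[1, 1]] for r in rhos])   # rows x, columns y
Pxy = p[:, None] * Pyx                               # joint Pr(x, y)

def H(q):
    q = q[q > 0]                                     # 0 log 0 = 0
    return float(-np.sum(q * np.log2(q)))

info_gain = H(p) + H(Pxy.sum(axis=0)) - H(Pxy.ravel())
```

Here χ ≈ 0.6 bits while the computational-basis measurement extracts only about 0.31 bits.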
Property 19) Schumacher's theorem
Let p be a probability distribution on a finite set X. For ε > 0 define

ν(p, ε) = min{#E | E ⊂ X, Pr(E; p) ≥ 1 − ε}.

It is quite possible that #X is large in comparison with ν(p, ε). In other words, by omitting a set of probability at most ε we may have most of the statistical information packed in a set E of size much smaller than #X. In the context of information theory it is natural to consider the ratio log_2 ν(p, ε)/log_2 #X as the information content of p up to a negligible set of probability at most ε. If we now replace the probability space (X, p) by its n-fold cartesian product (X^n, p^{⊗n}) (n i.i.d. copies of (X, p)) and allow n to increase to infinity, then an application of the law of large numbers leads to the following result:

lim_{n→∞} log ν(p^{⊗n}, ε) / (n log #X) = H(p) / log #X.

Or equivalently,

lim_{n→∞} (1/n) log ν(p^{⊗n}, ε) = H(p)   for all ε > 0,        (9.2.36)

where H(p) is the Shannon entropy of p. This is a special case of the Shannon–McMillan theorem in classical information theory. Our next result is a quantum analogue of (9.2.36), which also implies (9.2.36). Let (H, ρ) be a quantum probability space where H is a finite dimensional Hilbert space and ρ is a state. For any projection operator E on H denote by dim E the dimension of the range of E. For any ε > 0 define

ν(ρ, ε) = min{dim E | E is a projection in H, Tr ρE ≥ 1 − ε}.

Theorem 9.2.37 (Schumacher) For any ε > 0,

lim_{n→∞} (1/n) log ν(ρ^{⊗n}, ε) = S(ρ),        (9.2.38)

where S(ρ) is the von Neumann entropy of ρ.

Proof By the spectral theorem ρ can be expressed as

ρ = Σ_x p(x) |x⟩⟨x|

where x varies in a finite set X of labels, p = {p(x), x ∈ X} is a probability distribution with p(x) > 0 for every x and {|x⟩, x ∈ X} is an orthonormal set in H. Then

ρ^{⊗n} = Σ_{x=(x_1,x_2,...,x_n)} p(x_1)p(x_2) . . . p(x_n) |x⟩⟨x|

where the x_i vary in X and |x⟩ denotes the product vector |x_1⟩|x_2⟩ . . . |x_n⟩. Write p_n(x) = p(x_1) . . . p(x_n) and observe that p^{⊗n} = {p_n(x), x ∈ X^n} is the probability distribution of n i.i.d. copies of p. We have S(ρ) = − Σ_x p(x) log p(x) = H(p). From the strong law of large numbers for i.i.d. random variables it follows that

lim_{n→∞} − (1/n) log p(x_1)p(x_2) . . . p(x_n) = lim_{n→∞} − (1/n) Σ_{i=1}^n log p(x_i) = S(ρ)

in the sense of almost sure convergence in the probability space (X^∞, p^{⊗∞}). This suggests that, in the search for a small set of high probability, we consider the set

T(n, ε) = {x : |−(1/n) log p(x_1)p(x_2) . . . p(x_n) − S(ρ)| ≤ ε}.        (9.2.39)

Any element of T(n, ε) is called an ε-typical sequence of length n. It is a consequence of the large deviation principle that there exist constants A > 0, 0 < c < 1 such that

Pr(T(n, ε)) ≥ 1 − Ac^n,        (9.2.40)

Pr denoting probability but according to the distribution p^{⊗n}. This says that, but for a set of sequences of total probability < Ac^n, every sequence is ε-typical. It follows from (9.2.39) that for any ε-typical sequence x,

2^{−n(S(ρ)+ε)} ≤ p_n(x) ≤ 2^{−n(S(ρ)−ε)}.        (9.2.41)

Define the projection

E(n, ε) = Σ_{x∈T(n,ε)} |x⟩⟨x|        (9.2.42)

and note that dim E(n, ε) = #T(n, ε). Summing over x ∈ T(n, ε) in (9.2.41) we conclude that

2^{−n(S(ρ)+ε)} dim E(n, ε) ≤ Pr(T(n, ε)) ≤ 2^{−n(S(ρ)−ε)} dim E(n, ε)

and therefore by (9.2.40) and the fact that probabilities never exceed 1, we get

2^{n(S(ρ)−ε)} (1 − Ac^n) ≤ dim E(n, ε) ≤ 2^{n(S(ρ)+ε)}

for all ε > 0, n = 1, 2, . . .. In particular

(1/n) log dim E(n, ε) ≤ S(ρ) + ε.

Fix ε and let δ > 0 be arbitrary. Choose n_0 so that Ac^{n_0} < δ. Note that Tr ρ^{⊗n} E(n, ε) = Pr(T(n, ε)) ≥ 1 − δ for n ≥ n_0. By the definition of ν(ρ^{⊗n}, δ) we have

(1/n) log ν(ρ^{⊗n}, δ) ≤ (1/n) log dim E(n, ε) ≤ S(ρ) + ε,   for n ≥ n_0.
Letting n → ∞ we get

lim sup_{n→∞} (1/n) log ν(ρ^{⊗n}, δ) ≤ S(ρ) + ε.

Since ε is arbitrary we get

lim sup_{n→∞} (1/n) log ν(ρ^{⊗n}, δ) ≤ S(ρ).

Now we shall arrive at a contradiction by assuming that

lim inf_{n→∞} (1/n) log ν(ρ^{⊗n}, δ) < S(ρ).
Under such a hypothesis there would exist an η > 0 such that

(1/n) log ν(ρ^{⊗n}, δ) ≤ S(ρ) − η

for infinitely many n, say n = n_1, n_2, . . . where n_1 < n_2 < · · · . In such a case there exists a projection F_{n_j} in H^{⊗n_j} such that

dim F_{n_j} ≤ 2^{n_j(S(ρ)−η)},   Tr ρ^{⊗n_j} F_{n_j} ≥ 1 − δ        (9.2.43)

for j = 1, 2, . . .. Choosing ε < η and fixing it we have

1 − δ ≤ Tr ρ^{⊗n_j} F_{n_j}
      = Tr ρ^{⊗n_j} E(n_j, ε) F_{n_j} + Tr ρ^{⊗n_j} (I − E(n_j, ε)) F_{n_j}.        (9.2.44)

From (9.2.40) and the fact that ρ^{⊗n} and E(n, ε) commute with each other we have

Tr ρ^{⊗n_j} (I − E(n_j, ε)) F_{n_j} ≤ Tr ρ^{⊗n_j} (I − E(n_j, ε)) = 1 − Pr(T(n_j, ε)) < Ac^{n_j}.        (9.2.45)

Furthermore from (9.2.41) we have

ρ^{⊗n_j} E(n_j, ε) = Σ_{x∈T(n_j,ε)} p_{n_j}(x) |x⟩⟨x| ≤ 2^{−n_j(S(ρ)−ε)} I.

Thus by (9.2.43) we get


Tr ρ⊗nj E(nj , ²)Fnj ≤ 2−nj (S(ρ)−²)) dim Fnj
≤ 2−nj (S(ρ)−²))+nj (S(ρ)−η)) (9.2.46)
= 2−nj (η−²) .
Now combining (9.2.44), (9.2.45) and (9.2.46) we get
1 − δ ≤ 2−nj (η−²) + Acnj ,
where the right side tends to 0 as j → ∞, a contradiction.
¤
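The counting behind (9.2.36) — and hence, for a ρ diagonal in the |x⟩ basis, behind Schumacher's theorem — can be seen numerically for a binary source (parameters chosen for illustration): the ε-typical set carries almost all the probability yet contains only about 2^{nH(p)} of the 2^n sequences:

```python
from math import comb, log2

p = 0.9                        # Pr(symbol 'a'); Pr('b') = 1 - p
Hp = -(p * log2(p) + (1 - p) * log2(1 - p))   # Shannon entropy, ~0.469 bits
n, eps = 500, 0.1

typ_count, typ_prob = 0, 0.0
for k in range(n + 1):         # k = number of 'b' symbols in the sequence
    logpx = (n - k) * log2(p) + k * log2(1 - p)
    if abs(-logpx / n - Hp) <= eps:            # eps-typical sequences
        typ_count += comb(n, k)
        typ_prob += comb(n, k) * p**(n - k) * (1 - p)**k

frac_of_all = typ_count / 2**n                 # a vanishing fraction of X^n
```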
Property 20) Feinstein’s fundamental lemma
Consider a classical information channel C equipped with an in-
put alphabet A, an output alphabet B and a transition probability
{px (V ), x ∈ A, V ⊂ B}. We assume that both A and B are finite sets.
If a letter x ∈ A is transmitted through the channel C then any output
y ∈ B is possible and px (V ) denotes the probability that the output
letter belongs to V under the condition that x is transmitted. For such
a channel we define a code of size N and error probability ≤ ε to be a set C = {c_1, c_2, . . . , c_N} ⊂ A together with a family {V_1, V_2, . . . , V_N} of disjoint subsets of B satisfying the condition p_{c_i}(V_i) ≥ 1 − ε for all i = 1, 2, . . . , N. Let

ν(C, ε) = max{N | there exists a code of size N and error probability ≤ ε}.
Our aim is to estimate ν(C, ε) in terms of information-theoretic parameters concerning the conditional distributions p_x(·), x ∈ A, denoted by p_x. To this end consider an input probability distribution p(x), x ∈ A, denoted by p, and define the joint input–output distribution P such that Pr(x, y; P) = p(x)p_x({y}). From now onwards we write Pr(x, y) instead of Pr(x, y; P). Denote by H_p(A : B) the mutual information between the input and the output according to the joint distribution P. Put
C = sup_p H_p(A : B)        (9.2.47)

where the supremum is taken over all input distributions p. For a fixed input distribution p, put

σ_p² = Σ_{x∈A, y∈B} Pr(x, y) { log [Pr(x, y)/(p(x)q(y))] − H_p(A : B) }²        (9.2.48)

where q is the B-marginal distribution determined by P. Thus q(y) = Σ_x Pr(x, y). With these notations we have the following lemma.

Lemma 9.2.49 Let η > 0, δ > 0 be positive constants and let p be any input distribution on A. Then there exists a code of size N and error probability ≤ η where

N ≥ (η − σ_p²/δ²) 2^{H_p(A:B)−δ}.

Proof Put R = H_p(A : B). Define the random variable ξ on the
probability space (A × B, P) by

    ξ(x, y) = log [Pr(x, y)/(p(x)q(y))].

Then ξ has expectation R and variance σ_p² defined by (9.2.48). Let

    V = {(x, y) : |log [Pr(x, y)/(p(x)q(y))] − R| ≤ δ}.          (9.2.50)

Then by Chebyshev's inequality for the random variable ξ we have

    Pr(V; P) ≥ 1 − σ_p²/δ².                                      (9.2.51)
Define V_x = {y | (x, y) ∈ V}. Then (9.2.51) can be expressed as

    Σ_{x∈A} p(x) p_x(V_x) ≥ 1 − σ_p²/δ².                         (9.2.52)

This shows that for a p-large set of x's the conditional probabilities
p_x(V_x) must be large. When (x, y) ∈ V we have from (9.2.50)

    R − δ ≤ log [Pr(x, y)/(p(x)q(y))] ≤ R + δ,

or equivalently

    q(y) 2^{R−δ} ≤ p_x({y}) ≤ q(y) 2^{R+δ}.

Summing over y ∈ V_x we get

    q(V_x) 2^{R−δ} ≤ p_x(V_x) ≤ q(V_x) 2^{R+δ}.



In particular,

    q(V_x) ≤ p_x(V_x) 2^{−(R−δ)} ≤ 2^{−(R−δ)}.                   (9.2.53)

In other words the V_x's are q-small. Now choose x_1 in A such that p_{x_1}(V_{x_1}) ≥ 1 − η and set V_1 = V_{x_1}. Then choose x_2 such that p_{x_2}(V_{x_2} ∩ V_1′) > 1 − η, where the prime ′ denotes complement in B. Put V_2 = V_{x_2} ∩ V_1′. Continue this procedure till we have an x_N such that

    p_{x_N}(V_{x_N} ∩ V_1′ ∩ · · · ∩ V_{N−1}′) > 1 − η,

and for any x ∉ {x_1, x_2, . . . , x_N},

    p_x(V_x ∩ (∪_{j=1}^N V_j)′) ≤ 1 − η,

where V_N = V_{x_N} ∩ V_1′ ∩ · · · ∩ V_{N−1}′. By choice the sets V_1, V_2, . . . , V_N are disjoint, ∪_{i=1}^N V_i = ∪_{i=1}^N V_{x_i} and therefore

    p_x(V_x ∩ (∪_{j=1}^N V_j)′) ≤ 1 − η for all x ∈ A.           (9.2.54)

From (9.2.52), (9.2.53) and (9.2.54) we have

    1 − σ_p²/δ² ≤ Σ_x p(x) p_x(V_x)
                = Σ_x p(x) p_x(V_x ∩ (∪_{i=1}^N V_i)′) + Σ_x p(x) p_x(V_x ∩ (∪_{i=1}^N V_i))
                ≤ 1 − η + Σ_x p(x) p_x(V_x ∩ (∪_{i=1}^N V_i))
                ≤ 1 − η + q(∪_{i=1}^N V_i)
                ≤ 1 − η + Σ_{i=1}^N q(V_i)
                ≤ 1 − η + Σ_{i=1}^N q(V_{x_i})
                ≤ 1 − η + N 2^{−(R−δ)}.

Thus N ≥ (η − σ_p²/δ²) 2^{R−δ}. □
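The codeword selection in the proof is a greedy procedure, and it can be simulated on a toy channel. In the sketch below the channel, the choice of the sets V_x and the threshold are illustrative assumptions rather than the construction from the typical set (9.2.50): an input x is accepted as the next codeword whenever p_x(V_x minus the decoding sets already used) exceeds 1 − η.

```python
# Greedy code construction in the spirit of the proof, on an assumed
# 4-ary symmetric channel with small crossover probability.
A = list(range(4)); B = list(range(4))
flip = 0.02
chan = {x: {y: (1 - 3 * flip) if x == y else flip for y in B} for x in A}

def prob(x, V):
    return sum(chan[x][y] for y in V)

# Take V_x = {y : chan[x][y] is large}; in the proof V_x comes from the
# typical set (9.2.50), here we just keep the high-probability outputs.
Vx = {x: {y for y in B if chan[x][y] > 0.5} for x in A}

eta = 0.1
codewords, decoding_sets, used = [], [], set()
for x in A:
    Vi = Vx[x] - used                 # V_i = V_x minus earlier V_j's
    if prob(x, Vi) > 1 - eta:         # accept x as the next codeword
        codewords.append(x)
        decoding_sets.append(Vi)
        used |= Vi

print(len(codewords))   # here every input qualifies: a code of size 4
```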

Now we consider the n-fold product C^{(n)} of the channel C with input alphabet A^n, output alphabet B^n and transition probability {p_x^{(n)}(V), x ∈ A^n, V ⊂ B^n} where for x = (x_1, x_2, . . . , x_n), y = (y_1, y_2, . . . , y_n),

    p_x^{(n)}({y}) = Π_{i=1}^n p_{x_i}({y_i}).

We now choose and fix an input distribution p on A and define the product probability distribution P^{(n)} on A^n × B^n by

    P^{(n)}(x, y) = Π_{i=1}^n p(x_i) p_{x_i}({y_i}).

Then the A^n-marginal of P^{(n)} is given by

    p^{(n)}(x) = Π_{i=1}^n p(x_i)

and H_{p^{(n)}}(A^n : B^n) = n H_p(A : B), σ_{p^{(n)}}² = n σ_p², where σ_p² is given by (9.2.48). Choose η > 0, δ = nε and apply Lemma 9.2.49 to the product channel. Then it follows that there exists a code of size N and error probability ≤ η with

    N ≥ (η − nσ_p²/(n²ε²)) 2^{n(H_p(A:B) − ε)} = (η − σ_p²/(nε²)) 2^{n(H_p(A:B) − ε)}.

Thus

    (1/n) log ν(C^{(n)}, η) ≥ (1/n) log (η − σ_p²/(nε²)) + H_p(A : B) − ε.

In other words

    lim inf_{n→∞} (1/n) log ν(C^{(n)}, η) ≥ H_p(A : B) − ε.

Here the positive constant ε and the initial distribution p on the input alphabet A are arbitrary. Hence we conclude that

    lim inf_{n→∞} (1/n) log ν(C^{(n)}, η) ≥ C.

It has been shown by J. Wolfowitz ([16]) that

    lim sup_{n→∞} (1/n) log ν(C^{(n)}, η) ≤ C.

The proof of this assertion is long and delicate and we refer the reader
to [16]. We summarize our discussions in the form of a theorem.

Theorem 9.2.55 (Shannon-Wolfowitz) Let C be a channel with finite input and output alphabets A and B respectively and transition probability {p_x(V), x ∈ A, V ⊂ B}. Define the constant C by (9.2.47). Then

    lim_{n→∞} (1/n) log ν(C^{(n)}, η) = C for all 0 < η < 1.

Remark 9.2.56 The constant C deserves to be and is called the capacity of the discrete memoryless channel determined by the products of copies of C.
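Since the capacity (9.2.47) is a supremum over input distributions, for a channel with two input letters it can be approximated by a one-dimensional grid search over p = (t, 1 − t). The binary symmetric channel below is an assumed example whose capacity is known in closed form, 1 − H(flip), which the search should approach.

```python
import math

# Approximate C = sup_p H_p(A:B) for an assumed binary symmetric
# channel by a grid search over input distributions (t, 1 - t).
flip = 0.1
chan = {0: {0: 1 - flip, 1: flip}, 1: {0: flip, 1: 1 - flip}}

def mutual_information(t):
    """H_p(A:B) in bits for the input distribution p = (t, 1 - t)."""
    p = {0: t, 1: 1 - t}
    P = {(x, y): p[x] * chan[x][y] for x in (0, 1) for y in (0, 1)}
    q = {y: P[(0, y)] + P[(1, y)] for y in (0, 1)}
    return sum(P[k] * math.log2(P[k] / (p[k[0]] * q[k[1]]))
               for k in P if P[k] > 0)

C = max(mutual_information(i / 1000) for i in range(1, 1000))
print(C)   # close to 1 - H(0.1), attained at the uniform input
```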

A quantum information channel is characterized by an input Hilbert space H_A, an output Hilbert space H_B and a quantum operation E which maps states on H_A to states on H_B. We assume that H_A and H_B are finite dimensional. The operation E has the form

    E(ρ) = Σ_{i=1}^k L_i ρ L_i†                                   (9.2.57)

where L_1, . . . , L_k are operators from H_A to H_B obeying the condition Σ_i L_i† L_i = I_A. A message encoded as the state ρ on H_A is transmitted through the channel and received as a state E(ρ) in H_B, and the aim is to recover ρ as accurately as possible from E(ρ). Thus E plays the role of the transition probability in the classical channel. The recovery is implemented by a recovery operation which maps states on H_B to states on H_A. A quantum code C of error not exceeding ε can be defined as a subspace C ⊂ H_A with the property that there exists a recovery operation R of the form

    R(ρ′) = Σ_{j=1}^ℓ M_j ρ′ M_j† for any state ρ′ on H_B

where the following conditions hold:

1. M_1, . . . , M_ℓ are operators from H_B to H_A satisfying Σ_{j=1}^ℓ M_j† M_j = I_B;

2. for any unit vector ψ ∈ C, ⟨ψ| R ∘ E(|ψ⟩⟨ψ|) |ψ⟩ ≥ 1 − ε.

Now define

    ν(E, ε) = max{dim C | C is a quantum code of error not exceeding ε}.

We may call ν(E, ε) the maximal size possible for a quantum code of error not exceeding ε. As in the case of classical channels one would like to estimate ν(E, ε).
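The structure (9.2.57) can be illustrated with a one-qubit channel. The bit-flip channel below, written with bare 2 × 2 complex matrices, is an assumed example; the two checks are the completeness condition Σ_i L_i† L_i = I_A and the action of E on a state.

```python
import math

# Minimal 2x2 matrix helpers (nested lists), kept dependency-free.
def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dag(X):   # conjugate transpose
    return [[X[j][i].conjugate() for j in range(2)] for i in range(2)]

def add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

# Assumed example: a bit-flip channel with flip probability 0.1, given
# by the Kraus operators L_1 = sqrt(1-p) I and L_2 = sqrt(p) sigma_x.
pflip = 0.1
I2 = [[1, 0], [0, 1]]
sigma_x = [[0, 1], [1, 0]]
L = [[[math.sqrt(1 - pflip) * e for e in row] for row in I2],
     [[math.sqrt(pflip) * e for e in row] for row in sigma_x]]

# Completeness: sum_i L_i^dagger L_i should equal I_A.
S = [[0, 0], [0, 0]]
for Li in L:
    S = add(S, mul(dag(Li), Li))
print(S)     # the 2x2 identity, up to rounding

# E(rho) = sum_i L_i rho L_i^dagger applied to the state |0><0|.
rho = [[1, 0], [0, 0]]
Erho = [[0, 0], [0, 0]]
for Li in L:
    Erho = add(Erho, mul(mul(Li, rho), dag(Li)))
print(Erho)  # diag(0.9, 0.1) up to rounding: |0> flips with probability 0.1
```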
If n > 1 is any integer define the n-fold product E^{⊗n} of the operation E by

    E^{⊗n}(ρ) = Σ_{i_1, i_2, . . . , i_n} (L_{i_1} ⊗ L_{i_2} ⊗ · · · ⊗ L_{i_n}) ρ (L_{i_1}† ⊗ L_{i_2}† ⊗ · · · ⊗ L_{i_n}†)

for any state ρ on H_A^{⊗n}, where the L_i's are as in (9.2.57). It is an interesting problem to analyze the asymptotic behavior of the sequence {(1/n) log ν(E^{⊗n}, ε)} as n → ∞.
Bibliography

[1] J. Aczel and Z. Daroczy, On Measures of Information and Their Characterizations, Academic Pub., New York, 1975.

[2] A. Aho, J. Hopcroft and J. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley, Reading, Massachusetts, 1974.

[3] M. Artin, Algebra, Prentice Hall of India Pvt. Ltd., 1996.

[4] I. L. Chuang and M. A. Nielsen, Quantum Computation and Quantum Information, Cambridge University Press, 2000.

[5] T. H. Cormen, C. E. Leiserson and R. L. Rivest, Introduction to Algorithms, McGraw-Hill Higher Education, 1990.

[6] G. H. Hardy and E. M. Wright, Introduction to the Theory of Numbers, ELBS and Oxford University Press, 4th edition, 1959.

[7] W. C. Huffman and Vera Pless, Fundamentals of Error-correcting Codes, Cambridge University Press, Cambridge, 2003.

[8] N. Jacobson, Basic Algebra I, II, Freeman, San Francisco, 1974, 1980.

[9] A. I. Khinchin, Mathematical Foundations of Information Theory, New York, 1957.

[10] D. E. Knuth, Seminumerical Algorithms, volume 2 of The Art of Computer Programming, 3rd edition, Addison-Wesley, 1997.

[11] A. N. Kolmogorov, Grundbegriffe der Wahrscheinlichkeitsrechnung, 1933 (Foundations of the Theory of Probability, Chelsea, New York, 1950).

[12] D. C. Kozen, The Design and Analysis of Algorithms, Springer-Verlag, 1992.

[13] F. J. MacWilliams and N. J. A. Sloane, Theory of Error-correcting Codes, North Holland, Amsterdam, 1978.

[14] J. von Neumann, Mathematical Foundations of Quantum Mechanics (translated from German), Princeton University Press, 1955. Original in Collected Works, Vol 1, pp. 151-235, edited by A. H. Taub, Pergamon Press, 1961.

[15] K. R. Parthasarathy, Lectures on Error-correcting Codes, Indian Statistical Institute, New Delhi.

[16] J. Wolfowitz, Coding Theorems of Information Theory, Springer-Verlag, 3rd edition, 1978.
