Lectures On Quantum Computation, Quantum Error Correcting Codes and Information Theory
by
K. R. Parthasarathy
Indian Statistical Institute
(Delhi Centre)
Notes by
Amitava Bhattacharya
Tata Institute of Fundamental Research, Mumbai
Preface
K. R. Parthasarathy
Delhi
June 2001
Contents
1 Quantum Probability
1.1 Classical Versus Quantum Probability Theory
1.2 Three Distinguishing Features
1.3 Measurements: von Neumann’s Collapse Postulate
1.4 Dirac Notation
1.4.1 Qubits
5 Order Finding
5.1 The Order Finding Algorithm
Appendix 1: Classical reversible computation
Appendix 2: Efficient implementation of controlled $U^{2^j}$ operation
Appendix 3: Continued fraction algorithm
Appendix 4: Estimating $\varphi(r)/r$
6 Shor’s Algorithm
6.1 Factoring to Order Finding
Bibliography
Lecture 1
Quantum Probability
In the Mathematical Congress held at Berlin, Peter Shor presented a
new algorithm for factoring numbers on a quantum computer. In this
series of lectures, we shall study the areas of quantum computation (in-
cluding Shor’s algorithm), quantum error correcting codes and quantum
information theory.
Spaces
1.1 The sample space $\Omega$: This is a finite set, say $\{1, 2, \ldots, n\}$.
1.2 The state space $\mathcal{H}$: It is a complex Hilbert space of dimension $n$.
Events
1.3 The set of events $\mathcal{F}_\Omega$: This is the set of all subsets of $\Omega$. $\mathcal{F}_\Omega$ is a Boolean algebra with the union ($\cup$) operation for ‘or’ and the intersection ($\cap$) operation for ‘and’. In particular, we have
$$E \cap (F_1 \cup F_2) = (E \cap F_1) \cup (E \cap F_2).$$
1.4 The set of events $\mathcal{P}(\mathcal{H})$: This is the set of all orthogonal projections in $\mathcal{H}$. An element $E \in \mathcal{P}(\mathcal{H})$ is called an event. Here, instead of ‘$\cup$’ we have the max ($\vee$) operation, and instead of ‘$\cap$’ the min ($\wedge$) operation. Note, however, that $E \wedge (F_1 \vee F_2)$ is not always equal to $(E \wedge F_1) \vee (E \wedge F_2)$. (They are equal if $E, F_1, F_2$ commute with each other.)
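The failure of distributivity can be checked numerically on a single qubit. The sketch below is only an illustration: the helper names are hypothetical, and `meet` relies on the standard fact that $(EFE)^k$ converges to $E \wedge F$, with the join obtained from $E \vee F = I - (I-E) \wedge (I-F)$. It takes $E$ to be the projection on $(|0\rangle + |1\rangle)/\sqrt{2}$ and $F_1, F_2$ the projections on the two basis vectors.

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def complement(e):
    """Return I - E, the projection onto the orthogonal complement."""
    n = len(e)
    return [[(1.0 if i == j else 0.0) - e[i][j] for j in range(n)]
            for i in range(n)]

def meet(e, f, iterations=200):
    """Approximate the min E ∧ F as a high power of E F E (illustrative helper)."""
    step = matmul(matmul(e, f), e)
    result = step
    for _ in range(iterations):
        result = matmul(result, step)
    return result

def join(e, f):
    """The max E ∨ F, computed as I - (I - E) ∧ (I - F)."""
    return complement(meet(complement(e), complement(f)))

def close(a, b, tol=1e-9):
    """Entrywise comparison of two matrices."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

E = [[0.5, 0.5], [0.5, 0.5]]    # projection on (|0> + |1>)/sqrt(2)
F1 = [[1.0, 0.0], [0.0, 0.0]]   # projection on |0>
F2 = [[0.0, 0.0], [0.0, 1.0]]   # projection on |1>

lhs = meet(E, join(F1, F2))            # E ∧ (F1 ∨ F2), which equals E here
rhs = join(meet(E, F1), meet(E, F2))   # (E ∧ F1) ∨ (E ∧ F2), which equals 0 here
```

Here $F_1 \vee F_2 = I$, so the left side is $E$, while $E \wedge F_1 = E \wedge F_2 = 0$ makes the right side vanish; $E$ commutes with neither $F_1$ nor $F_2$.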
Similarly, we have
$$f^r = \sum_{\lambda \in \mathrm{Sp}(f)} \lambda^r\, 1_{f^{-1}(\{\lambda\})},$$
The $r$-th moment of $f$ is the expectation of $f^r$, that is
$$E_p f^r = \sum_{\omega \in \Omega} (f(\omega))^r p_\omega = \sum_{\lambda \in \mathrm{Sp}(f)} \lambda^r \Pr(f^{-1}(\lambda)),$$
and the characteristic function of $f$ is the expectation of the complex-valued random variable $e^{itf}$, that is,
$$E_p e^{itf} = \sum_{\lambda \in \mathrm{Sp}(f)} e^{it\lambda} \Pr(f^{-1}(\lambda)).$$
The variance of a real-valued random variable $f$ is
$$\mathrm{var}(f) \triangleq E_p (f - E_p f)^2 \geq 0.$$
Note that
$$\mathrm{var}(f) = E_p f^2 - (E_p f)^2;$$
also, $\mathrm{var}(f) = 0$ if and only if all the mass in the distribution of $f$ is concentrated at $E_p f$.

The map $X \mapsto E_\rho X$ has the following properties:
(1) It is linear;
(2) $E_\rho X^\dagger X \geq 0$, for all $X \in \mathcal{B}(\mathcal{H})$.
(3) $E_\rho I = 1$.
The $r$-th moment of $X$ is the expectation of $X^r$; if $X$ is real-valued, then using the spectral decomposition, we can write
$$E_\rho X^r = \sum_{\lambda \in \mathrm{Sp}(X)} \lambda^r\, \mathrm{Tr}\, \rho E_\lambda.$$
The characteristic function of the real-valued observable $X$ is the expectation of the observable $e^{itX}$. The variance of a (real-valued) observable $X$ is
$$\mathrm{var}(X) \triangleq \mathrm{Tr}\, \rho (X - \mathrm{Tr}\, \rho X)^2 = \mathrm{Tr}\, \rho X^2 - (\mathrm{Tr}\, \rho X)^2 \geq 0.$$
The variance of $X$ vanishes if and only if the distribution of $X$ is concentrated at the point $\mathrm{Tr}\, \rho X$. This is equivalent to the property that the operator range of $\rho$ is contained in the eigensubspace of $X$ with eigenvalue $\mathrm{Tr}\, \rho X$.
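As a small worked instance of the formulas $E_\rho X = \mathrm{Tr}\,\rho X$ and $\mathrm{var}(X) = \mathrm{Tr}\,\rho X^2 - (\mathrm{Tr}\,\rho X)^2$, the following sketch evaluates them for the observable $Z = \mathrm{diag}(1, -1)$ in two qubit states. The function and variable names are chosen here for illustration only.

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def expectation(rho, x):
    """E_rho X = Tr(rho X)."""
    return trace(matmul(rho, x))

def variance(rho, x):
    """var(X) = Tr(rho X^2) - (Tr(rho X))^2."""
    return expectation(rho, matmul(x, x)) - expectation(rho, x) ** 2

Z = [[1.0, 0.0], [0.0, -1.0]]
rho_plus = [[0.5, 0.5], [0.5, 0.5]]   # pure state (|0> + |1>)/sqrt(2)
rho_zero = [[1.0, 0.0], [0.0, 0.0]]   # pure state |0>, an eigenvector of Z
```

In the state `rho_zero` the range of $\rho$ lies in the eigensubspace of $Z$ with eigenvalue $1$, so the variance vanishes; in `rho_plus` the expectation is $0$ and the variance is $1$.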
Extreme points
1.9 The set of distributions: The set of all probability distributions on $\Omega$ is a compact convex set (Choquet simplex) with exactly $n$ extreme points, $\delta_j$ ($j = 1, 2, \ldots, n$), where $\delta_j$ is determined by
$$\delta_j(\{\omega\}) \triangleq \begin{cases} 1 & \text{if } \omega = j; \\ 0 & \text{otherwise.} \end{cases}$$
If $P = \delta_j$, then every random variable has a degenerate distribution under $P$: the distribution of the random variable $f$ is concentrated on the point $f(j)$.
1.10 The set of states: The set of all states in $\mathcal{H}$ is a convex set. Let $\rho$ be a state. Since $\rho$ is non-negative definite, its eigenvalues are non-negative reals, and we can write
$$\rho = \sum_{\lambda \in \mathrm{Sp}(\rho)} \lambda E_\lambda;$$
since $\mathrm{Tr}\, \rho = 1$, we have
$$\sum_{\lambda \in \mathrm{Sp}(\rho)} \lambda \times \dim(E_\lambda) = 1.$$
The projection $E_\lambda$ can, in turn, be written as a sum of one-dimensional projections:
$$E_\lambda = \sum_{i=1}^{\dim(E_\lambda)} E_{\lambda,i}.$$
Then,
$$\rho = \sum_{\lambda \in \mathrm{Sp}(\rho)} \sum_{i=1}^{\dim(E_\lambda)} \lambda E_{\lambda,i}.$$
and
The product
1.11 Product spaces: If there are two statistical systems described by classical probability spaces $(\Omega_1, p_1)$ and $(\Omega_2, p_2)$ respectively, then the probability space $(\Omega_1 \times \Omega_2, p_1 \times p_2)$ determined by
$$\Pr(\{(i, j)\}; p_1 \times p_2) \triangleq \Pr(\{i\}; p_1) \Pr(\{j\}; p_2),$$
1.12 Product spaces: If $(\mathcal{H}_1, \rho_1)$ and $(\mathcal{H}_2, \rho_2)$ are two quantum systems, then the quantum system with state space $\mathcal{H}_1 \otimes \mathcal{H}_2$ and state $\rho_1 \otimes \rho_2$ (which is a non-negative definite operator of unit trace on $\mathcal{H}_1 \otimes \mathcal{H}_2$) describes the two independent quantum systems as a single system.
F (E ∨ F − E) = (E ∨ F − E)F.
where
$$\{X, Y\} \triangleq XY + YX; \quad\text{and}\quad [X, Y] \triangleq XY - YX.$$
$$\mathrm{Tr}\, \rho (X + zY)^\dagger (X + zY) \geq 0.$$
If $z = re^{i\theta}$,
$$r^2\, \mathrm{Tr}\, \rho Y^2 + 2r\, \Re\, e^{-i\theta}\, \mathrm{Tr}\, \rho YX + \mathrm{Tr}\, \rho X^2 \geq 0.$$
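1.4.1 Qubits
The Hilbert space $h \triangleq \mathbb{C}^2$, with scalar product $\left\langle \begin{bmatrix} a \\ b \end{bmatrix}, \begin{bmatrix} c \\ d \end{bmatrix} \right\rangle = \bar{a}c + \bar{b}d$, is called a 1-qubit Hilbert space. Let
$$|0\rangle = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad\text{and}\quad |1\rangle = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$
Then,
$$\begin{bmatrix} a \\ b \end{bmatrix} = a|0\rangle + b|1\rangle,$$
and the ket vectors $|0\rangle$ and $|1\rangle$ form an orthonormal basis for $h$.
The Hilbert space $h^{\otimes n} = (\mathbb{C}^2)^{\otimes n}$ is called the $n$-qubit Hilbert space. If $x_1 x_2 \cdots x_n$ is an $n$-length word from the binary alphabet $\{0, 1\}$, we let
$$|x_1 x_2 \cdots x_n\rangle \triangleq |x_1\rangle |x_2\rangle \cdots |x_n\rangle \triangleq |x_1\rangle \otimes |x_2\rangle \otimes \cdots \otimes |x_n\rangle \triangleq |x\rangle,$$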
If the input is $|u\rangle$ and it passes through the gate $U$, then the output is written as $U|u\rangle$.
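where $|a|^2 + |b|^2 = 1$ in the computational basis consisting of $|0\rangle$ and $|1\rangle$.
The action of the unitary operator $U$ on the basis states can be computed as shown below.
$$U|0\rangle = e^{i\alpha} \begin{bmatrix} a & b \\ -\bar{b} & \bar{a} \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = e^{i\alpha} \{a|0\rangle - \bar{b}|1\rangle\}.$$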
$$X = \sum_{j=0}^{2^n - 1} j\, |j\rangle\langle j|,$$
1. Pauli gates: There are three such gates and they are denoted by $X, Y, Z$. The unitary matrices of $X, Y, Z$ in the computational basis are given by
$$X = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad Y = \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}, \quad Z = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.$$
The unitary matrix $X$ is also called the not gate because $X|0\rangle = |1\rangle$ and $X|1\rangle = |0\rangle$.
These gates are called Pauli gates because the unitary matrices corresponding to these operators are the Pauli matrices $\sigma_1, \sigma_2, \sigma_3$
where x · y = x1 y1 + x2 y2 + · · · + xn yn .
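3. Phase gate: The unitary matrix for this gate is $S = \begin{bmatrix} 1 & 0 \\ 0 & i \end{bmatrix}$. This gate changes the phase of the ket vector $|1\rangle$ by $i$ so that $|1\rangle$ becomes $i|1\rangle$, and leaves the ket vector $|0\rangle$ fixed.
4. $\frac{\pi}{8}$ gate: The unitary matrix for this gate is
$$T = \begin{bmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{bmatrix} = e^{i\pi/8} \begin{bmatrix} e^{-i\pi/8} & 0 \\ 0 & e^{i\pi/8} \end{bmatrix}.$$
This gate changes the phase of $|1\rangle$ by $e^{i\pi/4}$.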
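The two forms of $T$ given above, and the relation between the phase gate and the $\frac{\pi}{8}$ gate, can be confirmed with a few lines of arithmetic. The sketch below is an illustration with helper names chosen here; it checks that the factored form equals $T$ and that $T^2 = S$, which follows from $(e^{i\pi/4})^2 = i$.

```python
import cmath
import math

def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

X = [[0, 1], [1, 0]]
S = [[1, 0], [0, 1j]]
T = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]

# The same gate written as a global phase times a symmetric diagonal matrix.
phase = cmath.exp(1j * math.pi / 8)
T_factored = [[phase * cmath.exp(-1j * math.pi / 8), 0],
              [0, phase * cmath.exp(1j * math.pi / 8)]]

# X acting on |0> = (1, 0) gives |1> = (0, 1).
x_on_zero = [X[0][0], X[1][0]]
```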
Figure 2.4: Two qubit gates. A CNOT gate and a SWAP gate.
The gate could also negate the content of the first qubit depending
on the second qubit. Such a gate will have a different unitary ma-
trix. The essential point is that a qubit can get negated depending
on a control qubit. The control qubit will always be denoted by a
solid dot in pictures.
2. Swap gate:
This gate (Figure 2.4) swaps the contents of the two qubits. Because the vectors $|00\rangle$ and $|11\rangle$ are symmetric, they are unaltered, while the vector $|01\rangle$ gets mapped to $|10\rangle$ and vice versa.
The unitary matrix for this gate is
$$P = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
Exercise 2.2.1 Prove that the two circuits given in Figure 2.5
are equivalent.
$$|a, b\rangle \to |a, a \oplus b\rangle \to |a \oplus (a \oplus b), a \oplus b\rangle = |b, a \oplus b\rangle \to |b, (a \oplus b) \oplus b\rangle = |b, a\rangle.$$
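The chain above can be replayed on all four computational basis states. Since each gate permutes basis states, checking the basis is enough to identify the operator. This is a minimal sketch; the function names are chosen here for illustration.

```python
def cnot(control, target):
    """Controlled NOT on classical bit values: the target is XORed with the control."""
    return control, target ^ control

def swap_via_cnots(a, b):
    """Three alternating CNOTs applied to the basis state |a, b>."""
    a, b = cnot(a, b)   # |a, a XOR b>
    b, a = cnot(b, a)   # second qubit controls the first: |b, a XOR b>
    a, b = cnot(a, b)   # |b, a>
    return a, b

results = {(a, b): swap_via_cnots(a, b) for a in (0, 1) for b in (0, 1)}
```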
3. Controlled unitary: This is just like the controlled NOT, but in-
stead of negating the target qubit, we perform the unitary trans-
form prescribed by the matrix U (only if the control qubit is in
state $|1\rangle$). It is represented schematically as shown in the first
diagram of Figure 2.6.
Figure 2.6: A controlled unitary gate, Toffoli gate and a Fredkin gate.
More generally $R_{\hat{n}}(\theta) = (\cos\frac{\theta}{2}) I - (i \sin\frac{\theta}{2})(\hat{n}_x X + \hat{n}_y Y + \hat{n}_z Z)$ is the matrix corresponding to rotation by an angle $\theta$ about the axis with direction cosines $(\hat{n}_x, \hat{n}_y, \hat{n}_z)$.
Corollary 2.2.4 In Figure 2.7, the circuit on the left hand side is equivalent to the circuit on the right hand side if $AXBXC = e^{-i\alpha} U$, $ABC = I$ and
$$D = \begin{bmatrix} 1 & 0 \\ 0 & e^{i\alpha} \end{bmatrix}.$$
Corollary 2.2.5 In Figure 2.8, the circuit on the left hand side is equivalent to the circuit on the right hand side if $V^2 = U$.
Figure 2.8: Circuit for the $C^2(U)$ gate. $V$ is any unitary operator satisfying $V^2 = U$. The special case $V = (1 - i)(I + iX)/2$ corresponds to the Toffoli gate.
Proof
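$|00\rangle|u\rangle \to |00\rangle|u\rangle.$
$|01\rangle|u\rangle \to |01\rangle V|u\rangle \to |01\rangle V^\dagger V|u\rangle = |01\rangle I|u\rangle = |01\rangle|u\rangle.$
$|10\rangle|u\rangle \to |11\rangle|u\rangle \to |11\rangle V^\dagger|u\rangle \to |10\rangle V^\dagger|u\rangle \to |10\rangle V V^\dagger|u\rangle = |10\rangle|u\rangle.$
$|11\rangle|u\rangle \to |11\rangle V|u\rangle \to |10\rangle V|u\rangle \to |11\rangle V|u\rangle \to |11\rangle VV|u\rangle = |11\rangle U|u\rangle.$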
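For the Toffoli special case the only fact the proof needs is that $V = (1 - i)(I + iX)/2$ is unitary and squares to $X$. The following sketch, with helper names chosen here, verifies both numerically.

```python
def matmul(a, b):
    """Multiply two 2x2 complex matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]

c = (1 - 1j) / 2
# V = c (I + iX), written out entry by entry.
V = [[c, 1j * c], [1j * c, c]]

V_squared = matmul(V, V)
V_dagger_V = matmul(dagger(V), V)
```

By hand, $c^2 = -i/2$, so the diagonal of $V^2$ is $c^2 + (ic)^2 = 0$ and the off-diagonal entries are $2ic^2 = 1$.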
Exercise 2.2.7 Derive and verify that the circuit on the right hand side
of Figure 2.9 is a correct realization of the Toffoli gate using controlled
NOT and single qubit gates.
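$$|\psi_0\rangle = |\psi\rangle \frac{|00\rangle + |11\rangle}{\sqrt{2}} = \frac{1}{\sqrt{2}} [\alpha|0\rangle(|00\rangle + |11\rangle) + \beta|1\rangle(|00\rangle + |11\rangle)].$$
$$|\psi_1\rangle = \frac{1}{\sqrt{2}} [\alpha|0\rangle(|00\rangle + |11\rangle) + \beta|1\rangle(|10\rangle + |01\rangle)].$$
After she sends the first qubit through the Hadamard gate the state of the system is
$$|\psi_2\rangle = \frac{1}{2} [\alpha(|0\rangle + |1\rangle)(|00\rangle + |11\rangle) + \beta(|0\rangle - |1\rangle)(|10\rangle + |01\rangle)].$$
Collecting the first two qubits the state $|\psi_2\rangle$ can be re-written as
$$|\psi_2\rangle = \frac{1}{2} [|00\rangle(\alpha|0\rangle + \beta|1\rangle) + |01\rangle(\alpha|1\rangle + \beta|0\rangle) + |10\rangle(\alpha|0\rangle - \beta|1\rangle) + |11\rangle(\alpha|1\rangle - \beta|0\rangle)].$$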
When Alice makes a measurement on the two qubits she can control, the state of Bob’s qubit is completely determined by the results of Alice’s measurement on her first two qubits. Hence if Alice sends the results of her measurement to Bob, he can apply appropriate gates on the qubit he can access and get the state $|\psi\rangle$. The action of Bob can be summarized as in the table below.
Thus, the state of the first qubit $|\psi\rangle$ is transferred to the third qubit which is with Bob. The above algorithm implies that one shared EPR pair and two classical bits of communication are a resource at least equal to one qubit of quantum communication.
$$|\varphi_0\rangle = \frac{|00\rangle + |11\rangle}{\sqrt{2}}$$
Alice selects the gate G according to the bits she wants to send. She
selects a gate according to the table below and applies it to the qubit
she possesses before transmitting it to Bob.
The four possible states that Bob can receive are the so-called Bell
states or EPR pairs which constitute the Bell basis. Since the Bell states
form an orthogonal basis, they can be distinguished by measuring in the
appropriate basis. Hence when Bob receives the qubit sent by Alice he
has both the qubits. Then he does a measurement in the Bell basis and
finds out the message she wanted to send. In classical computation it is
impossible to send two bits of information by just passing a single bit.
So a qubit can carry more than one bit of classical information.
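A short simulation makes the counting concrete. The encoding used below (apply $X$ if the second bit is 1, then $Z$ if the first bit is 1) is one common choice and is an assumption here, as is implementing the Bell-basis measurement by a CNOT followed by a Hadamard gate and a computational-basis measurement; all names are illustrative.

```python
import math

def apply_gate(state, gate, qubit, n):
    """Apply a 2x2 gate to one qubit (0 = leftmost) of an n-qubit state vector."""
    shift = n - 1 - qubit
    new_state = [0j] * len(state)
    for index, amplitude in enumerate(state):
        bit = (index >> shift) & 1
        cleared = index & ~(1 << shift)
        for out_bit in (0, 1):
            new_state[cleared | (out_bit << shift)] += gate[out_bit][bit] * amplitude
    return new_state

def apply_cnot(state, control, target, n):
    """Flip the target qubit on every basis state whose control qubit is 1."""
    new_state = [0j] * len(state)
    for index, amplitude in enumerate(state):
        if (index >> (n - 1 - control)) & 1:
            index ^= 1 << (n - 1 - target)
        new_state[index] += amplitude
    return new_state

S2 = 1 / math.sqrt(2)
H = [[S2, S2], [S2, -S2]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

def send_two_bits(b1, b2):
    """Alice encodes (b1, b2) on her half of an EPR pair; Bob decodes both bits."""
    state = [S2, 0j, 0j, S2]                 # (|00> + |11>)/sqrt(2)
    if b2:
        state = apply_gate(state, X, 0, 2)   # Alice's qubit is qubit 0
    if b1:
        state = apply_gate(state, Z, 0, 2)
    # Bob now holds both qubits and rotates the Bell basis to the computational basis.
    state = apply_cnot(state, 0, 1, 2)
    state = apply_gate(state, H, 0, 2)
    probabilities = [abs(a) ** 2 for a in state]
    outcome = max(range(4), key=probabilities.__getitem__)
    return outcome >> 1, outcome & 1, probabilities[outcome]

decoded = {(b1, b2): send_two_bits(b1, b2) for b1 in (0, 1) for b2 in (0, 1)}
```

Each of the four messages is decoded with probability 1 because the four encoded states are mutually orthogonal.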
Exercise 2.3.1 $W_{a,\alpha} W_{b,\beta} = \alpha(b) W_{a+b,\alpha\beta}$, i.e. the $W_{a,\alpha}$ form a projective unitary representation of the group $F \times \hat{F}$. The term projective is used to refer to the fact that the unitary operators $W_{a,\alpha}$ form a representation of $F \times \hat{F}$ up to multiplication by a complex scalar (the number $\alpha(b)$) of modulus unity.
Exercise 2.3.2 Show that the only linear operators which commute with $W_{a,\alpha}$ for all $(a, \alpha) \in F \times \hat{F}$ are the scalars. Hence, the $W_{a,\alpha}$’s form an irreducible projective representation of the group $F \times \hat{F}$, i.e. the only subspaces of $\mathcal{H}$ which are invariant under every $W_{a,\alpha}$ are the zero subspace and $\mathcal{H}$ itself.
Exercise 2.3.3 Show that the operators $\{W_{a,\alpha}\}_{(a,\alpha) \in F \times \hat{F}}$ are linearly independent. Thus, they span the space $\mathcal{B}(\mathcal{H})$ of (bounded) linear operators on $\mathcal{H}$.
Exercise 2.3.4 Show that $W_{a,\alpha}^\dagger = \alpha(a) W_{-a,\bar{\alpha}}$. Show also that $\mathrm{Tr}\, W_{a,\alpha} = n$ if $a = 0$ and $\alpha$ is the trivial character, where $n = |F|$; otherwise $\mathrm{Tr}\, W_{a,\alpha} = 0$. Hence, prove that $\mathrm{Tr}\, W_{a,\alpha}^\dagger W_{b,\beta} = n\delta_{(a,\alpha),(b,\beta)}$.
Also define $|(a, \alpha)\rangle \triangleq (W_{a,\alpha} \otimes I)|\psi_0\rangle$, where $I$ is the identity operator on $\mathcal{H}$. Then, $\{|(a, \alpha)\rangle\}_{(a,\alpha) \in F \times \hat{F}}$ is an orthonormal basis for $\mathcal{H} \otimes \mathcal{H}$.
$|\psi_0\rangle$ is the entangled state which Alice and Bob share. Alice holds the first $\log n$ qubits of the state while Bob holds the other $\log n$ qubits. To send a message $m \in [n^2]$, Alice applies the unitary transformation $W_{a,\alpha}$, where $f(a, \alpha) = m$, on her qubits. She then sends her qubits to Bob, who then applies the measurement $X$ on the $2 \log n$ qubits which he now has. The outcome of the measurement is $m$, which is exactly what Alice intended to send. Thus Alice has communicated $2 \log n$ classical bits of information using only $\log n$ qubits of quantum communication.
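$$|\psi_0\rangle = |01\rangle$$
$$|\psi_1\rangle = \frac{1}{2} (|0\rangle + |1\rangle)(|0\rangle - |1\rangle).$$
Observe that $U_f |x\rangle \left( \frac{|0\rangle - |1\rangle}{\sqrt{2}} \right) = (-1)^{f(x)} |x\rangle \left( \frac{|0\rangle - |1\rangle}{\sqrt{2}} \right)$.
$$|\psi_2\rangle = \begin{cases} \pm\frac{1}{2} (|0\rangle + |1\rangle)(|0\rangle - |1\rangle) & \text{if } f(0) = f(1); \\ \pm\frac{1}{2} (|0\rangle - |1\rangle)(|0\rangle - |1\rangle) & \text{otherwise.} \end{cases}$$
$$|\psi_3\rangle = \begin{cases} \pm|0\rangle \frac{(|0\rangle - |1\rangle)}{\sqrt{2}} & \text{if } f(0) = f(1); \\ \pm|1\rangle \frac{(|0\rangle - |1\rangle)}{\sqrt{2}} & \text{otherwise.} \end{cases}$$
$$\implies |\psi_3\rangle = \pm|f(0) \oplus f(1)\rangle \frac{(|0\rangle - |1\rangle)}{\sqrt{2}}.$$
Thus, by measuring the first bit we get
$$\left\{ \{f(0) \oplus f(1)\},\ \pm|f(0) \oplus f(1)\rangle \frac{(|0\rangle - |1\rangle)}{\sqrt{2}} \right\}.$$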
In this algorithm, both superposition and interference of quantum states
are exploited.
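The four stages $|\psi_0\rangle, \ldots, |\psi_3\rangle$ can be followed on a 4-dimensional state vector. The sketch below is an illustration with names chosen here; `deutsch` queries $U_f : |x, y\rangle \mapsto |x, y \oplus f(x)\rangle$ once and returns the value read from the first qubit.

```python
import math

def apply_gate(state, gate, qubit, n):
    """Apply a 2x2 gate to one qubit (0 = leftmost) of an n-qubit state vector."""
    shift = n - 1 - qubit
    new_state = [0j] * len(state)
    for index, amplitude in enumerate(state):
        bit = (index >> shift) & 1
        cleared = index & ~(1 << shift)
        for out_bit in (0, 1):
            new_state[cleared | (out_bit << shift)] += gate[out_bit][bit] * amplitude
    return new_state

S2 = 1 / math.sqrt(2)
H = [[S2, S2], [S2, -S2]]

def deutsch(f):
    """Return f(0) XOR f(1) using a single application of U_f."""
    state = [0j, 1 + 0j, 0j, 0j]              # |psi_0> = |01>
    state = apply_gate(state, H, 0, 2)
    state = apply_gate(state, H, 1, 2)        # |psi_1>
    after_oracle = [0j] * 4
    for index, amplitude in enumerate(state):
        x, y = index >> 1, index & 1
        after_oracle[(x << 1) | (y ^ f(x))] += amplitude   # U_f gives |psi_2>
    state = apply_gate(after_oracle, H, 0, 2)              # |psi_3>
    prob_first_is_one = abs(state[2]) ** 2 + abs(state[3]) ** 2
    return 1 if prob_first_is_one > 0.5 else 0

# All four functions from {0, 1} to {0, 1}, given by their value tables.
tables = [(0, 0), (0, 1), (1, 0), (1, 1)]
answers = {table: deutsch(lambda x, t=table: t[x]) for table in tables}
```

The measurement is deterministic here: the probability of reading 1 on the first qubit is exactly 0 or 1, so thresholding at 0.5 only guards against rounding.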
Figure 2.14: Circuit for adding two single bit numbers with carry.
Consider a subroutine for adding two single bit numbers with carry.
The circuit for this subroutine is shown in Figure 2.14.
If we measure the last two qubits in the circuit in Figure 2.14, we get the outputs $\{s_n\}$, $\{c_n\}$ and the collapsed states $|s_n\rangle$, $|c_n\rangle$ provided $d = 0$. Hence, using this subroutine we can add two $n$-bit numbers.
Addition:
We would like to count the number of Toffoli and CNOT gates used by the circuit as a measure of complexity. Suppose $\alpha_n$ Toffoli and $\beta_n$ CNOT gates are used for adding two $n$-bit numbers. Then
$$\alpha_{n+1} = \alpha_n + 2, \quad \beta_{n+1} = \beta_n + 2 \implies \alpha_n = \alpha_1 + 2(n - 1), \quad \beta_n = \beta_1 + 2(n - 1).$$
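Figure 2.15: Circuit for adding two single bit numbers without carry. The wire $|b_0\rangle$ leaves the circuit as $|a_0 \oplus b_0\rangle$ (the sum bit $s_0$) and the wire $|d\rangle$ leaves as $|d \oplus a_0 b_0\rangle$ (the carry bit $c_0$ when $d = 0$).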
2n − 1 CNOT gates. The circuit for adding two n bit numbers is shown
in Figure 2.16.
Figure 2.16: Circuit for adding two n bit numbers without carry.
Subtraction:
Exercise 2.3.7 Count the number of gates required in the above sub-
traction algorithm.
$$\alpha = \frac{u_{11}}{\sqrt{|u_{11}|^2 + |u_{21}|^2}} \quad\text{and}\quad \beta = \frac{u_{21}}{\sqrt{|u_{11}|^2 + |u_{21}|^2}}.$$
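Proof Consider $\mathcal{H} = (\mathbb{C}^2)^{\otimes n}$ with computational basis $\{|x\rangle, x \in \{0, 1\}^n\}$. Consider a pair $x, y$ which differ in exactly one place, say $i$.
$$|x\rangle = |a\rangle|0\rangle|b\rangle, \qquad |y\rangle = |a\rangle|1\rangle|b\rangle,$$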
Exercise 3.1.4 Show that in Figure 3.4 the circuit on the left hand side
is equivalent to the circuit on the right hand side.
Proof The proof follows from Lemma 3.1.1, Lemma 3.1.2, Lemma 3.1.3 and Lemma 3.1.5. □
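Proposition 3.1.7 The group generated by $H$ and $e^{-i\frac{\pi}{8}Z}$ is dense in $SU(2)$.
Proof $H^2 = I$, $HZH = X$, $HYH = -Y$, $He^{-i\frac{\pi}{8}Z}H = e^{-i\frac{\pi}{8}X}$ and
$$e^{-i\frac{\pi}{8}Z} e^{-i\frac{\pi}{8}X} = \cos^2\frac{\pi}{8}\, I - i \sin\frac{\pi}{8} \left\{ \left(\cos\frac{\pi}{8}\right)(X + Z) + \left(\sin\frac{\pi}{8}\right) Y \right\} = R_{\vec{n}}(\alpha),$$
where $\cos\alpha = \cos^2\frac{\pi}{8}$, $\vec{n} = \dfrac{(\cos\frac{\pi}{8}, \sin\frac{\pi}{8}, \cos\frac{\pi}{8})}{\sqrt{1 + \cos^2\frac{\pi}{8}}}$,
$$He^{-i\frac{\pi}{8}Z} e^{-i\frac{\pi}{8}X}H = \cos^2\frac{\pi}{8}\, I - i \sin\frac{\pi}{8} \left\{ \left(\cos\frac{\pi}{8}\right)(X + Z) - \left(\sin\frac{\pi}{8}\right) Y \right\} = R_{\vec{m}}(\alpha),$$
where $\vec{m} = \dfrac{(\cos\frac{\pi}{8}, -\sin\frac{\pi}{8}, \cos\frac{\pi}{8})}{\sqrt{1 + \cos^2\frac{\pi}{8}}}$. Now we need the following lemma.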
3.2 Appendix
In this section we first give all the definitions and results needed to
prove Lemma 3.1.8. The proofs which are routine are left out. The
reader may refer to [3, 8] for a comprehensive treatment. We start with
a few definitions.
A nonzero ring $R$ with $1 \neq 0$ is called an integral domain if it has no zero divisors. In other words, it has the property that for $A, B \in R$, if $AB = 0$, then $A = 0$ or $B = 0$.
An ideal is called principal if it is generated by a single element.
An integral domain in which every ideal is principal is called a prin-
cipal ideal domain.
Exercise 3.2.1 Show that for any field k, k[x] is a principal ideal do-
main.
An element $P$ ($\neq 0, 1$) of an integral domain $R$ is called prime if the following is true: if $P$ divides a product of two elements of $R$, then it divides one of the factors.
A nonconstant polynomial $P \in \mathbb{F}[x]$ is called irreducible if, whenever it is written as a product of two polynomials $P_1, P_2 \in \mathbb{F}[x]$, either $P_1$ or $P_2$ is a constant.
A polynomial is called monic if the coefficient of the leading term is 1.
A polynomial $a_0 + a_1 x + \cdots + a_n x^n$ in $\mathbb{Z}[x]$ is called primitive if g.c.d. $(|a_0|, \ldots, |a_n|) = 1$ and $a_n > 0$.
$$2\,\mathrm{Re}\{\beta\} = -\frac{1}{2} + \sqrt{2}.$$
So $m_\beta$ is divisible by $p(x) = x^2 - (\sqrt{2} - \frac{1}{2})x + 1$. Since $p(x)$ has irrational coefficients and $m_\beta$ has rational coefficients, $m_\beta$ must have another irrational root, say $\delta$. This implies $m_\beta$ has another quadratic factor with real coefficients. This means that $\deg(m_\beta) \geq 4$. Consider the polynomial $p'(x) = x^2 + (\sqrt{2} + \frac{1}{2})x + 1$. Multiplying $p(x)$ and $p'(x)$
$$F|j\rangle = F|j_1 j_2 \ldots j_n\rangle = \frac{1}{2^{n/2}} (|0\rangle + e^{2\pi i\, 0.j_n}|1\rangle)(|0\rangle + e^{2\pi i\, 0.j_{n-1} j_n}|1\rangle) \cdots$$
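Proof
$$\begin{aligned}
F|j\rangle &= \frac{1}{2^{n/2}} \sum_{k=0}^{2^n - 1} e^{\frac{2\pi i jk}{2^n}} |k\rangle \\
&= \frac{1}{2^{n/2}} \sum_{k_1=0}^{1} \sum_{k_2=0}^{1} \cdots \sum_{k_n=0}^{1} e^{2\pi i j \left( \frac{k_1}{2^1} + \frac{k_2}{2^2} + \cdots + \frac{k_n}{2^n} \right)} |k_1 k_2 \ldots k_n\rangle \\
&= \frac{1}{2^{n/2}} \sum_{k_1, k_2, \ldots, k_n} \bigotimes_{l=1}^{n} e^{\frac{2\pi i j k_l}{2^l}} |k_l\rangle \\
&= \frac{1}{2^{n/2}} \bigotimes_{l=1}^{n} \left( |0\rangle + e^{\frac{2\pi i j}{2^l}} |1\rangle \right)
\end{aligned}$$
since
$$\frac{j}{2^l} = \text{integer} + \frac{j_{n-(l-1)}}{2} + \cdots + \frac{j_{n-1}}{2^{l-1}} + \frac{j_n}{2^l},$$
$$\begin{aligned}
F|j\rangle &= \frac{1}{2^{n/2}} \bigotimes_{l=1}^{n} \left( |0\rangle + e^{2\pi i\, 0.j_{n-(l-1)} j_{n-(l-2)} \ldots j_n} |1\rangle \right) \\
&= \frac{1}{2^{n/2}} (|0\rangle + e^{2\pi i\, 0.j_n}|1\rangle)(|0\rangle + e^{2\pi i\, 0.j_{n-1} j_n}|1\rangle) \cdots (|0\rangle + e^{2\pi i\, 0.j_1 j_2 \ldots j_n}|1\rangle).
\end{aligned}$$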
Figure 4.1: Efficient circuit for quantum Fourier transform. Each input wire $|j_l\rangle$ passes through $H$ followed by the rotations $R_2, \ldots, R_{n-l+1}$. The output on the $k$-th qubit from top is $\frac{1}{\sqrt{2}}(|0\rangle + e^{2\pi i\, 0.j_{n-k+1} \ldots j_n}|1\rangle)$. The correctness of the circuit follows from Theorem 4.1.1.
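The product formula can be compared against the defining sum $F|j\rangle = 2^{-n/2} \sum_k e^{2\pi i jk/2^n}|k\rangle$ for small $n$. In this sketch (function names chosen here) the binary fraction $0.j_{n-l+1} \ldots j_n$ is computed as $(j \bmod 2^l)/2^l$, and the $l$-th tensor factor is placed on the $l$-th most significant qubit $k_l$.

```python
import cmath
import math

def qft_direct(value, n):
    """Amplitudes of F|value> from the defining sum over k."""
    size = 2 ** n
    return [cmath.exp(2j * math.pi * value * k / size) / math.sqrt(size)
            for k in range(size)]

def qft_product(value, n):
    """Amplitudes of F|value> built as a tensor product of n single-qubit states."""
    state = [1 + 0j]
    for l in range(1, n + 1):
        fraction = (value % 2 ** l) / 2 ** l     # binary fraction 0.j_{n-l+1}...j_n
        factor = [1 / math.sqrt(2),
                  cmath.exp(2j * math.pi * fraction) / math.sqrt(2)]
        # Tensor product: the new factor becomes the next less significant qubit.
        state = [a * b for a in state for b in factor]
    return state

def max_difference(n):
    """Largest entrywise gap between the two constructions over all inputs."""
    return max(abs(x - y)
               for value in range(2 ** n)
               for x, y in zip(qft_direct(value, n), qft_product(value, n)))

gap = max_difference(3)
```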
$$\frac{1}{2^{t/2}} (|0\rangle + e^{2\pi i 2^{t-1}\varphi}|1\rangle)(|0\rangle + e^{2\pi i 2^{t-2}\varphi}|1\rangle) \cdots (|0\rangle + e^{2\pi i 2^{0}\varphi}|1\rangle)|u\rangle = \frac{1}{2^{t/2}} \sum_{k=0}^{2^t - 1} e^{2\pi i \varphi k} |k\rangle|u\rangle.$$
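$$\begin{aligned}
\underbrace{|0\rangle \ldots |0\rangle}_{t} |u\rangle &\to \frac{1}{2^{t/2}} \sum_{j=0}^{2^t - 1} |j\rangle|u\rangle \\
&\to \frac{1}{2^{t/2}} \sum_{j} |j\rangle e^{2\pi i j\varphi}|u\rangle \\
&\to \frac{1}{2^{t}} \sum_{j,k} e^{-\frac{2\pi i jk}{2^t} + 2\pi i j\varphi} |k\rangle|u\rangle \\
&= \frac{1}{2^{t}} \sum_{k} \frac{1 - e^{2\pi i (\varphi - \frac{k}{2^t}) 2^t}}{1 - e^{2\pi i (\varphi - \frac{k}{2^t})}} |k\rangle|u\rangle.
\end{aligned}$$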
Figure 4.3: The schematic for the overall phase estimation circuit.
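$$\Pr(x = k) = \frac{1}{2^{2t}} \left| \frac{1 - e^{2\pi i (\varphi - \frac{k}{2^t}) 2^t}}{1 - e^{2\pi i (\varphi - \frac{k}{2^t})}} \right|^2 = \frac{1}{2^{2t}} \frac{\sin^2 \pi(\varphi - \frac{k}{2^t}) 2^t}{\sin^2 \pi(\varphi - \frac{k}{2^t})}.$$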
If the observed value is $k$, then $\frac{k}{2^t}$ is the desired estimate for $\varphi$. Let $a = \lfloor 2^t \varphi \rfloor$, $d = \varphi - \frac{a}{2^t}$ and $\delta$ be a positive integer $< 2^{t-1}$. We set $\frac{\delta}{2^t}$ to be the desired tolerance of error in the estimate of $\varphi$. In other words, the observed value of the random variable $x$ should lie in
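$$\begin{aligned}
\Pr(X - X_\delta) &= \sum_{j=-2^{t-1}+1}^{-\delta} \Pr(x = a + j \ (\mathrm{mod}\ 2^t)) + \sum_{j=\delta+1}^{2^{t-1}} \Pr(x = a + j \ (\mathrm{mod}\ 2^t)) \\
&\leq \frac{1}{4} \left( \sum_{j=-2^{t-1}+1}^{-\delta} \frac{1}{(j - 2^t d)^2} + \sum_{j=\delta+1}^{2^{t-1}} \frac{1}{(j - 2^t d)^2} \right) \\
&< \frac{1}{4} \left( \sum_{j=-2^{t-1}+1}^{-\delta} \frac{1}{j^2} + \sum_{j=\delta+1}^{2^{t-1}} \frac{1}{(j - 1)^2} \right) \quad (\text{since } 0 < 2^t d < 1) \\
&= \frac{1}{2} \sum_{j=\delta}^{2^{t-1}-1} \frac{1}{j^2} \\
&< \frac{1}{2} \int_{\delta-1}^{\infty} \frac{dy}{y^2} \\
&= \frac{1}{2(\delta - 1)}.
\end{aligned}$$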
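The distribution of the measured value and the tail estimate can be examined numerically. The sketch below computes $\Pr(x = k)$ from the amplitude $2^{-t} \sum_j e^{2\pi i j(\varphi - k/2^t)}$ rather than from the closed form, which avoids a division by zero when $2^t\varphi$ is an integer. The sample values $\varphi = 0.3$, $t = 6$, $\delta = 3$ and all function names are assumptions made for illustration.

```python
import cmath
import math

def phase_estimation_distribution(phi, t):
    """Probability of observing each k in {0, ..., 2^t - 1} for phase phi."""
    size = 2 ** t
    probabilities = []
    for k in range(size):
        amplitude = sum(cmath.exp(2j * math.pi * j * (phi - k / size))
                        for j in range(size)) / size
        probabilities.append(abs(amplitude) ** 2)
    return probabilities

def tail_probability(phi, t, delta):
    """Probability that x = a + j (mod 2^t) with j <= -delta or j >= delta + 1."""
    size = 2 ** t
    a = math.floor(size * phi)
    probabilities = phase_estimation_distribution(phi, t)
    offsets = list(range(-(size // 2) + 1, -delta + 1))
    offsets += list(range(delta + 1, size // 2 + 1))
    return sum(probabilities[(a + j) % size] for j in offsets)

probs = phase_estimation_distribution(0.3, 6)
most_likely = max(range(len(probs)), key=probs.__getitem__)
tail = tail_probability(0.3, 6, 3)
bound = 1 / (2 * (3 - 1))
```

With these values $2^t \varphi = 19.2$, so the most likely outcome is $k = 19$ and the tail mass stays below the bound $1/(2(\delta - 1)) = 0.25$.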
Order Finding
$N = 2^{j_0} + 2^{j_1} + 2^{j_2} + \cdots + 2^{j_{k-1}}$ where $0 \leq j_0 < j_1 < j_2 < \cdots < j_{k-1} < L$.
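Let
$$|u_s\rangle = \frac{1}{\sqrt{r}} \sum_{k=0}^{r-1} e^{-2\pi i \frac{sk}{r}} |x^k \ (\mathrm{mod}\ N)\rangle.$$
We observe that
$$\begin{aligned}
U|u_s\rangle &= \frac{1}{\sqrt{r}} \sum_{k=0}^{r-1} e^{-2\pi i \frac{sk}{r}} |x^{k+1} \ (\mathrm{mod}\ N)\rangle \\
&= e^{2\pi i \frac{s}{r}} \frac{1}{\sqrt{r}} \sum_{k=0}^{r-1} e^{-2\pi i \frac{sk}{r}} |x^{k} \ (\mathrm{mod}\ N)\rangle.
\end{aligned}$$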
Figure 5.1: Quantum circuit for order finding algorithm. Register 1 ($t$ qubits) passes through $H^{\otimes t}$ and, at the end, $FT^\dagger$; Register 2 ($L$ qubits) undergoes $x^j \bmod N$. The first register is initialized to state $|0\rangle$ and the second register is initialized to state $|1\rangle$.
Hence on measuring the first $t$ qubits we will get the value of the phase $\varphi_s$ correct up to $2L + 1$ bits with probability at least $|c_s|^2 (1 - \epsilon)$.
Now our job is to extract the value of $r$ from the estimated phase. We know the phase $\tilde{\varphi} \approx \frac{s}{r}$ correct up to $2L + 1$ places. If this estimate is close enough, we should be able to get $r$ because we know that $\tilde{\varphi}$ is the ratio of two bounded integers. This task is accomplished efficiently using the following result from number theory.
Theorem 5.1.1 If $\frac{s}{r}$ is a rational number such that $\left| \frac{s}{r} - \tilde{\varphi} \right| \leq \frac{1}{2r^2}$, then $\frac{s}{r}$ is a convergent of the continued fraction for $\tilde{\varphi}$ and hence can be efficiently computed using the continued fraction algorithm.
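To see the theorem at work, the sketch below expands a measured phase $h/k$ as a continued fraction by Euclid's algorithm, lists its convergents with the recurrence $p_n = a_n p_{n-1} + p_{n-2}$, $q_n = a_n q_{n-1} + q_{n-2}$, and keeps the last convergent whose denominator does not exceed a given bound. The function names, the bound `max_r` and the sample data ($s/r = 5/21$ rounded to 11 bits, giving $488/2048$) are assumptions made for illustration.

```python
def continued_fraction(h, k):
    """Partial quotients [a0, a1, ...] of h/k obtained by Euclid's algorithm."""
    terms = []
    while k:
        terms.append(h // k)
        h, k = k, h % k
    return terms

def convergents(terms):
    """List of (p_n, q_n) for the continued fraction with the given terms."""
    p_prev, q_prev = 1, 0
    p, q = terms[0], 1
    pairs = [(p, q)]
    for a in terms[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        pairs.append((p, q))
    return pairs

def recover_fraction(h, k, max_r):
    """Last convergent of h/k whose denominator is at most max_r."""
    best = None
    for p, q in convergents(continued_fraction(h, k)):
        if q > max_r:
            break
        best = (p, q)
    return best

pairs = convergents(continued_fraction(488, 2048))
estimate = recover_fraction(488, 2048, 32)
```

Here $|5/21 - 488/2048| \approx 1.9 \times 10^{-4}$ is below $1/(2 \cdot 21^2)$, so the hypothesis of the theorem holds and $5/21$ appears among the convergents. Consecutive convergents also satisfy $|p_n q_{n-1} - p_{n-1} q_n| = 1$, with a sign that alternates in $n$.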
Runtime: $O(L^4)$.
Procedure:
the number of ‘wires’ then this procedure uses only a constant multiple
of the number of wires used in the earlier classical circuit. The latter
gate can be implemented using a quantum gate. Reversible classical
gates can be built using the Fredkin gate (see Figure 5.2).
Appendix 2: Efficient implementation of controlled $U^{2^j}$ operation
To compute the sequence of controlled $U^{2^j}$ operations we have to compute the transformation
$$\begin{aligned}
|z\rangle|y\rangle &\to |z\rangle U^{z_t 2^{t-1}} \cdots U^{z_1 2^0} |y\rangle \\
&= |z\rangle |x^{z_t 2^{t-1}} \times \cdots \times x^{z_1 2^0} y \ (\mathrm{mod}\ N)\rangle \\
&= |z\rangle |x^z y \ (\mathrm{mod}\ N)\rangle.
\end{aligned}$$
Thus the sequence of controlled $U^{2^j}$ operations is equivalent to multiplying the content of the second register by the modular exponential $x^z \ (\mathrm{mod}\ N)$, where $z$ is the content of the first register. This can be computed using clean reversible computation (see Appendix 1).
This is achieved by first reversibly computing the function $x^z \ (\mathrm{mod}\ N)$ in a third register and then multiplying the contents of the third and the second register such that each qubit in the third register is in the state $|0\rangle$. The task is accomplished in two stages. In the first stage we compute $x^{2^j}$ for all $j \in \{1, 2, \ldots, t - 1\}$ by successively squaring $x \ (\mathrm{mod}\ N)$, where $t = 2L + 1 + \left\lceil \log\left(2 + \frac{1}{2\epsilon}\right) \right\rceil = O(L)$. Each multiplication uses at most $O(L^2)$ gates (indeed, an $O(L \log L \log\log L)$ algorithm using FFT is known; see [2, 10]) and there are $t - 1$ such multiplications. Hence in this step at most $O(L^3)$ gates are used. In the second stage we compute $x^z \ (\mathrm{mod}\ N)$ using the identity
$$x^z \ (\mathrm{mod}\ N) = (x^{z_t 2^{t-1}} \ (\mathrm{mod}\ N))(x^{z_{t-1} 2^{t-2}} \ (\mathrm{mod}\ N)) \cdots (x^{z_1 2^{0}} \ (\mathrm{mod}\ N)).$$
Clearly this operation also uses at most $O(L^3)$ gates. Hence using $O(L^3)$ gates we compute the transformation $|z\rangle|y\rangle \to |z\rangle|x^z y \ (\mathrm{mod}\ N)\rangle$.
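The two stages have a direct classical counterpart, sketched below on ordinary integers. This only illustrates the arithmetic, not the reversible circuit itself; the function name is chosen here, and bit $z_1$ is taken to be the least significant bit of $z$, matching the factor $x^{z_1 2^0}$.

```python
def modular_exponential(x, z, N, t):
    """Compute x^z mod N for a t-bit exponent z in the two stages described above."""
    # Stage 1: x^(2^j) mod N for j = 0, ..., t-1 by successive squaring.
    squares = [x % N]
    for _ in range(t - 1):
        squares.append(squares[-1] * squares[-1] % N)
    # Stage 2: multiply together the squares selected by the bits of z.
    result = 1
    for j in range(t):
        if (z >> j) & 1:
            result = result * squares[j] % N
    return result

value = modular_exponential(7, 11, 15, 4)
```

Stage 1 performs $t - 1$ multiplications and stage 2 at most $t$, which is the source of the $O(L^3)$ gate count when each multiplication costs $O(L^2)$.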
$$a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots \cfrac{1}{a_n}}}}.$$
then $[a_0, a_1, \ldots, a_n] = \frac{p_n}{q_n}$.
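Induction step:
$$\begin{aligned}
[a_0, a_1, \ldots, a_m, a_{m+1}] &= \left[ a_0, a_1, \ldots, a_{m-1}, a_m + \frac{1}{a_{m+1}} \right] \\
&= \frac{\left( a_m + \frac{1}{a_{m+1}} \right) p_{m-1} + p_{m-2}}{\left( a_m + \frac{1}{a_{m+1}} \right) q_{m-1} + q_{m-2}} \\
&= \frac{a_{m+1}(a_m p_{m-1} + p_{m-2}) + p_{m-1}}{a_{m+1}(a_m q_{m-1} + q_{m-2}) + q_{m-1}} \\
&= \frac{a_{m+1} p_m + p_{m-1}}{a_{m+1} q_m + q_{m-1}} \\
&= \frac{p_{m+1}}{q_{m+1}}.
\end{aligned}$$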
Proof We use induction. The result is true for the base cases n = 1, 2.
Assume the result is true for any integer less than n.
$$\begin{aligned}
p_n q_{n-1} - p_{n-1} q_n &= (a_n p_{n-1} + p_{n-2}) q_{n-1} - p_{n-1}(a_n q_{n-1} + q_{n-2}) \\
&= -1(p_{n-1} q_{n-2} - p_{n-2} q_{n-1}) \\
&= (-1)^n.
\end{aligned}$$
h = a0 k + k1 (0 < k1 < k)
k = a1 k1 + k2 (0 < k2 < k1 )
..
.
$$x = \frac{p_n}{q_n} + \frac{\delta}{2 q_n^2}.$$
Then $|\delta| \leq 1$ and $\frac{p_n}{q_n} = \frac{p}{q}$ is the $n$th convergent. Let
$$\lambda = 2 \left( \frac{q_n p_{n-1} - p_n q_{n-1}}{\delta} \right) - \frac{q_{n-1}}{q_n}.$$
$$x = \frac{\lambda p_n + p_{n-1}}{\lambda q_n + q_{n-1}}$$
$$\lambda = \frac{2}{\delta} - \frac{q_{n-1}}{q_n} > 2 - 1 \ (\text{since } q_i > q_{i-1}) = 1.$$
This implies that $\lambda$ is a rational number greater than 1 and it has a finite continued fraction, say $[b_0, \ldots, b_m]$. Hence $x = [a_0, \ldots, a_n, b_0, \ldots, b_m]$. Thus $\frac{p}{q}$ is a convergent of $x$. □
Appendix 4: Estimating $\frac{\varphi(r)}{r}$
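Proof Let $r = \prod_{i=1}^{a} p_i^{\alpha_i} \prod_{j=1}^{b} q_j^{\beta_j}$, where
$$p_1 < p_2 < \cdots < p_a \leq \frac{2 \log r}{\log\log r} < q_1 < q_2 < \cdots < q_b.$$
Then
$$\varphi(r) = \prod_{i=1}^{a} (p_i - 1) p_i^{\alpha_i - 1} \prod_{j=1}^{b} (q_j - 1) q_j^{\beta_j - 1}.$$
Note that $q_1^b \leq r$. This implies $b \leq \log_{q_1} r \leq \log r$. Since $q_1 > \frac{2 \log r}{\log\log r}$, we have $b \leq \frac{\log r}{\log\log r - \log\log\log r + \log 2}$.
Hence,
$$\begin{aligned}
\frac{\varphi(r)}{r} &= \frac{\prod_{i=1}^{a} (p_i - 1) p_i^{\alpha_i - 1} \prod_{j=1}^{b} (q_j - 1) q_j^{\beta_j - 1}}{\prod_{i=1}^{a} p_i^{\alpha_i} \prod_{j=1}^{b} q_j^{\beta_j}} \\
&= \prod_{i=1}^{a} \left( \frac{p_i - 1}{p_i} \right) \prod_{j=1}^{b} \left( 1 - \frac{1}{q_j} \right) \\
&> \prod_{i=2}^{\frac{2 \log r}{\log\log r}} \left( \frac{i - 1}{i} \right) \prod_{j=1}^{b} \left( 1 - \frac{1}{q_j} \right) \\
&= \frac{\log\log r}{2 \log r} \prod_{j=1}^{b} \left( 1 - \frac{1}{q_j} \right) \\
&> \frac{\log\log r}{2 \log r} \left( 1 - \frac{\log\log r}{2 \log r} \right)^{b} \\
&> \frac{\log\log r}{2 \log r} \left( 1 - \frac{\log\log r}{2 \log r}\, b \right) \\
&= \frac{\log\log r}{2 \log r} \left[ 1 - \left( \frac{\log\log r}{2 \log r} \right) \frac{\log r}{\log\log r - \log\log\log r + \log 2} \right] \\
&> \frac{\log\log r}{2 \log r} \left[ \frac{1 - 2E}{2(1 - E)} \right] \quad \text{where } E = \frac{\log\log\log r - \log 2}{\log\log r} \\
&> \frac{\log\log r}{2 \log r} \left( \frac{1 - 2E}{2} \right) \\
&> \frac{\log\log r}{10 \log r} \quad \text{for } r \geq 16.
\end{aligned}$$
□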
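The final inequality is easy to test numerically for small $r$. In the sketch below logarithms are taken to base 2, which is an assumption about the convention used in the proof, and the totient is computed by trial division; the function names are chosen here.

```python
import math

def totient(r):
    """Euler's function phi(r), computed by trial division."""
    result, remaining, p = r, r, 2
    while p * p <= remaining:
        if remaining % p == 0:
            while remaining % p == 0:
                remaining //= p
            result -= result // p      # multiply result by (1 - 1/p)
        p += 1
    if remaining > 1:
        result -= result // remaining  # one prime factor larger than sqrt(r) is left
    return result

def lower_bound(r):
    """The quantity log log r / (10 log r), with base-2 logarithms."""
    log_r = math.log2(r)
    return math.log2(log_r) / (10 * log_r)

violations = [r for r in range(16, 2001) if totient(r) / r <= lower_bound(r)]
```

On this range the bound is far from tight: at $r = 210$, for example, $\varphi(r)/r = 48/210 \approx 0.23$ while the right hand side is below $0.04$.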
In fact the following theorem is true.
Shor’s Algorithm
$$A \triangleq \{x \in \mathbb{Z}_N^* \mid (\mathrm{ord}(x) \text{ is odd}) \text{ or } (\mathrm{ord}(x) \text{ is even and } x^{\mathrm{ord}(x)/2} = -1)\},$$
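(b) and (c) Let $\mathrm{ord}(x) = 2^{\ell'} s'$ (where $\ell' \geq 1$ and $s'$ is odd). Then, $\mathrm{ord}(x) \mid 2^{\ell'} s$, but $\mathrm{ord}(x) \nmid 2^{\ell' - 1} s$. Hence, $x^{2^{\ell' - 1} s} \in V - \{1\}$. Now, if $x^{\mathrm{ord}(x)/2} = -1$, then $x^{2^{\ell' - 1} s'} = -1$. Hence, $x^{2^{\ell' - 1} s} = -1$. □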
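For $i = 0, 1, \ldots, \ell - 1$, and $v \in V$, let $S_{i,v} \triangleq \{x \in \mathbb{Z}_N^* : x^{2^i s} = v\}$. By Lemma 6.1.2, we have
$$A \subseteq S_{0,1} \cup \bigcup_{i=0}^{\ell - 1} S_{i,-1}; \tag{6.1.3}$$
$$\text{and} \quad \mathbb{Z}_N^* = S_{0,1} \cup \bigcup_{i=0}^{\ell - 1} \bigcup_{v \in V - \{1\}} S_{i,v}. \tag{6.1.4}$$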
Lemma 6.1.5 All the sets appearing on the right hand side of (6.1.4) are disjoint.
Proof Consider two such sets $S_{i,v}$ and $S_{j,w}$ appearing above. If $i = j$ then $v \neq w$ and these sets are disjoint by definition. Hence, suppose $i < j$; this implies that $w \neq 1$. But for each $x \in S_{i,v}$, we have $x^{2^{i+1} s} = v^2 = 1$. This implies that $x^{2^j s} = 1 \neq w$, and therefore $x \notin S_{j,w}$. □
Lemma 6.1.6
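$$c_i = \begin{cases} 1 & \text{if } w_i = 1; \\ b_i & \text{if } w_i = -1. \end{cases}$$
Clearly, $c^{2^j s} = w$, so $S_{j,w} \neq \emptyset$. Furthermore, the map $x \mapsto c b^{-1} x$ is a bijection between $S_{j,-1}$ and $S_{j,w}$. Hence, $|S_{j,-1}| = |S_{j,w}|$. □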
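which implies
$$\begin{aligned}
2^{m-1} |A| &\leq 2^{m-1} \left| S_{0,1} \cup \bigcup_{i=0}^{\ell - 1} S_{i,-1} \right| \\
&\leq \left| S_{0,1} \cup \bigcup_{i=0}^{\ell - 1} \bigcup_{w \in W - \{1\}} S_{i,w} \right| \\
&\leq \left| S_{0,1} \cup \bigcup_{i=0}^{\ell - 1} \bigcup_{v \in V - \{1\}} S_{i,v} \right| \\
&= |\mathbb{Z}_N^*|.
\end{aligned}$$
□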
Lemma 6.1.1 is the main tool for analyzing Shor’s factoring algorithm. The crucial observation is that, if we can get a nontrivial square root of unity, then we can find a nontrivial factor of $N$ using Euclid’s G.C.D. algorithm. Lemma 6.1.1 tells us that if we randomly pick a number $x$ less than $N$ and look at its order, with probability greater than $1 - \frac{1}{2^{m-1}}$ it is even and we can get a nontrivial square root of unity
Input N
1) If N is even, return 2.
4) Pick an element x ∈ N .
5) If x | N , return x.
10) Use Euclid’s G.C.D. algorithm to find the greatest common divisor
of (y − 1, N ) and (y + 1, N ). Return the nontrivial numbers.
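The classical part of the reduction can be sketched as follows. This is not the full algorithm: the quantum order-finding subroutine is replaced by a brute-force loop, the prime-power test is omitted (so $N$ is assumed to be an odd composite that is not a prime power), and the divisibility check on $x$ is done with a gcd. All function names and the `seed` parameter are assumptions made for illustration.

```python
import math
import random

def order(x, N):
    """Multiplicative order of x modulo N (brute-force stand-in for order finding)."""
    r, value = 1, x % N
    while value != 1:
        value = value * x % N
        r += 1
    return r

def find_factor(N, seed=0):
    """Return a nontrivial factor of an odd composite N that is not a prime power."""
    if N % 2 == 0:
        return 2
    rng = random.Random(seed)
    while True:
        x = rng.randrange(2, N)
        common = math.gcd(x, N)
        if common > 1:
            return common              # x already shares a factor with N
        r = order(x, N)
        if r % 2 == 1:
            continue                   # odd order: pick another x
        y = pow(x, r // 2, N)
        if y == N - 1:
            continue                   # y = -1 is a trivial square root of unity
        # y is a square root of 1 other than 1 and -1, so N divides (y - 1)(y + 1)
        # without dividing either factor.
        return math.gcd(y - 1, N)

factors = {N: find_factor(N) for N in (15, 21, 35, 91)}
```

By Lemma 6.1.1 a random $x$ fails with probability at most $\frac{1}{2^{m-1}}$, so the loop is expected to terminate after a small number of trials.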
Subroutine: Prime-power
Input: Integer N.
1) Compute $y = \log_2 N$.
for any input state $\rho$ on $\mathcal{H}$ the output state $T(\rho)$ has always the form
$$T(\rho) = \sum_j L_j \rho L_j^\dagger \tag{7.1.1}$$
where $L_j$ belongs to $\mathcal{E}$ for every $j$ (See Figure 7.1). If the same input state is transmitted again the operators $L_j$ may be completely different. But they always come from the error space $\mathcal{E}$ and satisfy the equation
$$\mathrm{Tr} \left( \sum_{j=1}^{k} L_j^\dagger L_j \right) \rho = 1. \tag{7.1.2}$$
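and
$$\langle u | \Big( \sum_j L_j^\dagger L_j \Big) | u \rangle = 1.$$
This is true if and only if $\langle v | M_i L_j | u \rangle = 0$ for all $|v\rangle \in \{|u\rangle\}^\perp$ and every $i, j$. Thus,
$$M_i L_j |u\rangle = c(u)|u\rangle \quad \text{for all } |u\rangle \in \mathcal{C}.$$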
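Sufficiency: Let the conditions (1) and (2) hold. Consider the subspaces $\mathcal{E}\psi_0, \mathcal{E}\psi_1, \ldots, \mathcal{E}\psi_{k-1}$. It can be verified that the correspondence $L\psi_i \to L\psi_j$, for all $L \in \mathcal{E}$ is a scalar product preserving map. So we can write the following table.
$\psi_0$      $\psi_1$      $\cdots$   $\psi_j$      $\cdots$   $\psi_{k-1}$
$\mathcal{E}\psi_0$   $\mathcal{E}\psi_1$   $\cdots$   $\mathcal{E}\psi_j$   $\cdots$   $\mathcal{E}\psi_{k-1}$
Then we have
$$\begin{aligned}
U_j L \psi_0 &= \alpha_0(L)\varphi_j^0 + \alpha_1(L)\varphi_j^1 + \cdots + \alpha_{l-1}(L)\varphi_j^{l-1} \\
\Rightarrow E_i U_j L \psi_0 &= \alpha_i(L)\varphi_j^i \\
\Rightarrow V^{(i)} E_i U_j L \psi_0 &= \alpha_i(L)\psi_j.
\end{aligned}$$
That is,
$$R_i U_j L \psi_0 = \alpha_i(L)\psi_j \ \text{ for } i = 0, 1, \ldots, l - 1, \qquad E_l U_j L \psi_0 = 0 = R_l U_j L \psi_0.$$
Thus we have,
$$\begin{aligned}
R_i L \psi &= c_0 \alpha_i(L)\psi_0 + c_1 \alpha_i(L)\psi_1 + \cdots + c_{k-1} \alpha_i(L)\psi_{k-1} \\
&= \alpha_i(L)\psi \ \text{ for } i \in \{0, 1, \ldots, l - 1\}, \text{ and} \\
R_l L \psi &= 0.
\end{aligned}$$
That is, $R_i L \big|_{\mathcal{C}} = \alpha_i(L) I \big|_{\mathcal{C}}$, where $\alpha_l(L) = 0$. □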
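Let $E \subset G$ be called the error set and $C \subset G$ the code set. Let $\mathcal{E} = \mathrm{lin}\{L_x \mid x \in E\}$, where $(L_a f)(x) = f(a^{-1}x)$, lin denotes linear span and $\mathcal{C} = \mathrm{lin}\{1_{\{c\}} \mid c \in C\}$. It can be verified that $L_a 1_{\{b\}} = 1_{\{ab\}}$.
If $c_1 \neq c_2$, then
$$\left\langle 1_{\{c_1\}}, L_x^\dagger L_y 1_{\{c_2\}} \right\rangle = \left\langle 1_{\{c_1\}}, 1_{\{x^{-1} y c_2\}} \right\rangle = 0 \ \text{ if } x^{-1} y c_2 \neq c_1 \ \text{ or } \ x^{-1} y \neq c_1 c_2^{-1} \ \text{ or } \ E^{-1}E \cap CC^{-1} = \{e\}.$$
Also,
$$\left\langle 1_{\{c\}}, L_x^\dagger L_y 1_{\{c\}} \right\rangle = \begin{cases} 1 & \text{if } x = y; \\ 0 & \text{otherwise.} \end{cases}$$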
[Figure: a channel taking the input $c \in C$ to the output $xc \in Ec$, where $x \in E$.]
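$$(E - E) \cap C_2 = \{0\}$$
$$F^{-1}F \cap C_1^\perp \subseteq C_2^\perp,$$
and let $S$ be the cross section for $C_2/C_1$ in the sense that $S \subset C_2$ and $C_2 = \cup_{a \in S}\, C_1 + a$ is a coset decomposition (or partition) of $C_2$ by $C_1$-cosets. Note that
$$S^\perp \triangleq \{\alpha \mid \alpha \in \hat{A},\ \alpha(a) = 1 \text{ for all } a \in S\}$$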
The $x$-th term in the summation on the right side of (7.1.11) is not equal to zero only if $x \in (C_1 + a_1 + a) \cap (C_1 + a_2)$, which implies the existence of $x_1, x_2 \in C_1$ such that
$$x_1 + a_1 + a = x_2 + a_2 \implies a = (x_2 - x_1) + a_2 - a_1. \tag{7.1.12}$$
Now let us consider the case $a_1 = a_2 = b \in S$. Then the left hand side of (7.1.11) is equal to
$$(\#C_1)^{-1} \sum_{x \in A} 1_{C_1 + b + a}(x)\, 1_{C_1 + b}(x)\, \alpha(x). \tag{7.1.13}$$
If the $x$-th term in the right hand side of equation (7.1.15) is not equal to zero, then $(C_2 + b) \cap C_2 \neq \emptyset \implies b \in C_2 \cap (E - E) \implies b = 0$. Thus the right hand side of equation (7.1.15) vanishes whenever $b \neq 0$ for any $\alpha_1, \alpha_2$ in $\tilde{S}$. Let $b = 0$. Then the right hand side of equation (7.1.15) is
$$(\#C_2)^{-1} \sum_{x \in C_2} \overline{\alpha_1(x)}\, \alpha_2(x)\, \beta(x). \tag{7.1.16}$$
If $\alpha_1 = \alpha_2 = \alpha \in \tilde{S}$ this becomes $(\#C_2)^{-1} \sum_{x \in C_2} \beta(x)$ which is independent of $\alpha \in \tilde{S}$. So we consider the case $b = 0$, $\alpha_1 \neq \alpha_2$, $\alpha_1, \alpha_2 \in \tilde{S}$. Then the expression (7.1.16) is not equal to zero only if $\bar{\alpha}_1 \alpha_2 \beta \in C_2^\perp$. This implies $\beta \in C_1^\perp \cap F^{-1}F$. So by hypothesis $\beta$ is in $C_2^\perp$. This implies $\bar{\alpha}_1 \alpha_2 \in C_2^\perp$, i.e., $\alpha_1$ and $\alpha_2$ lie in the same coset of $C_2^\perp$ in $C_1^\perp$. This is impossible. So expression (7.1.16) must be equal to zero. In other words, the Knill-Laflamme conditions are fulfilled. □
$$S_j L U \psi = U R_j U^{-1} L U \psi = U R_j \tilde{L} \psi = \lambda_j(\tilde{L}) U \psi.$$
$$\min_{x, y \in C,\ x \neq y} d(x, y) = d.$$
$$d = \min_{x \neq 0,\ x \in C} w(x), \quad \#C = M,$$
then $C$ is called an $(n, M, d)_A$ group code, and it is denoted by $\langle n, M, d \rangle_A$. If $A$ is the additive group of a finite field $\mathbb{F}_q$ of $q$ elements ($q = p^m$, for some prime $p$) and $C \subset \mathbb{F}_q^n$ is a linear subspace of the $n$-dimensional vector space $\mathbb{F}_q^n$ over $\mathbb{F}_q$ and $d = \min_{x \neq 0} w(x)$, then $C$ is called a linear code over $\mathbb{F}_q$ with minimum distance $d$ and written as $[n, k, d]_q$ code, where $k = \dim C$. When $q = 2$, it is simply called an $[n, k, d]$ code (binary code).
An $\langle n, M, d \rangle_A$ code is $t$-error correcting when $t = \left\lfloor \frac{d-1}{2} \right\rfloor$.
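For a small code the minimum distance and the number of correctable errors can be computed by exhaustive search. The sketch below, with function names chosen here, does this for the binary repetition code $\{000, 111\}$ and for the even-weight code of length 3, and decodes a corrupted word to the nearest codeword.

```python
from itertools import combinations

def hamming_distance(x, y):
    """Number of coordinates in which the words x and y differ."""
    return sum(1 for a, b in zip(x, y) if a != b)

def minimum_distance(code):
    """Smallest distance between two distinct codewords."""
    return min(hamming_distance(x, y) for x, y in combinations(code, 2))

def correctable_errors(code):
    """t = floor((d - 1) / 2) for a code of minimum distance d."""
    return (minimum_distance(code) - 1) // 2

def decode(word, code):
    """Nearest-codeword decoding."""
    return min(code, key=lambda c: hamming_distance(word, c))

repetition = ["000", "111"]
even_weight = ["000", "011", "101", "110"]
```

The repetition code has $d = 3$ and corrects $t = 1$ error, whereas the even-weight code has $d = 2$ and $t = 0$: it can detect a single error but not correct it.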
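$$X = X_1 \otimes X_2 \otimes \cdots \otimes X_n,$$
where $\#\{i \mid X_i \neq I\} \leq t$.
Then we have
$$W_{(a,\alpha)} W_{(b,\beta)} = \alpha(b) W_{(a+b,\alpha\beta)}$$
$$\text{and} \quad \mathrm{Tr}\, W_{(a,\alpha)}^\dagger W_{(b,\beta)} = (\delta_{a,b}\, \delta_{\alpha,\beta}) N.$$
and
$$\left\{ \frac{1}{N^{n/2}} W_{(a,\alpha)} \ \Big|\ (a, \alpha) \in A^n \times \hat{A}^n \right\}$$
is an orthonormal basis for $\mathcal{B}(\mathcal{H}) = \mathcal{B}(\mathcal{G}^{\otimes n})$. Define
Then
$$\{W_{(a,\alpha)} \mid w(a, \alpha) \leq t\}$$
is a linear basis for the subspace $\mathcal{E}_t$.
A subspace $\mathcal{C} \subset \mathcal{G}^{\otimes n}$ is called a quantum code of minimum distance $d$, if $\mathcal{C}$ has an orthonormal basis $\psi_1, \psi_2, \ldots, \psi_k$ satisfying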
7.3 Examples
7.3.1 A generalized Shor code
We begin with a few definitions. Let $A$ be a finite abelian group with binary operation $+$ and identity element $0$. Let $\hat{A}$ denote its character group. Let $\mathcal{H}$ be the Hilbert space $L^2(A)^{\otimes n}$. Let $U_a$ and $V_\alpha$ denote the Weyl operators. Let $C_n \subset A^n$ be a $t$-error correcting ($d(C_n) \geq 2t + 1$) group code of length $n$ with alphabet $A$. Let $D_{n,m} \subset (\hat{C}_n)^m$ be a $t$-error correcting group code with alphabet $\hat{C}_n$ of length $m$.
An element in $D_{n,m}$ is denoted by $\chi$. Sometimes we also denote by $\chi$ the $m$-tuple $\chi_1, \chi_2, \ldots, \chi_m$, where each $\chi_i$ is in $\hat{C}_n$. Define
$$f_\alpha(x) = \begin{cases} (\#C_n)^{-\frac{1}{2}}\, \alpha(x) & \text{if } x \in C_n; \\ 0 & \text{otherwise.} \end{cases}$$
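Proof Let $(a, \alpha) \in A^{mn} \times \hat{A}^{mn}$ such that $w(a, \alpha) \leq 2t$. We have
$$\langle F_\beta | U_a V_\alpha | F_\gamma \rangle = \sum_{x \in A^{mn}} \prod_{j=1}^{m} \overline{f_{\beta_j}(x^{(j)} - a^{(j)})}\, f_{\gamma_j}(x^{(j)})\, \alpha(x). \tag{7.3.2}$$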
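$$C_3 = \{000, 111\}$$
$\hat{C}_3$ has two elements,
$\chi_1(000) = \chi_1(111) = 1$ (identity character) and
$\chi_2(000) = +1$, $\chi_2(111) = -1$.
$$f_{\chi_1} = \frac{1}{\sqrt{2}}\,(|000\rangle + |111\rangle)$$
$$f_{\chi_2} = \frac{1}{\sqrt{2}}\,(|000\rangle - |111\rangle)$$
$$D_{3,3} = \{(\chi_1, \chi_1, \chi_1),\ (\chi_2, \chi_2, \chi_2)\}$$
$$F_{\chi_1\chi_1\chi_1} = f_{\chi_1}^{\otimes 3}$$
$$F_{\chi_2\chi_2\chi_2} = f_{\chi_2}^{\otimes 3}.$$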
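The two code words of this example can be written out explicitly. The following is only a small sketch in Python (the notes themselves contain no code): states are stored as dictionaries from bit strings to real amplitudes, and the helper names `tensor` and `inner` are our own, not notation from the lectures.

```python
from itertools import product
from math import sqrt

def tensor(*states):
    """Tensor product of states stored as {bitstring: amplitude} dictionaries."""
    out = {"": 1.0}
    for state in states:
        out = {k1 + k2: a1 * a2
               for (k1, a1), (k2, a2) in product(out.items(), state.items())}
    return out

def inner(u, v):
    """Inner product <u|v> for states with real amplitudes."""
    return sum(amp * v.get(key, 0.0) for key, amp in u.items())

# f_chi for the two characters of C_3 = {000, 111}
f_chi1 = {"000": 1 / sqrt(2), "111": 1 / sqrt(2)}
f_chi2 = {"000": 1 / sqrt(2), "111": -1 / sqrt(2)}

# Code words F = f (x) f (x) f on nine qubits
F1 = tensor(f_chi1, f_chi1, f_chi1)
F2 = tensor(f_chi2, f_chi2, f_chi2)

print(len(F1), inner(F1, F1), inner(F1, F2))
```

Each code word is supported on eight 9-bit strings, and the two code words are orthonormal, as an orthonormal basis of a two-dimensional code should be.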
[Figures: circuit diagrams not reproduced. Recoverable labels: input $|\psi\rangle$ with ancillas $|0\rangle$ and Hadamard gates $H$; wires $a_0, \ldots, a_4$ with $Z$ and $H$ gates; blocks $C^{(1)}$, $C^{(2)}$, $C^{(3)}$.]
Consider Table 7.1. The $(i, j)$th entry, for $i, j > 1$, is the inner product of the $i$th entry in the first row and the $j$th entry in the first column, computed over the field $\mathbb{F}_2$.
The portion inside the box is the Hadamard $[7, 3, 4]$ simplex code. Let
Table 7.1:
(x1 x2 x3 x1 + x2 x1 + x3 x2 + x3 x1 + x2 + x3 )
x2 + x3 + a x1 + x2 + x3 + ai
y1 y2 y3 i.
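As a quick check of the parameters $[7, 3, 4]$, the encoding $(x_1, x_2, x_3) \mapsto (x_1, x_2, x_3, x_1 + x_2, x_1 + x_3, x_2 + x_3, x_1 + x_2 + x_3)$ can be enumerated directly. The sketch below is ours and assumes that this map, with arithmetic in $\mathbb{F}_2$, is the one intended in the display above.

```python
from itertools import product

def simplex_codeword(x1, x2, x3):
    """Encode (x1, x2, x3) in F_2^3 as a 7-bit word; all sums are taken mod 2."""
    return (x1, x2, x3,
            (x1 + x2) % 2, (x1 + x3) % 2, (x2 + x3) % 2,
            (x1 + x2 + x3) % 2)

codewords = [simplex_codeword(*x) for x in product((0, 1), repeat=3)]

# Hamming weights of the seven nonzero code words
nonzero_weights = sorted(sum(c) for c in codewords if any(c))
print(nonzero_weights)
```

Every nonzero code word has weight 4, so the minimum distance of this linear code is 4.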
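[Figure: circuit diagram not reproduced. Recoverable labels: input $|a\rangle$, state $|\psi_a\rangle$, ancillas $|0\rangle$, Hadamard gates $H$.]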
This shows that the code can be implemented by the circuit shown in
Figure 7.5.
                 a_1              a_2              ...   a_j              ...   a_q
ϕ_0 = 0          0                0                ...   0                ...   0
...              ...              ...              ...   ...              ...   ...
ϕ_i              ϕ_i(a_1)         ϕ_i(a_2)         ...   ϕ_i(a_j)         ...   ϕ_i(a_q)
...              ...              ...              ...   ...              ...   ...
ϕ_{N−1}          ϕ_{N−1}(a_1)     ϕ_{N−1}(a_2)     ...   ϕ_{N−1}(a_j)     ...   ϕ_{N−1}(a_q)
Its zeros are exactly $a_1, a_2, \ldots, a_{t-1}$. Thus, the weight of the corresponding row is $q - t + 1$.
□
Corollary 7.3.5 $B_t$ is a $\lfloor (q-t)/2 \rfloor$-error correcting group code.
If $E_t$ is the Hamming sphere of radius $\lfloor (q-t)/2 \rfloor$ with $(0, \ldots, 0)$ as center in $\mathbb{F}_q^q$ then $(E_t - E_t) \cap B_t = \{0\}$.
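The minimum weight $q - t + 1$ can be confirmed by brute force for small parameters. The sketch below is our own illustration and makes two assumptions: $B_t$ is taken to be the set of rows $(\varphi(a_1), \ldots, \varphi(a_q))$ for polynomials $\varphi$ of degree at most $t - 1$, and $q$ is restricted to a prime so that arithmetic mod $q$ realizes $\mathbb{F}_q$.

```python
from itertools import product

def poly_code(q, t):
    """Evaluation vectors of all polynomials of degree <= t-1 over Z_q (q prime)."""
    code = []
    for coeffs in product(range(q), repeat=t):
        word = tuple(sum(c * pow(a, k, q) for k, c in enumerate(coeffs)) % q
                     for a in range(q))
        code.append(word)
    return code

def min_weight(code):
    """Smallest number of nonzero coordinates among the nonzero code words."""
    return min(sum(1 for v in w if v) for w in code if any(w))

q, t = 5, 2
B = poly_code(q, t)
print(len(B), min_weight(B), q - t + 1)
```

For $q = 5$, $t = 2$ the code has $25$ words and minimum weight $4 = q - t + 1$, so it corrects $\lfloor (q - t)/2 \rfloor = 1$ error.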
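Proposition 7.3.6 Let $\alpha \in B_t^{\perp} \subset (\hat{\mathbb{F}}_q)^q$. If $\alpha \neq 1$, then $w(\alpha) \ge t + 1$.
Thus $B_t^{\perp}$ is a $\lfloor t/2 \rfloor$-error correcting group code. If $F_t$ is the Hamming sphere of radius $\lfloor t/2 \rfloor$ then $F_t^{-1} F_t \cap B_t^{\perp} = \{1\}$.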
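$$\varphi(x) = \sum_{j} c_j\, \frac{(x - b_1)(x - b_2) \cdots \widehat{(x - b_j)} \cdots (x - b_r)}{(b_j - b_1)(b_j - b_2) \cdots \widehat{(b_j - b_j)} \cdots (b_j - b_r)},$$
where the hat indicates that the factor is omitted.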
We can now apply Theorem 7.1.10 and Theorem 7.1.14 to the case $C_1 \subset C_2 \subset A^q$, $A = \mathbb{F}_q$ as an additive group, $C_1 = B_{t'}$, $C_2 = B_t$, $0 < t' < t < q$. Then $B_t = B_{t'} \oplus S$, where $S$ consists of all polynomials of the form
$$s(x) = x^{t'}\,(a_0 + a_1 x + \cdots + a_{t-t'-1}\, x^{t-t'-1}).$$
For any polynomial $\varphi$ consider the state $|\varphi\rangle = |\varphi(a_1)\varphi(a_2) \ldots \varphi(a_q)\rangle$. For any $s \in S$ define
$$\psi_s = q^{-t'/2} \sum_{\varphi \in P(t'-1,\, q)} |s + \varphi\rangle.$$
Then $C_{t,t'} = \operatorname{lin}\{\psi_s \mid s \in S\}$ is a quantum code with $\dim C_{t,t'} = q^{t-t'}$, which can correct $\lfloor (q-t)/2 \rfloor \wedge \lfloor t'/2 \rfloor$ errors.
Lecture 8
Classical Information Theory
Note 8.1.3 If Property 4) is not changed then there can be other functions which satisfy properties 1) to 7). See [1] for other measures of entropy.
$$H(X_1, \ldots, X_n \mid Y) = H(X_1 \mid Y) + H(X_2 \mid Y X_1) + \cdots + H(X_n \mid Y X_1 \cdots X_{n-1}).$$
$$H(X_1, \ldots, X_n \mid Y) = H(X_1 \mid Y) + H(X_2 \mid Y X_1) + \cdots + H(X_n \mid Y X_1 \cdots X_{n-1}).$$
Induction step:
$$w_1 M^{-1} + w_2 M^{-2} + \cdots + w_L M^{-L} \le 1. \tag{8.2.3}$$
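Inequality (8.2.3) is easy to test on a concrete code. In the sketch below (our illustration) we assume that $w_j$ denotes the number of code words of length $j$ and that $M$ is the size of the code alphabet; the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def kraft_sum(codewords, M):
    """Compute w_1 M^-1 + ... + w_L M^-L, with w_j the number of words of length j."""
    counts = Counter(len(c) for c in codewords)
    return sum(Fraction(w, M ** length) for length, w in counts.items())

def is_prefix_free(codewords):
    """True if no code word is a proper prefix of another code word."""
    return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

code = ["0", "10", "110", "111"]          # a binary prefix code
print(is_prefix_free(code), kraft_sum(code, 2))
```

For this code $w_1 = 1$, $w_2 = 1$, $w_3 = 2$, and the sum equals $1$ exactly, so the bound is attained.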
$$\frac{nH(p_1, p_2, \ldots, p_N)}{\log_2 M} \le \sum_a p(a)\,\ell(C(a)) < \frac{nH(p_1, p_2, \ldots, p_N)}{\log_2 M} + 1.$$
This implies
$$\left| \frac{\sum_a p(a)\,\ell(C(a))}{n} - \frac{H(p_1, p_2, \ldots, p_N)}{\log_2 M} \right| < \frac{1}{n}. \tag{8.2.8}$$
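The two-sided bound can be observed numerically. The sketch below is an illustration under our own assumptions: blocks $a$ of length $n$ are drawn i.i.d. from $(p_1, \ldots, p_N)$ and each block receives a code word of length $\lceil -\log_M p(a) \rceil$, the usual Shannon choice; the notes may construct $C$ differently.

```python
from itertools import product
from math import ceil, log2, prod

def shannon_block_stats(p, n, M=2):
    """Return (H, average code length) for blocks of n i.i.d. symbols."""
    H = -sum(pi * log2(pi) for pi in p)
    avg_len = 0.0
    for block in product(p, repeat=n):
        pa = prod(block)                       # probability p(a) of the block
        avg_len += pa * ceil(-log2(pa) / log2(M))
    return H, avg_len

n = 4
H, avg_len = shannon_block_stats([0.7, 0.2, 0.1], n)
print(n * H, avg_len, abs(avg_len / n - H))
```

The average length lies between $nH$ and $nH + 1$ (here $M = 2$), so the per-symbol cost differs from $H$ by less than $1/n$.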
Also $H_2(\mu) \le 2H_1(\mu)$. Thus the sequence $H_1(\mu), H_2(\mu) - H_1(\mu), \ldots, H_n(\mu) - H_{n-1}(\mu), \ldots$ is monotonically decreasing. Since
Lecture 9
Quantum Information Theory
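coming from the Hilbert spaces $H_A$ and $H_B$ respectively. For any operator $X$ on $H$ we define two operators $X^A$ and $X^B$ on $H_A$ and $H_B$ respectively by
$$\langle u | X^A | v \rangle = \sum_j \langle u \otimes f_j | X | v \otimes f_j \rangle \tag{9.2.1}$$
$$\langle u' | X^B | v' \rangle = \sum_i \langle e_i \otimes u' | X | e_i \otimes v' \rangle \tag{9.2.2}$$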
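In matrix terms, (9.2.1) and (9.2.2) are the two partial traces. The following sketch is ours: it assumes $H = H_A \otimes H_B$ with the product basis $e_i \otimes f_j$ ordered so that the pair $(i, j)$ has index $i \cdot d_B + j$, and the function names are hypothetical.

```python
def reduce_to_A(X, dA, dB):
    """X^A of (9.2.1): sum over the basis f_j of H_B."""
    return [[sum(X[u * dB + j][v * dB + j] for j in range(dB)) for v in range(dA)]
            for u in range(dA)]

def reduce_to_B(X, dA, dB):
    """X^B of (9.2.2): sum over the basis e_i of H_A."""
    return [[sum(X[i * dB + u][i * dB + v] for i in range(dA)) for v in range(dB)]
            for u in range(dB)]

# Density matrix of the Bell state (|00> + |11>)/sqrt(2) on two qubits
bell = [[0.5, 0, 0, 0.5],
        [0,   0, 0, 0],
        [0,   0, 0, 0],
        [0.5, 0, 0, 0.5]]

print(reduce_to_A(bell, 2, 2), reduce_to_B(bell, 2, 2))
```

Both marginals of the Bell state are $I/2$, the maximally mixed state of a qubit.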
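Proof Let $\lambda_i^j$ and $|e_i^j\rangle$ be the eigenvalues and corresponding eigenvectors of $\rho_i$. Then $\sum p_i \rho_i$ has eigenvalues $p_i \lambda_i^j$ with respective eigenvectors $|e_i^j\rangle$. Thus,
$$\begin{aligned}
S\Big(\sum_i p_i \rho_i\Big) &= -\sum_{i,j} p_i \lambda_i^j \log p_i \lambda_i^j \\
&= -\sum_i p_i \log p_i - \sum_i p_i \sum_j \lambda_i^j \log \lambda_i^j \\
&= H(p) + \sum_i p_i S(\rho_i).
\end{aligned}$$
□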
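The computation in this proof can be replayed with numbers. The sketch below is our illustration and assumes, as the proof does, that the states $\rho_i$ have mutually orthogonal supports, so that each $\rho_i$ is described by its list of eigenvalues and the mixture has eigenvalues $p_i \lambda_i^j$.

```python
from math import log2

def H(probs):
    """Shannon entropy (base 2) of a list of nonnegative numbers summing to 1."""
    return -sum(x * log2(x) for x in probs if x > 0)

p = [0.3, 0.7]
spectra = [[0.9, 0.1], [0.5, 0.5]]    # eigenvalues of rho_1 and rho_2

# Eigenvalues p_i * lambda_i^j of the mixture sum_i p_i rho_i
mixture_spectrum = [pi * lam for pi, spec in zip(p, spectra) for lam in spec]

lhs = H(mixture_spectrum)
rhs = H(p) + sum(pi * H(spec) for pi, spec in zip(p, spectra))
print(lhs, rhs)
```

The two sides agree up to rounding, as the identity $S(\sum p_i \rho_i) = H(p) + \sum p_i S(\rho_i)$ predicts.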
we obtain
$$|S(\rho) - S(\sigma)| \le \Delta \sum \eta(|r_i - s_i| / \Delta) + \eta(\Delta) \le \Delta \log d + \eta(\Delta).$$
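Proof Let the eigendecompositions of the states $\rho$ and $\sigma$ be given by $\rho = \sum_i p_i |i\rangle\langle i|$, $\sigma = \sum_j q_j |j\rangle\langle j|$. Then we have
$$\begin{aligned}
S(\rho \| \sigma) &= \sum_i p_i \log p_i - \sum_i \langle i | \rho \log \sigma | i \rangle \\
&= \sum_i p_i \log p_i - \sum_{i,j} p_i\, |\langle i | j \rangle|^2 \log q_j.
\end{aligned}$$
Putting $r_i = \sum_j |\langle i | j \rangle|^2 q_j$ and observing that $\sum_i r_i = 1$, we have
$$S(\rho \| \sigma) \ge -\sum_i p_i \log \frac{r_i}{p_i} \ge 0.$$
□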
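The formula used in this proof lends itself to a direct numerical check for qubits. In the sketch below (ours), $\rho$ is diagonal in the computational basis and the eigenbasis of $\sigma$ is rotated by an angle $\theta$, so that $|\langle i | j \rangle|^2$ is $\cos^2\theta$ or $\sin^2\theta$; the parametrization and the function names are assumptions made for illustration, and $q_j > 0$ is required.

```python
from math import cos, sin, log2

def relative_entropy(p, q, overlap):
    """S(rho||sigma) = sum p_i log p_i - sum_{i,j} p_i |<i|j>|^2 log q_j."""
    first = sum(pi * log2(pi) for pi in p if pi > 0)
    second = sum(p[i] * overlap[i][j] * log2(q[j])
                 for i in range(len(p)) for j in range(len(q)) if p[i] > 0)
    return first - second

def rotated_overlap(theta):
    """|<i|j>|^2 when the eigenbasis of sigma is rotated by theta."""
    c2, s2 = cos(theta) ** 2, sin(theta) ** 2
    return [[c2, s2], [s2, c2]]

p, q = [0.8, 0.2], [0.6, 0.4]
values = [relative_entropy(p, q, rotated_overlap(k * 0.1)) for k in range(16)]
print(min(values), relative_entropy(p, p, rotated_overlap(0.0)))
```

All sampled values are nonnegative, and the relative entropy vanishes when $\sigma = \rho$.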
$$\begin{aligned}
S(A) &= -\operatorname{Tr} \rho^A \log \rho^A \\
&= -\operatorname{Tr} \rho^{AB} \log(\rho^A \otimes I_B).
\end{aligned}$$
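Proof
$$\begin{aligned}
0 &\le S(\rho \| \rho') \\
&= \operatorname{Tr} \rho \log \rho - \operatorname{Tr} \rho \log \rho' \\
&= \operatorname{Tr} \rho \log \rho - \operatorname{Tr} \Big(\sum_i P_i\Big) \rho \log \rho' \\
&= \operatorname{Tr} \rho \log \rho - \sum_j \operatorname{Tr} P_j\, \rho\, (\log \rho')\, P_j \\
&= \operatorname{Tr} \rho \log \rho - \sum_j \operatorname{Tr} P_j\, \rho\, P_j\, (\log \rho') \\
&= S(\rho') - S(\rho).
\end{aligned}$$
□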
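For a single qubit the inequality $S(\rho') \ge S(\rho)$ can be seen in closed form. The sketch below is ours and assumes $\rho' = \sum_j P_j \rho P_j$ with $P_0, P_1$ the projections onto the computational basis, so that $\rho'$ is simply the diagonal part of $\rho$.

```python
from math import log2, sqrt

def shannon(probs):
    return -sum(x * log2(x) for x in probs if x > 0)

def qubit_eigenvalues(a, b):
    """Eigenvalues of the density matrix [[a, b], [conj(b), 1 - a]]."""
    r = sqrt((2 * a - 1) ** 2 + 4 * abs(b) ** 2)
    return [(1 + r) / 2, (1 - r) / 2]

a, b = 0.7, 0.3                                # requires a(1 - a) >= |b|^2
S_before = shannon(qubit_eigenvalues(a, b))    # S(rho)
S_after = shannon([a, 1 - a])                  # S(rho'), off-diagonal terms removed
print(S_before, S_after)
```

Removing the off-diagonal entries raises the entropy from about $0.58$ to about $0.88$ bits in this example.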
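By a generalized measurement we mean a set of operators $L_1, \ldots, L_n$ satisfying $\sum_{i=1}^{n} L_i^\dagger L_i = I$. If $\rho$ is a state in which such a generalized measurement is made, the probability of the outcome $i$ is $\operatorname{Tr} \rho L_i^\dagger L_i$ and the post measurement state is $\dfrac{L_i \rho L_i^\dagger}{\operatorname{Tr} \rho L_i^\dagger L_i}$. Thus the post measurement state, ignoring the individual outcome, is
$$\sum_i (\operatorname{Tr} \rho L_i^\dagger L_i)\, \frac{L_i \rho L_i^\dagger}{\operatorname{Tr} \rho L_i^\dagger L_i} = \sum_i L_i \rho L_i^\dagger.$$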
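A two-outcome example makes these formulas concrete. The operators below (an amplitude-damping pair with parameter $g$) are our own choice of illustration and do not come from the notes; the helper functions are likewise hypothetical and work on small matrices stored as nested lists.

```python
from math import sqrt

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def dagger(A):
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

g = 0.36
L = [[[1.0, 0.0], [0.0, sqrt(1 - g)]],     # L_1
     [[0.0, sqrt(g)], [0.0, 0.0]]]         # L_2, so that L_1^† L_1 + L_2^† L_2 = I
rho = [[0.5, 0.5], [0.5, 0.5]]

# L_i rho L_i^†; its trace equals Tr rho L_i^† L_i by cyclicity of the trace
unnormalized = [matmul(matmul(Li, rho), dagger(Li)) for Li in L]
probs = [trace(M) for M in unnormalized]
average_state = [[sum(M[i][j] for M in unnormalized) for j in range(2)] for i in range(2)]
print(probs, average_state)
```

The outcome probabilities are $0.82$ and $0.18$, and the averaged post measurement state $\sum_i L_i \rho L_i^\dagger$ again has trace one.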
Note that $S(AB) = H(p) + \sum p_i S(\rho_i)$, by the joint entropy theorem (Corollary 9.2.5).
$$\rho^A = \sum p_i \rho_i \ \Rightarrow\ S(\rho^A) = S\Big(\sum p_i \rho_i\Big).$$
$$\rho^B = \sum p_i |i\rangle\langle i| \ \Rightarrow\ S(\rho^B) = H(p).$$
By subadditivity we have $S(\rho^A) + S(\rho^B) \ge S(\rho^{AB})$. Substituting we get $S(\sum p_i \rho_i) + H(p) \ge H(p) + \sum p_i S(\rho_i)$.
□
Property 12)
Theorem 9.2.17 $\sum p_i S(\rho_i) \le S(\sum p_i \rho_i) \le H(p) + \sum p_i S(\rho_i)$.
Proof First let us consider the case when $\rho_i = |\psi_i\rangle\langle\psi_i|$ for all $i$. Let the $\rho_i$'s be states in $H_A$ and let $H_B$ be an auxiliary Hilbert space with an orthonormal basis $|i\rangle$ corresponding to the index $i$ of the probabilities $p_i$. Let $\rho^{AB} = |AB\rangle\langle AB|$ where $|AB\rangle = \sum \sqrt{p_i}\, |\psi_i\rangle |i\rangle$. In other words $\rho^{AB} = \sum_{i,j} \sqrt{p_i p_j}\, |\psi_i\rangle\langle\psi_j| \otimes |i\rangle\langle j|$. Since $\rho^{AB}$ is a pure state we have
□
The sub-additivity and the triangle inequality for two quantum systems can be extended to three systems. This gives rise to a very important and useful result, known as strong sub-additivity. The proof given here depends on a deep mathematical result known as Lieb's theorem.
Let $A$, $B$ be bounded operator variables on a Hilbert space $H$. Suppose the pair $(A, B)$ varies in a convex set $C$. A map $f : C \to \mathbb{R}$ is said to be jointly convex if
Property 13)
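Lemma 9.2.23 Let $0 < t < 1$ and let $A$, $B$ be two positive operators such that $A \le B$. Then $A^t \le B^t$.
Proof
$$\begin{aligned}
A \le B &\Rightarrow (\lambda + A)^{-1} \ge (\lambda + B)^{-1} \\
&\Rightarrow \lambda^t (\lambda + A)^{-1} \ge \lambda^t (\lambda + B)^{-1} \\
&\Rightarrow \lambda^{t-1} - \lambda^t (\lambda + A)^{-1} \le \lambda^{t-1} - \lambda^t (\lambda + B)^{-1}.
\end{aligned}$$
Thus by the spectral theorem and Lemma 9.2.21 we have $A^t \le B^t$.
□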
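The restriction $0 < t < 1$ matters, as a $2 \times 2$ example shows. The sketch below is our own illustration with real symmetric matrices; powers are computed from the two eigenvalues, and the matrices $A$, $B$ are a standard textbook pair rather than anything taken from the notes.

```python
from math import sqrt

def sym_eigs(M):
    """Eigenvalues (larger first) of a real symmetric 2x2 matrix."""
    (a, b), (_, d) = M
    mean, rad = (a + d) / 2, sqrt(((a - d) / 2) ** 2 + b * b)
    return mean + rad, mean - rad

def sym_power(M, t):
    """M^t for a positive semidefinite 2x2 matrix, via its spectral projections."""
    l1, l2 = sym_eigs(M)
    f1, f2 = max(l1, 0.0) ** t, max(l2, 0.0) ** t
    if l1 == l2:                                # M is a multiple of the identity
        return [[f1, 0.0], [0.0, f1]]
    return [[(f1 * (M[i][j] - l2 * (i == j)) - f2 * (M[i][j] - l1 * (i == j))) / (l1 - l2)
             for j in range(2)] for i in range(2)]

def diff(B, A):
    return [[B[i][j] - A[i][j] for j in range(2)] for i in range(2)]

def is_psd(M, tol=1e-12):
    return sym_eigs(M)[1] >= -tol

A = [[1.0, 1.0], [1.0, 1.0]]
B = [[2.0, 1.0], [1.0, 1.0]]                    # B - A = diag(1, 0) >= 0
print(is_psd(diff(sym_power(B, 0.5), sym_power(A, 0.5))),
      is_psd(diff(sym_power(B, 2), sym_power(A, 2))))
```

Here $A \le B$ and $A^{1/2} \le B^{1/2}$, in agreement with the lemma, while $B^2 - A^2$ has a negative eigenvalue, so the conclusion fails for $t = 2$.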
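Lemma 9.2.24 Let $A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$ be a strictly positive definite matrix where $A_{11}$ and $A_{22}$ are square matrices. Then $A_{11}$ and $A_{22}$ are also strictly positive definite and
$$\left( \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1} \right)_{11} > A_{11}^{-1}.$$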
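This implies
$$\frac{1}{\beta(1, 1-t)} \int_0^\infty \lambda^{t-1} - \lambda^t (\lambda I + V^\dagger X V)^{-1}\, d\lambda \ \ge\ \frac{1}{\beta(1, 1-t)} \int_0^\infty \lambda^{t-1} - \lambda^t \big(\lambda^{-1}(I - V^\dagger V) + V^\dagger (\lambda + X)^{-1} V\big)\, d\lambda.$$
We look upon $B(H_1)$ and $B(H_2)$ as Hilbert spaces with the scalar product between two operators defined as $\langle X, Y \rangle = \operatorname{Tr} X^\dagger Y$. Define $V : B(H_1) \to B(H_2)$ by $V(X T_1^{1/2}) = \alpha(X)\, T_2^{1/2}$.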
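Proof
$$\begin{aligned}
\|\alpha(X) T_2^{1/2}\|^2 &= \operatorname{Tr} T_2^{1/2} \alpha(X)^\dagger \alpha(X) T_2^{1/2} \\
&\le \operatorname{Tr} \alpha(X^\dagger X) T_2 \le \operatorname{Tr} X^\dagger X T_1 \\
&= \operatorname{Tr} T_1^{1/2} X^\dagger X T_1^{1/2} \\
&= \|X T_1^{1/2}\|^2.
\end{aligned}$$
Hence the assertion is true.
□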
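Assume that $T_1$ and $T_2$ are invertible and put $\Delta_t X = S_1^t X T_1^{-t}$ and $D_t Y = S_2^t Y T_2^{-t}$. Note that $\Delta_t \Delta_s = \Delta_{t+s}$ and $D_t D_s = D_{s+t}$ for $s, t \ge 0$. Furthermore,
$$\begin{aligned}
\langle X T_1^{1/2} | \Delta_t | X T_1^{1/2} \rangle &= \operatorname{Tr} T_1^{1/2} X^\dagger S_1^t X T_1^{1/2 - t} \\
&= \operatorname{Tr} (X^\dagger S_1^t X)\, T_1^{1-t} \\
&\ge 0,
\end{aligned}$$
and similarly $\langle Y T_2^{1/2} | D_t | Y T_2^{1/2} \rangle \ge 0$.
Hence $\Delta_t$ and $D_t$ are positive operator semigroups and in particular $\Delta_t = \Delta_1^t$ and $D_t = D_1^t$.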
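Lemma 9.2.28 $\langle X T_1^{1/2} | \Delta_1 | X T_1^{1/2} \rangle \ge \langle X T_1^{1/2} | V^\dagger D_1 V | X T_1^{1/2} \rangle$.
Proof
$$\begin{aligned}
\langle X T_1^{1/2} | \Delta_1 | X T_1^{1/2} \rangle &= \operatorname{Tr} T_1^{1/2} X^\dagger S_1 X T_1^{-1/2} \\
&= \operatorname{Tr} X^\dagger S_1 X \\
&= \operatorname{Tr} X X^\dagger S_1 \\
&\ge \operatorname{Tr} \alpha(X X^\dagger) S_2 \\
&\ge \operatorname{Tr} \alpha(X) \alpha(X^\dagger) S_2 \\
&= \operatorname{Tr} T_2^{1/2} \alpha(X)^\dagger S_2\, \alpha(X) T_2^{-1/2} \\
&= \langle X T_1^{1/2} | V^\dagger D_1 V | X T_1^{1/2} \rangle.
\end{aligned}$$
□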
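$$\operatorname{Tr}\big[\lambda\rho_1 \log \lambda\rho_1 + (1-\lambda)\rho_2 \log (1-\lambda)\rho_2 - \lambda\rho_1 \log \lambda\sigma_1 - (1-\lambda)\rho_2 \log (1-\lambda)\sigma_2\big] \ \ge\ S\big(\lambda\rho_1 + (1-\lambda)\rho_2 \,\big\|\, \lambda\sigma_1 + (1-\lambda)\sigma_2\big).$$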
Property 15)
But $T(|i\rangle\langle i|) = 0$, as for a pure state $S(AC) = S(B)$ and $S(BC) = S(A)$. This implies $T(\rho^{ABC}) \le 0$. Thus
□
Property 18) Holevo Bound
Consider an information source in which messages $x$ from a finite set $X$ come with probability $p(x)$. We denote this probability distribution by $p$. The information obtained from such a source is given by
$$H(X) = -\sum_{x \in X} p(x) \log_2 p(x).$$
came from the source, or equivalently, the state of the quantum system is the encoded state $\rho_x$, the probability for the measurement value $y$ is given by $\Pr(y \mid x) = \operatorname{Tr} L_y \rho_x L_y^\dagger$. Thus the joint probability $\Pr(x, y)$, that $x$ is the message and $y$ is the measurement outcome, is given by
Since $\rho^{XZ} = \sum p(x)\, |x\rangle\langle x| \otimes \rho_x$ we have from the joint entropy theorem $S(XZ) = H(p) + \sum p(x) S(\rho_x)$. Furthermore
$$\begin{aligned}
\rho^X &= \sum p(x)\, |x\rangle\langle x|, & S(X) &= H(p) = H(X) \\
\rho^Z &= \sum p(x) \rho_x, & S(Z) &= S(\rho^Z)
\end{aligned} \tag{9.2.34}$$
$$S(X : Z) = S\Big(\sum p(x) \rho_x\Big) - \sum p(x) S(\rho_x).$$
Thus,
$$S(X' : Y') = H(X) + H(Y) - H(XY). \tag{9.2.35}$$
Combining (9.2.33), (9.2.34) and (9.2.35) we get the required result.
□
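A small example shows how the classical information $H(X) + H(Y) - H(XY)$ compares with the quantity $S(\sum p(x)\rho_x) - \sum p(x) S(\rho_x)$. The ensemble and the measurement below are our own choices for illustration: two equiprobable qubit states $|0\rangle\langle 0|$ and $|+\rangle\langle +|$, measured with $L_y = |y\rangle\langle y|$ in the computational basis.

```python
from math import log2, sqrt

def shannon(probs):
    return -sum(x * log2(x) for x in probs if x > 1e-15)

def qubit_entropy(rho):
    """von Neumann entropy of a 2x2 density matrix via its two eigenvalues."""
    a, b, d = rho[0][0], rho[0][1], rho[1][1]
    rad = sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    return shannon([(a + d) / 2 + rad, (a + d) / 2 - rad])

p = [0.5, 0.5]
states = [[[1.0, 0.0], [0.0, 0.0]],      # rho_0 = |0><0|
          [[0.5, 0.5], [0.5, 0.5]]]      # rho_1 = |+><+|

# S(sum p(x) rho_x) - sum p(x) S(rho_x)
avg = [[sum(p[x] * states[x][i][j] for x in range(2)) for j in range(2)] for i in range(2)]
chi = qubit_entropy(avg) - sum(p[x] * qubit_entropy(states[x]) for x in range(2))

# With L_y = |y><y| we get Pr(y | x) = <y| rho_x |y>
joint = [[p[x] * states[x][y][y] for y in range(2)] for x in range(2)]
q = [joint[0][y] + joint[1][y] for y in range(2)]
info = shannon(p) + shannon(q) - shannon([v for row in joint for v in row])
print(info, chi)
```

The measurement extracts about $0.31$ bits, below the bound of about $0.60$ bits, as the Holevo inequality requires.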
Property 19) Schumacher’s theorem
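Let $p$ be a probability distribution on a finite set $X$. For $\varepsilon > 0$ define
Or equivalently,
$$\lim_{n \to \infty} \frac{\log \nu(p^{\otimes n}, \varepsilon)}{n} = H(p) \quad \text{for all } \varepsilon > 0 \tag{9.2.36}$$
$$\lim_{n \to \infty} \frac{\log \nu(\rho^{\otimes n}, \varepsilon)}{n} = S(\rho) \tag{9.2.38}$$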
where the $x_i$'s vary in $X$ and $|x\rangle$ denotes the product vector $|x_1\rangle |x_2\rangle \ldots |x_n\rangle$. Write $p_n(x) = p(x_1) \ldots p(x_n)$ and observe that $p^{\otimes n} = \{p_n(x),\ x \in X^{\otimes n}\}$ is the probability distribution of $n$ i.i.d. copies of $p$. We have $S(\rho) = -\sum_x p(x) \log p(x) = H(p)$. From the strong law of large numbers for i.i.d. random variables it follows that
$$\lim_{n \to \infty} -\frac{1}{n} \log p(x_1) p(x_2) \ldots p(x_n) = \lim_{n \to \infty} -\frac{1}{n} \sum_{i=1}^{n} \log p(x_i) = S(\rho)$$
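The concentration expressed by this limit can be made visible by counting typical sequences. The sketch below is ours and assumes a binary source with $p(1) = p_1$ and the typical set $T(n, \varepsilon) = \{x : |-\tfrac{1}{n} \log_2 p_n(x) - H(p)| \le \varepsilon\}$; since $p_n(x)$ depends only on the number $k$ of ones in $x$, the count reduces to a sum of binomial coefficients.

```python
from math import comb, log2

def typical_set(p1, n, eps):
    """Return (H(p), size of T(n, eps), probability of T(n, eps))."""
    p0 = 1 - p1
    H = -(p0 * log2(p0) + p1 * log2(p1))
    size, prob = 0, 0.0
    for k in range(n + 1):                      # k = number of ones in the sequence
        rate = -(k * log2(p1) + (n - k) * log2(p0)) / n
        if abs(rate - H) <= eps:
            size += comb(n, k)
            prob += comb(n, k) * p1 ** k * p0 ** (n - k)
    return H, size, prob

n, eps = 200, 0.1
H, size, prob = typical_set(0.2, n, eps)
print(H, log2(size) / n, prob)
```

For $n = 200$ the typical set already carries most of the probability, while $\tfrac{1}{n} \log_2 \#T(n, \varepsilon)$ stays within $\varepsilon$ of $H(p)$ from above.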
and note that $\dim E(n, \varepsilon) = \#T(n, \varepsilon)$. Summing over $x \in T(n, \varepsilon)$ in (9.2.41) we conclude that
and therefore by (9.2.40) and the fact that probabilities never exceed 1, we get
$2^{n(S(\rho) - \varepsilon)}$ (1 − Acn ) $\le \dim E(n, \varepsilon) \le 2^{n(S(\rho) + \varepsilon)}$
for all $\varepsilon > 0$, $n = 1, 2, \ldots$. In particular
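Letting $n \to \infty$ we get
$$\limsup_{n \to \infty} \frac{\log \nu(\rho^{\otimes n}, \delta)}{n} \le S(\rho) + \varepsilon.$$
Since $\varepsilon$ is arbitrary we get
$$\limsup_{n \to \infty} \frac{\log \nu(\rho^{\otimes n}, \delta)}{n} \le S(\rho).$$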
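Now we shall arrive at a contradiction by assuming that
$$\liminf_{n \to \infty} \frac{\log \nu(\rho^{\otimes n}, \delta)}{n} < S(\rho).$$
Under such a hypothesis there would exist an $\eta > 0$ such that
$$\frac{\log \nu(\rho^{\otimes n}, \delta)}{n} \le S(\rho) - \eta$$
for infinitely many $n$, say $n = n_1, n_2, \ldots$ where $n_1 < n_2 < \cdots$. In such a case there exists a projection $F_{n_j}$ in $H^{\otimes n_j}$ such that
$$\begin{aligned}
1 - \delta &\le \operatorname{Tr} \rho^{\otimes n_j} F_{n_j} \\
&= \operatorname{Tr} \rho^{\otimes n_j} E(n_j, \varepsilon) F_{n_j} + \operatorname{Tr} \rho^{\otimes n_j} (I - E(n_j, \varepsilon)) F_{n_j}.
\end{aligned} \tag{9.2.44}$$
From (9.2.40) and the fact that $\rho^{\otimes n}$ and $E(n, \varepsilon)$ commute with each other we have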
where the supremum is taken over all input distributions $p$. For a fixed input distribution $p$, put
$$\sigma_p^2 = \sum_{x \in A,\, y \in B} \Pr(x, y) \left\{ \log \frac{\Pr(x, y)}{p(x) q(y)} - H_p(A : B) \right\}^2 \tag{9.2.48}$$
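Both $H_p(A : B)$ and $\sigma_p^2$ are straightforward to evaluate for a given channel. The sketch below is our own and assumes that $H_p(A : B)$ is the mean of $\log \frac{\Pr(x, y)}{p(x) q(y)}$ under $\Pr(x, y) = p(x) p_x(y)$, with $q$ the induced output distribution; the example channel is a binary symmetric channel with flip probability $0.1$, chosen only for illustration.

```python
from math import log2

def information_stats(p, channel):
    """Return (H_p(A:B), sigma_p^2) for input law p[x] and channel[x][y] = p_x(y)."""
    nA, nB = len(p), len(channel[0])
    q = [sum(p[x] * channel[x][y] for x in range(nA)) for y in range(nB)]
    # Pairs (Pr(x, y), log Pr(x, y) / (p(x) q(y))) over the support of Pr
    terms = [(p[x] * channel[x][y], log2(channel[x][y] / q[y]))
             for x in range(nA) for y in range(nB) if p[x] * channel[x][y] > 0]
    mutual = sum(w * xi for w, xi in terms)
    variance = sum(w * (xi - mutual) ** 2 for w, xi in terms)
    return mutual, variance

bsc = [[0.9, 0.1], [0.1, 0.9]]
mutual, variance = information_stats([0.5, 0.5], bsc)
print(mutual, variance)
```

With uniform input the mutual information is $1 - H(0.1, 0.9) \approx 0.531$ bits, and the variance is about $0.90$.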
Lemma 9.2.49 Let $\eta > 0$, $\delta > 0$ be positive constants and let $p$ be any input distribution on $A$. Then there exists a code of size $N$ and error probability $\le \eta$ where
$$N \ge \left( \eta - \frac{\sigma_p^2}{\delta^2} \right) 2^{H_p(A:B) - \delta}.$$
$$\xi(x, y) = \log \frac{\Pr(x, y)}{p(x) q(y)}.$$
$$\Pr(V; P) \ge 1 - \frac{\sigma_p^2}{\delta^2}. \tag{9.2.51}$$
Define $V_x = \{y \mid (x, y) \in V\}$. Then (9.2.51) can be expressed as
$$\sum_{x \in A} p(x)\, p_x(V_x) \ge 1 - \frac{\sigma_p^2}{\delta^2}. \tag{9.2.52}$$
This shows that for a $p$-large set of $x$'s the conditional probabilities $p_x(V_x)$ must be large. When $(x, y) \in V$ we have from (9.2.50)
$$R - \delta \le \log \frac{\Pr(x, y)}{p(x) q(y)} \le R + \delta$$
or equivalently
$$q(y)\, 2^{R - \delta} \le p_x(y) \le q(y)\, 2^{R + \delta}.$$
Summing over $y \in V_x$ we get
In particular,
In other words the $V_x$'s are $q$-small. Now choose $x_1$ in $A$ such that $p_{x_1}(V_{x_1}) \ge 1 - \eta$ and set $V_1 = V_{x_1}$. Then choose $x_2$ such that $p_{x_2}(V_{x_2} \cap V_1') > 1 - \eta$, where the prime $'$ denotes complement in $B$. Put $V_2 = V_{x_2} \cap V_1'$. Continue this procedure till we have an $x_N$ such that
$$p_x\big(V_x \cap (\cup_{j=1}^{N} V_j)'\big) \le 1 - \eta$$
$$p_x\big(V_x \cap (\cup_{j=1}^{N} V_j)'\big) \le 1 - \eta \quad \text{for all } x \in A. \tag{9.2.54}$$
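$$\begin{aligned}
1 - \frac{\sigma_p^2}{\delta^2} &\le \sum_x p(x)\, p_x(V_x) \\
&= \sum_x p(x)\, p_x\big(V_x \cap (\cup_{i=1}^{N} V_i)'\big) + \sum_x p(x)\, p_x\big(V_x \cap (\cup_{i=1}^{N} V_i)\big) \\
&\le 1 - \eta + \sum_x p(x)\, p_x\big(V_x \cap (\cup_{i=1}^{N} V_i)\big) \\
&= 1 - \eta + q(\cup_{i=1}^{N} V_i) \\
&\le 1 - \eta + \sum_{i=1}^{N} q(V_i) \\
&\le 1 - \eta + \sum_{i=1}^{N} q(V_{x_i}) \\
&\le 1 - \eta + N\, 2^{-(R - \delta)}.
\end{aligned}$$
Thus $N \ge \left( \eta - \dfrac{\sigma_p^2}{\delta^2} \right) 2^{(R - \delta)}$.
□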
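The selection procedure used in this proof is a greedy algorithm and can be run on a small channel. The sketch below is our own rendering under stated assumptions: $R$ is taken to be $H_p(A : B)$, $V$ is the set of pairs with $|\xi(x, y) - R| \le \delta$, the strict inequality $> 1 - \eta$ is used at every step, and $0 < \eta < 1$; the function name `greedy_code` and the example channel are hypothetical.

```python
from math import log2

def greedy_code(p, channel, delta, eta):
    """Return [(x_k, V_k)]: code words with pairwise disjoint decoding sets."""
    nA, nB = len(p), len(channel[0])
    q = [sum(p[x] * channel[x][y] for x in range(nA)) for y in range(nB)]

    def xi(x, y):
        return log2(channel[x][y] / q[y])

    pairs = [(x, y) for x in range(nA) for y in range(nB) if p[x] * channel[x][y] > 0]
    R = sum(p[x] * channel[x][y] * xi(x, y) for x, y in pairs)
    # V_x = {y : (x, y) in V}, where V = {(x, y) : |xi(x, y) - R| <= delta}
    V = [{y for (x2, y) in pairs if x2 == x and abs(xi(x, y) - R) <= delta}
         for x in range(nA)]

    used, code = set(), []
    while True:
        # Look for an x whose V_x, outside the sets already used, has mass > 1 - eta
        pick = next((x for x in range(nA)
                     if sum(channel[x][y] for y in V[x] - used) > 1 - eta), None)
        if pick is None:
            return code
        decoding_set = V[pick] - used
        code.append((pick, decoding_set))
        used |= decoding_set

# A 4-ary channel that transmits each letter correctly with probability 0.97
channel = [[0.97 if x == y else 0.01 for y in range(4)] for x in range(4)]
code = greedy_code([0.25] * 4, channel, delta=0.5, eta=0.1)
print(code)
```

In this example the procedure selects all four letters with decoding sets $\{0\}, \{1\}, \{2\}, \{3\}$, each received correctly with probability $0.97 > 1 - \eta$.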
Now we consider the $n$-fold product $C^{(n)}$ of the channel $C$ with input alphabet $A^n$, output alphabet $B^n$ and transition probability $\{p_x^{(n)}(V),\ x \in A^n,\ V \subset B^n\}$ where for $x = (x_1, x_2, \ldots, x_n)$, $y = (y_1, y_2, \ldots, y_n)$,
$$p_x^{(n)}(\{y\}) = \prod_{i=1}^{n} p_{x_i}(\{y_i\}).$$
and $H_{p^{(n)}}(A^n : B^n) = n H_p(A : B)$, $\sigma_{p^{(n)}}^2 = n \sigma_p^2$ where $\sigma_p^2$ is given by (9.2.48). Choose $\eta > 0$, $\delta = n\varepsilon$ and apply Lemma 9.2.49 to the product channel. Then it follows that there exists a code of size $N$ and error probability $\le \eta$ with
$$N \ge \left( \eta - \frac{n \sigma_p^2}{n^2 \varepsilon^2} \right) 2^{n(H_p(A:B) - \varepsilon)} = \left( \eta - \frac{\sigma_p^2}{n \varepsilon^2} \right) 2^{n(H_p(A:B) - \varepsilon)}.$$
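Thus
$$\frac{1}{n} \log \nu(C^{(n)}, \eta) \ge \frac{1}{n} \log \left( \eta - \frac{\sigma_p^2}{n \varepsilon^2} \right) + H_p(A : B) - \varepsilon.$$
In other words
$$\liminf_{n \to \infty} \frac{1}{n} \log \nu(C^{(n)}, \eta) \ge H_p(A : B) - \varepsilon.$$
Here the positive constant $\varepsilon$ and the initial distribution $p$ on the input alphabet $A$ are arbitrary. Hence we conclude that
$$\liminf_{n \to \infty} \frac{1}{n} \log \nu(C^{(n)}, \eta) \ge C.$$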
$$\limsup_{n \to \infty} \frac{1}{n} \log \nu(C^{(n)}, \eta) \le C.$$
The proof of this assertion is long and delicate and we refer the reader to [16]. We summarize our discussions in the form of a theorem.
$$\lim_{n \to \infty} \frac{1}{n} \log \nu(C^{(n)}, \eta) = C \quad \text{for all } 0 < \eta < 1.$$
$$R(\rho') = \sum_{j=1}^{\ell} M_j \rho' M_j^\dagger \quad \text{for any state } \rho' \text{ on } H_B$$
1. $M_1, \ldots, M_\ell$ are operators from $H_A$ to $H_B$ satisfying $\sum_{j=1}^{\ell} M_j^\dagger M_j = I_B$;
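Now define
We may call $\nu(E, \varepsilon)$ the maximal size possible for a quantum code of error not exceeding $\varepsilon$. As in the case of classical channels one would like to estimate $\nu(E, \varepsilon)$.
If $n > 1$ is any integer define the $n$-fold product $E^{\otimes n}$ of the operation $E$ by
$$E^{\otimes n}(\rho) = \sum_{i_1, i_2, \ldots, i_n} (L_{i_1} \otimes L_{i_2} \otimes \cdots \otimes L_{i_n})\, \rho\, (L_{i_1}^\dagger \otimes L_{i_2}^\dagger \otimes \cdots \otimes L_{i_n}^\dagger)$$
for any state $\rho$ on $H_A^{\otimes n}$, where the $L_i$'s are as in (9.2.57). It is an interesting problem to analyze the asymptotic behavior of the sequence $\left\{ \frac{1}{n} \log \nu(E^{\otimes n}, \varepsilon) \right\}$ as $n \to \infty$.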
Bibliography
[1] J. Aczel and Z. Daroczy, On Measures of Information and Their
Characterizations, Academic Pub., New York, 1975.