
Lecture 1: Quantum Mechanics and Information Theory

Y. Piétri

September 18, 2024

In this lecture, we will introduce the basic tools that are required for this course. In
particular, we will start by reviewing the basic tools in quantum mechanics, before moving
to considerations on classical and quantum information theory.

1 Quantum mechanics
This first section is freely based on chapters 2 and 3 of [1] and chapters 1, 2, 4 and 5 of [2].

1.1 States and Hilbert space


The state of an isolated quantum system can be described by a normalised vector in a
Hilbert space, which is defined below.

Definition 1 (Hilbert space). A Hilbert space H is a vector space over the complex
numbers C equipped with a scalar product:

⟨·|·⟩ : H × H → C (1)
with the following properties:
1. Positivity: ⟨ψ|ψ⟩ ≥ 0 for all |ψ⟩ ∈ H , with equality if and only if |ψ⟩ = 0;
2. Linearity: ⟨ϕ| (α |ψ1⟩ + β |ψ2⟩) = α ⟨ϕ|ψ1⟩ + β ⟨ϕ|ψ2⟩ for all |ϕ⟩, |ψ1⟩, |ψ2⟩ ∈ H
and α, β ∈ C;
3. Conjugate symmetry: ⟨ϕ|ψ⟩∗ = ⟨ψ|ϕ⟩ for all |ψ⟩ , |ϕ⟩ ∈ H .

We implicitly defined the bra-ket notations:

Definition 2 (Bra-ket notations). A state, represented by a complex vector, is noted as a ket |ψ⟩ ∈ H. For each vector |ψ⟩, there exists a unique linear map

⟨ψ| · : H → C (2)

such that, for each |ϕ⟩ ∈ H, ⟨ψ|ϕ⟩ is the scalar product. This map, called a bra and noted ⟨ψ|, is a vector in the dual Hilbert space H∗.

The kets can be expressed, as any vectors, in a basis representation. From now on, for simplicity, we will work with vectors in a finite-dimensional Hilbert space (with dimension d = dim(H)).

Definition 3 (Basis). An orthonormal basis of the Hilbert space H is a set of vectors


{|ei ⟩}1≤i≤d , such that for any 1 ≤ i, j ≤ d

⟨ei |ej ⟩ = δij (3)

and such that any state |ψ⟩ can be written in the unique following way:

|ψ⟩ = ∑_{i=1}^{d} ⟨ei|ψ⟩ |ei⟩ (4)

In the previous definition, δij refers to the Kronecker delta symbol, which is 0 except for i = j, where it takes the value 1.
For instance, if |ψ⟩ = ∑_i ai |ei⟩, then the basis representation is

|ψ⟩ = (a1, a2, …, ad)ᵀ (5)

and the inner product of two vectors is then given by

|ψ⟩ = (a1, a2, …, ad)ᵀ, |ϕ⟩ = (b1, b2, …, bd)ᵀ, ⟨ϕ|ψ⟩ = ∑_{i=1}^{d} bi∗ ai (6)

The inner product is independent of the basis.

Note. From eq. (6) it is evident that, given a ket |ψ⟩, the corresponding bra ⟨ψ| is
given by

⟨ψ| = (|ψ⟩)† , (7)

where † ("dagger") is the conjugate-transpose operation.

The inner product defines a norm:

Definition 4 (Norm of a vector). The norm of a ket vector |ψ⟩ = ∑_i ai |ei⟩ is defined as

|| |ψ⟩ || = √⟨ψ|ψ⟩ = √(∑_{i=1}^{d} ai∗ ai) = √(∑_{i=1}^{d} |ai|²) ≥ 0 (8)

with the norm vanishing if and only if |ψ⟩ = 0.

Quantum states are represented by normalised vectors, i.e. || |ψ⟩ || = 1.
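As a quick numerical illustration (a minimal numpy sketch, not part of the original notes; the vectors are arbitrary), eqs. (6) and (8) translate directly into code:

```python
# Inner products and norms in a finite-dimensional Hilbert space.
import numpy as np

d = 3
rng = np.random.default_rng(0)

# Random complex vectors standing in for |psi> and |phi>.
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
phi = rng.normal(size=d) + 1j * rng.normal(size=d)

# <phi|psi> = sum_i b_i* a_i, as in eq. (6); np.vdot conjugates its
# first argument, which implements the bra.
inner = np.vdot(phi, psi)

# Norm from eq. (8), then normalisation so psi is a valid state.
psi = psi / np.sqrt(np.vdot(psi, psi).real)
print(np.vdot(psi, psi).real)   # ~1.0
```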

Related exercises (TD 1). See exercise 1.

1.2 Composite systems


If the state of a system A is in HA and the state of a system B is in HB, then the state of the composite system AB is in the tensor product HA ⊗ HB, which has dimension dim(HA) × dim(HB). If system A is in state |ψ⟩A and B in state |ϕ⟩B, the state of the composite system is |ψ⟩A ⊗ |ϕ⟩B.
We give an example of the tensor product with two-dimensional vectors:

|ψ⟩A = (a, b)ᵀ, |ϕ⟩B = (c, d)ᵀ, |ψ⟩A ⊗ |ϕ⟩B = (ac, ad, bc, bd)ᵀ (9)

We usually simplify the notation as

|ψ⟩A ⊗ |ϕ⟩B = |ψ⟩ ⊗ |ϕ⟩ = |ψ⟩ |ϕ⟩ = |ψϕ⟩ = |ψϕ⟩AB (10)

Proposition 1 (Properties of the tensor product). The tensor product is


1. distributive: |ψ1 ⟩ ⊗ (|ψ2 ⟩ + |ψ3 ⟩) = |ψ1 ⟩ ⊗ |ψ2 ⟩ + |ψ1 ⟩ ⊗ |ψ3 ⟩;
2. associative: |ψ1 ⟩ ⊗ (|ψ2 ⟩ ⊗ |ψ3 ⟩) = (|ψ1 ⟩ ⊗ |ψ2 ⟩) ⊗ |ψ3 ⟩.
It is not, in general, commutative.
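For coordinate vectors the tensor product is the Kronecker product, so eq. (9) can be checked directly (an illustrative numpy sketch, not from the notes; the entries are arbitrary):

```python
# Tensor product of two kets as in eq. (9).
import numpy as np

psi_A = np.array([1, 2])   # (a, b)^T
phi_B = np.array([3, 4])   # (c, d)^T

# (a, b) ⊗ (c, d) = (ac, ad, bc, bd)
print(np.kron(psi_A, phi_B))   # [3 4 6 8]

# Non-commutativity: swapping the factors reorders the entries.
print(np.kron(phi_B, psi_A))   # [3 6 4 8]
```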

1.3 Operators
We now define the notion of operators, which act on quantum states to transform them into other quantum states.

Definition 5 (Linear operators). A linear operator between Hilbert space H and H ′


is a function Â : H → H′ such that

Â(∑_i ai |ψi⟩) = ∑_i ai Â(|ψi⟩) (11)

Given two states |ψ⟩ , |ϕ⟩ ∈ H it is possible to build a particular operator, the outer
product |ψ⟩ ⟨ϕ|, which is defined by its action on a state |ξ⟩ ∈ H as:

(|ψ⟩ ⟨ϕ|)(|ξ⟩) = ⟨ϕ|ξ⟩ |ψ⟩ ∈ H (12)

Definition 6 (Adjoint). For an operator Â, its adjoint, denoted Â†, is defined as the unique operator such that

(Â† |ϕ⟩, |ψ⟩) = (|ϕ⟩, Â |ψ⟩) (13)

where, in this particular context, (·, ·) is the inner product.

Proposition 2. In matrix form, the adjoint of Â is given by the conjugate transpose of the matrix of Â.

Definition 7 (Hermitian and unitary operators). An operator Â is Hermitian if

Â† = Â (14)

An operator Â is unitary if

ÂÂ† = Â†Â = I (15)

where I is the identity matrix.

We define the three Pauli matrices:

σ̂X = [[0, 1], [1, 0]], σ̂Y = [[0, −i], [i, 0]], σ̂Z = [[1, 0], [0, −1]] (16)

which are operators acting on a 2-dimensional space (i.e. a qubit space); here each matrix is written as its list of rows. The Pauli matrices are Hermitian, unitary, and, together with the identity matrix, form a basis of the space of 2 × 2 operators.
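These properties are easy to verify numerically; the following sketch (not part of the notes) checks Hermiticity and unitarity of eq. (16):

```python
# Check that the Pauli matrices are Hermitian and unitary.
import numpy as np

I = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

for P in (X, Y, Z):
    assert np.allclose(P, P.conj().T)        # Hermitian: P† = P
    assert np.allclose(P @ P.conj().T, I)    # unitary: P P† = I
print("all checks passed")
```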

Definition 8 (Projectors). Let H be a Hilbert space and H ′ be a subspace of H


with orthonormal basis {|ei⟩}i. The projector of H onto H′ is the operator

P̂_H′ = ∑_i |ei⟩⟨ei| (17)

For any projector P̂ , P̂ 2 = P̂ and its eigenvalues are either 0 or 1.

Related exercises (TD 1). See exercise 2.

1.4 Trace
The trace is the function from the operator space to C that, given a basis eii , is defined
as the sum of the diagonal elements:

Tr(Â) = ∑_i ⟨ei|Â|ei⟩ (18)

It is:
1. Linear: for operators Â, B̂, Tr(Â + B̂) = Tr(Â) + Tr(B̂), and for λ ∈ C, Tr(λÂ) = λ Tr(Â);
2. Cyclic: for operators Â, B̂, Ĉ, Tr(ÂB̂Ĉ) = Tr(B̂ĈÂ) = Tr(ĈÂB̂);
3. Basis independent.
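A quick sanity check of the first two properties (an illustrative numpy sketch with random matrices):

```python
# Linearity and cyclicity of the trace, eq. (18).
import numpy as np

rng = np.random.default_rng(1)
A, B, C = (rng.normal(size=(3, 3)) for _ in range(3))

assert np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))  # linear
assert np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))    # cyclic
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))
```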

1.5 Qubits
The simplest quantum system is a two-dimensional system called a qubit, where we can define a basis {|0⟩, |1⟩} composed of the eigenvectors of the σ̂Z Pauli matrix, defining by convention the canonical basis

|0⟩ = (1, 0)ᵀ, |1⟩ = (0, 1)ᵀ (19)

A general qubit state then takes the form

|ψ⟩ = α |0⟩ + β |1⟩ = (α, β)ᵀ (20)

with α, β ∈ C, and the normalisation condition |α|2 + |β|2 = 1.


The eigenvectors of the σ̂X Pauli matrix are given by

|+⟩ = (|0⟩ + |1⟩)/√2 = (1/√2)(1, 1)ᵀ, |−⟩ = (|0⟩ − |1⟩)/√2 = (1/√2)(1, −1)ᵀ (21)

{|+⟩ , |−⟩} also defines a basis, that is related to the σ̂Z basis by the Hadamard operator

Ĥ |0⟩ = |+⟩, Ĥ |1⟩ = |−⟩, Ĥ |+⟩ = |0⟩, Ĥ |−⟩ = |1⟩ (22)

with

Ĥ = (1/√2) [[1, 1], [1, −1]] (23)

Due to the normalisation condition (and ignoring the irrelevant global phase), a qubit can be defined by two angles θ and φ such that

|ψ⟩ = cos(θ/2) |0⟩ + sin(θ/2) e^{iφ} |1⟩ (24)

usually called the Bloch sphere representation, as shown in Fig. 1.

Figure 1: Bloch sphere representation of a single qubit.
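A small sketch (not from the notes) of eq. (24), building a qubit state from its Bloch angles:

```python
# Qubit state from Bloch sphere angles theta and phi, eq. (24).
import numpy as np

def bloch_state(theta: float, phi: float) -> np.ndarray:
    """Return cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>."""
    return np.array([np.cos(theta / 2),
                     np.exp(1j * phi) * np.sin(theta / 2)])

# theta = pi/2, phi = 0 lands on |+> = (|0> + |1>)/sqrt(2), eq. (21).
plus = bloch_state(np.pi / 2, 0.0)
print(np.allclose(plus, np.array([1, 1]) / np.sqrt(2)))   # True
```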

1.6 Density matrices


It usually happens, for instance when preparing a state, that we do not hold a perfect single state, but a statistical mixture of an ensemble of states {|ψi⟩}, with associated probabilities pi. The state is then said to be mixed, and is described by its density operator, or density matrix:

ρ̂ = ∑_i pi |ψi⟩⟨ψi| (25)

Definition 9 (Density matrices). A density matrix (or density operator) ρ̂ is an oper-


ator from H to H such that:
1. it is normalised Tr(ρ̂) = 1;
2. it is Hermitian ρ̂† = ρ̂;
3. it is positive semi-definite: for all |ψ⟩, ⟨ψ|ρ̂|ψ⟩ ≥ 0, which we write ρ̂ ≥ 0.

Note that the operator in eq. (25) satisfies these conditions.


The density matrix of a pure state |ψ⟩ is

ρ̂ = |ψ⟩ ⟨ψ| (26)

and ρ̂2 = |ψ⟩ ⟨ψ|ψ⟩ ⟨ψ| = |ψ⟩ ⟨ψ| = ρ̂.


The spectral theorem states that ρ̂ can be written as

ρ̂ = ∑_i λi |φi⟩⟨φi| (27)

with |φi⟩ the eigenvectors of ρ̂, forming an orthonormal basis of the space, and 0 ≤ λi ≤ 1 its eigenvalues. We hence see from this that if ρ̂ is mixed, then Tr(ρ̂²) = ∑_i λi² < 1.

We hence define the purity of a state:

Definition 10 (Purity). The purity of a density operator ρ̂ is defined as

P(ρ̂) = Tr(ρ̂²) = ∑_i λi² (28)

A state is pure if and only if it has purity one, otherwise it is mixed.


The state with density operator ρ̂ = I/d, where d is the dimension of the space, is called
the maximally mixed state.
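The purity of eq. (28) distinguishes these cases numerically (a minimal numpy sketch; the two states are the standard examples):

```python
# Purity Tr(rho^2) for a pure state and for the maximally mixed qubit.
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
rho_pure = np.outer(ket0, ket0.conj())   # |0><0|
rho_mixed = np.eye(2) / 2                # I/2, maximally mixed

purity = lambda rho: np.trace(rho @ rho).real
print(purity(rho_pure))    # 1.0  -> pure
print(purity(rho_mixed))   # 0.5  -> mixed (the minimum, 1/d, for d = 2)
```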

Related exercises (TD 1). See exercise 3.

1.7 Measurements
In general, a measurement is described by an ensemble {Êx} of operators such that ∑_x Êx† Êx = I, where the x's refer to the measurement outcomes.
The probability of measuring x on a system characterised by the quantum state |ψ⟩ is,
according to the postulates of quantum mechanics,

p(x) = ⟨ψ|Êx† Êx |ψ⟩ (29)

and the post-measurement state

|ψ′⟩ = Êx |ψ⟩ / √p(x) (30)

For a state described by a density matrix ρ̂,

p(x) = Tr(Êx ρ̂ Êx†), ρ̂x = Êx ρ̂ Êx† / p(x) (31)

A particularly important class of measurements are the projective measurements, where the Êx are projectors P̂x. In this case, since projectors are Hermitian and square to themselves, the probability is simply p(x) = ⟨ψ|P̂x|ψ⟩, which means that the probability of projecting a state is the inner product of the initial state and the projected state.
For instance, the σ̂Z measurement is defined by the projectors on the eigenspaces of σ̂Z, P̂0 = |0⟩⟨0| and P̂1 = |1⟩⟨1|, and the associated probabilities, for a qubit |ψ⟩ = α |0⟩ + β |1⟩, are:

p(0) = ⟨ψ| P̂0 |ψ⟩ = ⟨ψ| (|0⟩⟨0|) |ψ⟩ = |⟨ψ|0⟩|² = |α|², p(1) = |β|² (32)

We also have

|0⟩⟨0| + |1⟩⟨1| = [[1, 0], [0, 0]] + [[0, 0], [0, 1]] = I (33)

A general measurement is described by a POVM:

Definition 11 (POVM). A Positive Operator-Valued Measurement (POVM) over


some Hilbert space H is defined as a set of operators {M̂i }i for 1 ≤ i ≤ n such
that
1. For all 1 ≤ i ≤ n, M̂i is Hermitian: M̂i† = M̂i;
2. For all 1 ≤ i ≤ n, M̂i is positive semidefinite: M̂i ≥ 0;
3. The operators sum to the identity: ∑_{i=1}^{n} M̂i = I

and the probability of getting the outcome i when measuring a state described by ρ̂
is given by

Pρ̂ (i) = Tr(ρ̂M̂i ) (34)
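A two-outcome POVM on a qubit, sketched with illustrative elements (not taken from the notes):

```python
# Outcome probabilities of a POVM via eq. (34): P(i) = Tr(rho M_i).
import numpy as np

plus = np.array([1, 1]) / np.sqrt(2)
M0 = 0.5 * np.outer(plus, plus)        # a non-projective POVM element
M1 = np.eye(2) - M0                    # completes the POVM: M0 + M1 = I

# Both elements are positive semidefinite.
for M in (M0, M1):
    assert np.all(np.linalg.eigvalsh(M) >= -1e-12)

rho = np.array([[1.0, 0], [0, 0]])     # |0><0|
print(np.trace(rho @ M0).real)         # 0.25
print(np.trace(rho @ M1).real)         # 0.75
```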

Related exercises (TD 1). See exercises 4, 5.

1.8 Fidelity and trace distance


It is useful in general to be able to quantify how close two states are. A first tool is given by the fidelity:

Definition 12 (Fidelity). The fidelity between two quantum states ρ̂1 and ρ̂2 is given
by

F(ρ̂1, ρ̂2) = [Tr(√(√ρ̂1 ρ̂2 √ρ̂1))]² (35)

where √ρ̂ is defined as the operator σ̂ such that σ̂² = ρ̂ and σ̂ ≥ 0.

If ρ̂1 is pure ρ̂1 = |ψ1 ⟩ ⟨ψ1 |, then the expression simplifies as

F(|ψ1 ⟩ ⟨ψ1 | , ρ̂2 ) = ⟨ψ1 |ρ̂2 |ψ1 ⟩ (36)

If ρ̂2 is also pure, the expression further simplifies as

F(|ψ1 ⟩ ⟨ψ1 | , |ψ2 ⟩ ⟨ψ2 |) = | ⟨ψ1 |ψ2 ⟩ |2 (37)

The fidelity is such that 0 ≤ F(ρ̂1 , ρ̂2 ) ≤ 1, with equality to 1 if both states are equal
(up to a global phase), and equality to 0 when the states are orthogonal. In some sense,
it can be used to quantify how much a state deviates from another, but the fidelity does
not define a proper distance measure.
We hence define here the trace distance, which is a true distance metric.

Definition 13 (Trace distance). The trace distance between two states ρ̂1 and ρ̂2 is defined as

T(ρ̂1, ρ̂2) = (1/2) ||ρ̂1 − ρ̂2||₁ = (1/2) Tr(√((ρ̂1 − ρ̂2)†(ρ̂1 − ρ̂2))) (38)

Since the operators are Hermitian, we have T(ρ̂1, ρ̂2) = (1/2) Tr(√((ρ̂1 − ρ̂2)²)), which can also be written T(ρ̂1, ρ̂2) = (1/2) Tr(|ρ̂1 − ρ̂2|).
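Both quantities can be computed with a matrix square root; a minimal sketch (assuming scipy is available; the two states are standard examples):

```python
# Fidelity (eq. (35)) and trace distance (eq. (38)) between two states.
import numpy as np
from scipy.linalg import sqrtm

def fidelity(rho1, rho2):
    s = sqrtm(rho1)
    return np.trace(sqrtm(s @ rho2 @ s)).real ** 2

def trace_distance(rho1, rho2):
    diff = rho1 - rho2
    return 0.5 * np.trace(sqrtm(diff.conj().T @ diff)).real

rho_plus = 0.5 * np.array([[1, 1], [1, 1]])   # |+><+|
rho_mm = np.eye(2) / 2                        # maximally mixed state
print(fidelity(rho_plus, rho_mm))             # 0.5
print(trace_distance(rho_plus, rho_mm))       # 0.5
```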

Related exercises (TD 1). See exercise 6.

1.9 Entanglement
Let's consider a bipartite state between A and B. Some states can be written as the tensor product of individual states on the A and B sides:

|0⟩A |0⟩B, |0⟩A |+⟩B = (|00⟩ + |01⟩)/√2 (39)

These states are called separable. However, according to the laws of quantum mechanics,
some bipartite states cannot be decomposed as such, as for instance:

|Φ+⟩AB = (1/√2)(|0⟩A |0⟩B + |1⟩A |1⟩B) (40)
In this example, it is not possible to determine the individual state of A (as a pure state).

Definition 14 (Entanglement and separability). Let |ψ⟩AB be a bipartite state on HA ⊗ HB. |ψ⟩AB is said to be separable if there exist |ϕ⟩A ∈ HA and |φ⟩B ∈ HB such that |ψ⟩AB = |ϕ⟩A |φ⟩B.

Otherwise the state is said to be entangled.
The definition extends to mixed states: a state ρ̂AB is said to be separable if there exist a probability distribution {pi}i and two state ensembles {ρ̂i^A}i and {ρ̂i^B}i such that ρ̂AB = ∑_i pi ρ̂i^A ⊗ ρ̂i^B.

Definition 15 (Partial trace). Let ρ̂AB be a bipartite state on HA ⊗ HB. Let {|i⟩B}i be an orthonormal basis of HB. The partial trace over subsystem B is defined as

TrB(ρ̂AB) = ∑_i (IA ⊗ ⟨i|B) ρ̂AB (IA ⊗ |i⟩B) = ∑_i ⟨i|B ρ̂AB |i⟩B = ρ̂A (41)

and the leftover state ρ̂A represents the reduced state on subsystem A.

It is easy to show that

TrB(|Φ+⟩⟨Φ+|) = (1/2)|0⟩⟨0| + (1/2)|1⟩⟨1| = I/2 (42)
We have that Tr(ρ̂AB ) = TrA (TrB (ρ̂AB )) = TrB (TrA (ρ̂AB )).

Definition 16 (Purification). Let ρ̂A be a state on system A. A pure bipartite state |ψ⟩RA on HR ⊗ HA, where R is a reference system, is said to be a purification of ρ̂A if

TrR(|ψ⟩RA⟨ψ|RA) = ρ̂A (43)


If ρ̂A has spectral decomposition ρ̂A = ∑_i pi |φi⟩⟨φi|, then

|ψ⟩RA = ∑_i √pi |φi⟩R |φi⟩A (44)

is a purification of ρ̂A where {|φi ⟩R }i is some orthonormal basis of R.
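The partial trace of eq. (41) is a tensor contraction, which numpy can do by reshaping; here is a sketch for the Bell state, recovering eq. (42):

```python
# Partial trace over B of |Phi+><Phi+|, giving the reduced state I/2.
import numpy as np

phi_plus = np.array([1, 0, 0, 1]) / np.sqrt(2)    # (|00> + |11>)/sqrt(2)
rho_AB = np.outer(phi_plus, phi_plus.conj())

# View rho_AB as a tensor with indices (A, B, A', B'), then trace out B.
dA = dB = 2
rho_A = rho_AB.reshape(dA, dB, dA, dB).trace(axis1=1, axis2=3)
print(rho_A)    # [[0.5 0.], [0. 0.5]] = I/2: maximally mixed, eq. (42)
```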

Related exercises (TD 1). See exercises 7, 8.

1.10 Quantum classical ensembles


Let X be a classical random variable, with outcomes x ∈ X and probability distribution pX(x). Then the random variable can be described by a mixed state

ρ̂X = ∑_{x∈X} pX(x) |x⟩⟨x| (45)

It is then possible to define quantum-classical ensembles or cq-states, where the state on subsystem A depends on the value of X (or vice versa), with the ensemble {pX(x), ρ̂A^x ⊗ |x⟩⟨x|}:

ρ̂AX = ∑_{x∈X} pX(x) ρ̂A^x ⊗ |x⟩⟨x| (46)
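A small sketch of eq. (46) (the two-outcome ensemble below is illustrative, not from the notes):

```python
# Building a cq-state rho_AX for a binary classical variable X.
import numpy as np

pX = [0.5, 0.5]
rho_A = [np.array([[1.0, 0], [0, 0]]),          # rho_A^0 = |0><0|
         0.5 * np.array([[1.0, 1], [1, 1]])]    # rho_A^1 = |+><+|

kets = [np.array([1, 0]), np.array([0, 1])]     # classical register |x>
rho_AX = sum(pX[x] * np.kron(rho_A[x], np.outer(kets[x], kets[x]))
             for x in range(2))
print(np.trace(rho_AX))    # 1.0: a valid, normalised state
```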

1.11 Quantum channels


A quantum channel is an operation E that maps quantum states to quantum states, characterized by the following three properties:
1. Linearity: E is linear if for any ρ̂A and σ̂A and any α, β ∈ C, E(αρ̂A + βσ̂A) = αE(ρ̂A) + βE(σ̂A);
2. Complete positivity: E is completely positive if for any reference system R and any ρ̂RA ≥ 0, (IR ⊗ E)(ρ̂RA) ≥ 0 (in particular, E(ρ̂A) ≥ 0 whenever ρ̂A ≥ 0);
3. Trace preservation: E is trace-preserving if for any ρ̂A, Tr(E(ρ̂A)) = Tr(ρ̂A).

Definition 17 (CPTP map). A quantum channel is a linear, completely positive, trace


preserving (CPTP) map.

Theorem 1 (Kraus decomposition). A map E : HA → HB is linear, completely positive and trace preserving if and only if there exists a set of operators {K̂i}i such that

∑_i K̂i† K̂i = I (47)

and, for any ρ̂,

E(ρ̂) = ∑_i K̂i ρ̂ K̂i† (48)

This is called the Kraus decomposition of a quantum channel.

If we have a bipartite system ρ̂AB , we can consider maps that only affect one subsystem:

ρ̂′AB = (EA ⊗ IB )(ρ̂AB ) (49)
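As an illustration of Theorem 1, here is a sketch of the bit-flip channel (a standard example, not worked out in these notes):

```python
# Kraus decomposition of a bit-flip channel, eqs. (47)-(48):
# K0 = sqrt(1-p) I, K1 = sqrt(p) sigma_X.
import numpy as np

p = 0.2
K0 = np.sqrt(1 - p) * np.eye(2)
K1 = np.sqrt(p) * np.array([[0, 1], [1, 0]], dtype=complex)

# Completeness relation, eq. (47).
assert np.allclose(K0.conj().T @ K0 + K1.conj().T @ K1, np.eye(2))

def channel(rho):
    """Apply E(rho) = sum_i K_i rho K_i^dagger, eq. (48)."""
    return K0 @ rho @ K0.conj().T + K1 @ rho @ K1.conj().T

rho = np.array([[1, 0], [0, 0]], dtype=complex)   # |0><0|
print(channel(rho))    # (1-p)|0><0| + p|1><1|
```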

2 Information theory
This section covers some basics of classical information theory. It is mainly based on Chapter 2 of [3]. Another useful reference is Chapter 3.1 of [1].

2.1 Shannon and Von Neumann entropies


A fundamental object in classical information theory is the random variable. A random variable X is a variable whose value depends on some random process. If the set X of possible values (i.e., the alphabet) is discrete, the random variable is said to be a discrete random variable. The random nature of the process is captured by a probability distribution PX(x) defined on the set X.¹ The probability distribution is a function PX : P(X) → [0, 1] defined on the power set of X, i.e., the set of all subsets of X, with the following properties:
1. PX(X) = 1,
2. PX(⋃_i Ai) = ∑_i PX(Ai) for any family of pairwise disjoint sets, i.e., such that Ai ∩ Aj = ∅ for i ≠ j.
An element A ∈ P(X) is called an event. In these notes, we will mostly deal with a particular class of events, i.e., events of the form X = x, meaning that the random variable X has assumed the value x. We will use the notation P(x) with the meaning that the random variable X has taken the value x, i.e., P(X = x).

Given an event B with P(B) ≠ 0, it is possible to define a new probability measure P(·|B) : P(X) → [0, 1] as

P(E|B) = P(E, B) / P(B), (50)

for each event E ∈ P(X). This is called the conditional probability.

In 1948, Shannon asked how much information a realisation of a random variable carries. He decided to characterise the amount of surprise of the said realisation. Intuitively, the less probable an event is, the more surprise, and hence information, it contains. On the other hand, an event that always occurs with probability 1 does not carry any information.
The measure should also be such that the realisation of two independent events gives an
information that is the sum of their individual information. Hence, Shannon defined the
information content of a particular realisation x as

i(x) = − log(PX (x)) (51)

with i(x) = 0 if PX (x) = 1 and i(x) ≥ 0. We also have i(x1 , x2 ) = i(x1 ) + i(x2 ) for
independent events.
When dealing with a random variable, it is interesting to quantify its level of uncertainty.
The best-known measure of this uncertainty is given by the average information content
of the random variable X, which is called entropy (or equivalently Shannon entropy)

Definition 18 (Shannon entropy). Let X be a random variable over X, with associated probability distribution PX. Then the Shannon entropy H(X) of X is defined as:

H(X) = − ∑_{x∈X} PX(x) log PX(x). (52)

¹A full definition of a random variable would require a much deeper introduction of what a probability space is. This is out of the scope of this course, but an interested reader can find a more rigorous definition in [4].

If we take the logarithm in base 2, the entropy is defined in bits. The higher the entropy,
the higher the uncertainty about the random variable. The entropy is a measure of the
information content of the random variable, i.e., the additional information that we get
once we know the actual value of the random variable. Clearly, the higher the uncertainty,
the higher is the information that we get by knowing its actual value.
A particularly important example is the case of a binary random variable, where X = {0, 1}, p(0) = p and p(1) = 1 − p for some p ∈ [0, 1], representing a coin flip. Then the entropy is given by the so-called binary entropy

h(p) = −p log(p) − (1 − p) log(1 − p) (53)

The binary entropy is plotted in Fig. 2.


Figure 2: Plot of the binary entropy function.
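The plot in Fig. 2 is easily reproduced; a minimal sketch of eq. (53), in bits:

```python
# Binary entropy h(p), eq. (53), with the convention 0 log 0 = 0.
import numpy as np

def h(p: float) -> float:
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

print(h(0.5))   # 1.0: a fair coin carries one full bit of uncertainty
print(h(0.1))   # ~0.469: a biased coin is more predictable
```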

A similar entropy definition can be made for quantum states, called the Von Neumann
entropy:

Definition 19 (Von Neumann entropy). Let ρ̂ be a quantum state with spectral decomposition ρ̂ = ∑_i λi |ψi⟩⟨ψi|. The Von Neumann entropy is given by

S(ρ̂) = − Tr(ρ̂ log ρ̂) = − ∑_i λi log λi (54)

It is straightforward to check that both entropies coincide for a classical state ρ̂X = ∑_x pX(x) |x⟩⟨x|.
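A sketch of eq. (54), computing S(ρ̂) from the eigenvalues of ρ̂ (in bits; the test states are standard examples):

```python
# Von Neumann entropy from the spectral decomposition, eq. (54).
import numpy as np

def von_neumann_entropy(rho: np.ndarray) -> float:
    lam = np.linalg.eigvalsh(rho)      # rho is Hermitian
    lam = lam[lam > 1e-12]             # convention: 0 log 0 = 0
    return float(-np.sum(lam * np.log2(lam)))

print(von_neumann_entropy(np.diag([1.0, 0.0])))   # pure state: 0.0
print(von_neumann_entropy(np.eye(2) / 2))         # maximally mixed: 1.0
```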

Related exercises (TD 1). See exercises 10, 11, 15, 16, 17.

2.2 Joint and conditional entropy
The concept of information content can be extended to systems comprising multiple random variables. If we consider a pair of random variables (X, Y), the joint entropy is defined as

H(X, Y) = − ∑_{x∈X} ∑_{y∈Y} P(x, y) log P(x, y), (55)

where P (·) is the joint probability distribution of the two random variables.

The knowledge of the value of a random variable X can modify the uncertainty that we
have on another random variable Y . If we know that the event X has taken the value x,
the residual uncertainty of the random variable Y is
H(Y|X = x) = − ∑_{y∈Y} P(y|x) log P(y|x). (56)

Its average over the possible values of x gives the conditional entropy H(Y |X), defined as
H(Y|X) = ∑_{x∈X} P(x) H(Y|X = x)
       = − ∑_{x∈X} P(x) ∑_{y∈Y} P(y|x) log P(y|x)
       = − ∑_{x∈X} ∑_{y∈Y} P(x, y) log P(y|x), (57)

where we have used Bayes' rule P(x, y) = P(y|x)P(x). Using Bayes' rule, it is also easy to demonstrate that

H(X, Y ) = H(X) + H(Y |X), (58)

which means that the information content of the pair of random variables (X, Y ) is given
by the sum of the information content of the random variable X and the residual infor-
mation content of Y conditioned on the knowledge of X.

2.3 Mutual information


A fundamental concept for classical communication systems is the mutual information
I(X; Y ), which measures the average amount of information that one random variable
has about another random variable (or, equivalently, the reduction in the uncertainty of
one random variable due to the knowledge of the other). It is defined as
I(X; Y) = ∑_{x∈X} ∑_{y∈Y} P(x, y) log [ P(x, y) / (P(x)P(y)) ]. (59)

Using Bayes' rule, it is possible to rewrite Eq. (59) as

I(X; Y) = ∑_{x,y} P(x, y) log [ P(x|y) / P(x) ]
        = − ∑_{x,y} P(x, y) log P(x) + ∑_{x,y} P(x, y) log P(x|y)
        = − ∑_x P(x) log P(x) − ( − ∑_{x,y} P(x, y) log P(x|y) )
        = H(X) − H(X|Y). (60)

A fundamental property of the mutual information is that it is non-negative, i.e., I(X; Y ) ≥


0, with equality if and only if the two random variables X and Y are independent (see
Theorem 2.6.3 and the following Corollary from [3]). A consequence of this property is
that conditioning always reduces entropy, i.e., H(X) ≥ H(X|Y ) with equality if and only
if X and Y are independent random variables, as it is easy to see by combining it with
Eq. (60). Here is a list of the most important properties of the mutual information,

I(X; Y) = H(X) − H(X|Y), (61)
I(X; Y) = H(Y) − H(Y|X), (62)
I(X; Y) = H(X) + H(Y) − H(X, Y), (63)
I(X; Y) = I(Y; X), (64)
I(X; X) = H(X), (65)

which can be visualized in the Venn diagram shown in Fig. 3.

Figure 3: Relationship between entropy and mutual information [3].
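These identities can be checked numerically from any joint distribution; a sketch with an illustrative P(x, y) table (not from the notes):

```python
# Mutual information via I(X;Y) = H(X) + H(Y) - H(X,Y), eq. (63).
import numpy as np

P = np.array([[0.4, 0.1],    # P(x, y) for x, y in {0, 1}
              [0.1, 0.4]])

def H(p):                    # Shannon entropy of a distribution, eq. (52)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

Px, Py = P.sum(axis=1), P.sum(axis=0)   # marginal distributions
I = H(Px) + H(Py) - H(P.ravel())
print(I)    # ~0.278 bits; it vanishes iff X and Y are independent
```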

Related exercises (TD 1). See exercise 12.

2.4 Channel capacity
How can we model an information exchange between two parties? Let's say that we have two parties, Alice and Bob, where Alice has a random variable X which contains the information she wants to send, and Bob a random variable Y that represents the information he gets. The relation between X and Y is called a communication channel. In this course, we will only consider X to be a binary variable, so that X = {0, 1}.
For instance, let’s say that X is faithfully transmitted to Bob, which is the perfect channel:

0 → 0 with probability 1
1 → 1 with probability 1

Figure 4: Perfect channel

Then we might consider a channel, called the binary symmetric channel (BSC) that flips
the input bit with some probability p:

0 → 0 with probability 1 − p, 0 → 1 with probability p
1 → 1 with probability 1 − p, 1 → 0 with probability p

Figure 5: Binary Symmetric Channel BSC(p)

Shannon also asked how much information can be transmitted through a channel, and he proved that the maximal information transmitted through one use of the channel is upper-bounded by the channel capacity:

Definition 20 (Channel capacity). The capacity of a channel with input variable X


and output variable Y is given by

C = max_{PX} I(X; Y) (66)

which is the maximum of the mutual information over all the sending strategies.

We will show in the tutorials that the capacity of the BSC(p) channel is given by C = 1 − h(p).
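A quick numerical look at this formula (reusing the binary entropy h of eq. (53); the values of p below are illustrative):

```python
# Capacity of the binary symmetric channel: C = 1 - h(p).
import numpy as np

def h(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)   # avoid log(0) at the endpoints
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

for p in (0.0, 0.11, 0.5):
    print(p, 1 - h(p))   # 1 bit for a perfect channel, 0 bits at p = 1/2
```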

Related exercises (TD 1). See exercise 18.

2.5 Min entropy


In a cryptographic context, the Shannon entropy might not be the most appropriate randomness measure. Indeed, if we use the entropy as a measure of uncertainty in a random variable to assess its quality, we run into the issue that the Shannon entropy measures the average information content: consider a random variable X on {0, 1}^n with

PX(x) = 1/2 if x = 1...1, and PX(x) = (1/2) · 1/(2^n − 1) otherwise. (67)

You can check that the Shannon entropy is relatively high (≃ n/2), but if you were to guess the value of x, it would be easy to be right half of the time by always guessing x = 1...1. Hence, in a cryptographic setting, we are sometimes interested in the min-entropy, which corresponds to the uncertainty in guessing the value of some variable.

Definition 21 (Min-entropy). Let X be a random variable over X with some proba-


bility distribution PX. The min-entropy is defined as

Hmin(X) = − log(max_{x∈X} PX(x)) (68)
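The distribution of eq. (67) makes the gap between the two measures concrete; a sketch for an illustrative n:

```python
# Shannon entropy vs. min-entropy for the skewed distribution, eq. (67).
import numpy as np

n = 10
probs = np.full(2 ** n, 0.5 / (2 ** n - 1))   # the "otherwise" branch
probs[-1] = 0.5                               # x = 1...1 with prob 1/2

H_shannon = -np.sum(probs * np.log2(probs))
H_min = -np.log2(probs.max())                 # eq. (68)
print(H_shannon)   # ~6.0 here, roughly n/2: looks quite random
print(H_min)       # exactly 1.0: the best guess succeeds half the time
```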

References
[1] Ramona Wolf. Quantum Key Distribution. Springer International Publishing, 2021. ISBN 9783030739904. doi: 10.1007/978-3-030-73991-1. URL http://dx.doi.org/10.1007/978-3-030-73991-1.

[2] Thomas Vidick and Stephanie Wehner. Introduction to Quantum Cryptography. Cam-
bridge University Press, 2023.

[3] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. Wiley, 2005. ISBN 9780471241959. doi: 10.1002/047174882x. URL http://dx.doi.org/10.1002/047174882x.

[4] Joseph M. Renes. Lecture notes on quantum information theory, 2014. URL https://edu.itp.phys.ethz.ch/hs15/QIT/renes_lecture_notes14.pdf.
