Lecture 1
Quantum mechanics and information theory
Y. Piétri
In this lecture, we will introduce the basic tools required for this course. In particular, we will start by reviewing the basic formalism of quantum mechanics, before moving on to classical and quantum information theory.
1 Quantum mechanics
This first section is freely based on chapters 2 and 3 of [1] and chapters 1, 2, 4 and 5 of [2].
Definition 1 (Hilbert space). A Hilbert space H is a vector space over the complex
numbers C equipped with a scalar product:
⟨·|·⟩ : H × H → C (1)
with the following properties:
1. Positivity: ⟨ψ|ψ⟩ ≥ 0 for all |ψ⟩ ∈ H , with equality if and only if |ψ⟩ = 0;
2. Linearity: ⟨ϕ| (α |ψ1⟩ + β |ψ2⟩) = α ⟨ϕ|ψ1⟩ + β ⟨ϕ|ψ2⟩ for all |ϕ⟩, |ψ1⟩, |ψ2⟩ ∈ H and α, β ∈ C;
3. Conjugate symmetry: ⟨ϕ|ψ⟩∗ = ⟨ψ|ϕ⟩ for all |ψ⟩ , |ϕ⟩ ∈ H .
To each ket |ψ⟩ ∈ H we can associate a linear map
⟨ψ| · : H → C (2)
such that, for each |ϕ⟩ ∈ H, ⟨ψ|ϕ⟩ is the scalar product. This map, called a bra and denoted ⟨ψ|, is a vector in the dual Hilbert space H∗.
The kets can be expressed, as any vectors, in a basis representation. From now on, for simplicity, we will work with vectors in a finite-dimensional Hilbert space (with dimension d = dim(H)).
In such a space, we can choose an orthonormal basis {|e_i⟩}_{i=1,…,d}, satisfying
⟨e_i|e_j⟩ = δ_ij (3)
and such that any state |ψ⟩ can be written in the unique following way:
|ψ⟩ = ∑_{i=1}^{d} ⟨e_i|ψ⟩ |e_i⟩ (4)
In the previous definition, δ_ij refers to the Kronecker delta symbol, which is 0 except for i = j, where it takes the value 1.
For instance, if |ψ⟩ = ∑_i a_i |e_i⟩, then the basis representation is
|ψ⟩ = (a_1, a_2, …, a_d)^T (5)
Note. From eq. (6) it is evident that, given a ket |ψ⟩, the corresponding bra ⟨ψ| is given by the conjugated row vector
⟨ψ| = (a_1^∗, a_2^∗, …, a_d^∗) (7)
The norm of a state is then
|| |ψ⟩ || = √⟨ψ|ψ⟩ = √(∑_{i=1}^{d} a_i^∗ a_i) = √(∑_{i=1}^{d} |a_i|²) ≥ 0 (8)
1.3 Operators
We now define the notion of operators, which act on quantum states to transform them into other quantum states.
Given two states |ψ⟩, |ϕ⟩ ∈ H it is possible to build a particular operator, the outer product |ψ⟩⟨ϕ|, which is defined by its action on a state |ξ⟩ ∈ H as:
(|ψ⟩⟨ϕ|) |ξ⟩ = ⟨ϕ|ξ⟩ |ψ⟩
Definition 6 (Adjoint). For an operator Â, its adjoint, denoted Â†, is defined as the only operator such that
⟨ϕ|Â†|ψ⟩ = ⟨ψ|Â|ϕ⟩^∗ for all |ψ⟩, |ϕ⟩ ∈ H (14)
An operator Â is unitary if Â†Â = ÂÂ† = I.
1.4 Trace
The trace is the function from the operator space to C that, given a basis {|e_i⟩}_i, is defined as the sum of the diagonal elements:
Tr(Â) = ∑_i ⟨e_i|Â|e_i⟩ (18)
It is:
1. Linear: for Â, B̂ operators, Tr(Â + B̂) = Tr(Â) + Tr(B̂) and, for λ ∈ C, Tr(λÂ) = λ Tr(Â);
2. Cyclic: for Â, B̂, Ĉ operators, Tr(ÂB̂ Ĉ) = Tr(B̂ Ĉ Â) = Tr(Ĉ ÂB̂).
3. Basis independent.
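These properties are easy to verify numerically. The following is a minimal NumPy sketch (the matrices are arbitrary examples, not taken from the lecture) checking linearity and cyclicity:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_op(d=3):
    """An arbitrary complex operator on a d-dimensional space."""
    return rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))

A, B, C = random_op(), random_op(), random_op()

# Linearity: Tr(A + B) = Tr(A) + Tr(B)
assert np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))
# Cyclicity: Tr(ABC) = Tr(BCA) = Tr(CAB)
assert np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))
```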
1.5 Qubits
The simplest quantum system is a two-dimensional system called a qubit, where we can define a basis {|0⟩, |1⟩} composed of the eigenvectors of the σ̂_Z Pauli matrix, defining by convention the canonical basis
|0⟩ = (1, 0)^T,  |1⟩ = (0, 1)^T (19)
The states |±⟩ = (|0⟩ ± |1⟩)/√2 also define a basis {|+⟩, |−⟩}, which is related to the σ̂_Z basis by the Hadamard operator Ĥ (|+⟩ = Ĥ|0⟩, |−⟩ = Ĥ|1⟩), with
Ĥ = (1/√2) [[1, 1], [1, −1]] (23)
Due to the normalisation condition, a qubit can be defined by two angles θ and φ such
that
|ψ⟩ = cos(θ/2) |0⟩ + sin(θ/2) e^{iφ} |1⟩ (24)
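As a small illustration (a NumPy sketch; the helper bloch_state is ours, not from the lecture), we can build a qubit from the two angles and check that θ = π/2, φ = 0 gives |+⟩ = Ĥ|0⟩:

```python
import numpy as np

def bloch_state(theta, phi):
    """Qubit cos(theta/2)|0> + e^{i phi} sin(theta/2)|1> as a length-2 vector."""
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard operator
ket0 = np.array([1, 0])

plus = bloch_state(np.pi / 2, 0.0)             # theta = pi/2, phi = 0
assert np.isclose(np.linalg.norm(plus), 1.0)   # normalisation
assert np.allclose(plus, H @ ket0)             # |+> = H|0>
```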
A statistical mixture of pure states, described by the ensemble {p_i, |ψ_i⟩}, is represented by a density matrix
ρ̂ = ∑_i p_i |ψ_i⟩⟨ψ_i| (25)
For a pure state |ψ⟩, the density matrix reduces to the projector
ρ̂ = |ψ⟩⟨ψ| (26)
Being Hermitian, any density matrix can be diagonalised as
ρ̂ = ∑_i λ_i |φ_i⟩⟨φ_i| (27)
with |φ_i⟩ the eigenvectors of ρ̂, forming an orthonormal basis of the space, and 0 ≤ λ_i ≤ 1 its eigenvalues. We hence see from this that if ρ̂ is mixed, then Tr(ρ̂²) = ∑_i λ_i² < 1. This quantity is called the purity:
P(ρ̂) = Tr(ρ̂²) = ∑_i λ_i² (28)
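A quick numerical check (an illustrative sketch; the states are arbitrary choices): the purity of a pure state is 1, while the maximally mixed qubit state I/2 has purity 1/2.

```python
import numpy as np

ket_plus = np.array([1, 1]) / np.sqrt(2)
rho_pure = np.outer(ket_plus, ket_plus)   # |+><+|
rho_mixed = np.eye(2) / 2                 # maximally mixed qubit state I/2

purity = lambda rho: np.trace(rho @ rho).real
print(purity(rho_pure))    # ~1.0
print(purity(rho_mixed))   # 0.5
```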
1.7 Measurements
In general, a measurement is described by an ensemble {Ê_x} of operators such that ∑_x Ê_x† Ê_x = I, where the x's refer to the measurement outcomes.
The probability of measuring x on a system characterised by the quantum state |ψ⟩ is, according to the postulates of quantum mechanics,
p(x) = ⟨ψ|Ê_x† Ê_x|ψ⟩ (29)
and the state after the measurement, conditioned on the outcome x, is
|ψ′⟩ = Ê_x |ψ⟩ / √p(x) (30)
A particularly important class of measurements are the projective measurements, where the Ê_x are projectors P̂_x. In this case, since projectors are Hermitian and square to themselves (P̂_x² = P̂_x), the probability is simply p(x) = ⟨ψ|P̂_x|ψ⟩, which means that the probability of obtaining outcome x is the overlap between the initial state and the projected state.
For instance, the σ̂_Z measurement is defined by the projectors onto the eigenspaces of σ̂_Z, P̂_0 = |0⟩⟨0| and P̂_1 = |1⟩⟨1|, and the associated probabilities, for a qubit |ψ⟩ = α|0⟩ + β|1⟩, are
p(0) = ⟨ψ|P̂_0|ψ⟩ = ⟨ψ| (|0⟩⟨0|) |ψ⟩ = |⟨0|ψ⟩|² = |α|²,   p(1) = |β|² (32)
We also have
|0⟩⟨0| + |1⟩⟨1| = [[1, 0], [0, 0]] + [[0, 0], [0, 1]] = I (33)
and the probability of getting the outcome i when measuring a state described by ρ̂ is given by
p(i) = Tr(P̂_i ρ̂) (34)
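The following NumPy sketch (with an arbitrarily chosen qubit, for illustration) checks both forms of the Born rule, ⟨ψ|P̂_i|ψ⟩ for a ket and Tr(P̂_i ρ̂) for the corresponding density matrix:

```python
import numpy as np

ket0, ket1 = np.array([1, 0]), np.array([0, 1])
P0, P1 = np.outer(ket0, ket0), np.outer(ket1, ket1)    # projectors |0><0| and |1><1|

alpha, beta = np.sqrt(0.3), np.sqrt(0.7) * np.exp(1j * 0.4)
psi = alpha * ket0 + beta * ket1                       # |psi> = alpha|0> + beta|1>
rho = np.outer(psi, psi.conj())                        # rho = |psi><psi|

p0_ket = np.vdot(psi, P0 @ psi).real                   # <psi|P0|psi>
p0_rho = np.trace(P0 @ rho).real                       # Tr(P0 rho)
assert np.isclose(p0_ket, abs(alpha) ** 2)
assert np.isclose(p0_rho, abs(alpha) ** 2)
assert np.isclose(np.trace((P0 + P1) @ rho).real, 1.0) # probabilities sum to one
```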
Definition 12 (Fidelity). The fidelity between two quantum states ρ̂1 and ρ̂2 is given
by
F(ρ̂1, ρ̂2) = [ Tr( √( √ρ̂1 ρ̂2 √ρ̂1 ) ) ]² (35)
where √ρ̂ is defined as the operator σ̂ such that σ̂² = ρ̂ and σ̂ ≥ 0.
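Eq. (35) translates almost literally into code. A sketch, assuming SciPy is available for the matrix square root (the two states are arbitrary examples):

```python
import numpy as np
from scipy.linalg import sqrtm

def fidelity(rho1, rho2):
    """Uhlmann fidelity F = (Tr sqrt(sqrt(rho1) rho2 sqrt(rho1)))^2."""
    s1 = sqrtm(rho1)
    return np.trace(sqrtm(s1 @ rho2 @ s1)).real ** 2

ket0 = np.array([1, 0])
rho_pure = np.outer(ket0, ket0)       # |0><0|
rho_mixed = np.eye(2) / 2             # I/2

print(fidelity(rho_pure, rho_pure))   # ~1.0: identical states
print(fidelity(rho_pure, rho_mixed))  # ~0.5
```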
If ρ̂1 is pure, ρ̂1 = |ψ1⟩⟨ψ1|, then the expression simplifies to F(ρ̂1, ρ̂2) = ⟨ψ1|ρ̂2|ψ1⟩.
The fidelity is such that 0 ≤ F(ρ̂1 , ρ̂2 ) ≤ 1, with equality to 1 if both states are equal
(up to a global phase), and equality to 0 when the states are orthogonal. In some sense,
it can be used to quantify how much a state deviates from another, but the fidelity does
not define a proper distance measure.
We hence here define the trace distance, which is a true distance metric.
Definition 13 (Trace distance). The trace distance between two states ρ̂1 and ρ̂2 is
defined as
T(ρ̂1, ρ̂2) = (1/2) ||ρ̂1 − ρ̂2||_1 = (1/2) Tr√((ρ̂1 − ρ̂2)†(ρ̂1 − ρ̂2)) (38)
Since the operators are Hermitian, we have T(ρ̂1, ρ̂2) = (1/2) Tr√((ρ̂1 − ρ̂2)²), which can also be written T(ρ̂1, ρ̂2) = (1/2) Tr(|ρ̂1 − ρ̂2|).
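For Hermitian arguments the trace norm is simply the sum of the absolute values of the eigenvalues, which gives a very short NumPy sketch (the states below are arbitrary examples):

```python
import numpy as np

def trace_distance(rho1, rho2):
    """T = 1/2 * sum_i |eigenvalue_i(rho1 - rho2)|, valid for Hermitian operators."""
    return 0.5 * np.sum(np.abs(np.linalg.eigvalsh(rho1 - rho2)))

ket0, ket1 = np.array([1, 0]), np.array([0, 1])
print(trace_distance(np.outer(ket0, ket0), np.outer(ket1, ket1)))  # 1.0: orthogonal states
print(trace_distance(np.outer(ket0, ket0), np.eye(2) / 2))         # 0.5
```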
1.9 Entanglement
Let's consider a bipartite state shared between two subsystems A and B. Some states can be written as the tensor product of individual states on the A and B sides, for instance:
|0⟩_A |0⟩_B,   |0⟩_A |+⟩_B = (|00⟩ + |01⟩)/√2 (39)
These states are called separable. However, according to the laws of quantum mechanics,
some bipartite states cannot be decomposed as such, as for instance:
|Φ+⟩_AB = (1/√2) (|0⟩_A |0⟩_B + |1⟩_A |1⟩_B) (40)
In this example, it is not possible to determine the individual state of A (as a pure state).
Such states, which cannot be written as a tensor product of states of the individual subsystems, are said to be entangled.
The definition extends to mixed states: a state ρ̂_AB is said to be separable if there exists a probability distribution {p_i}_i and two state ensembles {ρ̂_i^A}_i and {ρ̂_i^B}_i such that ρ̂_AB = ∑_i p_i ρ̂_i^A ⊗ ρ̂_i^B.
Given a basis {|i⟩_B} of subsystem B, the partial trace over B is defined as
Tr_B(ρ̂_AB) = ∑_i (I_A ⊗ ⟨i|_B) ρ̂_AB (I_A ⊗ |i⟩_B) = ∑_i ⟨i|_B ρ̂_AB |i⟩_B = ρ̂_A (41)
and the leftover state represents the reduced state over subsystem A.
For instance, for the maximally entangled state of eq. (40),
Tr_B(|Φ+⟩⟨Φ+|_AB) = (1/2) |0⟩⟨0| + (1/2) |1⟩⟨1| = I/2 (42)
We have that Tr(ρ̂AB ) = TrA (TrB (ρ̂AB )) = TrB (TrA (ρ̂AB )).
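A compact NumPy sketch of eqs. (41) and (42) (the two-qubit partial-trace helper below is an illustrative implementation, not code from the lecture):

```python
import numpy as np

def partial_trace_B(rho_AB, dA=2, dB=2):
    """Trace out subsystem B of a (dA*dB) x (dA*dB) density matrix."""
    rho = rho_AB.reshape(dA, dB, dA, dB)     # indices (i, j; k, l) with j, l on B
    return np.einsum('ijkj->ik', rho)        # sum over the B indices

ket00 = np.kron([1, 0], [1, 0])
ket11 = np.kron([0, 1], [0, 1])
phi_plus = (ket00 + ket11) / np.sqrt(2)      # |Phi+> = (|00> + |11>)/sqrt(2)
rho_AB = np.outer(phi_plus, phi_plus)

print(partial_trace_B(rho_AB))               # I/2: the reduced state is maximally mixed
```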
Any mixed state ρ̂_A = ∑_i p_i |φ_i⟩⟨φ_i| on A can be purified, i.e., extended to a pure state on a larger system RA whose partial trace over the reference system R gives back ρ̂_A:
|ψ⟩_RA = ∑_i √p_i |φ_i⟩_R |φ_i⟩_A (44)
A classical random variable X, taking values x ∈ X with probability p_X(x), can also be represented as a quantum state by encoding its values in an orthonormal basis {|x⟩}:
ρ̂_X = ∑_{x∈X} p_X(x) |x⟩⟨x| (45)
It is then possible to define classical–quantum ensembles, or cq-states, where the state on subsystem A depends on the value of X (or vice versa), with the ensemble {p_X(x), ρ̂_A^x ⊗ |x⟩⟨x|}:
ρ̂_AX = ∑_{x∈X} p_X(x) ρ̂_A^x ⊗ |x⟩⟨x| (46)
If we have a bipartite system ρ̂_AB, we can consider maps that only affect one subsystem, i.e., maps of the form (id_A ⊗ E_B)(ρ̂_AB), where E_B acts on subsystem B alone.
2 Information theory
This section covers some basics of classical information theory. It is mainly based on Chapter 2 of [3]. Another useful reference is Chapter 3.1 of [1].
A random variable X is characterised by the set X of values it can take and by a probability distribution P_X(x) defined on the set X.¹ The probability distribution is a function P_X : P(X) → [0, 1] defined on the power set of X, i.e., the set of all subsets of X, with the following properties:
1. P_X(X) = 1,
2. P_X(∪_i A_i) = ∑_i P_X(A_i) for any family of pairwise disjoint sets, i.e., such that A_i ∩ A_j = ∅ for i ≠ j.
An element A ∈ P(X ) is called an event. In these notes, we will mostly deal with a
particular class of events, i.e., the one composed of events of the form X = x, meaning
that the random variable X has assumed the value x. We will use the notation P(x) to mean the probability that the random variable X has taken the value x, i.e., P(X = x).
In 1948, Shannon asked the question of how much information a realisation of a random variable carries. He decided to characterise the amount of surprise of said realisation. Intuitively, the less probable an event is, the more surprise, and hence information, it contains. On the other hand, an event that always occurs with probability 1 does not carry any information.
The measure should also be such that the realisation of two independent events gives an information that is the sum of their individual information contents. Hence, Shannon defined the information content of a particular realisation x as
i(x) = −log P_X(x)
with i(x) = 0 if P_X(x) = 1 and i(x) ≥ 0. We also have i(x1, x2) = i(x1) + i(x2) for independent events.
When dealing with a random variable, it is interesting to quantify its level of uncertainty. The best-known measure of this uncertainty is given by the average information content of the random variable X, which is called the entropy (or, equivalently, the Shannon entropy):
H(X) = −∑_{x∈X} P_X(x) log P_X(x)
¹ A full definition of a random variable would require a much deeper introduction of what a probability space is. This is out of the scope of this course, but an interested reader can find a more rigorous definition in [4].
If we take the logarithm in base 2, the entropy is defined in bits. The higher the entropy,
the higher the uncertainty about the random variable. The entropy is a measure of the
information content of the random variable, i.e., the additional information that we get
once we know the actual value of the random variable. Clearly, the higher the uncertainty, the higher the information we gain by learning that value.
A particularly important example is the case of a binary random variable, where X = {0, 1}, p(0) = p and p(1) = 1 − p for some p ∈ [0, 1], representing a (possibly biased) coin flip. Then the entropy is given by the so-called binary entropy
h(p) = −p log p − (1 − p) log(1 − p)
[Figure: the binary entropy h(p) as a function of p.]
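A short Python sketch of the binary entropy (illustrative only; logarithms in base 2, so the result is in bits):

```python
import numpy as np

def binary_entropy(p):
    """h(p) = -p log2(p) - (1 - p) log2(1 - p), with the convention 0 log 0 = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

print(binary_entropy(0.5))   # 1.0: a fair coin carries one full bit of uncertainty
print(binary_entropy(0.11))  # ~0.5
print(binary_entropy(1.0))   # 0.0: a deterministic outcome carries no information
```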
A similar entropy definition can be made for quantum states, called the Von Neumann
entropy:
Definition 19 (Von Neumann entropy). Let ρ̂ be a quantum state with spectral decomposition ρ̂ = ∑_i λ_i |ψ_i⟩⟨ψ_i|. The Von Neumann entropy is given by
S(ρ̂) = −Tr(ρ̂ log ρ̂) = −∑_i λ_i log λ_i (54)
It is straightforward to check that both entropies are the same for a classical state ρ̂_X = ∑_x p_X(x) |x⟩⟨x|.
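A small NumPy sketch of this check (the states are illustrative choices): S(ρ̂) is computed from the eigenvalues, and for a diagonal (classical) state it reproduces the Shannon entropy.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -sum_i lambda_i log2(lambda_i), over the nonzero eigenvalues."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

p = np.array([0.25, 0.75])
rho_classical = np.diag(p)                                 # rho_X = sum_x p(x)|x><x|
print(von_neumann_entropy(rho_classical))                  # ~0.811
print(-np.sum(p * np.log2(p)))                             # same Shannon entropy H(X)

ket_plus = np.array([1, 1]) / np.sqrt(2)
print(von_neumann_entropy(np.outer(ket_plus, ket_plus)))   # ~0.0 for a pure state
```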
Related exercises (TD 1). See exercises 10, 11, 15, 16, 17.
2.2 Joint and conditional entropy
The concept of information content can be extended to systems comprising multiple random variables. If we consider a pair of random variables (X, Y), the joint entropy is defined as
H(X, Y) = −∑_{x∈X} ∑_{y∈Y} P(x, y) log P(x, y), (55)
where P (·) is the joint probability distribution of the two random variables.
The knowledge of the value of a random variable X can modify the uncertainty that we have on another random variable Y. If we know that X has taken the value x, the residual uncertainty on the random variable Y is
H(Y|X = x) = −∑_{y∈Y} P(y|x) log P(y|x). (56)
Its average over the possible values of x gives the conditional entropy H(Y|X), defined as
H(Y|X) = ∑_{x∈X} P(x) H(Y|X = x)
       = −∑_{x∈X} P(x) ∑_{y∈Y} P(y|x) log P(y|x)
       = −∑_{x∈X} ∑_{y∈Y} P(x, y) log P(y|x), (57)
where we have used Bayes' rule P(x, y) = P(y|x)P(x). Using Bayes' rule it is also easy to demonstrate the chain rule
H(X, Y) = H(X) + H(Y|X), (58)
which means that the information content of the pair of random variables (X, Y) is given by the sum of the information content of the random variable X and the residual information content of Y conditioned on the knowledge of X.
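These relations can be checked on a toy joint distribution (a small Python sketch; the distribution itself is an arbitrary illustrative choice):

```python
import numpy as np

# Joint distribution P(x, y) of two binary variables; rows index x, columns index y
P_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

P_x = P_xy.sum(axis=1)                               # marginal P(x)
P_y_given_x = P_xy / P_x[:, None]                    # conditional P(y|x)

H_xy = -np.sum(P_xy * np.log2(P_xy))                 # joint entropy H(X, Y)
H_x = -np.sum(P_x * np.log2(P_x))                    # H(X)
H_y_given_x = -np.sum(P_xy * np.log2(P_y_given_x))   # conditional entropy H(Y|X)

print(H_xy, H_x + H_y_given_x)                       # chain rule: the two values coincide
```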
The mutual information between X and Y is defined as
I(X; Y) = ∑_{x,y} P(x, y) log [ P(x, y) / (P(x)P(y)) ] (59)
Using Bayes' rule, it is possible to rewrite Eq. (59) as
I(X; Y) = ∑_{x,y} P(x, y) log [ P(x|y) / P(x) ]
        = −∑_{x,y} P(x, y) log P(x) + ∑_{x,y} P(x, y) log P(x|y)
        = −∑_x P(x) log P(x) − ( −∑_{x,y} P(x, y) log P(x|y) )
        = H(X) − H(X|Y),
so the mutual information quantifies by how much the uncertainty on X is reduced by the knowledge of Y.
2.4 Channel capacity
How can we model an information exchange between two parties? Let's say that we have two parties, Alice and Bob, where Alice has a random variable X which contains the information she wants to send, and Bob has a random variable Y that represents the information he receives. The relation between X and Y is called a communication channel. In this course, we will only consider X to be a binary variable, so that X = {0, 1}.
For instance, let's say that X is faithfully transmitted to Bob; this is the perfect channel:
[Diagram: the perfect channel, mapping 0 → 0 and 1 → 1 with probability 1.]
Then we might consider a channel, called the binary symmetric channel (BSC), that flips the input bit with some probability p:
[Diagram: the binary symmetric channel BSC(p), mapping 0 → 0 and 1 → 1 with probability 1 − p, and 0 → 1 and 1 → 0 with probability p.]
Shannon also asked the question of how much information can be transmitted through a channel, and he proved that the maximal information transmitted through one use of the channel is upper-bounded by the channel capacity:
C = max_{P_X} I(X; Y) (66)
which is the maximum of the mutual information over all the sending strategies.
We will show in the tutorials that the capacity of the BSC(p) channel is given by C = 1 − h(p).
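This result can be checked numerically with a brute-force sketch (illustrative only: the input distribution is swept over a grid rather than optimised analytically):

```python
import numpy as np

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def mutual_information(q, p):
    """I(X;Y) when P(X=1) = q is sent through a BSC with flip probability p."""
    P_xy = np.array([[(1 - q) * (1 - p), (1 - q) * p],
                     [q * p,             q * (1 - p)]])
    P_x, P_y = P_xy.sum(axis=1), P_xy.sum(axis=0)
    return np.sum(P_xy * np.log2(P_xy / np.outer(P_x, P_y)))

p = 0.1
capacity = max(mutual_information(q, p) for q in np.linspace(0.001, 0.999, 999))
print(capacity, 1 - binary_entropy(p))   # both ~0.531 bits per channel use
```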
If we use the Shannon entropy of a random variable to assess its quality, we might have an issue with the fact that the Shannon entropy measures the average information content: consider a random variable X on {0, 1}^n with
P_X(x) = 1/2 if x = 1...1, and P_X(x) = (1/2) · 1/(2^n − 1) otherwise. (67)
You can check that the Shannon entropy is relatively high (≃ n/2), but if you were to guess the value of x, it would be easy to be right half of the time by always guessing x = 1...1.
Hence, in a cryptographic setting, we are sometimes more interested in the min-entropy, H_min(X) = −log max_x P_X(x), which corresponds to the uncertainty in guessing the value of the variable with the best possible strategy.
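A short Python sketch contrasting the two measures on the distribution of eq. (67) (n = 20 is an arbitrary illustrative choice):

```python
import numpy as np

n = 20
p_top = 0.5                       # probability of the all-ones string
p_rest = 0.5 / (2 ** n - 1)       # probability of each of the other strings

# Shannon entropy: average information content, roughly n/2 bits here
H = -p_top * np.log2(p_top) - (2 ** n - 1) * p_rest * np.log2(p_rest)
# Min-entropy: -log2 of the best guessing probability, only 1 bit here
H_min = -np.log2(max(p_top, p_rest))

print(H)      # ~11.0 bits for n = 20
print(H_min)  # 1.0 bit
```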
References
[1] Ramona Wolf. Quantum Key Distribution. Springer International Publishing, 2021. ISBN 9783030739904, 9783030739911. doi: 10.1007/978-3-030-73991-1. URL http://dx.doi.org/10.1007/978-3-030-73991-1.
[2] Thomas Vidick and Stephanie Wehner. Introduction to Quantum Cryptography. Cambridge University Press, 2023.
[3] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. Wiley, 2005. ISBN 9780471241959, 9780471748823. doi: 10.1002/047174882x. URL http://dx.doi.org/10.1002/047174882x.
[4] Joseph M. Renes. Lecture notes on quantum information theory, 2014. URL https://edu.itp.phys.ethz.ch/hs15/QIT/renes_lecture_notes14.pdf.