Lecture 3
The numbers of half-edges on the two sides must match:

n \sum_{i=1}^{d_max} i L_i = m \sum_{i=1}^{d_max} i R_i .    (1)
A graph from the ensemble (n, L, R) is generated using the configuration model: we draw nL_i variable nodes with i half-edges, for each i ∈ {2, . . . , d_max}, and mR_i check nodes with i half-edges, for each i ∈ {2, . . . , d_max}. We then match the two sets of half-edges according to a uniformly random permutation over N = n \sum_{i=1}^{d_max} i L_i objects.
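As a concrete illustration, here is a minimal sketch of this sampling procedure in Python (the function name and interface are hypothetical; nL_i and mR_i are assumed to be integers):

    import random

    def sample_graph(n, m, L, R):
        # L and R map a degree i to the fraction of variable (resp. check)
        # nodes of degree i; n*L[i] and m*R[i] are assumed to be integers.
        var_stubs, chk_stubs = [], []
        v = 0
        for i, Li in L.items():
            for _ in range(round(n * Li)):
                var_stubs.extend([v] * i)     # one stub per half-edge
                v += 1
        c = 0
        for i, Ri in R.items():
            for _ in range(round(m * Ri)):
                chk_stubs.extend([c] * i)
                c += 1
        assert len(var_stubs) == len(chk_stubs)  # edge counts balance, cf. Eq. (1)
        random.shuffle(chk_stubs)                # uniformly random matching
        return list(zip(var_stubs, chk_stubs))   # edges as (variable, check) pairs

    # Example: a graph from the (3,6)-regular ensemble with n = 1000, m = 500.
    edges = sample_graph(1000, 500, {3: 1.0}, {6: 1.0})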
It is customary (and convenient) to encode the two distributions as generating polynomials:
L(x) = \sum_{i=0}^{d_max} L_i x^i ,    R(x) = \sum_{i=0}^{d_max} R_i x^i .    (2)
Note that interesting quantities can be easily expressed in terms of these polynomials. For instance, the average variable node degree is L'(1) and the average check node degree is R'(1). The rate is given by

r = 1 − \frac{L'(1)}{R'(1)} .    (3)

Also, the number of edges in the graph is N = L'(1) n = R'(1) m.
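These quantities are immediate to compute numerically. A minimal sketch (the helper names and the coefficient-list convention, with entry i holding the coefficient of x^i, are hypothetical):

    def poly_eval(coeffs, x):
        return sum(c * x**i for i, c in enumerate(coeffs))

    def poly_deriv(coeffs):
        # d/dx sum_i c_i x^i = sum_i i c_i x^(i-1)
        return [i * c for i, c in enumerate(coeffs)][1:]

    # (3,6)-regular example: L(x) = x^3, R(x) = x^6
    L = [0, 0, 0, 1.0]
    R = [0, 0, 0, 0, 0, 0, 1.0]
    avg_var_deg = poly_eval(poly_deriv(L), 1.0)   # L'(1) = 3
    avg_chk_deg = poly_eval(poly_deriv(R), 1.0)   # R'(1) = 6
    rate = 1 - avg_var_deg / avg_chk_deg          # Eq. (3): r = 1/2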
Another equivalent way of describing the graph consists in assigning the edge-perspective degree distributions

λ(x) = \frac{L'(x)}{L'(1)} ,    ρ(x) = \frac{R'(x)}{R'(1)} .    (4)
These are polynomials λ(x) = \sum_i λ_i x^{i−1} and ρ(x) = \sum_i ρ_i x^{i−1}, where

λ_i = \frac{i L_i}{\sum_j j L_j} ,    ρ_i = \frac{i R_i}{\sum_j j R_j} .    (5)
Note that (λ_i)_{2 ≤ i ≤ d_max} and (ρ_i)_{2 ≤ i ≤ d_max} are probability distributions (normalized non-negative vectors). They are sometimes called the size-biasing of L, R. Note that the node perspective degree distributions can be recovered from the edge perspective ones, via
L(x) = \frac{\int_0^x λ(y) dy}{\int_0^1 λ(y) dy} ,    R(x) = \frac{\int_0^x ρ(y) dy}{\int_0^1 ρ(y) dy} .    (6)
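Numerically, the size-biasing (5) and its inverse (6) amount to multiplying or dividing each coefficient by its degree and renormalizing. A sketch under the same (hypothetical) coefficient-list convention as above:

    def edge_perspective(node_coeffs):
        # Eq. (5): lambda_i = i L_i / sum_j j L_j; entry j of the result
        # holds the coefficient of x**j, i.e. lambda_{j+1}.
        Z = sum(i * c for i, c in enumerate(node_coeffs))
        return [i * c / Z for i, c in enumerate(node_coeffs)][1:]

    def node_perspective(edge_coeffs):
        # Eq. (6): integrating lambda multiplies the coefficient of x**j
        # by 1/(j+1); normalizing by the integral over [0,1] gives L.
        raw = [0.0] + [c / (j + 1) for j, c in enumerate(edge_coeffs)]
        Z = sum(raw)
        return [c / Z for c in raw]

    lam = edge_perspective([0, 0, 0, 1.0])   # L(x) = x^3  ->  lambda(x) = x^2
    L_back = node_perspective(lam)           # recovers [0, 0, 0, 1.0]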
Imagine now you draw a uniformly random edge in the graph G (out of the N possible edges). What is
the probability that the adjacent variable node has degree i? There are nL_i i edges adjacent to a variable node of degree i, hence this probability is

\frac{n L_i i}{N} = \frac{n L_i i}{n L'(1)} = λ_i .    (7)
Analogously ρi is the probability that, when you draw a uniformly random edge, the adjacent check node
has degree i. This is why they are called ‘edge perspective degree distributions.’
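This can be checked empirically, reusing the sample_graph sketch from above (the mixed ensemble is a hypothetical example):

    import collections

    # Half the variable nodes of degree 2, half of degree 4, so L'(1) = 3
    # and, by Eq. (5), lambda_2 = 1/3 and lambda_4 = 2/3.
    edges = sample_graph(10000, 5000, {2: 0.5, 4: 0.5}, {6: 1.0})
    var_deg = collections.Counter(v for v, _ in edges)
    # Fraction of edges adjacent to a degree-i variable node:
    hist = collections.Counter(var_deg[v] for v, _ in edges)
    for i, cnt in sorted(hist.items()):
        print(i, cnt / len(edges))   # 1/3 and 2/3, i.e. lambda_2 and lambda_4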
Consider next a slightly different experiment. Draw a uniformly random variable node, and look at its neighboring check nodes. A calculation similar to the one given above shows that they will be distinct and their degrees will be asymptotically i.i.d. with distribution ρ. The neighbors of these check nodes are themselves asymptotically distinct and their degrees i.i.d. with distribution λ.
This argument can be extended to any neighborhood B_i(t) of a random variable node i. This is a random rooted graph whose distribution converges to that of a random rooted tree which we will call T_{λ,ρ}(t). This is a bipartite tree with offspring distributions λ, for variable nodes (except the root), and ρ, for check nodes. More precisely, the probability for a variable node to have i − 1 descendants (and thus i neighbors) is λ_i, while for a check node it is ρ_i. The degree distribution of the root is L.
This type of convergence is known in probability theory as ‘local weak convergence’ or ‘Benjamini–Schramm convergence’.
2 Density evolution
Using the local weak convergence of the graph G, we can derive the density evolution recursion

x_{t+1} = ϵ λ(1 − ρ(1 − x_t)) ,    (8)

with initial condition x_0 = 1. In particular, we obtain the following general formula for the threshold of a random code from the (n, L, R) ensemble:

ϵ*(λ, ρ) = \inf_{x ∈ (0,1)} \frac{x}{λ(1 − ρ(1 − x))} .    (9)
The density evolution recursion can also be written as λ^{−1}(x_{t+1}/ϵ) = 1 − ρ(1 − x_t). Notice that we have

\int_0^1 λ(x) dx = \frac{1}{L'(1)} ,    \int_0^1 [1 − ρ(1 − x)] dx = 1 − \frac{1}{R'(1)} .    (10)
If we represent these relations graphically, they imply that we cannot obtain vanishing bit error rate unless ϵ < L'(1)/R'(1) = 1 − r. While we knew this already from Shannon's channel coding theorem, we obtained an independent proof. Also, this proof gives a crucial insight into how we should design the degree distributions if we want to achieve capacity. Namely, we should match the two curves so that λ^{−1}(x/ϵ) ≈ 1 − ρ(1 − x) for all x ∈ (0, 1).
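For concreteness, here is a minimal numerical sketch of the recursion (8) and of the threshold formula (9) (the function names are hypothetical, and the grid minimization is a crude stand-in for the infimum):

    def density_evolution(eps, lam, rho, t_max=10**4, tol=1e-12):
        x = 1.0                                # initial condition x_0 = 1
        for _ in range(t_max):
            x_new = eps * lam(1 - rho(1 - x))  # Eq. (8)
            if abs(x_new - x) < tol:
                break
            x = x_new
        return x

    def bp_threshold(lam, rho, grid=10**5):
        # Eq. (9): the infimum of x / lambda(1 - rho(1 - x)) over (0, 1)
        return min(x / lam(1 - rho(1 - x))
                   for x in (i / grid for i in range(1, grid)))

    lam = lambda x: x**2   # (3,6)-regular ensemble
    rho = lambda x: x**5
    print(bp_threshold(lam, rho))              # approx. 0.4294
    print(density_evolution(0.42, lam, rho))   # below threshold: tends to 0
    print(density_evolution(0.44, lam, rho))   # above: bounded away from 0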
3 A capacity achieving degree sequence
It turns out that for any given degree distributions λ, ρ with bounded support the threshold erasure probability (to be denoted as ϵ*(λ, ρ)) is strictly less than the information theoretic bound 1 − r. We will thus resort to a sequence of degree distributions, to be denoted by {(λ^(k), ρ^(k))}. The design rate of a degree distribution pair will be denoted as r(λ, ρ).
Let ρ^(k)(z) = z^{k−1} and λ̂^(k)(z) = (1/ε)[1 − (1 − z)^{1/(k−1)}]. It follows from the Taylor expansion of (1 + x)^α that

\hat{λ}_l^{(k)} = \frac{(−1)^l \, Γ(\frac{1}{k−1} + 1)}{ε \, Γ(l) \, Γ(\frac{1}{k−1} − l + 2)} .    (11)
We claim that: (i) ϵ*(λ^(k), ρ^(k)) > ε. Hence this ensemble achieves reliable communication for any ϵ ≤ ε. (ii) lim_{k→∞} r(λ^(k), ρ^(k)) = 1 − ε. Hence the ensemble is capacity achieving.
In order to prove these claims we proceed as follows:
1. Show that ελ^(k)(1 − ρ^(k)(1 − x)) < x for all x ∈ (0, 1], and, as a consequence, ϵ*(λ^(k), ρ^(k)) > ε.
Notice indeed that the coefficients λ̂_l^(k) in Eq. (11) are non-negative; since λ^(k) is obtained from λ̂^(k) by truncating it at degree L(k, ε) and normalizing by z_{L(k,ε)} = \sum_{l=2}^{L(k,ε)} λ̂_l^(k), we have λ^(k)(x) ≤ λ̂^(k)(x)/z_{L(k,ε)}. Therefore

ε λ^(k)(1 − ρ^(k)(1 − x)) ≤ \frac{ε}{z_{L(k,ε)}} λ̂^(k)(1 − ρ^(k)(1 − x))    (14)
                          = \frac{1}{z_{L(k,ε)}} x < x ,    (15)

where we used λ̂^(k)(1 − ρ^(k)(1 − x)) = x/ε, and the last inequality holds provided L(k, ε) is chosen large enough that z_{L(k,ε)} > 1.
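The following sketch evaluates the coefficients (11) and checks the fixed-point condition numerically (the truncation level l_max standing in for L(k, ε) and the particular values of k, ε are hypothetical choices; k ≥ 3 keeps the Gamma arguments away from the poles):

    from math import gamma

    def lambda_hat_coeffs(k, eps, l_max):
        # Eq. (11); entry l is the coefficient of x**(l-1) in lambda-hat^(k)
        a = 1.0 / (k - 1)
        return {l: (-1)**l * gamma(a + 1) / (eps * gamma(l) * gamma(a - l + 2))
                for l in range(2, l_max + 1)}

    k, eps = 6, 0.45
    coeffs = lambda_hat_coeffs(k, eps, l_max=100)
    z = sum(coeffs.values())               # normalization z_L(k, eps)
    assert all(c > 0 for c in coeffs.values()) and z > 1

    lam = lambda x: sum(c * x**(l - 1) for l, c in coeffs.items()) / z
    rho = lambda x: x**(k - 1)

    # eps * lambda^(k)(1 - rho^(k)(1 - x)) < x on (0, 1], cf. Eqs. (14)-(15)
    assert all(eps * lam(1 - rho(1 - x)) < x
               for x in (i / 1000 for i in range(1, 1001)))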
4 Error floor and outer coding
Capacity achieving LDPC codes often present a bad ‘error floor’ behavior. Namely, the bit error probability below threshold vanishes very slowly. Indeed one often gets

P_b(n, ϵ) = \frac{K(ϵ)}{n} + o(1/n) ,    (17)

and the block error rate stays bounded away from 0 as n → ∞.
Outer coding is a standard approach to address this problem. Before transmitting through the channel, we encode the information message with a code that has high rate r0, but very good behavior at low noise (for instance, a code with good minimum distance). This code is not required to be capacity achieving; it can itself be a regular LDPC code, or an algebraic code.
We then encode the resulting message using a capacity achieving LDPC code with rate r1. Notice that the two blocklengths do not need to match, as we can put together several blocks of the outer code before encoding with the inner code.
At decoding, we first decode the inner code and then the outer code. The total rate is r = r0 r1 .
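For instance, an outer code of rate r0 = 0.99 combined with an inner LDPC code of rate r1 = 1/2 gives total rate r = r0 r1 = 0.495: a small rate loss buys an outer code that can clean up the residual erasures left by the inner decoder.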
One can think of the inner LDPC code as turning the channel BEC(ϵ) into a channel with very small erasure probability (in fact, with erasure probability of order 1/n if Eq. (17) is correct).
5 Rateless codes
Given an information sequence z ∈ {0, 1}k , a rateless code produces an infinite sequence of bits x = (xi )i≥1 .
These are transmitted through a noisy channel which, as we have done so far, we will assume to be a BEC(ϵ).
We would like to be able to reconstruct the original information sequence as soon as we collect n = k(1 + δ)
of the transmitted symbols, for some small δ. The parameter δ is the code overhead.
This can be achieved using a type of ‘low density generating matrix’ (LDGM) codes called LT codes (for Luby Transform). For each i ≥ 1, we sample an integer ℓ_i with distribution (Ω_ℓ)_{ℓ≥1} and then transmit the modulo two sum of ℓ_i information bits chosen uniformly at random. We will denote by Ω(x) = \sum_ℓ Ω_ℓ x^ℓ the corresponding generating function, and define γ = Ω'(1)(1 + δ) and ω(x) = Ω'(x)/Ω'(1).
This defines a sparse random graph that we can use as a basis for a message passing decoder. Summarizing, density evolution reads

z_{t+1} = \exp\{ −(1 + δ) \, Ω'(1 − z_t) \} .    (20)
Notice that the bit error rate is bounded away from zero if Ω'(1) < ∞, and hence in order to achieve vanishing bit error rate we need the average degree to diverge. Matching the two sides of the density evolution equation at δ = 0 yields Ω'(x) = − log(1 − x), and hence

Ω(x) = \sum_{ℓ=2}^{∞} \frac{x^ℓ}{ℓ(ℓ − 1)} .    (21)
In other words, for each transmitted bit, we draw an integer ℓ ≥ 2 with probability Ω_ℓ = 1/(ℓ(ℓ − 1)), draw ℓ uniformly random information bits and transmit their modulo two sum.
In practice this type of construction suffers from problems at finite blocklength, and hence it is modified by truncating the distribution at a large ℓ_max and rescaling it for ℓ ≤ ℓ_max. Also, the residual bit error rate can be improved by outer coding (an approach known as ‘raptor codes’).
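Here is a numerical sketch of the recursion (20) with the distribution (21) truncated and rescaled in this way (ℓ_max and the overhead δ are hypothetical choices; we start the iteration slightly below the trivial fixed point z = 1, which in practice is the role played by degree-one output symbols):

    from math import exp

    l_max = 1000
    # Eq. (21) truncated at l_max and rescaled to sum to one
    omega = {l: 1.0 / (l * (l - 1)) for l in range(2, l_max + 1)}
    Z = sum(omega.values())
    omega = {l: w / Z for l, w in omega.items()}

    def omega_prime(x):
        # Omega'(x) = sum_l l Omega_l x**(l-1), approx. -log(1-x) here
        return sum(l * w * x**(l - 1) for l, w in omega.items())

    def lt_density_evolution(delta, t_max=300):
        z = 0.999   # start just below the trivial fixed point z = 1
        for _ in range(t_max):
            z = exp(-(1 + delta) * omega_prime(1 - z))   # Eq. (20)
        return z

    print(lt_density_evolution(delta=0.05))   # small residual erasure
                                              # fraction, shrinking with l_max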
Summary
At the end of this week you should know:
1. How to analyze standard irregular ensembles on the erasure channel through density evolution.
2. How to optimize irregular ensembles over the erasure channel, and construct capacity-achieving se-
quences.
3. What a rateless code is, and how the LT construction relates to LDGM codes.
This material can be found in Section 3.15 of MCT, page 109 onwards.
Homework
Write a program that generates a random graph from the (λ, ρ) ensemble. Use it to do simulations with the
ensemble (λ(k) , ρ(k) ) defined above for k = 4, 6, 8, 10. More precisely: (i) Choose a value of ε; (ii) Compute
the corresponding degree distribution (λ(k) , ρ(k) ); (iii) Generate a random graph with the prescribed degree
distribution and blocklength n of your choice; (iv) Compute bit error probability curves for transmission
over BEC(ϵ).
I expect to receive