Fountain Codes: Capacity Approaching Codes Design and Implementation Special Section
SPECIAL SECTION
Fountain codes
D.J.C. MacKay
Abstract: Fountain codes are record-breaking sparse-graph codes for channels with erasures, such
as the internet, where files are transmitted in multiple small packets, each of which is either received
without error or not received. Standard file transfer protocols simply chop a file up into K packet-
sized pieces, then repeatedly transmit each packet until it is successfully received. A back channel is
required for the transmitter to find out which packets need retransmitting. In contrast, fountain
codes make packets that are random functions of the whole file. The transmitter sprays packets at
the receiver without any knowledge of which packets are received. Once the receiver has received
any N packets, where N is just slightly greater than the original file size K, the whole file can be
recovered. In this paper, random linear fountain codes, LT codes, and raptor codes are reviewed.
The computational costs of the best fountain codes are astonishingly small, scaling linearly with the
file size.
A receiver holds a bucket under the fountain, collecting encoded packets until the number of packets in the bucket is a little larger than K. They can then recover the original file.
Fountain codes are rateless in the sense that the number
of encoded packets that can be generated from the source
message is potentially limitless; and the number of encoded
packets generated can be determined on the fly. Fountain
codes are universal because they are simultaneously near-
optimal for every erasure channel. Regardless of the statistics of the erasure events on the channel, we can send as many encoded packets as are needed in order for the decoder to recover the source data. The source data can be decoded from any set of K' encoded packets, for K' slightly larger than K. Fountain codes can also have fantastically small encoding and decoding complexities.

Fig. 2 The generator matrix of a random linear code
When the packets are transmitted, some are not received, shown by the grey shading of the packets and the corresponding columns in the matrix. We can realign the columns to define the generator matrix from the point of view of the receiver (bottom)
To start with, we will study the simplest fountain codes,
which are random linear codes.
3 The random linear fountain

Consider the following encoder for a file of size K packets s_1, s_2, ..., s_K. A 'packet' here is the elementary unit that is either transmitted intact or erased by the erasure channel. We will assume that a packet is composed of a whole number of bits.

At each clock cycle, labelled by n, the encoder generates K random bits {G_{kn}}, and the transmitted packet t_n is set to the bitwise sum, modulo 2, of the source packets for which G_{kn} is 1:

t_n = \sum_{k=1}^{K} s_k G_{kn}    (1)

This sum can be done by successively exclusive-or-ing the packets together. You can think of each set of K random bits as defining a new column in an ever-growing binary generator matrix, as shown at the top of Fig. 2.

Now, the channel erases a bunch of the packets; a receiver, holding out his bucket, collects N packets. What is the chance that the receiver will be able to recover the entire source file without error? Let us assume that he knows the fragment of the generator matrix G associated with his packets; for example, maybe G was generated by a deterministic random-number generator, and the receiver has an identical generator that is synchronised to the encoder's. Alternatively, the sender could pick a random key, k_n, given which the K bits {G_{kn}}_{k=1}^{K} are determined by a pseudo-random process, and send that key in the header of the packet. As long as the packet size l is much bigger than the key size (which need only be 32 bits or so), this key introduces only a small overhead cost. In some applications, every packet will already have a header for other purposes, which the fountain code can use as its key. For brevity, let's call the K-by-N matrix fragment 'G' from now on.

Now, as we were saying, what is the chance that the receiver will be able to recover the entire source file without error?

If N < K, the receiver has not got enough information to recover the file. If N = K, it is conceivable that he can recover the file. If the K-by-K matrix G is invertible (modulo 2), the receiver can compute the inverse G^{-1} by Gaussian elimination, and recover

s_k = \sum_{n=1}^{N} t_n G^{-1}_{nk}    (2)

So, what is the probability that a random K-by-K binary matrix is invertible? It is the product of K probabilities, each of them the probability that a new column of G is linearly independent of the preceding columns. The first factor is (1 - 2^{-K}), the probability that the first column of G is not the all-zero column. The second factor is (1 - 2^{-(K-1)}), the probability that the second column of G is equal neither to the all-zero column nor to the first column of G, whatever non-zero column it was. Iterating, the probability of invertibility is

(1 - 2^{-K})(1 - 2^{-(K-1)}) ... (1 - 1/8)(1 - 1/4)(1 - 1/2)

which is 0.289, for any K larger than 10. That is not great (we would have preferred 0.999!) but it is promisingly close to 1.
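To make this concrete, here is a minimal Python sketch of a random linear fountain. It is purely illustrative, with function names and parameters of our own choosing: each packet is a byte string, the column of G for a packet is derived pseudo-randomly from a key that would travel in the packet header, and the receiver decodes by Gauss-Jordan elimination over GF(2).

import random

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(source, key):
    # One fountain packet: derive K selection bits pseudo-randomly from the key,
    # then XOR together the selected source packets.  Only (key, payload) would
    # actually be transmitted; the receiver regenerates the bits from the key.
    rng = random.Random(key)
    bits = [rng.randrange(2) for _ in range(len(source))]
    payload = bytes(len(source[0]))
    for bit, s in zip(bits, source):
        if bit:
            payload = xor_bytes(payload, s)
    return bits, payload

def decode(received, K):
    # Gauss-Jordan elimination over GF(2).  `received` is a list of
    # (selection_bits, payload) pairs; returns the K source packets, or None
    # if the received columns do not yet span GF(2)^K (need more packets).
    rows = [[bits[:], payload] for bits, payload in received]
    pivot, used = {}, set()
    for k in range(K):
        i = next((i for i, r in enumerate(rows)
                  if i not in used and r[0][k] == 1), None)
        if i is None:
            return None
        used.add(i)
        pivot[k] = rows[i]
        pbits, ppay = rows[i]
        for j, r in enumerate(rows):
            if j != i and r[0][k] == 1:
                r[0] = [a ^ b for a, b in zip(r[0], pbits)]
                r[1] = xor_bytes(r[1], ppay)
    # After elimination, the pivot row for column k carries the packet s_k.
    return [pivot[k][1] for k in range(K)]

if __name__ == "__main__":
    K = 16
    source = [bytes(random.randrange(256) for _ in range(8)) for _ in range(K)]
    received, key = [], 0
    while True:
        key += 1
        bits, payload = encode(source, key)
        if random.random() < 0.3:        # the channel erases ~30% of packets
            continue
        received.append((bits, payload))
        if len(received) >= K and decode(received, K) == source:
            break
    print(f"recovered the file from N = {len(received)} packets (K = {K})")

When exactly N = K packets have been collected, the elimination succeeds with probability about 0.289, as computed above, so the demonstration loop typically needs only a few excess packets before it terminates.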
In summary, the number of packets required to have probability 1 - δ of success is approximately K + log_2(1/δ). The expected encoding cost per packet is K/2 packet operations, since on average half of the packets must be added up (a packet operation is the exclusive-or of two packets of size l bits). The expected decoding cost is the sum of the cost of the matrix inversion, which is about K^3 binary operations, and the cost of applying the inverse to the received packets, which is about K^2/2 packet operations.

While a random code is not in the technical sense a 'perfect' code for the erasure channel (it has only a chance of 0.289 of recovering the file when K packets have arrived), it is almost perfect. An excess of E packets increases the probability of success to at least (1 - δ), where δ = 2^{-E}. Thus, as the file size K increases, random linear fountain codes can get arbitrarily close to the Shannon limit. The only bad news is that their encoding and decoding costs are quadratic and cubic in the number of packets encoded. This scaling is not important if K is small (less than one thousand, say); but we would prefer a solution with lower computational cost.

Fig. 3 Performance of the random linear fountain
The solid line shows the probability that complete decoding is not possible as a function of the number of excess packets, E. The thin dashed line shows the upper bound, 2^{-E}, on the probability of error

4 Intermission

Before we study better fountain codes, it will help to solve the following exercises. Imagine that we throw balls independently at random into K bins, where K is a large number such as 1000 or 10 000.

1. After N = K balls have been thrown, what fraction of the bins do you expect have no balls in them?

The probability that one particular bin is empty after N balls have been thrown is (1 - 1/K)^N, which is roughly e^{-N/K}. So when N = K, the probability that one particular bin is empty is roughly 1/e, and the fraction of empty bins must be roughly 1/e too. If we throw a total of 3K balls, the empty fraction drops to 1/e^3, about 5%. We have to throw a lot of balls to make sure all the bins have a ball! For general N, the expected number of empty bins is

K e^{-N/K}    (5)

This expected number is a small number δ (which roughly implies that the probability that all bins have a ball is (1 - δ)) only if

N > K \log_e(K/δ)    (6)
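A quick numerical check of (5) and (6), as an illustrative sketch with parameter values of our own choosing:

import math

K = 10_000
for N in (K, 3 * K):
    print(f"N = {N}: expected empty bins ~ {K * math.exp(-N / K):.0f}")

delta = 0.01                        # target expected number of empty bins
N_needed = K * math.log(K / delta)  # bound (6): N > K log_e(K / delta)
print(f"delta = {delta}: need N > {N_needed:.0f} balls, i.e. about {N_needed / K:.1f} K")

For K = 10 000 this prints roughly 3679 and 498 empty bins for N = K and N = 3K, and shows that nearly 14K balls are needed before the expected number of empty bins falls to 0.01; the factor log_e K is what makes covering every bin expensive.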
5 The LT code

The LT code retains the good performance of the random linear fountain code, while drastically reducing the encoding and decoding complexities. You can think of the LT code as a sparse random linear fountain code, with a super-cheap approximate decoding algorithm.

5.1 Encoder
Each encoded packet t_n is produced from the source file s_1, s_2, s_3, ..., s_K as follows:
1. Randomly choose the degree d_n of the packet from a degree distribution ρ(d); the appropriate choice of ρ depends on the source file size K, as we will discuss later.
2. Choose, uniformly at random, d_n distinct input packets, and set t_n equal to the bitwise sum, modulo 2, of those d_n packets.

This encoding operation defines a graph connecting encoded packets to source packets. If the mean degree d̄ is significantly smaller than K then the graph is sparse. We can think of the resulting code as an irregular low-density generator-matrix code.

5.2 Decoder
Decoding a sparse-graph code is especially easy in the case of an erasure channel. The decoder's task is to recover s from t = sG, where G is the matrix associated with the graph (just as in the random linear fountain code, we assume the decoder somehow knows the pseudorandom matrix G).

The simple way to attempt to solve this problem is by message passing. We can think of the decoding algorithm as
the sum-product algorithm [5, Chaps. 16, 26 and 47] if we wish, but all messages are either completely uncertain or completely certain. Uncertain messages assert that a message packet s_k could have any value, with equal probability; certain messages assert that s_k has a particular value, with probability one.

This simplicity of the messages allows a simple description of the decoding process. We will call the encoded packets t_n check nodes.

1. Find a check node t_n that is connected to only one source packet s_k (if there is no such check node, this decoding algorithm halts at this point, and fails to recover all the source packets).
(a) Set s_k = t_n.
(b) Add s_k to all checks t_{n'} that are connected to s_k: t_{n'} := t_{n'} + s_k for all n' such that G_{n'k} = 1.
(c) Remove all the edges connected to the source packet s_k.
2. Repeat (1) until all s_k are determined.
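A minimal Python sketch of this encoder and peeling decoder follows. It is illustrative only, with names of our own choosing: sample_degree stands in for whichever degree distribution ρ(d) is chosen, and a real system would transmit a key from which the receiver regenerates each packet's neighbour set.

import random

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def lt_encode(source, sample_degree):
    # One LT packet: draw a degree d_n, pick d_n distinct source packets
    # uniformly at random, and XOR them together.
    d = sample_degree()
    neighbours = random.sample(range(len(source)), d)
    payload = bytes(len(source[0]))
    for k in neighbours:
        payload = xor_bytes(payload, source[k])
    return set(neighbours), payload

def lt_decode(received, K):
    # Peeling decoder: repeatedly take a check node connected to a single
    # source packet, set that packet, and XOR it out of every other check.
    checks = [[set(nbrs), payload] for nbrs, payload in received]
    decoded = [None] * K
    ripple = [c for c in checks if len(c[0]) == 1]
    while ripple:
        check = ripple.pop()
        if len(check[0]) != 1:
            continue                     # already resolved via another check
        k = check[0].pop()
        if decoded[k] is None:
            decoded[k] = check[1]
            for other in checks:
                if k in other[0]:
                    other[0].discard(k)
                    other[1] = xor_bytes(other[1], decoded[k])
                    if len(other[0]) == 1:
                        ripple.append(other)
    return decoded                       # None entries were not recovered

if __name__ == "__main__":
    K = 1000
    source = [bytes([random.randrange(256)]) for _ in range(K)]

    def sample_degree():
        # Crude stand-in for rho(d): mostly low degrees, occasionally degree K.
        # It is NOT the robust soliton distribution of Section 5.3.
        return K if random.random() < 0.01 else random.choice([1, 2, 2, 3, 4])

    received = [lt_encode(source, sample_degree) for _ in range(int(1.3 * K))]
    decoded = lt_decode(received, K)
    print(sum(d is not None for d in decoded), "of", K, "source packets recovered")

The placeholder degree distribution here will generally recover only part of the file; how well the peeling decoder does depends entirely on the choice of ρ(d), which is the subject of Section 5.3.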
This decoding process is illustrated in Fig. 4 for a toy case where each packet is just one bit. There are three source packets (shown by the upper circles) and four received packets (shown by the lower check symbols), which have the values t_1 t_2 t_3 t_4 = 1011 at the start of the algorithm. At the first iteration, the only check node that is connected to a sole source bit is the first check node (panel a). We set that source bit s_1 accordingly (panel b), discard the check node, then add the value of s_1 (1) to the checks to which it is connected (panel c), disconnecting s_1 from the graph. At the start of the second iteration (panel c), the fourth check node is connected to a sole source bit, s_2. We set s_2 to t_4 (0, in panel d), and add s_2 to the two checks it is connected to (panel e). Finally, we find that two check nodes are both connected to s_3, and they agree about the value of s_3 (as we would hope!), which is restored in panel f.

Fig. 4 Example decoding for a fountain code with K = 3 source bits and N = 4 encoded bits
From [5]

5.3 Designing the degree distribution
The probability distribution ρ(d) of the degree is a critical part of the design: occasional encoded packets must have high degree (i.e., d similar to K) in order to ensure that there are not some source packets that are connected to no-one. Many packets must have low degree, so that the decoding process can get started, and keep going, and so that the total number of addition operations involved in the encoding and decoding is kept small. For a given degree distribution ρ(d), the statistics of the decoding process can be predicted by an appropriate version of density evolution, a technique first developed for low-density parity-check codes [5, p. 566].

Before giving Luby's choice for ρ(d), let us think about the rough properties that a satisfactory ρ(d) must have. The encoding and decoding complexity are both going to scale linearly with the number of edges in the graph, so the crucial quantity is the average degree of the packets. How small can this be? The balls-in-bins exercise helps here: think of the edges that we create as the balls and the source packets as the bins. In order for decoding to be successful, every source packet must surely have at least one edge in it. The encoder throws edges into source packets at random, so the number of edges must be at least of order K log_e K. If the number of packets received is close to Shannon's optimal K, and decoding is possible, the average degree of each packet must be at least log_e K, and the encoding and decoding complexity of an LT code will definitely be at least K log_e K. Luby showed that this bound on complexity can indeed be achieved by a careful choice of degree distribution.

Ideally, to avoid redundancy, we would like the received graph to have the property that just one check node has degree one at each iteration. At each iteration, when this check node is processed, the degrees in the graph are reduced in such a way that one new degree-one check node appears. In expectation, this ideal behaviour is achieved by the ideal soliton distribution,

ρ(1) = 1/K
ρ(d) = 1/(d(d - 1))   for d = 2, 3, ..., K    (7)

The expected degree under this distribution is roughly log_e K.
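As an illustrative check of (7) (our own snippet, with K chosen arbitrarily), the ideal soliton distribution can be tabulated directly; its probabilities telescope to a total of 1, and its mean comes out close to log_e K:

import math

def ideal_soliton(K):
    # Equation (7): rho(1) = 1/K, rho(d) = 1/(d(d-1)) for d = 2, ..., K
    rho = [0.0] * (K + 1)
    rho[1] = 1.0 / K
    for d in range(2, K + 1):
        rho[d] = 1.0 / (d * (d - 1))
    return rho

K = 10_000
rho = ideal_soliton(K)
print(f"total probability = {sum(rho):.6f}")                     # telescopes to 1.000000
mean = sum(d * rho[d] for d in range(1, K + 1))
print(f"mean degree = {mean:.2f}, log_e K = {math.log(K):.2f}")   # about 9.79 vs 9.21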
This degree distribution works poorly in practice, because fluctuations around the expected behaviour make it very likely that at some point in the decoding process there will be no degree-one check nodes; and, furthermore, a few source nodes will receive no connections at all. A small modification fixes these problems.

The robust soliton distribution has two extra parameters, c and δ; it is designed to ensure that the expected number of degree-one checks is about

S ≡ c \log_e(K/δ) \sqrt{K}    (8)

rather than 1, throughout the decoding process. The parameter δ is a bound on the probability that the decoding fails to run to completion after a certain number K' of packets have been received.
Fig. 5 The distributions ρ(d) and τ(d) for the case K = 10 000, c = 0.2, δ = 0.05, which gives S = 244, K/S = 41, and Z ≈ 1.3
The distribution τ is largest at d = 1 and d = K/S. From [5]

Luby's analysis [3] explains how the small-d end of τ has the role of ensuring that the decoding process gets started, and the spike in τ at d = K/S is included to ensure that every source packet is likely to be connected to a check at least once. Luby's key result is that (for an appropriate value of the constant c) receiving K' = K + 2 log_e(S/δ) S checks ensures that all packets can be recovered with probability at least 1 - δ. In the illustrative Figures (Figs. 6a and b) the allowable decoder failure probability δ has been set quite large, because the actual failure probability is much smaller than is suggested by Luby's conservative analysis.

In practice, LT codes can be tuned so that a file of original size K ≈ 10 000 packets is recovered with an overhead of about 5%. Figure 7 shows histograms of the actual number of packets required for a couple of settings of the parameters, achieving mean overheads smaller than 5% and 10% respectively. Figure 8 shows the time-courses of three decoding runs. It is characteristic of a good LT code that very little decoding is possible until slightly more than K packets have been received, at which point the remaining packets are rapidly recovered.

6 Raptor codes

You might think that we could not do any better than LT codes: their encoding and decoding costs scale as K log_e K, where K is the file size. But raptor codes [6] achieve linear time encoding and decoding by concatenating a weakened LT code with an outer code that patches the gaps in the LT code.

LT codes had decoding and encoding complexity that scaled as log_e K per packet, because the average degree of the packets in the sparse graph was log_e K. Raptor codes use an LT code with average degree d̄ about 3. With this lower average degree, the decoder may work in the sense that it does not get stuck, but a fraction of the source packets will not be connected to the graph and so will not be recovered. What fraction? From the balls-in-bins exercise, the expected fraction not recovered is f̃ ≈ e^{-d̄}, which for d̄ = 3 is 5%. Moreover, if K is large, the law of large numbers assures us that the fraction of packets not recovered in any particular realisation will be very close to f̃. So, here is Shokrollahi's trick: we transmit a K-packet file by first pre-coding the file into K̃ ≈ K/(1 - f̃) packets with an excellent outer code that can correct erasures if the erasure rate is f̃; the pre-coded file is then transmitted with the weakened LT code.
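The sizing arithmetic behind this trick is easy to check; the following snippet (ours, with illustrative numbers) computes the expected unrecovered fraction and the pre-coded file size for a mean LT degree of 3:

import math

K = 10_000                            # source packets in the file
d_bar = 3                             # mean degree of the weakened LT code
f = math.exp(-d_bar)                  # expected fraction the LT decoder misses
K_tilde = math.ceil(K / (1 - f))      # pre-coded file size for the outer code
print(f"unrecovered fraction f ~ {f:.3f}")
print(f"pre-code {K} source packets into about {K_tilde} packets")

With d̄ = 3 this gives f̃ ≈ 0.050 and K̃ ≈ 10 525 packets, so the outer code only has to handle an erasure rate of about 5%.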
Fig. 8 Practical performance of LT codes
Three experimental decodings are shown, all for codes created with the parameters c = 0.03, δ = 0.5 (S = 30, K/S = 337, and Z ≈ 1.03) and a file of size K = 10 000. The decoder is run greedily as packets arrive. The vertical axis shows the number of packets decoded as a function of the number of received packets. The right-hand vertical line is at a number of received packets N = 11 000, i.e., an overhead of 10%

Fig. 10 The idea of a weakened LT code
The LT degree distribution with parameters c = 0.03, δ = 0.5 is truncated so that the maximum degree is 8. The resulting graph has a mean degree of 3. The decoder is run greedily as packets arrive. As in Fig. 8, the thick lines show the number of recovered packets as a function of the number of received packets. The thin lines are the curves for the original LT code from Fig. 8. Just as the original LT code usually recovers K = 10 000 packets within a number of received packets N = 11 000, the weakened LT code recovers 8000 packets within a received number of 9250