Low Density Parity Check Codes For Erasure Protection
Alexander Sennhauser
April 22, 2005
Abstract

This document describes my semester project in the field of LDPC codes for erasure protection at the Swiss Federal Institute of Technology Lausanne, supervised by Betrand Nzdana Nzdana and Prof. Amin Shokrollahi. At the beginning some basic coding theory topics are briefly reviewed in order to later understand the principles of linear codes and especially LDPC codes. In a second part the concept behind LDPC codes is presented and illustrated with some simple examples. The third part discusses an implementation of one or several encoders and decoders. The implementations are entirely written in C++.
Contents
1 Basic Coding Theory
  1.1 General communication system
  1.2 Channel
    1.2.1 Binary Symmetric Channel
    1.2.2 Binary Erasure Channel
  1.3 Decoding
  1.4 Entropy and Information
  1.5 Capacity of a Channel
    1.5.1 Shannon's noisy Coding Theorem
2 Linear Codes
  2.1 Basics
  2.2 Encoding Information
  2.3 Decoding
3 Low Density Parity Check (LDPC) Codes
  3.1 Encoding
    3.1.1 Standard Encoding Method
    3.1.2 Advanced Encoding Method
  3.2 Decoding
4 Implementation
  4.1 LDPC Encoder
  4.2 LDPC Decoder
5 Conclusion
Bibliography
Chapter 1
Basic Coding Theory
1.1 General communication system
A more general communication system than the one described by Claude Shannon is depicted in Figure 1.1. The message is a k-tuple chosen from a set of possible message words. This k-tuple is transformed by the encoder into a codeword n-tuple over an alphabet A and sent over the channel. Note that we are describing a memoryless transmission system, which means that at any given time the message does not depend on previous messages. Moreover, the encoder transmits blocks of symbols of fixed size n (block coding). The decoder receives from the channel an n-tuple of symbols, which it transforms either into an estimate of a message k-tuple over the encoder's alphabet A or into an estimate of the codeword n-tuple. The decoding procedure is entirely specified by our decoding scheme.
1.2 Channel
In an ideal world the transmission channel is perfect and no noise is added to the codeword. In practice this is never the case. We shall concentrate on coding for a discrete memoryless channel or DMC. The channel is called discrete because we only consider finite input and output alphabets; it is called memoryless because an error in one symbol does not affect the reliability of a neighbouring one. An important type of channel is the m-ary symmetric channel, which has an input alphabet Σ1 = {x1, x2, ..., xm} and an output alphabet Σ2 = {x1, x2, ..., xm} of m symbols and is completely characterized by its channel matrix, whose entry p_ij is the probability that the symbol x_j is received after the symbol x_i has been transmitted.
1.2.1 Binary Symmetric Channel
We are especially interested in the 2-ary symmetric channel (Figure 1.2), which is also called the binary symmetric channel or BSC(p), where p is the parameter of the channel matrix. The input and output alphabets both consist of the set Σ = {0, 1}.
1.2.2 Binary Erasure Channel
Another important type is the binary erasure channel or BEC(ε), depicted in Figure 1.3. The BEC has the same input alphabet as the BSC, namely Σ1 = {0, 1}. The output alphabet however is Σ2 = {0, 1, ?}. The channel itself is characterized by a single parameter ε which describes the probability that a symbol has been erased by the channel. Note that bits can no longer be flipped. This channel can be used to model a system where messages are either received correctly or lost for some reason. The decoding problem is then to find the correct bits given the erasure locations.
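To make the two channel models concrete, the following sketch simulates a BSC(p) and a BEC(ε) acting on a vector of bits. It is only an illustration, not part of the project code; the function names are made up for this example, and the erasure symbol ? is represented here by the value 2.

#include <cstdint>
#include <random>
#include <vector>

// Transmit a bit vector over a BSC(p): each bit is flipped independently
// with probability p.
std::vector<std::uint8_t> transmit_bsc(const std::vector<std::uint8_t>& in,
                                       double p, std::mt19937& rng) {
    std::bernoulli_distribution flip(p);
    std::vector<std::uint8_t> out(in);
    for (auto& bit : out)
        if (flip(rng)) bit ^= 1;          // 0 <-> 1 with probability p
    return out;
}

// Transmit a bit vector over a BEC(eps): each bit is erased independently
// with probability eps. The erasure symbol '?' is encoded as the value 2.
std::vector<std::uint8_t> transmit_bec(const std::vector<std::uint8_t>& in,
                                       double eps, std::mt19937& rng) {
    std::bernoulli_distribution erase(eps);
    std::vector<std::uint8_t> out(in);
    for (auto& bit : out)
        if (erase(rng)) bit = 2;          // value lost, position still known
    return out;
}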
1.3 Decoding
After the n-tuple is received from the channel we can distinguish between two possible decoding schemes: hard and soft decoding. With the hard decoding scheme the decoder always decodes the received n-tuple into a message k-tuple. When performing soft decoding the decoder has the choice to decode the n-tuple either into a message k-tuple or into an additional symbol ?, which indicates the inability to make an educated guess. In the second case we speak of a channel erasure, which is best described as a symbol error whose location is known. Suppose, when we are designing our decoder, that for each received n-tuple y and each codeword x we know the probability p(y | x) that y is received given that x was sent. The basis of our hard quantization decoding scheme is then: when y is received, we decode it to a codeword x that maximizes p(y | x). This is called Maximum Likelihood Decoding, abbreviated MLD. If we take the same scheme as a basis but also allow the decoder to decode the received y to the symbol ? (soft quantization scheme), which means error detected but not corrected, we speak of Incomplete Maximum Likelihood Decoding (IMLD). It is clear that when our goal is to maximize the probability of correct decoding, MLD should be used, because in this situation any guess is better than none. When we consider a code C in A^n and a decoding algorithm A, we are interested in the average error expectation for decoding C using the algorithm,

P_C(A) = (1/|C|) Σ_{x ∈ C} P_x(A),   (1.1)

where P_x(A) denotes the probability of a decoding error when the codeword x is sent.
This tells us nothing about the intrinsic quality of the code C itself, but only how good A is as an algorithm for decoding C. We finally define the error expectation P_C for C by

P_C = min_A P_C(A).   (1.2)

It is clear that if P_C(A) is large then the decoding algorithm A is not good. However, if P_C is large, then no decoding algorithm is good for C and we should consider another code. In Section 1.3 of [1] some basic code examples like repetition codes, parity check codes, Hamming codes etc. are reviewed.
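For the BSC with p < 1/2, maximum likelihood decoding amounts to choosing a codeword at minimal Hamming distance from the received word. The following brute-force sketch shows the idea for a small code given as an explicit list of codewords; the names are illustrative and the approach is only practical for very small codes.

#include <cstddef>
#include <cstdint>
#include <vector>

using Word = std::vector<std::uint8_t>;

// Hamming distance between two words of equal length.
std::size_t hamming_distance(const Word& a, const Word& b) {
    std::size_t d = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] != b[i]) ++d;
    return d;
}

// Brute-force MLD for the BSC with p < 1/2: return a codeword closest to y.
// Ties are broken by keeping the first closest codeword found.
Word mld_decode(const std::vector<Word>& code, const Word& y) {
    Word best = code.front();
    std::size_t best_d = hamming_distance(best, y);
    for (const Word& c : code) {
        std::size_t d = hamming_distance(c, y);
        if (d < best_d) { best = c; best_d = d; }
    }
    return best;
}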
Figure 1.2: The binary symmetric channel BSC(p). Its channel matrix is

P = ( 1-p    p  )
    (  p    1-p )

Figure 1.3: The binary erasure channel BEC(ε). Its channel matrix, with the output columns ordered 0, 1, ?, is

P = ( 1-ε    0    ε )
    (  0    1-ε   ε )
1.4 Entropy and Information
Entropy is best described as a measure of the uncertainty of some event. If we consider a discrete random variable X taking its values with probabilities p_k, we define its entropy as

H(X) = - Σ_k p_k log_2 p_k.   (1.3)
For a given probability distribution (p1, p2, ..., pn) of the random variable X there exists an upper bound for the entropy: it can be shown that H(X) = H(p1, p2, ..., pn) ≤ log_2 n (for a proof see [2]), with equality if and only if p1 = p2 = ... = pn = 1/n. If X and Y are two random variables we have H(X, Y) ≤ H(X) + H(Y), with equality if and only if X and Y are independent. Moreover we define the conditional entropy as

H(X | Y) = Σ_j H(X | Y = b_j) P(Y = b_j).   (1.4)
Notice that entropy is the mean value of information. As with entropy we define the conditional information as

I(U | V) = H(U) - H(U | V).   (1.6)
This equation directly relates the concepts of entropy and information. By knowing V we lose some uncertainty about U.
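As a small illustration, equation (1.3) translates directly into code. The sketch below is hypothetical helper code, using the usual convention 0 · log_2 0 = 0.

#include <cmath>
#include <vector>

// Entropy H(X) = -sum_k p_k log2 p_k of a discrete distribution,
// with the convention 0 * log2(0) = 0.
double entropy(const std::vector<double>& p) {
    double h = 0.0;
    for (double pk : p)
        if (pk > 0.0) h -= pk * std::log2(pk);
    return h;
}

// Example: the uniform distribution on 4 symbols has entropy log2(4) = 2 bits,
// i.e. entropy({0.25, 0.25, 0.25, 0.25}) returns 2.0.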
1.5 Capacity of a Channel
As we have seen in Section 1.2, a channel is entirely characterized by the input alphabet Σ1, the output alphabet Σ2 and the channel matrix. If we attach this channel to a memoryless source A which emits symbols from the set Σ1 with probability distribution (p1, p2, ..., pn), then the output of the channel can be regarded as another memoryless source B which outputs symbols from the set Σ2 with probability distribution (q1, q2, ..., qm). Notice that

q_j = Σ_{i=1}^{n} P(b_j received | a_i sent) P(a_i sent) = Σ_{i=1}^{n} p_i p_ij.

We can therefore define the capacity of the channel as

Cap = max{I(A | B)} = max{H(A) + H(B) - H(A, B)},   (1.7)

where the maximum is taken over all input distributions (p1, ..., pn).
Notice that the capacity of a channel is entirely determined by the channel matrix. Looking at the channel matrix of the binary symmetric channel and calculating the information I(A | B), we find the capacity to be

Cap = max{I(A | B)} = 1 + p log_2(p) + (1 - p) log_2(1 - p).   (1.8)
Doing the same calculation for the binary erasure channel we find that

Cap = max{I(A | B)} = 1 - ε.   (1.9)
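The two closed-form capacities (1.8) and (1.9) can be evaluated with a few lines of code. The following sketch is purely illustrative and not part of the project implementation.

#include <cmath>

// Binary entropy function h(p) = -p log2 p - (1-p) log2 (1-p).
double h2(double p) {
    if (p <= 0.0 || p >= 1.0) return 0.0;
    return -p * std::log2(p) - (1.0 - p) * std::log2(1.0 - p);
}

// Capacity of the BSC(p), equation (1.8), rewritten as 1 - h(p).
double capacity_bsc(double p) { return 1.0 - h2(p); }

// Capacity of the BEC(eps), equation (1.9).
double capacity_bec(double eps) { return 1.0 - eps; }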
In general the calculation of the capacity of a channel is nontrivial and often involves some kind of optimization problem. It can often be useful to relate the new channel to a channel whose capacity we have already calculated. An important relation is that the r-th extension of a memoryless channel with capacity C has capacity rC.
1.5.1 Shannon's noisy Coding Theorem
This marvellous theorem shows us that as long as we keep the transmission rate below the channel capacity we can achieve arbitrarily high transmission reliability. Take for example the binary symmetric channel with capacity C and a transmission rate R such that 0 < R < C. We then know that for sufficiently large n there exists a set of 2^{Rn} codewords of length n whose error probability is less than any given threshold. The beauty of Shannon's Theorem is that it assures us that such good codes exist; however, it does not tell us how to find them.
Chapter 2
Linear Codes
In order to be able to encode and decode efficiently it is convenient to add some structure to the code space. The mathematical framework of vector spaces provides exactly such a structure. Thus a linear code of length n is a subspace of the vector space F^n, and its words are vectors.
2.1 Basics
If a linear code C is a k-dimensional subspace of F^n we say that C is an [n,k]-code. Once a code is found we can define its rate as k/n bits per channel use and its redundancy as n-k. When in addition to the dimensions we know the minimum distance d we call C an [n,k,d]-code. Because we are describing the code in terms of vector subspaces, such a code can be specified by k linearly independent codewords of length n, even though it contains |F|^k codewords: once k linearly independent codewords are found, the k-dimensional subspace they span is completely determined. The space savings can be enormous. The Hamming weight or weight of a vector v is the number of its nonzero entries and is denoted w_H(v). The minimum weight of the code C is the minimum nonzero weight among all codewords,

w_min(C) = w(C) = min_{0 ≠ x ∈ C} w_H(x).   (2.1)
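For a small code given as an explicit list of codewords, the minimum weight can be found by enumeration; for a linear code this value also equals the minimum distance introduced below. A minimal sketch with made-up names:

#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

using Word = std::vector<std::uint8_t>;

// Hamming weight: number of nonzero entries of a word.
std::size_t weight(const Word& v) {
    std::size_t w = 0;
    for (std::uint8_t x : v)
        if (x != 0) ++w;
    return w;
}

// Minimum nonzero weight of a code given as a list of codewords.
// For a linear code this equals the minimum distance d_min(C).
std::size_t min_weight(const std::vector<Word>& code) {
    std::size_t wmin = std::numeric_limits<std::size_t>::max();
    for (const Word& c : code) {
        std::size_t w = weight(c);
        if (w > 0 && w < wmin) wmin = w;
    }
    return wmin;
}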
The Hamming distance between two vectors x and y is the number of places in which the two vectors differ. We define the minimum distance of a code C as

d_min(C) = min_{c_i, c_j ∈ C, c_i ≠ c_j} d(c_i, c_j).   (2.2)

The Singleton Bound shows that there is an upper bound on the minimum distance, namely d_min(C) ≤ n - k + 1. A linear code that meets the Singleton Bound with equality is called maximum distance separable. Notice that for linear codes we have d_min(C) = w_min(C). The matrix G is a spanning matrix for the linear code C if the rowspace of G is equal to C. A generator matrix G of an [n,k]-code C is a k × n matrix whose rowspace equals C. It is clear that a generator matrix is a spanning matrix whose rows are linearly independent. For any code C we can define its dual code C^⊥ as

C^⊥ = {x ∈ F^n | x · c = 0 for all c ∈ C}.   (2.3)
If C is a linear [n,k]-code then the dual C^⊥ is a linear [n,n-k]-code and (C^⊥)^⊥ = C. A generator matrix H for the dual code C^⊥ is called a check matrix for the code C. In general we can calculate the check matrix H from the generator matrix G of C. In order to simplify the calculations we bring G into its reduced row echelon form

G = ( I_{k×k} | A_{k×(n-k)} )   (2.4)

and calculate H as

H = ( -A^T | I_{(n-k)×(n-k)} )   (2.5)

(over the binary field, -A^T = A^T).
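Assembling the check matrix from the block A of a standard-form generator matrix G = (I | A) is purely mechanical over the binary field. The following dense sketch is an illustration (the matrix type and function name are made up here), not the project code.

#include <cstddef>
#include <cstdint>
#include <vector>

using Matrix = std::vector<std::vector<std::uint8_t>>;  // dense binary matrix, row-major

// Given the k x (n-k) block A of a standard-form generator matrix
// G = (I_k | A) over GF(2), build the (n-k) x n check matrix H = (A^T | I_{n-k}).
Matrix check_matrix_from_A(const Matrix& A) {
    const std::size_t k = A.size();
    const std::size_t r = A.empty() ? 0 : A[0].size();  // r = n - k
    Matrix H(r, std::vector<std::uint8_t>(k + r, 0));
    for (std::size_t i = 0; i < r; ++i) {
        for (std::size_t j = 0; j < k; ++j)
            H[i][j] = A[j][i];          // transpose block A^T
        H[i][k + i] = 1;                // identity block
    }
    return H;
}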
2.2 Encoding Information
For the encoding scheme we consider a linear [n,k]-code. The message k-tuples x are mapped to the codewords xG. We say that G is a standard generator matrix if the first k columns form a k × k identity matrix, and that G is a systematic generator matrix if there are k columns of a k × k identity matrix among the columns of G. A codeword always consists of the information set and the additional redundancy. This is best described with a little example. Consider the following systematic generator matrix G for the binary parity check [3,2,2]-code in F_2^3:

G = ( 1 0 1 )
    ( 0 1 1 )
We see that the first two coordinates of a codeword hold the information set while the third coordinate carries the redundancy, the parity of the first two.
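Encoding with a generator matrix is simply a vector-matrix product over F_2. The sketch below, with illustrative names and dense matrices, encodes a message k-tuple x into the codeword xG; with the [3,2,2] code above, the message (1, 1) is mapped to the codeword (1, 1, 0).

#include <cstddef>
#include <cstdint>
#include <vector>

using Matrix = std::vector<std::vector<std::uint8_t>>;  // k x n generator matrix over GF(2)
using Word = std::vector<std::uint8_t>;

// Encode the message x (length k) into the codeword xG (length n) over GF(2).
Word encode(const Word& x, const Matrix& G) {
    const std::size_t k = G.size();
    const std::size_t n = G[0].size();
    Word c(n, 0);
    for (std::size_t i = 0; i < k; ++i)
        if (x[i])                              // add row i of G whenever x_i = 1
            for (std::size_t j = 0; j < n; ++j)
                c[j] ^= G[i][j];
    return c;
}

// Example: G = {{1,0,1},{0,1,1}}, x = (1,1)  ->  c = (1,1,0).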
2.3 Decoding
The simplest form of decoding, always available, is dictionary decoding, where we keep a list of all possible received words together with the corresponding codeword to which each would be decoded under MLD. This is obviously not a practical solution because we would need a lot of storage for the dictionary. A more sophisticated approach is to think of the channel as an additional source which adds to the codeword c the error vector e = x - c ∈ F^n. Given x, the decoding problem then consists in estimating either e or c. From the definition of e we know that the received vector x and the error pattern e belong to the same coset x + C = e + C. Notice that we can calculate the cosets of C in advance. We are actually looking for a vector of minimal weight in each coset, also known as a coset leader. If there is more than one coset leader we make an arbitrary choice. When the word x is received, we know that the introduced error belongs to the coset x + C. Thus the most likely error pattern introduced by the channel is the coset leader ê chosen for this coset, and we decode x to the codeword ĉ = x - ê. We see that with coset leader decoding the errors actually corrected are exactly the chosen coset leaders, and that coset leader decoding is a Minimum Distance Decoding or MDD algorithm. One implementation of coset leader decoding is standard array decoding. This method takes advantage of the structure of the linear code and is therefore more efficient than dictionary decoding; see [1] for a detailed description. But again this method is not very practical because it also demands a great amount of storage. The second method of coset leader decoding is syndrome decoding, where we use the dual code C^⊥ and the check matrix H. The syndrome of a received word x is defined as s = Hx and is, for every received n-tuple x, a measure of whether or not it belongs to the code. As the syndrome of a codeword c is zero we have

Hx = H(c + e) = 0 + He = He.   (2.6)
This equation shows us that the received vector x and the corresponding error vector e introduced by the channel have the same syndrome. This means that the only thing we need to store is a dictionary which contains all syndromes s_j together with coset leaders e_j such that He_j = s_j. The decoding scheme then consists in first calculating the syndrome s_r of the received vector x. By looking it up in the syndrome dictionary we can decode the received vector x to the codeword ĉ = x - e_r. Because the codewords of C have minimum distance d, syndrome decoding can decode up to d-1 erasures correctly. The following example illustrates the concept of syndrome decoding. The code C we use is defined by the generator matrix

G = ( 1 0 1 0 ) = ( I_{2×2} | A_{2×2} ),   with   A_{2×2} = ( 1 0 )
    ( 0 1 1 1 )                                             ( 1 1 )
The four codewords of C are

c1 = (0 0 0 0),  c2 = (1 1 0 1),  c3 = (1 0 1 0),  c4 = (0 1 1 1).
Considering all possible syndromes we can calculate the corresponding coset leaders and construct our syndrome lookup table:

  syndrome s      coset leader e
  (0 0)           (0 0 0 0)
  (0 1)           (0 0 0 1)
  (1 0)           (1 0 0 0)
  (1 1)           (0 1 0 0)

Suppose now that the word y = (1 1 1 1) is received. We calculate its syndrome

Hy = ( 1 1 1 0 ) (1 1 1 1)^T = ( 1 )
     ( 0 1 0 1 )               ( 0 ),

look up the coset leader e = (1 0 0 0) for this syndrome and decode y to the codeword ĉ = y - e = (0 1 1 1).
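The syndrome decoding procedure of this example can be written out directly: compute the syndrome Hy, look up the coset leader for that syndrome, and subtract it from y. The following sketch hard-codes the check matrix and the lookup table of the [4,2] example code; the names are illustrative and not taken from the project code.

#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using Word = std::vector<std::uint8_t>;
using Matrix = std::vector<Word>;

// Syndrome s = Hy over GF(2), where H is (n-k) x n and y has length n.
Word syndrome(const Matrix& H, const Word& y) {
    Word s(H.size(), 0);
    for (std::size_t i = 0; i < H.size(); ++i)
        for (std::size_t j = 0; j < y.size(); ++j)
            s[i] ^= H[i][j] & y[j];
    return s;
}

int main() {
    // Check matrix and syndrome table of the [4,2] example code.
    Matrix H = {{1, 1, 1, 0}, {0, 1, 0, 1}};
    std::map<Word, Word> coset_leader = {
        {{0, 0}, {0, 0, 0, 0}},
        {{0, 1}, {0, 0, 0, 1}},
        {{1, 0}, {1, 0, 0, 0}},
        {{1, 1}, {0, 1, 0, 0}},
    };

    Word y = {1, 1, 1, 1};                 // received word
    Word e = coset_leader[syndrome(H, y)]; // most likely error pattern
    Word c(y.size());
    for (std::size_t j = 0; j < y.size(); ++j)
        c[j] = y[j] ^ e[j];                // decoded codeword: (0,1,1,1)
    return 0;
}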
Chapter 3
Low Density Parity Check (LDPC) Codes
3.1 Encoding
3.1.1 Standard Encoding Method
The standard method for encoding consists in calculating the product of the message vector x and the generator matrix G. At first glance this looks trivial; however, there are two problems with this encoding scheme. First of all the procedure is expensive: since G is generally not sparse, the multiplication alone requires O(n^2) operations, and constructing G in the first place can require O(n^3) operations. Another problem arises when we are not working with a particular code but with an ensemble of codes constructed from a degree distribution. These ensembles are usually defined in terms of ensembles of bipartite graphs. With such a code definition we never even consider the generator matrix G, so we would first have to find an efficient algorithm to construct G out of the parity check matrix H, which is clearly non-trivial. All these problems lead us to a second approach, discussed below, which uses only the parity check matrix H.
3.1.2 Advanced Encoding Method
As mentioned above we no longer look at a particular code but consider an ensemble of codes. The advantage of such an ensemble is that the sparseness of the parity check matrix H enables efficient encoding while the randomness ensures a robust code. The encoding now consists of two steps. First comes the preprocessing step, which has to be done only once for a given code and which transforms the parity check matrix H into a certain shape. The actual encoding step then uses the precomputed matrix. This second step can be carried out efficiently if the matrix H is sparse, i.e. does not contain many non-zero entries. The most straightforward way of doing the preprocessing step would be to bring the matrix H into lower triangular form using Gaussian elimination, as depicted in Fig. 3.1a. The codeword c can then be split into the information part s, which is filled with the symbols of the message vector x, and the parity part p, which can be calculated by back-substitution. The problem with this approach is that the preprocessing step requires O(n^3) operations and that after this step the matrix H is in general no longer sparse; the actual encoding calculation of the parity part p then takes O(n^2) operations. A more sophisticated solution is to bring the matrix H into an approximate lower triangular form, shown in Fig. 3.1b, using only column and row permutations. This ensures that the matrix H is still sparse after the preprocessing step. More precisely, after this step the matrix has the form

H = ( A  B  T )
    ( C  D  E )
where A is (n-k-g) × k, B is (n-k-g) × g, T is (n-k-g) × (n-k-g), C is g × k, D is g × g and E is g × (n-k-g), all of them sparse, and T is lower triangular with ones along the diagonal. The constant g is the gap of the approximate triangulation: it counts the rows of the (n-k) × n matrix H left below the triangular block T. The codeword is now split into three parts, namely the information part s, the first parity part p1 and the second parity part p2, which leads to c = (s, p1, p2). Multiplying H from the left by

( I          0 )
( E T^{-1}   I )

we get (over the binary field, so that the E block cancels)

H = ( A                 B                 T )
    ( E T^{-1} A + C    E T^{-1} B + D    0 ).
The equation Hc = 0 then splits into the two equations

A s + B p1 + T p2 = 0   (3.1)

and

(E T^{-1} A + C) s + (E T^{-1} B + D) p1 = 0.   (3.2)

Defining φ := E T^{-1} B + D and assuming that φ is non-singular, the calculation of p1 and p2 is straightforward:

p1 = φ^{-1} (E T^{-1} A + C) s   (3.3)

and

p2 = T^{-1} (A s + B p1).   (3.4)
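Both (3.3) and (3.4) ultimately rely on multiplying by sparse matrices and on solving a triangular system T x = b over F_2 by substitution, which is the part that costs only O(n) when T is sparse. The following dense sketch of the computation of p2 in (3.4) is a simplified illustration with made-up names; a real implementation would of course use sparse data structures.

#include <cstddef>
#include <cstdint>
#include <vector>

using Word = std::vector<std::uint8_t>;
using Matrix = std::vector<Word>;   // dense binary matrix, row-major

// Matrix-vector product over GF(2).
Word mul(const Matrix& M, const Word& v) {
    Word r(M.size(), 0);
    for (std::size_t i = 0; i < M.size(); ++i)
        for (std::size_t j = 0; j < v.size(); ++j)
            r[i] ^= M[i][j] & v[j];
    return r;
}

// Solve T x = b over GF(2), where T is lower triangular with ones on the
// diagonal, by substitution. With a sparse T this step is O(n).
Word solve_lower_triangular(const Matrix& T, const Word& b) {
    Word x(b.size(), 0);
    for (std::size_t i = 0; i < b.size(); ++i) {
        std::uint8_t v = b[i];
        for (std::size_t j = 0; j < i; ++j)
            v ^= T[i][j] & x[j];
        x[i] = v;                   // diagonal entry is 1, no division needed
    }
    return x;
}

// Second parity part, equation (3.4): p2 = T^{-1}(A s + B p1) over GF(2).
Word parity_p2(const Matrix& A, const Matrix& B, const Matrix& T,
               const Word& s, const Word& p1) {
    Word b = mul(A, s);
    Word bp = mul(B, p1);
    for (std::size_t i = 0; i < b.size(); ++i)
        b[i] ^= bp[i];
    return solve_lower_triangular(T, b);
}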
If φ turns out to be singular after clearing E, we can always perform additional column permutations to remove this singularity. The authors of [4] show that the overall complexity of determining p1 is O(n + g^2) while determining p2 takes O(n) operations. From the same authors there exists another approach which is equivalent in terms of encoding but differs in terms of implementation. In this second scheme the codeword c has the form c = (p1, p2, s) and we try to bring H into approximate upper triangular form, depicted in Fig. 3.1c. This is again accomplished using only row and column permutations in order to preserve the sparseness of the matrix H. The matrix H, now of the form

H = ( T  A  B )
    ( E  C  D ),

will again be premultiplied from the left by another matrix in order to clear out E. We calculate

( I          0 ) ( T  A  B )   ( T   A                 B               )
( E T^{-1}   I ) ( E  C  D ) = ( 0   E T^{-1} A + C    E T^{-1} B + D ).
Analogously to the first scheme we define φ := E T^{-1} A + C, the block multiplying p2 in the cleared matrix, and assume that it is non-singular; otherwise the procedure to make φ non-singular is the same as above. The parity parts p1 and p2 are obtained by solving the equations

T p1 + A p2 + B s = 0   (3.5)

and

E p1 + C p2 + D s = 0.   (3.6)

This is done in the following steps.

1. Set p2 = 0 and solve (3.5) for p1 by back-substitution.
2. Evaluate y := E p1 + C p2 + D s = E p1 + D s.
3. Set p2 = φ^{-1} y.
4. Solve (3.5) for p1 again, now using the computed p2.

A performance analysis shows that steps (1), (2) and (4) are O(n) while step (3) is O(g^2). The remaining question is whether there exists an algorithm that can efficiently transform any given parity check matrix H into the desired approximate lower or approximate upper triangular form with a gap g as small as possible. Such an algorithm is given in [4]; it operates on H by permuting rows and columns. A performance analysis shows that the algorithm is of complexity o(n^2); in many cases, however, we can do much better, namely on the order of O(n).
Figure 3.1: (a) lower triangular, (b) approximate lower triangular and (c) approximate upper triangular form of the parity check matrix H.
3.2 Decoding
Chapter 4
Implementation
4.1 LDPC Encoder
4.2 LDPC Decoder
Chapter 5
Conclusion
Bibliography
[1] J. I. Hall, Notes on Coding Theory, Department of Mathematics, Michigan State University, 2003.
[2] Dominic Welsh, Codes and Cryptography, Clarendon Press, Oxford, 1998.
[3] Amin Shokrollahi, LDPC Codes: An Introduction, Digital Fountain Inc., Fremont, 2002.
[4] Thomas J. Richardson and Rüdiger L. Urbanke, "Efficient Encoding of Low-Density Parity-Check Codes", IEEE Transactions on Information Theory, vol. 47, no. 2, February 2001.