Fundamentals of Codes, Graphs, and Iterative Decoding
Stephen B. Wicker
Cornell University,
Ithaca, NY, U.S.A.
Saejoon Kim
Korea Institute for Advanced Study,
Seoul, Korea
List of Figures ix
List of Tables xi
Preface xiii
1. DIGITAL COMMUNICATION 1
1. Basics 1
2. Algorithms and Complexity 5
3. Encoding and Decoding 6
4. Bounds 8
5. Overview of the Text 12
2. ABSTRACT ALGEBRA 13
1. Sets and Groups 13
2. Rings, Domains, and Fields 16
3. Vector Spaces and Galois Fields 23
4. Polynomials over Galois Fields 28
5. Frequency Domain Analysis of Polynomials over GF(q) 34
6. Ideals in the Ring GF(q)[x]/(x^n - 1) 37
3. LINEAR BLOCK CODES 39
1. Basic Structure of Linear Codes 40
2. Repetition and Parity Check Codes 43
3. Hamming Codes 44
4. Reed-Muller Codes 45
5. Cyclic Codes 49
6. Quadratic Residue Codes 50
7. Golay Codes 51
8. BCH and Reed-Solomon Codes 53
9. Product Codes 58
4. CONVOLUTIONAL AND CONCATENATED CODES 61
1. Convolutional Encoders 62
2. Analysis of Component Codes 65
3. Concatenated Codes 68
4. Analysis of Parallel Concatenated Codes 71
5. ELEMENTS OF GRAPH THEORY 79
1. Introduction 80
2. Martingales 83
3. Expansion 86
6. ALGORITHMS ON GRAPHS 93
1. Probability Models and Bayesian Networks 94
2. Belief Propagation Algorithm 99
3. Junction Tree Propagation Algorithm 104
4. Message Passing and Error Control Decoding 109
5. Message Passing in Loops 115
7. TURBO DECODING 121
1. Turbo Decoding 121
2. Parallel Decoding 126
3. Notes 132
8. LOW-DENSITY PARITY-CHECK CODES 137
1. Basic Properties 137
2. Simple Decoding Algorithms 143
3. Explicit Construction 147
4. Gallager’s Decoding Algorithms 151
5. Belief Propagation Decoding 162
6. Notes 172
9. LOW-DENSITY GENERATOR CODES 177
1. Introduction 177
2. Decoding Analyses 181
3. Good Degree Sequences 188
Index 217
1. N. Eldredge and S. J. Gould, "Punctuated Equilibria: An Alternative to Phyletic Gradualism," in Models in Paleobiology, T. J. M. Schopf (ed.), San Francisco: Freeman Cooper, pp. 82 - 115, 1972.
2. C. E. Shannon, "A Mathematical Theory of Communication," Bell System Technical Journal, Volume 27, pp. 379 - 423 and pp. 623 - 656, 1948.
tools. Slightly less obvious has been our dependence on and exploitation
of other fields. The recognition of structure has allowed for the identi-
fication of connections to other fields of mathematics and engineering,
and the subsequent looting of the other fields’ respective toolboxes. Fi-
nally, coding theorists have shown great prescience over the years, a
prescience so extreme that we often fail to appreciate our colleagues’ re-
sults for several decades. Fortunately many of our colleagues have very
good memories, and we can thus incorporate and build on results that
were initially given short shrift.
To put this current volume in context, a quick review of the past
fifty years will prove helpful.
The first significant error control codes – those due to Hamming3
and Golay4 – were based on linear algebra and some relatively simple
combinatorial techniques. The early error control codes were developed
as linear block codes – subspaces of vector spaces over finite fields. These
subspaces have dual spaces, whose bases can be interpreted as explicit
parity relations among the coordinates of the codewords that constitute
the code. The creation and exploitation of parity relations is a major
theme in this book, and the creative and intelligent inculcation of parity
relations is clearly the key to the recent developments in error control.
In the 1950's, however, the principal metric for the quality of an error
control code was minimum distance, with the Hamming bound serving as
the performance limit. This conflation of the sphere packing and error
control problems was limiting, and the discovery of all of the perfect
codes by 1950 (a result unknown at the time5) left little room for new
results through the combinatorial approach.
Reed took the first step away from the combinatorial approach with
his recognition that Muller’s application of Boolean algebra to switch-
ing circuits could be re-interpreted as a construction technique for error
control codes. Reed saw that by viewing codewords as truth tables of
Boolean functions, various results in Euclidean geometry and Boolean
algebra could be used as design tools for error control codes6. The re-
sulting Reed-Muller codes were a significant step beyond the earlier work
3. R. W. Hamming, "Error Detecting and Error Correcting Codes," Bell System Technical Journal, Volume 29, pp. 147 - 160, 1950.
4. M. J. E. Golay, "Notes on Digital Coding," Proceedings of the IRE, Volume 37, pg. 657, June 1949.
5. A. Tietäväinen, "On the Nonexistence of Perfect Codes over Finite Fields," SIAM Journal of Applied Mathematics, Volume 24, pp. 88 - 96, 1973.
6. I. S. Reed, "A Class of Multiple-Error-Correcting Codes and a Decoding Scheme," IEEE Transactions on Information Theory, Volume 4, pp. 38 - 49, September 1954. See also D. E. Muller, "Application of Boolean Algebra to Switching Circuit Design," IEEE Transactions on Computers, Volume 3, pp. 6 - 12, September 1954.
7. I. S. Reed and G. Solomon, "Polynomial Codes over Certain Finite Fields," SIAM Journal on Applied Mathematics, Volume 8, pp. 300 - 304, 1960. See also S. B. Wicker and V. K. Bhargava (editors), Reed-Solomon Codes and Their Applications, Piscataway: IEEE Press, 1994.
8. See, for example, E. Prange, "Some Cyclic Error-Correcting Codes with Simple Decoding Algorithms," Air Force Cambridge Research Center-TN-58-156, Cambridge, Mass., April 1958; R. C. Bose and D. K. Ray-Chaudhuri, "On a Class of Error Correcting Binary Group Codes," Information and Control, Volume 3, pp. 68 - 79, March 1960; A. Hocquenghem, "Codes Correcteurs d'Erreurs," Chiffres, Volume 2, pp. 147 - 156, 1959; and D. Gorenstein and N. Zierler, "A Class of Error Correcting Codes in p^m Symbols," Journal of the Society of Industrial and Applied Mathematics, Volume 9, pp. 207 - 214, June 1961.
9. E. Berlekamp, "Nonbinary BCH Decoding," presented at the 1967 International Symposium on Information Theory, San Remo, Italy. See also E. R. Berlekamp, Algebraic Coding Theory, New York: McGraw-Hill, 1968. (Revised edition, Laguna Hills: Aegean Park Press, 1984.)
10. P. Elias, "Coding for Noisy Channels," IRE Conv. Record, Part 4, pp. 37 - 47, 1955.
11. R. M. Fano, "A Heuristic Discussion of Probabilistic Decoding," IEEE Transactions on Information Theory, IT-9, pp. 64 - 74, April 1963.
12. A. J. Viterbi, "Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm," IEEE Transactions on Information Theory, IT-13, pp. 260 - 269, April 1967.
13. S. B. Wicker, "Deep Space Applications," Handbook of Coding Theory, (Vera Pless and William Cary Huffman, eds.), Amsterdam: Elsevier, 1998.
14. The structure of the Viterbi algorithm has its roots in earlier optimization algorithms. See, for example, G. J. Minty, "A Comment on the Shortest Route Problem," Operations Research, Volume 5, p. 724, October 1957.
15. O. Collins, "The Subtleties and Intricacies of Building a Constraint Length 15 Convolutional Decoder," IEEE Transactions on Communications, Volume 40, Number 12, pp. 1810 - 1819, December 1992. See also S. B. Wicker, "Deep Space Applications," Handbook of Coding Theory, (Vera Pless and William Cary Huffman, eds.), Amsterdam: Elsevier, 1998.
16. See, for example, G. D. Forney, Jr., "Convolutional Codes I: Algebraic Structure," IEEE Transactions on Information Theory, IT-16, pp. 720 - 738, November 1970, and G. D. Forney, Jr., "Convolutional Codes II: Maximum Likelihood Decoding," Information and Control, Volume 25, pp. 222 - 266, July 1974.
17. L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Transactions on Information Theory, IT-20, pp. 284 - 287, 1974.
18. L. E. Baum and T. Petrie, "Probabilistic functions of finite state Markov chains," Ann. Math. Stat., 37:1554 - 1563, 1966; L. E. Baum and G. R. Sell, "Growth transformations for functions on manifolds," Pac. J. Math., 27(2):211 - 227, 1968; and L. E. Baum, T. Petrie, G. Soules and N. Weiss, "A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains," Ann. Math. Stat., 41:164 - 171, 1970.
19. The portion of the BW algorithm relevant to the decoding of convolutional codes is often referred to in the coding community as the BCJR algorithm.
20. C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo Codes," Proceedings of the 1993 International Conference on Communications, pp. 1064 - 1070, 1993.
21. See, for example, C. Heegard and S. B. Wicker, Turbo Coding, Boston: Kluwer Academic Press, 1999.
22. R. J. McEliece, D. J. C. MacKay and J.-F. Cheng, "Turbo Decoding as an Instance of Pearl's 'Belief Propagation' Algorithm," IEEE Journal on Selected Areas in Commun., vol. 16, pp. 140 - 152, Feb. 1998, and F. R. Kschischang and B. J. Frey, "Iterative Decoding of Compound Codes by Probability Propagation in Graphical Models," IEEE Journal on Selected Areas in Commun., vol. 16, pp. 219 - 230, Feb. 1998.
23. R. G. Gallager, Low-Density Parity-Check Codes, Cambridge, MA: The M.I.T. Press, 1963.
24. R. M. Tanner, "A recursive approach to low complexity codes," IEEE Trans. Inform. Theory, vol. IT-27, pp. 533 - 547, Sept. 1981.
25. M. Sipser and D. A. Spielman, "Expander Codes," IEEE Trans. Inform. Theory, vol. IT-42, pp. 1710 - 1722, Nov. 1996.
26. D. J. C. MacKay, "Good Error-Correcting Codes based on Very Sparse Matrices," IEEE Trans. Inform. Theory, vol. 45, pp. 399 - 431, Mar. 1999.
27. See, for example, M. G. Luby, M. Mitzenmacher, M. A. Shokrollahi and D. A. Spielman, "Improved Low-Density Parity-Check Codes Using Irregular Graphs and Belief Propagation," Proc. 1998 IEEE Int. Symp. on Inform. Theory, Boston, USA, August 16-21, 1998; M. C. Davey and D. J. C. MacKay, "Low-Density Parity Check Codes over GF(q)," IEEE Commun. Letters, vol. 2, no. 6, June 1998; and T. Richardson, M. A. Shokrollahi and R. Urbanke, "Design of Provably Good Low-Density Parity-Check Codes," submitted to IEEE Trans. Inform. Theory.
Chapter 1

DIGITAL COMMUNICATION
1. Basics
A discrete channel is a system that consists of input symbols from an alphabet $\mathcal{X}$, output symbols from an alphabet $\mathcal{Y}$, and an input-output relation that is expressed in terms of a probability function $p(y \mid x)$, $x \in \mathcal{X}$ and $y \in \mathcal{Y}$. We assume that the cardinality of the input alphabet is $q$, or equivalently, that the input is $q$-ary. The selection of a channel input symbol $x$ induces a probability distribution on the channel output through $p(y \mid x)$, where $\sum_{y \in \mathcal{Y}} p(y \mid x) = 1$. The transmission of data over a discrete channel typically consists of $n$ distinct channel transmissions corresponding to a word in $\mathcal{X}^n$. The output of the channel is a (possibly) distorted form of the word, a word in $\mathcal{Y}^n$. A discrete channel is memoryless if the probability distribution describing the output depends only on the current input and is conditionally independent of previous inputs or outputs. We assume throughout the book that our channel of interest is a discrete memoryless channel.
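As a concrete illustration of a discrete memoryless channel, the following minimal Python sketch passes a word through a binary symmetric channel; the crossover probability 0.1 and the transmitted word are arbitrary choices for illustration.

import random

def bsc(word, p, rng=random.Random(0)):
    """Pass a binary word through a binary symmetric channel: each bit is
    flipped independently with crossover probability p, so the output
    depends only on the current input (a memoryless channel)."""
    return [bit ^ (rng.random() < p) for bit in word]

sent = [1, 0, 1, 1, 0, 0, 1]
received = bsc(sent, 0.1)
print(sent, received)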
The general problem of coded digital communication over a noisy
channel can be set up as follows. Assume that there are M possible
Observe that since inputs are correctly received at the output with probability $1 - p$, the capacity $C = 1 - p$ for the BEC is somewhat intuitive.
Here this usage for the domain is made only for notational convenience and is irrelevant to our discussion.
If the channel alphabet is the same as the code alphabet, then the decoding problem reduces to what is called hard decision decoding. In this case the error correcting code provides error control capability at the receiver through redundancy; not all patterns in the received space are valid, so the receiver is able to detect changes in the transmitted symbol sequence caused by channel noise when such changes result in invalid patterns. The receiver may also be able to map a received sequence to a codeword. If this codeword is the transmitted codeword, then we have "corrected" the errors caused by the channel. Otherwise, a decoding error has occurred.

There are many types of hard decision decoding functions. Several are listed below.
Definition 10 (Nearest-Codeword Decoding) Nearest-codeword de-
coding selects one of the codewords that minimize the Hamming distance
between a codeword and the received word.
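A brute-force sketch of nearest-codeword decoding over a small, hypothetical codebook (here the (3,1) repetition code); it is illustrative only, since searching the entire codebook is exponential in the code dimension.

def hamming_distance(u, v):
    """Number of coordinates in which two words differ."""
    return sum(a != b for a, b in zip(u, v))

def nearest_codeword(received, codebook):
    """Nearest-codeword decoding: return a codeword at minimum Hamming
    distance from the received word (ties broken arbitrarily)."""
    return min(codebook, key=lambda c: hamming_distance(received, c))

codebook = [(0, 0, 0), (1, 1, 1)]               # (3,1) repetition code
print(nearest_codeword((1, 0, 1), codebook))    # -> (1, 1, 1)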
Theorem 11 The problem of finding the nearest codeword for an arbi-
trary linear block code is NP-hard.
Proof: See [18].
Definition 12 (Bounded Distance Decoding) Bounded distance decoding selects all codewords that are within Hamming distance $t$ of the received word, for some predefined $t$.

If $t = \lfloor (d_{\min} - 1)/2 \rfloor$, then $t$ is called the error correction capability of the code, and the selected codeword, if it exists, is unique. An $(n, k)$ code with error correction capability $t$ is called a $t$-error correcting code. For some hard decision decoding algorithms, $t$ serves as a limit on the number of errors that can be corrected. For others, it is often possible to correct more than $t$ errors in certain special cases.
Definition 13 The weight of a word is the number of nonzero coordinates in the word. The weight of a word $\mathbf{v}$ is commonly written $w(\mathbf{v})$.
In hard decision decoding we can speak meaningfully of an error vector induced by noise on the channel. On most communication channels of practical interest it is often the case that the probability mass function on the weight of the error vector is strictly decreasing, so that the codeword that maximizes $p(\mathbf{r} \mid \mathbf{c})$ is the codeword that minimizes the Hamming distance to the received word. Under these assumptions, a code with minimum distance $d_{\min}$ can correct all error patterns of weight less than or equal to $\lfloor (d_{\min} - 1)/2 \rfloor$.
In many cases the channel alphabet is not the same as the code alphabet. Generally this is due to quantization at the receiver that
provides a finer discrimination between received signals than that pro-
vided with hard decisions. The resulting decoding problem is called soft
decision decoding, and the solution takes the form of a mapping from the
received space to the code space. In this case it is misleading to speak
of “correcting” channel errors, as the received sequence does not contain
erroneous code symbols.
There are three basic types of soft decision decoding considered in
this book.
Definition 14 (Maximum Likelihood (ML) Decoding)
Maximum likelihood decoding finds one of the codewords $\mathbf{c}$ that, for a received word $\mathbf{r}$, maximize the distribution $p(\mathbf{r} \mid \mathbf{c})$.
There is a related decoding algorithm that is identical to ML decoding
if the codeword distribution is uniform.
Definition 15 (Maximum A Posteriori (MAP) Decoding)
Maximum a posteriori decoding finds one of the codewords $\mathbf{c}$ that, for a received word $\mathbf{r}$, maximize the distribution $p(\mathbf{c} \mid \mathbf{r})$.
Definition 16 (Symbol-by-symbol MAP Decoding)
Symbol-by-symbol MAP decoding finds the information symbol $u_i$ that, for a received word $\mathbf{r}$, maximizes the distribution $p(u_i \mid \mathbf{r})$.
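To make the distinction concrete, the following sketch compares ML and MAP decoding over a toy codebook on a binary symmetric channel; the codebook, skewed prior, and crossover probability are illustrative assumptions, not values taken from the text.

def likelihood(received, codeword, p):
    """P(received | codeword) for a memoryless BSC with crossover probability p."""
    prob = 1.0
    for r, c in zip(received, codeword):
        prob *= p if r != c else 1.0 - p
    return prob

def ml_decode(received, codebook, p):
    return max(codebook, key=lambda c: likelihood(received, c, p))

def map_decode(received, codebook, prior, p):
    return max(codebook, key=lambda c: prior[c] * likelihood(received, c, p))

codebook = [(0, 0, 0), (1, 1, 1)]
prior = {(0, 0, 0): 0.9, (1, 1, 1): 0.1}        # skewed source distribution
r = (1, 0, 1)
print(ml_decode(r, codebook, 0.2))              # -> (1, 1, 1)
print(map_decode(r, codebook, prior, 0.2))      # -> (0, 0, 0): the prior wins

The two rules coincide when the codeword distribution is uniform, as noted above.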
4. Bounds
In this section we consider several classical upper and lower bounds
on minimum distance and code rate as a function of other code param-
eters. These bounds treat a fundamental problem that can be stated
as follows: For a given codeword length and minimum distance, what is
the greatest possible number of codewords? We consider both the non-
asymptotic and asymptotic behavior of error control codes. In the latter,
we approach the aforementioned question while allowing the codeword
length to tend toward infinity.
1. This space is typically the Euclidean space.
Lemma 21 If then
Proof: Since
and
Applying
Proof: Arrange the codewords as the rows of a matrix. The first $n - d + 1$ coordinates of each row must differ from those of every other row; this can be guaranteed since all codewords differ from one another in at least $d$ coordinates. Hence $M \le q^{n - d + 1}$, and the proof is finished.
Definition 26 A code that satisfies the Singleton Bound with equality
is said to be maximum distance separable (MDS).
There exist a large number of MDS codes, the most prominent being
the Reed-Solomon codes. The maximum possible length of MDS codes
over a given alphabet is not known, though the following conjecture is
widely believed to be true.
Conjecture The length of an MDS code with dimension 3 or redundancy 3 is at most $q + 2$. If neither the dimension nor the redundancy is 3, then the length is at most $q + 1$.
Chapter 2

ABSTRACT ALGEBRA
Now suppose that and are in the same coset. It follows that
where Using the above, we have and
for some We then see that if an element
in coset is equivalent to an element in coset then every element
in is equivalent to every element in It follows that contains
Since the reverse is also true, Distinct cosets must therefore be
disjoint.
for all
Euclid's Algorithm
Let $a$ and $b$ be elements of a Euclidean domain with $g(a) \ge g(b)$, where $g$ is the size function of the domain.
(1) Let the indexed variable $r_i$ take on the initial values $r_{-1} = a$ and $r_0 = b$.
(2) Apply the recursion formula $r_i = r_{i-2} \bmod r_{i-1}$ for $i = 1, 2, 3, \ldots$
Note that with each iteration of the recursion formula, the size of the remainder gets smaller. It can be shown that, in a Euclidean domain, the remainder will always take on the value zero after a finite number of steps. For a proof that $r_{i-1}$ is the greatest common divisor of $a$ and $b$ when $r_i$ first takes on the value zero, see McEliece [81].
Example 41 (Using Euclid’s Algorithm)
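A minimal sketch of the recursion over the integers (a stand-in Euclidean domain chosen for illustration; this is not necessarily the example worked in the text):

def euclid(a, b):
    """Euclid's algorithm over the integers: repeatedly replace (a, b) by
    (b, a mod b); the last nonzero remainder is gcd(a, b)."""
    while b != 0:
        a, b = b, a % b
    return a

print(euclid(126, 35))   # -> 7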
The reader may wish to prove to herself that the order of a Galois
field must exist, and must be finite.
Proof: Suppose that the elements of a basis are not linearly indepen-
dent. It follows that one of the elements can be deleted without reducing
the span of the basis. This reduces the set's cardinality by one, contradicting the minimality of the cardinality of a basis.
Corollary 56 Though a vector space may have several possible bases,
all of the bases will have the same cardinality.
Definition 57 (Dimension of a Vector Space) If a basis for a vector space has $k$ elements, then the vector space is said to have dimension $k$, written $\dim(V) = k$.

Theorem 58 Let $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ be a basis for a vector space $V$. For every vector $\mathbf{u} \in V$ there is a representation $\mathbf{u} = a_1 \mathbf{v}_1 + a_2 \mathbf{v}_2 + \cdots + a_k \mathbf{v}_k$. This representation is unique.

Proof: The existence of at least one such representation follows from the definition of bases. Uniqueness can be proven by contradiction. Suppose that there are two such representations for the same vector with different coefficients. Then we can write $\mathbf{u} = a_1 \mathbf{v}_1 + \cdots + a_k \mathbf{v}_k = b_1 \mathbf{v}_1 + \cdots + b_k \mathbf{v}_k$, where $a_i \ne b_i$ for at least one $i$. Then $(a_1 - b_1)\mathbf{v}_1 + \cdots + (a_k - b_k)\mathbf{v}_k = \mathbf{0}$.
Since not all of the coefficients $(a_i - b_i)$ are zero, the basis vectors must not be independent. This contradicts Theorem 55.
The following theorem shows that the field $GF(q^m)$ contains all Galois fields $GF(q^b)$ of order $q^b$, where $b$ divides $m$.

Theorem 68 An element $\beta$ in $GF(q^m)$ lies in the subfield $GF(q^b)$ if and only if $\beta^{q^b} = \beta$.

Proof: Let $\beta \in GF(q^b)$. It follows from Theorem 47 that $\beta^{q^b - 1} = 1$, and thus $\beta^{q^b} = \beta$.
Now assume that $\beta^{q^b} = \beta$; $\beta$ is then a root of $x^{q^b} - x$. The $q^b$ elements of $GF(q^b)$ comprise all roots, and the result follows.
The following lemmas will prove useful in showing that the roots of a
primitive polynomial are conjugates of one another.
We can see from the conjugacy classes that . factors into two
irreducible binary polynomials of degree 6 and one of degree 3.
Definition 84 (Cyclotomic Cosets) The cyclotomic cosets mod $n$ with respect to $GF(q)$ are a partitioning of the integers $\{0, 1, \ldots, n - 1\}$ into sets of the form $\{s, sq, sq^2, \ldots, sq^{r-1}\}$, where $r$ is the smallest positive integer such that $sq^r \equiv s \pmod{n}$.

Example 85 (Cyclotomic Cosets) The cyclotomic cosets mod 15 with respect to GF(2) are $\{0\}$, $\{1, 2, 4, 8\}$, $\{3, 6, 9, 12\}$, $\{5, 10\}$, and $\{7, 11, 13, 14\}$.
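The partitioning can be computed directly; the short sketch below reproduces the cosets of Example 85 for n = 15 and GF(2).

def cyclotomic_cosets(n, q=2):
    """Partition the integers mod n into cyclotomic cosets with respect to
    GF(q): the coset of s is {s, s*q, s*q^2, ...} reduced mod n."""
    remaining, cosets = set(range(n)), []
    while remaining:
        s = min(remaining)
        coset, x = set(), s
        while x not in coset:
            coset.add(x)
            x = (x * q) % n
        cosets.append(sorted(coset))
        remaining -= coset
    return cosets

print(cyclotomic_cosets(15))
# -> [[0], [1, 2, 4, 8], [3, 6, 9, 12], [5, 10], [7, 11, 13, 14]]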
where
Combining, we have
mod
Theorem 89 (The GFFT Convolution Theorem) Consider the fol-
lowing GFFT pairs.
Proof: First note that there is at least one nonzero ideal in the ring $GF(q)[x]/(x^n - 1)$, namely, the ring itself. Since the degrees of the polynomials in the ideal $I$ are bounded below, there must be at least one polynomial of minimal degree. This polynomial can be made monic by dividing through by the leading nonzero coefficient.

We now proceed with a proof by contradiction. Let $g(x)$ and $g'(x)$ be distinct monic polynomials of minimal degree in $I$. Since $I$ forms an additive group, $g(x) - g'(x)$ must also be in $I$. Since $g(x)$ and $g'(x)$ are monic, $g(x) - g'(x)$ must be of lower degree, contradicting the minimality of the degree of $g(x)$.

Consider $a(x) \in I$ such that $a(x)$ is not a multiple of $g(x)$. Since $GF(q)[x]$ forms a Euclidean domain, $a(x)$ can be expressed as $a(x) = q(x)g(x) + r(x)$, where $0 \le \deg r(x) < \deg g(x)$. $q(x)g(x) \in I$ since $g(x) \in I$ and $I$ is an ideal. Since $I$ forms an additive group, $r(x) = a(x) - q(x)g(x) \in I$, contradicting the minimality of the degree of $g(x)$.

Suppose $g(x)$ does not divide $x^n - 1$ in $GF(q)[x]$. Since $GF(q)[x]$ is a Euclidean domain, $x^n - 1$ can be expressed as $x^n - 1 = q(x)g(x) + r(x)$, where $0 \le \deg r(x) < \deg g(x)$. Since $r(x) \equiv -q(x)g(x) \pmod{x^n - 1}$, it is the additive inverse of an element of $I$ and thus lies in $I$, contradicting the minimality of the degree of $g(x)$.
Chapter 3

LINEAR BLOCK CODES
From 1948 through the early 1990’s, there was a tension between ex-
istential results in coding theory and the reality of coding practice. As
we shall see, the existence of codes with desirable properties is readily
proved through probabilistic methods. The general approach is to con-
struct a class of structures, and then prove that the probability that a
structure with desirable properties exists is positive. This much is quite
straightforward, and the classical results that we will review in this chap-
ter have been known (and well regarded) for some time. The next step –
the actual construction of the desired codes – proved extremely difficult
and has been an elusive goal for some time. Chapters 5-9 of this book
are dedicated to significant strides that have recently been taken toward
realizing this goal.
This is not to say, however, that the first few decades of the evolu-
tion of error control coding were not successful. In this chapter we will
describe several classical constructions for block codes that have had a
great impact on the telecommunications and computing industries. The
first error correcting codes, developed by Richard W. Hamming in the
late 1940’s, continue to play a significant role in error control for semi-
conductor memory chips. Reed-Muller, Golay, and Reed-Solomon
codes have been used in deep space telecommunications, while Reed-
Solomon codes made the digital audio revolution possible, and continue
to be used in wireless systems and new digital audio and video storage
technologies. The applications of classical block codes have been le-
gion and their impact substantial; their performance, on their own, has
simply fallen short of the Shannon limit. We include brief descriptions
here because of their general interest, as well as their potential use as
1. There are cases in which we may wish to define codes as subspaces over rings or other algebraic structures. We will not pursue such cases in this text, and will thus restrict the definition of linear codes to those constructed as vector subspaces over finite fields.
The term "parity check matrix" refers to the fact that the null space of the linear transformation induced by H is exactly the code space $\mathcal{C}$. This follows directly from Definition 61 in Chapter 2. It follows in turn that $\mathbf{c} \in \mathcal{C}$ if and only if $\mathbf{c} H^T = \mathbf{0}$.
Each row of the parity check matrix places a parity constraint on
two or more of the coordinates of a codeword. In later chapters we will
consider graph-theoretic interpretations of these constraints.
Corollary 105 The minimum distance of a linear block code is the min-
imum cardinality over all nonempty sets of linearly dependent columns
for any of its parity check matrices.
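Corollary 105 suggests a brute-force computation of minimum distance for small codes: search for the smallest set of columns of H that sums to zero over GF(2). The (7,4) Hamming parity check matrix below is an illustrative choice, not a matrix taken from the text.

from itertools import combinations

def min_distance_from_H(H):
    """Smallest number of columns of H that sum to the zero vector over
    GF(2); by Corollary 105 this equals the minimum distance."""
    n = len(H[0])
    for d in range(1, n + 1):
        for cols in combinations(range(n), d):
            if all(sum(row[j] for j in cols) % 2 == 0 for row in H):
                return d
    return None

H = [[0, 0, 0, 1, 1, 1, 1],      # columns are the binary expansions of 1..7
     [0, 1, 1, 0, 0, 1, 1],
     [1, 0, 1, 0, 1, 0, 1]]
print(min_distance_from_H(H))    # -> 3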
Randomly chosen linear codes can possess many nice properties such
as achieving some of the bounds shown in Chapter 1. The next theorem
shows one such example.
Theorem 114 Randomly selected linear block codes achieve the Gilbert-Varshamov Bound as $n \rightarrow \infty$ with high probability.
and
we have
Repetition Codes: $\mathbf{c} = (c_0, c_1, \ldots, c_{n-1}) \in \mathcal{C}$ if and only if $c_0 = c_1 = \cdots = c_{n-1}$.
Parity check codes are most frequently defined over GF(2). They are
formed by appending a single bit to an information word. For “even
parity” the value of this appended bit is selected so as to make the total
number of ones in the codeword even. For “odd parity,” the value is
selected so as to make the number of ones in the codeword odd.
Parity-Check Codes:
Even parity: $\mathbf{c} = (c_0, c_1, \ldots, c_{n-1}) \in \mathcal{C}$ if and only if $c_0 \oplus c_1 \oplus \cdots \oplus c_{n-1} = 0$, where $\oplus$ denotes addition in GF(2).
Odd parity: $\mathbf{c} \in \mathcal{C}$ if and only if $c_0 \oplus c_1 \oplus \cdots \oplus c_{n-1} = 1$.
Parameters: $(n, n - 1, 2)$
3. Hamming Codes
Hamming codes of length $n = (q^m - 1)/(q - 1)$ over $GF(q)$ are described by parity check matrices whose columns are a set of distinct nonzero $m$-tuples over $GF(q)$ in which no column is a multiple of another. Note that exactly $(q^m - 1)/(q - 1)$ such columns exist.

Lemma 115 For every prime power $q$ and integer $m \ge 2$ there is a $\left(\frac{q^m - 1}{q - 1}, \frac{q^m - 1}{q - 1} - m, 3\right)$ Hamming code over $GF(q)$.
Consider a parity check matrix that has as columns all distinct nonzero binary 3-tuples. The code is extended by adding a row of ones and the column vector $(0\ 0\ 0\ 1)^T$, resulting in the following matrix.
The additional constraint of even parity increases the length to 8 and the
minimum distance to four.
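A short sketch of the construction just described: build a binary Hamming parity check matrix from all nonzero m-tuples, then extend it with an overall even-parity check. The column ordering is arbitrary and need not match the matrix displayed in the text.

import itertools

def hamming_parity_check(m):
    """Columns of H are all distinct nonzero binary m-tuples, giving the
    (2^m - 1, 2^m - 1 - m) binary Hamming code."""
    cols = [col for col in itertools.product([0, 1], repeat=m) if any(col)]
    return [[col[i] for col in cols] for i in range(m)]

def extend(H):
    """Append an overall even-parity position: an all-zero column for the
    existing checks plus a row of ones (length grows by one, distance 3 -> 4)."""
    Hx = [row + [0] for row in H]
    Hx.append([1] * len(Hx[0]))
    return Hx

H = hamming_parity_check(3)      # (7, 4) Hamming code
for row in extend(H):            # (8, 4) extended Hamming code
    print(row)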
4. Reed-Muller Codes
The codes that are now called Reed-Muller (RM) codes were first de-
scribed by Muller in 1954 using a “Boolean Net Function” language.
That same year Reed [92] recognized that Muller’s codes could be rep-
resented as multinomials over the binary field. The resulting “Reed-
Muller” (RM) codes were an important step beyond the Hamming and
This generator matrix should look familiar. It is also the parity check
matrix for an (8, 4) extended Hamming code. First order Reed-Muller
codes are the duals of extended Hamming codes.
The code has length 8, dimension 4, and minimum distance 4. It is thus single error correcting and double error detecting.
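One way to see the code concretely is to build a generator matrix for the first order Reed-Muller code RM(1, m) from the all-ones vector and the coordinate functions; the sketch below does this for m = 3, yielding an (8, 4, 4) code (possibly differing from the matrix in the text by a permutation of rows or columns).

import itertools

def rm1_generator(m):
    """Generator matrix of RM(1, m): the all-ones row plus one row per
    Boolean coordinate function, each evaluated at all 2^m points."""
    points = list(itertools.product([0, 1], repeat=m))
    G = [[1] * len(points)]
    G += [[pt[i] for pt in points] for i in range(m)]
    return G

G = rm1_generator(3)    # RM(1,3): length 8, dimension 4, distance 4
for row in G:
    print(row)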
To determine the minimum distance of $RM(r, m)$ we first prove the following lemma.

Lemma 119 $RM(r, m) = \{(\mathbf{u}, \mathbf{u} + \mathbf{v}) : \mathbf{u} \in RM(r, m-1), \mathbf{v} \in RM(r-1, m-1)\}$, where $(\mathbf{u}, \mathbf{v})$ denotes the concatenation of $\mathbf{u}$ and $\mathbf{v}$.

Proof: By definition, the codewords in $RM(r, m)$ are associated with Boolean functions of degree at most $r$. For each such function $f(x_1, \ldots, x_m)$ there exist $g(x_1, \ldots, x_{m-1})$ and $h(x_1, \ldots, x_{m-1})$ such that $f = g + x_m h$. Since $g$ has degree at most $r$ and $h$ has degree at most $r - 1$, the associated vectors $\mathbf{u}$ and $\mathbf{v}$ can be found in $RM(r, m-1)$ and $RM(r-1, m-1)$, respectively.
Now let $\mathbf{u} \in RM(r, m-1)$ and $\mathbf{v} \in RM(r-1, m-1)$. The associated vectors have the form $(\mathbf{u}, \mathbf{u} + \mathbf{v})$, where the first half corresponds to $x_m = 0$ and the second half to $x_m = 1$.
It follows that the minimum distance of $RM(r, m)$ is $2^{m-r}$.

Parameters: $n = 2^m$, $k = \sum_{i=0}^{r} \binom{m}{i}$, $d_{\min} = 2^{m-r}$
5. Cyclic Codes
We denote the right cyclic shift of $\mathbf{c} = (c_0, c_1, \ldots, c_{n-1})$ as $\mathbf{c}' = (c_{n-1}, c_0, \ldots, c_{n-2})$.

Cyclic Codes:
A cyclic code of length $n$ is a principal ideal of $GF(q)[x]/(x^n - 1)$: $c(x) \in \mathcal{C}$ if and only if $c(x) = a(x)g(x) \bmod (x^n - 1)$ for some $a(x)$, where $g(x)$ is the generator polynomial of the code.
The parameters of cyclic codes depend on the specific type of construc-
tion adopted. The minimum distances of some cyclic codes are unknown.
Furthermore
The quadratic residue codes $Q$ and $N$ of length $p$ are defined by the generator polynomials $q(x)$ and $n(x)$, respectively, where the roots of $q(x)$ are the powers $\beta^r$ of a primitive $p$th root of unity $\beta$ with $r$ a quadratic residue modulo $p$, and the roots of $n(x)$ are the powers $\beta^s$ with $s$ a nonresidue.
Theorem 126 The minimum distance $d$ of $Q$ or $N$ satisfies $d^2 \ge p$. Furthermore, if $p$ is of the form $4k - 1$, then $d^2 - d + 1 \ge p$.
Proof: See [76], p. 483.
7. Golay Codes
The binary Golay code is the (23, 12, 7) quadratic residue code with $p = 23$ and $q = 2$. The quadratic residues modulo 23 are $\{1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18\}$. Let $\beta$ be a primitive 23rd root of unity.
It follows that
We have now described the entire family of perfect codes. The follow-
ing powerful result is due to Tietäväinen [111].
Theorem 129 Any nontrivial perfect code must have the same length, symbol alphabet, and cardinality as a Hamming, Golay, or repetition code.
Proof: See [76] or [111].
Proof: The proof follows by first showing that the constraint placed on
the generator polynomial in the premise ensures that all of the square
submatrices of a BCH parity check matrix are Vandermonde. Vander-
monde matrices are nonsingular, thus placing a lower bound on the
minimum distance of the code. The details can be found in [116].
An equivalent version of the BCH bound can be proven using Galois
Field Fourier Transforms.
Theorem 131 Let $n$ divide $q^m - 1$ for some positive integer $m$. A vector of weight $d - 1$ or less that also has $d - 1$ consecutive zeros in its spectrum must be the all-zero vector.
Proof: Let $\mathbf{v}$ be a vector with exactly $w \le d - 1$ nonzero coordinates, these coordinates being in positions $i_1, i_2, \ldots, i_w$. We now define a "locator
we have
To show this using matrices, observe that the parity check matrix H has $d - 1$ rows with elements in $GF(q^m)$. Convert each element of H into a column vector of length $m$ over $GF(q)$ and delete those rows that are linearly dependent, if any. Hence $n - k \le m(d - 1)$.
Step 1 follows from the design procedure for general cyclic codes. Steps 2 and 3 ensure, through the BCH Bound, that the minimum distance of the resulting code equals or exceeds the design distance, and that the generator polynomial has the minimal possible degree. Since $g(x)$ is a product of minimal polynomials with respect to $GF(q)$, $g(x)$ must be in $GF(q)[x]$, and the corresponding code is thus a $q$-ary code with the desired design distance.
9. Product Codes
Product Codes: the codewords are the $n_1 \times n_2$ arrays in which every row is a codeword of the $(n_2, k_2, d_2)$ component code and every column is a codeword of the $(n_1, k_1, d_1)$ component code.

Parameters: $(n_1 n_2,\ k_1 k_2,\ d_1 d_2)$
Proof: The first condition guarantees that the second condition is well-
defined. That is, if then the Chinese Remainder Theorem
guarantees that are all distinct for and
If then there exists some and
such that
To show that the second condition guarantees that the product code is
cyclic, observe that given a codeword of the product code, a cyclic shift
of it yields a cyclic shift of a row and column of the array. Thus, a cyclic
shift of a codeword of the product code is a codeword of the product code
since the rows of the array are codewords of and the columns of the
array are the codewords of by definitions of and respectively.
Going in the other direction, a cyclic shift of a row or a column of the
array yields a shift of the codeword of the product code for some
since
More detailed analyses of product codes can be found in [49, 56, 68]
and the references therein.
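A minimal encoding sketch for a product code: encode the rows with one component code, then the columns of the result with the other. The single-parity-check component codes used here are illustrative stand-ins, not codes from the text.

def encode_product(info, row_encode, col_encode):
    """Encode a k1 x k2 information array: rows first, then every column
    of the resulting array."""
    rows = [row_encode(r) for r in info]                       # k1 x n2
    cols = [col_encode([rows[i][j] for i in range(len(rows))])
            for j in range(len(rows[0]))]                      # n2 columns of length n1
    return [[cols[j][i] for j in range(len(cols))] for i in range(len(cols[0]))]

parity = lambda word: word + [sum(word) % 2]      # single even-parity-check code
codeword_array = encode_product([[1, 0, 1], [0, 1, 1]], parity, parity)
for row in codeword_array:
    print(row)

Because both component codes are linear, the "checks on checks" in the lower right corner are consistent regardless of the encoding order.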
Chapter 4

CONVOLUTIONAL AND CONCATENATED CODES
1. Convolutional Encoders
Figure 4.1 shows a pair of rate-1/2, linear, nonrecursive convolutional encoders. The encoders operate by taking blocks of bits as inputs and generating blocks of bits at the output. In this particular case, the encoder outputs two bits for every input bit, and is thus said to have rate 1/2. Figure 4.2 shows a rate-2/3 convolutional encoder. In general, an encoder with $k_0$ inputs and $n_0$ outputs is said to have rate $k_0/n_0$, even though the delay lines introduce a "fractional rate loss" (see [116], Chapter 11).
The encoders in Figures 4.1 and 4.2 are nonrecursive in that they do
not employ feedback in the encoding operation. The encoding operation
can be described as a linear combination of the current input and a finite
number of past inputs. The linear combination is generally expressed
in terms of generator sequences for the encoders. A generator sequence $\mathbf{g}_i^{(j)}$ relates a particular input sequence $i$ to a particular output sequence $j$. A particular value $g_{i,l}^{(j)}$ denotes the presence or absence of a tap connecting the $l$th memory element of the $i$th input shift register to the $j$th output. The generator sequences for the encoders in Figure 4.1 can be read directly from the tap connections shown in the figure.
The output equations for a convolutional encoder have the general form
$$c_l^{(j)} = \sum_{i=1}^{k_0} \sum_{t=0}^{m_i} x_{l-t}^{(i)}\, g_{i,t}^{(j)}.$$
The output can be seen to be the sums of the convolutions of the input sequences with the associated encoder generator sequences. Note that the operations are addition and multiplication in GF(2).
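The convolution form of the output equations can be exercised directly. The sketch below implements a rate-1/2 nonrecursive encoder; the generator sequences (1,1,1) and (1,0,1) (octal 7 and 5) are an illustrative choice and are not claimed to match Figure 4.1.

def conv_encode(bits, generators):
    """Rate-1/n nonrecursive convolutional encoder: each output stream is
    the GF(2) convolution of the input with one generator sequence."""
    m = max(len(g) for g in generators) - 1          # encoder memory
    state = [0] * m
    out = []
    for b in bits:
        window = [b] + state                         # current input plus past inputs
        out.append(tuple(sum(g[i] * window[i] for i in range(len(g))) % 2
                         for g in generators))
        state = window[:m]
    return out

print(conv_encode([1, 0, 1, 1], [(1, 1, 1), (1, 0, 1)]))
# -> [(1, 1), (1, 0), (0, 0), (0, 1)]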
The encoders are linear in that, as can be seen from the above expres-
sion, the encoding operation obeys the superposition principle - linear
combinations of input blocks correspond to linear combinations of the
associated output blocks.
The encoder in Figure 4.1 (a) is systematic, meaning that one of its
outputs is a copy of the source data. This is not the case for the encoder
in Figure 4.1(b), so it is called nonsystematic. The encoders in Figure
4.3 differ from those in Figure 4.1 in that the former employ feedback,
and are thus recursive.
The memory for each of the inputs of any of the above encoders is enumerated by the memory vector $(m_1, m_2, \ldots, m_{k_0})$, where the $i$th input shift register has $m_i$ memory elements. It is assumed that for each $i$ there is at least one generator sequence with a tap on the last memory element. The state complexity of the encoder is determined by the total encoder memory $m = \sum_{i} m_i$. The number of states in the encoder is $2^m$, while the constraint length of the convolutional encoder is $K = 1 + \max_i m_i$.
The most convenient means for relating the output of any convolutional encoder to the input, particularly in the case of a recursive encoder, is through the "D transform." The D transform of a temporal sequence $x_0, x_1, x_2, \ldots$ is the polynomial $X(D) = x_0 + x_1 D + x_2 D^2 + \cdots$, where D denotes relative delay. Using this simple tool, the output of a non-recursive encoder can be written in terms of the input by the matrix expression $\mathbf{C}(D) = \mathbf{X}(D)\,G(D)$.
The elements of the generator matrix for a recursive encoder are ra-
tional functions in D with binary coefficients. For example, the encoders
in Figure 4.3 have the following generator matrices.
Note that one can obtain IRWEF given its conditional IRWEF and vice
versa.
NSC and RSC codes generate the same set of codewords, and thus the codeword weight distributions for an NSC code and its associated RSC code are the same. However, the mapping of information sequences to code sequences differs between the two encoders. The difference can be characterized as follows: RSC encoders exhibit slower growth in the information weight sequences than do the NSC encoders. It follows that the RSC encoders provide a more favorable bit error rate at low SNR's, as the number of information bits affected by decoder errors is proportional to the weight of the information blocks associated with low-weight codewords. At high SNR's, only the low-weight codewords have significant impact, and the lower information weights that the NSC codes associate with these codewords offer better performance.
Both encoders generate the same set of output sequences; however, in-
formation bits get mapped to different codewords for the two encoders.
To explore this in detail, we introduce the input-output weight enumer-
ating function (IOWEF). In the IOWEF A(W, X), the exponent for the
dummy variable W denotes information sequence weight, while the ex-
ponent for X denotes the codeword weight. Using Mason’s gain rule
(see, for example, Chapter 11 in [116]), it can be shown that the NSC
encoder has the following IOWEF
Using long division, we can compute the terms for the codeword weights
1 through 10.
and
3. Concatenated Codes
Concatenated error control systems use two or more component codes
in an effort to provide a high level of error control performance at the
expense of only a moderate level of complexity. A concatenated encoder
consists of two or more component encoders that combine to generate
a long code with good properties. The decoder uses the component
code substructure of the concatenated encoder to realize a multi-stage
implementation that is much less complex than a single-stage approach.
In this chapter we consider both the original, serial form of concatenation
as well as the more recent, parallel form. The former allows for various
forms of iterative decoding that will be discussed briefly here. The latter
was developed in conjunction with turbo iterative decoding, which will
be the subject of a later chapter. The details of the performance of
parallel concatenated codes are presented at the end of this chapter.
Turbo decoding of parallel concatenated codes is described in Chapter
7.
the Shannon limit [15, 16, 17, 19, 20, 31, 32, 33, 47, 89]. The reasons
for this excellent performance lie in the weight distributions of the com-
ponent codes and the overall code. This will be investigated in detail in
the next section.
1. Also called the "waterfall region," due to the shape of the bit error rate curve in this region, which falls off steeply.
2. Also called the "error floor region," again due to the shape of the bit error rate curve in this region, which levels off horizontally.
Their results are shown in Table 4.1. The search was conducted
assuming a fixed interleaver length, but the results do not seem to change for longer interleavers. The octal notation is a convenient means
for representing long polynomials. For example, consider the first of the optimal component codes listed. The octal value 31 is converted to the binary value (011001). The LSB is taken to be the first nonzero bit on the left. The feedback polynomial is then $1 + D + D^4$.
Note that in most cases, the free distance of a PCC based on
a pair of such recursive systematic encoders is much smaller than the
effective free distance. The information weight associated with the code sequences at the free distance is large, substantially reducing the impact of
these sequences on bit error rate.
Chapter 5

ELEMENTS OF GRAPH THEORY
construction for a regular graph that achieves the lower bound. This
gives the best explicit expander known.
1. Introduction
We begin with a basic definition for a “graph”.
Definition 145 A graph G is an ordered pair (V, E) of a set of vertices
V and a set of edges E. An edge is a pair of distinct vertices from V.
In a simple example below, the graph consists of the vertex set V =
{A, B, C, D, E, F} and the edge set E = {(A, D), (B, D), (C, E), (D, E)}.
This particular graph is disconnected in that paths do not exist between
arbitrary pairs of vertices in the graph. In this case the vertex F is not
connected by an edge to any other vertex. This graph is also undirected
in that there is no directionality associated with the edges. In a directed
graph, the edges are ordered pairs of vertices, with the first vertex being
the originating vertex and the second the terminating vertex. A directed
version of the first graph is shown in Figure 5.2. We will focus on undi-
rected graphs in this chapter, but will return to directed graphs in the
next chapter.
In the remainder of this chapter and the rest of the text we will use
the following terminology. An edge is said to be incident on its end
vertices. The number of edges incident on a vertex is the degree of the
vertex. In the two graphs shown in Fig. 5.3, for example, each vertex
has degree 2. Two vertices connected by an edge are said to be adjacent.
The chromatic number of the graph G, $\chi(G)$, is the minimum number of colors required to color the vertices of G such that no adjacent vertices have the same color. The chromatic number for the graph at the top of Fig. 5.3, for example, is 3. A graph is regular if all vertices of the graph are of equal degree. If each vertex of a regular graph is of degree $d$, we say the graph is $d$-regular. A graph is irregular if it is not regular.
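These definitions translate directly into code. The short sketch below builds the example graph from the text and checks vertex degrees and regularity.

# The disconnected example graph from the text.
V = {"A", "B", "C", "D", "E", "F"}
E = [("A", "D"), ("B", "D"), ("C", "E"), ("D", "E")]

degree = {v: sum(v in e for e in E) for v in V}
print(degree)                              # F has degree 0 (not connected by any edge)
print(len(set(degree.values())) == 1)      # regular?  -> False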
Definition 146 Let G be a graph with edge set E and vertex set V.
The edge-vertex incidence graph of G is the bipartite graph with vertex set $V \cup E$ and edge set $\{(v, e) : v \in V,\ e \in E,\ v$ is an endpoint of $e\}$.
2. Martingales
The term "martingale" has an interesting history - it is a gambling
strategy that is now forbidden in many of the world’s casinos. The basic
idea is that the gambler doubles her wager after every loss. Assuming
that the gambler is sufficiently well-funded, she will eventually win and
recover all of her earlier losses.
Lemma 151 If $f$ satisfies the edge (vertex) Lipschitz condition, then its edge (vertex) exposure martingale satisfies $|X_{i+1} - X_i| \le 1$.
and Define
Thus,
and
3. Expansion
This section focuses on techniques for bounding the expansion of ran-
dom graphs. For regular graphs, the results are based on the calculation
of the second largest eigenvalue (of the adjacency matrix) of the corre-
sponding graph. Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ denote the eigenvalues of the graph, and let $\lambda$ denote the second largest eigenvalue in absolute value of G. Furthermore, we shall assume throughout this section that G is a $d$-regular graph with $n$ vertices, unless otherwise specified. Note that $\lambda_n > -d$ if G is not bipartite and $\lambda_n = -d$ if G is bipartite. In particular, $\lambda_1 = d$, and $\lambda = d$ if and only if G is bipartite.
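For small graphs $\lambda$ can be computed numerically; the sketch below (using numpy, with a 5-cycle as an arbitrary example) returns the second largest eigenvalue in absolute value of an adjacency matrix.

import numpy as np

def lambda_second(adj):
    """Second largest eigenvalue in absolute value of an adjacency matrix;
    a large gap between the degree d and this value indicates good expansion."""
    eig = sorted(np.linalg.eigvalsh(np.array(adj, dtype=float)), key=abs, reverse=True)
    return abs(eig[1])

C5 = [[0, 1, 0, 0, 1],     # adjacency matrix of the 5-cycle, a 2-regular graph
      [1, 0, 1, 0, 0],
      [0, 1, 0, 1, 0],
      [0, 0, 1, 0, 1],
      [1, 0, 0, 1, 0]]
print(lambda_second(C5))   # ~1.618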
It was shown by Alon [5] and Tanner [108] that a graph G is a good expander if and only if $d$ and $\lambda$ are far apart. Hence in order to find a good expander graph G, it suffices to check $\lambda$. The following lower bound for $\lambda$ was derived in [5, 67].
1. Srinivasa Aiyangar Ramanujan (1887 - 1920) made significant contributions to the analytical theory of numbers. Essentially self-taught, he is considered one of the greatest mathematicians of all time.
and
we get
Theorem 157 For a graph G = (V, E), and two sets of vertices
and where and we have
where
Chapter 6

ALGORITHMS ON GRAPHS
graph, or DAG. Cyclic and acyclic directed graphs are shown in Figure
6.3.
There are two basic types of DAG’s: singly-connected DAG’s and
multiply-connected DAG’s. A DAG is singly-connected if there exists
exactly one undirected path between any pair of vertices. A singly-
connected DAG is also referred to as a tree. Within the class of singly-
connected DAG’s, a network may be either a simple tree or a polytree.
A tree is simple if each vertex has no more than one parent, as shown
in Figure 6.4(b). A polytree is a tree that has vertices with more than
one parent, as illustrated in Figure 6.4(c).
The important distinction to be made here between DAG’s that are
multiply connected and those that are singly connected is that the former
can have loops. A loop is a closed, undirected path in the graph. A tree
cannot have a loop since a loop requires two distinct paths between any
pair of vertices in the loop.
Within a DAG we can relate vertices to one another using familiar
terms. We will use the DAG in Figure 6.4(c) as an example throughout.
A vertex is a parent of another vertex if there is a directed connection
from the former to the latter. Vertices C and D are parents of Vertex
E in Figure 6.4(c). Similarly, a vertex is a child of a given vertex if
there is a directed connection from the latter to the former. Vertex D is
thus a child of vertices A and B. An ancestor of a vertex is any vertex
for which a directed path leads from the former to the latter. Vertices
A, B, C, and D are thus ancestors of vertex E. The complete set of
all ancestors of a given vertex is called the ancestor set of the vertex.
For example, the ancestor set of vertex H is {A, B, C, D, E}. Similarly,
there are descendent vertices and descendent sets. Vertices G and H are
where $\mathbf{e}$ is the evidence, or the set of instantiated variables, that carries the total available information in the network about the random variable X. Our goal is to find beliefs of vertices in the network in a computationally efficient manner.

Let $\mathbf{e}_X^-$ and $\mathbf{e}_X^+$ be the evidence contained in the subgraph whose root is X and in the rest of the graph, respectively, and define $\lambda(x) = P(\mathbf{e}_X^- \mid x)$ and $\pi(x) = P(x \mid \mathbf{e}_X^+)$.

We will use $\propto$ throughout the book to mean that the left hand side is equal to the right hand side weighted by a normalizing constant. Since the network is loop-free, we can put the belief of X in the following factored form: $BEL(x) \propto \lambda(x)\pi(x)$.
Defining
and
yields
Note that it suffices for the vertex X to pass $BEL(x)$ to all its children, instead of passing a different message to each child of X – each child $Y_j$ can compute its message by dividing the received $BEL(x)$ by the value of the message $\lambda_{Y_j}(x)$ that it previously passed to X. That is, by exploiting the relation $\pi_{Y_j}(x) \propto BEL(x)/\lambda_{Y_j}(x)$, we are able to get a computational improvement.
and suppose has K neighboring cliques and let be the set of cliques
in the subtree containing when dropping the edge Let
be the set of the vertices in the subtree containing when dropping the
The message sent from clique to the neighbor clique now can
be reformulated as
where is chosen to have the smallest size clique that contains and
We can represent the code by the graph shown in Figure 6.8. The
vertices labeled 1, … , 4 are the bits of the codeword and the vertices
(1,4), (2,4) and (3,4) are the constraints on the bits of the codeword.
The modulo-2 sum of values of neighbors of a constraint vertex should
be 0. In the graph in Figure 6.8, we left out the evidence vertices, each of which is connected to a vertex in the graph. The graph is loop-free,
so we can apply the belief propagation algorithm.
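Since the graph is loop-free, belief propagation computes the exact symbol-by-symbol a posteriori probabilities. The brute-force sketch below computes the same quantities for this four-bit code directly, assuming a binary symmetric channel and equiprobable codewords; these assumptions are illustrative, not taken from the text.

from itertools import product

constraints = [(0, 3), (1, 3), (2, 3)]          # the (1,4), (2,4), (3,4) constraints
codewords = [c for c in product([0, 1], repeat=4)
             if all((c[i] + c[j]) % 2 == 0 for i, j in constraints)]

def bit_posteriors(received, p):
    """P(bit k = 1 | received) over all valid codewords, for a BSC with
    crossover probability p and a uniform prior on codewords."""
    def lik(c):
        l = 1.0
        for r, b in zip(received, c):
            l *= p if r != b else 1.0 - p
        return l
    weights = [lik(c) for c in codewords]
    total = sum(weights)
    return [sum(w for w, c in zip(weights, codewords) if c[k] == 1) / total
            for k in range(4)]

print(codewords)                        # [(0, 0, 0, 0), (1, 1, 1, 1)]
print(bit_posteriors([1, 0, 1, 1], 0.1))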
Then,
and
where $\oplus$ represents the modulo-2 sum. The belief of each bit vertex is
then
as expected.
Now consider the second of the semiring examples, repeated below.
Semiring Example 2: The set of nonnegative real numbers with the operation + being the maximum, which has the identity element 0, and the operation · being the sum, which has the identity element 0.
Using the log of each probability as the metric instead of the probability itself yields
Since the directed probability graph shown in Figure 6.9(a) has loops,
we will not be able to directly apply the belief propagation algorithm.
We will instead use the junction tree propagation algorithm. To use the
algorithm, it is necessary to convert the directed probability graph into
a junction tree. By deriving the moral graph shown in Figure 6.9(b),
we obtain the desired junction tree of the directed probability graph
and
where (a) follows from a series of substitutions into the equations above.
This application of the junction tree algorithm results in the symbol-by-
symbol a posteriori probability decoding of the code. As before, running the algorithm in the semiring of Semiring Example 2 and using the log metric instead of the probability metric gives us maximum likelihood decoding.
and
is counted at most once. For example, consider the loopy graph shown
in Figure 6.12(a), and its equivalent tree of depth 3 for calculating the a
posteriori probability of vertex 3 shown in Figure 6.12(b). All messages
in Figure 6.12(b) are independent, as can be easily verified.
consider vertex and calculate its true a posteriori probability and its
value calculated from the message-passing algorithm in the loopy graph.
Reformulating
Chapter 7

TURBO DECODING
1. Turbo Decoding
Turbo error control was introduced in 1993 by Berrou, Glavieux, and
Thitimajshima [19]. The encoding and decoding techniques that fall
under this rubric were fundamentally novel to Coding Theory, and are
now recognized to provide performance that is close to the theoretical
limit determined by Shannon in 1948 [100].
There are two key concepts that continue to underlie turbo decod-
ing: symbol-by-symbol MAP decoding of each of the component codes
and information exchange between the respective decoders. This is best
exemplified through reference to Figure 7.1, in which the sequence of information bits is encoded by the first component code and its interleaved version is encoded by the second component code.
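The encoder side of this arrangement can be sketched in a few lines: systematic bits, parity from one recursive systematic encoder, and parity from a second encoder fed the interleaved bits. The tap polynomials and the random interleaver below are illustrative assumptions, not the encoder of [19].

import random

def rsc_parity(bits, feedback=(1, 1, 1), forward=(1, 0, 1)):
    """Parity stream of a simple rate-1/2 recursive systematic encoder;
    the (feedback, forward) taps here are illustrative only."""
    m = len(feedback) - 1
    state = [0] * m
    parity = []
    for b in bits:
        s = (b + sum(feedback[i + 1] * state[i] for i in range(m))) % 2
        parity.append((forward[0] * s +
                       sum(forward[i + 1] * state[i] for i in range(m))) % 2)
        state = [s] + state[:m - 1]
    return parity

def turbo_encode(info, interleaver):
    """Parallel concatenation: systematic bits, parity from the first encoder,
    and parity from the second encoder fed the interleaved bits."""
    p1 = rsc_parity(info)
    p2 = rsc_parity([info[i] for i in interleaver])
    return info, p1, p2

info = [1, 0, 1, 1, 0, 0, 1, 0]
interleaver = list(range(len(info)))
random.Random(0).shuffle(interleaver)
print(turbo_encode(info, interleaver))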
where the sum is taken over all possible values of for all The
next step is to use the last factor in the expression of
As before, the next step is to use the last factor in the expression of
Turbo Decoding:
2. Parallel Decoding
A simple parallel mode of turbo decoding, one that extends naturally to parallel concatenated codes with more than two component codes, is described in this section. Empirical results show these algorithms can give better limiting performance than the serial mode of turbo decoding of parallel concatenated codes, which directs our turbo decoding practice toward the parallel modes.
Clearly, the parallelism of the activation of the component decoders re-
quires that these decoding algorithms give estimates of the a posteriori
probabilities that are less biased to one particular component code. For
the sake of comparison with the parallel concatenated code introduced
by Berrou, Glavieux and Thitimajshima [19], we will first study the two
component codes case.
Consider the parallel mode of turbo decoding shown in Figure 7.4.
Because of the simultaneous activation of the component decoders in
Figure 7.4, it is not altogether obvious how to combine the information
from the two decoders such that the estimate of the information symbol
is always better than the estimate given by the serial mode of decoding.
Parallel Decoding:
Extended Parallel One (EP1): Only the black (or only the grey)
interconnections are active.
choosing
3. Notes
The impact of the turbo principle on the research community was
explosive – the best indication being the number of papers or books that
were subsequently published (see, for example, [32, 33, 34, 47, 65, 82,
91, 48]). It is impossible in this short text to mention all who have made
significant contributions to this field, but it is appropriate to note the
works that the authors have relied on the most. Benedetto and Montorsi
gave the first detailed analyses of the performance of turbo error control
[15, 16] and extended the capability of iterative decoding techniques to
serially concatenated codes [17], which also give near capacity-achieving
performance. Perez et al. [89] explored the distance properties of turbo
codes, and showed that the interleaver size of a turbo code should be large for the code to have a small number of codewords of low weight.
Wiberg et al. [115] described how a general iterative decoding algo-
rithm can be described as a message passing on graphs, and McEliece
et al.’s paper [83] on turbo decoding described turbo decoding as an
instance of belief propagation algorithm, providing a graphical view of
turbo decoding and allowing generalized decoding algorithms of turbo
codes [57]. In an instance of simultaneous inspiration, Kschischang and
Frey also showed that turbo decoding can be viewed as an application of
the belief propagation algorithm [60, 61] on a multiply-connected graph.
More recently, the probability density of the data estimates developed
during turbo decoding was shown to be well approximated by the gaus-
sian distribution when the input to the decoder is gaussian [35, 97]. This
allows for the calculation of a threshold for turbo decoding: at a noise level below the threshold, the probability of error of turbo decoding goes to zero as the codeword length increases; at a noise level above the same threshold, the probability of error goes to one as the codeword length increases.
Chapter 8

LOW-DENSITY PARITY-CHECK CODES
1. Basic Properties
Low-density parity-check codes defined over regular bipartite graphs
are called regular low-density parity-check codes, with the obvious analogs
for irregular graphs. We will refer to them as regular and irregular codes,
respectively. Throughout the chapter, we shall use the notation
and for the codeword node, constraint node, received word
node, message from a codeword node to a constraint node and message
from a constraint node to a codeword node, respectively. By abuse of
notation, we shall use node and bit (value taken by the node) inter-
changeably, and also use variable node and codeword node interchange-
ably. Since we will typically use the left and right sets of nodes in the
bipartite graph as the variable and constraint nodes, respectively, we
will refer to variable nodes as left nodes and constraint nodes as right
nodes.
For regular graphs, and will denote the degrees of a variable node
and a constraint node, respectively. For irregular graphs, and will
denote the degrees of a variable node and a constraint node and
and will denote the maximum degrees of a variable bit node and
a constraint node, respectively. Irregular codes, by definition, include
regular codes, and for this reason we shall normally describe codes in
terms of irregular bipartite graphs, unless explicitly stated otherwise.
Definition 172 Let B be a bipartite graph with variable nodes
A low-density parity-check code is
where the inequality follows from each being greater than 2. On the
other hand,
Combining Equations (8.1) and (8.2) yields the first lower bound given
by
which implies Equation (8.3) since each active constraint node is adja-
cent to nonzero even number of nodes with value 1.
On the other hand,
1. Recall from Chapter 1, Definition 20, that the minimum relative distance is defined to be $\delta = d/n$ if $n$ is the code length and $d$ is the minimum distance of the code.
linear restrictions are shared among the variable bits. Hence, the variable
bits have at least independent bits which gives the
lower bound on the rate of the code Suppose there is a codeword
of weight at most in which V is the set of variable bits that are 1.
The expansion property of the bipartite graph tells us that V has at
least
Consider now the case when is a linear code of rate block length
and minimum relative distance and B is the edge-vertex incidence
graph of a graph G with the eigenvalue with the second
largest magnitude. If the number of vertices of G is then the number
of variables and constraints of are and respectively. Code
rate can be obtained from the degree of the variable node being 2 and
from Theorem 175. Now Lemma 161 of Chapter 5 tells us that any
set of variables will have at least constraints as
neighbors for some constant and since each variable has two neighbors,
the average number of variables per constraint will be
indicating that there exists a corrupt bit that is in more unsatisfied than
satisfied constraints. Rephrased, for the algorithm will flip
some variable bit. We finish the proof by showing that for the
algorithm will flip a corrupt variable bit. Hence, assume and
so To deduce a contradiction, observe that it suffices
to show that our algorithm may fail if the algorithm flips variable bits
that are not corrupt and becomes greater than If so, then when
becomes equal to we have from Equation (8.5) which is a
contradiction.
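A sketch of the flip-style sequential decoder discussed above: while some variable bit appears in more unsatisfied than satisfied checks, flip it. The (7,4) Hamming checks below are only a toy example; the guarantees in the text require the expansion assumed in the theorem, and here only an error in a degree-one position is shown being corrected.

def simple_sequential_decode(word, checks, max_rounds=100):
    """Flip-style sequential decoding. `checks` is a list of tuples of
    variable indices, one tuple per parity constraint."""
    word = list(word)
    for _ in range(max_rounds):
        unsat = [c for c in checks if sum(word[i] for i in c) % 2 == 1]
        if not unsat:
            return word
        flipped = False
        for v in range(len(word)):
            bad = sum(v in c for c in unsat)
            good = sum(v in c for c in checks) - bad
            if bad > good:                 # more unsatisfied than satisfied checks
                word[v] ^= 1
                flipped = True
                break
        if not flipped:
            break
    return word

checks = [(0, 1, 2, 4), (0, 1, 3, 5), (0, 2, 3, 6)]   # (7,4) Hamming checks
received = [0, 0, 0, 0, 0, 0, 1]                      # single error in position 6
print(simple_sequential_decode(received, checks))     # -> all zeros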
Proof: [Sketch] The average left and right node degrees are independent
of the code length, and the number of unsatisfied constraints, which is
linear in the code length, decreases.
In fact, a weak converse is true also. That is, in order for the Sim-
ple Sequential Decoding Algorithm to correct all errors successfully, the
graph must be an expander. The next theorem [103] proves this for the
case of regular codes.
neighbors.
Proof: Observe that if a corrupt variable bit is flipped then the number
of unsatisfied constraint nodes decreases by at least 1 for odd, and
by at least 2 for even. We shall consider these two cases separately.
Case 1: is even. The algorithm decreases the number of unsatisfied
constraint nodes by to correct corrupt variable bits. Thus, all
sets of variable nodes of size have at least neighbors.
Case 2: is odd. The algorithm decreases the number of unsatisfied
constraint nodes by to correct corrupt variable bits. So assume
first that there is no variable node that will decrease the number of
unsatisfied constraint nodes by > 1. Each corrupt variable bit node
has of its edges in satisfied constraint nodes and each satisfied
constraint node may have corrupt neighbors. Hence there must
be satisfied neighbors of the variable bits.
On the other hand, since there must be unsatisfied neighbors of
the variable bits, the variable nodes must have
neighbors.
Now assume that there exists a variable bit such that if the variable
bit is flipped, then the decrease in the number of unsatisfied constraint
nodes is > 1, or Suppose the algorithm flips corrupt variable
bits that decrease the number of unsatisfied constraint nodes by and
corrupt variable bits that decrease the number of unsatisfied
constraint nodes by 1. So any variable nodes have at least
3. Explicit Construction
In this section, we show an explicit construction of an asymptotically
good code from a low-density parity-check code. The code is due to
Barg and Zémor, and can be constructed in polynomial time and de-
coded in linear time. It is an asymptotically good binary code that does
not use concatenation, in the sense of Forney’s concatenated codes, and
furthermore, it is the first code to be constructed whose error exponent
Decoding Round:
Do the following in serial:
Notice that the number of bits among the left nodes with neighbors in
the above set is at least Hence the minimum relative distance of
this code is approximated by
The theorem implies that the relative minimum distance of this code
is defined for five times the range of the previous con-
struction. The decoding algorithm shown in this section can also be
simulated in linear time under the logarithmic cost model. To do this,
the Decoding Round must be modified as follows:
Decoding Round':
For each perform maximum likelihood decoding for its
neighbors in serial.
Gallager’s Algorithm 1:
Iterate the following two steps: For all edges do the following in
parallel:
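As an assumed illustration of this style of decoder, the sketch below implements the hard-decision message-passing rule commonly called Gallager A; it may differ in detail from the book's Algorithm 1. Checks send each neighbor the XOR of the other incoming messages, and a variable changes its outgoing message only when all other incoming check messages agree against its received value.

def gallager_a(received, checks, rounds=10):
    """Hard-decision message passing on a bipartite graph given by `checks`
    (a list of tuples of variable indices)."""
    var_neighbors = {v: [c for c, chk in enumerate(checks) if v in chk]
                     for v in range(len(received))}
    v2c = {(v, c): received[v] for v in var_neighbors for c in var_neighbors[v]}
    for _ in range(rounds):
        # Check-to-variable: XOR of the other incoming variable messages.
        c2v = {(c, v): sum(v2c[(u, c)] for u in checks[c] if u != v) % 2
               for c, chk in enumerate(checks) for v in chk}
        # Variable-to-check: received bit unless all other checks disagree with it.
        for v in var_neighbors:
            for c in var_neighbors[v]:
                others = [c2v[(d, v)] for d in var_neighbors[v] if d != c]
                if others and all(m == 1 - received[v] for m in others):
                    v2c[(v, c)] = 1 - received[v]
                else:
                    v2c[(v, c)] = received[v]
    # Final hard decision: majority vote of the received bit and incoming messages.
    decoded = []
    for v in var_neighbors:
        msgs = [c2v[(c, v)] for c in var_neighbors[v]] + [received[v]]
        decoded.append(1 if sum(msgs) * 2 > len(msgs) else 0)
    return decoded

checks = [(0, 1, 2, 4), (0, 1, 3, 5), (0, 2, 3, 6)]   # toy (7,4) Hamming checks
print(gallager_a([0, 0, 0, 0, 0, 0, 1], checks))      # -> all zeros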
Gallager’s Algorithm 2:
Iterate the following two steps: For all edges do the following in
parallel:
If this is the zeroth round, then set to
If this is a subsequent round, then set as follows:
The next lemma provides efficient computation for the message passed
from a constraint node to a variable node during decoding.
Lemma 185 Consider $m$ nodes whose values are binary and taken independently, where each node has the value 0 with probability $p$. The probability that an even number of nodes have value 0 is $\frac{1 + (1 - 2p)^m}{2}$.
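The closed form can be checked numerically against an exhaustive enumeration; both computations below agree, with m = 5 and p = 0.3 chosen arbitrarily for illustration.

from itertools import product

def p_even_zeros(m, p):
    """Closed form from the lemma: (1 + (1 - 2p)**m) / 2."""
    return (1 + (1 - 2 * p) ** m) / 2

def p_even_zeros_bruteforce(m, p):
    total = 0.0
    for bits in product([0, 1], repeat=m):
        prob = 1.0
        for b in bits:
            prob *= p if b == 0 else 1 - p
        if bits.count(0) % 2 == 0:
            total += prob
    return total

print(p_even_zeros(5, 0.3), p_even_zeros_bruteforce(5, 0.3))   # both ~0.50512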
We can combine the above equations and get a recursive equation for
in terms of as follows.
Now combine the results of Theorems 188 and 189 with those of The-
orems 177 and 180. We know Gallager’s Algorithms 1 and 2 reduce the
number of bits in error to a small number with exponentially high prob-
ability, and Simple Sequential and Parallel Decoding Algorithms reduce
the number of bits in error to zero if the bipartite graph defining the
code is a sufficiently good expander. Hence, in order to guarantee successful
decoding, use Lemma 159 of Chapter 5 to check whether a randomly chosen
bipartite graph has the necessary expansion, and then decode with Gallager's
Algorithm 1 or 2 followed by the Simple Sequential or Parallel Decoding
Algorithm.
Regarding the complexity of this cascaded decoding, we know Gal-
lager’s Algorithms 1 and 2 both require only a linear number of com-
putations per decoding round, and we only need to perform a constant
number of decoding rounds of Gallager’s Algorithms 1 and 2 to reduce
the number of bits in error to a small number. On the other hand, we
know Simple Sequential and Parallel Decoding Algorithms both require
only a linear number of computations to correct all the remaining bits
in error. Hence for a bipartite graph with minimum left node degree
greater than or equal to 5, there exist explicit linear-time bounded algo-
rithms that can correct all bits in error successfully with exponentially
high probability.
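A minimal sketch of this two-stage decoder, with both stages left as placeholders, is the following; the function names and the fixed number of rounds are illustrative assumptions.

    # Cascaded decoding: a constant number of message-passing rounds to
    # reduce the errors, then bit flipping to remove the remaining ones.

    def cascaded_decode(received, code, gallager_round, bit_flipping_decode,
                        CONSTANT_ROUNDS=20):
        word = list(received)
        for _ in range(CONSTANT_ROUNDS):        # stage 1: reduce errors to a small number
            word = gallager_round(word, code)
        return bit_flipping_decode(word, code)  # stage 2: correct the remaining errors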
the same value to in the previous round, set to this value. To get
the expression for we only need to replace each
and in particular, a right degree sequence with only one nonzero entry suffices
in most cases.
Codes 1 and 2 in Table 8.3 have minimum left degree greater than
or equal to 5, giving the graph the necessary expansion called for in
Theorem 190. Codes 1 and 2 are decoded by Gallager's Algorithm 1
or 2 followed by the Simple Sequential or Parallel Decoding Algorithm.
To see the effect of finite code length on the probability of decoding
error for Codes 1 and 2, we consider Figure 8.3 [71].
Recall that the results in Theorems 192 and 193 are valid only for infinite
code length. A code with 16000 variable bit nodes and 8000 constraint
nodes is shown in the figure for the (4,8)-regular code, which gives the
best performance among regular codes. The figure shows the percentage
of successful decoding operations based on 2000 trial runs, and it agrees
quite well with the theoretical result for the asymptotic number of
correctable errors. Codes 3 and 4 in Table 8.3 do not have minimum
left degree greater than or equal to 5, and hence we decode them
with Gallager’s Algorithm alone. Simulation results show that Gallager’s
Algorithm 1 or 2 usually corrects all the errors and it is unnecessary to
switch to the Simple Sequential or Parallel Decoding Algorithm.
Figure 8.4 [71] shows the effect of finite code length on the probability
of decoding error for Codes 3 and 4. As in Figure 8.3, this figure shows
the percentage of successful decoding operations based on 2000 trial
runs, and it agrees quite well with the theoretical result for the
asymptotic number of correctable errors.
5. Belief Propagation Decoding
The analysis of the preceding algorithms' error correcting capability was
simplified by the fact that the messages
passed between the variable bit and constraint nodes were binary. In
belief propagation decoding, the messages are real numbers, making the
analysis much more difficult.
We prove that the application of the belief propagation algorithm to a
randomly chosen irregular low-density parity-check code results in er-
ror correcting performance that is very near the theoretical limit. For
example, in the AWGN channel, a code has an error correcting
capability that is less than 0.06 dB from the theoretical limit. Confirm-
ing our theoretical analysis, empirical results also show that irregular
low-density parity-check codes, decoded using the belief propagation al-
gorithm, achieve near the theoretical limit.
For purposes of explication, our channel of interest will be either the
BSC or AWGN. As before, we use throughout this chapter to mean
that the left hand side of is equal to the right hand side weighted by
a normalizing constant. We first state the algorithm using the notation
of Chapter 6. will represent the message from the constraint or
child node to the variable bit or parent node2, or the message from the
instantiated or child node to the variable bit or parent node in the
decoding round. will represent the message from the variable bit or
parent node to the constraint or child node3 in the decoding round.
The Bayesian network of a low-density parity-check code is shown in
Figure 8.5.
2 We assume that the reader will not confuse this with left degree sequences that use the
same notation.
3 Likewise, we assume that the reader will not confuse this with right degree sequences.
two sets of nodes. Hence, we shall not use the superscript denoting the
decoding round number for the messages that are passed between the
two sets of nodes.
In the part of the Bayesian network that has the sets of nodes
and in each decoding round, each variable bit node first sends a message
to a constraint node, and then each constraint node sends a message to a
variable bit node. Consider regular codes first. Define
and
Belief propagation decoding of a regular
low-density parity-check code proceeds as follows.
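A minimal sketch of the two passes in log-likelihood-ratio form (an equivalent formulation; the notation and graph representation below are illustrative, not those of the text) is the following.

    # One belief propagation round in LLR form.  channel_llr[v] is the channel
    # LLR of variable bit v, variables[v] lists its constraint neighbors, and
    # checks[c] lists the variable neighbors of constraint node c.

    import math

    def variable_node_pass(channel_llr, chk_to_var, variables):
        var_to_chk = {}
        for v, nbrs in variables.items():
            var_to_chk[v] = {c: channel_llr[v] +
                             sum(chk_to_var[cc][v] for cc in nbrs if cc != c)
                             for c in nbrs}
        return var_to_chk

    def constraint_node_pass(var_to_chk, checks):
        chk_to_var = {}
        for c, nbrs in checks.items():
            chk_to_var[c] = {}
            for v in nbrs:
                prod = 1.0
                for vv in nbrs:
                    if vv != v:
                        prod *= math.tanh(var_to_chk[vv][c] / 2.0)
                prod = max(min(prod, 0.999999), -0.999999)  # keep atanh finite
                chk_to_var[c][v] = 2.0 * math.atanh(prod)
        return chk_to_var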
Lemma 185 tells us that the Constraint Node Pass can be reformulated
as
where
Proof: [Sketch] For notational convenience, we will sketch the proof for
the case of regular codes. To calculate the probability density of the
message of the variable node pass, it will be convenient to express the
message in the form of a log-likelihood ratio. Let
If then
Hence for regular graphs,
Expressing the message sent from the constraint node to the variable
node in the form of a log-likelihood ratio, we have
Then substituting for and from the Constraint Node Pass gives
us
As in the case of the variable node pass, we now calculate the proba-
bility distribution of the message from constraint node to variable node.
To facilitate the calculation, we adopt the following trick. Define
In Figure 8.6 [94], three curves for the best currently known
codes of length from the classes of regular low-density parity-
check codes, irregular low-density parity-check codes, and turbo codes
are shown. In Figure 8.7 [94], six curves for the best currently known
codes of lengths and from the classes of irregular
low-density parity-check codes and turbo codes are shown. The dotted
curves represent the turbo code curves and the solid curves represent the
low-density parity-check code curves. These curves show that while
low-density parity-check codes perform worse than turbo codes for small
block length, they outperform turbo codes as the block length of the code
is increased.
Table 8.6, taken from [94], lists a set of good degree sequences for
irregular codes that permit successful decoding at a high noise standard
deviation in the AWGN channel.
We can further improve the performance demonstrated in the previous
section by using codes defined over larger fields. The classical results tell
us to increase the block length of the code to get a lower probability of
decoding error. It follows that we want to know whether we can improve
code performance by using a bipartite graph whose variable nodes take
values in a larger field instead of in the binary
field GF(2). Following empirical results from [30], we can see that this
may indeed be the case. In Figure 8.8, low-density parity-check codes of
block lengths 2000, 1000, 6000, 2000 over fields GF(2), GF(4), GF(2),
GF(8), respectively, are tested over the BSC. In Figure 8.9, low-density
parity-check codes of block lengths 18000, 9000, 6000, 6000 over fields
GF(2), GF(4), GF(8), GF(16), respectively, are tested over the binary
Gaussian channel. These figures show that code performance can be
improved by using a bipartite graph whose variable nodes take values
in a larger field.
6. Notes
It has been shown that low-density parity-check codes can achieve
the theoretical limit when maximum-likelihood decoded [75, 84]. Un-
fortunately, a linear-time bounded algorithm that provides maximum-
likelihood decoding of low-density parity-check codes has yet to be found, if
indeed one exists. In fact, optimal decoding of randomly constructed low-density
parity-check codes is known to be NP-hard [18]. As a result, we follow
the lead of Berrou, Glavieux, and Thitimajshima and look for linear-time
bounded suboptimal decoding algorithms that produce performance very
close to that given by maximum-likelihood decoding. The best known
suboptimal decoding algorithms use the recursive structure of the code to
facilitate the decoding, and are iterative. Each iteration takes only lin-
ear time, and the algorithms require only a constant number of rounds.
Bounds on the number of errors guaranteed to be corrected by some of
the algorithms indicate that they are very close to optimal decoding.
In particular, the suboptimal iterative decoding algorithms that we pre-
sented are variations of the belief propagation algorithm as applied to
the graph that defines the code. For example, hard-decision decoding
algorithms that we presented are variations of a hard-decision version of
the belief propagation algorithm, and the soft-decision decoding algorithm
was precisely the belief propagation algorithm. The variations of the
belief propagation algorithm presented here proceed until a codeword is
found or decoding failure is declared after some fixed number of decoding
iterations.
Gallager [40, 41] considered regular bipartite graphs of high girth and
gave an explicit construction of such graphs, as randomly chosen graphs
may have short cycles. He derived low-complexity message-passing decoding
algorithms on the graphs, together with lower bounds on the number of
errors the algorithms can correct. The explicit construction of high girth
graphs was motivated by his desire to make the analysis simple, given a
tree-like neighborhood structure of the graph. While Gallager’s explicit
construction yielded the necessary girth for his analysis, Margulis [78]
gave an explicit construction whose girth is larger than Gallager’s. Low-
density parity-check codes were then largely forgotten for more than 30
years, with the exception of Zyablov and Pinsker’s work [120] on de-
coding complexity and Tanner’s work [107] on codes defined by graphs.
It was not until MacKay [73] showed that Gallager’s decoding algo-
rithms are related to the belief propagation algorithm that Gallager’s
work received renewed interest. Improving on the results of Gallager,
MacKay showed that low-density parity-check codes can achieve near
the theoretical limit if decoded by belief propagation algorithm [74], and
that low-density parity-check codes can achieve the theoretical limit if
maximum-likelihood decoded [75].
Sipser and Spielman [103] introduced a class of low-density parity-
check codes called expander codes in 1996 that are low-density parity-
check codes whose graphs have good expansion properties. Empirical
results show that Simple Sequential and Parallel Decoding Algorithms
seem to correct a significantly larger number of errors than are guaran-
teed by the theorems. Zémor [119] in 1999 and Barg and Zémor [13, 14]
in 2001 and 2002 have improved Sipser and Spielman’s result on the er-
ror correction capability of a family of explicitly constructible expander
codes. Furthermore, their results imply that the error exponent is a
better measure of error correcting performance than the minimum dis-
tance of a code, and show that a family of low-density parity-check codes
achieves the capacity of a BSC under an iterative decoding procedure. The
construction of codes using expansion properties has also been studied
by Alon et al. [7], who constructed low-rate asymptotically good codes.
It is worth noting that while the encoding of expander codes requires
the usual quadratic-time complexity, Lafferty and Rockmore [62] gave an
encoding scheme based on Cayley graphs and representation theory that
has sub-quadratic-time complexity. In [95], Richardson and Urbanke
made use of the sparseness of the parity-check matrix of low-density parity-
check codes to obtain efficient encoding schemes that allow near-linear-time
encoding. The application of Gallager's decoding algorithms to
expander codes was carried out in [23].
In 1998, Luby et al. [71] generalized a result of Gallager that provided
lower bounds on the number of correctable errors on randomly chosen
regular and irregular graphs. Furthermore, Luby et al. [72] improved
MacKay’s results regarding the belief propagation decoding of regular
codes and were able to achieve performance even closer to the theoretical
limit by using irregular codes. In the same year, Davey and MacKay
[30] improved Luby et al.’s result by using irregular codes over larger
fields. Soon after, Richardson et al. [94] gave the best known irregular
codes over the binary field by carefully selecting the degree sequences of
the codes.
Linear programming and Gaussian approximation methods have been
carried out in [25, 26] to compute the error probability of belief propaga-
tion. The former method resulted in low-density parity-check codes that
have performance within 0.0045 dB from the theoretical limit. While the
best known codes relied on random selection from an ensemble of codes,
explicit algebraic expander and Ramanujan graphs were used to con-
struct codes that have performance comparable to regular low-density
parity-check codes in [63] and in [96], respectively.
Chapter 9
Low-Density Generator Codes
1. Introduction
Low-density generator codes are defined by a bipartite graph
B = (X ∪ C, E), where X represents the set of information nodes and C rep-
resents the set of check nodes. The codeword of a low-density generator
code is then the values of nodes in X concatenated with those in C. The
values of nodes in C are defined by those in X and the set of edges E.
The generator matrix of a low-density generator code is of the form
G = [I : P], where I is the identity matrix and P is the biadjacency
matrix of B. It follows that the construction of a
low-density generator code reduces to the selection of a bipartite graph.
Since we will typically use the left and right sets of nodes in the bipar-
tite graph as the information and check nodes, respectively, we will refer
to information or variable nodes as left nodes and check nodes as right
nodes. The generator matrix induced by the graph is sparse because the
degree of the nodes in the graph is fixed while the number of nodes in
the graph is increased (hence the name for this class of codes). Clearly,
the encoding of a low-density generator code takes a linear number of
operations in the number of variable bits by construction.
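A minimal sketch of this systematic encoding, assuming the sparse matrix P is stored by the positions of its nonzero entries (an illustrative representation, not the book's notation), is the following.

    # Systematic encoding with G = [I : P].  rows_of_P[i] lists the check
    # nodes adjacent to information node i.

    def ldgm_encode(info_bits, rows_of_P, num_checks):
        check_bits = [0] * num_checks
        for i, bit in enumerate(info_bits):
            if bit:                          # each information bit touches only a few checks,
                for j in rows_of_P[i]:       # so the total work is linear in the code length
                    check_bits[j] ^= 1
        return list(info_bits) + check_bits  # systematic codeword: [x : xP]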
Low-density generator codes seem to share the features of low-density
parity-check codes, and it may seem unnecessary to devote a separate
chapter to them; after all, one code is defined by its generator matrix,
and the other is defined by its parity-check matrix. We shall see in this
chapter, however, that the class of low-density generator codes has
applications of its own and enables one to obtain results not
possible with low-density parity-check codes. To begin the exposition, let
2. Decoding Analyses
In this section we exhibit linear-time decoding algorithms that are
closely related to the algorithms of low-density parity-check codes. We
present two simple bit-flipping algorithms and their bounds. While the
lower bounds on the number of errors that are guaranteed to be corrected
by the algorithms presented in the theorems are very small constants,
the simulation results show that the algorithms seem to correct a signif-
icantly larger number of errors. As one could have easily guessed, Gal-
lager’s Algorithms and the belief propagation algorithm can be applied
to low-density generator codes to obtain efficient decoding algorithms.
We only very briefly discuss them in this chapter, as the basic philosophy
of decoding analysis is identical to that shown in the previous chapter.
The performance of low-density generator codes and that of low-density
parity-check codes are also comparable, at least as decoded by these two
types of algorithms.
In conducting a performance analysis, it will be convenient to first
define an error reducing code. Roughly speaking, if not too many of the
variable bits and check bits are corrupted, then an error reducing code
corrects a fraction of the corrupted variable bits while leaving the cor-
rupted check bits unchanged. The algorithms and concepts introduced
in this section follow from [105].
Associated with the code are two simple decoding algorithms: the Simple
Sequential Error Reducing Algorithm and the Simple Parallel Error
Reducing Algorithm.
If there is a variable bit that has more unsatisfied than satisfied neigh-
bors, then flip the value of that variable bit.
Repeat until no such variable bit remains.
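A minimal sketch of the Simple Sequential Error Reducing Algorithm, under an illustrative graph representation in which each check node's (possibly corrupted) check bit is compared with the XOR of its variable neighbors, is the following.

    # variables[v] lists the check nodes adjacent to variable bit v;
    # checks[c] lists the variable bits adjacent to check node c.

    def unsatisfied(c, word, check_bits, checks):
        return (sum(word[v] for v in checks[c]) % 2) != check_bits[c]

    def sequential_error_reduce(word, check_bits, variables, checks):
        word = list(word)
        changed = True
        while changed:                       # repeat until no such variable bit remains
            changed = False
            for v, nbrs in variables.items():
                bad = sum(1 for c in nbrs if unsatisfied(c, word, check_bits, checks))
                if bad > len(nbrs) - bad:    # more unsatisfied than satisfied neighbors
                    word[v] ^= 1             # flip the variable bit
                    changed = True
        return word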
Proof: [Sketch] The average left and right node degrees are independent
of the code length, and the number of unsatisfied checks, which is linear
in the code length, decreases.
We cannot include "Repeat until no such variable bit remains" for the
Simple Parallel Error Reducing Algorithm, since the algorithm may not
halt. The next theorem shows how many repeats are necessary for suc-
cessful error reduction.
which finishes half of the proof of the theorem. If the algorithm does not
halt, then after iterating for rounds for some constant K, we get
variable bits.
Proof: has rate and each check node has check bits, and
B is a graph between left nodes and right nodes.
Hence has check bits. If A is the set of variable bits
in which the input differs from a codeword of then a variable bit
will be corrupt at the end of a decoding round if
1. It receives a flip signal but is not in A, or
2. It does not receive a flip signal but is in A.
It will be convenient to call a check node confused if it sends a flip
signal to a variable bit that is not corrupt, and unhelpful if it contains
a variable bit of A but fails to send a flip signal to that variable bit.
If a check node has at least corrupt variable and check
bits, then it will be confused. On the other hand, since each variable bit
is an input to two check nodes, there can be at most
confused check nodes. Because each of these can send at most flip
signals, at most
variable bits not in A can receive flip signals. By similar analysis, there
can be at most
variable bits.
3. Good Degree Sequences
Lemma 203 If
then
for then
where and are the average left node and right node degrees, respec-
tively.
Proof: The first inequality in the theorem is equivalent to
for then
where
Proof: Since
But
and
sends to
sends to
sends to
For all edges do the following in parallel:
It suffices that this equation does not have a solution in the interval
(0,1] for the decoding to end successfully. In other words, we want
and let
5. Cascaded Codes
In this section we consider three constructions for cascading error
reducing codes with an error-correcting code. All three constructions
share the property that they can be encoded and decoded in linear
time and that they are defined for an infinite family. In particular, the
last construction that we show is able to correct the maximum possible
fraction of errors correctable by any code over a given alphabet size.
Since half the relative minimum distance of the code is the upper bound
on the maximum fraction of errors that can be corrected, 1/4 is the
maximum fraction for binary codes and 1/2 - ε is the maximum fraction for
codes of large alphabet size, for some arbitrarily small positive constant ε.
The first construction is due to Luby et al. [69], who developed it for
erasure codes. We will apply their construction here to error correcting
codes and give a bound on the fraction of errors that can be corrected.
Let each bipartite graph have left nodes and
right nodes. We associate each graph with an error reducing code
that has variable bits and check bits, We also use
an error correcting code C that has variable bits and check
bits.
To encode variable bits, apply to obtain check bits. Next,
use the check bits from as the variable bits for to
linear size circuit and hence the final decoding circuit has size
and depth
As in the previous theorem, the error correction capability for this code
does not depend on the position of errors, and the assumption on the
error reducing code can be relaxed to give the same result.
Combining the results from Theorems 197 and 213, we have proved
the following.
Theorem 215 There exists an infinite family of linear-time encodable
and decodable error-correcting codes from irregular expander graphs.
Through similar analysis, one can strengthen the fraction of errors that
can be corrected if the graphs are regular expanders.
Using the results from Theorems 199 and 214, we can obtain the parallel
version of the above theorem.
If the graphs are regular expanders, one can remove the
condition on and obtain the same results.
Theorem 216 There exists an infinite family of error-correcting codes
that can be encoded by circuits of linear size of logarithmic depth and
decoded by circuits of size of logarithmic depth from irregular
expander graphs with greater than or equal to
Lastly, let us look at the codes due to Guruswami and Indyk [45]. The
codes are defined over a large alphabet and can correct the maximum
possible fraction of errors, which is 1/2 - ε for arbitrarily small ε. We
note that while the best known explicit codes with
large relative minimum distance achieve code rate of the decod-
ing complexity of these codes is at least cubic in the blocklength of the
code. Codes by Guruswami and Indyk achieve large minimum distance
and are linear-time encodable and decodable. In particular, Spielman’s
code just described can correct about fraction of errors with
regular bipartite graphs, while their code can correct about 0.5 fraction
of errors. We note, however, that their codes are only additive and not
linear. In other words, their codes are defined over a larger alphabet but
are only a vector space over GF(2).
The code is very simple to describe, and is defined by a bipartite
graph and an error correcting code The left nodes in the graph
represent the codeword of code and the right nodes represent the
codeword of We shall use the just described code constructed by
Spielman [105] as our and assume that can correct a fraction of
errors for some The codeword of is defined by sending the bits
on the left nodes to their neighbors and for each right node, the value of
it is obtained by concatenating the received bits. So the codeword takes
values in an alphabet of larger size than that of code For example,
if a right node has 3 neighboring left nodes whose values are
respectively, then the value of is
The motivation for such a transformation of a codeword into a new
codeword is to turn a heavily corrupted codeword of the original code into a
less corrupted codeword. This transformation can be efficiently facilitated through the
use of an expander graph as the bipartite graph which will enable the
code to have large minimum distance. Let’s then describe the bipartite
graph, B, used in the code. Let G be a Ramanujan graph of
vertices with that is equal to Take B as the double cover of
G such that is a graph with
In particular, take Code has rate since
has constant rate and the encoding complexity of is that of plus the
number of left nodes times the degree of left nodes which equals
The decoding algorithm for the code is as follows.
Decoding:
For each left node, set the value to the majority of the right neigh-
boring bits.
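A minimal sketch of the symbol-spreading map and of this majority-vote step, under an illustrative graph representation, is the following; the decoding of the outer code between the two steps is omitted.

    # Each right node's symbol is the tuple of its left neighbors' bits; after
    # the outer code is decoded on the right, every left bit is recovered by a
    # majority vote over the copies held by its right neighbors.

    from collections import Counter

    def spread(left_bits, right_nbrs):
        """right_nbrs[r] lists the left nodes adjacent to right node r."""
        return [tuple(left_bits[v] for v in right_nbrs[r]) for r in range(len(right_nbrs))]

    def majority_step(right_symbols, right_nbrs, num_left):
        votes = [Counter() for _ in range(num_left)]
        for r, nbrs in enumerate(right_nbrs):
            for pos, v in enumerate(nbrs):
                votes[v][right_symbols[r][pos]] += 1
        return [votes[v].most_common(1)[0][0] for v in range(num_left)]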
6. Notes
Low-density generator codes were first empirically tested by Cheng
and McEliece [24], where they found that irregular codes perform better
than regular codes using belief propagation decoding. Spielman [105]
analyzed the potential of low-density generator codes through his simple
algorithms and showed that they can reduce the number of errors in
the variable bits. For this reason, he called the codes “error reducing
codes.” Through the recursive use of error reducing codes, he gave
the first explicit construction of a family of asymptotically good linear-
time encodable and decodable error-correcting codes. His codes can also
References

[1] S.M. Aji and R.J. McEliece, “A general algorithm for distributing information
on a graph,” Proc. 1997 IEEE Int. Symp. on Inform. Theory, Ulm, Germany,
July 1997.
[2] S.M. Aji, G.B. Horn and R.J. McEliece, “Iterative decoding on graphs with a
single cycle,” Proc. 1998 IEEE Int. Symp. on Inform. Theory, Boston, USA,
August 1998.
[3] M. Ajtai, J. Komlos and E. Szemeredi, “Deterministic simulation in logspace,”
Proc. 19th Annual ACM Symp. on Theory of Computing, pp. 132-139, 1987.
[4] A. Albanese, J. Blömer, J. Edmonds, M. Luby and M. Sudan, “Priority Encod-
ing Transmission,” IEEE Trans. Inform. Theory, vol. 42, no. 6, pp. 1737-1744,
Nov. 1996.
[5] N. Alon, “Eigenvalues and expanders,” Combinatorica, vol. 6, no. 2, pp. 83-96,
1986.
[6] N. Alon and F.R.K. Chung, “Explicit construction of linear sized tolerant net-
works,” Discr. Math., vol. 72, pp. 15-19, 1988.
[7] N. Alon, J. Bruck, J. Naor, M. Naor and R. Roth, “Construction of asymp-
totically good low-rate error-correcting codes through pseudo-random graphs,”
IEEE Trans. Inform. Theory, vol. 38, pp. 509-516, 1992.
[8] N. Alon, J. Edmonds, and M. Luby, “Linear Time Erasure Codes with Nearly
Optimal Recovery,” Proc. 36th Annual Symp. on Foundations of Computer
Science, pp. 512-519, 1995.
[9] N. Alon and M. Luby, “A Linear Time Erasure-Resilient Code with Nearly
Optimal Recovery,” IEEE Trans. Inform. Theory, vol. 42, no. 6, pp. 1732-1736,
Nov. 1996.
[10] N. Alon and J.H. Spencer, The Probabilistic Method. New York: Wiley, 2000.
[11] L.R. Bahl, J. Cocke, F. Jelinek and J. Raviv, “Optimal decoding of linear codes
for minimizing symbol error rate,” IEEE Trans. Inform. Theory, vol. 20, pp.
284-287, Mar. 1974.
[30] M.C. Davey and D.J.C. MacKay, “Low-Density Parity-Check Codes over
GF(q),” IEEE Commun. Letters, vol. 2., no. 6, June 1998.
[31] D. Divsalar and F. Pollara, “Multiple Turbo Codes for Deep-Space Communi-
cations,” TDA Progress Report 42-121, pp. 66-77, May 15, 1995.
[32] D. Divsalar and F. Pollara, “Turbo Codes for PCS Applications,” Proc. IEEE
Int. Conf. on Communications, Seattle, Washington, June 1995.
[33] D. Divsalar and R.J. McEliece, “On the Design of Generalized Concatenated
Coding Systems with Interleavers,” manuscript, 1998.
[34] S. Dolinar and D. Divsalar, “Weight Distributions for Turbo Codes Using Ran-
dom and Nonrandom Permutations,” TDA Progress Report 42-122, pp. 56-65,
August 15, 1995.
[35] H. El Gamal and A.R. Hammons, Jr, “Analyzing the turbo decoder using the
Gaussian approximation,” IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 671-
686, Feb. 2001.
[36] P. Elias, “Coding for Noisy Channels,” IRE Conv. Record, Part 4, pp. 37 - 47,
1955.
[38] G.D. Forney, Jr., “The forward-backward algorithm,” Proc. 34th Allerton Con-
ference on Communications, Control and Computing, 1996.
[39] B.J. Frey, Graphical Models for Machine Learning and Digital Communication.
The M.I.T. Press, Cambridge, MA, 1998.
[40] R.G. Gallager, “Low-density parity-check codes,” IRE Trans. Inform. Theory,
vol. 8, pp. 21-28, Jan. 1962.
[41] R.G. Gallager, Low-Density Parity-Check Codes. The M.I.T. Press, Cambridge,
MA, 1963.
[42] E.N. Gilbert, “A Comparison of Signaling Alphabets,” Bell Sys. Tech. J., vol.
31, pp. 504-522, 1952.
[44] W.C. Gore, “Transmitting Binary Symbols with Reed-Solomon Codes,” Pro-
ceedings of the Princeton Conference on Information Science and Systems,
Princeton, New Jersey, pp. 495 - 497, 1973.
[45] V. Guruswami and P. Indyk, “Linear-time Codes to Correct a Maximum Pos-
sible Fraction of Errors,” Proc. 39th Allerton Conference on Communications,
Control and Computing, 2001.
[46] V. Guruswami and P. Indyk, “Near-optimal linear-time codes for unique decod-
ing and new list-decodable codes over smaller alphabets,” preprint, 2002.
[47] J. Hagenauer, E. Offer and L. Papke, “Iterative decoding of binary block and
convolutional codes,” IEEE Trans. Inform. Theory, vol. 42, no. 2, pp. 429-445,
Mar. 1996.
[48] C. Heegard and S.B. Wicker, Turbo Coding. Kluwer Academic Press, 1998.
[52] K. A. S. Immink, “RS Codes and the Compact Disc,” in Reed Solomon Codes
and Their Applications, (Stephen Wicker and Vijay Bhargava, eds.), IEEE Press,
1994.
[53] F.V. Jensen, S.L. Lauritzen and K.G. Olesen, “Bayesian updating in recursive
graphical models by local computation,” Computational Statistical Quarterly,
vol. 4, pp. 269-282, 1990.
[54] H. Jin, A. Khandekar and R. McEliece, “Irregular Repeat-Accumulate Codes,”
Proc. 2nd. International Conf. Turbo Codes, Brest, France, pp. 1-8, Sept. 2000.
[55] N. Kahale, “Expander Graphs,” Ph.D. dissertation, M.I.T., 1993.
[61] F.R. Kschischang, B.J. Frey and H.-A. Loeliger, “Factor Graphs and the Sum-
Product Algorithm,” IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 498-519,
Feb. 2001.
[62] J. Lafferty and D.N. Rockmore, “Spectral Techniques for Expander Codes,”
Proc. 29th Annual ACM Symposium on Theory of Computing, pp. 160-167,
1997.
[63] J. Lafferty and D.N. Rockmore, “Codes and Iterative Decoding on Algebraic
Expander Graphs,” Int. Symp. Inform. Theory and Appl., Nov. 2000.
[64] S.L. Lauritzen and D.J. Spiegelhalter, “Local Computation with Probabilities
on Graphical Structures and Their Application to Expert Systems,” Journal of
the Royal Statistical Society, Series B, vol. 50, pp. 157-224, 1988.
[65] S. Le Goff, A. Glavieux and C. Berrou, “Turbo-Codes and High Spectral Effi-
ciency Modulation,” Proc. IEEE Int. Conf. on Communications, New Orleans,
USA, May 1994.
[66] R. Lidl and H. Niederreiter, Finite Fields, Reading, Mass.: Addison Wesley,
1983.
[67] A. Lubotzky, R. Phillips and P. Sarnak, “Ramanujan Graphs,” Combinatorica,
vol. 8, no. 3, pp. 261-277, 1988.
[68] S. Lin and E.J. Weldon, “Further Results on Cyclic Product Codes,” IEEE
Trans. Inform. Theory, vol. IT-16, no. 4, pp. 452-459, July 1970.
[73] D.J.C. MacKay and R.M. Neal, “Good error-correcting codes based on very
sparse matrices,” Cryptography and Coding, Lecture Notes in Computer Science
no. 1025, pp. 100-111, Springer-Verlag, 1995.
[74] D.J.C. MacKay and R.M. Neal, “Near Shannon limit performance of low density
parity check codes,” Electron. Lett., vol. 32, no. 18, pp. 1645-1646, Aug. 1996;
reprinted Electron. Lett., vol. 33, no. 6, pp. 457-458, Mar. 1997.
[75] D.J.C. MacKay, “Good Error-Correcting Codes based on Very Sparse Matrices,”
IEEE Trans. Inform. Theory, vol. 45, pp. 399-431, Mar. 1999.
[76] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error Correcting Codes,
Amsterdam: North Holland, 1977.
[77] G.A. Margulis, “Explicit constructions of concentrators,” Probl. Inform.
Transm., vol. 9, pp. 325-332, 1973.
[78] G.A. Margulis, “Explicit constructions of graphs without short cycles and low
density codes,” Combinatorica, vol. 2, pp. 71-78, 1982.
[79] G.A. Margulis, “Explicit group-theoretical constructions of combinatorial
schemes and their applications to the design of expanders and concentrators,”
Probl. Inform. Transm., vol. 24, pp. 39-46, 1988.
[80] R. J. McEliece, E. R. Rodemich, H. C. Rumsey, Jr. and L. R. Welch, “New Upper
Bounds on the Rate of a Code using the Delsarte-MacWilliams Inequalities,”
IEEE Trans. Inform. Theory, vol. 23, pp. 157-166, 1977.
[81] R. J. McEliece, Finite Fields for Computer Scientists and Engineers, Boston:
Kluwer Academic Publishers, 1987.
[82] R.J. McEliece, E. Rodemich and J.-F. Cheng, “The Turbo Decision Algorithm,”
Proc. 33rd Allerton Conference on Communication, Control and Computing,
1995.
[83] R.J. McEliece, D.J.C. MacKay and J.-F. Cheng, “Turbo Decoding as an In-
stance of Pearl’s ‘Belief Propagation’ Algorithm,” IEEE Journal on Selected
Areas in Commun., vol. 16, pp. 140-152, Feb. 1998.
[84] G. Miller and D. Burshtein, “Bounds on the Maximum-Likelihood Decoding
Error Probability of Low-Density Parity-Check Codes,” IEEE Trans. Inform.
Theory, vol. 47, pp. 2696-2710, Nov. 2001.
[85] R. Motwani and P. Raghavan, Randomized Algorithms. Cambridge University
Press, 1995.
[86] P. Oswald and M.A. Shokrollahi, “Capacity-Achieving Sequences for the Erasure
Channel,” manuscript, 2000.
[87] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible
Inference. Morgan Kaufmann Publishers, Inc., San Mateo, CA, 1988.
[88] W. W. Peterson, “Encoding and Error-Correction Procedures for the Bose-
Chaudhuri Codes,” IRE Transactions on Information Theory, Volume IT-6,
pp. 459 - 470, September 1960.
[89] L.C. Perez, J. Seghers and D.J. Costello, Jr., “A distance spectrum interpreta-
tion of turbo codes,” IEEE Trans. Inform. Theory, vol. 42, pp. 1698-1709, Nov.
1996.
[90] N. Pippenger, “Superconcentrators,” SIAM Journal of Computing, vol. 6, pp.
298-304, 1977.