
FUNDAMENTALS OF CODES, GRAPHS, AND ITERATIVE DECODING

THE KLUWER INTERNATIONAL SERIES IN ENGINEERING AND COMPUTER SCIENCE

Stephen B. Wicker
Cornell University,
Ithaca, NY, U.S.A.

Saejoon Kim
Korea Institute for Advanced Study,
Seoul, Korea

KLUWER ACADEMIC PUBLISHERS


NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW
eBook ISBN: 0-306-47794-7
Print ISBN: 1-4020-7264-3

©2002 Kluwer Academic Publishers


New York, Boston, Dordrecht, London, Moscow

Print ©2003 Kluwer Academic Publishers


Dordrecht

All rights reserved

No part of this eBook may be reproduced or transmitted in any form or by any means, electronic,
mechanical, recording, or otherwise, without written consent from the Publisher

Created in the United States of America

Visit Kluwer Online at: http://kluweronline.com
and Kluwer's eBookstore at: http://ebooks.kluweronline.com
Contents

List of Figures ix
List of Tables xi
Preface xiii
1. DIGITAL COMMUNICATION 1
1. Basics 1
2. Algorithms and Complexity 5
3. Encoding and Decoding 6
4. Bounds 8
5. Overview of the Text 12
2. ABSTRACT ALGEBRA 13
1. Sets and Groups 13
2. Rings, Domains, and Fields 16
3. Vector Spaces and 23
4. Polynomials over Galois Fields 28
5. Frequency Domain Analysis of Polynomials over GF(q) 34
6. Ideals in the Ring 37
3. LINEAR BLOCK CODES 39
1. Basic Structure of Linear Codes 40
2. Repetition and Parity Check Codes 43
3. Hamming Codes 44
4. Reed-Muller Codes 45
5. Cyclic Codes 49
6. Quadratic Residue Codes 50

7. Golay Codes 51
8. BCH and Reed-Solomon Codes 53
9. Product Codes 58
4. CONVOLUTIONAL AND CONCATENATED CODES 61
1. Convolutional Encoders 62
2. Analysis of Component Codes 65
3. Concatenated Codes 68
4. Analysis of Parallel Concatenated Codes 71
5. ELEMENTS OF GRAPH THEORY 79
1. Introduction 80
2. Martingales 83
3. Expansion 86
6. ALGORITHMS ON GRAPHS 93
1. Probability Models and Bayesian Networks 94
2. Belief Propagation Algorithm 99
3. Junction Tree Propagation Algorithm 104
4. Message Passing and Error Control Decoding 109
5. Message Passing in Loops 115
7. TURBO DECODING 121
1. Turbo Decoding 121
2. Parallel Decoding 126
3. Notes 132
8. LOW-DENSITY PARITY-CHECK CODES 137
1. Basic Properties 137
2. Simple Decoding Algorithms 143
3. Explicit Construction 147
4. Gallager’s Decoding Algorithms 151
5. Belief Propagation Decoding 162
6. Notes 172
9. LOW-DENSITY GENERATOR CODES 177
1. Introduction 177
2. Decoding Analyses 181
3. Good Degree Sequences 188

4. Irregular Repeat-Accumulate Codes 196


5. Cascaded Codes 200
6. Notes 207
References 209

Index 217
List of Figures

4.1 Non-Recursive Rate-1/2 Encoders:


(a) Systematic, (b) Nonsystematic 62
4.2 Non-Recursive Rate-2/3 Encoder 62
4.3 Recursive Rate-1/2 Encoders: (a)
Systematic, (b) Nonsystematic 64
4.4 The Serial Concatenated CCSDS Telemetry Standard 69
4.5 A Parallel Concatenated Encoder 71
5.1 An Undirected Graph 80
5.2 A Directed Graph 81
5.3 Edge-Vertex Incidence Graph 82
6.1 A Directed Probability Graph and its Moral Graph 96
6.2 Perfect Directed and Undirected Probability Graphs 96
6.3 Directed Graphs: (a) Unconnected, (b) Connected
Cyclic, (c)Connected Acyclic (DAG) 97
6.4 DAG’s: (a) Multiply-Connected, (b) Simple Tree,
(c) Polytree 98
6.5 Cross-Section of a Singly-Connected Bayesian Network 101
6.6 Constructing a Junction Tree 106
6.7 Cross-Section of a Junction Tree 107
6.8 A Block Code Graph 109
6.9 Convolutional Code Graphs 112
6.10 Trellis Graphs 114
6.11 A Loopy Graph and an Equivalent Tree 116
6.12 A Loopy Graph and its Equivalent Tree of Depth 3 117
6.13 A Single Loop Graph 118

7.1 The Turbo Decoding Problem 121


7.2 Turbo Decoder 123
7.3 Bayesian Network for a Parallel Concatenated Code 124
7.4 Parallel Mode of Decoding 127
7.5 Extended Parallel Modes of Decoding 130
7.6 Performance of Turbo Decoding in Serial Mode 133
7.7 Performance of Turbo Decoding in Parallel Mode 133
7.8 Performance of Turbo Decoding in Extended Par-
allel One Mode 135
7.9 Performance of Turbo Decoding in Extended Par-
allel Two Mode 135
8.1 Bipartite Graph 139
8.2 Unwrapped Bipartite Graph 155
8.3 Percentage of Successes for Codes 1 and 2 Based
on 2000 Trials 163
8.4 Percentage of Successes for Codes 3 and 4 Based
on 2000 Trials 163
8.5 Bayesian Network Representation of a Low-Density
Parity-Check Code 164
8.6 Best Known and Length Codes 170
8.7 Codes for Various Code Lengths 171
8.8 Low-Density Parity-Check Codes over BSC(p) 171
8.9 Low-Density Parity-Check Codes over Binary Gaus-
sian Channel 173
9.1 Bipartite Graph Representing a Code 179
9.2 Bipartite Graph 179
9.3 Bipartite Graph Representation of a (7,4)-Hamming
Code 180
9.4 (2, 3)-Regular Graph 189
9.5 Tree-Like Neighborhood of Depth-2 Graph with
AND-OR Tree 190
9.6 Cascaded Code 201
9.7 Spielman’s Cascaded Code 203
List of Tables

2.1 Minimal Polynomials of the Elements in GF(8)


with Respect to GF(2) 32
2.2 Transforms of the Minimal Polynomials of the El-
ements in GF(8) with Respect to GF(2) 37
3.1 Minimal Polynomials of the Nonzero Elements in
GF(32) with Respect to GF(2) 56
4.1 The Best Rate 1/2 Recursive Systematic Convolu-
tional Component Codes for Rate 1/3 PCC’s with
Interleaver Size = 100 78
8.1 Threshold by Gallager’s Algorithm 1 for Various
Regular Codes 157
8.2 Threshold by Gallager’s Algorithm 2 for Various
Regular Codes 158
8.3 Degree Sequence of Some Codes 162
8.4 Threshold and Theoretical Limit for 169
8.5 Threshold and Theoretical Limit for the
AWGN Channel 169
8.6 Good Degree Sequences 172
9.1 Codes 194
9.2 Right Regular Codes for Rates Close to and 196
9.3 Performance of Right-Regular Irregular Repeat-Accumulate
codes 200
We dedicate this book, with love and thanks, to our
parents:

Louise Zeller Wicker,


Richard Fenton Wicker, Jr.,
Jung-ja Choi Kim,
and
Gu-ung Kim.
Preface

As with all scientific and engineering endeavors, the fifty-year history


of error control coding can best be characterized as a mass of incremental
research interrupted by occasional great leaps forward. The punctuated
equilibrium model1, developed by Niles Eldredge and the late Stephen
Jay Gould to describe the process of natural evolution, is an equally apt
model for the development of error control coding. Eldredge and Gould
argued that the fossil record does not show the gradual transitions over
millions of years predicted by the old models of speciation theory, and that instead, the evolution of
species was best characterized by the sudden appearance of new species,
occasional eruptions in what would otherwise be an unbroken landscape
of species stability. So we have found it in coding theory, but with one
significant difference. Coding theorists have always had a well-defined
goal – the performance bound set by Shannon’s Noisy Channel Coding
Theorem2 – as well as useful metrics for assessing our progress toward
that goal – signal to noise ratios, bit error rates, and computational com-
plexity. Given the goal and metrics, we can safely state in the Summer
of 2002 that error control coding has entered a fundamentally new and
different stage in its development.
Looking back, several global tendencies can be seen to have been most
helpful in punctuating the equilibrium and getting us where we are now.
The most obvious lies in the exploitation of structure – the interpretation
of error control codes in light of combinatorial, algebraic, and probabilis-
tic models has allowed for the application of increasingly powerful design

1
N. Eldredge and S. J. Gould, "Punctuated Equilibria: An Alternative to Phyletic Gradual-
ism,” in Models in Paleobiology, T. J. M. Schopf (ed), San Francisco: Freeman Cooper, pp.
82 - 115, 1972.
2
C. E. Shannon, “A Mathematical Theory of Communication,” Bell System Technical Jour-
nal, Volume 27, pp. 379 - 423 and pp. 623 - 656, 1948.

tools. Slightly less obvious has been our dependence on and exploitation
of other fields. The recognition of structure has allowed for the identi-
fication of connections to other fields of mathematics and engineering,
and the subsequent looting of the other fields’ respective toolboxes. Fi-
nally, coding theorists have shown great prescience over the years, a
prescience so extreme that we often fail to appreciate our colleagues’ re-
sults for several decades. Fortunately many of our colleagues have very
good memories, and we can thus incorporate and build on results that
were initially given short shrift.
To put this current volume in context, a quick review of that past
fifty years will prove helpful.
The first significant error control codes – those due to Hamming3
and Golay4 – were based on linear algebra and some relatively simple
combinatorial techniques. The early error control codes were developed
as linear block codes – subspaces of vector spaces over finite fields. These
subspaces have dual spaces, whose bases can be interpreted as explicit
parity relations among the coordinates of the codewords that constitute
the code. The creation and exploitation of parity relations is a major
theme in this book, and the creative and intelligent inculcation of parity
relations is clearly the key to the recent developments in error control.
In the 1950’s, however, the principal metric for the quality of an error
control code was minimum distance, with the Hamming bound serving as
the performance limit. This conflation of the sphere packing and error
control problems was limiting, and the discovery of all of the perfect
codes by 1950 (a result unknown at the time5) left little room for new
results through the combinatorial approach.
Reed took the first step away from the combinatorial approach with
his recognition that Muller’s application of Boolean algebra to switch-
ing circuits could be re-interpreted as a construction technique for error
control codes. Reed saw that by viewing codewords as truth tables of
Boolean functions, various results in Euclidean geometry and Boolean
algebra could be used as design tools for error control codes6. The re-
sulting Reed-Muller codes were a significant step beyond the earlier work

3
R. W. Hamming, “Error Detecting and Error Correcting Codes”, Bell System Technical
Journal, Volume 29, pp. 147 – 160, 1950.
4
M. J. E. Golay, “Notes on Digital Coding,” Proceedings of the IRE, Volume 37, pg. 657,
June 1949.
5
A. Tietäväinen, “On the Nonexistence of Perfect Codes over Finite Fields,” SIAM Journal
of Applied Mathematics, Volume 24, pp. 88 - 96, 1973.
6
I. S. Reed, “A Class of Multiple-Error-Correcting Codes and a Decoding Scheme,” IEEE
Transactions on Information Theory, Volume 4, pp. 38 – 49, September 1954. See also D. E.
Muller, “Application of Boolean Algebra to Switching Circuit Design,” IEEE Transactions
on Computers, Volume 3, pp. 6 - 12, September 1954.

of Hamming and Golay, but remained relatively weak in comparison to


what was to come.
The next major step beyond the combinatorial approach was made
by Reed and Solomon in 19607. By interpreting the coordinates of code-
words as the coefficients of polynomials, they opened up a world of
structure that allowed for far more powerful and elegant codes. The
concurrent development of the theory of cyclic codes by Prange, Bose,
Ray-Chaudhuri, Hocquenghem and others led to an interpretation of
Reed-Solomon, BCH and in general all cyclic codes as rings of polyno-
mials over finite fields8. This led to several deep results in algebraic
coding theory in the 1960’s, culminating in Berlekamp’s decoding algo-
rithm for Reed-Solomon codes in 19679.
At virtually the same time that Reed was trying to move beyond
Reed-Muller codes, Elias was focusing on the use of shift registers
for creating parity relations in an information stream10. The resulting
convolutional encoders were a significant advance in that they consti-
tuted a means for recursively introducing parity constraints across an
arbitrarily large information stream, and were thus the first significant
step toward the codes that would provide the performance promised by
Shannon’s work. The subsequent development of sequential decoders by
Fano11 and others was even more promising, in hindsight, in that it con-
stituted a suboptimal, yet efficient approach to decoding convolutional
codes with extremely long constraint lengths.
The sequential decoding of convolutional codes gave way to Viterbi
decoding12 in the late 1960’s. The “optimal,” maximum-likelihood ap-
proach to decoding represented by the Viterbi algorithm works extremely

7
I. S. Reed and G. Solomon, “Polynomial Codes over Certain Finite Fields,” SIAM Journal
on Applied Mathematics, Volume 8, pp.300 – 304, 1960. See also S. B. Wicker and V. K.
Bhargava, (editors) , Reed-Solomon Codes and Their Applications, Piscataway: IEEE Press,
1994.
8
See, for example, E. Prange, “Some Cyclic Error-Correcting Codes with Simple Decoding
Algorithms,” Air Force Cambridge Research Center-TN-58-156, Cambridge, Mass., April,
1958, R. C. Bose and D. K. Ray-Chaudhuri, “On a Class of Error Correcting Binary Group
Codes,” Information and Control, Volume 3, pp. 68 - 79, March 1960, A. Hocquenghem,
“Codes Correcteurs d’Erreurs,” Chiffres, Volume 2, pp. 147 - 156, 1959, and D. Gorenstein
and N. Zierler, “A Class of Error Correcting Codes in Symbols,” Journal of the Society
of Industrial and Applied Mathematics, Volume 9, pp. 207 - 214, June 1961.
9
E. Berlekamp, “Nonbinary BCH Decoding,” presented at the 1967 International Symposium
on Information Theory, San Remo, Italy. See also E. R. Berlekamp, Algebraic Coding Theory,
New York: McGraw-Hill, 1968. (Revised edition, Laguna Hills: Aegean Park Press, 1984.)
10
P. Elias, “Coding for Noisy Channels,” IRE Conv. Record, Part 4, pp. 37 – 47, 1955.
11
R. M. Fano, “A Heuristic Discussion of Probabilistic Decoding,” IEEE Transactions on
Information Theory, IT-9, pp. 64 - 74, April 1963.
12
A. J. Viterbi, “Error Bounds for Convolutional Codes and an Asymptotically Optimum
Decoding Algorithm,” IEEE Transactions on Information Theory, IT-13, pp. 260 - 269,
April 1967.

well for many applications, including deep space telecommunications13,


and has broader applications in operations research14. As the complexity
of maximum-likelihood decoders increases exponentially with the con-
straint length of the codes in use (and thus with the extent of the parity
relations across the information stream), the performance of convolu-
tional encoding with Viterbi decoding is limited by the amount of com-
putational power available to the decoder. This is not to say, however,
that efforts were not made. Increasingly immense Viterbi decoders were
developed for deep space telecommunications through the early 1990’s15,
ending only when the advent of turbo error control offered a significantly
better, less complex alternative.
As convolutional encoders impart parity relations through the use of a
delay line, the encoding process can be described in terms of a sequence
of transitions from one encoder state to another. It follows that the
resulting code can be represented as a trellis16. In 1974 Bahl, Cocke,
Jelinek, and Raviv showed that any linear block code can be represented
as a trellis, making them amenable to “optimal” soft-decision decoding17.
Bahl et al. also showed that a particularly powerful algorithm for solving
the inference problem in hidden Markov models can be applied to the
decoding of any code that can be represented as a trellis. This algorithm,
known as the Baum-Welch (BW) algorithm in the statistics community,
was developed in a classified research environment in the early 1960’s.
It was described in a series of articles18 in the late 1960’s, and was
subsequently applied and duly referenced by Bahl et al. in 1974. The
BW algorithm was a progenitor of the class of Expectation-Maximization

13
S. B. Wicker, “Deep Space Applications,” Handbook of Coding Theory, (Vera Pless and
William Cary Huffman, ed.), Amsterdam: Elsevier, 1998.
14
The structure of the Viterbi algorithm has its roots in earlier optimization algorithms.
See, for example, G. J. Minty, “A Comment on the Shortest Route Problem,” Operations
Research, Volume 5, p. 724, October 1957.
15
O. Collins, “The Subtleties and Intricacies of Building a Constraint Length 15 Convo-
lutional Decoder,” IEEE Transactions on Communications, Volume 40, Number 12, pp.
1810–1819, December 1992. See also S. B. Wicker, “Deep Space Applications,” Handbook of
Coding Theory, (Vera Pless and William Cary Huffman, ed.), Amsterdam: Elsevier, 1998.
16
See, for example, G. D. Forney, Jr., “Convolutional Codes I: Algebraic Structure,” IEEE
Transactions on Information Theory, IT-16, pp. 720 – 738, November 1970 and G. D. Forney,
Jr., “Convolutional Codes II: Maximum Likelihood Decoding,” Information and Control,
Volume 25, pp. 222 – 266, July 1974.
17
L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for
minimizing symbol error rate,” IEEE Transactions on Information Theory, IT-20:284–287,
1974.
18
L.E. Baum and T. Petrie, “Probabilistic functions of finite state Markov chains,” Ann.
Math. Stat. 37:1554–1563, 1966, L.E. Baum and G.R. Sell, Growth transformations for
functions on manifolds,” Pac. J. Math. 27(2):211–227, 1968, and L. E. Baum, T. Petrie,
G. Soules and N. Weiss, “A maximization technique occurring in the statistical analysis of
probabilistic functions of Markov chains” Ann. Math. Stat. 41:164–171, 1970.

(EM) algorithms, and remains a topic of research in statistics. When


applied to convolutional codes, the BW algorithm19 provides a means for
iteratively generating maximum likelihood estimates of the information
represented by the received codeword. This stands in contrast to the
Viterbi algorithm, which provides a maximum likelihood estimate of
the transmitted codeword, not the information used to generate that
codeword. This distinction would prove crucial in the performance of
turbo decoders.
The equilibrium that was coding theory was punctuated in the 1990’s
by two interconnected events. Over the past ten years these events
have launched the field into a significantly different stage in its devel-
opment. The first event was the invention of turbo error control by
Berrou, Glavieux, and Thitimajshima20. What is now called “Turbo
Coding” consists of two discrete elements: parallel concatenated encod-
ing and iterative (turbo) decoding. Parallel concatenation is a clever
means for generating very complicated codes using several “component”
encoders. Turbo decoding exploits this component encoder substructure
by using separate, relatively simple BW decoders to develop separate es-
timates of the transmitted information. A consensus is then obtained,
when possible, by iterating between the estimates21. Turbo error control
brought coding theory within a few tenths of a decibel of the Shannon
limit. The only problem was that it was not at all clear how or why it
worked.
The second critical event was the recognition by the teams of McEliece,
MacKay, and Cheng and Kschischang and Frey that turbo decoding was
actually an instance of belief propagation in a graphical model22. This
was a critical discovery in that it freed research in coding theory from
the specific, and in places ad hoc elements of turbo coding, and brought
the focus to bear on the more general problems of algorithms on graphs.
It is now clear that the best error control systems are to be developed
through the systematic, recursive generation of parity connections and

19
The portion of the BW algorithm relevant to the decoding of convolutional codes is often
referred to in the coding community as the BCJR algorithm.
20
C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error-correcting coding
and decoding: Turbo Codes,” Proceedings of the 1993 International Conference on Commu-
nications, 1064–1070, 1993.
21
See, for example, C. Heegard, and S. B. Wicker, Turbo Coding, Boston: Kluwer Academic
Press, 1999.
22
R. J. McEliece, D. J. C. MacKay and J. -F. Cheng, “Turbo Decoding as an Instance
of Pearl’s ‘Belief Propagation’ Algorithm,” IEEE Journal on Selected Areas in Commun.,
vol. 16, pp. 140-152, Feb. 1998 and F.R. Kschischang and B.J. Frey, “Iterative Decoding
of Compound Codes by Probability Propagation in Graphical Models,” IEEE Journal on
Selected Areas in Commun., vol. 16, pp. 219-230, Feb. 1998.

iterative, suboptimal decoding that reduces complexity by exploiting re-


peated local structure across the code. The key tools are graph theory
and probabilistic methods based on graphs.
With these ideas firmly in place, the systematic despoiling of graph
theory, Bayesian belief propagation, and coding theory’s own deep archives
began. The forty-year-old work of Gallager on low density parity check
codes23 was finally recognized for being a deeply prophetic work. In
his 1961 Ph.D. thesis, Gallager had developed techniques for recursively
generating very long codes, and then proposed several suboptimal tech-
niques for decoding these codes. Given a graph-theoretic interpretation,
this was exactly the direction that needed to be taken to realize the
promise of Shannon. More recent work by Tanner24 was also dusted off
and recognized as critical to the construction of good long codes with
local structure that lends itself to iterative decoding.
Sipser and Spielman showed a quick appreciation for the work of Gal-
lager and Tanner, extending it to a class of low-density parity-check
codes called expander codes25 in 1996. MacKay then showed in 199926
that low-density parity-check codes can achieve the Shannon limit when
decoded using a maximum-likelihood decoding algorithm. It was then
only a matter of time, with Davey, MacKay, Luby, Mitzenmacher, Shokrol-
lahi, Spielman, Richardson, Urbanke, and others trading results27 in a
last dash to the Shannon limit.
Fifty years of learning how to design good codes can now be reduced
to a single sentence: good codes have high degrees of local connectiv-
ity, but must have simple structural descriptions to facilitate iterative
decoding. This book is an explanation of how to introduce local con-
nectivity, and how to exploit simple structural descriptions. Chapter
1 provides an overview of Shannon theory and the basic tools of com-
plexity theory, communication theory, and bounds on code construction.
Chapters 2 – 4 provide an overview of “classical” error control coding,

23
R.G. Gallager, Low-Density Parity-Check Codes. The M.I.T. Press, Cambridge, MA, 1963.
24
R. M. Tanner, “A recursive approach to low complexity codes,” IEEE Trans. Inform.
Theory, vol. IT-27, pp. 533-547, Sept. 1981.
25
M. Sipser and D.A. Spielman, “Expander Codes,” IEEE Trans. Inform. Theory, vol. IT-42,
pp. 1710-1722, Nov. 1996.
26
D.J.C. MacKay, “Good Error-Correcting Codes based on Very Sparse Matrices,” IEEE
Trans. Inform. Theory, vol. 45, pp. 399-431, Mar. 1999.
27
See, for example, M.G. Luby, M. Mitzenmacher, M.A. Shokrollahi and D.A. Spielman,
“Improved Low-Density Parity-Check Codes Using Irregular Graphs and Belief Propagation,”
Proc. 1998 IEEE Int. Symp. on Inform. Theory, Boston, USA, August 16-21, 1998, M.C.
Davey and D.J.C. MacKay, “Low-Density Parity Check Codes over GF(q),” IEEE Commun.
Letters, vol. 2, no. 6, June 1998, and T. Richardson, M.A. Shokrollahi and R. Urbanke,
“Design of Provably Good Low-Density Parity-Check Codes,” submitted to IEEE Trans.
Inform. Theory.

with an introduction to abstract algebra, and block and convolutional


codes. Chapters 5 – 9 then proceed to systematically develop the key
research results of the 1990’s and early 2000’s with an introduction to
graph theory, followed by chapters on algorithms on graphs, turbo error
control, low density parity check codes, and low density generator codes.
This book is intended as a synthesis of recent research results with
a recognition of where these results fit into the bigger picture of error
control coding.
The authors have been very fortunate to have the active cooperation
of several of those who have made key contributions in the last few years,
including Alexander Barg, Sae-Young Chung, Venkat Guruswami, Amin
Shokrollahi, and Yair Weiss. Special thanks go to Alexander Barg and
Amin Shokrollahi for carefully reading early versions of the book and
providing us with invaluable help and suggestions.
The authors would like to thank the National Science Foundation and
the Defense Advanced Research Projects Agency of the United States, as
well as Samsung Electronics Co. and the Korea Institute for Advanced
Study for their long term support for our efforts.
The authors extend their thanks to their editor, Jennifer Evans, for
her support, patience, and good humor.
The first author would also like to thank Toby Berger, Terrence Fine,
Robert Thomas, and James Thorp for their able mentoring over the
past few years. Their efforts have been greatly appreciated, though they
should not be held responsible for the subsequent results.
The first author would like to extend warm thanks to the twenty-five
doctoral students he has supervised over the past fifteen years. He is
grateful for the intellectual stimulation that they have provided, and for
their boundless energy, dedication, and patience. And of course, the first
author is very grateful to have had the opportunity to “supervise” the
research of the second author, Dr. Saejoon Kim. Graduate students are
certainly one, if not the only unmixed blessing of an academic career.
Finally, the authors are forever indebted to their parents for gifts too
numerous to mention. We wish that we could have dedicated a more
readable work to them, but this is the best we could do. Thank you.
Chapter 1

DIGITAL COMMUNICATION

This book describes an approach to error control coding for digital


communication that provides performance that is extremely close to that
promised by Shannon in his seminal 1948 paper [100]. We begin in this
chapter with a brief discussion of Shannon’s work and the context it pro-
vides for the general problem of digital communication in the presence
of noise. We proceed to a consideration of algorithms and complexity,
with a focus on the general classes of decoding algorithms. We then re-
view the classical bounds on the nature and performance of error control
codes. The chapter concludes with an overview of the rest of the book.

1. Basics
A discrete channel is a system that consists of input symbols from an
alphabet X, output symbols from an alphabet Y, and an input-output
relation that is expressed in terms of a probability function p(y|x),
x ∈ X and y ∈ Y. We assume that the cardinality of the input alphabet is
q, or equivalently, that the input is q-ary. The selection of a channel
input symbol x induces a probability distribution on the channel output
through p(y|x), where y ∈ Y. The transmission of data over a
discrete channel typically consists of n distinct channel transmissions
corresponding to a word in X^n. The output of the channel is a
(possibly) distorted form of that word in Y^n. A discrete channel
is memoryless if the probability distribution describing the output depends
only on the current input and is conditionally independent of previous
inputs and outputs. We assume throughout the book that our channel of
interest is a discrete memoryless channel.
The general problem of coded digital communication over a noisy
channel can be set up as follows. Assume that there are M possible
messages to be transmitted. We further assume that each message is
represented as a word in X^k, where M = q^k. The M mes-
sages are encoded by mapping them onto a set of M words in X^n.
We will consider various design issues surrounding this map-
ping throughout this book, but for now we adopt the intuitive notion of
selecting a mapping of the M words into X^n so that the words in the
associated set of M words (the code C) are well-separated. As we will
see, this separation gives the decoder the ability to recover the originally
transmitted word, even when the received version of that word has been
distorted by noise. We will consider this in more detail by defining a
measure of “closeness” for two words in a given space.
Definition 1 The Hamming distance d(x, y) is the num-
ber of coordinates in which x and y differ.
Definition 2 An (n, M, d) code over the alphabet X is a set of M words
in X^n such that d(x, y) ≥ d for all distinct codewords x and y, and d is
the largest number with this property.
Definition 3 The parameter d for an (n, M, d) code is the minimum
distance of the code.
In the literature and throughout this book, an (n, M, d) code is often
referred to simply as an (n, M) code. n, the number of coordinates per
word, is often referred to as the length of the code, while the M words
themselves are referred to as codewords. The difference n − log_q M is
often referred to as the redundancy of the code, while the ratio (log_q M)/n
is referred to as the rate of the code.
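These definitions translate directly into a few lines of code. The sketch below is our own illustration (the small repetition code used as the test case is our choice, not an example from the text): it computes the Hamming distance of Definition 1 and the minimum distance of a code by exhaustive pairwise comparison.

```python
from itertools import combinations

def hamming_distance(x, y):
    """Number of coordinates in which the words x and y differ."""
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y))

def minimum_distance(code):
    """Smallest pairwise Hamming distance d of an (n, M, d) code."""
    return min(hamming_distance(x, y) for x, y in combinations(code, 2))

# A binary (3, 2, 3) repetition code: n = 3, M = 2, d = 3, rate = 1/3.
code = ["000", "111"]
print(hamming_distance("010", "111"))  # 2
print(minimum_distance(code))          # 3
```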
Decoding is the process of examining the received word (the output
of the channel) and determining either the codeword that was most
likely to have been transmitted or directly determining the most likely
information. We will assume for now that we are interested in identifying
the most likely codeword to have been transmitted. Note that if d ≥ 2t + 1
for some t, then one can draw a sphere of radius t
around each codeword without having any of the spheres touch. If we
assume that a small number of errors is more likely than a large number,
then any received word that falls within a sphere should be decoded to
the respective codeword at the center of the sphere.
We say the rate R is achievable if there exists a sequence of codes of cardinality 2^{nR}
such that the maximum probability of error goes to 0 as n → ∞. The
maximum probability of error is defined as the maximum probability
over all codewords of picking a wrong codeword as that which was orig-
inally transmitted. Then the natural question arises. What is the max-
imum rate achievable over a given channel? The answer is provided

by Shannon’s Noisy Channel Coding Theorem [100]. To present the


theorem, we must first introduce the concept of entropy, a measure of
uncertainty for the values assumed by random variables.
Definition 4 The entropy of a discrete random variable X is

H(X) = − Σ_x p(x) log p(x).

We will sometimes use H(p) to denote the entropy of X that has distri-
bution p. The joint entropy H(X, Y) of two discrete random variables
X and Y and the conditional entropy H(Y|X) of Y given X are defined
as

H(X, Y) = − Σ_x Σ_y p(x, y) log p(x, y)  and  H(Y|X) = − Σ_x Σ_y p(x, y) log p(y|x),

respectively. After some calculations, one can see that
H(X, Y) = H(X) + H(Y|X). This is intuitively clear since
the combined uncertainty of X and Y is the sum of the uncertainty of
X and the additional uncertainty contributed by Y under the assumption
that we already know X. H(Y|X) can be thought of as the uncertainty
that remains about Y when we are told the value of X.
Definition 5 The mutual information between two discrete random vari-
ables X and Y, I(X; Y), is

I(X; Y) = Σ_x Σ_y p(x, y) log [ p(x, y) / (p(x) p(y)) ].

Note that I(X; Y) = H(X) − H(X|Y) = H(Y) − H(Y|X). In the setting


of uncertainty, the mutual information is the decrease in the uncertainty
of one random variable due to knowledge of the other.
If we assume a binary channel (i.e. that X and Y take values in the set {0,1}),
then the maximum value of H(X) is 1, with the units expressed in bits.
If there is no noise on the channel, then I(X; Y) = 1 – no information
is lost during transmission. If there is noise, then the corresponding
reduction in mutual information reflects the loss of information during
transmission. If a code is properly selected, this loss of information can
be ameliorated, or even completely negated. Proceeding intuitively, if
we use the channel n times to transmit binary data, then there are about
2^{nI(X;Y)} words that can be distinguished by the receiver.

Definition 6 The channel capacity C of a discrete memoryless channel
is

C = max_{p(x)} I(X; Y).
We are now ready to state Shannon’s Noisy Channel Coding Theorem.


Theorem 7 (Noisy Channel Coding Theorem) All rates R < C
are achievable. Specifically, there exist codes of cardinality 2^{nR}
such that the maximum probability of error goes to 0 as n → ∞ if and only
if R < C.
Proof: See [28].
In general, the expression for channel capacity is not available in a nice
closed form. For some simple channels, however, it is readily derived.

Example 8 (Binary Erasure Channel (BEC)) In this channel in-


put bits are erased (lost) with probability α and received correctly with
probability 1 − α. Then I(X; Y) = H(X) − H(X|Y) = (1 − α)H(X).
Calculating for the capacity of the BEC, C = max_{p(x)} I(X; Y) = 1 − α.
Observe that since inputs are correctly received at the output with prob-
ability 1 − α, the capacity 1 − α for the BEC is somewhat intuitive.

Example 9 (Binary Symmetric Channel (BSC)) This is the chan-


nel where input bits are received incorrectly with probability p and received
correctly with probability 1 − p. Then I(X; Y) = H(Y) − H(Y|X) = H(Y) − H(p) ≤ 1 − H(p),
where H(p) = −p log p − (1 − p) log(1 − p) is the binary entropy function.
Hence C = 1 − H(p).
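As a quick numerical check of the two examples, the following sketch (ours, for illustration only) evaluates the binary entropy function and the capacities C = 1 − α of the BEC and C = 1 − H(p) of the BSC.

```python
import math

def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bec_capacity(alpha):
    """Capacity of the binary erasure channel with erasure probability alpha."""
    return 1.0 - alpha

def bsc_capacity(p):
    """Capacity of the binary symmetric channel with crossover probability p."""
    return 1.0 - binary_entropy(p)

print(bec_capacity(0.1))   # 0.9
print(bsc_capacity(0.11))  # ~0.5 bits per channel use
```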

2. Algorithms and Complexity


In this section, we briefly discuss the general complexity of algorithms.
In particular, we focus on algorithms for solving problems that arise
in the analysis and use of error control codes. We use the following
notations for describing asymptotic behavior of a function.
f(n) = O(g(n)) if there exist positive constants c and n0 such that f(n) ≤ c g(n) for all n ≥ n0.
f(n) = Ω(g(n)) if there exist positive constants c and n0 such that f(n) ≥ c g(n) for all n ≥ n0.
f(n) = Θ(g(n)) if there exist positive constants c1, c2, and n0 such that c1 g(n) ≤ f(n) ≤ c2 g(n) for all n ≥ n0.
for some function For example, is asymptotically tight
however is not. For this, we have and
Hence if then
An algorithm runs in polynomial time, or has polynomial time com-
plexity, if it halts within time O(n^c) for some constant c, where n is the length of the input.
Equivalently, the algorithm is O(n^c) for some c. Similarly, an algorithm
has exponential running time or complexity if it halts after
time O(2^{p(n)}), where p(n) is some polynomial in the input length n.
We will now consider the somewhat more complicated class of nonde-
terministic polynomial time algorithms, or the class NP. The class NP
is the set of problems that can be solved by a nondeterministic algo-
rithm whose running time is bounded by a polynomial in the length of
the input. A nondeterministic algorithm is an algorithm that replicates
itself at each branching point in the problem, creating a copy for each
potential path to be followed. These copies continue to run, replicating
themselves at subsequent branching points, until one of the copies finds
a solution or the space of potential solutions is exhausted. The number
of copies generated by the nondeterministic algorithm may grow expo-
nentially, but the individual copies run in polynomial time, hence the
name of the class.
A more formal definition of NP proceeds as follows. Given a finite
alphabet Σ, let Σ* denote the set of all possible strings over Σ. A lan-
guage L is a subset of Σ*, and the language recognition problem is deciding
whether a given element of Σ* belongs to L. The complexity class P
consists of all languages L such that a solution to the problem can be found and ver-
ified by a polynomial time algorithm in the length of the input. The complexity
class NP consists of all languages L such that a solution to the problem can be veri-
fied by a polynomial time algorithm in the length of the input. Clearly P ⊆ NP;
however, it is not known whether equality holds. It is widely believed
that equality does not hold.
A polynomial reduction from a language L1 to a language L2 is a
function f such that f can be computed in polynomial time and, for all
x, x ∈ L1 if and only if f(x) ∈ L2. A language L is NP-hard if
for all L' ∈ NP there exists a polynomial reduction from L' to L. So
while problems considered in complexity classes are decision problems,
NP-hard problems need not be decision problems. A language is NP-
complete if it is in NP and it is NP-hard. Hence NP-complete problems
are the “hardest” problems in NP since if an NP-complete problem can
be solved in polynomial time, then all NP problems can be solved in
polynomial time.
There are several general models for computation to be considered
for the implementation of encoding and decoding algorithms. Formal
descriptions of computation are typically made within the context of
the Turing machine. Sequential computations can also be considered
within the context of the RAM (random access machine) model. The
RAM model has two basic types, namely the unit or uniform cost model
and the logarithmic cost model. For parallel computations, we formalize
the complexity of algorithms using a logical circuit model.
In the RAM model, it is assumed that we have a machine that can
perform a few basic operations, including arithmetic, branching, memory
operations, and register transfer. In the unit-cost RAM model, each
operation is performed in one unit of time independent of the length
of the operand. In the logarithmic cost RAM model, each operation is
performed in units of time that are proportional to the length of the
operand. We shall assume throughout this book that for sequential
computations, the unit-cost model is used.
The logical circuit model is based on a tree with nodes representing
gates. Each gate has two inputs and an output that is a Boolean function
of the two inputs. The complexity of parallel computation is measured
in terms of the size and the depth of the logical circuit – the number of
gates and the length of the longest path from an input to an output,
respectively. We will assume throughout this book that for parallel
computations, the logical circuit model is used.

3. Encoding and Decoding


There are three basic elements to the design of an error control system:
code construction, code encoding, and code decoding. Code construction
refers to the design of a code that satisfies certain code parameters. Code
encoding refers to the mapping of a message to be conveyed onto a code-
word. Code decoding refers to intelligently selecting, given a received
word, one of the possible codewords as that which was transmitted.
Specifically, by encoding a message, we mean representing the message
as a codeword by some encoding function and by decoding a received
word, we mean reconstructing the codeword from the received word or

the noise-corrupted version of the codeword by some decoding function


where

Here the usage of for the domain of is made only for notational
convenience and is irrelevant to our discussion, and
If the channel alphabet is the same as the code alphabet then
the decoding problem reduces to what is called hard decision decoding.
In this case the error correcting code provides error control capability at the
receiver through redundancy; not all patterns in X^n are valid, so the
receiver is able to detect changes in the transmitted symbol sequence
caused by channel noise when such changes result in invalid patterns.
The receiver may also be able to map a received sequence to a codeword c.
If c is the transmitted codeword, then we have “corrected” the errors
caused by the channel. Otherwise, a decoding error has occurred.
There are many types of hard decision decoding functions Several
are listed below.
Definition 10 (Nearest-Codeword Decoding) Nearest-codeword de-
coding selects one of the codewords that minimize the Hamming distance
between a codeword and the received word.
Theorem 11 The problem of finding the nearest codeword for an arbi-
trary linear block code is NP-hard.
Proof: See [18].
Definition 12 (Bounded Distance Decoding) Bounded distance de-
coding selects all codewords that are within Hamming distance from the
received word for some predefined
If t = ⌊(d − 1)/2⌋, then t is called the error correction capability of the
code, and the decoded codeword, if it exists, is unique. An (n, M, d) code is called a
t-error correcting code for t = ⌊(d − 1)/2⌋. For some hard decision decoding
algorithms, t = ⌊(d − 1)/2⌋ serves as a limit on the number of errors that
can be corrected. For others, it is often possible to correct more than
t errors in certain special cases.
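A minimal sketch of hard decision, nearest-codeword decoding (our illustration; the five-bit repetition code is our choice of test case): it simply searches all codewords for one at minimum Hamming distance from the received word, the brute-force approach whose cost for general linear codes is the subject of Theorem 11.

```python
def hamming_distance(x, y):
    return sum(a != b for a, b in zip(x, y))

def nearest_codeword(received, code):
    """Return a codeword minimizing the Hamming distance to the received word."""
    return min(code, key=lambda c: hamming_distance(received, c))

# Binary (5, 2, 5) repetition code: t = (5 - 1) // 2 = 2 errors correctable.
code = ["00000", "11111"]
print(nearest_codeword("01001", code))  # '00000' (two errors corrected)
print(nearest_codeword("01011", code))  # '11111' (three errors cause a decoding error)
```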
Definition 13 The weight of a word is the number of nonzero coordi-
nates in the word. The weight of a word x is commonly written w(x).
In hard decision decoding we can speak meaningfully of an error vector
e induced by noise on the channel. On most communication
channels of practical interest it is often the case that the probability mass
function on the weight of the error vector is strictly decreasing, so that
the codeword c that maximizes p(r|c) is the codeword that minimizes
the Hamming distance d(r, c). Under these assumptions, a code with minimum
distance d can correct all error patterns of weight less than or equal to
⌊(d − 1)/2⌋.
In many cases the channel alphabet is not the same as the code
alphabet Generally this is due to quantization at the receiver that
provides a finer discrimination between received signals than that pro-
vided with hard decisions. The resulting decoding problem is called soft
decision decoding, and the solution takes the form of a mapping from the
received space to the code space. In this case it is misleading to speak
of “correcting” channel errors, as the received sequence does not contain
erroneous code symbols.
There are three basic types of soft decision decoding considered in
this book.
Definition 14 (Maximum Likelihood (ML) Decoding)
Maximum likelihood decoding finds one of the codewords c that, for a
received word r, maximize the distribution p(r|c).
There is a related decoding algorithm that is identical to ML decoding
if the codeword distribution is uniform.
Definition 15 (Maximum A Posteriori (MAP) Decoding)
Maximum a posteriori decoding finds one of the codewords c that, for a
received word r, maximize the distribution p(c|r).
Definition 16 (Symbol-by-symbol MAP Decoding)
Symbol-by-symbol MAP decoding finds the information symbol m that,
for a received word r, maximizes the distribution p(m|r).

4. Bounds
In this section we consider several classical upper and lower bounds
on minimum distance and code rate as a function of other code param-
eters. These bounds treat a fundamental problem that can be stated
as follows: For a given codeword length and minimum distance, what is
the greatest possible number of codewords? We consider both the non-
asymptotic and asymptotic behavior of error control codes. In the latter,
we approach the aforementioned question while allowing the codeword
length to tend toward infinity.

1
is typically the Euclidean space.

The following approximation, known as Stirling’s formula, will be
useful:

n! ≈ √(2πn) (n/e)^n.

Let V_q(n, r) be the number of vectors in a sphere of radius r in X^n.


Theorem 17 (Sphere Packing Upper Bound) Given an (n, M, d) code of
minimum distance d over an alphabet of size q,

M · V_q(n, ⌊(d − 1)/2⌋) ≤ q^n.

Proof: Consider a sphere of radius t = ⌊(d − 1)/2⌋ centered at a codeword c of the code. The number
of vectors that are exactly Hamming distance j from c is C(n, j)(q − 1)^j. It
follows that

V_q(n, t) = Σ_{j=0}^{t} C(n, j)(q − 1)^j,

and M · V_q(n, t) is the volume of all spheres of radius t centered
at codewords of the code. The total volume of these spheres must be less than
or equal to the volume q^n of the entire space of n-tuples over the alphabet.
The Sphere Packing Upper Bound is often referred to as the Hamming
Bound.
Definition 18 (Perfect Codes) A code is perfect if it satisfies the
Hamming Bound with equality.
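The sphere packing bound is easy to evaluate numerically. The sketch below is ours; the binary length-7 Hamming code used as a test case is discussed in Chapter 3, and here serves only to show a code that meets the bound with equality.

```python
from math import comb

def sphere_volume(q, n, r):
    """V_q(n, r): number of q-ary words within Hamming distance r of a fixed word."""
    return sum(comb(n, j) * (q - 1) ** j for j in range(r + 1))

def hamming_bound(q, n, d):
    """Largest M allowed by the sphere packing bound for length n and distance d."""
    t = (d - 1) // 2
    return q ** n // sphere_volume(q, n, t)

# The binary (7, 16, 3) Hamming code meets the bound with equality: it is perfect.
print(sphere_volume(2, 7, 1))   # 8
print(hamming_bound(2, 7, 3))   # 16
```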
Theorem 19 (Gilbert Lower Bound) There exist codes of length n,
cardinality M, and minimum distance d in X^n that satisfy

M · V_q(n, d − 1) ≥ q^n.

Proof: [42] The code is to be constructed by selecting one vector at a
time from the space of n-tuples over the alphabet. When a codeword is selected,
all vectors that are within Hamming distance d − 1 from the selected
codeword are deleted from further consideration. This ensures that the
resulting code has minimum distance d. The selection of each codeword
results in the deletion of at most V_q(n, d − 1) vectors from the space
of q^n vectors. It follows that at least M codewords are selected, where
M ≥ q^n / V_q(n, d − 1). The result follows.
Let A_q(n, d) be the maximum cardinality of a code of length n and
minimum distance d over an alphabet of size q, where d ≤ n.

Definition 20 (Minimum Relative Distance) δ = d/n is called the mini-
mum relative distance of a code if d is the minimum distance of the
code.
Define

α(δ) = lim sup_{n→∞} (1/n) log_q A_q(n, δn),

which measures the achievable rate of the code.
Define H_q(x) to be the entropy function given by

H_q(x) = x log_q(q − 1) − x log_q x − (1 − x) log_q(1 − x).
Lemma 21 If 0 ≤ λ ≤ (q − 1)/q, then (1/n) log_q V_q(n, λn) → H_q(λ) as n → ∞.

Proof: Since

Taking the logarithm and dividing by we get

and

After the usual calculations, it follows that

Applying

(from Stirling’s formula) to the binomial coefficient of the above equation


finishes the proof of the lemma.

Theorem 22 (Gilbert-Varshamov Lower Bound)
For minimum relative distance 0 ≤ δ ≤ (q − 1)/q,

α(δ) ≥ 1 − H_q(δ).

Proof: It suffices to prove that

The proof is finished at once by noting that


The Gilbert-Varshamov Lower Bound shows that it is possible to con-
struct a code with at least q^{n(1 − H_q(δ))} codewords by adding more
codewords to the code with minimum distance d until no more can be
added.
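To get a numerical feel for what the Gilbert-Varshamov bound guarantees, the following sketch (our illustration) evaluates the entropy function H_q and the guaranteed rate 1 − H_q(δ) for a few relative distances.

```python
import math

def hq(q, x):
    """q-ary entropy function H_q(x) for 0 <= x < (q - 1) / q; H_q(0) = 0."""
    if x == 0:
        return 0.0
    return (x * math.log(q - 1, q)
            - x * math.log(x, q)
            - (1 - x) * math.log(1 - x, q))

def gv_rate(q, delta):
    """Rate guaranteed by the Gilbert-Varshamov bound at relative distance delta."""
    return 1.0 - hq(q, delta)

for delta in (0.05, 0.1, 0.2):
    print(delta, round(gv_rate(2, delta), 4))
# e.g. delta = 0.1 gives a guaranteed rate of about 1 - H_2(0.1) ~ 0.531
```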
The best known upper bound on α(δ) for binary codes was derived by
McEliece, Rodemich, Rumsey, and Welch. We state a simplified form
without proof.

Theorem 23 (McEliece-Rodemich-Rumsey-Welch Bound)

α(δ) ≤ H_2( 1/2 − √(δ(1 − δ)) ).
Proof: See [80].


While randomly chosen codes will satisfy the Gilbert-Varshamov Bound
with high probability, it is very difficult to explicitly construct such a
code. One can instead consider a code that satisfies a less tight
bound but that is easier to construct. To this end, we have the notion
of asymptotically good codes.

Definition 24 Let a family of codes of increasing block length n_i,
cardinality M_i, and minimum distance d_i
have the limit parameters R = lim inf (log_q M_i)/n_i and δ = lim inf d_i/n_i. The family of
codes is called asymptotically good if both R > 0 and δ > 0.
There is another bound that is widely used but is weaker than the
Sphere Packing upper bound.

Theorem 25 (Singleton Upper Bound) The minimum distance for
an (n, M, d) code over an alphabet of size q is bounded above by

d ≤ n − log_q M + 1.

Proof: Arrange each codeword in each row of a matrix; the first n − d + 1
coordinates of each row are then different from each other.
This is so since all codewords differ from one another in at least d coordinates.
Hence, since M ≤ q^{n − d + 1}, the proof is finished.
Definition 26 A code that satisfies the Singleton Bound with equality
is said to be maximum distance separable (MDS).
There exist a large number of MDS codes, the most prominent being
the Reed-Solomon codes. The maximum possible length of MDS codes
over a given alphabet is not known, though the following conjecture is
widely believed to be true.
Conjecture The length of a q-ary MDS code with dimension 3 or redun-
dancy 3 is at most q + 2. If neither the dimension nor the redundancy is
3, then the length is at most q + 1.

5. Overview of the Text


Having set the stage by exploring what is possible, the remainder of
the book will focus on actual codes and their respective encoding and
decoding algorithms. Chapters 2 – 4 focus on “classical” error control
coding. Chapter 2 provides a quick overview of the necessary abstract
algebra, while chapter 3 explores the construction of several of the more
notable algebraic block codes. Chapter 4 completes the picture of classi-
cal error control coding with an overview of convolutional and concate-
nated codes. Chapter 5 begins the in-depth study of the more recent
results in coding theory and practice. Chapter 5 introduces the vari-
ous elements of graph theory that will be used in the rest of the book.
Chapter 6 then considers various types of algorithms that are executed
on graphs. These algorithms include the belief propagation algorithms
and other, related algorithms. Chapter 7 provides a concise survey of
turbo coding, focusing on the relation of turbo decoding to belief prop-
agation. Chapter 8 explores low density parity check codes, beginning
with the work of Gallager and then proceeding to more recent results,
including the use of belief propagation on these codes to obtain perfor-
mance near the Shannon limit. The final chapter, Chapter 9 considers
low density generator codes and repeat-accumulate codes.
Chapter 2

ABSTRACT ALGEBRA

The first twenty years of the development of coding theory consisted


of the increasingly sophisticated use of algebraic structures, culminat-
ing in the development of Reed-Solomon codes and Berlekamp’s algo-
rithm. More recently, algebraic structures have provided a useful tool
for creating more general forms of existing decoding algorithms, and for
exploring commonality in structure across algorithms. In this chapter
we present several basic algebraic structures – groups, rings, fields, and
vector spaces – as well as a few interesting variants, like semigroups.
We present the basic properties of Galois fields, and develop the con-
cepts of minimal polynomials and ideals in rings. As will be seen in the
next chapter, the latter are particularly important in the construction of
cyclic codes, including the Reed-Solomon and BCH codes. In Chapter
6, semigroups will be used to develop the idea of a common structure
for algorithms on graphs.

Readers interested in a thorough and eminently readable overview are


referred to Finite Fields for Computer Scientists and Engineers by R.
J. McEliece [81]. Those interested in an encyclopedic treatment of the
subject are referred to Finite Fields by R. Lidl and H. Niederreiter [66].

1. Sets and Groups


A set can be (loosely) defined to be an arbitrary collection of objects,
or elements. There are no predefined operations between elements in
a set. The number of elements in the set is often referred to as the
cardinality of the set.

Definition 27 Let G be a set on which a binary operation “·” has been
defined. G is a group if the operation satisfies the following four con-
straints:
The operation is closed: a · b ∈ G for all a, b ∈ G.
The operation is associative: (a · b) · c = a · (b · c) for all a, b, c ∈ G.
There is an identity element e ∈ G such that a · e = e · a = a for all a ∈ G.
Every a ∈ G has an inverse a^{-1} ∈ G such that a · a^{-1} = a^{-1} · a = e.

The group is said to be abelian, or commutative, if the operation
is commutative; in other words, if a · b = b · a for all a, b ∈ G. The
group operation for a commutative group is usually represented using the
symbol “+”, an allusion to the fact that the integers form a commutative,
additive group.
The order of a group is defined to be the cardinality of the group. The
order of a group alone is not sufficient to completely specify the group
unless we restrict ourselves to a particular operation.
The simplest groups can be formed by grouping the integers into
equivalence classes under addition modulo n. Two integers a and b are
said to be in the same equivalence class modulo n if a can be expressed
as b + kn for some integer k. If this is the case we write a ≡ b mod n.
In this text equivalence classes are labeled using one of the constituent
elements, usually the element with the smallest magnitude.
Theorem 28 Integer addition mod n partitions the integers into the n equiv-
alence classes {0, 1, ..., n − 1}, which form a commutative group
of order n.
Proof: The associativity and commutativity of modulo addition fol-
lows from the associativity and commutativity of integer addition. The
identity element is 0, while the additive inverse of an element a is
the equivalence class containing the integer n − a. Closure is assured
by the modularity of the additive operation, and the result follows.
Theorem 29 Integer multiplication mod p partitions the nonzero integers into
the p − 1 equivalence classes {1, 2, ..., p − 1}, which form a commutative
group of order p − 1 if and only if p is prime.
Proof: The associativity and commutativity of modulo multiplication
follows from the associativity and commutativity of integer multiplica-
tion. The multiplicative identity is clearly 1. Closure and the existence
of inverses for all elements are only assured, however, if p is prime. If
p is not prime, then there exist nonzero a and b such that a · b ≡ 0 mod p, and
closure is not satisfied. If p is prime, then there can be no such pair of
elements and closure is satisfied. To show the existence of inverses, we
note that for any nonzero element a the products a · 1, a · 2, ..., a · (p − 1)
must be distinct; otherwise, a · i ≡ a · j for some i ≠ j, and there exists a nonzero b = i − j such that
a · b ≡ 0 mod p. Since the products are distinct for all a, there
must be a product a · b ≡ 1, indicating the existence of the multiplicative
inverse for a.
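Theorems 28 and 29 can be checked by brute force for small moduli. The sketch below (ours) verifies that every nonzero residue has a multiplicative inverse modulo a prime, and that this fails for a composite modulus.

```python
def has_all_inverses(n):
    """True if every nonzero residue mod n has a multiplicative inverse."""
    return all(any((a * b) % n == 1 for b in range(1, n)) for a in range(1, n))

print(has_all_inverses(7))   # True: {1,...,6} is a group under multiplication mod 7
print(has_all_inverses(6))   # False: e.g. 2 * 3 = 0 mod 6, so closure and inverses fail
```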
Definition 30 (Order of a Group Element) Let a be an element in
the group G with group operation “·”. Let a² = a · a, a³ = a · a · a, and
so on. The order of a is the smallest positive integer m such that a^m
is the group identity element.
Theorem 31 Let be a subset of the group If
then is a group.
Proof: By the premise, for all there exists such that
If we set then there exists some such that
for each Multiplying on the right by the inverse of
shows that in each case e is the identity element in It follows that the
identity element in is in Returning to the expression now
set to the identity element, and the existence of inverses in follows
immediately. Since is a subset of a group obtains associativity
from

A group contained in a group is said to be a subgroup of A


subgroup of is proper if
Definition 32 (Left and Right Cosets) Let be a subgroup of
with operation “+”. A left coset of in is a subset of whose ele-
ments can be expressed as for some A
right coset of in is a subset of whose elements can be expressed
as for some

Note that if is a subgroup, it must contain an identity element. It


follows that the that defines a given coset must be an element
of that coset. If the group is commutative, every left coset is
identical to every right coset
Theorem 33 The distinct cosets of a subgroup in a group are dis-
joint.
Proof: Let and be in the same coset of in It follows that for
some we can write and for some It

then follows that where More generally,


this shows that elements in the same coset of differ by an element of

Now suppose that and are in the same coset. It follows that
where Using the above, we have and
for some We then see that if an element
in coset is equivalent to an element in coset then every element
in is equivalent to every element in It follows that contains
Since the reverse is also true, Distinct cosets must therefore be
disjoint.

Theorem 33 shows that a subgroup of a group defines a parti-


tioning of into distinct, disjoint cosets. This partitioning of is called
the coset decomposition of induced by

Theorem 34 (Lagrange’s Theorem) If is a subgroup of then

Proof: Let and (i.e. is in but not in ).


It follows that otherwise,
implies which contradicts Given an element in a coset
of in defines a one-to-one mapping of all of the other
elements in the coset to elements All cosets of in thus
have cardinality Since they are also disjoint by Theorem 33, the
result follows.
The above result is due to the mathematician and astronomer Joseph
Louis Lagrange (1736 - 1813).

2. Rings, Domains, and Fields


Definition 35 A ring is a set together with two binary operations
called “+” and “·”, which satisfy the following three axioms:
forms a commutative group under the operation “+”. The additive
identity element in the group is labeled “0”.
The operation “·” is closed and associative, and there is a multiplica-
tive identity element labeled “1” such that a · 1 = 1 · a = a for all a.
The distributive law holds, i.e.

a · (b + c) = (a · b) + (a · c) and (b + c) · a = (b · a) + (c · a)

for all a, b, c.

If the operation “·” is commutative then is said to


be a commutative ring. If “·” has an identity element, then is said to
be a ring with identity. And, of course, if “·” is commutative and has an
identity element, then is said to be a commutative ring with identity.
Definition 36 A semiring is a set together with two binary opera-
tions called “+” and “·”, which satisfy the following three axioms:
The operation “+” is closed, associative and commutative, and there
is an additive identity element called “0” such that a + 0 = a for all a.
The operation “·” is closed and associative, and there is a multiplica-
tive identity element called “1” such that a · 1 = 1 · a = a for all a.
tive identity element called “1” such that for all
The distributive law holds.
In other words, a ring without an additive inverse is a semiring. A com-
mutative semiring is a semiring in which “·” commutes. The following
are examples of commutative semirings.

Semiring Example 1 The set of nonnegative


real numbers with the operation + being the sum that has the identity
element 0, and the operation . being the product that has the identity
element 1.
Semiring Example 2 The set of nonnegative
real numbers with the operation + being the maximum that has the
identity element and the operation . being the sum that has the
identity element 0.
Semiring Example 3 The set of polynomials over
a field in the variable with the operation + being the polynomial
sum that has the identity element 0, the zero polynomial, and the op-
eration . being the polynomial product that has the identity element
1, a constant polynomial. In the polynomial sum, the coefficients are
added component-wise, and in the polynomial product, the coefficients
are calculated through convolution.
Definition 37 (Euclidean Domains) A Euclidean domain is a set
with two binary operations “+” and “·” that satisfy the following:
forms an additive commutative ring with identity.
“.” is closed over
Cancellation: if then

Every element has an associated metric such that

there exists such that


with or
The metric for the additive identity element, is generally taken to
be undefined, though a value of can be assigned if desired. is
called the quotient and the remainder.
Example 38 (Euclidean Domains)

The ring of integers under integer addition and multiplication with


metric (absolute value) forms a Euclidean domain.
The ring of polynomials over a finite field with metric
forms a Euclidean domain.
Let and be two elements in a Euclidean domain is said to be
a divisor of if there exists such that
Definition 39 (Common Divisors) An element is said to be a com-
mon divisor of a collection of elements if for

Definition 40 (Greatest Common Divisors) If is a common di-


visor of the and all other common divisors are less than then
is called the greatest common divisor of the
Euclid’s Algorithm is a polynomial-time algorithm for finding the
of sets of elements in Euclidean domains.

Euclid’s Algorithm
Let a Euclidean domain, where
(1) Let the indexed variable take on the initial values and

(2) If then define using the recursive relation


where
(3) Repeat until
(4) For the smallest where

Note that with each iteration of the recursion formula, the size of the
remainder gets smaller. It can be shown that, in a Euclidean domain,
the remainder will always take on the value zero after a finite number
of steps. For a proof that is the when first takes on the value
zero, see McEliece [81].
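The recursion is easy to carry out by machine. The following Python fragment is a minimal sketch of Euclid's algorithm over the integers, one familiar instance of a Euclidean domain; the function name euclid_gcd is ours.

    def euclid_gcd(a, b):
        """Euclid's algorithm over the integers: repeatedly replace the
        element of larger metric by the remainder until it reaches zero."""
        r_prev, r = a, b
        while r != 0:
            r_prev, r = r, r_prev % r   # next remainder in the recursion
        return abs(r_prev)              # the last nonzero remainder is the gcd

    print(euclid_gcd(1071, 462))        # 21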
Example 41 (Using Euclid’s Algorithm)

Theorem 42 If is any finite subset of elements from


a Euclidean domain then has a which can be expressed as a
linear combination where the coefficients
Proof: [81] Let be the set of all linear combinations of the form
where the coefficients Let be the element in with
the smallest metric By definition is a linear combination of
the elements in the set If does not divide some element
then where But must be in
since and are in This contradicts the minimality of in S. It
follows that is thus a common divisor of Now let be any other

common divisor of Then for each Since


is a multiple of every common divisor
and thus the of all of the elements in

The Extended Form of Euclid’s Algorithm


The extended form of Euclid’s algorithm finds s and t such that

(1) A set of indexed variables is given the initial conditions

(2) If then let such that


(3) Compute
(4) Compute
(5) Repeat steps 2 through 4 until At this point
and The sequences and satisfy
for all
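A minimal Python sketch of the extended algorithm over the integers is given below; it maintains the sequences of remainders, s-values, and t-values in parallel, and the names are ours.

    def extended_euclid(a, b):
        """Extended Euclid over the integers: returns (g, s, t) with
        g = gcd(a, b) = s*a + t*b."""
        r_prev, r = a, b
        s_prev, s = 1, 0
        t_prev, t = 0, 1
        while r != 0:
            q = r_prev // r
            r_prev, r = r, r_prev - q * r
            s_prev, s = s, s_prev - q * s
            t_prev, t = t, t_prev - q * t
        return r_prev, s_prev, t_prev

    g, s, t = extended_euclid(1071, 462)
    print(g, s, t, s * 1071 + t * 462)   # 21 -3 7 21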

Example 43 (The Extended Form of Euclid’s Algorithm)


Find and find and such that

Using and we have

If the nonzero elements in a domain form a multiplicative group, then


the domain is a field. Fields are best defined in terms of their dual-group
structure.
Definition 44 (Fields) Let be a set of objects on which two opera-
tions “+” and “.” are defined. is said to be a field if and only if

forms a commutative group under +. The additive identity element


is labeled “0”.
(the set with the additive identity removed) forms a com-
mutative group under. The multiplicative identity element is labeled
“1”.
+ and . distribute:
A field can also be defined as a commutative ring with identity in
which every element has a multiplicative inverse.
Fields of finite order (cardinality) are known as Galois fields in honor
of the French mathematician, political radical, and duelist Evariste Ga-
lois (1811-1832). A Galois field of order is usually denoted
A finite field of order is unique up to isomorphisms (the renaming of
elements). A nice development of this result can be found in [81].
Example 45 (Fields of Prime Order) The integers
where is a prime, form the field under modulo addition and
multiplication. This follows immediately from distributivity of integer
addition and multiplication and Theorems 28 and 29.

We shall shortly see that Galois fields of order can be described


as vector spaces over
Definition 46 (Order of a Galois Field Element) The order of
written is the smallest positive integer such that
1.

The reader may wish to prove to herself that the order of a Galois
field must exist, and must be finite.

Theorem 47 If for some then

Proof: If for some then


forms a multiplicative subgroup of the nonzero elements in The
result follows from Lagrange’s Theorem (Theorem 34).
Theorem 48 Let and be elements in such that
Then
Proof: We first show that if and only if This is
proven as follows. First note that if then trivially.
If then the minimality of is contradicted. If
then where and

must be zero; otherwise, the minimality of


is again contradicted.
Now let Since by definition of we
have

which implies that by the above result. Sim-


ilarly, since and
Since and divide each other, they must be equal.
(Adapted from [81])
Definition 49 (The Euler Function) is defined to be the car-
dinality of the set
The Euler function is called the Euler totient function in some texts.
The following properties are easily proven.
for all prime
for distinct primes and

for prime and integer


for distinct primes and
can be computed directly using The product
is taken over all positive prime integers that divide
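As a brief illustration of the product formula, the following Python sketch computes the Euler function by trial division over the distinct prime divisors; the function name euler_phi is ours.

    def euler_phi(n):
        """Compute the Euler function by removing a factor 1/p for each
        distinct prime p dividing n, i.e. phi(n) = n * prod(1 - 1/p)."""
        result, m, p = n, n, 2
        while p * p <= m:
            if m % p == 0:
                result -= result // p      # multiply result by (1 - 1/p)
                while m % p == 0:
                    m //= p
            p += 1
        if m > 1:                          # a single prime factor remains
            result -= result // m
        return result

    print(euler_phi(12))   # 4: the integers prime to 12 are 1, 5, 7, 11
    print(euler_phi(7))    # 6, as expected for a prime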
Theorem 50 (The Multiplicative Structure of Galois Fields)
The multiplicative structure of Galois fields has two basis elements.
If does not divide then there are no elements of order in

If divides then there are elements of order in

Proof: The first part follows immediately from Theorem 47. If


then is the set of all solutions to the expression

It follows that all elements of order must thus be contained


in this set. Theorem 47 implies that only those elements of the form
where have order There are such elements.
Definition 51 (Primitive Elements in a Galois Field)
An element with order in is called a primitive element in

It follows immediately from Theorem 50 that in every finite field


there are exactly primitive elements. Since is positive
for positive every Galois field contains at least one primitive element

Let be primitive in and consider the set


The elements of the set must be distinct, otherwise the minimality of
the order of is contradicted. They must also be nonzero, otherwise the
existence of inverses for the nonzero elements in a field is contradicted.
It follows that the elements of the set are the nonzero elements of
We reach the important conclusion that the nonzero elements in
can be represented as the consecutive powers of a primitive
element
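This conclusion is easy to check numerically for a prime field. In the Python sketch below we verify that 3 is a primitive element of GF(7) by listing its consecutive powers; the choice of field and element is ours.

    p = 7                               # GF(7) is the integers modulo 7
    candidate = 3                       # test whether 3 is primitive in GF(7)

    powers, x = [], 1
    for _ in range(p - 1):
        powers.append(x)
        x = (x * candidate) % p

    print(powers)                       # [1, 3, 2, 6, 4, 5]
    print(sorted(powers))               # [1, 2, 3, 4, 5, 6]: every nonzero element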

3. Vector Spaces and GF


Definition 52 (Vector Spaces) Let be a set of elements called vec-
tors and a field of elements called scalars. In addition to the two field
operations, there are two vector operations. Vector addition, denoted
“+”, maps pairs of vectors onto Scalar
multiplication, denoted “·”, maps a scalar and a vector
onto a vector forms a vector space over if the
following conditions are satisfied:
forms a commutative group under the operation “+”.
and
“+” and “.” distribute: and
(Note that refers to the additive field operation,
not the additive vector operation.)
Associativity: and
The multiplicative identity 1 in acts as a multiplicative identity in
scalar multiplication:

is commonly called the “scalar field” or the “ground field” of the


vector space

The set of forms a vec-


tor space under coordinate-by-coordinate addition and scalar multiplica-
tion. Let and
Vector addition can then be defined by
and scalar multiplication by
Since forms a commutative group under “+”, the
linear combination
Definition 53 (Spanning Sets) A spanning set for a vector space
is a set of vectors for which the set of all linear
combinations includes all vectors in is said to
span
Definition 54 (Bases) A spanning set for that has minimal cardi-
nality is called a basis for
Theorem 55 The elements of a given basis are linearly independent.

Proof: Suppose that the elements of a basis are not linearly indepen-
dent. It follows that one of the elements can be deleted without reducing
the span of the basis. This reduces the set’s cardinality by one, contra-
dicting the minimality of the cardinality of a basis.
Corollary 56 Though a vector space may have several possible bases,
all of the bases will have the same cardinality.
Definition 57 (Dimension of a Vector Space) If a basis for a vec-
tor space has elements, then the vector space is said to have dimen-
sion written
Theorem 58 Let be a basis for a vector space For every vec-
tor there is a representation This
representation is unique.
Proof: The existence of at least one such representation follows from the
definition of bases. Uniqueness can be proven by contradiction. Suppose
that there are two such representations for the same vector with different
coefficients. Then we can write
where Then
Since the basis vectors must not be independent. This
contradicts Theorem 55.

Due to the uniqueness of the representation, the number of vectors in


a vector space can be counted by enumerating the number of possible

representations. The number of vectors in the vector space equals the


number of possible choices for the Let a vector
space over a field have dimension Then

Theorem 59 Let is a vector subspace of if and only if


and
Proof: Assume that is a vector subspace. It follows by definition
of a vector space that is closed under linear combinations, and the
first half of the result follows. Now assume that
and The closure properties for vector addition
and scalar multiplication are clearly satisfied for Since is closed
under scalar multiplication, It follows that the
additive identity must also be in The remainder of the vector space
properties follow by noting that since is a vector space, the various
properties (associativity and commutativity) for operations that hold in
must also hold in
Definition 60 Let and be
vectors in the vector space over the field The inner product of
and is defined as
The following properties of the inner product follow immediately.
Commutativity:
Associativity with scalar multiplication:

Distributivity with vector addition:


Definition 61 (Dual Spaces of Vector Spaces) Let be a di-
mensional subspace of a vector space V. Let be the set of all vectors
in such that for all and for all is said to
be the dual space of
Theorem 62 The dual space of a vector subspace is itself a
vector subspace of
Proof: Let Then and by definition
of a dual space. Using the distributivity of the inner product with vector
addition, we have Using
commutativity with scalar multiplication we have
All linear combinations of elements in are then
elements in is a vector subspace by Theorem 59.

Theorem 63 (The Dimension Theorem) Let be a finite-dimen-


sional vector subspace of and let be the corresponding dual space.
Then
Proof: Let and Let be a matrix
whose rows form a basis for S. It follows that if
and only if is a vector space by Theorem 62, so we can
construct a basis where is the dimension of Since
the basis for can be extended to form a basis for of the
form
Since the elements of a basis are independent, the row rank of
is The row and column ranks of a matrix are always equal [51],
so the columns of span a vector space. By defini-
tion of a spanning set, every vector in the column space of can
be written in the (not necessarily unique) form where
It follows that the column space of is spanned by the vectors

Since it follows that for


leaving the remaining vectors to span the
dimensional column space of
The vectors in the spanning set must also
be independent. This follows by noting that if there exists a linear
combination then
Since the are not in it follows
that If the are not identically
zero, the vectors in the basis for are not linearly independent, which
is a contradiction.
Since the vectors are linearly independent
and span a space, it follows that and the
dimension of is thus

Having developed several tools for exploring vector spaces, we can


now consider Galois fields of order
All Galois fields contain a multiplicative identity element that is usu-
ally given the label “1”. Consider the sequence 0, 1, 1 + 1, 1 + 1 + 1, 1 +
1 + 1 + 1,. . . . Since the field is finite, this sequence must begin to repeat
at some point. Let denote the summation of ones. If is the
first repeated element, being equal to it follows that
must be zero; otherwise, is an earlier repetition than

Definition 64 (Characteristic of a Field) The characteristic of the


finite field is the smallest positive integer such that

Theorem 65 The characteristic of a Galois field is always a prime in-


teger.
Proof: (by contradiction). Consider the sequence 0, 1, 2(1), 3(1),...,
Suppose that the first repeated element is
where is not a prime. is thus the characteristic of the field by
definition. Since is not prime, there exist positive integers
such that It follows that Since a
field cannot contain zero divisors, either or must equal zero.
Since this is a contradiction of the minimality of the
characteristic of the field.

Since in a field of charac-


teristic for all
Theorem 66 A field with characteristic contains
the field of integers under modulo addition
and multiplication.
Proof: Since the sums or products of sums of ones remain sums of ones,
is closed under addition and multiplication. The additive
inverse of is The multiplicative inverse of
or a multiple of is where mod (the
existence of a solution follows from the primality of ). The rest of
the field requirements (associativity, distributivity, etc.) are satisfied by
noting that is embedded in the field Since the field of order
is unique up to isomorphisms, must be the field of integers under
modulo addition and multiplication.
Theorem 67 The order of a Galois field must be a power of
a prime.
Proof: Theorem 66 showed that every finite field of characteristic
contains a subfield We now show that is a vector
space over
Let be a nonzero element in There are distinct elements of
in the set They must be distinct, for
implies If the field contains no other elements, the
result follows immediately. If then there are
distinct elements in in the set
This process continues until all elements in are included in some
set Since there
is a one-to-one mapping between coefficients
and elements in

The following theorem shows that the field contains all Galois
fields of order where divides
Theorem 68 An element in lies in the subfield if and
only if
Proof: Let It follows from Theorem 47 that
and thus
Now assume that is then a root of The
elements of comprise all roots, and the result follows.

4. Polynomials over Galois Fields


Definition 69 (Irreducible Polynomials) A polynomial is ir-
reducible in if cannot be factored into a product of lower
degree polynomials in
Definition 70 (Primitive Polynomials) Let be an irreducible
polynomial of degree in is said to be primitive if the
smallest positive integer for which is
It can be shown that any irreducible polynomial
divides (see, for example, [81]). A primitive poly-
nomial is always irreducible in but irre-
ducible polynomials are not always primitive. Consider, for example,

Definition 71 (Minimal Polynomials) Consider The


minimal polynomial of with respect to is the nonzero polynomial
of smallest degree in such that
Theorem 72 For each element there exists a unique monic
polynomial of minimal degree in such that the following
are true.
(1)
(2)
(3) implies that
(4) is irreducible in

Proof: Consider the set of elements Since


is a vector space of dimension over by the proof of
Theorem 67, the elements in must be linearly

dependent over i.e. there must be a polynomial


such that can
be made monic by multiplying through by the inverse of Given
the existence of at least one such monic there must be at least
one monic polynomial of minimal degree, for the degrees of all
polynomials in are bounded below by 0, while the degree of
is bounded above by the degree of
The uniqueness of the monic minimal polynomial of follows by con-
tradiction. Suppose that and are distinct monic minimal poly-
nomials of Since is a ring, it contains
Since and are monic,
which contradicts the minimality of the degree of the minimal polyno-
mial.
Now consider Since is a Eu-
clidean domain, we can write where either
or Since it fol-
lows that We then have by the minimality of the
degree of and
Finally, we show that must be irreducible. The proof is by con-
tradiction. Suppose If
and then implies that neither
nor can have inverses in contradicting the definition of a field.
If either or the minimality of is contradicted.
The irreducibility of follows.

Definition 73 (Conjugates of Field Elements) Let be an element


in The conjugates of with respect to the subfield are
, This set of elements is called the conjugacy class of
in with respect to

When discussing conjugates, the modifying clause “with respect to


is occasionally suppressed, but is always assumed.

Theorem 74 If the conjugacy class of with respect to


has cardinality d, then and

Proof: Consider the sequence


The conjugacy class of must clearly have finite cardinality. The
above sequence must thus begin to repeat as some point. Suppose
that is the first element to repeat, taking on the value where
Then and
We know that by Theorem 47.

Since is a power of a prime, and for any integer x and prime


and Then
Since and It follows that
by the proof of Theorem 48. Since is the first rep-
etition of the element , and is a divisor of

The following lemmas will prove useful in showing that the roots of a
primitive polynomial are conjugates of one another.

Lemma 75 For all a prime,

Proof: is always an integer. Since is

prime, none of the integers are divisors of


must be a multiple of
Lemma 76 If then
for
Proof: We show that the general case follows by
induction on
For we have

The previous lemma shows that It

follows that Since


has characteristic
For the general case, the induction hypothesis gives
The result then follows by using the
above technique to show

We can now prove the following theorem.


Theorem 77 Consider . Let be the minimal poly-
nomial of with respect to The roots of are exactly the
conjugates of with respect to
Proof: By definition of minimal polynomial with respect to
where the coefficients Using
Lemma 76 we have The

same technique can be used to show that the distinct conjugates of


are all roots of the minimal polynomial of with
respect to
Let
Using the second lemma and rearranging the order of
multiplication, we have

It then follows that

which shows that


The coefficients of satisfy and are thus roots of
But the elements in the subfield constitute all roots of this
expression. All of the coefficients of are thus in the subfield, and
the result follows.

We have shown that the minimal polynomial of with


respect to must contain all of the conjugates of with respect to
as roots, and that those roots alone are sufficient to ensure that
the coefficients of the resulting polynomial are in the subfield
Corollary 78 All of the roots of an irreducible polynomial in the
ring have the same order.
Proof: Let be the smallest field containing all of the roots
of an irreducible polynomial The roots must thus
have orders that divide An irreducible polynomial in
must be the minimal polynomial for its roots with respect to
otherwise, it would be divisible by the minimal polynomial, and thus
not irreducible. The previous theorem then shows that the roots are
conjugates of one another, and take the form Since
is the order of a finite field, it must be a power of a prime. and its
powers are thus relatively prime to and all divisors of
Theorem 48 provides the final conclusion that

Theorem 79 The roots of a primitive polynomial


have order
Proof: Let be an arbitrary root of a primitive polynomial
is also a root of since
1 by Definition 51. We now show that implies that
Since is a root of and must
be a root of unity. The result follows by contradiction. If
does not divide then where
It follows that This
contradicts the minimality of the order of It follows that all of the
roots of are roots of which in turn implies
that
Since all of the roots of an irreducible polynomial have the same order,
divides which in turn divides It follows by
the definition of a primitive polynomial (see Definition 51) that

Corollary 80 The roots of a primitive polynomial in


are primitive elements in

Theorem 81 The set of nonzero elements in form the com-


plete set of roots of the expression

Proof: Theorem 47 shows that any element in the field has an


order that divides It follows that the nonzero elements
in the field are all roots of the expression or
equivalently, the elements of are roots of unity. As
the expression is of degree the nonzero
elements in must comprise the complete set of roots.

The minimal polynomials with respect to of the nonzero ele-


ments in a given field thus provide the complete factorization

of into irreducible polynomials in the ring For


example, the factorization of in is

To factor the more general expression consider the following.


Theorem 50 showed that if is a divisor of then there are
elements of order in Since is always positive for
positive it follows that we are guaranteed the existence of a primitive
root of unity in an extension field of so long as we can find a
positive integer such that
Now consider an element with order in and all powers
of satisfy by definition of order. It also follows that
the elements must be distinct. The roots of
are generated by computing consecutive powers of For
this reason elements of order are often called primitive roots of
unity.
Definition 82 (The Order of Modulo The order of mod is
the smallest positive integer such that divides
If is the order of mod then is the smallest extension
field of in which one may find primitive roots of unity. Once
the desired primitive root has been located, the factorization of
can be completed by forming the conjugacy classes and computing the
associated minimal polynomials.
Example 83 (Factoring We begin by noting that a primitive
21st root of unity can be found in GF(64) (Theorem 50 shows that there
are 12 such elements in GF(64)). Let be one of the roots. We proceed
to identify the conjugacy classes formed by the powers of

We can see from the conjugacy classes that . factors into two
irreducible binary polynomials of degree 6 and one of degree 3.
Definition 84 (Cyclotomic Cosets) The cyclotomic cosets mod with
respect to are a partitioning of the integers into
sets of the form
Example 85 (Cyclotomic Cosets) The cyclotomic cosets mod 15 with
respect to GF(2) are {0}, {1, 2, 4, 8}, {3, 6, 9, 12}, {5, 10}, and {7, 11, 13, 14}.
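Cyclotomic cosets are straightforward to generate mechanically: the coset containing s is obtained by repeatedly multiplying by q and reducing modulo n. The following Python sketch (the function name is ours) reproduces Example 85.

    def cyclotomic_cosets(q, n):
        """Partition {0, 1, ..., n - 1} into cyclotomic cosets mod n with
        respect to GF(q): the coset of s is {s, qs, q^2 s, ...} reduced mod n."""
        remaining, cosets = set(range(n)), []
        while remaining:
            s = min(remaining)
            coset, x = [], s
            while x not in coset:
                coset.append(x)
                x = (x * q) % n
            cosets.append(coset)
            remaining -= set(coset)
        return cosets

    print(cyclotomic_cosets(2, 15))
    # [[0], [1, 2, 4, 8], [3, 6, 12, 9], [5, 10], [7, 14, 13, 11]]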

Theorem 86 Let have order in has


coefficients in if and only if is a union of cyclotomic subsets
mod n with respect to

Proof: If is empty, the result follows immediately. Now consider


and some Since Theorem 72 indicates that the
minimal polynomial of with respect to divides Theorem
77 then indicates that the roots of include all of the conjugates of
with respect to The result follows.

5. Frequency Domain Analysis of Polynomials


over
Definition 87 (Galois Field Fourier Transform (GFFT)) Let
where is a vector space over Let length
for some positive integer Let be an element oforder in
The Galois Field Fourier Transform of is the vector

where

The GFFT was originally developed by Mattson and Solomon in the


form of what is now called the Mattson-Solomon polynomial [76]. The
inverse transform is provided in the following theorem.

Theorem 88 (The GFFT Transform Pair) Let where is


a vector space over a field with characteristic and its trans-
form are related by the following expressions.

Proof: [76] Since has order the elements are


the zeros of If then
must be a zero of and thus satisfies the equality

When we have mod Com-

bining, we have
mod
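For a field GF(q) with q = p^m and m > 1 the transform requires a finite field implementation, but when the vector lies over a prime field GF(p) and n divides p - 1 the GFFT reduces to ordinary modular arithmetic. The following Python sketch of the transform pair uses GF(7) with the order-6 element 3; all names and the choice of field are ours.

    def gfft(v, alpha, p):
        """GFFT over the prime field GF(p): V_j = sum_i v_i * alpha^(ij) mod p,
        where alpha has order n = len(v)."""
        n = len(v)
        return [sum(v[i] * pow(alpha, i * j, p) for i in range(n)) % p
                for j in range(n)]

    def inverse_gfft(V, alpha, p):
        """Inverse transform: v_i = (1/n) * sum_j V_j * alpha^(-ij) in GF(p)."""
        n = len(V)
        n_inv = pow(n % p, p - 2, p)         # inverse of n modulo p
        a_inv = pow(alpha, p - 2, p)         # alpha^(-1)
        return [n_inv * sum(V[j] * pow(a_inv, i * j, p) for j in range(n)) % p
                for i in range(n)]

    v = [1, 0, 2, 0, 0, 5]                   # a vector of length 6 over GF(7)
    V = gfft(v, 3, 7)                        # 3 has order 6 in GF(7)
    print(V, inverse_gfft(V, 3, 7))          # the round trip recovers v

The round trip illustrates the transform pair of Theorem 88 in this restricted setting.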
Theorem 89 (The GFFT Convolution Theorem) Consider the fol-
lowing GFFT pairs.

Then if and only if

Proof: Apply the inverse GFFT to

The reverse direction follows from Theorem 88.


Corollary 90 Consider the following GFFT pairs.

Then if and only if

Proof: Apply the same argument used in the previous proof.


Definition 91 The spectrum of the polynomial
is the GFFT of the vector
Theorem 92 Let the coefficients of and be the components
of and respectively. Then the following is true.
if and only if
if and only if
Proof: For part 1, we note that

The second part follows in a similar manner.

Earlier it was shown that a polynomial with coefficients in


has as roots the union of one or more conjugacy classes with respect
to The GFFT provides an analogous result in terms of the
spectrum of the polynomial.
Theorem 93 Let V be a vector of length over where
and has characteristic For
if and only if mod
where
Proof: [76]: Recall that for fields with characteristic
Also recall that is in the subfield if and
only if It follows that

To prove the reverse case, assume that for all j. We


then have by definition of the
GFFT. Now let Since and must be
relatively prime. As ranges from 0 to takes on all values in the
same range. We then have Since
and have the same transform, they must be equal (see Theorem
89).
Example 94 (Minimal Polynomials and the GFFT)
In Table 4 eight elements in GF(8) were arranged in conjugacy classes
and their minimal polynomials computed. When we compute the GFFT
of the coefficients of the minimal polynomials, we obtain the results in
Table 94. Note that the positions of the zero coordinates in the spectra
correspond with roots of the minimal polynomials. For example, note that
for the conjugacy class the associated minimal polynomial
has a spectrum with zeros in the 3rd, 5th, and 6th positions.

6. Ideals in the Ring


Theorem 95 If is an irreducible polynomial in then
is a field.
Proof: The result follows from Theorem 67 and its proof.
Definition 96 (Ideals) Let be a ring. A nonempty subset is
said to be an ideal if it satisfies the following:
forms a group under the addition operation in
and
Definition 97 (Principal Ideals) An ideal contained in a ring
is said to be principal if there exists such that every element
can be expressed as the product for some

The element used to represent all of the elements in a principal ideal


is commonly called the generator element. The ideal generated by is
denoted
Let Each equivalence class in is labeled
with the smallest degree element that it contains (a polynomial of degree
or, if it contains zero, it is labeled with zero.
Theorem 98 Let be a nonzero ideal in Then the
following is true.
1. There exists a unique monic polynomial of minimal degree.
2. is principal with generator
3. divides

Proof: First note that there is at least one nonzero ideal in the ring
namely, the ring itself. Since the degrees of the poly-
nomials in are bounded below, there must be at least one polynomial
of minimal degree. This polynomial can be made monic by dividing
through by the leading nonzero coefficient.
We now proceed with a proof by contradiction. Let and
be distinct monic polynomials of minimal degree in Since forms
an additive group, must also be in Since
and are monic, must be of lower degree, contradicting the
minimality of the degree of
Consider such that is not a multiple of Since
forms a Euclidean domain and can
be expressed as where
since and is an ideal. Since forms an additive
group, contradicting the minimality of the
degree of
Suppose does not divide in Since
is a Euclidean domain, can be expressed as
where Since it is the
additive inverse of contradicting
the minimality of the degree of
Chapter 3

LINEAR BLOCK CODES

From 1948 through the early 1990’s, there was a tension between ex-
istential results in coding theory and the reality of coding practice. As
we shall see, the existence of codes with desirable properties is readily
proved through probabilistic methods. The general approach is to con-
struct a class of structures, and then prove that the probability that a
structure with desirable properties exists is positive. This much is quite
straightforward, and the classical results that we will review in this chap-
ter have been known (and well regarded) for some time. The next step –
the actual construction of the desired codes – proved extremely difficult
and has been an elusive goal for some time. Chapters 5–9 of this book
are dedicated to significant strides that have recently been taken toward
realizing this goal.
This is not to say, however, that the first few decades of the evolu-
tion of error control coding were not successful. In this chapter we will
describe several classical constructions for block codes that have had a
great impact on the telecommunications and computing industries. The
first error correcting codes, developed by Richard W. Hamming in the
late 1940’s, continue to play a significant role in error control for semi-
conductor memory chips. Reed-Muller, Golay, and Reed-Solomon
codes have been used in deep space telecommunications, while Reed-
Solomon codes made the digital audio revolution possible, and continue
to be used in wireless systems and new digital audio and video storage
technologies. The applications of classical block codes have been le-
gion and their impact substantial; their performance, on their own, has
simply fallen short of the Shannon limit. We include brief descriptions
here because of their general interest, as well as their potential use as

component codes in the recursive constructions to be discussed in later


chapters.
We begin the chapter with a few basic definitions and an existential
result. We then proceed in the succeeding sections to describe several
different types of linear block codes. The reader interested in detailed
examples and implementation strategies is referred to [76] and [116].

1. Basic Structure of Linear Codes


Let denote the set of with each coordinate in
With vector addition and scalar multiplication as in Definition 52,
becomes a vector space over
Definition 99 A code that is a subspace of is said to be a
linear code1. The dimension of the space is called the dimension of
The code rate is the ratio of the dimension to the length The
next lemma follows directly from the definition of linear code.
Lemma 100 The minimum distance of a linear code is the mini-
mum weight of a nonzero codeword.
Lemma 101 Let be a linear code of dimension over Then

Proof: is a vector subspace, and thus has a basis. Let


be a basis for By definition of a basis, is distinct for all
distinct Simple counting shows that there are
distinct sets of coefficients.
Definition 102 (Generator Matrix) Let be a ba-
sis for a linear code with dimension The matrix

is a generator matrix for


The term “generator matrix” refers to the fact that G induces a linear
mapping from the space of over F to the code space Given a
(or “message”) the mapping generates the codeword

1
There are cases in which we may wish to define codes as subspaces over rings or other
algebraic structures. We will not pursue such cases in this text, and will thus restrict the
definition of linear codes to those constructed as vector subspaces over finite fields

Definition 103 (Parity Check Matrix) Let be


a basis for the dual space for a linear code with dimension in
The matrix

is a parity check matrix for

The term “parity check matrix” refers to the fact that the null space
for the linear transformation induced by H is exactly the code space
This follows directly from Definition 61 in Chapter 2. It follows in turn
that if and only if
Each row of the parity check matrix places a parity constraint on
two or more of the coordinates of a codeword. In later chapters we will
consider graph-theoretic interpretations of these constraints.

Lemma 104 contains a codeword of weight if and only if


the parity check matrix H contains dependent columns.

Proof: Let where is a


vector. is a codeword if and only if which implies that

Corollary 105 The minimum distance of a linear block code is the min-
imum cardinality over all nonempty sets of linearly dependent columns
for any of its parity check matrices.

To construct a code with minimum distance it suffices to find a


matrix in which any set of columns is linearly independent. The
matrix is then used as the parity check matrix of the code.
An code is systematic if the first coordinates have each of the
possible combinations. Hence if the generator matrix G of is of the
form [I : P], where I is the identity matrix, then is systematic. To see
this, note that It follows
that is a parity check matrix for
Two codes are equivalent if one can be derived from the other by a
permutation of the coordinates. Elementary linear algebra tells us that
every matrix with independent rows can be converted to a matrix of the
form [I : P] by elementary row operations and column permutations.

Lemma 106 Let G be a generator matrix of and let be a matrix


derived from G via elementary row operations and column permutations.
Then is a generator matrix of a code equivalent to

Corollary 107 Every linear block code is equivalent to a systematic


linear block code.

A given code can be modified to fit a particular application. The


Reed-Solomon codes used in compact audio disc players, for example,
are shortened, as are the Hamming codes used for error protection in
semiconductor memory.

Definition 108 (Shortened Codes) A code is shortened by deleting


a message coordinate from the encoding process. An code thus
becomes an code.

Definition 109 (Extended Codes) A code is extended by adding an


additional redundant coordinate. An code thus becomes an
code.

Definition 110 (Punctured Codes) A code is punctured by deleting


one of its parity coordinates. An code thus becomes an
code.

Definition 111 (Expurgated Codes) A code is expurgated by delet-


ing some of its codewords. If of the codewords are deleted in
a manner such that the remainder form a linear subcode, then a
code becomes a code.

Definition 112 (Augmented Codes) A code is augmented by adding


new codewords. If the number of codewords is increased by the factor
such that the resulting code is linear, then a code becomes
an code.

Definition 113 (Lengthened Codes) A code is lengthened by adding


message coordinates. An code thus becomes an code.

Randomly chosen linear codes can possess many nice properties such
as achieving some of the bounds shown in Chapter 1. The next theorem
shows one such example.

Theorem 114 Randomly selected linear block codes achieve the Gilbert-
Varshamov Bound as with high probability.

Proof: We follow the proof of [12]. Let be a set of vectors


of size L, and let be the number of codewords of a randomly chosen
code contained in For all non-zero vectors in we have

since for some where is the


expectation operator.
Consider now the case of

Using the inequalities,

and

we have

Hence the probability that a random linear code contains a vector of


weight such that falls exponentially with

2. Repetition and Parity Check Codes


In the repetition code of length each information symbol is repeated
times.

Repetition Codes:

if and only if

Parameters: for a positive integer.

Parity check codes are most frequently defined over GF(2). They are
formed by appending a single bit to an information word. For “even
parity” the value of this appended bit is selected so as to make the total

number of ones in the codeword even. For “odd parity,” the value is
selected so as to make the number of ones in the codeword odd.

Parity-Check Codes:
Even parity: if and only if
where denotes addition in GF(2).
Odd parity: if and only if

Parameters:

3. Hamming Codes
Hamming codes of length over are described by parity
check matrices whose columns are the set of distinct nonzero
where no column is a multiple of another. Note that exactly such
columns exist.
Lemma 115 For every there is a Hamming
code over

Binary Hamming Codes:


Let be the set of distinct nonzero binary
Let if and only if
Parameters:

Theorem 116 All Hamming codes are perfect.


Proof: By Theorem 17 and Definition 18, the result follows by showing
that the Hamming Bound is satisfied with equality. In other words, we
show that the spheres of radius 1 centered on the Hamming codewords
cover all vectors in the space in which the code is defined.
Therefore

The extended Hamming codes are formed by first adding a row of


ones to the bottom of the parity check matrix for a Hamming code,

then appending the column vector This construction


results in a parity check matrix that requires that all codewords have
even parity. Given that the minimum distance of the original Hamming
code was three, the construction results in an extended Hamming code
of minimum distance four.
Example 117 (The (7, 4, 3) Hamming Code and its Extension)

Consider a parity check matrix that has as columns all distinct nonzero binary

Clearly any that satisfies must have weight three or more.


It follows that the code is single-error correcting.
Note that this parity check matrix is in systematic form. A generator
matrix for this code is readily obtained.

The code is extended by adding a row of ones and the column vector
resulting in the following matrix.

The additional constraint of even parity increases the length to 8 and the
minimum distance to four.
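The single-error-correcting behavior is easy to demonstrate numerically. The Python sketch below uses one particular (7, 4, 3) Hamming parity check matrix, with the seven distinct nonzero binary 3-tuples as columns in an ordering of our choosing, and shows that the syndrome of a single error reproduces the corresponding column of H.

    # One (7, 4, 3) Hamming parity check matrix: the columns are the seven
    # distinct nonzero binary 3-tuples (this particular ordering is our choice).
    H = [[1, 0, 1, 0, 1, 0, 1],
         [0, 1, 1, 0, 0, 1, 1],
         [0, 0, 0, 1, 1, 1, 1]]

    def syndrome(r):
        """Syndrome s = H r^T over GF(2); s is zero exactly when r is a codeword."""
        return [sum(h * x for h, x in zip(row, r)) % 2 for row in H]

    r = [0, 0, 0, 0, 0, 1, 0]   # the all-zero codeword with an error in position 6
    print(syndrome(r))          # [0, 1, 1]: column 6 of H, locating the error

With this column ordering, the syndrome read with its first entry as the least significant bit is simply the binary expansion of the error position.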

4. Reed-Muller Codes
The codes that are now called Reed-Muller (RM) codes were first de-
scribed by Muller in 1954 using a “Boolean Net Function” language.
That same year Reed [92] recognized that Muller’s codes could be rep-
resented as multinomials over the binary field. The resulting “Reed-
Muller” (RM) codes were an important step beyond the Hamming and

Golay codes of 1949 and 1950 because of their flexibility in correcting


varying numbers of errors per codeword. More recently, Reed-Muller
codes have been re-discovered in the ongoing development of wavelets in
the signal processing community [98].
We begin with Boolean functions. A Boolean function in vari-
ables is a mapping from the vector space of binary
m-tuples into the set {0,1}. Boolean
functions are completely described by an -element truth
table. The first rows of the table form an matrix that con-
tains as columns all binary The bottom row contains the
binary value assigned to each of the by the Boolean function.
The following, for example, is the truth table for the Boolean function
Note that addition is in the binary field.

Let be the vectors in whose coordinates are the ele-


ments of the corresponding rows of the truth table for
Let be the vector associated with the truth table for the
Boolean monomial A few useful results can now be
derived.
Since is binary with length there are distinct Boolean func-
tions in variables. Under coordinate-by-coordinate binary addition of
the associated vectors, the Boolean functions form the vector space
over GF(2).
Let the set consist of all Boolean monomials in binary variables.
Since and represent the same Boolean function, consists of the
Boolean function 1 and the products of all combinations of one or more
variables in the set

Since the Boolean functions in are linearly independent, the vectors


with which they are associated are also linearly independent in over
GF(2). There is thus a unique Boolean function for every vector of
the form
Since there are a total of such vectors, forms a
basis for the vector space of Boolean functions in variables.

The Reed-Muller code of order and length is the set of


vectors associated with all Boolean functions in variables with degree
less than or equal to The vectors associated with the monomials of
degree less than or equal to in form a basis for It follows
that the dimension of is

Example 118 (A First Order RM Code of Length 8)


The monomials in 3 variables of degree 1 or less are Each
of these monomials is associated with a vector as shown below.

The codewords in consist of the 16 distinct linear combinations


of these vectors. Since the four vectors form a basis set for we
can employ them as the rows of a generator matrix.

This generator matrix should look familiar. It is also the parity check
matrix for an (8, 4) extended Hamming code. First order Reed-Muller
codes are the duals of extended Hamming codes.
has length 8, dimension 4, and minimum distance 4. It is thus
single error correcting and double error detecting.
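The generator matrix can be produced directly from the truth-table construction. The following Python sketch builds the vectors associated with the monomials 1, x1, x2, x3 for one particular ordering of the truth-table columns; the ordering is our choice, and a different ordering gives an equivalent code.

    m = 3
    # Truth-table columns: the binary 3-tuples (x1, x2, x3), one per column.
    cols = [((j >> 2) & 1, (j >> 1) & 1, j & 1) for j in range(2 ** m)]

    ones = [1] * 2 ** m                                # the monomial 1
    x = [[c[i] for c in cols] for i in range(m)]       # the monomials x1, x2, x3

    for row in [ones] + x:                             # rows of a generator matrix
        print(row)
    # [1, 1, 1, 1, 1, 1, 1, 1]
    # [0, 0, 0, 0, 1, 1, 1, 1]
    # [0, 0, 1, 1, 0, 0, 1, 1]
    # [0, 1, 0, 1, 0, 1, 0, 1]

The sixteen codewords of the code are the GF(2) linear combinations of these four rows.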
To determine the minimum distance of we first prove the
following lemma.
Lemma 119
where denotes the concatenation of and
Proof: By definition, the codewords in are associated
with Boolean functions of degree For each
function there exists and
such that Since
has degree and has degree the associated vectors and
can be found in and respectively.
Now let and
The associated vectors have the form where

It follows that

Theorem 120 The minimum distance of is


Proof: Proceed by induction on For is the length-2
repetition code with consists of all 2-tuples and
has
Assume that, for fixed the minimum distance of is
Consider and and in Applying Lemma
119,
If then twice the minimum
distance of If then
Note that since the
nonzero elements in and may not completely overlap. It follows that
Since
The result follows.
We conclude this brief overview of Reed-Muller codes by noting that
the dual of a Reed-Muller code is always a Reed-Muller code.
Theorem 121 (Dual Codes of Reed-Muller Codes) For
is the dual code to
Proof: [76]: Consider the codewords and
By definition, a is associated with a polynomial
of degree and is associated with a polynomial
of degree The polynomial product has degree
and is thus associated with a codeword in the parity check code
has even weight, so the inner product mod 2.
is thus contained in the dual space of However,
since must
be the dual code of by Theorem 63.

Binary Reed-Muller Codes:

The binary Reed-Muller code of order and length is the


set of vectors associated with all Boolean functions
of degree less than or equal to

Parameters:

5. Cyclic Codes
We denote the right cyclic shift of as

The right cyclic shift of is denoted by


A block code is cyclic if and by transitivity In
analyzing cyclic codes, we associate the codeword
with a code polynomial
Theorem 122 is a linear cyclic code of length if and only if
the code polynomials in form an ideal
Proof: We first assume that is a linear cyclic code of length
and show that forms an ideal in Consider
and Since is cyclic, mod
The product mod is then associated with a
linear combination of cyclically shifted versions of

Since forms a vector space, must be a valid code polyno-


mial. It follows immediately that the space of code polynomials forms
an ideal.
Now suppose that the space of code polynomials forms an ideal. Since
it follows from the definition of an ideal that if
is a valid code polynomial, then mod is a valid
code polynomial, and the result follows.

Theorem 98 from Chapter 2 translates into the following result for


cyclic codes.

Theorem 123 Let be a linear cyclic code.


1. Within the set of code polynomials in there is a unique monic poly-
nomial with minimal degree is called the generator
polynomial of
2. Every code polynomial in can be expressed uniquely as
where is the generator polynomial of and is
a polynomial of degree less than in
3. The generator polynomial of is a factor of in

Cyclic Codes:
A cyclic code of length is a principal ideal
if and only if
for some
The parameters of cyclic codes depend on the specific type of construc-
tion adopted. The minimum distances of some cyclic codes are unknown.

Shortened cyclic codes (see Definition 108) are frequently referred to


as cyclic redundancy check (CRC) codes or polynomial codes. See, for
example, [116].

6. Quadratic Residue Codes


Definition 124 (Quadratic Residues) A quadratic residue modulo
a prime is a number that is the square of another (possibly the same)
number modulo that prime.
Example 125 (Quadratic Residues)

The quadratic residues modulo 5 are {1, 4}.


The quadratic residues modulo 7 are {1, 2, 4}.
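Quadratic residues are simple to enumerate by squaring each nonzero residue. A Python sketch (the function name is ours) reproducing these examples:

    def quadratic_residues(p):
        """Nonzero quadratic residues modulo a prime p."""
        return sorted({(x * x) % p for x in range(1, p)})

    print(quadratic_residues(5))    # [1, 4]
    print(quadratic_residues(7))    # [1, 2, 4]
    print(quadratic_residues(23))   # [1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18]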
Let be the set of quadratic residues modulo a prime, and the
set of corresponding nonresidues. From the definition of there
must exist at least one primitive element that generates
all of the elements in and must be a quadratic nonresidue,
otherwise there would exist some element with order
1), contradicting Theorem 47 in Chapter 2. It follows that if
and only if is even; otherwise, We conclude that all of the
elements in correspond to the first consecutive powers of
and is a cyclic group under modulo multiplication.

Consider a field that contains a primitive root of unity.


Such a field exists for a given and whenever (see The-
orem 50 in Chapter 2). In the construction that follows, must also
be a quadratic residue modulo This can be somewhat restrictive; for
example, if then must be of the form Since is a
cyclic group and thus closed under multiplication, the conjugates with
respect to of any element in must also be in It follows that
is the union of one or more cyclotomic cosets modulo with respect
to
Let be primitive in The above results and Theorem 86 in
Chapter 2 show that the following polynomials have coefficients in the
subfield

Furthermore
The quadratic residue codes Q, N, of length are defined
by the generator polynomials and
respectively.
Theorem 126 The minimum distance of or satisfies
Furthermore, if is of the form then
Proof: See [76], p. 483.

Quadratic Residue Codes:


Let and be the quadratic residues and nonresidues respec-
tively. Let be primitive in where and The
quadratic residue codes of length are cyclic codes with generator
polynomials and

Parameters: a prime, and

7. Golay Codes
The binary Golay code is the (23, 12, 7) quadratic residue code
with and The quadratic residues modulo 23 are {1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18}.
Let be a primitive 23rd root of unity.

The distinct powers of form two cyclotomic cosets modulo 23 with


respect to GF(2).

It follows that

Depending on the selection of there are two possible generator


polynomials for

Using either of these generator polynomials, the resulting code can be


shown to have the following properties [76]:
Let have even weight. Then
The minimum distance of is 7, and the code is triple-error-
correcting.
Theorem 127 is perfect.
Proof:

and the (23, 12, 7) binary Golay code is perfect.
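The sphere-packing computation in the proof can be checked directly; the following Python lines verify that spheres of radius 3 about the 2^12 codewords exactly cover the space of binary 23-tuples.

    from math import comb

    # Sphere-packing check for the (23, 12, 7) binary Golay code: 2^12 spheres
    # of radius 3 exactly fill the space of 2^23 binary words.
    sphere = sum(comb(23, i) for i in range(4))   # 1 + 23 + 253 + 1771 = 2048
    print(sphere, 2 ** 12 * sphere == 2 ** 23)    # 2048 True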

The extended Golay code is obtained by adding a parity bit to


each codeword of The additional parity bit increases the minimum
distance of the code by 1 (see, for example, [116]).
The ternary Golay code is the quadratic residue code with
and factors into three irreducible polynomials in

Again there are two possible generator polynomials:

has length 11, dimension 6, and minimum distance 5.


Theorem 128 is perfect.
Proof:

and the (11, 6, 5) ternary Golay code is perfect.

We have now described the entire family of perfect codes. The follow-
ing powerful result is due to Tietäväinen [111].
Theorem 129 Any nontrivial perfect code must have the same length
symbol alphabet and cardinality as a Hamming, Golay,
or repetition code.
Proof: See [76] or [111].

8. BCH and Reed-Solomon Codes


BCH and Reed-Solomon codes remain the core of the most powerful
known algebraic codes and have seen widespread application in the past
forty years. The fundamental work on BCH codes was conducted by
two independent research teams that published their results at roughly
the same time. Binary BCH codes were discussed as “a generalization of
Hamming’s work” by A. Hocquenghem in a 1959 paper entitled “Codes
correcteurs d’erreurs” [50]. This was followed in March and September
of 1960 by Bose and Ray-Chaudhuri’s publications on “Error Correct-
ing Binary Group Codes” [21, 22]. Given their simultaneous discovery
of these codes, all three gentlemen have given their name to what are
now called BCH codes. Shortly after these initial publications, Peterson
proved that BCH codes were cyclic and presented a moderately efficient
decoding algorithm [88]. Gorenstein and Zierler then extended BCH
codes to arbitrary fields of size Today BCH codes are used for

a variety of wireless applications, including Intelsat satellite communi-


cations and one- and two-way digital messaging.
Reed-Solomon codes were first described in a June 1960 paper in the
SIAM Journal on Applied Mathematics by Irving Reed and Gustave
Solomon. The paper, “Polynomial Codes over Certain Finite Fields”,
introduced an extremely powerful set of nonbinary codes that are now
known as Reed-Solomon codes. Through the work of Gorenstein and
Zierler it was later discovered that Reed-Solomon codes and BCH codes
are closely related, and that Reed-Solomon codes can be described as
nonbinary BCH codes. Several other approaches to the design and anal-
ysis have since been discovered. The application of Fourier transforms
over finite fields to BCH and Reed-Solomon codes was first discussed
by Gore in 1973 [44]. The transform approach was then pursued by a
number of other authors. Reed-Solomon codes are the most frequently
used error control codes in the world, as they form the heart of the cross-
interleaved protocol used in digital audio discs [52]. They have also seen
extensive use in deep space telecommunications [117].
We begin with the key theorem that distinguishes BCH and Reed-
Solomon codes from their fellow cyclic codes.
Theorem 130 (The BCH Bound) Let be a cyclic code
with generator polynomial Let be the multiplicative order of
modulo is thus the smallest extension field of that
contains a primitive root of unity). Let be a primitive root of
unity. Let be the minimal polynomial of over and set
to the least common multiple of for
some integers and thus has consecutive
powers of as zeros. The code defined by has minimum distance

Proof: The proof follows by first showing that the constraint placed on
the generator polynomial in the premise ensures that all of the square
submatrices of a BCH parity check matrix are Vandermonde. Vander-
monde matrices are nonsingular, thus placing a lower bound on the
minimum distance of the code. The details can be found in [116].
An equivalent version of the BCH bound can be proven using Galois
Field Fourier Transforms.
Theorem 131 Let divide for some positive integer A
with weight that also has consecutive zeros
in its spectrum must be the all-zero vector.
Proof: Let be a vector with exactly nonzero coordinates, these
coordinates being in positions We now define a “locator

polynomial” in the frequency domain whose zeros correspond to


the nonzero coordinates of in the time domain.
The inverse transform of is thus a time domain vector that has
zero coordinates in the exact positions where has nonzero coordinates
and vice versa. The time domain products are thus zero for all in-
dices The transform of these products must also be zero. Theorem 89
then implies that the convolution of the transform of c and the transform
of , is zero as well. Now assume that has weight less than or equal to
while its transform has consecutive zeros. The coordinates
are thus equal to zero for all We also know that
by its definition above ( cannot be a factor). The frequency domain
convolution of and can then be expressed as follows.
By substituting the consecutive zero coordinates of into the last
expression, the recursion shows that is all-zero, and thus its inverse
transform is as well.
Definition 132 (Designed Distance) The parameter in Theo-
rem 130 is the designed distance of the BCH code defined by the gener-
ator polynomial
Definition 133 (Narrow-Sense and Primitive BCH Codes) If
then the BCH code is narrow-sense. If for some
positive integer then the BCH code is primitive. The latter name
follows from the fact that the root of unity is a primitive element
in
Example 134 (Binary BCH Codes of Length 31) Let be a root
of the primitive polynomial and thus a primitive element
in GF(32). Since 31 is of the form the BCH codes constructed
in this example are primitive. We begin by determining the cyclotomic
cosets modulo 31 with respect to GF(2) and the associated minimal poly-
nomials.
If is to be a binary cyclic code, then it must have a generator poly-
nomial that factors into one or more of the above minimal polyno-
mials. If is to be correcting BCH code, then must have as
zeros 2t consecutive powers of
Theorem 135 A primitive BCH code of length and de-
signed distance has dimension
Proof: Note that the minimal polynomial of has degree at most
Since

we have
To show this using matrices, observe that the parity check matrix H
has rows with elements in Convert each element of
H into a column vector of length over and delete those rows
that are linearly dependent, if any. Hence

The BCH Code Design Procedure:

To construct a t-error correcting BCH code of length


1 Find a primitive root of unity in a field where is
minimal.
2 Select consecutive powers of starting with for
some nonnegative integer
3 Let be the least common multiple of the minimal polynomials
for the selected powers of with respect to

Step 1 follows from the design procedure for general cyclic codes.
Steps 2 and 3 ensure, through the BCH Bound, that the minimum dis-
tance of the resulting code equals or exceeds and that the generator
polynomial has the minimal possible degree. Since is a product of
minimal polynomials with respect to must be in
and the corresponding code is thus with
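The redundancy of the resulting code can be computed without constructing the generator polynomial explicitly, since its degree is the number of elements in the union of the cyclotomic cosets containing the selected powers of the primitive root. The Python sketch below (all names are ours) reproduces the dimensions of the narrow-sense binary BCH codes of length 31 for t = 1, 2, 3; constructing the generator polynomial itself would additionally require arithmetic in GF(32).

    def coset_of(s, q, n):
        """Cyclotomic coset of s mod n with respect to GF(q)."""
        coset, x = [], s
        while x not in coset:
            coset.append(x)
            x = (x * q) % n
        return set(coset)

    def bch_dimension(n, q, t, b=1):
        """n minus the number of distinct elements in the cosets of the
        exponents b, b + 1, ..., b + 2t - 1 (the degree of the generator)."""
        roots = set()
        for i in range(b, b + 2 * t):
            roots |= coset_of(i % n, q, n)
        return n - len(roots)

    for t in (1, 2, 3):
        print(t, bch_dimension(31, 2, t))   # (31, 26), (31, 21), (31, 16) codes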

Definition 136 (Reed-Solomon Codes) A Reed-Solomon code is a


BCH code of length
Consider the construction of a correcting Reed-Solomon code of
length The first step is to note that the required primitive
root of unity can be found in Since the code symbols
are to be from the next step is to construct the cyclotomic
cosets modulo with respect to This is a trivial task,
for mod The cyclotomic cosets are singleton sets
of the form and the associated minimal polynomials are of the form

The BCH Bound indicates that consecutive powers of are required


as zeros of the generator polynomial for a Reed-
Solomon code. The generator polynomial is the product of the associ-
ated minimal polynomials:

Theorem 137 (Minimum Distance of Reed-Solomon Codes)

Reed-Solomon codes have minimum distance


Proof: Let be an Reed-Solomon code. The Singleton Bound
places an upper bound of on the minimum distance of all
codes. We now apply the BCH Bound. The degree of the generator
polynomial is so it must contain as roots
consecutive powers of a primitive root of unity. The BCH Bound
provides a lower bound on the minimum distance,
and the proof follows.
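Carrying out the construction of Example 138 below requires arithmetic in GF(64). The same steps can be illustrated over a prime field, where ordinary modular arithmetic suffices; the Python sketch below builds the generator polynomial of a single-error-correcting Reed-Solomon code of length 6 over GF(7), using the primitive element 3 (all names and the choice of field are ours).

    p = 7                        # code symbols drawn from the prime field GF(7)
    alpha, t = 3, 1              # 3 is primitive in GF(7); single error correction

    def poly_mul(a, b):
        """Multiply polynomials over GF(p), lowest-degree coefficient first."""
        out = [0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                out[i + j] = (out[i + j] + ai * bj) % p
        return out

    g = [1]                      # g(x) = (x - alpha)(x - alpha^2)...(x - alpha^(2t))
    for i in range(1, 2 * t + 1):
        g = poly_mul(g, [(-pow(alpha, i, p)) % p, 1])

    print(g)                     # [6, 2, 1], i.e. g(x) = 6 + 2x + x^2

The generator polynomial has degree 2t = 2, so the resulting code is a (6, 4, 3) Reed-Solomon code, in agreement with Theorem 137.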
Example 138 (A (63, 57, 7) 64-ary Reed-Solomon Code) Let
be a root of the primitive polynomial To be triple-error-
correcting, must have six consecutive powers of as zeros.

defines a (63, 57) triple-error-correcting Reed-Solomon code.


The extension of a Reed-Solomon code is the code obtained by
adding a parity check bit to each codeword of If is an code,
then the extension code is a code. To see this, let
be a codeword of weight It suffices
to show that or Since where

is the information polynomial and is the generator polynomial,


since will be a multiple of
which has weight otherwise. since

9. Product Codes

Product Codes:

where is in the form of a two-dimensional array


such that the rows and the columns of the array are the codewords of the
code and the code respectively. Assume is a code,

Parameters:

Lemma 139 If is an code over and is an


code over then is an code
over
Proof: The block and information lengths follow from the definition.
To get the minimum distance of the code, it suffices to consider the
minimum weight of nonzero codewords by linearity of the code. A
codeword will have at least nonzero entries in every nonzero row
and at least nonzero entries in every nonzero column. Hence all
codewords have at least nonzero entries where
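The array construction can be made concrete with the simplest possible component codes. The sketch below (a toy example using two single-parity-check component codes; all names are illustrative) encodes an information array by rows and then by columns, producing a product code of minimum distance 2 · 2 = 4:

def spc_encode(bits):
    """Append an even-parity bit to a list of bits."""
    return bits + [sum(bits) % 2]

def product_encode(info_rows):
    """info_rows: k2 rows of k1 information bits each."""
    rows = [spc_encode(r) for r in info_rows]           # encode each row
    cols = [spc_encode(list(c)) for c in zip(*rows)]    # encode each column
    return [list(r) for r in zip(*cols)]                # transpose back

for row in product_encode([[1, 0, 1], [0, 1, 1]]):
    print(row)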

We establish the condition for a product code of two cyclic codes to be


a cyclic code. A cyclic product code, as the name suggests, has the nice
property of both having algebraic structures of the class of cyclic codes
and of the class of recursive codes. Consider a product code defined by an
by array and identify each element in the array by an ordered pair
where and refers to the row and column, respectively. Define
such that the element in the array identified by the ordered pair
is located in the position in the sequence of symbols
of a codeword of the code. We will need the following result first.
Theorem 140 (Chinese Remainder Theorem) If gcd(m_i, m_j) = 1 for i ≠ j, then the set of congruences x ≡ a_i (mod m_i) has a solution.

Proof: See, for example, [76].
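The theorem also has a simple constructive form that is convenient for the position-ordering argument that follows. The sketch below (illustrative; it requires pairwise relatively prime moduli and uses Python's built-in modular inverse) computes the common solution directly:

from math import gcd

def crt(residues, moduli):
    """Return x with x = r_i (mod m_i) for pairwise relatively prime moduli."""
    assert all(gcd(a, b) == 1 for i, a in enumerate(moduli) for b in moduli[i + 1:])
    M = 1
    for m in moduli:
        M *= m
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)      # pow(Mi, -1, m): inverse of Mi mod m
    return x % M

# The ordered pair (row, column) = (2, 3) maps to position 8 when the
# row index is read mod 3 and the column index mod 5, as in Example 142.
print(crt([2, 3], [3, 5]))   # 8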

Theorem 141 The product of two cyclic codes is a cyclic code if


1 such that
2

Proof: The first condition guarantees that the second condition is well-
defined. That is, if then the Chinese Remainder Theorem
guarantees that are all distinct for and
If then there exists some and
such that
To show that the second condition guarantees that the product code is
cyclic, observe that given a codeword of the product code, a cyclic shift
of it yields a cyclic shift of a row and column of the array. Thus, a cyclic
shift of a codeword of the product code is a codeword of the product code
since the rows of the array are codewords of and the columns of the
array are the codewords of by definitions of and respectively.
Going in the other direction, a cyclic shift of a row or a column of the
array yields a shift of the codeword of the product code for some
since

Example 142 (Cyclic Product Code of Length 15) Here, the position ordering is shown for a 3 by 5 array.
0 6 12 3 9
10 1 7 13 4
5 11 2 8 14
It is clear that a cyclic shift of a position ordering of the product code
results in a cyclic shift of rows and columns. Going in the other direction,
a cyclic shift of any row or column gives a shift of a codeword
of the product code for some
Consider now polynomial representations of codes. That is, represent each vector as a polynomial of degree n − 1.

Theorem 143 If C1 and C2 have generator polynomials g1(x) and g2(x), respectively, then C1 × C2 has the generator polynomial
Proof: Let be the value of the element in row and column of


the array of the product code. Without loss of generality, let
The codewords of take the form

Re-expressing for some we have

since This shows that

A similar argument shows that which yields

On the other hand, since this


shows that

Then, after the usual calculations, we have

More detailed analyses of product codes can be found in [49, 56, 68]
and the references therein.
Chapter 4

CONVOLUTIONAL AND CONCATENATED CODES

In this chapter we describe convolutional and concatenated codes.


Convolutional codes were introduced by Elias in 1955 [36]. He showed
that redundancy can be introduced into a data stream through the use
of a linear shift register acting as a digital tapped delay line. He also
showed that the resulting codes were very good when randomly chosen.
This result was very interesting for it correlated with Shannon’s demon-
stration that there exist randomly selected codes that, on the average,
provide arbitrarily high levels of reliability given data transmission at a
rate less than the channel capacity (see Chapter 1 and [100]). Convolu-
tional codes have been used in many voice applications, such as cellular
telephony, and are a common choice for component codes in parallel
concatenated encoders.
Concatenated codes are a combination of two or more component
codes. Serial concatenated codes were introduced by G. David Forney,
Jr. in his 1965 doctoral thesis (reprinted in expanded form in [37]). In a
serial concatenated system, an “outer” component encoder is first used
to encode the source data. The result of this encoding process is then
encoded yet again using a second, “inner” component encoder. Serial
concatenated codes are generally used on power-limited channels, with
the classical example being that of the deep space channel. We will
briefly consider the former CCSDS standard for deep space telemetry;
the interested reader is referred to [117] for a more detailed discussion.
Parallel concatenated encoders were introduced by Berrou, Glavieux,
and Thitimajshima in their seminal paper on turbo coding [19]. In par-
allel concatenated encoding, two or more component encoders operate
simultaneously on the source data or a permuted image of the source
data. The permutation is important, as we shall see.
1. Convolutional Encoders
Figure 4.1 shows a pair of rate-1/2, linear, nonrecursive convolutional
encoders. The encoders operate by taking blocks of k bits as inputs and generating blocks of n bits at the output. In this particular case, the encoder outputs two bits for every input bit, and is thus said to have rate 1/2. Figure 4.2 shows a rate-2/3 convolutional encoder. In general, an encoder with k inputs and n outputs is said to have rate k/n, even though the delay lines introduce a "fractional rate loss" (see [116],
Chapter 11).
The encoders in Figures 4.1 and 4.2 are nonrecursive in that they do
not employ feedback in the encoding operation. The encoding operation
can be described as a linear combination of the current input and a finite
number of past inputs. The linear combination is generally expressed
in terms of generator sequences for the encoders. A generator sequence
relates a particular input sequence to a particular output
sequence A particular value of denotes the presence or ab-
sence of a tap connecting the memory element of the input shift
register to the output. The generator sequences for the encoders in
Figure 4.1 are and and
and
The output equations for a convolutional encoder have the general
form

The output can be seen to be the sums of the convolutions of the input
sequences with the associated encoder generator sequences. Note that
the operations are addition and multiplication in GF(2).
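The convolution sums translate directly into a few lines of code. The following sketch implements a rate-1/2 nonrecursive encoder; the generator sequences (1, 1, 1) and (1, 0, 1) are chosen only for illustration and are not necessarily those of Figure 4.1.

def conv_encode(bits, g0=(1, 1, 1), g1=(1, 0, 1)):
    """Rate-1/2 nonrecursive convolutional encoder over GF(2)."""
    m = len(g0) - 1                       # encoder memory
    state = [0] * m                       # shift-register contents (past inputs)
    out = []
    for b in bits:
        window = [b] + state              # current input followed by past inputs
        out.append(sum(g * x for g, x in zip(g0, window)) % 2)
        out.append(sum(g * x for g, x in zip(g1, window)) % 2)
        state = [b] + state[:-1]          # shift the register
    return out

print(conv_encode([1, 0, 1, 1]))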
The encoders are linear in that, as can be seen from the above expres-
sion, the encoding operation obeys the superposition principle - linear
combinations of input blocks correspond to linear combinations of the
associated output blocks.
The encoder in Figure 4.1 (a) is systematic, meaning that one of its
outputs is a copy of the source data. This is not the case for the encoder
in Figure 4.1(b), so it is called nonsystematic. The encoders in Figure
4.3 differ from those in Figure 4.1 in that the former employ feedback,
and are thus recursive.
The memory for each of the inputs of any of the above encoders is enumerated by the memory vector (m_1, m_2, ..., m_k), where the i-th input shift register has m_i memory elements. It is assumed that for each i there is at least one generator sequence with a tap on the last memory element. The state complexity of the encoder is determined by the total encoder memory m_total = m_1 + m_2 + ··· + m_k. The number of states in the encoder is 2^(m_total), while the constraint length of the convolutional encoder is K = 1 + max_i m_i.
The most convenient means for relating the output of any convolu-
tional encoder to the input, particularly in the case of a recursive en-
coder, is through the "D transform." The D transform of a temporal sequence x_0, x_1, x_2, ... is the polynomial X(D) = x_0 + x_1 D + x_2 D^2 + ···, where D denotes relative delay. Using this simple tool, the output of a non-recursive encoder can be written in terms of the input by the matrix expression Y(D) = X(D) G(D),
where the polynomial matrix G(D) is often called a generator matrix for the encoder. In the non-recursive case, each term of the generator matrix is a polynomial in D of degree at most the memory of the corresponding input. For the encoders in Figures 4.1(b) and 4.2 we have the following generator matrices.
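The D-transform view can be checked by direct polynomial multiplication over GF(2): the coefficients of u(D)g(D) are exactly the convolution sums written earlier. The sketch below uses the same illustrative generator polynomials as the encoder sketch above.

def gf2_poly_mul(a, b):
    """Multiply two GF(2) polynomials given as coefficient lists (D^0 first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= ai & bj
    return out

u = [1, 0, 1, 1]                       # u(D) = 1 + D^2 + D^3
print(gf2_poly_mul(u, [1, 1, 1]))      # u(D)(1 + D + D^2)
print(gf2_poly_mul(u, [1, 0, 1]))      # u(D)(1 + D^2)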

The elements of the generator matrix for a recursive encoder are ra-
tional functions in D with binary coefficients. For example, the encoders
in Figure 4.3 have the following generator matrices.
A binary convolutional code (BCC) is the set of codewords produced at the output of a binary convolutional encoder (BCE). It should be noted that a given BCC can be
generated by many different BCE’s. For example, Figure 4.1(b) and
Figures 4.3(a) and (b) all generate the same BCC. The difference lies in
the mapping of messages to codewords. The importance of this difference
is dealt with in some detail in the last section of this chapter. A more
complete treatment of this point can be found in Chapter 3 of [48].

2. Analysis of Component Codes


In this section we investigate the weight distributions of individual
and concatenated component codes. The weight distribution of a code
can be expressed in the form of a weight enumerating function (WEF).
The WEF for a code of length n has the form A(X) = Σ_d A_d X^d, where A_d is the number of codewords of weight d. For systematic en-


coders, an input-redundancy weight enumerating function (IRWEF) can
be used to separate codeword weights into information and parity block
weights. The IRWEF has the form A(W, Z) = Σ_{w,z} A_{w,z} W^w Z^z, where A_{w,z} is the number of codewords with an information block of weight w and a parity block of weight z. The overall weight of the codeword is, obviously, w + z, so it follows that the WEF coefficients can be computed from the IRWEF coefficients using the expression A_d = Σ_{w+z=d} A_{w,z}.
In analyzing performance, it is often useful to group the
terms in the IRWEF according to the information block weight. The
conditional IRWEF enumerates the parity block weights for codewords
associated with information blocks of a particular weight
Note that one can obtain IRWEF given its conditional IRWEF and vice
versa.
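The relation between the two enumerators is easily mechanized. The sketch below (the IRWEF used in the example is invented purely for illustration and does not describe any particular code) converts an IRWEF, stored as a map from (information weight, parity weight) pairs to counts, into the corresponding WEF.

from collections import defaultdict

def wef_from_irwef(irwef):
    """irwef maps (w, z) to the coefficient A_{w,z}; returns {d: A_d}."""
    wef = defaultdict(int)
    for (w, z), count in irwef.items():
        wef[w + z] += count               # A_d is the sum of A_{w,z} with w + z = d
    return dict(wef)

toy_irwef = {(1, 4): 1, (2, 3): 2, (3, 4): 1}
print(wef_from_irwef(toy_irwef))          # {5: 3, 7: 1}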

The minimum distance between a distinct pair of valid transmitted


sequences is the free Euclidean distance. If the signal-to-noise ratio (SNR) is high, then the probability of decoder error will be dominated
by those sequences that are closest to the transmitted codeword. It fol-
lows that if is increased, then code performance in the high SNR
region will improve. On the other hand, if the SNR is low, sequences
with weights above will make significant contributions to the de-
coder error probability.
In 1993 Thitimajshima wrote a doctoral thesis [110] in which he compared overall codeword weight sequences and information weight sequences for various nonsystematic and recursive systematic convolutional encoders. To obtain a recursive systematic convolutional (RSC) encoder from a nonsystematic convolutional (NSC) encoder, one simply uses a feedback loop and sets one of the outputs of the encoder to equal the current input bit. For example, if an NSC encoder has generator matrix [g1(D), g2(D)], where g1 and g2 are the generating functions, then its associated RSC code is represented as [1, g2(D)/g1(D)]. In [19], the component code used was a rate-1/2 RSC code.
NSC and RSC codes generate the same set of codewords, and thus
the sequences for an NSC code and its associated RSC code are the
same. However, the mapping of information sequences to code sequences
differs between the two encoders. The difference can be characterized
as follows: RSC encoders exhibit slower growth in the information
weight sequences than do the NSC encoders. It follows that the RSC
encoders provide a more favorable bit error rate at low SNR’s, as the
number of information bits affected by decoder errors is proportional
to the weight of the information blocks associated with low-weight code-
words. At high SNR’s, only the codewords of weight and
have significant impact, and the lower and provided by
NSC codes offer better performance.
As an example, consider the following generator matrices from [48]:

Both encoders generate the same set of output sequences; however, information bits get mapped to different codewords for the two encoders.
To explore this in detail, we introduce the input-output weight enumer-
ating function (IOWEF). In the IOWEF A(W, X), the exponent for the
dummy variable W denotes information sequence weight, while the ex-
ponent for X denotes the codeword weight. Using Mason’s gain rule
(see, for example, Chapter 11 in [116]), it can be shown that the NSC
encoder has the following IOWEF

and the RSC encoder has

Using long division, we can compute the terms for the codeword weights
1 through 10.

and

Note first that both encoders provide d_free = 5, which is the maximum possible for an encoder with memory order 2. It is also clear
that both encoders generate the same number of distinct codewords of
weights 5 through 10. The difference lies in the weights of the informa-
tion sequences associated with the codewords of a given weight. The
nonsystematic encoder associates a weight-1 information sequence with
the minimum-weight codeword. The recursive systematic encoder, on
the other hand, associates a weight-3 information sequence with the
minimum-weight codeword. At high SNR’s, the most likely decoder
error event causes three times as many bit errors with the recursive sys-
tematic encoder as with the nonsystematic encoder. At lower SNR’s,
the balance starts to shift in favor of the recursive systematic encoder.


Observe that the total information weight for weight-8 codewords is the
same for both encoders. For error events associated with higher weight
codewords, the recursive systematic encoder has the lower information
weight and the advantage in bit error rate performance.

3. Concatenated Codes
Concatenated error control systems use two or more component codes
in an effort to provide a high level of error control performance at the
expense of only a moderate level of complexity. A concatenated encoder
consists of two or more component encoders that combine to generate
a long code with good properties. The decoder uses the component
code substructure of the concatenated encoder to realize a multi-stage
implementation that is much less complex than a single-stage approach.
In this chapter we consider both the original, serial form of concatenation
as well as the more recent, parallel form. The former allows for various
forms of iterative decoding that will be discussed briefly here. The latter
was developed in conjunction with turbo iterative decoding, which will
be the subject of a later chapter. The details of the performance of
parallel concatenated codes are presented at the end of this chapter.
Turbo decoding of parallel concatenated codes is described in Chapter
7.

Serial Concatenated Encoders

As exemplified in Figure 4.4, a serial concatenated error control sys-


tem consists of two distinct sets of encoders and decoders. A data block
of length is first encoded by an “outer” code The re-
sulting i s then encoded b y an “inner” code

Decoding is performed in two stages, with a decoding operation


followed by a decoding operation. This two stage decoding operation
is far less complex than the alternative one stage decoder. In 1966 Forney
showed that serial concatenation provides an exponential decrease in
error probability at the cost of an increase in complexity that is a small
power of the block length [37].
In this section we consider the serial concatenated Consultative Com-
mittee for Space Data Systems (CCSDS) error control standard for deep
space telemetry. The CCSDS system serves as a useful design example,
but is also of interest in that it has served as a focus for research into
iterative algorithms (see, for example, [117]).
The Serial Concatenated CCSDS Deep Space Telemetry Stan-


dard

The inner, convolutional code in this system is the rate-1/2, constraint


length-7 BCE with generator sequences
Before its adoption by the CCSDS, this code had been used as one of the
two convolutional codes in the NASA “Planetary Standard” (the other
is a rate-1/3 code [117]). The CCSDS standard encoder is a feedforward
design in which the first symbol is associated with The output is
inverted.
The inner decoder associated with the inner encoder is a Viterbi de-
coder. Viterbi decoders are maximum likelihood sequence decoders. We
will discuss them within the context of algorithms on graphs in Chapter
6. A tutorial on the Viterbi algorithm and its implementation can be
found in Chapter 12 of [116].
The outer code in Figure 4.4 is a (255, 223) Reed-Solomon (RS) outer
code defined over GF(256). The MDS property of the RS code is a
strong factor in the effectiveness of Reed-Solomon codes as outer codes
in serial concatenated systems, but the fact that Reed-Solomon codes
are nonbinary also plays a role. Decoders for nonbinary codes correct
symbols (blocks of consecutive bits) as opposed to individual bits. If
the error process at the output of the inner decoder is bursty, then a
symbol-oriented outer decoder will provide some additional coding gain
by “trapping” consecutive bit errors within a relatively small number of


symbols.
To fully appreciate the impact of the burst trapping capability of
Reed-Solomon codes in the CCSDS standard, we have to consider the
error pattern statistics at the output of the inner Viterbi decoder. Recall
that the constraint length (µ + 1) is the maximum number of output
blocks that can be directly affected by an input bit. The convolutional
code used in the standard is linear, so the error patterns at the output
of the Viterbi decoder must themselves be valid codewords. It follows
that the error patterns will stretch over (µ + 1) or more consecutive
trellis branches, and that the information bit errors will occur in bursts
of comparable length.
Outer decoder performance is improved by placing a symbol inter-
leaver and de-interleaver between the inner and outer encoders and de-
coders, respectively. The interleaver has no impact on the number of bit
errors seen by the outer decoder over a long period of time, but it does
reduce the variance of the error process, distributing the error control
load more evenly among the codewords. Note that if too large an inter-
leaver is used, the impact of the nonbinary Reed-Solomon outer code is
reduced.
This serial concatenated system has been used extensively by both
NASA and the European Space Agency (example missions include Voy-
ager, Giotto, and Galileo). A detailed discussion of the role of the serial
concatenated CCSDS standard can be found in [117].

Parallel Concatenated Encoders


Parallel concatenation was originally introduced by Berrou, Glavieux,
and Thitimajshima[19]. It is one of the two concepts critical to turbo
coding, the other being iterative decoding. Parallel concatenated en-
coders (PCE’s), as with serial, use two or more component encoders in
conjunction with an interleaver to efficiently construct long codes with
nice properties (we will provide a definition for “nice” in the next sec-
tion). Figure 4.5 shows a rate-1/3 PCE.
A block of source information bits m is encoded by a component
encoder to create a parity sequence The message bits are also
interleaved and encoded by to create a second parity sequence
The message bits and two parity streams are then multiplexed to create
a coded sequence In this case the result is a rate-1/3 code. To achieve
higher rates, the coded streams can be punctured.
Code sequences generated through parallel concatenation have been
shown to provide reliability performance within a few tenths of a dB of
the Shannon limit [15, 16, 17, 19, 20, 31, 32, 33, 47, 89]. The reasons
for this excellent performance lie in the weight distributions of the com-
ponent codes and the overall code. This will be investigated in detail in
the next section.

4. Analysis of Parallel Concatenated Codes


A recursive systematic encoder output is said to be a finite codeword
if the output has finite weight. Otherwise, it is an infinite codeword.
A recursive systematic encoder generates finite codewords only under
well-defined conditions. Consider a recursive systematic encoder with generator matrix G(D) = [1, a(D)/b(D)]. An input sequence x(D) is mapped to an output sequence [x(D), x(D)a(D)/b(D)]. The output will be a finite codeword if and only if x(D) is divisible by b(D).
Input sequences of weight 2 are particularly important in parallel
concatenated codes, in part because they are the lowest weight input
sequences associated with finite-weight codewords. A weight-2 input sequence in which the nonzero elements are separated by t bits has the form x(D) = D^j(1 + D^t). The smallest P for which b(D) divides 1 + D^P is said to be the period of b(D). The period of a binary polynomial of degree m is less than or equal to 2^m − 1. A polynomial of degree m is said to be primitive if its period is equal to 2^m − 1. In designing parallel concatenated codes we will be interested in minimizing the number of weight-2 sequences in a finite block of N bits that generate finite codewords.
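The period of a candidate feedback polynomial is easily found by trial division over GF(2). In the sketch below (an illustrative convention: a polynomial is stored as an integer whose bit i stands for D^i), the period is the smallest P for which the polynomial divides 1 + D^P.

def poly_mod(dividend, divisor):
    """Remainder of GF(2) polynomial division; polynomials stored as integers."""
    dlen = divisor.bit_length()
    while dividend.bit_length() >= dlen:
        dividend ^= divisor << (dividend.bit_length() - dlen)
    return dividend

def period(h):
    m = h.bit_length() - 1
    for P in range(1, 2 ** m):
        if poly_mod((1 << P) | 1, h) == 0:     # does h(D) divide 1 + D^P ?
            return P
    return None

print(period(0b111))      # D^2 + D + 1 is primitive: period 3 = 2^2 - 1
print(period(0b10111))    # D^4 + D^2 + D + 1 is not primitive: period 7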
The parallel concatenated code constructed by Berrou et al. [19]
provides excellent performance in the low SNR region1. The generator
in the encoder is not primitive. Using a parallel concatenated

1 Also called the “waterfall region” due to the shape of the bit error rate curve in this region, which falls off steeply.
code in which is primitive, one can improve the code performance


in the mid to high SNR region2 [89]. However, this code gives slightly
worse performance in the low SNR region than the original code. Using
a parallel concatenated code in which one component code has
that is not primitive, and the other component code has one that
is primitive, one obtains code performance that is in between that of a
parallel concatenated code with both component codes of non-primitive
and one with both component codes of primitive [106].
Using the weight enumerator analysis developed in the previous sec-
tion, we will now draw some conclusions as to why parallel concatenated
encoders provide such excellent error control performance.
Let A^{C1}_w(Z) and A^{C2}_w(Z) denote the conditional IRWEFs for the two component encoders in the parallel concatenated code (PCC). To facilitate
the analysis we assume that the interleaver is uniform; i.e., if we repeatedly apply a particular input sequence of length N and weight w to the input of the interleaver, the (N choose w) possible sequences of length N and weight w occur at the output of the interleaver with equal probability (frequency). The PCC with a uniform interleaver thus has a uniform probability of matching a given output sequence indicated as a term in A^{C1}_w(Z) with any given sequence in A^{C2}_w(Z). It follows that the conditional IRWEF for the overall PCC is A^{PCC}_w(Z) = A^{C1}_w(Z) · A^{C2}_w(Z) / (N choose w).

We can now explore the impact of interleaver depth on PCC per-


formance. The bit error rate performance of the PCC over an AWGN
channel can be approximated by [116]

To further explore the structure of Equation (4.2) we need to find a


way to expand We begin by assuming that we are performing

2 Also called the “error floor region,” again due to the shape of the bit error rate curve in this region, which levels off horizontally.
maximum likelihood sequence decoding at the receiver. If we further


assume that the all-zero codeword has been transmitted, then decoder
errors correspond to the selection of nonzero codewords by the decoder.
These nonzero codewords are associated with a pair of nonzero informa-
tion blocks (one an interleaved version of the other) at the inputs to the
two convolutional encoders. We will now attempt to characterize these
information blocks.
For a single convolutional code, a simple error event is defined to
be the nonzero output sequence associated with a path through the
convolutional code graph that starts and stops at the all-zero state,
but does not visit the all-zero state at any intermediate point. The
nonzero simple codewords that are enumerated using Mason’s gain rule
are exactly the simple error events, given the assumption that the all-
zero codeword has been transmitted.
Each simple error event is associated with a sequence of information
bits that contains nonzero values. At this point in the analysis, we need
to determine how many and in what manner such sequences of informa-
tion bits can be arranged within the PCC interleaver. If the interleaver
depth is much larger than the memory order of the component con-
volutional codes, then the lengths of the nonzero information sequences
that are associated with the simple error events of most likely occurrence
will be much less than We can then assume that the information
sequences associated with simple error events can be arranged in ap-
proximately ways in the interleaver. (There are approximately
positions in the interleaver from which to arbitrarily choose starting
points for the information sequences associated with the error events.)
Let

be the m-event enumerating function, where is the number of convo-


lutional code codewords consisting of the concatenation of simple error
events with total information weight and parity weight Note that
“concatenation” does not allow for strings of zeros between the simple
error events. If is set to one, the are simply the coefficients for
the IRWEF derived through Mason’s gain rule. The conditional weight
enumerating function for the convolutional code is then approximated
by

for where is the largest number of distinct simple error


events that can be associated with a information sequence in
either component encoder The conditional


weight enumerating function for a PCC consisting of two convolutional
codes and a uniform interleaver of length can now be expressed as

Using the approximation which is particularly accurate


when is small relative to we get

A further approximation is made by substituting for both


and The summation is replaced by a factor which can be
deleted to offset the effect of the substitution. We also assume that
the two component encoders are identical, though this is not strictly
necessary.

Finally, introducing this into the AWGN performance expression, we


have

where is the minimum information weight for error events in the


component convolutional codes. This expression allows for several gener-
alizations. But given that this is an approximation, we note beforehand
that the generalizations must be supported with empirical evidence.
In non-recursive encoders, the minimum information weight associ-
ated with an error event is The maximum number of distinct
simple error events that we can associate with nonzero information
bits is then It follows that since we
have potential error events to assign to each of positions in
the concatenated string. We then have
Equation (4.4) has an interesting interpretation. All information


blocks with weight are associated with low-weight (often the
lowest weight) codewords. At high SNR’s, these codewords will be the
primary contributors to the bit error rate. When is substituted
into the summand of Equation (4.4), the factor goes to unity. We
conclude that, for non-recursive component encoders, the size of the in-
terleaver has very little or no impact on the bit error rate! We can see
this intuitively by noting that if the information sequence has weight
one, the interleaver is not able to counter a low-weight output from one
encoder with a high-weight output from the other. Both outputs will be
of exactly the same weight (ignoring boundary effects).
If the convolutional codes are recursive, we see a different result.
Weight-one inputs to non-catastrophic recursive encoders leave the en-
coder in a loop of nonzero state sequences, resulting in a high-weight
output. If we are concerned with low-weight outputs, we must first con-
sider weight-2 input sequences, beginning with the following result from
[15].
Theorem 144 The minimum information weight associated with a finite-weight error event is 2 for recursive encoders.
Proof: Let G contain at least one rational element of the form a(D)/b(D), where b(D) has degree greater than zero. Finite weight error events are generated by polynomial multiples of b(D). Every such polynomial b(D) divides 1 + D^P for some P (note that this is equivalent to saying that we can generate any sequence with period P using an LFSR with b(D) as a connection polynomial). b(D) cannot, on the other hand, divide a polynomial of the form D^j, so no weight-1 input can produce a finite codeword.
Hence, for PCC’s with recursive component encoders, we have
2 and Now consider the summand of Equation (4.3).
When is odd, we have for positive integers and
The summand takes the form
When is large, the factor renders the summand negligible for


odd When is even, we have for positive integers and
The summand takes the form

In this case the interleaver factor is relatively large. We continue the


analysis by focusing solely on those cases in which is even. Equation
(4.3) reduces to the following form.

Note that The analysis of is then reduced


to an examination of codewords associated with weight-2 input blocks.
A finite codeword associated with a weight-2 input must have a weight-
2 information block and a parity block with weight where
is the parity weight associated with the graph cycle, is an integer, and
is the parity weight associated with the transient path leading to the
beginning of the first cycle and from the last cycle to the all-zero state.
Let be the weight of the minimum-weight parity sequence gen-
erated by an recursive encoder with a weight-2 input. It follows that
The single-event enumerating function for inputs of
weight 2 is then approximated as follows (it is not an equality unless the
interleaver has infinite depth).

Substituting this expression into Equation (4.5) and setting W = Z =


H, we have
Equation (4.6) leads to several interesting conclusions. We first note


that the performance of a PCC with recursive component encoders varies
proportionally with the size of the interleaver (in some ranges of SNR).
It can also be seen that has a significant impact on performance.
It follows that the component encoders should not only be recursive, but
should also be selected so as to maximize Benedetto and Montorsi
have defined a figure of merit for PCC’s that they call effective free
distance, It is the exponent of the variable H in the numerator
of Equation (4.6), i.e.

is the total weight of the PCC output when a weight-2 informa-


tion sequence generates the minimum weight parity sequences in both
component encoders. The maximization of obviously results in a
maximization of the effective free distance. It should be noted, however,
that the free distance for the PCC (taken over all possible information
block weights) may be substantially less than In other words,
information blocks with weights greater than two may result in par-
ity blocks whose weights are less than It is the function of the
interleaver to minimize the impact of such potential low-weight parity
sequences. The impact of free distance has been explored by Divsalar
and McEliece in [33].
Benedetto and Montorsi [15] conducted a search for the best rate-
1/2 recursive systematic component encoders for various memory orders
Their results are shown in Table 4.1. The search was conducted
assuming an interleaver length but the results do not seem to
change for longer interleavers. The octal notation is a convenient means
for representing long polynomials. For example, consider the first of the
optimal component codes with The octal value is converted
to the binary value (011001). The LSB is taken to be the first nonzero
bit on the left. The feedback polynomial is then
Note that in most cases, the free distance of a PCC based on
a pair of such recursive systematic encoders is much smaller than the
effective free distance. The information weight associated with the
sequences of weight is large, substantially reducing the impact of
these sequences on bit error rate.
Chapter 5

ELEMENTS OF GRAPH THEORY

The graph-theoretic interpretation of error control codes has led to


construction techniques for codes whose performance is extremely close
to the Shannon limit. The next several chapters discuss these construction techniques. In this chapter we consider several basic results that
will be used throughout the rest of the book.
We begin with a basic definition for graphs, and then proceed to a dis-
cussion of the important class of bipartite graphs. A method is presented
for transforming a non-bipartite graph into a bipartite graph through the
use of edge-vertex incidence graphs. The powerful probabilistic tech-
nique of martingales is then introduced as a tool for use in bounding
the deviation of a random variable from its expected value. Using this
technique, many properties of a graph can be bounded exponentially
around their respective expected values.
A definition is then given for an expander graph, a graph in which
each vertex has a large number of neighbors, where a neighbor is a node
to which the first node is directly connected by an edge. The properties
of such graphs will prove useful in that they are natural analogs of codes
in which the value of an information bit is spread across several codeword
bits. Intuitively, such a situation provides a large number of options for
recovering the value of an information bit in the presence of noise.
We derive a lower bound on the expansion of a randomly chosen graph.
This result shows that a randomly chosen graph will be an expander with
high probability. One important consequence of the proof of this result
is that a regular graph is a good expander if and only if the eigenvalue of
the graph with the second largest magnitude is well separated from the
eigenvalue with the largest magnitude. We provide a lower bound on
the second largest eigenvalue of a regular graph and describe an explicit
construction for a regular graph that achieves the lower bound. This
gives the best explicit expander known.

1. Introduction
We begin with a basic definition for a “graph”.
Definition 145 A graph G is an ordered pair (V, E) of a set of vertices
V and a set of edges E. An edge is a pair of distinct vertices from V.
In a simple example below, the graph consists of the vertex set V =
{A, B, C, D, E, F} and the edge set E = {(A, D), (B, D), (C, E), (D, E)}.
This particular graph is disconnected in that paths do not exist between
arbitrary pairs of vertices in the graph. In this case the vertex F is not
connected by an edge to any other vertex. This graph is also undirected
in that there is no directionality associated with the edges. In a directed
graph, the edges are ordered pairs of vertices, with the first vertex being
the originating vertex and the second the terminating vertex. A directed
version of the first graph is shown in Figure 5.2. We will focus on undi-
rected graphs in this chapter, but will return to directed graphs in the
next chapter.

In the remainder of this chapter and the rest of the text we will use
the following terminology. An edge is said to be incident on its end
vertices. The number of edges incident on a vertex is the degree of the
vertex. In the two graphs shown in Fig. 5.3, for example, each vertex
has degree 2. Two vertices connected by an edge are said to be adjacent.
The chromatic number of the graph G, χ(G), is the minimum number of colors required to color the vertices of G such that no adjacent vertices
have the same color. The chromatic number for the graph at the top of
Fig. 5.3, for example, is 3. A graph is regular if all vertices of the graph
are of equal degree. If each vertex of a regular graph is of degree d, we say the graph is d-regular. A graph is irregular if it is not regular.
A bipartite graph is a graph that can be divided


into two disjoint sets of vertices and such that no two vertices in
the same set are adjacent. The graph in Figure 5.3(b), for example, is
bipartite, the two sets of vertices being {A, B, C} and A
bipartite graph is a regular bipartite graph if all the vertices in have
the same degree and all vertices in the other set have the same degree, where the two degrees need not be the same. A bipartite graph is an irregular bipartite
graph if it is a bipartite graph that is not regular. For a given bipartite
graph we refer to as the set of left vertices and
as the set of right vertices. If is a regular bipartite
graph, then we will use to denote the degree of left vertices and
to denote the degree of right vertices, and G will be called a
regular bipartite graph or just graph.
Edge-vertex incidence graphs can be used to obtain bipartite graphs
from graphs that are not bipartite.

Definition 146 Let G be a graph with edge set E and vertex set V.
The edge-vertex incidence graph of G is the bipartite graph with vertex set E ∪ V and edge set {(e, v) : v ∈ V is an endpoint of e ∈ E}.
An example of a graph and its edge-vertex incidence graph is shown


in Figures 5.3 (a) and (b), respectively. In the figures, G = (V, E),V =
{A,B,C}, and the edge-vertex incidence graph

The edge-vertex incidence graph is a (2, 2)-regular


graph.
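Constructing an edge-vertex incidence graph is purely mechanical, as the following sketch shows for the triangle on the vertices A, B, C (an illustrative stand-in for the graph of Figure 5.3(a)); every left node has degree 2 by construction, and here every right node also has degree 2, so the incidence graph is (2, 2)-regular.

from collections import defaultdict

def edge_vertex_incidence(edges):
    """Return (left adjacency, right-node degrees) of the incidence graph."""
    left = {e: list(e) for e in edges}        # each edge meets its two endpoints
    right = defaultdict(int)
    for u, v in edges:
        right[u] += 1
        right[v] += 1
    return left, dict(right)

triangle = [("A", "B"), ("B", "C"), ("A", "C")]
left, right_deg = edge_vertex_incidence(triangle)
print(left)        # every left node (edge of G) has degree 2
print(right_deg)   # every right node (vertex of G) has degree 2 as well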
One can also obtain a bipartite graph from a non-bipartite graph
through the use of a double cover. Specifically, for a graph G with
vertex set V, define a bipartite graph with vertex set such
that and are copies of V and a vertex in and a vertex in are
adjacent only if the corresponding vertices in V are adjacent in G. The resulting bipartite graph is called the double cover of G.
There is a natural relationship between block codes and bipartite
graphs. Let and denote the sets of columns and rows, respec-
tively, of the parity check matrix of an error correcting code. Define a
bipartite graph where if the
element of the parity check matrix at column and row is 1. Con-
versely, from any bipartite graph define the related binary
matrix with columns and rows with element 1 at column and
row of the matrix if and only if In this manner one can
define a block code from a bipartite graph and vice versa. A similar
approach holds for convolutional codes.
Definition 147 The adjacency matrix of a graph G = (V, E) is the |V| × |V| matrix A = (a_{uv}), where a_{uv} = 1 if (u, v) ∈ E and a_{uv} = 0 otherwise.
Clearly A is symmetric and all its eigenvalues are real.
There is an interesting relationship between graphs obtained from par-
ity check matrices and the adjacency matrices for the graphs. Consider
a parity check matrix and a bipartite graph
where a left vertex and a right vertex are adjacent if and only if the corresponding entry of H is 1. The adjacency matrix of G is then the block matrix with H and its transpose in the off-diagonal blocks and zero blocks on the diagonal.

Lemma 148 Let λ be an eigenvalue of A and µ be an eigenvalue of H H^T. Then λ² = µ. Moreover, if e is an eigenvector of H^T H, then He is an eigenvector of H H^T with the same eigenvalue.
Proof: Let where So, and


Since we get So, Now if is an
eigenvalue of then for some Multiplying both sides
by H,
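The lemma is easy to check numerically. In the sketch below (which assumes numpy and uses an arbitrarily chosen toy parity check matrix), the nonzero eigenvalues of the bipartite adjacency matrix occur in plus/minus pairs whose squares are the eigenvalues of H H^T; the remaining eigenvalues of A are zero.

import numpy as np

H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 1, 0, 0, 1]], dtype=float)   # toy parity check matrix

r, n = H.shape
A = np.block([[np.zeros((n, n)), H.T],            # adjacency matrix of the
              [H, np.zeros((r, r))]])             # associated bipartite graph

lam = np.linalg.eigvalsh(A)                       # eigenvalues of A
mu = np.linalg.eigvalsh(H @ H.T)                  # eigenvalues of H H^T

print(sorted(set(np.round(lam ** 2, 6))))         # {0} together with the mu values
print(np.round(np.sort(mu), 6))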
We now consider a definition for expander graphs. There are many
variations of the definition but most of them are essentially the same.

Definition 149 Consider G = (V, E), where the cardinality of V is n. The expansion of a subset X ⊂ V in G is defined to be the ratio |N(X)| / |X|, where N(X) = {y : (x, y) ∈ E(X) for some x ∈ X} is called the set of neighbors of X and E(X) is the set of edges attached to nodes in X. A graph G is an (α, β) expander if every subset X of size at most αn has expansion at least β.

Expander graphs do exist, and are widely used in theoretical computer


science. One can show using probabilistic methods that any randomly
chosen graph will be a good expander. It is rather difficult to give
explicit constructions of expander graphs, however, and expansion by
explicit construction is generally much smaller than that by random
construction.
The best known method for calculating the bounds of expansion is
the spectral method, which involves the calculation of the second largest
eigenvalue in absolute value of the adjacency matrix of the graph. Lower
bounds on the expansion of explicit and random regular graphs have
been calculated in various places; see, for example, [5, 10, 55, 67, 77,
79, 99, 108]. We will focus mostly on bipartite expander graphs for
applications in later chapters; however, we also consider non-bipartite
expander graphs since we can always use the edge-vertex incidence graph
of a non-bipartite graph to obtain a bipartite graph.
A bipartite graph is an expander if any subset X that contains at most a fraction α of all of the left nodes has at least β|X| right node neighbors. We use the notation (α, β) expander to denote a graph with this property.
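For very small graphs the expansion can be verified exhaustively, which is occasionally useful for building intuition. The sketch below (the toy bipartite graph is an assumption made for illustration) tests every subset of left nodes of size at most αn against a target expansion β.

from itertools import combinations

def is_expander(left_adj, alpha, beta):
    """left_adj: dict mapping each left node to its set of right-node neighbors."""
    left = list(left_adj)
    n = len(left)
    for size in range(1, int(alpha * n) + 1):
        for X in combinations(left, size):
            neighbors = set().union(*(left_adj[v] for v in X))
            if len(neighbors) < beta * len(X):
                return False
    return True

graph = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "d"}, 3: {"d", "a"}}
print(is_expander(graph, alpha=0.5, beta=1.5))    # True for this toy graph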

2. Martingales
The term “martingale” has an interesting history - it is a gambling
strategy that is now forbidden in many of the world’s casinos. The basic
idea is that the gambler doubles her wager after every loss. Assuming
that the gambler is sufficiently well-funded, she will eventually win and
recover all of her earlier losses.
In its more general mathematical context, we capture the structure


of the sequence of wagers as a sequence of random variables. Let
be the gambler’s initial stake (presumably quite large). After some
wagers, this stake will have been reduced to The question now
arises as to the expected amount of the stake that will remain after
the next wager. If we assume nothing more than that the game is fair
(the expected gain is balanced by the probability of winning), then the
expected remaining stake in the future is the same as that at present –
we have no other information, so we simply project the current situation
into the future. The expected value of the random variable is simply
the value assumed by the random variable. Now consider a
formal definition.
Definition 150 A sequence of random variables is a mar-
tingale sequence if for all

The mathematical tools associated with martingales are useful in that


they assume very little with regard to the distribution underlying the
random variables in the sequence.
We now identify two important martingale sequences that arise in the
context of random graphs. Random graphs are sets of graphs for which
edges exist between arbitrary pairs of nodes with some fixed probability
Consider a positive integer and The probability space
over the set of graphs of vertices with edge probability is called the
random graph We will usually use the term random graph G
to mean a graph that has been randomly chosen from the probability
space
Let f be any real-valued function on a graph and let G be a random
graph. Now label the possible edges of G sequentially from 1 to
The edge exposure martingale sequence is defined as

where if possible edge is in G and otherwise, for


In other words, to get reveal the first edges to find
if they belong to G. The remaining edges are left unknown and
considered as random. is the conditional expected value of
given only the “exposed” edges up through the possible edge. For
example and
To illustrate, let f be the chromatic number of a graph and let
and is either 2.25 or 1.75 depending on whether the first
edge revealed belonged to G. If the first edge revealed belonged to G,
then and is either 2.5 or 2 by the same argument.


Similarly, if the second edge revealed belonged to G, then
and is either 3 or 2. We leave the proof to the reader that this
and the next sequence are both martingale sequences.
Similarly, label the vertices of the graph G sequentially from 1 to
The vertex exposure martingale sequence is defined as

where if the edge between and is in G and


otherwise, for Observe that by an appropriate ordering,
the vertex exposure martingale is a subsequence of the edge exposure
martingale.
A function f satisfies the edge (vertex) Lipschitz condition if, whenever G and G' differ in only one edge (vertex), |f(G) − f(G')| ≤ 1. If a function satisfies the Lipschitz condition, then more can be said about its martingale sequence.

Lemma 151 If f satisfies the edge (vertex) Lipschitz condition, then its edge (vertex) exposure martingale satisfies |X_{i+1} − X_i| ≤ 1.

The following lemma [10] provides an exponential small-tail bound for


a martingale sequence. It is very useful in that it shows that the tail of
the martingale sequence is exponentially concentrated around the mean
of the random variable.

Lemma 152 (Azuma's Inequality) Let X_0, X_1, ..., X_n be a martingale sequence such that |X_k − X_{k−1}| ≤ 1 for each k. Then for any λ > 0, P(|X_n − X_0| > λ√n) < 2e^(−λ²/2).

Proof: Let and First, observe that

and Define
Since for all

Thus,

and
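The strength of the bound is easy to see empirically. The sketch below compares the observed tail of the simplest bounded-difference martingale, a fair plus/minus one random walk, against the bound 2 exp(−t²/2n), which is the form the inequality takes when t = λ√n.

import math
import random

def empirical_tail(n, t, trials=20000):
    """Estimate P(|X_n - X_0| > t) for a fair +/-1 random walk of n steps."""
    count = 0
    for _ in range(trials):
        walk = sum(random.choice((-1, 1)) for _ in range(n))
        if abs(walk) > t:
            count += 1
    return count / trials

n, t = 100, 25
print(empirical_tail(n, t))                 # observed tail, roughly 0.01
print(2 * math.exp(-t ** 2 / (2 * n)))      # Azuma bound, roughly 0.09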

Lemma 151 and Azuma's Inequality combine to provide a very potent tool for attacking many combinatorial problems. One application
that follows from this combination is the bound on the difference of the
chromatic number and the expected value of the chromatic number of a
graph. Later in the text we will show that these results are essential for
the proofs of several error correction bounds.
Theorem 153 For a random graph G with arbitrary p and any λ > 0, P(|χ(G) − E[χ(G)]| > λ√(n − 1)) < 2e^(−λ²/2).
Proof: Consider the vertex exposure martingale and let


The vertex Lipschitz condition is satisfied here since the differ-
ent vertex can be given a new color. The result follows from Azuma’s
Inequality.

3. Expansion
This section focuses on techniques for bounding the expansion of ran-
dom graphs. For regular graphs, the results are based on the calculation
of the second largest eigenvalue (of the adjacency matrix) of the corre-
sponding graph. Let λ_1 ≥ λ_2 ≥ ··· ≥ λ_n denote the eigenvalues of the graph and let λ denote the second largest eigenvalue in absolute value of G. Furthermore, we shall assume throughout this section that G is a d-regular graph with n vertices, unless otherwise specified. Note that λ = λ_2 if G is not bipartite and λ = λ_1 if G is bipartite. In particular, λ_1 = d and λ_n ≥ −d, with λ_n = −d if and only if G is bipartite.
It was shown by Alon [5] and Tanner [108] that a graph G is a good expander if and only if λ_1 and λ are far apart. Hence in order to find a good expander graph G, it suffices to check λ. The following lower bound for λ was derived in [5, 67].

This leads us to Ramanujan graphs.


Definition 154 A d-regular graph G is a Ramanujan graph if λ ≤ 2√(d − 1).
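As a quick numerical illustration (a sketch assuming numpy; the 3-regular Petersen graph is used as the test case), one can compute the second largest eigenvalue in absolute value and compare it with 2√(d − 1); the Petersen graph satisfies the Ramanujan condition.

import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0),      # outer cycle
         (5, 7), (7, 9), (9, 6), (6, 8), (8, 5),      # inner pentagram
         (0, 5), (1, 6), (2, 7), (3, 8), (4, 9)]      # spokes

A = np.zeros((10, 10))
for u, v in edges:
    A[u, v] = A[v, u] = 1

eig = np.sort(np.linalg.eigvalsh(A))
d = int(A.sum(axis=1)[0])                        # degree: the largest eigenvalue
second = max(abs(eig[-2]), abs(eig[0]))          # second largest in absolute value
print(eig[-1], second, 2 * np.sqrt(d - 1))       # 3.0, 2.0, 2.83: Ramanujan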

Ramanujan graphs are optimal in the sense of obtaining good expan-


sion1. Ramanujan graphs have been shown to exist, and an explicit
construction of a Ramanujan graph was developed independently by
Lubotzky, Phillips and Sarnak [67], and by Margulis [79]. The graphs
are Cayley graphs of the projective general linear group
and can be constructed in polynomial-time. We briefly sketch the result
here for illustration.
Let J be a group. X(J, A) is called the Cayley graph of the group J with respect to the subset A if its vertices are associated with the elements of J, and two vertices g and h are adjacent if and only if g = ha or h = ga for some a ∈ A. Thus if the subset A consists of the generators of J, then the Cayley graph is an |A|-regular graph.
Let and be distinct primes such that mod 4 and mod 4.
Elementary number theory tells us that there are solutions of
the form to and that there are
with and odd, and all even. To each of these
associate a matrix in

where is an integer such that mod Hence we have


matrices in Consider Cayley graphs of the group
with respect to these matrices. These matri-
ces are the generators of the group and hence the Cay-
ley graphs are The number of elements of the group
is

1
Srinivasa Aiyangar Ramanujan (1887 - 1920) made significant contributions to the analytical
theory of numbers. Essentially self-taught, he is considered one of the greatest mathemati-
cians of all time.
If then the graph is not connected because the elements of


S all lie in To avoid this, we consider the Cayley graph
of with respect to the matrices in this case. The
number of elements of the group is
If the graph is bipartite between

and

Since the graph is bipartite between equal size


sets of the vertices. The Cayley graphs just described are Ramanujan
graphs.
We now consider some properties of a graph in relation to its second
largest eigenvalue in absolute value. The next theorem [10] shows that
the number of neighbors of a set of vertices is close to its expected value
of a random graph if is small.
Theorem 155 For G = (V, E), and for any subset where

Proof: Define a function such that for


and for Since is orthogonal to the
eigenvectors of the largest eigenvalue, A, of the adjacency matrix of G.
In other words, we have

where (·, ·) is the usual scalar product. By noting that


and

the theorem is proved.


As a consequence of Theorem 155, we can get a bound on the expansion
of bipartite graphs [45].
Corollary 156 Let be the double cover of G. For every
subset such that we have
Proof: For a graph G and W as in the above theorem, let


where By noting that

we get

from Theorem 155. Taking B to be the double cover of G finishes the


proof.
For graph G = (V, E), if A and B are two subsets of V , let
denote the number of edges between the ordered vertices of A and B.
The two sets A and B need not be disjoint here. Furthermore let
denote the number of edges between two vertices of A.

Theorem 157 For a graph G = (V, E), and two sets of vertices
and where and we have

Proof: First note that

where E( · ) is the expectation function of a random graph with


This inequality can be reformulated into

whose right expression is less than or equal to


by the Cauchy-Schwarz Inequality. Applying Theorem
155 to this expression finishes the proof of the theorem.
By using techniques similar to those that we have seen so far, one
can obtain an upper bound on the average degree between two arbitrary
sets in a bipartite graph [119].

Lemma 158 For a bipartite graph that is a double


cover of G, and two sets and where
and the average degree between the sets and is


upper-bounded by

A randomly chosen bipartite graph will be a good expander with high


probability. The usual technique for proving this fact goes as follows.
First, fix a set of left vertices and calculate the lower bound of the ex-
pected number of its neighbors. Then, reveal the edges adjacent to this
set of left vertices one by one and let be the random variable rep-
resenting the expected number of neighbors given the first edges have
been revealed. By noting that forms a martingale sequence, apply
Azuma’s Inequality to show that the number of neighbors is exponen-
tially close to its expected value. For example, if B is a randomly chosen
bipartite graph between left vertices and right ver-
tices, then the usual technique shows that all sets of left vertices in
B have at least

neighbors for all with exponentially high probability, where


is the entropy function [103]. To generalize on the degree of the
vertices to allow for irregular bipartite graphs, the following standard
lemma gives a bound on expansion with a restriction on the minimum
value on the degree of the left vertices.
Lemma 159 Let B be a bipartite graph with left nodes and right
nodes for some Moreover the minimum left node degree is at
least 5. Then with probability B is an expander for
some and
Proof: Let be the event that left nodes have at most neighbors
where is the average degree of these left nodes. Since there are
ways of choosing the left nodes, there are ways of choosing the
right nodes, and the probability that a given left nodes have a
given right nodes as neighbors is at most We then have
where follows from the inequality and is a constant


that depends on and Choose so that Since minimum
left node degree is at least 5, and

which finishes the proof of the lemma.


Corollary 160 Let B be a bipartite graph with left nodes chosen at
random. Moreover the minimum left node degree is at least 3. Then
there is some such that with probability graph B is
an expander.
The edge-vertex incidence graph of G is a bipartite
graph with left nodes and right nodes. We can use this fact to
lower bound the expansion of a bipartite graph through
the following lemma [6].
Lemma 161 If X is a subset of the vertices of G of size then the
number of edges contained in the subgraph induced by X in G is at most

Proof: For G = (V, E), define and


For such where A is the
adjacency matrix since For a particular we have

where
Now this implies for


Since G is

which completes the proof.


Because the average degree in the subgraph induced by some set is (twice
the number of edges in the subgraph) / (size of the set), the lemma
implies that the average degree in the subgraph induced by X is upper-
bounded by Furthermore, note that this bound coincides
with that given in Lemma 158.
The path-l-vertex incidence graph developed by Ajtai et al. [3] pro-
vides a more general approach to the construction of bipartite graphs
than the edge-vertex incidence graphs. A path-l-vertex incidence graph
of graph G is a bipartite graph such that represents
the paths of length in G and represents the vertices of G. Two
vertices of B, and are adjacent if the vertex in G repre-
sented by in B is in the path of length in G represented by in B.
So an edge-vertex incidence graph is a path-1-vertex incidence graph.
Regarding the expansion of a path-l-incidence graph of G, we have
the following lemma due to Kahale [55].
Lemma 162 If X is a subset of the vertices of G of size then the
number of paths of length contained in the subgraph induced by X in
G is at most

The lemma gives a lower bound on the expansion of a


regular bipartite graph with left nodes and right nodes. Note
that for the case of path-1-vertex incidence graph, Lemma 161 gives a
tighter lower bound.
Chapter 6

ALGORITHMS ON GRAPHS

Graphs are often used to denote interrelationships among elements in


a set. We can use the edges of a graph to denote general relationships
between elements, such as contingency or temporal order. Algorithms
that exploit such relationships can then use the graphs as a basis for
determining algorithmic flow. In this chapter we will consider graphs
that represent contingency in the form of conditional probabilistic re-
lationships between random variables. Probabilistic reasoning can then
be performed by systematically exploiting the conditional relationships
indicated by the graph. In a graph that represents an error control code,
probabilistic reasoning algorithms exploit conditional relationships im-
posed by parity check relations, inferring the most likely transmitted
values associated with each vertex under the constraints imposed by all
adjacent vertices.
We begin this chapter by briefly reviewing several types of graph rep-
resentations. For example, we have already seen that a graph can be
either undirected or directed; we consider the means by which one can
be related to the other. A good tutorial on this topic is [104], which we
followed closely. Next, we present a probabilistic reasoning algorithm,
called the belief propagation algorithm [87], which finds the a posteriori
probability of vertices in a particular class of graph representations. We
then proceed to a more general algorithm for finding the a posteriori
probabilities, the junction tree propagation algorithm, which was devel-
oped in [53, 64] and studied in the context of error correcting codes in
[1, 39]. We refer to the two algorithms as message-passing algorithms
since both are defined by the passing of messages between the vertices
in the graph.
The relationship between these message-passing algorithms and sev-


eral decoding algorithms for error correcting codes are illustrated by
examples. It will be shown that the so-called BCJR algorithm [11] and the Viterbi algorithm [112] are precisely special cases of the message-passing algorithms.
We introduce several simple probabilistic reasoning algorithms and
relate them to several known decoding algorithms. In the chapters that
follow we will use these connections to establish explicit constructions
of iterative decoding algorithms.

1. Probability Models and Bayesian Networks


Probability models and their associated graphs are a common form
of knowledge representation scheme – they denote and quantify an un-
derstanding of the connections between entities in a given set. Consider
a collection of discrete random variables each
with alphabet Random variables have probability mass functions
and conditional and joint probability
distributions amongst themselves.
Throughout the book, we shall use the convention that and
are two different probability mass functions for two different random
variables, in this case, X and Y. Let the function be the
joint probability distribution on X which is said to induce a probability
model on X.
A fundamental probabilistic reasoning problem is created by instan-
tiating subsets of the variables in X and trying to infer the probabilities
of values of other subsets of variables of interest [87, 104]. Instanti-
ated variables are the vertices in the network whose values are known or
observed.
The brute force approach to solving this probabilistic reasoning prob-
lem in its general form involves computational complexity that is an
exponential function of the number of uninstantiated random variables.
Suppose we learn that the variable and would now like to
update the distribution on With nothing to exploit except for the
joint probability distribution we are forced to marginalize
the distribution in a straightforward manner. The complexity of the
computation is an exponential function of the cardinality of the set of
uninstantiated variables If each random variable as-
sumes values in a finite alphabet of cardinality then the complexity
of this operation is
In specific cases, probabilistic independence networks, which we will
refer to as probability graphs, can be used to substantially simplify the
problem. Probability graphs explicitly describe the independencies of
the random variables and serve as a basis for a distributed computational
approach that leads to computationally efficient algorithms for solving
reasoning problems. The computational complexity of these algorithms
is a function of the means by which the independencies are exploited.
We create a probability graph G for a probability model by first es-
tablishing a one-to-one relationship between vertices of the graph and
random variables in the model. Edges are then drawn between vertices
whose associated variables are dependent. In some cases we may wish
to draw edges between vertices associated with independent variables as
well – the absence of an edge thus denotes an independence relationship,
but the presence of an edge does not guarantee dependence. G is said
to be minimal for the model if the deletion of any edge in G implies an
independence relation not present in the model. G is called perfect if an edge is
absent between two vertices in the graph if and only if the two random
variables are conditionally independent in the model. Note that a min-
imal probability graph is not necessarily equal to a perfect probability
graph.
There are two classes of probability graphs: undirected and directed.
The distinction is quite simple – undirected graphs have edges that are
undirected and directed graphs have edges that are directed. If a graph
has a combination of directed and undirected edges, it can be converted
to a purely directed graph by replacing each undirected edge with a pair
of directed edges pointing in opposite directions. Undirected probabil-
ity graphs are called Markov random fields, while directed probability
graphs are called Bayesian networks.
The moral graph of a directed graph is an undirected graph that re-
tains some of the structure of the directed graph.
Definition 163 The moral graph of a directed graph is the graph that
results from adding edges between the nonadjacent parents of each vertex
and replacing all the directed edges by undirected ones.
A directed graph and its moral graph are shown in Figure 6.1. Moral graphs
derive their name from the fact that all sets of parents are "associated"
through connection by an edge. In the following sections we will
show how the moral graph is often a simpler alternative to the original
directed graph in a number of algorithm construction techniques.
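To make the moralization step concrete, the following short sketch (our own illustration, using a hypothetical adjacency-list representation rather than anything defined in the text) builds the moral graph from a directed graph given as a map from each vertex to its list of parents.

```python
from itertools import combinations

def moralize(parents):
    """Moral graph of a directed graph given as {vertex: list of parents}.

    Every directed edge is kept as an undirected edge, and the parents of
    each vertex are pairwise connected ("married") by additional edges.
    """
    edges = set()
    for child, pa in parents.items():
        for p in pa:                       # drop the edge directions
            edges.add(frozenset((p, child)))
        for p, q in combinations(pa, 2):   # connect nonadjacent parents
            edges.add(frozenset((p, q)))
    return edges

# Example in the spirit of Figure 6.2(a): X is a parent of W and Z, which are parents of Y.
print(moralize({"W": ["X"], "Z": ["X"], "Y": ["W", "Z"]}))
```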
The question arises as to whether perfect undirected and directed
graphs exist for an arbitrary probability model. The answer is no. For
example, consider the probability model in which a random variable Y
depends on random variables W and Z which in turn depend on a ran-


dom variable X. The model’s perfect directed graph is shown in Figure
6.2(a); however, there does not exist a perfect undirected probability
graph for this model. The model’s minimal undirected graph is the
moral graph of the graph in Figure 6.2(a), but is not perfect due to the
edge between vertices W and Z. Consider now the probability model in
which random variables X and Y are conditionally independent given
random variables W and Z, and vice versa. The model's perfect undirected
graph is shown in Figure 6.2(b); however, there does not exist a perfect
directed probability graph for this model.

At this point we need to introduce further terminology related to


directed graphs. A directed graph is said to be connected if there exists
at least one undirected path (i.e. a path that ignores the directed nature
of the edges traversed) between any pair of vertices. Otherwise, the
graph is said to be unconnected (see Figure 6.3).
A directed graph is said to be cyclic if there exists a closed, directed
path in the graph. Otherwise, the graph is said to be a directed acyclic
graph, or DAG. Cyclic and acyclic directed graphs are shown in Figure
6.3.
There are two basic types of DAG’s: singly-connected DAG’s and
multiply-connected DAG’s. A DAG is singly-connected if there exists
exactly one undirected path between any pair of vertices. A singly-
connected DAG is also referred to as a tree. Within the class of singly-
connected DAG’s, a network may be either a simple tree or a polytree.
A tree is simple if each vertex has no more than one parent, as shown
in Figure 6.4(b). A polytree is a tree that has vertices with more than
one parent, as illustrated in Figure 6.4(c).
The important distinction to be made here between DAG’s that are
multiply connected and those that are singly connected is that the former
can have loops. A loop is a closed, undirected path in the graph. A tree
cannot have a loop since a loop requires two distinct paths between any
pair of vertices in the loop.
Within a DAG we can relate vertices to one another using familiar
terms. We will use the DAG in Figure 6.4(c) as an example throughout.
A vertex is a parent of another vertex if there is a directed connection
from the former to the latter. Vertices C and D are parents of Vertex
E in Figure 6.4(c). Similarly, a vertex is a child of a given vertex if
there is a directed connection from the latter to the former. Vertex D is
thus a child of vertices A and B. An ancestor of a vertex is any vertex
for which a directed path leads from the former to the latter. Vertices
A, B, C, and D are thus ancestors of vertex E. The complete set of
all ancestors of a given vertex is called the ancestor set of the vertex.
For example, the ancestor set of vertex H is {A, B, C, D, E}. Similarly,
there are descendent vertices and descendent sets. Vertices G and H are
descendants of vertex E, while {E, G, H} is the descendent set of vertex


D.
Figure 6.4(c) is a polytree, as opposed to being a simple tree, because
several of the vertices have more than one parent. A polytree vertex with
more than one parent is often said to be head-to-head in the undirected
path that connects the parents. For example, vertex E is head-to-head
in the shortest undirected path that connects vertices C and D. There
are some interesting causal implications that can be exploited when one
vertex is head-to-head with respect to two or more other vertices (see,
for example, Pearl on “explaining away” potential causes [87]).
We now have sufficient terminology to explore the means by which
Bayesian networks depict conditional dependence and independence be-
tween sets of variables.

Definition 164 (U- and D-Separation)

Suppose X, Y and Z are any disjoint subsets of the vertices in an


undirected probabilistic network. We say X U-separates Y and Z if
all paths between vertices in Y and Z contain at least one vertex in
X.
If the network is directed, we say X D-separates Y and Z if all undi-
rected paths between vertices in Y and Z contain at least one vertex
A such that
A is a head-to-head vertex in the path, and neither A nor its
descendants are in X, or
A is not a head-to-head vertex in the path and A is in X.
Example 165 (D-Separation) In Figure 6.4(c) we see the following.


A and B are D-separated from E and F by D.
A is not D-separated from B by D (D is head-to-head in the only
connecting undirected path between A and B).
A, B and C are D-separated from G and H by E.
E is D-separated from F by the null set.
The ancestor set of E is D-separated from the descendent set of E by
E.
D- and U-separation in a graph are formally related to conditional
independence in the probability model associated with the graph by the
following theorem from Pearl [87].

Theorem 166 (Separation Theorem) If G is an undirected (or di-


rected) probability graph for the probability model then X U- (or D-)
separating Y and Z in G implies Y and Z are conditionally independent
given X in

The last of the examples in Example 165 is particularly important.


In any polytree the ancestor set of a vertex is D-separated from the
descendent set by the vertex itself. This is exploited in the next section
when we develop the concept of belief propagation.
In the next section, we first consider a probabilistic reasoning algo-
rithm, polynomial-time bounded in the number of vertices in the graph,
for singly-connected Bayesian networks. Since this algorithm has rather
restricted applications, we next consider a more general polynomial-time
bounded probabilistic reasoning algorithm for Markov random fields in
the form of junction trees.

2. Belief Propagation Algorithm


Consider a finite set of discrete random variables
and its Bayesian network. The vertices represent the random variables
and the edges represent the causal influences from a parent vertex to
its child vertex, and the probability distributions represent the strength
of the edges. The probability distribution over the sample space in a
Bayesian network is given by

    P(x_1, x_2, ..., x_N) = ∏_i P(x_i | pa(x_i)),

where pa(x_i) is the value of the set of parent vertices of X_i and the
probability P(x_i | pa(x_i)) is given a priori for each i. We assume that if
X_i is a root vertex then P(x_i | pa(x_i)) = P(x_i). Now, instead of
probabilities in the joint probability space, only

probabilities are needed to compute a marginal distribution where


is the set of alphabets of the parent vertices of
In this section we present the belief propagation algorithm – an exact
algorithm invented by Judea Pearl [87] that solves probabilistic reason-
ing problems of singly connected Bayesian networks. By an exact algo-
rithm, we mean an algorithm that finds exact solutions, as opposed to
approximations.
Define the belief of a vertex X in the network to be

    BEL(x) = P(X = x | e),

where e is the evidence, or the set of instantiated variables, that has the
total available information in the network about the random variable X.
Our goal is to find beliefs of vertices in the network in a computationally
efficient manner.
Let e_X^- and e_X^+ be the evidence contained in the subgraph whose root
is X and in the rest of the graph, respectively, and define

    λ(x) = P(e_X^- | x)   and   π(x) = P(x | e_X^+).

We will use the symbol ∝ throughout the book to mean that the left hand side
is equal to the right hand side weighted by a normalizing constant. Since the
network is loop-free, we can put the belief of X in the following factored form:

    BEL(x) ∝ λ(x) π(x).
To calculate we first develop some additional notation. Let


be the set of X’s parents,
be the set of X’s children,
be the evidence in the subgraph on side of the link

be the evidence in the subgraph on side of the link


The notation is illustrated in Figure 6.5.


By the mutual conditional independencies of and for
given X, and similarly of and for given X, we
are able to re-express and as follows.

Defining
and
yields

We call the message that is passed from the vertex to X and


the message that is passed from the vertex to X.
Now for generality, consider and the set
that is the union of disjoint sets and Then can
be re-expressed as follows.

Consider now Since is the union of dis-


joint sets and becomes
Note that it suffices for the vertex X to pass to all its children,
instead of passing to each child of X different – each child can
compute by dividing the received by the value of the message
that it previously passed to X. That is, by exploiting the relation
we are able to get a computational improvement in
specifically,

The belief propagation algorithm is based on the fact that, given a


probability graph that forms a DAG, the vertices and need only
pass the messages and respectively, to the vertex X
for the vertex X to have sufficient information to calculate the belief
of X according to Equation (6.2). The process is recursive in that the
messages and can be calculated from the messages
and received from their neighbors. In short, it suffices for all the
vertices in the network to pass messages to their neighbors to support the
calculation of the beliefs of all the vertices in the network. This message-
passing is also called belief propagation or the propagation of evidence.
It is clear from the above expressions that the messages and
are independent of each other, as one would expect from in
a loop-free graph.
We summarize the belief propagation algorithm as follows.

Belief Propagation Algorithm


Given a loop-free Bayesian network:
Boundary Conditions
The root vertices are given the prior for
The non-instantiated leaf vertices are given for all
The instantiated vertices are given when is
observed, where equals 1 if and equals 0 otherwise.
Iterate
If X received messages from all other neighbors, send to
where
If X received messages from all other neighbors, send to


where

Repeat the above two steps until no new message is calculated.


Conclusion: BEL(x) ∝ λ(x) π(x) for every vertex X in the network.
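To make the message-passing schedule concrete, here is a minimal sketch of the algorithm specialized to a chain X_1 → X_2 → ... → X_n, a simple tree in which every vertex has exactly one parent. It is our own illustration (the variable names, list-based representation and restriction to a chain are assumptions, not the book's code); the general algorithm also handles vertices with several parents and children.

```python
def chain_beliefs(prior, trans, evid):
    """Beliefs BEL(x_i) for a chain X_1 -> ... -> X_n.

    prior: list, P(x_1) over the alphabet {0, ..., k-1}
    trans: trans[i][a][b] = P(X_{i+2} = b | X_{i+1} = a) for i = 0, ..., n-2
    evid:  evid[i] = observed value of X_{i+1}, or None if uninstantiated
    """
    n, k = len(evid), len(prior)
    # Local evidence: a delta for instantiated vertices, all ones otherwise.
    local = [[1.0] * k if e is None else [1.0 if v == e else 0.0 for v in range(k)]
             for e in evid]
    # Upward pass: lambda(x_i) collects the evidence at and below vertex i.
    lam = [row[:] for row in local]
    for i in range(n - 2, -1, -1):
        msg = [sum(trans[i][a][b] * lam[i + 1][b] for b in range(k)) for a in range(k)]
        lam[i] = [local[i][a] * msg[a] for a in range(k)]
    # Downward pass: pi(x_i) collects the evidence strictly above vertex i.
    pi = [[0.0] * k for _ in range(n)]
    pi[0] = prior[:]
    for i in range(1, n):
        pi[i] = [sum(trans[i - 1][a][b] * pi[i - 1][a] * local[i - 1][a]
                     for a in range(k)) for b in range(k)]
    # Beliefs: BEL(x_i) is proportional to lambda(x_i) * pi(x_i).
    bel = []
    for i in range(n):
        unnorm = [lam[i][v] * pi[i][v] for v in range(k)]
        z = sum(unnorm) or 1.0
        bel.append([u / z for u in unnorm])
    return bel

# Binary chain of length 3 with the last vertex observed to be 1.
flip = [[0.9, 0.1], [0.2, 0.8]]
print(chain_beliefs([0.5, 0.5], [flip, flip], [None, None, 1]))
```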

3. Junction Tree Propagation Algorithm


In this section we consider a class of undirected probability graphs
called junction trees. We present a probabilistic reasoning algorithm
called the junction tree propagation algorithm. This algorithm general-
izes the belief propagation algorithm, which should not be altogether
surprising since junction trees are a generalization of Bayesian networks.
Definition 167 A clique in an undirected graph G is a subgraph of G
such that there exist edges between all pairs of vertices in the subgraph,
and the subgraph is not a proper subgraph of a graph with the same
property.

Definition 168 A junction tree of G is a tree of cliques of G such that


if a vertex belongs to any two cliques, then it also belongs to all the
cliques in the path between them.
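Definition 168 can be checked mechanically. The sketch below (ours; the dictionary-based representation is an assumption) verifies the running-intersection property: for every vertex, the cliques that contain it must form a connected subtree of the candidate tree.

```python
def is_junction_tree(cliques, tree_edges):
    """cliques: {name: set of vertices}; tree_edges: undirected edges between clique names."""
    adj = {c: set() for c in cliques}
    for a, b in tree_edges:
        adj[a].add(b)
        adj[b].add(a)
    for v in set().union(*cliques.values()):
        containing = {c for c, members in cliques.items() if v in members}
        start = next(iter(containing))
        seen, stack = {start}, [start]          # search only within `containing`
        while stack:
            for nbr in adj[stack.pop()]:
                if nbr in containing and nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        if seen != containing:                  # cliques holding v are not connected
            return False
    return True
```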

It can be shown that any undirected or directed probabilistic inde-


pendence graph can be transformed into a junction tree by adding edges
as necessary. A clique graph is a graph whose vertices are the cliques of
the associated graph.
The probability distribution over a sample space
in an undirected probability graph is generally given by
for a set of cliques and for some pos-
itive potentials defined on the set of cliques. The
probability distribution over a sample space in a junction tree is given
by
where and Now,


the joint probability space has been reduced to

where and are the alphabets of and respectively.


Unfortunately, the problem of finding junction trees with the smallest
maximal cliques is NP-hard. To check whether an undirected graph has
an associated junction tree, we first define the following.
Definition 169 A chord is a link between two vertices in a loop that is
not a part of the loop.
Definition 170 An undirected graph is triangulated if every loop of
length greater than three has at least one chord.
Theorem 171 An undirected graph has a junction tree if and only if it
is triangulated.
For a proof, see, for example, Pearl [87]. To obtain a junction tree,
it suffices to add edges to an untriangulated graph, transforming it into
a triangulated graph. Building on this result, we can now state the
algorithm for constructing junction trees given any graph.

Construction of Junction Trees


If the graph is directed, then moralize it to obtain an undirected graph.
Do the following to an undirected graph:
Triangulate the graph.
Obtain a chain of cliques of the triangulated graph. Denote the chain
of cliques by
Choose from a clique with the largest number of
common vertices and add the edge to the set of edges of the
junction tree for each
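For the triangulation step of the construction above, one common approach (sketched below under the assumption that the caller supplies a vertex elimination order; as noted later, finding an order that keeps the largest clique small is NP-hard) is to eliminate vertices one at a time and add the fill-in edges that connect the remaining neighbors of each eliminated vertex.

```python
def triangulate(adj, order):
    """adj: {vertex: set of undirected neighbors}; order: an elimination ordering."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}        # triangulated graph (grows)
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}  # working copy
    for v in order:
        nbrs = list(remaining[v])
        for i in range(len(nbrs)):                         # connect v's remaining
            for j in range(i + 1, len(nbrs)):              # neighbors pairwise
                a, b = nbrs[i], nbrs[j]
                if b not in adj[a]:
                    adj[a].add(b); adj[b].add(a)           # fill-in (chord) edge
                    remaining[a].add(b); remaining[b].add(a)
        for u in nbrs:                                     # eliminate v
            remaining[u].discard(v)
        del remaining[v]
    return adj
```

The set consisting of v together with its remaining neighbors at the moment v is eliminated is a clique of the triangulated graph, which supplies candidate cliques for the chain used in the next step.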

Figure 6.6 shows an example of the construction of junction tree. Fig-


ure 6.6(a) is the directed graph that we want to convert to a junction
tree. Figure 6.6(b) is the moralized graph of Figure 6.6(a), and Fig-
ure 6.6(c) is the triangulated graph of Figure 6.6(b). Figure 6.6(d) is
the junction tree obtained from the chain of cliques of the triangulated
graph in Figure 6.6(c). The joint probability distribution function of the


junction tree in Figure 6.6(d) is

The junction tree propagation algorithm solves problems with com-


putational complexity on the order of only the sum of cardinalities of
alphabets. This algorithm is very general in the sense that all the exact
algorithms for a probabilistic reasoning problem are special cases of it.
We are interested in the a posteriori probability of a random variable
in the graph as before. To calculate where e is the evidence about
we will be interested in the a posteriori probability of a clique that
contains in the graph from which we can obtain by marginal-
izing. Clearly, the clique should be the smallest size clique that contains
For two neighboring cliques and define

and suppose has K neighboring cliques and let be the set of cliques
in the subtree containing when dropping the edge Let
be the set of the vertices in the subtree containing when dropping the
edge That is, is the union of cliques in The notation


is illustrated in Figure 6.7.

Defining the message sent from clique to the neighboring clique


as we have

where and similarly, defining the message sent from the


neighboring clique to the clique as we have

Computation of or can be simplified if we consider a


decomposition of as

The message sent from clique to the neighbor clique now can
be reformulated as
and similarly for


In order to calculate the joint probability distribution function of a
clique, say it will be convenient to decompose the set as

where (a) follows from the fact that Completing the


derivation, we have

We now summarize the junction tree propagation algorithm.

Junction Tree Propagation Algorithm


Given a junction tree: Iterate
If received the messages from all other neighbors, send to
where

Repeat the above step until no new message is calculated.


Conclusion
where is chosen to have the smallest size clique that contains and

4. Message Passing and Error Control Decoding


We now relate the message passing algorithms of the previous sections
to decoding algorithms for error correcting codes. We will show that the
generic use of semirings as a defining algebraic structure illustrates the
structural identity of different types of decoding algorithms.

Example 1: MAP and ML Decodings of a Simple Block Code


Consider the (4,1) block code defined by the following parity check matrix H:

        [ 1 0 0 1 ]
    H = [ 0 1 0 1 ]
        [ 0 0 1 1 ]

We can represent the code by the graph shown in Figure 6.8. The
vertices labeled 1, … , 4 are the bits of the codeword and the vertices
(1,4), (2,4) and (3,4) are the constraints on the bits of the codeword.
The modulo-2 sum of values of neighbors of a constraint vertex should
be 0. In the graph in Figure 6.8, we left out the evidence vertices each
of which is connected to a vertex in the graph. The graph is loop-free,
so we can apply the belief propagation algorithm.
Semirings were defined in Definition 36 of Chapter 2. We repeat the


first example of a semiring here, and then show how it’s structure can
be used to interpret the decoding of the simple block code.
Semiring Example 1 : The set of nonnegative
real numbers with the operations + being the sum that has the identity
element 0, and the operation · being the product that has the identity
element 1.
Now to the code. For and we have

Then,

and

Finally, for and we have


where represents the modulo-2 sum. The belief of each bit vertex is
then

for and which is the symbol-by-symbol a


posteriori probability decoding. For example,

as expected.
Now consider the second of the semiring examples, repeated below.
Semiring Example 2 : The set of nonnegative
real numbers with the operation + being the maximum that has the
identity and the operation · being the sum that has the
identity element 0.
Using the metric log instead of yields
which is the maximum likelihood decoding of the parity check code. Belief


propagation yields both MAP and ML decoding for the block code,
depending on the definition of the underlying semiring.
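The two semirings can be compared directly on this small code. The sketch below is our illustration; the brute-force sum over codewords replaces the message-passing schedule, but on this loop-free graph it produces the same quantities: the symbol-by-symbol a posteriori probabilities under Semiring Example 1 and the maximum likelihood metric under Semiring Example 2.

```python
import itertools, math

# All length-4 words satisfying the checks x1+x4 = x2+x4 = x3+x4 = 0 (mod 2): 0000 and 1111.
codewords = [c for c in itertools.product([0, 1], repeat=4)
             if (c[0] ^ c[3]) == 0 and (c[1] ^ c[3]) == 0 and (c[2] ^ c[3]) == 0]

def app(p, i, b):
    """Semiring Example 1 (sum, product): unnormalized P(x_i = b | r),
    where p[j][v] = P(r_j | x_j = v) are the channel likelihoods."""
    return sum(math.prod(p[j][c[j]] for j in range(4))
               for c in codewords if c[i] == b)

def ml(p, i, b):
    """Semiring Example 2 (max, sum of logs): metric of the best codeword with x_i = b."""
    return max(sum(math.log(p[j][c[j]]) for j in range(4))
               for c in codewords if c[i] == b)

p = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]        # a fairly clean received word
print(app(p, 0, 0) / (app(p, 0, 0) + app(p, 0, 1)))         # P(x_1 = 0 | r)
```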

Example 2: MAP and ML Decodings of a Convolutional Code


Consider the convolutional code represented by the graph shown in
Figure 6.9(a). The input bits are represented by that cause
state transitions in the convolutional encoder from to The
channel outputs are represented by that depend on
respectively.

Since the directed probability graph shown in Figure 6.9(a) has loops,
we will not be able to directly apply the belief propagation algorithm.
We will instead use the junction tree propagation algorithm. To use the
algorithm, it is necessary to convert the directed probability graph into
a junction tree. By deriving the moral graph shown in Figure 6.9(b),
we obtain the desired junction tree of the directed probability graph
as shown in Figure 6.9(c). Applying the junction tree propagation al-


gorithm to the graph in Figure 6.9(c) using the semiring of Semiring
Example 1, we obtain

and

Now the a posteriori probability of, say, vertex is

where (a) follows from a series of substitutions into the equations above.
This application of the junction tree algorithm results in the symbol-by-
symbol a posteriori probability decoding of the code. As before, doing
the algorithm in the semiring of Semiring Example 2 and using log
metric instead of gives us maximum likelihood decoding.

Example 3: MAP and ML Decodings of Block and Convolu-


tional Codes Revisited Consider the probability graph of a trellis
representation of a code shown in Figure 6.10(a). The vertices
represent the states of the trellis and the vertices represent
the outputs of the state transitions in the trellis. As before, to apply
the junction tree propagation algorithm, we need to convert the graph
in Figure 6.10(a) to a junction tree. The moral graph of the graph in
Figure 6.10(a) is shown in Figure 6.10(b) and finally, the junction tree
of the graph is shown in Figure 6.10(c).

Simply applying the junction tree propagation algorithm in the semiring


of Semiring Example 1, we get

and
Note that for corresponds to the for


corresponds to the and for corresponds to
the in the BCJR algorithm. Now

which give the symbol-by-symbol a posteriori probability decoding. By


applying the junction tree propagation algorithm in the semiring of
Semiring Example 2, we get maximum likelihood decoding. Forney [38]
and Wiberg [115] also illustrated the relation between algorithms on
graphs and decoding algorithms of error correcting codes.

5. Message Passing in Loops


We now consider message passing in loopy graphs. It was implicit in
the previous section that message-passing algorithms in a loopy graph are
not well-defined. Intuitively, loops in a graph indicate a loss of mutual
independence among the messages that are passed in the algorithms. A
message that a vertex passed to its neighboring vertex may eventually
return to the originating vertex, which will consider the message as new
information. This “double counting” of a single source of evidence can
go on indefinitely and the a posteriori probability estimate of the vertex
will converge, if at all, to a wrong value with nonzero probability. We
now develop this intuitive perspective to get a better understanding of
message passing in loopy graphs.
There are two issues to be considered. The first issue concerns the
girth of the graph and the number of rounds the message-passing algo-
rithm is executed in the graph. The girth of a graph G is the length
of the shortest loop in G. If G has no loops, the girth is defined to be
infinity. The number of rounds is defined to be the number of times
that every vertex in the graph has both passed and received a message. The
second issue concerns the impact of multiple counting of a single source
of evidence on the performance of the message-passing algorithm. We
will use an equivalent tree for the loopy graph to develop this point. An
equivalent tree is a tree for which the message-passing algorithm yields
the same a posteriori probability estimate as in the original loopy graph.
To build an equivalent tree for calculating the a posteriori probability of
a given vertex, we assign the vertex to be the root of the equivalent tree.
The neighbors of the vertex in the loopy graph are then the neighbors
of the vertex in the equivalent tree, giving us a tree of depth 1. The


neighbors of the neighbors of the vertex in the loopy graph (excepting
the original vertex) are the neighbors of the neighbors of the vertex in
the equivalent tree, giving us a tree of depth 2. Repeating this process
indefinitely, we get an equivalent tree of depth for the loopy graph.
Normally we settle for an equivalent tree of depth for some finite To
illustrate an equivalent tree, consider Figure 6.11(a) which is a complete
graph with 3 vertices. An equivalent tree for calculating the a posteriori
probability of vertex 1 is shown in Figure 6.11(b).
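The unrolling just described is easy to mechanize. The sketch below (ours; the nested-tuple representation is an assumption) expands the equivalent tree of a loopy graph to a chosen depth by re-expanding, at every vertex, all neighbors except the one just visited.

```python
def equivalent_tree(adj, root, depth):
    """adj: {vertex: list of neighbors}.  Returns nested (vertex, children) tuples."""
    def expand(v, parent, d):
        if d == 0:
            return (v, [])
        return (v, [expand(u, v, d - 1) for u in adj[v] if u != parent])
    return expand(root, None, depth)

# The complete graph on three vertices of Figure 6.11(a), rooted at vertex 1:
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
print(equivalent_tree(adj, 1, 3))
```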

Applying the message-passing algorithm from bottom to top in an


equivalent tree yields the same a posteriori probability estimate as in
the loopy graph. To see this, it suffices to just compare the paths from
bottom to top in an equivalent tree and the loops in the loopy graph.
Hence, the message a vertex receives after rounds of message passing
in a loopy graph that is bipartite is the same as the message the vertex
receives in the equivalent tree of depth with as the root.
Looking at the equivalent tree of depth of a loopy graph that is
bipartite for some constant clearly all the messages in a message-
passing algorithm in this tree are independent if and only if the girth
of the loopy graph is greater than In other words, if the girth of the
graph is and the number of rounds the message-passing algorithm is
executed in the graph is less than then each single source of evidence
is counted at most once. For example, consider the loopy graph shown
in Figure 6.12(a), and its equivalent tree of depth 3 for calculating the a
posteriori probability of vertex 3 shown in Figure 6.12(b). All messages
in Figure 6.12(b) are independent, as can be easily verified.
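Since the number of useful rounds is tied to the girth, a small utility for computing it may help; the following breadth-first sketch (ours) returns the length of the shortest loop in an undirected graph given as adjacency lists.

```python
from collections import deque

def girth(adj):
    best = float("inf")
    for src in adj:                                  # BFS from every vertex
        dist, parent = {src: 0}, {src: None}
        q = deque([src])
        while q:
            v = q.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    parent[u] = v
                    q.append(u)
                elif u != parent[v]:                 # a non-tree edge closes a loop
                    best = min(best, dist[v] + dist[u] + 1)
    return best                                      # infinity for a loop-free graph

print(girth({1: [2, 3], 2: [1, 3], 3: [1, 2]}))      # the triangle has girth 3
```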

One may wonder whether the a posteriori probability of a vertex cal-


culated from a message-passing algorithm even converges. Recent re-
sults of Yedidia et al. show that message-passing algorithms converge
to points that are stationary points of the Bethe free energy [118]. In
particular, [114] shows that at convergence in an arbitrary graph, the
maximum a posteriori probability of assignments to vertices obtained
from a message-passing algorithm is greater than that obtained from
all other assignments to a large subset of the vertices. In this chapter,
we shall show a result of Weiss [113]: in a single-loop graph
of vertex alphabet size two, the a posteriori probability estimate from
the message-passing algorithm at convergence and the true a posteriori
probability of a vertex are either both greater than or equal
to 0.5 or both less than or equal to 0.5.
Consider a loopy graph that has a single loop. An example of such
a graph is shown in Figure 6.13 where the graph has a part that forms a
single loop and the other part that forms a tree. Focus on the part that
forms a single loop for now and, for generality, consider the single loop
graph of vertices, such that the vertices and are the
neighbors of the vertex Each vertex has an evidence vertex with
value as its neighbor for Now, without loss of generality,
consider vertex and calculate its true a posteriori probability and its
value calculated from the message-passing algorithm in the loopy graph.

Expressing in terms of a product of matrices, we have

where is a transition matrix with element as


and is a diagonal matrix with elements and
is the vector that is 0 in every coordinate except for a 1 in the
coordinate. In fact, if

then where is the trace of


Now define to be the value of the vertex calculated from the
message-passing algorithm at convergence in the loopy graph. Vertex
sends to vertex the message at time and


the vertex sends to vertex the message
at time where Since the messages that the
vertices and send to the vertex reduce to

the messages at convergence are in the direction of the principal eigen-


vector of and respectively. Defining and
gives us

We can express where P is the eigenvector matrix and


is the eigenvalue matrix. It easily follows that
since and substituting this into Equation (6.3),
we get

In fact, in Equation (6.4) can be replaced with equality since

Reformulating

Consider now the case of alphabet size two. It follows immediately


from Equation (6.5) that
or

Since we have if and only


if which gives our result. Reformulating
Equation (6.6), we get

We can now calculate the true a posteriori probability of a vertex from


the a posteriori probability calculated from a message-passing algorithm
at convergence. Going back to Figure 6.13, consider the entire graph,
both the part that forms a loop and the part that forms a tree. It is clear
by now that in a loopy graph of one loop and vertex alphabet size two,
we can calculate the true a posteriori probabilities of vertices. Using the
argument developed above, we may consider the messages that come
from the part of the graph that forms a tree as messages from evidence
vertices and calculate the true a posteriori probabilities of vertices that
form the loop from Equation (6.7) and then send messages back to the
vertices that form the tree and calculate their a posteriori probabilities
using a message-passing algorithm.
Chapter 7

TURBO DECODING

1. Turbo Decoding
Turbo error control was introduced in 1993 by Berrou, Glavieux, and
Thitimajshima [19]. The encoding and decoding techniques that fall
under this rubric were fundamentally novel to Coding Theory, and are
now recognized to provide performance that is close to the theoretical
limit determined by Shannon in 1948 [100].
There are two key concepts that continue to underlie turbo decod-
ing: symbol-by-symbol MAP decoding of each of the component codes
and information exchange between the respective decoders. This is best
exemplified through reference to Figure 7.1. is the sequence of infor-
mation bits that is encoded as by the first component code and whose
interleaved version is encoded as by the second component code.
To begin turbo decoding, symbol-by-symbol MAP decoding of the


first component code from the noise-corrupted values of
is performed. Symbol-by-symbol MAP decoding calculates
which can be expressed as

where the sum is taken over all possible values of for all The
next step is to use the last factor in the expression of

as the a priori value of the information symbol for symbol-by-symbol


MAP decoding of the second component code from the noise-corrupted
values of That is, we compute which
equals

As before, the next step is to use the last factor in the expression of

as the a priori value of the information symbol for symbol-by-symbol


MAP decoding of the first component code. This process is iterated
until some stopping condition is achieved.
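The overall flow of this exchange can be sketched as follows (our own schematic in the log-likelihood-ratio domain, not the book's notation; `siso_decode`, `interleave` and `deinterleave` are assumed placeholders, with `siso_decode` standing for a soft-in/soft-out symbol-by-symbol MAP decoder of one component code, such as a forward/backward pass).

```python
def turbo_decode_serial(sys_llr, par1_llr, par2_llr,
                        interleave, deinterleave, siso_decode, iterations=10):
    n = len(sys_llr)
    apriori = [0.0] * n                   # decoder 1 starts with no a priori information
    ext1 = [0.0] * n
    for _ in range(iterations):
        # Decoder 1: natural-order systematic bits and the first parity stream.
        ext1 = siso_decode(sys_llr, par1_llr, apriori)
        # Decoder 2: interleaved systematic bits and the second parity stream,
        # using decoder 1's extrinsic output as its a priori information.
        ext2 = siso_decode(interleave(sys_llr), par2_llr, interleave(ext1))
        apriori = deinterleave(ext2)      # fed back to decoder 1 on the next pass
    # Final decision combines the systematic term with both extrinsic terms.
    return [s + e1 + e2 for s, e1, e2 in zip(sys_llr, ext1, apriori)]
```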
Berrou, Glavieux and Thitimajshima called the three factors
and for a priori,
systematic and extrinsic information, respectively. Figure 7.2 shows a
picture of the described process where symbol-by-symbol MAP decoding
of the first and second component codes are performed in “Decoder 1”
and “Decoder 2,” respectively. The extrinsic information is represented
by the letter “E” in Figure 7.2.
In essence, turbo decoding exploits the component code substructure


of the parallel concatenated code and performs MAP decoding of each
component code and exchanges information between the two component
decoders to approximate MAP decoding of the parallel concatenated
code which computes

This avoids the complexity of the latter, which requires exponential
computational effort for nontrivial interleaving.
Translating the description of turbo decoding to the language of graphs,
consider the Bayesian network representation of a parallel concatenated
code shown in Figure 7.3.
We know that the belief propagation algorithm is a very efficient
means for calculating the a posteriori probabilities of the uninstanti-
ated nodes in a singly connected Bayesian network. However, the above
network is multiply connected. One heuristic way to approximate the a
posteriori probabilities of the uninstantiated nodes is to apply the belief
propagation algorithm to the multiply connected Bayesian network by
splitting the network into two singly connected Bayesian networks for
which the belief propagation algorithm is well-defined. In other words,
the algorithm is applied iteratively to each singly connected Bayesian
network, depicted within the solid line, one after another separately.
This process is precisely turbo decoding.
To explicitly illustrate turbo decoding, we show the relevant expres-
sions using the notation of the previous chapter. Without confusion, we
shall use and to refer to the nodes representing the in-


formation bit, codeword from the first component code, codeword from
the second component code, and the noise-corrupted information bit,
respectively, and so forth. The and are initial-
ized with value equal to 1, and the root and the instantiated nodes are
given and respectively. The
nodes and will send messages and respec-
tively, to the nodes which will then send back messages
and to the nodes and respectively. Then the nodes
and will send messages and respectively,
to the nodes which will then send back messages and
to the nodes and respectively. This process is re-
peated.
Because the two singly connected Bayesian networks, “Bayesian Net-
work 1” and “Bayesian Network 2,” that form the multiply connected
Bayesian network of Figure 7.3 are activated in a serial mode, we will de-
note the messages and beliefs with an extra argument S as in serial, e.g.
becomes becomes
etc. We will in addition use the superscript to denote the update
of the base letter, where is a nonnegative integer. Because the root and
the instantiated nodes are constants from the boundary conditions, we
suppress the associated superscripts. Then after the usual calculations,
the belief propagation algorithm reduces to the following for
Simplifying the above decoding we obtain the following algorithm.

Turbo Decoding:
where the initial condition is

Associating each factor in with the terms defined as in


[19], correspond respec-
tively to the systematic, a priori and extrinsic information. Combining the
above equations, we are able to get an explicit description of the double
counting of a single source of evidence. By the presence of interleaving
between the two component codes in a parallel concatenated code,
represented by the factors above, multiple counting of a
single source of evidence is decorrelated to some extent in the described
application of the belief propagation algorithm to the multiply connected
Bayesian network, or turbo decoding. Our next step is to generalize the
algorithm described by turbo decoding.

2. Parallel Decoding
A simple parallel mode of turbo decoding that will have positive ex-
tensions for the cases when the parallel concatenated code has more
than two component codes is described in this section. Empirical results
show these algorithms can give better limiting performance than the serial
mode of turbo decoding of parallel concatenated codes. This motivates
directing our attention in turbo decoding towards the parallel modes.
Clearly, the parallelism of the activation of the component decoders re-
quires that these decoding algorithms give estimates of the a posteriori
probabilities that are less biased to one particular component code. For
the sake of comparison with the parallel concatenated code introduced
by Berrou, Glavieux and Thitimajshima [19], we will first study the two
component codes case.
Consider the parallel mode of turbo decoding shown in Figure 7.4.
Because of the simultaneous activation of the component decoders in
Figure 7.4, it is not altogether obvious how to combine the information
from the two decoders such that the estimate of the information symbol
is always better than the estimate given by the serial mode of decoding.
Furthermore, for parallel concatenated codes of more than two compo-


nent codes, because of the simultaneous activation of more than two
component decoders as shown in Figure 7.5, it is not altogether obvious
how to combine the information from the two or more decoders of the
current stage in the decoding process for the information that feeds into
the decoder of the next stage in the process.
To this end we use the fact that, because of the existence of loops in
the representation, the belief propagation algorithm applied to a graph-
ical representation of the parallel concatenated code has some degree of
freedom in the choice of order of activation of nodes. As opposed to
the loopless graphical representation of the parallel concatenated code,
in which the belief propagation algorithm computes the exact a posteri-
ori probability estimate of the uninstantiated nodes independent of the
choice of order of activation of nodes, the estimate of the nodes by the
algorithm applied to loopy graphical representations can vary depending
on the order of activation of nodes.
The decoding algorithm that we establish here will be in the parallel mode.
Roughly speaking, instead of applying the belief propagation algorithm
to the two singly connected Bayesian networks, “Bayesian Network 1”
and “Bayesian Network 2” in Figure 7.3, one by one in the serial mode,
we will now apply the algorithm to the two Bayesian networks simulta-
neously, or in the parallel mode.
The initial conditions and the usual notations are the same as before.
However in this case, since the two singly connected Bayesian networks
are activated in a parallel mode, we will denote the messages and be-
liefs with an extra argument P as in parallel, e.g. becomes
becomes etc.
The belief propagation algorithm applied to the multiply connected Bayesian


network of Figure 7.3 is the following. For all

The procedure described above reduces to the following algorithm.

Parallel Decoding:
It is a trivial exercise to extend the above equations to examples with


three or more component codes. It is intuitively clear that this parallel
mode of decoding compares favorably to the serial mode of decoding.
Classical turbo decoding requires that either one of the two component
codes be activated first and end with the belief estimate in the form of
(systematic) • (a priori) • (extrinsic from either decoder one or two)
as exhibited before. The parallel mode of turbo decoding generalizes this
in the sense that both component codes are activated simultaneously and
ends with the belief estimate in the form of
(systematic) • (extrinsic from decoder one) • (extrinsic from decoder two).
Because the extrinsic information provides better estimates than the a
priori values (otherwise there is no need to iterate) replacing, for exam-
ple, the a priori value in the computation of by the extrinsic
information from decoder one in the equation of that is,
taking the average of the extrinsic values from the two serial modes of
decoding in which one has decoder one being activated first and the
other has decoder two being activated first, makes a bet-
ter approximation than to the true a posteriori value of
for all Through the simulation results later in this section, we show
examples in which parallel mode of turbo decoding gives strictly better
performance than the serial mode for an increasing number of iterations.
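In the same hypothetical LLR notation used earlier, the parallel mode can be sketched as follows (our illustration): both component decoders are activated simultaneously, each using the other decoder's extrinsic output from the previous iteration, and the final belief combines the systematic term with the extrinsic terms from both decoders.

```python
def turbo_decode_parallel(sys_llr, par1_llr, par2_llr,
                          interleave, deinterleave, siso_decode, iterations=10):
    n = len(sys_llr)
    ext1 = [0.0] * n               # extrinsic output of decoder 1 (natural order)
    ext2_deint = [0.0] * n         # extrinsic output of decoder 2, deinterleaved
    for _ in range(iterations):
        new_ext1 = siso_decode(sys_llr, par1_llr, ext2_deint)
        new_ext2 = siso_decode(interleave(sys_llr), par2_llr, interleave(ext1))
        ext1, ext2_deint = new_ext1, deinterleave(new_ext2)   # simultaneous update
    return [s + e1 + e2 for s, e1, e2 in zip(sys_llr, ext1, ext2_deint)]
```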
Our experiments indicate that in the case of parallel concatenated
codes with two component codes, the error correction capability of the
parallel mode did not prove to be significantly superior to that of the
serial mode. The advantage of using parallel mode is exhibited in the
case of parallel concatenated codes with three or more component codes
where parallel mode showed far better asymptotic performance than the
serial mode. Our experiments also indicated that if the code rate is not
too low, then the error correction capability of turbo decoding with a larger
number of component codes proved to be weaker than with a smaller number of
component codes for both modes of decoding.
Consider Figure 7.5 where there are now three component codes.
Since direct application of the belief propagation algorithm to each of the
three associated singly connected Bayesian networks for each component


code gives only either the serial or the parallel mode of turbo decoding,
this shows that we must make some changes to the original belief prop-
agation algorithm in order to construct extensions to the parallel mode
of turbo decoding.

Now, consider the interconnection of the component decoders in Fig-


ure 7.5. By taking various combinations of the black and grey connec-
tions, we can form the basis for three distinct parallel decoding algo-
rithms:
Parallel Mode (P): All interconnections, both black and grey, are
active.

Extended Parallel One (EP1): Only the black (or only the grey)
interconnections are active.

Extended Parallel Two (EP2): All interconnections are active


at odd-numbered iterations, while only the black (or only the grey)
interconnections are active at even-numbered iterations.
The decoding algorithm that we describe here is in the parallel mode
with some constraints. Consider the Bayesian network representation
of a parallel concatenated code of three component codes to which we
will now apply the algorithm. We first show the extended parallel one
mode (EP1) of turbo decoding. To make our decoding algorithm work
according to the black interconnections in Figure 7.5, the messages that
the nodes send to the nodes and will be modified by
choosing

for some This process is iteratively repeated. We shall omit the


equations here since they can easily be derived from earlier description.
Analysis of extended parallel one mode of turbo decoding is beyond
the scope of this book. Our experiments indicate that the error correc-
tion capability of extended parallel one mode of turbo decoding proved
to be slightly better in the medium to high signal-to-noise ratio region and
worse in the low signal-to-noise ratio region than that of the parallel mode of
turbo decoding. By not feeding all the information prescribed by the
belief propagation algorithm as shown in Equation (7.1), extended par-
allel one mode however is weaker in the low signal-to-noise ratio region
than the parallel mode.
Next, we show a slightly modified belief propagation algorithm ac-
cording to the activation of all the interconnections at odd-numbered
times and only the black interconnections at even-numbered times in
Figure 7.5. This construction follows almost identically from the par-
allel and extended parallel one modes of turbo decoding. Experimental
results have indicated that this method performed considerably better
than the extended parallel one mode of turbo decoding for any signal-to-
noise ratio region, and considerably better than the parallel mode of
decoding for signal-to-noise ratios that are not too small.
There are two explanations for such behavior. The first is that at very
low signal-to-noise ratios the largest amount of information is necessary
to obtain good performance, as shown by the superiority of the parallel
mode to the extended parallel one and two modes regardless of the amount
of multiple counting of single sources of evidence, while a smaller, though
not necessarily the smallest, amount of multiple counting of single sources
of evidence is necessary to obtain good performance at other signal-to-noise
ratios, as shown by the superiority of the extended parallel one and two
modes to the parallel mode.
The second explanation is that the extended parallel two mode has
a priori values at the current stage in the decoding from the extrin-
sic values from the previous stage in the decoding that are irregularly
constructed compared to the parallel and extended parallel one modes.
Such a construction of extrinsic values carries a large amount of information
but a small amount of multiple counting of single sources of evidence.
Thus, due to the irregularity of the a priori values, the extended parallel
two mode performs significantly better than the rest of the methods for
any other signal-to-noise ratio.
Experiments show significant performance improvement by the decoding
algorithms presented in this section over classical turbo decoding. It
is therefore necessary to direct our research towards the case of parallel
modes of turbo decoding. Because the performance differential for the
four decoding algorithms discussed for parallel concatenated codes that
had only two component codes was too small, we present some of the
experimental results of the performance of parallel concatenated codes
of three component codes.
In Figures 7.6, 7.7, 7.8 and 7.9, we show the performance of serial,
parallel, extended parallel one and extended parallel two modes of turbo
decoding, respectively. A parallel concatenated code of three
(37,21) component recursive systematic convolutional codes was used.
The interleaver is a 63 by 31 block interleaving with its columns and
rows selected at random. The small numbers next to the curves in the
figures indicate the number of iterations.
In our experiments, we found that while the parallel mode of turbo decoding
performed the best in the very low signal-to-noise ratio region, the extended
parallel two mode of turbo decoding performed significantly better than
the rest of the decoding algorithms in all other regions. The figures indicate
that, clearly, our decoding algorithms perform significantly better than
the classical serial mode of turbo decoding. For parallel concatenated
codes with a larger number of component codes, the performance differential
between our decoding algorithms and the classical decoding algorithm
only increases. As follows from the previous exposition, the three decoding
algorithms have the same computational complexity as the classical turbo
decoding algorithm. An analogue of our parallel decoding algorithms
of parallel concatenated codes was presented in [17] where the compo-
nent codes were concatenated in serial and the iterative decoding was
performed in serial.

3. Notes
The impact of the turbo principle on the research community was
explosive – the best indication being the number of papers or books that
were subsequently published (see, for example, [32, 33, 34, 47, 65, 82,
91, 48]). It is impossible in this short text to mention all who have made
significant contributions to this field, but it is appropriate to note the
works that the authors have relied on the most. Benedetto and Montorsi
gave the first detailed analyses of the performance of turbo error control
[15, 16] and extended the capability of iterative decoding techniques to
serially concatenated codes [17] which also give near capacity-achieving
performance. Perez et al. [89] explored the distance properties of turbo
codes, and showed that the interleaver size of a turbo code should be large
for the code to have a small number of codewords of low weight.
Wiberg et al. [115] described how a general iterative decoding algo-
rithm can be described as a message passing on graphs, and McEliece
et al.’s paper [83] on turbo decoding described turbo decoding as an
instance of the belief propagation algorithm, providing a graphical view of
turbo decoding and allowing generalized decoding algorithms of turbo
codes [57]. In an instance of simultaneous inspiration, Kschischang and
Frey also showed that turbo decoding can be viewed as an application of
the belief propagation algorithm [60, 61] on a multiply-connected graph.
More recently, the probability density of the data estimates developed
during turbo decoding was shown to be well approximated by the gaus-
sian distribution when the input to the decoder is gaussian [35, 97]. This
allows for the calculation of a threshold for turbo decoding so that, at
a noise level below some threshold, the probability of error of turbo de-
coding goes to zero in the length of the codeword; at a noise level above
the same threshold, the probability of error goes to one in the length of
the codeword.
Chapter 8

LOW-DENSITY PARITY-CHECK CODES

Robert Gallager introduced low-density parity-check codes in 1961, in


his doctoral thesis [40, 41]. Low-density parity-check codes are a class
of linear block codes in which the parity-check matrices are sparse or of
low density. They can be treated as a bipartite graph B
whose adjacency matrix is of the form

    [ 0    H^T ]
    [ H    0   ]

where H is the parity-check matrix of the code. In the graph B, X


represents the set of codeword nodes and C represents the set of con-
straint nodes. The “constraint” imposed by the constraint nodes is that
the neighbors of each constraint node must sum to 0 modulo 2. The
parity-check matrix of the code is sparse because the degree of nodes in
the graph is fixed while the number of nodes in the graph is increased.
Stated in another way, the parity-check matrix has a constant number of
one’s in each column and in each row while the size of the parity-check
matrix increases. The encoding of a low-density parity-check code takes
a number of operations that is quadratic in the number of message bits.
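A minimal sketch of the bipartite-graph view (our illustration, reusing the small code of Chapter 6 rather than a genuinely low-density example): the code is stored as the list of variable-node neighbors of each constraint node, and a word is a codeword exactly when the neighbors of every constraint node sum to 0 modulo 2.

```python
checks = [[0, 3], [1, 3], [2, 3]]          # neighbors of each constraint node (0-indexed bits)

def is_codeword(word, checks):
    return all(sum(word[i] for i in nbrs) % 2 == 0 for nbrs in checks)

print(is_codeword([1, 1, 1, 1], checks))   # True
print(is_codeword([1, 0, 1, 1], checks))   # False
```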

1. Basic Properties
Low-density parity-check codes defined over regular bipartite graphs
are called regular low-density parity-check codes, with the obvious analogs
for irregular graphs. We will refer to them as regular and irregular codes,
respectively. Throughout the chapter, we shall use the notation
and for the codeword node, constraint node, received word
node, message from a codeword node to a constraint node and message
from a constraint node to a codeword node, respectively. By abuse of
notation, we shall use node and bit (value taken by the node) inter-
changeably, and also use variable node and codeword node interchange-
ably. Since we will typically use the left and right sets of nodes in the
bipartite graph as the variable and constraint nodes, respectively, we
will refer to variable nodes as left nodes and constraint nodes as right
nodes.
For regular graphs, and will denote the degrees of a variable node
and a constraint node, respectively. For irregular graphs, and will
denote the degrees of a variable node and a constraint node and
and will denote the maximum degrees of a variable bit node and
a constraint node, respectively. Irregular codes, by definition, include
regular codes, and for this reason we shall normally describe codes in
terms of irregular bipartite graphs, unless explicitly stated otherwise.
Definition 172 Let B be a bipartite graph with variable nodes
A low-density parity-check code is

where is an incidence function defined such that for each constraint


the nodes neighboring are is the number of
constraint nodes, and are very small compared to the number of
variable nodes
If and are the average left and right node degrees in B, re-
spectively, then If the bipartite graph is regular, then we
interpret the neighbors of each constraint node as a codeword of a block
length code of even weight codewords. In the rest of the book,
we shall assume that is implicitly defined by the code when B
is a regular bipartite graph, unless stated otherwise. An example of a
bipartite graph B with variable nodes is shown in Fig-
ure 8.1. In the figure, all neighbors of each constraint node must be a
codeword of for the variable bits to be codeword of
Irregular bipartite graphs are represented by the variable bit node de-
gree sequence and by the constraint node degree sequence
where and are the fractions of codeword bit and con-
straint nodes of degree respectively, and some of the and may
be zero. Let be the probability that a randomly chosen edge is adja-
cent to a degree left node, and let be the probability that a randomly
chosen edge is adjacent to a degree right node. Hence, and
and clearly
Define and To get the average
degree of left nodes, count the number of edges of degree left nodes
which must equal Since the number of nodes of degree on the


left side of the graph is the fraction of left nodes of degree is
The constraint of gives

A similar assertion gives the average degree of the right nodes. Hence we have proved


the following lemma.
Lemma 173 The average degree of left nodes and the
average degree of right nodes
From the lemma, we can express and in terms of and

Corollary 174 The coefficient of in and equals the


fraction of nodes of degree on the left side and right side of the graph,
respectively.
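A quick numerical check of Lemma 173 and Corollary 174 (the degree distributions below are our own example, not taken from the text): λ_i and ρ_i give the fraction of edges attached to degree-i left and right nodes, and the node-perspective quantities follow by dividing by the degree and renormalizing.

```python
lam = {2: 0.5, 3: 0.5}      # lambda_i: fraction of edges on degree-i left (variable) nodes
rho = {6: 1.0}              # rho_i:    fraction of edges on degree-i right (constraint) nodes

avg_left = 1.0 / sum(l / i for i, l in lam.items())     # average left node degree
avg_right = 1.0 / sum(r / i for i, r in rho.items())    # average right node degree
left_node_fraction = {i: (l / i) * avg_left for i, l in lam.items()}

print(avg_left, avg_right, left_node_fraction)          # 2.4 6.0 {2: 0.6, 3: 0.4}
```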

We now derive two simple lower bounds on the minimum distance,


of a regular code with parity-check matrix H (originally shown by
Tanner [109]). The bounds are expressed in terms of the eigenvalues of
the adjacency matrix of the graph and indicate that good expansion of a
graph implies good distance property for the associated error correcting
code. Let be the eigenvalues of . such that
and assume that every constraint node is connected to at


least 2 variable nodes with nonzero value. Let the first eigenvector of
the adjacency matrix of the graph be with Let
be a real-valued vector corresponding to a minimum weight
codeword, and let be the projection of onto the eigenspace.
Observe that

Now let be the weight on the parity defined by and by


assumption all are nonzero. Then

where the inequality follows from each being greater than 2. On the
other hand,

Combining Equations (8.1) and (8.2) yields the first lower bound given
by

Note that by fixing and decreasing the numerator grows much


more rapidly than the denominator.
For the second bound, we use a projection vector corresponding to
the constraints instead of the codeword. Since eigenvalues of and
are the same, we will, by the abuse of notation, use as
the eigenvalues of such that Let the first eigenvector
of the adjacency matrix of the graph be with Define
the constraint nodes that are adjacent to any nonzero bit in the nonzero
minimum weight codeword as active constraint nodes. Let be a length-
real-valued vector that has a 1 in every active constraint node and 0
elsewhere, and let be the projection of onto the eigenspace. If


is the number of 1’s in then and as before.
Now let be the weight on the bit defined by Then

The above inequality bears further examination. First, observe that


assigns an integer weight distribution to the nodes where the
weight is either if the value of the bit is 1, or the number of ad-
jacent active constraint nodes if the value of the bit is 0. If is the
average number of adjacent nodes with weight for an
active constraint node, we have

which implies Equation (8.3) since each active constraint node is adja-
cent to a nonzero even number of nodes with value 1.
On the other hand,

Combining Equations (8.3) and (8.4) yields

Noting that and yields the second minimum dis-


tance bound given by

Now, suppose B is a expander graph and is an error


correcting code of block length rate and minimum rel-
ative distance1 For the associated code since each constraint

1
Recall from chapter 1, definition 20, that is defined to be the minimum relative distance
if is the code length and is the minimum distance of the code.
gives linear restrictions, at most

linear restrictions are shared among the variable bits. Hence, the variable
bits have at least independent bits which gives the
lower bound on the rate of the code Suppose there is a codeword
of weight at most in which V is the set of variable bits that are 1.
The expansion property of the bipartite graph tells us that V has at
least

neighbors. Therefore, each constraint has less than edges on average,


or there exists a constraint with less than neighbors. Since the neighbors
of each constraint must be a codeword of all codewords must have
weight at least We have proved the following theorem [103].

Theorem 175 If B is an expander graph and is an


error correcting code of block length rate and minimum
relative distance then has rate at least and
relative minimum distance at least

Consider now the case when is a linear code of rate block length
and minimum relative distance and B is the edge-vertex incidence
graph of a graph G with the eigenvalue with the second
largest magnitude. If the number of vertices of G is then the number
of variables and constraints of are and respectively. Code
rate can be obtained from the degree of the variable node being 2 and
from Theorem 175. Now Lemma 161 of Chapter 5 tells us that any
set of variables will have at least constraints as
neighbors for some constant and since each variable has two neighbors,
the average number of variables per constraint will be

which equals If then a


word of relative weight cannot be a codeword of
In particular, cannot have a nonzero codeword of relative weight
or less. Hence we have proved the following theorem [103].

Theorem 176 If is a linear code of rate block length and


minimum relative distance and B is the edge-vertex incidence graph
of a graph G with second largest eigenvalue then the code
has rate at least and minimum relative distance at least

2. Simple Decoding Algorithms


Using the expansion properties, we develop two decoding algorithms,
the Simple Sequential Decoding Algorithm and the Simple Parallel De-
coding Algorithm, that correct a linear fraction of errors in low-density
parity-check codes. The algorithms and the underlying analyses of this
section are from the results of Sipser and Spielman [103]. To present the
algorithms, it will be convenient to define a constraint to be satisfied
if

Otherwise, define the constraint to be unsatisfied. We say we flip a vari-


able bit when we set the variable bit to 1 if it was 0, and set the variable
bit to 0 if it was 1. As we are considering hard-decision decoding algo-
rithms, we can adopt, without loss of generality, the Binary Symmetric
Channel (BSC) as the channel of interest in this section. Since the codes
of interest are linear, we assume throughout the chapter that the all-zero
codeword is sent.

Simple Sequential Decoding Algorithm:


For all edges do the following in serial:
If there is a variable bit that is in more unsatisfied than satisfied
constraints, then flip the value of that variable bit.

Repeat until no such variable bit remains.
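
A minimal sketch of the Simple Sequential Decoding Algorithm follows, assuming the code is specified by a binary parity-check matrix H with one row per constraint; the data layout, function name, and toy example are ours.

```python
# A sketch of the Simple Sequential Decoding Algorithm for a low-density
# parity-check code with hard-decision input.

def sequential_flip_decode(H, y, max_flips=None):
    """Flip, one at a time, any variable bit that lies in more unsatisfied
    than satisfied constraints; stop when no such bit remains."""
    y = list(y)
    m, n = len(H), len(H[0])
    max_flips = max_flips if max_flips is not None else 10 * n
    for _ in range(max_flips):
        syndrome = [sum(H[c][v] & y[v] for v in range(n)) % 2 for c in range(m)]
        flipped = False
        for v in range(n):
            unsat = sum(1 for c in range(m) if H[c][v] and syndrome[c])
            sat = sum(1 for c in range(m) if H[c][v] and not syndrome[c])
            if unsat > sat:
                y[v] ^= 1          # flip the offending variable bit
                flipped = True
                break              # recompute the syndrome before the next flip
        if not flipped:
            break
    return y

if __name__ == "__main__":
    # toy constraints x0+x1=0, x1+x2=0, x0+x2=0; a single error is corrected
    H = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
    print(sequential_flip_decode(H, [1, 0, 0]))   # -> [0, 0, 0]
```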

Theorem 177 If B is an irregular expander for some fixed


then the Simple Sequential Decoding Algorithm can correct a
number of errors that is a fraction of the length of

As is common in the literature, we will henceforth use the phrase “frac-


tion of errors” to refer to the ratio of the number of errors per codeword
to the length of the codeword. One can deduce from the proof that if the
bipartite graph is regular, say, B is a expander, then the
same analysis will show that the fraction of correctable errors improves
to
Proof: Assume that we are given a word that differs in bits
from a codeword in Let V be the set of codeword nodes that are
corrupt and set such that is the number of edges connected to nodes
in V. If constraints are unsatisfied, while of the satisfied constraints
have neighbors that are corrupt bits, then we have

By definition of satisfied and unsatisfied constraint, we must have

Combining the above two inequalities yields,

indicating that there exists a corrupt bit that is in more unsatisfied than
satisfied constraints. Rephrased, for the algorithm will flip
some variable bit. We finish the proof by showing that for the
algorithm will flip a corrupt variable bit. Hence, assume and
so To deduce a contradiction, observe that it suffices
to show that our algorithm may fail if the algorithm flips variable bits
that are not corrupt and becomes greater than If so, then when
becomes equal to we have from Equation (8.5) which is a
contradiction.

Corollary 178 The Simple Sequential Decoding Algorithm can be im-


plemented to run in linear-time.

Proof: [Sketch] The average left and right node degrees are independent
of the code length, and each flip strictly decreases the number of unsatisfied
constraints, which is at most linear in the code length.
In fact, a weak converse is true also. That is, in order for the Sim-
ple Sequential Decoding Algorithm to correct all errors successfully, the
graph must be an expander. The next theorem [103] proves this for the
case of regular codes.

Theorem 179 Let B be a bipartite graph between


variable bits and constraints, Let be the low-density
parity-check code defined by B. If the Simple Sequential Decoding Algo-
rithm successfully decodes all sets of at most an error in then all
sets of variable bits must have at least

neighbors.

Proof: Observe that if a corrupt variable bit is flipped then the number
of unsatisfied constraint nodes decreases by at least 1 for odd, and
by at least 2 for even. We shall consider these two cases separately.
Case 1: is even. The algorithm decreases the number of unsatisfied
constraint nodes by to correct corrupt variable bits. Thus, all
sets of variable nodes of size have at least neighbors.
Case 2: is odd. The algorithm decreases the number of unsatisfied
constraint nodes by to correct corrupt variable bits. So assume
first that there is no variable node that will decrease the number of
unsatisfied constraint nodes by > 1. Each corrupt variable bit node
has of its edges in satisfied constraint nodes and each satisfied
constraint node may have corrupt neighbors. Hence there must
be satisfied neighbors of the variable bits.
On the other hand, since there must be unsatisfied neighbors of
the variable bits, the variable nodes must have
neighbors.
Now assume that there exists a variable bit such that if the variable
bit is flipped, then the decrease in the number of unsatisfied constraint
nodes is > 1, or Suppose the algorithm flips corrupt variable
bits that decrease the number of unsatisfied constraint nodes by and
corrupt variable bits that decrease the number of unsatisfied
constraint nodes by 1. So any variable nodes have at least

unsatisfied neighbors. But from the previous argument, we already know


that variable nodes have at least

neighbors. Combining Equations (8.6) and (8.7), we obtain

As we would expect, the expansion bound on necessity from Theorem


179 is less than that on sufficiency from Theorem 177 for the successful
decoding of a low-density parity-check code. Now we state the other
simple decoding algorithm.

Simple Parallel Decoding Algorithm:


For all edges do the following in parallel:
Flip all the variable bits that are in more unsatisfied than satisfied
constraints in parallel.

Repeat until no such variable bit remains.
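
A corresponding sketch of the Simple Parallel Decoding Algorithm, under the same assumptions as the sequential sketch above, differs only in that every qualifying variable bit is flipped in the same round using that round's syndrome.

```python
# A sketch of the Simple Parallel Decoding Algorithm: in each round, every
# variable bit with more unsatisfied than satisfied neighboring constraints
# is flipped simultaneously.

def parallel_flip_decode(H, y, max_rounds=100):
    y = list(y)
    m, n = len(H), len(H[0])
    for _ in range(max_rounds):
        syndrome = [sum(H[c][v] & y[v] for v in range(n)) % 2 for c in range(m)]
        to_flip = [v for v in range(n)
                   if sum(1 for c in range(m) if H[c][v] and syndrome[c]) >
                      sum(1 for c in range(m) if H[c][v] and not syndrome[c])]
        if not to_flip:
            break
        for v in to_flip:          # all flips use this round's syndrome
            y[v] ^= 1
    return y
```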

Theorem 180 If B is an irregular expander for some fixed


and minimum left degree greater than then the Simple
Parallel Decoding Algorithm can correct a fraction of the errors
in
Again, one can deduce from the theorem that if the bipartite graph is
regular, then a similar argument shows that the restriction on the minimum
left degree can be removed and the fraction of correctable errors improves
to

Proof: Assume that we are given a word that differs in variable


bits from a codeword in Let
V = {corrupt variable bits that enter a round}
F = {corrupt variable bits that fail to flip after a round}
E = {uncorrupt variable bits that become corrupt after a round}
and let be such that is the number of constraints that are
neighbors of V. That is, and after a decoding
round, the set of corrupt variable bits is To prove the theorem, we

first show, by contradiction, that Suppose that


and consider such that Then
by expansion and by the execution of the algorithm,

Bounding using the above two inequalities yields


which gives a contradiction since
Now,

By expansion of the graph and execution of the algorithm,

Combining Equations (8.8) and (8.9) shows that


or

After rounds, all errors are corrected, which completes


the proof of the theorem.
Corollary 181 The Simple Parallel Decoding Algorithm can be per-
formed by a circuit of size and depth

3. Explicit Construction
In this section, we show an explicit construction of an asymptotically
good code from a low-density parity-check code. The code is due to
Barg and Zémor, and can be constructed in polynomial time and de-
coded in linear time. It is an asymptotically good binary code that does
not use concatenation, in the sense of Forney’s concatenated codes, and
furthermore, it is the first code to be constructed whose error exponent

is comparable to that of concatenated codes. The derivation of the error


exponent is beyond the scope of this book; the interested reader may
refer to [13, 14]. The code construction improves upon the result of
Sipser and Spielman [103] which is the first asymptotically good binary
code that did not use concatenation.
In the following, we use Ramanujan graphs of Cayley type as origi-
nally constructed by Lubotzky, Phillips, and Sarnak [67], and by Mar-
gulis [79]. Explicit construction of such graphs was shown in Chapter
5. The bipartite graph B that we will use for our asymptotically good
code will be the edge-vertex incidence graph of a Ramanujan
graph G. The construction shown in Chapter 5 tells us that about half
of the known constructions of Ramanujan graphs are bipartite. We use
this fact and assume that G is bipartite so that the set of vertices of G
can be partitioned into two sets and Allowing
the vertices in G to be the constraint nodes in B, we have a reg-
ular bipartite graph between left nodes and right nodes. Notice
that every left node has a neighbor in and in The code is
defined by the set of codewords of length in which the neighbors of
each constraint node in are codewords of the block length code
and neighbors of each constraint node in are codewords of the block
length code For convenience, we will assume that and are
the same code of rate minimum distance and relative minimum
distance
The rate and relative minimum distance of the code follow from
Theorem 176. To get the best code in terms of rate and minimum
distance, we pick and that meet the Gilbert-Varshamov bound of
The decoding of the code is executed by iterating the
following Decoding Round.

Decoding Round:
Do the following in serial:

For each perform maximum likelihood decoding for its neigh-


bors in parallel. (Step One)

For each perform maximum likelihood decoding for its neigh-


bors in parallel. (Step Two)

This algorithm exploits the fact that G is bipartite, facilitating itera-


tive decoding between the two codes and In fact, this decoding
scheme reflects roughly the essence of turbo decoding, in which iterative
decoding of component codes is executed in serial to decode the whole
code. Specifically, in this code there is a set of codes and
a set of codes that are iteratively decoded in serial, where the
codes in each set are maximum likelihood decoded in parallel. In a
parallel concatenated convolutional code, say the code by Berrou et al.,
there are two components codes, each of which is iteratively maximum a
posteriori decoded in serial. If G is not bipartite, then the best currently
known decoding algorithm [103] corrects a fraction of errors that is 12
times worse than that correctable by the above algorithm.
To analyze the performance of this code with this decoding algorithm,
assume that the all-zero codeword has been sent and label the left nodes by
1 if the associated bits are in error. Let V be the nodes with label 1,
the nodes with label 1 after Step One in the Decoding Round, and
the nodes with label 1 after Step Two in the Decoding Round.
Furthermore, let M and N be the set of neighboring right nodes of
and the set of neighboring right nodes of respectively. Then if
M then neighbors of have at least nodes in common with V which
implies that If we assume that for
then

The upper bound of is approximately for large from the


property of Ramanujan graphs. Similarly, if then neighbors of
have at least nodes in common with Now by Lemma 158 of
Chapter 5, we have

which together with Equation (8.10) simplifies to

From the property of Ramanujan graphs, we can assume and


applying this to the above equation, we get By induction, it-
erating the Decoding Round for times reduces the number
of errors to zero. Hence we have proved the following theorem [119].
Theorem 182 There exists a family of polynomial-time constructible
codes of rate and relative minimum distance for all

that can correct any fraction of errors with


circuit of size and depth for all

Notice that in the theorem is defined for To get an


improvement in the range of assume that each left node represents
not a single variable bit but variable bits. The code is defined by
the set of codewords of length such that, for each constraint node
in the neighbors must be a codeword of a length binary code
or a length code and for each constraint node in
the neighbors must be a codeword of a length binary code or a
length code As before, we shall assume that and are
the same code of rate minimum distance and relative minimum
distance if treated as a binary code and of rate minimum distance
and relative minimum distance if treated as a code. For
analysis of this code, assume the all-zero codeword has been sent and label
the nodes by 1 if at least 1 bit in the node is in error. Observe that all
the previous argument remains essentially unchanged, and in particular
we get and We will use the best codes
and in terms of rate and minimum distance, as usual.
For codes, the Gilbert-Varshamov bound (Theorem 22) be-
comes arbitrarily close to the Singleton bound (Theorem 25) of
1 for large and Hence we know that for large and there exist
codes such that and Applying Lemma
161 from Chapter 5, we find that the number of right nodes, D, that
have a neighbor labeled 1 is at least

Notice that the number of bits among the left nodes with neighbors in
the above set is at least Hence the minimum relative distance of
this code is approximated by

which goes to for large, since for Ramanujan


graphs. We have proved the following theorem, originally developed by
Barg and Zémor [13].

Theorem 183 There exists a family of polynomial-time constructible


codes of rate and relative minimum distance

for all that can correct any fraction


of errors.

The theorem implies that the relative minimum distance of this code
is defined for a range 5 times that of the previous con-
struction. The decoding algorithm shown in this section can also be
simulated in linear time under the logarithmic cost model. To do this,
the Decoding Round must be modified as follows:

Decoding Round':
For each perform maximum likelihood decoding for its
neighbors in serial.

The decoding is executed by iterating Decoding Round'. Since the


neighbors of each or are distinct from each other, the analysis
of error correction of codes and follows directly from the previous
results. During each Decoding Round', we shall keep a list of pointers
to each constraint whose neighbors do not form a codeword. Clearly,
these are the only constraints that need to be decoded and decoding
complexity is linear in the total size of those lists. Since the number
of constraints whose neighbors do not form codewords decreases by a
constant factor for each Decoding Round', the sum of the number of
such constraints is a sum of a geometric series. Since the largest term in
the geometric series is linear in the length of the code, we have proved
the following.
Theorem 184 There exists a family of polynomial-time constructible
codes of rate and relative minimum distance
for all that can correct any fraction
of errors in linear time.

4. Gallager’s Decoding Algorithms


In his doctoral thesis Gallager introduced two low complexity decod-
ing algorithms for his low-density parity-check codes. He then provided
analyses of the algorithms’ performance for a special class of low-density
parity-check codes – high girth codes. The first of his two algorithms
is a hard decision decoding algorithm that is similar to the belief prop-
agation algorithm when applied to the bipartite graph of a low-density
parity-check code. The hard decision decoding algorithm has two forms

that we will call Gallager’s Algorithms 1 and 2. The second decoding


algorithm is a soft decision decoding algorithm that is precisely the be-
lief propagation algorithm. This will be described in detail in the next
section.
For ease of calculation of certain properties of low-density parity-check
codes, we consider an ensemble of codes that is defined by the number
of ones in each column and in each row and the size of the parity-check
matrix of the codes. An ensemble of codes allows one to calculate various
properties, such as the decoder error probability, for a class of codes that
would otherwise be very difficult to carry out for a particular code. By
a randomly chosen code, we shall mean a code that is a representative
of the ensemble in the sense that the code satisfies the properties of the
ensemble, and similarly for the associated graph.
We first derive the lower bounds on the fraction of correctable errors
for randomly chosen regular graphs that do not necessarily have high
girth. Empirical results show that Gallager’s Algorithms 1 and 2 can
correct a larger number of errors in randomly chosen irregular bipartite
graphs than in regular bipartite graphs. From an analysis of randomly
chosen regular graphs using techniques for random processes, we derive
lower bounds on the fraction of correctable errors for randomly chosen
irregular graphs. Analyses will show that, for randomly chosen graphs,
while Gallager’s Algorithms can provably correct regular codes
up to only a fraction 0.0517 of errors asymptotically, they can provably
correct the same rate irregular codes up to a fraction 0.0627 of errors
asymptotically. This key result – that the error correcting capability of
randomly chosen irregular low-density parity-check codes is better than
that for their regular counterparts using Gallager’s Algorithms 1 and 2
– will lead to the best currently known error correcting performance, as
we will see in the next section.
In Gallager’s Algorithms 1 and 2, messages are passed along the edges
of the bipartite graph between variable nodes and constraint nodes.
Messages along the edges from variable bit nodes to constraint nodes
are passed first and then the messages along the edges from constraint
nodes to variable nodes are passed next. This pair of message passing
operations will constitute a single decoding round. The messages will
represent the estimates of the variable bits. To begin this section, we
state Gallager’s Algorithms 1 and 2.

Gallager’s Algorithm 1:

Iterate the following two steps: For all edges do the following in
parallel:

If this is the zeroth round, then set to


If this is a subsequent round, then set as follows:
If all the check nodes of excluding sent the same value to
in the previous round, set to this value.
Else set to
In either case, sends to
For all edges do the following in parallel:
Set the message to the exclusive-or of the values received in
this decoding round from its adjacent nodes excluding
sends to

Gallager’s Algorithm 2:
Iterate the following two steps: For all edges do the following in
parallel:
If this is the zeroth round, then set to
If this is a subsequent round, then set as follows:

If at least check nodes of excluding sent the same value


to in the previous round, set to this value.
Else set to
In either case, sends to
For all edges do the following in parallel:
Set the message to the exclusive-or of the values received in
this decoding round from its adjacent nodes excluding
sends to
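
The two algorithms can be captured in a single message-passing sketch. The adjacency-list representation, the single threshold parameter (all other checks must agree for Algorithm 1, at least a given number of them for Algorithm 2), and the final majority vote are our assumptions about one possible implementation, not the book's pseudocode.

```python
# A sketch of Gallager's hard-decision message passing (Algorithms 1 and 2).
# checks_of[v] lists the constraints adjacent to variable v, vars_of[c] the
# variables adjacent to constraint c, and r is the received hard-decision
# word.  threshold=None reproduces Algorithm 1; an integer reproduces
# Algorithm 2.  Assumes rounds >= 1.

def gallager_hard_decision(checks_of, vars_of, r, rounds=20, threshold=None):
    n = len(checks_of)
    v2c = {(v, c): r[v] for v in range(n) for c in checks_of[v]}   # zeroth round
    for _ in range(rounds):
        # constraint-to-variable pass: XOR of the other incoming variable messages
        c2v = {(c, v): sum(v2c[(u, c)] for u in vars_of[c] if u != v) % 2
               for c in range(len(vars_of)) for v in vars_of[c]}
        # variable-to-check pass: keep the received bit unless enough of the
        # other checks agree on its complement
        for v in range(n):
            for c in checks_of[v]:
                others = [c2v[(d, v)] for d in checks_of[v] if d != c]
                need = len(others) if threshold is None else threshold
                if others and others.count(1 - r[v]) >= need:
                    v2c[(v, c)] = 1 - r[v]
                else:
                    v2c[(v, c)] = r[v]
    # final estimate by majority vote of the received bit and incoming messages
    est = []
    for v in range(n):
        votes = [c2v[(c, v)] for c in checks_of[v]] + [r[v]]
        est.append(1 if 2 * sum(votes) > len(votes) else 0)
    return est
```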

The next lemma provides efficient computation for the message passed
from a constraint node to a variable node during decoding.

Lemma 185 Consider nodes whose values are binary and taken in-
dependently, where each node has the value 0 with probability The
probability that an even number of nodes have value 0 is

Proof: Consider the probability that an even number of nodes have


value 1. We start with two similar functions and
The coefficients of in the first function are the probabilities
that there are 1’s. By adding the two functions, all even powers of
are doubled and the odd powers of are canceled out. Therefore, letting
and dividing by 2 yields the probability that there are an even
number of 1’s. Since

finishes the proof of the lemma.
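
The closed form in the lemma is easy to check numerically. The brute-force comparison below assumes the standard statement in which each of m independent bits equals the counted value with probability p, so that the probability of an even count is (1 + (1 − 2p)^m)/2.

```python
from itertools import product

# Numerical check of Lemma 185: among m independent bits, each equal to the
# counted value with probability p, the probability that an even number of
# them take that value is (1 + (1 - 2*p)**m) / 2.

def prob_even_bruteforce(m, p):
    total = 0.0
    for bits in product([0, 1], repeat=m):
        k = sum(bits)                      # number of bits taking the counted value
        if k % 2 == 0:
            total += p ** k * (1 - p) ** (m - k)
    return total

m, p = 5, 0.3
print(prob_even_bruteforce(m, p), (1 + (1 - 2 * p) ** m) / 2)   # both ~0.50512
```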


For both Gallager’s Algorithms 1 and 2, the probability that
is not the actual transmitted codeword bit will be interpreted as the
probability of decoding error. Gallager’s Algorithms 1 and 2 both only
require computation that is bounded by the degrees of the variable bit
and constraint nodes. Hence Gallager’s Algorithms 1 and 2 are both
linear-time bounded per decoding round.
Consider a randomly chosen regular bipartite graph for which we now
derive the lower bounds on the fraction of correctable errors. One must
be careful when applying a probabilistic decoding algorithm to graphs
with cycles. Results in Chapter 6 tell us that while the neighborhood
of a node is well modeled by a tree for some fixed number of decoding
rounds, the same does not hold true for a large number of decoding
rounds. To see this explicitly, it will be convenient to consider the un-
wrapped version for a given node, as shown in Figure 8.2.
If the number of decoding rounds is small, there will be no multiple
copies of the same node in the unwrapped graph and we may consider the
bipartite graph as a tree. If the number of decoding rounds is increased,
there will be multiple copies of the same node in the unwrapped graph
and we can no longer consider the bipartite graph as a tree. The for-
mer case is easy to analyze and the latter case is not. To make analysis
simple, Gallager gave an explicit construction of graphs whose girth is log-
arithmically increasing in the number of variable nodes. A logarithmic
increase was all that is necessary since the expected girth of a random
graph of fixed degree sequences increases logarithmically in the number

of nodes in the graph. Rephrased, Gallager’s analysis of Gallager’s Algo-


rithms 1 and 2 assumed that the bipartite graph can be represented by a
tree. We will call his assumption a tree-like neighborhood assumption. In
fact, a tree-like neighborhood assumption is true with high probability
for an increasing number of nodes.
Lemma 186 The probability that the neighborhood of depth of a node
in a bipartite graph is not tree-like is for some constant and
number of variable bits
The lemma implies that we can correctly calculate the average prob-
ability of bit error even if we assume the graph does not have cycles.
However, it will be nice to see that we can calculate correctly any par-
ticular bit error probability. A martingale argument shows that we can
do this with high probability as the block length of the code increases.
In other words, we can calculate with arbitrary precision the probabil-
ity of bit error on graphs with cycles just by calculating the probability
of bit error assuming that the graph does not have cycles. Here is the
necessary theorem from [93].
Theorem 187 (Sharp Concentration Theorem) Let be the ran-
dom variable describing the fraction of edges set to pass incorrect mes-
sages after rounds. Let be the expected number of incorrect messages
passed along an edge with the tree-like neighborhood assumption after
rounds. Then for any there exist constants and N such that for

Proof: [Sketch] First, consider the probability that the neighborhood


around a variable bit does not form a tree. The previous lemma tells
us that a neighborhood of depth of a variable bit is not a tree with
probability less than On the other hand, for large By
revealing the edges in the neighborhood of depth one by one using
an edge martingale argument followed by Azuma’s Inequality (Theorem
152 of Chapter 5), the fraction of edges with non-tree neighborhoods is
greater than with probability at most
Second, now that we know the number of non-trees is small, we only
need to prove the tight concentration of around given that vari-
able bits may be initially wrong with some probability. For this, if
is the expected number of edges set to pass incorrect messages after
rounds, then with high probability. Using Azuma’s In-
equality, we get for some constant that
depends on
We shall use the notation of for the probability of sending 0 and
for the probability of sending 1, both in the round. So is
the probability of sending the incorrect message in the round by our
assumption of the all-zero codeword being sent.
Assume that B is a tree and consider the end of the decoding
round of Gallager’s Algorithm 1. Since each was incorrectly sent to
with probability after a simple computation, the probability that
receives an even number of errors, and possibly no error, is given by

Hence the probability that receives incorrectly but sends correctly in


round is given by

and the probability that receives correctly but sends incorrectly in


round is given by

We can combine the above equations and get a recursive equation for
in terms of as follows.

Theorem 188 Gallager’s Algorithm 1 corrects all but at most an arbi-


trarily small number of errors in regular in some constant number
of rounds for all where such that
and is given by Equation (8.11).

The Sharp Concentration Theorem with Theorem 188 defines the


threshold for Gallager’s Algorithm 1 on a BSC in the sense that if the
crossover probability in a BSC is less than the number determined by
Theorem 188, then the probability of successful decoding approaches 1
for almost all increasing block length codes. Conversely, if the crossover
probability in BSC is greater than the number determined by Theorem
188, then the probability of successful decoding is bounded away from
1, for almost all increasing block length codes. A list of for various
values of and is shown in Table 8.1, which is taken from [93].
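
Thresholds of this kind can be estimated by iterating the error-probability recursion and bisecting on the channel crossover probability. The recursion used below is the commonly cited density-evolution form for Gallager's Algorithm 1 on a (dv, dc)-regular code; it is stated here as an assumption rather than quoted as Equation (8.11).

```python
# A sketch of the threshold computation for Gallager's Algorithm 1 on a
# (dv, dc)-regular code over the BSC, assuming the recursion
#   p_{i+1} = p0*(1 - q_plus**(dv-1)) + (1 - p0)*q_minus**(dv-1)
#   q_plus  = (1 + (1 - 2*p_i)**(dc-1)) / 2
#   q_minus = (1 - (1 - 2*p_i)**(dc-1)) / 2

def converges(p0, dv, dc, iters=2000, tol=1e-9):
    p = p0
    for _ in range(iters):
        s = (1 - 2 * p) ** (dc - 1)
        p = p0 * (1 - ((1 + s) / 2) ** (dv - 1)) + (1 - p0) * ((1 - s) / 2) ** (dv - 1)
        if p < tol:
            return True
    return False

def threshold(dv, dc):
    lo, hi = 0.0, 0.5
    for _ in range(40):                 # bisection on the crossover probability
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if converges(mid, dv, dc) else (lo, mid)
    return lo

print(threshold(3, 6))   # roughly 0.039 for the (3,6)-regular case
```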

For Gallager’s Algorithm 2, notice that to get the expression for


we only need to replace each

factor in Equation (8.11) by

We state a similar theorem for Gallager’s Algorithm 2 without proof.


Theorem 189 Gallager’s Algorithm 2 corrects all but at most an arbi-
trarily small number of errors in regular in some constant number
of rounds for all where such that
and as just described.

As before, the Sharp Concentration Theorem with Theorem 189 de-


fines the threshold for Gallager’s Algorithm 2 on BSC. A list of for
various values of and is shown in Table 8.2 [93]. Comparing with
Table 8.1, Table 8.2 shows that Gallager’s Algorithm 2 achieves higher
error correcting capability than Gallager’s Algorithm 1. The best choice
of for the greatest advantage of Gallager's Algorithm 2 over Algorithm 1
was given by Gallager and is the smallest integer that satisfies

Now combine the results of Theorems 188 and 189 with those of The-
orems 177 and 180. We know Gallager’s Algorithms 1 and 2 reduce the
number of bits in error to a small number with exponentially high prob-
ability, and Simple Sequential and Parallel Decoding Algorithms reduce
the number of bits in error to zero if the bipartite graph defining the
code is some expander. Hence in order to guarantee successful decod-
ing, use Lemma 159 of Chapter 5 to show whether a randomly chosen
bipartite graph has the necessary expansion, and then decode by Gal-
lager’s Algorithms 1 or 2 followed by the Simple Sequential or Parallel
Decoding Algorithms.
Regarding the complexity of this cascaded decoding, we know Gal-
lager’s Algorithms 1 and 2 both require only a linear number of com-
putations per decoding round, and we only need to perform a constant
number of decoding rounds of Gallager’s Algorithms 1 and 2 to reduce
the number of bits in error to a small number. On the other hand, we
know Simple Sequential and Parallel Decoding Algorithms both require
only a linear number of computations to correct all the remaining bits
in error. Hence for a bipartite graph with minimum left node degree
greater than or equal to 5, there exist explicit linear-time bounded algo-

rithms that can correct all bits in error successfully with exponentially
high probability.

Theorem 190 If and the minimum left node degree is greater


than or equal to 5, then Gallager’s Algorithms 1 or 2 followed by the
Simple Sequential or Parallel Decoding Algorithms successfully corrects
all bits in error in some constant number of rounds with high probability.

It is easy to remedy the restriction of minimum left node degree


for successful decoding, which we state as a corollary.

Corollary 191 If then Gallager’s Algorithms 1 or 2 followed


by the Simple Sequential or Parallel Decoding Algorithm successfully cor-
rects all bits in error in some constant number of rounds with high prob-
ability.

Proof: Consider the time after a constant number of decoding rounds


when all but at most variable bits have been corrected by Gallager’s
Algorithm 1 or 2. At this time, construct a new bipartite graph between
variable bit nodes from the original graph and constraint nodes
such that the minimum left degree is 5. Then correct the at most
variable bits in error by the Simple Sequential or Parallel Decoding
Algorithm. By making arbitrarily small, the rate of the new code
obtained by adding constraint nodes is almost the same as the original
code.
We now extend the previous argument to the analysis of randomly
constructed irregular graphs. The Sharp Concentration Theorem and
next two theorems [71] combine to define the thresholds for Gallager's
Algorithms as applied to irregular codes. The error correcting capability
for irregular codes will be shown to be strictly greater than that for codes
defined by regular bipartite graphs. The basic rationale is as follows.
A variable node will require a high degree in order to receive messages
from a larger number of constraint nodes. In contrast, a constraint
node will want to have low degree in order to send a larger number of
correct messages to the variable nodes. Hence by allowing the degrees
of the nodes to vary, we can satisfy this requirement better than by
fixing the degrees of the nodes. For example, by fixing the degrees of
the constraint nodes and varying the degrees of the variable nodes, high
degree variable bit nodes, which are more likely to be corrected in a small
number of decoding rounds than low degree variable nodes, can help low
degree variable nodes in error to be corrected in later decoding rounds.

Theorem 192 Gallager’s Algorithm 1 corrects all but at most an ar-


bitrarily small number of errors in in some constant number of
rounds for all where such that and

Proof: The proof is similar to those above.


Building upon the above theorem, we state a similar theorem for Gal-
lager’s Algorithm 2 without proof. To do this we need a slight mod-
ification to the algorithm to take into account the irregularity of the
bipartite graph. Hence we replace the rule "if at least check nodes of ex-
cluding sent the same value to in the previous round, set to this
value" with "if at least check nodes of the degree node excluding sent
the same value to in the previous round, set to this value." To get
the expression for we only need to replace each

factor in Equation (8.12) by

Theorem 193 Gallager’s Algorithm 2 corrects all but at most an ar-


bitrarily small number of errors in in some constant number of
rounds for all where such that and
as just described.

As before, we can strengthen the above theorem by choosing to


minimize A solution to this is the smallest integer that satisfies

Since = (number of check nodes that


agree in majority) - (number of check nodes that agree in minority),
independent of the degree of a node, it suffices to only test whether the
exponent in the above equation is above a threshold value in deciding
whether to send 0 or 1. Theorem 190 and Corollary 191 both apply to
codes defined by irregular bipartite graphs and so we will not repeat
them here.
In constructing irregular codes, there is the problem of the design of
degree sequences. A degree sequence lists the fractions of the nodes in
the graph that have degrees 1, 2, 3, and so forth. It turns out that selec-
tion of the degree sequence has a significant impact on code performance.
The application of Gallager’s Algorithms to graphs with different degree
sequences results in performance variations across the associated codes.
We will find degree sequences of irregular graphs that can correct larger
number of errors than others. We call such degree sequences, not sur-
prisingly, good degree sequences. The search for good degree sequences
was initially motivated by empirical results.
Intuitively, the search for a good degree sequence may proceed as
follows. The probability that a constraint node sends a correct message
to a variable node at the end of decoding round is This
number is approximately equal to for small
Hence to maximize we minimize the expected number of
neighboring variable nodes of the check node. Rephrased, good degree
sequences will include graphs in which a neighboring check node of a
variable node has a small number of neighboring variable nodes.
is maximized if all check nodes have almost the same
degree if and in particular we study the case when all
but at most two consecutive are zero. We shall call such sequences
right-concentrated degree sequences.
Linear programming is the best currently known method for finding
degree sequences that give the largest possible Given a good left de-
gree sequence linear programming allows us to find a good
right degree sequence and vice versa. Whereas this method
does not find good left and right degree sequences simultaneously, it
finds sequences that give performance that is sufficiently better than
that of regular codes of the same block length and code rate. Table 8.3
shows numerical results for good degree sequences of found by
the method of linear programming [71]. The table shows that the best
right sequences are usually the ones with at most two nonzero entries,

and in particular, a right sequence with only one nonzero entry suffices in most
cases.

Codes 1 and 2 in Table 8.3 have minimum left degree greater than
or equal to 5, giving the graph the necessary expansion called for in
Theorem 190. Codes 1 and 2 are decoded by Gallager’s Algorithm 1
or 2 followed by Simple Sequential or Parallel Decoding Algorithm for
successful decoding. To see the effect of finite code lengths for Codes 1
and 2 in the probability of decoding error, we consider Figure 8.3[71].
Recall that the thresholds in Theorems 192 and 193 are valid only for infinite code
length. A code with 16000 variable bit nodes and 8000 constraint
nodes is shown in the figure and for the (4,8)-regular code,
which is the best performance among regular codes. The figure shows
the percentage of successful decoding operations based on 2000 trial
runs, and shows that it agrees quite well with the theoretical result of
the asymptotic number. Codes 3 and 4 in Table 8.3 do not have
minimum degree greater than or equal to 5 and hence we decode them
with Gallager’s Algorithm alone. Simulation results show that Gallager’s
Algorithm 1 or 2 usually corrects all the errors and it is unnecessary to
switch to the Simple Sequential or Parallel Decoding Algorithm.
Figure 8.4 [71] shows the effect of finite code lengths of Codes 3 and
4 in the probability of decoding error. Similar to Figure 8.3, this figure
shows the percentage of successful decoding operations based on 2000
trial runs and shows that it agrees quite well with the theoretical result
of the asymptotic number.

5. Belief Propagation Decoding


In this section, we describe the application of the belief propagation
algorithm to the graphs of low-density parity-check codes, and show how

the probability distribution of bit error defined by the algorithm evolves


as the number of decoding rounds increases. Through application of the
Sharp Concentration Theorem, we assume the bipartite graph defining
the code is represented by a tree and calculate the threshold for which
the probability of decoding error goes to zero in the number of variable
nodes. Hence the idea of calculation of the error correcting capability
is basically the same as that for Gallager’s Algorithms. On the other
hand, in the analysis of Gallager’s Algorithms, the calculation of the

error correcting capability was simplified by the fact that the messages
passed between the variable bit and constraint nodes were binary. In
belief propagation decoding, the messages are real numbers, making the
analysis much more difficult.
We prove that the application of belief propagation algorithm to a
randomly chosen irregular low-density parity-check code results in er-
ror correcting performance that is very near the theoretical limit. For
example, in the AWGN channel, a code has an error correcting
capability that is less than 0.06 dB from the theoretical limit. Confirm-
ing our theoretical analysis, empirical results also show that irregular
low-density parity-check codes, decoded using the belief propagation al-
gorithm, achieve near the theoretical limit.
For purposes of explication, our channel of interest will be either the
BSC or AWGN. As before, we use throughout this chapter to mean
that the left hand side of is equal to the right hand side weighted by
a normalizing constant. We first state the algorithm using the notation
of Chapter 6. will represent the message from the constraint or
child node to the variable bit or parent node², or the message from the
instantiated or child node to the variable bit or parent node in the
decoding round. will represent the message from the variable bit or
parent node to the constraint or child node³ in the decoding round.
The Bayesian network of a low-density parity-check code is shown in
Figure 8.5.

Belief propagation decoding starts with the set of nodes sending


messages to its set of parent nodes Because the part
of the Bayesian network that has the sets of nodes and does
not have cycles, this will be the only time messages are sent between the

² We assume that the reader will not confuse this with left degree sequences that use the same
notation.
³ Likewise, we assume that the reader will not confuse this with right degree sequences.

two sets of nodes. Hence, we shall not use the superscript denoting the
decoding round number for the messages that are passed between the
two sets of nodes.

In the part of the Bayesian network that has the sets of nodes
and in each decoding round, each variable bit node sends a message
to a constraint node first and a constraint node sends a message to a
variable node second. Consider regular codes first. Define
and
Belief propagation decoding of a regular
low-density parity-check code proceeds as follows.

Simplifying the above decoding rounds and combining the variable


node and the constraint node message passing to get an a posteriori
probability of we obtain the following algorithm.

Belief Propagation Decoding:


Variable Node Pass:

Constraint Node Pass:

Variable Node Information:

where if and, and otherwise.

The above expressions indicate that in the variable node pass each
edge is assigned the probability of the variable node having the
correct value, and in the constraint node pass each edge is
assigned the probability of the modulo 2 sum of the values of the other adjacent
variable nodes being the correct value.
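
For concreteness, the following sketch carries out the two passes in the log-likelihood domain, where the constraint node pass takes the familiar tanh form equivalent to the parity-probability expression given by Lemma 185. The graph representation, sign convention, and stopping rule are our choices, not the book's pseudocode.

```python
import math

# A sketch of belief propagation decoding in the log-likelihood domain.
# checks_of[v] and vars_of[c] are adjacency lists as in the earlier sketches;
# llr[v] is the channel log-likelihood ratio log Pr(y_v|x_v=0)/Pr(y_v|x_v=1).

def belief_propagation(checks_of, vars_of, llr, rounds=50):
    n, m = len(checks_of), len(vars_of)
    v2c = {(v, c): llr[v] for v in range(n) for c in checks_of[v]}
    est = [0 if l >= 0 else 1 for l in llr]
    for _ in range(rounds):
        # constraint node pass: tanh rule over the other incoming messages
        c2v = {}
        for c in range(m):
            for v in vars_of[c]:
                prod = 1.0
                for u in vars_of[c]:
                    if u != v:
                        prod *= math.tanh(v2c[(u, c)] / 2.0)
                prod = min(max(prod, -0.999999), 0.999999)   # keep atanh finite
                c2v[(c, v)] = 2.0 * math.atanh(prod)
        # variable node pass: channel LLR plus the other incoming messages
        for v in range(n):
            for c in checks_of[v]:
                v2c[(v, c)] = llr[v] + sum(c2v[(d, v)] for d in checks_of[v] if d != c)
        # tentative hard decision on the a posteriori LLRs
        est = [0 if llr[v] + sum(c2v[(c, v)] for c in checks_of[v]) >= 0 else 1
               for v in range(n)]
        if all(sum(est[u] for u in vars_of[c]) % 2 == 0 for c in range(m)):
            break                                            # all checks satisfied
    return est
```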
Messages being real numbers in this section necessitates redefining the
notation of As in Gallager’s Algorithms, will indicate
the probability that a randomly chosen variable node has value a in the
decoding round. The difference here is that while was the message
sent with probability in the decoding round in Gallager’s Algo-
rithms case, will denote the actual message sent in the decoding
round in the belief propagation algorithm. For example, will equal
in this section while equaled in Section 8.3. With this notation,
we have, for the Variable Node Pass,

Lemma 185 tells us that the Constraint Node Pass can be reformulated
as

The belief propagation algorithm for regular low-density parity-check


codes easily generalizes to that for irregular codes. Consider the follow-
ing theorem [94] regarding the probability of decoding error.
Theorem 194 If is the code defined by a bipartite graph B, then
the probability distribution for the log-likelihood of messages in the belief
propagation algorithm is given by

where

Proof: [Sketch] For notational convenience, we will sketch the proof for
the case of regular codes. To calculate the probability density of the
message of the variable node pass, it will be convenient to express the
message in the form of a log-likelihood ratio. Let

Substituting for and from the Variable Node Pass gives us

If then
Hence for regular graphs,

where * denotes the convolutional operator. We leave it as an exercise


to show that for irregular graphs,

Expressing the message sent from the constraint node to the variable
node in the form of a log-likelihood ratio, we have

Then substituting for and from the Constraint Node Pass gives
us

As in the case of the variable node pass, we now calculate the proba-
bility distribution of the message from constraint node to variable node.
To facilitate the calculation, we adopt the following trick. Define

Then for regular bipartite graphs,

where is the inverse of



We leave it as an exercise to show that for irregular bipartite graphs,

The Sharp Concentration Theorem in conjunction with Theorem 194


defines the threshold for belief propagation decoding on BSC and AWGN
channels in the sense that if the crossover probability in BSC or the stan-
dard deviation of the noise in AWGN channel is less than the threshold
for successful decoding that can be determined by equations in the theo-
rem, then the probability of successful decoding approaches 1 for almost
all increasing block length codes. Conversely, if the crossover probability
for the BSC or the standard deviation of the noise in the AWGN channel
is greater than the threshold, then the probability of successful decoding
is bounded away from 1, for almost all increasing block length codes.
Tables 8.4 and 8.5 [93] show the largest possible crossover probability
and the largest possible standard deviation for successful belief
propagation decoding versus the theoretical limit and
for various values of and In Table 8.5 the channel is a binary-input
AWGN channel.
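
The theoretical limit for the BSC can be computed by inverting the capacity expression 1 − H(p) at the code rate; the sketch below does this by bisection. The binary-input AWGN limit requires a numerical integration of the channel capacity and is not shown.

```python
import math

# Largest usable crossover probability on the BSC at a given rate R,
# obtained by solving H(p) = 1 - R with bisection.

def binary_entropy(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_shannon_limit(rate):
    lo, hi = 1e-12, 0.5 - 1e-12
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if binary_entropy(mid) < 1 - rate else (lo, mid)
    return lo

print(bsc_shannon_limit(0.5))   # about 0.110 for rate-1/2 codes
```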

In Figure 8.6 [94], three curves for the best currently known
codes of length from the classes of regular low-density parity-
check codes, irregular low-density parity-check codes, and turbo codes
are shown. In Figure 8.7 [94], six curves for the best currently known
codes of lengths and from the classes of irregular
low-density parity-check code, and turbo code are shown. The dotted
curves represent turbo code curves and the solid curves represent the
low-density parity-check codes curves. These curves show that while
low-density parity-check codes perform worse than turbo codes for small
block length, they outperform turbo codes as the block length of the code
is increased.

Table 8.6, taken from [94], lists a set of good degree sequences for
irregular codes that give high standard deviation for successful decoding
in AWGN channel.
We can further improve the performance demonstrated in the previous
section by using codes defined over larger fields. The classical results tell
us to increase the block length of the code to get a lower probability of
decoding error. It follows that we want to know if we can improve the
code performance if we use a bipartite graph of variable nodes over the
field instead of variable nodes over the
field GF(2). Following empirical results from [30], we can see that this
may indeed be the case. In Figure 8.8, low-density parity-check codes of
block lengths 2000, 1000, 6000, 2000 over fields GF(2), GF(4), GF(2),
GF(8), respectively, are tested over the BSC. In Figure 8.9, low-density

parity-check codes of block lengths 18000, 9000, 6000, 6000 over fields
GF(2), GF(4), GF(8), GF(16), respectively, are tested over the binary
Gaussian channel. These figures show that code performance can be
improved by using the bipartite graph whose variable nodes take values
in a larger field.

6. Notes
It has been shown that low-density parity-check codes can achieve
the theoretical limit when maximum-likelihood decoded [75, 84]. Un-
fortunately, a linear-time bounded algorithm that provides maximum-
likelihood decoding of low-density parity-check codes is yet to be found if
it exists. In fact, optimal decoding of randomly constructed low-density
parity-check codes is known to be NP-hard [18]. As a result, we follow
the lead of Berrou, Glavieux, and Thitimajshima and look for linear-time
bounded suboptimal decoding algorithms that produce performance very
close to that given by maximum-likelihood decoding. The best known sub-
optimal decoding algorithms use the recursive structure of the code to

facilitate the decoding, and are iterative. Each iteration takes only lin-
ear time, and the algorithms require only a constant number of rounds.
Bounds on the number of errors guaranteed to be corrected by some of
the algorithms indicate that they are very close to optimal decoding.
In particular, the suboptimal iterative decoding algorithms that we pre-
sented are variations of the belief propagation algorithm as applied to
the graph that defines the code. For example, hard-decision decoding
algorithms that we presented are variations of a hard-decision version of
the belief propagation algorithm, and the soft-decision decoding algorithm
was precisely the belief propagation algorithm. The variations of the
belief propagation algorithm presented here proceed until a codeword is
found or decoding failure is declared after some fixed number of decoding
iterations.
Gallager [40, 41] considered regular bipartite graphs of high girth and
gave an explicit construction of such graphs, as randomly chosen graphs
may have short cycles. He derived low complexity message-passing de-
coding algorithms on the graphs and the algorithms’ lower bounds on
the number of correctable errors. The explicit construction of high girth
graphs was motivated by his desire to make the analysis simple, given a
tree-like neighborhood structure of the graph. While Gallager’s explicit
construction yielded the necessary girth for his analysis, Margulis [78]
gave an explicit construction whose girth is larger than Gallager’s. Low-
density parity-check codes were then largely forgotten for more than 30
years, with the exception of Zyablov and Pinsker’s work [120] on de-
coding complexity and Tanner’s work [107] on codes defined by graphs.

It was not until MacKay [73] showed that Gallager’s decoding algo-
rithms are related to the belief propagation algorithm that Gallager’s
work received renewed interest. Improving on the results of Gallager,
MacKay showed that low-density parity-check codes can achieve near
the theoretical limit if decoded by belief propagation algorithm [74], and
that low-density parity-check codes can achieve the theoretical limit if
maximum-likelihood decoded [75].
Sipser and Spielman [103] introduced a class of low-density parity-
check codes called expander codes in 1996 that are low-density parity-
check codes whose graphs have good expansion properties. Empirical
results show that Simple Sequential and Parallel Decoding Algorithms
seem to correct a significantly larger number of errors than are guaran-
teed by the theorems. Zémor [119] in 1999 and Barg and Zémor [13, 14]
in 2001 and 2002 have improved Sipser and Spielman’s result on the er-
ror correction capability of a family of explicitly constructible expander
codes. Furthermore, their results imply that the error exponent is a
better measure of error correcting performance than the minimum dis-
tance of a code, and show that a family of low-density parity-check codes
achieves the capacity of a BSC under an iterative decoding procedure. The
construction of codes using expansion properties has also been studied
by Alon et al. [7], who constructed low-rate asymptotically good codes.
It is worth noting that while the encoding of expander codes requires
the usual quadratic-time complexity, Lafferty and Rockmore [62] gave an
encoding scheme based on Cayley graphs and representation theory that
has sub-quadratic-time complexity. In [95], Richardson and Urbanke
made use of the sparseness of parity-check matrix of low-density parity-
check codes to obtain efficient encoding schemes that allow near linear
time encoding. The application of Gallager’s decoding algorithms to
expander codes was carried out in [23].
In 1998, Luby et al. [71] generalized a result of Gallager that provided
lower bounds on the number of correctable errors on randomly chosen
regular and irregular graphs. Furthermore, Luby et al. [72] improved
MacKay’s results regarding the belief propagation decoding of regular
codes and was able to achieve performance even closer to the theoretical
limit by using irregular codes. In the same year, Davey and MacKay
[30] improved Luby et al.’s result by using irregular codes over larger
fields. Soon after, Richardson et al. [94] gave the best known irregular
codes over the binary field by carefully selecting the degree sequences of
the codes.
Linear programming and Gaussian approximation methods have been
carried out in [25, 26] to compute the error probability of belief propaga-
tion. The former method resulted in low-density parity-check codes that

have performance within 0.0045 dB from the theoretical limit. While the
best known codes relied on random selection from an ensemble of codes,
explicit algebraic expander and Ramanujan graphs were used to con-
struct codes that have performance comparable to regular low-density
parity-check codes in [63] and in [96], respectively.
Chapter 9

LOW-DENSITY GENERATOR CODES

1. Introduction
Low-density generator codes are defined by a bipartite graph
where X represents the set of information nodes and C rep-
resents the set of check nodes. The codeword of a low-density generator
code is then the values of nodes in X concatenated with those in C. The
values of nodes in C are defined by those in X and the set of edges E.
The generator matrix of a low-density generator code is in the form of
G = [I : P] and the adjacency matrix of B is in the form of
where I is the identity matrix. It follows that the construction of a
low-density generator code reduces to the selection of a bipartite graph.
Since we will typically use the left and right sets of nodes in the bipar-
tite graph as the information and check nodes, respectively, we will refer
to information or variable nodes as left nodes and check nodes as right
nodes. The generator matrix induced by the graph is sparse because the
degree of the nodes in the graph is fixed while the number of nodes in
the graph is increased (hence the name for this class of codes). Clearly,
the encoding of a low-density generator code takes a linear number of
operations in the number of variable bits by construction.
Low-density generator codes clearly seem to share the features that low-
density parity-check codes have, and it may seem unnecessary to have a
chapter on low-density generator codes. In fact, one code is defined by
its generator matrix, and the other is defined by its parity-check matrix.
We shall see in this chapter that the class of low-density generator codes
clearly has applications of its own and enables one to obtain results not
possible with low-density parity-check codes. To begin the exposition, let

us formally define a low-density generator code. Low-density generator


codes can be divided into regular and irregular low-density generator
codes. Regular codes are codes defined by unbalanced regular bipartite
graphs and irregular codes are codes defined by unbalanced irregular
bipartite graphs. We will refer to them as regular and irregular codes,
respectively, throughout this chapter.
Throughout the chapter, we use the notations and
for the variable node, check node, received variable node, received check
node, message from a variable node to a check node and message from
a check node to a variable node, respectively. By abuse of notation, we
shall use node and bit (value taken by the node) interchangeably. Much
of the notation with obvious analogs will follow from Chapter 8. We
shall normally describe codes in terms of irregular bipartite graphs as in
Chapter 8.
Definition 195 Let B be a bipartite graph between variable nodes
and check nodes A low-density
generator code is

where is an incidence function defined such that for each check


the nodes neighboring are is the number of
check nodes, and are very small compared to the number of variable
nodes
As we have previously seen, if and are the average
left and right degrees, respectively. If the bipartite graph is regular,
then we will interpret the neighbors of each check node concatenated
with the check bit as a codeword of a code We
shall, unless otherwise specified, consider in which for a given set of
variable bits, each check bit is defined as the XOR of the neighboring
variable bits in the graph (i.e. is a parity-check code). An example
of a bipartite graph B with variable nodes is shown in
Figure 9.1. For each check node in the figure, all neighboring bits of a
check bit concatenated with it must be a codeword of for the bits
to be a codeword of
In generalization of the above definition, we may also consider in
which the check node is not necessarily the XOR of the neighboring
variable bits. For example, the check node may represent more than
one bit. Consider B that is a bipartite graph between sets of and
nodes where each left node represents 1 bit and each right node

represents bits, and is an error-correcting code with check bits. A


low-density generator code is then a code of variable bits and
check bits where the bits in the check node are the check bits
of the codeword of of which the variable bits are
An example of the low-density generator code of this kind defined by
graph B is shown in Figure 9.2.

Clearly if B is a graph between left nodes and


right nodes, and is a block length and rate code, then is
a block with rate
Consider the following simple example, which provides a general idea
of what is involved in the encoding of a low-density generator code.
Low-density generator codes are special cases of systematic codes; for
example, the (7,4)-Hamming code with the following generator matrix
G is systematic.

If is a sequence of variable bits, then is the sequence


of check bits where and

The bipartite graph that corresponds to the generator matrix G is


shown in Figure 9.3. In the figure, the nodes labeled are the
variable bits and the nodes labeled are the check bits such that
the neighbors of each must sum to the value of the check bit mod 2 by
definition of the generator matrix.
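
Encoding with a systematic generator matrix of this form takes only a few lines. The matrix P used below is one standard systematic choice for the (7,4) Hamming code and may differ from the particular generator matrix intended above; the function name is ours.

```python
# A sketch of systematic encoding with G = [I : P], using a standard
# (7,4) Hamming parity part P (one common choice, assumed here).

P = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]

def ldgc_encode(x, P):
    """Return the variable bits x followed by the check bits x*P (mod 2)."""
    k, r = len(P), len(P[0])
    checks = [sum(x[i] & P[i][j] for i in range(k)) % 2 for j in range(r)]
    return list(x) + checks

print(ldgc_encode([1, 0, 1, 1], P))   # -> [1, 0, 1, 1, 0, 1, 0] with this P
```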
The decoding philosophy of low-density generator codes is identical to
that of low-density parity-check codes. For example, decoding starts
with the variable nodes sending messages to the check nodes.
The check nodes then send messages to the variable
nodes. This constitutes one decoding round, and decoding rounds
repeat until a valid codeword is found or the process is stopped
after some predetermined period. One can send messages sequentially


or in parallel. The messages sent by the variable and check nodes are
defined by the algorithms used in the decoding, which we will consider
shortly.

2. Decoding Analyses
In this section we exhibit linear-time decoding algorithms that are
closely related to the algorithms of low-density parity-check codes. We
present two simple bit-flipping algorithms and their bounds. While the
lower bounds on the number of errors that are guaranteed to be corrected
by the algorithms presented in the theorems are very small constants,
the simulation results show that the algorithms seem to correct a signif-
icantly larger number of errors. As one could have easily guessed, Gal-
lager’s Algorithms and the belief propagation algorithm can be applied
to low-density generator codes to obtain efficient decoding algorithms.
We only very briefly discuss them in this chapter, as the basic philosophy
of decoding analysis is identical to that shown in the previous chapter.
The performance of low-density generator codes and that of low-density
parity-check codes are also comparable, at least as decoded by these two
types of algorithms.
In conducting a performance analysis, it will be convenient to first
define an error reducing code. Roughly speaking, if not too many of the
variable bits and check bits are corrupted, then an error reducing code
corrects a fraction of the corrupted variable bits while leaving the cor-
rupted check bits unchanged. The algorithms and concepts introduced
in this section follow from [105].

Definition 196 A low-density generator code of variable bits


and check bits is an error reducing code of rate R, error re-
duction and reducible distance if there exists an algorithm such that
for a word that differs from a codeword in at most
variable bits and check bits, the algorithm outputs a word that
differs from in at most variable bits.

To establish the algorithms of error reducing codes that will reduce


the number of errors in the variable bits, it will be convenient to define
a check to be satisfied if the XOR of the bits of the adjacent variable
nodes is equal to the bit of the check node. Otherwise, define the check
to be unsatisfied. We say we flip a variable bit when we set the variable
bit to 1 if it was 0, and set the variable bit to 0 if it was 1. Associated with
the code are two simple decoding algorithms: Simple Sequential Error
Reducing Algorithm and Simple Parallel Error Reducing Algorithm.

Simple Sequential Error Reducing Algorithm:

If there is a variable bit that has more unsatisfied than satisfied neigh-
bors, then flip the value of that variable bit.
Repeat until no such variable bit remains
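
As an illustration, the following Python sketch implements the sequential flipping rule just described; the graph representation and function names are ours, not the text's, and the graph is given simply as a list of checks, each a tuple of variable-node indices.

def sequential_error_reduce(var_bits, check_bits, checks):
    # checks[j] lists the variable nodes adjacent to check node j; check j is
    # satisfied when the XOR of those variable bits equals check_bits[j].
    var_bits = list(var_bits)
    neighbours = {v: [] for v in range(len(var_bits))}
    for j, nbrs in enumerate(checks):
        for v in nbrs:
            neighbours[v].append(j)

    def unsatisfied(v):
        # number of adjacent checks whose parity disagrees with the check bit
        return sum(
            (sum(var_bits[u] for u in checks[j]) % 2) != check_bits[j]
            for j in neighbours[v]
        )

    changed = True
    while changed:
        changed = False
        for v in range(len(var_bits)):
            bad = unsatisfied(v)
            if bad > len(neighbours[v]) - bad:   # more unsatisfied than satisfied
                var_bits[v] ^= 1                 # flip the variable bit
                changed = True
    return var_bits

Each flip strictly decreases the number of unsatisfied checks, so the loop terminates; this is the same observation used in the complexity argument below.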

To calculate error reduction and reducible distance bounds, we exploit


the expansion properties of bipartite graphs. The next two theorems
build upon the good expansion property of a randomly chosen graph, as
developed in Chapter 5. In the following discussion will denote
the minimum degree of the left nodes of a bipartite graph.
Theorem 197 If B is an irregular expander, then
is an error reducing code of error reduction and reducible distance

Proof: We show that the Simple Sequential Error Reducing Algorithm


is the algorithm that we need. Let µ variable and check bits be corrupt
and set such that is the number of edges connected to corrupt
variable bits. Let be the number of unsatisfied check bits, be the
number of satisfied check bits whose neighbors are corrupt variable bits,
and be the number of left nodes. Initially, We use the
following claim to prove the theorem.
Claim: there is a variable bit whose value is flipped by
the execution of the algorithm. By expansion of the graph, we have

and by definition of satisfied and unsatisfied check bits, we must have

Combining the above two inequalities yields,


Thus, since when there is some variable bit


that has more unsatisfied than satisfied neighbors which completes the
proof of the claim.
Now the claim tells us that if the algorithm halts, then or
We show by contradiction that if the algorithm halts then
Since and initially, we can get an upper bound
on which monotonically decreases by the execution of the algorithm.
So,

If the algorithm halts so that then before the algorithm halts,


there must have been a time when This can be translated into
a lower bound on via Equation (9.1) as

which is a contradiction. Because the algorithm must halt since the


number of unsatisfied checks decreases, we have
Spielman [105] originally showed that if the bipartite graph is a reg-
ular expander, then the same analysis shows that the reducible
distance increases to

Corollary 198 The Simple Sequential Error Reducing Algorithm can


be implemented to run in linear-time.

Proof: [Sketch] The average left and right node degrees are independent
of the code length, and the number of unsatisfied checks, which is at most
linear in the code length, decreases with each flip.

Simple Parallel Error Reducing Algorithm:


If there are variable bits that have more unsatisfied than satisfied
neighbors, then flip the values of those variable bits.

We cannot include the step "Repeat until no such variable bit remains" in the
Simple Parallel Error Reducing Algorithm, since the algorithm may not
halt. The next theorem shows how many repetitions are necessary for suc-
cessful error reduction.
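
Before turning to that theorem, here is a corresponding Python sketch of the parallel rule, again with an illustrative graph representation and an explicit cap on the number of rounds, since the parallel version is not guaranteed to halt on its own.

def parallel_error_reduce(var_bits, check_bits, checks, rounds):
    # In each round, every variable bit with more unsatisfied than satisfied
    # neighbouring checks is flipped simultaneously.
    var_bits = list(var_bits)
    neighbours = {v: [] for v in range(len(var_bits))}
    for j, nbrs in enumerate(checks):
        for v in nbrs:
            neighbours[v].append(j)

    for _ in range(rounds):
        unsat = [
            sum((sum(var_bits[u] for u in checks[j]) % 2) != check_bits[j]
                for j in neighbours[v])
            for v in range(len(var_bits))
        ]
        flips = [v for v in range(len(var_bits))
                 if unsat[v] > len(neighbours[v]) - unsat[v]]
        if not flips:
            break
        for v in flips:           # all flips are applied in the same round
            var_bits[v] ^= 1
    return var_bits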
Theorem 199 If B is an irregular expander and


is greater than or equal to then is an error reducing code of
error reduction and reducible distance

Proof: As before, let be the number of left nodes. We show that if


we are given a word that differs from a codeword in in at most
µ variable and check bits, then the Simple Parallel Error
Reducing Algorithm followed by either repeating until no variable bit
with more unsatisfied than satisfied neighbors remains or repeating for
rounds, whichever is smaller, outputs a word that differs from
the codeword in in at most variable bits.
Let µ variable and check bits be corrupt, and define the sets M, N,
F, C to be
M = {corrupt variable bits that enter a decoding round}
N = {corrupt check bits that enter a decoding round}
F = {corrupt variable bits that fail to flip after a decoding round}
C = {uncorrupt variable bits that become corrupt
after a decoding round}

So and after a decoding round, the set of corrupt


variable bits is Observe that if then there is a
variable bit whose value is flipped by the execution of the algorithm,
which follows directly from the proof of the previous theorem.
Claim: We prove by contradiction. That is, suppose
and consider such that Defining
to be the set of neighbors of a set A,

by expansion and since at most neighbors of are uncorrupt satisfied


check bits, we get

which gives a contradiction since and


Now since we get
and at least edges from F go to uncorrupt satisfied check


bit nodes that have at least one neighbor in M\F which implies that

Combining the above two equations yields


or

This implies that

Consider now the two cases and Substituting


for and respectively, in Equation (9.2), we get

We know that if the algorithm halts then or and µ is


initially at most The above equations imply that if the algorithm halts then

which finishes half of the proof of the theorem. If the algorithm does not
halt, then after iterating for rounds for some constant K, we get

since Choosing finishes the proof of the theorem.


If the bipartite graph is regular, then a similar argument shows that a
expander without any minimum left degree constraint has
the same error reduction and reducible distance.

Corollary 200 The Simple Parallel Error Reducing Algorithm can be


performed by a circuit of linear-size and constant depth.
We can calculate the error reduction provided by an error reducing


code by using the expansion property of the associated edge-vertex in-
cidence graph. The next lemma [105] provides an exemplary technique.
Lemma 201 If is a linear code of rate block length and
minimum relative distance and B is the edge-vertex incidence graph
of a graph on vertices with second-largest eigenvalue
then is a code with variable bits and check bits.
To decode, for each check node, if the neighboring variable bits and
the check bits are within relative distance of a codeword of then
send a flip message to all variable bits that differ from the codeword. If
a variable bit has received one or more flip messages, then flip the value
of the variable bit.
If given an input that differs from a codeword of in at most
variable bits and at most check bits, then decoding will output
a word that differs from the codeword of in at most

variable bits.
Proof: has rate and each check node has check bits, and
B is a graph between left nodes and right nodes.
Hence has check bits. If A is the set of variable bits
in which the input differs from a codeword of then a variable bit
will be corrupt at the end of a decoding round if
1 It receives a flip signal but is not in A, or
2 If it does not receive a flip signal but is in A.
It will be convenient to call a check node confused if it sends a flip
signal to a variable bit that is not corrupt, and unhelpful if it contains
a variable bit of A but fails to send a flip signal to that variable bit.
If a check node has at least corrupt variable and check
bits, then it will be confused. On the other hand, since each variable bit
is an input to two check nodes, there can be at most

confused check nodes. Because each of these can send at most flip
signals, at most
variable bits not in A can receive flip signals. By similar analysis, there
can be at most

unhelpful check nodes. Lemma 161 of Chapter 5 says that at most

variable bits have unhelpful check nodes as neighbors. Hence at most

variable bits will be corrupt at the end of a decoding round.


More generally, by using the expansion property of the
incidence graph, we can calculate the error reduction of a code. For
example, we obtain the following lemma.

Lemma 202 If is a linear code of rate block length


and minimum relative distance and B is the incidence
graph of a graph on vertices with second-largest eigenvalue
then is a code with variable bits and check
bits. Decoding proceeds as in Lemma 201.
If given an input that differs from a codeword of in at most
variable bits and at most check bits, then decoding will
output a word that differs from the codeword of in at most

variable bits.

Gallager’s Algorithms, discussed in the previous chapter, have natu-


ral analogues for application to low-density generator codes. The basic
principle of the analogues for low-density generator codes is exactly the
same as in the decoding of low-density parity-check codes. The only
change that needs to be made is to replace
set the message to the exclusive-or of the values received in this
decoding round from its adjacent nodes excluding
in the description of Gallager’s Algorithms for low-density parity-check


codes with
set the message to the exclusive-or of and the values received in
this decoding round from its adjacent nodes excluding
to account for the slight difference in the code construction. The
decoding analysis is almost the same with the possible exception that
right node bits may be in error in low-density generator codes. However,
we may assume that the check bits are received correctly, to which we will
give justification in Section 5. In this case, all the theorems in Chapter 8
regarding Gallager’s Algorithms will hold true for low-density generator
codes, while a similar case can be made for belief propagation decoding.
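
The replacement described above amounts to a one-line difference in the check-node message rule. The Python fragment below contrasts the two rules; the function names are illustrative, and the messages are assumed to be hard (0/1) values as in Gallager's Algorithms.

from functools import reduce
from operator import xor

def check_to_var_ldpc(incoming):
    # incoming: messages from all adjacent variable nodes except the recipient
    return reduce(xor, incoming, 0)

def check_to_var_ldgm(check_bit, incoming):
    # for a low-density generator code the received check bit itself enters
    # the exclusive-or of the parity relation
    return reduce(xor, incoming, check_bit)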

3. Good Degree Sequences


The only known technique for deriving a degree sequence that achieves
the theoretical performance limit applies only to erasure channels. The
technique relies on the analysis of random processes by two probabilistic
analysis tools, AND-OR tree and the martingale argument we saw in
Chapter 8. Recall that the theoretical limit on the maximum erasure
probability for error-free recovery is 1 − R where R is the rate of the code.
An AND-OR tree is a tree of depth where depth 0 is the root and
depth is the leaf of the tree, such that at even depths, there
are OR-gates and at odd depths, there are AND-gates.
Assume that an OR-gate can have from 1 to children, and an
AND-gate can have from 1 to children. Each OR-gate and AND-
gate has and children with probability and respec-
tively, that is independent of other gates, where and

Finally, each OR-gate is short-circuited to output a


1 with probability and each AND-gate is short-circuited to out-
put a 0 with probability both chosen independently of other gates.
Denote by the probability that the root of evaluates to 0. We can
express in terms of as the following lemma shows (see [69] for the
proof).

Lemma 203 If
then

Consider now the graph G of a low-density generator code such that


a randomly chosen edge has degree left node with probability and
degree right node with probability Label a node by 0 if its bit is
erased and by 1 if its bit is not erased, where left node is labeled 0 with
probability and right node is labeled 0 with probability Consider
a left node labeled 0 that has a neighbor c labeled 1 whose neighbors
excluding are labeled 1. Since is the XOR of and its neighboring
nodes excluding we can recover After the bit of is calculated,
label the node by 1. We define the decoding as repeating this process
of relabeling left nodes initially labeled 0 with the label 1. Decoding
completes successfully when no left nodes labeled 0 remain. For
a subgraph of depth of G, the probability that is not a tree
is where is the number of nodes in G. Hence we shall consider
in which is very large. Successful decoding on this asymptotic dis-
tribution of is equivalent to getting 1 at the output of a randomly
chosen AND-OR tree
For example, consider the graph in Figure 9.4 in which we want to
find the bit of the node that is labeled with 0. Figure 9.5(a) shows a
tree-like neighborhood of depth 2 graph with root node The label of
will stay as 0 if and only if the AND-OR tree in Figure 9.5(b) produces
0 at the output of the root.

Because and are continuous, in order to get we


need
Unfortunately, if then Equation (9.3) cannot be true for all


Analysis shows that decoding completes where a randomly
chosen edge will have a label 0 left node with probability at most if
Equation (9.3) is true for all On the other hand, if
then Equation (9.3) is true for all for some
is equivalent to assuming that no check bit is erased. Assume
hereafter that all right nodes are labeled with 1. In this case, for all
the decoding algorithm can correct a fraction of all but at most
erasure if

for all with exponentially high probability.


Equation (9.4) can be reformulated as

for all by the change of variable Equa-


tions (9.4) and (9.5) are significant in the following sense. Given a right
degree sequence, we may use Equation (9.4) to find a good left degree
sequence; given a left degree sequence, we may use Equation (9.5) to
find a good right degree sequence. Hence through back-and-forth appli-
cation of Equations (9.4) and (9.5), we may be able to find good degree
sequences, and empirical results have shown that this works quite well.
However, there is no theoretical guarantee that such a back-and-forth
strategy will find an optimal degree sequence.
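
A numerical version of this back-and-forth check is easy to set up. The Python sketch below assumes that the success condition takes the standard erasure-channel form delta * lambda(1 - rho(1 - x)) < x for all x in (0, delta], which is one common way Equations (9.4) and (9.5) are written (following Luby et al. [69]); it simply tests the condition on a grid.

def poly_eval(coeffs, x):
    # coeffs[i] is the coefficient of x**(i + 1); edge-degree polynomials
    # have no constant term
    return sum(c * x ** (i + 1) for i, c in enumerate(coeffs))

def erasure_condition_holds(delta, lam, rho, samples=1000):
    """Check delta * lambda(1 - rho(1 - x)) < x on a grid over (0, delta]."""
    for k in range(1, samples + 1):
        x = delta * k / samples
        if delta * poly_eval(lam, 1 - poly_eval(rho, 1 - x)) >= x:
            return False
    return True

# Example: a (3,6)-regular graph, lambda(x) = x**2, rho(x) = x**5; the value
# 0.40 is below the threshold of roughly 0.43 for this pair, so the check passes.
lam = [0.0, 1.0]
rho = [0.0, 0.0, 0.0, 0.0, 1.0]
print(erasure_condition_holds(0.40, lam, rho))

Given a candidate right degree sequence one can search over left sequences that pass this test, and vice versa, which is the back-and-forth strategy described above.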
Consider now in which is finite for practical reasons. Because the
preceding argument is true only for in which is infinite, to consider
the finite case, the usual martingale argument shows that if Equation
(9.4) or (9.5) is satisfied for all then decoding finishes with at
most variable bits not recovered with exponentially small probability


in the length of the code To this end, we make a slight restriction on
the degree sequence of left nodes. Let A be the set of left nodes that
are labeled as 0 at the end of decoding. If the average degree of nodes
in A is and the number of right node neighbors of A is greater than
then one of these right nodes has only 1 left node neighbor in A.
In other words, decoding will continue. Hence we need to show that the
graph is an expander on small subsets. For this, Lemma 160 of Chapter
5 is exactly what we need, i.e., a bipartite graph in which
is an expander on small subsets with probability where is
the number of left nodes. Hence if then with probability
decoding completes successfully.
We now consider some consequences of Equations (9.4) and Equation
(9.5) that may provide insight into the design of good degree sequences
for erasure channels. In particular, we derive some upper bounds on
in terms of the degree sequence due to Shokrollahi [102].
Theorem 204 If is a positive real number such that

for then

where and are the average left node and right node degrees, respec-
tively.
Proof: The first inequality in the theorem is equivalent to

for since the polynomial is strictly increasing for positive


and thus it has a unique inverse which is also strictly increasing.
Hence

Using the fact and invoking Lemma 173 of


Chapter 8, we get

Clearly if we can show that then we are done.


To this end, suppose are nonnegative real numbers such that
The usual calculations using the concavity of the log-function


show that satisfy

The proof is completed by letting be the coefficient of in


because by Corollary 174 of Chapter 8, is the fraction of right nodes
of degree and hence
The theorem gives a lower bound on since

Corollary 205 If is a positive real number such that

for then where R is the code rate.


Hence if we want to complete decoding successfully, we cannot make
too small. In particular, if we want to have close to 1 – R, the
erasure channel capacity, must be large. On the other hand, since de-
coding complexity is the number of right nodes times the average right
node degree, we want to make small. This gives us a relationship be-
tween computational complexity and performance of erasure codes with
the described decoding algorithm. In particular, for a graph satisfy-
ing Equations (9.4) or (9.5), we can construct an erasure code that can
come arbitrarily close to the channel capacity at logarithmic sacrifice in
encoding and decoding complexity. The following lemma, also from [102], gives
another upper bound on in terms of the derivatives of and
Lemma 206 If is a positive real number such that

for then

Proof: If then for by


assumption. In other words,
We are now ready to show two degree sequences that can achieve
the theoretical limit on erasure channels. The first degree sequence is
due to Luby et al. [69] in which is the expansion of


truncated at the term and is the expansion of with
appropriately chosen to guarantee average right node degree equals
This degree sequence is called the “tornado sequence” and the associated
codes are called “tornado codes.”
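
The following Python sketch generates degree sequences in the form in which the tornado sequence is usually presented: a truncated, normalized harmonic series on the left and a truncated Poisson distribution on the right. The truncation points and the parameter alpha below are illustrative choices, not the values used in Theorem 207.

import math

def tornado_lambda(D):
    """lambda_i = 1 / (H(D) * (i - 1)) for i = 2, ..., D + 1, where H(D) is
    the D-th harmonic number; this is the truncated expansion of -ln(1 - x),
    normalised."""
    H = sum(1.0 / i for i in range(1, D + 1))
    return {i: 1.0 / (H * (i - 1)) for i in range(2, D + 2)}

def tornado_rho(alpha, max_degree):
    """Truncated Poisson edge-degree fractions, renormalised to sum to 1."""
    raw = {i: alpha ** (i - 1) * math.exp(-alpha) / math.factorial(i - 1)
           for i in range(1, max_degree + 1)}
    total = sum(raw.values())
    return {i: p / total for i, p in raw.items()}

lam = tornado_lambda(D=8)
rho = tornado_rho(alpha=5.0, max_degree=30)
print(sum(lam.values()), sum(rho.values()))   # both sum to 1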
For practical reasons, we truncate at a sufficiently large term so
that the next theorem holds.
Theorem 207 Define

where and If then


on (0,1].
Proof: Since is monotonically increasing in

On the other hand, which implies

Since the degree sequence in the theorem does not have minimum left degree 3, we
need to make some modifications for successful decoding. By the usual
technique, we make a small change in the structure of the graph, as
shown in the next theorem from [69].
Theorem 208 For any code rate R, and left nodes, there is a
code that can recover from a random erasure of
of bits in time with high probability.
Proof: Assume there are left nodes and right nodes, and so
Construct a new graph between left nodes and
right nodes whose degree sequence is and respectively, and a
new graph between left nodes and right nodes such that all left
nodes have degree 3, where Consider first. If
then Hence at this point, for all at


most fraction of left nodes are not recovered with exponentially high
probability. Now using all sets of size have the necessary
expansion with high probability and thus decoding finishes successfully
with high probability for To finish the proof, observe
that and set
Table 9.1, taken from [102], shows the parameters of the code defined by
and in Theorem 207. Note how close to the theoretical limit
the code can get.

We now show the second degree sequence due to Shokrollahi [102]


that allows erasure recovery at a rate arbitrarily close to the theoretical
limit. In order to show this, it will be convenient to use the next lemma.
Lemma 209 If is a positive real number and is an integer
then

where

and is a constant independent of and

Theorem 210 Define

for where for some and If


then
Proof: Since

and the rate of the code is

Calculating for the maximum value of for successful decoding, by


Lemma 209

where the right hand side of the inequality is simply


Hence in order for we need

But

and

where follows from Lemma 209.


Shokrollahi calls the degree sequence in Theorem 210 a right regular
degree sequence for the obvious reason. Right regular degree sequences
are motivated by the inequality

in Theorem 204. The usual calculation using the concavity of the log-


function shows that the inequality is tight if and only if all but one
are zero, hence tight if and only if right regular. The left degree
sequence was chosen as the Taylor expansion of truncated at
term where is an integer. Table 9.2 [102] lists the parameters
of the code defined by and in Theorem 210.
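
For comparison, the Python sketch below generates a right regular pair in the form in which the construction is usually presented: a single right degree a, and left edge-degree fractions taken from the truncated, renormalized Taylor expansion of 1 - (1 - x)^(1/(a - 1)). The truncation rule here is an illustrative assumption rather than the exact rule of Theorem 210.

import math

def right_regular_lambda(a, N):
    alpha = 1.0 / (a - 1)
    coeffs = {}
    for k in range(1, N + 1):
        # k-th Taylor coefficient of 1 - (1 - x)**alpha; coeffs[k] is the
        # coefficient of x**k, i.e. the fraction of edges on left nodes of
        # degree k + 1 (before normalisation)
        num = 1.0
        for j in range(k):
            num *= (alpha - j)
        coeffs[k] = (-1) ** (k + 1) * num / math.factorial(k)
    total = sum(coeffs.values())
    return {k: c / total for k, c in coeffs.items()}

def right_regular_rho(a):
    return {a: 1.0}     # all right nodes have degree a

print(right_regular_lambda(a=6, N=12))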
4. Irregular Repeat-Accumulate Codes


The error reducing codes in Section 2 do not guarantee correction
of all the corrupt variable bits that are received. For this case, we can cascade error
reducing codes with an error-correcting code to construct a code that
can correct all variable bits that are corrupted if the number of such
bits is not too large. We show one construction in this section and three
constructions in the next section. In this section we provide a construc-
tion for irregular repeat-accumulate codes. Irregular repeat-accumulate
codes are low-density generator codes cascaded (serially concatenated)
with an accumulating convolutional code. In its original conception due
to McEliece, repeat-accumulate codes consisted of a repetition code cas-
caded with an accumulator – hence the repeat-accumulate portion of the
name. Note that the associated graph of the repetition code is a regu-
lar (1/R, 1) graph where R is the rate of the code. Simulation results
have since shown that when the repetition code in the original concep-
tion, which is a special case of low-density generator code, is replaced
by an irregular low-density generator code, error control performance is
significantly improved.
In fact, on erasure channels, irregular repeat-accumulate codes can
come very close to the channel capacity with linear-time encoding and
decoding complexity.
Let a bipartite graph B between left nodes and right nodes repre-
sent a low-density generator code, and let A denote the rate-1 accumu-
lator convolutional code. A rate-1 accumulator convolutional code is a
convolutional code with transfer function For an input


the accumulator gives output where

To encode variable bits, use to obtain check bits, which


are used as variable input bits to encoder A to obtain check bits. The
code consisting of codewords of variable bits concatenated with check
bits from A is the irregular repeat-accumulate code The code rate
is The check bits in or variable bits in A will be called
intermediate bits. Hence in an irregular repeat-accumulate code, there
are variable bits, intermediate bits, and
check bits, and the codeword takes the form The
encoding complexity of an irregular repeat-accumulate code is clearly
linear, since it takes linear time to encode both and A.
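
A minimal Python sketch of this two-stage encoding is shown below; the bipartite graph is illustrative, and the accumulator is written as the usual running XOR (transfer function 1/(1 + D)).

def ldgm_checks(variable_bits, checks):
    # each intermediate bit is the XOR of its neighbouring variable bits
    return [sum(variable_bits[i] for i in nbrs) % 2 for nbrs in checks]

def accumulate(bits):
    out, acc = [], 0
    for b in bits:
        acc ^= b              # running XOR: the rate-1 accumulator
        out.append(acc)
    return out

def ira_encode(variable_bits, checks):
    intermediate = ldgm_checks(variable_bits, checks)
    return variable_bits + accumulate(intermediate)   # systematic codeword

# toy example: 4 variable bits, 3 intermediate bits
print(ira_encode([1, 0, 1, 1], [(0, 1), (1, 2, 3), (0, 3)]))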
We now provide an analysis for the irregular repeat-accumulate code
decoding algorithm as applied to erasure channels. This analysis shows
that irregular repeat-accumulate codes achieve very near the channel
capacity. There does not yet exist a similar analysis for the AWGN
channel; however, simulation results show that similar results do hold
for AWGN channels. The algorithm shows that decoding complexity is
also linear by noting that check bits are related through
Recall and represent the received variable bit and received check
bit, respectively.

Loss Recovery Algorithm:


Iterate the following four steps:
For all edges do the following in parallel:
If is not an erasure, set the message to
If this is the zeroth round and is an erasure, set the message
to an erasure
Else set and the message to any of the messages from adjacent
intermediate nodes in the previous round

sends to
For all edges do the following in parallel:


Set to an erasure if any of the messages from adjacent variable
nodes is an erasure

Else set the message to the exclusive-or of the messages from


adjacent variable nodes

sends to

For all edges do the following in parallel:


Set the message to an erasure if any message from intermediate
nodes but is an erasure or is an erasure

Else, set the message to the exclusive-or of the messages from


adjacent intermediate nodes but and

sends to
For all edges do the following in parallel:

Set the message to an erasure if all of the messages from adjacent


check nodes are erasures

Else set to the exclusive-or of the messages from adjacent vari-


able nodes but and any of the messages from adjacent check nodes.

Consider a right-regular bipartite graph B in which We


will see that while simplifying the analysis, this restriction does not pre-
vent the code from approaching the channel capacity arbitrarily closely.
The initial probability of erasure of a variable bit and a check bit is Let
and denote the probabilities of erasure of
and respectively. If we define
and then we have the following set of equa-
tions if decoding stops at a fixed point.
Combining Equations (9.6) and (9.7), we get

Combining this with Equation (9.8) gives

Combining this with Equation (9.6) we get the following equation if


decoding stops at a fixed point.

It suffices that this equation does not have a solution in the interval
(0,1] for the decoding to end successfully. In other words, we want

since for all Let

and let

where and Now satisfies Equation


(9.11) and generates a code that recovers a fraction of erasure. Since
the are non-negative and and exist and are
unique.
It is necessary to have this stronger condition, specifically,

because we cannot guarantee non-negative coefficients in the power ex-


pansion for the last expression in the above inequalities. However, the
loss in the code rate from this condition is arbitrarily small for suffi-
ciently large We leave it as an exercise to show that the rate of these
codes, approaches as goes to
infinity [54]. Numerical computation shows that it is not necessary to


have the above condition since many terms in the power expansion of the
last expression in the inequalities are non-negative. Table 9.3 [54] sum-
marizes the performance of several codes obtained using the described
technique. in the table is the maximum fraction of erasure that can
be recovered.

5. Cascaded Codes
In this section we consider three constructions for cascading error
reducing codes with an error-correcting code. All three constructions
share the property that they can be encoded and decoded in linear-
time and that they are defined for an infinite family. In particular, the
last construction that we show is able to correct the maximum possible
fraction of errors correctable by any code over a given alphabet size.
Since half the relative minimum distance of the code is the upper bound
on the maximum fraction of errors that can be corrected, is the
maximum fraction for binary codes and is the maximum fraction for
codes of large alphabet size for some arbitrarily small positive constant

The first construction is due to Luby et al. [69], who developed it for
erasure codes. We will apply their construction here to error correcting
codes and give a bound on the fraction of errors that can be corrected.
Let each bipartite graph have left nodes and
right nodes. We associate each graph with an error reducing code
that has variable bits and check bits, We also use
an error correcting code C that has variable bits and check
bits.
To encode variable bits, apply to obtain check bits. Next,
use the check bits from as the variable bits for to
obtain an additional check bits. Repeat this process until we use


check bits from as the variable bits for C, obtaining an
additional check bits. The resulting code is a cascade of the
codes and C which we denote by The code has
variable bits and

check bits, and is thus a code of rate looks as shown in Figure


9.6.

To decode the cascaded code we simply decode the individual codes


in reverse order. Since C is an error correcting
code, the check bits of the code are corrected and the variable
bits of can be corrected using the algorithms of Section 2. Since
the check bits of the code are known – they are the variable bits
of – we can repeat this process up to code completing the
decoding of By choosing a code C that can be encoded and decoded
in at most quadratic time and choosing such that the
code can be encoded and decoded in linear time.
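
The following Python sketch shows the shape of the cascade encoder: each stage's check bits feed the next stage, and the final block of check bits comes from the conventional code C. The stage encoders are passed in as functions mapping variable bits to check bits and are purely illustrative.

def cascade_encode(variable_bits, reducing_stages, final_encode):
    """reducing_stages: list of functions, each mapping its variable bits to
    its check bits; final_encode: the encoder of the code C."""
    blocks = [list(variable_bits)]
    current = list(variable_bits)
    for encode_stage in reducing_stages:
        current = encode_stage(current)       # check bits of this stage
        blocks.append(current)
    blocks.append(final_encode(current))      # check bits of the final code C
    # the cascaded codeword is the variable bits followed by all check bits
    return [b for block in blocks for b in block]

Decoding runs through the same stages in reverse order, exactly as described above.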
Through the application of Theorem 197, we have proved the following
theorem.
Theorem 211 Let be an irregular expander graph
between left nodes and right nodes. Let C be an error cor-
recting code of variable bits and check bits, that
can correct a random fraction of errors. Then is a rate
error-correcting code that can be encoded in linear time and can correct
a random fraction of errors in linear time.

The relative advantage of this cascaded method with regard to the


low-density parity-check codes discussed in Chapter 8 is that the cas-
caded method allows for the encoding of low-density generator codes in


linear time for equal error correcting capability. However, this method
may require a negligibly higher expansion, and assumes that errors oc-
cur at random positions. This assumption can be justified by randomly
permuting coordinates of a codeword prior to transmission such that
any form of non-adversarial noise results in random errors in the cas-
cade. If is a regular expander, then is a rate
error-correcting code that can be encoded in linear time and can correct
a fraction of random errors in linear time. We can also decode using
the Simple Parallel Error Reducing Algorithm. Again, if is a regular
expander, then similar analysis shows that is a rate
error-correcting code that can be encoded by a linear-size circuit of con-
stant depth and can correct a random fraction of errors in a linear-size
circuit of at most logarithmic depth.
Theorem 212 Let be an irregular expander in which
is greater than or equal to between left nodes and
right nodes. Let C be an error correcting code of variable bits and
check bits, that can correct a random fraction of
errors. Then is a rate error-correcting code that can be encoded
by a linear-size circuit of constant depth and can correct a random
fraction of errors in a linear-size circuit of at most logarithmic depth.

The second construction for cascaded codes is due to Spielman [105].


Spielman’s construction provides the first explicitly constructed asymp-
totically good code that is linear-time encodable and decodable. In this
section we will focus on the linear-time encodability and decodability
provided by the construction. Let be an error correcting code of
block length and code rate that can correct a fraction of errors.
For let be an error reducing code of variable bits and
check bits such that the block length and rate of the code is
and respectively. The reducible distance and error reduction
of is and respectively.
From and we shall construct error correcting codes
that can correct fraction of error. is defined in terms of
and
Each has variable bits and check bits. Given
variable bits which we will denote by for is used
as variable bits to which gives check bits which we denote
by Using as variable bits to gives check bits which
we denote by Using and as variable bits to gives
check bits which we denote by and are defined as the check
bits of Since the rate and the block


length of the code is and respectively. A figure of is shown
in Figure 9.7.
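
Read this way, the encoder has a simple recursive shape, sketched in Python below under the assumption that the three blocks of check bits are produced exactly in the order just described; the arguments are placeholders for the two error reducing codes and the smaller code, each given as a map from variable bits to check bits.

def construct_Ck(reduce_a, encode_prev, reduce_b):
    # reduce_a, reduce_b: encoders of the two error reducing codes;
    # encode_prev: encoder of the smaller code from the previous level
    def encode(x0):
        x1 = reduce_a(x0)             # first block of check bits
        x2 = encode_prev(x1)          # check bits of the previous-level code
        x3 = reduce_b(x1 + x2)        # second error reducing stage
        return x0 + x1 + x2 + x3      # variable bits followed by all check bits
    return encode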

To decode naturally, we perform error reduction in first,


then error correction in second, and then error reduction in
last. The next two theorems are from [105].

Theorem 213 If is an error reduction code with error reduction


and reducible distance that has linear-time encoding and decoding algo-
rithm, then are linear-time encodable and decodable error-correcting
codes of lengths and rate from which a fraction of errors can
be corrected.

Proof: is a and code that can correct a frac-


tion of errors. Since is a constant, we can both encode and decode
this code in constant time Let be the time to encode and
be the time to decode
Consider the encoding time complexity first. Assume that the time
to encode is The time to encode is the time to
encode plus the time to encode plus the time to encode
which is by induction

Consider now the decoding time complexity and capability. Assume


that we have a word that differs from a codeword of in at most
bits. It follows that there are at most errors in and and
after the bits in are used in to perform error reduction on the
bits in there are at most errors in the bits in As
we know, this process takes linear time. On the other hand, can
correct errors. Hence after the bits in are used in to


perform error correction on the bits there will be no error in the bits
in Likewise after the bits in are used in to perform error
reducing the bits in that has at most errors, there will be no
error in the bits in This also takes linear time. Hence by induction,
can be decoded in linear time.
As can be seen in the proof, unlike the first cascaded code construc-
tion, decoding for this code does not assume that errors occur at random
positions. Also note that the assumption on the error reducing code may
be relaxed slightly to give the same result. For example, it suffices
that output a word that differs from a codeword of in at most
variable bits for a received word that differs from a codeword
of in µ variable and check bits, where and output the
correct variable bits for a received word that differs from a codeword of
in µ variable bits and no check bits where

Theorem 214 If is an error reduction code of error reduction


and reducible distance that can be encoded and decoded by linear-size
circuits of constant depth, then are error-correcting codes of lengths
and rate from which a fraction of errors can be corrected. The
codes can be encoded by circuits of linear size and logarithmic depth and
decoded by circuits of size and logarithmic depth.

Proof: The encoding circuit complexity follows trivially. The proof of


the decoding circuit complexity is subtle in the sense that if we simulate
the recursive algorithm used in the proof of Theorem 213, we get a
circuit of depth Hence to get a quantitative improvement to
we do the following.
is used in to reduce the number of errors in
are considered as the received word of and thus are made up of
is used in to reduce the number of
errors in This will repeat until we reach the base code
which is decoded in constant time, and hence given a word that differs
from a codeword of in at most errors, the usual calculation
shows that there are no errors in the bits in and that there are
at most errors in the bits in Note that Up until
now, our circuit has logarithmic depth and linear size.
To finish the decoding, use the bits in to reduce errors in all of
simultaneously. Since there are at most errors in
the bits in and less than errors in the bits in
just before the decoding round, there will be at most bits
in error in after decoding round. Each decoding round requires
linear size circuit and hence the final decoding circuit has size
and depth
As in the previous theorem, the error correction capability for this code
does not depend on the position of errors, and the assumption on the error
reduction code can be relaxed to give the same result.
Combining the results from Theorems 197 and 213, we have proved
the following.
Theorem 215 There exists an infinite family of linear-time encodable
and decodable error-correcting codes from irregular ex-

pander graphs that can correct an fraction of error where

Through similar analysis, one can strengthen the fraction of error that
can be corrected to if the graphs are regular expanders.
Using the results from Theorems 199 and 214, we can obtain the parallel
version of the above theorem.
If the graphs are regular expanders, one can remove the
condition on and obtain the same results.
Theorem 216 There exists an infinite family of error-correcting codes
that can be encoded by circuits of linear size of logarithmic depth and
decoded by circuits of size of logarithmic depth from irregular
expander graphs with greater than or equal to

and can correct an fraction of error where

Lastly, let us look at codes due to Guruswami and Indyk [45]. The
codes have rate are defined over alphabet of size and can
correct the maximum possible fraction of errors which is for arbi-
trarily small We note that while the best known explicit codes with
large relative minimum distance achieve code rate of the decod-
ing complexity of these codes is at least cubic in the blocklength of the
code. Codes by Guruswami and Indyk achieve large minimum distance
and are linear-time encodable and decodable. In particular, Spielman’s
code just described can correct about fraction of errors with
regular bipartite graphs, while their code can correct about 0.5 fraction
of errors. We note, however, that their codes are only additive and not
linear. In other words, their codes are defined over a larger alphabet but
are only a vector space over GF(2).
The code is very simple to describe, and is defined by a bipartite
graph and an error correcting code The left nodes in the graph
represent the codeword of code and the right nodes represent the
codeword of We shall use the just described code constructed by
Spielman [105] as our and assume that can correct a fraction of
errors for some The codeword of is defined by sending the bits
on the left nodes to their neighbors; for each right node, its value
is obtained by concatenating the received bits. So the codeword takes
values in an alphabet of larger size than that of code For example,
if a right node has 3 neighboring left nodes whose values are
respectively, then the value of is
The motivation for such a transformation of a codeword of into a new
codeword is to turn a heavily corrupted codeword of into a less corrupted
codeword. This transformation can be efficiently facilitated through the
use of an expander graph as the bipartite graph which will enable the
code to have large minimum distance. Let’s then describe the bipartite
graph, B, used in the code. Let G be a Ramanujan graph of
vertices with that is equal to Take B as the double cover of
G such that is a graph with
In particular, take Code has rate since
has constant rate and the encoding complexity of is that of plus the
number of left nodes times the degree of left nodes which equals
The decoding algorithm for the code is as follows.

Decoding:

For each left node, set the value to the majority of the right neigh-
boring bits.
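
A Python sketch of this symbol-spreading and majority-vote recovery is given below; the graph representation is illustrative. The commentary that follows makes precise which bits take part in each vote.

def spread(left_bits, right_neighbours):
    """right_neighbours[r] = ordered list of left nodes feeding right node r;
    each right symbol is the tuple of those left bits."""
    return [tuple(left_bits[v] for v in nbrs) for nbrs in right_neighbours]

def majority_decode(right_symbols, right_neighbours, num_left):
    votes = [[0, 0] for _ in range(num_left)]
    for symbol, nbrs in zip(right_symbols, right_neighbours):
        for position, v in enumerate(nbrs):
            votes[v][symbol[position]] += 1   # the copy of bit v held here
    return [0 if zeros >= ones else 1 for zeros, ones in votes]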

Specifically, each left node has a number of neighboring right nodes


each of which contains a number of bits. It is the majority of these bits
that determines the value of the left node. By Lemma 156 of
Chapter 5, for all with if then we have
Now for all such that and where


we get

from Lemma 157 of Chapter 5.


Recall that is the number of edges between the ordered ver-
tices of S and The definition of implies that which
in turn implies that

Since has relative minimum distance at least implied by the fact


that can correct a fraction of errors, the bound on gives the
relative minimum distance of at least for the code
Suppose now we have a received word that has at most fraction
of errors. The decoding algorithm gives at most bits in error for the
left nodes, deducible from the bound on Since can correct a
fraction of errors by assumption and decoding complexity of is that of
plus the number of left nodes times the degree of left nodes which is
we have proved the following theorem [45].
Theorem 217 For all there exists an explicitly constructible code
of rate relative minimum distance at least alphabet size of
and blocklength that can be encoded in time and decoded
up to a fraction of errors in time.

6. Notes
Low-density generator codes were first empirically tested by Cheng
and McEliece [24], where they found that irregular codes perform better
than regular codes using belief propagation decoding. Spielman [105]
analyzed the potential of low-density generator codes through his simple
algorithms and showed that they can reduce the number of errors in
the variable bits. For this reason, he called the codes “error reducing
codes.” Through the recursive use of error reducing codes, he gave
the first explicit construction of a family of asymptotically good linear-
time encodable and decodable error-correcting codes. His codes can also
be encoded in constant-time and decoded in at most logarithmic-time


if a linear number of processors are used. His construction of linear-
time encodable and decodable error-correcting codes is related to the
construction of superconcentrators of [90]. Kim and Wicker [58, 59]
have extended Spielman’s analysis by considering irregular bipartite
graphs.
Guruswami and Indyk [45] constructed the first linear-time encodable
and decodable error correcting code that can correct up to the maximum
possible number of errors. In their paper, both binary codes and codes
over large alphabets that exhibit these properties are constructed. In
fact, in a recent paper [46] they improved the code rate in the construc-
tion such that their improved codes achieve the best code rate, encoding
and decoding complexity, and error correcting capability known.
On the erasure channel side, there were many good attempts [4, 8, 9]
to construct erasure codes that achieve capacity. Works by Luby et al.
[69] and Jin et al. [54] gave the best constructions in the sense that they
come closest to the theoretical limit for a given encoding and decoding
complexity, and vice versa. More recently, Oswald and Shokrollahi [86]
provided a systematic study of degree sequences whose associated codes
achieve capacity.
References

[1] S.M. Aji and R.J. McEliece, “A general algorithm for distributing information
on a graph,” Proc. 1997 IEEE Int. Symp. on Inform. Theory, Ulm, Germany,
July 1997.
[2] S.M. Aji, G.B. Horn and R.J. McEliece, “Iterative decoding on graphs with a
single cycle,” Proc. 1998 IEEE Int. Symp. on Inform. Theory, Boston, USA,
August 1998.
[3] M. Ajtai, J. Komlos and E. Szemeredi, “Deterministic simulation in logspace,”
Proc. 19th Annual ACM Symp. on Theory of Computing, pp. 132-139, 1987.
[4] A. Albanese, J. Blömer, J. Edmonds, M. Luby and M. Sudan, “Priority Encod-
ing Transmission,” IEEE Trans. Inform. Theory, vol. 42, no. 6, pp. 1737-1744,
Nov. 1996.
[5] N. Alon, “Eigenvalues and expanders,” Combinatorica, vol. 6, no. 2, pp. 83-96,
1986.
[6] N. Alon and F.R.K. Chung, “Explicit construction of linear sized tolerant net-
works,” Discr. Math., vol. 72, pp. 15-19, 1988.
[7] N. Alon, J. Bruck, J. Naor, M. Naor and R. Roth, “Construction of asymp-
totically good low-rate error-correcting codes through pseudo-random graphs,”
IEEE Trans. Inform. Theory, vol. 38, pp. 509-516, 1992.
[8] N. Alon, J. Edmonds, and M. Luby, “Linear Time Erasure Codes with Nearly
Optimal Recovery,” Proc. 36th Annual Symp. on Foundations of Computer
Science, pp. 512-519, 1995.
[9] N. Alon and M. Luby, “A Linear Time Erasure-Resilient Code with Nearly
Optimal Recovery,” IEEE Trans. Inform. Theory, vol. 42, no. 6, pp. 1732-1736,
Nov. 1996.
[10] N. Alon and J.H. Spenser, The Probabilistic Method. New York: Wiley, 2000.
[11] L.R. Bahl, J. Cocke, F. Jelinek and J. Raviv, “Optimal decoding of linear codes
for minimizing symbol error rate,” IEEE Trans. Inform. Theory, vol. 20, pp.
284-287, Mar. 1974.
[12] A. Barg, “Complexity Issues in Coding Theory,” Handbook on Coding Theory,


editors V. Pless and W.C. Huffman. Amsterdam, Elsevier Publishing, 1998.
[13] A. Barg and G. Zémor, “Error exponents of expander codes,” IEEE Trans.
Inform. Theory, vol.48, pp. 1725-1729, 2002.
[14] A. Barg and G. Zémor, “Error exponents of expander codes under linear-
complexity decoding,” manuscript, 2001.
[15] S. Benedetto and G. Montorsi, “Unveiling Turbo Codes: Some Results on Par-
allel Concatenated Coding Schemes,” IEEE Trans. Inform. Theory, vol. 42, no.
2, pp. 409-428, Mar. 1996.
[16] S. Benedetto and G. Montorsi, “Design of Parallel Concatenated Convolutional
Codes,” IEEE Trans. Commun., vol. 44, no. 5, pp. 591-600, May 1996.
[17] S. Benedetto, D. Divsalar, G. Montorsi and F. Pollara, “Serial Concatenation
of Interleaved Codes: Performance Analysis, Design, and Iterative Decoding,”
IEEE Trans. Inform. Theory, vol. 44, no. 3, pp. 909-926, May 1998.
[18] E.R. Berlekamp, H. Van Tilborg and R.J. McEliece, “On the inherent intractabil-
ity of certain coding problems,” IEEE Trans. Inform. Theory, vol. 24, pp. 384-
386, 1978.
[19] C. Berrou, A. Glavieux and P. Thitimajshima, “Near Shannon limit error-
correcting coding and decoding: Turbo-codes(l),” Proc. IEEE Int. Conf. on
Communications, Geneva, Switzerland, May 1993.
[20] C. Berrou and A. Glavieux, “Near Optimum Error Correcting Coding and De-
coding: Turbo-Codes,” IEEE Trans. Commun., vol. 44, no. 10, pp. 1261-1271,
Oct. 1996.
[21] R. C. Bose and D. K. Ray-Chaudhuri, “On a Class of Error Correcting Binary
Group Codes,” Information and Control, Volume 3, pp. 68 - 79, March 1960.
[22] R. C. Bose and D. K. Ray-Chaudhuri, “Further Results on Error Correcting Bi-
nary Group Codes,” Information and Control, Volume 3, pp. 279 - 290, Septem-
ber 1960.
[23] D. Burshtein and G. Miller, “Expander Graph Arguments for Message-Passing
Algorithms,” IEEE Trans. Inform. Theory, vol. 47, pp. 782-790, Feb. 2001.
[24] J.-F. Cheng and R.J. McEliece, “Some High-Rate Near Capacity Codecs for
the Gaussian Channel,” Proc. 34th Allerton Conference on Communications,
Control and Computing, 1996.
[25] S-Y Chung, G.D. Forney Jr., T. Richardson and R. Urbanke, “On the design of
low-density parity-check codes within 0.0045 dB of the Shannon limit,” IEEE
Commun. Lett., vol. 5, pp. 58-60, Feb. 2001.
[26] S-Y Chung, T. Richardson and R. Urbanke, “Analysis of Sum-Product Decoding
of Low-Density Parity-Check Codes Using a Gaussian Approximation,” IEEE
Trans. Inform. Theory, vol. 47, pp. 657-670, Feb. 2001.
[27] G.F. Cooper, “The Computational Complexity of Probabilistic Inference Using


Bayesian Belief Networks,” Artificial Intelligence, vol. 42, pp. 393-405, 1990.

[28] T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley


& Sons, Inc., 1991.

[29] P. Dagum and M. Luby, “Approximating probabilistic inference in Bayesian


belief networks is NP-hard,” Artificial Intelligence, vol. 60, pp. 141-153, 1993.

[30] M.C. Davey and D.J.C. MacKay, “Low-Density Parity-Check Codes over
GF(q),” IEEE Commun. Letters, vol. 2., no. 6, June 1998.

[31] D. Divsalar and F. Pollara, “Multiple Turbo Codes for Deep-Space Communi-
cations,” TDA Progress Report 42-121, pp. 66-77, May 15, 1995.

[32] D. Divsalar and F. Pollara, “Turbo Codes for PCS Applications,” Proc. IEEE
Int. Conf. on Communications, Seattle, Washington, June 1995.

[33] D. Divsalar and R.J. McEliece, “On the Design of Generalized Concatenated
Coding Systems with Interleavers,” manuscript, 1998.

[34] S. Dolinar and D. Divsalar, “Weight Distributions for Turbo Codes Using Ran-
dom and Nonrandom Permutations,” TDA Progress Report 42-122, pp. 56-65,
August 15, 1995.

[35] H. El Gamal and A.R. Hammons, Jr, “Analyzing the turbo decoder using the
Gaussian approximation,” IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 671-
686, Feb. 2001.

[36] P. Elias, “Coding for Noisy Channels,” IRE Conv. Record, Part 4, pp. 37 - 47,
1955.

[37] G. D. Forney, Jr.. Concatenated Codes, Cambridge: MIT Press, 1966.

[38] G.D. Forney, Jr., “The forward-backward algorithm,” Proc. 34th Allerton Con-
ference on Communications, Control and Computing, 1996.

[39] B.J. Frey, Graphical Models for Machine Learning and Digital Communication.
The M.I.T. Press, Cambridge, MA, 1998.

[40] R.G. Gallager, “Low-density parity-check codes,” IRE Trans. Inform. Theory,
vol. 8, pp. 21-28, Jan. 1962.

[41] R.G. Gallager, Low-Density Parity-Check Codes. The M.I.T. Press, Cambridge,
MA, 1963.

[42] E.N. Gilbert, “A Comparison of Signaling Alphabets,” Bell Sys. Tech. J., vol.
31, pp. 504-522, 1952.

[43] D. Gorenstein and N. Zierler, “A Class of Error Correcting Codes in p^m Sym-
bols,” Journal of the Society of Industrial and Applied Mathematics, Volume 9,
pp. 207 - 214, June 1961.
[44] W.C. Gore, “Transmitting Binary Symbols with Reed-Solomon Codes,” Pro-
ceedings of the Princeton Conference on Information Science and Systems,
Princeton, New Jersey, pp. 495 - 497, 1973.
[45] V. Guruswami and P. Indyk, “Linear-time Codes to Correct a Maximum Pos-
sible Fraction of Errors,” Proc. 39th Allerton Conference on Communications,
Control and Computing, 2001.
[46] V. Guruswami and P. Indyk, “Near-optimal linear-time codes for unique decod-
ing and new list-decodable codes over smaller alphabets,” preprint, 2002.
[47] J. Hagenauer, E. Offer and L. Papke, “Iterative decoding of binary block and
convolutional codes,” IEEE Trans. Inform. Theory, vol. 42, no. 2, pp. 429-445,
Mar. 1996.
[48] C. Heegard and S.B. Wicker, Turbo Coding. Kluwer Academic Press, 1998.

[49] S. Hirasawa, M. Kasahara, Y. Sugiyama and T. Namekawa, “Modified Product


Codes,” IEEE Trans. Inform. Theory, vol. 30, no. 2, pp. 299-306, Mar. 1984.
[50] A. Hocquenghem, “Codes Correcteurs d’Erreurs,” Chiffres, Volume 2, pp. 147
- 156, 1959.
[51] T. W. Hungerford, Algebra, New York: Springer-Verlag, 1974.

[52] K. A. S. Immink, “RS Codes and the Compact Disc,” in Reed Solomon Codes
and Their Applications, (Stephen Wicker and Vijay Bhargava, ed.) , IEEE Press,
1994.
[53] F.V. Jensen, S.L. Lauritzen and K.G. Olesen, “Bayesian updating in recursive
graphical models by local computation,” Computational Statistical Quarterly,
vol. 4, pp. 269-282, 1990.
[54] H. Jin, A. Khandekar and R. McEliece, “Irregular Repeat-Accumulate Codes,”
Proc. 2nd. International Conf. Turbo Codes, Brest, France, pp. 1-8, Sept. 2000.
[55] N. Kahale, “Expander Graphs,” Ph.D. dissertation, M.I.T., 1993.

[56] M. Kasahara, Y. Sugiyama, S. Hirasawa and T. Namekawa, “New classes of


binary codes constructed on the basis of concatenated codes and product codes,”
IEEE Trans. Inform. Theory, vol. IT-22, pp. 462-468, July 1976.
[57] S. Kim, “Probabilistic Reasoning, Parameter Estimation, and Issues in Turbo
Decoding,” Ph.D. dissertation, Cornell University, 1998.
[58] S. Kim and S.B. Wicker, “Thoughts on Expander Codes: Codes via Irregu-
lar Bipartite Graphs,” Annual Conf. on Information Sciences and Systems ’00,
Princeton, USA, 2000.
[59] S. Kim and S.B. Wicker, “Linear-Time Encodable and Decodable Irregular
Graph Codes,” Proc. 2000 IEEE Int. Symp. on Inform. Theory, Italy, 2000.
[60] F.R. Kschischang and B.J. Frey, “Iterative Decoding of Compound Codes by
Probability Propagation in Graphical Models,” IEEE Journal on Selected Areas
in Commun., vol. 16, pp. 219-230, Feb. 1998.
[61] F.R. Kschischang, B.J. Frey and H-A Loeliger, “Factor Graphs and the Sum-
Product Algorithm,” IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 498-519,
Feb. 2001.

[62] J. Lafferty and D.N. Rockmore, “Spectral Techniques for Expander Codes,”
Proc. 29th Annual ACM Symposium on Theory of Computing, pp. 160-167,
1997.

[63] J. Lafferty and D.N. Rockmore, “Codes and Iterative Decoding on Algebraic
Expander Graphs,” Int. Symp. Inform. Theory and Appl., Nov. 2000.

[64] S.L. Lauritzen and D.J. Spiegelhalter, “Local Computation with Probabilities
on Graphical Structures and Their Application to Expert Systems,” Journal of
the Royal Statistical Society, Series B, vol. 50, pp. 157-224, 1988.

[65] S. Le Goff, A. Glavieux and C. Berrou, “Turbo-Codes and High Spectral Effi-
ciency Modulation,” Proc. IEEE Int. Conf. on Communications, New Orleans,
USA, May 1994.

[66] R. Lidl and H. Niederreiter, Finite Fields, Reading, Mass.: Addison Wesley,
1983.
[67] A. Lubotzky, R. Phillips and P. Sarnak, “Ramanujan Graphs,” Combinatorica,
vol. 8, no. 3, pp. 261-277, 1988.

[68] S. Lin and E.J. Weldon, “Further Results on Cyclic Product Codes,” IEEE
Trans. Inform. Theory, vol. IT-16, no. 4, pp. 452-459, July 1970.

[69] M. Luby, M. Mitzenmacher, M.A. Shokrollahi, D.A. Spielman and V. Stemann,


“Practical Loss-Resilient Codes,” Proc. 29th Annual ACM Symp. on Theory of
Computing, pp. 150-159, 1997.

[70] M. Luby, M. Mitzenmacher and M.A. Shokrollahi, “Analysis of Random Pro-


cesses via And-Or Trees,” in Proc. 9th Symp. on Discrete Algorithms, pp. 364-
373, 1998.

[71] M. Luby, M. Mitzenmacher, M.A. Shokrollahi and D.A. Spielman, “Analysis of


Low Density Codes and Improved Designs Using Irregular Graphs,” Proc. 30th
Annual ACM Symposium of Theory of Computing, pp. 249-258, 1998.

[72] M. Luby, M. Mitzenmacher, M.A. Shokrollahi and D.A. Spielman, “Improved


Low-Density Parity-Check Codes Using Irregular Graphs and Belief Propaga-
tion,” Proc. 1998 IEEE Int. Symp. on Inform. Theory, Boston, USA, August
1998.

[73] D.J.C. MacKay and R.M. Neal, “Good error-correcting codes based on very
sparse matrices,” Cryptography and Coding, Lecture Notes in Computer Science
no. 1025, pp. 100-111, Springer-Verlag, 1995.

[74] D.J.C. MacKay and R.M. Neal, “Near Shannon limit performance of low density
parity check codes,” Electron. Lett., vol. 32, no. 18, pp. 1645-1646, Aug. 1996;
reprinted Electron. Lett., vol. 33, no. 6, pp. 457-458, Mar. 1997.
[75] D.J.C. MacKay, “Good Error-Correcting Codes based on Very Sparse Matrices,”
IEEE Trans. Inform. Theory, vol. 45, pp. 399-431, Mar. 1999.
[76] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error Correcting Codes,
Amsterdam: North Holland, 1977.
[77] G.A. Margulis, “Explicit constructions of concentrators,” Probl. Inform.
Transm., vol. 9, pp. 325-332, 1973.
[78] G.A. Margulis, “Explicit constructions of graphs without short cycles and low
density codes,” Combinatorica, vol. 2, pp. 71-78, 1982.
[79] G.A. Margulis, “Explicit group-theoretical constructions of combinatorial
schemes and their applications to the design of expanders and concentrators,”
Probl. Inform. Transm., vol. 24, pp. 39-46, 1988.
[80] R. J. McEliece, E. R. Rodemich, H. C. Rumsey, Jr. and L. R. Welch, “New Upper
Bounds on the Rate of a Code using the Delsarte-MacWilliams Inequalities,”
IEEE Trans. Inform. Theory, vol. 23, pp. 157-166, 1977.
[81] R. J. McEliece, Finite Fields for Computer Scientists and Engineers, Boston:
Kluwer Academic Publishers, 1987.
[82] R.J. McEliece, E. Rodemich and J.-F. Cheng, “The Turbo Decision Algorithm,”
Proc. 33rd Allerton Conference on Communication, Control and Computing,
1995.
[83] R.J. McEliece, D.J.C. MacKay and J.-F. Cheng, “Turbo Decoding as an In-
stance of Pearl’s ‘Belief Propagation’ Algorithm,” IEEE Journal on Selected
Areas in Commun., vol. 16, pp. 140-152, Feb. 1998.
[84] G. Miller and D. Burshtein, “Bounds on the Maximum-Likelihood Decoding
Error Probability of Low-Density Parity-Check Codes,” IEEE Trans. Inform.
Theory, vol. 47, pp. 2696-2710, Nov. 2001.
[85] R. Motwani and P. Raghavan, Randomized Algorithms. Cambridge University
Press, 1995.
[86] P. Oswald and M.A. Shokrollahi, “Capacity-Achieving Sequences for the Erasure
Channel,” manuscript, 2000.
[87] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible
Inference. Morgan Kaufmann Publishers, Inc., San Mateo, CA, 1988.
[88] W. W. Peterson, "Encoding and Error-Correction Procedures for the Bose-
Chaudhuri Codes," IRE Transactions on Information Theory, vol. IT-6,
pp. 459-470, September 1960.
[89] L.C. Perez, J. Seghers and D.J. Costello, Jr., “A distance spectrum interpreta-
tion of turbo codes,” IEEE Trans. Inform. Theory, vol. 42, pp. 1698-1709, Nov.
1996.
[90] N. Pippenger, “Superconcentrators,” SIAM Journal of Computing, vol. 6, pp.
298-304, 1977.

[91] R. Pyndiah, A. Glavieux, A. Picart and S. Jacq, "Near Optimum Decoding of
Product Codes," Proc. of Globecom 94, vol. 1, pp. 339-343, Nov. 1994.
[92] I. S. Reed, "A Class of Multiple-Error-Correcting Codes and a Decoding
Scheme," IRE Transactions on Information Theory, vol. 4, pp. 38-49,
September 1954.
[93] T. Richardson and R. Urbanke, “The capacity of low-density parity check codes
under message-passing decoding,” IEEE Trans. Inform. Theory, vol. 47, pp.
599-618, Feb. 2001.
[94] T. Richardson, M.A. Shokrollahi and R. Urbanke, “Design of capacity-
approaching irregular low-density parity-check codes,” IEEE Trans. Inform.
Theory, vol. 47, pp. 619-637, Feb. 2001.
[95] T. Richardson and R. Urbanke, “Efficient Encoding of Low-Density Parity-
Check Codes,” IEEE Trans. Inform. Theory, vol. 47, pp. 638-656, Feb. 2001.
[96] J. Rosenthal and P. Vontobel, “Construction of Low-Density Parity-Check
Codes using Ramanujan Graphs and Ideas from Margulis,” Proc. 38th Aller-
ton Conference on Commun. Control and Computing, Monticello, Illinois, Oct.
2000.
[97] P. Rusmevichientong and B. Van Roy, "An analysis of belief propagation on the
turbo decoding graph with Gaussian densities," IEEE Trans. Inform. Theory,
vol. 47, no. 2, pp. 745-765, Feb. 2001.
[98] E. Sakk and S. B. Wicker, "Finite Field Wavelet Packets for Error Control Cod-
ing," Proceedings of the 39th Annual Allerton Conference on Communication,
Control and Computing, Urbana-Champaign, IL, October 2001.
[99] P. Sarnak, Some Applications of Modular Forms. Cambridge University Press,
1990.
[100] C.E. Shannon, "A Mathematical Theory of Communication," Bell Syst. Tech.
J., vol. 27, pp. 379-423 and pp. 623-656, 1948.
[101] S.E. Shimony, “Finding MAPs for belief networks is NP-hard,” Artificial Intel-
ligence, vol. 68, pp. 399-410, 1994.
[102] M.A. Shokrollahi, “New Sequences of Linear Time Erasure Codes approaching
the Channel Capacity,” Proc. AAECC-13, Lecture Notes in Computer Science
no. 1719, pp. 65-76, 1999.
[103] M. Sipser and D.A. Spielman, “Expander Codes,” IEEE Trans. Inform. Theory,
vol. 42, pp. 1710-1722, Nov. 1996.
[104] P. Smyth, D. Heckerman and M.I. Jordan, “Probabilistic Independence Net-
works for Hidden Markov Probability Models,” Neural Computation, vol. 9, pp.
227-269, 1997.
[105] D.A. Spielman, “Linear-Time Encodable and Decodable Error-Correcting
Codes,” IEEE Trans. Inform. Theory, vol. 42, no. 6, pp. 1723-1731, Nov. 1996.

[106] O. Y. Takeshita, O. M. Collins, P. C. Massey and D. J. Costello, Jr., "A note
on asymmetric turbo-codes," IEEE Commun. Letters, vol. 3, no. 3, pp. 69-71,
Mar. 1999.
[107] R.M. Tanner, “A recursive approach to low complexity codes,” IEEE Trans.
Inform. Theory, vol. 27, pp. 533-547, Sept. 1981.
[108] R.M. Tanner, “Explicit concentrators from generalized n-gons,” SIAM Journal
of Alg. Disc. Meth., vol. 5, no. 3, pp. 287-293, Sept. 1984.
[109] R.M. Tanner, “Minimum Distance Bounds by Graph Analysis,” manuscript.

[110] P. Thitimajshima, "Les codes convolutifs récursifs systématiques et leur appli-
cation à la concaténation parallèle," (in French), Ph.D. thesis no. 284, Université
de Bretagne Occidentale, Brest, France, Dec. 1993.
[111] A. Tietäväinen, "On the Nonexistence of Perfect Codes over Finite Fields,"
SIAM Journal of Applied Mathematics, vol. 24, pp. 88-96, 1973.
[112] A.J. Viterbi, “Error bounds for convolutional codes and an asymptotically op-
timum decoding algorithm,” IEEE Trans. Inform. Theory, vol. 13, pp. 260-269,
Apr. 1967.
[113] Y. Weiss, "Correctness of local probability propagation in graphical models with
loops," Neural Computation, vol. 12, pp. 1-41, 2000.
[114] Y. Weiss and W.T. Freeman, “On the Optimality of Solutions of the Max-
Product Belief-Propagation Algorithm in Arbitrary Graphs,” IEEE Trans. In-
form. Theory, vol. 47, pp. 736-744, Feb. 2001.
[115] N. Wiberg, H.-A. Loeliger and R. Kötter, “Codes and iterative decoding on gen-
eral graphs,” European Trans. on Telecommun., vol. 6, pp. 513-525, Sep./Oct.
1995.
[116] S. B. Wicker, Error Control Systems for Digital Communication and Storage,
Englewood Cliffs: Prentice Hall, 1995.
[117] S. B. Wicker, “Deep Space Applications,” Handbook of Coding Theory, (Vera
Pless and William Cary Huffman, ed.), Amsterdam: Elsevier, 1998.
[118] J.S. Yedidia, W.T. Freeman and Y. Weiss, “Bethe free energy, Kikuchi approx-
imations and belief propagation algorithms,” manuscript, 2001.
[119] G. Zémor, "On Expander Codes," IEEE Trans. Inform. Theory, vol. 47, pp. 835-
837, Feb. 2001.
[120] V.V. Zyablov and M.S. Pinsker, “Estimation of the error correction complexity
of Gallager low-density codes,” Probl. Inform. Transm., vol. 11, no. 1, pp. 18-28,
May 1976.
Index

expander, 83 asymptotic behavior, function, 5


expander, 83 asymptotically good code, 147, 174
code, 2 augmented codes, 42
a priori information, 122, 129 Azuma’s inequality, 85, 86, 90, 156

abelian, 14 Bahl, L. R., xvi, 209


abstract algebra, 12 Barg, A., 147, 150, 174, 210
achievable rate, 2 basis, 24
additive white Gaussian noise channel, Baum, L., xvi
169 Baum-Welch algorithm, xvi
adjacency matrix, 82, 137, 177 Bayesian network, 95, 99, 123
Aji, S. M., 209 low-density parity-check code, 164
Ajtai, M., 92, 209 BCH bound, 54, 57
Albanese, A., 209 BCH codes, xv, 53
algebraic block codes, 12 design procedure, 56
algebraic coding theory, 12 narrow-sense, 55
algorithm primitive, 55
BCJR, 94, 112, 113, 115 BCJR algorithm, 94, 112, 113, 115
belief propagation, 93, 99, 103, 104, belief propagation, xvii, xviii, 12, 93, 99,
109, 112 103, 104, 109, 112, 123, 162,
Berlekamp’s, xv 174
error reducing, 182, 183 Benedetto, S., 77, 132, 210
Euclid’s, 18 Berger, T., xix
exponential time, 5 Berlekamp’s algorithm, xv, 13
Gallager, 151–153, 181, 187 Berlekamp, E. R., 210
junction tree propagation, 93, 104, Berrou, C., xvii, 61, 70, 71, 121, 122, 210
106, 108, 112, 114, 115 Bethe free energy, 117
loss recovery, 197 Bhargava, V. K., xv, 212
message-passing, 93, 103, 115–118, binary erasure channel (BEC), 4
120 binary symmetric channel (BSC), 4, 169
nondeterministic, 5 bipartite graph, 79, 80, 137, 177
polynomial time, 5 Blömer, J., 209
probabilistic reasoning, 93, 99, 100, Boolean function, xiv, 46
104, 106 boolean net function language, 45
Viterbi, 94, 112, 113 Bose, R. C., xv, 53, 210
Alon, N., 87, 174, 209 bound
ancestor, 97 BCH, 54, 57
ancestor set, 97 Gilbert, 9
AND-OR tree, 188 Gilbert-Varshamov, 11, 42, 150
associativity, 14 Hamming, 9

McEliece-Rodemich-Rumsey- Welch, 11 inner, 61


Singleton, 11, 57, 150 lengthened, 42
sphere packing, 9 low-density generator, 12, 177, 179–
bounded distance decoding, 7 181, 187
Bruck, J., 209 irregular, 178
Burshtein, D., 210 regular, 178
low-density parity-check, 12, 137,
capacity, 61 177, 187
cardinality, 13 Bayesian network representation,
cascaded code, 200, 202 164
decoding, 201 maximum cardinality, 9
Spielman’s construction, 202 maximum distance separable (MDS),
Cauchy-Schwarz inequality, 89 12
Cayley graph, 87, 148, 174 outer, 61
CCSDS standard for deep space teleme- parallel concatenated, 71
try, 61 parity-check, 44
cellular telephony, 61 perfect, 9, 52
channel capacity, 4 product, 58
characteristic, 26 punctured, 42
check node, 177 quadratic residue, 51
confused, 186 rate, 40
unhelpful, 186 Reed-Muller, 39, 45
Cheng, J.-F., xvii, 207, 210 duals, 47
child, 97 Reed-Solomon, 13, 39, 53, 57, 69
Chinese remainder theorem, 58 repeat-accumulate, 12, 196
chord, 105 repetition, 43
chromatic number, 80, 86 shortened, 42
Chung, F. R. K., 209 systematic, 180
Chung, S. Y., 210 tornado, 193
class NP, 5 code polynomial, 49
clique, 104 codeword, 2
clique graph, 104 finite, 71
closure, 14 node, 137
Cocke, J., xvi, 209 Collins, O. M., xvi, 215
code common divisors, 18
asymptotically good, 11, 147, 174 commutative, 14
augmented, 42 complexity, 5
BCH, 53 component code, 68, 123
cascaded, 200, 202 component encoder, xvii, 70
decoding, 201 concatenated code, 12, 61, 147
Spielman’s construction, 202 serial, 61
component, 123 conditional entropy, 3
concatenated, 12, 61, 68, 147 confused, 186
construction, 6 conjugacy class, 29
convolutional, 12, 61, 65 conjugates of field elements, 29
cyclic, 13, 49 connected graph, 96
decoding, 6 constraint
dimension, 40 length, 63
encoding, 6 node, 137
error reducing, 181 degree, 138
expander, 174 satisfied, 143
expurgated, 42 unsatisfied, 143
extended, 42 Consultative Committee for Space Data
Golay, 39, 51 Systems (CCSDS), 68
Hamming, 44 convolutional code, 12, 61, 65
duals, 47 convolutional encoder, 62
high girth, 151 nonrecursive, 62
nonsystematic, 63 discrete memoryless channel, 1
systematic, 63 distributive law, 16
Cooper, G. F., 210 Divsalar, D., 77, 210
coset, 15 Dolinar, S., 211
cyclotomic, 33 double cover, 81, 88
Costello, D. J., Jr., 214 dual space, 25
Cover, Thomas M., 211
cyclic codes, xv, 13, 49 edge exposure martingale, 84
cyclic graph, 96 edge-vertex incidence graph, 79, 81, 92,
cyclic product code, 58 148
cyclotomic cosets, 33 Edmonds, J., 209
effective free distance, 77
eigenvalue, 79, 82, 83, 86, 88
D transform, 63
graph, 139
D-Separation, 98
El Gamal, H., 211
Dagum, P., 211
Eldridge, N., xiii
Davey, M. C., xviii, 174, 211
Elias, P., xv, 61, 211
decoding, 2, 6
encoders
belief propagation, 162
component, 70
low-density parity-check code, convolutional, 62
164
parallel concatenated, 70
bounded distance, 7
recursive convolutional, 63
Gallager
recursive systematic, 66, 71
performance, 157 encoding, 6
hard decision, 7 entropy, 3
low-density parity-check code, 143, conditional, 3
151 joint, 3
maximum likelihood, 8, 112, 113, equivalent tree, 115
115 error reducing algorithm, 182, 183
maximum a posteriori (MAP), 8 error reducing code, 181
nearest-codeword, 7 Euclid’s algorithm, 18
soft decision, 8 extended form, 20
symbol-by-symbol MAP, 8 Euclidean domain, 17
turbo, 125 Euler function, 22
Viterbi, 69 European Space Agency (ESA), 70
deep space telecommunications, xvi, 39 evidence, 100
degree expander code, 174
constraint node, 138 expander graph, 79, 83, 175
variable node, 138 expansion, 79, 139, 142, 143, 184, 187
vertex, 80 bound, 83
degree sequence, 174, 188 Expectation-Maximization (EM) algo-
irregular code rithms, xvii
good, 170 explaining away, 98
node, 138 exponential time complexity, 5
right regular, 195 expurgated codes, 42
depth extended codes, 42
logical circuit, 6 extended form of Euclid’s algorithm, 20
descendent, 97 extended Hamming codes, 44
designed distance, 55 extended Reed-Solomon codes, 57
digital audio, 39 extrinsic information, 122, 129
dimension
code, 40 factoring 33
vector space, 24 Fano, xv
dimension theorem, 26 field, 20
directed acyclic graph (DAG), 97 Galois, 21
directed graph, 80, 95 order 21
disconnected graph, 80 order 26
discrete channel, 1 Fine, T., xix

finite codewords, 71 expansion, 139, 142, 143, 184, 187


Forney, G. D., Jr., xvi, 61, 115, 147, 210, high girth, 173
211 irregular, 80
fraction of errors, 144 junction tree, 104
fractional rate loss, 62 loopy, 115
Freeman, W. T., 216 moral, 95
Frey, B., xvii, 134, 211 multiply-connected, 97
function path-l-vertex incidence, 92
Euler 22 polytree, 97
incidence, 178 Ramanujan, 87, 88, 148–150, 175,
206
Galileo, 70 random, 84
Gallager, R. G., xviii, 12, 137, 151, 173, regular, 79, 80
174, 211 singly-connected, 97
decoding algorithms, 151–153, 181, tree, 97
187 equivalent, 115
performance, 157 triangulated, 105
Galois field, 13, 21 unconnected, 96
undirected, 80
Fourier transform, 34
transform pair, 34 graph theory, xviii, 12, 79
greatest common divisors, 18
multiplicative structure, 22
ground field, 23
order 21
Guruswami, V., 205, 208, 212
order 26
primitive element, 23 Hagenauer, J., 212
Galois, Evariste, 21 Hamming
gaussian approximation, 174 bound, xiv, 9
generator matrix, 40 codes, xiv, 44
convolutional code, 64 codes, extended, 44
generator polynomial, 50 distance, 2
generator sequence, 63 Hamming, R. W., xiv, 39
Gilbert bound, 9 Hammons, A. R., Jr., 211
Gilbert, E. N., 211 hard decision decoding, 7
Gilbert-Varshamov bound, 11, 42, 150 head-to-head, 98
Giotto, 70 Heckerman, D., 215
Glavieux, A., xvii, 61, 70, 71, 121, 122, Heegard, C., xvii, 212
210 high girth code, 151
Golay codes, xiv, 51 high girth graph,173
extended, 52 Hirasawa, S., 212
ternary, 52 Hocquenghem, A., xv, 53, 212
Golay, M., xiv Horn, G. B., 209
Gore, W. C., 212 Huffman, W. C., xvi, 216
Gorenstein, D., 53, 211 Hungerford, T. W., 212
Gould, S. J., xiii
graph ideals, 13, 37
adjacency matrix, 82, 177 principle, 37
bipartite, 79, 80, 137, 177 identity, 14
Cayley, 87, 148, 174 Immink, Kees A. S., 212
chromatic number, 80, 86 incidence function, 178
clique, 104 Indyk, P., 205, 208, 212
connected, 96 inequality
directed, 80 Azuma’s, 156
directed acyclic (DAG), 97 information
disconnected, 80 priori, 122, 129
edge-vertex incidence, 79, 81, 92, extrinsic, 122, 129
148 systematic, 122, 129
eigenvalue, 79, 82, 83, 86, 88, 139 information node, 177
expander, 79, 83, 175 inner code, 61, 68
input-output weight enumerating func- regular, 178
tion (IOWEF), 67 low-density parity-check code, 12, 137,
input-redundancy weight enumerating 177, 187
function (IRWEF), 65 Bayesian network representation,
Intelsat, 54 164
interleaver decoding, 143, 151
uniform, 72 belief propagation, 162, 164
inverses, 14 regular, 137
irreducible polynomials, 28 Lubotzky, A., 87, 148, 213
irregular graph, 80 Luby, M., xviii, 174, 193, 200, 208, 209
iterative decoding, xvii
MacKay, D. J. C., xvii, xviii, 174, 211
Jacq, S., 214 MacWilliams, F. J., 214
Jelinek, F., xvi, 209 Margulis, G. A., 87, 148, 173, 214
Jensen, F. V., 212 Markov random field, 95, 99
Jin, H., 208, 212 martingale, 79, 83, 155, 188
joint entropy, 3 edge exposure, 84
Jordan, M. I., 215 sequence, 84, 85, 90
junction tree, 104 vertex exposure, 85, 86
junction tree propagation algorithm, 93, Mason’s gain rule, 73
104, 106, 108, 112, 114, 115 Massey, P. C., 215
Mattson-Solomon polynomial, 34
maximum cardinality of a code, 9
Kötter, R., 216
maximum distance separable (MDS), 12
Kahale, N., 92, 212 maximum likelihood decoding, 8, 112,
Kasahara, E. M., 212 113, 115
Khandekar, A., 212 maximum likelihood sequence decoding,
Kim, S., 208, 212 xv
Komlos, J., 209 maximum a posteriori (MAP) decoding,
Kschischang, F. R., xvii, 134, 212 8
McEliece, R. J., xvii, 13, 77, 134, 196, 207,
Lafferty, J., 174, 213 209, 214
Lagrange’s theorem, 16 McEliece-Rodemich-Rumsey-Welch bound,
Lagrange, Joseph Louis, 16 11
language recognition problem, 5 memory vector, 63
Lauritzen, S. L., 212 memoryless channel, 1
Le Goff, S., 213 message, 102
left coset, 15 message–passing algorithm, 93, 103, 115–
lengthened codes, 42 118, 120
Lidl, R., 13 Miller, G., 210
Lin, Shu, 213 minimal polynomial, 13, 28
linear code, 40 roots, 30
linear independence, 24 minimum distance, 2
linear programming, 161, 174 linear block code, 40
Lipschitz condition, 85, 86 minimum relative distance, 10
locator polynomial, 55 Minty, G. J., xvi
Loeliger, H. -A., 213 Mitzenmacher, M., xviii, 213
logarithmic cost model, 6 Montorsi, G., 77, 132, 210
logical circuit moral graph, 95
depth, 6 Motwani, R., 214
model, 6 Muller, D., xiv, 45
size, 6 multiply-connected graph, 97
loop, 97 mutual information, 3
loopy graph, 115
loss recovery algorithm, 197 Namekawa, T., 212
low-density generator code, 12, 177, 179– Naor, J., 209
181, 187 Naor, M., 209
irregular, 178 narrow-sense BCH codes, 55

National Aeronautics and Space Agency roots, 32


(NASA), 70 primitive BCH codes, 55
Neal, R. M., 213 principle ideal, 37
nearest-codeword decoding, 7 probabilistic independence network, 95
Niederreiter, H., 13 probabilistic reasoning, 93, 94, 99, 100,
node degree sequence, 138 104, 106
Noisy Channel Coding Theorem, xiii, 3, 4 probability model, 94
nondeterministic algorithm, 5 product codes, 58
nonsystematic convolutional encoders, 63 cyclic, 58
NP-complete, 6 projective general linear group, 87
NP-hard, 6, 172 punctuated equilibrium model, xiii
punctured codes, 42
Offer, E., 212 Pyndiah, R., 214
Olesen, K. G., 212
order quadratic residue, 50
q modulo n, 33 codes, 51
Galois field element, 21
group element, 15 Raghavan, P., 214
Oswald, P., 208, 214 Ramanujan graph, 87, 88, 148–150, 175,
outer code, 61, 68 206
Ramanujan, Srinivasa Aiyangar, 87
Papke, L., 212 random access machine (RAM), 6
parallel concatenated code, 70, 71, 123 random graph, 84
encoding, xvii, 61, 70 rate
parent, 97 achievable, 2
parity check matrix, 41 code, 40
parity relations, xiv Raviv, J., xvi, 209
parity-check codes, 44 Ray-Chaudhuri, D. K., xv, 53, 210
path-l-vertex incidence graph, 92 recursive
Pearl, J., 100, 214 constructions, 40
Perez, L. C., 132, 214 convolutional encoders, 63
perfect codes, 9, 52 systematic convolutional encoders,
Peterson, W. W., 53, 214 66
87 systematic encoders, 71
Phillips, R., 87, 148, 213 redundancy, 2
Picart, A., 214 Reed, I. S., xiv, 45, 54, 215
Pinsker, M. S., 173, 216 Reed-Muller codes, xiv, 45
Pippenger, N., 214 Reed-Solomon codes, xv, 13, 53, 57, 69
planetary standard, 69 extended, 57
Pless, V., xvi, 216 minimum distance, 57
Pollara, F., 210 regular graph, 79, 80
polynomial regular low-density parity-check code, 137
code, 49 repeat-accumulate code, 12, 196
generator, 50 repetition codes, 43
irreducible, 28 Richardson, T., xviii, 174, 210
locator, 55 right coset, 15
matrix, 64 right regular degree sequence, 195
Mattson-Solomon, 34 ring, 16
minimal, 28 Rockmore, D. N., 174, 213
primitive, 28 Rodemich, E. R., 11, 214
spectrum, 36 roots
polynomial time complexity, 5 minimal polynomial, 30
polytree, 97 primitive polynomial, 32
Prange, G., xv Rosenthal, J., 215
primitive Roth, R., 209
element, 23 Rumsey, H. C., Jr., 11, 214
polynomial, 28 Rusmevichientong, P., 215
Sakk, Eric, 215 Lagrange’s, 16
Sarnak, P., 87, 148, 213 Noisy Channel Coding, 3, 4
satisfied constraint, 143 separation, 99
scalar field, 23 Sharp Concentration, 155, 157–159
scalar multiplication, 23 Sharp concentration, 163, 169
Seghers, J., 214 Thitimajshima, P., xvii, 61, 66, 70, 71,
semigroups, 13 121, 122, 210, 216
semiring, 17 Thomas, J. A., 211
separation theorem, 99 Thomes, R. J., xix
sequential decoding, xv Thorp, J. S, xix
set, 13 Tietäväinen, A., xiv, 53, 216
Shannon limit, xvii, 39, 71, 79, 175 tornado code, 193
Shannon, C. E., 1, 61, 121, 215 tornado sequence, 193
Sharp concentration theorem, 155, 157– total encoder memory, 63
159, 163, 169 transform
Shimony, S. E., 215 D, 63
Shokrollahi, M. A., xviii, 191, 194, 195, Galois field Fourier, 34
208, 213 transform pair, Galois field Fourier trans-
shortened codes, 42 form, 34
simple tree, 97 tree, 97
Singleton bound, 57, 150 AND-OR, 188
Singleton upper bound, 11 tree-like neighborhood, 155, 189
singly-connected graph, 97 triangulated graph, 105
Sipser, M., xviii, 143, 148, 174, 215 truth table, 46
Sloane, Neil J. A., 214 turbo coding, xvii
Smyth, P., 215 turbo decoding, 12, 121, 125
soft decision decoding, 8 extended parallel mode, 130
Solomon, G., xv, 54 multiple counting of evidence, 126
spanning set, 24 parallel mode, 126, 130
spectral method, 83
spectrum, 54 U-Separation, 98
polynomial, 36 unconnected graph, 96
Spencer, J. H., 209 undirected graph, 80, 95
sphere packing upper bound, 9 unhelpful, 186
Spiegelhalter, D. J., 213 uniform cost model, 6
Spielman, D. A., xviii, 143, 148, 174, 183, uniform interleaver, 72
202, 207, 213 unsatisfied constraint, 143
state complexity, 63 Urbanke, R., xviii, 174, 210
Stemann, V., 213 Van Roy, B., 215
Stirling’s formula, 9, 10 Van Tilborg, H., 210
subgroup, 15 Vandermonde matrices, 54
Sudan, M., 209 variable node
Sugiyama, Y., 212 degree, 138
symbol-by-symbol MAP decoding, 8, 121, vector addition, 23
122 vector space, 23
systematic, 41 vertex exposure martingale, 85, 86
code, 180 video storage technologies, 39
convolutional encoder, 63 Viterbi algorithm, 94, 112, 113
information, 122, 129 Viterbi decoder, xv, 69, 70
Szemeredi, E., 209 Viterbi, A. J., 216
Vontobel, P., 215
Takeshita, O. Y., 215 Voyager, 70
Tanner, R. M., xviii, 87, 173, 216
theorem wavelets, 215
Chinese remainder, 58 weight, 7
dimension, 26 weight enumerating function (WEF), 65
GFFT convolution, 35 Weiss, Y., 117, 216

Welch, L. R., xvi, 11, 214 Yedidia, J. S., 117, 216


Weldon, E. J., 213
Zémor, G., 147, 150, 174, 210
Wiberg, N., 115, 134, 216
Zierler, N., 53, 211
Wicker, S. B., xv–xvii, 208, 212, 215, 216 Zyablov, V. V., 173, 216
