Information Theory and Coding - Solved Problems

Predrag Ivaniš, Dušan Drajić
Department of Telecommunications, School of Electrical Engineering
University of Belgrade, Belgrade, Serbia
The aim of the book is to offer a comprehensive treatment of information theory and error control coding, using a slightly different approach than in the existing literature. There are many excellent books that treat error control coding, especially modern coding theory techniques (turbo and LDPC codes). It is clear that understanding of the iterative decoding algorithms requires background knowledge of classical coding and information theory. However, the authors of this book did not find another book that provides simple and illustrative explanations of the decoding algorithms, with clearly described relations to the theoretical limits defined by information theory. The available books on the market can be divided into two categories. The first consists of books that are specialized either in algebraic coding or in modern coding theory techniques, offering a mathematically rigorous treatment of the subject without many examples. The second provides a wider treatment of the field where every particular subject is treated separately, usually with just a few basic numerical examples.
In our approach we assumed that complex coding and decoding techniques cannot be explained without understanding the basic principles. For example, for the design of LDPC encoders, the basic facts about linear block codes have to be understood. Furthermore, a working knowledge of information theory is necessary for code design: the efficiency of statistical coding is determined by the First Shannon theorem, whereas the performance limits of error control codes are determined by the Second Shannon theorem. Therefore, we organized the book chapters according to the Shannon system model from the standpoint of information theory, where one block affects the others, so they cannot be treated separately.
On the other hand, we decided to explain the basic principles of information theory and coding through complex numerical examples. Therefore, a relatively brief theoretical introduction is given at the beginning of every chapter, including a few additional examples and explanations, but without any proofs. Also, a short overview of some parts of abstract algebra is given at the end of the corresponding chapters. Some definitions are given inside the examples, when they appear for the first time. The characteristic examples, with many illustrations and tables, are chosen to provide a detailed insight into the nature of the problem. In particular, some limiting cases are given to illustrate the connections with the theoretical bounds. The numerical values are carefully chosen to provide in-depth knowledge about the described algorithms. Although the examples in the different chapters can be considered separately, they are mutually connected, and the conclusions from one problem motivate the next one. Therefore, a sequence of problems can be considered as an “illustrated story about an information processing system, step by step”. The book contains a number of schematic diagrams to illustrate the main concepts and many figures with numerical results. It should be noted that this book mainly presents problems, and not simple exercises. Some simple examples are included in the theoretical introduction at the beginning of the chapters.
The book is primarily intended for graduate students, although parts of the book can be used in undergraduate studies. Also, we hope that this book will be of use to practitioners in the field.
Contents

1 Introduction
2 Information Sources
   Brief Theoretical Overview
   Problems
3 Data Compression (Source Encoding)
   Brief Theoretical Overview
   Problems
4 Information Channels
   Brief Theoretical Overview
   Problems
5 Block Codes
   Brief Theoretical Overview
   Problems
   Brief Introduction to Algebra I
6 Cyclic Codes
   Brief Theoretical Overview
   Problems
   Brief Introduction to Algebra II
7 Convolutional Codes and Viterbi Algorithm
   Brief Theoretical Overview
   Problems
8 Trellis Decoding of Linear Block Codes, Turbo Codes
   Brief Theoretical Overview
   Problems
Fig. 1.1 A simplified block scheme of a communication system from the information theory point of view (source, encoder, signal generator, channel, discrete channel)
The second chapter deals with information sources. Mainly discrete sources are considered. The further division into memoryless sources and sources with memory is illustrated. The corresponding state diagram and trellis construction are explained, as well as the notions of the adjoint source and the source extension. Further, the notions of quantity of information and entropy are introduced. At the end, an example of a continuous source (Gaussian distribution) is considered.
In the next chapter, source encoding (data compression) is considered for discrete sources. The important notions concerning source codes are introduced: nonsingularity, unique decodability and the need for an instantaneous code. The notions of a code tree and of the average code word length are introduced as well. A short discussion from the point of view of the First Shannon theorem is included. Further, the Shannon-Fano and Huffman encoding algorithms are illustrated with corresponding problems. Adaptive Huffman algorithms (FGK, Vitter) are discussed and illustrated. At the end, the LZ algorithm is considered.
Information channels are considered in the fourth chapter. Like the sources, the channels can be discrete or continuous, but there is also a mixed type (e.g. discrete input, continuous output). Discrete channels can be with or without memory. They are described using the corresponding channel matrix. A few discrete channels without memory are analyzed in detail (BSC, BEC etc.). Transmitted information and channel capacity (for discrete and continuous channels) are defined and the Second Shannon theorem is commented on. The decision rules are further analyzed (hard decoding and soft decoding), and some criteria (MAP, ML) are considered. At the end, the Gilbert-Elliott model for channels with memory is illustrated.
In the fifth chapter, block codes (mainly linear block codes) are illustrated by using some interesting problems. At the beginning, simple repetition codes are used to explain FEC, ARQ and hybrid error control procedures. The Hamming distance is further introduced, as well as the Hamming weight and the distance spectrum. Corresponding bounds are discussed. The notion of a systematic code is introduced. Hamming codes are analyzed in many problems. The notions of the generator matrix, the parity-check matrix and the syndrome are explained. Dual codes and the MacWilliams identities are discussed. The notion of interleaving is illustrated as well, and arithmetic and integer codes are illustrated at the end of the chapter. A brief overview of the corresponding notions from abstract algebra is added at the end of the chapter (group, field, vector space).
The sixth chapter deals with cyclic codes, a subclass of linear block codes obtained by imposing an additional strong structural requirement. In fact, a cyclic code is an ideal in the ring of polynomials. The notions of the generator polynomial and the parity-check polynomial are introduced. The usage of CRC is illustrated in a few problems. BCH codes are illustrated as well. RS codes are analyzed in detail, especially the decoding algorithms (Peterson, Berlekamp-Massey, Gorenstein-Zierler, Forney). At the end of the chapter, a brief overview of the corresponding notions from abstract algebra is added (Galois field, primitive and minimal polynomial, ideal).
Convolutional codes and decoding algorithms are analyzed in the next chapter. The corresponding notions are explained (constraint length, transfer function matrix, state diagram and trellis, free distance). Majority logic decoding, sequential decoding and especially the Viterbi algorithm are illustrated, as well as the possibilities of hard (Hamming metric) and soft (Euclidean metric) decision. Punctured codes are explained as well. TCM is considered in detail.
Trellis decoding of linear block codes and turbo decoding are explained in the eighth chapter. It is shown how, from a suitably transformed (trellis-oriented) generator matrix, the corresponding parity-check matrix suitable for trellis construction can be obtained. The corresponding decoding algorithms are analyzed (generalized Viterbi algorithm, BCJR, SOVA, log-MAP, max-log-MAP). At the end, turbo codes are briefly described.
The last chapter deals with LDPC codes. They provide iterative decoding with a linear complexity. The Tanner interpretation of LDPC codes using bipartite graphs is explained. Various decoding algorithms are analyzed, with hard decision (majority logic decoding, bit-flipping) and with soft decision (belief propagation, sum-product, self-correcting min-sum). At the end, the algorithms are compared using Monte Carlo simulation over the BSC and the channel with AWGN.
For the reader's convenience, a relatively brief theoretical introduction is given at the beginning of every chapter, including a few additional examples and explanations, but without any proofs. Also, a short overview of some parts of abstract algebra is given at the end of the corresponding chapters. This material is mainly based on the textbook An Introduction into Information Theory and Coding [1] (in Serbian) by the same authors.
Chapter 2
Information Sources
Generally, information sources are discrete or continuous. A discrete source has a finite or countable number of messages, while the messages of a continuous source are from an uncountable set. This book deals mainly with discrete sources, especially with those having a finite number of symbols.
The further subdivision of sources is according to the memory they may have. The sources can be without memory (zero-memory, memoryless) (Problems 2.1, 2.2, 2.3, 2.5, 2.6 and 2.8), emitting the symbols (messages) si according only to the corresponding probabilities P(si). Therefore, a zero-memory source is completely described by the list of symbols (source alphabet)

S = {s1, s2, ..., sq}.

It is supposed as well that the symbols (i.e. their emitting) form a complete set of mutually exclusive events, yielding

\sum_{i=1}^{q} P(s_i) = 1.
For the sources with memory, where the emitting of the next symbol depends on m previously emitted symbols (m is the memory order) (Problems 2.4 and 2.7), the corresponding conditional probabilities

P(s_j / s_{i_1}, s_{i_2}, \ldots, s_{i_m})

are needed, where s_{i_1} is the oldest and s_{i_m} the youngest symbol preceding the symbol s_j. The other name for such a source is a Markov source (the emitted sequence is a Markov chain). For a source with memory order m, the m previously emitted symbols are called the source state. Generally, there are q^m states. For a binary source (two symbols only) the number of states is 2^m. For every state the following must be satisfied

\sum_{j=1}^{q} P(s_j / s_{i_1}, s_{i_2}, \ldots, s_{i_k}, \ldots, s_{i_m}) = 1 \quad (i_k = 1, 2, \ldots, q; \; k = 1, 2, \ldots, m).
From every state the source can emit any of the q possible symbols, and there are in total q^m \cdot q = q^{m+1} conditional probabilities (some can be equal to zero!). The state diagram (Problems 2.4, 2.6 and 2.7) can be drawn, comprising the states and the corresponding conditional probabilities. Instead of a state diagram, a trellis (Problems 2.5, 2.6 and 2.7) can be constructed, being a kind of dynamic state diagram.
In further considerations only ergodic sources will be taken into account. For example, a source having at least one absorbing state (Problem 2.4), i.e. a state from which the source cannot pass into the other states, is not ergodic. Loosely speaking, the source is ergodic if, observed for a long time, it passes through every one of its possible states. If the elements of the matrix containing the transition probabilities do not change in time, the source is homogeneous. The source is stationary if the steady state probabilities can be found by solving the corresponding matrix equation (Problems 2.4 and 2.5)

p = p P,

where the transition matrix P is stochastic, i.e.

\sum_{j=1}^{q} p_{ij} = 1 \quad (i = 1, 2, \ldots, q).

Some authors use the transposed version of the above matrix as the transition matrix, and in this case the sum of the elements in every column equals 1.
If the sum of the elements in every column also equals 1, the matrix is doubly stochastic and all the states are equally probable. For any Markov source the probabilities of the emitted symbols can be found as well. The adjoint source (Problems 2.4 and 2.7) to the source S, denoted by S̄, is a zero-memory information source with a source alphabet identical to that of S, having the same symbol probabilities. The nth extension (Problems 2.3 and 2.7) of any source S is the source denoted by S^n whose symbols are all possible different sequences (combinations) of length n of the symbols of the source S. If the original source has q symbols, the source S^n has q^n different symbols.
Consider a binary (q = 2) Markov source having memory order 2 (m = 2). There are q^m = 2^2 = 4 states and q^{m+1} = 2^3 = 8 transition probabilities. Let these probabilities be
(states ordered 00, 01, 10, 11)

P = \begin{bmatrix} 0.7 & 0.3 & 0.0 & 0.0 \\ 0.0 & 0.0 & 0.5 & 0.5 \\ 0.5 & 0.5 & 0.0 & 0.0 \\ 0.0 & 0.0 & 0.3 & 0.7 \end{bmatrix}
The states are 00, 01, 10 and 11. It should be noted that the source cannot go directly from any state into any other state. E.g., if the source is in the state 00, it can pass only to the same state (i.e. stay in it) if in the emitted symbol sequence the symbol 0 appears after 00 (00 → 00), or into the state 01 if the symbol 1 appears after 00 (00 → 01). It can be easily visualized as the state of the corresponding shift register, where new (emitted) symbols enter from the right side. Equivalently, a “window” of the corresponding width (m = 2) sliding over the sequence of emitted symbols can be conceived, showing the current state. The corresponding state diagram is illustrated in Fig. 2.1.
Steady state probabilities can be calculated as follows

P(00) = 0.7 P(00) + 0.5 P(10),
P(01) = 0.3 P(00) + 0.5 P(10),
P(11) = 0.5 P(01) + 0.7 P(11).

The first equation is the consequence of the fact that entering the state 00 is a result of two mutually exclusive events: either the source was in the state 00 and 0 was emitted, or the source was in the state 10 and 0 was emitted. The second and the third equation are obtained in a similar way. However, if the corresponding (fourth) equation based on the same reasoning were added for the state 10, a singular system would be obtained, because this equation is dependent on the previous three. Instead, the following equation will be used

P(00) + P(01) + P(10) + P(11) = 1,

because the source must be in one of the possible states. The corresponding solution is

P(00) = P(11) = 0.3125, \qquad P(01) = P(10) = 0.1875.
Taking into account the fact that the state diagram is symmetrical, one should expect as well that the symmetrical states are equally probable. The stationary probabilities of the symbols in the emitted sequence can now be calculated. The probability of finding 0 in the emitted sequence is equal to the sum (the states are mutually exclusive) of the probability that the source is in the state 00 and one half of the probability that the source is in the state 01 or 10

P(0) = P(00) + (P(01) + P(10))/2 = 0.3125 + 0.1875 = 0.5.

Similarly, one can calculate P(1) = 0.5. However, this value can be easily obtained from

P(0) + P(1) = 1.

Further, because the state diagram is symmetrical (i.e. the values of the corresponding steady state probabilities are), the obtained result should be expected as well: in the emitted sequence 0 and 1 are equally probable. The corresponding trellis is shown in Fig. 2.2. States are denoted at the trellis beginning only; further on, only points are used. The corresponding emitted bits are marked using arrows.

Fig. 2.2 Trellis for the source from Fig. 2.1
The state transition probabilities can be marked as well. It is easy to note that the trellis has a periodical structure; after the first step, the trellis repeats. If the initial source state is defined in advance (e.g. 00), the trellis starts from this state.
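The above calculation can be checked numerically. The following sketch (not part of the original text) solves p = p P together with the normalization condition using NumPy, with the state ordering 00, 01, 10, 11 used above:

```python
import numpy as np

# Transition matrix of the second-order binary Markov source (states 00, 01, 10, 11)
P = np.array([[0.7, 0.3, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.3, 0.7]])

# Solve p (I - P) = 0 with one equation replaced by the normalization sum(p) = 1
A = (np.eye(4) - P).T
A[-1, :] = 1.0
b = np.array([0.0, 0.0, 0.0, 1.0])
p = np.linalg.solve(A, b)
print("steady state probabilities:", p)      # [0.3125 0.1875 0.1875 0.3125]

# Symbol probability as in the text: P(0) = P(00) + (P(01) + P(10)) / 2
P0 = p[0] + 0.5 * (p[1] + p[2])
print("P(0) =", P0)                          # 0.5
```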
Information is represented (encoded) by signals, which are the carriers of information. From the information theory point of view, the information depends on the probability of the message emitted by the source. If the probability of some message is smaller, the user uncertainty about this event is greater. By receiving the message (symbol) si, the user is no longer uncertain about this event. In other words, if the uncertainty is greater, it can be concluded that by receiving the corresponding message, a greater quantity of information is received. Therefore, the quantity of information of some message is equal, or at least proportional, to the corresponding uncertainty at the receiving end. It can be easily concluded that the quantity of information of some message is inversely proportional to its (a priori) probability. A step further, if the user is sure that some message will be emitted (P(si) = 1), then it does not carry any information, because there was no uncertainty. That means as well that an information source must have at least two different messages. The quantity of information of the message si can therefore be defined as

Q(s_i) = \log \frac{1}{P(s_i)}.
It is easy to verify that the above conditions are satisfied. Of course, some other function satisfying these conditions could have been chosen, e.g.

Q(s_i) = \frac{1}{P(s_i)} - 1.

However, the logarithmic function is the unique one making possible the fulfillment of one more necessary condition for the quantity of information: the additivity of the information quantities of independent messages. Let the probabilities of the messages si and sj be P(si) and P(sj). Then the joint message obtained by successive emitting of these independent messages should have the information quantity equal to the sum of their information quantities. Using the logarithmic function, one obtains

Q(s_i, s_j) = \log \frac{1}{P(s_i, s_j)} = \log \frac{1}{P(s_i) P(s_j)} = \log \frac{1}{P(s_i)} + \log \frac{1}{P(s_j)} = Q(s_i) + Q(s_j).
It can be proved that the logarithmic function is the unique one satisfying all
mentioned conditions. Logarithm to any base can be used, however, it is suitable to
use a logarithm to the base 2. It will be denoted as ld(.).
If a binary source is considered, its output signals (symbols) are known as bits (usually denoted as 0 and 1). If the bits are equally probable, i.e. P(s1) = P(s2) = 0.5, one obtains

Q(s_1) = Q(s_2) = \mathrm{ld} \frac{1}{0.5} = 1.

This definition gives a very suitable standard for a unit to measure the quantity of information, especially having in view digital transmission. Therefore, here every bit can carry a unit of information. To avoid confusion between the bits (signals) and the units of information, in this book the information unit based on the logarithm to the base 2 will be called the shannon (denoted by Sh). Therefore, one bit can carry up to one shannon.
For a source having q symbols, the corresponding quantities of information for the symbols are

Q(s_i) = \mathrm{ld} \frac{1}{P(s_i)} \; [\mathrm{Sh}] \quad (i = 1, 2, \ldots, q).

The average quantity of information per emitted symbol is then

H(S) = E[Q(s_i)] = \sum_{i=1}^{q} P(s_i) Q(s_i) = \sum_{i=1}^{q} P(s_i) \, \mathrm{ld} \frac{1}{P(s_i)} = -\sum_{i=1}^{q} P(s_i) \, \mathrm{ld}\, P(s_i) \; [\mathrm{Sh/symb}].
This expression is analogous to the Boltzmann formula for the entropy of the ideal gas, where P(si) is the probability of the state in the phase space. It is a measure of disorder and, according to the Second law of thermodynamics, it cannot become smaller in a closed system. In information theory it is an average measure (per symbol) of the user uncertainty about the emitted source symbols. Shannon (1948) introduced the name entropy for H(S).
The entropy of the nth extension of a zero-memory source is H(S^n) = nH(S) (Problem 2.3). By taking at least the mth extension of an mth order Markov source, a first-order Markov source is obtained; there will always be a dependence at the borders of the extension “symbols”. The property H(S^n) = nH(S) also holds in this case.
The entropy of the mth order Markov source with q symbols {s1, s2, …, sq}, having the transition probabilities P(s_j / s_{i_1}, s_{i_2}, \ldots, s_{i_m}), is

H(S) = \sum_{j=1}^{q} \sum_{i_1=1}^{q} \sum_{i_2=1}^{q} \cdots \sum_{i_m=1}^{q} P(s_{i_1}, s_{i_2}, \ldots, s_{i_m}, s_j) \, \mathrm{ld} \frac{1}{P(s_j / s_{i_1}, s_{i_2}, \ldots, s_{i_m})}
     = \sum_{S^{m+1}} P(s_{i_1}, s_{i_2}, \ldots, s_{i_m}, s_j) \, \mathrm{ld} \frac{1}{P(s_j / s_{i_1}, s_{i_2}, \ldots, s_{i_m})}.

In the last row, the summation over all possible symbols of the (m + 1)th extension (S^{m+1}) of the original source is symbolically denoted. The number of elements of the sum is q^{m+1}. Further, the entropy of the source adjoint to the extended Markov source is not equal to the entropy of the extended Markov source (Problems 2.4, 2.5 and 2.7).
The entropy is an important characteristic of the information source. It is interesting how the entropy changes when the symbol probabilities change. If the probability of one symbol equals 1, then the probabilities of all the others are equal to zero, and the entropy is equal to zero as well, because

1 \cdot \mathrm{ld}(1) = 0, \qquad \lim_{x \to 0} x \, \mathrm{ld} \frac{1}{x} = 0.

The expression for the entropy is a sum of nonnegative quantities, and the entropy cannot have negative values. The next question is whether the entropy is bounded from above. By using some inequalities, it can be easily proved that

H(S) \le \mathrm{ld}\, q,

yielding finally

0 \le H(S) \le \mathrm{ld}\, q.
Furthermore, it can be proved as well that the entropy will have the maximum value (ld q) if P(si) = 1/q for every i, i.e. when all the symbols are equally probable. The “physical interpretation” is here very clear. If the entropy is a measure of the user uncertainty about the source emitting, then it is obvious that this uncertainty will have the maximal value when all the symbols are equally probable. If these probabilities are different, the user will expect the more probable symbols, they will be emitted more frequently, and on the average a smaller quantity of information will be obtained per symbol.
The sequence of emitted symbols can also be conceived as a kind of discrete random process, where all possible sequences generated by a source form an ensemble. Sometimes, it is also said that time series are considered. It will be supposed that the ensemble is wide sense stationary, i.e. that the average value and the autocorrelation function do not depend on the origin. Further, the process is ergodic for the mean value and the autocorrelation function if they are the same whether calculated by using any generated sequence or averaged over the whole ensemble for any fixed time moment. If the symbol duration is T, instead of x(t) the symbol x(nT) can be used, or x(n) or simply xn. The average value and the autocorrelation function are

m_x = E\{x_n\}, \qquad R_x(l) = E\{x_n x_{n+l}\}.

If the symbols are complex numbers, the second factor should be complex conjugated. On the basis of the Wiener-Khinchin theorem for discrete signals (z-transformation), the corresponding discrete average power spectrum density (APSD) (Problem 2.6) can be found.
A special kind of discrete random process is the pseudorandom (PN, Pseudo Noise) process (sequence). In most applications binary sequences are used. They are generated by a pseudorandom binary sequence generator, i.e. by a shift register with linear feedback (Problem 2.6). A PN sequence is periodic and its autocorrelation function is periodic as well. If the number of register cells is m, the corresponding maximal sequence length (period) is L = 2^m − 1. Of course, this will be the case only if the feedback is suitably chosen.
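As a small illustration (a sketch, not from the book), the following Python code generates a maximal-length PN sequence with an m = 5 stage shift register, assuming the primitive feedback polynomial x^5 + x^3 + 1, and estimates its periodic autocorrelation; the period is L = 2^5 − 1 = 31.

```python
import numpy as np

def lfsr_sequence(taps, m, n_bits):
    """Binary sequence from a shift register with linear feedback.
    taps are 1-based feedback positions, e.g. (5, 2) here."""
    reg = [1] * m                       # any non-zero initial state
    out = []
    for _ in range(n_bits):
        fb = 0
        for t in taps:
            fb ^= reg[t - 1]            # modulo-2 sum of the tapped cells
        out.append(reg[m - 1])          # output the content of the last cell
        reg = [fb] + reg[:-1]           # shift; feedback enters from the left
    return np.array(out)

m, L = 5, 2**5 - 1
y = lfsr_sequence((5, 2), m, 10 * L)

# Periodic autocorrelation R_y(l) = E{y_k y_{k+l}} estimated over one period
R = [np.mean(y[:L] * np.roll(y[:L], -l)) for l in range(L)]
print("ones in one period:", int(y[:L].sum()), "of", L)   # (L+1)/2 = 16
print("R_y(0) =", R[0], " R_y(l != 0) =", R[1])           # m_y and m_y/2
```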
For a continuous source (Problem 2.8), emitting the “symbols” s from an uncountable set and having the probability density w(s), the entropy can be defined as follows

H(S) = \int_{-\infty}^{\infty} w(s) \, \mathrm{ld} \frac{1}{w(s)} \, ds,

where

\int_{-\infty}^{\infty} w(s) \, ds = 1.
However, one should be careful here, because in this case the entropy does not have all the properties of the entropy of discrete sources. For example, it can depend on the origin, it can be infinite etc. Some additional constraints should be added. If the probability density is limited to a finite interval (a, b), the entropy will be maximum for a uniform distribution, i.e. for

w(s) = \begin{cases} \dfrac{1}{b-a}, & a \le s \le b \\ 0, & \text{otherwise,} \end{cases}

when H(S)_{\max} = \mathrm{ld}(b - a). This result is similar to the result obtained for a discrete source having a finite number of symbols.
If the stochastic variable can take any value from the set of real numbers, but its variance is finite, i.e.

\sigma^2 = \int_{-\infty}^{\infty} s^2 w(s) \, ds < \infty,

the entropy is maximal for the Gaussian distribution. It is

H(S)\big|_{\max} = \frac{1}{2} \, \mathrm{ld}\,(2 \pi e \sigma^2).
According to the central limit theorem, the sum of independent random variables having limited variances converges to the Gaussian distribution when the number of variables increases. In other words, the Gaussian distribution in nature is a result of the action of a large number of statistically independent causes. Therefore, it is to be expected that the measure of uncertainty (entropy) in such a case has the maximal value.
Some authors consider that the variables whose probability density is limited to a finite interval (a, b) correspond to “artificial” signals (speech, TV etc.), while the variables with an unlimited interval of values correspond to “natural” signals (noise). It is also interesting that it is possible to generate a Gaussian random process having a predefined autocorrelation function (Problem 2.7).
One more problem is how to define the information rate for continuous sources. In this case a finite frequency band (limited to fc) of the power density spectrum should be supposed. The corresponding signal should be sampled at the rate of 2fc samples per second (these values are statistically independent!). The corresponding information rate is

\Phi(S) = 2 f_c H(S) \; \left[ \frac{\text{sample}}{\text{s}} \cdot \frac{\text{Sh}}{\text{sample}} = \frac{\text{Sh}}{\text{s}} \right],

i.e., the same units are used as for discrete sources (Sh/s). In such a way it is possible to compare discrete and continuous sources.
Problems
Problem 2.1 A zero-memory source emits q = 8 symbols with the probabilities given in the following table:

si      s1    s2    s3    s4    s5    s6    s7    s8
P(si)   0.3   0.21  0.17  0.13  0.09  0.07  0.01  0.02

(a) Find the entropy and the source information rate if the symbol rate is vs = 100 [symb/s].
(b) How will the entropy and the information rate change in the case of equiprobable symbols?
Solution
(a) The source entropy is the average quantity of information per symbol

H(S) = \sum_{i=1}^{q} P(s_i) \, \mathrm{ld} \frac{1}{P(s_i)} = -\sum_{i=1}^{q} P(s_i) \, \mathrm{ld}\, P(s_i) = 2.5717 \; [\mathrm{Sh/symb}],

where, for the easier calculation of the logarithms to the base 2 (denoted by ld(x)), the following relation can be used

\mathrm{ld}(x) = \frac{\log_a(x)}{\log_a(2)}.

The information rate, the average information quantity emitted by the source per second, is

\Phi(S) = H(S)\, v_s = 2.5717 \; \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right] \cdot 100 \; \left[\frac{\mathrm{symb}}{\mathrm{s}}\right] = 257.17 \; \left[\frac{\mathrm{Sh}}{\mathrm{s}}\right].
(b) In the case of equiprobable symbols

H(S) = \sum_{i=1}^{q} \frac{1}{q} \, \mathrm{ld} \frac{1}{1/q} = q \cdot \frac{1}{q} \, \mathrm{ld}(q) = \mathrm{ld}(q) = \mathrm{ld}(8) = 3 \; [\mathrm{Sh/symb}], \qquad \Phi(S) = v_s \, \mathrm{ld}\, q = 300 \; [\mathrm{Sh/s}].

As expected, the zero-memory source has the maximum entropy when the symbols are equiprobable, because the uncertainty about the next emitted symbol is maximal.
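The numerical values can be verified with a few lines of code (a sketch, not part of the original problem set):

```python
import numpy as np

# Symbol probabilities from Problem 2.1 and the symbol rate
p = np.array([0.3, 0.21, 0.17, 0.13, 0.09, 0.07, 0.01, 0.02])
vs = 100                                     # [symb/s]

H = np.sum(p * np.log2(1.0 / p))             # entropy [Sh/symb]
print("H(S)   =", round(H, 4), "Sh/symb")    # 2.5717
print("Phi(S) =", round(H * vs, 2), "Sh/s")  # 257.17

# (b) equiprobable symbols: H = ld q
q = len(p)
print("H_max  =", np.log2(q), "Sh/symb, rate =", np.log2(q) * vs, "Sh/s")  # 3, 300
```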
Problem 2.2 A zero-memory source emits q = 4 symbols with the probabilities given in the following table:

si      s1     s2     s3      s4
P(si)   0.5    0.25   0.125   0.125

(a) Find the entropy and the information rate if the symbol rate is vs = 400 [symb/s].
(b) Find the entropy if the source emits q symbols according to the following expression

P(s_i) = \begin{cases} 2^{-i}, & i = 1, 2, \ldots, q-1 \\ 2^{-i+1}, & i = q \end{cases}

and draw the values of the entropy depending on the number of symbols for q = 2, 3, …, 20.
Solution
(a) Direct use of the corresponding formula for the entropy yields

H(S) = \sum_{i=1}^{4} P(s_i) \, \mathrm{ld} \frac{1}{P(s_i)} = \frac{1}{2} \mathrm{ld}\,2 + \frac{1}{4} \mathrm{ld}\,4 + \frac{1}{8} \mathrm{ld}\,8 + \frac{1}{8} \mathrm{ld}\,8 = 1.75 \; \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right].

In this case, the logarithms can be easily calculated, because the probabilities are inverses of integer powers of 2. The information rate is

\Phi(S) = 1.75 \; \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right] \cdot 400 \; \left[\frac{\mathrm{symb}}{\mathrm{s}}\right] = 700 \; \left[\frac{\mathrm{Sh}}{\mathrm{s}}\right].
(b) When the probability of each symbol is twice smaller than that of the previous one, and when they are ordered according to the probabilities (while the last two have equal probabilities), the entropy of the source which has q symbols can be found in the form

H(S) = \sum_{i=1}^{q} P(s_i) \, \mathrm{ld} \frac{1}{P(s_i)} = \sum_{i=1}^{q-1} \frac{i}{2^i} + \frac{q-1}{2^{q-1}},
yielding the dependence shown in Fig. 2.3.

Fig. 2.3 Entropy of the sources from Problems 2.1(b) and 2.2(b) for q = 2, 3, …, 20
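The closed-form sum above can be compared with the direct entropy computation (a small sketch, not from the book):

```python
import numpy as np

def dyadic_probabilities(q):
    """P(s_i) = 2^-i for i = 1..q-1 and P(s_q) = 2^-(q-1), as in Problem 2.2(b)."""
    p = [2.0 ** -i for i in range(1, q)]
    p.append(2.0 ** -(q - 1))
    return np.array(p)

def entropy(p):
    return float(np.sum(p * np.log2(1.0 / p)))

for q in (2, 4, 10, 20):
    p = dyadic_probabilities(q)
    closed_form = sum(i / 2.0 ** i for i in range(1, q)) + (q - 1) / 2.0 ** (q - 1)
    print(q, round(entropy(p), 4), round(closed_form, 4))
# For q = 4 both give 1.75 Sh/symb; for this source the entropy approaches 2 Sh/symb as q grows.
```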
Solution
A binary source emits only two symbols, with the probabilities P(s1) = p and P(s2) = 1 − p. As its extension, sequences of the original binary symbols are considered as new (compound) symbols.
(a) From P(s1) = p = 0.1 it follows that P(s2) = 1 − p = 0.9, yielding the entropy

H(S) = 0.1 \, \mathrm{ld}\frac{1}{0.1} + 0.9 \, \mathrm{ld}\frac{1}{0.9} = 0.469 \; [\mathrm{Sh/symb}].

The probabilities of the four compound symbols of the second extension can be easily calculated, P(r1) = 0.81, P(r2) = 0.09, P(r3) = 0.09 and P(r4) = 0.01, yielding the corresponding entropy

H(S^2) = 0.938 \; [\mathrm{Sh/symb}].

For the third extension, which has eight compound symbols, given in Table 2.1, the corresponding entropy is

H(S^3) = \sum_{i=1}^{8} P(r_i) \, \mathrm{ld} \frac{1}{P(r_i)} = 1.407 \; \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right].
In general, for the nth extension of a zero-memory source

H(S^n) = nH(S),

while for the entropy of the nth extension of a zero-memory binary source one obtains

H(S^n) = nH(S) = np \, \mathrm{ld}\frac{1}{p} + n(1-p) \, \mathrm{ld}\frac{1}{1-p}.
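The relation H(S^n) = nH(S) can be checked by enumerating the compound symbols directly (a sketch, not part of the original solution):

```python
import numpy as np
from itertools import product

p1 = 0.1                                   # P(s1), so P(s2) = 0.9
probs = {'s1': p1, 's2': 1.0 - p1}

def entropy(p_list):
    p = np.array(p_list)
    return float(np.sum(p * np.log2(1.0 / p)))

H1 = entropy(list(probs.values()))
for n in (1, 2, 3):
    # compound symbols of the nth extension and their probabilities
    ext = [np.prod([probs[s] for s in comb]) for comb in product(probs, repeat=n)]
    print(f"H(S^{n}) = {entropy(ext):.3f}  vs  n*H(S) = {n * H1:.3f}")
# prints 0.469, 0.938 and 1.407 Sh/symb, confirming H(S^n) = n H(S)
```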
Fig. 2.4 Entropy of the nth extension of the zero-memory binary source, one binary symbol having the probability p (p = 0.01, 0.1, 0.5)

Problem 2.4 A first-order memory (Markov) source whose source alphabet is (s1, s2, s3, s4) is defined by the transition matrix

P = \begin{bmatrix} 0 & 0.7 & 0.3 & 0 \\ 0 & 0 & 0.7 & 0.3 \\ 0.1 & 0 & 0 & 0.9 \\ 0.7 & 0.3 & 0 & 0 \end{bmatrix}.
(a) Draw the source state diagram and find the state probabilities and the symbol probabilities. Is the source homogeneous and stationary?
(b) If the probabilities in the matrix are changed so that P(s1/s3) = 0.3 and P(s4/s3) = 0.7, draw the state diagram and find the stationary probabilities of the source symbols.
(c) Repeat the procedure from (b) for P(s1/s3) = 0 and P(s4/s3) = 1, the other probabilities being unchanged.
(d) How does the state diagram change for P(s1/s3) = P(s4/s3) = 0 and P(s3/s3) = 1?
(e) Find the entropies of the sources defined previously. For which transition matrix would the entropy achieve its maximum? What are the symbol probabilities in this case?
Solution
(a) The transition matrix describing the first-order memory source (first-order Markov source) contains the conditional probabilities: the element in the ith row and jth column corresponds to the conditional probability P(sj/si). This matrix is always stochastic, because the sum of the elements in every row must be equal to 1 [3].
In this problem every symbol corresponds to one source state and the transition probabilities are determined by the transition matrix. The corresponding state diagram is shown in Fig. 2.5. The source is stationary if the corresponding stationary symbol probabilities can be calculated, for which the following relations hold

P(s_1) = 0.1 P(s_3) + 0.7 P(s_4),
P(s_2) = 0.7 P(s_1) + 0.3 P(s_4),
P(s_3) = 0.3 P(s_1) + 0.7 P(s_2),
P(s_4) = 0.3 P(s_2) + 0.9 P(s_3),
P(s_1) + P(s_2) + P(s_3) + P(s_4) = 1.

It can be easily verified that the system consisting of the first four equations does not have a unique solution. If the fourth equation is substituted by the fifth, the following matrix equation can be obtained
p = [0 \;\; 0 \;\; 0 \;\; 1] + p \begin{bmatrix} 0 & 0.7 & 0.3 & -1 \\ 0 & 0 & 0.7 & -1 \\ 0.1 & 0 & 0 & -1 \\ 0.7 & 0.3 & 0 & 0 \end{bmatrix},

yielding

p = [0 \;\; 0 \;\; 0 \;\; 1] \begin{bmatrix} 1 & -0.7 & -0.3 & 1 \\ 0 & 1 & -0.7 & 1 \\ -0.1 & 0 & 1 & 1 \\ -0.7 & -0.3 & 0 & 1 \end{bmatrix}^{-1}.
Fig. 2.6 State diagrams for case (b) (a) and case (c) (b)
(c) For P(s1/s3) = 0 and P(s4/s3) = 1, the other probabilities remaining unchanged, the state diagram is shown in Fig. 2.6b. From the state s3 the source goes deterministically into the state s4. However, the source is stationary, because the solution can be found.
(d) For P(s1/s3) = P(s4/s3) = 0 and P(s3/s3) = 1, the other probabilities remaining unchanged, the state diagram is shown in Fig. 2.7. In this case the corresponding system of equations has no solution except for P(s1) = P(s2) = 0 (being in contradiction with the other relations, i.e. not yielding unique solutions for P(s3) and P(s4)). In this case it is not possible to find probabilities satisfying the relation p = p P and the source is not stationary [1, 3]. This is clear as well from the fact that the state s3 is an absorbing one, and the source cannot leave this state.
(e) For first-order memory sources the general expression for the entropy of a source with memory is substantially simplified

H(S) = \sum_{i=1}^{q} P(s_i) \sum_{j=1}^{q} P(s_j / s_i) \, \mathrm{ld} \frac{1}{P(s_j / s_i)}.
Taking into account the previously calculated state probabilities in (a), an additional simplification is achieved, the sum comprising only 8 elements instead of q^2 = 16 (because some elements of the transition matrix are equal to zero)

H^{(a)}(S) = P^{(a)}(s_1)\left[0.7\,\mathrm{ld}\frac{1}{0.7} + 0.3\,\mathrm{ld}\frac{1}{0.3}\right] + P^{(a)}(s_2)\left[0.7\,\mathrm{ld}\frac{1}{0.7} + 0.3\,\mathrm{ld}\frac{1}{0.3}\right] + P^{(a)}(s_3)\left[0.1\,\mathrm{ld}\frac{1}{0.1} + 0.9\,\mathrm{ld}\frac{1}{0.9}\right] + P^{(a)}(s_4)\left[0.7\,\mathrm{ld}\frac{1}{0.7} + 0.3\,\mathrm{ld}\frac{1}{0.3}\right] = 0.7825 \; \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right],

while the entropy of the corresponding adjoint source is

H^{(a)}(\bar{S}) = \sum_{k=1}^{4} P^{(a)}(s_k) \, \mathrm{ld} \frac{1}{P^{(a)}(s_k)} = 1.9937 \; \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right].
For the source from (b), where the transition matrix is doubly stochastic and the state diagram is symmetric, the entropies of the source and of the corresponding adjoint source are

H^{(b)}(S) = 4 \cdot 0.25 \left[0.7\,\mathrm{ld}\frac{1}{0.7} + 0.3\,\mathrm{ld}\frac{1}{0.3}\right] = 0.8813 \; \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right], \qquad H^{(b)}(\bar{S}) = 4 \cdot 0.25 \, \mathrm{ld}\frac{1}{0.25} = 2 \; \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right].

In the case P(s1/s3) = P(s4/s3) = 0 and P(s3/s3) = 1, the source is not stationary, and the entropy cannot be found.
The maximal entropy would correspond to the case when all the elements of the transition matrix have the value 1/4; the symbols are then equally probable as well, yielding

H_{\max}(S) = \sum_{i=1}^{4} \frac{1}{4} \sum_{j=1}^{4} \frac{1}{4} \, \mathrm{ld}(4) = 2 \; \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right].
In this case, the entropy of the adjoint source has the same value. It can be easily
shown that this type of transition matrix corresponds to the zero-memory source.
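The entropies for case (b) can be verified numerically. The sketch below (hypothetical helper code, not from the book) computes the stationary probabilities, the entropy of the source with memory and the entropy of its adjoint source for an arbitrary transition matrix:

```python
import numpy as np

def markov_entropies(P):
    """Entropy H(S) of a stationary first-order Markov source with transition
    matrix P (P[i, j] = P(s_j / s_i)) and the entropy of its adjoint source."""
    q = P.shape[0]
    A = (np.eye(q) - P).T
    A[-1, :] = 1.0                              # replace one equation by sum(p) = 1
    p = np.linalg.solve(A, np.eye(q)[-1])       # stationary symbol probabilities
    row_H = np.zeros(q)
    for i in range(q):
        row = P[i][P[i] > 0]                    # skip zero transition probabilities
        row_H[i] = np.sum(row * np.log2(1.0 / row))
    H = float(p @ row_H)                        # source with memory
    H_adj = float(np.sum(p * np.log2(1.0 / p))) # adjoint (zero-memory) source
    return H, H_adj

# Doubly stochastic matrix from case (b): P(s1/s3) = 0.3, P(s4/s3) = 0.7
P = np.array([[0.0, 0.7, 0.3, 0.0],
              [0.0, 0.0, 0.7, 0.3],
              [0.3, 0.0, 0.0, 0.7],
              [0.7, 0.3, 0.0, 0.0]])
print(markov_entropies(P))                      # approximately (0.8813, 2.0)
```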
Problem 2.5 A zero-memory source emits a binary sequence xk at the encoder input as shown in Fig. 2.8, where T denotes a delay cell corresponding to the duration of one symbol and the addition is modulo 2.
(a) If the encoder and the source form an equivalent source emitting the sequence yk, find the type of this sequence. Is it a binary sequence? Does the equivalent source have memory?
(b) Draw the state diagram and the trellis for this source. Find the state probabilities and the symbol probabilities of the equivalent source.
(c) If the probability of one binary symbol in the sequence xk is p = 0.2, find the entropy of the equivalent source and the entropy of the source adjoint to it.
(d) Draw the entropy of the equivalent source as well as the entropy of its adjoint source as a function of p.
Solution
(a) All operations are in the binary field and the sequence yk is a binary sequence as well. The forming of the sequence emitted by the equivalent source with memory is defined by

y_k = x_k \oplus y_{k-2},

showing clearly that the output depends on the input and on the delayed output, where the delay is equal to the duration of two binary symbols. Therefore, the output symbol depends on two previously emitted symbols and a second-order memory source is obtained.
(b) The state diagram of the equivalent source can be easily drawn having in view that the source state (second-order memory source, m = 2) is determined by two previously emitted symbols. These are the symbols at the cell outputs; the current state is S = (y_{k-2}, y_{k-1}) while the next state will be S' = (y_{k-1}, y_k). From every state the source can enter only into two other states, depending on
yk, i.e. on the value of the binary symbol emitted by the zero-memory source. Therefore, the transition probabilities are p and 1 − p. The corresponding state diagram and trellis are shown in Fig. 2.9a, b, where the symbols are denoted by ‘0’ and ‘1’.

Fig. 2.9 State diagram (a) and trellis (b) of the second-order memory source
The following set of equations holds

P(00) = (1-p) P(00) + p P(10),
P(01) = p P(00) + (1-p) P(10),
P(10) = (1-p) P(01) + p P(11),
P(11) = p P(01) + (1-p) P(11).

However, this system does not have a unique solution (it is singular!) and a new equation should be added to substitute any one in the set. This equation can be easily written, because the equivalent source must always be in some state

P(00) + P(01) + P(10) + P(11) = 1.

The steady state probabilities are P(00) = P(01) = P(10) = P(11) = 1/4, i.e. all states are equiprobable. The corresponding probabilities of the binary symbols are P(0) = P(1) = 0.5. From the state diagram it can be seen that the conditional probability P(y_k / y_{k-2}, y_{k-1}) equals the transition probability from the state S = (y_{k-2}, y_{k-1}) into the state S' = (y_{k-1}, y_k).
(c) The entropy of the sequence yk can be found using the general expression for the entropy of a second-order memory source

H(Y) = \sum_{i_1=1}^{2} \sum_{i_2=1}^{2} \sum_{j=1}^{2} P(y_{i_1}, y_{i_2}) P(y_j / y_{i_1}, y_{i_2}) \, \mathrm{ld} \frac{1}{P(y_j / y_{i_1}, y_{i_2})},

or, written equivalently over the states,

H(Y) = \sum_{i=1}^{4} \sum_{j=1}^{2} P(S_i) P(y_j / S_i) \, \mathrm{ld} \frac{1}{P(y_j / S_i)} = p \, \mathrm{ld}\frac{1}{p} + (1-p) \, \mathrm{ld}\frac{1}{1-p},

while the entropy of the adjoint source is

H(\bar{Y}) = \sum_{j=1}^{2} P(y_j) \, \mathrm{ld} \frac{1}{P(y_j)} = 0.5 \, \mathrm{ld}\,2 + 0.5 \, \mathrm{ld}\,2 = 1 \; \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right].
The above derived expressions are valid for any value of the parameter p. For p = 0.2, the corresponding numerical value of the entropy of the source with memory is

H(Y) = 0.2 \, \mathrm{ld}\frac{1}{0.2} + 0.8 \, \mathrm{ld}\frac{1}{0.8} = 0.7219 \; [\mathrm{Sh/symb}],

while the entropy of the adjoint source does not depend on p, being always H(\bar{Y}) = 1 [Sh/symb].
(d) The entropy as a function of p is shown in Fig. 2.10. As one could expect, the entropy of the source with memory is always smaller than the entropy of its adjoint source. It should be noted that for practically all values of p a source having equiprobable symbols ‘0’ and ‘1’ is obtained (this is not true only for p = 0, and then only for some initial states). This means that the condition P(0) = P(1) = 0.5 is not sufficient for the entropy to have the maximal value. The same could be concluded from the previous problem, but it should be noted that in that case a zero-memory source was considered.
Fig. 2.10 Entropy of the source with memory and entropy of the adjoint source as functions of p
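The results of parts (b) and (c) can also be checked by simulation. The following sketch (not from the book, assuming p denotes the probability of a binary one in xk; the results are the same for the opposite convention) estimates P(1) and the conditional entropy of the equivalent source:

```python
import numpy as np

rng = np.random.default_rng(1)
p, N = 0.2, 200_000

x = (rng.random(N) < p).astype(int)
y = np.zeros(N, dtype=int)
for k in range(2, N):
    y[k] = x[k] ^ y[k - 2]                 # the encoder y_k = x_k XOR y_{k-2}

# Empirical symbol probability and H(Y) = sum_S P(S) H(y | S)
P1 = y.mean()
states = y[:-2] * 2 + y[1:-1]              # state (y_{k-2}, y_{k-1}) coded as 0..3
emitted = y[2:]
H = 0.0
for s in range(4):
    sel = emitted[states == s]
    ps = sel.size / emitted.size           # state probability (about 1/4 each)
    p1 = sel.mean()                        # conditional probability of emitting 1
    H += ps * (-(p1 * np.log2(p1) + (1 - p1) * np.log2(1 - p1)))
print(f"P(1) ~ {P1:.3f}, H(Y) ~ {H:.3f} Sh/symb")   # about 0.5 and 0.722
```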
Problem 2.6 A zero-memory source emits a binary sequence xk which is encoded according to the relation

y_k = x_k \oplus \sum_{i=1}^{p} a_i y_{k-i},

where the coefficients ai are from the binary field and the addition is modulo 2.
(a) Considering the zero-memory source and the encoder as an equivalent source (emitting the sequence yk), find the entropy of this source.
(b) Draw the state diagram and the trellis for this source for the case p = 2 and a1 = a2 = 1. Find the corresponding state and symbol probabilities.
(c) If the zero-memory source emits the all-zeros sequence and the encoder parameters are p = 5, a2 = a5 = 1 and a1 = a3 = a4 = 0, draw the block scheme of the equivalent source (with memory), find its entropy and draw the autocorrelation function of the sequence yk.
(d) Find the autocorrelation function of the equivalent source when its input is generated by a zero-memory source emitting a binary sequence xk which is (subsequently) encoded by an encoder having p cells (p even) according to the relation

y_k = x_k \oplus y_{k-p}.
Solution
(a) All calculations performed by the encoder are over the binary field and the sequence yk is binary as well; the source memory is of order p. The corresponding entropy is

H(Y) = \sum_{i=1}^{2^p} \sum_{j=1}^{2} P(S_i) P(y_j / S_i) \, \mathrm{ld} \frac{1}{P(y_j / S_i)},
where S_i = (y_{i_1}, y_{i_2}, \ldots, y_{i_p}) denotes the current encoder state. From this state only the transitions into two next states are possible, (y_{i_2}, \ldots, y_{i_p}, 0) and (y_{i_2}, \ldots, y_{i_p}, 1), depending on whether y_j = 0 or y_j = 1. The transition probabilities depend on the current state, yielding P(y_j = 1 / y_{i_1}, y_{i_2}, \ldots, y_{i_p}) = P_i or P(y_j = 0 / y_{i_1}, y_{i_2}, \ldots, y_{i_p}) = 1 - P_i. The previous expression can be written in the form

H(Y) = \sum_{i=1}^{2^p} P(S_i) \left[ P_i \, \mathrm{ld} \frac{1}{P_i} + (1 - P_i) \, \mathrm{ld} \frac{1}{1 - P_i} \right].
Since for every state P_i equals either p or 1 − p, it is clear that the sum in brackets does not depend on i (one addend will be p, the other 1 − p), yielding finally

H(Y) = \left[ p \, \mathrm{ld}\frac{1}{p} + (1-p) \, \mathrm{ld}\frac{1}{1-p} \right] \sum_{i=1}^{2^p} P(S_i) = p \, \mathrm{ld}\frac{1}{p} + (1-p) \, \mathrm{ld}\frac{1}{1-p}.

Therefore, the encoding cannot change the source entropy (but it can introduce memory into the emitted sequence).
(b) For p = 2 and a1 = a2 = 1 the block scheme of the equivalent source slightly differs from that in Fig. 2.8, because the output of the first cell is introduced into the modulo-2 adder as well. The corresponding state diagram and trellis of the equivalent source are constructed in the same way as in the previous problem. They are shown in Fig. 2.11a, b.
Comparing the state diagram and the trellis to those in the previous problem (Fig. 2.9), a complete asymmetry can be noticed here. However, all four states are again equiprobable, yielding P(0) = P(1).
Fig. 2.11 State diagram (a) and trellis (b) of the source having parameters p = 2 and a1 = a2 = 1

(c) The emitting of the all-zeros sequence by the zero-memory source is equivalent to the case where it is switched off, and the corresponding block scheme of the source (a shift register with five delay cells and feedback) is given in Fig. 2.12.
The mean value of the emitted sequence over one period L (where N is the number of register cells) is

m_y = E\{y_k\} = \frac{1}{L} \sum_{k=1}^{L} y_k = \frac{2^{N-1}}{L} = \frac{(L+1)/2}{L} = \frac{1}{2} + \frac{1}{2L},

and the autocorrelation function is

R_y(l) = E\{y_k y_{k+l}\} = \frac{1}{L} \sum_{k=1}^{L} y_k y_{k-l}.

For binary symbols the product can be written as

a \cdot b = (a + b - a \oplus b)/2,

yielding

R_y(l) = \frac{1}{2L} \sum_{k=1}^{L} (y_k + y_{k-l} - y_k \oplus y_{k-l}) = m_y - \frac{1}{L} \sum_{k=1}^{L} \frac{y_k \oplus y_{k-l}}{2}.

The sum in this expression for l = 0 is a sum of addends equal to zero, while for l ≠ 0 it corresponds to one half of the average value, yielding finally

R_y(l) = \begin{cases} m_y, & l = 0, \pm L, \ldots \\ m_y/2, & l \ne 0, \pm L, \ldots \end{cases}
Fig. 2.13 Autocorrelation function Ry(l) of the PN sequence (discrete shift l)

(d) The average value of the output sequence is obtained from

E\{y_k\} = E\{x_k\} + E\{y_{k-p}\} - 2 E\{x_k y_{k-p}\}, \qquad m_y = m_x + m_y - 2 m_x m_y \;\Rightarrow\; m_y = 1/2.
R_x(l) = \begin{cases} m_x, & l = 0, \pm L, \ldots \\ m_x/2, & l \ne 0, \pm L, \ldots \end{cases}

\Phi(z) = \frac{m_x/2}{(1 - z)\left(1 - (1 - 2 m_x) z^p\right)},

R_y(l) = \sum_{j=1}^{p/2} B_j \rho_j^l, \quad l \ge 0.
Fig. 2.14 Autocorrelation function Ry(l) versus the discrete shift l, panels (a) and (b)
Solution
(a) This is a first-order memory binary symmetric source; it can be realized by combining a zero-memory binary source and a differential encoder [5], as shown in Fig. 2.15.
(b) The state diagram and the corresponding trellis are shown in Fig. 2.16. The source state corresponds to the previously emitted bit, and the transition probability into the next state is determined only by the bit probability at the output of the zero-memory binary source. The source is symmetric (easily concluded from the state diagram or the trellis), and the output binary symbols are equally probable, yielding P(0) = P(1) = 0.5. The probability for the source to be in the state ‘0’ equals the probability that the previously emitted bit is ‘0’, and the state probabilities are equal (the source is stationary).
(c) If the zero-memory source emits “binary ones” with the probability p = 0.1, the transition probability into the other state will be small and transitions in the sequence at the source output will be rare. A characteristic sequence emitted by the source with memory has the following form

… 00000000001111111000000111111111111111111100000000000111111111110000000000 …

A source extension of order n = 2 results in grouping two neighboring binary symbols into one compound symbol, the possible compound symbols being A = ‘00’, B = ‘01’, C = ‘10’ and D = ‘11’. The above binary sequence can now be written as a twice shorter sequence
…AAAAADDDCAABDDDDDDDDDAAAAABDDDDDAAAAA…

Fig. 2.16 State diagram (a) and trellis (b) of the first-order memory binary symmetric source
While the binary symbols of the original sequence are equally probable, in the new sequence the symbols A and D will have substantially greater probabilities than the symbols B and C, being a consequence of the unequal probabilities in the state diagram. By using the joint probability, it is easy to find

P(A) = P(D) = (1-p)/2, \qquad P(B) = P(C) = p/2.

To find the entropy of the nth extension of an m-order memory source, the memory order of the extended source first has to be found. It is calculated by using the formula [7]

l = \left\lceil \frac{m}{n} \right\rceil,

where the operator ⌈·⌉ denotes taking the upper integer (if the quotient is not an integer). In this case m = 1 and n = 2, yielding

l = \lceil 1/2 \rceil = 1,
so the extended source is also a first-order memory source. The entropy of the original source is

H(S) = \sum_{i=1}^{2} \sum_{j=1}^{2} P(s_i) P(s_j / s_i) \, \mathrm{ld} \frac{1}{P(s_j / s_i)},

or

H(S) = 0.5(1-p) \, \mathrm{ld}\frac{1}{1-p} + 0.5 p \, \mathrm{ld}\frac{1}{p} + 0.5 p \, \mathrm{ld}\frac{1}{p} + 0.5(1-p) \, \mathrm{ld}\frac{1}{1-p},

i.e.

H(S) = (1-p) \, \mathrm{ld}\frac{1}{1-p} + p \, \mathrm{ld}\frac{1}{p},

while the entropy of the corresponding adjoint source is

H(\bar{S}) = \sum_{i=1}^{2} P(s_i) \, \mathrm{ld} \frac{1}{P(s_i)} = 0.5 \, \mathrm{ld}\,2 + 0.5 \, \mathrm{ld}\,2 = 1 \; \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right].
The extended source is a first-order memory source as well, and the following expression can be used

H(S^2) = \sum_{i=1}^{2} \sum_{j=1}^{4} P(s_i) P(r_j / s_i) \, \mathrm{ld} \frac{1}{P(r_j / s_i)},

and after the final ordering of the entropy expression (a tedious but straightforward procedure) the following is obtained

H(S^2) = 2(1-p) \, \mathrm{ld}\frac{1}{1-p} + 2p \, \mathrm{ld}\frac{1}{p}.
The entropy of the source adjoint to the extended source is

H(\bar{S}^2) = \sum_{i=1}^{4} P(r_i) \, \mathrm{ld} \frac{1}{P(r_i)} = -2\,\frac{1-p}{2}\,\mathrm{ld}\left(\frac{1-p}{2}\right) - 2\,\frac{p}{2}\,\mathrm{ld}\left(\frac{p}{2}\right) = (1-p)\left[1 - \mathrm{ld}(1-p)\right] + p\left[1 - \mathrm{ld}(p)\right] = 1 - (1-p)\,\mathrm{ld}(1-p) - p\,\mathrm{ld}(p).
One can easily verify that the two previously given expressions can be obtained on the basis of the general relations

H(S^n) = nH(S), \qquad H(\bar{S}^n) = (n-1)H(S) + H(\bar{S}),

where the last equality is obtained using the fact that the difference between the entropy of the extended source and that of its adjoint source does not change with the extension order.
For p = 0.1 the corresponding numerical values are

H(S) = 0.469, \quad H(\bar{S}) = 1, \quad H(S^2) = 0.938, \quad H(\bar{S}^2) = 1.469 \; [\mathrm{Sh/symb}],

and the values of H(S^n) and H(\bar{S}^n) are shown in Fig. 2.17 for three characteristic values of p.
When p = 0.5, the transitions into the next possible states are equiprobable and the source loses the memory (the generated sequence becomes uncorrelated), yielding H(S) = H(\bar{S}) and H(S^n) = H(\bar{S}^n) = nH(S). On the other hand, if p = 0.01 there is a great difference between the entropies of the original source and its adjoint source (H(S) ≪ H(\bar{S})), and even after the extension H(\bar{S}^n) has a substantially greater value than H(S^n). If p is fixed, the difference between the entropy of the extended source and the entropy of its adjoint source does not depend on the extension order, in concordance with the relation

H(\bar{S}^n) - H(S^n) = H(\bar{S}) - H(S).
Fig. 2.17 H(S^n) and H(\bar{S}^n) as functions of the extension order n, for p = 0.01, p = 0.1 and p = 0.5
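The two general relations above can be checked numerically for the binary symmetric Markov source. The following sketch (not part of the original solution) computes nH(S) and the entropy of one n-tuple of consecutive symbols, i.e. the entropy of the adjoint to the nth extension:

```python
import numpy as np
from itertools import product

p = 0.1                                       # transition probability between states
P = np.array([[1 - p, p], [p, 1 - p]])        # first-order binary symmetric source
pi = np.array([0.5, 0.5])                     # stationary symbol probabilities

Hb = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))   # H(S)
H_adj = 1.0                                          # H(S_bar), equiprobable symbols

def block_entropy(n):
    """Entropy of the source adjoint to the nth extension (one n-tuple)."""
    H = 0.0
    for tup in product((0, 1), repeat=n):
        pr = pi[tup[0]]
        for a, b in zip(tup, tup[1:]):
            pr *= P[a, b]
        H -= pr * np.log2(pr)
    return H

for n in (1, 2, 3, 4):
    print(n, round(n * Hb, 4), round(block_entropy(n), 4))
# H(S^n) = n H(S), H(S_bar^n) = (n-1) H(S) + 1; their difference stays 1 - H(S)
```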
Problem 2.8 An uncorrelated, zero-mean Gaussian random process xk is fed to a filter described by the relation

y_k = x_k - \sum_{i=1}^{p} a_i y_{k-i},

where the coefficients ai are real numbers and the addition is the ordinary one (not modulo 2).
(a) Draw the system block scheme and explain its functioning. Find the average value and the autocorrelation function of the output process.
(b) Is it possible to generate a Gaussian random process having a predefined autocorrelation function?
(c) Find the entropy H(X) at the filter input. Is the entropy of the output random process smaller or greater than H(X)?
Solution
(a) The block scheme of the system, shown in Fig. 2.18, can be found directly from the corresponding equation. The average value of the random process at the output is

E\{y_k\} = E\{x_k\} - \sum_{i=1}^{p} a_i E\{y_{k-i}\},

and this process is a stationary Gaussian process as well (the autoregressive IIR filter is a linear system!). As m_x = E\{x_k\} = 0, it follows

m_y \left(1 + \sum_{i=1}^{p} a_i\right) = 0 \;\Rightarrow\; m_y = 0.
The transfer function of the filter is

H(z) = \frac{1}{1 + \sum\limits_{i=1}^{p} a_i z^{-i}},

and the discrete average power spectrum density of the output Gaussian process is

\Phi_y(z) = \frac{N_0}{\left| 1 + \sum\limits_{i=1}^{p} a_i z^{-i} \right|^2}.

The corresponding autocorrelation function is

R(l) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{N_0 \, e^{jzl}}{\left| 1 + \sum\limits_{i=1}^{p} a_i e^{-jzi} \right|^2} \, dz \quad (l = 0, \pm 1, \ldots, \pm(p-1)).
(b) The autoregressive (AR) model makes it possible to generate a process having a predefined autocorrelation function. If the values of R(l) are given in advance, it is possible to define the correlation matrix R, the coefficient vector a of the AR filter and the correlation vector r, as follows

R = \begin{bmatrix} R_y(0) & R_y(1) & \cdots & R_y(p-1) \\ R_y(1) & R_y(0) & \cdots & R_y(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ R_y(p-1) & R_y(p-2) & \cdots & R_y(0) \end{bmatrix}, \quad a = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix}, \quad r = \begin{bmatrix} R_y(1) \\ R_y(2) \\ \vdots \\ R_y(p) \end{bmatrix},

and the filter coefficients are obtained as

a = -R^{-1} r.
In Fig. 2.19 it is shown how, from an uncorrelated Gaussian random process xk, a correlated random process yk can be generated. It is obtained by filtering the process xk using a filter with p = 50 delay cells, where the delay of one cell corresponds to the sampling period of the process, T = 1 [μs]. The estimation was made using the sample size N = 10^6, and the uncorrelated and correlated signals are shown in Fig. 2.19. In the same figure the probability density functions of both processes are shown. It is clear that the first-order statistics are not changed when the AR filter is used.

Fig. 2.19 Uncorrelated process (noise) x(t) at the input, correlated process y(t) at the output and the corresponding probability density functions, for lmax = 50
On the other hand, the second-order statistics of the input and output processes differ substantially, as shown in Fig. 2.20. The input process is uncorrelated, while the output process has an autocorrelation function corresponding very well to the predefined linearly decreasing function.
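A minimal sketch of the procedure from (b) is given below (not the authors' code; it assumes the sign convention y_k = x_k − Σ a_i y_{k−i} adopted above and uses a shorter filter, p = 20, for speed). The Yule-Walker system is solved for a linearly decreasing target autocorrelation and the empirical autocorrelation of the generated process is compared with the target:

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.signal import lfilter

rng = np.random.default_rng(0)
p, lmax, N = 20, 20, 200_000

# Target autocorrelation: linearly decreasing, R(0) = 1, R(l) = 0 for l >= lmax
R_target = np.maximum(0.0, 1.0 - np.arange(p + 1) / lmax)

# Yule-Walker equations for y_k = x_k - sum_i a_i y_{k-i}
R = toeplitz(R_target[:p])            # correlation matrix
r = R_target[1:p + 1]                 # correlation vector
a = -np.linalg.solve(R, r)            # AR coefficients (note the sign convention)
sigma2 = R_target[0] + a @ r          # variance of the driving white noise

x = rng.normal(scale=np.sqrt(sigma2), size=N)       # uncorrelated Gaussian input
y = lfilter([1.0], np.concatenate(([1.0], a)), x)   # correlated Gaussian output

# Empirical autocorrelation of y versus the predefined one
emp = np.array([np.mean(y[:N - l] * y[l:]) for l in range(p + 1)])
for l in (0, 5, 10, 15, 20):
    print(l, round(R_target[l], 3), round(emp[l], 3))
```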
(c) The entropy at the filter input is

H(X) = \int_{-\infty}^{\infty} w(x) \, \mathrm{ld} \frac{1}{w(x)} \, dx.

Fig. 2.20 Autocorrelation function versus the discrete shift

The input process has the Gaussian probability density function

w(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}, \quad -\infty < x < \infty.
Taking into account that the process is uncorrelated, the above written expression for the entropy can be used

H(X) = \int_{-\infty}^{\infty} w(x) \, \mathrm{ld}\frac{1}{w(x)} \, dx = -\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \, \mathrm{ld}\left[ (2\pi)^{-1/2} e^{-\frac{x^2}{2}} \right] dx
= \frac{\mathrm{ld}(2\pi)}{2} \left[ \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \, dx \right] + \frac{\mathrm{ld}(e)}{2} \left[ \int_{-\infty}^{\infty} x^2 \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \, dx \right] = \frac{\mathrm{ld}(2\pi e)}{2}.
The output process is correlated, and it can be considered as generated by a continuous source with memory. For such sources, the entropy cannot be calculated using the same formula as above. However, it can be noticed that the source generates the signal yk which has the same probability density function as the signal xk, so that their first-order statistics are identical. Therefore, one can consider the source generating xk as an adjoint source to the one generating the signal yk, from which it follows that

H(Y) \le H(X) = \frac{\mathrm{ld}(2\pi e)}{2}.

The equality is valid only for lmax = 1, while for its higher values the entropy of the output sequence of the AR filter is smaller than the entropy at the input, and the difference is greater if the autocorrelation function decreases more slowly.
Chapter 3
Data Compression (Source Encoding)
In this chapter only discrete sources are considered. Generally, encoding is a mapping of sequences of source alphabet symbols (S) into sequences of code alphabet symbols (X) (Fig. 3.1).
The number of code alphabet symbols (r) is called the code base. If the code alphabet consists of two symbols only, a binary code is obtained. A block code maps each of the symbols (or their sequences) of the source alphabet S into a fixed sequence of code alphabet X symbols. The obtained sequences are called code words (Xi). The number of symbols in a code word is called the code word length.
Consider the following binary code (r = 2) for a source with q = 4 symbols.
Code alphabet symbols are denoted simply as 0 and 1.
S Xi
s1 0
s2 01
s3 11
s4 01
This code is defined by the table. A code can also be defined in some other way, e.g. using mathematical operations. The above code is practically useless, because two source symbols have the same code word, so the decoding is impossible. It is a singular code. Therefore, a natural restriction is imposed: all code words should be distinct. A code is nonsingular if all code words are distinct. Consider the following nonsingular code
S Xi
s1 0
s2 01
s3 11
s4 00
But this code is also useless. The sources emit sequences of symbols and the corresponding code words will be received by the decoder in sequences. E.g., if the received sequence is 0011, there are two possibilities for decoding: s1s1s3 and s4s3. Therefore, a further restriction should be imposed. All possible sequences of code words (corresponding to all possible sequences of source symbols) must be different to allow unique decodability. In other words, an unambiguous decoding must be possible. A (block) code is uniquely decodable if the nth extension of the code (i.e. the sequence corresponding to the nth source extension) is nonsingular for every finite n. Consider the following uniquely decodable codes

S     (a)   (b)    (c)
s1    00    0      0
s2    01    10     01
s3    10    110    011
s4    11    1110   0111
From the practical point of view, one more condition should be imposed. All three codes are uniquely decodable. The code words of code (a) have the same length. Code (b) is a comma code (Problems 3.1 and 3.2), where at the end of every word there is a symbol not appearing anywhere else. In a binary code this symbol can be 0 (or 1 as well). In the above example the code words consist of a series of ones, with a zero at the end. However, code (c) is a little bit different. Here all code words start with 0. It means that the decoder has to wait for the beginning of the next code word to decode the current one. Therefore, an additional condition is imposed. A uniquely decodable code is instantaneous (Problems 3.2–3.4) if it is possible to decode each word in a sequence without reference to the neighboring code symbols. A necessary and sufficient condition for a code to be instantaneous is that no complete code word is a prefix of some other code word. Code (c) does not satisfy this condition. In Fig. 3.2 the mentioned subclasses of codes are shown.
Fig. 3.2 Subclasses of codes: nonsingular, uniquely decodable and instantaneous codes
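The prefix condition (and the Kraft inequality discussed below) are easy to check programmatically. A small sketch, not part of the book, applied to the three example codes above:

```python
def kraft_sum(code, r=2):
    """Left-hand side of the Kraft inequality for the code word lengths."""
    return sum(r ** -len(w) for w in code.values())

def is_instantaneous(code):
    """No complete code word may be a prefix of another code word."""
    words = list(code.values())
    return not any(w1 != w2 and w2.startswith(w1) for w1 in words for w2 in words)

codes = {
    'a': {'s1': '00', 's2': '01', 's3': '10', 's4': '11'},
    'b': {'s1': '0', 's2': '10', 's3': '110', 's4': '1110'},
    'c': {'s1': '0', 's2': '01', 's3': '011', 's4': '0111'},
}
for name, code in codes.items():
    print(name, "Kraft sum =", kraft_sum(code), "instantaneous:", is_instantaneous(code))
# Codes (a) and (b) are instantaneous; code (c) satisfies the Kraft inequality
# (it only constrains the lengths) but every one of its words is a prefix of the next.
```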
As a suitable way to visualize a code, a code tree can be constructed. The tree consists of the root, the nodes and the branches. From the root and from every node, for an r-base code, there are r branches, each one corresponding to one code symbol. The source symbols are at the ends of the outer branches. The decoder starts from the root and follows the branches determined by the code symbols; at the end the corresponding source symbol should be found. In Fig. 3.3 the code trees corresponding to the last example are shown.
In the figure it is arbitrarily chosen that going along the left branch corresponds to 0 and along the right branch corresponds to 1. Of course, the bits in the code words can be assigned according to the opposite convention as well.
Fig. 3.3 Code trees of the codes (a) (code words 00, 01, 10, 11), (b) (0, 10, 110, 1110) and (c) (0, 01, 011, 0111)
The code word lengths li of an instantaneous code must satisfy the Kraft inequality

\sum_{i=1}^{q} r^{-l_i} \le 1,

i.e. for a binary code (r = 2)

\sum_{i=1}^{q} 2^{-l_i} \le 1.
If this relation is satisfied with equality, the code is complete (Problem 3.4). In
fact, it is merely a condition on the word lengths of a code and not on the con-
struction of words themselves (in the extreme case, one can put only symbol 0 in all
code words). The next example is interesting.
Consider the following binary (r = 2) codes for a source with q = 4 symbols [3]
Fig. 3.4 Code trees of a ternary code (r = 3) for a source with q = 3 (a) and q = 5 symbols (b)
Codes (a), (b), (c) and (d) satisfy the Kraft inequality. However, code (d) is not instantaneous, because the code word for s4 is a prefix of the code word for s3. But code words with such lengths can be used to obtain an instantaneous code, code (c). On the other hand, using code words whose lengths correspond to code (e), it is not possible to construct an instantaneous code.
The goal of data compression is to transmit or record the information as economically as possible. A useful measure is the average code word length (Problem 3.1)

\bar{l} = L = \sum_{i=1}^{q} P_i l_i.

According to the First Shannon theorem, the average code word length cannot be smaller than

L \ge \frac{H(S)}{\mathrm{ld}\, r},

i.e. for a binary code

L \ge H(S).
This result is obvious, because one bit cannot carry more than one shannon. The theorem can be proved as well for sources with memory, by considering the nth extension of such a source (the source adjoint to the source S being denoted by S̄).
It means that with sufficient source extension, the equivalent average code word
length can be as near as we wish to the source entropy.
To compare the various codes, some parameters are defined. The code efficiency (Problems 3.1, 3.7 and 3.8) is

$$\eta = \frac{H(S)}{L\,\mathrm{ld}\,r}\cdot 100\%,$$

which for a binary code reduces to η = H(S)/L · 100%. For 100% efficiency a code is perfect (Problems 3.1 and 3.3). The compression ratio (Problems 3.1, 3.7 and 3.8) is

$$\rho = \frac{\lceil \mathrm{ld}\,q \rceil}{L},$$

where the operator ⌈·⌉ denotes the smallest integer greater than or equal to the argument. The maximum possible compression ratio value (Problem 3.1) is

$$\rho_{\max} = \frac{\lceil \mathrm{ld}\,q \rceil}{H(S)}.$$
The interesting features concerning transmission errors are the maximum code word length (Problem 3.5) and the sum of code word lengths, as well as the code word length variance (Problem 3.5). Further, the effective efficiency of a code for sources with memory can be defined, as well as the maximum possible efficiency and the maximum possible compression ratio (Problem 3.7) for a given extension order.
The first known procedure for source encoding was the Shannon-Fano one [10, 11] (Problems 3.1 and 3.4). It was proposed by Shannon, while Fano made it easier to follow by using the code tree (Problems 3.3, 3.4, 3.6, 3.7 and 3.8). When applying this procedure, sometimes several code trees have to be considered. The main reason is that the most probable symbols should have the shortest code words. For binary codes it can be achieved by dividing the symbols into two groups having equal (or as equal as possible) probabilities. Code words for one group will have 0 as the first bit, and for the second group the first bit will be 1. The division is continued within the subgroups until all subgroups contain only one symbol. For r > 2, the number of subgroups is r.
Consider the next examples:
Fig. 3.5 Code trees corresponding to the codes (a) and (b)
The corresponding code trees are shown in Fig. 3.5 (the left branch corresponds to 0, the right one to 1).
It is easy to conclude that this procedure is not an algorithm. Generally, for every q and r, all corresponding code trees should be examined to obtain a compact code. In Fig. 3.6, two code trees (out of five possible) are shown for a source with 6 symbols.
The corresponding codes, entropy and average code word lengths are
Fig. 3.6 Two possible code trees for a source with 6 symbols
S     Pi     (a)     (b)
s1    0.65   0       0
s2    0.15   10      10
s3    0.08   110     1100
s4    0.05   1110    1101
s5    0.04   11110   1110
s6    0.03   11111   1111

H(S) = 1.6597 Sh/symb,  La = 1.74 b/symb,  Lb = 1.75 b/symb
The difference in the average code word length is very small and it is difficult to choose the right code tree without calculation.
However, Huffman [12] (Problems 3.3, 3.4, 3.5, 3.6, 3.7 and 3.8) gave an algorithm that always results in a compact code. For illustration, binary coding will be used. Consider a source without memory with the symbols s1, s2, …, sq and the corresponding symbol probabilities Pi (i = 1, 2, …, q). The symbols should be ordered according to non-increasing probabilities P1 ≥ P2 ≥ … ≥ Pq. Now the reductions start. The last two of the ordered symbols are combined into one symbol (its probability is Pq−1 + Pq) and a new source containing q − 1 symbols is obtained (the first reduction). The procedure is continued until only two symbols remain (after q − 2 reductions). Their code words are 0 and 1. After that, one goes back, dividing the compound symbols into the ones they were obtained from and appending 0 and 1 to the corresponding code words. At the end, the compact code is obtained.
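The reduction procedure can be sketched in a few lines. The following is a minimal illustration (not the book's tables; the helper name is assumed) using a priority queue; applied to the source of the previous example it reproduces the average length of 1.74 b/symb:

```python
import heapq

def huffman(probabilities):
    """Binary Huffman code: dict symbol -> probability, returns dict symbol -> code."""
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    codes = {s: "" for s in probabilities}
    count = len(heap)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)   # smallest probability
        p2, _, group2 = heapq.heappop(heap)   # second smallest
        for s in group1:                      # one bit is prepended per merge
            codes[s] = "1" + codes[s]
        for s in group2:
            codes[s] = "0" + codes[s]
        count += 1
        heapq.heappush(heap, (p1 + p2, count, group1 + group2))
    return codes

source = {"s1": 0.65, "s2": 0.05, "s3": 0.08, "s4": 0.15, "s5": 0.04, "s6": 0.03}
codes = huffman(source)
L = sum(source[s] * len(codes[s]) for s in source)
print(codes, L)   # average length 1.74 b/symb, as in the example above
```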
Consider the previous example. Probabilities of the symbols to be combined are denoted by a sign (• or ♦) after the probability value, and the probability of the resulting symbol is denoted by the same sign before the numerical value of the combined symbol probability; S1, …, S4 are the sources obtained after successive reductions.
S Pi Xi S1 S2 S3 S4
s1 0.65 0 0.65 0 0.65 0 0.65 0 0.65 0
s2 0.15 11 0.15 11 0.15 11 •0.20♦ 11 ♦0.35 1
s3 0.08 101 0.08 101 ♦0.12• 110 0.15♦ 110
s4 0.05 1001 •0.07♦ 1000 0.08• 101 101
s5 0.04• 10000 0.05♦ 1001
s6 0.03• 10001
Average code word length is L = 1.74 b/symb corresponding to the code tree
(b) from the Fig. 3.6. The obtained code words are not identical to the previous
solution, but the code word lengths are the same. Of course, by complementing the
corresponding bits, the identical code words could be obtained.
One more question is of interest. In some cases there will be more than two
compound code words with the same probabilities. What will happen if the com-
pound code word is not put at the last place? Consider the following example.
(a) Symbol obtained by reduction is put always at the end of symbols having the
same probabilities
S Pi Xi S1 S2 S3 S4
s1 0.5 0 0.5 0 0.5 0 0.5 0 0.5 0
s2 0.2 11 0.2 11 0.2 11 •0.3♦ 10 ♦0.5 1
s3 0.1 101 0.1 101 ♦0.2• 100 0.2♦ 11
s4 0.1 1000 0.1♦ 1000 0.1• 101
s5 0.07• 10010 •0.1♦ 1001
s6 0.03• 10011
Obviously, there can be more than one compact code for the same source and the same code alphabet. This fact can be used to impose some additional conditions on the obtained code. For example, the condition can be imposed that the lengths of the obtained code words have a smaller variance. It will lessen the demands on the encoder memory. It can be achieved by always putting the compound symbol at the first place among the symbols having the same probability.
One drawback of a “classic” Huffman procedure is the need to know the symbol probabilities. However, these probabilities are usually not known at the beginning (except, mainly, for natural languages). But often there is no time to wait for the end of a long message to obtain the corresponding statistics. The messages must be transmitted continuously as they are generated. The problem is solved by using the adaptive (dynamic) Huffman algorithm. The code tree is changed continuously to take into account the current symbol probabilities; it is dynamically reordered. One possible solution was proposed independently by Faller and Gallager and later improved by Knuth, and the algorithm is known as the FGK algorithm (Problem 3.9). It is based on the so-called sibling property (Problems 3.2, 3.3, 3.4 and 3.9). Vitter (Problem 3.9) gave a modification that is suitable for very large trees. These rather long procedures are exposed in detail in the corresponding problems.
The Huffman coding does not take into account the source memory. The LZ (Lempel-Ziv) procedure (Problem 3.10) is a kind of universal coding without any explicit source model. It can be thought of as an attempt to write a program for the decoder to generate the sequence compressed by the encoder at the transmitting end.
For ternary encoding, the last three symbols are combined, and correspondingly for r-ary encoding. If after the last reduction the number of compound symbols differs from 3 (r), dummy symbols (having probability equal to zero) should be added before the first reduction to avoid the loss of short code words.
Problems
Problem 3.1 By applying Shannon-Fano encoding procedure find the binary code
corresponding to zero-memory source emitting q = 5 symbols whose probabilities
(a) are defined by

$$P(s_i) = \begin{cases} 2^{-i}, & i = 1, 2, \ldots, q-1 \\ 2^{-(i-1)}, & i = q \end{cases}$$

(b) are mutually equal, P(si) = 1/q, i = 1, 2, …, q.

For both cases find the code efficiency and the compression ratio. For values 2 ≤ q ≤ 20 draw the efficiency and the compression ratio as functions of q for both sources.
Solution
(a) The procedure for obtaining the code using Shannon-Fano encoding is shown in Table 3.1. The resulting code word lengths are

$$l_i = \mathrm{ld}\,\frac{1}{P(s_i)},$$

and a so-called comma code is obtained, where the binary zero occurs only at the last position in the code word, except for one of the words having the maximum length, which consists of binary ones only.
To find the code efficiency, the entropy and the average code word length should be found. In this case

$$H(S) = \sum_{i=1}^{5} P(s_i)\,\mathrm{ld}\frac{1}{P(s_i)} = \frac{1}{2}\,\mathrm{ld}\,2 + \frac{1}{4}\,\mathrm{ld}\,4 + \frac{1}{8}\,\mathrm{ld}\,8 + 2\cdot\frac{1}{16}\,\mathrm{ld}\,16 = 1.875\ [\mathrm{Sh/symb}],$$

$$L = \sum_{i=1}^{5} P(s_i)\,l_i = \frac{1}{2}\cdot 1 + \frac{1}{4}\cdot 2 + \frac{1}{8}\cdot 3 + 2\cdot\frac{1}{16}\cdot 4 = 1.875\ [\mathrm{b/symb}],$$

$$\eta = \frac{H(S)}{L}\cdot 100\% = 100\%.$$
The compression ratio shows the reduction of the number of bits at the encoder output compared to the case when a fixed-length binary (non-compression) code is used; it is defined as

$$\rho = \frac{\lceil \mathrm{ld}\,q\rceil}{L} = \frac{3}{1.875} = 1.6,$$
where the operator ⌈·⌉ denotes the smallest integer greater than or equal to the argument. In this case, the number of bits at the encoder output is 1.6 times smaller.
For the given probabilities, it is easy to verify that for any number of emitted symbols a perfect comma code can always be found, and the following relation holds

$$\rho = \frac{\lceil \mathrm{ld}(q)\rceil}{H(S)},$$

where the general expression for entropy, derived in Problem 2.2, yields

$$H(S) = 2 - 2^{2-q}.$$
(b) For the second set of probabilities, the procedure for obtaining the code using
Shannon-Fano encoding is shown in Table 3.2.
Entropy and average code word length are now

$$H(S) = \sum_{i=1}^{5} P(s_i)\,\mathrm{ld}\frac{1}{P(s_i)} = 5\cdot\frac{1}{5}\,\mathrm{ld}\,5 = 2.3219\ [\mathrm{Sh/symb}],$$

$$L = \sum_{i=1}^{5} P(s_i)\,l_i = 3\cdot\frac{1}{5}\cdot 2 + 2\cdot\frac{1}{5}\cdot 3 = 2.4\ [\mathrm{b/symb}],$$

$$\eta = \frac{H(S)}{L}\cdot 100\% = 96.75\%,$$

$$\rho = \frac{\lceil \mathrm{ld}(q)\rceil}{L} = \frac{3}{2.4} = 1.25.$$
For the probabilities in the form P(si) = 1/q, i = 1, 2, …, q, the general expression for entropy is

$$H(S) = \sum_{i=1}^{q} P(s_i)\,\mathrm{ld}\frac{1}{P(s_i)} = \sum_{i=1}^{q}\frac{1}{q}\,\mathrm{ld}(q) = \mathrm{ld}(q),$$

and the compression ratio can be written as

$$\rho = \frac{\lceil \mathrm{ld}(q)\rceil}{L} = \rho_{\max}\,\eta,$$

where ρmax denotes the maximum possible compression ratio value, which does not depend on the code used, but only on the source, and shows the “source compression potential”, given by

$$\rho_{\max} = \frac{\lceil \mathrm{ld}(q)\rceil}{H(S)} = \frac{\lceil \mathrm{ld}(q)\rceil}{\mathrm{ld}(q)}.$$
Fig. 3.7 The maximum possible compression ratio vs. the number of source symbols (2 ≤ q ≤ 20) for the two cases analyzed: (a) perfect comma code, (b) symbols with equal probability
Problem 3.2 Find the binary code for zero-memory source defined by
si s1 s2 s3 s4
P(si) 0.6 0.25 0.125 0.025
Solution
(a) It is obvious that a comma code is the best solution. The first symbol is encoded by one bit only (‘0’), the second by two (‘10’) and the third and fourth by three bits (‘110’ and ‘111’). The entropy and the average code word length are H(S) ≈ 1.45 [Sh/symb] and L = 1.55 [b/symb]. The Kraft inequality gives

$$\sum_{i=1}^{4} 2^{-l_i} = 2^{-1} + 2^{-2} + 2\cdot 2^{-3} = 1 \le 1.$$
The Kraft inequality is always satisfied for a comma code (in fact, with equality). It is obvious that a code can be constructed with the same code word lengths which is not instantaneous. For example, the code with the code words ‘0’, ‘10’, ‘100’ and ‘111’ is not even uniquely decodable. Therefore, it should be verified whether some code word is a prefix of some other code word.
For the codes (a) and (b) it is not the case, and the codes are instantaneous.
Problem 3.3 Find the binary code for zero-memory source defined by:
si s1 s2 s3 s4 s5 s6
P(si) 0.65 0.05 0.08 0.15 0.04 0.03
(a) Apply the Huffman procedure and find the efficiency. Is the obtained code compact?
(b) Draw the corresponding code tree. Does the code tree satisfy the sibling property?
Solution
(a) Huffman procedure consists of the following steps:
1. The symbols are ordered according to the non increasing probabilities.
2. The reduction is carried out by combining two symbols having the smallest
probabilities into the one symbol.
3. The obtained symbol (denoted in this example by an asterisk) has the
probability equal to sum of the combined symbols probabilities.
4. The procedure is repeated (including the possible reordering of new
symbols) until only two symbols remain. One is denoted e.g. by “0”, and
the second by “1”.
5. Code word for a combined symbol is a prefix of the code words for
symbols from which it was obtained. After this prefix, “0” and “1” are
added.
6. The procedure is repeated until all original symbols obtain corresponding
code words (third column in the Table 3.3).
Entropy, average code word length and efficiency are:

$$H(S) = \sum_{i=1}^{6} P(s_i)\,\mathrm{ld}\frac{1}{P(s_i)} = 1.66\ [\mathrm{Sh/symb}],$$

$$L = \sum_{i=1}^{6} P(s_i)\,l(s_i) = 0.65\cdot 1 + 0.05\cdot 4 + 0.08\cdot 3 + 0.15\cdot 2 + 0.04\cdot 5 + 0.03\cdot 5 = 1.74\ [\mathrm{b/symb}],$$

$$\eta = \frac{H(S)}{L}\cdot 100\% \approx 95.4\%.$$
Fig. 3.8 Huffman procedure illustrated by the code tree
$$\rho_{\max} = \frac{\lceil \mathrm{ld}(q)\rceil}{H(S)} = 1.8072.$$
The Huffman procedure guarantees obtaining a compact code. The code obtained here is not perfect. However, for this source it is not possible to find an instantaneous code achieving a smaller average code word length (i.e. a higher efficiency).
(b) It is easier to follow the Huffman procedure using the code tree, as shown in
Fig. 3.8. The tree is formed from the root (on the right side), two branches
starting from it are denoted by different bits and the procedure is repeated
going to the lower hierarchical levels. It is not important which branch is
denoted by binary zero and which by binary one.
Here the result of Huffman procedure gives slightly different code words than in
the previous case—1, 0101, 011, 00, 01001 and 01000. In Fig. 3.9 the ordered code
tree is shown where the hierarchical levels can be easily noticed. It is obvious that
this tree satisfies the sibling property, because the symbol probabilities, started from
the left to the right and from the lowest hierarchical level up, do not decrease. One
can verify easily that the same tree topology corresponds to the code from Table 3.3,
although the code words differ. There are five nodes at the tree, from each one two
branches are going out. Therefore, 32 equivalent compression codes can be formed
having the same tree topology, and each one is instantaneous and compact.
Problem 3.4 Zero-memory source is defined by:
si s1 s2 s3 s4 s5 s6 s7
P(si) 0.39 0.21 0.13 0.08 0.07 0.07 0.05
Fig. 3.9 The ordered code tree corresponding to the code from Fig. 3.8 (the sibling property is satisfied)
(a) Apply the Shannon-Fano encoding procedure and find the average code word length.
(b) Is the obtained code compact? Is the code perfect?
(c) Is the Kraft inequality satisfied?
Solution
(a) It is chosen that in the first division symbols s1 and s2 form one group, the
other symbols forming the other group (probability ratio is 0.6:0.4). The
alternatives are s1 in the first group (0.39:0.61) or s1, s2 and s3 in the first group
(0.73:0.27), but according to the Shannon-Fano procedure they are suboptimal
solutions. The obtained groups are further subdivided into the subgroups by
trying that they have as equal as possible probabilities. The procedure is given
in Table 3.4.
Entropy and the average code word length are
$$H(S) = \sum_{i=1}^{7} P(s_i)\,\mathrm{ld}\frac{1}{P(s_i)} = 2.43\ [\mathrm{Sh/symb}],$$

$$L = \sum_{i=1}^{7} P(s_i)\,l_i = 2.52\ [\mathrm{b/symb}].$$

(b) The obtained code is not compact. Applying the Huffman procedure to the same source yields the code word lengths 1, 3, 3, 4, 4, 4 and 4, and the average code word length

$$L = \sum_{i=1}^{7} P(s_i)\,l_i = 0.39\cdot 1 + 0.21\cdot 3 + 0.13\cdot 3 + 0.08\cdot 4 + 0.07\cdot 4 + 0.07\cdot 4 + 0.05\cdot 4 = 2.49\ [\mathrm{b/symb}].$$
[Code trees corresponding to the Huffman code and to the Shannon-Fano code for this source]
(c) The Kraft inequality is satisfied (with equality):

$$\sum_{i=1}^{7} 2^{-l_i} = 2\cdot 2^{-2} + 3\cdot 2^{-3} + 2\cdot 2^{-4} = 1 \le 1.$$
Problem 3.5 Apply the Huffman procedure to the source defined by the table given below, if the symbols obtained by reduction are always put at the last place in the group of equally probable symbols, or if they are placed arbitrarily.
Fig. 3.12 Two possible code trees obtained by the Huffman procedure for this source: (a) and (b)
si s1 s2 s3 s4 s5 s6
P(si) 0.4 0.2 0.1 0.1 0.1 0.1
(a) Draw at least two code trees corresponding to obtained codes that have different topologies. Do the codes satisfy the Kraft inequality? Do these codes have the same average code word length? What are the differences?
(b) If the source emits the sequence s2, s1, s4, s1, s5, s3, s6 and the channel errors
occur at the first and eighth bit, find the decoded sequences in both cases.
Solution
In Fig. 3.12a, b two possible results of the Huffman procedure are shown. The obtained code words are given in Table 3.5. For both cases the average code word length is the same—L = 2.4 [b/symb]—but the maximum code word length differs (max(li) = 4 for the first code and max(li) = 3 for the second one), as well as the sum of code word lengths (sum(li) = 19 and sum(li) = 16).
The code word length variances also differ; they are found according to the formula

$$\mathrm{Var}^{(n)} = \frac{1}{q}\sum_{i=1}^{q}\left(l_i^{(n)} - L\right)^2.$$
For both topologies it is possible to form a number of different code word sets. There are five nodes in the tree, from each of which two branches go out. Therefore, 32 different compact codes can be formed for each tree topology. Codes corresponding to one topology have code words different from those of the other one, but their lengths are the same and their variances are the same.
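A small sketch (hypothetical helper; the code word lengths are read off from the encoded sequences in part (b) below) that computes the quantities just mentioned:

```python
def length_stats(lengths, probabilities):
    """Average length, maximum length, sum of lengths and length variance."""
    q = len(lengths)
    L = sum(p * l for p, l in zip(probabilities, lengths))
    var = sum((l - L) ** 2 for l in lengths) / q
    return L, max(lengths), sum(lengths), var

P = [0.4, 0.2, 0.1, 0.1, 0.1, 0.1]
code1 = [1, 2, 4, 4, 4, 4]   # lengths of the first code (larger variance)
code2 = [2, 2, 3, 3, 3, 3]   # lengths of the second code (smaller variance)
print(length_stats(code1, P))
print(length_stats(code2, P))
# both give L = 2.4 b/symb, but max(li), sum(li) and the variance differ
```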
(b) For the first code, when seven symbols were sent, eight symbols were received; one bit error usually causes error propagation to the next symbol as well.
s2 s1 s4 s1 s5 s3 s6
10 0 1101 0 1110 1100 1111
0 0 0 1101 1101 0 1100 1111
s1 s1 s1 s4 s4 s1 s3 s6
For the second code when seven symbols were sent, seven symbols were
received, and error propagation is not noticeable.
s2 s1 s4 s1 s5 s3 s6
10 00 111 00 010 110 011
00 00 111 00 110 110 011
s1 s1 s4 s1 s3 s3 s6
Although in the second case the code word length variance is smaller (the code word lengths are more uniform), it did not result in reducing the problems due to the bad encoder and decoder synchronization, usually caused by transmission errors. It is obvious that this effect always arises when the code words have different lengths. In this case the number of symbols in error, as well as the error structure, depends on the code word structure, and the consequences generally cannot be easily predicted.
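The synchronization effect described above can be reproduced with a short sketch (a simple bit-by-bit prefix decoder; how the erroneous bit positions are counted may differ slightly from the table above):

```python
def encode(symbols, codebook):
    return "".join(codebook[s] for s in symbols)

def decode(bits, codebook):
    """Decode a prefix code bit by bit."""
    inverse = {v: k for k, v in codebook.items()}
    decoded, word = [], ""
    for b in bits:
        word += b
        if word in inverse:
            decoded.append(inverse[word])
            word = ""
    return decoded

code1 = {"s1": "0", "s2": "10", "s3": "1100", "s4": "1101", "s5": "1110", "s6": "1111"}
sent = ["s2", "s1", "s4", "s1", "s5", "s3", "s6"]
bits = list(encode(sent, code1))
for pos in (0, 7):                         # flip two transmitted bits
    bits[pos] = "1" if bits[pos] == "0" else "0"
print(decode("".join(bits), code1))
# e.g. ['s1','s1','s1','s4','s6','s1','s3','s6'] -- eight symbols instead of seven
```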
Problem 3.6 Zero-memory source is defined by
si s1 s2 s3 s4 s5 s6 s7 s8 s9 s10
P(si) 0.18 0.2 0.14 0.15 0.1 0.11 0.05 0.04 0.02 0.01
Apply the Huffman procedure to obtain binary, ternary and quaternary code.
Find the corresponding efficiencies.
Solution
The procedure for obtaining an m-ary Huffman code consists of the following steps [1, 7]:
1. In the first step q0 symbols are grouped, where 2 ≤ q0 ≤ m and q0 is the smallest integer satisfying

$$\mathrm{rem}\left\{\frac{q - q_0}{m - 1}\right\} = 0.$$
Fig. 3.13 Huffman procedure for the binary code alphabet (code tree for this source)
$$H(S) = \sum_{i=1}^{10} P(s_i)\,\mathrm{ld}\frac{1}{P(s_i)} = 2.981\ [\mathrm{Sh/symb}],\qquad L^{(2)} = 3.02\ [\mathrm{b/symb}],\qquad \eta^{(2)} = 98.72\%.$$
The procedure for ternary code forming is shown in Fig. 3.14a. In this case q = 10 and, as q − q0 should be even, it is easy to find q0 = 2. The corresponding code words are given in Table 3.6. The average code word length and efficiency are

$$L^{(3)} = \sum_{i=1}^{10} P(s_i)\,l_i^{(3)} = 1.95\ [\mathrm{ter.\ symb/symb}],\qquad \eta^{(3)} = \frac{H(S)}{\mathrm{ld}(3)\,L^{(3)}}\cdot 100\% = 96.42\%.$$
The procedure for quaternary code forming is shown in Fig. 3.14b. In this case
m = 4, q0 = 4 yielding
Fig. 3.14 Huffman procedure for three (a) and four (b) symbols of the code alphabet
$$L^{(4)} = \sum_{i=1}^{10} P(s_i)\,l_i^{(4)} = 1.59\ [\mathrm{quat.\ symb/symb}],\qquad \eta^{(4)} = \frac{H(S)}{\mathrm{ld}(4)\,L^{(4)}}\cdot 100\% = 90.88\%.$$
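The rule for choosing q0 can be written in a few lines of code. A minimal sketch (function name assumed):

```python
def first_group_size(q, m):
    """Smallest q0 with 2 <= q0 <= m and (q - q0) divisible by (m - 1)."""
    for q0 in range(2, m + 1):
        if (q - q0) % (m - 1) == 0:
            return q0

for m in (2, 3, 4):
    print(m, first_group_size(10, m))
# m=2 -> 2, m=3 -> 2, m=4 -> 4, matching the ternary and quaternary cases above
```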
Problem 3.7 The source emits two symbols according to the table:
si s1 s2
P(si) 0.7 0.3
(a) By applying the Huffman procedure find the code for the source and for its
second and third extension as well. Find the efficiency and the compression
ratio for all codes and compare the results to the bounds given by the First
Shannon theorem.
Table 3.6 Code words of the binary, ternary and quaternary codes and their lengths

si      s1    s2   s3    s4    s5    s6    s7     s8      s9      s10
P(si)   0.18  0.2  0.14  0.15  0.1   0.11  0.05   0.04    0.02    0.01
x(2)i   100   00   111   101   011   010   1100   11010   110110  110111
l(2)i   3     2    3     3     3     3     4      5       6       6
x(3)i   BA    A    BC    BB    CB    CA    CCA    CCB     CCCA    CCCB
l(3)i   2     1    2     2     2     2     3      3       4       4
x(4)i   B     A    DA    C     DC    DB    DDA    DDB     DDC     DDD
l(4)i   1     1    2     1     2     2     3      3       3       3
(b) Draw the efficiency as a function of the extension order (n) for P(s1) = 0.05, P(s1) = 0.1, P(s1) = 0.2, P(s1) = 0.3 and P(s1) = 0.4, for the extension order 1 ≤ n ≤ 8.
(c) Find the dependence of the maximal compression ratio on the probability P(s1).
Solution
(a) Encoding of the original source is trivial—symbol s1 is encoded by binary zero and symbol s2 by binary one. The entropy and average code word length are H(S) = 0.8813 [Sh/symb] and L1 = 1 [b/symb], so the efficiency is η = 88.13%. According to the First Shannon theorem, encoding the nth extension yields

$$\lim_{n\to\infty}\frac{H(S^n)}{L_n} = 1.$$
Table 3.9 The efficiency as a function of extension order, for various probabilities P(s1)
n 1 2 3 4 5 6 7 8
P(s1) = 0.05 28.64 49.92 66.10 78.00 85.19 90.56 94.29 95.86
P(s1) = 0.1 46.90 72.71 88.05 95.22 97.67 99.75 98.87 98.57
P(s1) = 0.2 72.19 92.55 99.17 97.45 97.83 99.54 98.66 98.59
P(s1) = 0.3 88.13 97.38 96.99 98.82 99.13 99.22 99.64 99.48
P(s1) = 0.4 97.10 97.10 98.94 98.96 99.31 99.45 99.55 99.64
Fig. 3.15 The efficiency as a function of the extension order n, for various probabilities P(s1)
Note that the maximum possible compression ratio does not mean the highest achieved compression ratio; this will be additionally explained in the next part of the solution.
(c) The extensions are obtained from the original source, and the maximal compression ratio does not depend on the extension order, because the compression potential does not depend on it:

$$\rho_{\max}^{(n)} = \frac{\lceil \mathrm{ld}(2^n)\rceil}{H(S^n)} = \frac{n\,\lceil \mathrm{ld}(2)\rceil}{n\,H(S)} = \frac{1}{H(S)} = \rho_{\max}^{(1)}.$$

On the other hand, the achieved compression ratio grows with the extension order, and in the limit

$$\lim_{n\to\infty}\frac{1}{\rho^{(n)}} = \lim_{n\to\infty}\frac{L_n}{n} = H(S).$$
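The behaviour summarized in Table 3.9 can be reproduced numerically. A sketch (helper names assumed) that Huffman-encodes the nth extension of the binary memoryless source and evaluates η = nH(S)/Ln:

```python
from math import log2
from itertools import product
import heapq

def huffman_lengths(probs):
    """Code word lengths of a binary Huffman code for the given probabilities."""
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, g1 = heapq.heappop(heap)
        p2, _, g2 = heapq.heappop(heap)
        for i in g1 + g2:
            lengths[i] += 1          # one bit per merge for every member
        counter += 1
        heapq.heappush(heap, (p1 + p2, counter, g1 + g2))
    return lengths

def extension_efficiency(p1, n):
    H = -p1 * log2(p1) - (1 - p1) * log2(1 - p1)
    probs = [p1 ** seq.count(0) * (1 - p1) ** seq.count(1)
             for seq in product((0, 1), repeat=n)]
    lengths = huffman_lengths(probs)
    L_n = sum(p * l for p, l in zip(probs, lengths))
    return 100 * n * H / L_n

for n in range(1, 6):
    print(n, round(extension_efficiency(0.3, n), 2))
# approaches 100% as n grows (cf. Table 3.9, row P(s1) = 0.3)
```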
Problem 3.8 A first-order memory binary source is defined by the transition matrix

$$P = \begin{bmatrix} 1-p & p \\ p & 1-p \end{bmatrix}.$$
Fig. 3.16 The dependence of the maximum compression ratio on the binary source symbol probability P(s1)
(a) Find the corresponding binary Huffman codes for a source as well as for its
second and third extension if p = 0.01.
(b) Find the minimum value of the average code word length for the nth exten-
sion. Draw the efficiency and compression ratio dependence on the extension
order for p = 0.1, p = 0.01 and p = 0.001.
Solution
(a) In the solution of Problem 2.6 it was shown that the corresponding stationary binary symbol probabilities are P(0) = P(1) = 0.5. The Huffman code for such a source is trivial—symbol ‘0’ is coded by ‘0’ and ‘1’ by ‘1’. Obviously, the average code word length is L1 = H(S̄) = 1 [b/symb]. The source entropy is

$$H(S) = -p\,\mathrm{ld}\,p - (1-p)\,\mathrm{ld}(1-p) = 0.0808\ [\mathrm{Sh/symb}],$$

so that η = 8.08% and ρ = 1.
For the second extension

$$H(S^2) = \sum_{i=1}^{4} P(r_i)\,\mathrm{ld}\frac{1}{P(r_i)} = 1.0808\ [\mathrm{Sh/symb}],$$

$$L_2 = \sum_{i=1}^{4} P(r_i)\,l_i = 1.515\ [\mathrm{b/symb}],$$

the efficiency is η = 10.67%, and the compression ratio is ρ = 2/L2 = 1.32. The relation H(S²) < L2 is satisfied as well.
Table 3.10 Stationary probabilities of the second extension symbols and the corresponding code
words
Combined symbols Probabilities Code words li
r1 = ‘00’ P(00) = P(0)P(0/0) = 0.495 0 1
r2 = ‘01’ P(01) = P(0)P(1/0) = 0.005 110 3
r3 = ‘10’ P(10) = P(1)P(0/1) = 0.005 111 3
r4 = ‘11’ P(11) = P(1)P(1/1) = 0.495 10 2
The probabilities of the third extension combined symbols and the corresponding code words are given in Table 3.11. In this case, it is easy to verify

$$H(S^3) = \sum_{i=1}^{8} P(r_i)\,\mathrm{ld}\frac{1}{P(r_i)} = 1.1616\ [\mathrm{Sh/symb}],$$

$$L_3 = \sum_{i=1}^{8} P(r_i)\,l_i = 1.549\ [\mathrm{b/symb}],$$

$$L_n \ge H(S^n).$$
Table 3.11 Stationary probabilities of the third extension symbols and the corresponding code words

Combined symbol  Conditional probability              Probability                              Code word  li
r1 = ‘000’       P(00/0) = P(0/0)P(0/0) = 0.9801      P(000) = P(0)P(00/0) = 4.9005·10^-1      0          1
r2 = ‘001’       P(01/0) = P(1/0)P(0/0) = 0.0099      P(001) = P(0)P(01/0) = 4.95·10^-3        1111       4
r3 = ‘010’       P(10/0) = P(0/1)P(1/0) = 0.0001      P(010) = P(0)P(10/0) = 5·10^-5           111001     6
r4 = ‘011’       P(11/0) = P(1/1)P(1/0) = 0.0099      P(011) = P(0)P(11/0) = 4.95·10^-3        1100       4
r5 = ‘100’       P(00/1) = P(0/0)P(0/1) = 0.0099      P(100) = P(1)P(00/1) = 4.95·10^-3        1101       4
r6 = ‘101’       P(01/1) = P(1/0)P(0/1) = 0.0001      P(101) = P(1)P(01/1) = 5·10^-5           111000     6
r7 = ‘110’       P(10/1) = P(0/1)P(1/1) = 0.0099      P(110) = P(1)P(10/1) = 4.95·10^-3        11101      5
r8 = ‘111’       P(11/1) = P(1/1)P(1/1) = 0.9801      P(111) = P(1)P(11/1) = 4.9005·10^-1      10         2
Further, the effective efficiency of the Huffman code for sources with memory can be defined as

$$\eta_{ef}^{(n)} = \frac{H(S^n)}{L_n}\cdot 100\%.$$

It is clear that with the increase of the extension order the effective efficiency rapidly converges to its maximum value, even when there is a significant difference between the symbol probabilities, as is the case here. More critical is how the maximum possible efficiency and the maximum possible compression ratio change. The corresponding formulas are given below.
Using the known relations for entropy of nth extension and the source adjoined
to it, as well as the fact that the original source is binary, the following expressions
are obtained (and the dependences shown in Figs. 3.17 and 3.18)
Fig. 3.17 The maximum efficiency that can be achieved by Huffman encoding, for a first-order memory source with parameter p
Fig. 3.18 The maximum compression ratio that can be achieved by Huffman encoding, for a first-order memory source with parameter p
$$\eta_{\max}^{(n)} = \frac{n\,H(S)}{(n-1)H(S) + H(\bar{S})}\cdot 100\%,\qquad \rho_{\max}^{(n)} = \frac{n}{(n-1)H(S) + H(\bar{S})}.$$
For higher p values the maximum code efficiency can be achieved with a small extension order, but the corresponding maximum value of the compression ratio is significantly smaller. Sources with memory where the parameter p has small values have a great compression potential. For p = 0.001, the 50th extension provides a compression ratio of the order of 30. This means that 30 equiprobable bits emitted by the source can be, on average, represented by only one bit at the encoder output. However, in this case the Huffman procedure should be applied to 2^50 symbols of the source alphabet!
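The two expressions above are easily evaluated numerically. A small sketch (assuming H(S̄) = 1 Sh/symb for the adjoint source of equiprobable bits):

```python
from math import log2

def markov_limits(p, n):
    """Maximum efficiency [%] and compression ratio for the nth extension."""
    H_cond = -p * log2(p) - (1 - p) * log2(1 - p)   # H(S), conditional entropy
    H_adj = 1.0                                     # H(S_bar), equiprobable bits
    denom = (n - 1) * H_cond + H_adj
    return 100 * n * H_cond / denom, n / denom

for n in (1, 10, 50, 200):
    print(n, [round(x, 2) for x in markov_limits(0.001, n)])
# for p = 0.001 the 50th extension already allows a compression ratio above 30
```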
Problem 3.9 Discrete source for which the number of symbols and its probabilities
are not known in advance, emits the sequence
aabcccbd. . .
Solution
In the adaptive Huffman algorithm the encoder and decoder use a dynamic change of the code tree structure. The procedure of tree rearranging after the reception of every new symbol at the encoder input depends on the applied variant of the adaptive Huffman algorithm, but the code tree at every moment must have the sibling property. The algorithm is based on the fact that a binary code is a Huffman code if and only if the code tree has the sibling property. It practically means that at any moment (for any length of the incoming sequence) the code tree obtained by applying the adaptive Huffman algorithm could be obtained by applying the static Huffman algorithm as well. Therefore, the code is always compact, although the code tree structure changes in time.
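The sibling property itself is easy to check once the node weights are listed level by level. A simplified sketch (it only checks the non-decreasing weight condition described in Problem 3.3; a complete FGK implementation also maintains the node numbering and performs the node swaps):

```python
def has_sibling_property(levels):
    """levels: lists of node weights, from the deepest level up to the root;
    the concatenated weight sequence must be non-decreasing."""
    sequence = [w for level in levels for w in level]
    return all(a <= b for a, b in zip(sequence, sequence[1:]))

# Final FGK tree from this problem (weights per level, deepest level first).
fgk_levels = [[0, 1], [1, 2], [2, 3], [3, 5], [8]]
print(has_sibling_property(fgk_levels))   # True
```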
(a) The basis of FGK algorithm was independently found by Newton Faller
(1973) [13] and Robert Gallager (1978) [14], while Donald Knuth (1985)
[15] added improvements into the original algorithm, and the name of the
algorithm is derived from the family initials of the authors.
The procedure of the code forming for the input sequence aabcccbd is exposed
in what follows
1. At the beginning the Huffman tree is degenerated and consists of the zero
node only (knot 0) as shown in Fig. 3.19a. At the encoder input is symbol
a and it is easy to verify that this symbol is not on the tree. Therefore, the
encoder emits code word corresponding to the knot 0 (being 0 as well),
emitting simultaneously the corresponding ASCII code for the symbol—
ASCII(a). As shown in Fig. 3.19b, now a new tree is formed by division of
the node 0 into two nodes—0 and a. To node 0 the code word 0 is adjoined
and the number of previous appearances of this symbol corresponds to its
weight being always w(0) = 0 (this symbol does not appear at the input
sequence, but it is added artificially, for the easier algorithm realization).
To the node corresponding to symbol a code word 1 is adjoined and the
corresponding node weight is w(a) = 1 (in the figures, the weight is always
in the rectangle, near the node).
2. Once more into encoder enters a, already existing on the tree. The encoder
emits code corresponding to the knot a—it is now the code word 1. Tree
structure does not change because the entering element already exists on
the tree. Because two symbols a have been entered, the node weight
Fig. 3.19 Tree structure at the beginning (a) and after entering of the first symbol (b)
Fig. 3.20 Tree structure after entering of the second (a) and of the third (b) symbol
increments, now it is w(a) = 2, and to this node at the new tree code 1 is
adjoined, as shown in Fig. 3.20a.
3. Into encoder enters b, not existing yet at the tree. The encoder emits the
code corresponding to the node 0 (the code being 0) and after that ASCII
(b). New tree is formed by separating the node 0 into two nodes—0 and b,
as shown in Fig. 3.20b. To the node 0 the code 00 is adjoined with the
weight w(0) = 0 while the code 01 is adjoined to the node b with the
weight w(b) = 1.
4. Now c enters the encoder, not existing yet on the tree. The encoder emits the code corresponding to the node 0, now being 00, and after that ASCII(c). A new tree is formed by separating the node 0 into two nodes—0 and c. To the node 0 the code 000 is adjoined (it is obvious that the code for this node will always be the all-zeros sequence, with the weight w(0) = 0), and to the node c the code 001 is adjoined, with the weight w(c) = 1, as shown in Fig. 3.21a.
Fig. 3.21 Tree structure after entering of the fourth (a) and of the fifth (b) symbol
5. Into encoder now enters c, already existing on the tree. The encoder emits
code corresponding to the node c being now 001. The weight of node
corresponding to symbol c is incremented and a new tree does not have the
sibling property. Because of that, symbols b and c change the places and a
new tree is obtained (the tree topology does not change). The rule for this
reordering is that the node whose weight increased by incrementation
(w(c) = 2) change the place with the node which has a smaller weight than
it, and from all the nodes that have this property, the one is chosen which is
at the lowest hierarchical level. Just because of that, the node c goes up for
one level towards the root. It should be noted that this reordering is logical,
because the same result would be obtained for static Huffman algorithm
with the same symbol probabilities (i.e. P(a) = P(c) = 0.4 and P(b) = 0.2),
with the exception that here formally exists zero node always being at the
lowest hierarchical level. New tree is shown in Fig. 3.21b (the first level is
drawn in a slightly different way to emphasize that the sibling property is
satisfied).
6. Now once more enters the symbol c. At the encoder output the code
corresponding to the node c is emitted being now 01. Because the weight
incrementation yields now the weight w(c) = 3, the node c change the
place with the node a going up one level. New tree is shown in Fig. 3.22a.
7. Into encoder now enters once more the symbol b. At the encoder output the
code corresponding to the knot b is emitted which is now 001. Because
totally two symbols b have been entered so far, the weight increments is
now w(b) = 2, and to this node at a new tree code 001 is adjoined, as
shown in Fig. 3.22b. Tree structure in this step does not change, because
the tree with a new weights satisfies the sibling property.
8. Symbol d enters the encoder, not existing yet on the tree. The encoder emits the code corresponding to the node 0, now being 000, and after that ASCII(d). A new tree is formed by separating the node 0 into two nodes—0 and d. It is obvious that the weights, observed from left to right starting at the lowest level and going to the upper levels, do not decrease. The tree satisfies the sibling property and further reordering is not needed. The final tree structure for the given sequence is shown in Fig. 3.23.

Fig. 3.22 Tree structure after entering of the sixth (a) and of the seventh (b) symbol

Fig. 3.23 The final tree structure for the input sequence aabcccbd
The emitted bit sequence for the input sequence aabcccbd is

0, ASCII(a); 1; 0, ASCII(b); 00, ASCII(c); 001; 01; 001; 000, ASCII(d)

where the sign “,” separates the outputs emitted for one input symbol, while “;” separates adjacent input symbols. It should be noted that the same input sequence can be encoded using the static Huffman code as well (corresponding to the tree from Fig. 3.18, when there is no zero node and the symbol d is represented by 000), but in that case, for correct decoding, the decoder must receive in advance ASCII(a), ASCII(b), ASCII(c) and ASCII(d), enabling it to “know” the correspondence between the symbols and the code words.
Therefore, for the FGK algorithm 16 bits and four ASCII codes for the letters are transmitted in real time, as the symbols enter the encoder. For the static Huffman algorithm, 14 bits are transmitted, but four ASCII codes for the letters had to be transmitted in advance. The reader should try to explain why, in this case, the static Huffman algorithm yields better results (is it only due to the absence of the zero node?). Of course, the great advantage of the adaptive Huffman algorithm is that, for the encoder to function, the input symbol probabilities need not be known in advance.
(b) If the encoder output is connected directly to the decoder input, error-free transmission is supposed. In this case the decoder output should be found, if its input is
sibling property and symbols a and c change the places giving the tree
shown in Fig. 3.22a.
7. In this step the received sequence is 001. The first and the second bit are
not sufficient for symbol location and the third one (the complete sequence)
points to the symbol b whose weight is incremented to w(b) = 2. The tree
satisfies the sibling property and does not change the structure, as shown in
Fig. 3.22b.
8. In this step the received sequence is 000, ASCII(d). The code 000 points to
symbol 0, resulting in the adding of a new symbol (node 0 is separated into
0 and d), having the weight w(d) = 1 with the corresponding code word
0001. The tree is shown in Fig. 3.23, satisfies the sibling property and
further reordering is not needed.
(c) The second variant of adaptive Huffman algorithm was proposed by Jeffrey
Vitter (1987) [16]. The basic modification is that during the tree reordering
symbols nodes are put at as low as possible hierarchical levels. If the two
nodes have the same weight, being at the different levels, then during
reordering the goal is that a node corresponding to the end symbol has to be
put on the lower level. In this procedure the numeration of nodes is introduced
from the left to the right and from the lowest level up (underlined numbers in
Fig. 3.24). It is not allowed that from the two notes having the same weight,
one corresponding to the one symbol and the other being combined, the
symbol node has the upper ordinal number. If it happens, these two nodes
should change the positions at the tree (the combined node “carries” its suc-
cessors as well).
Fig. 3.24 Code trees obtained by the FGK procedure (a) and by the Vitter procedure (b)
For the input sequence in this problem (aabcccbd) the obtained trees are identical to those of the FGK procedure for the first seven steps, because the above condition is satisfied. However, the tree for the eighth step differs, as shown in Fig. 3.24. It is obvious that in the tree obtained by the FGK procedure (Fig. 3.24a) the symbol node (No. 6) and the combined node (No. 7) have the same weight, but the symbol node is at the lower level. If these nodes change places (the combined node carrying its successors), the tree shown in Fig. 3.24b is obtained as a result of the Vitter procedure.
In the previous problems it was mentioned that the number of levels in a code tree corresponds to the maximum code word length, denoted by max(li), having here the value 4 for FGK and 3 for the Vitter procedure. The sum of the lengths of all code words, sum(li), is here 14 for FGK and 12 for the Vitter procedure. Therefore, the Vitter procedure provides smaller values for both numerical characteristics compared to FGK. The Vitter procedure can easily be extended to the case when both nodes to be exchanged are combined ones; then the node having a greater value of max(li) or sum(li) should be put at the higher level (nearer to the root). In such a way, the code word lengths are equalized (the variance defined in Problem 3.5 is smaller) and the time for which the code word lengths attain the maximum value defined by the corresponding format and implementation is longer. If 16 bits are used for the word lengths, an overflow arises for node weights greater than 4095. This problem is easily solved for the weights (when the root weight becomes 4095, the symbol node weights are divided by two), but the problem of the maximum code word length remains for very large trees, and there the Vitter procedure has an important application.
Problem 3.10 The sequence from a binary symmetric source with memory enters a Lempel-Ziv encoder which forms a dictionary of length N = 8.
(a) For the relation of source parameters P(0/1) ≫ P(0/0) and for the emitted sequence
01010101010101010010110100101010101011010. . .
explain the principle of dictionary forming and find the achieved compression ratio. What are the minimum and maximum values of the compression ratio in this case?
(b) If errors occur at the first and at the tenth bit of the encoded sequence, find the decoded sequence.
(c) Repeat the previous for the case P(0/1) ≪ P(0/0), if the sequence emitted by the source is
00000000000000000000000000000000011111111. . .
(d) Find the compression ratio for the cases P(0/1) = 1 and P(0/1) = 0, when the dictionary length is N and the sequence is very long. How can the dependence of the compression ratio on the dictionary length be estimated for a first-order memory binary source?
Solution
(a) Dictionary forming for the Lempel-Ziv code [17, 18], including the Welch modification [19], can be described by the program in pseudo language given in the left column of Table 3.12. The encoder functioning can be described by the input and output sequences, where the underlined part corresponds to the input bits used for dictionary forming, i.e. to the addresses emitted at the encoder output during the dictionary forming.
Every address is denoted by three bits and the obtained compression ratio equals the quotient of the numbers of input and output bits, i.e.

$$\rho = \frac{N_{ul}}{N_{izl}} = \frac{13 + 28}{18 + 21} = 1.0513,$$

where the first addends in the numerator and the denominator correspond to the number of bits at the input and the output during the dictionary forming (the transient regime), while the second addends describe the encoder functioning in the stationary regime.
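The dictionary forming itself can be sketched as a short LZW-style encoder (dictionary of N = 8 entries, three-bit addresses). The variant below is an assumption—the pseudocode in Table 3.12 may differ in details such as address numbering and phrase boundaries, so the emitted addresses and the exact ratio need not coincide with those listed in this solution:

```python
def lzw_encode(bits, dict_size=8):
    """LZW-style encoder over the binary alphabet with a bounded dictionary."""
    dictionary = {"0": 0, "1": 1}
    addresses, current = [], ""
    for b in bits:
        if current + b in dictionary:
            current += b                 # extend the current phrase
        else:
            addresses.append(dictionary[current])
            if len(dictionary) < dict_size:
                dictionary[current + b] = len(dictionary)   # new dictionary entry
            current = b
    if current:
        addresses.append(dictionary[current])
    return addresses, dictionary

seq = "01010101010101010010110100101010101011010"
addresses, dictionary = lzw_encode(seq)
print(dictionary)                       # 8 entries, the longest of length 4
print(addresses)
print(len(seq) / (3 * len(addresses)))  # ratio close to 1 for such a short sequence
```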
The minimum value of the compression ratio corresponds to the transient regime alone,

$$\rho_{\min} = \frac{13}{18} = 0.7222,$$

while the maximum value is achieved if the sequence to be encoded is very long and the transient regime can be neglected, and the sequence has the property that the maximum number of input symbols (four in this case) can be represented by one address (three binary symbols), yielding

$$\rho_{\max} = \frac{4}{3} = 1.3333.$$
(b) In this case the decoder, on the basis of the known dictionary and the decoder input (denoted by in), estimates the sequence that entered the encoder, denoted by k′ (as given in Table 3.12). The encoder output sequence is

out = 0 1 2 4 3 6 5 5 7 5 5 5 7
    = 000 001 010 100 011 110 101 101 111 101 101 101 111.

If the errors occurred at the first and the tenth bit (Fig. 3.25), the received sequence is

in = 100 001 010 000 011 110 101 101 111 101 101 101 111
   = 4 1 2 0 3 6 5 5 7 5 5 5 7.
Although error propagation in the decoded sequence does not exist, it is possible that a smaller or greater number of bits occurs in the decoded sequence, because different addresses correspond to code words having different lengths, and the estimate of the sequence at the encoder input changes accordingly.
Fig. 3.25 LZ encoder (dictionary created by using the input sequence) and LZ decoder (dictionary known)
(c) For the new sequence, the dictionary forming is given in Table 3.13. All symbol groups inscribed in the dictionary are combinations of binary zeros only. The problem arises when, in the stationary regime, the first binary one appears—the encoder in this case cannot find the corresponding word and the encoding becomes impossible.
This example illustrates the fact that the sequence used for dictionary forming has to be representative. It is desirable that during the transient regime the sequence at the encoder input has the same statistical properties as the rest of the sequence. For stationary signals it is always possible, although the duration of the transient regime can be relatively long, and this duration determines the needed dictionary dimension. For practical purposes (e.g. for compression of text corresponding to spoken language) a dictionary dimension of 1024 or 2048 addresses is usually chosen (every symbol group is represented by 10 or 11 bits).
(d) If P(0/1) = 1 and if the source is symmetric, then P(1/0) = 1 and the combinations 00 and 11 will never appear in the encoder input sequence. It is obvious that in this case only two sequences of length three (010 and 101), two sequences of length four (1010 and 0101) and, generally, two sequences of any length n will appear. Therefore, in the dictionary with N = 2^n positions only the pairs of sequences having the lengths m = 1, 2, …, N/2 will be kept.
In the stationary regime only the sequences of the length N/2 will be separated, and each one is encoded using n = ld(N) bits, sufficient for a binary record of every decimal address. For this optimal case the compression ratio is

$$\rho_{opt}^{(1)}(N) = \frac{N}{2\,\mathrm{ld}(N)}.$$
For the case of a binary symmetric source with memory where P(0/1) = P(1/0) = 0, the sequence consists of zeros only or of ones only (depending on the starting state) and the compression ratio is
Fig. 3.26 The limits of the compression ratio for the LZ code and for some first-order memory sources (P(1/0) = 0.005, 0.01, 0.1): one bound is determined by the dictionary length, the other by the source parameters
$$\rho_{opt}^{(2)}(N) = \frac{N}{\mathrm{ld}(N)}.$$
Of course, the expressions are valid if the sequence is sufficiently long and the duration of the transient period can be neglected. The dependence of the compression ratio on the number of bits representing the address in the dictionary is shown in Fig. 3.26.
It is clear that with the increase of the dictionary length the compression ratio tends to infinity. It is understandable, because the entropy in this case equals zero, and for such a binary source

$$\rho_{\max} = \frac{1}{H(S)} \to \infty.$$

Of course, if the sequence is not a deterministic one, but has a long memory, the compression ratio must satisfy the condition

$$\lim_{N\to\infty}\rho(N) \le \frac{\lceil \mathrm{ld}(q)\rceil}{H(S)},$$
yielding

$$\lim_{N\to\infty}\rho(N) = \rho_{\max} = \frac{1}{-P(1/0)\,\mathrm{ld}\,P(1/0) - (1 - P(1/0))\,\mathrm{ld}(1 - P(1/0))},$$

and the corresponding upper bound of the compression ratio is shown in Fig. 3.26 by the dashed line, for various values of P(1/0).
It should also be noticed that these lines are the limiting cases only, and that the real compression ratio can substantially differ from these values, but it must be below them:

$$\rho(N) < \min\left\{\frac{N}{2\,\mathrm{ld}(N)},\ \frac{1}{-P(1/0)\,\mathrm{ld}\,P(1/0) - (1-P(1/0))\,\mathrm{ld}(1-P(1/0))}\right\}.$$
The deviation will be greater as the memory is shorter, but some useful conclusions can be drawn from the diagram as well. The compression ratio for a first-order memory binary source where P(1/0) = 0.01 can in no case be greater than 12.38, while using an LZ coder with a dictionary length of 128 positions (seven-bit addresses), even in the ideal case (with an infinite source extension), a compression ratio greater than 9.14 cannot be achieved, because the address length is a characteristic of the encoder itself.
One more example can be considered. It is known that printed English text can be approximately described as a source with memory of the 30th order, emitting 26 letters. The entropy of this source is estimated as H(S) = 1.30 [Sh/symb], while ld(q) = 4.70, yielding ⌈ld(q)⌉ = 5 and a maximum compression ratio of approximately ρmax = 3.85. From the figure one could conclude that for the corresponding compression a dictionary of only a few tens of positions would suffice. This is not the case—the compression ratio obtained by an LZ coder increases substantially more slowly than shown in Fig. 3.26, and the needed dictionary length is at least N = 1024.
Chapter 4
Information Channels
The information channels, as well as the sources, generally are discrete or con-
tinuous. Physical communication channel is, by its nature, a continuous one. Using
such an approach the noise and other interferences (other signals, fading etc.) can be
directly taken into account. However, it can be simplified introducing the notion of
a discrete channel (Fig. 1.1), incorporating the signal generation (modulation),
continuous channel and signal detection (demodulation). Some channel can be of a
mixed type. The input can be discrete, as for the case of digital transmission, while
at the receiver, the decision is made on the basis of continuous amplitude range, i.e.
the output is continuous.
The discrete channel is described by the input alphabet (xi 2 X{x1, x2, …,
xi, …, xr}), the output alphabet (yj 2 Y{y1, y2, …, yj, …, ys}) and a set of con-
ditional probabilities P(yj/xi), according to Fig. 4.1. These sets are finite (generally,
they may be countable as well). This channel is a zero-memory (memoryless)
channel, because the conditional probabilities depend on the current symbol only.
A good example for such a channel is that one where there is no intersymbol
interference and where only the white Gaussian noise is present. In the channel with
intersymbol interference, the probability of the received symbols depends as well
and on the adjacent symbols and in such way a memory is introduced into the
channel. Therefore, here P(yj/xi) is a conditional probability that at the channel
output symbol yj will appear, if the symbol xi is sent (emitted). These probabilities
can be arranged as the transition (channel) matrix completely describing the
channel (Problems 4.1–4.3)
Fig. 4.1 Discrete channel: input alphabet X{xi}, output alphabet Y{yj}, transition probabilities {P(yj/xi)}
$$P = \left[P_{ij}\right] = \begin{bmatrix} P_{11} & P_{12} & \ldots & P_{1s} \\ P_{21} & P_{22} & \ldots & P_{2s} \\ \vdots & \vdots & \ddots & \vdots \\ P_{r1} & P_{r2} & \ldots & P_{rs} \end{bmatrix},$$
where Pij = P(yj/xi). Index i corresponds to the ith row (i.e. the input symbol) and index j corresponds to the jth column (i.e. the output symbol).
Obviously,
$$\sum_{j=1}^{s} P_{ij} = 1 \quad (i = 1, 2, \ldots, r),$$
i.e. matrix P must be stochastic, because at the receiving end the decision must be
made—which symbol yj is received, after symbol xi was sent. Some authors use Pij
to denote the conditional probability that symbol yi is received, after symbol xj was
sent. In this case, the sum of elements in every column must be equal 1.
A very simple discrete channel is binary channel (BC) (Problems 4.1–4.3)
where the transition matrix is
$$P_{BC} = \begin{bmatrix} v_1 & p_1 \\ p_2 & v_2 \end{bmatrix},$$
Fig. 4.2 Graph corresponding to a binary channel (a) and an equivalent description using the error sequence (b)
The output sequence can be obtained by combining the input sequence and the error sequence (e) using an XOR gate (Fig. 4.2b). In the error sequence binary ones are at the positions where the errors occurred, complementing the transmitted bit.
A special case of the binary channel is the binary symmetric channel (BSC) (Problems 4.4–4.6), where the transition matrix is

$$P_{BSC} = \begin{bmatrix} v & p \\ p & v \end{bmatrix} \quad (p + v = 1).$$
The notion “symmetry” here means that the probabilities of transition from 0 to 1 and from 1 to 0 are the same. It is the simplest channel, described by only one parameter p (or v), corresponding at the same time to the probability of error (Pe) (crossover probability). In this case Pe does not depend on the probabilities of the input symbols. For the general binary channel the probability of error is

$$P_e = P(x_1)p_1 + P(x_2)p_2.$$
For a cascade of BSCs the equivalent channel is a BSC as well. Its matrix is obtained by multiplication of the corresponding transition matrices. Using induction, it can easily be proved that in the general case the matrix of the equivalent channel is obtained by multiplication of the matrices of the channels in cascade. For n identical channels in cascade the equivalent matrix is the nth power of the channel matrix.
One may ask why r ≠ s? The answer is that in the general case such channels are encountered in practice. For example, the BSC can be modified, if it is decided that the transmission is unsuccessful, by using the “erasure” of the bit instead of deciding that the other symbol (bit) was received (an error!). Such a channel is called the binary erasure channel (BEC) (Problem 4.8). The corresponding transition matrix is

$$P_{BEC} = \begin{bmatrix} v & p & 0 \\ 0 & p & v \end{bmatrix} \quad (p + v = 1).$$
$$P(y_j) = \sum_{i=1}^{r} P(x_i)\,P_{ij} \quad (j = 1, 2, \ldots, s).$$
Therefore, the output probabilities can be found if the input and transition
probabilities are known. Of course, the following must hold
$$\sum_{i=1}^{r} P(x_i) = 1, \qquad \sum_{j=1}^{s} P(y_j) = 1.$$
Of course, P(xi) are a priori input probabilities. P(yj/xi) can be called a poste-
riori output probabilities and P(yj)—a priori output probabilities. Generally, on the
basis of transition and output probabilities sometimes it is not possible to calculate
input probabilities.
Consider the next (trivial) example. A BSC is described by p = v = 0.5. Let P(x1) = a and P(x2) = b (a + b = 1). It is easy to calculate that P(y1) = P(y2) = 0.5 regardless of the input probabilities, so the input probabilities cannot be recovered from the output ones.
The a posteriori input probabilities are obtained from the joint probability of the events xi and yj (Bayes’ rule)

$$P(x_i/y_j) = \frac{P(x_i, y_j)}{P(y_j)} = \frac{P(x_i)P(y_j/x_i)}{P(y_j)},$$

and the following holds

$$\sum_{i=1}^{r} P(x_i/y_j) = 1, \qquad \sum_{i=1}^{r}\sum_{j=1}^{s} P(x_i, y_j) = 1.$$
Consider a binary channel with the transition probabilities P(y1/x1) = 2/3, P(y2/x1) = 1/3, P(y1/x2) = 1/10, P(y2/x2) = 9/10. Find all unknown probabilities if the input (a priori) probabilities are P(x1) = 3/4 and P(x2) = 1/4.
The output probabilities are

$$P(y_1) = P(x_1)P(y_1/x_1) + P(x_2)P(y_1/x_2) = \frac{3}{4}\cdot\frac{2}{3} + \frac{1}{4}\cdot\frac{1}{10} = \frac{21}{40},$$

$$P(y_2) = P(x_1)P(y_2/x_1) + P(x_2)P(y_2/x_2) = \frac{3}{4}\cdot\frac{1}{3} + \frac{1}{4}\cdot\frac{9}{10} = \frac{19}{40}.$$

From Bayes’ rule, P(x1/y1) = P(x1)P(y1/x1)/P(y1) = (1/2)/(21/40) = 20/21, and

$$P(x_1/y_1) + P(x_2/y_1) = 1, \qquad P(x_2/y_1) = 1 - P(x_1/y_1) = 1 - \frac{20}{21} = \frac{1}{21}.$$
Analogously

$$P(x_1/y_2) = 1 - P(x_2/y_2) = 1 - \frac{9}{19} = \frac{10}{19}.$$
The joint probability can be found in two ways, with the same result:

$$P(x_1, y_1) = P(y_1)P(x_1/y_1) = \frac{21}{40}\cdot\frac{20}{21} = \frac{1}{2}, \qquad P(x_1, y_1) = P(x_1)P(y_1/x_1) = \frac{3}{4}\cdot\frac{2}{3} = \frac{1}{2}.$$
The input entropy and the partial a posteriori entropies are defined as

$$H(X) = \sum_{i=1}^{r} P(x_i)\,\mathrm{ld}\frac{1}{P(x_i)}, \qquad H(X/y_j) = \sum_{i=1}^{r} P(x_i/y_j)\,\mathrm{ld}\frac{1}{P(x_i/y_j)}.$$
For the considered example

$$H(X) = \frac{3}{4}\,\mathrm{ld}\frac{4}{3} + \frac{1}{4}\,\mathrm{ld}\,4 = 0.811\ \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right],$$

$$H(X/y_1) = \frac{20}{21}\,\mathrm{ld}\frac{21}{20} + \frac{1}{21}\,\mathrm{ld}\,21 = 0.276\ \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right],$$

$$H(X/y_2) = \frac{9}{19}\,\mathrm{ld}\frac{19}{9} + \frac{10}{19}\,\mathrm{ld}\frac{19}{10} = 0.998\ \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right].$$
The conclusion is that this uncertainty can increase after the reception of some specific symbol (here, after receiving y2).
However, information theory considers average values and it is natural to average the partial a posteriori entropy over all output symbols to obtain the a posteriori entropy

$$H(X/Y) = \sum_{j=1}^{s} P(y_j)\,H(X/y_j) = \sum_{j=1}^{s}\sum_{i=1}^{r} P(y_j)P(x_i/y_j)\,\mathrm{ld}\frac{1}{P(x_i/y_j)} = \sum_{i=1}^{r}\sum_{j=1}^{s} P(x_i, y_j)\,\mathrm{ld}\frac{1}{P(x_i/y_j)}.$$
In this example

$$H(X/Y) = P(y_1)H(X/y_1) + P(y_2)H(X/y_2) = 0.617\ \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right].$$
The mutual information is defined as the difference I(X,Y) = H(X) − H(X/Y), i.e.

$$I(X,Y) = \sum_{i=1}^{r}\sum_{j=1}^{s} P(x_i)P(y_j/x_i)\,\mathrm{ld}\frac{1}{P(x_i)} - \sum_{i=1}^{r}\sum_{j=1}^{s} P(x_i,y_j)\,\mathrm{ld}\frac{1}{P(x_i/y_j)} = \sum_{i=1}^{r}\sum_{j=1}^{s} P(x_i,y_j)\,\mathrm{ld}\frac{P(x_i/y_j)}{P(x_i)} = \sum_{i=1}^{r}\sum_{j=1}^{s} P(x_i,y_j)\,\mathrm{ld}\frac{P(x_i,y_j)}{P(x_i)P(y_j)},$$
where the last row is obtained using Bayes’ rule. It should be noted that I(X, Y) is
symmetric with regard to X and Y (I(X, Y) = I(Y, X)—mutual information!).
For the above example

$$I(X,Y) = H(X) - H(X/Y) = 0.811 - 0.617 = 0.194\ \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right].$$
This conclusion verifies once more that the approach to defining the quantity of information is intuitively correct. When the input and output symbols are statistically independent in pairs, i.e. when any received symbol does not depend on the emitted one, there is no transmission of information at all.
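The numbers of this example are easy to verify. A short sketch (variable names assumed) computing the output probabilities, the a posteriori entropy and the mutual information:

```python
from math import log2

Px = [3/4, 1/4]
Pyx = [[2/3, 1/3],      # P(y1/x1), P(y2/x1)
       [1/10, 9/10]]    # P(y1/x2), P(y2/x2)

Py = [sum(Px[i] * Pyx[i][j] for i in range(2)) for j in range(2)]
Pxy = [[Px[i] * Pyx[i][j] for j in range(2)] for i in range(2)]

HX = sum(p * log2(1 / p) for p in Px)
HXY = sum(Pxy[i][j] * log2(Py[j] / Pxy[i][j]) for i in range(2) for j in range(2))
print([round(p, 4) for p in Py])                  # [0.525, 0.475] = 21/40, 19/40
print(round(HX, 3), round(HXY, 3), round(HX - HXY, 3))
# close to the values 0.811, 0.617 and 0.194 above (small differences due to rounding)
```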
As said earlier, the entropy of the finite discrete source is limited. It can easily be shown that

$$I(X,Y) \ge 0,$$

where the equality sign corresponds only to the case of independent symbols. On the other hand, the mutual information is maximal if the a posteriori entropy equals zero, because then all emitted information is transmitted. Therefore,

$$0 \le I(X,Y) \le H(X).$$
Besides the above defined entropies, three more can be defined. The joint entropy of X and Y

$$H(X,Y) = \sum_{i=1}^{r}\sum_{j=1}^{s} P(x_i,y_j)\,\mathrm{ld}\frac{1}{P(x_i,y_j)},$$

the output entropy

$$H(Y) = \sum_{j=1}^{s} P(y_j)\,\mathrm{ld}\frac{1}{P(y_j)},$$

and the conditional entropy of the output given the input

$$H(Y/X) = \sum_{i=1}^{r}\sum_{j=1}^{s} P(x_i,y_j)\,\mathrm{ld}\frac{1}{P(y_j/x_i)}.$$
Using the above mentioned symmetry and some mathematical manipulation the following expressions can be obtained

$$H(X,Y) = H(X) + H(Y/X) = H(Y) + H(X/Y), \qquad I(X,Y) = H(Y) - H(Y/X).$$

For the BSC (with input probabilities P(x1) = a, P(x2) = b) the mutual information is
$$I(X,Y) = H(Y) - H(Y/X) = H(Y) - \sum_{i=1}^{2} P(x_i)\sum_{j=1}^{2} P(y_j/x_i)\,\mathrm{ld}\frac{1}{P(y_j/x_i)} = H(Y) - \sum_{i=1}^{2} P(x_i)\left(p\,\mathrm{ld}\frac{1}{p} + v\,\mathrm{ld}\frac{1}{v}\right) = H(Y) - (a+b)\left(p\,\mathrm{ld}\frac{1}{p} + v\,\mathrm{ld}\frac{1}{v}\right) = H(Y) - \left(p\,\mathrm{ld}\frac{1}{p} + v\,\mathrm{ld}\frac{1}{v}\right).$$
From

$$p \le ap + bv \le v \quad \text{for } p \le 0.5\ (v \ge 0.5),$$

it follows that H(Y) ≥ p ld(1/p) + v ld(1/v), and therefore I(X,Y) ≥ 0.
Now, the last step to define the channel capacity follows.
Mutual information can be written using input (a priori) probabilities, charac-
terizing the source, and transition probabilities, characterizing the channel.
$$I(X,Y) = \sum_{i=1}^{r}\sum_{j=1}^{s} P(x_i)P(y_j/x_i)\,\mathrm{ld}\frac{P(y_j/x_i)}{\sum_{i=1}^{r} P(x_i)P(y_j/x_i)}.$$
The maximum of the mutual information,

$$I_{\max} = \max_{\{P(x_i)\}} I(X,Y),$$

is found where the maximization is carried out over the set of input probabilities. The dimension of Imax is Sh/symb, corresponding to the maximal quantity of information that can be transmitted on average by one symbol.
The engineers prefer to know the maximal quantity of information that can be transmitted over the channel in one second. Let the source have the symbol rate vm(X,Y) [symb/s], being the maximal rate for the channel. Then, the maximal information rate is

$$C = v_m(X,Y)\left[\frac{\mathrm{symb}}{\mathrm{s}}\right]\cdot I_{\max}\left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right] = \left[\frac{\mathrm{Sh}}{\mathrm{s}}\right].$$
It will be called in this book the channel capacity (Problems 4.4, 4.5, 4.10,
4.11). Some authors (especially in the courses on probability) call Imax channel
capacity. However, the symbol rates are in a range from a few thousands symbols
per second to more terasymbols per second. It is usually taken vm(X, Y) = 2fg, to
avoid the intersymbol interference (First Nyquist criterion).
In reality, the information rate (flux) (Problems 4.1, 4.5) is

$$\Phi(X,Y) = v(X,Y)\cdot I(X,Y)\ \left[\frac{\mathrm{Sh}}{\mathrm{s}}\right],$$

where v(X,Y) and I(X,Y) are the real values for the system. The corresponding efficiency can be defined as

$$\eta_C = \frac{\Phi(X,Y)}{C}.$$
Generally, the calculation of channel capacity is very difficult, usually using the
variation calculus or some similar mathematical tool. Here only two interesting case
will be given.
The first case is the noiseless channel, where

$$I(X,Y) = H(X) - H(X/Y).$$

However, after the reception of any symbol there would not be any ambiguity, because there is neither noise nor intersymbol interference. Hence, H(X/Y) = 0, Imax = max H(X) = ld r, and

$$C = 2 f_c\,\mathrm{ld}\,r.$$
The second case is the continuous channel. For continuous random variables the corresponding (differential) entropies and the mutual information are defined as

$$H(X) = \int_{-\infty}^{\infty} w(x)\,\mathrm{ld}\frac{1}{w(x)}\,dx, \qquad H(Y) = \int_{-\infty}^{\infty} w(y)\,\mathrm{ld}\frac{1}{w(y)}\,dy,$$

$$H(X/Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w(x,y)\,\mathrm{ld}\frac{1}{w(x/y)}\,dx\,dy,$$

$$I(X,Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w(x,y)\,\mathrm{ld}\frac{w(x,y)}{w(x)w(y)}\,dx\,dy.$$
It is supposed that the frequency band is limited, e.g. to 0 − fc, and the corresponding rate is vm(X,Y) = 2fc, yielding

$$C = 2 f_c\,I_{\max}.$$
Let the continuous channel input x have the probability density w(x), where

$$m_x = \overline{x} = \int_{-\infty}^{\infty} x\,w(x)\,dx = 0, \qquad \sigma_x^2 = \overline{x^2} = \int_{-\infty}^{\infty} x^2 w(x)\,dx < \infty.$$

The additive noise n, independent of the signal, has

$$m_n = \overline{n} = \int_{-\infty}^{\infty} n\,w(n)\,dn = 0, \qquad \sigma_n^2 = \overline{n^2} = \int_{-\infty}^{\infty} n^2 w(n)\,dn < \infty.$$
The channel output is

$$y = x + n,$$

where

$$m_y = \overline{y} = m_x + m_n = 0, \qquad \sigma_y^2 = \overline{y^2} = \sigma_x^2 + \sigma_n^2.$$
The mutual information is

$$I(X,Y) = H(Y) - H(Y/X).$$

The output ambiguity depends only on the additive noise (independent of the signal) and the following can be shown (H(N) is the noise entropy)

$$H(Y/X) = H(N).$$
Now, a hypothesis about the noise (its probability density) should be made. The worst case, as explained in Chap. 2, for a fixed variance (σ²), is when the noise has a Gaussian probability density, yielding

$$H_{\max} = \frac{1}{2}\,\mathrm{ld}\left(2\pi e\sigma^2\right).$$
Therefore,

$$I(X,Y) = H(Y) - \frac{1}{2}\,\mathrm{ld}\left(2\pi e\sigma_n^2\right),$$

yielding

$$I_{\max} = \max_{w(x)}\left[H(Y)\right] - \frac{1}{2}\,\mathrm{ld}\left(2\pi e\sigma_n^2\right).$$
Now, the probability density w(x) of the input signal should be found to maximize H(Y). The output signal, being the sum of the input signal and noise (statistically independent), will have the maximal entropy if its probability density is Gaussian. To obtain it, the input signal should be Gaussian as well, because the sum of independent Gaussian processes is a Gaussian process. Therefore,

$$H(Y)\big|_{\max} = \frac{1}{2}\,\mathrm{ld}\left(2\pi e\sigma_y^2\right) = \frac{1}{2}\,\mathrm{ld}\left[2\pi e\left(\sigma_x^2 + \sigma_n^2\right)\right],$$

so that

$$I_{\max} = \frac{1}{2}\,\mathrm{ld}\left(1 + \frac{\sigma_x^2}{\sigma_n^2}\right), \qquad C = 2 f_c\,I_{\max} = f_c\,\mathrm{ld}\left(1 + \frac{\sigma_x^2}{\sigma_n^2}\right).$$
Some authors use the symbol B (frequency band) instead of fc and the symbol S/N instead of the quotient of variances (signal-to-noise ratio, here the natural quotient of signal and noise powers, not in dB!), yielding finally (Problems 4.4, 4.5, 4.9–4.11)

$$C = B\,\mathrm{ld}\left(1 + \frac{S}{N}\right).$$
This expression is obtained using mainly abstract considerations, but the notions used are very well known to communication engineers—frequency band (B) and signal-to-noise ratio (S/N). Therefore, the capacity can be increased by increasing either the band or the signal-to-noise ratio (i.e. by increasing the transmitter power or by suppressing the noise power). Of course, some compensation can be introduced—decreasing the band can be compensated by increasing the transmitter power (but the increase is slow—logarithmic!) and vice versa.
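A minimal numerical sketch of this trade-off (Python assumed; the bandwidth and S/N values below are purely illustrative, not taken from the text):

```python
from math import log2

def capacity(bandwidth_hz, snr):
    """Shannon capacity C = B ld(1 + S/N) in Sh/s."""
    return bandwidth_hz * log2(1.0 + snr)

# Halving the band must be compensated by a large increase of S/N
print(capacity(1e6, 15))       # B = 1 MHz, S/N = 15  -> 4e6 Sh/s
print(capacity(0.5e6, 255))    # the same capacity with half the band needs S/N = 255
```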
When applying the above expression for the channel capacity it is very important to keep in mind the supposed conditions. The channel is considered with additive Gaussian noise, and the capacity is achieved if the input signal is Gaussian as well. Therefore, it is the maximum possible value of capacity. In reality, if signals are used which are not Gaussian (usually some kind of baseband pulses or some kind of digital modulation), the capacity is smaller. In such cases, the calculation of capacity can be very complicated.
The notion of error probability is very well known. Still, it is useful to connect it with the so-called decision rule. Consider a binary channel (BC) described by the transition matrix
$$P_{BC} = \begin{bmatrix} v_1 & p_1 \\ p_2 & v_2 \end{bmatrix}.$$
It is obvious that the error will happen as a result of two exclusive events—either x1 is emitted and y2 is received, or x2 is emitted and y1 is received. Therefore, the average error probability is
$$P_e = P(x_1)p_1 + P(x_2)p_2,$$
and it is just the error probability (Problems 4.1, 4.2, 4.3, 4.7), as this notion is usually used. For P(x1) = P(x2) = 0.5
$$P_e = 0.5\,(p_1 + p_2).$$
For a BSC (p1 = p2 = p) it is simply Pe = p, because P(x1) + P(x2) = 1 and p is just the parameter equivalent to the error probability.
Consider BSC described by transition matrix
$$P = \begin{bmatrix} 0.99 & 0.01 \\ 0.01 & 0.99 \end{bmatrix}.$$
It is obvious that the error probability is Pe = p = 0.01. But, consider the fol-
lowing transition matrix
$$P = \begin{bmatrix} 0.01 & 0.99 \\ 0.99 & 0.01 \end{bmatrix}.$$
Is the error probability 0.99? Of course not! One should only "complement" the logic. In the previous case, after receiving symbol y1(=0) the decision was that symbol x1(=0) was emitted, and vice versa. If now, after receiving y1(=0), the decision is x2(=1), the error probability is again Pe = 0.01. Therefore, the error probability does not depend on the transition probabilities only, but on the decision rule as well—on the way the receiver interprets the received symbol. It should be noted that here (binary channel) the worst case is when the error probability is 0.5.
Now, the notion of decision rule will be defined more precisely. Consider the channel having r input symbols {xi} (i = 1, 2, …, r) and s output symbols {yj} (j = 1, 2, …, s). Decision rule D(yj) is a function specifying a unique input symbol xi for each output symbol yj, i.e.
$$D(y_j) = x_i.$$
In the next to last example it was D(y1) = x1, D(y2) = x2 and in the last one
D(y1) = x2, D(y2) = x1.
Generally, there are r^s different decision rules. For a BC there are 2² = 4 different decision rules.
The next question is which decision rule should one choose? The answer is clear—the one yielding the minimum error probability. Therefore, the relation between the decision rule and the error probability should be found. Let symbol yj be received. The receiver will use the decision rule D(yj) = xi. But the reception of symbol yj can as well be the consequence of emitting some other symbol from {X}, so a wrong decision is made with the conditional probability P(e/yj)—the probability that the wrong decision was made if the received symbol is yj. The error probability is obtained by averaging this probability over all output symbols.
$$P_e = \sum_{j=1}^{s} P(y_j)\,P(e/y_j).$$
The considered events are mutually exclusive, the addends are nonnegative, and
minimization is achieved by minimizing every addend independently. P(yj) does
not depend on decision rule and D(yj) = xi should be chosen minimizing every
conditional probability for itself. With the chosen decision rule one obtains
$$P(e/y_j) = 1 - P(x_i/y_j) = 1 - P\big(D(y_j)/y_j\big).$$
Therefore, to minimize the error probability, for every received symbol yj the
following decision rule should be chosen
$$D(y_j) = x^*,$$
where x* is defined as
$$P(x^*/y_j) \ge P(x_i/y_j),\qquad i = 1, 2, \ldots, r.$$
In other words, the error probability will be minimal if to each received symbol yj the input symbol x* having the maximum a posteriori probability P(x*/yj) is joined. Of course, if there are more such symbols, any of them can be chosen.
This rule is denoted as MAP (Maximum A Posteriori Probability rule)
(Problems 4.1, 4.2). To apply this rule the a posteriori probabilities P(xi/yj) should
be found. These probabilities can be found only if besides the transition proba-
bilities, a priori probabilities of input symbols are known.
On the basis of Bayes' rule, the previous inequalities can be written as
$$P(y_j/x^*)P(x^*) \ge P(y_j/x_i)P(x_i),\qquad i = 1, 2, \ldots, r.$$
If the input symbols are equiprobable,
$$P(x_i) = \frac{1}{r}\qquad (i = 1, 2, \ldots, r),$$
the inequalities reduce to P(yj/x*) ≥ P(yj/xi),
yielding a simple way to find the decision rule on the basis of transition matrix only.
Simply, for every received symbol yj the decision is that symbol xi is emitted for
which in the corresponding column in matrix the conditional probability has a
maximum value. This rule is denoted as ML (Maximum Likelihood rule) (Problem
4.2). Minimal error probability will be obtained if all input symbols are
equiprobable.
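The two rules can be contrasted with a small sketch (Python assumed; the transition-matrix layout follows the text, rows being input symbols and columns output symbols):

```python
def map_rule(P_x, P_trans):
    """MAP: for each output y_j choose x* maximizing P(x_i)P(y_j/x_i);
    returns the decision rule and the resulting error probability."""
    r, s = len(P_trans), len(P_trans[0])
    rule, p_correct = [], 0.0
    for j in range(s):
        scores = [P_x[i] * P_trans[i][j] for i in range(r)]
        best = max(range(r), key=lambda i: scores[i])
        rule.append(best)
        p_correct += scores[best]          # P(x*, y_j)
    return rule, 1.0 - p_correct

def ml_rule(P_trans):
    """ML: ignore the a priori probabilities (as if all inputs were equiprobable)."""
    r = len(P_trans)
    return map_rule([1.0 / r] * r, P_trans)[0]

# the two matrices discussed above, with equiprobable inputs
print(map_rule([0.5, 0.5], [[0.99, 0.01], [0.01, 0.99]]))  # ([0, 1], Pe = 0.01)
print(map_rule([0.5, 0.5], [[0.01, 0.99], [0.99, 0.01]]))  # ([1, 0], Pe = 0.01)
```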
For MAP it can be shown that the error probability is
$$P_e = \sum_{j=1}^{s} P(e/y_j)P(y_j) = \sum_{j=1}^{s} P(y_j)\big[1 - P(x^*/y_j)\big] = \sum_{j=1}^{s} P(y_j) - \sum_{j=1}^{s} P(y_j)P(x^*/y_j) = 1 - \sum_{j=1}^{s} P(x^*, y_j) = 1 - \sum_{j=1}^{s} P(x^*)P(y_j/x^*).$$
For the ML rule with equiprobable input symbols this becomes
$$P_e = 1 - \frac{1}{r}\sum_{j=1}^{s} P(y_j/x^*),$$
and the error probability is obtained from the transition matrix by adding the maximal values in the columns.
The decision process described above can be called the hard decision (Problems 4.2, 4.4, 4.10). This means that the final decision about the received symbol is made in the demodulator (detector). However, it is better to quantize the output signal, i.e. instead of the symbol value to give some estimation (e.g. instead of the value "1" to generate real (measured) values—0.8, 1.05, 0.98 etc.). In this case, it is said that a soft decision (Problems 3.10, 3.11) is made. The drawback is that more bits must be used.
In fact, in digital transmission, the channel input is discrete. By noise and other
interferences superposition, the channel output is continuous. By using the corre-
sponding thresholds, the output is quantized becoming discrete. The thresholds
positions influence the error probability. These positions should be chosen to
minimize the error probability.
Up to now, channels without memory are considered. As said earlier, in the
channel with intersymbol interference, the error probability of the received symbols
depends as well and on the adjacent symbols in such a way introducing memory
into the channel. Models for channels with memory are often made using a com-
bination of memoryless channels, usually by so called Markov models (Problem
4.12).
It is obvious that a channel without intersymbol interference, with additive Gaussian noise, can be modeled as a BSC. However, in such channels sometimes the impulsive noise appears as well, producing bursts of errors. Such a channel is usually described by the Gilbert model with two states—good (G) and bad (B)—where the errors occur only in the bad state (with probability pB). The stationary state probabilities and the resulting average error probability are
$$P_G = \frac{P_{BG}}{P_{GB} + P_{BG}},\qquad P_B = \frac{P_{GB}}{P_{GB} + P_{BG}},\qquad P_e = p_B\,P_B = \frac{p_B\,P_{GB}}{P_{GB} + P_{BG}}.$$
The sequence of states can be easily found. The average duration of time
intervals (sojourn time) in good state corresponds to the intervals without the
impulsive noise (without the errors in Gilbert model). The average duration of time
intervals in bad state corresponds to the intervals with the impulsive noise. They are
$$N_G = \frac{1}{1 - P_{GG}},\qquad N_B = \frac{1}{1 - P_{BB}}.$$
It is only the first step in modeling real channels from the error statistics point of
view. Elliott modified this model by introducing the possibility of errors (pG 6¼ 0)
in the good state (Problem 4.12) [21].
Now the time has come to consider the true meaning of the channel capacity. It is given by the Second Shannon theorem. Its complicated proof will be omitted here. According to this theorem, the error probability can be made as small as one wishes if the channel information rate is smaller than the capacity (Φ(X, Y) < C). If the signaling rate is v(X, Y), the channel information rate is Φ(X, Y) = v(X, Y) I(X, Y) [Sh/s]. This result is unexpected. Reliable transmission over an unreliable channel is possible! It should be noted that transmission without any error is not possible. It only means that the error probability can be kept as low as one wishes.
This goal can be achieved using error control coding (considered in detail in the next five chapters). To get some insight, an elementary example will be given here—error control coding using repetitions.
Consider a memoryless binary source (emitting symbols 0 and 1). BSC with parameter p = 10⁻² (crossover probability) is used. Usually this is a high error probability. One possible way to lower the error probability is to repeat every information bit three times, i.e. 0 is encoded as 000 and 1 as 111. That means that three channel bits are used for every information bit. Of course, the corresponding information rate is three times smaller (i.e. the code rate (Problem 4.12) is 1/3). At the receiving end the majority logic is implemented. Three bits are decoded as one or zero according to the greater number of ones or zeros. It is in fact the MAP criterion. The probability of one error in this case is 3p(1 − p)² = 3 · 0.01 · 0.99² = 0.0294, while the probability of two errors is 3p²(1 − p) = 3 · 0.0001 · 0.99 = 0.000297. The probability of an uncorrected error is
$$P_e = \binom{3}{3}p^3 + \binom{3}{2}p^2(1-p) = 0.000298.$$
Therefore, single errors in a code word are detected and corrected. If the errors are only detected, then all single and double errors will be detected, and the undetected error probability would be Pe = p³ = 10⁻⁶. Considering further the fivefold repetition, with error correction, the probability of an uncorrected error is
$$P_e = \binom{5}{5}p^5 + \binom{5}{4}p^4(1-p) + \binom{5}{3}p^3(1-p)^2 \approx 10^{-5}.$$
Of course, the code rate is now R = 1/5. The number of repetitions can be further increased (Pe ≈ 4·10⁻⁷ for sevenfold repetition and Pe ≈ 10⁻⁸ with further repetitions). The corresponding results are shown using fat points in Fig. 4.6. Note the reciprocal values at the abscissa.
By considering this figure, it is easy to understand the essence of error control coding and the corresponding problems. In this case (n-fold repetition), to obtain a small error probability (<ε) the code rate must be drastically decreased. Therefore, the logical question is: does there exist some way to decrease the error probability without such a decrease of the code rate? The Second Shannon theorem gives a positive answer.
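The numbers quoted above can be reproduced with a short sketch (Python assumed; math.comb is the binomial coefficient):

```python
from math import comb

def repetition_pe(n, p):
    """Residual error probability of n-fold repetition (n odd) with majority decoding:
    an error remains when more than n//2 channel bits are inverted."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n // 2 + 1, n + 1))

p = 1e-2
for n in (3, 5, 7, 9):
    print(n, 1.0 / n, repetition_pe(n, p))   # code rate R = 1/n and residual Pe
# n=3 -> ~2.98e-4, n=5 -> ~1e-5, n=7 -> ~3.4e-7 ... while the rate drops as 1/n
```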
Problems
Problem 4.1 Zero-memory binary source emits symbols with rate vs = 100 [b/s],
the probability of one symbol is P(x1) = 0.3. The corresponding channel is
described by transition matrix
$$P = \begin{bmatrix} 0.4 & 0.6 \\ 0.75 & 0.25 \end{bmatrix}.$$
(a) Find the entropy and the information rate of the source.
(b) Find the mutual (transmitted) information and the information rate of the
channel.
(c) Find the decision rule for the receiver yielding the minimum error probability.
(d) Repeat the same as in (c) for P(x1) = 0.2.
Solution
(a) Entropy and information rate are the source characteristics. Finding
P(x2) = 1 − P(x1) = 0.7, the following is calculated
$$H(X) = \sum_{i=1}^{2} P(x_i)\,\mathrm{ld}\,\frac{1}{P(x_i)} = 0.882\ \Big[\tfrac{\text{Sh}}{\text{b}}\Big],$$
and the source information rate is Φ(X) = vs H(X) = 88.2 [Sh/s].
For the case of equiprobable symbols, the entropy equals one and the infor-
mation rate has the maximum value—100 [Sh/s].
(b) Transmitted (mutual) information is [1, 2]
$$I(X;Y) = \sum_{i=1}^{2}\sum_{j=1}^{2} P(x_i, y_j)\,\mathrm{ld}\,\frac{P(y_j/x_i)}{P(y_j)},$$
• Joint probabilities of binary symbols at the input and the output of the channel are
$$P(x_1,y_1) = P(x_1)P(y_1/x_1) = 0.12,\qquad P(x_2,y_1) = P(x_2)P(y_1/x_2) = 0.525,$$
$$P(x_1,y_2) = P(x_1)P(y_2/x_1) = 0.18,\qquad P(x_2,y_2) = P(x_2)P(y_2/x_2) = 0.175,$$
so that P(y1) = 0.645 and P(y2) = 0.355.
Transmitted information is
$$I(X;Y) = 0.12\,\mathrm{ld}\,\frac{0.4}{0.645} + 0.525\,\mathrm{ld}\,\frac{0.75}{0.645} + 0.18\,\mathrm{ld}\,\frac{0.6}{0.355} + 0.175\,\mathrm{ld}\,\frac{0.25}{0.355} = 0.0793\ \Big[\tfrac{\text{Sh}}{\text{b}}\Big],$$
and the corresponding channel information rate is Φ(X,Y) = v(X,Y) I(X,Y) = 7.93 [Sh/s].
From this follows that the “opposite” decision rule is optimum in this case, and
the error probability is
leading to an unexpected result. However, the decision rule must provide that for different output symbols the decisions correspond to different input symbols. In this case the decision can be based on the a posteriori probabilities, or one has to choose the rule giving the smaller error probability, i.e.
and the conclusion is that in this case, the optimum decision rule is
dMAP ðy1 Þ ¼ x2 ; dMAP ðy2 Þ ¼ x1 :
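A small script can verify these numbers (Python assumed; the values follow the problem data, not additional sources):

```python
from math import log2

P_x = [0.3, 0.7]
P = [[0.4, 0.6], [0.75, 0.25]]                    # transition matrix P(y_j/x_i)

P_y = [sum(P_x[i] * P[i][j] for i in range(2)) for j in range(2)]
H_X = sum(p * log2(1 / p) for p in P_x)           # ~0.881 Sh/b
I_XY = sum(P_x[i] * P[i][j] * log2(P[i][j] / P_y[j])
           for i in range(2) for j in range(2))   # ~0.0793 Sh/b
print(H_X, P_y, I_XY)

# MAP decision: in each column choose the row maximizing P(x_i)P(y_j/x_i)
rule = [max(range(2), key=lambda i: P_x[i] * P[i][j]) for j in range(2)]
print(rule)                                        # [1, 0] -> d(y1) = x2, d(y2) = x1
```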
Problem 4.2 Zero-memory binary source where the probability of one symbol is
P(x1) = p emits the symbols at the input of memoryless channel whose transition
matrix is
$$P = \begin{bmatrix} 0.3 & 0.7 \\ 0.01 & 0.99 \end{bmatrix}.$$
(a) Find the entropies of input and output sequences and conditional entropy of
output symbols when the input sequence is known.
(b) Find the transmitted information for p = 0.1, p = 0.5 and p = 0.9. For the
same values of p find the decision rule yielding the minimum error probability
as well as the probability of error in this case.
(c) Draw the transmitted information as a function of p. Draw the dependence of
the error probability as a function of p for the cases when the maximum a
posteriori probability (MAP) rule is used and when maximum likelihood
(ML) rule is used.
Solution
(a) For zero-memory binary source the entropy is
$$H(X) = p\,\mathrm{ld}\,\frac{1}{p} + (1-p)\,\mathrm{ld}\,\frac{1}{1-p},$$
$$H(Y) = \big(0.3p + 0.01(1-p)\big)\,\mathrm{ld}\,\frac{1}{0.3p + 0.01(1-p)} + \big(0.7p + 0.99(1-p)\big)\,\mathrm{ld}\,\frac{1}{0.7p + 0.99(1-p)}.$$
Conditional entropy of output symbols when the sequence of the input sym-
bols is known can be expressed as
$$H(Y/X) = 0.3p\,\mathrm{ld}\,\frac{1}{0.3} + 0.01(1-p)\,\mathrm{ld}\,\frac{1}{0.01} + 0.7p\,\mathrm{ld}\,\frac{1}{0.7} + 0.99(1-p)\,\mathrm{ld}\,\frac{1}{0.99}.$$
$$\begin{aligned} I(X;Y) = H(Y) - H(Y/X) &= \big(0.3p + 0.01(1-p)\big)\,\mathrm{ld}\,\frac{1}{0.3p + 0.01(1-p)} + \big(0.7p + 0.99(1-p)\big)\,\mathrm{ld}\,\frac{1}{0.7p + 0.99(1-p)}\\ &\quad - 0.3p\,\mathrm{ld}\,\frac{1}{0.3} - 0.01(1-p)\,\mathrm{ld}\,\frac{1}{0.01} - 0.7p\,\mathrm{ld}\,\frac{1}{0.7} - 0.99(1-p)\,\mathrm{ld}\,\frac{1}{0.99},\end{aligned}$$
$$I(X;Y)\big|_{p=0.1} = 0.0768\ [\text{Sh/b}],\qquad I(X;Y)\big|_{p=0.5} = 0.1412\ [\text{Sh/b}],\qquad I(X;Y)\big|_{p=0.9} = 0.0417\ [\text{Sh/b}].$$
To obtain the minimum probability of error MAP decision rule is used as follows
(1) For p = 0.1
i.e. does not depending in any way on the probabilities of the input binary
symbols.
The transmitted information as a function of p is shown in Fig. 4.7 and from the figure it can be found that the maximum corresponds to the value p = 0.409. The probability of error as a function of p for the MAP and ML decision rules is shown in Fig. 4.8. It is obvious that the minimum error probability is obtained when the MAP rule is used.
(Fig. 4.7: transmitted information I(X,Y) versus the probability of a symbol at the channel input, p)
(Fig. 4.8: error probability versus the probability of a symbol at the channel input, p, for the MAP and ML decision rules; the regions where dMAP(y1) and dMAP(y2) equal x1 or x2 are indicated)
Problem 4.3 Zero-memory binary source where the probability of one symbol is P
(x1), emits the symbols at the input of memoryless channel described by the tran-
sition matrix
$$P = \begin{bmatrix} 1-p_0 & p_0 \\ p_1 & 1-p_1 \end{bmatrix}.$$
(a) For P(x1) = 0.1 find the dependence of I(X,Y) on the channel parameters.
(b) Draw the dependence of I(X, Y) on p0 for p1 = 0.6 and P(x1) = 0.1; P
(x1) = 0.5 and P(x1) = 0.9.
(c) For a fixed value p1 = 0.6, and for p0 = 0.01 and p0 = 0.99, find the dependence of I(X, Y) and of the decision rule on the probability of symbol 0 in the incoming sequence.
(d) For p0 = p1 = p find the probability P(x1) yielding the maximum value of I(X,Y).
Solution
(a) The transmitted information for binary asymmetric channel with transition
matrix P is
(Fig. 4.9: transmitted information I(X,Y) as a function of the channel parameters p0 and p1)
Fig. 4.10 The dependence of transmitted information on p0 for p1 = 0.6 and some values of P(x1)
$$I(X;Y) = P(x_1)\Big[(1-p_0)\,\mathrm{ld}\,\frac{1-p_0}{P(y_1)} + p_0\,\mathrm{ld}\,\frac{p_0}{P(y_2)}\Big] + P(x_2)\Big[p_1\,\mathrm{ld}\,\frac{p_1}{P(y_1)} + (1-p_1)\,\mathrm{ld}\,\frac{1-p_1}{P(y_2)}\Big].$$
If the parameters p0 and p1 are not fixed, the transmitted information can be
considered as a three-dimensional function shown in Fig. 4.9. The maximum is
Fig. 4.11 Dependence of transmitted information on P(x1) for p1 = 0.6 and different values of p0
It is easy to verify that it is very difficult to find the optimum value of P(x1) in a closed form, and it is easier to obtain it numerically, by finding the position of the transmitted information maximum for fixed p0 and p1. For p0 = 0.01 and p1 = 0.6 the maximum value of transmitted information is obtained for P(x1)max = 0.5770, and in the case p0 = 0.99 and p1 = 0.6 for P(x1)max = 0.5890.
For p0 = p1 = p, the transmitted information corresponds to the main diagonal
in Fig. 4.9. Then binary channel becomes binary symmetric channel
(BSC) and the transmitted information is
$$\begin{aligned} I(X;Y) &= (a + p - 2ap)\,\mathrm{ld}\,\frac{1}{a + p - 2ap} + (1 - a - p + 2ap)\,\mathrm{ld}\,\frac{1}{1 - a - p + 2ap}\\ &\quad - \Big[p\,\mathrm{ld}\,\frac{1}{p} + (1-p)\,\mathrm{ld}\,\frac{1}{1-p}\Big], \end{aligned}$$
and the optimum value of P(x1) is aopt = 0.5 regardless of the value of p. The maximum transmitted information is
$$I(X;Y)_{\max} = 1 - \Big[p\,\mathrm{ld}\,\frac{1}{p} + (1-p)\,\mathrm{ld}\,\frac{1}{1-p}\Big].$$
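A brute-force numerical sketch of the maximization mentioned above (Python assumed; the grid step is an arbitrary choice):

```python
from math import log2

def mutual_info(a, p0, p1):
    """I(X,Y) of a binary asymmetric channel for input probability a = P(x1)."""
    P = [[1 - p0, p0], [p1, 1 - p1]]
    Px = [a, 1 - a]
    Py = [sum(Px[i] * P[i][j] for i in range(2)) for j in range(2)]
    return sum(Px[i] * P[i][j] * log2(P[i][j] / Py[j])
               for i in range(2) for j in range(2) if P[i][j] > 0)

def best_input(p0, p1):
    """Search for the input probability maximizing I(X,Y) on a fine grid."""
    grid = [k / 10000 for k in range(1, 10000)]
    return max(grid, key=lambda a: mutual_info(a, p0, p1))

print(best_input(0.01, 0.6))   # ~0.577 for p0 = 0.01, p1 = 0.6
print(best_input(0.1, 0.1))    # 0.5 for the symmetric channel (p0 = p1)
```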
(d) Find the capacity of channel when noise is present, obtained by removing
line encoder and decision block (the case N = 1 is considered). How to
achieve the capacity in this case?
Solution
(a) System block-scheme is shown in Fig. 4.12. The source information rate is
$$\Phi(S) = v_s\Big[\frac{1}{4}\,\mathrm{ld}\,4 + \frac{3}{4}\,\mathrm{ld}\,\frac{4}{3}\Big] = 811.3\ \Big[\tfrac{\text{kSh}}{\text{s}}\Big].$$
(Fig. 4.12: system block scheme—binary source (vs = 1 Mb/s), line encoder (polar code, U = 1 V), channel modeled as a low-pass filter with noise (fg = 1 MHz, σ² = 0.1 W), decision block (comparison with threshold), user; the cascade forms an equivalent BSC)
$$p_{ekv}^{(N)} = \big(1 - (1 - 2p)^N\big)/2,$$
$$C_{ekv}^{(N)} = \max_{P(0)} \Phi_{ekv}^{(N)}(X,Y) = v_{\max}(X,Y)\Big[1 + p_{ekv}^{(N)}\,\mathrm{ld}\,\big(p_{ekv}^{(N)}\big) + \big(1 - p_{ekv}^{(N)}\big)\,\mathrm{ld}\,\big(1 - p_{ekv}^{(N)}\big)\Big].$$
Fig. 4.13 Capacities of discrete and continuous channel corresponding to one section for An = 10
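A minimal sketch of the two formulas above (Python assumed; the number of sections N and the crossover probability are illustrative, and the symbol-rate factor is left out):

```python
from math import log2

def equivalent_p(p, N):
    """Crossover probability of N cascaded identical BSC sections."""
    return (1 - (1 - 2 * p) ** N) / 2

def bsc_capacity_per_symbol(p):
    """I_max of a BSC in Sh/symb."""
    if p in (0.0, 1.0):
        return 1.0
    return 1 + p * log2(p) + (1 - p) * log2(1 - p)

p = 0.01
for N in (1, 2, 5, 10):
    pe = equivalent_p(p, N)
    print(N, round(pe, 5), round(bsc_capacity_per_symbol(pe), 4))
# the equivalent crossover probability grows and the capacity per symbol shrinks with N
```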
$$C = f_g\,\mathrm{ld}\,(1 + A_n),$$
and the channel capacity can be achieved only in the case where at its input a continuous information source generating a Gaussian random process is applied. In this case, the process at the channel (linear system) output has the Gaussian distribution as well.
It is obvious that binary channel has a smaller capacity compared to the corre-
sponding continuous channel with noise. This is a consequence of binary signaling
used at the transmitting part and of a hard decision at the receiving part, trans-
forming the channel with noise into a binary symmetric channel. In the next
problem, it will be shown that by applying a different decision rule at the receiver,
this difference could be made smaller.
Problem 4.5 Digital communication system consists of
1. Information source emitting symbols s1, …, s6, with rate vs = 800 [symb/s].
Symbol probabilities are
Solution
(a) System block-scheme is shown in Fig. 4.14. On the basis of the five given
probabilities, the probability of the sixth symbol is
(Fig. 4.14: system block scheme—source s1, …, s6 (vs = 800 [symb/s]), statistical encoder (Huffman algorithm), discrete memoryless channel composed of the cascaded sections BSC 1 and BSC 2 (equivalent BSC), receiver (statistical decoder and user))
(Fig. 4.15: construction of the Huffman code for the given symbol probabilities)
$$P(s_6) = 1 - \sum_{i=1}^{5} P(s_i) = 0.05,$$
$$H(S) = -\sum_{i=1}^{6} P(s_i)\,\mathrm{ld}\,P(s_i) = 2.366\ \Big[\tfrac{\text{Sh}}{\text{symb}}\Big],\qquad \Phi(S) = v_s H(S) = 1893.3\ \Big[\tfrac{\text{Sh}}{\text{s}}\Big].$$
(b) The procedure of the Huffman code obtaining is shown in Fig. 4.15, and it is
obvious that to symbols s1–s6 correspond the code words ‘00’, ‘10’, ‘11’,
‘010’, ‘0110’ and ‘0111’, respectively. Average code word length and effi-
ciency are
$$L = \sum_{i=1}^{6} P(s_i)\,l_i = 2.38\ \Big[\tfrac{\text{b}}{\text{symb}}\Big],\qquad \eta = \frac{H(S)}{L}\cdot 100\% = 99.41\%.$$
The probability of zeros and ones can be found by calculating their average
number per code words as follows
and
In this case Φ(X) > Φ(S), which is, of course, impossible because the encoder cannot introduce new information. This illogical result is a consequence of the assumption that the sequence at the encoder output is uncorrelated, which in fact is not true. The introduced imprecision is small and it can be corrected having in mind that the previously found entropy is the entropy of the adjoint source, the true entropy being
$$H(X) = \Phi(S)/v(X,Y) = 0.9944 < 0.9949 = H(\bar{X}).$$
(c) If the probability of binary zero at the channel input is denoted by a, the
transmitted information can be calculated using the expression
$$\begin{aligned} I(X;Y) &= (a + p_1 - 2ap_1)\,\mathrm{ld}\,\frac{1}{a + p_1 - 2ap_1} + (1 - a - p_1 + 2ap_1)\,\mathrm{ld}\,\frac{1}{1 - a - p_1 + 2ap_1}\\ &\quad - \Big[p_1\,\mathrm{ld}\,\frac{1}{p_1} + (1-p_1)\,\mathrm{ld}\,\frac{1}{1-p_1}\Big], \end{aligned}$$
and
$$p_1 = \frac{1}{2}\Big(1 - \sqrt{1 - 2p}\Big),$$
The source is already using the maximum signaling rate, and the following is
valid vðX; YÞ ¼ vmax ðX; YÞ. Therefore, the capacity calculation is carried out
by maximizing the transmitted information by the choice of the symbol
probabilities at the channel input. If these symbols are equiprobable, it is
obtained [1]
1 1 Sh
C ¼ vðX; YÞ 1 p1 ld ð Þ ð1 p1 Þ ld ð Þ ¼ 1011 :
p1 1 p1 s
It is important to note that in this case C < Φ(S); therefore, it is obvious that for this channel (with the given error probability) it is impossible to transmit all the information emitted by the source. However, if the channel state becomes substantially better—the probability of error decreasing to p1 = 10⁻⁴—the capacity would increase to C = 1901.2 [Sh/s].
It can be noticed that for the information transmission a more unreliable channel
can be used if the encoder efficiency is smaller. In the case when the encoder is
directly at the channel input, the redundancy introduced by the encoder allows the
reliable transmission over the unreliable channel. In other words, without the
redundancy, the information without any distortion can be transmitted over the
completely reliable channel only. This feature will be used in theory of error control
codes where the redundancy will be intentionally introduced to make possible the
information transmission over very unreliable channels.
Problem 4.6 Binary source completely defined by the probability P(0) = a, sends the
bits to the error control encoder which repeats every bit m times. These bits are further
sent over the binary symmetric channel where the crossover probability is p = 0.1. On
the receiving end is the error control decoder using majority logic decision.
Do the following:
(a) Find the entropy and information rate if the source emits the bits with the rate
vi = 1 [Mb/s] and if a = 0.4.
(b) Find the bit error probability observed by the user. Calculate the transmitted
information over the equivalent channel. Draw its dependence as a function of
the number of repetitions for a = 0.1, a = 0.2 and a = 0.4. Comment the results.
(c) If the maximum signaling rate over the channel is vb = 10 [Mb/s], find the time needed for the transmission of N = 10⁷ bits under the condition that the error probability observed by the user is smaller than Pe,min = 10⁻². Can this time be additionally shortened by the use of some procedure for a = 0.2?
Solution
(a) Entropy and information rate for zero-memory binary source are
$$H(S) = -a\,\mathrm{ld}\,a - (1-a)\,\mathrm{ld}\,(1-a) = 0.9709\ \Big[\tfrac{\text{Sh}}{\text{symb}}\Big],$$
$$\Phi(S) = v_i\big[-a\,\mathrm{ld}\,a - (1-a)\,\mathrm{ld}\,(1-a)\big] = 970.9\ \Big[\tfrac{\text{kSh}}{\text{s}}\Big].$$
(b) When applying n repetitions error control coding with majority logic and
when the channel is symmetric and binary, with the crossover probability p,
the error probability observed by user is
Fig. 4.16 Entropy of input sequence and the transmitted information for p = 0.1
$$P_e^{(n)} = \sum_{k=(n-1)/2+1}^{n} \binom{n}{k}\,p^k (1-p)^{n-k},$$
$$\begin{aligned} I^{(n)}(X;Y) &= \big(a + P_e^{(n)} - 2aP_e^{(n)}\big)\,\mathrm{ld}\,\frac{1}{a + P_e^{(n)} - 2aP_e^{(n)}} + \big(1 - a - P_e^{(n)} + 2aP_e^{(n)}\big)\,\mathrm{ld}\,\frac{1}{1 - a - P_e^{(n)} + 2aP_e^{(n)}}\\ &\quad - P_e^{(n)}\,\mathrm{ld}\,\frac{1}{P_e^{(n)}} - \big(1 - P_e^{(n)}\big)\,\mathrm{ld}\,\frac{1}{1 - P_e^{(n)}}. \end{aligned}$$
$$t = \frac{nN}{v_b} = \frac{5\cdot 10^7}{10^7} = 5\ [\text{s}].$$
This time can be decreased if a sufficient source extension is performed, followed by compression encoding and then by the repetition encoding. The error probability does not change by this procedure and the code with R = 1/5 is the optimum one. On the other hand, N bits of the source (S) alphabet can be substituted by N·H(S) symbols of the code alphabet, and for a = 0.2 it is obtained
si s1 s2 s3 s4 s5 s6 s7
P(si) 0.3 0.2 0.15 0.15 0.1 0.05 0.05
2. Binary Huffman encoder and the corresponding decoder at the receiving end.
3. Error control encoder using threefold repetition and the corresponding decoder
with majority logic at the receiving end (these blocks are included optionally).
4. Channel consisting of two cascaded sections where one is modeled by binary
symmetric channel (crossover probability equals p = 10−2) and the other is
binary asymmetric channel with parameters p1 = 2p0 = 2p.
Solution
(a) The source characteristics are easily obtained
$$H(S) = \sum_{i=1}^{7} P(s_i)\,\mathrm{ld}\,\frac{1}{P(s_i)} = 2.57\ \Big[\tfrac{\text{Sh}}{\text{symb}}\Big],\qquad \Phi(S) = v_s H(S) = 2570\ \Big[\tfrac{\text{Sh}}{\text{s}}\Big].$$
Code words obtained by Huffman encoding are given in Table 4.2, from
which the average code word length, code efficiency and the compression ratio
are
$$L = \sum_{i=1}^{7} P(s_i)\,l_i = 2.6\ \Big[\tfrac{\text{b}}{\text{symb}}\Big],\qquad \eta = 98.88\%,\qquad \rho = 1.1538.$$
(Figure: system block scheme—source s1, …, s7, statistical encoder (Huffman algorithm), error control encoder (threefold repetition), cascade of BSC and BNC forming an equivalent BC, error control decoder (majority voting), statistical decoder and user; the part between the error control encoder input and the decoder output forms the superchannel)
yielding
$$H(X) = \sum_{i=1}^{2} P(x_i)\,\mathrm{ld}\,\frac{1}{P(x_i)} = 0.9957\ [\text{Sh/symb}],\qquad \Phi(X) = v(X,Y)\,H(X) = 2588.82\ [\text{Sh/s}].$$
$$I(X;Y) = \sum_{i=1}^{2}\sum_{j=1}^{2} P(x_i, y_j)\,\mathrm{ld}\,\frac{P(y_j/x_i)}{P(y_j)},$$
$$P(y_1/x_1) = (1-p)^2 + 2p^2 = 0.9803,\qquad P(y_2/x_1) = p(2-3p) = 0.0197,$$
$$P(y_1/x_2) = 3p(1-p) = 0.0297,\qquad P(y_2/x_2) = (1-p)(1-2p) + p^2 = 0.9703.$$
In this case it is obvious that the optimum decision rule is d(y1) = x1, d
(y2) = x2 and the error probability without the error control coding is
(c) The case with three repetitions now will be considered. If at the output of
Huffman encoder the binary zero is sent to the error control encoder, at its
output the sequence ‘000’ is generated and at the error control decoder output
binary zero will be generated in the cases without transmission errors or when
only one error occurred at these three bits. Otherwise, the error will occur for
binary zero transmission. Therefore, the superchannel (consisting of error
control encoder, channel and error control decoder) parameters are
$$P_{SK}(y_1/x_1) = P(y_1/x_1)^3 + 3P(y_1/x_1)^2 P(y_2/x_1) = 0.9989,$$
$$P_{SK}(y_2/x_1) = 3P(y_1/x_1)P(y_2/x_1)^2 + P(y_2/x_1)^3 = 0.0011,$$
$$P_{SK}(y_1/x_2) = 3P(y_2/x_2)P(y_1/x_2)^2 + P(y_1/x_2)^3 = 0.0026,$$
$$P_{SK}(y_2/x_2) = P(y_2/x_2)^3 + 3P(y_2/x_2)^2 P(y_1/x_2) = 0.9974.$$
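These numbers can be reproduced with a short sketch (Python assumed; the order of the two cascaded sections is the one that matches the values in the text):

```python
def cascade(P1, P2):
    """Transition matrix of two cascaded discrete channels (matrix product)."""
    r, m, s = len(P1), len(P2), len(P2[0])
    return [[sum(P1[i][k] * P2[k][j] for k in range(m)) for j in range(s)]
            for i in range(r)]

def repetition3_superchannel(P):
    """Superchannel of a binary channel P used with threefold repetition and
    majority-logic decoding (at most one of the three bits may be wrong)."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        c, w = P[i][i], P[i][1 - i]          # correct / wrong single-bit probabilities
        out[i][i] = c**3 + 3 * c**2 * w
        out[i][1 - i] = 1 - out[i][i]
    return out

p = 0.01
P_bsc = [[1 - p, p], [p, 1 - p]]
P_bnc = [[1 - p, p], [2 * p, 1 - 2 * p]]     # p1 = 2*p0 = 2p
P_bc = cascade(P_bsc, P_bnc)                  # ~[[0.9803, 0.0197], [0.0297, 0.9703]]
print(P_bc)
print(repetition3_superchannel(P_bc))         # ~[[0.9989, 0.0011], [0.0026, 0.9974]]
```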
The probabilities of input symbols are not changed and the output symbol
probabilities are
Binary rate in the channel is three times greater than the rate at the coder input
– the code rate is R = 1/3. If the binary rate in the channel cannot be increased
(the channel is the same) information rate at the error control coder must be
decreased three times, i.e.
and the source should emit the symbols three times slower.
Finally, transmitted information and information rate can be calculated
being 13.5 times smaller than in the case without error control coding.
It is important to notice the following:
• If the channel is connected directly to the information source, the channel cannot transmit more information (per second) than the source sent, i.e. Φ(X,Y) ≤ Φ(S). To transmit all the information emitted by the source it must hold C ≥ Φ(S), and in that case Φ(X,Y) = Φ(S);
• If an error control code enabling the negligible error probability is applied, the transmitted information through the superchannel is I_SK(X,Y) = 1, the above relation becoming R ≤ I(X,Y);
Problem 4.8 Zero-memory binary source, where the probability of binary zero is P(x1), is at the input of a line encoder which generates a polar binary signal with voltage levels ±1. From the line encoder the signal is emitted into the channel with additive white Gaussian noise (σ = 0.5). The decision block is at the channel output, with two thresholds ±G (0 ≤ G ≤ 1).
(a) Find the transmitted information for P(x1) = 0.5 and G = 0.5.
(b) Draw the transmitted information versus G when P(x1) = 0.5.
(c) Draw the transmitted information versus P(x1) for G = 0, G = 0.5 and G = 1
and compare it with source entropy.
Solution
Signal (amplitude) probability density function at the channel output depends on
voltage level emitted by line encoder. It is shown in Fig. 4.18. If the received signal
is below the lower threshold (u < −G), the decision is that the symbol ‘0’ is at the
discrete channel output, while, in the case u > G, the decision is ‘1’. If the received
signal is between thresholds, the decision would not be sufficiently reliable. In this
case the symbol is considered as erased, i.e. to it formally corresponds symbol E.
The probabilities for the received signal to be between the thresholds, and above the
more distant threshold are
$$p_1 = w_Y(-G < y < G/x = -1) = w_Y(-G < y < G/x = +1) = \frac{1}{2}\,\mathrm{erfc}\,\frac{1-G}{\sqrt{2}\sigma} - \frac{1}{2}\,\mathrm{erfc}\,\frac{1+G}{\sqrt{2}\sigma},$$
$$p_2 = w_Y(y > G/x = -1) = w_Y(y < -G/x = +1) = \frac{1}{2}\,\mathrm{erfc}\,\frac{1+G}{\sqrt{2}\sigma}.$$
Fig. 4.18 Probability density functions of the signal at channel output, equivalence with BEC
(Binary Erasure Channel)
where for the erasure channel it is often supposed that p2 ≈ 0, which is not supposed here. Transmitted information is
$$\begin{aligned} I(X;Y) &= P(x_1)\Big[(1-p_1-p_2)\,\mathrm{ld}\,\frac{1-p_1-p_2}{P(y_1)} + p_1\,\mathrm{ld}\,\frac{p_1}{P(y_2)} + p_2\,\mathrm{ld}\,\frac{p_2}{P(y_3)}\Big]\\ &\quad + P(x_2)\Big[p_2\,\mathrm{ld}\,\frac{p_2}{P(y_1)} + p_1\,\mathrm{ld}\,\frac{p_1}{P(y_2)} + (1-p_1-p_2)\,\mathrm{ld}\,\frac{1-p_1-p_2}{P(y_3)}\Big], \end{aligned}$$
where
Fig. 4.19 The transmitted information dependence on the threshold G absolute value
Fig. 4.20 The dependence of the transmitted information on the input signal probabilities
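A minimal sketch that evaluates the above expressions numerically (Python assumed; y2 is taken as the erasure symbol E, and σ = 0.5 as in the problem):

```python
from math import erfc, log2, sqrt

def channel_probs(G, sigma=0.5):
    """p1 (erasure) and p2 (crossover) for polar signaling +/-1 with thresholds +/-G."""
    p2 = 0.5 * erfc((1 + G) / (sqrt(2) * sigma))
    p1 = 0.5 * erfc((1 - G) / (sqrt(2) * sigma)) - p2
    return p1, p2

def mutual_info(Px1, G, sigma=0.5):
    """I(X,Y) of the resulting 2-input / 3-output (BEC-like) discrete channel."""
    p1, p2 = channel_probs(G, sigma)
    P = [[1 - p1 - p2, p1, p2], [p2, p1, 1 - p1 - p2]]   # rows: x1, x2; cols: y1, E, y3
    Px = [Px1, 1 - Px1]
    Py = [sum(Px[i] * P[i][j] for i in range(2)) for j in range(3)]
    return sum(Px[i] * P[i][j] * log2(P[i][j] / Py[j])
               for i in range(2) for j in range(3) if P[i][j] > 0)

for G in (0.0, 0.2, 0.5, 0.9):
    print(G, round(mutual_info(0.5, G), 4))
```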
si s1 s2 s3 s4 s5 s6 s7 s8 s9 s10
P(si) 0.18 0.2 0.14 0.15 0.1 0.11 0.05 0.04 0.02 0.01
Is it justified to put the source output at the input of the channel modeled as a pass-band filter with the frequency band B = 1 [MHz], if the noise power spectral density is N0 = 10⁻¹⁴ [W/Hz] and the signal power is Ps = 0.1 [μW]?
Solution
The source information rate is
$$\Phi(S) = -v_s \sum_{i=1}^{10} P(s_i)\,\mathrm{ld}\,P(s_i) = 2.981\ [\text{MSh/s}],$$
while the noise power in the band is Pn = N0B = 10⁻⁸ [W], so the channel capacity is C = B ld(1 + Ps/Pn) = 10⁶·ld(11) ≈ 3.46 [MSh/s].
The information rate is smaller than the channel capacity, and it is justified to use
the channel for a given source. According to the first Nyquist criterion for trans-
mission in the transposed frequency band, it should be provided for the signaling rate
to be smaller than vmax = 1 [Msymb/s] (the set of signaling symbols, as a rule, is not
the same as emitted by the source!). As the minimum binary rate over the channel is
determined by the information rate, it is obvious that BPSK cannot be used, but some
M-level modulation scheme could be used, providing Φ(S)/ld(M) < vmax.
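A quick check of these numbers (Python assumed; the symbol rate vs = 10⁶ [symb/s] is assumed here, consistently with the 2.981 MSh/s figure):

```python
from math import log2

P_s = [0.18, 0.2, 0.14, 0.15, 0.1, 0.11, 0.05, 0.04, 0.02, 0.01]
v_s = 1e6                                   # assumed symbol rate [symb/s]
H_S = sum(p * log2(1 / p) for p in P_s)     # ~2.981 Sh/symb
flux = v_s * H_S                            # ~2.981 MSh/s

B, N0, Ps = 1e6, 1e-14, 0.1e-6
C = B * log2(1 + Ps / (N0 * B))             # ~3.46 MSh/s
print(H_S, flux, C, flux < C)               # True -> the channel can carry the source
```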
Problem 4.10 Zero-memory binary source generating equiprobable symbols is at the input of a line encoder which generates a polar binary signal having the voltage levels ±U. From the line encoder the signal is emitted over the channel with additive white Gaussian noise (standard deviation is σ). At the channel output there are a quantizer (s = 2ⁿ quantization levels) and a corresponding decision block. The bounds of the quantization intervals are
Solution
(a) Because of the quantizer, the complete voltage range is divided into
s non-overlapping intervals every corresponding to one discrete symbol. When
s = 4 the signal probability density function at the channel output is shown in
Fig. 4.21, and a graph corresponding to discrete channel with r = 2 inputs and
s = 4 outputs is shown in Fig. 4.22.
The bounds of quantizing intervals are
$$P(y_j/x_i) = \begin{cases}
\dfrac{1}{2}\,\mathrm{erfc}\Big[\min\Big\{\dfrac{U_X(i)-U_{gr}(j-1)}{\sqrt{2}\sigma},\dfrac{U_X(i)-U_{gr}(j)}{\sqrt{2}\sigma}\Big\}\Big] - \dfrac{1}{2}\,\mathrm{erfc}\Big[\max\Big\{\dfrac{U_X(i)-U_{gr}(j-1)}{\sqrt{2}\sigma},\dfrac{U_X(i)-U_{gr}(j)}{\sqrt{2}\sigma}\Big\}\Big], & i \ne j,\ 1 < j < s\\[3mm]
\dfrac{1}{2}\,\mathrm{erfc}\,\dfrac{U_X(i)-U_{gr}(1)}{\sqrt{2}\sigma}, & i \ne j = 1\\[3mm]
1 - \dfrac{1}{2}\,\mathrm{erfc}\,\dfrac{U_X(i)-U_{gr}(s-1)}{\sqrt{2}\sigma}, & i \ne j = s\\[3mm]
1 - \sum\limits_{j \ne i} P(y_j/x_i), & i = j.
\end{cases}$$
$$I(X;Y) = \sum_{i=1}^{2}\sum_{j=1}^{s} P(x_i, y_j)\,\mathrm{ld}\,\frac{P(y_j/x_i)}{P(y_j)},$$
$$I_{\max}(X;Y) = \frac{1}{2}\sum_{i=1}^{2}\sum_{j=1}^{s} P(y_j/x_i)\,\mathrm{ld}\,\frac{2P(y_j/x_i)}{P(y_j/x_1) + P(y_j/x_2)}.$$
For U = 1 [V] and σ = 0.5 [V], the transmitted information for the given values of s is given in Table 4.3. It is obvious that with the increased number of quantization levels the transmitted information increases as well, but after some number of levels the saturation is reached. In the case when the number of symbols at the channel output is very great—the levels being very close to each other—the channel with discrete input (r = 2) and a continuous output is obtained. It should be noticed as well that the quantization levels here are chosen in a suboptimal way (only the range from −2U to +2U was quantized) and that an additional increase in the transmitted information is possible.
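A sketch of the soft-decision calculation (Python assumed; uniform thresholds on the range −2U … +2U are an assumption made here for illustration, since the exact bounds given in the problem text are not repeated):

```python
from math import erfc, log2, sqrt

def transition_probs(Ux, thresholds, sigma):
    """P(y_j/x_i): probability that a Gaussian centred at Ux falls into each
    quantization interval defined by the (sorted) threshold list."""
    def cdf(t):                                   # P(y < t)
        return 1 - 0.5 * erfc((t - Ux) / (sqrt(2) * sigma))
    edges = [float('-inf')] + list(thresholds) + [float('inf')]
    probs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        p_lo = 0.0 if lo == float('-inf') else cdf(lo)
        p_hi = 1.0 if hi == float('inf') else cdf(hi)
        probs.append(p_hi - p_lo)
    return probs

def i_max_binary(U, sigma, s):
    """I_max for equiprobable +/-U input and s uniform quantization levels on [-2U, 2U]."""
    thresholds = [-2 * U + 4 * U * k / s for k in range(1, s)]
    P = [transition_probs(x, thresholds, sigma) for x in (-U, U)]
    Py = [(P[0][j] + P[1][j]) / 2 for j in range(s)]
    return sum(0.5 * P[i][j] * log2(P[i][j] / Py[j])
               for i in range(2) for j in range(s) if P[i][j] > 0)

for s in (2, 4, 8, 16):
    print(s, round(i_max_binary(1.0, 0.5, s), 4))   # saturates as s grows (soft decision)
```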
Fig. 4.23 Normalized continuous channel capacity and capacity of the corresponding discrete
channel with r = 2 inputs and s outputs
Having in view that the data for calculating the maximum signaling rate for a
channel are not available, the capacity will be normalized to that rate (this
approach is very often used in the literature).
The corresponding numerical values are shown in Fig. 4.23. It is obvious that
for all signal-to-noise ratio values the increasing of the number of quantization
levels increases the transmitted information as well. In such a way it is
practically confirmed that soft decision is superior to hard decision where the
received signal is compared to one threshold value only.
It is interesting to find as well the capacity of the continuous channel with
noise, the channel corresponding to the system part from line encoder output
to the quantizer input.
$$C = B\,\mathrm{ld}\,(1 + P_s/P_n).$$
Although the frequency band is not given in this problem, it is obvious that the signaling is in the baseband and the condition vmax(X,Y) ≤ 2B must be satisfied, so the maximum channel capacity (corresponding to the system modeled as an ideal low-pass filter) is
$$C/v_{\max}(X,Y) = \frac{1}{2}\,\mathrm{ld}\,\big(1 + U^2/\sigma^2\big).$$
Solution
(a) System block-scheme is shown in Fig. 4.24. It is obvious that the binary
source combined with the extension block emits M equiprobable symbols, and
the entropy of the combined source is
$$H(S^n) = \sum_{i=1}^{2^n} \frac{1}{2^n}\,\mathrm{ld}\,(2^n) = nH(S) = n\ [\text{Sh/symb}].$$
(Fig. 4.24: system block scheme—binary source, extension block 2→M = 2ⁿ, line encoder (M-ary polar code, Ps = 1 [W]), channel modeled as a low-pass filter with noise (fg = 100 [kHz], pn = 5×10⁻⁶ [W/Hz]), decision block (comparison with thresholds), block M→2 and user; the cascade forms an equivalent BSC)
The receiving part of this system is practically the same as in the previous
problem. The transmitting part differs, because n binary symbols are firstly
grouped (in the extension block) and every symbol in line coder is represented
by one (from M) voltage level
Multilevel signal has a unity power and the following relation must hold
$$P_{sr} = \frac{1}{M}\sum_{i=1}^{M} U_x^2(i) = \frac{1}{M}\sum_{i=1}^{M}(2i - M - 1)^2 U^2 = \frac{U^2}{M}\Big[4\sum_{i=1}^{M} i^2 - 4(M+1)\sum_{i=1}^{M} i + M(M+1)^2\Big] = 1,$$
$$\sum_{i=1}^{M} i = \frac{M(M+1)}{2},\qquad \sum_{i=1}^{M} i^2 = \frac{M(M+1)(2M+1)}{6}.$$
(b) Equivalent discrete channel now has M inputs and M outputs and for M = 4 is
described by the graph shown in Fig. 4.25, and the corresponding transition
matrix is
$$P = \begin{bmatrix}
1-p_1-p_2-p_3 & p_1 & p_2 & p_3\\
p_1 & 1-2p_1-p_2 & p_1 & p_2\\
p_2 & p_1 & 1-2p_1-p_2 & p_1\\
p_3 & p_2 & p_1 & 1-p_1-p_2-p_3
\end{bmatrix}.$$
Transition probabilities are found by using the same equalities as in the pre-
vious problem, with only a few changes—parameter s is substituted by M,
values Ugr(l) are given in the problem text being
$$U_X(i) = (2i - M - 1)\,U = \frac{\sqrt{3}\,(2i - M - 1)}{\sqrt{M^2-1}},\qquad i = 1, 2, \ldots, M.$$
Table 4.4 Transmitted information for some values of M (equiprobable input symbols)
M                  2       4       8       16      32      64      1024
I(X,Y) [Sh/symb]   0.8434  0.9668  1.0691  1.1025  1.1123  1.1153  1.1170
Each of these probabilities should be in the range from 0 to 0.5, but from Fig. 4.27 it is obvious that the optimal value of the probability P(x1) = P(x4) depends on the noise power and has a greater value when the noise power is greater.
Fig. 4.27 Optimization of parameters P(00) = P(11) for signaling with M = 4 levels
Problem 4.12 In one data transmission system a zero-memory binary source sends bits with the rate vb = 100 [kb/s] to the BPSK modulator. The modulated signal with the average power Ps = 5 [μW] is transmitted over the channel modeled by an ideal pass-band filter. The power spectral density of the additive white Gaussian noise in the channel is N0 = 2×10⁻¹¹ [W/Hz]. The bandwidth is chosen so as to eliminate the intersymbol interference, obtaining at the same time the minimum noise power at the channel output.
Beside the Gaussian noise, in the channel there is the impulsive noise, which has the average pulse duration TL = 50 [μs] and the average interval between two pulses TD = 500 [μs]. The impulsive noise also has the Gaussian distribution and the average power spectral density pi = 10⁻⁹ [W/Hz], constant as well in the frequency band of interest. The BPSK signal receiver uses an integrate-and-dump circuit.
(a) Find the signal-to-noise ratio in the channel in intervals without the impulsive
noise, as well as when it is active.
(b) Find the error probability in intervals without the impulsive noise as well as
when it is active. By which discrete model the corresponding channel error
sequence can be modeled?
(c) Find the error probability in the good and in the bad state (impulsive noise), as
well as the average number of bits transmitted in every state. Find the channel
average error probability.
(d) Draw the autocorrelation function of the corresponding error sequence.
Solution
The channel is modeled as an ideal pass-band filter. The bandwidth is chosen so as
to eliminate the intersymbol interference, and for such transmission the rate cor-
responds to the bandwidth multiplied by the any integer n, being
Let the “good” state corresponds to the intervals without the impulsive noise in
the channel. In this case noise power and the average signal-to-noise ratio are
$$\sigma_n^2 = p_n B_{\min,\text{no ISI}} = 2\ [\mu\text{W}],\qquad SNR_D = \frac{P_s}{\sigma_n^2} = 2.5.$$
The “bad” state corresponds to the intervals when both noises are present, and
the impulsive noise power, total noise power and the average signal-to-noise ratio
are
$$\sigma_i^2 = p_i B_{\min,\text{no ISI}} = 100\ [\mu\text{W}],\qquad \sigma_{tot}^2 = \sigma_n^2 + \sigma_i^2 = 102\ [\mu\text{W}],\qquad SNR_L = \frac{P_s}{\sigma_{tot}^2} = 0.0490.$$
The time form of the channel noise signal is illustrated in Fig. 4.28.
(a) The receiver is realized as an integrate-and-dump circuit, so the probability of error is
$$P_e = \frac{1}{2}\,\mathrm{erfc}\,\sqrt{\frac{E_b}{N_0}}.$$
$$\frac{E_b}{N_0} = \frac{P_s/v_b}{P_n/B} = SNR\cdot\frac{B}{v_b},$$
and for vb = B, in this case, the error probabilities in good and bad states are
$$p_D = \frac{1}{2}\,\mathrm{erfc}\,\sqrt{SNR_D} = 0.0127,\qquad p_L = \frac{1}{2}\,\mathrm{erfc}\,\sqrt{SNR_L} = 0.3771.$$
This channel can be successfully modeled using the Gilbert-Elliott model [20,
21] where it is supposed that every state can be considered as a binary sym-
metric channel, the corresponding matrices are
$$P_G = \begin{bmatrix} v_G & p_G \\ p_G & v_G \end{bmatrix}\ (v_G + p_G = 1)\qquad\text{and}\qquad P_B = \begin{bmatrix} v_B & p_B \\ p_B & v_B \end{bmatrix}\ (v_B + p_B = 1),$$
and the transition from one state into another can be illustrated graphically, as
shown in Fig. 4.29.
(b) Stationary probabilities of good and bad state in Gilbert-Elliott model are fully
determined by transition probabilities from good into the bad state (PGB) and
vice versa (PBG) being
$$p_G = \frac{P_{BG}}{P_{GB} + P_{BG}},\qquad p_B = \frac{P_{GB}}{P_{GB} + P_{BG}},\qquad N_G = 1/P_{GB},\qquad N_B = 1/P_{BG}.$$
On the basis of the average duration of the time intervals in the good and the bad state (TG = 500 [μs], TB = 50 [μs]) and the signaling interval duration Tb = 1/vb = 10 [μs], a model should be found where in the good state the average number of emitted bits is NG = TG/Tb = 50, and for every sojourn in the bad state the average number of emitted bits is NB = TB/Tb = 5. It can be easily found
(Figure: normalized autocorrelation function of the error sequence versus the discrete shift, for NG = 50, NB = 5; NG = 50, NB = 50; and NG = 5, NB = 50)
yielding
$$P_{GB} = 1/N_G = 0.02,\qquad P_{BG} = 1/N_B = 0.2,$$
and the stationary probabilities
$$p_G = 0.9091,\qquad p_B = 0.0909.$$
The average error probability of the channel is
$$P_e = p_D\,p_G + p_L\,p_B = \frac{p_D P_{BG} + p_L P_{GB}}{P_{GB} + P_{BG}} = 0.0458.$$
. . .GGGGGGGGGGGGBBBGGGGGGGGGGGGGBBBBBGGGGGGGGGGGGGGGG. . .
. . .00000100000010100000000000001010100000000000000. . .
where the bold and underlined bits correspond to the bad state.
The normalized autocorrelation function of the error sequence is obtained on the basis of
$$R_e(k) = \frac{1}{N}\sum_{n=1}^{N} e(n)\,e(n+k).$$
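A minimal simulation sketch of this two-state model (Python assumed; the transition and error probabilities are the values derived above, the sequence length and seed are arbitrary):

```python
import random

def simulate_errors(n_bits, P_GB=0.02, P_BG=0.2, p_G=0.0127, p_B=0.3771, seed=1):
    """Generate an error sequence e(n) from a two-state Gilbert-Elliott model."""
    rng = random.Random(seed)
    state_bad = False
    errors = []
    for _ in range(n_bits):
        p_err = p_B if state_bad else p_G
        errors.append(1 if rng.random() < p_err else 0)
        if state_bad:
            state_bad = rng.random() >= P_BG      # stay bad unless a B->G transition occurs
        else:
            state_bad = rng.random() < P_GB       # G->B transition
    return errors

def normalized_acf(e, k_max=100):
    """R_e(k) = (1/N) sum_n e(n)e(n+k), normalized to R_e(0)."""
    N = len(e)
    r0 = sum(x * x for x in e) / N
    return [(sum(e[n] * e[n + k] for n in range(N - k)) / N) / r0 for k in range(k_max + 1)]

e = simulate_errors(200_000)
print(sum(e) / len(e))           # close to the average error probability 0.0458
print(normalized_acf(e, 5))      # slowly decaying ACF reveals the channel memory
```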
In the remaining part of this book error control codes will be considered. As said
earlier, these codes enable detection and possible correction of transmission errors.
Usually, it is supposed that at the encoder input there is a series of bits, statistically
independent and equally probable. It can be a result of previous data compression or
scrambling. To detect and correct the errors some redundancy should be added.
Depending on the way how the redundancy is added, the error control codes are
divided into two families—block codes and convolutional codes. Further subdi-
vision is possible, but it will be postponed and commented in the corresponding
chapters. Block codes were invented before convolutional codes. The latter will be considered in the next chapter.
From the name it is obvious that block encoding consists of taking a block of
input bits (k bits) and representing it with the block of output bits (n bits). This code
is denoted as (n, k) code. In this case the code rate is R = k/n (as used earlier in this
book). Generally, block code can be defined by the corresponding table comprising
k-tuples of input (information) bits and the corresponding n-tuples of output
encoded bits (code words). This table will have 2k rows. For greater values of k it is
not practical and some rule should be used how to obtain the code word from
k input bits. Further, the corresponding decoding rule is needed—how to obtain
information bits from the received code word. In practice the systematic codes are especially interesting, where the block comprising k information bits is not changed in the code word, and n – k control bits are added, forming a code word of length n. In the following, it is primarily supposed that binary codes are considered. However, many of the exposed principles and conclusions can be used for forming nonbinary codes as well.
In this chapter will be considered primarily so called linear block codes. At the
beginning, some elementary notions will be introduced. Repetitions codes
(Problem 5.1) are considered in the previous chapter as well (Introductory part and
Problems 4.6 and 4.7). At the end of Sect. 4.6 an example where multifold repe-
tition is considered (BSC where crossover probability is p < 0.5). Here, only the
fivefold repetition will be analyzed—i.e. information bit 0 is encoded as (00000)
and information bit 1 is encoded as (11111). Various decoding strategies can be
used. There are 32 possible received words. According to one decision rule (all
rules are MAP, with majority logic implemented). Five bits are decoded according
to the greater number of ones or zeros as one or zero. Therefore, all single errors
and double errors in a code word are detected and corrected. The feedback channel
is not needed. This procedure is Forward Error Control (FEC). According to the
next decision rule only (00000) and (11111) are decoded as 0 or 1. In this case all
single, double, triple and fourfold bit errors are detected. However, then the repe-
tition of this word is requested over the feedback channel. This procedure is
Automatic Repeat reQuest (ARQ). But, the third decision rule is possible as well.
Besides the combination (00000) (no errors!), all combinations having four 0s and one 1 are decoded as 0, and analogously the combination (11111) and the combinations having four 1s are decoded as 1. For the other combinations, having 2 or 3 zeros (i.e. 3 or 2 ones), the retransmission is requested. It means that single errors are corrected,
and all double and triple errors are detected. It is a hybrid procedure.
In fact, these rules are based on Hamming distance (d) (Problems 5.2 and 5.4).
For binary codes Hamming distance between two (binary) sequences of the same
length is defined as the number of places (bits) in which they differ. It has the properties of a metric. In the above example the Hamming distance between the code words is d = 5. A discrete five-dimensional space can be conceived, having a total of 32 points, two of which are code words. Their distance is 5.
According to the first decision rule only the received words identical to code words are decoded; in the other cases, the retransmission is requested (ARQ). It is illustrated in Fig. 5.1a, where the code words are denoted by fat points. According to the second decision rule the received word is decoded as the nearer code word in the Hamming sense (FEC), as in Fig. 5.1b. For the third decision rule (hybrid procedure), the
space is divided into three subspaces. The points in the subspaces around code
words are at the distance 1 from code words and single errors are corrected. In the
third subspace (in the middle) are the points having Hamming distance 2 or 3 from
code words and double and triple errors are detected (Fig. 5.1c). Decoder which
every received word decodes as some code word (final decision) is called complete
decoder.
Fig. 5.1 Detection of errors without correction (a), with correction of all (up to 4) errors (b) and
with partial (single error) correction (c)
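The three strategies can be enumerated explicitly for the fivefold repetition code (a small sketch, Python assumed; the correction/detection radii correspond to the three rules described above):

```python
from itertools import product

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

codewords = [(0,) * 5, (1,) * 5]

def decide(word, correct_radius, detect_radius):
    """Decode if the nearest code word is within correct_radius,
    otherwise request retransmission (ARQ) if within detect_radius."""
    dist = [hamming(word, c) for c in codewords]
    best = min(range(2), key=lambda i: dist[i])
    if dist[best] <= correct_radius:
        return str(best)                      # decoded (FEC)
    return 'ARQ' if dist[best] <= detect_radius else 'fail'

# applied to all 32 possible received words
for name, qc, qd in (('detect only', 0, 4), ('correct up to 2', 2, 2), ('hybrid', 1, 3)):
    outcomes = [decide(w, qc, qd) for w in product((0, 1), repeat=5)]
    print(name, {o: outcomes.count(o) for o in sorted(set(outcomes))})
```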
To detect q_d errors, the Hamming distance must satisfy
$$d \ge q_d + 1.$$
Of course, for a code with more (than two) code words, the worst case should be
considered, i.e. minimum Hamming distance in the code should be found. To
correct qc errors, the minimum Hamming distance should be
$$q_c < d/2,\qquad\text{i.e.}\qquad d \ge 2q_c + 1.$$
Step further, to detect qd errors and of them to correct qc errors (qd qc), the
following must hold.
$$d \ge q_d + q_c + 1.$$
Similarly, to reconstruct q_e erased bits it must hold
$$d \ge q_e + 1,$$
to allow for distinguishing from the nearest code word. Also, to correct qc errors—
the possibility not explicitly considered in BEC definition—and to erase qe bits the
following must hold
$$d \ge 2q_c + q_e + 1.$$
Codes using single parity check are probably the oldest error control codes.
Simply, at the end of k-bits information word a single parity check is added gen-
erating (n = k + 1, k) block code. The total number of ones is even. Hamming
distance is d = 2 and any odd number of errors can be detected. The code rate is
R = (n – 1)/n.
where the positions of parity bits are denoted with “x”. If the block consists of k1k2
information bits the obtained block (“codeword”) will have n = (k1 + 1)(k2 + 1)
bits (k1 + k2 + 1 control bits). Code rate is
$$R = \frac{k_1 k_2}{(k_1 + 1)(k_2 + 1)}.$$
The maximal code rate value is obtained for k1 = k2 = k, while n = (k + 1)², and the code rate is
$$R = \frac{k^2}{(k+1)^2}.$$
For a code correcting qc errors the Hamming bound must be satisfied
$$q^{n-k} \ge \sum_{j=0}^{q_c}\binom{n}{j}(q-1)^j,$$
where q is the code base (q = 2 for binary code). There is also a Singleton bound
(Problem 5.8) for any code (n, k) having a minimum distance dmin
$$d_{\min} \le 1 + n - k.$$
The Gilbert–Varshamov bound gives a lower bound for the maximum possible number of code words Aq(n, d) for a given code base q, code word length n and minimum code Hamming distance d
$$A_q(n,d) \ge q^n\Big/\sum_{j=0}^{d-1}\binom{n}{j}(q-1)^j.$$
Hamming code can be expanded by adding one general parity-check bit (the code word length is now n = 2^b). The obtained code can detect practically all even numbers of errors. A code can be shortened as well by omitting some information symbols.
It is obvious that for every Hamming (n, k) code, where the syndrome has
n – k bits, all single errors can be corrected because
$$n = 2^{n-k} - 1.$$
The corresponding code word lengths are 2^j – 1 (j ≥ 3), i.e. codes (7, 4), (15, 11), (31, 26), (63, 57) etc. By adding a general parity check, the codes (8, 4), (16, 11), (32, 26), (64, 57) etc. are obtained, correcting all single errors and detecting all double errors. It is very suitable having in view the byte data structure.
In a previous consideration it was mentioned that generally any block code can
be defined by the corresponding table comprising k-tuples of input (information)
bits and the corresponding n-tuples of output encoded bits (code words). This table
will have 2^k rows. For the code words there are 2^n possible "candidates". Therefore, there are in total
$$\binom{2^n}{2^k}$$
possible codes (of course, some of them are not good!). For the greater values of n and k it is not practical and some
rule should be used how to obtain the code word from k input bits. It means that
some simple rule should be defined. These rules (algorithms in fact) can be very
different, but there exists a special class of block codes having a simple mathe-
matical description. With the help of discrete mathematics (abstract algebra)
apparatus such class was found. In such a way it is possible to construct codes
having needed characteristics (Hamming distance, etc.). These codes are linear
block codes. This class of codes is defined by imposing a strong structural property
on the codes. This structure provides guidance in the finding the good codes and
helps to practical encoders and decoders realization. At the end of this chapter a
short overview of the corresponding part of abstract algebra is included.
The definition of linear block code is very simple. Consider finite field—Galois
field GF(q), i.e. field that has q symbols. In the field two operations are defined—
addition and multiplication (both are commutative). Sequences of n field elements
(vectors) form a vector space V dimension n over the field. In this space vector
addition is defined where the corresponding vector elements are added according to
the rules from field. The set V is a commutative group under vector addition. Scalar
multiplication is defined as well where vectors (i.e. all their elements) are multiplied
by field elements (scalars). The product of two n-tuples, (a1, a2, …, an) and (b1, b2, …, bn), defined as
$$(a_1, a_2, \ldots, a_n)\cdot(b_1, b_2, \ldots, b_n) = a_1b_1 + a_2b_2 + \cdots + a_nb_n,$$
is called inner product or dot product. If the inner product equals zero (0 from the field), the corresponding vectors are said to be orthogonal. Definition of linear
code is as follows: A linear code is a subspace of vector space over GF(q). In this
chapter mainly binary field GF(2) will be considered, but the theory is general, it
can be applied for linear codes with q different symbols, if these symbols can be
connected 1:1 with the symbols from GF(q). It means that linear block codes exist
if the number of elements in the code alphabet is a prime number or a power of a prime number—2, 3, 4 (=2²), 5, 7, 8 (=2³), 9 (=3²), 11, 13, 16 (=2⁴) etc. It is very convenient that for any power of 2 there exists a linear block code. This
approach in describing codes is often called algebraic coding theory.
Of course, block codes can be formed for any number of symbols, but then they
are not linear. The vector spaces over the finite fields can be used to define code
words not as the elements of a subspace. Then, it is usually said that “nonlinear”
codes are considered.
Also, at the end of this chapter there are two problems not connected to linear
block codes. The arithmetic block codes (Problem 5.11) construction is based on
the arithmetic operation connecting decimal representations of information and
code word. These relations are usually very simple. For integer codes (Problem
5.12) information and code words symbols take the values from the set Zr = {0, 1,
2, …, q − 1} (integer ring) while all operation during the code words forming are
modulo-q.
Previous “mathematical” definition of linear block code can be “translated” by
using previously introduced notions. The vector space has 2^n vectors—points—candidates for code words. From these candidates 2^k code words should be chosen. If
these words are chosen as a subspace of the considered space, a linear block code is
obtained. According to Lagrange’s theorem, the number of elements in the group
must be divisible by the number of elements in the subgroup. The vectors form a
group for addition, and for every n and every k there exists a linear code (n, k), because 2^n is divisible by 2^k (of course, q^n is divisible by q^k). The identity element 0 = (0, 0, …, 0) must be in the subgroup, so the code must contain the all-zeros code word (00…0). The subgroup is closed under addition and the sum of code words will be a code
Hamming weight (i.e. the number of ones) of their sum. It further means that for any two code vectors there exists a code vector obtained by their summation, and the Hamming weights of the code vectors are at the same time the possible Hamming distances in the code. Therefore, to find the minimum Hamming distance in the linear code, the code word having the minimum Hamming weight should be found (of course, not the all-zeros code word!). To calculate the error probability, the weight distribution, also called the weight spectrum, should be found (Problems 5.2, 5.3, 5.8
and 5.9) of the code. It specifies the number of code words that have the same
Hamming weight.
Consider Hamming code (7, 4) (Hamming code is a linear code!). The code words (2⁴ = 16) are
(0000000), (1101001), (0101010), (1000011), (1001100), (0100101), (1100110),
(0001111),
(1110000), (0011001), (1011010), (0110011), (0111100), (1010101), (0010110),
(1111111).
Corresponding weight spectrum is: 0(1) 3(7) 4(7) 7(1), shown in Fig. 5.2a. In
Fig. 5.2b weight spectrum of extended Hamming code (8, 4) obtained by adding
the parity check is shown.
A set of vectors is said to span a vector space if every vector equals at least one
linear combination of these vectors. The number of linearly independent vectors is
called a vector space dimension. Any set of linearly independent vectors spanning
the same vector space is called the basis. Therefore, there can be more bases of the
same vector space. Generally, the group can have the nontrivial subgroups.
Similarly, the vector space can have the subspaces. The subspace dimension is
smaller than the space dimension. For (n, k) code vector space dimension equals
n and code subspace dimension equals k. The subspace can have more bases as
well. The subspaces can be orthogonal (dot products of the vectors from bases equal
zero).
Consider an (n, k) code. The vectors of its basis can be ordered in a matrix form of dimensions k × n. It is a generator matrix (Problems 5.2, 5.3, 5.4 and 5.9) of the code. Its rows are linearly independent. The matrix rank is k. There are elementary operations over the rows which do not change the matrix rank. Therefore, in this case the code will be the same, only the basis would change. Further, by permuting the matrix columns, an equivalent code (Problems 5.2 and 5.9) would
(Fig. 5.2: weight spectra—(a) Hamming code (7, 4): 0(1), 3(7), 4(7), 7(1); (b) extended Hamming code (8, 4): 0(1), 4(14), 8(1))
Fig. 5.2 Weight spectrum of Hamming code (7, 4) (a) and Hamming code (8, 4) (b)
be obtained. This code will have the same weight spectrum as the previous one, i.e.
will have the same performances.
Consider the linear block code (5, 2). The code words are (00000), (11010), (10101) and (01111), i.e. there are 2² = 4 code words. Any combination of two of the three nonzero code words forms a generator matrix
$$G_1 = \begin{bmatrix} 1 & 1 & 0 & 1 & 0\\ 1 & 0 & 1 & 0 & 1 \end{bmatrix},\qquad G_2 = \begin{bmatrix} 1 & 1 & 0 & 1 & 0\\ 0 & 1 & 1 & 1 & 1 \end{bmatrix},\qquad G_3 = \begin{bmatrix} 1 & 0 & 1 & 0 & 1\\ 0 & 1 & 1 & 1 & 1 \end{bmatrix}.$$
Of course, any of them can be obtained from the other using elementary oper-
ations over the rows.
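A small sketch that enumerates a linear code from its generator matrix and computes the weight spectrum (Python assumed; the matrix below is G1 of the (5, 2) code just discussed):

```python
from itertools import product

def codewords(G):
    """All code words of the binary linear code generated by G (arithmetic over GF(2))."""
    k, n = len(G), len(G[0])
    return [tuple(sum(info[i] * G[i][j] for i in range(k)) % 2 for j in range(n))
            for info in product((0, 1), repeat=k)]

def weight_spectrum(words):
    spectrum = {}
    for w in words:
        spectrum[sum(w)] = spectrum.get(sum(w), 0) + 1
    return dict(sorted(spectrum.items()))

G1 = [[1, 1, 0, 1, 0],
      [1, 0, 1, 0, 1]]
print(codewords(G1))                   # (00000), (10101), (11010), (01111)
print(weight_spectrum(codewords(G1)))  # {0: 1, 3: 2, 4: 1} -> d_min = 3
```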
Such mathematical description provides very simple generation of code words.
Consider a linear code (n, k) that has the generator matrix (dimensions k × n)
$$G = \begin{bmatrix}
g_{11} & g_{12} & \cdots & g_{1n}\\
g_{21} & g_{22} & \cdots & g_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
g_{k1} & g_{k2} & \cdots & g_{kn}
\end{bmatrix}.$$
To obtain the code word for information bits vector i(i1, i2, …, ik), the following
multiplication is used
$$\mathbf{v} = \mathbf{i}\cdot G,$$
where the resulting code vector (word) v(v1, v2, …, vn) is the sum of G rows
corresponding to the places of ones in i. Therefore, a code word must be obtained.
Further, the rows of G are linearly independent and to any specific information
vector i corresponds code word v, different from other code words.
Total number of
k
possible code words is equal to the number of rows of G − , plus the number
1
k
of words obtained by summing two rows of G − , plus the number of words
2
obtained by summing three rows of G etc. This number is 2k – 1. To this total
should be added all zeros code vector, corresponding to all zeros information
vector, i.e.
k k k
þ þ ¼ 2k :
0 1 k
2 3
1 0 0 p11 p1;nk
60 1 0 p21 p2;nk 7
6 7
Gs ¼ 6 . .. .. . .. .. .. 7 ¼ ½ Ik P ;
4 .. . . .. . . . 5
0 0 1 pk1 pk;nk
i.e. kj is parity check taking into account the bits on the positions where P in jth
column has ones. The number of these checks is just n − k. In binary encoding these
are real parity checks, and in general case (nonbinary encoding) these are general-
ized parity checks. It is also possible to construct a product code (Problem 5.7). For a
systematic product code information bits are ordered in two (or more) dimensions.
Then, one error control code is applied for the rows and the other for the columns.
Therefore, the problem of encoding is solved. The encoding is very simple. It is
sufficed to know matrix G. The next question is what to do at the receiving end.
How to verify do the received vector (u) is a code vector (v)? The abstract algebra
gives here an elegant solution as well. As said earlier, if inner product of two
vectors equals zero (0 from the field), the vectors are orthogonal. Two subspaces
are orthogonal if all vectors from one subspace are orthogonal to all vectors of the
other subspace. It is shown that for every subspace there is a subspace orthogonal to
it. To test the orthogonality, it is sufficient to verify the orthogonality of the cor-
responding bases. If the bases are orthogonal, then all their corresponding linear
combinations are orthogonal. Further, if the dimension of vector space is n, and if
the dimension of one subspace is k, the dimension of corresponding orthogonal
subspace is n − k. Let the base (generator matrix) of one subspace is G (dimensions
k n) and let H is the base (generator matrix) of the orthogonal subspace (di-
mensions (n − k) n). Taking into account that in matrix multiplication the ele-
ment aij is obtained as a dot product of ith row of the first matrix and the jth column
of the second matrix, the following is obtained
G H T ¼ 0:
where (T) denotes transposition and the dimension of matrix 0 (all elements equal
zero) are k (n − k). Of course, by transposing this product one obtains
H G T ¼ 0T :
where In−k is a unity matrix dimensions (n − k) (n − k), the sign “–” denotes that
for every element of matrix P the corresponding inverse element in the field should
be taken.
In the case of binary code—GF(2)—each element (0 and 1) is inverse to itself
and a sign “minus” is not needed. It is obvious that the first n − k elements of every
row of matrix H are parity-checks and “one” shows the position of the corre-
sponding parity bit. Some authors use a systematic code matrix
Gs ¼ ½ P Ik :
For the earlier mentioned Hamming code (7, 4) generator and parity-check
matrices are
2 3
1 1 1 0 0 0 0 2 3
61 1 0 1 0 1 0 1
0 0 1 1 0 07
G¼6
40
7; H ¼ 4 0 1 1 0 0 1 1 5:
1 0 1 0 1 05
0 0 0 1 1 1 1
1 1 0 1 0 0 1
For decoding the received word u, the following vector should be found
S ¼ u HT:
S ¼ u H T ¼ ð v þ eÞ H T ¼ v H T þ e H T ¼ 0 þ e H T ¼ e H T :
Now, vector e should be found. It is obvious that if the error vectors have the
same syndrome, only one of them must be taken into account. Hoping that the
smaller number of errors is expected, for the error vector should be taken that one
that has a smaller number of ones.
Still further insight into error detection and correction can be obtained using
standard array. Linear code (n, k) is a subspace of a vector space dimension n. It is
in the same time a subgroup of the additive group. Each group can be partitioned
with respect to the subgroup. Here the vector space is partitioned with respect to the
code subspace. The procedure is described and commented in details in Problem
5.2. Here, a short overview of the procedure will be given. For any linear block
code standard array has 2n−k rows and 2k columns. Every row is called coset, code
words c1, c2, …, c2k are in the first row (usually starting from the all zeros code
word) and the first elements in every row are called coset leaders. The element in
the ith column and the jth row is obtained by adding ci and jth coset leader, denoted
by ej. In fact, it is assumed that the code word ci was transmitted, and that the error
164 5 Block Codes
vector ej is added to the code word. Syndrome corresponding to this received word
is uniquely determined by jth coset leader
S¼v H T ¼ ðci ej Þ H T ¼ ej HT ;
because the parity-check matrix does not depend on the received word and it is
known at the receiving end. The error vectors corresponding to coset leaders are
uniquely determined by the syndrome and can be corrected. Therefore, 2n−k dif-
ferent error patterns (vectors) can be corrected.
As an example consider code (5, 2) with code words (00000), (11010), (10101),
(01111). Minimal Hamming distance equals 3 and a code should correct all single
errors. It is obvious from standard array
S
00000 11010 10101 01111 000
00001 11011 10100 01110 101
00010 11000 10111 01101 110
00100 11110 10001 01011 001
01000 10010 11101 00111 010
10000 01010 00101 11111 100
--------------------------------
00011 11001 10110 01100 011
00110 11100 10011 01001 111
For coset leaders of two last rows the some error vectors having two errors are
chosen (from four possible candidates).
An interesting class of linear codes over GF(2) are Reed-Muller codes (Problem
5.10). They can be easy described and by a simple majority logic easy decoded as
well. Here the code word at the receiver is found iteratively, without using a
syndrome. A notion of orthogonality is introduced (different from orthogonality of
vectors). The set of check sums (parity checks) is orthogonal to a particular error
bit, if this bit is involved in every sum, and no other error bit is checked by more
than one sum.
Considered codes are constructed for error control at the channels where single
errors are dominant (memoryless channels). However, sometimes the errors occur
in packets (“bursts”). By using interleaving (Problem 5.6) at the transmitter and
deinterleaving at the receiver the error control code for random errors can suc-
cessfully combat with packet errors.
Problems
Problem 5.1 Explain a block code construction where bits are repeated n times.
(a) Explain the decoding procedure when the decisions are made using majority
logic.
(b) If it is possible to form some other decision rule for this code, explain its
advantages and drawbacks with respect to the previously described rule.
Problems 165
(c) Find the residual error probability after decoding and draw it for n = 5, n = 7,
n = 9 and n = 11, for two different decision rules. Find the code rate for all
cases considered.
(d) Comment the quality of this code comparing it to the limiting case given by
Second Shannon theorem.
Solution
The construction of a repetition code is quite simple and it is based on the fact that a
binary zero at the encoder input is represented by a series of n zeros at its output,
while to the binary one correspond a series of n successive ones. It is obvious that at
the encoder output only two code words can appear, while at the receiver input any
bit combination can appear (but not all having the same probability). The decoder,
on the basis of the received code word, has to decide which information bit is
transmitted.
(a) To attain this goal, usually the majority decision rule is used, where it is
decided that at the encoder input was binary zero if in the received word there
are more zeros than ones, and vice versa. To avoid the case when the received
word has an equal number of binary zeros and ones, it is usually taken that a
code word length, denoted by n, is an odd number. In this case all received n-
bits words are divided into two groups, as illustrated in Fig. 5.3 for n = 3.
Using this approach the decision error appears if there are errors on at least
(n + 1)/2 positions. The probability of the residual error for the user just
corresponds to the probability of such an event being
X
n
n t
Pe;rez1 ¼ p ð1 pÞnt ;
t
t¼ðn þ 1Þ=2
000 100
0 Repetition 000 001 010 Majority 0
code, n=3 BSC decision rule
1 111 101 110 1
011 111
Fig. 5.3 Encoder and decoder procedures when threefold repetition is applied
166 5 Block Codes
case the error occurs only if all bits in a code word are hit with the channel
errors, and a corresponding probability is
Pe;rez2 ¼ pn :
0
10 n=5, decision rule no. 1
n=7, decision rule no. 1
−1
10 n=9, decision rule no. 1
n=11, decision rule no. 1 decision rule
no. 1
−2 n=5, decision rule no. 2
Residual bit error probability, prez
10
n=7, decision rule no. 2
−3 n=9, decision rule no. 2
10
n=11, decision rule no. 2
−4
10
−5
10
−6
10
decision rule
−7
10 no. 2
−8
10
−9
10
−3 −2 −1 0
10 10 10 10
Crossover probability in BSC, p
Fig. 5.4 Residual error probability, the repetition code, two decision rules
Problems 167
achieves the better performances, but it implies the feedback channel and
ARQ procedure. Because of that, to achieve the small residual error with the
moderate decoder complexity is a very complicated task.
(d) In Problem 4.7 it was shown that the transmission reliability can be increased
by reducing the channel information rate. It is interesting to check whether
the results obtained here are in concordance with the Second Shannon theo-
rem, formulated in 1948 [10], where it was proved that the reliable trans-
mission can be provided if the code rate (R) is smaller than the transmitted
information—I(X, Y)—for a given channel.
For BSC, when the input symbol probabilities are equal, the transmitted infor-
mation depends only on the average transmission error (crossover) probability (p)
1 1
Imax ðpÞ ¼ 1 ð1 pÞ ld p ld :
1p p
The function I(p) is shown in Fig. 5.5, where for crossover probability the
logarithmic scale is used (for p < 0.5). In the same figure there are shown the
parameter p values for which n-fold repetition provides the residual error proba-
bility Pe,res = 10−9 (for the second decision rule) and at y-axis the corresponding
code rate is denoted.
It should be noted that such comparison is not quite correct—I(p) is the upper
limit for a code rate for the case of when the residual error is arbitrary small, while,
0.9
0.8
R>I(p)
Transmitted information, I
0.5
0.4
n=15
n=11
0.3 n=13
n=3 n=9
0.2
0.1 n=5
n=7
0
−4 −3 −2 −1
10 10 10 10
Crossover probabitily, p
for the repetition code the values are given for the case when the Pe.res is sufficiently
small for the practical purposes (for a majority of communication systems
Pe,res < 10−9). However, some interesting effects can be noticed:
(1) For a chosen decision rule, the same value of residual error is achieved for
crossover probability values as the code rate is smaller.
(2) In all considered cases R I(p), and it is possible to construct a code having
substantially greater code rate, which will provide the same (and even the
better) transmission quality that the n-repetition code, for the same crossover
probability.
(3) For a very great number of repetitions, the code performances approach to the
limits given by the Second Shannon theorem (as R/I(p) ! 1) but then R and
I(p) approach to zero.
Solution
(a) The three-bits information words are represented (according to Table 5.1) by
five bits code words yielding the code rate R = 3/5.
By definition, a linear block code is a vector subspace of a vector space. For a
binary code (n, k) vector space consists of all n-bits combination (2n), while a vector
subspace is formed of 2k sequences of the length n, satisfying the following
(1) Vector subspace has a neutral (identity) element for addition
(2) Vector subspace is closed for the modulo-2 addition.
To check the code linearity these conditions should be verified. In the code there
is a code word consisting of all zeros and the first condition is satisfied. By adding
any pair of the code words some code word is obtained (one of the bit combinations
in Table 5.1) and code is a linear one.
From Table 5.1 it can be easily found that the code word corresponding to
information word 110 can be obtained by adding the code words corresponding to
information words 100 and 010. Now, the general rule can be found, from which
follows that ith row of a generator matrix should correspond to the code word for
an information word having one at the ith position only, and a generator matrix is
2 3
1 1 0 1 1
G ¼ 41 0 1 1 0 5:
0 1 0 0 1
All code words can be obtained multiplying all information words by a generator
matrix
c¼i G
Vector
subspace
Vector space
00000
(5,3)
11011 00100
11111 10110 10010 11100
01101 11101 11110
01001 10000
00001 10001 10111 11010
00011 01010 10011 11000
00010 00110 01000 10100 11001
00101 01011 01100 10101
00111 01110
01111
(b) Code words are given in Table 5.1 and shown in Fig. 5.6 as the elements of
vector subspace. Linear block code is a subspace and it is closed under
addition modulo-2 (bit-by-bit), and the sum of two code words is a code word
as well
Hamming distance between two code words ci and cj, denoted by dH(ci, cj), is
determined by a Hamming weight of the word ci ⊕ cj (i.e. by the number of ones
in it). It can be written formally as
where w(ci) denotes the Hamming weight of the ith code word. Code words weight
distribution determines completely code distances spectrum. In this code there is
one code word having the weight d = 1, two words with d = 2, two words with
d = 3, one word with d = 4 and one word with d = 5. The corresponding weight
distribution is shown in Fig. 5.7.
The minimum Hamming distance is dmin = 1, relation dmin 2ec + 1 is sat-
isfied only for ec = 0, and under the previous condition the inequality dmin
ec + ed + 1 is satisfied only for ed = 0. It is obvious that this code cannot correct
even not to detect all single errors.
(c) Standard array of any binary linear block code has 2n−k rows and 2k columns,
as given in Table 5.2. Every row is one coset, code words c1, c2, …, c2k are in
the first row (usually starting from the all zeros code word) and the first
elements in every row are coset leaders. The element in the ith column and the
jth row is obtained by adding ci and jth coset leader, denoted by ej. Under
weight distribution
1.5
0.5
0
0 1 2 3 4 5
Hamming weight, d
Problems 171
Table 5.2 General clishé for obtaining the standard array of a linear block code
c1 = (0, 0, …, 0) c2 … ci … ck
2
e2 c2 ⊕ e2 … ci ⊕ e2 … c2k ⊕ e2
… … … … … …
e2n−k c2 ⊕ e2n−k … ci ⊕e2n−k … c2k ⊕ e2n−k
assumption that the code word ci was transmitted, it is obvious that the ele-
ment of the jth row in the ith column corresponds to the received word if the
error vector is equal to the jth coset leader. It holds as well as for every
column and every row (any i, j combination). Syndrome corresponding to this
code word is uniquely determined by jth coset leader
S¼r H T ¼ ðci ej Þ H T ¼ ej HT
because the parity-check matrix does not depend on the received word and it is
known at the receiving end.
The error vectors corresponding to coset leaders are uniquely determined by the
syndrome and can be corrected. In general case, it is obvious that 2n−k different
error patterns can be corrected. To minimize the error probability it is suitable that
the coset leaders are the error patterns which occur the most frequently. It is obvious
that for BSC (which has a relatively small crossover probability) single errors are
more probable than double ones and because of that for coset leaders the combi-
nations with the smallest number of ones should be chosen.
Standard array of the (5, 3) code has 8 columns, 4 rows, as given in Table 5.3.
Code words ci are in bold, while the coset leaders are in italic and underlined. In this
example, every element in ith column differs from ci by at most one bit. E.g. second
element in the third column equals the sum of c3 = (10110) and the second coset
leader e2 = (10000). This code can correct the error patterns (10000), (00010) and
(00001). Of course, the trivial error pattern (00000) can be “corrected” as well. On
the other hand, the error pattern (00100) cannot be corrected and the code cannot
correct all single bit error patterns. This pattern is in fact a code word and this error
cannot be even detected, confirming the above conclusion that for this code
ec = ed = 0 holds. It is recommended to the reader to verify whether the vector
(01000) can be a coset leader as well as whether the code, in this case, could detect
and correct the errors.
172 5 Block Codes
Let the ith column in the standard array is denoted by Di. The decoding by using
a standard array is simple—if word r was received from the column Di, the con-
clusion is that code word ci is transmitted. It is easy to verify that this way of
decoding is based on the maximum likelihood rule, because it is concluded that the
code word was transmitted to which the received one has a minimum Hamming
distance. The following conclusions can be drawn:
(1) It is possible to decode 2n−k various patterns because there is the same number
of different syndromes. For a code having minimum Hamming distance dmin
for coset leaders the error patterns with weights ec (dmin − 1)/2 should be
chosen.
(2) If the received word is different from code words, the error can be detected.
Therefore, there are in total 2n − 2k detectable error patterns.
(d) The code generator matrix is not unique. In fact, it can be obtained if as its
rows any three independent code words (the third one cannot be the sum of the
other two) from the vector subspace shown in Fig. 5.6 are written. It is easy to
verify that this condition is satisfied as well as with the matrix
2 3
1 0 0 1 0
G0 ¼ 4 1 1 1 1 1 5:
0 0 1 0 0
Of course, the transformation of information words into the code words is not
now defined by Table 5.1, but the set of code words will be the same. Therefore,
this generator matrix generates the same code. The following should be noticed as
well—although for the same code more generator matrices can be found, one
generator matrix defines one and only one code.
(e) An equivalent code is, by definition, the code having the same code distances
spectrum, but at least one its code word differs from the words of the origi-
nating code. This code can be formed if two (or more) columns change the
places. The following matrix is obtained from the previous generator matrix
where the third and the fourth columns have changed the places.
2 3
1 0 1 0 0
G00 ¼ 4 1 1 1 1 1 5:
0 0 0 1 0
and it is obvious that the code distance spectrum is not changed comparing to the
previous case.
(f) The simplest way to form a nonlinear block code is to substitute all zeros code
word by some bit combination not belonging to the code. In such a way in the
vector subspace (shown in Fig. 5.6) there will be no more the identity element
for addition, if the combination (00000) is substituted with (00001). It is easy
to verify that, in this case, is not possible to define the generator matrix
providing for the transformation of all zeros information word into non zero
binary combination (the multiplying of any matrix by a vector consisting of all
zeros yields the all zero vector).
Of course, even if a code word “all zeros” exists in the code, it might not be
linear. If any code word is changed in such a way that its sum with some other code
word does not yield a code word, the closure property regarding the addition is not
fulfilled (the sum is a word not belonging to the vector subspace, of course, it
belongs to some other vector subspace). In the second variant shown in Table 5.4
the critical word is (01100). In this case, as well, it is not possible to define a
generator matrix, because this word cannot be obtained as a linear combination of
other three code words.
Problem 5.3 Explain the construction of Hamming code having code word length
n = 2b – 1 (b—any integer greater than 1), and after
(a) Explain in details the construction of Hamming (7, 4) code, as well the
encoding and decoding process for a code word corresponding to information
sequence i = (1110), if during the transmission an error occurred in the third
position of the code word;
(b) Decode the received word, if the same information sequence as above was
transmitted for the cases without errors and when the errors occurred in the
third and the fifth position of the code word, comment the corresponding
syndromes;
(c) Find the generator and the parity-check matrices for this code;
(d) Draw the distance spectrum of the code.
Solution
In this problem a class of codes proposed by Richard Hamming in 1950 [23] will be
described. Suppose that the aim is to construct a Hamming code when n = 2b − 1
bits. If the bit positions are numbered as l = 1, 2, …, n, then the parity-check bits
are in the code word c positions l = 2m (m = 1, 2, …, b), while information bits are
in the other places. It is obvious that the total number of parity-check bits in the
code word is just b = ld(n + 1), and the information word length at the encoder
input should be k = n − ld(n + 1). Some possible (n, k) combinations, according to
this expression, are given in Table 5.5.
Parity-check bit m, which is in the position l = 2m of the code word c, is obtained
by adding modulo-2 (XOR) of all information bits which are at the positions having
1 at mth position in their binary record.
After the superimposing the channel errors, the code word r is at the decoder
input. Here, the syndrome is formed consisting of b = n − k bits. The value of mth
syndrome bit is obtained by adding the bits of the received word which are at the
positions having 1 at mth position in their binary record. If during the transmission
only one of these bits was inverted (error!), the syndrome value, starting from the
bits with the higher ordinal number to the position having the smaller ordinal
number is a binary record of the error position. General block scheme of the
Hamming encoder and decoder are shown in Fig. 5.8.
(a) Hamming code (7, 4) construction can be simply described by a clishé given
in Table 5.6. The code word length is n = 7, the number of parity-check bits is
b = ld(7 + 1) = 3.
Number l, from the first column determines the ordinal bit number in the code
word. The position of the first 1 in the mth (m = 1, 2, 3) position of a binary record
of l, determines the position of mth parity-check bit in code word, this bit is denoted
by zm. The value of mth parity check bit is obtained by summing modulo-2 the
k
zm = ∑ il −⎡ld ( l )⎤ ,
⎢ ⎥
[ bin(l )]b −m =1, ld (l ) ≠ ⎡⎢ld (l ) ⎤⎥
l =1
information bits in the positions where the binary record of ordinal number of l in
position 3 − m has 1. For the obtaining the first parity-check bit the last column in a
binary record is used, and for the third one the first column is used, yielding
z1 ¼ c3 c5 c7 ¼ i1 i2 i4 ;
z2 ¼ c3 c6 c7 ¼ i1 i3 i4 ;
z3 ¼ c5 c6 c7 ¼ i2 i3 i4 :
column of the clishé in Table 5.6. Denoting by el the lth bit in the error vector,
syndrome components can be written as
s1 ¼ r1 r3 r5 r7 ¼ ði1 i2 i4 Þ e1 i1 e3 i2 e5 i3 e7 ¼ e1 e3 e5 e7 ;
s2 ¼ r2 r3 r6 r7 ¼ ði1 i3 i4 Þ e2 i1 e3 i3 e6 i3 e7 ¼ e2 e3 e6 e7 ;
s3 ¼ r4 r5 r6 r7 ¼ ði2 i3 i4 Þ e4 i2 e5 i2 e6 i3 e7 ¼ e4 e5 e6 e7 :
On the basis of the above relations, it is easy to prove that the corresponding
parity-checks (ordered by descending indexes) form the syndrome which deter-
mines the position of the code word bit where the error occurred, recorded in binary
system S ¼ ðs3 s2 s1 Þ. By complementing this bit, the error can be corrected.
When the transmitted sequence is i = (1110), information bits are i1 = 1, i2 = 1,
i3 = 1, i4 = 0 and the corresponding parity-checks are
z1 ¼ i1 i2 i4 ¼ 1 1 0 ¼ 0;
z2 ¼ i1 i3 i4 ¼ 1 1 0 ¼ 0;
z3 ¼ i2 i3 i4 ¼ 1 1 0 ¼ 0;
and at the encoder output is the code word c = (0010110). Due to the transmission
error at the fourth position, the received sequence is r = (0011110), the
parity-check sums have the values
s1 ¼ r1 r3 r5 r7 ¼ 0 1 1 0 ¼ 0;
s2 ¼ r2 r3 r6 r7 ¼ 0 1 1 0 ¼ 0;
s3 ¼ r4 r5 r6 r7 ¼ 1 1 1 0 ¼ 1:
It is obvious that syndrome is S ¼ ð100Þ ¼ 4 and that the error is detected at the
fourth position. The decoder inverts the fourth bit, the error is corrected and a
correctly reconstructed code word is obtained, from which the information bits are
extracted
^c ¼ ½0010110 ¼ c ) ^i ¼ ½1110 ¼ i:
(b) Firstly, it should be noted that the code word structure depends on the code
construction as well as on the input information sequence. If the same code is
applied, to the unchanged information word corresponds the same code word
c = (0010110) because the transformation is 1:1.
If during the transmission the error did not occurred, the received word is r = c,
parity-checks are
Problems 177
s1 ¼ r1 r3 r5 r7 ¼ 0 1 1 0 ¼ 0;
s2 ¼ r2 r3 r6 r7 ¼ 0 1 1 0 ¼ 0;
s3 ¼ r4 r5 r6 r7 ¼ 0 1 1 0 ¼ 0;
s1 ¼ r1 r3 r5 r7 ¼ 0 0 0 0 ¼ 0
s2 ¼ r2 r3 r6 r7 ¼ 0 0 1 0 ¼ 1
s3 ¼ r4 r5 r6 r7 ¼ 0 0 1 0 ¼ 1
and the syndrome S ¼ ð110Þ ¼ 6 indicates that the error occurred at the 6th bit.
After the corresponding inversion, the estimations of the code and information
words are
^c ¼ ð0000000Þ ) ^i ¼ ð0000Þ 6¼ i:
It is obvious that the decoding in this case was unsuccessful because even three
information word bits are wrongly decoded. Here the decoding made situation even
the worse—channel errors were at two information bits, while the decoder itself
introduced an additional error. It does not happen always that the channel (or
decoder) introduces the errors on the information bits, but such situation is not
desirable.
As a Hamming code syndrome has n – k = b bits, it is easy to calculate that the
total number of possible syndrome values is 2n−k = n + 1. From it, n syndrome
values show the single error positions, and the last one (all zeros) corresponds to the
transmission without errors. This relation shows that here the Hamming bound
ec
X n
qnk ðq 1Þt
t¼0
t
is satisfied with equality for ec = 1 (for a binary code the code word consists only of
zeroes and ones, i.e. q = 2), the code is perfect. Therefore, this code can correct all
single errors, but not any combination of two errors in the code word. Besides the
Hamming code, it is known that the Golay code (23, 12) is as well a perfect one, it
can correct all single, double or triple errors [24].
Now it is interesting to consider how often the single and double errors will
occur. Generally, it depends on the code word length, type of the channel errors and
the average value of error probability. For BSC the probability that from n trans-
mitted bits, the errors occurred in t positions is given by a binomial distribution
178 5 Block Codes
n t
Pe;t ¼ p ð1 pÞnt ; t n:
t
For illustration, let suppose that the channel error probability is p = 10−3. In this
case the probabilities of single or double channel errors in the code word of length
n = 7 are
7 7!
Pe;1 ¼ pð1 pÞ6 103 ¼ 7 103 ;
1 1!6!
7 7!
Pe;2 ¼ p2 ð1 pÞ5 ð103 Þ2 ¼ 2:1 105
2 2!5!
2 3
g11 g12 ... g17
6 g21 g22 ... g27 7
½z1 z2 i1 z3 i2 i3 i4 ¼ ½i1 i2 i3 i4 6 7:
4 g31 g32 ... g37 5
g41 g42 ... g47
The previous relation, combined with the parity-check bits defining expressions,
yields the Hamming code (7, 4) generator matrix
2 3
1 1 1 0 0 0 0
z1 ¼ i 1 i 2 i 4 ; 61
6 0 0 1 1 0 077:
z2 ¼ i 1 i 3 i 4 ; ) G¼4
0 1 0 1 0 1 05
z3 ¼ i 2 i 3 i 4 :
1 1 0 1 0 0 1
On the other hand, the Hamming code parity-check matrix can be found gen-
erally from the relation
2 3
h11 h21 h31
6 h12 h22 h32 7
S¼r HT ) ½s1 s2 s3 ¼ ½r1 r2 r3 r4 r5 r6 r7 6 7;
4 ... ... ... 5
h17 h27 h37
Problems 179
and in this case, the parity-check matrix can be found practically without
difficulties
2 3
1 0 0
60 1 07
6 7 2 3
s1 ¼ r1 r3 r5 r7 61 1 07 1 0 1 0 1 0 1
6 7
s2 ¼ r2 r3 r6 r7 s ) HT ¼6
60 0 177 ) H ¼ 40 1 1 0 0 1 1 5:
3 ¼ r4 r5 r6 r7
61 0 17 0 0 0 1 1 1 1
6 7
40 1 15
1 1 1
(d) It is obvious that this code has 2k = 16 code words. They can be obtained
easily if all four-bit information words are multiplied by the generator matrix,
resulting in the following combinations
c1 = (0000000), c2 = (1101001), c3 = (0101010), c4 = (1000011),
c5 = (1001100), c6 = (0100101), c7 = (1100110), c8 = (0001111),
c9 = (1110000), c10 = (0011001), c11 = (1011010), c12 = (0110011),
c13 = (0111100), c14 = (1010101), c15 = (0010110), c16 = (1111111).
Let two code words from this set are chosen at random, e.g. c4 and c11. Their
sum is the code word as well, because
As in a vector subspace (and the linear code by definition is it) the closure for
addition must be satisfied, the sum of any two code words must be a code word as
well. As a consequence is that the Hamming distance between the addends will be
equal to the Hamming weight of their sum (i.e. to the number of ones in the
obtained code word). Because of that, every code word is a sum of some other two
code words, and its weight is the Hamming distance between some other code
words and there is no any two words whose Hamming distance does not correspond
to the Hamming weight of some third code word.
As a further consequence, it is sufficient to count ones in the code words,
because it completely determines the spectrum of all possible distances between the
code words. In the above code there is one word consisting of all zeros, seven
words having three ones, seven words having four ones and one word having seven
ones. The corresponding weight spectrum is shown in Fig. 5.9.
Minimum Hamming distance in this case is dmin = 3, the number of correctable
bits (errors) is given by dmin 2ec + 1 and ec = 1, while the number of errors that
can be detected obtained from dmin ec + ed + 1 is ed = 1. This code obviously
can detect only one error, and to correct it. If ec = 0, then ed = 2. Of course, it does
not mean that all combinations of three errors (in the code word) result in an
180 5 Block Codes
0
0 1 2 3 4 5 6 7
Hemming weight, d
undetectable error—it is the case only for combinations having the same pattern as
the code words.
Problem 5.4 Explain the construction of Hamming code having code word length
n = 2b, b—any integer greater than 1, and analyze the Hamming code (8, 4) in
details.
(a) Illustrate the procedures of encoding and decoding for the information word
i = (0011) if the error occurred at the third position of the code word.
(b) Explain in details the decoding of sequences (00000101) and (00101100).
(c) Find the minimum Hamming distance of the code and verify the number of
errors the code can correct and detect.
(d) Analyze the code correction capabilities for p = 10−3.
(e) Find the generator and the parity-check matrix for systematic (8, 4) Hamming
code.
Solution
Hamming code having the code word length n = 2b bits, is constructed beginning
from the previously formed code (2b − 1, 2b – b − 1). Then, at the last position one
parity-check bit is added, while after the reception one additional syndrome bit is
calculated
2X
b
1 X
2b
zb þ 1 ¼ cl ; sb þ 1 ¼ rl :
l¼1 l¼1
In such a way the code which has parameters (2b, 2b − b) is obtained, and the
additional parity-check bit just makes possible the detection of one more error. The
obtained code can correct one and detect two errors in the code word.
Problems 181
The construction of (8, 4) Hamming code differs from (7, 4) code construction
only in adding one bit obtained by simple parity-check, i.e. the fourth parity-check
bit is calculated as follows
X
7
z4 ¼ ci ¼ z 1 z 2 i 1 z 3 i 2 i 3 i 4 :
i¼1
If the number of ones in code word of the basic code (7, 4) is odd, then z4 = 1,
and if it is even, then z4 = 0. In any case, after the inclusion of this bit, the sum
(modulo-2) of all bits in the code word must be equal to zero, and the additional bit
of the syndrome can be written as
X
8 X
8 X
8
s4 ¼ ri ¼ ðci ei Þ¼ ei :
i¼1 i¼1 i¼1
s1 ¼ r1 r3 r5 r7 ¼ 0 0 1 0 ¼ 1;
s2 ¼ r2 r3 r6 r7 ¼ 0 1 1 0 ¼ 1;
s3 ¼ r4 r5 r6 r7 ¼ 1 1 1 0 ¼ 0;
X
8
s4 ¼ ri ¼0 0 1 1 1 1 0 1 ¼ 1:
i¼1
^c ¼ ð00101101Þ ) ^i ¼ ð1110Þ:
One can conclude that the error control coding was successful, i.e. the user
obtained the information bits without errors in spite of the fact that one error
occurred during transmission
(b) If the received word is r = (00000101), the parity-check bits are:
s1 ¼ r1 r3 r5 r7 ¼ 0 1 0 0 ¼ 0
s2 ¼ r2 r3 r6 r7 ¼ 0 1 1 0 ¼ 1
s3 ¼ r4 r5 r6 r7 ¼ 1 0 1 0 ¼ 1
X
8
s4 ¼ ri ¼0 0 1 1 0 1 0 1 ¼ 0
i¼1
From s4 ¼ 0 follows that an even number of errors occurred during the trans-
mission corresponds to the case when the errors cannot be corrected. Now the
syndrome is S ¼ ðs3 s2 s1 Þ ¼ ð110Þ ¼ 6, but it is not the error position.
It could be noticed that this case corresponds to the (b) from the previous
problem, but here the parity bit z4 ¼ 1 is added. The errors there were at the same
positions (third and fifth bit), but the decoder could not correct the errors (even
introduced additional error at the sixth position). In this case the decoder only
detects the occurrence of an even number of errors and does not try to correct them,
but the retransmission is asked for. In such a way the possibility that the decoder
introduces new errors is avoided, and if the channel state is better during the
retransmission (if not more than one error in the code word occurred) the infor-
mation word can be successfully transmitted.
For r = (00101100), the parity-check bits are
s1 ¼ r1 r3 r5 r7 ¼ 0 1 0 0 ¼ 0;
s2 ¼ r2 r3 r6 r7 ¼ 0 1 1 0 ¼ 0;
s3 ¼ r4 r5 r6 r7 ¼ 1 0 1 0 ¼ 0;
X
8
s4 ¼ ri ¼0 0 1 1 0 1 0 1 ¼ 1:
i¼1
8 7 8 2
Pe;1 ¼ pð1 pÞ 8 10 ; Pe;2 ¼
3
p ð1 pÞ6 2:8 105 ;
1 2
8 3
Pe;3 ¼ p ð1 pÞ5 5:6 108 ;
3
The drawback of this code is that it is incapable to correct double errors and, if
the decoder is designed to correct one error per codeword, it is not capable to detect
triple errors. In such a case, the probability for code word wrong detection for
p = 10−3 is mainly determined by the probability of triple error
8 3
Ped Pe;3 ¼ p ð1 pÞ5 5:6 108 :
3
In the case when the decoder is designed not to correct any error, it is capable to
detect up to three errors in any positions but it is not capable to detect some patterns
with larger weights that corresponds to the code words (analysis is related with part
c) of this problem and similar to Problem 5.2). For this code, there are 14 critical
eight-bit critical error patterns having four ones and one error pattern with weight
eight, for which the syndrome equals zero! In this case, the probability for code
word wrong detection is
z4 ¼ z1 z2 i1 z3 i2 i3 i4
¼ ði1 i2 i4 Þ ði1 i3 i4 Þ i1 ði2 i3 i4 Þ i2 i3 i4
¼ i1 i2 i3
and the generator matrix of a Hamming code (7, 4) from the previous problem
should be slightly modified
2 3
1 1 1 0 0 0 0 1
61 0 0 1 1 0 0 17
G¼6
40
7:
1 0 1 0 1 0 15
1 1 0 1 0 0 1 0
The corresponding parity-check matrix can be easily found by noting that to the
submatrix P correspond the last four columns of the generator matrix, yielding as a
result
2 3
1 1 0 1j 1 0 0 0
T 61 0 1 1j 0 1 0 07
H s ¼ P ; I3 ¼ 6
40
7:
1 1 1j 0 0 1 05
1 1 1 0j 0 0 0 1
Problems 185
Problem 5.5
(a) Perform the encoding of sequence (10101010101) by (15, 11) code and
decode the corresponding code word if the error occurred at the ninth position.
How many errors can be corrected and how many detected by this code?
(b) How to modify the code to make possible the detection of two errors? For the
previous information word, analyze the case when the errors occurred at the
positions 6 and 16.
(c) Decode the sequences (111111000000) and (000000010001) if it is known
that the code used can correct one error and detect one error.
(d) Decode the sequences (001001000001) and (101111000011) if it is known
that Hamming (12, 7) code is used.
Solution
(a) The code word length is n = 15, the first higher power of two is 24 = 16 and
the code will be constructed using four columns from the cliché and there are
b = 4 parity-check bits. Because of n – k = b, the code have the possibility to
detect and to correct one error. The cliché for forming the code, encoding and
decoding procedures is shown below.
l bin(l) cl -encoding -decoding
1 0001 z1 i ¼ ð10101010101Þ r ¼ ð101101000010101Þ
2 0010 z2 z1 ¼ c3 c5 c7 c9 c11 c13 c15 s1 ¼ r1 r3 r5 r7 r9
3 0011 i1 ¼ i1 i2 i4 i5 i7 i9 i11 ¼ 1 r11 r13 r15 ¼ 1
4 0100 z3 z2 ¼ c3 c6 c7 c10 c11 c14 c15 s2 ¼ r2 r3 r6 r7 r10
5 0101 i2 ¼ i1 i3 i4 i6 i7 i10 i11 ¼ 0 r11 r14 r15 ¼ 0
6 0110 i3 z3 ¼ c5 c6 c7 c12 c13 c14 c15 s3 ¼ r4 r5 r6 r7 r12
7 0111 i4 ¼ i2 i3 i4 i8 i9 i10 i11 ¼ 1 r13 r14 r15 ¼ 0
8 1000 z4 z4 ¼ c9 c10 c11 c12 c13 c14 c15 s4 ¼ r8 r9 r10 r11 r12
9 1001 i5 ¼ i5 i6 i7 i8 i9 i10 i11 ¼ 0 r13 r14 r15 ¼ 1
10 1010 i6 c ¼ ðz1 z2 i1 z3 i2 i3 i4 z4 i5 i6 i7 i8 i9 i10 i11 Þ S ¼ ðs4 s3 s2 s1 Þ ¼ ð1001Þ ¼ 9
11 1011 i7 ¼ ð101101001010101Þ ^c ¼ ð101101001010101Þ
12 1100 i8 ^i ¼ ð10101010101Þ
13 1101 i9
14 1110 i10
15 1111 i11
Therefore, the error occurred at the position 9 was detected and corrected, and
the user received the correct information sequence.
(b) To made the detection of two errors possible one parity-check bit should be
added on the position 16. The extended Hamming code has the parameters
(n, k) = (16, 12) and the added parity bit is
186 5 Block Codes
X
15
z4 ¼ ci ¼ 0;
i¼1
X
16
S ¼ ðs4 s3 s2 s1 Þ ¼ ð0110Þ ¼ 6; s5 ¼ ri ¼ 0:
i¼1
In this case syndrome shows the position of one of the errors, but because of
s5 = 0 the receiver detects the double error and does not start the error correction,
but asks for retransmission. If the decoder “knew” that one error occurred at the
parity-check bit, it could correct both errors.
According to the solution from the previous problem the syndrome can be as
well written as
and the combination S ¼ ð0110Þ, s5 = 0 appears as well in the case when the errors
occurred at the positions 7 and 14. Therefore, the various error combinations can
result in the same syndrome value and the errors positions cannot be found exactly.
The same can be clearly seen as well from the Hamming bound in this case
ec
X
5 16 16
2 ¼ 1þ ¼ 17;
t¼0
t 1
l bin(l) cl r ¼ ð001001000001Þ
1 0001 z1 s1 ¼ r1 r3 r5 r7 r9 r11 ¼ 1
2 0010 z2 s2 ¼ r2 r3 r6 r7 r10 r11 ¼ 0
3 0011 i1 s3 ¼ r4 r5 r6 r7 ¼ 1
4 0100 z3 s4 ¼ r8 r9 r10 r11 ¼ 0
5 0101 i2 S ¼ ðs4 s3 s2 s1 Þ ¼ ð0101Þ ¼ 5
6 0110 i3 s5 ¼ 1
7 0111 i4 ^c ¼ ð111111100000Þ
8 1000 z4 ^i ¼ ð1110000Þ
9 1001 i5
10 1010 i6
11 1011 i7
188 5 Block Codes
Differently from the previous case, the decoding of the word (101111000011)
yields the following syndrome values
s1 ¼ r1 r3 r5 r7 r9 r11 ¼ 0
s2 ¼ r2 r3 r6 r7 r10 r11 ¼ 1
s3 ¼ r4 r5 r6 r7 ¼ 1
s4 ¼ r8 r9 r10 r11 ¼ 1
S ¼ ðs4 s3 s2 s1 Þ ¼ ð1110Þ ¼ 14
s5 ¼ 1
It is obvious that an even number of errors did not occur, but it is unusual that
the syndrome gives the position 14, not existing in the code. The reader should
found which error can result in such syndrome value.
Problem 5.6 Uplink of one geostationary satellite system is considered. Due to the
frequency limitations maximum signaling rate is vb = 200 [kb/s]. The bad weather
conditions in some intervals (not longer than 0.01 [ms]) cause the errors in packets.
These intervals have a period T = 0.14 [ms]. Due to the long propagation delay, the
satellite link is not suitable for ARQ procedure.
(a) Propose the solution enabling transmission without errors under the circum-
stances. Find the code rate of the proposed error control code.
(b) If the demodulator output sequence is:
011010010001001111110100011111001010011110101
comment the syndrome values, decode the information word and determine
the type of channel errors.
(c) If from the terrestrial station a sequence of N = 5 106 symbols from the set
fA; B; C; D; Eg with the corresponding probabilities {0.45; 0.35; 0.1; 0.07;
0.03} is emitted, find a minimum theoretical time for their emitting, if the error
control code from (a) is used. If the time for this sequence transmission should
not be greater than tmax = 80 [s], propose a compression code that satisfies this
condition.
Solution
The satellite is at the geostationary orbit, the distance from the terrestrial station is at
least d = 33600 [km]. The electromagnetic waves propagate as a light (c = 3108
[m/s]), minimum propagation time (delay) on link is s = d/c = 112 [ms]. The
satellite is in fact the relay between two terrestrial stations, and if the errors occur,
for a complete retransmission, the total delay for a packet retransmission is
8s = 896 [ms] (the initial transmission earth-satellite-earth, the return path for a
negative acknowledgment for a packet reception, once more the complete path for a
positive acknowledgement after the packet reception for resetting a transmission
buffer). Taking into account that when using ARQ procedure sometimes more
Problems 189
retransmissions are used, it is obvious that the total delay can attain a few seconds.
Due to this reason, in satellite systems instead of ARQ procedure, the FEC tech-
niques are more suitable, where the codes having a possibility to correct all the
detected errors have the advantage (as a difference from e.g. computer networks).
(a) The signaling rate is vb = 200 [kb/s] and the time for one bit transmission is
Tb = 1/vb = 5 [ls]. During one period (0.14 [ms]) 28 bits are transmitted, and
during the bad interval (0.01 [ms]) two neighboring bits may be inverted. The
interleaving period always equals the number of bits transmitted during one
period (LI = 28) and after the deinterleaving it should provide the maximum
distance between the errors. If the disturbances appear regularly (strictly
periodically and approximately having the same duration for every period), it
is suitable to use matrix block-interleaver where one matrix dimension is
determined by the number of errors in one disturbance period (denoted
by l) and the other by a code word length n, where nl = LI yielding
n = 28/2 = 14.
There are two Hamming codes having this code word length. A code (14, 9) can
detect two errors, but it is not interesting, because there is no the retransmission.
Therefore, code (14, 10) is chosen having the greater code rate as well as the same
capability for error correction as the code (14, 9).
(b) The procedure for obtaining the Hamming code (14, 10) code words is given
by the following cliché, where the calculating the parity-check bits and syn-
drome are shown as well
1 0001 z1
2 0010 z2
3 0011 i1 z1= i1⊕ i2⊕ i4 ⊕ i5⊕ i7 ⊕ i9,
4 0100 z3 z2= i1⊕ i3⊕ i4 ⊕ i6⊕ i7 ⊕ i10
5 0101 i2
6 0110 i3 z3= i2⊕ i3⊕ i4 ⊕ i8⊕ i9 ⊕ i10
7 0111 i4 z4= i5⊕ i6⊕ i7 ⊕ i8⊕ i9 ⊕ i10
8 1000 z4
9 1001 i5 s1= r1⊕ r3⊕ r5 ⊕ r7⊕ r9 ⊕ r11⊕ r13
10 1010 i6
11 1011 i7 s2= r2⊕ r3⊕ r6 ⊕ r7⊕ r10 ⊕ r11⊕ r14
12 1100 i8 s3= r4⊕ r5⊕ r6 ⊕ r7⊕ r12 ⊕ r13⊕ r14
13 1101 i9 s4= r8⊕ r9⊕ r10 ⊕ r11⊕ r12 ⊕ r13⊕ r14
14 1110 i10
15 1111
Fig. 5.10 The illustration of the working principle of matrix interleaver and deinterleaver
first row in the deinterleaver matrix corresponds to the first interleaver column and
the errors are not more concentrated into a packet, but they are at maximum
distances. If not more than l successive errors occurred and the period having the
length LI = nl, at the decoder input there would not be any received word having
more than one error.
For a given sequence at the receiver input, the forming of code words by
interleaver is as follows.
y ¼ ð0011101011101111100100010001Þ
#
0 1 1 1 1 1 1 1 0 0 0 0 0 0 ! r ðIÞ
0 1 0 0 1 0 1 1 1 1 0 1 0 1 ! r ðIIÞ
1 2 3 4 5 6 7 8 9 10 11 12 13 14
s1 ¼ r1 r3 r5 r7 r9 r11 r13 ¼ 0 1 1 1 0 0 0 ¼ 1
s2 ¼ r2 r3 r6 r7 r10 r11 r14 ¼ 1 1 1 1 0 0 0 ¼ 0
s3 ¼ r4 r5 r6 r7 r12 r13 r14 ¼ 1 1 1 1 0 0 0 ¼ 0
s4 ¼ r8 r9 r10 r11 r12 r13 r14 ¼ 1 0 0 0 0 0 0 ¼ 1
and because the error is at the ninth position (1001), the nearest code word and
decoded sequences are
cˆ ( I ) = (01111111100000) ⇒ iˆ( I ) = (1111100000) .
The second word decoding is as follows
Problems 191
s1 ¼ r1 r3 r5 r7 r9 r11 r13 ¼ 0 0 1 1 1 0 0 ¼ 1
s2 ¼ r2 r3 r6 r7 r10 r11 r14 ¼ 1 0 0 1 1 0 1 ¼ 0
s3 ¼ r4 r5 r6 r7 r12 r13 r14 ¼ 0 1 0 1 1 0 1 ¼ 0
s4 ¼ r8 r9 r10 r11 r12 r13 r14 ¼ 1 1 1 0 1 0 1 ¼ 1
In both cases, the errors were in the same position in the code words (corre-
sponding to the neighboring positions in the interleaver entering sequence)—it is
packet error having the length l = 2.
(c) The complete block-scheme of the system for data transmission is shown in
Fig. 5.11. The symbol probabilities determine the average information per
symbol emitted by a source, i.e. the entropy
X
q
HðSÞ ¼ Pðsi ÞldðPðsi ÞÞ ¼ 1:801 ½Sh/symb:
i¼1
The average code word length cannot be smaller than the entropy. Therefore,
Ns = 5 106 symbols after the compression cannot be represented by less than
Nb = N H(S). After adding the parity-check bits in Hamming encoder (14, 10)
total number of bits emitted through the channel is
vs = ? vb1 = vs L vb 2 = vs L / R
H(s)=1.801Sh/s Compression Errror control
Source encoder, encoder
N s = 5 × 106 Huffman Hamming (14,10)
A, B, C , D, E Matrix block
interleaver, 2×14
Channel
Matrix block
deinterleaver, 2×14
vs = 1000 sb / s Compression Errror control
User decoder, decoder
Huffman Hamming (14,10)
and because the transmitting rate is vb = 200 [kb/s], the minimum time for this
sequence transmission is
X
q
L¼ Pðsi Þli ¼ 0:45 1 þ 0:35 2 þ 0:1 3 þ 0:07 4 þ 0:03 4
i¼1
¼ 1:85 [bit/symb];
HðSÞ
g¼ ¼ 97:35%:
L
NS L
tHuff ¼ ¼ 64:75 [s];
Rvb
and because it is shorter than that in the problem text, the given condition is
satisfied. An additional shortening of the time needed for transmission could be
achieved if with the Huffman coding the source extension is combined. However,
this time in any case, cannot be shorter than tmin.
Problem 5.7 Explain the product code construction using a cliché 2 3 and write
the code word structure, i.e. the positions of information and parity-check bits. Find
Problems 193
the code rate, write the code word corresponding to information sequence (000111)
and explain the decoding procedure if during the transmission the error occurred at
the third position.
Solution
In this case the information bits are written in the matrix having the dimensions
2 3 and then one parity-check bit is calculated for every row and for every
column. The code word of the product code is formed by reading the bit sequence
starting from the first row of the cliché as follows
i1 i2 i3 jz1
i4 i5 i6 jz2 ) c ¼ ði1 ; i2 ; i3 ; z1 ; i4 ; i5 ; i6 ; z2 ; z3 ; z4 ; z5 ; z6 Þ;
z3 z4 z5 jz6
r1 r2 r3 jr4
r5 r6 r7 jr8
r9 r10 r11 jr12
where the parity-checks are calculated for the rows and for columns. If there is no
more than one error, its position can be easily found and the error can be corrected.
If the number of information bits in every row is k1, and in every column k2, the
code rate is
k1 k2
R¼ ;
ðk1 þ 1Þðk2 þ 1Þ
in this case R = 6/12 = 1/2. The encoder from sequence (000111) generates the
code word
0 0 0 j0
1 1 1 j1 ) c ¼ ð000011111111Þ:
1 1 1 j1
After the channel error occurred at the third position, at the receiving end the
cliché is formed
0 0 1 0 1
1 1 1 1 0
r = (001011111111) ⇒
1 1 1 1 0
0 0 1 0
194 5 Block Codes
and the error is at the intersection of the first row and the third column, i.e. on the
third position in the code word. The information sequence is correctly decoded. The
reader should find the generator matrix of the code, to calculate the minimum
Hamming distance and to check the number of errors correctable and detectable by
the code.
Problem 5.8 Consider a linear block code described by the generator matrix
2 3
1 1 1 0 0 0
G ¼ 41 0 0 1 1 0 5:
0 1 0 1 0 1
(a) Find generator and parity-check matrix of the equivalent systematic code
(b) Find the weight spectrum of the code. Comment the code correcting and
detecting capabilities.
(c) Find syndromes corresponding to correctable error patterns of the systematic
code. Is this code perfect? Is it a MDS code?
(d) For a binary symmetric channel calculate the probability of the unsuccessful
error detection. Find the probability that the code does not correct the error.
(e) If the code words (bits) are represented by polar pulses (amplitudes +1 and –1)
and are transmitted over the channel with additive white Gaussian noise
(AWGN), explain the optimum decision rule for decoding. Illustrate the rule
for the case when the received word is (0.1; 0.35; −0.2; −1; 1; 0.15). How the
error probability can be estimated for such procedure?
Solution
It is a linear block code (6, 3), code rate R = 1/2.
(a) A systematic code is simply obtained by permuting the columns to obtain as
the first (or last) three columns a unity submatrix I3
2 3
1 0 0 1 1 0
Gs ¼ 4 0 1 0 0 1 1 5 ¼ ½Ik jP;
0 0 1 1 0 1
2 3
1 1 0
2 3 60 1 17 2 3
1 0 0 1 1 0 6 7 0 0 0
61 0 17
Gs H Ts ¼ 4 0 1 0 0 1 15 6
61
7 ¼ 40
7 0 0 5:
6 0 07
0 0 1 1 0 1 40 0 0 0
1 05
0 0 1
(b) The list of all codewords of the systematic code is given in Table 5.8, and the
corresponding code weight spectrum is shown in Fig. 5.12. It is obvious that a
nonsystematic code has the same spectrum, although the code words are
different (equivalent codes are considered). The minimum Hamming distance
is dmin = 3, relation dmin 2ec + 1 is satisfied for ec 1 and the code
corrects one error in the code word. When ec = 1 the relation dmin ec +
ed + 1 is satisfied for ed 1 (one error can be detected and corrected). If the
decision rule is used where ec = 0, two errors in the code word can be
detected.
4
of code (6, 3)
3.5
2.5
1.5
0.5
0
0 1 2 3 4 5 6
Hamming weight, d
196 5 Block Codes
(c) Standard array of the code has 2n−k = 8 rows and seven error patterns can be
corrected [besides the trivial one (000000)]. On the basis of relation
S ¼ ej H Ts
it is easy to verify that to every error vector containing only single one (i.e. one
error) corresponds a unique syndrome, as shown in Table 5.9. It is a direct con-
sequence of the fact that parity-check matrix Hs has not two identical columns. It is
obvious that every linear code having this feature, and where n < 2n−k holds, can
correct the single errors. Unused syndrome value (111) clearly corresponds to
double error, and the sum of the corresponding columns of parity-check matrix
yields just this value. One of such error vectors is (100001), but it is not a unique
one—the same condition is fulfilled as well for the patterns (010100) and (001010).
The Hamming bound generally is
Xec
n
q nk
ðq 1Þt ;
t¼0
t
where q denotes the code basis (the number of symbols to construct a code words).
Because this code is binary (q = 2, the code word consists of zeros and ones) and
one error at code word can be corrected, the previous relation reduces to
nk n n
2 þ ¼ 1 þ n:
0 1
dmin n k þ 1;
and, as in this case dmin = n−k = 3, the bound is not satisfied with equality.
Because of that the code is not Maximum Distance Separable (MDS), meaning that
for the same number of parity bits, it may be possible to construct code having the
greater minimum Hamming distance (dmin = 4).
(d) If only the code possibility to detect the errors is considered, it is sufficient to
verify does the received word belong to the set of possible code words. If it
does not belong to this set, it is obvious that the errors occurred during
transmission. However, it is possible to emit one code word and to receive the
other—in this case the error will be undetected.
When the transmission system can be modeled as a binary symmetric channel,
the probability of occurrence of exactly d errors at n positions is
Pðn; d Þ ¼ pd ð1pÞnd ;
and the probability of error undetectability equals the probability that error vector
is the same as the code word. It can be found on the weight spectrum basis [25, 26]
X
n
Pe;d ðpÞ ¼ aðdÞpd ð1 pÞnd ;
d¼dmin
where a(d) denotes the number of code words having Hamming weight d.
On the other hand, the probability of the uncorrectable error equals the prob-
ability that the error vector differs from the coset leaders in the standard array. If the
maximum weight of error patterns which correspond to coset leaders is denoted by
l, and the number of patterns having weight i l by L(i), this probability is
X
l
Pe;c ðpÞ ¼ 1 LðiÞpi ð1 pÞni ;
i¼0
are always valid, and when the weight spectrum is not known, the coefficients a
(d) and L(i) can be substituted by a binary coefficients yielding the upper bound of
198 5 Block Codes
0
10
missed detection
missed correction
−2
10
Residual bit errir rate
−4
10
−6
10
−8
10
−10
10
−4 −3 −2 −1
10 10 10 10
Crossover probability in BSC, p
Fig. 5.13 The probability that the error is not detected/corrected, BSC
Y
n Y
n
1 ðyi xi Þ2
pðyjxÞ ¼ pðyi jxi Þ ¼ pffiffiffiffiffiffi e 2r2 ;
i¼1 i¼1 2pr
where it is supposed that the adjacent noise samples are statistically independent.
The previous expression can be minimized if the Maximum Likelihood Decoding
(ML) is applied, which reduces in this case to choice of the sequence having a
minimum squared Euclid distance defined by Morelos-Zaragoza [26]
Problems 199
Fig. 5.14 The illustration of two decision rules for linear block codes
X
n
D2 ðx; yÞ ¼ ðyi xi Þ2 :
i¼1
Xn rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
aðdÞ Eb
Pe;c;meko ðEb =N0 Þ erfc dR ;
d¼d
2 N0
min
where R denotes a code rate and Eb/N0 is energy per bit divided by the noise power
density spectrum. The crossover probability for the equivalent BSC is
rffiffiffiffiffiffiffiffiffi
1 Eb
pðEb =N0 Þ ¼ erfc R ;
2 N0
and the probability of residual error versus Eb/N0 for hard decision can be written as
X
l rffiffiffiffiffiffiffiffiffii rffiffiffiffiffiffiffiffiffini
1 Eb 1 Eb
Pe;c;tvrdo ðEb =N0 Þ ¼ 1 LðiÞ erfc R 1 erfc R :
i¼0
2 N0 2 N0
The coding gain (G) is usually defined as a saving in Eb/N0 ratio in comparison
to the case when the error control code was not applied, for some fixed error
probability
0
10
Residual bit error rate after decoding, Pe,c
−5
10
−10
10
uncoded
code (6,3), hard decoding
code (6,3), soft decoding
−15
10
0 5 10 15
Eb/No [dB]
For the considered code (6, 3) the corresponding numerical results are shown in
Fig. 5.15.
From the figure the following coding gains can be found:
• for the crossover probability 10−4; • for the crossover probability 10−10;
– Ghard(10−5) = –0.3 dB, – Ghard(10−15) = 0.1 dB,
– Gsoft(10−5) = 2.6 dB, – Gsoft(10−15) = 3 dB,
(a) Find the generator matrix of an equivalent systematic code. How the corre-
sponding dual code can be described?
(b) Find the weight spectrum of the code defined by the generator matrix G.
202 5 Block Codes
(c) Find the probability that this code does not detect the transmission error. Find
the probability that dual code does not detect the transmission error
Solution
(a) From the theory of linear block codes it is known that the weight spectrum
does not change (the code is equivalent) if the following operations with the
generator matrix are performed:
(1) Permutation of any two columns
(2) Adding of one row multiple to the other row
(3) Multiplication of one row or column with a nonzero element
Generator matrix of the corresponding systematic code has the form Gs ¼ ½Ik ; P,
its parameters are n = 12 and k = 10. To achieve that first eight rows of matrix
G form a unity matrix, the binary one from the first row, the fourth column, should
be removed, as well as from 7th and 8th columns in the last row. The first row of
systematic generator matrix is obtained by addition of the first and fourth row of the
matrix G. Finally, by adding 7th, 8th and the last row of G the last row of new
matrix is obtained
2 3
1 0 0 0 0 0 0 0 0 0 0 1
60 1 0 0 0 0 0 0 0 0 1 07
6 7
60 0 1 0 0 0 0 0 0 0 1 17
6 7
60 0 0 1 0 0 0 0 0 0 0 17
6 7
60 0 0 0 1 0 0 0 0 0 1 17
Gs ¼ 6
60
7 ¼ ½I8 ; P:
6 0 0 0 0 1 0 0 0 0 0 077
60 0 0 0 0 0 1 0 0 0 1 17
6 7
60 0 0 0 0 0 0 1 0 0 0 17
6 7
40 0 0 0 0 0 0 0 1 0 1 05
0 0 0 0 0 0 0 0 0 1 1 0
Code words of this code (12, 2) are all linear combinations of the rows of the
matrix, i.e. c1 = (000000000000), c2 = (011010101110), c3 = (101110110001) and
c4 = (110100011111). It is obvious that the minimum Hamming distance of dual
code is dmin = 7.
Problems 203
(b) Weight spectrum of the code defined by G can be found directly, by a com-
puter search. In this case all information words (28 = 256) should be found,
everyone should be separately multiplied by generator matrix and find the
Hamming weights of the obtained code words. The weight spectrum of the
code can be written as a polynomial
Að xÞ ¼ a0 þ a1 x1 þ þ an xn ;
where the number of words having weight d is denoted by ad ð d nÞ. The codes
defined by generator matrix G and its systematic version are equivalent, i.e. they
have the same weight spectrum. Because of this, the previous relation determines
the systematic code weight spectrum as well. But, for large values of n and k, the
finding of weight spectrum by computer search is very time consuming.
However, the procedure can be accelerated having in view that for the case
k > (n − k) it is simpler to find the weight spectrum of a dual code. If with B(x) the
spectrum polynomial of dual code is denoted, where the coefficients Bd determine
the spectrum, than the MacWilliams identities can be formulated giving the cor-
respondence between the dual codes spectra written as polynomials [27]
1x 1x
AðxÞ ¼ 2ðnkÞ ð1 þ xÞn B ; BðxÞ ¼ 2k ð1 þ xÞn A :
1þx 1þx
In this case the spectrum of dual code (for which Gd,s = Hs) is described by a
polynomial
Bð xÞ ¼ 1 þ 2x7 þ x8 ;
Spectra of a dual code (12, 2) and the original code (12, 10) are shown in
Fig. 5.16a and b, respectively. The dual code has a possibility to correct all single,
204 5 Block Codes
(a) (b)
d
Number of code words with given weight, ad
200
1.5
150
1
100
0.5 50
0 0
0 2 4 6 8 10 12 0 2 4 6 8 10 12
Fig. 5.16 Spectrum of dual code (12, 2) (a) and the original code (12, 10) (b) defined by G
double and triple errors in the code word having the length n = 12 bits, while the
code defined by G cannot even detect all single errors.
(c) As it was explained in the previous problems, the probability that this linear
code does not detect the error is
X
n X
n
p d
Pe;d ðpÞ ¼ ad pd ð1 pÞnd ¼ ð1 pÞn ad ð Þ ;
d¼dmin d¼dmin
1p
and because a(0) = 1 and a(d) = 0 for 1 d dmin always holds, the previous
relation becomes
" #
n
X
n
p d n p
Pe;d ðpÞ ¼ ð1 pÞ ad ð Þ 1 ¼ ð1 pÞ Að Þ1 :
d¼0
1p 1p
and finally
X
n
Pe;d ðpÞ ¼ 2ðnkÞ bd ð1 2pÞd ð1 pÞn :
d¼0
On the other hand, the probability that the dual code (12, 2) does not detect the
transmission error is given by
Problems 205
0
10
−5
10
original code, G
dual code, G
dual
−10
10
−15
10
−3 −2 −1
10 10 10
Crossover probability in BSC, p
Fig. 5.17 The probability that the codes do not detect the code word error versus the crossover
probability
X
n
Pe;d;dual ðpÞ ¼ bd pd ð1 pÞnd :
d¼dmin;dual
The probability that the codes do not detect the error versus the channel error
probability is shown in Fig. 5.17.
Problem 5.10
(a) Explain the construction of the Reed-Muller (RM) codes with parameters (8,
4) and (8, 7). Draw code distances spectra and find their parity-check matrices.
(b) Draw the spectrum of the code obtained by forming 5th, 6th and 7th matrix
G of the code RM(8, 7) if instead of vector multiplication the vector addition
was used. Comment the result.
(c) Explain the majority decoding logic procedure for the code (8, 4) and then
decode the word r = (00000010).
(d) If it is known that in a data transmission system is used Reed-Muller code
which corrects one and detects two errors in the code word, decode the
received word r2 = (0000001000000000).
Solution
It is possible to construct for any natural number m and any natural number r < m a
Reed-Muller code having code word length n = 2m with minimum Hamming
206 5 Block Codes
distance dmin = 2m−r [28, 29]. To Reed-Muller code of tth order, denoted by RM
(2m, k, 2m−r), corresponds code word length
r
X
m
k¼ ; r m:
i¼0
i
For the case r = 1, the generator matrix rows are determined by vectors 1, x1, x2,
x3, where x1 corresponds to a bit of the maximum weight (MSB) and x3 corre-
sponds to a bit of minimum weight (LSB) of the corresponding three-bit combi-
nations resulting in
Problems 207
⎡1 1 1 1 1 1 1 1⎤
⎢0 0 0 0 1 1 1 1⎥⎥ ⎡G(8,4)
(0)
⎤
G(8,4) =⎢ = ⎢ (1) ⎥ .
⎢0 0 1 1 0 0 1 1⎥ ⎢⎣G(8,4) ⎥⎦
⎢ ⎥
⎣0 1 0 1 0 1 0 1⎦
Weight spectrum of the code is shown in Fig. 5.18a. It is obvious that the
minimum code Hamming distance is dmin,1 = 4 and that the code is equivalent to
Hamming code (8, 4) which has the code rate R1 = 4/8 = 0.5. It is easy to verify
that this code is selfdual, because of
⎡1 1 1 1 1 1 1 1⎤
⎢0 0 0 0 1 1 1 1⎥⎥
⎢
1⎥ ⎡G(8,7) ⎤
(1)
⎢0 0 1 1 0 0 1
⎢ ⎥ ⎢ (2) ⎥
G(8,7) = ⎢0 1 0 1 0 1 0 1⎥ = ⎢G(8,7) ⎥ .
⎢0 0 0 0 0 0 1 1⎥ ⎢⎢G(8,7)
(3) ⎥
⎢ ⎥ ⎣ ⎦⎥
⎢0 0 0 0 0 1 0 1⎥
⎢0 0 0 1 0 0 0 1⎥⎦
⎣
H ð8;7Þ ¼ ½ 1 1 1 1 1 1 1 1 :
and this code corresponds to a simple parity check, its spectrum being shown in
Fig. 5.18b.
(a) (b)
Number of code words with given weight, a(d)
Number of codewords with given weight, a(d)
14 70
12 60
10 50
8 40
6 30
4 20
2 10
0 0
0 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 8
Hamming weight, d Hamming weight, d
Fig. 5.18 Weight spectrum of RM(8, 4) (a) and RM(8, 7) (b) code
208 5 Block Codes
S ¼ rH Tð8;4Þ ;
it is clear that the result can be equal to zero or to one, i.e. the syndrome consists of
one bit (scalar). If S = 0 the conclusion can be drawn that during the transmission
there were no errors (or an even number of errors occurred), while if S = 1 the
conclusion can be drawn that one error occurred (or an odd number of errors
occurred). Therefore, single errors can be detected, but not corrected, because their
position cannot be found, which corresponds to the value dmin,2 = 2 used for a code
construction.
(b) If for obtaining the last three rows of the code (8, 7) generator matrix, instead
of multiplying (as for Reed-Muller algorithm), the adding was used, the
generator matrix rows (vectors) would have been 1, x1 ; x2 ; x3 ; x1 ; x2 ; x1
x3 ; x2 ; x3 yielding
2 3
1 1 1 1 1 1 1 1
60 0 0 0 1 1 1 17
6 7
60 0 1 1 0 0 1 17
6 7
Gð8;7Þ ¼6
60 1 0 1 0 1 0 177;
60 0 1 1 1 1 0 07
6 7
40 1 1 0 0 1 1 05
0 1 0 1 1 0 1 0
H ð8;7Þ ¼ ½ 1 1 1 1 1 1 1 1 :
80
60
40
20
0
0 1 2 3 4 5 6 7 8
Hamming weight, d
8 8!
¼ ¼ 70\112:
4 2 4!
Therefore, the transformation done by the encoder is not of 1:1 type, and the set
of words at the encoder output is not even a block code (and, of course, not a linear
block code). This is a reason that in this case (and generally as well) on the basis of
minimum Hamming weight the minimum Hamming distance cannot be determined,
and especially the number of errors detectable and correctable by this code.
(b) From the generator matrix
⎡1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1⎤
⎢0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1⎥⎥
⎢ ⎡ G1 ⎤
G = ⎢0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1⎥ = ⎢ ⎥ .
⎢ ⎥ ⎣G2 ⎦
⎢0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1⎥
⎢⎣0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1⎥⎦
c1 ¼ i1 ; c2 ¼ i1 i4 ; c3 ¼ i1 i3 ; c4 ¼ i1 i3 i4 ; c5
¼ i1 i2 ; c6 ¼ i1 i2 i4 ; c7 ¼ i1 i2 i3 ; c4 ¼ i1 i2 i3 i4 :
ð4Þ
S1 ¼ r1 r2 ¼ c1 c2 e1 e2 ¼ ði1 Þ ði1 i4 Þ e1 e2 ¼ i4 e1 e2
ð4Þ
S2 ¼ r3 r4 ¼ c3 c4 e3 e4 ¼ ði1 i3 Þ ði1 i3 i4 Þ e3 e4 ¼ i4 e3 e4
ð4Þ
S3 ¼ r5 r6 ¼ c5 c6 e5 e6 ¼ ði1 i2 Þ ði1 i2 i4 Þ e5 e6 ¼ i4 e5 e6
ð4Þ
S4 ¼ r7 r8 ¼ c7 c8 e7 e8 ¼ ði1 i2 i3 Þ ði1 i2 i3 i4 Þ e7 e8 ¼ i4 e7 e8
each depending only on the fourth information bit and the error vector bit. Further,
any error vector bit does not influence to values of two sums (either to more sums).
Sums orthogonal to the third information bit are
ð3Þ ð3Þ
S1 ¼ r1 r3 ¼ i4 e1 e3; S2 ¼ r2 r4 ¼ i4 e2 e4 ;
ð3Þ ð3Þ
S3 ¼ r5 r7 ¼ i4 e5 e7; S4 ¼ r6 r8 ¼ i4 e6 e8 ;
ð2Þ ð2Þ
S1 ¼ r1 r5 ¼ i2 e1 e5; S2 ¼ r2 r6 ¼ i2 e2 e6 ;
ð2Þ ð2Þ
S3 ¼ r3 r7 ¼ i2 e3 e7; S4 ¼ r4 r8 ¼ i2 e4 e8 :
On the basis of the calculated orthogonal sums, by majority logic decoding the
corresponding information bit values are estimated
Using the obtained estimations, the code sequence is formed, with the removed
influence of decoded information bits
ð1Þ
r0 ¼ r i02 i03 i04 Gð8;4Þ ;
finally yielding
4
k ¼ 1þ ¼ 5;
1
⎡1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1⎤
⎢0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1⎥⎥
⎢ ⎡ G1 ⎤
G = ⎢0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1⎥ = ⎢ ⎥ .
⎢ ⎥ G
⎢0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1⎥ ⎣ 2 ⎦
⎣⎢0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1⎦⎥
Orthogonal sums for i5 include bits of the received word having the ordinal
number determined by the ordinal numbers of column pairs which should be
summed (element by element) to obtain only one “one” in the fifth row, i.e.
i5 : S1 ¼ r1 r2 ; S2 ¼ r3 r4 ; S3 ¼ r5 r6 ; S4 ¼ r7 r8 ; S5 ¼ r9 r10 ; S6
¼ r11 r12 ; S7 ¼ r13 r14 ; S8 ¼ r15 r16 ;
therefore, the adjacent bits are summed and the decision is based on the majority
logic.
Orthogonal sums corresponding to bits i4, i3 and i2 are
and the received bits being apart for 2, 4 and 8 positions are summed. As for i5, the
value of every transmitted bit is found based on the majority decision, and the
estimations are denoted with i2′, i3′, i4′ i i5′.
212 5 Block Codes
finally yielding
The decoded information word is i = (00000) and the reconstructed code word
(all zeros) differs from the received one in one bit.
Problem 5.11 Explain the construction of the following arithmetic codes
(a) binary Brown code defined by relation c = 19i + 61, for the base b = 8;
(b) binary Varshamov code, code word length n = 4;
(c) ternary Varshamov code, code word length n = 4;
For every problem part find the code words, weight spectrum, minimum
Hamming distance and the number of correctable errors by the code. Explain the
notion of asymmetric error and verify whether the codes (b) and (c) satisfy the
Varshamov-Gilbert bound.
Solution
The arithmetic block codes construction is based on the arithmetic operation
connecting decimal representations of information and a code word. These relations
are usually very simple and an overview of some typical constructions follows.
(a) Construction method proposed by David Brown in 1960 uses the rule
c = Ai + B, where c represents a decimal equivalent of the code word and i—
a decimal equivalent of the information word [30]. Coefficients A and B are
chosen so as that the binary representations of all code words have suitable
features and that a decimal equivalent of information word has to fulfill the
condition n b − 1. The basis b can be any integer greater than two, and in
the case b = 2k binary equivalent of number i can be any combination of k bits,
and the set of code words consists of 2k n-bits combinations.
For b = 8 in this case, decimal equivalents of information words are numbers
0 i 7 and decimal equivalents of code words can be found using the relation
c = 19i + 61, the corresponding set of code words is given in Table 5.12. The code
word length is determined by the number of bits needed to represent the largest
decimal number c (here cmax = 194) yielding here n = 8, the corresponding code
rate is R = k/n = 3/8.
Problems 213
0
0 1 2 3 4 5 6 7 8
Hamming weight, d
It is obvious that the set of code words does not include combination “all zeros”
leading to the conclusion that the code is not a linear one. This is the reason that the
minimum Hamming distance cannot be found from the code words weights,
because the sum of two code words may not be a code word. The distance spectrum
of non linear codes is obtained by comparing all pairs of code words. In this case
there is total of 2k(2k – 1)/2 = 28 code word pairs and the corresponding spectrum
is shown in Fig. 5.20. Minimum Hamming distance is d = 3 and the code can
correct all single errors, by a search to find the code word which differs from the
received word in one bit. It is interesting that these codes besides the single errors
can correct as well some specific types of multiple errors.
214 5 Block Codes
(b) Rom Varsharmov (Poм Bapшaмoв) in 1965 proposed a method for block
code construction where the ordered n-tuples c = (c1, c2, …, cn) are consid-
ered, where every symbol can be an integer from the set {0, 1, …, q − 1} [31].
This ordered n-tuple is a code word only if the following holds
!
X
n
ixi mod ðn þ 1Þ ¼ 0:
i¼1
Therefore, all the words c = (c1, c2, …, cn) satisfying the above condition form
so called integer block code not being necessarily linear in the general case. This
procedure can be modified by summation using some other modulo m > n (e.g.
m = 2n + 1), the remainder being equal to some in advance fixed number from the
set {0, 1, …, m − 1}.
For q = 2, it is a binary code, and in a case n = 4 and m = n+1 = 5 only the
words (0000), (1001), (0110) and (1111) satisfy the relation
" #
X
4
ixi mod 5 ¼ 0;
i¼1
it is easy to verify that the code is linear and that its generator matrix is
1 0 0 1
G¼ ;
0 1 1 0
(a) (b)
4 4
Number of code word pairs with given distance
Number of code words with given weight
3.5 3.5
3 3
2.5 2.5
2 2
1.5 1.5
1 1
0.5 0.5
0 0
0 0.5 1 1.5 2 2.5 3 3.5 4 0 0.5 1 1.5 2 2.5 3 3.5 4
Hamming weight Hamming distance
Fig. 5.21 Weight spectrum (a) and the code distances spectrum of binary Varshamov code (4, 2)
(b)
Problems 215
It is interesting to note that the weight spectrum is not fully equivalent to the
distances spectrum even when the code is linear one. The code distances spectrum
is found by the comparison of all different code words (giving always a(0) = 0!),
and there is a total of 2k(2k − 1)/2 = 6 combinations, shown in Fig. 5.21b. E.g.
there are two code words pairs ((0000) and (1111), (0110) and (1001)) where the
Hamming distance equals 4, but there is one word only having the Hamming
weight d = 4. However, minimum distance and minimum weight are same, and the
spectrum shapes are the same for d 6¼ 0.
Minimum Hamming distance is dmin = 2 and the code cannot correct all single
errors. However, if it is supposed that the error probability for binary zero is much
higher than the error probability for binary one (the corresponding binary channel
has parameters P(1/0) P(0/1) 0, what is the case for some magnetic recording
systems), i.e. the errors are “asymmetric” and an asymmetric error can result in
following transitions
d 1
X n
Aq ðn; dÞ qn = ðq 1Þ j :
j¼0
j
!
X
4
ici mod 9 ¼ 0
i¼1
216 5 Block Codes
20
15
10
0
0 0.5 1 1.5 2 2.5 3 3.5 4
Hamming distance, d
is satisfied by code words (0000), (0111), (0222), (1002), (1120), (1201), (2011),
(2122), (2210) (rk = 9 code words). This code is not linear, although includes
identity element for addition, because e.g.,
and it is not a code word meaning that the closure for addition is not satisfied.
Varshamov-Gilbert bound for q = 3, n = 4 and d = 2 yields
34
Aq ðn; dÞ ¼ 9;
1 þ 4 21
0 1 2 3 4 1
H2 ¼ :
1 1 1 1 1 0
Solution
(a) For integer codes information and code words symbols take the values from
the set Zr = {0, 1, 2, …, q − 1} while all operation during the code words
forming are modulo-q.
For any two elements a and b from ring Zq two operations are defined—addition
and multiplication. Ring properties (axioms) are:
1. The set {0, 1, 2, …, q − 1} is an additive Abelian group and 0 is an identity
element for addition modulo-q (denoted by a + b).
2. For the multiplication (denoted by ab), the product is in the ring ab 2 Zr
(closure).
3. The multiplication is associative a(bd) = (ab)d.
4. The multiplication is distributive in relation to addition, i.e. a(b + d) = ab + ad,
(b + d)a = ba + da.
A ring can be commutative (if ab = ba for any pair of elements), but it is not
always the case. Every ring element has an inverse element for addition (their sum
equals 0), but it should be noted that an inverse element for multiplication is not
needed, and the division cannot be defined in that case. If these inverse elements are
in the ring Zq as well and if the multiplication group is commutative, the ring would
become a field. The ring of integers is commutative and the subtraction is defined
by
a b ¼ a þ q b;
but the inverse elements for multiplication are not defined (the rational numbers are
not in a set).
New ring Zqn has qn ordered n-tuples, which have symbols from the originating
ring Zq, and Zqn can be regarded as an nth extension of the ring Zq. Now an integer
code with a basis q and parameters (n, k) can be defined as a set of code words,
denoted by c, satisfying the conditions [33]
218 5 Block Codes
n o
c 2 Zqn ; cH T ¼ 0 :
The first condition shows that a code word must be an ordered n-tuple from the
ring Zqn and matrix H is a parity-check matrix, dimensions (n − k) n, with the
elements from the ring Zq. Of course, the capability of integer code to correct the
errors depends on parity-check matrix.
(b) In this case n = 4 and n – k = 2, and the generator matrix can be found on the
base of relation
2 3
0 1
g11 g12 g13 g14 6
61 077¼ 0 0
GH T ¼ 0 ) ;
g21 g22 g23 g24 4 1 15 0 0
1 2
the code is ternary one and all operations are modulo q = 3. The previous matrix
equation can be written in an expanded form
and it is obvious that this system has not a unique solution. E.g. one solution is
g11 = 2, g12 = 2, g13 = 0, g14 = 1, g21 = 0, g22 = 1, g23 = 1, g24 = 1 and the other
could be g11 = 0, g12 = 1, g13 = 1, g14 = 1, g21 = 2, g22 = 2, g23 = 0, g24 = 1.
It is interesting to note that the code is self-dual, because
2 3
0 1
0 1 1 1 6
61 077¼ 0 0
;
1 0 1 2 41 15 0 0
1 2
(c) All code words of the code can be found multiplying all possible information
words by generator matrix yielding
Problems 219
0 1 1 1 0 1 1 1
c1 ¼ i1 G ¼ ½ 0 0 ¼ ½0 0 0 0 ; c2 ¼ i2 G ¼ ½ 0 1 ¼ ½1 0 1 2 ;
1 0 1 2 1 0 1 2
0 1 1 1 0 1 1 1
c3 ¼ i3 G ¼ ½ 0 2 ¼ ½2 0 2 1 ; c4 ¼ i4 G ¼ ½ 1 0 ¼ ½0 1 1 1 ;
1 0 1 2 1 0 1 2
0 1 1 1 0 1 1 1
c5 ¼ i5 G ¼ ½ 1 1 ¼ ½1 1 2 0 ; c6 ¼ i6 G ¼ ½ 1 2 ¼ ½2 1 0 2 ;
1 0 1 2 1 0 1 2
0 1 1 1 0 1 1 1
c7 ¼ i7 G ¼ ½ 2 0 ¼ ½0 2 2 2 ; c8 ¼ i8 G ¼ ½ 2 1 ¼ ½1 2 0 1 ;
1 0 1 2 1 0 1 2
0 1 1 1
c9 ¼ i9 G ¼ ½ 2 2 ¼ ½2 2 1 0 :
1 0 1 2
Because the code is linear, the Hamming distance between any two code words
corresponds to the Hamming weight of the word obtaining by their addition. The
Hamming weight corresponds to the number of non-zero elements in the code
words, and it is obvious that the minimum Hamming distance in this code is
dmin = 4. The corresponding distance spectrum is shown in Fig. 5.23.
An integer code can correct ec errors of weight t if it is possible to correct all
error vectors e = (e1, e2, …, en) having the Hamming weight w(e) ec, where
ei 2 ft; t þ 1; . . .; t 1; tg. To achieve this feature, the syndromes correspond-
ing to all correctable error patterns must be different, and a Hamming bound
generalization can be defined
ec
X n
qnk ð2tÞi :
i¼0
i
Number of code word pairs with given distance, a(d)
4.5
3.5
2.5
1.5
0.5
0
0 0.5 1 1.5 2 2.5 3 3.5 4
Hamming distance, d
Therefore, the errors of the type −1 and +1 at any code word position can be
corrected. Taking into account that for operations modulo-3, the subtraction of one
(–1) corresponds to the adding of two (+2), the correctable error patterns are (0001),
(0002), (0010), (0020), (0100), (0200), (1000) and (2000).
It is obvious that the code can correct all possible error patterns, and the cor-
responding syndromes are defined by
S ¼ rH T ¼ eH T ;
S1 ¼ ½ 0 0 0 1 H T ¼ ½ 1 2 ; S2 ¼ ½ 0 0 0 2 H T ¼ ½ 2 1 ;
T T
S3 ¼ ½ 0 0 1 0 H ¼ ½ 1 1 ; S4 ¼ ½ 0 0 2 0 H ¼ ½ 2 2 ;
T T
S5 ¼ ½ 0 1 0 0 H ¼ ½ 1 0 ; S6 ¼ ½ 0 2 0 0 H ¼ ½ 2 0 ;
S7 ¼ ½ 1 0 0 0 H T ¼ ½ 0 1 ; S8 ¼ ½ 2 0 0 0 H T ¼ ½ 0 2 :
Obviously, when there are no errors, the same syndrome is always obtained
S0 ¼ ½ 0 0 0 0 H T ¼ ½ 0 0 :
(d) In this case n = 6, and n − k = 2, the generator matrix has the dimensions
4 6 and can be written as
2 3 2 3
g11 g12 g13 g14 g15 g16 T 0 0
6 g21 g22 g23 g24 g25 g26 7 60 07
6 7 0 1 2 3 4 1
¼6 7;
4 g31 g32 g33 g34 g35 g36 5 1 1 1 1 1 0 40 05
g41 g42 g43 g44 g45 g46 0 0
the code basis is q = 5 and the equation system corresponding to the first generator
matrix column is
whose one solution is g11 = g12 = g13 = g14 = g15 = 1, g16 = 0, and the other
g11 = 1, g12 = g15 = g16 = 0, g13 = g14 = 2.
The equation of a similar form can be written as well for the other rows of
generator matrix, but the solutions for the various rows of the matrix G have to be
mutually different. One possible form of generator matrix is
2 3
1 1 1 1 1 0
61 2 0 2 0 27
G¼6
42
7;
2 1 0 0 15
0 0 2 1 2 0
where one should be careful that the third and the fourth row are not the linear
combinations of the upper two (the operations are modulo-5).
The corresponding distance spectrum is shown in Fig. 5.24. Total number of
code words is 54 = 625 and a minimum Hamming distance is dmin = 3, the code
can correct ec = 1 error in the code word.
In this case q = 5 and ec = 1, and a generalized Hamming bound is
64 6
5 1þ 2t ) 25 1 þ 12t;
1
here t = 2 and the errors of the type −2, −1, 0, +1, +2 can be corrected at any
position in the code word. Correctable error patterns are
Number of code word pairs with given distance, a(d)
300
250
200
150
100
50
0
0 1 2 3 4 5 6
Hamming distance, d
therefore, the code can correct the error of any weight at any position in the code
word.
The aim of this subsection is to facilitate the study of block codes. Practically all
proofs are omitted. The corresponding mathematical rigor can be found in many
excellent textbooks [25, 27, 34].
Algebraic systems satisfy rules which are very often the same as applied to an
“ordinary number system”.
Groups
Let G be a set of elements (a, b, c, …). An operation (⊗) is defined for which
certain axioms hold. The operation is a procedure applied to two elements to obtain
the uniquely defined third one (it is in fact a binary operation). It can be denoted
f ða; bÞ ¼ c;
8a; b; c 2 G: a ðb cÞ ¼ ða bÞ c¼a b c:
Brief Introduction to Algebra I 223
G3. (Identity element) There is a unique identity (neutral) element e2G satis-
fying (even the operation is not commutative).
8a 2 G: e a¼a e ¼ a:
8a 2 G; 9b 2 G: a b¼b a ¼ e:
8a; b 2 G: a b¼b a;
If the number of elements in a group is finite, the group is called a finite group
(the number of elements is called order of G.
Examples:
1. The set of integers (positive, negative and zero) is a commutative group under
addition.
2. The set of positive rationals is a commutative group under multiplication.
3. The set of real numbers (excluding zero) is a commutative group under
multiplication.
4. The set of n n real-valued matrices is a commutative group under matrix
addition.
5. The set of n n nonsingular (det 6¼ 0) matrices is a non-commutative group
under the matrix multiplication.
6. The functions f1(x) = x and f2(x) = 1/x under the operation
are the only elements of a binary group (i.e. the group consisting of only two
elements). The corresponding table (for every finite group this table can be
constructed) for the operation is
f1 ðxÞ f2 ðxÞ
f1 ðxÞ f1 ðxÞ f2 ðxÞ
f2 ðxÞ f2 ðxÞ f1 ðxÞ
The identity element is f1(x) and every element is its own inverse. Is a finite
group of the order 2.
7. The linear transformations (rotation or reflections) of an equilateral triangle into
itself are the elements of an algebraic group of the order 6.
B
A C
a b
The obtained group is not commutative (the table is not symmetric with respect
to the main diagonal). The corresponding inverses can be easily found from the
table
8. The set of integers G = {0, 1, …, m − 1} is a group under modulo-m addition
(i.e. the result is obtained as the remainder from dividing the sum by m). For
m = 3, the table is the following
⊕ 0 1 2
0 0 1 2
1 1 2 0
2 2 0 1
The table is practically the same as for Example 6. In fact, there is a unique table
for all groups consisting of two elements (all groups of order 2 are isomorphic). The
same table is obtained for modulo-2 addition (by putting a = 1).
226 5 Block Codes
Subgroup
Let G be a group. Let H be a subset of G. H is a subgroup of G if its elements
satisfy all the axioms for the group itself with the same operation. The identity
element must be in the subgroup.
Examples:
1. In the group of 6 transformations of the equilateral triangle (Ex. 7), the fol-
lowing sets are subgroups: (1, a, b), (1, c), (1, d) and (1, e).
2. In the group of all integers (Example 1), the subset of integers that are multiples
of any integer is a subgroup.
Rings
The introducing of another “operation” results in an algebraic structure called
ring. One operation is called addition (denoted as a + b), the other is called mul-
tiplication (denoted as ab). In order for a set R to be a ring, some axioms must hold.
Axioms for the ring:
R1. The set R is a commutative group under addition (the identity element is
denoted by 0).
R2. (Closure) The operation “multiplication” is defined for any two elements of R,
the product being also always element of R, i.e.
8a; b 2 R; ab 2 R:
aðbcÞ ¼ ðabÞc:
Note that the existence of identity element for multiplication (denoted e.g. by 1)
is not supposed. A ring having this identity element is called a ring with identity. In
such a ring
a1 ¼ 1a ¼ a:
+ 0 a
0 0 a
a a 0
⋅ 0 a ⋅ 0 a
0 0 0 or 0 0 0
a 0 a a 0 0
Fields
The group can be considered as a set in which additions and subtraction are
possible, the ring—as a structure where besides the addition and subtraction, the
multiplication is also possible. To include division, a more powerful algebraic
structure is needed. This is a field.
Axioms for the field:
F1. The set F is a commutative group under addition.
F2. The set F is closed under multiplication. All set nonzero (6¼0) elements form a
commutative group under multiplication.
F3. The distributive law holds for all field elements, i.e.
+ 0 1 ⋅ 0 1
0 0 1 0 0 0
1 1 0 1 0 1.
2. It can be shown that the field with q elements exists only if q is a prime (p) or a
power of a prime (pn). All fields of the same order have the same tables (they are
isomorphic).
Brief Introduction to Algebra I 229
3. The tables for the field of order p (a prime) are obtained by modulo-p addition
and modulo-p multiplication. The field is also called a prime field.
Examples:
1. The tables for GF(3) (instead of “2” as a third field element a can be written):
+ 0 1 2 ⋅ 0 1 2
0 0 1 2 0 0 0 0
1 1 2 0 1 0 1 2
2 2 0 1 2 0 2 1.
2. The tables for GF(4) = GF(22). The other two field elements are denoted by
a and b.
+ 0 1 a b ⋅ 0 1 a b
0 0 1 a b 0 0 0 0 0
1 1 0 b a 1 0 1 a b
a a b 0 1 a 0 a b 1
b b a 1 0 b 0 b 1 a.
The table for modulo-4 multiplication even does not correspond to the table of
the group (because 2 2 = 4, the element 2 does not have the inverse element—as
commented earlier the set of integers 0, 1, …, q is a group under modulo-q mul-
tiplication only if q is a prime number). By comparing these tables to tables for GF
(4) it can be noted that even the table for addition in GF(4) does not correspond to
modulo-4 addition.
The further analysis of the finite fields as well as of the way to obtain the tables
for GF(pn) will be postponed for the next chapter.
Coset Decomposition and Factor Groups
Suppose that the elements of a finite group G are g1, g2, g3, … and the elements
of a subgroup H are h1, h2, h3, … (h1 is the identity element). Construct the array as
follows—the first row is the subgroup itself with the identity element at the left (i.e.
h1, h2, …). The first (leading) element in the second row (g1) is any element not
appearing in the first row (i.e. not being the element of the subgroup), the other
elements are obtained by the corresponding (group) operation g1 ⊗ hi (in fact, the
first element of the row can be envisaged as g1 ⊗ h1 = g1). The next row is formed
by choosing any previously unused group element in the first column (e.g. g2). The
construction is continued until all the elements of a (finite) group appear somewhere
230 5 Block Codes
in the array. The row is called a left coset (if the commutativity is supposed, it is
simply a coset), and the first element the coset leader. The array itself is a coset
decomposition of the group. If the commutativity is not supposed there exists also a
right coset.
The array is (the multiplication is supposed):
h1=1 h2 h3 . . . hn
g1h1=g1 g1 h2 g1h3 . . . g1hn
g2h1=g2 g2 h2 g2h3 . . . g2hn
. . . . . . .
. . . . . . .
. . . . . . .
gmh1=gm gmh2 gmh3 . . . gmhn
Further, it can be proved that every element of the group will appear only once in
a coset decomposition. It can be proved also that the group elements g′ and g′′ are in
the same (left) coset if (and only if) (g′)−1g′′ is an element of the subgroup.
There is a well known Lagrange’s theorem stating that the order of a subgroup
must divide the order of a group. It is clearly seen (but not directly proved!) from
this rectangular decomposition. It means further, that in a group whose order is a
prime number there are no subgroups, except the trivial ones—the whole group and
the identity element.
Example:
Consider noncommutative group corresponding to the linear transformations of
an equilateral triangle (Example 7 for groups).
(a) Decomposition using subgroup H1(1, a, b)
1 a b or 1 a b or 1 a b
c e d e d c d c e
8h 2 H ^ 8g 2 G: g1 hg 2 H:
For a normal subgroup every left coset is also a right coset. In the commutative
groups every subgroup is normal. For the preceding example, H1 is a normal
subgroup.
Further, for a normal subgroup it is possible to define an operation on the cosets.
In fact a new group is formed, the cosets being its elements. The new group is
called the factor group. The corresponding coset containing the element g is
denoted {g}. The operation definition is
Brief Introduction to Algebra I 231
valid only if no matter which element is chosen from the cosets, the resulting coset
is the same. The identity element is the subgroup itself—H = {1}
and the inverse is the coset containing the corresponding inverse element {g−1}, i.e.
fgg g1 ¼ gg1 ¼ f1g:
1 a b
c e d
denoting the cosets by I = {1} and C = {c}, the following factor group is obtained
I C
I I C
C C I
Of course, the group is isomorphic with the other groups of two elements.
2. Let G be the set of integers (positive, negative and zero)—a commutative group
under addition. Let H be the subgroup consisting of multiples of an integer—
n. All the numbers from 0 to n – 1 are in different cosets. They can be taken as
the coset leaders. For n = 3 the cosets are
0 3 -3 6 -6 9 -9 . . .
1 4 -2 7 -5 10 -8 . . .
2 5 -1 8 -4 11 -7 . . .
Denoting the cosets by {0}, {1} and {2} the corresponding table is obtained
In fact, it is the addition modulo-3.
+ {0} {1} {2}
{0} {0} {1} {2}
{1} {1} {2} {0}
{2} {2} {0} {1}
232 5 Block Codes
Vector Spaces
The notion of a vector space is a well known. However, in the following a
specific kind of vector space will be considered whose vectors have the elements
from a finite field.
Axioms for the vector space:
A set of elements (vectors) V is called a vector space over a field F (whose
elements are called scalars) if it satisfies the following axioms:
V1. (Closure under addition) The set V is a commutative group under addition.
V2. (Scalar multiplication) The product cv 2 V is defined (c 2 F, v 2 V). It can be
called an outer operation.
V3. (Distributivity) Scalar multiplication is distributive over vector addition, i.e.
cðu þ vÞ ¼ cu þ cv:
it can be verified easily that a set of n-tuples forms a vector space over a field.
The identity element is 0(0, 0, …, 0). 0 is the identity element for addition in the
field. The following relations hold:
Brief Introduction to Algebra I 233
0v ¼ 0; a0 ¼ 0; ðvÞ ¼ ð1Þv
(–1 is the inverse element for addition of the multiplication identity element in
the field).
A subspace is a subset of a vector space satisfying the axioms for a vector space.
In a vector space a sum of the form (vector)
u ¼ a1 v 1 þ a 2 v 2 þ þ a k v k
c1 v1 þ c2 v2 þ þ ck vk ¼ 0:
The vectors (1, 0, …, 0), (0, 1, 0, …, 0) … can be considered as orts. They are
linearly independent. Therefore, the dimension of the vector space consisting of the
vectors with n elements equals n.
The product of two n-tuples defined as follows
ða1 ; a2 ; . . .; an Þðb1 ; b2 ; . . .; bn Þ ¼ a1 b1 þ a2 b2 þ þ an bn
is called inner product or dot product. If inner product equals zero (0 from the
field), for the vectors is said to be orthogonal. For inner product the commutativity
and distributivity can be easily verified on the basis of the field axioms.
Matrices
In this subsection only the corresponding elements needed for the study of the
codes are exposed.
234 5 Block Codes
In the following exposition matrix elements are also the elements of a finite field.
Therefore, the matrix rows can be considered as a n-tuples of field elements—
vectors (the same for the columns—vectors of m elements).
The number of linearly independent rows is called the row rank (the same for the
number of linearly independent columns—the column rank). The row rank equals
column rank being the rank of the matrix.
The following set of elementary row operations does not change the matrix rank:
1. Interchange of any two rows.
2. Multiplication of any row by a (nonzero) field element
3. Multiplication of any row by a (nonzero) field element and addition of the result
to another row.
Now, the row-space of a matrix can be defined—a space whose basis consists of
the rows of the matrix. Its dimension equals the rank of the matrix. By performing
the elementary row operations (and obtaining another matrix from the first), the
basis is transformed (the space does not change). These matrices have the same
row-space. From the row-space point of view, linearly dependent rows can be
omitted from the matrix.
A suitable simplification of a matrix can be obtained using elementary row
operations. An echelon canonical form has the following properties:
1. Every leading term of a nonzero row is 1 (multiplication identity element from
the field).
2. Every column containing such a leading term has all other elements 0 (addition
identity element from the field).
3. The rows are ordered in such way that the leading term of the next row is to the
right of the leading term in the preceding row. All zero rows (if any) are below
the other rows.
This procedure corresponds to that one when solving the system of linear
equations by the successive elimination of the variables.
If the rows of n n matrix are linearly independent, matrix is non-singular
(det 6¼ 0), its rank is n and at the end of procedure an identity matrix is obtained
(corresponding the complete solution of the system of equations).
When multiplying an m n matrix [aij] by an n p matrix [bjk] the
m p matrix [cik] is obtained where cik is the inner product of the ith row of [aij] by
the kth column of [bjk].
Brief Introduction to Algebra I 235
M1 MT2 ¼ 0 or M2 MT1 ¼ 0T ;
where the corresponding matrix types are: M1(k n), M2((n − k) n) and 0
(k (n – k)). By using the transposing (T) the multiplication of rows of matrix M1
by the rows of matrix M2 is achieved. Therefore, null space V2 for the subspace V1
of dimension k must have the dimension n − k, if the dimension of the whole space
V equals n.
Chapter 6
Cyclic Codes
Cyclic codes (cyclic subspaces) are one of the subclasses of linear codes. They are
obtained by imposing on an additional strong structure requirement. A brief
overview of the corresponding notions of abstract algebra comprising some
examples and comments is at the end of this chapter. Here only the basic definitions
are given. Thus imposed structure allows a successful search for good error control
codes. Further, their underlying Galois field description leads to efficient encoding
and decoding procedures. These procedures are algorithmic and computationally
efficient. In the previous chapter a notion of ring was introduced. For any positive
integer q, there is the ring of integers obtained by modulo-q operations. If q is a
prime integer (p), Galois field GF(p) is obtained. The number of field elements is
just p—all remainders obtained by division modulo-p. These fields are called prime
fields. A subset of field elements is called a subfield if it is a field under the inherited
operations. The original field can be considered as an extension field. Prime
numbers cannot be factored and they have not nontrivial subfields. Consider now a
polynomial over a field GF(q), i.e. the polynomial having the highest power q − 1,
its coefficients being from GF(q). If its leading coefficient equals one, it is called a
monic polynomial. For any monic polynomial p(x) a ring of polynomials modulo p
(x) exists with polynomial addition and multiplication. For a binary ground field
every polynomial is monic. If p(x) is a prime polynomial, i.e. if it is irreducible (it
cannot be further factored) and monic, then a ring of polynomials modulo p(x) is a
field (Galois field). In any Galois field the number of elements is a power of a
prime. In fact, polynomial coefficients are taken from a prime field. For any prime
(p) and any integer (m) there is a Galois field with pm elements. These fields can
have a nontrivial subfields. A primitive element of GF(q) is an element (usually
denoted by a) such that every field element can be expressed as its power. Every
Galois field has a primitive element. Therefore, a multiplicative group in Galois
field is cyclic—all elements are obtained by exponentiation of the primitive element
polynomial). However, the same approach is valid for a nonbinary ground field,
taking into account the corresponding relations in this case. Therefore, this poly-
nomial can be uniquely factored using the minimal polynomials. Euclidean algo-
rithm can be used for polynomials as well as for integers.
A vector is an ordered n-tuple of elements from some field. An equivalent is a
polynomial where the elements are ordered as the polynomial coefficients.
Therefore, vectors and polynomials can be considered as a different way to denote
the same extension field elements.
For vector
vða0 ; a1 ; . . .; an1 Þ
because xn = 1.
An ideal is a subset of elements of a ring if it is a subgroup of the additive group
and a result of multiplication of an ideal element by any ring element is in the ideal.
In an integer ring a set of integer is an ideal if and only if it consists of all multiples
of some integer. Similarly, a set of polynomials is an ideal (in a polynomial ring) if
and only if it consists of all multiples of some polynomial. It was shown that a
subspace is the cyclic subspace if and only if it is an ideal.
Let the power of g(x) is n − k, then linearly independent residue classes
(polynomials) are
Brief Theoretical Overview 239
The power of polynomial xkg(x) is n, and after division by xn − 1 (in binary field
−1 = 1!) it will be represented by the corresponding residue.
Now, cyclic code can be defined. Consider vector space dimension n (length of
code vectors). Subspace of this space is called cyclic (code) if for any vector
vða0 ; a1 ; . . .; an1 Þ
vector
v1 ðan1 ; a0 ; a1 ; . . .; an2 Þ
Linear block
codes
Cyclic codes
240 6 Cyclic Codes
components of b(x) in normal order and reverting and shifting the components of
a(x).
The following short analysis shows the extreme simplicity in description of
cyclic codes:
– For a complete description of (n, k) block code all code words (2k) should be
known.
– For a complete description of linear block code with the same parameters (n, k),
its basis should be known—k code words of length n (generator matrix).
– For a complete description of the corresponding cyclic code it is sufficient to
know only one code word of length n. All others code words are obtained by its
cyclic shifts and as the linear combinations of these shifts.
There are some trivial cyclic codes. Repetition code (n, 1) is trivial (only two
code words), as well as the code (n, n) (full vector space, but no error control!). The
binary code where the number of ones is even is trivial as well. E.g. code (3, 2)
which has code words (000), (011), (101) i (110), and the code (4, 3) which has
code words (0000), (0011), (0101), (0110), (1001), (1010), (1100) i (1111). It is
interesting that for some codewords length, e.g. for n = 19, binary trivial codes are
in the same time unique existing cycling codes.
Consider polynomial x7 + 1 over GF(2). Here
x7 þ 1 ¼ ð 1 þ xÞ 1 þ x þ x3 1 þ x2 þ x3 :
gðx) ¼ 1 þ x2 þ x3
gðx) ¼ ð1011000Þ
xgðx) ¼ ð0101100Þ
x2 gðx) ¼ ð0010110Þ
x3 gðx) ¼ ð0001011Þ
The other code words (vectors, i.e. polynomials) are their linear combinations. In
Fig. 6.2 all code words are shown as well as the way to obtain them using poly-
nomials corresponding to the rows of generator matrix. There are total seven code
words forming a “cyclic set” (all possible cyclic shifts of the word corresponding to
g(x)). Hamming weight is d = 3. These words are linearly independent and by
summing two words from this set, the code word having Hamming weight d = 4 is
obtained. By its cyclic shifts the other “cyclic set” is obtained having seven code
words as well. Including combinations (0000000) ans (1111111) a complete code
(16 code words) is obtained. It is equivalent to Hamming (7, 4) code which has the
same weight spectrum.
Parity check polynomial is
hðx) ¼ ð1 þ xÞ 1 þ x þ x3 ¼ 1 þ x2 þ x3 þ x4 :
It is here obtained by multiplication of the other factors, but it could have been
obtained by division as well, i.e. h(x) = (x7 + 1): g(x).
Polynomial h(x) forms its code (ideal)
hðx) ¼ ð1011100Þ
xhðx) ¼ ð0101110Þ
x2 hðx) ¼ ð0010111Þ:
then the corresponding coefficients vectors are orthogonal (the coefficients of one of
them should be taken in inverse order). Parity-check matrix is here
2 3
0 0 1 1 1 0 1
H ¼ 40 1 1 1 0 1 0 5:
1 1 1 0 1 0 0
It is easy to verify
GH T ¼ 0:
It should be noticed that the rows in this matrix are as well the cyclic permu-
tation of one row. Taking H (h(x)) as a generator matrix (polynomial) a dual code is
obtained.
The same field can be obtained modulo any primitive polynomial. Consider FG
(23). Two primitive polynomials are x3 + x + 1 and x3 + x2 + 1:
x3+x+1 x3+x2+1
(primitive element α) (primitive element β=α 3)
(100) α 0= 1 β 0= 1
1
(010) α = α β 1= β
(001) α 2= α2 β 2= β2
(110) α 3= 1 + α β 3= 1 + β2
4 2 4
(011) α = α + α β = 1 + β + β2
(111) α 5= 1 + α + α2 β 5= 1 + β
6 2 6
(101) α = 1 + α β = β + β2
7 0 7 0
(100) α (α )= 1 β (β )= 1
The finite fields with the same number of elements are isomorphic—they have
the unique operation tables. The difference is only in the way of naming their
elements.
The fact that in both matrix the next rows are only the cyclic shifts of preceding
rows contributes significantly to the elegant and economic encoder and decoder
construction.
From
it is obvious that codes generated by g(x) and h(x) are dual codes. Therefore, to
generate cyclic code word from information polynomial i(x) (power k − 1, because
the are k information bits) one should obtain code polynomial (power n − 1),
divisible by generator polynomial g(x). At the receiver, the received word u
(x) should be divided by g(x) and if the remainder equals zero, the conclusion can
be drawn that there were no errors. Also, u(x) could b multiplied by h(x) and verify
does the result (not remainder!) equals zero (modulo xn − 1!).
Brief Theoretical Overview 243
This result should be divided by g(x) (power n − k). It is obvious that the power
of remainder will be n − k − 1, having in total n − k coefficients
and information bits are not changed, while the rest of bits are parity-checks.
Therefore, a systematic code is obtained. Here, parity-checks are obtained in a more
elegant way comparing to classic way to introduce parity-checks. From the
“hardware point of view” it means that the information bits are put in the upper part
of shift register and parity-checks in the rest of cells.
244 6 Cyclic Codes
Golay code (23, 12) (Problem 6.5) is a very known cyclic code. It was invented
before the invention of cyclic codes and later it was recognized as a cyclic code. It
satisfies the Hamming bound with equality, it is the perfect code.
In communications the Cyclic Redundancy Check (CRC) (Problems 6.6, 6.7,
6.8 and 6.9) is often used, where the cyclic code is applied to detect the errors. In its
systematic version, the number of information bits is not fixed, but it can vary in
some broad range. A main advantage of code shortening is a possibility to adjust
the code word length to a length of the message entering the encoder. In such a
way, the same generator polynomial can be used for encoding the information
words of various lengths. CRC codes are generally used for a channels which have
a low noise power which (except for rare packet errors) can be modeled as a BSC
where the channel error probability is sufficiently small (p 10−2), and the codes
are optimized for this channel type.
CRC procedure is based on forming systematic cyclic code. The information bits
transmitted in one frame are encoded using cyclic codes as follows
iðxÞxnk
cðxÞ ¼ iðxÞxnk þ rem ¼ iðxÞxnk þ rðxÞ;
gðxÞ
where rem{} denotes the corresponding remainder. The information and the
parity-check bits are separated. It can be conceived that, to transmitted information
bits, CRC appendix is “attached”. Its dimension is usually substantially shorter than
the rest of he packet (n − k k). During the transmission an error polynomial e
(x) is superimposed. At the receiver, the received sequence is divided by g(x). The
corresponding remainder is
cðxÞ þ eðxÞ
r0 ðxÞ ¼ rem :
gðxÞ
If the remainder equals zero, it is concluded that there were no errors. In the
opposite case, the retransmission can be requested. Problems 6.6, 6.7, 6.8 and 6.9
deal in details with CRC procedure showing as well encoder and decoder hardware
realization.
Cyclic codes are a subclass of linear codes and can be described as well by
matrices. Of course, as said earlier, the polynomial description is more elegant
because the generator polynomial coefficients just form one row of generator
matrix, other rows are its cyclic shifts. Up to now it seemed that one can firstly
chose a generator polynomial and after that has to determine the number of cor-
rectable errors. However, a special class of cyclic codes guarantees the required
number of correctable errors in advance. These are BCH codes (discovered sepa-
rately by Bose and Ray-Chaudhuri at one side, and by Hocquenghem at the other
side) (Problems 6.10, 6.11, 6.12 and 6.13). They are obtained using a constructive
way, by defining generator polynomial roots. Simply, to correct ec errors, the
generator polynomial should have 2ec roots expressed as the consecutive powers of
the primitive element. If it starts from a, the code is called primitive, otherwise, it is
Brief Theoretical Overview 245
Block codes
LBC
Cyclic
BCH
Reed-Solomon
codes
not a primitive one. Of course, to find the generator polynomial, the corresponding
conjugate elements should be taken into account and sometimes a number of
correctable errors is augmented (the conjugates continue a series of the consecutive
powers). These codes are very known cyclic codes. In the Fig. 6.3 their position in
the family of linear block codes is shown. Reed-Solomon (RS) codes (considered
later) are a special subset of nonbinary BCH codes.
Any cyclic code can be specified using generator polynomial g(x) roots. Let
these roots from the field extension are
a1 ; a2 ; . . .; ar :
The received polynomial (vector) u(x) is a code word if and only if a1, a2, …, ar
are roots of u(x)—what is equivalent to the fact that u(x) is divisible by. g(x).
However, the coefficients are from ground field and u(x) should have all others
conjugate elements as a roots—meaning that u(x) must be divisible by the other
minimal polynomials
because some roots can have the same minimal polynomial. Of course, xn − 1 is
divisible by g(x).
It can be shown that for every positive integer m0 and dmin (minimal Hamming
distance) exists BCH code [generator polynomial g(x)], only if it is the minimum
246 6 Cyclic Codes
power polynomial with the coefficients from GF(r), which has the roots from the
extension field
m0 þ 1 0 þ dmin 2
0 ; a0
am 0
; . . .; am
0
where a0 is some element from extension field. Code word length is LCM of the
roots order.
Generally, it is not necessary for r to be a prime. It can be as well power of the
prime. It is only important that GF(r) exists. Used symbols are from this field, but
the generator polynomial roots are from the extension field. Therefore, coefficients
can be from GF(pn) (p—prime) and the roots from GFððpn Þk Þ. Of course, some
roots can be from the ground field, but it is the subfield of the extension field.
Consider the following example. Let p = 2 (binary ground field) and let GF(23)
is generated by primitive polynomial p(x) = x3 + x + 1. Find cyclic code with the
minimal number of parity-check bits, over this field, if one root of generator
polynomial is a1 = a.
Complete GF(23) is shown in Table 6.1 (it should be noticed that an = an−7 and that
identity addition element is omitted (it belongs to the field and sometimes is denoted as
a−∞). Groups of conjugate elements (roots) are denoted with symbols A, B and C.
Conjugate elements for a are a2 and a4, yielding
gðxÞ ¼ ðx þ aÞ x þ a2 x þ a4 ¼ x3 þ x þ 1:
and
gðxÞ ¼ m1 ðxÞm2 ðxÞ ¼ x3 þ x þ 1 x3 þ x2 þ 1 ¼ x6 þ x5 þ x4 þ x3 þ x2 þ x þ 1;
yielding
hðxÞ ¼ x þ 1:
m1 ðxÞ ¼ x þ 1
m2 ðxÞ ¼ ðx þ aÞðx þ a2 Þðx þ a4 Þ ¼ x3 þ x þ 1
yielding
gðxÞ ¼ m1 ðxÞm2 ðxÞ ¼ ðx þ 1Þ x3 þ x þ 1 ¼ x4 þ x3 þ x2 þ 1
and
hðxÞ ¼ x3 þ x2 þ 1:
Generally, consider GF(2m). It can be shown that for any two positive integers
m (m 3) and ec (ec < 2m−1) exists binary BCH code having the following
characteristic:
– Code word length n = 2m − 1
– Number of control bits n – k mec
– Minimal Hamming distance: dmin 2ec + 1.
This code can correct all combinations of ec of smaller number of errors.
Generator polynomial is a minimal power polynomial which has for roots suc-
cessive powers of a:
a; a2 ; a3 ; . . .; a2ec :
n ¼ qm 1 ¼ q1
and RS codes are relatively “short”. Minimal polynomial for any element b is just
mb ðxÞ ¼ xb:
its power is always 2ec because conjugate elements do not exist. Number of control
symbols is n − k = 2ec and a code word is obtained by multiplication of infor-
mation polynomial and generator polynomial.
In original paper the case q = 2b was considered (binary code for alphanumerical
characters), i.e. field GF(2b). Minimum Hamming distance for RS (n, k) code is
d ¼ nk þ 1;
Brief Theoretical Overview 249
gðxÞ ¼ xa:
Consider now field GF(2b = 8) (Table 6.1). Starting from the primitive element
a, find RS code for correcting one error. There is no need to find the conjugate
elements. The generator polynomial roots are a and a2
gðxÞ ¼ ða þ xÞða2 þ xÞ ¼ a3 þ a4 x þ x2 ;
Gorenstein and Zierler (Problem 6.17) modified Peterson algorithm to decode the
nonbinary codes. Berlekamp-Massey algorithm for RS codes decoding starts from a
fact that the error position polynomial does not depend on error magnitudes in the
code word. It makes possible to use Berlekamp algorithm to find the error position,
and later on to determine the error magnitudes and to do their correction. Forney
algorithm (Problem 6.18) for roots finding allows to determine the error magnitude if
its location is known, as well as error locator polynomial and magnitude error
polynomial, but to find a magnitude of error, the position of other errors are not
explicitly used. The erasure of symbols is a simplest way of soft decision. For
nonbinary codes the location of erased symbol is known, but not the error magnitude.
Here erasure locator polynomial is calculated by using known erasure locators.
Generally, at the beginning the error locator polynomial and be found [e.g. by using
Berlekamp-Massey (Problem 6.18) or Euclidean algorithm (Problem 6.19)], and
then a main equation for errors and erasures decoding is used to obtain a final solution.
Problems
Problem 6.1 Consider the linear block codes, word code length n = 7.
(a) How many cyclic codes can be constructed which have this word length? Find
their generator polynomials.
(b) For one of the possible generator polynomials find all code words and code
weight spectrum. Comment the code possibilities concerning the error
correction.
Problems 251
Solution
Consider a vector space whose elements are all vectors of the length n over GF(q).
A subspace of this space is called cyclic if for any vector ðc0 ; c1 ; . . .; cn1 Þ, a vector
ðcn1 ; c0 ; c1 ; . . .; cn2 Þ, obtained by cyclic permutation of its coefficients also
belongs to this subspace. For an easier analysis a code vector ðc0 ; c1 ; . . .; cn1 Þ can
be represented as well as a polynomial, which has the coefficients equal to the
vector elements [34].
cð xÞ ¼ cn1 xn1 þ . . . þ c1 x þ co :
xn þ 1 ¼ gðxÞhðxÞ;
Y
l
xn þ 1 ¼ mi ðxÞ;
i¼1
where m1(x), m2(x), … are irreducible (minimal) polynomials over field GF(2).
(a) For n = 7 the factorization is as follows
x7 þ 1 ¼ ð1 þ xÞ 1 þ x þ x3 1 þ x2 þ x3 ;
(b) Previous analysis shows that one possible cyclic code generator polynomial is
g1 ð x Þ ¼ ð 1 þ x Þ 1 þ x þ x 3 ¼ 1 þ x 2 þ x 3 þ x 4
¼ g0 þ g1 x þ g2 x 2 þ g3 x 4 þ g4 x 4 :
For arbitrary generator polynomial the elements of the first row of generator
matrix are determined by its coefficients (starting from the lowest power),
while every next row is obtained by shifting the row above for one place at the
right as follows
2 3 2 3
g0 g1 g2 g3 g4 0 0 1 0 1 1 1 0 0
G1 ¼ 4 0 g0 g1 g2 g3 g4 0 5 ¼ 40 1 0 1 1 1 0 5:
0 0 g0 g1 g2 g3 g4 0 0 1 0 1 1 1
In a linear block code there is always all zeros word. The second code word
can be red from generator polynomial coefficients (it corresponds to the
combination (100) at the encoder input) and other six can be obtained as its
cyclic shifts.
It is obvious that a code have one word with weight zero and seven words with
weight four (a(0) = 1, a(4) = 7), and it can correct one error and detect two
errors in the code word. Reader should verify to which information words
correspond written code words (it can be easily found by using a generator
matrix).
Problems 253
g2 ð x Þ ¼ 1 þ x 3 þ x 4
is a generator polynomial of a cyclic code (7, 3). The polynomial g(x) highest
power is n − k = 4 and it seems possible that a cyclic code is obtained. If
g(x) is cyclic code generator polynomial, then xg(x) i x2g(x) must be also the
code polynomials and a generator matrix is
2 3
1 0 0 1 1 0 0
G2 ¼ 4 0 1 0 0 1 1 0 5:
0 0 1 0 0 1 1
Code words of this linear block code are obtained multiplying all possible
three-bit information vectors by a generator matrix, given in Table 6.2.
It is clear that by cyclic shifts of the code word c(1) = (000000) no other code
word can be obtained. Cyclic shifts of a word (1001100) yield
c(5) = (1001100) ! c(3) = (0100110) ! c(2) = (0010011) ! (1001001) !
(1100100) ! (0110010) ! (0011001), and the last four combinations are not
code words. Because of the existence of at least one code word whose all
cyclic shifts are not code words, this code obviously is not cyclic.
It can be noted that the same conclusion could be drawn from the fact that the
supposed generator polynomial g(x) = 1 + x3 + x4 does not divide a polyno-
mial x7 + 1 without remainder, because it cannot be written as a product of its
minimal polynomials. It should be noted as well that the spectrum of a code
defined by generator matrix G2 is
8
>
> 1; d ¼ 0;
>
>
< 3; d ¼ 3;
aðdÞ ¼ 2; d ¼ 4;
>
>
> 1; d ¼ 5;
>
:
1; d ¼ 6;
and this code has a minimum Hamming distance dmin = 3, it detects one error
and can correct it as well. However, the code is not cyclic and cannot be
defined by a generator polynomial g2(x), i.e. the transition from g2(x) to G2,
was not correct.
(d) Starting from a generator polynomial g(x) = 1 + x = g0 + g1x by previ-
ously described procedure a starting nonsystematic generator matrix of
this cyclic code is obtained
2 3 2 3
g0 g1 0 0 0 0 0 1 1 0 0 0 0 0
60 g0 g1 0 0 0 07 60 1 1 0 0 0 07
6 7 6 7
60 0 g0 g1 0 0 07 6 07
G¼6 7 ¼ 60 0 1 1 0 0 7;
60 0 0 g0 g1 0 07 6
7 07
6 60 0 0 1 1 0 7
40 0 0 0 g0 g1 0 5 40 0 0 0 1 1 05
0 0 0 0 0 g0 g1 0 0 0 0 0 1 1
the addition of all six rows and putting the result in the first row yields
2 3 2 3
1 1 0 0 0 0 0 1 0 0 0 0 0 1
60 1 1 0 0 0 07 60 1 1 0 0 0 07
6 7 6 7
60 0 1 1 0 0 07 6 07
GðIÞ ¼6 7 ! GðIIÞ ¼ 6 0 0 1 1 0 0 7:
60 0 0 1 1 0 07 60 0 0 1 1 0 07
6 7 6 7
40 0 0 0 1 1 05 40 0 0 0 1 1 05
0 0 0 0 0 1 1 0 0 0 0 0 1 1
Further, in matrix G(II) the rows 2–6 are added and the result is put into the
second row of a new matrix G(III), where the rows 3–6 are added and put in the
next one
2 3 2 3
1 0 0 0 0 0 1 1 0 0 0 0 0 1
60 1 1 0 0 0 07 60 1 0 0 0 0 17
6 7 6 7
60 0 1 1 0 0 07 6 07
GðIIÞ ¼ 6 7 ! GðIIIÞ ¼ 6 0 0 1 1 0 0 7
60 0 0 1 1 0 077 60 0 0 1 1 0 07
6 6 7
40 0 0 0 1 1 0 5 40 0 0 0 1 1 05
0 020 0 0 1 1 3 0 0 0 0 0 1 1
1 0 0 0 0 0 1
60 1 0 0 0 0 17
6 7
60 0 1 0 0 0 17
! GðIVÞ ¼6
60
7:
6 0 0 1 1 0 07 7
40 0 0 0 1 1 05
0 0 0 0 0 1 1
Problems 255
Problem 6.2 A cyclic code, code word length n = 7, is defined by the generator
polynomial g(x) = 1 + x + x3.
(a) Verify whether the code obtained by this generator polynomial is cyclic. Find
the corresponding parity-check polynomial, generator matrix and parity-check
matrix.
(b) Describe in details the procedure to find the code word corresponding to
information word (1010). List all code words of the code and comment its
capability to correct errors.
(c) Describe in details the decoding procedure when in transmitted code word
(corresponding to (0111) information word) if the error occurred at the third
position.
(d) Whether the code obtained by shortening of this code for one bit is cyclic as
well?
Solution
(a) A polynomial is generator one if it divides the polynomial xn + 1 without
remainder. To verify it, the polynomial division is carried out:
256 6 Cyclic Codes
The remainder equals zero and the code is cyclic. Parity-check polynomial is
obtained as the result of a division
h ð xÞ ¼ 1 þ x þ x2 þ x4 :
For a cyclic code the polynomials corresponding to the code words must be
divisible by g(x). Therefore, the generator matrix rows are determined by
coefficients of polynomials g(x), xg(x), x2g(x), x3g(x)
2 3
1 1 0 1 0 0 0
60 1 1 0 1 0 07
G¼6
40
7:
0 1 1 0 1 05
0 0 0 1 1 0 1
Parity-check polynomial h(x) defines the corresponding code as well, and the
corresponding parity-check matrix rows are determined by coefficients of
polynomials h(x), xh(x), x2h(x) red out in a reverse order
2 3
1 1 1 0 1 0 0
H ¼ 40 1 1 1 0 1 0 5:
0 0 1 1 1 0 1
G H T ¼ 0:
cð xÞ ¼ ið xÞgð xÞ ¼ 1 þ x2 1 þ x þ x3 ¼ 1 þ x þ x3 þ x2 þ x3 þ x5
¼ 1 þ x þ x2 þ x5 :
(c) From Table 6.3 it is easy to find that to information word i = (0111) corre-
sponds the code word c = (0100011). If during the transmission an error
occurred at the third position, the received sequence is r = (0110011) and the
corresponding polynomial is r(x) = x + x2 + x5 + x6. The decoding procedure
is based on finding the remainder after the division of received word poly-
nomial by a generator polynomial
The obtained remainder is (001), it is not equal to zero meaning that an error
occurred. To find its position, one should find to what error type corresponds
this remainder. This code can correct all single errors, the syndrome value is
found from S ¼ rH T ¼ ðc eÞH T ¼ eH T and to the error at jth position the
syndrome corresponds being equal to the jth column of a parity-check matrix,
as shown in Table 6.4.
(d) Generator matrix of the code obtained from the originating one by shortening
it for one bit is obtained by eliminating the last row and the last column from
the basic code matrix. There are six cyclic shifts of the code word and all
words (given in Table 6.5) have not the same Hamming weight, it is obvious
that the shortening disturbed the cyclical property. However, the minimum
Problems 259
Table 6.5 Code words of the code (6, 3) obtained by cyclic code (7, 4) shortening
Information word Code word
0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 1
0 1 0 0 1 1 0 1 0
0 1 1 0 1 0 1 1 1
1 0 0 1 1 0 1 0 0
1 0 1 1 1 1 0 0 1
1 1 0 1 0 1 1 1 0
1 1 1 1 0 0 0 1 1
Hamming distance was not changed and the code still can detect and correct
one error in the code word (but having a smaller code rate R = 1/2). It is easy
to verify that this code is equivalent to shortened Hamming code (6, 3) from
Problem 4.8.
(a) Find the code generator matrix and list all code words.
260 6 Cyclic Codes
(b) Explain the procedure to obtain one code word of the systematic version of the
code. Write all code words and a corresponding generator matrix.
(c) Do the generator matrix of a systematic code can be obtained from the cor-
responding matrix of the nonsystematic code and how?
Solution
(a) By applying the procedure from the previous problem, it is easy to find the
generator polynomial
gð x Þ ¼ 1 þ x þ x 2 þ x 4 ;
To obtain a nonsystematic code, the code polynomial is found from c(x) = i(x)
g(x), the list of all code words is given in Table 6.6.
(b) Code polynomial of a systematic cyclic code is found using the relation
iðxÞxnk
cð xÞ ¼ ið xÞxnk þ rem
gðxÞ
where rem{.} denotes the remainder (polynomial) after division. In this case
n − k = 4 and to information word i = (111) corresponds the polynomial i
(x) = 1 + x + x2, while the code polynomial is
ð1 þ x þ x2 Þx4
cð xÞ ¼ 1 þ x þ x2 x4 þ rem :
1 þ x þ x2 þ x4
Table 6.6 Code words of cyclic code (7, 3), nonsystematic and systematic version
Information Code word of the Code word of the
word nonsystematic code systematic code
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 1 1 1 0 1 0 0 1 1 1 0 1 0 0
0 1 0 0 1 1 1 0 1 0 0 1 1 1 0 1 0
0 0 1 0 0 1 1 1 0 1 1 1 0 1 0 0 1
1 1 0 1 0 0 1 1 1 0 1 0 0 1 1 1 0
0 1 1 0 1 0 0 1 1 1 1 0 1 0 0 1 1
1 0 1 1 1 0 1 0 0 1 0 0 1 1 1 0 1
1 1 1 1 0 1 0 0 1 1 0 1 0 0 1 1 1
Problems 261
as rem ðiðxÞ xnk Þ=gðxÞ ¼ x, a code word is obtained from the code
polynomial coefficients
cð xÞ ¼ x4 þ x5 þ x6 þ x ¼ x þ x4 þ x5 þ x6 ! c ¼ ð0100111Þ:
Using the linear combinations of the generator matrix rows, the code words of
this systematic code can be easily found (given in the third column of the
Table 6.6). It is obvious that both systematic and nonsystematic codes are
cyclic, because by a cyclic shift of any code word, the code word is obtained
as well. In this case a set of code words is identical, the code being systematic
or not.
(c) Generator matrix of the systematic code can be easily found, its third row will
be obtained by summing the first and the third row of the nonsystematic code
generator matrix.
⎡1 1 1 0 1 0 0 ⎤ ⎡1 1 1 0 1 0 0 ⎤
G = ⎢⎢0 1 1 1 0 1 0 ⎥⎥ ⎢ ⎥
⎢ 0 1 1 1 0 1 0 ⎥ = Gs .
⎢⎣0 0 1 1 1 0 1 ⎥⎦ ⎢⎣1 1 0 1 0 0 1 ⎥⎦
262 6 Cyclic Codes
(a) Verify do the code is a cyclic one! Find all code words and draw its weight
spectrum,
(b) How many errors can be corrected and how many can be detected? Verify
whether the Hamming and Singleton bounds are satisfied.
(c) Find the probability that the emitted code word is not corrected if the channel
can be modeled as BSC (crossover probability p = 10−2).
Solution
A generator polynomial has to satisfy the condition to be a divisor of polynomial
xn þ 1 (n—code word length). It means that for given g(x) a minimum n should be
found so as that
xn þ 1
rem ¼0
gðxÞ
In this case, for gðxÞ ¼ ðx þ 1Þ2 ðx3 þ x þ 1Þðx3 þ x2 þ 1Þ2 , minimum n value
satisfying the above equality is n = 14, because the factoring can be carried out in
field GF(2):
x14 þ 1 ¼ ðx7 þ 1Þ2 ¼ ðx þ 1Þ2 ðx3 þ x þ 1Þ2 ðx3 þ x2 þ 1Þ2 ¼ gðxÞðx3 þ x þ 1Þ;
where the second element at the right side is the parity-check polynomial, while the
generator polynomial is
and, procedure for a direct obtaining of a code word from the information
word i(x) = 1 + x2 and the generator polynomial coefficients is as follows
111010011101×101
111010011101
111010011101
c=(1 1 0 1 0 0 1 1 1 0 1 0 0 1).
Problems 263
It is easy to show that six code words more can be obtained by cyclic shifts of
the obtained code word. As to the information word i = (000) always corre-
sponds all zero code word, list of all code words is given in Table 6.7 and the
corresponding weight spectrum is shown in Fig. 6.4.
(b) Minimum Hamming distance is dmin = 8, it is obvious that the code corrects
up to ec = 3 errors and detects ed = 4 errors in the code word. If the code
capability to correct the errors (ec = 0) is not used, the code can detect up to
ed dmin − 1 = 7 errors in the code word.
The code parameters are n = 14, k = 3, a Hamming bound is satisfied because
264 6 Cyclic Codes
14 14 14 14
2 143
þ þ þ ) 2048 [ 1 þ 14 þ 91 þ 364
0 1 2 3
¼ 470;
dmin n k þ 1 ) 8 12:
It is obvious that both bounds are satisfied with a very large margin, meaning
that it is possible to construct a code having better performances with the same
code parameters. It can also be noticed that for the same code word length and
the information word length, the following relation is satisfied as well
4
X 14
2 143
;
t¼1
t
meaning that a Hamming bound predicts the existence of a linear block code
having the parameters (14, 3) correcting ec = 4 words at the code word.
Singleton bound in this case is less severe and it even predicts the code
existence with dmin 12, correcting up to ec = 5 error at the code word.
A code (14, 3) correcting five errors at a code word does not fulfill a Hamming
bound, it does not exist, and there is no sense to look for it. However, it even
does not mean that a code (14, 3) correcting four errors exists. The fulfilling of
both bounds only means that this code may exist (and one has to find it!). The
Hamming and Singleton bounds consider the possibility of existence the linear
block codes which has given parameters, and even if such a code exists, it
does not mean that it is a cyclic code.
(c) In Problem 5.8 it was shown that the probability not to detect errors corre-
sponds to the probability that an error vector corresponds to the code word,
and it can be easily calculated using the weight spectrum
X
n
Pe;d ðpÞ ¼ aðdÞpd ð1 pÞnd ;
d¼dmin
On the other hand, the probability that the error is not corrected corresponds to
the probability that an error vector does not corresponds to the coset leader in a
standard array. If the maximum weight of the pattern corresponding to the coset
Problems 265
1000
800
600
8(506) 15(506)
400
8(253) 16(253)
200
0(1) 23(1)
0
0 5 10 15 20
Hamming weight, d
X
l
Pe;c ðpÞ ¼ 1 LðiÞpi ð1 pÞni :
i¼0
The code corrects single, double and triple errors and it is obvious that all error
vectors having one, two or three ones (total 470) can be coset leaders, and L(0) = 1,
L(1) = 14, L(2) = 91 and L(3) = 364. Now, the coset leaders corresponding to the
syndrome values left over should be found. The reader should verify that, for this
code, all error vectors having four ones result in the various syndrome values and to
find the probability not to correct the errors. It is the most easier to use a computer
search.
Problem 6.5 The Golay cyclic code (code word length n = 23) generator poly-
nomial is:
Solution
The code word length is n = 23 and generator polynomial power is n − k = 11,
therefore the information word length is k = 12. It is the Golay code (23, 12)
described in [24].
(a) The forming of a code word corresponding to information word i(x) = 1 +
x5 + x11 is carried out by finding the code polynomial coefficients
(b) Golay code (23, 12) weight spectrum was found by a computer search by
repeating the above exposed procedure to find all 212 code words. The result is
shown in Fig. 6.5. Minimum Hamming distance is dmin = 7 and the code can
correct up to three errors in the code word.
In this case the Hamming bound is satisfied with equality because
3
X
2312 23
2 ¼ ¼ 2048;
t¼1
t
and the Golay (23, 12) code is perfect. Singleton bound is satisfied as well
dmin n k þ 1 ) 7 12
(c) The extended Golay code which has the parameters (24, 12) is obtained by
adding a general parity-check to the Golay (23, 12) code. Its generator
polynomial is
(d) The systematic Golay code (24, 12) has the code words determined by the
coefficients of the polynomial
iðxÞxnk
cð xÞ ¼ ið xÞx nk
þ rem ;
gðxÞ
This procedure is repeated for all information words having only one “one”
resulting in the corresponding code words. By using these words as the rows
of a matrix which has 12 rows and 24 columns, a systematic generator matrix
is obtained
G ¼ ½P; I12 :
This matrix is usually additionally modified so as that the unity matrix is at the
first 12 columns, while the positions of other columns are changed so as their
rows and columns become identical:
gðxÞ ¼ 1 þ x2 þ x8 :
(a) Explain the procedure for forming a sequence at the CRC encoder output if the
information sequence i = (01100111) was emitted.
(b) Verify the remainder in decoder after division if there were no errors during
the transmission.
(c) If during the transmission the errors at the second, at the last but one and at the
last position occurred find the result in CRC decoder.
(d) Draw a code weight spectrum and comment its capability to detect the errors.
Solution
(a) A polynomial corresponding to the information sequence is i(x) = x7 + x6 +
x5 + x2 + x, and the code sequence is
iðxÞ xnk
cð xÞ ¼ ið xÞxnk þ rem ¼ ið xÞx8 þ r ð xÞ:
gðxÞ
A procedure for finding the remainder r(x) will be split into two phases.
Firstly, a polynomial to be divided will be found and a corresponding binary
sequence (bits corresponding to the coefficients)
and the remainder after division of the found polynomial by the generator
polynomial is
i ðxÞ
r ð xÞ ¼ rem ¼ 1 þ x4 þ x5 þ x6 ! r ¼ ð1000011Þ;
gðxÞ
(b) If during the transmission there were no errors, the code word was not
changed. The decoder is performing a Cyclic Redundancy Check (CRC) to
find the remainder after division of the received polynomial by a generator
Problems 269
c ¼ ð1000011001100111Þ
e ¼ ð0100000000000011Þ
c0 ¼ c þ e ¼ ð1100011001100100Þ
c0 ðxÞ
r ð xÞ ¼ rem ¼ x7 þ x6 þ x4 þ x3 þ x2 þ 1
gðxÞ
X
n X
n
Pe;d ðpÞ ¼ aðdÞpd ð1 pÞnd aðdÞpd aðdmin Þpdmin :
d¼dmin d¼dmin
It is obvious that this probability would be greater if dmin is greater and a(dmin)
smaller. In this case dmin = 3, and a code can detect all single and double
errors, while more errors can be detected, if they do not correspond to the code
word structures.
Solution
(a) As mentioned in the previous problems, forming of a cyclic code can be
conceived as multiplying information polynomial by a generator polynomial
cnesist ðxÞ ¼ iðxÞgðxÞ, but the obtained code word is not a systematic one.
Problems 271
c11...c0
CLR
Q CLR
Q CLR
Q CLR
Q
clk
i7 i6 i5 i4 i3 i2 i1 i0
from which it can be concluded that dividing a polynomial iðxÞxnk by g(x) the
result i2 ðxÞ is obtained, and the remainder is rðxÞ ¼ rem iðxÞxnk =gðxÞ . In
this example iðxÞ ¼ x þ x2 þ x5 þ x6 þ x7 , while a division by g(x) is as
follows
272 6 Cyclic Codes
i2 ðxÞ ¼ x þ x2 þ x4 þ x5 þ x7 ) i2 ¼ ð0110111Þ
rðxÞ ¼ x þ x2 :
cˆ0 ...cˆ11
D D D D
(1) OE
D 0 D 1 D 2 D 3 D4 D 5 D 6 D 7
(2)
mˆ 0 ...mˆ 7
(b) Block scheme of a CRC encoder for (12, 8) code is shown in Fig. 6.7—during
the first k shifts, the lower bits are red out, while at the upper part a division is
performed. The encoder consists of two shift registers (usually realized using
D-flip flops, as standard elements for delay). Generally, a lower register has
k delay cells. The writing is parallel, but the reading (sending at the line) is
serial, during k successive shifts. The upper register consists of n − k delay
cells and a feedback is defined by coefficients of generator polynomial. In this
register, after k shifts, a remainder after division by a generator polynomial is
obtained. It will be sent at the line after the information bits, because the
switch will be at the upper position after the first k shifts.
(a) (b)
Fig. 6.9 Weight spectra of two CRC codes having the same generator polynomial g(x) = x4 +
x3 + 1, but the different code words lengths—(11, 7) (a) and (7, 3) (b)
274 6 Cyclic Codes
Table 6.9 Minimum code distances profile—optimum and for two considered codes
Dmin Optimum CCITT, nc = 32767 C1, nc = 151
17 17
12 18
10 19, …, 21 17, …, 20
9 22
8 23, …, 31 21, 22
7 32, …, 35
6 36, …, 151 23, …, 151
5 152, …, 257
4 258, …, 32767 17, …, 32767
3 32768, …, 65535
2 65536 32768 152
Block scheme of CRC decoder is shown in Fig. 6.8, using simplified symbols
for delay cells. When the switch is in position (1), the first k bits enter the
division circuit and the shift register. Here the remainder after division by a
generator polynomial, whose coefficients determine the feedback structure as
well, is obtained. If the remainder differs from zero, the signal OE (Output
Enable) is not active and the reading from the lower register is not permitted.
If the remainder equals zero indicating that there were no errors during
transmission, signal OE is active and the decoded information can be delivered
to a user.
(c) CRC procedure as a result gives the code words of a code obtained by the
shortening a binary cyclic code. The polynomial gðxÞ ¼ x4 þ x3 þ 1 is an
irreducible factor of a polynomial x15 + 1, CRC procedure based on this
polynomial can generate all codes having the parameters n 15 and n −
k = 4. Codes having the parameters (11, 7) and (7, 3) fulfill these conditions
and they can be generated using this polynomial. By a computer search all
code words of these codes were found and their weight spectra are shown in
Figs. 6.9a, b. In Table 6.8 all code words of a code (7, 3) are listed. It can be
easily seen that this code is not cyclic, nor equivalent to the code from
Problem 6.3.
It is clear that every obtained code, whose spectra were found, has a minimum
Hamming distance dmin = 3, and could correct one error in a code word.
However, here the CRC codes are considered where the aim is not the error
correction. If ec = 0, they can detect at most ed = 2 errors in the code word. It
should be noticed that even and a spectrum shape indicates that a code is
cyclic. If a spectrum is symmetric (as for the case of non-shortened code (15,
11), spectrum shown in Fig. 6.10), a code is cyclic. From Fig. 6.9 it is clear
that a code shortening usually disturbs this feature.
Problem 6.8 During CRC-8 procedure for error detecting, after the information
word, CRC extension is included, where its forming is based on the generator
polynomial g(x) = x8 + x7 + x6 + x4 + x2 + 1.
(a) Find the code word corresponding to information word i = (1000000) and find
the code distances spectrum.
(b) Find the code word corresponding to information word i = (10000000) and
find the code distances spectrum.
(a) (b)
Number of words with given weight, a(d)
90 1500
80
70
60 1000
50
40
30 500
20
10
0 0
0 2 4 6 8 10 12 14 16 0 5 10 15 20
Hamming weight,d Hamming weight, d
Fig. 6.11 Weight spectra for two CRC codes which have the same generator polynomial g
(x) = x8 + x7 + x6 + x4 + x2 + 1, but the different code word lengths—(16, 8) (a) and (20, 12) (b)
276 6 Cyclic Codes
Solution
CRC code word is formed using the identity
iðxÞxnk
cCRC ðxÞ ¼ iðxÞx nk
þ rem ;
gðxÞ
and always consists of clearly separated information and parity-check bits. Where
CRC-8 is applied, the remainder has always eight bits, while a length of the
information part depends on the message to be transmitted.
(a) If the information sequence length is k = 8, a code word has length n = 16 and
for the considered example iðxÞ i(x) = 1 yielding
x8
ðIÞ 8
c ðxÞ ¼ x þ rem ¼ 1 þ x2 þ x4 þ x6 þ x7 þ x8 :
gðxÞ
Here generator polynomial corresponds to the code word, and the code word is
obtained by reading coefficients starting from x0 to x15
cðn¼8Þ ¼ ð1010101110000000Þ:
(b) Although the information sequence length here is k = 10, a procedure for
obtaining a code word is practically unchanged, because here the following is
satisfied as well
However, a code word in this case has the length n = 18 being obtained by
reading coefficients starting from x0 to x17
cðn¼10Þ ¼ ð101010111000000000Þ:
If the same information polynomial is used, but the code words have the
various lengths, it is obvious that the resulting code polynomials will be
unchanged, and under the same conditions, a longer code word relating to a
shorter one has only zeros at the right side. The code spectrum is shown in
Fig. 6.11b. In this case dmin = 4 as well, and a code can also detect all three-bit
error combinations, but this time in the twenty-bit block. The code will detect
all sequences with an odd number of errors as well.
Problems 277
Some general rules for the choice and analysis of CRC code performances are:
(1) By shortening a cyclic code, the code is obtained having minimum
Hamming distance at least as the original code. In fact, by shortening a
cyclic code, a parameter dmin cannot be decreased! Taking into account
that code corrects at least the same number of errors as before a short-
ening, but has shorter code words, it is obvious that the capability for
error detection is increased by code shortening. Of course, it is paid by
decrease of code rate, because in shorter code word the number of
parity-checks remains the same.
(2) A main advantage of code shortening is a possibility to adjust a code
word length to a length of the message entering the encoder. In such way,
the same generator polynomial can be used for encoding of information
words of various lengths. However, for every generator polynomial there
is an upper bound for a code word length, after which code performances
become worse abruptly. It will be considered in the next problem.
(3) It is desirable for a code to have the capability to detect an odd number of
errors. It can be achieved by a generator polynomial choice, if it has a
factor (x + 1). It is interesting that in this case a code can correct an odd
number of errors although x + 1 is not a factor of generator polynomial!
(4) A spectrum of a code (20, 12) shown in Fig. 6.11b, was obtained by
inspection of Hamming weights of all 4096 code words. In communi-
cation systems the information bits are grouped into still longer blocks,
and to find a linear block code spectrum a long time is needed. In
practice, the codes are usually used which have a maximum generator
polynomial power 16 or 24 (wireless systems) up to n − k = 32 (e.g.
Ethernet), while the information block lengths are a few hundreds and
(a) (b)
Fig. 6.12 Weight spectra for a two CRC codes, for the same code word length (n = 30), but
with different generator polynomials g1 ðxÞ ¼ x16 þ x12 þ x5 þ 1 (a) and g2 ðxÞ ¼ x16 þ x13 þ x12 þ
x11 þ x10 þ x8 þ x6 þ x5 þ x2 þ 1 (b)
278 6 Cyclic Codes
Solution
(a) A code word length is relatively short and a direct method was used to find a
spectrum by a computer search. Here n = 30 and n − k = 16, in both cases
Fig. 6.13 Probability that a code does not detect error vs. the code word length for two analyzed
codes
Problems 279
214 = 16384 code words were generated, their Hamming weights and their
spectra were found. For both polynomials they are shown in Figs. 6.12a, b.
A code with generator polynomial g1 ðxÞ ¼ x16 þ x12 þ x5 þ 1 is a standard
CCITT CRC-16 code having dmin = 4, and in a code there are A(dmin) = 14
words having four ones. A CRC-16 code using polynomial g2 ðxÞ ¼
x16 þ x13 þ x12 þ x11 þ x10 þ x8 þ x6 þ x5 þ x2 þ 1 was proposed [36] as C1
code which has dmin = 6 and in a code there are A(dmin) = 18 words having six
ones. Both codes can correct any odd number of errors. The reader should
verify the divisibility of generator polynomials by factor x + 1.
As mentioned earlier, a shortening of a cyclic code cannot additionally
decrease the minimum Hamming weight. If CCITT CTC-16 code is shortened
to the code word length n = 25, dmin = 4 (the same), in the code there are less
words with this weight, because A(4) = 9. Similarly, for a code whose gen-
erator polynomial is g2 ðxÞ ¼ x16 þ x13 þ x12 þ x11 þ x10 þ x8 þ x6 þ x5 þ
x2 þ 1 for n = 25, dmin = 4, but now A(6) = 3.
On the basis of the analysis from the previous chapter, the probability that the
error was not detected is
X
n
Ped ðp; nÞ ¼ Ad ðnÞpd ð1 pÞnd ;
d¼dmin ðnÞ
Fig. 6.14 Probability for a code not to detect error vs. crossover probability for two considered
CRC codes and two code word lengths
280 6 Cyclic Codes
where is was stressed that the minimum Hamming distance, as well as the
weight spectrum coefficients, depend on the code word length (it is important
how much a CRC code was shortened). CRC codes are generally used for the
channels which have a low noise power which, except for rare packet errors,
that can be modeled as a BSC where the crossover probability is sufficiently
small (p 10−2), and the codes are optimized for such a channel type. Then,
the first element of a series is dominant, and the following approximation can
be used
(b) The above used procedure is repeated for a code words where n 300. Of
course, here n − k = 16 and it is obvious that the code word length cannot be
shorter than n = 17 and a typical example of such a code is the simple
parity-check. For n > 35 the performances are found using a dual code
spectrum (as described in Problem 5.9) and the results are shown in Fig. 6.13.
It is obvious that a code with generator polynomial g2 ðxÞ ¼
x16 þ x13 þ x12 þ x11 þ x10 þ x8 þ x6 þ x5 þ x2 þ 1 (here denoted as a C1 code),
provides greater reliability for code word lengths n 151, while CCITT
code has much better performances for a longer code word lengths.
The above exposed results can be understand easier if one notice that a gen-
erator polynomial CCITT CRC-16 code g1 ðxÞ ¼ x16 þ x12 þ x5 þ 1 is a factor
of polynomial x32767 + 1, while a generator polynomial of C1 CRC-16 code
g2 ðxÞ ¼ x16 þ x13 þ x12 þ x11 þ x10 þ x8 þ x6 þ x5 þ x2 þ 1 is a factor of poly-
nomial x151 + 1. The code length of a non-shortened CCITT code is nc =
32767 and a polynomial g1(x) can generate only codes being a shortened
versions of a cyclic code (32767, 32751). Similarly, a polynomial g2(x) can
generate shortened versions of a cyclic code (151, 135) because here nc = 151.
By using the polynomial g2(x) it is not possible to obtain a code for effective
error detection for code word length n > 151. It was shown that a code
and if their probability not to detect error monotonically decreases are called
proper codes. It was shown that code C1 is proper for all code word lengths
n nc, while CRC-CCITT code is not proper for all values n nc.
In Fig. 6.14 the probability for a code not to detect error versus BSC crossover
probability is shown. It is obvious that CCITT CRC-16 code has such prob-
ability higher than C1 CRC-16 code, but this probability does increase sig-
nificantly with the code word length increasing. On the other hand, a code
defined by generator polynomial g2 ðxÞ provides better performances for a
short code words, but when n > 151, performances are drastically worse
282 6 Cyclic Codes
Solution
Firstly, the root of primitive polynomial generating a field should be found. The
polynomial p(x) coefficients take the values from a binary field (addition is
modulo-2), and it is obvious that binary zero and binary one are not polynomial
roots because p(0) = p(1) = 1. An element a is introduced, for which
pðaÞ ¼ a3 þ a2 þ 1 ¼ 0
It is obvious that this element is not from a binary field, but a relation a3 =
2
+ 1 must formally be satisfied. It provides the expression of every power of a (the
element of binary field extension) as a polynomial b0a0 + b1a1 + b2a2, where the
ordered coefficients (b0 b1 b2) are a binary equivalent of a corresponding element
from field extension GF(23). E.g. it can be written a4 = (a2 + 1)a = a3 + a = 1 +
+a2 and a binary equivalent is (1,1,1). A complete Galois field GF(23) is given in
Table 6.10.
In this table the identity element for addition (combination “all zeros”) is not
included, it exists as the eighth field element, but it is not interesting for BCH codes
(a) (b)
Fig. 6.15 Code distances spectrum of BCH(7, 4) (a) and BCH(7, 1) (b) codes
Problems 283
where for calculations, the relation al = al−7 was used as well as a corresponding
equivalence between exponential and polynomial equivalents, given in Table 6.10.
Coefficients of minimal polynomials are by rule from a binary field. Further, in a
general case, minimal polynomials are all irreducible factors of polynomial xn + 1
where n = 2m − 1, 2m is the number of elements in extension field GF(2m)
x7 þ 1 ¼ x3 þ x2 þ 1 x3 þ x þ 1 ðx þ 1Þ:
This relation, previously written in Problem 6.1, can now be explained more
precisely. To obtain a binary cyclic code it suffices that its generator polynomial is a
product of minimal polynomials, and a code word polynomial is easily obtained
multiplying information polynomial by code generator polynomial. Of course, all
combinations of minimal polynomials will not result in equally effective error
control codes.
284 6 Cyclic Codes
On the other hand, BCH code construction guarantees to obtain a code where a
capability to correct errors is given in advance. For every natural number m 3 it
is possible to construct BCH code where the code word length is n = 2m − 1,
correcting ec errors, where n − k mec. To provide these features, it is necessary
and sufficient that a code generator polynomial has the following (successive) roots
from a field extension
If a starting element am0 is a primitive one, all its conjugate elements are
primitive as well, and the corresponding minimal polynomial is primitive. BCH
code, where a starting element is primitive, it is called a primitive code. As the
element a is primitive by rule, usually as a generator polynomial the polynomial of
a minimal power is chosen having 2ec successive powers of a
a; a2 ; a3 ; . . .; a2ec :
(a) To obtain a code which has code word length n = 7, capable to correct one
error, a generator polynomial should have roots a i a2. As both elements are
roots of minimal polynomial m1(x), it is obvious
g1 ð xÞ ¼ m1 ð xÞ ¼ x3 þ x2 þ 1:
(b) If the goal is to construct a code, having code word length n = 7 and cor-
recting two errors, the generator polynomial should have roots a, a2, a3, a4.
Problems 285
(c) To construct a code correcting three errors and having word code length n = 7,
a generator polynomial should have the roots a, a2, a3, a4, a5, a6. Elements a,
a2 i a4 are roots of minimal polynomial m1(x), while the elements a3, a5 i a6
are roots of minimal polynomial m2(x), and a generator polynomial is the same
as in a previous case
g3 ð xÞ ¼ m1 ð xÞm2 ð xÞ ¼ 1 þ x þ x2 þ x3 þ x4 þ x5 þ x6 :
Solution
In this case a root of primitive polynomial is found from
pðaÞ ¼ a4 þ a þ 1 ¼ 0
and the relation a4 = a+1 must hold, enabling to form the corresponding extended
field GF(24) given in Table 6.11.
The minimal polynomials in GF(24) are obtained
only with polynomials m1(x) and m3(x), but any element can be used to find a BCH
code polynomial. However, whether the obtained BCH code is primitive depends
on a nature of generator polynomials roots.
(a) To construct a code correcting one error in the code word length n = 15, the
generator polynomial should have the roots a and a2, being here
g1 ðxÞ ¼ m1 ðxÞ ¼ x4 þ x þ 1
and the obtained code is (15, 11). A code spectrum, shown in Fig. 6.16, was
obtained by a computer search generating all 211 = 2048 information words,
obtaining the corresponding code words and then finding their weights.
It is obvious that the code minimum Hamming distance is dmin = 3 providing
the correction of all single errors, as required. This code is equivalent to
Hamming (15, 11) code.
(b) To construct a code correcting two errors in the code word length n = 15, a
generator polynomial should have the roots a, a2, a3, a4, where a, a2 and a4 are
roots of minimal polynomial m1(x), while a3 is a root of m2(x), and generator
polynomial is
Of course, it is the same code as in the above case, i.e. 15-fold repetition code
correcting up to seven errors at a code word length. It is easy to conclude that
this effect cannot be avoided by a suitable choice of the starting element for
roots, and in this case, the requirement to construct a code correcting ec 4
errors will always result in a code (15, 1), correcting seven errors.
(c) Calculate syndrome values at the receiver end and verify whether the relations
connecting syndromes and error locators in the previous example are fulfilled.
(d) Explain in short a procedure for finding locator error polynomial on the basis
of syndrome and verify the procedure carried out in the previous example.
(e) Explain in short Peterson procedure for BCH codes decoding and verify its
efficiency in a previous example.
Solution
Spectra of dual codes for BCH codes correcting double errors in the code word
length n = 2m − 1 (m even), can be found using identities given in Table 6.12 [25].
When m = 4, BCH code correcting double error in the code word length n = 15
bits has the parameters (15, 7). Dual code (15, 8) spectrum in this case has six
nonzero components, and it is easy to calculate that in the dual code there are 1
word with weight 0, 15 words with weight 4, 100 words with weight 6, 75 words
with weight 8, 60 words with weight 10 and 5 words with weight 12, what can be
described by a relation
BCH code (15, 7) spectrum can be now found by using the first McWilliams
identity
8 15 1x
AðxÞ ¼ 2 ð1 þ xÞ B
1þx
"
8 15 1x 4 1x 6 1x 8
¼ 2 ð1 þ xÞ 1 þ 15 þ 100 þ 75
1þx 1þx 1þx
#
1 x 10 1 x 12
þ 60 þ5
1þx 1þx
h
¼ 28 ð1 þ xÞ15 þ 15ð1 xÞ4 ð1 þ xÞ11 þ 100ð1 xÞ6 ð1 þ xÞ9
i
þ 75ð1 xÞ8 ð1 þ xÞ7 þ 60ð1 xÞ10 ð1 þ xÞ5 þ 5ð1 xÞ12 ð1 þ xÞ3
¼ 1 þ 18x5 þ 30x6 þ 15x7 þ 15x8 þ 30x9 þ 18x10 þ x15
and the same result was obtained as in the previous problem, where a computer
search was used (Fig, 6.17),
(b) In the previous problem it was shown that a generator polynomial for BCH
(15, 7) code is
by reading out its coefficients from the lowest to the highest power. Therefore,
the code word, error vector and the received word are, respectively
c ¼ ð1000101110000000Þ
e ¼ ð0100000100000000Þ
r ¼ ð1100101010000000Þ
rðxÞ ¼ 1 þ x þ x4 þ x6 þ x8 :
Syndrome values at the receiver end are calculated as polynomial values for
generator polynomial roots which should exist for a case ec = 2 being a, a2, a3, a4.
For the above polynomial, syndrome components are
S1 ¼ rðaÞ ¼ 1 þ a þ a4 þ a6 þ a8 ¼ 1 þ a þ ð1 þ aÞ þ ða2 þ a3 Þ þ ð1 þ a2 Þ
¼ 1 þ a3 ¼ a14 ;
S2 ¼ rða2 Þ ¼ 1 þ ða2 Þ þ ða2 Þ4 þ ða2 Þ6 þ ða2 Þ8
¼ 1 þ a2 þ ð1 þ a2 Þ þ ð1 þ a þ a2 þ a3 Þ þ a ¼ 1 þ a2 þ a3 ¼ a13 ;
S3 ¼ rða3 Þ ¼ 1 þ ða3 Þ þ ða3 Þ4 þ ða3 Þ6 þ ða3 Þ8 ¼ 1 þ a3 þ ð1 þ a þ a2 þ a3 Þ þ a3
þ ða þ a3 Þ ¼ a2 ;
S4 ¼ rða4 Þ ¼ 1 þ ða4 Þ þ ða4 Þ4 þ ða4 Þ6 þ ða4 Þ8 ¼ 1 þ ð1 þ aÞ þ a þ ða þ a3 Þ þ a2
¼ a þ a2 þ a3 ¼ a11 :
If there were no errors during the transmission, roots a, a2, a3, a4 would have
been zeros of the received word polynomial, because it is code word polynomial,
divisible by a generator polynomial.
Syndromes can be obtained as well from error locators (locator Xi = ak corre-
sponds to ith error which occurred in the kth position in a code word) by following
operations
292 6 Cyclic Codes
S1 ¼ X1 þ X2 ¼ a1 þ a7 ¼ a þ ð1 þ a þ a3 Þ ¼ 1 þ a3 ¼ a14 ;
S2 ¼ X12 þ X22 ¼ ða1 Þ2 þ ða7 Þ2 ¼ a2 þ ð1 þ a3 Þ ¼ a13 ;
S3 ¼ X13 þ X23 ¼ ða1 Þ3 þ ða7 Þ3 ¼ a3 þ ða2 þ a3 Þ ¼ a2 ;
S4 ¼ X14 þ X24 ¼ ða1 Þ4 þ ða7 Þ4 ¼ ð1 þ aÞ þ ð1 þ a2 þ a3 Þ ¼ a11 :
It is important to note that the syndrome values in two previous sets are obtained
in different ways. In the first case, syndrome values are determined by a received
word structure, while in the second, syndrome values are determined by error
positions. Of course, the receiver does not know the error positions (it has to find
them!) but it just gives a possibility for a decoder construction.
(d) Consider firstly a general case, i.e. BCH code for correcting ec errors in the
code word. Let during a transmission at m bits errors occurred. Syndrome
values can be found as polynomial values for roots a; a2 ; . . .; a2ec and an
equations system can be formed to find unknown error locators X1 ; X2 ; . . .; Xm
S1 ¼ X1 þ X2 þ þ Xm ;
S2 ¼ X21 þ X22 þ þ X3m ;
S3 ¼ X31 þ X32 þ þ X2m ;
..
.
S2ec ¼ X2e m :
2ec
1 þ X2 þ
c
þ X2e c
It is obvious that error locations can be easily found if the coefficients of error
locator polynomial are known. On the other hand, it was shown that syndrome
components and coefficients of error locator polynomial are connected by a
system of linear equations
In the special case, when all operations are in a binary field and when there are
m = ec errors, the system is additionally simplified and can be written as
2 32 3 2 3
1 0 0 0 0 0 K1 S1
6 76 7 6 7
6 S2 S1 1 0 0 0 76 K2 7 6 S3 7
6 76 7 6 7
6 S4 S3 S2 S1 0 0 76 K3 7 6 S5 7
6 76 7 6 7
6 76 7 6 7
AK ¼ 6 S S5 S4 S3 0 0 7 6 K4 7 ¼ 6 S7 7:
6 6 76 7 6 7
6 .. .. .. .. .. .. .. 7 6 .. 7 6 .. 7
6 . . . . . . . 76 . 7 6 . 7
6 76 7 6 7
6 76 7 6 7
4 S2ec 4 S2ec 5 S2ec 6 S2ec 7 Sec 2 Sec 3 5 4 Kec 1 5 4 S2ec 3 5
S2ec 2 S2ec 3 S2ec 4 S2ec 5 S ec Sec 1 Kec S2ec 1
This system has a unique solution if and only if syndrome matrix A is not
singular. Peterson showed that, due to the limitations for syndromes, when a
code is binary one, matrix A has a determinant different from zero if there are
ec or ec − 1 errors in a received word. Solutions of a system of equations for
ec = 1, ec = 2, ec = 3 and ec = 4 are given in Table 6.13.
In the previous example the case of a code correcting ec = 2 errors was
considered and on the basis of the received word the syndrome values were
calculated S1 ¼ a14 ; S2 ¼ a13 ; S3 ¼ a2 ; S4 ¼ a11 : System of equations con-
necting syndrome and coefficients of error locator polynomial is for this case
1 0 K1 S1
AK ¼ ¼ ;
S2 S1 K2 S3
K2 ¼ ðS3 S2 S1 Þ=S1 ¼ ða2 a13 a14 Þ=a14 ¼ ð1 þ a þ a3 Þ=a14 ¼ a7 =a14
¼ a7 þ 15 ¼ a8 ;
and a same result could be obtained using relation K2 ¼ ðS3 þ S31 Þ=S1 and
error locator polynomial is
KðxÞ ¼ 1 þ a14 x þ a8 x2 :
Y
2
KðxÞ ¼ ð1 þ Xi xÞ ¼ ð1 þ axÞð1 þ a7 xÞ ¼ 1 þ ða þ ð1 þ a þ a3 ÞÞx þ a8 x2
i¼1
¼ 1 þ a14 x þ a8 x2 ;
X1 þ X2 ¼ a14 ; X1 X2 ¼ a8 :
In this case even without a Chien search, the values of error locator can be
easily found X1 ¼ a1 ; X2 ¼ a7 , and a correction is carried out by inversion of
the first and the eighth bit.
Polynomial corresponding to the corrected received word is now
Problems 295
^r ðxÞ ¼ 1 þ x4 þ x6 þ x7 þ x8 ;
and syndrome components for the corrected polynomial have the values
^
S1 ¼ ^r ðaÞ ¼ 1 þ a4 þ a6 þ a7 þ a8 ¼ 1 þ ð1 þ aÞ þ ða2 þ a3 Þ þ ð1 þ a þ a3 Þ
þ ð1 þ a2 Þ ¼ 0;
^
S2 ¼ ^r ða2 Þ ¼ 1 þ ða2 Þ4 þ ða2 Þ6 þ ða2 Þ7 þ ða2 Þ8 ¼ 1 þ ð1 þ a2 Þ
þ ð1 þ a þ a2 þ a3 Þ þ ð1 þ a3 Þ þ a ¼ 0;
^
S3 ¼ ^r ða3 Þ ¼ 1 þ ða3 Þ4 þ ða3 Þ6 þ ða3 Þ7 þ ða3 Þ8 ¼ 1 þ ð1 þ a þ a2 þ a3 Þ
þ a3 þ ða2 þ a3 Þ þ ða þ a3 Þ ¼ 0;
^
S4 ¼ ^r ða4 Þ ¼ 1 þ ða4 Þ4 þ ða4 Þ6 þ ða4 Þ7 þ ða4 Þ8 ¼ 1 þ a þ ða þ a3 Þ
þ ð1 þ a2 þ a3 Þ þ a2 ¼ 0:
All syndrome components are equal to zero and it can be concluded that roots
a, a2, a3, a4 are polynomial ^r ðxÞ zeros. Otherwise speaking, this polynomial is
divisible by a generator polynomial and its coefficients define a valid code
word. In this case it can be considered that the decoding was successful,
because decoder corrected both errors in a code word.
Solution
In this case a primitive polynomial root is found on the basis
pðaÞ ¼ a5 þ a2 þ 1 ¼ 0;
for addition is included, always existing in a field, but not important for a BCH code
construction.
The code (31, 16) generator polynomial is obtained as a product of three min-
imal polynomials
(a) Polynomial corresponding to the received word r(x) = x10, in the case when
the received word consists of all zeros, except at the 11th position, where is
binary one. The code corrects at least three errors, the guaranteed minimum
Hamming distance is d = 2 3 + 1 = 7 and in a code there is not a code
word having less than seven ones (except “all zeros” word). It is obvious that
the most probable is that the received word originates just from this word
consisting of all zeros, i.e. it is the most probable that the error occurred at the
eleventh position. It will be now verified by the Peterson algorithm [40].
The first step is the syndrome values determining
2 3
1 0 0
A ¼ 4 a20 a10 1 5
a9 a30 a20
In this (binary) case coefficients of X(x) having an odd index must be zero, and
a decoding reduces to finding the polynomial K(x) having a power smaller or
equal to ec satisfying
ðk Þ ðk Þ
dk ¼ Sk þ 1 þ K1 Sk þ . . . þ K1 Sk þ 1lk
pointing to errors at the positions corresponding to a13, a16 and a19 and the
correct received word is
r ð xÞ ¼ 1 þ x9 þ x11 þ x13 þ x14 þ x16 þ x19 ¼ x4 þ x þ 1 gð xÞ:
Solution Galois field GF(23) construction was explained in one of the previous
problems and field structure is given in Table 6.10. As previously explained, a
procedure for BCH construction guarantees the code correcting at least the number
of errors as required. However, it might happen that a code in some cases corrects
more errors than defined in advance, as illustrated in Problems 6.1 and 6.2.
This BCH feature, sometimes undesirable, is a consequence of the fact that gen-
erator polynomial has a greater number of successive roots than directly required by
code construction procedure (conjugate roots are included). In this problem it will
be illustrated how this drawback can be avoided, if the requirement for a binary
code is given up.
Reed-Solomon codes are a special class of nonbinary BCH codes. For a dif-
ference from binary BCH codes, here the code symbols and generator polynomial
300 6 Cyclic Codes
roots as well are from the same field. Because of that, generator polynomial of
Reed-Solomon code for correcting ec errors is not a minimal polynomials product,
but a direct product of factors am0 ; am0 þ 1 ; am0 þ 2 ; . . .; am0 þ 2ec 1 where usually
m0 = 1 is chosen, because the element a is primitive by rule [42]. In such a way the
code having word length n = 2m − 1 symbols is obtained where the number of
redundant symbols is always n − k = 2ec.
(a) To construct a code to correct one symbol at the code word having length
n = 7 nonbinary symbols, a generator polynomial should have roots a and a2.
Number of redundant symbols is n − k = 2, and the obtained code is (7, 5).
A generator polynomial is obtained directly
(a) (b)
Fig. 6.18 Code distances spectrum of Reed-Solomon codes RS(7, 5) (a) and RS(7, 3) (b)
(b) To construct a RS code correcting two symbol errors in the code word length
n = 7, roots of generator polynomial should be a, a2, a3, a4, and for the code
(7, 3) it is obtained
(c) To construct a code correcting three errors in the code word length n = 7, roots
of generator polynomial should be a, a2, a3, a4, a5, a6, and it is obtained
The coefficients of generator polynomial are binary and a code is a binary one.
It is obvious that a code RS(7, 1) reduces to BCH code (7, 1) equivalent to
sevenfold repetition code, where dmin = 7, the corresponding spectrum shown
in Fig. 6.15b.
and a code shortening is carried out by omitting some number (in general case—s)
information bits (symbols) usually at the most significant positions.
Solution
Galois field GF(8) construction by using polynomial p(x) = x3 + x + 1 is given in
Table 6.17 (not the same as in Problem 6.1!). Minimal polynomials corresponding
to conjugate roots are not given, because they are not interesting for RS code
construction.
(a) Reed-Solomon code correcting two errors in the code word length n = 7 has
the generator polynomial
Let the input sequence is (110010101). From Table 6.17 it can be concluded
that the corresponding symbols are a3 a1 a6.
Procedure to obtain the remainder after division by a generator polynomial can
be symbolically written as follows:
304 6 Cyclic Codes
...
gn-k gn-k+1 g2 g1 g0
...
code word
information word
α2 α5 α5 α6
and the corresponding cyclic code word is (a5 a0 a5 a6 a5 a0 a1), the binary
equivalent is (111001111101111001010). Block scheme of systematic Reed-
Solomon encoder for any generator polynomial is given in Fig. 6.19. The
functioning is the same as for binary codes—at a link the information bits are
send firstly and after them the remainder polynomial coefficients found in
delay cells of the circuit for division by polynomial g(x). Circuit for division,
as a main encoder element, is shown in details in Fig. 6.20 for a considered
case.
(a) (b)
Fig. 6.21 Code distance spectrum of the shortened codes RS(6, 1) (a) and RS(5, 1) (b)
as every code word symbol can take one from eight field values, total number
of code words is 83 = 512.
The omitting of the last information symbol yields the code (6, 2) having
82 = 64 code words, the word length n = 6, with the corresponding
polynomial
while the omitting the last two information symbols yields the code (5, 1),
with the corresponding polynomial
Code distance spectra of shortened codes RS(6, 2) and RS(5, 1) are shown in
Figs. 6.21a, b. It is obvious that a minimum Hamming distance for both codes
is dmin = 5, and they can, as the code RS(7, 3), by which shortening they were
obtained, correct two errors in the code word length.
Solution
(a) The relation connecting syndrome components and error locator polynomial
coefficients, when during the transmission a number of errors occurred being
just equal to the number of errors (m = ec) correctable by Reed-Solomon code
is [40]
2 32 3 2 3
S1 S2 S3 S4 Sec 1 Sec Kec Sec þ 1
6 S Sec þ 1 7 6 7 6 7
6 2 S3 S4 S5 Sec 7 6 Kec 1 7 6 Sec þ 2 7
6 76 7 6 7
6 S3 Sec þ 2 7 6 7 6 7
6 S4 S5 S6 Sec þ 1 7 6 Kec 2 7 6 Sec þ 3 7
6 S Sec þ 3 7 6 Kec 3 7 ¼ 6 Sec þ 4 7
7 6 7 6
AK¼6
0
6
4 S5 S6 S7 Sec þ 2
76 7 6
7:
7
6 .. ... ... ... .. .. .. 7 6 .. 7 6 .. 7
6 . . . . 7 6 . 7 6 . 7
6 76 7 6 7
6 76 7 6 7
4 Sec 1 Sec Sec þ 1 Sec þ 2 S2ec 3 S2ec 2 5 4 K2 5 4 S2ec 1 5
Sec Sec þ 1 Sec þ 2 Sec þ 3 S2ec 2 S2ec 1 K1 S2ec
a a3 a4 a6 a4 a5 a0 a
A0 K ¼ ¼ ; Be ¼ ¼ :
a3 a6 a5 a5 a a3 a4 a3
Problems 307
a6 a3 K2 a4
A0 K ¼ :
a3 a4 K1 a3
from which by a simple substitution it is found that the error at the fourth
position has the magnitude a, while the error at the sixth symbol has the
magnitude a5. Finally, error polynomial becomes
and a nearest code word is found by adding error polynomial and the received
polynomial
and it is easy to verify that the code word can be written as cðxÞ ¼ a2 x2 gðxÞ.
a a5
A0 ¼ ;
a5 a2
which is singular, because the second row equals to the first multiplied by a4.
A matrix equation reduces to scalar equality A″K = aK1 ¼ a5 from which
easily follows K1 ¼ a4 and K(x) = 1 + a4x. Therefore, an error occurred at the
fifth code word symbol.
Error magnitude is found using
Be ¼ a4 e4 ¼ a;
from which easily follows that the error at the fourth symbol has the
magnitude a4 and the error polynomial becomes
and a nearest code word is found by adding error polynomial and the received
polynomial
Although the fifth symbol of received word is a and the fifth code word
symbol is a2 it should be noted that the error magnitude is a very large being
a4. However, it is only a consequence of the way for defining addition in a
field GF(8) shown in Table 6.15.
Solution
(a) Berlekamp-Massey algorithm for RS codes decoding starts from the fact that
the error position polynomial does not depend on error magnitudes in a code
word. It makes possible to use Berlekamp algorithm to find the error position,
and later on to determine error magnitudes and to do their correction.
A complete algorithm is divided into a few steps as follows [41, 44]:
1. Find syndromes for a received word: {Sl} = {r(al)}, l = 1, 2, 3, …, 2ec
2. Set initial conditions K(−1)(x) = K(0)(x) = 1, d-1 = 1, d0 = S1, l-1 = l0 = 1.
3. Calculate improved error locator polynomial
a. if dk = 0 then Kðk þ 1Þ ð xÞ ¼ KðkÞ ð xÞ h i
b. if dk 6¼ 0 then Kðk þ 1Þ ðxÞ ¼ KðkÞ ðxÞ þ dk dq1 xkq KðqÞ ðxÞ
ðk Þ ðk Þ
dk ¼ Sk þ 1 þ K1 Sk þ . . . þ Klk Sk þ 1lk
Y
m
eloga Xi ¼ zðXi1 Þ= ð1 þ Xk Xi1 Þ
k¼1
k 6¼ i
11. Form error polynomial eðxÞ ¼ eloga X1 xloga X1 þ . . . þ eloga X2 xloga X2 , add it to
the received word polynomial to find the nearest code word c′(x).
310 6 Cyclic Codes
XðXi1 Þ
eloga Xi ¼ ; i ¼ 1; 2; k 6¼ i:
1 þ Xk Xi1
Although a code was designed to correct ec = 2 errors, in this case only one
error occurred (m = 1) and its magnitude is
and error polynomial is eðxÞ ¼ a4 x4 , the nearest code word is found by adding
error polynomial and received polynomial
Solution
(a) The Euclidean algorithm is an efficient method for finding the greatest com-
mon divisor (GCD) for a group of elements. It can be expressed as a linear
combination of these elements. Extended form of this algorithm, besides the
finding GCD, makes possible to find the corresponding coefficients.
Let a GCD for two elements (a,b) should be found. If the initial conditions
r−1 = a, r0 = b, s−1 = 1, s0 = 0, t−1 = 0, t0 = 1 are given, the algorithm
continues according to the following recursive relations
ri ¼ ri2 qi ri1
si ¼ si2 qi si1
ti ¼ ti2 qi ti1
where in every step the parameter qi is chosen providing that the relation
ri \ri1 is satisfied. Algorithm stops when the remainder rn = 0. The
remainder rn−1 is GCD(a,b). Recursive relations are chosen so as that in
every (ith) iteration provide
si a þ t i b ¼ ri
Table 6.19 illustrates the procedure for finding GCD(322, 115) and coeffi-
cients in linear combination for every iteration. It is obvious that GCD for 322
and 115 is 23, and the in every iteration the above relation is satisfied.
Table 6.19 Euclidean extended algorithm for finding the greatest common divisor for two
positive integers
i qi ri si ti sia + tib
−1 – a = 322 1 0 –
0 – b = 115 0 1 –
1 2 92 1 −2 1 322 − 2 115 = 92
2 1 23 −1 3 −1 322 + 3 115 = 23
3 4 0 5 −14 5 322 − 14 115 = 0
312 6 Cyclic Codes
(b) Euclidean algorithm for RS codes decoding is defined by the following steps
[39]
nP
k
1. Find syndrome polynomial SðxÞ ¼ sj xj þ 1 . If S(x) = 0, the received
j¼1
vector corresponds to a code word. If it is not true, go to the next step.
2. Set the following initial conditions: X1 ðxÞ ¼ x2ec þ 1 , X0 ðxÞ = 1 + S(x),
K-1(x) = 0, K0(x) = 1, i = 1.
3. By applying the extended algorithm find successive remainders ri(x) and
the corresponding polynomials ti(x)
Now the extended Euclidean algorithm for finding GCD for x2t+1 and [1 + S
(x)], can be used to obtain a complete of solutions (K(i)(x), X(i)(x)) enabling
that in ith iteration the following is satisfied
Xð xÞ ¼ a3 þ ax þ a6 x2 ¼ a3 ð1 þ a5 x þ a3 x2 Þ and Kð xÞ ¼ a3 þ a4 x þ a2 x2
¼ a3 ð1 þ ax þ a6 x2 Þ;
The aim of this subsection is to facilitate the study of the cyclic codes. Practically
all proofs are omitted. The corresponding mathematical rigor can be found in many
excellent textbooks [25, 27, 34].
It is in fact, the continuation of the introduction to algebra from the previous
chapter.
Finite (Galois) Field Arithmetic
Losely speaking, rings can be regarded as an initial step when introducing the
fields. The ring is a commutative group under addition, while the multiplication is
closed, associative and distributive over addition. The existence of an identity
314 6 Cyclic Codes
element for multiplication is not supposed (although, it may exist), Even, if the
identity element exists (it must be unique) all elements of the ring need not have the
inverses. If all ring elements (with the exception of identity element for addition—
0) have inverses and the multiplication is commutative, the field is obtained. Here it
will be entered more deeply to the theory of finite fields.
Ideals, Residue Classes, Residue Class Ring
The normal (invariant) subgroup is very important notion in the theory of groups.
The corresponding notion in the ring theory is ideal.
Axioms for the ideal:
I.1. Ideal I is the subgroup of the ring’s (R) additive group
I.2. 8i2I; 8r2R : ir^ ri2I:
Example
1. In the ring of all integers, the set of multiples of any particular integer is an ideal.
2. In the ring of polynomials in one variable with integer coefficients, the set of
polynomials which are multiples of any particular polynomial is an ideal.
Cosets can be formed, ideal being a subgroup. These cosets are here called
residue classes. The elements of the ideal form the first row (starting from 0). The
procedure is the same as being explained for the coset decomposition.
i1 = 0 i2 i3 . . .
r1 = r1 + i 1 r1 + i 2 r1 + i 3 . . .
r2 = r2 + i 1 r2 + i 2 r2 + i 3 . . .
. . . . . .
. . . . . .
. . . . . .
The ideal is a normal (invariant) subgroup and the cosets (residue classes) form a
group (factor group) under addition defined as follows
fri g þ rj ¼ ri þ rj ;
where {r} denotes the residue class containing r. If the multiplication is defined as
follows
fri g rj ¼ ri rj ;
it can be shown that the residue classes of a ring with respect to an ideal form also a
ring. This ring is called residue class ring.
Brief Introduction to Algebra II 315
Example In the ring of all integers, the integers which are multiples of 3 form an
ideal. The residue classes are {0}, {1} and {2}. They form the residue class ring—
modulo 3 addition and multiplication. In fact, they are elements of GF(3).
Ideals and Residue Class of Integers
As explained earlier, the set of all integers forms a ring under addition and mul-
tiplication. Let start with some definitions (only integers are considered):
1. If rs = t, it is said that t is divisible by r (and by s as well), or that r (s) divides t,
or that r and s are factors of t.
2. A positive integer p > 1 divisible only by ±p or ±1 is called a prime (integer).
3. The greatest common divisor of two integers, GCD(r, s), is the largest positive
integer dividing both of them.
4. The least common multiple of two integers, LCM(r, s), is the smallest positive
integer divisible by both of them.
5. Two integers are relatively prime if their greatest common divisor equals 1.
For every pair of integers t and d (non-zero integers) there is a unique pair of
integers q (the quotient) and r (the remainder), such that
t ¼ dq þ r; 0 r\jd j:
This is well known Euclid division algorithm. In the following the remainder
will be written as
r ¼ Rd ½t
or
r ¼ tðmod dÞ:
In fact, the last relation means that r and t have the same remainder modulo
d (both can be greater than d)—it is called congruence.
Some interesting properties of the remainders are
Rd ½t þ id ¼ Rd ½t ; for any i;
Rd ½t ¼ Rd ½t ; only quotient changes sign:
For two distinct non-zero integers (r, s) their GCD can always be expressed in
the form
GCDðr; sÞ ¼ ar þ bs;
105 ¼ 1 91 þ 14
91 ¼ 6 14 þ 7
7 ¼ 1 7þ0
Therefore GCD(105, 91) = 7 and, starting at the bottom we obtain
7 ¼ 916 14
7 ¼ 916 ð105 91Þ
7 ¼ 916 105 þ 6 91
7 ¼ 6 105 þ 7 91
Therefore, if one is interested only in remainders, to avoid big integers, they can
be replaced by their remainders at any point in computation.
It can be shown also that every positive integer can be written as a product of
prime number powers.
Using previously explained procedure it can be shown that residue class ring
modulo p is a field if (and only if) p is a prime number. Therefore, the arithmetic in
GF(p) (p—prime number) can be described as addition and multiplication modulo
p. These fields are called also prime fields.
In such a way, the arithmetic in prime fields is defined. To define the arithmetic
in GF(pn) the polynomial rings must be introduced.
Ideals and Residue Classes of Polynomials
Some definitions:
1. A polynomial with one variable x with the coefficients (fi) from (“over”) a field
GF(q) is of the following form
results will also have the coefficients from the field because they are obtained by
addition and multiplication of the field elements. The inverse element for addition is
just the same polynomial having as the coefficients the corresponding inverse
elements from the field. Generally, the inverse elements for multiplication do not
exist, but the division is possible. To obtain the field special conditions should be
imposed.
Some more definitions:
1. A polynomial p(x) of degree n is irreducible if it is not divisible by any poly-
nomial of degree less than n, but is greater than 0.
2. A monic irreducible polynomial of degree of at least 1 is called a prime
polynomial.
3. The greatest common divisor of two polynomials is the monic polynomial of
largest degree which divides both of them.
4. Two polynomials are relatively prime if their greatest common divisor is 1.
5. The least common multiple of two polynomials is the monic polynomial of
smallest degree divisible by both of them.
where the degree of r(x) is less than the degree of d(x). The remainder polynomial
can be written
rðxÞ ¼ RdðxÞ ½tðxÞ :
It can also be called a residue of t(x) when divided by d(x). The corresponding
congruence relation is
t ð xÞ r ð xÞðð d ð xÞÞ;
where the degree of r(x) can be greater than the degree of d(x).
Two important properties of remainders are:
RdðxÞ ½aðxÞ þ bðxÞ ¼ RdðxÞ ½aðxÞ þ RdðxÞ ½bðxÞ
RdðxÞ ½aðxÞbðxÞ ¼ RdðxÞ RdðxÞ ½aðxÞ RdðxÞ ½bðxÞ :
Further, the greatest common divisor of two polynomials r(x) and s(x) can
always be expressed in the form
318 6 Cyclic Codes
where a(x) and b(x) are polynomials (they are not unique!). They can be obtained in
an analogous way as for the corresponding integer relation.
Therefore, there is a parallel between the ring of integers and rings of polynomial
over a field. The word “integer” can be changed to word “polynomial”, “a < b” to
“deg a(x) < deg b(x)”, “prime number” to “irreducible polynomial”.
The nonzero polynomial over a field has a unique factorisation into a product of
prime polynomials (like the factorisation of integers into a product of primes). The
irreducible (prime) polynomial cannot be factored further, because it has no field
elements as the zeros.
Having the preceding note in view, the next conclusion is evident:
The set of polynomials which are multiples of any particular polynomial f(x) is
an ideal. The residue class ring formed from this ideal is called polynomial ring
modulo f(x).
One more note. The number of residue classes in the integer ring modulo d is
just d ({0}, {1}, …, {d − 1}). The number of residue classes in the polynomial ring
modulo d(x) of the degree n over the GF(q) equals the number of all possible
remainders—all possible polynomials—with the degree less equal to n − 1, i.e.
there are qn residue classes.
The Algebra of Polynomial Residue Classes
It can be easily proved that the residue classes of polynomials modulo polynomial f
(x) form a commutative linear algebra of a dimension n over the corresponding
coefficients field GF(q).
Denoting the field element with a, the scalar multiplication is defined as
The other axioms are also easily verified. Among qn residue classes, only n are
linearly independent spanning the vector space. They are, for example
f1g; fxg; x2 ; . . .; xn1 :
Of course,
ff ð x Þ g ¼ f0g ¼ 0
fxf ð xÞg ¼ 0
fx þ f ð xÞg ¼ fxg:
Brief Introduction to Algebra II 319
In the algebra of polynomials modulo f(x) there are ideals—the residue classes
which are multiples of some monic polynomial (class) g(x). But the polynomial
g(x) must divide f(x). It is called the generator (generating) polynomial of the ideal. In
fact, every factor of f(x) is a generator polynomial of an ideal. The factors can be
irreducible polynomials or their products as well. There are no other ideals. The
ideal is a subgroup of the additive group. It is the basis of the corresponding
subspace. Let the degree of f(x) be n and the degree of g(x) be n − k. Then the
dimension of the subspace is k and the linearly independent residue classes are

{g(x)}, {x·g(x)}, ..., {x^(k−1)·g(x)}.
If f(x) = g(x)·h(x), it can also be said that the corresponding subspaces generated
by g(x) and h(x) are null-spaces of each other.
In the following exposition, the braces "{·}" will usually be omitted.
The "value" of a polynomial for any field element can be easily computed. For example,
for p1(x) = 1 + x + x^2 and p2(x) = 1 + x over GF(2), p1(0) = p1(1) = 1, while p2(0) = 1 and p2(1) = 0.
Consider now the multiplication of polynomials modulo x^n − 1. If

a(x) = a_0 + a_1·x + ... + a_(n−1)·x^(n−1)
b(x) = b_0 + b_1·x + ... + b_(n−1)·x^(n−1),

then

c(x) = a(x)·b(x) = c_0 + c_1·x + ... + c_(n−1)·x^(n−1),

because

x^(n+j) = x^j,

so that the coefficients are

c_j = Σ_(i+k=j) a_i·b_k + Σ_(i+k=n+j) a_i·b_k.
The first part of the expression corresponds to the terms whose sum of indices is j
and the second part to the terms whose sum is n + j. The coefficient c_j
can be rewritten as an inner product

c_j = (a_0, a_1, ..., a_j, a_(j+1), a_(j+2), ..., a_(n−1)) · (b_j, b_(j−1), ..., b_0, b_(n−1), b_(n−2), ..., b_(j+1)).
It should be noted that the second vector is obtained by taking the coefficients of
b(x) in reverse order and shifting them cyclically j + 1 positions to the right. Therefore, if
a(x)b(x) = 0, then the vector corresponding to a(x) is orthogonal to the vector cor-
responding to b(x) with the order of its components reversed, as well as to every
cyclic shift of this vector. Of course, the multiplication is commutative; therefore,
a(x)b(x) = 0 ⇒ b(x)a(x) = 0, and the same rule can be applied keeping the components
of b(x) in normal order and reversing and shifting the components of a(x).
Having the cyclic codes in view, the residue classes of polynomials modulo the
polynomial x^n − 1 will be almost exclusively considered in the following.
Example
Let f(x) = 1 + x^3 = (1 + x)(1 + x + x^2) over GF(2). The residue classes are (in
the first column the corresponding vectors are written):

(000)  {0}             1 + x^3              x + x^4              ...
(100)  {1}             x^3                  1 + x + x^4          ...
(010)  {x}             1 + x + x^3          x^4                  ...
(110)  {1 + x}         x + x^3              1 + x^4              ...
(001)  {x^2}           1 + x^2 + x^3        x + x^2 + x^4        ...
(101)  {1 + x^2}       x^2 + x^3            1 + x + x^2 + x^4    ...
(011)  {x + x^2}       1 + x + x^2 + x^3    x^2 + x^4            ...
(111)  {1 + x + x^2}   x + x^2 + x^3        1 + x^2 + x^4        ...
The first row is the ideal, the other rows are the residue classes. The elements of any
row have the same remainder after dividing by f(x). The corresponding basis is {1}, {x}
and {x^2}. f(x) is the product of two factors—1 + x and 1 + x + x^2. Let
g(x) = 1 + x. The elements of the corresponding ideal are {0}, {1 + x},
{x·g(x)} = {x + x^2}, as well as their sum {(1 + x)g(x)} = {1 + x} + {x + x^2} = {1 + x^2}.
This ideal is the null-space for the ideal generated by h(x) = 1 + x + x^2, whose elements
are {0} and {1 + x + x^2}. The corresponding orthogonality can be easily verified.
Galois Fields
Let p(x) be a polynomial with coefficients in a field F. The algebra of polynomials
modulo p(x) is a field if (and only if) p(x) is irreducible in F. If p(x) is of degree n, the
obtained field is called the extension field of degree n over F, while the field F is called
the ground field. The extension field contains, as residue classes, all the elements of
the ground field. It is said that the extension field contains the ground field. (Note: some
authors, instead of irreducible polynomials, consider only monic irreducible polyno-
mials, called prime polynomials—in the case of GF(2) there is no difference.)
The residue classes modulo prime number p form GF(p) of p elements. The ring
of polynomials over any finite field has at least one irreducible polynomial of every
degree. Therefore, the ring of polynomials over GF(p) modulo an irreducible
polynomial of degree n is the finite field of pn elements—GF(pn).
The existence of inverse elements in the field GF(p^n) is proved in an analogous
way as in the field GF(q), using now the relation

a(x)·s(x) + b(x)·p(x) = GCD[s(x), p(x)] = 1,

which holds for every nonzero polynomial s(x) of degree smaller than n, since p(x) is irreducible.
There are no other finite fields. Moreover, finite fields with the same number
of elements are isomorphic—they have the same operation tables. The difference
is only in the way of naming their elements.
The fields GF(p^n) are said to be fields of characteristic p. In GF(p), p = 0
(mod p); therefore, in an extension field GF(p^n), p = 0 as well. Then

(a + b)^p = Σ_(i=0)^(p) (p choose i)·a^i·b^(p−i) = a^p + b^p,

because all binomial coefficients except (p choose 0) and (p choose p) have p (= 0) as a factor.
A few definitions and theorems follow (without proof!):
1. Let F be a field and let a be an element of an extension field of F. The
irreducible polynomial m(x) of the smallest degree over F with m(a) = 0 is
called the minimal polynomial of a over F.
2. The minimal polynomial always exists and it is unique.
3. Every element of the extension field has a minimal polynomial.
4. If f(x) is the polynomial over F and if f(a) = 0, then f(x) is divisible by the
corresponding minimal polynomial m(x).
For example, the minimal polynomial for the “imaginary unit” j from the field of
complex numbers (being extension of the field of real numbers) is x2 + 1, whose
coefficients are from the field of real numbers.
The Multiplicative Group of a Galois Field
Let G be any finite group. Consider the set of elements formed by any element
(g) and its powers
g, g·g = g^2, g·g^2 = g^3, ...

Since the group is finite, these powers cannot all be different, i.e. for some i < j

g^i = g^j  ⇒  1 = g^(j−i).
The conclusion is that some power of g equals 1. Let e be the smallest such
positive integer (ge = 1). Then e is called the order of the element g. Obviously, the
set of elements
1, g, g^2, ..., g^(e−1)
forms a subgroup. There are e elements in the subgroup—the order of any field
element divides the order of the group (Lagrange’s theorem). A group consisting of all
the powers of one of its elements is called a cyclic group. Such an element is called a
primitive element. There can be more distinct primitive elements in the group.
The multiplicative group in the Galois field is cyclic—all the elements of the
field (except 0) are the powers of an element (primitive). Therefore, the Galois field
has a primitive element. The order of the primitive element equals the number of
elements of the multiplicative group—p^n − 1.
The following definition will be useful:
A primitive polynomial p(x) (also called primitive irreducible polynomial) over
Galois field is a prime polynomial over the same field with the property that in the
extension field obtained modulo p(x), the field element represented by x is primi-
tive. The conclusion is that all the field elements can be obtained by the corre-
sponding exponentiation. Because {p(x)} = 0, one can say that the primitive
element α = {x} is a root of the primitive polynomial, p(α) = 0.

Example Let p(x) = x^4 + x + 1 over GF(2) and α = {x}. Every power of α is reduced
by dividing the corresponding power of x by p(x) and keeping the remainder, e.g.

x^7 : (x^4 + x + 1) = x^3 + 1
x^7 + x^4 + x^3
      x^4 + x^3
      x^4 + x + 1
      x^3 + x + 1      ⇒      α^7 = α^3 + α + 1

The nonzero elements of GF(2^4) obtained in this way are:
(1000)  α^0  = 1                      (1010)  α^8  = 1 + α^2
(0100)  α^1  = α                      (0101)  α^9  = α + α^3
(0010)  α^2  = α^2                    (1110)  α^10 = 1 + α + α^2
(0001)  α^3  = α^3                    (0111)  α^11 = α + α^2 + α^3
(1100)  α^4  = 1 + α                  (1111)  α^12 = 1 + α + α^2 + α^3
(0110)  α^5  = α + α^2                (1011)  α^13 = 1 + α^2 + α^3
(0011)  α^6  = α^2 + α^3              (1001)  α^14 = 1 + α^3
(1101)  α^7  = 1 + α + α^3            (1000)  α^15 (= α^0) = 1
and the last row in the table is added to show the cyclic structure only.
The addition in the Galois field GF(pn) can be easily carried out by adding the
corresponding coefficients in GF(p). On the other hand, the multiplication is more
complicated. After multiplying the polynomials (residue classes) the obtained result
should be divided by the corresponding primitive polynomial and the remainder taken
as a final result. The representation of the field elements by the powers of the primitive
element will ease the task. For the preceding example, when multiplying {x^2 + x} = α^5
by {x^3 + x^2 + x} = α^11, the result is α^5·α^11 = α^16 = α^15·α = 1·α = α (= {x}).
The addition using primitive element is also possible by introducing Zech logarithms.
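The table above, and the product α^5·α^11 = α, can be reproduced by a few lines of Python (an added illustrative sketch; the names exp_table, log_table and gf16_mul are not from the book), using the primitive polynomial p(x) = x^4 + x + 1:

    PRIM = 0b10011          # p(x) = x^4 + x + 1, primitive over GF(2)
    N = 15                  # order of the multiplicative group of GF(2^4)

    # exp_table[i] = alpha^i as a 4-bit number (bit k is the coefficient of alpha^k)
    exp_table = [1]
    for _ in range(N - 1):
        v = exp_table[-1] << 1               # multiply by alpha
        if v & 0b10000:                      # degree 4 appeared: reduce modulo p(x)
            v ^= PRIM
        exp_table.append(v)
    log_table = {v: i for i, v in enumerate(exp_table)}

    def gf16_mul(x, y):
        """Multiply two elements via the exponent (log) representation."""
        if x == 0 or y == 0:
            return 0
        return exp_table[(log_table[x] + log_table[y]) % N]

    a5, a11 = exp_table[5], exp_table[11]
    print(bin(gf16_mul(a5, a11)), bin(exp_table[1]))   # both are alpha (0b10)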
The next theorems are also important:
1. All the p^n − 1 nonzero elements of GF(p^n) are roots of the polynomial
x^(p^n − 1) − 1. The element 0 can be included by considering the polynomial

(x − 0)·(x^(p^n − 1) − 1) = x^(p^n) − x.
2. If m(x) is the minimal polynomial of degree r over GF(p) of an element b ∈ GF(p^n),
then m(x) is also the minimal polynomial of b^p.
In fact, the r elements

b, b^p, b^(p^2), ..., b^(p^(r−1))
are all the roots of m(x). These elements are, by analogy with the fields of real and
complex numbers, called conjugates. All conjugates have the same order. Let
e denote the order of b, i.e. b^e = 1. Then

(b^(p^j))^e = b^(e·p^j) = (b^e)^(p^j) = 1^(p^j) = 1.
There are no other roots of m(x), because b is also a root of x^(p^r − 1) − 1, i.e.
b^(p^r − 1) − 1 = 0 ⇒ b^(p^r − 1) = 1. Therefore, b^(p^r) = (b^(p^(r−1)))^p = b·b^(p^r − 1) = b·1 = b. Let us
remind that the coefficients of m(x) are from the ground field GF(p).
Let a be a primitive element in a Galois field, then all the conjugates are also the
primitive elements because they all have the same order.
Example Find the minimal polynomial for α ∈ GF(2^4) over GF(2). The conjugates
are

α, α^2, α^4, α^8,

so the minimal polynomial is m1(x) = (x + α)(x + α^2)(x + α^4)(x + α^8) = x^4 + x + 1,
which can be verified directly:

m1(α)   = α^4 + α + 1   = 1 + α + α + 1     = 0
m1(α^2) = α^8 + α^2 + 1 = 1 + α^2 + α^2 + 1 = 0
m1(α^4) = α^16 + α^4 + 1 = α + 1 + α + 1    = 0    (α^15 = 1 ⇒ α^16 = α)
m1(α^8) = α^32 + α^8 + 1 = α^2 + 1 + α^2 + 1 = 0   (α^30 = 1 ⇒ α^32 = α^2).
In the same way, the minimal polynomial of α^3 is
m2(x) = (x + α^3)(x + α^6)(x + α^12)(x + α^24) = (x + α^3)(x + α^6)(x + α^12)(x + α^9)
      = x^4 + x^3 + x^2 + x + 1,
and all the nonzero elements of GF(2^4) are the roots of x^15 + 1 [α^0 = 1 is an element
both of GF(2) and GF(2^4)].
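The product defining m2(x) can be expanded mechanically in GF(2^4). The following self-contained Python sketch (added here for illustration; names are arbitrary) rebuilds the power table of α and multiplies the four factors:

    # Rebuild GF(2^4) as powers of alpha (p(x) = x^4 + x + 1), then expand the product.
    exp_table = [1]
    for _ in range(14):
        v = exp_table[-1] << 1
        if v & 0b10000:
            v ^= 0b10011
        exp_table.append(v)
    log_table = {v: i for i, v in enumerate(exp_table)}

    def gf16_mul(x, y):
        return 0 if 0 in (x, y) else exp_table[(log_table[x] + log_table[y]) % 15]

    def gf16_poly_mul(p, q):
        """Multiply polynomials with GF(2^4) coefficients (p[i] is the coefficient of x^i)."""
        r = [0] * (len(p) + len(q) - 1)
        for i, pi in enumerate(p):
            for j, qj in enumerate(q):
                r[i + j] ^= gf16_mul(pi, qj)      # addition in GF(2^4) is bitwise XOR
        return r

    m2 = [1]
    for e in (3, 6, 12, 9):
        m2 = gf16_poly_mul(m2, [exp_table[e], 1])  # multiply by the factor (x + alpha^e)
    print(m2)    # [1, 1, 1, 1, 1], i.e. m2(x) = 1 + x + x^2 + x^3 + x^4

All coefficients of the result lie in GF(2), as they must for a minimal polynomial over the ground field.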
It should be noted that α^7 and its conjugates are also primitive elements:

α^7, (α^7)^2 = α^14, (α^7)^3 = α^21 = α^6, (α^7)^4 = α^28 = α^13, α^5, α^12, ..., (α^7)^14 = α^98 = α^8, (α^7)^15 = α^105 = α^0 = 1.
If the field table is formed starting from α^7 (or any other primitive element) instead of α, the rules of addition and
multiplication of field elements will always be the same. As stated earlier, all finite
fields of the same order are isomorphic. When forming the new table, only the field
elements are denoted by different symbols.
It should be noted also that conjugate non-primitive elements form a corre-
sponding subgroup. The multiplicative group in GF(2^4) has the following sub-
groups (the order of a subgroup is a divisor of the order of the group, 15 = 1·3·5):

α^0 = 1
α^3, α^6, α^9, α^12, α^15 = α^0 = 1
α^5, α^10, α^15 = α^0 = 1.
Chapter 7
Convolutional Codes and Viterbi
Algorithm
In parallel with the block codes, the convolutional codes are the second important
family of error correcting codes. They are the most important subset of so called
tree codes, where some additional limitations are imposed (finite memory order,
time invariance, linearity). They were first introduced in the “systematic” form by
Elias in 1955 [46] as an alternative to block codes.
During block encoding, a block of k information symbols is represented by a
code word of length n. In the case of systematic code, information symbols are not
changed and n − k symbols are added. If information bits (symbols) are statistically
independent the obtained code words will be statistically independent as well. From
theory, it is known that good results can be obtained if the code words are relatively
long. On the other hand, for a longer code word encoders, and especially decoders,
are more and more complex. During convolutional encoding, to obtain n channel
bits, parity checks added do not depend on the considered k information bits only,
but as well on m previous k-tuples of information bits. Therefore, the statistical
dependence is introduced not for n channel bits, but for (m + 1)n channel bits.
In other words, if the channel bits are observed using a "window" of length
(m + 1)n, the statistical dependence will be found. In the next step, the window
“slides” for n bits, and the observed bits are statistically dependent as well. The
encoding procedure can be interpreted as follows. The input signal (bits) consisting
of blocks of k bits enters into the linear system whose impulse response is m + 1
blocks “long”. At its output, the blocks of n symbols are generated and the output
signal is the convolution of the input signal and the system impulse response. The
name convolutional code originates from this interpretation. In such way, and for
smaller values of k and n, the longer “code words” can be obtained, i.e. one would
expect that good performance will be obtained. Often, the codes with k = 1 are
used, i.e. with code rate R = 1/n. The convolutional codes can be systematic as well
as nonsystematic ones.
R_eff = kL / [n(L + m − 1)] < R,

ν = n(m + 1),
Fig. 7.1 General convolutional encoder scheme (buffer, memory of k-bit blocks and logic forming n output bits)
where m is length of the longest shift register in memory (there is no need for all
shift registers to be the same length). It can be interpreted as the maximum number
of encoder output bits that can be affected by a single information bit.
As said previously, if the incoming information bits are not changed during
encoding, a systematic code is obtained. In Fig. 7.2 a nonsystematic convolutional
encoder is shown. If there is no feedback in the encoder, it is not recursive (as in
Fig. 7.2) (Problems 7.3, 7.8). The parameters are k = 1, n = 2, m = 2, R = 1/2 and
ν = 6. The encoder input is denoted as x and the outputs as y1 and y2.
The outputs are obtained by adding the bit at the encoder input and the bits in the
corresponding cells of the shift register. These are denoted by s1 and s2, being the
components of the state vector, i.e. the encoder state is S = [s1 s2]. This encoder has
four states corresponding to dibits or to their decimal equivalents (00 = 0, 01 = 1,
10 = 2, 11 = 3). A convolutional encoder can be considered as a finite state auto-
maton. Its functioning can be represented by a table or by a state diagram. If the state
vector components in the next moment are denoted by s'1 and s'2, the corresponding
equations are

s'1 = x,  s'2 = s1,  y1 = x ⊕ s1 ⊕ s2,  y2 = x ⊕ s2.
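These equations translate directly into a few lines of Python. The sketch below (added here for illustration; the function name is arbitrary) encodes an input sequence with the encoder of Fig. 7.2 starting from the reset state:

    def conv_encode(bits):
        """Encoder of Fig. 7.2: y1 = x + s1 + s2, y2 = x + s2 (mod 2), state S = [s1 s2]."""
        s1 = s2 = 0
        out = []
        for x in bits:
            y1 = x ^ s1 ^ s2
            y2 = x ^ s2
            out += [y1, y2]
            s1, s2 = x, s1          # next state: s1' = x, s2' = s1
        return out

    print(conv_encode([1, 0, 1, 0]))   # -> [1,1, 1,0, 0,0, 1,0], i.e. 11 10 00 10

For the input 1010 this reproduces the output sequence 11100010 quoted below.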
Fig. 7.2 Nonsystematic convolutional encoder with k = 1, n = 2, m = 2 (input x, outputs y1 and y2)
Fig. 7.3 State diagram of the encoder shown in Fig. 7.2
The corresponding state diagram (Problems 7.1–7.3) is shown in Fig. 7.3 (state
diagram is explained in Chap. 2). The same encoder is used in Problem 7.3, but
there Viterbi algorithm is illustrated.
Along the branches in state diagram, the bit entering the encoder is denoted as
well as the bits at the encoder output. Historically, for the first description of
convolutional encoder the code tree was used (code tree is explained in Chap. 2). In
Fig. 7.4 a corresponding code tree for the considered example is shown, and the
path (heavy lines) corresponding to sequence 1010 at the encoder input (when the
encoder is reset). The corresponding states sequence is 00-10-01-10-01-…, while at
the encoder output, the sequence is 11100010…. Therefore, to every input (and
output) sequence corresponds a unique path in the state diagram.
The main drawback of the code tree is that the number of nodes grows expo-
nentially. To follow the system “behavior” some kind of “dynamic” state diagram
can be used—trellis (explained as well in Chap. 2). The corresponding trellis is
shown in Fig. 7.5.
There exist other useful descriptions of convolutional encoders. Consider
firstly the polynomial description (Problem 7.1). Every memory cell is a part of a
delay circuit. By introducing the delay operator D ("delay") or z^−1, denoting a delay
of one shift, the following can be written

s1 = Dx,  s2 = Ds1,
y1 = x ⊕ s1 ⊕ s2 = x(1 + D + D^2),
y2 = x ⊕ s2 = x(1 + D^2).
Therefore, the encoder can be completely described using two generator poly-
nomials [47]
Fig. 7.4 Code tree of the encoder from Fig. 7.2 (states a = 00, b = 01, c = 10, d = 11; x = 0 corresponds to the upper branch of every node and x = 1 to the lower one); the heavy path corresponds to the input sequence 1010

Fig. 7.5 Trellis of the encoder from Fig. 7.2
g1(D) = 1 + D + D^2,   g2(D) = 1 + D^2.

For example, for the input sequence x = (101),

x(D) = 1·D^0 + 0·D^1 + 1·D^2 = 1 + D^2,

the output sequences are y1(D) = x(D)·g1(D) and y2(D) = x(D)·g2(D), and the corresponding
output sequence, after multiplexing the obtained sequences, is

y = 11 10 00 10 11.
In binary form, the generator sequences are

g1 = (111),   g2 = (101).

In the literature, the octal equivalent is sometimes used. Groups of three bits (from left
to right) are written as octal symbols. If the number of bits is not divisible by
three, the corresponding number of zeros is added at the beginning of the sequence (some
authors put the zeros at the sequence end!). In the considered example, the generator
matrix is G_OCT = [7, 5].
The "impulse response" can be used as well. For the considered example, for a
single 1 at the input, the sequence 111 is obtained at the first output and 101 at the second one.
Generally, for the input sequence

i = (i_0, i_1, ..., i_l, ...),

the corresponding output sequences (encoder with two outputs) are the corre-
sponding convolutions of the input sequence with the impulse responses,

v^(1) = (v^(1)_0, v^(1)_1, ..., v^(1)_l, ...),   v^(2) = (v^(2)_0, v^(2)_1, ..., v^(2)_l, ...).
For the previous example (Fig. 7.2), for the input sequence x = 101, the output
sequence is y = 1110001011:

1        11 10 11
0           00 00 00
1              11 10 11
Sum:     11 10 00 10 11
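The same output can be obtained by convolving the input with the two impulse responses and multiplexing the results. The following short Python sketch (added here; consistent with the table above, names are arbitrary) does exactly that:

    def conv_gf2(x, g):
        """Convolution of binary sequences over GF(2)."""
        y = [0] * (len(x) + len(g) - 1)
        for i, xi in enumerate(x):
            if xi:
                for j, gj in enumerate(g):
                    y[i + j] ^= gj
        return y

    x = [1, 0, 1]
    g1, g2 = [1, 1, 1], [1, 0, 1]
    v1, v2 = conv_gf2(x, g1), conv_gf2(x, g2)
    y = [b for pair in zip(v1, v2) for b in pair]     # multiplex the two output streams
    print(''.join(map(str, y)))                       # 1110001011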
It should be noted as well that shift registers are not uniformly drawn in liter-
ature. For example, two encoders shown in Fig. 7.6 are equivalent.
Fig. 7.6 Two equivalent ways of drawing the same convolutional encoder
Now, the decoding problem will be considered. The following procedures are of
interest:
– majority logic decoding;
– sequential decoding;
– Viterbi algorithm.
It should be mentioned that majority logic decoding and Viterbi algorithm can be
used for block codes decoding as well (Chaps. 8 and 9). Majority logic decoding
and sequential decoding are specific for error control, while Viterbi decoding is
universal. Viterbi decoding and sequential decoding use trellis.
Majority logic decoding is a special case of threshold decoding elaborated in
details by Massey [48] having in mind convolutional encoding. However, this
approach was used earlier, especially for Reed-Muller codes [49, 50] (Chap. 5).
Hard decision and soft decision can be applied as well. Threshold decoding can be
applied for some codes classes only. Further, majority logic decoding is possible
only if hard decision is used.
Sequential decoding can be used for a very long constraint lengths. It was
proposed by Wozencraft [51]. Fano [52] proposed a new algorithm now known as
Fano algorithm. Later, Zigangirov [53] and Jelinek [54] (separately) proposed a
new version of decoding procedure called now stack algorithm. The idea of
sequential decoding is very simple. Decoder slides over only one path in trellis
measuring distance (hard or soft) of the received sequence and this path. If it is
“satisfying”, the received sequence is decoded and the process starts anew. If the
distance grows up over some threshold, decoder returns to some node and starts
considering the other branch. Threshold should be adjusted during the decoding.
This adjusting and mode of sliding form in fact the corresponding algorithm.
Convolutional codes and sequential decoding were used in some space missions
(Pioneer Program, Voyager Program, Mars Pathfinder).
However, the Viterbi algorithm is the most interesting and will be exclusively
considered and explained in details in this chapter. Viterbi proposed his algorithm
more as a pedagogical device, but it has found many applications. Here the
important code characteristic is called free distance (dfree) (Problem 7.2). It is a
minimum weight of encoded sequence whose first information block is nonzero.
Viterbi algorithm is a maximum likelihood decoding. It moves through the trellis
keeping only the best path (survivor) (Problem 7.3) to each state. It means that the
corresponding metric is used—usually Hamming distance (Problems 7.3–7.6) or
Euclidean (squared) distance (Problems 7.5–7.7). The signal used for metric cal-
culation can be quantized. If there are only two levels it results in a Hamming
distance (hard decision), otherwise the Euclidean metric is used (soft decision).
The state diagram can be modified by splitting the initial state into two states
(Problem 7.2) to find a modified transfer (generator) function, from which the
corresponding weight spectrum can be found. It can be used to find the bounds of
residual error probability. Consider convolutional encoder shown in Fig. 7.2, which
has the transfer function matrix G(D) = [g1(D), g2(D)] = [1 + D + D^2, 1 + D^2].
Fig. 7.7 Original (a) and modified (b) state diagram of the considered encoder
By eliminating S1, S2 and S3, the corresponding relation between the two zero states
is obtained

S_in = D^2·L · [LD·LND^2·(1 − LND)] / [(1 − LND)·(1 − NLD − NL^2·D)] · S_out,

yielding

T(L, N, D) = S_in / S_out = L^3·N·D^5 / [1 − NLD(1 + L)].
By using the identity

1/(1 − x) = Σ_(i=0)^(∞) x^i = 1 + x + x^2 + x^3 + x^4 + ...,

the transfer function can be expanded into the series

T(L, N, D) = L^3·N·D^5 + L^4·N^2·D^6 + L^5·N^2·D^6 + L^5·N^3·D^7 + ...,

where an element m·L^l·N^n·D^d shows that there are m paths of weight d, formed by
l input bits, among which there are n ones.
Denoting by a_(d,n) the number of paths having weight d, generated by an input
of weight n, a modified generating function for L = 1 can be written in the form

T(N, D) = Σ_(d=dfree)^(∞) Σ_(n=1)^(∞) a_(d,n)·N^n·D^d,

where the total number of input ones on the paths having weight d, denoted by c_d,
can be found from

∂T(D, N)/∂N |_(N=1) = Σ_(d=dfree)^(∞) Σ_(n=1)^(∞) n·a_(d,n)·D^d = Σ_(d=dfree)^(∞) c_d·D^d   ⇒   c_d = Σ_(n=1)^(∞) n·a_(d,n).
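For this encoder the first spectrum coefficients a_d and c_d can also be obtained by a brute-force search over the trellis paths that leave the zero state and return to it. The Python sketch below is an added illustration (not the book's procedure; names are arbitrary) and reproduces the values used later:

    from collections import Counter

    def next_state_and_output(state, x):
        s1, s2 = state
        return (x, s1), (x ^ s1 ^ s2, x ^ s2)          # encoder of Fig. 7.2

    def spectrum(max_len=12, max_d=9):
        a, c = Counter(), Counter()
        # paths start with input 1 (first information block nonzero) from state 00
        stack = [((1, 0), 2, 1, 1)]                     # (state, weight d, input ones n, length l)
        while stack:
            state, d, n, l = stack.pop()
            if state == (0, 0):                         # path merged back into the zero state
                if d <= max_d:
                    a[d] += 1
                    c[d] += n
                continue
            if l >= max_len or d > max_d:
                continue
            for x in (0, 1):
                ns, (y1, y2) = next_state_and_output(state, x)
                stack.append((ns, d + y1 + y2, n + x, l + 1))
        return a, c

    a, c = spectrum()
    print(sorted(a.items()))   # a_5 = 1, a_6 = 2, a_7 = 4, ...
    print(sorted(c.items()))   # c_5 = 1, c_6 = 4, c_7 = 12, ...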
Trellis corresponding to the considered encoder is shown in Fig. 7.8, while the
weight spectrum is given in Table 7.2.
The first nonzero component is obtained for d = 5, therefore dfree = 5. It cor-
responds as well to the above obtained infinite series where the first nonzero term is
L^3·N·D^5. In this code k = 1, the minimum distance after the first (i + 1) information bits
(d_i, i = 0, 1, 2, …) is d_0 = 0, d_1 = 0, d_2 = 5, d_3 = 5, …. Therefore d_m = d_2 = 5, and
the code can correct up to e_c = (d_free − 1)/2 = 2 errors on one constraint length
(6 bits).
The next question is what will happen if the number of errors is greater than code
correcting capability. If there are no errors at further received bits, the decoder will
return to the correct path. The path part comprising errors is called “error event”.
However, some bits during the error event will be correctly decoded. The standard
measure of error correcting capability is BER (“Bit Error Rate”).
Performance of convolutional code can be found using the weight spectrum. For
the above example suppose that the encoder emits all zeros sequence and that the
error vector transforms this sequence into some possible sequence. Let this
sequence is 111011. According to Fig. 7.9 this sequence is valid. Generally, the
decision error will appear if three or more bits of the received sequence differ from
the emitted sequence. It should be noted that the error at the fourth bit (0 for both
paths) does not affect the decision. Because of that the summing is up to d = 5.
For a BSC with crossover probability p, the probability of choosing the path with
Hamming weight d = 5 (denoted by P5) is

P_5 = Σ_(e=3)^(5) (5 choose e)·p^e·(1 − p)^(5−e).
For d = 6, 3 bit errors will yield the same probability for right and wrong
decision, while 4 and more bit errors will always result in a wrong decision, yielding

P_6 = (1/2)·(6 choose 3)·p^3·(1 − p)^3 + Σ_(e=4)^(6) (6 choose e)·p^e·(1 − p)^(6−e).
Generally, the path error event probability for a path having weight d is

P_d = Σ_(e=(d+1)/2)^(d) (d choose e)·p^e·(1 − p)^(d−e),                                          d odd,

P_d = (1/2)·(d choose d/2)·p^(d/2)·(1 − p)^(d/2) + Σ_(e=d/2+1)^(d) (d choose e)·p^e·(1 − p)^(d−e),   d even.
This probability can be upper bounded (for both even and odd d values) [55]

P_d = Σ_(e=(d+1)/2)^(d) (d choose e)·p^e·(1 − p)^(d−e) < Σ_(e=(d+1)/2)^(d) (d choose e)·p^(d/2)·(1 − p)^(d/2) < 2^d·p^(d/2)·(1 − p)^(d/2).
Taking into account the number of paths which have weight d (a_d), the
average probability of choosing a wrong path does not depend on the node where the
decision is made, being

P_e ≤ Σ_(d=dfree)^(∞) a_d·P_d < Σ_(d=dfree)^(∞) a_d·[2·√(p(1 − p))]^d = T(D)|_(D=2√(p(1−p))).
Having in mind that p is usually a small number (p ≪ 0.5), the previous sum can be
calculated on the basis of its first term as follows:

P_e ≈ a_dfree·[2·√(p(1 − p))]^dfree.
This analysis can be used for BER calculation. In fact, for every wrong decision, the
number of bits in error equals the number of ones on the wrong path. Denoting this
number by c_d (the total number of ones on all sequences with weight d) and
denoting by k the number of information bits entering the encoder in one time
unit, the BER is upper bounded as follows

P_b < (1/k)·Σ_(d=dfree)^(∞) c_d·P_d.
By using the above expression for P_d, this expression can be further simplified

P_b < (1/k)·Σ_(d=dfree)^(∞) c_d·P_d < (1/k)·Σ_(d=dfree)^(∞) c_d·[2·√(p(1 − p))]^d = (1/k)·∂T(D, N)/∂N |_(N=1, D=2√(p(1−p))),
and if it is supposed that the first term in the sum is dominant, a simple expression
for the hard decision Viterbi algorithm is obtained [25] (Problem 7.5)

P_b < (1/k)·c_dfree·2^dfree·p^(dfree/2).

For the above considered example dfree = 5, a_dfree = 1 and c_dfree = 1, and for
p = 10^−2 and k = 1 it is obtained P_b < 2^5·(10^−2)^(5/2) = 3.2·10^−4.
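The quoted figure is easily checked numerically (a short added verification, not from the book):

    k, dfree, c_dfree, p = 1, 5, 1, 1e-2
    Pb_bound = c_dfree * 2**dfree * p**(dfree / 2) / k
    print(Pb_bound)        # about 3.2e-4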
Consider the system using BPSK and coherent demodulation with binary
quantization of the demodulator output, modeled as a BSC. Denoting by Eb/N0 the ratio
of the energy per information bit and the noise power spectral density, the error probability
without encoding is

p = (1/2)·erfc(√(Eb/N0)) ≈ (1/2)·e^(−Eb/N0),

yielding

P_b < (1/2)·c_dfree·2^dfree·e^(−(dfree/2)·(Eb/N0)),
and
P_b ≤ (1/2)·Σ_(d=dfree)^(∞) c_d·erfc(√(R_c·d·Eb/N0)) ≤ (1/k)·∂T(D, N)/∂N |_(N=1, D=exp(−R·Eb/N0)).
If the initial encoder state is not known (Problem 7.6), the algorithm starts from
all possible states. If the synchronization is lost, decoder and encoder can be
resynchronized after some steps, if there were no channel errors. One of drawbacks
of Viterbi algorithm is variable rate (Problem 7.4) of the decoded bits at the decoder
output. It can be overcome by the so-called fixed delay (Problem 7.6). The decoding is
performed until some trellis depth (defined by fixed delay) and then the decisions are
made. An interesting class are punctured convolutional codes (Problem 7.7), where
some output bits of the originating code are omitted (“punctured”). It can result in the
encoder having the same code rate and better error correcting capability.
There are two ways to approach the limiting performance of digital communi-
cation systems. In the case when the signal-to-noise ratio is limited (i.e. where the
average signal power is limited), the performance can be improved by using binary
error control coding including the adequate frequency band broadening. When the
frequency band is limited, performance can be improved using multilevel signaling
and the corresponding more powerful transmitter. For the last case an interesting
procedure is TCM (Trellis Coded Modulation) (Problem 7.8) being a combination of
convolutional encoder and modulator whose parameters are simultaneously opti-
mized (e.g. the mapping information bit combinations into a modulated signal
levels). For decoding the Euclidean metric is used. The first mention of this possi-
bility was in 1974 by Massey [56], while Ungerboeck later, starting from 1976,
elaborated the idea in many articles [57–60]. Coding and modulation are jointed here.
The next step is the substitution of code symbols by the channel signals. In
Fig. 7.10 some one-dimensional and two-dimensional signal constellations are
shown. One dimension corresponds to the baseband transmission, but two
dimensions (complex numbers) corresponds to modulation (multilevel PSK,
QAM). To compare the efficiency of coding schemes the notion of asymptotic
coding gain is often used. For soft Viterbi algorithm decoding it is (Problem 7.8)
Ga = 20·log(dfree/dref) [dB],
where dfree is free Euclidean distance and dref is minimal Euclidean distance without
the error control coding (for the same signal-to-noise ratio).
Fig. 7.10 Some one-dimensional (binary, quaternary, eight-level) and two-dimensional (QPSK—4QAM rotated by 90°, 8PSK, 16QAM) signal constellations

Fig. 7.11 QPSK constellation with Gray mapping and the corresponding "degenerated" trellis
Consider now digital communication system using QPSK where every level
(carrier phase) corresponds to two information bits. Let the error probability is
higher than allowed. What to do? One way is to use the higher transmission power.
However, there are also some other ways. One is to use convolutional code R = 2/3,
with the same power. Error probability will be smaller, but the information rate will
be smaller as well. The other way is to use 8PSK and the same convolutional code
without changing the power nor bandwidth. In Fig. 7.11 the constellation for QPSK
signal (without error coding) is shown and the corresponding “degenerated trellis”.
Gray code is used. There is no memory in the signal and all four branches enter
into the same node. Error probability is determined by the referent (minimal) distance
d_1 = √2.
Let augment the number of levels (phases) to eight. Every phase corresponds to
three bits. In Fig. 7.12 the corresponding constellation is shown with the corre-
sponding distances denoted.
Consider now 8PSK and convolutional coding R = 2/3. Two equivalent encoder
configurations are shown in Fig. 7.13 and the part of the corresponding trellis. The
encoder is systematic (easily seen from the second variant). From every node four
branches (paths) are going out (corresponding to different information bits combi-
nations—last two bits in tribit) and four paths enter into every node. It should be
noticed that every nonsystematic convolutional encoder [if it is not catastrophic
(Problem 7.4)] can be made equivalent to a systematic encoder, if feedback is allowed.
The corresponding TCM modulator is shown in Fig. 7.14.
By comparing the paths going from node (0) to node (0) one can find that
the distance of the other paths from the all-zeros path, over the node (1), is
d(1) = √(d_0^2 + d_1^2), over the node (2) d(2) = √(d_2^2 + d_3^2) and over the node
(3) d(3) = √(d_3^2 + d_0^2). Therefore, the free distance is dfree = d(1) = √(d_0^2 + d_1^2), and
the code gain is

Ga = 10·log[(d_0^2 + d_1^2)/d_1^2] = 10·log[(2 − √2 + 2)/2] = 1.1 dB.
Fig. 7.12 8PSK signal constellation, bit combinations and Euclidean distances
(d_0 = √(2 − √2), d_1 = √2, d_2 = √(2 + √2), d_3 = 2)
Fig. 7.13 Two equivalent schemes of the convolutional encoder and a part of the corresponding trellis
Fig. 7.14 TCM modulator—convolutional encoder followed by a signal selector mapping the output bits to the 8PSK phases (e^j0, e^jπ/4, e^jπ/2, ...)
The encoder can be catastrophic if the generator polynomials are not suitably chosen, i.e. if they have common factors.
At the end of convolutional decoding a few bits carrying no information (tail bits)
(Problem 7.8) are added to provide the trellis termination in the initial state.
However, this number is very small compared to the number of bits usually
transferred in one file. Convolutional codes are practically used for error correction
only. They can be used as well to correct bursts of errors. Block code decoding by
using a trellis will be considered in the next chapter.
Problems
y1 = x1 ⊕ s1 ⊕ s2 = (1 + D + D^2)·x1,
y2 = x1 ⊕ s3 = x1 + D·x2,
y3 = x2,
also described in a matrix form (transfer function matrix), having polynomials,
binary or octal numbers as elements [61]

G_POL(D) = | 1 + D + D^2   1   0 |
           |      0        D   1 |     ⇒     (y1 y2 y3) = (x1 x2)·G_POL,

G_BIN = | 111   1   0 |  =  | 111 001 000 |     ⇒     G_OCT = | 7 1 0 |
        |  0   10   1 |     | 000 010 001 |                   | 0 2 1 |,
where, in binary form, a corresponding number of zeros should be added from the
left (or from the right, a matter of convention which is not unique in the literature)
and after that every three-bit group is transformed into the corresponding octal number.
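The conversion with left padding can be automated; the following sketch (added here, not from the book; names are arbitrary) reproduces G_OCT for the generators of this problem:

    def gen_to_octal(bits):
        """Convert a binary generator sequence (string) to octal, padding with zeros on the left."""
        padded = bits.zfill((len(bits) + 2) // 3 * 3)
        return ''.join(str(int(padded[i:i + 3], 2)) for i in range(0, len(padded), 3))

    G_BIN = [['111', '1', '0'],
             ['0', '10', '1']]
    G_OCT = [[gen_to_octal(g) for g in row] for row in G_BIN]
    print(G_OCT)    # [['7', '1', '0'], ['0', '2', '1']]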
Fig. 7.15 Convolutional encoder with two inputs (x1, x2), three delay cells (s1, s2, s3) and three outputs (y1, y2, y3)
Corresponding trellis parameters are determined directly from the encoder structure:
– the encoder has in total m = 3 delay cells, so the number of states is 2^m = 8;
– the encoder has k = 2 inputs (number of rows in G_OCT), so the number of
branches leaving every node is 2^k = 4;
– the encoder has three outputs (number of columns in G_OCT), so the number of
outgoing bits for every branch is n = 3.
Transitions into the next states are given by s'1 = x1, s'2 = s1, s'3 = x2. The
encoder is described in detail by Table 7.3, while the corresponding state diagram
is shown in Fig. 7.16. For a better insight, the corresponding combinations of input
and output bits are omitted.
It can be shown that with increasing code memory its correcting capability
increases as well, the code rate remaining constant (the same number of inputs and
outputs), but the number of states grows exponentially with the number of delay cells,
resulting in a more complicated decoding procedure.
On the other hand, convolutional codes which have the same code rate and
memory order do not all have the same error correction capability. The code quality
depends on the encoder structure and is primarily determined by the corresponding
free distance, as shown in the next problem.
Problem 7.2
(a) Find the modified generating function and the weight spectrum of the con-
volutional code defined by matrix G(D) = [1 + D, D];
(b) Draw a few first steps of a trellis diagram needed to find the weight spectrum
of the convolutional code defined by matrix G(D) = [1 + D+D2, 1].
Solution
(a) The encoder block scheme is shown in Fig. 7.17, and the corresponding
state diagram is shown in Fig. 7.18a. Modified state diagram is shown in
Fig. 7.18b.
Fig. 7.16 State diagram of the encoder from Problem 7.1 (eight states 000–111)
Fig. 7.17 Convolutional encoder defined by G(D) = [1 + D, D]
y2
(a) (b)
Sout Sin
0/11
0 1 LND LD2
1/10
S1
0/00 1/01
LND
Fig. 7.18 Original (a) and modified (b) state diagram of the encoder shown in Fig. 7.17
The marks on the branches of a modified state diagram are standard, every
branch that has n units at the input is marked with Nn, every branch that has
d units at the output is marked with Dd. One time interval corresponds to the
transmission of one information bit, it is symbolically denoted with L. The
corresponding system of equations is

S1 = LND·S_out + LND·S1,
S_in = LD^2·S1,

and it is easy to find a relation connecting the initial and the final zero state,
yielding the modified generating function of the code [47]

T(L, N, D) = S_in/S_out = L^2·N·D^3 / (1 − LND).

By using the identity

1/(1 − x) = Σ_(i=0)^(∞) x^i = 1 + x + x^2 + x^3 + x^4 + ...,

the generating function can be expanded into the series

T(L, N, D) = L^2·N·D^3 + L^3·N^2·D^4 + L^4·N^3·D^5 + ...,

where the element m·L^l·N^n·D^d shows that there are m paths of weight d, formed
by l input bits, among which there are n ones.
Denoting by a_(d,n) the number of paths having weight d, generated by an input
of weight n, a modified generating function for L = 1 can be written in the
form [47]

T(N, D) = Σ_(d=dfree)^(∞) Σ_(n=1)^(∞) a_(d,n)·N^n·D^d,
where the total number of input ones on the paths having weight d, denoted
by c_d, can be found from

∂T(D, N)/∂N |_(N=1) = Σ_(d=dfree)^(∞) Σ_(n=1)^(∞) n·a_(d,n)·D^d = Σ_(d=dfree)^(∞) c_d·D^d   ⇒   c_d = Σ_(n=1)^(∞) n·a_(d,n).
For L = N = 1,

T(D) = T(L, N, D)|_(L=N=1) = Σ_(d=dfree)^(∞) a_d·D^d = D^3 + D^4 + D^5 + ...
Table 7.4 Spectrum of a code whose state diagram is shown in Fig. 7.19

d     1  2  3  4  5  6  7  8  …
a_d   0  0  1  1  1  1  1  1  …
c_d   0  0  1  2  3  4  5  6  …
Fig. 7.19 Modified trellis for finding the weight spectrum of the code from (a)
Fig. 7.20 Convolutional encoder defined by G(D) = [1 + D + D^2, 1]
Fig. 7.21 Original (a) and modified (b) state diagram of the encoder shown in Fig. 7.20
Fig. 7.22 Trellis for finding the weight spectrum of the code from (b)
Problem 7.3 One digital communication system for error control coding uses a
convolutional encoder and Viterbi decoding. The convolutional encoder has one
input and two outputs, defined by the matrix

G_POL(D) = [1 + D + D^2, 1 + D^2].
The initial convolutional encoder state is 00.
(a) Draw the convolutional encoder block scheme, find the table describing the
encoder functioning and draw a state diagram.
(b) If the decoder input sequence is
00 11 10 11 11 0
Solution
Using the matrix G_POL(D), the convolutional encoder structure is first obtained, shown
in Fig. 7.23, where D denotes a delay corresponding to the duration of one infor-
mation symbol. This symbol is used because D flip-flops are usually used as delay
cells. There are m = 2 delay cells, and the total number of states is 2^m = 4, deter-
mined by all combinations of the delay cell outputs, denoted by s1 and s2. The number m is
usually called the memory order, and ν = (m + 1)n is the constraint length, showing the
number of output bits depending on one input bit. There is only one input and from
every state 2^k = 2 paths start. The number of encoder outputs is n = 2, and the code rate
is R = k/n = 1/2, meaning that the binary rate at the output is two times greater than the
information rate at the input.
(a) If the series of encoder input bits is denoted by x_n, the states are determined by
the previously emitted bits in this sequence, i.e.

x = x_n,  s1 = x_(n−1),  s2 = x_(n−2),

where s1, s2 define the current encoder state S = (s1 s2); the states are usually denoted
by the corresponding decimal numbers (00 = 0, 01 = 1, 10 = 2, 11 = 3). On
the other hand, the next state S' = (s'1 s'2) and the output are determined by the
encoder structure and by the previous state:

s'1 = x,  s'2 = s1,
y1 = x ⊕ s1 ⊕ s2,  y2 = x ⊕ s2.
For all possible combinations of x, s1 and s2 values, using the above relations,
Table 7.5 is formed. Corresponding state diagram and trellis are shown in
Fig. 7.24a, b. The transitions corresponding to ‘0’ information bits are drawn
by dashed lines and those corresponding to ‘1’ information bits are drawn by
full lines. It can be noticed that from two paths going out of a considered state,
the branch going “up” (relatively to the other one) corresponds to ‘0’ infor-
mation bit and vice versa. This rule is always valid when there is not a
Fig. 7.23 Convolutional encoder defined by G_POL(D) = [1 + D + D^2, 1 + D^2]
Fig. 7.24 State diagram (a) and trellis (b) of convolutional encoder from Fig. 7.23
feedback in the encoder (i.e. when the encoder is not recursive). In the fol-
lowing it will be shown that this feature makes the decoding process easier.
(b) On the basis of input information bits the encoder emits one of the possible
sequences, denoted by i. The input sequence uniquely determines the output
sequence c transmitted through the channel. If the sequence at the channel
output, i.e. at the decoder input (denoted by r), is known, the input encoder
sequence can be uniquely reconstructed. Therefore, at the receiving end, one
should estimate which of possible sequences was emitted resulting directly in
a decoding sequence.
The encoder functioning can be described by a semi-infinite tree starting from
one node and branching infinitely, and the decoding is equivalent to finding
the corresponding tree branch. To every output sequence corresponds a unique
path through a state diagram—trellis diagram. An example for a trellis cor-
responding to the considered convolutional encoder, if the initial state was 00
Fig. 7.25 Trellis corresponding to a sequence for the first three information bits at the encoder
input
(encoder was reset at the beginning) is shown in Fig. 7.25. The transient
regime is finished when in every of states the same number of branches enters
as started from the initial state at the depth t = 0. In the case of encoder having
one input, the transient regime duration equals the number of delay cells.
In the case of the Viterbi decoder with Hamming metric the rules are as follows
[62]:
1. To every branch a branch metric is adjoined, equal to the number of bits in which
the branch output differs from the received bits in this step.
2. To every node a metric is adjoined, equal to the sum of the metric of the
node from which the incoming branch starts and the branch metric of this branch.
In this problem two branches enter every node, resulting in two dif-
ferent metrics. From these two, the branch with the smaller metric is chosen,
and the corresponding path survives. The branch with the larger Hamming distance
is eliminated (overwritten) and erased from the trellis (the corresponding data are
erased from the memory). In this way the path with the smaller Hamming distance
is always chosen.
If at some trellis depth both branches outgoing from some state (a trellis node)
are eliminated, then the branches entering this state are eliminated as well.
3. A unique path from depth t = 0 to depth t = N determines N decoded bits,
having the same values as the first N bits at the convolutional encoder input to
which this path corresponds.
4. If the Hamming distance for two paths is the same, any of them can be chosen;
to accelerate the decoding, usually some additional criterion is applied.

Let the sequence at the decoder input be

00 11 10 11 11 01 01 11 00 11 01 10 1.
At the beginning, the encoder was reset, and decoding starts from a state 00 as
well. For every next step (for every new information bit entering the encoder) a
new state diagram is drawn, but only the states are connected where the tran-
sitions are possible in this step.
1. Step—transient regime
As shown in Fig. 7.26 and in Table 7.6, in this case the transition regime ends at
the depth t = 3, because in this step two branches are entering into the every state.
The possible sequences at the convolutional encoder output are compared to the
received sequence and for every pair of paths entering the same node that one is
chosen differing in smaller number of bits, i.e. having smaller Hamming distance
[for a depth t, it is denoted by d(t)]. Survived path goes from state 00 into the state
00. This change of states in convolutional encoder is caused by the input bit 0,
easily seen from the state diagram (encoder output is 00). In the case of correct
transmission encoder and decoder change the states simultaneously, and the
decoder performs an inverse operation, i.e. to the encoder input corresponds the
decoder output. Therefore, the decoder output is i1 = 0.
Fig. 7.26 Viterbi decoding—transient regime (the first three steps, received bits 00 11 10)
Table 7.6 Detailed data concerning decoding during the transient regime

Received sequence: 00 11 10

Transition   Corresponding path   Decoder input/encoder output   d(3)
0 → 0        0-0-0-0              00 00 00                       3
             0-2-1-0              11 10 11                       4
0 → 1        0-0-2-1              00 11 10                       0
             0-2-3-1              11 01 01                       5
0 → 2        0-0-0-2              00 00 11                       3
             0-2-1-2              11 10 00                       4
0 → 3        0-0-2-3              00 11 01                       2
             0-2-3-3              11 01 10                       3

(Possible paths, the corresponding emitted bits at the convolutional encoder output and the Hamming distance to the received sequence.)
2. Step—stationary regime
All overwritten paths are eliminated and only the survived paths are withheld, i.e.
the paths having a smaller Hamming distance. The Hamming distance of a survived
path becomes the Hamming distance of a node where the path terminates. At the
depth t = 4 in state 00 enter two paths—one with d(4) = 5 and the other with
d(4) = 0. Therefore, the probability that a code sequence 00 00 00 00 was emitted is
very small, because the sequence 00 11 10 11 in this case could be received only if
there were five bit errors (bits 3, 4, 5, 7 and 8). The decision made is more reliable if
the metric difference between two competing paths is larger, and the decisions for
states 01, 10 and 11 are less reliable than that one for a state 00.
For further calculation it is taken into account the previous Hamming distance of
every node (from which the branches start) and a trellis structure is the same
between trellis depths t = 2 and t = 3. It is clear that after this step the path to trellis
depth t = 2 survived, and the decoded bits are i1 = 0 and i2 = 1 (see the Fig. 7.27
Fig. 7.27 Viterbi decoding after eight received bits (00 11 10 11)
Table 7.7 Detailed data concerning decoding after eight received bits

Received sequence: 11

Transition   Path (t = 3 → t = 4)   d(3)   Decoder input/encoder output   d = d(3) + d(3−4)
0 → 0        0-0                    3      00                             3 + 2 = 5
             1-0                    0      11                             0 + 0 = 0
0 → 1        2-1                    3      10                             3 + 1 = 4
             3-1                    2      01                             2 + 1 = 3
0 → 2        0-2                    3      11                             3 + 0 = 3
             1-2                    0      00                             0 + 2 = 2
0 → 3        2-3                    3      01                             3 + 1 = 4
             3-3                    2      10                             2 + 1 = 3

(Possible paths, the corresponding emitted bits at the convolutional encoder output and the Hamming distance to the received sequence.)
and Table 7.7). In this step the second information bit was decoded because i1 was
decoded in a previous step.
3. Step—stationary regime continuation
The principle is the same as in the previous case, but the node metrics have the
other values. All overwritten paths are omitted and only survived paths are with-
held, i.e. the paths which have a smaller Hamming distance. Hamming distance of
survived paths becomes the node metric where the path ends.
From the trellis shown in Fig. 7.28 and from Table 7.8, it is clear that in this step
the third information bit is decoded and decoded bits are i1 = 0, i2 = 1 and i3 = 0. It
should be noticed that the accumulated Hamming distance of survived path till the
depth t = 5 is d(5) = 0, meaning that during the transmission of the first ten bits
over the channel there were no errors. Therefore, with a great probability it can be
supposed that in the future the survived path will follow the path where node
Fig. 7.28 Viterbi decoding after ten received bits (00 11 10 11 11)
Table 7.8 Detailed data concerning decoding after ten received bits

Received sequence: 11

Transition   Path (t = 4 → t = 5)   d(4)   Decoder input/encoder output   d = d(4) + d(4−5)
0 → 0        0-0                    0      00                             0 + 2 = 2
             1-0                    3      11                             3 + 0 = 3
0 → 1        2-1                    2      10                             2 + 1 = 3
             3-1                    3      01                             3 + 1 = 4
0 → 2        0-2                    0      11                             0 + 0 = 0
             1-2                    3      00                             3 + 2 = 5
0 → 3        2-3                    2      01                             2 + 1 = 3
             3-3                    3      10                             3 + 1 = 4

(Possible paths, the corresponding emitted bits at the convolutional encoder output and the Hamming distance to the received sequence.)
metrics are equal to zero, i.e. that the next decoded bits are i4 = 0, i5 = 1. However,
these two bits are not yet decoded!
Because the transmission is without errors, the bit sequence at the encoder output is
the same as the sequence at the decoder input (until this moment—ten transmitted
bits). Therefore, the decoder output sequence must be identical to the convolutional
encoder input sequence. However, for five bits at the encoder input, only three
appeared at the decoder output. The reader should think about this phenomenon.
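The whole hard-decision procedure of this problem can be reproduced by a compact Viterbi implementation. The Python sketch below is an added illustration (not the book's program; names are arbitrary); unlike the step-by-step procedure above, it traces back only at the end and therefore outputs the complete survivor:

    def viterbi_hard(received_pairs):
        """Hard-decision Viterbi decoding for G(D) = [1+D+D^2, 1+D^2], start state 00."""
        def step(state, x):                       # state = (s1, s2)
            s1, s2 = state
            return (x, s1), (x ^ s1 ^ s2, x ^ s2)

        states = [(0, 0), (0, 1), (1, 0), (1, 1)]
        metric = {s: (0 if s == (0, 0) else float('inf')) for s in states}
        paths = {s: [] for s in states}
        for r1, r2 in received_pairs:
            new_metric, new_paths = {}, {}
            for s in states:
                for x in (0, 1):
                    ns, (y1, y2) = step(s, x)
                    m = metric[s] + (y1 != r1) + (y2 != r2)
                    if ns not in new_metric or m < new_metric[ns]:
                        new_metric[ns], new_paths[ns] = m, paths[s] + [x]
            metric, paths = new_metric, new_paths
        best = min(metric, key=metric.get)
        return paths[best], metric[best]

    r = [(0, 0), (1, 1), (1, 0), (1, 1), (1, 1)]   # the first ten received bits of the problem
    bits, d = viterbi_hard(r)
    print(bits, d)                                 # [0, 1, 0, 0, 1] with accumulated distance 0

The survivor agrees with the discussion above: the accumulated distance is 0 and the first decoded bits are 0, 1, 0 (with 0, 1 as the most probable continuation).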
(c) If the decoding starts from the second bit of the received sequence, it can be
considered that the decoder input is
01 11 01 11 10
The transient regime for this case is shown in Fig. 7.29, and it is clear that at
depth t = 3 not a single bit was decoded. The same situation is repeated
after two additional received bits (Fig. 7.30)—not a single information bit was
decoded and a path with Hamming distance equal to zero does not exist.
After two more received bits the situation is unchanged, only the node metrics
have different values (shown in Fig. 7.31). Also, at trellis depth t = 5, the
paths entering two of the nodes have the same Hamming distance. In spite of
the approach applied here, where the more suitable path was chosen (i.e. the one
eliminated yields successive elimination of more paths in previous steps),
no noticeable path shortening was achieved, and after three steps not a single
bit was detected (decoded)!
There can be two possible reasons:
1. A large number of errors during the transmission
2. Bad synchronization.
Fig. 7.29 Transient regime when the decoding starts from the second received bit (01 11 01)
Fig. 7.30 Decoding after two additional received bits (01 11 01 11)
Because the paths cannot be shortened even while decoding the first bit, the problem is obviously
bad synchronization. The same can be concluded during further decoding as well,
because the minimum Hamming distance for any state at some large trellis depth will
attain a large value, and the distances of two paths entering the same node will differ
only by a small value.
Fig. 7.31 Decoding after two more received bits (01 11 01 11 10)
Problem 7.4 One digital communication system for error control coding use the
convolutional encoder and Viterbi decoding. Convolutional encoder has one input
and three outputs defined by
(a) Draw the block scheme of the convolutional encoder and find its generator
matrix.
(b) Find the code rate and draw the state diagram. Is the encoder a systematic one?
(c) If it is known that the decoding starts from the first received bit and that the
initial state of convolutional encoder is 01, decode the first two information
bits if at the input of Viterbi decoder is the sequence
Whether in this moment some further information bit was decoded? If yes,
how many?
(d) Comment the code capability to correct the errors.
Solution
(a) Encoder block scheme is shown in Fig. 7.32, and the code generator is
G(D) = [1 + D^2, 1 + D, D^2].
(b) Code rate equals to the ratio of inputs and outputs of encoder—R = k/n = 1/3.
It is a systematic encoder because the first input is directly sent to the third
output. State diagram is shown in Fig. 7.33. Because there is no a loop con-
taining all zeros at the output for a nonzero input sequence, the encoder is not
a catastrophic one.
Fig. 7.32 Convolutional encoder defined by G(D) = [1 + D^2, 1 + D, D^2]

Fig. 7.33 State diagram of the encoder from Fig. 7.32
(c) According to problem text, the encoder initial state is 01. The decoding starts
from this state. At the decoder input at every moment enter n = 3 bits, the
trellis has 2m ¼ 4 states. As shown in Fig. 7.34, after the transient regime, the
path survived at the trellis depth t = 1, and one information bit is decoded—
i1 = 0. After the second step (see Fig. 7.35) there are no new decoded bits.
The decoding should be continued, because two information bits are not yet
decoded.
In the next step, shown in Fig. 7.36, two additional bits are decoded, i2 = 1 and
i3 = 0. At this moment the condition that at least two information bits are decoded
is fulfilled, the decoding procedure should be stopped, and there is no need to use
other bits at the decoder input.
The Viterbi algorithm has just a feature that the rate at the decoder output is not
constant. It can be explained by the fact that this algorithm does not make the
decision instantly, but in some way it “ponders” until the moment when the
Figs. 7.34–7.36 Successive Viterbi decoding steps for Problem 7.4 (decoding starts from state 01)
Suppose for a moment that the encoder was reset at the beginning, that the
all-zeros sequence is emitted at its output, and that the channel introduces
errors in the first ec = 2 bits. On the basis of Fig. 7.37 it is clear that the decoder will
nevertheless make a good decision, i.e. it will choose the path corresponding to the
emitted sequence. If ec = 3 the decision will be wrong, as this decoder can correct at
most ec = 2 errors if they are sufficiently separated. It can be shown that by using the
Viterbi algorithm, regardless of the initial state and the sequence at the encoder input,
one can correct up to (dfree − 1)/2 errors on the constraint length ν = (m + 1)n
bits.
Problem 7.5 One digital communication system for error control coding uses the
convolutional encoder and Viterbi decoding. The encoder is shown in Fig. 7.38.
Fig. 7.38 Convolutional encoder with the switch P at the third output
(a) Can the encoder structure be presented in a form more suitable for
decoding?
(b) Find the bit sequence at the encoder output if the switch is closed and at its
input is the binary sequence i = (0110…).
(c) When the samples of the channel noise are n = (0.2, −0.1, −0.5, −0.15, −0.6, 0.6,
−0.1, 0.15, 0.4, 0.1, 0.3, 0.1), find the decoder output sequence if the Hamming
metric is used; after that repeat the same procedure if the decoder uses the
Euclidean metric.
(d) If the switch is open, decode the sequence by a decoder using the Euclidean metric.
(e) If the switch is open, calculate the approximate values of error probability for
both ways of decoding (hard and soft) if Eb/N0 = 5 [dB].
Solution
(a) It is clear that the delay cell at the encoder input will not substantially influence
the decoding result—sequence x is obtained at the cell output becoming x′ after
only one step. Therefore, during the decoding it is sufficient to reconstruct
sequence x′, and this cell can be omitted. On the other hand, output y3 corre-
sponds to the input sequence delayed for three bits, denoted by s2 (corre-
sponding to the second state component), and the output is active only when the
switch is closed. Now, it is obvious that this encoder can be completely
described by a simplified block scheme shown in Fig, 7.39, i.e. by the state
diagram having four states.
Fig. 7.39 Simplified block scheme of the encoder from Fig. 7.38
(b) When the switch is closed, the corresponding code rate is R = 1/3, and the
code is systematic because the input is practically sent to the third output (only
the delay of two bit intervals is introduced). One can suppose that the encoder
was reset at the beginning, because nothing contrary was said in the problem
text. Corresponding trellis and state diagram are similar to these ones in
Problem 7.3, but the first and the second output bit change the places, while
the third output bit is determined by the first bit of the state from which the
corresponding branch at the trellis starts. For a sequence i = (0110) at the
encoder input, the following transitions in the Fig. 7.40 can be observed:
– the first input bit 0 causes the transition from the state 00 into the state 00,
and 000 is emitted at the output,
– the second input bit 1 causes the transition from the state 00 into the state
10, and 110 is emitted at the output,
– the third input bit 1 causes the transition from the state 10 into the state 11,
and 100 is emitted at the output,
– the fourth input bit 0 causes the transition from the state 11 into the state
01, and 101 is emitted at the output,
yielding the channel input sequence
c = (000 110 100 101).
(c) If unipolar pulses are used for transmission, the corresponding voltage
levels being 1 and 0, then x = c, and for the known input sequence and noise
samples the following is obtained

y = x + n = (0.2, −0.1, −0.5, 0.85, 0.4, 0.6, 0.9, 0.15, 0.4, 1.1, 0.3, 1.1).
Fig. 7.40 Trellis for the simplified convolutional encoder structure (switch closed)
Fig. 7.41 Block scheme of a complete system, both ways of decoding are illustrated
The further procedure is illustrated in Fig. 7.41. When the hard decoding is
used, between the channel output and the decoder input, a decision block
should be inserted (a comparison to a threshold), while for soft decoding there
is no need for it. If the decision is carried out on one sample basis, it is
understood that the symbols are equally probable, and a threshold is at 0.5.
At the trellis corresponding to the hard decoding, shown in Fig. 7.42, the input
bits corresponding to branches are not denoted (from two branches leaving
one node, the “upper” branch always corresponds to input bit 0 and it is
denoted by dashed line). The branches eliminated after the transient regime are
denoted by '+' and those eliminated in the next step by a different mark. After the first
two steps of decoding not a single bit was decoded, but it is clear that if the first
decoded bit is i1 = 0 the second must be i2 = 1, while the alternative com-
bination is i1 = 1, i2 = 1. The most likely sequence to be detected is i = (0110),
because for the corresponding path d = 2, but this is not yet the algorithm's result.
In Fig. 7.43 the procedure of soft decoding is shown, when the metrics are
calculated as a square of Euclidean distance between the received sequence
y and a supposed code word x, i.e. [63]
Fig. 7.42 Viterbi decoding using Hamming metric, the switch is closed
Fig. 7.43 Viterbi decoding using Euclidean metric, the switch is closed
d_E^2(x, y) = Σ_(i=1)^(n) (x_i − y_i)^2.
Here two bits were successfully decoded—i1 = 0 and i2 = 1. It is rather certain that
the next two decoded bits will be i3 = 1 and i4 = 0 (look at the metrics relation at
depth t = 4!). If the numerical metrics values are compared, the Hamming metrics
can be considered as a rounded off values of Euclidean metrics, where the error of
rounding off propagates resulting in non-optimum decision.
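The remark about rounding can be made concrete with a small added illustration (hypothetical received samples and branch outputs chosen only to show the effect, unipolar 0/1 signaling):

    y = (0.45, 0.45, 0.9)                        # hypothetical received samples
    branch_a, branch_b = (0, 0, 0), (1, 1, 1)    # two candidate branch outputs

    def euclid2(x, y):
        return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

    def hamming(x, y, threshold=0.5):
        return sum(xi != (yi > threshold) for xi, yi in zip(x, y))

    print(euclid2(branch_a, y), euclid2(branch_b, y))   # 1.215 vs 0.615 -> soft metric prefers branch_b
    print(hamming(branch_a, y), hamming(branch_b, y))   # 1 vs 2         -> hard metric prefers branch_a

Here the hard decisions discard the information contained in the two samples close to the threshold, so the Hamming metric ranks the branches in the opposite (non-optimum) order.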
(d) When the switch is open, the encoder is no more a systematic one, and its code
rate is increased to R = 1/2. This encoder is not catastrophic (as well as in the
case of closed switch) because the polynomial corresponding to the second
output is not divisible with the polynomial corresponding to the third output
[D2 is not a factor of polynomial 1 + D+D2 over the field GF(2)].
In this case the trellis structure is changed in such a way that the third encoder
output is not taken into account. Code word is now c = (0,0, 1,1, 1,0, 1,0), and
at the decoder input is the sequence obtained by superimposition of the first
eight noise samples on the code word, i.e. y = (0.2; −0.1; 0.5; 0.85; 0.4; 0.6;
0.9; 0.15). Decoding procedure is illustrated in Fig. 7.44, and one bit is cor-
rectly decoded (i1 = 0). Although the transmitted sequence i = (0110) is most
likely to be reconstructed, at this moment even its second bit is not yet
decoded, because the noise influence is stronger than in previous case (e.g. it is
not the same case if noise sample −0.5 is superimposed on 0 and 1!), and
because of the increased code rate, the correcting capability is decreased.
(e) Error probability registered by a user when hard decoding is used, for
implemented encoder having one input (k = 1) is [25, 26]
366 7 Convolutional Codes and Viterbi Algorithm
X
1 h pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiid
Pb;HM \ cd 2 pð1 pÞ cdfree 2dfree pdfree =2
d¼dfree
rffiffiffiffiffiffi dfree =2
1 Eb
¼ cdfree 2 dfree
erfcð Þ ;
2 N0
where cd denotes the total number of input binary ones at all sequences having
the weight d, while p denotes the probability of error at the binary symmetric
channel (crossover probability) when error control coding is not applied.
For the first case R = 1/3, dfree = 7, c7 = 1, and in the case when the switch is
open R = 1/2, dfree = 5, c5 = 1. If it is known that before the error control
application the signal-to-noise ratio was Eb/N0 = 105/10 (it is a value per
information bit, while after the encoding the ratio of energy per code word bit
and noise spectral power density it decreases to Ec/N0 = Eb/N0 R). The
reader should find the numerical values for every considered case (switch
closed or open, Hamming or Euclidean metric).
Problem 7.6
(a) If at the input of the convolutional encoder defined by the generator
G(D) = [1 + D+D2, 1 + D2] is the sequence i = (010010) find the corre-
sponding code word and explain the effect of quantization on the system
performances. The system uses BPSK modulation, Viterbi decoding and
Problems 367
Euclidean metric. Find a sequence for decoding if two and four quantization
regions are used and if the real noise samples are:
n ¼ ð0:51; 0:1; 0:95; 1:1; 0:2; 0:9; 2:34; 0:68; 0:42; 0:56; 0:08; 0:2Þ:
(b) Using the same example explain the decoding with a fixed delay where the
decision about transmitted bit is made after three steps. What are the advan-
tages and the drawbacks of this decoding type? Decode the first four infor-
mation bits if at the receiver input is the same sequence as above, but if
eight-level quantization is used.
Solution
(a) Soft decision in practice can never be applied exactly because the Viterbi
algorithm is usually implemented by software or hardware, and the samples of
received signal are represented by a limited number of bits. Therefore, the
received signal should be quantized with the finite number of levels, and it is
suitable that this number is a power of two.
Let BPSK signal has a unity power (baseband equivalent is +1/−1) while the
received signal is uniformly quantized. The positive and negative values of
additive noise are equally probable (easily concluded from a constellation
diagram of BPSK signal and noise), it is logical that the thresholds are
symmetric with respect to the origin.
– Quantization into only Q = 2 regions is trivial reducing to the hard deci-
sion, where only one threshold is used for the comparison. Centers of
quantization regions can be denoted using one bit (1-bit quantization), and
samples at the decoder input are denoted by −1 and +1, as shown in
Fig. 7.45a.
– In the case of quantization using Q = 4 decision regions, it is logical that
regions in the negative part are symmetric in respect to the points corre-
sponding to transmitted signals, in the same time not disturbing the sym-
metry regarding to the origin. In this case two bits are used for
quantization, regions borders have values −1, 0, 1 and quantization regions
centers are points −1.5, −0.5, 0.5 and 1.5, as shown in Fig. 7.45b.
To the encoder input sequence i = (010010) corresponds the code word
c = (001110111110), and the baseband equivalent of BPSK signal becomes
Taking into account the noise samples, signal at the receiver input is
368 7 Convolutional Codes and Viterbi Algorithm
(a) 2
0.5
−0.5
−1
−1.5
−2
−2 −1.5 −1 −0.5 0 0.5 1 1.5 2
In−phase component of the received signal
(b) 2
Quadrature component of the received signal
1.5
0.5
−0.5
−1
−1.5
−2
−2 −1.5 −1 −0.5 0 0.5 1 1.5 2
In−phase component of the received signal
Fig. 7.45 BPSK signal quantization with Q = 2 (a) and Q = 4 (b) quantization levels
y ¼ x þ n ¼ ð1:51; 0:90; 0:05; 0:10; 0:80; 1:90; 3:34; 1:68; 0:58; 1:56; 1:08; 1:20Þ;
while the signal values at the quantizer output for Q = 2, Q = 4 are Q = 8 are
as follows:
Problems 369
yq2 ¼ ð1:00; 1:00; 1:00: 1:00; 1:00; 1:00; 1:00; 1:00; 1:00; 1:00; 1:00; 1:00Þ;
yq4 ¼ ð1:5; 0:5; 0:5; 0:5; 0:5; 1:5; 1:5; 1:5; 0:5; 1:5; 1:5; 1:5Þ;
yq8 ¼ ð1:75; 0:75; 0:25; 0:25; 0:75; 1:75; 1:75; 1:75; 0:75; 1:75; 1:25; 1:25Þ:
0
10
Q=2
Q=4
Q=8
−1
Q=16
10 unquantized
Bit error rate, BER
−2
10
−3
10
−4
10
0 1 2 3 4 5 6 7
Signal−to noise ratio, snr [dB]
Fig. 7.46 The influence of the number of quantization levels on the error probability registered by
user, encoder with G = [7,5], BPSK modulation, coherent detection
370 7 Convolutional Codes and Viterbi Algorithm
(c) When the decoding is carried out using the fixed delay, the rules are as follows
[25]:
1. For the first step a classic Viterbi algorithm is applied, using a chosen
metric. After the transition regime, in every state a path having smaller
metric is chosen, i.e. path having larger metric is eliminated.
2. If the initial state is not known, the decoding should start from all states,
considering that the accumulated distance in these states are equal to zero.
Duration of transient regime in this case is only one step and the decision
can be done already at a trellis depth t = 1.
3. The procedure from the previous steps is repeated until the trellis depth
determined by a fixed delay duration (registers containing the metrics have
a limited length determining this numerical value!). When this depth is
achieved, the survived paths are found and the state is identified at the
largest depth to whom corresponds the smaller metric. For the considered
example the decoding procedure until this step is shown in Fig. 7.47, fixed
delay duration is three, the survived paths are denoted by heavy lines and
the minimal metric value at the depth t = 3 is 3.375 corresponding to the
state 01.
4. At the trellis the path is found connecting the state having a minimal metric
(at the depth corresponding to a fixed delay) with some of candidates for
the initial state. If a fixed delay is sufficiently large and if during the
decoding a sufficient number of branches was eliminated, this path is
unique. In Fig. 7.48 it is denoted by the strongest lines, with arrows
pointing to the left. In such a way, the initial state is found (in this case 00).
5. Now, it can be considered that the first bit corresponding to this path is
decoded. Therefore, although for the depth t = 1 all paths are not
0/1,1
3.375 18.5
01 1/1,1
1/-1,-1 21.5
0/1,-1
0/1,-1 7.5
10 2.75 10.375 18.5
1/-1,1
0/-1,1
18.5
11
13.375 1/1,-1 21.5
• The previous claim is valid only in a case if the decision about the first
information bit was made correctly. Then the uncertainty about the
result of decoding really becomes smaller, because this bit value
influences to a series of next bits emitted by encoder. However, if the
decision is incorrect, the consequences for a further decoding procedure
are catastrophic.
• Of course, the decision about the first information bit is more reliable if
a fixed delay has a larger value.
7. Now, the trellis is formed from depth t = 1 to depth t = 4, starting from a
state 00, where some branches are back, and metrics are corrected as
shown in Fig. 7.48 The procedure from 1-6 is now repeated:
– metrics are calculated, some branches are eliminated and survived paths
are denoted;
– at the depth t = 4 the state having the smallest metric is identified (it is
10, the corresponding distance 4.5);
– the path is found using the survived branches from this state to the
initial state (being now known);
– the second information bit is decoded, being here i2 = 1;
– part of a trellis till the depth t = 2 is erased, and a new part from the
depth t = 4 to the depth t = 5 is formed, while a new initial state is 10;
– the metrics from the previous step are taken, with a correction described
in 6. (being here applied to the states 01, 10 and 11).
8. The previous procedure is repeated until the end of decoding.
The next step is shown in Fig. 7.49, where the decoded bit is i3 = 0. It should be
noted that not here neither in the previous steps, there would not be a correct
0/1,1 0/1,1
3.375 21.5 26.125
01
1/-1,-1 1/-1,-1 25.125
0/-1,1 0/-1,1
22.125
11
13.375 1/1,-1 21.5 1/1,-1 29.125
decoding if the decision was not made “by force”, because the branch elimination
was not resulted in a unique path till the depth t = 3. Because of that, after this step,
a trellis part till the depth t = 3 is erased, new initial state is 01, and (insignificant)
metrics correction is carried out only for a state 01.
The last considered step is shown in Fig. 7.50. For a difference to the previous
cases, here already a standard procedure of branches elimination, to which corre-
spond larger distances, yields the decoding of the fourth information bit (i4 = 1).
After this step one should erase only the trellis part till the depth t = 4, new initial
state is now 00, the trellis part till the depth t = 7 is drawn and a decoding pro-
cedure is continued.
It can be noted that in this example in every step only one information bit was
decoded. However, it is a consequence of the fixed delay small value—it is here
minimum and corresponds to the encoder transient state duration, to simplify the
procedure illustration. Usually, a fixed delay is 15–20 bits and it is clear that, in
such case, in one window more bits can be decoded, and that a “truncation”, i.e. a
decision “by force” is applied only when in a standard decoding procedure there is
some delay.,
It was shown earlier that in this example the decoded information bits are i1 = 0,
i2 = 1, i3 = 0, i4 = 0, corresponding completely to the sequence at the encoder input
(see the first part of the solution). One can verify that by using a classic decoding
procedure at least after the first three steps no one bit would be detected (it is
recommended to a reader to verify it!). If the decoding was continued, if there were
no more transmission errors, all bits would be decoded. However, the decoder
would not emit anything for some longer time, but after he would emit a large
number of bits.
In addition to lessening the need for a registers length where the metrics are
stored, the decoding with a fixed delay is used just to lessen the variable rate at the
0/-1,1
15.25
11
22.125 1/1,-1 22.25
decoder output as well. However, one should keep in the mind that the decoding
will be correct only if a delay is sufficiently long to “clear up” the path, i.e. that it
has no branches for a number of steps equal to the delay.
Problem 7.7
(a) Draw a block scheme of a convolutional encoder defined by the generator
G(D) = [1 + D, D, 0; 0, D, 1 + D] and a corresponding trellis with denoted
transitions (x1x2/y1y2y3). Find the code rate and the free distance.
(b) Design the encoder with one input which combined with a suitable puncturing
provides the same dfree and the same code rate as the encoder above.
(c) Decode using Viterbi algorithm with Euclidean metric if at the input of
encoder from (b) the sequence
i ¼ ð1101Þ;
is entering, represented by unit power pulses, the channel noise samples are
Solution
(a) The encoder block scheme is shown in Fig. 7.51, and a corresponding trellis
for a transient regime is shown in Fig. 7.52. Code rate is R = 2/3. Free dis-
tance is dfree = 3.
(b) According to the problem condition, a punctured convolutional code should
be obtained which has the code rate R = 2/3 and free distance at least dfree = 3,
where for starting convolutional encoder is chosen one having one input. The
simplest way to realize it is to start from the best possible encoder having two
inputs and two delay cells (Rm = 1/2). It is well known that the corresponding
generator is G(D) = [1 + D+D2, 1 + D2] (or G(D) = [1 + D2, 1 + D+D2]),
which has dfree = 5, which after the puncturing will be additionally decreased.
y2
+
x2 s2 y3
+
Problems 375
10/100
10 10/111
10/010
11/101 10/001
11/101
11/110
11 11/011
11/000
Later, it will be verified do by using this starting encoder the wished perfor-
mances can be obtained.
It is known that the code rate after the puncturing equals the ratio of punc-
turing period (denoted by P) and a number of ones in the puncturing matrix
(denoted by a) [1]
RP ¼ Rm ðnP=aÞ ¼ P=a;
and that the simplest puncturing matrices providing code rate R = 2/3 are as
follows
0 1 1 1 1 0 1 1
P1 ¼ ; P2 ¼ ; P3 ¼ ; P4 ¼ :
1 1 0 1 1 1 1 0
From Fig. 7.53 the weights of paths returning in the state 00 after three and
four steps can be found, the results are shown in Table 7.9 (it can be con-
sidered that there are no longer paths having smaller weights). All matrices
satisfy the problem conditions, but the encoder will have better correction
capabilities if matrices P3 are P4 are used yielding dfree = 4. Therefore, by
puncturing the originating code having one input, to which corresponds a
simpler encoder block scheme and a lower complexity trellis, it is possible to
construct an encoder which has the same code rate as well as a higher capa-
bility to correct errors [64].
(c) In what follows the puncturing matrix P3 will be used, the complete block
scheme of corresponding encoder is shown in Fig. 7.54. To information
sequence i = (1101) corresponds the code word c = (11010100). However, at
the transmission line a polar equivalent of this sequence is emitted where
(because of puncturing) third and seventh bit are omitted, and the following
sequences are at the channel input i.e. output
376 7 Convolutional Codes and Viterbi Algorithm
0/11
0/11
01 1/11
1/00
0/10
10
1/01 0/01
11 1/10 1/10
Fig. 7.53 Trellis corresponding to punctured convolutional code which has the code rate R = 2/3
and the free distance dfree = 3
1101 ⎡1 0 ⎤
s2 Π3 = ⎢ ⎥
s1 ⎣1 1 ⎦
x ¼ ð þ 1; þ 1; þ 1; 1; þ 1; 1Þ;
y ¼ x þ n ¼ ð1:3; 0:3; 1:4; 1:3; 0:2; 0:1Þ:
x1 y1
y2 TCM
x2
signal
s1 s2 8-PSK
y3
every of two decoding steps, one bit of information sequence was successfully
reconstructed (i1 = 1, i2 = 1), and there is a high probability that the next two
decoded bits will be i3 = 0 and i4 = 1, because the corresponding path
Euclidean distance is substantionaly smaller with respect to the other paths.
Solution
(a) In TCM (Trellis Coded Modulation) transmitting end consists from encoder
and modulator, their parameters are jointly optimized [65]. State diagram of
recursive systematic convolutional encoder is shown in Fig. 7.57, where
dashed lines correspond to zero value of the second information bit. The first
information bit does not influence the path shape, but only the first output
value (it is the reason for parallel paths).
In Fig. 7.58 complete trellis at the depth t = 1 and a trellis part to the depth
t = 3 are shown. It is clear that the free code distance is dfree = 1. Parallel to all
00/000
10/100
01/010 00
01/010
11/110
11/110
00/000
s1’=s2
10/100
s2’= x2 s1
01 10
y1= x1
01/011 y2= x2
11/111 y3= s2
10/101 00/001
00/001
11 10/101
01/011
11/111
Fig. 7.57 State diagram and encoder (Fig. 7.58) working rule
011, 111
001, 101 (01) 010 (01)
001 110
101
zeros path is a path which in t = 0 leaves state 00, and in t = 1 returns into the
same state containing only one output bit different from one.
(b) Furthermore, two ways of constellation diagram mapping (binary and Gray)
will be considered, for a complex form of modulated signal which has a
normalized amplitude, for every three bit combination at the modulator input.
1. Binary mapping:
000 x =1
010
001 x = (1 + j ) / 2
011 001
d1 010 x= j
d2 d0
011 x = (−1 + j ) / 2
100 000 100 x = −1
d3
101 x = (−1 − j ) / 2
111 110 x=−j
101
111 x = (1 − j ) / 2
110
2. Gray mapping:
011 000 x =1
010 001 001 x = (1 + j ) / 2
d1
011 x= j
d2 d0
010 x = (−1 + j ) / 2
110 000
d3 110 x = −1
111 x = (−1 − j ) / 2
111 100 101 x=−j
101 100 x = (1 − j ) / 2
From the trellis in Fig. 7.59 the paths can be noticed which at depth t = 0
leave the state 00 and in some next moment return to this state, i.e. they are
competing for a path having a minimum distance. Euclidean distance is cal-
culated step by step, as a distance of a point at constellation diagram to the
(00) (00)
010 100
110
010 (01)
(01)
001 110
101
(10) (10)
(11) (11)
Fig. 7.59 Trellis part for finding minimum Euclidean distance for TCM modulator
380 7 Convolutional Codes and Viterbi Algorithm
2. t = 3:
2
c ¼ ð010; 001; 010Þ ! dðIIÞ;bin ¼ d02 þ 2d12 ; dðIIÞ;Grej
2
¼ 2d22 þ d02
2
c ¼ ð010; 101; 110Þ ! dðIIIÞ;bin ¼ d02 þ 2d12 ; dðIIIÞ;Grej
2
¼ d22 þ d02 þ d32
2
c ¼ ð010; 001; 010Þ ! dðIVÞ;bin ¼ d22 þ 2d12 ; dðIVÞ;Grej
2
¼ 2d22 þ d12
2
c ¼ ð010; 101; 110Þ ! dðVÞ;bin ¼ d22 þ 2d12 ; dðVÞ;Grej
2
¼ d22 þ d12 þ d32
2
c ¼ ð110; 001; 010Þ ! dðVIÞ;bin ¼ d02 þ 2d12 ; dðVIÞ;Grej
2
¼ d32 þ d02 þ d22
2
c ¼ ð110; 001; 110Þ ! dðVIIÞ;bin ¼ d02 þ 2d12 ; dðVIIÞ;Grej
2
¼ 2d32 þ d02
2
c ¼ ð110; 101; 010Þ ! dðVIIIÞ;bin ¼ d22 þ 2d12 ; dðVIIIÞ;Grej
2
¼ d32 þ d12 þ d22
2
c ¼ ð110; 101; 110Þ ! dðIXÞ;bin ¼ d22 þ 2d12 ; dðIXÞ;Grej
2
¼ 2d32 þ d12
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
pffiffiffi pffiffiffi pffiffiffiffiffiffiffiffiffiffiffiffiffiffi
pffiffiffi
where d0 ¼ 2 2; d1 ¼ 2; d2 ¼ 2 þ 2; d3 ¼ 2. It is easy to
verify that in both cases the path terminating at the depth t = 1 is the
more critical, because is this case the Euclidean distance is the smallest.
Corresponding code gains are
d32 4
Gbin ¼ 10 log ¼ 10 log ¼ 3:01 dB,
d12 2
2
pffiffiffi
d0 2 2
GGrej ¼ 10 log 2 ¼ 10 log ¼ 5:33 dB,
d1 2
and after noise superimposing at the decoder input the following symbols
enter the decoder
Problems 381
pffiffiffi
v ¼ u þ n ¼ ð0:1 þ 0:8j; ð1 þ jÞ= 2; 0:7 þ 0:1jÞ:
y=0.1+0.8j (1 + j ) / 2 y=-0.7+0.1j
1 1.20 1 1.96 1 3.66
2.28
(00) d
1.36
-1 1.36 -1 3.04 -1 1.52
j -j
j j
-j
-j
-j 0.22 1.96 3.10
(1 + j ) / 2 3.26
(01) 1.92
1.80 3.04 1
(1 + j ) / 2
0.53
(1 − j ) / 2 -1
j
0.22 3.49
(−1 − j ) / 2 (−1 − j ) / 2 2.77
(10) 2.22 2.24
(1 − j ) / 2 3.25
(−1 + j ) / 2
(−1 + j ) / 2
(1 − j ) / 2 (−1 + j ) / 2
1.63 2.57
(1 + j ) / 2 3.58
(11) 3.16
1.63 (−1 − j ) / 2 2.45
trellis termination in state 00. In considered case trellis should return from the state
01 (where the encoder was after emitting a code sequence c) into the state 00, which
is provided if in the fourth step x2 = 1 and in the fifth x2 = 1. The corresponding
values of x1 are not interesting, it will be supposed that they equal zero.
A complete sequence at the encoder input is now x = (01,00,10,01,01), the
corresponding code sequence is y = (010,001,100,011,010), while the modulated
symbols are
pffiffiffi pffiffiffi
u ¼ ðj; ð1 þ jÞ= 2; 1; ð1 þ jÞ= 2; jÞ;
and after the superimposition of noise at the decoder input is the following sequence
pffiffiffi pffiffiffi
v ¼ u þ n ¼ ð0:1 þ 0:8j; ð1 þ jÞ= 2; 0:7 þ 0:1j; ð1 þ jÞ= 2; 1:2jÞ:
Trellis after two more decoding steps is shown in Fig. 7.61. In this case, the
decoding of all emitted information bits is terminated, i.e. the sequence
x = (01,00,10) was reconstructed correctly. Trellis has still more branches, and the
fourth and fifth bit from the encoder input are not decoded. But, these bits do not
carry any information and their reconstruction is not important. Although the metric
corresponding to the state 00 at the depth t = 5 is the smallest, the corresponding
numerical values at the last two depth of a trellis are not written, because they are
not interesting, the decoding being finished (the reader should find them, repeating
the procedure from the previous steps).
In this case for transmitting six information bits even four tail bits are used, and
it may seem that a trellis termination procedure have extremely low efficacy. Of
course, it is not a case, because the information bits block to be transmitted can be
extremely long—typical files lengths downloaded from Internet are greater than
1 [MB], and after their transmission typically 4–16 tail bits are added. For a turbo
(00) d d d
0.22 j 0.53
(01)
(1 + j ) / 2
-1
0.22
(10)
(11)
During the “long” history of error control coding, especially taking into account
that the proof of the Second Shannon theorem is not a “constructive” one, the
search for constructive codes to approach the “promised” limits have taken a long
period of time. Algebraic coding theory flourished. However, even the best cyclic
codes (RS codes) had not good performance. Namely, they reduce the error
probability for relatively high signal-to-noise ratios and have high code rate, but
they are far from the Shannon limit. Furthermore, the practical coding schemes
were developed allowing the better reliability for lower signal-to-noise ratios. These
were convolutional codes having a moderate constraint length decoded using
Viterbi algorithm, and convolutional codes having a long constraint length decoded
using sequential decoding. Next, cascade schemes were developed where block and
convolutional codes were combined to maximize the coding gain. However, such
codes were suitable only for systems having limited power and unlimited band-
width. For systems having limited bandwidth, the spectral efficiency was aug-
mented by combining multilevel signal constellations and punctured convolutional
codes, where the modulation (constellation) and coding (code rate) were adaptive.
Later it was conceived that the better performance can be achieved by joined coding
and modulation (trellis coded modulation). The next step was iterative decoding
allowing at last coming into “the land of promise”. In fact, there is no need to make
the difference between block codes and convolutional codes, because the last ones
can be terminated (by adding non information bits) and considered as block codes.
In the previous chapter, the Viterbi algorithm was introduced having in view
mainly convolutional codes. Even if the information bits are statistically indepen-
dent, the dependence is introduced into a linear block codes codewords as well, by
adding parity-check bits. Therefore, they can be regarded as generated by some
finite automat. The corresponding trellis can be constructed which has clearly
defined the starting and the finishing state.
A method to find the trellis for linear block codes was proposed in [66] and later
Wolf [67] introduced identical trellis and applied Viterbi algorithm for ML block
codes decoding. More details can be found in [68, 69]. The procedure will be firstly
explained for binary code (Problems 8.1, 8.2 and 8.4), but it can be similarly
applied for linear block codes over GF(q) (Problem 8.3).
Consider a linear block code (n, k) over GF(2). Its control matrix has the
dimensions (n − k) n and can be written in the form
H ¼ ½h1 h2 . . .hj . . .hn ;
The trellis will start at the depth 0 and will terminate at the depth n. Suppose the
systematic code where the first k bits are information bits, followed by n − k parity
checks. State at the depth j will be denoted as Sj and numerically expressed using its
ordinal number. It can be denoted as the second index, i.e. as Sj,ord.numb. Maximal
value of these ordinal numbers is 2n−k − 1. There are 2n−k possible values (states).
They can be represented by vectors having length n − k. If the information vector is
denoted as i(i1, i2, …, ik) a new state for the depth j can be found (second index
denoting the ordinal number is omitted) as
Sj ¼ Sj1 þ xj hj ;
The corresponding trellis has 2n−k = 8 states (i.e. three bit vectors). It starts from
depth 0 going to the depth n = 7. Total number of paths equals the number of code
words 2 k = 24 = 16. If index of S denotes only the trellis depth and state index is
written in the parenthesis, the calculations are as follows (for S0 = (000)):
Brief Theoretical Overview 387
j¼1:
S1 ¼ S0 xk h1 ¼ ð0 0 0Þ þ x1 ð1 1 1Þ
ð0 0 0Þ; x1 ¼ 0
¼
ð1 1 1Þ; x1 ¼ 1
j¼2:
S2 ¼ S1 x2 h2 ¼ S1 þ x2 ð1 1 0Þ
ð0 0 0Þ; x2 ¼0
S2 ¼ ð0 0 0Þ þ x2 ð1 0 1Þ ¼
ð1 0 1Þ; x2 ¼1
ð1 1 1Þ; x2 ¼0
S2 ¼ ð1 1 1Þ þ x2 ð1 1 0Þ ¼
ð0 0 1Þ; x2 ¼1
j¼3:
S3 ¼ S2 x3 h3 ¼ S2 þ x3 ð1 0 1Þ
ð0 0 0Þ; x3 ¼0
S3 ¼ ð0 0 0Þ þ x3 ð1 0 1Þ ¼
ð1 0 1Þ; x3 ¼1
ð0 0 1Þ; x3 ¼0
S3 ¼ ð0 0 1Þ þ x3 ð1 0 1Þ ¼
ð1 0 0Þ; x3 ¼1
ð1 1 0Þ; x3 ¼0
S3 ¼ ð1 1 0Þ þ x3 ð1 0 1Þ ¼
ð0 1 1Þ; x3 ¼1
ð1 1 1Þ; x3 ¼0
S3 ¼ ð1 1 1Þ þ x3 ð1 0 1Þ ¼
ð0 1 0Þ; x3 ¼1
j¼4:
S4 ¼ S3 x4 h4 ¼ S3 þ x4 ð101Þ
ð0 0 0Þ; x3 ¼ 0 ð1 0 0Þ; x4 ¼ 0
S4 ¼ ð0 0 0Þ þ x4 ð011Þ ¼ S4 ¼ ð1 0 0Þ þ x4 ð011Þ ¼
ð0 1 1Þ; x3 ¼ 1 ð1 1 1Þ; x4 ¼ 1
ð0 01Þ; x3 ¼ 0 ð1 01Þ; x4 ¼ 0
S4 ¼ ð0 0 1Þ þ x4 ð011Þ ¼ S4 ¼ ð1 0 1Þ þ x4 ð011Þ ¼
ð0
10Þ; x3 ¼ 1 ð1
10Þ; x4 ¼ 1
ð010Þ; x3 ¼ 0 ð110Þ; x4 ¼ 0
S4 ¼ ð0 1 0Þ þ x4 ð011Þ ¼ S4 ¼ ð1 1 0Þ þ x4 ð011Þ ¼
ð001Þ; x3 ¼ 1 ð101Þ; x4 ¼ 1
ð011Þ; x3 ¼ 0 ð111Þ; x4 ¼ 0
S4 ¼ ð0 1 1Þ þ x4 ð011Þ ¼ S4 ¼ ð1 1 1Þ þ x4 ð011Þ ¼
ð000Þ; x3 ¼ 1 ð100Þ; x4 ¼ 1
Now, the trellis is completely developed and there are 16 paths going to 8 nodes
(two paths into every node). These paths correspond to 16 possible information bit
sequences (of the length 4). It is supposed that the information bits are independent
and that all these combinations are equally probable. But, now the parity check bits
come. They depend on previous path bits, i.e. bits x5, x6 and x7 depend on the state
(node). The state now determines completely the next parity check bits, as shown in
Fig. 8.1. E.g. in a considered case:
388 8 Trellis Decoding of Linear Block Codes, Turbo Codes
S 1 (001)
k2 =1
k1 =0
S 2 (010)
i1 =1 i4 =0
S 3 (011)
S 4 (100)
S 5 (101)
S 6 (110)
i3 =1
i2 =0
S 7 (111)
S4 ¼ S3 þ x4 h4 ¼ ðS2 þ x3 h3 Þ þ x4 h4 ¼ ððS1 þ x2 h2 Þ þ x3 h3 Þ þ x4 h4
¼ ðððS0 þ x1 h1 Þ þ x2 h2 Þ þ x3 h3 Þ þ x4 h4 ¼ S0 þ ði1 h1 þ i2 h2 þ i3 h3 þ i4 h4 Þ
¼ S0 þ ði1 H 11 þ i2 H 21 þ i3 H 31 þ i4 H 41 i1 H 12 þ i2 H 22 þ i3 H 32 þ i4 H 42 i1 H 13
þ i2 H 23 þ i3 H 33 þ i4 H 43 Þ
¼ S0 þ ði1 P11 þ i2 P12 þ i3 P13 þ i4 P14 i1 P21 þ i2 P22 þ i3 P23 þ i4 P24 i1 P31
þ i2 P32 þ i3 P33 þ i4 P34 Þ
¼ ð000Þ þ ðk1 k2 k3 Þ ¼ ðk1 k2 k3 Þ:
From Fig. 8.1 it is easy to see that to the branch (path) which corresponds to the
input i1 = 1, i2 = 0, i3 = 1 and i4 = 0, correspond as well the same output bits
(systematic encoder). This path after the fourth step is in state S2 = (010), and the
next three control bits correspond to the bits in parenthesis, i.e. k1 = 0, k2 = 1 and
k3 = 0. Final state for the considered path is
S7 ¼ S6 þ x7 h7 ¼ ðS5 þ x6 h6 Þ þ x7 h7 ¼ ððS4 þ x5 h5 Þ þ x6 h6 Þ þ x7 h7
¼ ½k1 k2 k3 þ k1 ½100 þ k2 ½010 þ k3 ½001 ¼ ½000:
The complete trellis is shown in Fig. 8.2. Parts of the path corresponding to
symbol xj = 0 are dashed, and parts corresponding to symbol xj = 1 are denoted by
full lines.
Consider now linear block code (5, 3) which has the generator matrix
Brief Theoretical Overview 389
S1 (001)
S2 (010)
S3 (011)
S4 (100)
S5 (101)
S6 (110)
S7 (111)
2 3
1 0 1 0 0
G ¼ 40 1 0 1 05
0 0 1 1 1
To obtain the trellis, the control matrix should be found. Code is not systematic
and firstly the generator matrix of the equivalent systematic code has to be found. It
can be achieved by column permutation (here 1-2-3-4-5 ! 3-4-1-2-5) or by
addition of corresponding rows (here the third row of systematic code matrix G′ can
be obtained by adding all three rows of matrix G). Consider submatrix of this
matrix corresponding to parity checks P (here first two columns of G). Using the
standard procedure the control matrix of systematic code is obtained:
2 3
1 0 1 0 0
1 0 1 0 1
G0 ¼ 4 0 1 0 1 0 5 ¼ ½P; I3 ; H0 ¼ I2 ; PT ¼ :
0 1 0 1 1
1 1 0 0 1
Of course, the relation G′(H′)T = 0 holds but as well as the relation (GH′)T = 0.
Corresponding trellis is shown in Fig. 8.3. It should be noticed that in this
example the used generator matrix G generates the code words where the control
bits are at the beginning, followed by information bits. Corresponding code words
are c1 = (00000), c2 = (11001), c3 = (01010), c4 = (10011), c5 = (10100),
c6 = (01101), c7 = (11110), c8 = (00111). Minimal Hamming weight determines
the minimal Hamming distance is here dmin = 2.
Therefore, to construct trellis, the corresponding parity-check matrix should be
found providing the trellis construction. When parity-check matrix of linear block
code is not known, a trellis structure can be determined from the generator matrix, if
390 8 Trellis Decoding of Linear Block Codes, Turbo Codes
1 0
1
1
1
0 0
2
1
1
1
0 0
3
Fig. 8.3 Trellis structure for the considered example (code (5, 3))
X
n
dE ðu; vÞ ¼ ðui vi Þ2 :
i¼1
Table 8.1 Viterbi decoding for the trellis shown in Fig. 8.2
First step Second step Third step Fourth step
Samples! 0.4 0.2 0.1 0.2 0.1 0.2
0.5
0–0: 0000: 0.46 0: 0.46 + 0.22 = 0.5 0: 0.5 + 0.12 = 0.51 0: 0.51 + 0.22 = 0.55
0111: 1.86 1: 0.66 + 0.82 = 1.3 1: 1.3 + 0.92 = 2.31 1: 1.31 + 0.82 = 1.95
0–1: 1100: 1.26 0: 1.26 + 0.22 = 1.3 0: 1.3 + 0.12 = 1.31
1011: 1.46 1: 1.06 + 0.82 = 1.7 1: 0.5 + 0.92 = 1.31
0–2: 1101: 1.26 0: 1.26 + 0.22 = 1.3
1010: 1.46 1: 1.06 + 0.82 = 1.7
0–3: 0001: 0.46 0: 0.46 + 0.22 = 0.5
0110: 1.86 1: 0.36 + 0.82 = 1.3
0–4: 1110: 2.06
1001: 0.66
0–5: 0010: 1.26
0101: 1.06
0–6: 0011: 1.26
0100: 1.06
0–7: 1111: 2.06
1000: 0.66
step between nodes 0 and 0 (the first row in table) corresponding paths are 0000 and
0111.
For path S0-S0-S0-S0-S0 the corresponding metric is
The survived path has the distance 0.46 (fat numbers). The procedure continues
for the next step where into state S0 enter paths from S0 and S4 and a path is chosen
which has smaller accumulated distance taking as well into account metrics of the
nodes from which the part of path starts. In the third step there is a case where both
distances are equal (1.31) and arbitrarily, one path is chosen. However, and even if
the other path was chosen, it would not influence the last step. Therefore, all zeros
path (0000000) has survived and the information bits are (0000).
Soft decision is optimum if at the channel output the samples are from contin-
uous set. In real systems it is not the case because samples are quantized. Therefore,
it may be conceived that at the channel output there are specific (finite) number of
discrete values. For binary transmission, the channel can be usually represented as
discrete channel with two inputs and more outputs. The case with two outputs is a
binary channel and corresponds to hard decision. For a higher number of outputs
the limiting case is soft decision. ML rule can be applied for such channels and if
392 8 Trellis Decoding of Linear Block Codes, Turbo Codes
where PðXi ¼ xjSði1Þ SðiÞ Þ denotes the probability that during transition from
state Sði1Þ in state SðiÞ at the encoder output, x is emitted ith symbol. It can be
equal 1 (if that transition exists at the trellis or 0, if it is not allowed). Probability
that at the channel output at ith step the symbol Yi appears if at the channel input
is symbol x is denoted with PðYi jXi ¼ xÞ. The number of symbols is finite and
these probabilities can be calculated and given by the table.
2. Node metrics are obtained in the following way—at depth i = 0 only to one
node (usually corresponding to state Sð0Þ ¼ 0) the unity metric is joined
(a0 ðSð0Þ Þ ¼ 1), while for other nodes the metric equals zero. At ith trellis depth
node metrics are calculated as follows:
Y
i
ai ðSðiÞ Þ ¼ ai1 ðSði1Þ Þci ðSði1Þ ; SðiÞ Þ ¼ cj ðSðj1Þ ; SðjÞ Þ; i ¼ 1; 2; . . .; n
j¼1
There are more paths to enter into a specific node and the corresponding
products are calculated for all possible paths. For node metric the highest one is
chosen.
Consider the encoder defined in the second example in this section. Let the
probability of binary symbols is 0.5. The encoded sequence is transmitted over
discrete channel without memory. Let channel output is quantized yielding symbols
0, 0+, 1− i 1. Two output symbols are more reliable (0 i 1) and two are less reliable
(0+, 1−). Channel graph is shown in Fig. 8.4.
Let message m = (000) is transmitted, the corresponding code word is
c1 = (00000). If at the channel output y = (0+, 0, 1−, 0, 0) is obtained, apply
generalized Viterbi algorithm. Transition probabilities are given in Table 8.2.
Using trellis shown in Fig. 8.3 the corresponding path metrics are found and for
the transient regime shown in Fig. 8.5. The metrics can be easily found. E.g.
The path which has a larger metric is chosen. These pats are denoted with heavy
lines as well as their values.
Brief Theoretical Overview 393
0.5 0
0.05
0.3 +
0 0
0.01
0.15
-
1 0.3
1
0.05
0.5 1
+ -
0 0 1
i=0 i=1 i=2 i=3
α0(0)=1 0.0225(*)
0 0
0 (0.5) (0.15) 0.0225
(γ1(0,0)=0.3)
α1(2)=0.3 1
1 (0.3)
(0.05)
1 0 0.00225(*)
1 1 (0.15) 0.00225
(0.3)
(γ1(0,1)=0.15)
1
0 (0.3) 0.045*
0
2 (0.5) (0.15)
0.01125
α1(2)=0.15
1
(0.05) 1
(0.3)
0 0.0045*
3 (0.15) 0.001125
It can be noticed that after the third step, the choice was between two paths
entering state 0 (both metrics are 0.0225) and as well between two pats entering
state 1 (both 0.00225). In Fig. 8.5 the “upper” paths were chosen in both cases. The
394 8 Trellis Decoding of Linear Block Codes, Turbo Codes
complete corresponding trellis is shown in Fig. 8.6 where heavy dashed line is
decoded, i.e. the word (00000). If the “lower” paths were chosen for both cases, the
corresponding complete trellis is shown in Fig. 8.7, i.e. the word (10100) resulting
in error event. It should be stressed both results are equally reliable.
As pointed above, generalized Viterbi algorithm is based on Maximum
Likelihood (ML) rule. This rule does not take into account a priori probabilities of
symbols at the channel input. Similarly, Viterbi algorithm is not an optimal pro-
cedure if the symbol probabilities at the encoder input are unequal. To achieve the
optimal performance, the decisions should be based on Maximum A Posteriori
Probability (MAP). Such an algorithm was proposed by Bahl, Cocke, Jelinek and
Raviv [73] known today as BCJR algorithm. It takes into account the symbol
probabilities at the channel input. Further, this algorithm guarantees an optimal
decision for every system for which trellis can be constructed.
(0.05)
1
2
0.045
1
(0.05) ∗)
0.00225 (
0.0045
0
3 (0.5) 0.00225
1 (0.05)
0
2
0.045
1
1
1 (0.05) ∗)
0.00225 (
0.0045
0
3 (0.5) 0.00225
In this expression, the new parameter is PðSðiÞ jSði1Þ Þ, the conditional probability
that system in ith step from the state Sði1Þ goes into the state SðiÞ . For binary codes,
this probability depends on the probabilities of 0 and 1 at the encoder input (not
taken into account in Viterbi algorithm). For the equiprobable input symbols for
every state PðSðiÞ jSði1Þ Þ ¼ 0; 5: However, sometimes the path from state Sði1Þ
must go into only one state (then PðSðiÞ jSði1Þ Þ ¼ 1; 0: The second important dif-
ference relating to the Viterbi algorithm is that node metrics are calculated in two
ways—“forward” (with increasing trellis depth) and “backward” (from the end of
trellis).
1. Forward path metrics are calculated as follows:
– at depth i = 0, only to one node (usually corresponding to state Sð0Þ ¼ 0) unit
metric is adjoined, while for other nodes the metric equals zero.
– at ith trellis depth node metrics are calculated as follows
X
ai ðSðiÞ Þ ¼ ai1 ðSði1Þ Þci ðSði1Þ ; SðiÞ Þ; i ¼ 1; 2; . . .; n
Sði1Þ 2S x
where Sx is a set of all possible states. After finding all possible transitions from the
previous state depth to the considered state the metric of the starting states is
multiplied by the metric of the corresponding branch and all obtained results are
summed to obtain the node metric of the specific node.
2. Backward path metrics are calculated as follows:
– at depth i = n, only to one node (usually corresponding to state S(n) = 0) unit
metric is adjoined, while for other nodes the metric equals zero.
– at the depth i − 1 node metrics are calculated as follows:
X
bi1 ðSði1Þ Þ ¼ bi ðSðiÞ Þci ðSði1Þ ; SðiÞ Þ; i ¼ n; n1; . . .; 1
SðiÞ 2Sx
where Sx is a set of all possible states. Here the metric of node where the branch
enters is multiplied by the metric of corresponding branch and all obtained results
are summed to obtain the node metric of the specific node.
Consider the previous example where y = (0+, 0, 1−, 0, 0). Corresponding trellis
is shown in Fig. 8.3. E.g. for node at the depth i = 1, the following is obtained:
396 8 Trellis Decoding of Linear Block Codes, Turbo Codes
X
c1 ð0; 0Þ ¼ PðSð1Þ ¼ 0jSð0Þ ¼ 0ÞPðXi ¼ xjSð0Þ ¼ 0; Sð1Þ ¼ 0ÞPðY1 jX1 ¼ xÞ
x2f0;1g
while c1 ð0; 1Þ ¼ 0 and c1 ð0; 3Þ ¼ 0, because the transitions from state 0 into the
states 1 and 3 do not exist.
Node metrics (presented in Table 8.3) are calculated according to the following
example:
shown in Fig. 8.8. In the case when more branches enter into one node, the metrics
are added only. They are not compared neither only one branch is chosen, as in
Viterbi algorithm case.
Now, the further calculation is needed:
For every branch entering alone in the specific node, as well for the node,
parameter ki(S(i)) is calculated as
From the trellis one finds to which symbol of encoded sequence (at that depth)
corresponds this transition (i.e. node) and the probability is calculated that in this
step at the encoder output the corresponding symbol has been emitted. If at the
considered depth into every node only one branch enters, the probability that at this
(ith) depth binary zero is decoded is
1 0 2 0 0
i=0 i=1 i=2 i=3 i=4 i=5
α 0(0)=1 β4(0)=0.5 β5(0)=1
0 α 1(0)=0.15 0 0 0 0
0 (0.25) (0.075) (0.5) (0.5)
(γ1(0,0)=0.15)
1
1 (0.15) 1
(0.025)
1
(0.05)
1 0
1 (0.075)
1
(0.15)
(γ1(0,1)=0.075)
1 (0.05)
0 (0.15)
0
2 (0.25) (0.075)
α1(2)=0.075
1
1
(0.025) 1 (0.05)
(0.15)
0 0
3 (0.075) (0.5)
β4(3)=0.05
Table 8.3 List of metrics for all nodes “forward” and “backward”
t=0 t=1 t=2 t=3 t=4 t=5
Forward at(0) 1 0.15 0.0375 0.005625 0.002841 0.001455
at(1) 0 0 0.00375 0.0005625 0 0
at(2) 0 0.075 0.01875 0.007031 0 0
at(3) 0 0 0.001875 0.0007031 0.0007031 0
Backward bt(0) 0.001455 0.004922 0.01912 0.25 0.5 1
bt(1) 0 0 0.05625 0.025 0 0
bt(2) 0 0.009562 0.03769 0.0025 0 0
bt(3) 0 0 0.05625 0.025 0.05 0
P ðx¼0Þ
A kj ðSðiÞ Þ
Pi ð0Þ ¼ P x0 ðx¼0Þ
;
Ax kj ðSðiÞ Þ
where the summation in numerator is over all node metrics to which in this step
corresponds binary zero (xi = 0), and summation in denominator is over metric of
all branches entering into any of states.
If two branches enter the node, the following metric is calculated
From trellis (at that depth) one finds to which symbol of encoded sequence
correspond the transitions and the probability is calculated that in this step at the
encoder output the corresponding symbol has been emitted. The probability that at
this (ith) depth binary zero is decoded is
P
Ax0 ri ðSði1Þ ; SðiÞ Þ
Pi ð0Þ ¼ P ;
Ax ri ðSði1Þ ; SðiÞ Þ
where the summation in numerator is over metrics of all branches to which in this
step corresponds binary zero (xi = 0), and summation in denominator is over
metrics of all branches entering into any of states.
For the considered example, at depths 1 and 2 these probabilities are
k1 ð0Þ
P1 ð0Þ ¼ ¼ 0:5072;
k1 ð0Þ þ k1 ð2Þ
k2 ð0Þ þ k2 ð2Þ
P2 ð0Þ ¼ ¼ 0:9783
k2 ð0Þ þ k2 ð2Þ þ k2 ð1Þ þ k2 ð3Þ
398 8 Trellis Decoding of Linear Block Codes, Turbo Codes
meaning that at the first step probability of zero is 0.5072, and at the second—
0.9783.
Corresponding probabilities at depths three and four are
Values (S(i)) and ri ðSði1Þ ; SðiÞ Þ are shown in Fig. 8.9 where in the parentheses
are given the probabilities that zero is emitted in every step. On the basis of these
values BCJR decoder could decide that sequence (0, 0, 1, 0, 0) was emitted
(wrongly!). However, the trellis branches corresponding to the above sequence (fat
lines in figure) do not form the path in trellis.
BCJR algorithm is completed, but still the decision was not made. Now, the
valid code sequence should be found which is the nearest to the above estimation
(00100). One way to do it is the comparison of this estimation to all eight possible
code words and to choose one having the minimum Hamming distance from it. In
this case there are two such words:
– code word (00000) (Hamming distance d = 1) and according to BCJR its
estimation of the third bit being 1 is not so reliable (only 0.5072)
– code word (10100) (Hamming distance d = 1) and according to BCJR its
estimation of the first bit being 1 is not so reliable (only 0.5072).
λ(S) σ(S’,S)
Both estimations are equally unreliable and the final decision about the emitted
sequence has the same probability.
The obtained result should be expected because the Viterbi algorithm is opti-
mum when the symbols at the discrete channel input are equally probable, what is
here the case. Because of that, there is no sense to use BCJR here. However, BCJR
has many advantages:
– when the input symbols are not equally probable, Viterbi algorithm (ML
algorithm) is not optimum. But, BCJR algorithm uses MAP, is an optimum in
this case
– BCJR algorithm is suitable for iterative decoding, where on the basis of addi-
tional information, the estimation of symbols can be changed, until they become
sufficiently large to make a reliable estimation.
Decoding using MAP algorithm needs large memory and a substantial number
of operations including multiplication and exponentiation. As in the case of Viterbi
algorithm, here instead of metrics, their logarithms can be used yielding summation
instead of multiplication. Now, Log-MAP algorithm [74] is obtained (Problem
8.9), which has the same performance as BCJR. However, in this case the
expressions of the type lnðeLðd1 Þ þ eLðd2 Þ Þ appear. Here, the following exact
expression can be used
where the second term should be successively calculated. If the second term is
neglected, the suboptimum procedure is obtained—Max-Log-MAP algorithm [75]
(Problem 8.9), having smaller complexity. There exist also further modifications.
One is Constant-Log-MAP [76], where the approximation
0; jd1 d2 j [ T
lnðed1 þ ed2 Þ maxðd1 ; d2 Þ þ ;
C; jd1 d2 j\T
is used and where the value of threshold T is found on the basis of a specific criteria.
The other is Linear-Log-MAP, where the following approximation is used
0; jd1 d2 j [ T
lnðed1 þ ed2 Þ maxðd1 ; d2 Þ þ ;
ajd1 d2 j þ b; jd1 d2 j\T
and where constants a and b are determined on the basis of a specific criteria.
The application of MAP (and Log-MAP) implies the knowledge of
signal-to-noise ratio at the receiver input. It is needed for metric calculation and for
the corresponding scaling of information during the decoders communication in the
case of iterative decoding.
The Viterbi algorithm output is always “hard”. Only the SOVA (Soft Output
Viterbi Algorithm) [77, 78] (Problem 8.7) algorithm makes possible at the decoder
output, parallelly with a series of decoded bits, to find an estimation of the
400 8 Trellis Decoding of Linear Block Codes, Turbo Codes
where lmin is the metric (squared Euclidean distance) of complete survived path and
lt;c is the metric of path of the strongest concurrent (competitor) in step i.
Consider systematic linear block code (4, 3) yielding simple parity checks.
Generator and control matrices are
2 3
1 0 0 1
G ¼ 4 0 1 0 1 5; H ¼ ½ 1 1 1 1 :
0 0 1 1
Corresponding trellis is shown in Fig. 8.10. In the same figure the decoding of
sequence (0.8, 0.7, 0.1, 0.2) is shown using soft Viterbi algorithm, as well as branch
metrics and the survived path. The decoded sequence is (i1, i2, i3) = (1, 1, 0).
SOVA algorithm is identical to the classic Viterbi algorithm until the moment
when the path is found which has the minimum Euclidean distance. To find LLR,
the metric of every branch that separated from this path using expression
n o
f
li;c ¼ min
0
l i1 ðl 0
Þ þ m i ðl 0
; lÞ þ l b
i ðlÞ
l;l
f
is found, where li1 ðl0 Þ denotes the metric “forward” of the state from which the
concurrent branch started, mi1i ðl ; lÞ branch metric at the depth i and li ðlÞ metric
0 b
input
i v0
Convolutional
v1
Interleaving (N bits)
v2
i% Convolutional
One possible way to implement turbo encoder is shown in Fig. 8.12. Recursive
systematic convolutional codes are supposed. Generally, different codes can be used.
The same information sequence is led into both encoders, but the interleaving is
applied in front of one. Interleaving is introduced to avoid the encoding of the same
sequence by both encoders. In such way the packets of errors will be deinterleaved.
After the information bit sequence some “non information” bits are used to reset
encoders usually into all zeros state (“tail biting”). The corresponding trellises are
terminated, and the decoding of the next information bit sequence is independent of
the previous sequence. Therefore, turbo encoder is linear block coder generating
long “code words”, being one if the goals in construction of error control codes.
Let both encoders have code rate 1/2. Information bits are directly transmitted as
well as parity checks of the first encoder. From the second encoder only the parity
checks are send, but for interleaved sequence. In such way, an equivalent code is
obtained which has code rate 1/3. It should be noted that the second encoder takes
into account only the information bits and not control bits of the first encoder. It is
the difference in regard to serial concatenated encoding, where the outer encoder
“works” considering the complete output of the inner encoder.
In this example (code rate R = 1/3) for a sequence of N information bits, the
encoded sequence has 3 N bits. From all possible 23N binary sequences, only 2 N
are valid code words. Therefore, it is linear (3N, N) block code (N is the interleaver
length).
Consider turbo encoder shown in Fig. 8.13. The interleaver work is described by
(N = 16 bits)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
I f: g ¼
1 5 9 13 2 6 10 14 3 7 11 15 4 8 12 16
where the second row denotes bit positions at the interleaver output and the first—
bit positions in the original sequence. The puncturing (explained in the previous
chapter) is supposed as well and let P is the puncturing matrix.
Because of puncturing the code rate is R = 1/2 and the corresponding code has
parameters (32, 16) as illustrated in Fig. 8.14.
Brief Theoretical Overview 403
m +
x1
Block + p1 Polar
interleaver
m%
Π form
x2
m%
+
p2
16 LBC
2 Vector space
(32,16)
32
2
DEINTERLEAVING
Λ 2e
Λ1e
r0 DECODER 1 INTERLEAVING
r1
input
INTERLEAVING
r%
0
DECODER 2
r2
output
DEINTERLEAVING
Λ2
For information sequence of length m = 16, control bits and output bits (in polar
format) are given in Table 8.4. It is obvious that by varying interleaver length
different codes can be obtained. Code rate does not change, but longer code words
will provide better error correction capabilities.
Turbo decoder (Fig. 8.15) has to make best use of redundancy introduced by
encoder. The used decoders work on “their own” information bit sequences and the
corresponding deinterleaving and interleaving are used. The first decoder on the
base of its own input and control bits estimate the information bits (e.g. MAP) and
sends estimations to the second decoder (here interleaving is needed). The second
decoder, on the base of these estimations and its own control bits sends to the first
decoder its own estimations (here the deinterleaving is needed). First decoder now
makes new estimations taking into account these estimations as well control bits
and sends them to the second decoder. Therefore, the iterative procedure starts. In
such way a try is made to extract all possible information introduced by the
encoders. After a few iterations, the second decoder makes hard decisions and sends
to output (after deinterleaving) decoded information bits. Number of iterations and
interleaving procedure are the subjects of investigation.
In Problem 8.10 for above considered turbo encoder (and decoder) numerical
example is given for interleaver length N = 5.
Turbo codes are an interesting example of the specific area in information the-
ory. They were firstly introduced in praxis, and a theoretical explanation (and even
understanding) came later [81]. Generally, it can be said [82] that during turbo
decoding, the a posteriori probability for every bit of information sequence is
calculated in iterations, to give the final decision at the end.
Problems 405
Problems
Solution
One of methods to construct a linear block code trellis based on its parity-check
matrix structure is described in introductory part of this chapter. This method is
applied to construct a trellis for the code whose matrix is given above. Only the
explanations necessary for the understanding of the applied procedure are given
here.
(a) The code has parameters (6, 3), and it can be noticed that it is a code whose
construction is described in Problem 5.8, obtained by Hamming code (7, 4)
shortening. Generator matrix of systematic version can be easily found by
reordering column positions, but the solution is not a unique one. The reader
should think over how many possible matrices exist for this code. Some of
possible solutions are
2 3 2 3
1 0 0 0 0 0 1 0 0 0 1 1
6 7 6 7
Gs1 ¼ 4 0 1 0 1 0 1 5; Gs2 ¼ 4 0 1 0 1 1 0 5;
0 0 1 0 1 1 0 0 1 1 0 1
2 3
1 0 0 1 0 1
6 7
Gs3 ¼ 4 0 1 0 1 1 0 5; . . .
0 0 1 0 1 1
2 3 2 3
1 0 0 0 1 1 0 1 1 1 0 0
Gs ¼ 4 0 1 0 1 0 1 5 ¼ ½I 3 P ) H s ¼ ½P; I 3 ¼ 4 1 0 1 0 1 0 5:
0 0 1 1 1 0 1 1 0 0 0 1
St ¼ St1 xt ht ;
where ht denotes t the tth parity-check matrix column transposed and xt can take
values 0 or 1. Trellis part to the depth t = 3 is formed by using equations given
below.
t¼1:
S1 ¼ S0 x1 h1 ¼ ð0 0 0Þ þ x1 ð0 1 1Þ
(
ð0 0 0Þ; x1 ¼ 0
¼
ð0 1 1Þ; x1 ¼ 1
t¼2:
S2 ¼ S1 x2 h2 ¼ S1 þ x2 ð1 0 1Þ
ð0 0 0Þ; x2 ¼ 0
ð0 0 0Þ þ x2 ð1 0 1Þ ¼
ð1 01Þ; x2 ¼ 1
ð0 1 1Þ; x2 ¼ 0
ð0 1 1Þ þ x2 ð1 0 1Þ ¼
ð1 1 0Þ; x2 ¼ 1
t¼3:
S3 ¼ S2 x3 h3 ¼ S2 þ x3 ð110Þ
ð0 0 0Þ; x3 ¼ 0
ð0 0 0Þ þ x3 ð110Þ ¼
ð110Þ; x3 ¼ 1
ð1 0 1Þ; x3 ¼ 0
ð1 01Þ þ x3 ð110Þ ¼
ð0 1 1Þ; x3 ¼ 1
ð0 1 1Þ; x3 ¼ 0
ð0 1 1Þ þ x3 ð110Þ ¼
ð1 0 1Þ; x3 ¼ 1
ð1 1 0Þ; x3 ¼ 0
ð1 1 0Þ þ x3 ð110Þ ¼
ð0 0 0Þ; x3 ¼ 1
In the introductory part of this chapter it is shown that for linear block codes,
information bits at the encoder input determines uniquely the structure of the rest of
trellis. If one starts from state S0 = (000), the following should be satisfied
Problems 407
S2 (010) z1=1
i1=1
i2=0
S3 (011)
i3=1
S4 (100)
S5 (101)
S6 (110)
S7 (111)
As the states are the same after three steps for both sequences, it should be
noticed as well that the parity-check bits are the same for these information
sequences
The codeword has a structure c = (i1 i2 i3 z1 z2 z3), therefore, the rest of path is
uniquely determined
S4 ¼ S3 z1 h4 ; S5 ¼ S4 z2 h5
S6 ¼ S3 z1 h5 z2 h z3 h7 ¼ ðz1 z2 z3 Þ z1 ð100Þ z2 ð010Þ z3 ð001Þ ¼ ð000Þ
408 8 Trellis Decoding of Linear Block Codes, Turbo Codes
0 1 2 3 4 5 6
S0 (000)
S1 (001)
S2 (010)
S3 (011)
S4 (100)
S5 (101)
S6 (110)
S7 (111)
0 1 2 3 4 5 6 7 8
S0=(0000)
S1=(0001)
S2=(0010)
S3=(0011)
S4=(0100)
S5=(0101)
S6=(0110)
S7=(0111)
S8=(1000)
S9=(1001)
S10=(1010)
S11=(1011)
S12=(1100)
S13=(1101)
S14=(1110)
S15=(1111)
Solution
On the basis of the theory exposed in Chap. 5 it is clear that it is Reed-Muller code
RM(8, 4), which can be described by a generator matrix [29, 30]
2 3
1 1 1 1 1 1 1 1
60 0 0 0 1 1 1 17
G¼6
40
7:
0 1 1 0 0 1 15
0 1 0 1 0 1 0 1
The easiest way to form a trellis is to start from a fact that RM(8, 4) code is
self-dual, i.e. its generator matrix is its parity-check matrix as well
2 3
1 1 1 1 1 1 1 1
60 0 0 0 1 1 1 17
H¼6
40
7:
0 1 1 0 0 1 15
0 1 0 1 0 1 0 1
410 8 Trellis Decoding of Linear Block Codes, Turbo Codes
0000 0000
0 0
0 0
1
0000 1010 0000 1101 0000
1 0
1 1 1 1
0
0 1001 1110 0
1 1
0000 0000
0 1001 0001 1 1110 0
1 1
1 0011 0 0 0011
0000 0000
1 1 1 1
1 1000 0001 1011 0010 1100 0001 1111 1
0
0
0 0
0
0 0001
0001
1 0011 0
1000 0 1 1111
0010 0010
0 1 0
1
1000 1111
Fig. 8.19 Trellis diagram with the depth 8 for RM code (8, 4), the states are denoted by using
parity-check matrix
If the fourth row of matrix G′ is added to first, second and the third row, the TOF
matrix is obtained, called trellis oriented generator matrix GTOGM
Problems 411
Now for every row the intervals from the first one to the last one can be found,
measured with respect to the bit position in a row (starting from a zeroth position).
If the first one is in the ith position, and the last in the jth position, then the active
time intervals are [i + 1, j] if j > i or / for j = i, and the following is obtained
sa ðg0 Þ ¼ ½1; 3; sa ðg1 Þ ¼ ½2; 6; sa ðg2 Þ ¼ ½3; 5; sa ðg3 Þ ¼ ½5; 7:
Table 8.5 The sets defining Time I Gsi a a0 Asi State mark
states and marks for trellis
which has the depth eight for 0 / a0 – / (0000)
RM code (8, 4) 1 {g0} a1 – {a0} (a0000)
2 {g0, g1} a2 – {a0, a1} (a0 a100)
3 {g0, g1, g2} – a0 {a0, a1, a2} (a0 a1 a20)
4 {g1, g2} a3 – {a1, a2} (0 a1 a20)
5 {g1, g2, g3} – a2 {a1, a2, a3} (0 a1 a2 a3)
6 {g1, g3} – a1 {a1, a3} (0 a10 a3)
7 {g3} – a3 {a3} (000 a3)
8 / – – / (0000)
412 8 Trellis Decoding of Linear Block Codes, Turbo Codes
0000 0000
0 0
0 0
1
0000 0010 0000 0010 0000
1 0
1 1 1 1
0
0 0100 0100 0
1 1
0000 0000
0 0100 0010 1 0100 0
1 1
1 0110 0 0 0110
0000 0000
1 1 1 1
1 1000 1000 1000 0100 0001 0001 0001 1
0
0
0 0
0
0 1010
0011
1 0110 0
1100 0 1 0101
1100 0101
0 1 0
1
1110 0111
Fig. 8.20 Trellis diagram for RM code (8, 4), the states denoted by information bits set
Table 8.6 Data defining states and their marks for BCH code (7, 4)
i Gsi a a0 Asi State
0 / a0 – / (000)
1 {g0} a1 – {a0} (a000)
2 {g0, g1} a2 – {a0, a1} (a0 a10)
3 {g0, g1, g2} a3 a0 {a0, a1, a2} (a0 a1 a2)
4 {g1, g2, g3} – a1 {a1, a2, a3} (a1 a2 a3)
5 {g2, g3} – a2 {a1, a3} (a2 a3 0)
6 {g3} – a3 {a3} (a3 00)
7 / – – / (0000)
and the branches between two states contain the information about the input
information bit by which the state mark is supplemented or about the oldest bit
being omitted from the state mark. Although the marks are different from those in
the Fig. 8.19 it is clear that both trellises have the same shape, i.e. they are
equivalent.
Problem 8.4 Form the trellis for BCH code which corrects one error, the codeword
length seven bits.
Solution
It is BCH code (7, 4) which has the generator polynomial gðxÞ ¼ x3 þ x þ 1,
analyzed previously in Problem 6.2. It should be noticed that a cyclic codes gen-
erator matrix has always trellis oriented form
Problems 413
0 1 2 3 4 5 6 7
(000)
(001)
(010)
(011)
(100)
(101)
(110)
(111)
Furthermore, the procedure for determining the active time rows intervals is
used, which for cyclic codes always differ for one bit sa(g0) = [1, 3], sa(g1) = [2, 4],
sa(g2) = [3, 5], sa(g3) = [4, 6]. Based on the data given in Table 8.6, trellis diagram
for BCH code (7, 4) is shown in Fig. 8.21.
Problem 8.5 At the output of systematic Hamming code (6, 3) encoder is the
codeword c ¼ ð011011Þ. This sequence is transmitted as a polar signal, while in
channel the additive Gaussian noise is superimposed yielding at receiving end the
signal y ¼ ð1:1; 0:2; 0:1; 0:8; þ 0:9; þ 1:2Þ which is decoded and delivered
to the user.
(a) Draw the block scheme of the transmission system.
(b) Decode the received codeword applying Viterbi algorithm by using Hamming
metric.
(c) Decode the received codeword applying Viterbi algorithm by using Euclidean
metric, supposing that a quantization error can be neglected.
(d) Decode the received codeword applying Viterbi algorithm by using Euclidean
metric, if the uniform quantizer with q = 4 levels is used (−1.5; −0.5; 0.5; 1.5).
Solution
(a) The binary sequence generated by a source is firstly encoded by Hamming
code with parameters (6, 3), for which the trellis is formed in Problem 8.1.
414 8 Trellis Decoding of Linear Block Codes, Turbo Codes
encoder noise
-2
q levels
Q(aˆn )
2
1
-1→0 r Viterbi decoder,
-2 -1 1 2 aˆn
-2
q=2
Fig. 8.22 Block diagram of Viterbi decoding for linear block codes
In the case of soft decoding, the sequence is first quantized (q levels) and then
the Viterbi algorithm is applied by using Euclidean metric, and a decoded sequence
is delivered to user. When the quantizer levels are (−1.5, −0.5, 0.5, 1.5) the
boundaries of quantizing intervals are (−∞, −1, 0, 1, ∞), yielding
r ¼ ð0; 0; 0; 0; 1; 1Þ;
and after that sent into the Viterbi decoder using Hamming metric. Complete
system block scheme is shown in Fig. 8.22.
(b) In the case of Viterbi algorithm decoding by using Hamming metric, it is
understood that at the receiver input there is decision block which has one
threshold (corresponding to two-level quantizer). Therefore, at this block
output only two voltage levels can appear, corresponding to binary one and to
binary zero.
The decoding is then performed at a bit level, where a notion of Hamming
distance is used, defined as the number of positions where two binary sequences
differ. Therefore, the Hamming distance between the received sequence r and
emitted codeword c is
Problems 415
0 0 1 0 2 0 3 0 4 1 5 1 6
S0 (000)
0
3
S1 (001)
S2 (010)
1
S3 (011)
2
S4 (100)
2
S5 (101)
1
1
S6 (110)
2
S7 (111)
Fig. 8.23 The first step during code (6, 3) decoding by Viterbi algorithm by using Hamming
metric
X
n
dðc; rÞ ¼ ðct rt Þ;
t¼1
where is a sign for modulo-2 addition, and summation over t is carried out in
decimal system.
Trellis makes possible to find the Hamming distances between the received
sequence and all possible codewords, with a minimal complexity by using a
maximum likelihood criterion. In the first step, the part of trellis is observed to the
moment when the transient regime is finished, in this example at the trellis depth
t = 3. Part of a received sequence until this moment is compared to the corre-
sponding paths, and from two paths entering the same state, that one is chosen
having a smaller Hamming distance. The same procedure is continued until all
information bits are received, as shown in Fig. 8.23. Obviously, these steps fully
correspond to convolutional codes Viterbi decoding by using hard decision [67].
In the next steps, shown in Fig. 8.24, the previously overwritten branches are
eliminated and a procedure continues in the trellis part where there is the rest of
codeword bits determined by the state in which is the path after the receiving of
information bits. Because of that, the number of states is reduced and as a result the
path is obtained starting from all zeros state and terminating in it. In this case
the sequence is decoded corresponding to states sequence S0 ! S3 ! S3 !
S3 ! S3 ! S1 ! S0, and reconstructed codeword and decoded information word are
416 8 Trellis Decoding of Linear Block Codes, Turbo Codes
0 0 0 0 1 1 1
2
S0 (000)
0 1
2
3
S1 (001)
1
S2 (010)
1
S3 (011)
S4 (100)
S5 (101)
1
S6 (110)
1
S7 (111)
Fig. 8.24 The second and the third step during code (6, 3) decoding by Viterbi algorithm by using
Hamming metric
^c ¼ ð100011Þ ) ^i ¼ ð100Þ:
sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
X n
dE ðy; xÞ ¼ ðyt xt Þ2 ;
t¼1
but the square root operation is not often performed and the square of Euclidean
distance is considered (it does not influence to the relation of the two distances).
Because the quantization error is negligible, it will be supposed that the quanti-
zation was not performed. The procedure of decoding is shown in Fig. 8.25, where
the branches overwritten in the first step are denoted by using the sign ‘’, while
those overwritten in the second and the third step are denoted by ‘+’ and ‘o’,
respectively.
The procedure of calculating squared Euclidean distance is given in Table 8.7
where in bold font the metrics corresponding to survived paths are shown. As for
the algorithm by using Hamming metric, the first decision is at the trellis depth
t = 3 (here the transient regime is finished) and the second decision is at the depth
Problems 417
S2 (010)
5.86
S3 (011)
2.66
S4 (100)
6.26
S5 (101)
2.26
1.86
S6 (110)
6.66
S7 (111)
Fig. 8.25 Decoding of (6, 3) code by Viterbi algorithm by using Euclidean metric
^c ¼ ð011011Þ ) ^i ¼ ð011Þ:
(d) If before the decoding by applying the Viterbi algorithm the quantization was
performed, Euclidean distance is defined as
418 8 Trellis Decoding of Linear Block Codes, Turbo Codes
S2 (010)
6.75
S3 (011)
4.75
S4 (100)
8.75
S5 (101)
2.75
2.75
S6 (110)
8.75
S7 (111)
Fig. 8.26 Decoding by Viterbi algorithm by using Euclidean metric, q = 4 level quantization
X
n
dE2 ðyq ; xÞ ¼ ðyqt xt Þ2 ;
t¼1
the procedure in this case is given in Table 8.8, and the trellis and the order of paths
elimination is shown in Fig. 8.26. It is obvious that the decoding is performed
successfully although at the depth t = 5 one error was made during the decision, but
it was corrected in the next step. Because the metric difference is smaller than in a
previous case, it is clear that the decision made was less reliable.
Problems 419
Problem 8.6 Binary memoryless source emitting binary zero with the probability
Pb(0) = 0.8 sends bits to Hamming encoder defined by the generator matrix
2 3
1 0 0 0 1 1 0
60 1 0 0 1 0 17
G¼6
40
7:
0 1 0 0 1 15
0 0 0 1 1 1 1
The source emits a bit sequence (0010) and a corresponding output encoder
sequence enters a line encoder which generates a polar signal having
amplitudes ±1 V. In the channel at the transmitted signal a noise is superimposed.
At the receiver input, before the decoder, there is a four level quantizer (−1.5; −0.5;
0.5; 1.5 V). The channel is memoryless and the transition probabilities of input
symbols to the quantizer output levels are given in Fig. 8.27.
At the receiver input (before the quantizer) the following signal is detected
y ¼ ð1:1; 1:2; 2:1; 1:8; 0:9; 1:9; 1:2Þ.
(a) Reconstruct the bit sequence emitted by a source, if the decoding is performed
by using a generalized Viterbi algorithm.
(b) Repeat the previous procedure if the decoding is performed by BCJR
algorithm.
(c) Decode the received sequence by BCJR algorithm if the probabilities of source
symbols are not known.
0.4 yq=-1.5
0.1
0.3
x=-1 yq=-0.5
0.2
0.2
x=+1 yq=+0.5
0.3
0.1
0.4 yq=+1.5
x Channel y -1.5→0
yq r
Q(aˆn )
i Linear c
2
Viterbi/
0→-1 1
-0.5→0+
Source block with -2 -1 1 2 aˆn
BCJR Destination
1→+1 0.5→1-
-1
decoder
-2
Fig. 8.28 Block diagram for linear block codes decoding by using trellis
420 8 Trellis Decoding of Linear Block Codes, Turbo Codes
Solution
Similarly to a previous problem, it is suitable to draw system block scheme (shown
in Fig. 8.28). The Hamming code (7, 4) is used and a generator matrix is known,
therefore to the information sequence i = (0010) corresponds the codeword
c = (0010011).
At the encoder output two binary symbols can appear (0 and 1) while at the input
of Viterbi/BJCR decoder one from four levels obtained by quantization can appear.
If to the amplitude levels the logic symbols 0, 0+, 1− and 1 are joined, line encoder,
channel and quantizer can be considered as an equivalent discrete memoryless
channel. At this channel output two symbols more reliable (0 and 1) and two
symbols less reliable (0+, 1−) can appear, the corresponding transition probabilities
are given in Table 8.9. From a known channel output sequence one obtains
y ¼ ð1:1; 1:2; 2:1; 1:8; 0:9; 1:9; 1:2Þ ! yq ¼ ð1:5; 1:5; 1:5; 1:5; 0:5; 1:5; 1:5Þ
the trellis shown in Fig. 8.29 is easily obtained as well. It should be noticed that it is
slightly different from that one shown in Fig. 8.2.
(a) When a generalized Viterbi algorithm is used, the distances accumulated in
the nodes are calculated on the basis of relation [72]
Y
t
at ðSðtÞ Þ ¼ cj ðSðj1Þ ; SðjÞ Þ;
j¼1
where Sðj1Þ denotes a starting state (trellis node at the depth j − 1) and SðjÞ
denotes the state where the branch terminates (trellis node at the depth j).
Branch metrics cj ðSðj1Þ ; SðjÞ Þ are the same as the transition probabilities
describing the channel and for a decoder input sequence are given in
Table 8.10 (the value P(yj|0) corresponds to the horizontal segments, denoted
by a dashed line, while the probability P(yj|1) corresponds to the segments
Problems 421
S1 (001)
S2 (010)
S3 (011)
S4 (100)
S5 (101)
S6 (110)
S7 (111)
denoted by a full line). From the two paths entering into one node, this one is
chosen which has the greater metric. Procedure of decoding by using
generalized Viterbi algorithm is shown in Fig. 8.30. The survived path, i.e. the
procedure result, corresponds to a codeword c′ = (1010101) and to informa-
tion word i′ = (1010). It is obvious that the used algorithm did not result in a
correct decoding of the source emitted sequence. In the continuation of the
solution, it will be shown that, by applying BCJR algorithm, the successful
decoding of is still possible.
(b) BCRJ algorithm procedure is based on the following branch calculation [73]
X
ct ðSðt1Þ ; SðtÞ Þ ¼ PðSðtÞ jSðt1Þ ÞPðXt ¼ xjSðt1Þ SðtÞ ÞPðYt jXt ¼ xÞ
x2Ax
depending besides on the channel transition probabilities and the trellis structure
(described by the probabilities PðSðtÞ jSðt1Þ Þ) as well as on the probabilities of the
encoder input symbols. It should be noticed that in this problem at the depths t = 4,
t = 6 and t = 7 is PðSðtÞ jSðt1Þ Þ ¼ 1; 0. At the other trellis depths this probability is
422 8 Trellis Decoding of Linear Block Codes, Turbo Codes
-
1 0 1 0.004 0 1 0.0013 1 1.28×10
-4
1 7.68×10
-5
S0 (000)
0.1 0.4 0.1 0.0005 7.68×10
-4
-4
0.016 0.4 3.072×10
0.4
-4
0.0003 7.68×10
S1 (001)
-4
0.0077 5.12×10
0.1 0.0013
S2 (010)
0.0019
0.4
0.016
0.1 0.0013
S3 (011)
0.004 0.0005
S4 (100)
0.4
0.1 0.001
0.1
S5 (101)
0.4 0.064
0.4 0.004
S6 (110)
0.4 0.1 0.016
S7 (111)
ðtÞ ðt1Þ 0:8 if SðtÞ ¼ Sðt1Þ ;
PðS jS Þ¼
0:2 if SðtÞ 6¼ Sðt1Þ :
and the corresponding numerical values are given in Table 8.11. For the trellis
depths t = 1, t = 2 and t = 4, the following is calculated
According to the procedure described in the introductory part of this, the fol-
lowing values are calculated
Problems
Table 8.11 Node metrics forward and backward for various trellis depths, BCJR algorithm
t=0 t=1 t=2 t=3 t=4 t=5 t=6 t=7
Forward ai(0) 1 0.08 0.0256 0.00217 6.96 10−4 1.523 10−4 1.023 10−4 4.33 10−5
ai(1) 0 0 0 0 4.35 10−5 2.176 10−4 8.269 10−5 0
ai(2) 0 0 0 0 4.35 10−5 2.176 10−4 0 0
ai(3) 0 0 0.0016 0.00217 6.96 10−4 1.523 10−4 0 0
ai(4) 0 0 0 0 4.35 10−5 0 0 0
ai(5) 0 0 0.0016 0.00217 6.96 10−4 0 0 0
ai(6) 0 0.08 0.0256 0.00217 6.96 10−4 0 0 0
ai(7) 0 0 0 0 4.35 10−5 0 0 0
Backward bi(0) 4.33 10−5 3.17 10−4 9.52 10−4 0.0016 0.002 0.01 0.1 1
bi(1) 0 0 0 0 0.008 0.04 0.4 0
bi(2) 0 0 0 0 0.008 0.04 0 0
bi(3) 0 0 9.52 10−4 0.0103 0.032 0.16 0 0
bi(4) 0 0 0 0 0.003 0 0 0
bi(5) 0 0 6.4 10−4 0.004 0.012 0 0 0
bi(6) 0 2.24 10−4 6.4 10−4 0.004 0.012 0 0 0
bi(7) 0 0 0 0 0.048 0 0 0
423
424 8 Trellis Decoding of Linear Block Codes, Turbo Codes
and the estimated encoded sequence is c′ = (1010111). It is easy to verify that such
sequence does not exist at the trellis (trellis part shown by heavy line with the
arrows shown in Fig. 8.31). It is obvious that at the depth t = 6 one cannot find a
segment corresponding to binary one at the input, because the parity-check bits are
fully defined by the information bits. Therefore, the decoded word is finally
Problems
Table 8.12 Node metrics forward and backward for BCJR algorithm
t=0 t=1 t=2 t=3 t=4 t=5 t=6 t=7
Forward at(0) 1 0.05 0.01 0.0025 5 10−4 1.375 10−4 1.0625 10−4 5.3125 10−5
at(1) 0 0 0 0 1.25 10−4 5.125 10−4 1.0625 10−4 0
at(2) 0 0 0 0 4.06 10−4 2.3125 10−4 0 0
at(3) 0 0 0.01 0.0025 5 10−4 1.375 10−4 0 0
at(4) 0 0 0 0 1.25 10−4 0 0 0
at(5) 0 0 0.0025 0.0081 1.625 10−3 0 0 0
at(6) 0 0.2 0.04 0.0025 5 10−4 0 0 0
at(7) 0 0 0 0 1.25 10−4 0 0 0
Backward bt(0) 5.31 10−5 3.25 10−4 1.45 10−4 0.0028 0.002 0.01 0.1 1
bt(1) 0 0 0 0 0.008 0.04 0.4 0
bt(2) 0 0 0 0 0.008 0.04 0 0
bt(3) 0 0 8.875 10−4 0.00655 0.032 0.16 0 0
bt(4) 0 0 0 0 0.003 0 0 0
bt(5) 0 0 7 10−4 0.0028 0.012 0 0 0
bt(6) 0 1.84 10−4 7 10−4 0.0028 0.012 0 0 0
bt(7) 0 0 0 0 0.048 0 0 0
425
426 8 Trellis Decoding of Linear Block Codes, Turbo Codes
-
1 0 1 0 1 1 1
S0 (000)
S1 (001)
S2 (010)
S3 (011)
S4 (100)
S5 (101)
S6 (110)
S7 (111)
Fig. 8.31 Procedure for decoding by using BCJR algorithm for the known and the unknown a
priori probabilities
c00 ¼ ð1010101Þ;
Prfit ¼ 1jrg
Kðxt Þ ¼ ln ,
Prfit ¼ 0jrg
and the estimation is more reliable as its absolute value is greater (negative value
indicates that the zero was sent, and positive value indicates that one was sent).
(3) BCJR algorithm does not yield a correct solution if a sufficiently good esti-
mation of a priori probabilities is not accessible. The estimations obtained as a
result of BCJR algorithm are not optimal in this case.
Problem 8.7 The same transmission system from the previous problem is con-
sidered, but the quantizer output sequence enters the decoder whose procedure is
based at SOVA algorithm with the correlation metric. Explain the decoding
Problems 427
procedure, reconstruct the transmitted codeword and find the corresponding esti-
mations of reliability.
Solution
Block scheme is practically the same as shown in Fig. 8.28, with a difference that
instead of BJCR decoder Soft Output Viterbi Algorithm (SOVA) is used [77].
When a classic Viterbi algorithm is applied, after decision which branches have
been survived, a series of decisions about “better” paths results in the decoded bit
sequence. Not taking into account whether the metrics are calculated by using
Hamming or Euclidean distance (or the channel transition probabilities, as for a
generalized algorithm version), i.e. not taking into account whether a hard or soft
decoding is used, the Viterbi algorithm output is always “hard”. Only SOVA
algorithm makes possible at the decoder output, in parallel with a series of decoded
bits, to find as well an estimation of the reliability of corresponding decisions.
The first step in SOVA algorithm is the same as in any Viterbi algorithm variant
by using soft decision—on the basis of the branch metrics the survived branches are
found. A variant that uses Euclidean metric is used usually or a variant that uses
correlation metric calculated as follows [26]
X
n
dK ðc; yq Þ ¼ ð1Þ~ct yqt ,
t¼1
where ~ct represents the tth bit of the estimated codeword, and yqt denotes the tth
value of quantized received signal. The procedure, for this case, is shown in the
Fig. 8.32.
-1.5 3 2.5
S2 (010)
3.5
1.5
1.5 3
-1.5 2.5
S3 (011)
0 -1.5 0.5
0
S4 (100)
1.5
-3 -1.5 -4.5 6
-1.5
S5 (101)
1.5 4.5
1.5 -1.5 3
S6 (110)
1.5 1.5 3 -1.5 1.5
S7 (111) 0
By comparing Fig. 8.32 to the Fig. 8.30, it can be firstly noticed that the
branches are overwritten following the same order and the decoded codeword in
both cases is the same
c0 ¼ ð1010101Þ:
Although the metrics are calculated by using different formulas, it can be noticed
that for a fixed trellis depth to the mutual same numerical metric values from
Fig. 8.30, correspond the same mutual metric values in the Fig. 8.32. For example,
at the depth t = 3 to the states (000), (011) and (110) in Fig. 8.30 corresponds the
survived branch metric 0.016, while, in the Fig. 8.32 to the same states corresponds
the metric 1.5. Similarly, at the depth t = 5 to the metric 0.013 for generalized
Viterbi algorithm corresponds the correlation metric 2.5, and at the depth t = 6 to
the metric 7.68 10−4 corresponds metric 5.
In the book [26] it was shown that the Euclidean metric and the correlation
metric are equivalent concerning the making decisions when the channel noise is
Gaussian, but the correlation metrics are more easy to calculate and usually have a
smaller values. The advantage is obvious as well with the respect to calculate
metrics when a generalized Viterbi algorithm is applied—in Fig. 8.30 the metrics
had very small values already for a few trellis depths, and this approach for a
codewords having long length yields the metric values being difficult to express
with a sufficient exactness.
The second step in SOVA algorithm is the backward metric calculation, simi-
larly as for BCJR procedure. As it is shown in Fig. 8.33, the node metrics in this
case are accumulated from right to the left. In this example to every horizontal
3.5 4 2.5 3
1,5 -0.5
S3 (011)
2.5 -4
-1,5
-2.5 0.5
S4 (100)
-1,5 0.5
0.5 -2 0.5
S5 (101) 0.5
3.5 1.5
2
2
3.5 -2 0.5
S6 (110) 0.5
5 2 1.5
0.5
-1.5
S7 (111) 3.5
3 / 2.5 2.5 / 3
1.5 -0.5
S3 (011)
0 / 3.5 -1.5 1.5 / 4
-1.5 0.5
0 / -2.5
1.5
S4 (100)
-1.5 0.5
-1.5 -3 / 3.5 -1.5 4.5 / 2 6 / 0.5
S5 (101) 0.5
1,5
1.5
-1,5
1.5 3 / 0.5
S6 (110) 0.5
1.5 / 5 1.5 -1.5 1.5 / 2 1,5
3 / 3.5
0 / 3.5
S7 (111)
transition corresponds the branch metric −yqt, while to the other transitions corre-
spond yqt (it was valid and for the metrics “forward”, shown in Fig. 8.32, as well).
When calculating the metrics “backward” there is no need to make comparisons,
decisions nor paths overwriting. The reader is advised to verify which path would
survive in this case!
After the metrics calculation for both ways, it is suitable to write them at one
trellis diagram, together with the branch metrics. For the considered example all
elements needed for the next algorithm steps are shown in the Fig. 8.34. On the
basis of this figure it is easy to find the reliability of the bit decoded at the trellis
depth t by using the following procedure.
1. For every branch starting from the depth t − 1 the starting node is noticed and
FW
its adjoined metric forward, denoted by dt1 , and the incoming node at the depth
t and its adjoined metric backward, denoted by dtBW . If this branch metric is
BM
denoted by dt1;t , a corresponding numerical value for this branch can be
calculated
FW BM
Mt ¼ dt1 þ dt1;t þ dtBW ,
3. All the branches at the trellis part from depth t − 1 to the depth t corresponding
to bits ‘1’ (the branches that are not horizontal) at the encoder input are con-
sidered, and that one is found which has the minimal value for a path metric,
will be further denoted as Mt(1).
4. If a bit ‘1’ is decoded the reliability of estimation is as greater as
Kt = Mt(1) − Mt(0) is greater. If a bit ‘0’ is decoded the reliability of estimation
is as greater as Kt = Mt(1) − Mt(0) is smaller.
and because K is positive, it is clear that the probability that the codeword first bit is
binary one is greater.
At the depth t = 2
and it is obvious that the second decoded bit is binary zero, and because the
absolute value of parameter K2 is substantially greater than for a first bit, the
estimation of the second bit value is more reliable. It is in concordance with the
results of the third part of a previous problem.
At the depth t = 3
the third decoded bit is binary one, and this decision was made being more reliable
than for the first codeword bit and less reliable than for the second codeword bit.
At the depth t = 4
M4 ð0Þ ¼ minf1:5 þ 1:5 3:5; 1:5 þ 1:5 þ 2:5; 4:5 þ 1:5 þ 0:5; 1:5 þ 1:5 þ 0:5g ¼ 0:5
M4 ð1Þ ¼ minf1:5 1:5 þ 3:5; 1:5 1:5 2:5; 4:5 1:5 0:5; 1:5 1:5 0:5g ¼ 2:5
) K4 ¼ M4 ð1Þ M4 ð0Þ ¼ 2\0
and according to this criterion, the decoded bit is binary zero, but the decision is
relatively unreliable.
Problems 431
At the depth t = 5
It is interesting that at this depth binary one is decoded, but the parameter K5
value (negative one!) shows that this decision is very unreliable, and that it would
be more logical that in this case binary zero is decoded, but that for some reason is
not suitable (because of trellis structure).
At the depth t = 6
and at this depth binary zero is decoded, but the great parameter K6 value shows
that this decision is extremely unreliable and almost surely wrong! Therefore, the
possibility of this bit inversion should be considered.
Finally, at the depth t = 7
and one can consider that in this case it is relatively reliable that binary one was
sent.
It is obvious that the fifth and the sixth bit are decoded with the smallest
reliability, but with only their inversion the obtained bit combination is not a
codeword. By the first bit inversion (next according to the unreliability level) the
codeword would be obtained that was really sent, i.e. c = (0010011). Although the
procedure for obtaining this word is simple, still it should be noticed that to the sent
codeword corresponds a higher correlation level with the received word, than to this
reconstructed one, as
i.e., the sent word is the next one, after above reconstructed, according the relia-
bility level.
Now, some important features of SOVA algorithm can be noticed:
1. SOVA overcomes the drawback of classic Viterbi algorithm where the esti-
mations of reconstruction reliability of code sequence were not available.
432 8 Trellis Decoding of Linear Block Codes, Turbo Codes
Problem 8.8 The system is considered whose transmitting side consists of the
source emitting a series of equiprobable zeros and ones 010011…, a convolutional
encoder whose structure is defined by generator G(D) = [1 + D + D2, 1 + D2] and
BPSK modulator. Signal power at the receiver input is Ps = 1 [lW] and signaling
rate is Vb = 1 [Mb/s]. In the channel the white Gaussian noise with the average
power density spectrum N0 = 10−12 [W/Hz] is added. The signal samples at the
channel output are
y ¼ ð0:95; 1:2; 0:1; 0:05; 1:2; 1:01; 0:3; 1:13; 0:4; 16Þ;
Solution
(a) The code rate is R = 1/2 and a decoding is performed by blocks consisting of
three information bits, and the input decoder sequence is separated into the
groups consisting of ten samples. In this case the signal entering the decoder is
not quantized and a channel has not a finite number of output symbols (it is not
discrete). Therefore, the transition probabilities cannot be found and the
branch metrics are calculated according to the formula [73]
Problems 433
!
ut Lc X
n
ct ðS ðt1Þ
; S Þ ¼ Ct exp
ðtÞ
Lðut Þ exp xtl ytl
2 2 l¼1
where Ct does not affect the final result because it is canceled during the Log-
Likelihood Ratio (LLR) coefficients calculating. L(ut) is a priori LLR for the kth bit
at the encoder input (ut = 1 for it = 1 and ut = −1 for it = 0) and for equiprobable
symbols it is obtained
Pðut ¼ þ 1Þ
Lðut Þ ¼ ld ¼ 0:
Pðut ¼ 1Þ
Lc is defined as
Ec Eb
Lc ¼ 4a ¼ 4aR ;
N0 N0
where Ec = REb denotes energy needed to transmit one bit of the codeword and Eb
denotes energy needed to transmit one bit of the information word. Let the
instantaneous value of amplification (corresponding to the channels with fading) is
denoted by a. This value can change from one codeword to another (even from one
to the other bit inside the codeword, if the fading is fast).
In this problem, the fading is not supposed and the value a = 1 is taken, and
because Ck does not influence to the result, Ct = 1 will be used. For such numerical
values it is obtained
Ec Ps
¼ ¼ 1 ) Lc ¼ 4:
N0 N0 Vb
-0.95 -1.2 -0.1 -0.05 -1.2 1.01 0.3 1.13 -0.4 -16
10
1/(+1,-1)
1/(+1,-1)
1/(+1,-1) 1/(+1,-1)
11
0/(-1,+1)
Fig. 8.35 Trellis corresponding to recursive convolutional encoder for a given input sequence
2. In the second step the following is received y21 = −0.1 and y22 = −0.05,
yielding
3. In the third step the following is received y31 = −1.2 and y32 = 1.01, yielding:
4. In the fourth step the following is received y41 = 0.3 and y42 = 1.13, yielding:
5. In the last, fifth, step the following is received y51 = −0.4 and y52 = −16,
yielding:
Forward metrics are calculated by using the same formula as in Problem 8.6, and
from a0(0) = 1, it is easy to calculate
Problems 435
a01 ð0Þ ¼ a0 ð0Þc1 ð0; 0Þ ¼ 73:6998; a01 ð2Þ ¼ a0 ð0Þc1 ð0; 2Þ ¼ 0:0136:
a02 ð0Þ ¼ a1 ð0Þc2 ð0; 0Þ ¼ 1:3496; a02 ð1Þ ¼ a1 ð2Þc2 ð2; 1Þ ¼ 2:21 104 ;
a02 ð2Þ ¼ a1 ð0Þc2 ð0; 2Þ ¼ 0:7404; a02 ð3Þ ¼ a1 ð2Þc2 ð2; 3Þ ¼ 1:81 104 ;
and the corresponding normalized forward metric values at the depth t = 2 are
obtained dividing the previously obtained results by a02 ð0Þ þ a02 ð1Þ þ a02 ð2Þ þ
a02 ð3Þ ¼ 2:0904, yielding
a2 ð0Þ ¼ 0:6455; a2 ð1Þ ¼ 1:06 104 ; a2 ð2Þ ¼ 0:3543; a2 ð3Þ ¼ 0:87 104 :
In a similar way, the backward metrics are calculated and b5(0) = 1, for the
depth t = 4 yielding
b04 ð0Þ ¼ c5 ð0; 0Þb5 ð0Þ ¼ 1:75 1014 ; b04 ð1Þ ¼ c5 ð1; 0Þb5 ð0Þ ¼ 5:69 1015 ;
and the numerical values after normalization are (the values are exact to twenty
ninth decimal!).
b03 ð0Þ ¼ c1 ð0; 0Þb4 ð0Þ ¼ 0:0573; b03 ð1Þ ¼ c1 ð1; 0Þb4 ð0Þ ¼ 17:4615;
b03 ð2Þ ¼ c1 ð2; 1Þb4 ð1Þ ¼ 0; b03 ð3Þ ¼ c1 ð3; 1Þb4 ð1Þ ¼ 0;
the numerical values (normalized!) all forward and backward metrics are given in
Table 8.13.
436 8 Trellis Decoding of Linear Block Codes, Turbo Codes
Table 8.13 Node forward and backward metrics for BCJR algorithm
t=0 t=1 t=2 t=3 t=4 t=5
Forward at(0) 1 0.9998 0.6455 0.0306 0.9955 1
at(1) 0 0 0.0001 0.9547 0.0045 0
at(2) 0 0.0002 0.3543 0.0143 0 0
at(3) 0 0 0.0001 0.0004 0 0
Backward bt(0) 1 0.9998 0.0001 0.0033 1 1
bt(1) 0 0 0 0.9967 0 0
bt(2) 0 0.0002 0.9998 0 0 0
bt(3) 0 0 0.0001 0 0 0
where this value should be normalized so as that a sum over all transitions at a given
depth is equal to one, to represent the transition probability from the state S(t−1) into
the state S(t) for a known received sequence y. The corresponding LLR for the tth
information bit after the received sequence y can be found from the relation (non
normalized values can be used)
P
rt ðSðt1Þ ; SðtÞ Þ
Lðut jyÞ ¼ ln PR1 ;
R0 rt ðSðt1Þ ; SðtÞ Þ
where R1 denotes the transitions in the trellis corresponding to information bit ‘1’
(ut = +1), while R0 denotes the transitions corresponding to information bit ‘0’
(ut = −1). Because the general trellis structure is the same in every step, it can be
written
Problems 437
and if in some steps (for some trellis depths) some transitions do not exist, the
corresponding probabilities are equal to zero and a previous equality is further
simplified.
Finally, it is obtained
while for a terminating part of trellis the expressions are slightly different, but as a
rule simplified (the corresponding encoder inputs are uniquely determined by
information sequence, but are not necessarily equal to all zeros sequence) yielding
here
LLR values and the branch metrics with a denoted path corresponding to a
codeword are shown in Fig. 8.36.
Decoded sequence of information bits is i1 = 0, i2 = 1 and i3 = 0, where the
second bit is decoded with smaller reliability than the other two. It should be noted
that this code is in fact a block code (although realized by the convolution encoder)
10
0
0
0 0
11 0
because at the encoder input is the three bits block to which two bits (tail bits) are
added for the trellis termination, to which at the encoder output the corresponding
sequence has ten bits. To the next information bits 001 correspond terminating bits
11, while from the line encoder in this case the following sequence is emitted −1,
−1, +1, +1, +1, −1, +1, −1, +1, +1.
(b) It can be noticed that the branch metrics depend on the channel signal-to-noise
ratio and on the symbol probabilities at the encoder input, what can be easily
seen if they are written in a developed form
!
ut Pðut ¼ þ 1Þ Ec Xn
ct ðS ðt1Þ
; S Þ ¼ Ct exp
ðtÞ
ld exp 2a xtl ytl :
2 Pðut ¼ 1Þ N0 l¼1
When Pðut ¼ þ 1Þ ¼ Pðut ¼ 1Þ ¼ 0:5; the first exponential term is equal to
one, but the branch metrics are still dependent on Ec/N0.
In Table 8.14 it is clearly shown that in considered example the correct decision
is made in all analyzed cases, where for smaller values of parameter Ec/N0 the
estimation becomes very unreliable. For example, when Ec/N0 = −20 [dB], nor-
malized transition probabilities corresponding to the correct path are (0.504,
0.2673, 0.1839, 0.2191, 1) showing a small reliability of the decision, if compared
to the values from Fig. 8.36.
Here it is essential to notice that for Viterbi decoding (either for hard outputs,
either for soft outputs, i.e. SOVA) the decision in any way does not depend on the
ratio Ec/N0 neither on the signal-to-noise ratio!
(c) For a case when the symbols at the encoder input are not equiprobable, BCJR
algorithm can take it into account. In Fig. 8.37 are shown LLR estimations
corresponding to the probabilities P(it) = 0.1 and P(it = 1) = 0.9, as well as
the paths corresponding to decoded information words (including the termi-
nating bits as well), for the case Ec/N0 = 0 [dB]. BCJR decoder favors these
information sequences which are more probable, i.e., for a difference to SOVA
algorithm, it takes into account the source characteristics.
Problem 8.9 The system described in the previous problem is considered, the
source parameters, convolutional encoder and modulator, as well as all numerical
values are the same as in the first part of the previous problem (Ps = 1 [lW], Vb = 1
Table 8.14 BCJR procedure results for various estimation of ratio Ec/N0 in the channel
Ec/N0 (dB) Lc LLR estimations
t=1 t=2 t=3 t=4 t=5
10 40 −172.4 91.6 −172.4 91.6 −710
0 4 −17.035 9.16 17.035 9.16 −70.99
−10 0.4 −1.4372 0.7533 1.4356 0.6696 −6.79
−20 0.04 −0.1021 −0.0238 −0.0756 −0.7199 −0.6582
Problems 439
10
1 1
11
Fig. 8.37 BCJR procedure results for different symbol probabilities at the encoder input
y ¼ ð0:95; 1:2; 0:1; 0:05; 1:2; 1:01; 0:3; 1:13; 0:4; 16Þ;
if in one block three information bits are transmitted, and the decoding is performed
by using
(a) Log-MAP algorithm
(b) Max-log-MAP algorithm
Solution
BCJR algorithm, although optimum, has some drawbacks:
– the overflow easily happens during the calculation forward and backward
metrics,
– a great number of multiplications should be performed.
In previous problem it was shown that the first drawback can be lessened by
normalization of a calculated metrics after every step. However, both problems can
be simultaneously solved by calculating logarithmic branch metrics values
ut Lc X
n
Ct ðSðt1Þ ; SðtÞ Þ ¼ ln ct ðSðt1Þ ; SðtÞ Þ ¼ ln Ct þ Lðut Þ þ xtl ytl ;
2 2 l¼1
providing that metrics forward and backward are obtained by a series of successive
additions
440 8 Trellis Decoding of Linear Block Codes, Turbo Codes
n o
At ðSðtÞ Þ ¼ ln at ðSðtÞ Þ ¼ max At1 ðSðt1Þ Þ þ Ct ðSðt1Þ ; SðtÞ Þ
Sðt1Þ
n o
Bt1 ðSðt1Þ Þ ¼ ln bt ðSðtÞ Þ ¼ max Ct ðSðt1Þ ; SðtÞ Þ þ Bt ðSðtÞ Þ
SðtÞ
where the maximizing in the first case is performed over a set of all input paths into
a state S(t), while in the second case the choice is performed over these paths leaving
the considered state. A procedure for maximization performed in above equalities is
just the point where these two algorithms differ.
(a) For log-MAP algorithm [74] this operator can be changed by relation
and the numerical values of corresponding forward and backward metrics are given
in Table 8.15. When calculating metrics, operation max* is always used for the
case when there are two arguments and the above equality can be directly applied.
When the metrics forward and backward are found, they can be used for finding
the estimations for LLR because
n o
Lðut jyÞ ¼ max At1 ðSðt1Þ Þ þ Ct ðSðt1Þ ; SðtÞ Þ þ Bt ðSðtÞ Þ
R1
n o
max At1 ðSðt1Þ Þ þ Ct ðSðt1Þ ; SðtÞ Þ þ Bt ðSðtÞ Þ ;
R0
Table 8.15 Node metrics forward and backward for log-MAP algorithm
t=0 t=1 t=2 t=3 t=4 t=5
Forward At(0) 0.0000 4.3000 4.6000 4.9801 11.2801 44.0801
At(1) −∞ −∞ −4.2000 8.4200 5.8812 −∞
At(2) −∞ −4.3000 4.0000 4.2203 −∞ −∞
At(3) −∞ −∞ −4.4000 0.5172 −∞ −∞
Backward Bt(0) 44.0801 39.7801 30.3200 29.9400 32.8000 0.0000
Bt(1) 0 0 29.5600 35.6600 −32.8000 −∞
Bt(2) 0 31.3451 40.0800 −31.1400 −∞ −∞
Bt(3) 0 0 31.2400 −34.4600 −∞ −∞
Problems 441
the corresponding metric numerical values are given in Table 8.16. When the
signal-to-noise ratio is sufficiently large (here it is Ec/N0 = 10 [dB]) the differences
in metric values in respect to BCJR algorithm are very small, while for the worse
channel conditions these differences grow.
LLR estimations in this case can be found directly by using of relation
n o
Lðut jyÞ ¼ max At1 ðSðt1Þ Þ þ Ct ðSðt1Þ ; SðtÞ Þ þ Bt ðSðtÞ Þ
R1
n o
max At1 ðSðt1Þ Þ þ Ct ðSðt1Þ ; SðtÞ Þ þ Bt ðSðtÞ Þ
R0
yielding
It is obvious that and log-MAP and max-log-MAP take into account the channel
quality influence (by using parameter Lc depending on Ec/N0), as well as the binary
symbols probabilities at the encoder input (by using parameter L(ut)).
Table 8.16 Node metrics forward and backward for max-log-MAP algorithm
t=0 t=1 t=2 t=3 t=4 t=5
Forward At(0) 0.0000 4.3000 4.6000 4.9800 11.2800 44.0800
At(1) −∞ −∞ −4.2000 8.4200 5.8800 −∞
At(2) −∞ −4.3000 4.0000 4.2200 −∞ −∞
At(3) −∞ −∞ −4.4000 0.0002 −∞ −∞
Backward Bt(0) 44.0800 39.7800 30.3200 29.9400 32.8000 0.0000
Bt(1) 0 0 29.5600 35.6600 −32.8000 −∞
Bt(2) 0 31.1400 40.0800 −31.1400 −∞ −∞
Bt(3) 0 0 31.2400 −34.4600 −∞ −∞
442 8 Trellis Decoding of Linear Block Codes, Turbo Codes
i
+
x1 Conversion
Block + p1
in polar form,
interleaver 0 → -1 x
i% 1 → +1
x2 +
multiplexing
+ ⎡0 1 ⎤
Π=⎢ ⎥
⎣1 0 ⎦
+
p2
where the second row denotes the bit positions in a sequence obtained at the
interleaver output, while in the first row are ordinal numbers of bits in the original
sequence. Encoder output is led into the puncturing block (described by a punc-
turing matrix P), represented by polar pulses and after multiplexing symbol-by-
symbol is transmitted over a channel where is AWGN, resulting in Ec/N0 = 0 [dB].
(a) Find sequence x emitted into the channel if at the encoder input is the
information bits sequence i = (10101). In particular, explain the procedure for
terminating bits obtaining and at trellis show the paths corresponding to code
sequences at the outputs of component encoders.
(b) Decode the transmitted information sequence if log-MAP algorithm is used
and at the turbo decoder input the received sequence is
Problems 443
y ¼ ð2; 1:5; 0:5; 0:4; 1:8; 0:3; 0:2; 0:13; 0:1; 0:1; 0:2; 0:15; 0:1; 0:08Þ:
Solution
(a) In this case turbo encoder is realized as a parallel cascade combination of two
recursive systematic convolutional encoders (RSC), separated by a block
interleaver. At the beginning it is supposed that both encoders are reset and for
the input sequence i (determining output x1) firstly are found the outputs of the
first and the second RSC where are the parity-check bits and after that, they are
punctured, so as that at output x2 alternatively are sent odd bits of sequence p1
and even bits of sequence p2. All sequences are given in details in Table 8.17,
and the sequence at the encoder output is
Table 8.17 Sequences in some points of the encoder from Fig. 8.38
t 1 2 3 4 5 6 7
i 1 0 1 0 1 0 1
i′ 0 0 1 1 1 1 0
p1 1 1 1 0 1 1 1
p2 0 0 1 0 0 1 0
x1 1 0 1 0 1 0 1
x2 1 0 1 0 1 1 1
encoder output to return (reset) it in the initial state and to be ready to transmit a
new information block.
Decoding procedure is performed iteratively [68, 79]:
1. At the first decoder input:
– sequence y1 is led, corresponding to a transmitted sequence x1, containing
information about sent information symbols. On this basis the series of
symbols Lc y1 is calculated.
– by combination of y1 and y2,p1 (corresponding to the sequence x2, to the
parity-check bits from the first RSC decoder output, while the rest of
sequence is filled with zeros), a sequence y(I) is formed and from it by using
log-MAP algorithm, the improved estimation L1 ðut jyÞ is found, and further
DEINTERLEAVING
L2(ut)
Le2(ut /y)
Lcyt1 DECODER 1 INTERLEAVING
L2(ut /y)
output
DEINTERLEAVING
L2 ðut Þ ¼ I 1 fLe2 ðut jyÞg. This procedure is repeated successively until the code-
word is decoded or until a given maximum number of iteration is achieved. In this
case, the path corresponding to the calculated values L1 ðut jyÞ is shown in Fig. 8.41.
To these values corresponds the procedure of determining Le1 ðut Þ, shown below
Of course, the receiver does not know whether the heavy line in the Fig. 8.41
really corresponds to the emitted information sequence. However, even here it can
be concluded that the first bit of information sequence is i1 = 1 (highly reliable!)
although the corresponding received symbol is negative (−2). The rest of the bits
cannot be yet reliable decoded and a procedure is continued by leading Le1(ut)
through the interleaver and sent to the other RSC decoder as an estimation of a
priori RLL for sequences y1′.
Fig. 8.41 The decoding procedure for the first RSC during the first iteration
Fig. 8.42 The decoding procedure for the second RSC during the first iteration
446 8 Trellis Decoding of Linear Block Codes, Turbo Codes
Table 8.18 Turbo decoders outputs during the first four iterations, L1 ðut jyÞ and L2 ðut jyÞ
t=1 t=2 t=3 t=4 t=5 t=6 t=7
The first iter = 1 −2.9806 −3.4712 7.9516 −1.4854 −0.0031 −1.4803 0.1193
RSC iter = 2 2.0350 −3.7935 9.1071 −0.8265 1.1469 −0.8789 1.1729
decoder
iter = 5 17.5989 −6.7464 14.9302 −0.9586 4.4211 −0.9622 4.4210
iter = 10 44.8247 −13.4653 26.7319 −1.9421 11.2654 −1.9421 11.2654
The iter = 1 −2.6450 0.1110 0.5064 7.9508 −2.9789 −0.1109 0.4586
second iter = 2 −3.5272 0.4030 1.4530 9.2648 −1.8882 −0.4028 −0.9194
RSC
iter = 5 −6.8240 0.0487 4.7568 15.2578 17.4199 −0.0487 −4.6431
decoder
iter = 10 −13.2631 −0.9421 11.8841 27.3506 44.9234 0.9421 −11.6595
Finding L2 ðut jyÞ is shown in Fig. 8.42 and the further procedure is
and Le2(ut) is a set of a priori LLR estimations which are led to the input of the first
decoder in the second iteration.
Estimations L1 ðut jyÞ and L2 ðut jyÞ after a few first iterations are given in
Table 8.18. Erroneously decoded bits after every particular iteration are shadowed
in table and it is clear that already after the second iteration, the first RSC decoder
successfully decodes full information sequence, while the second component
decoder needs more than five iterations to eliminate the errors influence. It is
obvious that the estimations are more and more reliable after additional iterations. It
is easy to verify that a decoded sequence is i = (10101) and it just corresponds to
the sequence at the decoder input in the first part of this problem (and, as earlier
shown, to it corresponds the sequence i′ = (11001) at the interleaver output).
Chapter 9
Low Density Parity Check Codes
Besides the turbo codes, there is one more class of linear block codes that makes
possible to approach to the Shannon bound. These are Low Density Parity Check
(LDPC) codes. They were proposed by Gallager [83]. In principle they have a
sparse parity-check matrix. In such a way the corresponding parity-check equa-
tions have a small number of terms providing significantly smaller complexity
compared to standard linear block codes. They provide the iterative decoding with
a linear complexity. Even then it seemed that this class of codes has a good
performance, but the contemporary hardware and software were not suitable for
their practical implementation. Using graph theory Tanner in 1981 [84] proposed an
original interpretation of LDPC codes, but his work was practically ignored for the
next 15 years. Not until mid-nineties some researches started to consider these
codes and the decoding using graphs. Probably the most influential was work of
David McKay [85] who demonstrated that, the performance very near to the
Shannon limit can be achieved by using iterative decoding.
LDPC codes are linear block codes constructed by using control matrix H where
the number of nonzero components is small. The corresponding parity-check
equations, even for a long codeword have a small number of terms yielding sub-
stantially smaller complexity than the corresponding equations for typical linear
block code which has the same parameters. For binary codes, control matrix has a
great number of zeros and a few ones.
According to Gallager definition, the following conditions should be satisfied
yielding a code which has low density parity-checks:
1. Control matrix H has a fixed number of ones in every row, denoted by q.
2. Control matrix H has a fixed number of ones in every column. This number is
denoted by c (usually c 3).
3. To avoid the cycles in the corresponding bipartite graph (will be explained later)
in any two columns (or rows) the coincidence of ones should not be more than at
one position.
4. Parameters s and t should be as small as possible in relation to codeword length
n.
The code satisfying above conditions is denoted as CLDPC (n, c, q). Control
matrix has n columns and n–k rows, and the following must holds cn = q(n–k). If
the columns of matrix H are linearly independent code rate of LDPC code is
R ¼ 1 c=q;
being for typical LDPC codes r 0:02 and regular LDPC code
(Problems 9.1, 9.2, and 9.6) is obtained. However, it is very difficult to construct
such code, especially because of condition 3. It was shown that for efficient LDPC
codes the cycles cannot be avoided. Therefore, it is allowed that a number of ones
in some rows (columns) differs from prescribed values and in this case c and q are
average values of these parameters. The corresponding code is called irregular
LDPC code (Problem 9.1).
As an example, consider control matrix
2 3
1 1 0 0 0 0
H ¼ 40 0 1 1 0 0 5:
0 0 0 0 1 1
density r = 3/8 = 4.5/12 = 0.375 and the code rate R = 1– c/q = 1/3. The matrix
has n = 12 columns and n–k = 8 rows, basic code parameters are (n, k) = (12, 4).
Therefore, a code rate can be found and if the number of ones in every column is
not the same.
At the end consider now square matrix having c = q = 3 ones in every column
and row
2 3
1 1 0 1 0 0 0
60 1 1 0 1 0 07
6 7
60 0 1 1 0 1 07
6 7
H¼6
60 0 0 1 1 0 177:
61 0 0 0 1 1 07
6 7
40 1 0 0 0 1 15
1 0 1 0 0 0 1
In any two columns (or rows) the ones do not coincide at more than one position
and the matrix density is relatively small (r = 3/7 = 0.4286), this matrix seems to
satisfy the condition to be LDPC code control matrix. However, it should be
noticed that the code rate is here R = 1–3/3 = 0!
Furthermore, the rows in this matrix are not linearly independent and it cannot be
control matrix of a linear block code. By eliminating the linearly dependent rows,
the following is obtained
2 3
1 1 0 1 0 0 0
60 1 1 0 1 0 07
H LN ¼6
40
7:
0 1 1 0 1 05
0 0 0 1 1 0 1
However, although the number of ones in every rows is fixed ðc0 ¼ 3Þ, the
number of ones in every column varies from 1 to 3 (the average value is c0 ¼ 1:71).
It may be seen that the number of matrix rows is (n–k) = 4 and a linear block code
(7, 3) is obtained having the code rate R ¼ 1 c0 =q0 ¼ 3=7.
For the last example, there is one more question—do this code is regular or not?
Obviously, code which has the control matrix H LN (linearly independent rows) is
irregular. But it can be obtained from matrix H, which satisfies conditions for a
regular code. Let instead control matrix H LN for decoding control matrix H is used
and syndrome decoding is performed (it can be applied for every linear block
code). If r is received vector, the corresponding syndrome is S ¼ rH T . In this case
to the received vectors will correspond seven-bit syndrome vectors because their
length is defined by the number of matrix H rows. However, there are three linearly
dependent rows in this matrix and to all possible words would not correspond
2n = 128 but only 2(n−3) = 16 possible syndromes. Therefore, by using matrix
H the same results will be obtained as by using matrix HLN for decoding, where the
rows are linearly independent. The conclusion is that a code can be defined by
matrix having linearly independent rows, and this code can be defined and by
450 9 Low Density Parity Check Codes
matrix having linearly dependent rows as well and it can be said that the code is
regular!
Having the above in view, the construction of regular codes can be substantially
simplified if linear dependence of control matrix rows is allowed. Using this
approach the first (and relatively exact) method for construction of regular LDPC
codes was proposed by Gallager in his pioneering work [83]. The steps in control
matrix construction are:
– The wished codeword length (n) is chosen. Number of ones in rows and col-
umns is fixed being v and s, respectively.
– One positive integer is chosen m = n/q, and a matrix to be constructed is divided
into s submatrices of dimensions m m q, where
– Now, submatrix H1 is formed, in such way that in the ith row, by ones the
positions from (i–1)q + 1 to i q (i = 1, 2, … m) are filled, providing that rows of
H1 do not have common ones, and columns have not more than one.
– The other submatrices are obtained by permutations of columns of matrix H1.
Columns permutations are chosen so as that rows of the full matrix do not have
common ones, and columns have not more than one. Total number of ones in
matrix H is mqc, and matrix density is r = 1/m.
For example, consider control matrix which has parameters n = 20, q = 4, c = 3
and m = 5. Firstly, submatrix H1 dimensions 5 20 is formed, and later using
search other submatrices (H2 and H3) are found, fulfilling the above conditions.
One possible control matrix H obtained using such procedure is
2 3
1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
60
6 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 077
6 7
60
6 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 077
60
6 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 077
6 7
60
6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 177
6
6 77
2 3 6 7
H1 61
6 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 077
6 7 6 7
6 7 6 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 07
6 7 6 7
H¼6 7 6
6 H3 7 ¼ 6 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 077
6 7 6 7
4 5 6 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 07
6 7
H3 60
6 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 177
6 7
6 7
6 7
61
6 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 077
6 7
60 1 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 07
6 7
60
6 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 077
6 7
40 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 05
0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1
Brief Theoretical Overview 451
The matrix rang is 13, and from 15 rows only 13 are linearly independent. The
obtained code is (20, 7), parameters are n–k = 13, code rate R = 0.35 and matrix
density r = 0.2 (Problem 9.3).
On the basis of explained procedure, the conclusion can be drawn that Gallager
method for regular LDPC code construction is not fully deterministic. In fact, it is
not clearly defined how to perform the permutations in submatrices H2, H3,…, Hs,
but only the needed features. Such an approach usually results in codes having good
performance, but it is relatively difficult from all possible combinations to choose
LDPC code with a given codeword length n and code rate R = k/n, i.e. (n, k) code
having good characteristics. However, later the structured LDPC codes were pro-
posed where the control matrix structure can be found using a deterministic algo-
rithm. Some of these algorithms are based on Euclidean and projective geometry
[86, 87], cyclic permutations [88] or combined construction [89]. These problems
are also considered in papers and monographs [90, 91, 92].
Tanner 1981 proposed an alternative way for considering control matrix of
LDPC code by using the bipartite graphs [84] (Problems 9.1, 9.2, 9.4, 9.5,
and 9.6). This approach provides advanced technique for decoding. To explain this
method better, firstly we will consider two classical ways for linear block codes
decoding. The decoding of any linear block code can be performed using syndrome
(Problem 9.2). Using such approach firstly on the basis of matrix H and relation
vHT = 0, where v = [v1, v2,…,vn] is a valid code word, a system of n–k equations
with n unknown variables is written. Every row of control matrix defines one
equation for parity-check, and the position of one in that row defines the position of
symbol in equation.
On the base of this system, the bits of code word can be found. Tanner graph is
bipartite graph visualizing the relation between two types of nodes—variable nodes
(vj) denoting the sent symbols, and (parity-)check nodes (ci,)—nodes correspond-
ing to parity-checks which relate the emitted symbols (bits). In such way, for any
linear block code, if (i, j)th element of matrix H equals one, at the corresponding
bipartite graph, there is a line between variable node vj and (parity-)check node ci .
The state of a check node depends on the values of variable nodes to which it is
connected. For some check node it is said that it is children node of variable nodes
to which is connected, and a variable node is parent node for all check nodes
connected to it.
As an example consider Hamming code (7, 4) where the control matrix is
(Problem 5.3)
2 3 c1 : v 1 þ v 3 þ v 5 þ v 7 ¼ 0
1 0 1 0 1 0 1
H ¼ 40 1 1 0 1 ð1Þ ð1Þ 5 c2 : v2 þ v3 þ v6 þ v 7 ¼ 0 ;
0 0 0 1 1 ð1Þ ð1Þ c3 : v4 þ v5 þ v6 þ v 7 ¼ 0
where the corresponding equations are at the right side. The corresponding bipartite
graph is given in Fig. 9.1.
452 9 Low Density Parity Check Codes
v1 v2 v3 v4 v5 v6 v7
c1 c2 c3
s 1 ¼ y1 þ y3 þ y5 þ y7 ¼ 1
s 2 ¼ y2 þ y3 þ y6 þ y7 ¼ 0 ;
s 3 ¼ y4 þ y5 þ y6 þ y7 ¼ 0
Set of equations containing the matrix rows having one in the first column,
denoted with A1, is
c1 : v1 þ v2 þ v4 ¼ 0
c5 : v1 þ v5 þ v6 ¼ 0 ;
c7 : v1 þ v3 þ v7 ¼ 0
and it should be noticed that in this set every symbol occurs once except v1. This set
of equations is orthogonal to v1. Set of equations containing the matrix rows having
one in the second column, denoted with A2, is
c1 : v1 þ v2 þ v4 ¼ 0
c2 : v2 þ v3 þ v5 ¼ 0 :
c6 : v2 þ v6 þ v7 ¼ 0
v1 v2 v3 v4 v5 v6 v7
c1 c2 c3 c4 c5 c6 c7
S ¼ yH T
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
2 3
I 1 1 1 1
II 6 1 1 1 1 7
6 7
6 7
III 6 1 1 1 1 7
6 7
IV 6 1 1 1 17
6 7
6 7
6 7
6 7
61 7
V 6 1 1 1 7
6 7
H ¼ VI 6 1 1 1 1 7
6 7
6 7
VII 6 1 1 1 1 7
6 7
VIII 6 1 1 1 17
6 7
6 7
6 7
6 7
IX 61 1 1 17
6 7
6 7
X 6 1 1 1 1 7
6 7
XI 4 1 1 1 1 5
XII 1 1 1 1
It is obvious that the code word length is n = 16, the number of ones over rows
and columns is, respectively, c = 4 and c = 3, and control matrix density is
r = 0.25.
If the received word is y = (1100101000000000), then yHT = (000001101001).
In the syndrome at the positions VI, VII, IX and XII are ones. Therefore, the
following sums are not satisfied:
– VI control sum (where are 1, 2, 3 and 4 received word bits)
– VII control sum (where are 3, 7, 11 and 15 received word bits)
– XI control sum (where are 1, 6, 11 and 16 received word bits)
– XII control sum (where are 4, 5, 10 and 15 received word bits)
In the above control sums bits 1, 3, 4, 11 and 15 appear the most frequently (two
times) and the received word bits at these places are inverted yielding the word
y′ = (0111101000000010), to which now the syndrome y′HT = (010101101001)
corresponds. Now ones in syndrome are at the II, IV, VI, VII, IX and XII position.
It is obvious that the parity-checks are not satisfied. Now, in control sums bits 6 and
15 appear the most frequently (three times). Bit 6 appears in sums II, VI and IX, and
bit 15 in sums IV, VII and XII, while the other bits appear at most two times. By
inverting bits 6 and 15, the second estimation of the transmitted code word is
456 9 Low Density Parity Check Codes
1 2 1 2
fj0 ¼ pffiffiffiffiffiffi eðrj þ 1Þ =ð2r Þ ; fj1 ¼ pffiffiffiffiffiffi eðrj 1Þ =ð2r Þ
2 2
j ¼ 1; 2; . . .; n;
2pr 2pr
where rj denotes the jth bit of the received word. In fact, fj0 denotes a priori
probability that (logical) zero (dj = 0) was emitted, and fj1 denotes a priori
probability that (logical) one (dj = 1) was emitted. These values are the initial
probability estimations that variable nodes are in state 0 or 1. These estimation
are sent to check nodes where the symbol vj appears.
2. Other the check node ci receives from all connected variable nodes vj the
probability estimations Qxij (for x = 0 and x = 1), the probabilities are calculated
that ith parity-check equation is satisfied if the vj is in the state x (corresponding
to the symbol vj value and can be 0 or 1). These probabilities are
X Y
Rxij ¼ Pðci =vÞ Qvikk
v:vj ¼x k2N ðiÞ=j
where N(i) is a set of indexes of all parent nodes connected to check node ci, and
N(i)/j is the same set but without node vj to which the information is sent. Term
P(ci/v) denotes the probability that parity-check equation is satisfied, and
Brief Theoretical Overview 457
summing is performed over all possible decoded vectors v for which the
parity-check equation is satisfied when the informed node is in the state x.
3. At the variable node dj after receiving from all connected nodes the values of
probabilities Rxij (for x = 0 and x = 1), the correction of probabilities that this
variable node is in state x (it can be 0 or 1) is performed. These probabilities are
Y
Qxij ¼ aij fjx Rxkj ;
k2M ð jÞ=i
where M(j)/i is a set of indexes of all check nodes connected to node vj, without
P is sent. Coefficient aij is a normalization constant
node ci to which the information
chosen so as the condition x Qxij ¼ 1 is satisfied.
This process, shown in Fig. 9.3 is repeated iteratively and stops after the
information yielding vector d for which the corresponding syndrome is zero
vector. If after some number of iterations zero syndrome is not obtained, the
process stops when some, predefined, number of iterations was made. In both
cases the decoder yields optimally decoded symbols, in the maximum a posteriori
probability sense, but if syndrome is not equal to zero, the emitted code word is
not decoded.
Consider regular LDPC code and the corresponding Tanner graph from the
above example. The channel is with additive white Gaussian noise only. The
procedure starts by calculating a priori probabilities f10 ; f11 ; f20 ; f21 ; . . .; f70 ; f71 sent to
the check nodes. To the first check node the initial information from the first,
second and fourth variable node Qxij ¼ fjx are sent, concerning the probability
that the corresponding node is in the state 0 or 1. Now, the first check node has
to return some information to every of connected variable nodes. They are
different.
Parity-check equation for the first check node is v1 þ v2 þ v4 ¼ 0. Coefficient R011
is the estimation sent by the check node 1 to the variable node 1. It is calculated
Variable nodes v j
v1 v2 v3 v4 v5
Q24x
R24x
c1 c2 c3
Check nodes h i
Fig. 9.3 The change of information between variable and check nodes
458 9 Low Density Parity Check Codes
This step of iteration process is shown in Fig. 9.4. When all values of coeffi-
cients R0ij and R1ij are calculated, the first estimation of message symbols is made.
The corresponding vector bv is calculated as follows
Y
bv j ¼ arg max fjx Rxkj
x k2M ð jÞ
and if the syndrome for this vector is not zero, procedure continues by correcting
probabilities Qxij , joined to the corresponding variable nodes.
As said earlier, decoding of LDPC codes converges to the original message if
there are no cycles in the corresponding bipartite graph. However, the relatively
short cycles are unavoidable as well and when the corresponding LDPC code has
good performance. But, the degrading effect of small cycles is diminished as the
code length is longer, and substantially is small for code words long over 1000 bits.
Also, there exist special procedures to eliminate the short cycles or to reduce their
number [25].
The realization of sum-product algorithm in a logarithmic domain (Problems 9.7
and 9.10) can be further considered. The motives to use the logarithmic domain are
the same as for turbo codes decoding—avoiding the products calculation always
v1 v2 v3 v4 v5 v6 v7
R110 Q120
Q140
1
R111 Q
12 Q141
c1 c2 c3 c4 c5 c6 c7
when it is possible and lessening the overflow effect. Here logarithmic likelihood
ratio is used. It can be used as well as for hard decoder inputs (with smaller correcting
capability). However, this algorithm has one serious drawback. To find the infor-
mation which the check node send to it adjoined variable node j one has to determine
8 9
< Y =
bi;j ¼ 2 tan h1 tan hðai;k =2Þ ;
:k2NðiÞnj ;
and still more serious problems appear when the hardware block should be
implemented corresponding to this relation. That is the reason to use the SPA
procedure versions being much easier to be implemented. The simplest (and still
rather exact) approximation of the above equality is
2 3
Y
bi;j ¼ 4 sgnðai;k Þ5
min ai;k ;
k2NðiÞnj
k2NðiÞnj
it is called min-sum algorithm (Problem 9.8) having still possible further modifica-
tions as self-correcting min-sum algorithm (Problem 9.8). The quasi-cyclic codes
(Problem 9.9) can be considered as the LDPC codes as well, but the obtaining of the
parity-check matrix is not always easy. The class of Progressive Edge Growth
(PEG) LDPC codes (Problem 9.10) can be constructed as well, where the
parity-check matrix is formed by a constructive method (it is a structured, and not a
random approach with the limitations as for Gallager codes). For such a code Monte
Carlo simulation of signal transmission over BSC with AWGN was carried out
(Problem 9.10). For the case when sum-product algorithm is applied, the error
probability 10−6 is achieved for substantially smaller values of parameter Eb/N0 in
respect to the case when bit-flipping algorithm is applied. Some modification of
bit-flipping algorithm (e.g. gradient multi-bit flipping—GDFB) allows to increase the
code gain from 0.4 to 2.5 dB (for Pe,rez = 10−6). On the other hand, even a significant
simplification of sum-product algorithm (min-sum is just such one) in the considered
case does not give a significant performance degradation—coding gain decreases
from 6.1 to 5.8 dB. It is just a reason that contemporary decoding algorithms are
based on message-passing principle, with smaller or greater simplifications.
Problems
Problem 9.1 At the input of a decoder, corresponding to a linear block code (7, 4),
a code word r transmitted through the erasure channel is led. It is known that the
first four bits are not transmitted correctly (but their values are not known), while it
is supposed that the last three bits are received correctly being y5 = 1, y6 = 0,
y7 = 1. Find the values of erased (damaged) bits:
460 9 Low Density Parity Check Codes
(a) If a code used is obtained by shortening the Hamming systematic code (8, 4),
its generator is given in Problem 5.4, and a shortening was performed by the
elimination of the first information bit;
(b) If the code was used which has the parity-check matrix
2 3
1 1 0 1 0 0 0
60 1 1 0 1 0 07
H¼6
40
7;
0 1 1 0 1 05
0 0 0 1 1 0 1
(c) For both codes draw a Tanner graph and on its basis explain the difference in
the decoding complexity of the above codes.
Solution
In this problem it is supposed that the erasure channel is used, where the emitted
symbol can be transmitted correctly, or it can be “erased”, meaning that there is an
indication that the symbol (denoted by E) was not transmitted correctly, but there is
not the information do it originate from the emitted zero or one (illustrated in
Fig. 9.5). Here, as a difference to Problem 4.8, it is not permitted that an emitted
symbol during transmission was inverted, i.e. that the error occurred and that the
receiver has not information about that (i.e. the transitions 0 ! 1 and 1 ! 0 are
not permitted).
The case is considered when code (7, 3) is used and the emitted code word
x = (x1, x2, x3, x4, x5, x6, x7) should be found, if the received word is
y = (EEEE101). Although this code can detect two errors and correct only one error
in the code word at the BSC (it is a shortened Hamming code with one added
parity-check bit), in the following it will be shown that here all four erased symbols
can be corrected.
(a) In Problem 5.4 the construction of code (8, 4) is explained, which has the
generator matrix
2 3
1 0 0 0j 1 1 0 1
60 1 0 0j 1 0 1 17
Gð8;4Þ ¼6
40
7
0 1 0j 0 1 1 15
0 0 0 1j 1 1 1 0
Code (7, 3), obtained by shortening of the previous code by elimination of the
first information bit has the following generator matrix
2 3
1 0 0 0j 1 1 0 1 2 3
60 7 1 0 0 1 0 1 1
1 0 0j 1 0 1 17 4
G1 ¼ 6
40 ¼ 0 1 0 0 1 1 1 5;
0 1 0j 0 1 1 15
0 0 1 1 1 1 0
0 0 0 1j 1 1 1 0
Problems 461
1-p1
y3=0
The code word must fulfill the condition xH T1 ¼ 0, i.e. in a developed form
x1 x3 x4 ¼ 0
x2 x3 x5 ¼ 0
x1 x2 x3 x 6 ¼ 0
x1 x2 x7 ¼ 0
By supposing that the last three symbols are transmitted correctly, then x5 ¼
y5 ¼ 1; x6 ¼ y6 ¼ 0; x7 ¼ y7 ¼ 1 and a system of four equations with four
unknown variables is obtained
x 1 x3 x4 ¼ 0
x 2 x3 ¼ 1
;
x 1 x2 x3 ¼ 0
x 1 x2 ¼ 1
2 3
1 1 0 1 0 0 0
60 1 1 0 1 0 07
H2 ¼ 6
40
7;
0 1 1 0 1 05
0 0 0 1 1 0 1
x1 x2 x4 ¼ 0; x2 x3 x5 ¼ 0; x3 x4 x6 ¼ 0; x4 x5 x7 ¼ 0:
x4 x 5 x7 ¼ 0 ) x4 ¼ x5 x7 ¼ 0;
x3 x 4 x6 ¼ 0 ) x3 ¼ x4 x6 ¼ 0;
x2 x 3 x5 ¼ 0 ) x2 ¼ x3 x5 ¼ 1;
x1 x 2 x4 ¼ 0 ) x1 ¼ x2 x4 ¼ 1;
and the reconstructed code word is x = (1100101). The proposed decoding method
is in fact an iterative method—in the first step the first three equations (every
considered for itself) have not a solution, while from the last one x4 is found. In the
next step, the first two equations have not a solution, but the third one has it. In the
third step x2 is found using the second equation and the previously found values for
x3 and x4. In the last step the first code word bit is reconstructed.
The following could be noticed:
– Generally, for a code (n, k), if from every equation one variable can be found,
the corresponding decoder complexity is linear, i.e. O((n–k)).
– In the first case the number of binary ones in the rows is three or four (on the
average q1 = 13/4) and a number of ones in the columns varies from one to
three, but on the average it is c1 = 13/7, and a code rate can be found from the
relation R1 = 1−c1 /q1 = 1−4/7 = 3/7.
– In the second case a number of ones in every row is fixed (q2 = 3), but a number
of ones in matrix columns is not the same (from one to three, on the average
c2 = 12/7), and the code rate is here the same as well R2 = 1−c2 /q2 = 3/7.
– The matrix density is relatively high for both cases (r1 = q1 /n = 0.4643,
r2 = q2 /n = 0.4286), both codes are irregular.
– An optimum method for code which has parity-check matrix H2 decoding is
iterative—it is optimal to decode firstly the fourth bit, then the third, the second,
and at the end the first one. In the same time a decoding procedure has a linear
complexity!
– To provide for iterative decoding with a linear complexity, it is fundamental to
achieve that a parity-check matrix allows the decoding of one bit from one
equation in every iteration. If a matrix has a small number of ones in every row,
Problems 463
it will provide (with some good luck concerning the error sequence!) that in
every row there is no more than one bit with error.
– Therefore, a necessary condition is that a parity-check matrix is a sparse matrix,
the codes fulfilling this condition are Low Density Parity Check (LDPC) codes
[83].
(c) Tanner graph [84] is a bipartite graph to visualize the connections between the
two types of nodes, the variable nodes, usually denoted by vj, representing the
code word emitted symbols and the (parity-check nodes), denoted by ci,
representing the parity-check equations.
Tanner graph for a code having parity-check matrix H1 is shown in Fig. 9.6,
with especially marked cycle having a length four (4-girth-cycle), corresponding to
the variable nodes v1 and v2, i.e. to the check nodes c1 and c2 (there is one more
cycle of the same length, the reader should find it).
It can be noticed that just this cycle makes impossible to decode the symbols v1
and v2 directly, from the last four equation of the system
c1: v1 ⊕ v3 ⊕ v4 = 0
c2 : v2 ⊕ v3 ⊕ v5 = 0
c3 : v1 ⊕ v2 ⊕ v3 ⊕ v6 =0
c4 : v1 ⊕ v2 ⊕ v7 =0
At the Tanner bipartite graph, corresponding to matrix H2, shown in Fig. 9.7, it
can be noticed that the cycle of the length four does not exist. It just makes possible
to decode successively a code word bits with the linear complexity. The shortest
cycle at this graph has the length six and it is drawn by heavy lines. Therefore, to
provide the iterative decoding with the linear complexity, the parity-check matrix
graph should not have the short length cycles!
Problem 9.2 Parity-check matrix of one linear block code is
2 3
1 1 1 1 0 0 0 0 0 0
61 0 0 0 1 1 1 0 0 07
6 7
H¼6
60 1 0 0 1 0 0 1 1 077:
40 0 1 0 0 1 0 1 0 15
0 0 0 1 0 0 1 0 1 1
(a) Find the parity-check matrix density and verify whether the code is regular,
(b) Draw the Tanner graph corresponding to the code and find the minimum cycle
length,
(c) Decode the received word
464 9 Low Density Parity Check Codes
v1 v2 v3 v4 v5 v6 v7
c1 c2 c3 c4
v1 v2 v3 v4 v5 v6 v7
c1 c2 c3 c4
y ¼ ð0001000000Þ
v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
s=2
c1 c2 c3 c4 c5
=4
A1 : c1 : v1 v2 v3 v4 ¼ 0; c2 : v1 v5 v6 v7 ¼ 0;
A2 : c1 : v1 v2 v3 v4 ¼ 0; c3 : v2 v5 v8 v9 ¼ 0;
A3 : c1 : v1 v2 v3 v4 ¼ 0; c4 : v3 v6 v8 v10 ¼ 0;
A4 : c1 : v1 v2 v3 v4 ¼ 0; c5 : v4 v7 v9 v10 ¼ 0;
A5 : c2 : v1 v5 v6 v7 ¼ 0; c3 : v2 v5 v8 v9 ¼ 0;
A6 : c2 : v1 v5 v6 v7 ¼ 0; c4 : v3 v6 v8 v10 ¼ 0;
A7 : c2 : v1 v5 v6 v7 ¼ 0; c5 : v4 v7 v9 v10 ¼ 0;
A8 : c3 : v2 v5 v8 v9 ¼ 0; c4 : v3 v6 v8 v10 ¼ 0;
466 9 Low Density Parity Check Codes
A9 : c3 : v2 v5 v8 v9 ¼ 0; c5 : v4 v7 v9 v10 ¼ 0;
S ¼ yH T ¼ ð10001Þ;
and syndrome bits determine the parity-check sums, while the parity-check values
for single bits of a code word, corresponding to the sets A1–A10 provide for the
code word bits reconstruction
From the above it is reliably determined that at the 5th, 6th and the 8th code
word bits there were no errors, and at the 4th bit almost surely the error occurred,
while for the rest of the bits the reliable decision cannot be made. As a difference
from the example in introductory part of this chapter, in this case the number of
parity-check sums is even (it is not desirable) and small as well, i.e. only two
parity-check sums are formed for every code word bit. Supposing that no more than
one error occurred at the code word, the reconstructed code word is
x0 ¼ ð0000000000Þ:
The decoding procedure would be highly more efficient if the number of ones is
greater per column and odd if possible. But, on the other hand, the matrix density
would grow yielding probably the occurrence of the shorter cycles as well, i.e. the
code quality would be degraded. Therefore, a majority decoding is not a procedure
providing for an efficient LDPC codes decoding, i.e. it is relatively successful only
for codes which have very long code words and a relatively dense parity-check
matrix.
Problem 9.3 In a communication system Gallager code (20, 7) is used, the
parity-check matrix given in in introductory part of this chapter (it is repeated here
for convenience!). By bit-flipping algorithm decode the following received words
Problems 467
Solution
Bit-flipping algorithm is an iterative procedure using as inputs the received bits and
a parity-check matrix structure. For Gallager code (20, 7) it is
2 3
1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
60
6 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 077
6 7
60
6 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 077
60
6 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 077
6 7
60
6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 177
6
6 77
6 7
61
6 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 077
6 7
60 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 07
6 7
H¼6
60 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 077:
6 7
60 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 07
6 7
60
6 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 177
6 7
6 7
6 7
61
6 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 077
6 7
60 1 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 07
6 7
60
6 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 077
6 7
40 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 05
0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1
matrix, denoted by dot-line-dot in Fig. 9.9. The syndrome shows that the
parity-check sums I, II, IV, V, VI, VII, IX, X, XII and XV are not satisfied,
corresponding to the shadowed rows of the parity-check matrix in the Fig. 9.9.
If only these rows are considered, columns denoted by ordinal numbers 2, 5,
15, 16 and 20 contain the highest number (three) of ones.
After the first correction (carried out by inverting bits 2, 5, 15, 16 and 20) the
first estimation of the codeword is found y′ = (01010000000000101001). To this
code word (having ones at the positions 2, 4, 15, 17 and 20, determining the
columns of interest in Fig. 9.10) corresponds the syndrome y′HT =
(000100000101000), and in IV, X and XII control sums the 16th bit the most
frequently appears (three times). It can be easily seen from the shadowed rows in
Fig. 9.10. Therefore, in the second iteration only this bit is inverted and second
code word estimation is formed as y″ = (01010000000000111001).
Because r″HT = (000000000000000), i.e. the syndrome equals zero, the last
estimation is a valid code word. Therefore, the result of decoding is
x ¼ y00 ¼ ð01010000000000111001Þ;
where the bits inverted in the received word are underlined and in bold.
Of course, it may not be the transmitted code word. For example, it was possible
that the transmitted word is an all zeros word and the four bits were inverted (by
errors) yielding the received word r. Hamming distance of these two words is four,
while the distance of the received word to the decoded word r″ is four as well.
Therefore, it can be said that a decoded word is one from the nearest to the received
word in a Hamming distance sense, but is not necessarily the right solution.
Problem 9.4 LDPC code (10, 5) is defined with the parity-check matrix from the
formulation of Problem 9.2.
(a) Write the parity-check equations and explain the parallel realization of
bit-flipping algorithm with the fixed threshold T = 2,
(b) By using the explained algorithm decode the following received words
y1 ¼ ð0000100000Þ; y2 ¼ ð1000100000Þ:
Solution
As it was shown in Problem 9.2, the corresponding parity-check equations are:
c 1 ¼ v1 v2 v3 v4 ; c2 ¼ v1 v5 v6 v7 ; c3 ¼ v2 v5 v8 v9 ;
c4 ¼ v3 v6 v8 v10 ; c5 ¼ v4 v7 v9 v10 :
and all these parity-checks are satisfied (the result is equal to zero) if vector v is a
valid codeword.
Problems 469
For the updating of the jth bit in the received word, it is important to identify the
ðjÞ
parity-check equations that include this bit. Let ci denote the ith parity-check
equation containing the jth bit in the received word, which is satisfied only if the
corresponding bipartite graph (shown in Fig. 9.8) has an edge that connects the jth
variable node and the ith check node.
For regular LDPC codes, there are exactly q inputs into every check node (in this
case q = 4), and it is easy to understand that the operation in check nodes can be
realized by using q-input exclusive or (XOR) gates. On the other hand, there are
ðjÞ
exactly c parity-checks ci that are important to decide whether the jth bit will be
flipped or not (in this case c = 2). If the number of unsatisfied parity-checks is
greater or equal than the predefined threshold T c, the jth bit will be inverted
(flipped) in the current iteration
( P ðjÞ
1; h T;
Dj ¼ P iðjÞ
0; hi \T:
values, only the first and the third parity-check are unsatisfied. If the threshold is
T = 2, only the second bit in the received bit should be flipped and the estimated
word after the first iteration is y2′ = (1100100000). For this word, all parity-checks
are satisfied (h1–h3 as a summation of two binary ones, c4 and c5 as a summation of
all zeros), i.e. it is the valid codeword and the decoding process is finished.
Now two cases will be considered when (1000100000) can be received:
• codeword (1100100000) is transmitted, the channel introduced the error at the
second bit and the decoding process was successful
• codeword (0000000000) is transmitted, and the channel introduced the errors at
the first and the fifth bit. In such case, the decoder produced one additional error
at the second position of the estimated vector and the decoding process was
unsuccessful.
Now, consider why the errors at positions 1 and 5 cannot be corrected if the
codeword all-zeros is transmitted. The Tanner graph for the case when r2 is
received is given in Fig. 9.12. One can see that errors are located inside the cycle
with length 6. As this code is girth-6 code, the errors are located within the cycle
with the minimum length that is not desirable.
As the zero-valued variable nodes do not have any impact to the parity-checks, it
is enough to analyze the impact of variable nodes v1, v5, check nodes connected to
them (c1, c2, c3) and the variable nodes that are connected to these parity-checks at
least T times (could be flipped in some scenario). In this case, besides v1 and v5,
only v2 is connected to c1 and c3, and the other variable nodes are connected only to
one check node from set {c1, c2, c3}. The structure consisted of these nodes in
bipartite graph is presented in Fig. 9.12, where full circles represents the non-zero
0 0 0 0 1 0 0 0 0 0
d1 d10
+ +
h2(5) 1 h3(5)
...
T 2
ML
1
+ d 5* 0
Table 9.1 The decoding process during the first iteration, received word y1
j 1 2 3 4 5 6 7 8 9 10
ðjÞ ð1Þ ð2Þ ð3Þ ð4Þ ð5Þ ð6Þ ð7Þ ð8Þ ð9Þ ð10Þ
ci c1 ¼0 c1 ¼0 c1 ¼0 c1 ¼0 c2 ¼1 c2 ¼1 c2 ¼1 c3 ¼1 c3 ¼1 c4 ¼0
ð1Þ ð2Þ ð3Þ ð4Þ ð5Þ ð6Þ ð7Þ ð8Þ ð9Þ ð10Þ
c2 ¼1 c3 ¼1 c4 ¼0 c5 ¼0 c3 ¼1 c4 ¼0 c5 ¼0 c4 ¼0 c5 ¼0 c5 ¼0
P ðjÞ 1 1 0 0 2 1 1 1 1 0
ci
Dj 0 0 0 0 1 0 0 0 0 0
vj 0 0 0 0 0 0 0 0 0 0
9
Low Density Parity Check Codes
Problems
Table 9.2 The decoding process during the first iteration, received word y2
j 1 2 3 4 5 6 7 8 9 10
ðjÞ ð1Þ ð2Þ ð3Þ ð4Þ ð5Þ ð6Þ ð7Þ ð8Þ ð9Þ ð10Þ
ci c1 ¼1 c1 ¼1 c1 ¼1 c1 ¼1 c2 ¼0 c2 ¼0 c2 ¼0 c3 ¼1 c3 ¼1 c4 ¼0
ð1Þ ð2Þ ð3Þ ð4Þ ð5Þ ð6Þ ð7Þ ð8Þ ð9Þ ð10Þ
c2 ¼0 c3 ¼1 c4 ¼0 c5 ¼0 c3 ¼1 c4 ¼0 c5 ¼0 c4 ¼0 c5 ¼0 c5 ¼0
P ðjÞ 1 2 1 1 1 0 0 1 1 0
ci
Dj 0 1 0 0 0 0 0 0 0 0
vj 1 1 0 0 1 0 0 0 0 0
473
474 9 Low Density Parity Check Codes
v1 = 1 v2 = 0 v3 = 0 v4 = 0 v5 = 1 v6 = 0 v7 = 0 v8 = 0 v9 = 0 v10 = 0
c1 = 1 c2 = 0 c3 = 1 c4 = 0 c5 = 0
y ¼ ð0000100000Þ:
c2 c2
v2 v2
c3 c3
v5 v5
Problems 475
Solution
(a) The Gallager-B algorithm represents the simplest possible message passing
algorithm, where the information is transmitted across the edges of the
bipartite graph, from variable nodes to the check nodes and vice versa. Let
assume that the variable nodes that are connected to one check node are
neighbor variable nodes, and the check nodes connected to one variable node
are neighbor parity nodes as well. In message passing algorithms:
– variable nodes send reports about their current status to the check nodes,
– by using the reports from the neighbor variable nodes, check nodes make
an estimation of the value of the current variable node (in the next iteration,
the current state of variable node is not of direct interest, it is more
important how their neighbors see it!)
– by using majority of the estimations from the neighbor check nodes, new
value of the variable node is obtained.
Gallager-B algorithm represents iterative decoding procedure operating in a
binary field where, during the iteration, binary messages are sent along the edges of
Tanner graph. Let E(x) represent a set of edges incident on a node x (x can be either
symbol or parity-check node). Let mi ðeÞ and m0i ðeÞ denote the messages sent on
edge e from variable node to check node and check node to variable node at
iteration i, respectively. If the initial value of a bit at symbol node v is denoted as r
(v), the Gallager-B algorithm can be summarized as follows:
Initialization (i = 1): For each variable node v, and each set E(v), messages sent
to check nodes are computed as follows
m1 ðeÞ ¼ rðvÞ:
Step (i) (check node update): For each check node c and each set E(c), the update
rule for the ith iteration, i >1, is defined as follows
0 1
X
m0i ðeÞ ¼@ mi1 ðeÞAmod 2:
e0 2EðcÞnfeg
Step (ii) (variable node update): For each variable node v and each set E(v), the
update rule for the ith iteration, i >1, is defined as follows
8 P
< 1; if m0 ðeÞ dc=2e
Pe0 2EðvÞnfeg i0
mi ðeÞ ¼ 0; if e0 2EðvÞnfeg mi ðeÞ c 1 dc=2e;
:
rðvÞ; otherwise:
(1) Initialization
For variable node v5 incident edges are v5 ! c2 and v5 ! c3 and messages that
are initially sent are:
m1 ðv5 ! c2 Þ ¼ m1 ðv5 ! c3 Þ ¼ 1;
while the messages sent over all other edges are equal to binary zero (as the other
bits in received vector are equal to zero), as shown in Fig. 9.14.
(2) Step (i): update of check nodes (illustrated in Fig. 9.15):
– Update for check node c2
v1 = 0 v2 = 0 v3 = 0 v4 = 0 v5 = 1 v6 = 0 v7 = 0 v8 = 0 v9 = 0 v10 = 0
0 0 1
0 10 0 0
0 0 0 0 0
0 0 0 0 0 0 0
c1 = 0 c2 = 1 c3 = 1 c4 = 0 c5 = 0
Fig. 9.14 Initialization, red bold lines corresponds to send binary ones
Problems 477
v1 = 1 v2 = 0 v3 = 0 v4 = 0 v5 = 1 v6 = 0 v7 = 0 v8 = 0 v9 = 0 v10 = 0
0 0
0
0 0 0
0 0
c1 = 1 c2 = 0 c3 = 0 c4 = 0 c5 = 0
Fig. 9.15 Update of check nodes, red lines correspond c2, blue to c3, solid lines report from
variable nodes to check nodes, bold dashed lines report the estimations of symbol v12 by using
parity-checks
v1 = 1 v2 = 0 v3 = 0 v4 = 0 v5 = 0 v6 = 0 v7 = 0 v8 = 0 v9 = 0 v10 = 0
0 0
c1 = 0 c2 = 0 c3 = 0 c4 = 0 c5 = 0
Fig. 9.16 Update of check nodes, red lines correspond c2, blue to c3, solid lines report from
variable nodes to check nodes, bold dashed lines report the estimations of symbol v12 by using
parity-checks
X
m02 ðe0 Þ ¼ m02 ðc3 ! v5 Þ ¼ 0 ) m2 ðv5 ! c2 Þ ¼ 0
e0 2Eð5Þnfv5 !c2 g
2 3
0 1 0 1 1 0 0 1
61 1 1 0 0 1 0 07
H¼6
40
7:
0 1 0 0 1 1 15
1 0 0 1 1 0 1 0
(a) Find the code basic parameters and draw the corresponding Tanner graph.
(b) Find all code words corresponding to this matrix and find the code weight
spectrum.
(c) If the code words are transmitted by the unit power polar pulses, find the
samples at the decoder input if the corresponding sample values of additive
Gaussian noise are
(d) Supposing the channel noise variance r2 = 0.49, decode the words found
above by using sum-product algorithm.
Solution
(a) Number of ones in every column here is s = 2, and number of ones in every
row is q = 4. It is a regular LDPC code, matrix density is r = 2/4 = 0.5 and
the code rate R = 1−c/q = 1/2. The matrix has n = 8 columns and n−k = 4
rows, the code basic parameters are (n, k) = (8, 4).
Tanner graph of the code is shown in Fig. 9.17. It can be noticed that there are
no cycles of minimum length. A rule for its forming is shown as well—connections
between the second check node and the corresponding variable nodes (the second
matrix row) are denoted by heavy lines, while the connections of the seventh
variable node and the nodes corresponding to parity-check sums where they appear
(the seventh matrix column) are denoted by dashed lines.
v1 v2 v3 v4 v5 v6 v7 v8
c1 c2 c3 c4
⎡0 1 0 1 1 0 0 1⎤ 1
⎢1 1 1 0 0 1 0 0 ⎥⎥
H =⎢ 2
⎢0 0 1 0 0 1 1 1⎥
⎢ ⎥ 3
⎣1 0 0 1 1 0 1 0⎦
4
1 2 3 4 5 6 7 8
(b) On the basis of the parity-check matrix, vector v (eight bits) is a code vector if
the following is satisfied
c1 : v2 v4 v5 v8 ¼ 0;
c2 : v1 v2 v3 v6 ¼ 0;
c3 : v3 v6 v7 v8 ¼ 0;
c4 : v1 v4 v5 v7 ¼ 0:
but this system has not a unique solution in spite that the matrix H rank is 4!
Not until five bits are fixed, e.g. v1, v2, v3, v4, v5, the others bits can be found
from relations
c2 : v6 ¼ v1 v2 v3 ;
c4 : v7 ¼ v1 v4 v5 ;
c1 : v8 ¼ v2 v4 v5 :
The matrix has a full rank, it is the code (8, 5) and the corresponding code words
are
480 9 Low Density Parity Check Codes
minimum Hamming distance is dmin = 2 and it cannot be guaranteed even that this
code will always could correct one error in a code word. However, further it will be
shown that in some situations, using a suitable decoding procedure, it could correct
more!
(c) Consider the transmission of code word x = (11111111). This vector is
emitted as a polar pulse train x* = (+1, +1, +1, +1, +1, +1, +1, +1). The
transmission is through the channel with AWGN which has a known samples
values, and at its output the vector y* = (−0.1, 1.2, 0.7, −0.5, 1.1, 0.6, 0.85,
0.75) is obtained.
As shown in Fig. 9.18 this signal can be decoded using various procedures—
after hard decision majority decoding can be applied or iterative decoding on the
bit-flipping algorithm basis. The received word in this case would be
y = (01101111), where the errors occurred at the positions 1 and 4. It is recom-
mended to the reader to find the corresponding result if the iterative decoding with
hard decisions is applied.
In the following the procedure for decoding of the received word using sum-
product algorithm (SPA), known as well as belief-propagation algorithm, proposed
Q(aˆn )
2
1
-1→0 r Majority
aˆn
-2 -1 1 2
+1→1 decoder
-1
-2
q=2
Iterative decoder
bit-flipping
firstly in [90] and rediscovered in [93] will be described. It is supposed the AWGN
in channel has a standard deviation r = 0.7, the coefficients fjc are calculated firstly
on the basis of probability density functions according to the transmitted signal
values
the corresponding numerical values are given in Table 9.3. In such a way the initial
metrics are found showing the likelihood that single samples originate from the one
of the two possible emitted symbols.
Further, two sets of coefficients are calculated sent from the variable to the check
nodes in the first iteration, Q0ij and Q1ij . As presented in Fig. 9.19, the initial values
of these coefficients are found from
fjc ; Hði; jÞ ¼ 1;
Qcij ¼
0; Hði; jÞ ¼ 0:
After the check node ci receives from all connected variable nodes vj the
probability values Qcij (for c = 0 and c = 1), the probabilities are calculated that ith
parity-check equation is satisfied, if the variable node has the value c, using the
relation
X Y
Rcij ¼ Pðci =vÞ Qdikk
dj ¼c k2NðiÞ=j
where N(i) is a set of parent (variable) nodes indexes connected to the check node
ci, and N(i)/j is the same set, but without the node vj to which the information is
sent.
The term P(ci/v) denotes the probability that jth parity-check equation is satis-
fied, and summing is carried out over all possible decoded vectors v for which the
parity-check equation is satisfied when the informed variable node is in the state c.
Therefore, the coefficient R123 is the estimation that the check node 2 sends to the
variable node 2 (shown in Fig. 9.20) and it is calculated under the condition that the
corresponding parity-check equation v2 v4 v5 v7 ¼ 0 is satisfied and when
the second variable node is in the state v2 = 1, being
Table 9.3 The received vector values and the corresponding coefficients fjx
j 1 2 3 4 5 6 7 8
yj –0.1 1.2 0.3 –0.3 1.1 0.6 0.85 0.75
fj0 0.2494 0.0041 0.1016 0.3457 0.0063 0.0418 0.0173 0.0250
fj1 0.1658 0.5471 0.3457 0.1016 0.5641 0.4841 0.5570 0.5347
482 9 Low Density Parity Check Codes
v1 v2 v3 v4 v5 v6 v7 v8
Q14c = f 4c Q15c = f 5c
c c
Q = f
12 2
Q18c = f8c
c1 c2 c3 c4
Fig. 9.19 Initial information transmission from the variable nodes to the check nodes
v1 v2 = 1 v3 v4 v5 v6 v7 v8
Q14x
R12d 2 =1 Q15x
Q18x
c1 c2 c3 c4
Fig. 9.20 Transmission of initial information from check node to variable nodes
R112 ¼ Q014 Q015 Q117 þ Q014 Q115 Q017 þ Q114 Q015 Q017 þ Q114 Q115 Q117 ;
because for v2 = 1, the combinations ðv4 ; v5 ; v7 Þ: ð0; 0; 1Þ, ð0; 1; 0Þ, ð1; 0; 0Þ,
ð1; 1; 1Þ satisfy the parity-check.
Values R0ij and R1ij make possible to find the first estimation of the received
vector.
) )
v01 ¼ f10 R021 R041 ¼ 7:25 104 v05 ¼ f50 R015 R045 ¼ 4:19 105
) v1 ¼ 0 ) v5 ¼ 1
v11 ¼ f11 R121 R141 ¼ 5:49 104 v15 ¼ f51 R115 R145 ¼ 9:15 104
) )
v02 ¼ f20 R012 R022 ¼ 3:79 105 v06 ¼ f60 R026 R036 ¼ 6:49 105
) v2 ¼ 1 ) v6 ¼ 1
v12 ¼ f21 R112 R122 ¼ 6:98 104 v16 ¼ f61 R126 R136 ¼ 3:90 104
) )
v03 ¼ f30 R023 R033 ¼ 4:97 105 v07 ¼ f70 R037 R047 ¼ 2:23 105
) v3 ¼ 1 ) v7 ¼ 1
v13 ¼ f31 R123 R133 ¼ 3:79 103 v17 ¼ f71 R137 R147 ¼ 5:16 103
) )
v04 ¼ f40 R114 R144 ¼ 3:85 104 v08 ¼ f80 R018 R038 ¼ 8:39 105
) v4 ¼ 1 ) v8 ¼ 1
v14 ¼ f41 R114 R144 ¼ 5:25 104 v18 ¼ f81 R118 R138 ¼ 1:53 103
By forming the first estimation of the codeword v0 ¼ ð01111111Þ the first iter-
ation is ended. Because this vector does not satisfy the relation v0 H T ¼ 0, the
procedure should be continued (as a sent word is x = (1111110), it is obvious that
Problems 483
there is still one more error at the first bit, but the decoder, of course, does not
“know” it!).
The next iteration starts by calculating coefficients Q0ij and Q1ij . The variable node
vj from connected check nodes receives the probabilities Rcij values (for c = 0 and
c = 1), the probability is corrected that this node is in the state c
0 11
Y X Y
Qcij ¼ aij fjc Rckj ; aij ¼ @ fc
c j
Rckj A
k2MðjÞ=i k2MðjÞ=i
where aij is a normalization constant and M(j)/i is the set of indexes of check nodes
connected to the node vj, but without the check node ci to which the information is
sent.
On the basis of Fig. 9.21 it is obvious that except the check node to which the
considered variable node sends the information, only one more node is connected to
it. For the variable node j = 1, it can be written [93]
Q021 ¼ a21 f10 R041 ; Q121 ¼ a21 f11 R141 ; a21 ¼ ðf10 R041 þ f11 R141 Þ1 ;
Q041 ¼ a41 f10 R021 ; Q121 ¼ a41 f11 R121 ; a41 ¼ ðf10 R021 þ f11 R121 Þ1 ;
while, e.g. for the sixth variable node, the corresponding equations are
Q026 ¼ a26 f10 R036 ; Q126 ¼ a26 f11 R136 ; a21 ¼ ðf10 R036 þ f11 R136 Þ1 ;
Q036 ¼ a36 f10 R026 ; Q136 ¼ a36 f11 R126 ; a36 ¼ ðf10 R026 þ f11 R126 Þ1 ;
After that, the calculating procedure for coefficients R0ij and R1ij is repeated, as
described earlier.
Numerical values of coefficients Q0ij , Q1ij , R0ij and R1ij , for all combinations of
check and variable nodes, are given in Tables 9.4 9.5 9.6 9.7 9.8 and 9.9,
respectively.
v1 = 1 v2 v3 v4 v5 v6 = 0 v7 v8
v6 = 0
v1 =1 R26
R41
Q36v6 =0
v1 =1
Q21
c1 c2 c3 c4
Fig. 9.21 Information transmission from variable nodes to the check nodes, start of the next
iteration
484 9 Low Density Parity Check Codes
and v00 ¼ ð11111111Þ, this vector satisfies the relation v00 H T ¼ 0, a valid code
word is reconstructed and the decoding procedure is ended. It should be noticed that
the node for which the metrics are recalculated (it reports or is informed) does not
take part in the corresponding estimation calculation, providing that a procedure
converge to the correct solution. Algorithm is simpler if the number of ones in
every row is smaller, what is one of the reasons that it is desirable for the LDPC
parity-check matrix to have a small density.
Furthermore, during the decision whether the zero or one is decoded, the relative
ratio of values dj,0 i dj,1 is only important, and these values can be presented in a
normalized form. These normalized values are the estimation of likelihood for a
decision made after the iteration having the ordinal number iter, which, similarly as
for turbo codes decoding can be denoted as
corresponding to the probability that in the jth symbol interval the bit 0/1 is emitted,
when the received word is known. It is clear that a priori probabilities are 0.5 (if it is
not said that the symbols are not equiprobable) and initial estimation of likelihoods is
corresponding to the estimation before the decoding started (after the zeroth iter-
ation), i.e. to reliability estimation if only hard decision of the receiving bits was
made, without any processing by the decoder.
In Table 9.10 is given how the reliability estimations for single symbols change
during the iterative decoding (shadowed fields in the table correspond to wrongly
reconstructed bits):
486 9 Low Density Parity Check Codes
Table 9.10 Reliability estimations for symbols during the iterations for r = 0.7
Iter Lsj,( iter ) / j 1 2 3 4 5 6 7 8
0,(0)
L j 0.6006 0.0074 0.0543 0.8850 0.0111 0.0795 0.0302 0.0447
0
L1,(0)
j 0.3994 0.9926 0.9457 0.1150 0.9889 0.9205 0.9698 0.9553
L0,(1)
j 0.5690 0.0514 0.0130 0.4228 0.0438 0.0164 0.0043 0.0520
1
L1,(1)
j 0.4310 0.9486 0.9870 0.5772 0.9562 0.9836 0.9957 0.9480
L0,(2)
j 0.0482 0.3440 0.1335 0.0559 0.0644 0.1776 0.0103 0.0887
2
L1,(2)
j 0.9518 0.6560 0.8665 0.9441 0.9356 0.8224 0.9897 0.9113
L0,(3)
j 0.0584 0.0135 0.0377 0.1053 0.0068 0.0488 0.0105 0.0661
3
L1‚(3)
j 0.9416 0.9865 0.9623 0.8947 0.9932 0.9512 0.9895 0.9339
L0,(5)
j 0.0190 0.0123 0.0072 0.0280 0.0103 0.0097 0.0024 0.0083
5
L1,(5)
j 0.9810 0.9877 0.9928 0.9720 0.9897 0.9903 0.9976 0.9917
L0,(10)
j 0.0008 0.0003 0.0003 0.0008 0.0002 0.0004 0.0001 0.0004
10
L1,(10)
j 0.9992 0.9997 0.9997 0.9992 0.9998 0.9996 0.9999 0.9996
– before the decoding, the estimation is wrong at the first and the fourth symbol,
but these estimations are highly unreliable, while the values for the other bits are
very reliable,
– after the first iteration the error at the fourth bit is corrected, but its estimation is
unreliable, while the error at the first bit is not corrected, but this estimation is
not reliable as well,
– after the third iteration all errors are corrected and a valid code word is decoded,
but the second bit estimation has still a moderate reliability,
– after the third estimation all bits are estimated with the reliability higher than
90% and the reliability further grows with the increasing iteration number and
after the tenth iteration it is greater than 99.9%!
It is shown that the decoding can be stopped always in that iteration where a
valid code word is reconstructed, because the algorithm must converge to a unique
solution and the estimation reliability only grows with the number of iterations.
Besides, the reliability estimation of the obtained result depends on the estimated
channel signal-to-noise ratio (fjc depends on yj and r2), and not only on the received
word. In Table 9.11 the values of these estimations are given before the decoding
and during the first and the second iteration. The reader should draw the corre-
sponding conclusion on the basis of the obtained results.
Problem 9.7 Communication system uses for error control coding LDPC code
described by the parity-check matrix from the previous problem. For transmission
the unit power polar pulses are used and the signal at the channel output is
Problems 487
Table 9.11 Reliability estimations for symbols during the iterations for r = 1
Solution
The same code and the same sequence at the channel output are considered as in the
previous problem, but the decoding procedure differs, i.e. here the realization of
sum-product algorithm in logarithmic domain is considered. The motives to use a
logarithmic domain are the same as for turbo codes decoding—avoiding the
products calculation always when it is possible (i.e. their change by summations)
and lessening the overflow effect.
In this algorithm version on the basis of initial estimations of a posteriori
probabilities Pðxn ¼ þ 1=yn Þ and Pðxn ¼ 1=yn Þ for every symbol a priori loga-
rithmic likelihood ratio is defined
ð0Þ
Kj ¼ logðPðxn ¼ þ 1=yn Þ=Pðxn ¼ 1=yn ÞÞ:
1. For all check nodes connected with jth variable node is set initially
ð0Þ
ai;j ¼ Kj :
ðiterÞ ð0Þ
X
Kj ¼ Kj þ bi;j ;
i2MðjÞ
and the estimation of a corresponding code word symbol after current iteration is
ðiterÞ ð0Þ
found from vj ¼ sgnðKj Þ, where sgn(x) is the sign of the argument x.
3. Procedure from the previous step is repeated iteratively until vHT = 0 is
obtained or if a fixed (given in advance) number of iterations is achieved.
In this problem two different cases will be considered, as shown in Fig. 9.22.
In the first case the initial estimations are obtained on the “soft inputs” basis,
while in the second case they are obtained from the quantized samples (equivalent
to the received binary word). As it will be shown, the output in both cases is
“soft”, i.e. besides the send code word estimation a logarithmic likelihood ratio
will be found for every code word symbol, estimated on the channel output
sequence basis.
(a) When the estimation is based on the non quantized samples, the initial esti-
mations are formed using the same relations as in a previous problem
i x x* Channel y* Q(aˆn )
2
y Iterative decoder
LDPC 0→-1 1
sum-product User
encoder 1→+1
-1
noise
-2
q=2 (log-domain)
Iterative decoder
sum-product
(log-domain)
Fig. 9.22 Block scheme of a transmission system using LDPC codes when the decoding is
performed using sum-product algorithm in a logarithmic domain
Problems 489
1 ðyj þ 1Þ2
1 ðyj 1Þ2
ð0Þ
fj0 ¼ pffiffiffiffiffiffi e 2r2 ; fj1 ¼ pffiffiffiffiffiffi e 2r2 ) Kj ¼ log fj1 =fj0 :
2pr 2pr
The variables ai;j ; bi;j values are further calculated by the above expressions and
the numerical values, concluding with the second iteration, are given in Table 9.12.
On the basis of numerical values given in Table 9.13 the same conclusions can
be drawn as in the previous problem. It is easy to verify that between the logarithms
of likelihood ratio and a posteriori estimations that zero or one were sent (when
“non-logarithmic” SPA logarithm version is used) the connection can be
established
and it is left to a reader to verify whether this relation is satisfied for the numerical
values given in Tables 9.10 and 9.13.
(b) When the estimation is based on the quantized samples, system part from the
encoder output to the decoder input can be described as a binary symmetric
channel where the crossover probability is
1 1
p ¼ erfc pffiffiffi ¼ 0:0766:
2 2r
and the initial estimations are the same for all “positive” symbols (i.e. for all
“negative” symbols) given by
ð0Þ
Kj ¼ yqj logðð1 pÞ=pÞ:
Numerical values of variables ai;j ; bi;j are calculated by this relation and given in
Table 9.14, while the corresponding likelihood ratios are given in Table 9.15. It can
be noticed that a large number of symbols has the same likelihood ratio and that the
relation between the likelihood ratios does not changes significantly during a few
first iterations.
Already after the first iteration the algorithm reconstructs the word
v ¼ ð00101111Þ;
which satisfies the relation vHT = 0, i.e. it is a valid code word. However, it is clear
that it is not a sent code word (the sent code word x is composed from a binary ones
490
Table 9.12 Estimations of symbols reliability during the iterations, log-SPA, soft inputs
Iter i/j 1 2 3 4 5 6 7 8
0 ð0Þ –0.4082 4.8980 2.8571 –2.0408 4.4898 2.4490 3.4694 3.0612
Kj
1 aij 1 0 4.8980 0 –2.0408 4.4898 0 0 3.0612
2 –0.4082 4.8980 2.8571 0 0 2.4490 0 0
3 0 0 2.8571 0 0 2.4490 3.4694 3.0612
4 –0.4082 0 0 –2.0408 4.4898 0 3.4694 0
bij 1 0 –1.6791 0 2.7264 –1.6988 0 0 –1.9089
2 1.8944 –0.3041 –0.3367 0 0 –0.3573 0 0
3 0 0 1.8132 0 0 2.0048 1.6678 1.7518
4 –1.7642 0 0 –0.3742 0.2934 0 0.3055 0
2 aij 1 0 4.5938 0 –2.4150 4.7832 0 0 4.8130
2 –2.1724 3.2188 4.6703 0 0 4.4538 0 0
9
Table 9.13 Logarithmic likelihood ratio during the iterations, log-SPA, soft inputs
Iter Λ (jiter ) / j 1 2 3 4 5 6 7 8
(0)
0 Λ j –0.4082 4.8980 2.8571 –2.0408 4.4898 2.4490 3.4694 3.0612
1 Λ (1)
j –0.2779 2.9147 4.3336 0.3114 3.0844 4.0965 5.4426 2.9041
2 Λ (2)
j 2.9832 0.6453 1.8701 2.8272 2.6764 1.5328 4.5619 2.3295
3 Λ (3)
j 2.7796 4.2936 3.2386 2.1395 4.9860 2.9701 4.5433 2.6488
5 Λ (5)
j 3.9420 4.3898 4.9216 3.5455 4.5636 4.6213 6.0449 4.7795
10 Λ (10)
j 7.0962 8.2109 8.1903 7.0840 8.3884 7.9194 9.1831 7.7597
only), and besides the two errors occurred in a channel, the decoder introduced an
additional error at the second code word bit.
It is interesting to notice that the Hamming distance of the reconstructed code
word x′ = (00101111) to the received word y = (01101111) equals one, while the
distance the received word to the emitted word x = (11101111) equals two.
Therefore, the algorithm made the best possible decision on the received word basis
and the parity-check matrix structure knowledge.
On the other hand, squared Euclidean distance of sequence y* = (−0.1, 1.2, 0.7,
−0.5, 1.1, 0.6, 0.85, 0.75) to the concurrent emitting sequences (−1, −1, 1, −1, 1, 1,
1, 1) and (+1, +1, +1, +1, +1, +1, +1, +1) respectively is 6.245 i 3.845, and the
sequence corresponding to the emitted code word is nearer to the received
sequence, explaining the result from the first part of this problem.
Now it is obvious that the decoding error in the second part of solution is not the
consequence of the non-optimum decoding algorithm (it is the same in both cases),
but of the quantization error. Of course, if the quantization was carried out using a
sufficient number of levels, this effect would become negligible.
Problem 9.8 Transmission system uses LDPC error control code described by
parity-check matrix given in the previous problems, for transmission the unit power
polar pulses are used and the signal at the channel output is
Decode the signal if the channel noise is AWGN with variance r2 = 0.49, when
the signal is led directly to the decoder input by applying the following:
(a) Min-sum algorithm,
(b) Min-sum algorithm supposing that the initial LLR-s are determined by signal
samples,
(c) Self-correcting min-sum algorithm supposing that the initial LLR-s are
determined by signal samples.
492
Table 9.14 Estimations of symbols reliability during the iterations, log-SPA, hard inputs
Iter i/j 1 2 3 4 5 6 7 8
0 Lj –2.49 2.49 2.49 –2.49 2.49 2.49 2.49 2.49
1 aij 1 0 2.49 0 –2.49 2.49 0 0 2.49
2 –2.49 2.49 2.49 0 0 2.49 0 0
3 0 0 2.49 0 0 2.49 2.49 2.49
4 –2.49 0 0 –2.49 2.49 0 2.49 0
bij 1 0 –1.4095 0 1.4095 –1.4095 0 0 –1.4095
2 1.4095 –1.4095 –1.4095 0 0 –1.4095 0 0
3 0 0 1.4095 0 0 1.4095 1.4095 1.4095
4 –1.4095 0 0 –1.4095 1.4095 0 1.4095 0
2 aij 1 0 1.0805 0 –3.8995 3.8995 0 0 3.8995
2 –3.8995 1.0805 3.8995 0 0 3.8995 0 0
9
Table 9.15 Logarithmic likelihood ratio during the iterations, log-SPA, hard inputs
Iter Λ (jiter ) / j 1 2 3 4 5 6 7 8
(0)
0 Λ j –2.4900 2.4900 2.4900 –2.4900 2.4900 2.4900 2.4900 2.4900
1 Λ (1)
j –2.4900 –0.3290 2.4900 –2.4900 2.4900 2.4900 5.3090 2.4900
2 Λ (2)
j –1.9844 –3.1139 1.9844 –1.9844 1.9844 1.9844 2.9721 1.9844
3 Λ (3)
j –3.4898 –1.2587 3.4898 –3.4898 3.4898 3.4898 3.5515 3.4898
5 Λ (5)
j –3.4409 –3.7699 3.4049 –3.4049 3.4049 3.4049 4.3304 3.4049
10 Λ (10)
j –5.7482 –4.1346 5.7482 –5.7482 5.7482 5.7482 7.1994 5.7482
Solution
As known from the previous problem, logarithmic version of a sum-product
algorithm has the substantial advantages in respect to its original version:
– the number of multiplications is decreased in favor the additions
– the overflow probability is decreased
– the number of relations whose values should be found is two times smaller—
instead of Q0ij and Q1ij only ai;j is calculated, and instead of R0ij and R1ij only bi;j is
calculated,
– a number of mathematical operations in every step is smaller, there is no need
for normalization etc.
However, the algorithm explained in a previous problem has one serious
drawback. To find the information which the check node sends to it adjoined
variable node j one has to determine
8 9
< Y =
bi;j ¼ 2 tan h1 tan hðai;k =2Þ ;
:k2NðiÞnj ;
it is obvious that the corresponding numerical value cannot be easily found, and still
more serious problems appear when the hardware block should be implemented
corresponding to this relation.
That is the reason to use the SPA procedure versions much easier to be
implemented:
1. The simplest (and still rather exact) approximation of the above equality is
2 3
Y
bi;j ¼ 4 sgnðai;k Þ5 min ai;k ;
k2NðiÞnj
k2NðiÞnj
494
Table 9.16 Estimations of symbols reliability by the check nodes, min-sum, soft inputs
Iter i/j 1 2 3 4 5 6 7 8
0 ð0Þ –0.4082 4.8980 2.8571 –2.0408 4.4898 2.4490 3.4694 3.0612
Kj
1 bij 1 0 –2.0408 0 3.0612 –2.0408 0 0 –2.0408
2 2.4490 –0.4082 –0.4082 0 0 –0.4082 0 0
3 0 0 2.4490 0 0 2.8571 2.4490 2.4490
4 –2.0408 0 0 –0.4082 0.4082 0 0.4082 0
2 bij 1 0 –2.4490 0 4.4898 –2.4490 0 0 –2.4490
2 2.8571 –2.4490 –2.4490 0 0 –2.4490 0 0
9
being the key modification defining the difference between SPA and min-sum
algorithm [97].
2. The modification concerning the information which the variable node i sends to
check node j connected to it is possible as well, and the corresponding value is
now calculated in two steps
X
ð0Þ ai;j ; sgn(ai;j Þ ¼ sgn(ai;j Þ;
ai;j ¼ Kj þ bk;j ; ai;j ¼
0; sgn(ai;j Þ 6¼ sgn(ai;j Þ;
k2MðjÞni
Table 9.17 Logarithmic likelihood ratio during iterations, min-sum, soft inputs
Iter Λ (jiter ) / j 1 2 3 4 5 6 7 8
(0)
0 Λ j –0.4082 4.8980 2.8571 –2.0408 4.4898 2.4490 3.4694 3.0612
(1)
1 Λ j 1.1×10 -16
2.4490 4.8980 0.6122 2.8571 4.8980 6.3265 3.4694
2 Λ (2)
j 3.4694 8.9×10-16 1.4286 4.4898 3.0612 1.0204 5.5102 2.6531
(3)
3 Λ j 4.0816 5.5102 3.4694 2.4490 6.9388 3.4694 5.5102 3.0612
5 Λ (5)
j 5.5102 4.8980 5.9184 5.3061 5.5102 5.5102 7.5510 6.9388
(10)
10 Λ j 12.6531 12.0408 12.0408 12.8571 13.2653 12.0408 14.2857 12.0408
496 9 Low Density Parity Check Codes
Table 9.18 Logarithmic likelihood ratio during iterations, self-correcting min-sum, soft inputs
Iter Λ (jiter ) / j 1 2 3 4 5 6 7 8
(0)
0 Λ j –0,4082 4.8980 2.8571 –2.0408 4.4898 2.4490 3.4694 3.0612
1 Λ (1)
j 1.1×10 -16
2.4490 4.8980 0.6122 2.8571 4.8980 6.3265 3.4694
2 Λ (2)
j 2.4490 8.9×10-16 1.4286 2.4490 2.0408 1.0204 4.4898 2.6531
3 Λ (3)
j 2.0408 2.4490 2.4490 0.4082 2.4490 2.4490 3.4694 1.0204
5 Λ (5)
j 2.4490 2.4490 2.4490 1.0204 2.4490 3.0612 3.4694 1.0204
10 Λ (10)
j 2.4490 2.4490 2.4490 1.0204 2.4490 3.0612 3.4694 1.0204
ð0Þ
Kj ¼ yj :
Although it is clear that this approach is not optimum and that the LLR for some
symbols by a rule will have a substantially smaller values, numerical results (given
in Table 9.19) show that the convergence rate in respect to the previously con-
sidered case did not changed. Because of that, such an approach is often used for
the implementation of simplified sum-product algorithm versions.
Table 9.19 Logarithmic likelihood ratio during iterations, self-correcting min-sum, input samples
are used as the initial LLR values
Iter Λ (jiter ) / j 1 2 3 4 5 6 7 8
0 Λ (0)
j –0.1 1.2 0.7 –0.5 1.1 0.6 0.85 0.75
1 Λ (1)
j –1.1×10-16 0.6 1.2 0.15 0.7 1.2 1.55 0.85
(2)
2 Λ j –0.6 –2.2×10 -16
0.35 0.6 0.5 0.25 1.1 0.65
3 Λ (3)
j 0.5 0.6 0.6 0.1 0.6 0.6 0.85 0.25
5 Λ (5)
j 0.6 0.6 0.6 0.25 0.6 0.75 0.85 0.25
10 Λ (10)
j 0.6 0.6 0.6 0.25 0.6 0.75 0.85 0.25
Problems 497
Problem 9.9
(a) Explain the construction of parity-check matrix of a quasi-cyclic code where n
−k = 4, the first circulants is a1(x) = 1 + x and the second a2(x) = 1 + x2 + x4.
Draw a corresponding bipartite graph.
(b) Find this code generator matrix starting from the fact that its second circulant
is invertible. Write all code words and verify whether the code is cyclic.
(c) If it is not known that a code is quasi-cyclic, find its matrix by parity-check
matrix reduction in a systematic form to a reduced standard row-echelon form.
(d) Find a generator matrix by transforming the corresponding parity-check matrix
in approximate lower diagonal form.
Solution
(a) Parity-check matrix of quasi-cyclic (QC) LDPC code in a general case has a
form [99,100]
H ¼ ½A1 ; A2 ; . . .; Al
Tanner bipartite graph for this code is shown in Fig. 9.23. Although the groups
of symbol nodes, corresponding to every circulant do not overlap, it does not
guarantee that the small length cycles will be avoided.
(b) It is known that generator matrix of quasi-cyclic code generally has the form
498 9 Low Density Parity Check Codes
v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
c1 c2 c3 c4 c5
2 T 3
ðA1
l A1 Þ
6 ðA1 T 7
G¼6 l A2 Þ 7;
4 I uðl1Þ ... 5
1 T
ðAl Al1 Þ
where matrix A1 is invertible (it is not necessarily the last right square submatrix in
H!). It is obvious that a corresponding code has the code word length n = ul, while
the information word length is k = u(l−1).
In this example circulant a2(x) is invertible yielding a−1 2 3 4
2 (x) = x + x + x , and it
can be easily verified that the product of corresponding matrices modulo-2 yields
the unity matrix
2 3 2 3
0 0 1 1 1 1 0 0 0 0
61 0 0 1 17 60 1 0 0 07
6 7 6 7
A1
2 ¼6
61 1 0 0 17 1 6
7 ) A2 A2 ¼ 6 0 0 1 0 077 ¼ I5
41 1 1 0 0 5 40 0 0 1 05
0 1 1 1 0 0 0 0 0 1
Code corresponding to this generator matrix is systematic and the first five bits
are completely determined by information word at the encoder input. The corre-
sponding transformation is
Problems 499
i ¼ ð00001Þ ! x ¼ ð0000100101Þ
From the generator matrix it can be easily noticed that the successive cyclic
shifts of this word correspond to information words
x0 ¼ ð1000010010Þ ! i0 ¼ ð10000Þ
x00 ¼ ð0100001001Þ ! i00 ¼ ð01000Þ
x000 ¼ ð1010000100Þ ! i000 ¼ ð10100Þ
here first two relations are really satisfied (to these code words correspond written
information sequences), but the third equality is not satisfied because to information
sequence (10100) corresponds code word
i ''' = (10100) → x ''' = (1010000110) .
Because of the fact that to one information sequence cannot correspond two
different code words, it is obvious that a code is not cyclic, because at least one
cyclic code word shift is not a code word. This code is only quasi-cyclic, because
the parity-check bits are given by matrix corresponding to the polynomial
T
2 ðxÞ * a2 ðxÞ ¼¼ 1 þ x :
a1 1 3
(c) Standard Parity-check matrix row-echelon form satisfies the condition that in
the neighboring rows of parity-check matrix (being not all zeros rows!) a
leading one of the lower row appears in the one column to the right regarding
to the upper row. Reduced standard form includes an additional condition—
the column comprising a leading one in some row has not ones at the other
positions, i.e. the parity-check matrix begins by unit matrix having dimensions
(n – k) (n – k).
It is obvious that the first four rows of the found parity-check matrix satisfy a
standard echelon form, but the fifth row does not satisfy it. In this case it is difficult
by using standard operations over the rows (reordering of rows, change of one row
by sum modulo-2 of two rows) to form complete matrix satisfying standard echelon
form. Therefore, step-by-step, the try will be made to form parity-check matrix in a
systematic form. In the first column, one should be only at the first position, the fifth
row is changed by the sum of the first and the fifth row, forming the matrix H1. In
the second column, one should be only at the second position, and the first row of
matrix H1 is changed by the sum of the first and the fifth row, and the fifth row is
changed by the sum of the second and the fifth row yielding the matrix H2. Besides,
one should be careful not to decrease the matrix rang by these transformations.
500 9 Low Density Parity Check Codes
2 3 2 3
1 1 0 0 0 1 0 1 0 1 1 1 0 0 0 1 0 1 0 1
6 7 6 7
60 1 1 0 0 1 1 0 1 07 60 1 1 0 0 1 1 0 1 07
6 7 6 7
6 7 6 7
H¼60 0 1 1 0 0 1 1 0 1 7 ! H1 ¼ 6 0 0 1 1 0 0 1 1 0 1 7;
6 7 6 7
60 7 60 07
4 0 0 1 1 1 0 1 1 05 4 0 0 1 1 1 0 1 1 5
1 0 0 0 1 0 1 0 1 1 0 1 0 0 1 1 1 1 1 0
2 3 2 3
1 1 0 0 0 1 0 1 0 1 1 0 0 0 1 0 1 0 1 1
6 7 6 7
60 1 1 0 0 1 1 0 1 07 60 1 1 0 0 1 1 0 1 07
6 7 6 7
6 7 6 7
H 1 ¼ 6 0 0 1 1 0 0 1 1 0 1 7 ! H 2 ¼ 6 0 0 1 1 0 0 1 1 0 1 7:
6 7 6 7
60 0 0 1 1 1 0 1 1 07 60 0 0 1 1 1 0 1 1 07
4 5 4 5
0 1 0 0 1 1 1 1 1 0 0 0 1 0 1 0 0 1 0 0
In the third column now one should be eliminated in the second row (changing
by the sum of the second and fifth row of matrix H2). Further in the new obtained
matrix a similar procedure is applied for elimination of ones in the fourth column—
in the third row (sum 3rd and 5th) and after in the fifth row (sum 4th and 5th). As
the zero in fifth row and fifth column of matrix H3 cannot be eliminated and it
cannot be written in systematic form, this approach does not yield the results!
2 3 2 3
1 0 0 0 1 0 1 0 1 1 1 0 0 0 1 0 1 0 1 1
6 7 6 7
60 1 1 0 0 1 1 0 1 07 60 1 0 0 1 1 1 1 1 0 7
6 7 6 7
H2 ¼ 6
60 0 1 1 0 0 1 1 0 177 ! H 3 ¼ 60
6 0 1 1 0 0 1 1 0 1 7;
7
6 7 6 7
40 0 0 1 1 1 0 1 1 05 40 0 0 1 1 1 0 1 1 05
0 0 1 0 1 0 0 1 0 0 0 0 0 1 1 0 1 0 0 1
2 3 2 3
1 0 0 0 1 0 1 0 1 1 1 0 0 0 1 0 1 0 1 1
6 7 6 7
60 1 0 0 1 1 1 1 1 07 60 1 0 0 1 1 1 1 1 07
6 7 6 7
H3 ¼ 6
60 0 1 1 0 0 1 1 0 17 6
7 ! H4 ¼ 6 0 0 1 0 1 0 0 1 0 077:
6 7 6 7
40 0 0 1 1 1 0 1 1 05 40 0 0 1 1 1 0 1 1 05
0 0 0 1 1 0 1 0 0 1 0 0 0 0 0 1 1 1 1 1
Of course, it does not mean that the parity-check matrix cannot be written in a
systematic form, but only means that it cannot be obtained by reducing it to reduced
standard echelon form. Knowing that a code is quasi-cyclic, from generator matrix
it is easily obtained
2 3
1 0 1 0 0 1 0 0 0 0
60 1 0 1 0 0 1 0 0 07
6 7
H s ¼ A1
2 A1 I5 ¼6
60 0 1 0 1 0 0 1 0 077:
41 0 0 1 0 0 0 0 1 05
0 1 0 0 1 0 0 0 0 1
A B T
H¼
C D E
where matrix T is square, dimensions (n–k–g) (n–k–g), where g denotes gap and
if this value is smaller, the encoding procedure complexity is smaller. Matrix B then
has dimensions (n−k−g) g, while dimensions of A are (n−k−g) k.
This matrix further by Gauss-Jordan elimination is reduced to a form
~T ¼ I nkg 0 ~ A B T
H HT ¼ ~ ~
ET 1 Ig C D 0
~ ¼ ET 1 A þ C;
C ~ ¼ ET 1 B þ D;
D
On the basis of this matrix form, the code word can be obtained in a systematic
form, where it has the form
x ¼ ði; p1 ; p2 Þ;
where i is the information word and the other component vectors are formed by
using relations
p1 ¼ D ~ T;
~ 1 Ci p2 ¼ T 1 ðAiT þ Bp1 Þ:
Finally, by reordering the third and the fourth row in the above matrix, the
parity-check matrix is formed in an approximate lower triangular form
⎡1 1 1 1 0 1 1 0 0 0⎤
⎢0 1 1 1 1 0 1 1 0 0 ⎥⎥
⎢
H T = ⎢0 0 0 1 1 1 0 1 1 0⎥ ,
⎢ ⎥
⎢0 0 1 1 0 0 1 1 0 1⎥
⎢⎣1 0 0 0 1 0 1 0 1 1 ⎥⎦
where
2 3 2 3 2 3
1 1 1 1 0 1 1 0 0 0
60 17 607 61 0 07
6 1 1 1 7 6 7 6 1 7
A¼6 7; B ¼ 6 7; T ¼ 6 7;
40 0 0 1 15 415 40 1 1 05
0 0 1 1 0 0 1 1 0 1
C = ½1 0 0 0 1 ; D = ½0 ; E ¼ ½ 1 0 1 1 :
where
~ ¼½ 1 0
C 1 0 0 ; ~ ¼½1 :
D
and matrices multiplied from the left by information vector just correspond to the
sixth, i.e. to 7–10 columns of the generator matrix obtained in the second part of the
problem solution using a totally independent way. The procedure described in this
part of solution is general (not limited to quasi-cyclic codes only), guaranteeing that
the systematic code obtaining even in a case if it is applied by successive vectors p1
and p2 calculations, has approximately linear complexity.
Problem 9.10
(a) Explain the calculation of sphere packing bound and using this relation find
the probability that the decoding is unsuccessful if the transmission is over
BSC, if the crossover probability is p < 0.14, and a used code has code rate
R = 1/2, code word length being 10 n 1000.
(b) Draw the probability that a code word after transmitting over BSC is decoded
unsuccessfully. Give the results for regular PEG (Progressive Edge Growth)
LDPC code, code word length n = 200, if for decoding are applied
bit-flipping, min-sum, self-correcting min-sum and sum-product algorithm.
Compare the obtained results to the limiting values found in a previous part.
(c) For the same code as above draw the dependence of the error probability after
decoding on the crossover probability, if the transmission is through AWGN
channel for Eb/N0 < 10 dB. Compare the obtained results to the corresponding
Shannon bound and find the code gain for Pe = 10−4.
(d) In the case of the channel with AWGN, when there is no a quantizing block,
the channel can be considered as having a discrete input (a binary one!) while
the output is continuous. For such case use PEG code which has the code word
length n = 256, code rate R = 1/2, and calculate the residual bit error proba-
bility (BER) if for decoding are applied bit-flipping, gradient multi-bit flipping
(GDBF), min-sum, and sum-product algorithm.
504 9 Low Density Parity Check Codes
Solution
a) Sphere packing bound in general case can be written in a form
qn d1
Aq ðn; dÞ ; ec ¼ ;
P
ec
n 2
ðq 1Þl
l¼0 l
where Aq(n, d) is a maximum number of words of code basis q and the minimum
Hamming distance is at least d. It is easy to notice that sphere packing bound is only
the second name for a Hamming bound (defined in Problem 5.8), and for a case of
binary linear block codes Aq(n, d) = 2k and the above relation reduces to
2n X
ec
n
2k ) 2nð1RÞ :
Pec
n l
l¼0
l¼0 l
From the above one can find a number of errors correctable in the optimum case,
in the code word of the length n and for the code rate R (because k = Rn).
(b) The lower bound for the unsuccessful decoding is
X
ec
n l
Pe;min ðpÞ 1 p ð1 pÞnl ;
l¼0
l
and as to one code word usually corresponds one transmission frame, the above
relation in effect gives the frame error probability (rate) (FER). Besides, the
equality is satisfied only if the code is perfect, while in a general case (and for non
perfect codes) the exact expression for the probability that at least one error is not
corrected depends on the weight code spectrum given in fourth part of the Problem
5.8. The corresponding numerical values are shown in Fig. 9.24.
(c) Firstly, the case will be considered for binary symmetric channel, having
crossover probability p. The class of Progressive Edge Growth (PEG) LDPC
codes is considered, where the parity-check matrix is formed by a constructive
method (it is a structured, and not a random approach with the limitations as
for Gallager codes) described in [102]. To analyze these codes, a Monte Carlo
simulation method was used. Basics of that procedure are shown in Fig. 9.25
—at the coded sequence the uncorrelated sequence is superimposed where the
probability of ones equals p and the decoding is performed. The residual bit
error probability is estimated by comparing the obtained sequence to the
sequence at the encoder input and the corresponding results for the code PEG
(200,100) and various decoding algorithms are shown in Fig. 9.26.
Problems 505
0
10
n=10
n=30
n=100
n=200
−1
10 n=500
n=1000
Frame Error Rate, FER
−2
10
Schannon
limit for BSC
−3
10
−4
10
0 0.02 0.04 0.06 0.08 0.1 0.12 0.14
Crossover probability in BSC, p
From the obtained diagram it can be seen that from all the considered algorithms
SPA provides the minimum and bit-flipping the maximum error probability.
Simplified variants of SPA decoding procedure have the smaller complexity, but
the decoder performances are degraded as well. It is obvious that by min-sum
algorithm correction these degradations can be lessened substantially. However,
even when the optimum decoding algorithm is used, there is a significant difference
to the performances foreseen by sphere packing bound, being the consequence of
“non perfectness” of the considered PEG code.
(d) For the case of the channel with AWGN, when there is no a quantizing block,
the channel can be considered as a channel which has a discrete input (a binary
one!) while the output is continuous. In this case the Monte Carlo simulation is
Generator of b c r b'
LDPC LDPC
information User
encoder decoder
sequence
Probability of error
estimation
0
10
−1
10
−2
10
bit−flipping
min−sum
self−correcting min−sum
sum−product
sphere packing bound
−3
10
0 0.02 0.04 0.06 0.08 0.1 0.12 0.14
Crossover probability in BSC
0
10
uncoded
bit−flipping
−1
10 multi−GDBF
min−sum
sum−product
−2
10
Bit Error Rate, BER
−3
10
−4
10
−5
10
−6
10
1 2 3 4 5 6 7 8 9 10 11
Eb/No [dB]
performed on the modulation channel level and the results are shown in
Fig. 9.27 for PEG code having the code word length n = 256 and code rate
R = 1/2. The detailed description of code construction and the simulation
procedure for this case are given in [103].
For the case when the sum-product algorithm is applied, the error probability 10^-6 is achieved for substantially smaller values of the parameter Eb/N0 than when the bit-flipping algorithm is applied. Some modifications of the bit-flipping algorithm (e.g. gradient descent multi-bit flipping, multi-GDBF) allow the coding gain to be increased from 0.4 to 2.5 dB (for Pe,res = 10^-6). On the other hand, even a significant simplification of the sum-product algorithm (min-sum is just such a simplification) in the considered case does not give a significant performance degradation; the coding gain decreases from 6.1 to 5.8 dB. This is precisely the reason why contemporary decoding algorithms are based mainly on the message-passing principle, with smaller or greater simplifications.
The Shannon bound for code rate R = 1/2 is 0.19 dB [1], so it is clear that even the SPA at Pe,res = 10^-6 provides a performance about 4.5 dB worse than that limiting value. Of course, this difference can be substantially decreased if the code word length is increased (luckily, the PEG procedure allows the construction of codes with practically arbitrary code word length). On the basis of the Shannon theory, providing a negligibly small probability of error for Eb/N0 < 0.19 dB is possible only if the code rate is additionally reduced, and even when R → 0, a reliable transmission can be provided only if Eb/N0 > -1.59 dB.
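The limiting values quoted above can be related directly to the capacity formulas. The short sketch below evaluates the minimum Eb/N0 for the unconstrained AWGN channel, Eb/N0 >= (2^(2R) - 1)/(2R), which tends to ln 2 (about -1.59 dB) as R → 0; the 0.19 dB value quoted for R = 1/2 additionally reflects the binary input constraint and requires a numerical evaluation of the binary-input AWGN capacity, which is not reproduced here.

import math

def ebn0_min_db(rate):
    # Minimum Eb/N0 (dB) for reliable transmission over the unconstrained AWGN channel
    snr = 2.0 ** (2.0 * rate) - 1.0       # from R = (1/2) log2(1 + SNR)
    return 10.0 * math.log10(snr / (2.0 * rate))

print(ebn0_min_db(0.5))      # 0.0 dB for R = 1/2 without an input constraint
print(ebn0_min_db(1e-6))     # tends to 10*log10(ln 2) = -1.59 dB as the code rate tends to 0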
References
22. H. Nyquist, Certain topics in telegraph transmission theory. Trans. AIEE 47, 617–644
(1928)
23. R.W. Hamming, Error detecting and error correcting codes. Bell Sys. Tech J. 29, 147–160
(1950)
24. M.J.E. Golay, Notes on digital coding. Proc. IRE 37, 657 (1949)
25. S. Lin, D.J. Costello Jr., Error Control Coding—Fundamentals and Applications, 2nd edn. (Prentice Hall, Englewood Cliffs, NJ, 2004; 1st edn. 1983)
26. R.H. Morelos-Zaragoza, The Art of Error Correcting Coding (Wiley, Hoboken, 2002)
27. R.E. Blahut, Theory and Practice of Error Control Codes (Addison-Wesley Publishing
Company, Reading, Massachusetts, 1983)
28. I.S. Reed, A class of multiple-error-correcting codes and the decoding scheme. IRE Trans.
Inform. Theor. IT-4, 38–49 (1954)
29. D.E. Muller, Application of Boolean algebra to switching circuit design and to error
detection. IRE Trans. Electr. Comp. EC-3, 6–12 (1954)
30. D. Brown, Error detecting and correcting binary codes for arithmetic operations. IRE Trans.
Electron. Comput. EC-9, 333–337 (1960)
31. R.R. Varshamov, G.M. Tenengolz, Codes which correct single asymmetric errors. Avtomatika i Telemekhanika 26, 288–292 (1965)
32. R.R. Varshamov, A class of codes for asymmetric channels and a problem from the additive
theory of numbers. IEEE Trans. Inform. Theor. IT-19, 92–95 (1973)
33. A.J.H. Vinck, H. Morita, Codes over the ring of integers modulo m. IEICE Trans. Fundam. E-81-A, 2013–2018 (1998)
34. W.W. Peterson, E.J. Weldon Jr., Error-Correcting Codes, 2nd edn. (The MIT Press,
Cambridge, 1972)
35. W.W. Peterson, D.T. Brown, Cyclic codes for error detection. Proc. IRE 49, 228–235
(1961)
36. G. Castagnoli, J. Ganz, P. Graber, Optimum cyclic redundancy-check codes with 16-bit
redundancy. IEEE Trans. Commun. 38, 111–114 (1990)
37. A. Hocquenghem, Codes correcteurs d'erreurs. Chiffres 2, 147–156 (1959)
38. R.C. Bose, D.K. Ray-Chaudhuri, On a class of error correcting binary group codes. Inform.
Control 3, 68–79 (1960)
39. S.B. Wicker, Error Control Systems for Digital Communication and Storage (Prentice Hall, New Jersey, 1995)
40. W.W. Peterson, Encoding and error-correction procedures for the Bose-Chaudhuri codes. IRE Trans. Inform. Theor. 6, 459–470 (1960)
41. E.R. Berlekamp, Nonbinary BCH decoding, in Proceedings of the International Symposium on Information Theory (San Remo, Italy, 1967)
42. I.S. Reed, G. Solomon, Polynomial codes over certain finite fields. SIAM J. Appl. Math. 8,
300–304 (1960)
43. D.C. Gorenstein, N. Zierler, A class of error-correcting codes in p^m symbols. J. Soc. Industr. Appl. Math. 9, 207–214 (1961)
44. J. Massey, Shift-register synthesis and BCH decoding. IEEE Trans. Inform. Theor. 15,
122–127 (1969)
45. R.E. Blahut, Theory and Practice of Error Control Codes (Addison-Wesley Publishing Company, Reading, Massachusetts, 1983)
46. P. Elias, Coding for noisy channels. IRE Convention Rec. 3, 37–46 (1955). (Pt. 4)
47. G.D. Forney Jr, Convolutional codes I: algebraic structure. IEEE Trans. Inform. Theor.
IT-16, 720–738 (1970), IT-17, 360 (1971)
48. J.L. Massey, Threshold Decoding (MIT Press, Cambridge, 1963)
49. D.E. Muller, Application of Boolean algebra to switching circuit design and to error
detection. IRE Trans. Electron. Comp. EC-3, 6–12 (1954)
50. I.S. Reed, A class of multiple-error-correcting codes and the decoding scheme. IRE Trans. Inform. Theor. PGIT-4, 38–49 (1954)
51. J.M. Wozencraft, Sequential decoding for reliable communication. National IRE
Convention Rec. 5(2), 11–25 (1957)
52. R.M. Fano, A heuristic discussion of probabilistic decoding. IEEE Trans. Inform. Theor. IT-9, 64–74 (1963)
53. K.Sh. Zigangirov, Some sequential decoding procedures. Problemy Peredachi Informatsii 2, 13–25 (1966)
54. F. Jelinek, Fast sequential decoding algorithm using a stack. IBM J. Res. Dev. 13, 675–678
(1969)
55. R.W. Chang, J.C. Hancock, On receiver structures for channels having memory. IEEE
Trans. Inform. Theor. IT-12, 463–468 (1966)
56. J.L. Massey, Coding and modulation in digital communications, in International Zurich
Seminar on Digital Communications (Zurich, Switzerland, 1974), pp. E2(1)–E2(4)
57. G. Ungerboeck, I. Csajka, On improving data-link performance by increasing the channel
alphabet and introducing sequence coding, in International Symposium on Information
Theory (Ronneby, Sweden, 1976)
58. G. Ungerboeck, Channel coding with multilevel/phase signals. IEEE Trans. Inform. Theor.
IT-28, 55–67 (1982)
59. G. Ungerboeck, Trellis-coded modulation with redundant signal sets Part I: introduction.
IEEE Commun. Mag. 25, 5–11 (1987)
60. G. Ungerboeck, Trellis-coded modulation with redundant signal sets Part II: State of the art.
IEEE Commun. Mag. 25, 12–21 (1987)
61. R. Johannesson, K.S. Zigangirov, Fundamentals of Convolutional Coding (IEEE Press, New York, 1999)
62. A.J. Viterbi, Error bounds for convolutional codes and an asymptotically optimum
decoding algorithm. IEEE Trans. Inform. Theor. IT-13, 260–269 (1967).
63. A.J. Viterbi, J.K. Omura, Principles of Digital Communication and Coding (McGraw-Hill,
New York, 1979)
64. J.B. Cain, G.C. Clark, J.M. Geist, Punctured convolutional codes of rate (n–1)/n and
simplified maximum likelihood decoding. IEEE Trans. Inform. Theor. IT-25, 97–100
(1979)
65. G. Ungerboeck, Channel coding with multilevel/phase signals. IEEE Trans. Inform. Theor.
IT-28, 55–67 (1982)
66. L.R. Bahl, J. Cocke, F. Jelinek, J. Raviv, Optimum decoding of linear codes for minimizing
symbol error rate. IEEE Trans. Inform. Theor. IT-20, 284–287 (1974)
67. J.K. Wolf, Efficient maximum likelihood decoding of linear block codes. IEEE Trans.
Inform. Theor. IT-24, 76–80 (1978)
68. B. Vucetic, J. Yuan, Turbo Codes—Principles and Applications (Kluwer Academic
Publishers, Boston, 2000)
69. B. Honary, G. Markarian, Trellis Decoding of Block Codes (Kluwer Academic Publishers,
Boston, 1997)
70. D.J. Costello Jr., J. Hagenauer, H. Imai, S.B. Wicker, Applications of error-control coding.
IEEE Trans. Inform. Theor. 44, 2531–2560 (1998)
71. S. Lin, T. Kasami, T. Fujiwara, M. Fossorier, Trellises and Trellis-Based Decoding
Algorithms for Linear Block Codes (Kluwer Academic Publishers, Boston, 1998)
72. J. Hagenauer et al., Variable-rate sub-band speech coding and matched channel coding for
mobile radio channels, in Proceedings of the 38th IEEE Vehicular Technology Conference
(1988), pp. 139–146
73. L.R. Bahl, J. Cocke, F. Jelinek, J. Raviv, Optimum decoding of linear codes for minimizing
symbol error rate. IEEE Trans. Inform. Theor. IT-20, 284–287 (1974)
74. P. Robertson, E. Villebrun, P. Hoeher, A comparison of optimal and sub-optimal MAP
decoding algorithms operating in the log domain, in Proceedings of the IEEE ICC ‘95
(Seattle, June 1995), pp. 1009–1013
75. W. Koch, A. Baier, Optimum and sub-optimum detection of coded data disturbed by
time-varying intersymbol interference, in Proceedings of the IEEE GLOBECOM ‘90,
November 1990, Vol. II, pp. 1679–1684
76. M.C. Valenti, An efficient software radio implementation of the UMTS turbo codec, in
Proceedings of the IEEE International Symposium on Personal, Indoor and Mobile Radio
Communications (San Diego, CA, Sept 2001), pp. G-108–G-113
77. J. Hagenauer, P. Hoeher, A Viterbi algorithm with soft-decision outputs and its
applications, in GLOBECOM ’89 (Nov 1989), pp. 1680–1686
78. Y. Li, B. Vucetic, Y. Sato, Optimum soft-output detection for channels with intersymbol interference. IEEE Trans. Inform. Theor. 41, 704–713 (1995)
79. C. Berrou, A. Glavieux, P. Thitimajshima, Near optimum error correcting coding and
decoding: turbo codes, in Proceedings of the ICC ’93 (Geneva, 1993), pp. 1064–1070
80. C. Berrou, The ten-year old turbo codes are entering into service. IEEE Commun. Mag. 41
(7), 110–116 (2003)
81. S. Benedetto, G. Montorsi, Unveiling turbo codes: some results on parallel concatenated coding. IEEE Trans. Inform. Theor. IT-43, 591–600 (1996)
82. B. Sklar, A primer on turbo code concepts. IEEE Commun. Mag. 35, 94–102 (1997)
83. R. Gallager, Low-density parity-check codes. IRE Trans. Inform. Theor. 7, 21–28 (1962)
84. R.M. Tanner, A recursive approach to low complexity codes. IEEE Trans. Inform. Theor. 27, 533–547 (1981)
85. D.J.C. MacKay, R.M. Neal, Near Shannon limit performance of low density parity check
codes. Electron. Lett. 32, 1645–1646 (1996)
86. Y.Y. Tai, L. Lan, L. Zeng, S. Lin, K.A.S. Abdel-Ghaffar, Algebraic construction of
quasi-cyclic LDPC codes for the AWGN and erasure channels. IEEE Trans. Commun. 54,
1765–1774 (2006)
87. J. Xu, L. Chen, I. Djurdjevic, S. Lin, K.A.S. Abdel-Ghaffar, Construction of regular and
irregular LDPC codes: geometry decomposition and masking. IEEE Trans. Inform. Theor.
53, 121–134 (2007)
88. M.P.C. Fossorier, Quasi-cyclic low-density parity-check codes from circulant permutation
matrices. IEEE Trans. Inform. Theor. 50, 1788–1793 (2004)
89. B. Vasic, O. Milenkovic, Combinatorial constructions of low-density parity-check codes
for iterative decoding. IEEE Trans. Inform. Theor. 50, 1156–1176 (2004)
90. T.J. Richardson, R. Urbanke, Efficient encoding of low-density parity-check codes. IEEE
Trans. Inform. Theor. 47, 638–656 (2001)
91. T. Richardson, R. Urbanke, The renaissance of Gallager’s low-density parity-check codes.
IEEE Commun. Mag. 41(8), 126–131 (2003)
92. T. Richardson, R. Urbanke, Modern Coding Theory, 2007, online: http://lthcwww.epfl.ch/mct/index.php
93. Y. Kou, S. Lin, M. Fossorier, Low-density parity-check codes based on finite geometries: a
rediscovery and new results. IEEE Trans. Inform. Theor. 47, 2711–2736 (2001)
94. J. Zhang, M. Fossorier, A modified weighted bit-flipping decoding of low density
parity-check codes. IEEE Comm. Lett. 8, 165–167 (2004)
95. F. Guo, L. Hanzo, Reliability ratio based weighted bit-flipping decoding for low-density
parity-check codes. Electron. Lett. 40, 1356–1358 (2004)
96. D.J.C. MacKay, Good error correcting codes based on very sparse matrices. IEEE Trans.
Inform. Theor. 45, 399–431 (1999)
97. J. Chen, M. Fossorier, Density evolution for two improved BP-based decoding algorithms
of LDPC codes. IEEE Commun. Lett. 6, 208–210 (2002)
98. V. Savin, Self-corrected min-sum decoding of LDPC codes, in Proceedings of the IEEE
International Symposium on Information Theory, ISIT 2008 (Toronto, July 2008), pp. 146–
150
99. M.P.C. Fossorier, Quasi-cyclic low-density parity-check codes from circulant permutation
matrices. IEEE Trans. Inform. Theor. 50, 1788–1793 (2004)
Index

A
Algebraic coding theory, 158, 385

B
Block codes
  bound
    Singleton, 156
    Varshamov-Gilbert, 156, 215
  cyclic code, 239, 240
  dual code, 162
  equivalent code, 159, 172
  error control procedure
    ARQ, 154, 155, 163, 165, 188
    FEC, 154, 155, 163, 165
    hybrid, 154, 155
  extended code, 159
  generator matrix, 159–162, 168, 169, 172, 173, 178, 184, 194, 201, 202, 208
  hamming codes, 156, 157, 159, 173–175, 177, 178, 180, 181, 183–186
  hamming distance, 154, 155, 157, 158, 162, 164, 170, 172, 179, 194, 212, 213, 247
  hamming weight, 158, 159, 170, 179, 183, 203, 219
  linear block code, 153, 157, 158, 160, 168, 169, 170, 178, 194, 199, 201, 202, 215, 216
  MDS codes, 156, 197
  parity-check, 157, 162, 174
  parity-check matrix, 162, 164, 179, 184, 196
  perfect code, 156
  Reed-Muller codes, 164
  repetition codes, 165, 166, 168
  syndrome, 156, 163, 164, 171, 172, 174, 176, 177, 182, 186, 194, 196, 220
  weight spectrum, 158, 159, 160, 189, 195, 203, 207, 212, 215

C
Channel
  alphabet
    input, 2, 91, 95, 104, 109, 113, 115, 124, 126–128, 131, 132, 141
    output, 2, 91, 95, 98, 102, 109, 113, 133, 136, 140, 143, 147, 364, 391, 392, 420, 432, 487, 488, 491
  binary (BC), 92, 93, 107, 120, 123, 124, 215, 391
  binary erasure (BEC), 94, 136, 137, 155, 461
  binary symmetric (BSC), 93, 120, 121, 122, 123, 124, 128, 133, 151, 165, 183, 194, 197, 198, 366, 489, 504
  burst noise channel
    Gilbert model, 110
    Gilbert-Elliott model, 110
  capacity, 101, 102, 106, 111, 123, 124, 139, 142, 143, 145
  continuous, 91, 103, 104, 123, 124, 139, 142, 143, 145
  decision
    hard, 400, 404, 415, 453, 480, 485, 496
    soft, 2, 414, 417, 427, 456
  decision rule, 106
    MAP, 108, 114, 116, 117, 441
    ML, 108, 117, 391, 394
  discrete, 2, 91, 92, 95, 103, 109, 121, 136, 139–143, 145, 391–393, 399
  probabilities
    input (a posteriori, a priori), 95
    output (a posteriori, a priori), 95
    transitional, 11
  second Shannon theorem, 2, 111, 167
  with memory, 93, 109, 151
  without memory, 2, 95, 109, 392
Convolutional codes
…
  soft, 2, 3, 109, 142, 147, 199, 200, 201, 250, 334, 339, 364, 366, 367, 391, 414, 417, 427, 456
  irregular, 448, 449, 462
  nodes
    parity(-check), 451–453, 456–459, 463, 469, 471, 475–478, 481–483, 487, 488, 493–495
    variable, 451–453, 456–459, 463, 469, 471, 475–478, 481–483, 485, 487, 488, 493, 495
    children, 451
    parent, 451, 456, 481
  regular, 448, 449, 450, 451, 452, 453, 455, 457, 463, 464, 469, 478, 503
  sum-product algorithm, 456, 458, 459, 478, 487, 488, 495, 497, 503, 507
  Tanner bipartite graph
    girth cycle, 452, 453, 463

M
Matrix
  channel, 2, 91
  doubly stochastic, 24
  generator, 3, 159–163, 168, 169, 172, 173, 178, 179, 184, 194, 201–203, 206–209, 211, 214, 217, 218, 220, 221, 240, 241, 242, 249, 250, 251, 253–256, 258–261, 267, 302, 303, 386, 388–390, 405, 406, 408–410, 414, 419, 420, 460, 479, 497–500, 503
  parity-check, 3, 162, 164, 171, 174, 178–180, 184, 194, 196, 202, 205, 207, 208, 216–218, 241, 256, 258, 389, 406, 408–410, 447, 459, 460–464, 466–468, 474, 477, 479, 485, 487, 491, 497, 499–502, 504
  stochastic, 6, 21, 92
  transition, 6, 19–25, 73, 92–95, 106–110, 112, 133, 137, 145

R
Random process
  autocorrelation function, 13
  discrete, 12, 13
  discrete autocorrelation function, 31
  PN generator, 31
  power density spectrum, 14, 143, 147, 339, 432
  pseudorandom (PN), 13, 31
  stationary, 13
  wide sense stationary, 13

S
Source
  binary, 6, 10, 18–20, 34, 35, 73, 84, 89, 111, 112, 115, 118, 122, 128, 133, 136, 139, 141, 143, 145, 147
  continuous, 2, 13, 14, 15, 39, 41, 42
  discrete
    adjoined, 12, 143
    alphabet, 5, 7, 19, 45, 77
    encoding, 45, 77
    entropy, 15, 32, 34, 136
    extension, 2, 17
  with memory (Markov)
    ergodic, 6
    homogeneous, 22
    joined source, 55
    state, 6, 20, 21, 22, 25, 27, 29
      absorbing, 6
      diagram, 6–8, 20–27, 29–31, 34, 35, 110, 329, 330, 334, 335, 345–351, 353, 358, 359, 363, 377, 378, 410
    stationary, 34, 110
    stationary probabilities, 8, 20–22, 74, 75, 149
    trellis, 6, 9, 26, 29, 31, 34, 35, 407, 408
  without memory, 5, 41, 52

T
Trellis decoding of linear block codes
  algorithms
    generalized VA, 3, 385, 390, 392, 394, 419–422, 426, 428
    BCJR, 3, 394, 398–400, 419, 421, 426, 432, 438, 439, 441
    Max-Log-MAP, 3, 399, 439, 441
    Constant-Log-MAP, 3, 399
    SOVA, 3, 399–401, 426–428, 431, 432, 438, 442
  trellis oriented generator matrix, 3, 399–401, 426–428, 431, 432, 438, 442
Turbo codes
  encoder, 402, 405, 442, 443
  decoder, 401, 405, 447, 458, 487