Predrag Ivaniš · Dušan Drajić

Information Theory and Coding - Solved Problems

Predrag Ivaniš
Department of Telecommunications, School of Electrical Engineering
University of Belgrade
Belgrade, Serbia

Dušan Drajić
Department of Telecommunications, School of Electrical Engineering
University of Belgrade
Belgrade, Serbia

ISBN 978-3-319-49369-5
ISBN 978-3-319-49370-1 (eBook)
DOI 10.1007/978-3-319-49370-1
Library of Congress Control Number: 2016957145

© Springer International Publishing AG 2017


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature


The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

The aim of this book is to offer a comprehensive treatment of information theory and error control coding, using a slightly different approach than in the existing literature. There are many excellent books that treat error control coding, especially modern coding theory techniques (turbo and LDPC codes). It is clear that understanding the iterative decoding algorithms requires background knowledge of classical coding and information theory. However, the authors of this book did not find another book that provides simple and illustrative explanations of the decoding algorithms, together with clearly described relations to the theoretical limits defined by information theory. The available books on the market can be divided into two categories. The first consists of books specialized either in algebraic coding or in modern coding theory techniques, offering a mathematically rigorous treatment of the subject, but without many examples. The second provides a wider treatment of the field, where every particular subject is treated separately, usually with just a few basic numerical examples.

In our approach we assumed that complex coding and decoding techniques cannot be explained without understanding the basic principles. As an example, for the design of LDPC encoders, the basic facts about linear block codes have to be understood. Furthermore, a working knowledge of information theory is necessary for code design: the efficiency of statistical coding is determined by the First Shannon theorem, whereas the performance limits of error control codes are determined by the Second Shannon theorem. Therefore, we organized the book chapters according to the Shannon system model from the standpoint of information theory, where one block affects the others, so they cannot be treated separately.
On the other hand, we decided to explain the basic principles of information theory and coding through complex numerical examples. Therefore, a relatively brief theoretical introduction is given at the beginning of every chapter, including a few additional examples and explanations, but without any proofs. Also, a short overview of some parts of abstract algebra is given at the end of the corresponding chapters. Some definitions are given inside the examples, when they appear for the first time. The characteristic examples, with many illustrations and tables, are chosen to provide a detailed insight into the nature of the problem. In particular, some limiting cases are given to illustrate the connections with the theoretical bounds. The numerical values are carefully chosen to provide in-depth knowledge of the described algorithms. Although the examples in the different chapters can be considered separately, they are mutually connected, and the conclusions of one problem motivate the next. Therefore, the sequence of problems can be considered as an "illustrated story about an information processing system, step by step". The book contains a number of schematic diagrams to illustrate the main concepts and many figures with numerical results. It should be noted that this book mainly presents problems, not simple exercises. Some simple examples are included in the theoretical introductions at the beginning of the chapters.

The book is primarily intended for graduate students, although parts of it can be used in undergraduate studies. We also hope that this book will be of use to practitioners in the field.

Belgrade, Serbia
Predrag Ivaniš
Dušan Drajić
Contents

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Information Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Brief Theoretical Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3 Data Compression (Source Encoding) . . . . . . . . . . . . . . . . . . . . . . . . . 45
Brief Theoretical Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4 Information Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Brief Theoretical Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
5 Block Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
Brief Theoretical Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
Brief Introduction to Algebra I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
6 Cyclic Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
Brief Theoretical Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
Brief Introduction to Algebra II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
7 Convolutional Codes and Viterbi Algorithm . . . . . . . . . . . . . . . . . . . . 327
Brief Theoretical Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
8 Trellis Decoding of Linear Block Codes, Turbo Codes. . . . . . . . . . . . 385
Brief Theoretical Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405

9 Low Density Parity Check Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447


Brief Theoretical Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515
Chapter 1
Introduction

We are living in the era of digital communications (transmission and storage). An important feature of such systems is that transmission errors can be detected and corrected (error control coding), while in analogue transmission practically only the signal-to-noise ratio was considered. Classical communication theory started with Fourier analysis, to be later enriched by the probabilistic approach. However, it considered only the signals, the carriers of information. But Shannon, publishing (1948) his paper "A Mathematical Theory of Communication" (some scientists think that the article should be "The" and not "A"), founded information theory. The Shannon approach was totally different from the classical one. One could say it was at a higher level. He did not consider the signals, but the information. The information is represented (encoded) by signals, which are the carriers of information. Therefore, it is possible that the transmitted signals do not carry any information at all (of course, these signals may be needed for the proper functioning of the communication system itself). A simplified block-scheme of a communication system from the information theory point of view is shown in Fig. 1.1.
The encoder can be divided into a source encoder (for the corresponding data compression) and a channel encoder (for the corresponding error control coding). An encryptor can be added as well. A communication channel is, by its nature, a continuous one. Using such an approach, the noise and other interferences (other signals, fading etc.) can be directly taken into account. However, it can be simplified by introducing the notion of a discrete channel, incorporating the signal generation (modulation), the continuous channel and the signal detection (demodulation). In contemporary communication systems, the encoding is very often connected with the choice of the corresponding (modulation) signals to obtain better system performance. Generally speaking, the fields of interest in information theory are sources, source encoding, channels and channel encoding (and the corresponding decoding).
The next three chapters deal with the sources, source encoding and the channel. A few elementary examples are given at their beginnings, followed later by more difficult problems. However, the main body of this book deals with error control coding (the last five chapters).

Fig. 1.1 A simplified block-scheme of a communication system from the information theory point of view (source, encoder, signal generator, channel, signal detector, decoder, destination; the signal generator, channel and signal detector together form the discrete channel)

The second chapter deals with the information sources. Mainly discrete sources
are considered. The further division into memoryless sources and sources with
memory is illustrated. The corresponding state diagram and trellis construction are
explained, as well as the notions of adjoint source and source extension. Further, the
notions of quantity of information and entropy are introduced. At the end an
example of continuous source (Gaussian distribution) is considered.
In the next chapter source encoding (data compression) is considered for discrete
sources. The important notions concerning the source codes are introduced—
nonsingularity, unique decodability and the need for an instantaneous code. Notions
of a code tree and of average code word length are introduced as well. A short discussion from the First Shannon theorem point of view is included. Further, Shannon-Fano and Huffman encoding algorithms are illustrated with corresponding problems. Adaptive Huffman algorithms (FGK, Vitter) are discussed and illustrated. At the end, the LZ algorithm is considered.
Information channels are considered in the fourth chapter. Like the sources, the channels can be discrete or continuous, but there exists also a mixed type (e.g. discrete input, continuous output). Discrete channels can be with or without memory. They are described using the corresponding channel matrix. A few discrete channels without memory are analyzed in detail (BSC, BEC etc.). Transmitted information and channel capacity (for discrete and continuous channels) are defined and the Second Shannon theorem is commented on. The decision rules are further analyzed (hard decoding and soft decoding), and some criteria (MAP, ML) are considered. At the end, the Gilbert-Elliott model for channels with memory is illustrated.
In the fifth chapter block codes (mainly linear block codes) are illustrated by using some interesting problems. At the beginning, simple repetition codes are used to explain FEC, ARQ and hybrid error control procedures. The Hamming distance is then introduced, as well as the Hamming weight and the distance spectrum. Corresponding bounds are discussed. The notion of a systematic code is introduced. Hamming codes are analyzed in many problems. The notions of generator matrix, parity-check matrix and syndrome are explained. Dual codes and the MacWilliams identities are discussed. The notion of interleaving is illustrated as well. Arithmetic and integer codes are illustrated near the end of the chapter, and a brief overview of the corresponding notions from abstract algebra (group, field, vector space) is added at its end.
The sixth chapter deals with cyclic codes, a subclass of linear block codes obtained by imposing an additional strong structural requirement. In fact, a cyclic code is an ideal in the ring of polynomials. The notions of generator polynomial and parity-check polynomial are introduced. The usage of CRC is illustrated in a few problems. BCH codes are illustrated as well. RS codes are analyzed in detail, especially the decoding algorithms (Peterson, Berlekamp-Massey, Gorenstein-Zierler, Forney). At the end of the chapter a brief overview of the corresponding notions from abstract algebra is added (Galois field, primitive and minimal polynomial, ideal).
Convolutional codes and decoding algorithms are analyzed in the next chapter.
The corresponding notions are explained (constraint length, transfer function
matrix, state diagram and trellis, free distance). Majority logic decoding, sequential
decoding and especially the Viterbi algorithm are illustrated, as well as the possibilities of hard (Hamming metric) and soft (Euclidean metric) decision. Punctured codes are explained as well. TCM is considered in detail.
Trellis decoding of linear block codes and turbo decoding are explained in the eighth chapter. It is shown how, from a suitably transformed (trellis oriented) generator matrix, the corresponding parity-check matrix suitable for trellis construction can be obtained. The corresponding decoding algorithms are analyzed (generalized Viterbi algorithm, BCJR, SOVA, log-MAP, max-log-MAP). At the end, turbo codes are briefly described.
The last chapter deals with LDPC codes. They provide iterative decoding with a linear complexity. The Tanner interpretation of LDPC codes using bipartite graphs is explained. The various decoding algorithms are analyzed, with hard decision (majority logic decoding, bit-flipping) and with soft decision (belief propagation, sum-product, self-correcting min-sum). At the end, the algorithms are compared using Monte Carlo simulation over the BSC and the channel with AWGN.
For the reader's convenience, a relatively brief theoretical introduction is given at the beginning of every chapter, including a few additional examples and explanations, but without any proofs. Also, a short overview of some parts of abstract algebra is given at the end of the corresponding chapters. This material is mainly based on the textbook An Introduction into Information Theory and Coding [1] (in Serbian) by the same authors.
Chapter 2
Information Sources

Brief Theoretical Overview

Generally, information sources are discrete or continuous. A discrete source has a finite or countable number of messages, while the continuous source messages are from an uncountable set. This book deals mainly with discrete sources, especially with those having a finite number of symbols.

The further subdivision of sources is according to the memory they may have. The sources can be without memory (zero-memory, memoryless) (Problems 2.1, 2.2, 2.3, 2.5, 2.6 and 2.8), emitting the symbols (messages) $s_i$ according only to the corresponding probabilities $P(s_i)$. Therefore, the zero-memory source is completely described by the list of symbols S (source alphabet)

$$S = \{s_1, s_2, \ldots, s_q\}$$

and by the corresponding symbol probabilities

$$P(s_i) \quad (i = 1, 2, \ldots, q).$$

It is supposed as well that the symbols (i.e. their emitting) form a complete set of mutually exclusive events, yielding

$$\sum_{i=1}^{q} P(s_i) = 1.$$

For the sources with memory, where the emitting of the next symbol depends on
m previously emitted symbols (m is the memory order) (Problems 2.4 and 2.7), the
corresponding conditional probabilities are needed.


$$P(s_j / s_{i_1}, s_{i_2}, \ldots, s_{i_k}, \ldots, s_{i_m}) \quad (j = 1, 2, \ldots, q)\ (i_k = 1, 2, \ldots, q)\ (k = 1, 2, \ldots, m),$$

where $s_{i_1}$ is the oldest and $s_{i_m}$ the youngest symbol preceding the symbol $s_j$. The other name for such a source is Markov source (the emitted sequence is a Markov chain). For a source with memory order m, the m previously emitted symbols are called the source state. Generally, there are $q^m$ states. For a binary source (two symbols only) the number of states is $2^m$. For every state the following must be satisfied

$$\sum_{j=1}^{q} P(s_j / s_{i_1}, s_{i_2}, \ldots, s_{i_k}, \ldots, s_{i_m}) = 1 \quad (i_k = 1, 2, \ldots, q)\ (k = 1, 2, \ldots, m).$$

From every state the source can emit any of the q possible symbols, and there are in total $q^m \cdot q = q^{m+1}$ conditional probabilities (some can be equal to zero!).
The state diagram (Problems 2.4, 2.6 and 2.7) can be drawn, comprising the states and the corresponding conditional probabilities. Instead of a state diagram, a trellis (Problems 2.5, 2.6 and 2.7) can be constructed, being a kind of dynamic state diagram.

In further considerations only ergodic sources will be taken into account. For example, a source having at least one absorbing state (Problem 2.4), i.e. a state from which the source cannot pass into the other states, is not ergodic. Loosely speaking, the source is ergodic if, observed for a long time, it will pass through every one of its possible states. If the elements of the matrix containing transition probabilities do not change in time, the source is homogeneous. The source is stationary if the steady state probabilities can be found by solving the corresponding matrix equation (Problems 2.4 and 2.5)

$$p = p \cdot P,$$

where $p$ is the steady state probabilities vector and $P$ is the matrix of conditional probabilities. Of course, the additional condition is that the sum of the steady state probabilities equals 1. In this book the convention will be used that the element $p_{ij}$ in the ith row and jth column of the matrix of conditional probabilities denotes the probability of the transition from the state $S_i$ into the state $S_j$. Having in view that the elements of the transition matrix are probabilities (non-negative), as well as that from any state the source must pass to some other state or stay in the same state, the sum of the elements in every row equals 1. Such a matrix is called stochastic.

$$\sum_{j=1}^{q} p_{ij} = 1 \quad (i = 1, 2, \ldots, q).$$

Some authors use for the transition matrix the transposed version of the above matrix, and in this case the sum of the elements in every column equals 1.
If the sum of elements in every column also equals 1, the matrix is doubly stochastic and all the states are equally probable. For any Markov source the probabilities of the emitted symbols can be found as well. The adjoint source (Problems 2.4 and 2.7) to the source S, denoted by $\bar{S}$, is a zero-memory information source with a source alphabet identical to that of S, having the same symbol probabilities. The nth extension (Problems 2.3 and 2.7) of any source S is the source denoted by $S^n$ whose symbols are all possible different sequences (combinations) of the symbols of source S of length n. If the original source has q symbols, the source $S^n$ has $q^n$ different symbols.

Consider a binary (q = 2) Markov source having memory order 2 (m = 2). There are $q^m = 2^2 = 4$ states and $q^{m+1} = 2^3 = 8$ transition probabilities. Let these probabilities be

$$P(0/00) = P(1/11) = 0.7,$$
$$P(1/00) = P(0/11) = 0.3,$$
$$P(0/01) = P(0/10) = P(1/01) = P(1/10) = 0.5.$$

Therefore, the corresponding matrix of transition probabilities is

$$P = \begin{array}{c|cccc} & 00 & 01 & 10 & 11 \\ \hline 00 & 0.7 & 0.3 & 0.0 & 0.0 \\ 01 & 0.0 & 0.0 & 0.5 & 0.5 \\ 10 & 0.5 & 0.5 & 0.0 & 0.0 \\ 11 & 0.0 & 0.0 & 0.3 & 0.7 \end{array}$$

The states are 00, 01, 10 and 11. It should be noted that the source cannot go directly from any state into any other state. E.g. if the source is in the state 00, it can pass only to the same state (i.e. stay in it) if in the emitted symbol sequence after 00 the symbol 0 appears (000 → 000), or into the state 01 if after 00 the symbol 1 appears (001 → 001). It can be easily visualized as the state of the corresponding shift register, where new (emitted) symbols enter from the right side. Equivalently, a "window" of the corresponding width (m = 2) sliding over the sequence of emitted symbols can be conceived, showing the current state. The corresponding state diagram is illustrated in Fig. 2.1.
Steady state probabilities can be calculated as follows

$$P(00) = 0.7\,P(00) + 0.5\,P(10),$$
$$P(01) = 0.3\,P(00) + 0.5\,P(10),$$
$$P(11) = 0.7\,P(11) + 0.5\,P(01).$$

Fig. 2.1 State diagram of the considered source

The first equation is a consequence of the fact that entering the state 00 is the result of two mutually exclusive events: either the source was in state 00 and 0 was emitted, or the source was in state 10 and 0 was emitted. The second and the third equations are obtained in a similar way. However, if the corresponding (fourth) equation, based on the same reasoning, were added for the state 10, a singular system would be obtained, because this equation is dependent on the previous three. Instead, the following equation will be used

$$P(00) + P(01) + P(10) + P(11) = 1,$$

because the source must be in one of the possible states. The corresponding solution is

$$P(00) = P(11) = 5/16,$$
$$P(01) = P(10) = 3/16.$$

The corresponding steady state probabilities vector is

$$p = [\,5/16\ \ 3/16\ \ 3/16\ \ 5/16\,].$$
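As a quick numerical cross-check of this hand calculation, the following sketch (Python with NumPy; an assumption of ours, not part of the book) solves the same balance equations together with the normalization condition and reproduces the steady state vector and the symbol probability of 0 derived below.

```python
import numpy as np

# Transition matrix of the second-order binary Markov source,
# states ordered as 00, 01, 10, 11 (rows: current state, columns: next state).
P = np.array([[0.7, 0.3, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.3, 0.7]])

# Solve p = p.P with sum(p) = 1: keep three balance equations, replace one by normalization.
A = np.vstack([(P.T - np.eye(4))[:-1], np.ones(4)])
b = np.array([0.0, 0.0, 0.0, 1.0])
p = np.linalg.solve(A, b)

print(p)                          # [0.3125 0.1875 0.1875 0.3125] = [5/16 3/16 3/16 5/16]
print(p[0] + 0.5 * (p[1] + p[2])) # 0.5, probability of the symbol 0 in the emitted sequence
```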

Taking into account the fact that the state diagram is symmetrical, one should expect that the symmetrical states are equally probable. The stationary probabilities of the symbols in the emitted sequence can now be calculated. The probability of finding 0 in the emitted sequence is equal to the sum (the states are mutually exclusive) of the probability that the source is in the state 00 and one half of the probability that the source is in the state 01 or 10

$$P(0) = P(00) + 0.5\,(P(01) + P(10)) = 5/16 + 0.5\,(3/16 + 3/16) = 5/16 + 3/16 = 0.5.$$

Similarly, one can calculate P(1) = 0.5. However, this value can be easily obtained from

$$P(0) + P(1) = 1.$$

Further, because the state diagram is symmetrical (i.e. the values of the corresponding steady state probabilities are symmetrical), the obtained result should be expected as well: in the emitted sequence 0 and 1 are equally probable. The corresponding trellis is shown in Fig. 2.2. States are denoted at the trellis beginning only; further on, only points are used. The corresponding emitted bits are marked using arrows.

Fig. 2.2 Trellis for the source from Fig. 2.1
The state transition probabilities can be marked also. It is easy to note that trellis
has a periodical structure. After the first step, the trellis repeats. If the initial source
state is defined in advance (e.g. 00) the trellis starts from this state.
The information is represented (encoded) by signals, which are the carriers of
information. From the information theory point of view, the information depends on
the probability of the message emitted by the source

$$Q(s_i) = f[P(s_i)].$$

If the probability of some message is smaller, the user uncertainty about this
event is greater. By receiving the message (symbol) si, the user is no more uncertain
about this event. In other words, if the uncertainty is greater, it can be concluded
that by receiving the corresponding message, the greater quantity of information is
received. Therefore, the quantity of information of some message is equal, or at
least proportional, to the corresponding uncertainty at the receiving part. It can be
easily concluded that the quantity of information of some message is inversely
proportional to its (a priori) probability. A step further, if the user is sure that some
message will be emitted (P(si) = 1), then it does not carry any information, because
there was not any uncertainty. That means as well that an information source must
have at least two messages at its disposition, to be capable of emitting information. It should be noted that the emitting and not emitting of one message correspond to two different messages.

On the basis of such an approach, from the definition of the quantity of information carried by some message one expects that its numerical value is in inverse proportion to its probability, and equals zero if it is a sure event. The next reasonable condition is that the quantity of information is a continuous function of probability, because a small change in probability cannot substantially change the quantity of information. Of course, the quantity of information cannot have a negative value. There are many functions satisfying these conditions. However, from all these functions, as a measure of the quantity of information the logarithm of the probability reciprocal value was chosen

$$Q(s_i) = \log\left(\frac{1}{P(s_i)}\right) = -\log P(s_i).$$

It is easy to verify that the above conditions are satisfied. Of course, some other function satisfying these conditions could have been chosen, e.g.

$$Q(s_i) = \frac{1}{P(s_i)} - 1.$$

However, the logarithmic function is the unique one making possible the fulfillment of one more necessary condition for the quantity of information: the additivity of information quantities of independent messages. Let the probabilities of messages $s_i$ and $s_j$ be $P(s_i)$ and $P(s_j)$. Then the joint message obtained by successive emitting of these independent messages should have the information quantity equal to the sum of their information quantities. Using the logarithmic function, one obtains

$$Q(s_i, s_j) = \log\left(\frac{1}{P(s_i, s_j)}\right) = \log\left(\frac{1}{P(s_i)P(s_j)}\right) = \log\left(\frac{1}{P(s_i)}\right) + \log\left(\frac{1}{P(s_j)}\right) = Q(s_i) + Q(s_j).$$

It can be proved that the logarithmic function is the unique one satisfying all
mentioned conditions. Logarithm to any base can be used, however, it is suitable to
use a logarithm to the base 2. It will be denoted as ld(.).
If a binary source is considered, its output signals (symbols) are known as bits
(usually denoted as 0 and 1). If the bits are equally probable, i.e. P(s1) =
P(s2) = 0.5, one obtains

$$Q(s_1) = Q(s_2) = \mathrm{ld}(2) = 1\ [\mathrm{Sh}].$$

This definition gives a very suitable etalon for a unit to measure the quantity of
information, especially having in view the digital transmission. Therefore, here
every bit can carry a unit of information. To avoid the confusion between the bits
(signals) and the units of information in this book the information unit using a
logarithm to the base 2 will be called shannon (denoted by Sh). Therefore, one bit
can carry up to one shannon.
For a source having q symbols, the corresponding quantities of information for the symbols are

$$Q(s_i) = \mathrm{ld}\left(\frac{1}{P(s_i)}\right)\ [\mathrm{Sh}] \quad (i = 1, 2, \ldots, q).$$

The average quantity of information emitted (per one symbol) is

$$H(S) = E[Q(s_i)] = \sum_{i=1}^{q} P(s_i)\,Q(s_i) = \sum_{i=1}^{q} P(s_i)\,\mathrm{ld}\left(\frac{1}{P(s_i)}\right) = -\sum_{i=1}^{q} P(s_i)\,\mathrm{ld}\,P(s_i)\ \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right].$$

The average information rate of the source is

$$\Phi(S) = v\,H(S)\ \left[\frac{\mathrm{Sh}}{\mathrm{symb}} \cdot \frac{\mathrm{symb}}{\mathrm{s}} = \frac{\mathrm{Sh}}{\mathrm{s}}\right],$$

where $v$ is the source symbol rate [symb/s].

The average amount of information per source symbol is called the entropy (Problems 2.1 and 2.2), being for the zero-memory source

$$H(S) = \sum_{i=1}^{q} P(s_i)\,\mathrm{ld}\,\frac{1}{P(s_i)} = -\sum_{i=1}^{q} P(s_i)\,\mathrm{ld}\,P(s_i)\ [\mathrm{Sh/symb}].$$

This expression is analogous to the Boltzmann formula for the entropy of the ideal gas, where P(si) is the probability of the state in the phase space. It is a measure of the disorder and, according to the Second law of thermodynamics, cannot become smaller in a closed system. In information theory it is an average measure (per symbol) of the user uncertainty about the emitted source symbols. Shannon (1948) introduced the name entropy for H(S).
The entropy of the nth extension of a zero-memory source is $H(S^n) = nH(S)$ (Problem 2.3). By taking at least the mth extension of an mth order Markov source, a first-order Markov source is obtained. There will always be dependence at the borders of the extension "symbols". The property $H(S^n) = nH(S)$ also holds in this case. The entropy of the mth order Markov source with q symbols {s1, s2, …, sq} having the transitional probabilities
$$P(s_j / s_{i_1}, s_{i_2}, \ldots, s_{i_k}, \ldots, s_{i_m}) \quad (j = 1, 2, \ldots, q)\ (i_k = 1, 2, \ldots, q)\ (k = 1, 2, \ldots, m)$$

is

$$H(S) = \sum_{j=1}^{q}\sum_{i_1=1}^{q}\sum_{i_2=1}^{q}\cdots\sum_{i_m=1}^{q} P(s_{i_1}, s_{i_2}, \ldots, s_{i_m}, s_j)\,\mathrm{ld}\left(\frac{1}{P(s_j / s_{i_1}, s_{i_2}, \ldots, s_{i_m})}\right) = \sum_{S^{m+1}} P(s_{i_1}, s_{i_2}, \ldots, s_{i_m}, s_j)\,\mathrm{ld}\left(\frac{1}{P(s_j / s_{i_1}, s_{i_2}, \ldots, s_{i_m})}\right).$$

In the last row the summation over all possible symbols of the (m + 1)th extension ($S^{m+1}$) of the original source is symbolically denoted. The number of elements of the sum is $q^{m+1}$. Further, the entropy of the corresponding source adjoint to the extended Markov source is not equal to the entropy of the extended Markov source (Problems 2.4, 2.5 and 2.7).

The entropy is an important characteristic of the information source. It is interesting how the entropy changes when the symbol probabilities change. If the probability of one symbol equals 1, then the probabilities of all the others are equal to zero, and the entropy is equal to zero as well, because

$$1 \cdot \mathrm{ld}(1) = 0, \qquad \lim_{x \to 0} x \cdot \mathrm{ld}\,\frac{1}{x} = 0.$$

The expression for the entropy is a sum of non-negative quantities, so the entropy cannot have negative values. The next question is whether the entropy is limited from the upper side. By using some inequalities, it can be easily proved that

$$H(S) \le \mathrm{ld}\,q,$$

yielding finally

$$0 \le H(S) \le \mathrm{ld}\,q.$$

Furthermore, it can be proved as well that the entropy will have the maximum
value (ld q) if P(si) = 1/q for every i, i.e. when all the symbols are equally probable.
The “physical interpretation” is here very clear. If the entropy is a measure of
uncertainty of the user about the source emitting, then it is obvious that this
uncertainty will have the maximal value when all the symbols are equally probable.
If these probabilities are different, the user will expect the more probable symbols, they will be emitted more frequently, and on the average a smaller quantity of information will be obtained per symbol.
The sequence of emitted symbols can be conceived as well as some kind of a
discrete random process, where all possible sequences generated by a source form
an ensemble. Sometimes, it is also said that the time-series are considered. It will be
supposed that the ensemble is wide sense stationary, i.e. that the average value and
autocorrelation function do not depend on the origin. Further, the process can be
ergodic for the mean value and autocorrelation function if they are the same if
calculated by using any generated sequence, or averaged over the whole ensemble
for any fixed time moment. If the symbol duration is T, instead of x(t) the symbol x(nT) can be used, or x(n) or simply $x_n$. The average value is

$$E\{x_n\},$$

where $E\{\cdot\}$ denotes mathematical expectation. The discrete autocorrelation function (Problem 2.6) is

$$R_x(l) = E\{x_n x_{n+l}\}.$$

If the symbols are complex numbers, the second factor should be conjugated. On the basis of the Wiener-Khinchin theorem for discrete signals (z-transformation) the corresponding discrete average power spectrum density (APSD) (Problem 2.6) can be found.

A special kind of discrete random process is the pseudorandom (PN, Pseudo Noise) process (sequence). In most applications binary sequences are used. They are generated by a pseudorandom binary sequence (PN) generator, i.e. by a shift register with linear feedback (Problem 2.6). A PN sequence is periodical and its autocorrelation function is periodical as well. If the number of register cells is m, the corresponding maximal sequence length (period) is $L = 2^m - 1$. Of course, this will be the case if the feedback is suitably chosen.
For a continuous source (Problem 2.8), emitting the "symbols" s from an uncountable set, having the probability density w(s), the entropy can be defined as follows

$$H(S) = \int_{-\infty}^{\infty} w(s)\,\mathrm{ld}\left(\frac{1}{w(s)}\right)ds.$$

Of course, the following must be satisfied

$$\int_{-\infty}^{\infty} w(s)\,ds = 1.$$

However, one should be careful here, because in this case the entropy does not
have all the properties of the entropy as for discrete sources. For example, it can
depend on the origin, it can be infinite etc. Some additional constraints should be
added. If the probability density is limited to a finite interval (a, b), the entropy will
be maximum for a uniform distribution, i.e.
$$w(s) = \begin{cases} \dfrac{1}{b-a}, & a \le s \le b \\ 0, & \text{otherwise,} \end{cases}$$

and the corresponding entropy value is

$$H(S)\big|_{\max} = \mathrm{ld}(b - a).$$

This result is similar to the result obtained for a discrete source having a finite
number of symbols.
If the stochastic variable can take any value from the set of real numbers, but its variance is finite, i.e.

$$\sigma^2 = \int_{-\infty}^{\infty} s^2\,w(s)\,ds < \infty,$$

the maximum entropy is achieved by the Gaussian distribution

$$w(s) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{s^2}{2\sigma^2}\right).$$

It is

$$H(S)\big|_{\max} = \frac{1}{2}\,\mathrm{ld}\left(2\pi e \sigma^2\right).$$

According to the Central limit theorem, the sum of many independent random variables having limited variances converges, when the number of variables increases, to the Gaussian distribution. In other words, the Gaussian distribution in nature is a result of the action of a large number of statistically independent causes. Therefore, it is to be expected that the measure of uncertainty (entropy) in such a case has a maximal value.
Some authors consider that the variables whose probability density is limited to a
finite interval (a, b) correspond to the “artificial” signals (speech, TV etc.), while the
variables with unlimited interval of values correspond to the “natural” signals (noise).
It is also interesting that it is possible to generate Gaussian random process
having a predefined autocorrelation function (Problem 2.7).
One more problem is how to define the information rate for continuous sources.
In this case the finite frequency band (limited to fc) of power density spectrum
should be supposed. The corresponding signal should be sampled with the rate 2fc
samples per second (these values are statistically independent!). The corresponding
information rate is

sample Sh Sh
UðSÞ ¼ 2fg HðSÞ  ¼ ;
s sample s

i.e., the same units are used as for discrete sources (Sh/s). In such a way it is
possible to compare discrete and continuous sources.
Problems

Problem 2.1 Zero-memory source is defined by the table

si s1 s2 s3 s4 s5 s6 s7 s8
P(si) 0.3 0.21 0.17 0.13 0.09 0.07 0.01 0.02

(a) Find the entropy and the source information rate if the symbol rate is v_s = 100 [symb/s].
(b) How will the entropy and the information rate change in the case of equiprobable symbols?

Solution
(a) The source entropy is the average quantity of information per symbol

$$H(S) = \sum_{i=1}^{q} P(s_i)\,\mathrm{ld}\,\frac{1}{P(s_i)} = -\sum_{i=1}^{q} P(s_i)\,\mathrm{ld}\,P(s_i).$$

For this source (q = 8) the entropy is

$$H(S) = -0.3\,\mathrm{ld}(0.3) - 0.21\,\mathrm{ld}(0.21) - 0.17\,\mathrm{ld}(0.17) - 0.13\,\mathrm{ld}(0.13) - 0.09\,\mathrm{ld}(0.09) - 0.07\,\mathrm{ld}(0.07) - 0.01\,\mathrm{ld}(0.01) - 0.02\,\mathrm{ld}(0.02) = 2.5717\ [\mathrm{Sh/symb}],$$

where, for the easier calculation of logarithms to the base 2 (denoted by ld(x)), the following relation can be used

$$\mathrm{ld}(x) = \frac{\log_a(x)}{\log_a(2)}.$$

The information rate, the average quantity of information emitted by the source per second, is

$$\Phi(S) = H(S)\,v_s = 2.5717\ \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right] \cdot 100\ \left[\frac{\mathrm{symb}}{\mathrm{s}}\right] = 257.17\ \left[\frac{\mathrm{Sh}}{\mathrm{s}}\right],$$

where $v_s$ is the symbol rate.
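A direct numerical check of these two values (a sketch in Python, standard library only) is given below.

```python
from math import log2

P = [0.3, 0.21, 0.17, 0.13, 0.09, 0.07, 0.01, 0.02]
vs = 100  # symbol rate [symb/s]

H = sum(p * log2(1.0 / p) for p in P)   # entropy [Sh/symb]
print(round(H, 4))        # 2.5717
print(round(H * vs, 2))   # 257.17 [Sh/s]
```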


(b) In the case of equiprobable symbols, the expressions become

$$H(S) = \sum_{i=1}^{q} \frac{1}{q}\,\mathrm{ld}\,\frac{1}{1/q} = q \cdot \frac{1}{q} \cdot \mathrm{ld}(q) = \mathrm{ld}(q), \qquad \Phi(S) = v_s \cdot \mathrm{ld}\,q,$$

and the corresponding numerical values are

$$H(S) = 3\ \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right], \qquad \Phi(S) = 300\ \left[\frac{\mathrm{Sh}}{\mathrm{s}}\right].$$

As expected, the zero-memory source has the maximum entropy when the
symbols are equiprobable, because the uncertainty about the next emitted
symbol is maximal.

Problem 2.2 Zero-memory source is defined by the table

si s1 s2 s3 s4
P(si) 0.5 0.25 0.125 0.125

(a) Find the entropy and the information rate if the symbol rate is vs = 400
[symb/s].
(b) Find the entropy if the source emits q symbols according to the following expression

$$P(s_i) = \begin{cases} 2^{-i}, & i = 1, 2, \ldots, q-1 \\ 2^{-i+1}, & i = q \end{cases}$$
and draw the values of entropy depending on the number of symbols for
q = 2, 3, …, 20.

Solution
(a) Direct use of the corresponding formula for entropy yields

$$H(S) = \sum_{i=1}^{4} P(s_i)\,\mathrm{ld}\,\frac{1}{P(s_i)} = \frac{1}{2}\,\mathrm{ld}\,2 + \frac{1}{4}\,\mathrm{ld}\,4 + \frac{1}{8}\,\mathrm{ld}\,8 + \frac{1}{8}\,\mathrm{ld}\,8 = 1.75\ \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right].$$

In this case, the logarithms can be easily calculated, because the probabilities are inverses of integer powers of 2. The information rate is

$$\Phi(S) = 1.75\ \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right] \cdot 400\ \left[\frac{\mathrm{symb}}{\mathrm{s}}\right] = 700\ \left[\frac{\mathrm{Sh}}{\mathrm{s}}\right].$$

(b) When the probability of every symbol is half that of the previous one, and when the symbols are ordered according to the probabilities (while the last two have equal probabilities), the entropy of a source which has q symbols can be found in the form

$$H(S) = \sum_{i=1}^{q} P(s_i)\,\mathrm{ld}\,\frac{1}{P(s_i)} = \sum_{i=1}^{q-1} \frac{i}{2^i} + \frac{q-1}{2^{q-1}};$$

after the substitution k = i − 1 one obtains

$$H(S) = \frac{q-1}{2^{q-1}} + \sum_{k=0}^{q-2} \frac{k+1}{2^{k+1}} = \frac{q-1}{2^{q-1}} + \frac{1}{2}\sum_{k=0}^{q-2} k\left(\frac{1}{2}\right)^k + \frac{1}{2}\sum_{k=0}^{q-2}\left(\frac{1}{2}\right)^k.$$

The obtained sums can be calculated as geometrical and arithmetical-geometrical progressions

$$\sum_{k=1}^{n} x^{k-1} = \frac{x^n - 1}{x - 1}, \qquad \sum_{k=0}^{n-1} k x^k = -\frac{(n-1)x^n}{1-x} + \frac{x(1-x^{n-1})}{(1-x)^2},$$

yielding

$$H(S) = (q-1)2^{1-q} - (q-2)2^{1-q} + (1 - 2^{2-q}) + (1 - 2^{1-q}) = 2 - 2^{2-q}.$$

Therefore, the entropy here is always limited, $1 \le H(S) \le 2$; it grows monotonically with the number of symbols, approaching asymptotically $H_{\max}(S) = 2$. In Fig. 2.3 the corresponding entropy is shown (for $2 \le q \le 20$), as well as the comparison with the entropy of a source having equiprobable symbols (Problem 2.1b). This figure also verifies the known fact that the entropy has the maximal value when the symbols are equiprobable. The difference is very notable when the number of symbols is greater.
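The closed-form expression H(S) = 2 − 2^(2−q) can be checked against a direct evaluation of the entropy sum; the short sketch below (Python, standard library; our own check) does this for q = 2, …, 20.

```python
from math import log2

def H_direct(q):
    # probabilities 2**-i for i = 1..q-1, plus 2**-(q-1) for the last symbol
    probs = [2.0 ** -i for i in range(1, q)] + [2.0 ** -(q - 1)]
    return sum(p * log2(1.0 / p) for p in probs)

for q in range(2, 21):
    assert abs(H_direct(q) - (2 - 2.0 ** (2 - q))) < 1e-12
print("H(S) = 2 - 2**(2-q) verified for q = 2, ..., 20")
```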
Problem 2.3 Consider a binary zero-memory source.
(a) Find the entropy of the source if the probability of one symbol equals p = 0.1, as well as the entropies of the second and the third source extension.
(b) Find the expression for the entropy of the nth source extension if the corresponding probabilities are p = 0.01, p = 0.1 and p = 0.5. For n ≤ 5 show the results in a diagram.
Fig. 2.3 Entropy of the sources from Problems 2.1(b) and 2.2(b) for q = 2, 3, …, 20 (entropy H(S) versus the number of symbols q, for the source from Problem 2.2(b) and for a source with equiprobable symbols)

Solution
A binary source emits only two symbols, with the probabilities P(s1) = p and P(s2) = 1 − p. As its extension, the sequences of original binary symbols are considered as new (compound) symbols.
(a) From P(s1) = p = 0.1 it follows that P(s2) = 1 − p = 0.9, yielding the entropy

$$H(S) = -0.1\,\mathrm{ld}(0.1) - 0.9\,\mathrm{ld}(0.9) = 0.469\ [\mathrm{Sh/symb}].$$

The second extension has $2^2 = 4$ symbols: $r_1 = s_1 s_1$, $r_2 = s_1 s_2$, $r_3 = s_2 s_1$ and $r_4 = s_2 s_2$. Having in view that the source is zero-memory (memoryless), the corresponding probabilities are

$$P(r_j) = P(s_{i_1}, s_{i_2}) = P(s_{i_1})\,P(s_{i_2}), \quad i_1, i_2 \in \{1, 2\},\ j \in \{1, 2, 3, 4\},$$

and can be easily calculated: P(r1) = 0.81, P(r2) = 0.09, P(r3) = 0.09 and P(r4) = 0.01, yielding the corresponding entropy

$$H(S^2) = -0.81\,\mathrm{ld}(0.81) - 0.09\,\mathrm{ld}(0.09) - 0.09\,\mathrm{ld}(0.09) - 0.01\,\mathrm{ld}(0.01) = 0.938\ [\mathrm{Sh/symb}].$$
In the case of the nth extension the probabilities are

$$P(r_j) = P(s_{i_1}, s_{i_2}, \ldots, s_{i_n}) = P(s_{i_1}) \cdot P(s_{i_2}) \cdots P(s_{i_n}), \quad i_1, i_2, \ldots, i_n \in \{1, 2\},\ j \le 2^n;$$

for the third extension, which has eight compound symbols given in Table 2.1, the corresponding entropy is

$$H(S^3) = \sum_{i=1}^{8} P(r_i)\,\mathrm{ld}\,\frac{1}{P(r_i)} = 1.407\ \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right],$$

confirming, for n = 2 and n = 3, the known relation [1, 2]

$$H(S^n) = nH(S).$$

(b) For the zero-memory binary source the entropy is given by

$$H(S) = p\,\mathrm{ld}\left(\frac{1}{p}\right) + (1-p)\,\mathrm{ld}\left(\frac{1}{1-p}\right),$$

while for the entropy of the nth extension of the zero-memory binary source one obtains

$$H(S^n) = nH(S) = np\,\mathrm{ld}\left(\frac{1}{p}\right) + n(1-p)\,\mathrm{ld}\left(\frac{1}{1-p}\right).$$

The entropy of the nth extension of the zero-memory binary source for n = 1, …, 5 and some values of the parameter p is shown in Fig. 2.4. It can be easily seen that the entropy grows linearly with n, yielding larger differences for greater n. Of course, the entropy is maximal in the case of equiprobable symbols.

Table 2.1 Symbols of the third extension and their probabilities

i        1        2        3        4        5        6        7        8
r_i      s1s1s1   s1s1s2   s1s2s1   s1s2s2   s2s1s1   s2s1s2   s2s2s1   s2s2s2
P(r_i)   0.729    0.081    0.081    0.009    0.081    0.009    0.009    0.001

Fig. 2.4 Entropy of the nth extension of the zero-memory binary source, one binary symbol with the probability p (entropy H(S^n) versus the extension order n = 1, …, 5, for p = 0.5, p = 0.1 and p = 0.01)

Problem 2.4 A first-order (Markov) memory source whose source alphabet is (s1, s2, s3, s4) is defined by the transition matrix

$$P = \begin{bmatrix} 0 & 0.7 & 0.3 & 0 \\ 0 & 0 & 0.7 & 0.3 \\ 0.1 & 0 & 0 & 0.9 \\ 0.7 & 0.3 & 0 & 0 \end{bmatrix}.$$

(a) Draw the source state diagram and find the state probabilities and the symbol probabilities. Is the source homogeneous and stationary?
(b) If the probabilities in the matrix are changed so that P(s1/s3) = 0.3 and P(s4/s3) = 0.7, draw the state diagram and find the stationary probabilities of the source symbols.
(c) Repeat the procedure from (b) for P(s1/s3) = 0 and P(s4/s3) = 1, the other probabilities being unchanged.
(d) How does the state diagram change for P(s1/s3) = P(s4/s3) = 0 and P(s3/s3) = 1?
(e) Find the entropies of the sources defined previously. For which transition matrix would the entropy achieve the maximum? What are the symbol probabilities in this case?

Solution
(a) The transition matrix describing the first-order memory source (first-order Markov source) contains the conditional probabilities: the element in the ith row and jth column corresponds to the conditional probability P(sj/si). This matrix is always stochastic because the sum of elements in every row must be equal to 1 [3].
In this problem every symbol corresponds to one source state and the transition probabilities are determined by the transition matrix. The corresponding state diagram is shown in Fig. 2.5. The source is stationary if the corresponding stationary symbol probabilities can be calculated, for which the following relation holds

$$[P(s_1)\ \ P(s_2)\ \ P(s_3)\ \ P(s_4)] = [P(s_1)\ \ P(s_2)\ \ P(s_3)\ \ P(s_4)] \begin{bmatrix} 0 & 0.7 & 0.3 & 0 \\ 0 & 0 & 0.7 & 0.3 \\ 0.1 & 0 & 0 & 0.9 \\ 0.7 & 0.3 & 0 & 0 \end{bmatrix} \ \Rightarrow\ p = p \cdot P,$$

where p is the stationary probabilities vector.


From the previous matrix equation, the following set of equations can be
easily found

$$P(s_1) = 0.1\,P(s_3) + 0.7\,P(s_4),$$
$$P(s_2) = 0.7\,P(s_1) + 0.3\,P(s_4),$$
$$P(s_3) = 0.3\,P(s_1) + 0.7\,P(s_2),$$
$$P(s_4) = 0.3\,P(s_2) + 0.9\,P(s_3).$$

The source should always be in one of the possible states, yielding

$$P(s_1) + P(s_2) + P(s_3) + P(s_4) = 1.$$

It can be easily verified that the system consisting of the first four equations does not have a unique solution. If the fourth equation is substituted by the fifth, the following matrix equation can be obtained

Fig. 2.5 State diagram of the first-order Markov source from (a)

$$p = [\,0\ \ 0\ \ 0\ \ 1\,] + p \cdot \begin{bmatrix} 0 & 0.7 & 0.3 & -1 \\ 0 & 0 & 0.7 & -1 \\ 0.1 & 0 & 0 & -1 \\ 0.7 & 0.3 & 0 & 0 \end{bmatrix},$$

yielding

$$p = [\,0\ \ 0\ \ 0\ \ 1\,] \begin{bmatrix} 1 & -0.7 & -0.3 & 1 \\ 0 & 1 & -0.7 & 1 \\ -0.1 & 0 & 1 & 1 \\ -0.7 & -0.3 & 0 & 1 \end{bmatrix}^{-1};$$

finally, the stationary probabilities are obtained

$$p^{(a)} = \left[ P^{(a)}(s_1)\ \ P^{(a)}(s_2)\ \ P^{(a)}(s_3)\ \ P^{(a)}(s_4) \right] = [\,0.2263\ \ 0.2451\ \ 0.2395\ \ 0.2891\,].$$

The source is homogeneous because the probabilities in the transition matrix do not depend on time. Further, it was verified that, in this case, the state probabilities fulfilling the relation $p = p \cdot P$ can be found. Therefore, the source is stationary. It is obvious that for a first-order Markov source the stationary state probabilities are equal to the emitted symbol probabilities.
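The matrix manipulation used above, replacing the fourth balance equation by the normalization condition and inverting, can be reproduced directly; a sketch (Python with NumPy; our own check, not from the book) is given below.

```python
import numpy as np

P = np.array([[0.0, 0.7, 0.3, 0.0],
              [0.0, 0.0, 0.7, 0.3],
              [0.1, 0.0, 0.0, 0.9],
              [0.7, 0.3, 0.0, 0.0]])

# p (I - M) = [0 0 0 1], where M is P with its fourth column replaced by (-1, -1, -1, 0)
M = P.copy()
M[:, 3] = [-1.0, -1.0, -1.0, 0.0]
p = np.array([0.0, 0.0, 0.0, 1.0]) @ np.linalg.inv(np.eye(4) - M)

print(np.round(p, 4))          # [0.2263 0.2451 0.2395 0.2891]
print(np.allclose(p @ P, p))   # True: p is indeed stationary
```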
(b) The corresponding state diagram is shown in Fig. 2.6a. It is symmetric and the
corresponding transition matrix

$$P^{(b)} = \begin{bmatrix} 0 & 0.7 & 0.3 & 0 \\ 0 & 0 & 0.7 & 0.3 \\ 0.3 & 0 & 0 & 0.7 \\ 0.7 & 0.3 & 0 & 0 \end{bmatrix}$$

Fig. 2.6 State diagrams for case (b) (panel a) and case (c) (panel b)
is doubly stochastic, because the sum of the elements in every row equals 1, as well as the sum of the elements in every column. In the case of a doubly stochastic transition matrix all states have equal probabilities, and the symbol probabilities here are

$$p^{(b)} = [\,0.25\ \ 0.25\ \ 0.25\ \ 0.25\,].$$

(c) For P(s1/s3) = 0 and P(s4/s3) = 1, the other probabilities remaining unchanged, the state diagram is shown in Fig. 2.6b. The source goes from state s3 deterministically to the state s4. However, the source is stationary, because the solution can be found

$$p^{(c)} = \left[ P^{(c)}(s_1)\ \ P^{(c)}(s_2)\ \ P^{(c)}(s_3)\ \ P^{(c)}(s_4) \right] = [\,0.2152\ \ 0.2429\ \ 0.2346\ \ 0.3074\,].$$

(d) For P(s1/s3) = P(s4/s3) = 0 and P(s3/s3) = 1, the other probabilities remaining unchanged, the state diagram is shown in Fig. 2.7. In this case the equation

$$P(s_3) = 0.3\,P(s_1) + 0.7\,P(s_2) + P(s_3)$$

has no solution except for P(s1) = P(s2) = 0 (being in contradiction with the other relations, i.e. not yielding unique solutions for P(s3) and P(s4)). In this case it is not possible to find probabilities satisfying the relation $p = p \cdot P$ and the source is not stationary [1, 3]. This is clear as well from the fact that the state s3 is an absorbing one, and the source cannot leave this state.
(e) For first-order memory sources the general expression for the entropy of a source with memory simplifies substantially

$$H(S) = \sum_{i=1}^{q} P(s_i) \sum_{j=1}^{q} P(s_j/s_i)\,\mathrm{ld}\left(\frac{1}{P(s_j/s_i)}\right).$$

Fig. 2.7 State diagram for case (d)
Taking into account the state probabilities previously calculated in (a), an additional simplification is achieved, comprising only 8 sum elements instead of $q^2 = 16$ (because some elements of the transition matrix are equal to zero)

$$H^{(a)}(S) = P^{(a)}(s_1)\left[0.7\,\mathrm{ld}\left(\frac{1}{0.7}\right) + 0.3\,\mathrm{ld}\left(\frac{1}{0.3}\right)\right] + P^{(a)}(s_2)\left[0.7\,\mathrm{ld}\left(\frac{1}{0.7}\right) + 0.3\,\mathrm{ld}\left(\frac{1}{0.3}\right)\right] + P^{(a)}(s_3)\left[0.1\,\mathrm{ld}\left(\frac{1}{0.1}\right) + 0.9\,\mathrm{ld}\left(\frac{1}{0.9}\right)\right] + P^{(a)}(s_4)\left[0.7\,\mathrm{ld}\left(\frac{1}{0.7}\right) + 0.3\,\mathrm{ld}\left(\frac{1}{0.3}\right)\right] = 0.7825\ \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right];$$

the entropy of the adjoint source is

$$H^{(a)}(\bar{S}) = \sum_{k=1}^{4} P^{(a)}(s_k)\,\mathrm{ld}\left(\frac{1}{P^{(a)}(s_k)}\right) = 1.9937\ \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right].$$
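These two entropies follow directly from the stationary probabilities and the rows of the transition matrix; a short numerical check (Python with NumPy; our own sketch) is given below.

```python
import numpy as np

P = np.array([[0.0, 0.7, 0.3, 0.0],
              [0.0, 0.0, 0.7, 0.3],
              [0.1, 0.0, 0.0, 0.9],
              [0.7, 0.3, 0.0, 0.0]])
p = np.array([0.2263, 0.2451, 0.2395, 0.2891])   # stationary probabilities from (a)

def row_entropy(row):
    return sum(x * np.log2(1.0 / x) for x in row if x > 0)

H_markov = sum(pi * row_entropy(row) for pi, row in zip(p, P))   # entropy of the Markov source
H_adjoint = row_entropy(p)                                       # entropy of the adjoint source

print(round(H_markov, 4), round(H_adjoint, 4))   # 0.7825 1.9937
```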
For the source where the transition matrix is doubly stochastic, the state diagram is symmetric, and the entropies of the source and the corresponding adjoint source are

$$H^{(b)}(S) = 4 \cdot 0.25 \cdot \left[0.7\,\mathrm{ld}\left(\frac{1}{0.7}\right) + 0.3\,\mathrm{ld}\left(\frac{1}{0.3}\right)\right] = 0.8813\ \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right],$$
$$H^{(b)}(\bar{S}) = 4 \cdot 0.25\,\mathrm{ld}\left(\frac{1}{0.25}\right) = 2\ \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right].$$

For P(s1/s3) = 0 and P(s4/s3) = 1, the corresponding values are obtained

$$H^{(c)}(S) = \left(P^{(c)}(s_1) + P^{(c)}(s_2) + P^{(c)}(s_4)\right)\left[0.7\,\mathrm{ld}\left(\frac{1}{0.7}\right) + 0.3\,\mathrm{ld}\left(\frac{1}{0.3}\right)\right] = 0.6746\ \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right],$$
$$H^{(c)}(\bar{S}) = \sum_{k=1}^{4} P^{(c)}(s_k)\,\mathrm{ld}\left(\frac{1}{P^{(c)}(s_k)}\right) = 1.9867\ \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right].$$

In the case P(s1/s3) = P(s4/s3) = 0 and P(s3/s3) = 1, the source is not stationary, and the entropy cannot be found.
The maximal entropy would correspond to the case when all the elements of the transition matrix have the value 1/4; the symbols are then equally probable as well, yielding
Fig. 2.8 Block-scheme of the equivalent source: a memoryless binary source emits x_k into an encoder with two delay cells T, producing the sequence y_k
$$H_{\max}(S) = \sum_{i=1}^{4} \frac{1}{4} \sum_{j=1}^{4} \frac{1}{4}\,\mathrm{ld}(4) = 2\ \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right].$$

In this case, the entropy of the adjoint source has the same value. It can be easily
shown that this type of transition matrix corresponds to the zero-memory source.
Problem 2.5 Zero-memory source emits binary sequence xk at the encoder input as
shown in Fig. 2.8, where T denotes the cell delay corresponding to the duration of
one symbol and addition is modulo 2.
(a) If the encoder and the source form an equivalent source emitting the sequence y_k, find the type of this sequence. Is it a binary sequence? Does the equivalent source have memory?
(b) Draw the state diagram and trellis for this source. Find the state probabilities and symbol probabilities of the equivalent source.
(c) If the probability of one binary symbol in the sequence x_k is p = 0.2, find the entropy of the equivalent source and the entropy of the source adjoint to it.
(d) Draw the entropy of the equivalent source as well as the entropy of its adjoint
source as a function of p.

Solution
(a) All operations are in the binary field and the sequence y_k is a binary sequence as well. The forming of the sequence emitted by the equivalent source with memory is defined by

$$y_k = x_k \oplus y_{k-2},$$

showing clearly that the output depends on the input and on the delayed output, where the delay is equal to the duration of two binary symbols. Therefore, the output symbol depends on two previously emitted symbols and a second-order memory source is obtained.
(b) The state diagram of the equivalent source can be easily drawn having in view that the source state (second-order memory source, m = 2) depends on two previously emitted symbols. These are the symbols at the cell outputs; the current state is $S = (y_{k-2}, y_{k-1})$ while the next state will be $S' = (y_{k-1}, y_k)$. From every state the source can enter only two other states, depending on
y_k, i.e. on the value of the binary symbol emitted by the zero-memory source. Therefore, the transition probabilities are p and 1 − p. The corresponding state diagram and trellis are shown in Fig. 2.9a, b, where the symbols are denoted by '0' and '1'.

Fig. 2.9 State diagram (a) and trellis (b) of the second-order memory source
The following set of equations holds

$$P(00) = P(00)(1-p) + P(10)\,p,$$
$$P(01) = P(00)\,p + P(10)(1-p),$$
$$P(10) = P(01)(1-p) + P(11)\,p,$$
$$P(11) = P(01)\,p + P(11)(1-p).$$

However, this system does not have a unique solution (it is singular!) and a new equation should be added to substitute any one in the set. This equation can be easily written, because the equivalent source must always be in some state

$$P(00) + P(01) + P(10) + P(11) = 1.$$

Steady state probabilities are P(00) = P(01) = P(10) = P(11) = 1/4, and all
states are equiprobable.
The corresponding probabilities of the binary symbols are

$$P(0) = P(00) + (P(01) + P(10))/2 = 0.5,$$
$$P(1) = P(11) + (P(01) + P(10))/2 = 0.5.$$

The perfect symmetry of the state diagram as well as of the trellis should be noted, yielding such probability values as a consequence.
From the state diagram it can be seen that the conditional probability $P(y_k / y_{k-2} y_{k-1})$ equals the transition probability from the state $S = (y_{k-2}, y_{k-1})$ into the state $S' = (y_{k-1}, y_k)$. Therefore,

$$P(0/00) = P(0/01) = P(1/10) = P(1/11) = 1 - p,$$
$$P(1/00) = P(1/01) = P(0/10) = P(0/11) = p.$$

(c) The entropy of the sequence y_k can be found using the general expression for the entropy of the second-order memory source

$$H(Y) = \sum_{i_1=1}^{2}\sum_{i_2=1}^{2}\sum_{j=1}^{2} P(y_{i_1}, y_{i_2})\,P(y_j / y_{i_1}, y_{i_2})\,\mathrm{ld}\left(\frac{1}{P(y_j / y_{i_1}, y_{i_2})}\right),$$

writing equivalently

$$H(Y) = \sum_{i=1}^{4}\sum_{j=1}^{2} P(S_i)\,P(y_j / S_i)\,\mathrm{ld}\left(\frac{1}{P(y_j / S_i)}\right),$$

where $S_i = (y_{i_1}, y_{i_2})$ denotes the ith source state

$$\begin{aligned} H(Y) ={} & P(00)P(0/00)\,\mathrm{ld}\frac{1}{P(0/00)} + P(00)P(1/00)\,\mathrm{ld}\frac{1}{P(1/00)} \\ &+ P(01)P(0/01)\,\mathrm{ld}\frac{1}{P(0/01)} + P(01)P(1/01)\,\mathrm{ld}\frac{1}{P(1/01)} \\ &+ P(10)P(0/10)\,\mathrm{ld}\frac{1}{P(0/10)} + P(10)P(1/10)\,\mathrm{ld}\frac{1}{P(1/10)} \\ &+ P(11)P(0/11)\,\mathrm{ld}\frac{1}{P(0/11)} + P(11)P(1/11)\,\mathrm{ld}\frac{1}{P(1/11)}. \end{aligned}$$

Taking into account the calculated probabilities

$$H(Y) = \frac{1}{4}(1-p)\,\mathrm{ld}\frac{1}{1-p} + \frac{1}{4}p\,\mathrm{ld}\frac{1}{p} + \frac{1}{4}(1-p)\,\mathrm{ld}\frac{1}{1-p} + \frac{1}{4}p\,\mathrm{ld}\frac{1}{p} + \frac{1}{4}p\,\mathrm{ld}\frac{1}{p} + \frac{1}{4}(1-p)\,\mathrm{ld}\frac{1}{1-p} + \frac{1}{4}p\,\mathrm{ld}\frac{1}{p} + \frac{1}{4}(1-p)\,\mathrm{ld}\frac{1}{1-p},$$

finally, the following is obtained

$$H(Y) = (1-p)\,\mathrm{ld}\left(\frac{1}{1-p}\right) + p\,\mathrm{ld}\left(\frac{1}{p}\right).$$
By definition, an adjoint source is a zero-memory source which has the same symbol probabilities as the source with memory to which it is adjoint. In the considered case its entropy is

$$H(\bar{Y}) = \sum_{j=1}^{2} P(y_j)\,\mathrm{ld}\left(\frac{1}{P(y_j)}\right) = 0.5\,\mathrm{ld}\,2 + 0.5\,\mathrm{ld}\,2 = 1\ \left[\frac{\mathrm{Sh}}{\mathrm{symb}}\right].$$

The expressions derived above are valid for any value of the parameter p. For p = 0.2, the corresponding numerical value of the entropy for the source with memory is

$$H(Y) = -0.8\,\mathrm{ld}(0.8) - 0.2\,\mathrm{ld}(0.2) = 0.7219\ [\mathrm{Sh/symb}],$$

while the entropy of the adjoint source does not depend on p, being always $H(\bar{Y}) = 1$ [Sh/symb].
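The dependence of H(Y) on p is just the binary entropy function; the sketch below (Python, standard library; our own check) evaluates it for p = 0.2 and confirms the value 0.7219 Sh/symb, while the adjoint source entropy stays at 1 Sh/symb.

```python
from math import log2

def binary_entropy(p):
    """H(Y) = p*ld(1/p) + (1-p)*ld(1/(1-p)) in [Sh/symb]."""
    if p in (0.0, 1.0):
        return 0.0
    return p * log2(1.0 / p) + (1 - p) * log2(1.0 / (1 - p))

print(round(binary_entropy(0.2), 4))   # 0.7219  (equivalent source with memory)
print(binary_entropy(0.5))             # 1.0     (adjoint source, equiprobable bits)
```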
(d) The entropy as a function of p is shown in Fig. 2.10. As one can expect, the entropy of the source with memory is never larger than the entropy of its adjoint source. It should be noted that for practically all values of p a source having equiprobable symbols '0' and '1' is obtained (this is not true only in the case p = 0, and then only for some initial states). This means that the condition P(0) = P(1) = 0.5 is not sufficient for the entropy to have the maximal value. The same could be concluded from the previous problem, but it should be noted that in that case the zero-memory source is considered.

Fig. 2.10 The entropy of the second-order memory source as a function of p (entropy of the source with memory and of its adjoint source)


It is interesting to notice that for p = 0 and p = 1 the entropy equals zero. For p = 0 the emitted sequence depends on the initial source state. In any case the output sequence will be deterministic (no information is emitted!) as follows:

(1) For the initial state 00 the following sequence is emitted: 00000000000000000000…
(2) For the initial state 01 the following sequence is emitted: 01010101010101010101…
(3) For the initial state 10 the following sequence is emitted: 10101010101010101010…
(4) For the initial state 11 the following sequence is emitted: 11111111111111111111…

The reader is expected to find the sequence emitted when p = 1, depending on the initial state. Are the sequences deterministic, or can they be random as well?
Problem 2.6 A binary zero-memory source emits a sequence x_k entering the encoder input, generating the corresponding encoder output

$$y_k = x_k \oplus a_1 y_{k-1} \oplus a_2 y_{k-2} \oplus \ldots \oplus a_p y_{k-p},$$

where the coefficients a_i are from the binary field and the addition is modulo 2.
(a) Considering the zero-memory source and the encoder as an equivalent source (emitting the sequence y_k), find the entropy of this source.
(b) Draw the state diagram and trellis of this source for the case p = 2 and a1 = a2 = 1. Find the corresponding state and symbol probabilities.
(c) If the zero-memory source emits the all-zeros sequence and the encoder parameters are p = 5, a2 = a5 = 1 and a1 = a3 = a4 = 0, draw the block-scheme of the equivalent source (with memory), find its entropy and draw the autocorrelation function of the sequence y_k.
(d) Find the autocorrelation function of the equivalent source when its input is generated by a zero-memory source emitting a binary sequence x_k and (subsequently) encoded by an encoder having p cells (p even) according to the relation

$$y_k = x_k \oplus y_{k-p}.$$

Solution
(a) All calculations performed by encoder are over binary field and sequence yk is
binary as well, the source memory is of order p. The corresponding entropy is

H(Y) = Σ_{i=1}^{2^p} Σ_{j=1}^{2} P(S_i) P(y_j/S_i) ld(1/P(y_j/S_i)),

where S_i = (y_{i1}, y_{i2}, …, y_{ip}) denotes the current encoder state. From this state only the transitions into two next states are possible—(y_{i1}, y_{i2}, …, y_{ip}, 0) and (y_{i1}, y_{i2}, …, y_{ip}, 1), depending on the value y_j = 0 or y_j = 1. Transition probabilities depend on the current state, yielding P(y_j = 1/y_{i1}, y_{i2}, …, y_{ip}) = P_i or P(y_j = 0/y_{i1}, y_{i2}, …, y_{ip}) = 1 − P_i. The previous expression can be written in the form

H(Y) = Σ_{i=1}^{2^p} P(S_i) [P_i ld(1/P_i) + (1 − P_i) ld(1/(1 − P_i))].

However, taking into account the following

P_i = p if a_1 y_{k−1} ⊕ a_2 y_{k−2} ⊕ … ⊕ a_p y_{k−p} = 0, and P_i = 1 − p if a_1 y_{k−1} ⊕ a_2 y_{k−2} ⊕ … ⊕ a_p y_{k−p} = 1,

it is clear that the sum in brackets does not depend on i (one addend will be p, the other 1 − p), yielding finally

H(Y) = [p ld(1/p) + (1 − p) ld(1/(1 − p))] Σ_{i=1}^{2^p} P(S_i) = p ld(1/p) + (1 − p) ld(1/(1 − p)).

Therefore, the encoding cannot change the source entropy (but it can introduce memory into the emitted sequence).
(b) For p = 2 and a1 = a2 = 1 the block-scheme of the equivalent source slightly differs from that in Fig. 2.8, because the output of the first cell is introduced into the modulo-2 adder as well. The corresponding state diagram and trellis of the equivalent source are constructed in the same way as in the previous problem. They are shown in Fig. 2.11a, b.
Comparing the state diagram and trellis to those in the previous problem (Fig. 2.9), a complete asymmetry can be noticed here. However, all four states are also equiprobable, yielding P(0) = P(1).
(c) The emitting of an all-zeros sequence by zero-memory source is equivalent to
the case where it is off, and the corresponding block-scheme of the source is
given in Fig. 2.12.

Fig. 2.11 State diagram (a) and trellis (b) of the source having parameters p = 2 and a1 = a2 = 1


Fig. 2.12 Block-scheme of equivalent source for p = 5, a2 = a5 = 1 and for xk = 0

This block-scheme corresponds to the structure of a pseudorandom binary sequence (PN) generator [4] having N delay cells, where the connections of the cells to the modulo-2 adder are chosen optimally. During one period it emits L = 2^N − 1 bits, of which 2^{N−1} are binary "ones" and 2^{N−1} − 1 are binary "zeros". Therefore, the average value is

m_y = E{y_k} = (1/L) Σ_{k=1}^{L} y_k = 2^{N−1}/L = ((L + 1)/2)/L = 1/2 + 1/(2L),

where E{·} denotes mathematical expectation.

The autocorrelation function of this PN sequence can be calculated using the definition of the discrete autocorrelation function [5]

R_y(l) = E{y_k y_{k+l}} = (1/L) Σ_{k=1}^{L} y_k y_{k−l},

and by using the relation between two binary numbers



ab = (a + b − a ⊕ b)/2,

the following result is obtained

R_y(l) = (1/2L) Σ_{k=1}^{L} (y_k + y_{k−l} − y_k ⊕ y_{k−l}) = m_y − (1/L) Σ_{k=1}^{L} (y_k ⊕ y_{k−l})/2.

The sum in this expression for l = 0 has all addends equal to zero, while for l ≠ 0 it corresponds to one half of the average value, yielding finally

R_y(l) = m_y for l = 0, ±L, …, and R_y(l) = m_y/2 for l ≠ 0, ±L, …

The autocorrelation function is shown in Fig. 2.13. The PN sequence is periodic and the autocorrelation function is periodic as well, with the period L = 2^5 − 1 = 31. Being periodic, this sequence is deterministic, it carries no information and the corresponding source entropy equals zero.
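The following short Python sketch (an illustration, not the book's code; the non-zero initial register state is an arbitrary assumption) generates the sequence y_k = y_{k−2} ⊕ y_{k−5} of Fig. 2.12 and estimates its periodic autocorrelation, confirming R_y(0) ≈ m_y and R_y(l ≠ 0) ≈ m_y/2.

import numpy as np

def lfsr_pn(n_bits, state=(1, 0, 0, 0, 0)):
    """Generate n_bits of the sequence y_k = y_{k-2} xor y_{k-5}."""
    reg = list(state)                    # reg[0] = y_{k-1}, ..., reg[4] = y_{k-5}
    out = []
    for _ in range(n_bits):
        new = reg[1] ^ reg[4]            # y_{k-2} xor y_{k-5}
        out.append(new)
        reg = [new] + reg[:-1]           # shift the register
    return np.array(out)

L = 2**5 - 1                             # period of a maximal-length sequence
y = lfsr_pn(10 * L)
print("ones per period:", y[:L].sum())   # expected 2^(5-1) = 16

# periodic autocorrelation R_y(l) = (1/L) * sum_k y_k * y_{k-l} over one period
R = [np.mean(y[L:2*L] * np.roll(y[L:2*L], l)) for l in range(L)]
print("R_y(0) =", R[0], " R_y(l != 0) ~", R[1])   # ~ m_y and ~ m_y/2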
(d) In this case the sequence y_k can be written as follows

y_k = x_k ⊕ y_{k−p} = x_k + y_{k−p} − 2 x_k y_{k−p}.

Average value is


Fig. 2.13 Autocorrelation function of PN sequence generated for p = 5 and a2 = a5 = 1



   
E{y_k} = E{x_k} + E{y_{k−p}} − 2E{x_k y_{k−p}};

since the input and output encoder sequences are uncorrelated, this yields

m_y = m_x + m_y − 2 m_x m_y ⇒ m_y = 1/2.

The autocorrelation function of the sequence y_k can be found using the definition of the discrete autocorrelation function

R_y(l) = E{y_k y_{k−l}} = E{(x_k + y_{k−p} − 2 x_k y_{k−p})(x_{k−l} + y_{k−p−l} − 2 x_{k−l} y_{k−p−l})}
= E{x_k x_{k−l}} + E{x_k y_{k−p−l}} − 2E{x_k x_{k−l} y_{k−p−l}} + E{y_{k−p} x_{k−l}} + E{y_{k−p} y_{k−p−l}} − 2E{y_{k−p} x_{k−l} y_{k−p−l}} − 2E{x_k y_{k−p} x_{k−l}} − 2E{x_k y_{k−p} y_{k−p−l}} + 4E{x_k y_{k−p} x_{k−l} y_{k−p−l}}
= R_x(l) + m_x m_y − 2R_x(l) m_y + m_x m_y + R_y(l) − 2R_y(l) m_x − 2R_x(l) m_y − 2R_y(l) m_x + 4R_x(l) R_y(l).

For my = 1/2 a simple relation is further obtained

R_y(l) = m_x − R_x(l) + R_y(l)[1 − 4R_y(l) m_x + 4R_x(l)],

and because the input sequence is uncorrelated, it can be written

R_x(l) = m_x for l = 0, ±L, …, and R_x(l) = m_x/2 for l ≠ 0, ±L, …

In the case l = 0 the autocorrelation function equals the average power of the binary signal (and its average value as well!). It is interesting to consider the following relation

R_y(l) = m_x/2 + R_y(l)[1 − 2R_y(l) m_x], l ≠ 0.

By using the z-transformation the discrete average power spectrum density (APSD) can be found

Φ(z) = (m_x/2) / [(1 − z)(1 − (1 − 2m_x) z^p)],

and by further using the partial decomposition [6]


Φ(z) = A/(1 − z) + Σ_{j=1}^{p/2} [B_j/(1 − q_j z) − B_j/(1 + q_j z)],

where the coefficients A, B_j and q_j do not depend on z, the autocorrelation function finally can be written in the form

R_y(l) = Σ_{j=1}^{p/2} B_j q_j^l, l ≥ 0.

Autocorrelation functions for p = 2 and p = 8 are shown in Fig. 2.14a, b. It is easy to notice that the autocorrelation function has constant values for all discrete shifts except for integer multiples of p. On the other hand, considering only the values R_y(np), an exponential decay is observed for greater values of n. Further, the rate of change of R_y(np) diminishes for greater values of p. Taking this into account, the memory order (p) can be easily found from the shape of the autocorrelation function.
Problem 2.7 First-order memory binary source is defined by the conditional
probabilities P(0/1) = P(1/0) = p.
(a) Draw the corresponding source realization by using zero-memory binary
source, one adder and one delay cell.
(b) Draw the corresponding state diagram and trellis. Find the probabilities of
binary symbols as well as of stationary states.
(c) For p = 0.1 find the corresponding sequence at the output. Find the symbol
probabilities for the second extension of the source. Find the order of this
extension.
(d) By using definition, find the entropy of source with memory and of its second
extension, as well as the entropies of their adjoint sources.
(e) Sketch the source entropy and the entropy of its adjoint source as a function of
the extension order for at least three values of p.


Fig. 2.14 Autocorrelation functions for m_x = 0.5 and p = 2 (a), p = 8 (b)



Fig. 2.15 First-order memory binary symmetric source

Solution
(a) This is a first-order memory binary symmetric source; it can be realized by combining a zero-memory binary source and a differential encoder [5], as shown in Fig. 2.15.
(b) The state diagram and the corresponding trellis are shown in Fig. 2.16. The source state corresponds to the previously emitted bit and the transition probability into the next state is determined only by the bit probability at the output of the zero-memory binary source. Since the source is symmetric (easily concluded from the state diagram or the trellis), the output binary symbols are equally probable, yielding P(0) = P(1) = 0.5. The probability for the source to be in the state '0' equals the probability that the previously emitted bit is '0', and the state probabilities are equal (the source is stationary).
(c) If the zero-memory source emits "binary ones" with probability p = 0.1, the transition probability into the other state is small and transitions in the sequence at the source output are rare. A characteristic sequence emitted by the source with memory has the following form

…00000000001111111000000111111111111111111100000000000111111111110000000000…
Source extension of the order n = 2 results in grouping two neighboring binary symbols into one compound symbol, the possible compound symbols being A = '00', B = '01', C = '10' and D = '11'. The above binary sequence can now be written as a twice shorter sequence


Fig. 2.16 State diagram (a) and trellis (b) of first-order memory binary symmetric source

. . .AAAAADDDCAABDDDDDDDDDAAAAABDDDDDAAAAA. . .

While binary symbols of the original sequence are equally probable, in the new
sequence the symbols A and D will have substantially greater probabilities than the
symbols B and C, being the consequence of the unequal probabilities in the state
diagram. By using the joint probability, it is easy to find

P(A) = P(00) = P(0)P(0/0) = 0.5 · (1 − p) = 0.45
P(B) = P(01) = P(0)P(1/0) = 0.5 · p = 0.05
P(C) = P(10) = P(1)P(0/1) = 0.5 · p = 0.05
P(D) = P(11) = P(1)P(1/1) = 0.5 · (1 − p) = 0.45
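These pair probabilities can be checked by a quick simulation (a sketch, not from the book; p = 0.1 as in the text, the sample size and seed are arbitrary assumptions):

import numpy as np

rng = np.random.default_rng(0)
p, N = 0.1, 200_000
y = np.empty(N, dtype=int)
y[0] = 0
for k in range(1, N):
    # with probability p the new bit differs from the previous one
    y[k] = y[k - 1] ^ (rng.random() < p)

pairs = y[0:N - 1:2] * 2 + y[1:N:2]        # group bits two by two: 00, 01, 10, 11
counts = np.bincount(pairs, minlength=4) / (N // 2)
for name, c in zip("ABCD", counts):
    print(f"P({name}) ~ {c:.3f}")          # expected about 0.45, 0.05, 0.05, 0.45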

To find the entropy of the nth extension of the m-order memory source, the memory order of the extended source first has to be found. It is calculated by using the formula [7]

l = ⌈m/n⌉,

where the operator ⌈·⌉ denotes rounding up to the next integer (if the quotient is not an integer). In this case m = 1 and n = 2, yielding

l = ⌈1/2⌉ = 1,

obtaining the first-order memory source by extension. By any extension of a source with memory, a zero-memory source cannot be obtained, because there will always be some statistical dependence at the borders of the extended source symbols.
(d) The entropy of the original first-order memory source is

H(S) = Σ_{i=1}^{2} Σ_{j=1}^{2} P(s_i) P(s_j/s_i) ld(1/P(s_j/s_i)),

or
     
H(S) = 0.5(1 − p) ld(1/(1 − p)) + 0.5p ld(1/p) + 0.5p ld(1/p) + 0.5(1 − p) ld(1/(1 − p)),

yielding finally the well known result



   
H(S) = (1 − p) ld(1/(1 − p)) + p ld(1/p),

showing that it is equal to the entropy of the zero-memory source at the differential encoder input, because the entropy cannot be changed by differential encoding. The entropy of the corresponding adjoint source is, as in the previous example,

H(S̄) = Σ_{i=1}^{2} P(s_i) ld(1/P(s_i)) = 0.5 ld 2 + 0.5 ld 2 = 1 [Sh/symb].

The extended source is the first-order memory source as well, and the following
expression can be used

H(S²) = Σ_{i=1}^{2} Σ_{j=1}^{4} P(s_i) P(r_j/s_i) ld(1/P(r_j/s_i)),

where r1 = A, r2 = B, r3 = C and r4 = D. The stationary compound symbol probabilities were found earlier as P(r1) = P(r4) = (1 − p)/2 and P(r2) = P(r3) = p/2. If the symbol r_j is obtained by combining s_{j,1} and s_{j,2}, then the sequence s_i r_j corresponds to the sequence s_i, s_{j,1}, s_{j,2} emitted by the original source, and the corresponding conditional probabilities can be found by using

P(r_j/s_i) = P(s_{j,2}/s_{j,1}) P(s_{j,1}/s_i),

and after the final ordering of the entropy expression (tedious, but straightforward procedure) the following is obtained

H(S²) = 2(1 − p) ld(1/(1 − p)) + 2p ld(1/p).

By definition, the entropy of the adjoint source is

H(S̄²) = Σ_{i=1}^{4} P(r_i) ld(1/P(r_i)) = −2 · ((1 − p)/2) ld((1 − p)/2) − 2 · (p/2) ld(p/2)
= −(1 − p)[ld(1 − p) − 1] − p[ld(p) − 1] = 1 − (1 − p) ld(1 − p) − p ld(p).

One can easily verify that two previously given expressions can be obtained on
the basis of the general relations

H(Sⁿ) = nH(S),
H(S̄ⁿ) = (n − 1)H(S) + H(S̄),

where the last equality is obtained using the fact that the difference between the
entropy of the extended source and its adjoint source does not change with the
extension order.
For p = 0.1 the corresponding numerical values are

H(S) = 0.469 [Sh/symb],  H(S²) = 0.938 [Sh/symb],  H(S̄²) = 1.469 [Sh/symb],

and the values of H(Sⁿ) and H(S̄ⁿ) are shown in Fig. 2.17 for three characteristic values of p.
When p = 0.5, the transitions into the next possible states are equiprobable and the source loses the memory (the generated sequence becomes uncorrelated), yielding H(S̄) = H(S) and H(S̄ⁿ) = H(Sⁿ) = nH(S). On the other hand, if p = 0.01 there is a great difference between the entropies of the original source and its adjoint source (H(S̄) ≫ H(S)), and even after the extension H(S̄ⁿ) has a substantially greater value than H(Sⁿ). If p is fixed, the difference between the entropy of the extended source and the entropy of its adjoint source does not depend on the extension order, in concordance with the relation

H(S̄ⁿ) − H(Sⁿ) = H(S̄) − H(S).

Fig. 2.17 H(Sⁿ) and H(S̄ⁿ) as functions of the extension order n for p = 0.01, p = 0.1 and p = 0.5

Problem 2.8 Zero-memory continuous source generates the Gaussian random process x_k (m_x = 0, σ_x² = 1) as the input of a transversal filter yielding at the output

y_k = x_k − Σ_{i=1}^{p} a_i y_{k−i},

where the coefficients a_i are real numbers and the addition is ordinary (real-valued) addition.
(a) Draw the system block-scheme and explain its functioning. Find the average
value and autocorrelation function of the output process.
(b) Is it possible to generate a Gaussian random process having a predefined autocorrelation function?
(c) Find the entropy H(X) at the filter input. Is the entropy of the output random process smaller or greater than H(X)?

Solution
(a) Block-scheme of the system, shown in Fig. 2.18, can be found directly from
the corresponding equation. The average value of the random process at the
output is

E{y_k} = E{x_k} − Σ_{i=1}^{p} a_i E{y_{k−i}},

and this process is a stationary Gaussian process as well (an autoregressive IIR filter is a linear system!). As m_x = E{x_k} = 0, it follows

m_y (1 + Σ_{i=1}^{p} a_i) = 0 ⇒ m_y = 0.

Fig. 2.18 Structure of order p AR filter



By performing the z-transform of y_k, the corresponding transfer function [4] is obtained

H(z) = 1 / (1 + Σ_{i=1}^{p} a_i z^{−i}),

and the discrete average power spectrum density of the output Gaussian process is

Φ_y(z) = N_0 / |1 + Σ_{i=1}^{p} a_i z^{−i}|².

Using the Wiener-Khinchin theorem, the corresponding output autocorrelation function is easily obtained

R(l) = (1/2π) ∫_{−π}^{π} [N_0 e^{jzl} / |1 + Σ_{i=1}^{p} a_i e^{−jzi}|²] dz   (l = 0, 1, …, p − 1).

(b) The autoregressive (AR) model makes it possible to generate a process having the predefined autocorrelation function. If the values of R(l) are given in advance, it is possible to define the correlation matrix R, the AR filter coefficient vector a and the correlation vector r, as follows

R = [ R_y(0)    R_y(1)    …  R_y(p−1) ]
    [ R_y(1)    R_y(0)    …  R_y(p−2) ]
    [   …         …       …     …     ]
    [ R_y(p−1)  R_y(p−2)  …  R_y(0)   ],
a = [a_1, a_2, …, a_p]^T,  r = [R_y(1), R_y(2), …, R_y(p)]^T.

On the basis of R and r, the coefficients of the AR filter can be calculated to obtain an output process which has the predefined autocorrelation function. The coefficients are obtained by using the Yule-Walker equations [8], which in matrix form are given by [9]

a = R⁻¹ r.

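A possible numerical sketch of this procedure (not the book's code; function names follow NumPy/SciPy, lmax = 50 and the linearly decreasing R(l) follow the text). The Yule-Walker solution below is written for the y_k = Σ φ_i y_{k−i} + x_k form; the book's coefficients a_i correspond to it up to the sign convention of the problem statement.

import numpy as np
from scipy.linalg import toeplitz, solve
from scipy.signal import lfilter

p, lmax = 50, 50
l = np.arange(lmax + 1)
R_des = np.where(l <= lmax, (lmax - l) / lmax, 0.0)   # desired R_y(l)

R = toeplitz(R_des[:p])            # p x p correlation matrix
r = R_des[1:p + 1]                 # correlation vector [R(1), ..., R(p)]
phi = solve(R, r)                  # Yule-Walker solution (AR coefficients)

x = np.random.default_rng(1).standard_normal(10**6)     # uncorrelated input noise
y = lfilter([1.0], np.concatenate(([1.0], -phi)), x)     # y_k = sum(phi_i y_{k-i}) + x_k

# estimate of the output autocorrelation, normalized to its value at lag 0
est = [np.mean(y[lmax:] * y[lmax - k:len(y) - k]) for k in range(6)]
print(np.round(np.array(est) / est[0], 3))               # ~ 1.0, 0.98, 0.96, ...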
In Fig. 2.19 it is shown how, from an uncorrelated Gaussian random process x_k, a correlated random process y_k can be generated. It is obtained by filtering the process x_k using a filter with p = 50 delay cells, where the delay of one cell corresponds to the sampling period of the process T = 1 [µs]. The estimation was made using the sample


Fig. 2.19 Uncorrelated process (noise) x(t) at the input, correlated process at the output y(t) and
the corresponding probability density functions for lmax = 50

size N = 10⁶ and the uncorrelated and correlated signals are shown in Fig. 2.19. In the same figure the probability density functions of both processes are shown. It is clear that the first-order statistics are not changed when the AR filter is used.
On the other hand, the second-order statistics of the input and output processes differ substantially, as shown in Fig. 2.20. The input process is uncorrelated, while the output process has an autocorrelation function corresponding very well to the predefined linearly decreasing function.

R(l) = (l_max − l)/l_max for l ≤ l_max, and R(l) = 0 for l > l_max,

where parameter l_max determines the autocorrelation function slope.


(c) Entropy of the continuous source without memory is obtained by using the
expression

H(X) = ∫_{−∞}^{∞} w(x) ld(1/w(x)) dx.


Fig. 2.20 Input and output autocorrelation function of the process

For a Gaussian process with parameters m_x = 0, σ_x² = 1, the probability density function has the form

w(x) = (1/√(2π)) e^{−x²/2}, −∞ < x < ∞.

Taking into account that the process is uncorrelated, the above written expression for entropy can be used

H(X) = ∫_{−∞}^{∞} w(x) ld(1/w(x)) dx = ∫_{−∞}^{∞} (1/√(2π)) e^{−x²/2} ld((2π)^{1/2} e^{x²/2}) dx
= (ld(2π)/2) [∫_{−∞}^{∞} (1/√(2π)) e^{−x²/2} dx] + (ld(e)/2) [∫_{−∞}^{∞} x² (1/√(2π)) e^{−x²/2} dx].

Both expressions in brackets have 1 as a value, and the entropy of the uncorrelated Gaussian process at the AR filter input is

H(X) = ld(2πe)/2.
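This value can be cross-checked numerically (a simple sketch, not from the book):

import numpy as np

x = np.linspace(-10, 10, 200_001)
dx = x[1] - x[0]
w = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)          # unit-variance Gaussian pdf
H_num = np.sum(-w * np.log2(np.maximum(w, 1e-300))) * dx
H_theory = 0.5 * np.log2(2 * np.pi * np.e)
print(H_num, H_theory)                              # both ~ 2.0471 Sh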
The output process is correlated, and can be considered as generated by a
continuous source with memory. For such sources, the entropy cannot be calculated

using the same formula as above. However, it can be noticed that the source
generates signal yk which has the same probability density function as signal xk and
that their first order statistics are identical. Therefore, one can consider the source
generating xk as an adjoint source to that one generating signal yk, from which
follows

H(Y) ≤ H(Ȳ) = H(X).

The equality is valid only for l_max = 1, while for higher values the entropy of the output sequence of the AR filter is smaller than the entropy at the input, and the difference is greater if the autocorrelation function decreases more slowly.
Chapter 3
Data Compression (Source Encoding)

Brief Theoretical Overview

In this chapter only the discrete sources are considered. Generally, the encoding is a
mapping of sequences of source alphabet symbols (S) into sequences of code
alphabet symbols (X) (Fig. 3.1).
Number of code alphabet symbols (r) is called code base. If the code alphabet
consists of two symbols only, binary code is obtained. Block code maps each of the
symbols (or their sequences) of source alphabet S into a fixed sequence of code
alphabet X symbols. The obtained sequences are called code words (Xi). A number
of symbols in a code word is called code word length.
Consider the following binary code (r = 2) for a source with q = 4 symbols.
Code alphabet symbols are denoted simply as 0 and 1.

S Xi
s1 0
s2 01
s3 11
s4 01

This code is defined by the table. A code can also be defined in some other way, e.g. using mathematical operations. The above code is practically useless because two source symbols have the same code words. The decoding is impossible. It is a singular code. Therefore, a natural restriction is put—all code words should be distinct. A code is nonsingular if all code words are distinct. Consider the following nonsingular code


Fig. 3.1 Encoder block-scheme (source alphabet S{s1, s2, …, sq} at the input, code alphabet X{x1, x2, …, xr} at the output)

S Xi
s1 0
s2 01
s3 11
s4 00

But this code is also useless. The sources are emitting sequences of symbols and
the corresponding code words will be received by decoder in sequences. E.g., if the
received sequence is 0011, there are two possibilities for decoding—s1s1s3 and s4s3.
Therefore, a further restriction should be put. All possible sequences of code words
(corresponding to all possible sequences of source symbols) must be different to
allow a unique decodability. In other words, a nonambiguous decoding must be
allowed. A (block) code is uniquely decodable if the nth extension of the code (i.e.
the sequence corresponding to the nth source extension) is nonsingular for finite n.
Consider the following uniquely decodable codes

S (a) (b) (c)


s1 00 0 0
s2 01 10 01
s3 10 110 011
s4 11 1110 0111

From the practical point of view, one more condition should be imposed. All
three codes are uniquely decodable. Code words of code (a) have the same code
length. Code (b) is a comma code (Problems 3.1 and 3.2), where at the end of every
word is the symbol not appearing anywhere else. In binary code this symbol can be
0 (and 1 as well). In the above example the code words consist of a series of ones,
with a zero at their end. However, code (c) is a little bit different. Here all code
words start with 0. It means that decoder has to wait for the beginning of the next
code word, to decode the current one. Therefore, an additional condition is put.
A uniquely decodable code is instantaneous (Problems 3.2–3.4) if it is possible to
decode each word in a sequence without reference to neighboring code symbols.
A necessary and sufficient condition for a code to be instantaneous is that no
complete code word is a prefix of some other code word. Code (c) does not satisfy
this condition. In Fig. 3.2 the mentioned subclasses of codes are shown.

Fig. 3.2 Subclasses of codes (nested: codes ⊃ nonsingular ⊃ uniquely decodable ⊃ instantaneous)

As a suitable way to visualize the code, a code tree can be constructed. The tree
consists of the root, the nodes and the branches. From the root and from every node,
for r-base code, there are r branches, each one corresponding to one code symbol.
The alphabet symbols are at the ends of outer branches. Decoder starts from the root
and follows the branch determined by a code symbol. At the end the corresponding
source symbol should be found. In Fig. 3.3 the code trees corresponding to the last
example are shown.
In the figure it is arbitrarily chosen that going along the left branch corresponds to 0 and going along the right branch corresponds to 1. Of course, bits in the code words can be

Fig. 3.3 Code trees corresponding to the above example



complemented. It corresponds to the rotation of the branches starting from nodes at the corresponding level. Codes (a) and (b) are instantaneous, while the tree corresponding to (c) is degenerated, and the code is not instantaneous. It can also be noticed that the last bit in code (b) for s4 is needless and the decoder can make the correct decision without it. It can also be concluded that the use of a non-degenerated code tree guarantees an instantaneous code.
For the binary code (r = 2), for every integer q, a code tree can be constructed having exactly q final nodes. Generally, for r ≥ 3, it is not always possible to have exactly q final nodes. It can be easily seen from the example in Fig. 3.4.
It is obvious that ternary coding cannot provide a useful code tree for source with
q = 4 symbols. Code (b) offers 5 code words and one code word will be unused.
If the source has q symbols, and the code base is r, the Kraft inequality
(Problems 3.2, 3.4 and 3.5) gives necessary and sufficient condition for the exis-
tence of an instantaneous code with word lengths l1, l2, …, lq

Σ_{i=1}^{q} r^{−l_i} ≤ 1.

Of course, for a binary code (code base r = 2), the inequality is

Σ_{i=1}^{q} 2^{−l_i} ≤ 1.

If this relation is satisfied with equality, the code is complete (Problem 3.4). In
fact, it is merely a condition on the word lengths of a code and not on the con-
struction of words themselves (in the extreme case, one can put only symbol 0 in all
code words). The next example is interesting.
Consider the following binary (r = 2) codes for a source with q = 4 symbols [3]

Fig. 3.4 Code trees of the ternary code (r = 3) for the source with q = 3 symbols (a): 0, 1, 2, and q = 5 symbols (b): 0, 1, 20, 21, 22

S  (a)  (b)  (c)  (d)  (e)
s1 00   0    0    0    0
s2 01   100  10   100  10
s3 10   110  110  110  110
s4 11   111  111  11   11
Σ 2^{−l_i}:  1  7/8  1  1  9/8

Codes (a), (b), (c) and (d) satisfy the Kraft inequality. However, code (d) is not instantaneous because the code word for s4 is a prefix of the code word for s3. But code words with such lengths can be used to obtain an instantaneous code—(c). On the other hand, using code words whose lengths correspond to code (e), it is not possible to construct an instantaneous code.
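A tiny illustrative check of the Kraft sums for codes (a)–(e) above (not from the book) could look like this:

def kraft_sum(lengths, r=2):
    """Evaluate the Kraft sum for the given code word lengths and code base r."""
    return sum(r ** (-l) for l in lengths)

codes = {"a": [2, 2, 2, 2], "b": [1, 3, 3, 3], "c": [1, 2, 3, 3],
         "d": [1, 3, 3, 2], "e": [1, 2, 3, 2]}
for name, lengths in codes.items():
    s = kraft_sum(lengths)
    print(name, s, "satisfies Kraft" if s <= 1 else "cannot be instantaneous")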
The goal of data compression is to transmit or record the information as eco-
nomically as possible. The useful measure is the average code word length
(Problem 3.1)
L = Σ_{i=1}^{q} P_i l_i,

where Pi (i = 1, 2, …, q) are the corresponding source symbol probabilities. A code


is compact if its average code word length is less than or equal to the average code
word length of all other uniquely decodable codes for the same source and the same
code alphabet. One can ask whether there is a lower limit for the average code word length. The answer is given by the First Shannon theorem (Problem 3.7). Consider a zero-memory source. The entropy of the nth extension (Problem 2.7) of a zero-memory source is H(Sⁿ) = nH(S), where H(S) is the entropy of the original source. Shannon proved the following
lim_{n→∞} L_n/n = H(S)/ld r,

i.e. by unlimited source extension (n → ∞) the equivalent average code word length per original symbol can be made as close to the entropy as one wishes. In practice, with finite n, the following is valid

L ≥ H(S)/ld r.

For a binary code

L ≥ H(S).

This result is obvious, because one bit cannot carry more than one shannon.
This theorem can be proved as well for sources with memory. For the nth extension of such a source it is obtained (where the adjoint source to the source S is denoted by S̄)

H(S) + (H(S̄) − H(S))/n ≤ L_n/n < H(S) + (H(S̄) − H(S) + 1)/n.

It means that with sufficient source extension, the equivalent average code word length can be as near as we wish to the source entropy.
To compare the various codes, some parameters are defined. A code efficiency (Problems 3.1, 3.7 and 3.8) is

η = (H(S)/L) × 100% ≤ 100%.

For 100% efficiency a code is perfect (Problems 3.1 and 3.3). Compression ratio is (Problems 3.1, 3.7 and 3.8)

ρ = ⌈ld q⌉/L,

where the operator ⌈·⌉ denotes the first integer greater than (or equal to) the argument. The maximum possible compression ratio value (Problem 3.1) is

ρ_max = ⌈ld(q)⌉/H(S).

Interesting features concerning the transmission errors are the maximum code word length (Problem 3.5) and the sum of code word lengths, as well as the code word length variance (Problem 3.5). Further, the effective efficiency of a code for sources with memory can be defined, as well as the maximum possible efficiency and the maximum possible compression ratio (Problem 3.7) for a given extension order.
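As an illustration (not from the book), these figures of merit for a binary code can be computed directly from the symbol probabilities and code word lengths; the example numbers correspond to the comma code analyzed later in Problem 3.1(a).

from math import log2, ceil

def code_figures(probs, lengths):
    """Return entropy [Sh/symb], average length [b/symb], efficiency [%] and compression ratio."""
    H = sum(p * log2(1 / p) for p in probs if p > 0)
    L = sum(p * l for p, l in zip(probs, lengths))
    eta = H / L * 100
    rho = ceil(log2(len(probs))) / L
    return H, L, eta, rho

print(code_figures([1/2, 1/4, 1/8, 1/16, 1/16], [1, 2, 3, 4, 4]))
# -> (1.875, 1.875, 100.0, 1.6)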
The first known procedure for source encoding was the Shannon-Fano one [10, 11] (Problems 3.1 and 3.4). It was proposed by Shannon, while Fano made it easier to follow by using the code tree (Problems 3.3, 3.4, 3.6, 3.7 and 3.8). By applying this procedure, sometimes more than one code tree has to be considered. The main reason is that the most probable symbols should have the shortest words. For binary codes it could be achieved by dividing symbols into two groups having equal (or as equal as possible) probabilities. Code words for one group will have the first bit 0, and for the second the first bit will be 1. It is continued into the subgroups until all subgroups contain only one symbol. For r > 2, the number of subgroups is r. Consider the next examples:


Fig. 3.5 Code trees corresponding to the codes (a) and (b)

(a) S Pi I division II division III division IV division


s1 0.6 0 0 0 0
s2 0.2 1 10 10 10
s3 0.1 1 11 110 110
s4 0.07 1 11 111 1110
s5 0.03 1 11 111 1111

(b) S Pi I division II division III division


s1 0.3 0 00 00
s2 0.2 0 01 01
s3 0.2 1 10 10
s4 0.2 1 11 110
s5 0.1 1 11 111

The corresponding code trees are shown in Fig. 3.5 (left branch corresponds to 0, right one to 1).
It is easy to conclude that this procedure is not an algorithm. Generally, for every q and r, all corresponding code trees should be examined to obtain a compact code. In Fig. 3.6, two code trees (out of 5 possible) are shown for a source with 6 symbols.
The corresponding codes, entropy and average code word lengths are


Fig. 3.6 Two possible code trees for a source with 6 symbols

S   Pi    (a)     (b)
s1  0.65  0       0
s2  0.15  10      10
s3  0.08  110     1100
s4  0.05  1110    1101
s5  0.04  11110   1110
s6  0.03  11111   1111
H(S) = 1.6597 Sh/symb,  La = 1.74 b/symb,  Lb = 1.75 b/symb

The difference in average code word length is very small and it is difficult to choose the right code tree without calculation.
However, Huffman [12] (Problems 3.3, 3.4, 3.5, 3.6, 3.7 and 3.8) gave an algorithm resulting in a compact code. For illustration, binary coding will be used. Consider the source without memory with the symbols s1, s2, …, sq and the corresponding symbol probabilities Pi (i = 1, 2, …, q). Symbols should be ordered according to non-increasing probabilities P1 ≥ P2 ≥ ⋯ ≥ Pq. Now the reductions start. The last two of the ordered symbols are combined into one symbol (its probability is Pq−1 + Pq) and a new source is obtained containing q − 1 symbols (first reduction). It is continued until only two symbols remain (after q − 2 reductions). Their code words start with 0 and 1. After that one goes back, dividing the compound symbols into the ones they were obtained from, adding 0 and 1 to the corresponding code words. At the end, the compact code is obtained.
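A minimal illustrative implementation of this procedure (one possible Python sketch, not the book's code) for the source from Fig. 3.6 is given below; it reproduces the average length L = 1.74 b/symb.

import heapq
from itertools import count

def huffman(probs):
    """probs: dict symbol -> probability; returns dict symbol -> binary code word."""
    tie = count()                                    # tie-breaker for equal probabilities
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)              # two smallest probabilities
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

probs = {"s1": 0.65, "s2": 0.15, "s3": 0.08, "s4": 0.05, "s5": 0.04, "s6": 0.03}
code = huffman(probs)
L = sum(probs[s] * len(w) for s, w in code.items())
print(code, "L =", L)                                # L = 1.74 b/symb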
Consider the previous example. Probabilities of symbols to be combined are denoted by a sign (• or ♦) after the probability value, and the probability of the resulting symbol is denoted with the same sign before the numerical value of the combined symbol probability; S1, …, S4 are the sources obtained after reduction.

S Pi Xi S1 S2 S3 S4
s1 0.65 0 0.65 0 0.65 0 0.65 0 0.65 0
s2 0.15 11 0.15 11 0.15 11 •0.20♦ 11 ♦0.35 1
s3 0.08 101 0.08 101 ♦0.12• 110 0.15♦ 110
s4 0.05 1001 •0.07♦ 1000 0.08• 101 101
s5 0.04• 10000 0.05♦ 1001
s6 0.03• 10001

Average code word length is L = 1.74 b/symb corresponding to the code tree
(b) from the Fig. 3.6. The obtained code words are not identical to the previous
solution, but the code word lengths are the same. Of course, by complementing the
corresponding bits, the identical code words could be obtained.
One more question is of interest. In some cases there will be more than two
compound code words with the same probabilities. What will happen if the com-
pound code word is not put at the last place? Consider the following example.
(a) Symbol obtained by reduction is put always at the end of symbols having the
same probabilities
S Pi Xi S1 S2 S3 S4
s1 0.5 0 0.5 0 0.5 0 0.5 0 0.5 0
s2 0.2 11 0.2 11 0.2 11 •0.3♦ 10 ♦0.5 1
s3 0.1 101 0.1 101 ♦0.2• 100 0.2♦ 11
s4 0.1 1000 0.1♦ 1000 0.1• 101
s5 0.07• 10010 •0.1♦ 1001
s6 0.03• 10011

Average code word length is L = 2.1 b/symb (entropy is H(S) = 2.0502


Sh/symb)
(b) The symbol obtained by the first reduction is always put at the first place among the symbols having the same probabilities, and in later reductions at the last place.
S Pi Xi S1 S2 S3 S4
s1 0.5 0 0.5 0 0.5 0 0.5 0 0.5 0
s2 0.2 11 0.2 11 0.2 11 •0.3♦ 10 ♦0.5 1
s3 0.1 1000 •0.1 101 ♦0.2• 100 0.2♦ 11
s4 0.1 1001 0.1♦ 1000 0.1• 101
s5 0.07• 1010 0.1♦ 1001
s6 0.03• 1011

Average code word length is the same as above L = 2.1 b/symb.


The structure of obtained code is different, but it is not important, because both
codes are compact (Problem 3.3). It is a nontrivial example of the existence of more

compact codes for the same source and the same code alphabet. This fact can be
used to impose some additional conditions for the obtained code. For example, the
condition can be put that the lengths of the obtained code words have smaller
variance. It will lessen the demands for encoder memory. It can be achieved by
putting the compound symbol always at the first place among the symbols having the same probabilities.
One drawback of the "classic" Huffman procedure is the need to know the symbol probabilities. However, these probabilities are not known at the beginning (except mainly for natural languages). But often there is no time to wait for the end of a long message to obtain the corresponding statistics. The messages must be transmitted continuously as they are generated. The problem is solved by using the adaptive (dynamic) Huffman algorithm. The code tree is changed continuously to take into account the current symbol probabilities, i.e. it is dynamically reordered. One possible solution was proposed independently by Faller and Gallager, later improved by Knuth, and the algorithm is known as the FGK algorithm (Problem 3.9). It is based on the so-called sibling property (Problems 3.2, 3.3, 3.4 and 3.9). Vitter (Problem 3.9) gave a modification that is suitable for very large trees. These long procedures are exposed in detail in the corresponding problems.
The Huffman coding does not take into account the source memory. The LZ (Lempel-Ziv) procedure (Problem 3.10) is a kind of universal coding without any explicit model. It can be thought of as an attempt to write a program for a decoder to generate the sequence compressed by the encoder at the transmitting part.
For ternary encoding, the last three symbols are combined, and correspondingly for r-ary encoding. If after the last reduction the number of compound symbols differs from 3 (r), dummy symbols (having probability equal to zero) should be added before the first reduction to avoid the loss of short code words.

Problems

Problem 3.1 By applying the Shannon-Fano encoding procedure find the binary code corresponding to a zero-memory source emitting q = 5 symbols whose probabilities
(a) are defined by P(s_i) = 2^{−i} for i = 1, 2, …, q − 1 and P(s_i) = 2^{−i+1} for i = q,
(b) are mutually equal, P(s_i) = 1/q, i = 1, 2, …, q.
For both cases find the code efficiency and the compression ratio. For values 2 ≤ q ≤ 20 draw the efficiency and the compression ratio as functions of q for both sources.
Solution
(a) Procedure for obtaining the code using Shannon-Fano encoding is shown in
Table 3.1. The encoding steps are:

Table 3.1 Shannon-Fano procedure for the first set (a) of symbol probabilities
si  P(si)  Code word forming  Code word  li (code word length)
s1  1/2    0                  0          1
s2  1/4    1 0                10         2
s3  1/8    1 1 0              110        3
s4  1/16   1 1 1 0            1110       4
s5  1/16   1 1 1 1            1111       4

1. Symbols are ordered according to non-increasing probabilities.
2. The set of symbols is divided into two groups with equal or approximately equal probabilities, but the groups cannot include non-adjacent symbols.
3. To every obtained group the binary symbol "0" or "1" is joined; it will be the first symbol of the code words for the symbols in this group.
4. If any group obtained in the previous step contains more than one symbol, its elements are further grouped into two subgroups with approximately equal probabilities. To every obtained subgroup a new binary symbol is joined, being the next symbol in the corresponding code words.
5. The described procedure is continued until all the symbols obtain their code words (a small illustrative sketch follows the list).
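A possible sketch of these steps (not the book's code; the split criterion simply minimizes the probability imbalance between the two adjacent groups), applied to the probabilities from Table 3.1:

def shannon_fano(symbols):
    """symbols: list of (name, probability) sorted by non-increasing probability."""
    if len(symbols) <= 1:
        return {symbols[0][0]: ""} if symbols else {}
    total, acc, split = sum(p for _, p in symbols), 0.0, 1
    best = float("inf")
    for i in range(1, len(symbols)):
        acc += symbols[i - 1][1]
        if abs(total - 2 * acc) < best:           # most balanced split point
            best, split = abs(total - 2 * acc), i
    code = {s: "0" + w for s, w in shannon_fano(symbols[:split]).items()}
    code.update({s: "1" + w for s, w in shannon_fano(symbols[split:]).items()})
    return code

probs = [("s1", 0.5), ("s2", 0.25), ("s3", 0.125), ("s4", 0.0625), ("s5", 0.0625)]
print(shannon_fano(probs))   # expected comma code: 0, 10, 110, 1110, 1111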
In this case a set of symbols in the first step is divided into two equally probable
subgroups (s1 is in the first, all other symbols in the second). In the next steps the
subgroups are always further divided into two equally probable sets. Therefore, in
this case, code word length corresponding to the ith symbol (denoted by li) satisfies
the relation

l_i = ld(1/P(s_i)),

and so called comma code is obtained, where the binary zero occurs only at the last
position in the code word, except for one of the words that has the maximum length,
having binary ones only.
To find the code efficiency, the entropy and the average code word length
should be found. In this case

H(S) = Σ_{i=1}^{5} P(s_i) ld(1/P(s_i)) = (1/2) ld 2 + (1/4) ld 4 + (1/8) ld 8 + 2 · (1/16) ld 16 = 1.875 [Sh/symb]
L = Σ_{i=1}^{5} P(s_i) l_i = (1/2) · 1 + (1/4) · 2 + (1/8) · 3 + 2 · (1/16) · 4 = 1.875 [b/symb]

and the perfect code is obtained, because its efficiency equals



η = (H(S)/L) × 100% = 100%.

Compression ratio shows the reduction of the number of bits at the encoder output compared to the case when a binary (non-compression) code is used; it is defined as

ρ = ⌈ld q⌉/L = 3/1.875 = 1.6,

where the operator ⌈·⌉ denotes the first integer greater than (or equal to) the argument. In this case, the number of bits at the encoder output is 1.6 times smaller.
For the given probabilities, it is easy to verify that for any number of emitted
symbols, a perfect comma code always can be found, and the following relation
holds

ρ = ⌈ld(q)⌉/H(S),

where the general expression for entropy, derived in Problem 2.2, yields

H(S) = 2 − 2^{2−q}.

(b) For the second set of probabilities, the procedure for obtaining the code using
Shannon-Fano encoding is shown in Table 3.2.
Entropy and average code word length are now

H(S) = Σ_{i=1}^{5} P(s_i) ld(1/P(s_i)) = 5 · (1/5) ld 5 = 2.3219 [Sh/symb],
L = Σ_{i=1}^{5} P(s_i) l_i = 3 · (1/5) · 2 + 2 · (1/5) · 3 = 2.4 [b/symb],

Table 3.2 Shannon-Fano procedure for the second set (b) of symbol probabilities
si  P(si)  Obtaining of code word  Code word  li (code word length)
s1  1/5    0 0                     00         2
s2  1/5    0 1                     01         2
s3  1/5    1 0                     10         2
s4  1/5    1 1 0                   110        3
s5  1/5    1 1 1                   111        3

and efficiency and compression ratio

η = (H(S)/L) × 100% = 96.75%,
ρ = ⌈ld(q)⌉/L = 3/2.4 = 1.25.

For the probabilities in the form P(si) = 1/q, i = 1, 2, …, q, general expression for
entropy is

H(S) = Σ_{i=1}^{q} P(s_i) ld(1/P(s_i)) = Σ_{i=1}^{q} (1/q) ld(q) = ld(q),

while the obtained compression ratio is

ρ = ⌈ld(q)⌉/L = ρ_max η,

where ρ_max denotes the maximum possible compression ratio value, which does not depend on the code used, but on the source, and shows the "source compression potential", given by

ρ_max = ⌈ld(q)⌉/H(S) = ⌈ld(q)⌉/ld(q).

When the number of source symbols corresponds to a power of 2 (q = 2^n, n an integer), the code obtained by the Shannon-Fano procedure is equivalent to the binary code, and this code is perfect. In this case the efficiency is the maximum possible and the obtained compression ratio is equal to the maximum possible compression ratio. However, here it equals 1, meaning that there is no compression at all.
Therefore, for a source that has equiprobable symbols, compression can be achieved only if the number of symbols is not q = 2^n. This code is not perfect, but the compression ratio is greater than 1. The conclusion is that only non-perfect codes can achieve compression here, while perfect codes (for those values of q for which they can be constructed) are useless from the compression point of view. The reader should consider whether that fact is in contradiction with the definition that, for some source, the perfect code achieves 100% efficiency and the maximum possible compression ratio. The maximum possible compression ratio vs. the number of source symbols is shown in Fig. 3.7. For equiprobable symbols ρ_max has small values. However, in the second analyzed case the source compression potential is greater and grows with the number of emitted symbols.

Fig. 3.7 The maximum possible compression ratio ρ_max(q) vs. the number of source symbols q for the two cases analyzed: (a) perfect comma code, (b) symbols with equal probability

From Fig. 3.7 the following conclusions can be drawn:


1. Compression ratio does not depend on the obtained code efficiency only, but on the source entropy as well. Sources with smaller entropy have a greater compression potential, given by ρ_max.
2. For the first analyzed case, for any number of source symbols, the obtained comma code is perfect and the achieved compression ratio is equal to ρ_max.
3. For the second analyzed source, when the number of symbols is increased, the code efficiency approaches η = 100%, but that does not mean that the compression ratio grows monotonically with the number of emitted symbols.
4. For the second analyzed source, the obtained code is perfect only when q = 2^n, yielding the compression ratio ρ = ρ_max = 1. For all other cases, the curve in Fig. 3.7 is only the upper bound for the compression ratio. However, it is interesting that by using a non-perfect code (e.g. when combined with the source extensions) for the source having equiprobable symbols with q ≠ 2^n, a compression ratio ρ > 1 can be achieved.

Problem 3.2 Find the binary code for zero-memory source defined by

si s1 s2 s3 s4
P(si) 0.6 0.25 0.125 0.025

(a) Find the efficiency and the code compression ratio.


(b) If the probability of s1 is smaller by 0.1 and P(s4) is greater by the same value, find the corresponding efficiency and the compression ratio. Comment on the results.
(c) Is the obtained code instantaneous? Elaborate the answer!

Solution
(a) It is obvious that comma code is the best solution. The first symbol is encoded
by one bit only (‘0’), second by two (‘10’) and third and fourth by three bits
(‘110’ and ‘111’). The entropy and the average code word length are

H(S) = −0.6 ld(0.6) − 0.25 ld(0.25) − 0.125 ld(0.125) − 0.025 ld(0.025) = 1.4502 [Sh/symb]
L = 0.6 · 1 + 0.25 · 2 + 0.125 · 3 + 0.025 · 3 = 1.55 [b/symb],

the efficiency and compression ratio are η = 93.56% and ρ = ⌈ld(q)⌉/L = 2/1.55 = 1.2903.
(b) In the case when the probabilities of the first and the fourth symbol are changed, the comma code is the best solution as well. The same code as in (a) will be used. The probabilities are of the type 2^{−i}, and the code is perfect, yielding the same values for the entropy and the average code word length

H(S) = L = 1.75 [Sh/symb].

The efficiency is η = 100%, and the compression ratio is

ρ = ⌈ld(q)⌉/L = 2/1.75 = 1.1429.

In this case by changing the probabilities of two symbols (the number of


symbols is the same!) the efficiency becomes greater, but the compression
ratio becomes smaller. In both cases obtained codes are compact, and in the
second case code is even perfect, but the compression ratio is small, because
the source is not “suitable for compression”. In fact, the source entropy in the
second case is greater than in the first one and the maximal value of the compression ratio (defined in the previous problem) decreased from ρ_max = 1.3791 to ρ_max = 1.1429. Because of that, the compression ratio is smaller.
(c) In both cases the same code was used, with the same code word lengths. For
an instantaneous code, the Kraft inequality should be satisfied, as in both cases

Σ_{i=1}^{4} 2^{−l_i} = 2^{−1} + 2^{−2} + 2 · 2^{−3} = 1 ≤ 1.

The Kraft inequality is always satisfied for a comma code (in fact with equality), but it is obvious that a code can be constructed with the same code word lengths which is not instantaneous. For example, the code that has the code words '0', '10', '100' and '111' is not even uniquely decodable. Therefore, it should be verified whether some code word is a prefix of some other code word.
For codes (a) and (b) it is not the case, and the codes are instantaneous.
Problem 3.3 Find the binary code for zero-memory source defined by:

si s1 s2 s3 s4 s5 s6
P(si) 0.65 0.05 0.08 0.15 0.04 0.03

(a) Apply the Huffman procedure and find the efficiency. Is the obtained code compact?
(b) Draw the corresponding code tree. Does the code tree satisfy the sibling property?

Solution
(a) Huffman procedure consists of the following steps:
1. The symbols are ordered according to non-increasing probabilities.
2. The reduction is carried out by combining the two symbols having the smallest probabilities into one symbol.
3. The obtained symbol (denoted in this example by an asterisk) has the probability equal to the sum of the combined symbols' probabilities.
4. The procedure is repeated (including the possible reordering of new
symbols) until only two symbols remain. One is denoted e.g. by “0”, and
the second by “1”.
5. Code word for a combined symbol is a prefix of the code words for
symbols from which it was obtained. After this prefix, “0” and “1” are
added.
6. The procedure is repeated until all original symbols obtain corresponding
code words (third column in the Table 3.3).
Entropy, average code word length and efficiency are:
H(S) = Σ_{i=1}^{6} P(s_i) ld(1/P(s_i)) = 1.66 [Sh/symb],
L = Σ_{i=1}^{6} P(s_i) l(s_i) = 0.65 · 1 + 0.05 · 4 + 0.08 · 3 + 0.15 · 2 + 0.04 · 5 + 0.03 · 5 = 1.74 [b/symb],

Table 3.3 Huffman procedure


si P(si) xi si P(si) xi si P(si) xi si P(si) xi si P(si) xi
s1 0.65 0 s1 0.65 0 s1 0.65 0 s1 0.65 0 s1 0.65 0
s4 0.15 11 s4 0.15 11 s4 0.15 11 s2s3s5s6 0.20* 10 s2 s3s4s5 s6 0.35* 1
s3 0.08 101 s3 0.08 101 s2s5s6 0.12* 100 s4 0.15 11
s2 0.05 1001 s5s6 0.07* 1000 s3 0.08 101
s5 0.04 10000 s2 0.05 1001
s6 0.03 10001

Fig. 3.8 Huffman procedure illustrated by tree

η = (H(S)/L) × 100% = 95.4%,

while the obtained compression ratio is

ρ = ⌈ld(q)⌉/L = 1.8072.

Huffman procedure guarantees the obtaining of the compact code. The obtained
code here is not perfect. However, for this source it is not possible to find an
instantaneous code achieving smaller average code word length (higher efficiency).
(b) It is easier to follow the Huffman procedure using the code tree, as shown in
Fig. 3.8. The tree is formed from the root (on the right side), two branches
starting from it are denoted by different bits and the procedure is repeated
going to the lower hierarchical levels. It is not important which branch is
denoted by binary zero and which by binary one.
Here the result of Huffman procedure gives slightly different code words than in
the previous case—1, 0101, 011, 00, 01001 and 01000. In Fig. 3.9 the ordered code
tree is shown where the hierarchical levels can be easily noticed. It is obvious that
this tree satisfies the sibling property, because the symbol probabilities, started from
the left to the right and from the lowest hierarchical level up, do not decrease. One
can verify easily that the same tree topology corresponds to the code from Table 3.3,
although the code words differ. There are five nodes at the tree, from each one two
branches are going out. Therefore, 32 equivalent compression codes can be formed
having the same tree topology, and each one is instantaneous and compact.
Problem 3.4 Zero-memory source is defined by:

si s1 s2 s3 s4 s5 s6 s7
P(si) 0.39 0.21 0.13 0.08 0.07 0.07 0.05


Fig. 3.9 Ordered code tree from Fig. 3.8

(a) Apply the Shannon-Fano encoding procedure and find the average code word length.
(b) Is the obtained code compact? Is the code perfect?
(c) Is the Kraft inequality satisfied?

Solution
(a) It is chosen that in the first division symbols s1 and s2 form one group, the
other symbols forming the other group (probability ratio is 0.6:0.4). The
alternatives are s1 in the first group (0.39:0.61) or s1, s2 and s3 in the first group
(0.73:0.27), but according to the Shannon-Fano procedure they are suboptimal
solutions. The obtained groups are further subdivided into the subgroups by
trying that they have as equal as possible probabilities. The procedure is given
in Table 3.4.
Entropy and the average code word length are

Table 3.4 Shannon-Fano procedure
si  P(si)  Obtaining of the code word  Code word  li (code word length)
s1  0.39   0 0        00    2
s2  0.21   0 1        01    2
s3  0.13   1 0 0      100   3
s4  0.08   1 0 1      101   3
s5  0.07   1 1 0      110   3
s6  0.07   1 1 1 0    1110  4
s7  0.05   1 1 1 1    1111  4

H(S) = Σ_{i=1}^{7} P(s_i) ld(1/P(s_i)) = 2.43 [Sh/symb],
L = Σ_{i=1}^{7} P(s_i) l_i = 2.52 [b/symb],

(b) Compactness of the code obtained by Shannon-Fano procedure will be veri-


fied by comparison to the code which is compact, obtained by Huffman
procedure—Fig. 3.10.
The corresponding average code word length is

L = Σ_{i=1}^{7} P(s_i) l_i = 0.39 · 1 + 0.21 · 3 + 0.13 · 3 + 0.08 · 4 + 0.07 · 4 + 0.07 · 4 + 0.05 · 4 = 2.49 [b/symb]

and the obtained efficiency is greater—η = 97.59%.


The existence of code which has shorter average code length (higher effi-
ciency), confirms that the above code obtained by Shannon-Fano procedure is
not compact. It confirms as well that the application of Shannon-Fano pro-
cedure does not guarantee the obtaining of a compact code. Of course, non
compact code surely is not perfect. It is recommended to the reader to verify
whether by using the different subdivisions during Shannon-Fano procedure
(when the probabilities in the subgroups are not balanced) the compact code
can be obtained.
Compactness can be verified as well by inspection whether the corresponding
code tree has the sibling property. The code tree is shown in Fig. 3.11 where
the hierarchical levels can be easily seen. The sibling property is not fulfilled
because the tree cannot be altered in such a way that the probabilities at the
third hierarchical level (denoted by the dashed line) are increasingly ordered
without the change of the tree topology. This tree does not correspond to the
code obtained by Huffman procedure and the code cannot be compact.


Fig. 3.10 The code tree obtained by Huffman procedure


Fig. 3.11 Code tree corresponding to the Shannon-Fano procedure



(c) Kraft inequality is satisfied

Σ_{i=1}^{7} 2^{−l_i} = 2 · 2^{−2} + 3 · 2^{−3} + 2 · 2^{−4} = 1 ≤ 1,

which should be expected, because the Shannon-Fano procedure guarantees an instantaneous code. This relation is here satisfied with equality, and the code is complete.

Problem 3.5 Apply the Huffman procedure for a source defined by the table given below if the symbols obtained by reduction are always put at the last place in the group of equally probable symbols, or if they are put arbitrarily.


Fig. 3.12 a The first solution. b The second solution



Table 3.5 Huffman codes corresponding to the code trees shown in Fig. 3.12a, b
si  P(si)  x_i^(1)  l_i^(1)  x_i^(2)  l_i^(2)
s1  0.4    0        1        00       2
s2  0.2    10       2        10       2
s3  0.15   1100     4        110      3
s4  0.1    1101     4        111      3
s5  0.1    1110     4        010      3
s6  0.05   1111     4        011      3

si s1 s2 s3 s4 s5 s6
P(si) 0.4 0.2 0.1 0.1 0.1 0.1

(a) Draw at least two code trees corresponding to the obtained codes that have different topologies. Do the codes satisfy the Kraft inequality? Do these codes have the same average code word length? What are the differences?
(b) If the source emits the sequence s2, s1, s4, s1, s5, s3, s6 and the channel errors occur at the first and eighth bit, find the decoded sequences in both cases.

Solution
In Fig. 3.12a, b two possible results of applying the Huffman procedure are shown.
The obtained code words are given in Table 3.5. For both cases the average code
word length is the same—L ¼ 2:4 ½b=symb, but the maximum code word length is
different (max(li) = 5 for the first code and max(li) = 4 for the second one) as well
as the sum of code word lengths (sum(li) = 20 and sum(li) = 18).
The code word length variances also differ; they are found according to the formula

Var(n) = (1/q) Σ_{i=1}^{q} (l_i^{(n)} − L)²,

where n denotes the ordinal number of the solution (code). The corresponding numerical values are

Var(1) = (1 − 2.4)² + (2 − 2.4)² + 4 · (4 − 2.4)² = 12.36,
Var(2) = 2 · (2 − 2.4)² + 4 · (3 − 2.4)² = 1.76.

For both topologies it is possible to form a number of different code words sets.
There are five nodes in the tree, from each one two branches going out. Therefore
32 different compact codes can be formed for each tree topology. Codes corre-
sponding to one topology have different code words from the second one, but their
lengths are the same and their variances are the same.

(b) For the first code, when seven symbols were sent, eight symbols were received; one bit error usually causes error propagation to the next symbol as well.
s2 s1 s4 s1 s5 s3 s6
10 0 1101 0 1110 1100 1111
0 0 0 1101 1101 0 1100 1111
s1 s1 s1 s4 s4 s1 s3 s6

For the second code when seven symbols were sent, seven symbols were
received, and error propagation is not noticeable.

s2 s1 s4 s1 s5 s3 s6
10 00 111 00 010 110 011
00 00 111 00 110 110 011
s1 s1 s4 s1 s3 s3 s6

Although in the second case the code word length variance is smaller (the code word lengths are more uniform), it did not result in reducing the problems due to the bad encoder and decoder synchronization, usually caused by transmission errors. It is obvious that this effect arises whenever the code words have different lengths. In this case the number of symbols in error, as well as the error structure, depends on the code word structure and the consequences generally cannot be easily predicted.
Problem 3.6 Zero-memory source is defined by

si s1 s2 s3 s4 s5 s6 s7 s8 s9 s10
P(si) 0.18 0.2 0.14 0.15 0.1 0.11 0.05 0.04 0.02 0.01

Apply the Huffman procedure to obtain binary, ternary and quaternary code.
Find the corresponding efficiencies.
Solution
The procedure for obtaining the base-m Huffman code consists of the following steps [1, 7]:
1. In the first step q0 symbols are grouped, where 2 ≤ q0 ≤ m and q0 is the smallest integer satisfying

rem{(q − q0)/(m − 1)} = 0,

where rem{} is the remainder after the division.


2. In every following step m symbols are grouped, until only one symbol remains.
3. When the tree is formed to every branch starting from the root one of the m-ary
symbols is adjoined. All symbols that can be reached by the path using one
branch will have the code word starting with the corresponding adjoined
symbol.


Fig. 3.13 Procedure to obtain Huffman binary code

4. The procedure is repeated by adding a new symbols corresponding to the


branches starting from the nodes at the lower hierarchical levels, until the lowest
level is reached (where the source symbols are put).
For the considered source the obtaining of binary code is shown in Fig. 3.13. In
this case q0 = 2, regardless of the number of source symbols.
For a binary code, the following is obtained

H(S) = Σ_{i=1}^{10} P(s_i) ld(1/P(s_i)) = 2.981 [Sh/symb],  L^(2) = 3.02 [b/symb],  η^(2) = 98.72%.

The procedure for ternary code forming is shown in Fig. 3.14a. In this case
q = 10 and as q − q0 should be even, it is easy to find q0 = 2. The corresponding
code words are given in Table 3.6. The average code word length and efficiency are

L^(3) = Σ_{i=1}^{10} P(s_i) l_i^(3) = 1.95 [ter. symb/symb],  η^(3) = H(S)/(ld(3) · L^(3)) × 100% = 96.42%.

The procedure for quaternary code forming is shown in Fig. 3.14b. In this case
m = 4, q0 = 4 yielding

Fig. 3.14 Huffman procedure for three (a) and four (b) symbols of the code alphabet

L^(4) = Σ_{i=1}^{10} P(s_i) l_i^(4) = 1.59 [quat. symb/symb],  η^(4) = H(S)/(ld(4) · L^(4)) × 100% = 90.88%.

Problem 3.7 The source emits two symbols according to the table:

si s1 s2
P(si) 0.7 0.3

(a) By applying the Huffman procedure find the code for the source and for its
second and third extension as well. Find the efficiency and the compression
ratio for all codes and compare the results to the bounds given by the First
Shannon theorem.

Table 3.6 Code words of binary, ternary and quaternary code and their lengths
si       s1   s2  s3   s4   s5   s6   s7    s8     s9      s10
P(si)    0.18 0.2 0.14 0.15 0.1  0.11 0.05  0.04   0.02    0.01
x_i^(2)  100  00  111  101  011  010  1100  11010  110110  110111
l_i^(2)  3    2   3    3    3    3    4     5      6       6
x_i^(3)  BA   A   BC   BB   CB   CA   CCA   CCB    CCCA    CCCB
l_i^(3)  2    1   2    2    2    2    3     3      4       4
x_i^(4)  B    A   DA   C    DC   DB   DDA   DDB    DDC     DDD
l_i^(4)  1    1   2    1    2    2    3     3      3       3

(b) Draw the efficiency as a function of the extension order (n) for P(s1) = 0.05, P(s1) = 0.1, P(s1) = 0.2, P(s1) = 0.3 and P(s1) = 0.4, for the extension order 1 ≤ n ≤ 8.
(c) Find the dependence of the maximal compression ratio on the probability P(s1).

Solution
(a) Encoding of the original source is trivial—symbol s1 is encoded by binary
zero and symbol s2 by binary one. The entropy and the average code word length are

H(S) = −0.7·ld(0.7) − 0.3·ld(0.3) = 0.8813 [Sh/symb],
L̄1 = 0.7·1 + 0.3·1 = 1 [b/symb],

the efficiency is η = 88.13% and the compression ratio is ρ = 1/L̄1 = 1.


The encoding of the second source extension is given in Table 3.7. Firstly, the
combined symbols are formed, having the probabilities P(si, sj) = P(si)P(sj)
because the source is memoryless, and after that the binary Huffman proce-
dure is applied. From H(S²) = 2H(S) = 1.7626 [Sh/symb], after finding L̄2 = 1.81 [b/symb],
the efficiency increases to η = 97.38% and the compression ratio is
ρ = 2/L̄2 = 1.105. For the third extension the code words are shown in
Table 3.8. Further, one obtains H(S³) = 3H(S) = 2.6439 [Sh/symb],
L̄3 = 2.726 [b/symb], η = 96.99% and ρ = 3/L̄3 = 1.1005.
(b) It is obvious that the efficiency increases with the extension order. The
described procedure is repeated for various values of P(s1) and the results are given in
Table 3.9. The efficiency does not always increase monotonically with the
extension order (n)! However, with a further increase of n the efficiency
converges to 100%, satisfying the First Shannon theorem

lim_{n→∞} H(Sⁿ)/L̄n = 1.
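A short sketch, assuming a standard binary Huffman construction, that reproduces the trend of Table 3.9 for one value of P(s1): the efficiency of encoding the n-th extension approaches 100%, though not necessarily monotonically.

```python
import heapq
from itertools import product
from math import log2, prod

def huffman_avg_length(probs):
    """Average binary Huffman length: the sum of all merged node probabilities."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total

p1 = 0.3
H = -p1 * log2(p1) - (1 - p1) * log2(1 - p1)
for n in range(1, 9):
    probs = [prod(c) for c in product((p1, 1 - p1), repeat=n)]   # n-th extension symbols
    L = huffman_avg_length(probs)
    print(f"n = {n}: efficiency = {100 * n * H / L:.2f} %")
```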

The dependence on the extension order is more easily observed in Fig. 3.15. Of
course, the maximum efficiency is obtained for P(s1) = P(s2); it is equal to
100% for any extension order. However, in this case the maximum

Table 3.7 Huffman encoding for the second source extension

ri          P(ri)  xi       ri      P(ri)  xi      ri       P(ri)  xi
r1 = s1s1   0.49   1        r1      0.49   1       r2r3r4   0.51*  0
r2 = s1s2   0.21   01       r3r4    0.30*  00      r1       0.49   1
r3 = s2s1   0.21   000      r2      0.21   01
r4 = s2s2   0.09   001

Table 3.8 Huffman encoding for the third source extension

ri      s1s1s1  s1s1s2  s1s2s1  s1s2s2  s2s1s1  s2s1s2  s2s2s1  s2s2s2
P(ri)   0.343   0.147   0.147   0.063   0.147   0.063   0.063   0.027
xi      00      010     011     1100    10      1101    1110    1111

Table 3.9 The efficiency as a function of extension order, for various probabilities P(s1)
n 1 2 3 4 5 6 7 8
P(s1) = 0.05 28.64 49.92 66.10 78.00 85.19 90.56 94.29 95.86
P(s1) = 0.1 46.90 72.71 88.05 95.22 97.67 99.75 98.87 98.57
P(s1) = 0.2 72.19 92.55 99.17 97.45 97.83 99.54 98.66 98.59
P(s1) = 0.3 88.13 97.38 96.99 98.82 99.13 99.22 99.64 99.48
P(s1) = 0.4 97.10 97.10 98.94 98.96 99.31 99.45 99.55 99.64

Fig. 3.15 The dependence of efficiency on the extension order

efficiency does not mean the highest compression ratio. This will be
additionally explained in the next part of the solution.
(c) The extensions are obtained from the original source, so the maximal com-
pression ratio does not depend on the extension order, because the compression
potential of the source does not change with it:

ρ_max(n) = ⌈ld(2ⁿ)⌉/H(Sⁿ) = n·⌈ld(2)⌉/(n·H(S)) = 1/H(S) = ρ_max(1),

while the obtained compression ratio of the binary source (ld(q) = 1) is
ρ(n) = n/L̄n, and on the basis of the First Shannon theorem, with the increase of the
extension order, it attains the maximum use of the source compression
potential as well

lim_{n→∞} 1/ρ(n) = lim_{n→∞} L̄n/n = H(S).

The dependence of the maximum compression ratio on the binary source symbol
probability is shown in Fig. 3.16. It is obvious that for equiprobable
symbols the compression ratio is ρ = 1, regardless of the extension order, while
in the case when P(s1) ≪ P(s2) or P(s2) ≪ P(s1) the obtained compression ratio
can be significantly higher.
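A minimal numeric check of the curve of Fig. 3.16, using the bound ρ_max = 1/H(S) derived above (the probability values below are chosen for this sketch).

```python
from math import log2

def rho_max(p1):
    """Maximum compression ratio of a binary memoryless source, 1/H(S)."""
    h = -p1 * log2(p1) - (1 - p1) * log2(1 - p1)
    return 1.0 / h

for p1 in (0.05, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"P(s1) = {p1}: rho_max = {rho_max(p1):.2f}")   # equals 1 for equiprobable symbols
```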

Problem 3.8 A first-order memory binary source is defined by the transition matrix

    P = | 1−p   p  |
        |  p   1−p |.

Fig. 3.16 The dependence of maximum compression ratio on the binary source symbol probability

(a) Find the corresponding binary Huffman codes for a source as well as for its
second and third extension if p = 0.01.
(b) Find the minimum value of the average code word length for the nth exten-
sion. Draw the efficiency and compression ratio dependence on the extension
order for p = 0.1, p = 0.01 and p = 0.001.

Solution
(a) In the solution of Problem 2.6 it was shown that the corresponding stationary
binary symbol probabilities are P(0) = P(1) = 0.5. The Huffman code for such a
source is trivial—symbol ‘0’ is coded by ‘0’ and ‘1’ by ‘1’. Obviously, the
average code word length is L̄1 = H(S̄) = 1 [b/symb]. The entropy is

H(S) = −0.01·ld(0.01) − 0.99·ld(0.99) = 0.0808 [Sh/symb],

and the efficiency and compression ratio are

η = 8.08%, ρ = 1.

Stationary probabilities of the combined symbols corresponding to the second
extension and the corresponding code words are shown in Table 3.10.
It is easy to verify the following

H(S²) = 2H(S) = 0.1616 [Sh/symb],
H(S̄²) = Σ_{i=1}^{4} P(ri)·ld(1/P(ri)) = 1.0808 [Sh/symb]

(S̄² denotes the zero-memory source adjoined to the second extension),

while the average code word length is

L̄2 = Σ_{i=1}^{4} P(ri)·li = 1.515 [b/symb],

the efficiency is η = 10.67%, and the compression ratio is ρ = 2/L̄2 = 1.32. The
relation H(S̄²) < L̄2 is satisfied as well.

Table 3.10 Stationary probabilities of the second extension symbols and the corresponding code
words
Combined symbols Probabilities Code words li
r1 = ‘00’ P(00) = P(0)P(0/0) = 0.495 0 1
r2 = ‘01’ P(01) = P(0)P(1/0) = 0.005 110 3
r3 = ‘10’ P(10) = P(1)P(0/1) = 0.005 111 3
r4 = ‘11’ P(11) = P(1)P(1/1) = 0.495 10 2

The probabilities of the third extension combined symbols and the corre-
sponding code words are given in Table 3.11. In this case, it is easy to verify

H(S³) = 3H(S) = 0.2424 [Sh/symb],
H(S̄³) = Σ_{i=1}^{8} P(ri)·ld(1/P(ri)) = 1.1616 [Sh/symb],

and the average code word length is

L̄3 = Σ_{i=1}^{8} P(ri)·li = 1.549 [b/symb],

yielding the efficiency η = 15.65% and the compression ratio ρ = 3/L̄3 = 1.94. It is
interesting to note that the relation H(S̄³) < L̄3 is satisfied in this case as well.
(b) The Huffman procedure takes into account only the stationary probabilities of
symbols emitted by the extended source, so the average code word length
cannot be smaller than the entropy of the source adjoined to that extension, i.e.
the following must hold

L̄n ≥ H(S̄ⁿ).

Table 3.11 Stationary probabilities of the third extension symbols and the corresponding code words

Combined     Conditional                        Probabilities                        Code     li
symbols      probabilities                                                           words
r1 = ‘000’   P(00/0) = P(0/0)P(0/0) = 0.9801    P(000) = P(0)P(00/0) = 4.9005 × 10⁻¹  0        1
r2 = ‘001’   P(01/0) = P(1/0)P(0/0) = 0.0099    P(001) = P(0)P(01/0) = 4.95 × 10⁻³    1111     4
r3 = ‘010’   P(10/0) = P(0/1)P(1/0) = 0.0001    P(010) = P(0)P(10/0) = 5 × 10⁻⁵       111001   6
r4 = ‘011’   P(11/0) = P(1/1)P(1/0) = 0.0099    P(011) = P(0)P(11/0) = 4.95 × 10⁻³    1100     4
r5 = ‘100’   P(00/1) = P(0/0)P(0/1) = 0.0099    P(100) = P(1)P(00/1) = 4.95 × 10⁻³    1101     4
r6 = ‘101’   P(01/1) = P(1/0)P(0/1) = 0.0001    P(101) = P(1)P(01/1) = 5 × 10⁻⁵       111000   6
r7 = ‘110’   P(10/1) = P(0/1)P(1/1) = 0.0099    P(110) = P(1)P(10/1) = 4.95 × 10⁻³    11101    5
r8 = ‘111’   P(11/1) = P(1/1)P(1/1) = 0.9801    P(111) = P(1)P(11/1) = 4.9005 × 10⁻¹  10       2

Further, the effective efficiency of the Huffman code for sources with memory can
be defined as

η_ef(n) = H(S̄ⁿ)/L̄n · 100%,

and for the considered case it is

η_ef(1) = 100%,  η_ef(2) = 71.34%,  η_ef(3) = 74.99%.

It is clear that with the increase of the extension order the effective efficiency
rapidly converges to its maximum value, even when there is a significant dif-
ference between the symbol probabilities, as is the case here. More critical is
how the maximum possible efficiency and the maximum possible compression
ratio change. The corresponding formulas are

η_max(n) = H(Sⁿ)/H(S̄ⁿ) · 100%,  ρ_max(n) = ⌈log2(qⁿ)⌉/H(S̄ⁿ).

Using the known relations for entropy of nth extension and the source adjoined
to it, as well as the fact that the original source is binary, the following expressions
are obtained (and the dependences shown in Figs. 3.17 and 3.18)

Fig. 3.17 The maximum efficiency that can be achieved by Huffman encoding, first-order
memory source, p parameter

Fig. 3.18 The maximum compression ratio that can be achieved by Huffman encoding, first-order
memory source, p parameter

η_max(n) = n·H(S)/((n − 1)·H(S) + H(S̄)) · 100%,  ρ_max(n) = n/((n − 1)·H(S) + H(S̄)),

where H(S̄) = 1 [Sh/symb] is the entropy of the adjoined memoryless binary source with
equiprobable symbols.

For higher p values the maximum code efficiency can be achieved with a small
extension order, but the corresponding maximum value of the compression ratio is
significantly smaller. Sources with memory where the parameter p has small values
have a great compression potential. For p = 0.001, the 50th extension provides a
compression ratio of the order of 30. This means that 30 equiprobable bits emitted by
the source can, on average, be represented by only one bit at the encoder output.
However, in this case the Huffman procedure should be applied to 2⁵⁰ symbols of the
extended source alphabet!
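A sketch, under the expressions derived above, of how the maximum efficiency and the maximum compression ratio grow with the extension order for a few values of the parameter p (the chosen p and n values reproduce the trends of Figs. 3.17 and 3.18).

```python
from math import log2

def markov_limits(p, n):
    """eta_max(n) in % and rho_max(n) for the first-order memory binary symmetric source."""
    h = -p * log2(p) - (1 - p) * log2(1 - p)   # H(S), conditional entropy of the source
    h_adj = (n - 1) * h + 1.0                  # entropy of the adjoined source of the n-th extension
    return 100 * n * h / h_adj, n / h_adj

for p in (0.1, 0.01, 0.001):
    for n in (1, 10, 50, 200):
        eta, rho = markov_limits(p, n)
        print(f"p = {p}, n = {n}: eta_max = {eta:.1f} %, rho_max = {rho:.2f}")
```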
Problem 3.9 Discrete source for which the number of symbols and its probabilities
are not known in advance, emits the sequence

aabcccbd. . .

(a) Apply binary adaptive Huffman encoding by using FGK algorithm.


(b) Explain in details the decoding procedure if its input is generated by the
encoder previously considered.
(c) Repeat the procedure from (a) if instead of FGK algorithm, Vitter algorithm is
applied.

Solution
In the adaptive Huffman algorithm the encoder and the decoder use a dynamic change of the code
tree structure. The procedure of tree rearranging after the reception of every new
symbol at the encoder input depends on the applied variant of the adaptive Huffman
algorithm, but the code tree must at every moment have the sibling property. The
algorithm is based on the fact that a binary code is a Huffman code if and only if
the code tree has the sibling property. It practically means that at any moment (for
any length of the incoming sequence) the code tree obtained by applying the
adaptive Huffman algorithm could also be obtained by applying the static Huffman
algorithm. Therefore, the code is always compact, although the code tree
structure changes in time.
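A minimal sketch of one part of the sibling-property test: when the nodes are listed from the lowest level up, left to right (the numbering used later in Fig. 3.24), their weights must form a non-decreasing sequence (the requirement that siblings are adjacent in that list is not checked here). The example weights are an assumed reading of the tree of Fig. 3.23.

```python
def weights_non_decreasing(weights):
    """weights: node weights listed from the lowest level up, left to right."""
    return all(w1 <= w2 for w1, w2 in zip(weights, weights[1:]))

# Node weights of the final tree of Fig. 3.23, read in that order:
print(weights_non_decreasing([0, 1, 1, 2, 2, 3, 3, 5, 8]))   # True
```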
(a) The basis of FGK algorithm was independently found by Newton Faller
(1973) [13] and Robert Gallager (1978) [14], while Donald Knuth (1985)
[15] added improvements into the original algorithm, and the name of the
algorithm is derived from the family initials of the authors.
The procedure of the code forming for the input sequence aabcccbd is exposed
in what follows
1. At the beginning the Huffman tree is degenerated and consists of the zero
node only (node 0), as shown in Fig. 3.19a. At the encoder input is symbol
a and it is easy to verify that this symbol is not yet on the tree. Therefore, the
encoder emits the code word corresponding to the node 0 (being 0 as well),
emitting simultaneously the corresponding ASCII code for the symbol—
ASCII(a). As shown in Fig. 3.19b, a new tree is now formed by division of
the node 0 into two nodes—0 and a. To node 0 the code word 0 is adjoined
and the number of previous appearances of this symbol corresponds to its
weight being always w(0) = 0 (this symbol does not appear at the input
sequence, but it is added artificially, for the easier algorithm realization).
To the node corresponding to symbol a code word 1 is adjoined and the
corresponding node weight is w(a) = 1 (in the figures, the weight is always
in the rectangle, near the node).
2. Once more a enters the encoder, already existing on the tree. The encoder
emits the code corresponding to the node a—it is now the code word 1. The tree
structure does not change because the entering element already exists on
the tree. Because two symbols a have been entered, the node weight

Fig. 3.19 Tree structure at the beginning (a) and after the entering of the symbol a (b)

Fig. 3.20 Tree structure after entering of the second (a) and of the third (b) symbol

increments, now it is w(a) = 2, and to this node at the new tree code 1 is
adjoined, as shown in Fig. 3.20a.
3. Into encoder enters b, not existing yet at the tree. The encoder emits the
code corresponding to the node 0 (the code being 0) and after that ASCII
(b). New tree is formed by separating the node 0 into two nodes—0 and b,
as shown in Fig. 3.20b. To the node 0 the code 00 is adjoined with the
weight w(0) = 0 while the code 01 is adjoined to the node b with the
weight w(b) = 1.
4. Now c enters the encoder, not yet existing on the tree. The encoder emits
the code corresponding to the node 0, now being 00, and after that ASCII
(c). A new tree is formed by separating the node 0 into two nodes—0 and
c. To the node 0 the code 000 is adjoined (it is obvious that the corresponding
code for this node will always be the all-zeros sequence, with the weight
w(0) = 0!), and to the node c the code 001 is adjoined, with the weight
w(c) = 1, as shown in Fig. 3.21a.

Fig. 3.21 Tree structure after entering of the fourth (a) and of the fifth (b) symbol

5. Into encoder now enters c, already existing on the tree. The encoder emits
code corresponding to the node c being now 001. The weight of node
corresponding to symbol c is incremented and a new tree does not have the
sibling property. Because of that, symbols b and c change the places and a
new tree is obtained (the tree topology does not change). The rule for this
reordering is that the node whose weight increased by incrementation
(w(c) = 2) change the place with the node which has a smaller weight than
it, and from all the nodes that have this property, the one is chosen which is
at the lowest hierarchical level. Just because of that, the node c goes up for
one level towards the root. It should be noted that this reordering is logical,
because the same result would be obtained for static Huffman algorithm
with the same symbol probabilities (i.e. P(a) = P(c) = 0.4 and P(b) = 0.2),
with the exception that here formally exists zero node always being at the
lowest hierarchical level. New tree is shown in Fig. 3.21b (the first level is
drawn in a slightly different way to emphasize that the sibling property is
satisfied).
6. Now once more enters the symbol c. At the encoder output the code
corresponding to the node c is emitted being now 01. Because the weight
incrementation yields now the weight w(c) = 3, the node c change the
place with the node a going up one level. New tree is shown in Fig. 3.22a.
7. Now the symbol b once more enters the encoder. At the encoder output the
code corresponding to the node b is emitted, which is now 001. Because
two symbols b have been entered so far in total, the weight is incremented to
w(b) = 2, and the code 001 is adjoined to this node on the new tree, as
shown in Fig. 3.22b. The tree structure does not change in this step, because
the tree with the new weights satisfies the sibling property.
8. Now d enters the encoder, not yet existing on the tree. The encoder emits the
code corresponding to the node 0, now being 000, and after that ASCII(d). A
new tree is formed by separating the node 0 into two nodes—0 and d. It is
obvious that the weights, observed from left to right starting at the lowest

Fig. 3.22 Tree structure after entering of the sixth (a) and of the seventh (b) symbol

Fig. 3.23 Tree structure after entering of the eighth symbol

level to the upper levels do not decrease. The tree satisfies the sibling
property and further reordering is not needed. Final tree structure for a
given sequence is shown in Fig. 3.23.
The emitted bit sequence for the input sequence aabcccbd is

0, ASCII(a); 1; 0, ASCII(b); 00, ASCII(c); 001; 01; 001; 000, ASCII(d),

where the sign “,” separates the outputs emitted for a single input symbol, while “;”
separates the outputs corresponding to adjacent input symbols. It should be noted that
the same input sequence, encoded using the static Huffman code corresponding to the
tree from Fig. 3.23 (when there is no zero node, and the symbol d is
represented by 000), gives

01; 001; 1; 1; 1; 001; 000;

while for the correct decoding, the decoder must receive in advance ASCII(a),
ASCII(b), ASCII(c), ASCII(d), enabling it to “know” the correspondence between
the symbols and the code words.
Therefore, for the FGK algorithm 16 bits and four ASCII codes for the letters are
transmitted in real time, as the symbols enter the encoder. For the static Huffman
algorithm, 14 bits are transmitted, but the four ASCII codes for the letters had to be
transmitted in advance. The reader should try to explain why, in this case, the static
Huffman algorithm yields better results (is it only due to the absence of the zero
node?). Of course, a great advantage of the adaptive Huffman algorithm is that for the
encoder functioning the input symbol probabilities need not be known in
advance.

(b) If the encoder output is connected directly to the decoder input, error-free
transmission is supposed. In this case the decoder output should be found, if
its input is

0, ASCII(a); 1; 0, ASCII(b); 00, ASCII(c); 001; 01; 001; 000, ASCII(d).

The decoding procedure will be described step by step as well:


1. At the beginning, the decoder contains only the degenerated Huffman tree
consisting of the zero node (node 0) alone, as shown in Fig. 3.19a. The code
word 0 accompanied by ASCII(a) enters the decoder, and the decoder finds
that it corresponds to the node 0. Every time the decoder comes to
this node, it separates it into two new nodes: to one of them the code word
obtained by adding the suffix 0 to the existing zero-node code is adjoined (it is
the new zero node, with code word 00), and to the other node the
symbol whose ASCII code was received is adjoined, with the suffix 1 added to the
existing code word (the symbol is a, the new code word is 01). The node weight
corresponding to the new symbol is incremented and the tree is shown in
Fig. 3.19b. The sibling property is satisfied and the next step begins.
2. In the second step the received sequence is 1. At the previously formed tree
code 1 corresponds to the symbol a, corresponding weight is incremented
becoming w(a) = 2. The tree is shown in Fig. 3.20a and satisfies the sib-
ling property.
3. In the next step the sequence 0, ASCII(b) is received. At the previously
formed tree code 0 corresponds to the symbol 0, the procedure of adding a
new symbol starts (node 0 is separated into 0 and b) having the weight w
(b) = 1 and a code word 01. The tree is shown in Fig. 3.20b has the sibling
property and the further reordering is not needed.
4. In the fourth step the received sequence is 00, ASCII(c). At the previous
tree to the bit 0 corresponds the node without adjoined final symbol (more
code words start with that bit). For this reason one more bit from the input
is taken into account and 00 corresponds to symbol 0, resulting in the
further adding of a new symbol (node 0 is separated into 0 and c), with the
weight w(c) = 1 with the corresponding code word 001. Now, the tree
(shown in Fig. 3.21a) has a sibling property and the next step can begin.
5. Sequence at the decoder input in the fifth step is 001. At the previously
formed tree 0 and 00 do not define any symbol node while 001 corresponds
to the symbol c and the corresponding weight is incremented—w(c) = 2.
The obtained tree does not have a sibling property and symbols b and c
change the places. Due to the using the same procedure for tree reordering,
the same result is obtained, shown in Fig. 3.21b.
6. The input sequence in the sixth step is 01. To bit 0 no one symbol node
corresponds, while 01 corresponds to symbol c and the corresponding
weight is incremented—w(c) = 3. The obtained tree does not satisfy the

sibling property and symbols a and c change the places giving the tree
shown in Fig. 3.22a.
7. In this step the received sequence is 001. The first and the second bit are
not sufficient for symbol location and the third one (the complete sequence)
points to the symbol b whose weight is incremented to w(b) = 2. The tree
satisfies the sibling property and does not change the structure, as shown in
Fig. 3.22b.
8. In this step the received sequence is 000, ASCII(d). The code 000 points to
symbol 0, resulting in the adding of a new symbol (node 0 is separated into
0 and d), having the weight w(d) = 1 with the corresponding code word
0001. The tree is shown in Fig. 3.23, satisfies the sibling property and
further reordering is not needed.

(c) The second variant of the adaptive Huffman algorithm was proposed by Jeffrey
Vitter (1987) [16]. The basic modification is that during the tree reordering
symbol nodes are put at hierarchical levels as low as possible. If two
nodes have the same weight, but are at different levels, then during
reordering the goal is to put the node corresponding to an end symbol
on the lower level. In this procedure a numeration of the nodes is introduced,
from left to right and from the lowest level up (underlined numbers in
Fig. 3.24). It is not allowed that, of two nodes having the same weight,
one corresponding to a single symbol and the other being combined, the
symbol node has the higher ordinal number. If this happens, these two nodes
should change their positions in the tree (the combined node “carries” its suc-
cessors as well).

Fig. 3.24 Code trees obtained by FGK procedure (a) and by Vitter procedure (b)

For the input sequence in this problem (aabcccbd) the obtained trees are iden-
tical to those of the FGK procedure for the first seven steps, because the above condition is
satisfied. However, the tree for the eighth step differs, as shown in Fig. 3.24. It is
obvious that in the tree obtained by the FGK procedure (Fig. 3.24a) the symbol node (No. 6)
and the combined node (No. 7) have the same weight, but the symbol node is at the
lower level. If these nodes change places (node No. 7 carries its successors), the tree
shown in Fig. 3.24b is obtained as a result of the Vitter procedure.
In the previous problems it was mentioned that the number of levels in a code
tree corresponds to the maximal code word length, denoted by max(li), having here
the value 4 for the FGK and 3 for the Vitter procedure. The sum of the lengths of all code
words, sum(li), is here 14 for FGK and 12 for Vitter. Therefore, the Vitter
procedure provides smaller values for both numerical characteristics compared to
FGK. The Vitter procedure can easily be extended to the case when both nodes to be
exchanged are combined ones; then the node which has the greater value of max(li) or
of sum(li) should be put on the higher level (nearer to the root). In such a way the code
word lengths are equalized (the variance defined in Problem 3.5 is smaller) and the
time for which the code word lengths attain the maximum value defined by the cor-
responding format and by the implementation is longer. If 16 bits are used for the
word lengths, overflow arises for node weights greater than 4095. This
problem is easily solved for the weights (when the root weight becomes 4095, the
symbol node weights are divided by two), but the problem of maximum code word
length remains for very large trees, and there the Vitter procedure has an important
application.
Problem 3.10 The sequence from a binary symmetric source with memory enters a
Lempel-Ziv encoder which forms a dictionary of length N = 8.
(a) For the relation of source parameters P(0/1) ≫ P(0/0) and for the
emitted sequence

01010101010101010010110100101010101011010. . .

explain the principle of dictionary forming and find the achieved compression
ratio. What are the minimum and maximum values of the compression ratio in
this case?
(b) If errors occur at the first and at the tenth bit of the encoded sequence, find the
decoded sequence.
(c) Repeat the previous for the case P(0/1) ≪ P(0/0) if the sequence emitted by the
source is

00000000000000000000000000000000011111111. . .

(d) Find the compression ratio for the cases P(0/1) = 1 and P(0/1) = 0, when the
dictionary length is N and the sequence is very long. How can the dependence of the
compression ratio on the dictionary length be estimated for a first-order
memory binary source?

Table 3.12 Dictionary forming for the sequence from (a)

Program in pseudo language:

    w = nil;
    loop
      read k;
      if wk in vocabulary
        w = wk;
      else
        code of w -> out
        wk -> table of strings
        w = k;
      end;
    end loop;

Dictionary and trace of the encoder:

    Address  Contents    w     k   wk     in table?  out
    0        0
    1        1           nil   0   0      +          /
                         0     1   01     −          0
    2        01          1     0   10     −          1
    3        10          0     1   01     +          /
                         01    0   010    −          2
    4        010         0     1   01     +          /
                         01    0   010    +          /
                         010   1   0101   −          4
    5        0101        1     0   10     +          /
                         10    1   101    −          3
    6        101         1     0   10     +          /
                         10    1   101    +          /
                         101   0   1010   −          6
    7        1010

Solution
(a) Dictionary forming for the Lempel-Ziv code [17, 18], including the Welch modifi-
cation [19], can be described by the program in pseudo language given in the left
column of Table 3.12. The encoder functioning can be described by the input
and output sequences

0101010101010 0101 0101 1010 0101 0101 0101 1010
→ 0 1 2 4 3 6    5    5    7    5    5    5    7

where the first part corresponds to the bits at the input used for dictionary
forming, i.e. to the addresses emitted at the encoder output during the
dictionary forming.
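A runnable Python sketch of the dictionary forming of Table 3.12 (the LZW variant of Lempel-Ziv used in this problem); dictionary growth is stopped at N entries, as in the problem.

```python
def lzw_encode(bits, N=8):
    """Dictionary forming and encoding as in Table 3.12; growth stops at N entries."""
    table = {'0': 0, '1': 1}               # the two source symbols are stored in advance
    out, w = [], ''
    for k in bits:
        if w + k in table:
            w += k                         # extend the current string while it is known
        else:
            out.append(table[w])           # emit the address of the longest known prefix
            if len(table) < N:
                table[w + k] = len(table)  # store the new string wk
            w = k
    if w:
        out.append(table[w])
    return out, table

addresses, table = lzw_encode('01010101010101010010110100101010101011010', N=8)
print(addresses)    # starts with 0, 1, 2, 4, 3, 6 while the dictionary is being filled
print(table)
```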
Every address is denoted by three bits and the obtained compression ratio equals
the quotient of the number of input and output bits, i.e.

ρ = N_in/N_out = (13 + 28)/(18 + 21) = 1.0513,

where the first addends in the numerator and the denominator correspond to the number
of bits at the input and the output during the dictionary forming (the transient
regime), while the second addends describe the encoder functioning in the stationary
regime.

The compression ratio is, as a rule, minimal just after the dictionary forming

ρ_min = 13/18 = 0.7222,

while the maximum value is achieved if the sequence to be encoded is very long, so
that the transient regime can be neglected, and the sequence has the property that the
maximum number of symbols (four in this case) from the input can be represented
by one address (three binary symbols), yielding

ρ_max = 4/3 = 1.3333.

(b) In this case the decoder, on the basis of the known dictionary and the decoder input
(denoted by in), estimates the sequence that entered the encoder, denoted by k′ (as
given in Table 3.12). For the decoded sequence, the following is obtained

out = 0 1 2 4 3 6 5 5 7 5 5 5 7
    = 000 001 010 100 011 110 101 101 111 101 101 101 111.

If errors occur at the first and the tenth bit (Fig. 3.25), the received sequence is

in = 100 001 010 000 011 110 101 101 111 101 101 101 111
   = 4 1 2 0 3 6 5 5 7 5 5 5 7.

Although there is no error propagation in the decoded sequence, it is
possible that the decoded sequence contains a smaller or greater number of bits,
because different addresses correspond to code words of different
lengths. The estimate of the encoded sequence (at the encoder input) is

010 1 01 0 10 101 0101 0101 1010 0101 0101 0101 1010

which, of course, differs from the original sequence

0 1 01 010 10 101 0101 0101 1010 0101 0101 0101 1010

and in this case the total number of bits remained unchanged.

Fig. 3.25 LZ encoder and decoder (the encoder creates the dictionary from the input sequence; the
decoder dictionary is known)

Table 3.13 Dictionary forming for the sequence from (c)

    Address  Contents   w        k   wk        in table?  out
    0        0
    1        1          nil      0   0         +          /
                        0        0   00        −          0
    2        00         0        0   00        +          /
                        00       0   000       −          2
    3        000        0        0   00        +          /
                        00       0   000       +          /
                        000      0   0000      −          3
    4        0000       0        0   00        +          /
                        00       0   000       +          /
                        000      0   0000      +          /
                        0000     0   00000     −          4
    5        00000      0        0   00        +          /
                        00       0   000       +          /
                        000      0   0000      +          /
                        0000     0   00000     +          /
                        00000    0   000000    −          5
    6        000000     0        0   00        +          /
                        00       0   000       +          /
                        000      0   0000      +          /
                        0000     0   00000     +          /
                        00000    0   000000    +          /
                        000000   0   0000000   −          6
    7        0000000

(c) For the new sequence, the dictionary forming is given in Table 3.13 and the
input and output sequences are

0 00 000 0000 00000 000000 0000000 00000 11111111…  →  0 2 3 4 5 6 5 ????

All symbol groups inscribed in the dictionary are combinations of binary zeros
only. The problem arises when the first binary one appears in the stationary
regime: the encoder in this case cannot find the corresponding word and the encoding
becomes impossible.
This example illustrates the fact that the sequence used for dictionary forming
has to be representative. It is desirable that during the transient regime the sequence
at the encoder input has the same statistical properties as the rest of the sequence. For
stationary signals this is always possible, although the duration of the transient regime
can be relatively long, and this duration determines the needed dictionary

dimension. For practical purposes (e.g. for compression of text corresponding to
spoken language) a dictionary dimension of 1024 or 2048 addresses is usually
chosen (every address is represented by 10 or 11 bits).
(d) If P(0/1) = 1 and the source is symmetric, then P(1/0) = 1 and the combinations
00 and 11 will never appear in the encoder input sequence. It is obvious that in
this case only two sequences of length three (010 and 101), two sequences of
length four (1010 and 0101) and, in general, two sequences of any length n will appear.
Therefore, in a dictionary with N = 2ⁿ positions only the pairs of sequences
having the lengths m = 1, 2, …, N/2 will be kept.
In the stationary regime only the sequences of length N/2 will be separated,
and each one is encoded using n = ld(N) bits, sufficient for a binary record of
every address. For this optimal case the compression ratio is

ρ_opt(1)(N) = N/(2·ld(N)).

For the case of a binary symmetric source with memory where P(0/1) = P(1/0) = 0,
the sequence consists of zeros only or of ones only (depending on the starting
state) and the compression ratio is

Fig. 3.26 The limits of compression ratio for LZ code and for some first-order memory sources

ρ_opt(2)(N) = N/ld(N).

Of course, the expressions are valid if the length of the sequence is sufficiently
long and the duration of the transition period can be neglected. The dependence of
compression ratio on the number of bits representing the address in the dictionary is
shown in Fig. 3.26.
It is clear that with the increase of the dictionary length the compression ratio tends
to infinity. It is understandable, because the entropy in this case equals zero, and for
such a binary source

ρ_max = 1/H(S) → ∞.

Of course, if the sequence is not a deterministic one, but has a long memory, the
compression ratio must satisfy the condition

lim_{N→∞} ρ(N) ≤ ⌈ld(q)⌉/H(S).

For P(0/1) = P(1/0), the entropy of the first-order memory symmetric source is

H(S) = −P(1/0)·ld(P(1/0)) − (1 − P(1/0))·ld(1 − P(1/0)),

yielding

lim_{N→∞} ρ(N) = ρ_max = 1/(−P(1/0)·ld(P(1/0)) − (1 − P(1/0))·ld(1 − P(1/0))),

and the corresponding upper bound of the compression ratio is shown in Fig. 3.26 by
the dashed lines, for various values of P(1/0).
It should also be noticed that these lines are the limiting cases only, and that the
real compression ratio can substantially differ from these values, but it must be
below them:

ρ(N) < min{ N/(2·ld(N)), 1/(−P(1/0)·ld(P(1/0)) − (1 − P(1/0))·ld(1 − P(1/0))) }.

The deviation will be greater as the memory is shorter, but some useful con-
clusions can still be drawn from the diagram. The compression ratio for a
first-order memory binary source where P(1/0) = 0.01 cannot in any case be greater
than 12.38, but an LZ coder having a dictionary of length 128 (seven-bit
addresses), even in the ideal case (with an infinite source extension), cannot achieve
a compression ratio greater than 9.14, because the address length is a charac-
teristic of the encoder itself.
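A small numeric check of the two limits just discussed, evaluated directly from the formulas above (the printed values reproduce the 9.14 and 12.38 figures approximately).

```python
from math import log2

def lz_bound(N, p10):
    """Upper bound of the LZ compression ratio: dictionary limit vs. entropy limit."""
    h = -p10 * log2(p10) - (1 - p10) * log2(1 - p10)    # H(S) of the symmetric Markov source
    return min(N / (2 * log2(N)), 1 / h)

print(lz_bound(128, 0.01))     # about 9.14, limited by the dictionary length
print(lz_bound(1024, 0.01))    # about 12.38, limited by the source entropy
```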

One more example can be considered. It is known that printed English text
can be approximately described as a source with memory of the 30th order, emitting
26 letters. The entropy of this source is estimated as H(S) = 1.30 [Sh/symb], while
ld(q) = 4.70, yielding ⌈ld(q)⌉ = 5 and a maximum compression ratio of approx-
imately ρ_max = 3.85. From the figure one could conclude that for the corresponding
compression a dictionary of a few tens of entries should suffice. It is not the case: the
obtained compression ratio for the LZ coder increases substantially more slowly than
shown in Fig. 3.26, and the needed dictionary length is at least N = 1024.
Chapter 4
Information Channels

Brief Theoretical Overview

The information channels, as well as the sources, generally are discrete or con-
tinuous. Physical communication channel is, by its nature, a continuous one. Using
such an approach the noise and other interferences (other signals, fading etc.) can be
directly taken into account. However, it can be simplified introducing the notion of
a discrete channel (Fig. 1.1), incorporating the signal generation (modulation),
continuous channel and signal detection (demodulation). Some channel can be of a
mixed type. The input can be discrete, as for the case of digital transmission, while
at the receiver, the decision is made on the basis of continuous amplitude range, i.e.
the output is continuous.
The discrete channel is described by the input alphabet (xi 2 X{x1, x2, …,
xi, …, xr}), the output alphabet (yj 2 Y{y1, y2, …, yj, …, ys}) and a set of con-
ditional probabilities P(yj/xi), according to Fig. 4.1. These sets are finite (generally,
they may be countable as well). This channel is a zero-memory (memoryless)
channel, because the conditional probabilities depend on the current symbol only.
A good example for such a channel is that one where there is no intersymbol
interference and where only the white Gaussian noise is present. In the channel with
intersymbol interference, the probability of the received symbols depends as well
and on the adjacent symbols and in such way a memory is introduced into the
channel. Therefore, here P(yj/xi) is a conditional probability that at the channel
output symbol yj will appear, if the symbol xi is sent (emitted). These probabilities
can be arranged as the transition (channel) matrix completely describing the
channel (Problems 4.1–4.3)


Fig. 4.1 Discrete memoryless channel: X{xi} → {P(yj/xi)} → Y{yj}

    P ≡ [Pij] = | P11  P12  …  P1s |
                | P21  P22  …  P2s |
                |  ⋮    ⋮        ⋮  |
                | Pr1  Pr2  …  Prs |,

where Pij ≡ P(yj/xi). Index i corresponds to the ith row (i.e. the input symbol) and index
j corresponds to the jth column (i.e. the output symbol).
Obviously,

Σ_{j=1}^{s} Pij = 1  (i = 1, 2, …, r),

i.e. matrix P must be stochastic, because at the receiving end the decision must be
made—which symbol yj is received, after symbol xi was sent. Some authors use Pij
to denote the conditional probability that symbol yi is received, after symbol xj was
sent. In this case, the sum of elements in every column must be equal 1.
A very simple discrete channel is binary channel (BC) (Problems 4.1–4.3)
where the transition matrix is
 
    P_BC = | v1  p1 |
           | p2  v2 |,

where v1 + p1 = v2 + p2 = 1. Instead of using a matrix, the corresponding graph is


very useful (Fig. 4.2a). This channel can be as well completely described by adding

Fig. 4.2 Graph corresponding to a binary channel (a) and an equivalent description using error
sequence (b)

the input sequence using an XOR gate and the error sequence (e) (Fig. 4.2b). In
this sequence binary ones are at the positions where the errors occurred, comple-
menting the transmitted bit.
A special case of binary channel is binary symmetric channel (BSC) (Problems
4.4–4.6) where the transition matrix is
 
    P_BSC = | v  p |
            | p  v |   (p + v = 1).

The notion “symmetry” here means that the probabilities of transition from 0 to
1, and from 1 to 0 are the same. It is the simplest channel described by only one
parameter p(v) corresponding in the same time to the probability of error (Pe)
(crossover probability). In this case Pe does not depend on the probabilities of input
symbols. For the simple binary channel probability of error is

Pe ¼ Pðx1 Þp1 þ Pðx2 Þp2 :

As said earlier, a good example for BC is such channel where there is no


intersymbol interference and where only the white Gaussian noise is present. It is in
fact BSC. Such channels are not often encountered in praxis, but BSC can be used
for modeling even for channels with memory as the basic building block.
If BC or BSC is accepted as an idealized model for a digital line, then, for a line
with more regenerative sections a model can be used having more BC (BSC) in
cascade (Problems 4.4, 4.5, 4.7). The next problem is to form an equivalent
channel. In Fig. 4.3 two identical BSC are shown in cascade.
The corresponding transition matrix can be found easily
 
    P_ekv = | v² + p²   2pv      |
            | 2pv       v² + p²  |,

i.e., the equivalent channel is a BSC as well. Its matrix is obtained by multiplication
of the corresponding transition matrices. Using the induction method, it can easily be
proved that in the general case the matrix of the equivalent channel is obtained by
multiplication of the matrices of the channels in cascade. For identical channels the

Fig. 4.3 Two identical BSC in cascade

equivalent transition matrix is obtained by exponentiation. Further, it can also be proved
that when the number of cascaded channels increases, the elements of the
equivalent transition matrix converge to 0.5 regardless of the value of p, meaning
that information transmission is not possible.
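A sketch illustrating the statement above: the equivalent matrix of n identical BSCs in cascade is the n-th matrix power, and its elements tend to 0.5 (the p value below is an example).

```python
import numpy as np

def cascade_bsc(p, n):
    """Equivalent transition matrix of n identical BSCs in cascade."""
    P = np.array([[1 - p, p], [p, 1 - p]])
    return np.linalg.matrix_power(P, n)

for n in (1, 2, 10, 100):
    print(n, cascade_bsc(0.1, n)[0])   # the first row tends to [0.5, 0.5]
```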
The channels can be extended as well as the sources by considering sequences of
n channel symbols as the symbols of n-th extension. For example, the equivalent
matrix of the second extension of BSC—(BSC)2, is a Kronecker square of BSC
matrix [2]
    P_(BSC)² = | v·P_BSC  p·P_BSC |  =  | v²  pv  pv  p² |
               | p·P_BSC  v·P_BSC |     | pv  v²  p²  pv |
                                        | pv  p²  v²  pv |
                                        | p²  pv  pv  v² |.

One may ask why r ≠ s? The answer is that in the general case such channels are
encountered in practice. For example, BSC can be modified, if it is decided that the
transmission is unsuccessful, by using the “erasure” of the bit instead of the decision
that the other symbol (bit) is received (error!). Such a channel is called the binary era-
sure channel (BEC) (Problem 3.8). The corresponding transition matrix is

    P_BEC = | v  p  0 |
            | 0  p  v |   (p + v = 1).

In Fig. 4.4 a graph of (symmetric) BEC is shown.


This model corresponds to a communication system with a feedback, and the
receiver can send to the transmitter the information whether the received symbol is
received correctly or not. If the erased symbol is received (usually denoted with E),
the transmitter will repeat it until it is correctly received. Probability of this event is
1 − p, and the corresponding signaling rate should be multiplied with the same
factor.

Fig. 4.4 Graph of (symmetric) BEC

For further explanations it is important to consider the probability relations in a
channel. Consider a discrete channel without memory described by the transition matrix
P (of dimension r × s). Its elements are the probabilities Pij (P(yj/xi)). They can also be
called transition probabilities. If the probabilities of the source symbols
P(xi) (i = 1, 2, …, r) are known (input probabilities), then the probabilities of the
channel output symbols P(yj) (j = 1, 2, …, s) (output probabilities) can be
calculated as

P(yj) = Σ_{i=1}^{r} P(xi)·Pij  (j = 1, 2, …, s).

Therefore, the output probabilities can be found if the input and transition
probabilities are known. Of course, the following must hold

Σ_{i=1}^{r} P(xi) = 1,  Σ_{j=1}^{s} P(yj) = 1.

Of course, P(xi) are a priori input probabilities. P(yj/xi) can be called a poste-
riori output probabilities and P(yj)—a priori output probabilities. Generally, on the
basis of transition and output probabilities sometimes it is not possible to calculate
input probabilities.
Consider the next (trivial) example. BSC is described by p = v = 0.5. Let P
(x1) = a and P(x2) = b (a + b = 1). It is easy to calculate

P(y1) = a·v + b·p = 0.5·(a + b) = 0.5,
P(y2) = a·p + b·v = 0.5·(a + b) = 0.5,

because a + b = 1. Therefore, P(y1) = P(y2) = 0.5 regardless of input probabilities.


Of course, it is an extreme case.
There are two sets of input probabilities. Besides the above mentioned a priori
probabilities (P(xi)), the set of a posteriori input probabilities is important—P(xi/yj)
(i = 1, 2, …, r; j = 1, 2, …, s)—when it is known which symbol (yj) is received.
Of course, it is known at the receiving end. These probabilities are calculated as

P(xi/yj) = P(xi)·P(yj/xi)/P(yj).

This formula (Bayes’ rule) is obtained from the joint probability of the events xi and yj

P(xi, yj) = P(xi)·P(yj/xi) = P(yj)·P(xi/yj).

The a posteriori input probabilities can be calculated as a function of a priori


input probabilities and the transition probabilities, i.e. on the basis of known data

P(xi/yj) = P(xi)·P(yj/xi) / Σ_{i=1}^{r} P(xi)·P(yj/xi).

Of course, the following must hold as well

Σ_{i=1}^{r} P(xi/yj) = 1,  Σ_{i=1}^{r} Σ_{j=1}^{s} P(xi, yj) = 1.

Consider the following example [3]. A BC is described by the matrix

    P = | 2/3   1/3  |
        | 1/10  9/10 |.

Find all unknown probabilities if the input (a priori) probabilities are P(x1) = 3/4
and P(x2) = 1/4.
The output (a priori) probabilities are

P(y1) = P(x1)·P(y1/x1) + P(x2)·P(y1/x2) = (3/4)·(2/3) + (1/4)·(1/10) = 21/40,
P(y2) = P(x1)·P(y2/x1) + P(x2)·P(y2/x2) = (3/4)·(1/3) + (1/4)·(9/10) = 19/40.

Of course P(y1) + P(y2) = 1 must hold.


A posteriori input probabilities are

P(x1/y1) = P(x1)·P(y1/x1)/P(y1) = 20/21,
P(x2/y2) = P(x2)·P(y2/x2)/P(y2) = 9/19.

Similarly, the other probabilities can be found. However, the calculation is


simpler taking into account the following

P(x1/y1) + P(x2/y1) = 1,  P(x2/y1) = 1 − P(x1/y1) = 1 − 20/21 = 1/21.

Analogously

P(x1/y2) = 1 − P(x2/y2) = 1 − 9/19 = 10/19.

Joint probabilities can be easily found

P(x1, y1) = P(y1)·P(x1/y1) = (21/40)·(20/21) = 1/2.

The same result can be obtained in the other way

P(x1, y1) = P(x1)·P(y1/x1) = (3/4)·(2/3) = 1/2.

Similarly, it is easy to calculate the rest

P(x1, y2) = 1/4,  P(x2, y1) = 1/40,  P(x2, y2) = 9/40.

Sum of the last four probabilities equals one.
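A short sketch recomputing the probabilities of this worked example, assuming the indexing P[i][j] = P(y_j / x_i) used in this chapter.

```python
import numpy as np

P = np.array([[2/3, 1/3], [1/10, 9/10]])   # transition matrix, P[i, j] = P(y_j / x_i)
Px = np.array([3/4, 1/4])                  # a priori input probabilities

Py = Px @ P                                # output probabilities P(y_j)
Pxy = Px[:, None] * P                      # joint probabilities P(x_i, y_j)
Px_given_y = Pxy / Py                      # a posteriori input probabilities P(x_i / y_j)

print(Py)            # [21/40, 19/40]
print(Px_given_y)    # [[20/21, 10/19], [1/21, 9/19]]
```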


Entropy, defined in Chap. 2, according to the above consideration, can be called
a priori entropy.

H(X) = Σ_{i=1}^{r} P(xi)·ld(1/P(xi)),

because it is calculated using a priori probabilities. However, after the reception of


specific symbol yj, the a posteriori entropy of the set X is

H(X/yj) = Σ_{i=1}^{r} P(xi/yj)·ld(1/P(xi/yj)).

In fact, H(X) is an average measure (per symbol) of the receiver uncertainty


about the emitted source symbols, before any symbol reception, while H(X/yj) is a
measure of the remaining part of (a posteriori) uncertainty about the emitted
symbol, when specific symbol yj is received. It can be called partial a posteriori
entropy.
For the above example

H(X) = (3/4)·ld(4/3) + (1/4)·ld(4) = 0.811 [Sh/symb]
H(X/y1) = (20/21)·ld(21/20) + (1/21)·ld(21) = 0.276 [Sh/symb]
H(X/y2) = (9/19)·ld(19/9) + (10/19)·ld(19/10) = 0.998 [Sh/symb].

Therefore, if y1 is received the uncertainty is smaller, because it is more probable
that symbol x1 was sent than x2. However, if y2 is received, the uncertainty is
greater, because the a posteriori probabilities of both symbols are very close—9/19

and 10/19. The conclusion is that this uncertainty can increase after the reception of
some specific symbol.
However, information theory considers the average values and it is natural to
average partial a posteriori entropy over all output symbols to obtain a posteriori
entropy

X
s
HðX/Y) ¼ Pðyj ÞHðX=yj Þ
j¼1
Xs X
r  
1
¼ Pðyj Þ Pðxi =yj Þld
j¼1 i¼1
Pðxi =yj Þ
XX
r s  
1
¼ Pðyj ÞPðxi =yj Þld
i¼1 j¼1
Pðxi =yj Þ
XX
r s  
1
¼ Pðxi ; yj Þld :
i¼1 j¼1
Pðxi =yj Þ

For the above example

H(X/Y) = P(y1)·H(X/y1) + P(y2)·H(X/y2) = 0.617 [Sh/symb],

and the uncertainty is, on the average, smaller.


Now, it can be mathematically defined how the information is transmitted
through the channel. Let the channel output be considered. The uncertainty about a
specific symbol xi is ld(1/P(xi)). If the received symbol is yj, the measure of the
remaining uncertainty about the symbol xi is ld(1/P(xi/yj)). Therefore, the quantity of
information transmitted in this case is equal to the difference of the a priori and the
a posteriori uncertainty

I(xi; yj) = ld(1/P(xi)) − ld(1/P(xi/yj)).

Of course, it should be averaged over all pairs of symbols (xi, yj). These
average values are in fact the a priori and the a posteriori entropies, and the average
quantity of transmitted information is

I(X, Y) = H(X) − H(X/Y)  [Sh/symb].

It is usually called mutual information (Problems 4.1–4.6, 4.8).


The last expression can be written as

I(X, Y) = H(X) − H(X/Y)
        = 1·Σ_{i=1}^{r} P(xi)·ld(1/P(xi)) − Σ_{i=1}^{r} Σ_{j=1}^{s} P(xi, yj)·ld(1/P(xi/yj)).

Instead of multiplying the first sum by “1”, one can write 1 = Σ_{j=1}^{s} P(yj/xi), yielding

I(X, Y) = Σ_{i=1}^{r} Σ_{j=1}^{s} P(xi)·P(yj/xi)·ld(1/P(xi)) − Σ_{i=1}^{r} Σ_{j=1}^{s} P(xi, yj)·ld(1/P(xi/yj))
        = Σ_{i=1}^{r} Σ_{j=1}^{s} P(xi, yj)·ld(P(xi/yj)/P(xi))
        = Σ_{i=1}^{r} Σ_{j=1}^{s} P(xi, yj)·ld(P(xi, yj)/(P(xi)·P(yj))),

where the last row is obtained using Bayes’ rule. It should be noted that I(X, Y) is
symmetric with regard to X and Y (I(X, Y) = I(Y, X)—mutual information!).
For the above example

I(X, Y) = H(X) − H(X/Y) = 0.811 − 0.617 = 0.194 [Sh/symb].

Therefore, on the average, some quantity of information per symbol is transmitted
through the channel.
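A sketch that evaluates H(X), H(X/Y) and I(X, Y) for the same example (the matrices are redefined here so that the fragment runs on its own).

```python
import numpy as np

P = np.array([[2/3, 1/3], [1/10, 9/10]])
Px = np.array([3/4, 1/4])
Py = Px @ P
Pxy = Px[:, None] * P

HX = -np.sum(Px * np.log2(Px))               # a priori entropy H(X)
HX_Y = -np.sum(Pxy * np.log2(Pxy / Py))      # a posteriori entropy H(X/Y)
print(HX, HX_Y, HX - HX_Y)                   # about 0.811, 0.617 and 0.194 Sh/symb
```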
Considering the last row in the above expression, it can be easily concluded that
the mutual information will be equal to zero if the input and output symbols are
statistically independent in pairs, i.e.

P(xi, yj) = P(xi)·P(yj)  (i = 1, 2, …, r; j = 1, 2, …, s).

This conclusion verifies once more that the approach to definition the quantity of
information is intuitively correct. When input and output symbols are statistically
independent in pairs, i.e. when any received symbol does not depend on the
emitted, there is no any transmission of information.

As said earlier, the entropy of the finite discrete source is limited. It can easily be
shown that

I(X, Y) ≥ 0,

where the equality sign corresponds only to the case of independent symbols. On
the other hand, the mutual information is maximal if the a posteriori entropy equals
zero, because then all emitted information is transmitted. Therefore,

0 ≤ I(X, Y) ≤ H(X).

Besides the above defined entropies, three more can be defined. The joint
entropy of X and Y

H(X, Y) = Σ_{i=1}^{r} Σ_{j=1}^{s} P(xi, yj)·ld(1/P(xi, yj)),

the entropy of output symbols

H(Y) = Σ_{j=1}^{s} P(yj)·ld(1/P(yj)),

and the conditional entropy of output symbols

H(Y/X) = Σ_{i=1}^{r} Σ_{j=1}^{s} P(xi, yj)·ld(1/P(yj/xi)).

Using the above mentioned symmetry and some mathematical manipulation the
following expression can be obtained

H(X, Y) = H(X) + H(Y) − I(X, Y) = H(X) + H(Y/X) = H(Y) + H(X/Y).


 
Consider BSC defined by

    P_BSC = | v  p |
            | p  v |   (v + p = 1).

Let the input (a priori) probabilities be P(x1) = a, P(x2) = b (a + b = 1). The probabilities
of the output symbols are

P(y1) = a·v + b·p and P(y2) = a·p + b·v.

Mutual information is

I(X, Y) = H(Y) − H(Y/X) = H(Y) − Σ_{i=1}^{2} P(xi) Σ_{j=1}^{2} P(yj/xi)·ld(1/P(yj/xi))
        = H(Y) − Σ_{i=1}^{2} P(xi)·[p·ld(1/p) + v·ld(1/v)]
        = H(Y) − (a + b)·[p·ld(1/p) + v·ld(1/v)]
        = H(Y) − [p·ld(1/p) + v·ld(1/v)].

Introducing the corresponding values into H(Y) one obtains

I(X, Y) = (a·v + b·p)·ld(1/(a·v + b·p)) + (a·p + b·v)·ld(1/(a·p + b·v)) − [p·ld(1/p) + v·ld(1/v)],

or

I(X, Y) = H(a·p + b·v) − H(p),

where the entropy function is used

H(x) ≡ (1 − x)·ld(1/(1 − x)) + x·ld(1/x).

From

p ≤ a·p + b·v ≤ v for p ≤ 0.5,

i.e.

p ≥ a·p + b·v ≥ v for p ≥ 0.5 (v ≤ 0.5),

it follows that

H(a·p + b·v) ≥ H(p),

and therefore I(X, Y) ≥ 0.
Now, the last step to define the channel capacity follows.
Mutual information can be written using input (a priori) probabilities, charac-
terizing the source, and transition probabilities, characterizing the channel.

I(X, Y) = Σ_{i=1}^{r} Σ_{j=1}^{s} P(xi)·P(yj/xi)·ld( P(yj/xi) / Σ_{i=1}^{r} P(xi)·P(yj/xi) ).

It is easy to conclude that the mutual information depends on the transition prob-
abilities, characterizing the channel, as well as on the input probabilities, characterizing
the source. This means that the mutual information also depends on the way the channel is
used—on which source is at the channel input. However, a quantity should be
found which depends only on the channel. The number of input symbols is fixed
(r) and such sources can differ only by the symbol probabilities. It is natural to choose
the input probabilities in such a way as to obtain the maximal mutual information

I_max = max_{P(xi)} I(X, Y),

where the maximization is carried out over the set of input probabilities. Dimension
of Imax is Sh/symb corresponding to the maximal quantity of information that can
be transmitted in the average by one symbol.
The engineers prefer to know the maximal quantity of information that can be
transmitted over the channel in one second. Let the source have the symbol rate
vm(X, Y) [symb/s], being the maximal rate for the channel. Then, the maximal
information rate is

C = vm(X, Y)·I_max  [symb/s · Sh/symb = Sh/s].

It will be called in this book the channel capacity (Problems 4.4, 4.5, 4.10,
4.11). Some authors (especially in courses on probability) call Imax the channel
capacity. However, the symbol rates are in a range from a few thousand symbols
per second to several terasymbols per second. It is usually taken vm(X, Y) = 2fc, to
avoid the intersymbol interference (First Nyquist criterion).
In reality, the information rate (flux) (Problems 4.1, 4.5) is

Φ(X, Y) ≡ v(X, Y)·I(X, Y)  [Sh/s],

where v(X, Y) and I(X, Y) are the real values for the system. The corresponding
efficiency can be defined

ηC ≡ Φ(X, Y)/C.

Generally, the calculation of the channel capacity is very difficult, usually using the
calculus of variations or some similar mathematical tool. Here only two interesting cases
will be given.

Consider a discrete channel where the number of symbols is r. However, a further
hypothesis is that there is no noise in the channel (noiseless channel). Let the
transmission (signaling) rate be vm(X, Y) = 2fc, providing the transmission without
intersymbol interference. According to the definition

I(X, Y) = H(X) − H(X/Y).

However, after the receiving of any symbol there is no ambiguity,
because there is no noise nor intersymbol interference. Hence, H(X/Y) = 0, and

I_max = H(X).

According to the previous discussion, here the entropy is maximal when the
symbols are equiprobable, i.e. when Hmax(X) = ld r. Finally,

C = 2fc·ld r.

Of course, the number of symbols (r) in practice may depend on signal-to-noise


ratio.
A very important channel is the so-called continuous channel (with additive noise).
Here the source is continuous, the channel is continuous and the output (being the sum
of signal and noise) is continuous as well. The corresponding entropy is defined in
Chap. 2. Introducing the corresponding probability densities w(x), w(y), w(x/y), w(y/x)
and w(x, y) (the corresponding indexes are omitted!) one obtains

H(X) = ∫ w(x)·ld(1/w(x)) dx,
H(Y) = ∫ w(y)·ld(1/w(y)) dy,
H(X/Y) = ∫∫ w(x, y)·ld(1/w(x/y)) dx dy,

where all integrals are taken over (−∞, ∞). Here the mutual information is

I(X, Y) = ∫∫ w(x, y)·ld(w(x, y)/(w(x)·w(y))) dx dy.

It is supposed that the frequency band is limited, e.g. to (0, fc), and the corre-
sponding rate is vm(X, Y) = 2fc, yielding

C = 2fc·I_max.

Let the continuous channel input (x) have the probability density w(x), where

m_x = ∫ x·w(x) dx = 0,
σ_x² = ∫ x²·w(x) dx < ∞.

The additive channel noise is n, where

m_n = ∫ n·w(n) dn = 0,
σ_n² = ∫ n²·w(n) dn < ∞.

Further, it is supposed that the signal and the noise are statistically independent,
yielding the output signal (y)

y = x + n,

where

m_y = m_x + m_n = 0,
σ_y² = σ_x² + σ_n².

Mutual information is

I(X, Y) = H(Y) − H(Y/X).

The output ambiguity depends only on the additive noise (independent from the signal)
and the following can be shown (H(N) is the noise entropy)

I(X, Y) = H(Y) − H(N).

One interpretation of this result is that the transmitted quantity of information


(per sample) is obtained when from the total output quantity of information, the
“false quantity of information”, introduced by noise is deducted.

Now, a hypothesis about the noise (its probability density) should be made. The
worst case, as explained in Chap. 2, for a fixed variance (σ²), is when the noise has
the Gaussian probability density, yielding

H_max = (1/2)·ld(2πeσ²).

Therefore,

I(X, Y) = H(Y) − (1/2)·ld(2πeσ_n²),

yielding

I_max = max_{w(x)} H(Y) − (1/2)·ld(2πeσ_n²).

Now, the probability density w(x) of the input signal should be found which max-
imizes H(Y). The output signal, the sum of the input signal and the noise (statistically
independent), will have the maximal entropy if its probability density is Gaussian.
To obtain it, the input signal should be Gaussian as well, because the sum of
independent Gaussian processes is a Gaussian process. Therefore,

H(Y)|max = (1/2)·ld(2πeσ_y²) = (1/2)·ld(2πe(σ_x² + σ_n²)),

and the maximal mutual information is

I_max = (1/2)·ld(2πe(σ_x² + σ_n²)) − (1/2)·ld(2πeσ_n²) = (1/2)·ld(1 + σ_x²/σ_n²).

The capacity of such a channel is

C = vm(x, y)·I_max = 2fc·(1/2)·ld(1 + σ_x²/σ_n²) = fc·ld(1 + σ_x²/σ_n²)  [Sh/s].

Some authors use the symbol B (frequency band) instead of fc, and instead of the
quotient of variances the symbol S/N (from signal-to-noise, being here the natural
quotient of signal and noise powers, not in dB!), yielding finally (Problems 4.4, 4.5,
4.9–4.11)

C = B·ld(1 + S/N).

This expression is obtained using mainly abstract considerations, but the used notions
are very well known to communication engineers—the frequency band (B) and the
signal-to-noise ratio (S/N). Therefore, the capacity can be increased by increasing either
the band or the signal-to-noise ratio (i.e. by increasing the transmitter power or by sup-
pressing the noise power). Of course, some compensation can be introduced—de-
creasing the band can be compensated by increasing the transmitter power (but the
increase is slow—logarithmic!) and vice versa.
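A small illustration of this band / signal-to-noise trade-off; the band and SNR values below are example figures chosen for this sketch, not taken from the text.

```python
from math import log2

def capacity(B, snr):
    """AWGN channel capacity; B in Hz, snr as a power ratio (not in dB)."""
    return B * log2(1 + snr)

print(capacity(3100, 1000))      # about 30.9 kSh/s for a 3100 Hz band and 30 dB SNR
print(capacity(6200, 1000))      # doubling the band doubles the capacity
print(capacity(3100, 10**6))     # doubling it through SNR alone needs the SNR squared
```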
When applying the above expression for the channel capacity it is very important to
keep in mind the supposed conditions. The channel is considered with additive Gaussian
noise, and the capacity is achieved if the input signal is Gaussian as well. Therefore, it is
the maximum possible value of the capacity. In reality, if signals are used which are not
Gaussian (usually some kind of baseband pulses or some kind of digital modulation),
the capacity is smaller. In such cases, the calculation of capacity can be very complicated.
The notion of error probability is very well known. Still, it is useful to connect it
with so called decision rule. Consider BC described by transition matrix
 
    P_BC = | v1  p1 |
           | p2  v2 |.

It is obvious that the error will happen as a result of two exclusive events—either
x1 is emitted and y2 is received, or x2 is emitted and y1 is received. Therefore, the
average error probability is

Pe = P(x1)·p1 + P(x2)·p2,

and it is just the error probability (Problems 4.1, 4.2, 4.3, 4.7), as this notion is
usually used. For P(x1) = P(x2) = 0.5

Pe = 0.5·(p1 + p2).

For BSC, where p1 = p2 = p (v1 = v2 = v),

Pe = P(x1)·p + P(x2)·p = [P(x1) + P(x2)]·p = p,

because P(x1) + P(x2) = 1 and p is just the parameter equivalent to the error
probability.
Consider BSC described by transition matrix
 
0:99 0:01
:
0:01 0:99
Brief Theoretical Overview 107

It is obvious that the error probability is Pe = p = 0.01. But consider the following transition matrix

[ 0.01  0.99
  0.99  0.01 ].

Is the error probability now 0.99? Of course not! One should only "complement" the logic. In the previous case, after receiving symbol y1(=0) the decision was that symbol x1(=0) was emitted, and vice versa. If now, after receiving y1(=0), the decision is x2(=1), the error probability is again Pe = 0.01. Therefore, the error probability does not depend on the transition probabilities only, but on the decision rule as well—on the way the receiver interprets the received symbol. It should be noted that here (binary channel) the worst case is when the error probability is 0.5.
Now, the notion of a decision rule will be defined more precisely. Consider the channel having r input symbols {xi} (i = 1, 2, …, r) and s output symbols {yj} (j = 1, 2, …, s). The decision rule D(yj) is a function specifying a unique input symbol xi for each output symbol yj, i.e.

D(yj) = xi.

In the next to last example it was D(y1) = x1, D(y2) = x2 and in the last one D(y1) = x2, D(y2) = x1.
Generally, there are r^s different decision rules. For a BC there are 2² = 4 different decision rules.
The next question is which decision rule should be chosen? The answer is clear—the one yielding the minimum error probability. Therefore, the relation between the decision rule and the error probability should be found. Let symbol yj be received. The receiver will use the decision rule D(yj) = xi. But the reception of symbol yj can as well be the consequence of emitting some other symbol from {X}. It can be written as follows

P(xi/yj) + P(e/yj) = 1,

where P(e/yj) is the conditional probability that a wrong decision was made if the received symbol is yj. The error probability is obtained by averaging this probability over all output symbols

Pe = Σ_{j=1}^{s} P(yj)·P(e/yj).

The considered events are mutually exclusive, the addends are nonnegative, and the minimization is achieved by minimizing every addend independently. P(yj) does not depend on the decision rule, so D(yj) = xi should be chosen minimizing every conditional probability by itself. With the chosen decision rule one obtains

 
P(e/yj) = 1 − P(xi/yj) = 1 − P(D(yj)/yj).

Therefore, to minimize the error probability, for every received symbol yj the following decision rule should be chosen

D(yj) = x*,

where x* is defined as

P(x*/yj) ≥ P(xi/yj)   (i = 1, 2, …, r).

In other words, for every received symbol, the error probability will be minimal if to each yj the input symbol x* having the maximum a posteriori probability P(x*/yj) is joined. Of course, if there are more such symbols, any of them can be chosen.
This rule is denoted as MAP (Maximum A Posteriori Probability rule) (Problems 4.1, 4.2). To apply this rule the a posteriori probabilities P(xi/yj) should be found. These probabilities can be found only if, besides the transition probabilities, the a priori probabilities of the input symbols are known.
On the basis of Bayes' rule, the previous inequalities can be written as

P(yj/x*)·P(x*)/P(yj) ≥ P(yj/xi)·P(xi)/P(yj)   (i = 1, 2, …, r).

If the input probabilities are mutually equal

P(xi) = 1/r   (i = 1, 2, …, r),

the previous expression can be written in the form

P(yj/x*) ≥ P(yj/xi)   (i = 1, 2, …, r),

yielding a simple way to find the decision rule on the basis of the transition matrix only. Simply, for every received symbol yj the decision is that the symbol xi was emitted for which the conditional probability in the corresponding column of the matrix has a maximum value. This rule is denoted as ML (Maximum Likelihood rule) (Problem 4.2). The minimal error probability will then be obtained if all input symbols are equiprobable.
For MAP it can be shown that the error probability is

Pe = Σ_{j=1}^{s} P(e/yj)·P(yj)
   = Σ_{j=1}^{s} P(yj)·[1 − P(x*/yj)]
   = Σ_{j=1}^{s} P(yj) − Σ_{j=1}^{s} P(yj)·P(x*/yj)
   = 1 − Σ_{j=1}^{s} P(x*, yj)
   = 1 − Σ_{j=1}^{s} P(x*)·P(yj/x*).

For equiprobable input symbols

Pe = 1 − (1/r)·Σ_{j=1}^{s} P(yj/x*),

and the error probability is obtained from the transition matrix by adding the maximal values in the columns.
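These rules are easy to express in a few lines of code. The following Python sketch (an added illustration; the function names are ours) builds the MAP rule and the resulting error probability from a transition matrix and the a priori probabilities, and the ML rule from the transition matrix alone.

```python
import numpy as np

def map_rule(P, px):
    """MAP decision: P is the r x s matrix of P(yj/xi), px the a priori P(xi).
    Returns the decided input index for each output and the error probability."""
    joint = px[:, None] * P              # P(xi, yj)
    decision = joint.argmax(axis=0)      # D(yj) = x* maximizing P(xi, yj)
    pe = 1.0 - joint.max(axis=0).sum()   # Pe = 1 - sum_j max_i P(xi, yj)
    return decision, pe

def ml_rule(P):
    """ML decision: uses the transition matrix only (maxima in columns)."""
    return P.argmax(axis=0)

# BSC with the "inverted" logic from the text: the rule complements the symbols
P = np.array([[0.01, 0.99],
              [0.99, 0.01]])
print(map_rule(P, np.array([0.5, 0.5])))   # -> decisions (x2, x1), Pe = 0.01
```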
The decision process described above can also be called hard decision (Problems 4.2, 4.4, 4.10). This means that the final decision about the received symbol is made in the demodulator (detector). However, it is better to quantize the output signal, i.e. instead of the symbol value to give some estimation (i.e. instead of the value "1" to generate real (measured) values—0.8, 1.05, 0.98 etc.). In this case, it is said that a soft decision (Problems 3.10, 3.11) is made. The drawback is that more bits should be used.
In fact, in digital transmission, the channel input is discrete. By the superposition of noise and other interferences, the channel output is continuous. By using the corresponding thresholds, the output is quantized, becoming discrete. The threshold positions influence the error probability. These positions should be chosen to minimize the error probability.
Up to now, channels without memory were considered. As said earlier, in a channel with intersymbol interference, the error probability of the received symbols depends on the adjacent symbols as well, in such a way introducing memory into the channel. Models for channels with memory are often made using a combination of memoryless channels, usually by so-called Markov models (Problem 4.12).
It is obvious that a channel without intersymbol interference with additive Gaussian noise can be modeled as a BSC. However, in such channels sometimes impulsive noise occurs (burst-noise channel). It can be considered as a nonstationary channel, because the error probability changes in time. However, taking into account that the bursts of noise result in "packets of errors", one can conceive that the BSC has two possible states—one without bursts of noise, when one formula for the error probability is valid, and the other, during the burst of noise, when the error probability is much higher, e.g. 0.5. Gilbert [20] proposed a simple model. There are two states in the model—"good" (G) and "bad" (B). In every state the channel is considered as a BSC (where only one parameter is sufficient to describe the channel). In the Gilbert model it is supposed that in the good state there are no errors (pG = 0), while in the bad state the error probability can take any value. Usually, pB = 0.5 is taken (the worst case). The transition matrices are

P_G = [ 1  0        P_B = [ vB  pB
        0  1 ],             pB  vB ]     (vB + pB = 1).

The corresponding transition probabilities are given as well

P_GG + P_GB = 1,   P_BB + P_BG = 1

(where P_AB ≡ P(B/A)). The state diagram is shown in Fig. 4.5.

Fig. 4.5 State diagram of the Gilbert model for a burst-noise channel

There are three independent parameters, pB and two transition probabilities from state to state (the other two are then fixed). The transition matrix of the model is

P_GIL = [ P_GG  P_GB
          P_BG  P_BB ].

The stationary state probabilities are

P_G = P_BG/(P_GB + P_BG),
P_B = P_GB/(P_GB + P_BG),

and the average error probability is

Pe = pB·P_B = pB·P_GB/(P_GB + P_BG).

The sequence of states can be easily found. The average duration of time
intervals (sojourn time) in good state corresponds to the intervals without the
impulsive noise (without the errors in Gilbert model). The average duration of time
intervals in bad state corresponds to the intervals with the impulsive noise. They are

N_G = 1/(1 − P_GG),   N_B = 1/(1 − P_BB).

It is only the first step in modeling real channels from the error statistics point of
view. Elliott modified this model by introducing the possibility of errors (pG 6¼ 0)
in the good state (Problem 4.12) [21].
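A small Python sketch (an added illustration; the example transition probabilities are arbitrary) collects these Gilbert–Elliott relations.

```python
def gilbert_elliott(P_GB, P_BG, p_G=0.0, p_B=0.5):
    """Two-state Markov (Gilbert-Elliott) burst-noise channel model.
    P_GB, P_BG - transition probabilities G->B and B->G,
    p_G, p_B   - error probabilities inside the good and bad states."""
    P_G = P_BG / (P_GB + P_BG)      # stationary probability of the good state
    P_B = P_GB / (P_GB + P_BG)      # stationary probability of the bad state
    Pe = p_G * P_G + p_B * P_B      # average error probability
    N_G = 1.0 / P_GB                # mean sojourn time in G, 1/(1 - P_GG)
    N_B = 1.0 / P_BG                # mean sojourn time in B, 1/(1 - P_BB)
    return P_G, P_B, Pe, N_G, N_B

# Example: rare, short bursts, classical Gilbert model (p_G = 0, p_B = 0.5)
print(gilbert_elliott(P_GB=0.01, P_BG=0.2))
```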
Now the time has come to consider the true meaning of the channel capacity. It is given by the Second Shannon theorem. Its complicated proof will be omitted here. According to this theorem, the error probability can be made as small as one wishes if the channel information rate is smaller than the capacity (Φ(X, Y) < C). If the signaling rate is v(X, Y), the channel information rate is Φ(X, Y) = v(X, Y)·I(X, Y) [Sh/s]. This result is unexpected. Reliable transmission over an unreliable channel is possible! It should be noted that transmission without any error is not possible. It only means that the error probability can be kept as low as one wishes.
This goal can be achieved using error control coding (considered in detail in the next five chapters). To get some insight, an elementary example will be given here—error control coding using repetitions.
Consider a memoryless binary source (emitting symbols 0 and 1). A BSC with parameter p = 10⁻² (crossover probability) is used. Usually this is a high error probability. One possible way to lower the error probability is to repeat every information bit three times, i.e. 0 is encoded as 000 and 1 as 111. That means that three channel bits are used for every information bit. Of course, the corresponding information rate is three times smaller (i.e. the code rate (Problem 4.12) is 1/3). At the receiving end majority logic is implemented. Three bits are decoded as one or zero according to the greater number of ones or zeros. It is in fact the MAP criterion. The probability of one error in this case is 3p(1 − p)² = 3·0.01·0.99² = 0.0294, while the probability of two errors is 3·p²·(1 − p) = 3·0.0001·0.99 = 0.000297. The probability of an uncorrected error is

Pe = C(3,0)·p³ + C(3,1)·p²·(1 − p) = 0.000298.

Therefore, single errors in a code word are detected and corrected. If the errors are only detected, then all single and double errors will be detected, but the error probability would be Pe = p³ = 10⁻⁶. Considering further fivefold repetition, with error correction, the probability of an uncorrected error is

Pe = C(5,0)·p⁵ + C(5,1)·p⁴·(1 − p) + C(5,2)·p³·(1 − p)² ≈ 10⁻⁵.

Fig. 4.6 Error probability Pe versus code rate R (other explanations in the text)

Of course, the code rate is now R = 1/5. The number of repetitions can be further increased (Pe ≈ 4·10⁻⁷ for sevenfold repetition and Pe ≈ 10⁻⁸ for ninefold). The corresponding results are shown using fat points in Fig. 4.6. Note the reciprocal values at the abscissa.
By considering this figure, it is easy to understand the essence of error control coding and the corresponding problems. In this case (n-fold repetition), to obtain a small error probability (<ε) the code rate must be drastically decreased. Therefore, the logical question is: does there exist some way to decrease the error probability without such a decrease of the code rate? The Second Shannon theorem gives just the positive answer.
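The points plotted in Fig. 4.6 can be reproduced with a few lines of Python (an added sketch using the crossover probability p = 0.01 from the example above).

```python
from math import comb

def repetition_error_prob(n, p):
    """Residual error probability of an n-fold repetition code (n odd)
    with majority-logic decoding over a BSC with crossover probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range((n + 1) // 2, n + 1))

for n in (1, 3, 5, 7, 9):
    print(f"R = 1/{n}, Pe = {repetition_error_prob(n, 0.01):.3g}")
```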

Problems

Problem 4.1 Zero-memory binary source emits symbols with rate vs = 100 [b/s],
the probability of one symbol is P(x1) = 0.3. The corresponding channel is
described by transition matrix
 
P = [ 0.4    0.6
      0.75   0.25 ].

(a) Find the entropy and the information rate of the source.
(b) Find the mutual (transmitted) information and the information rate of the
channel.
(c) Find the decision rule for the receiver yielding the minimum error probability.
(d) Repeat the same as in (c) for P(x1) = 0.2.

Solution
(a) Entropy and information rate are the source characteristics. Finding
P(x2) = 1 − P(x1) = 0.7, the following is calculated
H(X) = Σ_{i=1}^{2} P(xi)·ld[1/P(xi)] = 0.882 [Sh/b],

Φ(X) = H(X)·vs = 88.2 [Sh/s].

For the case of equiprobable symbols, the entropy equals one and the infor-
mation rate has the maximum value—100 [Sh/s].
(b) Transmitted (mutual) information is [1, 2]
I(X;Y) = Σ_{i=1}^{2} Σ_{j=1}^{2} P(xi, yj)·ld[P(yj/xi)/P(yj)],

and for the calculation the corresponding probabilities should be found:

• Probabilities of the channel output symbols are

P(y1) = P(y1/x1)·P(x1) + P(y1/x2)·P(x2) = 0.645,
P(y2) = P(y2/x1)·P(x1) + P(y2/x2)·P(x2) = 0.355.

• Joint probabilities of binary symbols at the input and the output of the channel are

P(x1, y1) = P(x1)·P(y1/x1) = 0.12
P(x2, y1) = P(x2)·P(y1/x2) = 0.525
P(x1, y2) = P(x1)·P(y2/x1) = 0.18
P(x2, y2) = P(x2)·P(y2/x2) = 0.175.

The transmitted information is

I(X;Y) = 0.12·ld(0.4/0.645) + 0.525·ld(0.75/0.645) + 0.18·ld(0.6/0.355) + 0.175·ld(0.25/0.355) = 0.0793 [Sh/b],

and as the source emits symbols (bits) directly into the channel

Φ(X;Y) = v(X,Y)·I(X;Y) = vs·I(X;Y) = 7.93 [Sh/s].



The transmitted information should always be smaller than the source entropy, and it is obvious that the channel information rate cannot be greater than the source information rate

I(X;Y) ≤ H(X)  ⟹  Φ(X;Y) ≤ Φ(X).

(c) The minimum error probability is achieved when the maximum a posteriori probability (Maximum A Posteriori Probability—MAP) decision rule is used. Simply, this rule leads to the choice of the input symbol having the greatest a posteriori probability for the received output symbol. In this case, the a posteriori probabilities are

P(x1/y1) = P(x1, y1)/P(y1) = 0.186
P(x2/y1) = P(x2, y1)/P(y1) = 0.814
P(x1/y2) = P(x1, y2)/P(y2) = 0.507
P(x2/y2) = P(x2, y2)/P(y2) = 0.493

and the corresponding decision rule is

P(x1/y1) = 0.186, P(x2/y1) = 0.814  ⟹  d_MAP(y1) = x2,
P(x1/y2) = 0.507, P(x2/y2) = 0.493  ⟹  d_MAP(y2) = x1.

From this it follows that the "opposite" decision rule is optimum in this case, and the error probability is

Pe = P(x1)·P(y1/x1) + P(x2)·P(y2/x2) = 0.295,

while for the "classic" decision rule the error probability is

Pe′ = P(x1)·P(y2/x1) + P(x2)·P(y1/x2) = 0.705.

(d) In this case it seems that the optimum decision rule is

P(x1/y1) = 0.1176, P(x2/y1) = 0.8824  ⟹  d_MAP(y1) = x2,
P(x1/y2) = 0.375,  P(x2/y2) = 0.625   ⟹  d_MAP(y2) = x2,

leading to an unexpected result. However, the decision rule must provide that for different output symbols the decision is that different input symbols were emitted. In this case the decision cannot be based on the a posteriori probabilities alone; one has to choose the rule giving the smaller error probability, i.e.

Pe = P(x1)·P(y1/x1) + P(x2)·P(y2/x2) = 0.28,

Pe′ = P(x1)·P(y2/x1) + P(x2)·P(y1/x2) = 0.72,

and the conclusion is that in this case the optimum decision rule is d_MAP(y1) = x2, d_MAP(y2) = x1.
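These numbers can be verified with a short Python sketch (an added illustration; the two candidate rules are compared directly, as required by the constraint that different outputs must decide different inputs).

```python
import numpy as np

P = np.array([[0.40, 0.60],    # P(yj / x1)
              [0.75, 0.25]])   # P(yj / x2)

for px1 in (0.3, 0.2):
    px = np.array([px1, 1 - px1])
    joint = px[:, None] * P                   # P(xi, yj)
    post = joint / joint.sum(axis=0)          # P(xi / yj)
    pe_opposite = joint[0, 0] + joint[1, 1]   # D(y1)=x2, D(y2)=x1
    pe_classic  = joint[0, 1] + joint[1, 0]   # D(y1)=x1, D(y2)=x2
    print(px1, post.round(4), pe_opposite, pe_classic)
```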

Problem 4.2 Zero-memory binary source where the probability of one symbol is
P(x1) = p emits the symbols at the input of memoryless channel whose transition
matrix is
 
P = [ 0.3    0.7
      0.01   0.99 ].

(a) Find the entropies of input and output sequences and conditional entropy of
output symbols when the input sequence is known.
(b) Find the transmitted information for p = 0.1, p = 0.5 and p = 0.9. For the
same values of p find the decision rule yielding the minimum error probability
as well as the probability of error in this case.
(c) Draw the transmitted information as a function of p. Draw the dependence of
the error probability as a function of p for the cases when the maximum a
posteriori probability (MAP) rule is used and when maximum likelihood
(ML) rule is used.

Solution
(a) For the zero-memory binary source the entropy is

H(X) = p·ld(1/p) + (1 − p)·ld[1/(1 − p)],

while the entropy of the outgoing sequence is

H(Y) = (0.3p + 0.01(1 − p))·ld[1/(0.3p + 0.01(1 − p))] + (0.7p + 0.99(1 − p))·ld[1/(0.7p + 0.99(1 − p))].

The conditional entropy of the output symbols when the sequence of the input symbols is known can be expressed as

H(Y/X) = 0.3p·ld(1/0.3) + 0.01(1 − p)·ld(1/0.01) + 0.7p·ld(1/0.7) + 0.99(1 − p)·ld(1/0.99).

(b) The transmitted information is

I(X;Y) = H(Y) − H(Y/X) = (0.3p + 0.01(1 − p))·ld[1/(0.3p + 0.01(1 − p))] + (0.7p + 0.99(1 − p))·ld[1/(0.7p + 0.99(1 − p))] − 0.3p·ld(1/0.3) − 0.01(1 − p)·ld(1/0.01) − 0.7p·ld(1/0.7) − 0.99(1 − p)·ld(1/0.99),

after the calculation yielding

I(X;Y)|p=0.1 = 0.0768 [Sh/b],  I(X;Y)|p=0.5 = 0.1412 [Sh/b],  I(X;Y)|p=0.9 = 0.0417 [Sh/b].

To obtain the minimum probability of error the MAP decision rule is used as follows

(1) For p = 0.1

P(x1/y1) = 0.7691, P(x2/y1) = 0.2308  ⟹  d_MAP(y1) = x1,
P(x1/y2) = 0.0728, P(x2/y2) = 0.9272  ⟹  d_MAP(y2) = x2.

(2) For p = 0.5

P(x1/y1) = 0.9677, P(x2/y1) = 0.0323  ⟹  d_MAP(y1) = x1,
P(x1/y2) = 0.4142, P(x2/y2) = 0.5858  ⟹  d_MAP(y2) = x2.

(3) For p = 0.9

P(x1/y1) = 0.9963, P(x2/y1) = 0.0037  ⟹  d_MAP(y1) = x1,
P(x1/y2) = 0.8642, P(x2/y2) = 0.1358  ⟹  d_MAP(y2) = x1.

For the first two cases the results are understandable (d_MAP(y1) = x1, d_MAP(y2) = x2), while for the third case the smaller error probability is obtained when the "opposite" decision rule is applied. The corresponding error probabilities are

Pe|p=0.1 = 0.0790,  Pe|p=0.5 = 0.355,  Pe|p=0.9 = 0.369.

From these results the following can be concluded:

• The transmitted information and the error probability depend not only on the channel, but on the source as well.
• The transmitted information does not depend on the decision rule used, while the error probability does depend on it.
• The maximum transmitted information and the minimum error probability are not achieved for the same values of the source parameters (this will be additionally illustrated in the next part of the solution).

(c) It is obvious that MAP is not the only possible decision rule. Often the maximum likelihood (ML) rule is used, where the decision depends on the channel parameters only

P(y1/x1) = 0.3,  P(y1/x2) = 0.01  ⟹  d_ML(y1) = x1;
P(y2/x1) = 0.7,  P(y2/x2) = 0.99  ⟹  d_ML(y2) = x2,

i.e. it does not depend in any way on the probabilities of the input binary symbols.
The transmitted information as a function of p is shown in Fig. 4.7, and from the figure it can be found that the maximum corresponds to the value p = 0.409. The probability of error as a function of p for the MAP and ML decision rules is shown in Fig. 4.8. It is obvious that the minimum error probability corresponds to p = 0, i.e. when the source emits continuously one binary symbol (however, in this case the transmitted information equals zero, and there is no symbol probability being optimum according to both criteria).

Fig. 4.7 Transmitted information as a function of p

Fig. 4.8 Probability of error as a function of p for MAP and ML decision rules
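The curve of Fig. 4.7 and the quoted maximum can be reproduced numerically; the following Python sketch (an added illustration) evaluates I(X;Y) over a grid of values of p.

```python
import numpy as np

def mutual_information(p, P):
    """I(X;Y) = H(Y) - H(Y/X) for binary input with P(x1) = p and 2x2 matrix P."""
    px = np.array([p, 1 - p])
    py = px @ P                                   # output probabilities
    h = lambda v: -np.sum(v[v > 0] * np.log2(v[v > 0]))
    return h(py) - np.sum(px * np.array([h(P[0]), h(P[1])]))

P = np.array([[0.3, 0.7], [0.01, 0.99]])
grid = np.linspace(0.001, 0.999, 999)
vals = [mutual_information(p, P) for p in grid]
print(grid[int(np.argmax(vals))])   # maximum near p = 0.409
```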

Problem 4.3 Zero-memory binary source where the probability of one symbol is P
(x1), emits the symbols at the input of memoryless channel described by the tran-
sition matrix
 
P = [ 1 − p0    p0
      p1        1 − p1 ].

(a) For P(x1) = 0.1 find the dependence of I(X,Y) on the channel parameters.
(b) Draw the dependence of I(X, Y) on p0 for p1 = 0.6 and P(x1) = 0.1; P
(x1) = 0.5 and P(x1) = 0.9.
(c) For a fixed value p1 = 0.6, and for p0 = 0.01 and p0 = 0.99, find the dependence of I(X, Y) and of the decision rule on the probability of symbol 0 in the incoming sequence.
(d) For p0 = p1 = p find the probability P(x1) yielding the maximum value of I(X,Y).

Solution
(a) The transmitted information for binary asymmetric channel with transition
matrix P is
I(X;Y) = P(x1)·[(1 − p0)·ld((1 − p0)/P(y1)) + p0·ld(p0/P(y2))] + P(x2)·[p1·ld(p1/P(y1)) + (1 − p1)·ld((1 − p1)/P(y2))].

Fig. 4.9 Transmitted information as a function of binary channel parameters

Fig. 4.10 The dependence of transmitted information on p0 for p1 = 0.6 and some values of P(x1)

If the parameters p0 and p1 are not fixed, the transmitted information can be considered as a three-dimensional function, shown in Fig. 4.9. The maximum is achieved for p0 = p1 = 0 and for p0 = p1 = 1. The minimum is obtained for p0 + p1 = 1 (the diagonal in the coordinate system), for any probability of the input symbols.

Fig. 4.11 Dependence of transmitted information on P(x1) for p1 = 0.6 and different values of p0
(b) As shown in Fig. 4.10, for p1 = 0.6 the minimum of I(X, Y) is obtained for p0 = 0.4, when the transmitted information equals zero for any value of P(x1). It can be concluded as well that when p0 + p1 ≠ 1 the transmitted information is greater if the probabilities of the source symbols are more equalized.
(c) For an asymmetric channel, the maximum of I(X, Y) does not correspond to the case P(x1) = P(x2) = 0.5, as shown in Fig. 4.11. Putting P(x1) = a, the optimal value of a is obtained as a solution of the equation

∂I(X;Y)/∂a = ∂/∂a { a(1 − p0)·ld[(1 − p0)/((1 − p0)a + p1(1 − a))] + a·p0·ld[p0/(p0·a + (1 − p1)(1 − a))] } + ∂/∂a { (1 − a)·p1·ld[p1/((1 − p0)a + p1(1 − a))] + (1 − a)(1 − p1)·ld[(1 − p1)/(p0·a + (1 − p1)(1 − a))] } = 0.

It is easy to verify that it is very difficult to find the optimum value of P(x1) in a closed form and it is easier to obtain it numerically, by finding the position of the transmitted information maximum for fixed p0 and p1. For p0 = 0.01 and p1 = 0.6 the maximum value of transmitted information is obtained for P(x1)max = 0.5770, and in the case p0 = 0.99 and p1 = 0.6 for P(x1)max = 0.5890.
(d) For p0 = p1 = p, the transmitted information corresponds to the main diagonal in Fig. 4.9. Then the binary channel becomes the binary symmetric channel (BSC) and the transmitted information is
I(X;Y) = (a + p − 2ap)·ld[1/(a + p − 2ap)] + (1 − a − p + 2ap)·ld[1/(1 − a − p + 2ap)] − [p·ld(1/p) + (1 − p)·ld(1/(1 − p))].

The optimum value of the parameter a is obtained on the basis of

∂/∂a { (1 − a − p + 2ap)·ld[1/(1 − a − p + 2ap)] + (a + p − 2ap)·ld[1/(a + p − 2ap)] } = 0,

and it is easy to verify that it is satisfied only when

1 − aopt − p + 2p·aopt = aopt + p − 2p·aopt,

so the optimum value of P(x1) is aopt = 0.5 regardless of the value of p. The maximum transmitted information is

I(X;Y)max = 1 − [p·ld(1/p) + (1 − p)·ld(1/(1 − p))].
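A numerical search confirms the quoted optima; the sketch below (an added illustration) maximizes I(X;Y) over a for fixed channel parameters.

```python
import numpy as np

def mutual_information(a, p0, p1):
    """I(X;Y) of a binary asymmetric channel with P(x1) = a."""
    px = np.array([a, 1 - a])
    P = np.array([[1 - p0, p0], [p1, 1 - p1]])
    py = px @ P
    h = lambda v: -np.sum(v[v > 0] * np.log2(v[v > 0]))
    return h(py) - px[0] * h(P[0]) - px[1] * h(P[1])

grid = np.linspace(0.001, 0.999, 9999)
for p0, p1 in [(0.01, 0.6), (0.99, 0.6), (0.1, 0.1)]:
    vals = [mutual_information(a, p0, p1) for a in grid]
    print(p0, p1, grid[int(np.argmax(vals))])   # ~0.577, ~0.589 and 0.5 (BSC)
```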

Problem 4.4 The digital transmission system consists of:


1. Binary information source emitting symbols with rate vs = 1 [Mb/s], the probability of one symbol is P(0) = 0.25;
2. Line encoder emitting polar pulses of amplitude U = 1 [V];
3. Channel consisting of N identical sections. Every section is a linear system with the transfer function corresponding to an ideal low-pass filter with the cutoff frequency fc = 1 [MHz]; the noise has a zero mean value and the variance σ² = 0.1 [W];
4. The receiver making decisions on the one sample basis, the threshold is put to
zero;
5. The user receiving the reconstructed bit sequence.
(a) Draw the system block-scheme under assumption that the line encoder, all
sections and the receiver (making decisions) can be jointly represented by a
discrete channel. Find the source information rate and the channel infor-
mation rate.
(b) Consider the case when a greater number of regenerative stations are in
cascade, where the line coder, one section and decision block form an
equivalent binary symmetric channel. In this case find the channel infor-
mation rate for N = 2, N = 10 and N = 100.
(c) Find the capacity of the equivalent discrete channel and draw it as a function
of the number of sections. How it can be achieved?

(d) Find the capacity of channel when noise is present, obtained by removing
line encoder and decision block (the case N = 1 is considered). How to
achieve the capacity in this case?

Solution
(a) The system block-scheme is shown in Fig. 4.12. The source information rate is

Φ(S) = vs·[(1/4)·ld 4 + (3/4)·ld(4/3)] = 811.3 [kSh/s].

It emits symbols through the channel. The channel is equivalently modeled as a binary symmetric channel (the noise influence is the same on the transmission of positive or negative pulses).
The error probability corresponding to this channel is

Pe = (1/2)·erfc(U/(√2·σ)) = (1/2)·erfc(√(An/2)) = 7.82·10⁻⁴,

where An denotes the signal-to-noise ratio in the channel [here An = 10 (not dB!)]. The symbols are emitted directly through the binary symmetric channel and the transmission rate is

Φ(X;Y) = vs·{ (1/4 + p/2)·ld[1/(1/4 + p/2)] + (3/4 − p/2)·ld[1/(3/4 − p/2)] − p·ld(1/p) − (1 − p)·ld[1/(1 − p)] } = 802.7 [kSh/s].

Fig. 4.12 Complete block-scheme of the system

Table 4.1 Parameters of the channel having N sections for signal-to-noise ratio an = 10·log10(An) = 10 dB

N                          1           2           10          100
p_ekv^(N)                  7.82·10⁻⁴   1.56·10⁻³   7.77·10⁻³   7.25·10⁻²
Φ_ekv^(N)(X,Y) [Sh/s]      802,700     795,700     751,750     488,630
C_ekv^(N) [Sh/s]           1,981,600   1,966,300   1,868,700   1,249,600

(b) A cascade of N binary symmetric channels is a binary symmetric channel as well (the matrices are multiplied); the equivalent error probability is

p_ekv^(N) = [1 − (1 − 2p)^N]/2,

and the transmission rate is

Φ_ekv^(N)(X;Y) = vs·{ (1/4 + p_ekv^(N)/2)·ld[1/(1/4 + p_ekv^(N)/2)] + (3/4 − p_ekv^(N)/2)·ld[1/(3/4 − p_ekv^(N)/2)] − p_ekv^(N)·ld[1/p_ekv^(N)] − (1 − p_ekv^(N))·ld[1/(1 − p_ekv^(N))] }.

The corresponding numerical data are given in Table 4.1.
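The entries of Table 4.1 can be checked with a short Python sketch (an added illustration; vs = 1 [Mb/s] and vmax = 2 [Mb/s] as in the text).

```python
import numpy as np

def h2(p):
    """Binary entropy function in Sh/b."""
    return 0.0 if p in (0.0, 1.0) else -p*np.log2(p) - (1-p)*np.log2(1-p)

def cascade(p, N):
    """Equivalent crossover probability of N cascaded BSC sections."""
    return (1 - (1 - 2*p)**N) / 2

p, vs, vmax = 7.82e-4, 1e6, 2e6
for N in (1, 2, 10, 100):
    pN = cascade(p, N)
    py1 = 0.25*(1 - pN) + 0.75*pN          # P(y=0) for P(0) = 0.25
    info_rate = vs * (h2(py1) - h2(pN))     # Phi(X,Y) = vs * I(X,Y)
    capacity = vmax * (1 - h2(pN))          # C = vmax * (1 - H(p))
    print(N, round(pN, 6), round(info_rate), round(capacity))
```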


(c) Channel capacity is a maximum possible value of the transmission rate. The
maximum signaling rate was chosen so as to avoid intersymbol interference
[22], i.e. vmax(X, Y) = 2fc = 2 [Mb/s], while the maximum transmitting rate is
achieved by optimizing the source symbol probabilities [2]

ðNÞ ðNÞ
Cekv ¼ max vmax ðX; Y ÞUekv ðX; YÞ
Pð0Þ
ðNÞ ðNÞ ðNÞ ðNÞ
¼ vmax ðX; Y Þ 1 þ pekv ld ðpekv Þ þ ð1  pekv Þ ld ð1  pekv Þ :

Capacities of the equivalent channel for 1, 2, 10 and 100 sections are given in Table 4.1, and the corresponding dependence of the capacity on the channel signal-to-noise ratio is shown in Fig. 4.13. The channel capacity can be used in full if at the channel input the source emits equiprobable symbols at a rate twice as great as that of the earlier source.

Fig. 4.13 Capacities of discrete and continuous channel corresponding to one section for An = 10
(d) If the line encoder and decision block are removed, a continuous channel with noise is obtained. The corresponding capacity is [1]

C = fc·ld(1 + An),

and the channel capacity can be achieved only in the case when at its input a continuous information source generating a Gaussian random process is applied. In this case, the process at the channel (linear system) output has the Gaussian distribution as well.
It is obvious that binary channel has a smaller capacity compared to the corre-
sponding continuous channel with noise. This is a consequence of binary signaling
used at the transmitting part and of a hard decision at the receiving part, trans-
forming the channel with noise into a binary symmetric channel. In the next
problem, it will be shown that by applying a different decision rule at the receiver,
this difference could be made smaller.
Problem 4.5 Digital communication system consists of
1. Information source emitting symbols s1, …, s6, with rate vs = 800 [symb/s].
Symbol probabilities are

P(s1) = 0.29, P(s2) = 0.24, P(s3) = 0.21, P(s4) = 0.14, P(s5) = 0.07.

2. Binary encoder performing Huffman coding.


3. Channel consisting of two identical sections. Each one can be considered as a
binary symmetric channel. Total error probability is 0.18.
(a) Draw the system block-scheme, find the entropy and the source information
rate.
(b) Find the average code word length at the encoder output, code efficiency
and the compression ratio of the obtained code. Find the probability of ones
and zeros at the encoder output.
(c) Find the transmitted information and the channel information rate if the
receiver is positioned after the first channel section.
(d) Under the condition (c) find the channel capacity if the source emits the
symbols with maximum possible (allowed) rate.

Solution
(a) System block-scheme is shown in Fig. 4.14. On the basis of the five given
probabilities, the probability of the sixth symbol is

Fig. 4.14 System block-scheme

Fig. 4.15 Huffman procedure illustration

P(s6) = 1 − Σ_{i=1}^{5} P(si) = 0.05,

and the entropy and information rate are

H(S) = −Σ_{i=1}^{6} P(si)·ld(P(si)) = 2.366 [Sh/symb],

Φ(S) = vs·H(S) = 1893.3 [Sh/s].

(b) The procedure of obtaining the Huffman code is shown in Fig. 4.15, and it is obvious that to the symbols s1–s6 correspond the code words '00', '10', '11', '010', '0110' and '0111', respectively. The average code word length and the efficiency are

L = Σ_{i=1}^{6} P(si)·li = 2.38 [b/symb],   η = H(S)/L·100% = 99.41%.

The probabilities of zeros and ones can be found by calculating their average number per code word as follows

L0 = 0.29·2 + 0.24·1 + 0.21·0 + 0.14·2 + 0.07·2 + 0.05·1 = 1.29,
L1 = 0.29·0 + 0.24·1 + 0.21·2 + 0.14·1 + 0.07·2 + 0.05·3 = 1.09,

and

P(0) = L0/L = 0.542,  P(1) = L1/L = 0.458,

and from the relation L0 + L1 = L it follows that P(0) + P(1) = 1 as well.
Under the assumption that the sequence at the encoder output is uncorrelated, the entropy of the binary symbols at the channel input can be found

H(X) = −0.542·ld(0.542) − 0.458·ld(0.458) = 0.9949 [Sh/symb],

and as the signaling rate is

v(X,Y) = vs·L = 1904 [b/s],

the information rate at the encoder output is

Φ(X) = v(X,Y)·H(X) = 1894.3 [Sh/s].

In this case Φ(X) > Φ(S), which is, of course, impossible because the encoder cannot introduce new information. This illogical result is a consequence of the assumption that the sequence at the encoder output is uncorrelated, which is not true in fact. The introduced imprecision is small and it can be corrected having in mind that the previously found entropy is the entropy of the adjoint source, while the true entropy is

H(X) = Φ(S)/v(X,Y) = 0.9944 < 0.9949 = H̄(X).

(c) If the probability of binary zero at the channel input is denoted by a, the transmitted information can be calculated using the expression

I(X;Y) = (a + p1 − 2ap1)·ld[1/(a + p1 − 2ap1)] + (1 − a − p1 + 2ap1)·ld[1/(1 − a − p1 + 2ap1)] − p1·ld(1/p1) − (1 − p1)·ld[1/(1 − p1)],

where p1 denotes the error probability corresponding to one channel section. For two cascaded sections, the error probability in the equivalent channel becomes

p = [1 − (1 − 2p1)²]/2 = 2p1 − 2p1²,

so

p1 = (1 ± √(1 − 2p))/2,

and the equation has two solutions p1^(1) = 0.1 and p1^(2) = 0.9. The more logical value of the error probability is p1 = 0.1 (the reader should examine the case p1 = 0.9!), yielding

I(X;Y) = 0.5277 [Sh/symb],  Φ(X;Y) = v(X,Y)·I(X;Y) = 1004.8 [Sh/s].

(d) The source is already using the maximum signaling rate, so v(X,Y) = vmax(X,Y) holds. Therefore, the capacity calculation is carried out by maximizing the transmitted information through the choice of the symbol probabilities at the channel input. If these symbols are equiprobable, it is obtained [1]

C = v(X,Y)·[1 − p1·ld(1/p1) − (1 − p1)·ld(1/(1 − p1))] = 1011 [Sh/s].

The reader should try to respond to the following questions:

• whether by a different bit assignment in the Huffman tree a value nearer to the capacity can be obtained?
• whether the insertion of a differential encoder between the Huffman encoder and the channel would contribute to the increase of the information rate and, in this case, whether a greater quantity of information per unit of time would be transmitted?

It is important to note that in this case C < Φ(S); therefore, it is obvious that for this channel (with the given error probability) it is impossible to transmit all the information emitted by the source. However, if the channel state becomes substantially better—the probability of error decreasing to p1 = 10⁻⁴—the capacity increases to C = 1901.2 [Sh/s].
It can be noticed that for the information transmission a more unreliable channel can be used if the encoder efficiency is smaller. In the case when the encoder is directly at the channel input, the redundancy introduced by the encoder allows the reliable transmission over the unreliable channel. In other words, without redundancy, the information can be transmitted without any distortion over a completely reliable channel only. This feature will be used in the theory of error control codes, where the redundancy will be intentionally introduced to make possible the information transmission over very unreliable channels.
Problem 4.6 A binary source, completely defined by the probability P(0) = a, sends the bits to an error control encoder which repeats every bit n times. These bits are further sent over the binary symmetric channel where the crossover probability is p = 0.1. At the receiving end is the error control decoder using majority logic decision.
Do the following:
(a) Find the entropy and information rate if the source emits the bits with the rate
vi = 1 [Mb/s] and if a = 0.4.
(b) Find the bit error probability observed by the user. Calculate the transmitted information over the equivalent channel. Draw its dependence as a function of the number of repetitions for a = 0.1, a = 0.2 and a = 0.4. Comment on the results.
(c) If the maximum signaling rate over the channel is vb = 10 [Mb/s], find the time needed for transmission of N = 10⁷ bits under the condition that the error probability observed by the user is smaller than Pe,min = 10⁻². Can this time be additionally shortened for a = 0.2 by the use of some procedure?

Solution
(a) The entropy and information rate for the zero-memory binary source are

H(S) = −a·ld(a) − (1 − a)·ld(1 − a) = 0.9709 [Sh/symb],

Φ(S) = vi·[−a·ld(a) − (1 − a)·ld(1 − a)] = 970.9 [kSh/s].

(b) When n-fold repetition error control coding with majority logic is applied and the channel is binary symmetric with the crossover probability p, the error probability observed by the user is

Pe^(n) = Σ_{k=(n−1)/2+1}^{n} C(n,k)·p^k·(1 − p)^(n−k),

while the transmitted information through the equivalent channel (consisting of the encoder with repetitions, the channel and the decoder) is

I^(n)(X;Y) = (a + Pe^(n) − 2a·Pe^(n))·ld[1/(a + Pe^(n) − 2a·Pe^(n))] + (1 − a − Pe^(n) + 2a·Pe^(n))·ld[1/(1 − a − Pe^(n) + 2a·Pe^(n))] − Pe^(n)·ld[1/Pe^(n)] − (1 − Pe^(n))·ld[1/(1 − Pe^(n))].

Fig. 4.16 Entropy of the input sequence and the transmitted information for p = 0.1

The dependence of the transmitted information on the number of repetitions is shown in Fig. 4.16. It is obvious that the error probability depends only on the channel characteristics (for a fixed decision rule), while the transmitted information depends on the source features (parameter a) as well.
The reader should calculate the information rate through the equivalent channel (encoder, channel and decoder). Can it be increased by increasing n, if the transmission bit rate through the channel is limited? It should be noticed that the signaling rate in this channel is vi!
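The curves of Fig. 4.16 can be reproduced by combining the repetition-code error probability with the BSC formulas above; a short Python sketch (an added illustration) follows.

```python
from math import comb, log2

def h2(p):
    return 0.0 if p in (0, 1) else -p*log2(p) - (1-p)*log2(1-p)

def pe_rep(n, p):
    """Residual error probability of n-fold repetition with majority decoding."""
    return sum(comb(n, k) * p**k * (1-p)**(n-k) for k in range((n+1)//2, n+1))

def i_equiv(a, n, p=0.1):
    """Transmitted information of the equivalent channel (encoder + BSC + decoder)."""
    pe = pe_rep(n, p)
    return h2(a + pe - 2*a*pe) - h2(pe)

for a in (0.1, 0.2, 0.4):
    print(a, [round(i_equiv(a, n), 4) for n in (1, 3, 5, 7, 9)])
```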
(c) Firstly, the value of n should be found for which the error probability is below the prescribed value, while the information rate is maximum. In this case Pe,min = 10⁻², and one obtains successively

Pe^(1) = 10⁻¹;  Pe^(3) = 2.8·10⁻²;  Pe^(5) = 8.56·10⁻³;  Pe^(7) = 2.72·10⁻³,

but the binary rate increases

vb^(1) = vi;  vb^(3) = 3vi;  vb^(5) = 5vi;  vb^(7) = 7vi.

It is obvious that the needed transmission quality is achieved for n ≥ 5, while the minimum binary rate under this condition is obtained just for n = 5; the code with the code rate R = 1/5 is the optimum one, yielding the transmission rate

vi,max = R·vb,max = 2 [Mb/s].

For sending N = 10⁷ bits, the needed time is

t = N/vi,max = 5 [s].

This time can be decreased if a sufficient source extension is done, followed by compression encoding and then by the repetition encoding. The error probability does not change by this procedure and the code with R = 1/5 remains the optimum one. On the other hand, N bits of the source (S) alphabet can be substituted by N·H(S) symbols of the code alphabet, and for a = 0.2 it is obtained

tmin = N·H(S)/vi,max = 3.61 [s].

Problem 4.7 Digital communication system consists of


1. Information source emitting the symbols s1, …, s7, with the rate vs = 1000
[symb/s]. Symbol probabilities are as follows

si s1 s2 s3 s4 s5 s6 s7
P(si) 0.3 0.2 0.15 0.15 0.1 0.05 0.05

2. Binary Huffman encoder and the corresponding decoder at the receiving end.
3. Error control encoder using threefold repetition and the corresponding decoder
with majority logic at the receiving end (these blocks are included optionally).
4. Channel consisting of two cascaded sections where one is modeled by binary
symmetric channel (crossover probability equals p = 10−2) and the other is
binary asymmetric channel with parameters p1 = 2p0 = 2p.

Find the following:


(a) Source information rate and code efficiency for Huffman encoding
(b) Find the information rate of the equivalent source at the channel input and
information rate of the source adjoined to it (draw the system block-scheme)
(c) Channel information rate and the error probability without the error control
coding.
(d) Information rate and the error probability when the error control coding is
applied and the signaling rate cannot be additionally increased with respect to
the case without error control coding. Specially comment how the error
control coding influences the information rate!

Solution
(a) The source characteristics are easily obtained

H(S) = Σ_{i=1}^{7} P(si)·ld[1/P(si)] = 2.57 [Sh/symb],  Φ(S) = vs·H(S) = 2570 [Sh/s].

The code words obtained by Huffman encoding are given in Table 4.2, from which the average code word length, the code efficiency and the compression ratio are

L = Σ_{i=1}^{7} P(si)·li = 2.6 [b/symb],  η = 98.88%,  ρ = 1.1538.

The complete system block-scheme is shown in Fig. 4.17. It is obvious that the information rate at the channel input (i.e. at the Huffman encoder output) must be equal to the source information rate

Φ(X) = Φ(S) = 2570 [Sh/s],

while the signaling rate through the channel is

v(X) = vs·L = 2600 [b/s],

so the entropy of the equivalent source consisting of the zero-memory source and the Huffman encoder is

H(X) = Φ(X)/v(X,Y) = 0.9885 [Sh/symb].

Table 4.2 One possible result of Huffman encoding

Symbols   P(si)   Code words
s1        0.30    00
s2        0.20    10
s3        0.15    010
s4        0.15    011
s5        0.10    110
s6        0.05    1110
s7        0.05    1111

Fig. 4.17 System block-scheme, the notion of superchannel

To this equivalent source, a source can be adjoined completely described by the symbol probabilities at the channel input

P(x1) = L0/L = 1.4/2.6 = 0.5385,  P(x2) = L1/L = 1.2/2.6 = 0.4615,

yielding

H̄(X) = Σ_{i=1}^{2} P(xi)·ld[1/P(xi)] = 0.9957 [Sh/symb],
Φ̄(X) = v(X,Y)·H̄(X) = 2588.82 [Sh/s].

(b) The transmitted information of the equivalent binary source is

I(X;Y) = Σ_{i=1}^{2} Σ_{j=1}^{2} P(xi, yj)·ld[P(yj/xi)/P(yj)],

where the probabilities correspond to the equivalent channel representing the cascade of one binary symmetric channel and one binary (asymmetric) channel described by the matrices

P_BSC = [ v_BSC,0  p_BSC,0 ; p_BSC,1  v_BSC,1 ] = [ 1−p  p ; p  1−p ],
P_NSC = [ v_NSC,0  p_NSC,0 ; p_NSC,1  v_NSC,1 ] = [ 1−p  p ; 2p  1−2p ].

The equivalent channel has the transition matrix

P_ekv = [ 1−p  p ; p  1−p ]·[ 1−p  p ; 2p  1−2p ] = [ (1−p)² + 2p²   p(2 − 3p) ; 3p(1 − p)   (1−p)(1−2p) + p² ],

and for p = 10⁻² it is obtained

P(y1/x1) = (1 − p)² + 2p² = 0.9803,  P(y2/x1) = p(2 − 3p) = 0.0197,
P(y1/x2) = 3p(1 − p) = 0.0297,      P(y2/x2) = (1 − p)(1 − 2p) + p² = 0.9703.

The corresponding joint probabilities are

P(x1, y1) = 0.5385·0.9803 = 0.5279,  P(x1, y2) = 0.5385·0.0197 = 0.0106,
P(x2, y1) = 0.4615·0.0297 = 0.0137,  P(x2, y2) = 0.4615·0.9703 = 0.4478,

and the probabilities of the symbols y1 = 0 and y2 = 1 at the channel output are

P(y1) = 0.5279 + 0.0137 = 0.5416,  P(y2) = 0.0106 + 0.4478 = 0.4584.

Finally, the numerical values of the transmitted information and of the information rate can be found

I(X;Y) = 0.8307 [Sh/symb],  Φ(X;Y) = v(X,Y)·I(X;Y) = 2159.82 [Sh/s].



In this case it is obvious that the optimum decision rule is d(y1) = x1, d(y2) = x2 and the error probability without the error control coding is

Pe1 = P(x1)·P(y2/x1) + P(x2)·P(y1/x2) = 0.0243.

(c) The case with three repetitions will now be considered. If at the output of the Huffman encoder a binary zero is sent to the error control encoder, at its output the sequence '000' is generated, and at the error control decoder output a binary zero will be generated in the cases without transmission errors or when only one error occurred in these three bits. Otherwise, an error will occur for the binary zero transmission. Therefore, the superchannel (consisting of the error control encoder, the channel and the error control decoder) parameters are

P_SK(y1/x1) = P(y1/x1)³ + 3·P(y1/x1)²·P(y2/x1) = 0.9989,
P_SK(y2/x1) = 3·P(y1/x1)·P(y2/x1)² + P(y2/x1)³ = 0.0011,
P_SK(y1/x2) = 3·P(y2/x2)·P(y1/x2)² + P(y1/x2)³ = 0.0026,
P_SK(y2/x2) = P(y2/x2)³ + 3·P(y2/x2)²·P(y1/x2) = 0.9974.

The probabilities of the input symbols are not changed and the output symbol probabilities are

P_SK(y1) = 0.5391,  P_SK(y2) = 0.4609,

while the corresponding joint probabilities are

P_SK(x1, y1) = 0.5379,  P_SK(x1, y2) = 0.0006,
P_SK(x2, y1) = 0.0012,  P_SK(x2, y2) = 0.4603.

The binary rate in the channel is three times greater than the rate at the coder input—the code rate is R = 1/3. If the binary rate in the channel cannot be increased (the channel is the same), the information rate at the error control coder input must be decreased three times, i.e.

v_SK(X,Y) = R·v(X,Y),

and the source should emit the symbols three times slower.
Finally, the transmitted information and the information rate can be calculated

I_SK(X;Y) = 0.9766 [Sh/symb],
Φ_SK(X;Y) = v_SK(X,Y)·I_SK(X;Y) = 846.39 [Sh/s].

The error probability in this case is

Pe3 = P(x1)·P_SK(y2/x1) + P(x2)·P_SK(y1/x2) = 0.0018,

being 13.5 times smaller than in the case without error control coding.
It is important to notice the following:

• If the channel is connected directly to the information source, the channel cannot transmit more information (per second) than the source sent, i.e.

Φ(X,Y) ≤ Φ(S).

• The channel is capable of transmitting the information successfully if

C ≥ Φ(S),

i.e. the channel has to be good enough to transmit all the information.
• To transmit the complete information that the source sent, the following must hold

Φ(X,Y) = Φ(S),

where the relation is satisfied only if ηR = I(X,Y)—here, for a perfect compression code, the relation can be satisfied even for a non-ideal channel if R = I(X,Y)!
• It is desirable that the ratio Φ(X,Y)/C is as great as possible, allowing that for the information transmission a channel having a capacity as small as possible can be used. It is also desirable that the signaling rate is as small as possible for a given information rate. By source encoding a non-destructive compression is achieved and the difference between these two rates is smaller (in such a way the demands for a frequency band are partially lessened).
• By error control coding the reliability of transmission can be increased (the error probability is decreased) but the information rate cannot become greater, i.e.

R·v(X,Y)·I_SK(X,Y) ≤ v(X,Y)·I(X,Y).

• If an error control code enabling a negligible error probability is applied, the transmitted information through the superchannel is I_SK(X,Y) ≈ 1, the above relation becoming

R ≤ I(X,Y),

which corresponds to the code rate predicted by the Second Shannon theorem [10].

Problem 4.8 Zero-memory binary source, where the probability of binary zero is P
(x1), is at the input of line encoder which generates polar binary signal with voltage
levels ±1. From line encoder the signal is emitted into the channel with additive
white Gaussian noise (σ = 0.5). The decision block is at the channel output with two thresholds ±G (0 ≤ G ≤ 1).
(a) Find the transmitted information for P(x1) = 0.5 and G = 0.5.
(b) Draw the transmitted information versus G when P(x1) = 0.5.
(c) Draw the transmitted information versus P(x1) for G = 0, G = 0.5 and G = 1
and compare it with source entropy.

Solution
The signal (amplitude) probability density function at the channel output depends on the voltage level emitted by the line encoder. It is shown in Fig. 4.18. If the received signal is below the lower threshold (u < −G), the decision is that the symbol '0' is at the discrete channel output, while in the case u > G the decision is '1'. If the received signal is between the thresholds, the decision would not be sufficiently reliable. In this case the symbol is considered as erased, i.e. to it formally corresponds the symbol E. The probabilities for the received signal to be between the thresholds, and beyond the more distant threshold, are

p1 = P(−G < y < G / x = −1) = P(−G < y < G / x = +1) = (1/2)·erfc[(1 − G)/(√2·σ)] − (1/2)·erfc[(1 + G)/(√2·σ)],
p2 = P(y > G / x = −1) = P(y < −G / x = +1) = (1/2)·erfc[(1 + G)/(√2·σ)].
Fig. 4.18 Probability density functions of the signal at the channel output; equivalence with the BEC (Binary Erasure Channel)

The transition matrix of the binary erasure channel is

P = [ 1 − p1 − p2   p1   p2
      p2            p1   1 − p1 − p2 ],

where for the erasure channel it is often supposed that p2 ≈ 0, which is not supposed here. The transmitted information is

I(X;Y) = P(x1)·[(1 − p1 − p2)·ld((1 − p1 − p2)/P(y1)) + p1·ld(p1/P(y2)) + p2·ld(p2/P(y3))] + P(x2)·[p2·ld(p2/P(y1)) + p1·ld(p1/P(y2)) + (1 − p1 − p2)·ld((1 − p1 − p2)/P(y3))],

where

P(y1) = (1 − p1 − p2)·P(x1) + p2·P(x2),
P(y2) = p1·P(x1) + p1·P(x2),
P(y3) = p2·P(x1) + (1 − p1 − p2)·P(x2).

For P(x1) = 0.5 and G = σ = 0.5 it is obtained p1 = 0.1573 and p2 = 0.0013, yielding P(y1) = 0.4213, P(y2) = 0.1573, P(y3) = 0.4213 and finally I(X, Y) = 0.8282 [Sh/symb].
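These numbers can be reproduced with a small Python sketch (an added illustration; it uses erfc from scipy, and the function names are ours).

```python
import numpy as np
from scipy.special import erfc

def bec_probabilities(G, sigma=0.5, U=1.0):
    """Erasure (p1) and crossover (p2) probabilities for a polar signal +/-U,
    Gaussian noise with standard deviation sigma and thresholds +/-G."""
    p1 = 0.5*erfc((U - G)/(np.sqrt(2)*sigma)) - 0.5*erfc((U + G)/(np.sqrt(2)*sigma))
    p2 = 0.5*erfc((U + G)/(np.sqrt(2)*sigma))
    return p1, p2

def mutual_information(px1, p1, p2):
    """I(X;Y) of the two-input, three-output channel defined above."""
    P = np.array([[1-p1-p2, p1, p2],
                  [p2, p1, 1-p1-p2]])
    px = np.array([px1, 1-px1])
    py = px @ P
    h = lambda v: -np.sum(v[v > 0]*np.log2(v[v > 0]))
    return h(py) - px[0]*h(P[0]) - px[1]*h(P[1])

p1, p2 = bec_probabilities(G=0.5)
print(round(p1, 4), round(p2, 4), round(mutual_information(0.5, p1, p2), 4))
# -> approximately 0.1573, 0.0013 and 0.828
```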
The dependence of the transmitted information on the absolute value of the threshold is shown in Fig. 4.19. It is obvious that for every value of P(x1) there are optimum threshold values −G and G yielding the maximum transmitted information. It is interesting that for P(x1) = 0.5 more information is transmitted when G = 0.2 than for G = 0 (when the BEC is reduced to a BSC). The dependence of the transmitted information on the input signal probabilities is shown in Fig. 4.20, leading to the conclusion that the optimal case is when the input symbols are equiprobable.

Fig. 4.19 The transmitted information dependence on the absolute value of the threshold G

Fig. 4.20 The dependence of the transmitted information on the input signal probabilities
Problem 4.9 The information source whose rate is vs = 10⁶ [symb/s] is defined by the following table

si      s1    s2    s3    s4    s5    s6    s7    s8    s9    s10
P(si)   0.18  0.2   0.14  0.15  0.1   0.11  0.05  0.04  0.02  0.01

Is it justified to put the source output at the input of the channel modeled as a pass-band filter with the frequency band B = 1 [MHz], if the noise power spectral density is N0 = 10⁻¹⁴ [W/Hz] and if the signal power is Ps = 0.1 [µW]?

Solution
The source information rate is

Φ(S) = vs·Σ_{i=1}^{10} P(si)·ld[1/P(si)] = 2.981 [MSh/s],

and the continuous channel capacity is

C = B·ld(1 + Ps/(N0·B)) = 10⁶·ld(11) = 3.46 [MSh/s].

The information rate is smaller than the channel capacity, so it is justified to use the channel for the given source. According to the first Nyquist criterion for transmission in the transposed frequency band, it should be provided that the signaling rate is smaller than vmax = 1 [Msymb/s] (the set of signaling symbols, as a rule, is not the same as that emitted by the source!). As the minimum binary rate over the channel is determined by the information rate, it is obvious that BPSK cannot be used, but some M-level modulation scheme could be used, providing Φ(S)/ld(M) < vmax.
Problem 4.10 Zero-memory binary source generating equiprobable symbols is at
the line encoder input which generates polar binary signal having the voltage levels
±U. From the line encoder the signal is emitted over the channel with additive white Gaussian noise (standard deviation σ). At the channel output are the quantizer (s = 2^n quantization levels) and a corresponding decision block. The bounds of the quantization intervals are

U_gr(l) = l·(4U)/s − 2U,  l = 1, …, s − 1,

and at its output one of the s possible discrete symbols is generated.


(a) Consider equivalent discrete channel consisting of line encoder, channel,
quantizer and decision block. Find the transmitted information for equivalent
channel for U = 1 [V] and r = 0.5 [V], when s = 2, s = 4, s = 8, s = 16 and
s = 32.
(b) For a considered discrete channel find the capacity dependence on
signal-to-noise ratio for various values of parameter s and compare it to the
continuous channel capacity.

Solution
(a) Because of the quantizer, the complete voltage range is divided into s non-overlapping intervals, each corresponding to one discrete symbol. When s = 4, the signal probability density function at the channel output is shown in Fig. 4.21, and a graph corresponding to the discrete channel with r = 2 inputs and s = 4 outputs is shown in Fig. 4.22.

Fig. 4.21 Determining the probabilities of the channel outgoing symbols

Fig. 4.22 Graph corresponding to discrete channel with r = 2 inputs and s = 4 outputs; to every quantization level corresponds one discrete symbol
The bounds of the quantizing intervals are

U_gr(l) = l·(4U)/s − 2U,  l = 1, …, s − 1,

and by defining U_X(1) = U, U_X(2) = −U, the channel transition probabilities can be easily found from Fig. 4.21:
P(yj/xi) = p_|j−i| =
  (1/2)·erfc( min{U_X(i) − U_gr(j−1), U_X(i) − U_gr(j)}/(√2·σ) ) − (1/2)·erfc( max{U_X(i) − U_gr(j−1), U_X(i) − U_gr(j)}/(√2·σ) ),   i ≠ j, 1 < j < s
  (1/2)·erfc( (U_X(i) − U_gr(1))/(√2·σ) ),   i ≠ j = 1
  (1/2)·erfc( (U_X(i) − U_gr(s−1))/(√2·σ) ),   i ≠ j = s
  1 − Σ_{j≠i} P(yj/xi),   i = j,

where i = 1, 2 and j = 1, 2, …, s.


The transmitted information is

I(X;Y) = Σ_{i=1}^{2} Σ_{j=1}^{s} P(xi, yj)·ld[P(yj/xi)/P(yj)],

and having in view the equiprobable binary source symbols (P(x1) = P(x2) = 0.5), the joint probabilities are

P(xi, yj) = P(yj/xi)/2,  ∀i, j.

The channel is symmetric and the maximum transmitted information value is achieved for equiprobable symbols

Imax(X;Y) = (1/2)·Σ_{i=1}^{2} Σ_{j=1}^{s} P(yj/xi)·ld[2·P(yj/xi)/(P(yj/x1) + P(yj/x2))].

For U = 1 [V] and σ = 0.5 [V], the transmitted information for the given values of s is shown in Table 4.3. It is obvious that with an increased number of quantization levels the transmitted information increases as well, but after some number of levels saturation is reached. In the case when the number of symbols at the channel output is very great—corresponding to levels very close to each other—a channel with a discrete input (r = 2) and a continuous output is obtained. It should be noticed as well that the quantization levels here are chosen in a suboptimal way (only the range from −2U to +2U was quantized) and that an additional increase in the transmitted information can be achieved, for the same number of quantizing levels, by quantizer optimization.

Table 4.3 Transmitted information for various values of s

s        2       4       8       16      32      64      256     32,768
I(X,Y)   0.8434  0.8662  0.8968  0.9087  0.9118  0.9126  0.9128  0.9128

Fig. 4.23 Normalized continuous channel capacity and capacity of the corresponding discrete channel with r = 2 inputs and s outputs
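The entries of Table 4.3 can be reproduced numerically; the following Python sketch (an added illustration, using the threshold and level definitions above) computes the transmitted information for equiprobable inputs.

```python
import numpy as np
from scipy.special import erfc

def transition_matrix(s, U=1.0, sigma=0.5):
    """2 x s matrix P(yj/xi) for inputs +/-U, Gaussian noise and the
    quantization bounds U_gr(l) = l*4U/s - 2U (outer intervals open)."""
    bounds = np.concatenate(([-np.inf], np.arange(1, s)*4*U/s - 2*U, [np.inf]))
    q = lambda x: 0.5*erfc(x/(np.sqrt(2)*sigma))   # P(y > mean + x)
    P = np.zeros((2, s))
    for i, mean in enumerate((+U, -U)):
        P[i] = q(bounds[:-1] - mean) - q(bounds[1:] - mean)
    return P

def i_max(P):
    """Transmitted information for equiprobable binary inputs."""
    py = P.mean(axis=0)
    h = lambda v: -np.sum(v[v > 0]*np.log2(v[v > 0]))
    return h(py) - 0.5*(h(P[0]) + h(P[1]))

for s in (2, 4, 8, 16, 32):
    print(s, round(i_max(transition_matrix(s)), 4))   # compare with Table 4.3
```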
(b) The capacity of the above defined discrete channel with r = 2 inputs and s outputs is

C = vmax(X,Y)·Imax(X,Y) = vmax(X,Y)·(1/2)·Σ_{i=1}^{2} Σ_{j=1}^{s} P(yj/xi)·ld[2·P(yj/xi)/(P(yj/x1) + P(yj/x2))].

Having in view that the data for calculating the maximum signaling rate for a
channel are not available, the capacity will be normalized to that rate (this
approach is very often used in the literature).
The corresponding numerical values are shown in Fig. 4.23. It is obvious that
for all signal-to-noise ratio values the increasing of the number of quantization
levels increases the transmitted information as well. In such a way it is
practically confirmed that soft decision is superior to hard decision where the
received signal is compared to one threshold value only.
It is interesting to find as well the capacity of the continuous channel with
noise, the channel corresponding to the system part from line encoder output
to the quantizer input.

C = B·ld(1 + Ps/Pn).

Although the frequency band is not given in this problem, it is obvious that the signaling is in the baseband and the condition vmax(X,Y) ≤ 2B must be satisfied, so the maximum channel capacity (corresponding to the system modeled as an ideal low-pass filter) is

C/vmax(X,Y) = (1/2)·ld(1 + U²/σ²).

Of course, it is obvious that the continuous channel capacity is always greater than that of the discrete channel adjoined to it. However, it should be noted that the transmitted information of the discrete channel cannot be greater than the entropy of the input discrete source.

Problem 4.11 Zero-memory binary source generates equiprobable symbols. Its n-


th order extension is led at the line encoder input. Line encoder generates multilevel
polar signal having M = 2^n equidistant voltage levels; the average power is Ps = 1 [W].
Line encoder output is transmitted through the continuous channel modeled by an
ideal low-pass filter having the cutoff frequency fc = 100 [kHz], and where the
additive Gaussian noise has the average power density spectrum
N0 = 2.5  10−6 [W/Hz]. At the channel output are the quantizer (M levels) and
the corresponding decision block. Quantization levels bounds are

Ugr ðlÞ ¼ l  2UMU; l ¼ 1; . . .; M1;

and at its output one of the possible M discrete symbols is generated.


(a) Draw the block-scheme if the equivalent source consists of binary source and
the block for its extension, and the equivalent discrete channel consists of line
encoder, channel, quantizer and the decision block. Find the entropy of the
equivalent source.
(b) Find the transmitted information for the equivalent channel when M = 2,
M = 4, M = 8 and M = 16.
(c) For a previously defined discrete source find the dependence of capacity on the
signal-to-noise ratio for the various values of parameter s and compare it to the
continuous channel capacity.

Solution
(a) System block-scheme is shown in Fig. 4.24. It is obvious that the binary
source combined with the extension block emits M equiprobable symbols, and
the entropy of the combined source is

$$H(S^n) = \sum_{i=1}^{2^n} \frac{1}{2^n}\,\mathrm{ld}(2^n) = nH(S) = n\ [\mathrm{Sh/symb}].$$

[Fig. 4.24 System block-scheme: binary source, extension block (2 → M = 2^n), M-ary line encoder (polar code, Ps = 1 W), channel (low-pass filter with noise, fc = 100 kHz), decision block (comparison with thresholds), M → 2 conversion and user; the blocks from the line encoder to the decision block form the equivalent discrete channel.]

The receiving part of this system is practically the same as in the previous
problem. The transmitting part differs, because n binary symbols are firstly
grouped (in the extension block) and every symbol in line coder is represented
by one (from M) voltage level

$$U_x(i) = -(M-1)U + 2(i-1)U, \quad i = 1, 2, \ldots, M.$$

Multilevel signal has a unity power and the following relation must hold
" #
1X M
1X M
U2 X M XM
Psr ¼ Ux2 ðiÞ ¼ ð2i  M  1Þ2 U 2 ¼ 4 i2  4ðM þ 1Þ i þ ðM þ 1Þ2 ¼ 1;
M i¼1 M i¼1 M i¼1 i¼1

after using the relations

$$\sum_{i=1}^{M} i = \frac{M(M+1)}{2}, \qquad \sum_{i=1}^{M} i^2 = \frac{M(M+1)(2M+1)}{6},$$

[Fig. 4.25 Graph of symmetric channel with r = 4 inputs and s = 4 outputs: input/output levels ±U and ±3U with transition probabilities p1, p2, p3.]

the corresponding value is obtained


$$U = \sqrt{\frac{3}{M^2-1}}.$$
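A quick numerical check (an illustrative sketch, not part of the original solution) that this choice of U indeed gives unit average power for the M equidistant polar levels:

```python
import numpy as np

for M in (2, 4, 8, 16):
    U = np.sqrt(3.0 / (M**2 - 1))
    levels = np.array([-(M - 1) * U + 2 * (i - 1) * U for i in range(1, M + 1)])
    print(M, round(float(np.mean(levels**2)), 6))   # average power -> 1.0
```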

(b) Equivalent discrete channel now has M inputs and M outputs and for M = 4 is
described by the graph shown in Fig. 4.25, and the corresponding transition
matrix is
$$\mathbf{P} = \begin{bmatrix} 1-p_1-p_2-p_3 & p_1 & p_2 & p_3 \\ p_1 & 1-2p_1-p_2 & p_1 & p_2 \\ p_2 & p_1 & 1-2p_1-p_2 & p_1 \\ p_3 & p_2 & p_1 & 1-p_1-p_2-p_3 \end{bmatrix}.$$

Transition probabilities are found by using the same equalities as in the previous problem, with only a few changes: the parameter s is substituted by M, the values Ugr(l) are given in the problem text, and

$$U_X(i) = (2i-M-1)U = \frac{\sqrt{3}\,(2i-M-1)}{\sqrt{M^2-1}}, \quad i = 1, 2, \ldots, M.$$

The frequency band is B = 100 [kHz], and the maximum signaling rate is always vs,max = 2B = 200 [ksymb/s] for the M-ary symbols. The equivalent binary rate is vb,max = 2fc·ld(M). In this case, for σ² = N0 fc = 0.25 [W], the transmitted information for some values of M is given in Table 4.4. The normalized capacity of the corresponding continuous channel is C = 1.1610 [Sh/symb].
The dependence of the transmitted information on the signal-to-noise ratio in the channel is shown in Fig. 4.26. When the number of discrete symbols at the input and at the output of the channel increases, the discrete channel capacity should converge to the continuous channel capacity, which does not seem to be completely the case here. The basic limitation of the above analysis is that it is restricted to equiprobable input symbols only.
It should be noticed that in this case the channel is not completely symmetric, which can be easily seen from the graph or from the transition matrix. Although all parameters corresponding to the inputs x1 and x4 (combinations ‘00’ and ‘11’ from the binary source) are identical, the channel parameters corresponding to the inputs x2 and x3 are different. This implies that it is optimal to choose the probabilities

Table 4.4 Transmitted information for some values of M (equiprobable input symbols)

M                    2       4       8       16      32      64      1024
I(X,Y) [Sh/symb]     0.8434  0.9668  1.0691  1.1025  1.1123  1.1153  1.1170

$$P(x_1) = P(x_4), \qquad P(x_2) = P(x_3) = 1/2 - P(x_1).$$

Each of these probabilities should be in the range from 0 to 0.5, but from Fig. 4.27 it is obvious that the optimal value of the probability P(x1) = P(x4) depends on the noise power and is greater when the noise power is

[Fig. 4.26 The dependence of transmitted information on the signal-to-noise ratio: continuous channel capacity and I(X,Y) of the discrete channel for M = 16, 8, 4, 2.]

[Fig. 4.27 Optimization of parameters P(00) = P(11) for signaling with M = 4 levels: transmitted information I(X,Y) versus the probability of symbols 00 and 11, for σ = 0.1, 0.3, 0.5.]

greater. This is logical: for a strong noise the more distant levels should be used more often to reduce the noise influence.
Therefore, for σ = 0.5 [V] (SNR = 6 [dB]), by optimizing the input probabilities the transmitted information can be increased from 0.9668 [Sh/symb] for equiprobable symbols to 1.0513 [Sh/symb] for the optimum input probabilities P(x1) = P(x4) = 0.77, P(x2) = P(x3) = 0.23.
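The optimization illustrated in Fig. 4.27 can be sketched in Python as follows; the Gaussian-quantizer model, the level and threshold expressions and σ = 0.5 are the assumptions described above, and the printed optimum is only as accurate as this model:

```python
import numpy as np
from math import erf

def transition_matrix(M, sigma):
    """P(y_j | x_i) for M unit-power polar levels with thresholds (2l - M)U."""
    U = np.sqrt(3.0 / (M**2 - 1))
    levels = np.array([(2 * i - M - 1) * U for i in range(1, M + 1)])
    edges = np.concatenate(([-np.inf], (2 * np.arange(1, M) - M) * U, [np.inf]))
    Phi = np.vectorize(lambda x: 0.5 * (1.0 + erf(x / np.sqrt(2.0))))
    P = np.empty((M, M))
    for i, u in enumerate(levels):
        P[i] = Phi((edges[1:] - u) / sigma) - Phi((edges[:-1] - u) / sigma)
    return P

def mutual_information(px, P):
    py = px @ P
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = px[:, None] * P * np.log2(P / py[None, :])
    return float(np.nansum(terms))

P = transition_matrix(4, sigma=0.5)
qs = np.linspace(0.005, 0.495, 99)
I = [mutual_information(np.array([q, 0.5 - q, 0.5 - q, q]), P) for q in qs]
print(qs[int(np.argmax(I))], max(I))   # optimizing P(x1) = P(x4) and the resulting I(X,Y)
```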
Of course, an additional gain is possible if, for a fixed number of signaling levels at the channel input, an additional quantization is performed at the receiving end in such a way that the number of symbols at the output is greater than at the input, i.e. r = M, s > M, which corresponds to multilevel signaling with a soft decision.

Problem 4.12 In one data transmission system zero-memory binary source sends
bits with the rate vb = 100 [kb/s] to the BPSK modulator. Modulated signal with
the average power Ps = 5 [µW] is transmitted over the channel modeled by an ideal
pass-band filter. The average power density spectrum of the additive white
Gaussian noise in the channel is N0 = 2 × 10⁻¹¹ [W/Hz]. The bandwidth is chosen
so as to eliminate the intersymbol interference obtaining in the same time the
minimum noise power at the channel output.
Beside the Gaussian noise, in the channel there is the impulsive noise which has
the average pulse duration TL = 50 [µs] and the average interval between two
pulses is TD = 500 [µs]. The impulsive noise has also the Gaussian distribution and
the average power spectral density pi = 10⁻⁹ [W/Hz], being constant as well in the
frequency band of interest. The BPSK signal receiver is using integrate and dump
circuit.
(a) Find the signal-to-noise ratio in the channel in intervals without the impulsive
noise, as well as when it is active.
(b) Find the error probability in intervals without the impulsive noise as well as
when it is active. By which discrete model the corresponding channel error
sequence can be modeled?
(c) Find the error probability in the good and in the bad state (impulsive noise), as
well as the average number of bits transmitted in every state. Find the channel
average error probability.
(d) Draw the autocorrelation function of the corresponding error sequence.

Solution
The channel is modeled as an ideal pass-band filter. The bandwidth is chosen so as to eliminate the intersymbol interference, and for such a transmission the rate equals the bandwidth divided by an integer n, being

$$v_{b,\mathrm{no\,ISI}} = B/n.$$

Minimum noise power at channel output is obtained for n = 1 yielding


[Fig. 4.28 The form of channel noise (noise level versus time in ms).]

$$B_{\min,\mathrm{no\,ISI}} = v_b = 100\ [\mathrm{kHz}].$$

(a) Let the “good” state correspond to the intervals without the impulsive noise in the channel. In this case the noise power and the average signal-to-noise ratio are

$$\sigma_n^2 = p_n B_{\min,\mathrm{no\,ISI}} = 2\ [\mu\mathrm{W}], \qquad SNR_D = \frac{P_s}{\sigma_n^2} = 2.5.$$

The “bad” state corresponds to the intervals when both noises are present, and
the impulsive noise power, total noise power and the average signal-to-noise ratio
are

$$\sigma_i^2 = p_i B_{\min,\mathrm{no\,ISI}} = 100\ [\mu\mathrm{W}], \quad \sigma_{tot}^2 = \sigma_n^2 + \sigma_i^2 = 102\ [\mu\mathrm{W}], \quad SNR_L = \frac{P_s}{\sigma_{tot}^2} = 0.0490.$$

The time form of the channel noise signal is illustrated in Fig. 4.28.
(b) The receiver is realized as an integrate and dump circuit, and the probability of error is

$$P_e = \frac{1}{2}\,\mathrm{erfc}\sqrt{\frac{E_b}{N_0}}.$$

From the relation


[Fig. 4.29 Gilbert-Elliott channel model: two states (good and bad) with the corresponding self-transition and mutual transition probabilities.]

$$\frac{E_b}{N_0} = \frac{P_s/v_b}{P_n/B} = SNR\cdot\frac{B}{v_b},$$

and for vb = B, in this case, the error probabilities in good and bad states are

$$p_D = \frac{1}{2}\,\mathrm{erfc}\sqrt{SNR_D} = 0.0127, \qquad p_L = \frac{1}{2}\,\mathrm{erfc}\sqrt{SNR_L} = 0.3771.$$

This channel can be successfully modeled using the Gilbert-Elliott model [20,
21] where it is supposed that every state can be considered as a binary sym-
metric channel, the corresponding matrices are
   
$$\mathbf{P}_G = \begin{bmatrix} v_G & p_G \\ p_G & v_G \end{bmatrix}\ (v_G + p_G = 1) \quad \text{and} \quad \mathbf{P}_B = \begin{bmatrix} v_B & p_B \\ p_B & v_B \end{bmatrix}\ (v_B + p_B = 1),$$

and the transition from one state into another can be illustrated graphically, as
shown in Fig. 4.29.
(c) The stationary probabilities of the good and the bad state in the Gilbert-Elliott model are fully determined by the transition probabilities from the good into the bad state (P_GB) and vice versa (P_BG), being

$$\pi_G = \frac{P_{BG}}{P_{GB}+P_{BG}}, \qquad \pi_B = \frac{P_{GB}}{P_{GB}+P_{BG}},$$

and the average number of bits emitted in the states is

$$N_G = 1/P_{GB}, \qquad N_B = 1/P_{BG}.$$

On the basis of the average durations of the time intervals in the good and the bad state (TG = 500 [µs], TB = 50 [µs]), with the signaling interval duration Tb = 1/vb = 10 [µs], a model should be found where in the good state the average number of emitted bits is NG = TG/Tb = 50, and for every sojourn in the bad state the average number of emitted bits is NB = TB/Tb = 5. It can be easily found
[Fig. 4.30 Normalized error sequence autocorrelation function corresponding to the Gilbert-Elliott model, versus discrete shift, for (NG = 50, NB = 5), (NG = 50, NB = 50) and (NG = 5, NB = 50).]

$$P_{BG} = 1/N_B = 0.2, \quad P_{BB} = 1 - P_{BG} = 0.8, \quad P_{GB} = 1/N_G = 0.02, \quad P_{GG} = 1 - P_{GB} = 0.98,$$

yielding

$$\pi_G = 0.9091, \qquad \pi_B = 0.0909.$$

The channel average error probability is

$$P_e = \pi_G p_D + \pi_B p_L = \frac{p_D P_{BG}}{P_{GB}+P_{BG}} + \frac{p_L P_{GB}}{P_{GB}+P_{BG}} = 0.0458.$$

(d) Typical state sequence is

. . .GGGGGGGGGGGGBBBGGGGGGGGGGGGGBBBBBGGGGGGGGGGGGGGGG. . .

where G denotes that the channel in this signaling interval (corresponding to


one bit transmission) is in good state, while B corresponds to the bad state.
The corresponding error sequence (it could be obtained by adding modulo 2
sequences of transmitted and received bits) is

. . .00000100000010100000000000001010100000000000000. . .

where the bold and underlined bits correspond to the bad state.
The normalized error sequence autocorrelation function, obtained on the basis of

$$R_e(k) = \frac{1}{N}\sum_{n=1}^{N} e(n)\,e(n+k),$$

was estimated by Monte Carlo simulation and is shown in Fig. 4.30.
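A minimal Monte Carlo sketch of this estimation (assuming the Gilbert-Elliott parameters computed above; the sequence length and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
P_GB, P_BG = 0.02, 0.2
p_err = {0: 0.0127, 1: 0.3771}           # 0 = good (G), 1 = bad (B)

states = np.empty(N, dtype=int)
state = 0
for n in range(N):
    states[n] = state
    if rng.random() < (P_GB if state == 0 else P_BG):
        state ^= 1                        # change the channel state

errors = (rng.random(N) < np.where(states == 0, p_err[0], p_err[1])).astype(float)

def acf(e, kmax=100):
    return np.array([np.mean(e[:len(e) - k] * e[k:]) for k in range(kmax + 1)])

R = acf(errors)
print(R[0])              # close to the average error probability Pe = 0.0458
print(R[1:6] / R[0])     # normalized ACF values for small shifts
```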


The autocorrelation function value for zero shift corresponds to the average error probability Pe; it is substantially greater for NB > NG than for NB < NG. For this reason the normalization is carried out, to make the difference in the function shapes more noticeable. Of course, in the second chapter of this book it was shown that the autocorrelation function of a discrete sequence must be discrete, but here, to allow a clearer insight, only the envelope of the function is drawn.
It is interesting that in the limiting cases the Gilbert-Elliott model practically reduces to a binary symmetric channel (for N_B ≫ N_G the channel is almost always in the bad state, and for N_G ≫ N_B the channel is almost always in the good state). In these cases the following holds

$$R_e(k) \approx \begin{cases} p, & k = 0 \\ p^2, & k \neq 0 \end{cases}$$

where p = p_L for N_B ≫ N_G and p = p_D for N_G ≫ N_B.


For NG = NB, the shape of autocorrelation function clearly shows that the error
sequence is correlated, i.e. corresponding to the channel with memory.
Chapter 5
Block Codes

Brief Theoretical Overview

In the remaining part of this book error control codes will be considered. As said
earlier, these codes enable detection and possible correction of transmission errors.
Usually, it is supposed that at the encoder input there is a series of bits, statistically
independent and equally probable. It can be a result of previous data compression or
scrambling. To detect and correct the errors some redundancy should be added.
Depending on the way the redundancy is added, the error control codes are divided into two families: block codes and convolutional codes. Further subdivision is possible, but it will be postponed and commented on in the corresponding chapters. Block codes were invented before convolutional codes; the latter will be considered in the next chapter.
From the name it is obvious that block encoding consists of taking a block of
input bits (k bits) and representing it with the block of output bits (n bits). This code
is denoted as (n, k) code. In this case the code rate is R = k/n (as used earlier in this
book). Generally, block code can be defined by the corresponding table comprising
k-tuples of input (information) bits and the corresponding n-tuples of output
encoded bits (code words). This table will have 2k rows. For greater values of k it is
not practical and some rule should be used how to obtain the code word from
k input bits. Further, the corresponding decoding rule is needed—how to obtain
information bits from the received code word. In practice the systematic codes are especially interesting, where the block comprising k information bits is not changed in the code word, and n − k control bits are added, forming a code word of length n. In the following it is primarily supposed that binary codes are considered, but many of the presented principles and conclusions can be used for forming nonbinary codes as well.
In this chapter primarily the so-called linear block codes will be considered. At the beginning, some elementary notions will be introduced. Repetition codes (Problem 5.1) were considered in the previous chapter as well (introductory part and


Problems 4.6 and 4.7). At the end of Sect. 4.6 an example is given where multifold repetition is considered (a BSC whose crossover probability is p < 0.5). Here, only the fivefold repetition will be analyzed, i.e. information bit 0 is encoded as (00000) and information bit 1 is encoded as (11111). Various decoding strategies can be used. There are 32 possible received words. According to one decision rule (all the rules are MAP ones; here majority logic is implemented), the five bits are decoded as one or zero according to the greater number of ones or zeros. Therefore, all single errors and double errors in a code word are detected and corrected. The feedback channel
is not needed. This procedure is Forward Error Control (FEC). According to the
next decision rule only (00000) and (11111) are decoded as 0 or 1. In this case all
single, double, triple and fourfold bit errors are detected. However, then the repe-
tition of this word is requested over the feedback channel. This procedure is
Automatic Repeat reQuest (ARQ). But, the third decision rule is possible as well.
Besides the combination (00000) (no errors!), all combinations having four 0s and one 1 are decoded as 0, and analogously the combination (11111) and the combinations having four 1s are decoded as 1. For the other combinations, having 2 or 3 zeros (i.e. 3 or 2 ones), the retransmission is requested. It means that single errors are corrected,
and all double and triple errors are detected. It is a hybrid procedure.
In fact, these rules are based on Hamming distance (d) (Problems 5.2 and 5.4).
For binary codes Hamming distance between two (binary) sequences of the same
length is defined as the number of places (bits) in which they differ. It has the
properties of metric. In the above example the Hamming distance between code
words is d = 5. A discrete five dimensional space can be conceived, having total of
32 points, two of which are code words. Their distance is 5.
According to the first decision rule only the received words identical to code words are decoded; in the other cases the retransmission is requested (ARQ). It is illustrated in Fig. 5.1a, where the code words are denoted by fat points. According to the second decision rule the received word is decoded as the nearer code word in the Hamming sense (FEC), as in Fig. 5.1b. For the third decision rule (hybrid procedure), the
space is divided into three subspaces. The points in the subspaces around code
words are at the distance 1 from code words and single errors are corrected. In the
third subspace (in the middle) are the points having Hamming distance 2 or 3 from
code words and double and triple errors are detected (Fig. 5.1c). Decoder which
every received word decodes as some code word (final decision) is called complete
decoder.


Fig. 5.1 Detection of errors without correction (a), with correction of all (up to 4) errors (b) and
with partial (single error) correction (c)

It is obvious that Hamming distance of two sequences equals the number of


errors turning one sequence into the other. Therefore, it is easy to find the direct
connection of Hamming distance and the numbers of detectable and correctable
errors of some code. For d = 1, a single error transforms one code word into another and there is no error detection at all. To detect all single errors it must be d ≥ 2, and to detect q_d errors the following must hold

$$d \ge q_d + 1.$$

Of course, for a code with more (than two) code words, the worst case should be
considered, i.e. minimum Hamming distance in the code should be found. To
correct qc errors, the minimum Hamming distance should be

$$q_c < d/2,$$

i.e. (qc and d are integers)

$$d \ge 2q_c + 1.$$

A step further, to detect q_d errors and to correct q_c of them (q_d ≥ q_c), the following must hold:

$$d \ge q_d + q_c + 1.$$

It is a general formula (the previous expressions are obtained for q_c = 0, i.e. for q_d = q_c). It should be mentioned that if e errors occur with q_c < e ≤ q_d, these errors will be detected, but not corrected. For e ≤ q_c the errors will be detected and corrected. Therefore, the same code can be used for FEC, ARQ or the hybrid procedure.
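As a small illustration of these relations (not part of the original text), the following Python sketch computes the minimum Hamming distance of a code given as a list of words, and the detection/correction capability it implies; the fivefold repetition code is used as an example:

```python
from itertools import combinations

def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))

def d_min(code):
    return min(hamming_distance(a, b) for a, b in combinations(code, 2))

code = ["00000", "11111"]            # fivefold repetition code
d = d_min(code)
print(d, "detect up to", d - 1, "correct up to", (d - 1) // 2)
```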
In the case of an erasure channel (BEC) where the bits “under suspicion” are
erased, if qe bits are erased, the Hamming distance should be

$$d \ge q_e + 1,$$

to allow for distinguishing from the nearest code word. Also, to correct qc errors—
the possibility not explicitly considered in BEC definition—and to erase qe bits the
following must hold

$$d \ge 2q_c + q_e + 1.$$

Codes using single parity check are probably the oldest error control codes.
Simply, at the end of k-bits information word a single parity check is added gen-
erating (n = k + 1, k) block code. The total number of ones is even. Hamming
distance is d = 2 and any odd number of errors can be detected. The code rate is
R = (n – 1)/n.

However, information bits can be considered in more than one dimension. If


they are ordered in two dimensions, as a matrix and parity checks can be calculated
for rows and for columns

$$\begin{array}{cccc} b_{11} & \cdots & b_{1,k_2} & x \\ \vdots & \ddots & \vdots & \vdots \\ b_{k_1,1} & \cdots & b_{k_1,k_2} & x \\ x & \cdots & x & x \end{array}$$

where the positions of parity bits are denoted with “x”. If the block consists of k1k2
information bits the obtained block (“codeword”) will have n = (k1 + 1)(k2 + 1)
bits (k1 + k2 + 1 control bits). Code rate is

$$R = \frac{k_1 k_2}{(k_1+1)(k_2+1)}.$$

The maximal code rate value is obtained for k1 = k2 = k, while n = (k + 1)², and the code rate is

$$R = \frac{k^2}{(k+1)^2}.$$

Hamming in 1950 proposed a class of single-error correcting codes (Hamming


codes) (Problems 5.3, 5.4, 5.5 and 5.6). The code parameters are n = 2^b − 1, k = n − ld(n + 1) = n − b. These codes are described and analyzed in detail in the
mentioned problems. The result of parity check—syndrome (“checking number”)—
shows, in binary notation, the position of the error. Therefore, this code can correct
all single errors, but not any combination of two errors in the code word.
A code that, for some e, can correct all combinations of e errors or less and no
others is called a perfect code (Problem 5.3). Besides the Hamming codes, it is
known that the Golay code (23, 12) is a perfect one, it can correct all single, double
or triple errors. Perfect codes satisfy with equality the Hamming bound (Problems
5.5 and 5.8) for a (n, k) code having minimum Hamming distance dmin = 2ec + 1
$$q^{n-k} \ge \sum_{t=0}^{e_c} \binom{n}{t}(q-1)^t$$

where q is the code base (q = 2 for binary code). There is also a Singleton bound
(Problem 5.8) for any code (n, k) having a minimum distance dmin

$$d_{\min} \le 1 + n - k.$$

Any code satisfying that bound with equality is called a maximum-distance separable (MDS) code (Problem 5.8). The Varshamov-Gilbert bound (Problem 5.11) gives the lower bound for the maximum possible number of code words for a given code base q, code word length n and minimum code Hamming distance d

$$A_q(n,d) \ge q^n \Big/ \sum_{j=0}^{d-1} \binom{n}{j}(q-1)^j.$$

Hamming code can be expanded by adding one general parity-check bit (the
code word length is now n = 2^b). The obtained code can detect practically all even
numbers of errors. A code can be shortened as well by omitting some information
symbols.
It is obvious that for every Hamming (n, k) code, where the syndrome has
n – k bits, all single errors can be corrected because

$$n = 2^{n-k} - 1.$$

The corresponding code word lengths are 2^j − 1 (j ≥ 3), i.e. codes (7, 4),
(15, 11), (31, 26), (63, 57) etc. By adding a general parity check, the codes (8, 4),
(16, 11), (32, 26), (64, 57) etc. are obtained, correcting all single errors and
detecting all double errors. It is very suitable having in view the byte data structure.
In a previous consideration it was mentioned that generally any block code can
be defined by the corresponding table comprising k-tuples of input (information)
bits and the corresponding n-tuples of output encoded bits (code words). This table
will have 2^k rows. For the corresponding code words there are 2^n possible “candidates”. Therefore, there are in total $\binom{2^n}{2^k}$ possible codes (of course, some of them are not good!). For the greater values of n and k it is not practical and some
rule should be used how to obtain the code word from k input bits. It means that
some simple rule should be defined. These rules (algorithms in fact) can be very
different, but there exists a special class of block codes having a simple mathe-
matical description. With the help of discrete mathematics (abstract algebra)
apparatus such class was found. In such a way it is possible to construct codes
having needed characteristics (Hamming distance, etc.). These codes are linear
block codes. This class of codes is defined by imposing a strong structural property
on the codes. This structure provides guidance in the finding the good codes and
helps to practical encoders and decoders realization. At the end of this chapter a
short overview of the corresponding part of abstract algebra is included.
The definition of linear block code is very simple. Consider finite field—Galois
field GF(q), i.e. field that has q symbols. In the field two operations are defined—
addition and multiplication (both are commutative). Sequences of n field elements
(vectors) form a vector space V of dimension n over the field. In this space vector
addition is defined where the corresponding vector elements are added according to
the rules from field. The set V is a commutative group under vector addition. Scalar
multiplication is defined as well where vectors (i.e. all their elements) are multiplied

by field elements (scalars). The product of two n-tuples, (a1, a2, …, an) and (b1, b2,
…, bn), defined as follows

$$(a_1, a_2, \ldots, a_n)\cdot(b_1, b_2, \ldots, b_n) = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n$$

is called the inner product or dot product. If the inner product equals zero (0 from the field), the corresponding vectors are said to be orthogonal. The definition of a linear code is as follows: a linear code is a subspace of a vector space over GF(q). In this
chapter mainly binary field GF(2) will be considered, but the theory is general, it
can be applied for linear codes with q different symbols, if these symbols can be
connected 1:1 with the symbols from GF(q). It means that linear block codes exist
if the number of elements in the code alphabet is a prime number or a power of a prime number: 2, 3, 4 (=2²), 5, 7, 8 (=2³), 9 (=3²), 11, 13, 16 (=2⁴) etc. It is very convenient that for any power of 2 there exists a linear block code. This
approach in describing codes is often called algebraic coding theory.
Of course, block codes can be formed for any number of symbols, but then they
are not linear. The vector spaces over the finite fields can be used to define code
words not as the elements of a subspace. Then, it is usually said that “nonlinear”
codes are considered.
Also, at the end of this chapter there are two problems not connected to linear
block codes. The arithmetic block codes (Problem 5.11) construction is based on
the arithmetic operation connecting decimal representations of information and
code word. These relations are usually very simple. For integer codes (Problem
5.12) the information and code word symbols take values from the set Zr = {0, 1, 2, …, q − 1} (integer ring), while all operations during the code word forming are modulo-q.
Previous “mathematical” definition of linear block code can be “translated” by
using previously introduced notions. The vector space has 2^n vectors (points), candidates for code words. From these candidates 2^k code words should be chosen. If
these words are chosen as a subspace of the considered space, a linear block code is
obtained. According to Lagrange’s theorem, the number of elements in the group
must be divisible by the number of elements in the subgroup. The vectors form a
group for addition, and for every n and every k there exists a linear code (n, k), because 2^n is divisible by 2^k (of course, q^n is divisible by q^k). The identity element 0 = (0, 0, …, 0) must be in the subgroup, and in the code there must be the code word (00…0). The subgroup is closed under addition and the sum of code words will be a code
word as well. The Hamming distance between two code words (vectors) equals the
Hamming weight (i.e. to the number of ones) of their sum. It further means that for
any two code vectors there exists code vector obtained by their summation and the
Hamming weights of code vectors are in the same time possible Hamming distances
in the code. Therefore, to find the minimum Hamming distance in the linear code,
the code word having the minimum Hamming weight should be found (of course,
not the all zeros code word!). To calculate the error probability, the weight dis-
tribution should be found called also the weight spectrum (Problems 5.2, 5.3, 5.8

and 5.9) of the code. It specifies the number of code words that have the same
Hamming weight.
Consider Hamming code (7, 4) (Hamming code is a linear code!). Code words
(2⁴ = 16) are
(0000000), (1101001), (0101010), (1000011), (1001100), (0100101), (1100110),
(0001111),
(1110000), (0011001), (1011010), (0110011), (0111100), (1010101), (0010110),
(1111111).
Corresponding weight spectrum is: 0(1) 3(7) 4(7) 7(1), shown in Fig. 5.2a. In
Fig. 5.2b weight spectrum of extended Hamming code (8, 4) obtained by adding
the parity check is shown.
A set of vectors is said to span a vector space if every vector equals at least one
linear combination of these vectors. The number of linearly independent vectors is
called a vector space dimension. Any set of linearly independent vectors spanning
the same vector space is called the basis. Therefore, there can be more bases of the
same vector space. Generally, the group can have the nontrivial subgroups.
Similarly, the vector space can have the subspaces. The subspace dimension is
smaller than the space dimension. For (n, k) code vector space dimension equals
n and code subspace dimension equals k. The subspace can have more bases as
well. The subspaces can be orthogonal (dot products of the vectors from bases equal
zero).
Consider an (n, k) code. Vectors of its basis can be ordered in a matrix form of dimensions k × n. It is a generator matrix (Problems 5.2, 5.3, 5.4 and 5.9) of the code. Its rows are linearly independent. The matrix rank is k. There are elementary operations over the rows which do not change the matrix rank. Therefore, in this case the code will be the same, only the basis would change. Further, by permuting the matrix columns, an equivalent code (Problems 5.2 and 5.9) would

[Fig. 5.2 Weight spectrum of Hamming code (7, 4) (a): 0(1), 3(7), 4(7), 7(1), and of Hamming code (8, 4) (b): 0(1), 4(14), 8(1); number of code words with given weight a(d) versus Hamming weight d.]

be obtained. This code will have the same weight spectrum as the previous one, i.e.
will have the same performances.
Consider a linear block code (5, 2). Code words are (00000), (11010), (10101) and (01111), i.e. there are 2² = 4 code words. Combining any two of the three nonzero code words forms a generator matrix

$$\mathbf{H}_1 = \begin{bmatrix} 1 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 \end{bmatrix}, \quad \mathbf{H}_2 = \begin{bmatrix} 1 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 & 1 \end{bmatrix}, \quad \mathbf{H}_3 = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 & 1 \end{bmatrix}.$$

Of course, any of them can be obtained from the other using elementary oper-
ations over the rows.
Such mathematical description provides very simple generation of code words.
Consider a linear code (n, k) that has generator matrix (dimensions k × n)

$$\mathbf{G} = \begin{bmatrix} g_{11} & g_{12} & \cdots & g_{1n} \\ g_{21} & g_{22} & \cdots & g_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ g_{k1} & g_{k2} & \cdots & g_{kn} \end{bmatrix}.$$

To obtain the code word for information bits vector i(i1, i2, …, ik), the following
multiplication is used

$$\mathbf{v} = \mathbf{i}\cdot\mathbf{G},$$

where the resulting code vector (word) v(v1, v2, …, vn) is the sum of G rows
corresponding to the places of ones in i. Therefore, a code word must be obtained.
Further, the rows of G are linearly independent and to any specific information
vector i corresponds code word v, different from other code words.
The total number of possible nonzero code words is equal to the number of rows of G, $\binom{k}{1}$, plus the number of words obtained by summing two rows of G, $\binom{k}{2}$, plus the number of words obtained by summing three rows of G, etc. This number is $2^k - 1$. To this total should be added the all-zeros code vector, corresponding to the all-zeros information vector, i.e.

$$\binom{k}{0} + \binom{k}{1} + \cdots + \binom{k}{k} = 2^k.$$

It is just the number of vectors in a code subspace.
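As an illustrative sketch (not part of the original text), the encoding rule v = i · G over GF(2) can be exercised for the (5, 2) code above; the matrix named G below is the matrix denoted H1 in that example:

```python
import numpy as np

G = np.array([[1, 1, 0, 1, 0],
              [1, 0, 1, 0, 1]])
k = G.shape[0]
for m in range(2**k):
    i = np.array([(m >> (k - 1 - j)) & 1 for j in range(k)])   # information word
    v = i @ G % 2                                              # code word over GF(2)
    print(i, "->", v)
```

Running it lists exactly the four code words (00000), (11010), (10101), (01111).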


Next step is to obtain a systematic code. This problem is solved easily. By
elementary operations over the rows (code does not change!) and by permuting the
columns, a generator matrix of an equivalent code can be obtained

$$\mathbf{G}_s = \begin{bmatrix} 1 & 0 & \cdots & 0 & p_{11} & \cdots & p_{1,n-k} \\ 0 & 1 & \cdots & 0 & p_{21} & \cdots & p_{2,n-k} \\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 & p_{k1} & \cdots & p_{k,n-k} \end{bmatrix} = [\,\mathbf{I}_k \;|\; \mathbf{P}\,],$$

where I_k is a unity matrix (dimensions k × k) and P is a matrix of dimensions k × (n − k). Multiplying the information vector i by the matrix G_s, the code vector v(i1, i2, …, ik, k1, k2, …, kn−k) is obtained. It is a systematic code, the information bits are not changed, while the last n − k bits are parity checks

$$k_j = i_1 p_{1j} \oplus i_2 p_{2j} \oplus \cdots \oplus i_k p_{kj},$$

i.e. kj is parity check taking into account the bits on the positions where P in jth
column has ones. The number of these checks is just n − k. In binary encoding these
are real parity checks, and in general case (nonbinary encoding) these are general-
ized parity checks. It is also possible to construct a product code (Problem 5.7). For a
systematic product code information bits are ordered in two (or more) dimensions.
Then, one error control code is applied for the rows and the other for the columns.
Therefore, the problem of encoding is solved. The encoding is very simple. It is
suffices to know the matrix G. The next question is what to do at the receiving end. How to verify whether the received vector (u) is a code vector (v)? The abstract algebra
gives here an elegant solution as well. As said earlier, if inner product of two
vectors equals zero (0 from the field), the vectors are orthogonal. Two subspaces
are orthogonal if all vectors from one subspace are orthogonal to all vectors of the
other subspace. It is shown that for every subspace there is a subspace orthogonal to
it. To test the orthogonality, it is sufficient to verify the orthogonality of the cor-
responding bases. If the bases are orthogonal, then all their corresponding linear
combinations are orthogonal. Further, if the dimension of vector space is n, and if
the dimension of one subspace is k, the dimension of corresponding orthogonal
subspace is n − k. Let the basis (generator matrix) of one subspace be G (dimensions k × n) and let H be the basis (generator matrix) of the orthogonal subspace (dimensions (n − k) × n). Taking into account that in matrix multiplication the element a_{ij} is obtained as a dot product of the ith row of the first matrix and the jth column of the second matrix, the following is obtained

$$\mathbf{G}\cdot\mathbf{H}^T = \mathbf{0}.$$

where (T) denotes transposition and the dimensions of the matrix 0 (all elements equal zero) are k × (n − k). Of course, by transposing this product one obtains

$$\mathbf{H}\cdot\mathbf{G}^T = \mathbf{0}^T.$$

It is obvious, because the subspaces are orthogonal to each other. Therefore,


instead of using generator matrix G, at the receiving end, from the above matrix

equation parity-check matrix H (rank is n − k) (Problems 5.3 and 5.4) should be


found, and then to verify whether the received vector is in the subspace orthogonal to the subspace generated by H. The solution of the above equation is very simple if the matrix G corresponds to a systematic code. In this case

$$\mathbf{H} = [\,-\mathbf{P}^T \;|\; \mathbf{I}_{n-k}\,],$$

where I_{n−k} is a unity matrix of dimensions (n − k) × (n − k), and the sign “–” denotes that
for every element of matrix P the corresponding inverse element in the field should
be taken.
In the case of a binary code, GF(2), each element (0 and 1) is its own inverse and the sign “minus” is not needed. It is obvious that the first k elements of every row of matrix H are the parity-check coefficients, while the “one” in the remaining part shows the position of the corresponding parity bit. Some authors use a systematic code matrix

$$\mathbf{G}_s = [\,\mathbf{P} \;|\; \mathbf{I}_k\,].$$

The corresponding parity-check matrix is


 
$$\mathbf{H}_s = [\,\mathbf{I}_{n-k} \;|\; -\mathbf{P}^T\,],$$

and the orthogonality remains.
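A short sketch illustrating this construction and the orthogonality check; the systematic (7, 4) generator matrix below is an assumed example, not the (non-systematic) one quoted later for the Hamming code:

```python
import numpy as np

def parity_check_from_systematic(G):
    """Build H = [P^T | I_{n-k}] from a systematic binary G = [I_k | P]."""
    k, n = G.shape
    P = G[:, k:]
    return np.hstack([P.T, np.eye(n - k, dtype=int)])

G = np.array([[1, 0, 0, 0, 1, 1, 0],      # an assumed systematic (7, 4) generator
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
H = parity_check_from_systematic(G)
print(G @ H.T % 2)                        # all-zero matrix confirms G H^T = 0 over GF(2)
```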


One more characteristic is here of interest. Because of mutual orthogonality, if
G generator matrix of a linear code (n, k) and H is a corresponding parity-check
matrix, then H is a generator matrix of (n, n − k) linear code and G is its
parity-check matrix. These codes are called dual codes (Problem 5.9). The
MacWilliams identities relate the weight distribution of some code to the weight
distribution of its dual code. They are useful if the dual code is small enough for its
weight distribution to be found by computer search.
On the basis of the orthogonality of the generator and parity-check matrices it can be shown that a block code can have minimum Hamming distance w if (and only if) every set of w − 1 or fewer columns of the matrix H is linearly independent. In other words, to obtain the code having minimum Hamming distance w, at least w − 1 linearly independent parity checks must exist.
Consider dual code for the above (5, 2) code. It is a code (5, 3). From H1 the
corresponding generator matrix is easily obtained
$$\mathbf{G}_1 = \begin{bmatrix} 1 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 \end{bmatrix}.$$

From 2⁵ = 32 vectors in the space, a set of 2³ = 8 vectors forms the (code) subspace

ð00000Þ; ð10011Þ; ð01010Þ; ð11001Þ; ð00101Þ; ð10110Þ; ð01111Þ; ð11100Þ:



The other generator matrices are


$$\mathbf{G}_2 = \begin{bmatrix} 1 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 \end{bmatrix}, \qquad \mathbf{G}_3 = \begin{bmatrix} 1 & 0 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 & 0 \end{bmatrix}.$$

For the earlier mentioned Hamming code (7, 4) generator and parity-check
matrices are
$$\mathbf{G} = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 & 0 & 1 \end{bmatrix}, \qquad \mathbf{H} = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{bmatrix}.$$

For decoding the received word u, the following vector should be found

$$\mathbf{S} = \mathbf{u}\cdot\mathbf{H}^T.$$

It is called the syndrome, its dimensions are 1 × (n − k) and its elements are parity


checks. It is easy to conclude taking into account previous consideration about
finding matrix H, when matrix G is given in a systematic form. When the errors are
only detected (ARQ procedure), nonzero elements (one or more) of syndrome mean
that the error (or errors) occurred. If the errors are corrected (FEC procedure) from
the syndrome the corresponding error position(s) should be found. Let code vector
v is transmitted and let vector u = v + e is received, where vector e—is an error
vector, mentioned earlier having “ones” at the error positions. The following
syndrome is obtained

$$\mathbf{S} = \mathbf{u}\cdot\mathbf{H}^T = (\mathbf{v}+\mathbf{e})\cdot\mathbf{H}^T = \mathbf{v}\cdot\mathbf{H}^T + \mathbf{e}\cdot\mathbf{H}^T = \mathbf{0} + \mathbf{e}\cdot\mathbf{H}^T = \mathbf{e}\cdot\mathbf{H}^T.$$

Now, the vector e should be found. It is obvious that if several error vectors have the same syndrome, only one of them can be taken into account. Since a smaller number of errors is more probable, the error vector with the smallest number of ones should be chosen.
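A small sketch of the syndrome test, using the Hamming (7, 4) parity-check matrix H quoted above and one of its code words:

```python
import numpy as np

H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
v = np.array([1, 1, 1, 0, 0, 0, 0])       # a code word (first row of G above)
e = np.zeros(7, dtype=int); e[2] = 1      # a single error in the third position
u = (v + e) % 2                           # received word

print(v @ H.T % 2)                        # [0 0 0]: v passes all parity checks
print(u @ H.T % 2)                        # nonzero syndrome exposes the error
```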
Still further insight into error detection and correction can be obtained using
standard array. Linear code (n, k) is a subspace of a vector space dimension n. It is
in the same time a subgroup of the additive group. Each group can be partitioned
with respect to the subgroup. Here the vector space is partitioned with respect to the
code subspace. The procedure is described and commented in details in Problem
5.2. Here, a short overview of the procedure will be given. For any linear block
code standard array has 2n−k rows and 2k columns. Every row is called coset, code
words c1, c2, …, c2k are in the first row (usually starting from the all zeros code
word) and the first elements in every row are called coset leaders. The element in
the ith column and the jth row is obtained by adding ci and jth coset leader, denoted
by ej. In fact, it is assumed that the code word ci was transmitted, and that the error

vector ej is added to the code word. Syndrome corresponding to this received word
is uniquely determined by jth coset leader

$$\mathbf{S} = \mathbf{v}\cdot\mathbf{H}^T = (\mathbf{c}_i \oplus \mathbf{e}_j)\cdot\mathbf{H}^T = \mathbf{e}_j\cdot\mathbf{H}^T,$$

because the parity-check matrix does not depend on the received word and it is
known at the receiving end. The error vectors corresponding to coset leaders are
uniquely determined by the syndrome and can be corrected. Therefore, 2n−k dif-
ferent error patterns (vectors) can be corrected.
As an example consider code (5, 2) with code words (00000), (11010), (10101),
(01111). Minimal Hamming distance equals 3 and a code should correct all single
errors. It is obvious from standard array
S
00000 11010 10101 01111 000
00001 11011 10100 01110 101
00010 11000 10111 01101 110
00100 11110 10001 01011 001
01000 10010 11101 00111 010
10000 01010 00101 11111 100
--------------------------------
00011 11001 10110 01100 011
00110 11100 10011 01001 111
For the coset leaders of the two last rows some error vectors having two errors are chosen (from four possible candidates).
An interesting class of linear codes over GF(2) are Reed-Muller codes (Problem
5.10). They can be easily described and easily decoded by simple majority logic as well. Here the code word at the receiver is found iteratively, without using a
syndrome. A notion of orthogonality is introduced (different from orthogonality of
vectors). The set of check sums (parity checks) is orthogonal to a particular error
bit, if this bit is involved in every sum, and no other error bit is checked by more
than one sum.
The considered codes are constructed for error control in channels where single errors are dominant (memoryless channels). However, sometimes the errors occur in packets (“bursts”). By using interleaving (Problem 5.6) at the transmitter and deinterleaving at the receiver, an error control code for random errors can successfully combat packet errors.

Problems

Problem 5.1 Explain a block code construction where bits are repeated n times.
(a) Explain the decoding procedure when the decisions are made using majority
logic.
(b) If it is possible to form some other decision rule for this code, explain its
advantages and drawbacks with respect to the previously described rule.

(c) Find the residual error probability after decoding and draw it for n = 5, n = 7,
n = 9 and n = 11, for two different decision rules. Find the code rate for all
cases considered.
(d) Comment the quality of this code comparing it to the limiting case given by
Second Shannon theorem.

Solution
The construction of a repetition code is quite simple and it is based on the fact that a
binary zero at the encoder input is represented by a series of n zeros at its output,
while to the binary one correspond a series of n successive ones. It is obvious that at
the encoder output only two code words can appear, while at the receiver input any
bit combination can appear (but not all having the same probability). The decoder,
on the basis of the received code word, has to decide which information bit is
transmitted.
(a) To attain this goal, usually the majority decision rule is used, where it is
decided that at the encoder input was binary zero if in the received word there
are more zeros than ones, and vice versa. To avoid the case when the received
word has an equal number of binary zeros and ones, it is usually taken that a
code word length, denoted by n, is an odd number. In this case all received n-
bits words are divided into two groups, as illustrated in Fig. 5.3 for n = 3.
Using this approach the decision error appears if there are errors on at least
(n + 1)/2 positions. The probability of the residual error for the user just
corresponds to the probability of such an event being

$$P_{e,rez1} = \sum_{t=(n+1)/2}^{n} \binom{n}{t} p^t (1-p)^{n-t},$$

where p denotes the crossover probability in the binary symmetric channel.


(b) The second decision rule is simpler—when the decoder receives one of two
possible code words the decoding procedure is performed. When the received
word is not a codeword, correction is not carried out, but the corresponding
retransmission is requested. While the first decoding method, based on
majority decision, is a typical representative of FEC (Forward Error
Correction) techniques, the second rule is completely based on the code word
validity and on the use of ARQ (Automatic Repeat reQuest) procedure. In this

[Fig. 5.3 Encoder and decoder procedures when threefold repetition is applied: repetition code (n = 3), BSC and majority decision rule.]

case the error occurs only if all bits in a code word are hit with the channel
errors, and a corresponding probability is

$$P_{e,rez2} = p^n.$$

(c) Using the previously derived expressions, it is possible to calculate the dependence of the residual error, seen by the user, on the crossover probability in the channel. By direct use of these expressions the dependence of the residual error after decoding, Pe,res, on the BSC crossover probability was calculated and is shown in Fig. 5.4, for both decision rules.
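The two expressions can be evaluated directly; a short sketch (illustration only, the values of p are chosen arbitrarily):

```python
from math import comb

def p_res_majority(n, p):
    """Residual error for the majority (FEC) rule: (n+1)/2 or more bit errors."""
    return sum(comb(n, t) * p**t * (1 - p)**(n - t) for t in range((n + 1) // 2, n + 1))

def p_res_arq(n, p):
    """Residual error for the code-word-validity (ARQ) rule: all n bits in error."""
    return p**n

for p in (1e-1, 1e-2, 1e-3):
    print(p, p_res_majority(5, p), p_res_arq(5, p))
```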
In the figure two effects can be noticed:
(1) For the same value of the crossover probability, the residual error is smaller as
the number of repetitions is greater (also, as expected, Pe,res is greater if p is
greater). On the other hand, the information rate for n repetitions is smaller
n times than the channel binary rate vi = vb/n, and generally vi = vb  R,
where R is the code rate (in this case R = 1/n). It is obvious that for a
repetition code, the increased transmission reliability was paid by the reducing
the information rate.
(2) The residual error probability does not depend only on the code construction
and the code parameters values. In this case, for an n-repetition code and the
constant code rate, it depends on the decoding procedure as well. E.g. for the
same n = 5 and R = 1/5 repetition code the various results are obtained for
two different decision rules. On the other hand, the second decision rule

[Fig. 5.4 Residual error probability for the repetition code (n = 5, 7, 9, 11) and the two decision rules, versus the crossover probability in BSC.]

achieves the better performances, but it implies the feedback channel and
ARQ procedure. Because of that, to achieve the small residual error with the
moderate decoder complexity is a very complicated task.

(d) In Problem 4.7 it was shown that the transmission reliability can be increased
by reducing the channel information rate. It is interesting to check whether
the results obtained here are in concordance with the Second Shannon theo-
rem, formulated in 1948 [10], where it was proved that the reliable trans-
mission can be provided if the code rate (R) is smaller than the transmitted
information—I(X, Y)—for a given channel.
For BSC, when the input symbol probabilities are equal, the transmitted infor-
mation depends only on the average transmission error (crossover) probability (p)
 
$$I_{\max}(p) = 1 - (1-p)\,\mathrm{ld}\frac{1}{1-p} - p\,\mathrm{ld}\frac{1}{p}.$$

The function I(p) is shown in Fig. 5.5, where a logarithmic scale is used for the crossover probability (for p < 0.5). The same figure shows the values of the parameter p for which n-fold repetition provides the residual error probability Pe,res = 10⁻⁹ (for the second decision rule), and the corresponding code rate is marked on the y-axis.
It should be noted that such a comparison is not quite correct: I(p) is the upper limit of the code rate for the case when the residual error is arbitrarily small, while,

[Fig. 5.5 Illustration of the Second Shannon theorem for BSC: transmitted information versus crossover probability p, the region where reliable transmission is possible (R < I(p)), and the repetition code operating points for n = 3, 5, 7, 9, 11, 13, 15.]

for the repetition code the values are given for the case when Pe,res is sufficiently small for practical purposes (for a majority of communication systems Pe,res < 10⁻⁹). However, some interesting effects can be noticed:
(1) For a chosen decision rule, the same value of the residual error is achieved for greater crossover probability values as the code rate is smaller.
(2) In all considered cases R ≪ I(p), and it is possible to construct a code having a substantially greater code rate, which will provide the same (and even better) transmission quality as the n-repetition code, for the same crossover probability.
(3) For a very great number of repetitions, the code performance approaches the limit given by the Second Shannon theorem (as R/I(p) → 1), but then R and I(p) approach zero.

Problem 5.2 Block code is defined by Table 5.1.


(a) Find the code rate. Is it a linear code? If yes, find its generator matrix.
(b) Write all code words and draw the corresponding weight (distribution) spec-
trum. Is it possible from it to find the code capability to correct the errors?
(c) Find a standard array of a code.
(d) Is the generator matrix of the code unique?
(e) Form some code equivalent to the original code.
(f) By changing one code word form a nonlinear block code.

Solution
(a) The three-bits information words are represented (according to Table 5.1) by
five bits code words yielding the code rate R = 3/5.
By definition, a linear block code is a vector subspace of a vector space. For a
binary code (n, k) the vector space consists of all n-bit combinations (2^n of them), while the vector subspace is formed of 2^k sequences of length n, satisfying the following
(1) Vector subspace has a neutral (identity) element for addition
(2) Vector subspace is closed for the modulo-2 addition.

Table 5.1 Information words and the corresponding code words of the block code

Information word    Code word
0 0 0               0 0 0 0 0
0 0 1               1 1 0 1 1
0 1 0               1 0 1 1 0
0 1 1               0 1 1 0 1
1 0 0               0 1 0 0 1
1 0 1               1 0 0 1 0
1 1 0               1 1 1 1 1
1 1 1               0 0 1 0 0

To check the code linearity these conditions should be verified. In the code there
is a code word consisting of all zeros and the first condition is satisfied. By adding
any pair of the code words some code word is obtained (one of the bit combinations
in Table 5.1) and code is a linear one.
From Table 5.1 it can be easily found that the code word corresponding to
information word 110 can be obtained by adding the code words corresponding to
information words 100 and 010. Now, the general rule can be found, from which
follows that ith row of a generator matrix should correspond to the code word for
an information word having one at the ith position only, and a generator matrix is
$$\mathbf{G} = \begin{bmatrix} 1 & 1 & 0 & 1 & 1 \\ 1 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 \end{bmatrix}.$$

All code words can be obtained multiplying all information words by a generator
matrix

$$\mathbf{c} = \mathbf{i} \otimes \mathbf{G},$$

where i is an arbitrary information word (generally, any combination of k bits, here


k = 3) and the operator ⊗ denotes modulo-2 matrix multiplication. In this case the
above matrix relation is satisfied and eight information words are transformed into
code words forming the vector subspace, as shown in Fig. 5.6, and the code is a
linear one.
Linear block code can be described by a generator matrix, it completely defines
the transformation of 2k information words into 2k code words of the length
n. These code words form the linear block code. On the other hand, for any linear
code, the generation matrix G can be uniquely found.

[Fig. 5.6 Linear block code (5, 3) as a vector subspace: the eight code words form a subspace of the vector space of all 32 five-bit words.]

(b) Code words are given in Table 5.1 and shown in Fig. 5.6 as the elements of
vector subspace. Linear block code is a subspace and it is closed under
addition modulo-2 (bit-by-bit), and the sum of two code words is a code word
as well

$$(11011) \oplus (01101) = (10110).$$

Hamming distance between two code words ci and cj, denoted by dH(ci, cj), is
determined by a Hamming weight of the word ci ⊕ cj (i.e. by the number of ones
in it). It can be written formally as

$$d_H(\mathbf{c}_i, \mathbf{c}_j) = d_H(\mathbf{c}_i \oplus \mathbf{c}_j, \mathbf{0}) = w(\mathbf{c}_i \oplus \mathbf{c}_j),$$

where w(c_i) denotes the Hamming weight of the ith code word. The code word weight distribution completely determines the code distance spectrum. In this code there is one code word having the weight d = 1, two words with d = 2, two words with d = 3, one word with d = 4 and one word with d = 5. The corresponding weight distribution is shown in Fig. 5.7.
The minimum Hamming distance is d_min = 1, the relation d_min ≥ 2e_c + 1 is satisfied only for e_c = 0, and under the previous condition the inequality d_min ≥ e_c + e_d + 1 is satisfied only for e_d = 0. It is obvious that this code cannot correct, and cannot even detect, all single errors.
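A short sketch (not part of the original solution) that enumerates the 2^k code words from the generator matrix above and counts their Hamming weights, reproducing the spectrum of Fig. 5.7:

```python
import numpy as np
from collections import Counter

G = np.array([[1, 1, 0, 1, 1],
              [1, 0, 1, 1, 0],
              [0, 1, 0, 0, 1]])
k = G.shape[0]
weights = Counter()
for m in range(2**k):
    i = np.array([(m >> (k - 1 - j)) & 1 for j in range(k)])
    weights[int((i @ G % 2).sum())] += 1
print(sorted(weights.items()))   # [(0, 1), (1, 1), (2, 2), (3, 2), (4, 1), (5, 1)]
```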
(c) The standard array of any binary linear block code has 2^(n−k) rows and 2^k columns, as given in Table 5.2. Every row is one coset, the code words c_1, c_2, …, c_{2^k} are in the first row (usually starting from the all-zeros code word) and the first
elements in every row are coset leaders. The element in the ith column and the
jth row is obtained by adding ci and jth coset leader, denoted by ej. Under

[Fig. 5.7 The code (5, 3) weight distribution: number of code words with given weight a_d versus Hamming weight d.]

Table 5.2 General clishé for obtaining the standard array of a linear block code

c_1 = (0, 0, …, 0)   c_2                 …   c_i                 …   c_{2^k}
e_2                  c_2 ⊕ e_2           …   c_i ⊕ e_2           …   c_{2^k} ⊕ e_2
…                    …                   …   …                   …   …
e_{2^(n−k)}          c_2 ⊕ e_{2^(n−k)}   …   c_i ⊕ e_{2^(n−k)}   …   c_{2^k} ⊕ e_{2^(n−k)}

Table 5.3 Standard array of a code (5, 3)


00000 11011 10110 01001 10010 11111 01101 00100
10000 01011 00110 11001 00010 01111 11101 10100
00010 11001 10100 01011 10000 11101 01111 00110
00001 11010 10111 01000 10011 11110 01100 00101

assumption that the code word ci was transmitted, it is obvious that the ele-
ment of the jth row in the ith column corresponds to the received word if the
error vector is equal to the jth coset leader. It holds as well as for every
column and every row (any i, j combination). Syndrome corresponding to this
code word is uniquely determined by jth coset leader

$$\mathbf{S} = \mathbf{r}\cdot\mathbf{H}^T = (\mathbf{c}_i \oplus \mathbf{e}_j)\cdot\mathbf{H}^T = \mathbf{e}_j\cdot\mathbf{H}^T$$

because the parity-check matrix does not depend on the received word and it is
known at the receiving end.
The error vectors corresponding to coset leaders are uniquely determined by the
syndrome and can be corrected. In general case, it is obvious that 2n−k different
error patterns can be corrected. To minimize the error probability it is suitable that
the coset leaders are the error patterns which occur the most frequently. It is obvious
that for BSC (which has a relatively small crossover probability) single errors are
more probable than double ones and because of that for coset leaders the combi-
nations with the smallest number of ones should be chosen.
Standard array of the (5, 3) code has 8 columns, 4 rows, as given in Table 5.3.
Code words ci are in bold, while the coset leaders are in italic and underlined. In this
example, every element in ith column differs from ci by at most one bit. E.g. second
element in the third column equals the sum of c3 = (10110) and the second coset
leader e2 = (10000). This code can correct the error patterns (10000), (00010) and
(00001). Of course, the trivial error pattern (00000) can be “corrected” as well. On
the other hand, the error pattern (00100) cannot be corrected and the code cannot
correct all single bit error patterns. This pattern is in fact a code word and this error
cannot be even detected, confirming the above conclusion that for this code
ec = ed = 0 holds. It is recommended to the reader to verify whether the vector
(01000) can be a coset leader as well as whether the code, in this case, could detect
and correct the errors.

Let the ith column in the standard array is denoted by Di. The decoding by using
a standard array is simple—if word r was received from the column Di, the con-
clusion is that code word ci is transmitted. It is easy to verify that this way of
decoding is based on the maximum likelihood rule, because it is concluded that the
code word was transmitted to which the received one has a minimum Hamming
distance. The following conclusions can be drawn:
(1) It is possible to decode 2n−k various patterns because there is the same number
of different syndromes. For a code having minimum Hamming distance dmin
for coset leaders the error patterns with weights e_c ≤ (d_min − 1)/2 should be
chosen.
(2) If the received word is different from code words, the error can be detected.
Therefore, there are in total 2^n − 2^k detectable error patterns.

(d) The code generator matrix is not unique. In fact, it can be obtained if as its
rows any three independent code words (the third one cannot be the sum of the
other two) from the vector subspace shown in Fig. 5.6 are written. It is easy to
verify that this condition is satisfied as well as with the matrix

$$\mathbf{G}' = \begin{bmatrix} 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix}.$$

Of course, the transformation of information words into the code words is not
now defined by Table 5.1, but the set of code words will be the same. Therefore,
this generator matrix generates the same code. The following should be noticed as
well—although for the same code more generator matrices can be found, one
generator matrix defines one and only one code.
(e) An equivalent code is, by definition, a code having the same code distance spectrum, but with at least one of its code words different from the words of the originating code. This code can be formed if two (or more) columns change places. The following matrix is obtained from the previous generator matrix, where the third and the fourth columns have changed places.

$$\mathbf{G}'' = \begin{bmatrix} 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}.$$

The following set of code words corresponds to this generator matrix

ð00000Þ; ð00010Þ; ð11111Þ; ð11101Þ; ð10100Þ; ð10110Þ; ð01011Þ; ð01001Þ;



and it is obvious that the code distance spectrum is not changed comparing to the
previous case.
(f) The simplest way to form a nonlinear block code is to substitute the all-zeros code word by some bit combination not belonging to the code. If the combination (00000) is substituted with (00001), in the vector subspace (shown in Fig. 5.6) there will no longer be the identity element for addition. It is easy to verify that, in this case, it is not possible to define a generator matrix providing the transformation of the all-zeros information word into a nonzero binary combination (multiplying any matrix by a vector consisting of all zeros yields the all-zeros vector).
Of course, even if a code word “all zeros” exists in the code, it might not be
linear. If any code word is changed in such a way that its sum with some other code
word does not yield a code word, the closure property regarding the addition is not
fulfilled (the sum is a word not belonging to the vector subspace, of course, it
belongs to some other vector subspace). In the second variant shown in Table 5.4
the critical word is (01100). In this case, as well, it is not possible to define a
generator matrix, because this word cannot be obtained as a linear combination of
other three code words.
Problem 5.3 Explain the construction of the Hamming code having code word length n = 2^b − 1 (b is any integer greater than 1), and then
(a) Explain in details the construction of Hamming (7, 4) code, as well the
encoding and decoding process for a code word corresponding to information
sequence i = (1110), if during the transmission an error occurred in the third
position of the code word;
(b) Decode the received word, if the same information sequence as above was
transmitted for the cases without errors and when the errors occurred in the

Table 5.4 Transformation to obtain a nonlinear block code


Nonlinear block code, variant 1 Nonlinear block code, variant 2
Information word Code word Information word Code word
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 1
0 1 0 1 0 1 1 0 0 1 0 1 0 1 1 0
0 1 1 0 1 1 0 1 0 1 1 0 1 1 0 0
1 0 0 0 1 0 0 1 1 0 0 0 1 0 0 1
1 0 1 1 0 0 1 0 1 0 1 1 0 0 1 0
1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1
1 1 1 0 0 1 0 0 1 1 1 0 0 1 0 0
174 5 Block Codes

third and the fifth position of the code word, comment the corresponding
syndromes;
(c) Find the generator and the parity-check matrices for this code;
(d) Draw the distance spectrum of the code.

Solution
In this problem a class of codes proposed by Richard Hamming in 1950 [23] will be
described. Suppose that the aim is to construct a Hamming code when n = 2b − 1
bits. If the bit positions are numbered as l = 1, 2, …, n, then the parity-check bits
are in the code word c positions l = 2m (m = 1, 2, …, b), while information bits are
in the other places. It is obvious that the total number of parity-check bits in the
code word is just b = ld(n + 1), and the information word length at the encoder
input should be k = n − ld(n + 1). Some possible (n, k) combinations, according to
this expression, are given in Table 5.5.
Parity-check bit m, which is in the position l = 2m of the code word c, is obtained
by adding modulo-2 (XOR) of all information bits which are at the positions having
1 at mth position in their binary record.
After the superimposing the channel errors, the code word r is at the decoder
input. Here, the syndrome is formed consisting of b = n − k bits. The value of mth
syndrome bit is obtained by adding the bits of the received word which are at the
positions having 1 at mth position in their binary record. If during the transmission
only one of these bits was inverted (error!), the syndrome value, starting from the
bits with the higher ordinal number to the position having the smaller ordinal
number is a binary record of the error position. General block scheme of the
Hamming encoder and decoder are shown in Fig. 5.8.
(a) Hamming code (7, 4) construction can be simply described by a clishé given
in Table 5.6. The code word length is n = 7, the number of parity-check bits is
b = ld(7 + 1) = 3.
Number l, from the first column determines the ordinal bit number in the code
word. The position of the first 1 in the mth (m = 1, 2, 3) position of a binary record
of l, determines the position of mth parity-check bit in code word, this bit is denoted
by zm. The value of mth parity check bit is obtained by summing modulo-2 the

Table 5.5 Some possible b n k


(n, k) pairs
2 3 1
3 7 4
4 15 11
5 31 26
6 63 57
7 127 120
… … …
Problems 175

k
zm = ∑ il −⎡ld ( l )⎤ ,
⎢ ⎥
[ bin(l )]b −m =1, ld (l ) ≠ ⎡⎢ld (l ) ⎤⎥
l =1

(i1,i2,...,ik) Hamming (c1,c2,...,cn)


encoder ⎧ zm ,
⎪ l = 2m
cl = ⎨
⎪i , l ≠ 2m
⎩ l −⎡⎢ld ( l )⎥⎤
el
n
sm = ∑ rl , [ bin(l )]b −m =1
l =1 rl+el
(iˆ1 , iˆ2 ,..., iˆk ) (r1,r2,...,rn)
Hamming
decoder

Fig. 5.8 General block scheme of Hamming encoder and decoder

Table 5.6 Clishé for l bin (l) ci


constructing code words for
Hamming code (7, 4) 1 001 z1
2 010 z2
3 011 i1
4 100 z3
5 101 i2
6 110 i3
7 111 i4

information bits in the positions where the binary record of ordinal number of l in
position 3 − m has 1. For the obtaining the first parity-check bit the last column in a
binary record is used, and for the third one the first column is used, yielding

z1 ¼ c3  c5  c7 ¼ i1  i2  i4 ;
z2 ¼ c3  c6  c7 ¼ i1  i3  i4 ;
z3 ¼ c5  c6  c7 ¼ i2  i3  i4 :

The code word has a structure ðc1 c2 c3 c4 c5 c6 c7 Þ ¼ ðz1 z2 i1 z3 i2 i3 i4 Þ, and a


sequence at the receiver input is changed because of channel errors. Therefore, lth
code word bit is rl ¼ cl  el and the decoder, on the basis of received sequence,
calculates the parity-check sums, and the mth syndrome component is formed by
adding modulo-2 the bits at the positions where the binary record at position
b – m has 1.
Similarly to the parity-check bits calculation, for the first syndrome component
s1 the last column in binary record is used. However, in this case all bits in the
received word are added, taking into account and position of the first 1 in every
176 5 Block Codes

column of the clishé in Table 5.6. Denoting by el the lth bit in the error vector,
syndrome components can be written as

s1 ¼ r1  r3  r5  r7 ¼ ði1  i2  i4 Þ  e1  i1  e3  i2  e5  i3  e7 ¼ e1  e3  e5  e7 ;
s2 ¼ r2  r3  r6  r7 ¼ ði1  i3  i4 Þ  e2  i1  e3  i3  e6  i3  e7 ¼ e2  e3  e6  e7 ;
s3 ¼ r4  r5  r6  r7 ¼ ði2  i3  i4 Þ  e4  i2  e5  i2  e6  i3  e7 ¼ e4  e5  e6  e7 :

On the basis of the above relations, it is easy to prove that the corresponding
parity-checks (ordered by descending indexes) form the syndrome which deter-
mines the position of the code word bit where the error occurred, recorded in binary
system S ¼ ðs3 s2 s1 Þ. By complementing this bit, the error can be corrected.
When the transmitted sequence is i = (1110), information bits are i1 = 1, i2 = 1,
i3 = 1, i4 = 0 and the corresponding parity-checks are

z1 ¼ i1  i2  i4 ¼ 1  1  0 ¼ 0;
z2 ¼ i1  i3  i4 ¼ 1  1  0 ¼ 0;
z3 ¼ i2  i3  i4 ¼ 1  1  0 ¼ 0;

and at the encoder output is the code word c = (0010110). Due to the transmission
error at the fourth position, the received sequence is r = (0011110), the
parity-check sums have the values

s1 ¼ r1  r3  r5  r7 ¼ 0  1  1  0 ¼ 0;
s2 ¼ r2  r3  r6  r7 ¼ 0  1  1  0 ¼ 0;
s3 ¼ r4  r5  r6  r7 ¼ 1  1  1  0 ¼ 1:

It is obvious that syndrome is S ¼ ð100Þ ¼ 4 and that the error is detected at the
fourth position. The decoder inverts the fourth bit, the error is corrected and a
correctly reconstructed code word is obtained, from which the information bits are
extracted

^c ¼ ½0010110 ¼ c ) ^i ¼ ½1110 ¼ i:

(b) Firstly, it should be noted that the code word structure depends on the code
construction as well as on the input information sequence. If the same code is
applied, to the unchanged information word corresponds the same code word
c = (0010110) because the transformation is 1:1.
If during the transmission the error did not occurred, the received word is r = c,
parity-checks are
Problems 177

s1 ¼ r1  r3  r5  r7 ¼ 0  1  1  0 ¼ 0;
s2 ¼ r2  r3  r6  r7 ¼ 0  1  1  0 ¼ 0;
s3 ¼ r4  r5  r6  r7 ¼ 0  1  1  0 ¼ 0;

and syndrome is S ¼ ð000Þ ¼ 0, corresponding to the transmission without errors.


In this case decoder set aside the code word bits 3, 4, 6 and 7, and the reconstructed
information word is the same as in the previous case.
When the errors occurred in the 3rd and the 5th bit, the received word is
r = (0000010)

s1 ¼ r1  r3  r5  r7 ¼ 0  0  0  0 ¼ 0
s2 ¼ r2  r3  r6  r7 ¼ 0  0  1  0 ¼ 1
s3 ¼ r4  r5  r6  r7 ¼ 0  0  1  0 ¼ 1

and the syndrome S ¼ ð110Þ ¼ 6 indicates that the error occurred at the 6th bit.
After the corresponding inversion, the estimations of the code and information
words are

^c ¼ ð0000000Þ ) ^i ¼ ð0000Þ 6¼ i:

It is obvious that the decoding in this case was unsuccessful because even three
information word bits are wrongly decoded. Here the decoding made situation even
the worse—channel errors were at two information bits, while the decoder itself
introduced an additional error. It does not happen always that the channel (or
decoder) introduces the errors on the information bits, but such situation is not
desirable.
As a Hamming code syndrome has n – k = b bits, it is easy to calculate that the
total number of possible syndrome values is 2n−k = n + 1. From it, n syndrome
values show the single error positions, and the last one (all zeros) corresponds to the
transmission without errors. This relation shows that here the Hamming bound
ec  
X n
qnk  ðq  1Þt
t¼0
t

is satisfied with equality for ec = 1 (for a binary code the code word consists only of
zeroes and ones, i.e. q = 2), the code is perfect. Therefore, this code can correct all
single errors, but not any combination of two errors in the code word. Besides the
Hamming code, it is known that the Golay code (23, 12) is as well a perfect one, it
can correct all single, double or triple errors [24].
Now it is interesting to consider how often the single and double errors will
occur. Generally, it depends on the code word length, type of the channel errors and
the average value of error probability. For BSC the probability that from n trans-
mitted bits, the errors occurred in t positions is given by a binomial distribution
178 5 Block Codes

 
n t
Pe;t ¼ p ð1  pÞnt ; t  n:
t

For illustration, let suppose that the channel error probability is p = 10−3. In this
case the probabilities of single or double channel errors in the code word of length
n = 7 are
 
7 7!
Pe;1 ¼ pð1  pÞ6  103 ¼ 7  103 ;
1 1!6!
 
7 7!
Pe;2 ¼ p2 ð1  pÞ5  ð103 Þ2 ¼ 2:1  105
2 2!5!

Similarly, it is possible to calculate the probabilities for triple or fourfold error


sequences, but they will have the substantially smaller values. It is obvious that the
uncorrectable error in the code word, for a code (7, 4), for the above channel
conditions, is a few hundred times less frequent than the single errors. Therefore,
even such simple code can substantially decrease the error probability for the user.
(c) The generator matrix of any linear block code can be found from the relation
c = i ⊗ G, for the considered Hamming code yielding

2 3
g11 g12 ... g17
6 g21 g22 ... g27 7
½z1 z2 i1 z3 i2 i3 i4  ¼ ½i1 i2 i3 i4  6 7:
4 g31 g32 ... g37 5
g41 g42 ... g47

The previous relation, combined with the parity-check bits defining expressions,
yields the Hamming code (7, 4) generator matrix
2 3
1 1 1 0 0 0 0
z1 ¼ i 1  i 2  i 4 ; 61
6 0 0 1 1 0 077:
z2 ¼ i 1  i 3  i 4 ; ) G¼4
0 1 0 1 0 1 05
z3 ¼ i 2  i 3  i 4 :
1 1 0 1 0 0 1

On the other hand, the Hamming code parity-check matrix can be found gen-
erally from the relation
2 3
h11 h21 h31
6 h12 h22 h32 7
S¼r HT ) ½s1 s2 s3  ¼ ½r1 r2 r3 r4 r5 r6 r7  6 7;
4 ... ... ... 5
h17 h27 h37
Problems 179

and in this case, the parity-check matrix can be found practically without
difficulties
2 3
1 0 0
60 1 07
6 7 2 3
s1 ¼ r1  r3  r5  r7 61 1 07 1 0 1 0 1 0 1
6 7
s2 ¼ r2  r3  r6  r7 s ) HT ¼6
60 0 177 ) H ¼ 40 1 1 0 0 1 1 5:
3 ¼ r4  r5  r6  r7
61 0 17 0 0 0 1 1 1 1
6 7
40 1 15
1 1 1

(d) It is obvious that this code has 2k = 16 code words. They can be obtained
easily if all four-bit information words are multiplied by the generator matrix,
resulting in the following combinations
c1 = (0000000), c2 = (1101001), c3 = (0101010), c4 = (1000011),
c5 = (1001100), c6 = (0100101), c7 = (1100110), c8 = (0001111),
c9 = (1110000), c10 = (0011001), c11 = (1011010), c12 = (0110011),
c13 = (0111100), c14 = (1010101), c15 = (0010110), c16 = (1111111).

Let two code words from this set are chosen at random, e.g. c4 and c11. Their
sum is the code word as well, because

c4  c11 ¼ ð1000011Þ  ð1011010Þ ¼ ð0011001Þ ¼ c10 :

As in a vector subspace (and the linear code by definition is it) the closure for
addition must be satisfied, the sum of any two code words must be a code word as
well. As a consequence is that the Hamming distance between the addends will be
equal to the Hamming weight of their sum (i.e. to the number of ones in the
obtained code word). Because of that, every code word is a sum of some other two
code words, and its weight is the Hamming distance between some other code
words and there is no any two words whose Hamming distance does not correspond
to the Hamming weight of some third code word.
As a further consequence, it is sufficient to count ones in the code words,
because it completely determines the spectrum of all possible distances between the
code words. In the above code there is one word consisting of all zeros, seven
words having three ones, seven words having four ones and one word having seven
ones. The corresponding weight spectrum is shown in Fig. 5.9.
Minimum Hamming distance in this case is dmin = 3, the number of correctable
bits (errors) is given by dmin  2ec + 1 and ec = 1, while the number of errors that
can be detected obtained from dmin  ec + ed + 1 is ed = 1. This code obviously
can detect only one error, and to correct it. If ec = 0, then ed = 2. Of course, it does
not mean that all combinations of three errors (in the code word) result in an
180 5 Block Codes

Fig. 5.9 Hamming code

Number of code words with given weight, a(d)


8
(7, 4) weight spectrum
7

0
0 1 2 3 4 5 6 7
Hemming weight, d

undetectable error—it is the case only for combinations having the same pattern as
the code words.
Problem 5.4 Explain the construction of Hamming code having code word length
n = 2b, b—any integer greater than 1, and analyze the Hamming code (8, 4) in
details.
(a) Illustrate the procedures of encoding and decoding for the information word
i = (0011) if the error occurred at the third position of the code word.
(b) Explain in details the decoding of sequences (00000101) and (00101100).
(c) Find the minimum Hamming distance of the code and verify the number of
errors the code can correct and detect.
(d) Analyze the code correction capabilities for p = 10−3.
(e) Find the generator and the parity-check matrix for systematic (8, 4) Hamming
code.

Solution
Hamming code having the code word length n = 2b bits, is constructed beginning
from the previously formed code (2b − 1, 2b – b − 1). Then, at the last position one
parity-check bit is added, while after the reception one additional syndrome bit is
calculated

2X
b
1 X
2b
zb þ 1 ¼ cl ; sb þ 1 ¼ rl :
l¼1 l¼1

In such a way the code which has parameters (2b, 2b − b) is obtained, and the
additional parity-check bit just makes possible the detection of one more error. The
obtained code can correct one and detect two errors in the code word.
Problems 181

The construction of (8, 4) Hamming code differs from (7, 4) code construction
only in adding one bit obtained by simple parity-check, i.e. the fourth parity-check
bit is calculated as follows

X
7
z4 ¼ ci ¼ z 1  z 2  i 1  z 3  i 2  i 3  i 4 :
i¼1

If the number of ones in code word of the basic code (7, 4) is odd, then z4 = 1,
and if it is even, then z4 = 0. In any case, after the inclusion of this bit, the sum
(modulo-2) of all bits in the code word must be equal to zero, and the additional bit
of the syndrome can be written as

X
8 X
8 X
8
s4 ¼ ri ¼ ðci  ei Þ¼ ei :
i¼1 i¼1 i¼1

Therefore, s4 = 1 indicates an odd number of errors during transmission, while


s4 = 0 corresponds to the even number of errors. Syndrome indicating the estimated
error position is formed using the first three parity-check bits.
(a) The procedure for encoding the information word i = (1110) using the
Hamming code (8, 4) can be subdivided into two parts. During the first phase
the code word corresponding to (7, 4) code is formed, as explained in details in
the previous problem, resulting in code word c = (0010110). During the
second phase, the ones in the first seven positions of code word are counted,
and because here there are three ones (an odd number), the code word of (8, 4)
is obtained by adding the bit z4 ¼ 1.
If during the transmission, the error occurred at the third position, the received
word is r = (00001101), and the parity-check bits are

s1 ¼ r1  r3  r5  r7 ¼ 0  0  1  0 ¼ 1;
s2 ¼ r2  r3  r6  r7 ¼ 0  1  1  0 ¼ 1;
s3 ¼ r4  r5  r6  r7 ¼ 1  1  1  0 ¼ 0;
X
8
s4 ¼ ri ¼0  0  1  1  1  1  0  1 ¼ 1:
i¼1

From S ¼ ðs3 s2 s1 Þ ¼ ð0 11Þ ¼ 3 and s4 ¼ 1, it is obvious that during the


transmission (in the first seven positions!) occurred an odd number of errors, i.e.
one, three, five or seven errors. As it will be shown later, usually it is justifiable to
suppose that only one error occurred at the position indicated by syndrome.
The receiving word is corrected and the information bits are “extracted”
182 5 Block Codes

^c ¼ ð00101101Þ ) ^i ¼ ð1110Þ:

One can conclude that the error control coding was successful, i.e. the user
obtained the information bits without errors in spite of the fact that one error
occurred during transmission
(b) If the received word is r = (00000101), the parity-check bits are:

s1 ¼ r1  r3  r5  r7 ¼ 0  1  0  0 ¼ 0
s2 ¼ r2  r3  r6  r7 ¼ 0  1  1  0 ¼ 1
s3 ¼ r4  r5  r6  r7 ¼ 1  0  1  0 ¼ 1
X
8
s4 ¼ ri ¼0  0  1  1  0  1  0  1 ¼ 0
i¼1

From s4 ¼ 0 follows that an even number of errors occurred during the trans-
mission corresponds to the case when the errors cannot be corrected. Now the
syndrome is S ¼ ðs3 s2 s1 Þ ¼ ð110Þ ¼ 6, but it is not the error position.
It could be noticed that this case corresponds to the (b) from the previous
problem, but here the parity bit z4 ¼ 1 is added. The errors there were at the same
positions (third and fifth bit), but the decoder could not correct the errors (even
introduced additional error at the sixth position). In this case the decoder only
detects the occurrence of an even number of errors and does not try to correct them,
but the retransmission is asked for. In such a way the possibility that the decoder
introduces new errors is avoided, and if the channel state is better during the
retransmission (if not more than one error in the code word occurred) the infor-
mation word can be successfully transmitted.
For r = (00101100), the parity-check bits are

s1 ¼ r1  r3  r5  r7 ¼ 0  1  0  0 ¼ 0;
s2 ¼ r2  r3  r6  r7 ¼ 0  1  1  0 ¼ 0;
s3 ¼ r4  r5  r6  r7 ¼ 1  0  1  0 ¼ 0;
X
8
s4 ¼ ri ¼0  0  1  1  0  1  0  1 ¼ 1:
i¼1

The result s4 ¼ 1 suggests an odd number of channel errors, probably one.


However, the syndrome value S ¼ ðs3 s2 s1 Þ ¼ ð000Þ ¼ 0 shows that there were no
errors.
As stressed earlier, the syndrome is calculated on the basis of the first seven code
word bits, while, the last parity-check includes and the last—eighth bit. It makes
possible to find that the error occurred at the last code word bit, being here the case.
In such a case the decoder has only to invert the last code word bit (it is not even
Problems 183

needed, why?). The information sequence ^i ¼ ð1110Þ is then extracted. Of course,


the decoding is successful in this case.
(c) It is obvious that the inclusion of general parity-check does not affect the all
zeros word, while the Hamming weight for all ones word increases from d = 7
at d = 8. All the words having three ones for (7, 4) code now will have an
additional one at the eighth position, while the words having weight d = 4 will
have zero at the eighth position. It is the reason that in the spectrum there are
no more words having the weight d = 3, while the number of words having
four ones increases from 7 at 14.
It is the reason that Hamming code weight becomes dmin = 4. The relation
dmin  2ec + 1 must be satisfied, and the maximum corresponding integer value is
ec = 1. The number of errors that could be detected is found from dmin  ec +
ed + 1, and as the value for ec is determined (ec = 1), now the value ed = 2 is
obtained. Therefore, the Hamming code (8, 4) detects two errors, but only one can
be corrected.
(d) Because the code detects an even number of errors in the code word, the
probability to occur any number of odd errors should be estimated. Similarly
to the previous case, the channel will be considered as memoryless channel
and that it can be modeled as a binary symmetric channel. Supposing the
average error probability p = 10−3, the probabilities of occurrence of one,
three, five or seven errors in the code word are, respectively

   
8 7 8 2
Pe;1 ¼ pð1  pÞ 8  10 ; Pe;2 ¼
3
p ð1  pÞ6 2:8  105 ;
1 2
 
8 3
Pe;3 ¼ p ð1  pÞ5 5:6  108 ;   
3

The drawback of this code is that it is incapable to correct double errors and, if
the decoder is designed to correct one error per codeword, it is not capable to detect
triple errors. In such a case, the probability for code word wrong detection for
p = 10−3 is mainly determined by the probability of triple error
 
8 3
Ped Pe;3 ¼ p ð1  pÞ5 5:6  108 :
3

For such transmission conditions, the probability of an undetected error is about


500 times smaller when compared with the case when additional parity check bit is
not added, i.e. in the case when Hamming (7, 4) is applied. It is a substantial
improvement, but it is conditioned by the feedback channel use, which allows the
request for retransmission of words (packets) when an even number of errors is
detected.
184 5 Block Codes

In the case when the decoder is designed not to correct any error, it is capable to
detect up to three errors in any positions but it is not capable to detect some patterns
with larger weights that corresponds to the code words (analysis is related with part
c) of this problem and similar to Problem 5.2). For this code, there are 14 critical
eight-bit critical error patterns having four ones and one error pattern with weight
eight, for which the syndrome equals zero! In this case, the probability for code
word wrong detection is

Ped ¼ 14 p4 ð1  pÞ4 þ p8 1:4  1011 :

(e) A general parity-check bit can be written as well as follows

z4 ¼ z1  z2  i1  z3  i2  i3  i4
¼ ði1  i2  i4 Þ  ði1  i3  i4 Þ  i1  ði2  i3  i4 Þ  i2  i3  i4
¼ i1  i2  i3

and the generator matrix of a Hamming code (7, 4) from the previous problem
should be slightly modified
2 3
1 1 1 0 0 0 0 1
61 0 0 1 1 0 0 17
G¼6
40
7:
1 0 1 0 1 0 15
1 1 0 1 0 0 1 0

Generator matrix of systematic Hamming code is formed by the changing the


columns places, this operation does not change the code spectrum nor the code
capability to correct the errors. If the columns of the previous matrix are reordered
as 3-5-6-7-1-2-4-8, the systematic generator matrix is obtained
2 3
1 0 0 0j 1 1 0 1
60 1 0 0j 1 0 1 17
Gs ¼ 6
40
7 ¼ ½I4 ; P:
0 1 0j 0 1 1 15
0 0 0 1j 1 1 1 0

The corresponding parity-check matrix can be easily found by noting that to the
submatrix P correspond the last four columns of the generator matrix, yielding as a
result
2 3
1 1 0 1j 1 0 0 0
 T  61 0 1 1j 0 1 0 07
H s ¼ P ; I3 ¼ 6
40
7:
1 1 1j 0 0 1 05
1 1 1 0j 0 0 0 1
Problems 185

Problem 5.5
(a) Perform the encoding of sequence (10101010101) by (15, 11) code and
decode the corresponding code word if the error occurred at the ninth position.
How many errors can be corrected and how many detected by this code?
(b) How to modify the code to make possible the detection of two errors? For the
previous information word, analyze the case when the errors occurred at the
positions 6 and 16.
(c) Decode the sequences (111111000000) and (000000010001) if it is known
that the code used can correct one error and detect one error.
(d) Decode the sequences (001001000001) and (101111000011) if it is known
that Hamming (12, 7) code is used.

Solution
(a) The code word length is n = 15, the first higher power of two is 24 = 16 and
the code will be constructed using four columns from the cliché and there are
b = 4 parity-check bits. Because of n – k = b, the code have the possibility to
detect and to correct one error. The cliché for forming the code, encoding and
decoding procedures is shown below.
l bin(l) cl -encoding -decoding
1 0001 z1 i ¼ ð10101010101Þ r ¼ ð101101000010101Þ
2 0010 z2 z1 ¼ c3  c5  c7  c9  c11  c13  c15 s1 ¼ r1  r3  r5  r7  r9 
3 0011 i1 ¼ i1  i2  i4  i5  i7  i9  i11 ¼ 1  r11  r13  r15 ¼ 1
4 0100 z3 z2 ¼ c3  c6  c7  c10  c11  c14  c15 s2 ¼ r2  r3  r6  r7  r10 
5 0101 i2 ¼ i1  i3  i4  i6  i7  i10  i11 ¼ 0  r11  r14  r15 ¼ 0
6 0110 i3 z3 ¼ c5  c6  c7  c12  c13  c14  c15 s3 ¼ r4  r5  r6  r7  r12 
7 0111 i4 ¼ i2  i3  i4  i8  i9  i10  i11 ¼ 1  r13  r14  r15 ¼ 0
8 1000 z4 z4 ¼ c9  c10  c11  c12  c13  c14  c15 s4 ¼ r8  r9  r10  r11  r12 
9 1001 i5 ¼ i5  i6  i7  i8  i9  i10  i11 ¼ 0  r13  r14  r15 ¼ 1
10 1010 i6 c ¼ ðz1 z2 i1 z3 i2 i3 i4 z4 i5 i6 i7 i8 i9 i10 i11 Þ S ¼ ðs4 s3 s2 s1 Þ ¼ ð1001Þ ¼ 9
11 1011 i7 ¼ ð101101001010101Þ ^c ¼ ð101101001010101Þ
12 1100 i8 ^i ¼ ð10101010101Þ
13 1101 i9
14 1110 i10
15 1111 i11

Therefore, the error occurred at the position 9 was detected and corrected, and
the user received the correct information sequence.
(b) To made the detection of two errors possible one parity-check bit should be
added on the position 16. The extended Hamming code has the parameters
(n, k) = (16, 12) and the added parity bit is
186 5 Block Codes

X
15
z4 ¼ ci ¼ 0;
i¼1

yielding the code word c ¼ ð1011010010101010Þ.


The received code word, after the inversion of bits at the 6th and the 16th
position is r = (1011000010101011) while syndrome and additional parity-check bit are

X
16
S ¼ ðs4 s3 s2 s1 Þ ¼ ð0110Þ ¼ 6; s5 ¼ ri ¼ 0:
i¼1

In this case syndrome shows the position of one of the errors, but because of
s5 = 0 the receiver detects the double error and does not start the error correction,
but asks for retransmission. If the decoder “knew” that one error occurred at the
parity-check bit, it could correct both errors.
According to the solution from the previous problem the syndrome can be as
well written as

s1 ¼ e1  e3  e5  e7  e9  e11  e13  e15


s2 ¼ e2  e3  e6  e7  e10  e11  e14  e15
s3 ¼ e4  e5  e6  e7  e12  e13  e14  e15
s4 ¼ e8  e9  e10  e11  e12  e13  e14  e15

and the combination S ¼ ð0110Þ, s5 = 0 appears as well in the case when the errors
occurred at the positions 7 and 14. Therefore, the various error combinations can
result in the same syndrome value and the errors positions cannot be found exactly.
The same can be clearly seen as well from the Hamming bound in this case
ec 
X   
5 16 16
2  ¼ 1þ ¼ 17;
t¼0
t 1

this relation being satisfied only for ec = 1.


(c) From n = 12 and the fact that code can detect one error follows that it is not
possible to add the general parity-check bit. Because of that 12th position is
taken into account when forming the table and parity-check bits for the syn-
drome. The cliché is given in the table having 4 columns (12 < 24) and a
shortened Hamming code (12, 8) is obtained. The decoding procedure is as
follows:
Problems 187

l bin(l) cl The first received word The second received word


1 0001 z1 r ¼ ð111111000000Þ r ¼ ð000000010001Þ
2 0010 z2 s1 ¼ r1  r3  r5  r7  r9  r11 ¼ 1 s1 ¼ r1  r3  r5  r7  r9  r11 ¼ 0
3 0011 i1 s2 ¼ r2  r3  r6  r7  r10  r11 ¼ 1 s2 ¼ r2  r3  r6  r7  r10  r11 ¼ 0
4 0100 z3 s3 ¼ r4  r5  r6  r7  r12 ¼ 1 s3 ¼ r4  r5  r6  r7  ¼ 1
5 0101 i2 s4 ¼ r8  r9  r10  r11  r12 ¼ 0 s4 ¼ r8  r9  r10  r11  r12 ¼ 0
6 0110 i3 S ¼ ðs4 s3 s2 s1 Þ ¼ ð0111Þ ¼ 7 S ¼ ðs4 s3 s2 s1 Þ ¼ ð0100Þ ¼ 4
7 0111 i4 ^c ¼ ð111111100000Þ ^c ¼ ð000100010001Þ
8 1000 z4 ^i ¼ ð11110000Þ ^i ¼ ð00000001Þ
9 1001 i5
10 1010 i6
11 1011 i7
12 1100 i8

(d) When a Hamming (12, 7) code is used, it is obvious that it is shortened,


because the code word length cannot be written as n = 2b − 1 nor n = 2b.
Now the numbers of correctable errors will be checked.

(1) The relation n  k ¼ dldðnÞe, because n – k = 5 and dldð12Þe ¼ 4, is not


satisfied, the code cannot detect nor correct one error.
(2) The relation n  k ¼ dldðn  1Þe þ 1 is satisfied and the first n – 1 = 11 bits
form the cliché to obtain the code having b = dldðn  1Þe = 4 redundant
(parity-check) bits. The fifth redundant bit is general parity-check and it is put
on the position 12. Therefore, the code can correct one and detect two errors at
the code word.
In this case for construction of (12, 7) code, firstly the code (11, 7) is formed and
there is not 11th bit according to the cliché, and after a general parity-check bit is
added. Procedure for decoding the first received word is as follows.

l bin(l) cl r ¼ ð001001000001Þ
1 0001 z1 s1 ¼ r1  r3  r5  r7  r9  r11 ¼ 1
2 0010 z2 s2 ¼ r2  r3  r6  r7  r10  r11 ¼ 0
3 0011 i1 s3 ¼ r4  r5  r6  r7 ¼ 1
4 0100 z3 s4 ¼ r8  r9  r10  r11 ¼ 0
5 0101 i2 S ¼ ðs4 s3 s2 s1 Þ ¼ ð0101Þ ¼ 5
6 0110 i3 s5 ¼ 1
7 0111 i4 ^c ¼ ð111111100000Þ
8 1000 z4 ^i ¼ ð1110000Þ
9 1001 i5
10 1010 i6
11 1011 i7
188 5 Block Codes

Differently from the previous case, the decoding of the word (101111000011)
yields the following syndrome values

s1 ¼ r1  r3  r5  r7  r9  r11 ¼ 0
s2 ¼ r2  r3  r6  r7  r10  r11 ¼ 1
s3 ¼ r4  r5  r6  r7 ¼ 1
s4 ¼ r8  r9  r10  r11 ¼ 1
S ¼ ðs4 s3 s2 s1 Þ ¼ ð1110Þ ¼ 14
s5 ¼ 1

It is obvious that an even number of errors did not occur, but it is unusual that
the syndrome gives the position 14, not existing in the code. The reader should
found which error can result in such syndrome value.
Problem 5.6 Uplink of one geostationary satellite system is considered. Due to the
frequency limitations maximum signaling rate is vb = 200 [kb/s]. The bad weather
conditions in some intervals (not longer than 0.01 [ms]) cause the errors in packets.
These intervals have a period T = 0.14 [ms]. Due to the long propagation delay, the
satellite link is not suitable for ARQ procedure.
(a) Propose the solution enabling transmission without errors under the circum-
stances. Find the code rate of the proposed error control code.
(b) If the demodulator output sequence is:

011010010001001111110100011111001010011110101

comment the syndrome values, decode the information word and determine
the type of channel errors.
(c) If from the terrestrial station a sequence of N = 5  106 symbols from the set
fA; B; C; D; Eg with the corresponding probabilities {0.45; 0.35; 0.1; 0.07;
0.03} is emitted, find a minimum theoretical time for their emitting, if the error
control code from (a) is used. If the time for this sequence transmission should
not be greater than tmax = 80 [s], propose a compression code that satisfies this
condition.

Solution
The satellite is at the geostationary orbit, the distance from the terrestrial station is at
least d = 33600 [km]. The electromagnetic waves propagate as a light (c = 3108
[m/s]), minimum propagation time (delay) on link is s = d/c = 112 [ms]. The
satellite is in fact the relay between two terrestrial stations, and if the errors occur,
for a complete retransmission, the total delay for a packet retransmission is
8s = 896 [ms] (the initial transmission earth-satellite-earth, the return path for a
negative acknowledgment for a packet reception, once more the complete path for a
positive acknowledgement after the packet reception for resetting a transmission
buffer). Taking into account that when using ARQ procedure sometimes more
Problems 189

retransmissions are used, it is obvious that the total delay can attain a few seconds.
Due to this reason, in satellite systems instead of ARQ procedure, the FEC tech-
niques are more suitable, where the codes having a possibility to correct all the
detected errors have the advantage (as a difference from e.g. computer networks).
(a) The signaling rate is vb = 200 [kb/s] and the time for one bit transmission is
Tb = 1/vb = 5 [ls]. During one period (0.14 [ms]) 28 bits are transmitted, and
during the bad interval (0.01 [ms]) two neighboring bits may be inverted. The
interleaving period always equals the number of bits transmitted during one
period (LI = 28) and after the deinterleaving it should provide the maximum
distance between the errors. If the disturbances appear regularly (strictly
periodically and approximately having the same duration for every period), it
is suitable to use matrix block-interleaver where one matrix dimension is
determined by the number of errors in one disturbance period (denoted
by l) and the other by a code word length n, where nl = LI yielding
n = 28/2 = 14.
There are two Hamming codes having this code word length. A code (14, 9) can
detect two errors, but it is not interesting, because there is no the retransmission.
Therefore, code (14, 10) is chosen having the greater code rate as well as the same
capability for error correction as the code (14, 9).
(b) The procedure for obtaining the Hamming code (14, 10) code words is given
by the following cliché, where the calculating the parity-check bits and syn-
drome are shown as well

1 0001 z1
2 0010 z2
3 0011 i1 z1= i1⊕ i2⊕ i4 ⊕ i5⊕ i7 ⊕ i9,
4 0100 z3 z2= i1⊕ i3⊕ i4 ⊕ i6⊕ i7 ⊕ i10
5 0101 i2
6 0110 i3 z3= i2⊕ i3⊕ i4 ⊕ i8⊕ i9 ⊕ i10
7 0111 i4 z4= i5⊕ i6⊕ i7 ⊕ i8⊕ i9 ⊕ i10
8 1000 z4
9 1001 i5 s1= r1⊕ r3⊕ r5 ⊕ r7⊕ r9 ⊕ r11⊕ r13
10 1010 i6
11 1011 i7 s2= r2⊕ r3⊕ r6 ⊕ r7⊕ r10 ⊕ r11⊕ r14
12 1100 i8 s3= r4⊕ r5⊕ r6 ⊕ r7⊕ r12 ⊕ r13⊕ r14
13 1101 i9 s4= r8⊕ r9⊕ r10 ⊕ r11⊕ r12 ⊕ r13⊕ r14
14 1110 i10
15 1111

The matrix block-interleaver is shown in Fig. 5.10. The input sequence


obtained from the encoder is firstly entered into the interleaver row-by-row, but is
red out column-by-column for emitting through the channel. The packet errors
occur in the channel, and in the example shown in Fig. 5.10 it is supposed that they
occurred in the three neighboring bits denoted by yl1, y12, y22. The writing in
deinterleaver is column-by-column and the reading row-by-row. In such a way, the
190 5 Block Codes

from the channel to the deinterleaver


c → ⎡ x11 x12 ... x1n ⎤ y = ( y11 , y21 ,..., yl1 , y12 , y22 ,..., yl 2 ,..., y1n , y2 n ,..., yln )
(from encoder) ⎢⎢ x21 x22 ... x2 n ⎥⎥ ↓
⎢ ... ... ... ... ⎥ ⎡ y11 y12 ... y1n ⎤ → r
⎢ ⎥ ⎢y y22 ... y2 n ⎥⎥ (in decoder)
⎣ xl1 xl 2 ... xln ⎦ ⎢ 21
↓ ⎢ ... ... ... ... ⎥
⎢ ⎥
x = ( x11 , x21 ,..., xl1 , x12 , x22 ,..., xl 2 ,..., x1n , x2 n ,..., xln ) ⎣ yl 1 yl 2 ... yln ⎦

from the interleaver to the channel

Fig. 5.10 The illustration of the working principle of matrix interleaver and deinterleaver

first row in the deinterleaver matrix corresponds to the first interleaver column and
the errors are not more concentrated into a packet, but they are at maximum
distances. If not more than l successive errors occurred and the period having the
length LI = nl, at the decoder input there would not be any received word having
more than one error.
For a given sequence at the receiver input, the forming of code words by
interleaver is as follows.

y ¼ ð0011101011101111100100010001Þ
#
 
0 1 1 1 1 1 1 1 0 0 0 0 0 0 ! r ðIÞ
0 1 0 0 1 0 1 1 1 1 0 1 0 1 ! r ðIIÞ

1 2 3 4 5 6 7 8 9 10 11 12 13 14

The decoding of the first word is as follows

s1 ¼ r1  r3  r5  r7  r9  r11  r13 ¼ 0  1  1  1  0  0  0 ¼ 1
s2 ¼ r2  r3  r6  r7  r10  r11  r14 ¼ 1  1  1  1  0  0  0 ¼ 0
s3 ¼ r4  r5  r6  r7  r12  r13  r14 ¼ 1  1  1  1  0  0  0 ¼ 0
s4 ¼ r8  r9  r10  r11  r12  r13  r14 ¼ 1  0  0  0  0  0  0 ¼ 1

and because the error is at the ninth position (1001), the nearest code word and
decoded sequences are
cˆ ( I ) = (01111111100000) ⇒ iˆ( I ) = (1111100000) .
The second word decoding is as follows
Problems 191

s1 ¼ r1  r3  r5  r7  r9  r11  r13 ¼ 0  0  1  1  1  0  0 ¼ 1
s2 ¼ r2  r3  r6  r7  r10  r11  r14 ¼ 1  0  0  1  1  0  1 ¼ 0
s3 ¼ r4  r5  r6  r7  r12  r13  r14 ¼ 0  1  0  1  1  0  1 ¼ 0
s4 ¼ r8  r9  r10  r11  r12  r13  r14 ¼ 1  1  1  0  1  0  1 ¼ 1

and the error is again in the ninth position.


After the error correction, the code word and the decoded sequence are

cˆ ( II ) = (01001011010101) ⇒ iˆ( II ) = (0101010101) .

In both cases, the errors were in the same position in the code words (corre-
sponding to the neighboring positions in the interleaver entering sequence)—it is
packet error having the length l = 2.
(c) The complete block-scheme of the system for data transmission is shown in
Fig. 5.11. The symbol probabilities determine the average information per
symbol emitted by a source, i.e. the entropy

X
q
HðSÞ ¼  Pðsi ÞldðPðsi ÞÞ ¼ 1:801 ½Sh/symb:
i¼1

The average code word length cannot be smaller than the entropy. Therefore,
Ns = 5  106 symbols after the compression cannot be represented by less than
Nb = N  H(S). After adding the parity-check bits in Hamming encoder (14, 10)
total number of bits emitted through the channel is

vs = ? vb1 = vs L vb 2 = vs L / R
H(s)=1.801Sh/s Compression Errror control
Source encoder, encoder
N s = 5 × 106 Huffman Hamming (14,10)
A, B, C , D, E Matrix block
interleaver, 2×14

Channel

Matrix block
deinterleaver, 2×14
vs = 1000 sb / s Compression Errror control
User decoder, decoder
Huffman Hamming (14,10)

Fig. 5.11 Block-scheme of the complete transmission system


192 5 Block Codes

Table 5.7 The result of si A B C D E


Huffman procedure
P(si) 0.45 0.35 0.1 0.07 0.03
Code word 0 10 110 1110 1111

Nb;tot ¼ NS HðSÞ=R ¼ 12:607  106 ;

and because the transmitting rate is vb = 200 [kb/s], the minimum time for this
sequence transmission is

tmin ¼ Nb;tot =vb ¼ 63:05 [s]:

The simplest practical solution for a compression encoder is based on a Huffman


algorithm. For a memoryless source (it is supposed because in the problem text it is
not explicitly written that the source is with memory) this algorithm guarantees to
obtain a compact code. For the example from this problem the result of Huffman
procedure is shown in Table 5.7.
It is interesting that the obtained code is in the same time so called comma code.
In this case, the average code word length is

X
q
L¼ Pðsi Þli ¼ 0:45  1 þ 0:35  2 þ 0:1  3 þ 0:07  4 þ 0:03  4
i¼1
¼ 1:85 [bit/symb];

and the code efficiency

HðSÞ
g¼ ¼ 97:35%:
L

The time needed for transmission, if the Huffman code is used, is

NS L
tHuff ¼ ¼ 64:75 [s];
Rvb

and because it is shorter than that in the problem text, the given condition is
satisfied. An additional shortening of the time needed for transmission could be
achieved if with the Huffman coding the source extension is combined. However,
this time in any case, cannot be shorter than tmin.
Problem 5.7 Explain the product code construction using a cliché 2  3 and write
the code word structure, i.e. the positions of information and parity-check bits. Find
Problems 193

the code rate, write the code word corresponding to information sequence (000111)
and explain the decoding procedure if during the transmission the error occurred at
the third position.
Solution
In this case the information bits are written in the matrix having the dimensions
2  3 and then one parity-check bit is calculated for every row and for every
column. The code word of the product code is formed by reading the bit sequence
starting from the first row of the cliché as follows

i1 i2 i3 jz1
i4 i5 i6 jz2 ) c ¼ ði1 ; i2 ; i3 ; z1 ; i4 ; i5 ; i6 ; z2 ; z3 ; z4 ; z5 ; z6 Þ;
z3 z4 z5 jz6

where the parity-check bits are

z1 ¼ i1  i2  i3  i4 ; z2 ¼ i5  i6  i7  i8 ; z3 ¼ i9  i10  i11  i12 ; z4 ¼ i1  i5  i9


z5 ¼ i2  i6  i10 ; z6 ¼ i3  i7  i11 ; z7 ¼ i4  i8  i12 ; z8 ¼ z1  z2  z3 ¼ z4  z5  z6  z7 :

The received word is written into the same cliché

r1 r2 r3 jr4
r5 r6 r7 jr8
r9 r10 r11 jr12

where the parity-checks are calculated for the rows and for columns. If there is no
more than one error, its position can be easily found and the error can be corrected.
If the number of information bits in every row is k1, and in every column k2, the
code rate is

k1 k2
R¼ ;
ðk1 þ 1Þðk2 þ 1Þ

in this case R = 6/12 = 1/2. The encoder from sequence (000111) generates the
code word

0 0 0 j0
1 1 1 j1 ) c ¼ ð000011111111Þ:
1 1 1 j1

After the channel error occurred at the third position, at the receiving end the
cliché is formed
0 0 1 0 1
1 1 1 1 0
r = (001011111111) ⇒
1 1 1 1 0
0 0 1 0
194 5 Block Codes

and the error is at the intersection of the first row and the third column, i.e. on the
third position in the code word. The information sequence is correctly decoded. The
reader should find the generator matrix of the code, to calculate the minimum
Hamming distance and to check the number of errors correctable and detectable by
the code.
Problem 5.8 Consider a linear block code described by the generator matrix
2 3
1 1 1 0 0 0
G ¼ 41 0 0 1 1 0 5:
0 1 0 1 0 1

(a) Find generator and parity-check matrix of the equivalent systematic code
(b) Find the weight spectrum of the code. Comment the code correcting and
detecting capabilities.
(c) Find syndromes corresponding to correctable error patterns of the systematic
code. Is this code perfect? Is it a MDS code?
(d) For a binary symmetric channel calculate the probability of the unsuccessful
error detection. Find the probability that the code does not correct the error.
(e) If the code words (bits) are represented by polar pulses (amplitudes +1 and –1)
and are transmitted over the channel with additive white Gaussian noise
(AWGN), explain the optimum decision rule for decoding. Illustrate the rule
for the case when the received word is (0.1; 0.35; −0.2; −1; 1; 0.15). How the
error probability can be estimated for such procedure?

Solution
It is a linear block code (6, 3), code rate R = 1/2.
(a) A systematic code is simply obtained by permuting the columns to obtain as
the first (or last) three columns a unity submatrix I3

2 3
1 0 0 1 1 0
Gs ¼ 4 0 1 0 0 1 1 5 ¼ ½Ik jP;
0 0 1 1 0 1

the parity-check matrix is


2 3
 T  1 0 1 1 0 0
H s ¼ P jInk ¼ 4 1 1 0 0 1 0 5;
0 1 1 0 0 1

it is easy to verify the following


Problems 195

2 3
1 1 0
2 3 60 1 17 2 3
1 0 0 1 1 0 6 7 0 0 0
61 0 17
Gs H Ts ¼ 4 0 1 0 0 1 15 6
61
7 ¼ 40
7 0 0 5:
6 0 07
0 0 1 1 0 1 40 0 0 0
1 05
0 0 1

(b) The list of all codewords of the systematic code is given in Table 5.8, and the
corresponding code weight spectrum is shown in Fig. 5.12. It is obvious that a
nonsystematic code has the same spectrum, although the code words are
different (equivalent codes are considered). The minimum Hamming distance
is dmin = 3, relation dmin  2ec + 1 is satisfied for ec  1 and the code
corrects one error in the code word. When ec = 1 the relation dmin  ec +
ed + 1 is satisfied for ed  1 (one error can be detected and corrected). If the
decision rule is used where ec = 0, two errors in the code word can be
detected.

Table 5.8 Code words list


i1 i2 i3 z1 z2 z3
0 0 0 0 0 0
0 0 1 1 0 1
0 1 0 0 1 1
0 1 1 1 1 0
1 0 0 1 1 0
1 0 1 0 1 1
1 1 0 1 0 1
1 1 1 0 0 0

Fig. 5.12 Weight spectrum


Number of code words with given weight, a(d)

4
of code (6, 3)
3.5

2.5

1.5

0.5

0
0 1 2 3 4 5 6
Hamming weight, d
196 5 Block Codes

Table 5.9 Syndromes Ordinal pattern number Pattern ej Syndrome S


corresponding to error vectors
(optimum coset leaders) 1 (000000) (000)
2 (000001) (001)
3 (000010) (010)
4 (000100) (100)
5 (001000) (101)
6 (010000) (011)
7 (100000) (110)
8 (100001) (111)

(c) Standard array of the code has 2n−k = 8 rows and seven error patterns can be
corrected [besides the trivial one (000000)]. On the basis of relation

S ¼ ej H Ts

it is easy to verify that to every error vector containing only single one (i.e. one
error) corresponds a unique syndrome, as shown in Table 5.9. It is a direct con-
sequence of the fact that parity-check matrix Hs has not two identical columns. It is
obvious that every linear code having this feature, and where n < 2n−k holds, can
correct the single errors. Unused syndrome value (111) clearly corresponds to
double error, and the sum of the corresponding columns of parity-check matrix
yields just this value. One of such error vectors is (100001), but it is not a unique
one—the same condition is fulfilled as well for the patterns (010100) and (001010).
The Hamming bound generally is

Xec  
n
q nk
 ðq  1Þt ;
t¼0
t

where q denotes the code basis (the number of symbols to construct a code words).
Because this code is binary (q = 2, the code word consists of zeros and ones) and
one error at code word can be corrected, the previous relation reduces to
   
nk n n
2  þ ¼ 1 þ n:
0 1

In this case n = 6 and n – k = 3, the Hamming bound is not satisfied with


equality. It can be interpreted as well as follows—the code can correct all single
errors, but cannot correct two errors in the code word, because it is not clear do the
syndrome value (111) corresponds to the error vector (100001) either (010100) or
(001010), because all are equiprobable. All syndrome values are not efficiently used
and the code is not perfect.
Problems 197

Singleton bound can be written in the form

dmin  n  k þ 1;

and, as in this case dmin = n−k = 3, the bound is not satisfied with equality.
Because of that the code is not Maximum Distance Separable (MDS), meaning that
for the same number of parity bits, it may be possible to construct code having the
greater minimum Hamming distance (dmin = 4).
(d) If only the code possibility to detect the errors is considered, it is sufficient to
verify does the received word belong to the set of possible code words. If it
does not belong to this set, it is obvious that the errors occurred during
transmission. However, it is possible to emit one code word and to receive the
other—in this case the error will be undetected.
When the transmission system can be modeled as a binary symmetric channel,
the probability of occurrence of exactly d errors at n positions is

Pðn; d Þ ¼ pd ð1pÞnd ;

and the probability of error undetectability equals the probability that error vector
is the same as the code word. It can be found on the weight spectrum basis [25, 26]

X
n
Pe;d ðpÞ ¼ aðdÞpd ð1  pÞnd ;
d¼dmin

where a(d) denotes the number of code words having Hamming weight d.
On the other hand, the probability of the uncorrectable error equals the prob-
ability that the error vector differs from the coset leaders in the standard array. If the
maximum weight of error patterns which correspond to coset leaders is denoted by
l, and the number of patterns having weight i  l by L(i), this probability is

X
l
Pe;c ðpÞ ¼ 1  LðiÞpi ð1  pÞni ;
i¼0

and for a considered case l = 2, L(0) = 1, L(1) = 6, L(2) = 1.


On the basis of previously found weight spectrum, the corresponding numerical
values for the probabilities that the error is not detected/corrected are shown in
Fig. 5.13. The relations
   
n n
aðdÞ  ; LðiÞ  ;
d i

are always valid, and when the weight spectrum is not known, the coefficients a
(d) and L(i) can be substituted by a binary coefficients yielding the upper bound of
198 5 Block Codes

0
10
missed detection
missed correction
−2
10
Residual bit errir rate

−4
10

−6
10

−8
10

−10
10
−4 −3 −2 −1
10 10 10 10
Crossover probability in BSC, p

Fig. 5.13 The probability that the error is not detected/corrected, BSC

the corresponding probability. For a perfect code these inequalities become


equalities.
(e) Block scheme of a system where the code words are represented by vectors xi
(were the symbols are –1 and 1) and the transmission is through the channel
with additive white Gaussian noise (variance r2) is shown in Fig. 5.14. If the
emitted binary symbols are equiprobable and if at the receiving end, before the
decoder, a decision block is inserted having the threshold put at zero, the
channel can be modeled as a binary symmetric channel. In this case the hard
decision is optimal, based on the syndrome and the standard array, where the
nearest code word is chosen based on the minimum Hamming distance.
However, if at the receiver input the hard decision block is not inserted, it is
possible to find a more efficient decision rule. It is based on the choice of the code
word which minimize the conditional probability density function of the received
sequence

Y
n Y
n
1 ðyi xi Þ2
pðyjxÞ ¼ pðyi jxi Þ ¼ pffiffiffiffiffiffi e 2r2 ;
i¼1 i¼1 2pr

where it is supposed that the adjacent noise samples are statistically independent.
The previous expression can be minimized if the Maximum Likelihood Decoding
(ML) is applied, which reduces in this case to choice of the sequence having a
minimum squared Euclid distance defined by Morelos-Zaragoza [26]
Problems 199

i Error control c Polar line x BSC


Source encoder encoder
Hamming (6,3) (0,1)->(-1,1)
n

Decoder with Decision y


iˆ1 Hemming distances (comparing with
(hard decoding) the threshold)
User
Decoder with
iˆ2 Euclid distances
(soft decoding)

Fig. 5.14 The illustration of two decision rules for linear block codes

X
n
D2 ðx; yÞ ¼ ðyi  xi Þ2 :
i¼1

If this metric is used, it can be considered as a soft decision, regardless if it was


done according to ML or some other decision rule.
The decoding procedure for a received word y = (0.1; 0.35; −0.2; −1; 1; 0.15)
by comparison to the possible emitted words is shown in Table 5.10. The code
word c = (010011) is chosen for which the vector y has a minimum squared
Euclidean distance to the received word, and the decoded information sequence is
i = (010).
It is obvious that even for soft decision, the decision error can be made if the
received vector, in Euclidian sense, is nearer to some other code word. The prob-
ability of such event can be upper bounded [26]

Table 5.10 Illustration of i c X D2(x, y)


soft decoding of a linear block
code 0 0 0 0 0 0 0 0 0 −1 −1 −1 −1 –1 −1 8.995
0 0 1 0 0 1 1 0 1 −1 −1 +1 +1 −1 +1 13.195
0 1 0 0 1 0 0 1 1 −1 +1 −1 −1 +1 +1 2.995
0 1 1 0 1 1 1 1 0 −1 +1 +1 +1 +1 −1 8.395
1 0 0 1 0 0 1 1 0 +1 −1 −1 +1 +1 −1 8.595
1 0 1 1 0 1 0 1 1 +1 −1 +1 −1 +1 +1 4.795
1 1 0 1 1 0 1 0 1 +1 +1 −1 +1 −1 +1 10.595
1 1 1 1 1 1 0 0 0 +1 +1 +1 −1 −1 −1 7.995
200 5 Block Codes

Xn rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
aðdÞ Eb
Pe;c;meko ðEb =N0 Þ  erfc dR ;
d¼d
2 N0
min

where R denotes a code rate and Eb/N0 is energy per bit divided by the noise power
density spectrum. The crossover probability for the equivalent BSC is
rffiffiffiffiffiffiffiffiffi
1 Eb
pðEb =N0 Þ ¼ erfc R ;
2 N0

and the probability of residual error versus Eb/N0 for hard decision can be written as

X
l  rffiffiffiffiffiffiffiffiffii  rffiffiffiffiffiffiffiffiffini
1 Eb 1 Eb
Pe;c;tvrdo ðEb =N0 Þ ¼ 1  LðiÞ erfc R 1  erfc R :
i¼0
2 N0 2 N0

The coding gain (G) is usually defined as a saving in Eb/N0 ratio in comparison
to the case when the error control code was not applied, for some fixed error
probability

ð1Þ ð0Þ ð0Þ ð1Þ


Pe;c;tvrdo ðEb =N0 Þ ¼ pðEb =N0 Þ ¼ 10a ) Gtvrdo ð10a Þ ¼ Eb =N0  Eb =N0 ;
ð2Þ ð0Þ ð0Þ ð2Þ
Pe;c;meko ðEb =N0 Þ ¼ pðEb =N0 Þ ¼ 10a ) Gmeko ð10a Þ ¼ Eb =N0  Eb =N0 :

0
10
Residual bit error rate after decoding, Pe,c

−5
10

−10
10

uncoded
code (6,3), hard decoding
code (6,3), soft decoding

−15
10
0 5 10 15
Eb/No [dB]

Fig. 5.15 Probability of residual error versus Eb/N0


Problems 201

For the considered code (6, 3) the corresponding numerical results are shown in
Fig. 5.15.
From the figure the following coding gains can be found:

• for the crossover probability 10−4; • for the crossover probability 10−10;
– Ghard(10−5) = –0.3 dB, – Ghard(10−15) = 0.1 dB,
– Gsoft(10−5) = 2.6 dB, – Gsoft(10−15) = 3 dB,

On the basis of these results the following can be concluded:


• Hard decision becomes efficient only for very high signal-to-noise ratios and the
coding gain becomes positive only for very small values of the error probability
registered by the user;
• The difference between asymptotic coding gains is about 3 dB to the advantage
of soft decision.
One should have in mind that the residual error probability (y-axis) is the error
probability for a code word. To obtain the bit error probability, the corresponding
scaling should be done, depending on the probability of occurring of the various
error combinations. If the crossover probability is sufficiently small (the ratio Eb/N0
sufficiently large), the approximate expression is

Pe;b Pe;c =ðec þ 1Þ:

Problem 5.9 Consider a linear block code described by generator matrix


2 3
1 0 0 1 0 0 0 0 0 0 0 0
60 1 0 0 0 0 0 0 0 0 1 07
6 7
60 0 1 0 0 0 0 0 0 0 1 17
6 7
60 0 0 1 0 0 0 0 0 0 0 17
6 7
60 0 0 0 1 0 0 0 0 0 1 17
G¼6
60
7:
6 0 0 0 0 1 0 0 0 0 0 077
60 0 0 0 0 0 1 0 0 0 1 17
6 7
60 0 0 0 0 0 0 1 0 0 0 17
6 7
40 0 0 0 0 0 0 0 1 0 1 05
0 0 0 0 0 0 1 1 0 1 0 0

(a) Find the generator matrix of an equivalent systematic code. How the corre-
sponding dual code can be described?
(b) Find the weight spectrum of the code defined by the generator matrix G.
202 5 Block Codes

(c) Find the probability that this code does not detect the transmission error. Find
the probability that dual code does not detect the transmission error

Solution
(a) From the theory of linear block codes it is known that the weight spectrum
does not change (the code is equivalent) if the following operations with the
generator matrix are performed:
(1) Permutation of any two columns
(2) Adding of one row multiple to the other row
(3) Multiplication of one row or column with a nonzero element
Generator matrix of the corresponding systematic code has the form Gs ¼ ½Ik ; P,
its parameters are n = 12 and k = 10. To achieve that first eight rows of matrix
G form a unity matrix, the binary one from the first row, the fourth column, should
be removed, as well as from 7th and 8th columns in the last row. The first row of
systematic generator matrix is obtained by addition of the first and fourth row of the
matrix G. Finally, by adding 7th, 8th and the last row of G the last row of new
matrix is obtained
2 3
1 0 0 0 0 0 0 0 0 0 0 1
60 1 0 0 0 0 0 0 0 0 1 07
6 7
60 0 1 0 0 0 0 0 0 0 1 17
6 7
60 0 0 1 0 0 0 0 0 0 0 17
6 7
60 0 0 0 1 0 0 0 0 0 1 17
Gs ¼ 6
60
7 ¼ ½I8 ; P:
6 0 0 0 0 1 0 0 0 0 0 077
60 0 0 0 0 0 1 0 0 0 1 17
6 7
60 0 0 0 0 0 0 1 0 0 0 17
6 7
40 0 0 0 0 0 0 0 1 0 1 05
0 0 0 0 0 0 0 0 0 1 1 0

Parity-check matrix of a systematic code is


 
  0 1 1 0 1 0 1 0 1 1 1 0
H s ¼ PT ; I2 ¼ ;
1 0 1 1 1 0 1 1 0 0 0 1

the dual code of this code has a generator matrix


 
0 1 1 0 1 0 1 0 1 1 1 0
Gd;s ¼ :
1 0 1 1 1 0 1 1 0 0 0 1

Code words of this code (12, 2) are all linear combinations of the rows of the
matrix, i.e. c1 = (000000000000), c2 = (011010101110), c3 = (101110110001) and
c4 = (110100011111). It is obvious that the minimum Hamming distance of dual
code is dmin = 7.
Problems 203

(b) Weight spectrum of the code defined by G can be found directly, by a com-
puter search. In this case all information words (28 = 256) should be found,
everyone should be separately multiplied by generator matrix and find the
Hamming weights of the obtained code words. The weight spectrum of the
code can be written as a polynomial

Að xÞ ¼ a0 þ a1 x1 þ    þ an xn ;

where the number of words having weight d is denoted by ad ð  d  nÞ. The codes
defined by generator matrix G and its systematic version are equivalent, i.e. they
have the same weight spectrum. Because of this, the previous relation determines
the systematic code weight spectrum as well. But, for large values of n and k, the
finding of weight spectrum by computer search is very time consuming.
However, the procedure can be accelerated having in view that for the case
k > (n − k) it is simpler to find the weight spectrum of a dual code. If with B(x) the
spectrum polynomial of dual code is denoted, where the coefficients Bd determine
the spectrum, than the MacWilliams identities can be formulated giving the cor-
respondence between the dual codes spectra written as polynomials [27]
   
1x 1x
AðxÞ ¼ 2ðnkÞ ð1 þ xÞn B ; BðxÞ ¼ 2k ð1 þ xÞn A :
1þx 1þx

In this case the spectrum of dual code (for which Gd,s = Hs) is described by a
polynomial

Bð xÞ ¼ 1 þ 2x7 þ x8 ;

and the spectrum of the code defined by G is determined by a first MacWilliams


identity
  "     #
2 0 1x 2 10 1x 7 1x 8
AðxÞ ¼ 2 ð1 þ xÞ B ¼ 2 ð1 þ xÞ 1 þ 2 þ
1þx 1þx 1þx
h i
¼ 22 ð1 þ xÞ10 þ 2ð1  xÞ7 ð1 þ xÞ3 þ ð1  xÞ8 ð1 þ xÞ2
¼ 1 þ x þ 15x2 þ 63x3 þ 122x4 þ 186x5 þ 238x6 þ 206x7 þ 117x8 þ 53x9 þ 19x10 þ 3x11 :

Spectra of a dual code (12, 2) and the original code (12, 10) are shown in
Fig. 5.16a and b, respectively. The dual code has a possibility to correct all single,
204 5 Block Codes

(a) (b)

d
Number of code words with given weight, ad

Number of codewords with given weight, a


250
2

200

1.5
150

1
100

0.5 50

0 0
0 2 4 6 8 10 12 0 2 4 6 8 10 12

Hamming weight, d Hamming weight, d

Fig. 5.16 Spectrum of dual code (12, 2) (a) and the original code (12, 10) (b) defined by G

double and triple errors in the code word having the length n = 12 bits, while the
code defined by G cannot even detect all single errors.
(c) As it was explained in the previous problems, the probability that this linear
code does not detect the error is

X
n X
n
p d
Pe;d ðpÞ ¼ ad pd ð1  pÞnd ¼ ð1  pÞn ad ð Þ ;
d¼dmin d¼dmin
1p

and because a(0) = 1 and a(d) = 0 for 1  d  dmin always holds, the previous
relation becomes
" #  
n
X
n
p d n p
Pe;d ðpÞ ¼ ð1  pÞ ad ð Þ  1 ¼ ð1  pÞ Að Þ1 :
d¼0
1p 1p

Use of MacWilliams identity yields


   
p n 1  p=ð1  pÞ
Pe;d ðpÞ ¼ ð1  pÞn 2ðnkÞ ð1 þ Þ B 1 ;
1p 1 þ p=ð1  pÞ

and finally

X
n
Pe;d ðpÞ ¼ 2ðnkÞ bd ð1  2pÞd  ð1  pÞn :
d¼0

On the other hand, the probability that the dual code (12, 2) does not detect the
transmission error is given by
Problems 205

0
10

Probability of missed detection, P (p)


e

−5
10

original code, G
dual code, G
dual

−10
10

−15
10
−3 −2 −1
10 10 10
Crossover probability in BSC, p

Fig. 5.17 The probability that the codes do not detect the code word error versus the crossover
probability

X
n
Pe;d;dual ðpÞ ¼ bd pd ð1  pÞnd :
d¼dmin;dual

The probability that the codes do not detect the error versus the channel error
probability is shown in Fig. 5.17.
Problem 5.10
(a) Explain the construction of the Reed-Muller (RM) codes with parameters (8,
4) and (8, 7). Draw code distances spectra and find their parity-check matrices.
(b) Draw the spectrum of the code obtained if the 5th, 6th and 7th rows of the
matrix G of the code RM(8, 7) are formed by using vector addition instead of
vector multiplication. Comment on the result.
(c) Explain the majority decoding logic procedure for the code (8, 4) and then
decode the word r = (00000010).
(d) If it is known that a data transmission system uses a Reed-Muller code
which corrects one error and detects two errors in the code word, decode the
received word r2 = (0000001000000000).

Solution
It is possible to construct, for any natural number m and any natural number r < m, a
Reed-Muller code having code word length n = 2^m with minimum Hamming

Table 5.11 Truth table for the Boolean function "or"

x1         0 0 1 1
x2         0 1 0 1
f(x1, x2)  0 1 1 1

distance d_min = 2^{m−r} [28, 29]. To the Reed-Muller code of rth order, denoted by
RM(2^m, k, 2^{m−r}), corresponds the number of information bits

k = \sum_{i=0}^{r} \binom{m}{i}, \qquad r \le m.

To describe the Reed-Muller code construction consider Boolean function of


two variables [corresponding to the operator logical “or”, i.e. f(x1, x2) = OR(x1,
x2)], given in Table 5.11.
If every row of the table is rewritten as a vector, i.e. x1 = (0011), x2 = (0101)
and f = (0111), the relation f = x1 ⊕ x2 ⊕ x1x2 is valid, where ⊕ and juxtaposition
denote the operators of vector addition and multiplication modulo-2 (the operations
are carried out bit-by-bit).
It can be easily shown that any Boolean function of m variables can be written as
a linear combination of the corresponding vectors, i.e.

f = a_0 \cdot 1 \oplus a_1 x_1 \oplus a_2 x_2 \oplus \dots \oplus a_m x_m \oplus a_{1,2}(x_1 x_2) \oplus a_{1,3}(x_1 x_3) \oplus \dots \oplus a_{m-1,m}(x_{m-1} x_m)
  \oplus a_{1,2,3}(x_1 x_2 x_3) \oplus \dots \oplus a_{m-2,m-1,m}(x_{m-2} x_{m-1} x_m) \oplus \dots \oplus a_{1,2,\dots,m}(x_1 x_2 \cdots x_m),

where 1 is a vector consisting of 2^m binary ones. The generator matrix of the
Reed-Muller code is completely determined by the above equality, in such a way
that the ith row of matrix G corresponds to the ith element of this sum; the number
of generator matrix rows (k < n) is determined by the above relation as well.
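The construction just described is easy to script. The following Python sketch (an illustration, not part of the book) builds the generator matrices of the first- and second-order codes for m = 3 from the vectors 1, x1, x2, x3 and their bitwise products; the variable names are arbitrary.

# Illustrative sketch: Reed-Muller generator matrices for m = 3 built from the
# Boolean-variable vectors (x1 is the MSB of the column index, x3 the LSB).
import itertools
import numpy as np

m = 3
n = 2 ** m
cols = list(itertools.product([0, 1], repeat=m))          # all 3-bit combinations
x = [np.array([c[i] for c in cols]) for i in range(m)]    # x[0] = x1, ..., x[2] = x3
ones = np.ones(n, dtype=int)

G_8_4 = np.vstack([ones, x[0], x[1], x[2]])                        # first order: RM(8, 4)
G_8_7 = np.vstack([G_8_4, x[0] * x[1], x[0] * x[2], x[1] * x[2]])  # second order: RM(8, 7)
print(G_8_4)
print(G_8_7)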
(a) For a code word length n = 8 it is obtained m = ld(n) = 3, and the minimum
Hamming distance is always dmin = 2^{m−r}, therefore
– for r = 1, dmin,1 = 2^{3−1} = 4, and the code corrects e_c = ⌊(dmin,1 − 1)/2⌋ = 1 error,
– for r = 2, dmin,2 = 2^{3−2} = 2, and the code corrects e_c = ⌊(dmin,2 − 1)/2⌋ = 0 errors,
and the corresponding information word lengths are

k_1 = 1 + \binom{3}{1} = 4, \qquad k_2 = 1 + \binom{3}{1} + \binom{3}{2} = 7.

For the case r = 1, the generator matrix rows are determined by vectors 1, x1, x2,
x3, where x1 corresponds to a bit of the maximum weight (MSB) and x3 corre-
sponds to a bit of minimum weight (LSB) of the corresponding three-bit combi-
nations resulting in

G_{(8,4)} = \begin{bmatrix} 1&1&1&1&1&1&1&1 \\ 0&0&0&0&1&1&1&1 \\ 0&0&1&1&0&0&1&1 \\ 0&1&0&1&0&1&0&1 \end{bmatrix} = \begin{bmatrix} G_{(8,4)}^{(0)} \\ G_{(8,4)}^{(1)} \end{bmatrix}.

Weight spectrum of the code is shown in Fig. 5.18a. It is obvious that the
minimum code Hamming distance is dmin,1 = 4 and that the code is equivalent to
Hamming code (8, 4) which has the code rate R1 = 4/8 = 0.5. It is easy to verify
that this code is self-dual, because of

G_{(8,4)}\, G_{(8,4)}^{T} = 0_{4\times 4} \;\Rightarrow\; H_{(8,4)} = G_{(8,4)}.

When r = 2, the generator matrix rows are determined by the vectors 1, x1, x2, x3,
x1x2, x1x3, x2x3, and the corresponding generator matrix is

G_{(8,7)} = \begin{bmatrix} 1&1&1&1&1&1&1&1 \\ 0&0&0&0&1&1&1&1 \\ 0&0&1&1&0&0&1&1 \\ 0&1&0&1&0&1&0&1 \\ 0&0&0&0&0&0&1&1 \\ 0&0&0&0&0&1&0&1 \\ 0&0&0&1&0&0&0&1 \end{bmatrix} = \begin{bmatrix} G_{(8,7)}^{(1)} \\ G_{(8,7)}^{(2)} \\ G_{(8,7)}^{(3)} \end{bmatrix}.

It is easy to verify that the corresponding parity-check matrix is

H_{(8,7)} = [\,1\ 1\ 1\ 1\ 1\ 1\ 1\ 1\,],

and this code corresponds to a simple parity check, its spectrum being shown in
Fig. 5.18b.

Fig. 5.18 Weight spectrum of RM(8, 4) (a) and RM(8, 7) (b) code

If the decoding is carried out by using the relation

S = r\, H_{(8,7)}^{T},

it is clear that the result can be equal to zero or to one, i.e. the syndrome consists of
one bit (scalar). If S = 0 the conclusion can be drawn that during the transmission
there were no errors (or an even number of errors occurred), while if S = 1 the
conclusion can be drawn that one error occurred (or an odd number of errors
occurred). Therefore, single errors can be detected, but not corrected, because their
position cannot be found, which corresponds to the value dmin,2 = 2 used for a code
construction.
(b) If, for obtaining the last three rows of the code (8, 7) generator matrix, the
addition was used instead of the multiplication (as in the Reed-Muller algorithm),
the generator matrix rows (vectors) would have been 1, x1, x2, x3, x1⊕x2, x1⊕x3,
x2⊕x3, yielding

G_{(8,7)} = \begin{bmatrix} 1&1&1&1&1&1&1&1 \\ 0&0&0&0&1&1&1&1 \\ 0&0&1&1&0&0&1&1 \\ 0&1&0&1&0&1&0&1 \\ 0&0&1&1&1&1&0&0 \\ 0&1&1&0&0&1&1&0 \\ 0&1&0&1&1&0&1&0 \end{bmatrix},

and to this code corresponds the parity-check matrix

H_{(8,7)} = [\,1\ 1\ 1\ 1\ 1\ 1\ 1\ 1\,],

equivalent to the simple parity-check.


However, from code weight spectrum (shown in Fig. 5.17) it is clear that
minimum Hamming weight is dmin = 4 showing that the code has the minimum
Hamming distance dmin = 4 as well, meaning further that it can be used to correct
one and to detect two errors in the code word, but with the higher code rate
(R = 7/8 = 0.875).
The spectrum in Fig. 5.19 is correctly found, but the conclusions drawn from its
pattern are wrong. It is clear that there are eight code words consisting of all zeros
and 112 code words consisting of four ones. Therefore, the same code words
correspond to various information bit combinations. The same can be concluded by
observing that the number of possible "four ones" combinations at eight positions is

Fig. 5.19 Weight spectrum of the code having the generator matrix G2

 
\binom{8}{4} = \frac{8!}{4!\,4!} = 70 < 112.

Therefore, the transformation done by the encoder is not of 1:1 type, and the set
of words at the encoder output is not even a block code (and, of course, not a linear
block code). This is the reason why in this case (and in general) the minimum
Hamming distance cannot be determined on the basis of the minimum Hamming
weight, and especially not the number of errors detectable and correctable by this code.
(c) From the generator matrix

G_{(8,4)} = \begin{bmatrix} 1&1&1&1&1&1&1&1 \\ 0&0&0&0&1&1&1&1 \\ 0&0&1&1&0&0&1&1 \\ 0&1&0&1&0&1&0&1 \end{bmatrix} = \begin{bmatrix} G_{(8,4)}^{(0)} \\ G_{(8,4)}^{(1)} \end{bmatrix}

it is obvious that the code words are formed according to the rules

c_1 = i_1,\; c_2 = i_1 \oplus i_4,\; c_3 = i_1 \oplus i_3,\; c_4 = i_1 \oplus i_3 \oplus i_4,\; c_5 = i_1 \oplus i_2,\; c_6 = i_1 \oplus i_2 \oplus i_4,\; c_7 = i_1 \oplus i_2 \oplus i_3,\; c_8 = i_1 \oplus i_2 \oplus i_3 \oplus i_4.

Sums orthogonal to the fourth information bit are



S_1^{(4)} = r_1 \oplus r_2 = c_1 \oplus c_2 \oplus e_1 \oplus e_2 = i_1 \oplus (i_1 \oplus i_4) \oplus e_1 \oplus e_2 = i_4 \oplus e_1 \oplus e_2
S_2^{(4)} = r_3 \oplus r_4 = c_3 \oplus c_4 \oplus e_3 \oplus e_4 = (i_1 \oplus i_3) \oplus (i_1 \oplus i_3 \oplus i_4) \oplus e_3 \oplus e_4 = i_4 \oplus e_3 \oplus e_4
S_3^{(4)} = r_5 \oplus r_6 = c_5 \oplus c_6 \oplus e_5 \oplus e_6 = (i_1 \oplus i_2) \oplus (i_1 \oplus i_2 \oplus i_4) \oplus e_5 \oplus e_6 = i_4 \oplus e_5 \oplus e_6
S_4^{(4)} = r_7 \oplus r_8 = c_7 \oplus c_8 \oplus e_7 \oplus e_8 = (i_1 \oplus i_2 \oplus i_3) \oplus (i_1 \oplus i_2 \oplus i_3 \oplus i_4) \oplus e_7 \oplus e_8 = i_4 \oplus e_7 \oplus e_8

each depending only on the fourth information bit and on the error vector bits.
Further, no error vector bit influences the values of two (or more) of these sums.
Sums orthogonal to the third information bit are

S_1^{(3)} = r_1 \oplus r_3 = i_3 \oplus e_1 \oplus e_3, \quad S_2^{(3)} = r_2 \oplus r_4 = i_3 \oplus e_2 \oplus e_4,
S_3^{(3)} = r_5 \oplus r_7 = i_3 \oplus e_5 \oplus e_7, \quad S_4^{(3)} = r_6 \oplus r_8 = i_3 \oplus e_6 \oplus e_8,

while sums orthogonal to the second information bit are

S_1^{(2)} = r_1 \oplus r_5 = i_2 \oplus e_1 \oplus e_5, \quad S_2^{(2)} = r_2 \oplus r_6 = i_2 \oplus e_2 \oplus e_6,
S_3^{(2)} = r_3 \oplus r_7 = i_2 \oplus e_3 \oplus e_7, \quad S_4^{(2)} = r_4 \oplus r_8 = i_2 \oplus e_4 \oplus e_8.

On the basis of the calculated orthogonal sums, by majority logic decoding the
corresponding information bit values are estimated

(S_1^{(4)}, S_2^{(4)}, S_3^{(4)}, S_4^{(4)}) \rightarrow i_4'
(S_1^{(3)}, S_2^{(3)}, S_3^{(3)}, S_4^{(3)}) \rightarrow i_3'
(S_1^{(2)}, S_2^{(2)}, S_3^{(2)}, S_4^{(2)}) \rightarrow i_2'

Using the obtained estimations, the code sequence is formed, with the influence of
the decoded information bits removed,

r' = r \oplus (i_2'\; i_3'\; i_4')\, G_{(8,4)}^{(1)},

and by majority logic from r′ the information bit i1 is estimated.
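The whole procedure can be condensed into a short Python sketch (an illustration only, not the book's program). The check-sum index pairs follow directly from the code word forming rules given above, and the expected output for the received word treated next is shown in the final comment.

# Illustrative majority-logic decoder for the RM(8, 4) code with rows 1, x1, x2, x3.
import numpy as np

X1 = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X2 = np.array([0, 0, 1, 1, 0, 0, 1, 1])
X3 = np.array([0, 1, 0, 1, 0, 1, 0, 1])

def majority(bits):
    return int(sum(bits) * 2 > len(bits))

def decode_rm_8_4(r):
    r = np.array(r) % 2
    # orthogonal check sums: position pairs differing only in x3, x2 or x1
    s4 = [r[0] ^ r[1], r[2] ^ r[3], r[4] ^ r[5], r[6] ^ r[7]]
    s3 = [r[0] ^ r[2], r[1] ^ r[3], r[4] ^ r[6], r[5] ^ r[7]]
    s2 = [r[0] ^ r[4], r[1] ^ r[5], r[2] ^ r[6], r[3] ^ r[7]]
    i4, i3, i2 = majority(s4), majority(s3), majority(s2)
    # remove the influence of the decoded bits, then decide i1 by majority
    r1 = (r + i2 * X1 + i3 * X2 + i4 * X3) % 2
    i1 = majority(r1)
    return (i1, i2, i3, i4)

print(decode_rm_8_4([0, 0, 0, 0, 1, 1, 1, 0]))   # prints (0, 1, 0, 0), i.e. i' = (0100)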


For the considered case r = (00001110), i.e. r5 = r6 = r7 = 1, while the other
bits of the vector r are equal to zero. The majority decision is based on the estimations

(S_1^{(4)}, S_2^{(4)}, S_3^{(4)}, S_4^{(4)}) = (0, 0, 0, 1) \rightarrow i_4' = 0
(S_1^{(3)}, S_2^{(3)}, S_3^{(3)}, S_4^{(3)}) = (0, 0, 0, 1) \rightarrow i_3' = 0
(S_1^{(2)}, S_2^{(2)}, S_3^{(2)}, S_4^{(2)}) = (1, 1, 1, 0) \rightarrow i_2' = 1

finally yielding

r' = r \oplus (1\ 0\ 0)\, G_{(8,4)}^{(1)} = (0\,0\,0\,0\,1\,1\,1\,0) \oplus (0\,0\,0\,0\,1\,1\,1\,1) = (0\,0\,0\,0\,0\,0\,0\,1) \rightarrow i_1' = 0.

The decoded sequence is i′ = (0100) corresponding to the code word written in


the generator matrix second row c′ = (00001111). The difference between the
received word and reconstructed code word is in one bit only, and the conclusion
can be drawn that the decoder corrected the error on the 8th code word bit.
(d) For a code word length n = 16 the value m = ld(n) = 4 is obtained, and for the
minimum Hamming distance dmin = 4 the values e_c = ⌊(dmin − 1)/2⌋ = 1
and e_d = dmin − e_c − 1 = 2 are obtained, while the number of information bits is

k = 1 + \binom{4}{1} = 5;

therefore, it is the RM(16, 5) code having the generator matrix

G = \begin{bmatrix} 1&1&1&1&1&1&1&1&1&1&1&1&1&1&1&1 \\ 0&0&0&0&0&0&0&0&1&1&1&1&1&1&1&1 \\ 0&0&0&0&1&1&1&1&0&0&0&0&1&1&1&1 \\ 0&0&1&1&0&0&1&1&0&0&1&1&0&0&1&1 \\ 0&1&0&1&0&1&0&1&0&1&0&1&0&1&0&1 \end{bmatrix} = \begin{bmatrix} G_1 \\ G_2 \end{bmatrix}.

Orthogonal sums for i5 include bits of the received word having the ordinal
number determined by the ordinal numbers of column pairs which should be
summed (element by element) to obtain only one “one” in the fifth row, i.e.

i_5: S_1 = r_1 \oplus r_2,\; S_2 = r_3 \oplus r_4,\; S_3 = r_5 \oplus r_6,\; S_4 = r_7 \oplus r_8,\; S_5 = r_9 \oplus r_{10},\; S_6 = r_{11} \oplus r_{12},\; S_7 = r_{13} \oplus r_{14},\; S_8 = r_{15} \oplus r_{16};

therefore, the adjacent bits are summed and the decision is based on the majority
logic.
Orthogonal sums corresponding to bits i4, i3 and i2 are

i_4: S_1 = r_1 \oplus r_3,\; S_2 = r_2 \oplus r_4,\; S_3 = r_5 \oplus r_7,\; S_4 = r_6 \oplus r_8,\; S_5 = r_9 \oplus r_{11},\; S_6 = r_{10} \oplus r_{12},\; S_7 = r_{13} \oplus r_{15},\; S_8 = r_{14} \oplus r_{16};
i_3: S_1 = r_1 \oplus r_5,\; S_2 = r_2 \oplus r_6,\; S_3 = r_3 \oplus r_7,\; S_4 = r_4 \oplus r_8,\; S_5 = r_9 \oplus r_{13},\; S_6 = r_{10} \oplus r_{14},\; S_7 = r_{11} \oplus r_{15},\; S_8 = r_{12} \oplus r_{16};
i_2: S_1 = r_1 \oplus r_9,\; S_2 = r_2 \oplus r_{10},\; S_3 = r_3 \oplus r_{11},\; S_4 = r_4 \oplus r_{12},\; S_5 = r_5 \oplus r_{13},\; S_6 = r_6 \oplus r_{14},\; S_7 = r_7 \oplus r_{15},\; S_8 = r_8 \oplus r_{16};

and the received bits that are 2, 4 and 8 positions apart are summed, respectively.
As for i5, the value of every transmitted bit is found based on the majority decision,
and the estimations are denoted by i2′, i3′, i4′ and i5′.

In the considered case r = (0000001000000000), i.e. only r7 = 1, while all
other received word bits are equal to zero. The majority decision is based on the
estimations

i_5: (S_1 S_2 S_3 S_4 S_5 S_6 S_7 S_8) = (00010000) \rightarrow i_5' = 0
i_4: (S_1 S_2 S_3 S_4 S_5 S_6 S_7 S_8) = (00100000) \rightarrow i_4' = 0
i_3: (S_1 S_2 S_3 S_4 S_5 S_6 S_7 S_8) = (00100000) \rightarrow i_3' = 0
i_2: (S_1 S_2 S_3 S_4 S_5 S_6 S_7 S_8) = (00000010) \rightarrow i_2' = 0

finally yielding

r' = r \oplus (0\,0\,0\,0)\, G_1 = (0000001000000000) \rightarrow i_1' = 0.

The decoded information word is i = (00000) and the reconstructed code word
(all zeros) differs from the received one in one bit.
Problem 5.11 Explain the construction of the following arithmetic codes
(a) binary Brown code defined by relation c = 19i + 61, for the base b = 8;
(b) binary Varshamov code, code word length n = 4;
(c) ternary Varshamov code, code word length n = 4;
For every problem part find the code words, weight spectrum, minimum
Hamming distance and the number of correctable errors by the code. Explain the
notion of asymmetric error and verify whether the codes (b) and (c) satisfy the
Varshamov-Gilbert bound.
Solution
The arithmetic block codes construction is based on the arithmetic operation
connecting decimal representations of information and a code word. These relations
are usually very simple and an overview of some typical constructions follows.
(a) The construction method proposed by David Brown in 1960 uses the rule
c = Ai + B, where c represents the decimal equivalent of the code word and i
the decimal equivalent of the information word [30]. The coefficients A and B are
chosen so that the binary representations of all code words have suitable
features, and the decimal equivalent of the information word has to fulfill the
condition i ≤ b − 1. The basis b can be any integer greater than two; in the
case b = 2^k the binary equivalent of the number i can be any combination of k bits,
and the set of code words consists of 2^k n-bit combinations.
For b = 8 in this case, the decimal equivalents of information words are the numbers
0 ≤ i ≤ 7 and the decimal equivalents of code words can be found using the relation
c = 19i + 61; the corresponding set of code words is given in Table 5.12. The code
word length is determined by the number of bits needed to represent the largest
decimal number c (here cmax = 194), yielding n = 8, and the corresponding code
rate is R = k/n = 3/8.

Table 5.12 Illustration of Brown code (8, 3) construction

Information word i (binary)   i (decimal)   Code word c = 19i + 61 (decimal)   c (binary)
000                           0             61                                  00111101
001                           1             80                                  01010000
010                           2             99                                  01100011
011                           3             118                                 01110110
100                           4             137                                 10001001
101                           5             156                                 10011100
110                           6             175                                 10101111
111                           7             194                                 11000010
Fig. 5.20 Code distances spectrum of Brown code (8, 3)

It is obvious that the set of code words does not include combination “all zeros”
leading to the conclusion that the code is not a linear one. This is the reason that the
minimum Hamming distance cannot be found from the code words weights,
because the sum of two code words may not be a code word. The distance spectrum
of nonlinear codes is obtained by comparing all pairs of code words. In this case
there is a total of 2^k(2^k − 1)/2 = 28 code word pairs and the corresponding spectrum
is shown in Fig. 5.20. Minimum Hamming distance is d = 3 and the code can
correct all single errors, by a search to find the code word which differs from the
received word in one bit. It is interesting that these codes besides the single errors
can correct as well some specific types of multiple errors.
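The construction and the distance calculation are easily verified by a few lines of Python (an illustration, not the book's software): the code words of Table 5.12 are generated from c = 19i + 61 and the minimum Hamming distance is found by comparing all pairs.

# Illustrative sketch: Brown code (8, 3) code words and minimum distance.
from itertools import combinations

A, B, k, n = 19, 61, 3, 8
codewords = [format(A * i + B, '0{}b'.format(n)) for i in range(2 ** k)]

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

dmin = min(hamming(u, v) for u, v in combinations(codewords, 2))
print(codewords)        # ['00111101', '01010000', ..., '11000010']
print('dmin =', dmin)   # 3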

(b) Rom Varshamov (Ром Варшамов) in 1965 proposed a method for block
code construction where the ordered n-tuples c = (c1, c2, …, cn) are considered,
where every symbol can be an integer from the set {0, 1, …, q − 1} [31].
This ordered n-tuple is a code word only if the following holds

\left(\sum_{i=1}^{n} i\, c_i\right) \bmod (n+1) = 0.
Therefore, all the words c = (c1, c2, …, cn) satisfying the above condition form a
so-called integer block code, not necessarily linear in the general case. This
procedure can be modified by performing the summation modulo some other m > n (e.g.
m = 2n + 1), the remainder being equal to some fixed number, chosen in advance, from the
set {0, 1, …, m − 1}.
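The construction is easy to verify by exhaustive search. The Python sketch below (an illustration, not part of the book) enumerates the q-ary words of length n that satisfy the Varshamov condition; the two calls at the end correspond to the binary and ternary cases treated in parts (b) and (c).

# Illustrative sketch: enumerate the words satisfying sum(i * c_i) mod m = residue.
from itertools import product

def varshamov_code(q, n, m, residue=0):
    words = []
    for c in product(range(q), repeat=n):
        if sum((i + 1) * ci for i, ci in enumerate(c)) % m == residue:
            words.append(c)
    return words

print(varshamov_code(2, 4, 5))   # [(0,0,0,0), (0,1,1,0), (1,0,0,1), (1,1,1,1)]
print(varshamov_code(3, 4, 9))   # the nine ternary code words from part (c)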
For q = 2, it is a binary code, and in the case n = 4 and m = n + 1 = 5 only the
words (0000), (1001), (0110) and (1111) satisfy the relation

\left(\sum_{i=1}^{4} i\, c_i\right) \bmod 5 = 0;

it is easy to verify that the code is linear and that its generator matrix is

G = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix},

while the weight spectrum is determined by a(0) = 1, a(2) = 2, a(3) = 0 and
a(4) = 1 (Fig. 5.21a).

Fig. 5.21 Weight spectrum (a) and the code distances spectrum of binary Varshamov code (4, 2)
(b)

It is interesting to note that the weight spectrum is not fully equivalent to the
distance spectrum even when the code is linear. The code distance spectrum
is found by comparing all different code words (giving always a(0) = 0!),
and there is a total of 2^k(2^k − 1)/2 = 6 combinations, shown in Fig. 5.21b. E.g.
there are two code word pairs ((0000) and (1111), (0110) and (1001)) where the
Hamming distance equals 4, but there is only one word having the Hamming
weight d = 4. However, the minimum distance and the minimum weight are the same,
and the spectrum shapes are the same for d ≠ 0.
Minimum Hamming distance is dmin = 2 and the code cannot correct all single
errors. However, if it is supposed that the error probability for binary zero is much
higher than the error probability for binary one (the corresponding binary channel
has parameters P(1/0) ≫ P(0/1) ≈ 0, which is the case for some magnetic recording
systems), i.e. the errors are "asymmetric", an asymmetric error can result in the
following transitions

(0000) → (0000), (0001), (0010), (0100), (1000);  (1111) → (1111);
(0110) → (0110), (1110), (0111);  (1001) → (1001), (1101), (1011).

Every received vector corresponding to the right-hand sides of the denoted transitions
can originate from only one code word, and the code can correct one asymmetric
error. The reader should verify whether this code can correct one error when
P(0/1) ≫ P(1/0) ≈ 0, and whether this code can correct two asymmetric errors.
The Varshamov-Gilbert bound gives a lower bound on the maximum possible number of
code words for a given code basis q, code word length n and minimum code
Hamming distance dmin,

A_q(n, d) \ge q^n \Big/ \sum_{j=0}^{d-1} \binom{n}{j} (q-1)^j.

For the considered example q = 2, n = 4 and d = 2, yielding
A_q(n, d) ≥ 2^4/(1 + 4) = 3.2, which means that it is possible to construct a code
having at least four code words under the given conditions. It is obvious that the
constructed linear block code confirms that it is possible to construct at least one
code satisfying the bound.
(c) For a ternary code, the code words are formed by the numbers from set {0, 1,
2}. For n = 4 and m = 2n + 1 = 9 the relation [32]

\left(\sum_{i=1}^{4} i\, c_i\right) \bmod 9 = 0

Fig. 5.22 Code distance spectrum for the ternary Varshamov code (4, 2)

is satisfied by the code words (0000), (0111), (0222), (1002), (1120), (1201), (2011),
(2122), (2210) (nine code words in total). This code is not linear, although it includes
the identity element for addition, because e.g.

(0111) + (1002) = (1110),

which is not a code word, meaning that the closure for addition is not satisfied.
The Varshamov-Gilbert bound for q = 3, n = 4 and d = 2 yields

A_q(n, d) \ge \frac{3^4}{1 + 4 \cdot 2} = 9,

in this case satisfied with equality.


Code distance spectrum was found by comparison all 3 k(3k – 1)/2 = 36 code
words pairs, shown in Fig. 5.22. In this case the minimum Hamming distance is
dmin = 2 as well, and a code corrects the error at one symbol in the code word, if it
is asymmetric. It is important to note that for an integer code, the correcting
capability does not depend on the error weight. In this case the error in the symbols
can result in transitions 0 ! 1 and 0 ! 2, but the code is capable to correct the
error in one symbol, for every of these weight.
Problem 5.12 An integer linear block code defined over the integer ring Z_3^4 has
the parity-check matrix

H = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 2 \end{bmatrix}.

(a) Explain the notion of an integer linear block code.


(b) Find the generator matrix of the code. Is it unique?
(c) List all code words, draw its spectrum and find a minimum Hamming distance.
(d) Write the generalized Hamming bound for integer linear block codes. Which
error weights (and at which positions) can this code correct? Find the corresponding
syndromes and comment on the result.
(e) Find the correction capability of a code defined over the ring Z_5^6 having the
parity-check matrix

H_2 = \begin{bmatrix} 0 & 1 & 2 & 3 & 4 & 1 \\ 1 & 1 & 1 & 1 & 1 & 0 \end{bmatrix}.

Solution
(a) For integer codes the information and code word symbols take values from
the set Z_q = {0, 1, 2, …, q − 1}, while all operations during the code word
forming are performed modulo q.
For any two elements a and b from ring Zq two operations are defined—addition
and multiplication. Ring properties (axioms) are:
1. The set {0, 1, 2, …, q − 1} is an additive Abelian group and 0 is an identity
element for addition modulo-q (denoted by a + b).
2. For the multiplication (denoted by ab), the product is in the ring, ab ∈ Z_q
(closure).
3. The multiplication is associative a(bd) = (ab)d.
4. The multiplication is distributive in relation to addition, i.e. a(b + d) = ab + ad,
(b + d)a = ba + da.
A ring can be commutative (if ab = ba for any pair of elements), but it is not
always the case. Every ring element has an inverse element for addition (their sum
equals 0), but it should be noted that an inverse element for multiplication is not
needed, and the division cannot be defined in that case. If these inverse elements are
in the ring Zq as well and if the multiplication group is commutative, the ring would
become a field. The ring Z_q is commutative and the subtraction is defined
by

a − b = a + (q − b),

but the inverse elements for multiplication are not defined (the rational numbers are
not in the set).
A new ring Z_q^n has q^n ordered n-tuples, whose symbols are from the originating
ring Z_q, and Z_q^n can be regarded as the nth extension of the ring Z_q. Now an integer
code with basis q and parameters (n, k) can be defined as a set of code words,
denoted by c, satisfying the conditions [33]

\{\, c \in Z_q^n,\; cH^T = 0 \,\}.

The first condition shows that a code word must be an ordered n-tuple from the
ring Z_q^n, and the matrix H is a parity-check matrix of dimensions (n − k) × n, with
elements from the ring Z_q. Of course, the capability of the integer code to correct
errors depends on the parity-check matrix.
(b) In this case n = 4 and n − k = 2, and the generator matrix can be found on the
basis of the relation

GH^T = 0 \;\Rightarrow\; \begin{bmatrix} g_{11} & g_{12} & g_{13} & g_{14} \\ g_{21} & g_{22} & g_{23} & g_{24} \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix};

the code is a ternary one and all operations are modulo q = 3. The previous matrix
equation can be written in the expanded form

g_{12} + g_{13} + g_{14} = 0, \quad g_{22} + g_{23} + g_{24} = 0,
g_{11} + g_{13} + 2g_{14} = 0, \quad g_{21} + g_{23} + 2g_{24} = 0,

and it is obvious that this system does not have a unique solution. E.g. one solution is
g11 = 2, g12 = 2, g13 = 0, g14 = 1, g21 = 0, g22 = 1, g23 = 1, g24 = 1 and another
could be g11 = 0, g12 = 1, g13 = 1, g14 = 1, g21 = 2, g22 = 2, g23 = 0, g24 = 1.
It is interesting to note that the code is self-dual, because

\begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 2 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix},

and one solution for a generator matrix is

G = H = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 2 \end{bmatrix}.

(c) All code words of the code can be found by multiplying all possible information
words by the generator matrix, yielding

   
c_1 = i_1 G = [0\ 0]\,G = [0\ 0\ 0\ 0], \quad c_2 = i_2 G = [0\ 1]\,G = [1\ 0\ 1\ 2],
c_3 = i_3 G = [0\ 2]\,G = [2\ 0\ 2\ 1], \quad c_4 = i_4 G = [1\ 0]\,G = [0\ 1\ 1\ 1],
c_5 = i_5 G = [1\ 1]\,G = [1\ 1\ 2\ 0], \quad c_6 = i_6 G = [1\ 2]\,G = [2\ 1\ 0\ 2],
c_7 = i_7 G = [2\ 0]\,G = [0\ 2\ 2\ 2], \quad c_8 = i_8 G = [2\ 1]\,G = [1\ 2\ 0\ 1],
c_9 = i_9 G = [2\ 2]\,G = [2\ 2\ 1\ 0].

Because the code is linear, the Hamming distance between any two code words
corresponds to the Hamming weight of the code word obtained as their difference. The
Hamming weight corresponds to the number of non-zero elements in the code word,
and it can be seen from the list above that the minimum Hamming distance of this code is
dmin = 3 (e.g. the code word (0111) has weight 3). The corresponding distance spectrum
is shown in Fig. 5.23.
(d) An integer code can correct e_c errors of weight t if it is possible to correct all
error vectors e = (e1, e2, …, en) having the Hamming weight w(e) ≤ e_c, where
e_i ∈ {−t, −t+1, …, t−1, t}. To achieve this feature, the syndromes corresponding
to all correctable error patterns must be different, and a generalization of the
Hamming bound can be defined,

q^{n-k} \ge \sum_{i=0}^{e_c} \binom{n}{i} (2t)^i.
Fig. 5.23 Distance spectrum of the integer code defined by matrix H



As previously concluded q = 3 and e_c = 1, yielding

3^{4-2} \ge 1 + \binom{4}{1} \cdot 2t \;\Rightarrow\; 9 \ge 1 + 8t.

Therefore, the errors of the type −1 and +1 at any code word position can be
corrected. Taking into account that for operations modulo-3, the subtraction of one
(–1) corresponds to the adding of two (+2), the correctable error patterns are (0001),
(0002), (0010), (0020), (0100), (0200), (1000) and (2000).
It is obvious that the code can correct all of the listed error patterns, and the
corresponding syndromes are defined by

S = rH^T = eH^T,

where e is the corresponding error vector and r = c + e is the received vector
corresponding to the code word c.
The list of syndromes corresponding to the correctable error patterns is as follows

S_1 = [0\ 0\ 0\ 1]H^T = [1\ 2], \quad S_2 = [0\ 0\ 0\ 2]H^T = [2\ 1],
S_3 = [0\ 0\ 1\ 0]H^T = [1\ 1], \quad S_4 = [0\ 0\ 2\ 0]H^T = [2\ 2],
S_5 = [0\ 1\ 0\ 0]H^T = [1\ 0], \quad S_6 = [0\ 2\ 0\ 0]H^T = [2\ 0],
S_7 = [1\ 0\ 0\ 0]H^T = [0\ 1], \quad S_8 = [2\ 0\ 0\ 0]H^T = [0\ 2].

Obviously, when there are no errors, the same syndrome is always obtained

S_0 = [0\ 0\ 0\ 0]H^T = [0\ 0].
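The syndrome decoding just described can be sketched in a few lines of Python (an illustration; the structure and names are not from the book). The table of syndromes of all single ±1 errors is built first, and a received word is then corrected by a single table lookup.

# Illustrative sketch: syndrome correction of one +/-1 error for the ternary
# integer code defined by the parity-check matrix H above.
import numpy as np

q = 3
H = np.array([[0, 1, 1, 1],
              [1, 0, 1, 2]])
n = H.shape[1]

table = {}
for pos in range(n):
    for val in (1, 2):                      # +1 and -1 (= +2 modulo 3)
        e = np.zeros(n, dtype=int)
        e[pos] = val
        table[tuple(np.mod(e @ H.T, q))] = (pos, val)

def correct(r):
    r = np.array(r) % q
    s = tuple(np.mod(r @ H.T, q))
    if s == (0, 0):
        return r                            # zero syndrome: no correctable error detected
    pos, val = table[s]
    r[pos] = (r[pos] - val) % q             # subtract the estimated error value
    return r

print(correct([0, 1, 1, 2]))                # code word (0, 1, 1, 1) with a +1 error corrected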

(e) In this case n = 6 and n − k = 2, the generator matrix has dimensions 4 × 6 and
can be found from

\begin{bmatrix} g_{11} & g_{12} & g_{13} & g_{14} & g_{15} & g_{16} \\ g_{21} & g_{22} & g_{23} & g_{24} & g_{25} & g_{26} \\ g_{31} & g_{32} & g_{33} & g_{34} & g_{35} & g_{36} \\ g_{41} & g_{42} & g_{43} & g_{44} & g_{45} & g_{46} \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 1 \\ 2 & 1 \\ 3 & 1 \\ 4 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix};

the code basis is q = 5 and the equation system corresponding to the first row of the
generator matrix is

g_{12} + 2g_{13} + 3g_{14} + 4g_{15} + g_{16} = 0,
g_{11} + g_{12} + g_{13} + g_{14} + g_{15} = 0,

whose one solution is g11 = g12 = g13 = g14 = g15 = 1, g16 = 0, and another is
g11 = 1, g12 = g15 = g16 = 0, g13 = g14 = 2.
Equations of a similar form can be written for the other rows of the generator
matrix as well, but the solutions chosen for the various rows of the matrix G have to be
linearly independent. One possible form of the generator matrix is
G = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 0 \\ 1 & 2 & 0 & 2 & 0 & 2 \\ 2 & 2 & 1 & 0 & 0 & 1 \\ 0 & 0 & 2 & 1 & 2 & 0 \end{bmatrix},

where one should be careful that the third and the fourth rows are not linear
combinations of the upper two (the operations are modulo-5).
The corresponding distance spectrum is shown in Fig. 5.24. The total number of
code words is 5^4 = 625 and the minimum Hamming distance is dmin = 3, so the code
can correct e_c = 1 error in the code word.
In this case q = 5 and e_c = 1, and the generalized Hamming bound is

5^{6-4} \ge 1 + \binom{6}{1} \cdot 2t \;\Rightarrow\; 25 \ge 1 + 12t;

here t = 2 and the errors of the type −2, −1, +1 and +2 can be corrected at any
position in the code word. The correctable error patterns are
Fig. 5.24 Distance spectrum for integer code defined by matrix H2



(000001), (000002), (000003), (000004), (000010), (000020), (000030), (000040),
(000100), (000200), (000300), (000400), (001000), (002000), (003000), (004000),
(010000), (020000), (030000), (040000), (100000), (200000), (300000), (400000);

therefore, the code can correct an error of any of these values at any single position
in the code word.

Brief Introduction to Algebra I

The aim of this subsection is to facilitate the study of block codes. Practically all
proofs are omitted. The corresponding mathematical rigor can be found in many
excellent textbooks [25, 27, 34].
Algebraic systems satisfy rules which are very often the same as applied to an
“ordinary number system”.
Groups
Let G be a set of elements (a, b, c, …). An operation (⊗) is defined for which
certain axioms hold. The operation is a procedure applied to two elements to obtain
a uniquely defined third one (it is in fact a binary operation). It can be denoted

f(a, b) = c,

but it is customary to write

a ⊗ b = c.

In the following it will very often be written

ab = c   or   a + b = c,

calling the operation "multiplication" or "addition", although the operation is not
necessarily the multiplication or addition of ordinary numbers.
Axioms for the group:
G1. (Closure) An application of the operation to any two group elements always
results in an element of the same group:

∀a, b ∈ G: a ⊗ b = c ∈ G.

G2. (Associativity) The order of performing the operation is immaterial:

∀a, b, c ∈ G: a ⊗ (b ⊗ c) = (a ⊗ b) ⊗ c = a ⊗ b ⊗ c.

G3. (Identity element) There is a unique identity (neutral) element e ∈ G satisfying
(even if the operation is not commutative)

∀a ∈ G: e ⊗ a = a ⊗ e = a.

If the operation is denoted by "+" (i.e. if it is called addition), the neutral element is
written as "0", and if the operation is denoted by "·" (i.e. if it is called multiplication), it
is written as "1".
G4. (Inverse elements) For every element a of the set there is always a unique
inverse (symmetric) element b of the set (even if the operation is not commutative)

∀a ∈ G, ∃b ∈ G: a ⊗ b = b ⊗ a = e.

If the operation is denoted by "+" (i.e. if it is called addition), the inverse element is
written as "−a", and if the operation is denoted by "·" (i.e. if it is called multiplication),
it is written as "a⁻¹".
Additionally, if a group satisfies the commutative law, i.e.

∀a, b ∈ G: a ⊗ b = b ⊗ a,

the group is called Abelian or commutative.

The uniqueness of the identity element as well as of the inverse elements can be
easily proved.
In every group (whether commutative or not) the following is satisfied

(a ⊗ b)⁻¹ = b⁻¹ ⊗ a⁻¹.

If the number of elements in a group is finite, the group is called a finite group
(the number of elements is called the order of G).
Examples:
1. The set of integers (positive, negative and zero) is a commutative group under
addition.
2. The set of positive rationals is a commutative group under multiplication.
3. The set of real numbers (excluding zero) is a commutative group under
multiplication.
4. The set of n × n real-valued matrices is a commutative group under matrix
addition.
5. The set of n × n nonsingular (det ≠ 0) matrices is a non-commutative group
under the matrix multiplication.
6. The functions f1(x) = x and f2(x) = 1/x under the operation

f1(x) ⊗ f2(x) = f1(f2(x))



are the only elements of a binary group (i.e. the group consisting of only two
elements). The corresponding table (for every finite group this table can be
constructed) for the operation is

⊗      f1(x)  f2(x)
f1(x)  f1(x)  f2(x)
f2(x)  f2(x)  f1(x)

The identity element is f1(x) and every element is its own inverse. It is a finite
group of order 2.
7. The linear transformations (rotations or reflections) of an equilateral triangle
with vertices A, B and C into itself are the elements of an algebraic group of order 6.

By denoting the transformation corresponding to the 120° counterclockwise
rotation as

(ABC → CAB)

and introducing the following notation (each transformation maps the vertices ABC
to the listed triple):

1 = (ABC → ABC)   (no rotation nor reflection)
a = (ABC → CAB)   (counterclockwise rotation by 120°)
b = (ABC → BCA)   (counterclockwise rotation by 240°)
c = (ABC → ACB)   (reflection about the bisector of angle A)
d = (ABC → CBA)   (reflection about the bisector of angle B)
e = (ABC → BAC)   (reflection about the bisector of angle C)

defining the operation

a ⊗ b

as the transformation a followed by the transformation b, the following table is
obtained:
⊗ 1 a b c d e
1 1 a b c d e
a a b 1 d e c
b b 1 a e c d
c c e d 1 b a
d d c e a 1 b
e e d c b a 1

The obtained group is not commutative (the table is not symmetric with respect
to the main diagonal). The corresponding inverses can be easily found from the
table.
8. The set of integers G = {0, 1, …, m − 1} is a group under modulo-m addition
(i.e. the result is obtained as the remainder from dividing the sum by m). For
m = 3, the table is the following

⊕ 0 1 2
0 0 1 2
1 1 2 0
2 2 0 1

The group is commutative.


9. The set of integers G = {1, 2, …, p − 1} is not generally a group under modulo-
p multiplication (i.e. the result is obtained as the remainder from dividing the
product by p). However, if p is a prime, G is a group under modulo-
p multiplication.
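The statements of Examples 8 and 9 are easy to verify programmatically. The following Python sketch (an illustration, not part of the book) checks closure, identity and inverses for small modular structures; associativity is inherited from integer arithmetic.

# Illustrative group-axiom check for modulo-m addition and modulo-p multiplication.
def is_group(elements, op):
    elements = list(elements)
    if any(op(a, b) not in elements for a in elements for b in elements):
        return False                                     # closure fails
    e = next((x for x in elements
              if all(op(x, a) == a == op(a, x) for a in elements)), None)
    if e is None:
        return False                                     # no identity element
    return all(any(op(a, b) == e for b in elements) for a in elements)

print(is_group(range(5), lambda a, b: (a + b) % 5))      # True: addition modulo 5
print(is_group(range(1, 5), lambda a, b: (a * b) % 5))   # True: 5 is a prime
print(is_group(range(1, 4), lambda a, b: (a * b) % 4))   # False: 2*2 = 0 (mod 4) leaves the set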
Comments:
1. There is a group consisting of only one element. It must be the identity element.
All other axioms also hold.
2. There is a group with two elements. One must be the identity element (for
example 0). Denote the other with a. According to G3: a + 0 = 0 +
a = a. Therefore, a must be its own inverse, i.e. a + a = 0 ⇒ −a = a. The table
is as follows
+ 0 a
0 0 a
a a 0

The table is practically the same as for Example 6. In fact, there is a unique table
for all groups consisting of two elements (all groups of order 2 are isomorphic). The
same table is obtained for modulo-2 addition (by putting a = 1).

3. All groups of order 3 are also isomorphic (corresponding to modulo-3 addition).


4. There are 2 different groups of order 4 and both are commutative. One is
obtained on the basis of modulo-4 addition.

Subgroup
Let G be a group. Let H be a subset of G. H is a subgroup of G if its elements
satisfy all the axioms for the group itself with the same operation. The identity
element must be in the subgroup.
Examples:
1. In the group of 6 transformations of the equilateral triangle (Ex. 7), the fol-
lowing sets are subgroups: (1, a, b), (1, c), (1, d) and (1, e).
2. In the group of all integers (Example 1), the subset of integers that are multiples
of any integer is a subgroup.

Rings
The introducing of another “operation” results in an algebraic structure called
ring. One operation is called addition (denoted as a + b), the other is called mul-
tiplication (denoted as ab). In order for a set R to be a ring, some axioms must hold.
Axioms for the ring:
R1. The set R is a commutative group under addition (the identity element is
denoted by 0).
R2. (Closure) The operation “multiplication” is defined for any two elements of R,
the product being also always element of R, i.e.

∀a, b ∈ R: ab ∈ R.

R3. (Associativity) The multiplication is associative, i.e.

aðbcÞ ¼ ðabÞc:

R4. (Distributivity) The multiplication is distributive over addition, i.e.

aðb þ cÞ ¼ ab þ ac; ðb þ cÞa ¼ ba þ ca:

The ring is commutative if its multiplication operation is commutative.


It can be easily proved that in any ring

0a = a0 = 0,   a(−b) = (−a)b = −(ab).

Note that the existence of identity element for multiplication (denoted e.g. by 1)
is not supposed. A ring having this identity element is called a ring with identity. In
such a ring

a·1 = 1·a = a.

It can be shown that identity element 1 is unique.


Note also that in a ring there need not be any inverse elements for multiplication. In a ring with
identity the inverse elements may exist. In a noncommutative ring left and right
inverse elements may exist separately. If both exist, then they are mutually equal, i.e.
there is a unique inverse element. An element of the ring having an inverse element
is called a unit. Further, under the ring multiplication the set of units forms a group.
Examples:
1. The set of integers (positive, negative and zero) is a commutative ring under
(ordinary) addition and multiplication.
2. The set of real numbers is a commutative ring under (ordinary) addition and
multiplication.
3. The set of all polynomials in one variable (e.g. x) with integer coefficients is a
commutative ring. This means that polynomials may be added and multiplied
giving as a result polynomial with integer coefficients. Inverse elements for
addition are polynomials of the same degree having as coefficients the inverse
elements of the corresponding integers. On the other hand, there are not inverse
elements for multiplication because rational functions are not included in the
ring. Still, the identity element exists, it is zero-degree polynomial p(x) = 1.
Comments:
1. There is a ring consisting of only one element. It must be the addition identity
element (0). The corresponding rules are: 0 + 0 = 0, (0)(0) = 0.
2. There are two possible (different!) rings with two elements. One element is 0,
denote the other with a. The corresponding table for additive commutative
group is (as it must be!)

+ 0 a
0 0 a
a a 0

Of course, 0a = a0 = 0. The question is what is the value of aa? As either


aa = a or aa = 0 both satisfy associative and distributive laws, two different tables
for multiplication are possible

⋅ 0 a ⋅ 0 a
0 0 0 or 0 0 0
a 0 a a 0 0

yielding two rings with different structure.



Fields
The group can be considered as a set in which additions and subtraction are
possible, the ring—as a structure where besides the addition and subtraction, the
multiplication is also possible. To include division, a more powerful algebraic
structure is needed. This is a field.
Axioms for the field:
F1. The set F is a commutative group under addition.
F2. The set F is closed under multiplication. All set nonzero (6¼0) elements form a
commutative group under multiplication.
F3. The distributive law holds for all field elements, i.e.

∀a, b, c ∈ F: a(b + c) = ab + ac,   (a + b)c = ac + bc.

Therefore, the field can be considered as a commutative ring with a multi-


plicative identity element (“ring with identity”) in which every nonzero element has
a multiplicative inverse (i.e. all nonzero elements are “units”).
Examples:
1. The set of rational numbers is a field.
2. The set of real numbers is a field.
3. The set of complex numbers is a field.
4. The set of numbers of the type a + b√2, where a and b are rational numbers,
is a field.
5. The set of integers is not a field (the inverse elements for multiplication—
rational numbers—are not included).
The field can have an infinite number of elements (as in Examples 1–4). If a field
with finite number of (q) elements exists, it is called a finite field, or a Galois field,
denoted by GF(q). The number of elements is called order of a field. In the fol-
lowing only the finite fields will be considered.
Comments:
1. The smallest field must have two elements—identity element for addition
(0) and identity element for multiplication (1). It is GF(2). The corresponding
tables are

+ 0 1 ⋅ 0 1
0 0 1 0 0 0
1 1 0 1 0 1.

2. It can be shown that the field with q elements exists only if q is a prime (p) or a
power of a prime (pn). All fields of the same order have the same tables (they are
isomorphic).

3. The tables for the field of order p (a prime) are obtained by modulo-p addition
and modulo-p multiplication. The field is also called a prime field.
Examples:
1. The tables for GF(3) (instead of “2” as a third field element a can be written):
+ 0 1 2 ⋅ 0 1 2
0 0 1 2 0 0 0 0
1 1 2 0 1 0 1 2
2 2 0 1 2 0 2 1.

2. The tables for GF(4) = GF(22). The other two field elements are denoted by
a and b.
+ 0 1 a b ⋅ 0 1 a b
0 0 1 a b 0 0 0 0 0
1 1 0 b a 1 0 1 a b
a a b 0 1 a 0 a b 1
b b a 1 0 b 0 b 1 a.

Note that GF(2) is a subfield of GF(2^2)—the corresponding tables of GF(2) are
contained in the tables of GF(2^2) (but not in the tables of GF(3)!). GF(2^2) is an
extension field of GF(2).
3. The tables for modulo-4 addition and multiplication are:
+ 0 1 2 3 ⋅ 0 1 2 3
0 0 1 2 3 0 0 0 0 0
1 1 2 3 0 1 0 1 2 3
2 2 3 0 1 2 0 2 0(!) 2(!)
3 3 0 1 2 3 0 3 2 1

The table for modulo-4 multiplication does not even correspond to the table of
a group (because 2 · 2 = 4 ≡ 0, the element 2 does not have an inverse element—as
commented earlier, the set of nonzero integers 1, …, q − 1 is a group under modulo-q
multiplication only if q is a prime number). By comparing these tables to the tables for
GF(4) it can be noted that even the table for addition in GF(4) does not correspond to
modulo-4 addition.
The further analysis of the finite fields as well as of the way to obtain the tables
for GF(pn) will be postponed for the next chapter.
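Although the systematic construction of GF(p^n) tables is postponed to the next chapter, the GF(4) tables above can already be generated with a small Python sketch (the 2-bit integer representation used here is an assumption made only for this illustration): the elements are treated as binary polynomials of degree less than 2, added by XOR and multiplied modulo the irreducible polynomial x^2 + x + 1.

# Illustrative sketch: GF(4) arithmetic with elements coded as 0, 1, a = 2 (x), b = 3 (x + 1).
def gf4_add(u, v):
    return u ^ v                      # polynomial addition over GF(2) = bitwise XOR

def gf4_mul(u, v):
    p = 0
    for i in range(2):                # carry-less multiplication of 2-bit polynomials
        if (v >> i) & 1:
            p ^= u << i
    if p & 0b100:                     # reduce modulo x^2 + x + 1 (binary 111)
        p ^= 0b111
    return p

names = ['0', '1', 'a', 'b']
print('+', *names)
for u in range(4):
    print(names[u], *[names[gf4_add(u, v)] for v in range(4)])
print('*', *names)
for u in range(4):
    print(names[u], *[names[gf4_mul(u, v)] for v in range(4)])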
Coset Decomposition and Factor Groups
Suppose that the elements of a finite group G are g1, g2, g3, … and the elements
of a subgroup H are h1, h2, h3, … (h1 is the identity element). Construct the array as
follows—the first row is the subgroup itself with the identity element at the left (i.e.
h1, h2, …). The first (leading) element in the second row (g1) is any element not
appearing in the first row (i.e. not being the element of the subgroup), the other
elements are obtained by the corresponding (group) operation g1 ⊗ hi (in fact, the
first element of the row can be envisaged as g1 ⊗ h1 = g1). The next row is formed
by choosing any previously unused group element in the first column (e.g. g2). The
construction is continued until all the elements of a (finite) group appear somewhere
in the array. The row is called a left coset (if the commutativity is supposed, it is
simply a coset), and the first element the coset leader. The array itself is a coset
decomposition of the group. If the commutativity is not supposed there exists also a
right coset.
The array is (the multiplication is supposed):
h1=1 h2 h3 . . . hn
g1h1=g1 g1 h2 g1h3 . . . g1hn
g2h1=g2 g2 h2 g2h3 . . . g2hn
. . . . . . .
. . . . . . .
. . . . . . .
gmh1=gm gmh2 gmh3 . . . gmhn

Further, it can be proved that every element of the group will appear only once in
a coset decomposition. It can be proved also that the group elements g′ and g′′ are in
the same (left) coset if (and only if) (g′)−1g′′ is an element of the subgroup.
There is a well known Lagrange’s theorem stating that the order of a subgroup
must divide the order of a group. It is clearly seen (but not directly proved!) from
this rectangular decomposition. It means further, that in a group whose order is a
prime number there are no subgroups, except the trivial ones—the whole group and
the identity element.
Example:
Consider noncommutative group corresponding to the linear transformations of
an equilateral triangle (Example 7 for groups).
(a) Decomposition using subgroup H1(1, a, b)
1 a b or 1 a b or 1 a b
c e d e d c d c e

(b) Decomposition using subgroup H2(1, c)


1 c 1 c
a d or b e
b e d a.

A subgroup H is called normal (or invariant) if

∀h ∈ H and ∀g ∈ G: g⁻¹hg ∈ H.

For a normal subgroup every left coset is also a right coset. In the commutative
groups every subgroup is normal. For the preceding example, H1 is a normal
subgroup.
Further, for a normal subgroup it is possible to define an operation on the cosets.
In fact a new group is formed, the cosets being its elements. The new group is
called the factor group. The corresponding coset containing the element g is
denoted {g}. The operation definition is

{g1}{g2} = {g1 g2},

valid only if, no matter which element is chosen from the cosets, the resulting coset
is the same. The identity element is the subgroup itself, H = {1},

{1}{g} = {1·g} = {g},

and the inverse is the coset containing the corresponding inverse element {g⁻¹}, i.e.

{g}{g⁻¹} = {g g⁻¹} = {1}.

If the group is commutative, the factor group will be also commutative.


Examples:
1. The linear transformations of an equilateral triangle ABC. The subgroup H1 is
normal. Starting from the following decomposition

1 a b
c e d

denoting the cosets by I = {1} and C = {c}, the following factor group is obtained

I C
I I C
C C I

Of course, the group is isomorphic with the other groups of two elements.
2. Let G be the set of integers (positive, negative and zero)—a commutative group
under addition. Let H be the subgroup consisting of multiples of an integer—
n. All the numbers from 0 to n – 1 are in different cosets. They can be taken as
the coset leaders. For n = 3 the cosets are

0 3 -3 6 -6 9 -9 . . .
1 4 -2 7 -5 10 -8 . . .
2 5 -1 8 -4 11 -7 . . .

Denoting the cosets by {0}, {1} and {2}, the corresponding table is obtained:

+ {0} {1} {2}
{0} {0} {1} {2}
{1} {1} {2} {0}
{2} {2} {0} {1}

In fact, it is the addition modulo-3.

Vector Spaces
The notion of a vector space is well known. However, in the following a
specific kind of vector space will be considered whose vectors have the elements
from a finite field.
Axioms for the vector space:
A set of elements (vectors) V is called a vector space over a field F (whose
elements are called scalars) if it satisfies the following axioms:
V1. (Closure under addition) The set V is a commutative group under addition.
V2. (Scalar multiplication) The product cv ∈ V is defined (c ∈ F, v ∈ V). It can be
called an outer operation.
V3. (Distributivity) Scalar multiplication is distributive over vector addition, i.e.

c(u + v) = cu + cv.

V4. (Distributivity) (c + d)v = cv + dv


V5. (Associativity) (cd)v = c(dv)
By introducing now the vector multiplication a linear associative algebra A over
a field F can be obtained. It will be needed later (for the next chapter).
Axioms for the linear associative algebra:
A1. (Vector space) The set A is a vector space over F.
A2. (Closure under multiplication) For any two elements u, v ∈ A, a product uv ∈ A
is defined.
A3. (Associativity) For any three elements u, v, w ∈ A, (uv)w = u(vw).
A4. (Bilinearity) u(cv + dw) = c(uv) + d(uw), (cv + dw)u = c(vu) + d(wu)
(c, d ∈ F; u, v, w ∈ A).
Let us denote an ordered set of n field elements (an n-tuple) by (a1, a2, …, an). If
the addition and multiplication of n-tuples are defined as follows

(a1, a2, …, an) + (b1, b2, …, bn) = (a1 + b1, a2 + b2, …, an + bn),
(a1, a2, …, an)(b1, b2, …, bn) = (a1 b1, a2 b2, …, an bn),

as well as the multiplication of an n-tuple by a field element,

c(a1, a2, …, an) = (c a1, c a2, …, c an),

it can be verified easily that the set of n-tuples forms a vector space over the field.
The identity element is 0 = (0, 0, …, 0), where 0 is the identity element for addition in the
field. The following relations hold:

0v = 0,   a0 = 0,   −v = (−1)v

(−1 is the inverse element for addition of the multiplicative identity element in
the field).
A subspace is a subset of a vector space satisfying the axioms for a vector space.
In a vector space a sum of the form (vector)

u = a_1 v_1 + a_2 v_2 + \dots + a_k v_k

is called a linear combination of vectors.


A set of vectors v1, v2, …, vk is linearly dependent if (and only if) there is a set of
scalars c1, c2, …, ck, not all being equal to zero, such that

c_1 v_1 + c_2 v_2 + \dots + c_k v_k = 0.

A set of vectors is linearly independent if it is not linearly dependent.


A set of vectors spans (generates) a vector space if every vector in a space
equals their linear combination. By eliminating the linearly dependent vectors from
the set, a new set of linearly independent vectors will be obtained spanning the
same vector space.
The number of linearly independent vectors in the set that span a vector space is
called the dimension of a space, while the set is called a basis of the space. It can be
shown that in a k-dimensional vector space every set of more than k vectors must be
linearly dependent. On the other hand, in a k-dimensional vector space every set of
k linearly independent vectors is a basis. Therefore, a vector space has more than one basis.
In fact, every suitable transformation (to be defined later) of the basis leads to a
new basis.
In fact, any n-tuple (vector) can be always written as follows

u = a_1(1, 0, …, 0) + a_2(0, 1, 0, …, 0) + \dots + a_n(0, 0, …, 0, 1).

The vectors (1, 0, …, 0), (0, 1, 0, …, 0) … can be considered as orts. They are
linearly independent. Therefore, the dimension of the vector space consisting of the
vectors with n elements equals n.
The product of two n-tuples defined as

(a1, a2, …, an) · (b1, b2, …, bn) = a1 b1 + a2 b2 + ⋯ + an bn

is called the inner product or dot product. If the inner product equals zero (0 from the
field), the vectors are said to be orthogonal. For the inner product the commutativity
and distributivity can be easily verified on the basis of the field axioms.
Matrices
In this subsection only the corresponding elements needed for the study of the
codes are exposed.

The matrix M (m × n) is an ordered set (a rectangular array) of elements with
m rows and n columns

M = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} = [a_{ij}].

In the following exposition the matrix elements are also elements of a finite field.
Therefore, the matrix rows can be considered as n-tuples of field elements, i.e. as
vectors (the same holds for the columns—vectors of m elements).
The number of linearly independent rows is called the row rank (the same for the
number of linearly independent columns—the column rank). The row rank equals
column rank being the rank of the matrix.
The following set of elementary row operations does not change the matrix rank:
1. Interchange of any two rows.
2. Multiplication of any row by a (nonzero) field element
3. Multiplication of any row by a (nonzero) field element and addition of the result
to another row.
Now, the row-space of a matrix can be defined—a space whose basis consists of
the rows of the matrix. Its dimension equals the rank of the matrix. By performing
the elementary row operations (and obtaining another matrix from the first), the
basis is transformed (the space does not change). These matrices have the same
row-space. From the row-space point of view, linearly dependent rows can be
omitted from the matrix.
A suitable simplification of a matrix can be obtained using elementary row
operations. An echelon canonical form has the following properties:
1. Every leading term of a nonzero row is 1 (multiplication identity element from
the field).
2. Every column containing such a leading term has all other elements 0 (addition
identity element from the field).
3. The rows are ordered in such way that the leading term of the next row is to the
right of the leading term in the preceding row. All zero rows (if any) are below
the other rows.
This procedure corresponds to the one used when solving a system of linear
equations by successive elimination of the variables.
If the rows of an n × n matrix are linearly independent, the matrix is non-singular
(det ≠ 0), its rank is n, and at the end of the procedure an identity matrix is obtained
(corresponding to the complete solution of the system of equations).
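The reduction to the echelon canonical form over GF(2) is straightforward to implement; the Python sketch below is an illustration only, and the test matrix at the end is hypothetical (its third row is the sum of the first two, so the rank is 3).

# Illustrative sketch: reduced echelon form and rank of a binary matrix over GF(2).
import numpy as np

def rref_gf2(M):
    M = np.array(M, dtype=int) % 2
    rows, cols = M.shape
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i, c]), None)
        if pivot is None:
            continue
        M[[r, pivot]] = M[[pivot, r]]        # 1. interchange rows
        for i in range(rows):                # 2./3. add the pivot row to clear the column
            if i != r and M[i, c]:
                M[i] = (M[i] + M[r]) % 2
        r += 1
    return M, r

M = [[1, 0, 1, 1, 0, 0, 0],
     [0, 1, 0, 1, 1, 0, 0],
     [1, 1, 1, 0, 1, 0, 0],
     [0, 0, 0, 1, 0, 1, 1]]
E, rank = rref_gf2(M)
print(E)
print('rank =', rank)                        # 3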
When multiplying an m × n matrix [aij] by an n × p matrix [bjk], the m × p matrix
[cik] is obtained, where cik is the inner product of the ith row of [aij] and the kth
column of [bjk].

The number of vectors (of n elements) in an n-dimensional vector space over GF(q)
equals q^n. It can be shown (because the subspace is a subgroup under addition)
that the number of vectors in a subspace must be q^k (k ≤ n). The dimension of the
corresponding subspace is k.
Let V be the vector space of the dimension n. Let V1 be the subspace of a
dimension k. The subspace V2 is called the null space of V1 if every vector of V2 is
orthogonal to every vector of V1. To assert the orthogonality of the subspaces it will
suffice to assert the orthogonality of the corresponding bases (vectors of the basis),
because all the other vectors are their linear combinations. In other words, if the
bases of the orthogonal subspaces V1 and V2 (they are mutually orthogonal—null
spaces for each other) are considered as the rows of the corresponding matrices (M1
and M2), their product is

M1 M2^T = 0   or   M2 M1^T = 0^T,

where the corresponding matrix dimensions are: M1 (k × n), M2 ((n − k) × n) and
0 (k × (n − k)). By using the transposition (T), the multiplication of the rows of matrix M1
by the rows of matrix M2 is achieved. Therefore, the null space V2 of the subspace V1
of dimension k must have the dimension n − k, if the dimension of the whole space
V equals n.
Chapter 6
Cyclic Codes

Brief Theoretical Overview

Cyclic codes (cyclic subspaces) are one of the subclasses of linear codes. They are
obtained by imposing an additional strong structure requirement. A brief
overview of the corresponding notions of abstract algebra comprising some
examples and comments is at the end of this chapter. Here only the basic definitions
are given. Thus imposed structure allows a successful search for good error control
codes. Further, their underlying Galois field description leads to efficient encoding
and decoding procedures. These procedures are algorithmic and computationally
efficient. In the previous chapter a notion of ring was introduced. For any positive
integer q, there is the ring of integers obtained by modulo-q operations. If q is a
prime integer (p), Galois field GF(p) is obtained. The number of field elements is
just p—all remainders obtained by division modulo-p. These fields are called prime
fields. A subset of field elements is called a subfield if it is a field under the inherited
operations. The original field can be considered as an extension field. Prime
numbers cannot be factored and they have no nontrivial subfields. Consider now a
polynomial over a field GF(q), i.e. a polynomial whose coefficients are from GF(q).
If its leading coefficient equals one, it is called a
monic polynomial. For any monic polynomial p(x) a ring of polynomials modulo p
(x) exists with polynomial addition and multiplication. For a binary ground field
every polynomial is monic. If p(x) is a prime polynomial, i.e. if it is irreducible (it
cannot be further factored) and monic, then a ring of polynomials modulo p(x) is a
field (Galois field). In any Galois field the number of elements is a power of a
prime. In fact, polynomial coefficients are taken from a prime field. For any prime
(p) and any integer (m) there is a Galois field with pm elements. These fields can
have a nontrivial subfields. A primitive element of GF(q) is an element (usually
denoted by a) such that every field element can be expressed as its power. Every
Galois field has a primitive element. Therefore, a multiplicative group in Galois
field is cyclic—all elements are obtained by exponentiation of the primitive element


(0 is not in the multiplicative group!). A primitive polynomial is a prime polyno-


mial over GF(q) if it has a primitive element of the extension field as a root. The
smallest positive integer e such that for any element g, ge = 1 is called order of the
element g. The order of a primitive element of GF(q) is q − 1. The prime poly-
nomial of smallest degree over GF(q) such that it has an element of the extension
field as a root is called the minimal polynomial of that element. It is unique. The
elements of the extension field sharing the same minimal polynomial are called
conjugates. All roots of irreducible polynomial have the same order. If a is a
primitive element, then all its conjugates are primitive elements as well. All
non-zero elements of the extension field of GF(q) are the roots of x^{q^m − 1} − 1. In the
following exposition the ground field is a binary field (x^n + 1 is the corresponding
polynomial). However, the same approach is valid for a nonbinary ground field,
taking into account the corresponding relations in this case. Therefore, this poly-
nomial can be uniquely factored using the minimal polynomials. Euclidean algo-
rithm can be used for polynomials as well as for integers.
A vector is an ordered n-tuple of elements from some field. An equivalent is a
polynomial where the elements are ordered as the polynomial coefficients.
Therefore, vectors and polynomials can be considered as a different way to denote
the same extension field elements.
For the vector

v = (a_0, a_1, …, a_{n−1})

the corresponding polynomial is

v(x) = a_{n−1} x^{n−1} + \dots + a_1 x + a_0.

E.g. the vector (110011) can be written as the polynomial x^5 + x^4 + x + 1. It should be
noted that this polynomial can be written as well as 1 + x + x^4 + x^5 or x^5 + x +
x^4 + 1, in fact in an arbitrary order of powers, because the power of x denotes the
position of a coefficient.
To multiplying a polynomial by x corresponds a cyclic shift of the vector coefficients
to the right by one place,

x(a_0 + a_1 x + \dots + a_{n−1} x^{n−1}) = a_{n−1} + a_0 x + \dots + a_{n−2} x^{n−1},

because x^n = 1.
An ideal is a subset of elements of a ring if it is a subgroup of the additive group
and a result of multiplication of an ideal element by any ring element is in the ideal.
In an integer ring a set of integers is an ideal if and only if it consists of all multiples
of some integer. Similarly, a set of polynomials is an ideal (in a polynomial ring) if
and only if it consists of all multiples of some polynomial. It was shown that a
subspace is a cyclic subspace if and only if it is an ideal.
Let the degree of g(x) be n − k; then the linearly independent residue classes
(polynomials) are

0, g(x), xg(x), …, x^{k−1} g(x).

The degree of the polynomial x^k g(x) is n, and after division by x^n − 1 (in the binary field
−1 = 1!) it will be represented by the corresponding residue.
Now, a cyclic code can be defined. Consider a vector space of dimension n (the length of
code vectors). A subspace of this space is called cyclic (a cyclic code) if for any vector

v = (a_0, a_1, …, a_{n−1})

the vector

v_1 = (a_{n−1}, a_0, a_1, …, a_{n−2}),

obtained by a cyclic permutation of its elements, is in the subspace as well.


Therefore, cyclic codes are a subclass of linear block codes as shown in Fig. 6.1.
Furthermore, from a mathematical point of view it can be said that a cyclic code
is an ideal in polynomial algebra modulo polynomial xn − 1 over the coefficients
field. It is specified completely by the corresponding polynomial that divides xn + 1
(for a binary ground field −1 = 1 and n—a code word length).
In fact, xn + 1 can be factored comprising all irreducible polynomials. Every
such polynomial as well as all their products can be generator polynomials g(x) of
a cyclic code. The parity check polynomial h(x) is obtained from xn + 1 = g(x)h(x).
In Problem 6.1 it is analyzed in details for n = 7. As a consequence, cyclic codes
can be encoded and decoded by multiplying and dividing the corresponding
polynomials. It provides for efficient encoding and decoding procedures. Cyclic
codes can be shortened as well as obtained in the systematic form.
Now, the vector orthogonality and “polynomial orthogonality” will be com-
pared. If a(x)b(x) = 0, then the vector corresponding to a(x) is orthogonal to the
vector corresponding to b(x) with the order of its components reversed as well as to
every cyclic shift of this vector. Of course, the multiplication is commutative,
therefore, a(x)b(x) = 0 ) b(x)a(x) = 0 and the same rule can be applied keeping

Fig. 6.1 Cyclic codes as a


subclass of linear block codes Block codes

Linear block
codes

Cyclic codes
240 6 Cyclic Codes

components of b(x) in normal order and reverting and shifting the components of
a(x).
The following short analysis shows the extreme simplicity in description of
cyclic codes:
– For a complete description of (n, k) block code all code words (2k) should be
known.
– For a complete description of linear block code with the same parameters (n, k),
its basis should be known—k code words of length n (generator matrix).
– For a complete description of the corresponding cyclic code it is sufficient to
know only one code word of length n. All others code words are obtained by its
cyclic shifts and as the linear combinations of these shifts.
There are some trivial cyclic codes. Repetition code (n, 1) is trivial (only two
code words), as well as the code (n, n) (full vector space, but no error control!). The
binary code where the number of ones is even is trivial as well. E.g. code (3, 2)
which has code words (000), (011), (101) i (110), and the code (4, 3) which has
code words (0000), (0011), (0101), (0110), (1001), (1010), (1100) i (1111). It is
interesting that for some codewords length, e.g. for n = 19, binary trivial codes are
in the same time unique existing cycling codes.
Consider polynomial x7 + 1 over GF(2). Here
  
x7 þ 1 ¼ ð 1 þ xÞ 1 þ x þ x3 1 þ x2 þ x3 :

Let the generator polynomial is

gðx) ¼ 1 þ x2 þ x3

The cyclic code (7, 4) is obtained (n = 7, n − k = 3).


One code word can be obtained by using coefficients of generator polynomial,
and the other three linearly independent by its cyclical shifting

gðx) ¼ ð1011000Þ
xgðx) ¼ ð0101100Þ
x2 gðx) ¼ ð0010110Þ
x3 gðx) ¼ ð0001011Þ

and the corresponding generator matrix is


2 3
1 0 1 1 0 0 0
60 1 0 1 1 0 07
G¼6
40
7:
0 1 0 1 1 05
0 0 0 1 0 1 1
Brief Theoretical Overview 241

The other code words (vectors, i.e. polynomials) are their linear combinations. In
Fig. 6.2 all code words are shown as well as the way to obtain them using poly-
nomials corresponding to the rows of generator matrix. There are total seven code
words forming a “cyclic set” (all possible cyclic shifts of the word corresponding to
g(x)). Hamming weight is d = 3. These words are linearly independent and by
summing two words from this set, the code word having Hamming weight d = 4 is
obtained. By its cyclic shifts the other “cyclic set” is obtained having seven code
words as well. Including combinations (0000000) ans (1111111) a complete code
(16 code words) is obtained. It is equivalent to Hamming (7, 4) code which has the
same weight spectrum.
Parity check polynomial is
 
hðx) ¼ ð1 þ xÞ 1 þ x þ x3 ¼ 1 þ x2 þ x3 þ x4 :

It is here obtained by multiplication of the other factors, but it could have been
obtained by division as well, i.e. h(x) = (x7 + 1): g(x).
Polynomial h(x) forms its code (ideal)

hðx) ¼ ð1011100Þ
xhðx) ¼ ð0101110Þ
x2 hðx) ¼ ð0010111Þ:

If the product of two polynomilas equals zero, i.e. if

gðx)  hðxÞ ¼ xn 1 ¼ 0ðmodðxn 1ÞÞ;

then the corresponding coefficients vectors are orthogonal (the coefficients of one of
them should be taken in inverse order). Parity-check matrix is here

g(x) (1011000) g(x)+xg(x) (1110100)


xg(x) (0101100) xg(x)+x2g(x) (0111010)
2
x g(x) (0010110) x2g(x)+x3g(x) (0011101)
3
x g(x) (0001011) 2
g(x)(1+x ) (1001110)
4 2 3
x g(x)=(1+x +x ) g(x) (1000101) g(x)(x+x3) (0100111)
5 2
x g(x)=(1+x+x ) g(x) (1100010) 3
g(x)(1+x ) (1010011)
x6g(x)=(x+x2+x3) g(x) (0110001) g(x)(1+x+x2+x3) (1101001)

(0000000) g(x)(1+x+x3) (1111111)

Fig. 6.2 Coded words of cyclic code (7, 4)


242 6 Cyclic Codes

2 3
0 0 1 1 1 0 1
H ¼ 40 1 1 1 0 1 0 5:
1 1 1 0 1 0 0

It is easy to verify

GH T ¼ 0:

It should be noticed that the rows in this matrix are as well the cyclic permu-
tation of one row. Taking H (h(x)) as a generator matrix (polynomial) a dual code is
obtained.
The same field can be obtained modulo any primitive polynomial. Consider FG
(23). Two primitive polynomials are x3 + x + 1 and x3 + x2 + 1:
x3+x+1 x3+x2+1
(primitive element α) (primitive element β=α 3)
(100) α 0= 1 β 0= 1
1
(010) α = α β 1= β
(001) α 2= α2 β 2= β2
(110) α 3= 1 + α β 3= 1 + β2
4 2 4
(011) α = α + α β = 1 + β + β2
(111) α 5= 1 + α + α2 β 5= 1 + β
6 2 6
(101) α = 1 + α β = β + β2
7 0 7 0
(100) α (α )= 1 β (β )= 1

The finite fields with the same number of elements are isomorphic—they have
the unique operation tables. The difference is only in the way of naming their
elements.
The fact that in both matrix the next rows are only the cyclic shifts of preceding
rows contributes significantly to the elegant and economic encoder and decoder
construction.
From

gðx)  hðxÞ ¼ xn 1 ¼ 0 ðmodðxn 1ÞÞ;

it is obvious that codes generated by g(x) and h(x) are dual codes. Therefore, to
generate cyclic code word from information polynomial i(x) (power k − 1, because
the are k information bits) one should obtain code polynomial (power n − 1),
divisible by generator polynomial g(x). At the receiver, the received word u
(x) should be divided by g(x) and if the remainder equals zero, the conclusion can
be drawn that there were no errors. Also, u(x) could b multiplied by h(x) and verify
does the result (not remainder!) equals zero (modulo xn − 1!).
Brief Theoretical Overview 243

Hardware implementation of cyclic codes is based on using linear switched


circuits to multiply and divide polynomials. Corresponding shift registers have the
number of cells equal to the polynomial power. Therefore, for using g(x) n − k cells
are needed, and for h(x) k cells are needed. It is not important for short codes
(n − k and k do not differ significantly), but for long codes more economical is to
divide. It is just the reason to implement division for CRC Cyclic Redundancy
Check) procedure, to be considered later.
The next question is how to generate a systematic cyclic code. Let the infor-
mation (bit) polynomial is

iðxÞ ¼ ik1 xk1 þ . . . þ i1 x þ i0

and the generator polynomial

gðxÞ ¼ gnk xnk þ . . . þ g1 x þ g0 :

By multiplying, the following is obtained

vðxÞ ¼ iðxÞgðxÞ ¼ vn1 xn1 þ . . . þ v1 x þ v0 :

It must be code polynomial, because one factor is g(x). To obtain systematic


code, information polynomial should be in advance multiplied by xn−k, yielding
new information polynomial

i  ðxÞ ¼ xnk iðxÞ ¼ ik1 xn1 þ . . . þ i1 xnk1 þ i0 xnk :

This result should be divided by g(x) (power n − k). It is obvious that the power
of remainder will be n − k − 1, having in total n − k coefficients

rðxÞ ¼ rnk1 xnk1 þ . . . þ r1 x þ r0 :

Code word is obtained by the following subtraction

vðxÞ ¼ i ðxÞrðxÞ ¼ xnk iðxÞrðxÞ:

The corresponding code vector is


 
ik1 ; . . .; i1; i0 ; rnk1 ; . . .; r1 ; r0 ;

and information bits are not changed, while the rest of bits are parity-checks.
Therefore, a systematic code is obtained. Here, parity-checks are obtained in a more
elegant way comparing to classic way to introduce parity-checks. From the
“hardware point of view” it means that the information bits are put in the upper part
of shift register and parity-checks in the rest of cells.
244 6 Cyclic Codes

Golay code (23, 12) (Problem 6.5) is a very known cyclic code. It was invented
before the invention of cyclic codes and later it was recognized as a cyclic code. It
satisfies the Hamming bound with equality, it is the perfect code.
In communications the Cyclic Redundancy Check (CRC) (Problems 6.6, 6.7,
6.8 and 6.9) is often used, where the cyclic code is applied to detect the errors. In its
systematic version, the number of information bits is not fixed, but it can vary in
some broad range. A main advantage of code shortening is a possibility to adjust
the code word length to a length of the message entering the encoder. In such a
way, the same generator polynomial can be used for encoding the information
words of various lengths. CRC codes are generally used for a channels which have
a low noise power which (except for rare packet errors) can be modeled as a BSC
where the channel error probability is sufficiently small (p  10−2), and the codes
are optimized for this channel type.
CRC procedure is based on forming systematic cyclic code. The information bits
transmitted in one frame are encoded using cyclic codes as follows
 
iðxÞxnk
cðxÞ ¼ iðxÞxnk þ rem ¼ iðxÞxnk þ rðxÞ;
gðxÞ

where rem{} denotes the corresponding remainder. The information and the
parity-check bits are separated. It can be conceived that, to transmitted information
bits, CRC appendix is “attached”. Its dimension is usually substantially shorter than
the rest of he packet (n − k  k). During the transmission an error polynomial e
(x) is superimposed. At the receiver, the received sequence is divided by g(x). The
corresponding remainder is
 
cðxÞ þ eðxÞ
r0 ðxÞ ¼ rem :
gðxÞ

If the remainder equals zero, it is concluded that there were no errors. In the
opposite case, the retransmission can be requested. Problems 6.6, 6.7, 6.8 and 6.9
deal in details with CRC procedure showing as well encoder and decoder hardware
realization.
Cyclic codes are a subclass of linear codes and can be described as well by
matrices. Of course, as said earlier, the polynomial description is more elegant
because the generator polynomial coefficients just form one row of generator
matrix, other rows are its cyclic shifts. Up to now it seemed that one can firstly
chose a generator polynomial and after that has to determine the number of cor-
rectable errors. However, a special class of cyclic codes guarantees the required
number of correctable errors in advance. These are BCH codes (discovered sepa-
rately by Bose and Ray-Chaudhuri at one side, and by Hocquenghem at the other
side) (Problems 6.10, 6.11, 6.12 and 6.13). They are obtained using a constructive
way, by defining generator polynomial roots. Simply, to correct ec errors, the
generator polynomial should have 2ec roots expressed as the consecutive powers of
the primitive element. If it starts from a, the code is called primitive, otherwise, it is
Brief Theoretical Overview 245

Block codes

LBC
Cyclic

BCH

Reed-Solomon
codes

Fig. 6.3 BCH and RS codes as a subset of cyclic codes

not a primitive one. Of course, to find the generator polynomial, the corresponding
conjugate elements should be taken into account and sometimes a number of
correctable errors is augmented (the conjugates continue a series of the consecutive
powers). These codes are very known cyclic codes. In the Fig. 6.3 their position in
the family of linear block codes is shown. Reed-Solomon (RS) codes (considered
later) are a special subset of nonbinary BCH codes.
Any cyclic code can be specified using generator polynomial g(x) roots. Let
these roots from the field extension are

a1 ; a2 ; . . .; ar :

The received polynomial (vector) u(x) is a code word if and only if a1, a2, …, ar
are roots of u(x)—what is equivalent to the fact that u(x) is divisible by. g(x).
However, the coefficients are from ground field and u(x) should have all others
conjugate elements as a roots—meaning that u(x) must be divisible by the other
minimal polynomials

m1 ðxÞ; m2 ðxÞ; . . .; mr ðxÞ:

In fact, generator polynomial is obtained as

gðxÞ ¼ LCMfm1 ðxÞ; m2 ðxÞ; . . .; mr ðxÞg

because some roots can have the same minimal polynomial. Of course, xn − 1 is
divisible by g(x).
It can be shown that for every positive integer m0 and dmin (minimal Hamming
distance) exists BCH code [generator polynomial g(x)], only if it is the minimum
246 6 Cyclic Codes

power polynomial with the coefficients from GF(r), which has the roots from the
extension field
m0 þ 1 0 þ dmin 2
0 ; a0
am 0
; . . .; am
0

where a0 is some element from extension field. Code word length is LCM of the
roots order.
Generally, it is not necessary for r to be a prime. It can be as well power of the
prime. It is only important that GF(r) exists. Used symbols are from this field, but
the generator polynomial roots are from the extension field. Therefore, coefficients
can be from GF(pn) (p—prime) and the roots from GFððpn Þk Þ. Of course, some
roots can be from the ground field, but it is the subfield of the extension field.
Consider the following example. Let p = 2 (binary ground field) and let GF(23)
is generated by primitive polynomial p(x) = x3 + x + 1. Find cyclic code with the
minimal number of parity-check bits, over this field, if one root of generator
polynomial is a1 = a.
Complete GF(23) is shown in Table 6.1 (it should be noticed that an = an−7 and that
identity addition element is omitted (it belongs to the field and sometimes is denoted as
a−∞). Groups of conjugate elements (roots) are denoted with symbols A, B and C.
Conjugate elements for a are a2 and a4, yielding
  
gðxÞ ¼ ðx þ aÞ x þ a2 x þ a4 ¼ x3 þ x þ 1:

The same generator polynomial is obtained for a1 = a2 and for a2 = a4.


Parity-check polynomial is obtained from g(x)h(x) = 1, i.e. h(x) = (x + 1)
(x3 + x2 + 1).
If for roots of generator polynomial a1 = a and a2 = a3 are chosen, two different
minimal polynomials are obtained

m1 ðxÞ ¼ ðx þ aÞðx þ a2 Þðx þ a4 Þ ¼ x3 þ x þ 1


m2 ðxÞ ¼ ðx þ a3 Þðx þ a6 Þðx þ a5 Þ ¼ x3 þ x2 þ 1ða12 ¼ a5 Þ

Table 6.1 Galois field GF(23)


Exponential equivalent Polynomial Binary equivalent Group of conjugate roots
equivalent
a0 1 001 A
a1 a 010 B
a2 a2 100 B
a3 a+ 1 011 C
a4 a2+ a 110 B
a5 a2+ a+ 1 111 C
a6 a2+ 1 101 C
Brief Theoretical Overview 247

and
  
gðxÞ ¼ m1 ðxÞm2 ðxÞ ¼ x3 þ x þ 1 x3 þ x2 þ 1 ¼ x6 þ x5 þ x4 þ x3 þ x2 þ x þ 1;

yielding
hðxÞ ¼ x þ 1:

Further, for a1 = 1 and a2 = a, the corresponding minimal polynomials are:

m1 ðxÞ ¼ x þ 1
m2 ðxÞ ¼ ðx þ aÞðx þ a2 Þðx þ a4 Þ ¼ x3 þ x þ 1

yielding
 
gðxÞ ¼ m1 ðxÞm2 ðxÞ ¼ ðx þ 1Þ x3 þ x þ 1 ¼ x4 þ x3 þ x2 þ 1

and

hðxÞ ¼ x3 þ x2 þ 1:

Generally, consider GF(2m). It can be shown that for any two positive integers
m (m  3) and ec (ec < 2m−1) exists binary BCH code having the following
characteristic:
– Code word length n = 2m − 1
– Number of control bits n – k  mec
– Minimal Hamming distance: dmin  2ec + 1.
This code can correct all combinations of ec of smaller number of errors.
Generator polynomial is a minimal power polynomial which has for roots suc-
cessive powers of a:

a; a2 ; a3 ; . . .; a2ec :

Some of conjugate elements can continue a series of successive powers yielding


the obtained Hamming distance greater than the projected one (Problem 6.11).
The above considered codes (m0 = 1) are called primitive BCH codes. However,
the series of successive powers can start from any nonprimitive element (b). It is
only important that 2ec successive powers of this element (b, b 2, …) are the roots of
generator polynomial. In such way nonprimitive BCH codes are obtained.
Generally, error correction using cyclic code is not an easy problem. Of course,
the syndrome components can be found as the received word values for code roots.
Syndromes can be obtained as well from error locators (locator Xi = ak corre-
sponds to ith error which occurred at the kth position in a code word) (Problem
6.12). The corresponding equations are called power-sum symmetric functions
248 6 Cyclic Codes

(Problem 6.12), they form a system of nonlinear algebraic equations of more


variables and they are not suitable for solution by a direct procedure. Peterson
(Problems 6.12 and 6.13) showed that, by introducing a notion of error locator
polynomial, these equations can be transformed into a series of linear equations. It
is a polynomial which has as the roots the error locators. In the special case, when
all operations are in the binary field and when there are exactly m = ec errors, a
system is additionally simplified. The next algorithm was proposed by Berlekamp
(Problems 6.13). To apply his algorithm, firstly a syndrome polynomial is defined.
Further, an error magnitude polynomial is formed. In a binary case all errors have
the same magnitude (1). It is interesting that for nonbinary (RS) codes. Berlekamp
algorithm can now be defined as a procedure for finding coefficients of special
polynomial, by solving the series of smaller problems. Berlekamp algorithm is
more difficult to understand than Peterson approach, but it provides for the sub-
stantially efficacy implementation. A complexity of Peterson technique increases
with the square of the number of corrected errors, having efficient application for
binary BCH decoders correcting a small number of errors. On the other hand, the
Berlekamp algorithm complexity increases linearly, allowing forming efficacy
decoders correcting more tens of errors.
Reed-Solomon codes are an important class of nonbinary cyclic codes (Problems
6.14, 6.15, 6.16, 6.17, 6.18 and 6.19). They do not use symbols from GF(2), but
from some field extension. They are discovered in 1960 but it was soon recognized
that they are a special case of nonbinary BCH codes. Here code symbols and
generator polynomial roots are from the same field. Generally, code symbols are
from GF(q) (q primitive or primitive power) and the roots are from GF(qm).
Therefore, here m = 1. If a is a primitive element, code word length is

n ¼ qm 1 ¼ q1

and RS codes are relatively “short”. Minimal polynomial for any element b is just

mb ðxÞ ¼ xb:

Generator polynomial of RS code correcting ec (symbols!), if one starts from a, is


 
gðxÞ ¼ ðxaÞðxa2 Þ. . . xa2ec ;

its power is always 2ec because conjugate elements do not exist. Number of control
symbols is n − k = 2ec and a code word is obtained by multiplication of infor-
mation polynomial and generator polynomial.
In original paper the case q = 2b was considered (binary code for alphanumerical
characters), i.e. field GF(2b). Minimum Hamming distance for RS (n, k) code is

d ¼ nk þ 1;
Brief Theoretical Overview 249

satisfying Singleton bound with equality. Therefore, it is a MDS code. In other


words, for a specified (n, k), any other code can not have greater minimum
Hamming distance than RS code. For q = 2b, RS code parameters are:
– code word length: n = b(2b − 1);
– number of control bites: n − k = 2bec (ec—number of bit errors);
– code words number: |C| = 2mk [there are 2mk information polynomials, each of
k coefficients can be any field element (2m)].
Let power of g(x) is n − k, (k—number of information bits). If g(x) generates
linear cyclic code over GF(2r), the generator matrix is
2 3
gðxÞ
6 xgðxÞ 7
6 7
G¼6 .. 7:
4 . 5
xk1 gðxÞ

RS code can be shortened as well.


Consider GF(2) and the extension field GF(2b = 4), obtained modulo polyno-
mial x2 + x + 1 where the symbols are 0, 1, a and a2. Find RS codes for correcting
one and two errors.
Code (3, 2) has the generator polynomial

gðxÞ ¼ xa:

Corresponding 16(=24) code words are

000 101 a0a a2 0a2 011 110 a1a2 a2 1a


0aa 1aa2 aa0 a2 a1 0a2 a2 1a2 a aa2 1 a2 a2 0

yielding code with one “parity-check” (a2 = a+1).


For (3, 1) code, generator polynomial is (a3 = 1)
   
gðxÞ ¼ ðxaÞ xa2 ¼ x2  a þ a2 x þ a3 ¼ x2 þ x þ 1;

and, repetition code is obtained

000 111 aaa a2 a2 a2 :

For a different (3, 1) code, generator polynomial is (a0 = 1)

gðxÞ ¼ ðx1ÞðxaÞ ¼ x2 ða þ 1Þx þ a ¼ x2 þ a2 x þ a;


250 6 Cyclic Codes

and the code words are

000 1a2 a a1a2 a2 a1:

Consider now field GF(2b = 8) (Table 6.1). Starting from the primitive element
a, find RS code for correcting one error. There is no need to find the conjugate
elements. The generator polynomial roots are a and a2

gðxÞ ¼ ða þ xÞða2 þ xÞ ¼ a3 þ a4 x þ x2 ;

and n = 7, n − k = 2 yielding RS code (7, 5), the number of code words is


|C| = 23(7−2) = 32768. Generator matrix (it can be also given in a binary form (15
rows, 7 columns)) is
2 3
a3 a4 1 0 0 0 0
60 a3 a4 1 0 0 07
6 7
G¼6
60 0 a3 a4 1 0 077:
40 0 0 a3 a4 1 05
0 0 0 0 a3 a4 1

Gorenstein and Zierler (Problem 6.17) modified Peterson algorithm to decode the
nonbinary codes. Berlekamp-Massey algorithm for RS codes decoding starts from a
fact that the error position polynomial does not depend on error magnitudes in the
code word. It makes possible to use Berlekamp algorithm to find the error position,
and later on to determine the error magnitudes and to do their correction. Forney
algorithm (Problem 6.18) for roots finding allows to determine the error magnitude if
its location is known, as well as error locator polynomial and magnitude error
polynomial, but to find a magnitude of error, the position of other errors are not
explicitly used. The erasure of symbols is a simplest way of soft decision. For
nonbinary codes the location of erased symbol is known, but not the error magnitude.
Here erasure locator polynomial is calculated by using known erasure locators.
Generally, at the beginning the error locator polynomial and be found [e.g. by using
Berlekamp-Massey (Problem 6.18) or Euclidean algorithm (Problem 6.19)], and
then a main equation for errors and erasures decoding is used to obtain a final solution.

Problems

Problem 6.1 Consider the linear block codes, word code length n = 7.
(a) How many cyclic codes can be constructed which have this word length? Find
their generator polynomials.
(b) For one of the possible generator polynomials find all code words and code
weight spectrum. Comment the code possibilities concerning the error
correction.
Problems 251

(c) Whether g(x) = 1 + x3 + x4 can be the generator polynomial of cyclic (7, 3)


code?
(d) Form a systematic generator matrix for the code with the generator polynomial
g(x) = 1 + x.

Solution
Consider a vector space whose elements are all vectors of the length n over GF(q).
A subspace of this space is called cyclic if for any vector ðc0 ; c1 ; . . .; cn1 Þ, a vector
ðcn1 ; c0 ; c1 ; . . .; cn2 Þ, obtained by cyclic permutation of its coefficients also
belongs to this subspace. For an easier analysis a code vector ðc0 ; c1 ; . . .; cn1 Þ can
be represented as well as a polynomial, which has the coefficients equal to the
vector elements [34].

cð xÞ ¼ cn1 xn1 þ . . . þ c1 x þ co :

A vector subspace is cyclic if it consists of all polynomials divisible by a


generator polynomial gðxÞ, which is in the same time a divisor of polynomial
xn  1, where n is a code word length. In a binary field (q = 2) the subtraction is
equivalent to the addition, and for binary cyclic codes it can be written

xn þ 1 ¼ gðxÞhðxÞ;

where h(x) is a parity-check polynomial. If a degree of polynomial gðxÞ is n − k, a


subspace dimension is k, the power of polynomial hðxÞ is k and a cyclic code ðn; kÞ
is obtained.
Because of this the first step to find the generator polynomial of binary code is a
factorization of polynomial xn þ 1 according to

Y
l
xn þ 1 ¼ mi ðxÞ;
i¼1

where m1(x), m2(x), … are irreducible (minimal) polynomials over field GF(2).
(a) For n = 7 the factorization is as follows
  
x7 þ 1 ¼ ð1 þ xÞ 1 þ x þ x3 1 þ x2 þ x3 ;

the procedure to find the minimal polynomials will be described in details


later.
According to previously given rules, a generator polynomial can be any
combination of minimal polynomials. There are three irreducible factors and it
is possible to form 2n−k − 1 = 7 generator polynomials (polynomial g(x) =
x7 + 1 would generate a code word without information bits), each one
defining one cyclic code. Taking into account that the generator polynomial
252 6 Cyclic Codes

highest power is always n − k, all possible generator polynomials and the


corresponding code parameters are:
(1) g(x) = 1 + x ) n − k = 1, (7, 6) code
(2) g(x) = 1 + x + x3 ) n − k = 3, (7, 4) code
(3) g(x) = 1 + x2 + x3 ) n − k = 3, (7, 4) code
(4) g(x) = (1 + x) (1 + x + x3) ) n − k = 4, (7, 3) code
(5) g(x) = (1 + x) (1 + x2 + x3) ) n − k = 4, (7, 3) code
(6) g(x) = (1 + x + x3) (1 + x2 + x3) ) n − k = 6, (7, 1) code
(7) g(x) = 1 ) n − k = 0, (7, 7) code
In the first case only one parity-check bit is added corresponding to a simple
parity-check, and a parity-check polynomial is h(x) = (1 + x + x3)
(1 + x2 + x3). In the sixth case, the obtained code is a sevenfold repetition (n-
fold repetition is always a cyclic code!). In the last case the parity-check bits
are not added, the information sequence is unchanged and a code word is the
same as the information word.

(b) Previous analysis shows that one possible cyclic code generator polynomial is
 
g1 ð x Þ ¼ ð 1 þ x Þ 1 þ x þ x 3 ¼ 1 þ x 2 þ x 3 þ x 4
¼ g0 þ g1 x þ g2 x 2 þ g3 x 4 þ g4 x 4 :

For arbitrary generator polynomial the elements of the first row of generator
matrix are determined by its coefficients (starting from the lowest power),
while every next row is obtained by shifting the row above for one place at the
right as follows
2 3 2 3
g0 g1 g2 g3 g4 0 0 1 0 1 1 1 0 0
G1 ¼ 4 0 g0 g1 g2 g3 g4 0 5 ¼ 40 1 0 1 1 1 0 5:
0 0 g0 g1 g2 g3 g4 0 0 1 0 1 1 1

In a linear block code there is always all zeros word. The second code word
can be red from generator polynomial coefficients (it corresponds to the
combination (100) at the encoder input) and other six can be obtained as its
cyclic shifts.

ð1011100Þ ! ð0101110Þ ! ð0010111Þ ! ð1001011Þ ! ð1100101Þ


! ð1110010Þ ! ð0111001Þ:

It is obvious that a code have one word with weight zero and seven words with
weight four (a(0) = 1, a(4) = 7), and it can correct one error and detect two
errors in the code word. Reader should verify to which information words
correspond written code words (it can be easily found by using a generator
matrix).
Problems 253

(c) Suppose at a moment that

g2 ð x Þ ¼ 1 þ x 3 þ x 4

is a generator polynomial of a cyclic code (7, 3). The polynomial g(x) highest
power is n − k = 4 and it seems possible that a cyclic code is obtained. If
g(x) is cyclic code generator polynomial, then xg(x) i x2g(x) must be also the
code polynomials and a generator matrix is
2 3
1 0 0 1 1 0 0
G2 ¼ 4 0 1 0 0 1 1 0 5:
0 0 1 0 0 1 1

Code words of this linear block code are obtained multiplying all possible
three-bit information vectors by a generator matrix, given in Table 6.2.
It is clear that by cyclic shifts of the code word c(1) = (000000) no other code
word can be obtained. Cyclic shifts of a word (1001100) yield
c(5) = (1001100) ! c(3) = (0100110) ! c(2) = (0010011) ! (1001001) !
(1100100) ! (0110010) ! (0011001), and the last four combinations are not
code words. Because of the existence of at least one code word whose all
cyclic shifts are not code words, this code obviously is not cyclic.
It can be noted that the same conclusion could be drawn from the fact that the
supposed generator polynomial g(x) = 1 + x3 + x4 does not divide a polyno-
mial x7 + 1 without remainder, because it cannot be written as a product of its
minimal polynomials. It should be noted as well that the spectrum of a code
defined by generator matrix G2 is
8
>
> 1; d ¼ 0;
>
>
< 3; d ¼ 3;
aðdÞ ¼ 2; d ¼ 4;
>
>
> 1; d ¼ 5;
>
:
1; d ¼ 6;

Table 6.2 Transformations defining one block code


Word ordinal number Information word Code word
1 0 0 0 0 0 0 0 0 0 0
2 0 0 1 0 0 1 0 0 1 1
3 0 1 0 0 1 0 0 1 1 0
4 0 1 1 0 1 1 0 1 0 1
5 1 0 0 1 0 0 1 1 0 0
6 1 0 1 1 0 1 1 1 1 1
7 1 1 0 1 1 0 1 0 1 0
8 1 1 1 1 1 1 1 0 0 1
254 6 Cyclic Codes

and this code has a minimum Hamming distance dmin = 3, it detects one error
and can correct it as well. However, the code is not cyclic and cannot be
defined by a generator polynomial g2(x), i.e. the transition from g2(x) to G2,
was not correct.
(d) Starting from a generator polynomial g(x) = 1 + x = g0 + g1x by previ-
ously described procedure a starting nonsystematic generator matrix of
this cyclic code is obtained

2 3 2 3
g0 g1 0 0 0 0 0 1 1 0 0 0 0 0
60 g0 g1 0 0 0 07 60 1 1 0 0 0 07
6 7 6 7
60 0 g0 g1 0 0 07 6 07
G¼6 7 ¼ 60 0 1 1 0 0 7;
60 0 0 g0 g1 0 07 6
7 07
6 60 0 0 1 1 0 7
40 0 0 0 g0 g1 0 5 40 0 0 0 1 1 05
0 0 0 0 0 g0 g1 0 0 0 0 0 1 1

the addition of all six rows and putting the result in the first row yields
2 3 2 3
1 1 0 0 0 0 0 1 0 0 0 0 0 1
60 1 1 0 0 0 07 60 1 1 0 0 0 07
6 7 6 7
60 0 1 1 0 0 07 6 07
GðIÞ ¼6 7 ! GðIIÞ ¼ 6 0 0 1 1 0 0 7:
60 0 0 1 1 0 07 60 0 0 1 1 0 07
6 7 6 7
40 0 0 0 1 1 05 40 0 0 0 1 1 05
0 0 0 0 0 1 1 0 0 0 0 0 1 1

Further, in matrix G(II) the rows 2–6 are added and the result is put into the
second row of a new matrix G(III), where the rows 3–6 are added and put in the
next one
2 3 2 3
1 0 0 0 0 0 1 1 0 0 0 0 0 1
60 1 1 0 0 0 07 60 1 0 0 0 0 17
6 7 6 7
60 0 1 1 0 0 07 6 07
GðIIÞ ¼ 6 7 ! GðIIIÞ ¼ 6 0 0 1 1 0 0 7
60 0 0 1 1 0 077 60 0 0 1 1 0 07
6 6 7
40 0 0 0 1 1 0 5 40 0 0 0 1 1 05
0 020 0 0 1 1 3 0 0 0 0 0 1 1
1 0 0 0 0 0 1
60 1 0 0 0 0 17
6 7
60 0 1 0 0 0 17
! GðIVÞ ¼6
60
7:
6 0 0 1 1 0 07 7
40 0 0 0 1 1 05
0 0 0 0 0 1 1
Problems 255

The procedure, which is in fact the Gaussian elimination method, is repeated


two times more to form the fourth and the fifth row, and a generator matrix
finally has a form
2 3 2 3
1 0 0 0 0 0 1 1 0 0 0 0 0 1
60 1 0 0 0 0 17 60 1 0 0 0 0 17
6 7 6 7
60 0 1 0 0 0 17 60 0 1 0 0 0 17
GðIVÞ ¼ 6
60
7 ! GðVÞ ¼ 6 7
6 0 0 1 1 0 077
60
6 0 0 1 0 0 177
40 0 0 0 1 1 05 40 0 0 0 1 1 05
0 0 20 0 0 1 1 3 0 0 0 0 0 1 1
1 0 0 0 0 0 1
60 1 0 0 0 0 17
6 7
60 0 1 0 0 0 17
! GðVIÞ ¼ 6
60
7:
6 0 0 1 0 0 17 7
40 0 0 0 1 0 15
0 0 0 0 0 1 1

It is obvious that G(VI) is a systematic form of code generator matrix corre-


sponding to the general parity-check. The parity-check bit of this code is a sum
of all information bits modulo-2, what can be clearly seen from the last matrix.
Obviously, the same procedure can be used for code (n, n − 1) construction,
for an arbitrary code word length.

Problem 6.2 A cyclic code, code word length n = 7, is defined by the generator
polynomial g(x) = 1 + x + x3.
(a) Verify whether the code obtained by this generator polynomial is cyclic. Find
the corresponding parity-check polynomial, generator matrix and parity-check
matrix.
(b) Describe in details the procedure to find the code word corresponding to
information word (1010). List all code words of the code and comment its
capability to correct errors.
(c) Describe in details the decoding procedure when in transmitted code word
(corresponding to (0111) information word) if the error occurred at the third
position.
(d) Whether the code obtained by shortening of this code for one bit is cyclic as
well?

Solution
(a) A polynomial is generator one if it divides the polynomial xn + 1 without
remainder. To verify it, the polynomial division is carried out:
256 6 Cyclic Codes

(x7+1) : (x3+x+1) = x4+x2+x+1


x7+x5+x4
x 5+x4+1
x 5+x3+x2
x 4+x3+x2+1
x 4+x2+x
x3+x+1
x3+x+1
0

The remainder equals zero and the code is cyclic. Parity-check polynomial is
obtained as the result of a division

h ð xÞ ¼ 1 þ x þ x2 þ x4 :

For a cyclic code the polynomials corresponding to the code words must be
divisible by g(x). Therefore, the generator matrix rows are determined by
coefficients of polynomials g(x), xg(x), x2g(x), x3g(x)
2 3
1 1 0 1 0 0 0
60 1 1 0 1 0 07
G¼6
40
7:
0 1 1 0 1 05
0 0 0 1 1 0 1

Parity-check polynomial h(x) defines the corresponding code as well, and the
corresponding parity-check matrix rows are determined by coefficients of
polynomials h(x), xh(x), x2h(x) red out in a reverse order
2 3
1 1 1 0 1 0 0
H ¼ 40 1 1 1 0 1 0 5:
0 0 1 1 1 0 1

The following relation holds


 
gð xÞhð xÞ ¼ x7 þ 1 ¼ 0 x7 þ 1 ;

and every code vector is orthogonal to the every row of H yielding

G H T ¼ 0:

(b) Information word (1010) is represented by polynomial i(x) = 1 + x2 and a


corresponding code polynomial is obtained as follows
Problems

Table 6.3 Cyclic code generated by a polynomial g(x) = 1 + x + x3


Information word Code word Polynomial
(0 0 0 0) (0 0 0 0 0 0 0) 0 = 0  g(x)
(1 0 0 0) (1 1 0 1 0 0 0) 1 + x + x3 = g(x)
(0 1 0 0) (0 1 1 0 1 0 0) x + x2 + x3 + x4 = xg(x)
(1 1 0 0) (1 0 1 1 1 0 0) 1 + x2 + x3 + x4 = (1 + x)g(x)
(0 0 1 0) (0 0 1 1 0 1 0) x2 + x3 + x5 = x2g(x)
(1 0 1 0) (1 1 1 0 0 1 0) 1 + x + x2 + x5 = (1 + x2)g(x)
(0 1 1 0) (0 1 0 1 1 1 0) x + x3 + x4 + x5 = (x + x2)g(x)
(1 1 1 0) (1 0 0 0 1 1 0) 1 + x4 + x5 = (1 + x + x2)g(x)
(0 0 0 1) (0 0 0 1 1 0 1) x3 + x4 + x6 = x3g(x)
(1 0 0 1) (1 1 0 0 1 0 1) 1 + x + x4 + x6 = (1 + x3)g(x)
(0 1 0 1) (0 1 1 1 0 0 1) x + x2 + x3 + x6 = (x + x3)g(x)
(1 1 0 1) (1 0 1 0 0 0 1) 1 + x2 + x6 = (1 + x + x3)g(x)
(0 0 1 1) (0 0 1 0 1 1 1) x2 + x4 + x5 + x6 = (x2 + x3)g(x)
(1 0 1 1) (1 1 1 1 1 1 1) 1 + x + x2 + x3 + x4 + x5 + x6 = (1 + x2 + x3)g(x)
(0 1 1 1) (0 1 0 0 0 1 1) x + x5 + x6 = (x + x2 + x3)g(x)
(1 1 1 1) (1 0 0 1 0 1 1) 1 + x3 + x5 + x6 = (1 + x + x2 + x3)g(x)
257
258 6 Cyclic Codes

  
cð xÞ ¼ ið xÞgð xÞ ¼ 1 þ x2 1 þ x þ x3 ¼ 1 þ x þ x3 þ x2 þ x3 þ x5
¼ 1 þ x þ x2 þ x5 :

The highest power of generator polynomial is n − k = 3, and it is obvious that


the code (7, 4) is obtained, the corresponding code word in this case is
(1110010). Code words corresponding to all possible information words are
given in Table 6.3, as well as their polynomial representations.
This code has seven words with the weights d = 3 and d = 4, one with the
weight d = 0 and one with the weight d = 7. It is equivalent to Hamming (7,
4) code. However, these codes are not identical because they do not have the
same code words. Both codes are equivalent to the cyclic code whose gen-
erator polynomial is g(x) = 1 + x2 + x3 (given in the Brief Introductory
Remarks). It is obvious that a minimum Hamming distance is dmin = 3 and the
code can detect and correct no more than one error in the code word.

(c) From Table 6.3 it is easy to find that to information word i = (0111) corre-
sponds the code word c = (0100011). If during the transmission an error
occurred at the third position, the received sequence is r = (0110011) and the
corresponding polynomial is r(x) = x + x2 + x5 + x6. The decoding procedure
is based on finding the remainder after the division of received word poly-
nomial by a generator polynomial

(x 6+x 5+x 2+x) : (x 3+x+1) = x 3+x 2+x


x 6+x 4+x3
x 5+x 4+x 3+x 2+x
x 5+x 3+x 2
x 4+x
x 4+x 2+x
x2

The obtained remainder is (001), it is not equal to zero meaning that an error
occurred. To find its position, one should find to what error type corresponds
this remainder. This code can correct all single errors, the syndrome value is
found from S ¼ rH T ¼ ðc eÞH T ¼ eH T and to the error at jth position the
syndrome corresponds being equal to the jth column of a parity-check matrix,
as shown in Table 6.4.

(d) Generator matrix of the code obtained from the originating one by shortening
it for one bit is obtained by eliminating the last row and the last column from
the basic code matrix. There are six cyclic shifts of the code word and all
words (given in Table 6.5) have not the same Hamming weight, it is obvious
that the shortening disturbed the cyclical property. However, the minimum
Problems 259

Table 6.4 Syndromes corresponding to correctable error patterns


Error pattern Syndrome
1 0 0 0 0 0 0 1 0 0
0 1 0 0 0 0 0 0 1 0
0 0 1 0 0 0 0 0 0 1
0 0 0 1 0 0 0 1 1 0
0 0 0 0 1 0 0 0 1 1
0 0 0 0 0 1 0 1 1 1
0 0 0 0 0 0 1 1 0 1
0 0 0 0 0 0 0 0 0 0

Table 6.5 Code words of the code (6, 3) obtained by cyclic code (7, 4) shortening
Information word Code word
0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 1
0 1 0 0 1 1 0 1 0
0 1 1 0 1 0 1 1 1
1 0 0 1 1 0 1 0 0
1 0 1 1 1 1 0 0 1
1 1 0 1 0 1 1 1 0
1 1 1 1 0 0 0 1 1

Hamming distance was not changed and the code still can detect and correct
one error in the code word (but having a smaller code rate R = 1/2). It is easy
to verify that this code is equivalent to shortened Hamming code (6, 3) from
Problem 4.8.

Problem 6.3 Generator polynomial of a cyclic code (code word length n = 7) is


 
gð x Þ ¼ ð 1 þ x Þ 1 þ x 2 þ x 3 :

(a) Find the code generator matrix and list all code words.
260 6 Cyclic Codes

(b) Explain the procedure to obtain one code word of the systematic version of the
code. Write all code words and a corresponding generator matrix.
(c) Do the generator matrix of a systematic code can be obtained from the cor-
responding matrix of the nonsystematic code and how?

Solution
(a) By applying the procedure from the previous problem, it is easy to find the
generator polynomial

gð x Þ ¼ 1 þ x þ x 2 þ x 4 ;

and, as g0 = g1 = g2 = 1, g3 = 0 and g4 = 1, the generator matrix is


2 3 2 3
g0 g1 g2 g3 g4 0 0 1 1 1 0 1 0 0
G¼4 0 g0 g1 g2 g3 g4 0 5 ¼ 40 1 1 1 0 1 0 5:
0 0 g0 g1 g2 g3 g4 0 0 1 1 1 0 1

To obtain a nonsystematic code, the code polynomial is found from c(x) = i(x)
g(x), the list of all code words is given in Table 6.6.
(b) Code polynomial of a systematic cyclic code is found using the relation
 
iðxÞxnk
cð xÞ ¼ ið xÞxnk þ rem
gðxÞ

where rem{.} denotes the remainder (polynomial) after division. In this case
n − k = 4 and to information word i = (111) corresponds the polynomial i
(x) = 1 + x + x2, while the code polynomial is
 
  ð1 þ x þ x2 Þx4
cð xÞ ¼ 1 þ x þ x2 x4 þ rem :
1 þ x þ x2 þ x4

The remainder after division is obtained as follows

Table 6.6 Code words of cyclic code (7, 3), nonsystematic and systematic version
Information Code word of the Code word of the
word nonsystematic code systematic code
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 1 1 1 0 1 0 0 1 1 1 0 1 0 0
0 1 0 0 1 1 1 0 1 0 0 1 1 1 0 1 0
0 0 1 0 0 1 1 1 0 1 1 1 0 1 0 0 1
1 1 0 1 0 0 1 1 1 0 1 0 0 1 1 1 0
0 1 1 0 1 0 0 1 1 1 1 0 1 0 0 1 1
1 0 1 1 1 0 1 0 0 1 0 0 1 1 1 0 1
1 1 1 1 0 1 0 0 1 1 0 1 0 0 1 1 1
Problems 261

 
as rem ðiðxÞ  xnk Þ=gðxÞ ¼ x, a code word is obtained from the code
polynomial coefficients

cð xÞ ¼ x4 þ x5 þ x6 þ x ¼ x þ x4 þ x5 þ x6 ! c ¼ ð0100111Þ:

To information words i1 = (100), i2 = (010) and i3 = (001) correspond the


code polynomials
 
4 x4
c1 ð xÞ ¼ x þ rem ¼ 1 þ x þ x2 þ x4
1 þ x þ x2 þ x4
 
x5
c2 ð xÞ ¼ x5 þ rem ¼ x þ x2 þ x3 þ x5
1 þ x þ x2 þ x4
 
x6
c3 ð xÞ ¼ x6 þ rem ¼ 1 þ x þ x3 þ x6
1 þ x þ x2 þ x4

and a generator matrix of systematic code can be easily found


2 3
1 1 1 0 1 0 0
Gs ¼ 4 0 1 1 1 0 1 0 5:
1 1 0 1 0 0 1

Using the linear combinations of the generator matrix rows, the code words of
this systematic code can be easily found (given in the third column of the
Table 6.6). It is obvious that both systematic and nonsystematic codes are
cyclic, because by a cyclic shift of any code word, the code word is obtained
as well. In this case a set of code words is identical, the code being systematic
or not.
(c) Generator matrix of the systematic code can be easily found, its third row will
be obtained by summing the first and the third row of the nonsystematic code
generator matrix.

⎡1 1 1 0 1 0 0 ⎤ ⎡1 1 1 0 1 0 0 ⎤
G = ⎢⎢0 1 1 1 0 1 0 ⎥⎥ ⎢ ⎥
⎢ 0 1 1 1 0 1 0 ⎥ = Gs .
⎢⎣0 0 1 1 1 0 1 ⎥⎦ ⎢⎣1 1 0 1 0 0 1 ⎥⎦
262 6 Cyclic Codes

Problem 6.4 The cyclic code generator polynomial is:

gðxÞ ¼ ðx þ 1Þ2 ðx3 þ x þ 1Þðx3 þ x2 þ 1Þ2

(a) Verify do the code is a cyclic one! Find all code words and draw its weight
spectrum,
(b) How many errors can be corrected and how many can be detected? Verify
whether the Hamming and Singleton bounds are satisfied.
(c) Find the probability that the emitted code word is not corrected if the channel
can be modeled as BSC (crossover probability p = 10−2).

Solution
A generator polynomial has to satisfy the condition to be a divisor of polynomial
xn þ 1 (n—code word length). It means that for given g(x) a minimum n should be
found so as that
 
xn þ 1
rem ¼0
gðxÞ

In this case, for gðxÞ ¼ ðx þ 1Þ2 ðx3 þ x þ 1Þðx3 þ x2 þ 1Þ2 , minimum n value
satisfying the above equality is n = 14, because the factoring can be carried out in
field GF(2):

x14 þ 1 ¼ ðx7 þ 1Þ2 ¼ ðx þ 1Þ2 ðx3 þ x þ 1Þ2 ðx3 þ x2 þ 1Þ2 ¼ gðxÞðx3 þ x þ 1Þ;

where the second element at the right side is the parity-check polynomial, while the
generator polynomial is

gðxÞ ¼ ðx þ 1Þ2 ðx3 þ x þ 1Þðx3 þ x2 þ 1Þ2 ¼ 1 þ x þ x2 þ x4 þ x7 þ x8 þ x9 þ x11 :

(a) To information polynomial i(x) = 1 + x2 corresponds the code polynomial

cð xÞ ¼ ið xÞgð xÞ ¼ 1 þ x þ x3 þ x6 þ x7 þ x8 þ x10 þ x13 ;

and, procedure for a direct obtaining of a code word from the information
word i(x) = 1 + x2 and the generator polynomial coefficients is as follows

111010011101×101
111010011101
111010011101
c=(1 1 0 1 0 0 1 1 1 0 1 0 0 1).
Problems 263

Table 6.7 Code words of a cyclic code (14, 3)


Information word Code word
0 0 0 00000000000000
0 0 1 00111010011101
0 1 0 01110100111010
0 1 1 01001110100111
1 0 0 11101001110100
1 0 1 11010011101001
1 1 0 10011101001110
1 1 1 10100111010011

Fig. 6.4 Weight spectrum of the code (14, 3)

It is easy to show that six code words more can be obtained by cyclic shifts of
the obtained code word. As to the information word i = (000) always corre-
sponds all zero code word, list of all code words is given in Table 6.7 and the
corresponding weight spectrum is shown in Fig. 6.4.

(b) Minimum Hamming distance is dmin = 8, it is obvious that the code corrects
up to ec = 3 errors and detects ed = 4 errors in the code word. If the code
capability to correct the errors (ec = 0) is not used, the code can detect up to
ed  dmin − 1 = 7 errors in the code word.
The code parameters are n = 14, k = 3, a Hamming bound is satisfied because
264 6 Cyclic Codes

   
14 14 14 14
2 143
 þ þ þ ) 2048 [ 1 þ 14 þ 91 þ 364
0 1 2 3
¼ 470;

and a Singleton bound is

dmin  n  k þ 1 ) 8  12:

It is obvious that both bounds are satisfied with a very large margin, meaning
that it is possible to construct a code having better performances with the same
code parameters. It can also be noticed that for the same code word length and
the information word length, the following relation is satisfied as well

4 
X 14
2 143
 ;
t¼1
t

meaning that a Hamming bound predicts the existence of a linear block code
having the parameters (14, 3) correcting ec = 4 words at the code word.
Singleton bound in this case is less severe and it even predicts the code
existence with dmin  12, correcting up to ec = 5 error at the code word.
A code (14, 3) correcting five errors at a code word does not fulfill a Hamming
bound, it does not exist, and there is no sense to look for it. However, it even
does not mean that a code (14, 3) correcting four errors exists. The fulfilling of
both bounds only means that this code may exist (and one has to find it!). The
Hamming and Singleton bounds consider the possibility of existence the linear
block codes which has given parameters, and even if such a code exists, it
does not mean that it is a cyclic code.

(c) In Problem 5.8 it was shown that the probability not to detect errors corre-
sponds to the probability that an error vector corresponds to the code word,
and it can be easily calculated using the weight spectrum

X
n
Pe;d ðpÞ ¼ aðdÞpd ð1  pÞnd ;
d¼dmin

for the crossover probability p = 10−2 it is

Pe;d ð102 Þ ¼ 7  ð102 Þ8  ð1  102 Þ148 ¼ 6; 59  1016 :

On the other hand, the probability that the error is not corrected corresponds to
the probability that an error vector does not corresponds to the coset leader in a
standard array. If the maximum weight of the pattern corresponding to the coset
Problems 265

Fig. 6.5 Golay code (23, 12) 1400

Number of words with given weigth, a(d)


weight spectrum
1200
11(1288) 12(1288)

1000

800

600
8(506) 15(506)

400
8(253) 16(253)
200
0(1) 23(1)
0
0 5 10 15 20
Hamming weight, d

leaders is denoted by l, and the number of patterns of the weight i  l is denoted


by L(i), this probability is

X
l
Pe;c ðpÞ ¼ 1  LðiÞpi ð1  pÞni :
i¼0

The code corrects single, double and triple errors and it is obvious that all error
vectors having one, two or three ones (total 470) can be coset leaders, and L(0) = 1,
L(1) = 14, L(2) = 91 and L(3) = 364. Now, the coset leaders corresponding to the
syndrome values left over should be found. The reader should verify that, for this
code, all error vectors having four ones result in the various syndrome values and to
find the probability not to correct the errors. It is the most easier to use a computer
search.
Problem 6.5 The Golay cyclic code (code word length n = 23) generator poly-
nomial is:

gðxÞ ¼ 1 þ x2 þ x4 þ x5 þ x6 þ x10 þ x11 :

(a) Explain the forming procedure of one code word.


(b) Draw weight spectrum and find the number of errors correctable by this code.
Whether the code is perfect? Is it a MDS code?
(c) Show how to form a Golay code (24, 12) and explain its possibilities.
(d) Explain how to form a systematic variant of Golay (24, 12) code,
266 6 Cyclic Codes

Solution
The code word length is n = 23 and generator polynomial power is n − k = 11,
therefore the information word length is k = 12. It is the Golay code (23, 12)
described in [24].
(a) The forming of a code word corresponding to information word i(x) = 1 +
x5 + x11 is carried out by finding the code polynomial coefficients

c( x) = i ( x) g ( x) = (1 + x5 + x12 )(1 + x 2 + x 4 + x5 + x 6 + x10 + x11 ) = 1 + x 2 + x 4 + x5 + x 6 + x10


+ x11 + x 5 + x 7 + x9 + x10 + x11 + x15 + x16 + x12 + x14 + x16 + x17 + x18 + x 22 + x 23
= 1 + x 2 + x 4 + x 6 + x 7 + x9 + x12 + x14 + x15 + x17 + x18 + x 22 + x 23 ,

and the corresponding code word is c = (101010110100101101100011).

(b) Golay code (23, 12) weight spectrum was found by a computer search by
repeating the above exposed procedure to find all 212 code words. The result is
shown in Fig. 6.5. Minimum Hamming distance is dmin = 7 and the code can
correct up to three errors in the code word.
In this case the Hamming bound is satisfied with equality because

3 
X
2312 23
2 ¼ ¼ 2048;
t¼1
t

and the Golay (23, 12) code is perfect. Singleton bound is satisfied as well

dmin  n  k þ 1 ) 7  12

but not with the equality and it is not an MDS code.

(c) The extended Golay code which has the parameters (24, 12) is obtained by
adding a general parity-check to the Golay (23, 12) code. Its generator
polynomial is

gðxÞ ¼ ð1 þ xÞð1 þ x2 þ x4 þ x5 þ x6 þ x10 þ x11 Þ


¼ 1 þ x þ x2 þ x3 þ x4 þ x7 þ x10 þ x12 ;

a weight spectrum of the code is determined by coefficients a(0) = a(24) = 1,


a(8) = a(16) = 759, a(12) = 2576.
Problems 267

(d) The systematic Golay code (24, 12) has the code words determined by the
coefficients of the polynomial
 
iðxÞxnk
cð xÞ ¼ ið xÞx nk
þ rem ;
gðxÞ

and to information word i = (100000000000) corresponds a code polynomial


 
12 x12
c1 ð xÞ ¼ x þ rem
1 þ x þ x2 þ x3 þ x4 þ x7 þ x10 þ x12
¼ 1 þ x þ x2 þ x3 þ x4 þ x7 þ x10 þ x12 :
 
13 x13
c2 ð xÞ ¼ x þ rem
1 þ x þ x2 þ x3 þ x4 þ x7 þ x10 þ x12
¼ 1 þ x þ x þ x3 þ x4 þ x7 þ x10 þ x13 :
2

This procedure is repeated for all information words having only one “one”
resulting in the corresponding code words. By using these words as the rows
of a matrix which has 12 rows and 24 columns, a systematic generator matrix
is obtained

G ¼ ½P; I12 :

This matrix is usually additionally modified so as that the unity matrix is at the
first 12 columns, while the positions of other columns are changed so as their
rows and columns become identical:

Just due to this last feature, AT = A and a control matrix is


268 6 Cyclic Codes

H ¼ AT ; I12 ¼ ½A; I12 ;

and the Golay code (24, 12) is self-dual (dual to itself).

Problem 6.6 The generator polynomial of cyclic code is

gðxÞ ¼ 1 þ x2 þ x8 :

(a) Explain the procedure for forming a sequence at the CRC encoder output if the
information sequence i = (01100111) was emitted.
(b) Verify the remainder in decoder after division if there were no errors during
the transmission.
(c) If during the transmission the errors at the second, at the last but one and at the
last position occurred find the result in CRC decoder.
(d) Draw a code weight spectrum and comment its capability to detect the errors.

Solution
(a) A polynomial corresponding to the information sequence is i(x) = x7 + x6 +
x5 + x2 + x, and the code sequence is
 
iðxÞ  xnk
cð xÞ ¼ ið xÞxnk þ rem ¼ ið xÞx8 þ r ð xÞ:
gðxÞ

A procedure for finding the remainder r(x) will be split into two phases.
Firstly, a polynomial to be divided will be found and a corresponding binary
sequence (bits corresponding to the coefficients)

i ð xÞ ¼ ið xÞx8 ¼ x9 þ x10 þ x13 þ x14 þ x15 ! i ¼ ð0000000001100111Þ;

and the remainder after division of the found polynomial by the generator
polynomial is
 
i ðxÞ
r ð xÞ ¼ rem ¼ 1 þ x4 þ x5 þ x6 ! r ¼ ð1000011Þ;
gðxÞ

therefore, a code polynomial and a corresponding code word are

cð xÞ ¼ ið xÞ  x8 þ r ð xÞ ¼ 1 þ x4 þ x5 þ x6 þ x9 þ x10 þ x13 þ x14 þ x15 ! c


¼ ð1000111001100111Þ:

(b) If during the transmission there were no errors, the code word was not
changed. The decoder is performing a Cyclic Redundancy Check (CRC) to
find the remainder after division of the received polynomial by a generator
Problems 269

polynomial [35]. A procedure of division cðxÞ by g(x) is as follows (it starts


from the highest power coefficient!):

1110011001110001 : 100000101 =11100101


100000101
110010011110001
100000101
10010110110001
100000101
0010100010001
100000101
0100000101
100000101
00000000

the obtained result is a polynomial q(x) = 1 + x2 + x5 + x 6 + x7 (not


important here) and a remainder equals zero. It is just the indicator that there
were no errors during transmission.
(c) In this case the received message is

c ¼ ð1000011001100111Þ
e ¼ ð0100000000000011Þ
c0 ¼ c þ e ¼ ð1100011001100100Þ

and a received word polynomial is r(x) = 1 + x + x5 + x6 + x9 + x10 + x13,


while the remainder after division by a generator polynomial is

Fig. 6.6 CRC code (16, 8) weight spectrum


270 6 Cyclic Codes

 
c0 ðxÞ
r ð xÞ ¼ rem ¼ x7 þ x6 þ x4 þ x3 þ x2 þ 1
gðxÞ

This remainder is different from zero, i.e. r = (11011101) 6¼ (00000000), and


the transmission error(s) is (are) detected, being the goal of CRC procedure. In
this case the number of errors is not known, neither their positions, because
this decoder is not designed to correct the errors.
A procedure for obtaining the code words, explained in the first part of this
problem, can be repeated for any 8-bits code word. By a computer search all
28 = 256 code words were generated and their Hamming weights were found,
the result shown in Fig. 6.6.
As explained earlier, a linear block code cannot detect an error vector if it is
the same as a code word (vector). The probability for this event can be found
on the weight spectrum basis, and for p  1 (e.g. p < 10−2) the following
approximation can be used

X
n X
n
Pe;d ðpÞ ¼ aðdÞpd ð1  pÞnd aðdÞpd aðdmin Þpdmin :
d¼dmin d¼dmin

It is obvious that this probability would be greater if dmin is greater and a(dmin)
smaller. In this case dmin = 3, and a code can detect all single and double
errors, while more errors can be detected, if they do not correspond to the code
word structures.

Problem 6.7 Binary message i = (01100111) is transmitted through the system


using CRC which has the generator polynomial gðxÞ ¼ x4 þ x3 þ 1.
(a) Find a corresponding code word. Whether the same code word could be
obtained using a nonsystematic procedure for some other information word? If
yes, find it!
(b) Draw block schemes of encoder and decoder for error detection and explain
how they work.
(c) Find a weight spectrum of a code and calculate the probability of the unde-
tected error if the channel can be modeled as BSC with crossover probability
p = 10−3.
(d) Whether by using the same generator polynomial is possible to form linear
block codes having parameters (7, 3) and (11, 4)? Are these codes cyclic?

Solution
(a) As mentioned in the previous problems, forming of a cyclic code can be
conceived as multiplying information polynomial by a generator polynomial
cnesist ðxÞ ¼ iðxÞgðxÞ, but the obtained code word is not a systematic one.
Problems 271

SET SET SET SET


D Q D Q D Q D Q

c11...c0
CLR
Q CLR
Q CLR
Q CLR
Q
clk

SET SET SET SET SET SET SET SET


D Q D Q D Q D Q D Q D Q D Q D Q

CLR Q CLR Q CLR Q CLR Q CLR Q CLR Q CLR Q CLR Q

i7 i6 i5 i4 i3 i2 i1 i0

Fig. 6.7 Block scheme of CRC encoder for (12, 8) code

To form a systematic code, where a code word is composed by clearly sep-


arated information and parity-check bits, the word is formed on the basis of the
following identity

iðxÞxnk
csist ðxÞ ¼ iðxÞxnk þ rem :
gðxÞ

As shown in the previous problem, this word is divisible by g(x) without


remainder. It means that it can be obtained in the other way, multiplying a
generator polynomial by some other information polynomial i2(x)

csist ðxÞ ¼ i2 ðxÞgðxÞ:

Combining two previous equations the following relation is obtained (addition


and subtraction are equivalent)

iðxÞxnk
iðxÞx nk
¼ i2 ðxÞgðxÞ þ rem ;
gðxÞ

from which it can be concluded that dividing a polynomial iðxÞxnk by g(x) the
result i2 ðxÞ is obtained, and the remainder is rðxÞ ¼ rem iðxÞxnk =gðxÞ . In
this example iðxÞ ¼ x þ x2 þ x5 þ x6 þ x7 , while a division by g(x) is as
follows
272 6 Cyclic Codes

111001100000 :11001 =10110110


11001
010111
11001
11100
11001
010100
11001
11010
11001
0011

It should be noted that a polynomial iðxÞxnk in a general case has n coeffi-


cients, while a generator polynomial is of order n − k, resulting in the
following:
– a result obtained by division is a polynomial having k coefficients, and the
information word looked for is

i2 ðxÞ ¼ x þ x2 þ x4 þ x5 þ x7 ) i2 ¼ ð0110111Þ

– a remainder after division in a general case is polynomial which has n–


k coefficients

rðxÞ ¼ x þ x2 :

Therefore, every code word obtained by a systematic procedure, can be


obtained as well by starting from some other information word, using a
nonsystematic procedure. It can be suitable for code spectrum calculating.

cˆ0 ...cˆ11
D D D D

(1) OE
D 0 D 1 D 2 D 3 D4 D 5 D 6 D 7
(2)

mˆ 0 ...mˆ 7

Fig. 6.8 Block scheme of CRC decoder for (12, 8) code


Problems 273

Table 6.8 Code words of a cyclic code (7, 3)


Information word Code word
0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 1 1 0 0 1
0 1 0 0 1 0 1 0 1 1
0 0 1 0 1 1 0 0 1 0
1 1 0 1 0 0 1 1 1 1
0 1 1 1 0 1 0 1 1 0
1 0 1 1 1 0 0 1 0 0
1 1 1 1 1 1 1 1 0 1

Instead of division a polynomial modulo polynomial, a polynomial multipli-


cation modulo-2 can be used.

(b) Block scheme of a CRC encoder for (12, 8) code is shown in Fig. 6.7—during
the first k shifts, the lower bits are red out, while at the upper part a division is
performed. The encoder consists of two shift registers (usually realized using
D-flip flops, as standard elements for delay). Generally, a lower register has
k delay cells. The writing is parallel, but the reading (sending at the line) is
serial, during k successive shifts. The upper register consists of n − k delay
cells and a feedback is defined by coefficients of generator polynomial. In this
register, after k shifts, a remainder after division by a generator polynomial is
obtained. It will be sent at the line after the information bits, because the
switch will be at the upper position after the first k shifts.

(a) (b)

Fig. 6.9 Weight spectra of two CRC codes having the same generator polynomial g(x) = x4 +
x3 + 1, but the different code words lengths—(11, 7) (a) and (7, 3) (b)
274 6 Cyclic Codes

Table 6.9 Minimum code distances profile—optimum and for two considered codes
Dmin Optimum CCITT, nc = 32767 C1, nc = 151
17 17
12 18
10 19, …, 21 17, …, 20
9 22
8 23, …, 31 21, 22
7 32, …, 35
6 36, …, 151 23, …, 151
5 152, …, 257
4 258, …, 32767 17, …, 32767
3 32768, …, 65535
2  65536  32768  152

Block scheme of the CRC decoder is shown in Fig. 6.8, using simplified symbols
for the delay cells. When the switch is in position (1), the first k bits enter the
division circuit and the shift register. Here the remainder after division by the
generator polynomial, whose coefficients determine the feedback structure as
well, is obtained. If the remainder differs from zero, the signal OE (Output
Enable) is not active and the reading from the lower register is not permitted.
If the remainder equals zero, indicating that there were no errors during
transmission, signal OE is active and the decoded information can be delivered
to the user.

Fig. 6.10 Weight spectrum of the cyclic code (15, 11)



(c) The CRC procedure as a result gives the code words of a code obtained by
shortening a binary cyclic code. Since the polynomial g(x) = x^4 + x^3 + 1 is an
irreducible factor of the polynomial x^15 + 1, the CRC procedure based on this
polynomial can generate all codes having the parameters n ≤ 15 and n −
k = 4. Codes having the parameters (11, 7) and (7, 3) fulfill these conditions
and they can be generated using this polynomial. By a computer search all
code words of these codes were found and their weight spectra are shown in
Figs. 6.9a, b. In Table 6.8 all code words of the code (7, 3) are listed. It can be
easily seen that this code is not cyclic, nor equivalent to the code from
Problem 6.3.
It is clear that every obtained code, whose spectra were found, has a minimum
Hamming distance dmin = 3, and could correct one error in a code word.
However, here the CRC codes are considered, where the aim is not the error
correction. If ec = 0, they can detect at most ed = 2 errors in the code word. It
should also be noticed that the spectrum shape indicates whether a code is
cyclic. If a spectrum is symmetric (as for the case of the non-shortened code (15,
11), spectrum shown in Fig. 6.10), the code is cyclic. From Fig. 6.9 it is clear
that code shortening usually destroys this feature.
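The weight spectra mentioned above can be reproduced by a short exhaustive search. The sketch
below (an added illustration in Python, using the same bit-mask convention as the previous
snippet) generates all code words of the shortened (7, 3) CRC code with g(x) = x^4 + x^3 + 1
and counts their Hamming weights; the result agrees with Table 6.8.

from collections import Counter

def gf2_mod(dividend, divisor):
    # remainder of GF(2) polynomial division
    d = divisor.bit_length()
    while dividend.bit_length() >= d:
        dividend ^= divisor << (dividend.bit_length() - d)
    return dividend

def crc_codeword(info, g, nk):
    # systematic CRC code word: information bits followed by the remainder
    return (info << nk) | gf2_mod(info << nk, g)

g, nk, k = 0b11001, 4, 3
spectrum = Counter(bin(crc_codeword(i, g, nk)).count("1") for i in range(2 ** k))
print(sorted(spectrum.items()))   # [(0, 1), (3, 3), (4, 2), (5, 1), (6, 1)]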

Problem 6.8 During the CRC-8 procedure for error detecting, after the information
word, the CRC extension is included, where its forming is based on the generator
polynomial g(x) = x^8 + x^7 + x^6 + x^4 + x^2 + 1.
(a) Find the code word corresponding to information word i = (1000000) and find
the code distances spectrum.
(b) Find the code word corresponding to information word i = (10000000) and
find the code distances spectrum.


Fig. 6.11 Weight spectra for two CRC codes which have the same generator polynomial
g(x) = x^8 + x^7 + x^6 + x^4 + x^2 + 1, but different code word lengths—(16, 8) (a) and (20, 12) (b)

Solution
CRC code word is formed using the identity

cCRC(x) = i(x)x^(n-k) + rem[i(x)x^(n-k)/g(x)],

and always consists of clearly separated information and parity-check bits. When
CRC-8 is applied, the remainder always has eight bits, while the length of the
information part depends on the message to be transmitted.
(a) If the information sequence length is k = 8, the code word has length n = 16 and
for the considered example i(x) = 1, yielding

c^(I)(x) = x^8 + rem[x^8/g(x)] = 1 + x^2 + x^4 + x^6 + x^7 + x^8.

Here the code word polynomial coincides with the generator polynomial, and the code word is
obtained by reading the coefficients starting from x^0 to x^15

c^(n=8) = (1010101110000000).

A code spectrum found by a computer search is shown in Fig. 6.11a. In this
case dmin = 4, so the code detects all three-bit error combinations in a 16-bit
block. It is obvious that there are no code words which have odd weights and
the code will detect all sequences having an odd number of errors.

(b) Although the information sequence length here is k = 10, the procedure for
obtaining a code word is practically unchanged, because here the following is
satisfied as well

c^(II)(x) = g(x).

However, the code word in this case has the length n = 18, being obtained by
reading the coefficients starting from x^0 to x^17

c^(n=10) = (101010111000000000).

If the same information polynomial is used, but the code words have
various lengths, it is obvious that the resulting code polynomials will be
unchanged, and under the same conditions a longer code word differs from a
shorter one only by additional zeros at the right side. The code spectrum is shown in
Fig. 6.11b. In this case dmin = 4 as well, and the code can also detect all three-bit
error combinations, but this time in the twenty-bit block. The code will detect
all sequences with an odd number of errors as well.

Some general rules for the choice and analysis of CRC code performances are:
(1) By shortening a cyclic code, a code is obtained having a minimum
Hamming distance at least as large as that of the original code. In fact, by shortening a
cyclic code, the parameter dmin cannot be decreased! Taking into account
that the code corrects at least the same number of errors as before the shortening,
but has shorter code words, it is obvious that the capability for
error detection is increased by code shortening. Of course, this is paid for by
a decrease of code rate, because in a shorter code word the number of
parity-checks remains the same.
(2) A main advantage of code shortening is the possibility to adjust the code
word length to the length of the message entering the encoder. In such a way,
the same generator polynomial can be used for encoding of information
words of various lengths. However, for every generator polynomial there
is an upper bound for the code word length, after which the code performances
become abruptly worse. It will be considered in the next problem.
(3) It is desirable for a code to have the capability to detect an odd number of
errors. It can be achieved by the generator polynomial choice, if it has a
factor (x + 1). It is interesting that in this case the code can detect an odd
number of errors although x + 1 is not written explicitly as a factor of the
generator polynomial!
(4) The spectrum of the code (20, 12) shown in Fig. 6.11b was obtained by
inspection of the Hamming weights of all 4096 code words. In communi-
cation systems the information bits are grouped into still longer blocks,
and a long time is needed to find the spectrum of such a linear block code. In
practice, the codes usually used have a maximum generator polynomial
degree of 16 or 24 (wireless systems), up to n − k = 32 (e.g. Ethernet),
while the information block lengths are a few hundreds and even thousands
of bits. In such a case the procedure to find a code spectrum used in this
problem is useless (about 2^1000 code words would have to be generated and
analyzed!), but the performance analysis can then be done by observing the
dual code spectrum.

Fig. 6.12 Weight spectra for two CRC codes, for the same code word length (n = 30), but
with different generator polynomials g1(x) = x^16 + x^12 + x^5 + 1 (a) and g2(x) = x^16 + x^13 + x^12 +
x^11 + x^10 + x^8 + x^6 + x^5 + x^2 + 1 (b)

Problem 6.9 A communication system uses CRC, where the generator polynomial
could be g1(x) = x^16 + x^12 + x^5 + 1 or g2(x) = x^16 + x^13 + x^12 + x^11 + x^10 + x^8 +
x^6 + x^5 + x^2 + 1.
(a) If the code word length is n = 30, draw the spectra of both codes. How does the spectrum
shape change and what are their possibilities to detect errors if the code word
length is changed?
(b) If for the channel a BSC model can be used (crossover probability p = 10^-3),
draw the dependence of the probability that the errors are not detected versus the
code word length (for n ≤ 300). Find the ranges of information word (message)
length where it is possible to apply the CRC procedure in both cases.
(c) If the CRC procedure is applied for a word length n = 120, find the dependence
of the probability not to detect the errors vs. the crossover probability. Repeat the same
procedure for code word length n = 200. What are the advantages and the drawbacks of
both codes?

Solution
(a) The code word length is relatively short and a direct method was used to find the
spectrum by a computer search. Here n = 30 and n − k = 16, so in both cases
2^14 = 16384 code words were generated, their Hamming weights and their
spectra were found. For both polynomials they are shown in Figs. 6.12a, b.

Fig. 6.13 Probability that a code does not detect error vs. the code word length for two analyzed
codes
A code with generator polynomial g1(x) = x^16 + x^12 + x^5 + 1 is the standard
CCITT CRC-16 code having dmin = 4, and in the code there are A(dmin) = 14
words having four ones. A CRC-16 code using the polynomial g2(x) =
x^16 + x^13 + x^12 + x^11 + x^10 + x^8 + x^6 + x^5 + x^2 + 1 was proposed [36] as the C1
code, which has dmin = 6, and in the code there are A(dmin) = 18 words having six
ones. Both codes can detect any odd number of errors. The reader should
verify the divisibility of the generator polynomials by the factor x + 1.
As mentioned earlier, a shortening of a cyclic code cannot additionally
decrease the minimum Hamming weight. If the CCITT CRC-16 code is shortened
to the code word length n = 25, dmin = 4 (the same), but in the code there are fewer
words with this weight, because A(4) = 9. Similarly, for the code whose gen-
erator polynomial is g2(x) = x^16 + x^13 + x^12 + x^11 + x^10 + x^8 + x^6 + x^5 +
x^2 + 1, for n = 25 it is dmin = 6, but now A(6) = 3.
On the basis of the analysis from the previous chapter, the probability that the
error was not detected is

Ped(p, n) = Σ_{d=dmin(n)}^{n} Ad(n) p^d (1 − p)^(n−d),

Fig. 6.14 Probability for a code not to detect error vs. crossover probability for two considered
CRC codes and two code word lengths

where it was stressed that the minimum Hamming distance, as well as the
weight spectrum coefficients, depend on the code word length (it is important
how much a CRC code was shortened). CRC codes are generally used for
channels which have a low noise power and which, except for rare packet errors,
can be modeled as a BSC where the crossover probability is sufficiently
small (p ≤ 10^-2), and the codes are optimized for such a channel type. Then
the first element of the series is dominant, and the following approximation can
be used

Ped(p, n) ≈ A_{dmin(n)}(n) p^{dmin(n)}.
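As an added illustration, the exact sum above and its single-term approximation are easy
to evaluate; the Python sketch below uses the spectrum of the small (7, 3) code from
Table 6.8, chosen only as an example, and compares the two values.

def p_undetected(spectrum, p, n):
    # P_ed(p, n) = sum over d of A_d * p^d * (1-p)^(n-d)
    return sum(A * p**d * (1 - p)**(n - d) for d, A in spectrum.items())

spectrum = {3: 3, 4: 2, 5: 1, 6: 1}    # A_d of the (7, 3) code, d > 0
p, n, dmin = 1e-3, 7, 3
exact = p_undetected(spectrum, p, n)
approx = spectrum[dmin] * p**dmin      # A_dmin * p^dmin
print(exact, approx)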

(b) The above procedure is repeated for code word lengths n ≤ 300. Of
course, here n − k = 16 and it is obvious that the code word length cannot be
shorter than n = 17; a typical example of such a code is the simple
parity-check. For n > 35 the performances are found using the dual code
spectrum (as described in Problem 5.9) and the results are shown in Fig. 6.13.
It is obvious that the code with generator polynomial g2(x) = x^16 + x^13 + x^12 +
x^11 + x^10 + x^8 + x^6 + x^5 + x^2 + 1 (here denoted as the C1 code)
provides greater reliability for code word lengths n ≤ 151, while the CCITT
code has much better performances for longer code word lengths.
The above exposed results can be understood more easily if one notices that the gen-
erator polynomial of the CCITT CRC-16 code, g1(x) = x^16 + x^12 + x^5 + 1, is a factor
of the polynomial x^32767 + 1, while the generator polynomial of the C1 CRC-16 code,
g2(x) = x^16 + x^13 + x^12 + x^11 + x^10 + x^8 + x^6 + x^5 + x^2 + 1, is a factor of the poly-
nomial x^151 + 1. The code length of the non-shortened CCITT code is nc =
32767 and the polynomial g1(x) can generate only codes being shortened
versions of a cyclic code (32767, 32751). Similarly, the polynomial g2(x) can
generate shortened versions of a cyclic code (151, 135) because here nc = 151.
By using the polynomial g2(x) it is not possible to obtain a code for effective
error detection for code word lengths n > 151. It was shown that a code

Table 6.10 Galois field GF(2^3)

Exponential equivalent   Polynomial equivalent   Binary equivalent   Minimal polynomial
a^0                      1                       1 0 0               m3(x) = x + 1
a^1                      a                       0 1 0               m1(x) = x^3 + x^2 + 1
a^2                      a^2                     0 0 1               m1(x) = x^3 + x^2 + 1
a^3                      1 + a^2                 1 0 1               m2(x) = x^3 + x + 1
a^4                      1 + a + a^2             1 1 1               m1(x) = x^3 + x^2 + 1
a^5                      1 + a                   1 1 0               m2(x) = x^3 + x + 1
a^6                      a + a^2                 0 1 1               m2(x) = x^3 + x + 1

becomes more effective as it is more shortened, because of two effects: the
minimum Hamming distance cannot be decreased by shortening, i.e. dmin(n) ≥
dmin(nc), so the number of detectable errors stays the same or increases; also,
the code word length is shorter and the number of positions where the errors
can occur is smaller.
For the CCITT CRC-16 code, for any code word length from the range 17 ≤
n ≤ 32767, dmin = 4, while Admin decreases with code shortening. On the
other hand, the minimum Hamming distance for the code with generator poly-
nomial g2(x) increases with the code shortening: in the range 17 ≤ n ≤ 20
it is dmin = 10, in the range 21 ≤ n ≤ 22 it is dmin = 8, while for 23 ≤ n ≤
151 it is dmin = 6. Because this third group contains the greatest number of code
word lengths, it is said that this code is optimized for dmin = 6. Both codes
always have the possibility to detect an odd number of errors, and it is obvious
that the minimum Hamming distance is at least dmin = 2.
In [36] the possibility of finding the minimum distance profile
of CRC-16 codes was analyzed in detail. The analysis was based on an exhaustive search and it was
shown that a code which has maximum dmin for all code word lengths does not
exist. The first two columns of Table 6.9 give the maximum possible mini-
mum Hamming distance for some ranges of code word lengths. These dis-
tances are not all achieved by any single CRC-16 code; to various code word lengths
correspond different optimal generator polynomials. In the third column
the profile of the CCITT CRC-16 code is written and it is obvious that it is
mainly optimum for greater code word lengths (n ≥ 258), while the C1
CRC-16 code defined by the polynomial g2(x) is approximately optimum for the range
36 ≤ n ≤ 151.
It is interesting to notice that the probability not to detect an error for a CRC
code approaches the bound 2^-(n-k) if the code word length approaches
infinity, i.e.

lim_{n→∞} Ped(p, n) = 2^-(n-k).

Codes fulfilling, for all values of n,

Ped(p, n) ≤ 2^-(n-k)

and whose probability not to detect an error monotonically decreases are called
proper codes. It was shown that the code C1 is proper for all code word lengths
n ≤ nc, while the CRC-CCITT code is not proper for all values n ≤ nc.
In Fig. 6.14 the probability for a code not to detect an error versus the BSC crossover
probability is shown. It is obvious that the CCITT CRC-16 code has this prob-
ability higher than the C1 CRC-16 code, but this probability does not increase sig-
nificantly with the code word length increasing. On the other hand, the code
defined by the generator polynomial g2(x) provides better performances for
short code words, but when n > 151, the performances are drastically worse

because it is no longer a code obtained by shortening a cyclic code. A basic
advantage of the CCITT code is, therefore, greater flexibility concerning the code
word length.

Problem 6.10 Field GF(2^3) is generated by a primitive polynomial p(x) = x^3 +
x^2 + 1. Find a BCH code constructed over this field and its distance spectrum, if
one root of the generator polynomial is a1 = a, under condition that
(a) the code should correct one error in the code word,
(b) the code should correct two errors in the code word,
(c) the code should correct three errors in the code word.

Solution
Firstly, the root of the primitive polynomial generating the field should be found. The
polynomial p(x) coefficients take the values from the binary field (addition is
modulo-2), and it is obvious that binary zero and binary one are not polynomial
roots because p(0) = p(1) = 1. An element a is introduced, for which

p(a) = a^3 + a^2 + 1 = 0.

It is obvious that this element is not from the binary field, but the relation a^3 = a^2 + 1
must formally be satisfied. It provides the expression of every power of a (an
element of the binary field extension) as a polynomial b0·a^0 + b1·a^1 + b2·a^2, where the
ordered coefficients (b0 b1 b2) are a binary equivalent of the corresponding element
from the field extension GF(2^3). E.g. it can be written a^4 = (a^2 + 1)a = a^3 + a = 1 + a
+ a^2 and the binary equivalent is (1, 1, 1). The complete Galois field GF(2^3) is given in
Table 6.10.
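Table 6.10 can be reproduced mechanically. The following Python sketch (an added
illustration, not part of the original solution; a field element b0 + b1·a + b2·a^2 is
stored as the bit mask b0 + 2·b1 + 4·b2) generates the successive powers of a using the
reduction a^3 = a^2 + 1.

def gf_antilog_table(prim_poly, m):
    # antilog table: index i -> binary equivalent of a^i
    table, x = [], 1
    for _ in range(2 ** m - 1):
        table.append(x)
        x <<= 1                      # multiply by a
        if x & (1 << m):             # reduce using p(a) = 0
            x ^= prim_poly
    return table

antilog = gf_antilog_table(0b1101, 3)    # p(x) = x^3 + x^2 + 1
for i, v in enumerate(antilog):
    print(f"a^{i} -> ({v & 1} {(v >> 1) & 1} {(v >> 2) & 1})")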
In this table the identity element for addition (the combination “all zeros”) is not
included; it exists as the eighth field element, but it is not interesting for BCH code
construction [37, 38]. It is obvious that a^7 = a^0 = 1, as well as a^l = a^(l-7) for any
natural number l ≥ 7.

Fig. 6.15 Code distances spectrum of BCH(7, 4) (a) and BCH(7, 1) (b) codes
For an arbitrary element b from GF(2^m), the order r of the element can be defined.
It is the minimum natural number such that raising the element to that power yields the identity
element for multiplication, i.e. b^r = a^0 = 1. An element is primitive if its order is
maximum, i.e. rmax = 2^m − 1. In other words, an element is primitive if by its
successive powers all nonzero field elements can be obtained. From the extended field
construction point of view, it is obvious that the element a must always be a primitive one.
In this example, the element a^5 is primitive, because its successive powers yield
the elements (a^5)^1 = a^5, (a^5)^2 = a^3, (a^5)^3 = a, (a^5)^4 = a^6, (a^5)^5 = a^4, (a^5)^6 = a^2,
(a^5)^7 = a^0 = 1, and its order is seven. On the other hand, it is obvious that a^0 is not
primitive because its order equals one. Conjugate elements of an element b are found
as all different powers b^2, b^4, b^8, b^16, … existing in the considered field extension
(the power is always of the form 2^i, i being a natural number). In the considered field GF(8)
the element a has the conjugate elements a^2 and a^4, because the other powers are repeating:
a^8 = a^(8-7) = a, a^16 = a^(16-2·7) = a^2, … Further, a^3 has the conjugate elements
(a^3)^2 = a^6 and (a^3)^4 = a^12 = a^5, because (a^3)^8 = a^24 = a^3. The last field element is
the identity element for multiplication; it has no conjugate elements, because its
value does not change by raising to any power.
To every group of conjugate elements a minimal polynomial can be assigned. To
the groups of conjugate roots (a, a^2 and a^4), (a^3, a^5 and a^6) and a^0 correspond the
following minimal polynomials, respectively

m1(x) = (x + a)(x + a^2)(x + a^4) = x^3 + (a^4 + a^2 + a)x^2 + (a^6 + a^5 + a^3)x + a^7
      = x^3 + x^2 + 1
m2(x) = (x + a^3)(x + a^5)(x + a^6) = x^3 + (a^6 + a^5 + a^3)x^2 + (a^8 + a^9 + a^11)x + a^14
      = x^3 + x + 1
m3(x) = (x + a^0) = x + 1

where for the calculations the relation a^l = a^(l-7) was used, as well as the corresponding
equivalence between the exponential and polynomial equivalents, given in Table 6.10.
The coefficients of minimal polynomials are by rule from the binary field. Further, in the
general case, the minimal polynomials are all irreducible factors of the polynomial x^n + 1,
where n = 2^m − 1 and 2^m is the number of elements in the extension field GF(2^m)

x^7 + 1 = (x^3 + x^2 + 1)(x^3 + x + 1)(x + 1).
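The minimal polynomials can also be obtained by multiplying out the factors over each
conjugate class. The sketch below (an added Python illustration; the field table is built as
in the previous snippet and field elements are bit masks) reproduces m1(x), m2(x) and m3(x).

M, PRIM, N = 3, 0b1101, 7                 # GF(2^3), p(x) = x^3 + x^2 + 1
antilog, x = [], 1
for _ in range(N):
    antilog.append(x)
    x <<= 1
    if x & (1 << M):
        x ^= PRIM
log = {v: i for i, v in enumerate(antilog)}

def gf_mul(a, b):
    return 0 if 0 in (a, b) else antilog[(log[a] + log[b]) % N]

def minimal_poly(exponents):
    # coefficients (index = power of x) of the product of (x + a^j) over the class
    poly = [1]
    for j in exponents:
        root, new = antilog[j % N], [0] * (len(poly) + 1)
        for k, c in enumerate(poly):
            new[k + 1] ^= c                  # c * x
            new[k] ^= gf_mul(c, root)        # c * a^j
        poly = new
    return poly

print(minimal_poly([1, 2, 4]))   # [1, 0, 1, 1] -> x^3 + x^2 + 1 = m1(x)
print(minimal_poly([3, 5, 6]))   # [1, 1, 0, 1] -> x^3 + x + 1   = m2(x)
print(minimal_poly([0]))         # [1, 1]       -> x + 1         = m3(x)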

This relation, previously written in Problem 6.1, can now be explained more
precisely. To obtain a binary cyclic code it suffices that its generator polynomial is a
product of minimal polynomials, and a code word polynomial is easily obtained
multiplying information polynomial by code generator polynomial. Of course, all
combinations of minimal polynomials will not result in equally effective error
control codes.

On the other hand, BCH code construction guarantees obtaining a code where the
capability to correct errors is given in advance. For every natural number m ≥ 3 it
is possible to construct a BCH code where the code word length is n = 2^m − 1,
correcting ec errors, where n − k ≤ m·ec. To provide these features, it is necessary
and sufficient that the code generator polynomial has the following (successive) roots
from the field extension

a^m0, a^(m0+1), a^(m0+2), ..., a^(m0+2ec-1).

If the starting element a^m0 is a primitive one, all its conjugate elements are
primitive as well, and the corresponding minimal polynomial is primitive. A BCH
code where the starting element is primitive is called a primitive code. As the
element a is primitive by rule, usually the polynomial of minimal degree having
the 2ec successive powers of a as roots is chosen as the generator polynomial

a, a^2, a^3, ..., a^(2ec).

(a) To obtain a code which has code word length n = 7, capable of correcting one
error, the generator polynomial should have roots a and a^2. As both elements are
roots of the minimal polynomial m1(x), it is obvious that

g1(x) = m1(x) = x^3 + x^2 + 1.

The generator polynomial's highest power corresponds to the number of
parity-check bits n − k = 3, and the code (7, 4) is obtained. The code spectrum
can be easily found by noticing the following:
– to the information polynomial i1(x) = 1 corresponds the code polynomial c1(x) =
i1(x)g1(x) = 1 + x^2 + x^3, i.e. the word (1 0 1 1 0 0 0). Because the code is
cyclic, the six cyclic shifts of this code word yield six further code
words with Hamming weight d = 3.
– to the information polynomial i2(x) = 1 + x corresponds the code polynomial
c2(x) = i2(x)g1(x) = 1 + x + x^2 + x^4, i.e. the code word (1 1 1 0 1 0 0). This
code word and its cyclic shifts (seven in total) have weight d = 4.
By noticing as well that in a linear block code there is always an “all zeros”
code word, it is obvious that the code distances spectrum has the form shown in
Fig. 6.15a. This code is equivalent to the Hamming code, because their spectra are
identical. Because the minimum Hamming distance is dmin = 3, the code can
correct one error, as required.

(b) If the goal is to construct a code having code word length n = 7 and cor-
recting two errors, the generator polynomial should have roots a, a^2, a^3, a^4.
Elements a, a^2 and a^4 are roots of the minimal polynomial m1(x), while the element
a^3 is a root of the minimal polynomial m2(x), therefore

g2(x) = m1(x)m2(x) = (x^3 + x^2 + 1)(x^3 + x + 1)
      = 1 + x + x^2 + x^3 + x^4 + x^5 + x^6.

In this case n − k = 6, and the code (7, 1) is obtained. It is obviously the sevenfold
repetition code, where to the information polynomial i1(x) = 0 corresponds the code
word “all zeros”, while to the information polynomial i2(x) = 1 corresponds the
word “all ones”. As these are the only two existing code words, the code distance
spectrum can be easily found, as shown in Fig. 6.15b. It can be easily noticed
that the minimum Hamming distance is dmin = 7, and the code can correct up to
three errors, one more than required. A detailed explanation of this effect will
be given in the next part of the solution.

(c) To construct a code correcting three errors and having code word length n = 7,
the generator polynomial should have the roots a, a^2, a^3, a^4, a^5, a^6. Elements a,
a^2 and a^4 are roots of the minimal polynomial m1(x), while the elements a^3, a^5 and a^6
are roots of the minimal polynomial m2(x), and the generator polynomial is the same
as in the previous case

g3(x) = m1(x)m2(x) = 1 + x + x^2 + x^3 + x^4 + x^5 + x^6.

A generator polynomial completely determines a cyclic code, and the BCH code
correcting three errors is the same as in the previous case (sevenfold repetition)
and its code distances spectrum is the same. As explained previously, the
correction of three errors is guaranteed because the relation dmin = 2ec + 1 = 7
holds.
It is known that BCH codes satisfy the relation n − k ≤ m·ec. In this case
m = 3 and n = 7, and if the code corrects one error the information word length
must be k ≥ 4, while, when two errors are corrected, the relation k ≥ 1 holds,
being satisfied with equality in both cases. It is interesting that for a three
error correction this condition becomes k ≥ −2, which is satisfied as well. This feature of
BCH codes gives a lower bound for the information word length, while the
Singleton bound n − k ≥ 2ec gives a fundamental upper bound for all block
codes. In this case the Singleton bound reduces to the condition k ≤ 1, being
satisfied with equality for an MDS code. Therefore, although the code (7, 1) has a
low code rate, it has the maximum possible value of code rate under these
conditions.
From these examples it is obvious that the constructed BCH code corrects at
least the number of errors it was designed for, but it is possible that it
corrects a larger number of errors. This effect appears because the generator
polynomial is formed as a product of the corresponding minimal polynomials,

Table 6.11 Galois field GF(2^4)

Exponential equivalent   Polynomial equivalent   Binary equivalent   Minimal polynomial
a^0                      1                       1 0 0 0             m5(x)
a^1                      a                       0 1 0 0             m1(x)
a^2                      a^2                     0 0 1 0             m1(x)
a^3                      a^3                     0 0 0 1             m2(x)
a^4                      1 + a                   1 1 0 0             m1(x)
a^5                      a + a^2                 0 1 1 0             m4(x)
a^6                      a^2 + a^3               0 0 1 1             m2(x)
a^7                      1 + a + a^3             1 1 0 1             m3(x)
a^8                      1 + a^2                 1 0 1 0             m1(x)
a^9                      a + a^3                 0 1 0 1             m2(x)
a^10                     1 + a + a^2             1 1 1 0             m4(x)
a^11                     a + a^2 + a^3           0 1 1 1             m3(x)
a^12                     1 + a + a^2 + a^3       1 1 1 1             m2(x)
a^13                     1 + a^2 + a^3           1 0 1 1             m3(x)
a^14                     1 + a^3                 1 0 0 1             m3(x)

and the whole minimal polynomial enters into the product, regardless of whether
only one or all of its roots are required as generator polynomial roots. It may seem that the
ability of a code to correct more errors than required is always desirable. However, in
some cases it is preferable that the number of corrected errors is just as projected, if
this can provide a greater code rate. E.g. it would be very useful if a code existed
correcting just ec = 2 errors with code word length n = 7,
while at the same time having k > 1. However, it was found that such a binary code
cannot be constructed. But, using nonbinary codes it is possible to construct
codes correcting just the required number of errors which at the same time have
the maximum possible code rate. These are Reed-Solomon codes, which are
described in detail in one of the next problems.

Problem 6.11 Field GF(2^4) is generated by the primitive polynomial p(x) = x^4 +
x + 1. Find the BCH code constructed over this field under condition that
(a) one root of the generator polynomial is a1 = a, and one error should be corrected,
and draw the code distances spectrum;
(b) one root of the generator polynomial is a1 = a, and two errors should be corrected,
and draw the code distances spectrum;
(c) one root of the generator polynomial is a1 = a^4, and two errors should be cor-
rected; compare its code rate to the code rate of the code from b);
(d) one root of the generator polynomial is a1 = a^5, and four errors should be
corrected.

Fig. 6.16 BCH(15, 11) code distances spectrum

Solution
In this case the root of the primitive polynomial is found from

p(a) = a^4 + a + 1 = 0

and the relation a^4 = a + 1 must hold, enabling the forming of the corresponding extended
field GF(2^4) given in Table 6.11.
The minimal polynomials in GF(2^4) are obtained as

m1(x) = (x + a)(x + a^2)(x + a^4)(x + a^8) = x^4 + x + 1
m2(x) = (x + a^3)(x + a^6)(x + a^12)(x + a^9) = x^4 + x^3 + x^2 + x + 1
m3(x) = (x + a^7)(x + a^14)(x + a^13)(x + a^11) = x^4 + x^3 + 1
m4(x) = (x + a^5)(x + a^10) = x^2 + x + 1
m5(x) = (x + a^0) = x + 1

Although all these polynomials are irreducible and the following is valid:
x^15 + 1 = m1(x)m2(x)m3(x)m4(x)m5(x), it should be noticed that only those poly-
nomials having primitive elements as roots are primitive.
It is obvious that the element a is primitive, as well as its conjugate elements, so the
polynomial m1(x) is primitive, the same being true for the polynomial m3(x).
However, it can be easily shown that the element a^3 has order five, and the
element a^5 has order three. The corresponding minimal polynomials are not
primitive [as well as the polynomial m5(x)]. Therefore, the field GF(2^4) can be constructed

only with the polynomials m1(x) and m3(x), but any element can be used to find a BCH
code polynomial. However, whether the obtained BCH code is primitive depends
on the nature of the generator polynomial roots.
(a) To construct a code correcting one error in the code word of length n = 15, the
generator polynomial should have the roots a and a^2, being here

g1(x) = m1(x) = x^4 + x + 1

and the obtained code is (15, 11). The code spectrum, shown in Fig. 6.16, was
obtained by a computer search generating all 2^11 = 2048 information words,
obtaining the corresponding code words and then finding their weights.
It is obvious that the code minimum Hamming distance is dmin = 3, providing
the correction of all single errors, as required. This code is equivalent to the
Hamming (15, 11) code.
(b) To construct a code correcting two errors in the code word of length n = 15, the
generator polynomial should have the roots a, a^2, a^3, a^4, where a, a^2 and a^4 are
roots of the minimal polynomial m1(x), while a^3 is a root of m2(x), and the generator
polynomial is

g2(x) = m1(x)m2(x) = (x^4 + x + 1)(x^4 + x^3 + x^2 + x + 1)
      = x^8 + x^7 + x^6 + x^4 + 1

The code (15, 7) spectrum, shown in Fig. 6.17, was obtained by a computer
search generating all 2^7 = 128 information words. In this case the minimum
Hamming distance is dmin = 5 and the code can correct all double errors.
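The product of the two minimal polynomials is an ordinary GF(2) polynomial multiplication,
so the generator polynomial above is easy to verify; a minimal added Python sketch
(bit-mask representation, bit i = coefficient of x^i) follows.

def gf2_mul(a, b):
    # carry-less (modulo-2) polynomial multiplication
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        b >>= 1
    return p

m1 = 0b10011      # x^4 + x + 1
m2 = 0b11111      # x^4 + x^3 + x^2 + x + 1
print(bin(gf2_mul(m1, m2)))    # 0b111010001 = x^8 + x^7 + x^6 + x^4 + 1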
(c) In this case the goal is to construct a code correcting two errors in the code
word of length n = 15, but the roots are taken starting from the element a^4.
Therefore, the generator polynomial should have roots a^4, a^5, a^6, a^7, where a^4 is a
root of the minimal polynomial m1(x), a^5 is a root of the minimal polynomial m4(x),
a^6 is a root of the minimal polynomial m2(x) and a^7 is a root of the minimal poly-
nomial m3(x), yielding

g2(x) = m1(x)m2(x)m3(x)m4(x)
      = (x^4 + x + 1)(x^4 + x^3 + x^2 + x + 1)(x^4 + x^3 + 1)(x^2 + x + 1)

In this case n − k = 14, it is the code (15, 1), in fact, a 15-fold repetition.
The minimum Hamming distance is dmin = 15 and the code corrects up to seven
errors (it is easy to prove that an n-fold repetition code can correct up to
(n − 1)/2 errors, and it is still an MDS code). Therefore, a careless choice of the
starting element for the series of generator polynomial roots can result in a code
correcting substantially more errors than required, but it is paid for by a great code
rate reduction compared to the optimal choice. This effect can appear even
when the starting element is primitive, and can be avoided if a greater number

Fig. 6.17 BCH(15, 7) code distances spectrum

of the generator polynomial roots are roots of the same minimal
polynomial.
(d) In this case the goal is to construct a code correcting four errors in the code
word of length n = 15 bits, but the roots are taken starting from the element a^5. In
this case the generator polynomial should have roots a^5, a^6, a^7, a^8, a^9, a^10, a^11,
a^12 and it is easy to conclude that the generator polynomial is

g2(x) = m1(x)m2(x)m3(x)m4(x)
      = (x^4 + x + 1)(x^4 + x^3 + x^2 + x + 1)(x^4 + x^3 + 1)(x^2 + x + 1)

Of course, it is the same code as in the above case, i.e. the 15-fold repetition code
correcting up to seven errors in a code word. It is easy to conclude that
this effect cannot be avoided by a suitable choice of the starting element for the
roots, and in this case the requirement to construct a code correcting ec ≥ 4
errors will always result in the code (15, 1), correcting seven errors.

Problem 6.12 Field GF(2^4) is generated by the primitive polynomial p(x) = x^4 +
x + 1. Find a BCH code constructed over this field, if one root of the generator
polynomial is a1 = a and the code should correct two errors in a code word.
(a) Find the code distances spectrum on the basis of the spectrum of the dual code by
using the MacWilliams identities.
(b) Find the code word corresponding to the information polynomial i(x) = 1 and find the
received word if the errors during transmission occurred at the second and the
eighth position.

(c) Calculate syndrome values at the receiver end and verify whether the relations
connecting syndromes and error locators in the previous example are fulfilled.
(d) Explain in short a procedure for finding locator error polynomial on the basis
of syndrome and verify the procedure carried out in the previous example.
(e) Explain in short Peterson procedure for BCH codes decoding and verify its
efficiency in a previous example.

Solution
Spectra of dual codes for BCH codes correcting double errors in the code word of
length n = 2^m − 1 (m even) can be found using the identities given in Table 6.12 [25].
When m = 4, the BCH code correcting double errors in the code word of length n = 15
bits has the parameters (15, 7). The dual code (15, 8) spectrum in this case has six
nonzero components, and it is easy to calculate that in the dual code there are 1
word with weight 0, 15 words with weight 4, 100 words with weight 6, 75 words
with weight 8, 60 words with weight 10 and 5 words with weight 12, which can be
described by the relation

B(x) = 1 + 15x^4 + 100x^6 + 75x^8 + 60x^10 + 5x^12.

The BCH code (15, 7) spectrum can now be found by using the first MacWilliams
identity

A(x) = 2^-8 (1 + x)^15 B((1 − x)/(1 + x))
     = 2^-8 (1 + x)^15 [1 + 15((1 − x)/(1 + x))^4 + 100((1 − x)/(1 + x))^6 + 75((1 − x)/(1 + x))^8
       + 60((1 − x)/(1 + x))^10 + 5((1 − x)/(1 + x))^12]
     = 2^-8 [(1 + x)^15 + 15(1 − x)^4(1 + x)^11 + 100(1 − x)^6(1 + x)^9
       + 75(1 − x)^8(1 + x)^7 + 60(1 − x)^10(1 + x)^5 + 5(1 − x)^12(1 + x)^3]
     = 1 + 18x^5 + 30x^6 + 15x^7 + 15x^8 + 30x^9 + 18x^10 + x^15

and the same result is obtained as in the previous problem, where a computer
search was used (Fig. 6.17).
(b) In the previous problem it was shown that the generator polynomial for the BCH
(15, 7) code is

g2(x) = m1(x)m2(x) = x^8 + x^7 + x^6 + x^4 + 1

In this case the information polynomial is extremely simple (it equals 1), the code
polynomial is the same as the generator polynomial, and the code word is obtained

Table 6.12 Spectra of codes dual to BCH codes with ec = 2, m even

Weight d                      Number of code words having weight d, bd
0                             1
2^(m-1) − 2^((m+2)/2-1)       2^((m-2)/2-1) (2^((m-2)/2) + 1)(2^m − 1)/3
2^(m-1) − 2^(m/2-1)           2^((m+2)/2-1) (2^(m/2) + 1)(2^m − 1)/3
2^(m-1)                       (2^(m-2) + 1)(2^m − 1)
2^(m-1) + 2^(m/2-1)           2^((m+2)/2-1) (2^(m/2) − 1)(2^m − 1)/3
2^(m-1) + 2^((m+2)/2-1)       2^((m-2)/2-1) (2^((m-2)/2) − 1)(2^m − 1)/3

by reading out its coefficients from the lowest to the highest power. Therefore,
the code word, error vector and the received word are, respectively

c = (100010111000000)
e = (010000010000000)
r = (110010101000000)

and the polynomial corresponding to the received word is

r(x) = 1 + x + x^4 + x^6 + x^8.

(c) Syndrome values at the receiver end are calculated as the values of the received word
polynomial at the generator polynomial roots, which for ec = 2 are a, a^2, a^3, a^4.
For the above polynomial, the syndrome components are

S1 = r(a) = 1 + a + a^4 + a^6 + a^8 = 1 + a + (1 + a) + (a^2 + a^3) + (1 + a^2)
   = 1 + a^3 = a^14,
S2 = r(a^2) = 1 + a^2 + (a^2)^4 + (a^2)^6 + (a^2)^8
   = 1 + a^2 + (1 + a^2) + (1 + a + a^2 + a^3) + a = 1 + a^2 + a^3 = a^13,
S3 = r(a^3) = 1 + a^3 + (a^3)^4 + (a^3)^6 + (a^3)^8 = 1 + a^3 + (1 + a + a^2 + a^3) + a^3
   + (a + a^3) = a^2,
S4 = r(a^4) = 1 + a^4 + (a^4)^4 + (a^4)^6 + (a^4)^8 = 1 + (1 + a) + a + (a + a^3) + a^2
   = a + a^2 + a^3 = a^11.

If there were no errors during the transmission, the roots a, a^2, a^3, a^4 would have
been zeros of the received word polynomial, because it would be a code word polynomial,
divisible by the generator polynomial.
Syndromes can be obtained as well from the error locators (the locator Xi = a^k corre-
sponds to the ith error, which occurred at the kth position in a code word) by the following
operations

S1 = X1 + X2 = a^1 + a^7 = a + (1 + a + a^3) = 1 + a^3 = a^14,
S2 = X1^2 + X2^2 = (a^1)^2 + (a^7)^2 = a^2 + (1 + a^3) = a^13,
S3 = X1^3 + X2^3 = (a^1)^3 + (a^7)^3 = a^3 + (a^2 + a^3) = a^2,
S4 = X1^4 + X2^4 = (a^1)^4 + (a^7)^4 = (1 + a) + (1 + a^2 + a^3) = a^11.

It is important to note that the syndrome values in two previous sets are obtained
in different ways. In the first case, syndrome values are determined by a received
word structure, while in the second, syndrome values are determined by error
positions. Of course, the receiver does not know the error positions (it has to find
them!) but it just gives a possibility for a decoder construction.
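The syndrome values above can also be checked by direct evaluation of r(x) at the powers of a.
The sketch below (an added Python illustration; GF(2^4) is generated from p(x) = x^4 + x + 1,
field elements are bit masks and Horner's rule is used for the evaluation) reproduces S1, ..., S4.

M, PRIM, N = 4, 0b10011, 15                # GF(2^4), p(x) = x^4 + x + 1
antilog, x = [], 1
for _ in range(N):
    antilog.append(x)
    x <<= 1
    if x & (1 << M):
        x ^= PRIM
log = {v: i for i, v in enumerate(antilog)}

def gf_mul(a, b):
    return 0 if 0 in (a, b) else antilog[(log[a] + log[b]) % N]

def poly_eval(coeffs, point):
    # Horner evaluation of a polynomial with GF(2) coefficients at a field element
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, point) ^ c
    return acc

r = [1, 1, 0, 0, 1, 0, 1, 0, 1]            # r(x) = 1 + x + x^4 + x^6 + x^8
for k in range(1, 5):
    S = poly_eval(r, antilog[k])
    print(f"S{k} = a^{log[S]}" if S else f"S{k} = 0")   # a^14, a^13, a^2, a^11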
(d) Consider firstly the general case, i.e. a BCH code for correcting ec errors in the
code word, and let errors occur at m bit positions during transmission. Syndrome
values can be found as the polynomial values for the roots a, a^2, ..., a^(2ec), and an
equation system can be formed to find the unknown error locators X1, X2, ..., Xm

S1 = X1 + X2 + ... + Xm,
S2 = X1^2 + X2^2 + ... + Xm^2,
S3 = X1^3 + X2^3 + ... + Xm^3,
...
S2ec = X1^(2ec) + X2^(2ec) + ... + Xm^(2ec).

These equations are called power-sum symmetric functions; they form a
system of nonlinear algebraic equations in several variables and they are not
suitable for solution by a direct procedure [39].
Peterson showed that, by introducing the notion of the error locator polynomial,
these equations can be transformed into a series of linear equations. The error
locator polynomial is defined through the error locators Xi as [40]

Λ(x) = Π_{i=1}^{m} (1 + Xi x) = Λ0 + Λ1 x + Λ2 x^2 + ... + Λm x^m

It is obvious that the error locations can be easily found if the coefficients of the error
locator polynomial are known. On the other hand, it was shown that the syndrome
components and the coefficients of the error locator polynomial are connected by a
system of linear equations

S1 + Λ1 = 0
S2 + Λ1 S1 + 2Λ2 = 0
S3 + Λ1 S2 + Λ2 S1 + 3Λ3 = 0
...
Sm + Λ1 Sm−1 + Λ2 Sm−2 + ... + Λm−1 S1 + mΛm = 0
Sm+1 + Λ1 Sm + Λ2 Sm−1 + ... + Λm S1 = 0
...
S2ec + Λ1 S2ec−1 + Λ2 S2ec−2 + ... + Λm S2ec−m = 0

In the special case, when all operations are in a binary field and when there are
m = ec errors, the system is additionally simplified and can be written as

      [ 1        0        0        0        ...  0       0      ] [ Λ1    ]   [ S1      ]
      [ S2       S1       1        0        ...  0       0      ] [ Λ2    ]   [ S3      ]
      [ S4       S3       S2       S1       ...  0       0      ] [ Λ3    ]   [ S5      ]
A·Λ = [ S6       S5       S4       S3       ...  0       0      ] [ Λ4    ] = [ S7      ]
      [ ...      ...      ...      ...      ...  ...     ...    ] [ ...   ]   [ ...     ]
      [ S2ec−4   S2ec−5   S2ec−6   S2ec−7   ...  Sec−2   Sec−3  ] [ Λec−1 ]   [ S2ec−3  ]
      [ S2ec−2   S2ec−3   S2ec−4   S2ec−5   ...  Sec     Sec−1  ] [ Λec   ]   [ S2ec−1  ]

This system has a unique solution if and only if the syndrome matrix A is not
singular. Peterson showed that, due to the limitations on the syndromes, when the
code is a binary one, matrix A has a determinant different from zero if there are
ec or ec − 1 errors in the received word. Solutions of the system of equations for
ec = 1, ec = 2, ec = 3 and ec = 4 are given in Table 6.13.
In the previous example the case of a code correcting ec = 2 errors was
considered and on the basis of the received word the syndrome values were
calculated: S1 = a^14, S2 = a^13, S3 = a^2, S4 = a^11. The system of equations con-
necting the syndromes and the coefficients of the error locator polynomial is in this case

A·Λ = [ 1   0  ] [ Λ1 ]  =  [ S1 ]
      [ S2  S1 ] [ Λ2 ]     [ S3 ]

and it is easily found that Λ1 = S1 = a^14, as well as S2Λ1 + S1Λ2 = S3, from which
it follows

Λ2 = (S3 − S2S1)/S1 = (a^2 − a^13·a^14)/a^14 = (1 + a + a^3)/a^14 = a^7/a^14
   = a^(7+15−14) = a^8,

and the same result could be obtained using the relation Λ2 = (S3 + S1^3)/S1, and the
error locator polynomial is

Table 6.13 Solutions of the system of equations for ec = 1, ec = 2, ec = 3 and ec = 4

ec = 1:  Λ1 = S1
ec = 2:  Λ1 = S1;  Λ2 = (S3 + S1^3)/S1
ec = 3:  Λ1 = S1;  Λ2 = (S1^2 S3 + S5)/(S1^3 + S3);  Λ3 = (S1^3 + S3) + S1 Λ2
ec = 4:  Λ1 = S1;  Λ2 = [S1(S7 + S1^7) + S3(S1^5 + S5)] / [S3(S1^3 + S3) + S1(S1^5 + S5)];
         Λ3 = (S1^3 + S3) + S1 Λ2;  Λ4 = [(S5 + S1^2 S3) + (S1^3 + S3)Λ2] / S1

Λ(x) = 1 + a^14 x + a^8 x^2.

The same result can be obtained from the error locators

Λ(x) = Π_{i=1}^{2} (1 + Xi x) = (1 + ax)(1 + a^7 x) = 1 + (a + (1 + a + a^3))x + a^8 x^2
     = 1 + a^14 x + a^8 x^2,

confirming the validity of the exposed procedure.
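The solution for ec = 2 and the root search can also be checked numerically. The following
Python sketch (an added illustration; the GF(2^4) tables are rebuilt as in the previous
snippet) computes Λ1 and Λ2 from the syndromes and then finds the roots of Λ(x) by trying
every field element, which is essentially what a Chien search does.

M, PRIM, N = 4, 0b10011, 15                # GF(2^4), p(x) = x^4 + x + 1
antilog, x = [], 1
for _ in range(N):
    antilog.append(x)
    x <<= 1
    if x & (1 << M):
        x ^= PRIM
log = {v: i for i, v in enumerate(antilog)}

def gf_mul(a, b):
    return 0 if 0 in (a, b) else antilog[(log[a] + log[b]) % N]

def gf_div(a, b):
    return 0 if a == 0 else antilog[(log[a] - log[b]) % N]

S1, S3 = antilog[14], antilog[2]           # S1 = a^14, S3 = a^2 from the example
L1 = S1                                    # Lambda_1 = S1
L2 = gf_div(S3 ^ gf_mul(gf_mul(S1, S1), S1), S1)   # Lambda_2 = (S3 + S1^3)/S1
print(f"Lambda1 = a^{log[L1]}, Lambda2 = a^{log[L2]}")   # a^14 and a^8

for i in range(N):                         # Chien-type search for roots of Lambda(x)
    xv = antilog[i]
    if (1 ^ gf_mul(L1, xv) ^ gf_mul(L2, gf_mul(xv, xv))) == 0:
        print(f"root a^{i} -> error locator a^{(-i) % N}")   # locators a^1 and a^7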


(e) The Peterson algorithm for decoding of a binary BCH code correcting ec errors
consists of a few phases, as follows [40]:
1. Find the syndromes Sj as the values of the received word polynomial r(a^j), j = 1,
2, ..., 2ec.
2. Form the syndrome matrix A and calculate its determinant. If the determinant
differs from zero, go immediately to step 4.
3. Form a new syndrome matrix by omitting the last two columns and the lower
two rows from the old syndrome matrix. Go back to step 2.
4. Solve the linear equation system; on the basis of the known syndrome values
find the coefficients of the error locator polynomial and form Λ(x).
5. Find the roots of Λ(x). If the roots are not obvious or Λ(x) has no roots in the
given field, go to step 8.
6. Invert the bits pointed out by the polynomial Λ(x) roots. If the number of corrected
errors is smaller than ec, verify whether the obtained word satisfies all 2ec
syndrome equations. If not, go to step 8.
7. Send the corrected word to the output and stop the algorithm.
8. Signal a decoding error and stop the algorithm.
The procedure defined by steps 1–3 is explained in detail in the previous text.
However, finding the error locator polynomial roots is not a trivial task. To
find them, a Chien search can be used, being a systematic procedure to find the
roots of the error locator polynomial in a field GF(2^m) [39].
In the previous example it was found that the error locator polynomial is
Λ(x) = 1 + a^14 x + a^8 x^2. Its factorization yields

Λ(x) = (1 + X1 x)(1 + X2 x) = 1 + (X1 + X2)x + X1 X2 x^2,

resulting in the system of equations to be solved

X1 + X2 = a^14,  X1 X2 = a^8.

In this case, even without a Chien search, the values of the error locators can be
easily found, X1 = a^1, X2 = a^7, and the correction is carried out by inversion of
the second and the eighth bit.
The polynomial corresponding to the corrected received word is now

r̂(x) = 1 + x^4 + x^6 + x^7 + x^8,

and the syndrome components for the corrected polynomial have the values

Ŝ1 = r̂(a) = 1 + a^4 + a^6 + a^7 + a^8 = 1 + (1 + a) + (a^2 + a^3) + (1 + a + a^3)
   + (1 + a^2) = 0,
Ŝ2 = r̂(a^2) = 1 + (a^2)^4 + (a^2)^6 + (a^2)^7 + (a^2)^8 = 1 + (1 + a^2)
   + (1 + a + a^2 + a^3) + (1 + a^3) + a = 0,
Ŝ3 = r̂(a^3) = 1 + (a^3)^4 + (a^3)^6 + (a^3)^7 + (a^3)^8 = 1 + (1 + a + a^2 + a^3)
   + a^3 + (a^2 + a^3) + (a + a^3) = 0,
Ŝ4 = r̂(a^4) = 1 + (a^4)^4 + (a^4)^6 + (a^4)^7 + (a^4)^8 = 1 + a + (a + a^3)
   + (1 + a^2 + a^3) + a^2 = 0.

All syndrome components are equal to zero and it can be concluded that the roots
a, a^2, a^3, a^4 are zeros of the polynomial r̂(x). In other words, this polynomial is
divisible by the generator polynomial and its coefficients define a valid code
word. In this case it can be considered that the decoding was successful,
because the decoder corrected both errors in the code word.

Problem 6.13 Field GF(2^5) is generated by the primitive polynomial p(x) = x^5 +
x^2 + 1. Find a BCH code constructed over this field, if one generator polynomial
root is a1 = a and the code has to correct up to three errors in a code word.
(a) If the polynomial corresponding to the received word is r(x) = x^10, explain the
decoding procedure by using the Peterson algorithm.
(b) Explain the Berlekamp algorithm for BCH codes decoding and apply it to the
previous example.
(c) If the polynomial corresponding to the received word is r(x) = 1 + x^9 + x^11 +
x^14, explain the decoding by applying the Berlekamp algorithm.

Solution
In this case the primitive polynomial root is found on the basis of

p(a) = a^5 + a^2 + 1 = 0,

and the relation a^5 = a^2 + 1 is satisfied, allowing the forming of the corresponding extended
field GF(32).
To construct a code correcting three errors in a code word of length n = 31 bits, the
generator polynomial should have the roots a, a^2, a^3, a^4, a^5 and a^6. As given in
Table 6.14, in the field there are seven groups of conjugate roots, where to the
generator polynomial roots correspond the groups (a, a^2, a^4, a^8, a^16), (a^3, a^6, a^12,
a^17, a^24) and (a^5, a^9, a^10, a^18, a^20). Unlike the previous cases, the identity element

Table 6.14 Galois field GF(2^5)

Exponential equivalent   Binary equivalent   Group
0                        0 0 0 0 0           –
a^0                      1 0 0 0 0           A
a^1                      0 1 0 0 0           B
a^2                      0 0 1 0 0           B
a^3                      0 0 0 1 0           C
a^4                      0 0 0 0 1           B
a^5                      1 0 1 0 0           D
a^6                      0 1 0 1 0           C
a^7                      0 0 1 0 1           E
a^8                      1 0 1 1 0           B
a^9                      0 1 0 1 1           D
a^10                     1 0 0 0 1           D
a^11                     1 1 1 0 0           F
a^12                     0 1 1 1 0           C
a^13                     0 0 1 1 1           F
a^14                     1 0 1 1 1           E
a^15                     1 1 1 1 1           G
a^16                     1 1 0 1 1           B
a^17                     1 1 0 0 1           C
a^18                     1 1 0 0 0           D
a^19                     0 1 1 0 0           E
a^20                     0 0 1 1 0           D
a^21                     0 0 0 1 1           F
a^22                     1 0 1 0 1           F
a^23                     1 1 1 1 0           G
a^24                     0 1 1 1 1           C
a^25                     1 0 0 1 1           E
a^26                     1 1 1 0 1           F
a^27                     1 1 0 1 0           G
a^28                     0 1 1 0 1           E
a^29                     1 0 0 1 0           G
a^30                     0 1 0 0 1           G

for addition is included; it always exists in a field, but it is not important for BCH code
construction.
The code (31, 16) generator polynomial is obtained as a product of three min-
imal polynomials

g1(x) = x^15 + x^11 + x^10 + x^9 + x^8 + x^7 + x^5 + x^3 + x^2 + x + 1.

(a) The polynomial r(x) = x^10 corresponds to a received word consisting of all
zeros except at the eleventh position, where there is a binary one. The code
corrects at least three errors, the guaranteed minimum
Hamming distance is d = 2·3 + 1 = 7 and in the code there is no code
word having fewer than seven ones (except the “all zeros” word). It is therefore
most probable that the received word originates from this all-zeros word,
i.e. it is most probable that the error occurred at the
eleventh position. This will now be verified by the Peterson algorithm [40].
The first step is the determination of the syndrome values

S1 = r(a) = a^10,  S2 = r(a^2) = (a^10)^2 = a^20,  S3 = r(a^3) = (a^10)^3 = a^30,
S4 = r(a^4) = (a^10)^4 = a^9,  S5 = r(a^5) = (a^10)^5 = a^19,  S6 = r(a^6) = (a^10)^6 = a^29.

Further, the syndrome matrix is constructed



        [ 1     0     0    ]
    A = [ a^20  a^10  1    ]
        [ a^9   a^30  a^20 ]

and it is verified whether it is non-singular. The third row of matrix A equals
the second row multiplied by a^20, so the matrix is singular and the second and the third
column should be omitted, as well as the two lower rows. A unity matrix
(of dimensions 1 × 1) is obtained and the single error locator polynomial
coefficient is Λ1 = a^10, the error locator being X1 = a^10. Therefore, the
conclusion is drawn that the eleventh bit was inverted and the all-zeros word
is decoded.

(b) To apply the Berlekamp algorithm, firstly the syndrome polynomial should be
defined

S(x) = S1 x + S2 x^2 + ... + S2ec x^(2ec),

where, as earlier, Sk = r(a^k), k = 1, 2, ..., 2ec.
Further, the error magnitude polynomial is formed

Ω(x) = [1 + S(x)] Λ(x)
     = (1 + S1 x + S2 x^2 + ... + S2ec x^(2ec))(1 + Λ1 x + Λ2 x^2 + ... + Λm x^m)
     = 1 + Ω1 x + Ω2 x^2 + ...

In this (binary) case the coefficients of Ω(x) having an odd index must be zero, and
the decoding reduces to finding the polynomial Λ(x) having degree smaller than or
equal to ec and satisfying

[1 + S(x)] Λ(x) ≡ 1 + Ω2 x^2 + Ω4 x^4 + ... + Ω2ec x^(2ec)   (mod x^(2ec+1))

The Berlekamp algorithm can now be defined as a procedure for finding the coeffi-
cients of the polynomial Λ(x) by solving a series of smaller problems which have
the form

[1 + S(x)] Λ^(2k)(x) ≡ 1 + Ω2 x^2 + Ω4 x^4 + ... + Ω2k x^(2k)   (mod x^(2k+1)),  k = 1, 2, ..., ec

Firstly, it is verified whether Λ^(0)(x) = 1 is a solution for the case k = 1. If it is
true, the counter k is incremented, but if it is not satisfied, a correction factor is
calculated and added to Λ^(0)(x), yielding a new solution Λ^(2)(x). The algorithm's
ingenuity is in the correction factor calculation, which is chosen so that the
new solution is valid not only for the current case, but for all previous k values as well.
The Berlekamp algorithm for binary BCH codes decoding can now be formulated
in a few important steps [41].

1. Set the initial conditions: Λ^(-1)(x) = Λ^(0)(x) = 1, d_-1 = 1, d_0 = S1,
   l_-1 = l_0 = 0.
2. An improved error locator polynomial is calculated as
   a. if dk = 0, then Λ^(k+1)(x) = Λ^(k)(x)
   b. if dk ≠ 0, then Λ^(k+1)(x) = Λ^(k)(x) + dk·dq^(-1)·x^(k-q)·Λ^(q)(x),
      where q < k is a previous step for which dq ≠ 0 and q − lq is the largest.
3. Find dk as the coefficient of x^(k+1) in the product Λ^(k)(x)[1 + S(x)], i.e.

   dk = S(k+1) + Λ1^(k) Sk + ... + Λlk^(k) S(k+1−lk)

4. Find l(k+1) = max(lk, lq + k − q)
5. Set k = k + 1. If k < 2ec, go to step 2.
6. Find the roots of Λ(x) = Λ^(2ec)(x), correct the corresponding locations in the
   received word and stop the algorithm.
7. If the previous step cannot be carried out, signal a decoding error.
The importance of the Berlekamp algorithm lies in the fact that it allows
an efficient implementation. The complexity of the Peterson technique increases as
the square of the number of corrected errors, making it efficient only
for binary BCH decoders correcting a small number of errors. On the other
hand, the Berlekamp algorithm complexity increases linearly, allowing
the forming of efficient decoders correcting many tens of errors.
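For illustration, the sketch below implements the textbook-standard Berlekamp–Massey
formulation in Python over GF(2^5) generated by p(x) = x^5 + x^2 + 1 (an added example,
not part of the original solution; its intermediate bookkeeping differs slightly from the
iteration of Table 6.15, but for the syndromes of r(x) = x^10 it yields the same connection
polynomial Λ(x) = 1 + a^10 x).

M, PRIM, N = 5, 0b100101, 31               # GF(2^5), p(x) = x^5 + x^2 + 1
antilog, x = [], 1
for _ in range(N):
    antilog.append(x)
    x <<= 1
    if x & (1 << M):
        x ^= PRIM
log = {v: i for i, v in enumerate(antilog)}

def gf_mul(a, b):
    return 0 if 0 in (a, b) else antilog[(log[a] + log[b]) % N]

def gf_div(a, b):
    return 0 if a == 0 else antilog[(log[a] - log[b]) % N]

def berlekamp_massey(S):
    # standard Berlekamp-Massey: shortest connection polynomial C (index = power of x)
    # generating the syndrome sequence S over the field
    C, B = [1], [1]
    L, m, b = 0, 1, 1
    for n in range(len(S)):
        d = S[n]                           # discrepancy
        for i in range(1, L + 1):
            d ^= gf_mul(C[i], S[n - i])
        if d == 0:
            m += 1
            continue
        T = C[:]
        coef = gf_div(d, b)
        if len(B) + m > len(C):
            C = C + [0] * (len(B) + m - len(C))
        for i, Bi in enumerate(B):
            C[i + m] ^= gf_mul(coef, Bi)
        if 2 * L <= n:
            L, B, b, m = n + 1 - L, T, d, 1
        else:
            m += 1
    return C[:L + 1]

S = [antilog[(10 * j) % N] for j in range(1, 7)]     # syndromes of r(x) = x^10
Lam = berlekamp_massey(S)
print([f"a^{log[c]}" if c else "0" for c in Lam])    # expected ['a^0', 'a^10']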

Table 6.15 Berlekamp algorithm for Problem 6.13b

k    Λ^(k)(x)       dk      lk
−1   1              1       0
0    1              a^10    0
1    1 + a^10 x     0       1
2    1 + a^10 x     0       1
3    1 + a^10 x     0       1
4    1 + a^10 x     0       1
5    1 + a^10 x     0       1
6    1 + a^10 x

Table 6.16 Berlekamp algorithm for Problem 6.13c

k    Λ^(k)(x)                      dk      lk
−1   1                             1       0
0    1                             1       0
1    1 + x                         0       1
2    1 + x                         a^3     1
3    1 + x + a^3 x^2               0       2
4    1 + x + a^3 x^2               a^20    2
5    1 + x + a^16 x^2 + a^17 x^3   0       3
6    1 + x + a^16 x^2 + a^17 x^3

For the previously considered example, the polynomial corresponding to the
received word is r(x) = x^10 and the syndrome components are S1 = a^10, S2 =
a^20, S3 = a^30, S4 = a^9, S5 = a^19, S6 = a^29. The syndrome polynomial is
S(x) = a^10 x + a^20 x^2 + a^30 x^3 + a^9 x^4 + a^19 x^5 + a^29 x^6, and the algorithm proceeds
as shown in Table 6.15.
The error locator polynomial is Λ(x) = 1 + a^10 x, pointing to an error at the 11th
position, and the word “all zeros” is decoded. Unlike the Peterson
algorithm, m < ec errors are successfully corrected without additional
modifications of the decoding procedure.

(c) For the polynomial corresponding to the received word r(x) = 1 + x^9 + x^11 +
x^14, the syndrome polynomial is S(x) = x + x^2 + a^29 x^3 + x^4 + a^23 x^5 + a^27 x^6, and
the decoding procedure is shown in detail in Table 6.16.
The error locator polynomial now can be factored

Λ(x) = 1 + x + a^16 x^2 + a^17 x^3 = (1 + a^13 x)(1 + a^16 x)(1 + a^19 x),

pointing to errors at the positions corresponding to a^13, a^16 and a^19, and the
corrected received word is

r(x) = 1 + x^9 + x^11 + x^13 + x^14 + x^16 + x^19 = (x^4 + x + 1) g(x).

Problem 6.14 GF(2^3) is generated by a primitive polynomial p(x) = x^3 + x^2 + 1.
Find a Reed-Solomon code constructed over this field, write one code word, find the
code rate and the minimum Hamming distance and draw the code distance spectrum, if one
generator polynomial root is a1 = a and if
(a) the code can correct one error in the code word,
(b) the code can correct two errors in the code word,
(c) the code can correct three errors in the code word.

Solution
Galois field GF(2^3) construction was explained in one of the previous
problems and the field structure is given in Table 6.10. As previously explained, the
procedure for BCH construction guarantees a code correcting at least the number
of errors required. However, it might happen that a code in some cases corrects
more errors than defined in advance, as illustrated in Problems 6.1 and 6.2.
This BCH feature, sometimes undesirable, is a consequence of the fact that the gen-
erator polynomial has a greater number of successive roots than directly required by
the code construction procedure (conjugate roots are included). In this problem it will
be illustrated how this drawback can be avoided, if the requirement for a binary
code is given up.
Reed-Solomon codes are a special class of nonbinary BCH codes. Unlike binary
BCH codes, here the code symbols and the generator polynomial

roots as well are from the same field. Because of that, the generator polynomial of a
Reed-Solomon code for correcting ec errors is not a product of minimal polynomials,
but a direct product of the factors (x + a^m0)(x + a^(m0+1))···(x + a^(m0+2ec−1)), where usually
m0 = 1 is chosen, because the element a is primitive by rule [42]. In such a way a
code having word length n = 2^m − 1 symbols is obtained, where the number of
redundant symbols is always n − k = 2ec.
(a) To construct a code able to correct one symbol error in the code word having length
n = 7 nonbinary symbols, the generator polynomial should have roots a and a^2.
The number of redundant symbols is n − k = 2, and the obtained code is (7, 5).
The generator polynomial is obtained directly

g(x) = (x + a)(x + a^2) = a^3 + x(a + a^2) + x^2 = a^3 + a^6 x + x^2.

One possible information word is 10001 and the information polynomial is
i(x) = 1 + x^4. The corresponding code polynomial is c(x) = i(x)g(x) =
a^3 + a^6 x + x^2 + a^3 x^4 + a^6 x^5 + x^6 and the resulting code word is (a^3 a^6 1 0 a^3 a^6 1).
Every polynomial coefficient in GF(8) can be represented by one three-bit word,
and it can be said that the considered Reed-Solomon encoder transformed the infor-
mation sequence (100,000,000,000,100) into the code sequence
(101,011,100,000,101,011,100).
The encoder transforms 5 information symbols (represented by 15 bits) into 7 code
symbols (represented by 21 bits) and it is obvious that the code rate is R = 5/7. It is
interesting to note that the increased number of bits in the code word
(relative to BCH) is additionally justified by the fact that this code can correct
double and triple bit errors as well, as long as they occur within one nonbinary symbol.
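The encoding above is a straightforward polynomial multiplication over GF(8). The following
Python sketch (an added illustration; the field of Table 6.10 is generated from
p(x) = x^3 + x^2 + 1 and field elements are bit masks) rebuilds g(x) and the code word for
i(x) = 1 + x^4.

M, PRIM, N = 3, 0b1101, 7                        # GF(2^3), p(x) = x^3 + x^2 + 1
antilog, x = [], 1
for _ in range(N):
    antilog.append(x)
    x <<= 1
    if x & (1 << M):
        x ^= PRIM
log = {v: i for i, v in enumerate(antilog)}

def gf_mul(a, b):
    return 0 if 0 in (a, b) else antilog[(log[a] + log[b]) % N]

def poly_mul(p, q):
    # product of two polynomials with GF(8) coefficients (index = power of x)
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] ^= gf_mul(pi, qj)
    return out

g = poly_mul([antilog[1], 1], [antilog[2], 1])   # (x + a)(x + a^2) = a^3 + a^6 x + x^2
i = [1, 0, 0, 0, 1]                              # i(x) = 1 + x^4
c = poly_mul(i, g)
print([f"a^{log[s]}" if s else "0" for s in c])  # a^3, a^6, a^0, 0, a^3, a^6, a^0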
Every Reed-Solomon code is an MDS code, meaning that the Singleton bound is
satisfied with equality. In this case dmin = n − k + 1 = 3, and the code can
correct one error. It is known that for any MDS code defined over a field GF(q)
the code distances spectrum can be found by using the relation [27]

Ai = C(n, i)(q − 1) Σ_{j=0}^{i−dmin} (−1)^j C(i−1, j) q^(i−dmin−j),   i = dmin, dmin+1, ..., n

where n is the code word length, dmin is the minimum Hamming distance, while Ai
denotes the number of code words having i nonzero symbols. In this case the
relation becomes

Ai = C(7, i)·7 Σ_{j=0}^{i−3} (−1)^j C(i−1, j) 8^(i−3−j),   i = 3, 4, ..., 7.
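The MDS weight-distribution formula is easy to evaluate; the added Python sketch below
applies it to the RS(7, 5) code (q = 8, n = 7, dmin = 3) and checks that the numbers of
code words add up to q^k = 8^5 = 32768.

from math import comb

def mds_spectrum(n, k, q):
    # weight distribution of an MDS (n, k) code over GF(q)
    dmin = n - k + 1
    A = {0: 1}
    for i in range(dmin, n + 1):
        A[i] = comb(n, i) * (q - 1) * sum(
            (-1) ** j * comb(i - 1, j) * q ** (i - dmin - j)
            for j in range(i - dmin + 1))
    return A

A = mds_spectrum(7, 5, 8)
print(A, sum(A.values()))          # the counts sum to 8^5 = 32768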

It is interesting to note that at any word position can be one from q = 8


symbols and the code words number is qk = 85 = 32768. More, it should be
also noticed that there is always one code word comprising binary zeros only
Problems 301

(a) (b)

Fig. 6.18 Code distances spectrum of Reed-Solomon codes RS(7, 5) (a) and RS(7, 3) (b)

and it is obvious that code distances spectrum has a shape as shown in


Fig. 6.18a. Although the code has the same Hamming distance as BCH code
constructed for ec = 1 in Problem 6.1, its code rate is increased for some
amount, what is in any case desirable.

(b) To construct an RS code correcting two symbol errors in the code word of length
n = 7, the roots of the generator polynomial should be a, a^2, a^3, a^4, and for the code
(7, 3) it is obtained

g(x) = (x + a)(x + a^2)(x + a^3)(x + a^4) = a^3 + x + a^3 x^2 + a^2 x^3 + x^4.

The code polynomial corresponding to the information word 100 is c(x) = g(x) =
a^3 + x + a^3 x^2 + a^2 x^3 + x^4 and the obtained code word is (a^3 1 a^3 a^2 1 0 0).
Every polynomial coefficient in GF(8) can be represented by one three-bit
word and the corresponding code sequence is (101,100,101,001,100,000,000).
At the encoder output 8^3 = 512 various code words can appear, the code rate is
R = 3/7, the minimum Hamming distance is dmin = n − k + 1 = 5, and the code
spectrum is shown in Fig. 6.18b.

(c) To construct a code correcting three errors in the code word of length n = 7, the roots
of the generator polynomial should be a, a^2, a^3, a^4, a^5, a^6, and it is obtained

g(x) = (x + a)(x + a^2)(x + a^3)(x + a^4)(x + a^5)(x + a^6)
     = 1 + x + x^2 + x^3 + x^4 + x^5 + x^6.

The coefficients of the generator polynomial are binary and the code is a binary one.
It is obvious that the code RS(7, 1) reduces to the BCH code (7, 1), equivalent to the
sevenfold repetition code, where dmin = 7, with the corresponding spectrum shown
in Fig. 6.15b.

Problem 6.15 Field GF(2^4) is generated by the primitive polynomial p(x) = x^4 +
x + 1. Construct an RS code over this field and find the code distances spectrum under
the conditions that the code corrects three errors and that the code word length is the
maximum possible. Analyze the code properties after shortening, where the two most
significant information symbols are omitted.
Solution
Galois field GF(2^4) construction by using the polynomial p(x) = x^4 + x + 1 is
explained in Problem 6.2. To construct a code correcting three errors in a code word of
length n = 15, the roots of the generator polynomial should be a, a^2, a^3, a^4, a^5, a^6, and
it is obtained

g(x) = (x + a)(x + a^2)(x + a^3)(x + a^4)(x + a^5)(x + a^6)
     = a^6 + a^9 x + a^6 x^2 + a^4 x^3 + a^14 x^4 + a^10 x^5 + x^6,

and the generating matrix of the corresponding code (15, 9) is

    [ a^6  a^9  a^6  a^4  a^14 a^10 1    0    0    0    0    0    0    0    0   ]
    [ 0    a^6  a^9  a^6  a^4  a^14 a^10 1    0    0    0    0    0    0    0   ]
    [ 0    0    a^6  a^9  a^6  a^4  a^14 a^10 1    0    0    0    0    0    0   ]
    [ 0    0    0    a^6  a^9  a^6  a^4  a^14 a^10 1    0    0    0    0    0   ]
G = [ 0    0    0    0    a^6  a^9  a^6  a^4  a^14 a^10 1    0    0    0    0   ]
    [ 0    0    0    0    0    a^6  a^9  a^6  a^4  a^14 a^10 1    0    0    0   ]
    [ 0    0    0    0    0    0    a^6  a^9  a^6  a^4  a^14 a^10 1    0    0   ]
    [ 0    0    0    0    0    0    0    a^6  a^9  a^6  a^4  a^14 a^10 1    0   ]
    [ 0    0    0    0    0    0    0    0    a^6  a^9  a^6  a^4  a^14 a^10 1   ]

In the general case a Reed-Solomon code word is found as

c(x) = i(x)g(x) = (i0 + i1 x + i2 x^2 + ... + i(k−1) x^(k−1)) g(x) = c0 + c1 x + c2 x^2 + ... + c(n−1) x^(n−1),

and code shortening is carried out by omitting some number (in the general case s) of
information symbols, usually at the most significant positions:

c′(x) = (i0 + i1 x + i2 x^2 + ... + i(k−s−1) x^(k−s−1)) g(x) = c0 + c1 x + c2 x^2 + ... + c(n−s−1) x^(n−s−1).

As the ith row of the generator matrix corresponds to the polynomial x^i g(x), it is easy to
show that the shortening operation corresponds to omitting the last s rows and the last s columns of

Table 6.17 Galois field GF(2^3)

Exponential equivalent   Polynomial equivalent   Binary equivalent
a^0                      1                       1 0 0
a^1                      a                       0 1 0
a^2                      a^2                     0 0 1
a^3                      1 + a                   1 1 0
a^4                      a + a^2                 0 1 1
a^5                      1 + a + a^2             1 1 1
a^6                      1 + a^2                 1 0 1

the matrix G. The generator polynomial is not changed, and the minimum Hamming
distance (the code capability for error correction) is not changed by shortening either.
By omitting the two last information symbols, the code word length becomes n′ = 13,
the generator matrix dimensions become 7 × 13, the code rate is R′ = 7/13 (< R = 9/15),
but dmin = 7 does not change, and ec = 3.
Problem 6.16 Field GF(2^3) is generated by the primitive polynomial p(x) = x^3 +
x + 1. Find a Reed-Solomon code constructed over this field, if one generator
polynomial root is a1 = a and the code has to correct two errors in the code word.
(a) Explain a procedure for obtaining a systematic Reed-Solomon code and draw
the block scheme of a corresponding encoder.
(b) Starting from the nonsystematic code version form a shortened RS(5, 2) code
and draw its spectrum.

Solution
Galois field GF(8) construction by using the polynomial p(x) = x^3 + x + 1 is given in
Table 6.17 (not the same as in Problem 6.1!). The minimal polynomials corresponding
to the conjugate roots are not given, because they are not interesting for RS code
construction.
(a) The Reed-Solomon code correcting two errors in the code word of length n = 7 has
the generator polynomial

g(x) = (x + a)(x + a^2)(x + a^3)(x + a^4) = a^6 + a^5 x + a^5 x^2 + a^2 x^3 + x^4.

Let the input sequence be (110 010 101). From Table 6.17 it can be concluded
that the corresponding symbols are a^3, a^1, a^6.
The procedure to obtain the remainder after division by the generator polynomial can
be symbolically written as follows:
304 6 Cyclic Codes

...

gn-k gn-k+1 g2 g1 g0

...
code word

information word

Fig. 6.19 Reed-Solomon systematic encoder

α2 α5 α5 α6

Fig. 6.20 Division circuits as a main element of RS(7, 3) systematic encoder

and the corresponding cyclic code word is (a5 a0 a5 a6 a5 a0 a1), the binary
equivalent is (111001111101111001010). Block scheme of systematic Reed-
Solomon encoder for any generator polynomial is given in Fig. 6.19. The
functioning is the same as for binary codes—at a link the information bits are
send firstly and after them the remainder polynomial coefficients found in
delay cells of the circuit for division by polynomial g(x). Circuit for division,
as a main encoder element, is shown in details in Fig. 6.20 for a considered
case.

(b) Code polynomial of the nonsystematic (7, 3) code is


Problems 305

(a) (b)

Fig. 6.21 Code distance spectrum of the shortened codes RS(6, 1) (a) and RS(5, 1) (b)

cðxÞ ¼ iðxÞgðxÞ ¼ ði0 þ i1 x þ i2 x2 Þða6 þ a5 x þ a5 x2 þ a2 x3 þ x4 Þ;

as every code word symbol can take one from eight field values, total number
of code words is 83 = 512.
The omitting of the last information symbol yields the code (6, 2) having
82 = 64 code words, the word length n = 6, with the corresponding
polynomial

c0 ðxÞ ¼ ði0 þ i1 xÞða6 þ a5 x þ a5 x2 þ a2 x3 þ x4 Þ;

while the omitting the last two information symbols yields the code (5, 1),
with the corresponding polynomial

c00 ðxÞ ¼ i0 ða6 þ a5 x þ a5 x2 þ a2 x3 þ x4 Þ:

Code distance spectra of shortened codes RS(6, 2) and RS(5, 1) are shown in
Figs. 6.21a, b. It is obvious that a minimum Hamming distance for both codes
is dmin = 5, and they can, as the code RS(7, 3), by which shortening they were
obtained, correct two errors in the code word length.

Problem 6.17 Consider a primitive RS code correcting double errors, constructed


over the field GF(8), generated by the primitive polynomial p(x) = x3 + x + 1.
(a) Write the equations connecting syndrome components and error locator
polynomial coefficients, as well as the equations connecting syndrome com-
ponents, error positions and error magnitudes. Verify whether these relations
are satisfied if the transmitted word “all zeros” was sent and the errors
occurred at the first and the fifth code word symbols, having magnitudes a4 i
a5, respectively.
306 6 Cyclic Codes

(b) By using Peterson-Gorenstein-Zierler algorithm decode a received word


whose polynomial is r(x) = a5x2 + x3 + a2x4 + a2x6.
(c) By using Peterson-Gorenstein-Zierler algorithm decode a received word
whose polynomial is rðxÞ ¼ a5 x2 þ a3 x3 þ ax4 þ a5 x5 þ a2 x6 .

Solution
(a) The relation connecting syndrome components and error locator polynomial
coefficients, when during the transmission a number of errors occurred being
just equal to the number of errors (m = ec) correctable by Reed-Solomon code
is [40]
2 32 3 2 3
S1 S2 S3 S4  Sec 1 Sec Kec Sec þ 1
6 S Sec þ 1 7 6 7 6 7
6 2 S3 S4 S5  Sec 7 6 Kec 1 7 6 Sec þ 2 7
6 76 7 6 7
6 S3 Sec þ 2 7 6 7 6 7
6 S4 S5 S6  Sec þ 1 7 6 Kec 2 7 6 Sec þ 3 7
6 S Sec þ 3 7 6 Kec 3 7 ¼ 6 Sec þ 4 7
7 6 7 6
AK¼6
0
6
4 S5 S6 S7  Sec þ 2
76 7 6
7:
7
6 .. ... ... ... .. .. .. 7 6 .. 7 6 .. 7
6 . . . . 7 6 . 7 6 . 7
6 76 7 6 7
6 76 7 6 7
4 Sec 1 Sec Sec þ 1 Sec þ 2    S2ec 3 S2ec 2 5 4 K2 5 4 S2ec 1 5
Sec Sec þ 1 Sec þ 2 Sec þ 3    S2ec 2 S2ec 1 K1 S2ec

It can be shown that matrix A′ is nonsingular if in a received word there are


just ec errors, while A′ is singular if less than ec errors occurred. If matrix A′ is
singular, then the last right column and the lower row are omitted and the
determinant of new obtained matrix is calculated, the procedure repeating until
the resulting matrix becomes nonsingular. Error locator polynomial coeffi-
cients are then found by using standard linear algebraic techniques and the
corresponding polynomial roots define error positions Xi. Finally, error mag-
nitudes eXi can be found from the matrix relation
2 32 3 2 3
X1 X2    Xm eloga X1 S1
6 X12 X22    Xm2 76 eloga X2 7 6 S2 7
6 76 7 6 7
Be6 . .. . . .. 76 .. 7 ¼ 6 .. 7:
4 .. . . . 54 . 5 4 . 5
m m m
X1 X2    Xm eloga Xm Sm

In the considered example X1 = a0, e1 = a4, X2 = a4, e2 = a5 and


K(x) = (1 + X1x)(1 + X2x) = a4x2 + a5x + 1 and r(x) = a4 + a5x4. Because of
ec = 2, syndrome components are S1 = r(a) = a4 + a2 = a, S2 = r(a2) = a4 +
6
= a3, S3 = r(a3) = a4 + a3 = a6 and S4 = r(a4) = a4 + a0 = a5, and it is easy
to verify the following

a a3 a4 a6 a4 a5 a0 a
A0 K ¼ ¼ ; Be ¼ ¼ :
a3 a6 a5 a5 a a3 a4 a3
Problems 307

(b) Peterson-Gorenstein-Zierler algorithm is based on a few steps [40, 43]:


1. Find syndromes for a received word: {Sl} = {r(al)}, l = 1, 2, 3, …, 2ec,
then construct syndrome matrix A′.
2. Calculate syndrome matrix determinant. If it equals zero, construct a new
matrix by omitting the last right and the lowest row from the previous
syndrome matrix. Shorten K for one position by erasing the coordinate Kt.
Repeat this step until the determinant becomes not equal to zero.
3. Find polynomial K(x) coefficients and its roots as well. If the roots are not
obvious or K(x) has not roots in the observed field go to step 7.
4. Construct matrix B and solve it to find error magnitudes.
5. Subtract error magnitudes from the values in the corresponding positions
of the received word.
6. Deliver the corrected word and stop algorithm.
7. Signal decoding error and stop algorithm.
In this case the received polynomial r(x) = a5x2 + x3 + a2x4 + a2x6 determi-
nes syndrome components S1 = a6, S2 = a3, S3 = a4, S4 = a3 and the fol-
lowing relation is valid

a6 a3 K2 a4
A0 K ¼ :
a3 a4 K1 a3

Matrix A′ is nonsingular, a method of simple substitution yields error locator


polynomial coefficients K2 = a, K1 = a2 and K(x) = ax2 + a2x + 1 = (1 +
3
x)(1 + a5x), the errors occurred at the fourth and the sixth code word symbol.
Error magnitudes are then found from the following system of equations

a3 a5 e3 a6 a3 e 3 þ a5 e 5 ¼ a6
Be ¼ ¼ )
a6 a3 e5 a3 a6 e 3 þ a3 e 5 ¼ a3

from which by a simple substitution it is found that the error at the fourth
position has the magnitude a, while the error at the sixth symbol has the
magnitude a5. Finally, error polynomial becomes

eðxÞ ¼ eloga X1 xloga X1 þ eloga X2 xloga X2 ¼ ax3 þ a5 x5 ;

and a nearest code word is found by adding error polynomial and the received
polynomial

cðxÞ ¼ rðxÞ þ eðxÞ ¼ ða5 x2 þ x3 þ a2 x4 þ a2 x6 Þ þ ðax3 þ a5 x5 Þ


¼ a5 x 2 þ a3 x 3 þ a2 x 4 þ a5 x 5 þ a2 x 6 :
308 6 Cyclic Codes

Finally, it is suitable to verify do the obtained result is a code word.


Generator RS polynomial correcting double errors at the code word length
n = 7 is

gð xÞ ¼ ðx  aÞðx  a2 Þðx  a3 Þðx  a4 Þ ¼ x4 þ a3 x3 þ x2 þ ax þ a3 ;

and it is easy to verify that the code word can be written as cðxÞ ¼ a2 x2 gðxÞ.

(c) A Peterson-Gorenstein-Zierler algorithm is applied as well, but in this case the


received polynomial is rðxÞ ¼ a2 x6 þ a5 x5 þ ax4 þ a3 x3 þ a5 x2 and syndrome
components are S1 = a, S2 = a5, S3 = a2, S4 = a4 . This time one starts from a
matrix

a a5
A0 ¼ ;
a5 a2

which is singular, because the second row equals to the first multiplied by a4.
A matrix equation reduces to scalar equality A″K = aK1 ¼ a5 from which
easily follows K1 ¼ a4 and K(x) = 1 + a4x. Therefore, an error occurred at the
fifth code word symbol.
Error magnitude is found using

Be ¼ a4 e4 ¼ a;

from which easily follows that the error at the fourth symbol has the
magnitude a4 and the error polynomial becomes

eðxÞ ¼ ei1 xloga X1 ¼ a4 x4 ;

and a nearest code word is found by adding error polynomial and the received
polynomial

cðxÞ ¼ rðxÞ þ eðxÞ ¼ ða2 x6 þ a5 x5 þ ax4 þ a3 x3 þ a5 x2 Þ þ ða4 x4 Þ


¼ a2 x 6 þ a5 x 5 þ a2 x 4 þ a3 x 3 þ a5 x 2 :

Although the fifth symbol of received word is a and the fifth code word
symbol is a2 it should be noted that the error magnitude is a very large being
a4. However, it is only a consequence of the way for defining addition in a
field GF(8) shown in Table 6.15.

Problem 6.18 Consider a primitive RS code correcting double errors, constructed


over the field GF(8), generated by the primitive polynomial p(x) = x3 + x + 1.
Problems 309

(a) Explain in short Berlekamp-Massey algorithm for RS codes decoding.


(b) By applying Berlekamp-Massey algorithm decode the word
rðxÞ ¼ a5 x2 þ a3 x3 þ ax4 þ a5 x5 þ a2 x6 .

Solution
(a) Berlekamp-Massey algorithm for RS codes decoding starts from the fact that
the error position polynomial does not depend on error magnitudes in a code
word. It makes possible to use Berlekamp algorithm to find the error position,
and later on to determine error magnitudes and to do their correction.
A complete algorithm is divided into a few steps as follows [41, 44]:
1. Find syndromes for a received word: {Sl} = {r(al)}, l = 1, 2, 3, …, 2ec
2. Set initial conditions K(−1)(x) = K(0)(x) = 1, d-1 = 1, d0 = S1, l-1 = l0 = 1.
3. Calculate improved error locator polynomial
a. if dk = 0 then Kðk þ 1Þ ð xÞ ¼ KðkÞ ð xÞ h i
b. if dk 6¼ 0 then Kðk þ 1Þ ðxÞ ¼ KðkÞ ðxÞ þ dk dq1 xkq KðqÞ ðxÞ

4. Calculate dk as a coefficient of the power xk+1 in the product K(k)(x)[1 + S


(x)], i.e.

ðk Þ ðk Þ
dk ¼ Sk þ 1 þ K1 Sk þ . . . þ Klk Sk þ 1lk

5. Calculate lk+1 = max(lk, lq + k − q)


6. Set k = k + 1. If k < 2ec, go to the step 3.
7. Find the roots of KðxÞ ¼ Kð2tÞ ðxÞ, correct corresponding locations in a
received word and stop the algorithm.
8. If the previous step is not possible, signal decoding error.
9. Form error magnitude polynomial—a main equation for RS code
decoding:

XðxÞ ¼ ½1 þ SðxÞ KðxÞ x2ec þ 1


¼ 1 þ ðS1 þ K1 Þx þ x2 ðS2 þ S1 K1 þ K2 Þ þ . . .
þ xec ðSec þ Sec 1 K1 þ Sec 2 K2 þ . . . þ Sec Þ

10. If m is a number of occurred errors, error magnitude at the position loga Xi is

Y
m
eloga Xi ¼ zðXi1 Þ= ð1 þ Xk Xi1 Þ
k¼1
k 6¼ i

11. Form error polynomial eðxÞ ¼ eloga X1 xloga X1 þ . . . þ eloga X2 xloga X2 , add it to
the received word polynomial to find the nearest code word c′(x).
310 6 Cyclic Codes

Table 6.18 Berlekamp-Massey algorithm for Problem 6.18c


k K(k)(x) dk lk
−1 1 1 0
0 1 a 0
1 1 + ax a3 1
2 1 + a4x 0 1
3 1 + a4x 0 1
4 1 + a4x – 1

12. If c′(x) is not divisible by g(x) signal a decoding error.


It is obvious that the essential difference relating to Berlekamp algorithm for
BCH codes decoding are the last four steps, allowing finding the error mag-
nitude and its correction.

(b) In this case the received polynomial is rðxÞ ¼ a5 x2 þ a3 x3 þ ax4 þ


a5 x5 þ a2 x6 , corresponding syndromes are S1 = a, S2 = a5, S3 = a2, S4 = a4.
The first eight algorithm steps are illustrated in Table 6.18.
Syndrome polynomial is S(x) = ax + a5x2 + a2x3 + a4x4, error locator poly-
nomial K(x) = 1 + a4x (to which corresponds error locator X 1 = a4) and the
error magnitude polynomial can be formed

Xð xÞ ¼ KðxÞ½1 þ SðxÞ x5 ¼ ð1 þ a4 xÞð1 þ ax þ a5 x2 þ a2 x3 þ a4 x4 Þ x5

and a part of error magnitude polynomial up to a power ec = 2 is


X(x) = 1 þ a2 x.
Error magnitudes are

XðXi1 Þ
eloga Xi ¼ ; i ¼ 1; 2; k 6¼ i:
1 þ Xk Xi1

Although a code was designed to correct ec = 2 errors, in this case only one
error occurred (m = 1) and its magnitude is

e4 ¼ Xða4 Þ=1 ¼ 1 þ a2 ða3 Þ ¼ 1 þ a5 ¼ a4

and error polynomial is eðxÞ ¼ a4 x4 , the nearest code word is found by adding
error polynomial and received polynomial

cðxÞ ¼ rðxÞ þ eðxÞ ¼ ða5 x2 þ a3 x3 þ ax4 þ a5 x5 þ a2 x6 Þ þ ða4 x4 Þ


¼ a5 x2 þ a3 x3 þ a2 x4 þ a5 x5 þ a2 x6 ¼a2 x2 gðxÞ :
Problems 311

Problem 6.19 Consider a primitive RS code correcting double errors, constructed


over a field GF(8), generated by the primitive polynomial p(x) = x3 + x + 1.
(a) Explain extended Euclidean algorithm for finding the greatest common divisor
of two integers.
(b) Explain shortly Euclidean algorithm for RS codes decoding.
(b) Decode the received word r(x) = a5x2 + x3 + a2x4 + a2x6 using Euclidean
algorithm.
(c) Decode the received word rðxÞ ¼ ax2 þ a5 x4 using Euclidean algorithm.

Solution
(a) The Euclidean algorithm is an efficient method for finding the greatest com-
mon divisor (GCD) for a group of elements. It can be expressed as a linear
combination of these elements. Extended form of this algorithm, besides the
finding GCD, makes possible to find the corresponding coefficients.
Let a GCD for two elements (a,b) should be found. If the initial conditions
r−1 = a, r0 = b, s−1 = 1, s0 = 0, t−1 = 0, t0 = 1 are given, the algorithm
continues according to the following recursive relations

ri ¼ ri2  qi ri1
si ¼ si2  qi si1
ti ¼ ti2  qi ti1

where in every step the parameter qi is chosen providing that the relation
ri \ri1 is satisfied. Algorithm stops when the remainder rn = 0. The
remainder rn−1 is GCD(a,b). Recursive relations are chosen so as that in
every (ith) iteration provide

si a þ t i b ¼ ri

Table 6.19 illustrates the procedure for finding GCD(322, 115) and coeffi-
cients in linear combination for every iteration. It is obvious that GCD for 322
and 115 is 23, and the in every iteration the above relation is satisfied.

Table 6.19 Euclidean extended algorithm for finding the greatest common divisor for two
positive integers
i qi ri si ti sia + tib
−1 – a = 322 1 0 –
0 – b = 115 0 1 –
1 2 92 1 −2 1  322 − 2  115 = 92
2 1 23 −1 3 −1  322 + 3  115 = 23
3 4 0 5 −14 5  322 − 14  115 = 0
312 6 Cyclic Codes

(b) Euclidean algorithm for RS codes decoding is defined by the following steps
[39]
nP
k
1. Find syndrome polynomial SðxÞ ¼ sj xj þ 1 . If S(x) = 0, the received
j¼1
vector corresponds to a code word. If it is not true, go to the next step.
2. Set the following initial conditions: X1 ðxÞ ¼ x2ec þ 1 , X0 ðxÞ = 1 + S(x),
K-1(x) = 0, K0(x) = 1, i = 1.
3. By applying the extended algorithm find successive remainders ri(x) and
the corresponding polynomials ti(x)

Xi ðxÞ ¼ Xi2 ðxÞ  qðxÞXi1 ðxÞ


Ki ðxÞ ¼ Ki2 ðxÞ  qðxÞKi1 ðxÞ

where polynomial qi(x) is chosen so as that the power of remainder


polynomial (denoted by ri ðxÞ) is minimum.
4. Repeat the previous step until deg[Xn ðxÞ]  ec. Remainder Xn ðxÞ is
GCD for X1 ðxÞ and X0(x).
5. Calculate polynomial Kn(x) roots, defining the error positions.
6. Find error magnitudes.
It is obvious that this algorithm is a modification of Euclidean algorithm for
polynomials (instead of integers). Its modification is possible because the main
equation for BCH and Reed-Solomon codes decoding

½1 þ SðxÞ KðxÞmod x2ec þ 1 ¼ XðxÞ;

can be written in a slightly different form

HðxÞx2ec þ 1 þ KðxÞ½1 þ SðxÞ ¼ XðxÞ:

Now the extended Euclidean algorithm for finding GCD for x2t+1 and [1 + S
(x)], can be used to obtain a complete of solutions (K(i)(x), X(i)(x)) enabling
that in ith iteration the following is satisfied

HðiÞ ðxÞx2ec þ 1 þ KðiÞ ðxÞ½1 þ SðxÞ ¼ XðkÞ ðxÞ:

During RS codes decoding, finding H(x) is not of interest, but of a great


interest is a pair of solutions (K(i)(x), X(i)(x)). Solution corresponding to error
locator and magnitude polynomials is obtained when a power of X(i)(x) is less
or equal to the power of the polynomial K(i)(x).
It is interesting that in the literature can be found a variant where this problem
is formulated in slightly different form (the result is the same)
Problems 313

Table 6.20 Euclidean algorithm for Problem 6.19c


k Xk(x) qk(x) Kk(x)
−1 x5 – 0
0 1 + a6x + a5x2 + ax3 + ax4 – 1
1 a6 + ax + x2 + a5x3 a6 + a6x a6 + a6x
2 a3 + ax + a6x2 a2 + a3x a3 + a4x + a2x2

HðiÞ ðxÞx2ec þ KðiÞ ðxÞSðxÞ ¼ XðkÞ ðxÞ:

Berlekamp-Massey algorithm is more efficient than Euclidean algorithm, but


the difference between these two is not as dramatic as the difference between
Euclidean approach and a direct solving in Peterson-Gorenstein-Zierler algo-
rithm. A basic advantage of Euclidean algorithm is that is easy to understand
and to apply this algorithm.
(b) Here the code RS(7, 3) is considered, correcting double error and Euclidean
algorithm is applied, but syndromes are calculated starting from the element a0 i.e.
from Si = r(ai−1). Here r(x) = a x2 + a5x4 and syndrome components are S1 = a6,
S2 = a5, S3 = a, S4 = a. Initial conditions are X1 ðxÞ ¼ x5 , X0(x) = 1 + S
(x) = 1 + a6x + a5x2 + ax3 + ax4, K−1(x) = 0, K0(x) = 1. The next steps of algo-
rithm are given in Table 6.20.
The obtained results are polynomials

Xð xÞ ¼ a3 þ ax þ a6 x2 ¼ a3 ð1 þ a5 x þ a3 x2 Þ and Kð xÞ ¼ a3 þ a4 x þ a2 x2
¼ a3 ð1 þ ax þ a6 x2 Þ;

satisfying the equation

ð1 þ ax þ a6 x2 Þð1 þ a6 x þ a5 x2 þ ax3 þ ax4 Þ ¼ ð1 þ a5 x þ a3 x2 Þ x5 :

Brief Introduction to Algebra II

The aim of this subsection is to facilitate the study of the cyclic codes. Practically
all proofs are omitted. The corresponding mathematical rigor can be found in many
excellent textbooks [25, 27, 34].
It is in fact, the continuation of the introduction to algebra from the previous
chapter.
Finite (Galois) Field Arithmetic
Losely speaking, rings can be regarded as an initial step when introducing the
fields. The ring is a commutative group under addition, while the multiplication is
closed, associative and distributive over addition. The existence of an identity
314 6 Cyclic Codes

element for multiplication is not supposed (although, it may exist), Even, if the
identity element exists (it must be unique) all elements of the ring need not have the
inverses. If all ring elements (with the exception of identity element for addition—
0) have inverses and the multiplication is commutative, the field is obtained. Here it
will be entered more deeply to the theory of finite fields.
Ideals, Residue Classes, Residue Class Ring
The normal (invariant) subgroup is very important notion in the theory of groups.
The corresponding notion in the ring theory is ideal.
Axioms for the ideal:
I.1. Ideal I is the subgroup of the ring’s (R) additive group
I.2. 8i2I; 8r2R : ir^ ri2I:

Example
1. In the ring of all integers, the set of multiples of any particular integer is an ideal.
2. In the ring of polynomials in one variable with integer coefficients, the set of
polynomials which are multiples of any particular polynomial is an ideal.
Cosets can be formed, ideal being a subgroup. These cosets are here called
residue classes. The elements of the ideal form the first row (starting from 0). The
procedure is the same as being explained for the coset decomposition.

i1 = 0 i2 i3 . . .
r1 = r1 + i 1 r1 + i 2 r1 + i 3 . . .
r2 = r2 + i 1 r2 + i 2 r2 + i 3 . . .
. . . . . .
. . . . . .
. . . . . .

The ideal is a normal (invariant) subgroup and the cosets (residue classes) form a
group (factor group) under addition defined as follows
   
fri g þ rj ¼ ri þ rj ;

where {r} denotes the residue class containing r. If the multiplication is defined as
follows
   
fri g rj ¼ ri rj ;

it can be shown that the residue classes of a ring with respect to an ideal form also a
ring. This ring is called residue class ring.
Brief Introduction to Algebra II 315

Example In the ring of all integers, the integers which are multiples of 3 form an
ideal. The residue classes are {0}, {1} and {2}. They form the residue class ring—
modulo 3 addition and multiplication. In fact, they are elements of GF(3).
Ideals and Residue Class of Integers
As explained earlier, the set of all integers forms a ring under addition and mul-
tiplication. Let start with some definitions (only integers are considered):
1. If rs = t, it is said that t is divisible by r (and by s as well), or that r (s) divides t,
or that r and s are factors of t.
2. A positive integer p > 1 divisible only by ±p or ±1 is called a prime (integer).
3. The greatest common divisor of two integers, GCD(r, s), is the largest positive
integer dividing both of them.
4. The least common multiple of two integers, LCM(r, s), is the smallest positive
integer divisible by both of them.
5. Two integers are relatively prime if their greatest common divisor equals 1.
For every pair of integers t and d (non-zero integers) there is a unique pair of
integers q (the quotient) and r (the remainder), such that

t ¼ dq þ r; 0  r\jd j:

This is well known Euclid division algorithm. In the following the remainder
will be written as

r ¼ Rd ½t

or

r ¼ tðmod dÞ:

In fact, the last relation means that r and t have the same remainder modulo
d (both can be greater than d)—it is called congruence.
Some interesting properties of the remainders are

Rd ½t þ id ¼ Rd ½t ; for any i;
Rd ½t ¼ Rd ½t ; only quotient changes sign:

For two distinct non-zero integers (r, s) their GCD can always be expressed in
the form

GCDðr; sÞ ¼ ar þ bs;

and computed by an iterative application of the division algorithm.


Example Find GCD(105, 91)
316 6 Cyclic Codes

105 ¼ 1  91 þ 14
91 ¼ 6  14 þ 7
7 ¼ 1  7þ0
Therefore GCD(105, 91) = 7 and, starting at the bottom we obtain

7 ¼ 916  14
7 ¼ 916  ð105  91Þ
7 ¼ 916  105 þ 6  91
7 ¼ 6  105 þ 7  91

Note that the representation is not unique, also


7 ¼ 19  105 þ 22  91 ¼ 7  1058  91:

Two more interesting properties of remainders are:

Rd ðn1 þ n2 Þ ¼ Rd ðn1 Þ þ Rd ðn2 Þ;


Rd ðn1 n2 Þ ¼ Rd ½Rd ðn1 ÞRd ðn2 Þ :

Therefore, if one is interested only in remainders, to avoid big integers, they can
be replaced by their remainders at any point in computation.
It can be shown also that every positive integer can be written as a product of
prime number powers.
Using previously explained procedure it can be shown that residue class ring
modulo p is a field if (and only if) p is a prime number. Therefore, the arithmetic in
GF(p) (p—prime number) can be described as addition and multiplication modulo
p. These fields are called also prime fields.
In such a way, the arithmetic in prime fields is defined. To define the arithmetic
in GF(pn) the polynomial rings must be introduced.
Ideals and Residue Classes of Polynomials
Some definitions:
1. A polynomial with one variable x with the coefficients (fi) from (“over”) a field
GF(q) is of the following form

f ðxÞ ¼ fn1 xn1 þ    þ f1 x þ f0

2. The polynomial is monic if fn−1 = 1.


3. The zero polynomial is f(x) = 0.
In the following exposition the monic polynomials would be supposed if not
otherwise stated.
The polynomials over GF(q) form a commutative ring if the operations are
defined as the “usual” addition and multiplication of polynomials. The obtained
Brief Introduction to Algebra II 317

results will also have the coefficients from the field because they are obtained by
addition and multiplication of the field elements. The inverse element for addition is
just the same polynomial having as the coefficients the corresponding inverse
elements from the field. Generally, the inverse elements for multiplication do not
exist, but the division is possible. To obtain the field special conditions should be
imposed.
Some more definitions:
1. A polynomial p(x) of degree n is irreducible if it is not divisible by any poly-
nomial of degree less than n, but is greater than 0.
2. A monic irreducible polynomial of degree of at least 1 is called a prime
polynomial.
3. The greatest common divisor of two polynomials is the monic polynomial of
largest degree which divides both of them.
4. Two polynomials are relatively prime if their greatest common divisor is 1.
5. The least common multiple of two polynomials is the monic polynomial of
smallest degree divisible by both of them.

Examples (for the irreducible polynomials):


1. x2 + x + 1 in GF(2), but x2 + 1 = (x2 + 1)2 is not irreducible.
2. x2 − 3 in the field of rational numbers.
3. x2 + 1 in the field of real numbers.
The Euclidian division algorithm for polynomials can be stated as follows. For
every pair of polynomials t(x) and d(x) (non equal to zero) there is a unique pair of
polynomials q(x) (quotient polynomial) and r(x) (remainder polynomial), such that
tð xÞ ¼ d ð xÞqð xÞ þ r ð xÞ;

where the degree of r(x) is less than the degree of d(x). The remainder polynomial
can be written
rðxÞ ¼ RdðxÞ ½tðxÞ :
It can also be called a residue of t(x) when divided by d(x). The corresponding
congruence relation is
t ð xÞ r ð xÞðð d ð xÞÞ;

where the degree of r(x) can be greater than the degree of d(x).
Two important properties of remainders are:
RdðxÞ ½aðxÞ þ bðxÞ ¼ RdðxÞ ½aðxÞ þ RdðxÞ ½bðxÞ
 
RdðxÞ ½aðxÞbðxÞ ¼ RdðxÞ RdðxÞ ½aðxÞ RdðxÞ ½bðxÞ :

Further, the greatest common divisor of two polynomials r(x) and s(x) can
always be expressed in the form
318 6 Cyclic Codes

GCDðr ð xÞ; sð xÞÞ ¼ að xÞr ð xÞ þ bð xÞsð xÞ;

where a(x) and b(x) are polynomials (they are not unique!). They can be obtained in
an analogous way as for the corresponding integer relation.
Therefore, there is a parallel between the ring of integers and rings of polynomial
over a field. The word “integer” can be changed to word “polynomial”, “a < b” to
“deg a(x) < deg b(x)”, “prime number” to “irreducible polynomial”.
The nonzero polynomial over a field has a unique factorisation into a product of
prime polynomials (like the factorisation of integers into a product of primes). The
irreducible (prime) polynomial cannot be factored further, because it has no field
elements as the zeros.
Having the preceding note in view, the next conclusion is evident:
The set of polynomials which are multiples of any particular polynomial f(x) is
an ideal. The residue class ring formed from this ideal is called polynomial ring
modulo f(x).
One more note. The number of residue classes in the integer ring modulo d is
just d ({0}, {1}, …, {d − 1}). The number of residue classes in the polynomial ring
modulo d(x) of the degree n over the GF(q) equals the number of all possible
remainders—all possible polynomials—with the degree less equal to n − 1, i.e.
there are qn residue classes.
The Algebra of Polynomial Residue Classes
It can be easily proved that the residue classes of polynomials modulo polynomial f
(x) form a commutative linear algebra of a dimension n over the corresponding
coefficients field GF(q).
Denoting the field element with a, the scalar multiplication is defined as

afr ð xÞg ¼ far ð xÞg:

The other axioms are also easily verified. Among qn residue classes, only n are
linearly independent spanning the vector space. They are, for example
   
f1g; fxg; x2 ; . . .; xn1 :

Every residue class contains the polynomial of degree less n, therefore


   
a0 þ a1 x þ . . . þ an1 xn1 ¼ a0 f1g þ a1 f xg þ . . . þ an1 xn1 :

Of course,

ff ð x Þ g ¼ f0g ¼ 0
fxf ð xÞg ¼ 0
fx þ f ð xÞg ¼ fxg:
Brief Introduction to Algebra II 319

In the algebra of polynomials modulo f(x) there are ideals—the residue classes
which are multiples of some monic polynomial (class) g(x). But the polynomial g
(x) must divide f(x). It is called generator (generating) polynomial of the ideal. In
fact, every factor of f(x) is a generator polynomial of an ideal. The factors can be
irreducible polynomials or their products as well. There are no other ideals. The
ideal is a subgroup of the additive group. It is the basis of the corresponding
subspace. Let the degree of f(x) is n and the degree of g(x) – n − k. Then, the
dimension of the subspace is k and the linearly independent residue classes are
 
fgð xÞg; fxgð xÞg; . . .; xnk1 gð xÞ :

Multiplying g(x) by xn−k, the polynomial of degree n is obtained, having to be


reduced modulo f(x).
Let

f ð xÞ ¼ gð xÞhð xÞ;

then the dimension of subspace corresponding to h(x) is n − k (the degree of h


(x) being k).
The next relation also holds

fgð xÞgfhð xÞg ¼ fgð xÞhð xÞg ¼ ff ð xÞg ¼ 0:

It can be said also that the corresponding subspaces generated by g(x) and h
(x) are null-spaces of each other.
In the following exposition, the parenthesis “{.}” will be usually omitted.
The “value” of the polynomial for any field element can be easily computed. Let
be p1(x) = 1 + x + x2 and p2(x) = 1 + x over GF(2). then

p1 ð0Þ ¼ 1 þ 0 þ 02 ¼ 1; p1 ð1Þ ¼ 1 þ 1 þ 12 ¼ 1; p2 ð0Þ ¼ 1 þ 0 ¼ 1; p2 ð1Þ ¼ 1 þ 1


¼ 0;

and “1” is the zero (root) of the polynomial p2(x).


Vectors and Polynomials
Consider the correspondence between the n-tuples (vectors) and polynomials
modulo f(x) of degree n. While the vector is an ordered sequence of the field
elements (a0, a1, …, an−1), using polynomial the field elements are ordered by the
degrees of x, i.e. the corresponding polynomial is a0 + a1x + … + an−1xn. The
multiplication by scalar (the field element) gives in both cases the same corre-
sponding result as well as the addition. In fact, an n-tuple and corresponding
polynomial can be considered to be only a different ways of representing the same
element of the algebra. Therefore, both names will be used accordingly.
There is one problem. The requirement for the orthogonality of vectors (inner
product equals 0) differs from the requirement that the product of polynomials be 0.
320 6 Cyclic Codes

However, a very suitable connection is obtained if f(x) = xn − 1 (in GF(2) − xn + 1).


In this case

fxn 1g ¼ 0 ) fxn g ¼ f1g;

or, by omitting the parenthesis xn = 1.


Let

að xÞ ¼ a0 þ a1 x þ . . . þ an1 xn
bð xÞ ¼ b0 þ b1 x þ . . . þ bn1 xn ;

then

cð xÞ ¼ að xÞbð xÞ ¼ c0 þ c1 x þ . . . þ cn1 xn ;

because

xn þ j ¼ x j :

The corresponding coefficient is


   
cj ¼ a0 bj þ a1 bj1 þ . . . þ aj b0 þ aj þ 1 bn1 þ aj þ 2 bn2 þ . . . þ an1 bj þ 1 :

The first part of the expression corresponds to the terms whose sum of the
indexes is j and the second part to the terms whose sum is n + j. The coefficient cj
can be rewritten as an inner product
  
cj ¼ a0 ; a1 ; . . .; aj ; aj þ 41 ; aj þ 2 ; . . .; an1 bj ; bj1 ; . . .; b0 ; bn1 ; bn2 ; . . .; bj þ 1 :

It should be noted that the second vector is obtained taking the coefficients of b
(x) in reverse order and shifted cyclically j + 1 positions to the right. Therefore, if a
(x)b(x) = 0, then the vector corresponding to a(x) is orthogonal to the vector cor-
responding to b(x) with the order of its components reversed as well as to every
cyclic shift of this vector. Of course, the multiplication is commutative, therefore, a
(x)b(x) = 0 ) b(x)a(x) = 0 and the same rule can be applied keeping components
of b(x) in normal order and reverting and shifting the components of a(x).
Having the cyclic codes in view, the residue classes of polynomials modulo
polynomial xn − 1 will be almost exclusively considered in the following.
Example
Let f(x) = 1 + x3 = (1 + x)(1 + x + x2) over GF(2). The residue classes are (in
the first column the corresponding vectors are written):

(000) {0} 1 + x3 x(1 + x3) = x + x4 . . .


3
(100) {1} x 1 + x + x4 . . .
(continued)
Brief Introduction to Algebra II 321

(continued)
(000) {0} 1 + x3 x(1 + x3) = x + x4 . . .
3
(010) {x} 1+x+x x4 . . .
(110) {1 + x} x + x3 1 + x4 . . .
(001) {x2} 1 + x2 + x3 x + x2 + x4 . . .
(101) {1 + x2} x2 + x3 1 + x + x2 + x4 . . .
(011) {x + x2} 1 + x + x2 + x3 x2 + x4 . . .
(111) {1 + x + x2} x + x2 + x3 1 + x2 + x4 . . .

The first row is the ideal, the other rows are the residue classes. The elements of any
row have the same remainder after dividing by f(x). The corresponding basis is {1}, {x}
and {x2}. f(x) is the product of two factors—1 + x and 1 + x + x2. Let g
(x) = 1 + x. The elements of the corresponding ideal are {0}, {1 + x}, {xg
(x)} = {x + x2} as well as their sum {(1 + x)g(x)} = {1 + x} + {x + x2} = {1 + x2}.
This ideal is the null-space for the ideal generated by h(x) = 1 + x + x2 whose elements
are {0} and {1 + x + x2}. The corresponding orthogonality can be easily verified.
Galois Fields
Let p(x) is a polynomial with coefficients in a field F. The algebra of polynomials
modulo p(x) is a field if (and only if) p(x) is irreducible in F. If p(x) is of degree n, the
obtained field is called the extension field of degree n over F, while the field F is called
the ground field. The extension field contains as the residue classes all the elements of
the ground field. It is said that extension field contains the ground field. (Note: some
authors instead of irreducible polynomials consider only monic irreducible polyno-
mials called a prime polynomials—in the case of GF(2) there is no any difference).
The residue classes modulo prime number p form GF(p) of p elements. The ring
of polynomials over any finite field has at least one irreducible polynomial of every
degree. Therefore, the ring of polynomials over GF(p) modulo an irreducible
polynomial of degree n is the finite field of pn elements—GF(pn).
The existence of inverse elements in the field GF(pn) is proved in an analogous
way as in the field GF(q), using now the relation

1 ¼ að xÞr ð xÞ þ bð xÞsð xÞ:

There are no other finite fields. Moreover, the finite fields with the same number
of elements are isomorphic—they have the unique operation tables. The difference
is only in the way of naming their elements.
The fields GF(pn) are said to be fields of characteristic p. In the GF(p), p = 0
(mod p), therefore, in an extension field GF(pn), p = 0 as well. Then
p 
X
p p
ða þ bÞ ¼ ai bpi ¼ ap þ bp ;
i¼0
i
322 6 Cyclic Codes
 
p p
because all binomial coefficients except and have p(=0) as a factor.
0 p
A few definitions and theorems follow (without proof!):
1. Let F be a field and a let be the element of an extension field of F. The
irreducible polynomial m(x) of the smallest degree over F with m(a) = 0, is
called the minimal polynomial of a over F.
2. The minimal polynomial always exists and it is unique.
3. Every element of the extension field has a minimal polynomial.
4. If f(x) is the polynomial over F and if f(a) = 0, then f(x) is divisible by the
corresponding minimal polynomial m(x).
For example, the minimal polynomial for the “imaginary unit” j from the field of
complex numbers (being extension of the field of real numbers) is x2 + 1, whose
coefficients are from the field of real numbers.
The Multiplicative Group of a Galois Field
Let G be any finite group. Consider the set of elements formed by any element
(g) and its powers
g; gg ¼ g2 ; gg2 ¼ g3 ; . . .

There is a finite number of elements. Therefore, after some finite number of


exponentiations there must be the repetition, i.e.

gi ¼ g j ) 1 ¼ gji :

The conclusion is that some power of g equals 1. Let e be the smallest such
positive integer (ge = 1). Then e is called the order of the element g. Obviously, the
set of elements

1; g; g2 ; . . .; ge1

forms a subgroup. There are e elements in the subgroup—the order of any field
element divides the order of the group (Lagrange’s theorem). A group consisting of all
the powers of one of its elements is called a cyclic group. Such an element is called a
primitive element. There can be more distinct primitive elements in the group.
The multiplicative group in the Galois field is cyclic—all the elements of the
field (except 0) are the powers of an element (primitive). Therefore, the Galois field
has a primitive element. The order of the primitive element equals the number of
elements of the multiplicative group—pn − 1.
The following definition will be useful:
A primitive polynomial p(x) (also called primitive irreducible polynomial) over
Galois field is a prime polynomial over the same field with the property that in the
extension field obtained modulo p(x), the field element represented by x is primi-
tive. The conclusion is that all the field elements can be obtained by the corre-
sponding exponentiation. Because {p(x)} = 0, one can say that a primitive
Brief Introduction to Algebra II 323

polynomial is a prime polynomial having a primitive element as a zero (root). Let a


is a primitive element, then p(a) = 0.
It can be shown that the primitive polynomials of every degree exist over every
Galois field.
Example The GF(24) may be formed as a field of polynomials modulo primitive
polynomial x4 + x + 1 over GF(2) (as the prime numbers, the irreducible polyno-
mials are found by search!). According to the previous definition of the primitive
element (“the field element represented by x is primitive”), denoted by a, it is the
root of the primitive polynomial (a4 + a+1 = 0). All 15 nonzero field elements are
represented by the corresponding powers of a. The table is obtained dividing the
corresponding power of x by the primitive polynomial and taking the remainder.
For a7, the procedure is the following

x7:(x4+x+1)= x3+1
x7+x4+x3
x4+x3
x4+x+1
x3+x+1 ⇒ α7=α3+α+1

The corresponding vectors are also shown.

(1000) α 0= 1 (1010) α 8= 1 + α2
(0100) α 1= α (0101) α 9= α + α3
2 2 10 2
(0010) α = α (1110) α = 1 + α + α
(0001) α 3= α3 (0111) α 11= α + α 2 + α3
(1100) α 4= 1 + α (1111) α 12= 1 + α + α 2 + α3
5 2 13
(0110) α = α + α (1011) α = 1 + α 2 + α3
(0011) α 6= α2 + α3 (1001) α 14= 1 + α3
7 3 15 0
(1101) α = 1 + α + α (1000) α (α )=1

It should be noted that x4 + x + 1 is a factor of x15 + 1 as well as that a is the


root of the both polynomials. Therefore,
 15 
x þ 1 ¼ 0 ) a15 þ 1 ¼ 0 ) a15 ¼ 1;

and the last row in the table is added to show the cyclic structure only.
The addition in the Galois field GF(pn) can be easily carried out by adding the
corresponding coefficients in GF(p). On the other hand, the multiplication is more
complicated. After multiplying the polynomials (residue classes) the obtained result
should be divided by the corresponding primitive polynomial and the remainder taken
as a final result. The representation of the field elements by the powers of the primitive
324 6 Cyclic Codes

element will ease the task. For the preceding example, when multiplying {x2 +
x} = a5 by {x3 + x2 + x} = a11, the result is a5a11 = a16 = a15a = 1a = a (={x}).
The addition using primitive element is also possible by introducing Zech logarithms.
The next theorems are also important:
1. All the pn − 1 nonzero elements of GF(pn) are roots of the polynomial
n
xp 1  1. The element 0 can be included by considering the polynomial
n n
ðx  0Þðxp 1  1Þ ¼ xp  x:
2. If m(x) is the minimal polynomial of degree r over GF(p) of an element b2GF
(pn), then m(x) is also minimal polynomial of bp.
In fact, m elements
2 r1
b; bp ; bp ;    ; bp

are all the roots of m(x). These elements are, by analogy with the fields of real and
complex numbers, called conjugates. All conjugates have the same order. Let
e denotes the order of b, i.e. be = 1. Then
 i
e j j j
bp ¼ bep ¼ ðbe Þp ¼ 1p ¼ 1:

r
There are no the others roots of m(x), because b is also the root of xp 1  1, i.e.
r r
 r1 p r r
bp 1  1 ¼ 0 ) bp 1 ¼ 1: Therefore, bp ¼ bp ¼ b  bp 1 ¼ b  1 ¼ b: Let
remind that the coefficients of m(x) are from the ground field GF(p).
Let a be a primitive element in a Galois field, then all the conjugates are also the
primitive elements because they all have the same order.
Example Find the minimal polynomial for a2GF(24) over GF(2). The conjugates
are

a; a2 ; a4 ; a8 ða16 ¼ a15 a ¼ 1a ¼ aÞ;


therefore (in GF(2) −a = a)

m1 ð xÞ ¼ ðx þ aÞðx þ a2 Þðx þ a4 Þðx þ a8 Þ ¼ . . . ¼ x4 þ x þ 1;

where the corresponding table shown earlier is used.


The verification is as follows (still using the same table)

m1 ðaÞ ¼ a4 þ a þ 1 ¼ 1 þ a þ a þ 1 ¼ 0
m1 ða2 Þ ¼ a8 þ a2 þ 1 ¼ 1 þ a2 þ a þ 1 ¼ 0
m1 ða4 Þ ¼ a16 þ a4 þ 1 ¼ a þ 1 þ a þ 1 ¼ 0 ða15 ¼ 1 ) a16 ¼ aÞ
m1 ða8 Þ ¼ a32 þ a8 þ 1 ¼ a2 þ 1 þ a2 þ 1 ¼ 0 ða30 ¼ 1 ) a32 ¼ a2 Þ:
Brief Introduction to Algebra II 325

The minimal polynomial for a3 is

m2 ðxÞ ¼ ðx þ a3 Þðx þ a6 Þðx þ a12 Þðx þ a24 Þ ¼ ðx þ a3 Þðx þ a6 Þðx þ a12 Þðx þ a9 Þ
¼ x4 þ x3 þ x2 þ x þ 1:

In a similar way the other minimal polynomials are obtained

m3 ð xÞ ¼ x4 þ x3 þ 1ðfor a7 ; a14 ; a28 ¼ a13 ; a56 ¼ a11 Þ


m4 ð xÞ ¼ x2 þ x þ 1ðfor a5 ; a10 Þ
m5 ð xÞ ¼ x þ 1ðfor a0 ¼ 1Þ:

According the above cited theorem


    
x15 þ 1 ¼ ðx þ 1Þ x4 þ x þ 1 x4 þ x3 þ x2 þ x þ 1 x4 þ x3 þ 1 x2 þ x þ 1

and all the nonzero elements GF(24) are the roots of x15 + 1 [a0 = 1 is the element
both of GF(2) and GF(24)].
It should be noted that a7 and its conjugates are also the primitive elements:

a7 ; ða7 Þ2 ¼ a14 ; ða7 Þ3 ¼ a21 ¼ a6 ; ða7 Þ4 ¼ a28 ¼ a13 ; a5 ; a12 ; . . .; ða7 Þ14 ¼ a98
¼ a8 ; ða7 Þ15 ¼ a105 ¼ a0 ¼ 1:

Therefore, the GF(24) can be obtained also modulo x4 + x3 + 1 [starting form


a (= b)]. The obtained table will appear different, but the results of addition and
7

multiplication of field elements will be always the same. As stated earlier, all finite
fields of the same order are isomorphic. When forming the new table only the field
elements are denoted by different symbols.
It should be noted also that conjugate non-primitive elements form a corre-
sponding subgroup. The multiplicative group in GF(24) has the following sub-
groups (the order of the subgroup is divisor of the order of the group—
15 = 1  3  5):

a0 ¼ 1
a3 ; a6 ; a9 ; a12 ; a15 ¼ a0 ¼ 1
a5 ; a10 ; a15 ¼ a0 ¼ 1:
Chapter 7
Convolutional Codes and Viterbi
Algorithm

Brief Theoretical Overview

Parallelly to the block codes, the convolutional codes are the second important
family of error correcting codes. They are the most important subset of so called
tree codes, where some additional limitations are imposed (finite memory order,
time invariance, linearity). They were first introduced in the “systematic” form by
Elias in 1955 [46] as an alternative to block codes.
During block encoding, a block of k information symbols is represented by a
code word of length n. In the case of systematic code, information symbols are not
changed and n − k symbols are added. If information bits (symbols) are statistically
independent the obtained code words will be statistically independent as well. From
theory, it is known that good results can be obtained if the code words are relatively
long. On the other hand, for a longer code word encoders, and especially decoders,
are more and more complex. During convolutional encoding, to obtain n channel
bits, parity checks added do not depend on the considered k information bits only,
but as well on m previous k-tuples of information bits. Therefore, the statistical
dependence is introduced not for n channel bits, but for (m + 1)n channel bits.
Otherwise speaking, if the channel bits are observed using “window” of length
(m + 1)n, the statistical dependence will be found. In the next step, the window
“slides” for n bits, and the observed bits are statistically dependent as well. The
encoding procedure can be interpreted as follows. The input signal (bits) consisting
of blocks of k bits enters into the linear system whose impulse response is m + 1
blocks “long”. At its output, the blocks of n symbols are generated and the output
signal is the convolution of the input signal and the system impulse response. The
name convolutional code originates from this interpretation. In such way, and for
smaller values of k and n, the longer “code words” can be obtained, i.e. one would
expect that good performance will be obtained. Often, the codes with k = 1 are
used, i.e. with code rate R = 1/n. The convolutional codes can be systematic as well
as nonsystematic ones.

© Springer International Publishing AG 2017 327


P. Ivaniš and D. Drajić, Information Theory and Coding - Solved Problems,
DOI 10.1007/978-3-319-49370-1_7
328 7 Convolutional Codes and Viterbi Algorithm

Generally, the convolutional encoding can be accomplished according to the


block scheme shown in Fig. 7.1. Convolutional codes can be generated using more
than two symbols, but in the following only the name “bit” will be used, although at
every place the name “symbol” can be put. Information bits (k bits) from buffer enter
parallelly into the memory keeping in the same time mk previous information bits.
The memory consists of k shift registers, of the length m. New k bits enter into the
block logic as well. In this block, according the chosen procedure, n bits are formed,
written parallelly into the output register. From register, they are sent serially into the
channel. For a systematic encoding, k current information bits enter unchanged into
output register. Convolutional codes are usually denoted with (n, k, m), or only the
code rate is given (R = k/n). It should be mentioned that at the beginning of the
encoding the zeros are in the memory and in the output register.
Sometimes, some number of “zeros” (or some other sequence) is added after the
information bits to reset the encoder. In this case, an effective code rate can be
defined as

kL
Reff ¼ \R;
nðL þ m  1Þ

where kL is a total number of encoded information bits.


Important parameter as well is the constraint length m = (m + 1)n. There are
more definitions [27]. For (n, k, m) convolutional code some authors define it as

v ¼ nðm þ 1Þ

BUFFER MEMORY k

LOGIC

n bits

Fig. 7.1 General block scheme of a convolutional encoder


Brief Theoretical Overview 329

where m is length of the longest shift register in memory (there is no need for all
shift registers to be the same length). It can be interpreted as the maximum number
of encoder output bits that can be affected by a single information bit.
As said previously, if the incoming information bits are not changed during
encoding, a systematic code is obtained. In Fig. 7.2 a nonsystematic convolutional
encoder is shown. If there is not a feedback in the encoder, it is not recursive (as in
Fig. 7.2) (Problems 7.3, 7.8). The parameters are k = 1, n = 2, m = 2, R = 1/2 and
m = 6. The encoder input is denoted as x and outputs as y1 and y2.
The outputs are obtained by adding the bit which is at the encoder input and bits
at the corresponding cells of shift register. They are denoted as s1 , s2 being the
components of state vector, i.e. the encoder state is S ¼ ½s1 s2 . This encoder has
four states corresponding to dibits or to its decimal equivalents (00 = 0, 01 = 1,
10 = 2, 11 = 3). Convolutional encoder can be considered as a finite state auto-
mate. Its functioning can be represented by table or by state diagram. If the state
vector components in the next moment are denoted as s01 i s02 , the corresponding
equations are

s01 ¼ x; s02 ¼ s1 ; y1 ¼ x  s1  s2 ; y2 ¼ x  s2 :

on the basis of which Table 7.1 is obtained.

y1
x ∈ {0,1} (1)
s1 s2
(2)
y2

Fig. 7.2 Convolutional encoder for (2,1,2) nonsystematic code

Table 7.1 The functioning x s1 s2 s01 s02 y1 y2


of encoder from Fig. 7.2
0 0 0 0 0 0 0
1 0 0 1 0 1 1
0 0 1 0 0 1 1
1 0 1 1 0 0 0
0 1 0 0 1 1 0
1 1 0 1 1 0 1
0 1 1 0 1 0 1
1 1 1 1 1 1 0
330 7 Convolutional Codes and Viterbi Algorithm

Fig. 7.3 State diagram for 0/00


encoder from Fig. 7.2

1/11
00 0/11

1/00
10 01
0/10

1/01 0/01
11

1/10

The corresponding state diagram (Problems 7.1–7.3) is shown in Fig. 7.3 (state
diagram is explained in Chap. 2). The same encoder is used in Problem 7.3, but
there Viterbi algorithm is illustrated.
Along the branches in state diagram, the bit entering the encoder is denoted as
well as the bits at the encoder output. Historically, for the first description of
convolutional encoder the code tree was used (code tree is explained in Chap. 2). In
Fig. 7.4 a corresponding code tree for the considered example is shown, and the
path (heavy lines) corresponding to sequence 1010 at the encoder input (when the
encoder is reset). The corresponding states sequence is 00-10-01-10-01-…, while at
the encoder output, the sequence is 11100010…. Therefore, to every input (and
output) sequence corresponds a unique path in the state diagram.
The main drawback of the code tree is that the number of nodes grows expo-
nentially. To follow the system “behavior” some kind of “dynamic” state diagram
can be used—trellis (explained as well in Chap. 2). The corresponding trellis is
shown in Fig. 7.5.
There exist the other useful descriptions of convolutional encoders. Consider
firstly the polynomial description (Problem 7.1). Every memory cell is a part of
delay circuit. By introducing delay operator D (“delay”) or z−1, denoting one shift
delay, the following can be written

s1 ¼ Dx; s2 ¼ Ds1 ;
y1 ¼ x  s1  s2 ¼ xð1 þ D þ D2 Þ;
y2 ¼ x  s2 ¼ xð1 þ D2 Þ:

Therefore, the encoder can be completely described using two generator poly-
nomials [47]
Brief Theoretical Overview 331

00
a
Stanja: 00
a
a=00 11
b=01 c
c=10 00
a
d=11 10
b
11
c
01 d
00
a
11 a
10
b
00
c
11
c
01
b
01 d
10
x =0 d

a
00
a
x =1
11 a
11
c
10 b
10 b
00
c
01
d
11
c=(10)
11
a
01
b
00
c
01
d
01
b
10
d
10
d

Fig. 7.4 The code tree for a considered example


332 7 Convolutional Codes and Viterbi Algorithm

0/00 0/00 0/00


00
1/11
1/11
0/11
01 1/11
1/00
0/10
0/10

10
1/01

1/01 0/01

11
1/10

Fig. 7.5 Trellis for the considered example

g1 ðDÞ ¼ 1 þ D þ D2 ; g2 ðDÞ ¼ 1 þ D2 ;

and the output sequence can be obtained by multiplying polynomial corresponding


to the encoder input (information polynomial) and generator polynomials. These
polynomials can be put together as a transfer function matrix (Problem 7.1)

GðDÞ ¼ ½g1 ðDÞ; g2 ðDÞ ¼ ½1 þ D þ D2 ; 1 þ D2 

describing completely the convolutional encoder structure.


Consider the following example. Let the input of the encoder shown in Fig. 7.2
is bit sequence x = 101. The corresponding information polynomial (the most
significant coefficient corresponds to the last bit) is

xðDÞ ¼ 1  D0 þ 0  D1 þ 1  D2 ¼ 1 þ D2 :

By multiplying the polynomials, the following is obtained

y1 ðDÞ ¼ xðDÞg1 ðDÞ ¼ ð1 þ D2 Þð1 þ D þ D2 Þ ¼ 1  D0 þ 1  D1 þ 0  D2 þ 1  D3 þ 1  D4


y2 ðDÞ ¼ xðDÞg2 ðDÞ ¼ ð1 þ D2 Þð1 þ D2 Þ ¼ 1  D0 þ 0  D1 þ 0  D2 þ 0  D3 þ 1  D4 ;

and the corresponding output sequence, after multiplexing the obtained sequences is

y ¼ 11 10 00 10 11:

In the previous example the information sequence is transformed into polyno-


mial, further multiplied by generator polynomials and later again transformed back
into the output sequence. This procedure can be simplified if instead of polyno-
mials, the corresponding vectors are used, in this case
Brief Theoretical Overview 333

g1 ¼ ð111Þ; g2 ¼ ð101Þ:

In the literature, sometimes octal equivalent is used. Sets of three-bit (from left
to right) are written using octal symbols. If the number of bits is not divisible by
three, at the sequence beginning the corresponding number of zeros is added (some
authors put the zeros at the sequence end!). In the considered example, generator
matrix is GOKT = [7, 5].
The “impulse response” can be used as well. For the considered example for a
single 1 at the input, at the first output 111 will be obtained, and at the second one 101.
Generally, for the input sequence

i ¼ ði0 ; i1 ; . . .; il ; . . .Þ;

the corresponding output sequences (encoder with two outputs) are the corre-
sponding convolutions
   
ð1Þ ð1Þ ð1Þ ð2Þ ð2Þ ð2Þ
vð1Þ ¼ v0 ; v1 ; . . .; vl ; . . . ; vð2Þ ¼ v0 ; v1 ; . . .; vl ; . . . :

By introducing vectors according to encoder configuration, the output sequences are

vð1Þ ¼ i  gð1Þ ; vð2Þ ¼ i  gð2Þ :

For the previous example (Fig. 7.2) for input sequence x = 101, the output
sequence is y = 1110001011:

Input i Output (in time)

1 11 10 11
0 00 00 00
1 11 10 11
Sum: 11 10 00 10 11

It should be noted as well that shift registers are not uniformly drawn in liter-
ature. For example, two encoders shown in Fig. 7.6 are equivalent.

+ +

+
+

Fig. 7.6 Different drawings of the same encoder


334 7 Convolutional Codes and Viterbi Algorithm

Now, the decoding problem will be considered. The following procedures are of
interest:
– majority logic decoding;
– sequential decoding;
– Viterbi algorithm.
It should be mentioned that majority logic decoding and Viterbi algorithm can be
used for block codes decoding as well (Chaps. 8 and 9). Majority logic decoding
and sequential decoding are specific for error control, while Viterbi decoding is
universal. Viterbi decoding and sequential decoding use trellis.
Majority logic decoding is a special case of threshold decoding elaborated in
details by Massey [48] having in mind convolutional encoding. However, this
approach was used earlier, especially for Reed-Muller codes [49, 50] (Chap. 5).
Hard decision and soft decision can be applied as well. Threshold decoding can be
applied for some codes classes only. Further, majority logic decoding is possible
only if hard decision is used.
Sequential decoding can be used for a very long constraint lengths. It was
proposed by Wozencraft [51]. Fano [52] proposed a new algorithm now known as
Fano algorithm. Later, Zigangirov [53] and Jelinek [54] (separately) proposed a
new version of decoding procedure called now stack algorithm. The idea of
sequential decoding is very simple. Decoder slides over only one path in trellis
measuring distance (hard or soft) of the received sequence and this path. If it is
“satisfying”, the received sequence is decoded and the process starts anew. If the
distance grows up over some threshold, decoder returns to some node and starts
considering the other branch. Threshold should be adjusted during the decoding.
This adjusting and mode of sliding form in fact the corresponding algorithm.
Convolutional codes and sequential decoding were used in some space missions
(Pioneer Program, Voyager Program, Mars Pathfinder).
However, the Viterbi algorithm is the most interesting and will be exclusively
considered and explained in details in this chapter. Viterbi proposed his algorithm
more as a pedagogical device, but it has found many applications. Here the
important code characteristic is called free distance (dfree) (Problem 7.2). It is a
minimum weight of encoded sequence whose first information block is nonzero.
Viterbi algorithm is a maximum likelihood decoding. It moves through the trellis
keeping only the best path (survivor) (Problem 7.3) to each state. It means that the
corresponding metric is used—usually Hamming distance (Problems 7.3–7.6) or
Euclidean (squared) distance (Problems 7.5–7.7). The signal used for metric cal-
culation can be quantized. If there are only two levels it results in a Hamming
distance (hard decision), otherwise the Euclidean metric is used (soft decision).
The state diagram can be modified by splitting the initial state into two states
(Problem 7.2) to find a modified transfer (generator) function, from which the
corresponding weight spectrum can be found. It can be used to find the bounds of
residual error probability. Consider convolutional encoder shown in Fig. 7.2, which
has transfer function matrix G(D) = [g1(D), g2(D)] = [1 þ D þ D2 , 1 þ D2 ].
Brief Theoretical Overview 335

(a) (b)
0/00 Sout Sin

00 LND 2 LD 2
1/11
0/11
LN
1/00
S2 S1
10 01
LD
0/10

1/01
LND LD
0/01
11 S3

1/10 LND

Fig. 7.7 Original (a) and modified (b) state diagram of the considered encoder

To find a modified transfer (generator) function, the corresponding state diagram is


obtained by splitting the initial state into two (zero) states—from one it is possible
only to start (Sout) and to other where it is possible only to enter (Sin). The system
cannot stay in this state(s). In. Fig. 7.7 original (a) and modified (b) state diagram of
the encoder are shown.
The marks on the branches of a modified state diagram are chosen as follows:
every branch having n units at the input is marked with Nn, every branch having
d units at the output is marked with Dd. One time interval corresponds to the
transmission of one information bit, is symbolically denoted by L. These marks are
multiplied yielding the corresponding system of equations for modified state
diagram

S2 ¼ Sout LND2 þ S1 NL; S1 ¼ S2 LD þ S3 DL;


S3 ¼ S2 LND þ S3 LND; Sin ¼ S1 LD2 ;

By eliminating S1, S2 and S3, the corresponding relation between two zero states
is obtained

LD LND2 ð1  LNDÞ
Sin ¼ D2 L Sout ;
1  LND 1  NLD  NL2 D

yielding a modified transfer (generator) function

Sin L3 ND5
TðL; N; DÞ ¼ ¼ :
Sout 1  NLDð1 þ LÞ
336 7 Convolutional Codes and Viterbi Algorithm

By using identity

1 X1
¼ xi ¼ 1 þ x þ x2 þ x3 þ x4 þ   
1x i¼0

modified generator function can be represented as an infinite series


h i
TðL; N; DÞ ¼ L3 ND5 1 þ NLDð1 þ LÞ þ N 2 L2 D2 ð1 þ LÞ2 þ . . .
¼ L3 ND5 þ L4 N 2 D6 þ L5 N 2 D6 þ L5 N 3 D7 þ 2L6 N 3 D7 þ L7 N 3 D7 þ   

where the element m Ll N n Dd shows that there are m weight paths, d formed for
l entering bits, from which in this entering sequence there are n ones.
Denoting with ad,n the number of paths having weight d, generated by an input
of the weight n, a modified generating function for L = 1 can be written in a form

X
1 X
1
TðN; DÞ ¼ ad;n N n Dd ;
d¼dfree n¼1

where the total number of input ones on the paths having a weight d, denoted by cd,
can be found from

@TðD; NÞ X1 X1 X1 X1
jN¼1 ¼ nad;n Dd ¼ cd Dd ) cd ¼ nad;n :
@N d¼d n¼1 d¼d n¼1
free free

Trellis corresponding to the considered encoder is shown in Fig. 7.8, while the
weight spectrum is given in Table 7.2.
The first nonzero component is obtained for d = 5, therefore dfree = 5. It cor-
responds as well to the above obtained infinite series where the first nonzero term is

L3N1D5 L4N2D6 L5N2D6


0
0/11 0/11 0/11 L5N3D7
1/11
1
1/00 1/00
0/10
1/00 0/10
0/10
2

0/01 1/01
1/01 0/01 0/01 1/01
3
1/10 1/10 1/10

Fig. 7.8 Trellis corresponding to the considered encoder


Brief Theoretical Overview 337

Table 7.2 Weight spectrum of the considered encoder


d 1 2 3 4 5 6 7 8 9 10 11 12 13 14
ad 0 0 0 0 1 2 4 8 16 32 64 128 256 512
cd 0 0 0 0 1 4 12 32 80 192 448 1024 2304 5120

Fig. 7.9 Illustration of 11 10 11


decision error
00 00 00 5
0
0
11 11

1
10

L3 ND5 . In this code k = 1, minimum distance after the first (i + 1) information bits
(di—i = 0,1,2,…) is d0 = 0, d1 = 0, d2 = 5, d3 = 5,…. Therefore dm = d2 = 5, and
the code can correct up to ec = (dfree–1)/2 = 2 errors on one constraint length
(6 bits).
The next question is what will happen if the number of errors is greater than code
correcting capability. If there are no errors at further received bits, the decoder will
return to the correct path. The path part comprising errors is called “error event”.
However, some bits during the error event will be correctly decoded. The standard
measure of error correcting capability is BER (“Bit Error Rate”).
Performance of convolutional code can be found using the weight spectrum. For
the above example suppose that the encoder emits all zeros sequence and that the
error vector transforms this sequence into some possible sequence. Let this
sequence is 111011. According to Fig. 7.9 this sequence is valid. Generally, the
decision error will appear if three or more bits of the received sequence differ from
the emitted sequence. It should be noted that the error at the fourth bit (0 for both
paths) does not affect the decision. Because of that the summing is up to d = 5.
For BSC with crossover probability p, the probability to choose the path with
Hamming weight d = 5 (denoted as P5)

5  
X 5
P5 ¼ pe ð1  pÞ5e
e¼3
e
338 7 Convolutional Codes and Viterbi Algorithm

For d = 6, 3 bit errors will yield the same probability for right and wrong
decision, while 4 and more bit errors will always result in wrong decision yielding
  X6  
1 6 3 6 e
P6 ¼ p ð1  pÞ3 þ p ð1  pÞ6e :
2 3 e¼4
e

Generally, path error event probability for path having the weight d is
8  
> Pd d e
>
< e¼ðd þ 1Þ=2 p ð1  pÞde ; d neparno
  e  
Pd ¼ Pd :
>
> d d e
:21 d=2 d=2
p ð1  pÞ þ e¼d=2 þ 1 p ð1  pÞde ; d parno
d=2 e

This probability can be limited from the upper side (for both even and odd
d values) [55]

X
d   X
d  
d e d d=2
Pd ¼ p ð1  pÞde \ p ð1  pÞd=2 ¼ 2d pd=2 ð1  pÞd=2 :
e e
e¼ðd þ1Þ=2 e¼ðd þ1Þ=2

Taking into account the number of paths which have the weight d(ad), the
average probability for wrong path choice does not depend on the node where the
decision is made, being

X
1 X
1 pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi d 

Pe  ad P d \ ad  ½2 ð1  pÞp ¼ TðDÞD¼2pffiffiffiffiffiffiffiffiffiffi
ð1pÞp
ffi:
d¼dfree d¼dfree

Having in mind that p is usually a small number (p 0.5), previous sum can be
calculated on the basis of its first term as follows:

Pe \adfree 2dfree ½pð1  pÞdfree =2 adfree 2dfree pdfree =2

This analysis can used for BER calculation. In fact, for every wrong decision, the
number of bits in error equals the number of ones at the wrong part. Denoting this
number with cd (the total number of ones at all sequences with weight d) and
denoting with k the number of information bits entering the encoder in one time
unit, BER is upper-limited as follows

1 X 1
Pb \ cd Pd :
k d¼d
free
Brief Theoretical Overview 339

By using the above expression for Pd, this expression can be further simplified

1 X 1
1 X 1 h pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiid 1 @TðD; NÞ
Pb \ cd Pd ¼ cd 2 pð1  pÞ ¼ jN¼1; D¼2pffiffiffiffiffiffiffiffiffiffi

k d¼d k d¼d k @D pð1pÞ
free free

and if it is supposed that the first term in the sum is dominant, a simple expression
for hard decision Viterbi algorithm is obtained [25] (Problem 7.5)

1
Pb \ cdfree 2dfree pdfree =2 :
k

For the above considered example dfree = 5, adfree ¼ 1 and cdfree ¼ 1, and for
p = 10−2 and k = 1 it is obtained

Pe Pb \25 p5=2 ¼ 3:2  104 :

Consider the system using BPSK and coherent demodulation with binary
quantization of demodulator output modeled as BSC. Denoting by Eb/N0 the ratio
of energy per information bit and average power density spectrum, error probability
without encoding is
rffiffiffiffiffiffi
1 Eb 1 Eb =N0
p ¼ erfcð Þ e ;
2 N0 2

for hard decision Viterbi decoding (k = 1) this probability is

1
Pb \ cdfree 2dfree eðdf ree =2ÞðEb =N0 Þ :
2

By comparing these two expressions the asymptotic coding gain is obtained (R is


the code rate)

gtvrdo ¼ 10  log10 ðRdfree =2Þ:

For soft decision Viterbi algorithm the following is obtained [25]


rffiffiffiffiffiffiffiffiffiffiffiffiffiffi 
1 X 1
Eb 
Pe  ad erfcð Rc d Þ TðDÞD¼1=2eREb =N0 ;
2 d¼d N0
free

and
rffiffiffiffiffiffiffiffiffiffiffiffiffiffi
1 X 1
Eb 1 @TðD; NÞ
Pb  cd erfcð Rc d Þ jN¼1;D¼1=2eREb =N0 :
2 d¼d N0 k @D
free
340 7 Convolutional Codes and Viterbi Algorithm

The following approximation can be used (Problem 7.5)


rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ad Eb ad Rc dfree NEb
Pd  erfcð Rc dfree Þ e 0
2 N0 2
rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
cd Eb cd Rc dfree NEb
Pb  erfcð Rc dfree Þ e 0:
2 N0 2

Asymptotic coding gain in this case is

gmeko ¼ 10  log10 ðRdfree Þ:

If the initial encoder state is not known (Problem 7.6), the algorithm starts from
all possible states. If the synchronization is lost, decoder and encoder can be
resynchronized after some steps, if there were no channel errors. One of drawbacks
of Viterbi algorithm is variable rate (Problem 7.4) of the decoded bits at the decoder
output. It can be overcomed by so called fixed delay (Problem 7.6). The decoding is
performed until some trellis depth (defined by fixed delay) and then the decisions are
made. An interesting class are punctured convolutional codes (Problem 7.7), where
some output bits of the originating code are omitted (“punctured”). It can result in the
encoder having the same code rate and better error correcting capability.
There are two ways to approach the limiting performance of digital communi-
cation systems. In the case when the signal-to-noise ratio is limited (i.e. where the
average signal power is limited), the performance can be improved by using binary
error control coding including the adequate frequency band broadening. When the
frequency band is limited, performance can be improved using multilevel signaling
and the corresponding more powerful transmitter. For the last case an interesting
procedure is TCM (Trellis Coded Modulation) (Problem 7.8) being a combination of
convolutional encoder and modulator whose parameters are simultaneously opti-
mized (e.g. the mapping information bit combinations into a modulated signal
levels). For decoding the Euclidean metric is used. The first mention of this possi-
bility was in 1974 by Massey [56], while Ungerboeck later, starting from 1976,
elaborated the idea in many articles [57–60]. Coding and modulation are jointed here.
The next step is the substitution of code symbols by the channel signals. In
Fig. 7.10 some one-dimensional and two-dimensional signal constellations are
shown. One dimension corresponds to the baseband transmission, but two
dimensions (complex numbers) corresponds to modulation (multilevel PSK,
QAM). To compare the efficiency of coding schemes the notion of asymptotic
coding gain is often used. For soft Viterbi algorithm decoding it is (Problem 7.8)
 
dfree
Ga ¼ 20  log ½dB;
dref

where dfree is free Euclidean distance and dref is minimal Euclidean distance without
the error control coding (for the same signal-to-noise ratio).
Brief Theoretical Overview 341

One-dimensional constellations

binary
-1 0

quaternary
-3 -1 0 1
eight levels

-7 -5 -3 -1 0 1 3 5 7

Two-dimensional constellations
QPSK (4QAM rotated for 90o) 8PSK 16QAM

Fig. 7.10 Some one-dimensional and two-dimensional signal constellations

Fig. 7.11 Constellation and 00 (e jπ/2)


“trellis” for QPSK (without 00
error control coding)

01
d1
10 01

10

11 11

Consider now digital communication system using QPSK where every level
(carrier phase) corresponds to two information bits. Let the error probability is
higher than allowed. What to do? One way is to use the higher transmission power.
However, there are also some other ways. One is to use convolutional code R = 2/3,
with the same power. Error probability will be smaller, but the information rate will
be smaller as well. The other way is to use 8PSK and the same convolutional code
without changing the power nor bandwidth. In Fig. 7.11 the constellation for QPSK
signal (without error coding) is shown and the corresponding “degenerated trellis”.
Gray code is used. There is no any memory in the signal and all four branches enter
into the same node. Error probability is determined by referent (minimal) distance
pffiffiffi
d1 ¼ 2.
342 7 Convolutional Codes and Viterbi Algorithm

Let augment the number of levels (phases) to eight. Every phase corresponds to
three bits. In Fig. 7.12 the corresponding constellation is shown with the corre-
sponding distances denoted.
Consider now 8PSK and convolutional coding R = 2/3. Two equivalent encoder
configurations are shown in Fig. 7.13 and the part of the corresponding trellis. The
encoder is systematic (easily seen from the second variant). From every node four
branches (paths) are going out (corresponding to different information bits combi-
nations—last two bits in tribit) and four paths enter into every node. It should be
noticed that every nonsystematic convolutional encoder [if it is not catastrophic
(Problem 7.4)] can be made equivalent to a systematic encoder, if feedback is allowed.
The corresponding TCM modulator is shown in Fig. 7.14.
By comparing the paths going from node (0) to node (0) one can find that
the distance of other paths from the all zeros path, over the node (1) is
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
dð1Þ ¼ d02 þ d12 , over the node (2) dð2Þ ¼ d22 þ d32 and over the node
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
(3) dð3Þ ¼ d32 þ d02 . Therefore, the free distance is dfree ¼ dð1Þ ¼ d02 þ d12 , and
the code gain is
pffiffiffi
d02 þ d12 2  2þ2
Ga ¼ 10  log ¼ 10  log ¼ 1; 1 dB:
d12 2

This gain is modest, but it can be increased by optimal constellation mapping


and by choice of optimal convolutional encoder for a given constellation.
If during Viterbi decoding some errors occur, they can propagate. The catas-
trophic error propagation can occur as well, but only if the corresponding

000(ejπ/2)
d0

100 001(ejπ/4)

d1
011(ej0)

101
d2
d3

111 010

110

Fig. 7.12 8PSK signal constellation, bit combinations and Euclidean distances.
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
pffiffiffi pffiffiffi pffiffiffiffiffiffiffiffiffiffiffiffiffiffi
pffiffiffi
(d0 ¼ 2  2; d1 ¼ 2; d2 ¼ 2 þ 2; d3 ¼ 2)
Brief Theoretical Overview 343

+
+ +
+
+

(0) 000 000


001
011
111

110
110
(1) 011
010

100 101

101

(2) 110
111

Fig. 7.13 Two equivalent scheme of convolutional encoder and part of trellis

ej0
ENCODER
ejπ/4
Inf.
bits
SIGNAL ejπ/2
SELECTOR .
.
.

Fig. 7.14 TCM modulator

generator polynomials are not suitably chosen, i.e. if they have the common factors.
At the end of convolutional decoding a few bits non carrying information (tail bits)
(Problem 7.8) are added to provide the trellis termination in the initial state.
However, this number is very small comparing to the number of bits usually
transferred in one file. Convolutional codes are practically used for error correction
only. They can be used as well to correct bursts of errors. Block codes decoding by
using trellis will be considered in the next chapter.
344 7 Convolutional Codes and Viterbi Algorithm

Problems

Problem 7.1 The convolutional encoder is shown in Fig. 7.15.


Write polynomial and matrix description of the encoder, find the outputs and the
next states corresponding to all possible initial states and to all inputs.
Solution
Polynomial form describes how the inputs (and their delayed values as well) yield
the encoder outputs

y1 ¼ x1  s1  s2 ¼ ð1 þ D þ D2 Þx1 ;
y2 ¼ x1  s3 ¼ x1 þ Dx2 ;
y3 ¼ x2 ¼ x2 ;

also described in a matrix form (transfer function matrix), having as the elements
polynomials, binary or octal numbers [61]

1 þ D þ D2 1 0
GPOL ðDÞ ¼ ) ðy1 y2 y3 Þ ¼ ðx1 x2 Þ  GPOL ;
0 D 1
  
111 1 0 111 001 000 7 1 0
GBIN ¼ ¼ ) GOCT ¼ ;
0 10 1 000 010 001 0 2 1

where, in binary form a corresponding number of zeros should be added from the
left (or from right, it is a matter of convention which is not unique in the literature)
and after that every three-bit is transformed into a corresponding octal number.

y1

x1
s1 s2

y2

x2
s3

y3

Fig. 7.15 The convolutional encoder structure


Problems 345

Corresponding trellis parameters are determined directly from the encoder structure:
– the encoder has in total m = 3 delay cells, the number of states is 2m = 8;
– the encoder has in total k = 2 inputs (number of rows in GOCT), number of
branches leaving every node is 2k = 4;
– the encoder has in total three outputs (number of columns in GOCT), number of
outgoing bits for every branch is n = 3.
Transitions into the next states are given by s01 = x1, s02 = s1, s03 = x2. The
encoder is described in details by Table 7.3, while the corresponding state diagram
is shown in Fig. 7.16. For a better insight the corresponding combinations of input
and output bits are omitted.
It can be shown that with the code memory increasing, its correcting capability
increases as well, the code rate remaining constant (the same number of inputs and
outputs), but with the number of delay cells the number of states grows expo-
nentially, resulting in more complicate decoding procedure.
On the other hand, all convolutional codes which have the same code rate and
the memory order have not the same error correction capability. A code quality
depends on the encoder structure and is primarily determined by a corresponding
free distance, as shown in the next problem.
Problem 7.2
(a) Find the modified generating function and the weight spectrum of the con-
volutional code defined by matrix G(D) = [1 + D, D];
(b) Draw a few first steps of a trellis diagram needed to find the weight spectrum
of the convolutional code defined by matrix G(D) = [1 + D+D2, 1].

Solution
(a) The encoder block scheme is shown in Fig. 7.17, and the corresponding
state diagram is shown in Fig. 7.18a. Modified state diagram is shown in
Fig. 7.18b.

Table 7.3 Description of encoder functioning


s1 s2 s3 x1 = 0, x2 = 0 x1 = 0, x2 = 1 x1 = 1, x2 = 0 x1 = 1, x2 = 1
s01 s02 s03 y1 y2 s01 s02 s03 y1 y2 s01 s02 s03 y1 y2 s01 s02 s03 y1 y2
0 0 0 0 0 0 000 0 0 1 001 1 0 0 110 1 0 1 111
0 0 1 0 0 0 010 0 0 1 011 1 0 0 100 1 0 1 101
0 1 0 0 0 0 100 0 0 1 101 1 0 0 010 1 0 1 011
0 1 1 0 0 0 110 0 0 1 111 1 0 0 000 1 0 1 001
1 0 0 0 1 0 100 0 1 1 101 1 1 0 010 1 1 1 011
1 0 1 0 1 0 110 0 1 1 111 1 1 0 000 1 1 1 001
1 1 0 0 1 0 000 0 1 1 001 1 1 0 110 1 1 1 111
1 1 1 0 1 0 010 0 1 1 011 1 1 0 100 1 1 1 101
346 7 Convolutional Codes and Viterbi Algorithm

000

001 111

010 110

011 101

100

Fig. 7.16 State diagram

y1

x
y2

Fig. 7.17 Convolutional encoder structure

(a) (b)
Sout Sin
0/11

0 1 LND LD2

1/10
S1
0/00 1/01

LND

Fig. 7.18 Original (a) and modified (b) state diagram of the encoder shown in Fig. 7.17
Problems 347

The marks on the branches of a modified state diagram are standard, every
branch that has n units at the input is marked with Nn, every branch that has
d units at the output is marked with Dd. One time interval corresponds to the
transmission of one information bit, it is symbolically denoted with L. The
corresponding system of equations is

S1 ¼ Sout LND þ S1 LND; Sin ¼ S1 LD2 ;

and it is easy to find a relation connecting the initial and the final zero state,
yielding a modified generation function of a code [47]

LND Sin L2 ND3


Sin ¼ LD2 Sout ) TðL; N; DÞ ¼ ¼ :
1  LND Sout 1  LND

As explained in the introductory part of this chapter, by using

1 X1
¼ xi ¼ 1 þ x þ x2 þ x3 þ x4 þ   
1x i¼0

the modified generating function can be represented as an infinite series

TðL; N; DÞ ¼ L2 ND3 1 þ LND þ L2 N 2 D2 þ L3 N 3 D3 . . .


¼ L2 ND3 þ L3 N 2 D4 þ L4 N 3 D5 þ L2 N 4 D6 þ   

where the element m Ll N n Dd shows that there are m weight paths, d formed
for l entering bits, from which in this entering sequence there are n ones.
Denoting with ad,n a number of paths having weight d, generated by an input
of the weight n, a modified generating function for L = 1 can be written in a
form [47]

X
1 X
1
TðN; DÞ ¼ ad;n N n Dd ;
d¼dfree n¼1

where the total number of input ones on the paths having a weight d, denoted
by cd, can be found from

@TðD; NÞ X1 X1 X1 X1
jN¼1 ¼ nad;n Dd ¼ cd D d ) cd ¼ nad;n ;
@N d¼d n¼1 d¼dfree n¼1
free

while, a simplified generating code function T(D) is


348 7 Convolutional Codes and Viterbi Algorithm

X
1
TðDÞ ¼ TðL; N; DÞjL¼N¼1 ¼ ad Dd ¼ D3 þ D4 þ D5 þ . . .
d¼dfree

Coefficients ad,n and cd determine weight spectrum components of the ana-


lyzed code. For a considered case these coefficients are given in Table 7.4, and
they can be determined as well from a modified trellis shown in Fig. 7.19. For
this encoder, there is one path for every weight of sequence at the encoder
output, where the number of bits in the input sequence is always for two
smaller than its length, therefore, ad = 1 and cd = d−2 for d 3. A free code
distance is here dfree = 3, it is relatively small, as well as a code capability to
correct the errors.
(b) Block scheme of the encoder is given in Fig. 7.20 and the corresponding state
diagram in Fig. 7.21a. Modified state diagram is shown in Fig. 7.21b.
From the trellis shown in Fig. 7.22 it can be noticed that there are two paths
with weight d = 4, four paths with weight d = 6 etc. Code free distance is
dfree = 4, and this code is better than the code analyzed in (a).

Table 7.4 Spectrum of a code which state diagram is shown in Fig. 7.19
d 1 2 3 4 5 6 7 8 …
ad 0 0 1 1 1 1 1 1 …
cd 0 0 1 2 3 4 5 6 …

L2N1D3 L3N2D4 L4N3D5 L5N4D6


0
1/10
0/11 0/11 0/11 0/11

1
1/01 1/01 1/01

Fig. 7.19 Modified trellis for finding weight spectrum for a code from (a)

Fig. 7.20 Convolutional y1


encoder structure

x1

y2
Problems 349

(a) (b)
0/00 Sout Sin

00 LND2 LD
1/11
0/10 LND
1/01
S2 S1
10 01
LD
0/10

LND L
1/01 0/00
S3
11

1/11 LND2

Fig. 7.21 Original (a) and modified (b) state diagram of the encoder shown in Fig. 7.20

L3N1D4 L4N2D4 L5N2D6 5 3 6 L6N3D6


0 L N D
0/10 0/10 0/10 L6N3D6
0/10
1/11
1 1/01
1/01
0/10
1/01 0/10
2 0/10

0/00
1/01 0/00 1/01 0/00 1/01 1/01
3
1/11 1/11 1/11 1/11

Fig. 7.22 Trellis paths determining a code spectrum

Problem 7.3 One digital communication system for error control coding use the
convolutional encoder and Viterbi decoding. A convolutional encoder has one
input, two outputs, defined by matrix

GPOL ðDÞ ¼ 1 þ D þ D2 ; 1 þ D2
The initial convolutional encoder state is 00.
(a) Draw the convolutional encoder block scheme, find the table describing the
encoder functioning and draw a state diagram.
(b) If the decoder input sequence is

00 11 10 11 11 0

decode the sequence using Viterbi algorithm with Hamming metric.


(c) What result would be obtained if decoding starts from the second received bit?
350 7 Convolutional Codes and Viterbi Algorithm

Solution
Using matrix GPOL(D) a convolutional encoder structure is firstly obtained, shown
in Fig. 7.23, where D denotes a delay corresponding to the duration of one infor-
mation symbol. This symbol is used because D flop-flops are usually used as delay
cells. There are m = 2 delay cells, a total number of states is 2m = 4 being deter-
mined by all combinations of cell delay outputs, denoted by s1 and s2. Number m is
usually called memory order, and m = (m + 1)n is a constraint length, showing a
number of output bits depending on one input bit. There is only one input and from
every state 2k = 2 paths start. The number of encoder outputs is n = 2, its code rate
is R = k/n = 1/2, meaning that a binary rate at the output is two times greater than
information rate at its input,
(a) If the series of encoder input bits is denoted by xn, states are determined by
previously emitted bits in this sequence, i.e.

x ¼ xn ; s1 ¼ xn1 ; s2 ¼ xn2

where s1 ; s2 define a current encoder state S ¼ ðs1 s2 Þ, they are usually denoted
by the corresponding decimal numbers (00 = 0, 01 = 1, 10 = 2, 11 = 3). On
the other hand, the next state S0 ¼ ðs01 s02 Þ and the output are determined by
encoder structure and by the previous state:

s01 ¼ x; s02 ¼ s1 ;
y1 ¼ x  s 1  s 2 ; y2 ¼ x  s 2 :

For all possible combinations of x, s1 and s2 values, using the above relations,
Table 7.5 is formed. Corresponding state diagram and trellis are shown in
Fig. 7.24a, b. The transitions corresponding to ‘0’ information bits are drawn
by dashed lines and those corresponding to ‘1’ information bits are drawn by
full lines. It can be noticed that from two paths going out of a considered state,
the branch going “up” (relatively to the other one) corresponds to ‘0’ infor-
mation bit and vice versa. This rule is always valid when there is not a

y1
(1)
D D
s1 s2
(2)
y2

Fig. 7.23 Convolutional encoder block scheme


Problems 351

Table 7.5 The encoder functioning


x s1 s2 s01 s02 y1 y2
0 0 0 0 0 00
0 0 1 0 0 11
0 1 0 0 1 10
0 1 1 0 1 01
1 0 0 1 0 11
1 0 1 1 0 00
1 1 0 1 1 01
1 1 1 1 1 10

(a) (b) 0/00


0/00 00 00
1/11
00 0/11
1/11 0/11
01 01
1/00 1/00
10 01
0/10
0/10 10 10

1/01 0/01 1/01


11
0/01

1/10 11 11
1/10

Fig. 7.24 State diagram (a) and trellis (b) of convolutional encoder from Fig. 7.23

feedback in the encoder (i.e. when the encoder is not recursive). In the fol-
lowing it will be shown that this feature makes the decoding process easier.
(b) On the basis of input information bits the encoder emits one of the possible
sequences, denoted by i. The input sequence uniquely determines the output
sequence c transmitted through the channel. If the sequence at the channel
output, i.e. at the decoder input (denoted by r), is known, the input encoder
sequence can be uniquely reconstructed. Therefore, at the receiving end, one
should estimate which of possible sequences was emitted resulting directly in
a decoding sequence.
The encoder functioning can be described by a semi-infinite tree starting from
one node and branching infinitely, and the decoding is equivalent to finding
the corresponding tree branch. To every output sequence corresponds a unique
path through a state diagram—trellis diagram. An example for a trellis cor-
responding to the considered convolutional encoder, if the initial state was 00
352 7 Convolutional Codes and Viterbi Algorithm

t=0 t=1 t=2 t=3


0/00 0/00 0/00
00
1/11
1/11
0/11
01 1/11
1/00
0/10
0/10

10
1/01

1/01 0/01

11
1/10

Fig. 7.25 Trellis corresponding to a sequence for the first three information bits at the encoder
input

(encoder was reset at the beginning) is shown in Fig. 7.25. The transient
regime is finished when in every of states the same number of branches enters
as started from the initial state at the depth t = 0. In the case of encoder having
one input, the transient regime duration equals the number of delay cells.
In the case of Viterbi decoder with Hamming metric the rules are as follows
[62]:
1. To every branch a branch metric is adjoined being equal to the number of
different outgoing bits from the branch relating to the received bits in this step.
2. To the nodes a metric is adjoined equal to the sum of the node metric of the
node from which the incoming branch starts and a branch metric of this branch.
In this problem in every node two branches are coming, resulting in two dif-
ferent metrics. From these two the branch is chosen which has a smaller metric,
the corresponding path is survived. The branch with a larger Hamming distance
is eliminated (overwritten) and erased from a trellis (the corresponding data are
erased from the memory). In such way the path with smaller Hamming distance
is always chosen.
If at some trellis depth both branches outgoing from some state (a trellis node)
are eliminated, then a branch (i.e. all the branches) entering into this state are
eliminated.
3. A unique path from depth t = 0 till depth t = N, determines N decoded bits,
having the same values as the first N bits at the convolutional encoder input, to
which corresponds this path.
4. If the Hamming distance for two paths is the same, any of them can be chosen.
However, to accelerate the decoding, usually one of the following criteria is
applied:
Problems 353

– a “more suitable” path is chosen, so as to eliminate a branch making “more


trouble”, i.e. the branch allowing the elimination of more branches at the
trellis.
– always the upper branch is chosen—although suboptimal, this approach
yields a good results if successively repeated many times.
For a considered case the decoder input sequence is:

00 11 10 11 11 01 01 11 00 11 01 10 1:

At the beginning, the encoder was reset, and decoding starts from a state 00 as
well. For every next step (for every new information bit entering the encoder) a
new state diagram is drawn, but only the states are connected where the tran-
sitions are possible in this step.
1. Step—transient regime
As shown in Fig. 7.26 and in Table 7.6, in this case the transition regime ends at
the depth t = 3, because in this step two branches are entering into the every state.
The possible sequences at the convolutional encoder output are compared to the
received sequence and for every pair of paths entering the same node that one is
chosen differing in smaller number of bits, i.e. having smaller Hamming distance
[for a depth t, it is denoted by d(t)]. Survived path goes from state 00 into the state
00. This change of states in convolutional encoder is caused by the input bit 0,
easily seen from the state diagram (encoder output is 00). In the case of correct
transmission encoder and decoder change the states simultaneously, and the
decoder performs an inverse operation, i.e. to the encoder input corresponds the
decoder output. Therefore, the decoder output is i1 = 0.

00 11 10
0/00 0/00 0/00 3
00
1/11 4
1/11
0/11
0
01 1/11
1/00 5
0/10
0/10 3
10
4
1/01
1/01 0/01
2
11
1/10 3

Fig. 7.26 Decoding after the transient regime


354 7 Convolutional Codes and Viterbi Algorithm

Table 7.6 Detailed data concerning decoding during the transient regime
Corresponding Decoder input/ d(3)
path encoder output
Received sequence 00 11 10
Possible paths, corresponding 0!0 0-0-0-0 00 00 00 3
emitted bits at the convolutional 0-2-1-0 11 10 11 4
encoder output and Hamming
0!1 0-0-2-1 00 11 10 0
distance to the received sequence
0-2-3-1 11 01 01 5
0!2 0-0-0-2 00 00 11 3
0-2-1-2 11 10 00 4
0!3 0-0-2-3 00 11 01 2
0-2-3-3 11 01 10 3

2. Step—stationary regime
All overwritten paths are eliminated and only the survived paths are withheld, i.e.
the paths having a smaller Hamming distance. The Hamming distance of a survived
path becomes the Hamming distance of a node where the path terminates. At the
depth t = 4 in state 00 enter two paths—one with d(4) = 5 and the other with
d(4) = 0. Therefore, the probability that a code sequence 00 00 00 00 was emitted is
very small, because the sequence 00 11 10 11 in this case could be received only if
there were five bit errors (bits 3, 4, 5, 7 and 8). The decision made is more reliable if
the metric difference between two competing paths is larger, and the decisions for
states 01, 10 and 11 are less reliable than that one for a state 00.
For further calculation it is taken into account the previous Hamming distance of
every node (from which the branches start) and a trellis structure is the same
between trellis depths t = 2 and t = 3. It is clear that after this step the path to trellis
depth t = 2 survived, and the decoded bits are i1 = 0 and i2 = 1 (see the Fig. 7.27

00 11 10 11
0/00 0/00 0/00 3 0/00 5
00
1/11 1/11 0

1/11 0/11
0 4
01
1/00 3
0/10 3 0/10 3
10
2
1/01 1/01

0/01
2 4
1/10
11
3

Fig. 7.27 Decoding after eight received bits


Problems 355

Table 7.7 Detailed data concerning decoding after eight received bits
Corresponding d(3) Decoder input/ d = d(3) + d
path encoder output (3 − 4)
t=3!t=4
Received sequence 11
Possible paths, 0!0 0-0 3 00 3 + 2 = 5
corresponding 1-0 0 11 0 + 0 = 0
emitted bits at the
0!1 2-1 3 10 3 + 1 = 4
convolutional encoder
output and Hamming 3-1 2 01 2 + 1 = 3
distance to the received 0!2 0-2 3 11 3 + 0 = 3
sequence 1-2 0 00 0 + 2 = 2
0!3 2-3 3 01 3 + 1 = 4
3-3 2 10 2 + 1 = 3

and Table 7.7). In this step the second information bit was decoded because i1 was
decoded in a previous step.
3. Step—stationary regime continuation
The principle is the same as in the previous case, but the node metrics have the
other values. All overwritten paths are omitted and only survived paths are with-
held, i.e. the paths which have a smaller Hamming distance. Hamming distance of
survived paths becomes the node metric where the path ends.
From the trellis shown in Fig. 7.28 and from Table 7.8, it is clear that in this step
the third information bit is decoded and decoded bits are i1 = 0, i2 = 1 and i3 = 0. It
should be noticed that the accumulated Hamming distance of survived path till the
depth t = 5 is d(5) = 0, meaning that during the transmission of the first ten bits
over the channel there were no errors. Therefore, with a great probability it can be
supposed that in the future the survived path will follow the path where node

00 11 10 11 11
0/00 0/00 5 0/00 2
00
1/11 0 1/11 3

1/11 0
0/11
4
0/11
3
01
1/00 3 1/00 3
0/10 0/10
3 0/10 3 0
10
2 5
1/01 1/01 1/01

0/01 0/01
2
1/10
4
1/10
3
11
3 4

Fig. 7.28 Decoding after ten received bits


356 7 Convolutional Codes and Viterbi Algorithm

Table 7.8 Detailed data concerning decoding after ten received bits
Corresponding d(4) Decoder input/ d = d(4) + d
path encoder output (4 − 5)
t=4!t=5
Received sequence 11
Possible paths, 0!0 0-0 0 00 0 + 2 = 2
corresponding emitted bits 1-0 3 11 3 + 0 = 3
at the convolutional
0!1 2-1 2 10 2 + 1 = 3
encoder output and
Hamming distance to the 3-1 3 01 3 + 1 = 4
received sequence 0!2 0-2 0 11 0 + 0 = 0
1-2 3 00 3 + 2 = 5
0!3 2-3 2 01 2 + 1 = 3
3-3 3 10 3 + 1 = 4

metrics are equal to zero, i.e. that the next decoded bits are i4 = 0, i5 = 1. However,
these two bits are not still decoded!
Because the transmission is without errors, bit sequence at the encoder output is
the same as the sequence at the decoder input (until this moment—ten transmitted
bits). Therefore, the decoder output sequence must be identical to the convolutional
encoder input sequence. However, for a five bits at the encoder input, at the decoder
output appeared only three. Reader should think about this phenomenon.
(c) If the decoding starts from the second bit of the received sequence, it can be
considered that the decoder input is

01 11 01 11 10

The transient regime for this case is shown in Fig. 7.29, and it is clear that at
the depth t = 3 no one bit was decoded. The same situation is repeated and
after two additional received bits (Fig. 7.30)—no one information bit was
decoded and the path with Hamming distance equals zero does not exists.
After two more received bits the situation is unchanged, only the node metrics
have other values (shown in Fig. 7.31). Also, at the trellis depth t = 5, the
paths entering into two nodes have the same Hamming distance. In spite of
here applied approach where a more suitable path was chosen, i.e. that one was
eliminated yielding a successive elimination of a more paths in previous steps,
the noticeable branch shortening was not achieved, and after three steps no one
bit was detected (decoded)!
There can be two possible reasons:
1. Large number of errors during the transmission
2. Bad synchronization.
Problems 357

01 11 01
0/00 0/00 0/00 4
00
1/11 3
1/11
0/11
3
01 1/11
1/00 2
0/10
0/10 4
10
3
1/01

1/01 0/01
1
11
1/10 4

Fig. 7.29 Decoding after the transient regime

01 11 01 11
0/00 4 0/00 5
00
3 1/11 2
1/11
0/11 0/11
3 4
01 1/11
1/00 2 1/00 2
0/10

4 0/10 3
10 4
3 1/01
1/01

1/01 0/01 0/01


1 4
11 1/10
4 2

Fig. 7.30 Decoding after eight received bits

Because all paths during the first bit decoding are shortened, it is obviously the
bad synchronization. It can be concluded as well and during the further decoding,
because a minimum Hamming distance for any state at some large trellis depth will
attain the large value, and a distance of two paths entering the same node will differ
for a small values.
358 7 Convolutional Codes and Viterbi Algorithm

01 11 01 11 10
0/00 0/00 0/00 3
00
1/11 2 1/11 3
1/11
0/11 0/11 0/11
3
01 1/11

0/10
1/00 1/00 2 1/00 4
0/10 3 0/10 3
10 3
1/01 1/01 1/01

1/01 0/01 0/01 0/01


5
11 1/10 2 1/10 2

Fig. 7.31 Decoding after ten received bits

Problem 7.4 One digital communication system for error control coding use the
convolutional encoder and Viterbi decoding. Convolutional encoder has one input
and three outputs defined by

y1n ¼ xn ; y2n ¼ xn  xn2 ; y3n ¼ xn  xn1  xn2 :

(a) Draw the block scheme of the convolutional encoder and find its generator
matrix.
(b) Find the code rate and draw the state diagram. Is the encoder a systematic one?
(c) If it is known that the decoding starts from the first received bit and that the
initial state of convolutional encoder is 01, decode the first two information
bits if at the input of Viterbi decoder is the sequence

100 110 010 100 000 000 111

Whether in this moment some further information bit was decoded? If yes,
how many?
(d) Comment the code capability to correct the errors.

Solution
(a) Encoder block scheme is shown in Fig. 7.32, and the code generator is

GðDÞ ¼ 1 þ D2 ; 1 þ D; D2 :

(b) Code rate equals to the ratio of inputs and outputs of encoder—R = k/n = 1/3.
It is a systematic encoder because the first input is directly sent to the third
output. State diagram is shown in Fig. 7.33. Because there is no a loop con-
taining all zeros at the output for a nonzero input sequence, the encoder is not
a catastrophic one.
Problems 359

Fig. 7.32 Encoder block y1


scheme

x s1 s2

y2

y3

Fig. 7.33 State diagram of 0/000


the encoder

1/111 00 0/100

1/011
10 01
0/010
0/110
1/101 11

1/001

(c) According to problem text, the encoder initial state is 01. The decoding starts
from this state. At the decoder input at every moment enter n = 3 bits, the
trellis has 2m ¼ 4 states. As shown in Fig. 7.34, after the transient regime, the
path survived at the trellis depth t = 1, and one information bit is decoded—
i1 = 0. After the second step (see Fig. 7.35) there are no new decoded bits.
The decoding should be continued, because two information bits are not yet
decoded.
In the next step, shown in Fig. 7.36, two additional bits are decoded, i2 = 1 and
i3 = 0. At this moment the condition that at least two information bits are decoded
is fulfilled, the decoding procedure should be stopped, and there is no need to use
other bits at the decoder input.
The Viterbi algorithm has just a feature that the rate at the decoder output is not
constant. It can be explained by the fact that this algorithm does not make the
decision instantly, but in some way it “ponders” until the moment when the
360 7 Convolutional Codes and Viterbi Algorithm

100 110 010


0/000 0/000 3
00
1/111 6
0/100 1/111
0/100
1
01
1/011 6
0/010
1/011
0/010 4
10
5
1/101

1/101 0/110
4
11
1/001 7

Fig. 7.34 Viterbi algorithm result after the transient regime

100 110 010 100


0/000 0/000 3 0/000 4
00
1/111 1/111 1
0/100 1/111
0/100
1 6
01
1/011 5
0/010 4 0/010 5
10
4
1/101 1/101

0/110
5
11 1/001
4 6

Fig. 7.35 Viterbi algorithm result after twelve received bits

similarity of the received sequence is “sufficiently large” to correspond to one of


emitted sequences at the convolutional encoder output, i.e. to one of the trellis
paths. The delay during decoding is of variable duration by itself, and at the
moment when the decision is finally made, a greater number of information bits can
be decoded. Because the delay can be very large, decoded signal rate variations can
be lessen if at the decoder output a buffer is inserted having relatively large
capacity.
Problems 361

100 110 010 100 000


0/000 0/000 1
00
1/111 1 1/111 6
0/100
1/111 0/100 0/100
5 5
01
1/011 1/011 7
0/010
0/010 0/010
4
10
4 7
1/101 1/101
1/101
0/110 0/110
5 6
11 1/001 6

Fig. 7.36 Viterbi algorithm result after fifteen received bits

110 000 000


0/000 0/000 0/000 ec=2
00
dfree-ec=3

1/111 0/100
01

0/010
10

11

Fig. 7.37 The influence of parameter dfree on the decoding capability

Suppose at a moment that the encoder at the beginning was reset and that at his
output the sequence of all zeros is emitted, and that the channel introduces the
errors at the first ec = 2 bits. On the basis of Fig. 7.37 it is clear that the decoder will
nevertheless made a good decision, i.e. it will chose the path corresponding to the
emitted sequence. If ec = 3 the decision will be wrong as this decoder can correct at
most ec = 2 errors if they are sufficiently separated. It can be shown that by using
Viterbi algorithm, regardless the initial state and the sequence at the encoder input,
one can correct up to (dfree − 1)/2 bits (errors) on the constraint length m = (m + 1)n
bits.
Problem 7.5 One digital communication system for error control coding uses the
convolutional encoder and Viterbi decoding. The encoder is shown in Fig. 7.38.
362 7 Convolutional Codes and Viterbi Algorithm

y1

x x' s1 s2

P y2

y3

Fig. 7.38 Convolutional encoder block scheme

(a) Do the encoder structure can be presented in a form more suitable for
decoding?
(b) Find the bit sequence at the encoder output if the switch is closed, and at his
input is the binary sequence i = (0110…).
(c) When the samples of channel noise are n = (0.2, −0.1, −0.5, −0.15, −0.6, 0.6,
−0.1, 0.15, 0.4, 0.1, 0.3, 0.1), find the decoder output sequence if Hamming
metric is used, after that repeat the same procedure if the decoder uses
Euclidean metric.
(d) If the switch is open decode a sequence by decoder using Euclidean metric.
(e) If the switch is open calculate the approximate values of error probability for
both ways of decoding (hard and soft) if Eb/N0 = 5 [dB].

Solution
(a) It is clear that the delay cell at the encoder input will not substantially influence
the decoding result—sequence x is obtained at the cell output becoming x′ after
only one step. Therefore, during the decoding it is sufficient to reconstruct
sequence x′, and this cell can be omitted. On the other hand, output y3 corre-
sponds to the input sequence delayed for three bits, denoted by s2 (corre-
sponding to the second state component), and the output is active only when the
switch is closed. Now, it is obvious that this encoder can be completely
described by a simplified block scheme shown in Fig, 7.39, i.e. by the state
diagram having four states.

Fig. 7.39 Simplified y1


convolutional encoder
structure (switch closed)
P
x' s1 s2
y3

y2
Problems 363

(b) When the switch is closed, the corresponding code rate is R = 1/3, and the
code is systematic because the input is practically sent to the third output (only
the delay of two bit intervals is introduced). One can suppose that the encoder
was reset at the beginning, because nothing contrary was said in the problem
text. Corresponding trellis and state diagram are similar to these ones in
Problem 7.3, but the first and the second output bit change the places, while
the third output bit is determined by the first bit of the state from which the
corresponding branch at the trellis starts. For a sequence i = (0110) at the
encoder input, the following transitions in the Fig. 7.40 can be observed:
– the first input bit 0 causes the transition from the state 00 into the state 00,
and 000 is emitted at the output,
– the second input bit 1 causes the transition from the state 00 into the state
10, and 110 is emitted at the output,
– the third input bit 1 causes the transition from the state 10 into the state 11,
and 100 is emitted at the output,
– the fourth input bit 0 causes the transition from the state 11 into the state
01, and 101 is emitted at the output,
yielding the channel input sequence

c ¼ ð000110100101Þ:

(c) If for transmission the unipolar pulses are used, the corresponding voltage
levels being 1 and 0, then x = c, and for a known input sequence and noise
samples the following is obtained

y ¼ x þ n ¼ ð0:2; 0:1; 0:5; 0:85; 0:4; 0:6; 0:9; 0:15; 0:4; 1:1; 0:3; 1:1Þ:

t=0 t=1 t=2 t=3 t=4


0/000 0/000 0/000 0/000
00
1/110 1/110
1/110
0/111 0/111
01 1/110
1/001 1/001
0/010
0/010 0/010
10
1/100 1/100

1/100 0/101 0/101

11 1/011 1/011

Fig. 7.40 Trellis for the simplified convolutional encoder structure (switch closed)
364 7 Convolutional Codes and Viterbi Algorithm

i c x Channel y
Convolutional 0→0 Viterbi decoder,
Source with User
encoder 1→1 Euclidean metric
noise

<0,5 → 0 r Viterbi decoder,


>0,5 → 1 Hamming metric

Fig. 7.41 Block scheme of a complete system, both ways of decoding are illustrated

000 101 100 101


000 000 000 3 000 5
00
110 6 110 4
110
111 111
4 6
01 110
001 3 001 2
011
010 3 010 5
10 6
4
100 100

101 101 101


2 4
11 011 5 011 4

Fig. 7.42 Viterbi decoding using Hamming metric, switch closed

The further procedure is illustrated in Fig. 7.41. When the hard decoding is
used, between the channel output and the decoder input, a decision block
should be inserted (a comparison to a threshold), while for soft decoding there
is no need for it. If the decision is carried out on one sample basis, it is
understood that the symbols are equally probable, and a threshold is at 0.5.
At the trellis corresponding to the hard decoding, shown in Fig. 7.42, the input
bits corresponding to branches are not denoted (from two branches leaving
one node, the “upper” branch always corresponds to input bit 0 and it is
denoted by dashed line). The branches eliminated after the transient regime are
denoted with ‘+’ and those eliminated in the next step with ‘’. After the first
two steps of decoding no one bit was decoded, but it is clear that if the first
decoded bit is i1 = 0 the second must be i2 = 1, while, the alternative com-
bination is i1 = 1, i2 = 1. Most likely to be detected is a sequence i = (0110),
because for a corresponding path d = 2, but it is not still the algorithm result.
In Fig. 7.43 the procedure of soft decoding is shown, when the metrics are
calculated as a square of Euclidean distance between the received sequence
y and a supposed code word x, i.e. [63]
Problems 365

0.2; -0.1; -0.5 0.85; 0.4; 0.6 0.9; 0.15; 0.4 1.1; 0.3; 1.1
000 000 000 2.535 000 5.045
00
110 4.435 110 3.245
110
111 111
2.735 5.345
01 110
001 2.835 001 1.345
011
010 2.435 010 4.245
10 4.535 4.045
100 100
101
101 101
1.235 3.745
11 011 011
4.335 2.945

Fig. 7.43 Viterbi decoding using Euclidean metric, the switch is closed

X
n
dE2 ðx; yÞ ¼ ðxi  yi Þ2 :
i¼1

Here two bits were successfully decoded—i1 = 0 and i2 = 1. It is rather certain that
the next two decoded bits will be i3 = 1 and i4 = 0 (look at the metrics relation at
depth t = 4!). If the numerical metrics values are compared, the Hamming metrics
can be considered as a rounded off values of Euclidean metrics, where the error of
rounding off propagates resulting in non-optimum decision.
(d) When the switch is open, the encoder is no more a systematic one, and its code
rate is increased to R = 1/2. This encoder is not catastrophic (as well as in the
case of closed switch) because the polynomial corresponding to the second
output is not divisible with the polynomial corresponding to the third output
[D2 is not a factor of polynomial 1 + D+D2 over the field GF(2)].
In this case the trellis structure is changed in such a way that the third encoder
output is not taken into account. Code word is now c = (0,0, 1,1, 1,0, 1,0), and
at the decoder input is the sequence obtained by superimposition of the first
eight noise samples on the code word, i.e. y = (0.2; −0.1; 0.5; 0.85; 0.4; 0.6;
0.9; 0.15). Decoding procedure is illustrated in Fig. 7.44, and one bit is cor-
rectly decoded (i1 = 0). Although the transmitted sequence i = (0110) is most
likely to be reconstructed, at this moment even its second bit is not yet
decoded, because the noise influence is stronger than in previous case (e.g. it is
not the same case if noise sample −0.5 is superimposed on 0 and 1!), and
because of the increased code rate, the correcting capability is decreased.
(e) Error probability registered by a user when hard decoding is used, for
implemented encoder having one input (k = 1) is [25, 26]
366 7 Convolutional Codes and Viterbi Algorithm

0.2; -0.1 0.5; 0.85 0.4; 0.6 0.9; 0.15


00 00 00 1.5425 00 2.375
00
11 2.6425 11 1.375
11
11 11
0.6425 3.075
01 11
00 3.5425 00 1.075
01 01
01 1.5425 2.275
10 2.6425 1.475
10 10
10
10 10
1.0425
1.575
11 01 3.1425 01 2.575

Fig. 7.44 Viterbi decoding using Euclidean metric, switch open

X
1 h pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiid
Pb;HM \ cd 2 pð1  pÞ cdfree 2dfree pdfree =2
d¼dfree
 rffiffiffiffiffiffi dfree =2
1 Eb
¼ cdfree 2 dfree
erfcð Þ ;
2 N0

while for a Viterbi algorithm using soft decision is


rffiffiffiffiffiffiffiffiffiffiffiffi rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
1 X 1
Eb cdfree Eb
Pb;EM  cd erfcð Rd Þ erfcð Rdfree Þ:
2 d¼d N0 2 N0
free

where cd denotes the total number of input binary ones at all sequences having
the weight d, while p denotes the probability of error at the binary symmetric
channel (crossover probability) when error control coding is not applied.
For the first case R = 1/3, dfree = 7, c7 = 1, and in the case when the switch is
open R = 1/2, dfree = 5, c5 = 1. If it is known that before the error control
application the signal-to-noise ratio was Eb/N0 = 105/10 (it is a value per
information bit, while after the encoding the ratio of energy per code word bit
and noise spectral power density it decreases to Ec/N0 = Eb/N0  R). The
reader should find the numerical values for every considered case (switch
closed or open, Hamming or Euclidean metric).

Problem 7.6
(a) If at the input of the convolutional encoder defined by the generator
G(D) = [1 + D+D2, 1 + D2] is the sequence i = (010010) find the corre-
sponding code word and explain the effect of quantization on the system
performances. The system uses BPSK modulation, Viterbi decoding and
Problems 367

Euclidean metric. Find a sequence for decoding if two and four quantization
regions are used and if the real noise samples are:

n ¼ ð0:51; 0:1; 0:95; 1:1; 0:2; 0:9; 2:34; 0:68; 0:42; 0:56; 0:08; 0:2Þ:

(b) Using the same example explain the decoding with a fixed delay where the
decision about transmitted bit is made after three steps. What are the advan-
tages and the drawbacks of this decoding type? Decode the first four infor-
mation bits if at the receiver input is the same sequence as above, but if
eight-level quantization is used.

Solution
(a) Soft decision in practice can never be applied exactly because the Viterbi
algorithm is usually implemented by software or hardware, and the samples of
received signal are represented by a limited number of bits. Therefore, the
received signal should be quantized with the finite number of levels, and it is
suitable that this number is a power of two.
Let BPSK signal has a unity power (baseband equivalent is +1/−1) while the
received signal is uniformly quantized. The positive and negative values of
additive noise are equally probable (easily concluded from a constellation
diagram of BPSK signal and noise), it is logical that the thresholds are
symmetric with respect to the origin.
– Quantization into only Q = 2 regions is trivial reducing to the hard deci-
sion, where only one threshold is used for the comparison. Centers of
quantization regions can be denoted using one bit (1-bit quantization), and
samples at the decoder input are denoted by −1 and +1, as shown in
Fig. 7.45a.
– In the case of quantization using Q = 4 decision regions, it is logical that
regions in the negative part are symmetric in respect to the points corre-
sponding to transmitted signals, in the same time not disturbing the sym-
metry regarding to the origin. In this case two bits are used for
quantization, regions borders have values −1, 0, 1 and quantization regions
centers are points −1.5, −0.5, 0.5 and 1.5, as shown in Fig. 7.45b.
To the encoder input sequence i = (010010) corresponds the code word
c = (001110111110), and the baseband equivalent of BPSK signal becomes

x ¼ ð1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1Þ:

Taking into account the noise samples, signal at the receiver input is
368 7 Convolutional Codes and Viterbi Algorithm

(a) 2

Quadrature component of the received signal


1.5

0.5

−0.5

−1

−1.5

−2
−2 −1.5 −1 −0.5 0 0.5 1 1.5 2
In−phase component of the received signal

(b) 2
Quadrature component of the received signal

1.5

0.5

−0.5

−1

−1.5

−2
−2 −1.5 −1 −0.5 0 0.5 1 1.5 2
In−phase component of the received signal

Fig. 7.45 BPSK signal quantization with Q = 2 (a) and Q = 4 (b) quantization levels

y ¼ x þ n ¼ ð1:51; 0:90; 0:05; 0:10; 0:80; 1:90; 3:34; 1:68; 0:58; 1:56; 1:08; 1:20Þ;

while the signal values at the quantizer output for Q = 2, Q = 4 are Q = 8 are
as follows:
Problems 369

yq2 ¼ ð1:00; 1:00; 1:00:  1:00; 1:00; 1:00; 1:00; 1:00; 1:00; 1:00; 1:00; 1:00Þ;
yq4 ¼ ð1:5; 0:5; 0:5; 0:5; 0:5; 1:5; 1:5; 1:5; 0:5; 1:5; 1:5; 1:5Þ;
yq8 ¼ ð1:75; 0:75; 0:25; 0:25; 0:75; 1:75; 1:75; 1:75; 0:75; 1:75; 1:25; 1:25Þ:

It is proposed to reader to carry out a complete decoding procedure for


non-quantized and quantized signal and to draw the corresponding conclu-
sions. In the second part of solution the quantized signal decoding procedure
will be described in details when the initial decoder state is not known, and
during decoding a fixed delay is used.
Of course, the number of regions can be further increased, but it is shown that
the optimum relation between performances and memory capacity is achieved
already for Q = 8 quantization levels (3-bit quantization). In Fig. 7.46 the
results giving residual error probability per bit for Q = 2, 4, 8, 16 quantization
levels are shown, as well as for the case without quantization. The results are
obtained by Monte Carlo simulation for a sample size of N = 100/Pe bits,
where Pe denotes the estimated error probability. As mentioned earlier, the
case Q = 2 corresponds to decision using Hamming metric, and a case without
quantization corresponds to decision using Euclidean metric, both being
considered in a previous problem.

0
10

Q=2
Q=4
Q=8
−1
Q=16
10 unquantized
Bit error rate, BER

−2
10

−3
10

−4
10
0 1 2 3 4 5 6 7
Signal−to noise ratio, snr [dB]

Fig. 7.46 The influence of the number of quantization levels on the error probability registered by
user, encoder with G = [7,5], BPSK modulation, coherent detection
370 7 Convolutional Codes and Viterbi Algorithm

(c) When the decoding is carried out using the fixed delay, the rules are as follows
[25]:
1. For the first step a classic Viterbi algorithm is applied, using a chosen
metric. After the transition regime, in every state a path having smaller
metric is chosen, i.e. path having larger metric is eliminated.
2. If the initial state is not known, the decoding should start from all states,
considering that the accumulated distance in these states are equal to zero.
Duration of transient regime in this case is only one step and the decision
can be done already at a trellis depth t = 1.
3. The procedure from the previous steps is repeated until the trellis depth
determined by a fixed delay duration (registers containing the metrics have
a limited length determining this numerical value!). When this depth is
achieved, the survived paths are found and the state is identified at the
largest depth to whom corresponds the smaller metric. For the considered
example the decoding procedure until this step is shown in Fig. 7.47, fixed
delay duration is three, the survived paths are denoted by heavy lines and
the minimal metric value at the depth t = 3 is 3.375 corresponding to the
state 01.
4. At the trellis the path is found connecting the state having a minimal metric
(at the depth corresponding to a fixed delay) with some of candidates for
the initial state. If a fixed delay is sufficiently large and if during the
decoding a sufficient number of branches was eliminated, this path is
unique. In Fig. 7.48 it is denoted by the strongest lines, with arrows
pointing to the left. In such a way, the initial state is found (in this case 00).
5. Now, it can be considered that the first bit corresponding to this path is
decoded. Therefore, although for the depth t = 1 all paths are not

-1.75 -0.75 0.25 -0.25 0.75 -1.75


t=0 t=1 t=2 t=3
0 0/-1,-1 0.625 0/-1,-1 2.75 0/-1,-1 6.375
00
10.625 5.75 9.375
1/1,1 1/1,1 1/1,1

0/1,1 0/1,1 0/1,1


0 7.625 1.75 3.375
01
3.625 6.75 14.375
1/-1,-1 1/-1,-1 1/-1,-1
0/1,-1 2.750/1,-1
0/1,-1 10.625 10.375
0
10 0.625 5.75 5.375
1/-1,1 1/-1,1 1/-1,1

0/-1,1 0/-1,1 0/-1,1


3.625 3.75 13.375
0
11
1/1,-1 7.625 1/1,-1 4.75 1/1,-1
4.375

Fig. 7.47 Decoding using a fixed delay, the first step


Problems 371

0.25 -0.25 0.75 -1.75 1.75 1.75


t=1 t=2 t=3 t=4
2.75 6.375 0/-1,-1 21.5
0.625 0/-1,-1 0/-1,-1
00
4.5
1/1,1 1/1,1

0/1,1
3.375 18.5
01 1/1,1

1/-1,-1 21.5
0/1,-1
0/1,-1 7.5
10 2.75 10.375 18.5
1/-1,1

0/-1,1
18.5
11
13.375 1/1,-1 21.5

Fig. 7.48 Decoding using a fixed delay, the second step

eliminated except one, it will be considered that the decoded bit is i1 = 0,


because to the denoted path at this depth corresponds the transition from
state 00 into the state 00.
6. After the first bit decoding, the trellis part till the depth t = 1 is erased
(from memory) because there is no need to use it further and the trellis part
from depth t = 3 to depth t = 4 is drawn. It is considered that at the depth
t = 1, one must start from the state 00 and that all paths starting at this
depth from other states are erased, the corresponding trellis shown in
Fig. 7.48.
One more fact should be noted. Because the decision was made that at the
depth t = 1 one should start from state 00, some branches being eliminated
in the previous step now should be taken into account again. For example,
it was previously decided that at depth t = 3 in the state 11 is optimum to
enter from state 11 (distance 4.375), and not from the state 10 (distance
13.375). However, now state 11 can be entered from 11 only if at the depth
t = 1 one starts from the state 10, now not being a valid option. Therefore,
if from the depth t = 1 one starts from the state 00, in the state 11 at depth
t = 3 can be entered only from the state 10 at depth t = 2, what previously
seemed suboptimal.
Also, the following should be noted:
• The branch elimination in steps 1-3 had a goal only to determine the
initial state and to decode the first information bit. After that, these
decisions can be changed
• By the decision change, the metric difference between the path corre-
sponding to the encoder output (“code word”) and other paths increa-
ses, and the decisions become more reliable.
372 7 Convolutional Codes and Viterbi Algorithm

• The previous claim is valid only in a case if the decision about the first
information bit was made correctly. Then the uncertainty about the
result of decoding really becomes smaller, because this bit value
influences to a series of next bits emitted by encoder. However, if the
decision is incorrect, the consequences for a further decoding procedure
are catastrophic.
• Of course, the decision about the first information bit is more reliable if
a fixed delay has a larger value.
7. Now, the trellis is formed from depth t = 1 to depth t = 4, starting from a
state 00, where some branches are back, and metrics are corrected as
shown in Fig. 7.48 The procedure from 1-6 is now repeated:
– metrics are calculated, some branches are eliminated and survived paths
are denoted;
– at the depth t = 4 the state having the smallest metric is identified (it is
10, the corresponding distance 4.5);
– the path is found using the survived branches from this state to the
initial state (being now known);
– the second information bit is decoded, being here i2 = 1;
– part of a trellis till the depth t = 2 is erased, and a new part from the
depth t = 4 to the depth t = 5 is formed, while a new initial state is 10;
– the metrics from the previous step are taken, with a correction described
in 6. (being here applied to the states 01, 10 and 11).
8. The previous procedure is repeated until the end of decoding.
The next step is shown in Fig. 7.49, where the decoded bit is i3 = 0. It should be
noted that neither here nor in the previous steps would there have been a correct


Fig. 7.49 Decoding using fixed delay, the third step



decoding if the decision had not been made "by force", because the branch elimination
did not result in a unique path up to the depth t = 3. Because of that, after this step,
the trellis part up to the depth t = 3 is erased, the new initial state is 01, and an (insignificant)
metrics correction is carried out only for the state 01.
The last considered step is shown in Fig. 7.50. In contrast to the previous
cases, here the standard procedure of eliminating the branches to which larger
distances correspond already yields the decoding of the fourth information bit (i4 = 1).
After this step one should erase only the trellis part up to the depth t = 4, the new initial
state is now 00, the trellis part up to the depth t = 7 is drawn and the decoding
procedure is continued.
It can be noted that in this example only one information bit was decoded in
every step. However, this is a consequence of the small value of the fixed delay: here it is
minimal and corresponds to the encoder transient state duration, to simplify the
illustration of the procedure. Usually, a fixed delay is 15–20 bits and it is clear that, in
such a case, more bits can be decoded in one window, and that a "truncation", i.e. a
decision "by force", is applied only when the standard decoding procedure would
introduce some delay.
It was shown earlier that in this example the decoded information bits are i1 = 0,
i2 = 1, i3 = 0, i4 = 0, corresponding completely to the sequence at the encoder input
(see the first part of the solution). One can verify that by using the classic decoding
procedure not a single bit would be decoded after at least the first three steps (the
reader is encouraged to verify it!). If the decoding were continued and there were
no more transmission errors, all bits would be decoded. However, the decoder
would not emit anything for a longer time, and afterwards it would emit a large
number of bits at once.
In addition to reducing the required length of the registers where the metrics are
stored, decoding with a fixed delay is also used to reduce the rate variability at the


Fig. 7.50 Decoding using fixed delay, the fourth step



decoder output as well. However, one should keep in mind that the decoding
will be correct only if the delay is sufficiently long for the path to "clear up", i.e. for it
to have no branching over a number of steps equal to the delay.
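The procedure can be summarized in a short sketch. This is a minimal illustration of the sliding-window ("fixed delay") Viterbi decoder, assuming the standard rate-1/2 encoder G(D) = [1 + D + D², 1 + D²] and soft Euclidean metrics; the names are illustrative and it mirrors the principle of the decision "by force", not the exact metric-correction bookkeeping from Figs. 7.47–7.50.

```python
# A minimal sketch of Viterbi decoding with a fixed decision delay ("decision
# by force"), assuming a rate-1/2 encoder G(D) = [1 + D + D^2, 1 + D^2] and
# soft (squared Euclidean) branch metrics over polar symbols.
G_TAPS = (0b111, 0b101)      # generator taps, constraint length 3 (4 states)

def branch(state, bit):
    """Next state and polar outputs for one input bit from a given state."""
    reg = (bit << 2) | state                     # newest bit at the left
    out = [1.0 if bin(reg & g).count("1") % 2 else -1.0 for g in G_TAPS]
    return reg >> 1, out

def viterbi_fixed_delay(received, delay):
    """received: list of (r1, r2) soft sample pairs; returns decoded bits."""
    metric = {0: 0.0}                            # start from state 00
    history, decoded = [], []                    # history: per-step back-pointers
    for step, r in enumerate(received):
        new_metric, backptr = {}, {}
        for s, m in metric.items():
            for bit in (0, 1):
                ns, out = branch(s, bit)
                d = m + sum((ri - oi) ** 2 for ri, oi in zip(r, out))
                if ns not in new_metric or d < new_metric[ns]:
                    new_metric[ns], backptr[ns] = d, (s, bit)
        metric = new_metric
        history.append(backptr)
        if step + 1 >= delay:                    # force a decision after "delay" steps
            s = min(metric, key=metric.get)      # best state at the current depth
            for bp in reversed(history):         # trace back to the oldest stored step
                s, bit = bp[s]
            decoded.append(bit)
            history.pop(0)                       # slide the decision window forward
    return decoded                               # the last delay-1 bits need tail handling
```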
Problem 7.7
(a) Draw a block scheme of a convolutional encoder defined by the generator
G(D) = [1 + D, D, 0; 0, D, 1 + D] and a corresponding trellis with denoted
transitions (x1x2/y1y2y3). Find the code rate and the free distance.
(b) Design the encoder with one input which combined with a suitable puncturing
provides the same dfree and the same code rate as the encoder above.
(c) Decode using the Viterbi algorithm with Euclidean metric if the sequence

i = (1101)

enters the input of the encoder from (b), represented by unit power pulses, and the channel noise samples are

n = (0.3, −0.7, 0.4, −0.3, −0.8, +1.1).

Solution
(a) The encoder block scheme is shown in Fig. 7.51, and the corresponding trellis
for the transient regime is shown in Fig. 7.52. The code rate is R = 2/3. The free
distance is dfree = 3.
(b) According to the problem condition, a punctured convolutional code should
be obtained which has the code rate R = 2/3 and free distance at least dfree = 3,
where the starting convolutional encoder is chosen to have one input. The
simplest way to realize it is to start from the best possible mother encoder having
two delay cells and code rate Rm = 1/2. It is well known that the corresponding
generator is G(D) = [1 + D + D², 1 + D²] (or G(D) = [1 + D², 1 + D + D²]),
which has dfree = 5, which will be additionally decreased after the puncturing.

Fig. 7.51 Encoder block scheme

Fig. 7.52 Trellis corresponding to a transient regime, paths to find dfree are heavier

Later, it will be verified whether the desired performance can be obtained by using
this starting encoder.
It is known that the code rate after the puncturing equals the ratio of the puncturing
period (denoted by P) and the number of ones in the puncturing matrix
(denoted by a) [1]

RP = Rm · (nP/a) = P/a,

and that the simplest puncturing matrices providing code rate R = 2/3 are as
follows

P1 = [0 1; 1 1],  P2 = [1 1; 0 1],  P3 = [1 0; 1 1],  P4 = [1 1; 1 0].
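As an illustration, a minimal sketch of how a puncturing matrix is applied to the mother-encoder output is given below; the function name and the example values are only illustrative, with P3 read column by column over the puncturing period P = 2.

```python
# A minimal sketch of applying a puncturing matrix to the output of a rate-1/2
# mother encoder; with P3 = [[1, 0], [1, 1]], in each period of P = 2 input
# bits, a = 3 of the 4 mother-code bits are transmitted (rate P/a = 2/3).
def puncture(coded_pairs, pattern):
    """coded_pairs: list of (y1, y2) per input bit; pattern: 2 x P matrix."""
    P = len(pattern[0])
    out = []
    for t, (y1, y2) in enumerate(coded_pairs):
        col = t % P
        if pattern[0][col]:
            out.append(y1)
        if pattern[1][col]:
            out.append(y2)
    return out

P3 = [[1, 0],
      [1, 1]]
# e.g. the mother-code output (11, 01, 01, 00) is shortened to 6 transmitted bits
print(puncture([(1, 1), (0, 1), (0, 1), (0, 0)], P3))   # -> [1, 1, 1, 0, 1, 0]
```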

From Fig. 7.53 the weights of the paths returning into the state 00 after three and
four steps can be found; the results are shown in Table 7.9 (it can be considered
that there are no longer paths having smaller weights). All matrices
satisfy the problem conditions, but the encoder will have better correction
capabilities if the matrices P3 or P4 are used, yielding dfree = 4. Therefore, by
puncturing the originating code having one input, to which a simpler encoder
block scheme and a lower complexity trellis correspond, it is possible to
construct an encoder which has the same code rate as well as a higher capability
to correct errors [64].
(c) In what follows the puncturing matrix P3 will be used; the complete block
scheme of the corresponding encoder is shown in Fig. 7.54. To the information
sequence i = (1101) corresponds the code word c = (11010100). However, on
the transmission line a polar equivalent of this sequence is emitted where
(because of the puncturing) the third and the seventh bit are omitted, and the following
sequences are at the channel input and output, respectively

Fig. 7.53 Trellis corresponding to punctured convolutional code which has the code rate R = 2/3
and the free distance dfree = 3

Table 7.9 Weights of two observed paths for puncturing matrices

Path 00 → 00                  P1      P2      P3      P4
Path terminates in t = 3      d = 3   d = 3   d = 4   d = 5
Path terminates in t = 4      d = 5   d = 4   d = 5   d = 4


Fig. 7.54 Convolutional encoder structure corresponding to a chosen punctured code

x = (+1, +1, +1, −1, +1, −1),
y = x + n = (1.3, 0.3, 1.4, −1.3, 0.2, 0.1).

The decoding procedure using the Euclidean metric (without quantization) is shown in
Fig. 7.55. The difference with respect to the previously considered cases is that
instead of the erased symbols, neutral values are inserted (the symbol 0 has the
same distance from −1 and +1). Although the channel noise is quite high, in

Fig. 7.55 Decoding procedure illustration

Fig. 7.56 TCM modulator structure

each of the two decoding steps one bit of the information sequence was successfully
reconstructed (i1 = 1, i2 = 1), and there is a high probability that the next two
decoded bits will be i3 = 0 and i4 = 1, because the corresponding path
Euclidean distance is substantially smaller with respect to the other paths.
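The insertion of neutral values can be sketched as a simple depuncturing step performed before the Viterbi decoder; the code below is only an illustration (the function name is arbitrary), using the received samples of this problem.

```python
# A minimal sketch of depuncturing before soft Viterbi decoding: positions
# removed by the puncturing matrix are refilled with the neutral value 0.0,
# which is equidistant from the polar symbols -1 and +1 and so does not bias
# the Euclidean branch metrics.
def depuncture(received, pattern):
    """received: transmitted soft samples; pattern: 2 x P puncturing matrix."""
    P = len(pattern[0])
    pairs, it, t = [], iter(received), 0
    while True:
        col = t % P
        try:
            y1 = next(it) if pattern[0][col] else 0.0
            y2 = next(it) if pattern[1][col] else 0.0
        except StopIteration:
            break
        pairs.append((y1, y2))
        t += 1
    return pairs

P3 = [[1, 0], [1, 1]]
r = [1.3, 0.3, 1.4, -1.3, 0.2, 0.1]      # received samples from the problem
print(depuncture(r, P3))   # -> [(1.3, 0.3), (0.0, 1.4), (-1.3, 0.2), (0.0, 0.1)]
```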

Problem 7.8 The systematic recursive convolutional encoder is shown in
Fig. 7.56. It is used for the TCM procedure combined with 8-PSK modulation.
(a) Draw state diagram and trellis corresponding to convolutional encoder and
find its free distance.
(b) Find which mapping (binary or Gray) at 8-PSK constellation provides a
greater value of minimum Euclidean distance.
(c) Find the TCM modulator output for the input sequence x = (01, 00, 10). If a
narrowband Gaussian noise with the complex (baseband) samples

n = (0.1 − 0.2j, 0, 0.3 + 0.1j, 0, 0.2j)

is superimposed on the obtained signal in the channel, describe in detail the
procedure for the information bits reconstruction in the TCM demodulator.

Solution
(a) In TCM (Trellis Coded Modulation) the transmitting end consists of an encoder
and a modulator, whose parameters are jointly optimized [65]. The state diagram of
the recursive systematic convolutional encoder is shown in Fig. 7.57, where
dashed lines correspond to the zero value of the second information bit. The first
information bit does not influence the path shape, but only the first output
value (it is the reason for parallel paths).
In Fig. 7.58 the complete trellis at the depth t = 1 and a trellis part up to the depth
t = 3 are shown. It is clear that the free code distance is dfree = 1. Parallel to the all

00/000
10/100

01/010 00
01/010
11/110
11/110
00/000
s1’=s2
10/100
s2’= x2 s1
01 10
y1= x1
01/011 y2= x2
11/111 y3= s2
10/101 00/001
00/001
11 10/101

01/011
11/111

Fig. 7.57 State diagram and encoder (Fig. 7.58) working rule

Fig. 7.58 Trellis corresponding to the encoder shown in Fig. 7.56


Problems 379

zeros path there is a path which at t = 0 leaves the state 00, and at t = 1 returns into the
same state, containing only one output bit different from zero.
(b) Furthermore, two ways of the constellation diagram mapping (binary and Gray)
will be considered, giving the complex form of the modulated signal with
normalized amplitude for every three-bit combination at the modulator input.
1. Binary mapping:

000 → x = 1
001 → x = (1 + j)/√2
010 → x = j
011 → x = (−1 + j)/√2
100 → x = −1
101 → x = (−1 − j)/√2
110 → x = −j
111 → x = (1 − j)/√2

2. Gray mapping:

000 → x = 1
001 → x = (1 + j)/√2
011 → x = j
010 → x = (−1 + j)/√2
110 → x = −1
111 → x = (−1 − j)/√2
101 → x = −j
100 → x = (1 − j)/√2

In the trellis in Fig. 7.59 the paths can be noticed which at depth t = 0
leave the state 00 and at some later moment return to this state, i.e. they
compete to be the path having the minimum distance. The Euclidean distance is
calculated step by step, as the distance of a point in the constellation diagram to the

Fig. 7.59 Trellis part for finding minimum Euclidean distance for TCM modulator

point corresponding to the all-zeros combination, and the squared distances of the
single steps are added to obtain the total distance.
It can be noticed that there are nine competing paths, where one stops at the
depth t = 1 and the others at t = 3 (it is supposed that the longer paths are not
of interest) [65]:
1. t = 1:

c = (100) → d(I),bin² = d3²;  d(I),Gray² = d0²

2. t = 3:

c = (010, 001, 010) → d(II),bin² = d0² + 2d1²;  d(II),Gray² = 2d2² + d0²
c = (010, 001, 110) → d(III),bin² = d0² + 2d1²;  d(III),Gray² = d2² + d0² + d3²
c = (010, 101, 010) → d(IV),bin² = d2² + 2d1²;  d(IV),Gray² = 2d2² + d1²
c = (010, 101, 110) → d(V),bin² = d2² + 2d1²;  d(V),Gray² = d2² + d1² + d3²
c = (110, 001, 010) → d(VI),bin² = d0² + 2d1²;  d(VI),Gray² = d3² + d0² + d2²
c = (110, 001, 110) → d(VII),bin² = d0² + 2d1²;  d(VII),Gray² = 2d3² + d0²
c = (110, 101, 010) → d(VIII),bin² = d2² + 2d1²;  d(VIII),Gray² = d3² + d1² + d2²
c = (110, 101, 110) → d(IX),bin² = d2² + 2d1²;  d(IX),Gray² = 2d3² + d1²
where d0 = √(2 − √2), d1 = √2, d2 = √(2 + √2), d3 = 2. It is easy to
verify that in both cases the path terminating at the depth t = 1 is the
most critical, because in this case the Euclidean distance is the smallest.
The corresponding code gains are

Gbin = 10 log(d3²/d1²) = 10 log(4/2) = 3.01 dB,
GGray = 10 log(d0²/d1²) = 10 log((2 − √2)/2) = −5.33 dB,

and it is obvious that for this encoder binary mapping is optimal.
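The distances d0–d3 and both gains can be checked numerically; the following short sketch (with illustrative variable names) reproduces the values used above.

```python
# A short numerical check of the 8-PSK distances d0..d3 and of the two coding
# gains computed above (reference distance d1 = sqrt(2), as for uncoded QPSK).
import cmath, math

pts = [cmath.exp(1j * math.pi * k / 4) for k in range(8)]   # unit-energy 8-PSK
d = [abs(pts[0] - pts[k]) for k in range(1, 5)]             # d0, d1, d2, d3

G_bin = 10 * math.log10(d[3] ** 2 / d[1] ** 2)    # parallel transition at d3
G_gray = 10 * math.log10(d[0] ** 2 / d[1] ** 2)   # parallel transition at d0
print([round(x, 4) for x in d])            # [0.7654, 1.4142, 1.8478, 2.0]
print(round(G_bin, 2), round(G_gray, 2))   # 3.01 -5.33
```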


(c) From the trellis shown in Fig. 7.59 it is obvious that to the information
sequence x = (01, 00, 10) at the encoder input corresponds the code sequence
y = (010, 001, 100) at its output. For the applied binary mapping, this sequence
of bits is represented in the 8-PSK modulator by the modulated symbols

u = (j, (1 + j)/√2, −1),

and after the noise is superimposed, the following symbols enter the decoder

v = u + n = (0.1 + 0.8j, (1 + j)/√2, −0.7 + 0.1j).

The TCM demodulator functioning is shown in detail in Fig. 7.60. It is obvious that
in this case demodulation and decoding are performed simultaneously. Instead of
performing the demodulation by observing the octant which the received signal
"enters" and sending the decisions obtained in that way to a Viterbi decoder, the decoding is
performed using Euclidean metrics, where in every step the metrics are calculated in
a two-dimensional space (the dimensions correspond to the so-called I and Q components
of the received signal), and from all paths entering one node the one is selected
which has the minimum Euclidean distance with respect to the received sequence. For a
better insight, in this problem only, the eliminated paths are also shown, denoted by
dashed lines.
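A minimal sketch of the branch metric used in this joint demodulation/decoding is given below; the mapping table corresponds to the binary mapping from part (b), and the names are only illustrative.

```python
# A minimal sketch of the joint demodulation/decoding metric: for a received
# complex sample v and a trellis branch labelled with the 8-PSK symbol s, the
# branch metric is the squared Euclidean distance |v - s|^2 in the I/Q plane.
import math

SQ2 = math.sqrt(2)
binary_map = {                  # 3 output bits -> 8-PSK point (binary mapping)
    "000": 1, "001": (1 + 1j) / SQ2, "010": 1j, "011": (-1 + 1j) / SQ2,
    "100": -1, "101": (-1 - 1j) / SQ2, "110": -1j, "111": (1 - 1j) / SQ2,
}

def branch_metric(v, bits):
    """Squared Euclidean distance between a received sample and a branch label."""
    return abs(v - binary_map[bits]) ** 2

v1 = 0.1 + 0.8j                               # first received sample of part (c)
for bits in ("000", "010", "100", "110"):     # labels of branches leaving state 00
    print(bits, round(branch_metric(v1, bits), 3))
```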
In this example, after the end of the transient regime, it was found that the encoder
terminated in the state 00, corresponding to the code sequence 010, i.e. to the symbol
j at the modulator output (denoting a signal whose real part equals zero and
imaginary part equals one, i.e. the phase shift is π/2). The encoder is systematic and
the first two bits in the code sequence 010 correspond to the information bits;
therefore, at this moment the decoded bits are x1 = 0 and x2 = 1.
The bits most likely to be decoded next are x3 = 0, x4 = 0, x5 = 1 and x6 = 0,
but the final decision is not yet made because even in an optimum regime the last
mk bits are not decoded (m is the encoder memory order and k is the number of input
bits). However, this can be overcome if during the encoding mk "tail bits" are added
after the information bit sequence, which do not carry any information, but provide the
Fig. 7.60 Procedure of demodulation/decoding of TCM signal, transient regime



trellis termination in the state 00. In the considered case the trellis should return from the state
01 (where the encoder was after emitting the code sequence c) into the state 00, which
is provided if x2 = 1 in the fourth step and x2 = 1 in the fifth. The corresponding
values of x1 are not important; it will be supposed that they equal zero.
A complete sequence at the encoder input is now x = (01, 00, 10, 01, 01), the
corresponding code sequence is y = (010, 001, 100, 011, 010), while the modulated
symbols are

u = (j, (1 + j)/√2, −1, (−1 + j)/√2, j),

and after the superimposition of noise the following sequence is at the decoder input

v = u + n = (0.1 + 0.8j, (1 + j)/√2, −0.7 + 0.1j, (−1 + j)/√2, 1.2j).

The trellis after two more decoding steps is shown in Fig. 7.61. In this case, the
decoding of all emitted information bits is finished, i.e. the sequence
x = (01, 00, 10) was reconstructed correctly. The trellis still has more branches, and the
fourth and fifth bits from the encoder input are not decoded. But these bits do not
carry any information and their reconstruction is not important. Although the metric
corresponding to the state 00 at the depth t = 5 is the smallest, the corresponding
numerical values at the last two depths of the trellis are not written, because they are
not interesting, the decoding being finished (the reader should find them by repeating
the procedure from the previous steps).
In this case even four tail bits are used for transmitting six information bits, and
it may seem that the trellis termination procedure has extremely low efficiency. Of
course, this is not the case, because the block of information bits to be transmitted can be
extremely long—typical lengths of files downloaded from the Internet are greater than
1 [MB], and after their transmission typically 4–16 tail bits are added. For turbo


Fig. 7.61 Demodulation/decoding of TCM signal, trellis termination



codes, a block of a few thousand information bits is entered into parallel
recursive convolutional encoders, and the trellis termination provides that
every block is considered separately, i.e. by using the convolutional codes a linear
block code is obtained, which can also be decoded by using a trellis.
These procedures will be analyzed in detail in Chap. 8.
Chapter 8
Trellis Decoding of Linear Block Codes,
Turbo Codes

Brief Theoretical Overview

During the "long" history of error control coding, especially taking into account
that the proof of the Second Shannon theorem is not a "constructive" one, the
search for constructive codes approaching the "promised" limits has taken a long
period of time. Algebraic coding theory flourished. However, even the best cyclic
codes (RS codes) did not have good performance. Namely, they reduce the error
probability for relatively high signal-to-noise ratios and have a high code rate, but
they are far from the Shannon limit. Furthermore, practical coding schemes
were developed allowing better reliability for lower signal-to-noise ratios. These
were convolutional codes having a moderate constraint length decoded using the
Viterbi algorithm, and convolutional codes having a long constraint length decoded
using sequential decoding. Next, cascade schemes were developed where block and
convolutional codes were combined to maximize the coding gain. However, such
codes were suitable only for systems having limited power and unlimited bandwidth.
For systems having limited bandwidth, the spectral efficiency was augmented
by combining multilevel signal constellations and punctured convolutional
codes, where the modulation (constellation) and coding (code rate) were adaptive.
Later it was conceived that better performance can be achieved by joint coding
and modulation (trellis coded modulation). The next step was iterative decoding,
allowing at last the entry into "the land of promise". In fact, there is no need to make
a difference between block codes and convolutional codes, because the latter
can be terminated (by adding non-information bits) and considered as block codes.
In the previous chapter, the Viterbi algorithm was introduced having in view
mainly convolutional codes. Even if the information bits are statistically independent,
dependence is introduced into the codewords of a linear block code as well, by
adding parity-check bits. Therefore, they can be regarded as generated by some
finite automaton. The corresponding trellis can be constructed, which has a clearly
defined starting and finishing state.


A method to find the trellis for linear block codes was proposed in [66] and later
Wolf [67] introduced an identical trellis and applied the Viterbi algorithm for ML
decoding of block codes. More details can be found in [68, 69]. The procedure will be firstly
explained for a binary code (Problems 8.1, 8.2 and 8.4), but it can be similarly
applied for linear block codes over GF(q) (Problem 8.3).
Consider a linear block code (n, k) over GF(2). Its control matrix has the
dimensions (n − k) × n and can be written in the form

H = [h1 h2 … hj … hn],

where hj are column vectors which have dimensions (n − k) × 1. A (row) vector v of
length n is a code vector if the following is true

vH^T = 0.

The trellis will start at the depth 0 and will terminate at the depth n. Suppose a
systematic code where the first k bits are information bits, followed by n − k parity
checks. The state at the depth j will be denoted as Sj and numerically expressed using its
ordinal number. It can be denoted as the second index, i.e. as Sj,ord.numb. The maximal
value of these ordinal numbers is 2^(n−k) − 1. There are 2^(n−k) possible values (states).
They can be represented by vectors having length n − k. If the information vector is
denoted as i = (i1, i2, …, ik), the new state for the depth j can be found (the second index
denoting the ordinal number is omitted) as

Sj = Sj−1 + xj hj,

where xj = ij (j ≤ k) and for j = k + 1, …, n, xj is the corresponding parity check bit
obtained on the basis of the previous information bits using the generator matrix G. In
such a way the terminating state Sn will be equal to the starting one, S0. In this case
all paths in the trellis will correspond to the possible code words.
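A minimal sketch of this state recursion is given below; it only lists the states visited by one codeword of the (7, 4) code considered next, and the names used are illustrative.

```python
# A minimal sketch of the state recursion S_j = S_{j-1} + x_j h_j over GF(2),
# used here only to list the trellis states visited by one codeword of the
# (7, 4) code whose parity-check matrix columns h1..h7 are given below.
H_cols = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1),
          (1, 0, 0), (0, 1, 0), (0, 0, 1)]

def xor_vec(a, b):
    return tuple(x ^ y for x, y in zip(a, b))

def trellis_path(codeword):
    """Return the list S_0, S_1, ..., S_n of visited states for one codeword."""
    state, path = (0, 0, 0), [(0, 0, 0)]
    for x, h in zip(codeword, H_cols):
        if x:
            state = xor_vec(state, h)
        path.append(state)
    return path

# e.g. the codeword 1010010 of the (7, 4) code ends again in the all-zero state
print(trellis_path([1, 0, 1, 0, 0, 1, 0]))
```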
Consider the (7, 4) code where the corresponding generator and control matrices are

G = [1 0 0 0 1 1 1
     0 1 0 0 1 1 0
     0 0 1 0 1 0 1
     0 0 0 1 0 1 1] = [I4 P],

H = [P^T I3] = [1 1 1 0 1 0 0
                1 1 0 1 0 1 0
                1 0 1 1 0 0 1].

The corresponding trellis has 2^(n−k) = 8 states (i.e. three-bit vectors). It starts from
depth 0 going to the depth n = 7. The total number of paths equals the number of code
words, 2^k = 2^4 = 16. If the index of S denotes only the trellis depth and the state index is
written in parentheses, the calculations are as follows (for S0 = (000)):

j = 1:
S1 = S0 + x1h1 = (0 0 0) + x1(1 1 1) = (0 0 0), x1 = 0; (1 1 1), x1 = 1
j = 2:
S2 = S1 + x2h2 = S1 + x2(1 1 0)
S2 = (0 0 0) + x2(1 1 0) = (0 0 0), x2 = 0; (1 1 0), x2 = 1
S2 = (1 1 1) + x2(1 1 0) = (1 1 1), x2 = 0; (0 0 1), x2 = 1
j = 3:
S3 = S2 + x3h3 = S2 + x3(1 0 1)
S3 = (0 0 0) + x3(1 0 1) = (0 0 0), x3 = 0; (1 0 1), x3 = 1
S3 = (0 0 1) + x3(1 0 1) = (0 0 1), x3 = 0; (1 0 0), x3 = 1
S3 = (1 1 0) + x3(1 0 1) = (1 1 0), x3 = 0; (0 1 1), x3 = 1
S3 = (1 1 1) + x3(1 0 1) = (1 1 1), x3 = 0; (0 1 0), x3 = 1
j = 4:
S4 = S3 + x4h4 = S3 + x4(0 1 1)
S4 = (0 0 0) + x4(0 1 1) = (0 0 0), x4 = 0; (0 1 1), x4 = 1
S4 = (0 0 1) + x4(0 1 1) = (0 0 1), x4 = 0; (0 1 0), x4 = 1
S4 = (0 1 0) + x4(0 1 1) = (0 1 0), x4 = 0; (0 0 1), x4 = 1
S4 = (0 1 1) + x4(0 1 1) = (0 1 1), x4 = 0; (0 0 0), x4 = 1
S4 = (1 0 0) + x4(0 1 1) = (1 0 0), x4 = 0; (1 1 1), x4 = 1
S4 = (1 0 1) + x4(0 1 1) = (1 0 1), x4 = 0; (1 1 0), x4 = 1
S4 = (1 1 0) + x4(0 1 1) = (1 1 0), x4 = 0; (1 0 1), x4 = 1
S4 = (1 1 1) + x4(0 1 1) = (1 1 1), x4 = 0; (1 0 0), x4 = 1

Now, the trellis is completely developed and there are 16 paths going to 8 nodes
(two paths into every node). These paths correspond to the 16 possible information bit
sequences (of length 4). It is supposed that the information bits are independent
and that all these combinations are equally probable. But now the parity check bits
come. They depend on the previous path bits, i.e. the bits x5, x6 and x7 depend on the state
(node). The state now completely determines the next parity check bits, as shown in
Fig. 8.1. E.g. in the considered case:

Fig. 8.1 Trellis after the first four steps

S4 = S3 + x4h4 = (S2 + x3h3) + x4h4 = ((S1 + x2h2) + x3h3) + x4h4
   = (((S0 + x1h1) + x2h2) + x3h3) + x4h4 = S0 + (i1h1 + i2h2 + i3h3 + i4h4)
   = S0 + (i1H11 + i2H21 + i3H31 + i4H41, i1H12 + i2H22 + i3H32 + i4H42, i1H13 + i2H23 + i3H33 + i4H43)
   = S0 + (i1P11 + i2P12 + i3P13 + i4P14, i1P21 + i2P22 + i3P23 + i4P24, i1P31 + i2P32 + i3P33 + i4P34)
   = (000) + (k1 k2 k3) = (k1 k2 k3).

From Fig. 8.1 it is easy to see that to the branch (path) which corresponds to the
input i1 = 1, i2 = 0, i3 = 1 and i4 = 0 the same output bits correspond as well
(systematic encoder). After the fourth step this path is in the state S2 = (010), and the
next three control bits correspond to the bits in parentheses, i.e. k1 = 0, k2 = 1 and
k3 = 0. The final state for the considered path is

S7 = S6 + x7h7 = (S5 + x6h6) + x7h7 = ((S4 + x5h5) + x6h6) + x7h7
   = [k1 k2 k3] + k1[100] + k2[010] + k3[001] = [000].

The complete trellis is shown in Fig. 8.2. Parts of the paths corresponding to the
symbol xj = 0 are dashed, and parts corresponding to the symbol xj = 1 are denoted by
full lines.
Consider now linear block code (5, 3) which has the generator matrix

Fig. 8.2 Trellis structure for the considered example

G = [1 0 1 0 0
     0 1 0 1 0
     0 0 1 1 1].

To obtain the trellis, the control matrix should be found. The code is not systematic
and firstly the generator matrix of the equivalent systematic code has to be found. It
can be achieved by column permutation (here 1-2-3-4-5 → 3-4-1-2-5) or by
addition of the corresponding rows (here the third row of the systematic code matrix G′ can
be obtained by adding all three rows of the matrix G). Consider the submatrix of this
matrix corresponding to parity checks P (here the first two columns of G′). Using the
standard procedure the control matrix of the systematic code is obtained:

G′ = [1 0 1 0 0
      0 1 0 1 0
      1 1 0 0 1] = [P, I3],   H′ = [I2, P^T] = [1 0 1 0 1
                                                0 1 0 1 1].

Of course, the relation G′(H′)^T = 0 holds, but the relation G(H′)^T = 0 holds as well.
The corresponding trellis is shown in Fig. 8.3. It should be noticed that in this
example the used generator matrix G generates the code words where the control
bits are at the beginning, followed by the information bits. The corresponding code words
are c1 = (00000), c2 = (11001), c3 = (01010), c4 = (10011), c5 = (10100),
c6 = (01101), c7 = (11110), c8 = (00111). The minimal Hamming weight determines
the minimal Hamming distance, here dmin = 2.
Therefore, to construct the trellis, the corresponding parity-check matrix should be
found, providing for the trellis construction. When the parity-check matrix of a linear block
code is not known, the trellis structure can be determined from the generator matrix, if

Fig. 8.3 Trellis structure for the considered example (code (5, 3))

it is written in a Trellis Oriented Form (TOF). A generator matrix G is said
to be in TOF if the following two conditions are satisfied [70, 71]: the leading one in every
row of the matrix G is before the leading one in every next row (i.e. the row below
it), and no two rows have their last ones in the same column. In fact, the trellis oriented
generator matrix GTOGM is obtained by a generator matrix transformation as done in
the previous examples. The generator matrix of cyclic codes always has a trellis
oriented form.
The trellis provides for the application of the hard or soft Viterbi algorithm. Path metrics
are calculated as Hamming or (squared) Euclidean distances of the received
sequence u from the sequences v that could have been transmitted, taking into
account the trellis structure. Therefore, the decoding can be hard [by using the
Hamming metric (Problem 8.5)] or soft [by using the Euclidean metric with or without
quantization (Problems 8.5 and 8.6)]. The accumulated squared metric of every
state is calculated as follows

dE(u, v) = Σi=1..n (ui − vi)².

If the ML rule is applied by using the Viterbi algorithm, a generalized Viterbi
algorithm is obtained (Problem 8.6).
Consider now the first example in this section. Let the samples at the decoder
input be

0.4, 0.2, 0.5, 0.1, 0.2, 0.1 and 0.2,

forming one code word.
The decoding procedure is shown in Table 8.1. The procedure starts at the moment
when the paths join for the first time (here at depth 4). It is supposed that bit 0
corresponds to the sample value 0.0, and bit 1 to the sample value 1.0. In the first

Table 8.1 Viterbi decoding for the trellis shown in Fig. 8.2

          First step       Second step             Third step              Fourth step
Samples → 0.4 0.2 0.5 0.1  0.2                     0.1                     0.2
0–0:      0000: 0.46       0: 0.46 + 0.2² = 0.5    0: 0.5 + 0.1² = 0.51    0: 0.51 + 0.2² = 0.55
          0111: 1.86       1: 0.66 + 0.8² = 1.3    1: 1.3 + 0.9² = 2.31    1: 1.31 + 0.8² = 1.95
0–1:      1100: 1.26       0: 1.26 + 0.2² = 1.3    0: 1.3 + 0.1² = 1.31
          1011: 1.46       1: 1.06 + 0.8² = 1.7    1: 0.5 + 0.9² = 1.31
0–2:      1101: 1.26       0: 1.26 + 0.2² = 1.3
          1010: 1.46       1: 1.06 + 0.8² = 1.7
0–3:      0001: 0.46       0: 0.46 + 0.2² = 0.5
          0110: 1.86       1: 0.36 + 0.8² = 1.3
0–4:      1110: 2.06
          1001: 0.66
0–5:      0010: 1.26
          0101: 1.06
0–6:      0011: 1.26
          0100: 1.06
0–7:      1111: 2.06
          1000: 0.66

step between the nodes 0 and 0 (the first row in the table) the corresponding paths are 0000 and
0111.
For the path S0-S0-S0-S0-S0 the corresponding metric is

d = (0.4 − 0)² + (0.2 − 0)² + (0.5 − 0)² + (0.1 − 0)² = 0.46,

and for the path S0-S0-S6-S3-S0

d = (0.4 − 0)² + (0.2 − 1)² + (0.5 − 1)² + (0.1 − 1)² = 1.86.

The surviving path has the distance 0.46 (bold numbers). The procedure continues
in the next step, where paths from S0 and S4 enter the state S0 and the path is chosen
which has the smaller accumulated distance, taking into account as well the metrics of the
nodes from which the part of the path starts. In the third step there is a case where both
distances are equal (1.31) and one path is chosen arbitrarily. However, even if
the other path were chosen, it would not influence the last step. Therefore, the all-zeros
path (0000000) has survived and the information bits are (0000).
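The result can be checked by a brute-force ML search, which for such a short code examines all 2^k codewords; the trellis-based Viterbi procedure described above finds the same codeword with fewer operations. A minimal sketch (illustrative names) follows.

```python
# A minimal sketch of ML (minimum squared Euclidean distance) decoding of the
# (7, 4) code used above; the trellis-based Viterbi search returns the same
# codeword, it only avoids examining all 2^k candidates explicitly.
from itertools import product

G = [(1, 0, 0, 0, 1, 1, 1),
     (0, 1, 0, 0, 1, 1, 0),
     (0, 0, 1, 0, 1, 0, 1),
     (0, 0, 0, 1, 0, 1, 1)]

def encode(info):
    return tuple(sum(i * g for i, g in zip(info, col)) % 2
                 for col in zip(*G))

r = (0.4, 0.2, 0.5, 0.1, 0.2, 0.1, 0.2)          # received samples
best = min((encode(i) for i in product((0, 1), repeat=4)),
           key=lambda c: sum((ri - ci) ** 2 for ri, ci in zip(r, c)))
print(best)     # -> (0, 0, 0, 0, 0, 0, 0), i.e. information bits (0000)
```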
Soft decision is optimal if the samples at the channel output are from a continuous
set. In real systems this is not the case because the samples are quantized. Therefore,
it may be conceived that at the channel output there is a specific (finite) number of
discrete values. For binary transmission, the channel can usually be represented as a
discrete channel with two inputs and more outputs. The case with two outputs is a
binary channel and corresponds to hard decision. For a higher number of outputs
the limiting case is soft decision. The ML rule can be applied for such channels and if

the Viterbi algorithm is used, this procedure is usually named the generalized Viterbi
algorithm (Problem 8.6) [72]. The procedure is as follows:
1. Path metrics are defined by the trellis structure and by the transition probabilities
of the discrete channel

γi(S(i−1), S(i)) = P(Xi = x|S(i−1), S(i)) P(Yi|Xi = x),

where P(Xi = x|S(i−1), S(i)) denotes the probability that during the transition from the
state S(i−1) into the state S(i) the symbol x is emitted as the ith symbol at the encoder
output. It can be equal to 1 (if that transition exists in the trellis) or 0 (if it is not
allowed). The probability that at the channel output in the ith step the symbol Yi appears,
if the symbol x is at the channel input, is denoted by P(Yi|Xi = x). The number of symbols
is finite and these probabilities can be calculated and given by a table.
2. Node metrics are obtained in the following way—at the depth i = 0 only one
node (usually corresponding to the state S(0) = 0) is assigned the unity metric
(α0(S(0)) = 1), while for the other nodes the metric equals zero. At the ith trellis depth
the node metrics are calculated as follows:

αi(S(i)) = αi−1(S(i−1)) γi(S(i−1), S(i)) = ∏j=1..i γj(S(j−1), S(j)),  i = 1, 2, …, n.

More paths can enter a specific node and the corresponding
products are calculated for all possible paths. For the node metric the highest one is
chosen.
Consider the encoder defined in the second example in this section. Let the
probability of the binary symbols be 0.5. The encoded sequence is transmitted over a
discrete channel without memory. Let the channel output be quantized yielding the symbols
0, 0+, 1− and 1. Two output symbols are more reliable (0 and 1) and two are less reliable
(0+, 1−). The channel graph is shown in Fig. 8.4.
Let the message m = (000) be transmitted; the corresponding code word is
c1 = (00000). If y = (0+, 0, 1−, 0, 0) is obtained at the channel output, apply the
generalized Viterbi algorithm. The transition probabilities are given in Table 8.2.
Using the trellis shown in Fig. 8.3 the corresponding path metrics are found, and for
the transient regime they are shown in Fig. 8.5. The metrics can be easily found, e.g.

α3(2)′ = γ1(0, 0) γ2(0, 0) γ3(0, 2) = P(0+|0) P(0|0) P(1−|1) = 0.3 · 0.5 · 0.3 = 0.045,
α3(2)″ = γ1(0, 2) γ2(2, 2) γ3(2, 2) = P(0+|1) P(0|0) P(1−|0) = 0.15 · 0.5 · 0.15 = 0.01125.

The path which has the larger metric is chosen. These paths are denoted by heavy
lines, as well as their values.

Fig. 8.4 Graph of discrete channel (two inputs, four outputs)

Table 8.2 Transition probabilities for the received message

i                     1            2            3            4            5
Yi                    0+           0            1−           0            0
(P(Yi|0); P(Yi|1))    (0.3, 0.15)  (0.5, 0.05)  (0.15, 0.3)  (0.5, 0.05)  (0.5, 0.05)

Fig. 8.5 Trellis part corresponding to the transient regime

It can be noticed that after the third step the choice was between two paths
entering the state 0 (both metrics are 0.0225), and also between two paths entering
the state 1 (both 0.00225). In Fig. 8.5 the "upper" paths were chosen in both cases. The

complete corresponding trellis is shown in Fig. 8.6, where the heavy dashed line is
decoded, i.e. the word (00000). If the "lower" paths were chosen in both cases, the
corresponding complete trellis is shown in Fig. 8.7, i.e. the word (10100), resulting
in an error event. It should be stressed that both results are equally reliable.
As pointed out above, the generalized Viterbi algorithm is based on the Maximum
Likelihood (ML) rule. This rule does not take into account the a priori probabilities of
the symbols at the channel input. Similarly, the Viterbi algorithm is not an optimal
procedure if the symbol probabilities at the encoder input are unequal. To achieve the
optimal performance, the decisions should be based on the Maximum A Posteriori
Probability (MAP). Such an algorithm was proposed by Bahl, Cocke, Jelinek and
Raviv [73], known today as the BCJR algorithm. It takes into account the symbol
probabilities at the channel input. Further, this algorithm guarantees an optimal
decision for every system for which a trellis can be constructed.

Fig. 8.6 Complete trellis, the first variant

Fig. 8.7 Complete trellis, the second variant



Here the path (branch) metric is modified as follows:

γi(S(i−1), S(i)) = Σx∈Ax P(S(i)|S(i−1)) P(Xi = x|S(i−1), S(i)) P(Yi|Xi = x).

In this expression, the new parameter is P(S(i)|S(i−1)), the conditional probability
that in the ith step the system goes from the state S(i−1) into the state S(i). For binary codes,
this probability depends on the probabilities of 0 and 1 at the encoder input (not
taken into account in the Viterbi algorithm). For equiprobable input symbols, for
every state P(S(i)|S(i−1)) = 0.5. However, sometimes the path from the state S(i−1)
must go into only one state (then P(S(i)|S(i−1)) = 1.0). The second important difference
with respect to the Viterbi algorithm is that the node metrics are calculated in two
ways—"forward" (with increasing trellis depth) and "backward" (from the end of
the trellis).
1. Forward path metrics are calculated as follows:
– at the depth i = 0, only one node (usually corresponding to the state S(0) = 0) is assigned the unit
metric, while for the other nodes the metric equals zero;
– at the ith trellis depth the node metrics are calculated as follows

αi(S(i)) = ΣS(i−1)∈Sx αi−1(S(i−1)) γi(S(i−1), S(i)),  i = 1, 2, …, n,

where Sx is the set of all possible states. After finding all possible transitions from the
previous depth to the considered state, the metric of the starting state is
multiplied by the metric of the corresponding branch and all obtained results are
summed to obtain the node metric of the specific node.
2. Backward path metrics are calculated as follows:
– at the depth i = n, only one node (usually corresponding to the state S(n) = 0) is assigned the unit
metric, while for the other nodes the metric equals zero;
– at the depth i − 1 the node metrics are calculated as follows:

βi−1(S(i−1)) = ΣS(i)∈Sx βi(S(i)) γi(S(i−1), S(i)),  i = n, n−1, …, 1,

where Sx is the set of all possible states. Here the metric of the node which the branch
enters is multiplied by the metric of the corresponding branch and all obtained results
are summed to obtain the node metric of the specific node.
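A minimal sketch of the two recursions is given below, for a trellis described simply as a list of sections, each containing branches (s_prev, s_next, γ); the representation and the names are only illustrative.

```python
# A minimal sketch of the forward/backward node-metric recursions above, for a
# trellis given as a list of sections; each section is a list of branches
# (s_prev, s_next, gamma) with the branch metric gamma already computed,
# e.g. gamma = 0.5 * P(Yi | Xi = x) for equiprobable input bits.
def forward(sections, n_states, start=0):
    alpha = [[0.0] * n_states for _ in range(len(sections) + 1)]
    alpha[0][start] = 1.0
    for i, branches in enumerate(sections):
        for s_prev, s_next, gamma in branches:
            alpha[i + 1][s_next] += alpha[i][s_prev] * gamma
    return alpha

def backward(sections, n_states, end=0):
    beta = [[0.0] * n_states for _ in range(len(sections) + 1)]
    beta[len(sections)][end] = 1.0
    for i in range(len(sections) - 1, -1, -1):
        for s_prev, s_next, gamma in sections[i]:
            beta[i][s_prev] += beta[i + 1][s_next] * gamma
    return beta

# e.g. one section with two branches leaving state 0:
# alpha = forward([[(0, 0, 0.15), (0, 2, 0.075)]], n_states=4)
```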
Consider the previous example where y = (0+, 0, 1−, 0, 0). The corresponding trellis
is shown in Fig. 8.3. E.g. for the branch metrics at the depth i = 1, the following is obtained:

γ1(0, 0) = Σx∈{0,1} P(S(1) = 0|S(0) = 0) P(X1 = x|S(0) = 0, S(1) = 0) P(Y1|X1 = x)
         = 0.5 · 1 · P(0+|0) + 0.5 · 0 · P(0+|1) = 0.15,

γ1(0, 2) = Σx∈{0,1} P(S(1) = 2|S(0) = 0) P(X1 = x|S(0) = 0, S(1) = 2) P(Y1|X1 = x)
         = 0.5 · 0 · P(0+|0) + 0.5 · 1 · P(0+|1) = 0.075,

while γ1(0, 1) = 0 and γ1(0, 3) = 0, because the transitions from the state 0 into the
states 1 and 3 do not exist.
Node metrics (presented in Table 8.3) are calculated according to the following
example:

α1(2) = α0(0) γ1(0, 2) = 1 · 0.075 = 0.075,
β4(3) = β5(0) γ5(3, 0) = 1 · 0.05 = 0.05,

as shown in Fig. 8.8. In the case when more branches enter one node, the metrics
are only added. They are not compared, nor is only one branch chosen, as would be done
in the Viterbi algorithm.
Now, a further calculation is needed.
For every branch entering a specific node alone, as well as for that node,
the parameter λi(S(i)) is calculated as

λi(S(i)) = αi(S(i)) βi(S(i)).

From the trellis one finds to which symbol of the encoded sequence (at that depth)
this transition (i.e. node) corresponds, and the probability is calculated that in this
step the corresponding symbol has been emitted at the encoder output. If at the
considered depth only one branch enters every node, the probability that
binary zero is decoded at this (ith) depth is
Fig. 8.8 Node metrics calculation “forward” and “backward”



Table 8.3 List of metrics for all nodes "forward" and "backward"

                   t = 0      t = 1      t = 2      t = 3       t = 4      t = 5
Forward   αt(0)    1          0.15       0.0375     0.005625    0.002841   0.001455
          αt(1)    0          0          0.00375    0.0005625   0          0
          αt(2)    0          0.075      0.01875    0.007031    0          0
          αt(3)    0          0          0.001875   0.0007031   0.0007031  0
Backward  βt(0)    0.001455   0.004922   0.01912    0.25        0.5        1
          βt(1)    0          0          0.05625    0.025       0          0
          βt(2)    0          0.009562   0.03769    0.0025      0          0
          βt(3)    0          0          0.05625    0.025       0.05       0

Pi(0) = ΣAx0 λi(S(i)) / ΣAx λi(S(i)),

where the summation in the numerator is over all node metrics to which binary zero
corresponds in this step (xi = 0), and the summation in the denominator is over the
metrics of all branches entering any of the states.
If two branches enter the node, the following metric is calculated

σi(S(i−1), S(i)) = αi−1(S(i−1)) γi(S(i−1), S(i)) βi(S(i)).

From the trellis (at that depth) one finds to which symbol of the encoded sequence
the transitions correspond, and the probability is calculated that in this step the
corresponding symbol has been emitted at the encoder output. The probability that
binary zero is decoded at this (ith) depth is

Pi(0) = ΣAx0 σi(S(i−1), S(i)) / ΣAx σi(S(i−1), S(i)),

where the summation in the numerator is over the metrics of all branches to which
binary zero corresponds in this step (xi = 0), and the summation in the denominator is over
the metrics of all branches entering any of the states.
For the considered example, at the depths 1 and 2 these probabilities are

P1(0) = λ1(0) / (λ1(0) + λ1(2)) = 0.5072,
P2(0) = (λ2(0) + λ2(2)) / (λ2(0) + λ2(2) + λ2(1) + λ2(3)) = 0.9783,

meaning that at the first step the probability of zero is 0.5072, and at the second
0.9783.
The corresponding probabilities at the depths three and four are

P3(0) = [σ3(0,0) + σ3(1,1) + σ3(2,2) + σ3(3,3)] /
        [σ3(0,0) + σ3(1,1) + σ3(2,2) + σ3(3,3) + σ3(0,2) + σ3(1,3) + σ3(2,0) + σ3(3,1)] = 0.4927,

P4(0) = [σ4(0,0) + σ4(3,3)] / [σ4(0,0) + σ4(3,3) + σ4(1,0) + σ4(2,3)] = 0.9783.

The values λ(S(i)) and σi(S(i−1), S(i)) are shown in Fig. 8.9, where the probabilities
that zero is emitted in every step are given in the parentheses. On the basis of these
values the BCJR decoder could decide that the sequence (0, 0, 1, 0, 0) was emitted
(wrongly!). However, the trellis branches corresponding to the above sequence (fat
lines in the figure) do not form a path in the trellis.
The BCJR algorithm is completed, but the decision has still not been made. Now, the
valid code sequence should be found which is the nearest to the above estimation
(00100). One way to do it is to compare this estimation to all eight possible
code words and to choose the one having the minimum Hamming distance from it. In
this case there are two such words:
– the code word (00000) (Hamming distance d = 1), where according to BCJR its
estimation of the third bit being 1 is not so reliable (only 0.5072),
– the code word (10100) (Hamming distance d = 1), where according to BCJR its
estimation of the first bit being 1 is not so reliable (only 0.5072).

Pt(0) = (0.5072) (0.9783) (0.4928) (0.9783) (0.9758)

Fig. 8.9 Decoding procedure and estimation probabilities



Both estimations are equally unreliable and the final decision about the emitted
sequence has the same probability.
The obtained result should be expected, because the Viterbi algorithm is optimal
when the symbols at the discrete channel input are equally probable, which is
the case here. Because of that, there is no sense in using BCJR here. However, BCJR
has many advantages:
– when the input symbols are not equally probable, the Viterbi algorithm (an ML
algorithm) is not optimal, but the BCJR algorithm uses MAP and is optimal in
this case;
– the BCJR algorithm is suitable for iterative decoding, where on the basis of additional
information the estimations of the symbols can be changed, until they become
sufficiently large to make a reliable estimation.
Decoding using the MAP algorithm needs large memory and a substantial number
of operations including multiplication and exponentiation. As in the case of the Viterbi
algorithm, here instead of the metrics their logarithms can be used, yielding summation
instead of multiplication. Now the Log-MAP algorithm [74] is obtained (Problem
8.9), which has the same performance as BCJR. However, in this case
expressions of the type ln(e^L(d1) + e^L(d2)) appear. Here, the following exact
expression can be used

ln(e^d1 + e^d2) = max(d1, d2) + ln(1 + e^−|d1 − d2|),

where the second term should be successively calculated. If the second term is
neglected, a suboptimal procedure is obtained—the Max-Log-MAP algorithm [75]
(Problem 8.9), having smaller complexity. There also exist further modifications.
One is Constant-Log-MAP [76], where the approximation

ln(e^d1 + e^d2) ≈ max(d1, d2) + { 0, |d1 − d2| > T;  C, |d1 − d2| < T }

is used and where the value of the threshold T is found on the basis of a specific criterion.
The other is Linear-Log-MAP, where the following approximation is used

ln(e^d1 + e^d2) ≈ max(d1, d2) + { 0, |d1 − d2| > T;  a|d1 − d2| + b, |d1 − d2| < T }

and where the constants a and b are determined on the basis of a specific criterion.
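A minimal sketch of the exact correction ("max-star") and of its Max-Log-MAP simplification is given below; the function names are illustrative.

```python
# A minimal sketch of the Jacobian logarithm ("max-star") used in Log-MAP and
# of its Max-Log-MAP simplification; the correction term can also be read from
# a small lookup table instead of being computed exactly.
import math

def max_star(d1, d2):
    """Exact ln(exp(d1) + exp(d2)) = max + correction term."""
    return max(d1, d2) + math.log1p(math.exp(-abs(d1 - d2)))

def max_log(d1, d2):
    """Max-Log-MAP approximation: the correction term is neglected."""
    return max(d1, d2)

print(max_star(1.0, 0.5), max_log(1.0, 0.5))   # e.g. 1.474... versus 1.0
```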
The application of MAP (and Log-MAP) implies the knowledge of the
signal-to-noise ratio at the receiver input. It is needed for the metric calculation and for
the corresponding scaling of the information exchanged between the decoders in the
case of iterative decoding.
The Viterbi algorithm output is always "hard". Only the SOVA (Soft Output
Viterbi Algorithm) [77, 78] (Problem 8.7) makes it possible to find at the decoder
output, in parallel with the series of decoded bits, an estimation of the

reliability of the corresponding decisions. A variant that uses the Euclidean metric is
used, as well as a variant that uses the correlation metric. SOVA overcomes the
drawback of the classical Viterbi algorithm, where estimations of the reliability of the
code sequence reconstruction were not available. This procedure uses a good feature of
the Viterbi algorithm, that a decoded word must correspond to a complete trellis path,
i.e. it must be a codeword (BCJR does not have this feature). It is an ML procedure, and
the algorithm is still suboptimal with respect to BCJR. The SOVA algorithm can be
implemented for unquantized signal values as well, and can be used for channels which
are not discrete.
Similarly to BCJR, SOVA is a Viterbi algorithm modification where the complete
received sequence is considered and the decisions are based on forward and backward
recursions. The algorithm output is not only the hard decisions, but also the
estimations of the a posteriori probabilities of the encoded symbols xi, or their logarithmic
ratio—LLR (log likelihood ratio) [67]

Λ(xi) = log [Pr{xi = 1|r} / Pr{xi = 0|r}] = { li,c − lmin, xi = 1 in ML path
                                              lmin − li,c, xi = 0 in ML path },

where lmin is the metric (squared Euclidean distance) of the complete survived path and
li,c is the metric of the path of the strongest concurrent (competitor) in step i.
Consider the systematic linear block code (4, 3) yielding simple parity checks.
The generator and control matrices are

G = [1 0 0 1
     0 1 0 1
     0 0 1 1],   H = [1 1 1 1].

The corresponding trellis is shown in Fig. 8.10. In the same figure the decoding of
the sequence (0.8, 0.7, 0.1, 0.2) using the soft Viterbi algorithm is shown, as well as the
branch metrics and the survived path. The decoded sequence is (i1, i2, i3) = (1, 1, 0).
The SOVA algorithm is identical to the classic Viterbi algorithm until the moment
when the path which has the minimum Euclidean distance is found. To find the LLR,
the metric of every branch that separated from this path, using the expression

Fig. 8.10 "Classic" Viterbi algorithm



li,c = min over (l′, l) [lfi−1(l′) + mi(l′, l) + lbi(l)]

is found, where lfi−1(l′) denotes the "forward" metric of the state from which the
concurrent branch starts, mi(l′, l) is the branch metric at the depth i and lbi(l) is the
"backward" metric of the state which the concurrent branch enters. The minimization is
performed over all concurrent branches. At the first step the first term does not exist,
and at the last one the third term does not exist. Therefore, taking into account that the
decoded sequence is 1100:

i = 1: l1,c = 0.64 + 0.54 = 1.18 and Λ(x1) = l1,c − lmin = 1.18 − 0.14 = 1.04,
i = 2: l2,c = 0.04 + 0.49 + 0.65 = 1.18 and Λ(x2) = l2,c − lmin = 1.18 − 0.14 = 1.04,
i = 3: l3,c = 0.13 + 0.81 + 0.64 = 1.58 and Λ(x3) = lmin − l3,c = 0.14 − 1.58 = −1.44,
i = 4: l4,c = 0.94 + 0.64 + 0 = 1.58 and Λ(x4) = lmin − l4,c = 0.14 − 1.58 = −1.44.
The metrics according to the SOVA algorithm are shown in Fig. 8.11.
It is considered that the SOVA algorithm is about 1.5 times more complex than the
classical Viterbi algorithm. For very long code words, the decision based on the
whole received sequence needs too much decoder memory. However, it can be
substantially reduced by using a "sliding window" [68].
As mentioned earlier, to obtain the results promised by the Second Shannon
theorem, codes having long codewords should be used. However, the decoding
procedure for such codes can be very complex. In 1966, Forney proposed concatenated
codes where two encoders in series are used. The decoding procedure can be
performed iteratively, where the decoders interchange information. Further, in 1993
turbo codes [79] (Problem 8.10) were proposed, where the feedback between the
decoders is used to obtain a better estimation of the a posteriori probability of every bit
during the iterations [80]. This principle was previously applied in turbo engines, and the
name turbo codes was proposed. In the following only the basics of turbo coding will be
explained, while a numerical example is given in Problem 8.10.

Fig. 8.11 SOVA algorithm—metrics forward and backward



Fig. 8.12 Turbo encoding principle

One possible way to implement a turbo encoder is shown in Fig. 8.12. Recursive
systematic convolutional codes are supposed. Generally, different codes can be used.
The same information sequence is led into both encoders, but the interleaving is
applied in front of one of them. Interleaving is introduced to avoid the encoding of the same
sequence by both encoders. In such a way the packets of errors will be deinterleaved.
After the information bit sequence some "non-information" bits are used to reset the
encoders, usually into the all-zeros state ("tail biting"). The corresponding trellises are
terminated, and the decoding of the next information bit sequence is independent of
the previous sequence. Therefore, the turbo encoder is a linear block coder generating
long "code words", which is one of the goals in the construction of error control codes.
Let both encoders have code rate 1/2. The information bits are directly transmitted,
as well as the parity checks of the first encoder. From the second encoder only the parity
checks are sent, but for the interleaved sequence. In such a way, an equivalent code is
obtained which has code rate 1/3. It should be noted that the second encoder takes
into account only the information bits and not the control bits of the first encoder. It is
the difference with regard to serial concatenated encoding, where the outer encoder
"works" considering the complete output of the inner encoder.
In this example (code rate R = 1/3) for a sequence of N information bits, the
encoded sequence has 3N bits. From all possible 2^(3N) binary sequences, only 2^N
are valid code words. Therefore, it is a linear (3N, N) block code (N is the interleaver
length).
length).
Consider the turbo encoder shown in Fig. 8.13. The interleaver work is described by
(N = 16 bits)

I = { 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
      1  5  9 13  2  6 10 14  3  7 11 15  4  8 12 16 }

where the second row denotes the bit positions at the interleaver output and the first
the bit positions in the original sequence. The puncturing (explained in the previous
chapter) is supposed as well, and let P be the puncturing matrix.
chapter) is supposed as well and let P is the puncturing matrix.
Because of puncturing the code rate is R = 1/2 and the corresponding code has
parameters (32, 16) as illustrated in Fig. 8.14.

Fig. 8.13 Complete turbo encoder structure


Fig. 8.14 Turbo code as a linear block code

Table 8.4 Sequences at some points of the encoder from Fig. 8.13

i    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
m    1  1  0  1  1  0  1  1  0  1  1  1  0  1  0  0
m′   1  1  0  0  1  0  1  1  0  1  1  0  1  1  1  0
p1   1  0  1  0  1  0  1  0  1  0  1  1  1  1  0  0
p2   1  0  1  1  0  0  0  0  0  0  0  0  0  0  1  0
x1   1 −1  1 −1  1 −1  1 −1  1 −1  1  1  1  1 −1 −1
x2   1 −1  1  1 −1 −1 −1 −1 −1 −1 −1 −1 −1 −1  1 −1
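The given permutation is that of a 4 × 4 block interleaver (bits written row by row and read column by column); a minimal sketch (illustrative names) reproducing the sequence m′ from Table 8.4 is given below.

```python
# A minimal sketch of the 4 x 4 block interleaver defined above: bits are
# written into the matrix row by row and read out column by column, which
# reproduces the permutation 1, 5, 9, 13, 2, 6, 10, 14, ... for N = 16.
def block_interleave(bits, rows=4, cols=4):
    assert len(bits) == rows * cols
    matrix = [bits[r * cols:(r + 1) * cols] for r in range(rows)]
    return [matrix[r][c] for c in range(cols) for r in range(rows)]

m = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0]   # sequence m from Table 8.4
print(block_interleave(m))    # -> the interleaved sequence m' from Table 8.4
```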

Fig. 8.15 Turbo decoding principle

For the information sequence m of length 16, the control bits and the output bits (in polar
format) are given in Table 8.4. It is obvious that by varying the interleaver length
different codes can be obtained. The code rate does not change, but longer code words
will provide better error correction capabilities.
The turbo decoder (Fig. 8.15) has to make the best use of the redundancy introduced by the
encoder. The used decoders work on "their own" information bit sequences and the
corresponding deinterleaving and interleaving are used. The first decoder, on the
basis of its own input and control bits, estimates the information bits (e.g. using MAP) and
sends the estimations to the second decoder (here interleaving is needed). The second
decoder, on the basis of these estimations and its own control bits, sends its own
estimations to the first decoder (here the deinterleaving is needed). The first decoder now
makes new estimations taking into account these estimations as well as the control bits
and sends them to the second decoder. In this way the iterative procedure starts. In
such a way an attempt is made to extract all possible information introduced by the
encoders. After a few iterations, the second decoder makes hard decisions and sends
the decoded information bits to the output (after deinterleaving). The number of iterations and
the interleaving procedure are the subjects of investigation.
In Problem 8.10 a numerical example for the above considered turbo encoder (and decoder)
is given for the interleaver length N = 5.
Turbo codes are an interesting example of a specific area in information theory.
They were firstly introduced in practice, and a theoretical explanation (and even
understanding) came later [81]. Generally, it can be said [82] that during turbo
decoding the a posteriori probability for every bit of the information sequence is
calculated in iterations, to give the final decision at the end.

Problems

Problem 8.1 The Hamming code generator matrix is

G = [1 1 1 0 0 0
     1 0 0 1 1 0
     0 1 0 1 0 1],

(a) Is it possible to write the generator matrix in a systematic form, such that the
code is self-dual?
(b) Describe in detail a procedure to form the trellis diagram corresponding to the
systematic code.

Solution
One of the methods to construct a linear block code trellis, based on its parity-check
matrix structure, is described in the introductory part of this chapter. This method is
applied here to construct a trellis for the code whose matrix is given above. Only the
explanations necessary for the understanding of the applied procedure are given.
(a) The code has the parameters (6, 3), and it can be noticed that it is a code whose
construction is described in Problem 5.8, obtained by shortening of the Hamming code
(7, 4). The generator matrix of the systematic version can be easily found by
reordering the column positions, but the solution is not a unique one. The reader
should think over how many possible matrices exist for this code. Some of the
possible solutions are

Gs1 = [1 0 0 1 1 0; 0 1 0 1 0 1; 0 0 1 0 1 1],
Gs2 = [1 0 0 0 1 1; 0 1 0 1 1 0; 0 0 1 1 0 1],
Gs3 = [1 0 0 1 0 1; 0 1 0 1 1 0; 0 0 1 0 1 1], …

By a suitable matrix choice, the relation P^T = P (matrix P is symmetric) can be
achieved; then the code described by the generator matrix Gs is self-dual, and it is easy to
find the parity-check matrix

2 3 2 3
1 0 0 0 1 1 0 1 1 1 0 0
Gs ¼ 4 0 1 0 1 0 1 5 ¼ ½I 3 P ) H s ¼ ½P; I 3  ¼ 4 1 0 1 0 1 0 5:
0 0 1 1 1 0 1 1 0 0 0 1

The general rule to form a trellis by using the parity-check matrix is

S_t = S_{t−1} ⊕ x_t·h_t,

where h_t denotes the transposed t-th column of the parity-check matrix and x_t can take the values 0 or 1. The trellis part up to the depth t = 3 is formed by using the equations given below.

t = 1:
S1 = S0 ⊕ x1·h1 = (0 0 0) ⊕ x1·(0 1 1)
   = (0 0 0) for x1 = 0,  (0 1 1) for x1 = 1
t = 2:
S2 = S1 ⊕ x2·h2 = S1 ⊕ x2·(1 0 1)
(0 0 0) ⊕ x2·(1 0 1) = (0 0 0) for x2 = 0,  (1 0 1) for x2 = 1
(0 1 1) ⊕ x2·(1 0 1) = (0 1 1) for x2 = 0,  (1 1 0) for x2 = 1
t = 3:
S3 = S2 ⊕ x3·h3 = S2 ⊕ x3·(1 1 0)
(0 0 0) ⊕ x3·(1 1 0) = (0 0 0) for x3 = 0,  (1 1 0) for x3 = 1
(1 0 1) ⊕ x3·(1 1 0) = (1 0 1) for x3 = 0,  (0 1 1) for x3 = 1
(0 1 1) ⊕ x3·(1 1 0) = (0 1 1) for x3 = 0,  (1 0 1) for x3 = 1
(1 1 0) ⊕ x3·(1 1 0) = (1 1 0) for x3 = 0,  (0 0 0) for x3 = 1
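A small sketch (Python, using the matrices Gs and Hs written above) illustrates the recursion S_t = S_{t−1} ⊕ x_t·h_t: for every codeword the state sequence must terminate in the all-zeros state.

import itertools

G = [[1, 0, 0, 0, 1, 1],    # Gs of the self-dual (6, 3) code
     [0, 1, 0, 1, 0, 1],
     [0, 0, 1, 1, 1, 0]]
H = [[0, 1, 1, 1, 0, 0],    # Hs = [P I3]
     [1, 0, 1, 0, 1, 0],
     [1, 1, 0, 0, 0, 1]]
h_cols = list(zip(*H))      # columns h_1, ..., h_6

def state_sequence(codeword):
    """States visited while reading the codeword, S_t = S_{t-1} xor x_t*h_t."""
    s, states = (0, 0, 0), [(0, 0, 0)]
    for x, h in zip(codeword, h_cols):
        s = tuple(si ^ (x & hi) for si, hi in zip(s, h))
        states.append(s)
    return states

for info in itertools.product([0, 1], repeat=3):
    cw = [sum(i * g for i, g in zip(info, col)) % 2 for col in zip(*G)]
    assert state_sequence(cw)[-1] == (0, 0, 0)   # every codeword returns to S0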

In the introductory part of this chapter it is shown that for linear block codes the information bits at the encoder input uniquely determine the structure of the rest of the trellis. If one starts from the state S0 = (000), the following should be satisfied
Fig. 8.16 Trellis structure and the process of determining the parity-check bits (the path for the information bits i1 = 1, i2 = 0, i3 = 1 is marked, together with the resulting parity-check bits z1, z2, z3)

S3 = S2 ⊕ x3·h3 = (S1 ⊕ x2·h2) ⊕ x3·h3 = S0 ⊕ (i1·h1 ⊕ i2·h2 ⊕ i3·h3) = (z1 z2 z3).

The trellis for the information bits i1 = 1, i2 = 0, i3 = 1 is shown in Fig. 8.16.
To the information bits i1 = 1, i2 = 0, i3 = 1 the following trellis state corresponds

S3 = (000) ⊕ 1·(011) ⊕ 0·(101) ⊕ 1·(110) = (101),

but to the information bits i1 = 0, i2 = 1, i3 = 0 the same state corresponds as well

S3 = (000) ⊕ 0·(011) ⊕ 1·(101) ⊕ 0·(110) = (101).

As the states are the same after three steps for both sequences, it should be noticed that the parity-check bits are also the same for these information sequences

c1 = (101)·G = (101101),  c2 = (010)·G = (010101).

The codeword has the structure c = (i1 i2 i3 z1 z2 z3); therefore, the rest of the path is uniquely determined

S4 = S3 ⊕ z1·h4,  S5 = S4 ⊕ z2·h5,
S6 = S3 ⊕ z1·h4 ⊕ z2·h5 ⊕ z3·h6 = (z1 z2 z3) ⊕ z1·(100) ⊕ z2·(010) ⊕ z3·(001) = (000)
Fig. 8.17 Complete trellis structure for the Hamming code (6, 3) (depths 0–6, states S0 = (000) to S7 = (111))

and the trellis must terminate in the all-zeros state.
The complete Hamming code (6, 3) trellis structure is shown in Fig. 8.17. It should be noticed that the number of trellis states is determined by the number of parity-check (redundant) bits in a codeword, being in the general case 2^(n−k).
Problem 8.2 Determine the trellis structure for the Hamming code (8, 4).
Solution
Generator and parity-check matrices of the systematic Hamming code (8, 4) have the (one possible) form

G = [ 1 0 0 0 0 1 1 1 ]      H = [ 0 1 1 1 1 0 0 0 ]
    [ 0 1 0 0 1 0 1 1 ]          [ 1 0 1 1 0 1 0 0 ]
    [ 0 0 1 0 1 1 0 1 ]          [ 1 1 0 1 0 0 1 0 ]
    [ 0 0 0 1 1 1 1 0 ],         [ 1 1 1 0 0 0 0 1 ]

By applying the procedure described in the previous problem, the trellis structure up to the depth t = 4 is obtained and shown in Fig. 8.18; this part corresponds to the information bits of the codeword. The trellis state reached at this depth uniquely determines the parity-check bits and the rest of every single path in the trellis.
Problem 8.3 Form the trellis corresponding to the first-order Reed-Muller code with a codeword length of eight bits:
(a) starting from the parity-check matrix,
(b) starting from the generator matrix given in trellis oriented form.
Fig. 8.18 Hamming code (8, 4) trellis (depths 0–8, states S0 = (0000) to S15 = (1111))

Solution
On the basis of the theory exposed in Chap. 5 it is clear that it is Reed-Muller code
RM(8, 4), which can be described by a generator matrix [29, 30]
G = [ 1 1 1 1 1 1 1 1 ]
    [ 0 0 0 0 1 1 1 1 ]
    [ 0 0 1 1 0 0 1 1 ]
    [ 0 1 0 1 0 1 0 1 ].

The easiest way to form a trellis is to start from the fact that the RM(8, 4) code is self-dual, i.e. its generator matrix is at the same time its parity-check matrix

H = [ 1 1 1 1 1 1 1 1 ]
    [ 0 0 0 0 1 1 1 1 ]
    [ 0 0 1 1 0 0 1 1 ]
    [ 0 1 0 1 0 1 0 1 ].
Fig. 8.19 Trellis diagram with depth 8 for the RM code (8, 4); the states are denoted by using the parity-check matrix

Following the procedure described in the previous problems, the trellis is formed on the basis of this matrix and shown in Fig. 8.19. The trellis diagram of depth eight was not drawn in rectangular form; a notation similar to a state diagram was used instead. However, the horizontal axis here corresponds to successive time intervals as well.
(b) When the parity-check matrix of a linear block code is not known, the trellis can be formed on the basis of the generator matrix, if it is written in a Trellis Oriented Form (TOF). A generator matrix G is said to be in TOF if the following conditions are satisfied [70, 71]:
1. The leading one in every row of G is before the leading one of every next row, i.e. the row below it.
2. No two rows have their last ones in the same column.
The first nonzero element of a nonzero binary n-tuple v = (v0, v1, …, vn−1) is its leading one, and the last nonzero element is its last one.
Matrix G is not in TOF. Exchanging the second and the fourth row, the matrix G′ is obtained
G′ = [ 1 1 1 1 1 1 1 1 ]
     [ 0 1 0 1 0 1 0 1 ]
     [ 0 0 1 1 0 0 1 1 ]
     [ 0 0 0 0 1 1 1 1 ].

If the fourth row of matrix G′ is added to the first, second and third rows, the TOF matrix is obtained, called the trellis oriented generator matrix

GTOGM = [ 1 1 1 1 0 0 0 0 ]
        [ 0 1 0 1 1 0 1 0 ]
        [ 0 0 1 1 1 1 0 0 ]
        [ 0 0 0 0 1 1 1 1 ].

Now, for every row the interval from its leading one to its last one can be found, measured with respect to the bit positions in the row (starting from the zeroth position). If the leading one is at the i-th position and the last one at the j-th position, then the active time interval is [i + 1, j] if j > i, or ∅ for j = i. The following is obtained

sa(g0) = [1, 3],  sa(g1) = [2, 6],  sa(g2) = [3, 5],  sa(g3) = [5, 7].
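The rule for the active time intervals can be checked with a few lines of Python (using the GTOGM written above; this is only a bookkeeping helper, not part of the decoding itself).

G_TOGM = [[1, 1, 1, 1, 0, 0, 0, 0],
          [0, 1, 0, 1, 1, 0, 1, 0],
          [0, 0, 1, 1, 1, 1, 0, 0],
          [0, 0, 0, 0, 1, 1, 1, 1]]

def active_interval(row):
    ones = [pos for pos, bit in enumerate(row) if bit]
    i, j = ones[0], ones[-1]               # leading one and last one
    return None if i == j else (i + 1, j)  # empty interval when they coincide

for k, row in enumerate(G_TOGM):
    print(f"s_a(g{k}) =", active_interval(row))
# prints (1, 3), (2, 6), (3, 5), (5, 7), as above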

At every moment 0 ≤ i ≤ n a matrix Gsi can be defined, consisting of those GTOGM rows whose active time interval contains the moment i (in the previous problem the moments when the states change are denoted by dashed lines). This matrix is the basis for forming the trellis because:
– the ordinal number of a row appearing in matrix Gsi at the moment i + 1 (and not present at the moment i) defines the ordinal number of the current input information bit, denoted by a*,
– the ordinal number of a row existing in matrix Gsi at the moment i but omitted at the next moment defines the ordinal number of the oldest information bit stored in the encoder memory at that moment, denoted by a°,
– the subset of information bits corresponding to the rows of Gsi is denoted by Asi; the information bits from Asi define the encoder state at the moment i.
In Table 8.5 all previously defined notions are given for the RM code (8, 4), for every moment when the state changes. The corresponding trellis is shown in Fig. 8.20, where the states are denoted according to the last column of this table,

Table 8.5 The sets defining states and state marks for the trellis of depth eight for the RM code (8, 4)
 i   Gsi               a*   a°   Asi               State mark
 0   ∅                 a0   –    ∅                 (0000)
 1   {g0}              a1   –    {a0}              (a0 0 0 0)
 2   {g0, g1}          a2   –    {a0, a1}          (a0 a1 0 0)
 3   {g0, g1, g2}      –    a0   {a0, a1, a2}      (a0 a1 a2 0)
 4   {g1, g2}          a3   –    {a1, a2}          (0 a1 a2 0)
 5   {g1, g2, g3}      –    a2   {a1, a2, a3}      (0 a1 a2 a3)
 6   {g1, g3}          –    a1   {a1, a3}          (0 a1 0 a3)
 7   {g3}              –    a3   {a3}              (0 0 0 a3)
 8   ∅                 –    –    ∅                 (0000)
Fig. 8.20 Trellis diagram for the RM code (8, 4); the states are denoted by the information bit sets

Table 8.6 Data defining states and their marks for the BCH code (7, 4)
 i   Gsi               a*   a°   Asi               State
 0   ∅                 a0   –    ∅                 (000)
 1   {g0}              a1   –    {a0}              (a0 0 0)
 2   {g0, g1}          a2   –    {a0, a1}          (a0 a1 0)
 3   {g0, g1, g2}      a3   a0   {a0, a1, a2}      (a0 a1 a2)
 4   {g1, g2, g3}      –    a1   {a1, a2, a3}      (a1 a2 a3)
 5   {g2, g3}          –    a2   {a2, a3}          (a2 a3 0)
 6   {g3}              –    a3   {a3}              (a3 0 0)
 7   ∅                 –    –    ∅                 (000)

and the branches between two states carry the information either about the input information bit by which the state mark is supplemented, or about the oldest bit being omitted from the state mark. Although the marks are different from those in Fig. 8.19, it is clear that both trellises have the same shape, i.e. they are equivalent.
Problem 8.4 Form the trellis for the BCH code which corrects one error, with a codeword length of seven bits.
Solution
It is the BCH code (7, 4) with the generator polynomial g(x) = x³ + x + 1, analyzed previously in Problem 6.2. It should be noticed that a cyclic code generator matrix always has trellis oriented form
Fig. 8.21 Trellis diagram for the BCH code (7, 4)


G = [ g0 ]   [ g(x)    ]   [ 1 1 0 1 0 0 0 ]
    [ g1 ] = [ x·g(x)  ] = [ 0 1 1 0 1 0 0 ]
    [ g2 ]   [ x²·g(x) ]   [ 0 0 1 1 0 1 0 ]
    [ g3 ]   [ x³·g(x) ]   [ 0 0 0 1 1 0 1 ]
(column positions 0 1 2 3 4 5 6)

Furthermore, the procedure for determining the active time intervals of the rows is used; for cyclic codes the successive intervals always differ by one position: sa(g0) = [1, 3], sa(g1) = [2, 4], sa(g2) = [3, 5], sa(g3) = [4, 6]. Based on the data given in Table 8.6, the trellis diagram for the BCH code (7, 4) is shown in Fig. 8.21.
Problem 8.5 At the output of the systematic Hamming code (6, 3) encoder is the codeword c = (011011). This sequence is transmitted as a polar signal, while in the channel additive Gaussian noise is superimposed, yielding at the receiving end the signal y = (−1.1, −0.2, −0.1, −0.8, +0.9, +1.2), which is decoded and delivered to the user.
(a) Draw the block scheme of the transmission system.
(b) Decode the received codeword applying the Viterbi algorithm by using the Hamming metric.
(c) Decode the received codeword applying the Viterbi algorithm by using the Euclidean metric, supposing that the quantization error can be neglected.
(d) Decode the received codeword applying the Viterbi algorithm by using the Euclidean metric, if the uniform quantizer with q = 4 levels is used (−1.5; −0.5; 0.5; 1.5).

Solution
(a) The binary sequence generated by a source is firstly encoded by Hamming
code with parameters (6, 3), for which the trellis is formed in Problem 8.1.
Fig. 8.22 Block diagram of Viterbi decoding for linear block codes (source, block encoder, conversion 0 → −1, 1 → +1, channel with noise, then either a q-level quantizer and a Viterbi decoder with the Euclidean metric, or a two-level quantizer and a Viterbi decoder with the Hamming metric, destination)

Encoding is performed by multiplying all possible three-bit combinations by the code generator matrix, given in the same problem.
The encoded sequence is converted into a polar signal by assigning a pulse of amplitude +1 V to binary one and of amplitude −1 V to binary zero. To the transmitted signal x a white Gaussian noise with zero mean is added. To the received signal corresponds the sequence of real numbers

y = (−1.1, −0.2, −0.1, −0.8, +0.9, +1.2).

In the case of soft decoding, the sequence is first quantized (q levels) and then the Viterbi algorithm is applied by using the Euclidean metric, and the decoded sequence is delivered to the user. When the quantizer levels are (−1.5, −0.5, 0.5, 1.5), the boundaries of the quantizing intervals are (−∞, −1, 0, 1, ∞), yielding

yq = (−1.5, −0.5, −0.5, −0.5, +0.5, +1.5).

In the case of hard decoding, a two-level quantization is performed (the threshold is at zero). The quantized signal is then converted into a binary sequence, being in this case

r = (0, 0, 0, 0, 1, 1),

and after that sent to the Viterbi decoder using the Hamming metric. The complete system block scheme is shown in Fig. 8.22.
(b) In the case of Viterbi algorithm decoding by using the Hamming metric, it is understood that at the receiver input there is a decision block with one threshold (corresponding to a two-level quantizer). Therefore, at this block output only two voltage levels can appear, corresponding to binary one and binary zero.
The decoding is then performed at the bit level, where the notion of Hamming distance is used, defined as the number of positions where two binary sequences differ. Therefore, the Hamming distance between the received sequence r and the emitted codeword c is
Fig. 8.23 The first step during decoding of the (6, 3) code by the Viterbi algorithm using the Hamming metric

d(c, r) = Σ_{t=1}^{n} (c_t ⊕ r_t),

where ⊕ is the sign for modulo-2 addition, and the summation over t is carried out in the decimal system.
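For such a short code, the trellis search can be verified by exhaustive comparison of r with all 2³ codewords; a minimal sketch (Python), using the systematic generator matrix from Problem 8.1, is given below.

import itertools

G = [[1, 0, 0, 0, 1, 1],
     [0, 1, 0, 1, 0, 1],
     [0, 0, 1, 1, 1, 0]]
r = [0, 0, 0, 0, 1, 1]                       # hard decisions from part (b)

def encode(info):
    return [sum(i * g for i, g in zip(info, col)) % 2 for col in zip(*G)]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

best = min(itertools.product([0, 1], repeat=3),
           key=lambda info: hamming(encode(info), r))
print(best, encode(best))                    # (1, 0, 0) and codeword [1, 0, 0, 0, 1, 1]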
The trellis makes it possible to find the Hamming distances between the received sequence and all possible codewords with minimal complexity, using the maximum likelihood criterion. In the first step, the part of the trellis up to the moment when the transient regime is finished is observed, in this example at the trellis depth t = 3. The part of the received sequence up to this moment is compared to the corresponding paths, and of the two paths entering the same state, the one with the smaller Hamming distance is chosen. The same procedure is continued until all information bits are received, as shown in Fig. 8.23. Obviously, these steps fully correspond to Viterbi decoding of convolutional codes using hard decisions [67].
In the next steps, shown in Fig. 8.24, the previously eliminated branches are removed and the procedure continues in the part of the trellis where the remaining codeword bits are determined by the state reached after receiving the information bits. Because of that, the number of states is reduced and as a result a path is obtained starting from the all-zeros state and terminating in it. In this case the decoded sequence corresponds to the state sequence S0 → S3 → S3 → S3 → S3 → S1 → S0, and the reconstructed codeword and decoded information word are
Fig. 8.24 The second and the third step during decoding of the (6, 3) code by the Viterbi algorithm using the Hamming metric

ĉ = (100011)  ⇒  î = (100).

(c) The same example is considered, but the decoding is performed by the Viterbi algorithm using the Euclidean metric. The Euclidean distance is [63]

dE(y, x) = sqrt( Σ_{t=1}^{n} (y_t − x_t)² ),

but the square root operation is often not performed and the squared Euclidean distance is considered instead (this does not influence the relation between two distances). Because the quantization error is negligible, it will be supposed that quantization was not performed. The decoding procedure is shown in Fig. 8.25, where the branches eliminated in the first step are denoted by one sign, while those eliminated in the second and the third step are denoted by '+' and 'o', respectively.
The procedure of calculating the squared Euclidean distances is given in Table 8.7, where the metrics corresponding to the survived paths are shown in bold. As for the algorithm using the Hamming metric, the first decision is made at the trellis depth t = 3 (where the transient regime is finished) and the second decision is at the depth
Fig. 8.25 Decoding of the (6, 3) code by the Viterbi algorithm using the Euclidean metric (no quantization)

Table 8.7 The squared Euclidean distances calculation, no quantization
 path (t = 3)          bits       d² at t = 3   extension (t = 5)            extension (t = 6)
 0–0: S0-S0-S0-S0      −1 −1 −1   1.46          −1 −1: 1.46 + 3.65 = 7.30    −1: 5.11 + 4.84 = 9.95
      S0-S3-S6-S0      +1 +1 +1   7.06          +1 +1: 1.86 + 3.25 = 5.11    +1: 2.71 + 0.04 = 2.75
 0–3: S0-S3-S3-S3      +1 −1 −1   5.86          +1 −1: 2.26 + 6.85 = 9.11
      S0-S0-S5-S3      −1 +1 +1   2.66          −1 +1: 2.66 + 0.05 = 2.71
 0–5: S0-S3-S3-S5      +1 −1 +1   6.26
      S0-S0-S5-S5      −1 +1 −1   2.26
 0–6: S0-S0-S0-S6      −1 −1 +1   1.86
      S0-S3-S6-S6      +1 +1 −1   6.66
 (received samples: −1.1, −0.2, −0.1 | −0.8, 0.9 | 1.2)

t = 5. In contrast to hard decoding, with soft decoding the correct decision about the emitted sequence is made, as

ĉ = (011011)  ⇒  î = (011).

(d) If the quantization is performed before decoding by the Viterbi algorithm, the Euclidean distance is defined as
Table 8.8 Euclidean metrics calculation, q = 4 quantization levels
 path (t = 3)          bits       d² at t = 3   extension (t = 5)            extension (t = 6)
 0–0: S0-S0-S0-S0      −1 −1 −1   0.75          −1 −1: 0.75 + 2.50 = 3.25    −1: 3.25 + 6.25 = 9.50
      S0-S3-S6-S0      +1 +1 +1   10.75         +1 +1: 2.75 + 2.50 = 5.25    +1: 5.25 + 2.75 = 8.00
 0–3: S0-S3-S3-S3      +1 −1 −1   6.75          +1 −1: 2.75 + 4.50 = 7.25
      S0-S0-S5-S3      −1 +1 +1   4.75          −1 +1: 4.75 + 0.50 = 5.25
 0–5: S0-S3-S3-S5      +1 −1 +1   8.75
      S0-S0-S5-S5      −1 +1 −1   2.75
 0–6: S0-S0-S0-S6      −1 −1 +1   2.75
      S0-S3-S6-S6      +1 +1 −1   8.75
 (quantized samples: −1.5, −0.5, −0.5 | −0.5, 0.5 | 1.5)

Fig. 8.26 Decoding by the Viterbi algorithm using the Euclidean metric, q = 4 level quantization

dE²(yq, x) = Σ_{t=1}^{n} (y_qt − x_t)²;

the procedure in this case is given in Table 8.8, and the trellis and the order of path eliminations are shown in Fig. 8.26. It is obvious that the decoding is performed successfully, although at the depth t = 5 one error was made during the decision; it was corrected in the next step. Because the metric difference is smaller than in the previous case, it is clear that the decision made was less reliable.
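The result of part (d) can be confirmed by the same exhaustive search, now with the squared Euclidean metric over the quantized samples (a sketch equivalent to the trellis search for this short code).

import itertools

G = [[1, 0, 0, 0, 1, 1],
     [0, 1, 0, 1, 0, 1],
     [0, 0, 1, 1, 1, 0]]
yq = [-1.5, -0.5, -0.5, -0.5, 0.5, 1.5]      # quantized received samples

def encode(info):
    return [sum(i * g for i, g in zip(info, col)) % 2 for col in zip(*G)]

def sq_euclid(info):
    x = [2 * b - 1 for b in encode(info)]    # polar mapping 0 -> -1, 1 -> +1
    return sum((yt - xt) ** 2 for yt, xt in zip(yq, x))

best = min(itertools.product([0, 1], repeat=3), key=sq_euclid)
print(best)                                  # (0, 1, 1), i.e. the codeword (011011)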
Problem 8.6 A binary memoryless source emitting binary zero with the probability Pb(0) = 0.8 sends bits to the Hamming encoder defined by the generator matrix

G = [ 1 0 0 0 1 1 0 ]
    [ 0 1 0 0 1 0 1 ]
    [ 0 0 1 0 0 1 1 ]
    [ 0 0 0 1 1 1 1 ].

The source emits the bit sequence (0010) and the corresponding encoder output sequence enters a line encoder which generates a polar signal with amplitudes ±1 V. In the channel, noise is superimposed on the transmitted signal. At the receiver input, before the decoder, there is a four-level quantizer (−1.5; −0.5; 0.5; 1.5 V). The channel is memoryless and the transition probabilities from the input symbols to the quantizer output levels are given in Fig. 8.27.
At the receiver input (before the quantizer) the following signal is detected: y = (1.1, −1.2, 2.1, −1.8, 0.9, 1.9, 1.2).
(a) Reconstruct the bit sequence emitted by a source, if the decoding is performed
by using a generalized Viterbi algorithm.
(b) Repeat the previous procedure if the decoding is performed by BCJR
algorithm.
(c) Decode the received sequence by BCJR algorithm if the probabilities of source
symbols are not known.

Fig. 8.27 Transition probabilities for the channel and quantizer (from x = −1: 0.4, 0.3, 0.2, 0.1 to yq = −1.5, −0.5, +0.5, +1.5; from x = +1: 0.1, 0.2, 0.3, 0.4 to the same levels)

Fig. 8.28 Block diagram for linear block code decoding by using the trellis (the line encoder, channel and q = 4 level quantizer form an equivalent discrete memoryless channel in front of the Viterbi/BCJR decoder)
Table 8.9 Transition probabilities P(y|x) for the discrete memoryless channel
 x \ y    0      0+     1−     1
 0        0.4    0.3    0.2    0.1
 1        0.1    0.2    0.3    0.4

Solution
Similarly to the previous problem, it is suitable to first draw the system block scheme (shown in Fig. 8.28). The Hamming code (7, 4) is used and the generator matrix is known; therefore, to the information sequence i = (0010) corresponds the codeword c = (0010011).
At the encoder output two binary symbols can appear (0 and 1), while at the input of the Viterbi/BCJR decoder one of the four levels obtained by quantization can appear. If the logic symbols 0, 0+, 1− and 1 are assigned to the amplitude levels, the line encoder, channel and quantizer can be considered as an equivalent discrete memoryless channel. At this channel output two more reliable symbols (0 and 1) and two less reliable symbols (0+, 1−) can appear; the corresponding transition probabilities are given in Table 8.9. From the known channel output sequence one obtains

y = (1.1, −1.2, 2.1, −1.8, 0.9, 1.9, 1.2) → yq = (1.5, −1.5, 1.5, −1.5, 0.5, 1.5, 1.5)

and the sequence at the decoder input is r = (1, 0, 1, 0, 1−, 1, 1).


Because the generator matrix is given in a systematic form, the parity-check
matrix is easily obtained
2 3
1 1 0 1 1 0 0
H ¼ 41 0 1 1 0 1 0 5;
0 1 1 1 0 0 1

the trellis shown in Fig. 8.29 is easily obtained as well. It should be noticed that it is
slightly different from that one shown in Fig. 8.2.
(a) When the generalized Viterbi algorithm is used, the distances accumulated in the nodes are calculated on the basis of the relation [72]

α_t(S^(t)) = Π_{j=1}^{t} γ_j(S^(j−1), S^(j)),

where S^(j−1) denotes the starting state (trellis node at the depth j − 1) and S^(j) denotes the state where the branch terminates (trellis node at the depth j). The branch metrics γ_j(S^(j−1), S^(j)) are the same as the transition probabilities describing the channel, and for the decoder input sequence they are given in Table 8.10 (the value P(y_j|0) corresponds to the horizontal segments, denoted by a dashed line, while the probability P(y_j|1) corresponds to the segments
Fig. 8.29 Trellis of the systematic Hamming code (7, 4)

Table 8.10 Branch transition probabilities for the discrete memoryless channel
 t                      1           2           3           4           5           6           7
 y_t                    1           0           1           0           1−          1           1
 (P(y_t|0), P(y_t|1))   (0.1, 0.4)  (0.4, 0.1)  (0.1, 0.4)  (0.4, 0.1)  (0.2, 0.3)  (0.1, 0.4)  (0.1, 0.4)

denoted by a full line). Of the two paths entering one node, the one with the greater metric is chosen. The procedure of decoding by using the generalized Viterbi algorithm is shown in Fig. 8.30. The survived path, i.e. the procedure result, corresponds to the codeword c′ = (1010101) and to the information word i′ = (1010). It is obvious that the algorithm used did not result in a correct decoding of the sequence emitted by the source. In the continuation of the solution it will be shown that, by applying the BCJR algorithm, successful decoding is still possible.
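Since the generalized Viterbi algorithm accumulates products of channel transition probabilities, its decision can be cross-checked by an exhaustive maximum-likelihood search over all 2⁴ codewords (a sketch using the probabilities from Table 8.9).

import itertools

G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
P = {0: {'0': 0.4, '0+': 0.3, '1-': 0.2, '1': 0.1},   # P(y | x) from Table 8.9
     1: {'0': 0.1, '0+': 0.2, '1-': 0.3, '1': 0.4}}
r = ['1', '0', '1', '0', '1-', '1', '1']              # decoder input sequence

def encode(info):
    return [sum(i * g for i, g in zip(info, col)) % 2 for col in zip(*G)]

def likelihood(info):
    prob = 1.0
    for bit, y in zip(encode(info), r):
        prob *= P[bit][y]
    return prob

best = max(itertools.product([0, 1], repeat=4), key=likelihood)
print(best)        # (1, 0, 1, 0), the ML decision (a priori probabilities ignored)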
(b) The BCJR algorithm procedure is based on the following branch metric calculation [73]

γ_t(S^(t−1), S^(t)) = Σ_{x∈Ax} P(S^(t)|S^(t−1)) P(X_t = x|S^(t−1), S^(t)) P(Y_t|X_t = x),

depending, besides on the channel transition probabilities and the trellis structure (described by the probabilities P(S^(t)|S^(t−1))), also on the probabilities of the encoder input symbols. It should be noticed that in this problem at the depths t = 4, t = 6 and t = 7 the probability P(S^(t)|S^(t−1)) equals either 1 or 0. At the other trellis depths this probability is
Fig. 8.30 Decoding procedure by using the generalized Viterbi algorithm


P(S^(t)|S^(t−1)) = 0.8 if S^(t) = S^(t−1), and 0.2 if S^(t) ≠ S^(t−1).

The metrics are calculated going forward and backward from

α_t(S^(t)) = Σ_{S^(t−1)} α_{t−1}(S^(t−1)) γ_t(S^(t−1), S^(t)),  t = 1, 2, …, 7,
β_{t−1}(S^(t−1)) = Σ_{S^(t)} β_t(S^(t)) γ_t(S^(t−1), S^(t)),  t = 7, 6, …, 1,
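A generic sketch of these two recursions is given below; it assumes the branch metrics are already available as dictionaries gamma[t-1][(s_prev, s_next)] for t = 1, …, n, and that the trellis starts and terminates in the all-zeros state.

def forward(gamma, n):
    """alpha_t(s) = sum over s' of alpha_{t-1}(s') * gamma_t(s', s)."""
    alpha = [{0: 1.0}]
    for t in range(1, n + 1):
        a = {}
        for (sp, sn), g in gamma[t - 1].items():
            a[sn] = a.get(sn, 0.0) + alpha[t - 1].get(sp, 0.0) * g
        alpha.append(a)
    return alpha

def backward(gamma, n):
    """beta_{t-1}(s') = sum over s of beta_t(s) * gamma_t(s', s)."""
    beta = [dict() for _ in range(n)] + [{0: 1.0}]
    for t in range(n, 0, -1):
        b = {}
        for (sp, sn), g in gamma[t - 1].items():
            b[sp] = b.get(sp, 0.0) + beta[t].get(sn, 0.0) * g
        beta[t - 1] = b
    return beta

# sigma_t(s', s) = alpha_{t-1}(s') * gamma_t(s', s) * beta_t(s) then gives the bit probabilities.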

and the corresponding numerical values are given in Table 8.11. For the trellis depths t = 1, t = 2 and t = 4, the following is calculated

λ_t(S^(t)) = α_t(S^(t)) β_t(S^(t)),

from which the estimates are obtained

P1(0) = λ1(0) / (λ1(0) + λ1(6)),
P2(0) = (λ2(0) + λ2(6)) / (λ2(0) + λ2(6) + λ2(3) + λ2(5)),
P4(0) = (λ4(0) + λ4(3) + λ4(5) + λ4(6)) / (λ4(0) + λ4(3) + λ4(5) + λ4(6) + λ4(1) + λ4(2) + λ4(4) + λ4(7)).

According to the procedure described in the introductory part of this chapter, the following values are calculated

Table 8.11 Node metrics forward and backward for various trellis depths, BCJR algorithm
            t=0         t=1         t=2         t=3      t=4         t=5          t=6          t=7
 Forward
 α_t(0)     1           0.08        0.0256      0.00217  6.96×10^−4  1.523×10^−4  1.023×10^−4  4.33×10^−5
 α_t(1)     0           0           0           0        4.35×10^−5  2.176×10^−4  8.269×10^−5  0
 α_t(2)     0           0           0           0        4.35×10^−5  2.176×10^−4  0            0
 α_t(3)     0           0           0.0016      0.00217  6.96×10^−4  1.523×10^−4  0            0
 α_t(4)     0           0           0           0        4.35×10^−5  0            0            0
 α_t(5)     0           0           0.0016      0.00217  6.96×10^−4  0            0            0
 α_t(6)     0           0.08        0.0256      0.00217  6.96×10^−4  0            0            0
 α_t(7)     0           0           0           0        4.35×10^−5  0            0            0
 Backward
 β_t(0)     4.33×10^−5  3.17×10^−4  9.52×10^−4  0.0016   0.002       0.01         0.1          1
 β_t(1)     0           0           0           0        0.008       0.04         0.4          0
 β_t(2)     0           0           0           0        0.008       0.04         0            0
 β_t(3)     0           0           9.52×10^−4  0.0103   0.032       0.16         0            0
 β_t(4)     0           0           0           0        0.003       0            0            0
 β_t(5)     0           0           6.4×10^−4   0.004    0.012       0            0            0
 β_t(6)     0           2.24×10^−4  6.4×10^−4   0.004    0.012       0            0            0
 β_t(7)     0           0           0           0        0.048       0            0            0
σ_t(S^(t−1), S^(t)) = α_{t−1}(S^(t−1)) γ_t(S^(t−1), S^(t)) β_t(S^(t)),

and on this basis the values of the other bits are estimated

P3(0) = (σ3(0,0) + σ3(3,3) + σ3(5,5) + σ3(6,6)) / (σ3(0,0) + σ3(3,3) + σ3(5,5) + σ3(6,6) + σ3(0,3) + σ3(3,0) + σ3(5,6) + σ3(6,5)),
P5(0) = (σ5(0,0) + σ5(1,1) + σ5(2,2) + σ5(3,3)) / (σ5(0,0) + σ5(1,1) + σ5(2,2) + σ5(3,3) + σ5(4,0) + σ5(5,1) + σ5(6,2) + σ5(7,3)),
P6(0) = (σ6(0,0) + σ6(1,1)) / (σ6(0,0) + σ6(1,1) + σ6(2,0) + σ6(3,1)).

The numerical values of the probabilities that the corresponding bit of the estimated encoded sequence equals zero are

P1(0) = 0.5865, P2(0) = 0.9412, P3(0) = 0.3071, P4(0) = 0.9327,
P5(0) = 0.5628, P6(0) = 0.2362, P7(0) = 0.2362,

from which the sequence emitted by the encoder is estimated as c′ = (0010011). It is obvious that the second and the fourth bit are estimated very reliably, while the estimates of the first and of the fifth bit have smaller reliability. In Fig. 8.31 it is shown that a path corresponding to the estimated sequence exists in the trellis, and the decoder takes out the first four bits, considering that the sequence i′ = (0010) is valid. In this case, the decoding was successful.
(c) When the decoder has no information about the probabilities of the symbols emitted by the source, it is assumed that these symbols are equiprobable. The decoding procedure was repeated and the numerical values of the forward and backward metrics are given in Table 8.12. The probability that binary zero at the encoder input corresponds to the i-th trellis depth, denoted by Pi(0), has the numerical values

P1(0) = 0.3059, P2(0) = 0.8, P3(0) = 0.2, P4(0) = 0.8,
P5(0) = 0.4, P6(0) = 0.4118, P7(0) = 0.2,

and the estimated encoded sequence is c′ = (1010111). It is easy to verify that such a sequence does not exist in the trellis (the trellis part shown by the heavy line with arrows in Fig. 8.31). It is obvious that at the depth t = 6 one cannot find a segment corresponding to binary one at the input, because the parity-check bits are fully defined by the information bits. Therefore, the decoded word is finally
Table 8.12 Node metrics forward and backward for the BCJR algorithm (equiprobable input symbols)
            t=0         t=1         t=2          t=3      t=4         t=5           t=6           t=7
 Forward
 α_t(0)     1           0.05        0.01         0.0025   5×10^−4     1.375×10^−4   1.0625×10^−4  5.3125×10^−5
 α_t(1)     0           0           0            0        1.25×10^−4  5.125×10^−4   1.0625×10^−4  0
 α_t(2)     0           0           0            0        4.06×10^−4  2.3125×10^−4  0             0
 α_t(3)     0           0           0.01         0.0025   5×10^−4     1.375×10^−4   0             0
 α_t(4)     0           0           0            0        1.25×10^−4  0             0             0
 α_t(5)     0           0           0.0025       0.0081   1.625×10^−3 0             0             0
 α_t(6)     0           0.2         0.04         0.0025   5×10^−4     0             0             0
 α_t(7)     0           0           0            0        1.25×10^−4  0             0             0
 Backward
 β_t(0)     5.31×10^−5  3.25×10^−4  1.45×10^−4   0.0028   0.002       0.01          0.1           1
 β_t(1)     0           0           0            0        0.008       0.04          0.4           0
 β_t(2)     0           0           0            0        0.008       0.04          0             0
 β_t(3)     0           0           8.875×10^−4  0.00655  0.032       0.16          0             0
 β_t(4)     0           0           0            0        0.003       0             0             0
 β_t(5)     0           0           7×10^−4      0.0028   0.012       0             0             0
 β_t(6)     0           1.84×10^−4  7×10^−4      0.0028   0.012       0             0             0
 β_t(7)     0           0           0            0        0.048       0             0             0
Fig. 8.31 Procedure for decoding by using the BCJR algorithm for the known and the unknown a priori probabilities

c″ = (1010101),

identical to the one obtained by applying the generalized Viterbi algorithm.
Therefore, the following is shown:
(1) The Viterbi algorithm is an optimal procedure for the case when the symbols at the encoder input are equiprobable (the ML solution is equivalent to the MAP solution).
(2) The Viterbi algorithm, unlike the BCJR algorithm, does not yield an estimate of the reliability of the decoded bits. This estimate (for every single bit) is usually described by the log likelihood ratio (LLR)

Λ(x_t) = ln( Pr{i_t = 1|r} / Pr{i_t = 0|r} ),

and the estimate is more reliable as its absolute value is greater (a negative value indicates that zero was sent, a positive value that one was sent).
(3) The BCJR algorithm does not yield a correct solution if a sufficiently good estimate of the a priori probabilities is not available. The estimates obtained as a result of the BCJR algorithm are not optimal in this case.

Problem 8.7 The same transmission system from the previous problem is considered, but the quantizer output sequence enters a decoder whose procedure is based on the SOVA algorithm with the correlation metric. Explain the decoding procedure, reconstruct the transmitted codeword and find the corresponding reliability estimates.
Solution
The block scheme is practically the same as shown in Fig. 8.28, with the difference that instead of the BCJR decoder the Soft Output Viterbi Algorithm (SOVA) is used [77].
When the classic Viterbi algorithm is applied, after the decision which branches have survived, a series of decisions about the "better" paths results in the decoded bit sequence. Regardless of whether the metrics are calculated by using the Hamming or the Euclidean distance (or the channel transition probabilities, as in the generalized algorithm version), i.e. regardless of whether hard or soft decoding is used, the Viterbi algorithm output is always "hard". Only the SOVA algorithm makes it possible to obtain at the decoder output, in parallel with the series of decoded bits, an estimate of the reliability of the corresponding decisions.
The first step in the SOVA algorithm is the same as in any Viterbi algorithm variant using soft decisions—on the basis of the branch metrics the surviving branches are found. Usually a variant that uses the Euclidean metric is used, or a variant that uses the correlation metric calculated as follows [26]

d_K(c, y_q) = Σ_{t=1}^{n} (−1)^{c̃_t} y_qt,

where c̃_t represents the t-th bit of the estimated codeword, and y_qt denotes the t-th value of the quantized received signal. The procedure, for this case, is shown in Fig. 8.32.
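The correlation metric is easy to evaluate directly; a small sketch is given below, written with the polar mapping x_t = 2c_t − 1 (agreement with y_q then increases the metric), which is the sign convention that reproduces the numerical values quoted at the end of this solution.

yq = [1.5, -1.5, 1.5, -1.5, 0.5, 1.5, 1.5]      # quantized received sequence

def correlation(codeword, yq):
    x = [2 * c - 1 for c in codeword]           # polar mapping 0 -> -1, 1 -> +1
    return sum(xt * yt for xt, yt in zip(x, yq))

print(correlation([1, 0, 1, 0, 1, 0, 1], yq))   # 6.5 for the decoded word c'
print(correlation([0, 0, 1, 0, 0, 1, 1], yq))   # 5.5 for the transmitted word c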

Fig. 8.32 SOVA algorithm—calculation of the correlation metrics forward (quantized samples 1.5, −1.5, 1.5, −1.5, 0.5, 1.5, 1.5)


By comparing Fig. 8.32 to Fig. 8.30, it can first be noticed that the branches are eliminated in the same order and the decoded codeword is the same in both cases,

c′ = (1010101).

Although the metrics are calculated by using different formulas, it can be noticed that, for a fixed trellis depth, to mutually equal numerical metric values in Fig. 8.30 correspond mutually equal metric values in Fig. 8.32. For example, at the depth t = 3, to the states (000), (011) and (110) in Fig. 8.30 corresponds the survived branch metric 0.016, while in Fig. 8.32 to the same states corresponds the metric 1.5. Similarly, at the depth t = 5 to the metric 0.0013 for the generalized Viterbi algorithm corresponds the correlation metric 2.5, and at the depth t = 6 to the metric 7.68×10^−4 corresponds the metric 5.
In the book [26] it was shown that the Euclidean metric and the correlation metric are equivalent with respect to the decisions made when the channel noise is Gaussian, but the correlation metrics are easier to calculate and usually have smaller values. The advantage is obvious also with respect to the metrics calculated when the generalized Viterbi algorithm is applied—in Fig. 8.30 the metrics have very small values already after a few trellis depths, and for long codewords this approach yields metric values that are difficult to express with sufficient precision.
The second step in the SOVA algorithm is the backward metric calculation, similarly as for the BCJR procedure. As shown in Fig. 8.33, the node metrics in this case are accumulated from right to left. In this example to every horizontal

Fig. 8.33 SOVA algorithm, backward correlation metrics calculation


Fig. 8.34 SOVA algorithm, calculating the reliability estimates (forward/backward metric pairs are written at each node together with the branch metrics)

transition corresponds the branch metric −y_qt, while to the other transitions corresponds y_qt (the same was valid for the forward metrics, shown in Fig. 8.32). When calculating the backward metrics there is no need to make comparisons, decisions or path eliminations. The reader is advised to verify which path would survive in this case!
After the metrics are calculated in both directions, it is convenient to write them on one trellis diagram, together with the branch metrics. For the considered example, all elements needed for the next algorithm steps are shown in Fig. 8.34. On the basis of this figure it is easy to find the reliability of the bit decoded at the trellis depth t by using the following procedure.
1. For every branch starting from the depth t − 1, its starting node and the adjoined forward metric, denoted by δ^FW_{t−1}, are noted, as well as the incoming node at the depth t and its adjoined backward metric, denoted by δ^BW_t. If the branch metric is denoted by δ^BM_{t−1,t}, the corresponding numerical value for this branch can be calculated as

M_t = δ^FW_{t−1} + δ^BM_{t−1,t} + δ^BW_t,

which will be called the metric of the considered path.
2. All the branches in the trellis section from depth t − 1 to depth t corresponding to bit '0' at the encoder input (here these are the horizontal branches) are considered, and the one having the minimal path metric value is found; it is further denoted as M_t(0).
3. All the branches in the trellis section from depth t − 1 to depth t corresponding to bit '1' at the encoder input (the branches that are not horizontal) are considered, and the one having the minimal path metric value is found; it will be further denoted as M_t(1).
4. If bit '1' is decoded, the reliability of the estimate is greater as K_t = M_t(1) − M_t(0) is greater. If bit '0' is decoded, the reliability of the estimate is greater as K_t = M_t(1) − M_t(0) is smaller.

A sketch of this reliability computation is given below.
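This is a minimal sketch of steps 1–4, assuming the forward, branch and backward metrics have already been arranged per trellis section (the data structure used here is only illustrative).

def reliability(branches):
    """branches: list of (fw, bm, bw, bit) tuples for one trellis section,
    where fw/bw are the forward/backward node metrics and bm is the branch metric."""
    M0 = min(fw + bm + bw for fw, bm, bw, bit in branches if bit == 0)
    M1 = min(fw + bm + bw for fw, bm, bw, bit in branches if bit == 1)
    return M1 - M0          # K_t: large positive favours '1', large negative favours '0'

# depth t = 1 of this example: one '0' branch and one '1' branch
print(reliability([(0, -1.5, 7, 0), (0, 1.5, 5, 1)]))   # K_1 = 1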

In this case, for the depth t = 1 the following is obtained

M1(0) = 0 − 1.5 + 7 = 5.5,  M1(1) = 0 + 1.5 + 5 = 6.5  ⇒  K1 = M1(1) − M1(0) = 1 > 0

and because K1 is positive, it is clear that the probability that the first codeword bit is binary one is greater.
At the depth t = 2

M2(0) = min{−1.5 + 1.5 + 5.5; 1.5 + 1.5 + 3.5} = 5.5
M2(1) = min{−1.5 − 1.5 + 3.5; 1.5 − 1.5 + 3.5} = 0.5
⇒ K2 = M2(1) − M2(0) = −5 < 0

and it is obvious that the second decoded bit is binary zero; because the absolute value of the parameter K2 is substantially greater than for the first bit, the estimate of the second bit value is more reliable. This is in concordance with the results of the third part of the previous problem.
At the depth t = 3

M3(0) = min{0 − 1.5 + 2; 0 − 1.5 + 4; −3 − 1.5 + 2; 3 − 1.5 + 2} = −2.5
M3(1) = min{0 + 1.5 + 4; 0 + 1.5 + 2; −3 + 1.5 + 2; 3 + 1.5 + 2} = 0.5
⇒ K3 = M3(1) − M3(0) = 3 > 0

so the third decoded bit is binary one, and this decision is more reliable than for the first codeword bit and less reliable than for the second codeword bit.
At the depth t = 4

M4(0) = min{1.5 + 1.5 − 3.5; 1.5 + 1.5 + 2.5; 4.5 + 1.5 + 0.5; 1.5 + 1.5 + 0.5} = −0.5
M4(1) = min{1.5 − 1.5 + 3.5; 1.5 − 1.5 − 2.5; 4.5 − 1.5 − 0.5; 1.5 − 1.5 − 0.5} = −2.5
⇒ K4 = M4(1) − M4(0) = −2 < 0

and according to this criterion the decoded bit is binary zero, but the decision is relatively unreliable.
At the depth t = 5

M5(0) = min{3 − 0.5 − 3; 0 − 0.5 + 0; 3 − 0.5 + 0; 3 − 0.5 + 3} = −0.5
M5(1) = min{0 + 0.5 − 3; 6 + 0.5 + 0; 3 + 0.5 + 0; 0 + 0.5 + 3} = −2.5
⇒ K5 = M5(1) − M5(0) = −2 < 0

It is interesting that at this depth binary one is decoded, but the value of the parameter K5 (a negative one!) shows that this decision is very unreliable, and that it would be more logical for binary zero to be decoded here, but that is for some reason not possible (because of the trellis structure).
At the depth t = 6

M6(0) = min{2.5 − 1.5 − 1.5; 6.5 − 1.5 + 1.5} = −0.5
M6(1) = min{3.5 + 1.5 − 1.5; 2.5 + 1.5 + 1.5} = 3.5
⇒ K6 = M6(1) − M6(0) = 4 > 0

and at this depth binary zero is decoded, but the large value of the parameter K6 shows that this decision is extremely unreliable and almost surely wrong! Therefore, the possibility of inverting this bit should be considered.
Finally, at the depth t = 7

M7(0) = 5 − 1.5 + 0 = 3.5,  M7(1) = 5 + 1.5 + 0 = 6.5
⇒ K7 = M7(1) − M7(0) = 3 > 0

and one can consider that in this case it is relatively reliable that binary one was sent.
It is obvious that the fifth and the sixth bit are decoded with the smallest reliability, but with only their inversion the obtained bit combination is not a codeword. By additionally inverting the first bit (the next one according to the unreliability level), the codeword that was really sent would be obtained, i.e. c = (0010011). Although the procedure for obtaining this word is simple, it should still be noticed that the word decoded by SOVA, c′, has a higher correlation with the received word than the sent codeword c, as

d_K(c′, y_q) = y_q1 − y_q2 + y_q3 − y_q4 + y_q5 − y_q6 + y_q7 = 6.5,
d_K(c, y_q) = −y_q1 − y_q2 + y_q3 − y_q4 − y_q5 + y_q6 + y_q7 = 5.5,

i.e., the sent word is the next one, after the above reconstructed one, according to the reliability level.
Now, some important features of the SOVA algorithm can be noticed:
1. SOVA overcomes the drawback of the classic Viterbi algorithm, where estimates of the reliability of the reconstructed code sequence were not available. Therefore, this procedure is an advanced version of the one explained in the first part of the preceding problem.
2. This procedure retains the good feature of the Viterbi algorithm that a decoded word must correspond to a complete trellis path, i.e. it must be a codeword (BCJR does not have this feature). Just because of that, a slightly better solution is obtained than in the third part of the previous problem.
3. The a priori probabilities at the encoder input are not taken into account here either, so it is an ML procedure, and the algorithm is still suboptimal with respect to BCJR (the obtained solution is worse than the one in the second part of the previous problem).
4. The SOVA algorithm can also be implemented for unquantized signal values and can be used for channels which are not discrete. Later, it will be shown that the BCJR algorithm can be modified to be applied to such channels as well.

Problem 8.8 A system is considered whose transmitting side consists of a source emitting a series of equiprobable zeros and ones 010011…, a convolutional encoder whose structure is defined by the generator G(D) = [1 + D + D², 1 + D²] and a BPSK modulator. The signal power at the receiver input is Ps = 1 [µW] and the signaling rate is Vb = 1 [Mb/s]. In the channel, white Gaussian noise with power spectral density N0 = 10^−12 [W/Hz] is added. The signal samples at the channel output are

y = (−0.95, −1.2, −0.1, −0.05, −1.2, 1.01, 0.3, 1.13, −0.4, −16),

while in the receiver coherent BPSK demodulation is used as well as soft decision decoding. Decode the transmitted information sequence if one block consists of three information bits, for the following cases:
(a) The BCJR algorithm is used for the previously given parameters.
(b) The BCJR algorithm is used and it is supposed that in the channel the parameter N0 changes as N0 = 10^−13 [W/Hz], N0 = 10^−11 [W/Hz] and N0 = 10^−10 [W/Hz], for the same input sequence.
(c) The BCJR algorithm is used for N0 = 10^−11 [W/Hz], but the probabilities of the symbols emitted by the source are related as P(0) = 9P(1) or P(1) = 9P(0).

Solution
(a) The code rate is R = 1/2 and the decoding is performed over blocks of three information bits, so the decoder input sequence is separated into groups of ten samples. In this case the signal entering the decoder is not quantized and the channel does not have a finite number of output symbols (it is not discrete). Therefore, the transition probabilities cannot be found and the branch metrics are calculated according to the formula [73]
γ_t(S^(t−1), S^(t)) = C_t exp( u_t L(u_t)/2 ) exp( (L_c/2) Σ_{l=1}^{n} x_tl y_tl ),

where C_t does not affect the final result because it is canceled during the calculation of the Log-Likelihood Ratio (LLR) coefficients. L(u_t) is the a priori LLR for the t-th bit at the encoder input (u_t = 1 for i_t = 1 and u_t = −1 for i_t = 0), and for equiprobable symbols one obtains

L(u_t) = ld( P(u_t = +1) / P(u_t = −1) ) = 0.

L_c is defined as

L_c = 4a E_c/N_0 = 4aR E_b/N_0,

where E_c = R·E_b denotes the energy needed to transmit one bit of the codeword and E_b denotes the energy needed to transmit one bit of the information word. Let the instantaneous value of the gain (corresponding to channels with fading) be denoted by a. This value can change from one codeword to another (even from one bit to another inside the codeword, if the fading is fast).
In this problem fading is not supposed, so the value a = 1 is taken, and because C_t does not influence the result, C_t = 1 will be used. For these numerical values one obtains

E_c/N_0 = P_s/(N_0 V_b) = 1  ⇒  L_c = 4.

The branch metrics are now considerably simplified, becoming

γ_t(S^(t−1), S^(t)) = exp( 2 Σ_{l=1}^{n} x_tl y_tl ),

where n denotes the number of convolutional encoder outputs (here n = 2), defining the number of samples at the decoder input in one step. The corresponding trellis is shown in Fig. 8.35, and the corresponding metrics are
1. In the first step y_11 = −0.95 and y_12 = −1.2 are received, yielding:

γ_1(0, 0) = exp(2[(−1)·(−0.95) + (−1)·(−1.2)]) = 73.6998,
γ_1(0, 2) = exp(2[(+1)·(−0.95) + (+1)·(−1.2)]) = 0.0136.
Fig. 8.35 Trellis corresponding to the recursive convolutional encoder for the given input sequence (received samples −0.95, −1.2, −0.1, −0.05, −1.2, 1.01, 0.3, 1.13, −0.4, −16; branch labels input bit/(x_t1, x_t2))

2. In the second step y_21 = −0.1 and y_22 = −0.05 are received, yielding:

γ_2(0, 0) = exp(2[(−1)·(−0.1) + (−1)·(−0.05)]) = 1.3499,
γ_2(0, 2) = exp(2[(+1)·(−0.1) + (+1)·(−0.05)]) = 0.7408,
γ_2(2, 1) = exp(2[(−1)·(−0.1) + (+1)·(−0.05)]) = 1.1052,
γ_2(2, 3) = exp(2[(+1)·(−0.1) + (−1)·(−0.05)]) = 0.9048.

3. In the third step y_31 = −1.2 and y_32 = 1.01 are received, yielding:

γ_3(0, 0) = 1.4623, γ_3(0, 2) = 0.6839, γ_3(1, 0) = 0.6839, γ_3(1, 2) = 1.4623,
γ_3(2, 1) = 83.0963, γ_3(2, 3) = 0.012, γ_3(3, 1) = 0.012, γ_3(3, 3) = 83.0963.

4. In the fourth step y_41 = 0.3 and y_42 = 1.13 are received, yielding:

γ_4(0, 0) = 0.0573, γ_4(1, 0) = 17.4615, γ_4(2, 1) = 5.2593, γ_4(3, 1) = 0.1901.

5. In the last, fifth step y_51 = −0.4 and y_52 = −16 are received, yielding:

γ_5(0, 0) = 1.75×10^14, γ_5(1, 0) = 5.69×10^−15.

The forward metrics are calculated by using the same formula as in Problem 8.6, and starting from α_0(0) = 1 it is easy to calculate

α′_1(0) = α_0(0) γ_1(0, 0) = 73.6998,  α′_1(2) = α_0(0) γ_1(0, 2) = 0.0136.

To reduce the overflow/underflow probability, the forward metrics are scaled so that their sum equals one

α_1(0) = α′_1(0)/(α′_1(0) + α′_1(2)) = 0.9998,  α_1(2) = α′_1(2)/(α′_1(0) + α′_1(2)) = 0.0002.

The metrics corresponding to the depth t = 2 are

α′_2(0) = α_1(0) γ_2(0, 0) = 1.3496,  α′_2(1) = α_1(2) γ_2(2, 1) = 2.21×10^−4,
α′_2(2) = α_1(0) γ_2(0, 2) = 0.7404,  α′_2(3) = α_1(2) γ_2(2, 3) = 1.81×10^−4,

and the corresponding normalized forward metric values at the depth t = 2 are obtained by dividing the previous results by α′_2(0) + α′_2(1) + α′_2(2) + α′_2(3) = 2.0904, yielding

α_2(0) = 0.6455,  α_2(1) = 1.06×10^−4,  α_2(2) = 0.3543,  α_2(3) = 0.87×10^−4.
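The branch metric computation and the scaling used above can be sketched as follows (Lc = 4 and equiprobable input symbols are assumed, as in this example; the labels of the existing transitions would have to be taken from the trellis in Fig. 8.35).

import math

def branch_metric(x, y, Lc=4.0, L_apriori=0.0, u=+1):
    """gamma = exp(u*L(u)/2) * exp(Lc/2 * sum x_l*y_l) for one trellis branch."""
    return math.exp(u * L_apriori / 2) * math.exp(Lc / 2 * sum(xl * yl for xl, yl in zip(x, y)))

def normalize(metrics):
    """Scale the node metrics of one depth so that they sum to one."""
    s = sum(metrics.values())
    return {state: m / s for state, m in metrics.items()}

# first step of this example, y_1 = (-0.95, -1.2):
g00 = branch_metric((-1, -1), (-0.95, -1.2))    # 73.6998
g02 = branch_metric((+1, +1), (-0.95, -1.2))    # 0.0136
print(normalize({0: g00, 2: g02}))              # alpha_1(0) = 0.9998, alpha_1(2) = 0.0002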

In a similar way the backward metrics are calculated, starting from β_5(0) = 1; for the depth t = 4 this yields

β′_4(0) = γ_5(0, 0) β_5(0) = 1.75×10^14,  β′_4(1) = γ_5(1, 0) β_5(0) = 5.69×10^−15,

and the numerical values after normalization are (the values are exact to the twenty-ninth decimal!)

β_4(0) = β′_4(0)/(β′_4(0) + β′_4(1)) ≈ 1,  β_4(1) = β′_4(1)/(β′_4(0) + β′_4(1)) ≈ 0.

The backward metrics corresponding to the depth t = 3 are

β′_3(0) = γ_4(0, 0) β_4(0) = 0.0573,  β′_3(1) = γ_4(1, 0) β_4(0) = 17.4615,
β′_3(2) = γ_4(2, 1) β_4(1) = 0,  β′_3(3) = γ_4(3, 1) β_4(1) = 0,

and the corresponding normalized backward metric values are

β_3(0) = 0.0033,  β_3(1) = 0.9967,  β_3(2) = 0,  β_3(3) = 0;

the numerical values (normalized) of all forward and backward metrics are given in Table 8.13.
Table 8.13 Node forward and backward metrics for the BCJR algorithm
            t=0   t=1     t=2     t=3     t=4     t=5
 Forward
 α_t(0)     1     0.9998  0.6455  0.0306  0.9955  1
 α_t(1)     0     0       0.0001  0.9547  0.0045  0
 α_t(2)     0     0.0002  0.3543  0.0143  0       0
 α_t(3)     0     0       0.0001  0.0004  0       0
 Backward
 β_t(0)     1     0.9998  0.0001  0.0033  1       1
 β_t(1)     0     0       0       0.9967  0       0
 β_t(2)     0     0.0002  0.9998  0       0       0
 β_t(3)     0     0       0.0001  0       0       0

It is interesting to notice the paths of maximal forward and backward metrics:
– Starting from left to right, the optimal state sequence would be 0 → 0 → 0 → 1 → 0 → 0. Because the transition 0 → 1 at the third step is not possible in the trellis, the nearest paths are 0 → 0 → 2 → 1 → 0 → 0 (corresponding to the information sequence 010) and 0 → 0 → 0 → 0 → 0 → 0 (corresponding to the information sequence 000). The metrics are on average larger for the first variant.
– When starting from the trellis end towards the left, the optimal state sequence would be 0 → 2 → 2 → 1 → 0 → 0. However, the transition 2 → 2 at the second step is not possible in the trellis, so the nearest paths are 0 → 0 → 2 → 1 → 0 → 0 (corresponding to the information sequence 010) and the path 0 → 2 → 1 → 0 → 0 → 0 (corresponding to the information sequence 100). The metrics are larger for the first variant.
However, the final decision is made neither from the forward metrics nor from the backward metrics alone, but on the basis of the equality defined earlier in Problem 8.6

σ_t(S^(t−1), S^(t)) = α_{t−1}(S^(t−1)) γ_t(S^(t−1), S^(t)) β_t(S^(t)),

where this value should be normalized so that the sum over all transitions at a given depth is equal to one, in order to represent the transition probability from the state S^(t−1) into the state S^(t) for a known received sequence y. The corresponding LLR for the t-th information bit, given the received sequence y, can be found from the relation (non-normalized values can be used)

L(u_t|y) = ln( Σ_{R1} σ_t(S^(t−1), S^(t)) / Σ_{R0} σ_t(S^(t−1), S^(t)) ),

where R1 denotes the transitions in the trellis corresponding to information bit '1' (u_t = +1), while R0 denotes the transitions corresponding to information bit '0' (u_t = −1). Because the general trellis structure is the same in every step, it can be written
L(u_t|y) = ln( (σ_t(0,2) + σ_t(1,0) + σ_t(2,3) + σ_t(3,1)) / (σ_t(0,0) + σ_t(1,2) + σ_t(2,1) + σ_t(3,3)) ),

and if in some steps (at some trellis depths) some transitions do not exist, the corresponding probabilities are equal to zero and the previous equality is further simplified.
Finally, it is obtained

r1 ð0; 2; yÞ a0 ð0Þc1 ð0; 2Þb1 ð2Þ


Lðu1 jyÞ ¼ ln ¼ ln ¼ 17:035;
r1 ð0; 0; yÞ a0 ð0Þc1 ð0; 0Þb1 ð0Þ
r2 ð0; 2Þ þ r2 ð2; 3Þ
Lðu2 jyÞ ¼ ln ¼ 9:16;
r2 ð0; 0Þ þ r2 ð2; 1Þ
r3 ð0; 2Þ þ r3 ð1; 0Þ þ r3 ð2; 3Þ þ r3 ð3; 1Þ
Lðu3 jyÞ ¼ ln ¼ 17:035;
r3 ð0; 0Þ þ r3 ð1; 2Þ þ r3 ð2; 1Þ þ r3 ð3; 3Þ

while for a terminating part of trellis the expressions are slightly different, but as a
rule simplified (the corresponding encoder inputs are uniquely determined by
information sequence, but are not necessarily equal to all zeros sequence) yielding
here

r4 ð1; 0Þ þ r4 ð4; 2Þ r5 ð1; 0; yÞ


Lðu4 jyÞ ¼ ln ¼ 9:16; Lðu5 jyÞ ¼ ln ¼ 1;
r4 ð0; 0Þ þ r4 ð3; 2Þ r5 ð0; 0; yÞ

The LLR values and the branch metrics, with the path corresponding to the decoded codeword marked, are shown in Fig. 8.36.
The decoded sequence of information bits is i1 = 0, i2 = 1 and i3 = 0, where the second bit is decoded with smaller reliability than the other two. It should be noted that this code is in fact a block code (although realized by the convolutional encoder)
Fig. 8.36 The result of the BCJR procedure (LLR values and normalized transition probabilities along the trellis)


because at the encoder input there is a three-bit block to which two bits (tail bits) are added for the trellis termination, so that at the encoder output the corresponding sequence has ten bits. To the next information bits 001 correspond the terminating bits 11, while the line encoder in this case emits the sequence −1, −1, +1, +1, +1, −1, +1, −1, +1, +1.
(b) It can be noticed that the branch metrics depend on the channel signal-to-noise ratio and on the symbol probabilities at the encoder input, which can easily be seen if they are written in the developed form

γ_t(S^(t−1), S^(t)) = C_t exp( (u_t/2) ld( P(u_t = +1)/P(u_t = −1) ) ) exp( 2a (E_c/N_0) Σ_{l=1}^{n} x_tl y_tl ).

When P(u_t = +1) = P(u_t = −1) = 0.5, the first exponential term is equal to one, but the branch metrics still depend on E_c/N_0.
Table 8.14 clearly shows that in the considered example the correct decision is made in all analyzed cases, but for smaller values of the parameter E_c/N_0 the estimates become very unreliable. For example, when E_c/N_0 = −20 [dB], the normalized transition probabilities corresponding to the correct path are (0.504, 0.2673, 0.1839, 0.2191, 1), showing a small reliability of the decision if compared to the values from Fig. 8.36.
Here it is essential to notice that for Viterbi decoding (either with hard outputs or with soft outputs, i.e. SOVA) the decision does not depend in any way on the ratio E_c/N_0, nor on the signal-to-noise ratio!
(c) For the case when the symbols at the encoder input are not equiprobable, the BCJR algorithm can take this into account. Figure 8.37 shows the LLR estimates corresponding to the probabilities P(i_t = 1) = 0.1 and P(i_t = 1) = 0.9, as well as the paths corresponding to the decoded information words (including the terminating bits), for the case E_c/N_0 = 0 [dB]. The BCJR decoder favors those information sequences which are more probable, i.e., unlike the SOVA algorithm, it takes into account the source characteristics.

Problem 8.9 The system described in the previous problem is considered; the source parameters, convolutional encoder and modulator, as well as all numerical values, are the same as in the first part of the previous problem (Ps = 1 [µW], Vb = 1

Table 8.14 BCJR procedure results for various estimates of the ratio Ec/N0 in the channel
 Ec/N0 (dB)   Lc     LLR estimates:  t=1       t=2       t=3       t=4       t=5
 10           40                     −172.4    91.6      −172.4    91.6      −710
 0            4                      −17.035   9.16      −17.035   9.16      −70.99
 −10          0.4                    −1.4372   0.7533    1.4356    0.6696    −6.79
 −20          0.04                   −0.1021   −0.0238   −0.0756   −0.7199   −0.6582
Fig. 8.37 BCJR procedure results for different symbol probabilities at the encoder input (LLR values for P(i_t = 1) = 0.1 and P(i_t = 1) = 0.9, with the corresponding decoded paths)

[Mb/s], N0 = 10^−12 [W/Hz], equally probable symbols at the encoder input). Decode the emitted information sequence for the received sequence

y = (−0.95, −1.2, −0.1, −0.05, −1.2, 1.01, 0.3, 1.13, −0.4, −16),

if three information bits are transmitted in one block, and the decoding is performed by using
(a) the Log-MAP algorithm,
(b) the Max-Log-MAP algorithm.

Solution
The BCJR algorithm, although optimal, has some drawbacks:
– overflow easily happens during the calculation of the forward and backward metrics,
– a great number of multiplications has to be performed.
In the previous problem it was shown that the first drawback can be lessened by normalization of the calculated metrics after every step. However, both problems can be solved simultaneously by calculating logarithmic branch metric values

Γ_t(S^(t−1), S^(t)) = ln γ_t(S^(t−1), S^(t)) = ln C_t + (u_t/2) L(u_t) + (L_c/2) Σ_{l=1}^{n} x_tl y_tl,

so that the forward and backward metrics are obtained by a series of successive additions
A_t(S^(t)) = ln α_t(S^(t)) = max*_{S^(t−1)} { A_{t−1}(S^(t−1)) + Γ_t(S^(t−1), S^(t)) },
B_{t−1}(S^(t−1)) = ln β_{t−1}(S^(t−1)) = max*_{S^(t)} { Γ_t(S^(t−1), S^(t)) + B_t(S^(t)) },

where the maximization in the first case is performed over the set of all paths entering the state S^(t), while in the second case the choice is performed over the paths leaving the considered state. The way the maximization in the above equalities is performed is exactly the point where the two algorithms differ.
(a) For the Log-MAP algorithm [74] this operator can be replaced by the relation

max*{a, b} = max{a, b} + ln(1 + exp(−|a − b|)),

and the numerical values of the corresponding forward and backward metrics are given in Table 8.15. When calculating the metrics, the operation max* is always used with two arguments, so the above equality can be directly applied.
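The max* operation itself is a one-liner; a minimal sketch, together with its recursive extension to more than two arguments (mentioned below), is:

import math
from functools import reduce

def max_star(a, b):
    """max*(a, b) = max(a, b) + ln(1 + exp(-|a - b|)), used by Log-MAP."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def max_star_n(values):
    """Recursive application for more than two arguments."""
    return reduce(max_star, values)

# Max-Log-MAP simply approximates max_star(a, b) by max(a, b).
print(max_star(4.3, -4.3))    # about 4.3002, very close to the plain maximum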
When the forward and backward metrics are found, they can be used for finding the LLR estimates, because

L(u_t|y) = max*_{R1} { A_{t−1}(S^(t−1)) + Γ_t(S^(t−1), S^(t)) + B_t(S^(t)) } − max*_{R0} { A_{t−1}(S^(t−1)) + Γ_t(S^(t−1), S^(t)) + B_t(S^(t)) },

and because the Log-MAP algorithm is a logarithmic equivalent of the MAP algorithm, the LLR values are the same as in the first part of the previous problem

L(u_1|y) = −17.035, L(u_2|y) = 9.16, L(u_3|y) = −17.035, L(u_4|y) = 9.16, L(u_5|y) = −70.99.

Table 8.15 Node metrics forward and backward for the Log-MAP algorithm
            t=0       t=1       t=2       t=3        t=4        t=5
 Forward
 A_t(0)     0.0000    4.3000    4.6000    4.9801     11.2801    44.0801
 A_t(1)     −∞        −∞        −4.2000   8.4200     5.8812     −∞
 A_t(2)     −∞        −4.3000   4.0000    4.2203     −∞         −∞
 A_t(3)     −∞        −∞        −4.4000   0.5172     −∞         −∞
 Backward
 B_t(0)     44.0801   39.7801   30.3200   29.9400    32.8000    0.0000
 B_t(1)     0         0         29.5600   35.6600    −32.8000   −∞
 B_t(2)     0         31.3451   40.0800   −31.1400   −∞         −∞
 B_t(3)     0         0         31.2400   −34.4600   −∞         −∞
It should be noticed that the maximization in the above relation can also be performed for more than two arguments; this can easily be handled by recursive application of the corresponding operation (in every step two inputs are reduced to one output, which becomes an input for the next step).
(b) For the Max-Log-MAP algorithm [75] the approximation

max*{a, b} ≈ max{a, b}

is used; the corresponding numerical metric values are given in Table 8.16. When the signal-to-noise ratio is sufficiently large (here it is Ec/N0 = 10 [dB]) the differences in the metric values with respect to the BCJR algorithm are very small, while for worse channel conditions these differences grow.
The LLR estimates in this case can be found directly by using the relation

$$L(u_t|\mathbf{y}) = \max_{R_1}\left\{A_{t-1}(S^{(t-1)}) + \Gamma_t(S^{(t-1)},S^{(t)}) + B_t(S^{(t)})\right\} - \max_{R_0}\left\{A_{t-1}(S^{(t-1)}) + \Gamma_t(S^{(t-1)},S^{(t)}) + B_t(S^{(t)})\right\},$$

yielding

L(u1|y) = 17.24, L(u2|y) = 9.16, L(u3|y) = 17.24, L(u4|y) = 9.16, L(u5|y) = 44.08.

It is obvious that both log-MAP and max-log-MAP take into account the influence of the channel quality (through the parameter Lc, which depends on Ec/N0), as well as the probabilities of the binary symbols at the encoder input (through the parameter L(ut)),

$$L(u_t|\mathbf{y}) = F\{L(u_t), L_c y_t\},$$

Table 8.16 Node metrics forward and backward for max-log-MAP algorithm
t=0 t=1 t=2 t=3 t=4 t=5
Forward At(0) 0.0000 4.3000 4.6000 4.9800 11.2800 44.0800
At(1) −∞ −∞ −4.2000 8.4200 5.8800 −∞
At(2) −∞ −4.3000 4.0000 4.2200 −∞ −∞
At(3) −∞ −∞ −4.4000 0.0002 −∞ −∞
Backward Bt(0) 44.0800 39.7800 30.3200 29.9400 32.8000 0.0000
Bt(1) 0 0 29.5600 35.6600 −32.8000 −∞
Bt(2) 0 31.1400 40.0800 −31.1400 −∞ −∞
Bt(3) 0 0 31.2400 −34.4600 −∞ −∞

Fig. 8.38 Turbo encoder block diagram (block interleaver, two component RSC encoders with parity outputs p1 and p2, puncturing described by the matrix [0 1; 1 0], conversion into polar form 0 → −1, 1 → +1, and multiplexing)

which is their important advantage with respect to the SOVA algorithm. By this procedure, starting from the a priori LLR estimates of the information bit values at the encoder input, an improved estimate is obtained after the sequence y is received at the decoder input (more information is obtained!).
Problem 8.10 Consider turbo encoder having the structure shown in Fig. 8.38,
where the interleaving is performed over a block of N = 5 bits, according to the
scheme
 
$$I\{\cdot\} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 5 & 1 & 4 & 2 & 3 \end{pmatrix}$$

where the second row denotes the bit positions in the sequence obtained at the interleaver output, while the first row contains the ordinal numbers of the bits in the original sequence. The encoder output is led into the puncturing block (described by a puncturing matrix P), represented by polar pulses, and after multiplexing it is transmitted symbol-by-symbol over a channel with AWGN, resulting in Ec/N0 = 0 [dB].
(a) Find the sequence x emitted into the channel if the information bit sequence i = (10101) is at the encoder input. In particular, explain the procedure for obtaining the terminating bits and show on the trellis the paths corresponding to the code sequences at the outputs of the component encoders.
(b) Decode the transmitted information sequence if log-MAP algorithm is used
and at the turbo decoder input the received sequence is

y = (2, 1.5, 0.5, 0.4, 1.8, 0.3, 0.2, 0.13, 0.1, 0.1, 0.2, 0.15, 0.1, 0.08).

Solution
(a) In this case the turbo encoder is realized as a parallel concatenation of two recursive systematic convolutional (RSC) encoders, separated by a block interleaver. At the beginning it is supposed that both encoders are reset. For the input sequence i (determining the output x1) the parity-check outputs of the first and the second RSC encoder are found first, and after that they are punctured, so that at the output x2 the odd bits of sequence p1 and the even bits of sequence p2 are sent alternately. All sequences are given in detail in Table 8.17, and the sequence at the encoder output is

x = (+1, +1, −1, −1, +1, +1, −1, −1, +1, +1, −1, +1, +1, +1).
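A minimal Python sketch of the puncturing and multiplexing step, using the sequences listed in Table 8.17 (terminating bits included):

# systematic and parity sequences from Table 8.17
x1 = [1, 0, 1, 0, 1, 0, 1]
p1 = [1, 1, 1, 0, 1, 1, 1]
p2 = [0, 0, 1, 0, 0, 1, 0]

# puncturing: the odd positions (t = 1, 3, 5, 7) are taken from p1,
# the even positions (t = 2, 4, 6) from p2
x2 = [p1[t] if t % 2 == 0 else p2[t] for t in range(len(p1))]

# polar conversion (0 -> -1, 1 -> +1) and symbol-by-symbol multiplexing
x = []
for t in range(len(x1)):
    x.append(+1 if x1[t] else -1)
    x.append(+1 if x2[t] else -1)

print(x2)  # [1, 0, 1, 0, 1, 1, 1]
print(x)   # [1, 1, -1, -1, 1, 1, -1, -1, 1, 1, -1, 1, 1, 1]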

The sequences corresponding to the two RSC encoder outputs are also shown in Fig. 8.39, by heavy lines. It is obvious that to the bit sequence at the first RSC input, i = (10101), correspond the terminating bits 01, while to the sequence of information bits at the second RSC input (i.e. at the interleaver output, i′ = (11001)) correspond the terminating bits 10. The last two columns of Table 8.17 contain the sequences corresponding to the trellis termination that must be sent to the
Table 8.17 Sequences in some points of the encoder from Fig. 8.38
t 1 2 3 4 5 6 7
i 1 0 1 0 1 0 1
i′ 0 0 1 1 1 1 0
p1 1 1 1 0 1 1 1
p2 0 0 1 0 0 1 0
x1 1 0 1 0 1 0 1
x2 1 0 1 0 1 1 1

Fig. 8.39 Codewords forming in the two RSC encoders (paths shown by heavy lines)



encoder input to return (reset) it to the initial state and make it ready for the transmission of a new information block.
The decoding procedure is performed iteratively [68, 79]:
1. At the first decoder input:
– the sequence y1 is led, corresponding to the transmitted sequence x1 and containing the information about the sent information symbols. On this basis the series of symbols Lc y1 is calculated;
– by combining y1 and y2,p1 (corresponding to the part of the sequence x2 carrying the parity-check bits of the first RSC encoder, while the rest of the sequence is filled with zeros), a sequence y(I) is formed, and from it, by using the log-MAP algorithm, the improved estimate L1(ut|y) is found, and further

$$L_{e1}(u_t|\mathbf{y}) = L_1(u_t|\mathbf{y}) - L_2(u_t) - L_c y_1,$$

where at the beginning it is usually set L2(ut) = 0 (∀t), i.e. it is determined by the source characteristics.
2. At the second decoder input:
– the sequence y1′ is led, corresponding to the transmitted sequence x1 after the interleaver → Lc y1′;
– from the sequences y1′ and y2,p2 the sequence y(II) is formed, and the second decoder finds

$$L_{e2}(u_t|\mathbf{y}) = L_2(u_t|\mathbf{y}) - L_1(u_t) - L_c y_1',$$

where L1(ut) = I{Le1(ut|y)} is determined by the value obtained in the previous step.
As shown in Fig. 8.40, in the next step Le2(ut|y) is led through the deinterleaver and the a priori LLRs for the first decoder in the second iteration are found, because
Fig. 8.40 Turbo decoding principle



L2(ut) = I^{-1}{Le2(ut|y)}. This procedure is repeated successively until the codeword is decoded or until a given maximum number of iterations is reached. In this case, the path corresponding to the calculated values L1(ut|y) is shown in Fig. 8.41. The procedure of determining Le1(ut) for these values is shown below

y1        = (2,       0.5,     1.8,     0.2,     0.1,     0.2,     0.1)
L1(ut|y)  = (2.9806,  3.4712,  7.9516,  1.4854,  0.0031,  1.4803,  0.1193)
L2(ut)    = (0,       0,       0,       0,       0,       0,       0)
Lc y1     = (8,       2,       7.2,     0.8,     0.4,     0.8,     0.4)
Le1(ut)   = (5.0194,  1.4712,  0.7516,  0.6854,  0.4031,  0.6803,  0.2807)

Of course, the receiver does not know whether the heavy line in Fig. 8.41 really corresponds to the emitted information sequence. However, even here it can be concluded that the first bit of the information sequence is i1 = 1 (highly reliable!), although the corresponding received symbol is negative (−2). The rest of the bits cannot yet be reliably decoded, and the procedure is continued by leading Le1(ut) through the interleaver and sending it to the other RSC decoder as an estimate of the a priori LLR for the sequence y1′.

Fig. 8.41 The decoding procedure for the first RSC during the first iteration (the values L1(ut|y) = −2.9806, −3.4712, 7.9516, −1.4854, −0.0031, −1.4803, 0.1193 are shown above the trellis sections)

Fig. 8.42 The decoding procedure for the second RSC during the first iteration (the values L2(ut|y) = −2.6450, 0.1110, 0.5064, 7.9508, −2.9789, −0.1109, 0.4586 are shown above the trellis sections)

Table 8.18 Turbo decoders outputs during the first four iterations, L1(ut|y) and L2(ut|y)

                          t=1        t=2       t=3       t=4       t=5       t=6       t=7
The first RSC decoder
  iter = 1             −2.9806    −3.4712    7.9516   −1.4854   −0.0031   −1.4803    0.1193
  iter = 2              2.0350    −3.7935    9.1071   −0.8265    1.1469   −0.8789    1.1729
  iter = 5             17.5989    −6.7464   14.9302   −0.9586    4.4211   −0.9622    4.4210
  iter = 10            44.8247   −13.4653   26.7319   −1.9421   11.2654   −1.9421   11.2654
The second RSC decoder
  iter = 1             −2.6450     0.1110    0.5064    7.9508   −2.9789   −0.1109    0.4586
  iter = 2             −3.5272     0.4030    1.4530    9.2648   −1.8882   −0.4028   −0.9194
  iter = 5             −6.8240     0.0487    4.7568   15.2578   17.4199   −0.0487   −4.6431
  iter = 10           −13.2631    −0.9421   11.8841   27.3506   44.9234    0.9421  −11.6595

The calculation of L2(ut|y) is shown in Fig. 8.42 and the further procedure is

y1′       = (0.5,     0.2,     0.1,     1.8,     2,       0,       0)
L2(ut|y)  = (2.6450,  0.1110,  0.5064,  7.9508,  2.9789,  0.1109,  0.4586)
L1(ut)    = (1.4712,  0.6854,  0.4031,  0.7516,  5.0194,  0,       0)
Lc y1′    = (2,       0.8,     0.4,     7.2,     8.0,     0,       0)
Le2(ut)   = (10.0350, 1.7935,  1.9071,  0.0265,  0.7469,  0.0789,  0.7729)

and Le2(ut) is the set of a priori LLR estimates which is led to the input of the first decoder in the second iteration.
The estimates L1(ut|y) and L2(ut|y) after the first few iterations are given in Table 8.18. Erroneously decoded bits after every particular iteration are shaded in the table, and it is clear that already after the second iteration the first RSC decoder successfully decodes the full information sequence, while the second component decoder needs more than five iterations to eliminate the influence of the errors. It is obvious that the estimates become more and more reliable with additional iterations. It is easy to verify that the decoded sequence is i = (10101), which corresponds exactly to the sequence at the encoder input in the first part of this problem (and, as shown earlier, to the sequence i′ = (11001) at the interleaver output).
Chapter 9
Low Density Parity Check Codes

Brief Theoretical Overview

Besides the turbo codes, there is one more class of linear block codes that makes it possible to approach the Shannon bound. These are Low Density Parity Check (LDPC) codes. They were proposed by Gallager [83]. In principle they have a sparse parity-check matrix. In such a way the corresponding parity-check equations have a small number of terms, providing significantly smaller complexity compared to standard linear block codes. They allow iterative decoding with a linear complexity. Even then it seemed that this class of codes had good performance, but the hardware and software of the time were not suitable for their practical implementation. Using graph theory, Tanner in 1981 [84] proposed an original interpretation of LDPC codes, but his work was practically ignored for the next 15 years. Not until the mid-nineties did some researchers start to consider these codes and their decoding using graphs. Probably the most influential was the work of David MacKay [85], who demonstrated that performance very near to the Shannon limit can be achieved by using iterative decoding.
LDPC codes are linear block codes constructed by using control matrix H where
the number of nonzero components is small. The corresponding parity-check
equations, even for a long codeword have a small number of terms yielding sub-
stantially smaller complexity than the corresponding equations for typical linear
block code which has the same parameters. For binary codes, control matrix has a
great number of zeros and a few ones.
According to Gallager definition, the following conditions should be satisfied
yielding a code which has low density parity-checks:
1. Control matrix H has a fixed number of ones in every row, denoted by q.
2. Control matrix H has a fixed number of ones in every column. This number is denoted by c (usually c ≥ 3).


3. To avoid cycles in the corresponding bipartite graph (this will be explained later), any two columns (or rows) should not have ones coinciding at more than one position.
4. Parameters q and c should be as small as possible in relation to the codeword length n.
The code satisfying the above conditions is denoted as CLDPC (n, c, q). The control matrix has n columns and n−k rows, and the following must hold: cn = q(n−k). If the rows of matrix H are linearly independent, the code rate of the LDPC code is

R = 1 − c/q,

and the matrix density is

r = q/n = c/(n−k),

being for typical LDPC codes r ≤ 0.02, and a regular LDPC code (Problems 9.1, 9.2, and 9.6) is obtained. However, it is very difficult to construct such a code, especially because of condition 3. It was shown that for efficient LDPC codes the cycles cannot be avoided. Therefore, it is allowed that the number of ones in some rows (columns) differs from the prescribed values, and in this case c and q are the average values of these parameters. The corresponding code is called an irregular LDPC code (Problem 9.1).
As an example, consider the control matrix

H = [ 1 1 0 0 0 0
      0 0 1 1 0 0
      0 0 0 0 1 1 ].

The number of ones in every column is c = 1, and the number of ones in every row is q = 2. Therefore it is a regular LDPC code having the matrix density r = 1/3 = 2/6 = 0.333 and the code rate R = 1 − c/q = 1/2. The matrix has n = 6 columns and n−k = 3 rows. It is a code (n, k) = (6, 3).
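A short Python sketch (illustrative only) that checks the regularity and computes the density and code rate of the control matrix from this example:

H = [[1, 1, 0, 0, 0, 0],
     [0, 0, 1, 1, 0, 0],
     [0, 0, 0, 0, 1, 1]]

n = len(H[0])                 # number of columns (codeword length)
m = len(H)                    # number of rows (n - k)
row_w = [sum(row) for row in H]
col_w = [sum(H[i][j] for i in range(m)) for j in range(n)]

regular = len(set(row_w)) == 1 and len(set(col_w)) == 1
q, c = row_w[0], col_w[0]
density = q / n               # equivalently c / (n - k)
rate = 1 - c / q

print(regular, c, q, density, rate)   # True 1 2 0.333... 0.5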
Consider now the control matrix

H = [ 0 1 0 1 0 1 1 1 0 0 0 1
      1 0 1 1 0 0 0 0 1 0 0 0
      0 1 0 0 1 0 1 0 0 0 0 1
      1 0 0 1 0 0 0 0 0 1 1 0
      0 0 1 0 1 1 0 0 0 1 0 0
      1 0 1 0 0 0 1 1 0 0 1 0
      0 1 0 0 0 1 0 1 1 1 0 0
      0 0 0 0 1 0 0 0 1 0 1 1 ].

The number of ones in every column is c = 3, while the number of ones differs among the rows, with the average value q = 4.5. It is an irregular LDPC code which has the matrix density r = 3/8 = 4.5/12 = 0.375 and the code rate R = 1 − c/q = 1/3. The matrix has n = 12 columns and n−k = 8 rows; the basic code parameters are (n, k) = (12, 4). Therefore, the code rate can be found even if the number of ones is not the same in every row.
At the end, consider now a square matrix having c = q = 3 ones in every column and row

H = [ 1 1 0 1 0 0 0
      0 1 1 0 1 0 0
      0 0 1 1 0 1 0
      0 0 0 1 1 0 1
      1 0 0 0 1 1 0
      0 1 0 0 0 1 1
      1 0 1 0 0 0 1 ].

In any two columns (or rows) the ones do not coincide at more than one position and the matrix density is relatively small (r = 3/7 = 0.4286), so this matrix seems to satisfy the conditions to be an LDPC code control matrix. However, it should be noticed that the code rate here is R = 1 − 3/3 = 0!
Furthermore, the rows of this matrix are not linearly independent, so it cannot be the control matrix of a linear block code. By eliminating the linearly dependent rows, the following is obtained
HLN = [ 1 1 0 1 0 0 0
        0 1 1 0 1 0 0
        0 0 1 1 0 1 0
        0 0 0 1 1 0 1 ].

However, although the number of ones in every row is fixed (q′ = 3), the number of ones in the columns varies from 1 to 3 (the average value is c′ = 1.71). It may be seen that the number of matrix rows is (n−k) = 4, so a linear block code (7, 3) is obtained, having the code rate R = 1 − c′/q′ = 3/7.
For the last example, there is one more question—is this code regular or not? Obviously, the code which has the control matrix HLN (linearly independent rows) is irregular. But it can be obtained from matrix H, which satisfies the conditions for a regular code. Let the control matrix H be used for decoding instead of HLN, and let syndrome decoding be performed (it can be applied to every linear block code). If r is the received vector, the corresponding syndrome is S = rH^T. In this case seven-bit syndrome vectors will correspond to the received vectors, because their length is defined by the number of rows of matrix H. However, there are three linearly dependent rows in this matrix, so to all possible words would correspond not 2^n = 128 but only 2^(n−3) = 16 possible syndromes. Therefore, by using matrix H for decoding the same results will be obtained as by using matrix HLN, whose rows are linearly independent. The conclusion is that a code can be defined by a matrix having linearly independent rows, and the same code can also be defined by a matrix having linearly dependent rows, and it can be said that the code is regular!
Having the above in view, the construction of regular codes can be substantially
simplified if linear dependence of control matrix rows is allowed. Using this
approach the first (and relatively exact) method for construction of regular LDPC
codes was proposed by Gallager in his pioneering work [83]. The steps in control
matrix construction are:
– The desired codeword length (n) is chosen. The numbers of ones in the rows and columns are fixed, being q and c, respectively.
– A positive integer m = n/q is chosen, and the matrix to be constructed is divided into c submatrices of dimensions m × mq, where

H((j−1)·m+1 : jm, 1:n) = Hj(1:m, 1:n),  j = 1, 2, …, c.

– Now, the submatrix H1 is formed in such a way that in the ith row the positions from (i−1)q + 1 to iq (i = 1, 2, …, m) are filled by ones, so that the rows of H1 do not have common ones, and its columns contain not more than one.
– The other submatrices are obtained by permutations of the columns of matrix H1. The column permutations are chosen so that no two rows of the full matrix have ones in common at more than one position, and the same holds for the columns. The total number of ones in matrix H is mqc, and the matrix density is r = 1/m.
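A minimal Python sketch of this construction (here the column permutations are chosen at random for illustration only; the construction described above additionally requires a search so that no two rows or columns of the full matrix have more than one common position with ones):

import numpy as np

def gallager_H(n, q, c, seed=0):
    m = n // q
    # submatrix H1: the ith row contains ones at positions (i-1)q+1 ... iq
    H1 = np.zeros((m, n), dtype=int)
    for i in range(m):
        H1[i, i * q:(i + 1) * q] = 1
    # the remaining c - 1 submatrices are column permutations of H1
    rng = np.random.default_rng(seed)
    blocks = [H1] + [H1[:, rng.permutation(n)] for _ in range(c - 1)]
    return np.vstack(blocks)

H = gallager_H(n=20, q=4, c=3)
print(H.shape)          # (15, 20)
print(H.sum(axis=1))    # every row contains q = 4 ones
print(H.sum(axis=0))    # every column contains c = 3 ones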
For example, consider a control matrix which has the parameters n = 20, q = 4, c = 3 and m = 5. Firstly, the submatrix H1 of dimensions 5 × 20 is formed, and later, using a search, the other submatrices (H2 and H3) are found, fulfilling the above conditions. One possible control matrix H obtained by such a procedure is
H = [H1; H2; H3], where

H1 = [ 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
       0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
       0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
       0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0
       0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 ],

H2 = [ 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0
       0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0
       0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0
       0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0
       0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 ],

H3 = [ 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0
       0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0
       0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0
       0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0
       0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 ].

The matrix rank is 13, so of the 15 rows only 13 are linearly independent. The obtained code is (20, 7), the parameters are n−k = 13, the code rate is R = 0.35 and the matrix density is r = 0.2 (Problem 9.3).
On the basis of explained procedure, the conclusion can be drawn that Gallager
method for regular LDPC code construction is not fully deterministic. In fact, it is
not clearly defined how to perform the permutations in submatrices H2, H3,…, Hs,
but only the needed features. Such an approach usually results in codes having good
performance, but it is relatively difficult from all possible combinations to choose
LDPC code with a given codeword length n and code rate R = k/n, i.e. (n, k) code
having good characteristics. However, later the structured LDPC codes were pro-
posed where the control matrix structure can be found using a deterministic algo-
rithm. Some of these algorithms are based on Euclidean and projective geometry
[86, 87], cyclic permutations [88] or combined construction [89]. These problems
are also considered in papers and monographs [90, 91, 92].
In 1981 Tanner proposed an alternative way of considering the control matrix of an LDPC code, by using bipartite graphs [84] (Problems 9.1, 9.2, 9.4, 9.5, and 9.6). This approach provides an advanced decoding technique. To explain this method better, we will first consider two classical ways of decoding linear block codes. The decoding of any linear block code can be performed using the syndrome (Problem 9.2). In this approach, on the basis of matrix H and the relation vH^T = 0, where v = [v1, v2, …, vn] is a valid code word, a system of n−k equations with n unknown variables is first written. Every row of the control matrix defines one parity-check equation, and the positions of the ones in that row define the positions of the symbols in the equation.
On the base of this system, the bits of code word can be found. Tanner graph is
bipartite graph visualizing the relation between two types of nodes—variable nodes
(vj) denoting the sent symbols, and (parity-)check nodes (ci,)—nodes correspond-
ing to parity-checks which relate the emitted symbols (bits). In such way, for any
linear block code, if (i, j)th element of matrix H equals one, at the corresponding
bipartite graph, there is a line between variable node vj and (parity-)check node ci .
The state of a check node depends on the values of the variable nodes to which it is connected. A check node is said to be a child node of the variable nodes to which it is connected, and a variable node is a parent node for all check nodes connected to it.
As an example consider the Hamming code (7, 4) whose control matrix is (Problem 5.3)

H = [ 1 0 1 0 1  0   1
      0 1 1 0 0 (1) (1)
      0 0 0 1 1 (1) (1) ],

with the corresponding parity-check equations

c1: v1 + v3 + v5 + v7 = 0
c2: v2 + v3 + v6 + v7 = 0
c3: v4 + v5 + v6 + v7 = 0.

Fig. 9.1 Bipartite graph for Hamming code (7, 4)

If the received word is y = (0000011), the syndrome is formed on the basis of the rows of the control matrix, but instead of the code word bits, the received bits are written

s1 = y1 + y3 + y5 + y7 = 1
s2 = y2 + y3 + y6 + y7 = 0
s3 = y4 + y5 + y6 + y7 = 0,

and it is concluded that the error happened at the position determined by S = [s1 s2 s3], i.e. at position 1.


From graph shown in Fig. 9.1 it can be concluded that the variable node v6 is
connected with check nodes c2 and c3. Variable node v7 is connected (among
others) with the same check nodes—c2 i c3. These connections are the conse-
quences of binary ones written in control matrix H in parentheses. These connec-
tions in bipartite graphs are denoted by heavy lines forming 4-girth-cycle.
The described procedure can be applied to any linear block code, and to LDPC codes as well. Of course, in general, the decimal equivalent of the syndrome does not give the error position directly, i.e. the process of finding the error position is not finished. For longer code words (and practically interesting LDPC codes have control matrices of large dimensions, to provide a small density), a smaller decoding complexity is obtained by majority logic decoding (MLD) (mentioned in Chap. 7) (Problem 9.2). Let Al be the set of all rows of the matrix having a one at the lth position. Then, on the basis of the positions of ones in these rows, for every such set Al a set of s equations (chosen from the n−k) orthogonal to bit vl (l = 1, 2, …, n) can be written, i.e. in this set every symbol occurs only once except vl. If in these equations the code word bits are replaced by the received word bits, MLD can be used; it is possible to correct up to ⌊s/2⌋ errors, and the minimal Hamming distance is dmin ≥ s + 1. However, for small s this method is not especially effective, but the value of s for LDPC codes cannot be large.
A structural feature of the matrix H generated by the Gallager method is that in the rows orthogonal to one code word bit vl, every other code word bit appears only once. Because of that, if a regular LDPC code is constructed according to the above rules, there will be no cycles of length four in the graph. Moreover, during the code construction the aim is to eliminate all short cycles from the corresponding bipartite Tanner graph.

Consider the control matrix of a regular LDPC code

H = [ 1 1 0 1 0 0 0
      0 1 1 0 1 0 0
      0 0 1 1 0 1 0
      0 0 0 1 1 0 1
      1 0 0 0 1 1 0
      0 1 0 0 0 1 1
      1 0 1 0 0 0 1 ].

The set of equations containing the matrix rows having a one in the first column, denoted by A1, is

A1:  c1: v1 + v2 + v4 = 0
     c5: v1 + v5 + v6 = 0
     c7: v1 + v3 + v7 = 0,

and it should be noticed that in this set every symbol occurs once except v1. This set of equations is orthogonal to v1. The set of equations containing the matrix rows having a one in the second column, denoted by A2, is

A2:  c1: v1 + v2 + v4 = 0
     c2: v2 + v3 + v5 = 0
     c6: v2 + v6 + v7 = 0.

This set is orthogonal to v2.
If the sequence r = (0011100) is at the decoder input, the parity-checks for set A1 will be (1, 1, 1) and the first estimated bit of the code word is x1 = 1 (an error happened at that bit!). The parity-checks for A2 will be (1, 0, 0) and the second estimated bit of the code word is x2 = 0, because of the majority of zeros (there is no error at the second bit). The further procedure would show that the error happened only at the first transmitted bit, i.e. that the emitted code word is x = (1011100).
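A minimal Python sketch of these two majority checks, using the sets A1 and A2 and the received sequence from the example (bit positions are zero-based in the code):

r = [0, 0, 1, 1, 1, 0, 0]               # received sequence r = (0011100)

A1 = [(0, 1, 3), (0, 4, 5), (0, 2, 6)]  # checks c1, c5, c7 (orthogonal to v1)
A2 = [(0, 1, 3), (1, 2, 4), (1, 5, 6)]  # checks c1, c2, c6 (orthogonal to v2)

def majority_estimate(bit, checks, r):
    # every parity-check value is the XOR of the received bits it contains;
    # if the majority of the checks fail, the considered bit is flipped
    votes = [sum(r[j] for j in chk) % 2 for chk in checks]
    flip = sum(votes) > len(votes) / 2
    return votes, r[bit] ^ int(flip)

print(majority_estimate(0, A1, r))      # ([1, 1, 1], 1) -> x1 = 1
print(majority_estimate(1, A2, r))      # ([1, 0, 0], 0) -> x2 = 0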
The Tanner bipartite graph corresponding to this regular LDPC code is shown in Fig. 9.2. The weight of every node is defined by the number of edges entering it. For a bipartite graph where all nodes on each of its sides have the same weight, the code is said to be regular. It can easily be noticed that all variable nodes have the same weight c = 3, and that all check nodes also have the same weight q = 3. Also, in this case the Tanner graph does not contain cycles of length four, but it does contain cycles of length six. As stressed earlier, during construction the aim is to make such cycles as long as possible.
A simple procedure for the iterative decoding of LDPC codes was proposed by Gallager himself in his pioneering work [83]. As shown above, the information about a “successful” transmission is contained in the syndrome. If some of the syndrome components equal one, it means that some of the control sums defined by the matrix rows are not satisfied. Iterative decoding of LDPC codes based on hard decisions consists of

Fig. 9.2 Tanner graph for the considered example

complementing some of the received word bits, performed in iterations. The result is a decoded code word, corresponding to the all-zero syndrome.
The procedure used to find which bits should be complemented is usually called the bit-flipping (BF) algorithm (Problems 9.3, 9.4, and 9.10) and consists of the following steps:
1. By the procedure explained above, on the basis of the known matrix H, the parity-check sums are formed. For the bits of the received word y the syndrome is found using

S = yH^T.

If all syndrome components are zeros, it is concluded that the decoded code word is just the received vector y and the decoding is finished.
2. If the syndrome differs from zero, all control sums corresponding to nonzero syndrome bits are calculated. Let these sums form the set Z. Let the ith bit of the received word be denoted by yi. This bit appears only in some parity-checks of Z. The parity-check equations in which yi appears and which are not satisfied form the subset Z(yi) ⊆ Z.
3. From all received word bits (yi, i = 1, 2, …, n), the bits are found for which the subset Z(yi) has the largest number of elements. In other words, the received bits are found which take part in the largest number of unsatisfied parity-checks. If there are more such bits, all of them should be taken into account.
4. The received bit(s) found in the previous step are complemented (flipped). By this procedure a modified word y(1) is obtained, being the result of the first iteration.
5. The procedure from steps 1–4 is repeated iteratively for the vectors y(1), y(2), …, y(N), until all syndrome bits are equal to zero. The vector y(N) satisfying this condition is the code word decoded by the bit-flipping procedure.
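A minimal Python sketch of the described procedure (it can be checked on the examples that follow and on Problem 9.3):

import numpy as np

def bit_flipping(y, H, max_iter=20):
    y = np.array(y, dtype=int)
    H = np.array(H, dtype=int)
    for _ in range(max_iter):
        syndrome = H.dot(y) % 2
        if not syndrome.any():
            return y                      # all parity-checks satisfied
        unsatisfied = H[syndrome == 1]    # rows with nonzero syndrome bits
        counts = unsatisfied.sum(axis=0)  # failed checks in which each bit appears
        y[counts == counts.max()] ^= 1    # flip the most suspicious bit(s)
    return y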
It is considered today that by using the BF algorithm good performance can be achieved with a relatively small complexity of realization. The low-complexity parallel bit-flipping algorithm with a fixed threshold is especially important, as well as the simplest message-passing algorithm, known as Gallager-B (Problems 9.4 and 9.5).

Very popular are modifications of this algorithm, especially weighted BF algo-


rithms (weighted bit-flipping [93]), modified weighted BF (modified weighted bit-
flipping [94]) and weighted BF algorithm based on reliability ratio (reliability ratio
weighted bit-flipping [95]).
As an example consider the control matrix of a regular LDPC code with n = 16 columns and 12 rows, denoted by I–XII and grouped into three sets of parity-checks (I–IV, V–VIII and IX–XII); every row contains q = 4 ones and every column contains c = 3 ones (in the original display only the positions of the ones are denoted, while the zero positions are omitted).

It is obvious that the code word length is n = 16, the numbers of ones per row and per column are, respectively, q = 4 and c = 3, and the control matrix density is r = 0.25.
If the received word is y = (1100101000000000), then yHT = (000001101001).
In the syndrome at the positions VI, VII, IX and XII are ones. Therefore, the
following sums are not satisfied:
– VI control sum (where are 1, 2, 3 and 4 received word bits)
– VII control sum (where are 3, 7, 11 and 15 received word bits)
– IX control sum (where are 1, 6, 11 and 16 received word bits)
– XII control sum (where are 4, 5, 10 and 15 received word bits)
In the above control sums bits 1, 3, 4, 11 and 15 appear the most frequently (two
times) and the received word bits at these places are inverted yielding the word
y′ = (0111101000000010), to which now the syndrome y′HT = (010101101001)
corresponds. Now ones in syndrome are at the II, IV, VI, VII, IX and XII position.
It is obvious that the parity-checks are not satisfied. Now, in control sums bits 6 and
15 appear the most frequently (three times). Bit 6 appears in sums II, VI and IX, and
bit 15 in sums IV, VII and XII, while the other bits appear at most two times. By
inverting bits 6 and 15, the second estimation of the transmitted code word is

y″ = (0111111000000000), yielding y″HT = (000000000000), and y″ is the decoded code word.
Besides the algorithm explained above, Gallager in his work also exposed the basic idea of iterative LDPC decoding based on soft decisions. He showed that by using such algorithms substantially better performance can be obtained, where the number of operations per bit in every iteration does not depend on the block length. The proposed algorithm is based on MAP, later known as belief propagation [83, 96] (Problem 9.6). This method is based on the solution of equations on a graph, yielding as a result the exact values of the a posteriori probabilities if the graph has no cycles. The graphs corresponding to LDPC codes usually have cycles, and this algorithm does not guarantee correct decoding. In spite of this, the Gallager iterative algorithm achieves good results in practice, because correct decoding can most frequently be performed even if the estimated a posteriori probabilities are not exact. The iterative version of the belief propagation algorithm suitable for practical implementation is here usually called the sum-product algorithm (SPA) [25, 96] (Problems 9.6 and 9.10). This algorithm is based on message passing between the Tanner graph nodes. During the sum-product algorithm, every connected variable node dj and check node hi exchange information according to the following iterative procedure:
estimations of received symbols fjx are chosen. These probabilities in principle
depend on the type of the channel used. For a channel with only additive white
Gaussian noise, they are chosen as

1 2 1 2
fj0 ¼ pffiffiffiffiffiffi eðrj þ 1Þ =ð2r Þ ; fj1 ¼ pffiffiffiffiffiffi eðrj 1Þ =ð2r Þ
2 2
j ¼ 1; 2; . . .; n;
2pr 2pr

where rj denotes the jth bit of the received word. In fact, fj0 denotes a priori
probability that (logical) zero (dj = 0) was emitted, and fj1 denotes a priori
probability that (logical) one (dj = 1) was emitted. These values are the initial
probability estimations that variable nodes are in state 0 or 1. These estimation
are sent to check nodes where the symbol vj appears.
2. After the check node ci receives from all connected variable nodes vj the probability estimates Qxij (for x = 0 and x = 1), the probabilities are calculated that the ith parity-check equation is satisfied if vj is in the state x (corresponding to the value of symbol vj, which can be 0 or 1). These probabilities are

$$R_{ij}^x = \sum_{\mathbf{v}: v_j = x} P(c_i|\mathbf{v}) \prod_{k \in N(i)\setminus j} Q_{ik}^{v_k},$$

where N(i) is the set of indices of all parent nodes connected to the check node ci, and N(i)\j is the same set but without the node vj to which the information is sent. The term P(ci|v) denotes the probability that the parity-check equation is satisfied, and

the summing is performed over all possible decoded vectors v for which the parity-check equation is satisfied when the informed node is in the state x.
3. At the variable node dj, after receiving from all connected check nodes the values of the probabilities Rxij (for x = 0 and x = 1), the probabilities that this variable node is in the state x (which can be 0 or 1) are corrected. These probabilities are

$$Q_{ij}^x = \alpha_{ij}\, f_j^x \prod_{k \in M(j)\setminus i} R_{kj}^x,$$

where M(j)\i is the set of indices of all check nodes connected to the node vj, without the node ci to which the information is sent. The coefficient αij is a normalization constant chosen so that the condition Σx Qxij = 1 is satisfied.
This process, shown in Fig. 9.3, is repeated iteratively and stops when it yields a vector d for which the corresponding syndrome is the zero vector. If the zero syndrome is not obtained, the process stops after some predefined number of iterations. In both cases the decoder yields optimally decoded symbols in the maximum a posteriori probability sense, but if the syndrome is not equal to zero, the emitted code word is not decoded.
Consider regular LDPC code and the corresponding Tanner graph from the
above example. The channel is with additive white Gaussian noise only. The
procedure starts by calculating a priori probabilities f10 ; f11 ; f20 ; f21 ; . . .; f70 ; f71 sent to
the check nodes. To the first check node the initial information from the first,
second and fourth variable node Qxij ¼ fjx are sent, concerning the probability
that the corresponding node is in the state 0 or 1. Now, the first check node has
to return some information to every of connected variable nodes. They are
different.
The parity-check equation for the first check node is v1 + v2 + v4 = 0. The coefficient $R_{11}^{0}$ is the estimate sent by the check node 1 to the variable node 1. It is calculated
Fig. 9.3 The exchange of information between variable and check nodes

under the assumption that the corresponding parity-check equation is satisfied if the first bit in the code word, v1, is zero. In this case the number of ones in the equation should be even, and there are only two combinations of the other bits entering the parity-check for the first check node which satisfy this condition, yielding

$$R_{11}^{0} = Q_{12}^{0} Q_{14}^{0} + Q_{12}^{1} Q_{14}^{1}.$$

The coefficient $R_{11}^{1}$ is also an estimate which c1 sends to v1, but it is calculated by supposing that the parity-check equation is satisfied when v1 = 1. In this case the number of ones should be odd and there are two combinations of the bits v2 and v4 satisfying this condition, yielding

$$R_{11}^{1} = Q_{12}^{0} Q_{14}^{1} + Q_{12}^{1} Q_{14}^{0}.$$

This step of the iteration process is shown in Fig. 9.4. When all values of the coefficients $R_{ij}^0$ and $R_{ij}^1$ are calculated, the first estimate of the message symbols is made. The corresponding vector $\hat{\mathbf{v}}$ is calculated as follows

$$\hat{v}_j = \arg\max_x f_j^x \prod_{k \in M(j)} R_{kj}^x,$$

and if the syndrome for this vector is not zero, the procedure continues by correcting the probabilities $Q_{ij}^x$ joined to the corresponding variable nodes.
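A minimal Python sketch of one check-node update for the parity-check c1: v1 + v2 + v4 = 0 (the probabilities used here are illustrative, not taken from the text):

def check_node_update(q_one_others):
    # q_one_others are the probabilities Q^1 of the *other* connected variable
    # nodes; the function returns (R^0, R^1) for the informed node, i.e. the
    # probabilities that the parity-check is satisfied if that node is 0 or 1
    prod = 1.0
    for q in q_one_others:
        prod *= (1.0 - 2.0 * q)
    return (1.0 + prod) / 2.0, (1.0 - prod) / 2.0

Q12, Q14 = 0.2, 0.3                      # assumed Q^1 estimates for v2 and v4
R0_11, R1_11 = check_node_update([Q12, Q14])
# the same result follows from R0_11 = Q0_12*Q0_14 + Q1_12*Q1_14 = 0.8*0.7 + 0.2*0.3
print(R0_11, R1_11)                      # 0.62 0.38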
As said earlier, the decoding of LDPC codes converges to the original message if there are no cycles in the corresponding bipartite graph. However, relatively short cycles are unavoidable even when the corresponding LDPC code has good performance. The degrading effect of short cycles diminishes as the code length grows, and it is substantially small for code words longer than 1000 bits. Also, there exist special procedures to eliminate the short cycles or to reduce their number [25].
The realization of the sum-product algorithm in the logarithmic domain (Problems 9.7 and 9.10) can be further considered. The motives for using the logarithmic domain are the same as for turbo code decoding—avoiding the calculation of products whenever
Fig. 9.4 One step in the sum-product algorithm



possible and lessening the overflow effect. Here the logarithmic likelihood ratio is used. It can also be used with hard decoder inputs (with smaller correcting capability). However, this algorithm has one serious drawback. To find the information which the check node sends to its adjoined variable node j, one has to determine

$$b_{i,j} = 2\tanh^{-1}\left\{ \prod_{k \in N(i)\setminus j} \tanh(a_{i,k}/2) \right\},$$

and still more serious problems appear when a hardware block corresponding to this relation should be implemented. That is the reason to use versions of the SPA procedure which are much easier to implement. The simplest (and still rather accurate) approximation of the above equality is

$$b_{i,j} = \left[ \prod_{k \in N(i)\setminus j} \mathrm{sgn}(a_{i,k}) \right] \cdot \min_{k \in N(i)\setminus j} |a_{i,k}|;$$

it is called the min-sum algorithm (Problem 9.8), with further possible modifications such as the self-correcting min-sum algorithm (Problem 9.8). The quasi-cyclic codes (Problem 9.9) can be considered as LDPC codes as well, but obtaining the parity-check matrix is not always easy. The class of Progressive Edge Growth (PEG) LDPC codes (Problem 9.10) can be constructed as well, where the parity-check matrix is formed by a constructive method (it is a structured, and not a random, approach with the limitations of the Gallager codes). For such a code a Monte Carlo simulation of signal transmission over the BSC with AWGN was carried out (Problem 9.10). When the sum-product algorithm is applied, the error probability 10−6 is achieved for substantially smaller values of the parameter Eb/N0 than when the bit-flipping algorithm is applied. Some modifications of the bit-flipping algorithm (e.g. gradient multi-bit flipping—GDFB) allow the coding gain to be increased from 0.4 to 2.5 dB (for Pe,res = 10−6). On the other hand, even a significant simplification of the sum-product algorithm (min-sum is just such one) in the considered case does not give a significant performance degradation—the coding gain decreases from 6.1 to 5.8 dB. This is precisely the reason why contemporary decoding algorithms are based on the message-passing principle, with smaller or greater simplifications.
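A minimal Python sketch comparing the exact log-domain check-node update with its min-sum approximation (the incoming messages are illustrative values):

import math

def spa_check_update(a_others):
    # exact expression: b = 2*atanh( prod tanh(a_k / 2) )
    prod = 1.0
    for a in a_others:
        prod *= math.tanh(a / 2.0)
    return 2.0 * math.atanh(prod)

def min_sum_check_update(a_others):
    # min-sum approximation: product of signs times the minimum magnitude
    sign = 1.0
    for a in a_others:
        sign *= math.copysign(1.0, a)
    return sign * min(abs(a) for a in a_others)

msgs = [1.2, -0.4, 2.5]                  # assumed incoming LLRs a_{i,k}
print(spa_check_update(msgs))            # about -0.18
print(min_sum_check_update(msgs))        # -0.4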

Problems

Problem 9.1 At the input of a decoder corresponding to a linear block code (7, 3), a code word transmitted through the erasure channel is led. It is known that the first four bits are not transmitted correctly (but their values are not known), while it is supposed that the last three bits are received correctly, being y5 = 1, y6 = 0, y7 = 1. Find the values of the erased (damaged) bits:

(a) If a code used is obtained by shortening the Hamming systematic code (8, 4),
its generator is given in Problem 5.4, and a shortening was performed by the
elimination of the first information bit;
(b) If the code was used which has the parity-check matrix

H = [ 1 1 0 1 0 0 0
      0 1 1 0 1 0 0
      0 0 1 1 0 1 0
      0 0 0 1 1 0 1 ];

(c) For both codes draw a Tanner graph and on its basis explain the difference in
the decoding complexity of the above codes.

Solution
In this problem it is supposed that the erasure channel is used, where the emitted
symbol can be transmitted correctly, or it can be “erased”, meaning that there is an
indication that the symbol (denoted by E) was not transmitted correctly, but there is
not the information do it originate from the emitted zero or one (illustrated in
Fig. 9.5). Here, as a difference to Problem 4.8, it is not permitted that an emitted
symbol during transmission was inverted, i.e. that the error occurred and that the
receiver has not information about that (i.e. the transitions 0 ! 1 and 1 ! 0 are
not permitted).
The case is considered when code (7, 3) is used and the emitted code word
x = (x1, x2, x3, x4, x5, x6, x7) should be found, if the received word is
y = (EEEE101). Although this code can detect two errors and correct only one error
in the code word at the BSC (it is a shortened Hamming code with one added
parity-check bit), in the following it will be shown that here all four erased symbols
can be corrected.
(a) In Problem 5.4 the construction of the code (8, 4) is explained, which has the generator matrix

G(8,4) = [ 1 0 0 0 | 1 1 0 1
           0 1 0 0 | 1 0 1 1
           0 0 1 0 | 0 1 1 1
           0 0 0 1 | 1 1 1 0 ].

The code (7, 3), obtained by shortening the previous code by elimination of the first information bit, has the following generator matrix

G1 = [ 1 0 0 | 1 0 1 1
       0 1 0 | 0 1 1 1
       0 0 1 | 1 1 1 0 ],

Fig. 9.5 BEC (Binary Erasure Channel) graph
and the corresponding parity-check matrix is

H1 = [ 1 0 1 1 0 0 0
       0 1 1 0 1 0 0
       1 1 1 0 0 1 0
       1 1 0 0 0 0 1 ].

The code word must fulfill the condition xH1^T = 0, i.e. in a developed form

x1 ⊕ x3 ⊕ x4 = 0
x2 ⊕ x3 ⊕ x5 = 0
x1 ⊕ x2 ⊕ x3 ⊕ x6 = 0
x1 ⊕ x2 ⊕ x7 = 0.

By supposing that the last three symbols are transmitted correctly, x5 = y5 = 1, x6 = y6 = 0, x7 = y7 = 1, and a system of four equations with four unknown variables is obtained

x1 ⊕ x3 ⊕ x4 = 0
x2 ⊕ x3 = 1
x1 ⊕ x2 ⊕ x3 = 0
x1 ⊕ x2 = 1;

the solutions are x1 = 1, x2 = 0, x3 = 1 and x4 = 0, the reconstructed word is


x = (1010101).
For any linear block code, on the parity-check matrix basis, a system with
n–k equations can be written, and in such way it is possible to reconstruct no more
than n–k erased bits. It should be noticed that for the solving of this system some
method of linear algebra should be applied (e.g. the substitution method).
(b) For the code (7, 3) which has the parity-check matrix

H2 = [ 1 1 0 1 0 0 0
       0 1 1 0 1 0 0
       0 0 1 1 0 1 0
       0 0 0 1 1 0 1 ],

the corresponding equation system, on the basis of the relation xH2^T = 0, is

x1 ⊕ x2 ⊕ x4 = 0,  x2 ⊕ x3 ⊕ x5 = 0,  x3 ⊕ x4 ⊕ x6 = 0,  x4 ⊕ x5 ⊕ x7 = 0.

The last three symbols are correctly transmitted (x5 = y5 = 1, x6 = y6 = 0, x7 = y7 = 1), and the above equations make a successive reconstruction of the code word bits possible

x4 ⊕ x5 ⊕ x7 = 0  ⟹  x4 = x5 ⊕ x7 = 0,
x3 ⊕ x4 ⊕ x6 = 0  ⟹  x3 = x4 ⊕ x6 = 0,
x2 ⊕ x3 ⊕ x5 = 0  ⟹  x2 = x3 ⊕ x5 = 1,
x1 ⊕ x2 ⊕ x4 = 0  ⟹  x1 = x2 ⊕ x4 = 1,

and the reconstructed code word is x = (1100101). The proposed decoding method is in fact an iterative method—in the first step the first three equations (each considered by itself) have no solution, while from the last one x4 is found. In the next step, the first two equations have no solution, but the third one has. In the third step x2 is found using the second equation and the previously found values of x3 and x4. In the last step the first code word bit is reconstructed.
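A minimal Python sketch of this successive (peeling) reconstruction: in every pass an equation with exactly one erased symbol is solved, until all erasures are resolved or no such equation remains:

def peel(H, x):
    # x contains known bits (0/1) and None for the erased positions
    changed = True
    while changed and None in x:
        changed = False
        for row in H:
            idx = [j for j, h in enumerate(row) if h == 1]
            unknown = [j for j in idx if x[j] is None]
            if len(unknown) == 1:
                j = unknown[0]
                x[j] = sum(x[k] for k in idx if k != j) % 2
                changed = True
    return x

H2 = [[1, 1, 0, 1, 0, 0, 0],
      [0, 1, 1, 0, 1, 0, 0],
      [0, 0, 1, 1, 0, 1, 0],
      [0, 0, 0, 1, 1, 0, 1]]
print(peel(H2, [None, None, None, None, 1, 0, 1]))   # [1, 1, 0, 0, 1, 0, 1]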
The following could be noticed:
– Generally, for a code (n, k), if one variable can be found from every equation, the corresponding decoder complexity is linear, i.e. O(n−k).
– In the first case the number of binary ones in the rows is three or four (on the average q1 = 13/4) and the number of ones in the columns varies from one to three, on the average c1 = 13/7; the code rate can be found from the relation R1 = 1 − c1/q1 = 1 − 4/7 = 3/7.
– In the second case the number of ones in every row is fixed (q2 = 3), but the number of ones in the matrix columns is not the same (from one to three, on the average c2 = 12/7), and the code rate is the same, R2 = 1 − c2/q2 = 3/7.
– The matrix density is relatively high in both cases (r1 = q1/n = 0.4643, r2 = q2/n = 0.4286), and both codes are irregular.
– The optimum decoding method for the code which has parity-check matrix H2 is iterative—it is optimal to decode the fourth bit first, then the third, the second, and at the end the first one. At the same time the decoding procedure has a linear complexity!
– To provide iterative decoding with linear complexity, it is fundamental that the parity-check matrix allows the decoding of one bit from one equation in every iteration. If the matrix has a small number of ones in every row, this will provide (with some good luck concerning the error sequence!) that in every row there is no more than one bit in error.
– Therefore, a necessary condition is that the parity-check matrix is a sparse matrix; the codes fulfilling this condition are Low Density Parity Check (LDPC) codes [83].

(c) The Tanner graph [84] is a bipartite graph visualizing the connections between two types of nodes: the variable nodes, usually denoted by vj, representing the emitted code word symbols, and the (parity-)check nodes, denoted by ci, representing the parity-check equations.
The Tanner graph for the code having parity-check matrix H1 is shown in Fig. 9.6, with an especially marked cycle of length four (4-cycle), corresponding to the variable nodes v1 and v2, i.e. to the check nodes c3 and c4 (there is one more cycle of the same length, which the reader should find).
It can be noticed that just this cycle makes it impossible to decode the symbols v1 and v2 directly from the four equations of the system
c1: v1 ⊕ v3 ⊕ v4 = 0
c2 : v2 ⊕ v3 ⊕ v5 = 0
c3 : v1 ⊕ v2 ⊕ v3 ⊕ v6 =0
c4 : v1 ⊕ v2 ⊕ v7 =0

In the Tanner bipartite graph corresponding to matrix H2, shown in Fig. 9.7, it can be noticed that a cycle of length four does not exist. This is just what makes it possible to decode the code word bits successively with linear complexity. The shortest cycle in this graph has length six and it is drawn by heavy lines. Therefore, to provide iterative decoding with linear complexity, the parity-check matrix graph should not have short cycles!
Problem 9.2 The parity-check matrix of a linear block code is

H = [ 1 1 1 1 0 0 0 0 0 0
      1 0 0 0 1 1 1 0 0 0
      0 1 0 0 1 0 0 1 1 0
      0 0 1 0 0 1 0 1 0 1
      0 0 0 1 0 0 1 0 1 1 ].

(a) Find the parity-check matrix density and verify whether the code is regular,
(b) Draw the Tanner graph corresponding to the code and find the minimum cycle
length,
(c) Decode the received word

Fig. 9.6 Tanner graph corresponding to the first considered code

Fig. 9.7 Tanner graph corresponding to the second considered code

y = (0001000000)

using majority logic decision.


Solution
(a) The number of ones in every column is c = 2, and in every row q = 4, so the code is regular. The matrix density is r = q/n = 4/10 = 0.4 and the code rate is R = 1 − c/q = 1/2. The parity-check matrix has n = 10 columns and n−k = 5 rows; it is the code (n, k) = (10, 5).
(b) The corresponding Tanner graph is shown in Fig. 9.8. In the parity-check matrix it is not possible to find a rectangle whose corners are binary ones, so it is obvious that in the graph there are no cycles of length four. One cycle of length six is shown by heavy lines. The cycle length must be an even number, and it is obvious that this is the minimum cycle length for this graph.
(c) Majority logic decoding (majority voting) starts by forming a set of equations
orthogonal to a single code word bit (similarly to the procedure for
Reed-Muller codes decoding):

Fig. 9.8 Tanner graph of the code (10, 5)

1. equation set orthogonal to v1:
A1:  c1: v1 ⊕ v2 ⊕ v3 ⊕ v4 = 0,  c2: v1 ⊕ v5 ⊕ v6 ⊕ v7 = 0;
2. equation set orthogonal to v2:
A2:  c1: v1 ⊕ v2 ⊕ v3 ⊕ v4 = 0,  c3: v2 ⊕ v5 ⊕ v8 ⊕ v9 = 0;
3. equation set orthogonal to v3:
A3:  c1: v1 ⊕ v2 ⊕ v3 ⊕ v4 = 0,  c4: v3 ⊕ v6 ⊕ v8 ⊕ v10 = 0;
4. equation set orthogonal to v4:
A4:  c1: v1 ⊕ v2 ⊕ v3 ⊕ v4 = 0,  c5: v4 ⊕ v7 ⊕ v9 ⊕ v10 = 0;
5. equation set orthogonal to v5:
A5:  c2: v1 ⊕ v5 ⊕ v6 ⊕ v7 = 0,  c3: v2 ⊕ v5 ⊕ v8 ⊕ v9 = 0;
6. equation set orthogonal to v6:
A6:  c2: v1 ⊕ v5 ⊕ v6 ⊕ v7 = 0,  c4: v3 ⊕ v6 ⊕ v8 ⊕ v10 = 0;
7. equation set orthogonal to v7:
A7:  c2: v1 ⊕ v5 ⊕ v6 ⊕ v7 = 0,  c5: v4 ⊕ v7 ⊕ v9 ⊕ v10 = 0;
8. equation set orthogonal to v8:
A8:  c3: v2 ⊕ v5 ⊕ v8 ⊕ v9 = 0,  c4: v3 ⊕ v6 ⊕ v8 ⊕ v10 = 0;
9. equation set orthogonal to v9:
A9:  c3: v2 ⊕ v5 ⊕ v8 ⊕ v9 = 0,  c5: v4 ⊕ v7 ⊕ v9 ⊕ v10 = 0;
10. equation set orthogonal to v10:
A10: c4: v3 ⊕ v6 ⊕ v8 ⊕ v10 = 0,  c5: v4 ⊕ v7 ⊕ v9 ⊕ v10 = 0.

If the sequence y = (0001000000) is at the decoder input, the syndrome can be found as

S = yH^T = (10001),

and the syndrome bits determine the parity-check sums, while the parity-check values for the single bits of the code word, corresponding to the sets A1–A10, provide for the reconstruction of the code word bits

A1: (1, 0) → v1 = ?;  A2: (1, 0) → v2 = ?;  A3: (1, 0) → v3 = ?;
A4: (1, 1) → v4 = 1;  A5: (0, 0) → v5 = 0;  A6: (0, 0) → v6 = 0;
A7: (0, 1) → v7 = ?;  A8: (0, 0) → v8 = 0;
A9: (0, 1) → v9 = ?;  A10: (0, 1) → v10 = ?

From the above it is reliably determined that at the 5th, 6th and 8th code word bits there were no errors, and that at the 4th bit the error almost surely occurred, while for the rest of the bits a reliable decision cannot be made. As a difference from the example in the introductory part of this chapter, in this case the number of parity-check sums per bit is even (which is not desirable) and small as well, i.e. only two parity-check sums are formed for every code word bit. Supposing that no more than one error occurred in the code word, the reconstructed code word is

x′ = (0000000000).

The decoding procedure would be much more efficient if the number of ones per column were greater and, if possible, odd. But on the other hand the matrix density would grow, probably causing shorter cycles to appear as well, i.e. the code quality would be degraded. Therefore, majority decoding is not a procedure that provides efficient decoding of LDPC codes; it is relatively successful only for codes which have very long code words and a relatively dense parity-check matrix.
Problem 9.3 In a communication system the Gallager code (20, 7) is used, with the parity-check matrix given in the introductory part of this chapter (it is repeated here for convenience!). By the bit-flipping algorithm decode the following received words

(a) y = (11000000000000000000);  (b) y = (00011000000000011000).

Solution
Bit-flipping algorithm is an iterative procedure using as inputs the received bits and
a parity-check matrix structure. For Gallager code (20, 7) it is
H = [ 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
      1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0
      0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0
      0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0
      0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0
      0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1
      1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0
      0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0
      0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0
      0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0
      0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 ].

(a) For the received word y = (11000000000000000000) the syndrome is yHT = (000001100011000), showing that the following sums are not satisfied:
– VI control sum (where the 1st, 5th, 9th and 13th received word bits are),
– VII control sum (where the 2nd, 6th, 10th and 17th received word bits are),
– XI control sum (where the 1st, 6th, 12th and 18th received word bits are),
– XII control sum (where the 2nd, 7th, 11th and 16th received word bits are).
In the above control sums bits 1, 2 and 6 appear the most frequently (two times each), and the received word bits at these places are inverted, yielding the word y′ = (00000100000000000000), to which the syndrome y′HT = (010000100010000) corresponds. Now the II, VII and XI parity-check rows are considered; bit 6 is the only bit appearing in all three of these sums. By inverting this bit, the second estimate of the code word, y″, is obtained, consisting of zeros only and yielding y″HT = (000000000000000), so the vector y″ is the decoded code word.
(b) For the received word y = (00011000000000011000) the syndrome is yHT = (110111101101001), where every syndrome element corresponds to the sum of the corresponding elements in the 4th, 5th, 16th and 17th columns of the parity-check matrix, denoted by dash-dot lines in Fig. 9.9. The syndrome shows that the parity-check sums I, II, IV, V, VI, VII, IX, X, XII and XV are not satisfied, corresponding to the shaded rows of the parity-check matrix in Fig. 9.9. If only these rows are considered, the columns denoted by ordinal numbers 2, 5, 15, 16 and 20 contain the highest number (three) of ones.
After the first correction (carried out by inverting bits 2, 5, 15, 16 and 20) the
first estimation of the codeword is found y′ = (01010000000000101001). To this
code word (having ones at the positions 2, 4, 15, 17 and 20, determining the
columns of interest in Fig. 9.10) corresponds the syndrome y′HT =
(000100000101000), and in IV, X and XII control sums the 16th bit the most
frequently appears (three times). It can be easily seen from the shadowed rows in
Fig. 9.10. Therefore, in the second iteration only this bit is inverted and second
code word estimation is formed as y″ = (01010000000000111001).
Because y″HT = (000000000000000), i.e. the syndrome equals zero, the last estimate is a valid code word. Therefore, the result of decoding is

x = y″ = (01010000000000111001),

where the bits inverted in the received word are underlined and in bold.
Of course, this may not be the transmitted code word. For example, it is possible that the transmitted word was the all-zeros word and that four bits were inverted (by errors), yielding the received word y. The Hamming distance between these two words is four, while the distance of the received word to the decoded word y″ is four as well. Therefore, it can be said that the decoded word is one of the nearest to the received word in the Hamming distance sense, but it is not necessarily the right solution.
Problem 9.4 The LDPC code (10, 5) is defined by the parity-check matrix from the formulation of Problem 9.2.
(a) Write the parity-check equations and explain the parallel realization of the bit-flipping algorithm with the fixed threshold T = 2.
(b) By using the explained algorithm decode the following received words

y1 = (0000100000),  y2 = (1000100000).

Solution
As shown in Problem 9.2, the corresponding parity-check equations are

c1 = v1 ⊕ v2 ⊕ v3 ⊕ v4,  c2 = v1 ⊕ v5 ⊕ v6 ⊕ v7,  c3 = v2 ⊕ v5 ⊕ v8 ⊕ v9,
c4 = v3 ⊕ v6 ⊕ v8 ⊕ v10,  c5 = v4 ⊕ v7 ⊕ v9 ⊕ v10,

and all these parity-checks are satisfied (the result is equal to zero) if the vector v is a valid codeword.

Fig. 9.9 The first iteration during the decoding

For the updating of the jth bit in the received word, it is important to identify the parity-check equations that include this bit. Let ci(j) denote the ith parity-check equation containing the jth bit of the received word; such a parity-check exists only if the corresponding bipartite graph (shown in Fig. 9.8) has an edge connecting the jth variable node and the ith check node.
For regular LDPC codes there are exactly q inputs into every check node (in this case q = 4), and it is easy to understand that the operation in the check nodes can be realized by using q-input exclusive-or (XOR) gates. On the other hand, there are exactly c parity-checks ci(j) that are important for deciding whether the jth bit will be flipped or not (in this case c = 2). If the number of unsatisfied parity-checks is greater than or equal to a predefined threshold T ≤ c, the jth bit will be inverted (flipped) in the current iteration

$$\Delta_j = \begin{cases} 1, & \sum_i c_i^{(j)} \geq T \\ 0, & \sum_i c_i^{(j)} < T. \end{cases}$$

Therefore, the operations in the variable nodes can be realized by using c-input majority logic (ML) gates with the decision threshold T. The updated bit is then calculated as

Fig. 9.10 The second iteration during the decoding

vj* = vj ⊕ Δj. In the parallel decoder realization, every symbol (j = 1, 2, …, n) is updated at the same time, in parallel.
According to the formulation of the problem, in this case it is assumed that the threshold has a fixed value for all symbol nodes in all iterations, i.e. T = 2. The corresponding hardware realization (at the logic gate level) is given in Fig. 9.11, where the updating of the fifth bit is illustrated.
(b) When the received word is y1 = (0000100000) only the fifth received bit is not
equal to zero, and therefore only the second and the third parity-check are
unsatisfied. As it is shown in Table 9.1, two parity-checks are satisfied only
for the fifth symbol of the received word and this bit is flipped. It is easy to
verify that vector with all zeros represents the valid codeword and the
decoding process is finished.
If the threshold is set to the typical value T ¼ dc=2e ¼ 1, it is easy to see that
seven bits should be flipped. It is recommended to reader to check do the obtained
word after the first iteration is a codeword in this case.
If the received word is y2 = (1000100000), the decoding procedure is illustrated
in Table 9.2. As decoder structure is the same, only the values of the parity-checks
are changed. As only the first and the fifth bit in the received word have non-zero
Problems 471

values, only the first and the third parity-check are unsatisfied. If the threshold is
T = 2, only the second bit in the received bit should be flipped and the estimated
word after the first iteration is y2′ = (1100100000). For this word, all parity-checks
are satisfied (h1–h3 as a summation of two binary ones, c4 and c5 as a summation of
all zeros), i.e. it is the valid codeword and the decoding process is finished.
Now two cases will be considered when (1000100000) can be received:
• codeword (1100100000) is transmitted, the channel introduced the error at the
second bit and the decoding process was successful
• codeword (0000000000) is transmitted, and the channel introduced the errors at
the first and the fifth bit. In such case, the decoder produced one additional error
at the second position of the estimated vector and the decoding process was
unsuccessful.
Now, consider why the errors at positions 1 and 5 cannot be corrected if the
codeword all-zeros is transmitted. The Tanner graph for the case when r2 is
received is given in Fig. 9.12. One can see that errors are located inside the cycle
with length 6. As this code is girth-6 code, the errors are located within the cycle
with the minimum length that is not desirable.
As the zero-valued variable nodes do not have any impact to the parity-checks, it
is enough to analyze the impact of variable nodes v1, v5, check nodes connected to
them (c1, c2, c3) and the variable nodes that are connected to these parity-checks at
least T times (could be flipped in some scenario). In this case, besides v1 and v5,
only v2 is connected to c1 and c3, and the other variable nodes are connected only to
one check node from set {c1, c2, c3}. The structure consisted of these nodes in
bipartite graph is presented in Fig. 9.12, where full circles represents the non-zero

0 0 0 0 1 0 0 0 0 0
d1 d10

+ +
h2(5) 1 h3(5)

...
T 2
ML

1
+ d 5* 0

Fig. 9.11 Parallel bit flipping, updating of the fifth bit


472

Table 9.1 The decoding process during the first iteration, received word y1
j 1 2 3 4 5 6 7 8 9 10
ðjÞ ð1Þ ð2Þ ð3Þ ð4Þ ð5Þ ð6Þ ð7Þ ð8Þ ð9Þ ð10Þ
ci c1 ¼0 c1 ¼0 c1 ¼0 c1 ¼0 c2 ¼1 c2 ¼1 c2 ¼1 c3 ¼1 c3 ¼1 c4 ¼0
ð1Þ ð2Þ ð3Þ ð4Þ ð5Þ ð6Þ ð7Þ ð8Þ ð9Þ ð10Þ
c2 ¼1 c3 ¼1 c4 ¼0 c5 ¼0 c3 ¼1 c4 ¼0 c5 ¼0 c4 ¼0 c5 ¼0 c5 ¼0
P ðjÞ 1 1 0 0 2 1 1 1 1 0
ci
Dj 0 0 0 0 1 0 0 0 0 0
vj 0 0 0 0 0 0 0 0 0 0
9
Low Density Parity Check Codes
Problems

Table 9.2 The decoding process during the first iteration, received word y2
j 1 2 3 4 5 6 7 8 9 10
ðjÞ ð1Þ ð2Þ ð3Þ ð4Þ ð5Þ ð6Þ ð7Þ ð8Þ ð9Þ ð10Þ
ci c1 ¼1 c1 ¼1 c1 ¼1 c1 ¼1 c2 ¼0 c2 ¼0 c2 ¼0 c3 ¼1 c3 ¼1 c4 ¼0
ð1Þ ð2Þ ð3Þ ð4Þ ð5Þ ð6Þ ð7Þ ð8Þ ð9Þ ð10Þ
c2 ¼0 c3 ¼1 c4 ¼0 c5 ¼0 c3 ¼1 c4 ¼0 c5 ¼0 c4 ¼0 c5 ¼0 c5 ¼0
P ðjÞ 1 2 1 1 1 0 0 1 1 0
ci
Dj 0 1 0 0 0 0 0 0 0 0
vj 1 1 0 0 1 0 0 0 0 0
473
474 9 Low Density Parity Check Codes

v1 = 1 v2 = 0 v3 = 0 v4 = 0 v5 = 1 v6 = 0 v7 = 0 v8 = 0 v9 = 0 v10 = 0

c1 = 1 c2 = 0 c3 = 1 c4 = 0 c5 = 0

Fig. 9.12 Tanner graph of code (10, 5) for received word r2

symbols (errors), empty circles zero-symbols, full squares unsatisfied parity-checks


and empty squares satisfied parity-checks. It is clear that maximum number of two
unsatisfied parity-checks is for v2, as two full squares are connected to them.
The same structure is presented in Fig. 9.13, after the flipping of the second bit
in the received word. In this case all parity-checks are satisfied, and the codeword
(1100100000) is decoded.
Finally, as is was shown that at least one weight-three vector satisfy all
parity-check equations, it is clear that the minimum Hamming distance of this code
is not greater than three. Therefore, it is not possible to construct the decoder that
could correct more than one error per codeword. So, it can be concluded that the
described bit-flipping decoder is the best possible for the described scenario.
Problem 9.5 LDPC code (10, 5) is defined with the parity-check matrix from the
formulation of Problem 9.2.
(a) Shortly explain Gallager-B decoding algorithm,
(b) By using the explained algorithm with fixed threshold T ¼ dc=2e decode the
following code word

y ¼ ð0000100000Þ:

Fig. 9.13 The error patterns v1 v1


(a) in the received word, c1 c1
(b) in the decoded word

c2 c2
v2 v2

c3 c3
v5 v5
Problems 475

Solution
(a) The Gallager-B algorithm represents the simplest possible message passing
algorithm, where the information is transmitted across the edges of the
bipartite graph, from variable nodes to the check nodes and vice versa. Let
assume that the variable nodes that are connected to one check node are
neighbor variable nodes, and the check nodes connected to one variable node
are neighbor parity nodes as well. In message passing algorithms:
– variable nodes send reports about their current status to the check nodes,
– by using the reports from the neighbor variable nodes, check nodes make
an estimation of the value of the current variable node (in the next iteration,
the current state of variable node is not of direct interest, it is more
important how their neighbors see it!)
– by using majority of the estimations from the neighbor check nodes, new
value of the variable node is obtained.
Gallager-B algorithm represents iterative decoding procedure operating in a
binary field where, during the iteration, binary messages are sent along the edges of
Tanner graph. Let E(x) represent a set of edges incident on a node x (x can be either
symbol or parity-check node). Let mi ðeÞ and m0i ðeÞ denote the messages sent on
edge e from variable node to check node and check node to variable node at
iteration i, respectively. If the initial value of a bit at symbol node v is denoted as r
(v), the Gallager-B algorithm can be summarized as follows:
Initialization (i = 1): For each variable node v, and each set E(v), messages sent
to check nodes are computed as follows

m1 ðeÞ ¼ rðvÞ:

Step (i) (check node update): For each check node c and each set E(c), the update
rule for the ith iteration, i >1, is defined as follows
0 1
X
m0i ðeÞ ¼@ mi1 ðeÞAmod 2:
e0 2EðcÞnfeg

Step (ii) (variable node update): For each variable node v and each set E(v), the
update rule for the ith iteration, i >1, is defined as follows
8 P
< 1; if m0 ðeÞ  dc=2e
Pe0 2EðvÞnfeg i0
mi ðeÞ ¼ 0; if e0 2EðvÞnfeg mi ðeÞ  c  1  dc=2e;
:
rðvÞ; otherwise:

(b) Decoding of the fifth bit will be explained step by step:


476 9 Low Density Parity Check Codes

(1) Initialization
For variable node v5 incident edges are v5 ! c2 and v5 ! c3 and messages that
are initially sent are:

m1 ðv5 ! c2 Þ ¼ m1 ðv5 ! c3 Þ ¼ 1;

while the messages sent over all other edges are equal to binary zero (as the other
bits in received vector are equal to zero), as shown in Fig. 9.14.
(2) Step (i): update of check nodes (illustrated in Fig. 9.15):
– Update for check node c2

m02 ðc2 ! v5 Þ ¼ ðm1 ðv1 ! c2 Þ þ m1 ðv6 ! c2 Þ þ m1 ðv7 ! c2 ÞÞ mod 2


¼ ð0 þ 0 þ 0Þ mod 2 ¼ 0:

– Update for check node c3

m02 ðc3 ! v5 Þ ¼ ðm1 ðv2 ! c3 Þ þ m1 ðv8 ! c3 Þ þ m1 ðv9 ! c3 ÞÞ mod 2


¼ ð0 þ 0 þ 0Þ mod 2 ¼ 0:

(c) Step (ii): variable node update (illustrated in Fig. 9.16):


Under the assumption that all parity-checks are satisfied, new estimation of the
observed variable node is equal to message that is sent from the check node.
As different check nodes can make the different estimations, the final estimation
in the current iteration can be obtained by a majority voting. On the other hand, to
the edge on the graph to the ith parity-check node, the majority voting of the
estimations from the other check nodes is sent, excluding the ith node.
For variable node j = 5 and set E(12) = {c2 ! v5, c3 ! v5} one obtains the
estimations according to Fig. 9.16:
1. For the edge v5 ! c2 one obtains

v1 = 0 v2 = 0 v3 = 0 v4 = 0 v5 = 1 v6 = 0 v7 = 0 v8 = 0 v9 = 0 v10 = 0

0 0 1
0 10 0 0
0 0 0 0 0
0 0 0 0 0 0 0
c1 = 0 c2 = 1 c3 = 1 c4 = 0 c5 = 0

Fig. 9.14 Initialization, red bold lines corresponds to send binary ones
Problems 477

v1 = 1 v2 = 0 v3 = 0 v4 = 0 v5 = 1 v6 = 0 v7 = 0 v8 = 0 v9 = 0 v10 = 0

0 0

0
0 0 0
0 0
c1 = 1 c2 = 0 c3 = 0 c4 = 0 c5 = 0

Fig. 9.15 Update of check nodes, red lines correspond c2, blue to c3, solid lines report from
variable nodes to check nodes, bold dashed lines report the estimations of symbol v12 by using
parity-checks

v1 = 1 v2 = 0 v3 = 0 v4 = 0 v5 = 0 v6 = 0 v7 = 0 v8 = 0 v9 = 0 v10 = 0

0 0

c1 = 0 c2 = 0 c3 = 0 c4 = 0 c5 = 0

Fig. 9.16 Update of check nodes, red lines correspond c2, blue to c3, solid lines report from
variable nodes to check nodes, bold dashed lines report the estimations of symbol v12 by using
parity-checks

X
m02 ðe0 Þ ¼ m02 ðc3 ! v5 Þ ¼ 0 ) m2 ðv5 ! c2 Þ ¼ 0
e0 2Eð5Þnfv5 !c2 g

2. For the edge v5 ! c3 one obtains


X
m02 ðe0 Þ ¼ m02 ðc2 ! v5 Þ ¼ 0 ) m2 ðv5 ! c3 Þ ¼ 0
e0 2Eð5Þnfv5 !c3 g

Therefore, the updated word is y′ = (0000100000) and it can be easily checked


that y′ is valid codeword as y′HT = 0. Therefore, the decoding procedure is
finished.
Problem 9.6 LDPC code parity-check matrix is
478 9 Low Density Parity Check Codes

2 3
0 1 0 1 1 0 0 1
61 1 1 0 0 1 0 07
H¼6
40
7:
0 1 0 0 1 1 15
1 0 0 1 1 0 1 0

(a) Find the code basic parameters and draw the corresponding Tanner graph.
(b) Find all code words corresponding to this matrix and find the code weight
spectrum.
(c) If the code words are transmitted by the unit power polar pulses, find the
samples at the decoder input if the corresponding sample values of additive
Gaussian noise are

n ¼ ð1:1; 0:2; 0:3; 1:5; 0; 1; 0:4; 0:15; 0:25Þ:

(d) Supposing the channel noise variance r2 = 0.49, decode the words found
above by using sum-product algorithm.

Solution
(a) Number of ones in every column here is s = 2, and number of ones in every
row is q = 4. It is a regular LDPC code, matrix density is r = 2/4 = 0.5 and
the code rate R = 1−c/q = 1/2. The matrix has n = 8 columns and n−k = 4
rows, the code basic parameters are (n, k) = (8, 4).
Tanner graph of the code is shown in Fig. 9.17. It can be noticed that there are
no cycles of minimum length. A rule for its forming is shown as well—connections
between the second check node and the corresponding variable nodes (the second
matrix row) are denoted by heavy lines, while the connections of the seventh
variable node and the nodes corresponding to parity-check sums where they appear
(the seventh matrix column) are denoted by dashed lines.

v1 v2 v3 v4 v5 v6 v7 v8

c1 c2 c3 c4

Fig. 9.17 Tanner graph for code (8, 4)


Problems 479

⎡0 1 0 1 1 0 0 1⎤ 1
⎢1 1 1 0 0 1 0 0 ⎥⎥
H =⎢ 2
⎢0 0 1 0 0 1 1 1⎥
⎢ ⎥ 3
⎣1 0 0 1 1 0 1 0⎦
4
1 2 3 4 5 6 7 8

(b) On the basis of the parity-check matrix, vector v (eight bits) is a code vector if
the following is satisfied

c1 : v2  v4  v5  v8 ¼ 0;
c2 : v1  v2  v3  v6 ¼ 0;
c3 : v3  v6  v7  v8 ¼ 0;
c4 : v1  v4  v5  v7 ¼ 0:

From these four linearly independent equations a maximum of four unknown


variables can be determined, all code words can be found if the values for four bits
are fixed, while the other four are obtained by solving this system. For example, if
bits v3, v4, v5, v6, are fixed, the other bits are found by solving the following system
of equations
2 3
v2  v8 ¼ v4  v5 ; 0 1 0 1
v1  v2 ¼ v3  v6 ; 61 1 0 07
) ð v1 ; v2 ; v7 ; v8 Þ 6
40 0 1 15
7
v7  v8 ¼ v3  v6 ;
v1  v7 ¼ v4  v5 : 1 0 1 0
¼ ðv4  v5 ; v3  v6 ; v3  v6 ; v4  v5 Þ;

but this system has not a unique solution in spite that the matrix H rank is 4!
Not until five bits are fixed, e.g. v1, v2, v3, v4, v5, the others bits can be found
from relations

c2 : v6 ¼ v1  v2  v3 ;
c4 : v7 ¼ v1  v4  v5 ;
c1 : v8 ¼ v2  v4  v5 :

and it is clear that one possible generator matrix is


2 3
1 0 0 0 0 1 1 0
60 1 0 0 0 1 0 17
6 7
G¼6
60 0 1 0 0 1 0 077
40 0 0 1 0 0 1 15
0 0 0 0 1 0 1 1

The matrix has a full rank, it is the code (8, 5) and the corresponding code words
are
480 9 Low Density Parity Check Codes

ð00000000Þ; ð01000101Þ; ð10000110Þ; ð11000011Þ


ð00001011Þ; ð01001110Þ; ð10001101Þ; ð11001000Þ
ð00010011Þ; ð01010110Þ; ð10010101Þ; ð11010000Þ
ð00011000Þ; ð01011101Þ; ð10011110Þ; ð11011011Þ
ð00100100Þ; ð01100001Þ; ð10100010Þ; ð11100111Þ
ð00101111Þ; ð01101010Þ; ð10101001Þ; ð11101100Þ
ð00110111Þ; ð01110010Þ; ð10110001Þ; ð11110100Þ
ð00111100Þ; ð01111001Þ; ð10111010Þ; ð11111111Þ

The code spectrum is

að0Þ ¼ 1; að2Þ ¼ 2; að3Þ ¼ 8; að4Þ ¼ 10; að5Þ ¼ 8; að6Þ ¼ 2; að8Þ ¼ 1; að1Þ


¼ að7Þ ¼ 0;

minimum Hamming distance is dmin = 2 and it cannot be guaranteed even that this
code will always could correct one error in a code word. However, further it will be
shown that in some situations, using a suitable decoding procedure, it could correct
more!
(c) Consider the transmission of code word x = (11111111). This vector is
emitted as a polar pulse train x* = (+1, +1, +1, +1, +1, +1, +1, +1). The
transmission is through the channel with AWGN which has a known samples
values, and at its output the vector y* = (−0.1, 1.2, 0.7, −0.5, 1.1, 0.6, 0.85,
0.75) is obtained.
As shown in Fig. 9.18 this signal can be decoded using various procedures—
after hard decision majority decoding can be applied or iterative decoding on the
bit-flipping algorithm basis. The received word in this case would be
y = (01101111), where the errors occurred at the positions 1 and 4. It is recom-
mended to the reader to find the corresponding result if the iterative decoding with
hard decisions is applied.
In the following the procedure for decoding of the received word using sum-
product algorithm (SPA), known as well as belief-propagation algorithm, proposed

i c x Channel y Iterative decoder


LDPC 0→-1
Source with sum-product User
encoder 1→+1
noise (bilief propagation)

Q(aˆn )
2

1
-1→0 r Majority
aˆn
-2 -1 1 2

+1→1 decoder
-1

-2

q=2

Iterative decoder
bit-flipping

Fig. 9.18 Block scheme of a transmission system using LDPC codes


Problems 481

firstly in [90] and rediscovered in [93] will be described. It is supposed the AWGN
in channel has a standard deviation r = 0.7, the coefficients fjc are calculated firstly
on the basis of probability density functions according to the transmitted signal
values

ðyj þ 1Þ2 ðyj 1Þ2


1 1
fj0 ¼ pffiffiffiffiffiffi e 2r2 ; fj1 ¼ pffiffiffiffiffiffi e 2r2
2pr 2pr

the corresponding numerical values are given in Table 9.3. In such a way the initial
metrics are found showing the likelihood that single samples originate from the one
of the two possible emitted symbols.
Further, two sets of coefficients are calculated sent from the variable to the check
nodes in the first iteration, Q0ij and Q1ij . As presented in Fig. 9.19, the initial values
of these coefficients are found from

fjc ; Hði; jÞ ¼ 1;
Qcij ¼
0; Hði; jÞ ¼ 0:

After the check node ci receives from all connected variable nodes vj the
probability values Qcij (for c = 0 and c = 1), the probabilities are calculated that ith
parity-check equation is satisfied, if the variable node has the value c, using the
relation
X Y
Rcij ¼ Pðci =vÞ Qdikk
dj ¼c k2NðiÞ=j

where N(i) is a set of parent (variable) nodes indexes connected to the check node
ci, and N(i)/j is the same set, but without the node vj to which the information is
sent.
The term P(ci/v) denotes the probability that jth parity-check equation is satis-
fied, and summing is carried out over all possible decoded vectors v for which the
parity-check equation is satisfied when the informed variable node is in the state c.
Therefore, the coefficient R123 is the estimation that the check node 2 sends to the
variable node 2 (shown in Fig. 9.20) and it is calculated under the condition that the
corresponding parity-check equation v2  v4  v5  v7 ¼ 0 is satisfied and when
the second variable node is in the state v2 = 1, being

Table 9.3 The received vector values and the corresponding coefficients fjx
j 1 2 3 4 5 6 7 8
yj –0.1 1.2 0.3 –0.3 1.1 0.6 0.85 0.75
fj0 0.2494 0.0041 0.1016 0.3457 0.0063 0.0418 0.0173 0.0250
fj1 0.1658 0.5471 0.3457 0.1016 0.5641 0.4841 0.5570 0.5347
482 9 Low Density Parity Check Codes

v1 v2 v3 v4 v5 v6 v7 v8

Q14c = f 4c Q15c = f 5c
c c
Q = f
12 2
Q18c = f8c

c1 c2 c3 c4

Fig. 9.19 Initial information transmission from the variable nodes to the check nodes

v1 v2 = 1 v3 v4 v5 v6 v7 v8

Q14x
R12d 2 =1 Q15x
Q18x

c1 c2 c3 c4

Fig. 9.20 Transmission of initial information from check node to variable nodes

R112 ¼ Q014 Q015 Q117 þ Q014 Q115 Q017 þ Q114 Q015 Q017 þ Q114 Q115 Q117 ;

because for v2 = 1, the combinations ðv4 ; v5 ; v7 Þ: ð0; 0; 1Þ, ð0; 1; 0Þ, ð1; 0; 0Þ,
ð1; 1; 1Þ satisfy the parity-check.
Values R0ij and R1ij make possible to find the first estimation of the received
vector.
) )
v01 ¼ f10 R021 R041 ¼ 7:25  104 v05 ¼ f50 R015 R045 ¼ 4:19  105
) v1 ¼ 0 ) v5 ¼ 1
v11 ¼ f11 R121 R141 ¼ 5:49  104 v15 ¼ f51 R115 R145 ¼ 9:15  104
) )
v02 ¼ f20 R012 R022 ¼ 3:79  105 v06 ¼ f60 R026 R036 ¼ 6:49  105
) v2 ¼ 1 ) v6 ¼ 1
v12 ¼ f21 R112 R122 ¼ 6:98  104 v16 ¼ f61 R126 R136 ¼ 3:90  104
) )
v03 ¼ f30 R023 R033 ¼ 4:97  105 v07 ¼ f70 R037 R047 ¼ 2:23  105
) v3 ¼ 1 ) v7 ¼ 1
v13 ¼ f31 R123 R133 ¼ 3:79  103 v17 ¼ f71 R137 R147 ¼ 5:16  103
) )
v04 ¼ f40 R114 R144 ¼ 3:85  104 v08 ¼ f80 R018 R038 ¼ 8:39  105
) v4 ¼ 1 ) v8 ¼ 1
v14 ¼ f41 R114 R144 ¼ 5:25  104 v18 ¼ f81 R118 R138 ¼ 1:53  103

By forming the first estimation of the codeword v0 ¼ ð01111111Þ the first iter-
ation is ended. Because this vector does not satisfy the relation v0 H T ¼ 0, the
procedure should be continued (as a sent word is x = (1111110), it is obvious that
Problems 483

there is still one more error at the first bit, but the decoder, of course, does not
“know” it!).
The next iteration starts by calculating coefficients Q0ij and Q1ij . The variable node
vj from connected check nodes receives the probabilities Rcij values (for c = 0 and
c = 1), the probability is corrected that this node is in the state c
0 11
Y X Y
Qcij ¼ aij fjc Rckj ; aij ¼ @ fc
c j
Rckj A
k2MðjÞ=i k2MðjÞ=i

where aij is a normalization constant and M(j)/i is the set of indexes of check nodes
connected to the node vj, but without the check node ci to which the information is
sent.
On the basis of Fig. 9.21 it is obvious that except the check node to which the
considered variable node sends the information, only one more node is connected to
it. For the variable node j = 1, it can be written [93]

Q021 ¼ a21 f10 R041 ; Q121 ¼ a21 f11 R141 ; a21 ¼ ðf10 R041 þ f11 R141 Þ1 ;
Q041 ¼ a41 f10 R021 ; Q121 ¼ a41 f11 R121 ; a41 ¼ ðf10 R021 þ f11 R121 Þ1 ;

while, e.g. for the sixth variable node, the corresponding equations are

Q026 ¼ a26 f10 R036 ; Q126 ¼ a26 f11 R136 ; a21 ¼ ðf10 R036 þ f11 R136 Þ1 ;
Q036 ¼ a36 f10 R026 ; Q136 ¼ a36 f11 R126 ; a36 ¼ ðf10 R026 þ f11 R126 Þ1 ;

After that, the calculating procedure for coefficients R0ij and R1ij is repeated, as
described earlier.
Numerical values of coefficients Q0ij , Q1ij , R0ij and R1ij , for all combinations of
check and variable nodes, are given in Tables 9.4 9.5 9.6 9.7 9.8 and 9.9,
respectively.

v1 = 1 v2 v3 v4 v5 v6 = 0 v7 v8

v6 = 0
v1 =1 R26
R41
Q36v6 =0
v1 =1
Q21

c1 c2 c3 c4

Fig. 9.21 Information transmission from variable nodes to the check nodes, start of the next
iteration
484 9 Low Density Parity Check Codes

Table 9.4 Coefficients R0ij values in the first iteration


i/j 1 2 3 4 5 6 7 8
1 0 0.1343 0 0.0108 0.1301 0 0 0.1366
2 0.0208 0.0691 0.0702 0 0 0.0740 0 0
3 0 0 0.0237 0 0 0.0210 0.0257 0.0245
4 0.1396 0 0 0.0806 0.0508 0 0.0501 0

Table 9.5 Coefficients R1ij values in the first iteration


i/j 1 2 3 4 5 6 7 8
1 0 0.0250 0 0.1652 0.0238 0 0 0.0203
2 0.1385 0.0510 0.0501 0 0 0.0518 0 0
3 0 0 0.1453 0 0 0.1558 0.1361 0.1415
4 0.0239 0 0 0.0554 0.0682 0 0.0680 0

Table 9.6 Values of coefficients Q0ij in the second iteration


i/j 1 2 3 4 5 6 7 8
1 0 0.01 0 0.918 0.0083 0 0 0.0081
2 0.8977 0.0385 0.0093 0 0 0.0082 0 0
3 0 0 0.0744 0 0 0.1099 0.0224 0.2401
4 0.1845 0 0 0.335 0.0578 0 0.0058 0

Table 9.7 Values of coefficients Q1ij in the second iteration


i/j 1 2 3 4 5 6 7 8
1 0 0.99 0 0.082 0.9917 0 0 0.9919
2 0.1023 0.9615 0.9907 0 0 0.9885 0 0
3 0 0 0.9256 0 0 0.8901 0.9776 0.7599
4 0.8155 0 0 0.6650 0.9422 0 0.9942 0

Table 9.8 Values of coefficients R0ij in the second iteration


i/j 1 2 3 4 5 6 7 8
1 0 0.9044 0 0.0259 0.9030 0 0 0.9028
2 0.0574 0.8814 0.8587 0 0 0.8603 0 0
3 0 0 0.3063 0 0 0.2887 0.3274 0.1829
4 0.3558 0 0 0.2242 0.3971 0 0.4079 0
Problems 485

Table 9.9 Values of coefficients R1ij in the second iteration


i/j 1 2 3 4 5 6 7 8
1 0 0.0956 0 0.9741 0.0970 0 0 0.0972
2 0.9426 0.1186 0.1413 0 0 0.1397 0 0
3 0 0 0.6937 0 0 0.7113 0.6726 0.8171
4 0.6442 0 0 0.7758 0.6029 0 0.5921 0

The second estimation of the received vector is found from


) ) ) )
5:10  103 3:25  103 7:85  103 2:57  103
) v1 ¼ 1; ) v2 ¼ 1; ) v3 ¼ 1; ) v4 ¼ 1
1:01  101 6; 2  103 5:10  102 4:34  102
) ) ) )
2:27  103 1:04  102 2:32  103 4:13  103
) v5 ¼ 1; ) v6 ¼ 1; ) v7 ¼ 1; ) v8 ¼ 1
3:3  102 4:81  102 2:22  101 4:25  102

and v00 ¼ ð11111111Þ, this vector satisfies the relation v00 H T ¼ 0, a valid code
word is reconstructed and the decoding procedure is ended. It should be noticed that
the node for which the metrics are recalculated (it reports or is informed) does not
take part in the corresponding estimation calculation, providing that a procedure
converge to the correct solution. Algorithm is simpler if the number of ones in
every row is smaller, what is one of the reasons that it is desirable for the LDPC
parity-check matrix to have a small density.
Furthermore, during the decision whether the zero or one is decoded, the relative
ratio of values dj,0 i dj,1 is only important, and these values can be presented in a
normalized form. These normalized values are the estimation of likelihood for a
decision made after the iteration having the ordinal number iter, which, similarly as
for turbo codes decoding can be denoted as

0;ðiterÞ v0j 1;ðiterÞ v1j


Lj ¼ ; Lj ¼ ; j ¼ 1; 2; . . .; n
v0j þ vj 1 v0j þ v1j

corresponding to the probability that in the jth symbol interval the bit 0/1 is emitted,
when the received word is known. It is clear that a priori probabilities are 0.5 (if it is
not said that the symbols are not equiprobable) and initial estimation of likelihoods is

0;ð0Þ fj0 0;ð0Þ fj1


Lj ¼ ; Lj ðvj Þ ¼ ; j ¼ 1; 2; . . .; n
fj0 þ fj 1 fj0 þ fj1

corresponding to the estimation before the decoding started (after the zeroth iter-
ation), i.e. to reliability estimation if only hard decision of the receiving bits was
made, without any processing by the decoder.
In Table 9.10 is given how the reliability estimations for single symbols change
during the iterative decoding (shadowed fields in the table correspond to wrongly
reconstructed bits):
486 9 Low Density Parity Check Codes

Table 9.10 Reliability estimations for symbols during the iterations for r = 0.7
Iter Lsj,( iter ) / j 1 2 3 4 5 6 7 8
0,(0)
L j 0.6006 0.0074 0.0543 0.8850 0.0111 0.0795 0.0302 0.0447
0
L1,(0)
j 0.3994 0.9926 0.9457 0.1150 0.9889 0.9205 0.9698 0.9553
L0,(1)
j 0.5690 0.0514 0.0130 0.4228 0.0438 0.0164 0.0043 0.0520
1
L1,(1)
j 0.4310 0.9486 0.9870 0.5772 0.9562 0.9836 0.9957 0.9480
L0,(2)
j 0.0482 0.3440 0.1335 0.0559 0.0644 0.1776 0.0103 0.0887
2
L1,(2)
j 0.9518 0.6560 0.8665 0.9441 0.9356 0.8224 0.9897 0.9113
L0,(3)
j 0.0584 0.0135 0.0377 0.1053 0.0068 0.0488 0.0105 0.0661
3
L1‚(3)
j 0.9416 0.9865 0.9623 0.8947 0.9932 0.9512 0.9895 0.9339
L0,(5)
j 0.0190 0.0123 0.0072 0.0280 0.0103 0.0097 0.0024 0.0083
5
L1,(5)
j 0.9810 0.9877 0.9928 0.9720 0.9897 0.9903 0.9976 0.9917
L0,(10)
j 0.0008 0.0003 0.0003 0.0008 0.0002 0.0004 0.0001 0.0004
10
L1,(10)
j 0.9992 0.9997 0.9997 0.9992 0.9998 0.9996 0.9999 0.9996

– before the decoding, the estimation is wrong at the first and the fourth symbol,
but these estimations are highly unreliable, while the values for the other bits are
very reliable,
– after the first iteration the error at the fourth bit is corrected, but its estimation is
unreliable, while the error at the first bit is not corrected, but this estimation is
not reliable as well,
– after the third iteration all errors are corrected and a valid code word is decoded,
but the second bit estimation has still a moderate reliability,
– after the third estimation all bits are estimated with the reliability higher than
90% and the reliability further grows with the increasing iteration number and
after the tenth iteration it is greater than 99.9%!
It is shown that the decoding can be stopped always in that iteration where a
valid code word is reconstructed, because the algorithm must converge to a unique
solution and the estimation reliability only grows with the number of iterations.
Besides, the reliability estimation of the obtained result depends on the estimated
channel signal-to-noise ratio (fjc depends on yj and r2), and not only on the received
word. In Table 9.11 the values of these estimations are given before the decoding
and during the first and the second iteration. The reader should draw the corre-
sponding conclusion on the basis of the obtained results.
Problem 9.7 Communication system uses for error control coding LDPC code
described by the parity-check matrix from the previous problem. For transmission
the unit power polar pulses are used and the signal at the channel output is
Problems 487

Table 9.11 Reliability estimations for symbols during the iterations for r = 1

Iter Lsj,( iter ) / j 1 2 3 4 5 6 7 8


L0,(0)
j 0.5498 0.0832 0.1978 0.7311 0.0998 0.2315 0.1545 0.1824
0
L1,(0)
1 0.4502 0.9168 0.8022 0.2689 0.9002 0.7685 0.8455 0.8176
L0,(1)
j 0.5419 0.1351 0.1429 0.5512 0.1463 0.1621 0.1005 0.2110
1
L1,(1)
1 0.4581 0.8649 0.8571 0.4488 0.8537 0.8379 0.8995 0.7890
L0,(2)
j 0.3638 0.1976 0.2102 0.4262 0.1765 0.2403 0.1275 0.2292
2
L1,(2)
1 0.6362 0.8024 0.7899 0.5738 0.8235 0.7597 0.8725 0.7708

y ¼ ð0:1; 1:2; 0:7; 0:5; 1:1; 0:6; 0:85; 0:75Þ:

Decode the signal by applying a logarithmic version of sum-product algorithm if


the channel noise is AWGN with variance r2 = 0.49 for two following cases:
(a) Channel output is continuous,
(b) Before the decoder input, the decision block is inserted which has a zero
threshold.

Solution
The same code and the same sequence at the channel output are considered as in the
previous problem, but the decoding procedure differs, i.e. here the realization of
sum-product algorithm in logarithmic domain is considered. The motives to use a
logarithmic domain are the same as for turbo codes decoding—avoiding the
products calculation always when it is possible (i.e. their change by summations)
and lessening the overflow effect.
In this algorithm version on the basis of initial estimations of a posteriori
probabilities Pðxn ¼ þ 1=yn Þ and Pðxn ¼ 1=yn Þ for every symbol a priori loga-
rithmic likelihood ratio is defined

ð0Þ
Kj ¼ logðPðxn ¼ þ 1=yn Þ=Pðxn ¼ 1=yn ÞÞ:

Similarly, a logarithmic likelihood ratio can be defined corresponding to infor-


mation sent by a variable nodes to the connected check node Qcij (corresponding to
the probability that this variable node is in c state, on the basis of the variable node
estimation itself), and information sent by the check node to the adjoined variable
node j, R0ij (corresponding to the probability that this variable node is in the state c,
on the basis of the estimation reporting check node).

ai;j ¼ logðQ1i;j =Q0i;j Þ; bi;j ¼ logðR1i;j =R0i;j Þ:

Sum-product algorithm in logarithmic domain can now be described by the


following steps [95]:
488 9 Low Density Parity Check Codes

1. For all check nodes connected with jth variable node is set initially

ð0Þ
ai;j ¼ Kj :

2. For every iteration the following is calculated


8 9
< Y = X
ð0Þ
bi;j ¼ 2 tan h1 tan hðai;k =2Þ ; ai;j ¼ Kj þ bk;j ;
:k2NðiÞnj ; k2MðjÞni

where tanh(x) denotes hyperbolic tangent. A posteriori information concerning


symbols after iteration, denoted by ordinal number iter, are

ðiterÞ ð0Þ
X
Kj ¼ Kj þ bi;j ;
i2MðjÞ

and the estimation of a corresponding code word symbol after current iteration is
ðiterÞ ð0Þ
found from vj ¼ sgnðKj Þ, where sgn(x) is the sign of the argument x.
3. Procedure from the previous step is repeated iteratively until vHT = 0 is
obtained or if a fixed (given in advance) number of iterations is achieved.
In this problem two different cases will be considered, as shown in Fig. 9.22.
In the first case the initial estimations are obtained on the “soft inputs” basis,
while in the second case they are obtained from the quantized samples (equivalent
to the received binary word). As it will be shown, the output in both cases is
“soft”, i.e. besides the send code word estimation a logarithmic likelihood ratio
will be found for every code word symbol, estimated on the channel output
sequence basis.
(a) When the estimation is based on the non quantized samples, the initial esti-
mations are formed using the same relations as in a previous problem

i x x* Channel y* Q(aˆn )
2
y Iterative decoder
LDPC 0→-1 1

Source with -2 -1 1 2 aˆn

sum-product User
encoder 1→+1
-1

noise
-2

q=2 (log-domain)

Iterative decoder
sum-product
(log-domain)

Fig. 9.22 Block scheme of a transmission system using LDPC codes when the decoding is
performed using sum-product algorithm in a logarithmic domain
Problems 489

1 ðyj þ 1Þ2
1 ðyj 1Þ2  
ð0Þ
fj0 ¼ pffiffiffiffiffiffi e 2r2 ; fj1 ¼ pffiffiffiffiffiffi e 2r2 ) Kj ¼ log fj1 =fj0 :
2pr 2pr

The variables ai;j ; bi;j values are further calculated by the above expressions and
the numerical values, concluding with the second iteration, are given in Table 9.12.
On the basis of numerical values given in Table 9.13 the same conclusions can
be drawn as in the previous problem. It is easy to verify that between the logarithms
of likelihood ratio and a posteriori estimations that zero or one were sent (when
“non-logarithmic” SPA logarithm version is used) the connection can be
established

ðiterÞ ð1Þ ð0Þ


Kj ¼ logðLj;iter =Lj;iter Þ;

and it is left to a reader to verify whether this relation is satisfied for the numerical
values given in Tables 9.10 and 9.13.
(b) When the estimation is based on the quantized samples, system part from the
encoder output to the decoder input can be described as a binary symmetric
channel where the crossover probability is

1 1
p ¼ erfc pffiffiffi ¼ 0:0766:
2 2r

In this case, the quantizer output sequence is

yq ¼ ð1; þ 1; þ 1; 1; þ 1; þ 1; þ 1; þ 1Þ;

and the initial estimations are the same for all “positive” symbols (i.e. for all
“negative” symbols) given by

ð0Þ
Kj ¼ yqj logðð1  pÞ=pÞ:

Numerical values of variables ai;j ; bi;j are calculated by this relation and given in
Table 9.14, while the corresponding likelihood ratios are given in Table 9.15. It can
be noticed that a large number of symbols has the same likelihood ratio and that the
relation between the likelihood ratios does not changes significantly during a few
first iterations.
Already after the first iteration the algorithm reconstructs the word

v ¼ ð00101111Þ;

which satisfies the relation vHT = 0, i.e. it is a valid code word. However, it is clear
that it is not a sent code word (the sent code word x is composed from a binary ones
490

Table 9.12 Estimations of symbols reliability during the iterations, log-SPA, soft inputs
Iter i/j 1 2 3 4 5 6 7 8
0 ð0Þ –0.4082 4.8980 2.8571 –2.0408 4.4898 2.4490 3.4694 3.0612
Kj
1 aij 1 0 4.8980 0 –2.0408 4.4898 0 0 3.0612
2 –0.4082 4.8980 2.8571 0 0 2.4490 0 0
3 0 0 2.8571 0 0 2.4490 3.4694 3.0612
4 –0.4082 0 0 –2.0408 4.4898 0 3.4694 0
bij 1 0 –1.6791 0 2.7264 –1.6988 0 0 –1.9089
2 1.8944 –0.3041 –0.3367 0 0 –0.3573 0 0
3 0 0 1.8132 0 0 2.0048 1.6678 1.7518
4 –1.7642 0 0 –0.3742 0.2934 0 0.3055 0
2 aij 1 0 4.5938 0 –2.4150 4.7832 0 0 4.8130
2 –2.1724 3.2188 4.6703 0 0 4.4538 0 0
9

3 0 0 2.5204 0 0 2.0917 3.7749 1.1523


4 1.4863 0 0 0.6856 2.7910 0 5.1371 0
bij 1 0 –2.2471 0 3.6268 –2.2309 0 0 –2.2287
2 2.7977 –2.0055 –1.8045 0 0 –1.8180 0 0
3 0 0 0.8175 0 0 0.9018 0.7200 1.4970
4 0.5936 0 0 1.2412 0.4175 0 0.3725 0
Low Density Parity Check Codes
Problems 491

Table 9.13 Logarithmic likelihood ratio during the iterations, log-SPA, soft inputs

Iter Λ (jiter ) / j 1 2 3 4 5 6 7 8
(0)
0 Λ j –0.4082 4.8980 2.8571 –2.0408 4.4898 2.4490 3.4694 3.0612
1 Λ (1)
j –0.2779 2.9147 4.3336 0.3114 3.0844 4.0965 5.4426 2.9041
2 Λ (2)
j 2.9832 0.6453 1.8701 2.8272 2.6764 1.5328 4.5619 2.3295
3 Λ (3)
j 2.7796 4.2936 3.2386 2.1395 4.9860 2.9701 4.5433 2.6488
5 Λ (5)
j 3.9420 4.3898 4.9216 3.5455 4.5636 4.6213 6.0449 4.7795
10 Λ (10)
j 7.0962 8.2109 8.1903 7.0840 8.3884 7.9194 9.1831 7.7597

only), and besides the two errors occurred in a channel, the decoder introduced an
additional error at the second code word bit.
It is interesting to notice that the Hamming distance of the reconstructed code
word x′ = (00101111) to the received word y = (01101111) equals one, while the
distance the received word to the emitted word x = (11101111) equals two.
Therefore, the algorithm made the best possible decision on the received word basis
and the parity-check matrix structure knowledge.
On the other hand, squared Euclidean distance of sequence y* = (−0.1, 1.2, 0.7,
−0.5, 1.1, 0.6, 0.85, 0.75) to the concurrent emitting sequences (−1, −1, 1, −1, 1, 1,
1, 1) and (+1, +1, +1, +1, +1, +1, +1, +1) respectively is 6.245 i 3.845, and the
sequence corresponding to the emitted code word is nearer to the received
sequence, explaining the result from the first part of this problem.
Now it is obvious that the decoding error in the second part of solution is not the
consequence of the non-optimum decoding algorithm (it is the same in both cases),
but of the quantization error. Of course, if the quantization was carried out using a
sufficient number of levels, this effect would become negligible.
Problem 9.8 Transmission system uses LDPC error control code described by
parity-check matrix given in the previous problems, for transmission the unit power
polar pulses are used and the signal at the channel output is

y ¼ ð0:1; 1:2; 0:7; 0:5; 1:1; 0:6; 0:85; 0:75Þ;

Decode the signal if the channel noise is AWGN with variance r2 = 0.49, when
the signal is led directly to the decoder input by applying the following:
(a) Min-sum algorithm,
(b) Min-sum algorithm supposing that the initial LLR-s are determined by signal
samples,
(c) Self-correcting min-sum algorithm supposing that the initial LLR-s are
determined by signal samples.
492

Table 9.14 Estimations of symbols reliability during the iterations, log-SPA, hard inputs
Iter i/j 1 2 3 4 5 6 7 8
0 Lj –2.49 2.49 2.49 –2.49 2.49 2.49 2.49 2.49
1 aij 1 0 2.49 0 –2.49 2.49 0 0 2.49
2 –2.49 2.49 2.49 0 0 2.49 0 0
3 0 0 2.49 0 0 2.49 2.49 2.49
4 –2.49 0 0 –2.49 2.49 0 2.49 0
bij 1 0 –1.4095 0 1.4095 –1.4095 0 0 –1.4095
2 1.4095 –1.4095 –1.4095 0 0 –1.4095 0 0
3 0 0 1.4095 0 0 1.4095 1.4095 1.4095
4 –1.4095 0 0 –1.4095 1.4095 0 1.4095 0
2 aij 1 0 1.0805 0 –3.8995 3.8995 0 0 3.8995
2 –3.8995 1.0805 3.8995 0 0 3.8995 0 0
9

3 0 0 1.0805 0 0 1.0805 3.8995 1.0805


4 –1.0805 0 0 –1.0805 1.0805 0 3.8995 0
bij 1 0 –2.8019 0 0.9814 –0.9814 0 0 –0.9814
2 0.9814 –2.8019 –0.9814 0 0 –0.9814 0 0
3 0 0 0.4759 0 0 0.4759 0.2411 0.4759
4 –0.4759 0 0 –0.4759 0.4759 0 0.2411 0
Low Density Parity Check Codes
Problems 493

Table 9.15 Logarithmic likelihood ratio during the iterations, log-SPA, hard inputs

Iter Λ (jiter ) / j 1 2 3 4 5 6 7 8
(0)
0 Λ j –2.4900 2.4900 2.4900 –2.4900 2.4900 2.4900 2.4900 2.4900
1 Λ (1)
j –2.4900 –0.3290 2.4900 –2.4900 2.4900 2.4900 5.3090 2.4900
2 Λ (2)
j –1.9844 –3.1139 1.9844 –1.9844 1.9844 1.9844 2.9721 1.9844
3 Λ (3)
j –3.4898 –1.2587 3.4898 –3.4898 3.4898 3.4898 3.5515 3.4898
5 Λ (5)
j –3.4409 –3.7699 3.4049 –3.4049 3.4049 3.4049 4.3304 3.4049
10 Λ (10)
j –5.7482 –4.1346 5.7482 –5.7482 5.7482 5.7482 7.1994 5.7482

Solution
As known from the previous problem, logarithmic version of a sum-product
algorithm has the substantial advantages in respect to its original version:
– the number of multiplications is decreased in favor the additions
– the overflow probability is decreased
– the number of relations whose values should be found is two times smaller—
instead of Q0ij and Q1ij only ai;j is calculated, and instead of R0ij and R1ij only bi;j is
calculated,
– a number of mathematical operations in every step is smaller, there is no need
for normalization etc.
However, the algorithm explained in a previous problem has one serious
drawback. To find the information which the check node sends to it adjoined
variable node j one has to determine
8 9
< Y =
bi;j ¼ 2 tan h1 tan hðai;k =2Þ ;
:k2NðiÞnj ;

it is obvious that the corresponding numerical value cannot be easily found, and still
more serious problems appear when the hardware block should be implemented
corresponding to this relation.
That is the reason to use the SPA procedure versions much easier to be
implemented:
1. The simplest (and still rather exact) approximation of the above equality is
2 3
Y  
 
bi;j ¼ 4 sgnðai;k Þ5  min ai;k  ;
k2NðiÞnj
k2NðiÞnj
494

Table 9.16 Estimations of symbols reliability by the check nodes, min-sum, soft inputs
Iter i/j 1 2 3 4 5 6 7 8
0 ð0Þ –0.4082 4.8980 2.8571 –2.0408 4.4898 2.4490 3.4694 3.0612
Kj
1 bij 1 0 –2.0408 0 3.0612 –2.0408 0 0 –2.0408
2 2.4490 –0.4082 –0.4082 0 0 –0.4082 0 0
3 0 0 2.4490 0 0 2.8571 2.4490 2.4490
4 –2.0408 0 0 –0.4082 0.4082 0 0.4082 0
2 bij 1 0 –2.4490 0 4.4898 –2.4490 0 0 –2.4490
2 2.8571 –2.4490 –2.4490 0 0 –2.4490 0 0
9

3 0 0 1.0204 0 0 1.0204 1.0204 2.0408


4 1.0204 0 0 2.0408 1.0204 0 1.0204 0
Low Density Parity Check Codes
Problems 495

being the key modification defining the difference between SPA and min-sum
algorithm [97].
2. The modification concerning the information which the variable node i sends to
check node j connected to it is possible as well, and the corresponding value is
now calculated in two steps
X 
ð0Þ ai;j ; sgn(ai;j Þ ¼ sgn(ai;j Þ;
ai;j ¼ Kj þ bk;j ; ai;j ¼
0; sgn(ai;j Þ 6¼ sgn(ai;j Þ;
k2MðjÞni

and this correction defines self-correcting min-sum algorithm [98].


(a) The modification introduced in min-sum algorithm directly influences the
coefficient bij values, these variable values during the first two iterations are
given in Table 9.16. Of course, during next iterations, these coefficient values
changes will influence at the other relevant parameters, what can be easily
followed from a posteriori LLR values, given in Table 9.17.
Although min-sum is an approximate version of sum-product algorithm, the
approximation here allowed the code word decoding already after the first iteration.
However, the estimation of the first bit after the first iteration and the estimation of
the second bit after the second iteration are extremely unreliable! Because of that,
not until the third iteration one can be relatively reliable asserted that a code word
was successfully decoded.
(b) The logarithmic likelihood ratios obtained by using self-correcting min-sum
algorithm after some number of iterations are given in Table 9.18. Firstly, it
can be noticed that in this case LLR symbol values grow substantially slower
with the number of iterations increase, from which the conclusion could be
drawn that the estimations are less reliable. However, in the case of
self-correcting min-sum algorithm, LLR-s converge substantially faster to the
stable values, and in this case, by the rule, a smaller number of iteration is
needed to make a decision. On the other hand, the small coefficients values

Table 9.17 Logarithmic likelihood ratio during iterations, min-sum, soft inputs

Iter Λ (jiter ) / j 1 2 3 4 5 6 7 8
(0)
0 Λ j –0.4082 4.8980 2.8571 –2.0408 4.4898 2.4490 3.4694 3.0612
(1)
1 Λ j 1.1×10 -16
2.4490 4.8980 0.6122 2.8571 4.8980 6.3265 3.4694
2 Λ (2)
j 3.4694 8.9×10-16 1.4286 4.4898 3.0612 1.0204 5.5102 2.6531
(3)
3 Λ j 4.0816 5.5102 3.4694 2.4490 6.9388 3.4694 5.5102 3.0612
5 Λ (5)
j 5.5102 4.8980 5.9184 5.3061 5.5102 5.5102 7.5510 6.9388
(10)
10 Λ j 12.6531 12.0408 12.0408 12.8571 13.2653 12.0408 14.2857 12.0408
496 9 Low Density Parity Check Codes

Table 9.18 Logarithmic likelihood ratio during iterations, self-correcting min-sum, soft inputs

Iter Λ (jiter ) / j 1 2 3 4 5 6 7 8
(0)
0 Λ j –0,4082 4.8980 2.8571 –2.0408 4.4898 2.4490 3.4694 3.0612
1 Λ (1)
j 1.1×10 -16
2.4490 4.8980 0.6122 2.8571 4.8980 6.3265 3.4694
2 Λ (2)
j 2.4490 8.9×10-16 1.4286 2.4490 2.0408 1.0204 4.4898 2.6531
3 Λ (3)
j 2.0408 2.4490 2.4490 0.4082 2.4490 2.4490 3.4694 1.0204
5 Λ (5)
j 2.4490 2.4490 2.4490 1.0204 2.4490 3.0612 3.4694 1.0204
10 Λ (10)
j 2.4490 2.4490 2.4490 1.0204 2.4490 3.0612 3.4694 1.0204

provide good protection against the overflow, being of importance in the


practical implementation. It is interesting to note that in a case when at the
decoder input, the values obtained by a hard decisions were led, this procedure
resulted in an exact reconstruction of the sent code sequence (SPA did not
succeeded, see the second part of the previous problem).
Finally, the case is considered when for the initial LLR values the signal samples
are used, i.e. when

ð0Þ
Kj ¼ yj :

Although it is clear that this approach is not optimum and that the LLR for some
symbols by a rule will have a substantially smaller values, numerical results (given
in Table 9.19) show that the convergence rate in respect to the previously con-
sidered case did not changed. Because of that, such an approach is often used for
the implementation of simplified sum-product algorithm versions.

Table 9.19 Logarithmic likelihood ratio during iterations, self-correcting min-sum, input samples
are used as the initial LLR values

Iter Λ (jiter ) / j 1 2 3 4 5 6 7 8

0 Λ (0)
j –0.1 1.2 0.7 –0.5 1.1 0.6 0.85 0.75
1 Λ (1)
j –1.1×10-16 0.6 1.2 0.15 0.7 1.2 1.55 0.85
(2)
2 Λ j –0.6 –2.2×10 -16
0.35 0.6 0.5 0.25 1.1 0.65
3 Λ (3)
j 0.5 0.6 0.6 0.1 0.6 0.6 0.85 0.25
5 Λ (5)
j 0.6 0.6 0.6 0.25 0.6 0.75 0.85 0.25
10 Λ (10)
j 0.6 0.6 0.6 0.25 0.6 0.75 0.85 0.25
Problems 497

Problem 9.9
(a) Explain the construction of parity-check matrix of a quasi-cyclic code where n
−k = 4, the first circulants is a1(x) = 1 + x and the second a2(x) = 1 + x2 + x4.
Draw a corresponding bipartite graph.
(b) Find this code generator matrix starting from the fact that its second circulant
is invertible. Write all code words and verify whether the code is cyclic.
(c) If it is not known that a code is quasi-cyclic, find its matrix by parity-check
matrix reduction in a systematic form to a reduced standard row-echelon form.
(d) Find a generator matrix by transforming the corresponding parity-check matrix
in approximate lower diagonal form.

Solution
(a) Parity-check matrix of quasi-cyclic (QC) LDPC code in a general case has a
form [99,100]

H ¼ ½A1 ; A2 ; . . .; Al

where A1, A2,…,Al are binary circulant matrices having dimensions u  u,


where u = n–k.
In the case when l = 2 and u = 5, the first row of every circulant matrix is
determined by the corresponding polynomial coefficients, while the other rows are
obtained by their successive cyclic shifts
2 3 2 3
1 1 0 0 0 1 0 1 0 1
60 1 1 0 07 61 1 0 1 07
6 7 6 7
A1 ¼ 6
60 0 1 1 077; A2 ¼ 6
60 1 1 0 177;
40 0 0 1 15 41 0 1 1 05
1 0 0 0 1 0 1 0 1 1

and the parity-check matrix of the corresponding LDPC code finally is


2 3
1 1 0 0 0 1 0 1 0 1
6 0 1 1 0 0 1 1 0 1 0 7
6 7
H¼6
6 0 0 1 1 0 0 1 1 0 1 7:
7
4 0 0 0 1 1 1 0 1 1 0 5
1 0 0 0 1 0 1 0 1 1

Tanner bipartite graph for this code is shown in Fig. 9.23. Although the groups
of symbol nodes, corresponding to every circulant do not overlap, it does not
guarantee that the small length cycles will be avoided.
(b) It is known that generator matrix of quasi-cyclic code generally has the form
498 9 Low Density Parity Check Codes

v1 v2 v3 v4 v5 v6 v7 v8 v9 v10

c1 c2 c3 c4 c5

Fig. 9.23 Bipartite quasi-cyclic code graph

2 T 3
ðA1
l A1 Þ
6 ðA1 T 7
G¼6 l A2 Þ 7;
4 I uðl1Þ ... 5
1 T
ðAl Al1 Þ

where matrix A1 is invertible (it is not necessarily the last right square submatrix in
H!). It is obvious that a corresponding code has the code word length n = ul, while
the information word length is k = u(l−1).
In this example circulant a2(x) is invertible yielding a−1 2 3 4
2 (x) = x + x + x , and it
can be easily verified that the product of corresponding matrices modulo-2 yields
the unity matrix
2 3 2 3
0 0 1 1 1 1 0 0 0 0
61 0 0 1 17 60 1 0 0 07
6 7 6 7
A1
2 ¼6
61 1 0 0 17 1 6
7 ) A2 A2 ¼ 6 0 0 1 0 077 ¼ I5
41 1 1 0 0 5 40 0 0 1 05
0 1 1 1 0 0 0 0 0 1

and generator matrix becomes


2 3
1 0 0 0 0 1 0 0 1 0
60 1 0 0 0 0 1 0 0 17
6 7
G ¼ I5 ðA1
2 A1 Þ
T
¼6
60 0 1 0 0 1 0 1 0 077:
40 0 0 1 0 0 1 0 1 05
0 0 0 0 1 0 0 1 0 1

Code corresponding to this generator matrix is systematic and the first five bits
are completely determined by information word at the encoder input. The corre-
sponding transformation is
Problems 499

i ¼ ð00001Þ ! x ¼ ð0000100101Þ

From the generator matrix it can be easily noticed that the successive cyclic
shifts of this word correspond to information words

x0 ¼ ð1000010010Þ ! i0 ¼ ð10000Þ
x00 ¼ ð0100001001Þ ! i00 ¼ ð01000Þ
x000 ¼ ð1010000100Þ ! i000 ¼ ð10100Þ

here first two relations are really satisfied (to these code words correspond written
information sequences), but the third equality is not satisfied because to information
sequence (10100) corresponds code word
i ''' = (10100) → x ''' = (1010000110) .

Because of the fact that to one information sequence cannot correspond two
different code words, it is obvious that a code is not cyclic, because at least one
cyclic code word shift is not a code word. This code is only quasi-cyclic, because
the parity-check bits are given by matrix corresponding to the polynomial
T
2 ðxÞ * a2 ðxÞ ¼¼ 1 þ x :
a1 1 3

(c) Standard Parity-check matrix row-echelon form satisfies the condition that in
the neighboring rows of parity-check matrix (being not all zeros rows!) a
leading one of the lower row appears in the one column to the right regarding
to the upper row. Reduced standard form includes an additional condition—
the column comprising a leading one in some row has not ones at the other
positions, i.e. the parity-check matrix begins by unit matrix having dimensions
(n – k)  (n – k).
It is obvious that the first four rows of the found parity-check matrix satisfy a
standard echelon form, but the fifth row does not satisfy it. In this case it is difficult
by using standard operations over the rows (reordering of rows, change of one row
by sum modulo-2 of two rows) to form complete matrix satisfying standard echelon
form. Therefore, step-by-step, the try will be made to form parity-check matrix in a
systematic form. In the first column, one should be only at the first position, the fifth
row is changed by the sum of the first and the fifth row, forming the matrix H1. In
the second column, one should be only at the second position, and the first row of
matrix H1 is changed by the sum of the first and the fifth row, and the fifth row is
changed by the sum of the second and the fifth row yielding the matrix H2. Besides,
one should be careful not to decrease the matrix rang by these transformations.
500 9 Low Density Parity Check Codes

2 3 2 3
1 1 0 0 0 1 0 1 0 1 1 1 0 0 0 1 0 1 0 1
6 7 6 7
60 1 1 0 0 1 1 0 1 07 60 1 1 0 0 1 1 0 1 07
6 7 6 7
6 7 6 7
H¼60 0 1 1 0 0 1 1 0 1 7 ! H1 ¼ 6 0 0 1 1 0 0 1 1 0 1 7;
6 7 6 7
60 7 60 07
4 0 0 1 1 1 0 1 1 05 4 0 0 1 1 1 0 1 1 5
1 0 0 0 1 0 1 0 1 1 0 1 0 0 1 1 1 1 1 0
2 3 2 3
1 1 0 0 0 1 0 1 0 1 1 0 0 0 1 0 1 0 1 1
6 7 6 7
60 1 1 0 0 1 1 0 1 07 60 1 1 0 0 1 1 0 1 07
6 7 6 7
6 7 6 7
H 1 ¼ 6 0 0 1 1 0 0 1 1 0 1 7 ! H 2 ¼ 6 0 0 1 1 0 0 1 1 0 1 7:
6 7 6 7
60 0 0 1 1 1 0 1 1 07 60 0 0 1 1 1 0 1 1 07
4 5 4 5
0 1 0 0 1 1 1 1 1 0 0 0 1 0 1 0 0 1 0 0

In the third column now one should be eliminated in the second row (changing
by the sum of the second and fifth row of matrix H2). Further in the new obtained
matrix a similar procedure is applied for elimination of ones in the fourth column—
in the third row (sum 3rd and 5th) and after in the fifth row (sum 4th and 5th). As
the zero in fifth row and fifth column of matrix H3 cannot be eliminated and it
cannot be written in systematic form, this approach does not yield the results!
2 3 2 3
1 0 0 0 1 0 1 0 1 1 1 0 0 0 1 0 1 0 1 1
6 7 6 7
60 1 1 0 0 1 1 0 1 07 60 1 0 0 1 1 1 1 1 0 7
6 7 6 7
H2 ¼ 6
60 0 1 1 0 0 1 1 0 177 ! H 3 ¼ 60
6 0 1 1 0 0 1 1 0 1 7;
7
6 7 6 7
40 0 0 1 1 1 0 1 1 05 40 0 0 1 1 1 0 1 1 05
0 0 1 0 1 0 0 1 0 0 0 0 0 1 1 0 1 0 0 1
2 3 2 3
1 0 0 0 1 0 1 0 1 1 1 0 0 0 1 0 1 0 1 1
6 7 6 7
60 1 0 0 1 1 1 1 1 07 60 1 0 0 1 1 1 1 1 07
6 7 6 7
H3 ¼ 6
60 0 1 1 0 0 1 1 0 17 6
7 ! H4 ¼ 6 0 0 1 0 1 0 0 1 0 077:
6 7 6 7
40 0 0 1 1 1 0 1 1 05 40 0 0 1 1 1 0 1 1 05
0 0 0 1 1 0 1 0 0 1 0 0 0 0 0 1 1 1 1 1

Of course, it does not mean that the parity-check matrix cannot be written in a
systematic form, but only means that it cannot be obtained by reducing it to reduced
standard echelon form. Knowing that a code is quasi-cyclic, from generator matrix
it is easily obtained
2 3
1 0 1 0 0 1 0 0 0 0
60 1 0 1 0 0 1 0 0 07
6 7
H s ¼ A1
2 A1 I5 ¼6
60 0 1 0 1 0 0 1 0 077:
41 0 0 1 0 0 0 0 1 05
0 1 0 0 1 0 0 0 0 1

(d) Parity-check matrix has an approximate lower triangular form if it is in the


form [101]
Problems 501

 
A B T

C D E

where matrix T is square, dimensions (n–k–g)  (n–k–g), where g denotes gap and
if this value is smaller, the encoding procedure complexity is smaller. Matrix B then
has dimensions (n−k−g)  g, while dimensions of A are (n−k−g)  k.
This matrix further by Gauss-Jordan elimination is reduced to a form
   
~T ¼ I nkg 0 ~ A B T
H HT ¼ ~ ~
ET 1 Ig C D 0

where the matrix E is “erased”, and the following is valid

~ ¼ ET 1 A þ C;
C ~ ¼ ET 1 B þ D;
D

On the basis of this matrix form, the code word can be obtained in a systematic
form, where it has the form

x ¼ ði; p1 ; p2 Þ;

where i is the information word and the other component vectors are formed by
using relations
p1 ¼ D ~ T;
~ 1 Ci p2 ¼ T 1 ðAiT þ Bp1 Þ:

In this example n = 10 and k = 5. The matrix firstly should be obtained in a


suitable form. By changing the first row by the sum of the first and the third row the
matrix H′ is obtained, and then by changing the second row of this matrix by sum
of the second and the fourth row, the matrix H″ is formed
2 3
1 1 0 0 0 1 0 1 0 1
6 7
60 1 1 0 0 1 1 0 1 0 7
6 7
H¼6 60 0 1 1 0 0 1 1 0 1 7 !
7
6 7
40 0 0 1 1 1 0 1 1 05
1 0 0 0 1 0 1 0 1 1
2 3
1 1 1 1 0 1 1 0 0 0
6 7
60 1 1 0 0 1 1 0 1 07
6 7
H0 ¼ 6
60 0 1 1 0 0 1 1 0 177
6 7
40 0 0 1 1 1 0 1 1 05
1 0 0 0 1 0 1 0 1 1
502 9 Low Density Parity Check Codes
2 3
1 1 1 1 0 1 1 0 0 0
6 7
60 1 1 0 0 1 1 0 1 07
6 7
H ¼6
0
60 0 1 1 0 0 1 1 0 177!
6 7
40 0 0 1 1 1 0 1 1 05
1 0 0 0 1 0 1 0 1 1
2 3
1 1 1 1 0 1 1 0 0 0
6 7
60 1 1 1 1 0 1 1 0 07
6 7
H 00 ¼ 6
60 0 1 1 0 0 1 1 0 177
6 7
40 0 0 1 1 1 0 1 1 05
1 0 0 0 1 0 1 0 1 1

Finally, by reordering the third and the fourth row in the above matrix, the
parity-check matrix is formed in an approximate lower triangular form
⎡1 1 1 1 0 1 1 0 0 0⎤
⎢0 1 1 1 1 0 1 1 0 0 ⎥⎥

H T = ⎢0 0 0 1 1 1 0 1 1 0⎥ ,
⎢ ⎥
⎢0 0 1 1 0 0 1 1 0 1⎥
⎢⎣1 0 0 0 1 0 1 0 1 1 ⎥⎦

where
2 3 2 3 2 3
1 1 1 1 0 1 1 0 0 0
60 17 607 61 0 07
6 1 1 1 7 6 7 6 1 7
A¼6 7; B ¼ 6 7; T ¼ 6 7;
40 0 0 1 15 415 40 1 1 05
0 0 1 1 0 0 1 1 0 1

C = ½1 0 0 0 1 ; D = ½0 ; E ¼ ½ 1 0 1 1 :

Here is n – k = 5 and it is obvious that g = 1, and the desired parity-check matrix


form is
⎡1 1 1 1 0 1 1 0 0 0⎤
⎢0 1 1 1 1 0 1 1 0 0 ⎥⎥

H T = ⎢0 0 0 1 1 1 0 1 1 0⎥ ,
⎢ ⎥
⎢0 0 1 1 0 0 1 1 0 1⎥
⎢⎣1 0 1 0 0 1 0 0 0 0 ⎥⎦

where

~ ¼½ 1 0
C 1 0 0 ; ~ ¼½1 :
D

For two previously considered information sequences now it is obtained


Problems 503

i ¼ ð10000Þ ! p1 ¼ ð1Þ; p2 ¼ ð0010Þ ! x ¼ ði; p1 ; p2 Þ ¼ ð1000010010Þ;


i ¼ ð10100Þ ! p1 ¼ ð0Þ; p2 ¼ ð0110Þ ! x ¼ ði; p1 ; p2 Þ ¼ ð1010000110Þ

and it can be noticed that the following is always valid


2 3
1
607
T 6 7
p1 ¼ i D ~T
~ 1 C ¼i 6 1 7;
6 7
405
0 2 3
0 0 1 0
 T 61 0 0 17
6 7
~
~ 1 CÞ 60 07
p2 ¼ i T 1 ðA þ BD ¼i 6 1 0 7:
41 0 1 15
0 1 0 0

and matrices multiplied from the left by information vector just correspond to the
sixth, i.e. to 7–10 columns of the generator matrix obtained in the second part of the
problem solution using a totally independent way. The procedure described in this
part of solution is general (not limited to quasi-cyclic codes only), guaranteeing that
the systematic code obtaining even in a case if it is applied by successive vectors p1
and p2 calculations, has approximately linear complexity.
Problem 9.10
(a) Explain the calculation of sphere packing bound and using this relation find
the probability that the decoding is unsuccessful if the transmission is over
BSC, if the crossover probability is p < 0.14, and a used code has code rate
R = 1/2, code word length being 10  n  1000.
(b) Draw the probability that a code word after transmitting over BSC is decoded
unsuccessfully. Give the results for regular PEG (Progressive Edge Growth)
LDPC code, code word length n = 200, if for decoding are applied
bit-flipping, min-sum, self-correcting min-sum and sum-product algorithm.
Compare the obtained results to the limiting values found in a previous part.
(c) For the same code as above draw the dependence of the error probability after
decoding on the crossover probability, if the transmission is through AWGN
channel for Eb/N0 < 10 dB. Compare the obtained results to the corresponding
Shannon bound and find the code gain for Pe = 10−4.
(d) In the case of the channel with AWGN, when there is no a quantizing block,
the channel can be considered as having a discrete input (a binary one!) while
the output is continuous. For such case use PEG code which has the code word
length n = 256, code rate R = 1/2, and calculate the residual bit error proba-
bility (BER) if for decoding are applied bit-flipping, gradient multi-bit flipping
(GDBF), min-sum, and sum-product algorithm.
504 9 Low Density Parity Check Codes

Solution
a) Sphere packing bound in general case can be written in a form

 
qn d1
Aq ðn; dÞ  ; ec ¼ ;
P
ec
n 2
ðq  1Þl
l¼0 l

where Aq(n, d) is a maximum number of words of code basis q and the minimum
Hamming distance is at least d. It is easy to notice that sphere packing bound is only
the second name for a Hamming bound (defined in Problem 5.8), and for a case of
binary linear block codes Aq(n, d) = 2k and the above relation reduces to

2n X
ec
n
2k  ) 2nð1RÞ  :
Pec
n l
l¼0
l¼0 l

From the above one can find a number of errors correctable in the optimum case,
in the code word of the length n and for the code rate R (because k = Rn).
(b) The lower bound for the unsuccessful decoding is

X
ec
n l
Pe;min ðpÞ  1  p ð1  pÞnl ;
l¼0
l

and as to one code word usually corresponds one transmission frame, the above
relation in effect gives the frame error probability (rate) (FER). Besides, the
equality is satisfied only if the code is perfect, while in a general case (and for non
perfect codes) the exact expression for the probability that at least one error is not
corrected depends on the weight code spectrum given in fourth part of the Problem
5.8. The corresponding numerical values are shown in Fig. 9.24.
(c) Firstly, the case will be considered for binary symmetric channel, having
crossover probability p. The class of Progressive Edge Growth (PEG) LDPC
codes is considered, where the parity-check matrix is formed by a constructive
method (it is a structured, and not a random approach with the limitations as
for Gallager codes) described in [102]. To analyze these codes, a Monte Carlo
simulation method was used. Basics of that procedure are shown in Fig. 9.25
—at the coded sequence the uncorrelated sequence is superimposed where the
probability of ones equals p and the decoding is performed. The residual bit
error probability is estimated by comparing the obtained sequence to the
sequence at the encoder input and the corresponding results for the code PEG
(200,100) and various decoding algorithms are shown in Fig. 9.26.
[Fig. 9.24 FER in BSC, sphere packing bounds: frame error rate versus crossover probability p for n = 10, 30, 100, 200, 500 and 1000, with the Shannon limit for BSC indicated]
From the obtained diagram it can be seen that, of all the considered algorithms,
SPA provides the minimum and bit-flipping the maximum error probability.
Simplified variants of the SPA decoding procedure have smaller complexity, but
the decoder performance is degraded as well. It is obvious that these degradations
can be substantially lessened by correcting the min-sum algorithm. However,
even when the optimum decoding algorithm is used, there is a significant gap to
the performance predicted by the sphere packing bound, which is a consequence of
the “non-perfectness” of the considered PEG code.
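The step that distinguishes these algorithms is the check-node update. The sketch below, assuming Python with NumPy, contrasts the exact tanh rule of SPA with the min-sum approximation; the normalization factor alpha is shown only as one common way of correcting min-sum (the self-correcting variant mentioned in the text follows [98] and operates on the message signs, which is not reproduced here).

```python
import numpy as np

def check_update_spa(llr_in):
    """Sum-product check-node update: tanh rule over all incoming messages but one."""
    t = np.tanh(np.clip(llr_in, -30.0, 30.0) / 2.0)
    out = np.empty_like(llr_in, dtype=float)
    for i in range(len(llr_in)):
        prod = np.prod(np.delete(t, i))                 # product of the other tanh values
        out[i] = 2.0 * np.arctanh(np.clip(prod, -0.999999, 0.999999))
    return out

def check_update_min_sum(llr_in, alpha=0.8):
    """Min-sum approximation: sign product and (normalized) minimum magnitude."""
    out = np.empty_like(llr_in, dtype=float)
    for i in range(len(llr_in)):
        rest = np.delete(llr_in, i)                     # all incoming messages but one
        out[i] = alpha * np.prod(np.sign(rest)) * np.min(np.abs(rest))
    return out

msgs = np.array([1.2, -0.4, 3.0, -2.1])
print(check_update_spa(msgs))
print(check_update_min_sum(msgs))
```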
(d) For the case of the channel with AWGN, when there is no quantizing block,
the channel can be considered as a channel with a discrete input (a binary
one!) and a continuous output.

[Fig. 9.25 Block scheme of a simulation procedure, BSC: generator of the information sequence b → LDPC encoder (code word c) → channel (received word r) → LDPC decoder (estimate b') → user, with estimation of the probability of error by comparing b and b']

[Fig. 9.26 FER in BSC: frame error rate versus crossover probability for bit-flipping, min-sum, self-correcting min-sum and sum-product decoding of the PEG (200,100) code, together with the sphere packing bound]

[Fig. 9.27 BER for AWGN channel: bit error rate versus Eb/N0 [dB] for the uncoded case and for bit-flipping, multi-GDBF, min-sum and sum-product decoding]

In this case the Monte Carlo simulation is performed at the modulation channel
level, and the results are shown in Fig. 9.27 for the PEG code with code word
length n = 256 and code rate R = 1/2. A detailed description of the code
construction and of the simulation procedure for this case is given in [103].
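The soft-input front end of such a simulation can be sketched as follows, assuming Python with NumPy (an illustration, not the exact setup of [103]): BPSK mapping, noise variance obtained from Eb/N0 and the code rate, and channel LLRs handed to a message-passing decoder.

```python
import numpy as np

def awgn_llr(codeword, ebn0_db, R, rng=None):
    """BPSK over AWGN: map bits to +/-1, add noise for the given Eb/N0 [dB], return LLRs."""
    rng = np.random.default_rng() if rng is None else rng
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    sigma2 = 1.0 / (2.0 * R * ebn0)        # Es = 1, Es/N0 = R*Eb/N0, sigma^2 = N0/2
    s = 1.0 - 2.0 * codeword               # bit 0 -> +1, bit 1 -> -1
    r = s + rng.normal(0.0, np.sqrt(sigma2), size=s.shape)
    return 2.0 * r / sigma2                # channel LLR = log P(b=0 | r) / P(b=1 | r)

llr = awgn_llr(np.zeros(256, dtype=int), ebn0_db=2.0, R=0.5)
hard = (llr < 0).astype(int)               # a hard decision here would give an equivalent BSC
```

Making a hard decision on these LLRs would reduce the channel back to a BSC; keeping the soft values is what gives min-sum and sum-product their advantage over bit-flipping in Fig. 9.27.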
When the sum-product algorithm is applied, the error probability 10^−6 is
achieved for substantially smaller values of Eb/N0 than when the bit-flipping
algorithm is applied. A modification of the bit-flipping algorithm (e.g. gradient
multi-bit flipping—GDBF) increases the coding gain from 0.4 to 2.5 dB (for
Pe,res = 10^−6). On the other hand, even a significant simplification of the
sum-product algorithm (min-sum being just such a simplification) does not cause a
significant performance degradation in the considered case—the coding gain decreases
only from 6.1 to 5.8 dB. This is precisely the reason why contemporary decoding
algorithms are mainly based on the message-passing principle, with smaller or
greater simplifications.
The Shannon bound for code rate R = 1/2 is 0.19 dB [1], so it is clear that even the
SPA algorithm for Pe,res = 10^−6 provides performance about 4.5 dB worse than
that limiting value. Of course, this difference can be substantially decreased if the
code word length is increased (fortunately, the PEG procedure allows the
construction of codes with practically arbitrary code word lengths). On the basis of
Shannon theory, providing a negligibly small probability of error for
Eb/N0 < 0.19 dB is possible only if the code rate is additionally reduced, and even
when R → 0, reliable transmission can be provided only if Eb/N0 > −1.59 dB.
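These limiting values can be checked from the capacity formula. The sketch below, assuming Python, uses the unconstrained-input AWGN capacity R ≤ (1/2) log2(1 + 2R·Eb/N0), which gives Eb/N0 ≥ (2^(2R) − 1)/(2R) and tends to ln 2 ≈ −1.59 dB as R → 0; the 0.19 dB value quoted for R = 1/2 refers to the binary-input AWGN channel, whose capacity must be evaluated numerically and is not computed here.

```python
from math import log10, log

def ebn0_limit_db(R):
    """Minimum Eb/N0 [dB] for reliable transmission at rate R (unconstrained input)."""
    return 10.0 * log10((2.0 ** (2.0 * R) - 1.0) / (2.0 * R))

print(ebn0_limit_db(0.5))        # 0 dB for R = 1/2 with an unconstrained input
print(ebn0_limit_db(1e-6))       # approaches the ultimate limit as R -> 0
print(10.0 * log10(log(2.0)))    # ln 2 expressed in dB, i.e. about -1.59 dB
```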
References

1. D. Drajić, P. Ivaniš, Uvod u teoriju informacija i kodovanje (An Introduction to Information Theory and Coding) (Akademska misao, Beograd, 2009). (in Serbian)
2. T.M. Cover, J.A. Thomas, Elements of Information Theory (Wiley, New York, 1991)
3. N. Abramson, Information Theory and Coding (McGraw-Hill Book Company, New York,
1963)
4. D.B. Drajić, Uvod u statističku teoriju telekomunikacija (Akademska misao, Beograd,
2003)
5. G. Lukatela, D. Drajić, G. Petrović, R. Petrović, Digitalne telekomunikacije, II izd.
(Građevinska knjiga, Beograd 1984)
6. I.S. Gradshteyn, I.M. Ryzhik, Table of Integrals, Series and Products, 5th edn. (Academic
Press Inc, Cambridge, 1994)
7. P.M. Fenwick, Huffman Code Efficiencies for Extensions of Sources. IEEE Trans.
Commun. 43, 163–165 (1995)
8. G. Yule, On a method of investigating periodicities in disturbed series, with special
reference to Wolfer’s sunspot numbers. Philos. Trans. R. Soc. London Ser. A 226, 267–298
(1927)
9. S. Haykin, Adaptive Filter Theory, 3rd edn. (Prentice Hall, Englewood Cliffs, New Jersey,
1996)
10. C.E. Shannon, A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423
(1948). (623–656)
11. R. Fano, Transmission of Information (Wiley, New York, 1961)
12. D.A. Huffman, A method for the construction of minimum redundancy codes. Proc. IRE
40, 1098–1101 (1952)
13. N. Faller, An adaptive system for data compression, in Record of the 7th Asilomar
Conference on Circuits, Systems and Computers (1973), pp. 593–597
14. R.G. Gallager, Variations on a theme by Huffman. IEEE Trans. Inf. Theor. 24, 668–674
(1978)
15. D.E. Knuth, Dynamic Huffman coding. J. Algorithms 6, 163–180 (1985)
16. J.S. Vitter, Dynamic Huffman coding. ACM Trans. Math. Softw. 15, 158–167 (1989)
17. J. Ziv, A. Lempel, A universal algorithm for sequential data compression. IEEE Trans.
Inform. Theor. IT-23, 337–343 (1977)
18. J. Ziv, A. Lempel, Compression of individual sequences via variable-rate coding. IEEE
Trans. Inform. Theor. IT-24, 530–536 (1978)
19. T.A. Welch, A technique for high performance data compression. Computer 17(6), 8–19
(1984)
20. E.N. Gilbert, Capacity of a burst-noise channel. Bell Syst. Tech. J. 39, 1253–1266 (1960)
21. E.O. Elliott, Estimates of error rates for codes on burst-noise channels. Bell Syst. Tech.
J. 42, 1977–1998 (1963)

22. H. Nyquist, Certain topics in telegraph transmission theory. Trans. AIEE 47, 617–644
(1928)
23. R.W. Hamming, Error detecting and error correcting codes. Bell Sys. Tech J. 29, 147–160
(1950)
24. M.J.E. Golay, Notes on digital coding. Proc. IRE 37, 657 (1949)
25. S. Lin, D.J. Costello Jr., Error Control Coding—Fundamentals and Applications, 2nd edn.
(Prentice Hall, Englewood Cliffs, NJ, 2004); 1st edn. 1983
26. R.H. Morelos-Zaragoza, The Art of Error Correcting Coding (Wiley, Hoboken, 2002)
27. R.E. Blahut, Theory and Practice of Error Control Codes (Addison-Wesley Publishing
Company, Reading, Massachusetts, 1983)
28. I.S. Reed, A class of multiple-error-correcting codes and the decoding scheme. IRE Trans.
Inform. Theor. IT-4, 38–49 (1954)
29. D.E. Muller, Application of Boolean algebra to switching circuit design and to error
detection. IRE Trans. Electr. Comp. EC-3, 6–12 (1954)
30. D. Brown, Error detecting and correcting binary codes for arithmetic operations. IRE Trans.
Electron. Comput. EC-9, 333–337 (1960)
31. R.R. Varshamov, G.M. Tenengolz, One asymetrical error-correction codes. Avtomatika i
Telematika 26, 288–292 (1965)
32. R.R. Varshamov, A class of codes for asymmetric channels and a problem from the additive
theory of numbers. IEEE Trans. Inform. Theor. IT-19, 92–95 (1973)
33. A.J.H. Vinck, H. Morita, Codes over the ring of integer modulo m. IEICE Trans. Fundam.
E-81-A, 2013–2018 (1998)
34. W.W. Peterson, E.J. Weldon Jr., Error-Correcting Codes, 2nd edn. (The MIT Press,
Cambridge, 1972)
35. W.W. Peterson, D.T. Brown, Cyclic codes for error detection. Proc. IRE 49, 228–235
(1961)
36. G. Castagnoli, J. Ganz, P. Graber, Optimum cyclic redundancy-check codes with 16-bit
redundancy. IEEE Trans. Commun. 38, 111–114 (1990)
37. A. Hocquenghem, Codes correcteurs d'erreurs. Chiffres 2, 147–156 (1959)
38. R.C. Bose, D.K. Ray-Chaudhuri, On a class of error correcting binary group codes. Inform.
Control 3, 68–79 (1960)
39. S.B. Wicker, Error control systems for digital communication and storage (Prentice Hall
Inc, New Jersey, 1995)
40. W.W. Peterson, Encoding and error-correction procedures for the Bose-Chaudhuri codes.
IRE Trans. Inform. Theor. 6, 459–470 (1960)
41. E.R. Berlekamp, Nonbinary BCH decoding, in Proceedings of the International Symposium
on Information Theory (1967). San Remo, Italy
42. I.S. Reed, G. Solomon, Polynomial codes over certain finite fields. SIAM J. Appl. Math. 8,
300–304 (1960)
43. D.C. Gorenstein, N. Zierler, A class of error-correcting codes in pm symbols. J. Soc. Industr.
Appl. Math. 9, 207–214 (1961)
44. J. Massey, Shift-register synthesis and BCH decoding. IEEE Trans. Inform. Theor. 15,
122–127 (1969)
45. R.E. Blahut, Theory and practice of error control codes (Addison-Wesley Publishing
Company, Reading, Massachusetts, 1983)
46. P. Elias, Coding for noisy channels. IRE Convention Rec. 3, 37–46 (1955). (Pt. 4)
47. G.D. Forney Jr, Convolutional codes I: algebraic structure. IEEE Trans. Inform. Theor.
IT-16, 720–738 (1970), IT-17, 360 (1971)
48. J.L. Massey, Threshold Decoding (MIT Press, Cambridge, 1963)
49. D.E. Muller, Application of Boolean algebra to switching circuit design and to error
detection. IRE Trans. Electron. Comp. EC-3, 6–12 (1954)
50. I.S. Reed, A class of multiple-error-correcting codes and the decoding scheme. IRE Trans.
Inform. Theor. PGIT-4, 38–49 (1954)
51. J.M. Wozencraft, Sequential decoding for reliable communication. National IRE
Convention Rec. 5(2), 11–25 (1957)
52. R.M. Fano, A Heuristic discussion of probabilistic decoding. IEEE Trans. Inform. Theor.
IT-19, 64–74 (1963)
53. K.Sh. Zigangirov, Some sequential decoding procedures. Problemy Peredachi Informatsii 2,
13–25 (1966). (in Russian)
54. F. Jelinek, Fast sequential decoding algorithm using a stack. IBM J. Res. Dev. 13, 675–678
(1969)
55. R.W. Chang, J.C. Hancock, On receiver structures for channels having memory. IEEE
Trans. Inform. Theor. IT-12, 463–468 (1966)
56. J.L. Massey, Coding and modulation in digital communications, in International Zurich
Seminar on Digital Communications (Zurich, Switzerland, 1974), pp. E2(1)–E2(4)
57. G. Ungerboeck, I. Csajka, On improving data-link performance by increasing the channel
alphabet and introducing sequence coding, in International Symposion on Information
Theory (Ronneby, Sweden, 1976)
58. G. Ungerboeck, Channel coding with multilevel/phase signals. IEEE Trans. Inform. Theor.
IT-28, 55–67 (1982)
59. G. Ungerboeck, Trellis-coded modulation with redundant signal sets Part I: introduction.
IEEE Commun. Mag. 25, 5–11 (1987)
60. G. Ungerboeck, Trellis-coded modulation with redundant signal sets Part II: State of the art.
IEEE Commun. Mag. 25, 12–21 (1987)
61. R. Johanneson, K.S. Zigangirov, Fundamentals of convolutional coding (IEEE press, New
York, 1999)
62. A.J. Viterbi, Error bounds for convolutional codes and an asymptotically optimum
decoding algorithm. IEEE Trans. Inform. Theor. IT-13, 260–269 (1967).
63. A.J. Viterbi, J.K. Omura, Principles of Digital Communication and Coding (McGraw-Hill,
New York, 1979)
64. J.B. Cain, G.C. Clark, J.M. Geist, Punctured convolutional codes of rate (n–1)/n and
simplified maximum likelihood decoding, IEEE Trans. Inform. Theor. IT-25, 97–100
(1979)
65. G. Ungerboeck, Channel coding with multilevel/phase signals. IEEE Trans. Inform. Theor.
IT-28, 55–67 (1982)
66. L.R. Bahl, J. Cocke, F. Jelinek, J. Raviv, Optimum decoding of linear codes for minimizing
symbol error rate. IEEE Trans. Inform. Theor. IT-20, 284–287 (1974)
67. J.K. Wolf, Efficient maximum likelihood decoding of linear block codes. IEEE Trans.
Inform. Theor. IT-24, 76–80 (1978)
68. B. Vucetic, J. Yuan, Turbo Codes—Principles and Applications (Kluwer Academic
Publishers, Boston, 2000)
69. B. Honary, G. Markarian, Trellis Decoding of Block Codes (Kluwer Academic Publishers,
Boston, 1997)
70. D.J. Costello Jr., J. Hagenauer, H. Imai, S.B. Wicker, Applications of error-control coding.
IEEE Trans. Inform. Theor. 44, 2531–2560 (1998)
71. S. Lin, T. Kasami, T. Fujiwara, M. Fossorier, Trellises and Trellis-Based Decoding
Algorithms for Linear Block Codes (Kluwer Academic Publishers, Boston, 1998)
72. J. Hagenauer et al., Variable-rate sub-band speech coding and matched channel coding for
mobile radio channels, in Proceedings of the 38th IEEE Vehicular Technology Conference
(1988), pp. 139–146
73. L.R. Bahl, J. Cocke, F. Jelinek, J. Raviv, Optimum decoding of linear codes for minimizing
symbol error rate, IEEE Trans. Inform. Theor. IT-20, 284–287 (1974).
74. P. Robertson, E. Villebrun, P. Hoeher, A comparison of optimal and sub-optimal MAP
decoding algorithms operating in the log domain, in Proceedings of the IEEE ICC ‘95
(Seattle, June 1995), pp. 1009–1013
75. W. Koch, A. Baier, Optimum and sub-optimum detection of coded data disturbed by
time-varying intersymbol interference, in Proceedings of the IEEE GLOBECOM ‘90,
November 1990, Vol. II, pp. 1679–1684
76. M.C. Valenti, An efficient software radio implementation of the UMTS turbo codec, in
Proceedings of the IEEE International Symposium on Personal, Indoor and Mobile Radio
Communications (San Diego, CA, Sept 2001), pp. G-108–G-113
77. J. Hagenauer, P. Hoeher, A Viterbi algorithm with soft-decision outputs and its
applications, in GLOBECOM ’89, Nov, pp. 1680–1686
78. Y. Li, B. Vucetic, Y. Sato, Optimum Soft-Output Detection for Channels with Inter-symbol
Interference. IEEE Trans. Inform. Theory 41, 704–713 (1995)
79. C. Berrou, A. Glavieux, P. Thitimajshima, Near optimum error correcting coding and
decoding: turbo codes, in Proceedings of the ICC ’93 (Geneva, 1993), pp. 1064–1070
80. C. Berrou, The ten-year old turbo codes are entering into service. IEEE Commun. Mag. 41
(7), 110–116 (2003)
81. S. Benedetto, G. Montorsi, Unveiling turbo codes: some results on parallel concatenated
coding. IEEE Trans. Inform. Theor. IT-43, 591–600 (1996)
82. B. Sklar, A primer on turbo code concepts. IEEE Commun. Mag. 35, 94–102 (1997)
83. R. Gallager, Low-density parity-check codes. IRE Trans. Inform. Theor. 7, 21–28 (1962)
84. R.M. Tanner, A recursive approach to low complexity codes. IEEE Trans. Inform. Theor.
27, 533–547 (1981)
85. D.J.C. MacKay, R.M. Neal, Near Shannon limit performance of low density parity check
codes. Electron. Lett. 32, 1645–1646 (1996)
86. Y.Y. Tai, L. Lan, L. Zeng, S. Lin, K.A.S. Abdel-Ghaffar, Algebraic construction of
quasi-cyclic LDPC codes for the AWGN and erasure channels, IEEE Trans. Commun. 54,
1765–1774 (2006)
87. J. Xu, L. Chen, I. Djurdjevic, S. Lin, K.A.S. Abdel-Ghaffar, Construction of regular and
irregular LDPC codes: geometry decomposition and masking. IEEE Trans. Inform. Theor.
53, 121–134 (2007)
88. M.P.C. Fossorier, Quasi-cyclic low-density parity-check codes from circulant permutation
matrices. IEEE Trans. Inform. Theor. 50, 1788–1793 (2004)
89. B. Vasic, O. Milenkovic, Combinatorial constructions of low-density parity-check codes
for iterative decoding. IEEE Trans. Inform. Theor. 50, 1156–1176 (2004)
90. T.J. Richardson, R. Urbanke, Efficient encoding of low-density parity-check codes. IEEE
Trans. Inform. Theor. 47, 638–656 (2001)
91. T. Richardson, R. Urbanke, The renaissance of Gallager’s low-density parity-check codes.
IEEE Commun. Mag. 41(8), 126–131 (2003)
92. T. Richardson, R. Urbanke, Modern Coding Theory, 2007, online: http://lthcwww.epfl.ch/
mct/index.php
93. Y. Kou, S. Lin, M. Fossorier, Low-density parity-check codes based on finite geometries: a
rediscovery and new results. IEEE Trans. Inform. Theor. 47, 2711–2736 (2001)
94. J. Zhang, M. Fossorier, A modified weighted bit-flipping decoding of low density
parity-check codes. IEEE Comm. Lett. 8, 165–167 (2004)
95. F. Guo, L. Hanzo, Reliability ratio based weighted bit-flipping decoding for low-density
parity-check codes. Electron. Lett. 40, 1356–1358 (2004)
96. D.J.C. MacKay, Good error correcting codes based on very sparse matrices. IEEE Trans.
Inform. Theor. 45, 399–431 (1999)
97. J. Chen, M. Fossorier, Density evolution for two improved BP-based decoding algorithms
of LDPC codes. IEEE Commun. Lett. 6, 208–210 (2002)
98. V. Savin, Self-corrected min-sum decoding of LDPC codes, in Proceedings of the IEEE
International Symposium on Information Theory, ISIT 2008 (Toronto, July 2008), pp. 146–
150
99. M.P.C. Fossorier, Quasi-cyclic low-density parity-check codes from circulant permutation
matrices. IEEE Trans. Inform. Theor. 50, 1788–1793 (2004)
100. B. Vasic, O. Milenkovic, Combinatorial constructions of low-density parity-check codes for iterative decoding. IEEE Trans. Inform. Theor. 50, 1156–1176 (2004)
101. S.J. Johnson, Introducing Low-density Parity-Check Codes, Published Internal Technical
Report, Department of Electrical and Computer Engineering, University of Newcastle,
Australia
102. T. Richardson, R. Urbanke, Modern Coding Theory (Cambridge University Press, New
York, 2008)
103. O. Al Rasheed, D. Radović, P. Ivaniš, Performance analysis of iterative decoding algorithms
for PEG LDPC codes in Nakagami fading channels. Telfor J. 5, 97–102 (2013)