CH 0 Introduction: 0.1 Overview of Information Theory and Coding
[Figure: block diagram of a digital communication/storage system — the sent message passes through the source encoder and channel encoder, the resulting binary stream (e.g., 0110101001110…) crosses the channel, and the channel decoder and source decoder deliver the received message to the receiver.]
Digital Communication and Storage Systems
Channel: produces a received signal r which differs from the original signal, c (the
channel introduces noise, channel distortion, etc.). Thus, the decoder can only produce an
estimate m’ of the original message, m.
Goal of processing: Information conveyed through (or stored in) the channel must be reproduced at the destination as reliably as possible. At the same time, the system should allow the transmission of as much information as possible per unit time (communication system) or per unit of storage (storage system).
Information Source
The Source Message m consists of a time sequence of symbols emitted by the
information source. The source can be:
Continuous-time Source, if this message is continuous in time, e.g., speech waveform.
Discrete-time Source, if the message is discrete in time, e.g., data sequences from a
computer.
Since information and coding theory relies on probability theory, we review the latter first.
§0.2 Review of Random Variables and Probability
Probability
Let us consider a single experiment, such as the roll of a die, with a number of possible outcomes. The sample space S of the experiment consists of the set of all possible outcomes.
In the case of a die, S = {1, 2, 3, 4, 5, 6}, with the integers representing the number of dots on the six faces of the die.
Event: any subset of the sample space S.
Complement: the complement of an event A, denoted A̅, is the set of all outcomes in S that are not in A.
Two events are said to be Mutually Exclusive if they have no sample points in common (e.g., the events {1, 2} and {5, 6} in the die experiment).
Joint Event and Joint Probability
Instead of dealing with a single experiment, let us perform two experiments and consider
their outcomes.
For example: the two experiments can be two separate tosses of a single die, or a single toss of two dice.
Conditional Probability
A joint event (A, B) occurs with probability P(A, B), which can be expressed as
P(A, B) = P(A|B) P(B) = P(B|A) P(A),
where P(A|B) and P(B|A) are conditional probabilities.
A conditional probability is P(A|B) = P(A, B) / P(B), defined for P(B) > 0.
If A ⊆ B, then P(A, B) = P(A), so P(A|B) = P(A)/P(B) and P(B|A) = 1.
Statistical Independence: Let P(A|B) be the probability of occurrence of A given that B has occurred. Suppose that the occurrence of A does not depend on the occurrence of B. Then P(A|B) = P(A), and consequently P(A, B) = P(A) P(B).
Random Variables
Given a sample space S with elements s ∈ S, a Random Variable X(s) is a real-valued function defined on the elements of S.
For a discrete random variable taking values x_i with probabilities p_i = P(X = x_i), the probability mass function (PMF) can be written as
p_X(x) = Σ_i p_i · 1{x = x_i},  where 1{x = x_i} = 1 if x = x_i and 0 otherwise.
For example, for the fair-die experiment, p_X(x) = 1/6 for x = 1, 2, …, 6 (a uniform PMF).
Definition:
The Mean (expected value) of the random variable X is E[X] = Σ_i x_i p_X(x_i).
Useful Distributions
Let X be a discrete random variable that takes two possible values, X = 1 or X = 0, with probabilities p and 1 − p, respectively.
This is the Bernoulli distribution, and the PMF can be represented as in the figure. The mean of such a random variable is E[X] = 1·p + 0·(1 − p) = p.
[Figure: Bernoulli PMF p_X(x), with mass 1 − p at x = 0 and p at x = 1.]
The performance of a fixed number of independent trials, each with the same probability of success, is known as a sequence of Bernoulli trials.
Let X_i, i = 1, …, n, be statistically independent and identically distributed random variables with a Bernoulli distribution, and let us define a new random variable
Y = Σ_{i=1}^{n} X_i.
This random variable takes values from 0 to n. The associated probabilities can be expressed as
P(Y = k) = C(n, k) p^k (1 − p)^(n−k),  k = 0, 1, …, n,
where C(n, k) = n! / (k! (n − k)!) is the binomial coefficient. This represents the probability of having k successes in n Bernoulli trials.
Definitions:
1. The Mean of a function of the random variable X, g(X), is defined as E[g(X)] = Σ_i g(x_i) p_X(x_i).
2. The Variance of X is Var(X) = E[(X − E[X])^2] = E[X^2] − (E[X])^2.
Example: calculate the variance of the random variable defined in Example 0.5 (the fair die), whose mean is 21/6.
Ch 1 Discrete Source and Entropy
Overview
Information theory is built on probability theory, as the term information carries with it a connotation of UNPREDICTABILITY (SURPRISE) in the transmitted signal.
The Information Source is defined by:
- the set of output symbols;
- the probability rules which govern the emission of these symbols.
Finite-Discrete Source: finite number of unique symbols.
The symbol set is called the Source Alphabet.
Definition
A = {a_0, a_1, …, a_{M−1}} is a source alphabet with M possible symbols. We can say that the emitted symbol is a random variable which takes values in A. The number of elements in a set is called its Cardinality, e.g., |A| = M.
a(t) ∈ A is the symbol emitted by the source at time t. Note that here t is an integer time index.
Stationary Source: the set of probabilities is not a function of time. It means that, at any given time, the probability that the source emits a_m is p_m = Pr(a_m).
Probability mass function: P_A = {p_0, p_1, …, p_{M−1}}.
Since the source emits only members of its alphabet, Σ_{m=0}^{M−1} p_m = 1.
Information Sources Classification
Stationary Versus Non-Stationary Source:
For a Stationary Source the set of probabilities is not a function of time, whereas for a
Non-stationary Source it is.
Entropy of a Source
Each transmitted symbol 1 is just one choice out of 1/p_1 equally likely possibilities, and therefore symbol 1 carries log2(1/p_1) bits of information (since 1/p_1 = 2^{log2(1/p_1)}). Similarly, symbol k carries log2(1/p_k) bits of information.
The average information per symbol of the source is its Entropy, calculated as
H(A) = Σ_k p_k log2(1/p_k) = − Σ_k p_k log2 p_k   [bits/symbol].
Example 1.1: What is the entropy of a 4-ary source having symbol probabilities P_A = {0.5, 0.3, 0.15, 0.05}?
Example 1.3: For an M-ary source, what distribution of probabilities P(A) maximizes the information entropy H(A)?
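As a quick numeric check of Example 1.1 (and of the fact that the uniform distribution maximizes H(A), as asked in Example 1.3), the entropy can be computed with a short sketch; this snippet is illustrative and not part of the original notes:

    import math

    def entropy(probs):
        # H = -sum p*log2(p) in bits/symbol; zero-probability symbols contribute nothing
        return -sum(p * math.log2(p) for p in probs if p > 0)

    PA = [0.5, 0.3, 0.15, 0.05]
    print(entropy(PA))            # approximately 1.6477 bits/symbol
    print(entropy([0.25] * 4))    # uniform 4-ary source: 2.0 bits/symbol (the maximum)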
The Information Efficiency of the source is measured as the ratio of the entropy of the source to the (average) number of binary digits used to represent the source data.
Example 1.4: For a 4-ary source A = {00, 01, 10, 11} with symbol probabilities P_A = {0.5, 0.3, 0.15, 0.05}, what is the efficiency of the source?
When the entropy of the source is lower than the (average) number of bits used to represent the source data, an efficient coding scheme can be used to encode the source information using, on average, fewer binary digits. This is called Data Compression, and the encoder used for it is called a Source Encoder.
i) If A and B are statistically independent: H(A, B) = H(A) + H(B).
ii) If B depends on A: H(A, B) = H(A) + H(B|A) ≤ H(A) + H(B).
Example 1.5: We often use a parity bit for error detection. For a 4-ary information source A = {0, 1, 2, 3} with P_A = {0.25, 0.25, 0.25, 0.25}, and the parity generator B = {0, 1} with
b_j = 0 if a = 0 or 1,  b_j = 1 if a = 2 or 3  (j = 1, 2),
find H(A), H(B) and H(A, B).
1.1.3 Entropy of Symbol Blocks and the Chain Rule
Example 1.5: Suppose a memoryless source with A {0,1} having equal probabilities
emits a sequence of 6 symbols. Following the 6th symbol, suppose a 7th symbol is
transmitted which is the sum modulo 2 of the six previous symbols (this is just the
exclusive-or of the symbols emitted by A). What is the entropy of the 7-symbol
sequence?
Example 1.6: For an information source having alphabet A with |A| symbols, what is the
range of entropies possible?
For an inefficient information source, i.e. H(A) < log2(|A|), the communication system
can be made more cost effective through source coding.
[Figure: source coding — the information source emits the sequence s0, s1, … with s_t ∈ A (source alphabet); the Source Encoder maps it to the code-word sequence s'0, s'1, … with s'_t ∈ B (code alphabet).]
In its simplest form, the encoder can be viewed as a mapping of the source alphabet A to
a code alphabet B, i.e., C: A→B. Since the encoded sequence must be decoded at the
receiver end, the mapping function C must be invertible.
Goal of coding: average information bits/symbol ~ average bits we use to represent a
symbol (i.e. code efficiency ~ 1).
Example 1.8: Let C be an encoder grouping the symbols in A into ordered pairs <a_i, a_j>; the set of all possible pairs <a_i, a_j> is called the Cartesian product A×A. For the 4-ary memoryless source with the symbol probabilities given in Example 1.7, determine the average number of transmitted binary digits per code word and the efficiency of the encoder. The code words are shown in the following table.
<a_i,a_j>  Pr<a_i,a_j>  b_m       <a_i,a_j>  Pr<a_i,a_j>  b_m
a0,a0      0.25         00        a2,a0      0.075        1101
a0,a1      0.15         100       a2,a1      0.045        0111
a0,a2      0.075        1100      a2,a2      0.0225       111110
a0,a3      0.025        11100     a2,a3      0.0075       1111110
a1,a0      0.15         101       a3,a0      0.025        11101
a1,a1      0.09         010       a3,a1      0.015        111101
a1,a2      0.045        0110      a3,a2      0.0075       11111110
a1,a3      0.015        111100    a3,a3      0.0025       11111111
If we have source set A and code set B, what is the entropy relationship between them?
[Diagrams: three possible mappings between the source set A and the code set B —
i) one-to-one: each a maps to a distinct b;
ii) many-to-one: two source symbols a_i and a_j map to the same code symbol b;
iii) one-to-many: a source symbol a_i maps to either of two code symbols b_i or b_j.]
Lossless Compression:
Lossy Compression:
Lossless and lossy compression are terms that describe whether or not, in the
compression of the message, all original data can be recovered when decompression is
performed.
Lossless Compression
- Every single bit of data originally transmitted remains after decompression.
After decompression, all the information is completely restored.
- One can use lossless compression whenever space is a concern, but the
information must be the same.
In other words, when a file is compressed, it takes up less space, but when it is
decompressed, it still has the same information.
- The idea is to get rid of redundancy in the information.
- Standards: ZIP, GZIP, UNIX Compress, GIF
Lossy Compression
- Certain information is permanently eliminated from the original message,
especially redundant information.
- When the message is decompressed, only a part of the original information is still
there (although the user may not notice it).
- Lossy compression is generally used for video and sound, where a certain amount
of information loss will not be detected by most users.
- Standards: JPEG (still), MPEG (audio and video), MP3 (MPEG-1, Layer 3)
Lossless Compression
When we encode characters in computers, we assign each an 8-bit code based on
(extended) ASCII chart. (Extended) ASCII: fixed 8 bits per character
For example: for “hello there!”, a number of 12 characters*8bits=96 bits are needed.
Kraft Inequality Theorem
Prefix Code (or Instantaneously Decodable Code): a code that has the property of being self-punctuating. Punctuating means dividing a string of symbols into words. Thus, a prefix code has punctuation built into its structure (rather than added using special punctuating symbols). It is designed so that no code word is a prefix of any other (longer) code word. It is also a data-compression code.
To construct an instantaneously decodable code of minimum average length (for a source A, or for a given random variable a with values drawn from the source alphabet), the code must satisfy the Kraft Inequality:
For an instantaneously decodable code B for a source A, the code lengths {l_i} must satisfy
Σ_i 2^{−l_i} ≤ 1.
Conversely, if the code word lengths satisfy this inequality, then there exists an instantaneously decodable code with these word lengths.
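A small sketch (an assumed helper, not from the notes) that checks the Kraft inequality for a proposed set of code-word lengths:

    def kraft_sum(lengths):
        # sum of 2^-l over the code-word lengths; <= 1 means a prefix code with these lengths exists
        return sum(2.0 ** (-l) for l in lengths)

    print(kraft_sum([1, 2, 3, 3]))   # 1.0  -> an instantaneously decodable code exists
    print(kraft_sum([1, 1, 2]))      # 1.25 -> no prefix code is possible with these lengths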
Shannon-Fano Theorem
The KRAFT INEQUALITY tells us when an instantaneously decodable code exists. But we are interested in finding the optimal code, i.e., the one that maximizes the efficiency, or minimizes the average code length L̄. The average code length L̄ of the code B for the source A (with a as a random variable whose values are drawn from the source alphabet with probabilities {p_i}) is minimized if the code lengths {l_i} are given by the Shannon information,
l_i = −log2 p_i  (rounded up to an integer when necessary).
For example:
a                      a0  a1  a2  a3
l_i (i = 0, 1, 2, 3)    1   2   3   3
Note that this is the same as the entropy of A, H(A).
Lower Bound on the Average Length
The observation about the relation between the entropy and the expected length of the optimal code can be generalized. Let B be an instantaneous code for the source A. Then the average code length is bounded by
H(A) ≤ L̄ < H(A) + 1.
Why is the upper bound H(A) + 1 and not H(A)? Because sometimes the Shannon information gives us fractional lengths, and we have to round them up.
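To see the bounds in action, the following sketch computes the Shannon code lengths l_i = ceil(−log2 p_i) and the resulting average length for an assumed dyadic distribution (an illustration, not data taken from the notes):

    import math

    def shannon_lengths(probs):
        # Shannon code lengths: -log2(p) rounded up to an integer
        return [math.ceil(-math.log2(p)) for p in probs]

    probs = [0.5, 0.25, 0.125, 0.125]            # assumed example distribution
    lengths = shannon_lengths(probs)              # [1, 2, 3, 3]
    H = -sum(p * math.log2(p) for p in probs)     # 1.75 bits
    L_avg = sum(p * l for p, l in zip(probs, lengths))
    print(lengths, H, L_avg)                      # H <= L_avg < H + 1 (here L_avg = H = 1.75)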
Example 1.10: Consider the following random variable a, with the optimal code lengths given by the Shannon information theorem. Determine the average code length bounds.
a                    a0   a1   a2   a3   a4
b                    00   10   11   010  011
l_i (i = 0, …, 4)     2    2    2    3    3
The average code length of this code (computed from the symbol probabilities) is very close to the optimal value H(A) = 2.2855 bits.
Summary
i) The motivation for data compression is to reduce the space allocated for data (i.e., to increase the source efficiency). It is obtained by reducing the redundancy which exists in the data.
ii) Compression can be lossless or lossy. In the former case, all information is completely restored after decompression, whereas in the latter case it is not (used in applications in which the information loss will not be detected by most users).
iii) The optimal code, which ensures maximum efficiency for the source, is characterized by code-word lengths given by the Shannon information, l_i = −log2 p_i.
iv) According to the source coding theorem, the average length of the optimal code is bounded by the entropy as H(A) ≤ L̄ < H(A) + 1.
v) The coding schemes for data compression include Huffman, Lempel-Ziv and Arithmetic coding.
Step 2: Begin with the two lowest-probability symbols. Combining these two symbols forms a new compound symbol, i.e., a branch in the tree. This step is repeated using the two lowest-probability symbols from the new set of symbols, and continues until all the original symbols have been combined into a single compound symbol.
Step 3: A tree is formed, with the top and bottom stems going from each compound symbol to the symbols which form it, labeled with 0 and 1, respectively (or the other way around). Code words are assigned by reading the labels of the tree stems from right to left, back to the original symbol.
Example 1.12: Let the alphabet of the source A be {a0, a1, a2, a3}, and the probabilities
of emitting these symbols be {0.50 0.30 0.15 0.05}. Draw the Huffman tree and find the
Huffman codes.
[Figure: Huffman tree construction for the probabilities 0.50 (a0), 0.30 (a1), 0.15 (a2), 0.05 (a3).]
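A compact construction of a Huffman code for the probabilities of Example 1.12 can be sketched with Python's heapq module (illustrative code, not the original tree figure):

    import heapq

    def huffman_code(probs):
        # probs: dict symbol -> probability; returns dict symbol -> binary code string
        heap = [[p, i, {s: ""}] for i, (s, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            p1, _, codes1 = heapq.heappop(heap)   # two lowest-probability entries
            p2, _, codes2 = heapq.heappop(heap)
            for s in codes1: codes1[s] = "0" + codes1[s]
            for s in codes2: codes2[s] = "1" + codes2[s]
            counter += 1
            heapq.heappush(heap, [p1 + p2, counter, {**codes1, **codes2}])
        return heap[0][2]

    print(huffman_code({"a0": 0.50, "a1": 0.30, "a2": 0.15, "a3": 0.05}))
    # prints a prefix code with lengths 1, 2, 3, 3 bits; the exact 0/1 labels depend on
    # tie-breaking, so they may differ from the tree drawn in the notes (Huffman codes are not unique)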
How are the Probabilities Known?
- Counting symbols in the input string: the data must be given in advance; this requires an extra pass over the input string.
- The data source's distribution is known: the data is not necessarily known in advance, but we know its distribution. Reasonable care must be taken in estimating the probabilities, since large errors lead to a serious loss in optimality. For example, a Huffman code designed for English text can have a serious loss in optimality when used for French.
More Remarks
For Huffman coding, the alphabet and its distribution must be known in advance. It
achieves entropy when occurrence probabilities are negative powers of 2 (optimal code).
The Huffman code is not unique (because of some arbitrary decisions in the tree construction). Given the Huffman tree, it is easy (and fast) to encode and decode. In general, the efficiency of Huffman coding relies on having a source alphabet A with a fairly large number of symbols. Compound symbols are obtained from the original symbols (see, e.g., A×A). For a compound symbol formed from n symbols, the alphabet is A^n, and the set of probabilities of the compound symbols is denoted by P_{A^n}.
Question: How does one get PAn?
Answer: Easy for a memoryless source. Difficult for a source with memory!
Remarks
LZ coding does not require the knowledge of the symbol probabilities beforehand. It is a
particular class of dictionary codes. They are compression codes that dynamically
construct their own coding and decoding tables by looking at the data stream itself.
In simple Huffman coding, the dependency between the symbols is ignored, while in LZ,
these dependencies are identified and exploited to perform better encoding. When all the
data is known (alphabet, probabilities, no dependencies), it’s best to use Huffman (LZ
will try to find dependencies which are not there…)
This is the compression algorithm used in most PCs. Because extra information is supplied to the receiver, these codes initially “expand.” The secret is that most of the code words represent strings of source symbols. In a long message it is more economical to encode these strings (which can be of variable length) than it is to encode individual symbols.
Definitions related to the Structure of the Dictionary
Each entry in the dictionary has an address, m. Each entry is an ordered pair, <n, ai >.
The former ( n ) is a pointer to another location in the dictionary, it is also the
transmitted code word. ai is a symbol drawn from the source alphabet A. A fixed-length
binary word of b bits is used to represent the transmitted code word. The number of entries will be lower than or equal to 2^b. The total number of entries will exceed the number of symbols, M, in the source alphabet, so each transmitted code word contains more bits than it would take to represent the alphabet A alone.
Question: Why do we use LZ coding if the code word has more bits?
Answer: Because most of these code words represent STRINGS of source symbols, rather than single symbols.
Encoder
A Linked-List Algorithm (simplified for illustration purposes) is used; it includes:
Step 1: Initialization
The algorithm is initialized by constructing the first M +1 (null symbol plus M source
symbols) entries in the dictionary, as follows.
Address (m)    Dictionary Entry (n, a_i)
0              0  null
1              0  a_0
2              0  a_1
…              …  …
m              0  a_{m−1}
…              …  …
M              0  a_{M−1}
Note: The 0-address entry in the dictionary is a null symbol. It is used to let the decoder
know where the end of the string is. In a way, this entry is a punctuation mark. The
pointers n in these first M + 1 entries are zero, meaning that they point to the null entry at address 0 at the beginning.
The initialization also sets the pointer variable to zero (n = 0) and the address pointer to M + 1 (m = M + 1). The address pointer points to the next “blank” location in the dictionary.
Iteratively executed:
Step 2: Fetch the next source symbol a.
Step 3:
    If the ordered pair <n, a> is already in the dictionary, then
        n = dictionary address of entry <n, a>
    Else
        transmit n
        create new dictionary entry <n, a> at dictionary address m
        m = m + 1
        n = dictionary address of entry <0, a>
Step 4: Return to Step 2.
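A minimal Python sketch of this linked-list LZ encoder (the dictionary layout and variable names follow the description above; the end-of-stream handling is simplified to a plain flush of the last pointer):

    def lz_encode(symbols, alphabet):
        # Step 1: initialization - null entry plus one entry per source symbol
        dictionary = {(0, None): 0}
        for i, a in enumerate(alphabet):
            dictionary[(0, a)] = i + 1
        m = len(alphabet) + 1          # next blank dictionary address
        n = 0                          # current pointer (root = null entry)
        transmitted = []
        for a in symbols:              # Steps 2-4
            if (n, a) in dictionary:
                n = dictionary[(n, a)]
            else:
                transmitted.append(n)
                dictionary[(n, a)] = m
                m += 1
                n = dictionary[(0, a)]
        transmitted.append(n)          # flush the last (possibly partial) string
        return transmitted, dictionary

    bits = [int(b) for b in "11000101100101110001111"]   # sequence of Example 1.13
    codes, _ = lz_encode(bits, [0, 1])
    print(codes)   # should print [2, 2, 1, 5, 4, 3, 6, 1, 3, 4, 6, 11]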
Example 1.13: A binary information source emits the sequence of symbols 110 001 011
001 011 100 011 11 etc. Construct the encoding dictionary and determine the sequence of
transmitted code symbols.
Initialize: dictionary addresses 0 (null), 1 (<0, 0>) and 2 (<0, 1>); n = 0, m = 3.
[Encoder trace, summarized: each time the pair <n, a> is not yet in the dictionary, n is transmitted and the new entry <n, a> is created at address m. For the given input sequence the transmitted code words are 2, 2, 1, 5, 4, 3, 6, 1, 3, 4, 6, …, and the new entries created at addresses 6–13 are <5,1>, <4,1>, <3,0>, <6,0>, <1,1>, <3,1>, <4,0>, <6,1>.]
The completed dictionary (final entries shown):
Address (m)   Entry (n, a_i)
8             3, 0
9             6, 0
10            1, 1
11            3, 1
12            4, 0
13            6, 1
14            no entry yet
Decoder
The decoder at the receiver must also construct an identical dictionary for decoding.
Moreover, reception of any code word means that a new dictionary entry must be
constructed. Pointer n for this new dictionary entry is the same as the received code word.
Source symbol a for this entry is not yet known, since it is the root symbol of the next
string (which has not been transmitted by the encoder).
If the address of the next dictionary entry is m, we see that the decoder can only construct
a partial entry <n, ?>, since it must await the next received code word to find the root
symbol a for this entry. It can, however, fill in the missing symbol a in its previous
dictionary entry, at address m -1. It can also decode the source symbol string associated
with the received code word n.
Example 1.14: Decode the received code words transmitted in Example 1.13.
[Table: decoder dictionary under construction — addresses 5, 6, 7, 8, 9, …, each partial entry <n, ?> being completed as successive code words are received.]
Remarks
This scheme assigns one (normally long) code word to the entire input stream. It reads the input stream symbol by symbol, appending more bits to the code word each time. The code word is a number obtained from the symbol probabilities, so the symbol probabilities need to be known. It encodes symbols using a non-integer number of bits (on average), which results in a very good encoder efficiency (it allows the entropy lower bound to be approached). It is often used for data compression in image processing.
Encoder
Construct a code interval (rather than a code number), which uniquely describes a block
of successive source symbols. Any convenient b within this range is a suitable code word,
representing the entire block of symbols.
Algorithm:
For each a_i ∈ A, assign a sub-interval I_i = [Sl_i, Sh_i) of [0, 1), with widths given by the symbol probabilities.
Initialize: j = 0, L_j = 0, H_j = 1.
REPEAT
    Δ = H_j − L_j
    read the next symbol a_i and use its interval I_i = [Sl_i, Sh_i) to update
        L_{j+1} = L_j + Δ · Sl_i
        H_{j+1} = L_j + Δ · Sh_i
    j = j + 1
UNTIL all a_i have been encoded.
Select a number b that falls in the final interval [L_j, H_j) as the code word.
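A floating-point sketch of the encoder loop above (real implementations use integer arithmetic to avoid the precision issues discussed later; the symbol intervals are passed in as an assumed dictionary argument):

    def arithmetic_encode(symbols, intervals):
        # intervals: dict symbol -> (Sl, Sh) sub-interval of [0, 1)
        L, H = 0.0, 1.0
        for a in symbols:
            delta = H - L
            Sl, Sh = intervals[a]
            L, H = L + delta * Sl, L + delta * Sh
        return L, H            # any number b in [L, H) is a valid code word

    intervals = {"a0": (0.0, 0.5), "a1": (0.5, 0.8), "a2": (0.8, 0.95), "a3": (0.95, 1.0)}
    print(arithmetic_encode(["a1", "a0", "a0", "a3", "a2"], intervals))
    # approximately (0.57425, 0.5748125), the final interval of Example 1.15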
Example 1.15: For a 4-ary source A = {a0, a1, a2, a3} with P_A = {0.5, 0.3, 0.15, 0.05}, the sub-intervals are I_0 = [0, 0.5), I_1 = [0.5, 0.8), I_2 = [0.8, 0.95), I_3 = [0.95, 1). Encode the sequence a1 a0 a0 a3 a2.
j   a_i   L_j       H_j      Δ         L_{j+1}    H_{j+1}
0   a1    0         1        1         0.5        0.8
1   a0    0.5       0.8      0.3       0.5        0.65
2   a0    0.5       0.65     0.15      0.5        0.575
3   a3    0.5       0.575    0.075     0.57125    0.575
4   a2    0.57125   0.575    0.00375   0.57425    0.5748125
Decoder
In order to decode the message, the symbol order and probabilities must be passed to the
decoder. The decoding process is identical to the encoding. Given the code word (the
final number), at each iteration the corresponding sub-range is entered, decoding the
symbols representing the specific range.
Given b, the decoding procedure is
L = 0, H = 1, Δ = H − L
REPEAT
    find i such that (b − L)/Δ ∈ I_i = [Sl_i, Sh_i)
    output symbol a_i and use its interval to update
        L_new = L + Δ · Sl_i
        H_new = L + Δ · Sh_i
        L = L_new, H = H_new, Δ = H − L
UNTIL the last symbol is decoded.
Example 1.16: For the source and encoder in Example 1.15, decode b = 0.57470703125.
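A matching decoder sketch for Example 1.16 (the number of symbols to decode is assumed to be known; in practice a terminating symbol or an explicit length is transmitted):

    def arithmetic_decode(b, intervals, n_symbols):
        L, H = 0.0, 1.0
        out = []
        for _ in range(n_symbols):
            delta = H - L
            ratio = (b - L) / delta
            for a, (Sl, Sh) in intervals.items():   # find i such that ratio is in I_i
                if Sl <= ratio < Sh:
                    out.append(a)
                    L, H = L + delta * Sl, L + delta * Sh
                    break
        return out

    intervals = {"a0": (0.0, 0.5), "a1": (0.5, 0.8), "a2": (0.8, 0.95), "a3": (0.95, 1.0)}
    print(arithmetic_decode(0.57470703125, intervals, 5))   # ['a1', 'a0', 'a0', 'a3', 'a2']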
Practical Issues
Attention must be paid to the precision with which we calculate (b − L)/Δ. Round-off error in this calculation can lead to an erroneous answer, as can numerical overflow/underflow (see the products Δ·Sl_i and Δ·Sh_i). The limited precision with which Sl_i and Sh_i can be represented limits the size of the alphabet A. In practice it is important to transmit and decode the information “on the fly.” Here, however, we must read in the entire block of source symbols before being able to compute the code word, and we must receive the entire code word b before we can begin decoding.
Code words:  Huffman — one code word for each symbol;  Arithmetic — one code word for all the data;  LZ — code words for strings of source symbols.
Ch 2 Channel and Channel Capacity
Communication Link
[Figure: communication link — the information source and source encoder feed the channel encoder, which emits the symbol sequence c0, c1, …, c_t drawn from alphabet C with probabilities P_C; the continuous-input, continuous-output channel delivers the sequence y0, y1, …, y_t drawn from alphabet Y with probabilities P_Y to the channel decoder and source decoder.]
Definition
In most communication or storage systems, the signal is designed such that the output
symbols, y0,y1,...,yt , are statistically independent if the input symbols, c0,c1,...,ct , are
statistically independent. If the output set Y consists of discrete output symbols, and if the
property of statistical independence of the output sequence holds, the channel is called a
Discrete Memoryless Channel (DMC).
The channel maps the input symbols from C into output symbols from Y. Any particular c from C may have some probability, p_{y|c}, of being transformed into an output symbol y from Y; this probability is called a (Forward) Transition Probability.
For a DMC, let p_c be the probability that symbol c is transmitted; the probability that symbol y is received is then q_y = Σ_{c∈C} p_{y|c} p_c.
The probability distribution of the output set Y, denoted by Q_Y, may therefore be calculated in matrix form as
Q_Y = P_{Y|C} · P_C,
where P_{Y|C} is the transition probability matrix and P_C is the column vector of input probabilities.
Remarks: The columns of PY|C sum to unity (no matter what symbol is sent, some
output symbol must result). Numerical values for the transition probability matrix are
determined by analysis of the noise and transmission impairment properties of the
channel, and the method of modulation/demodulation.
Hard Decision Decoding : MY = MC. Hard refers to the decision that the demodulator
makes; it is a firm decision on what symbol was transmitted.
Soft Decision Decoding : MY > MC. The final decision is left to the receiver decoder.
Example 2.1: C = {0, 1}, with equally probable symbols; Y = {y0, y1, y2}. The transition probability matrix of the channel is
P_{Y|C} = [ 0.80  0.05
            0.15  0.15
            0.05  0.80 ].
Q_Y = ?
Remarks: The sum of the elements on each column of the transition probability matrix is
1. This is an example of soft-decision decoding.
Example 2.1 (cont’d): Calculate the entropy of Y for the previous system. Compare this
with the entropy of source C.
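The matrix computation Q_Y = P_{Y|C} P_C and the two entropies can be checked with a few lines (a sketch using the numbers of Example 2.1):

    import math

    P_YC = [[0.80, 0.05],
            [0.15, 0.15],
            [0.05, 0.80]]
    P_C = [0.5, 0.5]

    Q_Y = [sum(row[c] * P_C[c] for c in range(len(P_C))) for row in P_YC]
    H = lambda probs: -sum(p * math.log2(p) for p in probs if p > 0)
    print(Q_Y)        # [0.425, 0.15, 0.425]
    print(H(P_C))     # H(C) = 1.0 bit
    print(H(Q_Y))     # H(Y) is approximately 1.46 bits, larger than H(C)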
Remarks: We noticed the same thing when we discussed the source encoder
(encryption encoder). It is possible for the output entropy to be greater than the input
entropy, but the “additional” information carried in the output is not related to the
information from the source. The “extra” information in the output comes from the
presence of noise in the channel during transmission, and not from the source C.
This “extra” information carried in Y is truly “useless”. In fact, it is harmful because it
produces uncertainty about what symbols were transmitted.
Question: Can we solve this problem by using only systems which employ hard-decision
decoding?
Answer: No — as the following example shows, hard-decision decoding does not eliminate the information loss.
Example 2.2: C = {0, 1}, with equally probable symbols; Y = {0, 1}. The transition probability matrix of the channel is
P_{Y|C} = [ 0.98  0.05
            0.02  0.95 ].
Calculate the entropy of Y. Compare this with the entropy of source C.
If Y tells us nothing about C (e.g., Y and C are independent, such as when somebody cuts the phone wire and no signal gets through), then H(C|Y) = H(C).
But if H(C|Y) = 0, then looking at Y leaves no uncertainty about C, i.e., Y contains sufficient information to tell what the transmitted sequence was. The conditional entropy H(C|Y) is thus a measure of how much information loss occurs in the channel!
Example 2.3: Calculate the mutual information for the system of Example 2.1.
Remark: The mutual information for this system is well below the entropy ( H(C)=1 )
of the source and so, this channel has a high level of information loss.
Example 2.4: Calculate the mutual information for the system of Example 2.2.
Remarks: This channel is quite lossy also. Although H(Y) was almost equal to H(C) in
Example 2.2, the mutual information is considerably less than H(C) . One cannot tell
how much information loss we are dealing with simply by comparing the input and
output entropies !
Question: Why is it not the same as the mutual information?
Answer: Because, for a fixed transition probability matrix, a change in the probability distribution of C, P_C, results in a different mutual information I(C;Y). The maximum mutual information achieved over all input distributions, for a given transition probability matrix, is the Channel Capacity.
Example 2.5: Channel capacity for several transition probability matrices (C_C denotes the capacity, P_C the capacity-achieving input distribution, and Q_Y the corresponding output distribution; matrix rows are separated by semicolons):
a) P_{Y|C} = [0.98 0.05; 0.02 0.95],  C_C = 0.78585,  P_C = [0.51289, 0.48711]^T,  Q_Y = [0.52698, 0.47302]^T
b) P_{Y|C} = [0.80 0.05; 0.20 0.95],  C_C = 0.48130,  P_C = [0.46761, 0.53239]^T,  Q_Y = [0.4007, 0.5993]^T
c) P_{Y|C} = [0.80 0.10; 0.20 0.90],  C_C = 0.39775,  P_C = [0.4824, 0.5176]^T,  Q_Y = [0.4377, 0.5623]^T
d) P_{Y|C} = [0.80 0.30; 0.20 0.70],  C_C = 0.191238,  P_C = [0.510, 0.490]^T,  Q_Y = [0.555, 0.445]^T
e) P_{Y|C} = [0.80 0.05; 0.15 0.15; 0.05 0.80],  C_C = 0.57566,  P_C = [0.5, 0.5]^T,  Q_Y = [0.425, 0.150, 0.425]^T
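The capacities above can be reproduced with a brute-force search over the binary input distribution P_C = [q, 1 − q] (a simple sketch; the Blahut–Arimoto algorithm is the standard, more general tool):

    import math

    def mutual_information(P_YC, q):
        # I(C;Y) in bits for a binary-input channel with P(C=0) = q; P_YC[y][c] = P(y|c)
        P_C = [q, 1.0 - q]
        I = 0.0
        for y in range(len(P_YC)):
            qy = sum(P_YC[y][c] * P_C[c] for c in range(2))
            for c in range(2):
                if P_YC[y][c] > 0 and P_C[c] > 0:
                    I += P_YC[y][c] * P_C[c] * math.log2(P_YC[y][c] / qy)
        return I

    P_YC = [[0.80, 0.05], [0.15, 0.15], [0.05, 0.80]]       # case e)
    best_q, C = max(((q / 1000.0, mutual_information(P_YC, q / 1000.0)) for q in range(1, 1000)),
                    key=lambda t: t[1])
    print(best_q, C)      # approximately 0.5 and 0.5757 for the symmetric channel of case e)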
Remarks: The channel capacity proves to be a sensitive function of the transition
probability matrix, PY|C , but a fairly weak function of PC. The last case is interesting, as
the uniform input distribution produces the maximum mutual information.
This is an example of Symmetric Channel. Note that the columns of symmetric
channel’s transition probability matrix are permutations of each other. Likewise, the top
and bottom rows are permutations of each other. The center row, which is not a
permutation of the other rows, corresponds to the output symbol y1, which, as we noticed
in Example 2.3, makes no contribution to the mutual information.
Symmetric Channels
Symmetric channels play an important role in communication systems and many such
systems attempt, by design, to achieve a symmetric channel function. The reason for the
importance of the symmetric channel is that when such a channel is possible, it
frequently has a greater channel capacity than a non-symmetric channel would have.
Example 2.6:
P_{Y|C} = [ 0.79  0.05
            0.16  0.15
            0.05  0.80 ],   C_C = 0.571215,   P_C = [0.50095, 0.49905]^T,   Q_Y = [0.4207, 0.1550, 0.4243]^T
The transition probability matrix is slightly changed compared to Example 2.5e), and the
channel capacity decreases.
Example 2.7:
Remarks:
i) The capacity for this channel is achieved when PC is uniformly distributed. This is
always the case for a symmetric channel.
ii) The columns of the transition probability matrix are permutations of each other, and so
are the rows.
iii) When the transition probability matrix is a square matrix, this permutation property of columns and rows is a sufficient condition for a uniformly distributed input alphabet to achieve the maximum mutual information. Indeed, the permutation condition is what gives rise to the term “symmetric channel.”
The parameter p is known as the Crossover Probability, and it is the probability that the
demodulator/detector makes a hard-decision decoding error. The BSC is the model for
essentially all binary-pulse transmission systems of practical importance.
Channel Capacity: for a uniform input probability distribution,
C = 1 + p log2 p + (1 − p) log2(1 − p) = 1 − H_b(p)   [bits/channel use].
The case p = 1 corresponds to a channel which always makes errors. If we know
that the channel output is always wrong, we can easily set things right by
decoding the opposite of what the channel output is.
The case p = 0.5 corresponds to a channel for which the output symbol is as
likely to be correct as it is to be incorrect. Under this condition, the information
loss in the channel is total, and the channel capacity is zero. The capacity of the
BSC is a concave-upward function, possessing a single minimum at p = 0.5.
Except for p = 0 and p = 1 cases, the capacity of the BSC is always less than the
source entropy. If we try to transmit information through the channel using the
maximum amount of information per symbol, some of this info will be lost, and
decoding errors at the receiver will result. However, if we add sufficient redundancy to the transmitted data stream, it is possible to reduce the probability of lost information to an arbitrarily low level.
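A two-line check of the BSC capacity formula C = 1 − H_b(p) discussed above (an illustrative sketch):

    import math

    def bsc_capacity(p):
        if p in (0.0, 1.0):
            return 1.0
        return 1.0 + p * math.log2(p) + (1 - p) * math.log2(1 - p)

    for p in (0.0, 0.1, 0.5, 0.9, 1.0):
        print(p, bsc_capacity(p))   # capacity is 1 bit at p = 0 and p = 1, and 0 at p = 0.5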
Since I(C;Y) = H(C) − H(C|Y), we have H(C|Y) = H(C) − I(C;Y).
The conditional entropy H(C|Y) corresponds to our uncertainty about what the input of the channel was, given our observation of the channel output. It is a measure of the information loss during transmission. For this reason, this conditional entropy is often called the Equivocation. The equivocation has the property that
0 ≤ H(C|Y) ≤ H(C),
and it is given by
H(C|Y) = − Σ_{y∈Y} Σ_{c∈C} P(c, y) log2 P(c|y).
The equivocation is zero if and only if the transition probabilities p_{y|c} are either zero or one for all pairs (y ∈ Y, c ∈ C).
Entropy Rate
The entropy of a block of n symbols satisfies the inequality
H(C_0, C_1, …, C_{n−1}) ≤ n · H(C).
The average number of information bits per channel use is obtained in the limit as n goes to infinity:
R = lim_{n→∞} H(C_0, C_1, …, C_{n−1}) / n,
where R is called the Entropy Rate.
R ≤ H(C), with equality if and only if all symbols are statistically independent.
Suppose that they are not, i.e., in the transmission of the block we deliberately introduce redundant symbols. Then R < H(C). Taking this further, suppose that we introduce a sufficient number of redundant symbols in the block so that R becomes smaller than the channel capacity.
Question: Is transmission without information loss (i.e., zero equivocation) possible in such a case?
Answer: Remarkably enough, the answer to this question is “YES”!
What is the implication of doing so ?
It is possible to send information through the channel with arbitrarily low probability of
error.
The process of adding redundancy to a block of transmitted symbols is called Channel
Coding.
Question: Does there exist a channel code that will accomplish this purpose?
Answer: The answer to this question is given by the Shannon’s second theorem.
error is made arbitrarily small. It is believed by many that beyond a particular rate, called the Cutoff Rate, R_0, it is prohibitively expensive to use the channel. In the case of the binary symmetric channel, this rate is given by
R_0 = −log2( 0.5 + √(p(1 − p)) ).
The belief that R_0 is some kind of “sound barrier” for practical error-correcting codes comes from the fact that, for certain kinds of decoding methods, the complexity of the decoder grows extremely rapidly as R exceeds R_0.
Let us number these possible states from 0 to N − 1 and let π_n(t) represent the probability of being in state n at time t. The probability distribution of the system at time t can then be represented by the vector
Π(t) = [π_0(t), π_1(t), …, π_{N−1}(t)]^T.
For each state at time t, there are M_A possible next states at time t + 1, depending on which symbol is emitted next by the source.
If we let p_{i|k} be the conditional probability of going to state i given that the present state is k, the state probability distribution at time t + 1 is governed by the transition probability matrix
P_A| = [ p_{0|0}     p_{0|1}     …   p_{0|N−1}
         p_{1|0}     p_{1|1}     …   p_{1|N−1}
         …           …           …   …
         p_{N−1|0}   p_{N−1|1}   …   p_{N−1|N−1} ]
and is given by
Π(t + 1) = P_A| · Π(t).
Example 2.8: Let A be a binary first-order Markov source with A={0,1}. This source
has 2 states, labeled “0” and “1”. Let the transition probabilities be
P_A| = [ 0.3  0.4
         0.7  0.6 ].
What is the equation for the next probability state? Find the state probabilities at time t = 2, given that the probabilities at time t = 0 are π_0 = 1 and π_1 = 0.
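A quick numeric check of Example 2.8 (an illustrative sketch; it answers the question above by applying Π(t+1) = P·Π(t) twice):

    def step(P, pi):
        # one application of the next-state equation
        return [sum(P[i][k] * pi[k] for k in range(len(pi))) for i in range(len(P))]

    P = [[0.3, 0.4],
         [0.7, 0.6]]
    pi = [1.0, 0.0]       # state probabilities at t = 0
    pi = step(P, pi)      # t = 1: [0.3, 0.7]
    pi = step(P, pi)      # t = 2: approximately [0.37, 0.63]
    print(pi)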
Example 2.9: Let A be a second-order binary Markov source with
Pr(a = 0 | 0, 0) = 0.2    Pr(a = 1 | 0, 0) = 0.8
Pr(a = 0 | 0, 1) = 0.4    Pr(a = 1 | 0, 1) = 0.6
Pr(a = 0 | 1, 0) = 0.0    Pr(a = 1 | 1, 0) = 1.0
Pr(a = 0 | 1, 1) = 0.5    Pr(a = 1 | 1, 1) = 0.5
If all the states are equally probable at time t = 0, what are the state probabilities at t =1 ?
Remarks: Every column of the transition probability matrix adds to one. Every properly
constructed transition probability matrix has this property.
Steady State Probability and the Entropy Rate
Starting from the equation for the state probabilities, it can be shown by induction that the state probabilities at time t are given by
Π(t) = (P_A|)^t · Π(0).
A Markov process is said to be Ergodic if we can get from the initial state to any other state in some number of steps and if, for large t, Π(t) approaches a steady-state value that is independent of the initial probability distribution Π(0). The steady-state value Π is reached when
Π = P_A| · Π.
The Markov processes which model information sources are always ergodic.
Example 2.10: Find the steady-state probability distribution for the source in Example
2.9.
In the steady state, the state probabilities become
It appears from this that we have four equations and four unknowns, so, solving for the
four probabilities is no problem. However, if we look closely, we will see that only three
of the equations above are linearly independent. To solve for the probabilities, we can use
any of three of the above equations and the constraint equation. This equation is a
consequence of the fact that the total probability must sum to unity;
which has the solution
π_0 = 1/9,  π_1 = π_2 = 2/9,  π_3 = 4/9.
This solution is independent of the initial probability distribution. The situation
illustrated in the previous example, where only N - 1 of the equations resulting from the
transition probability expression are linearly independent and we must use the “sum to
unity” equation to obtain the solution, always occurs in the steady-state probability
solution of an ergodic Markov process.
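The steady-state distribution can also be obtained numerically by replacing one of the dependent equations with the sum-to-unity constraint, exactly as described above. The sketch below uses a state-transition matrix reconstructed from the conditional probabilities of Example 2.9 under one possible state numbering (S0 = (0,0), S1 = (0,1), S2 = (1,0), S3 = (1,1)); the numbering in the original notes may differ, but the steady-state values are the same:

    import numpy as np

    # columns = present state, rows = next state (each column sums to 1)
    P = np.array([[0.2, 0.4, 0.0, 0.0],
                  [0.0, 0.0, 0.0, 0.5],
                  [0.8, 0.6, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.5]])

    A = P - np.eye(4)          # (P - I) Pi = 0
    A[-1, :] = 1.0             # replace one dependent equation with  sum(pi) = 1
    b = np.array([0.0, 0.0, 0.0, 1.0])
    pi = np.linalg.solve(A, b)
    print(pi)                  # approximately [1/9, 2/9, 2/9, 4/9]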
Since each possible symbol a leads to a single state, S_n can lead to M_A possible next states. The remaining N − M_A states cannot be reached from S_n, and for these states the transition probability p_{i|n} = 0. Therefore, the conditional entropy of the source, given that it is in state S_n, can be expressed in terms of the transition probabilities as
H(A | S_n) = − Σ_i p_{i|n} log2 p_{i|n}.
For large t, the probability of being in state S_n is given by its steady-state probability π_n. Therefore, the entropy rate of the system is
R = Σ_n π_n H(A | S_n),
which is equivalent to
R = − Σ_n π_n Σ_i p_{i|n} log2 p_{i|n},
where the p_{i|n} are the entries in the transition probability matrix and the π_n are the steady-state probabilities.
Example 2.11: Find the entropy rate for the source in Example 2.9. Calculate the steady-
state probability of the source emitting a “0” and the steady-state probability of the source
emitting a “1”. Calculate the entropy of a memoryless source having these symbol
probabilities and compare the result with the entropy rate of the Markov source.
With the steady-state probabilities calculated in Example 2.10, the entropy rate of this Markov source is obtained by applying the formula above; the numerical result can be checked as follows.
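A sketch computing the entropy rate from the steady-state probabilities and the per-state conditional entropies (state numbering as assumed in the steady-state sketch above; the printed values come from running this code, not from the original notes):

    import math

    def Hb(p):
        # binary entropy of a two-outcome distribution (p, 1-p)
        return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    pi = [1/9, 2/9, 2/9, 4/9]                 # steady-state probabilities (Example 2.10)
    p0_given_state = [0.2, 0.4, 0.0, 0.5]     # Pr(a = 0 | state), from Example 2.9
    R = sum(pi_n * Hb(p0) for pi_n, p0 in zip(pi, p0_given_state))
    print(R)                                   # approximately 0.74 bits/symbol

    p_zero = sum(pi_n * p0 for pi_n, p0 in zip(pi, p0_given_state))
    print(p_zero, Hb(p_zero))   # steady-state Pr(a=0) = 1/3; memoryless entropy approx. 0.918 bits > R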
Remarks:
i) In an earlier section, we discussed how introducing redundancy into a block of symbols can be used to reduce the entropy rate to a level below the channel capacity, and how this technique can be used for error correction at the receiver side, in order to achieve an arbitrarily small information bit error rate.
ii) In this section, we have seen that a Markov process also introduces redundancy into the symbol block.
Question: Can this redundancy be introduced in such a way that it is useful for error correction?
Answer: YES! This is the principle underlying a class of error correcting codes known
as convolutional codes.
iii) In the previous lecture we examined the process of transmitting information C
through a channel, which produces a channel output Y. We have found out that a noisy
channel introduces information loss if the entropy rate exceeds the channel capacity.
iv) It is natural to wonder if there might be some (possibly complicated) form of data processing which can be performed on Y to recover the lost information. Unfortunately, the answer to this question is NO! Once the information has been lost, it is gone!
[Figure: data-processing chain Y → data processing → Z; processing Y cannot increase the information about C.]
A very common example of this kind of information loss is the round-off or truncation error during digital signal processing in a computer or microprocessor. Another example is quantization in an analog-to-digital converter. Designers of these systems need to be aware of the possible impact of such design decisions, such as the word length of the digital signal processor or the number of quantization bits in analog-to-digital converters, on the information content.
§2.5 Constrained Channels
Channel Constraints
So far, we have considered only memoryless channels corrupted by noise, which are
modeled as discrete-input discrete-output memoryless channels. However, in many cases
we have channels which place constraints on the information sequence.
Remarks:
i) When the system needs to recover the timing information, additional information should be transmitted for that purpose. As the maximum information rate is limited by the channel capacity, the information needed for timing recovery is included at the expense of user information. This may require that the sequence of transmitted symbols be constrained in such a way as to guarantee the presence of timing information embedded within the transmitted coded sequence.
ii) Another aspect arises from the type and severity of channel distortions imposed by the
physical bandlimited channel. We can think of the physical channel as performing a kind
of data processing on the information bearing waveform presented to it by the modulator.
But data processing might result in information loss. A given channel can thus place its own constraints on the allowable symbol sequences which can be “processed” without information loss.
iii) Modulation theory tells us that it is possible and desirable to model the communication channel as a cascade of a (constrained) noise-free channel and an unconstrained noisy channel (we have implicitly used such a model, except that we have not considered any constraint on the input symbol sequence).
[Figure: constrained channel model — the symbol sequence a_t passes through the noise-free constrained channel producing x_t, the noise n_t is added, and a decision block produces the outputs y_t.]
The decision block takes these inputs and produces output symbols, yt, drawn from a
finite alphabet Y, with MY ≥ MA.
If MY =MA, yt is an estimate of the transmitted symbol at, and the decision block is
said to make a Hard-decision.
If MY > MA, the decision block is said to make a Soft-decision, and the final decision
on the transmitted symbol at is made by the decoder.
Example 2.12: Let A be a source with equiprobable symbols, A={-1,1}. The bandlimited
channel has the impulse response {h0=1 h1=0 h2=-1}. Calculate the steady-state entropy
of the constrained channel’s output and the entropy rate of the sequence xt.
State of the channel at time t : St = <at-1,at-2>.
The states are as follows:
(-1,-1) is state S0, (1,-1) is state S1,
(-1, 1) is state S2, (1, 1) is state S3.
The channel can be represented as a Markov process, with the state diagram given in the
sequel.
[Figure: state diagram of the constrained channel — states S0 = (−1,−1), S1 = (1,−1), S2 = (−1,1), S3 = (1,1); every transition has probability 0.5 (shown in parentheses) and is labeled a_t / x_t, with labels such as −1/0, 1/2, −1/−2 and 1/0.]
Note that all transition probabilities, shown in parentheses, are 0.5. The arrows are
labeled at / xt . One can easily show that X={-2, 0, 2}.
The state probability equation is then given by
Π(t + 1) = [ 0.5  0    0.5  0
             0.5  0    0.5  0
             0    0.5  0    0.5
             0    0.5  0    0.5 ] · Π(t),
from which we set up four equations and find the steady-state probabilities, i.e.,
π_i = 0.25,  i = 0, 1, 2, 3.
The output symbol probabilities are then
P(x = −2) = 0.25,  P(x = 0) = 0.5,  P(x = 2) = 0.25.
Ch 3 Error Control Strategies
[Figures: timing diagrams of Stop-and-Wait ARQ and Continuous ARQ.]
Types
Stop-and-Wait (SW) ARQ: The transmitter sends a block of information to the receiver
and waits for a positive (ACK) or negative (NAK) acknowledgment from the receiver. If
an ACK is received (no error detected), the transmitter sends the next block. If a NAK is
received (errors detected) , the transmitter resends the previous block. When the errors
are persistent, the same block may be retransmitted several times before it is correctly
received and acknowledged.
Continuous ARQ: The transmitter sends blocks of information to the receiver continuously and receives acknowledgments continuously. When a NAK is received, the transmitter begins a retransmission. It may back up to the erroneous block and resend that block plus the N − 1 blocks that follow it; this is called Go-Back-N (GBN) ARQ. Alternatively, the transmitter may resend only those blocks that are negatively acknowledged; this is known as Selective Repeat (SR) ARQ.
Comparison
GBN Versus SR ARQ
SR ARQ is more efficient than GBN ARQ, but requires more logic and buffering.
Continuous Versus SW ARQ
Continuous ARQ is more efficient than SW ARQ, but it is more expensive to implement.
For example: in satellite communication, where the transmission rate is high and the round-trip delay is long, continuous ARQ is used. SW ARQ is used in systems where the
the time taken to transmit a block is long compared to the time taken to receive an
acknowledgment. SW ARQ is used on half-duplex channels (only one way transmission
at a time), whereas continuous ARQ is designed for use on full-duplex channels
(simultaneous two-way transmission).
Performance Measure
Throughput Efficiency: the ratio of the average number of information digits successfully accepted by the receiver per unit of time to the total number of information digits that could have been transmitted per unit of time.
Delay of a Scheme: The interval from the beginning of a transmission of a block to the
receipt of a positive acknowledgment for that block.
§3.2 Forward Error Correction
Example 3.1: Consider a coded communication system using an (23, 12) binary Golay
code for error control. Each code word consists of 23 code digits, of which 12 are of
information. Therefore, there are 11 redundant bits, and the code rate is R=12/23=0.5217.
Suppose that BPSK modulation with coherent detection is used and the channel is AWGN, with one-sided PSD N_0. Let E_b/N_0 at the input of the receiver be the signal-to-noise ratio (SNR), which is usually expressed in dB.
The bit-error performance of the (23,12) Golay code with both hard- and soft-decision
decoding versus SNR is given, along with the performance of the uncoded system.
From the above figure, the coded system, with either hard- or soft-decision decoding,
provides a lower bit-error probability than the uncoded system for the same SNR, when
the SNR is above a certain threshold.
With hard-decision, this threshold is 3.7 dB.
For SNR = 7 dB, the BER of the uncoded system is 8×10^−4, whereas the coded system (hard-decision) achieves a BER of 2.9×10^−5. This is a significant improvement in performance.
For SNR = 5 dB this improvement in performance is small: 2.1×10^−3 compared to 6.5×10^−3. However, with soft-decision decoding, the coded system achieves a BER of 7×10^−5.
Performance Measures – Coding Gain
The other performance measure is the Coding Gain. Coding gain is defined as the
reduction in SNR required to achieve a specific error probability (BER or WER) for a
coded communication system compared to an uncoded system.
For a BER = 10^−5, the Golay-coded system with hard-decision decoding has a coding gain of 2.15 dB over the uncoded system, whereas with soft-decision decoding a coding gain of more than 4 dB is achieved. This result shows that soft-decision decoding of the Golay code achieves 1.85 dB additional coding gain compared to hard-decision decoding at a BER of 10^−5.
This additional coding gain is achieved at the expense of higher decoding complexity.
Coding gain is important in communication applications, where every dB of improved
performance results in savings in overall system cost.
Remarks:
At sufficiently low SNR, the coding gain actually becomes negative. This threshold
phenomenon is common to all coding schemes. There always exists an SNR below which
the code loses its effectiveness and actually makes the situation worse. This SNR is
called the Coding Threshold. It is important to keep this threshold low and to maintain a
coded communication system operating at an SNR well above its coding threshold.
Another quantity that is sometimes used as a performance measure is the Asymptotic
Coding Gain (the coding gain for large SNR).
Shannon’s Limit
In designing a coding system for error control, it is desired to minimize the SNR
required to achieve a specific error rate. This is equivalent to maximizing the coding
gain of the coded system compared to an uncoded system using the same modulation
format. A theoretical limit on the minimum SNR required for a coded system with
code rate R to achieve error-free communication (or an arbitrarily small error
probability) can be derived based on Shannon’s noisy coding theorem.
This theoretical limit, often called the Shannon Limit, simply says that for a coded
system with code rate R, error-free communication is achieved only if the SNR exceeds
this limit. As long as SNR exceeds this limit, Shannon’s theorem guarantees the existence
of a (perhaps very complex) coded system capable of achieving error-free
communication.
For transmission over a binary-input, continuous-output AWGN channel with BPSK signaling, the Shannon limit, in terms of SNR as a function of the code rate, does not have a closed form; however, it can be evaluated numerically.
[Figure: BER versus SNR curves; annotated values: Shannon's limit at 0.188 dB, a rate-1/2 convolutional code at 5.35 dB, and 9.462 dB.]
§3.4 Codes for Error Control
Types of Channels
Random-Error Channels
Burst-Error Channels
Compound Channels
Random Error Channels: are memoryless channels; the noise affects each transmitted
symbol independently. Example: deep space and satellite channels, most line-of-sight
transmission.
Burst Error Channels: are channels with memory. Example: fading channels (the
channel is in a “bad state” when a deep fade occurs, which is caused by multipath
transmission) and magnetic recordings subject to dropouts caused by surface defects and
dust particles.
Compound Channels: both types of errors are encountered.
Ch 4 Error Detection and Correction
At Transmitter
At Receiver
Definition
A code can be characterized in terms of its amount of error detection capability and error
correction capability. The Error Detection Capability is the ability of the decoder to
tell if an error has been made in transmission. The Error Correction Capability is the
ability of the decoder to tell which bits are in error.
[Figure: the binary message source, M = {0, 1}, feeds the Channel Encoder, which produces the coded sequence C.]
Assumptions:
- independent bits;
- each message is equally probable: 2^k equally likely messages, of k bits each;
- r = n − k redundant bits.
Thus, the entropy rate of the coded word is R = k/n bits per transmitted bit; this is also called the Code Rate.
Hamming Weight wH of a code word is defined as the number of “1” bits in the code
word (the Hamming distance between the code word and the zero code word).
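Hamming weight and Hamming distance are one-liners (illustrative helper functions, not from the notes):

    def hamming_weight(word):
        # number of '1' bits in a binary word given as a string, e.g. "0110101"
        return word.count("1")

    def hamming_distance(u, v):
        # number of positions in which u and v differ (= weight of u XOR v)
        return sum(a != b for a, b in zip(u, v))

    print(hamming_weight("0111001"))                  # 4
    print(hamming_distance("0111001", "0000000"))     # 4, the distance to the zero code word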
Example 4.1 (cont’d) : For the received words in the 1st column of the Table below,
determine their source words.
Decision: based on the minimum Hamming distance between the received word and
the code words.
• The code corrects 1 error (d_H = 1), but it cannot simultaneously detect a 2-bit error; moreover, a 2-bit error pattern may cause the received word to be miscorrected.
• The code detects up to two bits in error (3 bits in error can lead to another code word, since d_min between the code words is 3).
[Table: the 16 possible received words 0000–1111 and the corresponding decoded words, obtained by minimum Hamming distance.]
This gives us a lower limit on the number of redundant bits required for a certain minimum Hamming distance (i.e., for a certain detection and correction capability): d_min ≤ n − k + 1, or equivalently r = n − k ≥ d_min − 1. This is called the Singleton Bound.
Definition
Linear Block Codes can be mathematically treated using the mathematics of vector
spaces.
Linear block codes can be binary (we deal here only with such codes) or non-binary (e.g., Reed-Solomon codes).
Binary arithmetic (GF(2)):
⊕ | 0 1        · | 0 1
0 | 0 1        0 | 0 0
1 | 1 0        1 | 0 1
(A^n, ⊕, ·) — a vector space with vector addition ⊕ and scalar multiplication ·.
The vector space A^n is the set of all n-tuples a = (a_0, …, a_{n−1}), with each a_i ∈ A.
The set of code words, C, is a subset of A^n. It is a subspace (with 2^k elements); any subspace is also a vector space.
If the sum of any two code words is also a code word, the code is called a Linear Code.
Consequence: the all-zero vector is a code word, 0 ∈ C (because c_1 ⊕ c_1 = 0).
Vector Space
Linear Independence: the code words c_0, …, c_{k−1} are linearly independent if none of them can be written as a linear combination of the others. If they are linearly independent and every c ∈ C can be uniquely written as
c = a_0 c_0 ⊕ a_1 c_1 ⊕ … ⊕ a_{k−1} c_{k−1},
then they form a basis of C. The Dimension of a vector space is defined as the number of basis vectors it takes to describe (span) it.
c = mG,  where m = (m_0, …, m_{k−1}) is the message and c = (c_0, …, c_{n−1}) is the code word, with n > k.
Here, the result is a linear systematic block code, since the message bits appear unchanged within the code word.
§4.2.1 Linear Systematic Block Codes
Definition
If the generator matrix (of size k × n) can be written as
G = [P | I_k],
where P is a k × (n − k) parity-check matrix and I_k is the k × k identity matrix, then the linear block code generated by such a generator matrix is called a Linear Systematic Block Code. Its code words consist of parity-check bits followed by the unaltered message bits. For example (n = 7, k = 4):
c_0 = m_0 + m_2 + m_3
c_1 = m_0 + m_1 + m_2     ← parity-check bits (first r = n − k bits)
c_2 = m_1 + m_2 + m_3
c_3 = m_0
c_4 = m_1                 ← information bits (last k bits)
c_5 = m_2
c_6 = m_3
These equations define the ENCODING CIRCUIT.
the encoder can be designed as
c1 (0010001)
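The parity equations above translate directly into a short encoder (a sketch for this particular (7, 4) code; the code word order is (c0 c1 c2 m0 m1 m2 m3)):

    def encode_74(m):
        # m = [m0, m1, m2, m3], arithmetic modulo 2
        c0 = (m[0] + m[2] + m[3]) % 2
        c1 = (m[0] + m[1] + m[2]) % 2
        c2 = (m[1] + m[2] + m[3]) % 2
        return [c0, c1, c2] + list(m)

    print(encode_74([1, 0, 0, 1]))   # [0, 1, 1, 1, 0, 0, 1] - parity bits followed by the message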
Minimum Hamming Distance
The Minimum Hamming Distance of a linear block code is equal to the Minimum
Hamming Weight of the non-zero code vectors.
In Example 4.3 : n = 7, k = 4, r = 3, dmin = wmin=3
Rules
i) Detect up to t errors IF AND ONLY IF d_min ≥ t + 1.
ii) Correct up to t errors IF AND ONLY IF d_min ≥ 2t + 1.
iii) Detect up to t_d errors and correct up to t_c errors (t_d ≥ t_c) IF AND ONLY IF
d_min ≥ 2t_c + 1 and d_min ≥ t_c + t_d + 1.
In Example 4.3: n = 7, k = 4, r = 3
The minimum Hamming distance is 3, so the number of errors that can be detected is 2 and the number of errors that can be corrected is 1. The code does not have the capability to simultaneously detect and correct errors (see the relations between d_min and the correction/detection capability of a code).
Error Vector
For received vectors,
v = c + e,
where e is the Error Vector.
Parity Check Matrix
G H^T = 0,
where G is the k × n generator matrix and H is the (n − k) × n parity-check matrix.
Since c = mG, every code word satisfies c H^T = m G H^T = 0. For the code of Example 4.3 this gives the Parity Check Equations
c_0 + m_0 + m_2 + m_3 = 0
c_1 + m_0 + m_1 + m_2 = 0
c_2 + m_1 + m_2 + m_3 = 0,
i.e., (c_0 c_1 c_2 m_0 m_1 m_2 m_3) H^T = 0.
Syndrome Calculation and Error Detection
The Syndrome is defined as
s = v H^T   (s is 1 × (n − k), v is 1 × n, H^T is n × (n − k)),
with s = 0 if v = c and s ≠ 0 if v ≠ c.
In Example 4.3: n = 7, k = 4, r = 3.
Question: What is the number of error patterns which can be detected with this code?
Answer: The total number of error patterns is 2^n − 1 (the all-zero vector is not an error!). However, 2^k − 1 of them turn one code word into another code word, which means that they are not detectable. So the number of detectable error patterns is 2^n − 2^k.
Error Correction Capacity
Likelihood Test
Why and When the Minimum Hamming Distance is a Good Decoding Rule ?
Let c_1, c_2 be two code words and v be the received word.
If c_1 is the actual code word, then the number of errors is t_1 = d(v, c_1).
If c_2 is the actual code word, then the number of errors is t_2 = d(v, c_2).
Which of these two code words is more likely, based on v?
The most likely code word is the one with the greatest probability of occurring together with the received word, i.e., the one maximizing P(v, c_i).
The joint probabilities can be further written as
P(v, c_i) = P(v | c_i) P(c_i),  i = 1, 2,  with  P(v | c_i) = p^{t_i} (1 − p)^{n − t_i},
where t_i = d(v, c_i) is the number of errors that have occurred during the transmission of code word c_i. Since a received word corresponds to one specific error pattern, the binomial coefficient does not appear above.
IF
Condition 1: the code words have the same a priori probability, and
Condition 2: p < 0.5 (p is the crossover probability of the BSC channel),
then, by performing some calculations, one finds that the most likely code word is the one at the minimum Hamming distance from the received word v.
§4.2.4 Decoding Linear Block Codes
When decoding with the standard array, we identify the column of the array in which the received vector appears. The decoded vector is the vector in the first row of that column. Each row is called a Coset. In the first column we have all correctable error patterns; these are called Coset Leaders. Decoding is done correctly if and only if the error pattern caused by the channel is a coset leader (including the zero vector). The words in each column, except for the first element, which is a code word, are obtained by adding the coset leaders to that code word.
Syndrome Decoder
The standard array decoder becomes slow when the block code length is large. A more efficient method is the syndrome decoder. The Syndrome Vector is defined as
s = v H^T   (1 × (n − k)),
with s = 0 if v = c and s ≠ 0 if v ≠ c. Since s = (c + e) H^T = e H^T, the syndrome depends only on the error pattern e, so each correctable error pattern (coset leader) can be associated with its syndrome in a lookup table.
Example 4.4: Design the syndrome decoder for Example 4.3, in which n = 7, k = 4, r = 3.
For the parity-check matrix in Example 4.3 and the single-bit error patterns:
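A sketch of a syndrome decoder for this (7, 4) code, using the parity-check matrix implied by the parity equations of Example 4.3 and correcting single-bit error patterns only:

    # H given as a list of columns (each a 3-bit tuple); the syndrome is the sum of the columns where v_j = 1
    H_cols = [(1,0,0), (0,1,0), (0,0,1),                  # columns for c0, c1, c2
              (1,1,0), (0,1,1), (1,1,1), (1,0,1)]         # columns for m0, m1, m2, m3

    def syndrome(v):
        s = [0, 0, 0]
        for bit, col in zip(v, H_cols):
            if bit:
                s = [(a + b) % 2 for a, b in zip(s, col)]
        return tuple(s)

    def decode(v):
        s = syndrome(v)
        if s == (0, 0, 0):
            return list(v)                                 # no detectable error
        e = [1 if col == s else 0 for col in H_cols]       # single-bit error whose column equals s
        return [(a + b) % 2 for a, b in zip(v, e)]

    c = [0, 1, 1, 1, 0, 0, 1]          # a code word of this code
    v = c[:]; v[4] ^= 1                # flip one bit
    print(decode(v) == c)              # True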
§4.2.5 Hamming Codes
Definition
Hamming codes are important linear block codes, used for single-error control in digital communications and data storage systems. For any integer r ≥ 3, there exists a Hamming Code with the following parameters:
Code Length: n = 2^r − 1
Number of information digits: k = 2^r − 1 − r
Number of parity check digits: r = n − k
Error correction capability: t = 1 (d_min = 3)
A systematic Hamming code has a parity-check matrix H whose columns are all the non-zero r-tuples.
In Example 4.3: n = 7, k = 4, r = 3
Code Length: n = 2^r − 1 = 7
Number of information digits: k = 2^r − 1 − r = 4
Example 4.6: Write down the generator matrix for the Hamming code of Example
4.5.
Perfect Code
If we form the standard array for the Hamming code of length n = 2^r − 1, the n-tuples of weight 1 can be used as coset leaders. Recall that the number of cosets is 2^n / 2^k = 2^r, which is exactly the number of available coset leaders: the zero vector and the n = 2^r − 1 n-tuples of weight 1. Such a code is called a Perfect Code. “PERFECT” does not mean “BEST”!
A Hamming code corrects only error patterns of single error and no others.
Theorem 2: The minimum weight (distance) of a code is equal to the smallest number of
columns of H that sum to 0.
In Example 4.3: n = 7, k = 4, r = 3
The columns of H are non-zero and distinct. Thus, no two columns add to zero, and the
minimum distance of the code is at least 3. As H consists of all non-zero r-tuples as its
columns, the vector sum of any such two columns must be a column in H, and thus, there
are three columns whose sum is zero. Hence, the minimum Hamming distance is 3.
Shortened Hamming Codes
If we delete i columns from the parity-check matrix H of a Hamming code, the dimension of the new parity-check matrix H′ becomes r × (2^r − 1 − i). Using H′ we obtain a Shortened Hamming Code with the following parameters (for the particular shortening described below):
Code Length: n = 2^{r−1}
Number of information digits: k = 2^{r−1} − r
Number of parity check digits: r
Minimum Hamming Distance: d_min = 4
We delete from P^T all the columns of even weight, so that no three of the remaining columns add to zero (their total weight is odd). However, for a column of weight 3, there are 3 columns of I_r such that the sum of the 4 columns is zero. We can thus conclude that the minimum Hamming distance of the shortened code is exactly 4.
The shortened code is capable of correcting all single-error patterns and simultaneously detecting all double-error patterns. By shortening the code, the error detection capability is increased.
Ch 5 Cyclic Codes
Definition
Cyclic code is a class of linear block codes, which can be implemented with extremely
cost effective electronic circuits.
Cyclic Shift Property
A cyclic shift of c = (c_0 c_1 … c_{n−2} c_{n−1}) is given by c^(1) = (c_{n−1} c_0 c_1 … c_{n−2}).
A linear code C is cyclic if a cyclic shift of any of its code vectors results in a vector that is also an element of C. Check by yourself.
Example 5.2: Verify that the (5, 2) linear block code defined by the generator matrix
G = [ 1 0 1 1 1
      0 1 1 0 1 ]
is not a cyclic code. Its code vectors are
00000
10111
01101
11010
The cyclic shift of (10111) is (11011), which is not an element of C. Similarly, the cyclic
shift of (01101) is (10110), which is also not a code word.
A code word c = (c_0 c_1 … c_{n−2} c_{n−1}) is associated with the Code Polynomial
c(X) = c_0 + c_1 X + … + c_{n−2} X^{n−2} + c_{n−1} X^{n−1},
of degree (highest exponent of X) n − 1 or less.
Theorem: The non-zero code polynomial of minimum degree in a cyclic code is unique; it is the generator polynomial g(X) and has degree r = n − k.
Theorem 1: A binary polynomial of degree n − 1 or less is a code polynomial if and only if it is a multiple of g(X):
c(X) = m(X) g(X),
where m(X) has degree k − 1 or less, g(X) has degree r, and c(X) has degree n − 1 or less, i.e.,
c(X) = (m_0 + m_1 X + … + m_{k−1} X^{k−1}) g(X),
where m_0, …, m_{k−1} are the k information digits to be encoded.
Theorem 2: The generator polynomial, g ( X ) , of an (n, k) cyclic code is a factor of
X n 1.
Question: For any n and k, is there an (n, k) cyclic code?
Remark: For n large, X n 1 may have many factors of degree n - k. Some of these
polynomials generate good codes, whereas some generate bad codes.
Example 5.3: Determine the factors of X^7 + 1 that can generate (7, 4) cyclic codes.
X^7 + 1 = (1 + X)(1 + X + X^3)(1 + X^2 + X^3)
For a (7, 4) code, r = n - k = 7 - 4 = 3, so the generator polynomial can be chosen either as g(X) = 1 + X + X^3 or as g(X) = 1 + X^2 + X^3.
Systematic encoding with g(X) is carried out in three steps:
Step 1: Multiply the message m(X) by X^(n-k).
Step 2: Divide X^(n-k) m(X) by g(X) to obtain the remainder b(X).
Step 3: Combine b(X) and X^(n-k) m(X) to form the systematic code word c(X) = b(X) + X^(n-k) m(X).
Proof: X^(n-k) m(X) = a(X) g(X) + b(X), where X^(n-k) m(X) has degree n - 1 or less, g(X) has degree n - k, and the remainder b(X) has degree n - k - 1 or less. Adding b(X) to both sides gives b(X) + X^(n-k) m(X) = a(X) g(X), a multiple of g(X), i.e., a code word in systematic form.
92
Example 5.4: Find the (7, 4) cyclic code word generated by g(X) = 1 + X + X^3 when m(X) = 1 + X^3, i.e., m = (1001).
Step 1: X^3 m(X) = X^3 + X^6.
Step 2: Dividing X^3 + X^6 by g(X) gives the remainder b(X) = X + X^2.
Step 3: Combine b(X) and X^(n-k) m(X) to form the systematic code word:
c = (011 1001)
parity check bits | k bits of the message
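The three encoding steps of Example 5.4 can be reproduced with a short polynomial-division routine over GF(2). The sketch below is a minimal illustration; the coefficient-list convention (lowest power first) and the helper name poly_rem_gf2 are just choices made for this example.

def poly_rem_gf2(dividend, divisor):
    # Remainder of dividend / divisor over GF(2); coefficient lists, lowest power first.
    rem = list(dividend)
    for i in range(len(rem) - 1, len(divisor) - 2, -1):
        if rem[i]:
            for j, d in enumerate(divisor):
                rem[i - len(divisor) + 1 + j] ^= d
    return rem[:len(divisor) - 1]

g = [1, 1, 0, 1]          # g(X) = 1 + X + X^3
m = [1, 0, 0, 1]          # m(X) = 1 + X^3, i.e. m = (1001)
r = len(g) - 1            # n - k = 3

shifted = [0] * r + m          # Step 1: X^(n-k) m(X)
b = poly_rem_gf2(shifted, g)   # Step 2: remainder b(X)
codeword = b + m               # Step 3: c = (b | m)
print(b, codeword)        # [0, 1, 1] and [0, 1, 1, 1, 0, 0, 1], i.e. c = (0111001)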
Generator Matrix
Let C be an (n, k) cyclic code with generator polynomial g(X). Its generator matrix G has as rows the coefficient vectors of g(X), X g(X), ..., X^{k-1} g(X).
93
This is equivalent to the fact that g(X), X g(X), ..., X^{k-1} g(X) span C.
Example 5.5: Determine the systematic generator matrix for the (7, 4) cyclic code generated by g(X) = 1 + X + X^3.

        1 1 0 1 0 0 0   (R1)
G   =   0 1 1 0 1 0 0   (R2)
        0 0 1 1 0 1 0   (R3)
        0 0 0 1 1 0 1   (R4)

Replacing R3 by R3 + R1 and R4 by R4 + R1 + R2 gives the systematic form

        1 1 0 1 0 0 0
G   =   0 1 1 0 1 0 0
        1 1 1 0 0 1 0
        1 0 1 0 0 0 1
94
The (7, 4) cyclic code word generated by g(X) = 1 + X + X^3 when the message is (1100):
95
For the other messages, see the code table below:
Parity-check Matrix
We know: X^n + 1 = g(X) h(X), where g(X) has degree r = n - k and h(X), the Parity-check Polynomial, has degree k.
Let c = (c_0 c_1 ... c_{n-1}) be a code word. Then c(X) = a(X) g(X), where a(X) has degree k - 1 or less.
96
Multiplying by h(X): c(X) h(X) = a(X) g(X) h(X) = a(X)(X^n + 1) = a(X) + X^n a(X).
Thus, X^k, X^{k+1}, ..., X^{n-1} do not appear in a(X) + X^n a(X), i.e., the coefficients of X^k, X^{k+1}, ..., X^{n-1} in c(X) h(X) must be equal to zero. Then
∑_{i=0}^{k} h_i c_{n-i-j} = 0,   1 ≤ j ≤ n - k,
from which we can set up n - k equations.
It can be shown that the reciprocal polynomial X^k h(X^{-1}) is also a factor of X^n + 1; thus, it can generate an (n, n-k) cyclic code. The generator matrix of this (n, n-k) cyclic code is

        h_k  h_{k-1}  h_{k-2} ... h_0   0    0    0  ...  0
        0    h_k      h_{k-1} ... h_1  h_0   0    0  ...  0
H   =   0    0        h_k     ... h_2  h_1  h_0   0  ...  0
        ....................................................
        0    0        0       ...  0   h_k  ...  h_1  h_0

with h_0 = h_k = 1.
As for any linear block code, every code word is orthogonal to each row of H (c H^T = 0).
H is a Parity Check Matrix of the cyclic code. h(X) is called the parity polynomial of
the code. A cyclic code is uniquely specified by h(X).
Remark: The polynomial X^k h(X^{-1}) generates the dual code of C, an (n, r) code.
Example 5.6: Find the dual code generator polynomial for the (7, 4) cyclic code generated by g(X) = 1 + X + X^3.
k = 4, r = n - k = 7 - 4 = 3, and h(X) = (X^7 + 1)/g(X) = 1 + X + X^2 + X^4.
97
X^4 h(X^{-1}) = 1 + X^2 + X^3 + X^4 generates the dual (7, 3) cyclic code; its degree is 7 - 3 = 4.
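Example 5.6 can be checked by dividing X^7 + 1 by g(X) and reversing the coefficients of the quotient. A minimal Python sketch, using the same coefficient-list convention as the earlier sketch, is given below.

def poly_div_gf2(dividend, divisor):
    # Quotient and remainder of dividend / divisor over GF(2); lowest power first.
    rem = list(dividend)
    quot = [0] * (len(dividend) - len(divisor) + 1)
    for i in range(len(rem) - 1, len(divisor) - 2, -1):
        if rem[i]:
            quot[i - len(divisor) + 1] = 1
            for j, d in enumerate(divisor):
                rem[i - len(divisor) + 1 + j] ^= d
    return quot, rem[:len(divisor) - 1]

g = [1, 1, 0, 1]                       # g(X) = 1 + X + X^3
xn_plus_1 = [1, 0, 0, 0, 0, 0, 0, 1]   # X^7 + 1

h, rem = poly_div_gf2(xn_plus_1, g)
print(h, rem)       # [1, 1, 1, 0, 1] and [0, 0, 0]: h(X) = 1 + X + X^2 + X^4
dual_g = h[::-1]    # X^4 h(1/X) has the coefficients of h(X) in reverse order
print(dual_g)       # [1, 0, 1, 1, 1] -> 1 + X^2 + X^3 + X^4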
All three steps can be accomplished with a division circuit consisting of an (n-k)-stage shift register with feedback based on g(X). The mechanism of the division process has a simple implementation for binary polynomials. We assume that the bits are transmitted serially, with the highest power of X being transmitted first.
98
In the first division cycle, X^6 is divided by g(X) = X^3 + g_2 X^2 + g_1 X + 1; the remainder after this cycle is
S_1 = (g_2, g_1, 1)^T,   or, in the general case,   S_1 = (g_{r-1}, g_{r-2}, ..., g_1, g_0)^T.
In the next division cycle we have g_2 X^5 + g_1 X^4 + X^3 divided by X^3 + g_2 X^2 + g_1 X + 1:
Remainder after this cycle:

        [ g_2  1  0 ]
S_2  =  [ g_1  0  1 ]  S_1
        [ 1    0  0 ]
In the general case,

        [ g_{r-1}  1  0  ...  0 ]
        [ g_{r-2}  0  1  ...  0 ]
S_2  =  [   ...                 ]  S_1  =  A S_1,
        [ g_1      0  0  ...  1 ]
        [ g_0      0  0  ...  0 ]

i.e., the first column of A holds the coefficients of g(X) and the remaining columns hold I_{(r-1)x(r-1)} above a zero row.
The process continues 2 more times, for a total of k cycles (k = 4 here):
S_3 = A S_2   and   S_4 = A S_3.
The process for the term m_2 X^5 is the same, except that only k - 1 = 3 cycles are involved. The same is true for each successive term of X^r m(X), with one less shift for each decrease in the power of X.
99
For a general (n, k) code, we can represent the long-division process for the remainder vector as
S_t = A S_{t-1} + (g_{r-1}, g_{r-2}, ..., g_1, g_0)^T m_{k-t},   t = 1, 2, ..., k,   S_0 = 0.
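The recursion above can be verified numerically. The sketch below is a minimal illustration for the (7, 4) code with g(X) = 1 + X + X^3; the assumption is that the state vector lists the remainder coefficients with the highest power (X^2) first. For the message of Example 5.4 it reproduces the parity digits b = (011).

import numpy as np

g0, g1, g2 = 1, 1, 0                     # g(X) = g0 + g1 X + g2 X^2 + X^3
A = np.array([[g2, 1, 0],                # companion-type matrix of g(X)
              [g1, 0, 1],
              [g0, 0, 0]])
g_col = np.array([g2, g1, g0])

def remainder_by_recursion(m):
    # m = (m0, m1, m2, m3); cycle t feeds m_{k-t}, t = 1, ..., k, with S_0 = 0.
    S = np.zeros(3, dtype=int)
    for t in range(1, len(m) + 1):
        S = (A @ S + g_col * m[len(m) - t]) % 2
    return S                             # (coeff. of X^2, X^1, X^0)

print(remainder_by_recursion([1, 0, 0, 1]))   # [1 1 0], i.e. b(X) = X + X^2 = (011)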
Encoder Circuit
After obtaining the remainder, run Step 3: combine b(X) and X^(n-k) m(X) to form the systematic code word.
Example 5.7: For k = 4 and r = 3, design the encoder circuit.
100
[Encoder circuit: X^3 m(X) = X^3(m_3 X^3 + m_2 X^2 + m_1 X + m_0) is shifted into a 3-stage register (D flip-flops) with feedback taps g_0, g_1, g_2; the code word consists of the parity-check digits b_0, b_1, b_2 followed by the message digits.]
Register contents after the first shift (input m_3): (g_0 m_3, g_1 m_3, g_2 m_3) = S_1
After the second shift (input m_2): (g_0 m_2 + g_0 g_2 m_3, g_0 m_3 + g_1 m_2 + g_1 g_2 m_3, g_1 m_3 + g_2 m_2 + g_2^2 m_3) = S_2
and so on.
101
Homework: Find the encoding circuit for the (7, 4) code, generated by g(X) 1X X3
Encoding a cyclic code can also be accomplished by using its parity polynomial,
h(X) = 1 + h_1 X + ... + h_{k-1} X^{k-1} + X^k.
As h_k = 1 (see the formula in slide 2), the parity-check relation can be solved for c_{n-k-j}:
c_{n-k-j} = ∑_{i=0}^{k-1} h_i c_{n-i-j},   1 ≤ j ≤ n - k.      (1)
This is known as a difference equation.
For a Systematic Code: c = (c_0 c_1 ... c_{n-k-1} c_{n-k} ... c_{n-1}), where c_0, ..., c_{n-k-1} are the n - k parity check binary digits and c_{n-k}, ..., c_{n-1} = m_0, ..., m_{k-1} are the k information binary digits.
Given the k info bits, (1) is a rule for determining the n - k parity check digits c_0, c_1, ..., c_{n-k-1}.
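As an illustration, the difference equation (1) can be applied directly in software. The minimal sketch below uses h(X) = 1 + X + X^2 + X^4 of the (7, 4) code (computed in Example 5.6) and reproduces the code word of Example 5.4.

n, k = 7, 4
h = [1, 1, 1, 0, 1]                 # (h0, h1, h2, h3, h4) of h(X) = 1 + X + X^2 + X^4

def encode_with_h(m):
    c = [0] * (n - k) + list(m)     # systematic placement: c_{n-k+i} = m_i
    for j in range(1, n - k + 1):   # equation (1) gives c_{n-k-j}, j = 1, ..., n-k
        c[n - k - j] = sum(h[i] * c[n - i - j] for i in range(k)) % 2
    return c

print(encode_with_h([1, 0, 0, 1]))  # [0, 1, 1, 1, 0, 0, 1], the code word of Example 5.4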
The encoder circuit using the parity polynomial is shown below (h_0 = 1).
102
The Encoding Operations can be described in the following steps:
Step 1: Initially, Gate 1 is turned on and Gate 2 is turned off. The k information digits, m(X) = m_0 + m_1 X + ... + m_{k-1} X^{k-1}, are shifted into the register and into the communication channel simultaneously.
Step 2: As soon as the k information bits have entered the shift register, Gate 1 is turned off and Gate 2 is turned on. The first parity check digit, c_{n-k-1} = ∑_{i=0}^{k-1} h_i m_{k-1-i} (relation (1) with j = 1), is computed and shifted out; the remaining parity check digits follow in the same way.
Homework: Find the encoding circuit for the (7, 4) code, generated
by g( X ) 1 X X 3 , based on h(X).
Definition of Syndrome
Cyclic codes are linear block codes. For a received word v = (v_0 v_1 ... v_{n-1}), the Syndrome is defined as s = v H^T.
103
For Cyclic Codes: the received polynomial is v(X) = v_0 + v_1 X + ... + v_{n-1} X^{n-1}, and
v(X) = a(X) g(X) + s(X),
where the remainder s(X), of degree n - k - 1 or less, is the syndrome polynomial.
The received polynomial is shifted into the register with all stages initially set to zero. As soon as v(X) has been shifted into the register, the contents of the register form the syndrome s(X).
Properties of Syndrome
Let s(X) be the syndrome of a received polynomial v(X). The remainder s^(1)(X) resulting from dividing X s(X) by the generator polynomial g(X) is the syndrome of v^(1)(X), which is a cyclic shift of v(X) (for the proof, see the definition of the syndrome). The syndrome s^(1)(X) of v^(1)(X) can be obtained by shifting the (syndrome) register once, with s(X) as the initial content and with the input gate disabled. This is equivalent to dividing X s(X) by g(X).
In general, the remainder s^(i)(X) resulting from dividing X^i s(X) by the generator polynomial g(X) is the syndrome of v^(i)(X), which is a cyclic shift of v(X) by i positions. This
104
property is useful in decoding cyclic codes. The syndrome s^(i)(X) of v^(i)(X) can be obtained by shifting the (syndrome) register i times, with s(X) as the initial content and with the input gate disabled. This is equivalent to dividing X^i s(X) by g(X).
Example 5.8: Find the syndrome circuit for the (7, 4) cyclic code generated by g(X) = 1 + X + X^3. Suppose that the received vector is v = (0010110). Calculate the syndrome and compare it with the contents of the shift register after the 7th shift. Show the contents of the shift register with the input gate disabled and comment on the result.
v = (0010110), i.e., v(X) = X^2 + X^4 + X^5.
The remainder of v(X)/g(X) is 1 + X^2, and so the syndrome is s(X) = 1 + X^2, or s = (101).
For the contents of the shift register, see the next table, which is related to the syndrome circuit.
105
With the input gate disabled, the syndrome of v^(1) = (0001011) is obtained by shifting the register once, the syndrome of v^(2) = (1000101) is obtained if we shift the register twice, and so on.
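The syndrome of Example 5.8 and the shift property stated above can be checked with a few lines of Python. This is a minimal sketch; poly_rem_gf2 is the same hypothetical helper used in the earlier sketches.

def poly_rem_gf2(dividend, divisor):
    # Remainder of dividend / divisor over GF(2); coefficient lists, lowest power first.
    rem = list(dividend)
    for i in range(len(rem) - 1, len(divisor) - 2, -1):
        if rem[i]:
            for j, d in enumerate(divisor):
                rem[i - len(divisor) + 1 + j] ^= d
    return rem[:len(divisor) - 1]

g = [1, 1, 0, 1]                 # g(X) = 1 + X + X^3
v = [0, 0, 1, 0, 1, 1, 0]        # v(X) = X^2 + X^4 + X^5

s = poly_rem_gf2(v, g)
print(s)                         # [1, 0, 1] -> s(X) = 1 + X^2

v1 = [v[-1]] + v[:-1]            # cyclic shift v^(1) = (0001011)
print(poly_rem_gf2(v1, g))       # syndrome of v^(1), computed directly
print(poly_rem_gf2([0] + s, g))  # remainder of X s(X) / g(X): the same result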
Let c(X) be the transmitted code polynomial, and let e(X) = v(X) + c(X) be the error polynomial (error pattern). The syndrome is computed based on the received vector, and the decoder has to estimate the error pattern e(X) based on the syndrome. However, the error pattern is not known at the decoder. The syndrome is equal to the remainder of dividing the error pattern by the generator polynomial.
Remark: The error detection circuit is simply a syndrome circuit with an OR gate whose
inputs are the syndrome digits. If the syndrome is non-zero, the output of the OR gate is 1,
and the presence of errors has been detected.
CYCLIC CODES ARE VERY EFFECTIVE FOR DETECTING ERRORS,
RANDOM OR BURST !
Definition: An error pattern with errors confined to i high-order positions and l-i low-
order positions is also regarded as a burst of length l. This is called an end-around burst.
106
CASE 1: Suppose that e(X) is a burst of length r = n - k or less:
e(X) = X^j B(X),   where B(X) has degree n - k - 1 or less.
Because degree{B(X)} < degree{g(X)}, g(X) is not a factor of B(X). Also, X is not a factor of g(X), as g(X) divides X^n + 1. Therefore e(X) = X^j B(X) is not divisible by g(X), or, equivalently, the syndrome caused by e(X) is not equal to zero.
The (n, k) cyclic code is capable of detecting any error burst of length n - k or less.
CASE 2: Suppose that e(X) is a burst of length r + 1 = n - k + 1, and let it start at the i-th position. Thus, it ends at the (i + n - k)-th position. Errors are confined to e_i, e_{i+1}, ..., e_{i+n-k}, with e_i = e_{i+n-k} = 1.
There are 2^{n-k-1} such bursts (the error bits in the first and last positions are 1, and only the n - k + 1 - 2 (i.e., n - k - 1) middle positions can take any value, either 0 or 1). Among these, only one cannot be detected (zero syndrome), i.e.,
e(X) = X^i g(X).
The fraction of undetectable bursts of length n - k + 1 is therefore 2^{-(n-k-1)}.
107
Example 5.9: Analyze the error detection capability of the (7, 4) cyclic code generated by g(X) = 1 + X + X^3.
The minimum Hamming distance of this code is 3; thus, the code can detect up to 2 random errors (see the relation between d_min and t_d).
Also, it detects 2^7 - 2^4 = 112 non-zero error patterns (all non-zero patterns except the 15 that are themselves code words).
The code can detect any burst errors of length n - k = 3 or less.
It also detects many bursts of length > 3.
The fraction of undetectable bursts of length n - k + 1 = 4 is 2^{-(n-k-1)} = 1/4.
The fraction of undetectable bursts of length greater than 4 is 2^{-(n-k)} = 1/8.
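The counts quoted in Example 5.9 can be confirmed by exhaustive search over all 2^7 error patterns, since an error pattern goes undetected exactly when its syndrome is zero, i.e., when it is itself a non-zero code word. A minimal Python sketch:

from itertools import product

def poly_rem_gf2(dividend, divisor):
    rem = list(dividend)
    for i in range(len(rem) - 1, len(divisor) - 2, -1):
        if rem[i]:
            for j, d in enumerate(divisor):
                rem[i - len(divisor) + 1 + j] ^= d
    return rem[:len(divisor) - 1]

g = [1, 1, 0, 1]                 # g(X) = 1 + X + X^3
undetected = sum(1 for e in product([0, 1], repeat=7)
                 if any(e) and not any(poly_rem_gf2(e, g)))
print(2**7 - 1 - undetected)     # 112 detectable non-zero error patterns
print(undetected)                # 15 undetectable patterns (the non-zero code words)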
Decoding Steps
The decoding process consists of three steps, as for decoding of linear block codes. These
are:
i) Syndrome computation.
ii) Association of the syndrome with an error pattern.
108
iii) Error correction.
Syndrome Computation: The syndrome for cyclic codes can be computed with a
division circuit whose complexity is linearly proportional to the number of parity check
binary digits, i.e., n-k.
Error Corrections: The error-correction step is simply adding (mod-2) the error-pattern
to the received vector (exclusive-or gate).
The association of the syndrome with an error pattern can be completely specified by a decoding table. A straightforward approach to the design of a decoding circuit is therefore a combinational logic circuit that implements the table look-up procedure. However, the limitation of this approach is that its complexity tends to grow exponentially with the code length and with the number of errors to be corrected.
Cyclic Codes have considerable algebraic properties, which allow a low complexity
structure of the encoder. The cyclic structure of a cyclic code allows us to decode a
received vector v(X) serially. The received digits are decoded one at a time, and each
digit is decoded with the same circuitry.
109
Two Cases
As soon as the syndrome has been computed, the decoding circuit checks whether the syndrome s(X) corresponds to a correctable error pattern e(X) = e_0 + e_1 X + ... + e_{n-1} X^{n-1} with an error at the highest-order position X^{n-1}, i.e., with e_{n-1} = 1.
CASE I: If s(X) does not correspond to an error pattern with e_{n-1} = 1, the received polynomial and the syndrome register are cyclically shifted once simultaneously. We obtain v^(1)(X), whose syndrome s^(1)(X) is now held in the syndrome register, and the decoder proceeds to examine v_{n-2}.
CASE II: If s(X) of v(X) does correspond to an error pattern with e_{n-1} = 1, the first received digit v_{n-1} is an erroneous digit, and it must be corrected. The correction is carried out by forming the sum v_{n-1} + e_{n-1}.
This correction results in a modified received polynomial
v_1(X) = v_0 + v_1 X + ... + v_{n-2} X^{n-2} + (v_{n-1} + e_{n-1}) X^{n-1}.
The effect of e_{n-1} on the syndrome is removed from the syndrome s(X). v_1(X) and the syndrome register are cyclically shifted once simultaneously. The polynomial which results now is
v_1^(1)(X) = (v_{n-1} + e_{n-1}) + v_0 X + ... + v_{n-2} X^{n-1}.
Its syndrome, s_1^(1)(X), is the remainder resulting from dividing X[s(X) + X^{n-1}] by the generator polynomial g(X).
Proof
v(X) = a(X) g(X) + s(X)
v(X) + X^{n-1} = a(X) g(X) + s(X) + X^{n-1}            (error correction)
X[v(X) + X^{n-1}] = X a(X) g(X) + X[s(X) + X^{n-1}],   (shift once)
i.e., X v(X) + X^n = X a(X) g(X) + X s(X) + X^n,
110
such that the remainder of [X v(X) + X^n] : g(X) equals the remainder of X[s(X) + X^{n-1}] : g(X), which is s^(1)(X) + 1, because g(X) | (X^n + 1) and hence the remainder of X^n : g(X) is 1.
Therefore, if 1 is added to the left end of the syndrome register while it is shifted, we obtain s_1^(1)(X). The decoding circuitry proceeds to decode v_{n-2}. Whenever an error is
detected and corrected, its effect is removed from the syndrome.
Remarks:
The decoding stops after n shifts (= total number of binary bits in a received
word).
If e(X) is a correctable error pattern, the contents of the syndrome register are zero at the end of the decoding operation, and the received vector has been correctly decoded. Otherwise, an uncorrectable error pattern has been detected.
This decoder applies in principle to any (n, k) cyclic code.
But whether it is practical depends entirely on its error-pattern detection circuit.
In some cases this is a simple circuit.
Design Decoder
Example 5.10: Design the decoder for the (7, 4) cyclic code generated by g(X) = 1 + X + X^3.
Since d_min = 3, the code is capable of correcting any single error over a block of 7 bits. There are 7 such error patterns. These, together with the all-zero vector, form all the coset leaders of the decoding table; they are all the correctable error patterns. Suppose that the received polynomial is
v(X) = v_0 + v_1 X + ... + v_6 X^6.
111
[Decoder circuit for the (7, 4) cyclic code.]
112
In the sequel, we give an example of the decoding process when the code word c = (1001011), i.e., c(X) = 1 + X^3 + X^5 + X^6, is transmitted and v = (1011011), i.e., v(X) = 1 + X^2 + X^3 + X^5 + X^6, is received. A single error occurs at location X^2.
When the entire received polynomial has been shifted into the syndrome and buffer
registers, the syndrome register contains (001). We see that after 4 shifts, the content in
the syndrome register is (101) and the next digit to come out from the buffer is the
erroneous digit, v2.
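The serial decoding procedure walked through above can also be sketched in software. This is a minimal illustration rather than the circuit itself: for this code the error-pattern detection step reduces to comparing the syndrome register with (101), the syndrome of a single error at the highest-order position X^6.

def poly_rem_gf2(dividend, divisor):
    rem = list(dividend)
    for i in range(len(rem) - 1, len(divisor) - 2, -1):
        if rem[i]:
            for j, d in enumerate(divisor):
                rem[i - len(divisor) + 1 + j] ^= d
    return rem[:len(divisor) - 1]

g = [1, 1, 0, 1]                 # g(X) = 1 + X + X^3
DETECT = [1, 0, 1]               # syndrome of e(X) = X^6

def decode(v):
    v = list(v)
    s = poly_rem_gf2(v, g)       # initial syndrome, (001) for v = (1011011)
    for i in range(len(v)):
        pos = len(v) - 1 - i     # digit currently leaving the buffer
        if s == DETECT:
            v[pos] ^= 1                                # correct the digit ...
            s = [a ^ b for a, b in zip(s, DETECT)]     # ... and remove its effect
        s = poly_rem_gf2([0] + s, g)                   # shift the syndrome register
    return v

print(decode([1, 0, 1, 1, 0, 1, 1]))   # [1, 0, 0, 1, 0, 1, 1]: recovers c = (1001011)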
113
Ch 6 Convolutional Codes
Encoding
The source data is broken into frames of k_0 bits per frame. M + 1 frames of source data are coded into an n_0-bit code frame, where M is the Memory Depth of the shift register.
Convolutional codes are encoded using shift registers. As each new data frame is read, the old data is shifted one frame to the right, and a new code word is calculated.
Characteristics of the Code: Code Rate R = k_0/n_0, Constraint Length ν = M + 1.
For binary convolutional codes: k_0 = 1.
114
Example 6.1: For the R = 1/2, ν = 3 binary convolutional encoder below, determine its code polynomials.
c_0(X) = m(X) g_0(X)   and   c_1(X) = m(X) g_1(X),
such that the vector corresponding to the output is
C(X) = [c_0(X) c_1(X)] = m(X)[g_0(X) g_1(X)] = m(X) G(X).
For example, if the message is m(X) = 1 + X + X^3, the two code polynomials follow by carrying out the multiplications.
Let us assume that the highest power of X is the first symbol transmitted, and that we first send c_0 and then c_1. Thus, the transmitted sequence is
c_0(0) c_1(0) c_0(1) c_1(1) ... c_0(t) c_1(t) = ...
You can also input the message to the encoder directly to verify the result. The message
has 4 bits, i.e., ( 1 0 1 1), but the transmitted sequence contains 12 transmitted bits.
Therefore, the Code Rate is 4/12=1/3, not 1/2 !
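A software model of the encoder makes the 12-bit count easy to verify. In the minimal sketch below, the generator polynomials g0 = 7 (octal, 1 + X + X^2) and g1 = 5 (octal, 1 + X^2) are an assumption inferred from Example 6.7, since the encoder figure of Example 6.1 is not reproduced in these notes.

def conv_encode(msg_bits, g0=(1, 1, 1), g1=(1, 0, 1)):
    # msg_bits: message bits in the order they enter the encoder.
    # Returns the interleaved sequence c0(0) c1(0) c0(1) c1(1) ...
    M = len(g0) - 1
    padded = list(msg_bits) + [0] * M          # flush the shift register
    state = [0] * M                            # shift-register contents
    out = []
    for bit in padded:
        window = [bit] + state                 # current input plus memory
        out.append(sum(b * g for b, g in zip(window, g0)) % 2)
        out.append(sum(b * g for b, g in zip(window, g1)) % 2)
        state = [bit] + state[:-1]
    return out

# Message (1 0 1 1), i.e. m3 m2 m1 m0 with the highest power of X entering first:
print(conv_encode([1, 0, 1, 1]))   # 12 coded bits; the last two symbols flush the register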
115
Effective Code Rate
In Example 6.1, the overall rate is 1/3; the explanation is that the encoder has M = 2 memory elements and has to "flush" its buffer to complete the code sequence. The last two code symbols in the transmitted code sequence, i.e., 01 and 11, correspond to emptying the encoder's shift register. The first 8 bits correspond to the 4 message bits at rate 1/2, so the Effective Code Rate is only 4/12 = 1/3. This reduction in the code rate is known as the Fractional Rate Loss.
For a convolutional code with rate R, K bits of information and memory depth M, the Effective Code Rate is
R_eff = R K / (K + M).
Convolutional codes are effective when K >> M; then the effective code rate approaches the code rate R.
State Diagram
The convolutional encoder is a "state machine" (it is convenient to represent its operation using a State Diagram). With M memory elements, it has 2^M states.
116
Example 6.2: Find the state diagram for the encoder in Example 6.1.
Since M = 2, we associate the 2^M = 4 states with the contents of the shift register, as follows:
Trellis Diagram
The Trellis Diagram uses the states at different time instants to analyze the performance of a convolutional code.
117
Adversary Paths
The error-correcting property of a convolutional code is determined by the adversary
paths through the trellis. Adversary Paths: the paths that begin in the same state and
end in the same state, and have no state in common at any step between the initial and
final states.
118
Performance is based on the Hamming distance d_H(c_i, c_j) between the code sequences of the adversary paths in the trellis. As we can see in this simple example, the number of adversary paths grows quickly, and we may wonder how to handle the combinatorics involved. The trellis path analysis is simplified in the case of linear codes: then the Hamming distance between two code sequences in the trellis is equal to the Hamming distance between some code sequence and the all-zero code sequence.
Transfer function
This information can be found using the transfer function. We will show only the non-zero adversary paths which begin and end in state S0. We modify the state diagram by removing the self-loop at the S0 state and adding a new node S0(e), representing the termination of the non-zero adversary paths.
Transfer Function Operators
Consider the branch S0 -> S1, labelled 1|11: the source symbol has weight 1 and the code symbol has weight 2.
Source Symbol Weight Operator: N.   Code Symbol Weight Operator: D.   Time Index Operator: J.
For this branch, the transition operator is N D^2 J (the exponent of D or N equals the number of "1" bits in the code or source symbol).
Example 6.3: Write the transfer operators for each branch of the state diagram.
119
Results are:
We can solve for the transfer function for all possible paths starting at S0 and ending at S0,
by writing a set of state equations for the transfer function diagram.
with X_0 and X_0^(e) denoting the beginning and ending state S0, respectively. The transfer function T(J, N, D) is found by solving this set of equations for X_0^(e), with X_0 = 1, using linear algebra:
T(J, N, D) = D^5 N J^3 / (1 - D N J (1 + J))
To see the individual adversary paths, apply long division:
T(J, N, D) = D^5 N J^3 + D^6 N^2 J^4 (1 + J) + ...
120
The term D^6 N^2 J^4 (1 + J) shows that there are exactly two paths of Hamming weight 6, and both paths correspond to source sequences of Hamming weight 2. One is reached in 4 transitions, the other one in 5. With this information, the two paths are found to be
S0 -> S1 -> S3 -> S2 -> S0
S0 -> S1 -> S2 -> S1 -> S2 -> S0
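The low-order terms of T(J, N, D) can be generated mechanically by truncating the geometric series 1/(1 - D N J (1 + J)). A minimal sketch using the sympy package (any computer-algebra tool would do):

import sympy as sp

D, N, J = sp.symbols('D N J')
x = D * N * J * (1 + J)                     # the denominator of T is 1 - x
T_truncated = sp.expand(D**5 * N * J**3 * sum(x**i for i in range(3)))
print(T_truncated)
# Among the printed terms: D**5*N*J**3, D**6*N**2*J**4 and D**6*N**2*J**5,
# i.e. one weight-5 path and two weight-6 paths (reached in 4 and 5 transitions).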
Viterbi Algorithm
Convolutional codes are employed when significant error correction capability is required.
In such cases, the decoding cannot be carried out using syndrome method and shift
register circuits, but a more powerful method is needed. Such a method was introduced
by Viterbi (1965) and quickly became known as the Viterbi algorithm. The Viterbi
Algorithm is of major practical importance, and we will introduce it primarily by means
of examples.
We have seen that a convolutional code with constraint length M + 1 has 2^M states in its trellis. One way to view the Viterbi decoder is as a network of simple, identical processors, with one processor for each state in the trellis.
For example: ν = 3, M = 2, so it needs 2^2 = 4 states.
Example of node processor: It receives inputs from the node processors S0 and S2, and
supplies outputs for node processors S0 and S1.
[Trellis section showing the four states S0, S1, S2, S3 at times t, t+1 and t+2.]
Each processor does the following:
1) It monitors the received code sequence y(X), which can be written as y(X) = c(X) + e(X), and calculates a number (likelihood
121
metric) that is related to the probability that the received sequence arises from a given transmitted sequence. The likelihood metric is the accumulated Hamming distance between the received sequence and the expected transmitted sequence. The larger the distance, the less likely it is that this processor is decoding the true transmitted message.
2) Each processor must supply, as an output, its likelihood metric to each node processor connected to its output side.
3) For each of its input paths, the node processor must calculate the Hamming distance between the received n_0-bit code symbol y and the n_0-bit code symbol it should have received if the path of the transmitted message had just made that transition (likelihood update). It adds the likelihood update to the likelihood supplied to it by the source node processor. It then selects the path associated with the input-side processor having the smallest accumulated Hamming distance (the most likely path).
4) Based on which path is selected, the processor must decode the message bit associated with the selected path and update a record (called the Survivor Path Register) of all of the decoded message bits associated with the selected path.
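The four processor tasks listed above translate almost directly into software. The following minimal Python sketch implements a hard-decision Viterbi decoder with the accumulated Hamming distance as the likelihood metric. The generators 7 and 5 (octal) are again an assumption, and ties are simply broken in favour of the first candidate examined.

def expected_output(state, bit, g0=(1, 1, 1), g1=(1, 0, 1)):
    # Code symbol (c0, c1) produced when `bit` enters the encoder in `state`.
    window = (bit,) + state
    return (sum(b * g for b, g in zip(window, g0)) % 2,
            sum(b * g for b, g in zip(window, g1)) % 2)

def viterbi_decode(received_pairs):
    states = [(0, 0), (0, 1), (1, 0), (1, 1)]
    metric = {s: (0 if s == (0, 0) else 10**6) for s in states}   # start in S0
    survivor = {s: [] for s in states}
    for y in received_pairs:
        new_metric, new_survivor = {}, {}
        for s in states:                      # each node processor ...
            best = None
            for prev in states:               # ... examines its incoming paths
                for bit in (0, 1):
                    if ((bit,) + prev[:-1]) != s:
                        continue
                    c = expected_output(prev, bit)
                    d = metric[prev] + (c[0] ^ y[0]) + (c[1] ^ y[1])
                    if best is None or d < best[0]:
                        best = (d, survivor[prev] + [bit])
            new_metric[s], new_survivor[s] = best
        metric, survivor = new_metric, new_survivor
    best_state = min(states, key=lambda s: metric[s])
    return survivor[best_state]

received = [(1, 0), (1, 0), (0, 0), (0, 1), (1, 0), (0, 1)]   # the worked example below
print(viterbi_decode(received))    # decoded bit sequence for these six received symbols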
Example 6.4: Assume that we have the convolutional code discussed as in Example 6.1.
At time t, assume that the processors have the following initial conditions:
Assume that the received code-word symbol at time t is y=11. Find the resulting
likelihoods and survivor path registers for each of the node processors at time t+1.
122
Write down the trellis diagram (see the example discussed earlier).
For node S0, with y = 11: if the transition comes from S0, the expected branch is 0/00, giving the likelihood update d(11, 00) = 2; if it comes from S2, the expected branch is 0/11, giving d(11, 11) = 0.
123
Above is the result of applying the Viterbi algorithm. The solid lines are the selected paths, the dashed lines are the rejected paths. T = tied path, shown above the branches. The accumulated Hamming distances are indicated below each node. The first two steps are easy, since the path must start in S0 (the other initial metrics are large). The results of steps 3-6 are:
t = 5:   S0: 01100x   S1: 00001x   S2: 10110x   S3: 10111x
t = 6:   S0: 101100   S1: 101101   S2: 101110   S3: 101111
124
After the 3rd step, we cannot decide on the correct decoding of even the 1st bit (since the 4 path registers disagree on what this bit should be). By the 6th step, all 4 survivor registers agree on the first 4 decoded bits. Why? If you trace back from t = 6, all surviving paths join together at t = 4. However, note the tie! This result depends on how we resolve the tie.
After the algorithm has a chance to observe a sufficient number of received symbols, it is
able to use the sequence of information to pick the globally most likely transmitted
sequence.
Notice that the path selection for the 4 first steps through the trellis cannot be changed by
any further decisions the node processors may make. This is because all the node
processors now agree on the first four steps.
Received: 10 10 00 01 10 01
Most Likely: 11 10 00 01 ?? ??
In any practical implementation of the Viterbi algorithm, we must use a finite number of
bits for the survivor path register. This is called the Decoding Depth.
If we use too few bits, the performance of the algorithm will be hurt by having to force the decoding decisions when we run out of register bits. In such a case, the "most likely" bits are those that lead to the best likelihood metric. Most of the time this will result in correct decoding, but sometimes it will not. Such an erroneous decision is called a Truncation Error.
How many bits of decoding depth are required to make the probability of
truncation error negligible?
Forney (1970) gave the answer to this question: approximately 5.8 times the number of bits in the encoder's shift register, i.e., a decoding depth of about 5.8 M bits.
125
processor’s likelihood. This leaves the relative likelihoods unchanged, while limiting the
range of the likelihood number each node processor must be able to express.
0 1
126
Instead of transferring the contents of the survivor register, each node processor is
assigned a unique register in which we store a single bit. This is the last bit of the
state picked by that node processor as survivor path (in the previous example this is
“1”). As we deal with binary codes, each node has two inputs (two path choices). The
bit that can be chosen is different for the two possible paths (see the trellis diagram).
This will always be true with the state-naming convention we are using.
Trellis Diagram
Only the surviving path decisions are shown at each time step. The solid line is the survivor path agreed on by all four node processors at the last time step shown in the figure below.
127
The entries into each node processor’s traceback (i.e., survivor path) register at each
trellis step are shown in the figure. The traceback process is also illustrated. It begins at
the far right side of the figure and proceeds backwards in time. Once the traceback is
completed, the decoded bit sequence is read from left to right. The path traces back to
state “00” (S0). Whatever else may have happened during the time prior to the start of
the figure, we know that the last 2 bits leading into state “00” must have been “0,0”, so,
the decoded message sequence corresponding to the solid line must be . The
last two message bits, corresponding to the final 2 steps through the trellis have not
been decoded yet (due to the extra decoding lag mentioned above).
The error-correcting capabilities of convolutional codes are determined by the minimum free distance. Convolutional codes provide very powerful error correction capability, at the price of a low code rate.
For example:
128
Using Nonbinary Convolutional Codes
So far we have been looking only at convolutional codes with rate 1/n_0 (low R). If the source frame is increased to some k_0 > 1, we can achieve a rate k_0/n_0 convolutional code.
Example 6.6: Find the code rate and Trellis diagram for the 2-source frame encoder
shown on next page.
R= ; df = 3, it is a 4-ary code.
129
In Example 6.6, the number of inputs to each trellis node processor is equal to 4 (a disadvantage!). In general, a k_0/n_0 convolutional code requires dealing with 2^{k_0} input paths per node, so the complexity of the Viterbi decoder increases geometrically with k_0. This is a severe problem, and non-binary convolutional codes are therefore not popular.
130
Punctured convolutional codes perform almost as well as the best known non-binary codes of the same rate and memory depth. Punctured codes with rates up to 9/10 are known. Punctured codes are still linear codes (but no longer shift invariant).
Example 6.7: Use the same encoder as in Example 6.1 (generator polynomials 7 and 5 in octal), but with c_1 punctured in every second code word. Find its code rate and trellis diagram.
The second message bit is encoded using only the generator polynomial 7, thus R = 2/3.
Code sequence:
131
Here the puncturing period is equal to 2. The Viterbi decoder requires 2^M = 4 (M = 2) node processors, and the state diagram contains 8 (= 4 x 2, the number of states times the puncturing period) states.
Example 6.9: Find the puncturing period of the punctured code (15, 17).
The second message bit is encoded using only the generator polynomial 15, whereas the third message bit is encoded using only the generator polynomial 17; thus, R = 3/4.
Code sequence:
Here the puncturing period is equal to 3. The Viterbi decoder requires 2^M = 8 (M = 3) node processors, and the state diagram contains 24 (= 8 x 3, the number of states times the puncturing period) states.
The punctured codes presented here are punctured versions of known good rate-1/2 codes. However, it is not always true that puncturing a good rate-1/n_0 code yields a good punctured code. There is no known systematic procedure for generating good punctured convolutional codes; good codes are discovered by computer search.
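Puncturing itself is straightforward to model in software: encode with the rate-1/2 mother code and delete the bits selected by the puncturing pattern. A minimal sketch (the generators 7 and 5 in octal are again an assumption, puncturing period 2 as in Example 6.7):

def conv_encode(bits, g0=(1, 1, 1), g1=(1, 0, 1)):
    # Rate-1/2 mother encoder (memory M = 2), flushing the register at the end.
    state, out = [0, 0], []
    for b in list(bits) + [0, 0]:
        w = [b] + state
        out += [sum(x * y for x, y in zip(w, g0)) % 2,
                sum(x * y for x, y in zip(w, g1)) % 2]
        state = [b] + state[:-1]
    return out

def puncture(coded, pattern=((1, 1), (1, 0))):
    # pattern[t % period] tells which of (c0, c1) to keep at time t.
    out = []
    for t in range(len(coded) // 2):
        keep = pattern[t % len(pattern)]
        out += [bit for bit, k in zip(coded[2 * t: 2 * t + 2], keep) if k]
    return out

coded = conv_encode([1, 0, 1, 1])
print(len(coded), len(puncture(coded)))    # 12 coded bits -> 9 bits after puncturing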
132
133