
Information Theory and Coding

Course Code: EC716+CS4103


By
Dr. Nagendra Kumar
Assistant Professor
Department of ECE
NIT Jamshedpur
Text Books:
 T1. Information Theory, Inference, and Learning Algorithms: David J. C. MacKay, Cambridge University Press.
 T2. Information Theory, Coding and Cryptography: Ranjan Bose, McGraw-Hill.
 Reference Books:
 R1. Principles of Digital Communication and Coding: Andrew J. Viterbi and Jim K. Omura, McGraw-Hill.
 R2. Principles of Communication Engineering: J. M. Wozencraft and I. M. Jacobs, John Wiley.
 R3. Information Theory and Reliable Communication: R. G. Gallager, John Wiley.
Objectives
The main objective is to introduce the fundamental limits of communication
with practical techniques to realize the limits specified by information
theory. The course emphasizes:
 To deeply understand the mathematics of Information Theory and its
physical meaning.
 To understand various channel coding techniques.
 To apply the knowledge to real problems in communication applications.
Introduction

The purpose of a communication system is to carry information-bearing baseband signals from one place to another over a communication channel.
Information theory is concerned with the fundamental limits of communication.
It provides some fundamental knowledge for understanding and characterizing the performance of communication systems.
What is the ultimate limit to data compression?
Contd…
 What is the ultimate limit of reliable communication over a noisy
channel, e.g. how many bits can be sent in one second over a
telephone line?
 Information theory is a branch of probability theory which may be applied to the study of communication systems; it deals with the mathematical modelling and analysis of a communication system rather than with physical sources and physical channels.
 Two important elements presented in this theory are Binary Source
(BS) and the Binary Symmetric Channel (BSC).
 A binary source is a device that generates one of the two possible
symbols ‘0’ and ‘1’ at a given rate ‘r’, measured in symbols per
second
Contd…

 These symbols are called bits (binary digits) and are generated randomly.
 The BSC is a medium through which it is possible to transmit one symbol
per time unit. However this channel is not reliable and is characterized by
error probability ‘p’ (0 ≤ p ≤ 1/2) that an output bit can be different from the
corresponding input.
 Information theory tries to analyse communication between a transmitter and a receiver through an unreliable channel. In this approach it performs an analysis of information sources, especially the amount of information produced by a given source, and states the conditions for performing reliable transmission through an unreliable channel.
Contd…
 The source information measure, the channel capacity measure and the
coding are all related by one of the Shannon theorems, the channel coding
theorem which is stated as: ‘If the information rate of a given source does
not exceed the capacity of a given channel then there exists a coding
technique that makes possible transmission through this unreliable channel
with an arbitrarily low error rate.'
Contd…
 There are three main concepts in this theory:
1. The first is the definition of a quantity that can be a valid measurement of
information which should be consistent with a physical understanding of
its properties.
2. The second concept deals with the relationship between the information
and the source that generates it. This concept will be referred to as the
source information. Compression and encryption are related to this concept.
3. The third concept deals with the relationship between the information and
the unreliable channel through which it is going to be transmitted. This
concept leads to the definition of a very important parameter called the
channel capacity. Error-correction coding is closely related to this concept
Contd…

Digital Communication System


What is Information?

Information of an event depends only on its probability of occurrence and is not dependent on its content.
 The randomness in the happening of an event and the probability of its prediction as news is known as information.
The message associated with the least-likely event contains the maximum information.
Axioms of Information:

1. Information is a non-negative quantity: I (p) ≥ 0.


2. If an event has probability 1, we get no information from the
occurrence of the event: I (1) = 0.
3. If two independent events occur (whose joint probability is the product
of their individual probabilities), then the information we get from
observing the events is the sum of the two information: I (p1* p2) =
I (p1) + I (p2).
4. I (p) is monotonic and continuous in p.
Information Source
An information source may be viewed as an object which produces an
event, the outcome of which is selected at random according to a
probability distribution.
The set of source symbols is called the source alphabet and the
elements of the set are called symbols or letters
Information source can be classified as having memory or being
memory-less.
A source with memory is one for which a current symbol depends on
the previous symbols.
Information Source

A memory-less source is one for which each symbol produced is


independent of the previous symbols.
A discrete memory-less source (DMS) can be characterized by the list of
the symbol, the probability assignment of these symbols and the
specification of the rate of generating these symbols by the source.
Information Content of a DMS

The amount of information contained in an event is closely related to its


uncertainty.
A mathematical measure of information should be a function of the
probability of the outcome and should satisfy the following axioms

a) Information should be proportional to the uncertainty of an outcome


b) Information contained in independent outcomes should add up
Information Content of a Symbol
(i.e. Logarithmic Measure of Information):

 Let us consider a DMS denoted by X and having alphabet {x1, x2, ……, xm}.
 The information content of the symbol xi, denoted by I(xi), is defined by
I(xi) = log_b [1/P(xi)] = − log_b P(xi)
where P(xi) is the probability of occurrence of symbol xi.
 For any two independent source messages xi and xj with probabilities 𝑃𝑖 and 𝑃𝑗
respectively and with joint probability P (𝑥𝑖 , 𝑥𝑗) = Pi Pj, the information of the
messages is the addition of the information in each message. 𝐼𝑖𝑗 = 𝐼𝑖 + 𝐼𝑗.
Contd…
Note that I(xi) satisfies the following properties.
1. I(xi) = 0 for P(xi) = 1
2. I(xi) ≥ 0
3. I(xi) > I(xj) if P(xi) < P(xj)
4. I(xi, xj) = I(xi) + I(xj) if xi and xj are independent

 Unit of I(xi): The unit of I(xi) is the bit (binary unit) if b = 2, the Hartley or decit if b = 10, and the nat (natural unit) if b = e. It is standard to use b = 2.

log2 a = ln a / ln 2 = log10 a / log10 2
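
A minimal Python sketch (not part of the original slides) of the logarithmic measure of information, using the change-of-base relation above; the function name is illustrative:

import math

def information_content(p, base=2):
    # I(x) = -log_b P(x); with base 2 the result is in bits
    return -math.log(p, base)

print(information_content(0.5))    # 1.0 bit
print(information_content(0.125))  # 3.0 bits
# Additivity for independent symbols: I(p1*p2) = I(p1) + I(p2)
p1, p2 = 0.5, 0.25
print(information_content(p1 * p2), information_content(p1) + information_content(p2))  # 3.0 3.0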
Entropy (i.e. Average Information):

 Entropy is a measure of the uncertainty in a random variable. The entropy, H, of a


discrete random variable X is a measure of the amount of uncertainty associated
with the value of X.
 For quantitative representation of average information per symbol we make the
following assumptions:
i) The source is stationary so that the probabilities may remain constant with time.
ii) The successive symbols are statistically independent and come from the source
at an average rate of ‘r’ symbols per second.
Entropy (i.e. Average Information):

 The quantity H(X) is called the entropy of source X. It is a measure of the average information content per source symbol.
 The source entropy H(X) can be considered as the average amount of uncertainty
within the source X that is resolved by the use of the alphabet.
 H(X) = E[I(xi)] = Σ P(xi) I(xi) = − Σ P(xi) log2 P(xi) b/symbol.
 Entropy for Binary Source:
H(X) = − (1/2) log2 (1/2) − (1/2) log2 (1/2) = 1 bit/symbol
Entropy (i.e. Average Information):

 The source entropy H(X) satisfies the relation 0 ≤ H(X) ≤ log2 m, where m is the size of the alphabet of source X.
 Properties of Entropy:
1) 0 ≤ 𝐻 (𝑋) ≤ log2 𝑚 ; m = no. of symbols of the alphabet of
source X.
2) When all the events are equally likely, the average uncertainty
must have the largest value i.e. log2 𝑚 ≥ 𝐻 ( 𝑋)
3) H (X) = 0, if all the P(xi) are zero except for one symbol with P = 1.
Information Rate:

 If the time rate at which X emits symbols is 'r' (symbols/second), the information rate R of the source is given by
 R = r H(X) b/s [(symbols/second) × (information bits/symbol)].
 R is the information rate and H(X) is the entropy (average information).
[Figure: entropy H (bits/symbol) of a binary source versus probability p, rising from 0 at p = 0 to a maximum of 1 bit at p = 0.5 and back to 0 at p = 1.]
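
As a quick illustration (an assumed Python sketch, not from the slides), the binary entropy curve of the figure and the information rate R = r H(X) can be computed as follows; the symbol rate of 1000 symbols/s is an arbitrary example value:

import math

def binary_entropy(p):
    # H(p) = -p log2 p - (1-p) log2 (1-p), in bits/symbol
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.1, 0.3, 0.5, 0.9):
    print(p, round(binary_entropy(p), 3))     # peaks at 1.0 for p = 0.5

r = 1000                                      # assumed symbol rate, symbols/s
print("R =", r * binary_entropy(0.5), "b/s")  # 1000.0 b/s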
Conditional and Joint Entropies
 Using the input probabilities P (xi), output probabilities P (yj), transition
probabilities P (yj/xi) and joint probabilities P (xi, yj), various entropy functions for
a channel with m inputs and n outputs are defined

H X = − ෍ P(x i )log 2 p(x i )


i=1

H Y = − ෍ P(y 𝑗 )log 2 p(𝑦𝑗 )


j=1
Contd…

H(X|Y) = − Σ_{j=1}^{n} Σ_{i=1}^{m} P(xi, yj) log2 P(xi|yj)

H(Y|X) = − Σ_{j=1}^{n} Σ_{i=1}^{m} P(xi, yj) log2 P(yj|xi)

H(X, Y) = − Σ_{j=1}^{n} Σ_{i=1}^{m} P(xi, yj) log2 P(xi, yj)
Contd…
H (X) is the average uncertainty of the channel input and H (Y) is the average
uncertainty of the channel output.

The conditional entropy H (X/Y) is a measure of the average uncertainty remaining


about the channel input after the channel output has been observed. H (X/Y) is
also called equivocation of X w.r.t. Y.

The conditional entropy H (Y/X) is the average uncertainty of the channel output
given that X was transmitted.
Contd…

The joint entropy H(X, Y) is the average uncertainty of the communication channel as a whole. A few useful relationships among the above entropies are as under:
a. H(X, Y) = H(X|Y) + H(Y)
b. H(X, Y) = H(Y|X) + H(X)
c. H(X, Y) = H(X) + H(Y), if X and Y are statistically independent
d. H(X|Y) = H(X, Y) − H(Y)
Contd…

The conditional entropy or conditional uncertainty of X given random variable Y is


the average conditional entropy over Y.
The joint entropy of two discrete random variables X and Y is merely the entropy of their pairing (X, Y). This implies that if X and Y are independent, then their joint entropy is the sum of their individual entropies.
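
The entropy relations above can be checked numerically with a short sketch (assumed, with an arbitrary 2×2 joint distribution chosen only for illustration):

import numpy as np

P_XY = np.array([[0.4, 0.1],
                 [0.1, 0.4]])            # hypothetical joint probabilities P(xi, yj)

P_X = P_XY.sum(axis=1)                    # marginal P(xi)
P_Y = P_XY.sum(axis=0)                    # marginal P(yj)

H_X  = -np.sum(P_X * np.log2(P_X))
H_Y  = -np.sum(P_Y * np.log2(P_Y))
H_XY = -np.sum(P_XY * np.log2(P_XY))
H_X_given_Y = -np.sum(P_XY * np.log2(P_XY / P_Y[np.newaxis, :]))   # H(X|Y)
H_Y_given_X = -np.sum(P_XY * np.log2(P_XY / P_X[:, np.newaxis]))   # H(Y|X)

print(round(H_XY, 4), round(H_X_given_Y + H_Y, 4))   # checks H(X,Y) = H(X|Y) + H(Y)
print(round(H_XY, 4), round(H_Y_given_X + H_X, 4))   # checks H(X,Y) = H(Y|X) + H(X)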
The Mutual Information:
 Mutual information measures the amount of information that can be obtained about one random variable by observing another.
 It is important in communication, where it can be used to maximize the amount of information shared between sent and received signals.
 The mutual information I(X; Y) of a channel is defined by:
I(X; Y) = H(X) − H(X|Y) bits/symbol

 Since H (X) represents the uncertainty about the channel input before the channel
output is observed and H (X/Y) represents the uncertainty about the channel input
after the channel output is observed, the mutual information I (X; Y) represents
the uncertainty about the channel input that is resolved by observing the channel
output.
Properties of Mutual Information I (X; Y)
 I (X; Y) = I(Y; X)
 I (X; Y) ≥ 0
I (X; Y) = H (Y) – H (Y/X)
I (X; Y) = H(X) + H(Y) – H(X,Y)

 The entropy corresponding to the mutual information I(X; Y) indicates a measure of the information transmitted through a channel. Hence, it is called 'transferred information'.
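
A short sketch (assumed, reusing the same illustrative joint distribution as above) confirms that the two expressions for the mutual information agree:

import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

P_XY = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
P_X, P_Y = P_XY.sum(axis=1), P_XY.sum(axis=0)
H_X, H_Y, H_XY = entropy(P_X), entropy(P_Y), entropy(P_XY.ravel())

I_1 = H_X - (H_XY - H_Y)        # I(X;Y) = H(X) - H(X|Y)
I_2 = H_X + H_Y - H_XY          # I(X;Y) = H(X) + H(Y) - H(X,Y)
print(round(I_1, 4), round(I_2, 4))   # both ~0.2781 bits/symbol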
The Discrete Memoryless Channels (DMC):
 Channel Representation: A communication channel may be defined as
the path or
medium through which the symbols flow to the receiver end.
 A DMC is a statistical model with an input X and output Y. Each possible
input to output path is indicated along with a conditional probability P
(yj|xi), where P(yj|xi) is the conditional probability of obtaining output yj given that the input is xi, and is called a channel transition probability.
 A channel is completely specified by the complete set of transition
probabilities. The channel is specified by the matrix of transition
probabilities [P(Y|X)]. This matrix is known as Channel Matrix.
Contd…

           [P(y1|x1)  ⋯  P(yn|x1)]
P(Y|X) =   [   ⋮      ⋱      ⋮   ]
           [P(y1|xm)  ⋯  P(yn|xm)]
Contd…
 Since each input to the channel results in some output, each row of the channel matrix must sum to unity. This means that
Σ_{j=1}^{n} P(yj|xi) = 1   for all i

 Now, if the input probabilities P(X) are represented by the row matrix, we have
P(X) = [P(x1)  P(x2)  …  P(xm)]

 Also, the output probabilities P(Y) are represented by the row matrix
P(Y) = [P(y1)  P(y2)  …  P(yn)]
Contd…
Then
[P(Y)] = [P(X)] [P(Y|X)]
Now if P(X) is represented as a diagonal matrix, we have
[P(X)]_d = diag(P(x1), P(x2), …, P(xm))
Then
[P(X, Y)] = [P(X)]_d [P(Y|X)]

 where the (i, j) element of the matrix [P(X, Y)] has the form P(xi, yj).
 The matrix [P(X, Y)] is known as the joint probability matrix, and the element P(xi, yj) is the joint probability of transmitting xi and receiving yj.
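
These matrix relations can be verified with a brief sketch (assumed; the channel matrix and input probabilities are arbitrary example values):

import numpy as np

P_Y_given_X = np.array([[0.9, 0.1],
                        [0.2, 0.8]])   # example channel matrix, rows sum to 1
P_X = np.array([0.6, 0.4])              # example input probabilities (row matrix)

P_Y  = P_X @ P_Y_given_X                # [P(Y)] = [P(X)][P(Y|X)]
P_XY = np.diag(P_X) @ P_Y_given_X       # [P(X,Y)] = [P(X)]_d [P(Y|X)]

print(P_Y)          # [0.62 0.38]
print(P_XY)         # element (i, j) is P(xi, yj)
print(P_XY.sum())   # 1.0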
Types of Channels:
Other than discrete and continuous channels, there are some special types of
channels with their own channel matrices. They are as follows:
Lossless Channel: A channel described by a channel matrix with only one non-zero element in each column is called a lossless channel, e.g.

           [3/4  1/4   0    0    0]
P(Y|X) =   [ 0    0   1/3  2/3   0]
           [ 0    0    0    0    1]
Contd…
Deterministic Channel: A channel described by a channel matrix
with only one non – zero element in each row is called a deterministic
channel.

           [1  0  0]
P(Y|X) =   [1  0  0]
           [0  1  0]
           [0  0  1]
Contd…
Noiseless Channel: A channel is called noiseless if it is both lossless and deterministic. For a noiseless channel, m = n and

           [1  0  0]
P(Y|X) =   [0  1  0]
           [0  0  1]
Contd…
Binary Symmetric Channel: BSC has two inputs (x1 =
0 and x2 = 1) and two outputs (y1 = 0 and y2 = 1).
This channel is symmetric because the probability of receiving a 1 if a
0 is sent is the same as the probability of receiving a 0 if a 1 is sent.

           [1−p    p ]
P(Y|X) =   [ p    1−p]
The Channel Capacity
The channel capacity represents the maximum amount of information that can be
transmitted by a channel per second.
To achieve this rate of transmission, the information has to be processed properly
or coded in the most efficient manner.
Channel Capacity per Symbol CS: The channel capacity per symbol of a
discrete memory-less channel (DMC) is defined as
Cs = max_{P(xi)} I(X; Y)   bits/symbol

Where the maximization is over all possible input probability distributions {P (xi)}
on X.
Contd…

 Channel Capacity per Second C: If 'r' symbols are being transmitted per second, then the maximum rate of transmission of information per second is r·Cs. This is the channel capacity per second and is denoted by C (b/s), i.e.
C = r·Cs b/s
Capacities of Special Channels:
Lossless Channel: For a lossless channel, H (X/Y) = 0 and I (X; Y) = H (X).
Thus the mutual information is equal to the input entropy and no source
information is lost in transmission.

Cs = max_{P(xi)} H(X) = log2 m

Where m is the number of symbols in X.


Contd…

 Deterministic Channel: For a deterministic channel,


H (Y/X) = 0 for all input distributions P (xi) and I (X; Y) = H (Y).
 Thus the information transfer is equal to the output entropy. The channel capacity per
symbol will be

Cs = max_{P(xi)} H(Y) = log2 n

where n is the number of symbols in Y.


Contd…
 Noiseless Channel: since a noiseless channel is both lossless and deterministic, we
have I (X; Y) = H (X) = H (Y) and the channel capacity per symbol is
Cs = log2 m = log2 n

 Binary Symmetric Channel: For the BSC, the mutual information is

I(X; Y) = H(Y) + p log2 p + (1 − p) log2(1 − p)

 And the channel capacity per symbol will be

Cs = 1 + p log2 p + (1 − p) log2(1 − p)
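
The closed-form BSC capacity can be checked against a brute-force maximization of I(X; Y) over the input distribution (an assumed sketch; p = 0.1 is an arbitrary example):

import numpy as np

def h2(p):
    # binary entropy function in bits
    return 0.0 if p in (0.0, 1.0) else -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def bsc_capacity(p):
    # Cs = 1 + p log2 p + (1 - p) log2 (1 - p) = 1 - H(p)
    return 1.0 - h2(p)

def bsc_mutual_info(q, p):
    # q = P(x1); I(X;Y) = H(Y) - H(Y|X) for the BSC
    return h2(q * (1 - p) + (1 - q) * p) - h2(p)

p = 0.1
qs = np.linspace(0.001, 0.999, 999)
print(round(bsc_capacity(p), 4),
      round(max(bsc_mutual_info(q, p) for q in qs), 4))   # both ~0.531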


Capacity of an Additive Gaussian Noise (AWGN)
Channel - Shannon – Hartley Law
 The Shannon–Hartley law underscores the fundamental role of bandwidth and signal-to-noise ratio in a communication channel. It also shows that we can exchange increased bandwidth for decreased signal power for a system with given capacity C.

 In an additive white Gaussian noise (AWGN) channel, the channel output Y is


given by
Y=X+n
 Where X is the channel input and n is an additive bandlimited white Gaussian noise with zero mean and variance σ².
Contd…
The capacity C of an AWGN channel is given by
Cs = max I(X; Y) = (1/2) log2(1 + S/N)   bits/sample

 Where S/N is the signal – to – noise ratio at the channel output.


 If the channel bandwidth B Hz is fixed, then the output y(t) is also a bandlimited
signal completely characterized by its periodic sample values taken at the Nyquist
rate 2B samples/s.
 Then the capacity C (b/s) of the AWGN channel is limited by
C = 2B · Cs = B log2(1 + S/N) b/s
 The above equation is known as the Shannon–Hartley law.
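
A one-line evaluation of the Shannon–Hartley law (assumed sketch; the 3000 Hz bandwidth and 30 dB SNR are illustrative values for a telephone-type channel):

import math

def shannon_capacity(bandwidth_hz, snr_linear):
    # C = B log2(1 + S/N) in bits/second
    return bandwidth_hz * math.log2(1 + snr_linear)

snr = 10 ** (30 / 10)                       # 30 dB -> linear S/N = 1000
print(round(shannon_capacity(3000, snr)))   # about 29902 b/s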
Channel Capacity: Shannon – Hartley Law Proof.

 The bandwidth and the noise power place a restriction upon the rate of information
that can be transmitted by a channel. Channel capacity C for an AWGN channel is
expressed as

C = B log2(1 + S/N)

Where B = channel bandwidth in Hz; S = signal power; N = noise power;


Contd…
 Proof: Assuming signal mixed with noise, the signal amplitude can be
recognized only
within the root mean square noise voltage.
 Assuming the average signal power and noise power to be S watts and N watts, respectively, the RMS value of the received signal is √(S + N) and that of the noise is √N.
 Therefore the number of distinct levels that can be distinguished without error is expressed as
M = √(S + N) / √N = √(1 + S/N)
Contd…
The maximum amount of information carried by each pulse having √(1 + S/N) distinct levels is given by
I = log2 √(1 + S/N) = (1/2) log2(1 + S/N) bits

 The channel capacity is the maximum amount of information that can be transmitted per second by a channel. If a channel can transmit a maximum of K pulses per second, then the channel capacity C is given by

C = (K/2) log2(1 + S/N) bits/second
Contd…
 A system of bandwidth nfm Hz can transmit 2nfm independent pulses per second.
It is concluded that a system with bandwidth B Hz can transmit a maximum of
2B pulses per second. Replacing K with 2B, we eventually get
C = B log2(1 + S/N) bits/second

 The bandwidth and the signal power can be exchanged for one another.
Differential Entropy
The differential entropy h(X) of a continuous random variable X with probability
density function f_X(x) is defined as

h(X) = ∫_{−∞}^{∞} f_X(x) log2 [1/f_X(x)] dx

Differential entropy for a random variable with uniform probability density


function

f_X(x) = 1/a for 0 ≤ x ≤ a, and 0 otherwise
Contd…
Further,
h(X) = ∫_0^a (1/a) log2(a) dx = log2(a)
Unlike entropy, differential entropy can be negative (e.g. when a < 1).
Contd…

Differential entropy of Gaussian Source with mean μ and variance 𝜎 2 : One of


most commonly occurring and practically relevant sources
f_X(x) = (1/√(2πσ²)) e^{−(x−μ)²/(2σ²)}

h(X) = (1/2) log2(2πeσ²)

Differential entropy increases with the variance σ² (i.e. as uncertainty increases); it does not depend on the mean.
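
The closed-form result can be checked by numerical integration (assumed sketch; the mean and variance are arbitrary, and the result is the same for any mean):

import numpy as np

def gaussian_differential_entropy(sigma2):
    # h(X) = 0.5 log2(2 pi e sigma^2) bits, independent of the mean
    return 0.5 * np.log2(2 * np.pi * np.e * sigma2)

mu, sigma2 = 3.0, 2.0
x = np.linspace(mu - 20, mu + 20, 200001)
f = np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
h_numeric = np.trapz(-f * np.log2(f), x)    # integral of f log2(1/f)

print(round(gaussian_differential_entropy(sigma2), 4), round(h_numeric, 4))  # both ~2.5471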
Joint/Conditional Differential Entropy:

If f_Y(y) and f_X(x) are the marginal PDFs of Y and X, then for the joint PDF f_XY(x, y) we can write the joint and conditional differential entropies, respectively, as

h(X, Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_XY(x, y) log2 [1/f_XY(x, y)] dx dy

h(X|Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_XY(x, y) log2 [1/f_{X|Y}(x|y)] dx dy
Source Coding
Source Coding:
Definition: A conversion of the output of a discrete memory less
source (DMS) into a sequence of binary symbols i.e. binary code
word, is called Source Coding.
The device that performs this conversion is called the Source
Encoder.
Objective of Source Coding: An objective of source coding is to
minimize the average bit rate required for representation of the source
by reducing the redundancy of the information source
Few Terms Related to Source Coding Process:
1. Code word Length:
 Let X be a DMS with finite entropy H(X) and an alphabet {x0, x1, ….., x_{M−1}} with corresponding probabilities of occurrence P(xi) (i = 0, …, M−1). Let the binary code word assigned to symbol xi by the encoder have length ni, measured in bits. The length of the code word is the number of binary digits in the code word.
2. Average Code word Length:
 The average code word length L̄ per source symbol is given by
L̄ = Σ_{i=0}^{M−1} P(xi) ni
 The parameter L̄ represents the average number of bits per source symbol used in the source coding process.
Contd…
3. Code Efficiency:
 The code efficiency η is defined as
η = Lmin / L̄
where Lmin is the minimum possible value of L̄. When η approaches unity, the code is said to be efficient.
4. Code Redundancy:
 The code redundancy γ is defined as
γ = 1 − η
The Source Coding Theorem
The source coding theorem states that for a DMS X with entropy H(X), the average code word length L̄ per symbol is bounded as L̄ ≥ H(X),
and further, L̄ can be made as close to H(X) as desired for some suitably chosen code.
Thus,
L̄min = H(X)

 The code efficiency can then be rewritten as
η = H(X) / L̄
Classification of Code
1. Fixed – Length Codes
2. Variable – Length Codes
3. Distinct Codes
4. Prefix – Free Codes
5. Uniquely Decodable Codes
6. Instantaneous Codes
7. Optimal Codes
xi Code 1 Code 2 Code 3 Code 4 Code 5 Code 6
x1 00 00 0 0 0 1
x2 01 01 1 10 01 01
x3 00 10 00 110 011 001
x4 11 11 11 111 0111 0001
Contd…

1. Fixed – Length Codes:


A fixed – length code is one whose code word length is fixed. Code 1 and
Code 2 of above table are fixed – length code words with length 2.
2. Variable – Length Codes:
A variable – length code is one whose code word length is not fixed. All
codes of above table except Code 1 and Code 2 are variable – length codes.
3. Distinct Codes:
A code is distinct if each code word is distinguishable from each other. All
codes of above table except Code 1 are distinct codes.
Contd…
4. Prefix – Free Codes:
A code in which no code word can be formed by adding code symbols to another
code word is called a prefix- free code. In a prefix – free code, no code word is prefix
of another. Codes 2, 4 and 6 of above table are prefix – free codes.

5. Uniquely Decodable Codes:


A distinct code is uniquely decodable if the original source sequence can be
reconstructed perfectly from the encoded binary sequence. A sufficient condition
to ensure that a code is uniquely decodable is that no code word is a prefix of another.
Thus the prefix – free codes 2, 4 and 6 are uniquely decodable codes. Prefix – free
condition is not a necessary condition for uniquely decidability. Code 5 albeit does not
satisfy the prefix – free condition and yet it is a uniquely decodable code since the
bit 0 indicates the beginning of each code word of the code.
Contd…

6. Instantaneous Codes:
A uniquely decodable code is called an instantaneous code if the end of any code
word is recognizable without examining subsequent code symbols. The
instantaneous codes have the property previously mentioned that no code word is a
prefix of another code word. Prefix – free codes are sometimes known as
instantaneous codes.
7. Optimal Codes:
A code is said to be optimal if it is instantaneous and has the minimum average L for
a given source with a given probability assignment for the source symbols
Kraft Inequality
Let X be a DMS with alphabet {xi} (i = 0, 1, …, M−1). Assume that the length of the assigned binary code word corresponding to xi is ni.
A necessary and sufficient condition for the existence of an instantaneous binary code is
K = Σ_{i=0}^{M−1} 2^(−ni) ≤ 1
This is known as the Kraft inequality.
It may be noted that the Kraft inequality assures us of the existence of an instantaneously decodable code with code word lengths that satisfy the inequality.
But it does not show us how to obtain those code words, nor does it say that any code satisfying the inequality is automatically uniquely decodable.
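
A minimal sketch (assumed) of a Kraft-inequality check; the second set of lengths is a deliberately infeasible example:

def kraft_sum(lengths):
    # K = sum of 2^(-n_i); an instantaneous binary code exists iff K <= 1
    return sum(2 ** (-n) for n in lengths)

print(kraft_sum([1, 2, 3, 3]))   # 1.0  -> Code 4 of the earlier table satisfies it
print(kraft_sum([1, 2, 2, 2]))   # 1.25 -> no instantaneous binary code exists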
Entropy Coding
The design of a variable – length code such that its average code word
length approaches the entropy of DMS is often referred to as Entropy
Coding.

There are basically two types of entropy coding, viz.


1) Shannon – Fano Coding
2) Huffman Coding
Shannon – Fano Coding:

An efficient code can be obtained by the following simple procedure,


known as Shannon–Fano algorithm.

1) List the source symbols in order of decreasing probability.


2) Partition the set into two sets that are as close to equiprobable as possible, and assign 0 to the upper set and 1 to the lower set.
3) Continue this process, each time partitioning the sets with as nearly equal
probabilities as possible until further partitioning is not possible
4) Assign code word by appending the 0s and 1s from left to right
Shannon – Fano Coding - Example
Let there be six (6) source symbols having probabilities as x1 = 0.30, x2 = 0.25, x3 =
0.20, x4 = 0.12, x5 = 0.08 x6 = 0.05. Obtain the Shannon – Fano Coding for the
given source symbols.
Shannon–Fano Code words

H(X) = 2.36 b/symbol
L̄ = 2.38 b/symbol
η = H(X)/L̄ = 0.99

xi    P(xi)   Step 1   Step 2   Step 3   Step 4   Code
x1    0.30    0        0                          00
x2    0.25    0        1                          01
x3    0.20    1        0                          10
x4    0.12    1        1        0                 110
x5    0.08    1        1        1        0        1110
x6    0.05    1        1        1        1        1111
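
The figures H(X) = 2.36, L̄ = 2.38 and η = 0.99 quoted above can be reproduced with a short sketch (assumed; the code word lengths are read off the table):

import math

probs   = [0.30, 0.25, 0.20, 0.12, 0.08, 0.05]
lengths = [2, 2, 2, 3, 4, 4]                     # lengths of 00, 01, 10, 110, 1110, 1111

H = -sum(p * math.log2(p) for p in probs)        # source entropy
L = sum(p * n for p, n in zip(probs, lengths))   # average code word length

print(round(H, 2), round(L, 2), round(H / L, 2)) # 2.36 2.38 0.99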
Huffman Coding:

 Huffman coding results in an optimal code. It is the code that has the highest efficiency.
 The Huffman coding procedure is as follows:
1) List the source symbols in order of decreasing probability.
2) Combine the probabilities of the two symbols having the lowest probabilities and
reorder the resultant probabilities, this step is called reduction 1. The same
procedure is repeated until there are two ordered probabilities remaining.
Contd…
3) Start encoding with the last reduction, which consists of exactly two ordered
probabilities. Assign 0 as the first digit in the code word for all the source
symbols associated with the first probability; assign 1 to the second
probability.
4) Now go back and assign 0 and 1 to the second digit for the two probabilities that were combined in the previous reduction step, retaining all the code digits assigned in step 3.
5) Keep regressing this way until the first column is reached.
6) The code word is obtained tracing back from right to left.
Huffman Encoding - Example
Source Symbol xi    P(xi)    Code word
x1                  0.30     00
x2                  0.25     01
x3                  0.20     11
x4                  0.12     101
x5                  0.08     1000
x6                  0.05     1001

H(X) = 2.36 b/symbol
L̄ = 2.38 b/symbol
η = H(X)/L̄ = 0.99
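
A compact Huffman construction using a heap (an assumed sketch, not the slides' tabular reduction procedure; the exact code words may differ from the table, but the code word lengths, and hence L̄ and η, match):

import heapq, math

def huffman_code(probs):
    # heap entries: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)          # two least probable groups
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

probs = {"x1": 0.30, "x2": 0.25, "x3": 0.20, "x4": 0.12, "x5": 0.08, "x6": 0.05}
code = huffman_code(probs)
L = sum(probs[s] * len(code[s]) for s in probs)
H = -sum(p * math.log2(p) for p in probs.values())
print(code)
print(round(L, 2), round(H / L, 2))   # 2.38 0.99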
Redundancy:
Redundancy in information theory refers to the reduction in information content of
a message from its maximum value
For example, consider English, which has 26 letters. Assuming all letters are equally likely to occur, P(xi) = 1/26. The information contained per letter is therefore
log2 26 = 4.7 bits/letter

Assuming that each letter to occur with equal probability is not correct, if we
assume that some letters are more likely to occur than others, it actually reduces
the information content in English from its maximum value of 4.7 bits/symbol.
We define the relative entropy as the ratio of H(Y|X) to H(X), which gives the maximum compression value, and the redundancy is then expressed as

Redundancy = H(Y|X) / H(X)
Channel Coding
Why channel coding?

The challenge in a digital communication system is that of providing a cost-effective facility for transmitting information at a rate and a level of reliability and quality that are acceptable to a user at the receiver.
The two key system parameters :
1) Transmitted signal power.
2) Channel bandwidth.
Power spectral density of receiver noise (Important parameter)
These parameters determine the signal energy per bit-to-noise power spectral
density ratio Eb/No.
Contd…

In practice, there is a limit on the value that we can assign to Eb/No.
So, it may be impossible to provide acceptable data quality (i.e., a low enough error performance) by these means alone.
For a fixed Eb/No, the only practical option available for improving data quality is to use ERROR-CONTROL CODING.

The two main methods of error control are:

i. Forward Error Correction (FEC).


ii. Automatic Repeat request (ARQ).
CHANNEL CODING Block Diagram
Forward Error Correction (FEC)
The key idea of FEC is to transmit enough redundant data to allow receiver to recover
from errors all by itself. No sender retransmission required.
The major categories of FEC codes are

i. Block codes,
ii. Cyclic codes
iii. BCH codes
iv. Reed-Solomon codes .
v. Convolutional codes
Forward Error Correction (FEC)

FEC require only a one-way link between the transmitter and receiver.

In the use of error-control coding there are trade offs between:

i. Efficiency & Reliability.


ii.Encoding/Decoding complexity & Bandwidth .
Channel Coding Theorem
The channel coding theorem states that if a discrete memoryless channel has capacity C and the source generates information at a rate less than C, then there exists a coding technique such that the output of the source may be transmitted over the channel with an arbitrarily low probability of symbol error.
For the special case of the BSC, the theorem tells us that it is possible to find a code that achieves error-free transmission over the channel.
The issue that matters is not the signal-to-noise ratio but how the channel input is encoded.
The theorem asserts the existence of good codes but does not tell us how to find them.
By good codes we mean families of channel codes that are capable of providing reliable (error-free) transmission of information over a noisy channel of interest at bit rates up to a maximum value less than the capacity of the channel.
Linear Block Codes

 The encoder generates a block of n coded bits from k information bits and we
call this as (n, k) block codes. The coded bits are also called as code word
symbols.

Why linear?
 A code is linear if the modulo-2 sum of two code words is also a code word.
Contd…

 'n' code word symbols can take 2^n possible values. From these we select 2^k code words to form the code.
 A block code is said to be useful when there is a one-to-one mapping between a message m and its code word c, as shown above.
Generator Matrix
 All code words can be obtained as linear combination of basis vectors.
 The basis vectors can be designated as {𝑔1, 𝑔2, 𝑔3,….., 𝑔𝑘}
 For a linear code, there exists a k by n generator matrix such that
c_{1×n} = m_{1×k} · G_{k×n}

where c = (c1, c2, ….., cn) and m = (m1, m2, ……, mk)


Block Codes in Systematic Form

 In this form, the code word consists of (n-k) parity check bits
followed by k bits of the message.
 The structure of the code word in systematic form is:

 The rate or efficiency for this code R= k/n


Contd…

G = [I_k  P],   C = m·G = [m  mP]
(message part | parity part)
Example:
 Let us consider (7, 4) linear code where k=4 and n=7

m = (1110) and

    [g1]   [1 1 0 1 0 0 0]
G = [g2] = [0 1 1 0 1 0 0]
    [g3]   [1 1 1 0 0 1 0]
    [g4]   [1 0 1 0 0 0 1]
Contd…

C = m·G = m1·g1 + m2·g2 + m3·g3 + m4·g4
  = 1·g1 + 1·g2 + 1·g3 + 0·g4
  = (1101000) + (0110100) + (1110010)
  = (0101110)
Contd…

 Another method:
Let m = (m1, m2, m3, m4) and c = (c1, c2, c3, c4, c5, c6, c7)

                             [1 1 0 1 0 0 0]
c = m·G = (m1, m2, m3, m4) · [0 1 1 0 1 0 0]
                             [1 1 1 0 0 1 0]
                             [1 0 1 0 0 0 1]

 By matrix multiplication (modulo-2) we obtain:
c1 = m1 + m3 + m4,  c2 = m1 + m2 + m3,  c3 = m2 + m3 + m4,
c4 = m1,  c5 = m2,  c6 = m3,  c7 = m4

The code word corresponding to the message (1110) is (0101110).
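
The modulo-2 encoding c = m·G can be reproduced with a short sketch (assumed):

import numpy as np

G = np.array([[1, 1, 0, 1, 0, 0, 0],
              [0, 1, 1, 0, 1, 0, 0],
              [1, 1, 1, 0, 0, 1, 0],
              [1, 0, 1, 0, 0, 0, 1]])   # generator matrix of the (7, 4) code above

m = np.array([1, 1, 1, 0])              # message (1110)
c = m @ G % 2                           # codeword c = m.G (modulo-2)
print(c)                                # [0 1 0 1 1 1 0] -> (0101110)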


Parity Check Matrix (H)
When G is systematic, it is easy to determine the parity check matrix H as
H = [I_{n−k}  P^T]

The parity check matrix H of a generator matrix is an (n−k)-by-n matrix satisfying
H_{(n−k)×n} · G^T_{n×k} = 0

Then the code words should satisfy the (n−k) parity check equations
c_{1×n} · H^T_{n×(n−k)} = m_{1×k} · G_{k×n} · H^T_{n×(n−k)} = 0
Example:
Consider the generator matrix of the (7, 4) linear block code above, with H = [I_{n−k}  P^T] and G = [P  I_k].
The corresponding parity check matrix is

    [1 0 0 1 0 1 1]
H = [0 1 0 1 1 1 0]
    [0 0 1 0 1 1 1]

          [1 1 0 1 0 0 0]   [1 0 0]
G·H^T  =  [0 1 1 0 1 0 0]   [0 1 0]
          [1 1 1 0 0 1 0] · [0 0 1]  = 0
          [1 0 1 0 0 0 1]   [1 1 0]
                            [0 1 1]
                            [1 1 1]
                            [1 0 1]
Syndrome and Error Detection
 For a code word c transmitted over a noisy channel, let r be the received vector at the output of the channel, with error pattern e:

r = c + e

where each error digit is
e_i = 1 if r_i ≠ c_i, and e_i = 0 if r_i = c_i

 The syndrome of the received vector r is given by:

s = r·H^T = (s1, s2, s3, ….., s_{n−k})
Properties of syndrome:
 The syndrome depends only on the error pattern and not on
the transmitted word.
s = (c + e)·H^T = c·H^T + e·H^T = e·H^T

 All error patterns that differ by a code word have the same syndrome 's'.
Example:
Let C = (0101110) be the transmitted code word and r = (0001110) the received vector.

s = r·H^T = [s1, s2, s3] = [r1, r2, r3, r4, r5, r6, r7] ·
    [1 0 0]
    [0 1 0]
    [0 0 1]
    [1 1 0]
    [0 1 1]
    [1 1 1]
    [1 0 1]

 The syndrome digits are:
s1 = r1 + r4 + r6 + r7 = 0
s2 = r2 + r4 + r5 + r6 = 1
s3 = r3 + r5 + r6 + r7 = 0
Contd…

The error vector is e = (e1, e2, e3, e4, e5, e6, e7) = (0100000)

C* = r + e
   = (0001110) + (0100000)
   = (0101110)
where C* is the actual transmitted code word.
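
The syndrome computation and single-error correction above can be sketched as follows (assumed; for a single-bit error the syndrome equals the corresponding column of H):

import numpy as np

H = np.array([[1, 0, 0, 1, 0, 1, 1],
              [0, 1, 0, 1, 1, 1, 0],
              [0, 0, 1, 0, 1, 1, 1]])    # parity check matrix of the (7, 4) code

r = np.array([0, 0, 0, 1, 1, 1, 0])       # received vector (0001110)
s = r @ H.T % 2                            # syndrome s = r.H^T
print(s)                                   # [0 1 0]

for j in range(H.shape[1]):                # locate the single-bit error
    if np.array_equal(s, H[:, j]):
        e = np.zeros(7, dtype=int)
        e[j] = 1
        print("error position:", j + 1, "corrected:", (r + e) % 2)   # (0101110)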
Minimum Distance of a BlockCode
 Hamming weight w(c ) : It is defined as the number of non-zero
components of c.
For ex: The hamming weight of c=(11000110) is 4
 Hamming distance d( c, x): It is defined as the number of places where they
differ .
 The hamming distance between c=(11000110) and x=(00100100) is 4
 The hamming distance satisfies the triangle inequality d(c, x)+d(x, y)≥ d(c, y)
 The hamming distance between two n-tuple c and x is equal to the
hamming weight of the sum of c and x
d(c, x) = w( c+ x)
 For ex: The hamming distance between c=(11000110) and x=(00100100) is 4
and the weight of c + x = (11100010) is 4.
Contd…
 Minimum hamming distance dmin: It is defined as the smallest
distance between any pair of code vectors in the code.
For a given block code C, dmin is defined as:
dmin = min{ d(c, x) : c, x ∈ C, c ≠ x }

 For a linear code, the Hamming distance between two code vectors in C is equal to the Hamming weight of a third code vector in C, so
dmin = min{ w(c + x) : c, x ∈ C, c ≠ x }
     = min{ w(y) : y ∈ C, y ≠ 0 }
     = wmin
APPLICATIONS

 Communications:
 Satellite and deep space communications.
 Digital audio and video transmissions.

 Storage:
 Computer memory (RAM).
 Single error correcting and double error detecting code.
ADVANTAGES
 It is the easiest and simplest technique to detect and correct errors.
 Error probability is reduced.

DISADVANTAGES
 Transmission bandwidth requirement is more.
 Extra bits reduce the bit rate of the transmitter and also reduce its power.
Cyclic Codes
Definition: A code C is cyclic if
1) C is a linear code: c1 + c2 = m1G + m2G = (m1 + m2)G → a new code word in the code.
2) Any cyclic shift of a codeword is also a codeword, i.e. if c0 c1 c2 …. cn-2 cn-1 is a codeword, then cn-1 c0 c1 …. cn-3 cn-2 and c1 c2 c3 …. cn-1 c0 are also codewords.
Example
• (i) Code C = {000, 101, 011, 110} is cyclic.
(ii) Given the following code:
C = {00000, 10111, 01101, 11010}

Thus, it is not a cyclic code because the cyclic shift of [10111] is [11011], which is not a codeword.
Contd…
The code with the generator matrix

    [1 0 1 1 1 0 0]
G = [0 1 0 1 1 1 0]
    [0 0 1 0 1 1 1]

has codewords
c1 = 1011100   c2 = 0101110   c3 = 0010111
c1 + c2 = 1110010   c1 + c3 = 1001011   c2 + c3 = 0111001
c1 + c2 + c3 = 1100101
and it is cyclic because the right shifts have the following impacts:
c1 → c2,  c2 → c3,  c3 → c1 + c3
c1 + c2 → c2 + c3,  c1 + c3 → c1 + c2 + c3,  c2 + c3 → c1
Contd…
An (n, k) cyclic code can be generated by a polynomial g(x) which has degree n−k and is a factor of x^n − 1 (= x^n + 1 in modulo-2 arithmetic); g(x) is known as the generator polynomial.
Given the message bits (m_{k−1} … m1 m0), the code is generated simply as
C(x) = m(x) g(x)
In other words, C(x) can be considered as the product of m(x) and g(x).
Example: a (7, 4) cyclic code with g(x) = x^3 + x + 1; if m(x) = x^3 + 1, then C(x) = x^6 + x^4 + x + 1.
Generator Polynomials:
It is that polynomial which is able to generate all the codeword polynomials.
For (n, k) cyclic code
C = (C_{n−1}, C_{n−2}, …, C2, C1, C0)
In polynomial form
C(x) = C_{n−1} x^{n−1} + C_{n−2} x^{n−2} + ⋯ + C2 x^2 + C1 x + C0
Theorem: Let C be an (n, k) cyclic code. Then there exists only one polynomial g(x) of the minimum degree (n−k):
g(x) = g_{n−k} x^{n−k} + g_{n−k−1} x^{n−k−1} + ⋯ + g1 x + g0
Properties:
(i) g_{n−k} = g_0 = 1 (always)
Contd…
How can we find different codewords by using generator
polynomial

A (7,4) cyclic code with g(x) = x3 + x + 1.


The generator polynomial is itself a codeword, hence
C1(x) = x^3 + x + 1
Other codewords can be obtained as
x·g(x) = x^4 + x^2 + x = C2(x)
x^2·g(x) = x^5 + x^3 + x^2 = C3(x)
x^{k−1}·g(x) = x^6 + x^4 + x^3 = C_k(x)   (here k = 4)
Encoding of cyclic codes

C(x) = m(x) g(x) gives a non-systematic code.

Systematic Code
The codeword can be expressed by the data polynomial m(x) and the check polynomial cp(x) as
c(x) = m(x) x^{n−k} + cp(x)

where cp(x) is the remainder from dividing m(x) x^{n−k} by the generator g(x).


Decoding
Let m(x) be the data block and g(x) the polynomial divisor; we have
x^{n−k} m(x)/g(x) = q(x) + cp(x)/g(x)

The transmitted block is c(x) = x^{n−k} m(x) + cp(x)

If there are no errors, the division of c(x) by g(x) produces no remainder:
c(x)/g(x) = q(x)
If there are one or more bit errors, the received block c'(x) will be of the form c'(x) = c(x) + e(x), and the error pattern is detected from the known error syndromes, where s(x) is the remainder of e(x)/g(x).
The syndrome value s(x) depends only on the error bits.

Cyclic Code: Example
• Example : Find the codeword c(x) if m(x) = 1 + x + x2 and g(x) = 1 + x + x3, for (7, 4)
cyclic code
• We have n = total number of bits = 7, k = number of information bits = 4, r =
number of parity bits = n - k = 3
cp(x) = rem[ m(x) x^{n−k} / g(x) ]
      = rem[ (x^5 + x^4 + x^3) / (x^3 + x + 1) ]
      = x
Then,
c(x) = m(x) x^{n−k} + cp(x) = x + x^3 + x^4 + x^5
     = 0111010
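
The systematic encoding step (dividing m(x)·x^{n−k} by g(x) and appending the remainder) can be sketched in a few lines (assumed helper, coefficients listed highest degree first):

def poly_mod2_div(dividend, divisor):
    # long division of binary polynomials over GF(2); returns the remainder
    # padded to len(divisor) - 1 coefficients
    rem = list(dividend)
    for i in range(len(dividend) - len(divisor) + 1):
        if rem[i]:
            for j, d in enumerate(divisor):
                rem[i + j] ^= d
    return rem[-(len(divisor) - 1):]

g = [1, 0, 1, 1]                            # g(x) = x^3 + x + 1
m = [0, 1, 1, 1]                            # data block 0111, m(x) = x^2 + x + 1
parity = poly_mod2_div(m + [0, 0, 0], g)    # remainder of m(x) x^(n-k) / g(x)
codeword = m + parity                       # c(x) = m(x) x^(n-k) + cp(x)
print(parity)     # [0, 1, 0]              -> cp(x) = x
print(codeword)   # [0, 1, 1, 1, 0, 1, 0]  -> 0111010, matching the example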
Contd…

Example: Let m(x) = 1 + x + x^2 and g(x) = 1 + x + x^3, for the (7, 4) cyclic code.
Assume e(x) = 1000000. The received block is c'(x) = 1111010.
We have s(x) = rem[e(x)/g(x)] = x^2 + 1. Therefore s = 101. According to Table (b), this corresponds to the error pattern 1000000.
Now, suppose the received block is 0111011, i.e.
c'(x) = x^5 + x^4 + x^3 + x + 1. Find s(x) and the error pattern.
A Single-Error-Correcting (7,4) Cyclic Code

(a) Table of valid codewords

Data Block   Codeword
0000         0000000
0001         0001011
0010         0010110
0011         0011101
0100         0100111
0101         0101100
0110         0110001
0111         0111010
1000         1000101
1001         1001110
1010         1010011
1011         1011000
1100         1100010
1101         1101001
1110         1110100
1111         1111111

(b) Table of syndromes for single-bit errors

Error pattern E   Syndrome S
0000001           001
0000010           010
0000100           100
0001000           011
0010000           110
0100000           111
1000000           101
Bose–Chaudhuri–Hocquenghem (BCH) Codes

 For any pair of positive integers m ≥ 3 and t, an (n, k) BCH code has the parameters:
 Block length: n = 2^m – 1
 Number of check bits: n – k ≤ mt
 Minimum distance: dmin ≥ 2t + 1
 Up to t < (2^m – 1)/2 random errors can be detected and corrected.
 So it is also called a 't-error-correcting BCH code'.
 Major advantage is flexibility for block length and code rate.
Contd…
 Generator polynomial → specified in terms of its roots from the Galois field GF(2^m).
 g(x) has α, α^2, …, α^{2t} and their conjugates as its roots.
 We choose g(x) from the factors of x^n + 1, taking x^{n−k} as the highest-order term.
Contd…

 The parameters of some useful BCH codes are:

n    k    t    Generator Polynomial
7    4    1    1 011
15   11   1    10 011
15   7    2    111 010 001
15   5    3    10 100 110 111
31   26   1    100 101
31   21   2    11 101 101 001
31   16   3    1 000 111 110 101 111
31   11   5    101 100 010 011 011 010 101
31   6    7    11 001 011 011 110 101 000 100 111
BCH Encoder
 (15, 7) BCH Encoder.
 The 7 message bits (M0, M1….M6) are applied to the parallel to serial
shift register.
 The output of the parallel-to-serial shift register will be sent to the (15, 7) BCH encoder module.
 Using these message bits, parity bits are computed and sent to serial
to parallel shift register.
 Then parity bits are appended to original message bits to obtain 15 bit
encoded data.
 This entire encoding process requires 15 clock cycles.
BLOCK DIAGRAM

Block diagram of (15,7) BCH Encoder


BCH Decoder

 (15, 7) BCH decoder.


 The decoding algorithm for BCH codes consists of three major
steps.
 Calculate the syndrome values Si, i = 1, 2, …, 2t, from the received word r(x).
 Determine the error location polynomial s(x)
 Find the roots of s(x) and then correct the errors
BLOCK DIAGRAM

Block diagram for (15, 7) BCH


Decoder.
EXAMPLE

 For a (31,21,2) BCH code:


 Encoder: t = 2
 xⁿ + 1 = x³¹ + 1
   = (x+1)(x¹⁰+x⁹+x⁸+x⁶+x⁵+x³+1)(x²⁰+x¹⁷+x¹⁶+x¹³+x¹¹+x⁷+x⁶+x⁵+x²+x+1)
 Here the highest-order term of g(x) must be chosen as xⁿ⁻ᵏ = x³¹⁻²¹ = x¹⁰
 So g(x) = x¹⁰+x⁹+x⁸+x⁶+x⁵+x³+1
Contd…
 Message D: (0110011)
 Data: d(x)=x⁵+x⁴+x+1
 So code C(x)=d(x).g(x)
= (x⁵+x⁴+x+1)(x¹⁰+x⁹+x⁸+x⁶+x⁵+x³+1)
= x¹⁵+2x¹⁴+2x¹³+x¹²+2x¹¹+4x¹⁰+3x⁹+2x⁸
+2x⁷+2x⁶+2x⁵+2x⁴+x³+x+1
= x¹⁵+x¹²+ x⁹+ x³+x+1
 Codeword, C: (1001001000001011)
MERITS
 The principal advantage is the ease with which they can be decoded using ‘syndrome
decoding’ method.
 Allows very simple electronic hardware to perform the task, obviating the need for a
computer, and meaning that a decoding device may be made small and low-
powered.
 Low amount of redundancy
 Easy to implement in hardware
 Widely used
DEMERITS
 Complexity
 Iterative and complex decoding algorithm
 The decoder cannot decide whether a decoded block is erroneous or not.
Reed Solomon Codes
It is a subclass of non-binary BCH codes.
The encoder for an RS code differs from binary encoder in that it operates on
multiple bits rather than individual bits

Properties:
Block length: n = 2^m − 1 symbols
Message size: k symbols
Parity check size: (n − k) = 2t symbols
Minimum distance: dmin = 2t + 1
An (n, k) RS code encodes m-bit symbols; the number of redundant symbols is (n − k).
Contd…
Before data transmission, the encoder attaches parity symbols to the
data using a predetermined algorithm before transmission.
At the receiving side, the decoder detects and corrects a limited
predetermined number of errors occurred during transmission.
Transmitting the additional symbols introduced by FEC is better than retransmitting the whole package when at least one error has been detected by the receiver.
A Reed-Solomon code is a block code and can be specified as RS(n,k)
Contd…
[The original slides show the RS(n, k) codeword structure, the encoder and decoder block diagrams, and a list of advantages here.]
Convolutional Codes

 Convolutional codes were introduced by Elias in 1955.
 Convolutional coding is a popular error-correcting coding method used to improve the reliability of a communication system.
 A message is convolved with the code's generator sequences and then transmitted over a noisy channel.
 This convolution operation encodes some redundant information into the transmitted signal, thereby improving the reliability of the data carried over the channel.
 Convolutional codes are error-correcting codes used to reliably transmit digital data over an unreliable communication channel subject to channel noise.
Contd…

 The convolutional codes map information to code bits not block-wise, but by sequentially convolving the sequence of information bits according to some rule.
 Convolutional coding can be applied to a continuous data stream as well as to blocks of data, whereas block codes can be applied only to blocks of data.
Convolutional encoder

 Convolutional encoder is a finite state machine (FSM),


processing information bits in a serial manner.
 Convolutional encoding of data is accomplished using a shift
register and associated combinatorial logic that performs
modulo-two addition.
 A shift register is merely a chain of flip-flops wherein the
output of the nth flip-flop is tied to the input of the (n+1)th flip
flop.
 Every time the active edge of the clock occurs, the input to the
flip-flop is clocked through to the output, and thus the data are
shifted over one stage.
Contd…

In convolutional code the block of n code bits generated by the


encoder in a particular time instant depends not only on the block of k
message bits within that time instant but also on the block of data bits
within a previous span of N-1 time instants (N>1).
A convolutional code with constraint length N consists of an N-stage
shift register (SR) and ν modulo-2 adders.
Contd…

Fig (a) Convolutional Encoder with N=3 and ʋ=2


 The message bits are applied at the input of the shift register (SR). The
coded digit stream is obtained at the commutator output. The commutator
samples the ν modulo-2 adders in a sequence, once during each input-bit
interval.
Contd…
 Example: Assume that the input digits are 1010. Find the coded sequence output for the previous Fig (a).
 Initially, the shift register contents are s1 = s2 = s3 = 0.
 When the first message bit 1 enters the SR, s1 = 1, s2 = s3 = 0. Then ν1 = 1, ν2 = 1 and the coder output is 11.
 When the second message bit 0 enters the SR, s1 = 0, s2 = 1, s3 = 0. Then ν1 = 1, ν2 = 0 and the coder output is 10.
 When the third message bit 1 enters the SR, s1 = 1, s2 = 0, s3 = 1. Then ν1 = 0, ν2 = 0 and the coder output is 00.
 When the fourth message bit 0 enters the SR, s1 = 0, s2 = 1, s3 = 0. Then ν1 = 1, ν2 = 0 and the coder output is 10.
 The coded output sequence is: 11 10 00 10
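
The hand computation above can be reproduced with a small sketch of the N = 3, ν = 2 encoder of Fig (a) (assumed; g1 = 111 and g2 = 101, as given later in the generator representation):

def conv_encode(bits, g1=(1, 1, 1), g2=(1, 0, 1)):
    # rate-1/2 encoder: v1 = s1+s2+s3, v2 = s1+s3 (mod 2), s1 = newest bit
    s = [0, 0, 0]                      # shift register, initially cleared
    out = []
    for b in bits:
        s = [b] + s[:2]                # shift the new bit in
        v1 = sum(gi * si for gi, si in zip(g1, s)) % 2
        v2 = sum(gi * si for gi, si in zip(g2, s)) % 2
        out += [v1, v2]
    return out

print(conv_encode([1, 0, 1, 0]))   # [1, 1, 1, 0, 0, 0, 1, 0] -> 11 10 00 10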
Contd…
 PARAMETERS OF A CONVOLUTIONAL ENCODER
 Convolutional codes are commonly specified by three parameters:
(n,k,m):
n = number of output bits
k = number of input bits
m = number of memory registers
 Code Rate: The quantity k/n is called as code rate. It is a measure of the
efficiency of the code.
 Constraint Length: The quantity L(or K) is called the constraint length of the
code. It represents the number of bits in the encoder memory that affect the
generation of the n output bits. It is defined by
Constraint Length, L = k (m-1)
Encoder Representation
The encoder can be represented in several different
but equivalent ways. They are:

a) Generator Representation
b) State Diagram Representation
c) Tree Diagram Representation
d) Trellis Diagram Representation
Contd…
a) Generator Representation
 Generator representation shows the hardware connection of the shift
register taps to the modulo-2 adders. A generator vector represents the
position of the taps for an output. A “1” represents a connection and a “0”
represents no connection.
 (n, k, L) Convolutional code can be described by the generator
sequences that are the impulse response for each coder n output
branches.
 Generator sequences specify convolutional code completely by the
associated generator matrix.
 Encoded convolution code is produced by matrix multiplication of the input
and the generator matrix.
Contd…
 For example, the two generator vectors for the encoder in Fig (a) are g1
= [111] and g2 = [101], where the subscripts 1 and 2 denote the
corresponding output terminals.

b) State Diagram Representation


 In the state diagram, the state information of the encoder is shown in
the circles. Each new input information bit causes a transition from one
state to another.
 Contents of the rightmost (K-1) shift register stages define the states of
the encoder. The transition of an encoder from one state to another, as
caused by input bits, is depicted in the state diagram.
Contd…
 The path information between the states, denoted as x/c, represents
input information bit x and output encoded bits c.
 It is customary to begin convolutional encoding from the all zero state.
Example: State diagram representation of convolutional codes.

Fig (b): State diagram. Here k = 1, n = 2, K = 3.
Contd…
 From the state diagram
Let 00 State a ; 01 State b; 10 State c; 11 State d;
(1) State a goes to State a when the input is 0 and the output is 00
(2) State a goes to State b when the input is 1 and the output is 11
(3) State b goes to State c when the input is 0 and the output is 10
(4) State b goes to State d when the input is 1 and the output is 01
(5) State c goes to State a when the input is 0 and the output is 11
(6) State c goes to State b when the input is 1 and the output is 00
(7) State d goes to State c when the input is 0 and the output is 01
(8) State d goes to State d when the input is 1 and the output is 10
Contd…

c) Tree Diagram Representation


 The tree diagram representation shows all possible information and
encoded sequences for the convolutional encoder.
 In the tree diagram, a solid line represents input information bit 0 and a
dashed line represents input information bit 1.
 The corresponding output encoded bits are shown on the branches of
the tree.
 An input information sequence defines a specific path through the tree
diagram from left to right.
Contd…
Example: Tree Diagram representation of convolutional codes

Fig(c): Tree diagram


Contd…

 The tree diagram in Fig (c) tends to suggest that there are eight states in the last layer of the tree and that this will continue to grow. However, some states in the last layer (i.e. the stored data in the encoder) are equivalent, as indicated by the same letter on the tree (for example H and h).
 These pairs of states may be assumed to be equivalent because they have
the same internal state for the first two stages of the shift register and
therefore will behave exactly the same way to the receipt of a new (0 or
1) input data bit.
Contd…
d) Trellis Diagram Representation
 The trellis diagram is basically a redrawing of the state diagram. It shows all
possible state transitions at each time step.
 The trellis diagram is drawn by lining up all the possible states (2^L) in the vertical axis. Then we connect each state to the next state by the allowable codewords for that state.
 There are only two choices possible at each state. These are determined by the arrival of either a 0 or a 1 input bit.
 The arrows show the input bit and the output bits are shown in parentheses.
 The arrows going upwards represent a 0 bit and going downwards represent a 1
bit.
Contd…
Steps to construct trellis diagram
 It starts from scratch (all 0’s in the SR, i.e., state a) and makes transitions
corresponding to each input data digit.
 These transitions are denoted by a solid line for the next data digit 0 and
by a dashed line for the next data digit 1.
 Thus when the first input digit is 0, the encoder output is 00 (solid line)
 When the input digit is 1, the encoder output is 11 (dashed line).
 We continue this way for the second input digit and so on as depicted in
Fig (e) that follows.
Contd…
Example: Encoding of convolutional codes using
Trellis Representation
k=1, n=2, K=3 convolutional code

We begin in state 00 (Fig (d)):

Input Data: 0 1 0 1 1 0 0
Output:     00 11 10 00 01 01 11
Decoding of Convolutional Codes

 There are several different approaches to decoding of convolutional codes.


 These are grouped in two basic categories:
A. Sequential Decoding
-Fano Algorithm.
B. Maximum Likelihood Decoding
-Viterbi Algorithm.

Both methods represent different approaches.


Contd…

 Each node examined represents a path through part of the tree.


 The Fano algorithm can only operate over a code tree because it cannot examine path merging.
 At each decoding stage, the Fano-algorithm retains the information regarding
three paths:
-the current path,
-its immediate predecessor path,
-one of its successor paths.

 Based on this information, the Fano algorithm can move from the current
path to either its immediate predecessor path or the selected successor path.
Contd…

 It allows both the forward and backward movement through the


Trellis diagram flow.
 Example: Decoding using Sequential decoding-Fano algorithm

 Consider that the code 01 11 01 11 01 01 11 is received; the algorithm starts and tallies with the outputs it finds on the way. If an output does not tally, it retraces its position back to the previous ambiguous decision.
Contd…

A. Sequential Decoding – Fano Algorithm.

 It was one of the first methods proposed for decoding a convolutionally coded bit stream.
 It was first proposed by Wozencraft, and later a better version was proposed by Fano.
 Sequential decoding concentrates only on a certain number of likely codewords.
 The purpose of sequential decoding is to search through the nodes of the
code tree in an efficient way to find the maximum likelihood path.
Contd…

Fig (e): Decoding using sequential decoding (Fano algorithm)


Contd…
B. Maximum Likelihood Decoding – Viterbi Algorithm
 The Viterbi decoder examines the entire received sequence of a given length.
 It works on the maximum likelihood decoding rule, which tries to minimize the error between the detected sequence and the original transmitted sequence.
 A trellis diagram is constructed for the system and, based on the received sequence, the path is traced through the trellis level by level.
 If a situation arises in which there is no exact path for the corresponding sequence, Viterbi decoding helps to detect the best path based on the subsequent sequence.
Contd…

 The best path is termed the survivor.


Example: Maximum Likelihood decoding –Viterbi algorithm.

Fig: Maximum Likelihood decoding –Viterbi algorithm (Step1)


Contd…

Fig: Maximum Likelihood decoding –Viterbi algorithm (Step2)


Contd…

Fig: Maximum Likelihood decoding –Viterbi algorithm (Step3)


Contd…

Fig: Maximum Likelihood decoding –Viterbi algorithm (Step4)


Advantages of Convolutional Codes
 Convolution coding is a popular error-correcting coding method used in digital
communications.
 The convolution operation encodes some redundant information into the transmitted signal, thereby improving the reliability of transmission over the channel.
 Convolution Encoding with Viterbi decoding is a powerful FEC technique that is
particularly suited to a channel in which the transmitted signal is corrupted mainly
by AWGN.
 It is simple and has good performance with low implementation cost.
Thank you
