ECE403 - ITC - Unit 1 - PPT Notes

This document contains the Unit 1 lecture notes for an Information Theory and Coding course. The course teaches the fundamental concepts of information theory, various source and channel coding techniques, and their importance for efficient communication, across four units covering information theory, source coding algorithms, channel capacity, and error control coding.

Course Code: ECE403

Course Name: Information Theory & Coding


Course Objectives

• This course enables the learners to understand the fundamental concepts of information theory and the various types of communication channels and their capacity for data transfer

• The course also analyzes various source coding and channel coding techniques and their significance for efficient and reliable communication

Syllabus

UNIT I – INFORMATION THEORY and SOURCE CODING


Block Diagram of a Communication System – Fundamental Problems of Communication – Information
and Entropy – Properties of Entropy – Binary Memoryless Source – Extension to Discrete Memoryless
Source - Elements of Encoding – Properties of Code – Kraft-Macmillan Inequality – Code Length – Code
Efficiency – Source Coding Theorem – Source Coding Techniques – Shannon encoding - Shannon-Fano
Encoding - Huffman’s Encoding, Arithmetic Coding, Run-Length Encoding, Lempel-Ziv Encoding and
Decoding
UNIT II – NOISY CHANNEL CODING
Measure of Information for two dimensional discrete finite probability scheme – marginal, conditional
and joint entropies – Interpretation of different entropies for a two port communication system – Basic
relationships among different entropies – Discrete Memoryless Channel – Mutual Information –
Properties – Channel Capacity – Channel Classification – Channel Coding Theorem
Entropy in the continuous case – Definition and Properties – Capacity of a band-limited Gaussian channel
– Hartley-Shannon’s Law – Ideal System – Definition – Bandwidth Efficiency Diagram

UNIT III – BLOCK CODES, CYCLIC CODES and CONVOLUTIONAL CODES


Block Codes: Introduction – Hamming Code – Linear Block Codes – Syndrome decoding – Minimum
Distance Consideration
Cyclic Codes: Generator Polynomial – Parity-Check Polynomial – Encoder for Cyclic Codes –
Calculation of the Syndrome
Convolutional Codes: Convolutional Encoder Representations (State Diagram, Code trellis, Code tree) –
Viterbi decoding. Trellis Coded Modulation

UNIT IV – BCH, RS, LDPC and TURBO CODES


General Principles – Definition and Construction of Binary BCH Codes – Error Syndromes in Finite Fields – Decoding of SEC and DEC Binary BCH Codes – Error Location Polynomial – Peterson-Gorenstein-Zierler Decoder – Reed-Solomon Codes – Reed-Solomon Encoding and Decoding – Introduction to LDPC and Turbo Codes
References

• TEXTBOOKS:
• Bernard Sklar and Prabitra Kumar Ray, Digital Communications, 2nd Edition, Pearson Education,
2011
• Simon Haykin, Communication Systems, 5th Edition, John Wiley and Sons, 2010
• F.M.Reza, An introduction to information theory, McGraw Hill Inc., 1994
• REFERENCES:
• B.P.Lathi, Modern Digital and Analog Communication Systems, 4th Edition, Oxford University
Press, 2012
• Salvatore Gravano, Introduction to Error Control Codes, Oxford University Press, 2011
• R.P.Singh and S.D.Sapre, Communication Systems - Analog and Digital, 2nd Edition, Tata
McGraw Hill, 2008
• Peter Sweeney, Error Control Coding from Theory to Practice, 2nd Edition, Wiley, 2002
• ONLINE MATERIAL:
• NPTEL: http://www.youtube.com/watch?v=f8RvFlr5wRk

Learning Outcomes

• UNIT I:
• Remember the basic notions of information theory such as self-information, entropy and its types
• Implement and classify various source coding algorithms
• UNIT II:
• Analyse various types of communication channels and their channel capacity
• UNIT III:
• Design and interpret various types of error control codes such as linear block codes, cyclic codes, convolutional codes and trellis coded modulation
• UNIT IV:
• Design and interpret BCH codes and Reed-Solomon codes

Contents
• UNIT I – Information Theory and Source Coding
• Block diagram of a communication system
• Fundamental problems of communication
• Information and entropy
• Properties of entropy
• Binary memoryless source
• Extension to discrete memoryless source
• Elements of encoding
• Properties of code
• Kraft-Macmillan Inequality
• Code length
• Code Efficiency

• Source Coding Theorem
• Source Coding Techniques
• Shannon encoding
• Shannon-Fano encoding
• Huffman's encoding
• Arithmetic Coding
• Run-Length Encoding
• Lempel-Ziv encoding and decoding

Introduction

• Information Theory deals with the mathematical modeling and analysis of a communication system

• It determines the capacity of the system to transfer essential information from the source to the destination

• Information Theory provides the following fundamental limits:

1. The minimum number of bits/symbol required to completely specify the source
2. The maximum rate at which reliable communication can take place

Basic Block Diagram of a Communication System

• Transmitter: Converts the message signal produced by the source of information into a form suitable for transmission over the channel. As the transmitted signal propagates along the channel, it is distorted due to channel imperfections
• Channel: The physical medium that connects the transmitter and the receiver. Noise and interfering signals (originating from other sources) are added to the channel output, so the received signal is a corrupted version of the transmitted signal
• Receiver: Reconstructs a recognizable form of the original message signal for an end user or information sink

Block Diagram of Communication System

[Figure: block diagram – Transmitter → Channel → Receiver]
• The basic block representation of a communication system consists of:
• Transmitter
• Channel
• Receiver


• Information Source

• Binary stream of bits (0’s and 1’s)

• Source Encoder

• Compresses the data into the minimum number of bits in order to make effective use of the available bandwidth

• It is done by removing redundant information (bits)

• Channel Encoder

• Performs the process of error correction as the noise in the channel might alter the information

• It adds redundant bits, called error-correcting bits, to the transmitted data


• Modulator

• Facilitates the transmission of the signal over long distances

(e.g., ASK, FSK, PSK, QAM, QPSK)

• Channel

• Medium of data transfer

• Source of various types of noise

• Demodulator

• The received signal is demodulated to extract the original signal from the carrier


• Channel Decoder

• The distortions that occurred during transmission are corrected by the decoder

• Source Decoder:

• The source decoder recreates the source output

• Destination:

• Output at the receiver end of the communication system

Limitations of a Communication System

• Bandwidth

• Noise

• Equipment

• Minimum number of bits/symbol

• Channel capacity (R≤C)

Information

• Information encountered in a communication system is statistically defined

• Its most significant feature is unpredictability

• An information source is an object that produces an event, the outcome of which is selected at random according to a probability distribution

• An information source can either be with memory or memoryless

• A source with memory depends on previous information (symbols)

• A memoryless source produces symbols independent of previous symbols

• Information is a non-negative quantity (an important property)

Properties of Self information

• I(x_i) satisfies the following properties:

• If the receiver knows the message being transmitted, the information is zero:

I(x_i) = 0 for P(x_i) = 1

• The less probable a message, the more uncertain it is and the more information it carries:

I(x_i) > I(x_j) if P(x_i) < P(x_j)

• I(x_i, x_j) = I(x_i) + I(x_j), if x_i and x_j are independent

• If there are m = 2^N equally likely messages, then the amount of information carried by each message is N bits

Self Information

• Self-information measures the lack of information, i.e., the amount of uncertainty of an outcome

• The unit of self-information depends on the base of the logarithm chosen
• Consider a DMS denoted by X with alphabet {x_1, x_2, ..., x_m}:

I(x_i) = - log_b P(x_i)

Base b | I(x_i) | Unit of I(x_i) | Name of Sequence
2 | I(x_i) = - log2 P(x_i) | Bits | Binary
3 | I(x_i) = - log3 P(x_i) | Triples | Ternary
4 | I(x_i) = - log4 P(x_i) | Quadruples | Quaternary
10 | I(x_i) = - log10 P(x_i) | Decits or Hartleys | Decimal
e | I(x_i) = - loge P(x_i) | Nats | Natural
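A minimal sketch (not from the slides) showing how the same outcome probability maps to each unit by changing the logarithm base:

```python
import math

def self_information(p: float, base: float = 2) -> float:
    """Self-information I(x) = -log_base P(x) for an outcome of probability p."""
    return -math.log(p) / math.log(base)

p = 0.125
print(self_information(p, 2))        # 3.0   bits    (binary)
print(self_information(p, math.e))   # ~2.08 nats    (natural)
print(self_information(p, 10))       # ~0.90 decits  (Hartleys)
```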

Properties of Self information

• Unit of I(x_i):

• The unit of I(x_i) is the bit (binary unit) if b = 2

• Hartley or decit if b = 10

• nat (natural unit) if b = e

• It is standard to use b = 2 in digital communication; conversion between bases follows from

log2 a = ln a / ln 2 = log10 a / log10 2

Base Conversions

• log2 X = log10 X / log10 2 = log10 X / 0.3010

• log3 X = log10 X / log10 3 = log10 X / 0.4771

• log2 3 = log10 3 / log10 2 = 0.4771 / 0.3010 = 1.585

• log2 2 = log10 2 / log10 2 = 1

• log2 (1/P(A)) = 3.322 × log10 (1/P(A))

Problems on Self Information

• A source produces one of four possible symbols during each interval with probabilities P(X1) = 0.5, P(X2) = 0.25, P(X3) = P(X4) = 0.125. Obtain the information content of each of these symbols.

Soln., I(X) = log2 (1/P(X))

Therefore,
I(X1) = log2 (1/0.5) = 1 bit ; I(X2) = log2 (1/0.25) = 2 bits ; I(X3) = I(X4) = log2 (1/0.125) = 3 bits

• Calculate the amount of information if binary digits occur with equal likelihood in a binary PCM system.

Soln., Number of symbols in PCM = 2 (0 and 1)
i.e., P(X1) = P(X2) = 0.5
I(X1) = I(X2) = log2 (1/0.5) = 1 bit


• If the receiver knows the message being transmitted, the amount of information carried will be zero. PROVE THE STATEMENT

Soln., If the receiver knows the message transmitted, then P(X) = 1

I(X) = log2 (1/P(X)) = log2 (1/1) = 0

• A card is selected at random from a deck and found to be from a red suit. How much information is received? How much more information is needed to completely specify the card?

Soln., Information received: P(A1) = 26/52 = 0.5 ; I(A1) = log2 (1/0.5) = 1 bit
To specify the card completely: P(A2) = 1/52 = 0.019 ; I(A2) = log2 (1/0.019) = 5.7 bits

Therefore, the additional information needed is I(A2) - I(A1) = 4.7 bits



• A single TV picture can be thought of as an array of black, white and gray dots with roughly 500 rows and 600 columns. Suppose each of these dots may take on any one of 10 distinguishable levels. What is the amount of information provided by one picture?

Soln., Total number of dots = 500 × 600 = 300000

Total number of possible pictures = 10 × 10 × ... × 10 = 10^300000

Probability of one picture, P(A) = 1 / 10^300000

I(A) = log2 (10^300000) = 3 × 10^5 × log2 10

= 996578.43 bits = 9.965 × 10^5 bits ≈ 1 Mbit
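As a quick cross-check of the arithmetic above, a short sketch:

```python
import math

dots = 500 * 600                     # 300000 dots, 10 levels each
info_bits = dots * math.log2(10)     # I(A) = log2(10^300000)
print(f"{info_bits:.2f} bits")       # 996578.43 bits, i.e. roughly 1 Mbit
```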

Discrete Memoryless Source (DMS)

• If the emitted symbols are statistically independent, i.e., any symbol being produced does not depend upon the symbols that have already been produced, we say that the source has no memory and call it a Discrete Memoryless Source

• Information content of a DMS:

• The amount of information contained in a symbol x_r emitted by the DMS is closely related to the amount of uncertainty of that symbol

• A mathematical measure of information should be a function of the probability of the outcome and should satisfy the following axioms:

(a) Information should be proportional to the uncertainty of an outcome

(b) Information contained in independent outcomes should add up



• A discrete source emits a sequence of symbols from a fixed finite source alphabet,

{S} = {s_1, s_2, ..., s_q} [1]

• with probabilities of the emitted symbols,

{P} = {p_1, p_2, ..., p_q}, where Σ_{k=1}^q p_k = 1 [2]

• Suppose we consider a long sequence of n symbols, made up of n_1 symbols of type s_1, n_2 symbols of type s_2, ..., and n_q symbols of type s_q. The amount of information associated with each symbol of the source is given by,

I(s_k) = log2 (1/p_k) [3]


• The average information is given by,

H(S) = (1/n) Σ_{k=1}^q n_k · I(s_k) [4]

H(S) = Σ_{k=1}^q p_k · I(s_k) (since p_k = n_k / n) [5]

H(S) = Σ_{k=1}^q p_k log2 (1/p_k) bits/symbol [6]

H(S) is called as entropy or average information or average uncertainty

Entropy

• Entropy is a measure of the uncertainty in a random variable

• The entropy H, of a discrete random variable X is a measure of the amount of uncertainty


associated with the value of X

• Entropy is maximum when the uncertainty is maximum, i.e., when all the symbols of source X are equiprobable

• The quantity H(X) is called the entropy of source X


• It is a measure of the average information content per source symbol, given by

H(X) = E[I(x_i)] = Σ P(x_i) I(x_i) = - Σ P(x_i) log2 P(x_i)

• Its unit is bits/symbol

• Entropy for a binary source with equiprobable symbols:

H(X) = - (1/2) log2 (1/2) - (1/2) log2 (1/2) = 1 bit/symbol

• The source entropy H(X) satisfies the relation 0 ≤ H(X) ≤ log2 m, where m is the size of the alphabet of source X
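A small helper, offered as an illustrative sketch, that computes H(X) and confirms the binary-source value and the 0 ≤ H(X) ≤ log2 m bounds:

```python
import math

def entropy(probs) -> float:
    """H(X) = -sum p*log2(p); zero-probability symbols contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # 1.0 bit/symbol, the binary source above
print(entropy([0.25] * 4))    # 2.0 = log2(4), the maximum for m = 4
print(entropy([1.0]))         # 0.0, a certain outcome carries no information
```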

Properties of Entropy

• Continuity Property:

• If the probabilities of occurrence of the events X_k are slightly changed, the measure of uncertainty associated with the system varies accordingly in a continuous manner

p_k = P(X_k) ; 0 ≤ p_k ≤ 1

H(X) = - Σ_{k=1}^N p_k log2 p_k bits/symbol

• Since each p_k is continuous between the limits 0 and 1, H(X) also varies continuously

• Symmetry Property:

• H(X) is functionally symmetric in every p_k ; H(p_k, 1-p_k) = H(1-p_k, p_k) for all k = 1 to q

• As the entropy is a sum of weighted terms, its value remains the same when the positions of the probabilities are interchanged; the value of the entropy function is independent of the ordering of the probabilities

• Minimum value of H(X) is zero: 0 ≤ H(X) ≤ log2 N, where N is the total number of symbols

• Extremal Property:

• Entropy attains its maximum value when all the events are equally likely

• Additive Property:

H(p_1, p_2, ..., p_{N-1}, q_1, q_2, ..., q_m) = H(p_1, p_2, ..., p_N) + p_N · H(q_1/p_N, q_2/p_N, ..., q_m/p_N)

Logarithmic Inequalities

[Figure: the straight line Y = X - 1 and the curve Y = ln X plotted on the same coordinate axes]

• ln X ≤ (X - 1) ; equality iff X = 1

• Multiplying by -1: ln (1/X) ≥ (1 - X) ; equality iff X = 1

• The plot of ln X always lies below the straight line Y = X - 1; any point on the straight line lies above the log curve for any given value of X

• The straight line is tangent to the log function at X = 1

Proof for Extremal Property

• Statement: For a zero-memory information source with a q-symbol alphabet, the entropy becomes maximum if and only if all the source symbols are equally probable:

H(S)_max = log q ; if p_k = 1/q for all k = 1 to q

• Proof: Consider a memoryless source with q-symbol alphabet {S} = {s_1, s_2, ..., s_q} and probabilities {P} = {p_1, p_2, ..., p_q}. The entropy of the source is given by,

H(S) = Σ_{k=1}^q p_k log2 (1/p_k) [1]

Consider log q - H(S)

= 1 · log q - Σ_{k=1}^q p_k log2 (1/p_k) [2]


Using Σ_{k=1}^q p_k = 1,

= Σ_{k=1}^q p_k log q - Σ_{k=1}^q p_k log (1/p_k) [3]

⇒ log2 q - H(S) = Σ_{k=1}^q p_k log2 (q p_k) [4]

Changing the base to e (log2 X = log2 e · loge X),

log2 q - H(S) = log2 e [ Σ_{k=1}^q p_k ln (q p_k) ] [5]

• Apply the logarithmic inequality ln (1/X) ≥ (1 - X) [6] with

X = 1/(q p_k) :

log2 q - H(S) ≥ log2 e Σ_{k=1}^q p_k ( 1 - 1/(q p_k) ) [7]


Equality holds only if X = 1, i.e., q p_k = 1 ⇒ p_k = 1/q [8]

log2 q - H(S) ≥ log2 e ( Σ_{k=1}^q p_k - Σ_{k=1}^q 1/q ) = log2 e (1 - 1) = 0 [9]

log2 q - H(S) ≥ 0 [10]

H(S) ≤ log q

H(S)_max = log q, with equality iff q p_k = 1 for all k [11]

Proof for Additive Property

• Statement: Partitioning of symbols or events into sub-symbols or sub-events cannot decrease the entropy

• Proof: Consider a memoryless information source with q-symbol alphabet {S} = {s_1, s_2, ..., s_q} and associated probabilities {p_1, p_2, ..., p_q}. Suppose we split symbol s_q into m sub-symbols such that,

s_q = ∪_{j=1}^m s_qj ; P{s_qj} = p_qj

p_q = Σ_{j=1}^m p_qj

H(S) = H(p_1, ..., p_{q-1}, p_q1, p_q2, ..., p_qm) [1]

H(S) = Σ_{k=1}^{q-1} p_k log (1/p_k) + Σ_{j=1}^m p_qj log (1/p_qj) [2]

= Σ_{k=1}^q p_k log (1/p_k) - p_q log (1/p_q) + Σ_{j=1}^m p_qj log (1/p_qj) [3]


Since p_q = Σ_{j=1}^m p_qj ,

H(S) = Σ_{k=1}^q p_k log (1/p_k) - Σ_{j=1}^m p_qj log (1/p_q) + Σ_{j=1}^m p_qj log (1/p_qj) [4]

H(S) = Σ_{k=1}^q p_k log (1/p_k) + Σ_{j=1}^m p_qj ( - log (1/p_q) + log (1/p_qj) )

H(S) = Σ_{k=1}^q p_k log (1/p_k) + Σ_{j=1}^m p_qj log (p_q/p_qj) [5]


Multiplying and dividing the second term of the RHS by p_q ,

H(S) = Σ_{k=1}^q p_k log (1/p_k) + p_q Σ_{j=1}^m (p_qj/p_q) log (p_q/p_qj) [6]

H(S) = H(p_1, p_2, ..., p_q) + p_q H(p_q1/p_q, p_q2/p_q, ..., p_qm/p_q) [7]

Since the entropy functions are essentially non-negative, we have

H(p_1, p_2, ..., p_{q-1}, p_q1, p_q2, ..., p_qm) ≥ H(p_1, p_2, ..., p_q)

i.e., partitioning of symbols into sub-symbols cannot decrease entropy

Numerical Problem on Additive Property

• A sample space of events is shown with {P} = {1/5, 4/15, 8/15}. [Figure: sample space containing events A, B, C with M = B ∪ C] Evaluate
(i) the average uncertainty associated with the scheme,
(ii) the average uncertainty pertaining to the probability schemes [A, M = B ∪ C] and [B/M, C/M],
(iii) and verify the rule of additivity.

Soln., (i) H(S) = Σ_{k=1}^3 p_k log (1/p_k) = (1/5) log 5 + (4/15) log (15/4) + (8/15) log (15/8)

= 1.456 bits/symbol [1]

(ii) A , M = B ∪ C ; M is divided into the sub-symbols B and C

Average uncertainty, H[A, M = B ∪ C]:

P(M) = P(B ∪ C) = P(B) + P(C) = 4/15 + 8/15 = 12/15 = 4/5 = 0.8

• H(A, M) = H(1/5, 4/5) = (1/5) log 5 + (4/5) log (5/4)

= 0.46 + 0.2575 = 0.721

H[A, M = B ∪ C] = 0.721 bits/symbol

• H(B/M, C/M):

P(B/M) = P(B)/P(M) = (4/15)/(4/5) = 1/3 ; P(C/M) = P(C)/P(M) = (8/15)/(4/5) = 2/3

• H(B/M, C/M) = H(1/3, 2/3) = (1/3) log 3 + (2/3) log (3/2)

= 0.528 + 0.39 = 0.918 bits/symbol


(iii) Rule of Additivity:

H(S) = H(p_1, p_2, ..., p_q) + p_q H(p_q1/p_q, p_q2/p_q, ..., p_qm/p_q)

Here P = {P(A), P(M)}:

H(S) = H(P(A), P(M)) + P(M) H(P(B)/P(M), P(C)/P(M))

H(S) = 0.721 + (0.8)(0.918)

H(S) = 1.456 bits/symbol [2]

[1] = [2]

Hence, the rule of additivity is verified
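The rule can also be verified numerically; the sketch below (illustrative, not part of the original notes) reuses this problem's probabilities:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

pA, pB, pC = 1/5, 4/15, 8/15
pM = pB + pC                                            # M = B U C
lhs = entropy([pA, pB, pC])                             # full scheme
rhs = entropy([pA, pM]) + pM * entropy([pB/pM, pC/pM])  # partitioned scheme
print(lhs, rhs)                                         # both ~1.4566 bits/symbol
```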

Entropy of a Zero Memory Binary Source

• For a zero-memory binary source with source alphabet {S} = {0, 1} and probabilities {P} = {p, q}, p + q = 1:

H(S) = Σ_{k=1}^2 p_k log (1/p_k)
= p log (1/p) + q log (1/q)
= - p log p - q log q
= - p log p - (1-p) log (1-p)

[Figure: H(S) versus probability p, rising from 0 at p = 0 to a peak of 1 bit at p = 0.5 and falling back to 0 at p = 1]

• The sketch shows the variation of H(S) with probability. If the output of the source is certain, the source provides no information
• The maximum entropy of the source occurs when 0 and 1 are equally likely
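A short sketch that tabulates H(p) and shows the peak of 1 bit at p = 0.5:

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) = -p log2 p - (1-p) log2 (1-p), taken as 0 at p = 0 or p = 1."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"p = {p:.1f}   H(S) = {binary_entropy(p):.3f}")  # peaks at 1 bit, p = 0.5
```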

Information Rate

• Suppose two sources have equal entropies but one is faster than the other, producing more symbols per unit time. In a given period, more information will be transmitted by the faster source than by the other
• If the time rate at which X emits symbols is r_s (symbols/second), the information rate R of the source is given by,

R = r_s · H(X) bits/s [(symbols/second) × (information bits/symbol)]

where R is the information rate,
r_s is the symbol rate, and
H(X) is the entropy or average information

r_s = 1/τ symbols/sec, where τ is the average symbol duration:
τ = Σ_{k=1}^q p_k τ_k seconds/symbol, where τ_k is the duration of the k-th symbol
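An illustrative sketch of these formulas; the symbol durations used here are hypothetical (1 ms each), chosen only to exercise the calculation:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs     = [0.5, 0.25, 0.125, 0.0625, 0.0625]
durations = [1e-3] * 5                               # hypothetical: 1 ms per symbol

tau = sum(p * t for p, t in zip(probs, durations))   # average symbol duration
r_s = 1 / tau                                        # symbol rate, symbols/s
R   = r_s * entropy(probs)                           # information rate
print(R)                                             # 1875.0 bits/s
```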

Numerical Problem

• An event has 5 possible outcomes with probabilities 0.5, 0.25, 0.125, 0.0625 and 0.0625. Find the entropy of the system and also find the rate of information if there are 16 outcomes per second.

Soln., H(S) = - Σ P(s_i) log2 P(s_i) = - { (0.5 log2 0.5) + (0.25 log2 0.25) + (0.125 log2 0.125) + 2 × (0.0625 log2 0.0625) } = 1.875 bits/symbol

r_s = 16 outcomes/second

R = 16 × 1.875 = 30 bits/second

Numerical Problem

• A continuous signal is bandlimited to 5 kHz. The signal is quantized into 8 levels of a PCM system with probabilities 0.25, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05, 0.05. Calculate the entropy and rate of information.

Soln.,

H(S) = - Σ P(s_i) log2 P(s_i) = - { (0.25 log2 0.25) + 2 × (0.2 log2 0.2) + 2 × (0.1 log2 0.1) + 3 × (0.05 log2 0.05) } = 2.7412 bits/symbol

r_s = f_s = 2 × 5000 = 10000 samples/second (Nyquist rate)

R = 10000 × 2.7412 = 27412 bits/second

Extension of Discrete Memoryless Source (or)
Zero Memory Source

• It is useful to consider blocks rather than individual symbols, with each block consisting of n successive source symbols

• Each such block is produced by an extended source with source alphabet S^n that has k^n distinct blocks, where k is the number of symbols in the source alphabet of the original source

• In the case of a discrete memoryless source, the source symbols are statistically independent. Hence, the probability of a source symbol in S^n is the product of the probabilities of the n source symbols in S constituting that particular symbol of S^n, and

H(S^n) = n · H(S)

Numerical Problem (Contd.,)

• Consider a discrete memoryless source with source alphabet {S0, S1, S2} and {P} = {0.25, 0.25, 0.5}. Prove that H(S²) = 2 H(S).

Soln., k = 3 , n = 2
The extended source consists of 3² = 9 symbols:

Blocks: S0S0, S0S1, S0S2, S1S1, S1S2, S1S0, S2S2, S2S1, S2S0
Probability: 0.0625, 0.0625, 0.125, 0.0625, 0.125, 0.0625, 0.25, 0.125, 0.125

H(S) = - Σ P(s_i) log2 P(s_i) = - { 2 × (0.25 log2 0.25) + (0.5 log2 0.5) } = 1.5 bits/symbol

2 × H(S) = 3

H(S²) = - Σ P(s_i) log2 P(s_i) = - { 4 × (0.0625 log2 0.0625) + (0.25 log2 0.25) + 4 × (0.125 log2 0.125) } = 3 bits/symbol

Hence proved, H(S²) = 2 H(S)
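The identity can be checked numerically; a sketch using this problem's probabilities:

```python
import math
from itertools import product

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

p  = [0.25, 0.25, 0.5]                       # original source S
p2 = [a * b for a, b in product(p, p)]       # the 9 block probabilities of S^2
print(entropy(p))                            # 1.5
print(entropy(p2))                           # 3.0 = 2 * H(S)
```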

Encoding

• Encoding is the procedure for associating words constructed from a finite alphabet of a
language with given words of another language in a one to one manner

• Classification of Codes:
• Fixed Length Codes:

A fixed length code is defined as a code whose word length is fixed

E.g.: {𝑆0 , 𝑆1 , 𝑆2 , 𝑆3 } – {00, 01, 10, 11}

• Variable Length Codes:

A variable length code is defined as a code whose word length is not fixed

E.g.: {S0, S1, S2, S3} – {0, 01, 110, 111}

Classification of Codes

• Distinct Codes:

A distinct code is defined as one in which each codeword is distinguishable from the others

E.g.: {𝑆0 , 𝑆1 , 𝑆2 , 𝑆3 } – {0, 1, 00, 11}

• Uniquely Decipherable Encoding:

A code is said to be uniquely decipherable if any sequence of code word can be interpreted in only
one way

E.g.: {𝑆0 , 𝑆1 , 𝑆2 , 𝑆3 } – {0, 01, 010, 101} – Not Uniquely Decipherable

{𝑆0 , 𝑆1 , 𝑆2 , 𝑆3 } – {0, 10, 110, 111} – Uniquely Decipherable


• Prefix Free Codes:

A code in which no codeword can be formed by adding code symbols to another codeword is called a
prefix free code. In a Prefix-free code no codeword is a prefix of another
E.g.: {𝑆0 , 𝑆1 , 𝑆2 , 𝑆3 } – {0, 1, 10, 11} – Not Prefix Code

E.g.: {𝑆0 , 𝑆1 , 𝑆2 , 𝑆3 } – {0, 10, 110, 111} – Prefix Code

• Non Singular:

A block code is said to be non-singular, if all the code words of the word set are distinct

E.g.: {𝑆0 , 𝑆1 , 𝑆2 , 𝑆3 } – {0, 01, 10, 11}


• Instantaneous Code:

A code word having a property that no codeword is a prefix of another codeword is said to be
instantaneous

E.g.: {𝑆0 , 𝑆1 , 𝑆2 , 𝑆3 } – {0, 01, 011, 0111} – Not Instantaneous

{𝑆0 , 𝑆1 , 𝑆2 , 𝑆3 } – {0, 100, 101, 11} – Instantaneous

• Optimal Code:

An instantaneous code is said to be optimal if it has minimum average length for a source with the
given probability of assignment for the source symbol

Code Length

• Codeword Length:

• Let X be a DMS with finite entropy H(X) and an alphabet {𝑥1 , 𝑥2 , ……, 𝑥𝑚 } with corresponding
probabilities of occurrence P(𝑥𝑖 ) (i = 1, …. , m)

• Let the binary code word assigned to symbol xi by the encoder have length 𝑛𝑖 , measured in bits

• The length of the code word is the number of binary digits in the code word

Average Codeword Length

• Average Codeword Length:

• The average codeword length L, per source symbol, is given by

L = Σ_{i=1}^m p_i · n_i

• The parameter L represents the average number of bits per source symbol used in the
source coding process

Code Efficiency and Redundancy

• Code Efficiency:

Efficiency is defined as the ratio of the average information per symbol of the encoded language to the maximum possible average information per symbol:

η = H(S) / (L · log2 r) ; if r = 2, then η = H(S) / L

where, H(S) - Entropy

L - Average Length

• Redundancy:

Redundancy = 1 – Efficiency
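A small helper, offered as an illustrative sketch, that computes L, η and the redundancy; the example uses the code {0, 10, 110, 111} treated in the problem that follows:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def code_efficiency(probs, lengths, r=2):
    """Average length L, efficiency and redundancy of an r-ary code."""
    L = sum(p * n for p, n in zip(probs, lengths))
    eta = entropy(probs) / (L * math.log2(r))
    return L, eta, 1 - eta

print(code_efficiency([0.5, 0.25, 0.125, 0.125], [1, 2, 3, 3]))
# (1.75, 1.0, 0.0) -> 100% efficient, zero redundancy
```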

Numerical Problem

• Let us consider a source having four messages, S = {s0, s1, s2, s3} – {0, 10, 110, 111} with probabilities 0.5, 0.25, 0.125, 0.125. Calculate the efficiency and redundancy of the code.

Soln.,

Symbol | Prob. | Code
S0 | 0.5 | 0
S1 | 0.25 | 10
S2 | 0.125 | 110
S3 | 0.125 | 111

• H(S) = - Σ P(s_i) log2 P(s_i) = - { (0.5 log2 0.5) + (0.25 log2 0.25) + 2 × (0.125 log2 0.125) } = 1.75 bits/symbol
• L = (1 × 0.5) + (2 × 0.25) + 2 × (3 × 0.125) = 1.75 bits
• Efficiency = H(S)/L = 100%
• Redundancy = 0

Numerical Problem

• Let us consider a source having four messages, S = {s0, s1, s2, s3} – {0, 10, 110, 111} with probabilities 1/3, 1/3, 1/6, 1/6. Calculate the efficiency and redundancy of the code.

Soln.,

Symbol | Prob. | Code
S0 | 1/3 | 0
S1 | 1/3 | 10
S2 | 1/6 | 110
S3 | 1/6 | 111

• H(S) = - Σ P(s_i) log2 P(s_i) = - { 2 × (1/3 log2 1/3) + 2 × (1/6 log2 1/6) } = 1.918 bits/symbol
• L = (1 × 1/3) + (2 × 1/3) + 2 × (3 × 1/6) = 2 bits
• Efficiency = H(S)/L = 95.9%
• Redundancy = 4.1%

Numerical Problem

• The output of a discrete source {x} = {x1, x2, ..., x6} with probabilities {P} = {2^-1, 2^-2, 2^-4, 2^-4, 2^-4, 2^-4} is encoded in the following ways:
(i) Determine which of these codes are uniquely decodable
(ii) Determine which of these codes have the prefix property
(iii) Find the average length of each uniquely decodable code

(codeword lengths in parentheses)
Symbol | C1 | C2 | C3 | C4 | C5 | C6
x1 | 0 (1) | 1 (1) | 0 (1) | 111 (3) | 1 (1) | 0 (1)
x2 | 10 (2) | 011 (3) | 10 (2) | 110 (3) | 01 (2) | 01 (2)
x3 | 110 (3) | 010 (3) | 110 (3) | 101 (3) | 0011 (4) | 011 (3)
x4 | 1110 (4) | 001 (3) | 1110 (4) | 100 (3) | 0010 (4) | 0111 (4)
x5 | 1011 (4) | 000 (3) | 11110 (5) | 011 (3) | 0001 (4) | 01111 (5)
x6 | 1101 (4) | 110 (3) | 111110 (6) | 010 (3) | 0000 (4) | 011111 (6)

Numerical Problem (Contd.,)

Property | C1 | C2 | C3 | C4 | C5 | C6
Instantaneous | No | No | Yes | Yes | Yes | No
Uniquely decodable | No | No | Yes | Yes | Yes | Yes

Average lengths of the uniquely decodable codes:

L3 = (1 × 2^-1) + (2 × 2^-2) + (3 × 2^-4) + (4 × 2^-4) + (5 × 2^-4) + (6 × 2^-4) = 2.125 bits/symbol
L4 = (3 × 2^-1) + (3 × 2^-2) + (3 × 2^-4) + (3 × 2^-4) + (3 × 2^-4) + (3 × 2^-4) = 3 bits/symbol
L5 = (1 × 2^-1) + (2 × 2^-2) + (4 × 2^-4) + (4 × 2^-4) + (4 × 2^-4) + (4 × 2^-4) = 2 bits/symbol
L6 = (1 × 2^-1) + (2 × 2^-2) + (3 × 2^-4) + (4 × 2^-4) + (5 × 2^-4) + (6 × 2^-4) = 2.125 bits/symbol

Kraft Inequality

• Given a source {S} = {s_1, s_2, ..., s_q}, let the word lengths of the codes corresponding to these symbols be {l_1, l_2, ..., l_q} and let the code alphabet be {x} = {x_1, x_2, ..., x_r}. Then an instantaneous code for the source exists if and only if

Σ_{k=1}^q r^(-l_k) ≤ 1 [1]

Proof: Let us assume that the word lengths are arranged in ascending order:

l_1 ≤ l_2 ≤ ... ≤ l_q [2]

Since the code alphabet has only r symbols, we can have at most r instantaneously decodable sequences of length 1 that satisfy the prefix property


• Let n_k be the actual number of messages encoded into codewords of length k; then

n_1 ≤ r [3]

• The number of actual instantaneous codes of word length 2 must obey the rule

n_2 ≤ (r - n_1) · r , i.e., n_2 ≤ r² - n_1 r [4]

• since the first symbol can only be one of the (r - n_1) symbols not used in forming the codewords of length 1, and the second symbol of the sequence can be any one of the r code alphabet symbols


• Similarly, the actual number of codes of length 3 that are distinguishable from each other and from the n_1 and n_2 shorter words must obey

n_3 ≤ ((r - n_1) r - n_2) r

n_3 ≤ r³ - n_1 r² - n_2 r [5]

• The first two symbols may be selected in (r - n_1) r - n_2 ways and the third symbol in r ways; in general we can write,

n_k ≤ r^k - n_1 r^(k-1) - n_2 r^(k-2) - ... - n_{k-1} r [6]


Multiplying [6] by r^(-k) and rewriting,

n_k r^(-k) + n_{k-1} r^(-(k-1)) + ... + n_1 r^(-1) ≤ 1

Σ_{j=1}^k n_j r^(-j) ≤ 1 (equivalently, Σ_{j=1}^m W_j D^(-j) ≤ 1) [7]

Σ_{j=1}^k n_j r^(-j) = (r^(-1) + ... + r^(-1), n_1 times) + (r^(-2) + ... + r^(-2), n_2 times) + ... + (r^(-k) + ... + r^(-k), n_k times) [8]

= Σ_{j=1}^{n_1} r^(-1) + Σ_{j=1}^{n_2} r^(-2) + ... + Σ_{j=1}^{n_k} r^(-k) [9]

Since n_1 + n_2 + ... + n_k = q, this gives

Σ_{k=1}^q r^(-l_k) ≤ 1 [10]


• This inequality only tells us whether an instantaneous code exists; it does not show how to construct the code, nor does it guarantee that any code whose word lengths satisfy the inequality is itself instantaneous
• A symbol code is encoded into the binary codes shown below. Which of these are instantaneous?

Source Symbol | Code A | Code B | Code C | Code D | Code E
S1 | 00 | 0 | 0 | 0 | 0
S2 | 01 | 10000 | 10 | 1000 | 10
S3 | 10 | 1100 | 110 | 1110 | 110
S4 | 110 | 1110 | 1110 | 111 | 1110
S5 | 1110 | 1101 | 11110 | 1011 | 11110
S6 | 1111 | 1111 | 11111 | 1100 | 1111

Soln., Using the Kraft inequality with r = 2:

Code A: 3 × 2^-2 + 2^-3 + 2 × 2^-4 = 1
Code B: 2^-1 + 2^-5 + 4 × 2^-4 = 0.78125
Code C: 2^-1 + 2^-2 + 2^-3 + 2^-4 + 2 × 2^-5 = 1
Code D: 2^-1 + 2^-3 + 4 × 2^-4 = 0.875
Code E: 2^-1 + 2^-2 + 2^-3 + 2 × 2^-4 + 2^-5 = 1.031

Code E violates the inequality (the sum exceeds 1), hence it cannot be instantaneous
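The Kraft sums above can be reproduced with a few lines; a sketch:

```python
def kraft_sum(lengths, r=2):
    """Sum of r^(-l) over the codeword lengths; <= 1 is required of a prefix code."""
    return sum(r ** -l for l in lengths)

codeword_lengths = {
    "A": [2, 2, 2, 3, 4, 4],
    "B": [1, 5, 4, 4, 4, 4],
    "C": [1, 2, 3, 4, 5, 5],
    "D": [1, 4, 4, 3, 4, 4],
    "E": [1, 2, 3, 4, 5, 4],
}
for name, lengths in codeword_lengths.items():
    s = kraft_sum(lengths)
    print(f"Code {name}: {s:.5f}", "satisfies" if s <= 1 else "violates", "Kraft")
```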

Kraft Macmillan Inequality

• The Kraft inequality applies to prefix codes, which are special cases of uniquely decodable codes. The same inequality is necessary for uniquely decodable codes, as proved by Macmillan

• Statement: The Kraft-Macmillan inequality states that we can construct a uniquely decodable code with word lengths l_1, l_2, ..., l_q iff these lengths satisfy the condition

Σ_{k=1}^q r^(-l_k) ≤ 1 [1]

where r is the number of symbols in the code alphabet


• Proof: Consider the quantity

( Σ_{k=1}^q r^(-l_k) )^n = ( r^(-l_1) + r^(-l_2) + r^(-l_3) + ... + r^(-l_q) )^n [2]

Expanding [2], we have q^n terms, each of the form r^(-l_k1) · r^(-l_k2) ··· r^(-l_kn) = r^(-j), where l_k1 + l_k2 + ... + l_kn = j

Suppose l is the maximum word length of the codes; then j takes values from n to nl

Let N_j be the number of terms of the form r^(-j); then Eqn. [2] can be written as

( Σ_{k=1}^q r^(-l_k) )^n = Σ_{j=n}^{nl} N_j r^(-j) [3]


• N_j is also the number of strings of n codewords that can be formed so that each string has a length of exactly j symbols. If the code is uniquely decodable, then N_j ≤ r^j, the number of distinct r-ary code sequences of length j:

( Σ_{k=1}^q r^(-l_k) )^n ≤ Σ_{j=n}^{nl} r^j r^(-j) [6]

≤ nl - n + 1 [7]

For a long sequence, ( Σ_{k=1}^q r^(-l_k) )^n ≤ nl [8]


Taking the n-th root on both sides of the inequality,

Σ_{k=1}^q r^(-l_k) ≤ (nl)^(1/n) for all n [9]

If the sum were some x > 1, then x^n would eventually exceed nl for large n, contradicting [8]; moreover

lim_{n→∞} (nl)^(1/n) = 1

Since Eqn. [9] holds for every integer n, it follows that

Σ_{k=1}^q r^(-l_k) ≤ 1 [10]

Numerical Problem

• Let {X} = {x1, x2, ..., x7}. After encoding we get a set of messages with lengths n1 = 2, n2 = 2, n3 = 3, n4 = 3, n5 = 3, n6 = 4, n7 = 5, i.e., [n_i] = [2, 2, 3, 3, 3, 4, 5], so the number of words of each length j is [W_j] = [0, 2, 3, 1, 1, 0, 0]. Prove that Σ_{i=1}^N D^(-n_i) = Σ_{j=1}^m W_j D^(-j)

Soln.,

LHS: Σ_{i=1}^N D^(-n_i) = D^-2 + D^-2 + D^-3 + D^-3 + D^-3 + D^-4 + D^-5

= 2 D^-2 + 3 D^-3 + D^-4 + D^-5 [1]

RHS: Σ_{j=1}^m W_j D^(-j) = 2 D^-2 + 3 D^-3 + D^-4 + D^-5 [2]

LHS = RHS

Numerical Problem

• Find the smallest number of letters in the code alphabet D for devising a code with the prefix property such that [W] = [0, 3, 0, 5]. Devise such a code.

Soln., Σ_{j=1}^m W_j D^(-j) ≤ 1

Therefore, 0·D^-1 + 3·D^-2 + 0·D^-3 + 5·D^-4 ≤ 1

3 D^-2 + 5 D^-4 ≤ 1
3/D² + 5/D⁴ ≤ 1
3 D² + 5 ≤ D⁴
D⁴ - 3 D² - 5 ≥ 0
Substituting t = D²: (t - 4.19)(t + 1.19) ≥ 0
D² ≥ 4.19
D ≥ 2.04


When D = 3, {X} = {0, 1, 2}

Total number of codewords = 0 + 3 + 0 + 5 = 8
Number of codewords of each length:
length 1: 0 ; length 2: 3 ; length 3: 0 ; length 4: 5
Conditions to devise the code: (i) D = 3
(ii) 3 codewords of length 2 and 5 codewords of length 4
(iii) the prefix property
One such way of constructing the code:
m1 = 00 ; m2 = 01 ; m3 = 10 ; m4 = 2000
m5 = 2010 ; m6 = 2200 ; m7 = 2202 ; m8 = 2222

Numerical Problem

• Show all possible sets of binary codes with the prefix property for encoding the messages m1, m2, m3 in words not more than 3 digits long.

Soln.,

• Σ_{j=1}^m W_j D^(-j) ≤ 1, with D = 2:

W1·2^-1 + W2·2^-2 + W3·2^-3 ≤ 1

• W1 + W2 + W3 = 3

• Possible sets:

W1 | W2 | W3
1 | 1 | 1
1 | 2 | 0
1 | 0 | 2
0 | 3 | 0
0 | 0 | 3
0 | 2 | 1
0 | 1 | 2

Shannon’s source coding Theorem (or)
Noiseless Coding Theorem

• In information theory, Shannon's noiseless coding theorem places upper and lower limits on the minimum possible expected length of the codewords as a function of the entropy of the source and the size of the code alphabet

• Statement: Let S be a zero-memory source with q symbols {S} = {s_1, s_2, ..., s_q} and symbol probabilities {P} = {p_1, p_2, ..., p_q}. If the S ensemble is encoded into a sequence of uniquely decodable characters taken from a code alphabet of r symbols, then

H(S)/log r ≤ L < H(S)/log r + 1 [1]


• Proof: Consider a zero-memory source with q symbols {S} = {s_1, s_2, ..., s_q} and symbol probabilities {P} = {p_1, p_2, ..., p_q}. Let us encode the symbols into r-ary codes with word lengths l_1, l_2, ..., l_q; we shall find a lower bound for the average length of the codewords.

Let Q_1, Q_2, ..., Q_q be any set of numbers such that Q_k ≥ 0 and Σ_{k=1}^q Q_k = 1 [2]

Consider the quantity H(S) - Σ_{k=1}^q p_k log (1/Q_k) [3]

= Σ_{k=1}^q p_k log (1/p_k) - Σ_{k=1}^q p_k log (1/Q_k) = Σ_{k=1}^q p_k log (Q_k/p_k) [4]

• Changing the base and applying the log inequality (log_b X = log_b e · ln X ; ln X ≤ X - 1):

= log2 e · Σ_{k=1}^q p_k ln (Q_k/p_k) [5]

≤ log2 e · Σ_{k=1}^q p_k (Q_k/p_k - 1) [6]

= log2 e [ Σ_{k=1}^q Q_k - Σ_{k=1}^q p_k ] = 0 [7]

H(S) - Σ_{k=1}^q p_k log (1/Q_k) ≤ 0 [8]

The equality holds iff Q_k = p_k

Eqn. [8] is valid for any set of numbers Q_k that are non-negative and sum to unity. We may choose

Q_k = r^(-l_k) / Σ_{k=1}^q r^(-l_k) [9]

Applying Eqn. [9] in [8],

H(S) ≤ Σ_{k=1}^q p_k log ( r^(l_k) Σ_{k=1}^q r^(-l_k) ) [10]

H(S) ≤ Σ_{k=1}^q p_k [ l_k log r + log Σ_{k=1}^q r^(-l_k) ] [11]

H(S) ≤ log r · Σ_{k=1}^q p_k l_k + log Σ_{k=1}^q r^(-l_k) [12]

With L = Σ_{k=1}^q p_k l_k (the average codeword length),

H(S) ≤ L log r + log Σ_{k=1}^q r^(-l_k) [13]

By Kraft's inequality the second term in [13] is either negative or at most zero, so

H(S) ≤ L log r [14]

L ≥ H(S)/log r [15]


• Code Efficiency Derivation:

The lower bound is achieved when (i) Σ_{k=1}^q r^(-l_k) = 1 and (ii) p_k = r^(-l_k) [16]

For equality we choose l_k = log_r (1/p_k), i.e., log_r (1/p_k) = log2 (1/p_k) / log2 r [17]

l_k must be an integer for all k = 1 to q

Eqn. [15] gives the lower bound on L, the average word length of the code, expressed in code bits per source symbol

• Each codeword must contain an integer number of code symbols. The problem is what to select for l_k, the number of code symbols in codeword k corresponding to source symbol s_k, when the quantity in Eqn. [17] is not an integer

• Suppose we choose l_k to be the smallest integer greater than or equal to log_r (1/p_k):

log_r (1/p_k) ≤ l_k < log_r (1/p_k) + 1 [18]

Eqn. [18] satisfies Kraft's inequality:

1/p_k ≤ r^(l_k) ⇒ p_k ≥ r^(-l_k) [19]

Σ_{k=1}^q p_k = 1 ≥ Σ_{k=1}^q r^(-l_k) [20]

We know that log_r (1/p_k) = log2 (1/p_k) / log2 r, so Eqn. [18] becomes

log (1/p_k) / log r ≤ l_k < log (1/p_k) / log r + 1 [21]


Multiplying Eqn. [21] by p_k and summing over all k,

Σ_{k=1}^q p_k log (1/p_k) / log r ≤ Σ_{k=1}^q p_k l_k < Σ_{k=1}^q p_k log (1/p_k) / log r + Σ_{k=1}^q p_k [22]

H(S)/log r ≤ L < H(S)/log r + 1 [23]

For binary codes, r = 2:

H(S) ≤ L < H(S) + 1 [24]

• To obtain better efficiency, we can use the n-th extension of the source S
Eqn. [23] is valid for any zero-memory source, so it is also valid for S^n with average length L_n:

H(S^n)/log r ≤ L_n < H(S^n)/log r + 1 [25]

We know that H(S^n) = n · H(S), so

n·H(S)/log r ≤ L_n < n·H(S)/log r + 1

⇒ H(S)/log r ≤ L_n/n < H(S)/log r + 1/n [26]

For binary codes, H(S) ≤ L_n/n < H(S) + 1/n [27]

lim_{n→∞} L_n/n = H(S)/log r (the lower and upper bounds converge)

Source Coding Techniques

• Source Coding Techniques:

1. Shannon Encoding

2. Shannon Fano Encoding

3. Huffman’s Encoding

4. Arithmetic Coding

5. Run-Length Encoding

6. Lempel-Ziv Encoding and Decoding

Shannon Encoding Procedure

• STEP 1: List the source symbols {S} = {s_1, s_2, ..., s_q} in order of decreasing probability of occurrence, {P} = {p_1, p_2, ..., p_q} with p_1 > p_2 > ... > p_q
• STEP 2: Compute the sequence
α_1 = 0
α_2 = p_1
α_3 = p_1 + p_2
...
α_{q+1} = p_1 + p_2 + ... + p_q
• STEP 3: Determine the set of integers l_k which are the smallest integer solutions of the inequality
2^(l_k) · p_k ≥ 1 , k = 1, 2, ..., q
• STEP 4: Expand the decimal number α_k in binary form to l_k places, neglecting the expansion beyond l_k digits
• STEP 5: Removing the binary points results in the desired code
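A sketch of the procedure in code (binary case only, assuming probabilities are given in decreasing order); it reproduces the codeword table of the worked example that follows:

```python
import math

def shannon_encode(probs):
    """Shannon's binary encoding for probabilities listed in decreasing order."""
    codes, alpha = [], 0.0
    for p in probs:
        l = math.ceil(-math.log2(p))       # smallest integer l with 2^l * p >= 1
        bits, frac = "", alpha
        for _ in range(l):                 # binary expansion of alpha to l places
            frac *= 2
            bits += "1" if frac >= 1 else "0"
            frac -= int(frac)
        codes.append(bits)
        alpha += p                         # next cumulative probability
    return codes

print(shannon_encode([0.4, 0.3, 0.2, 0.1]))   # ['00', '01', '101', '1110']
```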
Numerical Problem on Shannon Encoding Procedure

• Consider the ensemble {S} = {s1, s2, s3, s4} with {P} = {0.4, 0.3, 0.2, 0.1}. Encode the symbols using Shannon's binary encoding procedure and calculate the efficiency and redundancy of the code.
Soln.,
• Arrange the probabilities: p1 > p2 > p3 > p4 ⇒ 0.4 > 0.3 > 0.2 > 0.1
• Compute the sequence α:
α1 = 0
α2 = p1 = 0.4
α3 = p1 + p2 = 0.7
α4 = p1 + p2 + p3 = 0.9
α5 = p1 + p2 + p3 + p4 = 1.0
• Find l_k from 2^(l_k) · p_k ≥ 1:
2^(l1) · p1 ≥ 1 ; l1 ≥ 1.322 ⇒ l1 = 2
2^(l2) · p2 ≥ 1 ; l2 ≥ 1.737 ⇒ l2 = 2
2^(l3) · p3 ≥ 1 ; l3 ≥ 2.322 ⇒ l3 = 3
2^(l4) · p4 ≥ 1 ; l4 ≥ 3.322 ⇒ l4 = 4


• Expand α into binary form:

α1 = 0 = (0.0000)₂
α2 = 0.4 = (0.0110...)₂
α3 = 0.7 = (0.10110...)₂
α4 = 0.9 = (0.11100...)₂
α5 = 1.0 = (1.0)₂

• Truncating each α_k to l_k binary places and removing the binary point:

Symbol | Code | Length
S1 | 00 | 2
S2 | 01 | 2
S3 | 101 | 3
S4 | 1110 | 4

• H(S) = - Σ P(s_i) log2 P(s_i) = - { (0.4 log2 0.4) + (0.3 log2 0.3) + (0.2 log2 0.2) + (0.1 log2 0.1) } = 1.846 bits/symbol

• L = Σ p_k · l_k = (2 × 0.4) + (2 × 0.3) + (3 × 0.2) + (4 × 0.1) = 2.4 bits

• Efficiency = H(S)/L = 76.9%

• Redundancy = 23.1%

Shannon Fano Encoding Procedure

• STEP 1: List the source symbols in order of decreasing probability

• STEP 2: Partition this ensemble into two groups that are as nearly equiprobable as possible (r groups for r-ary coding)

• STEP 3: Assign '0' to one group and '1' to the other group (one code symbol from the code alphabet to each group). This forms the starting code symbol of the codewords

• STEP 4: Repeat steps 2 and 3 on each of the sub-groups until each sub-group contains only one source symbol, to determine the succeeding code symbols of the codewords
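A recursive sketch of the binary procedure; the split rule here simply picks the partition point that makes the two groups most nearly equiprobable, which is one reasonable reading of STEP 2:

```python
def shannon_fano(symbols):
    """Binary Shannon-Fano coding; symbols is a list of (name, probability)
    already sorted by decreasing probability."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total, running, best, split = sum(p for _, p in symbols), 0.0, float("inf"), 1
    for i in range(1, len(symbols)):       # most nearly equiprobable partition
        running += symbols[i - 1][1]
        if abs(total - 2 * running) < best:
            best, split = abs(total - 2 * running), i
    codes = {}
    for bit, group in (("0", symbols[:split]), ("1", symbols[split:])):
        for name, suffix in shannon_fano(group).items():
            codes[name] = bit + suffix
    return codes

probs = [1/4, 1/4, 1/8, 1/8, 1/16, 1/16, 1/16, 1/16]
print(shannon_fano([(f"s{i+1}", p) for i, p in enumerate(probs)]))
# {'s1': '00', 's2': '01', 's3': '100', ..., 's8': '1111'}
```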

Problem on Shannon Fano Encoding

• Consider the message ensemble {S} = {s1, s2, ..., s8} with {P} = {1/4, 1/4, 1/8, 1/8, 1/16, 1/16, 1/16, 1/16} and {X} = {0, 1}. Construct a binary code using the Shannon-Fano encoding procedure. Calculate η and E_c.
Soln.,
Repeatedly splitting the list into equiprobable halves, assigning 0 to the upper group and 1 to the lower group, gives:

Symbol | Prob. | Code (Length)
s1 | 1/4 | 00 (2)
s2 | 1/4 | 01 (2)
s3 | 1/8 | 100 (3)
s4 | 1/8 | 101 (3)
s5 | 1/16 | 1100 (4)
s6 | 1/16 | 1101 (4)
s7 | 1/16 | 1110 (4)
s8 | 1/16 | 1111 (4)

Numerical Problem on Shannon Fano Encoding

• Entropy, H(S) = - Σ P(s_i) log2 P(s_i)

= - { 2 × (0.25 log2 0.25) + 2 × (0.125 log2 0.125) + 4 × (0.0625 log2 0.0625) }
= 2.75 bits/symbol

• Length, L = 2 × (2 × 0.25) + 2 × (3 × 0.125) + 4 × (4 × 0.0625)

= 2.75 bits/symbol

• Efficiency, η = H(S)/L = 100%

• Redundancy = 0

Numerical Problem

• Construct a trinary code for symbols with {P} = {0.3, 0.3, 0.09, 0.09, 0.09, 0.09, 0.04} and {X} = {0, 1, 2} using the Shannon-Fano encoding procedure.

Soln., Partitioning into three nearly equiprobable groups at each stage gives:

Symbol | Prob. | Code | Length
s1 | 0.30 | 0 | 1
s2 | 0.30 | 1 | 1
s3 | 0.09 | 20 | 2
s4 | 0.09 | 21 | 2
s5 | 0.09 | 220 | 3
s6 | 0.09 | 221 | 3
s7 | 0.04 | 222 | 3

• H(S) = 2.477 bits/symbol
• L = 1.62 trinits/symbol
• Efficiency = H(S) / (L · log2 r) = 2.477 / (1.62 × log2 3) = 96.5% (r = 3)
• Redundancy = 3.5%
Huffman Encoding Procedure

• Huffman encoding produces an optimal code

• A code with minimum average length L is more efficient and has minimum redundancy associated with it

• A compact code is one which achieves this objective. Huffman suggested a simple method that guarantees an optimal code

• Procedure:

• STEP 1: List the source symbols in decreasing order of probabilities

• STEP 2: Check whether q = r + α(r-1) is satisfied for an integer α. Otherwise add a suitable number of dummy symbols with zero probability of occurrence to satisfy the equation (this step is not needed for binary codes)


• STEP 3: Combine the last r symbols into a single composite symbol whose probability of occurrence equals the sum of the probabilities of occurrence of the r symbols involved in this step

• STEP 4: Record the new list of symbols, again in order of decreasing probability

• STEP 5: Repeat steps 3 and 4 on the resulting set of symbols until in the final step exactly r symbols are left

• STEP 6: Assign codes to the last r composite symbols and work backwards to the original source to arrive at the optimal code

• STEP 7: Discard the codes of the dummy symbols (this step is not needed for binary codes)
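A compact sketch of binary Huffman coding using a min-heap; tie-breaking among equal probabilities is arbitrary, so the assignment of codewords to equal-probability symbols may differ from the slides, while the multiset of codeword lengths and the average length agree:

```python
import heapq
from itertools import count

def huffman(probs):
    """Binary Huffman code; probs maps each symbol to its probability."""
    tiebreak = count()                       # keeps heap comparisons well defined
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)      # the two least-probable groups
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

codes = huffman({"s1": 0.3, "s2": 0.25, "s3": 0.2, "s4": 0.1, "s5": 0.1, "s6": 0.05})
for s in sorted(codes):
    print(s, codes[s])                       # lengths {2, 2, 2, 3, 4, 4}, L = 2.4
```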

Numerical Problem on Huffman Encoding Procedure

• Construct a Huffman code for symbols {S} = {s1, s2, ..., s6} with {P} = {0.3, 0.25, 0.2, 0.1, 0.1, 0.05} and {X} = {0, 1}

• Soln., [Huffman reduction: the two least-probable symbols are merged at each stage (0.05 + 0.1 → 0.15, then 0.1 + 0.15 → 0.25, 0.2 + 0.25 → 0.45, 0.25 + 0.3 → 0.55, 0.45 + 0.55 → 1.0), assigning 0 and 1 to the merged branches and reading the codes backwards]

Symbol | Code | Length
s1 | 00 | 2
s2 | 10 | 2
s3 | 11 | 2
s4 | 011 | 3
s5 | 0100 | 4
s6 | 0101 | 4

• H(S) = - Σ P(s_i) log2 P(s_i) = 2.366 bits/symbol
• L = 2.4 bits/symbol
• Efficiency = H(S)/L = 98.6%
• Redundancy = 1.4%

Huffman Code – Second Order Extension

• Apply the Huffman binary encoding procedure for the symbols {S} = {s1, s2, s3} with {P} = {0.5, 0.3, 0.2}. Find the entropy and efficiency. If the same technique is applied to the second-order extension of the source, calculate its entropy and efficiency.

Soln., [Huffman reduction: 0.3 + 0.2 → 0.5, then 0.5 + 0.5 → 1.0]

Symbol | Code | Length
s1 | 1 | 1
s2 | 00 | 2
s3 | 01 | 2

• H(S) = - Σ P(s_i) log2 P(s_i) = 1.485 bits/symbol
• L = 1.5 bits/symbol
• Efficiency = H(S)/L = 99%
• Redundancy = 1%


• Second Extension:

Symbol | Block | Prob.
x1 | s1s1 | 0.25
x2 | s1s2 | 0.15
x3 | s1s3 | 0.10
x4 | s2s1 | 0.15
x5 | s2s2 | 0.09
x6 | s2s3 | 0.06
x7 | s3s1 | 0.10
x8 | s3s2 | 0.06
x9 | s3s3 | 0.04

[Huffman reduction on the nine block probabilities, merging the two smallest at each of eight stages]

Symbol | Code
x1 | 11
x2 | 001
x3 | 110
x4 | 010
x5 | 0000
x6 | 0001
x7 | 111
x8 | 0110
x9 | 0111

• H(S²) = 2.971 bits/symbol (= 2 × 1.485)
• L2 = 3 binits/symbol
• Efficiency = 99.0%
• Redundancy = 1.0%

Numerical Problem

• Construct a Huffman code for the symbols {S} = {s1, s2, ..., s6} with {P} = {1/3, 1/4, 1/8, 1/8, 1/12, 1/12} and {X} = {0, 1, 2}

Soln., Solving q = r + α(r-1) with q = 6, r = 3:

α = (q - r)/(r - 1) = (6 - 3)/(3 - 1) = 3/2 (not an integer)

Therefore, we add a dummy symbol with probability zero.

Now, q = 7, r = 3:

α = (q - r)/(r - 1) = (7 - 3)/(3 - 1) = 4/2 = 2 (α is an integer)

Numerical Problem (Contd.,)

[Huffman reduction: the last three symbols are combined at each stage (0.083 + 0.083 + 0 → 0.166, then 0.125 + 0.125 + 0.166 → 0.416), leaving the three symbols 0.416, 0.333, 0.250, which receive code symbols 0, 1, 2]

Symbol | Code | Length
s1 | 1 | 1
s2 | 2 | 1
s3 | 01 | 2
s4 | 02 | 2
s5 | 000 | 3
s6 | 001 | 3
s7 | 002 | 3 (dummy, discarded)

• H(S) = 2.375 bits/symbol
• Length, L = 1.5833 trinits/symbol
• Efficiency = H(S) / (L · log2 3) = 94.6%
• Redundancy = 5.4%

Arithmetic Coding

• Procedure:

• STEP 1: Divide the interval from 0 to 1 among the symbols, in the given order, in proportion to their probabilities

• STEP 2: Expand the interval of the first symbol to be coded. The new range is defined by calculating its limits:

d = Upper limit - Lower limit of the interval of the symbol just encoded

New boundary of each symbol = previous boundary + d × (probability of the symbol), applied successively from the lower limit

• STEP 3: Repeat the procedure for successive symbols until the final symbol in the given sequence is encoded

• STEP 4: The tag value is generated by calculating

Tag = (Upper limit + Lower limit) / 2
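A sketch of the encoding steps; the symbol intervals are assigned in the listed order (A, T, H, M, S), matching the worked example that follows:

```python
def arithmetic_encode(message, probs):
    """Shrink [0, 1) around the message; probs is {symbol: probability},
    with sub-intervals assigned in the listed order (Python 3.7+ dict order)."""
    cum, lower = {}, 0.0
    for s, p in probs.items():               # cumulative lower bounds
        cum[s] = lower
        lower += p
    low, high = 0.0, 1.0
    for s in message:
        d = high - low                       # width of the current interval
        high = low + d * (cum[s] + probs[s]) # upper bound first (uses old low)
        low = low + d * cum[s]
    return low, high, (low + high) / 2       # tag = interval midpoint

probs = {"A": 0.3, "T": 0.3, "H": 0.2, "M": 0.1, "S": 0.1}
print(arithmetic_encode("MATHS", probs))     # (~0.81602, ~0.8162, tag ~0.81611)
```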



• STEP 5: Decoding process: the tag value is used to decode the symbols from their assigned probabilities

• STEP 6: The probabilities of the symbols are arranged in the given format, ranging from 0 to 1

• STEP 7: The sub-interval in which the tag value lies becomes the new range, with its own lower and upper bounds. The new boundary of each symbol within this range is calculated by

d = Upper bound - Lower bound

New boundary of each symbol = previous boundary + d × (probability of the symbol)

• STEP 8: This procedure continues until all the symbols are decoded

• STEP 9: The symbol whose interval contains the tag value at each stage is stored in sequence, forming the final decoded data
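A matching sketch of the decoding steps; the number of symbols to decode is passed explicitly, since this toy scheme has no terminating symbol:

```python
def arithmetic_decode(tag, probs, n_symbols):
    """Recover n_symbols from a tag value; probs is {symbol: probability}."""
    cum, lower = {}, 0.0
    for s, p in probs.items():               # cumulative lower bounds
        cum[s] = lower
        lower += p
    low, high, out = 0.0, 1.0, []
    for _ in range(n_symbols):
        d = high - low
        for s, p in probs.items():           # sub-interval containing the tag
            s_low = low + d * cum[s]
            s_high = s_low + d * p
            if s_low <= tag < s_high:
                out.append(s)
                low, high = s_low, s_high
                break
    return "".join(out)

print(arithmetic_decode(0.572, {"A": 0.1, "B": 0.4, "C": 0.5}, 3))   # "CBB"
```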

• Using arithmetic coding, encode the message MATHS with probabilities A = 0.3 ; T = 0.3 ; H = 0.2 ; M = 0.1 ; S = 0.1. Generate the tag value.

Soln., STEP 1: initial intervals on [0, 1): A [0, 0.3) ; T [0.3, 0.6) ; H [0.6, 0.8) ; M [0.8, 0.9) ; S [0.9, 1.0)

STEP 2: encode 'M' → interval [0.8, 0.9)

• d = Upper bound - Lower bound = 0.9 - 0.8 = 0.1
• New boundaries: Range of "A" = 0.8 + 0.1 (0.3) = 0.83 ; Range of "T" = 0.83 + 0.1 (0.3) = 0.86 ; Range of "H" = 0.86 + 0.1 (0.2) = 0.88 ; Range of "M" = 0.88 + 0.1 (0.1) = 0.89 ; Range of "S" = 0.89 + 0.1 (0.1) = 0.9
STEP 3: Encoding “A” narrows the range to [0.8, 0.83)
• d = Upper bound – Lower bound = 0.83 – 0.8 = 0.03
• Range of “A” = 0.8 + 0.03 (0.3) = 0.809
• Range of “T” = 0.809 + 0.03 (0.3) = 0.818
• Range of “H” = 0.818 + 0.03 (0.2) = 0.824
• Range of “M” = 0.824 + 0.03 (0.1) = 0.827
• Range of “S” = 0.827 + 0.03 (0.1) = 0.83
STEP 4: Encoding “T” narrows the range to [0.809, 0.818)
• d = Upper bound – Lower bound = 0.818 – 0.809 = 0.009
• Range of “A” = 0.809 + 0.009 (0.3) = 0.8117
• Range of “T” = 0.8117 + 0.009 (0.3) = 0.8144
• Range of “H” = 0.8144 + 0.009 (0.2) = 0.8162
• Range of “M” = 0.8162 + 0.009 (0.1) = 0.8171
• Range of “S” = 0.8171 + 0.009 (0.1) = 0.818
STEP 5: Encoding “H” narrows the range to [0.8144, 0.8162)
• d = Upper bound – Lower bound = 0.8162 – 0.8144 = 0.0018
• Range of “A” = 0.8144 + 0.0018 (0.3) = 0.81494
• Range of “T” = 0.81494 + 0.0018 (0.3) = 0.81548
• Range of “H” = 0.81548 + 0.0018 (0.2) = 0.81584
• Range of “M” = 0.81584 + 0.0018 (0.1) = 0.81602
• Range of “S” = 0.81602 + 0.0018 (0.1) = 0.8162
• Encoding the final symbol “S” leaves the arithmetic codeword in the range
  0.81602 < CODEWORD < 0.8162
• The TAG value is generated by
  TAG = ( Upper Limit + Lower Limit ) / 2 = ( 0.8162 + 0.81602 ) / 2
  TAG = 0.81611
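Running the encoder sketch from earlier on this example reproduces the tag (up to floating-point rounding):

```python
model = [("A", 0.3), ("T", 0.3), ("H", 0.2), ("M", 0.1), ("S", 0.1)]
print(arithmetic_encode("MATHS", model))   # ~0.81611
```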
Arithmetic Coding
• Using Arithmetic Coding, decode the message with tag value 0.572, given a source with probabilities A = 0.1 ; B = 0.4 ; C = 0.5

Soln., STEP 1: Partition [0, 1): A [0, 0.1) ; B [0.1, 0.5) ; C [0.5, 1.0). The tag 0.572 lies in C’s sub-interval, so the first decoded symbol is C

STEP 2: New range [0.5, 1.0)
• d = Upper bound – Lower bound = 1.0 – 0.5 = 0.5
• Range of “A” = 0.5 + 0.5 (0.1) = 0.55
• Range of “B” = 0.55 + 0.5 (0.4) = 0.75
• Range of “C” = 0.75 + 0.5 (0.5) = 1.0
STEP 3: The tag 0.572 lies in B’s sub-interval [0.55, 0.75), so the second decoded symbol is B. New range [0.55, 0.75)
• d = Upper bound – Lower bound = 0.75 – 0.55 = 0.2
• Range of “A” = 0.55 + 0.2 (0.1) = 0.57
• Range of “B” = 0.57 + 0.2 (0.4) = 0.65
• Range of “C” = 0.65 + 0.2 (0.5) = 0.75
STEP 4: The tag 0.572 lies in B’s sub-interval [0.57, 0.65), so the third decoded symbol is B. New range [0.57, 0.65)
• d = Upper bound – Lower bound = 0.65 – 0.57 = 0.08
• Range of “A” = 0.57 + 0.08 (0.1) = 0.578
• Range of “B” = 0.578 + 0.08 (0.4) = 0.61
• Range of “C” = 0.61 + 0.08 (0.5) = 0.65
• The tag 0.572 now lies in A’s sub-interval [0.57, 0.578), so the fourth decoded symbol is A
• The decoded word is: CBBA
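The decoder sketch from earlier confirms this, given that four symbols are expected:

```python
model = [("A", 0.1), ("B", 0.4), ("C", 0.5)]
print(arithmetic_decode(0.572, model, 4))   # CBBA
```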
Run Length Encoding
• Procedure:
• STEP 1: The sequence at the output of a discrete source is written as runs, i.e., the number of times each bit value occurs in succession
• STEP 2: Every run is represented as (bit value, number of occurrences in the run)
• STEP 3: If the maximum number of occurrences over all the runs is denoted as ‘n’, then the length of the occurrence field is log₂(n) bits (if log₂(n) is fractional, take the next integer)
• STEP 4: The number of occurrences in each run is replaced by its binary form with a length of log₂(n) bits
• STEP 5: The final sequence in this compressed form is written as the encoded output (a code sketch follows below)
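A minimal sketch of these steps in Python (my own illustration; itertools.groupby performs the run grouping of STEP 1, and bit_length() yields the field width of STEP 3: 3 bits for a maximum run of 5, 4 bits for a maximum run of 14):

```python
from itertools import groupby

def rle_encode(bits):
    runs = [(b, len(list(g))) for b, g in groupby(bits)]          # STEPs 1-2
    width = max(n for _, n in runs).bit_length()                  # STEP 3
    return "".join(b + format(n, f"0{width}b") for b, n in runs)  # STEPs 4-5

print(rle_encode("00000111110010000101"))
# 01011101001010010100100100011001 (first worked example below)
```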
Run Length Encoding
• Using the Run Length Encoding technique, encode the given bit stream 00000111110010000101
Soln., Original Bit Stream : 00000111110010000101
STEP 1: Grouping bits as per successive occurrence
00000 11111 00 1 0000 1 0 1
STEP 2: Arranging in the form (Bit Value, Number of Occurrences)
(0,5) (1,5) (0,2) (1,1) (0,4) (1,1) (0,1) (1,1)
STEP 3: Finding the length of the occurrence field in bits
Maximum number of occurrences = 5
Length of occurrence field = log₂(5) = 2.32 ≈ 3 bits
STEP 4: Representing each occurrence value in its corresponding 3-bit form
(0,101) (1,101) (0,010) (1,001) (0,100) (1,001) (0,001) (1,001)
STEP 5: Encoded bit stream
01011101001010010100100100011001
Run Length Encoding
• Using the Run Length Encoding technique, encode the given bit stream 000000111111111111110000000000000111111111
Soln., Original Bit Stream : 000000111111111111110000000000000111111111
STEP 1: Grouping bits as per successive occurrence
000000 11111111111111 0000000000000 111111111
STEP 2: Arranging in the form (Bit Value, Number of Occurrences)
(0,6) (1,14) (0,13) (1,9)
STEP 3: Finding the length of the occurrence field in bits
Maximum number of occurrences = 14
Length of occurrence field = log₂(14) = 3.8 ≈ 4 bits
STEP 4: Representing each occurrence value in its corresponding 4-bit form
(0,0110) (1,1110) (0,1101) (1,1001)
STEP 5: Encoded bit stream
00110111100110111001
Run Length Encoding
• Using the Run Length Encoding technique, encode the given symbol stream AAAAABBBBCCCDEEEFFFFGG
Soln., Original Symbol Stream : AAAAABBBBCCCDEEEFFFFGG
STEP 1: Grouping symbols as per successive occurrence
AAAAA BBBB CCC D EEE FFFF GG
STEP 2: Arranging in the form (Symbol, Number of Occurrences)
(A,5) (B,4) (C,3) (D,1) (E,3) (F,4) (G,2)
Encoded stream
A5B4C3D1E3F4G2
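The same grouping idea for character runs, in a one-line sketch of mine that emits each symbol followed by its decimal count:

```python
from itertools import groupby

def rle_chars(text):
    return "".join(f"{ch}{len(list(g))}" for ch, g in groupby(text))

print(rle_chars("AAAAABBBBCCCDEEEFFFFGG"))   # A5B4C3D1E3F4G2
```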
Lempel Ziv Encoding
• The major difficulty in using the Huffman code is that the symbol probabilities must be known or estimated, and both encoder and decoder must know the coding tree
• If a tree is constructed for an unusual alphabet, the channel connecting encoder and decoder must also deliver the coding tree as a header for the compressed file
• This overhead reduces the compression efficiency
• The Lempel-Ziv algorithm is designed to be independent of the source probabilities. It is a variable-length to fixed-length coding algorithm
Lempel Ziv Encoding
• Procedure:
• STEP 1: The sequence at the output of a discrete source is divided into variable-length blocks, which are called phrases
• STEP 2: A new phrase is introduced every time a block of letters from the source differs from some previous phrase in its last letter
• STEP 3: Phrases are listed in a dictionary, which stores the location of each existing phrase
• STEP 4: When encoding a new phrase, simply specify the location of the existing phrase in the dictionary and append the new letter
Numerical Problem on Lempel Ziv Encoding
• Encode the given sequence using the Lempel-Ziv algorithm: 10101101001001110101000011001110101100011011

Soln., Parsed phrases: 1 | 0 | 10 | 11 | 01 | 00 | 100 | 111 | 010 | 1000 | 011 | 001 | 110 | 101 | 10001 | 1011

Each codeword is the 4-bit dictionary location of the longest previously seen phrase, followed by the new last letter:

Dictionary Location   Dictionary Content   Codeword
0000                  Ref. (empty)         –
0001                  1                    00001
0010                  0                    00000
0011                  10                   00010
0100                  11                   00011
0101                  01                   00101
0110                  00                   00100
0111                  100                  00110
1000                  111                  01001
1001                  010                  01010
1010                  1000                 01110
1011                  011                  01011
1100                  001                  01101
1101                  110                  01000
1110                  101                  00111
1111                  10001                10101
–                     1011                 11101