Ece403 - Itc - Unit 1 - PPT Notes
• This course enables the learners to realize the fundamental concepts of information theory, the
various types of communication channels and their capacity for data transfer
• The course also analyzes various types of source coding and channel coding techniques and
their significance for efficient and reliable communication
• TEXTBOOKS:
• Bernard Sklar and Prabitra Kumar Ray, Digital Communications, 2nd Edition, Pearson Education,
2011
• Simon Haykin, Communication Systems, 5th Edition, John Wiley and Sons, 2010
• F.M.Reza, An introduction to information theory, McGraw Hill Inc., 1994
• REFERENCES:
• B.P.Lathi, Modern Digital and Analog Communication Systems, 4th Edition, Oxford University
Press, 2012
• Salvatore Gravano, Introduction to Error Control Codes, Oxford University Press, 2011
• R.P.Singh and S.D.Sapre, Communication Systems - Analog and Digital, 2nd Edition, Tata
McGraw Hill, 2008
• Peter Sweeney, Error Control Coding from Theory to Practice, 2nd Edition, Wiley, 2002
• ONLINE MATERIAL:
• NPTEL : http://www.youtube.com/watch?v=f8RvFlr5wRk
• UNIT I:
• Remember the basics notions in information theory like self-information, entropy and its types
• Implement various types of source coding algorithms and classify them
• UNIT II:
• Analyse various types of communication channels and their channel capacities
• UNIT III:
• Design and interpret various types of error control codes like linear block codes, cyclic codes,
convolutional codes and trellis coded modulation
• UNIT IV:
• Design and interpret BCH codes and Reed-Solomon codes
• Information Theory deals with the mathematical modeling and analysis of a communication system
• It determines the capacity of the system to transfer the essential information from the source to the destination
• Transmitter: Converts the message signal produced by the source of information into a form suitable for
transmission over the channel. As the transmitted signal propagates along the channel, it is distorted due
to channel imperfections
• Channel: The physical medium that connects the transmitter and the receiver. Noise and interfering signals (originating
from other sources) are added to the channel output, with the result that the received signal is a corrupted version
of the transmitted signal
• Receiver: Reconstructs a recognizable form of the original message signal for an end user or information sink
• The basic block representation of Communication System consists of:
• Transmitter
• Channel
• Receiver
• Information Source
• Source Encoder
• Compresses the data into minimum number of bits in order to have effective utilization of the
bandwidth
• Channel Encoder
• Performs the process of error correction as the noise in the channel might alter the information
• It adds redundant bits to the transmitted data called the error correcting bits
• Modulator
• Modulates the channel-encoded data onto a carrier waveform suitable for transmission over the channel
• Channel
• Demodulator
• The received signal is demodulated to extract the original signal from the carrier
• Channel Decoder
• The distortions occurred during the transmission are corrected by the decoder
• Source Decoder:
• Decompresses the received data to recover the original source information
• Destination:
• The end user or information sink that receives the reconstructed message
• Bandwidth
• Noise
• Equipment
• An information source is an object that produces an event, the outcome of which is selected at random
according to a probability distribution
• If there are m = 2^N equally likely messages, then the amount of information carried by each message
will be N bits
• Unit of I(x_i):
• bit if b = 2 ; nat if b = e ; Hartley or decit if b = 10
• Useful base conversions:
log2 X = log10 X / log10 2 = log10 X / 0.3010
log3 X = log10 X / log10 3 = log10 X / 0.4771
log2 3 = log10 3 / log10 2 = 0.4771 / 0.3010 = 1.585
log2 2 = log10 2 / log10 2 = 1
log2 (1/P(A)) = 3.322 x log10 (1/P(A))
• A source produces one of four possible symbols during each interval, having probabilities P(X1) = 0.5,
P(X2) = 0.25, P(X3) = P(X4) = 0.125. Obtain the information content of each of these symbols.
Soln., I(X) = log2 (1/P(X))
Therefore,
I(X1) = log2 (1/0.5) = 1 bit ; I(X2) = log2 (1/0.25) = 2 bits ; I(X3) = I(X4) = log2 (1/0.125) = 3 bits
• Calculate the amount of information if binary digits occur with equal likelihood in a binary PCM system.
Soln., Number of symbols in PCM = 2 (0 and 1), i.e., P(X1) = P(X2) = 0.5
I(X1) = I(X2) = log2 (1/0.5) = 1 bit
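• The two self-information results above can be reproduced with a short Python check (the helper name self_information is an illustrative choice, not from the notes):

import math

def self_information(p):
    # I(x) = log2(1/p) bits for a symbol occurring with probability p
    return math.log2(1.0 / p)

for p in (0.5, 0.25, 0.125):          # four-symbol source of the first example
    print(p, self_information(p))     # 1, 2 and 3 bits
print(self_information(0.5))          # binary PCM symbol: 1 bit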
• If the receiver knows the message being transmitted, the amount of information carried will be zero.
Prove the statement: if the message is known in advance, its probability of occurrence is P = 1, so I = log2 (1/1) = 0 bits.
• A card is selected at random from a deck and is found to be from a red suit. How much information is
received? How much more information is needed to completely specify the card?
Soln., Information received: P(A1) = 26/52 = 0.5 ; I(A1) = log2 (1/0.5) = 1 bit
Information needed to completely specify the card: P(A2) = 1/52 = 0.019 ; I(A2) = log2 (1/0.019) = 5.7 bits
Additional information needed = I(A2) - I(A1) = 5.7 - 1 = 4.7 bits
• A single TV picture can be thought of as an array of black, white and gray dots of roughly 500 rows and 600
columns. Suppose that each of these dots may take on any one of 10 distinguishable levels. What is the
amount of information provided by one picture?
Soln., Number of dots = 500 x 600 = 300,000, so the number of equally likely pictures is 10^300000 and the
probability of one picture is P(A) = 1/10^300000
I(A) = log2 (10^300000) = 300,000 x 3.322 = 9.97 x 10^5 bits (approximately)
• If the emitted symbols are statistically independent, i.e., any symbol being produced does not depend
upon the symbols that have been produced already, we say that the source has no memory and it is called
a Discrete Memoryless Source (DMS)
• The amount of information contained in a symbol (𝑋𝑟 ) emitted by the DMS is closely related to the
amount of uncertainty of that symbol
• A discrete source emits a sequence of symbols from a fixed finite source alphabet
• Suppose we consider a long sequence of n symbols, made up of n1 symbols of type s1, n2 symbols of type s2, ...,
and nq symbols of type sq. The amount of information associated with each symbol of the source is given by
I(s_k) = log2 (1/P(s_k))   [3]
The average information per symbol is
H(S) = Σ_{k=1}^{q} p_k I(s_k), where p_k = n_k / n   [5]
H(S) = Σ_{k=1}^{q} p_k log2 (1/p_k) bits/symbol   [6]
• Entropy is maximum when the uncertainty is maximum, i.e., when all the symbols of the source alphabet X
are equiprobable
• It is a measure of the average information content per source symbol
• The source entropy H(X) satisfies the relation 0 ≤ H(X) ≤ log2 m, where m is the size of the alphabet of
source X
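• A minimal Python sketch of the entropy definition (the function name entropy is illustrative), which also checks the bound 0 ≤ H(X) ≤ log2 m for a four-symbol source:

import math

def entropy(probs):
    # H(S) = sum_k p_k * log2(1/p_k) in bits/symbol; terms with p_k = 0 contribute nothing
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

probs = [0.5, 0.25, 0.125, 0.125]
print(entropy(probs))            # 1.75 bits/symbol
print(math.log2(len(probs)))     # upper bound log2(m) = 2.0, so 0 <= H <= log2 m holds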
• Continuity Property:
• If the probability of occurrence of events 𝑋𝑘 are slightly changed, the measurement of uncertainty associated
with the system varies accordingly in a continuous manner
p_k = P(X_k) ; 0 ≤ p_k ≤ 1
H(X) = - Σ_{k=1}^{N} p_k log2 p_k bits/symbol
• As p_k is continuous between the limits 0 and 1, H(X) also varies continuously
• Symmetry Property:
• As the entropy is a sum of weighted terms, its value remains the same even when the positions of the
probabilities are interchanged; i.e., the value of the entropy function is unchanged by any reordering of
p_1, p_2, ..., p_N
Properties of Entropy
• Minimum value of H(X) is Zero: 0 ≤ H(X) ≤ 𝑙𝑜𝑔2 N ; where N is total number of symbols
• Extremal Property:
• Entropy has its maximum value when all the events are equally likely
• Additive Property:
• H(p_1, p_2, ..., p_{N-1}, q_1, q_2, ..., q_m) = H(p_1, p_2, ..., p_N) + p_N H(q_1/p_N, q_2/p_N, ..., q_m/p_N),
where p_N = q_1 + q_2 + ... + q_m
• The figure shows plots of the straight line Y = X - 1 and the log function Y = ln X on the same set of
coordinate axes
• The plot of ln X always lies below the straight line Y = X - 1, and equality exists only when X = 1; any
point on the straight line lies above the log function for every other value of X
• The straight line is tangent to the log function at X = 1
• Statement: For a zero memory information source with a q-symbol alphabet, the entropy becomes
maximum if and only if all the source symbols are equally probable:
H(S)_max = log2 q ; if p_k = 1/q for all k = 1 to q
• Proof: Consider a memoryless source with q-symbol alphabet {S} = {s1, s2, ..., sq} with
probabilities {P} = {p1, p2, p3, ..., pq}. The entropy of the source is given by
H(S) = Σ_{k=1}^{q} p_k log2 (1/p_k)   [1]
Since Σ_{k=1}^{q} p_k = 1,   [2]
log2 q - H(S) = Σ_{k=1}^{q} p_k log2 q - Σ_{k=1}^{q} p_k log2 (1/p_k)   [3]
log2 q - H(S) = Σ_{k=1}^{q} p_k log2 (q p_k)   [4]
Changing the base to e, using log2 X = log2 e . ln X,
log2 q - H(S) = log2 e [ Σ_{k=1}^{q} p_k ln (q p_k) ]   [5]
• Apply the logarithmic inequality ln (1/X) ≥ (1 - X), with equality only at X = 1   [6]
Taking X = 1/(q p_k), so that ln (q p_k) ≥ 1 - 1/(q p_k),
log2 q - H(S) ≥ log2 e Σ_{k=1}^{q} p_k ( 1 - 1/(q p_k) )   [7]
Equality holds only if X = 1, i.e., q p_k = 1 => p_k = 1/q   [8]
log2 q - H(S) ≥ log2 e ( Σ_{k=1}^{q} p_k - Σ_{k=1}^{q} 1/q ) = log2 e (1 - 1) = 0   [9]
Hence H(S) ≤ log2 q, with equality if and only if p_k = 1/q for all k
• Statement: Partitioning of symbols or events into sub-symbols or sub-events cannot decrease the entropy
• Proof: Consider a memoryless information source with q-symbol alphabet {S} = {s1, s2, ..., sq} with
associated probabilities {p1, p2, p3, ..., pq}. Suppose we split the symbol s_q into m sub-symbols s_q1, ..., s_qm such that
P{s_qj} = p_qj and p_q = Σ_{j=1}^{m} p_qj   [1]
The entropy of the partitioned source is
H(S') = Σ_{k=1}^{q-1} p_k log (1/p_k) + Σ_{j=1}^{m} p_qj log (1/p_qj)   [2]
= Σ_{k=1}^{q} p_k log (1/p_k) - p_q log (1/p_q) + Σ_{j=1}^{m} p_qj log (1/p_qj)   [3]
Since p_q = Σ_{j=1}^{m} p_qj,
H(S') = Σ_{k=1}^{q} p_k log (1/p_k) + Σ_{j=1}^{m} p_qj ( - log (1/p_q) + log (1/p_qj) )   [4]
H(S') = H(S) + Σ_{j=1}^{m} p_qj log (p_q / p_qj)   [5]
Since p_q ≥ p_qj for every j, each term log (p_q/p_qj) ≥ 0, so H(S') ≥ H(S); partitioning cannot decrease the entropy
• A sample space of events is shown with {P} = {1/5, 4/15, 8/15} for the events A, B, C, where M = B U C. Evaluate
(i) the average uncertainty associated with the scheme
(ii) the average uncertainty pertaining to the probability schemes [A, M = B U C] and [B/M, C/M]
(iii) verify the rule of additivity
Soln., (i) H(S) = Σ_{k=1}^{3} p_k log (1/p_k) = (1/5) log2 5 + (4/15) log2 (15/4) + (8/15) log2 (15/8)
= 0.464 + 0.509 + 0.484 = 1.457 bits/symbol
(ii) Average uncertainty, H[A, M = B U C]:
P(M) = P(B U C) = P(B) + P(C) = 4/15 + 8/15 = 12/15 = 4/5 = 0.8
H[A, M = B U C] = 0.2 log2 5 + 0.8 log2 (1/0.8) = 0.722 bits/symbol
• H(B/M, C/M):
P(B/M) = P(B)/P(M) = (4/15)/(4/5) = 1/3 ; P(C/M) = P(C)/P(M) = (8/15)/(4/5) = 2/3
H(B/M, C/M) = H(1/3, 2/3) = (1/3) log2 3 + (2/3) log2 (3/2) = 0.918 bits/symbol
(iii) Rule of additivity, with P = {P(A), P(M)}:
H(S) = H(P(A), P(M)) + P(M) H(P(B)/P(M), P(C)/P(M)) = 0.722 + 0.8 x 0.918 = 1.457 bits/symbol
which equals the value found in (i), so [1] = [2]
• For a zero memory binary source with source alphabet {S} = {0,1} with probabilities {P} = {p, q} ; p + q = 1
H(S) = Σ_{k=1}^{2} p_k log (1/p_k) = p log (1/p) + q log (1/q) = - p log p - q log q = - p log p - (1-p) log (1-p)
[Plot: H(S) in bits versus probability p, rising from 0 at p = 0 to its maximum of 1 bit at p = 0.5 and falling back to 0 at p = 1]
• The sketch shows the variation of H(S) with probability. If the output of the source is certain (p = 0 or p = 1), the
source provides no information
• The entropy of the source is maximum (1 bit/symbol) when 0 and 1 are equally likely
• Suppose two sources have equal entropies but one is faster than the other producing more number of
symbols/unit time., In a given period, more information will be transmitted by the faster source than the
other
• If the time rate at which X emits symbols is r_s (symbols/second), the information rate R of the source is
given by R = r_s H(X) bits/second
• An event has 5 possible outcomes with probabilities 0.5, 0.25, 0.125, 0.0625 and 0.0625. Find the
entropy of the system and also find the rate of information if there are 16 outcomes per second
Soln., H(S) = - Σ P(s_i) log2 P(s_i) = - { (0.5 log2 0.5) + (0.25 log2 0.25) + (0.125 log2 0.125) + 2 x
(0.0625 log2 0.0625) } = 1.875 bits/symbol
r = 16 outcomes/second
R = r H(S) = 16 x 1.875 = 30 bits/second
• A continuous signal is bandlimited to 5kHz. The signal is quantized in 8 levels of a PCM system
with probabilities 0.25, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05, 0.05 . Calculate the entropy and rate of
information
Soln.,
H(S) = - Σ P(s_i) log2 P(s_i) = - { (0.25 log2 0.25) + 2 x (0.2 log2 0.2) + 2 x (0.1 log2 0.1) + 3 x (0.05
log2 0.05) } = 2.7414 bits/symbol (approximately)
r = 2 x 5 kHz = 10,000 samples/second (Nyquist rate)
R = r H(S) = 10,000 x 2.7414 = 27,414 bits/second (approximately)
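• Both rate examples can be verified with a short Python sketch of R = r H(S) (function names are illustrative):

import math

def entropy(probs):
    # source entropy in bits/symbol
    return -sum(p * math.log2(p) for p in probs if p > 0)

# five outcomes at 16 outcomes/second
H1 = entropy([0.5, 0.25, 0.125, 0.0625, 0.0625])
print(H1, 16 * H1)           # 1.875 bits/symbol, 30 bits/second

# 8-level PCM sampled at the Nyquist rate 2 x 5 kHz = 10 000 samples/second
H2 = entropy([0.25, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05, 0.05])
print(H2, 10_000 * H2)       # about 2.74 bits/symbol and 27 400 bits/second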
• It is useful to consider blocks rather than individual symbols, with each block consisting of n
successive source symbols
• Each such block is produced by an extended source with source alphabet S^n that has k^n
distinct blocks, where k is the number of symbols in the source alphabet of the original source
• In the case of a discrete memoryless source, the source symbols are statistically independent. Hence the
probability of a source symbol in S^n is the product of the probabilities of the n source symbols in S
constituting that particular symbol of S^n, and
H(S^n) = n H(S)
• Consider a discrete memoryless source with source alphabet {S0, S1, S2} with {P} = {0.25, 0.25, 0.5}.
Prove that H(S^2) = 2 H(S)
Soln., k = 3 , n = 2
The extended source consists of 3^2 = 9 symbols:
Block:       S0S0   S0S1   S0S2   S1S1   S1S2   S1S0   S2S2   S2S1   S2S0
Probability: 0.0625 0.0625 0.125  0.0625 0.125  0.0625 0.25   0.125  0.125
H(S) = - Σ P(s_i) log2 P(s_i) = - { 2 x (0.25 log2 0.25) + (0.5 log2 0.5) } = 1.5 bits/symbol
2 x H(S) = 3
H(S^2) = - Σ P(s_i) log2 P(s_i) = - { 4 x (0.0625 log2 0.0625) + (0.25 log2 0.25) + 4 x (0.125 log2 0.125) } = 3 = 2 H(S)
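• The extension property H(S^n) = n H(S) can also be checked numerically; the sketch below builds the block probabilities of the n-th extension as products (itertools.product enumerates the k^n blocks):

import math
from itertools import product

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

p = [0.25, 0.25, 0.5]                                        # original DMS
ext = [math.prod(block) for block in product(p, repeat=2)]   # 9 block probabilities of S^2
print(entropy(p), entropy(ext))                              # 1.5 and 3.0, so H(S^2) = 2 H(S)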
• Encoding is the procedure for associating words constructed from a finite alphabet of a
language with given words of another language in a one to one manner
• Classification of Codes:
• Fixed Length Codes:
A fixed length code is defined as a code in which every codeword has the same (fixed) length
• Distinct Codes:
A distinct code is defined as the one in which each codeword is distinguishable from each other
• Uniquely Decodable Codes:
A code is said to be uniquely decodable (decipherable) if any sequence of codewords can be interpreted in only
one way
• Prefix-Free Codes:
A code in which no codeword can be formed by adding code symbols to another codeword is called a
prefix-free code. In a prefix-free code no codeword is a prefix of another
E.g.: {S0, S1, S2, S3} - {0, 1, 10, 11} - not a prefix-free code, since 1 is a prefix of 10 and 11
• Non Singular:
A block code is said to be non-singular, if all the code words of the word set are distinct
• Instantaneous Code:
A code word having a property that no codeword is a prefix of another codeword is said to be
instantaneous
• Optimal Code:
An instantaneous code is said to be optimal if it has minimum average length for a source with the
given probability of assignment for the source symbol
• Codeword Length:
• Let X be a DMS with finite entropy H(X) and an alphabet {𝑥1 , 𝑥2 , ……, 𝑥𝑚 } with corresponding
probabilities of occurrence P(𝑥𝑖 ) (i = 1, …. , m)
• Let the binary code word assigned to symbol xi by the encoder have length 𝑛𝑖 , measured in bits
• The length of the code word is the number of binary digits in the code word
• The average codeword length L, per source symbol, is given by
L = Σ_{i=1}^{m} P(x_i) n_i
• The parameter L represents the average number of bits per source symbol used in the
source coding process
• Code Efficiency:
Efficiency is defined as the ratio of the average information per symbol of the encoded language to the
maximum possible average information per symbol:
η = H(S) / (L log2 r) ; if r = 2 then η = H(S) / L
where L is the average codeword length
• Redundancy:
Redundancy = 1 – Efficiency
• Let us consider a source having four messages, S = {s0, s1, s2, s3} - {0, 10, 110, 111} with
probabilities 0.5, 0.25, 0.125, 0.125. Calculate the efficiency and redundancy of the code.
Soln.,
Symbol  Prob.   Code
S0      0.5     0
S1      0.25    10
S2      0.125   110
S3      0.125   111
• H(S) = - Σ P(s_i) log2 P(s_i) = - { (0.5 log2 0.5) + (0.25 log2 0.25) + 2 x (0.125 log2 0.125) } = 1.75 bits/symbol
• L = (1 x 0.5) + (2 x 0.25) + 2 x (3 x 0.125) = 1.75 bits/symbol
• Efficiency = H(S)/L = 100%
• Redundancy = 0
• Let us consider a source having four messages, S = {s0, s1, s2, s3} - {0, 10, 110, 111} with
probabilities 1/3, 1/3, 1/6, 1/6. Calculate the efficiency and redundancy of the code.
Soln.,
Symbol  Prob.  Code
S0      1/3    0
S1      1/3    10
S2      1/6    110
S3      1/6    111
• H(S) = 2 x (1/3) log2 3 + 2 x (1/6) log2 6 = 1.918 bits/symbol
• L = (1 x 1/3) + (2 x 1/3) + 2 x (3 x 1/6) = 2 bits/symbol
• Efficiency = H(S)/L = 95.9%
• Redundancy = 4.1%
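• A small Python helper (names are illustrative) that reproduces L, the efficiency and the redundancy for both examples above:

import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def code_stats(probs, codewords, r=2):
    # average length L = sum p_i n_i, efficiency H/(L log2 r), redundancy 1 - efficiency
    L = sum(p * len(c) for p, c in zip(probs, codewords))
    eff = entropy(probs) / (L * math.log2(r))
    return L, eff, 1 - eff

print(code_stats([0.5, 0.25, 0.125, 0.125], ["0", "10", "110", "111"]))   # (1.75, 1.0, 0.0)
print(code_stats([1/3, 1/3, 1/6, 1/6], ["0", "10", "110", "111"]))        # (2.0, ~0.959, ~0.041)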
• The output of a discrete source is given by, {x} = {𝒙𝟏 , 𝒙𝟐 ,…, 𝒙𝟔 } with probabilities {P} = {𝟐−𝟏 , 𝟐−𝟐 , 𝟐−𝟒 ,
𝟐−𝟒 , 𝟐−𝟒 , 𝟐−𝟒 }is encoded in the following ways:
(i) Determine which of these codes are uniquely decodable?
(ii) Determine which of these codes have prefix property?
(iii) Find average length of each uniquely decodable code?
𝑪𝟏 𝑪𝟐 𝑪𝟑 𝑪𝟒 𝑪𝟓 𝑪𝟔
𝒙𝟏 0 (1) 1 (1) 0 (1) 111 (3) 1 (1) 0 (1)
𝒙𝟐 10 (2) 011 (3) 10 (2) 110 (3) 01 (2) 01 (2)
𝒙𝟑 110 (3) 010 (3) 110 (3) 101 (3) 0011 (4) 011 (3)
𝒙𝟒 1110 (4) 001 (3) 1110 (4) 100 (3) 0010 (4) 0111 (4)
𝒙𝟓 1011 (4) 000 (3) 11110 (5) 011 (3) 0001 (4) 01111 (5)
𝒙𝟔 1101 (4) 110 (3) 111110 (6) 010 (3) 0000 (4) 011111 (6)
Instantaneous:        No   No   Yes  Yes  Yes  No
Uniquely Decodable:   No   No   Yes  Yes  Yes  Yes
(iii) Average lengths of the uniquely decodable codes, with {P} = {2^-1, 2^-2, 2^-4, 2^-4, 2^-4, 2^-4}:
L(C3) = 2.125, L(C4) = 3, L(C5) = 2 and L(C6) = 2.125 bits/symbol
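• The instantaneous (prefix) property of the codes above can be tested mechanically; a minimal sketch, assuming the codewords are given as strings:

def is_prefix_free(codewords):
    # a code is instantaneous iff no codeword is a prefix of another codeword
    return not any(i != j and b.startswith(a)
                   for i, a in enumerate(codewords)
                   for j, b in enumerate(codewords))

codes = {
    "C1": ["0", "10", "110", "1110", "1011", "1101"],
    "C3": ["0", "10", "110", "1110", "11110", "111110"],
    "C4": ["111", "110", "101", "100", "011", "010"],
    "C6": ["0", "01", "011", "0111", "01111", "011111"],
}
for name, cw in codes.items():
    print(name, is_prefix_free(cw))   # C3 and C4: True; C1 and C6: False (C6 is still uniquely decodable)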
• Statement: Given a source {S} = {s1, s2, ..., sq}, let the word lengths of the codes corresponding to
these symbols be {l1, l2, ..., lq} and let the code alphabet be {X} = {x1, x2, ..., xr}. Then an
instantaneous code for the source exists if and only if
Σ_{k=1}^{q} r^(-l_k) ≤ 1   [1]
Proof: Let us assume that the word lengths are arranged in ascending order, l1 ≤ l2 ≤ ... ≤ lq   [2]
Since the code alphabet has only r symbols, we can have at most r instantaneously decodable
codewords of length 1 while still satisfying the prefix property
• Let n_k be the actual number of messages encoded into codewords of length k; then
n1 ≤ r   [3]
• The actual number of instantaneous codewords of length 2 must obey the rule
n2 ≤ (r - n1) r   [4]
• since the first symbol can only be one of the (r - n1) symbols that are not used in forming the codewords of
length 1, and the second symbol of the sequence can be any one of the r code alphabet symbols
• Similarly, the actual number of codes of length 3 that are distinguishable from each other and
from the n1 and n2 shorter codewords must obey
n3 ≤ ((r - n1) r - n2) r
n3 ≤ r^3 - n1 r^2 - n2 r   [5]
• The first two symbols may be selected in (r - n1) r - n2 ways and the third symbol in r ways. Continuing
in this manner up to length k and dividing through by r^k, we can write
Σ_{j=1}^{k} n_j r^(-j) ≤ 1 ; equivalently Σ_{j=1}^{m} W_j D^(-j) ≤ 1   [7]
Since n_j codewords have length j,
Σ_{j=1}^{k} n_j r^(-j) = r^(-1) + ... + r^(-1) (n1 times) + r^(-2) + ... + r^(-2) (n2 times) + ... + r^(-k) + ... + r^(-k) (n_k times) = Σ_{k=1}^{q} r^(-l_k)
• This inequality only tells us whether an instantaneous code with the given word lengths exists; it does not show how to construct
the code, nor does it guarantee that a particular code whose word lengths satisfy the inequality is itself instantaneous
• A set of symbols is encoded into the binary codes shown below. Which of these are instantaneous?
Source Symbol  Code A  Code B  Code C  Code D  Code E
S1             00      0       0       0       0
S2             01      10000   10      1000    10
S4             110     1110    1110    111     1110
S5             1110    1101    11110   1011    11110
S6             1111    1111    11111   1100    1111
Soln., using the Kraft inequality with r = 2:
Code A: (3 x 2^-2 + 2^-3 + 2 x 2^-4) = 1
Code B: (2^-1 + 2^-5 + 4 x 2^-4) = 0.78125
Code C: (2^-1 + 2^-2 + 4 x 2^-4) = 1
Code E: (2^-1 + 2^-2 + 2^-3 + 2 x 2^-4 + 2^-5) = 1.031
Code E does not satisfy the Kraft inequality (sum > 1), hence no instantaneous code with these word lengths exists;
Code E is not instantaneous
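• A one-line Kraft-inequality check in Python (the word lengths below are the ones implied by the sums worked out above):

def kraft_sum(lengths, r=2):
    # an instantaneous code with these word lengths exists iff the sum is <= 1
    return sum(r ** (-l) for l in lengths)

print(kraft_sum([2, 2, 2, 3, 4, 4]))     # Code A: 1.0      -> possible
print(kraft_sum([1, 5, 4, 4, 4, 4]))     # Code B: 0.78125  -> possible
print(kraft_sum([1, 2, 3, 4, 4, 5]))     # Code E: 1.03125  -> greater than 1, not possible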
• The Kraft inequality applies to prefix codes, which are special cases of uniquely decodable codes. McMillan
proved that the same inequality is necessary for all uniquely decodable codes
• Statement: The Kraft-McMillan inequality states that we can construct a uniquely decodable code
with word lengths l1, l2, ..., lq if and only if these lengths satisfy the condition
Σ_{k=1}^{q} r^(-l_k) ≤ 1   [1]
Proof (necessity): raise the Kraft sum to the n-th power,
( Σ_{k=1}^{q} r^(-l_k) )^n = ( r^(-l1) + r^(-l2) + r^(-l3) + ... + r^(-lq) )^n   [2]
Expanding [2] we will have q^n terms, each of the form r^(-l_k1) r^(-l_k2) ... r^(-l_kn) = r^(-j), where
j = l_k1 + l_k2 + ... + l_kn
Suppose l is the maximum word length of the codes; then j can take values from n to nl
Let N_j be the number of terms of the form r^(-j); then Eqn. [2] can be written as
( Σ_{k=1}^{q} r^(-l_k) )^n = Σ_{j=n}^{nl} N_j r^(-j)   [3]
• N_j is also the number of strings of n codewords that can be formed so that each string has a length of
exactly j code symbols. If the code is uniquely decodable, then N_j ≤ r^j, the number of distinct r-ary code
sequences of length j. Hence
( Σ_{k=1}^{q} r^(-l_k) )^n ≤ Σ_{j=n}^{nl} r^j r^(-j)   [6]
= nl - n + 1   [7]
For a long sequence (n ≥ 1), ( Σ_{k=1}^{q} r^(-l_k) )^n ≤ nl   [8]
Since lim_{n→∞} (nl)^(1/n) = 1, it follows that Σ_{k=1}^{q} r^(-l_k) ≤ 1
• Let {X} = {x1, x2, ..., x7}. After encoding we get a set of messages with the following lengths: n1 = 2,
n2 = 2, n3 = 3, n4 = 3, n5 = 3, n6 = 4, n7 = 5. Length of the i-th code [n_i = 2, 2, 3, 3, 3, 4, 5];
[w_j = 0, 2, 3, 1, 1, 0, 0]. Prove that Σ_{i=1}^{N} D^(-n_i) = Σ_{j=1}^{m} W_j D^(-j)
Soln.,
LHS: Σ_{i=1}^{N} D^(-n_i) = D^(-2) + D^(-2) + D^(-3) + D^(-3) + D^(-3) + D^(-4) + D^(-5) = 2 D^(-2) + 3 D^(-3) + D^(-4) + D^(-5)   [1]
RHS: Σ_{j=1}^{m} W_j D^(-j) = 0.D^(-1) + 2 D^(-2) + 3 D^(-3) + D^(-4) + D^(-5)   [2]
LHS = RHS
• Find the smallest number of letters D in the code alphabet for devising a code with the prefix property
such that [w] = [0, 3, 0, 5]. Devise such a code.
Soln., the Kraft inequality Σ_{j=1}^{m} W_j D^(-j) ≤ 1 requires 3 D^(-2) + 5 D^(-4) ≤ 1
For D = 2: 3/4 + 5/16 = 17/16 > 1, so a binary alphabet is not sufficient
For D = 3: 3/9 + 5/81 = 32/81 ≤ 1, so the smallest alphabet size is D = 3
One such ternary prefix code: length-2 codewords 00, 01, 02 and length-4 codewords 1000, 1001, 1002, 1010, 1011
• Show all possible sets of binary codes with the prefix property for encoding the messages m1, m2, m3
in words not more than 3 digits long.
Soln., Σ_{j=1}^{m} W_j D^(-j) ≤ 1 with D = 2:
W1 2^(-1) + W2 2^(-2) + W3 2^(-3) ≤ 1, with W1 + W2 + W3 = 3
• Possible sets:
W1  W2  W3
1   1   1
1   2   0
1   0   2
0   3   0
0   0   3
0   2   1
0   1   2
• In information theory, Shannon's noiseless (source) coding theorem places upper and lower limits on the
minimum possible expected length of the codewords as a function of the entropy of the source and the
size of the code alphabet
• Statement: Let S be a zero memory source with q symbols, {S} = {s1, s2, ..., sq}, and symbol
probabilities {P} = {p1, p2, ..., pq} respectively. If the S ensemble is encoded into a sequence of uniquely
decodable codewords taken from a code alphabet of r symbols, then the average codeword length L satisfies
H(S)/log r ≤ L < H(S)/log r + 1   [1]
• Proof: Consider a zero memory source with q symbols, {S} = {s1, s2, ..., sq}, and symbol
probabilities {P} = {p1, p2, ..., pq}. Let us encode the symbols into r-ary codewords with word
lengths l1, l2, ..., lq; we shall first find a lower bound for the average length L = Σ_k p_k l_k of the codewords.
Let Q1, Q2, ..., Qq be any set of numbers such that Q_k ≥ 0 and Σ_{k=1}^{q} Q_k = 1   [2]
Consider the quantity H(S) - Σ_{k=1}^{q} p_k log (1/Q_k)   [3]
= Σ_{k=1}^{q} p_k log (1/p_k) - Σ_{k=1}^{q} p_k log (1/Q_k) = Σ_{k=1}^{q} p_k log (Q_k/p_k)   [4]
By the logarithmic inequality ln X ≤ X - 1, this quantity is ≤ log e Σ_k p_k (Q_k/p_k - 1) = 0, so
H(S) ≤ Σ_k p_k log (1/Q_k) for every choice of {Q_k}. Choosing Q_k = r^(-l_k) / Σ_j r^(-l_j) and using the
Kraft-McMillan inequality Σ_j r^(-l_j) ≤ 1 gives H(S) ≤ L log r, that is
L ≥ H(S)/log r   [15]
For equality we would need l_k = log_r (1/p_k), i.e., l_k = log2 (1/p_k) / log2 (r)   [17]
Eqn. [15] is the lower bound on L, the average word length of the code, expressed in code symbols per source symbol
• Each codeword must contain an integer number of code symbols, so the problem is what to select for the value of l_k,
the number of code symbols in codeword k corresponding to source symbol s_k, when the quantity in Eqn. [17] is not an integer
Since log_r (1/p_k) = log2 (1/p_k) / log2 (r), choose l_k as the smallest integer satisfying
log_r (1/p_k) ≤ l_k < log_r (1/p_k) + 1   [21]
(such lengths automatically satisfy the Kraft inequality). Multiplying [21] by p_k and summing over k,
Σ_{k=1}^{q} p_k log (1/p_k) / log r ≤ Σ_{k=1}^{q} p_k l_k < Σ_{k=1}^{q} p_k log (1/p_k) / log r + Σ_{k=1}^{q} p_k   [22]
H(S)/log r ≤ L < H(S)/log r + 1   [23]
• Applying the same result to the n-th extension S^n, for which H(S^n) = n H(S) and the average length is L_n,
H(S)/log r ≤ L_n/n < H(S)/log r + 1/n   [26]
For binary codes, H(S) ≤ L_n/n < H(S) + 1/n   [27]
lim_{n→∞} L_n/n = H(S)/log r (the lower and upper bounds converge)
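• The two bounds of the theorem can be checked numerically by choosing the integer lengths l_k = ⌈log_r(1/p_k)⌉ used in the proof; a minimal Python sketch for the binary case:

import math

def shannon_lengths(probs, r=2):
    # smallest integers l_k with r**l_k * p_k >= 1; they always satisfy the Kraft inequality
    return [math.ceil(math.log(1 / p, r)) for p in probs]

probs = [0.4, 0.3, 0.2, 0.1]
H = -sum(p * math.log2(p) for p in probs)
L = sum(p * l for p, l in zip(probs, shannon_lengths(probs)))
print(H, L, H + 1)      # 1.846... <= 2.4 < 2.846..., i.e. H(S) <= L < H(S) + 1 for r = 2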
1. Shannon Encoding
2. Shannon-Fano Encoding
3. Huffman Encoding
4. Arithmetic Coding
5. Run-Length Encoding
6. Lempel-Ziv Coding
• STEP 1: List the source symbols {S} = {s1, s2, ..., sq} in the order of decreasing probability of occurrence,
{P} = {p1, p2, p3, ..., pq}, such that p1 > p2 > ... > pq
• STEP 2: Compute the sequence
α1 = 0
α2 = p1
α3 = p1 + p2
...
α(q+1) = p1 + p2 + ... + pq
• STEP 3: Determine the set of integers l_k which are the smallest integer solutions of the inequality
2^(l_k) p_k ≥ 1 , k = 1, 2, ..., q
• STEP 4: Expand the decimal number α_k in binary form to l_k places and neglect the expansion
beyond l_k digits
• STEP 5: Removing the binary point results in the desired code
Numerical Problem on Shannon Encoding Procedure
• Consider the following ensemble {S} = {𝒔𝟏 , 𝒔𝟐 , … , 𝒔𝟒 } with {P} = {0.4, 0.3, 0.2, 0.1}. Encode the symbols
using Shannon’s binary encoding procedure and calculate the efficiency and redundancy of the code
Soln.,
• Arrange the probabilities, 𝒑𝟏 > 𝒑𝟐 > 𝒑𝟑 > 𝒑𝟒 => 0.4 > 0.3 > 0.2 > 0.1
• Compute the sequence α: α1 = 0 ; α2 = p1 = 0.4 ; α3 = p1 + p2 = 0.7 ; α4 = p1 + p2 + p3 = 0.9 ;
α5 = p1 + p2 + p3 + p4 = 1.0
• Find l_k from 2^(l_k) p_k ≥ 1:
l1 ≥ log2 (1/0.4) = 1.32 => l1 = 2
l2 ≥ log2 (1/0.3) = 1.74 => l2 = 2
l3 ≥ log2 (1/0.2) = 2.32 => l3 = 3
l4 ≥ log2 (1/0.1) = 3.32 => l4 = 4
• Expanding each α_k in binary to l_k places gives the codes:
Symbol  Code  Length
S1      00    2
S2      01    2
S3      101   3
S4      1110  4
• H(S) = 1.846 bits/symbol ; L = 2 (0.4) + 2 (0.3) + 3 (0.2) + 4 (0.1) = 2.4 bits/symbol
• Efficiency = H(S)/L = 76.9%
• Redundancy = 23.1%
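• A Python sketch of the Shannon encoding procedure above (the function name shannon_encode is illustrative); it reproduces the codes of this example:

import math

def shannon_encode(probs):
    # sort by decreasing probability, form cumulative sums alpha_k,
    # take l_k = ceil(log2(1/p_k)) bits of the binary expansion of alpha_k
    probs = sorted(probs, reverse=True)
    alphas, acc = [], 0.0
    for p in probs:
        alphas.append(acc)
        acc += p
    codes = []
    for p, a in zip(probs, alphas):
        l = math.ceil(math.log2(1 / p))
        bits, frac = "", a
        for _ in range(l):            # binary expansion of alpha to l places
            frac *= 2
            bits += str(int(frac))
            frac -= int(frac)
        codes.append(bits)
    return list(zip(probs, codes))

print(shannon_encode([0.4, 0.3, 0.2, 0.1]))
# [(0.4, '00'), (0.3, '01'), (0.2, '101'), (0.1, '1110')]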
• STEP 1: List the source symbols in the order of decreasing probability of occurrence
• STEP 2: Partition this ensemble into two almost equiprobable groups for binary coding (r groups for
r-ary coding)
• STEP 3: Assign '0' to one group and '1' to the other group (in general, assign one code symbol from the
code alphabet to each group respectively). This forms the starting code symbol of the codewords
• STEP 4: Repeat steps 2 and 3 on each of the sub-groups, until the sub-groups contain only one source
symbol, to determine the succeeding code symbols of the codewords
• Consider the message ensemble {S} = {s1, s2, ..., s8} with {P} = {1/4, 1/4, 1/8, 1/8, 1/16, 1/16, 1/16, 1/16}
and {X} = {0, 1}. Construct a binary code using the Shannon-Fano encoding procedure. Calculate η and E_c.
Soln., successive partitions give:
Symbol  Prob.  Code (Length)
s1      1/4    00   (2)
s2      1/4    01   (2)
s3      1/8    100  (3)
s4      1/8    101  (3)
s5      1/16   1100 (4)
s6      1/16   1101 (4)
s7      1/16   1110 (4)
s8      1/16   1111 (4)
• H(S) = 2 x (1/4) log2 4 + 2 x (1/8) log2 8 + 4 x (1/16) log2 16 = 2.75 bits/symbol
• L = 2 x (1/4) x 2 + 2 x (1/8) x 3 + 4 x (1/16) x 4 = 2.75 bits/symbol
• Efficiency, η = H(S)/L = 100 %
• Redundancy = 0
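• A recursive Python sketch of binary Shannon-Fano coding (the split point is chosen so that the two group probabilities are as equal as possible); it reproduces the code table above:

def shannon_fano(symbols):
    def split(items):
        if len(items) == 1:
            return {items[0][0]: ""}
        total, run, idx, best = sum(p for _, p in items), 0.0, 0, float("inf")
        for i in range(len(items) - 1):          # most even split point
            run += items[i][1]
            if abs(total - 2 * run) < best:
                best, idx = abs(total - 2 * run), i
        codes = {s: "0" + c for s, c in split(items[: idx + 1]).items()}
        codes.update({s: "1" + c for s, c in split(items[idx + 1 :]).items()})
        return codes
    items = sorted(symbols.items(), key=lambda kv: kv[1], reverse=True)
    return split(items)

probs = {f"s{i}": p for i, p in enumerate([1/4, 1/4, 1/8, 1/8, 1/16, 1/16, 1/16, 1/16], 1)}
print(shannon_fano(probs))
# {'s1': '00', 's2': '01', 's3': '100', 's4': '101', 's5': '1100', 's6': '1101', 's7': '1110', 's8': '1111'}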
• Construct a trinary code for symbols with {P} = {0.3, 0.3, 0.09, 0.09, 0.09, 0.09, 0.04} and {X} = {0, 1, 2} using
the Shannon-Fano encoding procedure
Soln., partitioning into three nearly equiprobable groups at each stage gives:
Symbol  Prob.  Code  Length
s1      0.30   0     1
s2      0.30   1     1
s3      0.09   20    2
s4      0.09   21    2
s5      0.09   220   3
s6      0.09   221   3
s7      0.04   222   3
• H(S) = 2.477 bits/symbol
• L = 1.62 trinits/symbol (r = 3)
• Efficiency = H(S) / (L log r) = 2.477 / (1.62 log2 3) = 96.53%
• Redundancy = 3.47%
Huffman Encoding Procedure
• A code with minimum average length L is the most efficient and has the minimum redundancy associated
with it
• A compact code is one which achieves this objective. Huffman suggested a simple method that guarantees an
optimal code
• Procedure:
• STEP 1: List the source symbols in the order of decreasing probability of occurrence
• STEP 2: Check whether q = r + α(r-1) is satisfied for some integer α. Otherwise, add a suitable number of dummy
symbols of zero probability of occurrence to satisfy the equation (this step is not needed for binary codes)
• STEP 3: Club the last ‘r’ symbols into a single composite symbol whose probabilities of occurrence is equal to
sum of probabilities of occurrence of ‘r’ symbols involved in this step
• STEP 4: A new list of events is recorded again to be in the order of decreasing probability
• STEP 5: Repeat steps 3 and 4 on the resulting set of symbols until in the final step exactly ‘r’ symbols are left
• STEP 6: Assign codes to the last ‘r’ composite symbols and work backwards to the original source to arrive at the
optimal code
• STEP 7: Discard the codes of dummy symbols (This step is not needed for binary codes)
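• A compact Python sketch of the binary Huffman procedure using a heap (names are illustrative). Tie-breaking between equal probabilities may differ from a hand construction, so individual codewords can differ from the tables in the examples below, but the set of codeword lengths and the average length are optimal either way:

import heapq, itertools

def huffman(probs):
    # repeatedly merge the two least probable nodes; returns {symbol: codeword}
    counter = itertools.count()                  # tie-breaker for equal probabilities
    heap = [(p, next(counter), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)          # smallest probability
        p2, _, c2 = heapq.heappop(heap)          # second smallest
        merged = {s: "0" + c for s, c in c2.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (p1 + p2, next(counter), merged))
    return heap[0][2]

probs = {"s1": 0.30, "s2": 0.25, "s3": 0.20, "s4": 0.10, "s5": 0.10, "s6": 0.05}
codes = huffman(probs)
print(codes, sum(probs[s] * len(c) for s, c in codes.items()))
# codeword lengths {2, 2, 2, 3, 4, 4} and average length L = 2.4 bits/symbol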
• Construct a Huffman code for symbols {S} = {s1, s2, ..., s6} and {P} = {0.3, 0.25, 0.2, 0.1, 0.1, 0.05} with
{X} = {0, 1}
• Soln., successive reductions (combining the two least probable entries at each stage and assigning '0'/'1' to the pair combined):
s1: 0.30  0.30  0.30  0.45  0.55
s2: 0.25  0.25  0.25  0.30  0.45
s3: 0.20  0.20  0.25  0.25
s4: 0.10  0.15  0.20
s5: 0.10  0.10
s6: 0.05
Working backwards from the final two composite symbols gives, for example,
Symbol  Code  Length
s1      00    2
s2      10    2
• Apply Huffman binary encoding procedure for the given symbols{S} = {𝒔𝟏 , 𝒔𝟐 , 𝒔𝟑 } with {P} =
{0.5,0.3,0.2} . Find the entropy and efficiency. If the same technique is applied for the second
order extension of the source calculate its entropy and efficiency.
Soln., reductions:
s1: 0.5  0.5
s2: 0.3  0.5
s3: 0.2
Symbol  Code  Length
s1      1     1
s2      00    2
s3      01    2
• H(S) = - Σ P(s_i) log2 P(s_i) = 1.485 bits/symbol
• L = 1.5 bits/symbol
• Efficiency = H(S)/L = 99 %
• Redundancy = 1%
• Second Extension:
Block  Symbols  Prob.
x1     s1s1     0.25
x2     s1s2     0.15
x3     s1s3     0.10
x4     s2s1     0.15
x5     s2s2     0.09
x6     s2s3     0.06
x7     s3s1     0.10
x8     s3s2     0.06
x9     s3s3     0.04
• H(S^2) = 2 H(S) = 2.97 bits/symbol; a Huffman code is then constructed for these nine block probabilities
in the same way to obtain the second-extension average length and efficiency
• Construct a Huffman code for the symbols {S} = {s1, s2, ..., s6} with {P} = {1/3, 1/4, 1/8, 1/8, 1/12, 1/12} and
{X} = {0, 1, 2}
Soln., check q = r + α(r - 1):
α = (q - r)/(r - 1) = (6 - 3)/(3 - 1) = 3/2 (not an integer)
Add one dummy symbol of zero probability of occurrence; now q = 7, r = 3:
α = (7 - 3)/(3 - 1) = 4/2 = 2 (α is an integer), so the reduction can proceed
• Procedure:
• STEP 1: Divide the interval from 0 to 1 among the given symbols, in the given order, in proportion to their probabilities
• STEP 2: Expand the interval of the first symbol to be coded. The new range is defined by calculating its limits:
New range (lower limit of each symbol) = lower limit + d (probability of the symbol), applied symbol by symbol,
where d = upper limit - lower limit of the current range
• STEP 3: Repeat the procedure for successive symbols until the final symbol in the given sequence is
encoded
• STEP 4: The tag (arithmetic codeword) is taken from the final range, e.g., as the midpoint of its limits
• STEP 5: Decoding Process: the tag value is used to decode the symbols assigned with their probabilities
• STEP 6: The probabilities of the symbols are arranged in the given format, ranging from 0 to 1
• STEP 7: The range in which the tag value falls now becomes the new range, with its lower and upper bounds.
The new lower limit of each symbol within this range is calculated by
New range (lower limit of each symbol) = lower limit + d (probability of the symbol)
• STEP 8: This procedure continues until all the symbols are decoded
• STEP 9: The symbols within whose sub-ranges the tag value falls at each stage are stored as a sequence, forming the
final decoded data
Arithmetic Coding
• Using Arithmetic Coding, encode the message MATHS with probabilities A = 0.3 ; T = 0.3 ; H = 0.2 ; M =
0.1 ; S = 0.1. Generate the tag value.
Soln., STEP 1: divide [0, 1) among the symbols: A [0, 0.3), T [0.3, 0.6), H [0.6, 0.8), M [0.8, 0.9), S [0.9, 1.0)
STEP 2: the first symbol is M, so the working range becomes [0.8, 0.9)
• d = Upper bound - Lower bound = 0.9 - 0.8 = 0.1, and the range of each symbol becomes
Lower limit + d (prob. of symbol):
• Range of "A" = 0.8 + 0.1 (0.3) = 0.83, i.e., A [0.8, 0.83)
• Range of "T" = 0.83 + 0.1 (0.3) = 0.86, i.e., T [0.83, 0.86)
• Range of "H" = 0.86 + 0.1 (0.2) = 0.88, i.e., H [0.86, 0.88)
• Range of "M" = 0.88 + 0.1 (0.1) = 0.89, i.e., M [0.88, 0.89)
• Range of "S" = 0.89 + 0.1 (0.1) = 0.90, i.e., S [0.89, 0.90)
• Repeating for the remaining symbols: after A the range is [0.8, 0.83); after T, [0.809, 0.818);
after H, [0.8144, 0.8162); after S, [0.81602, 0.8162)
• The arithmetic codeword from the encoding process is obtained as the midpoint of the final range
= (0.8162 + 0.81602) / 2
TAG = 0.81611
• Using Arithmetic Coding, decode the message with tag value 0.572 given in the source with probabilities
A = 0.1 ; B = 0.4 ; C = 0.5
Soln., divide [0, 1): A [0, 0.1), B [0.1, 0.5), C [0.5, 1.0)
• 0.572 lies in [0.5, 1.0) → C ; d = 0.5, and the range is subdivided as A [0.5, 0.55), B [0.55, 0.75), C [0.75, 1.0)
• 0.572 lies in [0.55, 0.75) → B ; d = 0.2, subdivided as A [0.55, 0.57), B [0.57, 0.65), C [0.65, 0.75)
• 0.572 lies in [0.57, 0.65) → B ; d = Upper bound - Lower bound = 0.65 - 0.57 = 0.08, subdivided as
A [0.57, 0.578), B [0.578, 0.61), C [0.61, 0.65)
• 0.572 lies in [0.57, 0.578) → A
• The decoded sequence so far is C B B A; the procedure continues until the required number of symbols has been recovered
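• Both the encoding (tag generation) and the table-driven decoding above can be expressed in a few lines; a minimal Python sketch, with the symbols laid out on [0, 1) in the same order as in the MATHS example:

def intervals(probs):
    # cumulative [low, high) interval for each symbol, in the listed order
    out, low = {}, 0.0
    for sym, p in probs:
        out[sym] = (low, low + p)
        low += p
    return out

def arithmetic_tag(message, probs):
    low, high = 0.0, 1.0
    iv = intervals(probs)
    for sym in message:                          # narrow the range once per symbol
        d = high - low
        s_low, s_high = iv[sym]
        low, high = low + d * s_low, low + d * s_high
    return (low + high) / 2                      # any value in the final range is a valid tag

def arithmetic_decode(tag, probs, n):
    iv, out, low, high = intervals(probs), [], 0.0, 1.0
    for _ in range(n):                           # locate the tag in successive sub-ranges
        d = high - low
        for sym, (s_low, s_high) in iv.items():
            if low + d * s_low <= tag < low + d * s_high:
                out.append(sym)
                low, high = low + d * s_low, low + d * s_high
                break
    return "".join(out)

probs = [("A", 0.3), ("T", 0.3), ("H", 0.2), ("M", 0.1), ("S", 0.1)]
tag = arithmetic_tag("MATHS", probs)
print(tag, arithmetic_decode(tag, probs, 5))     # ~0.81611  MATHS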
• Procedure:
• STEP 1: The sequence at the output of a discrete source is written as runs, i.e., the number of times each bit value
occurs consecutively
• STEP 2: Every run is represented as (bit value, number of occurrences in the run)
• STEP 3: If the maximum number of occurrences over all the runs is denoted as n, then the length-of-occurrence
field is log2(n) bits (if this is not an integer, take the next integer)
• STEP 4: The number of occurrences in each run is replaced by its binary form with a length of
log2(n) bits
• STEP 5: The final sequence of the compressed form is written as the encoded output
Run Length Encoding
• Using the Run Length Encoding technique, encode the given bit stream 00000111110010000101.
Soln., Original bit stream: 00000111110010000101
Runs: (0,5) (1,5) (0,2) (1,1) (0,4) (1,1) (0,1) (1,1); the maximum run is 5, so each count is written with
⌈log2 5⌉ = 3 bits
• Using Run Length Encoding Technique, encode the given bit stream.
000000111111111111110000000000000111111111
• Using the Run Length Encoding technique, encode the given symbol stream AAAAABBBBCCCDEEEFFFFGG.
Soln., Original symbol stream: AAAAABBBBCCCDEEEFFFFGG
Encoded output: A5B4C3D1E3F4G2
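• A Python sketch of the run-length procedure; the count width here is taken as ⌈log2(max run + 1)⌉ bits so that the largest run always fits (this gives the same 3 bits as the rule above for the first bit stream):

import math
from itertools import groupby

def runs(stream):
    # list of (value, run length) pairs for consecutive repeats
    return [(v, len(list(g))) for v, g in groupby(stream)]

def rle_binary(stream):
    # pack each run as bit value followed by the count in fixed-width binary
    r = runs(stream)
    width = math.ceil(math.log2(max(n for _, n in r) + 1))
    return "".join(v + format(n, f"0{width}b") for v, n in r)

print(runs("AAAAABBBBCCCDEEEFFFFGG"))   # [('A', 5), ('B', 4), ('C', 3), ('D', 1), ('E', 3), ('F', 4), ('G', 2)]
print(rle_binary("00000111110010000101"))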
• The major difficulty in using a Huffman code is that the symbol probabilities must be known or estimated,
and both the encoder and the decoder must know the coding tree
• If a tree is constructed for an unusual alphabet, a channel connecting the encoder and the decoder must also
deliver the coding tree as a header (for the compressed file)
• The Lempel-Ziv algorithm is designed to be independent of the source probabilities. It is a variable-length to
fixed-length coding algorithm
• Procedure:
• STEP 1: The sequence at the output of a discrete source is divided into variable-length blocks, which
are called phrases
• STEP 2: A new phrase is introduced every time a block of letters from the source differs from some
previous phrase in its last letter
• STEP 3: Phrases are listed in a dictionary, which stores the location of each existing phrase
• STEP 4: When encoding a new phrase, simply specify the location of the existing phrase in the dictionary
and append the new letter
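• A minimal Python sketch of the dictionary-based parsing described above (LZ78-style); each output pair is (index of the longest previously seen phrase that is a prefix, new final letter), with index 0 standing for the empty phrase:

def lz_parse(data):
    dictionary = {"": 0}
    output, phrase = [], ""
    for ch in data:
        if phrase + ch in dictionary:            # keep extending an existing phrase
            phrase += ch
        else:                                    # new phrase: emit (prefix index, new letter)
            output.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:                                   # flush a trailing block that matched an existing phrase
        output.append((dictionary[phrase], ""))
    return output

print(lz_parse("0010110100"))
# phrases 0, 01, 011, 010, 0 -> [(0, '0'), (1, '1'), (2, '1'), (2, '0'), (1, '')]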