Information Theory Module 3
Dr. Markkandan S
2 Huffman Coding
4 Arithmetic Coding
We know 2⁵ = 32 > 26.
Hence, each of the 26 letters can be uniquely represented using a fixed length of 5 bits.
Allotting an equal number of bits to frequently used and infrequently used letters is not an efficient approach.
We have to represent more frequently occurring letters with fewer bits, using a Variable Length Code (VLC).
Prefix Condition: No codeword forms a prefix of another codeword (VLC1 satisfies the prefix condition; VLC2 does not).
Instantaneous Codes: As soon as the sequence of bits corresponding to any one of the possible codewords is detected, the symbol can be decoded.
Uniquely Decodable Codes: The encoded string can be generated by only one possible input string; we may have to wait until the entire string is received before decoding even the first symbol.
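The prefix condition can be checked mechanically. Below is a minimal Python sketch; the two example codes are hypothetical stand-ins (the actual VLC1/VLC2 codeword tables are not reproduced in this excerpt), chosen only so that one is prefix-free and one is not.

def is_prefix_free(code):
    """True if no codeword is a prefix of another (prefix condition)."""
    words = sorted(code.values())
    # after sorting, any prefix relationship shows up between neighbours
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

# Hypothetical variable-length codes for a four-symbol source
vlc_a = {"A": "0", "B": "10", "C": "110", "D": "111"}   # prefix-free
vlc_b = {"A": "0", "B": "01", "C": "011", "D": "111"}   # '0' is a prefix of '01'
print(is_prefix_free(vlc_a), is_prefix_free(vlc_b))     # True False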
VLC2 is not a uniquely decodable code; VLC1 is a uniquely decodable code.
Proof:
A six-symbol source is encoded into the binary codes shown below. Which of these codes are instantaneous?
∑_{k=1}^{L} 2^(−n_k) = 2^(−1) + 2^(−2) + 2^(−3) + 2^(−3) = 0.5 + 0.25 + 0.125 + 0.125 = 1
Hence the Kraft inequality is satisfied.
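The Kraft sum above can be verified with a short Python sketch; the codeword lengths [1, 2, 3, 3] are simply those appearing in the computation above.

def kraft_sum(lengths, r=2):
    """Sum of r^(-n_k) over the codeword lengths; a value <= 1 means the
    Kraft inequality holds and a prefix code with these lengths exists."""
    return sum(r ** (-n) for n in lengths)

lengths = [1, 2, 3, 3]          # codeword lengths from the example above
print(kraft_sum(lengths))       # 1.0 -> Kraft inequality satisfied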
Source Coding Theorem
Statement:
Let X be the ensemble of letters from a DMS with finite entropy H(X) and output symbols x_k, k = 1, 2, …, L, occurring with probabilities P(x_k), k = 1, 2, …, L.
It is possible to construct a code that satisfies the prefix condition and has an average length R̄ that satisfies the inequality
H(X) ≤ R̄ < H(X) + 1
The efficiency of a prefix code is
η = H(X) / R̄
The redundancy of the code is
E = 1 − η
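As an illustration, here is a small Python sketch that computes H(X), R̄, η and E. The probability distribution and codeword lengths are hypothetical, chosen only so that the bound H(X) ≤ R̄ < H(X) + 1 is easy to see.

from math import log2

def code_performance(probs, lengths):
    """Entropy H(X), average length R-bar, efficiency eta and redundancy E
    of a prefix code with the given codeword lengths."""
    H = -sum(p * log2(p) for p in probs)             # source entropy (bits/symbol)
    R = sum(p * l for p, l in zip(probs, lengths))   # average codeword length
    eta = H / R
    return H, R, eta, 1 - eta

# Hypothetical source and codeword lengths
print(code_performance([0.5, 0.25, 0.125, 0.125], [1, 2, 3, 3]))   # (1.75, 1.75, 1.0, 0.0)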
This algorithm is optimal in the sense that the average number of bits required to represent the source symbols is a minimum, provided the prefix condition is met.
Steps:
1. Arrange the source symbols in decreasing order of their probabilities.
2. Take the bottom two symbols and tie them together. Add the probabilities of the two symbols and write the sum on the combined branch, labelling the two branches with a '1' and a '0'.
3. Treat this sum of probabilities as a new probability associated with a new symbol. Again pick the two smallest probabilities and tie them together. Each time we do this, the total number of symbols is reduced by one.
4. Continue this procedure until only one probability is left. This completes the construction of the Huffman tree.
5. To find the prefix codeword for any symbol, follow the branches from the final node back to that symbol (a code sketch of these steps follows below).
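Below is a minimal Python sketch of the procedure above, using a heap to repeatedly merge the two smallest probabilities; the four-symbol distribution is an assumed example, not one taken from the slides.

import heapq

def huffman_code(probs):
    """Binary Huffman code for a dict {symbol: probability}."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)    # smallest probability
        p1, _, c1 = heapq.heappop(heap)    # second smallest probability
        # prepend '0' to one branch and '1' to the other, then merge
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        count += 1
        heapq.heappush(heap, (p0 + p1, count, merged))
    return heap[0][2]

probs = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}   # assumed example distribution
print(huffman_code(probs))   # e.g. {'A': '0', 'B': '10', 'C': '111', 'D': '110'}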
Construct a quaternary Huffman code for the following set of message symbols with the respective probabilities:

Symbol:      A     B     C     D     E     F     G     H
Probability: 0.22  0.20  0.18  0.15  0.10  0.08  0.05  0.02

Step 1: Number of stages n = (N − r)/(r − 1) = (8 − 4)/(4 − 1) = 4/3, which is not an integer.
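Since (N − r)/(r − 1) is not an integer, the standard remedy is to add zero-probability dummy symbols until every merging stage can combine exactly r = 4 branches. The small helper below is an illustrative sketch of that count, not code from the slides.

def dummy_symbols_needed(N, r):
    """Dummy symbols to add so that (N + d - r) is divisible by (r - 1),
    i.e. so an r-ary Huffman tree merges exactly r branches at every stage."""
    d = 0
    while (N + d - r) % (r - 1) != 0:
        d += 1
    return d

print(dummy_symbols_needed(8, 4))   # 2 dummy symbols for this example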
Codes that use codeword lengths of l(x) = ⌈log₂(1/P(x))⌉ are called Shannon codes. Shannon codeword lengths satisfy the Kraft inequality.
Steps:
1. Given the source alphabet S and the corresponding probabilities P for a given information source.
2. Arrange the probabilities in non-increasing order.
3. Compute the length l_i of the codeword corresponding to each symbol s_i from its probability p_i using
l_i ≥ log₂(1/p_i)
2. Find the minimum integer value of l_i such that l_i ≥ log₂(1/p_i):
l₁ ≥ log₂(1/0.4) ⟹ l₁ = 2
l₂ ≥ log₂(1/0.3) ⟹ l₂ = 2
l₃ ≥ log₂(1/0.2) ⟹ l₃ = 3
l₄ ≥ log₂(1/0.1) ⟹ l₄ = 4
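These lengths can be reproduced directly from l_i = ⌈log₂(1/p_i)⌉; the short sketch below uses the probabilities 0.4, 0.3, 0.2, 0.1 from the example above.

from math import ceil, log2

def shannon_lengths(probs):
    """Shannon codeword lengths l_i = ceil(log2(1/p_i))."""
    return [ceil(log2(1 / p)) for p in probs]

probs = [0.4, 0.3, 0.2, 0.1]     # probabilities from the example above
print(shannon_lengths(probs))    # [2, 2, 3, 4]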
This is an improvement over Shannon's first algorithm: it offers better coding efficiency.
Steps:
1. Arrange the probabilities in non-increasing order.
2. Group the probabilities into exactly two sets such that the sums of the probabilities in the two groups are as nearly equal as possible.
3. Assign bit '0' to all elements of the first group and bit '1' to all elements of the second group.
4. Repeat Step 2 by dividing each group into two subgroups until no further division is possible (a code sketch of this recursion follows below).
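A minimal Python sketch of this recursive splitting; the four-symbol distribution is an assumed example, not one taken from the slides.

def fano_code(symbols):
    """symbols: list of (symbol, prob) sorted by non-increasing probability.
    Recursively split into two groups of nearly equal total probability."""
    if len(symbols) <= 1:
        return {s: "" for s, _ in symbols}
    total = sum(p for _, p in symbols)
    # find the split point that makes the two group sums as close as possible
    best_i, best_diff, running = 1, float("inf"), 0.0
    for i in range(1, len(symbols)):
        running += symbols[i - 1][1]
        diff = abs(running - (total - running))
        if diff < best_diff:
            best_diff, best_i = diff, i
    left = fano_code(symbols[:best_i])     # first group gets prefix '0'
    right = fano_code(symbols[best_i:])    # second group gets prefix '1'
    code = {s: "0" + w for s, w in left.items()}
    code.update({s: "1" + w for s, w in right.items()})
    return code

symbols = [("A", 0.4), ("B", 0.3), ("C", 0.2), ("D", 0.1)]   # assumed example
print(fano_code(symbols))   # {'A': '0', 'B': '10', 'C': '110', 'D': '111'}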
where F̄(x) represents the sum of the probabilities of all symbols less than x plus half the probability of the symbol x.
Note: In this code, there is no need to arrange the probabilities in descending order.
Example : Shannon-Fano-Elias Coding
PROBLEM:
Construct a Shannon-Fano-Elias code for the source symbols x₁, x₂, x₃, x₄ with probabilities 1/2, 1/2², 1/2³, 1/2³.
STEPS:
1. Find F(x) = ∑_{z≤x} P(z) (add all previous and current probabilities of the symbol)
2. Find F̄(x) = ∑_{z<x} P(z) + ½ P(x) (add all probabilities of symbols less than x and half the current probability of the symbol)
Symbol   Probability   F(x)     F̄(x)
x₁       1/2           0.5      0.25
x₂       1/2²          0.75     0.625
x₃       1/2³          0.875    0.8125
x₄       1/2³          1        0.9375

For example, F̄(x₃) = 1/2 + 1/2² + (1/2³)/2 = 0.8125
3. Find F̄(x) in binary form (convert the decimal values into binary)

Symbol   Probability   F(x)     F̄(x)     F̄(x) in binary
x₁       1/2           0.5      0.25     0.01
x₂       1/2²          0.75     0.625    0.101
x₃       1/2³          0.875    0.8125   0.1101
x₄       1/2³          1        0.9375   0.1111
For example, to convert F̄(x₃) = 0.8125 into binary:
0.8125 × 2 = 1.625  → 1
0.625 × 2 = 1.25    → 1
0.25 × 2 = 0.5      → 0
0.5 × 2 = 1.0       → 1
⟹ F̄(x₃) = (0.1101)₂
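The repeated-doubling conversion used above can be written as a small helper; this is an illustrative sketch, not code from the slides.

def frac_to_binary(x, bits):
    """Binary expansion of a fraction 0 <= x < 1 by repeated doubling."""
    out = ""
    for _ in range(bits):
        x *= 2
        out += str(int(x))   # the integer part is the next binary digit
        x -= int(x)
    return out

print(frac_to_binary(0.8125, 4))   # '1101'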
4. Determine the length of the codeword using l(x) = ⌈log₂(1/P(x))⌉ + 1
5. Write the codeword by taking the first l(x) bits of F̄(x) in binary
Symbol   Probability   F(x)     F̄(x)     F̄(x) in binary   l(x)   Code
x₁       1/2           0.5      0.25     0.01             2      01
x₂       1/2²          0.75     0.625    0.101            3      101
x₃       1/2³          0.875    0.8125   0.1101           4      1101
x₄       1/2³          1        0.9375   0.1111           4      1111

For example, to find the codeword for x₃: F̄(x₃) in binary is 0.1101 and l(x₃) = 4, hence the codeword is 1101.
6. The entropy of this source is 1.75 bits
7. The average codeword length is 2.75 bits
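The whole construction can be reproduced in a few lines of Python. This sketch follows steps 1-5 above for the probabilities 1/2, 1/4, 1/8, 1/8 and reproduces the codewords in the table.

from math import ceil, log2

def sfe_code(probs):
    """Shannon-Fano-Elias code for a list of (symbol, probability) pairs."""
    codes, F = {}, 0.0
    for s, p in probs:
        Fbar = F + p / 2                  # F-bar(x): previous cumulative + p/2
        l = ceil(log2(1 / p)) + 1         # codeword length
        bits, frac = "", Fbar             # take the first l bits of F-bar(x) in binary
        for _ in range(l):
            frac *= 2
            bits += str(int(frac))
            frac -= int(frac)
        codes[s] = bits
        F += p
    return codes

probs = [("x1", 1/2), ("x2", 1/4), ("x3", 1/8), ("x4", 1/8)]
print(sfe_code(probs))   # {'x1': '01', 'x2': '101', 'x3': '1101', 'x4': '1111'}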
Huffman codes are optimal only if the probabilities of the symbols are negative integer powers of two, because all prefix codes work at the bit level.
1. Prefix codes try to match the self-information of the symbols using codewords whose lengths are integers. This length matching may give a codeword either longer or shorter than the self-information.
2. If prefix codes are generated using a binary tree, the decisions between tree branches always take one bit.
3. Arithmetic coding does not have this restriction: it works by representing the file to be encoded by an interval of real numbers between 0 and 1. Successive symbols in the message reduce this interval in accordance with the probability of that symbol. The more likely symbols reduce the range by less and thus add fewer bits to the message.
Step 2: The first letter to be encoded is 'B'; the corresponding interval is [0.5, 0.75).
Step 9: Hence the codeword for 'BACA' lies anywhere in the interval [0.59375, 0.609375). We choose the minimum value of the interval, 0.59375.
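A minimal sketch of the interval-narrowing computation. The symbol model used here (A → [0, 0.5), B → [0.5, 0.75), C → [0.75, 1)) is an assumption consistent with the intervals quoted above; with it the sketch reproduces the final interval [0.59375, 0.609375).

def arithmetic_encode(message, model):
    """Narrow [low, high) by each symbol's cumulative interval.
    model: {symbol: (cum_low, cum_high)} partitioning [0, 1)."""
    low, high = 0.0, 1.0
    for sym in message:
        c_low, c_high = model[sym]
        width = high - low
        low, high = low + width * c_low, low + width * c_high
        print(sym, (low, high))
    return low   # minimum value of the final interval is used as the tag

# Assumed symbol model consistent with the slide's intervals
model = {"A": (0.0, 0.5), "B": (0.5, 0.75), "C": (0.75, 1.0)}
print(arithmetic_encode("BACA", model))   # final interval [0.59375, 0.609375), tag 0.59375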