Information Theory Module 3
Dr. Markkandan S
2 Huffman Coding
4 Arithmetic Coding
We know 2⁵ = 32 > 26.
Hence, each of the 26 letters can be uniquely represented using a fixed length of 5 bits.
Allotting an equal number of bits to frequently used and infrequently used letters is not an efficient approach.
We have to represent more frequently occurring letters with fewer bits, using a Variable Length Code (VLC).
Prefix Condition: No codeword forms a prefix of another codeword (VLC1 satisfies the prefix condition; VLC2 does not).
Instantaneous Codes: As soon as the sequence of bits corresponding to any one of the possible codewords is detected, the symbol can be decoded.
Uniquely Decodable Codes: The encoded string can be generated by only one possible input string; we may have to wait until the entire string is received before decoding even the first symbol.
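The prefix condition can be checked mechanically. Below is a minimal Python sketch; the two example codes are hypothetical stand-ins (the actual VLC1/VLC2 codeword tables are not reproduced in this excerpt), chosen only so that one is prefix-free and one is not.

def is_prefix_free(code):
    """True if no codeword is a prefix of another (prefix condition)."""
    words = sorted(code.values())
    # after sorting, any prefix relationship shows up between neighbours
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

# Hypothetical variable-length codes for a four-symbol source
vlc_a = {"A": "0", "B": "10", "C": "110", "D": "111"}   # prefix-free
vlc_b = {"A": "0", "B": "01", "C": "011", "D": "111"}   # '0' is a prefix of '01'
print(is_prefix_free(vlc_a), is_prefix_free(vlc_b))     # True False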
VLC2 is not a uniquely decodable code; VLC1 is a uniquely decodable code.
Proof:
A six-symbol source is encoded into the binary codes shown below. Which of these codes are instantaneous?
∑_{k=1}^{L} 2^(−n_k) = 2^(−1) + 2^(−2) + 2^(−3) + 2^(−3) = 0.5 + 0.25 + 0.125 + 0.125 = 1
Hence the Kraft inequality is satisfied.
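The Kraft sum above can be verified with a short Python sketch; the codeword lengths [1, 2, 3, 3] are simply those appearing in the computation above.

def kraft_sum(lengths, r=2):
    """Sum of r^(-n_k) over the codeword lengths; a value <= 1 means the
    Kraft inequality holds and a prefix code with these lengths exists."""
    return sum(r ** (-n) for n in lengths)

lengths = [1, 2, 3, 3]          # codeword lengths from the example above
print(kraft_sum(lengths))       # 1.0 -> Kraft inequality satisfied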
Source Coding Theorem
Statement:
Let X be the ensemble of letters from a DMS with finite entropy H(X) and output symbols x_k, k = 1, 2, …, L, occurring with probabilities P(x_k), k = 1, 2, …, L.
It is possible to construct a code that satisfies the prefix condition and has an average length R̄ that satisfies the inequality
H(X) ≤ R̄ < H(X) + 1
The efficiency of a prefix code is
η = H(X) / R̄
The redundancy of the code is
E = 1 − η
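As an illustration, here is a small Python sketch that computes H(X), R̄, η and E. The probability distribution and codeword lengths are hypothetical, chosen only so that the bound H(X) ≤ R̄ < H(X) + 1 is easy to see.

from math import log2

def code_performance(probs, lengths):
    """Entropy H(X), average length R-bar, efficiency eta and redundancy E
    of a prefix code with the given codeword lengths."""
    H = -sum(p * log2(p) for p in probs)             # source entropy (bits/symbol)
    R = sum(p * l for p, l in zip(probs, lengths))   # average codeword length
    eta = H / R
    return H, R, eta, 1 - eta

# Hypothetical source and codeword lengths
print(code_performance([0.5, 0.25, 0.125, 0.125], [1, 2, 3, 3]))   # (1.75, 1.75, 1.0, 0.0)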
This algorithm is optimal in the sense that the average number of bits required to represent the source symbols is a minimum, provided the prefix condition is met.
Steps:
1. Arrange the source symbols in decreasing order of their probabilities.
2. Take the bottom two symbols and tie them together. Add the probabilities of the two symbols and write the sum on the combined branch, labelling the two branches with a '1' and a '0'.
3. Treat this sum of probabilities as a new probability associated with a new symbol. Again pick the two smallest probabilities and tie them together. Each time we do this, the total number of symbols is reduced by one.
4. Continue this procedure until only one probability is left. This completes the construction of the Huffman tree.
5. To find the prefix codeword for any symbol, follow the branches from the final node back to that symbol (a code sketch of these steps follows below).
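Below is a minimal Python sketch of the procedure above, using a heap to repeatedly merge the two smallest probabilities; the four-symbol distribution is an assumed example, not one taken from the slides.

import heapq

def huffman_code(probs):
    """Binary Huffman code for a dict {symbol: probability}."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)    # smallest probability
        p1, _, c1 = heapq.heappop(heap)    # second smallest probability
        # prepend '0' to one branch and '1' to the other, then merge
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        count += 1
        heapq.heappush(heap, (p0 + p1, count, merged))
    return heap[0][2]

probs = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}   # assumed example distribution
print(huffman_code(probs))   # e.g. {'A': '0', 'B': '10', 'C': '111', 'D': '110'}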
Construct a quaternary Huffman code for the following set of message symbols with the respective probabilities:

Symbol:      A     B     C     D     E     F     G     H
Probability: 0.22  0.20  0.18  0.15  0.10  0.08  0.05  0.02

Step 1: Number of stages n = (N − r)/(r − 1) = (8 − 4)/(4 − 1) = 4/3, which is not an integer.
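Since (N − r)/(r − 1) is not an integer, the standard remedy is to add zero-probability dummy symbols until every merging stage can combine exactly r = 4 branches. The small helper below is an illustrative sketch of that count, not code from the slides.

def dummy_symbols_needed(N, r):
    """Dummy symbols to add so that (N + d - r) is divisible by (r - 1),
    i.e. so an r-ary Huffman tree merges exactly r branches at every stage."""
    d = 0
    while (N + d - r) % (r - 1) != 0:
        d += 1
    return d

print(dummy_symbols_needed(8, 4))   # 2 dummy symbols for this example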
Codes that use codeword lengths of l(x) = ⌈log₂(1/P(x))⌉ are called Shannon codes. Shannon codeword lengths satisfy the Kraft inequality.
Steps:
1. Given the source alphabet S and the corresponding probabilities P for a given information source.
2. Arrange the probabilities in non-increasing order.
3. Compute the length l_i of the codeword corresponding to each symbol s_i from its probability p_i using
l_i ≥ log₂(1/p_i)
2. Find the minimum integer value of l_i such that l_i ≥ log₂(1/p_i):
l₁ ≥ log₂(1/0.4) ⟹ l₁ = 2
l₂ ≥ log₂(1/0.3) ⟹ l₂ = 2
l₃ ≥ log₂(1/0.2) ⟹ l₃ = 3
l₄ ≥ log₂(1/0.1) ⟹ l₄ = 4
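These lengths can be reproduced directly from l_i = ⌈log₂(1/p_i)⌉; the short sketch below uses the probabilities 0.4, 0.3, 0.2, 0.1 from the example above.

from math import ceil, log2

def shannon_lengths(probs):
    """Shannon codeword lengths l_i = ceil(log2(1/p_i))."""
    return [ceil(log2(1 / p)) for p in probs]

probs = [0.4, 0.3, 0.2, 0.1]     # probabilities from the example above
print(shannon_lengths(probs))    # [2, 2, 3, 4]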
This is an improvement over Shannon's first algorithm: it offers better coding efficiency.
Steps:
1. Arrange the probabilities in non-increasing order.
2. Group the probabilities into exactly two sets such that the sums of the probabilities in the two groups are as nearly equal as possible.
3. Assign bit '0' to all elements of the first group and bit '1' to all elements of the second group.
4. Repeat Step 2 by dividing each group into two subgroups until no further division is possible (a code sketch of this recursion follows below).
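A minimal Python sketch of this recursive splitting; the four-symbol distribution is an assumed example, not one taken from the slides.

def fano_code(symbols):
    """symbols: list of (symbol, prob) sorted by non-increasing probability.
    Recursively split into two groups of nearly equal total probability."""
    if len(symbols) <= 1:
        return {s: "" for s, _ in symbols}
    total = sum(p for _, p in symbols)
    # find the split point that makes the two group sums as close as possible
    best_i, best_diff, running = 1, float("inf"), 0.0
    for i in range(1, len(symbols)):
        running += symbols[i - 1][1]
        diff = abs(running - (total - running))
        if diff < best_diff:
            best_diff, best_i = diff, i
    left = fano_code(symbols[:best_i])     # first group gets prefix '0'
    right = fano_code(symbols[best_i:])    # second group gets prefix '1'
    code = {s: "0" + w for s, w in left.items()}
    code.update({s: "1" + w for s, w in right.items()})
    return code

symbols = [("A", 0.4), ("B", 0.3), ("C", 0.2), ("D", 0.1)]   # assumed example
print(fano_code(symbols))   # {'A': '0', 'B': '10', 'C': '110', 'D': '111'}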
where F̄(x) represents the sum of the probabilities of all symbols less than x plus half the probability of the symbol x.
Note: In this code, there is no need to arrange the probabilities in descending order.
Example : Shannon-Fano-Elias Coding
PROBLEM:
Construct a Shannon-Fano-Elias code for the source symbols x₁, x₂, x₃, x₄ with probabilities 1/2, 1/2², 1/2³, 1/2³.
STEPS:
1. Find F(x) = ∑_{z≤x} P(z) (add all previous and current probabilities of the symbol)
2. Find F̄(x) = ∑_{z<x} P(z) + ½ P(x) (add all probabilities of symbols less than x and half the current probability of the symbol)
Symbol   Probability   F(x)     F̄(x)
x₁       1/2           0.5      0.25
x₂       1/2²          0.75     0.625
x₃       1/2³          0.875    0.8125
x₄       1/2³          1        0.9375

For example, F̄(x₃) = 1/2 + 1/2² + (1/2³)/2 = 0.8125
3. Find F̄(x) in binary form (convert the decimal values into binary)

Symbol   Probability   F(x)     F̄(x)     F̄(x) in binary
x₁       1/2           0.5      0.25     0.01
x₂       1/2²          0.75     0.625    0.101
x₃       1/2³          0.875    0.8125   0.1101
x₄       1/2³          1        0.9375   0.1111
For example, to convert F̄(x₃) = 0.8125 into binary:
0.8125 × 2 = 1.625  → 1
0.625 × 2 = 1.25    → 1
0.25 × 2 = 0.5      → 0
0.5 × 2 = 1.0       → 1
⟹ F̄(x₃) = (0.1101)₂
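The repeated-doubling conversion used above can be written as a small helper; this is an illustrative sketch, not code from the slides.

def frac_to_binary(x, bits):
    """Binary expansion of a fraction 0 <= x < 1 by repeated doubling."""
    out = ""
    for _ in range(bits):
        x *= 2
        out += str(int(x))   # the integer part is the next binary digit
        x -= int(x)
    return out

print(frac_to_binary(0.8125, 4))   # '1101'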
4. Determine the length of the codeword using l(x) = ⌈log₂(1/P(x))⌉ + 1
5. Write the codeword by taking the first l(x) bits of F̄(x) in binary
Symbol   Probability   F(x)     F̄(x)     F̄(x) in binary   l(x)   Code
x₁       1/2           0.5      0.25     0.01             2      01
x₂       1/2²          0.75     0.625    0.101            3      101
x₃       1/2³          0.875    0.8125   0.1101           4      1101
x₄       1/2³          1        0.9375   0.1111           4      1111

For example, to find the codeword for x₃: F̄(x₃) in binary is 0.1101 and l(x₃) = 4, hence the codeword is 1101.
6. The entropy of this source is 1.75 bits
7. The average codeword length is 2.75 bits
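The whole construction can be reproduced in a few lines of Python. This sketch follows steps 1-5 above for the probabilities 1/2, 1/4, 1/8, 1/8 and reproduces the codewords in the table.

from math import ceil, log2

def sfe_code(probs):
    """Shannon-Fano-Elias code for a list of (symbol, probability) pairs."""
    codes, F = {}, 0.0
    for s, p in probs:
        Fbar = F + p / 2                  # F-bar(x): previous cumulative + p/2
        l = ceil(log2(1 / p)) + 1         # codeword length
        bits, frac = "", Fbar             # take the first l bits of F-bar(x) in binary
        for _ in range(l):
            frac *= 2
            bits += str(int(frac))
            frac -= int(frac)
        codes[s] = bits
        F += p
    return codes

probs = [("x1", 1/2), ("x2", 1/4), ("x3", 1/8), ("x4", 1/8)]
print(sfe_code(probs))   # {'x1': '01', 'x2': '101', 'x3': '1101', 'x4': '1111'}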
Huffman codes are optimal only if the probabilities of the symbols are negative integer powers of two, because all prefix codes work at the bit level.
1. Prefix codes try to match the self-information of the symbols using codewords whose lengths are integers. This length matching may give a codeword either longer or shorter than the self-information.
2. If prefix codes are generated using a binary tree, the decisions between tree branches always take one bit.
3. Arithmetic coding does not have this restriction: it works by representing the file to be encoded by an interval of real numbers between 0 and 1. Successive symbols in the message reduce this interval in accordance with the probability of that symbol. The more likely symbols reduce the range by less and thus add fewer bits to the message.
Step 2: The first letter to be encoded is 'B'; the corresponding interval is [0.5, 0.75).
Step 9: Hence the codeword for 'BACA' lies anywhere in the interval [0.59375, 0.609375). We choose the minimum value of the interval, 0.59375.
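A minimal sketch of the interval-narrowing computation. The symbol model used here (A → [0, 0.5), B → [0.5, 0.75), C → [0.75, 1)) is an assumption consistent with the intervals quoted above; with it the sketch reproduces the final interval [0.59375, 0.609375).

def arithmetic_encode(message, model):
    """Narrow [low, high) by each symbol's cumulative interval.
    model: {symbol: (cum_low, cum_high)} partitioning [0, 1)."""
    low, high = 0.0, 1.0
    for sym in message:
        c_low, c_high = model[sym]
        width = high - low
        low, high = low + width * c_low, low + width * c_high
        print(sym, (low, high))
    return low   # minimum value of the final interval is used as the tag

# Assumed symbol model consistent with the slide's intervals
model = {"A": (0.0, 0.5), "B": (0.5, 0.75), "C": (0.75, 1.0)}
print(arithmetic_encode("BACA", model))   # final interval [0.59375, 0.609375), tag 0.59375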