ICE513 Module 4 - Source Coding

Uploaded by

ayomide.adekoya

Lecture Outline

• Definitions
• Data Compression
• Kraft-McMillan Inequality
– Consequences of K-M Inequality
• Optimal Codes
• Huffman Codes

2
Data Compression
• Goal – establish the fundamental limit for the
compression of information.
• Data compression can be achieved by
assigning short descriptions to the most
frequent outcomes of the data source, and
necessarily longer descriptions to the less
frequent outcomes.

3
Data Compression
• For example, in Morse code, the most
frequent symbol is represented by a single
dot.
• In this module, we will find the shortest
average description length of a random
variable.

4
Digital Communication System

5
Source code C(x)
• A source code C of a random variable X is a mapping from the range
of X to the set of finite-length strings of symbols from a D-ary alphabet.
• C(x) denotes the codeword corresponding to x, while l(x) is the
length of C(x).

Example 1
• Let X be a random variable depicting the toss of a fair coin, with
outcomes {H,T}; D = 2, since the code alphabet is {0,1}.
Let the code for H be 00 ⇒ C(H) = 00
Code for T ⇒ C(T) = 11
∴ l(H) = l(T) = 2

6
Expected Length L(C) of a Code

For a random variable X with pmf p(x), the expected length of the
code C is given by

$$L(C) = \sum_{x} p(x)\, l(x) \tag{1}$$

where l(x) is the length of the codeword C(x) associated with x.
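
Equation (1) can be evaluated directly. The sketch below assumes the pmf and the code are both given as dictionaries keyed by source symbol; the function name `expected_length` is illustrative and does not come from the slides.

```python
from fractions import Fraction

def expected_length(pmf, code):
    """Return L(C) = sum over x of p(x) * l(x), with l(x) = len(code[x])."""
    return sum(p * len(code[x]) for x, p in pmf.items())

# Example 1: fair coin with C(H) = 00 and C(T) = 11.
coin_pmf = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
coin_code = {"H": "00", "T": "11"}
coin_L = expected_length(coin_pmf, coin_code)
print(coin_L)  # 2
```

Exact fractions are used here only to avoid rounding; ordinary floats work as well.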

7
L(C): Uniquely Decodeable
Example 2 Expected Length
Given that PX = {½, ¼, ¼} and {C1 = 0, C2 = 01, C3 = 101},
find the entropy H(X) and the expected length of the code L(C).
Solution
H(X) = ½ log2(2) + ¼ log2(4) + ¼ log2(4) = 1.5 bits
L(C) = 1×½ + 2×¼ + 3×¼ = 1.75 bits

$$L(C) \ge H(X) \tag{2}$$

Note: We shall see later that (2) is always valid for uniquely decodeable codes,
with equality iff C(X) is an optimal code.
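
The two quantities in Example 2 can be checked numerically. This is a minimal sketch; the helper names `entropy` and `expected_length` are illustrative, and probabilities are assumed to be listed in the same order as the codewords.

```python
import math

def entropy(probs):
    """Shannon entropy H(X) in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_length(probs, lengths):
    """Expected codeword length, equation (1)."""
    return sum(p * l for p, l in zip(probs, lengths))

probs = [0.5, 0.25, 0.25]
codewords = ["0", "01", "101"]
H = entropy(probs)
L = expected_length(probs, [len(c) for c in codewords])
print(H, L)  # 1.5 1.75
```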
8
L(C): Optimal (Complete)
Example 3

Determine H(X) and L(C).

9
L(C): Fixed-length Code
Example 4 Expected length
Find the expected length for the ensemble X of Example 3
if Cx is a fixed-length code given as Cx = {00, 01, 10, 11},
i.e. lx = {2, 2, 2, 2}.
Solution
For a fixed-length code with l(x) = l for every x, equation (1)
becomes:

$$L(C) = l \sum_{x} p(x) = l = 2 \text{ bits} \tag{3}$$

10
Prefix-free {Instantaneous} Codes
A code C(x) is said to be prefix-free iff no codeword C(xi)
is a prefix of any other codeword C(xj).
Example 5
Consider the code:
C{1, 2, 3, 4} = {0, 10, 110, 111}
• Since none of the codewords is a prefix of another, i.e. none of the
codewords begins another, C(x) is a prefix-free code, also called an
– instantaneous,
– self-punctuating code
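
The prefix-free condition can be tested by comparing every ordered pair of codewords. A minimal sketch, with `is_prefix_free` as an illustrative name:

```python
from itertools import permutations

def is_prefix_free(codewords):
    """True iff no codeword is a prefix of a codeword at another position."""
    return not any(b.startswith(a) for a, b in permutations(codewords, 2))

print(is_prefix_free(["0", "10", "110", "111"]))  # True  (Example 5)
print(is_prefix_free(["0", "01", "011", "111"]))  # False ("0" begins "01")
```

The pairwise check is quadratic in the number of codewords, which is adequate for small alphabets like the ones in this module.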

11
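Extension C(x') of a Code
The extension C(x') of a code C(x) is the mapping from finite-length
strings of X to finite-length strings over the D-ary alphabet, obtained
by concatenating the codewords of the individual symbols:

$$C(x_1 x_2 \cdots x_n) = C(x_1)\,C(x_2) \cdots C(x_n) \tag{4}$$

For Example 5, the source string 1234 encodes as
C(1)C(2)C(3)C(4) = 010110111.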
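
The extension is simply concatenation, and for a prefix-free code the result can be decoded symbol by symbol as soon as each codeword ends. The sketch below uses the code of Example 5; the function names `encode` and `decode_prefix` are illustrative, and the decoder assumes the code is prefix-free.

```python
def encode(code, source_string):
    """Extension of a code: concatenate the codewords of the source symbols."""
    return "".join(code[x] for x in source_string)

def decode_prefix(code, bits):
    """Instantaneous decoding; only valid when `code` is prefix-free."""
    inverse = {w: x for x, w in code.items()}
    decoded, buffer = [], ""
    for b in bits:
        buffer += b
        if buffer in inverse:          # a complete codeword has been read
            decoded.append(inverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError("bit string ends in the middle of a codeword")
    return decoded

code5 = {1: "0", 2: "10", 3: "110", 4: "111"}
bits = encode(code5, [1, 2, 3, 4])
print(bits)                        # 010110111
print(decode_prefix(code5, bits))  # [1, 2, 3, 4]
```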

12
Uniquely-Decodeable Codes
• A code is uniquely decodeable if any encoded
string has only one possible source string that
can produce it.
• For this set of codes:
L(C) ≥ H(X)
with equality iff p(x) = 2^(-l(x)) for every x, in which case C(X) is a complete code

13
U-D Codes
Example 6
Check if a code given as C{1, 2, 3, 4} = {0, 1, 00, 11} with PX = {½, ¼, ⅛, ⅛} is
uniquely decodeable.
Solution
Find H(X) and L(C).
H(X) = H{½, ¼, ⅛, ⅛} = 1.75 bits
L(C) = 1×½ + 1×¼ + 2×⅛ + 2×⅛ = 1.25 bits
L(C) < H(X)
Since every uniquely decodeable code satisfies L(C) ≥ H(X), the code is not uniquely decodeable.
Note:
A given source string X1 = 134213 encodes as 000111000, the same as another
source string X2 = 312431, which also encodes as 000111000.

Hence, the code is not uniquely decodeable, since the encoded string 000111000 has
more than one source string (X1 and X2) that can produce it.
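
The ambiguity in Example 6 can be made concrete by counting how many ways an encoded string splits into codewords. The sketch below uses a simple dynamic programme; `count_parsings` is an illustrative name, not something defined in the slides.

```python
def count_parsings(codewords, bits):
    """Number of ways to split `bits` into a sequence of codewords."""
    ways = [1] + [0] * len(bits)       # ways[k] = parsings of the first k bits
    for end in range(1, len(bits) + 1):
        for w in codewords:
            if end >= len(w) and bits[end - len(w):end] == w:
                ways[end] += ways[end - len(w)]
    return ways[-1]

code6 = {1: "0", 2: "1", 3: "00", 4: "11"}
x1 = "".join(code6[s] for s in [1, 3, 4, 2, 1, 3])
x2 = "".join(code6[s] for s in [3, 1, 2, 4, 3, 1])
print(x1, x2)                                   # 000111000 000111000
print(count_parsings(code6.values(), x1))       # 27
```

A uniquely decodeable code gives a count of exactly 1 for every encoded string; here X1 and X2 are only two of the 27 source strings that produce 000111000.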
14
U-D, Non-prefix Codes
Example 7
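Check if a code given as C{A, B, C, D} = {0, 01, 011, 111} with
PX = {½, ¼, ⅛, ⅛} is uniquely decodeable.
Solution
Find H(X) and L(C).
H(X) = H{½, ¼, ⅛, ⅛} = 1.75 bits
L(C) = 1×½ + 2×¼ + 3×⅛ + 3×⅛ = 1.75 bits
L(C) = H(X)
The bound is met with equality. In addition, no codeword is a suffix of
another (the reversed codewords {0, 10, 110, 111} form a prefix-free code),
so any encoded string can be decoded unambiguously by reading it backwards.
The code is uniquely decodeable and complete
Note:
Though C is both uniquely decodeable and complete, it is
not a prefix code, since CA is a prefix of CB and CC, and CB is a prefix of CC.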
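
Unique decodability can also be tested directly on the codewords, without reference to probabilities. The sketch below uses the Sardinas-Patterson procedure, a standard test that is not part of these slides: it repeatedly forms sets of "dangling suffixes" and reports a failure if any of them is itself a codeword. The function names are illustrative.

```python
def dangling_suffixes(prefixes, words):
    """Suffixes w such that u + w is in `words` for some u in `prefixes`, w non-empty."""
    return {b[len(a):] for a in prefixes for b in words
            if len(b) > len(a) and b.startswith(a)}

def is_uniquely_decodable(codewords):
    """Sardinas-Patterson test for a finite set of distinct codewords."""
    code = set(codewords)
    suffixes = dangling_suffixes(code, code)
    seen = set()
    while suffixes:
        if suffixes & code:            # a dangling suffix is a codeword: ambiguous
            return False
        key = frozenset(suffixes)
        if key in seen:                # the sets repeat, so no ambiguity can appear
            return True
        seen.add(key)
        suffixes = (dangling_suffixes(code, suffixes)
                    | dangling_suffixes(suffixes, code))
    return True

print(is_uniquely_decodable(["0", "01", "011", "111"]))  # True  (Example 7)
print(is_uniquely_decodable(["0", "1", "00", "11"]))     # False (Example 6)
```

The loop terminates because every dangling suffix is a suffix of some codeword, so only finitely many distinct sets can occur.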
15
Kraft-McMillan Inequality
This inequality limits the sets of codeword lengths that are possible
for a code that describes a given source uniquely.
Theorem
For any uniquely decodeable code C over the binary alphabet {0,1},
the codeword lengths l_i must satisfy:

$$\sum_{i} 2^{-l_i} \le 1 \tag{5}$$

Conversely
Given a set of codeword lengths that satisfy this inequality, there exists
a uniquely decodeable prefix code with these codeword lengths.
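
The converse can be illustrated constructively. The sketch below assigns binary codewords in order of increasing length (often called a canonical construction); it is one possible construction under the assumption that all lengths are positive integers, and the function name is illustrative.

```python
from fractions import Fraction

def prefix_code_from_lengths(lengths):
    """Return binary codewords with the given lengths, shortest first."""
    if any(l < 1 for l in lengths):
        raise ValueError("codeword lengths must be positive integers")
    if sum(Fraction(1, 2 ** l) for l in lengths) > 1:
        raise ValueError("lengths violate the Kraft-McMillan inequality")
    codewords, value, previous = [], 0, None
    for l in sorted(lengths):
        if previous is not None:
            # Step past the last codeword, then pad with zeros to the new length.
            value = (value + 1) << (l - previous)
        codewords.append(format(value, f"0{l}b"))
        previous = l
    return codewords

print(prefix_code_from_lengths([1, 2, 3, 3]))  # ['0', '10', '110', '111']
```

For the lengths {1, 2, 3, 3} this reproduces the prefix-free code of Example 5.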

16
K-M Inequality: Consequences
a) If it holds with strict inequality {i.e. $\sum_i 2^{-l_i} < 1$},
then the code is redundant.
b) If it holds with equality {i.e. $\sum_i 2^{-l_i} = 1$},
then the code is a complete code.
c) If it does not hold {i.e. $\sum_i 2^{-l_i} > 1$}, then the code
is not uniquely decodeable.
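
The three cases can be checked mechanically from the codeword lengths alone. A minimal sketch for binary codes, with illustrative function names and exact fractions to avoid rounding at the boundary value 1:

```python
from fractions import Fraction

def kraft_sum(lengths):
    """Left-hand side of the Kraft-McMillan inequality for a binary code."""
    return sum(Fraction(1, 2 ** l) for l in lengths)

def classify(lengths):
    """Apply consequences (a), (b) and (c) to a list of codeword lengths."""
    s = kraft_sum(lengths)
    if s < 1:
        return "redundant"
    if s == 1:
        return "complete"
    return "not uniquely decodeable"

print(classify([2, 2]))        # redundant               (Example 1: {00, 11})
print(classify([1, 2, 3, 3]))  # complete                (Example 5)
print(classify([1, 1, 2, 2]))  # not uniquely decodeable (Example 6)
```

Note that satisfying the inequality is a statement about the lengths only; a particular code with admissible lengths still has to be checked for unique decodability.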

17
Binary Trees
• Any binary tree can be viewed as a prefix code for the leaves (i.e.
terminal nodes) of the tree, where the codeword length l_i of a leaf
is its depth:

$$\sum_{i \in \text{leaves}} 2^{-l_i} \le 1 \tag{6}$$

Example 8
For the binary tree presented, (1), (2), and (3) are depths:
b and c are at depth 1; d, e, f, and g are at depth 2, while
h, i, j, and k are at depth 3.
Evaluating equation (6) over the leaves of this tree gives a sum less than 1.

Conclusion:
Since the Kraft-McMillan inequality holds with strict inequality,
we conclude that this particular code has some redundancy.
18
Binary Tree
Example 9

Conclusion:
Since the Kraft-McMillan inequality holds with equality,
we conclude that this is a complete code.
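
The link between trees and prefix codes can be sketched in code. The two trees below are hypothetical and are not the trees drawn on the slides: an internal node is a pair (left, right), a leaf is a string, and `None` marks a missing branch. A leaf's codeword is the sequence of branch labels (0 for left, 1 for right) on the path from the root.

```python
from fractions import Fraction

def leaf_codewords(tree, prefix=""):
    """Map each leaf label to the 0/1 path leading to it."""
    if tree is None:                   # missing branch: no leaf here
        return {}
    if isinstance(tree, str):          # leaf
        return {tree: prefix}
    left, right = tree
    codes = leaf_codewords(left, prefix + "0")
    codes.update(leaf_codewords(right, prefix + "1"))
    return codes

def kraft_sum(codes):
    return sum(Fraction(1, 2 ** len(w)) for w in codes.values())

sparse_tree = (("p", "q"), ("r", None))   # one branch at depth 2 is unused
full_tree = ("p", ("q", "r"))             # every internal node has two children

print(leaf_codewords(sparse_tree), kraft_sum(leaf_codewords(sparse_tree)))
print(leaf_codewords(full_tree), kraft_sum(leaf_codewords(full_tree)))
```

The tree with an unused branch gives a Kraft sum of 3/4 (a redundant code), while the full tree gives exactly 1 (a complete code).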

19
Lower Bound on L(C)
Theorem
The expected length L(C) of a uniquely
decodeable code is bounded below by
H(X). i.e. :
L(C) ≥ H(X)
with equality iff $2^{-l_i} = p_i$ for every i

20
Lower Bound on L(C)
Proof
The derivation proceeds by a base change of the logarithm, followed by
the fundamental inequality, and closes with a proof of the equality clause.

Corollary
When L(C) = H(X), the code is said to be optimal.
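
A sketch of the standard argument behind this bound is given here. It is written to match the step labels "base change" and "fundamental inequality", and it assumes that the fundamental inequality meant is ln x ≤ x − 1; the derivation on the original slide may differ in detail. The symbols p_i and l_i are the probabilities and codeword lengths of the theorem, with the sums taken over outcomes with p_i > 0.

```latex
\begin{align*}
H(X) - L(C)
  &= \sum_i p_i \log_2 \frac{1}{p_i} - \sum_i p_i l_i \\
  &= \sum_i p_i \log_2 \frac{2^{-l_i}}{p_i} \\
  &= \frac{1}{\ln 2} \sum_i p_i \ln \frac{2^{-l_i}}{p_i}
     && \text{(base change)} \\
  &\le \frac{1}{\ln 2} \sum_i p_i \left( \frac{2^{-l_i}}{p_i} - 1 \right)
     && \text{(fundamental inequality, } \ln x \le x - 1 \text{)} \\
  &= \frac{1}{\ln 2} \left( \sum_i 2^{-l_i} - 1 \right) \\
  &\le 0
     && \text{(Kraft-McMillan inequality)}
\end{align*}
```

Hence L(C) ≥ H(X). For the equality clause, ln x = x − 1 only at x = 1, so equality requires 2^{-l_i} = p_i for every i; in that case the Kraft sum equals 1 and both inequalities hold with equality.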

21
Huffman Codes
• An entropy encoding algorithm for lossless data compression
• A simple algorithm that allows for the construction of an optimal
{i.e. shortest expected length} prefix code.
• The simplest implementation gives highest priority to the least
probable node, i.e. the two least probable nodes are merged first
• Developed by David A. Huffman as a PhD student at MIT.
• Published in 1952 in “A Method for the Construction of
Minimum-Redundancy Codes”
• It is the most efficient compression method of its kind: no other
code that assigns one codeword to each source symbol has a shorter
expected length

22
Huffman Codes
• Consider a random variable X taking values in
the set X = {1, 2, 3, 4, 5} with probabilities
0.25, 0.25, 0.2, 0.15, 0.15, respectively.
• We expect the optimal binary code for X to
have the longest codewords assigned to the
symbols 4 and 5.
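
The construction can be sketched with a priority queue in which the least probable node is served first. This is a minimal illustration rather than the procedure shown on the slides: the name `huffman_code`, the 0/1 labelling of branches and the tie-breaking rule are all arbitrary choices, and the sketch assumes at least two symbols.

```python
import heapq
from itertools import count

def huffman_code(pmf):
    """Build a binary Huffman code; returns {symbol: codeword}."""
    tiebreak = count()                 # keeps heap entries comparable on ties
    heap = [(p, next(tiebreak), {x: ""}) for x, p in pmf.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)   # least probable node
        p1, _, codes1 = heapq.heappop(heap)   # second least probable node
        merged = {x: "0" + w for x, w in codes0.items()}
        merged.update({x: "1" + w for x, w in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

pmf = {1: 0.25, 2: 0.25, 3: 0.2, 4: 0.15, 5: 0.15}
code = huffman_code(pmf)
lengths = {x: len(w) for x, w in code.items()}
L = sum(pmf[x] * lengths[x] for x in pmf)
print(lengths)  # symbols 1, 2, 3 get length 2; symbols 4, 5 get length 3
print(L)        # approximately 2.3 bits
```

As expected, the two least probable symbols, 4 and 5, receive the longest codewords. Different tie-breaking can change the individual codewords but not the expected length.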

23
Example 10
A source generates four different symbols {a, b, c, d} with
probabilities PX = { ½, ¼, ⅛, ⅛} respectively. Construct an optimal
code to encode this source and check for optimality.
Solution
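
One way to work Example 10 is sketched here, following the Huffman procedure of merging the two least probable nodes at each step. The assignment of 0 and 1 to the branches is an arbitrary choice, so the codewords shown are one valid answer among several with the same lengths.

```latex
\begin{align*}
\text{Step 1:}\quad & p(c) + p(d) = \tfrac18 + \tfrac18 = \tfrac14 \\
\text{Step 2:}\quad & p(b) + p(cd) = \tfrac14 + \tfrac14 = \tfrac12 \\
\text{Step 3:}\quad & p(a) + p(bcd) = \tfrac12 + \tfrac12 = 1 \\
\text{Code:}\quad & C(a) = 0,\; C(b) = 10,\; C(c) = 110,\; C(d) = 111 \\
L(C) &= 1 \cdot \tfrac12 + 2 \cdot \tfrac14 + 3 \cdot \tfrac18 + 3 \cdot \tfrac18
      = 1.75 \text{ bits} \\
H(X) &= \tfrac12 \log_2 2 + \tfrac14 \log_2 4 + \tfrac18 \log_2 8 + \tfrac18 \log_2 8
      = 1.75 \text{ bits}
\end{align*}
```

Since L(C) = H(X), the lower bound is met with equality and the code is optimal. This is possible because every probability is a negative power of 2, so that 2^{-l_i} = p_i for each symbol.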

24
Example 11
Construct a binary Huffman code for the following distribution
on five symbols: PX = {0.3, 0.3, 0.2, 0.1, 0.1} and check for
optimality.
Solution
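
One way to work Example 11 is sketched here. The symbol names x_1 to x_5 are introduced only for illustration, in the order of the listed probabilities, and the 0/1 branch labels are an arbitrary choice.

```latex
\begin{align*}
\text{Step 1:}\quad & p(x_4) + p(x_5) = 0.1 + 0.1 = 0.2 \\
\text{Step 2:}\quad & p(x_3) + p(x_4 x_5) = 0.2 + 0.2 = 0.4 \\
\text{Step 3:}\quad & p(x_1) + p(x_2) = 0.3 + 0.3 = 0.6 \\
\text{Step 4:}\quad & 0.4 + 0.6 = 1 \\
\text{Code:}\quad & C(x_1) = 00,\; C(x_2) = 01,\; C(x_3) = 10,\;
                    C(x_4) = 110,\; C(x_5) = 111 \\
L(C) &= 2(0.3 + 0.3 + 0.2) + 3(0.1 + 0.1) = 2.2 \text{ bits} \\
H(X) &= -\left( 2 \cdot 0.3 \log_2 0.3 + 0.2 \log_2 0.2 + 2 \cdot 0.1 \log_2 0.1 \right)
      \approx 2.171 \text{ bits}
\end{align*}
```

Here L(C) > H(X), with H(X)/L(C) ≈ 0.987. The bound cannot be met with equality because the probabilities are not negative powers of 2, but the Huffman code still has the shortest expected length of any prefix code for this source.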

25
Huffman Codes
Exercise
Let Ax={1, 2, 3, 4, 5, 6, 7} and
Px={0.49, 0.26, 0.12, 0.04, 0.04, 0.03, 0.02}.
Construct a Huffman code for encoding the
alphabet and check for optimality.
Further verify if there is any redundancy.

26
