Digital Communication & Systems
V. Praksh Singh
Department of Electronics & Communication Engineering
National Institute of Technology Hamirpur
Hamirpur, Himachal Pradesh
India
Jan 9, 2020
Coding for Discrete Sources Mathematical Modeling of a Discrete Sources Coding of a discrete random variable Source Coding Algorithms
Lecture #2: Coding for Discrete Sources
References:
'Modern Digital and Analog Communication Systems', by B. P. Lathi and Zhi Ding.
'Introduction to Analog and Digital Communications', by Simon Haykin and Michael Moher.
'Principles of Digital Communication', by Robert G. Gallager.
'Elements of Information Theory', by Thomas M. Cover and Joy A. Thomas.
'Lecture Notes on Applied Digital Information Theory I', by James L. Massey.
Outline of the lecture
Coding of a discrete information source
Mathematical modeling of discrete sources
Source Coding
Source coding theorem
Source coding algorithms
Shannon-Fano coding
Huffman coding
Lempel-Ziv coding
Outline
1 Coding for Discrete Sources
2 Mathematical Modeling of a Discrete Sources
3 Coding of a discrete random variable
4 Source Coding Algorithms
Coding for Discrete Sources
In digital communication systems, the output of every information source, e.g. a speech waveform (analog source) or a text file (discrete source), must be represented by a sequence of bits.
Discrete source: The source output is a sequence of symbols from a given alphabet A of finite size, e.g. a text file may consist of symbols from an alphabet of English letters and alphanumeric characters.
The source encoder converts the sequence of symbols from the source into a sequence of bits, using as few bits per symbol as possible (also called source compression).
In this lecture, we consider lossless encoding of discrete sources, such that the source output can be uniquely recovered from the encoded string of bits.
Fixed length codes for Discrete Sources
Suppose the size of the source alphabet A is M. The simplest
method to encode a discrete source is to map each symbol a ∈ A
into a fixed length code C(a).
For example: If the source alphabet consists of the 26 capital English letters, then the following binary code of block length L = 5 can be used.
Table: Fixed length coding
Symbol   Code
A        00000
B        00001
...      ...
Y        11000
Z        11001
Fixed length codes for Discrete Sources
A binary fixed-length code for a source alphabet A of size M requires $L = \lceil \log_2 M \rceil$ bits to encode each symbol (L bits/symbol), where $\lceil x \rceil$ denotes the smallest integer greater than or equal to the real number x.
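As a minimal sketch of the fixed-length mapping described above, the following Python snippet assigns consecutive binary numbers of length $\lceil \log_2 M \rceil$ to the symbols of an alphabet. The function name `fixed_length_code` and the choice of consecutive binary numbers are illustrative assumptions; any one-to-one assignment of L-bit strings would serve equally well.

```python
import string

def fixed_length_code(alphabet):
    """Map each symbol to a distinct binary string of L = ceil(log2 M) bits."""
    M = len(alphabet)
    # (M - 1).bit_length() equals ceil(log2 M) for M >= 2 without float round-off;
    # a single-symbol alphabet is given 1 bit here (an assumption of this sketch).
    L = max(1, (M - 1).bit_length())
    return {sym: format(i, f"0{L}b") for i, sym in enumerate(alphabet)}

code = fixed_length_code(string.ascii_uppercase)   # M = 26, so L = 5
print(len(code["A"]), code["A"], code["B"])        # 5 00000 00001
```

Every codeword has the same length, so the decoder can simply cut the bit stream into blocks of L bits.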
However, this method of assigning codewords of fixed length to each source symbol does not take into account the probabilities of occurrence of the source symbols.
We can reduce the average length per symbol (L̄) by assigning more bits to less probable symbols and fewer bits to more probable symbols, i.e. variable-length coding.
Mathematical Modeling of a Discrete Source
We mathematically model a discrete source as a discrete random process.
Figure: Discrete Information Source
We further assume that the discrete source is memoryless i.e. the
sequence of symbols U1 , U2 , · · · are independently generated by the
source and with the same probability mass function.
So, the output of a discrete memoryless source (DMS) is a sequence
of independent and identically distributed random variables.
Mathematical Modeling of a Discrete Source
Let U be a K-ary discrete random variable which takes values from the set $A = \{u_1, u_2, \cdots, u_K\}$.
The probability mass function (PMF) of U is
$\Pr(U = u_i) = p_i$ for $i = 1, 2, \cdots, K$.
Each source output in a DMS, i.e. $U_1, U_2, \cdots$, is selected from A with the same PMF.
Each source output $U_i$ in a DMS is statistically independent of the previous outputs $U_1, U_2, \cdots, U_{i-1}$.
A DMS is completely described by the source alphabet A and the set of probabilities $\{p_i\}_{i=1}^{K}$.
Information Measure
Suppose $u_i$ is a particular realization of a K-ary discrete random variable U. How can we quantify the amount of information $I(u_i)$ provided by the observation $u_i$?
The information measure of the output $u_i$ should depend only on the probability of $u_i$, i.e. $p_i$. The more probable the outcome, the less information is conveyed by its observation (i.e. the measure should be a decreasing function of $p_i$).
The information measure should be a continuous function of the probability, i.e. a small change in the probability of an observation should not drastically change the information.
If the observation is divided into two (or more) independent parts, $u_i = \{u_{i,1}, u_{i,2}\}$, the information measure of $u_i$ should be the sum of the information provided by each independent part.
Information Measure (Entropy)
The only function for the information measure that satisfies the above-mentioned properties is the logarithmic function (up to the choice of the base).
$$I(u_i) = \log \frac{1}{p_i}$$
is called the self-information of $u_i$.
We define the average information content of the random variable U as the entropy of the discrete random variable U:
$$H(U) = \sum_{i=1}^{K} p_i I(u_i) = \sum_{i=1}^{K} p_i \log \frac{1}{p_i}$$
where $0 \log 0 = 0$.
If the base of the logarithm is 2, the entropy (information) is expressed in bits/symbol.
Properties of Entropy
The entropy is always non-negative, i.e. $H(U) \geq 0$.
Entropy can be changed from one base to another as
$$H_b(U) = H_a(U) \log_b a.$$
Let a binary memoryless source (2-ary discrete source) emit $u_1 = 0$ or $u_2 = 1$ with $\Pr(U = u_1) = p$ and $\Pr(U = u_2) = 1 - p$. Then
$$H(U) = -p \log p - (1 - p) \log (1 - p) \overset{\text{def}}{=} H(p)$$
is called the binary entropy function.
We can observe that H(U) = 0 for p = 0 or p = 1, i.e. the source generates only ones or only zeros and there is no uncertainty. The uncertainty is maximum when p = 1/2, where the entropy attains its maximum value H(U) = 1 bit (for the base-2 logarithm).
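The entropy definition and the properties listed above can be checked numerically. The following is a small sketch; the function names `entropy` and `binary_entropy` are illustrative, and the PMF used in the base-change check is an arbitrary assumption.

```python
import math

def entropy(pmf, base=2.0):
    """H(U) = sum_i p_i log(1/p_i), using the convention 0 log 0 = 0."""
    return sum(p * math.log(1.0 / p) / math.log(base) for p in pmf if p > 0)

def binary_entropy(p):
    """Binary entropy function H(p) in bits."""
    return entropy([p, 1.0 - p])

print(binary_entropy(0.0), binary_entropy(0.5), binary_entropy(1.0))

# Change of base: H_3(U) = H_2(U) * log_3(2), for an arbitrary example PMF.
pmf = [0.5, 0.3, 0.2]
print(abs(entropy(pmf, 3) - entropy(pmf, 2) * math.log(2, 3)) < 1e-12)
```

The first line prints 0 at both endpoints and 1.0 at p = 1/2, in agreement with the shape of the binary entropy function.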
Coding of a discrete random variable
We are given a K-ary random variable U with the source alphabet $A = \{u_1, u_2, \cdots, u_K\}$ and the set of probabilities $\{p_i\}_{i=1}^{K}$.
Figure: Coding of a discrete random variable
The code symbols $X_i$ take values from a D-ary alphabet $\mathcal{D} = \{0, 1, \cdots, D - 1\}$. For binary codes, D = 2.
A variable-length code C maps each source symbol $u_i$ in A to a D-ary string $[x_1, x_2, \cdots, x_{l_i}]$ called a codeword.
Coding of a discrete random variable
The number of symbols in the codeword $C(u_i)$ is called the length ($l_i$) of the codeword.
The set of K codewords $\{C(u_1), C(u_2), \cdots, C(u_K)\}$ is called a D-ary code for a K-ary random variable.
The average (expected) length of the code C is defined as
$$\bar{L} = \sum_{i=1}^{K} p_i l_i$$
We see that we can reduce the average length of the code by assigning shorter codewords to more probable source symbols and longer codewords to less probable ones.
Class of codes
However, we cannot arbitrarily assign short codewords to every source symbol. A code C must satisfy the following conditions to be suitable for encoding discrete sources in practice.
Non-singular codes: A code C is non-singular if every symbol from the source alphabet is assigned a distinct codeword.
$$u_i \neq u_j \implies C(u_i) \neq C(u_j)$$
Uniquely decodable codes: A code C is uniquely decodable if every sequence of codewords can be decoded into only one possible sequence of source symbols.
Class of codes
Instantaneous code: A code is called a prefix-free code or an instantaneous code if no codeword in the code is a prefix of another codeword.
In an instantaneous code, a symbol $u_i$ can be decoded as soon as the last symbol of the corresponding codeword arrives, without waiting for future codewords.
Figure: Example of codes
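The prefix-free property and instantaneous decoding can be illustrated with a short sketch. The helper names `is_prefix_free` and `decode`, and the two example codes, are assumptions made for illustration and are not taken from the figure.

```python
def is_prefix_free(codewords):
    """True if no codeword is a prefix of another one (duplicates also fail)."""
    words = sorted(codewords)
    # After lexicographic sorting, a codeword that is a prefix of some other
    # codeword is also a prefix of its immediate successor.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

def decode(bits, code):
    """Instantaneous decoding: emit a symbol as soon as a codeword is complete."""
    inverse = {word: sym for sym, word in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("trailing symbols do not form a codeword")
    return out

code = {"a": "0", "b": "10", "c": "110", "d": "111"}   # hypothetical prefix-free code
print(is_prefix_free(code.values()))                   # True
print(is_prefix_free(["0", "01", "011", "111"]))       # False: "0" is a prefix of "01"
print(decode("010110111", code))                       # ['a', 'b', 'c', 'd']
```

The decoder never looks ahead: it outputs a symbol at the moment the buffered symbols match a codeword, which is only safe because the code is prefix-free.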
Kraft Inequality for prefix-free codes
The objective in encoding a discrete source is to construct a code C which is instantaneous (prefix-free) and of minimum average length.
The set of codeword lengths possible for an instantaneous code is limited by the Kraft inequality.
There exists a D-ary prefix-free (instantaneous) code whose codeword lengths are the positive integers $l_1, l_2, l_3, \cdots, l_K$ if and only if
$$\sum_{i=1}^{K} D^{-l_i} \leq 1$$
In particular, if a set of positive integers $l_1, l_2, l_3, \cdots, l_K$ satisfies the Kraft inequality, we can construct a D-ary prefix-free code with codewords of these lengths.
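The constructive direction of the Kraft inequality can be sketched as follows. This is one standard construction (assigning codewords in order of increasing length by counting in base D), not necessarily the one intended in the lecture; the function names are illustrative, and the digit formatting assumes D ≤ 10 and a non-empty list of lengths.

```python
from fractions import Fraction

def kraft_sum(lengths, D=2):
    """Exact value of sum_i D^(-l_i)."""
    return sum(Fraction(1, D ** l) for l in lengths)

def to_base(value, D, width):
    """Write value as a D-ary string with exactly `width` digits."""
    digits = []
    for _ in range(width):
        value, d = divmod(value, D)
        digits.append(str(d))
    return "".join(reversed(digits))

def code_from_lengths(lengths, D=2):
    """Build a prefix-free code with the given codeword lengths, if one exists."""
    if kraft_sum(lengths, D) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    words, value, prev = [None] * len(lengths), 0, lengths[order[0]]
    for i in order:
        value *= D ** (lengths[i] - prev)      # extend the counter to the new length
        words[i] = to_base(value, D, lengths[i])
        value, prev = value + 1, lengths[i]
    return words

print(kraft_sum([1, 2, 3, 3]))           # 1
print(code_from_lengths([1, 2, 3, 3]))   # ['0', '10', '110', '111']
```

Lengths such as [1, 1, 2] give a Kraft sum of 5/4 > 1, so no binary prefix-free code with those lengths exists and the sketch raises an error.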
Optimal Prefix-free Codes
Given a discrete memoryless source, i.e. a source alphabet A and the set of probabilities $\{p_i\}_{i=1}^{K}$, we wish to create a prefix-free D-ary code with the minimum possible average length $\bar{L}$.
In other words, we wish to determine a set of codeword lengths $l_1, l_2, l_3, \cdots, l_K$ that satisfies Kraft's inequality and minimizes the average code length $\sum_{i=1}^{K} p_i l_i$.
The optimization problem is formulated as
$$\begin{aligned}
\min_{l_1, l_2, \cdots, l_K} \quad & \sum_{i=1}^{K} p_i l_i \\
\text{subject to} \quad & \sum_{i=1}^{K} D^{-l_i} \leq 1 \\
& l_1, l_2, \cdots, l_K \text{ are positive integers}
\end{aligned}$$
Optimal Prefix-free Codes
Initially, we simplify the optimization problem by dropping the integer constraint on the codeword lengths.
This simplified problem can be solved using the Lagrange multiplier method.
The Lagrangian of the simplified problem is formed as
$$J = \sum_{i=1}^{K} p_i l_i + \lambda \left( \sum_{i=1}^{K} D^{-l_i} - 1 \right)$$
where λ is the Lagrange multiplier.
Setting $\partial J / \partial l_i = 0$ gives $p_i - \lambda D^{-l_i} \log_e D = 0$, so $D^{-l_i} = p_i / (\lambda \log_e D)$.
Optimal Prefix-free Codes
Substituting $D^{-l_i} = p_i / (\lambda \log_e D)$ into the Kraft constraint with equality, $\sum_{i=1}^{K} D^{-l_i} = 1$, and using $\sum_{i=1}^{K} p_i = 1$, we obtain $\lambda = 1 / \log_e D$.
The optimal codeword lengths are given by
$$l_i^* = \log_D \frac{1}{p_i}$$
Substituting these lengths, we obtain the average length of the optimal code
$$\bar{L}^* = \sum_{i=1}^{K} p_i \log_D \frac{1}{p_i} = H_D(U)$$
In summary, the entropy $H_D(U)$ is a lower bound on the average length $\bar{L}$ of prefix-free codes, and this lower bound is achieved when $l_i = -\log_D p_i$ for each i.
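To make the last statement concrete, the following sketch evaluates the relaxed optimum $l_i^* = \log_D (1/p_i)$ for a dyadic PMF, where the ideal lengths happen to be integers and the average length equals the entropy. The PMF and the function names are hypothetical choices for illustration.

```python
import math

def ideal_lengths(pmf, D=2):
    """Real-valued lengths l_i* = log_D(1 / p_i) of the relaxed problem."""
    return [math.log(1.0 / p) / math.log(D) for p in pmf]

def average_length(pmf, lengths):
    """Average length sum_i p_i l_i."""
    return sum(p * l for p, l in zip(pmf, lengths))

pmf = [0.5, 0.25, 0.125, 0.125]          # hypothetical dyadic PMF
lengths = ideal_lengths(pmf)             # numerically [1, 2, 3, 3]
entropy = sum(p * math.log2(1.0 / p) for p in pmf)
print([round(l, 6) for l in lengths])
print(round(average_length(pmf, lengths), 6), round(entropy, 6))   # 1.75 1.75
```

For a PMF whose probabilities are not powers of 1/D the ideal lengths are not integers, so the bound cannot be met exactly by a code that encodes one source symbol at a time.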
Entropy bounds for prefix-free codes
In solving the optimization problem for the optimal code (minimum average length code), we relaxed the integer constraint on the lengths of the codewords.
Therefore, $H_D(U)$ provides only a lower bound on the average length of the optimal code. The following theorem provides both a lower bound and an upper bound on the average length of the optimal code.
Coding Theorem: Let $l_1, l_2, \cdots, l_K$ be the codeword lengths of an optimal D-ary prefix-free code for a discrete random variable U with K-ary alphabet A and probability mass function p, and let $\bar{L}^*$ be the average length of the code. Then
$$H_D(U) \leq \bar{L}^* < H_D(U) + 1$$
Proof of Coding Theorem
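First we prove the left side, i.e. show that $H_D(U) \leq \bar{L}^*$.
For any prefix-free code
$$\begin{aligned}
H_D(U) - \bar{L} &= \sum_{i=1}^{K} p_i \log_D \frac{1}{p_i} - \sum_{i=1}^{K} p_i l_i \\
&= \sum_{i=1}^{K} p_i \log_D \frac{1}{p_i} + \sum_{i=1}^{K} p_i \log_D (D^{-l_i}) \\
&= \sum_{i=1}^{K} p_i \log_D \frac{D^{-l_i}}{p_i}
\end{aligned}$$
where we have used $\log_D (D^{l_i}) = l_i$.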
Proof of Coding Theorem
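Using the inequality $\log_D u \leq (u - 1) \log_D e$, we get
$$\begin{aligned}
H_D(U) - \bar{L} &\leq \log_D e \sum_{i=1}^{K} p_i \left( \frac{D^{-l_i}}{p_i} - 1 \right) \\
&= \log_D e \left( \sum_{i=1}^{K} D^{-l_i} - \sum_{i=1}^{K} p_i \right) \\
&\leq 0
\end{aligned}$$
where we have used the Kraft inequality and the fact that the probabilities sum to one.
Equality holds if and only if $l_i = -\log_D p_i$ for every i, which requires each $p_i$ to be a negative integer power of D (because the codeword lengths are integers); otherwise the inequality is strict.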
Proof of Coding Theorem
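Now we prove the right side, i.e. show that $\bar{L}^* < H_D(U) + 1$.
Let us assign the codeword lengths of a code as $l_i = \lceil -\log_D p_i \rceil$.
These lengths satisfy the Kraft inequality, since $D^{-l_i} \leq p_i$ and hence $\sum_{i=1}^{K} D^{-l_i} \leq \sum_{i=1}^{K} p_i = 1$, so a prefix-free code with these lengths exists.
This assignment of codeword lengths implies
$$-\log_D p_i \leq l_i < -\log_D p_i + 1$$
Multiplying by $p_i$ and summing over i gives
$$-\sum_{i=1}^{K} p_i \log_D p_i \leq \sum_{i=1}^{K} p_i l_i < -\sum_{i=1}^{K} p_i \log_D p_i + 1$$
$$H_D(U) \leq \bar{L} < H_D(U) + 1$$
Since the optimal code is at least as good as this code, $\bar{L}^* \leq \bar{L} < H_D(U) + 1$.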
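The length assignment used in this part of the proof can be checked numerically. The sketch below applies it to the six-symbol source used in the examples later in the lecture; the function name `shannon_lengths` and the small epsilon that guards against floating-point round-off are assumptions of this sketch.

```python
import math
from fractions import Fraction

def shannon_lengths(pmf, D=2):
    """l_i = ceil(-log_D p_i); the epsilon guards against float round-off."""
    return [math.ceil(-math.log(p) / math.log(D) - 1e-12) for p in pmf]

pmf = [0.25, 0.25, 0.20, 0.10, 0.10, 0.10]
lengths = shannon_lengths(pmf)
kraft = sum(Fraction(1, 2 ** l) for l in lengths)
H = sum(p * math.log2(1.0 / p) for p in pmf)
L = sum(p * l for p, l in zip(pmf, lengths))

print(lengths, kraft)             # [2, 2, 3, 4, 4, 4] 13/16
print(H <= L < H + 1)             # True
```

The Kraft sum is below one, so a prefix-free code with these lengths exists, and its average length (about 2.8 bits) lies between the entropy (about 2.461 bits) and the entropy plus one, although it is not optimal.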
Extension of Source
We can see from the bounds on the average length of the optimal code that we can construct codes for a discrete memoryless source whose average length is within one code symbol (one bit for binary codes) of the source entropy.
The average length per source symbol can be made arbitrarily close to the source entropy by encoding a block of n source symbols.
Suppose a block of n symbols from the source is considered as one super symbol $U^n = [U_1, U_2, \cdots, U_n]$.
This super symbol is considered as a random variable which takes values from the alphabet $A^n$ of size $K^n$. A prefix-free code can be constructed for $U^n$ in the same way as for U. This is called the nth-order extension of the source.
Extension of Source
A block of n i.i.d. source symbols has entropy
$$H_D(U^n) = H_D(U_1, U_2, \cdots, U_n) = n H_D(U)$$
Let $\bar{L}_n^*$ be the average length per source symbol of the optimal prefix-free code for $U^n$. Applying the theorem on the bounds for the average length of optimal codes, we get
$$\begin{aligned}
H_D(U^n) &\leq n \bar{L}_n^* < H_D(U^n) + 1 \\
n H_D(U) &\leq n \bar{L}_n^* < n H_D(U) + 1 \\
H_D(U) &\leq \bar{L}_n^* < H_D(U) + 1/n
\end{aligned}$$
This result shows that by jointly encoding long n-tuples of source symbols we can approach the entropy bound (as 1/n goes to zero).
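The effect of block coding can be observed numerically. The sketch below does not construct the optimal code for each block length; as a simplifying assumption it uses the lengths $\lceil -\log_2 q \rceil$ for a block of probability q, which satisfy the same bounds as the optimal code. The binary source with probabilities 0.4 and 0.6 and the function name are illustrative.

```python
import itertools
import math

def per_symbol_length(pmf, n):
    """Average length per source symbol when blocks of n i.i.d. symbols are
    encoded with lengths ceil(-log2 q), q being the block probability."""
    total = 0.0
    for block in itertools.product(pmf, repeat=n):
        q = math.prod(block)                          # probability of the block
        total += q * math.ceil(-math.log2(q) - 1e-12)
    return total / n

pmf = [0.4, 0.6]
H = sum(p * math.log2(1.0 / p) for p in pmf)
for n in (1, 2, 4, 8):
    L_n = per_symbol_length(pmf, n)
    print(n, round(L_n, 4), H <= L_n < H + 1.0 / n)
```

Each printed line ends with True: the per-symbol length stays within 1/n of the entropy, so the gap can be made as small as desired at the cost of a code table that grows as $K^n$.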
Extension of Source (Example)
Example: Let a discrete memoryless source have the source alphabet $A = \{u_1, u_2\}$ with corresponding probabilities $\{p_1 = 0.4, p_2 = 0.6\}$. Find the entropy of the source. Suppose now we wish to jointly encode a block of two symbols from the source, i.e. the second-order extension of the source. Construct the source alphabet and probability distribution for the extended source and find its entropy.
Solution: The entropy of the source is computed as
$$H_2(U) = -\sum_{i=1}^{K} p_i \log_2 p_i$$
which gives $H_2(U) = 0.9710$ bits/symbol.
Now, we encode a pair of source symbols together, which gives the second-order extension of the source.
Extension of Source (Example)
The second order extension of the source
Table: Second order extension of the source
Symbols        Probability     Value
$u_1 u_1$      $p_1^2$         0.16
$u_1 u_2$      $p_1 p_2$       0.24
$u_2 u_1$      $p_2 p_1$       0.24
$u_2 u_2$      $p_2^2$         0.36
The entropy of this source is
$$H_2(U^2) = -(0.16 \log_2 0.16 + 0.24 \log_2 0.24 + 0.24 \log_2 0.24 + 0.36 \log_2 0.36) \approx 1.9419$$
We can see that the entropy of the second-order extension of the source is twice the entropy of the original source (because the symbols emitted by the source are assumed i.i.d.).
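The numbers in this example can be reproduced with a few lines of Python. The helper names `entropy` and `extension` are illustrative; the probabilities are those of the example.

```python
import itertools
import math

def entropy(pmf):
    """Entropy in bits: sum_i p_i log2(1/p_i)."""
    return sum(p * math.log2(1.0 / p) for p in pmf if p > 0)

def extension(pmf, n):
    """PMF of the n-th order extension of a memoryless source."""
    return [math.prod(block) for block in itertools.product(pmf, repeat=n)]

pmf = [0.4, 0.6]
pmf2 = extension(pmf, 2)
print([round(q, 2) for q in pmf2])                        # [0.16, 0.24, 0.24, 0.36]
print(round(entropy(pmf), 4), round(entropy(pmf2), 4))    # 0.971 1.9419
```

The second value is twice the first, as expected for i.i.d. symbols.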
Source Coding Algorithms
We have seen that the entropy of the source gives a lower bound on the average length of any prefix-free code for the source.
In this section, we will study some specific algorithms for source coding and compare the achieved average code length with the source entropy.
Shannon-Fano source coding algorithm: This is a suboptimal procedure for designing a prefix-free code for a given discrete memoryless source.
This algorithm achieves an average code length $\bar{L} \leq H(U) + 2$.
Shannon-Fano Source Coding Algorithm
Shannon-Fano coding: Algorithm for constructing a binary prefix-free code for a K-ary random variable U.
Initialization: Given a K-ary random variable with source alphabet $A = \{u_1, u_2, \cdots, u_K\}$ and corresponding probabilities $\Pr(U = u_i) = p_i$. Order the symbols in decreasing order of probability.
Step 1: Divide the symbols into two subgroups such that the sums of the symbol probabilities in the two subgroups are as close as possible.
Step 2: Assign the next most significant bit of the codewords in these two subgroups as 0 and 1, in any order.
Step 3: Repeat Steps 1 and 2 for every subgroup that contains more than one symbol; stop when every subgroup contains only one symbol.
Extract the Shannon-Fano codewords starting from the MSB.
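The steps above can be sketched recursively in Python. This is an illustrative implementation rather than a reference one: the function name, the dictionary interface, and the tie-breaking rule (when two split points balance equally well, the later one is taken) are assumptions, and a different tie-breaking rule may give different codeword lengths. The source used is the six-symbol source of the lecture's examples.

```python
def shannon_fano(pmf):
    """pmf: dict symbol -> probability. Returns dict symbol -> binary codeword."""
    items = sorted(pmf.items(), key=lambda kv: kv[1], reverse=True)
    code = {sym: "" for sym, _ in items}

    def split(group):
        if len(group) < 2:
            return
        total = sum(p for _, p in group)
        running, best_k, best_diff = 0.0, 1, float("inf")
        for k in range(1, len(group)):
            running += group[k - 1][1]
            diff = abs(total - 2 * running)    # |left sum - right sum|
            if diff <= best_diff + 1e-12:      # on ties, prefer the later split
                best_k, best_diff = k, diff
        for sym, _ in group[:best_k]:
            code[sym] += "0"                   # next bit for the first subgroup
        for sym, _ in group[best_k:]:
            code[sym] += "1"                   # next bit for the second subgroup
        split(group[:best_k])
        split(group[best_k:])

    split(items)
    return code

pmf = {"u1": 0.25, "u2": 0.25, "u3": 0.20, "u4": 0.10, "u5": 0.10, "u6": 0.10}
code = shannon_fano(pmf)
avg = sum(pmf[s] * len(code[s]) for s in pmf)
print(code, round(avg, 4))
```

With this tie-breaking rule the codeword lengths are 2, 2, 3, 3, 3, 3 and the average length is 2.5 bits/symbol.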
Shannon-Fano Source Coding Algorithm
Example: Construct Shannon-Fano code for the following source:
P(u1) = 0.25, P(u2) = 0.25, P(u3) = 0.20, P(u4) = 0.10, P(u5) =
0.10, P(u6) = 0.10
Shannon-Fano Source Coding Algorithm
The average code length of the Shannon-Fano code for the given source is
$$\bar{L} = \sum_{i=1}^{K} p_i l_i = 0.25 \times 2 + 0.25 \times 2 + 0.20 \times 3 + 0.10 \times 3 + 0.10 \times 3 + 0.10 \times 3 = 2.5$$
The entropy of the given source is
$$H_2(U) = -\sum_{i=1}^{K} p_i \log_2 p_i = 2.4610$$
The efficiency of the designed code is defined as
$$\text{Code efficiency} = \frac{\text{Source entropy}}{\text{Average code length}}$$
So the code efficiency in this case is 2.4610/2.5 = 0.984 (i.e. 98.4%).
Huffman Source Coding Algorithm
Huffman coding: Huffman codes are optimal prefix-free codes (i.e. with minimum average code length) for a discrete memoryless source with a given probability mass function.
The basic idea of Huffman codes is to assign short codewords to more probable source symbols and longer codewords to less probable source symbols.
The set of codeword lengths for a Huffman (optimal) code is not unique, i.e. there may be different sets of codeword lengths with the same average length.
The codeword length $l_i$ in an optimal code is not necessarily bounded by $\lceil \log_D (1/p_i) \rceil$ for every symbol; individual codewords may be longer.
Huffman Codes (Binary)
Huffman coding: Algorithm for constructing binary (D=2) prefix-free
code for a K-ary discrete memoryless source (random variable U).
Initialization: Given a K-ary random variable, create K active nodes
u1 , u2 , · · · , uK and assign probabilities Pr (U = ui ) = pi to these.
Step 1: Create a new node that combines together the two least
probable nodes and assign label 0 and 1 to the two branches in any
order. Assign the new node a probability equal to the sum of the
probabilities of these two nodes.
Deactivate these two nodes which are combined and make the new
node active.
Step 2: If only one active node is left, make it the root and stop; otherwise go to Step 1.
Extract the Huffman codewords by reading the branch labels along the path from the root to each leaf.
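A compact way to sketch the binary algorithm is with a priority queue that holds the active nodes. The use of `heapq`, the counter used to break ties, and the representation of each node as a dictionary of partial codewords are implementation choices of this sketch, not part of the lecture's description.

```python
import heapq
import itertools

def huffman(pmf):
    """pmf: dict symbol -> probability. Returns dict symbol -> binary codeword."""
    counter = itertools.count()   # tie-breaker so the heap never compares dicts
    heap = [(p, next(counter), {sym: ""}) for sym, p in pmf.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)    # least probable active node
        p1, _, c1 = heapq.heappop(heap)    # second least probable active node
        # Label the two branches 0 and 1 by prefixing the partial codewords.
        merged = {sym: "0" + w for sym, w in c0.items()}
        merged.update({sym: "1" + w for sym, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(counter), merged))
    return heap[0][2]

pmf = {"u1": 0.25, "u2": 0.25, "u3": 0.20, "u4": 0.10, "u5": 0.10, "u6": 0.10}
code = huffman(pmf)
avg = sum(pmf[s] * len(code[s]) for s in pmf)
print(code, round(avg, 4))
```

Because ties between equal probabilities can be resolved in different ways, the individual codeword lengths may differ between implementations, but the average length for this source is 2.5 bits/symbol in every case.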
Huffman Codes (D-ary)
Huffman coding: Algorithm for constructing a D-ary (D > 2) prefix-free code for a K-ary random variable U.
Initialization: Given a K-ary random variable, create K active nodes $u_1, u_2, \cdots, u_K$ and assign the probabilities $\Pr(U = u_i) = p_i$ to them. Compute the remainder p when (K − D)(D − 2) is divided by D − 1 (here p is an integer, not a probability).
Step 1: Create a new node that combines the D − p least probable active nodes using D − p branches of a D-ary branching, and assign the labels 0, 1, · · · , D − p − 1 to these D − p branches in any order. Assign the new node a probability equal to the sum of the probabilities of the D − p combined nodes.
Deactivate the D − p nodes which are combined and make the new node active.
Step 2: If only one active node is left, make it the root and stop; otherwise set p = 0 and go to Step 1.
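The D-ary procedure can be sketched in the same style as the binary one. The remainder rule is taken from the initialization step above; the variable name `rem` (used instead of p to avoid confusion with probabilities), the `heapq`-based node list, and the restriction to D ≤ 10 (so that each label is a single digit) are assumptions of this sketch.

```python
import heapq
import itertools

def huffman_dary(pmf, D=3):
    """pmf: dict symbol -> probability. Returns dict symbol -> D-ary codeword."""
    K = len(pmf)
    counter = itertools.count()   # tie-breaker so the heap never compares dicts
    heap = [(p, next(counter), {sym: ""}) for sym, p in pmf.items()]
    heapq.heapify(heap)
    rem = ((K - D) * (D - 2)) % (D - 1)    # remainder from the initialization step
    m = D - rem                            # number of nodes combined first
    while len(heap) > 1:
        merged, total = {}, 0.0
        for digit in range(m):
            p, _, c = heapq.heappop(heap)
            total += p
            merged.update({sym: str(digit) + w for sym, w in c.items()})
        heapq.heappush(heap, (total, next(counter), merged))
        m = D                              # all later merges combine D nodes
    return heap[0][2]

pmf = {"u1": 0.25, "u2": 0.25, "u3": 0.20, "u4": 0.10, "u5": 0.10, "u6": 0.10}
code = huffman_dary(pmf, D=3)
avg = sum(pmf[s] * len(code[s]) for s in pmf)
print(code, round(avg, 4))
```

For K = 6 and D = 3 the remainder is 1, so the first merge combines only two nodes and every later merge combines three; the resulting average length is 1.7 ternary digits/symbol.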
Huffman Codes
Example: Construct a binary Huffman code for the following source:
P(u1) = 0.25, P(u2) = 0.25, P(u3) = 0.20, P(u4) = 0.10, P(u5) =
0.10, P(u6) = 0.10
Figure: Binary Huffman Code
Huffman Codes
Example: Construct a ternary (3-ary) Huffman code for the
following source: P(u1) = 0.25, P(u2) = 0.25, P(u3) = 0.20, P(u4)
= 0.10, P(u5) = 0.10, P(u6) = 0.10
Figure: Ternary Huffman Code
Huffman Coding Algorithm
The average code length of the binary Huffman code for the given source is
$$\bar{L} = \sum_{i=1}^{K} p_i l_i = 0.25 \times 2 + 0.25 \times 2 + 0.20 \times 2 + 0.10 \times 3 + 0.10 \times 4 + 0.10 \times 4 = 2.5 \text{ bits}$$
The average code length of the ternary Huffman code for the same source is
$$\bar{L} = 0.25 \times 1 + 0.25 \times 1 + 0.20 \times 2 + 0.10 \times 2 + 0.10 \times 3 + 0.10 \times 3 = 1.7 \text{ ternary digits}$$
The code efficiency of the binary Huffman code is 2.4610/2.5 = 0.984 (i.e. 98.4%).
The code efficiency of the ternary Huffman code is 1.552/1.7 ≈ 0.913 (i.e. 91.3%). Note that we have used the entropy in base 3, i.e. $H_3(U)$, here.
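The efficiency figures above can be verified with a short calculation. The function name `efficiency` is illustrative; the codeword lengths are the ones used in the two average-length expressions above.

```python
import math

def efficiency(pmf, lengths, D=2):
    """Code efficiency H_D(U) / L-bar for a D-ary code with the given lengths."""
    H = sum(p * math.log(1.0 / p) / math.log(D) for p in pmf)
    L = sum(p * l for p, l in zip(pmf, lengths))
    return H / L

pmf = [0.25, 0.25, 0.20, 0.10, 0.10, 0.10]
binary_lengths = [2, 2, 2, 3, 4, 4]     # binary Huffman code of the example
ternary_lengths = [1, 1, 2, 2, 3, 3]    # ternary Huffman code of the example

print(round(efficiency(pmf, binary_lengths, D=2), 3))    # 0.984
print(round(efficiency(pmf, ternary_lengths, D=3), 3))   # 0.913
```

The entropy must be computed in the same base as the code alphabet, otherwise the ratio compares bits with ternary digits.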