Module 3
ELEMENTS OF ENCODING
Topics to be covered:
3.1 Purpose of encoding; Separable binary codes
3.2 Shannon-Fano encoding
3.3 Necessary and sufficient conditions for noiseless coding
3.4 Average length of encoded messages; Shannon's binary encoding
3.5 Huffman's minimum redundancy codes
3.6 Lossy and lossless data compression techniques
Huffman Coding
Huffman coding results in an optimal code: for a given source, no uniquely decodable code has a higher efficiency.
The Huffman coding procedure is as follows:
1. List the source symbols in order of decreasing probability.
2. Combine the probabilities of the two symbols having the lowest probabilities, and reorder the resultant probabilities; this step is called reduction 1. The same procedure is repeated until there are two ordered probabilities remaining.
3. Start encoding with the last reduction, which consists of exactly two ordered probabilities. Assign 0 as the first digit in the code words for all the source symbols associated with the first probability; assign 1 to the second probability.
4. Now go back and assign 0 and 1 as the second digit for the two probabilities that were combined in the previous reduction step, retaining the digits already assigned.
5. Keep working backward in this way until the first column is reached.
6. Each code word is obtained by tracing back from right to left. A minimal code sketch of this procedure follows.
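As a concrete illustration, here is a minimal Python sketch of this construction under assumed names (huffman_code and the example symbols are illustrative, not from the slides). It uses a priority queue to pick the two lowest probabilities at each reduction, and prepending each new digit reproduces the right-to-left trace of step 6:

import heapq

def huffman_code(probabilities):
    """Binary Huffman code for a dict {symbol: probability}.

    Each loop iteration is one 'reduction': the two lowest
    probabilities are combined, and every symbol in the combined
    groups gets one more (left-prepended) code digit.
    """
    heap = [(p, i, (sym,)) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    codes = {sym: "" for sym in probabilities}
    tiebreak = len(heap)  # keeps the heap from comparing symbol tuples
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)  # lowest probability
        p1, _, group1 = heapq.heappop(heap)  # second lowest
        for sym in group0:                   # this group's digit: 0
            codes[sym] = "0" + codes[sym]
        for sym in group1:                   # this group's digit: 1
            codes[sym] = "1" + codes[sym]
        heapq.heappush(heap, (p0 + p1, tiebreak, group0 + group1))
        tiebreak += 1
    return codes

For example, huffman_code({"A": 0.4, "B": 0.2, "C": 0.2, "D": 0.1, "E": 0.1}) returns a prefix-free code with code word lengths 2, 2, 2, 3, 3.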
Huffman Encoding - Example
[Worked example table shown as a figure; not reproduced in this extract.]
H(X) = 2.36 b/symbol, L = 2.38 b/symbol
η = H(X)/L = 0.99
Shannon-Fano Code vs. Huffman Code
[Step-by-step construction of both codes shown as figures across four slides; not reproduced in this extract.]
The source coding theorem
The source coding theorem states that for a DMS X with entropy H(X), the average code word length L per symbol is bounded as L ≥ H(X). L can be made as close to H(X) as desired by a suitably chosen code.
Thus, with Lmin = H(X), the code efficiency can be rewritten as η = H(X)/L.
For an M-ary code alphabet, each code letter carries at most log₂ M bits (M = 2 for binary codes, where log₂ M = 1), so the efficiency is re-written as η = H(X)/(L log₂ M).
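As a quick numerical check, here is a short Python computation under an assumed example distribution and code lengths (neither is from the slides):

from math import log2

def entropy(probs):
    """H(X) = -sum(p * log2(p)), in bits per symbol."""
    return -sum(p * log2(p) for p in probs if p > 0)

def avg_length(probs, lengths):
    """Average code word length L = sum(p_i * l_i)."""
    return sum(p * l for p, l in zip(probs, lengths))

p = [0.4, 0.2, 0.2, 0.1, 0.1]  # assumed source distribution
l = [2, 2, 2, 3, 3]            # one optimal set of binary code lengths for p
H, L = entropy(p), avg_length(p, l)
print(f"H(X) = {H:.3f} b/symbol, L = {L:.2f}, efficiency = {H/L:.3f}")
# prints: H(X) = 2.122 b/symbol, L = 2.20, efficiency = 0.965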
Huffman as an optimal code
What is an optimal code?
1. Its code efficiency is maximum.
How and when?
2. It gives the lowest possible average code word length for a given M, which results in maximum efficiency and minimum redundancy.
3. "Compression" is maximum for an efficient source encoder.
Moreover, in an optimal code, symbols that occur more frequently (have a higher probability of occurrence) have shorter code words than symbols that occur less frequently, as the short check below illustrates.
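This property can be verified with the huffman_code sketch above, again with an assumed distribution:

probs = {"A": 0.4, "B": 0.2, "C": 0.2, "D": 0.1, "E": 0.1}  # assumed example
codes = huffman_code(probs)
ranked = sorted(probs, key=probs.get, reverse=True)  # most probable first
lengths = [len(codes[s]) for s in ranked]
assert lengths == sorted(lengths)  # lengths never shrink as probability falls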
Shannon-Fano vs. Huffman Coding
In general, Shannon-Fano and Huffman coding produce encodings of similar size. However, Huffman coding always at least equals the efficiency of the Shannon-Fano method, and exceeds it in some cases. For example:
Shannon-Fano and Huffman Code
Exercise: try encoding the message AAABE with each code.

Symbol   Count   S-F code length   Huffman code length
A        14      2                 1
B        7       2                 3
C        5       2                 3
D        5       3                 3
E        4       3                 3

Encoding a 35-symbol message over {A, B, C, D, E} with the counts above:
ASCII (fixed-length code, 8 bits per symbol): 280 bits
Shannon-Fano code (variable-length): 79 bits
Huffman code (variable-length): 77 bits
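For comparison with the Huffman sketch, here is a minimal recursive Shannon-Fano sketch in Python; the split rule and its tie-breaking are assumptions (texts differ, and different tie-breaking changes the code words):

def shannon_fano(symbols):
    """Shannon-Fano code for a list of (symbol, count) pairs,
    sorted by decreasing count.

    The list is split where the two parts' totals are as nearly
    equal as possible; the top part is prefixed 0, the bottom
    part 1, and each part is encoded recursively.
    """
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(count for _, count in symbols)
    running, split, best = 0, 1, total
    for i in range(1, len(symbols)):
        running += symbols[i - 1][1]
        imbalance = abs(total - 2 * running)
        if imbalance <= best:  # ties go to the larger top part
            best, split = imbalance, i
    codes = {s: "0" + c for s, c in shannon_fano(symbols[:split]).items()}
    codes.update({s: "1" + c for s, c in shannon_fano(symbols[split:]).items()})
    return codes

counts = [("A", 14), ("B", 7), ("C", 5), ("D", 5), ("E", 4)]
print(shannon_fano(counts))
# prints: {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}

With these counts the resulting code lengths (2, 2, 2, 3, 3) match the Shannon-Fano column in the table above.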
Role of M (code alphabet size)

P      S-F code (M=2)   Huffman code (M=2)   S-F code (M=3)   Huffman code (M=3)
0.4    00               0                    0                0
0.2    01               111                  10               2
0.12   100              101                  11               11
0.08   101              1101                 20               12
0.08   110              1100                 21               100
0.08   1110             1001                 220              101
0.04   1111             1000                 221              102
       L = 2.52         L = 2.48             L = 1.72         L = 1.6

H(X) = 2.42 b/symbol in every case.
For M = 2, η = H(X)/L; the efficiencies are left as an exercise.
For M = 3, η = H(X)/(L log₂ 3), giving 88.7% (Shannon-Fano) and 95.4% (Huffman).
Construction of the M = 3 codes above.

Shannon-Fano (M=3):
P      code
0.4    0
0.2    1 0
0.12   1 1
0.08   2 0
0.08   2 1
0.08   2 2 0
0.04   2 2 1

Huffman (M=3), successive reductions (each reduction combines the three lowest probabilities of the previous column):
P      code    P      code    P      code
0.4    0       0.4    0       0.4    0
0.2    2       0.2    2       0.4    1
0.12   11      0.2    10      0.2    2
0.08   12      0.12   11
0.08   100     0.08   12
0.08   101
0.04   102
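The binary sketch generalizes to M-ary codes. Below is a Python sketch under stated assumptions (huffman_mary and the padding-name convention are illustrative, not from the slides); tie-breaking within a merge is arbitrary, so individual code words may differ from the table above even though the average length matches:

import heapq

def huffman_mary(probabilities, M=3):
    """M-ary Huffman code for a dict {symbol: probability}.

    Each reduction combines the M lowest probabilities and assigns
    them the digits 0..M-1. Zero-probability dummy symbols are added
    first so that every reduction combines exactly M entries
    ((n - 1) must be a multiple of (M - 1)).
    """
    items = list(probabilities.items())
    pads = ["__pad%d" % k for k in range((1 - len(items)) % (M - 1))]
    items += [(d, 0.0) for d in pads]  # assumes no real symbol is named __pad*
    heap = [(p, i, (sym,)) for i, (sym, p) in enumerate(items)]
    heapq.heapify(heap)
    codes = {sym: "" for sym, _ in items}
    tiebreak = len(heap)
    while len(heap) > 1:
        merged, total = (), 0.0
        for digit in range(M):  # combine the M lowest entries
            p, _, group = heapq.heappop(heap)
            total += p
            for sym in group:
                codes[sym] = str(digit) + codes[sym]
            merged += group
        heapq.heappush(heap, (total, tiebreak, merged))
        tiebreak += 1
    return {sym: code for sym, code in codes.items() if sym not in pads}

For the seven-symbol distribution above, huffman_mary with M = 3 gives an average length of 1.6 ternary digits per symbol, matching L in the table; with M = 4 it can be used to check the practice problem that follows.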
Practice problem
Construct the Shannon-Fano and Huffman codes for the source below, taking M = 4.

P      S-F code   Huffman code
0.2
0.2
0.15
0.15
0.1
0.1
0.05
0.05
Thank you