Lecture 5
Example
Consider an alphabet of four symbols represented by binary digits as follows:
A = 0
B = 01
C = 011
D = 111
The code is identical to the previous example but with the bits time-reversed. It is still
uniquely decodable, but it is no longer instantaneous, since earlier codewords are now
prefixes of later ones. For example, after receiving 011 the decoder cannot yet tell whether
this is C, or B followed by a codeword beginning with 1, or A followed by the start of
D = 111; it must wait for further bits before deciding.
Shannon Code
For messages x1, x2, x3, ..., xn with probabilities p(x1), p(x2), p(x3), ..., p(xn):

1) li = −log2 p(xi)             if p(xi) = 1/2^r ∈ {1/2, 1/4, 1/8, ...}

2) li = Int[−log2 p(xi)] + 1    if p(xi) ≠ 1/2^r

Also define Fi = Σ_{k=1}^{i−1} p(xk), with F1 = 0, and take as the code word

Ci = (Fi)2 truncated to li binary digits.
Example
Apply the Shannon code to a source of seven messages with probabilities
p(x) = [0.3 0.2 0.15 0.12 0.1 0.08 0.05], then find:
(a) the code efficiency,
(b) p(0) at the encoder output.
Solution
xi     p(xi)   li   Fi     Ci      0i (number of 0 digits in Ci)
x1     0.3     2    0      00      2
x2     0.2     3    0.3    010     2
x3     0.15    3    0.5    100     2
x4     0.12    4    0.65   1010    2
x5     0.10    4    0.77   1100    2
x6     0.08    4    0.87   1101    1
x7     0.05    5    0.95   11110   1

Each Ci is obtained by expanding Fi in binary and keeping the first li digits.
H(X) = − Σ_{i=1}^{7} p(xi) log2 p(xi) = 2.6029 bits/message.
LC = Σ_{i=1}^{7} li p(xi) = 3.1 bits/message

η = H(X)/LC × 100% = (2.6029/3.1) × 100% = 83.965%
p(0) = 0.603
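The procedure above can be sketched in a few lines of Python to check such examples. The function name shannon_code, the radix parameter r, and the rounding tolerance are illustrative assumptions, not part of the lecture:

import math

def shannon_code(probs, r=2):
    # probs: message probabilities in descending order; r: code radix (2 = binary)
    lengths, codes = [], []
    F = 0.0                                    # cumulative probability Fi
    for p in probs:
        l = -math.log(p, r)
        li = round(l) if abs(l - round(l)) < 1e-9 else int(l) + 1   # rules 1) / 2)
        digits, frac = [], F                   # expand Fi in base r and keep
        for _ in range(li):                    # the first li digits -> Ci
            frac *= r
            d = int(frac)
            digits.append(str(d))
            frac -= d
        lengths.append(li)
        codes.append("".join(digits))
        F += p
    LC = sum(p * l for p, l in zip(probs, lengths))            # average code length
    H = -sum(p * math.log(p, r) for p in probs)                # source entropy (base r)
    p0 = sum(p * c.count("0") for p, c in zip(probs, codes)) / LC   # P('0') at the output
    return codes, LC, H, p0

p = [0.3, 0.2, 0.15, 0.12, 0.10, 0.08, 0.05]
codes, LC, H, p0 = shannon_code(p)
print(codes)              # ['00', '010', '100', '1010', '1100', '1101', '11110']
print(LC, H / LC, p0)     # ~3.1, ~0.8397 (83.97 %), ~0.603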
Example
Repeat the previous example using ternary coding.
Solution
1) li = −log3 p(xi)             if p(xi) = 1/3^r ∈ {1/3, 1/9, 1/27, ...}

2) li = Int[−log3 p(xi)] + 1    otherwise
xi     p(xi)   li   Fi     Ci     0i
x1     0.3     2    0      00     2
x2     0.2     2    0.3    02     1
x3     0.15    2    0.5    11     0
x4     0.12    2    0.65   12     0
x5     0.10    3    0.77   202    1
x6     0.08    3    0.87   212    0
x7     0.05    3    0.95   221    0

Each Ci is the ternary expansion of Fi kept to li digits.
H(X) = − Σ_{i=1}^{7} p(xi) log3 p(xi) = 1.642 ternary units/message.
η = H(X)/LC × 100% = (1.642/2.23) × 100% = 73.632%

p(0) = Σ 0i p(xi) / LC = (0.6 + 0.2 + 0.1)/2.23
p(0) = 0.404
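Running the same hypothetical shannon_code sketch with r = 3, i.e. shannon_code(p, r=3), reproduces the ternary results above: codes ['00', '02', '11', '12', '202', '212', '221'], LC ≈ 2.23, efficiency ≈ 73.6%, and p(0) ≈ 0.404.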
In Shannon–Fano coding, the symbols are arranged in order from most probable to
least probable, and then divided into two sets whose total probabilities are as close
as possible to being equal. All symbols then have the first digits of their codes
assigned; symbols in the first set receive "0" and symbols in the second set receive
"1". As long as any sets with more than one member remain, the same process is
repeated on those sets, to determine successive digits of their codes.
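The splitting procedure just described can be sketched in Python as follows. The function name shannon_fano and the way ties in the split point are broken are illustrative choices and may differ from a hand computation:

def shannon_fano(symbols, probs):
    # symbols: list ordered by decreasing probability; probs: dict symbol -> probability
    codes = {s: "" for s in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(probs[s] for s in group)
        running, k, best_diff = 0.0, 1, float("inf")
        for i in range(1, len(group)):         # choose the split point that makes the two
            running += probs[group[i - 1]]     # halves' total probabilities closest to equal
            diff = abs(2 * running - total)
            if diff < best_diff:
                best_diff, k = diff, i
        for i, s in enumerate(group):          # first half gets '0', second half gets '1'
            codes[s] += "0" if i < k else "1"
        split(group[:k])
        split(group[k:])

    split(list(symbols))
    return codes

p = {"x1": 0.35, "x2": 0.2, "x3": 0.15, "x4": 0.12, "x5": 0.10, "x6": 0.08}
print(shannon_fano(list(p), p))
# {'x1': '00', 'x2': '01', 'x3': '100', 'x4': '101', 'x5': '110', 'x6': '111'}

(The probabilities used here are those of the later example with p(x) = [0.35 0.2 0.15 0.12 0.1 0.08]; the printed codes match the table given there.)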
Example:
For the five symbols with the following frequencies and probabilities, design a suitable
Shannon-Fano binary code. Calculate the average code length, the source entropy, and
the efficiency.
L = Σ_{j=1}^{5} P(xj) lj = 2.28 bits/symbol

η = H(Y)/L × 100 = (2.18567/2.28) × 100 = 95.86%
Example
Develop the Shannon-Fano code for the following set of messages,
p(x) = [0.35 0.2 0.15 0.12 0.1 0.08], then find the code efficiency.
Solution
xi p( xi ) Code li
x1 0.35 0 0 2
x2 0.2 0 1 2
x3 0.15 1 0 0 3
x4 0.12 1 0 1 3
x5 0.10 1 1 0 3
x6 0.08 1 1 1 3
LC = Σ_{i=1}^{6} li p(xi) = 2.45 bits/symbol
η = H(X)/LC × 100% = 97.796%
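Continuing the shannon_fano sketch given earlier (same illustrative names and the same dictionary p), the average length and efficiency can be checked numerically:

import math
codes = shannon_fano(list(p), p)
LC = sum(p[s] * len(c) for s, c in codes.items())      # ~2.45 bits/symbol
H = -sum(q * math.log2(q) for q in p.values())         # ~2.396 bits/symbol
print(LC, 100 * H / LC)                                # efficiency ~97.8 %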
Example
Repeat the previous example using r = 3 (ternary coding).
Solution
xi p( xi ) Code li
x1 0.35 0 1
x2 0.2 1 0 2
x3 0.15 1 1 2
x4 0.12 2 0 2
x5 0.10 2 1 2
x6 0.08 2 2 2
LC = Σ_{i=1}^{6} li p(xi) = 1.65 ternary units/symbol
H(X) = − Σ_{i=1}^{6} p(xi) log3 p(xi) = 1.512 ternary units/symbol
η = H(X)/LC × 100% = (1.512/1.65) × 100% = 91.636%
Huffman Code
The Huffman coding algorithm comprises two steps, reduction and splitting. These
steps can be summarized as follows:
1) Reduction: arrange the messages in decreasing order of probability, combine the r
least probable messages into one message whose probability is the sum of their
probabilities, and reorder; repeat until only r messages remain.
2) Splitting: assign the r code digits 0, 1, ..., r − 1 to the final r messages, then move
back through the reductions, appending one more digit to distinguish the messages
that were combined at each stage.
η = (2.12193/2.2) × 100 = 96.45%

The average code word length is still 2.2 bits/symbol, but the variances of the code
word lengths are different!
Example
Eight symbols A ... H have the probabilities p(A) = 0.10, p(B) = 0.18, p(C) = 0.40,
p(D) = 0.05, p(E) = 0.06, p(F) = 0.10, p(G) = 0.07, p(H) = 0.04. Arranging them in
decreasing order and combining the two smallest probabilities at each stage (assigning
a 0 to one of the two combined entries and a 1 to the other during the splitting step)
gives the reduction table:

Symbol   p      probabilities through successive reductions
C        0.40   0.40   0.40   0.40   0.40   0.40   0.60
B        0.18   0.18   0.18   0.19   0.23   0.37   0.40
A        0.10   0.10   0.13   0.18   0.19   0.23
F        0.10   0.10   0.10   0.13   0.18
G        0.07   0.09   0.10   0.10
E        0.06   0.07   0.09
D        0.05   0.06
H        0.04
LC = Σ_{i=1}^{8} li p(xi) = 2.61 bits/symbol
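A short Python sketch of the reduction step, using a priority queue to combine the two least probable entries repeatedly, reproduces the average length found above. The function name huffman_lengths and the use of the heapq module are implementation choices, not the lecture's notation, and ties between equal probabilities may be broken differently than in the hand reduction:

import heapq

def huffman_lengths(probs):
    # probs: dict symbol -> probability; returns dict symbol -> binary code word length
    heap = [(p, [s]) for s, p in probs.items()]   # (subtree probability, symbols in subtree)
    heapq.heapify(heap)
    lengths = {s: 0 for s in probs}
    while len(heap) > 1:
        p1, g1 = heapq.heappop(heap)              # the two least probable entries are
        p2, g2 = heapq.heappop(heap)              # combined; every symbol in the combined
        for s in g1 + g2:                         # subtree gains one more code digit
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, g1 + g2))
    return lengths

p = {"A": 0.10, "B": 0.18, "C": 0.40, "D": 0.05,
     "E": 0.06, "F": 0.10, "G": 0.07, "H": 0.04}
L = huffman_lengths(p)
print(L)                              # e.g. C: 1, B: 3, D: 5, H: 5, ...
print(sum(p[s] * L[s] for s in p))    # ~2.61 bits/symbol

The splitting step (the actual assignment of 0s and 1s) retraces the same combinations in reverse; only the code word lengths are needed to compute LC.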
Note:
For a source of n symbols to be encoded with an r-ary Huffman code, (n − r)/(r − 1)
must be an integer; otherwise, add redundant symbols with probability zero until the
condition is satisfied.
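For example, with n = 6 messages and r = 3, (n − r)/(r − 1) = 3/2 is not an integer, so one dummy message with probability zero is added, giving n = 7 and (7 − 3)/(3 − 1) = 2.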
Data Compression:
In computer science and information theory, data compression, source coding, or bit-
rate reduction involves encoding information using fewer bits than the original
representation. Compression can be either lossy or lossless.
Run-Length Encoding (RLE):
The input message to an RLE encoder is of variable length (a run of repeated symbols)
while the output code word is of fixed length, unlike the Huffman code, where the input
symbols are of fixed length and the output code words vary in length.
Example: Consider these repeated pixel values in an image: 0 0 0 0 0 0 0 0 0 0 0 0
5 5 5 5 0 0 0 0 0 0 0 0. We could represent them more efficiently as (12, 0)(4, 5)(8, 0):
24 bytes reduced to 6, which gives a compression ratio of 24/6 = 4:1.
Example: Original sequence (1 row): 111122233333311112222 can be encoded as
(4,1),(3,2),(6,3),(4,1),(4,2): 21 bytes reduced to 10, which gives a compression ratio of
21/10 = 21:10.
Example: Original sequence (1 row): HHHHHHHUFFFFFFFFFFFFFF can be encoded as
(7,H),(1,U),(14,F): 22 bytes reduced to 6, which gives a compression ratio of 22/6 = 11:3.
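A run-length encoder and decoder producing the (count, value) pairs used in these examples can be sketched as follows; the function names are illustrative only:

def rle_encode(seq):
    runs = []
    for v in seq:
        if runs and runs[-1][1] == v:
            runs[-1][0] += 1              # extend the current run
        else:
            runs.append([1, v])           # start a new run
    return [(n, v) for n, v in runs]

def rle_decode(runs):
    out = []
    for n, v in runs:
        out.extend([v] * n)               # repeat each value n times
    return out

pairs = rle_encode("H" * 7 + "U" + "F" * 14)
print(pairs)                              # [(7, 'H'), (1, 'U'), (14, 'F')]
print("".join(rle_decode(pairs)))         # restores the original 22-symbol sequence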
Savings Ratio: The savings ratio is related to the compression ratio and is a measure of
the amount of redundancy between two representations (compressed and
uncompressed). Let:
N1 = the total number of bytes required to store an uncompressed (raw) source image.
N2 = the total number of bytes required to store the compressed data.
The compression ratio Cr is then defined as:
Cr = N1 / N2
Larger compression ratios indicate more effective compression.
Smaller compression ratios indicate less effective compression.
A compression ratio less than one indicates that the "compressed" representation is
actually larger than the original, which happens when the uncompressed data has a
high degree of irregularity (little redundancy).
The saving ratio Sr is then defined as:

Sr = (N1 − N2) / N1
A higher saving ratio indicates more effective compression; negative ratios are possible
and indicate that the compressed image occupies more memory than the original.
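As a quick check of these two definitions on the first RLE example above (N1 = 24 bytes, N2 = 6 bytes), a couple of lines suffice; the function name is illustrative:

def ratios(n1, n2):
    cr = n1 / n2              # compression ratio  Cr = N1 / N2
    sr = (n1 - n2) / n1       # saving ratio       Sr = (N1 - N2) / N1
    return cr, sr

print(ratios(24, 6))          # (4.0, 0.75): 4:1 compression, 75% of the bytes saved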