Lecture 5

A = 0
B = 10
C = 110
D = 111

This code can be instantaneously decoded since no complete codeword is a prefix of a larger codeword. This is in contrast to the previous example, where A is a prefix of both B and D. This code is also a 'comma code': the symbol zero indicates the end of a codeword, except for the all-ones word, whose length is known.

Example
Consider an alphabet of 4 symbols represented by binary digits as follows:
A=0
B = 01
C = 011
D = 111
The code is identical to the previous example but with the bits time-reversed. It is still
uniquely decodable but no longer instantaneous, since earlier codewords are now prefixes
of later ones.
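
The prefix condition can be checked mechanically. The following minimal Python sketch (illustrative only; the function name is ours) tests whether a set of codewords is instantaneous, i.e. prefix-free:

def is_instantaneous(codewords):
    # A code is instantaneous exactly when no codeword is a prefix of another.
    return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

print(is_instantaneous(["0", "10", "110", "111"]))  # True: the comma code above
print(is_instantaneous(["0", "01", "011", "111"]))  # False: its time-reversed version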

Shannon Code
For messages x1, x2, x3, ..., xn with probabilities p(x1), p(x2), p(x3), ..., p(xn):

1) li = −log2 p(xi)            if p(xi) = (1/2)^r for some integer r, i.e. p(xi) ∈ {1/2, 1/4, 1/8, ...}
2) li = Int[−log2 p(xi)] + 1   if p(xi) ≠ (1/2)^r

Also define:

Fi = Σ_{k=1}^{i−1} p(xk),  with F1 = 0.

The codeword of xi is then the binary equivalent of Fi taken to li bits:

Ci = (Fi)2 truncated to li bits,

where Ci is the binary equivalent of Fi up to li bits. In encoding, the messages must be
arranged in decreasing order of probability.
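
The procedure can be summarised in a short Python sketch (a minimal illustration, not an optimised implementation; the name shannon_code is ours, and the radix r is a parameter so the same sketch also covers the ternary case later in this lecture). It assumes the probabilities are already sorted in decreasing order:

import math

def shannon_code(probs, r=2):
    # probs: probabilities sorted in decreasing order.
    # li = -log_r(p) when p is an exact power of 1/r, otherwise Int[-log_r(p)] + 1;
    # Ci is the radix-r expansion of Fi (sum of the preceding probabilities) to li digits.
    codewords = []
    F = 0.0
    for p in probs:
        x = -math.log(p) / math.log(r)
        li = round(x) if abs(x - round(x)) < 1e-9 else math.floor(x) + 1
        digits, frac = [], F
        for _ in range(li):          # repeated multiplication gives the digits of Fi
            frac *= r
            digits.append(str(int(frac)))
            frac -= int(frac)
        codewords.append("".join(digits))
        F += p
    return codewords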



Example
Develop the Shannon code for the following set of messages,
p( x) = [0.3 0.2 0.15 0.12 0.1 0.08 0.05]

then find:
(a) Code efficiency,
(b) p(0) at the encoder output.
Solution

xi    p(xi)   li   Fi     Ci      0i (zeros in Ci)
x1    0.30    2    0      00      2
x2    0.20    3    0.30   010     2
x3    0.15    3    0.50   100     2
x4    0.12    4    0.65   1010    2
x5    0.10    4    0.77   1100    2
x6    0.08    4    0.87   1101    1
x7    0.05    5    0.95   11110   1

(Each Ci is obtained from the binary expansion of Fi: repeatedly multiply Fi by 2 and take the integer part as the next bit, for li bits.)



(a) To find the code efficiency, we have

LC = Σ_{i=1}^{7} li p(xi) = 3.1 bits/message.

H(X) = −Σ_{i=1}^{7} p(xi) log2 p(xi) = 2.6029 bits/message.

η = H(X)/LC × 100% = 83.965%

(b) p(0) at the encoder output is

p(0) = Σ_{i=1}^{7} 0i p(xi) / LC = (0.6 + 0.4 + 0.3 + 0.24 + 0.2 + 0.08 + 0.05) / 3.1

p(0) = 0.603
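
As a quick check, applying the shannon_code sketch defined earlier (an illustrative helper, not part of the original notes) to this example reproduces the table and the figures:

import math

p = [0.3, 0.2, 0.15, 0.12, 0.10, 0.08, 0.05]
codes = shannon_code(p, r=2)                     # helper from the earlier sketch
Lc = sum(len(c) * pi for c, pi in zip(codes, p))
H = -sum(pi * math.log2(pi) for pi in p)
p0 = sum(c.count("0") * pi for c, pi in zip(codes, p)) / Lc
print(codes)                    # ['00', '010', '100', '1010', '1100', '1101', '11110']
print(Lc, H, 100 * H / Lc, p0)  # 3.1, ~2.603, ~83.97 %, ~0.603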
Example
Repeat the previous example using ternary coding.
Solution
1) li = −log3 p(xi)            if p(xi) = (1/3)^r for some integer r, i.e. p(xi) ∈ {1/3, 1/9, 1/27, ...}
2) li = Int[−log3 p(xi)] + 1   if p(xi) ≠ (1/3)^r

and Ci = (Fi)3 truncated to li ternary digits.

xi    p(xi)   li   Fi     Ci     0i (zeros in Ci)
x1    0.30    2    0      00     2
x2    0.20    2    0.30   02     1
x3    0.15    2    0.50   11     0
x4    0.12    2    0.65   12     0
x5    0.10    3    0.77   202    1
x6    0.08    3    0.87   212    0
x7    0.05    3    0.95   221    0

(Each Ci is the ternary expansion of Fi: repeatedly multiply Fi by 3 and take the integer part as the next digit, for li digits.)

(a) To find the code efficiency, we have

LC = Σ_{i=1}^{7} li p(xi) = 2.23 ternary units/message.

H(X) = −Σ_{i=1}^{7} p(xi) log3 p(xi) = 1.642 ternary units/message.

η = H(X)/LC × 100% = 73.632%

(b) p(0) at the encoder output is

p(0) = Σ_{i=1}^{7} 0i p(xi) / LC = (0.6 + 0.2 + 0.1) / 2.23

p(0) = 0.404
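
The same check for the ternary code, reusing the shannon_code sketch with r = 3:

p = [0.3, 0.2, 0.15, 0.12, 0.10, 0.08, 0.05]
codes = shannon_code(p, r=3)                     # helper from the earlier sketch
print(codes)                                     # ['00', '02', '11', '12', '202', '212', '221']
Lc = sum(len(c) * pi for c, pi in zip(codes, p))
p0 = sum(c.count("0") * pi for c, pi in zip(codes, p)) / Lc
print(Lc, p0)                                    # 2.23 ternary units/message, ~0.404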



Shannon-Fano Code:

In Shannon–Fano coding, the symbols are arranged in order from most probable to
least probable, and then divided into two sets whose total probabilities are as close
as possible to being equal. All symbols then have the first digits of their codes
assigned; symbols in the first set receive "0" and symbols in the second set receive
"1". As long as any sets with more than one member remain, the same process is
repeated on those sets, to determine successive digits of their codes.
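
The splitting procedure can be written as a short recursive Python sketch (illustrative; the name shannon_fano is ours, and the split point is chosen so that the two group totals are as close to equal as possible, matching the description above):

def shannon_fano(symbols):
    # symbols: list of (name, probability) pairs sorted in decreasing order of probability.
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(p for _, p in symbols)
    running, best_k, best_diff = 0.0, 1, float("inf")
    for k in range(1, len(symbols)):          # find the split with the most equal halves
        running += symbols[k - 1][1]
        diff = abs(2 * running - total)
        if diff < best_diff:
            best_k, best_diff = k, diff
    codes = {}
    for name, code in shannon_fano(symbols[:best_k]).items():
        codes[name] = "0" + code              # first set gets a leading 0
    for name, code in shannon_fano(symbols[best_k:]).items():
        codes[name] = "1" + code              # second set gets a leading 1
    return codes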

Example:

Design a suitable Shannon-Fano binary code for the five symbols with the counts and
probabilities given below. Calculate the average code length, source entropy
and efficiency.

Symbol   Count   Probability   Binary code   Length
A        15      0.385         00            2
B        7       0.1795        01            2
C        6       0.154         10            2
D        6       0.154         110           3
E        5       0.128         111           3

The average codeword length:

L = Σ_{j=1}^{m} P(xj) lj

L = 2 × 0.385 + 2 × 0.1795 + 2 × 0.154 + 3 × 0.154 + 3 × 0.128 = 2.28 bits/symbol



The source entropy is:

H(Y) = −Σ_{j=1}^{m} P(yj) log2 P(yj)

H(Y) = −[0.385 ln 0.385 + 0.1795 ln 0.1795 + 2 × 0.154 ln 0.154 + 0.128 ln 0.128] / ln 2

H(Y) = 2.18567 bits/symbol

The code efficiency:

η = H(Y)/L × 100 = 2.18567/2.28 × 100 = 95.86%
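
Running the shannon_fano sketch above on this example (with the probabilities taken as exact fractions of the total count, so the numbers differ slightly from the rounded hand calculation) gives the same codes:

import math

counts = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
total = sum(counts.values())
symbols = sorted(((s, n / total) for s, n in counts.items()),
                 key=lambda sp: sp[1], reverse=True)
codes = shannon_fano(symbols)                 # helper from the sketch above
L = sum(len(codes[s]) * p for s, p in symbols)
H = -sum(p * math.log2(p) for _, p in symbols)
print(codes)                # {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}
print(L, H, 100 * H / L)    # ~2.28, ~2.186, ~95.8 %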

Example
Develop the Shannon-Fano code for the following set of messages,
p(x) = [0.35 0.2 0.15 0.12 0.1 0.08], then find the code efficiency.
Solution
xi    p(xi)   Code   li
x1    0.35    00     2
x2    0.20    01     2
x3    0.15    100    3
x4    0.12    101    3
x5    0.10    110    3
x6    0.08    111    3

LC = Σ_{i=1}^{6} li p(xi) = 2.45 bits/symbol

H(X) = −Σ_{i=1}^{6} p(xi) log2 p(xi) = 2.396 bits/symbol

η = H(X)/LC × 100% = 97.796%

Example
Repeat the previous example using r = 3 (ternary coding).
Solution

xi    p(xi)   Code   li
x1    0.35    0      1
x2    0.20    10     2
x3    0.15    11     2
x4    0.12    20     2
x5    0.10    21     2
x6    0.08    22     2

LC = Σ_{i=1}^{6} li p(xi) = 1.65 ternary units/symbol

H(X) = −Σ_{i=1}^{6} p(xi) log3 p(xi) = 1.512 ternary units/symbol

η = H(X)/LC × 100% = 91.636%

Huffman Code

The Huffman coding algorithm comprises two steps, reduction and splitting. These
steps can be summarized as follows:



1) Reduction
   a) List the symbols in descending order of probability.
   b) Reduce the r least probable symbols to one symbol with a probability equal to their combined probability.
   c) Reorder in descending order of probability at each stage.
   d) Repeat the reduction step until only r symbols remain.

2) Splitting
   a) Assign 0, 1, ..., r − 1 to the r final symbols and work backwards.
   b) Expand or lengthen the code to cope with each successive split.
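
For the binary case (r = 2) the whole reduction can be driven by a priority queue. The sketch below is a minimal Python illustration (the name huffman_code is ours); the individual 0/1 assignments, and hence the exact codewords, can differ from a hand construction when probabilities tie, but the average codeword length is the same:

import heapq
from itertools import count

def huffman_code(probs):
    # probs: dict mapping symbol -> probability.
    tiebreak = count()                        # keeps the heap from comparing dicts on ties
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)       # the two least probable entries ...
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}         # ... are merged, prefixing 0
        merged.update({s: "1" + c for s, c in c2.items()})   # and 1 to their codewords
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

For the example below, huffman_code({'a1': 0.2, 'a2': 0.4, 'a3': 0.2, 'a4': 0.1, 'a5': 0.1}) yields an average length of 2.2 bits/symbol whatever the tie-breaking, since every Huffman code for a given source has the same (minimum) average length.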

Example: Design Huffman codes for A = {a1, a2, ..., a5}, having the probabilities
{0.2, 0.4, 0.2, 0.1, 0.1}.



The average codeword length:

L = 0.4 × 1 + 0.2 × 2 + 0.2 × 3 + 0.1 × 4 + 0.1 × 4 = 2.2 bits/symbol

The source entropy:

H(Y) = −[0.4 ln 0.4 + 2 × 0.2 ln 0.2 + 2 × 0.1 ln 0.1] / ln 2 = 2.12193 bits/symbol

The code efficiency:

η = 2.12193/2.2 × 100 = 96.45%

Huffman codes can also be designed for minimum variance of the codeword lengths (by placing each combined symbol as high as possible among entries of equal probability). The average codeword length is still 2.2 bits/symbol, but the variances of the two codes are different, as the comparison below shows.
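
A rough comparison of the two variances. The lengths {2, 2, 2, 3, 3} for the minimum-variance code are an assumption here (the corresponding tree is not reproduced above); {1, 2, 3, 4, 4} are the lengths used in the average-length calculation, with the probabilities sorted in decreasing order:

p = [0.4, 0.2, 0.2, 0.1, 0.1]
l_first = [1, 2, 3, 4, 4]        # lengths of the first Huffman code above
l_minvar = [2, 2, 2, 3, 3]       # assumed lengths of the minimum-variance code
for lengths in (l_first, l_minvar):
    L = sum(pi * li for pi, li in zip(p, lengths))
    var = sum(pi * (li - L) ** 2 for pi, li in zip(p, lengths))
    print(L, var)                # 2.2, 1.36   and then   2.2, 0.16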

Example

Develop the Huffman code for the following set of symbols

Symbol A B C D E F G H

Probability 0.1 0.18 0.4 0.05 0.06 0.1 0.07 0.04



Solution
The reduction stages are shown below; each column lists the probabilities in descending order after the two smallest entries of the previous column have been combined. In the splitting step, 0 and 1 are assigned to the two members of each combined pair.

Symbol  Stage 1  Stage 2  Stage 3  Stage 4  Stage 5  Stage 6  Stage 7  Stage 8
C       0.40     0.40     0.40     0.40     0.40     0.40     0.60     1.0
B       0.18     0.18     0.18     0.19     0.23     0.37     0.40
A       0.10     0.10     0.13     0.18     0.19     0.23
F       0.10     0.10     0.10     0.13     0.18
G       0.07     0.09     0.10     0.10
E       0.06     0.07     0.09
D       0.05     0.06
H       0.04

So we obtain the following codes:

Symbol        A     B     C   D       E      F      G      H
Probability   0.1   0.18  0.4 0.05    0.06   0.1    0.07   0.04
Codeword      011   001   1   00010   0101   0000   0100   00011
li            3     3     1   5       4      4      4      5
H(X) = −Σ_{i=1}^{8} p(xi) log2 p(xi) = 2.552 bits/symbol

LC = Σ_{i=1}^{8} li p(xi) = 2.61 bits/symbol

η = H(X)/LC × 100% = 97.778%
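
A short numerical check of these figures from the codeword table above:

import math

p = {"A": 0.1, "B": 0.18, "C": 0.4, "D": 0.05,
     "E": 0.06, "F": 0.1, "G": 0.07, "H": 0.04}
code = {"A": "011", "B": "001", "C": "1", "D": "00010",
        "E": "0101", "F": "0000", "G": "0100", "H": "00011"}
Lc = sum(p[s] * len(c) for s, c in code.items())       # 2.61 bits/symbol
H = -sum(pi * math.log2(pi) for pi in p.values())      # ~2.552 bits/symbol
print(Lc, H, 100 * H / Lc)                             # efficiency ~97.78 %
print(sum(2 ** -len(c) for c in code.values()))        # Kraft sum = 1.0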

Note:
For an r-ary Huffman code on n symbols, the quantity (n − r)/(r − 1) must be an integer; otherwise, add redundant (dummy) symbols with probability zero until the condition is satisfied. For example, with n = 6 and r = 3, (6 − 3)/(3 − 1) = 1.5 is not an integer, so one dummy symbol is added to make n = 7.
Data Compression:
In computer science and information theory, data compression, source coding, or bit-
rate reduction involves encoding information using fewer bits than the original
representation. Compression can be either lossy or lossless.

Lossless data compression algorithms usually exploit statistical redundancy to


represent data more concisely without losing information, so that the process is
reversible. Lossless compression is possible because most real-world data has statistical
redundancy. For example, an image may have areas of color that do not change over
several pixels.

Lossy data compression is the converse of lossless data compression. In these


schemes, some loss of information is acceptable. Dropping nonessential detail from the
data source can save storage space. There is a corresponding trade-off between
preserving information and reducing size.

Run-Length Encoding (RLE):


Run-Length Encoding is a very simple lossless data compression technique that
replaces runs of two or more of the same character with a number representing the
length of the run, followed by the original character; single characters are coded as
runs of 1. RLE is useful for highly redundant data, e.g. indexed images with many pixels
of the same color in a row.
Example:



Input: AAABBCCCCDEEEEEEAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAA
Output: 3A2B4C1D6E38A

The runs fed to an RLE encoder are of variable length while each output token has a fixed form (count, character), unlike Huffman coding, where the input symbols are fixed and the output codewords vary in length.
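
A minimal encoder/decoder pair in Python (illustrative; the function names are ours, and the count-then-character format matches the example above):

import re
from itertools import groupby

def rle_encode(text):
    # each run of a repeated character becomes "<count><character>"
    return "".join(f"{len(list(group))}{ch}" for ch, group in groupby(text))

def rle_decode(encoded):
    # invert "<count><character>" pairs back into the expanded runs
    return "".join(ch * int(n) for n, ch in re.findall(r"(\d+)(\D)", encoded))

msg = "AAABBCCCCD" + "E" * 6 + "A" * 38
print(rle_encode(msg))                        # 3A2B4C1D6E38A
assert rle_decode(rle_encode(msg)) == msg     # the round trip is lossless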
Example: Consider these repeated pixel values in an image: 0 0 0 0 0 0 0 0 0 0 0 0 5 5 5 5 0 0 0 0 0 0 0 0. We could represent them more efficiently as (12, 0)(4, 5)(8, 0); 24 bytes reduced to 6 gives a compression ratio of 24/6 = 4:1.
Example: Original sequence (1 row): 111122233333311112222 can be encoded as (4,1),(3,2),(6,3),(4,1),(4,2); 21 bytes reduced to 10 gives a compression ratio of 21/10 = 21:10.
Example: Original sequence (1 row): HHHHHHHUFFFFFFFFFFFFFF can be encoded as (7,H),(1,U),(14,F); 22 bytes reduced to 6 gives a compression ratio of 22/6 = 11:3.
Savings Ratio: the savings ratio is related to the compression ratio and is a measure of
the amount of redundancy between two representations (compressed and uncompressed). Let:
N1 = the total number of bytes required to store the uncompressed (raw) source image.
N2 = the total number of bytes required to store the compressed data.
The compression ratio Cr is then defined as:

Cr = N1 / N2

• Larger compression ratios indicate more effective compression.
• Smaller compression ratios indicate less effective compression.
• Compression ratios less than one indicate that the compressed representation is actually larger than the uncompressed one (the data has a high degree of irregularity).

The savings ratio Sr is then defined as:

Sr = (N1 − N2) / N1

• Higher savings ratios indicate more effective compression; negative ratios are possible and indicate that the compressed image occupies more memory than the original.



Example: if a 5 Megabyte image is compressed into a 1 Megabyte image, the savings
ratio is (5 − 1)/5 = 4/5 = 80%.
This indicates that 80% of the uncompressed data has been eliminated in the
compressed encoding.
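
The same arithmetic as a tiny sketch:

def compression_ratio(n1, n2):
    return n1 / n2                  # Cr = N1 / N2

def savings_ratio(n1, n2):
    return (n1 - n2) / n1           # Sr = (N1 - N2) / N1

print(compression_ratio(5, 1))      # 5.0 : 5 MB compressed to 1 MB
print(savings_ratio(5, 1))          # 0.8 : 80 % of the raw data eliminated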

