
Information Theory

Mohamed Hamada
Software Engineering Lab
The University of Aizu

Email: [email protected]
URL: http://www.u-aizu.ac.jp/~hamada
Today’s Topics

• Source Coding Techniques
• Huffman Code
• Two-pass Huffman Code
• Lempel-Ziv Encoding
• Lempel-Ziv Decoding
Source Coding Techniques

1. Huffman Code
2. Two-pass Huffman Code
3. Lempel-Ziv Code
4. Fano Code
5. Shannon Code
6. Arithmetic Code

[Block diagram: Information Source → Source Encoder → Channel Encoder → Modulator → Channel → De-Modulator → Channel Decoder → Source Decoder → User of Information]
Source Coding Techniques
1. Huffman Code

With the Huffman code, in the binary case, the two least probable source output symbols are joined together, resulting in a new message alphabet with one symbol less:

1. Take the two smallest probabilities together: P(i) + P(j)
2. Replace symbols i and j by the new symbol
3. Go to 1, until only one symbol remains

A sketch of this procedure is given below. Application examples: JPEG, MPEG, MP3
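A minimal Python sketch of this merging procedure, using a priority queue. The 0/1 labels chosen at each merge are arbitrary, so the codewords it prints may differ from the tables that follow while the codeword lengths agree.

import heapq

def huffman_code(probs):
    # Build a binary Huffman code for a dict {symbol: probability}.
    # Heap entries: (probability, tie-breaker, symbols under this node).
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    code = {s: "" for s in probs}
    tie = len(heap)
    while len(heap) > 1:
        # 1. take the two smallest probabilities together
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        # label the two branches 0 and 1
        for s in syms1:
            code[s] = "0" + code[s]
        for s in syms2:
            code[s] = "1" + code[s]
        # 2. replace the two symbols by the new combined symbol
        heapq.heappush(heap, (p1 + p2, tie, syms1 + syms2))
        tie += 1
    return code

probs = {"s0": 0.1, "s1": 0.2, "s2": 0.4, "s3": 0.2, "s4": 0.1}
code = huffman_code(probs)
print(code)
print(sum(probs[s] * len(code[s]) for s in probs))   # average length: 2.2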
1. Huffman Code

ADVANTAGES:
• uniquely decodable code
• smallest average codeword length

DISADVANTAGES:
• large code tables add complexity
• sensitive to channel errors
1. Huffman Code

For computer data, data reduction should be

lossless → no errors at reproduction
universal → effective for different types of data

Huffman coding is not universal! It is only valid for one particular type of source: if the probability distribution of the source is unknown, the Huffman code cannot be applied.
Huffman Coding: Example

Compute the Huffman code for the source shown:

Source Symbol sk   Symbol Probability pk
s0                 0.1
s1                 0.2
s2                 0.4
s3                 0.2
s4                 0.1

Note that the entropy of S is

H(S) = 0.4 log2(1/0.4) + 2 × 0.2 log2(1/0.2) + 2 × 0.1 log2(1/0.1) = 2.12193 bits/symbol

so any uniquely decodable code for this source must have average length L ≥ 2.12193.
Solution A

At each stage the two least probable entries are merged into one and their probabilities are added; the merged entry is re-inserted into the ordered list. Labeling the two branches of every merge with 0 and 1 and reading the labels back from the last stage to the first gives the codewords.

Source Symbol sk   Stage I   Stage II   Stage III   Stage IV   Code
s2                 0.4       0.4        0.4         0.6        00
s1                 0.2       0.2        0.4         0.4        10
s3                 0.2       0.2        0.2                    11
s0                 0.1       0.2                               010
s4                 0.1                                         011
Solution A Cont’d

Source Symbol sk   Symbol Probability pk   Codeword ck
s0                 0.1                     010
s1                 0.2                     10
s2                 0.4                     00
s3                 0.2                     11
s4                 0.1                     011

H(S) = 2.12193

L = 0.4 × 2 + 0.2 × 2 + 0.2 × 2 + 0.1 × 3 + 0.1 × 3 = 2.2

H(S) ≤ L < H(S) + 1

THIS IS NOT THE ONLY SOLUTION!
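Both figures are easy to verify numerically; a quick Python check, assuming the Solution A codewords above:

import math

p = {"s0": 0.1, "s1": 0.2, "s2": 0.4, "s3": 0.2, "s4": 0.1}
code = {"s0": "010", "s1": "10", "s2": "00", "s3": "11", "s4": "011"}

H = sum(pk * math.log2(1 / pk) for pk in p.values())
L = sum(p[s] * len(code[s]) for s in p)
print(H, L)   # 2.12193... and 2.2, so H(S) <= L < H(S) + 1 holds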
Another Solution B

Source Symbol sk   Stage I   Stage II   Stage III   Stage IV   Code
s2                 0.4       0.4        0.4         0.6        1
s1                 0.2       0.2        0.4         0.4        01
s3                 0.2       0.2        0.2                    000
s0                 0.1       0.2                               0010
s4                 0.1                                         0011
Another Solution B Cont’d

Source Symbol sk   Symbol Probability pk   Codeword ck
s0                 0.1                     0010
s1                 0.2                     01
s2                 0.4                     1
s3                 0.2                     000
s4                 0.1                     0011

H(S) = 2.12193

L = 0.4 × 1 + 0.2 × 2 + 0.2 × 3 + 0.1 × 4 + 0.1 × 4 = 2.2

H(S) ≤ L < H(S) + 1
What is the difference between the two solutions?

• They have the same average length.
• They differ in the variance of the codeword length:

σ² = Σ_{k=0}^{K−1} pk (lk − L)²

• Solution A: σ² = 0.16
• Solution B: σ² = 1.36
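Substituting the codeword lengths from the two tables confirms these values:

Solution A (lengths 2, 2, 2, 3, 3):
σ² = 0.4(2 − 2.2)² + 0.2(2 − 2.2)² + 0.2(2 − 2.2)² + 0.1(3 − 2.2)² + 0.1(3 − 2.2)²
   = 0.032 + 0.128 = 0.16

Solution B (lengths 1, 2, 3, 4, 4):
σ² = 0.4(1 − 2.2)² + 0.2(2 − 2.2)² + 0.2(3 − 2.2)² + 0.1(4 − 2.2)² + 0.1(4 − 2.2)²
   = 0.576 + 0.008 + 0.128 + 0.648 = 1.36

The lower variance of Solution A means its codeword lengths are closer to uniform, which is generally preferred in practice.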
Source Coding Techniques
2. Two-pass Huffman Code

This method is used when the probabilities of the symbols in the information source are unknown. We first estimate these probabilities by counting the occurrences of each symbol in the given message, and then build the Huffman code from the estimates. This is summarized in the following two passes:

Pass 1: Measure the occurrence frequency of each character in the message.
Pass 2: Build the Huffman code from the measured frequencies.
Source Coding Techniques
2. Two-pass Huffman Code

Example

Consider the message: M = ABABABABABACADABACADABACADABACAD

L(M) = 32
#(A) = 16   p(A) = 16/32 = 0.5
#(B) = 8    p(B) = 8/32 = 0.25
#(C) = 4    p(C) = 4/32 = 0.125
#(D) = 4    p(D) = 4/32 = 0.125

A sketch of both passes is given below.
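The example stops at the estimated probabilities; completing pass 2 with the Huffman procedure gives, for example, A = 0, B = 10, C = 110, D = 111 (one valid assignment among several). Since the probabilities are powers of 1/2, the average length, 1.75 bits/symbol, equals the entropy. A minimal Python sketch of both passes, reusing the huffman_code function defined earlier:

from collections import Counter

def two_pass_huffman(message):
    # Pass 1: estimate symbol probabilities from occurrence counts
    counts = Counter(message)
    n = len(message)
    probs = {s: c / n for s, c in counts.items()}
    # Pass 2: build a Huffman code from the estimated probabilities
    return huffman_code(probs)

M = "ABABABABABACADABACADABACADABACAD"
print(two_pass_huffman(M))   # e.g. {'A': '0', 'B': '10', 'C': '110', 'D': '111'}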
Lempel-Ziv Coding

• Huffman coding requires knowledge of a probabilistic model of the source
• This is not always feasible
• The Lempel-Ziv code is an adaptive coding technique that does not require prior knowledge of symbol probabilities
• Lempel-Ziv coding is the basis of the well-known ZIP data compression format
Lempel-Ziv Coding History

• Universal: effective for different types of data
• Lossless: no errors at reproduction

Applications:
GIF, TIFF, V.42bis modem compression standard, PostScript Level 2

History:
- 1977: published by Abraham Lempel and Jacob Ziv
- 1984: the LZ-Welch (LZW) algorithm published in IEEE Computer
- 1986: the Sperry patent was transferred to Unisys; the GIF file format required use of the LZW algorithm
Lempel-Ziv Coding Example

Input: 0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 1 0 1…

The input is parsed into the shortest subsequences not encountered before. With the single bits 0 and 1 pre-assigned to codebook positions 1 and 2, the parsing proceeds left to right: 00, 01, 011, 10, 010, 100, 101, each new subsequence being entered into the codebook as it is found.

Each parsed subsequence consists of a previously seen prefix plus one new (innovation) bit. Its representation is the codebook index of the prefix followed by the index of the innovation symbol; the encoding writes the prefix index as a 3-bit binary number (1 → 001, 2 → 010, 4 → 100, 6 → 110) followed by the innovation bit itself. For example, subsequence 011 = prefix 01 (index 4) + innovation 1, so its representation is 42 and its encoding is 100 1.

Codebook Index   1   2   3      4      5      6      7      8      9
Subsequence      0   1   00     01     011    10     010    100    101
Representation           11     12     42     21     41     61     62
Encoding                 0010   0011   1001   0100   1000   1100   1101

Information bits:     0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 1 0 1…
Source encoded bits:  0010 0011 1001 0100 1000 1100 1101
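A minimal Python sketch of this scheme, hard-coding the 3-bit index width used in the example (a practical implementation would let the index width grow with the codebook):

def lz_encode(bits):
    book = {"0": 1, "1": 2}   # single bits pre-assigned to positions 1 and 2
    out = []
    w = ""                    # subsequence parsed so far
    for b in bits:
        if w + b in book:
            w = w + b         # still a known subsequence: keep extending
        else:
            # emit prefix index (3 bits) followed by the innovation bit
            out.append(format(book[w], "03b") + b)
            book[w + b] = len(book) + 1   # enter the new subsequence
            w = ""                        # start parsing the next one
    return out                # any incomplete final subsequence is dropped here

print(lz_encode("000101110010100101"))
# ['0010', '0011', '1001', '0100', '1000', '1100', '1101']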
How Come this is Compression?!

• The hope is: if the bit sequence is long enough, the fixed-length code words eventually become shorter than the subsequences they represent.
• When applied to English text:
  • Lempel-Ziv achieves approximately 55% compression
  • Huffman coding achieves approximately 43%
Encoding Idea: Lempel-Ziv-Welch (LZW)

Assume we have just read a segment w of the text, and a is the next symbol.

• If wa is not in the dictionary:
  - write the index of w to the output file,
  - add wa to the dictionary, and set w ← a.
• If wa is in the dictionary:
  - process the next symbol with segment wa.

A Python sketch of this encoder follows the worked example below.
LZ Encoding Example

Initial dictionary: a → 0, b → 1, c → 2

Consider the input string: a a b a a c a b c a b

LZ encoding process:                                       Output   Update
aa not in dictionary: output 0 (a), add aa                 0        aa → 3
ab not in dictionary: continue with a, output 0, add ab    0        ab → 4
ba not in dictionary: continue with b, output 1, add ba    1        ba → 5
aa in dictionary, aac not: output 3 (aa), add aac          3        aac → 6
ca not in dictionary: output 2 (c), add ca                 2        ca → 7
ab in dictionary, abc not: output 4 (ab), add abc          4        abc → 8
ca in dictionary, cab not: output 7 (ca), add cab          7        cab → 9
end of input: output 1 (b)                                 1

aabaacabcab → LZ Encoder → 0 0 1 3 2 4 7 1
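A minimal Python sketch of this encoder (the dictionary indices are emitted as plain integers here; a real coder would pack them into bits):

def lzw_encode(text, alphabet="abc"):
    dictionary = {s: i for i, s in enumerate(alphabet)}   # a → 0, b → 1, c → 2
    out = []
    w = ""
    for a in text:
        if w + a in dictionary:
            w = w + a                 # wa is known: keep extending the segment
        else:
            out.append(dictionary[w])            # write the index of w
            dictionary[w + a] = len(dictionary)  # add wa to the dictionary
            w = a                                # continue from symbol a
    if w:
        out.append(dictionary[w])     # flush the final segment
    return out

print(lzw_encode("aabaacabcab"))   # [0, 0, 1, 3, 2, 4, 7, 1]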


UNIVERSAL (LZW) Decoder

1. Start with the basic symbol set.

2. Read a code c from the compressed file.
   - The address c in the dictionary determines the segment w.
   - Write w to the output file.

3. Add wa to the dictionary, where a is the first letter of the next segment, and repeat from step 2.
LZ Decoding Example

Initial dictionary: a → 0, b → 1, c → 2

LZ decoding process:                                            Output   Update
read 0: output a (the new entry's last letter is still unknown) a
read 0: output a, which determines the missing letter: add aa   a        aa → 3
read 1: output b, add ab                                        b        ab → 4
read 3: output aa, add ba                                       aa       ba → 5
read 2: output c, add aac                                       c        aac → 6
read 4: output ab, add ca                                       ab       ca → 7
read 7: output ca, add abc                                      ca       abc → 8
read 1: output b, add cab                                       b        cab → 9

0 0 1 3 2 4 7 1 → LZ Decoder → aabaacabcab
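A matching Python sketch of the decoder. The branch for a code that is not yet in the dictionary cannot occur in this example, but is included because the general LZW decoder needs it:

def lzw_decode(codes, alphabet="abc"):
    dictionary = {i: s for i, s in enumerate(alphabet)}   # 0 → a, 1 → b, 2 → c
    w = dictionary[codes[0]]
    out = [w]
    for c in codes[1:]:
        # special case: c refers to the entry still being built, w + w[0]
        entry = dictionary[c] if c in dictionary else w + w[0]
        out.append(entry)
        # previous segment + first letter of the current segment
        dictionary[len(dictionary)] = w + entry[0]
        w = entry
    return "".join(out)

print(lzw_decode([0, 0, 1, 3, 2, 4, 7, 1]))   # aabaacabcab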


Exercise

1. Find a Huffman code for the following source:

Symbol   Probability
h        0.1
e        0.1
l        0.4
o        0.25
w        0.05
r        0.05
d        0.05

2. Find the LZ code for the following input:

0011001111010100010001001
