Information Theory: Mohamed Hamada
Mohamed Hamada
Software Engineering Lab
The University of Aizu
Email: [email protected]
URL: http://www.u-aizu.ac.jp/~hamada
Today’s Topics
Source Coding Techniques
1. Huffman Code.
2. Two-pass Huffman Code.
3. Lempel-Ziv Code.
[Block diagram: Source Encoder → Channel Encoder → Modulator → Channel → De-Modulator → Channel Decoder → Source Decoder]
Source Coding Techniques
1. Huffman Code.
2. Two-pass Huffman Code.
3. Lempel-Ziv Code.
4. Fano Code.
5. Shannon Code.
6. Arithmetic Code.
Source Coding Techniques
1. Huffman Code.
With the Huffman code, in the binary case, the two least probable source symbols are merged into a single symbol, giving a reduced alphabet with one fewer symbol. The step is repeated until only two symbols remain, and codeword bits are assigned by walking the merges back. A minimal sketch of this procedure is given below.
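Here is a minimal Python sketch of this merging procedure (an added illustration, not from the slides); the symbols and probabilities are taken from the example that follows, while the function name and tie-breaking rule are assumptions.

```python
import heapq

def huffman_code(probabilities):
    """Build a binary Huffman code: repeatedly merge the two least
    probable groups, prepending a bit to every codeword in each group."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)   # least probable group
        p1, _, group1 = heapq.heappop(heap)   # second least probable group
        merged = {s: "0" + c for s, c in group0.items()}
        merged.update({s: "1" + c for s, c in group1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

if __name__ == "__main__":
    source = {"s0": 0.1, "s1": 0.2, "s2": 0.4, "s3": 0.2, "s4": 0.1}
    code = huffman_code(source)
    for symbol in sorted(code):
        print(symbol, code[symbol])
    # One optimal assignment; ties may give a different but equally good code.
```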
1. Huffman Code.
ADVANTAGES:
• uniquely decodable (prefix-free) code
• smallest average codeword length of any prefix code for the source
DISADVANTAGES:
• large code tables add complexity
• sensitive to channel errors
1. Huffman Code.
Huffman Coding: Example
• Compute the Huffman code for the source shown:

Symbol sk    Probability pk
s0           0.1
s1           0.2
s2           0.4
s3           0.2
s4           0.1

Note that the entropy of S is

H(S) = 0.4 log2(1/0.4) + 2 × 0.2 log2(1/0.2) + 2 × 0.1 log2(1/0.1) = 2.12193 bits/symbol

so the average codeword length of any code for S satisfies L ≥ 2.12193.
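As a quick check (an added snippet, not from the slides), the same entropy can be computed directly:

```python
import math

probabilities = [0.1, 0.2, 0.4, 0.2, 0.1]
# H(S) = sum of p * log2(1/p) over all symbols
entropy = sum(p * math.log2(1 / p) for p in probabilities)
print(round(entropy, 5))  # 2.12193
```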
Solution A

Symbol sk    Stage I
s2           0.4
s1           0.2
s3           0.2
s0           0.1
s4           0.1
Solution A

Symbol sk    Stage I    Stage II
s2           0.4        0.4
s1           0.2        0.2
s3           0.2        0.2
s0           0.1        0.2
s4           0.1
Solution A

Symbol sk    Stage I    Stage II    Stage III
s2           0.4        0.4         0.4
s1           0.2        0.2         0.4
s3           0.2        0.2         0.2
s0           0.1        0.2
s4           0.1
Solution A

Symbol sk    Stage I    Stage II    Stage III    Stage IV
s2           0.4        0.4         0.4          0.6
s1           0.2        0.2         0.4          0.4
s3           0.2        0.2         0.2
s0           0.1        0.2
s4           0.1
Solution A

Symbol sk    Stage I    Stage II    Stage III    Stage IV
s2           0.4        0.4         0.4          0.6
s1           0.2        0.2         0.4          0.4
s3           0.2        0.2         0.2
s0           0.1        0.2
s4           0.1

At each merge the upper branch is labelled 0 and the lower branch 1; reading the labels back from the last stage to the first gives the codewords 00, 10, 11, 010 and 011 shown on the next slide.
Solution A

Symbol sk    Stage I    Stage II    Stage III    Stage IV    Code
s2           0.4        0.4         0.4          0.6         00
s1           0.2        0.2         0.4          0.4         10
s3           0.2        0.2         0.2                      11
s0           0.1        0.2                                  010
s4           0.1                                             011
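To illustrate the "uniquely decodable" property claimed earlier, here is an added sketch (not from the slides) that decodes a bit stream with the Solution A codebook; because the code is prefix-free, the decoder never needs to look ahead.

```python
# Solution A codebook from the slides
codebook = {"s2": "00", "s1": "10", "s3": "11", "s0": "010", "s4": "011"}
decode_table = {cw: sym for sym, cw in codebook.items()}

def encode(symbols):
    return "".join(codebook[s] for s in symbols)

def decode(bits):
    symbols, current = [], ""
    for b in bits:
        current += b
        if current in decode_table:      # prefix-free: first match is the symbol
            symbols.append(decode_table[current])
            current = ""
    return symbols

message = ["s2", "s0", "s4", "s1"]
bits = encode(message)                  # "00" + "010" + "011" + "10" = "0001001110"
print(bits, decode(bits) == message)    # 0001001110 True
```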
Solution A Cont’d

Symbol sk    Probability pk    Codeword ck
s0           0.1               010
s1           0.2               10
s2           0.4               00
s3           0.2               11
s4           0.1               011

H(S) = 2.12193

L = 0.4 × 2 + 0.2 × 2 + 0.2 × 2 + 0.1 × 3 + 0.1 × 3 = 2.2

so H(S) ≤ L < H(S) + 1, as guaranteed for a Huffman code.
Another Solution B Cont’d

Symbol sk    Probability pk    Codeword ck
s0           0.1               0010
s1           0.2               01
s2           0.4               1
s3           0.2               000
s4           0.1               0011

H(S) = 2.12193

L = 0.4 × 1 + 0.2 × 2 + 0.2 × 3 + 0.1 × 4 + 0.1 × 4 = 2.2

Again H(S) ≤ L < H(S) + 1.
What is the difference between the two solutions?
• They have the same average codeword length
• They differ in the variance of the codeword length about that average:

σ² = ∑ pk (lk − L)²   (sum over k = 0, …, K − 1)

• Solution A: σ² = 0.16
• Solution B: σ² = 1.36
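A quick numerical check of both solutions (an added sketch; the codeword lengths are read off the tables above):

```python
p = [0.1, 0.2, 0.4, 0.2, 0.1]      # probabilities of s0..s4
lengths_A = [3, 2, 2, 2, 3]        # codeword lengths for Solution A
lengths_B = [4, 2, 1, 3, 4]        # codeword lengths for Solution B

def stats(lengths):
    L = sum(pk * lk for pk, lk in zip(p, lengths))                  # average length
    var = sum(pk * (lk - L) ** 2 for pk, lk in zip(p, lengths))     # variance
    return L, var

L_A, var_A = stats(lengths_A)
print(round(L_A, 2), round(var_A, 2))   # 2.2 0.16
L_B, var_B = stats(lengths_B)
print(round(L_B, 2), round(var_B, 2))   # 2.2 1.36
```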
Source Coding Techniques
1. Huffman Code.
2. Two-pass Huffman Code.
3. Lempel-Ziv Code.
4. Fano Code.
5. Shannon Code.
6. Arithmetic Code.
Source Coding Techniques
2. Two-pass Huffman Code.

In the two-pass scheme the encoder first reads through the data to estimate the symbol probabilities (first pass), then builds a Huffman code from those estimates and encodes the data with it (second pass). The probability table (or the code itself) must be transmitted to the decoder along with the encoded data.

Example
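As a small sketch (the message below is an assumption, not taken from the slides), the first pass estimates the probabilities with a simple count; the second pass then applies the Huffman procedure sketched earlier:

```python
from collections import Counter

message = "aabaacabcab"          # hypothetical data to be compressed
counts = Counter(message)        # first pass: symbol frequencies
total = len(message)
probabilities = {sym: n / total for sym, n in counts.items()}
print(probabilities)             # approximately {'a': 0.545, 'b': 0.273, 'c': 0.182}
# Second pass: build a Huffman code from these probabilities
# (e.g. with the huffman_code() sketch shown earlier) and encode the message.
```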
Source Coding Techniques
1. Huffman Code.
2. Two-pass Huffman Code.
3. Lempel-Ziv Code.
4. Fano Code.
5. Shannon Code.
6. Arithmetic Code.
Lempel-Ziv Coding
• Huffman coding requires knowledge of a probabilistic model of the source
• This is not always feasible in practice
• The Lempel-Ziv code is an adaptive coding technique that does not require prior knowledge of the symbol probabilities
• Lempel-Ziv coding is the basis of the well-known ZIP format for data compression
Lempel-Ziv Coding History
• Universal: effective for many different types of data
Applications:
GIF, TIFF, V.42bis modem compression standard, PostScript Level 2
History:
- 1977: published by Abraham Lempel and Jacob Ziv
- 1984: LZ-Welch (LZW) algorithm published in IEEE Computer
- 1986: Sperry patent transferred to Unisys
- The GIF file format required use of the LZW algorithm
Lempel-Ziv Coding Example
Input: 0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 1 0 1…

Codebook index:   1   2   3   4   5   6   7   8   9
Subsequence:      0   1
Representation:
Encoding:
Lempel-Ziv Coding Example
0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 1 0 1…

Codebook index:   1   2   3   4   5   6   7   8   9
Subsequence:      0   1   00
Representation:
Encoding:
Lempel-Ziv Coding Example
0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 1 0 1…

Codebook index:   1   2   3   4   5   6   7   8   9
Subsequence:      0   1   00  01
Representation:
Encoding:
Lempel-Ziv Coding Example
0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 1 0 1…

Codebook index:   1   2   3   4   5   6   7   8   9
Subsequence:      0   1   00  01  011
Representation:
Encoding:
Lempel-Ziv Coding Example
0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 1 0 1…

Codebook index:   1   2   3   4   5   6   7   8   9
Subsequence:      0   1   00  01  011 10
Representation:
Encoding:
Lempel-Ziv Coding Example
0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 1 0 1…

Codebook index:   1   2   3   4   5   6   7   8   9
Subsequence:      0   1   00  01  011 10  010 100 101
Representation:
Encoding:
Lempel-Ziv Coding Example
0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 1 0 1…

Each new subsequence is represented by the codebook index of its prefix followed by the index of its final (innovation) bit (1 for "0", 2 for "1"):

Codebook index:   1   2   3   4   5   6   7   8   9
Subsequence:      0   1   00  01  011 10  010 100 101
Representation:           11  12  42  21  41  61  62
Encoding:
Lempel-Ziv Coding Example
0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 1 0 1…

The pointer part of each representation is written in binary:
Decimal: 1 → 001,  2 → 010,  4 → 100,  6 → 110

Codebook index:   1   2   3   4   5   6   7   8   9
Subsequence:      0   1   00  01  011 10  010 100 101
Representation:           11  12  42  21  41  61  62
Innovation bit:           0   1   1   0   0   0   1
Lempel-Ziv Coding Example
0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 1 0 1…

Decimal: 1 → 001,  2 → 010,  4 → 100,  6 → 110

Codebook index:   1    2    3     4     5     6     7     8     9
Subsequence:      0    1    00    01    011   10    010   100   101
Representation:             11    12    42    21    41    61    62
Encoding:                   0010  0011  1001  0100  1000  1100  1101
Lempel-Ziv Coding Example
Information bits:      0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 1 0 1…
Source-encoded bits:   0010 0011 1001 0100 1000 1100 1101

Codebook index:   1   2   3   4   5   6   7   8   9
Subsequence:      0   1   00  01  011 10  010 100 101
Representation:           11  12  42  21  41  61  62
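The parsing and encoding above can be reproduced with a short script. This is a hedged sketch of the textbook LZ-style scheme used in the example (binary alphabet, subsequences "0" and "1" assumed pre-stored, 3-bit pointers plus one innovation bit); the function name and the fixed pointer width are illustrative assumptions.

```python
def lz_encode(bits, pointer_bits=3):
    """Parse the bit string into previously unseen subsequences and emit,
    for each new subsequence, a fixed-width pointer to its prefix plus
    the single innovation bit (as in the lecture example)."""
    codebook = {"0": 1, "1": 2}          # '0' and '1' assumed already stored
    blocks = []
    i = 0
    while i < len(bits):
        # Extend the current phrase while it is still in the codebook
        j = i + 1
        while j <= len(bits) and bits[i:j] in codebook:
            j += 1
        if j > len(bits):                # remaining tail is already known; stop
            break
        phrase = bits[i:j]               # new subsequence: known prefix + innovation bit
        prefix, innovation = phrase[:-1], phrase[-1]
        blocks.append(format(codebook[prefix], f"0{pointer_bits}b") + innovation)
        codebook[phrase] = len(codebook) + 1
        i = j
    return blocks

print(lz_encode("000101110010100101"))
# ['0010', '0011', '1001', '0100', '1000', '1100', '1101']
```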
How Come this is Compression?!
• The hope is:
• If the bit sequence is long enough, the fixed-length code words eventually become shorter than the subsequences they represent.
• When applied to English text:
• Lempel-Ziv achieves approximately 55% compression
• Huffman coding achieves approximately 43% compression
Encoding idea: Lempel-Ziv-Welch (LZW)
● If the segment wa (the current segment w followed by the next input symbol a) is already in the dictionary, continue processing with the segment wa.
● Otherwise, output the dictionary code for w, add wa to the dictionary, and continue processing with the segment a.
LZ Encoding example

Initial dictionary:  a → 0,  b → 1,  c → 2

Consider the input message:
Input string: a a b a a c a b c a b

LZ encoding process:
Input read    Action                                                Output      Update
a a           "aa" not in dictionary: output code of "a"            0           aa → 3
a a b         continue with "a"; "ab" not in dictionary: output 0   0           ab → 4
a a b a       continue with "b"; "ba" not in dictionary: output 1   1           ba → 5
…             (continuing in the same way)                          3 2 4 7 1   aac → 6, ca → 7, abc → 8, cab → 9

Encoder output: 0 0 1 3 2 4 7 1
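A compact sketch of this LZW encoder (an added illustration; the function name is an assumption, while the initial dictionary matches the example):

```python
def lzw_encode(message):
    """LZW: grow the current segment w while w + next symbol is known;
    otherwise emit the code for w, store the new segment, and restart."""
    dictionary = {"a": 0, "b": 1, "c": 2}    # initial dictionary from the example
    w = ""
    output = []
    for symbol in message:
        if w + symbol in dictionary:
            w += symbol                                  # wa known: keep extending
        else:
            output.append(dictionary[w])                 # emit code for w
            dictionary[w + symbol] = len(dictionary)     # store wa
            w = symbol                                   # continue with segment a
    if w:
        output.append(dictionary[w])                     # flush the final segment
    return output

print(lzw_encode("aabaacabcab"))   # [0, 0, 1, 3, 2, 4, 7, 1]
```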
LZ Decoding example

Initial dictionary:  a → 0,  b → 1,  c → 2

Each new dictionary entry is the previous output string plus the first symbol of the current output string, so it is completed one step after it is first needed.

LZ decoding process:
Input code   Output so far             Update
0            a
0            a a                       aa → 3
1            a a b                     ab → 4
3            a a b a a                 ba → 5
2            a a b a a c               aac → 6
4            a a b a a c a b           ca → 7
7            a a b a a c a b c a       abc → 8
1            a a b a a c a b c a b     cab → 9

Decoded string: a a b a a c a b c a b
0011001111010100010001001
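And a matching LZW decoder sketch (again an added illustration under the same assumptions); note how each dictionary entry is completed only after the next output reveals its final symbol:

```python
def lzw_decode(codes):
    """Rebuild the dictionary on the fly: each new entry is the previous
    output string plus the first symbol of the current output string."""
    dictionary = {0: "a", 1: "b", 2: "c"}    # same initial dictionary as the encoder
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            current = dictionary[code]
        else:                                 # code not yet in dictionary (the special LZW case)
            current = previous + previous[0]
        output.append(current)
        dictionary[len(dictionary)] = previous + current[0]
        previous = current
    return "".join(output)

print(lzw_decode([0, 0, 1, 3, 2, 4, 7, 1]))   # aabaacabcab
```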