Ic23 Unit01 Script

So far on Image Compression ...

overview: lossless vs. lossy compression

Lossless compression is a prerequisite for lossy codecs.

Information theory provides the mathematical background.

• tools for analysis of input data
• optimality statements

This part of the lecture does not necessarily deal with images.

• zip, bzip2, compress, deflate, ...
Image Compression 2023 – Learning Unit 1

Learning Unit 1
Basic Entropy Coding I:
Introduction to Information Theory

Contents
1. Basic Definitions and Mathematical Tools
2. Encoding and Decoding
3. Self-Information and Entropy
4. Prefix-free Codes
5. Shannon-Fano and Tunstall Codes

© 2023 Christian Schmaltz, Pascal Peter
Basic Definitions

goals of lossless image compression:

Allow reconstruction of the unmodified input data.
Reduce the size of the compressed file as much as possible.

preliminary questions:

How to define an image?
How to measure the efficiency of a compression algorithm?
Images (1)

How to define an image?

many correct definitions depending on the point of view:

acquisition view: e.g. sensor responses, photon count
continuous mathematical interpretation (useful later)
discrete mathematical interpretation (useful now)
Images (2)

Discrete Images

Typical input image:

resolution nx × ny pixels
fixed bit depth, e.g. 8 bit: grey values 0, ..., 255

Mathematical representation:

image domain Ω = {1, ..., nx} × {1, ..., ny}
image u maps each position (i, j)ᵀ ∈ Ω to a grey value:

u : Ω → {0, ..., 255}

notation: u_{i,j} := u(i, j)
similar to matrix notation, but rows and columns are swapped!
Images (3)

Vectorised Images

turn a 2-D image into a vector by traversing the pixels row by row
1-D vector images are often referred to as signals.
Thus, an image can be seen as a sequence of numbers from {0, ..., 255}.
In a similar way, this works for higher dimensional and colour images.
Even higher level of abstraction: consider sequences of arbitrary symbols.
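A minimal Python sketch of this row-by-row vectorisation (the image size and grey values are made-up numbers for illustration):

# Turn a small 2-D grey value image into a 1-D signal by traversing
# the pixels row by row (values are hypothetical).
image = [
    [12, 200, 37],   # row 1
    [90,  90, 255],  # row 2
]

signal = [value for row in image for value in row]
print(signal)  # [12, 200, 37, 90, 90, 255]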
File Sizes and Compression Rates (1)

Algorithms only differ in the size of the compressed file and the run time.

Compression Ratio

The compression ratio C is defined as:

C := (size of uncompressed file) / (size of compressed file)

Typically given as a ratio, e.g. 40 : 1.
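As a small illustration in Python (the file sizes are hypothetical example values):

# Compression ratio C = (size of uncompressed file) / (size of compressed file).
uncompressed_bytes = 3_000_000
compressed_bytes = 75_000

C = uncompressed_bytes / compressed_bytes
print(f"compression ratio: {C:.0f} : 1")  # 40 : 1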
File Sizes and Compression Rates (2)

Bitrate

The bitrate (also called bit rate or data rate) is defined as:

R := (size of compressed data) / (amount of uncompressed data)

Unit depends on the kind of data, e.g.

• bit per symbol for general data
• bit per pixel (bpp) for images
• bit per second (bit/s, b/s, bps), byte per second (byte/s, B/s, Bps), bytes per frame (for video streams), ...
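A corresponding Python sketch for the bitrate of an image in bit per pixel (the compressed size and resolution are hypothetical):

# Bitrate R = (size of compressed data) / (amount of uncompressed data),
# here measured in bit per pixel (bpp) for a grey value image.
compressed_bytes = 75_000
nx, ny = 1000, 750            # image resolution in pixels

bpp = compressed_bytes * 8 / (nx * ny)
print(f"bitrate: {bpp:.2f} bpp")  # 0.80 bpp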
File Sizes and Compression Rates (3)

Decimal unit      Value        Binary unit       Value        Difference
Kilobyte (kB)     10^3  Byte   Kibibyte (KiB)    2^10 Byte     2.4 %
Megabyte (MB)     10^6  Byte   Mebibyte (MiB)    2^20 Byte     4.9 %
Gigabyte (GB)     10^9  Byte   Gibibyte (GiB)    2^30 Byte     7.4 %
Terabyte (TB)     10^12 Byte   Tebibyte (TiB)    2^40 Byte    10.0 %
...
File Sizes and Compression Rates (4)

Prefixes

Binary units are often denoted like their decimal counterparts.

In rare cases, they are even mixed:

• In 1987, a 3½-inch "High Density" floppy disk was introduced.
• This disk has a "marketed capacity" of 1.44 MB (see image).
• However, the real capacity is
  1,474,560 bytes = 1440 · 1024 bytes = 1440 KiB = 1.440 · 1000 KiB ≠ 1.44 MB

Image of a "1.44 MB" floppy disk. From: Wikimedia Commons, user:Kobafd90.
Probability Theory (1)

The probability of an event E is denoted by P(E).

Two events are called independent if the occurrence of either has no influence on the probability of the other.

The expected value E(X) of a random variable X is the average result one expects.
Probability Theory (2)

Recap: Probability Theory (Mathematical)

Ω is the sample space (e.g. for a coin flip Ω = {heads, tails}).

Events are the elements of the σ-algebra A ⊆ P(Ω) (power set P):

• Ω ∈ A
• A ∈ A ⇒ Ω \ A ∈ A
• ∀i ∈ {1, ..., ∞} : A_i ∈ A ⇒ ∪_{i=1}^{∞} A_i ∈ A

The probability distribution P : A → [0, 1] is a probability measure on A:

• ∀A ∈ A : P(A) ≥ 0
• P(∅) = 0, P(Ω) = 1
• ∀A_i ∈ A with A_i ∩ A_j = ∅ for i ≠ j : P(∪_{i=1}^{∞} A_i) = ∑_{i=1}^{∞} P(A_i)

Then, (Ω, A, P) constitutes a probability space.

complex definition, but covers both the continuous and the discrete case
Probability Theory (3)

Formally, two events E1 and E2 are independent if and only if

P(E1 ∩ E2) = P(E1) P(E2)

A random variable is a measurable function X from the set Ω to some set Ω′:

• ∀B ∈ A′ : X^{-1}(B) ∈ A (the preimage of a measurable set is again measurable)

The expected value E(X) of a random variable X is given by

E(X) := ∫_Ω X dP

if this integral converges absolutely. In the discrete setting, this simplifies to

E(X) := ∑_i x_i P(x_i)

where the range of X is ∪_i {x_i}.
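A minimal Python sketch of the discrete expected value; the fair six-sided die is just an illustrative assumption, not taken from the slides:

from fractions import Fraction

# Discrete expected value E(X) = sum_i x_i * P(x_i) for a fair die.
values = [1, 2, 3, 4, 5, 6]
probabilities = [Fraction(1, 6)] * 6

expected_value = sum(x * p for x, p in zip(values, probabilities))
print(expected_value)         # 7/2
print(float(expected_value))  # 3.5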
Encoding and Decoding (1)

Let S = {s1, . . . , sm} be the source alphabet, i.e. the set of all symbols possibly appearing in a file/image/text/video.

Random variables are used to model the symbols actually appearing.

Let A = {a1, . . . , an}, n ≥ 2 be the code alphabet, i.e. the possible set of symbols in the encoded data.

The set of all words from an alphabet S is denoted by S*.

The set of all non-empty words from an alphabet S is denoted by S+.

Coding

Converting information/words from S+ to A+ is called encoding.
The reverse process is called decoding.
Encoding and Decoding (2)

Assumptions

For the moment, we make the following assumptions:

• All symbols are independent and identically distributed.
• Each symbol si appears with a certain probability pi := P(si).
• All pi are known in advance.
• The length of the source word is known.

In practice, the pi are usually estimated.
Encoding and Decoding (3)

Encoding Functions

An encoding function is a map

Φ : S+ → A+

For each word v ∈ S+, Φ maps v to its so-called code word.

The set of all code words is called code.

The code generated by an encoding function Φ is called

• unambiguous if Φ is injective,
• ambiguous otherwise.
Encoding and Decoding (4)

Valid Decoder-Recogniser

A valid decoder-recogniser (VDR) for the code determined by Φ is an algorithm that correctly recognises and decodes the code words produced by Φ.

A code is called uniquely decodable if it is unambiguous and there exists a VDR for it.

Remarks:

• There are uncountably many codes with no VDR.
• The goal in lossless compression is to obtain a uniquely decodable code with the shortest possible (average) code words.
Encoding and Decoding (5)

Encoding Schemes

An encoding scheme is a map φ that assigns to each source symbol a non-empty word over the code alphabet:

φ : S → A+

We denote the code word φ(si) by ci, and the length of ci by li.

Each encoding scheme φ gives rise to an induced encoding function Φ by concatenation.

Every code determined by an encoding scheme has a VDR.

There is an algorithm to decide if an encoding scheme is uniquely decodable (Sardinas-Patterson algorithm).

The average code word length l̄ is defined as

l̄ = ∑_{i=1}^{m} p_i l_i
Encoding and Decoding (6)

Encoding Schemes – Example

i    1     2     3     4
si   A     B     C     D
pi   0.4   0.2   0.1   0.3
ci   010   0100  11    0010
li   3     4     2     4

Example set of symbols with their probability, their code words, and the code word lengths. The symbols A–D can be interpreted as four different grey values.

Examples for encodings:

• 1-pixel signal: Φ(A) = φ(A) = 010
• 2-pixel signal: Φ(AA) = φ(A)φ(A) = 010010
• 2-pixel signal: Φ(CA) = φ(C)φ(A) = 11010
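A short Python sketch of this example; the dictionaries simply restate the table above, and the function names are only illustrative:

# Encoding scheme phi and its induced encoding function (concatenation).
phi = {"A": "010", "B": "0100", "C": "11", "D": "0010"}
p   = {"A": 0.4,   "B": 0.2,    "C": 0.1,  "D": 0.3}

def encode(signal):
    """Induced encoding function Phi: concatenate the code words."""
    return "".join(phi[s] for s in signal)

print(encode("A"))   # 010
print(encode("AA"))  # 010010
print(encode("CA"))  # 11010

# Average code word length: sum_i p_i * l_i
l_bar = sum(p[s] * len(phi[s]) for s in phi)
print(l_bar)  # 0.4*3 + 0.2*4 + 0.1*2 + 0.3*4 = 3.4 (up to floating point rounding)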
Entropy

Self-Information of a Symbol

The amount of information disclosed by the specific symbol si is called self-information I(si).

The self-information of a symbol can be computed as

I(si) = log_2(1/pi) = −log_2 pi ∈ (0, ∞)

The unit of the information content (or self-information) is called bit.

Figure: Self-information of a symbol subject to its probability.
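A minimal Python sketch of this formula (the probabilities are arbitrary example values):

import math

# Self-information I(s) = -log2(p) in bit.
for p in (0.5, 0.25, 0.1):
    print(p, -math.log2(p))
# 0.5  -> 1.0 bit
# 0.25 -> 2.0 bit
# 0.1  -> about 3.32 bit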
Entropy

Why is Self-Information Defined This Way?

Desired properties of self-information:

(a) The self-information should always be nonnegative:

I(x) ≥ 0 for all x ∈ (0, 1]

(b) Each symbol should give some information, unless it must occur:

I(x) > 0 for all x ∈ (0, 1)

(c) The information of two symbols should be the sum of the information of both symbols:

I(pq) = I(p) + I(q) for all p, q ∈ (0, 1]
Entropy

Theorem (Aczél and Daróczy, 1975)

All functions satisfying the conditions on the previous slide have the form

I(x) = −log_b x for all x ∈ (0, 1],

where b is a positive real number.

Sketch of the Proof

(1) For 0 < x < y ≤ 1, it follows that

I(x) = I(y · (x/y)) = I(y) + I(x/y) > I(y).

Thus, I is strictly monotonically decreasing on (0, 1].
Entropy

(2) Let α be some fixed number in (0, 1). With b := exp(−ln α / I(α)) it holds that

−log_b α = −(ln α)/(ln b) = I(α).

(3) You can show by induction that, for elements α^r from a dense subset of (0, 1]:

I(α^r) = r I(α) = −r log_b(α) = −log_b(α^r)

(4) Show that I must be continuous on (0, 1].

(5) Since I(·) and −log_b(·) are equal on a dense subset of (0, 1], decreasing, and continuous on (0, 1], one can conclude that I(x) = −log_b x on (0, 1].
Entropy

The Father of Information Theory: Claude E. Shannon

Left: C. E. Shannon. Right: Shannon with his electromechanical mouse Theseus. (Both images are courtesy of Wikimedia Commons.)

ground-breaking Master's Thesis (foundation for Boolean circuit design)

World War II: development of Information Theory (and Cryptography)

decades ahead in research: early experiments with artificial intelligence
Entropy

Shannon Entropy

The average information content of an unknown symbol from the set S, i.e. a random variable X with values in S, is called the (Shannon) entropy H(X):

H(X) = ∑_{i=1}^{m} p_i I(s_i) = −∑_{i=1}^{m} p_i log_2 p_i

If pi = 0, we define pi log_2 pi = 0. This is motivated by

lim_{pi → 0} pi log_2 pi = 0.

Intuitively, the entropy is the average amount of bits needed to encode the symbols of the source.
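A minimal Python sketch of this definition; the last call uses the probabilities from the earlier encoding scheme example (0.4, 0.2, 0.1, 0.3):

import math

def entropy(probabilities):
    """Shannon entropy H = -sum p_i log2 p_i, with the convention 0*log2(0) := 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))                # 1.0 bit per symbol
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bit per symbol
print(entropy([0.4, 0.2, 0.1, 0.3]))      # about 1.85 bit per symbol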
Binary Signals

A signal is called binary if there are only two possible symbols, i.e. if |S| = 2.

A code is called binary if |A| = 2.

Usually, the set {0, 1} is used to represent binary signals/codes.

Figure: Entropy of a binary signal subject to the probability of one symbol.
Entropy

Properties of the Entropy

The entropy H(X) of a random variable X fulfils the following properties:

• H(X) ≥ 0.
• H(X) ≤ log_2(m), where m is the number of possible symbols.
• H(X) = log_2(m) if and only if pi = 1/m holds for all i ∈ {1, . . . , m}.
Entropy

Theorem (Shannon, 1948)

There is a binary encoding scheme for which

l̄ < H(S) + 1.

The average code word length of a uniquely decodable binary encoding scheme is never smaller than the entropy:

l̄ ≥ H(S).
Compression Algorithms

Corollary

When compressing files (S = {0, . . . , 255}, A = {0, 1}) in which all characters are independent and equally likely, a lower bound for the average code word length is given by:

l̄ ≥ H(S) = log_2(m) = log_2(256) = 8 bit = 1 byte

Corollary

The average compression ratio of files in which all characters are independent and equally likely is 1 : 1, or worse.

Remark

These corollaries also hold for encoding functions.
Prefix-Free Codes

Problems:

Codes are only useful if they are uniquely decodable.
It is not always trivial to decide if they are.

In the following, we discuss a simple condition that creates an important class of uniquely decodable codes.

Prefixes

A word u is a prefix of a word w if and only if w = uv for some word v.

Examples

The word aab is a prefix of the word aabaab.
The word aab is a prefix of the word aab.
The word aab is not a prefix of the word abaab. However, it is a suffix.
Prefix-Free Codes

A list of words satisfies the prefix condition if no word in the list is a prefix of another word in the list.

Examples

The list (aa, ab, bab) satisfies the prefix condition.
The list (1101, 0100, 110) does not satisfy the prefix condition.
The list of country calling codes satisfies the prefix condition.
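A minimal Python sketch of this check; the function name is only illustrative, and the two calls restate the examples above:

def satisfies_prefix_condition(words):
    """Check that no word in the list is a prefix of another word."""
    return not any(
        u != w and w.startswith(u) for u in words for w in words
    )

print(satisfies_prefix_condition(["aa", "ab", "bab"]))      # True
print(satisfies_prefix_condition(["1101", "0100", "110"]))  # False: 110 is a prefix of 1101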
Kraft-McMillan Inequality

Let A^l denote the set of words from A with length l.

For |S| = m, |A| = n and positive integers l1, . . . , lm, the following three statements are equivalent:

1. There is an encoding scheme φ(si) = ci ∈ A^{li}, i ∈ {1, . . . , m} resulting in a uniquely decodable code.

2. There is a prefix-condition encoding scheme φ(si) = ci ∈ A^{li}, i ∈ {1, . . . , m}.

3. The Kraft-McMillan inequality is fulfilled:

∑_{i=1}^{m} n^{-li} ≤ 1

Consequence

Whenever you have a uniquely decodable encoding scheme, there is a prefix-condition encoding scheme with the same code word lengths.
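A small Python sketch of the inequality check; the second call uses the code word lengths (3, 4, 2, 4) of the earlier example scheme, the other length lists are hypothetical:

def kraft_sum(lengths, n=2):
    """Kraft-McMillan sum: sum_i n^(-l_i). Code word lengths are feasible for a
    uniquely decodable (equivalently, prefix-condition) code iff the sum is <= 1."""
    return sum(n ** (-l) for l in lengths)

print(kraft_sum([1, 2, 3, 3]))  # 1.0  -> feasible (e.g. 0, 10, 110, 111)
print(kraft_sum([3, 4, 2, 4]))  # 0.5  -> feasible, but wasteful
print(kraft_sum([1, 1, 2]))     # 1.25 -> no uniquely decodable code exists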
Kraft-McMillan Inequality

Sketch of the Proof

(2) ⇒ (1): True, since all prefix-condition codes are uniquely decodable.

(1) ⇒ (3): Will be shown on the next slides.

(3) ⇒ (2): For this direction we first have to introduce so-called Shannon coding.
Kraft-McMillan Inequality

(1) ⇒ (3): Let us define K := ∑_{i=1}^{m} n^{-l_i}. Then, for all Q ∈ ℕ:

K^Q = ( ∑_{i=1}^{m} n^{-l_i} )^Q
    = ∑_{i_1=1}^{m} ∑_{i_2=1}^{m} · · · ∑_{i_Q=1}^{m} n^{-l_{i_1}} · n^{-l_{i_2}} · · · n^{-l_{i_Q}}
    = ∑_{i_1=1}^{m} ∑_{i_2=1}^{m} · · · ∑_{i_Q=1}^{m} n^{-(l_{i_1} + l_{i_2} + · · · + l_{i_Q})}

Thereby, l_{i_1} + l_{i_2} + · · · + l_{i_Q} is the combined length of Q code words.

With l := max_{i ∈ {1,...,m}} l_i, we have

Q ≤ l_{i_1} + l_{i_2} + · · · + l_{i_Q} ≤ lQ
Kraft-McMillan Inequality

Thus, if we denote the set of combinations of Q code words that have a combined length of k by A_k, it holds that

K^Q = ∑_{k=Q}^{Ql} |A_k| n^{-k}

There are n^k words in A with length k, and each can only represent one sequence of code words, since the code is uniquely decodable. Thus:

|A_k| ≤ n^k

Consequently:

K^Q = ∑_{k=Q}^{Ql} |A_k| n^{-k} ≤ ∑_{k=Q}^{Ql} n^k n^{-k} = ∑_{k=Q}^{Ql} 1 = Ql − Q + 1 = Q(l − 1) + 1

Thus, K^Q rises at most linearly with Q, which leads to K ≤ 1.
Shannon Coding

In order to prove the missing direction (3) ⇒ (2), we need Shannon coding.

Step 1: Reorder the source alphabet according to descending probability.

Step 2: Compute the code word lengths

l_k := ⌈log_2(1/p_k)⌉  ⇒  2^{-l_k} ≤ p_k < 2^{-l_k + 1},

and the numbers

P_k := ∑_{i=1}^{k-1} p_i,  1 ≤ k ≤ m,

where the empty sum P_1 := ∑_{i=1}^{0} p_i has a value of zero.

Step 3: Map each s_i to the first l_i digits in the binary representation of P_i.
Shannon Coding

Computing Binary Representations

To convert a decimal number d ∈ [0, 1) to a string b representing the same value in binary, one can repeatedly double d and read off the integer part as the next binary digit, continuing with the remaining fractional part.

Note that some numbers have two binary representations, e.g.

0.1011111... = 0.101̄ = 0.11 = 0.11000000...

Shannon's method requires the finite representation, which is also found by the algorithm above.
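A minimal Python sketch of steps 1–3 together with the digit-extraction procedure just described; the function names are illustrative, the probabilities in the demo call are hypothetical, and floating point rounding is ignored:

import math

def binary_fraction(x, digits):
    """First `digits` bits of the binary fraction expansion of x in [0, 1)."""
    bits = []
    for _ in range(digits):
        x *= 2
        bit = int(x)          # next binary digit
        bits.append(str(bit))
        x -= bit
    return "".join(bits)

def shannon_code(probabilities):
    """Shannon coding: sort by descending probability, then take the first
    l_k = ceil(log2(1/p_k)) bits of the cumulative probability P_k."""
    order = sorted(range(len(probabilities)),
                   key=lambda i: probabilities[i], reverse=True)
    code = {}
    cumulative = 0.0
    for i in order:
        p = probabilities[i]
        l = math.ceil(math.log2(1 / p))
        code[i] = binary_fraction(cumulative, l)
        cumulative += p
    return code

print(shannon_code([0.4, 0.3, 0.2, 0.1]))
# {0: '00', 1: '01', 2: '101', 3: '1110'}  -- a prefix-free code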
Shannon Coding

Remarks

Shannon coding was proposed by Claude Elwood Shannon in 1948.

The code word lengths correspond to the self-information of each symbol, rounded up to the next integer.

However, Shannon's method does not always create optimal code words, i.e. other approaches might result in shorter average code words.

Thus, we are only interested in Shannon coding for the purpose of our proof.

For the purpose of the proof, we are given lengths l_k and choose fixed p_k := 2^{-l_k}. Thus, we do not consider the actual probabilities of the source symbols.
Shannon Coding

Theorem

Shannon's method always results in prefix-free codes.

Sketch of the Proof

(1) Since p_1 ≥ . . . ≥ p_m and l_k = ⌈log_2(1/p_k)⌉, it holds that l_1 ≤ . . . ≤ l_m.

(2) Thus, it is sufficient to show that, for 1 ≤ k < r ≤ m, c_k is not a prefix of c_r.

(3) Assume c_k is a prefix of c_r, where 1 ≤ k < r ≤ m. Then, we have:

P_r = ∑_{i=1}^{r-1} p_i ≥ ∑_{i=1}^{k} p_i = ∑_{i=1}^{k-1} p_i + p_k = P_k + p_k ≥ P_k + 2^{-l_k}
Shannon Coding

(4) If the first l_k positions in the binary representations of P_r and P_k are equal, the two numbers can differ at most by

∑_{j=l_k+1}^{∞} 2^{-j} = 2^{-l_k}.

(5) Equality in (4) only holds if all bits after the l_k-th position are one, which was forbidden. Thus, they differ by less than 2^{-l_k}.

(3) from the previous slide, however, states that P_r − P_k ≥ 2^{-l_k}.

(6) Since (3) contradicts (5), the assumption was wrong.
Shannon Coding

Recapitulation: Kraft-McMillan Inequality

For |S| = m, |A| = n and positive integers l1, . . . , lm, the following three statements are equivalent:

1. There is an encoding scheme φ(si) = ci ∈ A^{li}, i ∈ {1, . . . , m} resulting in a uniquely decodable code.

2. There is a prefix-condition encoding scheme φ(si) = ci ∈ A^{li}, i ∈ {1, . . . , m}.

3. ∑_{i=1}^{m} n^{-li} ≤ 1

Sketch of the Missing Part of the Proof

(3) ⇒ (2): Shannon's method can easily be extended to the non-binary case by using logarithms to the base n and n-ary representations, and it still yields a prefix-condition encoding scheme. Thus, the last part of the theorem can be shown using an n-ary Shannon's method with pi = n^{-li}.
Shannon-Fano Coding

Goal: Create an efficient binary code for a given alphabet.

Idea: Partition the list of symbols according to their frequency of occurrence.

1. Reorder the source alphabet such that p1 ≥ . . . ≥ pm > 0.

2. Divide the symbols into two blocks s1, . . . , sk and sk+1, . . . , sm that minimise the difference |∑_{j=1}^{k} p_j − ∑_{j=k+1}^{m} p_j|.

3. Add a zero to the code words in the left group, and a one in the right group.

4. For each block with more than one symbol, go to step 2.

Each code word is found by concatenating all digits in the corresponding column.
Shannon-Fano Coding

Shannon-Fano Coding: Example

i    1    4    2    3
si   a    d    b    c
pi   0.4  0.3  0.2  0.1
     0    1    1    1
     -    0    1    1
     -    -    0    1
ci   0    10   110  111

Example for creating an encoding scheme with the method by Shannon and Fano.
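A recursive Python sketch of the splitting procedure; the function name is illustrative, and the call at the end reproduces the example above (symbols already sorted by descending probability):

def shannon_fano(symbols, probabilities):
    """Recursive Shannon-Fano coding; expects the probabilities in descending order."""
    if len(symbols) == 1:
        return {symbols[0]: ""}
    total = sum(probabilities)
    # step 2: find the split that minimises the difference of the two block sums
    best_k, best_diff = 1, float("inf")
    left_sum = 0.0
    for k in range(1, len(symbols)):
        left_sum += probabilities[k - 1]
        diff = abs(left_sum - (total - left_sum))
        if diff < best_diff:
            best_k, best_diff = k, diff
    # steps 3 and 4: prepend 0 / 1 and recurse into both blocks
    code = {}
    for s, c in shannon_fano(symbols[:best_k], probabilities[:best_k]).items():
        code[s] = "0" + c
    for s, c in shannon_fano(symbols[best_k:], probabilities[best_k:]).items():
        code[s] = "1" + c
    return code

print(shannon_fano(["a", "d", "b", "c"], [0.4, 0.3, 0.2, 0.1]))
# {'a': '0', 'd': '10', 'b': '110', 'c': '111'}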
Shannon-Fano Coding

Remarks

This method was first published in 1948 by Shannon, but is attributed to Robert M. Fano.

Shannon-Fano coding does not always create optimal code words and is thus mainly of historical interest.
Tunstall Codes (1)

Problem:
Sometimes, a single corrupted bit can affect the whole message.

Example:

si   A   D   B    C
ci   0   10  110  111

original message:  111 110  → C B
corrupted message: 10 111 0 → D C A

Idea: Use code words of a fixed size q which might encode multiple source symbols.
Tunstall Codes (2)

• So far, we wanted a small average code word length l̄.
• Now, we want a large "average source word length" L̄.

Tunstall codes are optimal in the sense that they maximise L̄.

intuitive idea:

• For a q-bit code, we have (at most) 2^q code words available.
• greedy approach: use the source word with the highest probability as prefix for new words.
• Generate new words until the maximum number of code words is reached.
Tunstall Codes (3)

Algorithm for q-bit Tunstall Codes

1. Create a table with each letter si (in alphabetical order) and its probability pi.

2. If the table size exceeds 2^q − |S| + 1, go to step 5.

3. Remove the entry smax with the highest probability from the table. Add the |S| words smax x obtained by concatenating every symbol x in S to smax. Compute the probabilities of the new entries by multiplying the corresponding probabilities.

4. Continue with step 2.

5. Assign each q-bit code word to a table entry (starting with 0 and incrementing by 1, padding with leading zeroes).
Tunstall Codes (4)

Example: 3-bit Tunstall code for the source alphabet S = {A, B, C} with the probabilities listed below. The words A and AA have been expanded and therefore receive no code word.

Word   Probability   Code
A      0.600         -
B      0.300         000
C      0.100         001
AA     0.360         -
AB     0.180         010
AC     0.060         011
AAA    0.216         100
AAB    0.108         101
AAC    0.036         110
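A greedy Python sketch of this construction; the function name is illustrative, and the call at the end reproduces the 3-bit example above with p(A) = 0.6, p(B) = 0.3, p(C) = 0.1:

def tunstall(probabilities, q):
    """Greedy construction of a q-bit Tunstall code.
    `probabilities` maps single letters to their probability."""
    table = dict(probabilities)                  # step 1
    max_size = 2 ** q - len(probabilities) + 1
    while len(table) <= max_size:                # step 2
        s_max = max(table, key=table.get)        # step 3: most probable word
        p_max = table.pop(s_max)
        for letter, p in probabilities.items():
            table[s_max + letter] = p_max * p
        # step 4: continue with step 2
    # step 5: assign q-bit code words in table order
    return {word: format(index, f"0{q}b")
            for index, word in enumerate(table)}

print(tunstall({"A": 0.6, "B": 0.3, "C": 0.1}, q=3))
# {'B': '000', 'C': '001', 'AB': '010', 'AC': '011',
#  'AAA': '100', 'AAB': '101', 'AAC': '110'}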
Summary and Outlook

Summary

Encoding functions and encoding schemes are used to encode data.
They need to be uniquely decodable to be useful.
Self-information and entropy measure how much information a symbol carries.
The optimal average code word length is always close to the entropy.
Prefix-free codes are as good as any other uniquely decodable encoding scheme.
Shannon and Shannon-Fano coding are historically relevant but suboptimal.
Tunstall codes are useful for robustness against transmission errors.

Outlook

Practical issues:
What information do we actually have to store in order to be able to decode?
Can we find a better variable-length alternative to Shannon-Fano?
References

U. Krengel. Einführung in die Wahrscheinlichkeitstheorie und Statistik. Friedr. Vieweg & Sohn, Braunschweig.

C. E. Shannon and W. Weaver. The Mathematical Theory of Communication. University of Illinois Press, Urbana, 1998.
(Introduction to information theory)

J. Aczél and Z. Daróczy. On Measures of Information and Their Characterizations. Academic Press, New York, 1975.

(More intuitive introduction to entropy, in German)

B. P. Tunstall. Synthesis of Noiseless Compression Codes. PhD thesis, Georgia Institute of Technology, 1967.