Ic23 Unit02 Script

Recently on Image Compression ...

What did we learn about information theory so far?

Prefix-free codes are uniquely decodable and can be easily decoded.

There is no drawback in efficiency by only using prefix-free codes.

First algorithms: Shannon and Shannon-Fano codes.

Problems:

• How do we use these coding schemes in practice?

• Shannon-Fano coding does not always create optimal code words.
Outline

Learning Unit 02:
Basic Entropy Coding II: Huffman and Integer Coding

Contents
1. Practical Issues
2. Huffman-Coding
3. Basic Integer Coding: Unary and Elias Gamma Codes
4. Golomb- and Rice-Codes
5. Fibonacci Coding

© 2023 Christian Schmaltz, Pascal Peter
Practical Issues (1)

However, we have not talked about practical aspects of storing data yet.

What does a compressed file need to contain?

• The encoded content.

• Information about the encoding scheme.

The encoding scheme usually varies, as it depends on the file to be compressed.

There are several ways to store the encoding schemes (see following slides).
Practical Issues (2)

Approach I

Store how often each symbol occurs in the uncompressed file.

When decompressing, the prefix code is constructed in the same way as in the
encoding step.

The required space varies with the size of the file to be compressed.
Practical Issues (3)

Approach II

Store each symbol together with its code word length and code word. Example:

si   a    b    c    d
li   1    3    3    2
ci   1    010  011  00

The resulting string might be

4int (as, 1int, 1b) (ds, 2int, 00b) (bs, 3int, 010b) (cs, 3int, 011b).

Brackets, spaces and commas are not stored.

The suffixes indicate the kind of data:

• int = integer

• b = binary word

• s = symbol
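The following Python sketch illustrates how such a header could be written as a bit string; the 8-bit symbol field, the 4-bit length field and the function name are assumptions for illustration, not part of any particular file format.

# Sketch of Approach II: store (symbol, code word length, code word) for every symbol.
# Field widths (8-bit symbol count, 8-bit symbol, 4-bit length) are assumed for illustration.
def serialize_code(code):                      # code: symbol -> code word, e.g. {'a': '1', ...}
    bits = format(len(code), '08b')            # number of symbols as an integer
    for sym, word in code.items():
        bits += format(ord(sym), '08b')        # the symbol s
        bits += format(len(word), '04b')       # the code word length as an integer
        bits += word                           # the code word as a binary word
    return bits

print(serialize_code({'a': '1', 'd': '00', 'b': '010', 'c': '011'}))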
Practical Issues (4)

Approach III: Canonical Codes

Encoding code words:

1. Reorder code words by their length. Code words with equal length are ordered
   alphabetically by the word they encode.

2. Replace the first code word with zeros such that it has the same length as before.

3. For each following code word:

   • Increment the preceding code word (+1 in binary).

   • Concatenate 0s at the end of the new code word until it has the same
     length as the old one.

4. Store all code word lengths.

The resulting code is called a canonical code.

Decoding code words:

1. Sort symbols first by code word length, then alphabetically.

2. Set the code word of the first symbol si to li zeros.

3. For each following symbol: Increment the previous code word, pad with 0s to the
   correct length.
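A small Python sketch of the construction above; the dictionary interface and the symbol names are illustrative assumptions.

# Sketch: build canonical code words from the code word lengths (steps 1-3 above).
def canonical_code(lengths):                   # lengths: symbol -> code word length
    symbols = sorted(lengths, key=lambda s: (lengths[s], s))   # step 1: by length, then alphabetically
    code, prev = {}, None
    for s in symbols:
        if prev is None:
            word = '0' * lengths[s]            # step 2: first code word is all zeros
        else:
            word = format(int(prev, 2) + 1, 'b').zfill(len(prev))   # step 3: increment previous word
            word = word.ljust(lengths[s], '0')                      # pad with 0s to the required length
        code[s], prev = word, word
    return code

print(canonical_code({'a': 1, 'b': 3, 'c': 3, 'd': 2}))   # {'a': '0', 'd': '10', 'b': '110', 'c': '111'}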
Practical Issues (5)

Example

i              1    3    4    2
si             a    d    b    c
li             1    2    3    3
ci             1    00   010  011
canonical ci   0    10   110  111

Resulting string if it is known that S = {a, b, c, d}: 1i, 3i, 3i, 2i.

If |S| > 4, additional zeros must be saved. Thus, this method may require more space.
Practical Issues (6)

Storing Code Word Lengths

Since 1 ≤ li < |S|, ⌈log2(|S|)⌉ bits are sufficient to store each li.

However, this upper bound is rarely reached in practice.

Algorithm

1. Compute lmax = max li over i ∈ {1, 2, . . . , m}.

2. Compute q = ⌈log2(lmax)⌉, i.e. the number of bits sufficient to store each li.

3. Store q, followed by each li with q bits.

Example

For |S| = 256 with 8 < lmax < 16,

3 + 256 · 4 = 1027

bits (i.e. ≈ 129 bytes) are sufficient to store a “canonical” encoding scheme.
Here, 3 is the number of bits necessary to store q, and 4 is the number of bits
necessary for one code word length.
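A tiny sketch of this packing scheme; storing q itself with 3 bits is taken from the example above and is otherwise an assumption.

# Sketch: store q = ceil(log2(l_max)) bits per length, preceded by q itself.
from math import ceil, log2

def pack_lengths(lengths, q_bits=3):           # q_bits: bits used for q itself (3 as in the example)
    q = ceil(log2(max(lengths)))               # assumes l_max is not an exact power of two
    return format(q, f'0{q_bits}b') + ''.join(format(l, f'0{q}b') for l in lengths)

print(len(pack_lengths([12] * 256)))           # |S| = 256, l_max = 12: 3 + 256*4 = 1027 bits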
Practical Issues (7)

Compressed files are stored in whole bytes. Example: With the code a → 0, b → 1,
the word aababbb is encoded as the 7-bit string 0010111, which must be padded to
a full byte.

Thus, the actually stored word is 0010111x, where x is either 0 or 1.

Without modifications, the decoder decodes too much in both cases:

• Decoding 00101110 yields aababbba.

• Decoding 00101111 yields aababbbb.

This problem can also occur with Tunstall codes (and extended Huffman codes).

Possible solutions:

• Reach the EOF-marker of the compressed file without decoding too much.

• Include the end-of-file (EOF) marker as a symbol in the encoder.

• Store the size of the initial file.
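A minimal sketch of the last option, using the two-symbol code a → 0, b → 1 from the example above; the interface is an illustrative assumption.

# Sketch: if the decoder knows the original number of symbols, it stops before the padding bits.
def decode(bits, code, n_symbols):             # code: code word -> symbol
    out, buf = [], ''
    for b in bits:
        buf += b
        if buf in code:
            out.append(code[buf])
            buf = ''
            if len(out) == n_symbols:          # stop: everything after this is padding
                break
    return ''.join(out)

print(decode('00101110', {'0': 'a', '1': 'b'}, 7))   # aababbb, the padding bit is ignored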
The Origins of Huffman Coding (1)

Huffman attended the first course on information theory.

If Huffman solved a minimal code problem, he would not have to take the final exam.
The Origins of Huffman Coding (2)

“And I was very lucky – before I undertook my thesis – to solve, by simple
arithmetic, a classic coding problem for which several distinguished scientists
had not been able to find an exact solution. Either I didn’t know that, or it
just didn’t bother me at the time.”

David Huffman

[Photo of David Huffman; source: huffmancoding.com, Matthew Mulbry]
The Origins of Huffman Coding (3)

Intuitive requirements for an optimal code:

If p1 ≥ p2, the code of s2 is not shorter than the one of s1.

The codes of the two symbols with the lowest probabilities have the same length.

Why do these requirements make sense?

First requirement (easy):

• Average code length increases if symbols that occur often have longer codes.

Second requirement:

• Assume: si and sj with lowest occurrence have code lengths li > lj.

• Prefix code: sj cannot be a prefix of si; all other codes are shorter.

• What happens if we drop the last li − lj bits of si?

• si, sj still distinct (sj is not a prefix of si), average code length shorter.
The Origins of Huffman Coding (4)

Huffman Coding

Formally, the intuitive requirements translate to:

For symbols s1, . . . , sn
with codewords c1, . . . , cn
and probabilities p1 ≥ p2 ≥ · · · ≥ pn:

l1 ≤ l2 ≤ · · · ≤ ln−1 = ln

Additional Requirement:

The codewords of sn and sn−1 differ only in the last bit.

This is consistent with the previous requirements.
The Origins of Huffman Coding (5)

l1 ≤ l2 ≤ · · · ≤ ln−1 = ln

The codewords of sn, sn−1 differ only in the last bit.

How to use this for coding?

Induction-like idea:

Trivial for two symbols, simply assign 0 and 1.

Now reduce the number n of symbols successively until you reach 2:

• Consider the two symbols si, sj with the lowest probabilities.

• Assign ci = p0, cj = p1 with an unknown prefix p.

• Merge the two symbols si, sj into a new symbol sisj with probability pi + pj
  and “codeword” p.

• Repeating this procedure assigns codewords to all symbols.

Remark: Only the codewords of the leaf nodes are actually used. All merged symbols
are associated with prefixes that by definition should not appear in the code.
Tree-Interpretation (1)

Tree-Interpretation of Huffman Coding

previous algorithm: good for derivation, but abstract

nice visual interpretation: binary tree

additional benefits for decoding

Tree-based Algorithm

1. For each symbol si, create one leaf of a tree with value pi.

2. Stop if the tree is finished.

3. Find the two tree nodes x and y with smallest value.

4. Create a new tree node with value px + py, connect it to x and y, and assign a
   zero and a one to the new edges, respectively.

5. Continue with step 2.

The code words are found by traversing the tree from its root to the leaves.
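A compact Python sketch of the tree-based algorithm; the heap-based data structure and tie-breaking are implementation choices, not prescribed by the slides.

# Sketch: Huffman coding via repeated merging of the two nodes with smallest value.
import heapq

def huffman_code(probs):                       # probs: symbol -> probability
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]   # step 1: one leaf per symbol
    heapq.heapify(heap)
    code = {s: '' for s in probs}
    counter = len(heap)
    while len(heap) > 1:                       # step 2: stop when the tree is finished
        p_x, _, xs = heapq.heappop(heap)       # step 3: the two nodes with smallest value
        p_y, _, ys = heapq.heappop(heap)
        for s in xs: code[s] = '0' + code[s]   # step 4: assign 0 and 1 to the new edges
        for s in ys: code[s] = '1' + code[s]
        heapq.heappush(heap, (p_x + p_y, counter, xs + ys))
        counter += 1
    return code

print(huffman_code({'o': 3/7, 'z': 1/7, 'l': 1/7, 'g': 1/7, 'y': 1/7}))   # "zoology": o gets a 1-bit code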
Tree-Interpretation (2)

Remarks

The generated code words are not unique.

The code words generated by Huffman’s algorithm are optimal.

Widely used, e.g. in JPEG.

There are better compression algorithms if one of our assumptions is invalid.

At least one bit is necessary for each symbol with Huffman coding.

[Figure: One possible Huffman tree for the word “zoology”. The root has value 7/7 = 1;
the letter O (value 3/7) is a leaf at depth 1, while G, Y, Z, L (value 1/7 each) are
leaves at depth 3.]
Tree-Interpretation (3)

Storing Encoding Schemes with Trees

A Huffman-tree contains all information the decoder requires.

Probabilities are not needed anymore!

We only need to store the tree and the symbols at the leaf nodes.

How to do this in practice (example for a q-bit alphabet):

1. Start with the root node.

2. For the current node, store a 0 and its q-bit symbol if it is a leaf node,
   store a 1 otherwise.

3. If the current node has children, go to step 2 for both children.

For the Huffman tree for ’ZOOLOGY’, we obtain one structure bit per node plus the
q-bit symbols of the five leaves.
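A small sketch of this pre-order serialization; the nested-tuple tree representation and the 8-bit ASCII symbols are illustrative assumptions.

# Sketch: store one structure bit per node, plus q bits for each leaf symbol (here q = 8).
def store_tree(node, q=8):
    if isinstance(node, str):                  # leaf node: store 0 and the q-bit symbol
        return '0' + format(ord(node), f'0{q}b')
    left, right = node                         # inner node: store 1, then both subtrees
    return '1' + store_tree(left, q) + store_tree(right, q)

tree = ('O', (('G', 'Y'), ('Z', 'L')))         # one possible Huffman tree for "ZOOLOGY"
print(len(store_tree(tree)))                   # 9 structure bits + 5 * 8 symbol bits = 49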
Minimum-Variance Huffman Codes

Huffman codes with the same average length can differ in how strongly the
individual code word lengths vary.

In some applications, this can be a problem, e.g. if a fixed number of symbols is
to be transmitted per second.

Solution: Try to equalise the code word lengths.

Only one additional rule is necessary:

If there are several tree nodes with smallest value, take the two “oldest” ones,
i.e. those with the smallest depth.

The resulting code is called a minimum-variance Huffman code.
Nonbinary Huffman Codes

Problem: The code alphabet A might be larger than two symbols.

Solution: Use nonbinary Huffman codes.

Instead of combining two nodes in each step and assigning a zero and a one,
combine |A| = n nodes and assign the symbols from A to the new edges.

However, in the first phase, only m0 nodes must be combined, where m0 is the
number between 2 and n (including these numbers) with

|S| ≡ m0 (mod n − 1).
Problems of Huffman Coding

The closer the symbol probabilities are to powers of 1/2, the better Huffman
coding compresses.
Recap: Huffman coding needs at least one bit for each symbol.

Question: Is there a possibility to solve these problems?
Extended Huffman Coding

Idea: Modify the source alphabet in order to encode multiple symbols at once.

Example:

Replace the alphabet

S = {A, B, C}

by

S′ = {AA, AB, AC, BA, BB, BC, CA, CB, CC}.

The corresponding probabilities are given by:

pij := P(sisj) = P(si)P(sj) = pipj
• The entropy of the source alphabet is H(S) ≈ 0.816 bits/symbol.

• “Standard” Huffman coding results in an average code word length of
  l = 1.2 bits/symbol.

• Extended Huffman coding results in an average code word length of
  l = 1.7228 bits/symbol.

• Since extended Huffman coding encodes two symbols per code word, the
  average code word length with respect to the initial alphabet is

  (1.7228 bits) / (2 symbols) = 0.8614 bits/symbol < 1.2 bits/symbol.

Conclusion: Extended Huffman coding might compress much better than “standard”
Huffman coding.
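The following sketch reproduces numbers of this kind. The probabilities 0.8, 0.02 and 0.18 are an assumption chosen so that H(S) ≈ 0.816; they are not stated in this excerpt.

# Sketch: compare standard and extended Huffman coding for an assumed source distribution.
import heapq
from math import log2

def huffman_avg_length(probs):                 # average code word length of a Huffman code
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    length = {s: 0 for s in probs}
    cnt = len(heap)
    while len(heap) > 1:
        p_x, _, xs = heapq.heappop(heap)
        p_y, _, ys = heapq.heappop(heap)
        for s in xs + ys:
            length[s] += 1                     # every merged symbol moves one level deeper
        heapq.heappush(heap, (p_x + p_y, cnt, xs + ys))
        cnt += 1
    return sum(probs[s] * length[s] for s in probs)

p = {'A': 0.8, 'B': 0.02, 'C': 0.18}           # assumed probabilities with H(S) ~ 0.816
pairs = {a + b: p[a] * p[b] for a in p for b in p}
print(-sum(q * log2(q) for q in p.values()))   # entropy ~ 0.816 bits/symbol
print(huffman_avg_length(p))                   # 1.2 bits/symbol
print(huffman_avg_length(pairs) / 2)           # ~ 0.8614 bits per original symbol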
The larger n, the better the compression.

If n corresponds to the length of the source word, this method is (theoretically)
optimal.

The overhead needed to store the encoding scheme rises drastically when
increasing n.
Coding Integers

Motivation

All encoding schemes presented so far only work for |S| < ∞.

In practice, this restriction does not always hold.

Today, we learn several compression algorithms that can encode all
nonnegative/positive integers.

Thereby . . .

• . . . each integer si has a so-called implied probability (1/2)^li.

• . . . one uses fewer bits for small numbers, and many bits for large numbers.

• . . . all methods are easy to adapt to other countable sets such as Z.
Unary Coding

Unary coding represents each integer n ≥ 0 by n 1s, followed by a 0.

For example, 4 is coded as 11110, 7 as 11111110, and 0 as 0.

Remarks:

• The role of 0s and 1s is often flipped.

• Unary coding is only efficient if the distribution of symbols in the
  compressed content is similar to the implied probabilities.

Number   Encoding    Implied probability
0        0           1/2
1        10          1/4
2        110         1/8
3        1110        1/16
4        11110       1/32
5        111110      1/64
6        1111110     1/128
7        11111110    1/256
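A two-line sketch of the encoder with a matching decoder (function names are illustrative):

def unary_encode(n):                           # n >= 0: n ones followed by a zero
    return '1' * n + '0'

def unary_decode(bits):                        # read one unary number from the front of a bit string
    n = bits.index('0')
    return n, bits[n + 1:]                     # the value and the remaining bits

print(unary_encode(4), unary_decode('111100'))   # 11110 (4, '0')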
Elias Gamma Coding

Idea: Store the number in two components:

• The largest power of 2 that is contained in the number.

• The “remainder” after removing (i.e. subtracting) this power.
Algorithm: Elias Gamma Coding

To encode the number x, the following steps are done:

1. Find the largest number N with 2^N ≤ x.

2. Encode N using unary coding.

3. Append the integer x − 2^N using N binary digits.

Alternative formulation:

1. Write x in binary.

2. Subtract 1 from the number of bits written in step 1 and prepend that many
   zeros.

Thus, Elias Gamma Coding uses 2⌊log2(x)⌋ + 1 bits to encode x.
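A short sketch of the algorithm; here the unary prefix is written with the roles of 0s and 1s flipped (see the remark on unary coding), so that the step-by-step and the alternative formulation produce the same bit string.

def elias_gamma(x):                            # x >= 1
    n = x.bit_length() - 1                     # step 1: largest N with 2^N <= x
    prefix = '0' * n + '1'                     # step 2: N in unary (0s and 1s flipped)
    suffix = format(x, 'b')[1:]                # step 3: x - 2^N in N binary digits
    return prefix + suffix                     # equals: binary(x) with len-1 zeros prepended

for x in (1, 2, 9, 17):
    print(x, elias_gamma(x))                   # code length is 2*floor(log2(x)) + 1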
Problem

There is no possibility to adapt to the actual probabilities.
Golomb-Codes

From Golomb’s Original Paper:

“Secret Agent 00111 is back at the Casino again, playing a game of chance, while
the fate of mankind hangs in the balance.

Each game consists of a sequence of favorable events (probability p), terminated
by the first occurrence of an unfavorable event (probability q = 1 − p).

More specifically, the game is roulette, and the unfavorable event is the
occurrence of 0, which has a probability of q = 1/37.

No one seriously doubts that 00111 will come through again, but the Secret
Service is quite concerned about communicating the blow-by-blow description back
to Whitehall.”

[Image sources: Ralf Roletschek, Wikimedia Commons; University of Southern California]
In this setting:

• A favourable event occurs with probability p.

• An unfavourable event occurs with probability 1 − p.

• A sequence of k favourable events that is terminated by one unfavourable
  event has probability p^k (1 − p).

For our implied probabilities this would mean pi+1 = pi · p with p1 = 1 − p and a
free choice of p.

Golomb coding approximates such a distribution and only has one free integer
parameter m.
Algorithm: Golomb Coding of x with parameter m

1. Compute b := ⌈log2(m)⌉.

2. Divide with remainder: q = ⌊x/m⌋ and r = x − qm.

3. Store q using unary coding, i.e. as a sequence of q ones, followed by a zero.

4. The remainder r is stored in a truncated binary representation:

   (a) If r < 2^b − m holds, store r as a binary number of length b − 1.

   (b) Otherwise, store r + 2^b − m as a binary number of length b.

Remark: Step 4 is skipped for m = 1, as there is no remainder.
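A sketch of the four steps in Python; the function name is illustrative.

# Sketch: Golomb coding of x >= 0 with parameter m >= 1.
from math import ceil, log2

def golomb_encode(x, m):
    b = ceil(log2(m))                          # step 1
    q, r = divmod(x, m)                        # step 2: quotient and remainder
    bits = '1' * q + '0'                       # step 3: q in unary
    if m == 1:                                 # step 4 is skipped: no remainder
        return bits
    if r < 2**b - m:
        return bits + format(r, 'b').zfill(b - 1)        # step 4(a): b-1 bits
    return bits + format(r + 2**b - m, 'b').zfill(b)     # step 4(b): b bits

print([golomb_encode(x, 5) for x in range(8)])   # reproduces the m = 5 row of the table on the next slide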
Examples

x      0      1      2      3      4      5       6        7
m=1    0      10     110    1110   11110  111110  1111110  11111110
m=2    0 0    0 1    10 0   10 1   110 0  110 1   1110 0   1110 1
m=3    0 0    0 10   0 11   10 0   10 10  10 11   110 0    110 10
m=4    0 00   0 01   0 10   0 11   10 00  10 01   10 10    10 11
m=5    0 00   0 01   0 10   0 110  0 111  10 00   10 01    10 10
m=6    0 00   0 01   0 100  0 101  0 110  0 111   10 00    10 01
m=7    0 00   0 010  0 011  0 100  0 101  0 110   0 111    10 00
m=8    0 000  0 001  0 010  0 011  0 100  0 101   0 110    0 111

The numbers from 0 to 7 encoded with Golomb coding with different parameters m.
The space denotes the transition from q to r.
For m = 1, this is the same as unary coding.

If p^m = 0.5 holds . . .

• the number x + m is half as likely as x (for any x).

• the code word for x + m is one bit longer than that of x.

• m is chosen optimally for numbers with the geometric distribution.
Rice-Codes

Rice-Codes are Golomb-Codes whose parameter m = 2^b is a power of two.

• Then, r is always stored with b bits (since r + 2^b − m = r).

• r and q can be stored in reverse order.

• Rice-Codes are commonly used, e.g. in JPEG-LS, FLAC (Free Lossless
  Audio Codec) and MPEG-4 ALS (MPEG-4 Audio Lossless Coding).
Exponential Golomb-Codes

In 1978, Teuhola proposed exponential Golomb-Codes to encode a number x.

Exponential Golomb-Codes with m = 0 are identical to Elias Gamma Codes,
except for a shift by one.

A variation for negative integers is used in H.264 (MPEG4) and H.265 (HEVC).

The number of codewords with equal length increases exponentially.
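A tiny sketch of this case: the code of x is the Elias gamma code of x + 1, so that 0 becomes encodable; reading "m = 0" as the order-0 exponential Golomb code is an assumption about the notation.

def exp_golomb0(x):                            # order-0 exponential Golomb code, x >= 0
    b = format(x + 1, 'b')                     # Elias gamma code of x + 1 ...
    return '0' * (len(b) - 1) + b              # ... i.e. the shift by one mentioned above

print([exp_golomb0(x) for x in range(5)])      # ['1', '010', '011', '00100', '00101']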
Fibonacci Numbers

named after Leonardo “Fibonacci” of Pisa

introduced in “Liber Abaci” (Book of Calculation), 1202

rabbit riddle:

• initially: one male and one female rabbit

• Rabbits mate when they are one month old.

• one month after mating: new pair (male/female)

• The rabbits don’t die.

Fibonacci numbers are actually much older.
Fibonacci Coding

The Fibonacci numbers are defined as:

• F(0) = 0

• F(1) = 1

• F(n) = F(n − 1) + F(n − 2) for all n ≥ 2

A closed form known as Binet’s formula exists:

Fn = (ϕ^n − (1 − ϕ)^n) / √5, where ϕ = (1 + √5)/2 ≈ 1.61 . . . .

Since |1 − ϕ|^n / √5 < 1/2 for all n, it also holds that

Fn = ⌊ϕ^n / √5 + 1/2⌋.
Fibonacci Coding

Idea: Represent positive integers as the sum of Fibonacci numbers.

Uniqueness

Problem: Some numbers have an ambiguous Fibonacci representation:

21 + 5 + 3 = 29 = 21 + 8

Solution: Only allow sums not containing consecutive Fibonacci numbers.

Zeckendorf’s Theorem

Every positive integer can be represented uniquely as the sum of one or more distinct
Fibonacci numbers F(i), i > 1, in such a way that the sum does not include any two
consecutive Fibonacci numbers.

The sum fulfilling this condition is called the Zeckendorf representation.
Proof of Zeckendorf’s Theorem

In the following, we denote a Zeckendorf representation of a number x ∈ N by xZ.

Part I: Existence

Proof by induction. Trivial for 1, 2, 3, since they are Fibonacci numbers.

Assumption: ∀j ≤ k : ∃ jZ

k → k + 1:

If k + 1 is a Fibonacci number, we are done.

If it is not, then ∃j : F(j) < k + 1 < F(j + 1).

Consider the remainder r := k + 1 − F(j).

F(j) ≥ 1 ⇒ r ≤ k, i.e. ∃ rZ.

F(j) + r < F(j + 1) ⇔ F(j) + r < F(j) + F(j − 1) ⇔ r < F(j − 1)

Since F(j − 1) is not contained in rZ, we have rZ + F(j) = (k + 1)Z.
Definition: In the following, Xm denotes a set of m non-consecutive indices of
Fibonacci numbers: Xm = {x1, . . . , xm} with xi < xi+1 − 1 for all i ∈ {1, . . . , m − 1}
and xi ≥ 1.

Lemma: Let Xm be given with F(xm) = F(j) for some j ≥ 2. Then we have

F(x1) + · · · + F(xm) < F(j + 1).

Proof:

For j = 2, the Lemma holds trivially: the only valid choice for the sum is F(2) = 1,
which is strictly smaller than F(3) = 2.

j → j + 1:

Let Xm be given with F(xm) = F(j + 1) and assume that the Lemma has been
shown for j (induction hypothesis IH).

F(x1) + · · · + F(xm) = (F(x1) + · · · + F(xm−1)) + F(xm)
                      < F(xm−1 + 1) + F(xm)          (IH)
                      ≤ F(j) + F(j + 1) = F(j + 2),

since xm−1 + 1 ≤ j and xm = j + 1.
Proof of Zeckendorf’s Theorem

Part II: Uniqueness

Assume there are two non-consecutive index sets Am and Bn such that

Σi∈Am F(i) = Σj∈Bn F(j).

Let A := Am \ (Am ∩ Bn) and B := Bn \ (Am ∩ Bn). We assume that A and B are
not empty, F(ℓ) = maxi∈A F(i), F(k) = maxi∈B F(i) and w.l.o.g. F(ℓ) < F(k).

Then we also have:

Σi∈A F(i) = Σj∈B F(j).

From the Lemma, we also obtain:

Σi∈A F(i) < F(ℓ + 1) ≤ F(k) ≤ Σj∈B F(j),

which contradicts our initial statement. Thus, A = B = ∅, i.e. Am = Bn.
Encoding

Determine the Zeckendorf representation of the number x. To this end, use the
following algorithm:

• Step 1: Search the largest Fibonacci number F ≤ x and add it to the
  representation.

• Step 2: Compute the remainder r = x − F. If r > 0, set x = r and go back
  to step 1.

Starting from F(2), add a 1 for each Fibonacci number occurring in the
Zeckendorf representation of the number being encoded, and a 0 for each
Fibonacci number not occurring.

Append a 1 at the end.

Decoding

Remove the final 1.

Sum the Fibonacci numbers corresponding to the remaining 1 bits.
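A compact sketch of the encoder and decoder described above (function names are illustrative):

def fib_encode(x):                             # x >= 1
    fibs, a, b = [], 1, 2                      # a = F(2), b = F(3), ...
    while a <= x:
        fibs.append(a)
        a, b = b, a + b
    bits = ['0'] * len(fibs)
    r = x
    for i in reversed(range(len(fibs))):       # greedy: largest Fibonacci number <= remainder
        if fibs[i] <= r:
            bits[i] = '1'
            r -= fibs[i]
    return ''.join(bits) + '1'                 # append a 1 at the end

def fib_decode(code):
    total, a, b = 0, 1, 2
    for bit in code[:-1]:                      # remove the final 1, then sum the 1-bit positions
        if bit == '1':
            total += a
        a, b = b, a + b
    return total

print(fib_encode(12), fib_decode('101011'))    # 101011 12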
Examples

Fibonacci numbers:

F(0)  F(1)  F(2)  F(3)  F(4)  F(5)  F(6)  F(7)  F(8)  F(9)  F(10)
0     1     1     2     3     5     8     13    21    34    55

Examples of Fibonacci coding:

Number   Fibonacci representation       Code word
1        F(2)                1              11
2        F(3)                2              011
3        F(4)                3              0011
4        F(2) + F(4)         3 + 1          1011
5        F(5)                5              00011
6        F(2) + F(5)         5 + 1          10011
7        F(3) + F(5)         5 + 2          01011
8        F(6)                8              000011
9        F(6) + F(2)         8 + 1          100011
10       F(6) + F(3)         8 + 2          010011
11       F(6) + F(4)         8 + 3          001011
12       F(6) + F(4) + F(2)  8 + 3 + 1      101011
Remarks

In many cases, there are more efficient alternatives to Fibonacci coding.

Fibonacci codes are still used for applications that need to be robust:

• The sequence 11 always and only occurs at the end of a code word.

• Thus, a corrupted bit affects at most two encoded numbers.
Summary

Overhead has to be considered for storing the coding scheme.

Canonical codes make the results of a coding scheme unique in postprocessing.

Huffman coding is optimal (under the right conditions).

We have seen several ways to encode infinite source alphabets:

• Parameter-free: Unary, Elias Gamma, and Fibonacci Coding.

• Adaptation to the distribution: (Exponential) Golomb Codes.

• Robust under transmission errors: Fibonacci Coding.

Outlook

For finite alphabets, can we still do better than Huffman coding?