
Recently on Image Compression ...

What did we learn about information theory so far?

Prefix-free codes are uniquely decodable and can be easily decoded.
There is no drawback in efficiency when restricting ourselves to prefix-free codes.
First algorithms: Shannon and Shannon-Fano codes.

Problems:

How do we use these coding schemes in practice?
Shannon-Fano coding does not always create optimal code words and is thus no longer used in practice.

Today's Learning Unit

[Overview diagram: Image Compression, Part I: Lossless (Entropy Coding, Lossless Codecs) and Part II: Lossy (Transform Coding, Inpainting-based Compression, Other Approaches, Teaser: Video Coding).]

We talk about practical aspects and introduce Huffman and integer coding.
Outline

Learning Unit 02:
Basic Entropy Coding II: Huffman and Integer Coding

Contents
1. Practical Issues
2. Huffman Coding
3. Basic Integer Coding: Unary and Elias Gamma Codes
4. Golomb and Rice Codes
5. Fibonacci Coding

© 2023 Christian Schmaltz, Pascal Peter

Practical Issues (1)

Previously, we discussed several prefix coding schemes.
However, we have not yet talked about practical aspects of storing data.

What does a compressed file need to contain?
• The encoded content.
• Information about the encoding scheme.

The encoding scheme usually varies, as it depends on the file to be compressed.
There are several ways to store the encoding scheme (see the following slides).
Practical Issues (2)

Approach I

Store how often each symbol occurs in the uncompressed file.
When decompressing, the prefix code is constructed in the same way as in the encoding step.
The required space varies with the size of the file to be compressed.

Practical Issues (3)

Approach II

Store the number of symbols and, for each occurring symbol, store the triple (si, li, ci).

Example
Encoding scheme:

  i    1   4    2    3
  si   a   b    c    d
  li   1   3    3    2
  ci   1   010  011  00

The resulting string might be

  4int(as, 1int, 1b)(ds, 2int, 00b)(bs, 3int, 010b)(cs, 3int, 011b).

Brackets, spaces and commas are not stored.
The suffixes indicate the kind of data:
• int = integer
• b = binary word
• s = symbol
Practical Issues (4)

Approach III: Canonical Codes

Encoding code words:
1. Reorder the code words by their length. Code words of equal length are ordered alphabetically by the symbol they encode.
2. Replace the first code word by zeros, keeping its original length.
3. For each following code word:
   • Increment the preceding code word (+1 in binary).
   • Concatenate 0s at the end of the new code word until it has the same length as the old one.
4. Store all code word lengths.

The resulting code is called a canonical code.

Decoding code words:
1. Sort the symbols first by code word length, then alphabetically.
2. Set the code word of the first symbol si to li zeros.
3. For each symbol: increment the previous code word, pad with 0s to the correct length.
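Since encoding and decoding use the same construction, it can be implemented in a few lines. A minimal Python sketch (the function name and the length-dictionary interface are our own choices, not part of the lecture):

  # Build canonical code words from code word lengths; assumes the lengths
  # admit a prefix code (i.e. they satisfy the Kraft inequality).
  def canonical_code(lengths):
      """lengths: dict mapping symbol -> code word length."""
      symbols = sorted(lengths, key=lambda s: (lengths[s], s))   # by length, ties alphabetically
      code, prev_val, prev_len = {}, -1, 0
      for s in symbols:
          l = lengths[s]
          # Increment the previous code word, then pad with zeros (left shift) to length l.
          val = (prev_val + 1) << (l - prev_len)
          code[s] = format(val, "0{}b".format(l))
          prev_val, prev_len = val, l
      return code

  # Lengths from the example on the next slide: a->1, d->2, b->3, c->3.
  print(canonical_code({"a": 1, "b": 3, "c": 3, "d": 2}))
  # {'a': '0', 'd': '10', 'b': '110', 'c': '111'}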

Practical Issues (5)

Example

Reordered encoding scheme:

  i             1   3   4    2
  si            a   d   b    c
  li            1   2   3    3
  ci            1   00  010  011
  canonical ci  0   10  110  111

Resulting string if it is known that S = {a, b, c, d}: 1i, 3i, 3i, 2i.

If |S| > 4, additional zeros must be saved. Thus, this method may require more or less space than the methods described before.
Alternatively, |S| bits may be used to indicate which symbols appear.
Practical Issues (6)

Storing Code Word Lengths

Since 1 ≤ li < |S|, ⌈log2(|S|)⌉ bits are sufficient to store each li.
However, this upper bound is rarely reached in practice.

Algorithm
1. Compute lmax = max li over i ∈ {1, 2, ..., m}.
2. Compute q = ⌈log2(lmax)⌉, i.e. the number of bits sufficient to store each li.
3. Store q, followed by each li with q bits.

Example
For |S| = 256 with 8 < lmax < 16,

  3 + 256 · 4 = 1027
  (3 bits to store q, and 4 bits for each of the 256 code word lengths)

bits (i.e. ≈ 129 bytes) are sufficient to store a “canonical” encoding scheme.

Practical Issues (7)

End of File

Consider the alphabet S = {a, b} and the encoding scheme φ(a) = 0, φ(b) = 1.
Then, Φ(aababbb) = 0010111.

Problem: Data can only be stored byte-wise on the hard disk.
Thus, the actually stored word is 0010111x, where x is either 0 or 1.

Without modifications, the decoder decodes too much in both cases:
• Decoding 00101110 yields aababbba.
• Decoding 00101111 yields aababbbb.

This problem can also occur with Tunstall codes (and extended Huffman codes later in the lecture).

There are several approaches to solve this problem:
• Stop at the end of the compressed file without decoding too much.
• Include an end-of-file (EOF) marker as a symbol in the encoder.
• Store the size of the initial file.
Outline

Learning Unit 02:
Basic Entropy Coding II: Huffman and Integer Coding

Contents
1. Practical Issues
2. Huffman Coding
3. Basic Integer Coding: Unary and Elias Gamma Codes
4. Golomb and Rice Codes
5. Fibonacci Coding

© 2023 Christian Schmaltz, Pascal Peter

The Origins of Huffman Coding (1)

[Photos: David Huffman (source: huffmancoding.com, Matthew Mulbry) and Robert Fano (source: Wikimedia Commons, user 121a0012).]

David Huffman was a student of Robert Fano (Shannon-Fano coding) at MIT.
Huffman attended the first course on information theory.
If Huffman solved a minimal-code problem, he would not have to write finals.
The Origins of Huffman Coding (2)

“And I was very lucky – before I undertook my thesis – to solve, by simple arithmetic, a classic coding problem for which several distinguished scientists had not been able to find an exact solution. Either I didn’t know that, or it just didn’t bother me at the time.”

David Huffman

[Photo source: huffmancoding.com, Matthew Mulbry.]

The Origins of Huffman Coding (3)

Requirements for an Optimal Code

In his paper, Huffman formulated the following intuitive requirements:
If s1 has a higher probability of occurrence than s2, the code of s2 cannot be shorter than the one of s1.
The codes of the two symbols with the lowest probabilities have the same length.

Why do these requirements make sense?

First requirement (easy):
• The average code length increases if symbols that occur often have longer codes.

Second requirement:
• Assume: si and sj with the lowest occurrence probabilities have code lengths li > lj.
• Prefix code: cj cannot be a prefix of ci, and all other code words are shorter.
• What happens if we drop the last li − lj bits of ci?
• ci and cj remain distinct (since cj is not a prefix of ci), and the average code length becomes shorter.
The Origins of Huffman Coding (4)

Huffman Coding

Formally, the intuitive requirements translate to:

For symbols s1, ..., sn with codewords c1, ..., cn and probabilities p1 ≥ p2 ≥ · · · ≥ pn:

  l1 ≤ l2 ≤ · · · ≤ ln−1 = ln

Additional Requirement:
The codewords of sn and sn−1 differ only in the last bit.
This is consistent with the previous requirements.

The Origins of Huffman Coding (5)

Intuitive requirements:
  l1 ≤ l2 ≤ · · · ≤ ln−1 = ln
  The codewords of sn and sn−1 differ only in the last bit.

How to use this for coding?

Induction-like idea:
Trivial for two symbols: simply assign 0 and 1.
Now reduce the number n of symbols successively until you reach 2:
• Consider the two symbols si, sj with the lowest probabilities.
• Assign ci = p0, cj = p1 with an unknown prefix p.
• Merge the two symbols si, sj into a new symbol sisj with probability pi + pj and “codeword” p.
• Repeating this procedure assigns codewords to all symbols.

Remark: Only the codewords of the leaf nodes are actually used. All merged symbols are associated with prefixes that by definition should not appear in the code.
Tree-Interpretation (1)

Tree-Interpretation of Huffman Coding

The previous algorithm is good for the derivation, but abstract.
There is a nice visual interpretation: a binary tree.
It brings additional benefits for decoding.

Tree-based Algorithm
1. For each symbol si, create one leaf of a tree with value pi.
2. Stop if the tree is finished.
3. Find the two tree nodes x and y with the smallest values.
4. Create a new tree node with value px + py, connect it to x and y, and assign a zero and a one to the new edges, respectively.
5. Continue with step 2.

The code words are found by traversing the tree from its root to the leaves.
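A small Python sketch of the tree-based algorithm (function and variable names are our own). Ties may be broken differently than in the slides, so the code words can differ from the tree shown on the next slide while still being optimal:

  import heapq
  import itertools

  def huffman_code(frequencies):
      """frequencies: dict mapping symbol -> frequency (or probability)."""
      tie = itertools.count()                    # tie-breaker so heapq never compares nodes
      heap = [(f, next(tie), s) for s, f in frequencies.items()]
      heapq.heapify(heap)
      while len(heap) > 1:                       # steps 3/4: merge the two smallest nodes
          fx, _, x = heapq.heappop(heap)
          fy, _, y = heapq.heappop(heap)
          heapq.heappush(heap, (fx + fy, next(tie), (x, y)))
      root = heap[0][2]
      code = {}
      def walk(node, prefix):                    # traverse from the root to the leaves
          if isinstance(node, tuple):
              walk(node[0], prefix + "0")        # edge to the first child gets a 0
              walk(node[1], prefix + "1")        # edge to the second child gets a 1
          else:
              code[node] = prefix or "0"         # leaf: the accumulated path is the code word
      walk(root, "")
      return code

  # The word "zoology": o occurs three times, z, l, g, y once each.
  print(huffman_code({"o": 3, "z": 1, "l": 1, "g": 1, "y": 1}))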

Tree-Interpretation (2)

Remarks

The generated code words are not unique.
The code words generated by Huffman’s algorithm are optimal.
Huffman coding is widely used, e.g. in JPEG.
There are better compression algorithms if one of our assumptions is invalid.
At least one bit is necessary for each symbol with Huffman coding.

[Figure: One possible Huffman tree for the word “zoology”, with leaf O (probability 3/7) and leaves G, Y, Z, L (probability 1/7 each).]
Tree-Interpretation (3)

Storing Encoding Schemes with Trees

A Huffman tree contains all information the decoder requires.
Probabilities are not needed anymore!
We only need to store the tree and the symbols at the leaf nodes.

How to do this in practice (example for a q-bit alphabet):
1. Start with the root node.
2. For the current node, store a 0 and its q-bit symbol if it is a leaf node; store a 1 otherwise.
3. If the current node has children, go to step 2 for both children.

For the Huffman tree for ’ZOOLOGY’, we obtain:

  1 0Ob 1 1 0Gb 0Yb 1 0Zb 0Lb
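A minimal sketch of this pre-order serialisation, assuming the tree is represented as nested pairs with plain symbols at the leaves (this representation is our own choice, matching the earlier Huffman sketch, not something fixed by the lecture):

  # Pre-order tree serialisation: 1 for an inner node, 0 plus the q-bit symbol for a leaf.
  def serialise(node, q=8):
      if isinstance(node, tuple):                       # inner node: store 1, recurse into children
          return "1" + serialise(node[0], q) + serialise(node[1], q)
      return "0" + format(ord(node), "0{}b".format(q))  # leaf: store 0 and the symbol with q bits

  # The 'ZOOLOGY' tree from the slide, written as nested pairs.
  tree = ("O", (("G", "Y"), ("Z", "L")))
  print(serialise(tree))   # 1 0O 1 1 0G 0Y 1 0Z 0L, with each symbol as its 8-bit value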

Minimum Variance Huffman Codes

Problem: Some codewords can be much longer than others.
In some applications this can be a problem, e.g. if a fixed number of symbols is to be transmitted per second.

Solution: Try to equalise the code word lengths.
Only one additional rule is necessary:
If there are several tree nodes with the smallest value, take the two “oldest” ones, i.e. those with the smallest depth.

The resulting code is a minimum-variance Huffman code.
Nonbinary Huffman Codes

Problem: The code alphabet A might contain more than two symbols.
Solution: Use nonbinary Huffman codes.
Instead of combining two nodes in each step and assigning a zero and a one, combine |A| = n nodes and assign the symbols from A to the new edges.
However, in the first phase, only m′ nodes must be combined, where m′ is the number between 2 and n (inclusive) that satisfies

  |S| ≡ m′ (mod n − 1).

Extended Huffman Coding

Recap: A uniquely decodable binary scheme can achieve the average code word length l = H(S) if and only if all pi are powers of 1/2.
In particular, the further away the pi are from powers of 1/2, the worse Huffman coding compresses.
Recap: Huffman coding needs at least one bit for each symbol.

Question: Is there a possibility to solve these problems?
Extended Huffman Coding

Idea: Modify the source alphabet in order to encode multiple symbols at once.

Example:
Replace the alphabet

  S = {A, B, C}

by

  S′ = {AA, AB, AC, BA, BB, BC, CA, CB, CC}.

Assuming independent symbols, the corresponding probabilities are given by:

  pij := P(si sj) = P(si) P(sj) = pi pj

Extended Huffman Coding

Example (continued)

Standard Huffman coding:

  i    1    2     3
  si   A    B     C
  pi   0.8  0.02  0.18
  ci   0    11    10

Extended Huffman coding:

  i    1     2      3      4       5         6        7      8         9
  si   AA    AB     AC     BA      BB        BC       CA     CB        CC
  pi   0.64  0.016  0.144  0.016   0.0004    0.0036   0.144  0.0036    0.0324
  ci   0     10101  11     101000  10100101  1010011  100    10100100  1011
Extended Huffman Coding

In the example from the last slide, the following holds:

• The entropy of the source alphabet is H(S) ≈ 0.816 bits/symbol.
• “Standard” Huffman coding results in an average code word length of l = 1.2 bits/symbol.
• Extended Huffman coding results in an average code word length of l = 1.7228 bits/symbol.
• Since extended Huffman coding encodes two symbols per code word, the average code word length with respect to the initial alphabet is

  1.7228 / 2 bits/symbol = 0.8614 bits/symbol < 1.2 bits/symbol.

Conclusion: Extended Huffman coding might compress much better than “standard” Huffman coding.
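As a quick check of the numbers above, the following sketch recomputes the entropy and the average code word lengths from the probabilities and the code word lengths given in the two tables (variable names are our own):

  import math

  p = {"A": 0.8, "B": 0.02, "C": 0.18}
  len_standard = {"A": 1, "B": 2, "C": 2}                        # codes 0, 11, 10
  len_extended = {"AA": 1, "AB": 5, "AC": 2, "BA": 6, "BB": 8,
                  "BC": 7, "CA": 3, "CB": 8, "CC": 4}            # lengths of the codes above

  entropy = -sum(pi * math.log2(pi) for pi in p.values())
  avg_standard = sum(p[s] * l for s, l in len_standard.items())
  avg_extended = sum(p[s[0]] * p[s[1]] * l for s, l in len_extended.items())

  print(round(entropy, 3), round(avg_standard, 2),
        round(avg_extended, 4), round(avg_extended / 2, 4))
  # approximately: 0.816 1.2 1.7228 0.8614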

Extended Huffman Coding

Remarks

It is also possible to use words consisting of n > 2 symbols as the new alphabet.
The larger n, the better the compression.
If n corresponds to the length of the source word, this method is (theoretically) optimal.
The overhead needed to store the encoding scheme rises drastically when increasing n.
Outline

Learning Unit 02:
Basic Entropy Coding II: Huffman and Integer Coding

Contents
1. Practical Issues
2. Huffman Coding
3. Basic Integer Coding: Unary and Elias Gamma Codes
4. Golomb and Rice Codes
5. Fibonacci Coding

© 2023 Christian Schmaltz, Pascal Peter

Coding Integers

Motivation

All encoding schemes presented so far only work for |S| < ∞.
In practice, this restriction does not always hold.
Today, we learn several compression algorithms that can encode all nonnegative/positive integers.

Thereby . . .
• . . . each integer si has a so-called implied probability (1/2)^li.
• . . . one uses few bits for small numbers and many bits for large numbers.
• . . . all methods are easy to adapt to other countable sets such as Z.
Unary Coding

Unary coding represents each integer n ≥ 0 by n 1s, followed by a 0.
For example, 4 is coded as 11110, 7 as 11111110, and 0 as 0.

Remarks:
• The role of 0s and 1s is often flipped.
• Unary coding is only efficient if the distribution of symbols in the compressed content is similar to the implied probabilities.

  Number  Encoding   Implied probability
  0       0          1/2
  1       10         1/4
  2       110        1/8
  3       1110       1/16
  4       11110      1/32
  5       111110     1/64
  6       1111110    1/128
  7       11111110   1/256
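A minimal sketch of unary coding as described above (helper names are our own):

  def unary_encode(n):
      return "1" * n + "0"

  def unary_decode(bits):
      n = bits.index("0")          # number of 1s before the first 0
      return n, bits[n + 1:]       # decoded value and the remaining bits

  print(unary_encode(4))           # 11110
  print(unary_decode("111111100")) # (7, '0')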

Elias Gamma Coding

Idea: Encode the “deviation” from powers of 2.
Store the number in two components:
• The largest power of 2 that is contained in the number.
• The “remainder” after removing (i.e. subtracting) this power.
Elias Gamma Coding

Algorithm: Elias Gamma Coding

To encode the number x, the following steps are done:
1. Find the largest number N with 2^N ≤ x.
2. Encode N using unary coding.
3. Append the integer x − 2^N using N binary digits.

Alternative formulation:
1. Write x in binary.
2. Subtract 1 from the number of bits written in step 1 and prepend that many zeros.

Thus, Elias gamma coding uses 2⌊log2(x)⌋ + 1 bits to encode x.
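A minimal sketch following the alternative formulation above (function names are our own). Note that the unary part here uses the flipped role of 0s and 1s, i.e. N zeros in front of the leading 1, which matches the example table and the decoding rule on the following slides:

  # Elias gamma coding for integers x >= 1.
  def elias_gamma_encode(x):
      binary = format(x, "b")                  # x in binary: N + 1 bits
      return "0" * (len(binary) - 1) + binary  # prepend N zeros

  def elias_gamma_decode(bits):
      n = bits.index("1")                      # N = number of leading zeros
      return int(bits[n : 2 * n + 1], 2)       # read the next N + 1 bits as a binary number

  print(elias_gamma_encode(10))                # 0001010 (table: 0001 010)
  print(elias_gamma_decode("0001010"))         # 10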

Elias Gamma Coding

Elias Gamma Coding - Example

  Number         Encoding    Implied probability
  1  = 2^0 + 0   1           1/2
  2  = 2^1 + 0   01 0        1/8
  3  = 2^1 + 1   01 1        1/8
  4  = 2^2 + 0   001 00      1/32
  5  = 2^2 + 1   001 01      1/32
  6  = 2^2 + 2   001 10      1/32
  7  = 2^2 + 3   001 11      1/32
  8  = 2^3 + 0   0001 000    1/128
  9  = 2^3 + 1   0001 001    1/128
  10 = 2^3 + 2   0001 010    1/128
  11 = 2^3 + 3   0001 011    1/128
  12 = 2^3 + 4   0001 100    1/128
  13 = 2^3 + 5   0001 101    1/128
  14 = 2^3 + 6   0001 110    1/128
  15 = 2^3 + 7   0001 111    1/128
  16 = 2^4 + 0   00001 0000  1/512
  17 = 2^4 + 1   00001 0001  1/512
Elias Gamma Coding

Decoding
1. Count how many 0s occur in front of the first 1. This number is N.
2. Read the next N + 1 bits as a binary number.

Remark
Elias gamma codes are used in the video codec H.264.

Problem
There is no possibility to adapt to the actual probabilities.

Outline

Learning Unit 02:
Basic Entropy Coding II: Huffman and Integer Coding

Contents
1. Practical Issues
2. Huffman Coding
3. Basic Integer Coding: Unary and Elias Gamma Codes
4. Golomb and Rice Codes
5. Fibonacci Coding

© 2023 Christian Schmaltz, Pascal Peter
Golomb and Rice Codes

Golomb Codes

From Golomb’s original paper:
“Secret Agent 00111 is back at the Casino again, playing a game of chance, while the fate of mankind hangs in the balance.
Each game consists of a sequence of favorable events (probability p), terminated by the first occurrence of an unfavorable event (probability q = 1 − p).
More specifically, the game is roulette, and the unfavorable event is the occurrence of 0, which has a probability of q = 1/37.
No one seriously doubts that 00111 will come through again, but the Secret Service is quite concerned about communicating the blow-by-blow description back to Whitehall.”

[Photo sources: Ralf Roletschek, Wikimedia Commons; University of Southern California.]

Golomb and Rice Codes

Golomb Codes

From Golomb’s original paper (continued):
“The bartender, who is a free-lance agent, has a binary channel available, but he charges a stiff fee for each bit sent.
The problem perplexing the Service is how to encode the vicissitudes of the wheel so as to place the least strain on the Royal Exchequer.
It is easily seen that, for the case p = q = 1/2, the best that can be done is to use 0 and 1 to represent the two possible outcomes.
However, the case at hand involves p ≫ q, for which the direct coding method is shockingly inefficient.”

[Photo sources: Ralf Roletschek, Wikimedia Commons; University of Southern California.]
Golomb and Rice Codes

Golomb Codes

Idea: Model the numbers after a geometric distribution:
• A favourable event occurs with probability p.
• An unfavourable event occurs with probability 1 − p.
• A sequence of k favourable events that is terminated by one unfavourable event has probability p^k (1 − p).

For our implied probabilities this would mean p1 = 1 − p and pi+1 = pi · p, with a free choice of p.
Golomb coding approximates such a distribution and has only one free integer parameter m > 0 that corresponds to the choice of p.

Golomb and Rice Codes

Algorithm

To encode the number x using the parameter m, the following steps are done:
1. Compute b := ⌈log2(m)⌉.
2. Divide with remainder: q = ⌊x/m⌋ and r = x − q·m.
3. Store q using unary coding, i.e. as a sequence of q ones, followed by a zero.
4. The remainder r is stored in a truncated binary representation:
   (a) If r < 2^b − m holds, store r as a binary number of length b − 1.
   (b) Otherwise, store the value r + 2^b − m as a binary number of length b.

Remark: Step 4 is skipped for m = 1, as there is no remainder.
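A small sketch of this encoder (the function name is our own); for x = 7 and m = 3 it reproduces the entry 110 10 from the table on the next slide:

  import math

  # Golomb encoding with parameter m; unary part: q ones followed by a zero.
  def golomb_encode(x, m):
      if m == 1:                                   # no remainder for m = 1: pure unary code
          return "1" * x + "0"
      b = math.ceil(math.log2(m))
      q, r = divmod(x, m)
      unary = "1" * q + "0"
      if r < 2 ** b - m:                           # truncated binary: the short code words
          return unary + format(r, "0{}b".format(b - 1))
      return unary + format(r + 2 ** b - m, "0{}b".format(b))

  print(golomb_encode(7, 3))                       # 11010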
Golomb and Rice Codes

Examples

  x     0      1      2        3        4        5        6        7
  m=1   0      10     110      1110     11110    111110   1111110  11111110
  m=2   0 0    0 1    10 0     10 1     110 0    110 1    1110 0   1110 1
  m=3   0 0    0 10   0 11     10 0     10 10    10 11    110 0    110 10
  m=4   0 00   0 01   0 10     0 11     10 00    10 01    10 10    10 11
  m=5   0 00   0 01   0 10     0 110    0 111    10 00    10 01    10 10
  m=6   0 00   0 01   0 100    0 101    0 110    0 111    10 00    10 01
  m=7   0 00   0 010  0 011    0 100    0 101    0 110    0 111    10 00
  m=8   0 000  0 001  0 010    0 011    0 100    0 101    0 110    0 111

The numbers from 0 to 7 encoded with Golomb coding with different parameters m. The space denotes the transition from q to r.

Golomb and Rice Codes

Decoding
1. Decode q.
2. Interpret the next b − 1 bits as a binary number r′. If r′ < 2^b − m holds, this is the remainder r.
2′. Otherwise, read one more bit and interpret all b bits as a binary number r′. The remainder r is given by r = r′ − 2^b + m.
3. The final number x is given by x = q · m + r.
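A matching decoder sketch (again our own helper; it assumes the bit string contains exactly one Golomb code word):

  import math

  def golomb_decode(bits, m):
      q = bits.index("0")                              # unary part: number of leading ones
      if m == 1:
          return q
      b = math.ceil(math.log2(m))
      pos = q + 1
      r = int(bits[pos : pos + b - 1] or "0", 2)       # first interpret b - 1 bits
      if r >= 2 ** b - m:                              # short word not sufficient:
          r = int(bits[pos : pos + b], 2) - (2 ** b - m)   # take all b bits and shift back
      return q * m + r

  print(golomb_decode("11010", 3))                     # 7, the code word from the table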
Golomb and Rice Codes

Remarks

Proposed in 1966 by Solomon Wolf Golomb.
For m = 1, this is the same as unary coding.
If p^m = 0.5 holds . . .
• the number x + m is half as likely as x (for any x),
• the code word for x + m is one bit longer than that of x,
• m is chosen optimally for numbers with the geometric distribution.

Golomb and Rice Codes

Rice Codes

If m is a power of two, the resulting code is called a Rice code.
• Then, r is always stored with b bits (since r + 2^b − m = r).
• r and q can be stored in reverse order.
• Rice codes are commonly used, e.g. in JPEG-LS, FLAC (Free Lossless Audio Codec) and MPEG-4 ALS (MPEG-4 Audio Lossless Coding).
Golomb and Rice Codes

Exponential Golomb Codes

In 1978, Teuhola proposed exponential Golomb codes to encode a number x.
Exponential Golomb codes with m = 0 are identical to Elias gamma codes, except for a shift by one.
A variation for negative integers is used in H.264 (MPEG-4 AVC) and H.265 (HEVC).
The number of codewords with equal length increases exponentially.

Golomb and Rice Codes

Algorithm

1. Determine q ≥ 0 such that

  Σ_{j=0}^{q−1} 2^(j+m) ≤ x < Σ_{j=0}^{q} 2^(j+m)   ⇔   q = ⌊log2(x/2^m + 1)⌋.

  If q = 0, the sum Σ_{j=0}^{−1} 2^(j+m) is empty by definition.

2. Encode q in unary representation, and

  r = x − Σ_{j=0}^{q−1} 2^(j+m)

  as an (m + q)-bit binary number.

Caution: Golomb codes and exponential Golomb codes use different unary representations for q (here: q zeros followed by a one).
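A small sketch of this encoder (the function name is ours); it reproduces e.g. the table entries 0001 000 (x = 7, m = 0) and 01 011 (x = 7, m = 2) on the next slide:

  # Exponential Golomb encoding; the unary part is q zeros followed by a one.
  def exp_golomb_encode(x, m=0):
      q = ((x >> m) + 1).bit_length() - 1          # integer form of floor(log2(x / 2^m + 1))
      r = x - 2 ** m * (2 ** q - 1)                # subtract sum_{j=0}^{q-1} 2^(j+m)
      suffix = format(r, "0{}b".format(m + q)) if m + q > 0 else ""
      return "0" * q + "1" + suffix

  print(exp_golomb_encode(7, 0))                   # 0001000
  print(exp_golomb_encode(7, 2))                   # 01011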
Golomb and Rice Codes

Examples

  x    m=0         m=1        m=2       m=3      m=4     m=5
  0    1           1 0        1 00      1 000    1 0000  1 00000
  1    01 0        1 1        1 01      1 001    1 0001  1 00001
  2    01 1        01 00      1 10      1 010    1 0010  1 00010
  3    001 00      01 01      1 11      1 011    1 0011  1 00011
  4    001 01      01 10      01 000    1 100    1 0100  1 00100
  5    001 10      01 11      01 001    1 101    1 0101  1 00101
  6    001 11      001 000    01 010    1 110    1 0110  1 00110
  7    0001 000    001 001    01 011    1 111    1 0111  1 00111
  8    0001 001    001 010    01 100    01 0000  1 1000  1 01000
  9    0001 010    001 011    01 101    01 0001  1 1001  1 01001
  10   0001 011    001 100    01 110    01 0010  1 1010  1 01010
  11   0001 100    001 101    01 111    01 0011  1 1011  1 01011
  12   0001 101    001 110    001 0000  01 0100  1 1100  1 01100
  13   0001 110    001 111    001 0001  01 0101  1 1101  1 01101
  14   0001 111    0001 0000  001 0010  01 0110  1 1110  1 01110
  15   00001 0000  0001 0001  001 0011  01 0111  1 1111  1 01111

Outline

Learning Unit 02:
Basic Entropy Coding II: Huffman and Integer Coding

Contents
1. Practical Issues
2. Huffman Coding
3. Basic Integer Coding: Unary and Elias Gamma Codes
4. Golomb and Rice Codes
5. Fibonacci Coding

© 2023 Christian Schmaltz, Pascal Peter
Fibonacci Coding

Fibonacci Numbers

Named after Leonardo “Fibonacci” of Pisa.
Introduced in “Liber Abaci” (Book of Calculation), 1202.

Rabbit riddle:
• initially: one male and one female rabbit
• Rabbits mate when they are one month old.
• one month after mating: new pair (male/female)
• The rabbits don’t die.

Fibonacci numbers are actually much older:
analysis of Sanskrit meter in India (200 BC).

[Photo source: Hans-Peter Postel, Wikimedia Commons.]

Fibonacci Coding

The Fibonacci numbers are defined as:
• F(0) = 0
• F(1) = 1
• F(n) = F(n − 1) + F(n − 2) for all n ≥ 2

A closed form known as Binet’s formula exists:

  Fn = (ϕ^n − (1 − ϕ)^n) / √5,   where ϕ = (1 + √5)/2 ≈ 1.61...

Since |1 − ϕ|^n / √5 < 1/2 for all n, it also holds that

  Fn = ⌊ ϕ^n / √5 + 1/2 ⌋,

i.e. Fn is the integer closest to ϕ^n / √5.
Fibonacci Coding

Idea: Represent positive integers as sums of Fibonacci numbers.

Uniqueness
Problem: Some numbers have an ambiguous Fibonacci representation:

  21 + 5 + 3 = 29 = 21 + 8

Solution: Only allow sums not containing consecutive Fibonacci numbers.

Zeckendorf’s Theorem
Every positive integer can be represented uniquely as the sum of one or more distinct Fibonacci numbers F(i), i > 1, in such a way that the sum does not include any two consecutive Fibonacci numbers.
The sum fulfilling this condition is called the Zeckendorf representation.

Fibonacci Coding

Proof of Zeckendorf’s Theorem

In the following, we denote a Zeckendorf representation of a number x ∈ N by xZ.

Part I: Existence

Proof by induction. Trivial for 1, 2, 3, since they are Fibonacci numbers.
Assumption: ∀ j ≤ k : ∃ jZ.
k → k + 1:
If k + 1 is a Fibonacci number, we are done.
If it is not, then ∃ j : F(j) < k + 1 < F(j + 1).
Consider the remainder r := k + 1 − F(j).
F(j) ≥ 1 ⇒ r ≤ k, i.e. ∃ rZ.

  F(j) + r < F(j + 1)  ⇔  F(j) + r < F(j) + F(j − 1)  ⇔  r < F(j − 1)

Since F(j − 1) is not contained in rZ, we have rZ + F(j) = (k + 1)Z.
Fibonacci Coding

Definition: In the following, Xm denotes a set of m non-consecutive indices for Fibonacci numbers: Xm = {x1, ..., xm} with xi ≥ 1 and xi < xi+1 − 1 for all i ∈ {1, ..., m − 1}.

Lemma: Let Xm be given with F(xm) = F(j) for some j ≥ 2. Then we have

  Σ_{i=1}^{m} F(xi) < F(j + 1).

Proof:
For j = 2, the Lemma holds trivially: the only valid choice for the sum is F(2) = 1, which is strictly smaller than F(3) = 2.

j → j + 1:
Let Xm be given with F(xm) = F(j + 1) and assume that the Lemma has been shown for j (induction hypothesis IH).

  Σ_{i=1}^{m} F(xi) = Σ_{i=1}^{m−1} F(xi) + F(xm)
                    < F(xm−1 + 1) + F(xm)          (IH, with xm−1 + 1 ≤ j and xm = j + 1)
                    ≤ F(j) + F(j + 1) = F(j + 2)

Fibonacci Coding

Proof of Zeckendorf’s Theorem

Part II: Uniqueness

Assume there are two non-consecutive index sets Am and Bn such that

  Σ_{i ∈ Am} F(i) = Σ_{j ∈ Bn} F(j).

Let A := Am \ (Am ∩ Bn) and B := Bn \ (Am ∩ Bn). We assume that A and B are not empty, F(ℓ) = max_{i ∈ A} F(i), F(k) = max_{j ∈ B} F(j), and w.l.o.g. F(ℓ) < F(k).
Then we also have:

  Σ_{i ∈ A} F(i) = Σ_{j ∈ B} F(j).

From the Lemma, we also obtain:

  Σ_{i ∈ A} F(i) < F(ℓ + 1) ≤ F(k) ≤ Σ_{j ∈ B} F(j),

which contradicts our initial statement. Thus, A = B = ∅, i.e. Am = Bn.
Fibonacci Coding

Encoding

Determine the Zeckendorf representation of the number x. To this end, use the following algorithm:
• Step 1: Search for the largest Fibonacci number F ≤ x and add it to the representation.
• Step 2: Compute the remainder r = x − F. If r > 0, set x = r and go back to step 1.

Starting from F(2), add a 1 for each Fibonacci number occurring in the Zeckendorf representation of the number being encoded, and a 0 for each Fibonacci number not occurring.
Append a 1 at the end.

Decoding

Remove the final 1.
Sum the Fibonacci numbers corresponding to the remaining 1 bits.
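A minimal sketch of Fibonacci coding as described above (helper names are our own); for x = 12 it reproduces the code word 101011 from the table on the next slide:

  # Fibonacci (Zeckendorf) coding over F(2), F(3), ... = 1, 2, 3, 5, 8, ...
  def fibonacci_encode(x):
      fib = [1, 2]
      while fib[-1] <= x:
          fib.append(fib[-1] + fib[-2])
      bits = ["0"] * (len(fib) - 1)
      for i in range(len(bits) - 1, -1, -1):   # greedily take the largest Fibonacci number <= x
          if fib[i] <= x:
              bits[i] = "1"
              x -= fib[i]
      return "".join(bits) + "1"               # the appended 1 marks the end of the code word

  def fibonacci_decode(bits):
      fib = [1, 2]
      while len(fib) < len(bits) - 1:
          fib.append(fib[-1] + fib[-2])
      return sum(f for f, b in zip(fib, bits[:-1]) if b == "1")

  print(fibonacci_encode(12))                  # 101011
  print(fibonacci_decode("101011"))            # 12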

Fibonacci Coding

Examples

Top: Fibonacci numbers. Bottom: Examples of Fibonacci coding.

  F(0) F(1) F(2) F(3) F(4) F(5) F(6) F(7) F(8) F(9) F(10)
  0    1    1    2    3    5    8    13   21   34   55

  Number  Fibonacci representation       Code word
  1       F(2)               = 1         11
  2       F(3)               = 2         011
  3       F(4)               = 3         0011
  4       F(4) + F(2)        = 3 + 1     1011
  5       F(5)               = 5         00011
  6       F(5) + F(2)        = 5 + 1     10011
  7       F(5) + F(3)        = 5 + 2     01011
  8       F(6)               = 8         000011
  9       F(6) + F(2)        = 8 + 1     100011
  10      F(6) + F(3)        = 8 + 2     010011
  11      F(6) + F(4)        = 8 + 3     001011
  12      F(6) + F(4) + F(2) = 8 + 3 + 1 101011
Fibonacci Coding

Remarks

In many cases, there are more efficient alternatives to Fibonacci coding.
Fibonacci codes are still used for applications that need to be robust:
• The sequence 11 always and only occurs at the end of a code word.
• Thus, a corrupted bit affects at most two encoded numbers.

Summary

Overhead has to be considered for storing the coding scheme.
Canonical codes make the result of a coding scheme unique in a postprocessing step.
Huffman coding is optimal (under the right conditions).
We have seen several ways to encode infinite source alphabets:
• Parameter-free: unary, Elias gamma, and Fibonacci coding.
• Adaptation to the distribution: (exponential) Golomb codes.
• Robust under transmission errors: Fibonacci coding.

Outlook
For finite alphabets, can we still do better than Huffman coding?
References

D. Hankerson, G. A. Harris, P. D. Johnson Jr. Introduction to Information Theory and Data Compression. Second edition. Chapman & Hall/CRC, 2003.
(Proof for n-ary Huffman code optimality)

D. A. Huffman. A method for the construction of minimum-redundancy codes. Proceedings of the IRE, Vol. 40, No. 9 (1952): 1098-1101.
(Original paper by Huffman)

P. Elias. Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory 21 (2): 194-203, 1975.
(Introduction of Elias gamma codes)

S. W. Golomb. Run-length encodings. IEEE Transactions on Information Theory 12 (3): 399-401, 1966.
https://web.stanford.edu/class/ee398a/handouts/papers/Golomb%20-%20Run-Length%20Codes%20-%20IT66.pdf
(Introduction to Golomb codes, featuring Agent 00111)

K. Sayood. Introduction to Data Compression. Morgan Kaufmann, 2006.
(Golomb & Rice codes)

References (continued)

T. Strutz. Bilddatenkompression. Fourth edition. Vieweg+Teubner, 2009.
(Golomb, Rice, and exponential Golomb codes, in German)

D. A. Lelewer and D. S. Hirschberg. Data Compression.
http://www.ics.uci.edu/~dan/pubs/DC-Sec3.html#Sec_3.3
(Elias codes, Fibonacci codes)
