Module 5 - Info Theory and Compression Algo
Module Five
(Multimedia – Compression Algorithms)
Information Theory and
Compression Algorithms
Information Theory (Shannon Theory)
Entropy and Code Length
Run-Length Coding (RLC)
Run-length coding (RLC), also called run-length encoding, is a technique that is
not so widely used these days.
This is a "bitmap", because we've mapped the pixels onto the values of bits.
Using this method, the above image would be represented in the following way:
Bitmap row        Run lengths
100111101111001   1, 2, 4, 1, 4, 2, 1
011111000111110   0, 1, 5, 3, 5, 1
111110000011111   5, 5, 5
111100000001111   ….
111000000000111   ….
Can we represent the same image using fewer bits,
but still be able to reconstruct the original image?
Yes, we can. One of the many methods is called run length encoding.
Replace each row with numbers that say how many consecutive pixels
are the same colour, always starting with the number of white pixels
(here a 1 bit is a white pixel and a 0 bit is a black pixel).
For example, the first row in the image above contains one white, two black,
four white, one black, four white, two black, and one white pixel.
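The row-by-row rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `rlc_encode_row` is made up for this example, and it assumes that a 1 bit is a white pixel and a 0 bit is a black pixel, which is what the run lengths listed for the first rows imply.

```python
from itertools import groupby

WHITE, BLACK = "1", "0"  # assumption: a 1 bit is a white pixel


def rlc_encode_row(bits):
    """Return the run lengths of one row, always starting with the white run."""
    # groupby yields one group per run of identical consecutive characters.
    runs = [len(list(group)) for _, group in groupby(bits)]
    # A row that starts with a black pixel has a white run of length zero.
    if bits and bits[0] == BLACK:
        runs.insert(0, 0)
    return runs


print(rlc_encode_row("100111101111001"))  # [1, 2, 4, 1, 4, 2, 1]
print(rlc_encode_row("011111000111110"))  # [0, 1, 5, 3, 5, 1]
print(rlc_encode_row("111110000011111"))  # [5, 5, 5]
```

The leading zero in the second row is what keeps the "white first" convention unambiguous, so the decoder never needs to be told which colour a row starts with.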
Decompression of RLC
Exercise 1: Can you decompress the following code?
4, 11, 3
4, 9, 2, 1, 2
4, 9, 2, 1, 2
4, 11, 3
4, 9, 5
4, 9, 5
5, 7, 6
0, 17, 1
1, 15, 2
How many pixels were there in the original image?
How many numbers were used to represent those pixels?
How much space have we saved using this alternate representation, and how can
we measure it?
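One way to check an answer to the exercise is to let a small decoder expand the runs. The sketch below assumes the same convention as the encoding rule (runs alternate white, black, white, and so on, starting with white); the name `rlc_decode_row` and the choice to print white as `.` and black as `#` are only for illustration.

```python
def rlc_decode_row(runs):
    """Expand run lengths into bits; runs alternate white, black, white, ..."""
    bits = []
    for index, length in enumerate(runs):
        colour = "1" if index % 2 == 0 else "0"  # even positions are white runs
        bits.append(colour * length)
    return "".join(bits)


exercise = [
    [4, 11, 3],
    [4, 9, 2, 1, 2],
    [4, 9, 2, 1, 2],
    [4, 11, 3],
    [4, 9, 5],
    [4, 9, 5],
    [5, 7, 6],
    [0, 17, 1],
    [1, 15, 2],
]

rows = [rlc_decode_row(runs) for runs in exercise]
for row in rows:
    # Show white pixels as "." and black pixels as "#".
    print(row.replace("1", ".").replace("0", "#"))

pixel_count = sum(len(row) for row in rows)
number_count = sum(len(runs) for runs in exercise)
print(pixel_count, "pixels stored as", number_count, "numbers")
```

Comparing `pixel_count` with `number_count` is only a rough measure of the saving, because each number needs several bits of its own; a fairer comparison would fix how many bits are spent per run length.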
RLC Usage
Black and white scanned images are now used mainly on fax machines, which use
this approach to compression. One reason it works so well with scanned pages is
that runs of consecutive white pixels are very long. In fact, there will be
entire scanned lines that are nothing but white pixels. A typical fax page is
200 pixels across or more, so replacing 200 bits with one number is a big saving.
Variable Length Coding (VLC)
Huffman Coding
Properties of Huffman Coding
Fixed vs. Variable Length Coding
Exercise: Shannon vs. Huffman Coding
Shannon-Fano: 89 bits
Huffman: 87 bits
Fixed-length coding: 117 bits
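The symbol counts behind these totals are not shown here, so the sketch below assumes the counts A = 15, B = 7, C = 6, D = 6, E = 5 (39 symbols in all). That assumption is chosen only because it reproduces the 117-bit and 87-bit figures; the function name `huffman_code_lengths` is likewise made up for this example. It builds a Huffman tree with a priority queue and compares the total cost with a fixed-length code.

```python
import heapq
from math import ceil, log2

# Assumed symbol counts (not given in the exercise text).
counts = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}


def huffman_code_lengths(counts):
    """Return {symbol: code length} by repeatedly merging the two lightest nodes."""
    # Heap entries are (weight, tie_breaker, {symbol: depth so far}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Every symbol under the new parent moves one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


lengths = huffman_code_lengths(counts)
huffman_bits = sum(counts[sym] * lengths[sym] for sym in counts)
fixed_bits = sum(counts.values()) * ceil(log2(len(counts)))

print("Huffman:", huffman_bits, "bits")     # Huffman: 87 bits
print("Fixed length:", fixed_bits, "bits")  # Fixed length: 117 bits
```

With these counts the most frequent symbol gets a 1-bit code and the other four get 3-bit codes, which is where the saving over 3 bits per symbol comes from.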
Extended Huffman Coding
Adaptive Huffman Coding
Adaptive Huffman Coding
(Tree Updating)
Dictionary-based Coding
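• Lempel–Ziv–Welch (LZW) is a universal lossless data compression
algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch.
• LZW replaces strings of characters with single codes taken from a
dictionary that it builds as it reads the input.
• Because the codes take up less space than the strings they
replace, we get compression.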
• Two commonly used file formats in which LZW compression is
used are
• the GIF image format served from Web sites and
• the TIFF image format.
• Because LZW is lossless, you can open and save an LZW-compressed TIFF
file as many times as you like without degrading the image.
• If you try that with JPG, which is lossy, the image quality will
deteriorate more each time.
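The dictionary idea behind LZW can be sketched as follows. This is a simplified illustration rather than the variant used inside GIF or TIFF: the function names are hypothetical, the initial dictionary is a small alphabet passed in by the caller (codes start at 1), codes are kept as plain integers instead of being packed into bits, and every input character is assumed to be in that alphabet.

```python
def lzw_compress(text, alphabet):
    """Return the list of dictionary codes for text."""
    table = {ch: i for i, ch in enumerate(alphabet, start=1)}
    next_code = len(table) + 1
    codes = []
    s = ""
    for c in text:
        if s + c in table:
            s += c                     # keep extending the current match
        else:
            codes.append(table[s])     # emit the longest known string
            table[s + c] = next_code   # learn the new, longer string
            next_code += 1
            s = c
    if s:
        codes.append(table[s])
    return codes


def lzw_decompress(codes, alphabet):
    """Rebuild the text; the decoder reconstructs the same dictionary."""
    table = {i: ch for i, ch in enumerate(alphabet, start=1)}
    next_code = len(table) + 1
    if not codes:
        return ""
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The code refers to the entry that is about to be created.
            entry = prev + prev[0]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[next_code] = prev + entry[0]
        next_code += 1
        prev = entry
    return "".join(out)


codes = lzw_compress("ABABBABCABABBA", "ABC")
print(codes)                         # [1, 2, 4, 5, 2, 3, 4, 6, 1]
print(lzw_decompress(codes, "ABC"))  # ABABBABCABABBA
```

Fourteen characters become nine codes here, and the dictionary itself is never transmitted because the decoder rebuilds it from the codes it has already seen.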
Dictionary-based Coding
(LZW - Remarks)
Arithmetic Coding
• Arithmetic coding (AC) is a form of entropy encoding used in
lossless data compression.
Arithmetic Code for: CAEE$
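A minimal sketch of how the interval for CAEE$ could be narrowed symbol by symbol. The probability table is not reproduced in this text, so the ranges below are an assumption: a seven-symbol model (A to F plus the terminator $) chosen because it yields the interval low = 0.33184, high = 0.3322. The names `RANGES` and `encode_interval` are made up for this example, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction as F

# Assumed model: cumulative range [low, high) for each symbol.
RANGES = {
    "A": (F("0"), F("0.2")),
    "B": (F("0.2"), F("0.3")),
    "C": (F("0.3"), F("0.5")),
    "D": (F("0.5"), F("0.55")),
    "E": (F("0.55"), F("0.85")),
    "F": (F("0.85"), F("0.9")),
    "$": (F("0.9"), F("1")),
}


def encode_interval(message):
    """Shrink [0, 1) to the sub-interval that represents the whole message."""
    low, high = F(0), F(1)
    for symbol in message:
        width = high - low
        sym_low, sym_high = RANGES[symbol]
        # Compute high first so that both updates use the old value of low.
        high = low + width * sym_high
        low = low + width * sym_low
    return low, high


low, high = encode_interval("CAEE$")
print(float(low), float(high))  # 0.33184 0.3322
```

Any number inside the final interval identifies the message under this model, which is why the encoder only needs to send enough bits to land inside it.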
For the above example, low = 0.33184, high = 0.3322.
Assigning 1 to the 2nd bit makes a binary code 0.01 and value(0.01) =
0.25, which is less than high, so it is accepted. Since it is still true that
value(0.01) < low, the iteration continues.
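The bit-by-bit search described above can be sketched as a short loop. This is an illustration under stated assumptions: the function name `generate_codeword` is hypothetical, a trial bit is kept only when the resulting value stays strictly below high (treating the interval as half-open), and exact fractions are used so the comparisons are not affected by rounding.

```python
from fractions import Fraction


def generate_codeword(low, high):
    """Find a short binary fraction 0.b1b2... whose value lies in [low, high)."""
    code_value = Fraction(0)
    bits = []
    k = 1
    while code_value < low:
        # Tentatively set the k-th bit after the binary point to 1.
        candidate = code_value + Fraction(1, 2 ** k)
        if candidate < high:
            code_value = candidate   # accepted: the value is still below high
            bits.append("1")
        else:
            bits.append("0")         # rejected: the k-th bit stays 0
        k += 1
    return "".join(bits), code_value


bits, value = generate_codeword(Fraction("0.33184"), Fraction("0.3322"))
print(bits, float(value))  # 01010101 0.33203125
```

The first bit is rejected because 0.5 is not below high, the second is accepted as described in the text, and the loop stops after eight bits when the value 0.33203125 is no longer below low.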
Lossless Image Compression
Lossless JPEG
Homework