Image Compression: I. Fundamentals
Introduction
I. Fundamentals
1. A compression ratio is simply the size of the original data divided by the size of the
compressed data.
2. A technique that compresses a 1 megabyte image to 100 kilobytes has achieved a
compression ratio of
CR = Uncompressed size/Compressed size = 1024 KB/ 100 KB = 10.24
Space savings = 1 - Compressed size/Uncompressed size = 1 - 100/1024 ≈ 90.23%
3. There are two basic types of image compression: lossless compression and lossy
compression.
a. A lossless scheme encodes and decodes the data perfectly, and the resulting
image matches the original image exactly. There is no degradation in the
process: no data is lost.
b. Lossy compression schemes allow redundant and nonessential information to
be lost. Typically with lossy schemes there is a tradeoff between compression
and image quality. The goal of lossy compression is that the final
decompressed image be visually lossless.
4. Let n1 and n2 denote the numbers of information-carrying units in two data sets that
represent the same information. With CR = n1/n2, RD = 1 - 1/CR is the relative data
redundancy of the first data set.
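The three quantities defined above can be checked with a few lines of Python. This is only a minimal sketch; the function names are illustrative and do not come from the notes.

```python
def compression_ratio(uncompressed_size, compressed_size):
    """CR = uncompressed size / compressed size."""
    return uncompressed_size / compressed_size


def space_savings(uncompressed_size, compressed_size):
    """Space savings = 1 - compressed size / uncompressed size."""
    return 1 - compressed_size / uncompressed_size


def relative_redundancy(cr):
    """RD = 1 - 1/CR."""
    return 1 - 1 / cr


if __name__ == "__main__":
    cr = compression_ratio(1024, 100)          # sizes in KB
    print(cr)                                  # compression ratio
    print(100 * space_savings(1024, 100))      # savings in percent
    print(relative_redundancy(cr))             # same value as the savings
```

Note that RD and the space savings are the same number when n1 and n2 are taken to be the uncompressed and compressed sizes.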
Since it takes three bytes to encode a run of data, it makes sense to encode
only runs of 3 or longer (otherwise, you are expanding your data).
When the special character itself is found in the source data, it must be
encoded as a run of length 1.
The MacPaint image file format uses run length encoding, combining the
prefix character with the count byte.
The most significant (highest) bit of the prefix byte determines whether the
following bytes are repeating data or unique data.
If the bit is set (=1), the byte, read as a two's complement number n, is
negative, and the next data byte is repeated 1 - n times.
If the bit is not set (=0), that byte plus one is the number of
following bytes that are unique and can be copied verbatim to
the output.
For example, if the count byte is 172 (-84 in two's complement), the next byte is
repeated 85 times. If the count byte is 45, the next 46 bytes are unique.
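The PCX file format's run length scheme sets the two most significant (highest) bits of a
byte if it starts a run. This leaves six bits to represent the length of a run, limiting the
count to 63. For example, the compressed string 165, 211, 145, 153, 193, 234
has the original string 165, 145, 145, ..., 145 (19 times), 153, 234: here 211 = 192 + 19
announces a run of 19 copies of 145, and 193 = 192 + 1 announces a single 234.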
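The PCX rule above can be sketched in Python as follows. This is a minimal illustration that works on plain lists of byte values, not a reader for real PCX files; the function names are made up for this sketch. The encoder also wraps any single byte of 192 or more in a run of length 1, because its two top bits would otherwise be mistaken for a run marker.

```python
def pcx_rle_decode(data):
    """Decode a list of PCX run-length coded byte values."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if (b & 0xC0) == 0xC0:          # two highest bits set: this is a run
            count = b & 0x3F            # low six bits hold the run length
            out.extend([data[i + 1]] * count)
            i += 2
        else:                           # plain byte, copied as is
            out.append(b)
            i += 1
    return out


def pcx_rle_encode(data):
    """Encode a list of byte values with the same scheme."""
    out = []
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 63:
            run += 1
        if run > 1 or data[i] >= 0xC0:  # runs, and bytes that look like markers
            out += [0xC0 | run, data[i]]
        else:
            out.append(data[i])
        i += run
    return out


if __name__ == "__main__":
    print(pcx_rle_decode([165, 211, 145, 153, 193, 234]))
```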
Other image file formats that use run length encoding are RLE and GEM.
The TIFF and TGA file format specifications allow for optional run length
encoding of the image data.
Run length encoding works very well for images with solid backgrounds like
cartoons. For natural images, it doesn't work as well.
More than 96% of this file consists of only 31 characters: the lower case letters, the
space, the comma, the carriage return, and the period.
If we use only 5 bits for each of these characters, for example, 00000 = a,
00001 = b, ..., and an arbitrary longer code for the other characters (for instance, 8 bits),
the file shrinks to about 5/8 of its original size.
Huffman encoding takes this idea to the extreme: it assigns frequently used characters
fewer bits, and seldom used characters more bits.
Example:
Letter   Probability   Huffman code
A        0.318         00
B        0.227         11
C        0.149         010
D        0.130         011
E        0.122         100
F        0.031         1010
G        0.023         1011
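[Figure: the Huffman codes of a letter sequence (E B A A A D D ...) are concatenated into
one bit stream, which is then cut into bytes: byte 1 = 10100110, byte 2 = 00010110,
byte 3 = 0011011..., byte 4 = ...]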
The first step in creating Huffman codes is to create an array of character frequencies.
The algorithm is as follows:
1. Input: all characters as free nodes.
2. Repeat
2.1. The two free nodes with the lowest frequency are assigned to
a parent node with a weight equal to the sum of the two free child
nodes.
2.2. The two child nodes are removed from the free nodes list.
The newly created parent node is added to the list as a free node.
Until there is only one free node left.
3. Output: the remaining free node, which is the root of the tree.
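The steps above can be sketched in Python with a priority queue standing in for the free nodes list. This is a minimal sketch, not the notes' own implementation: the function name and the nested-tuple tree representation are choices made here. Which child gets bit 0 and which gets bit 1 is arbitrary, so the bit patterns may differ from a hand-built table while the code lengths stay the same.

```python
import heapq
from itertools import count


def huffman_codes(frequencies):
    """Build Huffman codes from a {symbol: frequency} dictionary."""
    tie_breaker = count()   # keeps heap entries comparable when weights tie
    free_nodes = [(w, next(tie_breaker), sym) for sym, w in frequencies.items()]
    heapq.heapify(free_nodes)

    # Repeat: join the two lightest free nodes under a new parent node.
    while len(free_nodes) > 1:
        w1, _, node1 = heapq.heappop(free_nodes)
        w2, _, node2 = heapq.heappop(free_nodes)
        heapq.heappush(free_nodes, (w1 + w2, next(tie_breaker), (node1, node2)))

    root = free_nodes[0][2]
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):        # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                              # leaf: a symbol
            codes[node] = prefix or "0"

    walk(root, "")
    return codes


if __name__ == "__main__":
    probabilities = {"A": 0.318, "B": 0.227, "C": 0.149, "D": 0.130,
                     "E": 0.122, "F": 0.031, "G": 0.023}
    for symbol, code in sorted(huffman_codes(probabilities).items()):
        print(symbol, code)
```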
An example:
Input:  A 0.318, B 0.227, C 0.149, D 0.130, E 0.122, F 0.031, G 0.023
Step 1: F (0.031) and G (0.023) are joined under a parent node of weight 0.054.
Step 2: E (0.122) and the node 0.054 are joined under a parent node of weight 0.176.
Step 3: C (0.149) and D (0.130) are joined under a parent node of weight 0.279.
Step 4: B (0.227) and the node 0.176 are joined under a parent node of weight 0.403.
Step 5: A (0.318) and the node 0.279 are joined under a parent node of weight 0.597.
Finally, the nodes 0.597 and 0.403 are joined under the root node of weight 1.000.
The resulting tree:
1.000
    0.597
        A 0.318
        0.279
            C 0.149
            D 0.130
    0.403
        0.176
            E 0.122
            0.054
                F 0.031
                G 0.023
        B 0.227
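[Figure: block diagram of the DCT-based codec.
Encoder: Uncompressed Image -> DCT -> Quantizer -> Entropy Encoder -> Compressed Image Data
Decoder: Compressed Image Data -> Entropy Decoder -> Dequantizer -> Inverse DCT -> Uncompressed Image]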
Example 8x8 image block:
50  55  61  60  70  61  60  71
63  59  55  90  89  85  69  72
62  59  68 113 114  64  66  73
63  58  71 102  54 106  70  69
61  61  68 100  76  88  68  70
79  65  60  70  77  68  58  75
82  71  64  59  55  61  65  80
81  79  69  68  65  76  78  90
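The 8x8 DCT coefficients are
G(u,v) = \frac{1}{4}\,\alpha(u)\,\alpha(v) \sum_{x=0}^{7} \sum_{y=0}^{7} g(x,y) \cos\left[\frac{(2x+1)u\pi}{16}\right] \cos\left[\frac{(2y+1)v\pi}{16}\right]
where
u = 0, 1, ..., 7; v = 0, 1, ..., 7; g(x,y) is the sample at position (x,y) of the block,
and
\alpha(k) = 1/\sqrt{2} for k = 0, \alpha(k) = 1 otherwise.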
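As an illustration, here is a minimal Python sketch of the standard 8x8 forward DCT (DCT-II) and its inverse. It assumes the usual JPEG normalization, a factor 1/4 together with alpha(0) = 1/sqrt(2) and alpha(k) = 1 otherwise, under which the DC coefficient is the sum of the 64 samples divided by 8. The function names are illustrative only, and the direct four-fold loop is chosen for clarity rather than speed.

```python
import math

N = 8  # block size


def _alpha(k):
    """Normalization factor: 1/sqrt(2) for k = 0, otherwise 1."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0


def _basis(n, k):
    """Cosine basis value cos((2n + 1) k pi / 16)."""
    return math.cos((2 * n + 1) * k * math.pi / (2 * N))


def dct_8x8(block):
    """Forward DCT: block[x][y] -> coefficients G[u][v]."""
    return [[0.25 * _alpha(u) * _alpha(v) *
             sum(block[x][y] * _basis(x, u) * _basis(y, v)
                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct_8x8(coeffs):
    """Inverse DCT: coefficients G[u][v] -> samples g[x][y]."""
    return [[0.25 * sum(_alpha(u) * _alpha(v) * coeffs[u][v] *
                        _basis(x, u) * _basis(y, v)
                        for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


if __name__ == "__main__":
    flat = [[10] * N for _ in range(N)]
    print(dct_8x8(flat)[0][0])   # DC term of a constant block
```

For a constant block every coefficient except the DC term vanishes, which is why smooth image regions compress so well after the DCT.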
These are the original image and the images reconstructed by the inverse DCT using only
1/4 of the DCT coefficients (u and v run from 0 to N/2) and only 1/9 of the DCT coefficients.
3) Quantization:
A typical quantization matrix Q, as specified in the original JPEG Standard, is

16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
14  17  22  29  51  87  80  62
18  22  37  56  68 109 103  77
24  35  55  64  81 104 113  92
49  64  78  87 103 121 120 101
72  92  95  98 112 100 103  99

Each DCT coefficient G(x,y) is divided by the corresponding entry Q(x,y) and rounded to
the nearest integer:

B(x,y) = \mathrm{round}\left(\frac{G(x,y)}{Q(x,y)}\right), for x, y = 0, 1, ..., 7

[Figure: DCT coefficient matrix G and quantized matrix B of the example block.]
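The quantization rule B = round(G / Q) and its inverse in the decoder can be sketched as below. The coefficient values used in the demonstration are hypothetical, chosen only so the arithmetic is easy to follow; they are not the DCT coefficients of the example block. Rounding half away from zero is an assumption of this sketch, since the notes only say "round".

```python
import math

# Luminance quantization matrix from the JPEG standard.
Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def _round_half_away(value):
    """Round to the nearest integer, halves away from zero (assumed rule)."""
    return int(math.copysign(math.floor(abs(value) + 0.5), value))


def quantize(G, Q=Q):
    """B(x, y) = round(G(x, y) / Q(x, y))."""
    return [[_round_half_away(G[x][y] / Q[x][y]) for y in range(8)]
            for x in range(8)]


def dequantize(B, Q=Q):
    """Decoder side: approximate G(x, y) by B(x, y) * Q(x, y)."""
    return [[B[x][y] * Q[x][y] for y in range(8)] for x in range(8)]


if __name__ == "__main__":
    G = [[0.0] * 8 for _ in range(8)]
    G[0][0], G[0][1], G[7][7] = -448.0, -20.0, 30.0   # hypothetical values
    B = quantize(G)
    print(B[0][0], B[0][1], B[7][7])
```

Small high-frequency coefficients such as the hypothetical G(7,7) = 30 fall below half a quantization step and become zero; this is where the information loss of the scheme comes from.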
4)
as
5) Encode
5.1. The Zero Run Length Coding (RLC)
Consider the vector of 63 AC coefficients (the 64-element vector without the first
coefficient). Say that we have -2, -1, -1, -1, -3, 0, -3, 0, -2, 0, 0, 2, 0, 1, 0, 0, 0, 1, 0, -1,
0, 0, 0, and only zeros after that. Here is how the RLC JPEG compression is done for this example:
(0,-2), (0,-1), (0,-1), (0,-1), (0,-3), (1,-3), (1,-2), (2, 2), (1,1), (3,1), (1,-1), EOB
Each pair is (number of preceding zeros, nonzero value). EOB is actually equivalent to (0,0),
and it will later be Huffman coded as (0,0). So we encode:
(0,-2), (0,-1), (0,-1), (0,-1), (0,-3), (1,-3), (1,-2), (2, 2), (1,1), (3,1), (1,-1), (0,0)
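A minimal Python sketch of this zero run-length step is given below. The function name is illustrative. The sketch also applies JPEG's 4-bit limit on the run length by emitting the special pair (15,0) for each block of 16 zeros, and it emits the EOB pair (0,0) only when the vector ends in zeros.

```python
def zero_run_length(ac):
    """Turn a list of AC coefficients into (zero run, value) pairs."""
    pairs = []
    run = 0
    for value in ac:
        if value == 0:
            run += 1
            continue
        while run > 15:            # the run must fit in 4 bits
            pairs.append((15, 0))  # stands for 16 consecutive zeros
            run -= 16
        pairs.append((run, value))
        run = 0
    if run > 0:                    # trailing zeros: end-of-block marker
        pairs.append((0, 0))
    return pairs


if __name__ == "__main__":
    ac = [-2, -1, -1, -1, -3, 0, -3, 0, -2, 0, 0, 2,
          0, 1, 0, 0, 0, 1, 0, -1] + [0] * 43
    print(zero_run_length(ac))
```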
Note that if the quantized vector does not end with zeros (its last element is not 0),
there is no EOB marker. Now suppose that somewhere in the quantized vector we have:
7, nineteen zeros, 3, 0, 0, 0, 0, 0, 2, thirty-four zeros, 5, EOB
The JPG Huffman coding imposes the restriction (you'll see later why) that the number of
preceding 0s is coded as a 4-bit value, so it cannot exceed 15 (0xF). So, the
previous example would be coded as:
(0,7), (15,0), (3,3), (5,2), (15,0), (15,0), (2,5), (0,0)
where (15,0) is a special value that stands for 16 consecutive zeros.
Values                                Category   Bits for the value
-1, 1                                 1          0, 1
-3,-2, 2,3                            2          00,01, 10,11
-7,-6,-5,-4, 4,5,6,7                  3          000,001,010,011, 100,101,110,111
-15,..,-8, 8,..,15                    4          0000,....,0111, 1000,....,1111
-31,..,-16, 16,..,31                  5          00000,....,01111, 10000,....,11111
-63,..,-32, 32,..,63                  6          ...
-127,..,-64, 64,..,127                7          ...
-255,..,-128, 128,..,255              8          ...
-511,..,-256, 256,..,511              9          ...
-1023,..,-512, 512,..,1023            10         ...
-2047,..,-1024, 1024,..,2047          11         ...
-4095,..,-2048, 2048,..,4095          12         ...
-8191,..,-4096, 4096,..,8191          13         ...
-16383,..,-8192, 8192,..,16383        14         ...
-32767,..,-16384, 16384,..,32767      15         ...
In our example, the values are coded as:
Value   Category   Bit-coded
-2      2          01
-1      1          0
-3      2          00
 2      2          10
 1      1          1
(0,-2), (0,-1), (0,-1), (0,-1), (0,-3), (1,-3), (1,-2), (2, 2), (1,1), (3,1), (1,-1), (0,0)
=>
(0,2)01, (0,1)0, (0,1)0, (0,1)0, (0,2)00, (1,2)00, (1,2)01, (2,2)10, (1,1)1, (3,1)1,
(1,1)0, (0,0)
The pair of values enclosed in parentheses can be represented in one byte. In this
byte, the high nibble is the number of preceding 0s, and the low nibble is the
category of the new nonzero value.
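The category, the bit-coded representation and the combined byte can be computed as in the following sketch. The function names are illustrative. The sketch relies on the observation that the category of a value is the number of bits of its absolute value, and that a negative value is coded as value + 2^category - 1.

```python
def category(value):
    """Number of bits needed for |value| (0 for the value 0)."""
    return abs(value).bit_length()


def bit_code(value):
    """Bit-coded representation of value within its category."""
    cat = category(value)
    if cat == 0:
        return ""
    if value < 0:
        value += (1 << cat) - 1    # negative values map to the lower half
    return format(value, "0{}b".format(cat))


def symbol_byte(run, value):
    """High nibble: number of preceding zeros; low nibble: category."""
    return (run << 4) | category(value)


if __name__ == "__main__":
    for v in (-2, -1, -3, 2, 1):
        print(v, category(v), bit_code(v))
    print(hex(symbol_byte(1, -3)))
```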
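Say that these bytes have the following Huffman codes:
(Zeros, Category)   Huffman code
(0, 2)              01
(0, 1)              00
(1, 2)              111001
(2, 2)              11111000
(1, 1)              1100
(3, 1)              111010
(0, 0)              1010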
The FINAL step of the encoding consists of Huffman encoding this byte, and then writing
to the JPG file, as a stream of bits, the Huffman code of this byte, followed by the
remaining bit representation of the value. The final stream of bits written to the JPG
file on disk for the previous example is:
(01)01 (00)0 (00)0 (00)0 (01)00 (111001)00 (111001)01 (11111000)10 (1100)1
(111010)1 (1100)0 (1010)
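The final step can be reproduced with the short sketch below. The Huffman codes in the table are the ones used in this worked example and should not be taken as the default tables of the JPEG standard; the names AC_HUFFMAN and encode_ac are made up for the sketch.

```python
# Huffman codes for the (zeros, category) bytes, as used in this example only.
AC_HUFFMAN = {
    (0, 2): "01",
    (0, 1): "00",
    (1, 2): "111001",
    (2, 2): "11111000",
    (1, 1): "1100",
    (3, 1): "111010",
    (0, 0): "1010",      # EOB
}


def encode_ac(pairs):
    """Concatenate Huffman code + bit-coded value for each (run, value) pair."""
    bits = []
    for run, value in pairs:
        cat = abs(value).bit_length()
        bits.append(AC_HUFFMAN[(run, cat)])
        if cat:                                   # EOB carries no value bits
            coded = value if value > 0 else value + (1 << cat) - 1
            bits.append(format(coded, "0{}b".format(cat)))
    return "".join(bits)


if __name__ == "__main__":
    pairs = [(0, -2), (0, -1), (0, -1), (0, -1), (0, -3), (1, -3),
             (1, -2), (2, 2), (1, 1), (3, 1), (1, -1), (0, 0)]
    print(encode_ac(pairs))
```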
5.3. The encoding of the DC coefficient
DC is the coefficient in the quantized vector corresponding to the lowest frequency in the
image (the 0 frequency). Before quantization it equals (the sum of the
8x8 image samples) / 8.
The authors of the JPEG standard noticed that there's a very close connection between the
DC coefficient of consecutive blocks, so they've decided to encode in the JPG file the
difference between the DCs of consecutive 8x8 blocks:
Diff = DC(i) - DC(i-1)
In JPG decoding you start from 0: you take the initial value to be
DC(0) = 0
Diff = (category, bit-coded representation). For example, if Diff is equal to -511, then
Diff corresponds to (9, 000000000). Say that 9 has the Huffman code 1111110. (In the
JPG file, there are 2 Huffman tables for an image component: one for DC and one for
AC.) In the JPG file, the bits corresponding to the DC coefficient will be:
1111110 000000000
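The differential coding of the DC coefficients can be sketched as follows. The sketch assumes that the value preceding the first block is taken to be 0; the DC values in the demonstration and the one-entry Huffman table {9: "1111110"} are illustrative only.

```python
def dc_differences(dc_values):
    """Diff(i) = DC(i) - DC(i-1), with the value before the first block taken as 0."""
    previous = 0
    diffs = []
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs


def dc_from_differences(diffs):
    """Decoder side: rebuild the DC values by accumulating the differences."""
    previous = 0
    dc_values = []
    for diff in diffs:
        previous += diff
        dc_values.append(previous)
    return dc_values


def encode_diff(diff, dc_huffman):
    """Huffman code of the category, followed by the bit-coded difference."""
    cat = abs(diff).bit_length()
    if cat == 0:
        return dc_huffman[0]
    coded = diff if diff > 0 else diff + (1 << cat) - 1
    return dc_huffman[cat] + format(coded, "0{}b".format(cat))


if __name__ == "__main__":
    print(dc_differences([-28, -26, -30]))          # hypothetical DC values
    print(encode_diff(-511, {9: "1111110"}))
```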
And, applied to this example of DC and to the previous example of ACs, for this vector
with 64 coefficients, THE FINAL STREAM OF BITS written in the JPG file will be:
1111110 000000000 (01)01 (00)0 (00)0 (00)0 (01)00 (111001)00 (111001)01
(11111000)10 (1100)1 (111010)1 (1100)0 (1010)
(In the JPG file, the DC coefficient is encoded first, then the ACs.)
6) Decoder process:
The decoder starts from the quantized vector
{-28, -2, -1, -1, -1, -3, 0, -3, 0, -2, 0, 0, 2, 0, 1, 0, 0, 0, 1, 0, -1, EOB}
[Figure: dequantized coefficient matrix.]

Decompressed block:
54  51  56  66  69  63  62  67
60  61  72  87  90  80  72  73
55  61  77  96  98  83  70  66
55  60  74  92  93  79  67  64
68  67  74  85  86  77  72  75
74  66  65  70  71  67  70  78
76  66  61  63  64  63  69  79
88  77  72  74  74  73  79  89

Original block:
50  55  61  60  70  61  60  71
63  59  55  90  89  85  69  72
62  59  68 113 114  64  66  73
63  58  71 102  54 106  70  69
61  61  68 100  76  88  68  70
79  65  60  70  77  68  58  75
82  71  64  59  55  61  65  80
81  79  69  68  65  76  78  90

Error e(x,y) = original - decompressed:
-4   4   5  -6   1  -2  -2   4
 3  -2 -17   3  -1   5  -3  -1
 7  -2  -9  17  16 -19  -4   7
 8  -2  -3  10 -39  27   3   5
-7  -6  -6  15 -10  11  -4  -5
 5  -1  -5   0   6   1 -12  -3
 6   5   3  -4  -9  -2  -4   1
-7   2  -3  -6  -9   3  -1   1

The average absolute error is
\frac{1}{64} \sum_{x=0}^{7} \sum_{y=0}^{7} |e(x,y)| \approx 6.3
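The error between the original and the decompressed block can be recomputed with a few lines of Python. The sketch below takes the error as original minus decompressed and uses the mean absolute error as the summary measure; that choice of measure is an assumption made here, one plausible reading of the error figure in these notes.

```python
ORIGINAL = [
    [50, 55, 61, 60, 70, 61, 60, 71],
    [63, 59, 55, 90, 89, 85, 69, 72],
    [62, 59, 68, 113, 114, 64, 66, 73],
    [63, 58, 71, 102, 54, 106, 70, 69],
    [61, 61, 68, 100, 76, 88, 68, 70],
    [79, 65, 60, 70, 77, 68, 58, 75],
    [82, 71, 64, 59, 55, 61, 65, 80],
    [81, 79, 69, 68, 65, 76, 78, 90],
]

DECOMPRESSED = [
    [54, 51, 56, 66, 69, 63, 62, 67],
    [60, 61, 72, 87, 90, 80, 72, 73],
    [55, 61, 77, 96, 98, 83, 70, 66],
    [55, 60, 74, 92, 93, 79, 67, 64],
    [68, 67, 74, 85, 86, 77, 72, 75],
    [74, 66, 65, 70, 71, 67, 70, 78],
    [76, 66, 61, 63, 64, 63, 69, 79],
    [88, 77, 72, 74, 74, 73, 79, 89],
]


def error_block(original, decompressed):
    """e(x, y) = original(x, y) - decompressed(x, y)."""
    return [[o - d for o, d in zip(row_o, row_d)]
            for row_o, row_d in zip(original, decompressed)]


def mean_absolute_error(error):
    """Average of |e(x, y)| over all samples of the block."""
    values = [abs(v) for row in error for v in row]
    return sum(values) / len(values)


if __name__ == "__main__":
    e = error_block(ORIGINAL, DECOMPRESSED)
    for row in e:
        print(row)
    print(mean_absolute_error(e))
```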