Data Compression
Contents
1 Data Compression
1/8/23
File Compression
v Reasons for file compression:
§ Less storage
Lossless Compression Methods
Run-length Encoding
v Simplest method of compression.
  0   0   0   0   0   0   0   0
  0   0 255   0 255   0   0   0
  0   0   0   0   0 255   0   0
  0   0   0   0   0   0   0   0
  0   0 255   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0 255   0   0
  0   0   0   0   0   0   0 255
v Run-length encoding algorithm:
§ Read through the pixels, copying pixel values to the file in
sequence, except when the same pixel value occurs more
than once in succession.
§ When the same value occurs more than once in
succession, substitute the following three bytes:
• a special run-length code indicator (e.g. 0xFF)
• the pixel value that is repeated
• the number of times that value is repeated
v Example:
22 23 24 24 24 24 24 24 24 25 26 26 26 26 26 26 25 24
RL-coded stream: 22 23 ff 24 07 25 ff 26 06 25 24
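The rule above can be sketched in Python as follows. This is a minimal illustration, not a standard file format: the names rle_encode and rle_decode are hypothetical, run counts are assumed to fit in one byte (at most 255), and a literal pixel equal to the indicator 0xFF is assumed to be written as a run of its own so that the decoder cannot confuse it with the indicator.

```python
INDICATOR = 0xFF  # run-length code indicator, as in the example above


def rle_encode(pixels):
    """Replace every run of two or more equal values by three bytes."""
    out = []
    i = 0
    while i < len(pixels):
        value = pixels[i]
        run = 1
        # Measure the run; the count must fit in one byte (assumption).
        while i + run < len(pixels) and pixels[i + run] == value and run < 255:
            run += 1
        if run >= 2 or value == INDICATOR:
            # Assumption: a literal 0xFF is always coded as a run,
            # otherwise it would be mistaken for the indicator.
            out += [INDICATOR, value, run]
        else:
            out.append(value)
        i += run
    return out


def rle_decode(stream):
    """Invert rle_encode."""
    out = []
    i = 0
    while i < len(stream):
        if stream[i] == INDICATOR:
            value, run = stream[i + 1], stream[i + 2]
            out += [value] * run
            i += 3
        else:
            out.append(stream[i])
            i += 1
    return out


example = [22, 23, 24, 24, 24, 24, 24, 24, 24, 25,
           26, 26, 26, 26, 26, 26, 25, 24]
encoded = rle_encode(example)
# encoded reproduces the RL-coded stream 22 23 ff 24 07 25 ff 26 06 25 24
```

Decoding the encoded list with rle_decode gives back the original 18 values, which is what makes the method lossless.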
v Run-length encoding algorithm:
§ It is an example of redundancy reduction.
§ Drawbacks:
• It does not guarantee any particular amount of space
savings.
• Under some circumstances, the compressed image is
larger than the original image.
• Why? Can you prevent this?
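One way to explore the question is to count output bytes. The sketch below is only an illustration: rle_size is a hypothetical helper, and min_run (the shortest run that gets the three-byte code) is a parameter introduced here, not part of the algorithm as stated. With min_run=2 a run of two pixels costs three bytes, so an image made of short runs grows; coding only runs of three or more avoids that case. Pixels equal to the indicator still cost three bytes each in this sketch.

```python
INDICATOR = 0xFF  # run-length code indicator (assumed, as in the example)


def rle_size(pixels, min_run):
    """Number of bytes produced when only runs >= min_run are coded."""
    size = 0
    i = 0
    while i < len(pixels):
        run = 1
        while i + run < len(pixels) and pixels[i + run] == pixels[i] and run < 255:
            run += 1
        if run >= min_run or pixels[i] == INDICATOR:
            size += 3      # indicator, value, count
        else:
            size += run    # short runs are copied unchanged
        i += run
    return size


pairs = [0, 0, 1, 1] * 4            # 16 pixels, every run has length 2
size_coding_pairs = rle_size(pairs, min_run=2)   # each pair becomes 3 bytes
size_skipping_pairs = rle_size(pairs, min_run=3) # pairs are copied as-is
```

For this input the first variant produces 24 bytes from 16, while the second leaves the size at 16.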
Huffman Coding
v Assign fewer bits to symbols that occur more
frequently and more bits to symbols that appear less
often.
v Algorithm:
1. Make a leaf node for each code symbol
• Add the generation probability of each symbol to the leaf
node
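Only the first step of the algorithm is listed above. The sketch below assumes the remaining steps follow the standard Huffman construction: repeatedly merge the two nodes with the lowest probability into a new node whose probability is their sum, then read each code off the path from the root (0 for one branch, 1 for the other). The function name huffman_codes and the sample probabilities are illustrative, and symbols are assumed not to be tuples because tuples represent internal nodes here.

```python
import heapq
from itertools import count


def huffman_codes(probabilities):
    """Map each symbol to a bit string using the standard Huffman merge."""
    if not probabilities:
        return {}
    tie = count()  # tie-breaker so equal probabilities never compare nodes
    # Step 1: one leaf per symbol, tagged with its probability.
    heap = [(p, next(tie), symbol) for symbol, p in probabilities.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    # Assumed remaining steps: merge the two least probable nodes until
    # a single root is left.
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tie), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: a symbol
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


codes = huffman_codes({"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125})
```

With these probabilities the most frequent symbol A receives a 1-bit code and the two rarest symbols receive 3-bit codes, and no code is a prefix of another.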
v Example:
[Figure: Huffman coding example]
v Encoding
v Decoding
Lempel-Ziv Codes
v There are several variations of Lempel-Ziv Codes.
v Let us look at an example for an alphabet having only two
letters:
aaababbbaaabaaaaaaabaabb
v Rule
§ Separate this stream of characters into pieces of text
so that each piece is the shortest string of characters
that we have not seen yet.
a | a a | b | a b | b b | a a a | b a | a a a a | a a b | a a b b
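The rule can be sketched as a short Python function. The name lz_parse is hypothetical, and the handling of a leftover piece at the end of the input (one that has already been seen) is an assumption, since the example above ends exactly on a new piece.

```python
def lz_parse(text):
    """Split text into the shortest pieces that have not been seen yet."""
    seen = set()
    pieces = []
    current = ""
    for ch in text:
        current += ch
        if current not in seen:
            # Shortest unseen string found: record it and start a new piece.
            seen.add(current)
            pieces.append(current)
            current = ""
    if current:
        # Assumption: a trailing piece that was already seen is kept as-is.
        pieces.append(current)
    return pieces


pieces = lz_parse("aaababbbaaabaaaaaaabaabb")
print(" | ".join(pieces))
```

For the stream above this yields the same ten pieces, from a up to aabb.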
1. We see “a”
2. “a” has been seen, we now see “aa”
3. We see “b”
4. “a” has been seen, we now see “ab”
5. “b” has been seen, we now see “bb”
6. “aa” has been seen, we now see “aaa”
7. “b” has been seen, we now see “ba”
8. “aaa” has been seen, we now see “aaaa”
9. “aa” has been seen, we now see “aab”
10. “aab” has been seen, we now see “aabb”
v Index:
1   2    3   4    5    6     7    8      9     10
a | aa | b | ab | bb | aaa | ba | aaaa | aab | aabb
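v Encoding:
§ Each piece is coded as the index of the earlier piece that forms its
prefix (0 if there is none), followed by its last character.
1    2    3    4    5    6     7    8      9     10
a  | aa | b  | ab | bb | aaa | ba | aaaa | aab | aabb
0a | 1a | 0b | 1b | 3b | 2a  | 3a | 6a   | 2b  | 9b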
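The index-plus-character codes can be produced with a small dictionary-based encoder. The sketch below is one possible implementation of this scheme (commonly known as LZ78); the function name lz78_encode is illustrative, and emitting an empty character for an already-seen piece at the very end of the input is an assumption.

```python
def lz78_encode(text):
    """Return (prefix index, last character) pairs; index 0 means no prefix."""
    dictionary = {}   # piece -> index, starting at 1
    codes = []
    prefix = ""
    for ch in text:
        if prefix + ch in dictionary:
            prefix += ch              # still a known piece, keep extending
        else:
            codes.append((dictionary.get(prefix, 0), ch))
            dictionary[prefix + ch] = len(dictionary) + 1
            prefix = ""
    if prefix:
        # Assumption: a trailing known piece is sent as its index alone.
        codes.append((dictionary[prefix], ""))
    return codes


def show(codes):
    return "|".join(f"{index}{ch}" for index, ch in codes)


encoded = show(lz78_encode("aaababbbaaabaaaaaaabaabb"))
exercise = show(lz78_encode("aaabbcbcdddeab"))
```

Here encoded is the string 0a|1a|0b|1b|3b|2a|3a|6a|2b|9b, and the same function can be used to check the answer to an exercise.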
v Encoding tree:
§ A tree can be built when encoding.
[Figure: encoding tree for the ten indexed pieces a | aa | b | ab | bb | aaa | ba | aaaa | aab | aabb]
v Exercise No. 1:
“aaabbcbcdddeab”
1 2 3 4 5 6 7 8
a|aa|b|bc|bcd|d|de|ab
0a|1a|0b|3c|4d|0d|6e|1b
v Basic idea:
§ Create a dictionary (a table) of strings used during
communication.
v Algorithm:
1. Extract the smallest substring that cannot be found in the
dictionary from the remaining uncompressed string.
2. Store that substring in the dictionary as a new entry and assign
it an index value.
3. The substring, except for its last character, is replaced with the
index found in the dictionary.
4. Insert the index and the last character of the substring into the
compressed string.
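Decompression is not spelled out above, but it can rebuild the same dictionary from the compressed string alone, which is why the dictionary never has to be transmitted. The sketch below assumes the compressed string is a list of (index, character) pairs with index 0 meaning "no prefix"; the names lz78_decode and parse_codes are hypothetical.

```python
def parse_codes(text):
    """Turn a string such as '0a|1a|0b' into (index, character) pairs."""
    return [(int(item[:-1]), item[-1]) for item in text.split("|")]


def lz78_decode(codes):
    """Rebuild the original string from (index, character) pairs."""
    dictionary = [""]          # entry 0 is the empty prefix
    out = []
    for index, ch in codes:
        piece = dictionary[index] + ch   # known prefix plus the new character
        dictionary.append(piece)         # the decoder grows the same dictionary
        out.append(piece)
    return "".join(out)


restored = lz78_decode(parse_codes("0a|1a|0b|1b|3b|2a|3a|6a|2b|9b"))
```

The value of restored is the 24-character stream aaababbbaaabaaaaaaabaabb, so no information is lost.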
Lossy Compression Methods
v Several methods:
§ JPEG: compresses pictures and graphics
§ MPEG: compresses video
§ MP3: compresses audio
JPEG Encoding
v Used to compress pictures and graphics.
v In JPEG, a grayscale picture is divided into 8x8 pixel blocks
to decrease the number of calculations.
v Basic idea:
1. Change the picture into a linear (vector) set of numbers that
reveals the redundancies.
2. The redundancies are then removed by one of the lossless compression
methods.
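JPEG carries out step 1 with the discrete cosine transform (DCT), which the text above does not name. The sketch below is a direct, unoptimized 2-D DCT for one block, written only to show how redundancy is revealed; the function name dct_2d and the uniform test block are illustrative, and real encoders use much faster formulations.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT (type II) of a square block of pixel values."""
    n = len(block)

    def scale(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    table = []
    for u in range(n):
        row = []
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            row.append(scale(u) * scale(v) * total)
        table.append(row)
    return table


# A uniform gray block: all 64 pixels have the value 20.
uniform = [[20] * 8 for _ in range(8)]
T = dct_2d(uniform)
```

For the uniform block only T[0][0] is non-zero (160, up to rounding error); the other 63 entries are essentially zero, which is the redundancy that the later steps remove.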
§ After the T table is created, the values are quantized to reduce the number
of bits needed for encoding.
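A minimal sketch of quantization follows, assuming every value is divided by one constant and the fraction is dropped. The constant 16, the function names, and the sample values are illustrative only; the JPEG standard uses a table with a separate divisor for each position. This is the step where information is lost, because the dropped fractions cannot be recovered.

```python
def quantize(table, step=16):
    """Divide each value by a constant and drop the fraction (assumed scheme)."""
    return [[int(value / step) for value in row] for row in table]


def dequantize(table, step=16):
    """Approximate reconstruction performed by the decoder."""
    return [[value * step for value in row] for row in table]


sample = [[235.6, -22.6],
          [-10.9, 7.0]]
quantized = quantize(sample)
restored = dequantize(quantized)
```

Here quantized is [[14, -1], [0, 0]]: small values collapse to 0, and the restored values (224, -16, 0, 0) only approximate the originals.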
§ Quantized values are read from the table and redundant 0s are
removed.
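The reading order is not stated above; JPEG reads each block diagonally in a zigzag, which tends to collect the zeros at the end of the sequence. The sketch below assumes that order and simply cuts off the trailing zeros. The function names and the 4x4 sample are illustrative, and an actual JPEG encoder codes the zero runs rather than just trimming them.

```python
def zigzag(table):
    """Read a square table diagonal by diagonal, alternating direction."""
    n = len(table)
    order = sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Odd diagonals run top-right to bottom-left, even ones the reverse.
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )
    return [table[r][c] for r, c in order]


def strip_trailing_zeros(values):
    """Drop the run of zeros at the end of the sequence."""
    end = len(values)
    while end > 0 and values[end - 1] == 0:
        end -= 1
    return values[:end]


quantized = [[14, -1, 0, 0],
             [2,  0, 0, 0],
             [0,  0, 0, 0],
             [0,  0, 0, 0]]
sequence = strip_trailing_zeros(zigzag(quantized))
```

For this table the 16 values shrink to the three-element sequence [14, -1, 2].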
MPEG Encoding
v Used to compress video.
v Basic idea:
§ Compressing video = spatially compressing each frame + temporally compressing a set of frames.
v Spatial Compression
§ Each frame is spatially compressed by JPEG.
v Temporal Compression
§ Redundant frames are removed.
§ For example, in a static scene in which someone is talking, most
frames are the same except for the segment around the
speaker’s lips, which changes from one frame to the next.
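The talking-head example can be illustrated with a toy frame-difference scheme: store the first frame in full and, for each later frame, only the pixels that changed. This is a simplified sketch, not how MPEG actually codes frames (MPEG works with different frame types and motion compensation); the frames are flat lists of pixel values and the function names are hypothetical.

```python
def frame_delta(previous, current):
    """List (position, new value) for every pixel that changed."""
    return [(i, new) for i, (old, new) in enumerate(zip(previous, current))
            if old != new]


def apply_delta(previous, delta):
    """Rebuild the current frame from the previous one and the changes."""
    frame = list(previous)
    for i, value in delta:
        frame[i] = value
    return frame


frame1 = [10, 10, 10, 10, 50, 52, 10, 10]   # static scene
frame2 = [10, 10, 10, 10, 55, 49, 10, 10]   # only the "lips" changed
delta = frame_delta(frame1, frame2)
rebuilt = apply_delta(frame1, delta)
```

Here delta holds just two entries, (4, 55) and (5, 49), instead of all eight pixels, and rebuilt equals frame2.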
Audio Encoding
Audio Compression
v Used for speech or music
§ Speech: compress a 64 kbps digitized signal
§ Music: compress a 1.411 Mbps signal
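The figures 64 and 1.411 are the bit rates of the uncompressed digitized signals. The short calculation below shows where they come from, assuming telephone-quality speech (8,000 samples per second, 8 bits per sample, one channel) and CD-quality music (44,100 samples per second, 16 bits per sample, two channels); the function name pcm_bit_rate is illustrative.

```python
def pcm_bit_rate(samples_per_second, bits_per_sample, channels=1):
    """Bit rate of an uncompressed digitized audio signal, in bits per second."""
    return samples_per_second * bits_per_sample * channels


speech = pcm_bit_rate(8_000, 8)          # telephone-quality speech (assumed)
music = pcm_bit_rate(44_100, 16, 2)      # CD-quality stereo music (assumed)
print(speech / 1_000, "kbps")
print(music / 1_000_000, "Mbps")
```

This gives 64,000 bits per second for speech and 1,411,200 bits per second for music.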
v Predictive Encoding
§ Only the differences between samples are encoded, not the
whole sample values.
§ Several standards: GSM (13 kbps), G.729 (8 kbps), and G.723.1
(6.3 or 5.3 kbps)
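The idea of encoding differences can be shown with the simplest possible predictor, where each sample is predicted to equal the previous one. This is only a sketch of the principle; the standards listed above use far more elaborate prediction, and the function names and sample values are illustrative.

```python
def delta_encode(samples):
    """Keep the first sample, then only the change from the previous sample."""
    previous = 0
    differences = []
    for sample in samples:
        differences.append(sample - previous)
        previous = sample
    return differences


def delta_decode(differences):
    """Rebuild the samples by accumulating the differences."""
    samples = []
    current = 0
    for difference in differences:
        current += difference
        samples.append(current)
    return samples


samples = [100, 102, 105, 105, 103]
differences = delta_encode(samples)
```

The differences [100, 2, 3, 0, -2] are mostly small numbers, which need fewer bits than the full sample values, and delta_decode restores the original samples.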