05 Compression
The Compression "Problem"
Why compress?
Approaches to compression
Lossy Compression
D → Compress → C → Expand → D’
Lossy examples
● MP3
○ “Cuts out” portions of audio that are considered beyond what
most people are capable of hearing
● JPEG
[Figure: two JPEG images, 40K and 28K]
Lossless Compression
D → Compress → C → Expand → D
● Examples:
The Lossless Compression "Problem"
Huffman Compression
Considerations for compressing ASCII
Variable length encoding
1 A
00 T
01 K
001 U
100 R
101 C
10101 N
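The table above is not prefix-free (for example, 1 is a prefix of 100 and 101), so some bitstrings can be read in more than one way. The sketch below assumes each row is read as "codeword, letter"; the helper names `is_prefix_free` and `decodings` are made up for illustration.

```python
# Codeword table from the slide, read as codeword -> letter (an assumption).
TABLE = {"1": "A", "00": "T", "01": "K", "001": "U",
         "100": "R", "101": "C", "10101": "N"}

def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of another."""
    words = sorted(codewords)
    # After sorting, a word that is a prefix of any other word is
    # also a prefix of its immediate successor.
    return not any(b.startswith(a) for a, b in zip(words, words[1:]))

def decodings(bits, table):
    """List every way to split `bits` into codewords from `table`."""
    if not bits:
        return [""]
    results = []
    for word, symbol in table.items():
        if bits.startswith(word):
            rest = decodings(bits[len(word):], table)
            results += [symbol + r for r in rest]
    return results

print(is_prefix_free(TABLE))     # False
print(decodings("101", TABLE))   # ['AK', 'C']
```

With a prefix-free table, `decodings` would return at most one result for any bitstring, which is what makes lossless expansion possible.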
Variable length encoding for lossless compression
How can we create these prefix-free codes?
Huffman encoding!
Generating Huffman codes
ABRACADABRA!
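Character frequencies: A 5, B 2, R 2, C 1, D 1, ! 1
Huffman trie (node weights in parentheses, edges labeled 0 and 1):
(12)
  0: A (5)
  1: (7)
    0: (4)
      0: B (2)
      1: R (2)
    1: (3)
      0: C (1)
      1: (2)
        0: D (1)
        1: ! (1)
Codewords read off the trie: A 0, B 100, R 101, C 110, D 1110, ! 1111
Compressed bitstring: 010010101100 111001001010 1111 (28 bits)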
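A minimal sketch of how such a code could be generated: repeatedly merge the two lowest-weight nodes until one trie remains, then read codewords off the root-to-leaf paths. The function name `huffman_codes` and the tuple-based trie are illustrative choices, not taken from the slides. Ties between equal weights may be broken differently than in the slide's trie, so individual codewords can differ while the total encoded length stays the same.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Return a dict mapping each character of `text` to its codeword."""
    if not text:
        return {}
    freq = Counter(text)
    # Heap entries are (weight, tiebreak, node). A node is either a
    # single character (leaf) or a (left, right) tuple (internal node).
    heap = [(w, i, ch) for i, (ch, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, n1 = heapq.heappop(heap)   # two smallest weights
        w2, _, n2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, tiebreak, (n1, n2)))
        tiebreak += 1
    root = heap[0][2]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            # A one-symbol input still needs a 1-bit codeword.
            codes[node] = prefix or "0"
    walk(root, "")
    return codes

text = "ABRACADABRA!"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(len(encoded))   # 28
```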
Implementation concerns
Binary I/O
                   buffer: ????????   N: 0
writeBit(true);    buffer: ???????1   N: 1
writeBit(false);   buffer: ??????10   N: 2
writeBit(true);    buffer: ?????101   N: 3
writeBit(false);   buffer: ????1010   N: 4
writeBit(false);   buffer: ???10100   N: 5
writeBit(false);   buffer: ??101000   N: 6
writeBit(false);   buffer: ?1010000   N: 7
writeBit(true);    buffer: 10100001   N: 8 (buffer full: the byte is written out and N resets to 0)
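The trace above can be sketched as a small bit writer: each call shifts the buffer left by one, ORs in the new bit, and writes the byte out once eight bits have accumulated. The class name `BitWriter`, its methods, and the choice to pad the last byte with zeros are assumptions for illustration, not the course's actual I/O library.

```python
class BitWriter:
    """Collects bits into bytes, most significant bit first."""

    def __init__(self):
        self.buffer = 0          # bits gathered so far
        self.n = 0               # number of bits in the buffer
        self.out = bytearray()   # completed bytes

    def write_bit(self, bit):
        self.buffer = (self.buffer << 1) | (1 if bit else 0)
        self.n += 1
        if self.n == 8:
            self._flush_byte()

    def _flush_byte(self):
        self.out.append(self.buffer & 0xFF)
        self.buffer = 0
        self.n = 0

    def close(self):
        """Pad a partial byte with zero bits and return all bytes."""
        if self.n > 0:
            self.buffer <<= 8 - self.n
            self._flush_byte()
        return bytes(self.out)

w = BitWriter()
for b in [True, False, True, False, False, False, False, True]:
    w.write_bit(b)
print(format(w.close()[0], "08b"))   # 10100001
```

Padding on close means the reader needs some other way to know where the real data ends, which is one reason to store a character count in the compressed file.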
Representing tries as bitstrings
Huffman pseudocode
● Encoding approach:
○ Read input
○ Compute frequencies
○ Build trie/codeword table
○ Write out trie as a bitstring to compressed file
○ Write out character count of input
○ Use table to write out the codeword for each input character
● Decoding approach:
○ Read trie
○ Read character count
○ Use trie to decode bitstring of compressed file
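The encoding and decoding steps above can be sketched end to end. This sketch makes several assumptions not stated on the slides: the trie is given as nested tuples (here the ABRACADABRA! trie, hard-coded), it is serialized in preorder as 0 for an internal node and 1 followed by 8 bits for a leaf, the character count is a 32-bit integer, and bits are kept in a Python string of '0'/'1' characters for readability rather than packed into bytes.

```python
# A leaf is a 1-character string; an internal node is a (left, right) tuple.
TRIE = ("A", (("B", "R"), ("C", ("D", "!"))))

def write_trie(node):
    """Preorder: '1' + 8-bit char for a leaf, '0' then both subtries otherwise."""
    if isinstance(node, str):
        return "1" + format(ord(node), "08b")
    return "0" + write_trie(node[0]) + write_trie(node[1])

def read_trie(bits, pos=0):
    """Inverse of write_trie; returns (node, position after the trie)."""
    if bits[pos] == "1":
        return chr(int(bits[pos + 1:pos + 9], 2)), pos + 9
    left, pos = read_trie(bits, pos + 1)
    right, pos = read_trie(bits, pos)
    return (left, right), pos

def build_table(node, prefix="", table=None):
    """Map each character to its root-to-leaf path."""
    if table is None:
        table = {}
    if isinstance(node, str):
        table[node] = prefix
    else:
        build_table(node[0], prefix + "0", table)
        build_table(node[1], prefix + "1", table)
    return table

def compress(text, trie):
    table = build_table(trie)
    header = write_trie(trie) + format(len(text), "032b")
    return header + "".join(table[ch] for ch in text)

def expand(bits):
    trie, pos = read_trie(bits)
    count = int(bits[pos:pos + 32], 2)
    pos += 32
    out = []
    for _ in range(count):
        node = trie
        while isinstance(node, tuple):   # follow bits down to a leaf
            node = node[int(bits[pos])]
            pos += 1
        out.append(node)
    return "".join(out)

bits = compress("ABRACADABRA!", TRIE)
print(expand(bits))   # ABRACADABRA!
```

For this 12-character input the header (59 bits of trie plus 32 bits of count) is longer than the 28-bit payload, so the overhead only pays off on larger inputs.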
Further implementation concerns
○ …
compressed file!
How do we determine character frequencies?
OK, so how good is Huffman compression?
That seems like a bit of a caveat...
Run length encoding
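A minimal sketch of one flavor of run-length encoding, assuming the input is a bitstring and the output is a list of alternating run lengths that always starts with a run of 0s. Each count is capped at 255 so it would fit in one byte; a longer run is split by inserting a zero-length run of the other bit. The slides may use a different variant; the function names here are illustrative.

```python
def rle_encode(bits, max_run=255):
    """Encode a '0'/'1' string as alternating run lengths, starting with 0s."""
    runs = []
    current = "0"
    count = 0
    for b in bits:
        if b != current:
            runs.append(count)        # close the current run
            current = b
            count = 0
        elif count == max_run:
            runs.append(count)        # run too long for one count:
            runs.append(0)            # add an empty run of the other bit
            count = 0
        count += 1
    runs.append(count)
    return runs

def rle_decode(runs):
    """Rebuild the bitstring from alternating run lengths."""
    out = []
    bit = "0"
    for n in runs:
        out.append(bit * n)
        bit = "1" if bit == "0" else "0"
    return "".join(out)

data = "0" * 15 + "1" * 7 + "0" * 7 + "1" * 11
print(rle_encode(data))   # [15, 7, 7, 11]
```

Here 40 bits become four counts, but data with many short runs would grow instead of shrink.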
Patterns are compressible, need a general approach
How do we know that “the” will be in our file?
LZW compression
LZW compression example
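A minimal sketch of LZW compression, under these assumptions: the codebook starts with the 256 single-character strings as codes 0 to 255, new entries are numbered from 256, there is no end-of-file codeword, and the codebook is allowed to grow without limit. The output is a list of integer codewords rather than packed bits.

```python
def lzw_compress(text):
    """Return the list of LZW codewords for `text`."""
    codebook = {chr(i): i for i in range(256)}
    next_code = 256
    out = []
    current = ""                      # longest match found so far
    for ch in text:
        candidate = current + ch
        if candidate in codebook:
            current = candidate       # keep extending the match
        else:
            out.append(codebook[current])
            codebook[candidate] = next_code   # learn the new pattern
            next_code += 1
            current = ch
    if current:
        out.append(codebook[current])
    return out

print(lzw_compress("ABABABA"))   # [65, 66, 256, 258]
```

The codebook is built from the input itself, so no table of patterns has to be chosen in advance or stored in the compressed file.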
How does this work out?
codebook!
Just one tiny little issue to sort out...
LZW corner case example
● Expansion:
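A minimal sketch of LZW expansion that handles the corner case: the expander can receive a codeword it has not finished adding to its codebook yet. In that situation the missing entry must be the previous string plus its own first character. The sketch assumes codes 0 to 255 are single characters and new entries start at 256; the codewords [65, 66, 256, 258] used below are what that numbering produces for the input "ABABABA".

```python
def lzw_expand(codes):
    """Rebuild the text from a list of LZW codewords."""
    if not codes:
        return ""
    codebook = {i: chr(i) for i in range(256)}
    next_code = 256
    prev = codebook[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in codebook:
            entry = codebook[code]
        elif code == next_code:
            # Corner case: the codeword refers to the entry that is
            # about to be added, so it must be prev + first char of prev.
            entry = prev + prev[0]
        else:
            raise ValueError("invalid codeword: %d" % code)
        out.append(entry)
        codebook[next_code] = prev + entry[0]
        next_code += 1
        prev = entry
    return "".join(out)

print(lzw_expand([65, 66, 256, 258]))   # ABABABA
```

When 258 arrives, the codebook only goes up to 257; the previous string is "AB", so the entry is reconstructed as "ABA".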
Further implementation issues: codeword size
Variable width codewords
Even further implementation issues: codebook size
The showdown you’ve all been waiting for...
HUFFMAN vs LZW
So lossless compression apps use LZW?
Can we reason about how much a file can be compressed?
Information theory in a single slide...
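Assuming the slide refers to Shannon entropy, a short sketch can make the idea concrete. For symbol probabilities p_i, the entropy H = -Σ p_i log2(p_i) is the average number of bits per symbol that any code assigning one codeword per symbol needs, when symbols are treated as independent. The function name below is illustrative.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(text):
    """Shannon entropy of the character distribution in `text`."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

text = "ABRACADABRA!"
h = entropy_bits_per_symbol(text)
print(round(h, 2))            # 2.28
print(round(h * len(text)))   # 27
```

For ABRACADABRA! this gives about 27.4 bits in total, just under the 28 bits a Huffman code uses for the same string.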
Entropy applied to language: