Lecture 19

Dictionary coding techniques compress data by building a dictionary of strings found in the input and encoding repeated strings with indexes into the dictionary rather than transmitting the strings in full. Key dictionary coding methods discussed include LZW, which encodes the longest match found in the dictionary and adds new strings to the dictionary, and LZ77, which encodes matches as offsets and lengths from earlier parts of the data stream. Effective dictionary compression relies on strategies like least recently used replacement to manage dictionary size.

Dictionary Coding

Does not use statistical knowledge of data.
Encoder: As the input is processed, develop a dictionary and transmit the index of strings found in the dictionary.
Decoder: As the code is processed, reconstruct the dictionary to invert the process of encoding.
Examples: LZW, LZ77, Sequitur
Applications: Unix compress, gzip, GIF

Dictionaries for Data Compression


CSE 326
Autumn 2005
Lecture 19

Dictionary Data Compression - Lecture 19

LZW Encoding Algorithm

Repeat
    find the longest match w in the dictionary
    output the index of w
    put wa in the dictionary where a was the unmatched symbol
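
The loop above can be sketched in Python as follows. This is a minimal illustration rather than code from the lecture: the function name `lzw_encode`, the `alphabet` parameter used to seed the base dictionary, and the choice to return the final dictionary next to the output are assumptions made for the example. It also assumes every input symbol appears in `alphabet`.

```python
def lzw_encode(text, alphabet):
    """Return (list of output indices, final dictionary) for text.

    alphabet seeds the base dictionary: its i-th symbol gets index i.
    """
    dictionary = {symbol: i for i, symbol in enumerate(alphabet)}
    output = []
    w = ""  # longest match found so far
    for a in text:
        if w + a in dictionary:
            # The match can still be extended.
            w += a
        else:
            # w is the longest match and a is the unmatched symbol.
            output.append(dictionary[w])
            dictionary[w + a] = len(dictionary)
            w = a
    if w:
        # Flush the final match at end of input.
        output.append(dictionary[w])
    return output, dictionary


codes, final_dictionary = lzw_encode("ababababa", "ab")
print(codes)
```

With the two-symbol base dictionary used in the encoding examples, the input ababababa produces the indices 0 1 2 4 3 and the dictionary ends with entry 5 = abab.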

LZW Encoding Example (1)

Dictionary
0 a
1 b

ababababa

LZW Encoding Example (2)

Dictionary
0 a
1 b
2 ab

ababababa
0

LZW Encoding Example (3)

Dictionary
0 a
1 b
2 ab
3 ba

ababababa
0 1

LZW Encoding Example (4)

Dictionary
0 a
1 b
2 ab
3 ba
4 aba

ababababa
0 1 2

LZW Encoding Example (5)

Dictionary
0 a
1 b
2 ab
3 ba
4 aba
5 abab

ababababa
0 1 2 4

LZW Encoding Example (6)

Dictionary
0 a
1 b
2 ab
3 ba
4 aba
5 abab

ababababa
0 1 2 4 3

LZW Decoding Algorithm

Emulate the encoder in building the dictionary.
Decoder is slightly behind the encoder.

initialize dictionary;
decode first index to w;
put w? in dictionary;
repeat
    decode the first symbol s of the index;
    complete the previous dictionary entry with s;
    finish decoding the remainder of the index;
    put w? in the dictionary where w was just decoded;

LZW Decoding Example (1)

Dictionary
0 a
1 b
2 a?

0 1 2 4 3 6
a
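
The decoding loop can be sketched in Python as follows. This is an illustration under stated assumptions, not the lecture's code: `lzw_decode` and its `alphabet` parameter are names chosen here, and instead of storing a literal `w?` placeholder this version waits until the first symbol of the next index is known and then appends the completed entry. An index that points at the entry still being completed is handled by noting that its first symbol must equal the first symbol of w.

```python
def lzw_decode(codes, alphabet):
    """Invert LZW encoding; alphabet seeds the base dictionary as in the encoder."""
    dictionary = list(alphabet)  # the decoder only needs an array of strings
    w = dictionary[codes[0]]     # decode the first index to w
    output = [w]
    for code in codes[1:]:
        if code < len(dictionary):
            entry = dictionary[code]
        else:
            # The index refers to the pending entry w?: it starts with w,
            # so its first symbol is w[0] and the completed entry is w + w[0].
            entry = w + w[0]
        # Complete the previous entry with the first symbol just decoded.
        dictionary.append(w + entry[0])
        output.append(entry)
        w = entry
    return "".join(output)


print(lzw_decode([0, 1, 2, 4, 3, 6], "ab"))
```

For the index sequence 0 1 2 4 3 6 of the decoding examples this yields the pieces a, b, ab, aba, ba, bab. Indices 4 and 6 are both read while the entry they name is still incomplete, which is the case the `else` branch covers.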

LZW Decoding Example (2a)

Dictionary
0 a
1 b
2 ab

0 1 2 4 3 6
a b

LZW Decoding Example (2b)

Dictionary
0 a
1 b
2 ab
3 b?

0 1 2 4 3 6
a b

LZW Decoding Example (3a)

Dictionary
0 a
1 b
2 ab
3 ba

0 1 2 4 3 6
a b a

LZW Decoding Example (3b)

Dictionary
0 a
1 b
2 ab
3 ba
4 ab?

0 1 2 4 3 6
a b ab

LZW Decoding Example (4a)

Dictionary
0 a
1 b
2 ab
3 ba
4 aba

0 1 2 4 3 6
a b ab a

LZW Decoding Example (4b)

Dictionary
0 a
1 b
2 ab
3 ba
4 aba
5 aba?

0 1 2 4 3 6
a b ab aba

LZW Decoding Example (5a)

Dictionary
0 a
1 b
2 ab
3 ba
4 aba
5 abab

0 1 2 4 3 6
a b ab aba b

LZW Decoding Example (5b)

Dictionary
0 a
1 b
2 ab
3 ba
4 aba
5 abab
6 ba?

0 1 2 4 3 6
a b ab aba ba

LZW Decoding Example (6a)

Dictionary
0 a
1 b
2 ab
3 ba
4 aba
5 abab
6 bab

0 1 2 4 3 6
a b ab aba ba b

LZW Decoding Example (6b)

Dictionary
0 a
1 b
2 ab
3 ba
4 aba
5 abab
6 bab
7 bab?

0 1 2 4 3 6
a b ab aba ba bab

Decoding Exercise

Base Dictionary
0 a
1 b
2 c
3 d
4 r

0 1 4 0 2 0 3 5 7

Trie Data Structure for Encoders

Fredkin (1960)

Dictionary
0 a
1 b
2 c
3 d
4 r
5 ab
6 br
7 ra
8 ac
9 ca
10 ad
11 da
12 abr
13 raa
14 abra

Trie (each line is an edge symbol followed by the index of the entry it reaches)
a 0
    b 5
        r 12
            a 14
    c 8
    d 10
b 1
    r 6
c 2
    a 9
d 3
    a 11
r 4
    a 7
        a 13

Encoder Uses a Trie (1)

Trie
a 0
    b 5
        r 12
            a 14
    c 8
    d 10
b 1
    r 6
c 2
    a 9
d 3
    a 11
r 4
    a 7
        a 13

abracadabraabracadabra
0 1 4 0 2 0 3 5 7 12
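
The trie lookup traced above can be sketched in Python as follows. This is a minimal illustration, not the lecture's implementation: the names `TrieNode` and `lzw_encode_trie` are invented for the example, children are kept in a plain `dict` per node, and the input is assumed to use only symbols from `alphabet`.

```python
class TrieNode:
    """One dictionary entry: its index plus edges to longer entries."""

    def __init__(self, index):
        self.index = index
        self.children = {}  # symbol -> TrieNode


def lzw_encode_trie(text, alphabet):
    """LZW encoding where the longest match is found by walking a trie."""
    root = TrieNode(None)
    for i, symbol in enumerate(alphabet):
        root.children[symbol] = TrieNode(i)
    next_index = len(alphabet)

    output = []
    node = root  # node reached by the current match w
    for a in text:
        child = node.children.get(a)
        if child is not None:
            node = child  # extend the match by one edge
        else:
            output.append(node.index)               # output the index of w
            node.children[a] = TrieNode(next_index)  # add wa below w
            next_index += 1
            node = root.children[a]                 # restart the match at a
    if node is not root:
        output.append(node.index)
    return output


print(lzw_encode_trie("abracadabraabracadabra", "abcdr")[:10])
```

Each input symbol costs one child lookup, so the encoder never rescans the match from the start. On abracadabraabracadabra the first ten indices are 0 1 4 0 2 0 3 5 7 12, matching the trace above.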

Encoder Uses a Trie (2)

Trie
a 0
    b 5
        r 12
            a 14
    c 8
        a 15
    d 10
b 1
    r 6
c 2
    a 9
d 3
    a 11
r 4
    a 7
        a 13

abracadabraabracadabra
0 1 4 0 2 0 3 5 7 12 8

Decoder's Data Structure

Simply an array of strings

0 a
1 b
2 c
3 d
4 r
5 ab
6 br
7 ra
8 ac
9 ca
10 ad
11 da
12 abr
13 raa
14 abr?

0 1 4 0 2 0 3 5 7 12 8 ...
a b r a c a d ab ra abr

Bounded Size Dictionary

n bits of index allows a dictionary of size 2^n
Doubtful that long entries in the dictionary will be useful.
Strategies when the dictionary reaches its limit.
1. Don't add more, just use what is there.
2. Throw it away and start a new dictionary.
3. Double the dictionary, adding one more bit to indices.
4. Throw out the least recently visited entry to make room for the new entry.

Implementing the LRV Strategy

Doubly linked queue
Circular sibling lists
Parent pointers

[Figure: the trie for entries 0 to 14, with its nodes threaded on a queue running from Least Recent to Most Recent]

abracadabraabracadabra
0 1 4 0 2 0 3 5 7 12

Implementing the LRV Strategy

Doubly linked queue
Circular sibling lists
Parent pointers

[Figure: the same trie and queue one step later; the node labels in this figure include a 6]

abracadabraabracadabra
0 1 4 0 2 0 3 5 7 12 8

Notes on LZW

Extremely effective when there are repeated patterns in the data that are widely spread.
Negative: Creates entries in the dictionary that may never be used.
Applications: Unix compress, GIF, V.42 bis modem standard
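
Strategy 4 for a bounded dictionary (throw out the least recently visited entry) can be sketched in Python as follows. This is an illustration under stated assumptions, not the lecture's data structure: `LRVDictionary` and its methods are hypothetical names, a `collections.OrderedDict` stands in for the doubly linked queue, and plain child counts stand in for the sibling lists and parent pointers. Only leaf entries are evicted, so every stored string keeps all of its prefixes.

```python
from collections import OrderedDict


class LRVDictionary:
    """Bounded dictionary that replaces the least recently visited leaf entry."""

    def __init__(self, alphabet, capacity):
        self.capacity = capacity
        self.index = {s: i for i, s in enumerate(alphabet)}  # string -> index
        self.children = {s: 0 for s in alphabet}  # string -> stored extensions
        self.recency = OrderedDict()  # non-base entries, least recent first

    def visit(self, s):
        """Mark s and its stored prefixes as visited; return the index of s."""
        # Longest prefix first, so a parent ends up more recent than its child.
        for end in range(len(s), 1, -1):
            self.recency.move_to_end(s[:end])
        return self.index[s]

    def add(self, s):
        """Store s, reusing the index of an evicted leaf when the dictionary is full."""
        parent = s[:-1]
        if len(self.index) < self.capacity:
            new_index = len(self.index)
        else:
            # Oldest entry that has no extensions and is not the new entry's parent.
            victim = next(
                (e for e in self.recency if self.children[e] == 0 and e != parent),
                None,
            )
            if victim is None:
                return None  # nothing can be evicted; skip the insertion
            new_index = self.index.pop(victim)
            del self.recency[victim]
            del self.children[victim]
            self.children[victim[:-1]] -= 1
        self.index[s] = new_index
        self.children[s] = 0
        self.children[parent] += 1
        self.recency[s] = None
        return new_index


d = LRVDictionary("ab", capacity=4)
d.add("ab")
d.add("ba")
d.visit("ab")
print(d.add("aba"), sorted(d.index))
```

In the usage lines the dictionary is full after ab and ba are added. Visiting ab makes ba the least recently visited leaf, so aba takes over index 3. A decoder would have to apply the same replacement rule in the same order to stay in step with the encoder.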

LZ77

Ziv and Lempel, 1977
Dictionary is implicit
Use the string coded so far as a dictionary.
Given that x_1 x_2 ... x_n has been coded we want to code x_{n+1} x_{n+2} ... x_{n+k} for the largest k possible.

Solution A

If x_{n+1} x_{n+2} ... x_{n+k} is a substring of x_1 x_2 ... x_n then x_{n+1} x_{n+2} ... x_{n+k} can be coded by <j,k> where j is the beginning of the match.
Example (ababababa is already coded)
ababababa babababababababab....
ababababa babababa babababab....
<2,8>

Solution A Problem

What if there is no match at all in the dictionary?
ababababa cabababababababab.... (ababababa is already coded)
Solution B. Send tuples <j,k,x> where
    If k = 0 then x is the unmatched symbol
    If k > 0 then the match starts at j and is k long and the unmatched symbol is x.

Solution B

If x_{n+1} x_{n+2} ... x_{n+k} is a substring of x_1 x_2 ... x_n and x_{n+1} x_{n+2} ... x_{n+k} x_{n+k+1} is not then x_{n+1} x_{n+2} ... x_{n+k} x_{n+k+1} can be coded by <j,k,x_{n+k+1}> where j is the beginning of the match.
Examples
ababababa cabababababababab....
ababababa c ababababab ababab....
<0,0,c> <1,9,b>
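
Solution B can be sketched in Python as follows. This is a brute-force illustration, not the lecture's code: `encode_solution_b` is a name chosen here, j is taken to be the 1-based position where the match begins (as in the tuples above), ties go to the earliest match, and the search simply tries every start position in the coded prefix. The input is assumed to be long enough that a symbol x remains after every match.

```python
def encode_solution_b(text):
    """Code text as tuples (j, k, x): match start, match length, next symbol."""
    tuples = []
    n = 0  # number of symbols coded so far
    while n < len(text):
        best_j, best_k = 0, 0
        for start in range(n):
            k = 0
            # The match must lie entirely inside the coded prefix (start + k < n)
            # and must leave one symbol to send as x (n + k < len(text) - 1).
            while (start + k < n and n + k < len(text) - 1
                   and text[start + k] == text[n + k]):
                k += 1
            if k > best_k:
                best_j, best_k = start + 1, k  # report a 1-based position
        tuples.append((best_j, best_k, text[n + best_k]))
        n += best_k + 1
    return tuples


print(encode_solution_b("ab" * 10 + "a"))
```

On the alternating string abab...a of length 21 this prints the tuples (0,0,a), (0,0,b), (1,2,a), (2,4,b), (1,10,a).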

Solution B Example

a bababababababababababab.....
<0,0,a>
a b ababababababababababab.....
<0,0,b>
a b aba bababababababababab.....
<1,2,a>
a b aba babab ababababababab.....
<2,4,b>
a b aba babab abababababa bab.....
<1,10,a>

Surprise Code!

a bababababababababababab$
<0,0,a>
a b ababababababababababab$
<0,0,b>
a b ababababababababababab$
<1,22,$>

Surprise Decoding

<0,0,a><0,0,b><1,22,$>

<0,0,a>     a
<0,0,b>     b
<1,22,$>    a
<2,21,$>    b
<3,20,$>    a
<4,19,$>    b
...
<22,1,$>    b
<23,0,$>    $
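
The decoding trace above copies one symbol at a time, which is why a match may run into text that is produced only during the copy. A minimal Python sketch of such a decoder follows. It is an illustration, with the name `lz77_decode` and the 1-based absolute position j taken as assumptions consistent with the tuples shown.

```python
def lz77_decode(tuples):
    """Rebuild the text from tuples (j, k, x) with j a 1-based start position."""
    out = []
    for j, k, x in tuples:
        for i in range(k):
            # Copy symbol by symbol: position j - 1 + i may have been
            # appended earlier in this same loop (self-overlapping match).
            out.append(out[j - 1 + i])
        out.append(x)
    return "".join(out)


print(lz77_decode([(0, 0, "a"), (0, 0, "b"), (1, 22, "$")]))
```

The three tuples expand to a, b, twenty-two more alternating symbols, and the final $, even though only two symbols had been decoded when the copy of length 22 began.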

Solution C

The matching string can include part of itself!
If x_{n+1} x_{n+2} ... x_{n+k} is a substring of x_1 x_2 ... x_n x_{n+1} x_{n+2} ... x_{n+k} that begins at j < n and x_{n+1} x_{n+2} ... x_{n+k} x_{n+k+1} is not then x_{n+1} x_{n+2} ... x_{n+k} x_{n+k+1} can be coded by <j,k,x_{n+k+1}>

In Class Exercise

Use Solution C to code the string
abaabaaabaaaab$

Bounded Buffer Sliding Window

We want the triples <j,k,x> to be of bounded size. To achieve this we use bounded buffers.
Search buffer of size s is the symbols x_{n-s+1} ... x_n; j is then the offset into the buffer.
Look-ahead buffer of size t is the symbols x_{n+1} ... x_{n+t}
Match pointer can start in search buffer and go into the look-ahead buffer but no farther.

[Figure: sliding window over aaaabababaaab$, showing the match pointer, the uncoded text pointer, the search buffer (coded) and the look-ahead buffer (uncoded); tuple <2,5,a>]

Search in the Sliding Window

[Figure: the match pointer is tried at successive offsets in the search buffer of aaaabababaaab$, starting from offset 1 with length 0; the resulting tuple is <2,5,a>]
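
The bounded-buffer search can be sketched in Python as follows. This is an illustration under stated assumptions, not the lecture's code: `lz77_encode_window` is a name chosen here, the offset is counted backwards from the first uncoded symbol, ties go to the smallest offset, and the match pointer is allowed to advance through the search buffer and the look-ahead buffer but no farther. The input is assumed to end with a terminator such as $ so that every tuple has a symbol x.

```python
def lz77_encode_window(text, s, t):
    """Code text as (offset, length, symbol) with search buffer s, look-ahead t."""
    tuples = []
    n = 0  # index of the first uncoded symbol
    while n < len(text):
        best_offset, best_length = 0, 0
        for offset in range(1, min(s, n) + 1):
            start = n - offset  # match pointer starts inside the search buffer
            k = 0
            # start + k < n + t: the match pointer stays within the window.
            # n + k < len(text) - 1: one symbol is left over to send as x.
            while (n + k < len(text) - 1 and start + k < n + t
                   and text[start + k] == text[n + k]):
                k += 1
            if k > best_length:
                best_offset, best_length = offset, k
        tuples.append((best_offset, best_length, text[n + best_length]))
        n += best_length + 1
    return tuples


print(lz77_encode_window("aaaabababaaab$", s=4, t=4))
```

With s = 4 and t = 4 the string aaaabababaaab$ is coded as <0,0,a> <1,3,b> <2,5,a> <4,2,$>. The third tuple has length 5 because the match pointer starts two symbols back and keeps copying from the look-ahead buffer.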

Coding Example

s = 4, t = 4, a = 3 (a is the alphabet size)

aaaabababaaab$

tuple      symbols coded
<0,0,a>    a
<1,3,b>    aaab
<2,5,a>    ababaa
<4,2,$>    ab$

Coding the Tuples

Simple fixed length code
    ⌈log_2(s + 1)⌉ + ⌈log_2(s + t + 1)⌉ + ⌈log_2 a⌉ bits per tuple
    s = 4, t = 4, a = 3

    tuple      fixed code
    <2,5,a>    010 0101 00

Variable length code using adaptive Huffman or arithmetic code on tuples
    Two passes, first to create the tuples, second to code the tuples
    One pass, by pipelining tuples into a variable length coder
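
The fixed length code can be worked through in Python as follows. This is a small sketch: the helper names `bits_for` and `fixed_code` are invented here, and the symbol numbering (a = 0, b = 1, $ = 2) is an assumption chosen so that a is coded as 00 as in the table above.

```python
def bits_for(n):
    """Bits needed to distinguish n values, i.e. ceil(log2(n)) for n >= 1."""
    return (n - 1).bit_length()


def fixed_code(tup, s, t, alphabet):
    """Fixed length code for (offset, length, symbol) as space-separated bit fields."""
    offset, length, symbol = tup
    fields = [
        (offset, bits_for(s + 1)),          # offsets 0..s
        (length, bits_for(s + t + 1)),      # lengths 0..s+t
        (alphabet.index(symbol), bits_for(len(alphabet))),
    ]
    return " ".join(format(value, "0{}b".format(width)) for value, width in fields)


print(fixed_code((2, 5, "a"), s=4, t=4, alphabet="ab$"))
```

For s = 4, t = 4 and three symbols the field widths are 3, 4 and 2 bits, so every tuple costs 9 bits and <2,5,a> becomes 010 0101 00.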

Zip and Gzip

Search Window
    Search buffer 32KB
    Look-ahead buffer 258 Bytes
How to store such a large dictionary
    Hash table that stores the starting positions for all three byte sequences.
    Hash table uses chaining with newest entries at the beginning of the chain. Stale entries can be ignored.
Second pass for Huffman coding of tuples.
Coding done in blocks to avoid disk accesses.

Example

aaaabababaaabaaaababababaaabba$

[Figure: hash table lookup of aba, the three symbols starting at position 12, which leads to an earlier occurrence at position 8]

Offset = 12 - 8 = 4
Length = 5
Tuple = <4,5,a>

Example

aaaabababaaabaaaababababaaabba$

[Figure: hash table lookup of bab, the three symbols starting at position 18]

No match
Tuple = <0,0,b>

Notes on LZ77

Very popular especially in Unix world
Many variants and implementations
    Zip, Gzip, PNG, PKZip, Lharc, ARJ
Tends to work better than LZW
    LZW has dictionary entries that are never used
    LZW has past strings that are not in the dictionary
    LZ77 has an implicit dictionary. Common tuples are coded with few bits.
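
The hash table described for Zip and Gzip can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual zip or gzip code: `insert_position` and `longest_match` are hypothetical helpers, a `dict` of `deque` chains keyed by the three-symbol string stands in for the hash table, and positions are 0-based. The window of 8 symbols in the usage lines is also an assumption, chosen so that the two examples above come out as shown.

```python
from collections import defaultdict, deque


def insert_position(table, data, i):
    """Record position i under its three-symbol key, newest entry first."""
    if i + 3 <= len(data):
        table[data[i:i + 3]].appendleft(i)


def longest_match(table, data, n, window, max_length=258):
    """Return (offset, length, next symbol) for the text starting at index n."""
    best_offset, best_length = 0, 0
    for start in table.get(data[n:n + 3], ()):
        if n - start > window:
            break  # chains are newest first, so the rest is stale as well
        k = 0
        while (k < max_length and n + k < len(data) - 1
               and data[start + k] == data[n + k]):
            k += 1
        if k > best_length:
            best_offset, best_length = n - start, k
    return best_offset, best_length, data[n + best_length]


data = "aaaabababaaabaaaababababaaabba$"
table = defaultdict(deque)
for i in range(11):  # positions before index 11 (position 12 counting from 1)
    insert_position(table, data, i)
print(longest_match(table, data, 11, window=8))
```

The lookup at index 11 follows the chain for aba and returns offset 4, length 5 and next symbol a. After positions up to index 16 are inserted, the lookup of bab at index 17 finds only occurrences more than 8 symbols back, treats them as stale, and returns the literal tuple (0, 0, b).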
