Lempel-Ziv-Welch (LZW) Compression Algorithm

Introduction to the LZW Algorithm
Example 1: Encoding using LZW
Example 2: Decoding using LZW
LZW: Concluding Notes
Introduction to LZW
As mentioned earlier, static coding schemes require some
knowledge about the data before encoding takes place.

Universal coding schemes, like LZW, do not require
advance knowledge and can build such knowledge on the fly.

LZW is a widely used technique for general-purpose data
compression because of its simplicity and versatility.

It is the basis of many PC utilities that claim to double the
capacity of your hard drive.

LZW compression uses a code table; 4096 entries is a common
choice, which means every code fits in 12 bits (2^12 = 4096).
Introduction to LZW (cont'd)
Codes 0-255 in the code table are always assigned to
represent single bytes from the input file.

When encoding begins, the code table contains only the first
256 entries; the remainder of the table is blank.

Compression is achieved by using codes 256 through 4095
to represent sequences of bytes.

As encoding continues, LZW identifies repeated
sequences in the data and adds them to the code table.

Decoding is achieved by taking each code from the
compressed file and translating it through the code table
to find the character or characters it represents.
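To make the table layout concrete, here is a minimal Java sketch (Java because the Exercises ask for Java programs). The class and method names are hypothetical, and each byte is assumed to be stored as a Java char in the range 0-255 so that a byte sequence can be held in a String. The encoder looks sequences up by content while the decoder looks them up by code, so both directions are shown.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the initial LZW code table: codes 0-255 are single bytes and
// codes 256-4095 start out blank. Assumption: each byte is stored as a Java
// char in the range 0-255, so a byte sequence is represented as a String.
class LzwCodeTable {
    static final int FIRST_FREE_CODE = 256; // first code available for multi-byte sequences
    static final int MAX_CODES = 4096;      // table size used in these slides (12-bit codes)

    // Encoder direction: sequence -> code, used to look up "P + C".
    static Map<String, Integer> initialEncoderTable() {
        Map<String, Integer> table = new HashMap<>();
        for (int code = 0; code < FIRST_FREE_CODE; code++) {
            table.put(String.valueOf((char) code), code);
        }
        return table;
    }

    // Decoder direction: code -> sequence; the list index is the code.
    static List<String> initialDecoderTable() {
        List<String> table = new ArrayList<>();
        for (int code = 0; code < FIRST_FREE_CODE; code++) {
            table.add(String.valueOf((char) code));
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(initialEncoderTable().get("B")); // 66
        System.out.println(initialDecoderTable().get(65));  // A
        System.out.println(MAX_CODES - FIRST_FREE_CODE);    // 3840 codes left for sequences
    }
}
```

A list indexed by code suits the decoder because new codes are handed out consecutively, starting at 256.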
LZW Encoding Algorithm
Initialize table with single character strings
P = first input character
WHILE not end of input stream
    C = next input character
    IF P + C is in the string table
        P = P + C
    ELSE
        output the code for P
        add P + C to the string table
        P = C
END WHILE
output code for P
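The encoding pseudocode maps almost line for line onto Java. This is a sketch rather than a complete implementation: it assumes every input character is in the range 0-255 and, for brevity, it does not stop adding entries at 4096. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of the LZW encoding pseudocode.
// Assumptions: input characters are 0-255 and the 4096-entry limit is not enforced.
class LzwEncoder {
    static List<Integer> encode(String input) {
        List<Integer> output = new ArrayList<>();
        if (input.isEmpty()) return output;

        // Initialize table with single character strings (codes 0-255).
        Map<String, Integer> table = new HashMap<>();
        for (int i = 0; i < 256; i++) table.put(String.valueOf((char) i), i);
        int nextCode = 256;

        String p = String.valueOf(input.charAt(0));  // P = first input character
        for (int i = 1; i < input.length(); i++) {
            char c = input.charAt(i);                 // C = next input character
            String pc = p + c;
            if (table.containsKey(pc)) {
                p = pc;                               // P = P + C
            } else {
                output.add(table.get(p));             // output the code for P
                table.put(pc, nextCode++);            // add P + C to the string table
                p = String.valueOf(c);                // P = C
            }
        }
        output.add(table.get(p));                     // output code for P
        return output;
    }

    public static void main(String[] args) {
        System.out.println(encode("BABAABAAA")); // [66, 65, 256, 257, 65, 260]
    }
}
```

Running it on the string from Example 1 should reproduce the encoder output traced in the steps that follow.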
Example 1: Compression using LZW
Example 1: Use the LZW algorithm to compress the string

BABAABAAA
Example 1: LZW Compression Step 1
BABAABAAA   P = A   C = A
STRING TABLE          ENCODER OUTPUT
string   codeword     representing   output code
BA       256          B              66
Example 1: LZW Compression Step 2
BABAABAAA   P = B   C = B
STRING TABLE          ENCODER OUTPUT
string   codeword     representing   output code
BA       256          B              66
AB       257          A              65
Example 1: LZW Compression Step 3
BABAABAAA   P = A   C = A
STRING TABLE          ENCODER OUTPUT
string   codeword     representing   output code
BA       256          B              66
AB       257          A              65
BAA      258          BA             256
Example 1: LZW Compression Step 4
BABAABAAA   P = A   C = A
STRING TABLE          ENCODER OUTPUT
string   codeword     representing   output code
BA       256          B              66
AB       257          A              65
BAA      258          BA             256
ABA      259          AB             257
Example 1: LZW Compression Step 5
BABAABAAA   P = A   C = A
STRING TABLE          ENCODER OUTPUT
string   codeword     representing   output code
BA       256          B              66
AB       257          A              65
BAA      258          BA             256
ABA      259          AB             257
AA       260          A              65
Example 1: LZW Compression Step 6
BABAABAAA   P = AA   C = empty
STRING TABLE          ENCODER OUTPUT
string   codeword     representing   output code
BA       256          B              66
AB       257          A              65
BAA      258          BA             256
ABA      259          AB             257
AA       260          A              65
                      AA             260
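One way to check a hand trace like Example 1 is to have the encoder report each table row as it is created. The sketch below is a hypothetical tracing variant (names are illustrative; input characters are assumed to be 0-255). Each row lists the new string, its codeword, the string P whose code is output, and that code, mirroring the columns of the tables above; the final output, which adds no entry, uses "-" for the empty columns.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tracing variant of the LZW encoder. Each returned row has the
// layout "string codeword representing output-code"; the final step, which
// outputs the code for P without adding an entry, gets "-" placeholders.
class LzwEncodeTrace {
    static List<String> trace(String input) {
        List<String> rows = new ArrayList<>();
        if (input.isEmpty()) return rows;
        Map<String, Integer> table = new HashMap<>();
        for (int i = 0; i < 256; i++) table.put(String.valueOf((char) i), i);
        int nextCode = 256;

        String p = input.substring(0, 1);
        for (int i = 1; i < input.length(); i++) {
            String pc = p + input.charAt(i);
            if (table.containsKey(pc)) {
                p = pc;                                   // keep extending P
            } else {
                int code = nextCode++;
                table.put(pc, code);                      // new string table entry
                rows.add(pc + " " + code + " " + p + " " + table.get(p));
                p = input.substring(i, i + 1);            // P = C
            }
        }
        rows.add("- - " + p + " " + table.get(p));        // final output, no new entry
        return rows;
    }

    public static void main(String[] args) {
        // The first row printed is "BA 256 B 66".
        trace("BABAABAAA").forEach(System.out::println);
    }
}
```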
LZW Decompression
During decompression, the LZW decompressor rebuilds the
same string table that the compressor built.

It starts with the first 256 table entries initialized to single
characters.

The string table is updated for each code in the input
stream, except the first one.

Decoding is achieved by reading codes and translating them
through the code table as it is being built.
LZW Decompression Algorithm
Initialize table with single character strings
OLD = first input code
output translation of OLD
C = translation of OLD (the first code is always a single character)
WHILE not end of input stream
    NEW = next input code
    IF NEW is not in the string table
        S = translation of OLD
        S = S + C
    ELSE
        S = translation of NEW
    output S
    C = first character of S
    add translation of OLD + C to the string table
    OLD = NEW
END WHILE
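The decoding pseudocode translates to Java in the same direct way. In this sketch (hypothetical names; codes are assumed to come from an encoder with the same 256-entry starting table and no size limit), the table is a list indexed by code, so "NEW is not in the string table" becomes a check that NEW equals the next code to be assigned. Any larger code cannot come from a valid encoder and is rejected.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the LZW decoding pseudocode.
// Assumption: codes come from an encoder with the same 256-entry initial
// table that never limits or resets its table.
class LzwDecoder {
    static String decode(List<Integer> codes) {
        StringBuilder out = new StringBuilder();
        if (codes.isEmpty()) return "";

        // Initialize table with single character strings; the index is the code.
        List<String> table = new ArrayList<>();
        for (int i = 0; i < 256; i++) table.add(String.valueOf((char) i));

        int old = codes.get(0);                  // OLD = first input code
        String s = table.get(old);
        out.append(s);                           // output translation of OLD
        char c = s.charAt(0);                    // C = translation of OLD

        for (int i = 1; i < codes.size(); i++) {
            int newCode = codes.get(i);          // NEW = next input code
            if (newCode < table.size()) {
                s = table.get(newCode);          // S = translation of NEW
            } else if (newCode == table.size()) {
                s = table.get(old) + c;          // NEW not in table: translation of OLD + C
            } else {
                throw new IllegalArgumentException("invalid code " + newCode);
            }
            out.append(s);                       // output S
            c = s.charAt(0);                     // C = first character of S
            table.add(table.get(old) + c);       // add translation of OLD + C
            old = newCode;                       // OLD = NEW
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode(List.of(66, 65, 256, 257, 65, 260))); // BABAABAAA
    }
}
```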
Example 2: LZW Decompression 1
Example 2: Use LZW to decompress the output sequence of
Example 1:

<66><65><256><257><65><260>.
Example 2: LZW Decompression Step 1
<66><65><256><257><65><260>   Old = 65   New = 65   S = A   C = A
STRING TABLE          DECODER OUTPUT
string   codeword     string
                      B
BA       256          A
Example 2: LZW Decompression Step 2
<66><65><256><257><65><260>   Old = 256   New = 256   S = BA   C = B
STRING TABLE          DECODER OUTPUT
string   codeword     string
                      B
BA       256          A
AB       257          BA
Example 2: LZW Decompression Step 3
<66><65><256><257><65><260>   Old = 257   New = 257   S = AB   C = A
STRING TABLE          DECODER OUTPUT
string   codeword     string
                      B
BA       256          A
AB       257          BA
BAA      258          AB
Example 2: LZW Decompression Step 4
<66><65><256><257><65><260>   Old = 65   New = 65   S = A   C = A
STRING TABLE          DECODER OUTPUT
string   codeword     string
                      B
BA       256          A
AB       257          BA
BAA      258          AB
ABA      259          A
Example 2: LZW Decompression Step 5
<66><65><256><257><65><260>   Old = 260   New = 260   S = AA   C = A
STRING TABLE          DECODER OUTPUT
string   codeword     string
                      B
BA       256          A
AB       257          BA
BAA      258          AB
ABA      259          A
AA       260          AA
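The same bookkeeping can be printed for Example 2. This hypothetical tracing variant of the decoder (names are illustrative; a well-formed code stream is assumed) emits one row per code in the layout of the tables above. It also makes the special case at code 260 easy to see: 260 is not yet in the table when it arrives, so S is built as translation of OLD plus C.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tracing variant of the LZW decoder. For each input code it
// records one row in the layout "string codeword decoder-output". The first
// code adds no table entry, so its row uses "-" placeholders.
// Assumes a well-formed code stream (no validation of out-of-range codes).
class LzwDecodeTrace {
    static List<String> trace(List<Integer> codes) {
        List<String> rows = new ArrayList<>();
        if (codes.isEmpty()) return rows;
        List<String> table = new ArrayList<>();
        for (int i = 0; i < 256; i++) table.add(String.valueOf((char) i));

        int old = codes.get(0);
        String s = table.get(old);             // translation of the first code
        rows.add("- - " + s);
        char c = s.charAt(0);
        for (int i = 1; i < codes.size(); i++) {
            int code = codes.get(i);
            // code == table.size() is the "NEW is not in the string table" case.
            s = code < table.size() ? table.get(code) : table.get(old) + c;
            c = s.charAt(0);
            String entry = table.get(old) + c; // translation of OLD + C
            table.add(entry);
            rows.add(entry + " " + (table.size() - 1) + " " + s);
            old = code;
        }
        return rows;
    }

    public static void main(String[] args) {
        // The last row printed is "AA 260 AA".
        trace(List.of(66, 65, 256, 257, 65, 260)).forEach(System.out::println);
    }
}
```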
LZW: Some Notes
This algorithm compresses repetitive sequences of data
well.

Since the codewords are 12 bits, any single encoded
character (8 bits) will expand the data size rather than reduce it.

In this example, 72 bits are represented with 72 bits of
data: 9 input characters × 8 bits = 72 bits in, and
6 codes × 12 bits = 72 bits out. Once a reasonable string
table has been built, compression improves considerably.
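The 72-bit figure assumes each code occupies exactly 12 bits in the compressed file. A minimal Java sketch of such packing follows. The most-significant-bit-first order, the zero padding of the last byte, and the class name are assumptions made for illustration; real file formats make their own choices about bit order and code width.

```java
import java.util.List;

// Sketch: pack 12-bit LZW codes into bytes, most significant bit first.
// Assumptions (not from the slides): codes are below 4096, bits are written
// MSB-first, and the final byte is zero-padded when the bit count is not a
// multiple of 8.
class TwelveBitPacker {
    static byte[] pack(List<Integer> codes) {
        int totalBits = codes.size() * 12;
        byte[] out = new byte[(totalBits + 7) / 8];
        int bitPos = 0;
        for (int code : codes) {
            if (code < 0 || code >= 4096) {
                throw new IllegalArgumentException("not a 12-bit code: " + code);
            }
            for (int b = 11; b >= 0; b--) {           // write bits 11..0 of the code
                if (((code >> b) & 1) != 0) {
                    out[bitPos / 8] |= (byte) (0x80 >> (bitPos % 8));
                }
                bitPos++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> codes = List.of(66, 65, 256, 257, 65, 260); // output of Example 1
        int inputBits = "BABAABAAA".length() * 8;  // 9 characters x 8 bits
        int outputBits = pack(codes).length * 8;   // 6 codes x 12 bits = 9 bytes
        System.out.println(inputBits + " bits in, " + outputBits + " bits out"); // 72 bits in, 72 bits out
    }
}
```

Two 12-bit codes fill exactly three bytes, so the six codes from Example 1 occupy nine bytes.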

Advantages of LZW over Huffman:
LZW requires no prior information about the input data stream.
LZW can compress the input stream in a single pass.
LZW is simple, which allows fast execution.
LZW: Limitations
What happens when the dictionary gets too large (i.e., when all the
4096 locations have been used)?
Common options include:

Stop adding new entries and keep using the table as is.

Throw the dictionary away when it reaches a certain size.

Throw the dictionary away when it is no longer effective at
compression.

Clear entries 256-4095 and start building the dictionary again.

Some clever schemes rebuild a string table from the last N
input characters.
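As an illustration of the option of clearing entries 256-4095 and rebuilding, here is a hedged Java sketch in which both encoder and decoder reset their tables once they fill up. The maxCodes parameter and class name are hypothetical: 4096 matches the table size in these slides, and smaller values make resets easy to exercise. Both sides must apply the same rule; because the decoder adds each entry one code later than the encoder, it also resets one code later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hedged sketch of the "clear entries 256 and up, then rebuild" option.
// maxCodes is a hypothetical parameter (4096 in these slides). Encoder and
// decoder must share the same value and the same reset rule.
class LzwResetting {
    private static void checkLimit(int maxCodes) {
        if (maxCodes <= 256) throw new IllegalArgumentException("maxCodes must exceed 256");
    }

    private static Map<String, Integer> freshEncoderTable() {
        Map<String, Integer> table = new HashMap<>();
        for (int i = 0; i < 256; i++) table.put(String.valueOf((char) i), i);
        return table;
    }

    static List<Integer> encode(String input, int maxCodes) {
        checkLimit(maxCodes);
        List<Integer> out = new ArrayList<>();
        if (input.isEmpty()) return out;
        Map<String, Integer> table = freshEncoderTable();
        String p = input.substring(0, 1);
        for (int i = 1; i < input.length(); i++) {
            String pc = p + input.charAt(i);
            if (table.containsKey(pc)) {
                p = pc;
            } else {
                out.add(table.get(p));
                table.put(pc, table.size());     // next free code equals the current size
                if (table.size() == maxCodes) {
                    table = freshEncoderTable(); // table full: drop entries 256 and up
                }
                p = input.substring(i, i + 1);
            }
        }
        out.add(table.get(p));
        return out;
    }

    static String decode(List<Integer> codes, int maxCodes) {
        checkLimit(maxCodes);
        if (codes.isEmpty()) return "";
        List<String> table = new ArrayList<>();
        for (int i = 0; i < 256; i++) table.add(String.valueOf((char) i));
        int old = codes.get(0);
        String s = table.get(old);
        StringBuilder out = new StringBuilder(s);
        char c = s.charAt(0);
        for (int i = 1; i < codes.size(); i++) {
            int code = codes.get(i);
            if (code < table.size()) {
                s = table.get(code);
            } else if (code == table.size()) {
                s = table.get(old) + c;          // the "NEW not in table" case
            } else {
                throw new IllegalArgumentException("invalid code " + code);
            }
            out.append(s);
            c = s.charAt(0);
            table.add(table.get(old) + c);
            if (table.size() == maxCodes) {
                // Mirrors the encoder's reset, which happened one code earlier in the stream.
                table.subList(256, table.size()).clear();
            }
            old = code;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "TOBEORNOTTOBEORTOBEORNOT";
        List<Integer> codes = encode(text, 260);
        System.out.println(decode(codes, 260).equals(text)); // true
    }
}
```

Because the encoder resets immediately after assigning code maxCodes - 1, that code is never emitted. Some file formats instead reserve a dedicated clear code so the encoder can signal a reset explicitly.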
Exercises
Why did we say on Slide 15 that the codeword NEW = 65 is
in the string table? Review that slide and answer this
question.

Use LZW to trace encoding the string ABRACADABRA.

Write a Java program that encodes a given string using
LZW.

Write a Java program that decodes a given set of encoded
codewords using LZW.
