Chapter 7: Lossless Compression Algorithms
7.1 INTRODUCTION
The emergence of multimedia technologies has made digital libraries a reality. Nowadays,
libraries, museums, film studios, and governments are converting more and more data and
archives into digital form. Some of the data (e.g., precious books and paintings) indeed
need to be stored without any loss.
As a start, suppose we want to encode the call numbers of the 120 million or so items
in the Library of Congress (a mere 20 million, if we consider just books). Why don't we
just transmit each item as a 27-bit number, giving each item a unique binary code (since
2^27 > 120,000,000)?
The main problem is that this "great idea" requires too many bits. And in fact there exist
many coding techniques that will effectively reduce the total number of bits needed to rep-
resent the above information. The process involved is generally referred to as compression
[1,2].
In Chapter 6, we had a beginning look at compression schemes aimed at audio. There, we
had to first consider the complexity of transforming analog signals to digital ones, whereas
here, we shall consider that we at least start with digital signals. For example, even though
we know an image is captured using analog signals, the file produced by a digital camera
is indeed digital. The more general problem of coding (compressing) a set of any symbols,
not just byte values, say, has been studied for a long time.
Getting back to our Library of Congress problem, it is well known that certain parts of
call numbers appear more frequently than others, so it would be more economical to assign
fewer bits as their codes. This is known as variable-length coding (VLC) - the more
frequently appearing symbols are coded with fewer bits per symbol, and vice versa. As a
result, fewer bits are usually needed to represent the whole collection.
In this chapter we study the basics of information theory and several popular lossless
compression techniques. Figure 7.1 depicts a general data compression scheme, in which
compression is performed by an encoder and decompression is performed by a decoder.
We call the output of the encoder codes or codewords. The intermediate medium could
either be data storage or a communication/computer network. If the compression and
decompression processes induce no information loss, the compression scheme is lossless;
otherwise, it is lossy. The next several chapters deal with lossy compression algorithms as
they are commonly used for image, video, and audio compression. Here, we concentrate
on lossless compression.
FIGURE 7.1: A general data compression scheme: input data passes through the encoder (compression), is stored or transmitted over a network, and is reconstructed by the decoder (decompression) as output data.
If the total number of bits required to represent the data before compression is B0 and the
total number of bits required to represent the data after compression is B1, then we define
the compression ratio as

    compression ratio = B0 / B1        (7.1)
7.2 BASICS OF INFORMATION THEORY

According to the famous scientist Claude E. Shannon of Bell Labs [3, 4], the entropy η of
an information source with alphabet S = {s1, s2, ..., sn} is defined as

    η = H(S) = Σ_{i=1}^{n} p_i · log2(1/p_i)        (7.2)

             = − Σ_{i=1}^{n} p_i · log2 p_i          (7.3)

where p_i is the probability that symbol s_i will occur in S.
¹ Since we have chosen 2 as the base for logarithms in the above definition, the unit of information is the bit -
naturally also most appropriate for the binary code representation used in digital computers. If the log base is 10,
the unit is the hartley; if the base is e, the unit is the nat.
Consider, for example, sorting a deck of cards by repeatedly deciding whether or not to swap a pair of cards. For
every decision to swap or not, we impart 1 bit of information to the card system and transfer
1 bit of negative entropy to the card deck.
The definition of entropy includes the idea that two decisions means the transfer of twice
the negative entropy in its use of the log base 2. A two-bit vector can have 2 2 states, and the
logarithm takes this value into 2 bits of negative entropy. Twice as many sorting decisions
impart twice the entropy change.
Now suppose we wish to communicate those swapping decisions, via a network, say.
Then for our two decisions we'd have to send 2 bits. If we had a two-decision system, then
of course the average number of bits for all such communications would also be 2 bits. If
we like, we can think of the possible number of states in our 2-bit system as four outcomes.
Each outcome has probability 1/4. So on average, the number of bits to send per outcome
is 4 × (1/4) × log2(1/(1/4)) = 2 bits - no surprise here. To communicate (transmit) the
results of our two decisions, we would need to transmit 2 bits.
But if the probability for one of the outcomes were higher than the others, the average
number of bits we'd send would be different. (This situation might occur if the deck
were already partially ordered, so that the probability of a not-swap were higher than for
a swap.) Suppose the probability of one of our four states were 1/2, and the other three
states each had probability 1/6 of occurring. To extend our modeling of how many bits
to send on average, we need to go to noninteger powers of 2 for probabilities. Then we
can use a logarithm to ask how many (float) bits of information must be sent to transmit
the information content. Equation (7.3) says that in this case, we'd have to send just
(1/2) x log2(2) + 3 x (1/6) x log2(6) = 1.7925 bits, a value less than 2. This reflects
the idea that if we could somehow encode our four states, such that the most-occurring one
means fewer bits to send, we'd do better (fewer bits) on average.
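To make the arithmetic concrete, here is a small Python sketch (our own illustration, not from the text; the function name entropy is ours) that evaluates Eq. (7.3) for a list of probabilities. Running it on the four-state example reproduces the 1.7925 bits just computed.

import math

def entropy(probabilities):
    # Shannon entropy, Eq. (7.3): sum of p_i * log2(1/p_i), measured in bits.
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)

# Four equally likely outcomes: 2 bits per outcome, as expected.
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0

# One outcome with probability 1/2, three with probability 1/6 each.
print(entropy([1/2, 1/6, 1/6, 1/6]))       # approximately 1.7925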
The definition of entropy is aimed at identifying often-occurring symbols in the data-
stream as good candidates for short codewords in the compressed bitstream. As described
earlier, we use a variable-length coding scheme for entropy coding - frequently-occurring
symbols are given codes that are quickly transmitted, while infrequently-occurring ones are
given longer codes. For example, E occurs frequently in English, so we should give it a
shorter code than Q, say.
This aspect of "surprise" in receiving an infrequent symbol in the datastream is reflected
in the definition (7.3). For if a symbol occurs rarely, its probability p_i is low (e.g., 1/100),
and thus its logarithm is a large negative number. This reflects the fact that it takes a longer
bitstring to encode it. The probabilities p_i sitting outside the logarithm in Eq. (7.3) say that
over a long stream, the symbols come by with an average frequency equal to the probability
of their occurrence. This weighting should multiply the long or short information content
given by the element of "surprise" in seeing a particular symbol.
As another concrete example, if the information source S is a gray-level digital image,
each s_i is a gray-level intensity ranging from 0 to (2^k - 1), where k is the number of bits
used to represent each pixel in an uncompressed image. The range is often [0, 255], since
8 bits are typically used: this makes a convenient one byte per pixel. The image histogram
(as discussed in Chapter 3) is a way of calculating the probability p_i of having pixels with
gray-level intensity i in the image.
One wrinkle in the algorithm implied by Eq. (7.3) is that if a symbol occurs with zero
frequency, we simply don't count it into the entropy: we cannot take a log of zero.
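As a rough sketch of how this is typically computed (our own code, not from the text; image_entropy is a hypothetical helper), the histogram counts are normalized into probabilities and fed into Eq. (7.3), with zero-count bins simply skipped:

import math

def image_entropy(histogram):
    # Entropy of an image, given its histogram as a list of per-intensity pixel counts.
    total = sum(histogram)
    eta = 0.0
    for count in histogram:
        if count == 0:
            continue            # zero-frequency symbols are not counted
        p = count / total
        eta += p * math.log2(1.0 / p)
    return eta

# A 4-level image where only two intensities actually occur.
print(image_entropy([0, 30, 10, 0]))   # about 0.81 bits per pixel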
FIGURE 7.2: Histograms for two gray-level images: (a) a uniform distribution, with p_i = 1/256 for all i; (b) a distribution in which 1/3 of the pixels are rather dark and 2/3 are rather bright.
Figure 7.2(a) shows the histogram of an image with a uniform distribution of gray-level
intensities - that is, p_i = 1/256 for all i. Hence, the entropy of this image is

    η = Σ_{i=0}^{255} (1/256) · log2 256 = 8        (7.4)
As can be seen in Eq. (7.3), the entropy η is a weighted sum of terms log2(1/p_i); hence it
represents the average amount of information contained per symbol in the source S. For
a memoryless source² S, the entropy η represents the minimum average number of bits
required to represent each symbol in S. In other words, it specifies the lower bound for the
average number of bits to code each symbol in S.
If we use l̄ to denote the average length (measured in bits) of the codewords produced
by the encoder, the Shannon Coding Theorem states that the entropy is the best we can do
(under certain conditions):

    η ≤ l̄        (7.5)

Coding schemes aim to get as close as possible to this theoretical lower bound.
It is interesting to observe that in the above uniform-distribution example we found that
η = 8 - the minimum average number of bits to represent each gray-level intensity is at
least 8. No compression is possible for this image! In the context of imaging, this will
correspond to the "worst case," where neighboring pixel values have no similarity.
Figure 7.2(b) shows the histogram of another image, in which 1/3 of the pixels are rather
dark and 2/3 of them are rather bright. The entropy of this image is

    η = (1/3) · log2 3 + (2/3) · log2(3/2)
      ≈ 0.33 × 1.59 + 0.67 × 0.59 = 0.52 + 0.40 = 0.92

In general, the entropy is greater when the probability distribution is flat and smaller when
it is more peaked.
² An information source that is independently distributed, meaning that the value of the current symbol does
not depend on the values of the previously appearing symbols.
7.3 RUN-LENGTH CODING
Instead of assuming a memoryless source, run-length coding (RLC) exploits memory present
in the information source. It is one of the simplest forms of data compression. The basic
idea is that if the information source we wish to compress has the property that symbols
tend to form continuous groups, instead of coding each symbol in the group individually,
we can code one such symbol and the length of the group.
As an example, consider a bilevel image (one with only 1-bit black-and-white pixels)
with monotone regions. This information source can be efficiently coded using run-length
coding. In fact, since there are only two symbols, we do not even need to code any symbol
at the start of each run. Instead, we can assume that the starting run is always of a particular
color (either black or white) and simply code the length of each run.
The above description is the one-dimensional run-length coding algorithm. A two-
dimensional variant of it is usually used to code bilevel images. This algorithm uses the
coded run information in the previous row of the image to code the run in the current row.
A full description of this algorithm can be found in [5].
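As an illustration of the one-dimensional scheme (our own sketch, using assumed conventions rather than those of any particular standard), the encoder below emits only run lengths and assumes the first run is white, as described above:

def rle_encode(pixels):
    # One-dimensional run-length coding of a bilevel scanline (0 = white, 1 = black).
    # Only run lengths are emitted; the first run is assumed white, so a line that
    # actually starts with black begins with a zero-length run.
    runs = []
    current, length = 0, 0
    for p in pixels:
        if p == current:
            length += 1
        else:
            runs.append(length)
            current, length = p, 1
    runs.append(length)
    return runs

def rle_decode(runs):
    # Rebuild the scanline from the run lengths, alternating colors starting from white.
    pixels, color = [], 0
    for length in runs:
        pixels.extend([color] * length)
        color = 1 - color
    return pixels

line = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1]
codes = rle_encode(line)               # [3, 2, 4, 1]
assert rle_decode(codes) == line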
7.4 VARIABLE-LENGTH CODING (VLC)

Since the entropy indicates the information content in an information source S, it leads to
a family of coding methods commonly known as entropy coding methods. As described
earlier, variable-length coding (VLC) is one of the best-known such methods. Here, we
will study the Shannon-Fano algorithm, Huffman coding, and adaptive Huffman coding.
The Shannon-Fano algorithm was independently developed by Shannon at Bell Labs and
Robert Fano at MIT [6]. To illustrate the algorithm, let's suppose the symbols to be coded
are the characters in the word HELLO. The frequency count of the symbols is

    Symbol    H    E    L    O
    Count     1    1    2    1
The encoding steps of the Shannon-Fano algorithm can be presented in the following
top-down manner:

1. Sort the symbols according to the frequency count of their occurrences.

2. Recursively divide the symbols into two parts, each with approximately the same number of counts, until all parts contain only one symbol.

FIGURE 7.3: Coding tree for HELLO by the Shannon-Fano algorithm.
Initially, the symbols are sorted as LHEO. As Figure 7.3 shows, the first division yields
two parts: (a) L with a count of 2, denoted as L:(2); and (b) H, E, and O with a total count
of 3, denoted as H,E,O:(3). The second division yields H:(1) and E,O:(2). The last division
is E:(1) and O:(1).
Table 7.1 summarizes the result, showing each symbol, its frequency count, information
content (log2(1/p_i)), resulting codeword, and the number of bits needed to encode each symbol
in the word HELLO. The total number of bits used is shown at the bottom.

TABLE 7.1: One result of performing the Shannon-Fano algorithm on HELLO.

    Symbol   Count   log2(1/p_i)   Code   Number of bits used
    L        2       1.32          0      2
    H        1       2.32          10     2
    E        1       2.32          110    3
    O        1       2.32          111    3
    TOTAL number of bits: 10
To revisit the previous discussion on entropy, in this case

    η = p_L · log2(1/p_L) + p_H · log2(1/p_H) + p_E · log2(1/p_E) + p_O · log2(1/p_O)
      = 0.4 × 1.32 + 0.2 × 2.32 + 0.2 × 2.32 + 0.2 × 2.32 = 1.92
FIGURE 7.4: Another coding tree for HELLO by the Shannon-Fano algorithm.
This suggests that the minimum average number of bits to code each character in the word
HELLO would be at least 1.92. In this example, the Shannon-Fano algorithm uses an
average of 10/5 = 2 bits to code each symbol, which is fairly close to the lower bound of
1.92. Apparently, the result is satisfactory.
It should be pointed out that the outcome of the Shannon-Fano algorithm is not neces-
sarily unique. For instance, at the first division in the above example, it would be equally
valid to divide into the two parts L,H:(3) and E,O:(2). This would result in the coding in
Figure 7.4. Table 7.2 shows the codewords are different now. Also, these two sets of code-
words may behave differently when errors are present. Coincidentally, the total number of
bits required to encode the word HELLO remains at 10.
The Shannon-Fano algorithm delivers satisfactory coding results for data compression,
but it was soon outperformed and overtaken by the Huffman coding method.
TABLE 7.2: Another result of performing the Shannon-Fano algorithm on HELLO.

    Symbol   Count   log2(1/p_i)   Code   Number of bits used
    L        2       1.32          00     4
    H        1       2.32          01     2
    E        1       2.32          10     2
    O        1       2.32          11     2
    TOTAL number of bits: 10
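The recursive top-down division can be sketched in a few lines of Python (our own illustration; the split rule simply picks the division point whose two halves have total counts as nearly equal as possible, matching the description above):

def shannon_fano(symbols):
    # symbols: list of (symbol, count) pairs, sorted by decreasing count.
    # Returns a dict mapping each symbol to its codeword (a string of '0'/'1').
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(count for _, count in symbols)
    running, split, best_diff = 0, 1, float("inf")
    for i in range(1, len(symbols)):
        running += symbols[i - 1][1]
        diff = abs(total - 2 * running)    # imbalance if we split before position i
        if diff < best_diff:
            best_diff, split = diff, i
    codes = {}
    for sym, code in shannon_fano(symbols[:split]).items():
        codes[sym] = "0" + code            # left part gets prefix 0
    for sym, code in shannon_fano(symbols[split:]).items():
        codes[sym] = "1" + code            # right part gets prefix 1
    return codes

print(shannon_fano([("L", 2), ("H", 1), ("E", 1), ("O", 1)]))
# e.g. {'L': '0', 'H': '10', 'E': '110', 'O': '111'}, as in Figure 7.3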
FIGURE 7.5: Coding tree for HELLO using the Huffman algorithm.

In contrast to Shannon-Fano, which is top-down, the Huffman algorithm builds its coding tree bottom-up. Its encoding steps can be described as follows:
1. Initialization: put all symbols on the list sorted according to their frequency counts.

2. Repeat until the list has only one symbol left.

   (a) From the list, pick two symbols with the lowest frequency counts. Form a
   Huffman subtree that has these two symbols as child nodes and create a parent
   node for them.

   (b) Assign the sum of the children's frequency counts to the parent and insert it into
   the list, such that the order is maintained.

   (c) Delete the children from the list.

3. Assign a codeword for each leaf based on the path from the root.

In Figure 7.5, new symbols P1, P2, P3 are created to refer to the parent nodes in
the Huffman coding tree. The contents of the list after each step are:

    After initialization:   L H E O
    After iteration (a):    L P1 H
    After iteration (b):    L P2
    After iteration (c):    P3
For this simple example, the Huffman algorithm apparently generated the same coding
result as one of the Shannon-Fano results shown in Figure 7.3, although the results are
usually better. The average number of bits used to code each character is also 2 (i.e.,
(1 + 1 + 2 + 3 + 3)/5 = 2). As another simple example, consider a text string containing
a set of characters and their frequency counts as follows: A:(15), B:(7), C:(6), D:(6), and
E:(5). It is easy to show that the Shannon-Fano algorithm needs a total of 89 bits to encode
this string, whereas the Huffman algorithm needs only 87.
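A compact sketch of the bottom-up Huffman procedure (our own code, using Python's heapq module as the sorted list; the exact codewords may differ from those in the figures, but the code lengths do not) confirms the 87-bit total quoted above:

import heapq

def huffman_codes(freqs):
    # freqs: dict mapping symbol -> frequency count. Returns dict symbol -> codeword.
    # Each heap entry is (count, tie_breaker, {symbol: partial_code}).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)      # the two lowest-count nodes
        c2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]

freqs = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
codes = huffman_codes(freqs)
print(sum(freqs[s] * len(codes[s]) for s in freqs))   # 87 bits in total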
As shown above, if correct probabilities ("prior statistics") are available and accurate,
the Huffman coding method produces good compression results. Decoding for the Huffman
coding is trivial as long as the statistics and/or coding tree are sent before the data to be
compressed (in the file header, say). This overhead becomes negligible if the data file is
sufficiently large.
The following are important properties of Huffman coding:
• Unique prefix property. No Huffman code is a prefix of any other Huffman code.
For instance, the code 0 assigned to L in Figure 7.5(c) is not a prefix of the code 10
for H or 110 for E or 111 for O; nor is the code 10 for H a prefix of the code 110 for
E or 111 for O. It turns out that the unique prefix property is guaranteed by the above
Huffman algorithm, since it always places all input symbols at the leaf nodes of the
Huffman tree. The Huffman code is one of the prefix codes for which the unique
prefix property holds. The code generated by the Shannon-Fano algorithm is another
such example.
This property is essential and also makes for an efficient decoder, since it precludes
any ambiguity in decoding. In the above example, if a bit 0 is received, the decoder can
immediately produce a symbol L without waiting for any more bits to be transmitted.
• Optimality. The Huffman code is a minimum-redundancy code, optimal for a given data model (i.e., a given, accurate probability distribution):

  - The two least frequent symbols will have the same length for their Huffman
  codes, differing only at the last bit. This should be obvious from the above
  algorithm.

  - Symbols that occur more frequently will have shorter Huffman codes than symbols
  that occur less frequently. Namely, for symbols s_i and s_j, if p_i ≥ p_j then
  l_i ≤ l_j, where l_i is the number of bits in the codeword for s_i.
  - It has been shown (see [2]) that the average code length for an information source
  S is strictly less than η + 1. Combined with Eq. (7.5), we have

        η ≤ l̄ < η + 1        (7.6)
Extended Huffman Coding. The discussion of Huffman coding so far assigns each
symbol a codeword that has an integer bit length. As stated earlier, log2(1/p_i) indicates the
amount of information contained in the symbol s_i, which corresponds to the
number of bits needed to represent it. When a particular symbol s_i has a large probability
(close to 1.0), log2(1/p_i) will be close to 0, and assigning one bit to represent that symbol will
be costly. Only when the probabilities of all symbols can be expressed as 2^(-k), where k is a
positive integer, would the average length of codewords be truly optimal - that is, l̄ = η.
Clearly, l̄ > η in most cases.
One way to address the problem of integral codeword length is to group several symbols
and assign a single codeword to the group. Huffman coding of this type is called Extended
Huffman Coding [2]. Assume an information source has alphabet S = {s1, s2, ..., sn}. If
k symbols are grouped together, then the extended alphabet is

    S^(k) = { s1 s1 ... s1,  s1 s1 ... s2,  ...,  s1 s1 ... sn,  s1 s1 ... s2 s1,  ...,  sn sn ... sn }

where each element is a block of k symbols. Note that the size of the new alphabet S^(k) is n^k. If k is relatively large (e.g., k ≥ 3), then
for most practical applications where n >> 1, n^k would be a very large number, implying a
huge symbol table. This overhead makes Extended Huffman Coding impractical.
As shown in [2], if the entropy of S is η, then the average number of bits needed for each
symbol in S is now

    η ≤ l̄ < η + 1/k        (7.7)
so we have shaved quite a bit from the coding schemes' bracketing of the theoretical best
limit. Nevertheless, this is not as much of an improvement over the original Huffman coding
(where group size is 1) as one might have hoped for.
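A quick sketch (ours) makes the blow-up concrete: grouping symbols from an alphabet of size n into blocks of k yields n^k super-symbols, which is exactly what itertools.product enumerates.

from itertools import product

def extended_alphabet(alphabet, k):
    # All length-k groupings of symbols: n**k super-symbols for an alphabet of size n.
    return ["".join(group) for group in product(alphabet, repeat=k)]

S = ["a", "b", "c"]                  # n = 3
print(len(extended_alphabet(S, 3)))  # 27 = 3**3; with n = 256 and k = 3 it is already 16,777,216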
The Huffman algorithm requires prior statistical knowledge about the information source,
and such information is often not available. This is particularly true in multimedia applica-
tions, where future data is unknown before its arrival, as for example in live (or streaming)
audio and video. Even when the statistics are available, the transmission of the symbol table
could represent heavy overhead.
For the non-extended version of Huffman coding, the above discussion assumes a so-
called order-0 model - that is, symbols/characters were treated singly, without any context
or history maintained. One possible way to include contextual information is to examine
k preceding (or succeeding) symbols each time; this is known as an order-k model. For
example, an order-1 model can incorporate such statistics as the probability of "qu" in
addition to the individual probabilities of "q" and "u". Nevertheless, this again implies that
much more statistical data has to be stored and sent for the order-k model when k ≥ 1.
The solution is to use adaptive compression algorithms, in which statistics are gathered
and updated dynamically as the datastream arrives. The probabilities are no longer based
on prior knowledge but on the actual data received so far. The new coding methods are
"adaptive" because, as the probability distribution of the received symbols changes, symbols
will be given new (longer or shorter) codes. This is especially desirable for multimedia
data, when the content (the music or the color of the scene) and hence the statistics can
change rapidly.
As an example, we introduce the Adaptive Huffman Coding algorithm in this section.
Many ideas, however, are also applicable to other adaptive compression algorithms.
ENCODER                              DECODER
-------                              -------

Initial_code();                      Initial_code();
while not EOF                        while not EOF
{                                    {
    get(c);                              decode(c);
    encode(c);                           output(c);
    update_tree(c);                      update_tree(c);
}                                    }
• Initial_code assigns symbols with some initially agreed-upon codes, without
any prior knowledge of the frequency counts for them. For example, some conven-
tional code such as ASCII may be used for coding character symbols.
• update_tree is a procedure for constructing an adaptive Huffman tree. It basically
does two things: it increments the frequency counts for the symbols (including any
new ones) and updates the configuration of the tree.

  - The Huffman tree must always maintain its sibling property - that is, all nodes
  (internal and leaf) are arranged in the order of increasing counts. Nodes are
  numbered in order from left to right, bottom to top. (See Figure 7.6, in which
  the first node is 1. A:(1), the second node is 2. B:(1), and so on, where the numbers
  in parentheses indicate the counts.) If the sibling property is about to be violated,
  a swap procedure is invoked to update the tree by rearranging the nodes.

  - When a swap is necessary, the farthest node with count N is swapped with the
  node whose count has just been increased to N + 1. Note that if the node with
  count N is not a leaf node - it is the root of a subtree - the entire subtree will
  go with it during the swap.

• The encoder and decoder must use exactly the same Initial_code and
update_tree routines.
Figure 7.6(a) depicts a Huffman tree with some symbols already received. Figure 7.6(b)
shows the updated tree after an additional A (i.e., the second A) was received. This increased
the count of A's to N + 1 = 2 and triggered a swap. In this case, the farthest node with
count N = 1 was D:(1). Hence, A:(2) and D:(1) were swapped.
Apparently, the same result could also be obtained by first swapping A:(2) with B:(1),
then with C:(1), and finally with D:(1). The problem is that such a procedure would take
three swaps; the rule of swapping with "the farthest node with count N" helps avoid such
unnecessary swaps.
FIGURE 7.6: Node swapping for updating an adaptive Huffman tree: (a) a Huffman tree; (b) receiving
the 2nd "A" triggered a swap; (c-1) a swap is needed after receiving the 3rd "A"; (c-2) another swap is needed;
(c-3) the Huffman tree after receiving the 3rd "A".
The update of the Huffman tree after receiving the third A is more involved and is
illustrated in the three steps shown in Figure 7.6(c-1) to (c-3). Since A:(2) will become
A:(3) (temporarily denoted as A:(2+1)), it is now necessary to swap A:(2+1) with the fifth
node. This is illustrated with an arrow in Figure 7.6(c-1).
Since the fifth node is a non-leaf node, the subtree with nodes 1. D:(1), 2. B:(1), and
5. (2) is swapped as a whole with A:(3). Figure 7.6(c-2) shows the tree after this first swap.
Now the seventh node will become (5+1), which triggers another swap with the eighth node.
Figure 7.6(c-3) shows the Huffman tree after this second swap.
The above example shows an update process that aims to maintain the sibling property
of the adaptive Huffman tree - the update of the tree sometimes requires more than one
swap. When this occurs, the swaps should be executed in multiple steps in a "bottom-up"
manner, starting from the lowest level where a swap is needed. In other words, the update
is carried out sequentially: tree nodes are examined in order, and swaps are made whenever
necessary.
To clearly illustrate more implementation details, let's examine another example. Here,
we show exactly what bits are sent, as opposed to simply stating how the tree is updated.
Let's assume that the initial code assignment for both the encoder and decoder simply
follows the ASCII order for the 26 symbols in an alphabet, A through Z, as Table 7.3
shows. To improve the implementation of the algorithm, we adopt an additional rule: if any
character/symbol is to be sent the first time, it must be preceded by a special symbol, NEW.
The initial code for NEW is 0. The count for NEW is always kept as 0 (the count is never
increased); hence it is always denoted as NEW:(0) in Figure 7.7.
Figure 7.7 shows the Huffman tree after each step. Initially, there is no tree. For the first
A, 0 for NEW and the initial code 00001 for A are sent. Afterward, the tree is built and
shown as the first one, labeled A. Now both the encoder and decoder have constructed the
same first tree, from which it can be seen that the code for the second A is 1. The code sent
is thus 1.
After the second A, the tree is updated, shown labeled as AA. The updates after receiving
D and C are similar. More subtrees are spawned, and the code for NEW is getting longer
- from 0 to 00 to 000.
TABLE 7.3: Initial code assignment for AADCCDD using adaptive Huffman coding.

    Symbol   Initial code
    NEW      0
    A        00001
    B        00010
    C        00011
    D        00100
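The initial code assignment in Table 7.3 is easy to generate (a small sketch of our own): NEW takes the single bit 0, and each letter is sent as its 1-based position in the alphabet written as a 5-bit binary number.

def initial_codes():
    # Initial code assignment used in the AADCCDD example: NEW plus 5-bit letter codes.
    codes = {"NEW": "0"}
    for i, letter in enumerate("ABCDEFGHIJKLMNOPQRSTUVWXYZ", start=1):
        codes[letter] = format(i, "05b")   # A -> 00001, B -> 00010, ...
    return codes

codes = initial_codes()
print(codes["A"], codes["B"], codes["C"], codes["D"])   # 00001 00010 00011 00100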
FIGURE 7.7: Adaptive Huffman trees for AADCCDD (one tree shown after each step).
From AADC to AADCC takes two swaps. To illustrate the update process clearly, this
is shown in three steps, with the required swaps again indicated by arrows.
• AADCC Step 2. After the swap between C and D, the count of the parent node of
C:(2) will be increased from 2 to 2 + 1 = 3; this requires its swap with A:(2).
Table 7.4 summarizes the sequence of symbols and code (zeros and ones) being sent to
the decoder.
It is important to emphasize that the code for a particular symbol often changes during
the adaptive Huffman coding process. The more frequent the symbol up to the moment, the
shorter the code. For example, after AADCCDD, when the character D overtakes A as the
most frequent symbol, its code changes from 101 to 0. This is of course fundamental for the
adaptive algorithm - codes are reassigned dynamically according to the new probability
distribution of the symbols.
The "Squeeze Page" on this book's web site provides a Java applet for adaptive Huffman
coding that should aid you in learning this algorithm.
7.5 DICTIONARY-BASED CODING

The Lempel-Ziv-Welch (LZW) algorithm employs an adaptively built dictionary (string table) of previously seen strings; both the encoder and the decoder construct this table on the fly. The LZW compression algorithm can be summarized as follows:

BEGIN
    s = next input character;
    while not EOF
    {
        c = next input character;
        if s + c exists in the dictionary
            s = s + c;
        else
        {
            output the code for s;
            add string s + c to the dictionary with a new code;
            s = c;
        }
    }
    output the code for s;
END
Let's start with a very simple dictionary (also referred to as a string table), initially containing only three characters, with codes as follows:

    code   string
    1      A
    2      B
    3      C
Now if the input string is ABABBABCABABBA, the LZW compression algorithm works
as follows:
    s      c      output   code   string
                            1      A
                            2      B
                            3      C
    -----------------------------------------
    A      B      1         4      AB
    B      A      2         5      BA
    A      B
    AB     B      4         6      ABB
    B      A
    BA     B      5         7      BAB
    B      C      2         8      BC
    C      A      3         9      CA
    A      B
    AB     A      4         10     ABA
    A      B
    AB     B
    ABB    A      6         11     ABBA
    A      EOF    1
The output codes are 1 2 4 5 2 3 4 6 1. Instead of 14 characters, only 9 codes need to be
sent. If we assume each character or code is transmitted as a byte, that is quite a saving (the
compression ratio would be 14/9 = 1.56). (Remember, LZW is an adaptive algorithm,
in which the encoder and decoder independently build their own string tables. Hence, there
is no overhead involving transmitting the string table.)
Obviously, for our illustration the above example is replete with a great deal of redundancy
in the input string, which is why it achieves compression so quickly. In general, savings for
LZW would not come until the text is more than a few hundred bytes long.
The above LZW algorithm is simple, and it makes no effort in selecting optimal new
strings to enter into its dictionary. As a result, its string table grows rapidly, as illustrated
above. A typical LZW implementation for textual data uses a 12-bit codelength. Hence,
its dictionary can contain up to 4,096 entries, with the first 256 (0--255) entries being
ASCII codes. If we take this into account, the above compression ratio is reduced to
(14 x 8)/(9 x 12) = 1.04.
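A short Python version of the compression procedure above (our own sketch; for this example the dictionary is seeded with just A, B, and C rather than the usual 256 ASCII entries) reproduces the output codes 1 2 4 5 2 3 4 6 1:

def lzw_encode(text, dictionary):
    # LZW compression. `dictionary` maps strings to codes and is updated in place.
    next_code = max(dictionary.values()) + 1
    output = []
    s = text[0]
    for c in text[1:]:
        if s + c in dictionary:
            s = s + c                        # keep extending the current string
        else:
            output.append(dictionary[s])     # emit the code for the longest match
            dictionary[s + c] = next_code    # remember the new string
            next_code += 1
            s = c
    output.append(dictionary[s])             # flush the final string
    return output

print(lzw_encode("ABABBABCABABBA", {"A": 1, "B": 2, "C": 3}))
# [1, 2, 4, 5, 2, 3, 4, 6, 1]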
The LZW decompression of these output codes proceeds as follows, with the decoder rebuilding the same dictionary as it goes:

    s      k     entry/output   code   string
                                1      A
                                2      B
                                3      C
    ---------------------------------------------
    NIL    1     A
    A      2     B              4      AB
    B      4     AB             5      BA
    AB     5     BA             6      ABB
    BA     2     B              7      BAB
    B      3     C              8      BC
    C      4     AB             9      CA
    AB     6     ABB            10     ABA
    ABB    1     A              11     ABBA
    A      EOF
Apparently the output string is ABABBABCABABBA - a truly lossless result!
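The corresponding simple decompression routine is easy to sketch as well (our own code, mirroring the trace above): output the dictionary entry for each incoming code, and after each step add the previous string plus the first character of the current entry as a new dictionary string. As discussed next, this simple version can fail when a code arrives one step before the decoder has built the corresponding entry.

def lzw_decode(codes, dictionary):
    # Simple LZW decompression. `dictionary` maps codes to strings (updated in place).
    next_code = max(dictionary) + 1
    s = dictionary[codes[0]]
    output = [s]
    for k in codes[1:]:
        entry = dictionary[k]                 # fails (KeyError) in the pathological case below
        output.append(entry)
        dictionary[next_code] = s + entry[0]  # previous string + first char of current entry
        next_code += 1
        s = entry
    return "".join(output)

print(lzw_decode([1, 2, 4, 5, 2, 3, 4, 6, 1], {1: "A", 2: "B", 3: "C"}))
# ABABBABCABABBA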
LZW Algorithm Details A more careful examination of the above simple version of
the LZW decompression algorithm will reveal a potential problem. In adaptively updating
the dictionaries, the encoder is sometimes ahead of the decoder. For example, after the
sequence ABABB, the encoder will output code 4 and create a dictionary entry with code
6 for the new string ABB.
On the decoder side, after receiving the code 4, the output will be AB, and the dictionary
is updated with code 5 for a new string, BA. This occurs several times in the above example,
such as after the encoder outputs another code 4, then code 6. In a way, this is anticipated -
after all, it is a sequential process, and the encoder had to be ahead. In this example, this
did not cause a problem.
Welch [11] points out that the simple version of the LZW decompression algorithm will
break down when the following scenario occurs. Assume that the input string is
ABABBABCABBABBAX....
The LZW encoder:
    s      c      output   code   string
                            1      A
                            2      B
                            3      C
    -----------------------------------------
    A      B      1         4      AB
    B      A      2         5      BA
    A      B
    AB     B      4         6      ABB
    B      A
    BA     B      5         7      BAB
    B      C      2         8      BC
    C      A      3         9      CA
    A      B
    AB     B