Chapter 7: Lossless Compression Algorithms
7.1 INTRODUCTION
The emergence of multimedia technologies has made digital libraries a reality. Nowadays,
libraries, museums, film studios, and governments are converting more and more data and
archives into digital form. Some of the data (e.g., precious books and paintings) indeed
need to be stored without any loss.
As a start, suppose we want to encode the call numbers of the 120 million or so items
in the Library of Congress (a mere 20 million, if we consider just books). Why don't we
just transmit each item as a 27-bit number, giving each item a unique binary code (since
2^27 > 120,000,000)?
The main problem is that this "great idea" requires too many bits. And in fact there exist
many coding techniques that will effectively reduce the total number of bits needed to rep-
resent the above information. The process involved is generally referred to as compression
[1,2].
In Chapter 6, we had a beginning look at compression schemes aimed at audio. There, we
had to first consider the complexity of transforming analog signals to digital ones, whereas
here, we shall consider that we at least start with digital signals. For example, even though
we know an image is captured using analog signals, the file produced by a digital camera
is indeed digital. The more general problem of coding (compressing) a set of any symbols,
not just byte values, say, has been studied for a long time.
Getting back to our Library of Congress problem, it is well known that certain parts of
call numbers appear more frequently than others, so it would be more economical to assign
fewer bits as their codes. This is known as variable-length coding (VLC) - the more
frequently appearing symbols are coded with fewer bits per symbol, and vice versa. As a
result, fewer bits are usually needed to represent the whole collection.
In this chapter we study the basics of information theory and several popular lossless
compression techniques. Figure 7.1 depicts a general data compression scheme, in which
compression is performed by an encoder and decompression is performed by a decoder.
We call the output of the encoder codes or codewords. The intermediate medium could
either be data storage or a communication/computer network. If the compression and
decompression processes induce no information loss, the compression scheme is lossless;
otherwise, it is lossy. The next several chapters deal with lossy compression algorithms as
they are commonly used for image, video, and audio compression. Here, we concentrate
on lossless compression.
FIGURE 7.1: A general data compression scheme: input data passes through the encoder (compression), is stored or transmitted over a network, and is reconstructed by the decoder (decompression) as output data.
If the total number of bits required to represent the data before compression is B0 and the
total number of bits required to represent the data after compression is B1, then we define
the compression ratio as

    compression ratio = B0 / B1        (7.1)
7.2 BASICS OF INFORMATION THEORY

According to the famous scientist Claude E. Shannon of Bell Labs [3, 4], the entropy η of
an information source with alphabet S = {s1, s2, ..., sn} is defined as

    η = H(S) = Σ_{i=1}^{n} p_i · log2(1/p_i)        (7.2)

             = − Σ_{i=1}^{n} p_i · log2 p_i          (7.3)

where p_i is the probability that symbol s_i will occur in S.
¹ Since we have chosen 2 as the base for logarithms in the above definition, the unit of information is the bit -
naturally also most appropriate for the binary code representation used in digital computers. If the log base is 10,
the unit is the hartley; if the base is e, the unit is the nat.
Consider, for example, sorting a deck of cards by repeatedly deciding whether or not to swap a pair of cards. For
every decision to swap or not, we impart 1 bit of information to the card system and transfer
1 bit of negative entropy to the card deck.
The definition of entropy includes the idea that two decisions means the transfer of twice
the negative entropy in its use of the log base 2. A two-bit vector can have 2 2 states, and the
logarithm takes this value into 2 bits of negative entropy. Twice as many sorting decisions
impart twice the entropy change.
Now suppose we wish to communicate those swapping decisions, via a network, say.
Then for our two decisions we'd have to send 2 bits. If we had a two-decision system, then
of course the average number of bits for all such communications would also be 2 bits. If
we like, we can think of the possible number of states in our 2-bit system as four outcomes.
Each outcome has probability 1/4. So on average, the number of bits to send per outcome
is 4 × (1/4) × log2(1/(1/4)) = 2 bits - no surprise here. To communicate (transmit) the
results of our two decisions, we would need to transmit 2 bits.
But if the probability for one of the outcomes were higher than the others, the average
number of bits we'd send would be different. (This situation might occur if the deck
were already partially ordered, so that the probability of a not-swap were higher than for
a swap.) Suppose the probability of one of our four states were 1/2, and the other three
states each had probability 1/6 of occurring. To extend our modeling of how many bits
to send on average, we need to go to noninteger powers of 2 for probabilities. Then we
can use a logarithm to ask how many (float) bits of information must be sent to transmit
the information content. Equation (7.3) says that in this case, we'd have to send just
(1/2) x log2(2) + 3 x (1/6) x log2(6) = 1.7925 bits, a value less than 2. This reflects
the idea that if we could somehow encode our four states, such that the most-occurring one
means fewer bits to send, we'd do better (fewer bits) on average.
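To make the arithmetic concrete, here is a small Python sketch (our own illustration, not from the text; the function name entropy is ours) that evaluates Eq. (7.3) for a list of probabilities. Running it on the four-state example reproduces the 1.7925 bits just computed.

import math

def entropy(probabilities):
    # Shannon entropy, Eq. (7.3): sum of p_i * log2(1/p_i), measured in bits.
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)

# Four equally likely outcomes: 2 bits per outcome, as expected.
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0

# One outcome with probability 1/2, three with probability 1/6 each.
print(entropy([1/2, 1/6, 1/6, 1/6]))       # approximately 1.7925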
The definition of entropy is aimed at identifying often-occurring symbols in the data-
stream as good candidates for short codewords in the compressed bitstream. As described
earlier, we use a variable-length coding scheme for entropy coding - frequently-occurring
symbols are given codes that are quickly transmitted, while infrequently-occurring ones are
given longer codes. For example, E occurs frequently in English, so we should give it a
shorter code than Q, say.
This aspect of "surprise" in receiving an infrequent symbol in the datastream is reflected
in the definition (7.3). For if a symbol occurs rarely, its probability p_i is low (e.g., 1/100),
and thus its logarithm is a large negative number. This reflects the fact that it takes a longer
bitstring to encode it. The probabilities p_i sitting outside the logarithm in Eq. (7.3) say that
over a long stream, the symbols come by with an average frequency equal to the probability
of their occurrence. This weighting should multiply the long or short information content
given by the element of "surprise" in seeing a particular symbol.
As another concrete example, if the information source S is a gray-level digital image,
each s_i is a gray-level intensity ranging from 0 to (2^k - 1), where k is the number of bits
used to represent each pixel in an uncompressed image. The range is often [0, 255], since
8 bits are typically used: this makes a convenient one byte per pixel. The image histogram
(as discussed in Chapter 3) is a way of calculating the probability p_i of having pixels with
gray-level intensity i in the image.
One wrinkle in the algorithm implied by Eq. (7.3) is that if a symbol occurs with zero
frequency, we simply don't count it into the entropy: we cannot take a log of zero.
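As a rough sketch of how this is typically computed (our own code, not from the text; image_entropy is a hypothetical helper), the histogram counts are normalized into probabilities and fed into Eq. (7.3), with zero-count bins simply skipped:

import math

def image_entropy(histogram):
    # Entropy of an image, given its histogram as a list of per-intensity pixel counts.
    total = sum(histogram)
    eta = 0.0
    for count in histogram:
        if count == 0:
            continue            # zero-frequency symbols are not counted
        p = count / total
        eta += p * math.log2(1.0 / p)
    return eta

# A 4-level image where only two intensities actually occur.
print(image_entropy([0, 30, 10, 0]))   # about 0.81 bits per pixel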
FIGURE 7.2: Histograms for two gray-level images: (a) a uniform distribution, with p_i = 1/256 for all i; (b) a distribution in which 1/3 of the pixels are rather dark and 2/3 are rather bright.
Figure 7.2(a) shows the histogram of an image with a uniform distribution of gray-level
intensities - that is, p_i = 1/256 for all i. Hence, the entropy of this image is

    η = Σ_{i=0}^{255} (1/256) · log2 256 = 8        (7.4)
As can be seen in Eq. (7.3), the entropy η is a weighted sum of terms log2(1/p_i); hence it
represents the average amount of information contained per symbol in the source S. For
a memoryless source² S, the entropy η represents the minimum average number of bits
required to represent each symbol in S. In other words, it specifies the lower bound for the
average number of bits to code each symbol in S.
If we use l̄ to denote the average length (measured in bits) of the codewords produced
by the encoder, the Shannon Coding Theorem states that the entropy is the best we can do
(under certain conditions):

    η ≤ l̄        (7.5)

Coding schemes aim to get as close as possible to this theoretical lower bound.
It is interesting to observe that in the above uniform-distribution example we found that
η = 8 - the minimum average number of bits to represent each gray-level intensity is at
least 8. No compression is possible for this image! In the context of imaging, this will
correspond to the "worst case," where neighboring pixel values have no similarity.
Figure 7.2(b) shows the histogram of another image, in which 1/3 of the pixels are rather
dark and 2/3 of them are rather bright. The entropy of this image is

    η = (1/3) · log2 3 + (2/3) · log2(3/2)
      ≈ 0.33 × 1.59 + 0.67 × 0.59 = 0.52 + 0.40 = 0.92

In general, the entropy is greater when the probability distribution is flat and smaller when
it is more peaked.
² An information source that is independently distributed, meaning that the value of the current symbol does
not depend on the values of the previously appearing symbols.
7.3 RUN-LENGTH CODING
Instead of assuming a memoryless source, run-length coding (RLC) exploits memory present
in the information source. It is one of the simplest forms of data compression. The basic
idea is that if the information source we wish to compress has the property that symbols
tend to form continuous groups, instead of coding each symbol in the group individually,
we can code one such symbol and the length of the group.
As an example, consider a bilevel image (one with only 1-bit black-and-white pixels)
with monotone regions. This information source can be efficiently coded using run-length
coding. In fact, since there are only two symbols, we do not even need to code any symbol
at the start of each run. Instead, we can assume that the starting run is always of a particular
color (either black or white) and simply code the length of each run.
The above description is the one-dimensional run-length coding algorithm. A two-
dimensional variant of it is usually used to code bilevel images. This algorithm uses the
coded run information in the previous row of the image to code the run in the current row.
A full description of this algorithm can be found in [5].
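As an illustration of the one-dimensional scheme (our own sketch, using assumed conventions rather than those of any particular standard), the encoder below emits only run lengths and assumes the first run is white, as described above:

def rle_encode(pixels):
    # One-dimensional run-length coding of a bilevel scanline (0 = white, 1 = black).
    # Only run lengths are emitted; the first run is assumed white, so a line that
    # actually starts with black begins with a zero-length run.
    runs = []
    current, length = 0, 0
    for p in pixels:
        if p == current:
            length += 1
        else:
            runs.append(length)
            current, length = p, 1
    runs.append(length)
    return runs

def rle_decode(runs):
    # Rebuild the scanline from the run lengths, alternating colors starting from white.
    pixels, color = [], 0
    for length in runs:
        pixels.extend([color] * length)
        color = 1 - color
    return pixels

line = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1]
codes = rle_encode(line)               # [3, 2, 4, 1]
assert rle_decode(codes) == line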
7.4 VARIABLE-LENGTH CODING (VLC)

Since the entropy indicates the information content in an information source S, it leads to
a family of coding methods commonly known as entropy coding methods. As described
earlier, variable-length coding (VLC) is one of the best-known such methods. Here, we
will study the Shannon-Fano algorithm, Huffman coding, and adaptive Huffman coding.
The Shannon-Fano algorithm was independently developed by Shannon at Bell Labs and
Robert Fano at MIT [6]. To illustrate the algorithm, let's suppose the symbols to be coded
are the characters in the word HELLO. The frequency count of the symbols is

    Symbol    H    E    L    O
    Count     1    1    2    1
The encoding steps of the Shannon-Fano algorithm can be presented in the following
top-down manner:

1. Sort the symbols according to the frequency count of their occurrences.

2. Recursively divide the symbols into two parts, each with approximately the same number of counts, until all parts contain only one symbol.

FIGURE 7.3: Coding tree for HELLO by the Shannon-Fano algorithm.
Initially, the symbols are sorted as LHEO. As Figure 7.3 shows, the first division yields
two parts: (a) L with a count of 2, denoted as L:(2); and (b) H, E, and O with a total count
of 3, denoted as H,E,O:(3). The second division yields H:(1) and E,O:(2). The last division
is E:(1) and O:(1).
Table 7.1 summarizes the result, showing each symbol, its frequency count, information
content (log2(1/p_i)), resulting codeword, and the number of bits needed to encode each symbol
in the word HELLO. The total number of bits used is shown at the bottom.

TABLE 7.1: One result of performing the Shannon-Fano algorithm on HELLO.

    Symbol   Count   log2(1/p_i)   Code   Number of bits used
    L        2       1.32          0      2
    H        1       2.32          10     2
    E        1       2.32          110    3
    O        1       2.32          111    3
    TOTAL number of bits: 10
To revisit the previous discussion on entropy, in this case

    η = p_L · log2(1/p_L) + p_H · log2(1/p_H) + p_E · log2(1/p_E) + p_O · log2(1/p_O)
      = 0.4 × 1.32 + 0.2 × 2.32 + 0.2 × 2.32 + 0.2 × 2.32 = 1.92
FIGURE 7.4: Another coding tree for HELLO by the Shannon-Fano algorithm.
This suggests that the minimum average number of bits to code each character in the word
HELLO would be at least 1.92. In this example, the Shannon-Fano algorithm uses an
average of 10/5 = 2 bits to code each symbol, which is fairly close to the lower bound of
1.92. Apparently, the result is satisfactory.
It should be pointed out that the outcome of the Shannon-Fano algorithm is not neces-
sarily unique. For instance, at the first division in the above example, it would be equally
valid to divide into the two parts L,H:(3) and E,O:(2). This would result in the coding in
Figure 7.4. Table 7.2 shows the codewords are different now. Also, these two sets of code-
words may behave differently when errors are present. Coincidentally, the total number of
bits required to encode the word HELLO remains at 10.
The Shannon-Fano algorithm delivers satisfactory coding results for data compression,
but it was soon outperformed and overtaken by the Huffman coding method.
TABLE 7.2: Another result of performing the Shannon-Fano algorithm on HELLO.

    Symbol   Count   log2(1/p_i)   Code   Number of bits used
    L        2       1.32          00     4
    H        1       2.32          01     2
    E        1       2.32          10     2
    O        1       2.32          11     2
    TOTAL number of bits: 10
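The recursive top-down division can be sketched in a few lines of Python (our own illustration; the split rule simply picks the division point whose two halves have total counts as nearly equal as possible, matching the description above):

def shannon_fano(symbols):
    # symbols: list of (symbol, count) pairs, sorted by decreasing count.
    # Returns a dict mapping each symbol to its codeword (a string of '0'/'1').
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(count for _, count in symbols)
    running, split, best_diff = 0, 1, float("inf")
    for i in range(1, len(symbols)):
        running += symbols[i - 1][1]
        diff = abs(total - 2 * running)    # imbalance if we split before position i
        if diff < best_diff:
            best_diff, split = diff, i
    codes = {}
    for sym, code in shannon_fano(symbols[:split]).items():
        codes[sym] = "0" + code            # left part gets prefix 0
    for sym, code in shannon_fano(symbols[split:]).items():
        codes[sym] = "1" + code            # right part gets prefix 1
    return codes

print(shannon_fano([("L", 2), ("H", 1), ("E", 1), ("O", 1)]))
# e.g. {'L': '0', 'H': '10', 'E': '110', 'O': '111'}, as in Figure 7.3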
FIGURE 7.5: Coding tree for HELLO using the Huffman algorithm.

In contrast to Shannon-Fano, which is top-down, the Huffman algorithm builds its coding tree bottom-up. Its encoding steps can be described as follows:
1. Initialization: put all symbols on the list sorted according to their frequency counts.

2. Repeat until the list has only one symbol left.

   (a) From the list, pick two symbols with the lowest frequency counts. Form a
   Huffman subtree that has these two symbols as child nodes and create a parent
   node for them.

   (b) Assign the sum of the children's frequency counts to the parent and insert it into
   the list, such that the order is maintained.

   (c) Delete the children from the list.

3. Assign a codeword for each leaf based on the path from the root.

In Figure 7.5, new symbols P1, P2, P3 are created to refer to the parent nodes in
the Huffman coding tree. The contents of the list after each step are:

    After initialization:   L H E O
    After iteration (a):    L P1 H
    After iteration (b):    L P2
    After iteration (c):    P3
For this simple example, the Huffman algorithm apparently generated the same coding
result as one of the Shannon-Fano results shown in Figure 7.3, although the results are
usually better. The average number of bits used to code each character is also 2 (i.e.,
(1 + 1 + 2 + 3 + 3)/5 = 2). As another simple example, consider a text string containing
a set of characters and their frequency counts as follows: A:(15), B:(7), C:(6), D:(6), and
E:(5). It is easy to show that the Shannon-Fano algorithm needs a total of 89 bits to encode
this string, whereas the Huffman algorithm needs only 87.
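A compact sketch of the bottom-up Huffman procedure (our own code, using Python's heapq module as the sorted list; the exact codewords may differ from those in the figures, but the code lengths do not) confirms the 87-bit total quoted above:

import heapq

def huffman_codes(freqs):
    # freqs: dict mapping symbol -> frequency count. Returns dict symbol -> codeword.
    # Each heap entry is (count, tie_breaker, {symbol: partial_code}).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)      # the two lowest-count nodes
        c2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]

freqs = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
codes = huffman_codes(freqs)
print(sum(freqs[s] * len(codes[s]) for s in freqs))   # 87 bits in total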
As shown above, if correct probabilities ("prior statistics") are available and accurate,
the Huffman coding method produces good compression results. Decoding for the Huffman
coding is trivial as long as the statistics and/or coding tree are sent before the data to be
compressed (in the file header, say). This overhead becomes negligible if the data file is
sufficiently large.
The following are important properties of Huffman coding:
• Unique prefix property. No Huffman code is a prefix of any other Huffman code.
For instance, the code 0 assigned to L in Figure 7.5(c) is not a prefix of the code 10
for H or 110 for E or 111 for O; nor is the code 10 for H a prefix of the code 110 for
E or 111 for O. It turns out that the unique prefix property is guaranteed by the above
Huffman algorithm, since it always places all input symbols at the leaf nodes of the
Huffman tree. The Huffman code is one of the prefix codes for which the unique
prefix property holds. The code generated by the Shannon-Fano algorithm is another
such example.
This property is essential and also makes for an efficient decoder, since it precludes
any ambiguity in decoding. In the above example, if a bit 0 is received, the decoder can
immediately produce a symbol L without waiting for any more bits to be transmitted.
• Optimality. The Huffman code is a minimum-redundancy code, optimal for a given data model (i.e., a given, accurate probability distribution):

  - The two least frequent symbols will have the same length for their Huffman
  codes, differing only at the last bit. This should be obvious from the above
  algorithm.

  - Symbols that occur more frequently will have shorter Huffman codes than symbols
  that occur less frequently. Namely, for symbols s_i and s_j, if p_i ≥ p_j then
  l_i ≤ l_j, where l_i is the number of bits in the codeword for s_i.
  - It has been shown (see [2]) that the average code length for an information source
  S is strictly less than η + 1. Combined with Eq. (7.5), we have

        η ≤ l̄ < η + 1        (7.6)
Extended Huffman Coding. The discussion of Huffman coding so far assigns each
symbol a codeword that has an integer bit length. As stated earlier, log2(1/p_i) indicates the
amount of information contained in the symbol s_i, which corresponds to the
number of bits needed to represent it. When a particular symbol s_i has a large probability
(close to 1.0), log2(1/p_i) will be close to 0, and assigning one bit to represent that symbol will
be costly. Only when the probabilities of all symbols can be expressed as 2^(-k), where k is a
positive integer, would the average length of codewords be truly optimal - that is, l̄ = η.
Clearly, l̄ > η in most cases.
One way to address the problem of integral codeword length is to group several symbols
and assign a single codeword to the group. Huffman coding of this type is called Extended
Huffman Coding [2]. Assume an information source has alphabet S = {s1, s2, ..., sn}. If
k symbols are grouped together, then the extended alphabet is

    S^(k) = { s1 s1 ... s1,  s1 s1 ... s2,  ...,  s1 s1 ... sn,  s1 s1 ... s2 s1,  ...,  sn sn ... sn }

where each element is a block of k symbols. Note that the size of the new alphabet S^(k) is n^k. If k is relatively large (e.g., k ≥ 3), then
for most practical applications where n >> 1, n^k would be a very large number, implying a
huge symbol table. This overhead makes Extended Huffman Coding impractical.
As shown in [2], if the entropy of S is η, then the average number of bits needed for each
symbol in S is now

    η ≤ l̄ < η + 1/k        (7.7)
so we have shaved quite a bit from the coding schemes' bracketing of the theoretical best
limit. Nevertheless, this is not as much of an improvement over the original Huffman coding
(where group size is 1) as one might have hoped for.
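A quick sketch (ours) makes the blow-up concrete: grouping symbols from an alphabet of size n into blocks of k yields n^k super-symbols, which is exactly what itertools.product enumerates.

from itertools import product

def extended_alphabet(alphabet, k):
    # All length-k groupings of symbols: n**k super-symbols for an alphabet of size n.
    return ["".join(group) for group in product(alphabet, repeat=k)]

S = ["a", "b", "c"]                  # n = 3
print(len(extended_alphabet(S, 3)))  # 27 = 3**3; with n = 256 and k = 3 it is already 16,777,216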
The Huffman algorithm requires prior statistical knowledge about the information source,
and such information is often not available. This is particularly true in multimedia applica-
tions, where future data is unknown before its arrival, as for example in live (or streaming)
audio and video. Even when the statistics are available, the transmission of the symbol table
could represent heavy overhead.
For the non-extended version of Huffman coding, the above discussion assumes a so-
called order-0 model - that is, symbols/characters were treated singly, without any context
or history maintained. One possible way to include contextual information is to examine
k preceding (or succeeding) symbols each time; this is known as an order-k model. For
example, an order-1 model can incorporate such statistics as the probability of "qu" in
addition to the individual probabilities of "q" and "u". Nevertheless, this again implies that
much more statistical data has to be stored and sent for the order-k model when k ≥ 1.
The solution is to use adaptive compression algorithms, in which statistics are gathered
and updated dynamically as the datastream arrives. The probabilities are no longer based
on prior knowledge but on the actual data received so far. The new coding methods are
"adaptive" because, as the probability distribution of the received symbols changes, symbols
will be given new (longer or shorter) codes. This is especially desirable for multimedia
data, when the content (the music or the color of the scene) and hence the statistics can
change rapidly.
As an example, we introduce the Adaptive Huffman Coding algorithm in this section.
Many ideas, however, are also applicable to other adaptive compression algorithms.
ENCODER                              DECODER
-------                              -------

Initial_code();                      Initial_code();
while not EOF                        while not EOF
{                                    {
    get(c);                              decode(c);
    encode(c);                           output(c);
    update_tree(c);                      update_tree(c);
}                                    }
• Initial_code assigns symbols with some initially agreed-upon codes, without
any prior knowledge of the frequency counts for them. For example, some conven-
tional code such as ASCII may be used for coding character symbols.
• update_tree is a procedure for constructing an adaptive Huffman tree. It basically
does two things: it increments the frequency counts for the symbols (including any
new ones) and updates the configuration of the tree.

  - The Huffman tree must always maintain its sibling property - that is, all nodes
  (internal and leaf) are arranged in the order of increasing counts. Nodes are
  numbered in order from left to right, bottom to top. (See Figure 7.6, in which
  the first node is 1. A:(1), the second node is 2. B:(1), and so on, where the numbers
  in parentheses indicate the counts.) If the sibling property is about to be violated,
  a swap procedure is invoked to update the tree by rearranging the nodes.

  - When a swap is necessary, the farthest node with count N is swapped with the
  node whose count has just been increased to N + 1. Note that if the node with
  count N is not a leaf node - it is the root of a subtree - the entire subtree will
  go with it during the swap.

• The encoder and decoder must use exactly the same Initial_code and
update_tree routines.
Figure 7.6(a) depicts a Huffman tree with some symbols already received. Figure 7.6(b)
shows the updated tree after an additional A (i.e., the second A) was received. This increased
the count of A's to N + 1 = 2 and triggered a swap. In this case, the farthest node with
count N = 1 was D:(1). Hence, A:(2) and D:(1) were swapped.
Apparently, the same result could also be obtained by first swapping A:(2) with B:(1),
then with C:(1), and finally with D:(1). The problem is that such a procedure would take
three swaps; the rule of swapping with "the farthest node with count N" helps avoid such
unnecessary swaps.
FIGURE 7.6: Node swapping for updating an adaptive Huffman tree: (a) a Huffman tree; (b) receiving
the 2nd "A" triggered a swap; (c-1) a swap is needed after receiving the 3rd "A"; (c-2) another swap is needed;
(c-3) the Huffman tree after receiving the 3rd "A".
The update of the Huffman tree after receiving the third A is more involved and is
illustrated in the three steps shown in Figure 7.6(c-1) to (c-3). Since A:(2) will become
A:(3) (temporarily denoted as A:(2+1)), it is now necessary to swap A:(2+1) with the fifth
node. This is illustrated with an arrow in Figure 7.6(c-1).
Since the fifth node is a non-leaf node, the subtree with nodes 1. D:(1), 2. B:(1), and
5. (2) is swapped as a whole with A:(3). Figure 7.6(c-2) shows the tree after this first swap.
Now the seventh node will become (5+1), which triggers another swap with the eighth node.
Figure 7.6(c-3) shows the Huffman tree after this second swap.
The above example shows an update process that aims to maintain the sibling property
of the adaptive Huffman tree - the update of the tree sometimes requires more than one
swap. When this occurs, the swaps should be executed in multiple steps in a "bottom-up"
manner, starting from the lowest level where a swap is needed. In other words, the update
is carried out sequentially: tree nodes are examined in order, and swaps are made whenever
necessary.
To clearly illustrate more implementation details, let's examine another example. Here,
we show exactly what bits are sent, as opposed to simply stating how the tree is updated.
Let's assume that the initial code assignment for both the encoder and decoder simply
follows the ASCII order for the 26 symbols in an alphabet, A through Z, as Table 7.3
shows. To improve the implementation of the algorithm, we adopt an additional rule: if any
character/symbol is to be sent the first time, it must be preceded by a special symbol, NEW.
The initial code for NEW is 0. The count for NEW is always kept as 0 (the count is never
increased); hence it is always denoted as NEW:(0) in Figure 7.7.
Figure 7.7 shows the Huffman tree after each step. Initially, there is no tree. For the first
A, 0 for NEW and the initial code 00001 for A are sent. Afterward, the tree is built and
shown as the first one, labeled A. Now both the encoder and decoder have constructed the
same first tree, from which it can be seen that the code for the second A is 1. The code sent
is thus 1.
After the second A, the tree is updated, shown labeled as AA. The updates after receiving
D and C are similar. More subtrees are spawned, and the code for NEW is getting longer
- from 0 to 00 to 000.
TABLE 7.3: Initial code assignment for AADCCDD using adaptive Huffman coding.

    Symbol   Initial code
    NEW      0
    A        00001
    B        00010
    C        00011
    D        00100
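The initial code assignment in Table 7.3 is easy to generate (a small sketch of our own): NEW takes the single bit 0, and each letter is sent as its 1-based position in the alphabet written as a 5-bit binary number.

def initial_codes():
    # Initial code assignment used in the AADCCDD example: NEW plus 5-bit letter codes.
    codes = {"NEW": "0"}
    for i, letter in enumerate("ABCDEFGHIJKLMNOPQRSTUVWXYZ", start=1):
        codes[letter] = format(i, "05b")   # A -> 00001, B -> 00010, ...
    return codes

codes = initial_codes()
print(codes["A"], codes["B"], codes["C"], codes["D"])   # 00001 00010 00011 00100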
FIGURE 7.7: Adaptive Huffman trees for AADCCDD (one tree shown after each step).
From AADC to AADCC takes two swaps. To illustrate the update process clearly, this
is shown in three steps, with the required swaps again indicated by arrows.
• AADCC Step 2. After the swap between C and D, the count of the parent node of
C:(2) will be increased from 2 to 2 + 1 = 3; this requires its swap with A:(2).
Table 7.4 summarizes the sequence of symbols and code (zeros and ones) being sent to
the decoder.
It is important to emphasize that the code for a particular symbol often changes during
the adaptive Huffman coding process. The more frequent the symbol up to the moment, the
shorter the code. For example, after AADCCDD, when the character D overtakes A as the
most frequent symbol, its code changes from 101 to 0. This is of course fundamental for the
adaptive algorithm - codes are reassigned dynamically according to the new probability
distribution of the symbols.
The "Squeeze Page" on this book's web site provides a Java applet for adaptive Huffman
coding that should aid you in learning this algorithm.
7.5 DICTIONARY-BASED CODING

The Lempel-Ziv-Welch (LZW) algorithm employs an adaptively built dictionary (string table) of previously seen strings; both the encoder and the decoder construct this table on the fly. The LZW compression algorithm can be summarized as follows:

BEGIN
    s = next input character;
    while not EOF
    {
        c = next input character;
        if s + c exists in the dictionary
            s = s + c;
        else
        {
            output the code for s;
            add string s + c to the dictionary with a new code;
            s = c;
        }
    }
    output the code for s;
END
Let's start with a very simple dictionary (also referred to as a string table), initially containing only three characters, with codes as follows:

    code   string
    1      A
    2      B
    3      C
Now if the input string is ABABBABCABABBA, the LZW compression algorithm works
as follows:
    s      c      output   code   string
                            1      A
                            2      B
                            3      C
    -----------------------------------------
    A      B      1         4      AB
    B      A      2         5      BA
    A      B
    AB     B      4         6      ABB
    B      A
    BA     B      5         7      BAB
    B      C      2         8      BC
    C      A      3         9      CA
    A      B
    AB     A      4         10     ABA
    A      B
    AB     B
    ABB    A      6         11     ABBA
    A      EOF    1
The output codes are 1 2 4 5 2 3 4 6 1. Instead of 14 characters, only 9 codes need to be
sent. If we assume each character or code is transmitted as a byte, that is quite a saving (the
compression ratio would be 14/9 = 1.56). (Remember, LZW is an adaptive algorithm,
in which the encoder and decoder independently build their own string tables. Hence, there
is no overhead involving transmitting the string table.)
Obviously, for our illustration the above example is replete with a great deal of redundancy
in the input string, which is why it achieves compression so quickly. In general, savings for
LZW would not come until the text is more than a few hundred bytes long.
The above LZW algorithm is simple, and it makes no effort in selecting optimal new
strings to enter into its dictionary. As a result, its string table grows rapidly, as illustrated
above. A typical LZW implementation for textual data uses a 12-bit codelength. Hence,
its dictionary can contain up to 4,096 entries, with the first 256 (0--255) entries being
ASCII codes. If we take this into account, the above compression ratio is reduced to
(14 x 8)/(9 x 12) = 1.04.
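A short Python version of the compression procedure above (our own sketch; for this example the dictionary is seeded with just A, B, and C rather than the usual 256 ASCII entries) reproduces the output codes 1 2 4 5 2 3 4 6 1:

def lzw_encode(text, dictionary):
    # LZW compression. `dictionary` maps strings to codes and is updated in place.
    next_code = max(dictionary.values()) + 1
    output = []
    s = text[0]
    for c in text[1:]:
        if s + c in dictionary:
            s = s + c                        # keep extending the current string
        else:
            output.append(dictionary[s])     # emit the code for the longest match
            dictionary[s + c] = next_code    # remember the new string
            next_code += 1
            s = c
    output.append(dictionary[s])             # flush the final string
    return output

print(lzw_encode("ABABBABCABABBA", {"A": 1, "B": 2, "C": 3}))
# [1, 2, 4, 5, 2, 3, 4, 6, 1]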
The LZW decompression of these output codes proceeds as follows, with the decoder rebuilding the same dictionary as it goes:

    s      k     entry/output   code   string
                                1      A
                                2      B
                                3      C
    ---------------------------------------------
    NIL    1     A
    A      2     B              4      AB
    B      4     AB             5      BA
    AB     5     BA             6      ABB
    BA     2     B              7      BAB
    B      3     C              8      BC
    C      4     AB             9      CA
    AB     6     ABB            10     ABA
    ABB    1     A              11     ABBA
    A      EOF
Apparently the output string is ABABBABCABABBA - a truly lossless result!
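The corresponding simple decompression routine is easy to sketch as well (our own code, mirroring the trace above): output the dictionary entry for each incoming code, and after each step add the previous string plus the first character of the current entry as a new dictionary string. As discussed next, this simple version can fail when a code arrives one step before the decoder has built the corresponding entry.

def lzw_decode(codes, dictionary):
    # Simple LZW decompression. `dictionary` maps codes to strings (updated in place).
    next_code = max(dictionary) + 1
    s = dictionary[codes[0]]
    output = [s]
    for k in codes[1:]:
        entry = dictionary[k]                 # fails (KeyError) in the pathological case below
        output.append(entry)
        dictionary[next_code] = s + entry[0]  # previous string + first char of current entry
        next_code += 1
        s = entry
    return "".join(output)

print(lzw_decode([1, 2, 4, 5, 2, 3, 4, 6, 1], {1: "A", 2: "B", 3: "C"}))
# ABABBABCABABBA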
LZW Algorithm Details A more careful examination of the above simple version of
the LZW decompression algorithm will reveal a potential problem. In adaptively updating
the dictionaries, the encoder is sometimes ahead of the decoder. For example, after the
sequence ABABB, the encoder will output code 4 and create a dictionary entry with code
6 for the new string ABB.
On the decoder side, after receiving the code 4, the output will be AB, and the dictionary
is updated with code 5 for a new string, BA. This occurs several times in the above example,
such as after the encoder outputs another code 4, then code 6. In a way, this is anticipated -
after all, it is a sequential process, and the encoder had to be ahead. In this example, this
did not cause a problem.
Welch [11] points out that the simple version of the LZW decompression algorithm will
break down when the following scenario occurs. Assume that the input string is
ABABBABCABBABBAX....
The LZW encoder:
    s      c      output   code   string
                            1      A
                            2      B
                            3      C
    -----------------------------------------
    A      B      1         4      AB
    B      A      2         5      BA
    A      B
    AB     B      4         6      ABB
    B      A
    BA     B      5         7      BAB
    B      C      2         8      BC
    C      A      3         9      CA
    A      B
    AB     B