4  PRACTICAL IMPLEMENTATIONS OF ARITHMETIC CODING

Paul G. Howard¹ and Jeffrey Scott Vitter²


Department of Computer Science
Brown University
Providence, R. I. 02912-1910

Abstract

We provide a tutorial on arithmetic coding, showing how it provides nearly
optimal data compression and how it can be matched with almost any
probabilistic model. We indicate the main disadvantage of arithmetic coding,
its slowness, and give the basis of a fast, space-efficient, approximate
arithmetic coder with only minimal loss of compression efficiency. Our coder
is based on the replacement of arithmetic by table lookups coupled with a
new deterministic probability estimation scheme.

Index terms: data compression, arithmetic coding, adaptive modeling,
analysis of algorithms, data structures, low precision arithmetic.

1 Data Compression and Arithmetic Coding


Data can be compressed whenever some data symbols are more likely than
others. Shannon [54] showed that for the best possible compression code (in
the sense of minimum average code length), the output length contains a
contribution of -lg p bits from the encoding of each symbol whose
probability of occurrence is p. If we can provide an accurate model for the
probability of occurrence of each possible symbol at every point in a file,
we can use arithmetic coding to encode the symbols that actually occur; the
number of bits used by arithmetic coding to encode a symbol with probability
p is very nearly -lg p, so the encoding is very nearly optimal for the given
probability estimates.
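
To make the -lg p rule concrete, here is a minimal Python sketch (an
illustration added here; the function name ideal_code_length is
hypothetical) that evaluates the ideal code length for a few probabilities.
Note that a near-certain symbol costs only a small fraction of a bit, which
codes restricted to whole bits per symbol cannot exploit:

    import math

    # Ideal (Shannon) code length in bits for a symbol of probability p.
    def ideal_code_length(p: float) -> float:
        return -math.log2(p)

    # A p = 0.5 symbol costs exactly 1 bit; a near-certain symbol costs
    # far less, which is where arithmetic coding beats whole-bit codes.
    for p in (0.5, 0.9, 0.99):
        print(f"p = {p:.2f} -> {ideal_code_length(p):.4f} bits")
    # p = 0.50 -> 1.0000 bits
    # p = 0.90 -> 0.1520 bits
    # p = 0.99 -> 0.0145 bits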
In this article we show by theorems and examples how arithmetic coding
achieves its performance. We also point out some of the drawbacks of
arithmetic coding in practice, and propose a unified compression system for
overcoming them. We begin by attempting to clear up some of the false
impressions commonly held about arithmetic coding; it offers some genuine
benefits, but it is not the solution to all data compression problems.
¹Support was provided in part by NASA Graduate Student Researchers Program
grant NGT-50420 and by a National Science Foundation Presidential Young
Investigator Award grant with matching funds from IBM. Additional support
was provided by a Universities Space Research Association/CESDIS associate
membership.
²Support was provided in part by National Science Foundation Presidential
Young Investigator Award CCR-9047466 with matching funds from IBM, by NSF
research grant CCR-9007851, by Army Research Office grant DAAL03-91-G-0035,
and by the Office of Naval Research and the Defense Advanced Research
Projects Agency under contract N00014-91-J-4052, ARPA Order No. 8225.

The most important advantage of arithmetic coding is its flexibility: it can
be used in conjunction with any model that can provide a sequence of event
probabilities. This advantage is significant because large compression gains
can be obtained only through the use of sophisticated models of the input
data. Models used for arithmetic coding may be adaptive, and in fact a
number of independent models may be used in succession in coding a single
file. This great flexibility results from the sharp separation of the coder
from the modeling process [47]. There is a cost associated with this
flexibility: the interface between the model and the coder, while simple,
places considerable time and space demands on the model's data structures,
especially in the case of a multi-symbol input alphabet.
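
A minimal sketch of that interface in Python (the names Model, interval, and
update are hypothetical, not drawn from any particular coder): the coder
asks the model only for the cumulative-frequency range of each symbol, one
event at a time, and never sees the model's internals:

    from typing import Protocol, Tuple

    class Model(Protocol):
        # The sole coupling between modeling and coding: for each symbol
        # the model reports its cumulative frequency range out of a
        # running total.
        def interval(self, symbol: int) -> Tuple[int, int, int]:
            """Return (cum_low, cum_high, total) for symbol in the
            current context."""
            ...

        def update(self, symbol: int) -> None:
            """Adapt the statistics after coding symbol."""
            ...

Any adaptive model, or a succession of independent models, can sit behind
such an interface, which is what the separation of coder and model [47]
makes possible; the cost is that the model must maintain cumulative
frequencies, which is expensive for large alphabets.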
The other important advantage of arithmetic coding is its optimality.
Arithmetic coding is optimal in theory and very nearly optimal in practice,
in the sense of encoding using minimal average code length. This optimality
is often less important than it might seem, since Huffman coding [25] is
also very nearly optimal in most cases [8,9,18,39]. When the probability of
some single symbol is close to 1, however, arithmetic coding does give
considerably better compression than other methods. The case of highly
unbalanced probabilities occurs naturally in bilevel (black and white) image
coding, and it can also arise in the decomposition of a multi-symbol
alphabet into a sequence of binary choices.
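
As an illustration of the latter point, one simple decomposition codes the
bits of the symbol's index one at a time, each as a binary event conditioned
on the bits already coded; each such event may have highly unbalanced
probabilities even when the symbols themselves do not. A minimal Python
sketch (the function binary_decomposition is hypothetical):

    def binary_decomposition(symbol: int, alphabet_bits: int):
        # Emit one binary event per bit of the symbol index; a binary
        # arithmetic coder codes each event with its own probability
        # estimate, conditioned on the bits already coded.
        context = []
        for i in reversed(range(alphabet_bits)):
            bit = (symbol >> i) & 1
            yield tuple(context), bit
            context.append(bit)

    # Symbol 5 in an 8-symbol alphabet becomes three binary choices:
    print(list(binary_decomposition(5, 3)))
    # [((), 1), ((1,), 0), ((1, 0), 1)]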
The main disadvantage of arithmetic coding is that it tends to be slow. We
shall see that the full precision form of arithmetic coding requires at
least one multiplication per event and in some implementations up to two
multiplications and two divisions per event. In addition, the model lookup
and update operations are slow because of the input requirements of the
coder. Both Huffman coding and Ziv-Lempel [59,60] coding are faster because
the model is represented directly in the data structures used for coding.
(This reduces the coding efficiency of those methods by narrowing the range
of possible models.) Much of the current research in arithmetic coding
concerns finding approximations that increase coding speed without
compromising compression efficiency. The most common method is to use an
approximation to the multiplication operation [10,27,29,43]; in this article
we present an alternative approach using table lookups and approximate
probability estimation.
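
The per-event cost is visible in the interval-narrowing step itself. Below
is a minimal Python sketch of one full-precision encoder update
(renormalization and carry handling, which any real coder needs, are
omitted); the multiplications and divisions here are exactly the operations
that table-driven approximations replace with lookups:

    def encode_step(low: int, high: int,
                    cum_low: int, cum_high: int, total: int):
        # Narrow the current code interval [low, high) to the
        # sub-interval assigned to the symbol whose cumulative
        # frequency range is [cum_low, cum_high) out of total.
        span = high - low
        new_high = low + (span * cum_high) // total
        new_low = low + (span * cum_low) // total
        return new_low, new_high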
Another disadvantage of arithmetic coding is that it does not in general
produce a prefix code. This precludes parallel coding with multiple processors.
In addition, the potentially unbounded output delay makes real-time coding
problematical in critical applications, but in practice the delay seldom exceeds
a few symbols, so this is not a major problem. A minor disadvantage is the
need to indicate the end of the file.
One final minor problem is that arithmetic codes have poor error resistance,
especially when used with adaptive models [5]. A single bit error in the encoded
file causes the decoder's internal state to be in error, making the remainder of
the decoded file wrong. In fact this is a drawback of all adaptive codes, including
Ziv-Lempel codes and adaptive Huffman codes [12,15,18,26,55,56]. In practice,
the poor error resistance of adaptive coding is unimportant, since we can simply
apply appropriate error correction coding to the encoded file. More complicated
solutions appear in [5,20], in which errors are made easy to detect, and upon
detection of an error, bits are changed until no errors are detected.
