Digital Photography and Video Editing
3. ARITHMETIC CODING
Arithmetic coding is a data compression technique that encodes data (the data
string) by creating a code string which represents a fractional value on the number
line between 0 and 1. The coding algorithm is symbolwise recursive; i.e., it
operates upon and encodes (decodes) one data symbol per iteration or recursion.
On each recursion, the algorithm successively partitions an interval of the number
line between 0 and 1, and retains one of the partitions as the new interval. Thus, the
algorithm successively deals with smaller intervals, and the code string, viewed as
a magnitude, lies in each of the nested intervals. The data string is recovered by
using magnitude comparisons on the code string to recreate how the encoder must
have successively partitioned and retained each nested subinterval. Arithmetic
coding differs considerably from the more familiar compression coding techniques,
such as prefix (Huffman) codes. Also, it should not be confused with error control
coding, whose object is to detect and correct errors in computer operations.
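To make the interval mechanism concrete, here is a minimal Python sketch that encodes and decodes a short string against a fixed, assumed probability table. It uses floating-point arithmetic for readability and so is illustrative only; practical coders use fixed-precision integer arithmetic with renormalization.

def encode(data, probs):
    # Narrow [low, high) once per symbol; any value inside the final
    # interval identifies the data string.
    low, high = 0.0, 1.0
    for sym in data:
        span = high - low
        cum = 0.0
        for s, p in probs.items():
            if s == sym:
                low, high = low + span * cum, low + span * (cum + p)
                break
            cum += p
    return (low + high) / 2  # one value inside the final nested interval

def decode(code, probs, n):
    # Recreate the encoder's partitions by magnitude comparison.
    out = []
    low, high = 0.0, 1.0
    for _ in range(n):
        span = high - low
        cum = 0.0
        for s, p in probs.items():
            if code < low + span * (cum + p):
                out.append(s)
                low, high = low + span * cum, low + span * (cum + p)
                break
            cum += p
    return "".join(out)

probs = {"a": 0.6, "b": 0.3, "c": 0.1}  # assumed single-context statistics
value = encode("abac", probs)
assert decode(value, probs, 4) == "abac"

Encoding "abac" here narrows [0, 1) down to [0.4572, 0.468); the midpoint 0.4626 is transmitted, and the decoder recovers the string by testing which subinterval contains it at each step.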
The notion of compression systems captures the idea that data may be transformed
into something which is encoded, then transmitted to a destination, then
transformed back into the original data. Any data compression approach, whether
employing arithmetic coding, Huffman codes, or any other coding technique, has a
model which makes some assumptions about the data and the events encoded. The
code itself can be independent of the model. Some systems which compress
waveforms (e.g., digitized speech) may predict the next value and encode the error.
In this model the error and not the actual data is encoded. Typically, at the encoder
side of a compression system, the data to be compressed feed a model unit. The
model determines:
1) the event(s) to be encoded.
2) the estimate of the relative frequency (probability) of the events.
The encoder accepts the event and some indication of its relative frequency and
generates the code string. A simple model is the memoryless model, where the data
symbols themselves are encoded according to a single code. Another model is the
first-order Markov model, which uses the previous symbol as the context for the
current symbol. Consider, for example, compressing English sentences. If the data
symbol (in this case, a letter) “q” is the previous letter, we would expect the next
letter to be “u.” The first-order Markov model is a dependent model; we have a
different expectation for each symbol (or in the example, each letter), depending
on the context. The context is, in a sense, a state governed by the past sequence of
symbols. The purpose of a context is to provide a probability distribution, or
statistics, for encoding (decoding) the next symbol. Corresponding to the symbols
are statistics. To simplify the discussion, consider a single-context model, i.e., the
memoryless model. Data compression results from encoding the more frequent
symbols with short code-string length increases, and encoding the less-frequent
events with long code length increases. Let c_i denote the number of occurrences of the ith symbol in a data string. For the memoryless model and a given code, let l_i denote the length (in bits) of the code-string increase associated with encoding that symbol; the total code-string length is then the sum of c_i * l_i over the alphabet.
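As an illustration of these quantities, the following sketch (with an invented sample string) estimates each probability by counting and computes the ideal length increase l_i = -log2(p_i) that arithmetic coding approaches:

import math
from collections import Counter

def ideal_code_length(data):
    # c_i: number of occurrences of the ith symbol in the data string.
    counts = Counter(data)
    total = len(data)
    bits = 0.0
    for sym, c in counts.items():
        p = c / total            # relative frequency estimate for the symbol
        l = -math.log2(p)        # l_i: ideal code-string length increase
        bits += c * l            # total length is the sum of c_i * l_i
    return bits

print(ideal_code_length("aaaaaaab"))  # about 4.35 bits, versus 8 bits
                                      # at a fixed 1 bit per symbol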
The components of a compression system include:
1) The model structure for contexts and events.
In practice, the model is a finite-state machine which operates successively on each
data symbol and determines the current event to be encoded and its context (i.e.,
which relative frequency distribution applies to the current event). Often, each
event is the data symbol itself, but the structure can define other events from which
the data string could be reconstructed.
2) The statistics unit for estimation of the event statistics.
The estimation method computes the relative frequency distribution used for each
context. The computation may be performed beforehand, or may be performed
during the encoding process, typically by a counting technique. For Huffman
codes, the event statistics are predetermined by the length of the event’s codeword.
3) The encoder.
The encoder accepts the events to be encoded and generates the code string.
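The counting technique mentioned under the statistics unit can be sketched as a small adaptive class; here the context is the previous symbol, as in the first-order Markov model above. The class and its interface are illustrative assumptions, not a standard API.

from collections import defaultdict

class StatisticsUnit:
    # Adaptive statistics by counting: each context (here, the previous
    # symbol) keeps its own occurrence counts. Encoder and decoder update
    # the counts identically, so their statistics stay in step.

    def __init__(self, alphabet):
        # Start every count at 1 so no symbol ever has zero probability.
        self.counts = defaultdict(lambda: {s: 1 for s in alphabet})

    def distribution(self, context):
        ctx = self.counts[context]
        total = sum(ctx.values())
        return {s: c / total for s, c in ctx.items()}

    def update(self, context, symbol):
        self.counts[context][symbol] += 1

stats = StatisticsUnit("abc")
prev = None                             # context: the previous symbol
for sym in "abacaba":
    dist = stats.distribution(prev)     # handed to the encoder with the event
    stats.update(prev, sym)
    prev = sym
print(stats.distribution("a"))          # skewed toward "b" in context "a"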
Merits of arithmetic coding include:
1) Arithmetic coding is especially useful for small alphabets with highly
skewed probabilities.
2) Arithmetic coding achieves a better compression ratio than the Huffman
method.
3) Arithmetic coding is among the most efficient methods for coding symbols
according to the probability of their occurrence: the average code length
comes very close to the minimum given by information theory.
4) Arithmetic coding offers a clearly better compression rate than a
Huffman code tree, since it is not restricted to an integral number of bits
per symbol.
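A short calculation illustrates these merits under an assumed two-symbol source with skewed probabilities: a Huffman code must spend at least one bit per symbol, while arithmetic coding can approach the entropy.

import math

# Assumed two-symbol source: P("a") = 0.95, P("b") = 0.05.
p = 0.95
entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
print(f"entropy: {entropy:.3f} bits/symbol")  # about 0.286
print("Huffman: 1.000 bits/symbol")           # best possible: codewords "0" and "1"
# Arithmetic coding approaches 0.286 bits/symbol here, roughly a 3.5x
# improvement over the 1 bit/symbol Huffman floor.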