Arithmetic Coding
Assign non-overlapping intervals on the (0,1] axis to each symbol in the
alphabet. The length of each interval must be proportional to the probability of
the corresponding symbol. As an example, if we have 3 symbols x1, x2, x3
with probabilities 0.5, 0.3, and 0.2, then the intervals can be: x1 (0,0.5], x2
(0.5,0.8], x3 (0.8,1]. Each symbol thus owns a slice of the unit interval whose
width equals its probability.
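As a minimal sketch of this assignment (the function name build_intervals and the dictionary layout are illustrative, not part of the original text), the interval edges are simply the running cumulative probabilities:

def build_intervals(probs):
    """Map each symbol to a half-open interval (low, low + p] of width p."""
    intervals = {}
    low = 0.0
    for symbol, p in probs.items():
        intervals[symbol] = (low, low + p)
        low += p
    return intervals

# Alphabet of the example above: probabilities 0.5, 0.3 and 0.2.
print(build_intervals({"x1": 0.5, "x2": 0.3, "x3": 0.2}))
# {'x1': (0.0, 0.5), 'x2': (0.5, 0.8), 'x3': (0.8, 1.0)}  (up to floating-point rounding)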
The philosophy should be getting clear now. Suppose our source emits a single
symbol: x1. How could we encode it using these intervals? Simply by producing a
number in (0,0.5], the interval of x1. What would happen if our message consisted
of two consecutive symbols, x1 x2? Then the Arithmetic coder would split the
interval of x1 (which was (0,0.5]) into subintervals with the same proportional
lengths for each symbol: x1 (0,0.25], x2 (0.25,0.4], x3 (0.4,0.5].
The code that represents the sequence x1 x2 is therefore any number in the
interval (0.25,0.4]. You can think of this coding process as splitting the intervals
on the (0,1] axis finer and finer. So, for example, what is the code for the
sequence x1 x2 x2? The answer: split the last interval, (0.25,0.4], again in the
same proportions, giving x1 (0.25,0.325], x2 (0.325,0.37], x3 (0.37,0.4], and pick
the subinterval of the new symbol x2.
So after three symbols, the encoder must produce a number between 0.325
and 0.37; we can say that the output is the interval (0.325,0.37]. For the last
time in this example, let's emit one more symbol and encode the sequence x1
x2 x2 x3. Splitting (0.325,0.37] once more and taking the subinterval of x3 (the
top 20% of it), the interval for the sequence x1 x2 x2 x3 becomes (0.361,0.37].
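The whole narrowing process above can be reproduced with a short Python sketch (the helper name narrow is purely illustrative; the interval table is the one from the example):

INTERVALS = {"x1": (0.0, 0.5), "x2": (0.5, 0.8), "x3": (0.8, 1.0)}

def narrow(interval, symbol):
    """Shrink (low, high] to the slice owned by symbol, keeping proportions."""
    low, high = interval
    width = high - low
    s_low, s_high = INTERVALS[symbol]
    return (low + width * s_low, low + width * s_high)

interval = (0.0, 1.0)
for symbol in ["x1", "x2", "x2", "x3"]:
    interval = narrow(interval, symbol)
    print(symbol, interval)

# Prints, up to floating-point rounding:
# x1 (0.0, 0.5)
# x2 (0.25, 0.4)
# x2 (0.325, 0.37)
# x3 (0.361, 0.37)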
How do we turn such a number into bits? Recall that a binary fraction represents a
number between 0 and 1, where the n-th bit after the point carries the weight 1/2^n.
For example, 0101 stands for 0*(1/2) + 1*(1/4) + 0*(1/8) + 1*(1/16) = 5/16 =
0.3125
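As a quick sketch for checking such conversions (the function name bits_to_fraction is just illustrative):

def bits_to_fraction(bits):
    """Interpret a bit string b1 b2 ... as the binary fraction 0.b1b2..."""
    value = 0.0
    for n, bit in enumerate(bits, start=1):
        value += int(bit) / 2 ** n   # the n-th bit weighs 1/2^n
    return value

print(bits_to_fraction("0101"))  # 0.3125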
Exercise: Write the number in full decimal precision whose binary representation is
10011:
The next important step in Arithmetic encoding is to find a number within the final
interval that can be represented with as few bits as possible. For our example, the
final interval is (0.361,0.37]. The fewest bits are obtained with the number
1/4 + 1/16 + 1/32 + 1/64 + 1/128 = 0.3671875 ≈ 0.3672, which written as a binary
fraction is the 7-bit string 0101111.
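A minimal sketch of this step, assuming we simply test codeword lengths k = 1, 2, 3, ... and look for a multiple of 2^-k inside (low, high] (the function name shortest_codeword is illustrative):

import math

def shortest_codeword(low, high):
    """Return the shortest bit string b1...bk with low < 0.b1...bk <= high."""
    k = 1
    while True:
        n = math.floor(low * 2 ** k) + 1   # smallest numerator with n/2^k > low
        if n / 2 ** k <= high:
            return format(n, "0{}b".format(k))
        k += 1

print(shortest_codeword(0.361, 0.37))  # '0101111', i.e. 0.3671875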
Exercise: Try to obtain another number in this interval with a different binary
representation (this is possible).
For this example, 0101111 is the encoded bitstream that expresses x1 x2 x2 x3, so
we used 7 bits. With an alphabet of three symbols, a fixed-length code normally
requires 2 bits per symbol, so the 4-symbol sequence would take 8 bits. We
therefore save one bit even on this short sequence.
The efficiency of Arithmetic coding becomes clearer when you encode longer
sequences. For example, if the next symbol is again x2, we have the sequence x1
x2 x2 x3 x2, and the interval becomes the x2 slice of (0.361,0.37], namely
(0.361 + 0.009*0.5, 0.361 + 0.009*0.8] = (0.3655, 0.3682]. As you see, our previous
encoding number (0.3671875) is still in this range, so 0101111 is the
compressed bitstream of x1 x2 x2 x3 x2 too! Now the saving is 10 - 7 = 3 bits.
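This can be checked numerically with a tiny self-contained sketch (the concrete numbers are those of the running example):

low, high = 0.361, 0.37                               # interval after x1 x2 x2 x3
width = high - low                                    # 0.009
low, high = low + width * 0.5, low + width * 0.8      # the x2 slice
print(low, high)                                      # approximately 0.3655 0.3682
print(low < 0.3671875 <= high)                        # True: 0101111 is still valid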
But!
Clearly, the last situation reveals a property of Arithmetic coding that has to be
handled carefully: a single bitstream corresponds to many symbol sequences. For
instance, here our code number 0.3671875 corresponds to {x1}, {x1 x2}, {x1 x2 x2},
{x1 x2 x2 x3}, {x1 x2 x2 x3 x2}, and so on; you can find arbitrarily long sequences
whose interval contains the codeword 0101111. So where should we stop? At the
point where we are told to! We have to indicate how many symbols we are encoding.
The compressed data, therefore, contains extra information regarding the size of the
source.
So, up to now, we have observed two difficulties with Arithmetic encoding. First,
the encoder must choose and represent a number inside a final interval that becomes
arbitrarily small, which in principle calls for arbitrarily precise real arithmetic.
Second, the same code number corresponds to many symbol sequences of different
lengths.
The second difficulty was overcome by transmitting the extra information of how
many symbols have been encoded.
The first difficulty is usually overcome by selecting one of the already-computed
interval edges as the representative number; selecting the lower edge of the
interval, for example, is a common practice. Nevertheless, there are numerical
algorithms which efficiently perform Arithmetic encoding using finite-precision
(binary) arithmetic. The finite-precision algorithm is both too involved to cover in
this course and patented by IBM. Therefore, anyone who implements the finite-precision
Arithmetic encoder (implementations are available in many books) has to resolve the
patent issues with IBM before being able to sell the software.
The pseudo-code for the classic Arithmetic encoder is:

Low = 0;
High = 1;
Range = 1;
while input symbols X are coming:
    High = Low + Range * HighValue(X);
    Low = Low + Range * LowValue(X);
    Range = High - Low;

(Note that High is updated before Low, because its update must use the old value of Low.)
Of course, the HighValue and LowValue numbers of the symbols are simply the edges
of the intervals assigned to them, i.e. the cumulative probabilities. For the previous
example: HighValue(x1)=0.5, LowValue(x1)=0; HighValue(x2)=0.8, LowValue(x2)=0.5;
HighValue(x3)=1, LowValue(x3)=0.8.
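A runnable Python sketch of this encoder loop, using the LowValue/HighValue table of the running example (the names TABLE and encode are just illustrative):

TABLE = {"x1": (0.0, 0.5), "x2": (0.5, 0.8), "x3": (0.8, 1.0)}  # (LowValue, HighValue)

def encode(symbols):
    """Return the final (Low, High] interval of a symbol sequence."""
    low, high, rng = 0.0, 1.0, 1.0
    for x in symbols:
        low_value, high_value = TABLE[x]
        high = low + rng * high_value    # must use the old Low
        low = low + rng * low_value
        rng = high - low
    return low, high

print(encode(["x1", "x2", "x2", "x3"]))
# approximately (0.361, 0.37); any number in this interval encodes the sequence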
Decoding reverses this process: given the code number, the decoder finds the symbol
whose interval contains it, outputs that symbol, narrows the interval exactly as the
encoder did, and repeats. We can continue decoding as long as required; we must stop
once we reach the transmitted number of encoded symbols. For the above case, decoding
the number 0.3671875 (our codeword 0101111) for five symbols gives the sequence
{x1, x2, x2, x3, x2}.
Finally, let us give the pseudo-code for the classic Arithmetic decoder:

Low = 0;
High = 1;
Range = 1;
repeat until the transmitted number of symbols has been decoded:
    find the symbol X for which LowValue(X) < (Code - Low) / Range <= HighValue(X);
    output X;
    High = Low + Range * HighValue(X);
    Low = Low + Range * LowValue(X);
    Range = High - Low;
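A runnable Python sketch of the decoder, mirroring the pseudo-code above (the names TABLE and decode are illustrative; the code value 0.3671875 and the length 5 come from the running example):

TABLE = {"x1": (0.0, 0.5), "x2": (0.5, 0.8), "x3": (0.8, 1.0)}  # (LowValue, HighValue)

def decode(code, n_symbols):
    """Decode n_symbols symbols from the real-valued code number."""
    low, high, rng = 0.0, 1.0, 1.0
    out = []
    for _ in range(n_symbols):
        value = (code - low) / rng                    # position of the code in (0, 1]
        for x, (low_value, high_value) in TABLE.items():
            if low_value < value <= high_value:       # which symbol owns this slice?
                out.append(x)
                high = low + rng * high_value
                low = low + rng * low_value
                rng = high - low
                break
    return out

print(decode(0.3671875, 5))  # ['x1', 'x2', 'x2', 'x3', 'x2']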
Homework: Obtain the 6-symbol decoded version of the number 01101110 for the
symbols and probabilities of the above exercise.
For the practical engineer: The above explanation of the arithmetic coder gives
an idea of its underlying philosophy. The practical implementation, on the other
hand, is a little different: instead of obtaining real numbers between 0 and 1
(which must then be converted into a binary representation), the probability
splitting can be carried out directly on binary numbers. The web page
http://www.cbloom.com/algs/statisti.html#A5.0 gives excellent explanations of
practical arithmetic coder implementations, and it also describes the probability
splitting concepts together with the binary implementation.
You can find more information, usage, and implementation details about Arithmetic
coding in the following links:
http://lena.cs.utu.fi/tko/reports/R-92-6.html
http://www.zipworld.com.au/~isanta/uni/arithmetic.htm
http://ltssg3.epfl.ch/pub_files/brigger/thesis_html/node94.html
http://www.eas.asu.edu/~morrell/551fall95/project1/project1.html
http://student.monterey.edu/dh/dunkeljodyd/world/cst332/a1present/10.htm
http://www.mdc.net/~eberhard/ari/arithmetic.html