Notes 7 2013 - Arithmetic Coding

Arithmetic coding is an entropy coding technique that maps a sequence of symbols to a subinterval of [0,1): each successive symbol partitions the current interval into subintervals proportional to the symbol probabilities, and the code for the entire sequence is a binary number inside the final interval. This allows arithmetic coding to achieve expected code lengths closer to the entropy limit than Huffman coding. It was improved over time to address issues such as finite precision, and practical implementations were patented.


Arithmetic coding

Motivation:
Huffman coding generates a code with rate within pmax + 0.086 of the entropy, where pmax is the probability of the most frequently occurring symbol.
Small pmax ⇒ small deviation from the entropy.
Large pmax ⇒ Huffman codes become inefficient compared with the entropy.

Huffman coding cannot be used for symbol-by-symbol coding of binary sources (each symbol would receive at least one codebit, so no compression is possible without blocking).

Example 1
Huffman code for the 3-letter alphabet {a1, a2, a3}:

Letter   Probability   Code
a1       0.95          0
a2       0.02          11
a3       0.03          10

Entropy: 0.335 bits/symbol
Rate: 1.05 bits/symbol
Redundancy: 0.715 bits/symbol = 213% of the entropy.
Possible solution: block 2 symbols together and generate the extended code.

Letter   Probability   Huffman code
a1a1     0.9025        0
a1a2     0.0190        111
a1a3     0.0285        100
a2a1     0.0190        1101
a2a2     0.0004        110011
a2a3     0.0006        110001
a3a1     0.0285        101
a3a2     0.0006        110010
a3a3     0.0009        110000
Entropy: 0.335 bits/symbol of the original alphabet
Rate: 0.611 bits/symbol of the original alphabet
Redundancy: 0.276 bits/symbol of the original alphabet (about 82% of the entropy)
The redundancy drops to acceptable values if we block about 8 symbols together, but then the alphabet size is 3^8 = 6561.
Impractical (space, time to encode and decode).
In Huffman coding with an extended alphabet of m symbols per block, we need a codeword for each possible sequence of m symbols.
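As a quick sanity check of the numbers in Example 1, here is a short Python sketch (the probabilities and codeword lengths are copied from the two tables above; nothing else is assumed):

import math
from itertools import product

p = {"a1": 0.95, "a2": 0.02, "a3": 0.03}            # source probabilities (Example 1)
length1 = {"a1": 1, "a2": 2, "a3": 2}               # single-symbol Huffman code lengths

entropy = -sum(q * math.log2(q) for q in p.values())
rate1 = sum(p[s] * length1[s] for s in p)
print(entropy, rate1, rate1 - entropy)              # ~0.335, 1.05, ~0.715 bits/symbol

length2 = {"a1a1": 1, "a1a2": 3, "a1a3": 3,         # extended-code lengths (pairs)
           "a2a1": 4, "a2a2": 6, "a2a3": 6,
           "a3a1": 3, "a3a2": 6, "a3a3": 6}
rate2 = sum(p[x] * p[y] * length2[x + y] for x, y in product(p, p)) / 2
print(rate2)                                        # ~0.611 bits per original symbol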
The main idea

In all of our preceding compression algorithms, we first partitioned our datavector into blocks and then encoded each block into a string of codebits. Arithmetic coding presents a whole new philosophy of coding.
In an arithmetic code, as we process the data samples in a datavector (X_1, X_2, ..., X_n) from left to right, we do not replace each sample X_i with a string of codebits. Instead, we assign to each X_i a subinterval I_i of the unit interval [0,1) so that I_1 ⊇ I_2 ⊇ ... ⊇ I_n and so that I_i is determined from the previously processed samples in conjunction with the current sample X_i. When the final interval I_n is determined, a codeword (B_1, B_2, ..., B_L) is assigned to the entire datavector so that the rational number
0.B_1 B_2 ... B_L = B_1/2 + B_2/4 + B_3/8 + ... + B_L/2^L
is a point in I_n.
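For concreteness, such a binary fraction can be evaluated with one line of Python (the bit pattern here is an arbitrary illustration):

bits = [1, 0, 0, 1, 0, 0]                            # B1, B2, ..., BL
value = sum(b / 2**i for i, b in enumerate(bits, start=1))
print(value)                                         # 0.5625 = 1/2 + 1/16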
History:
- The idea that a code string can be a binary fraction pointing to the subinterval for a particular symbol sequence is due to Shannon [1948] (Shannon-Fano code).
- Recursive implementation of this idea: Peter Elias (unpublished), mentioned in
1. N. Abramson, "Information Theory and Coding", McGraw-Hill, 1963.
(P. Elias and D. Huffman were members of Fano's first information theory class at MIT.)

- The finite precision problem was resolved in:
2. J. J. Rissanen, "Generalized Kraft Inequality and Arithmetic Coding", IBM Journal of Research and Development, Vol. 20, 1976.
3. R. C. Pasco, "Source Coding for Fast Data Compression", Ph.D. Thesis, Stanford University, 1976.
- Practical arithmetic coding:
4. J. J. Rissanen, G. G. Langdon, "Arithmetic Coding", IBM Journal of Research and Development, 23(2), 1979.
The main problem with arithmetic coding, and the main reason why it is not used as often as Huffman coding in actual software, is the fact that it is protected by a large number of patents, most held by IBM.
Coding a sequence

- We tag each sequence with a unique identifier.
- The tag is a number in [0,1).
- Since the number of numbers in this interval is infinite, it is possible to assign a unique tag to each sequence.

- We use the cumulative distribution function (cdf) of the random variable associated with the source.
- Consider a random variable X that maps the letters of the source alphabet A = {a1, a2, ..., am} to integers:
X(ai) = i, ai ∈ A
- Probability distribution of X:
P(X = i) = P(ai), i = 1, 2, ..., m
- Cumulative distribution function:
F_X(i) = Σ_{k=1}^{i} P(X = k),  with F_X(0) = 0 and F_X(m) = 1
- The cdf of the random variable associated with the source partitions the [0,1) interval into subintervals of the form [F_X(i-1), F_X(i)), for i = 1, 2, ..., m.
Example 2

Letter   Probability   cdf
(F_X(0) = 0 by definition)
a1       0.7           0.7
a2       0.1           0.8
a3       0.2           1.0
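The cdf column is just a running sum of the probability column; a minimal sketch using the Example 2 values:

from itertools import accumulate

probs = {"a1": 0.7, "a2": 0.1, "a3": 0.2}
cdf = dict(zip(probs, accumulate(probs.values())))
print(cdf)    # a1: 0.7, a2: 0.8, a3: 1.0 (up to floating-point rounding)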

Generating a tag
Partition the [0,1) interval into subintervals defined by the cdf of the source.
While there are more symbols to be encoded do
- get the next symbol
- restrict the tag to the subinterval corresponding to the new symbol
- partition the new subinterval of the tag proportionally, based on the cdf
The tag for the sequence is any number in the final subinterval.
Example 2 (continued). Code the sequence a1 a2 a3.

- The interval [0.0, 1.0) is divided into the subintervals [0.0, 0.7), [0.7, 0.8) and [0.8, 1.0).
The first symbol is a1 ⇒ the tag will be contained in the interval [0.0, 0.7).
- This subinterval is subdivided in exactly the same proportions as the original interval, yielding the subintervals [0.0, 0.49), [0.49, 0.56), and [0.56, 0.7).
The second symbol is a2 ⇒ the tag will be contained in the interval [0.49, 0.56).
- The interval [0.49, 0.56) is partitioned in the same proportions into the subintervals [0.49, 0.539), [0.539, 0.546), and [0.546, 0.56).
The third symbol is a3 ⇒ the tag will be restricted to the interval [0.546, 0.56).
Selecting a tag
The interval in which the tag for a particular sequence resides is disjoint from all intervals in which the tag for any other sequence may reside.
Any number in the interval can be chosen as the tag:
- the lower limit of the interval
- the midpoint
We will use the midpoint of the interval as the tag.
Decoding a tag

Example 2 (continued)
The decoder can sequentially recover the datavector from its tag.
Suppose the given tag value is the midpoint of the final interval, (0.546 + 0.56)/2 = 0.553.
The decoder knows that the interval I_1 is either [0.0, 0.7), [0.7, 0.8) or [0.8, 1.0). The first subinterval corresponds to the symbol a1, the second corresponds to a2 and the third corresponds to the symbol a3.
Since the number 0.553 lies in the interval [0.0, 0.7), the decoder determines that I_1 = [0.0, 0.7) and that the first data sample is x1 = a1.
Now the decoder knows that I_2 is either [0.0, 0.49), [0.49, 0.56) or [0.56, 0.7). Since 0.553 lies in the interval [0.49, 0.56), the decoder determines that I_2 = [0.49, 0.56) and that the second data sample is x2 = a2.
The decoder now knows that I_3 is either [0.49, 0.539), [0.539, 0.546) or [0.546, 0.56). Since 0.553 lies in the interval [0.546, 0.56), corresponding to the symbol a3, the decoder concludes that the last data sample is x3 = a3.
Tag generation for single-letter sequences
- Alphabet A = {a1, a2, ..., am}
- Random variable X(ai) = i, ai ∈ A, i = 1, 2, ..., m
- Define the tag for ai, denoted by T_X(ai), to be the midpoint of its subinterval:
T_X(ai) = Σ_{k=1}^{i-1} P(X = k) + (1/2) P(X = i) = F_X(i-1) + (1/2) P(ai)
Example 3
A = {a1, a2, ..., a6}
P(X = i) = 1/6, i = 1, 2, ..., 6

Letter   Probability   cdf    tag
a1       1/6           1/6    0.0833
a2       1/6           2/6    0.25
a3       1/6           3/6    0.4166
a4       1/6           4/6    0.5833
a5       1/6           5/6    0.75
a6       1/6           1      0.9166
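A short sketch reproducing the tag column (uniform probabilities as in Example 3; exact fractions avoid rounding):

from fractions import Fraction

p = [Fraction(1, 6)] * 6                   # P(X = i), i = 1, ..., 6
F = Fraction(0)                            # F_X(i-1), the running cdf
for i, pi in enumerate(p, start=1):
    tag = F + pi / 2                       # T_X(a_i) = F_X(i-1) + P(a_i)/2
    print(f"a{i}: {float(tag):.4f}")       # 0.0833, 0.2500, 0.4167, ...
    F += pi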
Tag generation for multi-letter sequences
Impose an order "<" on the sequences (lexicographic ordering is often used).
Define the tag for the n-letter sequence x, denoted by T_X^(n)(x), as
T_X^(n)(x) = Σ_{y < x} P(y) + (1/2) P(x)
Example 3 (continued)
T_X^(2)(a1 a3) = P(a1 a1) + P(a1 a2) + (1/2) P(a1 a3)
= 1/36 + 1/36 + 1/72 = 5/72
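For short sequences this definition can be checked by brute force; a sketch for the pair a1 a3 of Example 3, enumerating all pairs in lexicographic order:

from fractions import Fraction
from itertools import product

p = {f"a{i}": Fraction(1, 6) for i in range(1, 7)}
pairs = sorted(product(p, p))                        # lexicographic order "<"
P = lambda y: p[y[0]] * p[y[1]]                      # i.i.d. pair probability

x = ("a1", "a3")
tag = sum(P(y) for y in pairs if y < x) + P(x) / 2
print(tag)                                           # 5/72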

Recursive computation of the tag
- Consider the sequence x = x1 x2 ... xn.
- Denote by l^(k) and u^(k) the lower and upper limits of the interval corresponding to the subsequence x1 x2 ... xk, respectively.
- Compute the lower and upper limits of the tag interval from the recurrence relations:
l^(0) = 0,  u^(0) = 1
l^(k) = l^(k-1) + (u^(k-1) - l^(k-1)) F_X(x_k - 1)
u^(k) = l^(k-1) + (u^(k-1) - l^(k-1)) F_X(x_k)
- T_X^(n)(x) = (l^(n) + u^(n)) / 2
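A minimal Python sketch of these recurrences (the model is the one from Example 2; the helper name tag_interval is illustrative, and exact fractions make the final interval easy to check):

from fractions import Fraction

def tag_interval(seq, model):
    """model: list of (symbol, probability); returns (l^(n), u^(n)) for seq."""
    F, c = {}, Fraction(0)
    for s, prob in model:
        F[s] = (c, c + prob)                        # [F_X(i-1), F_X(i))
        c += prob
    l, u = Fraction(0), Fraction(1)
    for s in seq:
        lo, hi = F[s]
        l, u = l + (u - l) * lo, l + (u - l) * hi   # the recurrences above
    return l, u

model = [("a1", Fraction(7, 10)), ("a2", Fraction(1, 10)), ("a3", Fraction(2, 10))]
l, u = tag_interval(["a1", "a2", "a3"], model)
print(l, u, (l + u) / 2)    # 273/500, 14/25, 553/1000, i.e. [0.546, 0.56) with tag 0.553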
Deciphering a tag
Mimic the encoder:
- Initialize l^(0) = 0 and u^(0) = 1
- k := 1
- Repeat until the whole sequence has been decoded:
  - t := (tag - l^(k-1)) / (u^(k-1) - l^(k-1))
  - find the value of x_k such that F_X(x_k - 1) ≤ t < F_X(x_k)
  - update l^(k) and u^(k)
  - k := k + 1
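A matching decoder sketch that mimics the encoder (same assumed Example 2 model as above; decode_tag is an illustrative name, and the sequence length n is assumed known to the decoder):

from fractions import Fraction

model = [("a1", Fraction(7, 10)), ("a2", Fraction(1, 10)), ("a3", Fraction(2, 10))]

def decode_tag(tag, model, n):
    l, u = Fraction(0), Fraction(1)
    out = []
    for _ in range(n):
        t = (tag - l) / (u - l)              # rescale the tag into [0,1)
        c = Fraction(0)
        for s, prob in model:                # find x_k with F_X(x_k - 1) <= t < F_X(x_k)
            if c <= t < c + prob:
                out.append(s)
                l, u = l + (u - l) * c, l + (u - l) * (c + prob)
                break
            c += prob
    return out

print(decode_tag(Fraction(553, 1000), model, 3))     # ['a1', 'a2', 'a3']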

How do we know when the entire sequence has been decoded?
- The decoder knows the length of the sequence in advance, or
- a particular symbol is designated as an end-of-transmission symbol.
Generating a binary code
The binary expansion of the tag T_X^(n)(x) = (l^(n) + u^(n))/2 might be infinitely long.
Use as the binary code for T_X^(n)(x) the binary representation of T_X^(n)(x) truncated to
l(x) = ⌈log2 (1/P(x))⌉ + 1 bits.
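A sketch of this truncation for the Example 2 sequence a1 a2 a3, whose final interval [0.546, 0.56) was computed above (P(x) equals the length of the final interval):

import math
from fractions import Fraction

l, u = Fraction(273, 500), Fraction(14, 25)    # final interval, P(x) = u - l = 7/500
L = math.ceil(math.log2(1 / (u - l))) + 1      # l(x) = ceil(log2 1/P(x)) + 1 = 8
t, bits = (l + u) / 2, []                      # tag = 0.553
for _ in range(L):                             # truncated binary expansion of the tag
    t *= 2
    bits.append(int(t))
    t -= int(t)
print(L, "".join(map(str, bits)))              # 8 10001101 (0.10001101b ~ 0.5508, inside [0.546, 0.56))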
Example 5
(Consider a source with probabilities 0.5, 0.25, 0.125, 0.125; the formula above gives single-symbol codeword lengths 2, 3, 4, 4. This example is continued in the comparison section below.)
Remark:
Compression is achieved in arithmetic coding because high probability events do not decrease the interval from l^(k) to u^(k) very much, while low probability events result in a much smaller next interval, requiring a large number of digits. A large interval needs only a few digits; the number of digits required is -log2(size of interval). The size of the final interval is the product of the probabilities of the symbols encoded. Thus a symbol x with probability P(x) contributes -log2 P(x) bits to the output, which is the symbol's self-information. Theoretically, therefore, arithmetic coding can achieve compression identical to the entropy bound, but the finite precision of the computer limits the maximum compression achievable.
Arithmetic coding can be used for coding of binary sources. It is most useful for long sequences.

Example 6

Let us arithmetically encode the datavector X = 10110. We employ [p(0), p(1)] = [2/5, 3/5]. We need to sequentially determine the intervals I_1 = I(1), I_2 = I(10), I_3 = I(101), I_4 = I(1011), I_5 = I(10110).
The interval I(1) must be the right three-fifths of the interval [0,1], which yields
I_1 = [2/5, 1].
The interval I(10) must be the left two-fifths of the interval I_1, and therefore
I_2 = [2/5, 16/25].
The interval I(101) must be the right three-fifths of I_2. Consequently,
I_3 = [62/125, 16/25].
The interval I(1011) must be the right three-fifths of I_3, and so
I_4 = [346/625, 16/25].
Finally, I(10110) must be the left two-fifths of I_4, which gives us
I_5 = [346/625, 1838/3125].
The length of the interval I_5 is 108/3125. Therefore the length of the arithmetic encoder output must be
L = ⌈log2 (3125/108)⌉ + 1 = 6.
The midpoint of the interval I_5 is 1784/3125. Expanding this number in binary we obtain
1784/3125 = 0.100100...
and the first L = 6 bits of this expansion form the encoder output. The encoder output is therefore
100100.

No compression!

Concatenating X 1000 times (5000 data bits) we get:
L = ⌈1000 · log2 (3125/108)⌉ + 1 = 4856
and compression is achieved.
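The intervals of Example 6 can be verified with exact rational arithmetic; a sketch (the convention that 0 takes the left part and 1 the right part is the one used in the example):

import math
from fractions import Fraction

p0 = Fraction(2, 5)                       # [p(0), p(1)] = [2/5, 3/5]
l, u = Fraction(0), Fraction(1)
for bit in "10110":
    w = u - l
    if bit == "0":
        u = l + w * p0                    # keep the left two-fifths
    else:
        l = l + w * p0                    # keep the right three-fifths
print(l, u)                               # 346/625 1838/3125
print(math.ceil(math.log2(1 / (u - l))) + 1)   # L = 6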

Practical implementation
Problems that must be resolved:
- The values l^(n) and u^(n) come closer and closer together as n gets larger. In a system with finite precision the two values are bound to converge.
- We want to transmit portions of the code without waiting until the entire sequence has been encoded.
Tag generation with scaling

We have three possibilities:
1. The interval is entirely confined to the lower half of the unit interval: I_k ⊆ [0, 0.5)
2. The interval is entirely confined to the upper half of the unit interval: I_k ⊆ [0.5, 1)
3. The interval straddles the midpoint of the unit interval.

- The most significant bit of the binary representation of all numbers in [0, 0.5) is 0.
- The most significant bit of the binary representation of all numbers in [0.5, 1) is 1.

I_k ⊆ [0, 0.5) ⇒ the m.s.b. of the tag is 0 ⇒ we send 0.
I_k ⊆ [0.5, 1) ⇒ the m.s.b. of the tag is 1 ⇒ we send 1.

Then we rescale the interval (we lose the information about the m.s.b., but we have already sent that bit to the decoder).
The mappings required are
E1: [0, 0.5) → [0, 1),     E1(x) = 2x
E2: [0.5, 1) → [0, 1),     E2(x) = 2(x - 0.5)
E3: [0.25, 0.75) → [0, 1), E3(x) = 2(x - 0.25)

What if the interval straddles the midpoint of the unit interval?
We rescale with E3 when l^(k) and u^(k) are both in [0.25, 0.75).
When we do an E3 mapping we do not transmit any bit; we only count how many E3 mappings we have done ("count3").
When we do the next E1 (respectively E2) mapping, we transmit "0" followed by count3 "1"s (respectively "1" followed by count3 "0"s), and we reset count3 to 0.
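A compact sketch of this E1/E2/E3 loop (floating point for brevity; the name encode_scaled and the final flush step are illustrative choices, not taken from the notes, and practical coders use integer arithmetic):

def encode_scaled(seq, F):
    """F maps a symbol to its cdf interval [F_X(i-1), F_X(i)). Returns the code bits."""
    l, u, count3, out = 0.0, 1.0, 0, []
    def emit(b):
        nonlocal count3
        out.append(b)
        out.extend([1 - b] * count3)           # release the bits deferred by E3
        count3 = 0
    for s in seq:
        lo, hi = F[s]
        l, u = l + (u - l) * lo, l + (u - l) * hi
        while True:
            if u <= 0.5:                       # E1: lower half, send 0, then 2x
                emit(0); l, u = 2 * l, 2 * u
            elif l >= 0.5:                     # E2: upper half, send 1, then 2(x - 0.5)
                emit(1); l, u = 2 * (l - 0.5), 2 * (u - 0.5)
            elif 0.25 <= l and u <= 0.75:      # E3: defer the bit, then 2(x - 0.25)
                count3 += 1; l, u = 2 * (l - 0.25), 2 * (u - 0.25)
            else:
                break
    count3 += 1                                # flush: emit "01" or "10" (plus deferred
    emit(0 if l < 0.25 else 1)                 # bits) to pin a point inside [l, u)
    return out

F = {"a1": (0.0, 0.7), "a2": (0.7, 0.8), "a3": (0.8, 1.0)}
print(encode_scaled(["a1", "a2", "a3"], F))    # [1, 0, 0, 0, 1, 1, 0]; 0.1000110b lies in [0.546, 0.56)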


Decoding:
Wait to receive enough bits to unambiguously decode the first symbol.
The number of bits k that is needed should be such that
2^(-k) ≤ L_min,
where L_min is the length of the shortest tag interval in the first partition.
In Example 4 the smallest tag interval, [0.8, 0.82), is of size 0.02:
2^(-k) ≤ 0.02 ⇒ we take k = 6.
Then mimic the encoder.


Comparison of Huffman and Arithmetic Coding

Example 5 (continued)
The average length of this arithmetic code is
l_A = 2 × 0.5 + 3 × 0.25 + 4 × 0.125 + 4 × 0.125 = 2.75 bits/symbol.
The entropy of this source is 1.75 bits/symbol, and the Huffman code achieves the entropy.
If we encode two symbols at a time, the resulting arithmetic code has
l_A^(2) = 4.5 bits, i.e. l_A = 2.25 bits/symbol.
If we increase the number of symbols per message, l_A gets closer to H(X).

Arithmetic code for sequences of m symbols:
H(X) ≤ l_A ≤ H(X) + 2/m
Huffman code for blocks of m symbols:
H(X) ≤ l_A ≤ H(X) + 1/m
(but the size of the alphabet is k^m)

Arithmetic coding is more complex, but it is easy to implement a system with multiple arithmetic codes: all we need to do is change the probabilities.
Applications of Arithmetic Coding:
- Bi-level image compression: the JBIG standard
- Parts of lossy compression schemes: in JPEG, the code for the coefficients
