Arithmetic coding
Irina Bocharova,
University of Aerospace Instrumentation,
St. Petersburg, Russia
Lund, Sweden, February 2005
Outline
Shannon-Fano-Elias coding
Gilbert-Moore coding
Arithmetic coding as a generalization of SFE and GM coding
Implementation of arithmetic coding
Shannon-Fano-Elias coding

Let x ∈ X = {1, . . . , M}, p(x) > 0,
p(1) ≥ p(2) ≥ · · · ≥ p(M).
The cumulative sum associated with the symbol x is
Q(x) = Σ_{a<x} p(a),
that is,
Q(1) = 0, Q(2) = p(1), . . . , Q(M) = Σ_{i=1}^{M−1} p(i).
Then ⌊Q(m)⌋_{l_m}, the binary expansion of Q(m) truncated to l_m bits, is the codeword for m, where l_m = ⌈−log_2 p(m)⌉.
x   p(x)   Q(x)   Q(x) in binary   l(x)   codeword
1   0.6    0.0    0.0              1      0
2   0.3    0.6    0.1001...        2      10
3   0.1    0.9    0.1110...        4      1110

L = 1.6 bits, H(X) = 1.3 bits
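A small Python sketch of this construction (the function name shannon_code is my choice, and the input is assumed to be sorted in descending order, as required above); it reproduces the table:

from math import ceil, log2

def shannon_code(probs):
    # Codeword for symbol m: the first ceil(-log2 p(m)) bits of the
    # binary expansion of Q(m). Assumes descending probabilities.
    Q, code = 0.0, []
    for p in probs:
        l = ceil(-log2(p))
        bits = ''.join(str(int(Q * 2 ** (k + 1)) % 2) for k in range(l))
        code.append(bits)
        Q += p
    return code

print(shannon_code([0.6, 0.3, 0.1]))  # ['0', '10', '1110']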
If l_m binary symbols have already been transmitted, then the length of the interval of uncertainty is 2^{−l_m}. Thus we can decode uniquely if
2^{−l_m} ≤ p(m),
or
l_m ≥ −log_2 p(m).
Choosing the length l_m in this way, we used only the segment to the right of the point Q(m). This segment is never longer than the corresponding left segment, since the symbol probabilities are ordered in descending order.
H(X) ≤ L < H(X) + 1.
Gilbert-Moore coding

Let x ∈ X = {1, . . . , M}, p(x) > 0 (the probabilities need not be ordered).
The cumulative sum associated with the symbol x is
Q(x) = Σ_{a<x} p(a),
that is,
Q(1) = 0, Q(2) = p(1), . . . , Q(M) = Σ_{i=1}^{M−1} p(i).
Introduce
σ(x) = Q(x) + p(x)/2.
Then σ̂(m) = ⌊σ(m)⌋_{l_m} is the codeword for m, where l_m = ⌈−log_2(p(m)/2)⌉.
We put the point σ(m) = Q(m) + p(m)/2 at the center of the segment and choose the codeword length in such a manner that, after l_m binary symbols have been transmitted, the length of the interval of uncertainty is less than or equal to p(m)/2.
x   p(x)   Q(x)         σ(x)         l(x)   GM      ShFE
1   0.1    0.0          0.00001...   5      00001   0000
2   0.6    0.0001...    0.01100...   2      01      0
3   0.3    0.10110...   0.11011...   3      110     10

L = 2.6 bits, H(X) = 1.3 bits
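Note that the ShFE column (apparently Q(x) truncated to ⌈−log_2 p(x)⌉ bits, without the extra bit) is not prefix-free here: 0 is a prefix of 0000. The Gilbert-Moore construction recovers the prefix property with the midpoint σ(x) and one extra bit, as the following Python sketch checks (function name is my choice):

from math import ceil, log2

def gilbert_moore_code(probs):
    # Codeword: first ceil(-log2(p/2)) bits of sigma = Q + p/2.
    # No ordering of the probabilities is required.
    Q, code = 0.0, []
    for p in probs:
        sigma = Q + p / 2
        l = ceil(-log2(p / 2))
        bits = ''.join(str(int(sigma * 2 ** (k + 1)) % 2) for k in range(l))
        code.append(bits)
        Q += p
    return code

cw = gilbert_moore_code([0.1, 0.6, 0.3])
print(cw)  # ['00001', '01', '110']
# no codeword is a prefix of another
assert not any(a != b and b.startswith(a) for a in cw for b in cw)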
Let i < j; then σ(j) > σ(i):

σ(j) − σ(i) = Σ_{l=1}^{j−1} p(l) − Σ_{l=1}^{i−1} p(l) + p(j)/2 − p(i)/2
            = Σ_{l=i}^{j−1} p(l) + (p(j) − p(i))/2
            ≥ p(i) + (p(j) − p(i))/2
            = (p(i) + p(j))/2
            ≥ max{p(i), p(j)}/2.

Since l_m = ⌈−log_2(p(m)/2)⌉ ≥ −log_2(p(m)/2), we obtain

σ(j) − σ(i) ≥ max{p(i), p(j)}/2 ≥ 2^{−min{l_i, l_j}}.

H(X) + 1 ≤ L < H(X) + 2.
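For instance, in the example above with i = 1, j = 2: σ(2) − σ(1) = 0.4 − 0.05 = 0.35 = (p(1) + p(2))/2 ≥ max{p(1), p(2)}/2 = 0.3 ≥ 2^{−min{5, 2}} = 0.25, so σ(1) and σ(2) already differ within their first min{l_1, l_2} = 2 bits (00... versus 01...).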
When is symbol-by-symbol coding not efficient?

1. Memoryless source
For symbol-by-symbol coding
R = H(X) + α,
where α is the coding redundancy.
For block coding
R = (H(X^n) + α)/n = (nH(X) + α)/n = H(X) + α/n,
where H(X^n) denotes the entropy of n random variables.
If H(X) ≪ 1, then R ≥ 1 for symbol-by-symbol coding. For a binary memoryless source with p(0) = 0.99, p(1) = 0.01 we have H(X) = 0.081 bits; we can easily construct a Huffman code with R = 1 bit, but it is impossible to obtain R < 1 bit.

2. Source with memory
H(X^n) ≤ nH(X) and
R = (H(X^n) + α)/n ≤ H(X) + α/n.
R → H_∞(X)
as n → ∞, where H_∞(X) denotes the entropy rate.
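A quick numeric check of the block-coding effect (a sketch; entropy and huffman_avg_length are my helper names, and the average Huffman length is computed as the sum of the probabilities of all merged nodes):

import heapq
from itertools import product
from math import log2, prod

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def huffman_avg_length(probs):
    # Average codeword length of a Huffman code: every merge of the
    # two smallest weights adds their sum (one bit for all leaves below).
    heap = sorted(probs)
    total = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        heapq.heappush(heap, a + b)
        total += a + b
    return total

p = [0.99, 0.01]
print(entropy(p))  # ~0.0808 bits per symbol
for n in (1, 2, 4, 8):
    block = [prod(t) for t in product(p, repeat=n)]
    print(n, huffman_avg_length(block) / n)  # per-symbol rate falls toward H(X)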
How to implement block coding?
Let x ∈ X = {1, . . . , M }, and we are going to
encode sequences x = (x1, . . . , xn) which appear
at the output of X during n consecutive time
moments.
We can consider a new source X^n with symbols corresponding to the sequences x = (x_1, . . . , x_n) of length n and apply any method of symbol-by-symbol coding to these symbols. We will obtain
R = H(X^n)/n + α/n,
where α depends on the chosen coding procedure.
The problem is coding complexity. The alphabet of the new source is of size M^n. For example, if M = 2^8 = 256, then for n = 2 we have M^2 = 65536, and for n = 3, M^3 = 16777216.
Arithmetic coding provides redundancy 2/n with complexity of order n^2.
Arithmetic coding is a direct extension of the
Gilbert-Moore coding scheme.
Let x = (x_1, x_2, . . . , x_n) be an M-ary sequence of length n. We construct the modified cumulative distribution function
σ(x) = Σ_{a≺x} p(a) + p(x)/2 = Q(x) + p(x)/2,
where a ≺ x means that a is lexicographically less than x, and l(x) = ⌈−log_2(p(x)/2)⌉.
The code rate R is equal to
R = (1/n) Σ_x p(x) l(x) = (1/n) Σ_x p(x) ⌈log_2(1/p(x)) + 1⌉ < (H(X^n) + 2)/n.
If the source generates symbols independently, we obtain
R < H(X) + 2/n.
For a source with memory,
R → H_∞(X)
as n → ∞.
Consider
Q(x_[1,n]) = Σ_{a≺x} p(a)
           = Σ_{a: a_[1,n−1] ≺ x_[1,n−1]} p(a) + Σ_{a: a_[1,n−1] = x_[1,n−1], a_n ≺ x_n} p(a),
where x_[1,i] = (x_1, x_2, . . . , x_i). It is easy to see that
Q(x_[1,n]) = Q(x_[1,n−1]) + Σ_{a: a_[1,n−1] = x_[1,n−1], a_n ≺ x_n} p(a)
           = Q(x_[1,n−1]) + p(x_[1,n−1]) Σ_{a_n ≺ x_n} p(a_n | x_[1,n−1]).
If the source generates symbols independently,
p(x_[1,n−1]) = Π_{i=1}^{n−1} p(x_i),
Σ_{a_n ≺ x_n} p(a_n | x_[1,n−1]) = Σ_{a_n ≺ x_n} p(a_n) = Q(x_n),
where Q(x_i) denotes the cumulative probability for x_i.
[Figure: the unit interval subdivided according to the 4-bit binary expansions 0000, 0001, 0010, . . . , 1001, . . .]
We obtain the following recurrent equations:
Q(x_[1,n]) = Q(x_[1,n−1]) + p(x_[1,n−1]) Q(x_n),
p(x_[1,n−1]) = p(x_[1,n−2]) p(x_{n−1}).
Coding procedure

x = (x_1, . . . , x_n)
Initialization:
F = 0; G = 1; Q(1) = 0;
for j = 2 : M
    Q(j) = Q(j − 1) + p(j − 1);
end;
for i = 1 : n
    F ← F + Q(x_i) × G;
    G ← G × p(x_i);
end;
F ← F + G/2; l = ⌈−log_2(G/2)⌉; F̂ = ⌊F × 2^l⌋;
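A direct floating-point transcription in Python (a sketch that inherits the precision problem discussed below; symbols are 0-based indices and the names are my choice):

from math import ceil, floor, log2

def arith_encode(x, p):
    # Encode sequence x (indices 0..M-1) under pmf p; return the
    # codeword bit string. Pure floats, so exact only for short inputs.
    Q = [0.0]
    for pm in p:
        Q.append(Q[-1] + pm)  # cumulative probabilities
    F, G = 0.0, 1.0
    for xi in x:
        F += Q[xi] * G
        G *= p[xi]
    F += G / 2
    l = ceil(-log2(G / 2))
    return format(floor(F * 2 ** l), '0{}b'.format(l))  # first l bits of F

# x = (b, c, b, a, b) with p(a) = 0.1, p(b) = 0.6, p(c) = 0.3:
print(arith_encode([1, 2, 1, 0, 1], [0.1, 0.6, 0.3]))  # '100010101'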
X = {a, b, c},
p(a) = 0.1, p(b) = 0.6, p(c) = 0.3,
x = (bcbab), n = 5.

i   x_i   p(x_i)   Q(x_i)   F        G
0   -     -        -        0.0000   1.0000
1   b     0.6      0.1      0.1000   0.6000
2   c     0.3      0.7      0.5200   0.1800
3   b     0.6      0.1      0.5380   0.1080
4   a     0.1      0.0      0.5380   0.0108
5   b     0.6      0.1      0.5391   0.0065

Codeword length l = ⌈−log_2 G⌉ + 1 = 9.
F + G/2 = 0.5423... and the codeword is F̂ = ⌊(F + G/2) × 2^l⌋ = 100010101.
H(X) = 1.3 bits, R = 9/5 = 1.8 bits per symbol.
At each step of the coding algorithm we perform one addition and two multiplications.
Let p(1), . . . , p(M) be numbers with binary representations of length d. Then after the first step F and G are numbers with binary representations of length 2d. The next steps require binary representations of length 3d, . . . , nd.
The complexity of the coding procedure can therefore be estimated as
d + 2d + · · · + nd = n(n + 1)d/2.
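The growth of the required precision is easy to observe with exact rational arithmetic (a sketch; the d = 4-bit probabilities below, p(a) = 1/16, p(b) = 10/16, p(c) = 5/16, are hypothetical stand-ins for the 0.1, 0.6, 0.3 of the earlier example):

from fractions import Fraction

p = {'a': Fraction(1, 16), 'b': Fraction(10, 16), 'c': Fraction(5, 16)}
Q = {'a': Fraction(0), 'b': Fraction(1, 16), 'c': Fraction(11, 16)}

F, G = Fraction(0), Fraction(1)
for x in 'bcbab':
    F = F + Q[x] * G
    G = G * p[x]
    # the denominator of G needs up to d more bits after every step
    print(x, G.denominator.bit_length() - 1, 'fractional bits')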
PROBLEMS
1. The algorithm requires high computational accuracy (theoretically infinite precision).
2. The computational delay equals the length of the sequence to be encoded.
Decoding of Gilbert-Moore code
Q(m), m = 1, . . . , M are known (with Q(M + 1) = 1).
Input: σ̂.
Set m = 1.
While Q(m + 1) < σ̂: m ← m + 1;
end;
Output: x(m).
Example.
σ̂ = 0.01_2 → σ̂ = 0.25.
Q(2) = 0.1 < 0.25, so m = 2.
Q(3) = 0.7 > 0.25, so stop with m = 2.
Decoding procedure:
F̂ ← F̂/2^l; S = 0; G = 1;
for i = 1 : n
    j = 1;
    while j < M and S + Q(j + 1) × G < F̂
        j ← j + 1;
    end;
    S ← S + Q(j) × G;
    G ← G × p(j);
    x_i = j;
end;
After the ith step, G = p(x_[1,i]) and S = Q(x_[1,i]).
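A runnable counterpart to the encoder sketch above (same assumptions; the inner loop mirrors the while-condition of the procedure):

def arith_decode(codeword, n, p):
    # Invert arith_encode: recover n symbols from the bit string.
    Q = [0.0]
    for pm in p:
        Q.append(Q[-1] + pm)
    Fhat = int(codeword, 2) / 2 ** len(codeword)  # codeword as a binary fraction
    S, G, out = 0.0, 1.0, []
    for _ in range(n):
        j = 0
        while j + 1 < len(p) and S + Q[j + 1] * G < Fhat:
            j += 1
        S += Q[j] * G
        G *= p[j]
        out.append(j)
    return out

print(arith_decode('100010101', 5, [0.1, 0.6, 0.3]))  # [1, 2, 1, 0, 1]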
X = {a, b, c}, p(a) = 0.1, p(b) = 0.6, p(c) = 0.3.
Codeword 100010101, i.e. F̂ = 0.100010101_2 ≈ 0.541.

S        G        Hyp.   Q(j)   S + Q(j) × G    x_i   p(x_i)
0.0000   1.0000   a      0.0    0.0000 < F̂
                  b      0.1    0.1000 < F̂
                  c      0.7    0.7000 > F̂      b     0.6
0.1000   0.6000   a      0.0    0.1000 < F̂
                  b      0.1    0.1600 < F̂
                  c      0.7    0.5200 < F̂      c     0.3
0.5200   0.1800   a      0.0    0.5200 < F̂
                  b      0.1    0.5380 < F̂
                  c      0.7    0.6460 > F̂      b     0.6
0.5380   0.1080   a      0.0    0.5380 < F̂
                  b      0.1    0.5488 > F̂      a     0.1
0.5380   0.0108   a      0.0    0.5380 < F̂
                  b      0.1    0.5391 < F̂
                  c      0.7    0.5456 > F̂      b     0.6
Implementation of arithmetic coding

1. High < 0.5:
Bit = 0;
Normalization:
Low ← Low × 2;
High ← High × 2.
Example: Low = 0; High = 0.00011000001.
Bit = 0; High = 0.0011000001.
Bit = 0; High = 0.011000001.
Bit = 0; High = 0.11000001.

2. Low > 0.5:
Bit = 1;
Normalization:
Low ← (Low − 0.5) × 2;
High ← (High − 0.5) × 2.
Example: Low = 0.11000011.
Bit = 1; Low = 0.1000011.
Bit = 1; Low = 0.000011.
3. Low < 0.5, High > 0.5:
The interval still contains 0.5, so the next bit is undetermined: a value near 0.011111...1 can still become 0.01111...10 or 0.10000...01.
Example: Low = 0.0110 < 0.5, High = 0.1010 > 0.5.
Count = 1; read the next symbol:
Low = 0.10001 = 0.0110 + 0.00101,
High = 0.10101.
Now Low > 0.5, so Bit = 1 followed by Count inverted bits. Output: 10.
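The three rules combine into the classic integer renormalization loop (a sketch in the style of the Witten-Neal-Cleary coder; the function name, the 16-bit precision, and the float cumulative probabilities are my choices, and a production coder would use integer frequency counts):

def arith_encode_scaled(x, p, precision=16):
    # Arithmetic encoder with renormalization: cases 1 and 2 emit a bit
    # and rescale; case 3 (interval straddling 0.5) only counts a pending bit.
    whole = 1 << precision
    half, quarter = whole >> 1, whole >> 2
    cum = [0.0]
    for pm in p:
        cum.append(cum[-1] + pm)
    low, high, pending, bits = 0, whole - 1, 0, []

    def emit(bit):
        nonlocal pending
        bits.append(bit)
        bits.extend([1 - bit] * pending)  # flush Count inverted bits
        pending = 0

    for xi in x:
        span = high - low + 1
        high = low + int(span * cum[xi + 1]) - 1
        low = low + int(span * cum[xi])
        while True:
            if high < half:                          # case 1: High < 0.5
                emit(0)
            elif low >= half:                        # case 2: Low >= 0.5
                emit(1)
                low -= half
                high -= half
            elif low >= quarter and high < 3 * quarter:
                pending += 1                         # case 3: straddle, postpone
                low -= quarter
                high -= quarter
            else:
                break
            low = 2 * low
            high = 2 * high + 1
    pending += 1                                     # termination: pin the interval
    emit(0 if low < quarter else 1)
    return bits

print(arith_encode_scaled([1, 2, 1, 0, 1], [0.1, 0.6, 0.3]))

With this loop, Low and High stay within a fixed word length, and bits are emitted on the fly rather than after the whole sequence, which addresses both problems noted above.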