Arithmetic coding
Irina Bocharova,
University of Aerospace Instrumentation,
St. Petersburg, Russia
Lund, Sweden, February 2005
Outline
Shannon-Fano-Elias coding
Gilbert-Moore coding
Arithmetic coding as a generalization of SFE and GM coding
Implementation of arithmetic coding
Shannon-Fano-Elias coding

Let x ∈ X = {1, . . . , M}, p(x) > 0,
p(1) ≥ p(2) ≥ · · · ≥ p(M).
The cumulative sum associated with the symbol x is
Q(x) = Σ_{a<x} p(a),
that is,
Q(1) = 0, Q(2) = p(1), . . . , Q(M) = Σ_{i=1}^{M−1} p(i).
Then ⌊Q(m)⌋_{l_m}, the binary expansion of Q(m) truncated to l_m bits, is the codeword for m, where l_m = ⌈−log_2 p(m)⌉.
x   p(x)   Q(x)   Q(x) in binary   l(x)   codeword
1   0.6    0.0    0.0              1      0
2   0.3    0.6    0.1001...        2      10
3   0.1    0.9    0.1110...        4      1110

L = 1.6 bits, H(X) = 1.3 bits
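A small Python sketch of this construction (the function name shannon_code is my choice, and the input is assumed to be sorted in descending order, as required above); it reproduces the table:

from math import ceil, log2

def shannon_code(probs):
    # Codeword for symbol m: the first ceil(-log2 p(m)) bits of the
    # binary expansion of Q(m). Assumes descending probabilities.
    Q, code = 0.0, []
    for p in probs:
        l = ceil(-log2(p))
        bits = ''.join(str(int(Q * 2 ** (k + 1)) % 2) for k in range(l))
        code.append(bits)
        Q += p
    return code

print(shannon_code([0.6, 0.3, 0.1]))  # ['0', '10', '1110']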
If l_m binary symbols have already been transmitted, then the length of the interval of uncertainty is 2^{−l_m}. Thus we can decode uniquely if
2^{−l_m} ≤ p(m),
or
l_m ≥ −log_2 p(m).
Choosing the length l_m in this way, we used only the segment to the right of the point Q(m). This segment is never longer than the corresponding left segment, since the symbol probabilities are ordered in descending order.
H(X) ≤ L < H(X) + 1.
Gilbert-Moore coding

Let x ∈ X = {1, . . . , M}, p(x) > 0 (the probabilities need not be ordered).
The cumulative sum associated with the symbol x is
Q(x) = Σ_{a<x} p(a),
that is,
Q(1) = 0, Q(2) = p(1), . . . , Q(M) = Σ_{i=1}^{M−1} p(i).
Introduce
σ(x) = Q(x) + p(x)/2.
Then σ̂(m) = ⌊σ(m)⌋_{l_m} is the codeword for m, where l_m = ⌈−log_2(p(m)/2)⌉.
We put the point σ(m) = Q(m) + p(m)/2 at the center of the segment and choose the codeword length in such a manner that, after l_m binary symbols have been transmitted, the length of the interval of uncertainty is less than or equal to p(m)/2.
x   p(x)   Q(x)         σ(x)         l(x)   GM      ShFE
1   0.1    0.0          0.00001...   5      00001   0000
2   0.6    0.0001...    0.01100...   2      01      0
3   0.3    0.10110...   0.11011...   3      110     10

L = 2.6 bits, H(X) = 1.3 bits
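Note that the ShFE column (apparently Q(x) truncated to ⌈−log_2 p(x)⌉ bits, without the extra bit) is not prefix-free here: 0 is a prefix of 0000. The Gilbert-Moore construction recovers the prefix property with the midpoint σ(x) and one extra bit, as the following Python sketch checks (function name is my choice):

from math import ceil, log2

def gilbert_moore_code(probs):
    # Codeword: first ceil(-log2(p/2)) bits of sigma = Q + p/2.
    # No ordering of the probabilities is required.
    Q, code = 0.0, []
    for p in probs:
        sigma = Q + p / 2
        l = ceil(-log2(p / 2))
        bits = ''.join(str(int(sigma * 2 ** (k + 1)) % 2) for k in range(l))
        code.append(bits)
        Q += p
    return code

cw = gilbert_moore_code([0.1, 0.6, 0.3])
print(cw)  # ['00001', '01', '110']
# no codeword is a prefix of another
assert not any(a != b and b.startswith(a) for a in cw for b in cw)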
Let i < j; then σ(j) > σ(i):

σ(j) − σ(i) = Σ_{l=1}^{j−1} p(l) − Σ_{l=1}^{i−1} p(l) + p(j)/2 − p(i)/2
            = Σ_{l=i}^{j−1} p(l) + (p(j) − p(i))/2
            ≥ p(i) + (p(j) − p(i))/2
            = (p(i) + p(j))/2
            ≥ max{p(i), p(j)}/2.

Since l_m = ⌈−log_2(p(m)/2)⌉ ≥ −log_2(p(m)/2), we obtain

σ(j) − σ(i) ≥ max{p(i), p(j)}/2 ≥ 2^{−min{l_i, l_j}}.

H(X) + 1 ≤ L < H(X) + 2.
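For instance, in the example above with i = 1, j = 2: σ(2) − σ(1) = 0.4 − 0.05 = 0.35 = (p(1) + p(2))/2 ≥ max{p(1), p(2)}/2 = 0.3 ≥ 2^{−min{5, 2}} = 0.25, so σ(1) and σ(2) already differ within their first min{l_1, l_2} = 2 bits (00... versus 01...).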
When is symbol-by-symbol coding not efficient?

1. Memoryless source
For symbol-by-symbol coding
R = H(X) + α,
where α is the coding redundancy.
For block coding
R = (H(X^n) + α)/n = (nH(X) + α)/n = H(X) + α/n,
where H(X^n) denotes the entropy of n random variables.
If H(X) ≪ 1, then R ≥ 1 for symbol-by-symbol coding. For a binary memoryless source with p(0) = 0.99, p(1) = 0.01 we have H(X) = 0.081 bits; we can easily construct a Huffman code with R = 1 bit, but it is impossible to obtain R < 1 bit.

2. Source with memory
H(X^n) ≤ nH(X) and
R = (H(X^n) + α)/n ≤ H(X) + α/n.
R → H_∞(X)
as n → ∞, where H_∞(X) denotes the entropy rate.
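A quick numeric check of the block-coding effect (a sketch; entropy and huffman_avg_length are my helper names, and the average Huffman length is computed as the sum of the probabilities of all merged nodes):

import heapq
from itertools import product
from math import log2, prod

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def huffman_avg_length(probs):
    # Average codeword length of a Huffman code: every merge of the
    # two smallest weights adds their sum (one bit for all leaves below).
    heap = sorted(probs)
    total = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        heapq.heappush(heap, a + b)
        total += a + b
    return total

p = [0.99, 0.01]
print(entropy(p))  # ~0.0808 bits per symbol
for n in (1, 2, 4, 8):
    block = [prod(t) for t in product(p, repeat=n)]
    print(n, huffman_avg_length(block) / n)  # per-symbol rate falls toward H(X)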
How to implement block coding?
Let x ∈ X = {1, . . . , M }, and we are going to
encode sequences x = (x1, . . . , xn) which appear
at the output of X during n consecutive time
moments.
We can consider a new source X^n with symbols corresponding to the sequences x = (x_1, . . . , x_n) of length n and apply any method of symbol-by-symbol coding to these symbols. We will obtain
R = H(X^n)/n + α/n,
where α depends on the chosen coding procedure.
The problem is coding complexity. The alphabet of the new source is of size M^n. For example, if M = 2^8 = 256, then for n = 2 we have M^2 = 65536, and for n = 3, M^3 = 16777216.
Arithmetic coding provides redundancy 2/n with complexity of order n^2.
Arithmetic coding is a direct extension of the
Gilbert-Moore coding scheme.
Let x = (x_1, x_2, . . . , x_n) be an M-ary sequence of length n. We construct the modified cumulative distribution function
σ(x) = Σ_{a≺x} p(a) + p(x)/2 = Q(x) + p(x)/2,
where a ≺ x means that a is lexicographically less than x, and l(x) = ⌈−log_2(p(x)/2)⌉.
The code rate R is equal to
R = (1/n) Σ_x p(x) l(x) = (1/n) Σ_x p(x) ⌈log_2(1/p(x)) + 1⌉ < (H(X^n) + 2)/n.
If the source generates symbols independently, we obtain
R < H(X) + 2/n.
For a source with memory,
R → H_∞(X)
as n → ∞.
Consider
Q(x_[1,n]) = Σ_{a≺x} p(a)
           = Σ_{a: a_[1,n−1] ≺ x_[1,n−1]} p(a) + Σ_{a: a_[1,n−1] = x_[1,n−1], a_n ≺ x_n} p(a),
where x_[1,i] = (x_1, x_2, . . . , x_i). It is easy to see that
Q(x_[1,n]) = Q(x_[1,n−1]) + Σ_{a: a_[1,n−1] = x_[1,n−1], a_n ≺ x_n} p(a)
           = Q(x_[1,n−1]) + p(x_[1,n−1]) Σ_{a_n ≺ x_n} p(a_n | x_[1,n−1]).
If the source generates symbols independently,
p(x_[1,n−1]) = Π_{i=1}^{n−1} p(x_i),
Σ_{a_n ≺ x_n} p(a_n | x_[1,n−1]) = Σ_{a_n ≺ x_n} p(a_n) = Q(x_n),
where Q(x_i) denotes the cumulative probability for x_i.
[Figure: the unit interval subdivided according to the 4-bit binary expansions 0000, 0001, 0010, . . . , 1001, . . .]
We obtain the following recurrent equations:
Q(x_[1,n]) = Q(x_[1,n−1]) + p(x_[1,n−1]) Q(x_n),
p(x_[1,n−1]) = p(x_[1,n−2]) p(x_{n−1}).
Coding procedure

x = (x_1, . . . , x_n)
Initialization:
F = 0; G = 1; Q(1) = 0;
for j = 2 : M
    Q(j) = Q(j − 1) + p(j − 1);
end;
for i = 1 : n
    F ← F + Q(x_i) × G;
    G ← G × p(x_i);
end;
F ← F + G/2; l = ⌈−log_2(G/2)⌉; F̂ = ⌊F × 2^l⌋;
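A direct floating-point transcription in Python (a sketch that inherits the precision problem discussed below; symbols are 0-based indices and the names are my choice):

from math import ceil, floor, log2

def arith_encode(x, p):
    # Encode sequence x (indices 0..M-1) under pmf p; return the
    # codeword bit string. Pure floats, so exact only for short inputs.
    Q = [0.0]
    for pm in p:
        Q.append(Q[-1] + pm)  # cumulative probabilities
    F, G = 0.0, 1.0
    for xi in x:
        F += Q[xi] * G
        G *= p[xi]
    F += G / 2
    l = ceil(-log2(G / 2))
    return format(floor(F * 2 ** l), '0{}b'.format(l))  # first l bits of F

# x = (b, c, b, a, b) with p(a) = 0.1, p(b) = 0.6, p(c) = 0.3:
print(arith_encode([1, 2, 1, 0, 1], [0.1, 0.6, 0.3]))  # '100010101'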
X = {a, b, c},
p(a) = 0.1, p(b) = 0.6, p(c) = 0.3,
x = (bcbab), n = 5.

i   x_i   p(x_i)   Q(x_i)   F        G
0   -     -        -        0.0000   1.0000
1   b     0.6      0.1      0.1000   0.6000
2   c     0.3      0.7      0.5200   0.1800
3   b     0.6      0.1      0.5380   0.1080
4   a     0.1      0.0      0.5380   0.0108
5   b     0.6      0.1      0.5391   0.0065

Codeword length l = ⌈−log_2 G⌉ + 1 = 9.
F + G/2 = 0.5423... and the codeword is F̂ = ⌊(F + G/2) × 2^l⌋ = 100010101.
H(X) = 1.3 bits, R = 9/5 = 1.8 bits per symbol.
At each step of the coding algorithm we perform one addition and two multiplications.
Let p(1), . . . , p(M) be numbers with binary representations of length d. Then after the first step F and G are numbers with binary representations of length 2d. The next steps require binary representations of length 3d, . . . , nd.
The complexity of the coding procedure can therefore be estimated as
d + 2d + · · · + nd = n(n + 1)d/2.
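The growth of the required precision is easy to observe with exact rational arithmetic (a sketch; the d = 4-bit probabilities below, p(a) = 1/16, p(b) = 10/16, p(c) = 5/16, are hypothetical stand-ins for the 0.1, 0.6, 0.3 of the earlier example):

from fractions import Fraction

p = {'a': Fraction(1, 16), 'b': Fraction(10, 16), 'c': Fraction(5, 16)}
Q = {'a': Fraction(0), 'b': Fraction(1, 16), 'c': Fraction(11, 16)}

F, G = Fraction(0), Fraction(1)
for x in 'bcbab':
    F = F + Q[x] * G
    G = G * p[x]
    # the denominator of G needs up to d more bits after every step
    print(x, G.denominator.bit_length() - 1, 'fractional bits')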
PROBLEMS
1. The algorithm requires high computational accuracy (theoretically infinite precision).
2. The computational delay equals the length of the sequence to be encoded.
Decoding of Gilbert-Moore code
Q(m), m = 1, . . . , M are known (with Q(M + 1) = 1).
Input: σ̂.
Set m = 1.
While Q(m + 1) < σ̂: m ← m + 1;
end;
Output: x(m).
Example.
σ̂ = 0.01_2 → σ̂ = 0.25.
Q(2) = 0.1 < 0.25, so m = 2.
Q(3) = 0.7 > 0.25, so stop with m = 2.
Decoding procedure:
F̂ ← F̂/2^l; S = 0; G = 1;
for i = 1 : n
    j = 1;
    while j < M and S + Q(j + 1) × G < F̂
        j ← j + 1;
    end;
    S ← S + Q(j) × G;
    G ← G × p(j);
    x_i = j;
end;
After the ith step, G = p(x_[1,i]) and S = Q(x_[1,i]).
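A runnable counterpart to the encoder sketch above (same assumptions; the inner loop mirrors the while-condition of the procedure):

def arith_decode(codeword, n, p):
    # Invert arith_encode: recover n symbols from the bit string.
    Q = [0.0]
    for pm in p:
        Q.append(Q[-1] + pm)
    Fhat = int(codeword, 2) / 2 ** len(codeword)  # codeword as a binary fraction
    S, G, out = 0.0, 1.0, []
    for _ in range(n):
        j = 0
        while j + 1 < len(p) and S + Q[j + 1] * G < Fhat:
            j += 1
        S += Q[j] * G
        G *= p[j]
        out.append(j)
    return out

print(arith_decode('100010101', 5, [0.1, 0.6, 0.3]))  # [1, 2, 1, 0, 1]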
X = {a, b, c}, p(a) = 0.1, p(b) = 0.6, p(c) = 0.3.
Codeword 100010101, i.e. F̂ = 0.100010101_2 ≈ 0.541.

S        G        Hyp.   Q(j)   S + Q(j) × G    x_i   p(x_i)
0.0000   1.0000   a      0.0    0.0000 < F̂
                  b      0.1    0.1000 < F̂
                  c      0.7    0.7000 > F̂      b     0.6
0.1000   0.6000   a      0.0    0.1000 < F̂
                  b      0.1    0.1600 < F̂
                  c      0.7    0.5200 < F̂      c     0.3
0.5200   0.1800   a      0.0    0.5200 < F̂
                  b      0.1    0.5380 < F̂
                  c      0.7    0.6460 > F̂      b     0.6
0.5380   0.1080   a      0.0    0.5380 < F̂
                  b      0.1    0.5488 > F̂      a     0.1
0.5380   0.0108   a      0.0    0.5380 < F̂
                  b      0.1    0.5391 < F̂
                  c      0.7    0.5456 > F̂      b     0.6
Implementation of arithmetic coding

1. High < 0.5:
Bit = 0;
Normalization:
Low ← Low × 2;
High ← High × 2.
Example: Low = 0; High = 0.00011000001.
Bit = 0; High = 0.0011000001.
Bit = 0; High = 0.011000001.
Bit = 0; High = 0.11000001.

2. Low > 0.5:
Bit = 1;
Normalization:
Low ← (Low − 0.5) × 2;
High ← (High − 0.5) × 2.
Example: Low = 0.11000011.
Bit = 1; Low = 0.1000011.
Bit = 1; Low = 0.000011.
3. Low < 0.5, High > 0.5:
The interval still contains 0.5, so the next bit is undetermined: a value near 0.011111...1 can still become 0.01111...10 or 0.10000...01.
Example: Low = 0.0110 < 0.5, High = 0.1010 > 0.5.
Count = 1; read the next symbol:
Low = 0.10001 = 0.0110 + 0.00101,
High = 0.10101.
Now Low > 0.5, so Bit = 1 followed by Count inverted bits. Output: 10.
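The three rules combine into the classic integer renormalization loop (a sketch in the style of the Witten-Neal-Cleary coder; the function name, the 16-bit precision, and the float cumulative probabilities are my choices, and a production coder would use integer frequency counts):

def arith_encode_scaled(x, p, precision=16):
    # Arithmetic encoder with renormalization: cases 1 and 2 emit a bit
    # and rescale; case 3 (interval straddling 0.5) only counts a pending bit.
    whole = 1 << precision
    half, quarter = whole >> 1, whole >> 2
    cum = [0.0]
    for pm in p:
        cum.append(cum[-1] + pm)
    low, high, pending, bits = 0, whole - 1, 0, []

    def emit(bit):
        nonlocal pending
        bits.append(bit)
        bits.extend([1 - bit] * pending)  # flush Count inverted bits
        pending = 0

    for xi in x:
        span = high - low + 1
        high = low + int(span * cum[xi + 1]) - 1
        low = low + int(span * cum[xi])
        while True:
            if high < half:                          # case 1: High < 0.5
                emit(0)
            elif low >= half:                        # case 2: Low >= 0.5
                emit(1)
                low -= half
                high -= half
            elif low >= quarter and high < 3 * quarter:
                pending += 1                         # case 3: straddle, postpone
                low -= quarter
                high -= quarter
            else:
                break
            low = 2 * low
            high = 2 * high + 1
    pending += 1                                     # termination: pin the interval
    emit(0 if low < quarter else 1)
    return bits

print(arith_encode_scaled([1, 2, 1, 0, 1], [0.1, 0.6, 0.3]))

With this loop, Low and High stay within a fixed word length, and bits are emitted on the fly rather than after the whole sequence, which addresses both problems noted above.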