
Lec4 Arith Compression

Uploaded by sherifrax
Copyright © All Rights Reserved

CS 411: Data Compression

Lecture 4
Arithmetic Compression and Decompression Method
Arithmetic Coding History
• The idea of arithmetic coding was suggested by Rissanen [1975] from the theory of enumerative coding by Pasco [1976].
• The material in these notes is based on the most popular implementation of arithmetic coding, by Witten et al., published in Communications of the Association for Computing Machinery (1987).
• Moffat et al. (1998) proposed some improvements to the 1987 paper; however, the basic idea remains the same.
Arithmetic (or Range) Coding
(addresses coding redundancy)

• Huffman coding encodes source symbols one at a time, using at least one bit per symbol, which may not be efficient overall.
• Arithmetic coding assigns a single variable-length code word to an entire sequence of source symbols.
• There is no one-to-one correspondence between source symbols and code words.
• Slower than Huffman coding, but can achieve higher compression.
Arithmetic Coding – Main Idea
• Maps a sequence of symbols (e.g., α1 α2 α3 α3 α4) to a real number (the arithmetic code) in the interval [0, 1).
• The mapping is built incrementally (i.e., as each source symbol arrives) and depends on the source symbol probabilities.
• The original sequence of symbols can be obtained by decoding the arithmetic code.
Arithmetic Coding – Main Idea (cont’d)
Symbol sequence: α1 α2 α3 α3 α4
Known probabilities P(αi)
• Start with the interval [0, 1).
• A sub-interval of [0, 1) is chosen to encode the first symbol α1 in the sequence (based on P(α1)).
• A sub-interval inside the previous sub-interval is chosen to encode the next symbol α2 in the sequence (based on P(α2)).
• Eventually, the whole symbol sequence is encoded by any number within the final sub-interval.
Arithmetic Coding
• Encoding algorithm for arithmetic coding:

low = 0.0 ; high = 1.0 ;
while not EOF do
    range = high - low ; read(c) ;
    high = low + range * high_range(c) ;
    low = low + range * low_range(c) ;
enddo
output(low) ;
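The encoder above can be sketched in Python as follows. This is a minimal sketch that assumes the BILL GATES model from Example (2), with the ^ symbol written as a plain space " ". The names `MODEL` and `encode` are illustrative and do not come from the lecture. The sketch uses `fractions.Fraction` so the interval endpoints stay exact. Practical coders use fixed-width integer arithmetic instead.

```python
from fractions import Fraction

# Model from Example (2): symbol -> (low_range, high_range), exact fractions.
# " " plays the role of the ^ (space) symbol in the slides.
MODEL = {
    " ": (Fraction("0.0"), Fraction("0.1")),
    "A": (Fraction("0.1"), Fraction("0.2")),
    "B": (Fraction("0.2"), Fraction("0.3")),
    "E": (Fraction("0.3"), Fraction("0.4")),
    "G": (Fraction("0.4"), Fraction("0.5")),
    "I": (Fraction("0.5"), Fraction("0.6")),
    "L": (Fraction("0.6"), Fraction("0.8")),
    "S": (Fraction("0.8"), Fraction("0.9")),
    "T": (Fraction("0.9"), Fraction("1.0")),
}


def encode(message, model):
    """Narrow [low, high) once per symbol and return the final interval."""
    low, high = Fraction(0), Fraction(1)
    for c in message:
        rng = high - low
        low_c, high_c = model[c]
        # high is updated first, so both updates use the previous low.
        high = low + rng * high_c
        low = low + rng * low_c
    return low, high


low, high = encode("BILL GATES", MODEL)
print(float(low), float(high))  # 0.2572167752 0.2572167756
```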
Arithmetic Decoding
• Decoding algorithm:

r = input_code
repeat
    search c such that r falls in its range ;
    output(c) ;
    r = r - low_range(c) ;
    r = r / (high_range(c) - low_range(c)) ;
until the known message length is reached (or a special EOF symbol is decoded)
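Here is a matching decoder sketch under the same assumptions: exact `Fraction` arithmetic, the Example (2) model, and illustrative names. Instead of testing `r == 0`, it stops after a known number of symbols, `length`. That length is assumed to be known to the decoder.

```python
from fractions import Fraction

# Same Example (2) model as the encoder sketch; " " is the ^ (space) symbol.
MODEL = {
    " ": (Fraction("0.0"), Fraction("0.1")),
    "A": (Fraction("0.1"), Fraction("0.2")),
    "B": (Fraction("0.2"), Fraction("0.3")),
    "E": (Fraction("0.3"), Fraction("0.4")),
    "G": (Fraction("0.4"), Fraction("0.5")),
    "I": (Fraction("0.5"), Fraction("0.6")),
    "L": (Fraction("0.6"), Fraction("0.8")),
    "S": (Fraction("0.8"), Fraction("0.9")),
    "T": (Fraction("0.9"), Fraction("1.0")),
}


def decode(code, model, length):
    """Recover `length` symbols from a code value in [0, 1)."""
    r = Fraction(code)
    out = []
    for _ in range(length):
        # search c such that r falls in [low_range(c), high_range(c))
        c = next(s for s, (lo, hi) in model.items() if lo <= r < hi)
        out.append(c)
        lo, hi = model[c]
        r = (r - lo) / (hi - lo)  # remove c's effect and rescale to [0, 1)
    return "".join(out)


print(decode("0.2572167752", MODEL, 10))  # BILL GATES
```

A length or EOF symbol is safer than testing `r == 0` because a symbol whose range starts at 0 does not move `low`. For example, "BILL" and "BILL " (with a trailing space) both encode to low = 0.2572. When decoding 0.2572, r reaches 0 after the fourth symbol. Only the length can tell the two messages apart: `decode("0.2572", MODEL, 4)` gives "BILL" and `decode("0.2572", MODEL, 5)` gives "BILL ".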
Example 1
Arithmetic Coding Example (2)
Character    Probability   Range
^ (space)    1/10          0.00 ≤ r < 0.10
A            1/10          0.10 ≤ r < 0.20
B            1/10          0.20 ≤ r < 0.30
E            1/10          0.30 ≤ r < 0.40
G            1/10          0.40 ≤ r < 0.50
I            1/10          0.50 ≤ r < 0.60
L            2/10          0.60 ≤ r < 0.80
S            1/10          0.80 ≤ r < 0.90
T            1/10          0.90 ≤ r < 1.00

Suppose that we want to encode the message BILL GATES.
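The probabilities in this table are consistent with each character's relative frequency in BILL GATES, since L occurs twice among ten characters. The ranges are laid out in sorted order, with space first. The sketch below derives the table that way. `build_model` is a hypothetical helper, and the frequency-based model is an assumption: the slide simply gives the probabilities.

```python
from collections import Counter
from fractions import Fraction


def build_model(text):
    """Map each symbol to [low, high), with width = count / len(text)."""
    counts = Counter(text)
    total = len(text)
    model, cum = {}, 0
    for sym in sorted(counts):  # " " sorts before the capital letters
        model[sym] = (Fraction(cum, total), Fraction(cum + counts[sym], total))
        cum += counts[sym]
    return model


for sym, (lo, hi) in build_model("BILL GATES").items():
    print(repr(sym), float(lo), float(hi))
```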
Arithmetic Coding Example (2)
[Figure: successive sub-intervals while encoding B, I, L, L, ^: [0.2, 0.3) → [0.25, 0.26) → [0.256, 0.258) → [0.2572, 0.2576) → [0.25720, 0.25724)]
Arithmetic Coding Example (2)
New character   Low value      High value
B               0.2            0.3
I               0.25           0.26
L               0.256          0.258
L               0.2572         0.2576
^ (space)       0.25720        0.25724
G               0.257216       0.257220
A               0.2572164      0.2572168
T               0.25721676     0.2572168
E               0.257216772    0.257216776
S               0.2572167752   0.2572167756
Arithmetic Coding Example (2)
• The final value 0.2572167752, called a tag, uniquely encodes the message 'BILL GATES'.
• Any value in the interval [0.2572167752, 0.2572167756) can serve as the tag for this message and can be uniquely decoded.
Arithmetic Decoding
• Decoding is the inverse process.
• Since 0.2572167752 falls between 0.2 and 0.3, the first character must be 'B'.
• Remove the effect of 'B' from 0.2572167752 by first subtracting the low value of B, 0.2, giving 0.0572167752.
• Then divide by the width of B's range, 0.1. This gives 0.572167752.
• Then find the range in which this value lands; it is the range of the next letter, 'I'.
• The process repeats until the known length of the message is reached (in this example, r also becomes 0 after the last letter).
r              c           Low   High   Range
0.2572167752   B           0.2   0.3    0.1
0.572167752    I           0.5   0.6    0.1
0.72167752     L           0.6   0.8    0.2
0.6083876      L           0.6   0.8    0.2
0.041938       ^ (space)   0.0   0.1    0.1
0.41938        G           0.4   0.5    0.1
0.1938         A           0.1   0.2    0.1
0.938          T           0.9   1.0    0.1
0.38           E           0.3   0.4    0.1
0.8            S           0.8   0.9    0.1
0.0
Arithmetic Coding Example (3)

Symbol   Probability   Range
1        0.80          [0.00, 0.80)
2        0.02          [0.80, 0.82)
3        0.18          [0.82, 1.00)

Suppose that we want to encode the message 1321.
Arithmetic Coding Example (3)
[Figure: successive sub-intervals while encoding 1, 3, 2, 1: [0.00, 0.80) → [0.656, 0.80) → [0.7712, 0.77408) → [0.7712, 0.773504)]
Arithmetic Coding Example (3)
Encoding:

New character   Low value   High value
                0.0         1.0
1               0.0         0.8
3               0.656       0.800
2               0.7712      0.77408
1               0.7712      0.773504

Tag: Tx(1321) = (0.7712 + 0.773504) / 2 = 0.772352, the midpoint of the final interval.
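The same narrowing can be applied to Example (3), finishing with the midpoint tag. The sketch below does this with exact `Fraction` arithmetic. `MODEL` and `encode_with_tag` are illustrative names.

```python
from fractions import Fraction

# Example (3) model: symbol -> (low, high)
MODEL = {
    "1": (Fraction("0.00"), Fraction("0.80")),
    "2": (Fraction("0.80"), Fraction("0.82")),
    "3": (Fraction("0.82"), Fraction("1.00")),
}


def encode_with_tag(message, model):
    """Return the final interval [low, high) and its midpoint as the tag."""
    low, high = Fraction(0), Fraction(1)
    for c in message:
        rng = high - low
        low_c, high_c = model[c]
        # Tuple assignment: both right-hand sides use the previous low.
        low, high = low + rng * low_c, low + rng * high_c
    return low, high, (low + high) / 2


low, high, tag = encode_with_tag("1321", MODEL)
print(float(low), float(high), float(tag))  # 0.7712 0.773504 0.772352
```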
Arithmetic Coding Example (3)

Decoding:

r          c   Low    High   Range   Next r
0.772352   1   0      0.8    0.8     (0.772352 - 0) / 0.8 = 0.96544
0.96544    3   0.82   1.0    0.18    (0.96544 - 0.82) / 0.18 = 0.808
0.808      2   0.8    0.82   0.02    (0.808 - 0.8) / 0.02 = 0.4
0.4        1   0      0.8    0.8     (4 symbols decoded; stop)
Arithmetic Coding – Example 4
Subdivide [0, 1) based on P(αi); here P(α1) = 0.2, P(α2) = 0.2, P(α3) = 0.4, P(α4) = 0.2. Then subdivide the chosen sub-interval again for each symbol of α1 α2 α3 α3 α4.

[Figure: nested sub-intervals [0, 0.2) → [0.04, 0.08) → [0.056, 0.072) → [0.0624, 0.0688) → [0.06752, 0.0688)]

Final sub-interval: [0.06752, 0.0688)
Arithmetic code: 0.068 (any number within the final sub-interval can be chosen)

Warning: finite-precision arithmetic may cause problems due to truncation!
Arithmetic Coding – Example 4 (cont’d)

• The arithmetic code 0.068 for α1 α2 α3 α3 α4 can be encoded using:
• Binary fractions: 0.068 ≈ 0.000100011 (9 bits). This is subject to conversion error: the exact value of 0.000100011 is 0.068359375, which still lies in [0.06752, 0.0688).
• Huffman code: 0100011001 (10 bits)
• Fixed binary code: 5 x 8 bits/symbol = 40 bits
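The 9-bit figure can be reproduced by searching for the smallest k such that some multiple of 2^-k lies in the final interval [0.06752, 0.0688). The sketch below does this. `shortest_binary_code` is a hypothetical helper. Practical coders, such as the Witten et al. (1987) implementation, emit bits incrementally as the interval narrows instead of searching at the end.

```python
import math
from fractions import Fraction


def shortest_binary_code(low, high):
    """Fewest bits b1..bk such that the binary fraction 0.b1...bk is in [low, high)."""
    k = 1
    while True:
        m = math.ceil(low * 2**k)  # smallest multiple of 2**-k that is >= low
        if Fraction(m, 2**k) < high:
            return format(m, f"0{k}b")
        k += 1


bits = shortest_binary_code(Fraction("0.06752"), Fraction("0.0688"))
print(bits, len(bits), float(Fraction(int(bits, 2), 2 ** len(bits))))
# 000100011 9 0.068359375
```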
Arithmetic Decoding – Example 4
Decode 0.572 by subdividing based on P(αi) at each step.

[Figure: nested sub-intervals [0.4, 0.8) → [0.56, 0.72) → [0.56, 0.592) → [0.5664, 0.5728) → [0.57152, 0.5728)]

Decoded sequence: α3 α3 α1 α2 α4
A special EOF symbol can be used to terminate the iterations.
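Nothing in 0.572 itself says to stop after α4: one more iteration of the decoder would output α2. The sketch below shows the EOF idea. It assumes a hypothetical model that keeps Example 4's ranges for α1 to α3 but splits α4's range to reserve [0.9, 1.0) for an EOF symbol. These numbers are illustrative and do not come from the lecture. The encoder appends EOF, and the decoder stops when it decodes EOF.

```python
from fractions import Fraction as F

# Hypothetical model: Example 4 ranges, with [0.9, 1.0) taken from α4 for EOF.
MODEL = {
    "a1": (F("0.0"), F("0.2")),
    "a2": (F("0.2"), F("0.4")),
    "a3": (F("0.4"), F("0.8")),
    "a4": (F("0.8"), F("0.9")),
    "EOF": (F("0.9"), F("1.0")),
}


def encode(symbols, model):
    """Narrow [low, high) for each symbol and return the final interval."""
    low, high = F(0), F(1)
    for s in symbols:
        rng = high - low
        lo, hi = model[s]
        low, high = low + rng * lo, low + rng * hi
    return low, high


def decode_until_eof(code, model, max_symbols=1000):
    """Decode until EOF is recovered; max_symbols only guards against bad input."""
    r, out = F(code), []
    while len(out) < max_symbols:
        s = next(s for s, (lo, hi) in model.items() if lo <= r < hi)
        if s == "EOF":
            return out
        out.append(s)
        lo, hi = model[s]
        r = (r - lo) / (hi - lo)
    raise ValueError("EOF symbol not found")


msg = ["a3", "a3", "a1", "a2", "a4"]
low, high = encode(msg + ["EOF"], MODEL)
print(decode_until_eof((low + high) / 2, MODEL))  # ['a3', 'a3', 'a1', 'a2', 'a4']
```

The cost of this approach is that EOF takes probability space away from the real symbols, so every tag needs slightly more precision.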
Example 5
• Alphabet = {a, b, c, d, e, f}
• M = [a, b, a, a, a, e, a, a, b, a]
• P = [0.67, 0.11, 0.07, 0.06, 0.05, 0.04]
• PC = [0.0, 0.67, 0.78, 0.85, 0.91, 0.96, 1.00] (cumulative probabilities, so symbol s occupies [PC[s], PC[s] + P[s]))

• M[1] = a:
• LOW = 0.0
• RANGE = P[a] = 0.67

• M[2] = b:
• LOW = LOW + PC[b] * RANGE = 0.0 + 0.67 * 0.67 = 0.4489
• RANGE = RANGE * P[b] = 0.67 * 0.11 = 0.0737

• M[3] = a:
• LOW = LOW + PC[a] * RANGE = 0.4489 + 0.0 * 0.0737 = 0.4489
• RANGE = RANGE * P[a] = 0.0737 * 0.67 = 0.049379
• After all 10 symbols: LOW = 0.469404611259293, RANGE = 0.00003666730521220415
• OUTPUT 0.46942 (any value in [LOW, LOW + RANGE) would do)
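Example 5 tracks LOW and RANGE instead of low and high. The full ten-symbol computation can be sketched as follows, assuming exact `Fraction` arithmetic. `SYMBOLS`, `P`, and `PC` mirror the slide, and `encode_low_range` is an illustrative name.

```python
from fractions import Fraction as F

SYMBOLS = "abcdef"
P = [F("0.67"), F("0.11"), F("0.07"), F("0.06"), F("0.05"), F("0.04")]
# PC[i] = total probability of the symbols before SYMBOLS[i]; PC[-1] == 1
PC = [sum(P[:i], F(0)) for i in range(len(P) + 1)]


def encode_low_range(message):
    """Return (LOW, RANGE) after coding every symbol of message."""
    low, rng = F(0), F(1)
    for m in message:
        i = SYMBOLS.index(m)
        low = low + PC[i] * rng  # LOW = LOW + PC[m] * RANGE (old RANGE)
        rng = rng * P[i]         # RANGE = RANGE * P[m]
    return low, rng


low, rng = encode_low_range("abaaaeaaba")
print(float(low), float(rng))
print(low <= F("0.46942") < low + rng)  # True: 0.46942 is a valid output
```

LOW comes out at about 0.469404611259293. RANGE is exactly 0.00003666730521220415, which equals 0.67^7 × 0.11^2 × 0.05, the product of the probabilities of the ten symbols.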
Decode the Message
--- Example 5
• Alphabet = {a, b, c, d, e, f}
• |M| = 10
• P = [0.67, 0.11, 0.07, 0.06, 0.05, 0.04]
• PC = [0.0, 0.67, 0.78, 0.85, 0.91, 0.96, 1.00]
• V = 0.46942

• Recover symbol #1.
• LOW = 0.0
• RANGE = 1.0
• V = 0.46942 lies in the interval [0.0, 0.67).
• Output symbol a.
• newLOW = 0.0 and newRANGE = 0.67, i.e., the interval [0.0, 0.67).
• Update V: V = (V - newLOW) / newRANGE.
• We have V = 0.46942 / 0.67 = 0.70062686567164.
Decode the Message
--- Example 5 (continued)

• Recover symbol #2.
• LOW = 0.0
• RANGE = 1.0
• V = 0.70062686567164 lies in the interval [0.67, 0.78).
• Output symbol b.
• newLOW = 0.67 and newRANGE = 0.11, i.e., the interval [0.67, 0.78).
• Update V: V = (V - newLOW) / newRANGE.
• We have V = (0.70062686567164 - 0.67) / 0.11 = 0.27842605156036.
Decode the Message
--- Example 5 (continued)

• Recover symbol #3.
• From the previous step, V = 0.27842605156036.
• V = 0.27842605156036 lies in the interval [0.0, 0.67), so the target symbol is a.
• newLOW = 0.0 and newRANGE = 0.67, i.e., the interval [0.0, 0.67).
• Update V: V = (V - newLOW) / newRANGE.
• We have V = (0.27842605156036 - 0.0) / 0.67 = 0.41556127098561.
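Continuing the same update for all |M| = 10 symbols recovers M. The sketch below makes the same assumptions as before: exact arithmetic, the slide's P and PC, and an illustrative function name.

```python
from fractions import Fraction as F

SYMBOLS = "abcdef"
P = [F("0.67"), F("0.11"), F("0.07"), F("0.06"), F("0.05"), F("0.04")]
PC = [sum(P[:i], F(0)) for i in range(len(P) + 1)]


def decode_low_range(v, n):
    """Recover n symbols: locate V in [PC[i], PC[i+1]), then rescale V."""
    v, out = F(v), []
    for _ in range(n):
        i = next(i for i in range(len(P)) if PC[i] <= v < PC[i + 1])
        out.append(SYMBOLS[i])
        v = (v - PC[i]) / P[i]  # V = (V - newLOW) / newRANGE
    return "".join(out)


print(decode_low_range("0.46942", 10))  # abaaaeaaba
```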
Example 6
Message (code value): 0.23355

Symbol   Probability   Range
A        0.1           [0.0, 0.1)
E        0.2           [0.1, 0.3)
I        0.3           [0.3, 0.6)
O        0.1           [0.6, 0.7)
U        0.2           [0.7, 0.9)
!        0.1           [0.9, 1.0)
For message 0.23355, each block below subdivides the current interval into the ranges of A, E, I, O, U, ! (width x probability per symbol):

Width = 0.3 - 0.1 = 0.2 (interval of E)
0.1 + 0.1 x 0.2 = 0.12
0.12 + 0.2 x 0.2 = 0.16
0.16 + 0.3 x 0.2 = 0.22
0.22 + 0.1 x 0.2 = 0.24
0.24 + 0.2 x 0.2 = 0.28
0.28 + 0.1 x 0.2 = 0.3

Width = 0.24 - 0.22 = 0.02 (interval of O)
0.22 + 0.1 x 0.02 = 0.222
0.222 + 0.2 x 0.02 = 0.226
0.226 + 0.3 x 0.02 = 0.232
0.232 + 0.1 x 0.02 = 0.234
0.234 + 0.2 x 0.02 = 0.238
0.238 + 0.1 x 0.02 = 0.24

Width = 0.234 - 0.232 = 0.002 (interval of O)
0.232 + 0.1 x 0.002 = 0.2322
0.2322 + 0.2 x 0.002 = 0.2326
0.2326 + 0.3 x 0.002 = 0.2332
0.2332 + 0.1 x 0.002 = 0.2334
0.2334 + 0.2 x 0.002 = 0.2338
0.2338 + 0.1 x 0.002 = 0.234

Width = 0.2338 - 0.2334 = 0.0004 (interval of U)
0.2334 + 0.1 x 0.0004 = 0.23344
0.23344 + 0.2 x 0.0004 = 0.23352
0.23352 + 0.3 x 0.0004 = 0.23364
0.23364 + 0.1 x 0.0004 = 0.23368
0.23368 + 0.2 x 0.0004 = 0.23376
0.23376 + 0.1 x 0.0004 = 0.2338

Width = 0.23364 - 0.23352 = 0.00012 (interval of I)
0.23352 + 0.1 x 0.00012 = 0.233532
0.233532 + 0.2 x 0.00012 = 0.233556
0.233556 + 0.3 x 0.00012 = 0.233592
0.233592 + 0.1 x 0.00012 = 0.233604
0.233604 + 0.2 x 0.00012 = 0.233628
0.233628 + 0.1 x 0.00012 = 0.23364
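Summary (boundaries at the start of A, E, I, O, U, ! and the end of the interval; the last column is the symbol whose sub-interval contains 0.23355):

Width     A         E         I         O         U         !         End       Symbol
1.0       0.0       0.1       0.3       0.6       0.7       0.9       1.0       E
0.2       0.1       0.12      0.16      0.22      0.24      0.28      0.3       O
0.02      0.22      0.222     0.226     0.232     0.234     0.238     0.24      O
0.002     0.232     0.2322    0.2326    0.2332    0.2334    0.2338    0.234     U
0.0004    0.2334    0.23344   0.23352   0.23364   0.23368   0.23376   0.2338    I
0.00012   0.23352   0.233532  0.233556  0.233592  0.233604  0.233628  0.23364   E

The decoded message is: EOOUIE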
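Example 6 keeps the absolute interval and subdivides it at each step, rather than rescaling the code value. The sketch below follows that procedure, assuming exact `Fraction` arithmetic and six symbols to decode (one per table row). `decode_by_subdivision` is an illustrative name. It returns EOOUIE, matching the six rows of the summary table.

```python
from fractions import Fraction as F

# Example 6 model: symbol -> (low, high)
MODEL = {
    "A": (F("0.0"), F("0.1")),
    "E": (F("0.1"), F("0.3")),
    "I": (F("0.3"), F("0.6")),
    "O": (F("0.6"), F("0.7")),
    "U": (F("0.7"), F("0.9")),
    "!": (F("0.9"), F("1.0")),
}


def decode_by_subdivision(code, model, n):
    """Subdivide the current absolute interval n times, keeping the part holding code."""
    code = F(code)
    low, width = F(0), F(1)
    out = []
    for _ in range(n):
        for sym, (lo, hi) in model.items():
            sub_low, sub_high = low + lo * width, low + hi * width
            if sub_low <= code < sub_high:
                out.append(sym)
                low, width = sub_low, sub_high - sub_low
                break
    return "".join(out)


print(decode_by_subdivision("0.23355", MODEL, 6))  # EOOUIE
```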
