
Adaptive Huffman Coding
CSEP 590 Data Compression
Autumn 2007

Adaptive Huffman Coding

One pass: during the pass, calculate the frequencies and update the Huffman tree accordingly.
Coder: a new Huffman tree is computed after transmitting the symbol.
Decoder: a new Huffman tree is computed after receiving the symbol.

The symbol set and their initial codes must be known ahead of time. An NYT (not yet transmitted) symbol is needed to indicate that a new leaf is needed in the tree.
CSEP 590 - Lecture 2 - Autumn 2007 2

Optimal Tree Numbering

Example: a : 5, b : 2, c : 1, d : 3

Weight the Nodes
Each node is weighted by the sum of the weights of the leaves below it. In the Huffman tree for this example, the root (weight 11) has children a (5) and an internal node (6); that node has children d (3) and an internal node (3), whose children are c (1) and b (2).

Number the Nodes
Number the nodes as they are removed from the priority queue:

number:  1   2   3   4       5   6            7
node:    c   b   d   (c,b)   a   (d,(c,b))    root
weight:  1   2   3   3       5   6            11

Adaptive Huffman Principle
In an optimal tree for n symbols there is a numbering of the nodes $y_1 < y_2 < \dots < y_{2n-1}$ such that their corresponding weights $x_1, x_2, \dots, x_{2n-1}$ satisfy:
$x_1 \le x_2 \le \dots \le x_{2n-1}$, and
siblings are numbered consecutively.
And vice versa: if there is such a numbering, then the tree is optimal. We call this the node number invariant.
In the example above, the weights 1, 2, 3, 3, 5, 6, 11 never decrease (the tie between d and (c,b) is why the inequalities cannot be strict), and the sibling pairs are numbered (1, 2), (3, 4) and (5, 6).
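The numbering rule and the invariant can be checked mechanically. The sketch below is an illustration rather than code from the course: it builds the Huffman tree with Python's `heapq`, numbers nodes in the order they leave the priority queue, and tests the node number invariant. Breaking weight ties in insertion order and labelling internal nodes by concatenating their children's labels are assumptions of this sketch, and the function names are hypothetical.

```python
import heapq
from itertools import count


def numbered_huffman(freqs: dict[str, int]):
    """Build a Huffman tree, numbering nodes 1, 2, ... in the order they leave the queue.

    Returns (order, parent): order[k] is (label, weight) for node number k + 1, and
    parent maps each non-root label to its parent's label. Internal nodes are labelled
    by concatenating their children's labels (an assumption of this sketch).
    """
    tie = count()  # tie-breaker: equal weights leave the queue in insertion order
    heap = [(w, next(tie), s) for s, w in freqs.items()]
    heapq.heapify(heap)
    order, parent = [], {}
    while len(heap) > 1:
        w1, _, x = heapq.heappop(heap)
        w2, _, y = heapq.heappop(heap)
        order += [(x, w1), (y, w2)]  # the two removed siblings get consecutive numbers
        parent[x] = parent[y] = x + y
        heapq.heappush(heap, (w1 + w2, next(tie), x + y))
    w, _, root = heap[0]
    order.append((root, w))  # the root is the last node removed
    return order, parent


def satisfies_invariant(order, parent) -> bool:
    """Node number invariant: weights never decrease and siblings are consecutive."""
    weights = [w for _, w in order]
    if any(a > b for a, b in zip(weights, weights[1:])):
        return False
    number = {label: k + 1 for k, (label, _) in enumerate(order)}
    children: dict[str, list[int]] = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(number[child])
    return all(max(ns) - min(ns) == 1 for ns in children.values())


order, parent = numbered_huffman({"a": 5, "b": 2, "c": 1, "d": 3})
for k, (label, w) in enumerate(order, start=1):
    print(k, label, w)
print(satisfies_invariant(order, parent))
```

For a : 5, b : 2, c : 1, d : 3 this numbers c, b, d, (c,b), a, (d,(c,b)) and the root as 1 through 7, with weights 1, 2, 3, 3, 5, 6, 11, and the invariant check returns True. A different tie-breaking rule could swap the numbers of d and (c,b); the invariant would still hold.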

Initialization
Symbols a1, a2, ..., am have a basic prefix code, used when symbols are first encountered. Example: the ten symbols a, b, c, d, e, f, g, h, i, j (so m = 10). In the worked example, a, b, c and d have the fixed codes 000, 001, 010 and 011.
(Figure: the fixed code for a, ..., j, drawn as a code tree.)

The tree will encode up to m + 1 symbols, including NYT, so it has at most 2(m + 1) - 1 = 2m + 1 nodes. We reserve numbers 1 to 2m + 1 for node numbering. The initial Huffman tree consists of a single node with weight 0, symbol NYT, and node number 2m + 1.
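To see why 2m + 1 numbers suffice, follow the numbering convention of the worked example: when the NYT node numbered k is split, the new NYT (left child) takes k - 2 and the new leaf (right child) takes k - 1. The hypothetical helper below applies that convention to all m symbols; it is a sketch, not part of the lecture.

```python
def nyt_splits(m: int) -> list[tuple[int, int, int]]:
    """(old NYT, new NYT, new leaf) node numbers for each of the m new symbols.

    Assumes the convention of the worked example: when the NYT node numbered k is
    split, the new NYT (left child) gets k - 2 and the new leaf (right child) gets k - 1.
    """
    nyt = 2 * m + 1  # the initial tree is a single NYT node numbered 2m + 1
    splits = []
    for _ in range(m):
        splits.append((nyt, nyt - 2, nyt - 1))
        nyt -= 2
    return splits


m = 10  # the alphabet a, b, ..., j
splits = nyt_splits(m)
print(splits[:3])  # [(21, 19, 20), (19, 17, 18), (17, 15, 16)]
used = {n for split in splits for n in split}
print(used == set(range(1, 2 * m + 2)))  # True
```

For m = 10 the first split is (21, 19, 20), matching the example, and the last is (3, 1, 2), so the numbers 1 to 21 are all in use by the time the whole alphabet has appeared.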

Coding Algorithm
1. If a new symbol is encountered, output the code for NYT followed by the fixed code for the symbol. Add the new symbol to the tree.
2. If an old symbol is encountered, output its code.
3. Update the tree to preserve the node number invariant.

Decoding Algorithm
1. Decode the symbol using the current tree.
2. If NYT is encountered, use the fixed code to decode the symbol. Add the new symbol to the tree.
3. Update the tree to preserve the node number invariant.
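The two algorithms can be combined into one small Python sketch. It is illustrative, not the course's implementation: `Node` and `AdaptiveHuffman` are hypothetical names, the update follows the rule under Updating the Tree (reading "unless it is the parent" as "skip the parent while searching"), and the largest-numbered node of a given weight is found by scanning all nodes rather than with a ranking structure. Only the fixed codes of a, b, c and d are pinned down by the worked example; the codes for e through j are an assumed completion of the prefix code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Node:
    number: int  # node number; stays with the position when subtrees are exchanged
    weight: int = 0
    symbol: Optional[str] = None  # None for internal nodes and for NYT
    parent: Optional["Node"] = None
    left: Optional["Node"] = None  # branch 0
    right: Optional["Node"] = None  # branch 1


class AdaptiveHuffman:
    """Hypothetical sketch of the NYT-based adaptive Huffman coder and decoder."""

    def __init__(self, fixed: dict[str, str]):
        self.fixed = fixed  # fixed prefix code used the first time a symbol appears
        self.root = self.nyt = Node(number=2 * len(fixed) + 1)  # a single NYT node
        self.leaf: dict[str, Node] = {}  # symbol -> its leaf in the tree
        self.nodes = [self.root]  # every node, scanned when looking for an exchange

    def code_of(self, node: Node) -> str:
        bits = []
        while node.parent is not None:  # climb to the root, recording branches
            bits.append("0" if node.parent.left is node else "1")
            node = node.parent
        return "".join(reversed(bits))

    def _add_symbol(self, symbol: str) -> Node:
        old = self.nyt  # the old NYT becomes an internal node
        self.nyt = Node(number=old.number - 2, parent=old)
        leaf = Node(number=old.number - 1, symbol=symbol, parent=old)
        old.left, old.right = self.nyt, leaf
        self.leaf[symbol] = leaf
        self.nodes += [self.nyt, leaf]
        return leaf

    @staticmethod
    def _swap(a: Node, b: Node) -> None:
        # Exchange the subtrees rooted at a and b; the numbers stay with the positions.
        pa, pb = a.parent, b.parent
        a_left, b_left = pa.left is a, pb.left is b
        setattr(pa, "left" if a_left else "right", b)
        setattr(pb, "left" if b_left else "right", a)
        a.parent, b.parent = pb, pa
        a.number, b.number = b.number, a.number

    def _update(self, y: Node) -> None:
        while y is not self.root:
            # largest-numbered node with y's weight, skipping y's parent
            top = max((n for n in self.nodes if n.weight == y.weight and n is not y.parent),
                      key=lambda n: n.number)
            if top is not y:
                self._swap(y, top)
            y.weight += 1
            y = y.parent
        y.weight += 1  # finally the root

    def encode(self, text: str) -> str:
        out = []
        for s in text:
            if s in self.leaf:  # old symbol: send its current code
                leaf = self.leaf[s]
                out.append(self.code_of(leaf))
            else:  # new symbol: NYT code followed by the fixed code
                out.append(self.code_of(self.nyt) + self.fixed[s])
                leaf = self._add_symbol(s)
            self._update(leaf)
        return "".join(out)

    def decode(self, bits: str) -> str:
        inverse = {c: s for s, c in self.fixed.items()}
        out, i = [], 0
        while i < len(bits):
            node = self.root
            while node.left is not None:  # walk down to a leaf (possibly NYT)
                node = node.left if bits[i] == "0" else node.right
                i += 1
            if node is self.nyt:  # new symbol: read one fixed codeword
                j = i + 1
                while bits[i:j] not in inverse:
                    if j >= len(bits):
                        raise ValueError("truncated fixed code")
                    j += 1
                node, i = self._add_symbol(inverse[bits[i:j]]), j
            out.append(node.symbol)
            self._update(node)
        return "".join(out)


# a-d as in the worked example; e-j are an ASSUMED completion of the prefix code
FIXED = {"a": "000", "b": "001", "c": "010", "d": "011", "e": "100",
         "f": "101", "g": "1100", "h": "1101", "i": "1110", "j": "1111"}
bits = AdaptiveHuffman(FIXED).encode("aabcdad")
print(bits)  # 000100010001000001101101
print(AdaptiveHuffman(FIXED).decode(bits))  # aabcdad
```

Run on the example input, the encoder produces 000100010001000001101101 and the decoder turns that string back into aabcdad. Because the encoder and decoder apply the same update after every symbol, their trees stay identical, which is what lets the decoder follow the changing code.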

Updating the Tree
1. Let y be the leaf (symbol) with current weight x.*
2. If y is the root, update x by 1 and stop; otherwise:
3. Exchange y with the largest-numbered node with the same weight (unless it is the parent).**
4. Update x by 1.
5. Let y be the parent, with its weight x, and go to 2.

* We never update the weight of NYT.
** This exchange will preserve the node number invariant. Exchanged nodes swap positions together with their subtrees, while the node numbers stay with the positions.
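The update can also be traced on a flat table, which mirrors the numbered pictures in the example. This is a minimal sketch under stated assumptions: the layout number → [weight, parent number, label] and the function names are hypothetical, an exchange swaps the contents of two positions and re-parents their children (so subtrees move while numbers stay put), and "unless it is the parent" is read as skipping the parent during the search.

```python
def swap_positions(tree: dict, i: int, j: int) -> None:
    """Exchange the subtrees at positions i and j; node numbers stay with the positions."""
    for k in (0, 2):  # weight (index 0) and label (index 2) move with the subtree
        tree[i][k], tree[j][k] = tree[j][k], tree[i][k]
    for node in tree.values():  # children follow their subtree to its new number
        if node[1] == i:
            node[1] = j
        elif node[1] == j:
            node[1] = i


def update(tree: dict, y: int) -> None:
    """Update rule from the slide, starting at the leaf numbered y (never NYT)."""
    while tree[y][1] is not None:  # stop once y is the root
        weight, parent = tree[y][0], tree[y][1]
        largest = max(n for n, node in tree.items() if node[0] == weight and n != parent)
        if largest != y:
            swap_positions(tree, y, largest)
            y = largest  # the node being updated now has the larger number
        tree[y][0] += 1
        y = tree[y][1]
    tree[y][0] += 1  # the root


# Example tree after "aabcda", i.e. before the last d: number -> [weight, parent, label]
tree = {
    21: [6, None, "root"],
    20: [3, 21, "internal"], 19: [3, 21, "a"],
    18: [2, 20, "internal"], 17: [1, 20, "b"],
    16: [1, 18, "c"], 15: [1, 18, "internal"],
    14: [1, 15, "d"], 13: [0, 15, "NYT"],
}
update(tree, 14)  # d occurs again: exchanged with b (17), then incremented
print({n: tree[n] for n in (14, 17, 20, 21)})
# {14: [1, 15, 'b'], 17: [2, 20, 'd'], 20: [4, 21, 'internal'], 21: [7, None, 'root']}
```

Starting from the tree after aabcda, processing the final d exchanges d with b and leaves the root, node 20 and d with weights 7, 4 and 2.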

Example
Encode aabcdad over the alphabet {a, b, ..., j}, so m = 10 and node numbers run from 1 to 21. The initial tree is a single NYT node with weight 0 and node number 21. In the trees below, each node is written as number:weight, and the children of an internal node are listed in brackets, left (0) child first; the lower-numbered sibling is always the left child.

Step 1: a (new). NYT is the root, so its code is empty; output the fixed code for a.
output = 000
NYT splits into a new NYT (19) and a leaf for a (20). The largest-numbered node of weight 0 is the parent, so there is no exchange; a and then the root are incremented.
Tree: 21:1 [19:0 NYT, 20:1 a]

Step 2: a (old). Output the code for a, which is 1.
output = 0001
The only other node of weight 1 is the parent, so there is no exchange; a and the root both become 2.
Tree: 21:2 [19:0 NYT, 20:2 a]

Step 3: b (new). Output the NYT code 0 followed by the fixed code for b, 001.
output = 00010001
NYT (19) splits into a new NYT (17) and a leaf for b (18). No exchange is needed; b, node 19 and the root are incremented.
Tree: 21:3 [19:1 [17:0 NYT, 18:1 b], 20:2 a]

Step 4: c (new). Output the NYT code 00 followed by the fixed code for c, 010.
output = 0001000100010
NYT (17) splits into a new NYT (15) and a leaf for c (16). No exchange is needed; c, node 17, node 19 and the root are incremented.
Tree: 21:4 [19:2 [17:1 [15:0 NYT, 16:1 c], 18:1 b], 20:2 a]

Step 5: d (new). Output the NYT code 000 followed by the fixed code for d, 011.
output = 0001000100010000011
NYT (15) splits into a new NYT (13) and a leaf for d (14). The update then proceeds as follows:
- d (14) and node 15 are incremented without an exchange.
- Node 17 has weight 1, and the largest-numbered node of weight 1 is b (18). Exchange! The subtree moves to position 18 and b to position 17; node 18 then becomes 2.
- Node 19 has weight 2, and the largest-numbered node of weight 2 is a (20). Exchange! The subtree moves to position 20 and a to position 19; node 20 then becomes 3.
- The root becomes 5.
Tree: 21:5 [19:2 a, 20:3 [17:1 b, 18:2 [15:1 [13:0 NYT, 14:1 d], 16:1 c]]]

Step 6: a (old). Output the code for a, which is now 0.
output = 00010001000100000110
a (19) is already the largest-numbered node of weight 2, so a becomes 3 and the root becomes 6.
Tree: 21:6 [19:3 a, 20:3 [17:1 b, 18:2 [15:1 [13:0 NYT, 14:1 d], 16:1 c]]]
Note: the first a is coded as 000, the second as 1, and the third as 0.

Step 7: d (old). Output the code for d, 1101.
output = 000100010001000001101101
d (14) has weight 1, and the largest-numbered node of weight 1 is b (17). Exchange! d moves to position 17 and b to position 14, and d becomes 2. Its new parent, node 20, is already the largest-numbered node of weight 3, so it becomes 4, and the root becomes 7.
Tree: 21:7 [19:3 a, 20:4 [17:2 d, 18:2 [15:1 [13:0 NYT, 14:1 b], 16:1 c]]]
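The final tree can be checked directly against the node number invariant. In this hypothetical table layout (number → (weight, parent number, label)), the lower-numbered sibling is taken as the 0 branch, as in the example; `check` tests the invariant plus the rule that an internal node's weight is the sum of its children's, and `codes` reads off each leaf's current code. The second table shows the tree as it would be if the last d had been incremented in place without the exchange.

```python
# Final tree of the example: node number -> (weight, parent number, label)
FINAL = {
    21: (7, None, "root"),
    20: (4, 21, "internal"), 19: (3, 21, "a"),
    18: (2, 20, "internal"), 17: (2, 20, "d"),
    16: (1, 18, "c"), 15: (1, 18, "internal"),
    14: (1, 15, "b"), 13: (0, 15, "NYT"),
}

# Same input, but with the last d incremented in place (no exchange with b)
NO_EXCHANGE = {
    21: (7, None, "root"),
    20: (4, 21, "internal"), 19: (3, 21, "a"),
    18: (3, 20, "internal"), 17: (1, 20, "b"),
    16: (1, 18, "c"), 15: (2, 18, "internal"),
    14: (2, 15, "d"), 13: (0, 15, "NYT"),
}


def check(tree) -> bool:
    """Weights never decrease with node number; siblings are consecutive; sums agree."""
    nums = sorted(tree)
    weights = [tree[n][0] for n in nums]
    ok = all(x <= y for x, y in zip(weights, weights[1:]))
    for p in nums:
        kids = sorted(k for k in nums if tree[k][1] == p)
        if kids:
            ok = ok and len(kids) == 2 and kids[1] == kids[0] + 1
            ok = ok and tree[p][0] == sum(tree[k][0] for k in kids)
    return ok


def codes(tree) -> dict[str, str]:
    """Current code of every leaf; the lower-numbered sibling is the 0 branch."""
    out = {}
    for n, (_, _, label) in tree.items():
        if label in ("root", "internal"):
            continue
        bits, node = [], n
        while tree[node][1] is not None:  # climb to the root
            p = tree[node][1]
            sibling_nums = [k for k in tree if tree[k][1] == p]
            bits.append("0" if node == min(sibling_nums) else "1")
            node = p
        out[label] = "".join(reversed(bits))
    return out


print(check(FINAL), check(NO_EXCHANGE))  # True False
print(codes(FINAL))
```

The first table passes and the second fails, because node 14 would have weight 2 while the higher-numbered node 16 has weight 1. The codes show what the next symbols would cost: one bit for a, two for d, three for c, and four each for b and for the NYT prefix of a new symbol.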

Data Structure for Adaptive Huffman
1. Fixed code table
2. Binary tree with parent pointers
3. Table of pointers to nodes in the tree, with one entry for each of a, b, c, d, ..., j and NYT
4. Doubly linked list to rank the nodes
(Figure: the pointer table for a, b, c, d, ..., j, NYT pointing into the final tree of the example.)
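Item 4 is what makes step 3 of the update straightforward to carry out: with the nodes in a doubly linked list ordered by node number, the invariant keeps nodes of equal weight next to each other, so the largest-numbered node of y's weight is found by walking forward from y. The sketch below is hypothetical (`Cell` and the function names are illustrative) and models only items 3 and 4: each cell is a fixed position in the ranking, and exchanging two leaves swaps their contents and repoints the symbol table. Tree links (item 2) are left out, so exchanging internal nodes is not handled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Cell:
    number: int  # position in the ranking; never changes
    weight: int
    label: str
    prev: Optional["Cell"] = None  # next lower node number
    next: Optional["Cell"] = None  # next higher node number


def build_ranking(rows):
    """Doubly linked list of (number, weight, label) rows, in node-number order."""
    cells = [Cell(n, w, lbl) for n, w, lbl in sorted(rows)]
    for lo, hi in zip(cells, cells[1:]):
        lo.next, hi.prev = hi, lo
    return cells


def largest_same_weight(y: Cell, parent: Optional[Cell]) -> Cell:
    """Walk forward through y's run of equal weights, skipping y's parent."""
    best, cur = y, y.next
    while cur is not None and cur.weight == y.weight:
        if cur is not parent:
            best = cur
        cur = cur.next
    return best


def exchange_leaves(a: Cell, b: Cell, table: dict) -> None:
    """Swap two leaves' contents; the symbol table is repointed to their new cells."""
    a.weight, b.weight = b.weight, a.weight
    a.label, b.label = b.label, a.label
    table[a.label], table[b.label] = a, b


# Ranking of the example tree just before the last d is processed
cells = build_ranking([(13, 0, "NYT"), (14, 1, "d"), (15, 1, "internal"), (16, 1, "c"),
                       (17, 1, "b"), (18, 2, "internal"), (19, 3, "a"),
                       (20, 3, "internal"), (21, 6, "root")])
table = {c.label: c for c in cells if c.label not in ("internal", "root")}  # item 3
y, y_parent = table["d"], cells[2]  # d is leaf 14; its parent is node 15
target = largest_same_weight(y, y_parent)
print(target.number, target.label)  # 17 b
exchange_leaves(y, target, table)
table["d"].weight += 1  # d now sits at position 17
print(table["d"].number, table["d"].weight, table["b"].number)  # 17 2 14
```

On the example tree just before the last d, the walk from d (14) skips its parent (15) and stops at b (17), matching the exchange in the example.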

In Class Exercise
Decode the following bit string using adaptive Huffman coding, assuming the fixed code shown: 00110000
(Figure: fixed code tree, with leaves including b and c.)

Huffman Summary
- Statistical compression algorithm
- Prefix code
- Fixed-to-variable length code
- Optimization to create a best code
- Symbol merging
- Context
- Adaptive coding
- Decoder and encoder behave almost the same
- Need for data structures and algorithms
