Chapter11 Part2

Adaptive Huffman coding is a data compression technique that uses Huffman coding to encode data in a single pass without pre-computing symbol frequencies. It builds the Huffman tree dynamically as it encodes the data, and the decoder reconstructs the same tree. It starts with all symbols in an alphabet node and splits symbols out into new nodes as they are encoded. It maintains the tree structure and sibling property by swapping nodes if frequency updates break these properties.


COS 212

Data Compression: Adaptive Huffman Coding & Run-Length Encoding
Huffman Coding
 Basic idea behind Huffman coding
 Construct a binary tree based on symbol probabilities
 Determine the encoding for each symbol by tree traversal
 How do we know the probabilities?
 Calculate average character frequencies in the language being encoded, and
use these frequencies as probabilities
 Will the same frequencies be optimal for the COS 212 textbook and “Harry
Potter and the Philosopher’s Stone”?
 The simple solution
 Calculate frequencies for the text being encoded, and send the corresponding
Huffman codes together with the compressed file
 But the text may be long and we now need to run through it twice (once to compute
frequencies, and once to compress it)
 The table of codes can also be cumbersome to work with
 Adaptive Huffman coding
 Go through the text once, generate codes as you go
 Initially inefficient encoding that improves as we “learn” frequencies
 The receiver reconstructs the same Huffman tree dynamically
 To decode, the sender and receiver must agree on alphabet order
Adaptive Huffman Coding
 Start with a Huffman tree with only one node
 This node stores the entire alphabet and is called the alphabet node
 The frequency for this node must be 0

Frequency: 0
(A B C D E F)

 We encode text letter-by-letter


 There is no pre-processing to find letter frequencies
 Move letters out of the alphabet node when they’re first encoded
 Two cases for encoding a letter
1. If the letter is still contained in the alphabet node
2. If the letter is not contained in the alphabet node
Adaptive Huffman Coding
1. If the letter i is still contained in the alphabet node
 Generate a code that identifies the position of i in the alphabet
 Start with the Huffman code of the alphabet node (empty on the 1st iteration)
 Add a sequence of 1 bits where the number corresponds to the position of the letter
in the alphabet
 Indicate the end of the code with a single 0 bit

Code for A: 10
Alphabet: (A B C D E F)
Code for D: 11110

 Append the generated code to the encoded bit sequence

 Split the letter i out of the alphabet node


 In the alphabet node, move the last letter to overwrite the letter i
 Create a new node for the letter i, with a frequency of 1
 Create a new parent node
 Left child is the alphabet node
 Right child is the new node
 Cumulative frequency is 1
 Increment counts in new node’s ancestors

Frequency: 1
  0: Frequency: 0  (F B C D E)
  1: Frequency: 1  A

Input text: AAFCCCBDD
Encoding: 10101000111000111100110 (so far: 10)
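The position-based code for a letter still in the alphabet node can be sketched as follows (a minimal Python sketch; the function name and list representation are illustrative, not part of the slides):

```python
def alphabet_code(letter, alphabet, node_code=""):
    # Code = Huffman code of the alphabet node (empty on the 1st
    # iteration), then one '1' bit per position of the letter in
    # the alphabet (1-indexed), terminated by a single '0' bit.
    position = alphabet.index(letter) + 1
    return node_code + "1" * position + "0"

alphabet = ["A", "B", "C", "D", "E", "F"]
print(alphabet_code("A", alphabet))  # 10
print(alphabet_code("D", alphabet))  # 11110
```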
Adaptive Huffman Coding
2. If the letter i is not contained in the alphabet node
 The letter is already in the Huffman tree
 Build a Huffman code by traversing from the root to the letter’s leaf
 Append this code to the encoded bit sequence
 Increment the frequency of the letter’s leaf and every ancestor node to more
accurately reflect the actual probabilities in the input text

 Any frequency increment may break Huffman tree structure


 We then need to repair the tree structure
 We’ll link the nodes using a linked list in breadth-first, right-to-left order
 The sibling property must be maintained
 If the frequencies in the list are non-increasing, the tree is a Huffman tree
 If the sibling property is broken at any point, it must be restored

Frequency: 1
  0: Frequency: 0  (F B C D E)
  1: Frequency: 1 → 2  A

Input text: AAFCCCBDD
Frequencies in list: 1 2 0 — sibling property broken
Encoding: 10101000111000111100110 (so far: 10 1)
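Building the code for a letter that is already in the tree amounts to walking from the root to its leaf, or, with parent pointers, from the leaf upward and reversing. A sketch with a hypothetical Node class (not from the slides):

```python
class Node:
    def __init__(self, freq=0, parent=None, bit=""):
        self.freq = freq
        self.parent = parent  # parent node, None for the root
        self.bit = bit        # edge label ('0' or '1') from the parent

def code_for(leaf):
    # Collect edge bits from the leaf up to the root, then reverse
    # to obtain the root-to-leaf Huffman code.
    bits = []
    node = leaf
    while node.parent is not None:
        bits.append(node.bit)
        node = node.parent
    return "".join(reversed(bits))

# Tree after the first A was split out: root with the alphabet node
# on the left ('0') and A's leaf on the right ('1').
root = Node(freq=1)
leaf_a = Node(freq=1, parent=root, bit="1")
print(code_for(leaf_a))  # 1
```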
Adaptive Huffman Coding
2. If the letter i is not contained in the alphabet node
 Restoring the sibling property
 Sequences of linked list nodes with the same frequency are blocks
 In the example, there were two blocks before the frequency increment

Frequency: 1
  0: Frequency: 0  (F B C D E)
  1: Frequency: 1  A

Frequencies in list: 1 1 0

 Assume that the property is broken by a frequency update for node i
 Swap node i with the 1st node in its block, unless the 1st node is the parent of i
 Continue with the frequency increments for all the ancestors of i
 Note that several sibling property violations may be encountered, requiring a correction each time
 Here the 1st node in A’s block is the root, which is A’s parent, so no swap occurs; incrementing the root’s frequency restores the property

Frequency: 1 → 2
  0: Frequency: 0  (F B C D E)
  1: Frequency: 2  A

Input text: AAFCCCBDD
Frequencies in list: 2 2 0 (was 1 2 0 — sibling property broken)
Encoding: 10101000111000111100110 (so far: 10 1)
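The sibling-property check and the block swap can be sketched over the frequency list alone. This list-only sketch shows just the bookkeeping: in the real algorithm the corresponding subtrees are swapped too, and no swap is made when the 1st node in the block is the parent of the updated node.

```python
def sibling_property_holds(freqs):
    # Frequencies in breadth-first, right-to-left order must be
    # non-increasing for the tree to be a Huffman tree.
    return all(a >= b for a, b in zip(freqs, freqs[1:]))

def swap_into_block(freqs, i):
    # freqs[i] was just incremented; move it to the position of the
    # 1st (leftmost) node of its old block so that order is restored.
    j = i
    while j > 0 and freqs[j - 1] < freqs[i]:
        j -= 1
    freqs[i], freqs[j] = freqs[j], freqs[i]
    return j

freqs = [1, 2, 0]                     # A's count bumped from 1 to 2
print(sibling_property_holds(freqs))  # False
swap_into_block(freqs, 1)
print(freqs)                          # [2, 1, 0]
```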
Adaptive Huffman Coding
 The letter F is still contained in the alphabet node
 Generate a code for the letter F
 Huffman code of the alphabet node, sequence of 1 bits to indicate position in alphabet, and a 0 bit to terminate
the code
 Split the letter F out of the alphabet node
 In the alphabet node, move the last letter to overwrite the letter F
 Create a new node for the letter F, with a frequency of 1
 Create a new parent node for the alphabet node
 Cumulative frequency is 1
 Increment frequencies in new node’s ancestors

Frequency: 2 → 3
  0: Frequency: 1 (new parent)
       0: Frequency: 0  (E B C D)
       1: Frequency: 1  F
  1: Frequency: 2  A

Input text: AAFCCCBDD
Frequencies in list: 3 2 1 1 0
Encoding: 10101000111000111100110 (so far: 10 1 010)
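The “move the last letter over the split-out letter” step can be sketched as plain list surgery (the helper name is illustrative):

```python
def remove_from_alphabet(alphabet, letter):
    # The last letter overwrites the letter being split out,
    # then the now-duplicated last slot is dropped.
    i = alphabet.index(letter)
    alphabet[i] = alphabet[-1]
    alphabet.pop()
    return alphabet

print(remove_from_alphabet(list("ABCDEF"), "A"))  # ['F', 'B', 'C', 'D', 'E']
print(remove_from_alphabet(list("FBCDE"), "F"))   # ['E', 'B', 'C', 'D']
```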
Adaptive Huffman Coding
 The letter C is still contained in the alphabet node
 Generate a code for the letter C
 Huffman code of the alphabet node, sequence of 1 bits to indicate position in alphabet, and a 0 bit to terminate
the code
 Split the letter C out of the alphabet node
 In the alphabet node, move the last letter to overwrite the letter C
 Create a new node for the letter C, with a frequency of 1
 Create a new parent node for the alphabet node
 Cumulative frequency is 1
 Increment frequencies in new node’s ancestors

Frequency: 3 → 4
  0: Frequency: 1 → 2
       0: Frequency: 1 (new parent)
            0: Frequency: 0  (E B D)
            1: Frequency: 1  C
       1: Frequency: 1  F
  1: Frequency: 2  A

Input text: AAFCCCBDD
Frequencies in list: 4 2 2 1 1 1 0
Encoding: 10101000111000111100110 (so far: 10 1 010 001110)
Adaptive Huffman Coding
 The letter C is not contained in the alphabet node
 Generate Huffman code for the letter C and increment the frequency for node C
 While not at the root, check if the frequency update breaks the sibling property
 If it has, restore the sibling property by swapping the node with the 1st node in its block
 Perform no swap if the 1st node in the block is the parent of the node
 Update the frequency of the parent node, and repeat

As an exercise, work through the remaining four inputs according to the procedure we used for this example. Decoding follows a very similar procedure to build the tree from the encoded bits – try to work out how the algorithm must change.

[After incrementing C’s frequency: sibling property broken]
Frequency: 4
  0: Frequency: 2
       0: Frequency: 1
            0: Frequency: 0  (E B D)
            1: Frequency: 1 → 2  C
       1: Frequency: 1  F
  1: Frequency: 2  A

Frequencies in list: 4 2 2 1 1 2 0 — sibling property broken

[After swapping C with F (the 1st node in C’s block) and continuing the increments up to the root; a second violation at the 2 → 3 increment is fixed by swapping that node with A]
Frequency: 4 → 5
  0: Frequency: 2  A
  1: Frequency: 2 → 3
       0: Frequency: 1
            0: Frequency: 0  (E B D)
            1: Frequency: 1  F
       1: Frequency: 2  C

Input text: AAFCCCBDD
Frequencies in list: 5 3 2 2 1 1 0
Encoding: 10101000111000111100110 (so far: 10 1 010 001110 001)
Run-Length Encoding
 Relies on the presence of “runs” in the data to be encoded
 Runs are sequences of exactly the same character
AAAABBCDDDDEE
 Instead of sending or storing AAAA, store 4A
 But, when would you ever see such text in the real world?
 It’s very unlikely that you would
 Run-length is inefficient for text!
 But, think about images…
Run-Length Encoding
 We iterate through the letters in the input text
 Encode each run with just two characters
AAAABBCDDDDEE
 4A2B1C4D2E
 The 1st, 2nd, 4th & 5th parts are either compressed or remain the same length
 The one exception is C, where we’ve actually increased the space used
 Solution to this problem
 Compress only the runs that are long enough
 How will we know what is compressed and what isn’t?
 Use a special character (an escape character) to show compressed runs
 For example, if % is the escape character, the encoding is %4A%2BC%4D%2E
 But BB and EE are actually shorter than %2B and %2E
 Solve this by compressing only runs that are 3 or more symbols long
 For example, %4ABBC%4DEE
 Consider AAABBB versus ABABAB
 Huffman encoding would compress both, but could run-length encoding?
 In binary there are lots of runs of 0 and 1 bits
 How would you apply run-length encoding to binary data?
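The escape-character scheme with a minimum run length of 3 can be sketched like this (a Python sketch; the escape character and threshold are parameters, and the function name is illustrative):

```python
def rle_encode(text, escape="%", min_run=3):
    out = []
    i = 0
    while i < len(text):
        # Find the end of the current run.
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1
        run = j - i
        if run >= min_run:
            # Long enough to compress: escape, count, character.
            out.append(f"{escape}{run}{text[i]}")
        else:
            # Too short: copy the run through unchanged.
            out.append(text[i] * run)
        i = j
    return "".join(out)

print(rle_encode("AAAABBCDDDDEE"))  # %4ABBC%4DEE
print(rle_encode("ABABAB"))         # ABABAB (no runs to compress)
```

A real implementation would also have to escape literal occurrences of the escape character in the input, otherwise the decoder could not tell them apart from run markers.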
