Chapter11 Part2

Adaptive Huffman coding is a data compression technique that uses Huffman coding to encode data in a single pass without pre-computing symbol frequencies. It builds the Huffman tree dynamically as it encodes the data, and the decoder reconstructs the same tree. It starts with all symbols in an alphabet node and splits symbols out into new nodes as they are encoded. It maintains the tree structure and sibling property by swapping nodes if frequency updates break these properties.


COS 212

Data Compression: Adaptive Huffman Coding & Run-Length Encoding
Huffman Coding
 Basic idea behind Huffman coding
 Construct a binary tree based on symbol probabilities
 Determine the encoding for each symbol by tree traversal
 How do we know the probabilities?
 Calculate average character frequencies in the language being encoded, and
use these frequencies as probabilities
 Will the same frequencies be optimal for the COS 212 textbook and “Harry
Potter and the Philosopher’s Stone”?
 The simple solution
 Calculate frequencies for the text being encoded, and send the corresponding
Huffman codes together with the compressed file
 But the text may be long and we now need to run through it twice (once to compute
frequencies, and once to compress it)
 The table of codes can also be cumbersome to work with
 Adaptive Huffman coding
 Go through the text once, generate codes as you go
 Initially inefficient encoding that improves as we “learn” frequencies
 The receiver reconstructs the same Huffman tree dynamically
 To decode, the sender and receiver must agree on alphabet order
Adaptive Huffman Coding
 Start with a Huffman tree with only one node
 This node stores the entire alphabet and is called the alphabet node
 The frequency for this node must be 0

Frequency: 0
(A B C D E F)

 We encode text letter-by-letter


 There is no pre-processing to find letter frequencies
 Move letters out of the alphabet node when they’re first encoded
 Two cases for encoding a letter
1. If the letter is still contained in the alphabet node
2. If the letter is not contained in the alphabet node
Adaptive Huffman Coding
1. If the letter i is still contained in the alphabet node
 Generate a code that identifies the position of i in the alphabet
 Start with the Huffman code of the alphabet node (empty on the 1st iteration)
 Add a sequence of 1 bits where the number corresponds to the position of the letter
in the alphabet
 Indicate the end of the code with a single 0 bit

Code for A: 10
Alphabet: (A B C D E F)
Code for D: 11110

 Append the generated code to the encoded bit sequence

 Split the letter i out of the alphabet node


 In the alphabet node, move the last letter to overwrite the letter i
 Create a new node for the letter i, with a frequency of 1
 Create a new parent node
 Left child is the alphabet node
 Right child is the new node
 Cumulative frequency is 1
 Increment counts in new node’s ancestors

Frequency: 1
  0: Frequency: 0  (F B C D E)
  1: Frequency: 1  A

Input text: AAFCCCBDD
Encoding: 10101000111000111100110 (so far: 10)
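The position-based code for a letter still in the alphabet node can be sketched as follows (a minimal Python sketch; the function name and list representation are illustrative, not part of the slides):

```python
def alphabet_code(letter, alphabet, node_code=""):
    # Code = Huffman code of the alphabet node (empty on the 1st
    # iteration), then one '1' bit per position of the letter in
    # the alphabet (1-indexed), terminated by a single '0' bit.
    position = alphabet.index(letter) + 1
    return node_code + "1" * position + "0"

alphabet = ["A", "B", "C", "D", "E", "F"]
print(alphabet_code("A", alphabet))  # 10
print(alphabet_code("D", alphabet))  # 11110
```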
Adaptive Huffman Coding
2. If the letter i is not contained in the alphabet node
 The letter is already in the Huffman tree
 Build a Huffman code by traversing from the root to the letter’s leaf
 Append this code to the encoded bit sequence
 Increment the frequency of the letter’s leaf and every ancestor node to more
accurately reflect the actual probabilities in the input text

 Any frequency increment may break Huffman tree structure


 We then need to repair the tree structure
 We’ll link the nodes using a linked list in breadth-first, right-to-left order
 The sibling property must be maintained
 If the frequencies in the list are non-increasing, the tree is a Huffman tree
 If the sibling property is broken at any point, it must be restored

Frequency: 1
  0: Frequency: 0  (F B C D E)
  1: Frequency: 1 → 2  A

Input text: AAFCCCBDD
Frequencies in list: 1 2 0 — sibling property broken
Encoding: 10101000111000111100110 (so far: 10 1)
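Building the code for a letter that is already in the tree amounts to walking from the root to its leaf, or, with parent pointers, from the leaf upward and reversing. A sketch with a hypothetical Node class (not from the slides):

```python
class Node:
    def __init__(self, freq=0, parent=None, bit=""):
        self.freq = freq
        self.parent = parent  # parent node, None for the root
        self.bit = bit        # edge label ('0' or '1') from the parent

def code_for(leaf):
    # Collect edge bits from the leaf up to the root, then reverse
    # to obtain the root-to-leaf Huffman code.
    bits = []
    node = leaf
    while node.parent is not None:
        bits.append(node.bit)
        node = node.parent
    return "".join(reversed(bits))

# Tree after the first A was split out: root with the alphabet node
# on the left ('0') and A's leaf on the right ('1').
root = Node(freq=1)
leaf_a = Node(freq=1, parent=root, bit="1")
print(code_for(leaf_a))  # 1
```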
Adaptive Huffman Coding
2. If the letter i is not contained in the alphabet node
 Restoring the sibling property
 Sequences of linked list nodes with the same frequency are blocks
 In the example, there were two blocks before the frequency increment

Frequency: 1
  0: Frequency: 0  (F B C D E)
  1: Frequency: 1  A

Frequencies in list: 1 1 0

 Assume that the property is broken by a frequency update for node i
 Swap node i with the 1st node in its block, unless the 1st node is the parent of i
 Continue with the frequency increments for all the ancestors of i
 Note that several sibling property violations may be encountered, requiring a correction each time
 Here the 1st node in A’s block is the root, which is A’s parent, so no swap occurs; incrementing the root’s frequency restores the property

Frequency: 1 → 2
  0: Frequency: 0  (F B C D E)
  1: Frequency: 2  A

Input text: AAFCCCBDD
Frequencies in list: 2 2 0 (was 1 2 0 — sibling property broken)
Encoding: 10101000111000111100110 (so far: 10 1)
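The sibling-property check and the block swap can be sketched over the frequency list alone. This list-only sketch shows just the bookkeeping: in the real algorithm the corresponding subtrees are swapped too, and no swap is made when the 1st node in the block is the parent of the updated node.

```python
def sibling_property_holds(freqs):
    # Frequencies in breadth-first, right-to-left order must be
    # non-increasing for the tree to be a Huffman tree.
    return all(a >= b for a, b in zip(freqs, freqs[1:]))

def swap_into_block(freqs, i):
    # freqs[i] was just incremented; move it to the position of the
    # 1st (leftmost) node of its old block so that order is restored.
    j = i
    while j > 0 and freqs[j - 1] < freqs[i]:
        j -= 1
    freqs[i], freqs[j] = freqs[j], freqs[i]
    return j

freqs = [1, 2, 0]                     # A's count bumped from 1 to 2
print(sibling_property_holds(freqs))  # False
swap_into_block(freqs, 1)
print(freqs)                          # [2, 1, 0]
```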
Adaptive Huffman Coding
 The letter F is still contained in the alphabet node
 Generate a code for the letter F
 Huffman code of the alphabet node, sequence of 1 bits to indicate position in alphabet, and a 0 bit to terminate
the code
 Split the letter F out of the alphabet node
 In the alphabet node, move the last letter to overwrite the letter F
 Create a new node for the letter F, with a frequency of 1
 Create a new parent node for the alphabet node
 Cumulative frequency is 1
 Increment frequencies in new node’s ancestors

Frequency: 2 → 3
  0: Frequency: 1 (new parent)
       0: Frequency: 0  (E B C D)
       1: Frequency: 1  F
  1: Frequency: 2  A

Input text: AAFCCCBDD
Frequencies in list: 3 2 1 1 0
Encoding: 10101000111000111100110 (so far: 10 1 010)
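The “move the last letter over the split-out letter” step can be sketched as plain list surgery (the helper name is illustrative):

```python
def remove_from_alphabet(alphabet, letter):
    # The last letter overwrites the letter being split out,
    # then the now-duplicated last slot is dropped.
    i = alphabet.index(letter)
    alphabet[i] = alphabet[-1]
    alphabet.pop()
    return alphabet

print(remove_from_alphabet(list("ABCDEF"), "A"))  # ['F', 'B', 'C', 'D', 'E']
print(remove_from_alphabet(list("FBCDE"), "F"))   # ['E', 'B', 'C', 'D']
```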
Adaptive Huffman Coding
 The letter C is still contained in the alphabet node
 Generate a code for the letter C
 Huffman code of the alphabet node, sequence of 1 bits to indicate position in alphabet, and a 0 bit to terminate
the code
 Split the letter C out of the alphabet node
 In the alphabet node, move the last letter to overwrite the letter C
 Create a new node for the letter C, with a frequency of 1
 Create a new parent node for the alphabet node
 Cumulative frequency is 1
 Increment frequencies in new node’s ancestors

Frequency: 3 → 4
  0: Frequency: 1 → 2
       0: Frequency: 1 (new parent)
            0: Frequency: 0  (E B D)
            1: Frequency: 1  C
       1: Frequency: 1  F
  1: Frequency: 2  A

Input text: AAFCCCBDD
Frequencies in list: 4 2 2 1 1 1 0
Encoding: 10101000111000111100110 (so far: 10 1 010 001110)
Adaptive Huffman Coding
 The letter C is not contained in the alphabet node
 Generate Huffman code for the letter C and increment the frequency for node C
 While not at the root, check if the frequency update breaks the sibling property
 If it has, restore the sibling property by swapping the node with the 1st node in its block
 Perform no swap if the 1st node in the block is the parent of the node
 Update the frequency of the parent node, and repeat

As an exercise, work through the remaining four inputs according to the procedure we used for this example. Decoding follows a very similar procedure to build the tree from the encoded bits – try to work out how the algorithm must change.

[After incrementing C’s frequency: sibling property broken]
Frequency: 4
  0: Frequency: 2
       0: Frequency: 1
            0: Frequency: 0  (E B D)
            1: Frequency: 1 → 2  C
       1: Frequency: 1  F
  1: Frequency: 2  A

Frequencies in list: 4 2 2 1 1 2 0 — sibling property broken

[After swapping C with F (the 1st node in C’s block) and continuing the increments up to the root; a second violation at the 2 → 3 increment is fixed by swapping that node with A]
Frequency: 4 → 5
  0: Frequency: 2  A
  1: Frequency: 2 → 3
       0: Frequency: 1
            0: Frequency: 0  (E B D)
            1: Frequency: 1  F
       1: Frequency: 2  C

Input text: AAFCCCBDD
Frequencies in list: 5 3 2 2 1 1 0
Encoding: 10101000111000111100110 (so far: 10 1 010 001110 001)
Run-Length Encoding
 Relies on the presence of “runs” in the data to be encoded
 Runs are sequences of exactly the same character
AAAABBCDDDDEE
 Instead of sending or storing AAAA, store 4A
 But, when would you ever see such text in the real world?
 It’s very unlikely that you would
 Run-length is inefficient for text!
 But, think about images…
Run-Length Encoding
 We iterate through the letters in the input text
 Encode each run with just two characters
AAAABBCDDDDEE
 4A2B1C4D2E
 The 1st, 2nd, 4th & 5th parts are either compressed or remain the same length
 The one exception is C, where we’ve actually increased the space used
 Solution to this problem
 Compress only the runs that are long enough
 How will we know what is compressed and what isn’t?
 Use a special character (an escape character) to show compressed runs
 For example, if % is the escape character, the encoding is %4A%2BC%4D%2E
 But BB and EE are actually shorter than %2B and %2E
 Solve this by compressing only runs that are 3 or more symbols long
 For example, %4ABBC%4DEE
 Consider AAABBB versus ABABAB
 Huffman encoding would compress both, but could run-length encoding?
 In binary there are lots of runs of 0 and 1 bits
 How would you apply run-length encoding to binary data?
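The escape-character scheme with a minimum run length of 3 can be sketched like this (a Python sketch; the escape character and threshold are parameters, and the function name is illustrative):

```python
def rle_encode(text, escape="%", min_run=3):
    out = []
    i = 0
    while i < len(text):
        # Find the end of the current run.
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1
        run = j - i
        if run >= min_run:
            # Long enough to compress: escape, count, character.
            out.append(f"{escape}{run}{text[i]}")
        else:
            # Too short: copy the run through unchanged.
            out.append(text[i] * run)
        i = j
    return "".join(out)

print(rle_encode("AAAABBCDDDDEE"))  # %4ABBC%4DEE
print(rle_encode("ABABAB"))         # ABABAB (no runs to compress)
```

A real implementation would also have to escape literal occurrences of the escape character in the input, otherwise the decoder could not tell them apart from run markers.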
