Huffman Coding Vida Movahedi October 2006
Contents A simple example Definitions Huffman Coding Algorithm Image Compression
A simple example Suppose we have a message of 10 symbols drawn from an alphabet of 5 symbols, e.g. [►♣♣♠☻►♣☼►☻]. How can we code this message using 0/1 so that the coded message has minimum length (for transmission or storage)? 5 symbols → at least 3 bits per symbol. For a simple fixed-length encoding, the length of the coded message is 10*3 = 30 bits.
A simple example – cont. Intuition: symbols that are more frequent should have shorter codes; since the codeword lengths then differ, there must be a way of telling where each codeword ends. With a Huffman code, the length of the encoded message ►♣♣♠☻►♣☼►☻ is 3*2 + 3*2 + 2*2 + 3 + 3 = 22 bits.
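The arithmetic above can be checked with a short script (a sketch; ASCII letters stand in for the glyphs, since only the frequencies matter):

```python
from collections import Counter
import math

msg = "XAABCXADXC"  # stand-ins: X=►, A=♣, B=♠, C=☻, D=☼
counts = Counter(msg)  # X:3, A:3, C:2, B:1, D:1

# Fixed-length encoding: ceil(log2(#symbols)) bits per symbol
naive = len(msg) * math.ceil(math.log2(len(counts)))

# Huffman codeword lengths for these frequencies (from the slide)
lengths = {'X': 2, 'A': 2, 'C': 2, 'B': 3, 'D': 3}
huff = sum(counts[s] * lengths[s] for s in counts)

print(naive, huff)  # 30 22
```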
Definitions An ensemble X is a triple (x, A_x, P_x)
x: value of a random variable
A_x: set of possible values for x, A_x = {a_1, a_2, …, a_I}
P_x: probability for each value, P_x = {p_1, p_2, …, p_I}, where P(x) = P(x = a_i) = p_i, p_i > 0, Σ_i p_i = 1
Shannon information content of x: h(x) = log2(1/P(x))
Entropy of x: H(X) = Σ_i p_i log2(1/p_i)
Example (letter frequencies in English text):
i   a_i  p_i    h(p_i)
1   a    .0575  4.1
2   b    .0128  6.3
3   c    .0263  5.2
..  ..   ..     ..
26  z    .0007  10.4
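The two quantities just defined can be computed directly; a minimal sketch:

```python
import math

def info_content(p):
    """Shannon information content h(x) = log2(1/p)."""
    return math.log2(1.0 / p)

def entropy(probs):
    """Entropy H(X) = sum_i p_i * log2(1/p_i)."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

print(round(info_content(0.0575), 1))  # 4.1, matching the table row for 'a'
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75
```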
Source Coding Theorem There exists a variable-length encoding C of an ensemble X such that the average length of an encoded symbol, L(C,X), satisfies L(C,X) ∈ [H(X), H(X)+1). The Huffman coding algorithm produces optimal symbol codes.
Symbol Codes Notations: A^N: all strings of length N; A^+: all strings of finite length. {0,1}^3 = {000, 001, 010, …, 111}; {0,1}^+ = {0, 1, 00, 01, 10, 11, 000, 001, …}. A symbol code C for an ensemble X is a mapping from A_x (the range of x values) to {0,1}^+. c(x): codeword for x; l(x): length of the codeword.
Example Ensemble X: A_x = {a, b, c, d}, P_x = {1/2, 1/4, 1/8, 1/8}
Code C_0:
a_i  c(a_i)  l_i
a    1000    4
b    0100    4
c    0010    4
d    0001    4
c(a) = 1000; c+(acd) = 100000100001 (c+ is called the extended code)
Any encoded string must have a unique decoding. A code C(X) is uniquely decodable if, under the extended code c+, no two distinct strings have the same encoding, i.e. ∀ x, y ∈ A_x^+, x ≠ y ⇒ c+(x) ≠ c+(y).
The symbol code must be easy to decode. If it is possible to identify the end of a codeword as soon as it arrives ⇒ no codeword can be a prefix of another codeword. A symbol code is called a prefix code if no codeword is a prefix of any other codeword (also called a prefix-free, instantaneous, or self-punctuating code).
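The prefix property is easy to test mechanically; a minimal sketch using the codes C_0 and C_1 from these slides:

```python
from itertools import combinations

def is_prefix_code(codewords):
    """True if no codeword is a prefix of another (instantaneous code)."""
    for u, v in combinations(codewords, 2):
        if u.startswith(v) or v.startswith(u):
            return False
    return True

print(is_prefix_code(["1000", "0100", "0010", "0001"]))  # True  (C_0)
print(is_prefix_code(["0", "10", "110", "111"]))         # True  (C_1)
print(is_prefix_code(["0", "01", "11"]))                 # False ("0" prefixes "01")
```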
The code should achieve as much compression as possible. The expected length L(C,X) of symbol code C for X is L(C,X) = Σ_i p_i l_i.
Example Ensemble X: A_x = {a, b, c, d}, P_x = {1/2, 1/4, 1/8, 1/8}
Code C_1:
a_i  c(a_i)  l_i
a    0       1
b    10      2
c    110     3
d    111     3
c+(acd) = 0110111 (7 bits, compared with 12 under C_0). Is C_1 a prefix code?
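Plugging the codeword lengths of C_1 and C_0 into the expected-length formula:

```python
def expected_length(p, l):
    """L(C,X) = sum_i p_i * l_i."""
    return sum(pi * li for pi, li in zip(p, l))

p = [1/2, 1/4, 1/8, 1/8]
print(expected_length(p, [1, 2, 3, 3]))  # C_1: 1.75 bits, equal to H(X)
print(expected_length(p, [4, 4, 4, 4]))  # C_0: 4.0 bits
```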
The Huffman Coding algorithm – History In 1951, David Huffman and his MIT information theory classmates were given the choice of a term paper or a final exam. Huffman hit upon the idea of using a frequency-sorted binary tree and quickly proved this method the most efficient. In doing so, the student outdid his professor, who had worked with information theory inventor Claude Shannon to develop a similar code. Huffman built the tree from the bottom up instead of from the top down.
Huffman Coding Algorithm Take the two least probable symbols in the alphabet (they receive the longest codewords, of equal length, differing only in the last digit). Combine these two symbols into a single symbol, and repeat.
Example A_x = {a, b, c, d, e}, P_x = {0.25, 0.25, 0.2, 0.15, 0.15}
Merges: d+e = 0.3; b+c = 0.45; a+(de) = 0.55; root = 1.0
a_i  p_i   c(a_i)
a    0.25  00
b    0.25  10
c    0.2   11
d    0.15  010
e    0.15  011
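The two-step rule on the previous slide can be sketched with a priority queue (a minimal implementation, not the one used to draw the tree above, so the exact 0/1 labels may differ while the codeword lengths agree):

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code by repeatedly merging the two least probable nodes."""
    tiebreak = count()  # makes heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # least probable node
        p1, _, c1 = heapq.heappop(heap)  # second least probable node
        # prepend 0 to one subtree's codewords and 1 to the other's
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

code = huffman_code({"a": 0.25, "b": 0.25, "c": 0.2, "d": 0.15, "e": 0.15})
lengths = {s: len(c) for s, c in code.items()}
print(lengths)  # d and e, the least probable symbols, get the 3-bit codewords
```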
Statements Lower bound on expected length is H(X) There is no better symbol code for a source than the Huffman code Constructing a binary tree top-down is suboptimal
Disadvantages of the Huffman Code Changing ensemble: if the ensemble changes → the frequencies and probabilities change → the optimal coding changes. E.g. in text compression, symbol frequencies vary with context. Re-computing the Huffman code requires running through the entire file in advance, and the code itself must be saved/transmitted too. Does not consider 'blocks of symbols': after 'strings_of_ch' the next nine symbols 'aracters_' are predictable, yet bits are spent on them without conveying any new information.
Variations n-ary Huffman coding: uses {0, 1, …, n−1} (not just {0,1}). Adaptive Huffman coding: calculates frequencies dynamically based on recent actual frequencies. Huffman template algorithm: generalizes probabilities → any weight, and the combining method (addition) → any function; can solve other minimization problems, e.g. minimizing max_i [w_i + length(c_i)].
Image Compression Two-stage coding technique: (1) a linear predictor such as DPCM, or some other linear predicting function → decorrelate the raw image data; (2) a standard coding technique, such as Huffman coding, arithmetic coding, … Lossless JPEG: version 1: DPCM with arithmetic coding; version 2: DPCM with Huffman coding.
DPCM Differential Pulse Code Modulation DPCM is an efficient way to encode highly correlated analog signals into binary form suitable for digital transmission, storage, or input to a digital computer Patent by Cutler (1952)
DPCM
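As an illustration of the idea (a toy sketch, not the patented scheme): a first-order predictor that codes each sample's difference from the previous one, so correlated samples yield small residuals that compress well.

```python
def dpcm_encode(samples):
    """Predict each sample as the previous one; emit the prediction residuals."""
    prev = 0
    residuals = []
    for x in samples:
        residuals.append(x - prev)  # small when neighbouring samples correlate
        prev = x
    return residuals

def dpcm_decode(residuals):
    """Invert the encoder by accumulating the residuals."""
    prev = 0
    samples = []
    for r in residuals:
        prev += r
        samples.append(prev)
    return samples

sig = [100, 102, 103, 103, 101, 98]
res = dpcm_encode(sig)
print(res)  # [100, 2, 1, 0, -2, -3]
assert dpcm_decode(res) == sig  # lossless round trip
```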
Huffman Coding Algorithm  for Image Compression Step 1. Build a Huffman tree by sorting the histogram and successively combine the two bins of the lowest value until only one bin remains. Step 2. Encode the Huffman tree and save the Huffman tree with the coded value. Step 3. Encode the residual image.
Huffman Coding of the most-likely magnitude (MLM Method) Compute the residual histogram H: H(x) = # of pixels having residual magnitude x. Compute the symmetry histogram S: S(y) = H(y) + H(−y), y > 0. Find the range threshold R, i.e. the smallest R such that the magnitudes up to R cover the desired proportion of pixels: Σ_{y≤R} S(y) ≥ P·N, where N = # of pixels and P = desired proportion of most-likely magnitudes.
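The three steps can be sketched as follows (an illustrative reconstruction; the exact threshold criterion in the slides' source may differ, and treating S(0) as H(0) is an assumption):

```python
from collections import Counter

def mlm_threshold(residuals, P):
    """Smallest R such that magnitudes <= R cover proportion P of the pixels."""
    H = Counter(residuals)                  # residual histogram H(x)
    N = len(residuals)                      # number of pixels
    S = {0: H[0]}                           # symmetry histogram S(y)
    for y in range(1, max(abs(r) for r in residuals) + 1):
        S[y] = H[y] + H[-y]
    covered, R = 0, 0
    for y in sorted(S):
        covered += S[y]
        R = y
        if covered >= P * N:                # desired proportion reached
            break
    return R

print(mlm_threshold([0, 0, 1, -1, 1, 2, -2, 5], 0.75))  # 2
```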
References
MacKay, D.J.C., Information Theory, Inference, and Learning Algorithms, Cambridge University Press, 2003.
Wikipedia, http://en.wikipedia.org/wiki/Huffman_coding
Hu, Y.C. and Chang, C.C., "A new lossless compression scheme based on Huffman coding scheme for image compression"
O'Neal
