
Information Theory & Coding

Huffman and Entropy Coding

Professor Dr. A.K.M Fazlul Haque


Electronics and Telecommunication Engineering (ETE)
Daffodil International University
Basic Idea

 Fixed-length encoding: every character is represented with the same number of bits (e.g., ASCII, Unicode).
 Variable-length encoding: assign longer code words to less frequent characters and shorter code words to more frequent characters.
Huffman Coding

 Huffman codes can be used to compress information
– Like WinZip – although ZIP's DEFLATE format combines Huffman coding with LZ77 dictionary compression rather than using Huffman alone
– JPEGs also use Huffman coding as part of their compression process
Huffman Coding (Cont.)

 As an example, let's take the string:
"duke blue devils"
 We first do a frequency count of the characters:
• e:3, d:2, u:2, l:2, space:2, k:1, b:1, v:1, i:1, s:1
 Next we use a greedy algorithm to build up a Huffman tree
– We start with a node for each character

e,3 d,2 u,2 l,2 sp,2 k,1 b,1 v,1 i,1 s,1
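
As a quick sketch, this frequency count is a one-liner with Python's standard library:

    from collections import Counter

    freq = Counter("duke blue devils")
    print(freq)   # e: 3; d, u, l, space: 2; k, b, v, i, s: 1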
Huffman Coding (Cont.)

 We then pick the two nodes with the smallest frequencies and combine them to form a new node.
– The selection of these nodes is the greedy part
 The two selected nodes are removed from the set and replaced by the combined node.
 This continues until only one node is left in the set, as in the sketch and step sequence below.
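
A minimal sketch of this greedy loop, using Python's heapq as the priority queue (a counter breaks frequency ties, so the exact merge order may differ slightly from the steps shown on the next slide):

    import heapq
    from itertools import count

    freq = {'e': 3, 'd': 2, 'u': 2, 'l': 2, ' ': 2,
            'k': 1, 'b': 1, 'v': 1, 'i': 1, 's': 1}

    tie = count()   # tiebreaker so the heap never compares subtrees directly
    heap = [(f, next(tie), c) for c, f in freq.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # node with the smallest frequency
        f2, _, right = heapq.heappop(heap)    # node with the second-smallest frequency
        print(f"merge {left} ({f1}) + {right} ({f2}) -> {f1 + f2}")
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))   # combined node

    _, _, tree = heap[0]   # nested (left, right) pairs with total weight 16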
Huffman Coding (Cont.)

 Each step below merges the two lowest-frequency nodes in the forest (combined nodes are written with their member characters and total weight):

Step 0 (initial forest): e,3  d,2  u,2  l,2  sp,2  k,1  b,1  v,1  i,1  s,1
Step 1: merge i,1 and s,1 → (i s),2
Step 2: merge b,1 and v,1 → (b v),2
Step 3: merge k,1 and (b v),2 → (k b v),3
Step 4: merge l,2 and sp,2 → (l sp),4
Step 5: merge d,2 and u,2 → (d u),4
Step 6: merge (i s),2 and (k b v),3 → (i s k b v),5
Step 7: merge e,3 and (d u),4 → (e d u),7
Step 8: merge (l sp),4 and (i s k b v),5 → (l sp i s k b v),9
Step 9: merge (e d u),7 and (l sp i s k b v),9 → the root, with weight 16
Huffman Coding (Cont.)

 Now we assign codes to the tree by placing a 0 on every left branch and a 1 on every right branch.
 A traversal of the tree from root to leaf gives the Huffman code for that particular leaf character.
 Note that no code is the prefix of another code.
Huffman Coding (Cont.)

Character   Code
e           00
d           010
u           011
l           100
sp          101
i           1100
s           1101
k           1110
b           11110
v           11111
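
The traversal can be sketched as a depth-first walk that appends 0 going left and 1 going right; the tree below is hardcoded as nested (left, right) pairs to match the example (' ' stands for the space character):

    # Huffman tree for "duke blue devils" as nested (left, right) pairs.
    tree = (('e', ('d', 'u')),                                # weight-7 subtree
            (('l', ' '), (('i', 's'), ('k', ('b', 'v')))))    # weight-9 subtree

    def assign_codes(node, prefix=""):
        if isinstance(node, str):                  # leaf: emit its accumulated code
            return {node: prefix}
        left, right = node
        codes = assign_codes(left, prefix + "0")          # 0 on every left branch
        codes.update(assign_codes(right, prefix + "1"))   # 1 on every right branch
        return codes

    print(assign_codes(tree))
    # {'e': '00', 'd': '010', 'u': '011', 'l': '100', ' ': '101',
    #  'i': '1100', 's': '1101', 'k': '1110', 'b': '11110', 'v': '11111'}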
Huffman Coding (Cont.)

 These codes are then used to encode the string.
 Thus, “duke blue devils” turns into:

010 011 1110 00 101 11110 100 011 00 101 010 00 11111 1100 100 1101

 When grouped into 8-bit bytes:

01001111 10001011 11101000 11001010 10001111 11100100 1101xxxx

 Thus it takes 7 bytes of space, compared to 16 characters × 1 byte/char = 16 bytes uncompressed.
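
A quick check of those byte counts, using the code table above:

    codes = {'e': '00', 'd': '010', 'u': '011', 'l': '100', ' ': '101',
             'i': '1100', 's': '1101', 'k': '1110', 'b': '11110', 'v': '11111'}

    bits = "".join(codes[c] for c in "duke blue devils")
    print(len(bits))              # 52 bits
    print((len(bits) + 7) // 8)   # 7 bytes after padding, vs. 16 bytes uncompressed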
Huffman Coding

 Uncompressing works by reading in the file bit by bit.
– Start at the root of the tree
– If a 0 is read, head left
– If a 1 is read, head right
– When a leaf is reached, decode that character and start over again at the root of the tree (see the sketch below)
 Thus, we need to save the Huffman table information as a header in the compressed file.
– This doesn't add a significant amount of size for large files (which are the ones you want to compress anyway)
– Or we could use a fixed, universal set of codes/frequencies
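
A minimal decoding sketch; rather than walking tree nodes explicitly, it uses the unique prefix property directly, extending a bit buffer until it matches a complete code (same code table as above):

    codes = {'e': '00', 'd': '010', 'u': '011', 'l': '100', ' ': '101',
             'i': '1100', 's': '1101', 'k': '1110', 'b': '11110', 'v': '11111'}
    decode_map = {bits: ch for ch, bits in codes.items()}

    def decode(bitstring):
        out, buf = [], ""
        for bit in bitstring:
            buf += bit
            if buf in decode_map:          # prefix property: the first match is the symbol
                out.append(decode_map[buf])
                buf = ""
        return "".join(out)

    print(decode("0100111110001011111010001100101010001111111001001101"))
    # -> 'duke blue devils'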
Most important properties of Huffman Coding

 Unique Prefix Property: No Huffman code is a prefix of any other Huffman code
• For example, 101 and 1010 cannot both be Huffman codes. Why?
 Optimality: The Huffman code is a minimum-redundancy code (given an accurate data model)
• The two least frequent symbols will have the same code length, whereas symbols occurring more frequently will have shorter Huffman codes
• It has been shown that the average code length l' of an information source S is strictly less than η + 1, i.e.
η ≤ l' < η + 1
where η is the entropy of the source.
Data Compression Scheme

Input Data → Encoder (compression) → Codes / Code words → Storage or Networks → Codes / Code words → Decoder (decompression) → Output Data

B0 = # bits required before compression
B1 = # bits required after compression
Compression Ratio = B0 / B1
Compression Techniques

Coding Type        Basis                  Technique
Entropy Encoding                          Run-length Coding
                                          Huffman Coding
                                          Arithmetic Coding
Source Coding      Prediction             DPCM
                                          DM
                   Transformation         FFT
                                          DCT
                   Layered Coding         Bit Position
                                          Subsampling
                                          Sub-band Coding
                   Vector Quantization
Hybrid Coding                             JPEG
                                          MPEG
                                          H.263
                                          Many Proprietary Systems
Compression Techniques (Cont.)

 Entropy Coding
– The semantics of the information to be encoded are ignored
– Lossless compression technique
– Can be used for different media regardless of their characteristics
 Source Coding
– Takes into account the semantics of the information to be encoded
– Often a lossy compression technique
– Characteristics of the medium are exploited
 Hybrid Coding
– Most multimedia compression algorithms are hybrid techniques
Entropy Encoding

 Information theory is a discipline in applied mathematics involving the quantification of data, with the goal of enabling as much data as possible to be reliably stored on a medium and/or communicated over a channel.
 According to Claude E. Shannon, the entropy η (eta) of an information source with alphabet S = {s1, s2, ..., sn} is defined as

η = H(S) = Σ_{i=1}^{n} p_i log₂(1/p_i) = − Σ_{i=1}^{n} p_i log₂ p_i

where pi is the probability that symbol si in S will occur.


Entropy Encoding (Cont.)

 Example 1: What is the entropy of an image with a uniform distribution of gray-level intensities (i.e., pi = 1/256 for all i)?
 Example 2: What is the entropy of an image whose histogram shows that one third of the pixels are dark and two thirds are bright?
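
A short sketch of both calculations (assuming the image of Example 2 has only two intensity values, with probabilities 1/3 and 2/3):

    from math import log2

    def entropy(probs):
        # η = Σ p_i · log2(1/p_i), skipping zero-probability symbols
        return sum(p * log2(1 / p) for p in probs if p > 0)

    print(entropy([1 / 256] * 256))   # Example 1: 8.0 bits per pixel
    print(entropy([1 / 3, 2 / 3]))    # Example 2: ≈ 0.918 bits per pixel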
Entropy Encoding: Run-Length

 Data often contains sequences of identical bytes. Replacing these repeated byte sequences with the number of occurrences considerably reduces the overall data size.
 There are many variations of RLE
– One form of RLE uses a special marker byte (the M-byte) to indicate the number of occurrences of a character:
• "c"!# — the character c, then the marker "!", then the count #
– How many bytes are used above? When do you think the M-byte should be used?
• ABCCCCCCCCDEFGGG
is encoded as
ABC!8DEFGGG
– What if the string contains the "!" character?
– What is the compression ratio for this example? (See the sketch below.)
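
A minimal sketch of this marker-based RLE; the threshold of 4 is an assumption (a run written as "c!n" costs 3 bytes, so only runs of 4 or more are worth encoding), and counts above 9 or literal "!" characters in the input are not handled:

    from itertools import groupby

    MARKER = "!"      # the M-byte
    THRESHOLD = 4     # "c!n" costs 3 bytes, so encode only runs of 4+

    def rle_encode(s):
        out = []
        for ch, group in groupby(s):       # consecutive runs of one character
            n = len(list(group))
            out.append(f"{ch}{MARKER}{n}" if n >= THRESHOLD else ch * n)
        return "".join(out)

    encoded = rle_encode("ABCCCCCCCCDEFGGG")
    print(encoded)              # ABC!8DEFGGG
    print(16 / len(encoded))    # compression ratio B0/B1 = 16/11 ≈ 1.45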
Entropy Encoding: Run-Length (Cont.)

 Many variations of RLE:
 Zero-suppression: only one character, one that is repeated very often, is subject to run-length encoding; the M-byte and the number of additional occurrences are stored.
 When do you think the M-byte should be used, as opposed to using the regular representation without any encoding?
Entropy Encoding: Run-Length (Cont.)

 Many variations of RLE:
– If we are encoding black-and-white images (e.g., faxes), one such variant encodes each row as a tuple:
(row#, col# run1 begin, col# run1 end, col# run2 begin, col# run2 end, ..., col# runk begin, col# runk end)
– One tuple is produced per row, and each row may contain a different number of runs.
Entropy Encoding: Huffman Coding

 One form of variable-length coding.
 A greedy algorithm.
 Has been used in fax machines, JPEG, and MPEG.
Entropy Encoding: Huffman Coding
(Cont.)

Algorithm of Huffman Coding:

Input: A set C = {c1, c2, ..., cn} of n characters and their frequencies {f(c1), f(c2), ..., f(cn)}
Output: A Huffman tree (V, T) for C.
1. Insert all characters into a min-heap H according to their frequencies.
2. V = C; T = {}
3. for j = 1 to n − 1
4.   c = deletemin(H)
5.   c' = deletemin(H)
6.   f(v) = f(c) + f(c')   // v is a new node; add v to V
7.   Insert v into the min-heap H
8.   Add (v, c) and (v, c') to T, making c and c' children of v in T
9. end for
END
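
A hedged Python translation of this pseudocode, using heapq as the min-heap (a counter breaks frequency ties so the heap never compares node labels; internal nodes get fresh names n0, n1, ...):

    import heapq
    from itertools import count

    def huffman_tree(freqs):
        """Return (V, T) for the character -> frequency dict freqs."""
        tie = count()                           # tiebreaker for equal frequencies
        heap = [(f, next(tie), c) for c, f in freqs.items()]
        heapq.heapify(heap)                     # step 1: min-heap by frequency
        V, T = list(freqs), []                  # step 2: V = C, T = {}
        ids = count()
        for _ in range(len(freqs) - 1):         # step 3: n - 1 merges
            f1, _, c1 = heapq.heappop(heap)     # step 4: c = deletemin(H)
            f2, _, c2 = heapq.heappop(heap)     # step 5: c' = deletemin(H)
            v = f"n{next(ids)}"                 # step 6: new node v, f(v) = f(c) + f(c')
            V.append(v)
            heapq.heappush(heap, (f1 + f2, next(tie), v))   # step 7
            T += [(v, c1), (v, c2)]             # step 8: c and c' become children of v
        return V, T

    V, T = huffman_tree({'e': 3, 'd': 2, 'u': 2, 'l': 2, ' ': 2,
                         'k': 1, 'b': 1, 'v': 1, 'i': 1, 's': 1})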
