GROUP ID: 18
Date: 9th November, 2017
UNIVERSITY OF SINDH, JAMSHORO
Title: HUFFMAN CODING ALGORITHM
Presented To:
Miss Syeda Hira Fatima Naqvi.
Presented By:
Sadaf Rasheed (2K15-CSE-72)
Department of Computer Science
INTRODUCTION
Huffman codes are an effective technique of lossless data compression, which means no
information is lost.
Huffman coding achieves compression by reducing the amount of redundancy in the
coding of symbols.
Huffman coding is a method for compressing standard text documents.
It makes use of a binary tree to develop codes of varying lengths for the letters used in the
original message.
The algorithm was introduced by David Huffman in 1952 as part of a course assignment at
MIT.
HUFFMAN CODING SCHEME 1
CONTD:
Huffman codes can be used to compress information:
Like ZIP (whose DEFLATE method combines LZ77 with Huffman coding)
JPEGs use Huffman coding as part of their compression process
The basic idea is that instead of storing each character in a file as an 8-bit
ASCII value, we store the more frequently occurring characters using fewer bits
and the less frequently occurring characters using more bits
On average this should decrease the file size
EXAMPLE
Consider a file of 100,000 characters from a–f, with these frequencies:
o a = 45,000
o b = 13,000
o c = 12,000
o d = 16,000
o e = 9,000
o f = 5,000
CONTD
(FIXED-LENGTH CODE)
Typically each character in a file is stored as a single byte (8 bits)
If we know we only have six characters, we can use a 3-bit code for the characters instead:
a = 000, b = 001, c = 010, d = 011, e = 100, f = 101
This is called a fixed-length code (if every word in the code has the same length, the code is called a
fixed-length code, or a block code)
With this scheme, we can encode the whole file with 300,000 bits
(45,000*3 + 13,000*3 + 12,000*3 + 16,000*3 + 9,000*3 + 5,000*3)
We can do better
Better compression
More flexibility
Variable-length codes (codes in which code words may have different lengths) can perform
significantly better
Frequent characters are given short code words, while infrequent characters get longer code words
o Consider this scheme:
a = 0; b = 101; c = 100; d = 111; e = 1101; f = 1100
How many bits are now required to encode our file?
45,000*1 + 13,000*3 + 12,000*3 + 16,000*3 + 9,000*4 + 5,000*4 = 224,000 bits
This is in fact an optimal character code for this file
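These two totals can be checked with a short Python sketch, using the frequencies and code words taken from the example above:

```python
# Frequencies from the example file of 100,000 characters (a-f).
freq = {"a": 45_000, "b": 13_000, "c": 12_000, "d": 16_000, "e": 9_000, "f": 5_000}

# Fixed-length scheme: every character costs 3 bits.
fixed_bits = sum(f * 3 for f in freq.values())

# Variable-length scheme from the example above.
code = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}
var_bits = sum(f * len(code[ch]) for ch, f in freq.items())

print(fixed_bits, var_bits)  # 300000 224000
```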
PROBLEMS:
Suppose that we want to encode a message constructed from the symbols A, B, C,
D, and E using a fixed-length code
How many bits are required to encode each symbol?
at least 3 bits are required
2 bits are not enough (can only encode four symbols)
How many bits are required to encode the message DEAACAAAAABA?
there are twelve symbols, each requires 3 bits
12*3 = 36 bits are required
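The arithmetic above can be sketched in Python: 2 bits distinguish only 2^2 = 4 symbols, so 3 bits are the minimum for five.

```python
import math

symbols = ["A", "B", "C", "D", "E"]
# Smallest fixed width that can distinguish all five symbols.
bits_per_symbol = math.ceil(math.log2(len(symbols)))

message = "DEAACAAAAABA"
total_bits = len(message) * bits_per_symbol

print(bits_per_symbol, total_bits)  # 3 36
```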
DRAWBACKS OF FIXED-LENGTH CODES:
Wasted space
Unicode (e.g. UTF-16) uses twice as much space as ASCII
inefficient for plain-text messages containing only ASCII characters
Same number of bits used to represent all characters
a and e occur more frequently than q and z
Potential solution: use variable-length codes
variable number of bits to represent characters when frequency of occurrence is known
short codes for characters that occur frequently
ADVANTAGES OF VARIABLE-LENGTH CODES:
The advantage of variable-length codes over fixed-length codes is that short codes can be given
to characters that occur frequently:
on average, the length of the encoded message is less than with fixed-length encoding
Potential problem: how do we know where one character ends and another begins?
(This is not a problem when the number of bits per character is fixed.)
PREFIX PROPERTY:
Prefix codes
Huffman codes are constructed in such a way that they can be unambiguously
translated back to the original data, yet still form an optimal character code
Huffman codes are prefix codes: a code has the prefix property if no character's
code word is a prefix (the start) of another character's code word
EXAMPLE (PREFIX):
000 is not a prefix of 11, 01, 001, or 10
11 is not a prefix of 000, 01, 001, or 10
CODE WITHOUT PREFIX PROPERTY:
The following code does not have the prefix property (the code words are
reconstructed here from the decodings below): P = 0, Q = 1, S = 10, T = 11.
Q's code word, 1, is a prefix of both S's and T's.
The pattern 1110 can be decoded as QQQP, QTP, QQS, or TS
CONTD:
A prefix code is a type of code system (typically a variable-length code) distinguished
by its possession of the "prefix property", which requires that there is no code word in
the system that is a prefix (initial segment) of any other code word in the system.
A prefix code is a uniquely decodable code: a receiver can identify each word
without requiring a special marker between words.
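The prefix property can be tested mechanically. A minimal sketch (the function name is my own):

```python
def has_prefix_property(codewords):
    """True if no code word is a prefix of another."""
    words = sorted(codewords)
    # In sorted order, any code word that is a prefix of another
    # sorts immediately before some word it prefixes.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

print(has_prefix_property(["000", "11", "01", "001", "10"]))  # True
print(has_prefix_property(["0", "1", "10", "11"]))            # False: 1 prefixes 10 and 11
```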
CONTD:
Suppose we have two binary code words a and b, where a is k bits long,
b is n bits long, and k < n. If the first k bits of b are identical to a, then a is
called a prefix of b. The last n − k bits of b are called the dangling suffix.
For example, if
a = 010 and b = 01011,
then a is a prefix of b and the dangling suffix is 11.
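As a sketch of these definitions (the helper name is my own):

```python
def dangling_suffix(a, b):
    """If code word a is a proper prefix of b, return the dangling suffix."""
    if len(a) < len(b) and b.startswith(a):
        return b[len(a):]
    return None  # a is not a prefix of b

print(dangling_suffix("010", "01011"))  # 11
```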
PURPOSE OF HUFFMAN CODING:
Proposed by Dr. David A. Huffman in 1952 in the paper
"A Method for the Construction of Minimum-Redundancy Codes"
Applicable to many forms of data transmission
Our example: text files
THE BASIC ALGORITHM:
Huffman coding is a form of statistical coding
Not all characters occur with the same frequency!
Yet all characters are allocated the same amount of space
1 char = 1 byte, be it e or x
THE BASIC ALGORITHM:
Any savings in tailoring codes to frequency of character?
Code word lengths are no longer fixed like ASCII.
Code word lengths vary and will be shorter for the more
frequently used characters.
THE (REAL) BASIC ALGORITHM:
1. Scan text to be compressed and tally occurrence of all characters.
2. Sort or prioritize characters based on number of occurrences in text.
3. Build Huffman code tree based on prioritized list.
4. Perform a traversal of tree to determine all code words.
5. Scan text again and create new file using the Huffman codes.
ALGORITHM:
HUFFMAN(C)
n <- |C|
Q <- C
for i <- 1 to n - 1
    do allocate a new node z
       left[z] <- x <- EXTRACT-MIN(Q)
       right[z] <- y <- EXTRACT-MIN(Q)
       f[z] <- f[x] + f[y]
       INSERT(Q, z)
return EXTRACT-MIN(Q)
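The pseudocode above can be sketched in Python, using the standard library's heapq module as the priority queue Q (function and variable names are my own):

```python
import heapq
import itertools

def huffman(freq):
    """Build a Huffman tree from {symbol: frequency} and derive the codes.

    Mirrors the pseudocode: n-1 rounds of two EXTRACT-MINs and one INSERT.
    """
    tie = itertools.count()  # tie-breaker so the heap never compares trees
    # Heap entries: (frequency, tie, tree); a leaf is just the symbol itself.
    q = [(f, next(tie), sym) for sym, f in freq.items()]
    heapq.heapify(q)
    for _ in range(len(freq) - 1):
        fx, _, x = heapq.heappop(q)                      # left[z]  <- EXTRACT-MIN(Q)
        fy, _, y = heapq.heappop(q)                      # right[z] <- EXTRACT-MIN(Q)
        heapq.heappush(q, (fx + fy, next(tie), (x, y)))  # f[z] <- f[x] + f[y]
    root = q[0][2]                                       # return EXTRACT-MIN(Q)

    codes = {}
    def walk(node, path):
        if isinstance(node, tuple):     # internal node: recurse on children
            walk(node[0], path + "0")   # going left is a 0
            walk(node[1], path + "1")   # going right is a 1
        else:                           # leaf: the code word is complete
            codes[node] = path or "0"   # lone-symbol alphabet edge case
    walk(root, "")
    return root, codes
```

With the a–f frequencies from the earlier example, the resulting code costs sum(freq * length) = 224,000 bits, matching the optimal code shown before.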
ANALYSIS :
Time Complexity
The time complexity of the Huffman algorithm is O(n log n): there are O(n)
iterations, and each requires O(log n) time to extract the two minimum-weight
nodes from the priority queue.
BUILDING A TREE
SCAN THE ORIGINAL TEXT
Consider the following short text:
Eerie eyes seen near lake.
Count up the occurrences of all characters in the text
BUILDING A TREE
SCAN THE ORIGINAL TEXT
Eerie eyes seen near lake.
Q. What characters are present?
E e r i space y s n a l k .
BUILDING A TREE
SCAN THE ORIGINAL TEXT
Eerie eyes seen near lake.
What is the frequency of each character in the text?
BUILDING A TREE
PRIORITIZE CHARACTERS
Create binary tree nodes with character and frequency
of each character
Place nodes in a priority queue
The lower the occurrence, the higher the priority in the queue
BUILDING A TREE
The queue after inserting all nodes
E i y l k . r s n a sp e
1 1 1 1 1 1 2 2 2 2 4 8
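The frequencies behind this queue can be reproduced by counting the example text, e.g. with Python's collections.Counter:

```python
from collections import Counter

text = "Eerie eyes seen near lake."
freq = Counter(text)

# Rarest characters first: these get the highest priority in the queue.
for ch, f in sorted(freq.items(), key=lambda kv: kv[1]):
    print(repr(ch), f)
```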
BUILDING A TREE
While priority queue contains two or more nodes
Create new node
Dequeue node and make it left subtree
Dequeue next node and make it right subtree
Frequency of new node equals sum of frequency of left and right
children
Enqueue new node back into queue
BUILDING A TREE
[Slides 26–39 step through the merging process in diagrams: at each step the two
lowest-frequency nodes are dequeued, joined under a new parent whose frequency is
their sum, and the parent is enqueued. The diagrams are not reproduced in this
text version.]
BUILDING A TREE
Q. What is happening to the characters with a low
number of occurrences?
(They are merged early, so they sink deeper into the tree and end up with
longer code words.)
[Slides 41–47: the remaining merging steps; diagrams not reproduced.]
BUILDING A TREE
After enqueueing this node there is only one node left in the priority queue.
BUILDING A TREE
Dequeue the single node left in the queue.
This tree contains the new code words for each character.
The frequency of the root node should equal the number of characters in the
text: "Eerie eyes seen near lake." has 26 characters.
ENCODING THE FILE
TRAVERSE TREE FOR CODES
Perform a traversal of the tree to obtain the new code words.
Going left is a 0; going right is a 1.
A code word is only completed when a leaf node is reached.
[Diagram: the final Huffman tree, root frequency 26.]
ENCODING THE FILE
TRAVERSE TREE FOR CODES
Char   Code
E      0000
i      0001
y      0010
l      0011
k      0100
.      0101
space  011
e      10
r      1100
s      1101
n      1110
a      1111
ENCODING THE FILE
Rescan the text and encode the file using the new code words:
Eerie eyes seen near lake.
0000 10 1100 0001 10 011 10 0010 10 1101 011 1101 10 10 1110 011 1110 10 1111 1100 011 0011 1111 0100 10 0101
(spaces are shown between code words only for readability)
Q. Why is there no need for a separator character?
ENCODING THE FILE
RESULTS
Have we made things any better?
84 bits to encode the text.
ASCII would take 8 * 26 = 208 bits.
If a modified fixed-length code used 4 bits per character,
4 * 26 = 104 bits would be needed; the savings are not as great.
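As a sketch, encoding the text with the code table from the tree confirms the bit count, and greedy decoding shows why no separator is needed: the prefix property means a code word is recognized the moment its last bit arrives.

```python
# Code table read off the final Huffman tree.
code = {
    "E": "0000", "i": "0001", "y": "0010", "l": "0011", "k": "0100",
    ".": "0101", " ": "011",  "e": "10",   "r": "1100", "s": "1101",
    "n": "1110", "a": "1111",
}

text = "Eerie eyes seen near lake."
encoded = "".join(code[ch] for ch in text)
print(len(encoded))  # 84 bits, versus 8 * 26 = 208 bits of ASCII

# Greedy decoding: keep reading bits until the buffer is a code word.
decode = {v: k for k, v in code.items()}
out, buf = [], ""
for bit in encoded:
    buf += bit
    if buf in decode:
        out.append(decode[buf])
        buf = ""
print("".join(out) == text)  # True
```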
APPLICATIONS OF HUFFMAN CODING:
Used in various file formats, such as:
ZIP (the DEFLATE method combines LZ77 with Huffman coding)
JPEG
MPEG (including MP3 audio, which uses Huffman tables)
Also used in steganography, for compressing JPEG carrier files.
CONCLUSION:
Like many other useful algorithms, the Huffman algorithm is needed to compress
data so that it can be transmitted properly over the internet and other
transmission channels.
The Huffman algorithm works on binary trees.