Huffman Algorithm - Code Construction 2

Huffman coding is a data compression algorithm that uses variable-length codes to encode characters based on their frequency of occurrence. It assigns shorter bit sequences to more common characters and longer bit sequences to less common characters, resulting in a smaller average code length; in the example worked through here, the compressed output is less than half the size of the uncompressed original. The algorithm builds a Huffman tree by combining the two least frequent characters at each step until a single root node remains, then assigns codes to each character by traversing the tree from root to leaf and outputting a 0 for each left branch and a 1 for each right branch.


Huffman Coding

Huffman Coding
Huffman codes can be used to compress information
Like WinZip, although WinZip doesn't use the Huffman algorithm; JPEGs do use Huffman coding as part of their compression process

The basic idea is that instead of storing each character in a file as an 8-bit ASCII value, we will instead store the more frequently occurring characters using fewer bits and less frequently occurring characters using more bits
On average this should decrease the file size (usually)

Huffman Coding
As an example, let's take the string:
duke blue devils

We first do a frequency count of the characters:


e:3, d:2, u:2, l:2, space:2, k:1, b:1, v:1, i:1, s:1
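
This frequency count is easy to reproduce in Python; a minimal sketch (illustrative, not taken from the slides):

from collections import Counter

text = "duke blue devils"
freq = Counter(text)                       # character -> number of occurrences

# Print in decreasing order of frequency, writing the space character as "space"
for ch, count in freq.most_common():
    print(f"{'space' if ch == ' ' else ch}:{count}", end="  ")
print()
# e:3  d:2  u:2  space:2  l:2  k:1  b:1  v:1  i:1  s:1  (ties ordered by first appearance)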

Next we use a Greedy algorithm to build up a Huffman Tree


We start with nodes for each character
e,3 d,2 u,2 l,2 sp,2 k,1 b,1 v,1 i,1 s,1

Huffman Coding
We then pick the two nodes with the smallest frequencies and combine them to form a new node
The selection of these nodes is the Greedy part

The two selected nodes are removed from the set but replaced by the combined node. This continues until only one node is left in the set
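
One common way to implement this greedy loop is with a min-heap: repeatedly pop the two lowest-weight nodes and push back their combined parent. The sketch below is illustrative rather than the slides' own code, and ties between equal weights may be broken differently than in the walkthrough that follows (any such tree is still optimal).

import heapq
import itertools

def build_huffman_tree(freq):
    """Build a Huffman tree from a {symbol: count} mapping.

    A leaf is just the symbol (a string); an internal node is a
    (left, right) tuple. Returns the root node.
    """
    order = itertools.count()            # tie-breaker so the heap never compares nodes
    heap = [(count, next(order), sym) for sym, count in freq.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)   # node with the smallest weight
        w2, _, b = heapq.heappop(heap)   # node with the next-smallest weight
        heapq.heappush(heap, (w1 + w2, next(order), (a, b)))   # the combined node
    return heap[0][2]

For the frequencies above, the root of the resulting tree has weight 16, matching the total character count of the string.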

Huffman Coding
Building the tree step by step: starting from the nodes

e,3 d,2 u,2 l,2 sp,2 k,1 b,1 v,1 i,1 s,1

the two smallest-weight nodes are combined at each step:

i,1 + s,1 → 2
b,1 + v,1 → 2
k,1 + (b,v) 2 → 3
l,2 + sp,2 → 4
d,2 + u,2 → 4
(i,s) 2 + (k,b,v) 3 → 5
e,3 + (d,u) 4 → 7
(l,sp) 4 + (i,s,k,b,v) 5 → 9
(e,d,u) 7 + (l,sp,i,s,k,b,v) 9 → 16

The final node, with weight 16, is the root: its left subtree (weight 7) holds e, d and u, and its right subtree (weight 9) holds l, sp, i, s, k, b and v.
Huffman Coding
Now we assign codes to the tree by placing a 0 on every left branch and a 1 on every right branch. A traversal of the tree from root to leaf gives the Huffman code for that particular leaf character. Note that no code is the prefix of another code

For the tree built above, this gives:

e    00
d    010
u    011
l    100
sp   101
i    1100
s    1101
k    1110
b    11110
v    11111
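
A small sketch of this traversal, reusing the (left, right)/leaf node representation from the earlier tree-building sketch (illustrative, not the slides' code):

def assign_codes(node, prefix="", table=None):
    """Traverse the tree: append a 0 for each left branch, a 1 for each right branch."""
    if table is None:
        table = {}
    if isinstance(node, str):            # leaf: the path taken so far is its code
        table[node] = prefix
    else:
        left, right = node
        assign_codes(left, prefix + "0", table)
        assign_codes(right, prefix + "1", table)
    return table

# e.g. codes = assign_codes(build_huffman_tree(freq))

Depending on how ties were broken while building the tree, individual codes may differ from the table above, but the total encoded length of the string comes out the same.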

Huffman Coding
These codes are then used to encode the string. Thus, duke blue devils turns into:
010 011 1110 00 101 11110 100 011 00 101 010 00 11111 1100 100 1101

When grouped into 8-bit bytes:


01001111 10001011 11101000 11001010 10001111 11100100 1101xxxx

Thus it takes 7 bytes of space compared to 16 characters * 1 byte/char = 16 bytes uncompressed
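
A sketch of the encoding and byte-packing step, using the code table derived above; the final four padding bits (shown as xxxx on the slide) are filled with zeros here:

codes = {"e": "00", "d": "010", "u": "011", "l": "100", " ": "101",
         "i": "1100", "s": "1101", "k": "1110", "b": "11110", "v": "11111"}

text = "duke blue devils"
bits = "".join(codes[ch] for ch in text)                  # 52 bits for this string

padded = bits + "0" * (-len(bits) % 8)                    # pad out the last byte
packed = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))

print(len(bits), len(packed))                             # 52 bits -> 7 bytes (vs. 16 bytes of ASCII)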

Huffman Coding
Uncompressing works by reading in the file bit by bit
Start at the root of the tree. If a 0 is read, head left; if a 1 is read, head right. When a leaf is reached, decode that character and start over again at the root of the tree

Thus, we need to save Huffman table information as a header in the compressed file
This doesn't add a significant amount of size to the file for large files (which are the ones you want to compress anyway). Alternatively, we could use a fixed, universal set of codes/frequencies
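
A decoding sketch along these lines, again assuming the (left, right)/leaf tree representation used in the earlier sketches; it expects the padding bits at the end to have been stripped, e.g. because the header also records the original character count:

def decode(bits, root):
    """Walk the tree bit by bit: a 0 goes left, a 1 goes right; output a character at each leaf."""
    out = []
    node = root
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):        # reached a leaf: emit it and restart at the root
            out.append(node)
            node = root
    return "".join(out)

# decode(bits, tree) == "duke blue devils", given the tree the bits were encoded with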
