Huffman Codes

Huffman coding is an algorithm for constructing optimal prefix codes for data compression. It involves combining the two lowest probability symbols into a new node and repeating this process until a full tree is constructed. Huffman codes provide the shortest possible expected code length for a given source. The algorithm runs in O(n log n) time and produces codes that are optimal for the given probabilities.


Huffman Codes

Let A1,...,An be a set of items.
Let P1,...,Pn be their probabilities.
We want to find a set of code lengths L1,...,Ln that minimizes Σi=1..n Li·Pi.


Klein S. T. and Wiseman Y.

Huffman's algorithm
If n == 2, assign the codes {0, 1}.
Else combine the 2 smallest probabilities Pn, Pn-1 and
solve recursively for P1, P2, ..., Pn-2, Pn-1+Pn.
If Pn-1+Pn is represented by the codeword w,
then
Pn-1 will be represented by w0
Pn will be represented by w1
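The recursive rule above can be sketched with a min-heap instead of the sorted-list formulation used later in the slides; function and variable names here are illustrative, not from the original text.

```python
import heapq
from itertools import count

def huffman_code_lengths(weights):
    """Return a dict mapping each symbol to its Huffman code length."""
    if len(weights) == 1:
        return {s: 1 for s in weights}
    tie = count()  # tiebreaker so tuples never compare the dicts
    # Heap entries: (weight, tiebreaker, {symbol: depth_so_far})
    heap = [(w, next(tie), {s: 0}) for s, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)  # two smallest weights are combined,
        w2, _, d2 = heapq.heappop(heap)  # pushing their symbols one level deeper
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]

# The example from the slides: A - 20%, B - 10%, C - 10%, D - 30%, E - 30%
lengths = huffman_code_lengths({"A": 20, "B": 10, "C": 10, "D": 30, "E": 30})
```

Running this on the slides' example reproduces the lengths of the codes shown there (2 bits for A, D, E and 3 bits for B, C).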

Example of Huffman coding

In a given language:
A - 20%, B - 10%, C - 10%, D - 30%, E - 30%

Merging steps:
E(30) D(30) A(20) C(10) B(10)
E(30) D(30) A(20) CB(20)
CBA(40) E(30) D(30)
ED(60) CBA(40)

Resulting codes: E - 11, D - 10, A - 00, C - 011, B - 010


Implementation
Sorting is O(n log n).
Finding where to insert a new item is O(n).
There are n-2 new items to insert, so the insertions total O(n²).
Total: O(n log n) + O(n²), i.e. O(n²).


Improved implementation
Two queues.

[Figure: queue "a" holds a1,a2,a3,a4 and queue "b" holds b1,b2,b3,b4, smallest at the front. Decision diagram for picking the two smallest front elements: if a1 > b2, take b1,b2; else if b1 > a2, take a1,a2; else take a1,b1.]

Improved implementation (cont.)

Sort all items into queue "a"; queue "b" is empty.
While there are values in the queues,
put the sum of the 2 lowest front values into queue "b".
The sums created by the additions appear in non-decreasing order, so queue "b" never needs sorting or searching.
The algorithm is O(n log n) + O(n), i.e. O(n log n).
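The two-queue method above can be sketched as follows; this version returns the total weighted path length Σ Li·Wi (each merge sum is counted once per level it spans), and the names are illustrative.

```python
from collections import deque

def huffman_two_queues(sorted_weights):
    """Total weighted path length Σ Li·Wi via the two-queue method.

    `sorted_weights` must be non-decreasing (the one O(n log n) sort).
    After that, every merge is O(1): the two smallest values are always
    at the front of queue `a` (leaves) or queue `b` (merge sums), because
    the sums are produced in non-decreasing order.
    """
    a = deque(sorted_weights)  # original leaves, sorted once up front
    b = deque()                # merged nodes, non-decreasing by construction
    total = 0

    def pop_smallest():
        # Take from whichever queue has the smaller front element.
        if not b or (a and a[0] <= b[0]):
            return a.popleft()
        return b.popleft()

    while len(a) + len(b) > 1:
        s = pop_smallest() + pop_smallest()  # combine the two smallest
        b.append(s)
        total += s  # summing all internal-node weights gives Σ Li·Wi
    return total
```

On the slides' example (sorted weights 10, 10, 20, 30, 30) this yields 220, matching 2.2 bits per symbol on a percentage scale.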

Results of Huffman
Examples for text files:

[Bar chart: original vs. compressed file sizes (axis 0 to 4,500,000) for the Hebrew Bible, the English Bible, and Voltaire in French.]

Huffman is optimal
An optimal tree is a tree for which Σi=1..n Li·Pi is minimal.

Lemmas: in an optimal tree:

The tree is full, so there are at least 2 nodes on the lowest level.
The 2 nodes with the lowest weights are on the lowest level.
The 2 lowest weights can be assumed to be on brother nodes.

Theorem
Let T1 be an optimal tree with weights W1,...,Wn.
The two lowest weights Wn, Wn-1 are on the lowest level and they are brothers.
Let T2 be the same tree as T1, but without Wn, Wn-1 and with their father having the weight Wn+Wn-1.
Theorem: T2 is an optimal tree.

Theorem (cont.)

[Figure: in T1, the brother leaves Wn-1 and Wn hang from a common father; in T2 that father is a leaf of weight Wn+Wn-1.]


Theorem - proof
Let us denote M1 = Σi=1..n Li·Wi.
M1 = W1L1 + ... + Wn-1Ln-1 + WnLn-1
(brothers have the same level, so Ln = Ln-1).
M2 = M1 - (Wn-1Ln-1 + WnLn-1) + (Wn-1+Wn)(Ln-1-1)
   = M1 - Wn-1 - Wn
Here (Wn-1+Wn) is the weight of the father, which sits one level higher, at Ln-1-1.

Theorem - proof (cont.)

Suppose T2 is not optimal.
Then there is a tree T3 ≠ T2 which is optimal, i.e. M3 < M2.
There is a leaf in T3 which has the weight Wn-1+Wn.
Let T4 be a tree with the same nodes as T3, but with that leaf replaced by an internal node with two leaves Wn-1, Wn.

Theorem - proof (cont.)

[Figure: in T3, a leaf of weight Wn-1+Wn; in T4 that leaf becomes an internal node with children Wn-1 and Wn.]

Theorem - proof (end)

T4 is a tree for the weights W1,...,Wn.
M4 = M3 + Wn-1 + Wn < M2 + Wn-1 + Wn = M1,
so M4 < M1.
But T1 is an optimal tree for these weights, a contradiction!

Therefore T2 is an optimal tree.


Theorem
Let T be an optimal tree for any n leaves.
T is equivalent to a Huffman tree with the same leaves, i.e. they both have the same Σi=1..n Li·Wi.


Theorem - Proof
Induction on the number of leaves.
For n=2 there is just one possible tree, a root with the two leaves W1 and W2, so the trees are equivalent.


Theorem proof (end)

Assume the claim for n-1 leaves and prove it for n.
Let T1 be an optimal tree.
According to the previous theorem, T2 is optimal too.
According to the induction hypothesis, T2, which has n-1 leaves, is equivalent to a Huffman tree.
Expanding the merged leaf Wn-1+Wn back into the brothers Wn-1, Wn is exactly a step of Huffman's algorithm, so T1 is equivalent to a Huffman tree too.

Optimal trees
Huffman trees are optimal, as proved, but there are other trees which are optimal yet not Huffman trees.
Example: A - 20%, B - 10%, C - 10%, D - 30%, E - 30%
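For these weights, one optimal tree that Huffman's algorithm cannot produce pairs E with A instead of E with D (Huffman is forced to merge the two 30s together at the third step). The sketch below, with hand-chosen trees as nested tuples, shows both trees have the same cost; the tree shapes are illustrative.

```python
def cost(tree, weights, depth=0):
    """Σ Li·Wi for a binary tree given as nested 2-tuples of symbol names."""
    if isinstance(tree, str):
        return weights[tree] * depth
    left, right = tree
    return cost(left, weights, depth + 1) + cost(right, weights, depth + 1)

weights = {"A": 20, "B": 10, "C": 10, "D": 30, "E": 30}
huffman_tree = (("E", "D"), ("A", ("C", "B")))  # as built on the example slide
other_tree = (("E", "A"), ("D", ("C", "B")))    # pairs E with A instead of D

huffman_cost = cost(huffman_tree, weights)
other_cost = cost(other_tree, weights)
```

Both evaluate to the same Σ Li·Wi, so optimality is a property of the leaf depths, not of the particular merge history.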
