
1) Huffman coding is a lossless encoding algorithm that assigns variable-length binary codewords to characters based on their probabilities; it constructs a binary tree with the characters as leaves and encodes each character by its path from the root. 2) The algorithm builds the tree bottom-up by repeatedly merging the two lowest-probability characters into a single node until only one node remains. 3) The algorithm provably produces an optimal tree: there exists an optimal tree with the two lowest-probability characters at the deepest level, and by induction the constructed tree has minimum expected encoding length.

Uploaded by Andrew Lee. © All Rights Reserved.

Huffman Codes

Encoding

Huffman coding is a lossless encoding algorithm that encodes an alphabet into a prefix code: a mapping of codewords to characters such that no codeword is a prefix of another. The encoded alphabet can be represented as a binary tree with all letters on the leaves.

[Figure: example prefix-code tree with edge labels 0 (left) and 1 (right); reading root-to-leaf paths gives a = 0, b = 10, c = 110, d = 111.]
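As a sketch, a prefix code like the one in the figure (the codewords a = 0, b = 10, c = 110, d = 111 are my reading of the tree) can be checked for the prefix property and decoded with a simple scan:

```python
# A small prefix code read off the example tree: left edge = 0, right edge = 1.
code = {"a": "0", "b": "10", "c": "110", "d": "111"}

# Prefix property: no codeword is a prefix of another.
words = list(code.values())
assert not any(u != v and v.startswith(u) for u in words for v in words)

def decode(bits, code):
    """Walk the bit string, emitting a letter whenever a full codeword matches."""
    inverse = {w: ch for ch, w in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:          # a leaf of the code tree was reached
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

print(decode("010110111", code))  # "abcd"
```

Because of the prefix property, the greedy scan never needs to backtrack: the first codeword that matches is the only one that can.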

Optimal Code If p(x) is the probability of letter x and d_T(x) is the depth in the tree of its codeword, then for an alphabet C the expected encoding length for n characters, which we want to minimize, is

    B(T) = n · Σ_{x ∈ C} p(x) · d_T(x)    (1)
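Equation (1) is a direct sum; for instance, with assumed probabilities (not from the text) and the depths of the example codewords:

```python
# Assumed probabilities, with depths d_T(x) matching the example
# codewords a = 0, b = 10, c = 110, d = 111.
p = {"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}
depth = {"a": 1, "b": 2, "c": 3, "d": 3}

n = 100  # number of characters to encode
B = n * sum(p[x] * depth[x] for x in p)
print(B)  # expected total number of bits
```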

Algorithm

Build the tree bottom-up, merging two characters into a meta-character at each step, then recurse on the alphabet that is one letter smaller and contains the meta-character. When merging, always pick the two characters with the lowest probabilities.

for x in C:
    add x to heap Q, keyed by p(x)

for i in [1, |C| - 1]:
    z = new internal tree node
    left[z] = x = extract-min(Q)
    right[z] = y = extract-min(Q)
    p(z) = p(x) + p(y)
    insert z into Q

return last element in Q as root
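The pseudocode above can be made runnable; a minimal sketch in Python using heapq as the priority queue Q (the tie-breaking counter and the nested-tuple tree representation are implementation choices, not from the text):

```python
import heapq

def huffman(p):
    """Build a Huffman tree from {letter: probability}; return {letter: codeword}."""
    # Heap entries: (probability, tiebreak, tree), where a tree is either a
    # letter or a (left, right) pair. The tiebreak keeps comparisons well-defined.
    heap = [(prob, i, letter) for i, (letter, prob) in enumerate(p.items())]
    heapq.heapify(heap)
    count = len(heap)
    for _ in range(len(p) - 1):          # |C| - 1 merges
        px, _, x = heapq.heappop(heap)   # extract-min twice
        py, _, y = heapq.heappop(heap)
        heapq.heappush(heap, (px + py, count, (x, y)))  # p(z) = p(x) + p(y)
        count += 1
    root = heap[0][2]

    # Read codewords off the tree: left edge = 0, right edge = 1.
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"  # single-letter alphabet edge case
    walk(root, "")
    return codes

codes = huffman({"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10})
print(codes)
```

With these probabilities the code lengths come out as 1, 2, 3, 3 bits, matching the shape of the example tree; different tie-breaking can produce a different but equally optimal code.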


Proof

Claim For the two letters x, y ∈ C with the lowest probabilities, there exists an optimal tree with these two letters at maximum depth.

Proof Take an optimal tree T and let b, c ∈ C be two leaves at maximum depth, with p(b) ≤ p(c) and p(x) ≤ p(y) without loss of generality. Because x and y have the lowest probabilities, p(x) ≤ p(b) and p(y) ≤ p(c); because b and c sit at the bottom of the tree, d_T(b) ≥ d_T(x) and d_T(c) ≥ d_T(y).
[Figure: tree T with b, c at the deepest level, and tree T′ obtained from T by swapping x with b and y with c, so that x, y are at the deepest level.]

Swap the positions of x and b in the tree to obtain the new tree T′. The cost change from T to T′ is given by:

    B(T′) = B(T) − p(x)d_T(x) + p(x)d_T(b) − p(b)d_T(b) + p(b)d_T(x)    (2)
          = B(T) + p(x)(d_T(b) − d_T(x)) − p(b)(d_T(b) − d_T(x))        (3)
          = B(T) − (p(b) − p(x))(d_T(b) − d_T(x))                       (4)

In the analysis above, we fixed p(x) ≤ p(b) and d_T(x) ≤ d_T(b), therefore B(T′) ≤ B(T). By a similar exchange argument for y, this proves that there exists some optimal tree with x and y at the deepest level.
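The sign analysis in (2)-(4) can be checked numerically; a small sketch with assumed values (any choice satisfying p(x) ≤ p(b) and d_T(x) ≤ d_T(b) behaves the same way):

```python
# Assumed example values with p(x) <= p(b) and d_T(x) <= d_T(b).
p_x, p_b = 0.1, 0.3
d_x, d_b = 2, 4

# Term-by-term cost change from swapping x and b (equation 2).
delta = -p_x * d_x + p_x * d_b - p_b * d_b + p_b * d_x
# Factored form (equation 4).
factored = -(p_b - p_x) * (d_b - d_x)

assert abs(delta - factored) < 1e-12  # the algebra checks out
assert delta <= 0                     # so B(T') <= B(T)
```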
Claim Huffman's algorithm always produces an optimal tree.

Proof Use induction. The base case n = 1 is trivially correct, and by the inductive hypothesis we assume correctness on n − 1 characters.

Given a tree for n characters such that x, y are the lowest-probability letters, we know from the previous claim that they are at the bottom of some optimal solution. Now replace x, y with z so that p(z) = p(x) + p(y), as in the algorithm, so that now |C′| = n − 1.
Consider that any prefix tree T for C′ has z as a leaf node (by definition). Replace this leaf node with an internal node branching out to two new leaves x and y. The cost change is:

    B(T′) = B(T) − p(z)d_T(z) + p(x)(d_T(z) + 1) + p(y)(d_T(z) + 1)    (5)
          = B(T) + p(x) + p(y)                                         (6)

By the inductive hypothesis, B(T) is minimal over trees for C′. Because we picked x and y so that their sum p(x) + p(y) is the smallest possible over the alphabet, and any optimal tree for C contracts to a tree for C′ whose cost is smaller by exactly p(x) + p(y), B(T′) is optimal as well, which means Huffman's algorithm always produces a minimal prefix code.
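Equations (5)-(6) reduce the expansion of leaf z to adding p(x) + p(y); a quick check under assumed values (all numbers here are illustrative, not from the text):

```python
# Illustrative values: probabilities of the merged letters and the depth of leaf z.
p_x, p_y = 0.1, 0.15
p_z = p_x + p_y
d_z = 3
B_T = 2.0  # an arbitrary baseline cost B(T) for the tree on C'

# Expanding leaf z into children x and y (equation 5)...
B_T_prime = B_T - p_z * d_z + p_x * (d_z + 1) + p_y * (d_z + 1)
# ...collapses to equation (6): the -p_z * d_z and +(p_x + p_y) * d_z terms cancel.
assert abs(B_T_prime - (B_T + p_x + p_y)) < 1e-12
```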
