Huffman Tree and Coding
Note: Uppercase and lowercase letters are treated as distinct characters. In
"AmirAli", every 'A' is uppercase and every 'i' is lowercase, so the distinction
does not split either letter into two symbols.
Create a leaf node for each character and add it to a priority queue (min-heap)
keyed on frequency:
1. m (frequency 1)
2. r (frequency 1)
3. l (frequency 1)
4. A (frequency 2)
5. i (frequency 2)
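The frequency table and the initial min-heap can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the variable names and the integer tie-breaker stored in each heap entry are assumptions of the sketch.

```python
import heapq
from collections import Counter

text = "AmirAli"
freq = Counter(text)  # case-sensitive: 'A' and 'a' would be separate keys

# Each heap entry is (frequency, tie_breaker, symbol). The integer tie_breaker
# is an assumption of this sketch; it orders entries with equal frequency by
# first appearance in the text.
heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
heapq.heapify(heap)

# List the leaves by frequency, then by character.
print(sorted(freq.items(), key=lambda kv: (kv[1], kv[0])))
# [('l', 1), ('m', 1), ('r', 1), ('A', 2), ('i', 2)]
```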
Parham and AmirAli’s work
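Iteration 1:
• Combine: 'm' (frequency 1) and 'r' (frequency 1).
• Create: New node N1 with frequency 2.
Iteration 2:
• Combine: 'l' (frequency 1) and N1 (frequency 2).
• Create: New node N2 with frequency 3.
• Updated min-heap:
◦ A (frequency 2)
◦ i (frequency 2)
◦ N2 (frequency 3)
Iteration 3:
• Combine: 'A' (frequency 2) and 'i' (frequency 2).
• Create: New node N3 with frequency 4.
• Updated min-heap:
◦ N2 (frequency 3)
◦ N3 (frequency 4)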
Iteration 4:
• Create: New root node N4 with frequency 7.
                [N4:7]
               /      \
           '0'/        \'1'
             /          \
        [N2:3]          [N3:4]
        /    \          /    \
       /      \        /      \
   [l:1]    [N1:2]  [A:2]    [i:2]
            /    \
        '0'/      \'1'
          /        \
      [m:1]        [r:1]
• 'm': Path '0' (to N2) + '1' (to N1) + '0' (to 'm') = '010'
• 'r': Path '0' (to N2) + '1' (to N1) + '1' (to 'r') = '011'
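The path-to-code rule can be checked with a short Python sketch that writes the tree from this walkthrough as nested pairs and reads the codes off by traversal. The nested-tuple representation and the helper name assign_codes are hypothetical. The placement of 'A' on the '0' branch and 'i' on the '1' branch of N3 is also an assumption, since the text does not state it.

```python
# Tree from the walkthrough as nested (left, right) pairs.
# A left edge contributes '0' and a right edge contributes '1'.
N1 = ("m", "r")   # frequency 2
N2 = ("l", N1)    # frequency 3
N3 = ("A", "i")   # frequency 4; the left/right order here is an assumption
N4 = (N2, N3)     # root, frequency 7

def assign_codes(node, prefix=""):
    """Return {symbol: bit string} for every leaf below node."""
    if isinstance(node, str):          # leaf: a single character
        return {node: prefix}
    left, right = node
    codes = assign_codes(left, prefix + "0")
    codes.update(assign_codes(right, prefix + "1"))
    return codes

codes = assign_codes(N4)
print(codes["m"], codes["r"])          # 010 011

encoded = "".join(codes[ch] for ch in "AmirAli")
print(len(encoded))                    # 16
```

Under these assumptions the whole word takes 16 bits: two characters with 3-bit codes and five occurrences of characters with 2-bit codes.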
Why were 'm' and 'r' selected before 'l' in the min-heap priority queue?
• Equal Frequencies: In the word "AmirAli", the characters 'm', 'r', and 'l'
each have a frequency of 1.
• Arbitrary Selection: When multiple nodes have the same minimum frequency,
any two of them can be selected for combination. The Huffman algorithm
doesn't specify which ones to pick first in this case.
• Our Example: We chose to combine 'm' and 'r' first. The choice was
arbitrary and made only for illustration.
Detailed Explanation
▪ Why 'm' and 'r'? Because all three nodes ('m', 'r', 'l') have
the same frequency, we can pick any two. We chose 'm' and 'r'
arbitrarily.
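◦ Combine Them:
▪ N1 (frequency 2) becomes the parent of 'm' and 'r'.
◦ Update Min-Heap:
▪ The heap now holds 'l' (1), 'A' (2), 'i' (2), and N1 (2).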
◦ Combine Them:
▪ N2 (frequency 3) becomes the parent of 'l' and N1.
◦ Update Min-Heap:
▪ The heap now holds 'A' (2), 'i' (2), and N2 (3).
◦ The selection between 'm', 'r', and 'l' was arbitrary because they
all had the same frequency.
◦ We could have combined 'l' and 'm', or 'l' and 'r', and the final
Huffman codes would still be optimal.
◦ The total length of the encoded message and the efficiency of the
Huffman codes remain the same regardless of the order in which
equally frequent nodes are combined.
Alternative Scenario:
Suppose we had combined 'l' and 'm' first instead:
The Huffman tree structure would be slightly different, but the overall encoding
efficiency would remain the same.
The specific Huffman codes for each character might differ in their bit patterns,
but the total number of bits used to encode the message would be identical.
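The claim that the total bit count does not depend on tie-breaking can be checked empirically. The sketch below is an illustration under stated assumptions, not the authors' code: the function huffman_codes and its symbol_order parameter are hypothetical, and ties are broken by the order in which entries are inserted into the heap. Trying every insertion order of the five symbols exercises many different tie-breaking outcomes.

```python
import heapq
from collections import Counter
from itertools import count, permutations

def huffman_codes(text, symbol_order):
    """Build Huffman codes; equal frequencies are ordered by insertion time."""
    freq = Counter(text)
    tick = count()  # unique counter so tuples never compare the node itself
    heap = [(freq[s], next(tick), s) for s in symbol_order]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # lowest frequency
        f2, _, right = heapq.heappop(heap)   # second lowest
        heapq.heappush(heap, (f1 + f2, next(tick), (left, right)))

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf: a single character
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

text = "AmirAli"
lengths = set()
for order in permutations("Amirl"):
    codes = huffman_codes(text, order)
    lengths.add(sum(len(codes[ch]) for ch in text))

print(lengths)  # {16}
```

The individual bit patterns vary from one order to another, but every run encodes "AmirAli" in 16 bits.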
Additional Note
• If you have specific criteria or preferences (e.g., always selecting the
earliest character alphabetically), you can apply that consistently
throughout the algorithm.
• However, the standard Huffman algorithm does not mandate any particular
tie-breaking rule, focusing instead on minimizing the total encoded length.
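One way to apply such a rule consistently is to store (frequency, character) pairs in the heap, so that equal frequencies are ordered by character. The sketch below shows only the first selection step and is an assumption-laden illustration: characters compare by code point, so uppercase letters sort before lowercase ones, and merged nodes would need a label of their own (for example, the smallest character in their subtree), which is not shown here.

```python
import heapq
from collections import Counter

freq = Counter("AmirAli")

# (frequency, character) tuples: ties on frequency fall through to the character.
heap = [(f, ch) for ch, f in freq.items()]
heapq.heapify(heap)

first = heapq.heappop(heap)
second = heapq.heappop(heap)
print(first, second)  # (1, 'l') (1, 'm')
```

With this rule the first merge combines 'l' and 'm', which is exactly the alternative scenario described above.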