Huffman Encoding and Data Compression using MATLAB
By Ali Saad & Jaafar Amer

Introduction to Huffman Encoding
Lossless Compression
Reduces file size without losing any information.
Invented in 1952
Developed by David A. Huffman, a pioneering computer scientist.
Common Applications
Used in JPEG, MP3, ZIP, and more file formats.
Variable-length Encoding
Assigns shorter codes to more frequent symbols and longer codes to less frequent ones.
Tree-based Algorithm
Uses a binary tree structure to efficiently determine prefix codes.
Improves Efficiency
Minimizes the total number of bits needed to represent data.
Prefix Property
No code is a prefix of another, ensuring unambiguous decoding.
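
The prefix property can be checked mechanically. The sketch below uses a hypothetical code table (0, 10, 110, 111), chosen only for illustration, and tests whether any codeword is the beginning of another one.

```matlab
% Hypothetical code table for illustration (not taken from the slides).
codes = {'0', '10', '110', '111'};

isPrefixFree = true;
for i = 1:numel(codes)
    for j = 1:numel(codes)
        % startsWith(str, pattern) is true when str begins with pattern
        if i ~= j && startsWith(codes{j}, codes{i})
            isPrefixFree = false;
        end
    end
end

if isPrefixFree
    disp('No codeword is a prefix of another.');
else
    disp('At least one codeword is a prefix of another.');
end
```

Replacing '10' with '01' in this table would make the check fail, because '0' is then a prefix of '01' and a decoder could not tell where the first codeword ends.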

The Goal: Variable-Length Encoding
Short Codes for Frequent Symbols
Minimizes the total encoded data length efficiently.
Long Codes for Rare Symbols
Preserves information while optimizing size.
Example: Morse Code
Common letters use shorter signals for quick transmission.
Reduces Average Code Length
By assigning codes based on symbol frequency, it minimizes the expected bits per symbol.
Ensures Unambiguous Decoding
Uses prefix codes where no code is a prefix of another, avoiding confusion during decoding.
Efficient Storage and Transmission
Variable-length encoding optimizes both memory usage and bandwidth.
Foundation for Lossless Compression
Forms the basis of many compression algorithms by effectively reducing redundancy.
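
To make "expected bits per symbol" concrete, the following sketch compares a variable-length code with a fixed-length one. The probabilities are the ones used later in this deck; the code lengths (1, 2, 3, 3) are an assumption, namely the lengths of a prefix code such as 0, 10, 110, 111.

```matlab
p   = [0.4 0.3 0.2 0.1];   % symbol probabilities for a, b, c, d
len = [1 2 3 3];           % assumed codeword lengths (e.g. 0, 10, 110, 111)

avgLen      = sum(p .* len);            % expected bits per symbol
fixedLen    = ceil(log2(numel(p)));     % bits per symbol with a fixed-length code
entropyBits = -sum(p .* log2(p));       % Shannon entropy: lower bound on avgLen

fprintf('Variable-length code: %.2f bits/symbol\n', avgLen);
fprintf('Fixed-length code:    %d bits/symbol\n', fixedLen);
fprintf('Entropy lower bound:  %.3f bits/symbol\n', entropyBits);
```

Under these assumptions the variable-length code averages slightly fewer bits per symbol than the 2-bit fixed-length code, and it stays above the entropy bound, as any lossless symbol code must.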

Key Steps in Huffman Encoding
1. Calculate Symbol Frequencies
Count the occurrences of each symbol to estimate its probability.
2. Build the Huffman Tree
Construct a binary tree by repeatedly merging the two least frequent nodes until a single root remains.
3. Generate Encoding
Assign each symbol the binary code spelled out by the path from the root to its leaf.
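
The three steps above can be sketched in plain MATLAB without any toolbox. This is a minimal illustration, not the implementation behind huffmandict: the message string, the variable names, and the choice to label the lighter branch with 0 are all assumptions, so the exact bit patterns may differ from those produced by other tools even though the code lengths agree.

```matlab
msg = 'abacadb';                        % illustrative message

% Step 1: count symbol frequencies
[symbols, ~, idx] = unique(msg);        % distinct symbols, sorted
counts = accumarray(idx(:), 1)';        % occurrences of each symbol

% Step 2: build the tree by repeatedly merging the two lightest nodes.
% Each node is stored as the list of symbol indices underneath it.
nodes   = num2cell(1:numel(symbols));
weights = counts;
codes   = repmat({''}, 1, numel(symbols));

while numel(nodes) > 1
    [~, order] = sort(weights);
    i = order(1);
    j = order(2);                       % the two least frequent nodes

    % Step 3: prepend one bit to every symbol under each merged branch
    for s = nodes{i}, codes{s} = ['0' codes{s}]; end
    for s = nodes{j}, codes{s} = ['1' codes{s}]; end

    merged       = [nodes{i} nodes{j}];
    mergedWeight = weights(i) + weights(j);
    nodes([i j])   = [];
    weights([i j]) = [];
    nodes{end+1}   = merged;
    weights(end+1) = mergedWeight;
end

for k = 1:numel(symbols)
    fprintf('%c -> %s\n', symbols(k), codes{k});
end
totalBits = sum(counts .* cellfun(@numel, codes));
fprintf('Encoded length: %d bits\n', totalBits);
```

Prepending a bit at every merge builds each codeword from the leaf upward, which avoids storing an explicit tree structure for this small example.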

Huffman Encoding with MATLAB
We implemented the Huffman encoding algorithm on a short message using functions from MATLAB's Communications Toolbox:
Defined the set of symbols as a cell array: {'a', 'b', 'c', 'd'}
Assigned each symbol a probability: [0.4, 0.3, 0.2, 0.1]
Used huffmandict to generate the Huffman dictionary
Created a message as a cell array: {'a', 'b', 'a', 'c', 'a', 'd', 'b'}
Used huffmanenco to encode the message using the dictionary
This approach ensures lossless compression and, for the given symbol probabilities, uses the minimum average number of bits per symbol achievable with a prefix code.
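
A script along the lines described above might look like the following. It assumes the Communications Toolbox is installed; the variable names are illustrative, and the decoding step with huffmandeco is an addition here, included only to confirm that the round trip is lossless.

```matlab
% Requires the Communications Toolbox (huffmandict, huffmanenco, huffmandeco).
symbols = {'a', 'b', 'c', 'd'};                 % symbol set
prob    = [0.4, 0.3, 0.2, 0.1];                 % probability of each symbol

% dict is a cell array pairing each symbol with its codeword;
% avglen is the average codeword length in bits per symbol.
[dict, avglen] = huffmandict(symbols, prob);

msg     = {'a', 'b', 'a', 'c', 'a', 'd', 'b'};  % message to encode
encoded = huffmanenco(msg, dict);               % vector of 0s and 1s
decoded = huffmandeco(encoded, dict);           % added check: decode again

fprintf('Average code length: %.2f bits/symbol\n', avglen);
fprintf('Encoded length: %d bits\n', numel(encoded));
fprintf('Round trip lossless: %d\n', isequal(decoded(:), msg(:)));
```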

Compression Results
Original Size: 56 bits
The full message size (7 symbols at 8 bits each).
Compressed Size: 13 bits
Smaller size after encoding.
Compression Ratio: 0.23
Compressed size divided by the full size.
Efficiency: 76.79%
Share of the original size saved (1 minus the compression ratio), demonstrating effective data reduction.
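
The two ratios can be recomputed from the reported sizes. In this sketch the 8 bits per symbol is an assumption inferred from the 56-bit original size of the 7-symbol message, and the variable names are illustrative.

```matlab
numSymbols     = 7;    % symbols in the message a b a c a d b
bitsPerSymbol  = 8;    % assumption: one 8-bit character per symbol
originalBits   = numSymbols * bitsPerSymbol;
compressedBits = 13;   % encoded length reported above

ratio   = compressedBits / originalBits;   % compressed size / original size
savings = (1 - ratio) * 100;               % percentage of the original size saved

fprintf('Original size:     %d bits\n', originalBits);
fprintf('Compression ratio: %.2f\n', ratio);
fprintf('Space savings:     %.2f%%\n', savings);
```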