Algorithm Analysis and Implementation of Huffman Coding for Grayscale Image Compression Using Python
College of Computer Studies (CCS), Laguna State Polytechnic University, Los Baños,
Laguna, Philippines
Keywords: Huffman Coding, Algorithm Analysis, Data Compression, Python, Greedy Algorithm, Binary Tree, Pixel Frequency,
Encoding Efficiency, Image Processing, Lossless Compression, Grayscale Image Compression, Memory Utilization, GUI
Implementation, Tree Traversal, Information Theory
Abstract - This study explores Huffman Coding, a lossless data compression algorithm that efficiently minimizes the size of
textual data by assigning variable-length codes based on character frequency. Implemented using Python, the project constructs a
Huffman Tree and generates unique binary codes for each character in a given input string. Through the use of heap-based
priority queues and tree traversal algorithms, the implementation demonstrates how frequent characters receive shorter codes,
thus reducing overall data size.
The study reports a compression rate of approximately 39% for a sample input, affirming the algorithm's effectiveness in real-
world scenarios. Applications are discussed in the context of artificial intelligence and image processing. Supplementary references
from academic institutions and GitHub repositories underscore the practical and educational significance of the algorithm.
Recommendations include exploring adaptive variants of Huffman Coding and expanding its use in multimedia and mobile
applications.
1. INTRODUCTION

In an era where data is central to decision-making, communication, and computing, optimizing the storage and transfer of digital information is more crucial than ever. Data compression algorithms play a vital role in minimizing resource usage while preserving data integrity. Huffman Coding, developed by David A. Huffman in 1952 [1], remains one of the most effective and widely used lossless data compression techniques [4].

Huffman Coding is based on the principle of assigning shorter binary codes to more frequent characters and longer codes to less frequent ones. The algorithm constructs a binary tree called the Huffman Tree, ensuring that no code is a prefix of another (prefix code). This guarantees the unambiguous decoding of compressed data.

The purpose of this study is to implement Huffman Coding in Python, analyze its algorithmic efficiency, and explore its real-world applicability in fields such as image processing, recognition, and artificial intelligence. The study also evaluates the algorithm's performance using Google Colaboratory and aims to bridge theoretical concepts with hands-on simulation [24].

1.1 STATEMENT OF THE PROBLEM

With the continuous growth of data usage in systems ranging from social media to sensor networks, efficient data handling is crucial. Many conventional storage techniques waste memory or bandwidth by using uniform code lengths. This leads to the central question of the study:

How does Huffman Coding perform in terms of computational efficiency and space reduction when implemented in Python on textual data?

1.2 OBJECTIVE OF THE STUDY
● To implement the Huffman Coding algorithm using Python with manual tree construction and frequency analysis.
● To measure and analyze the algorithm's performance in terms of time complexity and space efficiency.
● To demonstrate the encoding and decoding processes through actual code execution.
● To relate the theoretical efficiency of Huffman Coding to its practical performance and visualize its behavior.

1.3 SCOPE AND DELIMITATION

The study is limited to static Huffman Coding applied to simple character-based data sets (e.g., grayscale images). Adaptive Huffman Coding, arithmetic encoding, and compression of other multimedia data such as color images and audio are beyond the scope. Python was chosen as the implementation language due to its readability and accessibility. Performance testing is limited to small-to-medium data samples.

2. REVIEW OF RELATED LITERATURE

The University of Illinois Urbana-Champaign provides course material focusing on entropy and its relationship to optimal encoding strategies. This adds theoretical grounding to the algorithm's efficiency [10]. Louisiana State University supplements this with practical demonstrations and course resources for Huffman Tree construction and implementation using different programming paradigms [11].

GitHub repositories such as "TheAlgorithms/Python" showcase various implementations of Huffman Coding, encouraging community collaboration and peer review. These implementations are frequently used for benchmarking and educational purposes [12].

Additional studies and textbooks (e.g., Cormen et al., Introduction to Algorithms) discuss Huffman Coding in the context of greedy algorithm design, reinforcing its significance in the broader field of computer science. Journals from the ACM and IEEE Digital Library also feature comparative analyses of Huffman Coding with other data compression techniques, providing insights into its strengths and limitations [21][22].
Image compression using Huffman coding extends the text compression principles to handle two-dimensional pixel data. For grayscale images, pixel intensity values (0-255) replace characters as the symbols to be encoded. The compression begins by analyzing the frequency distribution of these intensity values across the entire image.

The construction of the Huffman tree follows the same process as text compression, but with pixel intensity values as nodes rather than characters. After building the frequency-weighted binary tree, a mapping table is created that assigns shorter bit sequences to more common intensity values.

During encoding, each pixel is replaced with its corresponding bit sequence. The encoded image consists of two components: the Huffman table (necessary for reconstruction) and the compressed pixel data. Additional metadata such as image dimensions must also be preserved to enable correct decompression.

Decompression reverses this process, using the Huffman table to rebuild the tree, then traversing it according to the encoded bits to reconstruct each pixel value. The process continues until all pixels are recovered, at which point they are rearranged into the original two-dimensional structure.

For color images, compression can be applied to each color channel (RGB) separately, or the pixel values can be processed as tuples. When processing images, considerations such as maintaining spatial relationships and handling large-scale frequency distributions become particularly important for achieving optimal compression ratios while preserving image quality.

3. METHODOLOGY

This paper outlines the systematic approach employed to implement and analyze the Huffman coding algorithm using Python.

● Programming Language: Python 3.10+
● IDE/Text Editor: IDLE (Python 3.12, 64-bit)
● Libraries: heapq, collections.Counter, Tkinter
● Operating System: Windows
● Testing Environment: IDLE (Python 3.12, 64-bit)

The implementation leverages Python's standard libraries to efficiently handle data structures critical to Huffman coding. The heapq module provides priority queue functionality essential for tree construction, while collections.Counter simplifies frequency analysis. For the graphical user interface components, Tkinter was selected for its cross-platform compatibility and straightforward integration with Python.

3.3 ALGORITHM IMPLEMENTATION

Huffman Coding starts by calculating the frequency of each symbol (pixel value) in the input. These frequencies are used to construct a priority queue. The two lowest-frequency nodes are repeatedly merged to form a binary tree. Each symbol receives a unique binary code based on its path from the root to its leaf node (left = 0, right = 1). The encoded output is stored as a compressed binary string. Decompression is done by traversing the tree according to each bit in the encoded string.

Huffman Coding for image compression follows a systematic process that begins with frequency analysis of pixel values and culminates in a compressed binary representation. Our Python implementation leverages several key data structures and algorithms to achieve efficient lossless compression specifically for grayscale images. The Python implementation follows these key steps:

1. Pixel Frequency Analysis: Using Counter to calculate pixel intensity frequencies across the image.
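As a minimal sketch of this first step (not the study's exact code), assuming the grayscale image has already been flattened into a list of 0-255 intensity values, the frequency table can be built directly with collections.Counter:

    from collections import Counter

    # Flat sequence of grayscale intensities (0-255); a tiny illustrative sample.
    pixels = [12, 12, 12, 200, 200, 255]

    # Frequency of each intensity value across the image.
    freq = Counter(pixels)   # Counter({12: 3, 200: 2, 255: 1})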
    # Excerpt from the recursive code-assignment step (the right child extends the prefix with "1"):
    generate_codes(node.right, prefix + "1")
    import tracemalloc
    import time

    tracemalloc.start()                    # begin tracking memory allocations
    compress_start = time.time()           # timestamp before compression
    # ... compression of the image runs here ...
    compress_end = time.time()             # timestamp after compression
    compress_time = compress_end - compress_start
    current, peak_memory = tracemalloc.get_traced_memory()   # current and peak usage in bytes
    tracemalloc.stop()
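The fragments above (the recursive call and the tracemalloc/time instrumentation) come from a longer listing. The following self-contained sketch, an illustration rather than the exact code used in this study, shows how the described steps fit together: Counter for frequencies, heapq for repeatedly merging the two lowest-frequency nodes, recursive code assignment with left = 0 and right = 1, and encoding of the pixel stream into a bit string. The Node class and helper names are hypothetical.

    import heapq
    from collections import Counter

    class Node:
        # Huffman tree node; symbol is a pixel intensity for leaves, None for internal nodes.
        def __init__(self, symbol, freq, left=None, right=None):
            self.symbol = symbol
            self.freq = freq
            self.left = left
            self.right = right

        def __lt__(self, other):
            # Lets heapq order nodes by frequency.
            return self.freq < other.freq

    def build_tree(pixels):
        # One leaf per distinct intensity value, weighted by its frequency.
        heap = [Node(sym, f) for sym, f in Counter(pixels).items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            lo = heapq.heappop(heap)   # two lowest-frequency nodes ...
            hi = heapq.heappop(heap)
            heapq.heappush(heap, Node(None, lo.freq + hi.freq, lo, hi))   # ... merged bottom-up
        return heap[0]

    def generate_codes(node, prefix="", codes=None):
        if codes is None:
            codes = {}
        if node.symbol is not None:        # leaf: the accumulated path is this symbol's code
            codes[node.symbol] = prefix or "0"
        else:
            generate_codes(node.left, prefix + "0", codes)
            generate_codes(node.right, prefix + "1", codes)
        return codes

    pixels = [12, 12, 12, 200, 200, 255]          # toy grayscale data
    root = build_tree(pixels)
    codes = generate_codes(root)
    encoded = "".join(codes[p] for p in pixels)   # compressed bit string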
Table 1. Sample Image Metadata

Image       Format   Colorspace   Colors
Sample 1    PNG      sRGB         225
Sample 2    PNG      sRGB         554
Sample 3    PNG      sRGB         130
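The metadata reported in Table 1 can be gathered programmatically. A brief sketch, assuming Pillow [15] is available and the test file is named sample1.png (a hypothetical file name), reads the format, mode, and number of distinct colors:

    from PIL import Image

    img = Image.open("sample1.png")    # hypothetical file name for Sample Image 1
    # getcolors() returns (count, value) pairs; raise maxcolors so large images are not truncated.
    distinct = img.getcolors(maxcolors=img.width * img.height)
    print("Format:", img.format)       # e.g. PNG
    print("Mode:", img.mode)           # e.g. L for 8-bit grayscale
    print("Distinct colors:", len(distinct))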
Figure 1:
These visualizations serve both analytical and educational purposes, making the abstract concept of Huffman coding more accessible and highlighting the relationship between character frequency and code length optimization.

The process flow diagram in Figure 2 illustrates the systematic workflow of the Huffman coding implementation in this study. It depicts the step-by-step sequence beginning with pixel frequency analysis of grayscale images, followed by node creation based on these frequencies. The diagram then shows the tree construction phase using a priority queue (heap) structure, where nodes are merged iteratively bottom-up according to frequency values. This leads to the code assignment stage, where binary codes are recursively generated by traversing
the Huffman tree. The flow continues with the image
encoding process, where pixel data is converted to
compressed bit strings, and concludes with the decoding
and image reconstruction phases. Throughout the
workflow, performance monitoring tracks compression
metrics including time efficiency and memory usage. This
hierarchical visualization effectively captures the entire
compression and decompression pipeline implemented in
the Python-based system.
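The decoding stage described here can be sketched as a simple walk over the rebuilt tree, restarting from the root each time a leaf is reached. This is an illustrative sketch that assumes a root node with left, right, and symbol attributes (as in the earlier hypothetical Node class), not the study's exact code:

    def decode(encoded_bits, root, num_pixels):
        # Walk the Huffman tree bit by bit; each leaf yields one reconstructed pixel.
        pixels = []
        node = root
        for bit in encoded_bits:
            node = node.left if bit == "0" else node.right
            if node.symbol is not None:       # reached a leaf
                pixels.append(node.symbol)
                node = root                   # restart at the root for the next pixel
                if len(pixels) == num_pixels:
                    break
        return pixels   # flat list; reshape to (height, width) to rebuild the image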
Image 2    100%    None    100%    High (≥ 0.99)
Image 3    100%    None    100%    High (≥ 0.99)
Image 4    100%    None    100%    High (≥ 0.99)
Image 5    100%    None    100%    High (≥ 0.99)

● Memory Usage:
Peak memory usage during compression scaled with pixel diversity, ranging from 10,621 KB to 15,402 KB, while decompression memory remained consistent at approximately 8,251 KB across all images.

● Application Suitability:
The algorithm is particularly effective for images and datasets with skewed symbol frequencies, making it well-suited for domains such as document archiving, medical imaging (e.g., X-rays), and embedded systems where space efficiency and deterministic decoding are essential.

● Limitations:
As expected, compression efficiency is diminished with uniform or high-entropy data, where the frequency distribution of symbols (pixels) is evenly spread. In such cases, the Huffman overhead can reduce or nullify compression benefits.
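This limitation can be illustrated numerically: for a perfectly uniform distribution over all 256 intensity values the Shannon entropy is 8 bits per pixel, so no prefix code can beat the original 8-bit representation. A small sketch (reusing the hypothetical codes mapping from the earlier sketch) compares the entropy of the pixel distribution with the average Huffman code length:

    import math
    from collections import Counter

    def entropy_and_avg_code_length(pixels, codes):
        # Shannon entropy of the pixel distribution, in bits per pixel.
        freq = Counter(pixels)
        total = len(pixels)
        entropy = -sum((f / total) * math.log2(f / total) for f in freq.values())
        # Average length of the assigned Huffman codes, also in bits per pixel.
        avg_len = sum(len(codes[p]) for p in pixels) / total
        return entropy, avg_len   # avg_len >= entropy; both approach 8.0 for uniform data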
REFERENCES
[1] Huffman, D. A. (1952). A Method for the Construction of Minimum-Redundancy Codes. Proceedings of the IRE, 40(9), 1098–1101.
[2] Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). Introduction to Algorithms (3rd ed.). MIT Press.
[3] Cover, T. M., & Thomas, J. A. (2006). Elements of Information Theory (2nd ed.). Wiley-Interscience.
[4] Salomon, D. (2007). Data Compression: The Complete Reference (4th ed.). Springer.
[5] Sayood, K. (2017). Introduction to Data Compression (5th ed.). Morgan Kaufmann.
[6] GeeksforGeeks. (2023). Huffman Coding – Greedy Algorithm. Retrieved from https://www.geeksforgeeks.org/huffman-coding-greedy-algo-3/
[7] FavTutor. (2023). Huffman Coding in Python. Retrieved from https://favtutor.com/blogs/huffman-coding-python
[8] Downey, A. B. (2015). Think Python: How to Think Like a Computer Scientist (2nd ed.). O'Reilly Media.
[9] Sioson, M. (2024). Entropy and the Efficiency of Huffman Encoding. Medium. https://medium.com/@msioson/huffman-coding-efficiency
[10] University of Illinois at Urbana-Champaign. (2022). CS 498: Data Compression Lecture Notes.
[11] Louisiana State University. (2023). Huffman Trees and Compression Lecture Slides. Retrieved from cs.lsu.edu and https://www.sciencedirect.com/science/article/pii/S0893608014002135
[12] TheAlgorithms/Python GitHub. (n.d.). Huffman Coding Implementation. Retrieved from https://github.com/TheAlgorithms/Python
[13] Python Software Foundation. (2024). Python heapq Module Documentation. https://docs.python.org/3/library/heapq.html
[14] Python Software Foundation. (2024). collections — Container datatypes. https://docs.python.org/3/library/collections.html
[15] Python Imaging Library (Pillow). (2024). Pillow Documentation. https://pillow.readthedocs.io/
[16] Tkinter GUI Documentation. (2024). Python Interface to Tcl/Tk. https://docs.python.org/3/library/tkinter.html
[17] Witten, I. H., Neal, R. M., & Cleary, J. G. (1987). Arithmetic Coding for Data Compression. Communications of the ACM, 30(6), 520–540.
[18] Ziv, J., & Lempel, A. (1977). A Universal Algorithm for Sequential Data Compression. IEEE Transactions on Information Theory, 23(3), 337–343.
[19] Chowdhury, M. Z. R., & Hossain, M. S. (2018). Performance Analysis of Static Huffman Coding on Image Compression. International Journal of Computer Applications, 181(32), 24–29.
[20] Rao, P. R. (2020). Greedy Algorithms and Huffman Coding Applications. Journal of Computer Science and Applications, 12(1), 10–18.
[21] Nagla, B., & Mehta, R. (2021). Comparison of Huffman, Arithmetic, and LZW Compression Techniques. ACM Computing Surveys.
[22] IEEE Xplore. (2022). Comparative Analysis of Lossless Compression Algorithms on Grayscale Images. https://ieeexplore.ieee.org
[23] Kumar, A., & Rani, P. (2020). Lossless Image Compression Techniques and Their Applications. Journal of Image Processing & Pattern Recognition, 6(2), 101–112.
[24] Gonzalez, R. C., & Woods, R. E. (2018). Digital Image Processing (4th ed.). Pearson.
[25] Goyal, S., & Arora, A. (2019). Tree-Based Encoding Algorithms in Data Compression. International Journal of Advanced Computer Science, 10(5), 57–65.
[26] Sorin, D. B. (2006). PHP: The best web-programming language. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, (27), 113–121. Retrieved from https://cyberleninka.ru/article/n/php-the-best-web-programming-language
[27] Chen, Y., Li, H., & Wang, X. (2021). A survey on document intelligence: Techniques, applications, and challenges. Journal of Artificial Intelligence Research, 70, 1–36.
[28] Vaswani, A., Shazeer, N., Parmar, N., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 5998–6008.
[29] Google Research. (2024). Introducing Gemini 2: The next generation multimodal AI. Retrieved from https://ai.googleblog.com
[30] Khandelwal, R., Bhatt, A., & Singhal, R. (2022). Cloud-based machine learning pipelines for document automation. International Journal of Computer Applications, 184(8), 45–52.
[31] Hamad, K., & Kaya, M. (2016). A Detailed Analysis of Optical Character Recognition Technology. International Journal of Applied Mathematics Electronics and Computers, (Special Issue-1), 244–249. https://doi.org/10.18100/ijamec.270374
[32] Khan, S., Ullah, A., Ullah, H., & Ullah, W. (2021). An Overview and Applications of Optical Character Recognition. Academia.edu. https://www.academia.edu/43901444/An_Overview_and_Applications_of_Optical_Character_Recognition
[33] Penn State University Libraries. (n.d.). Optical Character Recognition (OCR): An Introduction. Retrieved May 2, 2025, from https://guides.libraries.psu.edu/OCR
[34] ProQuest. (2018). A review on optical character recognition system. Journal of Advanced Research in Dynamical and Control Systems, 10(10 Special Issue), 1805–1809. https://www.proquest.com/scholarly-journals/review-optical-character-recognition-system/docview/2108150311/se-2
[35] ResearchGate. (n.d.). Optical Character Recognition. Retrieved May 2, 2025, from https://www.researchgate.net/publication/360620085_OPTICAL_CHARACTER_RECOGNITION
[36] Soeno, S. (2024). Development of novel optical character recognition system to record vital signs and prescriptions: An intra-subject experimental study. PLOS ONE, 19(1), e0296714. https://doi.org/10.1371/journal.pone.0296714
[37] Wang, Y., Wang, Z., & Wang, H. (2024). Filling Method Based on OCR and Text Similarity. Applied Sciences, 14(3), 1034. https://doi.org/10.3390/app14031034
[38] Shilton, J., Kumar, R., & Dey, S. (2021). AI-based Document Automation in Enterprise Systems. International Journal of Document Analysis and Recognition, 24(1), 33–47. https://doi.org/10.1007/s10032-021-00365-5
[39] Zhang, L., Chen, Y., & Zhao, H. (2023). Sequence-Aware Transformer Models for OCR Post-Processing. Pattern Recognition Letters, 169, 112–120. https://doi.org/10.1016/j.patrec.2023.04.002
[40] Xu, M., & Lin, Z. (2023). Layout-Aware Multimodal Transformers for Document Understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence. https://doi.org/10.1109/TPAMI.2023.3257841