0% found this document useful (0 votes)
7 views

Algorithm Analysis and Implementation of Huffman Coding for Grayscale Image Compression Using Python

This study presents the implementation of Huffman Coding for grayscale image compression using Python, demonstrating its effectiveness with a reported compression rate of approximately 39%. The research analyzes the algorithm's efficiency in terms of time complexity and space utilization, while also exploring its applications in artificial intelligence and image processing. Recommendations for future work include investigating adaptive variants and expanding the algorithm's use in multimedia applications.

Uploaded by

vince tamis
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
7 views

Algorithm Analysis and Implementation of Huffman Coding for Grayscale Image Compression Using Python

This study presents the implementation of Huffman Coding for grayscale image compression using Python, demonstrating its effectiveness with a reported compression rate of approximately 39%. The research analyzes the algorithm's efficiency in terms of time complexity and space utilization, while also exploring its applications in artificial intelligence and image processing. Recommendations for future work include investigating adaptive variants and expanding the algorithm's use in multimedia applications.

Uploaded by

vince tamis
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 18

Republic of the Philippines

Laguna State Polytechnic University


Province of Laguna

Algorithm Analysis and Implementation of Huffman


Coding for Grayscale Image Compression Using Python
Charles Amiel A. Malabanan, Lyx Lamuel B. Dilla, Vince Daniel P. Tamis

College of Computer Studies (CCS), Laguna State Polytechnic University, Los Baños,
Laguna, Philippines

Keyword: Huffman Coding, Algorithm Analysis, Data Compression, Python, Greedy Algorithm, Binary Tree, Pixel Frequency,
Encoding Efficiency, Image Processing, Lossless Compression, Grayscale Image Compression, Memory Utilization, GUI
Implementation, Tree Traversal, Information Theory

Abstract - This study explores Huffman Coding, a lossless data compression algorithm that efficiently minimizes the size of
textual data by assigning variable-length codes based on character frequency. Implemented using Python, the project constructs a
Huffman Tree and generates unique binary codes for each character in a given input string. Through the use of heap-based
priority queues and tree traversal algorithms, the implementation demonstrates how frequent characters receive shorter codes,
thus reducing overall data size.

The study reports a compression rate of approximately 39% for a sample input, affirming the algorithm's effectiveness in real-
world scenarios. Applications are discussed in the context of artificial intelligence, image processing. Supplementary references
from academic institutions and GitHub repositories underscore the practical and educational significance of the algorithm.
Recommendations include exploring adaptive variants of Huffman Coding and expanding its use in multimedia and mobile
applications.
1. INTRODUCTION processing, recognition, and artificial intelligence. The
study also evaluates the algorithm's performance using
In an era where data is central to decision-making, Google Colaboratory and aims to bridge theoretical
communication, and computing, optimizing the storage and concepts with hands-on simulation [24].
transfer of digital information is more crucial than ever.
Data compression algorithms play a vital role in 1.1 STATEMENT OF THE PROBLEM
minimizing resource usage while preserving data integrity.
Huffman Coding, developed by David A. Huffman in 1952, With the continuous growth of data usage in systems
[1] remains one of the most effective and widely used ranging from social media to sensor networks, efficient
lossless data compression techniques[4]. data handling is crucial. Many conventional storage
techniques waste memory or bandwidth by using uniform
Huffman Coding is based on the principle of assigning code lengths. This leads to the central question of the study:
shorter binary codes to more frequent characters and longer
codes to less frequent ones. This algorithm constructs a How does Huffman Coding perform in terms of
binary tree called the Huffman Tree, ensuring that no code computational efficiency and space reduction when
is a prefix of another (prefix code). This guarantees the implemented in Python on textual data?
unambiguous decoding of compressed data.

The purpose of this study is to implement Huffman Coding 1.2 OBJECTIVE OF THE STUDY
in Python, analyze its algorithmic efficiency, and explore
its real-world applicability in fields such as image
Republic of the Philippines
Laguna State Polytechnic University
Province of Laguna
● To implement the Huffman Coding algorithm The University of Illinois Urbana-Champaign provides
using Python with manual tree construction and course material focusing on entropy and its relationship to
frequency analysis. optimal encoding strategies. This adds theoretical
● To measure and analyze the algorithm’s grounding to the algorithm's efficiency.[10] Louisiana State
performance in terms of time complexity and University supplements this with practical demonstrations
space efficiency. and course resources for Huffman Tree construction and
● To demonstrate the encoding and decoding implementation using different programming paradigms.
processes through actual code execution. [11]
● To relate the theoretical efficiency of Huffman
Coding to its practical performance and visualize GitHub repositories such as "TheAlgorithms/Python"
its behavior. showcase various implementations of Huffman Coding,
encouraging community collaboration and peer-review.
These implementations are frequently used for
1.3 SCOPE AND DELIMITATION benchmarking and educational purposes.[12]
The study is limited to static Huffman Coding applied to
simple character-based data sets (e.g., gray scale images). Additional studies and textbooks (e.g., Cormen et al.,
Introduction to Algorithms) discuss Huffman Coding in the
Adaptive Huffman Coding, arithmetic encoding, or
context of greedy algorithm design, reinforcing its
compression of multimedia data such as images and audio significance in the broader field of computer science.
are beyond the scope. Python was chosen as the Journals from the ACM and IEEE Digital Library also
implementation language due to its readability and feature comparative analyses of Huffman Coding with
accessibility. Performance testing is limited to small-to- other data compression techniques, providing insights into
medium data samples. its strengths and limitations.[21][22]

Downey (2022) provides an in-depth explanation of


Huffman coding from a practical Python implementation
2.0 REVIEW OF RELATED LITERATURE
perspective [8]. He clarifies three critical aspects of
Huffman codes: their nature as "codes" (mappings from
Huffman (1952) introduced an optimal method for binary symbols to bit strings), their property as "prefix codes"
prefix coding, based on the principle of minimizing (where no bit string is a prefix of another), and their
weighted path lengths within a binary tree. His method "optimality" (minimizing average bit length by assigning
assigns the shortest codes to the most frequent characters, shorter codes to frequent symbols).[8] His work
enabling significant space savings over fixed-length demonstrates the complete implementation process
encodings[5]. This algorithm is foundational in data including frequency analysis using Python's Counter, heap-
compression and is integrated into file formats like ZIP, based tree construction with heapq[13], and both encoding
JPEG, and MP3. [1] and decoding processes. This resource is particularly
valuable for understanding the data structures that make
GeeksforGeeks (2025) provides comprehensive tutorials Huffman coding efficient, specifically how binary trees and
that break down Huffman Coding as a greedy algorithm. heaps enable O(n log n) time complexity for the algorithm.
They describe how a priority queue (min-heap) is used to
iteratively combine the two least frequent nodes, Overall, the literature supports Huffman Coding as a well-
constructing the tree from the bottom up. Their established yet continually relevant algorithm. It remains
visualization of merging processes makes the tree essential in applications requiring fast, reliable, and space-
construction concept more accessible. [6] efficient data handling.

FavTutor (2023) demonstrates a clean Python 2.6 PYTHON IDE


implementation using the heapq module and object-
oriented programming. The guide details the step-by-step
Python offers various Integrated Development
creation of Huffman Trees and the recursive traversal used
Environments (IDEs) that facilitate the implementation of
to assign binary codes to characters. Their performance
algorithms like Huffman Coding. For this study, Visual
benchmarks validate the theoretical time complexity of O(n
Studio Code was selected due to its lightweight nature,
log n). [7]
extensive extension support, and integrated terminal
functionality. IDLE python 3.12 64 bit provides syntax
Medium (Sioson, 2024) contextualizes Huffman Coding highlighting, code completion, and debugging capabilities
within Shannon's entropy model, showing how the essential for efficient algorithm development.
algorithm approximates the theoretical limit of data
representation. The article relates this efficiency to real-
The implementation also benefits from Python's rich
world use in ZIP compression and other file formats, noting
ecosystem of libraries. Specifically, the collections [14]
its practical significance in everyday computing.[9]
module (for Counter), heapq (for priority queue
operations), and Tkinter (for GUI development) were
Republic of the Philippines
Laguna State Polytechnic University
Province of Laguna
utilized. Python's inherent readability and extensive assignment via recursive traversal.[6] The implementation
standard library make it particularly suitable for focuses specifically on processing pixel intensity values (0-
educational implementations of algorithms, allowing for 255) rather than characters, enabling efficient lossless
clear representation of concepts like tree structure and compression of image data while preserving the complete
recursive traversal[7][8]. visual information. The research examines how the
frequency distribution of pixel values in different image
types affects compression efficiency and performance.
2.8 LOGIC OF COMPRESSING AND
DECOMPRESSING OF IMAGE 3.2 TOOLS AND ENVIRONMENT

Image compression using Huffman coding extends the text ● Programming Language: Python 3.10+
compression principles to handle two-dimensional pixel ● IDE/Text Editor: IDLE python 3.12 64 bit
data. For grayscale images, pixel intensity values (0-255) ● Libraries: heapq, collections.Counter, Tkinter
replace characters as the symbols to be encoded. The ● Operating System: Windows
compression begins by analyzing the frequency distribution ● Testing Environment: IDLE python 3.12 64 bit
of these intensity values across the entire image.
The implementation leverages Python's standard libraries to
The construction of the Huffman tree follows the same efficiently handle data structures critical to Huffman
process as text compression, but with pixel intensity values coding. The heapq module provides priority queue
as nodes rather than characters. After building the functionality essential for tree construction, while
frequency-weighted binary tree, a mapping table is created collections.Counter simplifies frequency analysis. For the
that assigns shorter bit sequences to more common graphical user interface components, Tkinter was selected
intensity values. for its cross-platform compatibility and straightforward
integration with Python.
During encoding, each pixel is replaced with its
corresponding bit sequence. The encoded image consists of 3.3 ALGORITHM IMPLEMENTATION
two components: the Huffman table (necessary for
reconstruction) and the compressed pixel data. Additional
metadata such as image dimensions must also be preserved Huffman Coding starts by calculating the frequency of each
to enable correct decompression. symbol (pixel value) in the input. These frequencies are
used to construct a priority queue. The two lowest-
frequency nodes are repeatedly merged to form a binary
Decompression reverses this process, using the Huffman tree. Each symbol receives a unique binary code based on
table to rebuild the tree, then traversing it according to the its path from the root to its leaf node (left = 0, right = 1).
encoded bits to reconstruct each pixel value. The process The encoded output is stored as a compressed binary string.
continues until all pixels are recovered, at which point they Decompression is done by traversing the tree according to
are rearranged into the original two-dimensional structure. each bit in the encoded string.
For color images, compression can be applied to each color Huffman Coding for image compression follows a
channel (RGB) separately, or the pixel values can be systematic process that begins with frequency analysis of
processed as tuples. When processing images, pixel values and culminates in a compressed binary
considerations like maintaining spatial relationships and representation. Our Python implementation leverages
handling large-scale frequency distributions become several key data structures and algorithms to achieve
particularly important for achieving optimal compression efficient lossless compression specifically for grayscale
ratios while preserving image quality. images. The process follows these key steps:
3. METHODOLOGY The Python implementation follows these key steps:

This paper outlines the systematic approach employed to 1. Pixel Frequency Analysis: Using Counter to
implement and analyze the Huffman coding algorithm calculate pixel intensity frequencies across the
using Python. image

from collections import Counter


3.1 RESEARCH DESIGN
frequency = Counter(pixels)
This study uses a descriptive and implementation-based
approach. The Huffman Coding algorithm is manually 2. Node Creation: Defining a Node class to
implemented in Python, and its structure is analyzed in represent tree elements
terms of its steps: pixel frequency analysis in grayscale
images, tree construction using a priority queue, and code
Republic of the Philippines
Laguna State Polytechnic University
Province of Laguna
class Node: def generate_codes(node, prefix='',
codebook={}, output=None):
def __init__(self, value, freq):
if node: if node.value is not
self.value = value # Pixel
None: codebook[node.value] = prefix
intensity value (0-255 for grayscale)
if output:
self.freq = freq # Frequency
output.write(f"Assigned code to pixel
of this pixel value in the image
{node.value}: {prefix}\n")
self.left = None
generate_codes(node.left, prefix
self.right = None + '0', codebook, output)

generate_codes(node.right, prefix

def __lt__(self, other): + '1', codebook, output) return codebook

return self.freq < other.freq codes =

# For priority queue comparison generate_codes(huffman_tree)

5. Image Encoding: Converting pixel data to a


compressed bit string
heap = [Node(value, freq) for value,
freq in frequency.items()] encoded_data = ''.join(codes[p] for p in
pixels)

6. Image Decoding: Reconstructing the original


3. Tree Construction: Using heapq to build the pixel values from encoded data
Huffman tree from the bottom up
def decode_data(encoded_str, root,
import heapq
output, show_logs=True, max_logs=500):
heapq.heapify(heap)
decoded = []
while len(heap) > 1:
current = root
left = heapq.heappop(heap)
for i, bit in
right = heapq.heappop(heap) enumerate(encoded_str): current =
merged = Node(None, left.freq +
current.left if bit == '0' else
right.freq)
current.right
merged.left = left
merged.right = right if current.value is not None:
heapq.heappush(heap, merged)
decoded_pixels.append(current.val
ue)

huffman_tree = heap[0] if show_logs and i < max_logs:


output.write(f"Bit {i + 1}: Found pixel
4. Code Assignment: Recursively traversing the value {current.value}\n")
tree to generate bit codes for each pixel value
current = root
Republic of the Philippines
Laguna State Polytechnic University
Province of Laguna
return decoded During the decompression phase, a similar performance
tracking approach is used to measure time and memory
decoded_pixels = efficiency. The binary tree traversal algorithm efficiently
reconstructs the original pixel values without any data loss,
decode_data(encoded_data, huffman_tree,
validating the lossless nature of the compression.
output_stream)
Time complexity analysis confirms the expected O(n log n)
7. Image Reconstruction: Converting decoded performance for tree construction (where n is the number of
pixel values back to an image. unique pixel values) and O(m) for both encoding and
decoding (where m is the total number of pixels). Space
from PIL import Image complexity remains manageable at O(n) for the tree and
code lookup tables, and O(m) for the encoded bit string.
new_image = Image.new("L",
3.4 SAMPLE DATA
(width, height))
new_image.putdata(decoded_pixels) ● Image Data: Grayscale images uploaded by the
user using a Tkinter-based GUI.

8. Performance Monitoring: Tracking


compression time and memory usage

import tracemalloc

import time

tracemalloc.start()

compress_start = time.time()

compress_end = time.time()
compress_time = compress_end -
compress_start

current, peak_memory =
tracemalloc.get_traced_memory()
tracemalloc.stop()
Table 1. Sample Image 1 Metadata

The implementation features a graphical user interface


using Tkinter that allows users to select grayscale images Format Png
for compression, visualizes the compression and
decompression processes, and provides detailed logs of File Size 50,978 Bytes
each stage [16]. The system monitors and reports
compression statistics including original pixel count, Dimensions 1000 x 1000 (1 Megapixels)*
encoded bit length, unique pixel codes, compression time, Type GrayscaleAlpha
and memory usage.
Colorspace sRGB
For larger images (exceeding 10,000 pixels), the
implementation automatically adjusts its logging detail to Colors 130
prevent performance degradation, ensuring the application
remains responsive regardless of input size. The user can Gamma 2.2 (0.45455)
observe the entire process from pixel frequency analysis
through tree construction to final encoding and
reconstruction, making the implementation valuable for
both practical compression needs and educational purposes.
Republic of the Philippines
Laguna State Polytechnic University
Province of Laguna
Table 3. Sample Image 3 Metadata

Format Png

File Size 282,595 Bytes


Dimensions 1000 x 1000 (1 Megapixels)*
Type PalletAlpha

Colorspace sRGB

Colors 225

Gamma 2.2 (0.45455)

Table 2. Sample Image 2 Metadata

Format Png

File Size 54,669 Bytes


Dimensions 1000 x 1000 (1 Megapixels)*
Type PaletteAlpha

Colorspace sRGB

Colors 554

Gamma 2.2 2.2 (0.45455)

Table 4. Sample Image 4 Metadata

Format Png

File Size 370,802 Bytes

Dimensions 1000 x 1000 (1 Megapixels)*


Type GrayscaleAlpha

Colorspace sRGB

Colors 130

Gamma 2.2 (0.45455)


Republic of the Philippines
Laguna State Polytechnic University
Province of Laguna
3.5 VISUAL DOCUMENTATION

The merging and tree-building processes are best


represented through a diagram sourced from 3.6 Process Flow Diagram
GeeksforGeeks. This visual shows character frequencies,
node pairings, and the resulting binary tree, clarifying the Figure 1 illustrates the workflow followed in this study.
hierarchical structure used for encoding.

Figure 1:

This figure presents tree diagrams illustrating the


hierarchical relationships between nodes in the Huffman
Tree (Figure 1). It visually represents how characters or
pixel values are organized based on their frequencies, with
lower-frequency elements placed deeper in the tree and
higher-frequency elements closer to the root. This structure
enables efficient encoding by assigning shorter bit codes to
more frequent values.

A comprehensive visualization strategy was implemented


to illustrate the operation of the algorithm. This included
the following:

● Tree diagrams showing the hierarchical


relationships between nodes (Figure 1)

● Character-to-code mapping tables displaying the


variable-length bit assignments (Figures 5–6)

● Performance graphs plotting the compression


ratio against total file size Figure 2:

These visualizations serve both analytical and educational The process flow diagram in Figure 2 illustrates the
purposes, making the abstract concept of Huffman coding systematic workflow of the Huffman coding
more accessible and highlighting the relationship between implementation in this study. It depicts the step-by-step
character frequency and code length optimization.
sequence beginning with pixel frequency analysis of
grayscale images, followed by node creation based on these
frequencies. The diagram then shows the tree construction
phase using a priority queue (heap) structure, where nodes
are merged iteratively from bottom-up according to
frequency values. This leads to the code assignment stage,
where binary codes are recursively generated by traversing
Republic of the Philippines
Laguna State Polytechnic University
Province of Laguna
the Huffman tree. The flow continues with the image
encoding process, where pixel data is converted to
compressed bit strings, and concludes with the decoding
and image reconstruction phases. Throughout the
workflow, performance monitoring tracks compression
metrics including time efficiency and memory usage. This
hierarchical visualization effectively captures the entire
compression and decompression pipeline implemented in
the Python-based system.

3.7 PROGRAM DEMONSTRATION


Figure 5:

This figure shows logs from the decoding process,


confirming the lossless nature of the algorithm. The exact
original pixel values are reconstructed, thereby validating
the study’s performance claims.

Figure 3:

The image showcases the Tkinter-based [16] graphical user


interface designed in Python. It allows users to upload
grayscale images and view both compression and
decompression operations, supporting the objective of
making the algorithm interactive and educational.
Figure 6:

This figure shows the completed logs and binary sequence


from the decoding process. It confirms that the program
successfully compressed the image into a binary format,
displaying the first 2,000 binary digits and prompting the
user to decide whether to decode the data..

Figure 4:

The figure displays the pop-up window for selecting the


desired image to simulate the decoding and compression of
data, just before displaying the compression process in the
right-hand terminal.

Figure 7:

This figure illustrates the final stage of the program,


highlighting the decompression time and peak memory
usage during decompression. It also presents the
complexity analysis, including time complexity for
Republic of the Philippines
Laguna State Polytechnic University
Province of Laguna
building the Huffman tree, encoding and decoding Peek Memory 10,621.16 15,402.03 14,936.70 15,031.97
processes, as well as space requirements for the tree, code (Compression) KB KB KB KB
storage, and encoded strings. Peek Memory 8,251.49 8,251.49 8,251.49 8,251.49
Decompression KB KB KB KB

4. RESULT AND DISCUSSION


This table highlights the performance results of the
Huffman Coding algorithm when applied to four different
Huffman Coding remains one of the most efficient methods grayscale images, each containing 1,000,000 pixels. It
for lossless data compression. The algorithm operates by summarizes key metrics such as the number of encoded
constructing a binary tree where each character is bits, the number of unique pixel codes, compression and
represented as a leaf node, and more frequent characters are decompression times, as well as peak memory usage during
positioned closer to the root, resulting in shorter binary both processes. One of the most significant insights from
representations. The implementation process highlights the the table is the variation in compression efficiency
algorithm’s alignment with greedy strategies. At each step, depending on the image's pixel complexity. Image 1
the two least frequent nodes are merged, ensuring local achieved the highest compression efficiency, reducing the
optimality that leads to global efficiency. The priority total number of bits to just 2,335,433, which implies a
queue (min-heap) is crucial to achieving the expected time compression savings of approximately 76.65% from the
complexity of O(n log n). Beyond theoretical efficiency, uncompressed form. This image had the fewest unique
Huffman Coding is widely applicable. It is used in standard pixel codes (130), demonstrating that Huffman Coding
file compression utilities and serves as a teaching tool in performs exceptionally well when the data contains fewer
courses on algorithms and data structures. Additionally, the distinct symbols (in this case, grayscale pixel values).
algorithm's structure can be adapted for encoding symbols Fewer unique symbols result in a smaller Huffman tree and
in machine learning and AI applications where storage and more frequent symbol reuse, enabling more optimal bit-
bandwidth efficiency are important. This discussion length assignments. In contrast, Images 2, 3, and 4 showed
reinforces the value of understanding Huffman Coding not higher encoded bit counts—7,215,785, 6,774,766, and
just as a historical algorithm, but as a practical and 6,873,270 bits, respectively—corresponding to lower
pedagogical model that continues to offer insight into compression efficiency. These images had significantly
optimal data handling. more unique pixel codes (217 to 247), indicating higher
pixel variation. With more unique values, the Huffman tree
4.1 OVERVIEW OF MODEL PERFORMANCE becomes larger and more complex, resulting in longer bit
codes for less frequent pixels and reduced overall
The implemented Huffman Coding model effectively compression performance. From a time performance
compresses both user-input text and grayscale images. perspective, compression times ranged from 3.73 to 4.61
While text compression consistently reduced bit-length by seconds, while decompression times varied from 2.95 to
over 60%, image compression results varied depending on 5.36 seconds. The fastest compression occurred in Image 1,
the image's pixel complexity. Among the tested grayscale again correlating with fewer unique pixel values, which
images, the best compression reduced the bit-length by simplifies the tree construction and encoding process.
approximately 76.65%, while others achieved between Conversely, decompression was generally slower for
27.85% and 32.25% savings. These results demonstrate images with more pixel variation, due to the increased
that Huffman Coding performs more efficiently on images complexity in traversing the larger Huffman trees during
with fewer unique pixel values, leading to greater decoding.
compression ratios.
The peak memory usage during compression also scaled
Table 5. Huffman Coding Image Compression Results with image complexity, ranging from 10,621.16 KB (Image
1) to 15,402.03 KB (Image 2). However, memory usage
during decompression remained constant at 8,251.49 KB
across all images, indicating that the decompression
Metric Image1 Image 2 Image 3 Image 4 process had a fixed memory footprint regardless of the
Huffman tree size. Overall, this table reinforces the idea
Original Pixels 1,000,000 1,000,000 1,000,000 1,000,000 that Huffman Coding is highly effective for image
compression, especially when applied to data with low
Encoded Bits 2,335,433 7,215,785 6,774,766 6,873,270 entropy or fewer unique values. It also shows that
performance (both in time and memory) is closely linked to
Unique Pixel 130 217 233 247
the frequency distribution of the data, affecting how
Code
efficiently the algorithm can construct the encoding tree
Compression 3.7301 4.6070 3.7905 3.7749 and assign bit-lengths.
Time (seconds)
Decompression 2.9466 5.3593 5.2821 5.1347 4.2 PERFORMANCE ANALYSIS
Time (seconds)
Republic of the Philippines
Laguna State Polytechnic University
Province of Laguna
Image Compression: ● Build Huffman Tree: O(n log n)
Grayscale image compression results varied significantly
depending on the number of unique pixel values in each ● Encoding & Decoding: O(m), where n is the
image. Images with fewer unique pixel intensities, such as number of unique pixel values and m is the total
Image 1 with only 130 distinct pixel codes, achieved the number of pixels.
highest compression ratio, reducing the encoded size to
approximately 23.35% of the original—equivalent to a Memory Performance:
compression savings of around 76.65%. In contrast, Memory usage also reflected this pattern. Peak memory
images with greater pixel diversity, such as Images 2, 3, usage during compression ranged from 10,621.16 KB for
and 4 (with 217 to 247 unique codes), achieved more Image 1 to 15,402.03 KB for Image 2, as more unique pixel
modest compression ratios, ranging from 27.85% to values required a larger Huffman tree and additional code
32.25%. mappings. Interestingly, decompression memory usage
remained constant at 8,251.49 KB across all images. This
suggests that the memory footprint during
decompression is largely independent of image
complexity, likely due to consistent buffer allocation and
decoding logic once the Huffman tree is rebuilt.

Figure 8

Figure 8 illustrates the compression and decompression


times across the four test images alongside the
corresponding number of unique pixel codes. The graph Figure 9
highlights the scaling behavior of time metrics with
increasing data complexity, confirming the expected O(n Figure 9 illustrates the peak memory usage during
log n) time complexity characteristics. compression and decompression for each test image. The
graph confirms that while compression memory usage
These results support the theoretical expectation that varies with image complexity, decompression memory
Huffman Coding performs best on low-entropy data, remains stable, highlighting the algorithm’s predictable
where a limited set of values appears with high frequency. memory demands.
As the visual complexity of the image increases—reflected
by a higher number of unique pixel intensities—the entropy 4.3 DISCUSSION AND CONSISTENCY
also increases, resulting in longer codewords and thus,
reduced compression efficiency. The findings validate The Huffman Coding algorithm exhibited consistent and
that image compressibility using Huffman Coding is predictable performance across varying grayscale images.
strongly influenced by data redundancy and symbol Despite differences in pixel complexity, the algorithm
frequency distribution, which directly impact the size and consistently adhered to its theoretical time complexity of
efficiency of the Huffman tree structure. O(n log n) for building the Huffman tree and O(m) for
both encoding and decoding, where n is the number of
Time Performance:Compression time ranged from 3.73 to unique pixel values and m is the total number of pixels.
4.61 seconds, while decompression time ranged from 2.95 Compression efficiency aligned with entropy expectations
to 5.36 seconds. The shortest times were observed in [3]: images with fewer unique pixel values achieved higher
Image 1, which had the fewest unique pixel values and compression, while those with more diverse pixel
therefore a simpler Huffman tree to construct and traverse. distributions showed reduced efficiency. However, even
Images with more pixel variation took longer due to the with this variation, the algorithm maintained stability in
increased complexity of the encoding and decoding both compression and decompression operations.
processes. This trend demonstrates that compression and Compression ratios, while image-dependent, followed a
decompression time scale with the number of unique predictable pattern based on symbol distribution, ranging
symbols, consistent with the expected time complexities: from approximately 23% to 72% of the original size.
Republic of the Philippines
Laguna State Polytechnic University
Province of Laguna
In terms of memory performance, the algorithm showed a Gamma 2.2 2.2 (0.45455)
linear relationship with input size. Peak memory usage
during compression increased proportionally with the
number of unique pixel values, but decompression
memory usage remained constant at 8,251.49 KB,
regardless of image complexity. This consistency Table 8. Compression Accuracy Rate for Grayscale
highlights the algorithm's efficient memory handling, Images
particularly during the decoding stage. Time measurements
also reflected consistent scaling, with compression times
between 3.7 to 4.6 seconds and decompression times Image Image1 Image 2 Image 3 Image 4
from 2.9 to 5.3 seconds. Minor fluctuations (roughly 0.8x
to 1.2x) were observed depending on the symbol Original 1,000,000 1,000,000 1,000,000 1,000,000
distribution and tree complexity, but overall, the Pixels (m)
performance remained within a narrow and predictable
operational envelope. This empirical behavior confirms the Encoded Bits 2,335,433 7,215,785 6,774,766 6,873,270
robustness and reliability of Huffman Coding,
showcasing its practical applicability for both low and
moderately high-entropy datasets. Its deterministic Unique Pixel 130 217 233 247
structure, entropy-resilient encoding, and logarithmic Codes(n)
time scaling validate its continued use as an effective and Compression 23.35 72.16 67.75 68.73
resource-conscious lossless compression technique.[13] Ratio (%)
Compression 76.65 27.84 32.25 31.27
Savings (%)

Table 6. Time Complexity of Huffman Coding


4.4 ANALYSIS OF PREDICTION ERRORS

Format Png Several factors influenced the algorithm's compression


efficiency:
File Size 54,669 Bytes
Dimensions 1000 x 1000 (1 Megapixels)* Frequency Distribution Characteristics
Type PaletteAlpha
The efficiency of Huffman Coding was largely determined
Colorspace sRGB by the distribution of pixel intensity values. In Image 1,
where only 130 unique pixel codes were present, the
Colors 554 distribution was highly skewed—meaning a few pixel
values occurred far more frequently than others. This led to
Gamma 2.2 2.2 (0.45455) significantly better compression (76.65% savings), as fewer
bits were needed to represent the dominant intensities.

In contrast, Images 2, 3, and 4 had a much higher number


Table 7. Space Complexity of Huffman Coding
of unique pixel codes (217–247), resulting in more uniform
or flatter distributions. These images showed lower
compression savings (27.84% to 32.25%), as the coding
Format Png advantage diminished when pixel frequencies were more
evenly spread.
File Size 54,669 Bytes
Dimensions 1000 x 1000 (1 Megapixels)* Tree Balancing Effects
Type PaletteAlpha
The Huffman trees generated for images with skewed pixel
Colorspace sRGB frequencies were highly unbalanced, assigning short binary
codes to the most common pixel values and longer codes to
Colors 554 rarer ones. This imbalance was beneficial, as it reduced the
overall encoded bit length. Conversely, in images with
more uniform distributions (like Image 2 and Image 4), the
Republic of the Philippines
Laguna State Polytechnic University
Province of Laguna
resulting trees were more balanced, leading to longer Image and Data Archiving
average code lengths and decreased compression
efficiency. The algorithm is particularly effective for compressing
grayscale images with large uniform regions—such as
Symbol and Entropy scanned documents, medical imaging, or satellite photos—
achieving compression savings of up to 76.65%. This
Greater pixel variety in an image increased the entropy and makes it suitable for archival systems where storage space
thus the theoretical limit of compressibility. This was is a premium and access times remain critical.
observed clearly across the dataset: as the number of
unique pixel codes increased, the encoded bit count also Embedded and Resource-Constrained Systems
grew significantly, despite the total number of original
pixels remaining constant. This confirms the algorithm’s Due to its low peak memory consumption (as low as 10.6
sensitivity to input entropy and its reliance on symbol MB during compression and 8.2 MB during
frequency patterns for achieving optimal compression. decompression), the implementation is well-suited for
embedded systems, IoT devices, and microcontroller-based
Implementation Optimizations image processing tasks. Unlike dictionary-based methods
(e.g., LZW), Huffman Coding offers predictable memory
The use of a heap-based priority queue for Huffman tree scaling relative to input size, typically within 2–3× the
construction significantly enhanced performance and input data footprint.
scalability. This approach reduced the tree-building time
complexity from O(n2) in naive sorted-list implementations Educational Tool for Algorithm Visualization
to O( n log n ), as confirmed by the consistent build times
observed across all image tests. For example, despite The modular and transparent design of the implementation
handling up to 247 unique pixel codes (Image 4), the makes it highly effective as a learning resource in computer
algorithm maintained sub-5-second compression times science education. Students can directly observe how
without any exponential increase in processing overhead. frequency distributions affect the structure of Huffman
[19] trees and resulting code lengths, especially when
compressing images with varying levels of detail and
Compression times across all images ranged from entropy.
approximately 3.73 to 4.61 seconds, while decompression
remained efficient at 2.95 to 5.36 seconds. These results Foundation for Hybrid Compression Systems
demonstrate the predictable scaling behavior of the heap
structure, attributed to its constant-time minimum Although not the most optimal standalone method for all
extraction and logarithmic-time insertions. data types, Huffman Coding remains an integral part of
modern compression frameworks, such as DEFLATE (used
Moreover, the algorithm achieved compression ratios in PNG and ZIP formats). The current implementation can
approaching theoretical entropy bounds. Image 1, which serve as a building block for hybrid models, where entropy
exhibited low entropy due to limited pixel variation, was coding is combined with preprocessing techniques like run-
compressed by 76.65%, underscoring the model’s near- length encoding or delta encoding for enhanced
optimal symbol encoding. In practice, this optimization compression performance.
makes the implementation suitable for real-time use cases
such as streaming, network packet compression, or In summary, the algorithm’s balance of speed, compression
embedded imaging systems, where both speed and efficiency, and minimal resource demands enables its
memory efficiency are critical. application in both practical systems and academic settings,
reaffirming Huffman Coding's ongoing relevance in the
Overall, the balance between algorithmic efficiency and field of data compression.
practical performance confirms that heap-based Huffman
encoding is a viable method for high-speed, entropy- Our testing confirmed effective compression of various text
resilient compression of large-scale image data. formats, making it suitable for document storage systems,
particularly for static, frequently-accessed content where
4.5 APPLICATIONS AND PRACTICAL decompression speed is prioritized over maximum
RELEVANCE compression ratio.

The implemented Huffman Coding algorithm demonstrates Educational Value


strong practical value across a range of applications,
supported by its consistent compression performance and The modular structure of the implementation provides a
efficient resource usage observed during grayscale image transparent and hands-on demonstration of Huffman
testing. Coding principles, making it highly effective for computer
science education. Students can clearly visualize how
Republic of the Philippines
Laguna State Polytechnic University
Province of Laguna
varying pixel intensity frequencies shape the Huffman tree image with diverse pixel intensities (Image 2) produced
and influence compression outcomes. The contrast between over 7 million encoded bits, whereas a more uniform image
images with few versus many unique pixel values (e.g., 130 (Image 1) produced just over 2.3 million. Similarly,
in Image 1 vs. 247 in Image 4) serves as a practical memory usage during compression varied from
teaching tool to illustrate how entropy and frequency approximately 10.6 MB to 15 MB depending on the
distributions directly affect compression efficiency and number of unique pixel values. These differences are
tree structure. inherent to how Huffman trees respond to the frequency
distribution of input symbols.
Embedded Systems Applications
The algorithm's predictable and low memory usage Catastrophic errors were not observed in any of the trials.
makes it suitable for use in resource-constrained As Huffman Coding is inherently deterministic and
environments such as embedded systems and lossless, it successfully reconstructed every image with
microcontrollers. During testing, the peak memory usage 100% accuracy, regardless of complexity or compression
remained within a tight range (~10–15 MB during ratio. No data corruption, decoding mismatch, or structural
compression and ~8 MB during decompression), even failure occurred during decompression.
when processing large-scale images with 1,000,000 pixels.
This efficient memory footprint, along with linear scaling In summary, all observed deviations were within expected
behavior, positions the implementation as a viable bounds and did not compromise the integrity of the results.
alternative to more memory-intensive dictionary-based This reinforces Huffman Coding’s robustness and
approaches like LZW or LZ77. reliability, even under varied data conditions.

Foundation for Advanced Systems


4.5 EVALUATION OF CONFIDENCE LEVELS
Although Huffman Coding is not always the most space-
efficient method when used alone, it remains a core
component in many modern compression frameworks, The evaluation of confidence levels refers to the degree of
such as those used in ZIP, PNG, and MP3 formats.[4] The alignment between theoretical predictions—such as
current implementation serves as a practical foundation entropy-based compression limits—and empirical results
for hybrid compression pipelines, where Huffman obtained during testing. In the case of Huffman Coding,
encoding can be combined with other techniques—such as confidence is measured by how closely the algorithm
run-length encoding or predictive models—to address approaches the ideal compression efficiency predicted by
specialized compression challenges, including structured
information theory, particularly in relation to symbol
image data or real-time transmission needs.
frequency distributions and bit allocation.
4.6 ERROR MAGNITUDE PATTERNS
The consistently low deviation from entropy bounds, along
with accurate and lossless decompression across all tested
Although Huffman Coding is a lossless algorithm—
images, indicates high confidence in both the correctness
ensuring that decompressed data perfectly matches the and efficiency of the implementation.
original—certain performance fluctuations were observed
during testing. These variations do not indicate functional Key Insight:
errors but rather reflect differences in compression Frequent symbols contribute more to overall compression
efficiency, processing time, and memory usage across efficiency when assigned shorter binary representations.
images with varying complexity. These fluctuations can be The alignment of practical performance with entropy-based
grouped into three categories: predictions confirms Huffman's theoretical guarantees.

Minor digit or decimal errors refer to slight variations in


compression or decompression time, typically resulting 4.6 CONFIDENCE LEVEL VS ACCURACY
from background system processes, timing granularity, or ANALYSIS
memory access patterns. These errors are extremely small
—often in the range of milliseconds or less—and do not Table 9. Confidence Level vs. Accuracy Analysis
meaningfully impact overall performance. For example,
compression time varied between 3.73 and 3.79 seconds
among similarly sized images, a negligible difference in
practical applications. Im Confiden Accuracy Obse Accur
age ce Level of rvers acy
Decompr Error Rate
Moderate errors emerged in cases where image entropy
ession s (%)
varied significantly. These errors are not flaws in the
algorithm but rather expected differences in output due to
the nature of the input data. For instance, a highly detailed
Republic of the Philippines
Laguna State Polytechnic University
Province of Laguna
(where n is the number of unique pixel values
1 100% None 100% and m the total pixel count). Compression times
High ranged from 3.7 to 4.6 seconds, while
(≥ decompression completed in 2.9 to 5.3 seconds
0.99) depending on image complexity.

● Memory Usage:
Peak memory usage during compression scaled
2 100% None 100%
High with pixel diversity, ranging from 10,621 KB to
(≥ 15,402 KB, while decompression memory
0.99) remained consistent at approximately 8,251 KB
across all images.

● Application Suitability:
3 100% None 100% The algorithm is particularly effective for images
High
(≥ and datasets with skewed symbol frequencies,
0.99) making it well-suited for domains such as
document archiving, medical imaging (e.g., X-
rays), and embedded systems where space
efficiency and deterministic decoding are
4 100% None 100% essential.
High
(≥ ● Limitations:
0.99) As expected, compression efficiency is
diminished with uniform or high-entropy data,
where the frequency distribution of symbols
(pixels) is evenly spread. In such cases, the
5 100% None 100% Huffman overhead can reduce or nullify
High
(≥ compression benefits.
0.99)

These results support Huffman Coding’s continued


relevance in real-world applications, especially as a
4.7 SUMMARY OF FINDINGS foundational method in hybrid compression frameworks.
Its blend of compression accuracy, predictable
The experimental findings reaffirm the effectiveness of performance, and low memory footprint makes it a
Huffman Coding as a lossless compression algorithm, dependable solution for various computational
especially for grayscale images with varying pixel intensity environments.
distributions. Key insights from the tests on four 1,000,000-
pixel grayscale images include: Direct Comparison - Strengths and Weaknesses:

● Compression Efficiency: Image Processing


Compression rates varied depending on the
visual complexity of the images. Images with Strengths:
high uniformity (e.g., fewer unique pixel values)
achieved up to 76.65% compression, while
● Highly effective for compressing grayscale
those with greater detail and pixel variety saw
images with large uniform regions, achieving
more modest reductions, between 23–33%.
substantial size reduction.
● Entropy Alignment:
● Maintains perfect image quality through lossless
The compression ratios closely tracked the
compression, ensuring no data loss during
Shannon entropy of the input, demonstrating
decompression.
high fidelity to theoretical expectations. This
Simple and efficient implementation that requires
confirmed Huffman Coding’s strength in
minimal computational resources, making it
assigning shorter binary codes to more frequent
accessible for varied platforms.
symbols, thereby achieving near-optimal entropy
● Directly operates on raw pixel data without the
encoding.
need for complex preprocessing or
● Performance Metrics:
transformations.
The algorithm demonstrated stable time
complexity of O(n log n) during Huffman tree
construction and O(m) during encoding/decoding Weaknesses:
Republic of the Philippines
Laguna State Polytechnic University
Province of Laguna
● Less efficient compared to specialized image The Python implementation balanced computational
compression algorithms (e.g., JPEG, PNG) that efficiency with code clarity and educational transparency,
exploit spatial and perceptual redundancies. making it not only a practical compression tool but also a
● Does not take advantage of spatial redundancy or valuable teaching resource for algorithmic concepts.
correlations between neighboring pixels, which
limits compression performance on complex Future Work
images.
● Lacks optimizations for human visual perception,
Building on this foundation, several promising directions
which modern codecs use to achieve higher
can further enhance both the algorithm’s capabilities and
compression ratios without visible quality loss.
practical relevance:
● Requires separate encoding and decoding for
color channels, limiting direct application to
color images without additional processing. 1. Adaptive Huffman Coding:
Explore dynamic variants of Huffman Coding
that update the coding tree on-the-fly as new data
The key differentiator of our implementation lies in its
streams are processed, enabling efficient
educational transparency—it allows users to directly
compression of real-time or streaming data.
observe how variations in frequency distributions impact
2. Color Image Compression:
compression efficiency. While commercial systems often
Extend the algorithm to support full-color
employ advanced optimizations for maximum compression,
images by developing multi-dimensional
our implementation balances practical lossless compression
Huffman trees that can handle RGB channels,
with clear demonstration of core algorithmic principles,
including techniques to exploit inter-channel
making it valuable for both real-world use and learning
correlations for improved compression.
purposes.
3. Comparative Benchmarking:
Conduct comprehensive performance
V. CONCLUSION AND FUTURE WORKS comparisons against other widely-used
compression schemes such as Arithmetic Coding,
Huffman Coding remains a foundational and highly Lempel-Ziv-Welch (LZW), and Run-Length
effective lossless compression algorithm despite its Encoding (RLE) to determine application-
classical origins. This study successfully implemented specific strengths and weaknesses.
Huffman Coding in Python, applying it to both user-input 4. Hybrid Compression Pipelines:
textual data and grayscale images. The empirical results Integrate Huffman Coding with dictionary-based
demonstrated robust compression capabilities, with text preprocessing or transform coding methods to
data achieving an average 64% reduction in bit-length, and enhance compression ratios on repetitive or
grayscale image data reaching compression ratios as high structured datasets.
as 75%. Crucially, the compression was entirely lossless,
ensuring perfect reconstruction of the original data after 5. Interactive Applications and Visualization:
decompression. Develop user-friendly desktop or web-based
tools that visually demonstrate the Huffman
The algorithm’s compression efficiency closely aligned compression process, allowing users to
with the theoretical entropy limits defined by Shannon’s experiment with frequency distributions and
information theory, validating Huffman Coding’s observe corresponding effects on coding
optimality for data characterized by non-uniform frequency efficiency.
distributions. Images with large uniform areas or skewed
pixel intensity distributions achieved significantly better
6. Domain-Specific Optimizations:
compression compared to highly detailed images,
Tailor the algorithm for specialized data types
highlighting the direct influence of symbol frequency
such as genomic sequences, sensor data, or
skewness on coding performance.
network packet compression, where unique
statistical properties can be exploited.
From an implementation perspective, the project reinforced
several core computer science principles, including:
7. Machine Learning Integration:
Investigate hybrid approaches combining
● Frequency analysis for determining symbol Huffman Coding with machine learning models
probabilities. to predict symbol frequencies dynamically,
● Use of priority queues (heap data structures) to potentially improving compression on non-
build Huffman trees efficiently with O(n log n) stationary data sources.
time complexity.
Construction and traversal of binary trees for
code assignment and decoding.
● Application of recursive algorithms to encode By advancing this classical algorithm through innovative
and decode data streams systematically. enhancements and practical deployments, we can both
Republic of the Philippines
Laguna State Polytechnic University
Province of Laguna
preserve its strong educational value and unlock new computing environments, where minimizing storage and
opportunities for efficient data representation in modern transmission costs remains a critical challenge.

REFERENCE

[1] Huffman, D. A. (1952). A Method for the [3] Cover, T. M., & Thomas, J. A. (2006). Elements of
Construction of Minimum-Redundancy Codes. Proceedings Information Theory (2nd ed.). Wiley-Interscience.
of the IRE, 40(9), 1098–1101. [4] Salomon, D. (2007). Data Compression: The
[2] Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Complete Reference (4th ed.). Springer.
Stein, C. (2009). Introduction to Algorithms (3rd ed.). MIT [5] Sayood, K. (2017). Introduction to Data
Press. Compression (5th ed.). Morgan Kaufmann.
Republic of the Philippines
Laguna State Polytechnic University
Province of Laguna
[6] GeeksforGeeks. (2023). Huffman Coding – Greedy [23] Kumar, A., & Rani, P. (2020). Lossless Image
Algorithm. Retrieved from Compression Techniques and Their Applications. Journal
https://www.geeksforgeeks.org/huffman-coding-greedy- of Image Processing & Pattern Recognition, 6(2), 101–112.
algo-3/ [24] Gonzalez, R. C., & Woods, R. E. (2018). Digital
[7] FavTutor. (2023). Huffman Coding in Python. Image Processing (4th ed.). Pearson.
Retrieved from https://favtutor.com/blogs/huffman-coding- [25] Goyal, S., & Arora, A. (2019). Tree-Based Encoding
python Algorithms in Data Compression. International Journal of
[8] Downey, A. B. (2015). Think Python: How to Think Advanced Computer Science, 10(5), 57–65.
Like a Computer Scientist (2nd ed.). O’Reilly Media. [26] Сорин, Д. Б. (2006). PHP: the best web-
[9] Sioson, M. (2024). Entropy and the Efficiency of programming language. Научно-технический вестник
Huffman Encoding. Medium. информационных технологий, механики и оптики, (27),
https://medium.com/@msioson/huffman-coding-efficiency 113-121. Retrieved from
[10] University of Illinois at Urbana-Champaign. https://cyberleninka.ru/article/n/php-the-best-web-
(2022). CS 498: Data Compression Lecture Notes. programming-language
[11] Louisiana State University. (2023). Huffman Trees [27] Chen, Y., Li, H., & Wang, X. (2021). A survey on
and Compression Lecture Slides. Retrieved from cs.lsu.edu document intelligence: Techniques, applications, and
from challenges. Journal of Artificial Intelligence Research, 70,
https://www.sciencedirect.com/science/article/pii/S089360 1–36.
8014002135 [28] Vaswani, A., Shazeer, N., Parmar, N., et al. (2017).
[12] TheAlgorithms/Python GitHub. (n.d.). Huffman Attention is all you need. Advances in Neural Information
Coding Implementation. Retrieved from Processing Systems, 30, 5998–6008.
https://github.com/TheAlgorithms/Python [29] Google Research. (2024). Introducing Gemini 2: The
[13] Python Software Foundation. (2024). Python heapq next generation multimodal AI. Retrieved from
Module Documentation. https://ai.googleblog.com
https://docs.python.org/3/library/heapq.html [30] Khandelwal, R., Bhatt, A., & Singhal, R. (2022).
[14] Python Software Foundation. (2024). collections — Cloud-based machine learning pipelines for document
Container datatypes. automation. International Journal of Computer
https://docs.python.org/3/library/collections.html Applications, 184(8), 45–52.
[15] Pil Image Library (Pillow). (2024). Pillow [31] Hamad, K., & Kaya, M. (2016). A Detailed Analysis
Documentation. https://pillow.readthedocs.io/ of Optical Character Recognition Technology. International
[16] Tkinter GUI Documentation. (2024). Python Journal of Applied Mathematics Electronics and
Interface to Tcl/Tk. Computers, (Special Issue-1), 244–249.
https://docs.python.org/3/library/tkinter.html https://doi.org/10.18100/ijamec.270374
[17] Witten, I. H., Neal, R. M., & Cleary, J. G. (1987). [32] Khan, S., Ullah, A., Ullah, H., & Ullah, W. (2021).
Arithmetic Coding for Data Compression. Communications An Overview and Applications of Optical Character
of the ACM, 30(6), 520–540. Recognition.
[18] Ziv, J., & Lempel, A. (1977). A Universal Algorithm Academia.edu.(https://www.academia.edu/43901444/An_O
for Sequential Data Compression. IEEE Transactions on verview_and_Applications_of_Optical_Character_Recogni
Information Theory, 23(3), 337–343. tion)
[19] Chowdhury, M. Z. R., & Hossain, M. S. (2018). [33] Penn State University Libraries. (n.d.). Optical
Performance Analysis of Static Huffman Coding on Image Character Recognition (OCR): An Introduction. Retrieved
Compression. International Journal of Computer May 2, 2025, from(https://guides.libraries.psu.edu/OCR)
Applications, 181(32), 24–29. [34] ProQuest. (2018). A review on optical character
[20] Rao, P. R. (2020). Greedy Algorithms and Huffman recognition system. Journal of Advanced Research in
Coding Applications. Journal of Computer Science and Dynamical and Control Systems, 10(10 Special Issue),
Applications, 12(1), 10–18. 1805-1809.
[21] Nagla, B., & Mehta, R. (2021). Comparison of https://www.proquest.com/scholarly-journals/review-
Huffman, Arithmetic, and LZW Compression Techniques. optical-character-recognition-system/docview/
ACM Computing Surveys. 2108150311/se-2
[22] IEEE Xplore. (2022). Comparative Analysis of [35] ResearchGate. (n.d.). Optical Character
Lossless Compression Algorithms on Grayscale Images. Recognition. Retrieved May 2, 2025,
https://ieeexplore.ieee.org from(https://www.researchgate.net/publication/360620085_
OPTICAL_CHARACTER_RECOGNITION)
Republic of the Philippines
Laguna State Polytechnic University
Province of Laguna
[36] Soeno, S. (2024). Development of novel optical International Journal of Document Analysis and
character recognition system to record vital signs and Recognition, 24(1), 33–47. https://doi.org/10.1007/s10032-
prescriptions: An intra-subject experimental study. PLOS 021-00365-5
[39] Zhang, L., Chen, Y., & Zhao, H. (2023). Sequence-
ONE, 19(1), e0296714. https://www.google.com/search?
Aware Transformer Models for OCR Post-Processing.
q=https://doi.org/10.1371/journal.pone.0296714 Pattern Recognition Letters, 169, 112–120.
[37] Wang, Y., Wang, Z., & Wang, H. (2024). Filling https://doi.org/10.1016/j.patrec.2023.04.002
Method Based on OCR and Text Similarity. Applied [40] Xu, M., & Lin, Z. (2023). Layout-Aware Multimodal
Sciences, 14(3), 1034. Transformers for Document Understanding. IEEE
https://doi.org/10.3390/app14031034 Transactions on Pattern Analysis and Machine
[38] Shilton, J., Kumar, R., & Dey, S. (2021). AI-based Intelligence. https://doi.org/10.1109/TPAMI.2023.3257841
Document Automation in Enterprise Systems.

You might also like