Lossless Image Compression Using K-Means Clustering in Color Pixel Domain
Abstract— Compressing images is a method for shrinking the image's dimensions using a particular algorithm. Image compression is a solution associated with transmitting and storing large amounts of data for digital images. While image storage is necessary for medical images, satellite images, documents, and pictures, image transmission covers a variety of applications like television broadcasting, remote sensing via satellite, and other long-distance communication. These kinds of applications depend on image compression. Lossy compression and lossless compression are two separate methods for compressing images. While unneeded metadata is deleted in lossless compression to minimize file size, lossy compression permanently removes part of the original data to reduce file size. The terms "supervised learning" and "unsupervised learning" refer to two different machine learning techniques. As opposed to an unsupervised learning algorithm, which uses unlabeled input data, supervised learning uses labeled data. Clustering is a subtype of unsupervised learning, while classification and regression are subcategories of supervised learning. This paper discusses the application of clustering using K-means. The methodology explores lossless picture compression, the process of reducing the size of an image without sacrificing its quality, which is achieved here using k-means clustering. The proposed methodology achieves higher compression ratios than other existing image compression algorithms.

Keywords—image, compression, k-means, clustering, unsupervised, lossless

I. INTRODUCTION

Image compression is a method to lessen the number of bytes in a graphic without compromising image quality. By lowering the file size, more photographs may be saved in a given amount of RAM or disc space. The image also needs less bandwidth when it is downloaded from a website or sent over the internet, which reduces network congestion and speeds up content delivery.

Image compression is a method for reducing the size of an image file using a particular algorithm. It is a solution associated with transmitting and storing large amounts of data for digital images. While image storage is necessary for medical images, satellite images, documents, and pictures, image transmission covers a variety of applications like television broadcasting, remote sensing via satellite, and other long-distance communication. These kinds of applications depend on image compression. Lossy compression and lossless compression are two separate methods for compressing images. While unneeded metadata is deleted in lossless compression to minimize file size, lossy compression permanently removes part of the original data to reduce file size. The terms "supervised learning" and "unsupervised learning" refer to two different machine learning techniques. As opposed to an unsupervised learning algorithm, which uses unlabeled input data, supervised learning uses labeled data. Clustering is a subtype of unsupervised learning, while classification and regression are subcategories of supervised learning.

The two most common forms of picture file compression are lossy and lossless methods. Lossy compression shrinks an image file's size by permanently deleting unnecessary and unimportant data. It may significantly reduce file size, but if used excessively, it can distort photos and cause a considerable loss in image quality. However, quality may be kept when compression is done properly.

Our paper aims to explain a method for reducing the information needed to represent an image, known as image compression. This technique is crucial in various fields, including digital cameras, smartphones, and low-resource devices, where it can save storage space. It is especially important in the medical field, where large amounts of image data, such as X-rays, MRIs, and CT scans, accumulate. Efficient storage and fast transmission over the network are essential for immediate diagnoses by experts. The primary goals of the proposed paper are:

1) To generate a lossless compressed image
2) To decrease the number of colors by averaging the colors closest to the original image
3) To implement K-means clustering for achieving a compressed image
4) To achieve higher compression ratios than other existing image compression algorithms

This paper proposes an image compression method that uses K-means clustering to generate a lossless compressed image achieving higher compression ratios than other existing image compression algorithms. The paper is organized as follows: first, it describes the assumptions made about the images and the order in which the method achieves the proposed results. Then it describes precisely how the unsupervised
learning technique is applied via the k-means algorithm. Finally, the paper explains how the proposed compression method works and how k-means is used to compress images.

II. LITERATURE REVIEW

G. S. K. Krishna Phani, B. Lavanya and Sanjay Kumar Singh proposed image compression using the RLS-DLA algorithm in their paper "Study and Analysis of RLS-DLA Algorithm with Natural Image Compression Techniques" [1]. The Recursive Least Squares Dictionary Learning Algorithm (RLS-DLA) is the major focus of that paper: learning over-complete dictionaries for sparse signal representation. The dictionary is updated recursively as each training vector from the training set is processed. The RLS-DLA technique was constructed using a convergence scheme that takes a forgetting factor into account. The algorithm's core is simple and straightforward to use. The researchers also compared RLS-DLA with more traditional natural image compression methods like JPEG and JPEG 2000. Lastly, the RLS-DLA approach's sparsity coefficients enable low-bit-rate picture compression. The experimental findings show that the RLS-DLA technique is excellent at compressing natural images at low bit rates.

Rishav Chatterjee et al. propose an idea of lossy compression in their paper "Image Compression Using VQ for Lossy Compression" [2]. Because of its fast rate of compression and straightforward decoding techniques, vector quantization is among the most widely used lossy compression methods. The design of the codebook is the primary concern in VQ. In this study, the authors used k-means clustering and VQ for lossy compression to compare and determine the compression ratios of JPG and TIF photos.

Xin Yuan et al. published the paper "End-to-end evaluation of compressed sensing-based image compression vs JPEG" [3]. They describe a complete picture compression method based on compressive sensing. The system combines quantization and entropy coding for reconstruction with the traditional method of compressive sampling on the complete picture. The compression performance is shown to be comparable to JPEG, and significantly superior at low rates, with regard to decoded image quality and data rate. They study the variables that affect the system's performance, such as the reconstruction methods, the quantization-to-compression trade-offs, and the selection of the sensing matrix. They also offer a practical technique for choosing the compression-ratio and quantization-step combinations that are nearly ideal, among all those conceivable, for any given bit rate.

Venkateswaran Narasimhan et al. proposed the paper "Wavelet Domain K-Means Clustering Based Image Compression" [4]. Using the k-means algorithm with the discrete wavelet transform, this work proposes a cutting-edge image compression method. Wavelet coefficient clustering is applied in each DWT band, where greater compression rates are desired. In contrast to other approaches, this methodology applies the DWT to only a portion of the original image. Here, 16x16 subblocks of a picture are decomposed at a single level using wavelets, and the coefficients are then categorised using the commonly used K-means clustering method. The centroids of each cluster are enumerated and arranged in a codebook, and only the index values are transmitted. It is possible to get a high compression ratio by reducing the number of clusters. According to the simulation findings, changing the cluster size from 10 to 100 results in a compression ratio ranging from 56.8 to 8.0 while keeping the image quality adequate.

Nishant Kumar Singhai and Prateek Mishra published a paper [5] suggesting a new hybrid approach for image compression. The authors combine two transform methods to create a new one that is more effective than either alone, and compare the new strategy to earlier approaches. The comparison used the peak signal-to-noise ratio, mean square error, and compression ratio. In terms of visual perception, the suggested hybrid technique, i.e. the combination of WPT and DCT, is superior. The article serves as a useful resource for choosing an image compression technique and is recommended for transmission and storage purposes.

Thamer Hameed proposed the paper "Image Compression Using Neural Networks: A Review" [6]. Technology for imaging and video coding has advanced considerably in recent years. Yet, considering the popularity of picture and video collecting systems, the growth of picture data far exceeds the growth in compression ratio. It is broadly acknowledged that it will become harder to further increase the coding efficiency of the established hybrid coding scheme. The deep CNN, which has recently revived neural networks and achieved substantial success both in artificial intelligence domains and in signal processing, offers a new and intriguing approach to picture compression. In this research, the author provides a thorough, up-to-date analysis of picture compression methods based on neural networks.

"Design of Image Compression Algorithm Using MATLAB" [7] was another paper, proposed by Abhishek Thakur et al. The work provides an overview of recent advancements in image security. To prevent unauthorised users from accessing the picture, encryption techniques scramble the pixels of the image and lessen their correlation. The paper suggests a technique for applying a new algorithm, the chaotic encryption method, to encrypt the sender's messages; the messages exchanged between the two sides are encrypted and decrypted using this key. Picture security is highlighted, and a more secure algorithm using chaotic encryption is designed to offer increased security and dependability.

A comparative study incorporating wavelet, fractal image compression, embedded zero tree, and block truncation coding [8] was published by the IJESRT Journal. The performance differences of several transform coding methods, including embedded zero tree image compression, wavelet,
fractal, and block truncation coding, are examined in that research. The work focuses on key aspects of transform coding used for still image compression. The aforementioned methods have been applied successfully in numerous applications, and pictures produced using them show excellent outcomes. The peak signal-to-noise ratio and the compression ratio (CR) are used to numerically analyse these techniques. The researchers employ Matlab's Image Processing Toolbox to carry out the study.

The work "Lossy Image Compression using Discrete Cosine Transform" by P. B. Pokle et al. [9] addresses the main issues with social media applications: the high data rates, high bandwidth, and enormous amounts of memory needed for compute and storage. Due to bandwidth restrictions, there are significant difficulties in sending such massive volumes of data over the network, even with faster internet, higher throughput rates, and upgraded network infrastructure. This supports the need for compression methods that maximise available bandwidth. The article demonstrates how to use the discrete cosine transform to compress digital images and compares it to other techniques.

Surbhi Singh and Vipin Kumar Gupta proposed a work on "Huffman Coding JPEG Image Compression and Decompression" [10]. The authors improve the performance of picture compression by tuning the parameters of Normalized Cross-Correlation (NK), Structural Content (SC), MSE, Average Difference (AD), PSNR, Maximum Difference (MD), and Normalized Absolute Error (NAE). The results show that the NK and PSNR values increase while all other values decrease. To increase the effectiveness of picture compression, the values of AD, SC, MD, MSE, and NAE must be lowered, and the NK values must be raised.

Emy Setyaningsih and Agus Harjoko proposed the paper "Examination of Hybrid Image Compression Methods" [11]. Compression is the process of shrinking or condensing data while preserving the information's quality. The study reviews how various hybrid compression strategies have advanced over the past ten years. As in the JPEG approach, a hybrid compression strategy combines the best aspects of each group of technologies, mixing lossy and lossless methods in order to achieve an elevated compression ratio while bearing in mind the quality of the image's reconstruction. While lossless compression results in high-quality data reconstruction, because the data may subsequently be decompressed to the same outcome as before the compression, lossy compression generates a comparatively higher compression ratio. The paper also discusses what is known about evolving hybrid compression methodologies and their open problems, and suggests further research to improve picture compression performance.

Abdelhamid Mammeri et al. proposed the paper "A Survey of Image Compression Algorithms for Visual Sensor Networks" [12]. The survey gives a brief summary of current compression algorithms for Visual Sensor Networks, identifies a modern categorization for presently suggested compression methods, and lists their benefits, drawbacks, and unresolved research questions.

Andrea Vitali et al. proposed an idea of "Image Compression by Perceptual Vector Quantization" [13]. The study describes a vector quantization-based picture compression method. Transforms and entropy coding, which are typically applied before and after quantization, are not employed; instead, the quantization stage tracks and exploits the image's local attributes using adaptive computation.

Bibhas Chandra Dhara proposed the thesis "Block truncation coding and pattern fitting are used to compress images and videos for quick decoding" [14]. Its goal is to create an effective decoder for an image compression technique. Spatial-domain compression takes substantially less time to decode than sub-band compression approaches, and both BTC and VQ are frequently used techniques for compressing spatial-domain data. The thesis suggests a blended compression technique that uses the concepts of VQ and BTC to produce a reasonable bit-rate-per-quality trade-off.

Sarang Bansod and Sweta Jain proposed the paper "The Harmony Search Algorithm Has Been Improved for Colour Image Compression" [15]. The harmony search algorithm is effective in many different situations but is still somewhat new, particularly in the area of picture compression. The study aims to increase the effectiveness of the method when used to compress different types of photos; combining it with BTC (Block Truncation Coding) variants can yield a better outcome. Implementation results further demonstrate the viability of the newly developed technique. Because the original image requires a lot of disc space and high transmission bandwidth, efficient methods for image compression are continually evolving. Although there are many different image compression methods, the main objective is always to provide the method that best meets the needs of the user and requires the least memory, which is accomplished by achieving a strong compression ratio. Several earlier studies on lossless picture compression aimed to increase the compression ratio. The proposed methodology achieves higher compression ratios than other existing image compression algorithms.

III. RELATED TERMINOLOGIES

Before going through the proposed paper, one should have a clear understanding of some related terminologies like compression, clustering, lossy and lossless compression, etc.

A. Image processing

Image processing is transferring an analogue image into a digital form and performing particular operations on it in order to enhance the image or extract useful information from it. It functions like a signal-processing system where an image, such as a video frame or photograph, acts as the input, and the output may be a picture or attributes related to that image.
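To make the notion of a digital image concrete, the sketch below (an illustrative NumPy example written for this article, not code from the paper) represents an image as a grid of RGB pixel values and performs one simple processing operation on it:

```python
import numpy as np

# A digital image is a grid of pixels; in RGB form each pixel holds
# three intensity values (red, green, blue), typically in 0-255.
image = np.array([
    [[255, 0, 0], [0, 255, 0]],        # red pixel, green pixel
    [[0, 0, 255], [255, 255, 255]],    # blue pixel, white pixel
], dtype=np.uint8)

print(image.shape)   # (2, 2, 3): height, width, colour channels

# A minimal "processing" step: average the channels to get grayscale.
gray = image.mean(axis=2)
print(gray[1, 1])    # the white pixel averages to 255.0
```

The output in this framing is either another picture (the grayscale array) or attributes computed from the input image.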
Fig. 4. Sorted Cluster

WCSS = Σ_k Σ_{x ∈ C_k} ||x − μ_k||² …(4)

The elbow approach follows these steps to determine the optimal cluster value:
• It applies K-means on the provided dataset, with K values ranging from 1 to 10.
• It determines the WCSS for each value of K.
• It produces a graph comparing the computed WCSS values to each value of K.
• The sharp bend ("elbow") of the plotted curve is taken as the best value of K.

The figure below represents the elbow method:

Fig. 5. WCSS vs. No. of Clusters

K-means is an algorithm that achieves exactly this. Recall that K is the desired number of clusters. The method starts by randomly selecting K training instances as the initial centroids μ1, μ2, ..., μK, where each μ ∈ R^n and n is the number of features. At startup the centroids could be quite close to one another, so the outcome should be checked for accuracy, because the algorithm might get stuck in a local optimum.

Then carry out the subsequent actions for a predetermined number of iterations:
• For every training example, assign c(i), the index of the closest centroid.
• For every centroid μk, set its location to the average of the examples assigned to it.

As the algorithm advances, the centroids normally migrate to the centers of the clusters, and the overall distance of the examples to their clusters decreases.

This technique is utilised in the proposed model. A picture's pixels are made up of three values: R(ed), G(reen), and B(lue). Pixels with the same meaning (colour) do not have to be presented as a grid. Finally, the model is able to use the algorithm to obtain K colours, which are chosen by the algorithm.

Output:
[336519.9994879626, 141611.69711384104, 86682.93995564576, 64686.94497731874, 50913.2632977519, 40686.22682230402, 32979.73628183274, 28492.671767520664, 25375.6265720664, 23094.491972583917, 20871.595122360202, 18868.510838939113, 17315.16554641331, 15905.816009267488, 14559.14441807909, 13297.934030623528, 12350.45380372168, 11492.53319826621, 10974.709382505742, 10364.74151021806]

B. Architecture

1) Algorithms

K-means clustering works as shown by the following algorithm:
Step 1: Using the elbow technique, select K, the number of clusters.
Step 2: Randomly choose K centroids or locations; they need not be points from the supplied data set.
Step 3: Assign each data point to its nearest centroid, producing the required K groups.
Step 4: Determine the variance and move the centroid of each cluster.
Step 5: Repeat Step 3 to reassign each data point to the new centroid of its cluster.
Step 6: If any reassignment occurred, go back to Step 4; if not, go to COMPLETE.
Step 7: Algorithm completed.
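The steps above can be sketched in code. The following is a minimal, illustrative NumPy implementation of K-means colour quantization, written for this article rather than taken from the authors' implementation; the random 8x8 image and the fixed choice K = 4 are assumptions for the demonstration:

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 2: choose K initial centroids at random from the data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Step 3: assign every point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Steps 4-5: move each centroid to the mean of its assigned points.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 6: stop once no centroid moves any further.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# A random 8x8 RGB "image" standing in for a real photograph.
rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(8, 8, 3)).astype(float)
pixels = image.reshape(-1, 3)        # one row per pixel; the grid is not needed

# Quantize to K = 4 representative colours (Steps 1-7 with K fixed).
centroids, labels = kmeans(pixels, k=4)
compressed = centroids[labels].reshape(image.shape)

# Elbow check: WCSS shrinks as K grows; the curve's sharp bend suggests K.
wcss = []
for k in (1, 2, 4, 8):
    c, l = kmeans(pixels, k)
    wcss.append(((pixels - c[l]) ** 2).sum())
```

Storing the K centroid colours plus one small index per pixel, instead of three full channel values per pixel, is what yields the compression; with K = 4, each index needs only 2 bits.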
2) Flowchart
The flowchart shows that the size of the image is decreased by averaging the k colours that seem closest to the original image.

2) Distance: Every observation is assessed and placed in the nearest cluster, using the distances to all centroids to decide which group is "closest". The K-means clustering algorithm updates a cluster's centroid whenever it adds or loses a data point.

3) Maximum Iterations: The algorithm iterates until the cluster centers change the least from the previous iteration. If the clusters are uniformly spherical in shape, K-means is particularly good at identifying the structure and drawing conclusions from the data. However, the method does a poor job of grouping the data if the clusters have more intricate geometric features.

4) Compression Ratio:
CR = size after / size before …(5)

5) Compression Factor:
CF = size before / size after …(6)

6) Saving Percentage:
SP = ((size before - size after) / size before) x 100% …(7)

7) Compression Time: If an algorithm's compression times are reasonable, the algorithm is reasonable with respect to the time factor. This component might take very low values as high-speed computer hardware evolves, and the values may be influenced by how well the computers perform.

D. Analysis

TABLE II. COMPARISONS TABLE

S. No. | Comparison          | K-means              | RLS-DLA         | VQ
1      | Saving Percentage   | 71.49%               | 30%             | -
2      | Compression Type    | Lossless             | Lossy, lossless | Lossy
3      | File extension      | All image extensions | -               | JPG and TIF
4      | Compression Ratio   | 0.285                | 1.3             | 5.053101e-01
5      | Compression Factor  | 3.507                | 0.769           | 0.0785
6      | Compression Time    | 494 sec              | -               | 6 sec

VII. CONCLUSION AND FUTURE SCOPE

Image compression is a clever way to reduce an image's size while keeping its resolution. The k-means clustering algorithm is implemented for image compression in this study. The WCSS value is calculated with the elbow method to obtain an appropriate value of k. Using K-means clustering, an unsupervised technique, the proposed methodology achieves a lossless compressed image.
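Equations (5), (6), and (7) can be computed directly from the two file sizes. The sketch below uses made-up byte counts for illustration; they are not measurements from the study:

```python
def compression_metrics(size_before, size_after):
    """Compression Ratio (5), Compression Factor (6), Saving Percentage (7)."""
    cr = size_after / size_before                          # (5): lower is better
    cf = size_before / size_after                          # (6): higher is better
    sp = (size_before - size_after) / size_before * 100    # (7): percent saved
    return cr, cf, sp

# Hypothetical example: a 1,000,000-byte image compressed to 285,000 bytes.
cr, cf, sp = compression_metrics(1_000_000, 285_000)
print(round(cr, 3), round(cf, 3), round(sp, 2))   # 0.285 3.509 71.5
```

Note that CR and CF are reciprocals of each other, so reporting either one together with the saving percentage fully characterises the size reduction.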