
2024 International Conference on Computing, Power, and Communication Technologies (IC2PCT)

Lossless Image Compression using K-Means Clustering in Color Pixel Domain

979-8-3503-8354-6/24/$31.00 ©2024 IEEE | DOI: 10.1109/IC2PCT60090.2024.10486602

Rimjhim Kumari
Department of Computer Science and Engineering, Galgotias University, Greater Noida, India
[email protected]

Srinivasan Sriramulu
School of Computing Science and Engineering, Galgotias University, Greater Noida, India
[email protected]

Abstract— Compressing images is a method for shrinking an image's dimensions using a particular algorithm. Image compression is a solution associated with transmitting and storing large amounts of data for digital images. While image storage is necessary for medical images, satellite images, documents, and pictures, image transmission covers applications such as television broadcasting, remote sensing via satellite, and other long-distance communication. Lossy compression and lossless compression are two separate methods for compressing images: lossless compression deletes only unneeded metadata to minimize file size, whereas lossy compression permanently removes part of the original data. "Supervised learning" and "unsupervised learning" refer to two different machine learning techniques: supervised learning uses labeled data, while an unsupervised learning algorithm uses unlabeled input data. Clustering is a subtype of unsupervised learning; classification and regression are subcategories of supervised learning. This paper discusses the application of clustering using K-means. The methodology explores lossless picture compression, the process of reducing the size of an image without sacrificing its quality, achieved here using K-means clustering. The proposed methodology achieves higher compression ratios than other existing image compression algorithms.

Keywords—image, compression, k-means, clustering, unsupervised, lossless

I. INTRODUCTION

Image compression is a method to lessen the number of bytes in a graphic without compromising image quality. By lowering the file size, more photographs can be saved in a given amount of RAM or disk space. A compressed image also needs less bandwidth when it is downloaded from a website or sent over the internet, which reduces network congestion and speeds up content delivery.

Image compression reduces the size of an image file using a particular algorithm. It is a solution associated with transmitting and storing large amounts of data for digital images. While image storage is necessary for medical images, satellite images, documents, and pictures, image transmission covers applications such as television broadcasting, remote sensing via satellite, and other long-distance communication. Lossy compression and lossless compression are two separate methods for compressing images: lossless compression deletes only unneeded metadata to minimize file size, whereas lossy compression permanently removes part of the original data. "Supervised learning" and "unsupervised learning" are two different machine learning techniques: supervised learning uses labeled data, whereas an unsupervised learning algorithm uses unlabeled input data. Clustering is a subtype of unsupervised learning; classification and regression are subcategories of supervised learning.

The two most common forms of picture file compression are lossy and lossless methods. Lossy compression shrinks an image file's size by permanently deleting unnecessary and unimportant data. It may significantly reduce file size, but if used excessively, it can distort photos and cause a considerable loss in image quality. However, quality may be kept when compression is done properly.

Our paper aims to explain a method for reducing the information needed to represent an image, known as image compression. This technique is crucial in various fields, including digital cameras, smartphones, and low-resource devices, where it can save storage space. It is especially important in the medical field, where large amounts of image data, such as X-rays, MRIs, and CT scans, accumulate; efficient storage and fast transmission over the network are essential for immediate diagnoses by experts. The primary goals of the suggested paper are:

1) To generate a lossless compressed image.

2) To decrease the number of colors by averaging the colors closest to the original image.

3) To implement K-means clustering for achieving a compressed image.

4) To achieve higher compression ratios than other existing image compression algorithms.

This paper proposes an image compression method that uses K-means clustering to generate a lossless compressed image achieving higher compression ratios than other existing approaches. The paper is organized as follows: first, it describes the assumptions made about the images and the order in which the method achieves the proposed results. Then it describes precisely how the unsupervised

Copyright © IEEE–2024 ISBN: 979-8-3503-8354-6 1925



learning technique is applied using the k-means algorithm. In the end, this paper explains how the proposed compression method works and how k-means is used to compress images.

II. LITERATURE REVIEW

G. S. K. Krishna Phani, B. Lavanya and Sanjay Kumar Singh proposed an idea of image compression using the RLS-DLA algorithm in their paper "Study and Analysis of RLS-DLA Algorithm with Natural Image Compression Techniques" [1]. The "Recursive Least Squares Dictionary Learning Algorithm" is the major focus of this paper: learning over-complete dictionaries for sparse signal representation using RLS-DLA. As each training vector is processed, the dictionary is recursively updated over the training set. The RLS-DLA technique was constructed using a convergence scheme that took the forgetting factor into account. The algorithm's core is simple and straightforward to use. The researchers also compared RLS-DLA with more traditional natural image compression methods like JPEG and JPEG 2000. Lastly, the RLS-DLA approach's sparsity coefficients enable low-bit-rate picture compression. The experimental findings show that the RLS-DLA technique is excellent at compressing natural images at low bit rates.

Rishav Chatterjee et al. propose an idea of lossy compression in their paper "Image Compression Using VQ for Lossy Compression" [2]. Because of its fast rate of compression and straightforward decoding techniques, vector quantization is among the most widely used lossy compression methods. The design of the codebook is the primary VQ concern. In this study, the authors used k-means clustering and VQ for lossy compression to compare and determine the compression ratios of JPG and TIF photos.

Xin Yuan et al. published a paper "End-to-end evaluation of compressed sensing-based image compression vs JPEG" [3]. On the basis of compressive sensing, they describe a method for complete picture compression. The presented system combines quantization and entropy coding for reconstruction with the traditional method of compressive sampling on the complete picture. It is demonstrated that the compression performance is comparable to JPEG and significantly superior at low rates with regard to decoded image quality and data rate. They research the variables that affect the system's performance, such as the reconstruction methods, quantization-to-compression trade-offs, and the selection of the sensing matrix. They offer a useful technique for choosing the compression ratio and quantization step combinations that produce nearly ideal quality for any given bit rate.

Venkateswaran Narasimhan et al. proposed a paper "Wavelet Domain K-Means Clustering Based Image Compression" [4]. Using the k-means algorithm with the discrete wavelet transform, this work proposes a cutting-edge image compression method. Wavelet coefficient clustering is applied to each DWT band where greater compression rates are desired. In contrast to other approaches, this methodology applies DWT to only a portion of the original image. Here, 16x16 subblocks from a picture are decomposed on a single level using wavelets, then the coefficients are categorised using the commonly used K-means clustering method. The centroids of each cluster are enumerated and arranged in a codebook, and only the index values are transmitted. It is possible to get a high compression ratio by reducing the number of clusters. According to the simulation findings, changing the cluster size from 10 to 100 results in a compression ratio that ranges from 56.8 to 8.0 while keeping the image's quality adequate.

Nishant Kumar Singhai and Prateek Mishra published a paper [5] suggesting a brand-new hybrid approach for image compression. In this study, the authors combine two transform methods to create a new one that is more effective than either alone. The researchers compare their new strategy to earlier ones using peak signal-to-noise ratio, mean square error, and compression ratio. In terms of visual perception, the suggested hybrid technique, i.e. the combination of WPT and DCT, is superior. This article serves as a useful resource for choosing an image compression technique and is recommended for transmission and storage purposes.

Thamer Hameed proposed a paper "Image Compression Using Neural Networks: A Review" [6]. Technology for imaging and video coding has advanced considerably in recent years. Yet, given the popularity of picture and video acquisition systems, picture data is growing far faster than compression ratios are improving, and it is broadly acknowledged that it will become harder to further increase coding efficiency inside the established hybrid coding scheme. The deep CNN, which has recently revived neural networks and achieved substantial success both in artificial intelligence domains and in signal processing, offers a new and intriguing approach to picture compression. In this research, the author provides a thorough, up-to-date analysis of picture compression methods based on neural networks.

"Design of Image Compression Algorithm Using MATLAB" [7] was another paper, proposed by Abhishek Thakur et al. This work provides an overview of recent advancements and developments in image security. To prevent unauthorised users from accessing the picture, encryption techniques scramble the pixels of the image and lessen their correlation. The paper suggests a technique for applying a new algorithm, the chaotic encryption method, to encrypt the sender's messages; the messages exchanged between the two sides are encrypted and decrypted using this key. In this study, picture security is highlighted, and a more secure algorithm using chaotic encryption is designed to offer increased security and dependability.

A paper on a comparative study incorporating wavelet, fractal image compression, embedded zero tree, and block truncation coding [8] was published by IJESRT Journal. The performance differences of several transform coding methods, including embedded zero tree image compression, wavelet, fractal, and block truncating coding, are examined in this


research. This work focuses on key aspects of transform coding used for still image compression. The aforementioned methods have been applied successfully in numerous applications, and pictures produced using them show excellent outcomes. The peak signal-to-noise ratio and the compression ratio (CR) are used to numerically analyse these techniques. The researchers employ Matlab's Image Processing Toolbox to carry out their study.

The work "Lossy Image Compression using Discrete Cosine Transform" by P. B. Pokle et al. [9] notes that the main issues with social media applications are the high data rates, high bandwidth, and enormous amounts of memory needed for compute and storage. Due to bandwidth restrictions, there are significant difficulties in sending such massive volumes of data via the network even with faster internet, higher throughput rates, and upgraded network infrastructure. This supports the requirement for compression methods that maximise available bandwidth. The article demonstrates how to use the discrete cosine transform to compress digital images and compares it to other techniques.

Surbhi Singh and Vipin Kumar Gupta proposed a work on "Huffman Coding JPEG Image Compression and Decompression" [10]. The authors improve the performance of picture compression by tuning the parameters of Normalized Cross-Correlation (NK), Structural Content (SC), MSE, Average Difference (AD), PSNR, Maximum Difference (MD), and Normalized Absolute Error (NAE). The outcomes show that NK and PSNR values increase while all other values decrease. To increase picture compression's effectiveness, the values of AD, SC, MD, MSE, and NAE must be lowered, and the NK and PSNR values must be raised.

Emy Setyaningsih and Agus Harjoko proposed a paper "Examination of Hybrid Image Compression Methods" [11]. Compression is the process of shrinking or condensing data while preserving the information's quality. This study reviews how various hybrid compression strategies have advanced over the past ten years. As with the JPEG approach, a hybrid compression strategy combines the best aspects of each group of technologies, mixing lossy and lossless methods to achieve an elevated compression ratio while bearing in mind the level of the image's reconstruction. Lossless compression yields high-quality data reconstruction and a comparatively high compression ratio, since the data can later be decompressed to exactly the same outcome as before compression, while lossy compression yields a reasonable compression ratio. The paper also discusses what is known about evolving hybrid compression methodologies and their open problems, and suggests additional research to improve picture compression performance.

Abdelhamid Mammeri et al. proposed a paper "A Survey of Image Compression Algorithms for Visual Sensor Networks" [12]. The authors give a brief summary of the current technology for Visual Sensor Network compression algorithms, identify a modern categorization for presently suggested compression methods, and list benefits, drawbacks, and unresolved research questions.

Andrea Vitali et al. proposed an idea of "Image Compression by Perceptual Vector Quantization" [13]. This study describes a vector quantization-based picture compression method. Transforms and entropy coding, which are typically applied before and after quantization, are not employed in this process. The image's local attributes are tracked and exploited by the quantization stage using adaptive computation.

Bibhas Chandra Dhara proposed a paper "Block truncation coding and pattern fitting are used to compress images and videos for quick decoding" [14]. The goal of this thesis is to create an effective decoder for an image compression technique. Spatial domain-based compression takes substantially less time to decode than sub-band compression approaches, and both BTC and VQ are frequently used techniques for compressing spatial domain data. The thesis suggests a blended compression technique that uses the concepts of VQ and BTC to produce a reasonable bit-rate/quality trade-off.

Sarang Bansod and Sweta Jain proposed a paper, "The Harmony Search Algorithm Has Been Improved For Colour Image Compression" [15]. The harmony search algorithm is effective in many situations but is still somewhat new, particularly in the area of picture compression. This study aims to increase the effectiveness of the method when used to compress different types of photos; combining it with BTC (Block Truncation Coding) variants can yield a better outcome. Implementation results further demonstrate the viability of the newly developed technique. Because the original image requires a lot of disc space and high transmission bandwidth, efficient methods for image compression are continually evolving. Although there are many different image compression methods, the main objective is always to provide the best method that meets the needs of the user and requires less memory, which is accomplished by achieving a strong compression ratio. Earlier studies on lossless picture compression aimed at increasing the compression ratio. The proposed methodology achieves higher compression ratios than other existing image compression algorithms.

III. RELATED TERMINOLOGIES

Before going through the proposed paper, one should have a clear understanding of some related terminologies like compression, clustering, lossy and lossless compression, etc.

A. Image processing

Image processing means transferring an analogue image into digital form and performing particular operations on it in order to enhance the image or extract more useful information from it. It functions like signal processing, where an image, such as a video frame or photograph, acts as the input and the output might be a picture or attributes related to that image.
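As a toy illustration of this input-image/output-image view (a minimal stdlib-only sketch, not code from the paper), an image can be treated as a grid of pixel values, and an operation such as inversion maps one image to another:

```python
# A tiny grayscale "image" as a 2-D list of 8-bit pixel values (0-255).
image = [
    [0, 64, 128],
    [192, 255, 32],
]

def invert(img):
    """A simple image-processing operation: map each pixel p to 255 - p,
    producing a new image of the same shape."""
    return [[255 - p for p in row] for row in img]

negative = invert(image)
print(negative[0])  # [255, 191, 127]
```

Applying `invert` twice returns the original image, which is the kind of perfectly reversible transformation that lossless processing relies on.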


B. Compression

With the aid of compression, the amount of storage space or bandwidth needed to represent an image can be decreased. Image compression techniques entail shrinking the size of an image and adjusting it so that the quality isn't significantly compromised.

C. Lossy Compression

Lossy compression is a type of data compression that eliminates certain information in order to produce smaller files. After decompression, lossy compression does not reconstruct or restore the original data. It is frequently applied to audio, video, and image files that include more data than is necessary.

D. Lossless Compression

Lossless compression is a data compression technique that allows a flawless conversion of the compressed picture back to the original image with absolutely no information loss. Lossless compression is possible because most real-world data displays statistical redundancy. Comparatively, lossy compression only allows for the approximate reconstruction of the original data, albeit typically with far higher compression rates.

E. Color Image Processing

An image processing application that uses a color-based method to extract information from an image.

F. Machine Learning

As the name indicates, the computer's capacity for learning is what gives it a more human-like character. Machine learning is currently being actively deployed in more places than one may imagine. Machine learning is classified into three categories:

• Supervised Learning: As implied by the name, supervised learning entails a supervisor acting as a teacher. It is the technique of training a computer system using a set of labeled data.

• Unsupervised learning: It is used to cluster and evaluate unfamiliar data sets. This approach runs independently, meaning no target output is given to the model, in order to find hidden patterns and information. Using only the input parameter values, the training model automatically identifies groups or patterns.

• Reinforcement learning: It involves learning how to act in a situation so as to gain the greatest reward. Data for RL is gathered through a trial-and-error process; unlike supervised or unsupervised machine learning, no labeled input data is provided.

G. Classification Algorithms

This sort of learning is supervised. Classification requires finding a model or function that helps divide the data into several discrete values or categories. Data are classified under a variety of labels using the input parameters, and the labels are then forecast for new data.

H. Regression Algorithms

Regression is an additional supervised learning method. Instead of employing classes or discrete values, regression finds a model or function that maps the data onto continuous real values. It can also detect distribution movement based on past data. Because it predicts a quantity, a regression model's accuracy must be stated as an error in those predictions.

I. Clustering

This sort of learning is unsupervised. The strategy is generally used to group data based on various patterns, such as similarities or differences, and to sort unclassified, unprocessed data objects.

J. Association

Association is an additional type of unsupervised learning. It is a rule-based machine learning strategy that identifies relationships between parameters in a huge data collection. It mostly applies to market basket analysis, which helps clarify the connections between various products. For instance, shopping stores deploy algorithms based on this technique to determine the relationship between the sales of one item and another, based on consumer behaviour: if a consumer purchases milk, he might also purchase bread, eggs, or butter. Once properly trained, these models can be utilized to promote sales by devising various promotions.

IV. MATERIALS AND METHODS

The following materials and methods are utilised to evaluate the efficiency of lossless k-means clustering image compression:

A. Materials

The following lossless compression techniques are taken into consideration for this study.

1) K-Means Clustering

An unsupervised technique that separates a set of unfamiliar data into different groups. The variable K is used as an experimental parameter indicating the number of pre-defined groups to be created during the procedure. For instance, if the value of K is 2, only two clusters will exist; if the value of K is 3, there will be three clusters; and so on.

Fig. 1. K-means cluster
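The K-pre-defined-groups behaviour just described can be sketched with a minimal Lloyd's-algorithm implementation. This is illustrative stdlib-only code, not the authors' implementation; the toy points and starting centroids are made up:

```python
import math

def kmeans(points, centroids, iters=10):
    """Lloyd's algorithm: repeatedly assign each point to its nearest
    centroid, then move each centroid to the mean of its group."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            d = [math.dist(p, c) for c in centroids]
            clusters[d.index(min(d))].append(p)
        centroids = [tuple(sum(v) / len(cl) for v in zip(*cl)) if cl else c
                     for cl, c in zip(clusters, centroids)]
    return centroids, clusters

# Six 2-D points and K = 2 starting centroids -> exactly two groups
points = [(1, 1), (1.5, 2), (1, 0), (8, 8), (9, 9), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print([len(c) for c in clusters])  # [3, 3]
```

With K = 3 starting centroids the same routine would produce three groups, matching the description above.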


2) Compression Time

It is important to take compression times into account. This component might have very low values as high-speed computer hardware evolves, and those values are influenced by how well the computer performs.

3) Measuring Compression Performance

There are numerous metrics to gauge a compression algorithm's performance depending on the application. Space efficiency is the primary consideration when evaluating performance; another element is time efficiency. The nature and structure of the input source also affect performance, as does the type of compression algorithm, lossy or lossless: if a particular source file were compressed using a lossy algorithm, the space and time efficiencies would be higher than those of a lossless strategy. Because overall performance is challenging to measure, several measurements should be used to assess such compression families. The metrics used to assess the effectiveness of lossless algorithms are listed below.

The proportion of a compressed file's size to the size of the original file is known as the compression ratio:

Compression Ratio, CR = size(after) / size(before) ... (1)

The compression factor (CF) is the inverse of the CR; it measures the ratio of a file's original size to its compressed size:

Compression Factor, CF = size(before) / size(after) ... (2)

The shrinkage of the source file is calculated as follows:

Saving % = (size(before) − size(after)) / size(before) × 100% ... (3)

Using file sizes, each of the above-mentioned metrics assesses how well compression algorithms work.

B. Methods

A series of image files is used to develop and test the k-means algorithm, which is utilised to assess the effectiveness of the lossless compression technique. The aforementioned factors are computed in order to evaluate performance.

1) Working of K-means

The following steps show the phases of the K-Means algorithm:

Step 1: Select the value of the K variable to fix the total number of clusters.

Step 2: Pick K centroids or places at random; they need not come from the input data set.

Step 3: Allocate each data point to the closest centroid, which yields the K predefined groupings.

Step 4: Determine the variance, then move each cluster's centroid.

Step 5: Re-assign each data point to the new centroid of each cluster by repeating Step 3.

Step 6: If there is a reassignment, proceed to Step 4; otherwise, go to COMPLETE.

Step 7: The algorithm is complete.

2) Evaluating the performance of K-means

Considering the total number of clusters, there is no definitive answer, because K-means cannot learn it from observations; it demands k as an input. Domain expertise and intuition may be helpful occasionally, even if that is not frequently the situation. Since the groups are employed in the follow-up analysis of the cluster-prediction approach, the models' performance can be evaluated on the basis of various K clusters.

In this paper, the elbow approach is the indicator used to choose K. Based on the sum of squared distances (SSE) between data points and the centroids of their assigned clusters, the elbow technique gives researchers an idea of what a suitable number of clusters k might look like. The value of k is found at the point where the SSE begins to flatten out and form an elbow. The geyser dataset will be used to evaluate SSE for different values of k to identify potential elbow and flattening points on the trajectory.

Fig. 2. Elbow graph

The graph shows that k=2 is an acceptable option. It might be difficult to decide how many clusters to employ when the curve is monotonically dropping, since there might not be an obvious elbow or a point where the curve starts to flatten out.

V. PROPOSED METHODOLOGY

A. System Mode

K-means is an unsupervised machine-learning algorithm, as was indicated in the introduction. Simply put, the main distinction between supervised and unsupervised techniques is that the former learn by example (labels are already in the dataset), while the latter learn by trial and error.

There are two features in the figure below: x1 and x2.
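The elbow evaluation described above can be sketched in code. The following is an illustrative stdlib-only sketch with made-up 1-D data (not the geyser dataset): SSE/WCSS is computed for several values of k, and the elbow is read off where the curve stops dropping sharply.

```python
def kmeans_wcss(points, k, iters=20):
    """Run a few Lloyd iterations on 1-D data and return the within-cluster
    sum of squares (SSE) for the final assignment."""
    # Deterministic, evenly spread seeding, purely for a repeatable demo
    centroids = points[::max(1, len(points) // k)][:k]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            d = [abs(p - c) for c in centroids]
            clusters[d.index(min(d))].append(p)
        centroids = [sum(cl) / len(cl) if cl else c
                     for cl, c in zip(clusters, centroids)]
    # SSE between each point and the centroid of its assigned cluster
    return sum((p - c) ** 2 for cl, c in zip(clusters, centroids) for p in cl)

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.0, 9.1, 8.8]  # three obvious bands
wcss = [kmeans_wcss(data, k) for k in range(1, 5)]
# SSE drops sharply up to k = 3 and flattens afterwards: the "elbow" is at 3
```

For this toy data the SSE falls from roughly 95 (k=1) to 23 (k=2) to about 0.2 (k=3), then barely changes at k=4, which is exactly the flattening the elbow graph visualizes.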


They can be thought of as the grid's 3D points for RGB. The


main objective of the suggested work's picture compression is
to decrease number of colors by taking the average K colors that
are most reminiscent of the original picture. An image with
fewer colors takes up less disc space, which is what proposed
method desire.
Compressing the picture might also be a preparatory step for
another method. Lowering the size of the input data often speeds
up learning.
The effectiveness of the K-means technique hinges on creating highly efficient clusters, but it is difficult to determine the appropriate number of clusters. Although there are various ways to figure out how many clusters are ideal, this paper concentrates on the elbow method, explained below.

Fig. 3. Data Points

Each object needs to be sorted into one of the clusters. The most natural way to do so is shown in the following figure:

Fig. 4. Sorted Cluster

K-means is an algorithm that achieves exactly this. Recall that K is the desired number of clusters. The method starts by randomly picking K training examples as the initial centroids μ1, μ2, ..., μK, with each μk ∈ Rn, where n is the number of features. At startup the centroids may lie quite close to one another, so the outcome should be checked, because the algorithm can get stuck in a local optimum.

The following actions are then carried out for a predetermined number of iterations:
• For every training example, assign c(i), the index of the closest centroid.
• For every centroid μk, set its location to the average of the examples assigned to it.

The elbow approach is one of the most widely used techniques for determining the ideal number of clusters. The total variation inside a cluster is denoted by the acronym WCSS, the Within Cluster Sum of Squares:

WCSS = Σ(i=1..K) Σ(x ∈ Ci) ||x − μi||² …(4)

The elbow approach follows these steps to determine the optimal cluster value:
• Apply K-means to the provided dataset for K values ranging from 1 to 10.
• Determine the WCSS for each value of K.
• Produce a graph that plots the computed WCSS values against K.
• The value of K at the sharp bend (the "elbow") of the plot is taken as optimal.

The figure below represents the elbow method:

Fig. 5. WCSS VS No. of Cluster
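The clustering loop and the elbow computation described above can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the paper's implementation; the function and variable names are illustrative.

```python
import numpy as np

def kmeans(X, k, max_iter=10, seed=0):
    """Minimal K-means: returns (centroids, assignments, WCSS)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct training examples chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assignment step: index c(i) of the closest centroid per example.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        c = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = np.array([
            X[c == j].mean(axis=0) if np.any(c == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):  # centroids stopped moving
            break
        centroids = new_centroids
    # Within Cluster Sum of Squares, eq. (4).
    wcss = ((X - centroids[c]) ** 2).sum()
    return centroids, c, wcss

# Elbow method: run K-means for K = 1..10, recording the WCSS for each K.
X = np.random.default_rng(1).random((200, 3))   # stand-in for pixel data
wcss_curve = [kmeans(X, k)[2] for k in range(1, 11)]
# WCSS shrinks as K grows; the sharp bend of this curve suggests the K to use.
```

Plotting `wcss_curve` against K reproduces the elbow graph of Fig. 5.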
As the algorithm advances, the centroids normally migrate to the centers of the clusters, and the overall distance of the examples to their clusters decreases. This technique is utilised in the proposed model: an image's pixels are made up of three values, R(ed), G(reen), and B(lue).

B. Architecture
1) Algorithms
The K-means clustering works as shown by the following algorithm:

Copyright © IEEE–2024 ISBN: 979-8-3503-8354-6 1930



Step 1: Using the elbow technique, select K, the number of clusters.
Step 2: Randomly choose K centroids or locations; they need not be points from the supplied data set.
Step 3: Assign each data point to its nearest centroid, which produces the required K groups.
Step 4: Determine the variance and move the centroid of each cluster.
Step 5: Repeat Step 3 to reassign each data point to the new centroid of its cluster.
Step 6: If any reassignment occurred, go back to Step 4; if not, go to COMPLETE.
Step 7: Algorithm completed.

2) Flowchart

Fig. 6. Flowchart

3) Working
The initial step begins by implementing a function that creates the starting points for the centroids. This function picks k unique points at random, with k chosen by the elbow method, from the input X, the training examples.

Then a method is constructed to locate the nearest centroid for each training example; this is the first step of the algorithm. It takes X and the centroids as input and returns c, an m-dimensional vector holding, for each example, the index of its closest centroid.

In the second step of the algorithm, each example's distance from "its" centroid is calculated, and those distances are averaged for each of the k centroids. Transposing the examples is necessary because the loop runs over the rows.

Finally, all the pieces are combined to complete the K-means algorithm. The maximum number of iterations allowed, max_iter, is set to 10. Notably, if the centroids stop moving, a return statement ends the loop early, because there is no more room for optimization.

Now, in the next step, the image is processed using the k information already computed. The picture is given as the first (and final) command line argument, so the program begins by attempting to open it. The model receives an Image object from Pillow, but the algorithm needs a NumPy array, so a small helper function converts between the two. For the pixels to be scaled to 0...1, each value is divided by 255. Then the model gets the feature matrix X: the image is reshaped because each pixel has the same meaning (a color), so the pixels do not have to be presented as a grid. Finally, the model is able to use the algorithm to obtain K colors; these colors are chosen by the algorithm.

Output:
[336519.9994879626, 141611.69711384104, 86682.93995564576, 64686.94497731874, 50913.2632977519, 40686.22682230402, 32979.73628183274, 28492.671767520664, 25375.6265720664, 23094.491972583917, 20871.595122360202, 18868.510838939113, 17315.16554641331, 15905.816009267488, 14559.14441807909, 13297.934030623528, 12350.45380372168, 11492.53319826621, 10974.709382505742, 10364.74151021806]

Fig. 7. Elbow Graph

Because the indexes supplied by the find-k-means function are one iteration behind the colors, the indexes are recomputed for the current colors. Each pixel then holds a value in the range 0 to K, which naturally corresponds to its color.

When all the necessary information is obtained, the picture is reconstructed by switching each colour index back to its colour and resizing the result to the original dimensions. The raw numbers are then transformed back into an image using the Pillow method Image.fromarray. Additionally, the indexes are converted to integers, because NumPy only accepts integers as matrix indexes.

VI. EXPERIMENTATION AND ANALYSIS
A. Experimental Setup
The K-means image compression experiment with Python, NumPy, and Pillow is set up as follows. First, Python should be installed. The next task is to install the required third-party libraries, NumPy and Pillow (the os and sys modules ship with Python). Then the dataset is prepared using NumPy and Pillow. After this, the setup is ready to implement the proposed model of image compression.

B. Experimental Parameters
1) Average K Color: K refers to the number of clusters that the algorithm was able to recognise in the data. The data points are clustered in this approach so as to make the total squared distance between the data points and their centroid as small as feasible. The recommended model states
X. Redesigning of the image is done because each pixel has the centroid as short as feasible. The recommended model states


that the size of the image is decreased by averaging the k colours that seem closest to the original image.
2) Distance: Every observation is assessed and placed in the nearest cluster; the distances to all centroids are computed in order to determine which group is "closest". The K-means clustering algorithm updates a cluster's centroid whenever the cluster gains or loses a data point.
3) Maximum Iterations: The algorithm iterates until the cluster centers change by the smallest amount from the previous iteration. If the clusters are uniformly spherical in shape, K-means is particularly good at identifying the structure and drawing conclusions from the data. However, the method does a poor job of grouping the data if the clusters have more intricate geometric shapes.

D. Analysis

TABLE II. COMPARISON TABLE

S.No  Compression Technique   K-mean                RLS-DLA          VQ
1     Saving Percentage       71.49%                30%              -
2     Compression Type        Lossless              Lossy, lossless  Lossy
3     File Extension          All image extensions  JPG and TIF      -
4     Compression Ratio       0.285                 1.3              5.053101e-01
5     Compression Factor      3.507                 0.769            0.0785
6     Compression Time        494 sec               -                6 sec
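The K-mean column of the comparison above can be reproduced from the file sizes reported in the paper's evaluation (228 kB before, 65 kB after compression), following the ratio, factor, and saving-percentage definitions used in this section. A small helper sketch (names are illustrative):

```python
def compression_metrics(size_before, size_after):
    """Compression ratio, factor, and saving percentage (eqs. 5-7)."""
    cr = size_after / size_before                         # compression ratio
    cf = size_before / size_after                         # compression factor
    sp = (size_before - size_after) / size_before * 100   # saving percentage
    return cr, cf, sp

cr, cf, sp = compression_metrics(228, 65)  # sizes in kB, from the evaluation
print(round(cr, 3), round(cf, 3), round(sp, 2))
# -> 0.285 3.508 71.49 (the paper truncates the factor to 3.507)
```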
4) Compression Ratio:

CR = size after / size before …(5)

5) Compression Factor:

CF = size before / size after …(6)

6) Saving Percentage:

SP = ((size before − size after) / size before) × 100% …(7)

7) Compression Time: If an algorithm's compression time is reasonable, the algorithm is acceptable with respect to the time factor. This value may become very small as high-speed computing hardware evolves, and it is influenced by how fast the computer is.

Compression Time = time after compression − time before compression

8) Within Cluster Sum of Squares: Used in the elbow method; "WCSS" denotes the total variation inside a cluster:

WCSS = Σ(i=1..K) Σ(x ∈ Ci) ||x − μi||² …(8)

VII. CONCLUSION AND FUTURE SCOPE
It is clever to use image compression to reduce an image's size while keeping its resolution. The k-means clustering algorithm is implemented for image compression in this study. The WCSS value is calculated with the elbow method to get the appropriate value of k. Using K-means clustering, an unsupervised technique, the proposed methodology achieves a lossless compressed image.

Fig. 8. Image Compression

Each compression is ideal in its own context. Compression techniques that can compress data at higher compression ratios while preserving image quality will be taken into consideration in the near future. Future developments in technology may allow for picture compression that takes into account both image symmetry and redundancy.
C. Performance Evaluation
K=5 resulted in a 71% reduction in file size, from 228 kB to 65 kB.

TABLE I. EVALUATION TABLE

Average K Color                 20
Maximum Iterations              10
Compression Ratio               0.285
Compression Factor              3.507
Compression Time                494 sec
Within Cluster Sum of Squares   5
Saving Percentage               71.49%

REFERENCES
[1] G. S. K. Krishna Phani, B. Lavanya, and Sanjay Kumar Singh, “Study and Analysis of RLS-DLA Algorithm with Natural Image Compression Techniques,” 2010.
[2] Rishav Chatterjee, Alenrex Maity, and Rajdeep Chatterjee, “Image Compression Using VQ for Lossy Compression,” 2019.
[3] Xin Yuan and Raziel Haimi-Cohen, “Image Compression Based on Compressive Sensing: End-to-End Comparison with JPEG,” arXiv:1706.01000v3 [cs.CV], 19 Jan 2020.
[4] Venkateswaran Narasimhan and Y. V. Ramana Rao, “K-Means Clustering Based Image Compression in Wavelet Domain,” Information Technology Journal 6(1), January 2007, DOI: 10.3923/itj.2007.148.153.
[5] Nishant Kumar Singhai and Prateek Mishra, “Image Compression and Performance Comparison Using Hybrid (WPT+DCT) Transform Method,”


IJESRT, 3(10), October 2016, ISSN: 2394-7659, Impact Factor 2.785.
[6] Thamer Hameed, “Image Compression Using Neural Networks: A Review,” International Journal of Online and Biomedical Engineering (iJOE), International Association of Online Engineering (IAOE), 2018.
[7] Abhishek Thakur, Rajesh Kumar, Amandeep Bath, and Jitender Sharma, “Design of Image Compression Algorithm Using MATLAB,” International Journal of Electrical & Electronics Engineering (IJEEE), vol. 1, issue 1, Jan–Feb 2014, e-ISSN: 1694-2310, p-ISSN: 1694-2426, www.ijeee-apm.com.
[8] Er. Kapoor, “A Comparative Study of Block Truncating Coding, Wavelet, Fractal Image Compression & Embedded Zero Tree,” IJESRT Journal, 2018.
[9] Pravin B. Pokle and N. G. Bawane, “Lossy Image Compression using Discrete Cosine Transform,” National Conference on Innovative Paradigms in Engineering & Technology (NCIPET-2012), proceedings published by International Journal of Computer Applications (IJCA).
[10] Surbhi Singh and Vipin Kumar Gupta, “JPEG Image Compression and Decompression by Huffman Coding,” International Journal of Innovative Science and Research Technology, vol. 1, issue 5, August 2016, ISSN: 2456-2165.
[11] Emy Setyaningsih and Agus Harjoko, “Survey of Hybrid Image Compression Techniques,” International Journal of Electrical and Computer Engineering (IJECE), vol. 7, no. 4, August 2017, pp. 2206–2214, ISSN: 2088-8708, DOI: 10.11591/ijece.v7i4.pp2206-2214.
[12] Abdelhamid Mammeri, Brahim Hadjou, and Ahmed Khoumsi, “A Survey of Image Compression Algorithms for Visual Sensor Networks,” International Scholarly Research Network, ISRN Sensor Networks, vol. 2012, Article ID 760320, 19 pages, doi:10.5402/2012/760320.
[13] Andrea Vitali, Luigi Della Torre, Sebastiano Battiato, and Antonio Buemi, “Image Compression by Perceptual Vector Quantization,” 2000.
[14] Bibhas Chandra Dhara, “Image and video compression using block truncation coding and pattern fitting for fast decoding,” 2008.
[15] Sarang Bansod and Sweta Jain, “Improved Harmony Search Algorithm for Color Image Compression,” International Journal on Recent and Innovation Trends in Computing and Communication, vol. 2, issue 6, pp. 1669–1672, ISSN: 2321-8169.
[16] S. Burak, G. Carlo, T. Bernd, and G. C. Beaulieu, “Medical image compression based on the region of interest, with application to colon CT images,” 2001.
[17] R. Begleiter, R. El-Yaniv, and G. Yona, “On prediction using variable order Markov models,” J. Artif. Intell. Res. (JAIR), vol. 22, pp. 385–421, 2004.
[18] G. Davis, S. Mallat, and M. Avellaneda, “Greedy adaptive approximation,” J. Constr. Approx., vol. 13, pp. 57–98, 1997.
[19] R. Zhang, P. Isola, A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in CVPR, 2018.
[20] N. Ahmed, T. Natarajan, and K. R. Rao, “Discrete cosine transform,” IEEE Trans. Comput., vol. 23, no. 1, pp. 90–93, Jan. 1974.
[21] A. Krizhevsky, “Learning Multiple Layers of Features from Tiny Images,” Master’s thesis.
