Lossy Data Compression using K-Means Clustering on Retinal Images using RStudio

2021 3rd International Conference on Advances in Computing, Communication Control and Networking (ICAC3N) | 978-1-6654-3811-7/21/$31.00 ©2021 IEEE | DOI: 10.1109/ICAC3N53548.2021.9725647


S. Sivaarunagirinathan, Department of ECE, School of EEE, SASTRA Deemed University, Thanjavur, India
B. Ajith Bala, Department of ECE, School of EEE, SASTRA Deemed University, Thanjavur, India
Shaik Fairooz, Department of ECE, Malla Reddy Engineering College, Secunderabad, India
G. Sasi, Department of BME, Vel Tech Multi Tech Dr.Rangarajan Dr. Sakunthala Engineering College, Chennai, India
Har Narayan Upadhyay, Department of ECE, School of EEE, SASTRA Deemed University, Thanjavur, India
V. Elamaran, Department of ECE, School of EEE, SASTRA Deemed University, Thanjavur, India
[email protected]

Abstract—The massive volume of data limits its use on portable devices such as mobile phones and on memory-limited devices such as video game consoles. As a result, the demand for data compression methods in signal and image processing is growing rapidly, and meeting it calls for more sophisticated algorithms that achieve higher compression ratios. This article focuses on compressing retinal images with the k-means clustering technique, a popular unsupervised classification strategy in machine learning. Two retinal images are used as test cases. R, a widely used open-source statistical computing tool, is employed here with RStudio, an Integrated Development Environment (IDE). R (RStudio) has had an impact across many domains of research and engineering. According to the simulation results, the test retinal images had a compression ratio of 33.3 percent.

Keywords—Classification, k-means clustering, retinal image, RStudio.

I. INTRODUCTION
Digital Signal Processing (DSP) is a fast-evolving family of estimation and filtering algorithms that are frequently employed in signal analysis and processing. Examples include wireless communication systems design, sophisticated radar (sonar) systems, audio and speech signal processing, vibration analysis, imaging, biomedicine, and non-destructive control [1-5]. The following are some of the most important applications and domains of DSP [6-9]:

• Speech/audio signal processing
• Speech/audio compression
• Speech recognition
• Digital image processing
• Video compression
• Radar/sonar signal processing
• Digital communications
• Seismology
• Financial signal processing
• Biomedicine
• Multirate signal processing
• Statistical signal processing

The focus of this paper is on data compression in biological imaging. In this study, retinal images are used to demonstrate compression. Fig. 1 depicts the advantages of data (multimedia) compression. As the diagram shows, data compression [10,11] has numerous advantages.

Fig. 1. Various themes in medical and biological engineering.

Rather than using traditional DSP techniques like the discrete Fourier transform (DFT) or the fast Fourier transform (FFT), this study uses an unsupervised classification strategy, clustering, to compress data. The compression is implemented in R using RStudio, an Integrated Development Environment (IDE). R is a programming language that is frequently used in big data research applications. Fig. 2 illustrates a handful of the most popular big data analytics tools. At both the commercial and academic levels, data science is one of the fastest-growing fields. People frequently argue that data science is simply statistics viewed through a different lens [12-15].

There are two primary approaches for machine learning tasks: supervised learning and unsupervised learning. Supervised learning algorithms can be used to categorize labeled data, whereas unsupervised learning techniques can be used to cluster unlabeled data. The basic principles of clustering algorithms and the selection of important clustering algorithm parameters are discussed in [16]. That work also examines the use of clustering in image compression, highlights the issues that should be considered when employing clustering, and presents a real-world example of image compression using K-means.

Compression techniques based on k-means and k-means++ clustering have been devised and applied to thermographic images [17]. Except for the initialization of the centroids, the overall procedure has four phases that are the same for both techniques. The number of clusters determines the compression ratio and quality. A MATLAB GUI was created to run the algorithms, and subjective assessments were compared with objective RMS error, peak SNR, and compression ratio measures [17].

The rest of this article is structured as follows. Section 2 provides the materials and methods, which include classification algorithms, data compression using k-means clustering, and the advantages of R programming with RStudio. Section 3 describes the simulation results and discusses them. Finally, Section 4 presents the conclusions.

Fig. 2. Prominent data analytics tools.

II. MATERIALS AND METHODS
A. Classification Techniques
In data science applications, supervised and unsupervised learning play a critical part in making real-world decisions (predictions), on data sets ranging from small to massive (big data). The following are the most common and important approaches to supervised learning [18-21]:

• Regression (logistic, softmax)
• kNN (k-nearest neighbors)
• Decision Tree
• Random Forest
• Naïve Bayes
• Support Vector Machine (SVM)

Similarly, the most important unsupervised learning approaches are characterized as follows:

• Clustering (Agglomerative, Divisive)
• Expectation-Maximization (EM)
• Reinforcement Learning

Data and data analysis are essential in our daily lives. Data science and analysis software is widely used in a variety of areas, including education, healthcare, politics, finance, aerospace, and remote sensing.

B. K-Means Clustering
There is currently a high demand for automated systems that can partition data sets into clusters or groups. For example, Internet information (the World Wide Web) and digital libraries are constantly expanding, making it difficult to obtain meaningful material without depending on a search engine. Clustering is commonly used in fields such as gene data analysis (bioinformatics), image segmentation, object, character, and pattern recognition, machine learning, data mining, VLSI design, computer vision, and information retrieval, among others. Fig. 3 depicts a few of the wide-ranging uses of clustering. Big data science has become extremely popular in recent years as a result of advances in computing technology and capacity. Large, sparse, and high-dimensional data can all benefit from clustering. K-means clustering is one of the most extensively used clustering algorithms in practical applications. In the standard k-means clustering algorithm [22-24], the squared Euclidean distance is frequently used.

Fig. 3. Applications of clustering.

C. Data Compression using k-Means Clustering
The two most common tasks in data mining analysis are classification and clustering. The k-means clustering algorithm is based mostly on distance computations, such as the Euclidean formula in 2-dimensional space [25]:

$$\text{Euclidean distance} = \sqrt{(X_1 - X_2)^2 + (Y_1 - Y_2)^2} \tag{1}$$

Equation (1) can also be extended to a multidimensional space. In k-means, k is the number of groups or clusters into which the data are split; the algorithm works as follows [16,26]:

• Choose the number of clusters (for example, k = 6)
• Assign the locations of the centroids (for example, six centroids) randomly
• Calculate the distance between every centroid and each instance
• Assign the instance to the cluster that produces the minimum distance
• Move each centroid to the mean of the instances assigned to its cluster

a) Computation of K: The number of clusters, k, can be chosen from the homogeneity within each group by iteratively calculating the sum-of-squares. The preconfigured function (wssplot) calculates the sum-of-squares within each cluster and displays the curve, as illustrated in Fig. 4. The curve shows that beyond about five clusters the sum-of-squares changes very little. Hence, the optimum value may be derived from the elbow point of the curve, for example, k = 3.

Fig. 4. The curve using “wssplot”.

b) Deployment using R: A general-purpose programming language such as Python can be used efficiently for data analysis. Alternatively, an environment developed specifically for data management, rather than for general programming, can be employed; Matlab, SPSS, and Stata are examples. R is an open-source alternative to S that was created at the University of Auckland in 1993 by Ross Ihaka and Robert Gentleman. Much of R's strength comes from being a free, open-source data analytics tool. Its openness does not make it inferior to other platforms; it can perform everything from simple mathematical calculations through analytical computations to complex visualizations [27].

Compared with peer products like Matlab, Python, Stata, and SPSS, R can accomplish many tasks with ease, including data processing, statistics, data mining, and machine learning. The popularity of R was aided by RStudio, an IDE for R programming, and by over 15,000 freely available packages. R (RStudio) can speed up the learning process for statistics. As a result, R has become a de facto standard for mathematical and scientific computing. Even though commercial data analytics and mining tools produce good results, small and medium businesses continue to rely heavily on open-source statistical software [28,29].

III. RESULTS AND DISCUSSION
In this investigation, two test retinal images are used for data compression (see Fig. 5). Fig. 6 (see Fig. 8 for test image 2) shows the sum-of-squares versus the number of clusters, obtained with the R package's "wssplot" function. The number of clusters can be set to the optimal value using the plot in Fig. 6. Fig. 7 shows the final images after compression with 3, 4, and 5 clusters (see Fig. 9 for test image 2). Table 1 shows the peak signal-to-noise ratio (PSNR) and SNR results (see Table 2 for test image 2). The results show that using more clusters gives better SNR values. The compression ratio, a key performance statistic, is calculated as follows:

$$\text{Compression Ratio} = \frac{\text{input file size}}{\text{output file size}} = \frac{1.2 \times 10^{6}\ \text{bytes}}{3.6 \times 10^{6}\ \text{bytes}} = 33.33\% \tag{2}$$

Fig. 5. Test retinal images for compression: (a) Image 1; (b) Image 2.

Fig. 6. Plot using “wssplot” for test image 1.

Fig. 7. Compressed retinal images (image 1) using: (a) 3 clusters; (b) 4 clusters; (c) 5 clusters.

Fig. 8. Plot using “wssplot” for test image 2.
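The five k-means steps listed in Section II-C can be sketched in a few lines. The following is a minimal Python illustration, not the authors' R implementation: the function names (euclidean, kmeans, quantize), the optional init argument, and the six-pixel toy "image" are assumptions introduced here. Fixed starting centroids are passed so that the run is reproducible; with init=None the centroids are drawn at random, as in the listed steps.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension (Eq. (1), any dimension)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, k, init=None, iters=20, seed=0):
    """Plain k-means on a list of numeric tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Steps 1-2: fix k and place the centroids (random data points unless init is given).
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Steps 3-4: measure the distance to every centroid and pick the nearest one.
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        # Step 5: move each centroid to the mean of the instances assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, labels


def quantize(pixels, k, init=None):
    """Replace every pixel by the rounded centroid of its cluster (a k-colour palette)."""
    centroids, labels = kmeans(pixels, k, init=init)
    palette = [tuple(round(v) for v in c) for c in centroids]
    return [palette[lab] for lab in labels], palette


# Hypothetical toy "image": six RGB pixels forming two obvious colour groups.
pixels = [(250, 10, 10), (245, 15, 5), (255, 5, 12),
          (10, 10, 240), (5, 20, 250), (15, 12, 245)]
compressed, palette = quantize(pixels, k=2, init=[pixels[0], pixels[3]])
print(palette)     # the two representative colours
print(compressed)  # every pixel replaced by its representative colour
```

In a scheme of this kind the saving comes from storing the k palette entries once and only a small cluster index per pixel, which is why fewer clusters give a smaller file and a coarser image.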

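The sum-of-squares curve used to choose k (Section II-C, a) can be reproduced on toy data. The sketch below is written in Python and is not the wssplot function used in the paper; kmeans_1d, wss_curve, elbow, the evenly spaced starting centroids, the nine grey levels, and the 1% tolerance are all assumptions made for illustration.

```python
def kmeans_1d(values, k, iters=50):
    """k-means on scalar values with evenly spaced starting centroids (deterministic)."""
    data = sorted(values)
    n = len(data)
    centroids = [float(data[(2 * i + 1) * n // (2 * k)]) for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda c: (x - centroids[c]) ** 2)
            clusters[nearest].append(x)
        updated = [sum(m) / len(m) if m else centroids[i] for i, m in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


def wss_curve(values, max_k):
    """Total within-cluster sum of squares for k = 1..max_k."""
    curve = {}
    for k in range(1, max_k + 1):
        centroids, clusters = kmeans_1d(values, k)
        curve[k] = sum((x - c) ** 2 for c, members in zip(centroids, clusters) for x in members)
    return curve


def elbow(curve, tol=0.01):
    """Smallest k after which one more cluster removes less than tol of the k=1 total."""
    ks = sorted(curve)
    for k, nxt in zip(ks, ks[1:]):
        if (curve[k] - curve[nxt]) / curve[ks[0]] < tol:
            return k
    return ks[-1]


# Hypothetical grey levels with three well-separated groups.
gray = [10, 12, 14, 100, 102, 104, 200, 202, 204]
curve = wss_curve(gray, max_k=4)
print(curve)         # the sum-of-squares falls steeply up to k = 3, then flattens
print(elbow(curve))  # 3
```

The tolerance-based rule is only one way to read an elbow off the curve; in the paper the value is chosen by inspecting the plot.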
Fig. 9. Compressed retinal images (image 2) using: (a) 3 clusters;

TABLE I
PSNR AND SNR RESULTS USING TEST IMAGE 1

Number of Clusters    PSNR        SNR
3                     -19.7810    2.7821
4                     -19.7421    2.8204
5                     -19.6953    2.8665
6                     -19.6139    2.9597
7                     -19.6019    2.9647
8                     -19.5590    3.0017
9                     -19.4726    3.0870
10                    -19.4259    3.1332
11                    -19.3731    3.1857
12                    -19.3223    3.2364

TABLE II
PSNR AND SNR RESULTS USING TEST IMAGE 2

Number of Clusters    PSNR        SNR
3                     -19.5955    2.7466
4                     -19.4701    2.7689
5                     -19.4247    2.8176
6                     -19.4188    2.9222
7                     -19.4144    2.9267
8                     -19.3768    2.9278
9                     -19.2791    3.0612
10                    -19.2221    3.1184
11                    -19.1723    3.1681
12                    -19.1362    3.2035

IV. CONCLUSION
Linear systems, control systems, signals and systems, statistics, digital signal and image processing, audio and voice processing, and other areas benefit from the use of R, a strong open-source tool. This type of advanced statistical software helps students and researchers in both academia and industry develop research skills. This research shows how to apply the k-means clustering technique to a data compression application. Data compression offers substantial benefits such as reduced bandwidth, lower storage requirements, and faster information retrieval. As a review or survey, this work might be expanded to include a full comparison of traditional data compression algorithms with the clustering technique [30-34].

REFERENCES
[1] B. Sindhura, S. Ashwin, G. Rajkumar, V. Elamaran, and M. Sankar, “Useful Tips and Tricks on Digital Data Processing with Discrete Fourier Transform,” Proceedings of the 2nd International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 15-16 Feb. 2018, pp. 738-742.
[2] E. Abdulhay, V. Elamaran, M. Chandrasekar, V.S. Balaji, and K. Narasimhan, “Automated diagnosis of epilepsy from EEG signals using ensemble learning approach,” Pattern Recognition Letters, vol. 139, pp. 174-181, Oct. 2020.
[3] V. Elamaran, A. Praveen, M.S. Reddy, L.V. Aditya, and K. Suman, “FPGA Implementation of Spatial Image Filters using Xilinx System Generator,” Proceedings of the International Conference on Modeling Optimization and Computing (ICMOC-12), Procedia Engineering, vol. 38, pp. 2244-2249, 2012.
[4] R. Naveena, V. Darthy Rabecka, G. Rajkumar, and V. Elamaran, “Understanding Digital Filters from Theory to Practice using Matlab and Simulink,” International Journal of Pharmacy and Technology, vol. 7, no. 3, pp. 9923-9934, Jan. 2015.
[5] V. Elamaran, A. Aswini, V. Niraimathi, and D. Kokilavani, “FPGA implementation of audio enhancement using adaptive LMS filters,” Journal of Artificial Intelligence, vol. 5, no. 4, pp. 221-226, April 2012.
[6] M. Sundar Prakash Balaji, R. Jayabharathy, Betty Martin, A. Parvathy, R.K. Arvind Shriram, and V. Elamaran, “Exploring Modern Digital Signal Processing Techniques on Physiological Signals in Day-to-Day Life Applications,” Journal of Medical Imaging and Health Informatics, vol. 10, no. 1, pp. 93-98, Jan. 2020.
[7] M. Sundar Prakash Balaji, R. Jayabharathy, Nirmala Jegadeesan, L. Devasena, G. Venkat Babu, V. Elamaran, and V. Venkatraman, “Analysis of Energy Concentration of the Speech, EEG, and ECG Signals in Healthcare Applications – A Survey,” Journal of Medical Imaging and Health Informatics, vol. 10, no. 1, pp. 49-53, Jan. 2020.
[8] W.J. Wang, G.P. Zhang, L.M. Yang, V.S. Balaji, V. Elamaran, and N. Arunkumar, “Revisiting signal processing with spectrogram analysis on EEG, ECG, and Speech signals,” Future Generation Computer Systems, vol. 98, pp. 227-232, Sep. 2019.
[9] A. Spanias, T. Painter, and V. Atti, “Audio Signal Processing and Coding,” Wiley India, 2013.
[10] V. Elamaran and A. Praveen, “Comparison of DCT and wavelets in image coding,” Proceedings of the International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 10-12 Jan. 2012, pp. 1-4.
[11] B.M. Reddy, T.V. Subbareddy, S.O. Reddy, and V. Elamaran, “A tutorial review on data compression with detection of fetal heart beat from noisy ECG,” Proceedings of the International Conference on Control, Instrumentation, Communication and Computation Technologies (ICCICCT), Kanyakumari, India, 10-11 July 2014, pp. 1310-1314.
[12] J.S. Racine, “RStudio: A platform-independent IDE for R and Sweave,” Journal of Applied Econometrics, vol. 27, pp. 167-172, Jan. 2012.
[13] A. Kadiyala and A. Kumar, “Applications of R to evaluate environmental data science problems,” Environmental Progress and Sustainable Energy, vol. 36, pp. 1358-1364, 2017.
[14] N.R. Salim, K. Gopal, and A.F.M. Ayub, “Effects of using RStudio on statistics performance of Malaysian undergraduates,” Malaysian Journal of Mathematical Sciences, vol. 13, no. 3, pp. 419-437, Jan. 2019.
[15] E. Kavya, M. Agca, F. Adiguzel, and M. Cetin, “Spatial data analysis with R programming for environment,” Human and Ecological Risk Assessment, vol. 25, no. 6, pp. 1521-1530, 2019.

[16] X. Wan, “Application of K-means Algorithm in Image Compression,”
Proceedings of the IOP Conference Series: Materials Science and
Engineering, vol. 563, pp. 1-5, Aug. 2019.
[17] H. Biswas, S.E. Umbaugh, D.J. Marino, and J. Sackman,
“Comparison of K-means and K-means++ for image compression
with thermographic images,” Proceedings of SPIE – The International
Society for Optical Engineering, 11743, 12 April 2021, pp: 1-7.
[18] D.W. Liu, R.P. Jia, C.F. Wang, N. Arunkumar, K. Narasimhan, M.
Udayakumar, and V. Elamaran, “Automated detection of cancerous
genomic sequences using genomic signal processing and machine
learning,” Future Generation Computer Systems, vol. 98, no. 2, pp.
233-237, Sep.2019.
[19] T. Shen, Y. Nagai, M. Udayakumar, K. Narasimhan, R.K.A. Shriram,
N. Mohanraj, and V. Elamaran, “Automated genomic signal
processing for diseased gene identification,” Journal of Medical
Imaging and Health Informatics, vol. 9, pp. 1254-1261, Aug. 2019.
[20] G. Rajkumar, R. Jayabharathy, K. Narasimhan, N. Raju, M.
Easwaran, V. Elamaran, G. Ramirez-Gonzalez, and M. Burbano-
Fernandez, “Spectral and SNR improvement analysis of normal and
abnormal heart sound signals using different windows,” Future
Generation Computer Systems, vol. 92, pp. 438-443, Oct.2018.
[21] S. Chirag, “A Hands-On Introduction to Data Science,” Cambridge
University Press, 2020.
[22] E. Abdulhay, N. Arunkumar, K. Narasimhan, V. Elamaran, and V.
Venkatraman, “Gait and tremor investigation using machine learning
techniques for the diagnosis of Parkinson disease,” Future Generation
Computer Systems, vol. 83, pp. 366-373, Mar. 2018.
[23] J.J. Liu, K. Narasimhan, V. Elamaran, N. Arunkumar, M. Solarte, and
G. Ramirez-Gonzalez, “Clinical decision support system for
alcoholism detection using the analysis of EEG signals,” IEEE
Access, vol. 6, pp. 61457-61461, Oct. 2018.
[24] K. Narasimhan, and V. Elamaran, “Wavelet-based energy features
for diagnosis of melanoma from dermoscopic images,” International
Journal of Biomedical Engineering and Technology, vol. 20, no. 3,
pp. 243-252, Jan. 2016.
[25] A. Subasi, “Practical Guide for Biomedical Signals Analysis using
Machine Learning Techniques: A Matlab Based Approach,” Elsevier,
2019.
[26] J. Paek and J.G. Ko, “K-Means Clustering-Based Data Compression
Scheme for Wireless Imaging Sensor Networks,” IEEE Systems
Journal, vol. 11, no. 4, pp. 2652-2662, Dec. 2017.
[27] X. Wan, “Application of K-means Algorithm in Image Compression,”
Proceedings of the IOP Conference Series: Materials Science and
Engineering, vol. 563, pp. 1-5, Aug. 2019.
[28] C. Kamusoko, “Remote Sensing Image Classification in R,” Springer,
2019.
[29] M. Wegmann, B. Leutner, and S. Dech, “Remote Sensing and GIS for
Ecologists: Using Open Source Software,” Pelagic publishers, 2016.
[30] J. Stander, and L.D. Valle, “On Enthusing Students About Big Data
and Social Media Visualization and Analysis using R, RStudio, and
RMarkdown,” Journal of Statistics Education, vol. 25, no.2, pp. 60-
67, May 2017.
[31] N. Benakli, B. Kostadinov, A. Satyanarayana, and S. Singh,
“Introducing computational thinking through hands-on projects using
R with applications to calculus, probability, and data analysis,”
International Journal of Mathematical Education in Science and
Technology, vol. 48, no. 3, pp. 393-427, Dec.2016.
[32] S. Shoba, and R. Rajavel, “A new Genetic Algorithm based fusion
scheme in monaural CASA system to improve the performance of the
speech,” Journal of Ambient Intelligence and Humanized Computing,
vol. 11, no.3, pp. 433-446, Jan. 2020.
[33] S.B. Mohan, T.A. Raghavendiran, and R. Rajavel, “Patch based fast
noise level estimation using DCT and standard deviation,” Cluster
Computing-The Journal of Networks Software Tools and Applications,
vol. 22, no. 2, pp. 14495-14504, Nov. 2019.
[34] S. Shoba, and R. Rajavel, “Improving speech intelligibility in
monaural segregation system by fusing voiced and unvoiced speech
segments,” Circuits Systems and Signal Processing, vol. 38, no. 4, pp.
3573-3590, Aug.2019.
