Lossy Data Compression Using K-Means Clustering On Retinal Images Using RStudio
I. INTRODUCTION
Digital Signal Processing (DSP) is a fast-evolving paradigm of estimation and filtering algorithms that are frequently employed in signal analysis and processing. Its applications include wireless communication systems design, sophisticated radar (sonar) systems, audio and speech signal processing, vibrations, imaging, biomedicine, and non-destructive control, to name a few examples [1-5]. The following are some of the most important applications and domains of DSP [6-9]:
• Speech/audio signal processing
• Speech/audio compression
• Speech recognition
• Digital image processing
• Video compression
• Radar/sonar signal processing
• Digital communications
Fig. 1. Various themes in medical and biological engineering.
Rather than using traditional DSP techniques such as the discrete Fourier transform (DFT) or the fast Fourier transform (FFT), this study compresses data with an unsupervised learning strategy, namely clustering. The compression is implemented in the R programming language using RStudio, an Integrated Development Environment (IDE). R is frequently used in big data research applications. Fig. 2 illustrates a handful of the most popular big data analytics tools. Data science is one of the fastest-growing fields at both the commercial and academic levels. People frequently argue that data science is simply statistics viewed through a different lens [12-15].
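The following Python sketch illustrates how clustering can compress an image. The study itself works in R. The sketch shows palette-based vector quantization: after clustering, only the k centroid colours and one short label per pixel need to be stored. Decoding replaces each label by its centroid, and this is where information is lost. The function names, the 24-bit RGB input and the bit accounting in `storage_bits` are assumptions made for illustration. They are not the authors' file format.

```python
import math

# Hypothetical sketch: palette (codebook) + per-pixel label compression,
# assuming 24-bit RGB pixels given as (R, G, B) tuples.


def nearest(pixel, palette):
    """Index of the palette colour closest to `pixel` (squared Euclidean distance)."""
    return min(range(len(palette)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(pixel, palette[j])))


def encode(pixels, palette):
    """Replace every pixel by the index of its nearest centroid colour."""
    return [nearest(px, palette) for px in pixels]


def decode(labels, palette):
    """Rebuild an approximate (lossy) image by looking each label up in the palette."""
    return [palette[i] for i in labels]


def storage_bits(n_pixels, k):
    """Raw 24-bit RGB size vs. a k-colour palette plus ceil(log2 k)-bit labels."""
    raw = n_pixels * 24
    coded = k * 24 + n_pixels * max(1, math.ceil(math.log2(k)))
    return raw, coded
```

For one million pixels and k = 3, `storage_bits(1_000_000, 3)` gives (24000000, 2000072). Fewer clusters mean shorter labels and a smaller file, at the cost of a coarser reconstruction.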
Authorized licensed use limited to: MIT-World Peace University. Downloaded on August 14,2024 at 06:28:37 UTC from IEEE Xplore. Restrictions apply.
2021 3rd International Conference on Advances in Computing, Communication Control and Networking (ICACCCN
There are two primary approaches to machine learning tasks: supervised learning and unsupervised learning. Supervised learning algorithms can be used to classify labeled data, whereas unsupervised learning techniques can be used to cluster unlabeled data. The work in [16] discusses the basic principles of clustering algorithms and the selection of important clustering parameters. It also examines the use of clustering in image compression and highlights the issues that should be considered when employing clustering. Finally, it presents a real-world example of picture compression using k-means.
Compression techniques based on k-means and k-means++ clustering have been devised and applied to thermographic images [17]. Apart from the initialization of the centroids, the overall procedure has four phases that are the same for both techniques. The number of clusters determines the compression ratio and the quality. A MATLAB GUI was created to run the algorithms, and subjective assessments were compared with objective measures: RMS error, peak SNR, and compression ratio [17].
The remainder of this article is structured as follows. Section 2 presents the materials and methods, including classification algorithms, data compression using k-means clustering, and the advantages of R programming in RStudio. Section 3 presents the simulation results and discussion. Finally, Section 4 draws the conclusions.
• Support Vector Machine (SVM)
Similarly, the most important unsupervised learning approaches are listed below. Reinforcement learning, which is usually regarded as a separate paradigm, is listed alongside them:
• Clustering (Agglomerative, Divisive)
• Expectation-Maximization (EM)
• Reinforcement Learning
Data and data analysis are essential in our daily lives. Data science and analysis software is widely used in a variety of areas, including education, healthcare, politics, finance, aerospace, and remote sensing.
B. K-Means Clustering
There is currently a high demand for automated systems that can partition data sets into clusters or groups. For example, Internet information (the World Wide Web) and digital libraries are constantly expanding, which makes it difficult to obtain meaningful material without relying on a search engine. Clustering is commonly used in fields such as gene data analysis (bioinformatics), image segmentation, object, character, and pattern recognition, machine learning, data mining, VLSI design, computer vision, and information retrieval. Fig. 3 depicts a few of these wide-ranging uses of clustering. Big data science has become extremely popular in recent years owing to advances in computing technology and capacity. Clustering can benefit large, sparse, and high-dimensional data alike. K-means clustering is one of the most widely used clustering algorithms in practical applications. The standard k-means algorithm [22-24] typically uses the squared Euclidean distance to measure how far an instance lies from a cluster centroid.
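The standard k-means procedure (Lloyd's algorithm) alternates between two steps. First, each instance is assigned to its nearest centroid under the squared Euclidean distance. Second, each centroid moves to the mean of its members. The Python sketch below is an illustration only, and the function names are hypothetical. The initial centroids are passed in explicitly for reproducibility; practical implementations use random or k-means++ seeding instead.

```python
# Hypothetical sketch of Lloyd's k-means with the squared Euclidean distance.


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)),
                key=lambda j: squared_euclidean(p, centroids[j]))
            for p in points]


def kmeans(points, init_centroids, max_iter=100):
    """Alternate assignment and mean-update steps until the centroids stop moving."""
    centroids = [tuple(float(v) for v in c) for c in init_centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the centroid to the mean of the instances assigned to it.
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)
```

For example, `kmeans([(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)], [(1, 1), (8, 8)])` converges after two passes. It returns the labels `[0, 0, 0, 1, 1, 1]` and centroids of about (1.33, 1.33) and (8.33, 8.33).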
• Calculate the distance between every centroid and an instance
• Assign the instance to the cluster that yields the minimum distance
• Move each centroid to the mean of the instances assigned to its cluster
a) Computation of K: The number of clusters, k, can be determined from the homogeneity within each group by iteratively calculating the within-cluster sum of squares. The wssplot function calculates the within-cluster sum of squares for each candidate number of clusters and plots the resulting curve, as illustrated in Fig. 4. The curve shows that the results remain essentially unchanged beyond five clusters. Hence, the optimum value can be taken from the elbow point of the curve, for example, k = 3.
number of clusters can be set to the optimal value using the plot in Fig. 6. Fig. 7 shows the final images after compression with 3, 4, and 5 clusters (see Fig. 9 for test image 2). Table 1 shows the peak signal-to-noise ratio (PSNR) and SNR results (see Table 2 for test image 2). The findings show that using more clusters yields higher SNR values. The compression ratio, a key performance statistic, is calculated as follows:
$$\text{Compression Ratio} = \frac{\text{input file size}}{\text{output file size}} = \frac{1.2 \times 10^{6}\ \text{bytes}}{3.6 \times 10^{6}\ \text{bytes}} = 33.33\,\% \qquad (2)$$
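The compression ratio in Eq. (2) and the PSNR and SNR figures reported in Tables I and II can be computed along the lines of the Python sketch below. It uses textbook definitions: MSE-based PSNR with an 8-bit peak of 255, and SNR as signal energy over error energy, both in decibels. The paper does not state its exact formulas. These definitions always give non-negative PSNR for 8-bit images, so they are not necessarily the convention behind the negative PSNR values in the tables. The function names are hypothetical.

```python
import math

# Hypothetical helpers using textbook metric definitions; not the paper's code.


def compression_ratio_percent(input_bytes, output_bytes):
    """Eq. (2): input file size divided by output file size, as a percentage."""
    return 100 * input_bytes / output_bytes


def mse(original, compressed):
    """Mean squared error between two equal-length pixel sequences."""
    return sum((a - b) ** 2 for a, b in zip(original, compressed)) / len(original)


def psnr(original, compressed, peak=255):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(original, compressed)
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)


def snr(original, compressed):
    """Signal-to-noise ratio in dB: energy of the original over energy of the error."""
    noise = sum((a - b) ** 2 for a, b in zip(original, compressed))
    signal = sum(a ** 2 for a in original)
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)
```

For instance, `compression_ratio_percent(1.2e6, 3.6e6)` evaluates to about 33.33, matching Eq. (2).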
Fig. 5. Test retinal images for compression: (a) Image 1; (b) Image 2.
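The elbow procedure from the Computation of K step can also be reproduced outside R. The Python sketch below computes the total within-cluster sum of squares (WSS) for k = 1, ..., k_max, which is the quantity a wssplot-style curve displays. Several choices here are hypothetical: operating on scalar grey levels rather than colour pixels, the quantile initialization, the function names, and the `pick_elbow` rule with its 10% threshold. The study itself reads the elbow visually from Fig. 4.

```python
# Hypothetical sketch of the elbow method on scalar grey levels (an assumption).


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's k-means on scalar values, seeded at evenly spaced quantiles."""
    s = sorted(values)
    centroids = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: (v - centroids[i]) ** 2)
            clusters[nearest].append(v)
        # Move each centroid to its cluster mean; empty clusters stay put.
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids


def wss(values, centroids):
    """Total within-cluster sum of squares, charging each value to its nearest centroid."""
    return sum(min((v - c) ** 2 for c in centroids) for v in values)


def elbow_curve(values, k_max):
    """WSS for k = 1..k_max; plotting this list gives the elbow curve."""
    return [wss(values, kmeans_1d(values, k)) for k in range(1, k_max + 1)]


def pick_elbow(curve, min_gain=0.10):
    """Hypothetical rule: the smallest k after which one more cluster removes
    less than `min_gain` of the one-cluster WSS."""
    total = curve[0]
    for k in range(1, len(curve)):
        if total == 0 or (curve[k - 1] - curve[k]) / total < min_gain:
            return k
    return len(curve)
```

Consider three well-separated grey-level groups such as [10, 11, 12, 100, 101, 102, 200, 201, 202]. The WSS falls sharply up to k = 3 and barely changes afterwards, so `pick_elbow(elbow_curve(values, 5))` returns 3.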
Fig. 9. Compressed retinal images (image 2) using: (a) 3 clusters;
TABLE I
PSNR AND SNR RESULTS USING TEST IMAGE 1
Number of Clusters    PSNR        SNR
3                     -19.7810    2.7821
4                     -19.7421    2.8204
5                     -19.6953    2.8665
6                     -19.6139    2.9597
7                     -19.6019    2.9647
8                     -19.5590    3.0017
9                     -19.4726    3.0870
10                    -19.4259    3.1332
11                    -19.3731    3.1857
12                    -19.3223    3.2364
TABLE II
PSNR AND SNR RESULTS USING TEST IMAGE 2
Number of Clusters    PSNR        SNR
3                     -19.5955    2.7466
4                     -19.4701    2.7689
5                     -19.4247    2.8176
6                     -19.4188    2.9222
7                     -19.4144    2.9267
8                     -19.3768    2.9278
9                     -19.2791    3.0612
10                    -19.2221    3.1184
11                    -19.1723    3.1681
12                    -19.1362    3.2035
IV. CONCLUSION
Linear systems, control systems, signals and systems, statistics, digital signal and image processing, audio and voice processing, and other areas benefit from the use of R, a powerful open-source tool. Such advanced statistical software helps students and researchers in both academia and industry develop their research abilities. This research shows how to apply the k-means clustering technique to a data compression application. Data compression offers substantial benefits such as reduced bandwidth, lower storage requirements, and faster information retrieval. This work could be extended into a review or survey that fully compares traditional data compression algorithms with the clustering technique [30-34].
REFERENCES
[1] B. Sindhura, S. Ashwin, G. Rajkumar, V. Elamaran, and M. Sankar, “Useful Tips and Tricks on Digital Data Processing with Discrete Fourier Transform,” Proceedings of the 2nd International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 15-16 Feb. 2018, pp. 738-742.
[2] E. Abdulhay, V. Elamaran, M. Chandrasekar, V.S. Balaji, and K. Narasimhan, “Automated diagnosis of epilepsy from EEG signals using ensemble learning approach,” Pattern Recognition Letters, vol. 139, pp. 174-181, Oct. 2020.
[3] V. Elamaran, A. Praveen, M.S. Reddy, L.V. Aditya, and K. Suman, “FPGA Implementation of Spatial Image Filters using Xilinx System Generator,” Proceedings of the International Conference on Modeling Optimization and Computing (ICMOC-12), Procedia Engineering, vol. 38, pp. 2244-2249, 2012.
[4] R. Naveena, V. Darthy Rabecka, G. Rajkumar, and V. Elamaran, “Understanding Digital Filters from Theory to Practice using Matlab and Simulink,” International Journal of Pharmacy and Technology, vol. 7, no. 3, pp. 9923-9934, Jan. 2015.
[5] V. Elamaran, A. Aswini, V. Niraimathi, and D. Kokilavani, “FPGA implementation of audio enhancement using adaptive LMS filters,” Journal of Artificial Intelligence, vol. 5, no. 4, pp. 221-226, April 2012.
[6] M. Sundar Prakash Balaji, R. Jayabharathy, Betty Martin, A. Parvathy, R.K. Arvind Shriram, and V. Elamaran, “Exploring Modern Digital Signal Processing Techniques on Physiological Signals in Day-to-Day Life Applications,” Journal of Medical Imaging and Health Informatics, vol. 10, no. 1, pp. 93-98, Jan. 2020.
[7] M. Sundar Prakash Balaji, R. Jayabharathy, Nirmala Jegadeesan, L. Devasena, G. Venkat Babu, V. Elamaran, and V. Venkatraman, “Analysis of Energy Concentration of the Speech, EEG, and ECG Signals in Healthcare Applications – A Survey,” Journal of Medical Imaging and Health Informatics, vol. 10, no. 1, pp. 49-53, Jan. 2020.
[8] W.J. Wang, G.P. Zhang, L.M. Yang, V.S. Balaji, V. Elamaran, and N. Arunkumar, “Revisiting signal processing with spectrogram analysis on EEG, ECG, and Speech signals,” Future Generation Computer Systems, vol. 98, pp. 227-232, Sep. 2019.
[9] A. Spanias, T. Painter, and V. Atti, “Audio Signal Processing and Coding,” Wiley India, 2013.
[10] V. Elamaran and A. Praveen, “Comparison of DCT and wavelets in image coding,” Proceedings of the International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 10-12 Jan. 2012, pp. 1-4.
[11] B.M. Reddy, T.V. Subbareddy, S.O. Reddy, and V. Elamaran, “A tutorial review on data compression with detection of fetal heart beat from noisy ECG,” Proceedings of the International Conference on Control, Instrumentation, Communication and Computation Technologies (ICCICCT), Kanyakumari, India, 10-11 July 2014, pp. 1310-1314.
[12] J.S. Racine, “RStudio: A platform-independent IDE for R and Sweave,” Journal of Applied Econometrics, vol. 27, pp. 167-172, Jan. 2012.
[13] A. Kadiyala and A. Kumar, “Applications of R to evaluate environmental data science problems,” Environmental Progress and Sustainable Energy, vol. 36, pp. 1358-1364, 2017.
[14] N.R. Salim, K. Gopal, and A.F.M. Ayub, “Effects of using RStudio on statistics performance of Malaysian undergraduates,” Malaysian Journal of Mathematical Sciences, vol. 13, no. 3, pp. 419-437, Jan. 2019.
[15] E. Kavya, M. Agca, F. Adiguzel, and M. Cetin, “Spatial data analysis with R programming for environment,” Human and Ecological Risk Assessment, vol. 25, no. 6, pp. 1521-1530, 2019.
[16] X. Wan, “Application of K-means Algorithm in Image Compression,”
Proceedings of the IOP Conference Series: Materials Science and
Engineering, vol. 563, pp. 1-5, Aug. 2019.
[17] H. Biswas, S.E. Umbaugh, D.J. Marino, and J. Sackman,
“Comparison of K-means and K-means++ for image compression
with thermographic images,” Proceedings of SPIE – The International
Society for Optical Engineering, vol. 11743, 12 April 2021, pp. 1-7.
[18] D.W. Liu, R.P. Jia, C.F. Wang, N. Arunkumar, K. Narasimhan, M.
Udayakumar, and V. Elamaran, “Automated detection of cancerous
genomic sequences using genomic signal processing and machine
learning,” Future Generation Computer Systems, vol. 98, no. 2, pp.
233-237, Sep. 2019.
[19] T. Shen, Y. Nagai, M. Udayakumar, K. Narasimhan, R.K.A. Shriram,
N. Mohanraj, and V. Elamaran, “Automated genomic signal
processing for diseased gene identification,” Journal of Medical
Imaging and Health Informatics, vol. 9, pp. 1254-1261, Aug. 2019.
[20] G. Rajkumar, R. Jayabharathy, K. Narasimhan, N. Raju, M.
Easwaran, V. Elamaran, G. Ramirez-Gonzalez, and M. Burbano-
Fernandez, “Spectral and SNR improvement analysis of normal and
abnormal heart sound signals using different windows,” Future
Generation Computer Systems, vol. 92, pp. 438-443, Oct. 2018.
[21] C. Shah, “A Hands-On Introduction to Data Science,” Cambridge
University Press, 2020.
[22] E. Abdulhay, N. Arunkumar, K. Narasimhan, V. Elamaran, and V.
Venkatraman, “Gait and tremor investigation using machine learning
techniques for the diagnosis of Parkinson disease,” Future Generation
Computer Systems, vol. 83, pp. 366-373, Mar. 2018.
[23] J.J. Liu, K. Narasimhan, V. Elamaran, N. Arunkumar, M. Solarte, and
G. Ramirez-Gonzalez, “Clinical decision support system for
alcoholism detection using the analysis of EEG signals,” IEEE
Access, vol. 6, pp. 61457-61461, Oct. 2018.
[24] K. Narasimhan and V. Elamaran, “Wavelet-based energy features
for diagnosis of melanoma from dermoscopic images,” International
Journal of Biomedical Engineering and Technology, vol. 20, no. 3,
pp. 243-252, Jan. 2016.
[25] A. Subasi, “Practical Guide for Biomedical Signals Analysis using
Machine Learning Techniques: A Matlab Based Approach,” Elsevier,
2019.
[26] J. Paek and J.G. Ko, “K-Means Clustering-Based Data Compression
Scheme for Wireless Imaging Sensor Networks,” IEEE Systems
Journal, vol. 11, no. 4, pp. 2652-2662, Dec. 2017.
[27] X. Wan, “Application of K-means Algorithm in Image Compression,”
Proceedings of the IOP Conference Series: Materials Science and
Engineering, vol. 563, pp. 1-5, Aug. 2019.
[28] C. Kamusoko, “Remote Sensing Image Classification in R,” Springer,
2019.
[29] M. Wegmann, B. Leutner, and S. Dech, “Remote Sensing and GIS for
Ecologists: Using Open Source Software,” Pelagic publishers, 2016.
[30] J. Stander and L.D. Valle, “On Enthusing Students About Big Data
and Social Media Visualization and Analysis using R, RStudio, and
RMarkdown,” Journal of Statistics Education, vol. 25, no. 2, pp. 60-67, May 2017.
[31] N. Benakli, B. Kostadinov, A. Satyanarayana, and S. Singh,
“Introducing computational thinking through hands-on projects using
R with applications to calculus, probability, and data analysis,”
International Journal of Mathematical Education in Science and
Technology, vol. 48, no. 3, pp. 393-427, Dec. 2016.
[32] S. Shoba and R. Rajavel, “A new Genetic Algorithm based fusion
scheme in monaural CASA system to improve the performance of the
speech,” Journal of Ambient Intelligence and Humanized Computing,
vol. 11, no. 3, pp. 433-446, Jan. 2020.
[33] S.B. Mohan, T.A. Raghavendiran, and R. Rajavel, “Patch based fast
noise level estimation using DCT and standard deviation,” Cluster
Computing: The Journal of Networks, Software Tools and Applications,
vol. 22, no. 2, pp. 14495-14504, Nov. 2019.
[34] S. Shoba and R. Rajavel, “Improving speech intelligibility in
monaural segregation system by fusing voiced and unvoiced speech
segments,” Circuits Systems and Signal Processing, vol. 38, no. 4, pp.
3573-3590, Aug. 2019.