This article has been accepted for publication in IEEE Access.

This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3339574

Enhancing Ocular Healthcare: Deep Learning-Based Multi-class Diabetic Eye Disease Segmentation and Classification
Maneesha Vadduri¹ and Kuppusamy P¹, Member, IEEE
¹School of Computer Science and Engineering, VIT-AP University, Amaravathi, Andhra Pradesh, 522237 INDIA

Corresponding author: Kuppusamy P (e-mail: [email protected]).

ABSTRACT Diabetic Eye Disease (DED) is a serious retinal illness that affects diabetics. The timely
identification and precise categorization of multi-class DED within retinal fundus images play a pivotal role
in mitigating the risk of vision loss. The development of an effective diagnostic model using retinal fundus
images relies significantly on both the quality and quantity of the images. This study proposes a
comprehensive approach to enhance and segment retinal fundus images, followed by multi-class
classification employing pre-trained and customized Deep Convolutional Neural Network (DCNN) models.
Four pre-trained models, ResNet50, VGG-16, Xception, and EfficientNetB7, were first evaluated on the raw
retinal fundus dataset, and EfficientNetB7 was identified as the best-performing model.
Then, image enhancement approaches, including green channel extraction, Contrast-Limited
Adaptive Histogram Equalization (CLAHE), and illumination correction, were applied to these raw
images. Subsequently, image segmentation methods such as the Tyler Coye algorithm, Otsu thresholding,
and the Circular Hough Transform were employed to extract essential Regions of Interest (ROIs) such as the
optic nerve, Blood Vessels (BV), and the macular region from the raw ocular fundus images. After
preprocessing, the proposed customized DCNN model was trained on these images and outperformed the
four pre-trained models. The proposed DCNN methodology shows promising results for the Cataract (CA),
Diabetic Retinopathy (DR), Glaucoma (GL), and NORMAL detection tasks, achieving accuracies of 96.43%,
98.33%, 97%, and 96%, respectively. The experimental evaluations highlighted the efficacy of the proposed
approach in achieving accurate and reliable multi-class DED classification results, showcasing the promising
potential for early diagnosis and personalized treatment. This contribution could lead to improved healthcare
outcomes for diabetic patients.

INDEX TERMS Deep Convolutional Neural Network, Diabetic Eye Diseases, Image Enhancement,
Image Segmentation, Retinal Fundus Images.
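
The abstract names green channel extraction and Otsu thresholding among the enhancement and segmentation steps. The following is a minimal sketch of those two steps only, written with the Python standard library and applied to a toy image; it is not the authors' implementation. The function names (`green_channel`, `otsu_threshold`, `binarize`) and the nested-list image representation are assumptions made for illustration, and a practical pipeline would more likely rely on an image processing library.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each component in 0..255


def green_channel(image: Sequence[Sequence[Pixel]]) -> List[List[int]]:
    """Keep only the green component of every RGB pixel."""
    return [[pixel[1] for pixel in row] for row in image]


def otsu_threshold(gray: Sequence[Sequence[int]], levels: int = 256) -> int:
    """Return the grey level that maximizes the between-class variance.

    Pixels with value <= threshold form one class, the rest form the other.
    The variance is left unnormalized (pixel counts instead of
    probabilities), which does not change where the maximum lies.
    """
    hist = [0] * levels
    for row in gray:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_low = 0  # pixel count of the lower class so far
    sum_low = 0     # intensity sum of the lower class so far
    for level in range(levels):
        weight_low += hist[level]
        sum_low += level * hist[level]
        if weight_low == 0:
            continue  # lower class is still empty
        weight_high = total - weight_low
        if weight_high == 0:
            break  # upper class is empty from here on
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold


def binarize(gray: Sequence[Sequence[int]], threshold: int) -> List[List[int]]:
    """Mark pixels brighter than the threshold with 1, all others with 0."""
    return [[1 if value > threshold else 0 for value in row] for row in gray]


if __name__ == "__main__":
    # Toy 2x3 RGB "image": dark background with a few bright pixels.
    toy = [
        [(30, 20, 10), (40, 25, 12), (200, 220, 90)],
        [(35, 22, 11), (210, 230, 95), (205, 225, 92)],
    ]
    green = green_channel(toy)
    t = otsu_threshold(green)
    print("threshold:", t)
    print("mask:", binarize(green, t))
```

The green channel is used here because the abstract lists it as the first enhancement step; the threshold is then chosen from the image histogram alone, so no manually tuned cut-off is needed.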

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

I. INTRODUCTION
According to the World Health Organization (WHO), around 2.2 billion people throughout the world are blind or visually impaired [1]. Among these, at least 1 billion cases are avoidable. Diabetes mellitus, usually called diabetes, is believed to play a role in these occurrences of blindness [2]. Most people with diabetes will eventually develop DED, and due to its high sensitivity in the diagnosis of DED, retinal fundus imaging has become the most widely used technology for detecting DED [2].
DED encompasses CA, DR, and GL, and some examples of lesions that must be recognized from retinal images are shown in Fig. 1. These include deterioration of the lens (CA); abnormal BV growth and narrow bulges or rupturing of the retina's tiny BV (microaneurysms) in the earliest stages of DR; and elevated intraocular pressure (GL), which is the leading cause of irreversible optic nerve damage and blindness. To treat these conditions effectively, accurate diagnosis and identification are essential [1], [2]. This motivates proactive solutions for detection and prevention that fulfill the many needs associated with retinal diseases and visual disabilities throughout a person's life. The application of Deep Learning (DL) in automated DED diagnostics is crucial for solving these problems [3], [4]. Professional ophthalmologists agree that timely screening for

DED is essential for an effective diagnosis, but this screening takes a lot of time and effort [5].

FIGURE 1. Fundus images with problems caused by DED

While DL has shown outstanding validation accuracies for binary (healthy or diseased) classification, findings for moderate and multi-class classification have been less striking, especially for mild impairment. Therefore, this study introduces an automatic multi-class DED classification model based on DCNN that can distinguish normal from diseased tissue in images. First, a comparison of diverse Convolutional Neural Network (CNN) architectures is conducted to determine the optimal one for classifying mild and multi-class DED. This model's goal is to improve upon the already impressive performance levels observed in the aforementioned works. Therefore, moderate and multi-class classification models were trained and tested to enhance sensitivity for the different multi-class DED. This involved implementing various pre-processing and augmentation strategies to enhance result accuracy further and ensure a sufficient sample size for the dataset.
Treating ocular diseases as soon as possible is crucial, but doing so with the aid of neural networks consumes a significant amount of time and storage space. Because of this, a pre-trained model can improve the process by adjusting the design to cut down on losses. Pre-trained CNN networks are useful in DL because they allow knowledge to be transferred from one task to another with a smaller set of data or less time spent on training [6]. Fine-tuning the pre-trained network is widely recognized as a prominent strategy in transfer learning. It is standard practice to apply various preprocessing techniques to image datasets, including resizing, quantifying, standardizing, and enhancing images. These steps are taken prior to training CNN architectures, regardless of whether the training employs a pre-existing model or a newly developed one. Improving the CNN model's classification accuracy is an ongoing pursuit; moreover, the model's accuracy relies heavily on the quality of both the training dataset and the images within it.

A. MOTIVATION
The recent advances in the domains of artificial intelligence, DL, and computer vision have allowed DL to be applied to produce outstanding outcomes in image categorization and vision applications. Early detection of lesions and anomalies in ocular fundus images is still an outstanding issue. A previous study found that 93% of moderate cases are incorrectly categorized as normal eyes and that deep neural networks have trouble learning enough detailed information to recognize components of mild disease [7]. Therefore, this study presents a system that combines standard image processing methods with a state-of-the-art CNN to assess multi-class DED.

B. CONTRIBUTIONS
The contributions made by this research are as follows:
• Integrate a holistic strategy for the accurate diagnosis of multi-class DED through the utilization of retinal fundus images. This approach encompasses image enhancement, segmentation, and classification techniques to achieve enhanced diagnostic accuracy.
• Employ four pre-trained models, ResNet50, VGG-16, Xception, and EfficientNetB7, experiment with the raw ocular fundus dataset, and identify the best-performing model.
• Develop a new customized DCNN model and train it using images of the retina that have undergone pre-processing and segmentation.
• Investigate and compare the optimal pre-trained model and the new customized DCNN model. This demonstrated the significance of the preprocessing steps in improving the overall classification accuracy.

II. LITERATURE REVIEW
To spot DED in ocular fundus images early on, clinicians need a method that lets them see a full complement of features and pinpoint their precise location within the image [8]. Lens degeneration, dilated BV (microaneurysms), vascular leakage, and impairment of the optic nerve all need to be present on retinal fundus images to diagnose multi-class DED in diabetic individuals. Figure 1 depicts the progression of DED.
Previously, automated DED diagnoses were examined to reduce the ophthalmologist's workload and improve the consistency of diagnosis [9]. Lesion-based detection has been applied in previous research; for example, a novel model was proposed for identifying microaneurysms in ocular fundus images. Methods such as BV segmentation, localization, and elimination of the fovea are used as part of their preprocessing effort. Following that, a hybrid system comprising neural networks and fuzzy logic models was employed to accomplish the aforementioned tasks of feature extraction and classification [5].

Their research looked at the problem of dividing DR into two groups defined by the presence or absence of microaneurysms. In addition, diagnosis of DED can be made with a variety of features other than microaneurysms. Similarly, a classification model based on pixels was presented to evaluate the intensity of ocular illness after segmenting the affected area and pinpointing the anomaly [10]. Backpropagation neural networks, fed with data from decision trees and GA-CFS (Genetic Algorithm-Correlation based Feature Selection) methods, were used to identify exudates in DR by dividing healthy eyes and those with exudates into two groups. The achieved results did not give sufficient classification accuracy and did not lead to effective noise removal [11].
A Fuzzy C-Means algorithm and clustering analysis were employed to create a method for identifying exudates. Optic Disc (OD) finding and cauterization of the BV are crucial to their work. The results obtained allow the exudates to be classified without relying on any defining criteria [12]. The technique presented relies on segmenting both the OD and the Optic Cup (OC). The suggested model makes use of two neural networks operating simultaneously, with one focusing on the OC and the other on the OD. With the goal of proficient segmentation, the suggested method targets the OD and the OC within an ocular fundus image. There are no available outcomes from a classification of GL in multiple stages [13]. The use of CNN to recognize DR in fundus images was presented. The authors were able to achieve 90% specificity and sensitivity by using larger non-public datasets consisting of 80,000 to 120,000 ocular fundus images for binary classification between "normal," "mild," and "severe" [14], [15].
To identify retinal BV, 2D matching filters were used [16]. Gabor filter bank outputs were employed to automatically detect and classify anomalies in the vascular network, allowing the recognition of all stages of retinopathy [17]. There are numerous conventional methods for diagnosing and categorizing DED. The majority of methods make use of Fuzzy C-Means clustering, region-of-interest algorithms, mathematical morphology, neural networks, pattern recognition, and Gabor filtering methods [16], [17].
Numerous methods have been suggested to identify the OD, and one such approach is the utilization of Principal Component Analysis (PCA) to determine potential optic disc areas by clustering pixels of a similar brightness. The Hough Transform was utilized to detect optic discs [18]. An artificial neural network-driven method was employed for exudate identification [19]. Exudate detection was carried out using a method based on Fuzzy C-Means clustering [20]. A computational intelligence-based method was utilized [21]. Automated categorization of DR is attained through the evaluation of distinct attributes, which encompass exudates, hemorrhages, microaneurysms, and BV. This classification process is carried out utilizing a support vector machine [22].
To address the constraints posed by manually crafted features and make them applicable across a range of medical imaging techniques, the adoption of DL-based approaches becomes a feasible option. These approaches entail the acquisition of critical features through learning and then integrating these feature-learning processes into the model development process [23], [24]. A DL approach was investigated to assess the degree of nuclear CA severity from slit-lamp images. This technique involves inputting image patches into a CNN to generate the local filters. Furthermore, higher-order features were extracted using a set of Recursive Neural Networks (RNNs). The grading of CA was achieved using Support Vector Regression [25]. A CA detection experiment was conducted, utilizing the Kaggle dataset of 200 images. In this study, the AlexNet CNN architecture was combined with various common optimizers, including Adam, SGD, and others. The recommended system achieved a 77% accuracy when employing the Adam optimizer and an impressive 97.5% accuracy when utilizing the Lookahead optimizer with the AlexNet architecture [26]. A unique CNN model architecture ("Cataract Net") was formulated, characterized by its compact size, reduced number of layers and training parameters, as well as the use of smaller kernels to enhance computational efficiency. The approach demonstrated a remarkable accuracy of 99.13% for the two classes under study [27]. To identify CA severity from mild to severe, a computer-aided technique using fundus images was proposed. A CNN that had already been trained was transferred to the automated CA classification task as part of this strategy [28]. A classifier employing a Support Vector Machine (SVM) and achieving a four-stage Correct Classification Rate (CCR) of 92.91% was utilized for the classification task. A method for classifying CA disease known as Tournament-based Ranked CNN was introduced. This method employs a tournament structure along with binary CNN models for the classification process [29]. The CNNs and Res-Net-based trained classifier model enabled a system for automated CA identification with an accuracy of 95.78 percent [30].
Recently, a technique utilizing multiple models with attention mechanisms was presented for automated CA disease identification in ultrasound images, achieving an accuracy of 97.5% [31]. Using a pre-trained VGG-19 architecture on a dataset available on Kaggle, a comparable accuracy of 97.47% was achieved for fundus images [32], [33].
People with diabetes may become blind from DR since it has no early warning signs. Yet, DR's effects may be mitigated with early diagnosis. Automated DR diagnosis and classification were suggested [34]. Pre-processing, segmenting of images, extraction of features, and categorization are all rolled into one using this approach. A technique for enhancing local contrast was used on the greyscale images to make the area of interest more visible. Using an adaptive threshold approach and mathematical morphology, the lesion area was accurately segmented. Finally, enhanced categorization was achieved by merging statistical and geometric characteristics, leading to more

accurate outcomes. Those with DR are at risk for developing retinal complications including blood clots, lesions, and retinal hemorrhages. Retinal images are used to get a DR diagnosis. An approach for DR identification and categorization using a pre-trained CNN was developed. To improve the retrieved characteristics, a data refinement and augmentation technique was first used. Gaussian blur was used on the fundus image to decrease the quantity of noise in the picture. Accuracy was computed in the experimental setting [35].
A DL-based approach was suggested for DR classification, which would include the feature extraction of segmented fundus images. This method began with pre-processing the fundus image and then continued with segmentation. With the maximum principal curvature model, which prioritizes the greatest eigenvalues, the branching blood vessels can be eliminated. To enhance the quality and eliminate inaccuracies within the region, morphological opening and adaptive histogram equalization techniques were employed. Diabetes has been linked to increased optic nerve proliferation. The categorization of DR was carried out using a CNN which consists of three primary functional components: the pooling layer, the convolution layer, and the bottleneck layer. The results demonstrated a precision (pre) rate of 97.2% and an accuracy of 98.7%. Unfortunately, it was not possible to determine the duration of patients' distress [36].
Using an adaptive machine-learning technique, a DR categorization model was created. By this method, DR pictures may be recognized using their own classifiers and characteristics. DRE (Diabetic Retinopathy Estimation) at the segment level was achieved by using a modified, previously trained CNN. After that, the categorization of DR images was established by connecting lines between all DR maps at each segmentation level. In addition, a learning method was used end-to-end to deal with the non-uniform lesions. The method achieved a sensitivity of 97% and a specificity of 96.37%. However, proliferative diabetic retinopathy was not taken into account by this approach [37].
Screening for DR by ophthalmologists is difficult and time-consuming due to blurred retinal images that make it difficult to see signs like microaneurysm, hemorrhage, etc. Because of this, a machine-learning technique that could automatically identify DR in fundus images was proposed. Classification of DR images using DL was made more precise by using a pre-processing improvement strategy. To improve the fundus image's clarity for the viewer, Histogram Equalization (HE), a de-haze algorithm, and a high pass filter were used. Four-layer convolution was used for image categorization. In the end, a satisfactory level of precision was achieved [38].
A context-aware graph network for tuberculosis detection was presented. Because of training limitations and overfitting issues, the traditional CNN model suffered greatly. For this reason, this study presents transfer learning-based strategies for extracting some of the sample-level characteristics. However, given the large number of pictures used for training, it was urged that EfficientNet be used as a pre-trained model. The categorization benefitted significantly from the spatial relationship between feature vectors. Each of these feature vectors has the potential to supply information that is analogous to that provided by the vectors to either side. Here, a feature graph was provided to keep the image's spatial details intact [39].
Using a chaotic bat algorithm, a refined version of AlexNet as well as an Ensemble Learning Model (ELM) was proposed. Here, a pre-trained AlexNet is used, involving dataset training using images. The process of training the parameters was laborious and time-consuming. To make the AlexNet model more stable, Batch Normalization (BN) was implemented here. As an additional step, the AlexNet model had multiple layers replaced with the ELM. The model's precision improves as a result [40].
DR detection through BV and OD segmentation, alongside the identification of retinal anomalies, was introduced. This approach encompasses three fundamental components: pre-processing, segmentation, and classification, each playing a pivotal role. During the pre-processing stage, the CLAHE method was utilized to process and enhance the green channel component within the Red, Green, Blue (RGB) scale. Once the OD and the BV were primed for segmentation, subsequent steps involved devising methodologies like the top hat transformation and Gabor filtering to efficiently identify and isolate anomalies. Throughout the segmentation process, various attributes such as TEM (Texture Energy Measurement), Entropy, and LBP (Local Binary Pattern) were extracted. Moreover, the approach incorporated the Trial-dependent Bypass with an enhanced Dragonfly Algorithm (TB-DA) for optimal feature selection. For distinguishing between different severity levels (light, moderate, and severe), the hybrid neural network method was employed. The experimental outcomes were compared to established methods, assessing various metrics including accuracy, NPV (Negative Predictive Value), precision, FDR (False Discovery Rate), FNR (False Negative Rate), MCC (Matthews Correlation Coefficient), FPR (False Positive Rate), and the F1-score [41].
Recent studies have investigated the feasibility of employing automatic ocular processing of images for glaucoma screening, with results that vary. The techniques covered below range from simpler machine learning methods to more advanced ones, such as DL. Glaucoma has been detected using both open and combined datasets. Some research has tried to use a composite of retinal scans from several public sources to diagnose glaucoma. For example, a combination of the publicly available DRISHTI and RIMONE V3 datasets was used to extract features from the OD and the optic cup to identify glaucoma [42].

An automated GL diagnosis system was presented using three distinct CNN model learning techniques, with results validated by ophthalmologists. The researchers utilized a wide array of neural networks, including Transfer Convolutional Neural Networks (TCNNs), Semi-Supervised Convolutional Neural Networks (SSCNNs) with self-learning, and Denoising Auto Encoders (DAE) that relied on both labeled and unlabeled input data. Their models, when run on the RIMONE and RIGA open-source datasets, showed convincing results and proved that DL models are good at finding GL. The authors report that the TCNN, SSCNN, and SSCNN-DAE had overall accuracies of 91.5%, 92.4%, and 93.8%, respectively [43]. A transfer learning-based model was utilized to do automatic GL categorization. Color fundus pictures from the RIM-ONE and DRISHTI-GS databases were used. They added images from two more campaigns in Barcelona, Spain, to their original dataset. Subsequently, using a transfer learning method, they performed image preprocessing and fine-tuned five distinct CNN models. The study revealed that the VGG-19 architecture exhibited the most favorable performance, reaching an Area Under the Curve (AUC) of 94%, accompanied by a sensitivity of 87% and a specificity of 89% [44]. VGG16, VGG19, Xception, InceptionV3, and ResNet50 architectures pre-trained on ImageNet were used for Glaucoma (GL) detection, eliminating the requirement for feature extraction or estimating geometric Optic Nerve Head (ONH) parameters like Cup-to-Disc Ratio (CDR). Combining five publicly accessible datasets of 1,707 fundus pictures created the ACRIMA dataset. The ACRIMA dataset, which contains 396 GL pictures and 309 normal eye images, achieved an AUC of 0.7678 with an accuracy of 70.2% on the test dataset. The additional open-source datasets in this analysis (HRF, sjchoi86-HRF, RIM-ONE, and DRISHTI-GS1) have AUC values of 0.8354, 0.7739, 0.8575, and 0.8041 [45].

III. MATERIALS AND METHODS
The fundamental objective of this research is to enhance the efficiency of timely identification of multi-class DED utilizing ocular fundus images by experimentally evaluating image preprocessing and classification enhancement strategies.
The aims in this area may be summed up as follows:
• Deploy conventional image processing techniques such as image quality improvement, dataset expansion through augmentation, and image segmentation.
• Explore different model configurations and observe how they impact CNN model outcomes.
• Compare the accuracy obtained on the original and pre-processed fundus images using the pre-trained CNN models Xception, ResNet50, VGG-16, and EfficientNetB7.
• Train a DCNN model on pre-processed fundus images to boost classification accuracy.
• Use performance measures to assess and compare the outcomes of the pre-trained model with those of the new model.

FIGURE 2. The overall process flow

Figure 2 illustrates the overall process flow. The dataset consisting of raw retinal fundus images was subjected to testing using four different pre-trained models, namely ResNet50, VGG-16, EfficientNetB7, and Xception, in order to identify the most effective one. The raw fundus pictures were subjected to standard image processing techniques, and the dataset was then trained using the most effective model identified in the prior experiment. Additionally, a custom-built DCNN was employed to train pre-processed data. Ultimately, a comparison of outcomes was conducted to evaluate whether the execution accuracy of

the models improved with the utilization of pre-processed images.

A. DATASET DESCRIPTION
The dataset comprises retinal images categorized into CA, DR, GL, and NORMAL.
• Images were collected from publicly available datasets, including IDRiD, Ocular recognition, DRISHTI-GS, Retinal Dataset on GitHub, Messidor, and Messidor-2.
• Each of the images is labeled by ophthalmologists, and its lesion grade is determined based on new BV, hemorrhages, and microaneurysms.
• The Messidor dataset comprises 1200 ocular fundus images of the back part of the eye's interior, which were taken using a 3CCD color video camera attached to a Topcon TRC NW6 non-mydriatic retinograph with a 45-degree Field of View (FOV). It was designed to facilitate computer-assisted DED studies.
• The Messidor-2 dataset is an openly accessible dataset with 1,748 color images of retinas from 874 subjects. Each subject contributes two images, one for each eye. It uses International Clinical Diabetic Retinopathy (ICDR) and Diabetic Macular Edema (DME) grades to assign four disease rates per subject.
• The DRISHTI-GS dataset includes 101 ocular images, consisting of 31 normal and 70 showing GL-induced damage. To address limited images, an under-sampling technique was used, selecting 1000 images from each class for experimentation.

FIGURE 3. Data Distribution

B. IMAGE PRE-PROCESSING
The purpose of the pre-processing phase is to eliminate noise and irregularities from the ocular fundus image, thereby enhancing its quality and contrast. Along with contrast improvement, noise reduction, and image normalization, this pre-processing step can help mitigate irregularities and enhance the accuracy of subsequent stages in the process. In the quest to detect clinical features related to DED, image processing aims to elevate and refine the quality of the ocular fundus image. Fig. 4 shows a flowchart of the method for segmenting and processing images.
Furthermore, DED characteristics from fundus images are localized, retrieved, and segmented for further classification in pre-trained models. This section briefly discusses the preprocessing methods used in this study.

FIGURE 4. The workflow of data preprocessing

1) IMAGE ENHANCEMENT
Prior to processing, image-enhancing techniques were applied, including contrast enhancement and lighting adjustments, to improve the informational content and visual quality of the original images. For contrast enhancement, CLAHE [48] is used to enhance the visual clarity of the images. The CLAHE technique is a modified form of the AHE (Adaptive Histogram Equalization) process. The suggested approach encompasses the application of the boosting function to each individual

pixel inside the designated region, followed by the identification of the corresponding transformation function. CLAHE differs from AHE in that it limits contrast amplification. In CLAHE, Contrast-Limited Histogram Equalization (CLHE) is employed to improve an image's contrast. It is applied to small regions of the image known as tiles, as opposed to the whole image. Bilinear interpolation is then used to merge neighboring tiles seamlessly. CLAHE was applied to grayscale images of the retina. A parameter called the clip limit restricts the amount of noise amplified in an image: the histogram is clipped and a grey-level mapping is built from it. In the contextual region, the pixels are divided equally among the grey levels, so the average number of pixels per grey level is given by:

$$N_{avg} = \frac{N_{cr\text{-}xp} \times N_{cr\text{-}yp}}{N_g} \tag{1}$$

where $N_{avg}$ is the average number of pixels, $N_g$ is the number of grey levels in the contextual region, and $N_{cr\text{-}xp}$ and $N_{cr\text{-}yp}$ are the numbers of pixels in the $x$ and $y$ directions of the contextual region, respectively. The actual clip limit is then computed as:

$$N_{cl} = N_{clip} \times N_{avg} \tag{2}$$

CLAHE [56] is a helpful method in biomedical image processing since it effectively highlights the key parts of an image, as shown in Fig. 5.

FIGURE 5. Sample retinal fundus image and enhanced image

Illumination Modification: This preprocessing approach attempts to minimize the effect introduced by retinal images with inconsistent illumination [49]. The intensity of each pixel is determined by:

$$p_i = p_0 + \mu_d - \mu_l \tag{3}$$

where $p_0$ and $p_i$ are the initial and resulting pixel intensities, $\mu_d$ is the target average intensity, and $\mu_l$ is the local average intensity [50]. This procedure makes microaneurysms formed on the retinal surface more apparent.

2) IMAGE AUGMENTATION
DL models perform better when provided with substantial volumes of training data [51, 52]. Data augmentation encompasses a group of procedures that expand the training set without collecting new examples. The augmentation methods covered in this study are geometric transformations, including flipping, rotation, mirroring, and cropping. Real-time image augmentation was carried out with the Keras ImageDataGenerator class, ensuring that the selected model receives varied versions of the images during each iteration. The ImageDataGenerator class helps mitigate overfitting of the selected model while keeping the dynamic range of the generated images consistent with the originals.

3) IMAGE SEGMENTATION
When designing a classification system for DL-based moderate DED detection, it is critical to consider both the network design and the input data quality. Input image quality is a crucial element for accurate results. The outcome of an automated disease diagnosis method for retinal fundus images depends on factors such as the number of images available, the image brightness and contrast, and the presence of anatomical characteristics. Feature segmentation therefore makes the images more useful for classification and contributes to higher accuracy. The procedure, together with its theoretical background, is outlined below.

Extraction of BV: For diagnosing DR at its earliest stages, retinal BV are a key anatomical characteristic in images of the retina. Segmentation of retinal BV is accomplished in the following stages: (i) image enhancement, (ii) the Tyler Coye algorithm [53], and (iii) morphological operations.

After applying the aforementioned image processing methods, the green channel of the RGB image provided the most effective contrast between the vascular network and the background. The methods presented by Zuiderveld [48] and Youssif et al. [50] may be used to estimate the contrast and brightness changes in a fundus image's background. The ISODATA method in the Tyler Coye algorithm is then used to obtain the threshold level once contrast and brightness have been adjusted. Morphological operations (erosion and dilation) were used to improve upon the output of the Tyler Coye algorithm. These two basic operations are crucial for eliminating background noise and filling in foreground details. The following


equation depicts the process of erosion, which is used to remove or refine the border of a region.

$$M \ominus N = \{p \mid N_p \subseteq M\} \tag{4}$$

$$M \oplus N = \{x \mid N_x \cap M \neq \emptyset\} \tag{5}$$

$$M \bullet N = (M \oplus N) \ominus N \tag{6}$$

Here, $\oplus$ denotes dilation and $\ominus$ denotes erosion, $M$ is the set of image pixels, and $N$ is the structuring element. Equation (6), a dilation followed by an erosion, is the closing of $M$ by $N$. Unfortunately, the output of the Tyler Coye algorithm still contains a few gaps. As seen in Fig. 6, this morphological procedure fills the microscopic gaps, covering a portion of the essential BV areas.

FIGURE 6. Sample retinal fundus image and extracted BV

Identification and extraction of the OD: GL is a condition that arises from optic nerve injury. Segmentation of the OD is a useful technique for investigating the sharper anatomical changes in the optic nerve. Fig. 7 displays an anatomically accurate retinal fundus image, including the OD, obtained from the data set. The CHT (Circular Hough Transform) was employed to identify circular objects, the median filter was then used to reduce noise, and threshold values were applied to segment the OD, as depicted in Fig. 8. CLAHE operates on specified sections, or "tiles", of the image rather than on the entire image at once.

FIGURE 7 Sample retinal fundus image and optic nerve damage in Glaucoma (GL)

Setting the maximum contrast rate to $L$, $0 \le L \le L$ [54], adapts the image enhancement computation to the user-specified maximum contrast level. Additional contrast enhancement is applied to images with low contrast, measured by

$$\phi(a, b) = \left(\frac{\mu(a, b) - \Delta}{\delta - \Delta}\right)(\Gamma - 1) \tag{7}$$

where $\phi(a, b)$ and $\mu(a, b)$ denote the pixels after and before transformation at coordinates $(a, b)$, respectively, $\Delta$ is the lowest pixel value and $\delta$ is the highest pixel value of the input image, and $\Gamma$ is the highest value of the grayscale image.

Median filtering is prevalent in image processing due to its notable efficacy in reducing noise. In median filtering, the median pixel value of the window replaces the value at the window's center. Median filtering may be expressed mathematically as

$$F(a, b) = \underset{(s,t) \in S_{ab}}{\mathrm{median}} \{g(s, t)\} \tag{8}$$

Segmentation, a pixel classification approach, aims to extract objects or segmented areas with similar attributes from the background [55]. The OD was identified using the Circular Hough Transform (CHT) technique. Circular shapes in images are well suited to detection by the CHT. The CHT compares favorably with alternatives because it tolerates variations in the feature specification descriptions while being moderately resistant to image noise. The CHT is computed using the following formula:

$$(m - a)^2 + (n - b)^2 = c^2 \tag{9}$$

where $(a, b)$ is the circle center and $c$ is its radius. The stages of circle detection are: (i) the image's binary edges are extracted, (ii) the parameters $a$ and $b$ are given values, (iii) the radius value $c$ is determined, (iv) the accumulator is updated at $(a, b, c)$, and (v) the values of $a$ and $b$ are changed within the range of interest and the process returns to stage (iii) to compute $c$.

FIGURE 8 Sample retinal fundus image and segmented optic disc
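As an illustration of the median filtering in (8), the following is a minimal pure-Python sketch, not the authors' implementation. The function name `median_filter`, the 3 × 3 default window, and the border handling (using only the in-bounds neighbours and taking the lower median so that outputs stay actual pixel values) are assumptions made for this example.

```python
from statistics import median_low


def median_filter(image, radius=1):
    """Replace each pixel g(a, b) by the median of its (2*radius+1)^2 window S_ab."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for a in range(rows):
        for b in range(cols):
            # Assumption: at the borders, only in-bounds neighbours form the window.
            window = [
                image[s][t]
                for s in range(max(0, a - radius), min(rows, a + radius + 1))
                for t in range(max(0, b - radius), min(cols, b + radius + 1))
            ]
            out[a][b] = median_low(window)
    return out


# Hypothetical 5 x 5 patch: uniform intensity 10 with one impulse-noise pixel.
patch = [[10] * 5 for _ in range(5)]
patch[2][2] = 255
filtered = median_filter(patch)
```

In this toy patch the isolated bright pixel is replaced by the surrounding intensity, which is the behaviour that makes the median filter useful against impulse noise.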

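The accumulator voting behind the CHT stages (i) to (v) and Eq. (9) can be sketched as below. This is a brute-force illustration under stated assumptions rather than the method used in the study: the function name `circular_hough`, the search ranges, and the synthetic edge points are hypothetical, and the edge pixels are assumed to have been extracted already.

```python
import math
from collections import Counter


def circular_hough(edge_points, a_range, b_range, c_min, c_max):
    """Vote for (a, b, c) such that (m - a)^2 + (n - b)^2 = c^2 for edge pixels (m, n)."""
    accumulator = Counter()
    for a in a_range:                      # stages (ii) and (v): candidate centre values
        for b in b_range:
            for m, n in edge_points:       # stage (i) output: binary edge pixels
                c = round(math.hypot(m - a, n - b))  # stage (iii): radius for this pixel
                if c_min <= c <= c_max:
                    accumulator[(a, b, c)] += 1      # stage (iv): update the accumulator
    return accumulator.most_common(1)[0]


# Hypothetical edge pixels lying exactly on a circle of radius 5 centred at (20, 15).
offsets = [(5, 0), (-5, 0), (0, 5), (0, -5),
           (3, 4), (3, -4), (-3, 4), (-3, -4),
           (4, 3), (4, -3), (-4, 3), (-4, -3)]
edges = [(20 + dx, 15 + dy) for dx, dy in offsets]
best_params, votes = circular_hough(edges, range(15, 26), range(10, 21), 3, 8)
```

The triple with the most votes is taken as the detected circle. Library implementations use gradient information to avoid the exhaustive search over centres shown here.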

Localization and detection of exudate: Exudates appear as bright patches of varied size, brightness, position, and form in two-dimensional ocular images taken with a digital fundus camera. Accurate exudate segmentation is difficult because of the wide variation in exudate size, intensity, contrast, and shape. The procedure involves four distinct stages: (1) improving image quality; (2) detecting and eliminating the OD; (3) eliminating BV; and (4) extracting exudates. Classification of DR can be accomplished by applying the evaluation standards outlined in the Messidor dataset after exudates have been obtained from the mild dataset. Fundus images may be used to identify the presence of exudates, allowing for a timely diagnosis of early DR. Once the OD has been located and removed, Otsu thresholding is used to identify potential exudate regions. The Otsu technique automatically estimates a threshold value $T$ from the input ocular image. The normalized intensity histogram is computed using (10):

$$P(i) = \frac{n_i}{N}, \quad P(i) \ge 0, \quad \sum_{i=1}^{256} P(i) = 1 \tag{10}$$

where $N$ is the total number of pixels in the image and $n_i$ is the number of pixels with intensity $i$. Equations (11) and (12) give the weights of the two classes, object and background.

$$W_1(t) = \sum_{i=1}^{t} P(i) \tag{11}$$

$$W_2(t) = \sum_{i=t+1}^{L} P(i) = 1 - W_1(t) \tag{12}$$

Here, $L$ is the number of gray levels. The means of the two classes are determined using (13) and (14), respectively.

$$M_1(t) = \sum_{i=1}^{t} \frac{i \cdot P(i)}{W_1(t)} \tag{13}$$

$$M_2(t) = \sum_{i=t+1}^{L} \frac{i \cdot P(i)}{W_2(t)} \tag{14}$$

The class variances are evaluated by (15) and (16), respectively, while (17) gives the expression for the total variance.

$$\sigma_1^2(t) = \sum_{i=1}^{t} \left(i - M_1(t)\right)^2 \cdot \frac{P(i)}{W_1(t)} \tag{15}$$

$$\sigma_2^2(t) = \sum_{i=t+1}^{L} \left(i - M_2(t)\right)^2 \cdot \frac{P(i)}{W_2(t)} \tag{16}$$

$$\sigma^2(t) = \sigma_W^2(t) + \sigma_B^2(t) \tag{17}$$

Here, $\sigma_W^2$ is the WVC (Within-Class Variance), given in (18), and $\sigma_B^2$ is the BVC (Between-Class Variance), given in (19). The WVC is the sum of the class variances weighted by the class probabilities. Equation (20) is used to compute the total mean. The threshold value may be obtained by minimizing the WVC or maximizing the BVC; however, the BVC requires less computing time.

$$\sigma_W^2(t) = W_1(t) \cdot \sigma_1^2(t) + W_2(t) \cdot \sigma_2^2(t) \tag{18}$$

$$\sigma_B^2(t) = W_1(t) \cdot \left[M_1(t) - M_T\right]^2 + W_2(t) \cdot \left[M_2(t) - M_T\right]^2 \tag{19}$$

$$M_T = \sum_{i=1}^{L} i \cdot P(i) \tag{20}$$

Morphological processing comprises a group of operations on the pixels of an image that use logical operations such as "or" and "and". The opening operation removes pixel areas that are smaller than the structuring element and refines and restores the object shape. The opening operation is given by (21).

$$M \circ N = (M \ominus N) \oplus N \tag{21}$$

The segmentation of exudates in the macula is shown in Fig. 9.

FIGURE 9 Sample retinal fundus image and segmented exudates

C. TRANSFER LEARNING
This study employs CNN-based transfer learning to establish a classification method for DED retinal fundus images. Transfer learning strategies are explored, leveraging pre-trained CNN models to achieve optimal classification outcomes. The following section explores the specifics of the pre-trained models. Pan et al. [56] provide the following definition of transfer learning: $D = \{\Phi, P(X)\}$, where $X = \{x_1, x_2, \dots, x_n\} \in \Phi$, $D$ is the domain, $\Phi$ is the feature space, and $P(X)$ is the marginal probability distribution. $T = \{Y, F(\ast)\}$ is a learnt objective predictive

function from the feature vector and label pairs, where $T$ is the task and $Y$ is the label space.

To be more precise, given a source domain $D_s$ with learning task $T_s$ and a target domain $D_t$ with learning task $T_t$, transfer learning is the procedure of improving the learning of the target predictive function $F_t(\ast)$ in $D_t$ using the knowledge gained from $D_s$ and $T_s$, where $D_s \neq D_t$ or $T_s \neq T_t$. Note that the single source domain described here can be extended to multiple source domains.

In image classification, transfer learning is based on the idea that a neural network trained on a large and varied dataset, such as ImageNet, can effectively adapt to and excel in a particular target task, even though the target task has fewer labelled instances than the pre-training dataset. Reusing these learned feature maps is advantageous compared to building a massive architecture from scratch using a massive dataset.

In this research, two approaches are employed to adapt existing trained models. (1) Feature extraction uses the features learned in the primary task to extract pertinent characteristics for the target task. To adapt the feature mappings learned from the source data, a new classifier, which can be trained from scratch, is layered atop the pre-trained network. (2) Fine-tuning unfreezes certain previously frozen layers within the base network and trains them concurrently with the newly introduced classifier layers. This fine-tuning procedure refines the base network's higher-level feature representations to better fit the target task. To accomplish DED image classification, four pre-trained CNN models, ResNet50, VGG-16, Xception, and EfficientNetB7, are fine-tuned. The properties of the four ImageNet pre-trained CNN networks are listed in Table I.

TABLE I
FOUR CNN MODELS PRE-TRAINED USING IMAGENET AND THEIR FEATURES

| Model | Size | Parameters |
|---|---|---|
| ResNet50 | 98 MB | 25,600,000 |
| VGG-16 | 528 MB | 138,357,544 |
| Xception | 88 MB | 22,910,480 |
| EfficientNetB7 | 256 MB | 66,700,000 |

IV. PROPOSED DCNN ARCHITECTURE
CNNs are the most often used DL method for classifying medical visual abnormalities [57]. This is because a CNN preserves distinctive characteristics when examining input images. The following discussion highlights the relevance of spatial connections in retinal images, such as the location of a BV rupture or the buildup of a yellowish fluid in the macula. Fig. 10 depicts the architecture of the whole procedure. Fundus images processed by a deep CNN are automatically probed for their feature patterns using the network's many layers and filters. The suggested framework comprises a set of 2D convolutional layers, max-pooling layers, and batch normalization layers. These components have been tuned with carefully selected hyperparameters to capture features effectively from input fundus images spanning various categories. To facilitate the diagnosis of eye diseases, we incorporated a fully connected layer to serve as a classifier, which accepts the feature maps generated by the CNN as input. The proposed model consists of a total of seventeen weighted layers: fourteen convolutional layers, two fully connected layers, and one classification layer. Additionally, the network is enhanced using batch normalization, max-pooling, dropout, and flattening. In classifying fundus images, the most crucial and difficult step is extracting features from them. In contrast to numerous manual and machine learning-based methods for extracting features, deep neural networks serve as automated feature extractors. The convolution operation, represented by $(\ast)$, is a built-in function and a fundamental component of deep neural networks, essential for feature extraction. Mathematically, it combines two functions ($a$ and $b$) to generate a third function ($a \ast b$). A $K \times K$ window, or kernel, is preferred in convolution, with $K$ ideally an odd integer for improved symmetry around the origin and reduced aliasing errors. Convolutional layers store high-level extracted features, with the kernel sliding across the image pixels to produce a feature map for each of the $N$ filters in every layer. If the input dimension of the fundus image is $(P_1 \times P_2)$ and $N$ kernels with a $K \times K$ window are employed, the resulting shape, when the stride is 1 and no padding is applied, is $N \times ((P_1 - K + 1) \times (P_2 - K + 1))$. This iterative process continues until precise feature patterns are extracted from the input fundus image. The design parameters proposed in this approach are selected systematically to tune the DL model and achieve effective results. Several points in the parameter space are sampled uniformly to provide the best possible hyperparameter combinations. The best parameter for controlling the dataset's complexity is determined by cross-validation for every feasible parameter combination.

The proposed DCNN architecture employed a constant fundus image size of 224 × 224 × 3 pixels and a 3 × 3 filter window across the entire network. This window provides a relatively limited visual field, but it was sufficient for preserving the image's indications of vertical and horizontal orientation, as well as its central features. In the convolutional layers of the proposed network, a stride of 1 pixel was applied, so the kernel shifts by 1 pixel at a time, and padding was employed to retain information at the image borders. Since the padding is uniform across the network, 1 extra pixel is appended to each of the four image borders.

Training a deep neural network gets more difficult as the number of parameters increases. To address this issue, pooling layers are commonly employed to decrease the

parameter count. One popular pooling technique is max pooling, where a window slides across the feature map generated from ocular fundus images and selects the highest value within the window. This method is often favored over other pooling algorithms. In the proposed architecture, five max-pooling layers were incorporated at different points after sets of convolutional blocks. These max-pooling layers use a 2 × 2 pixel window, a stride of 2, and same padding. After each max-pooling layer, the DCNN increases the number of filters in use, from 32 up to 512, through a series of weighted block configurations. Because the distribution of the inputs to each layer changes as the network's weights are updated during training, training a deep neural network becomes more complex. Therefore, the proposed architecture incorporates batch normalization, which helps mitigate this issue [58]. Batch normalization standardizes the input to a layer based on mini-batches of data instead of the entire training dataset. This approach enhances the robustness of neural networks. It tackles the problem of internal covariate shift by ensuring that the input to each layer maintains a consistent mean and standard deviation, representative of a normal distribution. After batch normalization, gradients are less sensitive to the initial values and scale of the parameters. It lets training of a deep neural network start with activations that follow a unit Gaussian distribution.

FIGURE 10 The proposed DCNN layered architecture

The DCNN was trained using the Adam and RMSprop optimizers. The training loss, the Sparse Categorical Cross-Entropy function, was minimized by optimizing the learning rate and weights with the RMSprop optimizer. The proposed architecture uses dropout for regularization, to reduce the possibility of overfitting in situations where the system must make a decision relying on an exceptionally large set of parameters. To retain neurons that store patterns related to eye diseases, dropout is applied more heavily in the classification phase than in the convolutional blocks responsible for feature extraction [59]. To ensure training regularization, a dropout value of 0.5 is employed in the initial two fully connected layers, and a batch size of 32 is used.

Rectified linear units (ReLU) [60] are used to activate all of the proposed architecture's intermediate layers, and the Softmax function [47] is used to activate the network's output layer since the dataset is nonlinear. ReLU is a nonlinear activation function that outperforms Sigmoid and Tanh in terms of performance and convergence speed. As shown in Eq. (22) below, ReLU rectifies all the negative values in the extracted feature map, which boosts accuracy and shortens the training time.
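The two activations described here can be sketched in a few lines of plain Python. This is only an illustration: the logit values are hypothetical, the class order in `class_names` is an assumption, and subtracting the maximum logit before exponentiation is a common numerical-stability choice rather than something stated in the text.

```python
import math


def relu(a):
    """Rectify negative values to zero and pass positive values through."""
    return a if a > 0 else 0.0


def softmax(logits):
    """Turn raw output-layer scores into probabilities that sum to 1."""
    shift = max(logits)  # stability trick; does not change the result
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


rectified = [relu(v) for v in [-2.0, 0.0, 3.5]]

# Hypothetical output-layer scores for the four classes.
class_names = ["CA", "DR", "GL", "NORMAL"]
probs = softmax([1.0, 2.0, 0.5, 3.0])
predicted = class_names[probs.index(max(probs))]
```

The predicted class is simply the one with the highest Softmax probability.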


$$\mathrm{ReLU}(a) = \begin{cases} 0, & \text{if } a \le 0 \\ a, & \text{if } a > 0 \end{cases} \tag{22}$$

In the network's output layer, the Softmax activation function was employed to transform the outcome into probabilities for classifying ocular fundus images into four distinct categories: CA, DR, GL, and NORMAL. The initial convolutional layer used 32 filters for feature extraction, with an input shape of (224 × 224 × 3).

Throughout the proposed architecture, all convolutional layers share common characteristics: a kernel size of 3 × 3, same padding, a stride value of 1, and the Rectified Linear Unit (ReLU) activation function. The convolution of an image with a kernel is illustrated in (23).

$$\underbrace{\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & a_{24} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & a_{34} & \cdots & a_{3n} \\ a_{41} & a_{42} & a_{43} & a_{44} & \cdots & a_{4n} \\ \vdots & \vdots & \vdots & \vdots & \cdots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & a_{n4} & \cdots & a_{nn} \end{bmatrix}}_{\text{Image}} \times \underbrace{\begin{bmatrix} K_1 & K_2 & K_3 \\ K_4 & K_5 & K_6 \\ K_7 & K_8 & K_9 \end{bmatrix}}_{\text{Kernel}} \rightarrow \underbrace{\begin{bmatrix} F_1 & F_2 \\ F_3 & F_4 \end{bmatrix}}_{\text{Extracted features}} \tag{23}$$

After every convolutional layer, batch normalization is used to standardize the output of that layer for training, as specified in (24).

$$a \rightarrow \hat{a} = \frac{a - \mu}{\sigma} \rightarrow b = \gamma \hat{a} + \beta \tag{24}$$

Here, $\mu$ and $\sigma$ are the mean and the standard deviation of the value $a$ over the mini-batch, and $\gamma$ and $\beta$ are the learnable scale and shift parameters. Algorithm 1 gives the steps involved in mini-batch batch normalization in detail.

Algorithm 1: Batch normalization across a mini-batch
Input: values of $a$ over a mini-batch $\mathcal{B} = \{a_1, \dots, a_p\}$; learnable parameters $\gamma$, $\beta$
Output: $b_j = BN_{\gamma,\beta}(a_j)$
1: $\mu_{\mathcal{B}} \leftarrow \frac{1}{p} \sum_{j=1}^{p} a_j$
2: $\sigma_{\mathcal{B}}^2 \leftarrow \frac{1}{p} \sum_{j=1}^{p} (a_j - \mu_{\mathcal{B}})^2$
3: $\hat{a}_j \leftarrow \dfrac{a_j - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^2 + \epsilon}}$
4: $b_j \leftarrow \gamma \hat{a}_j + \beta \equiv BN_{\gamma,\beta}(a_j)$
5: return $b_j$

The second convolutional layer uses the identical configuration, with 32 filters. To process the output from the previous layer and decrease the dimensionality of the feature maps, a max-pooling layer with a 2 × 2 kernel size and a stride value of 2 was introduced. This pooling layer setup is applied consistently after each block of convolutional layers within the architecture.

The third and fourth convolutional layers employ 64 filters each, operating on the pooled 112 × 112 × 32 feature map. The max-pooling layer, with a 2 × 2 kernel size and a stride value of 2, then reduces the output of the previous layer to 56 × 56 × 64.

The fifth convolutional layer uses 128 filters of size 3 × 3. It receives a 56 × 56 × 64 input, performs a convolution on the feature maps, and produces an output of shape 56 × 56 × 128. The convolutional layer's output is normalized using batch normalization, which contributes 512 parameters. With an input shape of 56 × 56 × 128 followed by batch normalization, the sixth, seventh, and eighth layers are identical to the fifth.

The ninth layer uses 256 filters of size 3 × 3 and takes as input the output of the max-pooling layer, which has dimensions 28 × 28 × 128. After the convolutional layer, batch normalization contributes 1024 parameters. With an input shape of 28 × 28 × 256 followed by batch normalization, the tenth and eleventh layers are identical to the ninth. The output of the max-pooling layer is a feature map of shape 14 × 14 × 256. The twelfth layer takes this 14 × 14 × 256 input and uses 512 filters. Batch normalization then normalizes the convolutional layer's output and contributes 2048 parameters. The thirteenth and fourteenth layers are identical to the twelfth. Max-pooling with the same window size then reduces the feature map to 7 × 7 × 512. After the convolutional part of the network, a flatten layer transforms the output into a one-dimensional vector, which is then used for classification by the fully connected layers.

The first and second dense layers use the ReLU activation function, with a dropout layer with a rate of 50% inserted between them. The output classification layer is a densely connected layer with four output neurons that give the probability of each class during prediction. Table II provides comprehensive details of the proposed architecture's configuration and parameters, including the layer-by-layer output structure with weights.
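The steps of Algorithm 1 can be traced with a small pure-Python sketch for a single scalar activation over one mini-batch. The function name `batch_norm` and the default value of `eps` are assumptions for illustration. In a real network, `gamma` and `beta` are learned during training and the operation is applied per channel.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a mini-batch of scalar activations, then scale and shift."""
    p = len(batch)
    mu = sum(batch) / p                                # step 1: mini-batch mean
    var = sum((a - mu) ** 2 for a in batch) / p        # step 2: mini-batch variance
    a_hat = [(a - mu) / math.sqrt(var + eps) for a in batch]  # step 3: normalize
    return [gamma * h + beta for h in a_hat]           # step 4: scale and shift


# Hypothetical activations of one unit over a mini-batch of four samples.
normalized = batch_norm([2.0, 4.0, 6.0, 8.0])
shifted = batch_norm([2.0, 4.0, 6.0, 8.0], gamma=2.0, beta=3.0)
```

With the default `gamma` and `beta`, the output has approximately zero mean and unit variance. Other values move the output to mean `beta` and standard deviation close to `gamma`.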


TABLE II
VARIOUS PARAMETERS INVOLVED IN THE PROPOSED ARCHITECTURE

| Layer | Type of Layer | Filters | Activation Function | Output Shape | Parameters |
|---|---|---|---|---|---|
| 1 | Convolutional 2D | 32 | ReLU | 224, 224, 32 | 896 |
| 2 | Convolutional 2D | 32 | ReLU | 224, 224, 32 | 9,248 |
| 3 | Convolutional 2D | 64 | ReLU | 112, 112, 64 | 18,496 |
| 4 | Convolutional 2D | 64 | ReLU | 112, 112, 64 | 36,928 |
| 5 | Convolutional 2D | 128 | ReLU | 56, 56, 128 | 73,856 |
| 6 | Convolutional 2D | 128 | ReLU | 56, 56, 128 | 147,584 |
| 7 | Convolutional 2D | 128 | ReLU | 56, 56, 128 | 147,584 |
| 8 | Convolutional 2D | 128 | ReLU | 56, 56, 128 | 147,584 |
| 9 | Convolutional 2D | 256 | ReLU | 28, 28, 256 | 295,168 |
| 10 | Convolutional 2D | 256 | ReLU | 28, 28, 256 | 590,080 |
| 11 | Convolutional 2D | 256 | ReLU | 28, 28, 256 | 590,080 |
| 12 | Convolutional 2D | 512 | ReLU | 14, 14, 512 | 1,180,160 |
| 13 | Convolutional 2D | 512 | ReLU | 14, 14, 512 | 2,359,808 |
| 14 | Convolutional 2D | 512 | ReLU | 14, 14, 512 | 2,359,808 |
| 15 | Fully Connected Layer 1 | | ReLU | 4096 | 102,764,544 |
| 16 | Fully Connected Layer 2 | | ReLU | 1024 | 4,195,328 |
| 17 | Classification layer | | Softmax | 4 | 4,100 |

| Summary | Parameters |
|---|---|
| Trainable parameters | 114,927,268 |
| Non-trainable parameters | 6,016 |
| Total parameters | 114,933,284 |
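The parameter counts in Table II can be checked with simple arithmetic, sketched below in plain Python. The helper names are hypothetical. The sketch assumes 3 × 3 kernels with a bias term, a 7 × 7 × 512 feature map before flattening, and one batch-normalization layer after every convolution with two trainable values (scale and shift) and two non-trainable values (moving mean and variance) per channel, which is the usual Keras convention and appears to match the reported totals.

```python
def conv_params(in_channels, out_channels, k=3):
    """Weights of a k x k convolution plus one bias per output channel."""
    return k * k * in_channels * out_channels + out_channels


def dense_params(n_in, n_out):
    """Weights of a fully connected layer plus one bias per output unit."""
    return n_in * n_out + n_out


# Filters of the fourteen convolutional layers, in order.
filters = [32, 32, 64, 64, 128, 128, 128, 128, 256, 256, 256, 512, 512, 512]

conv_total, in_channels = 0, 3  # RGB input
for f in filters:
    conv_total += conv_params(in_channels, f)
    in_channels = f

flattened = 7 * 7 * 512
dense_total = (dense_params(flattened, 4096)
               + dense_params(4096, 1024)
               + dense_params(1024, 4))

# Assumed batch-normalization bookkeeping (see the lead-in above).
bn_trainable = 2 * sum(filters)
bn_non_trainable = 2 * sum(filters)

trainable = conv_total + dense_total + bn_trainable
total = trainable + bn_non_trainable
```

Under these assumptions the sketch reproduces the trainable, non-trainable, and total parameter counts listed in the table, and shows that the first fully connected layer accounts for most of the model's parameters.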

V. EXPERIMENTAL DETAILS

A. HARDWARE AND SOFTWARE
All of the experiments were executed in a Jupyter notebook using Python 3.8, running on hardware with a 2.3 GHz Intel Core i9 processor, 16 GB of 2400 MHz DDR4 RAM, and Intel UHD Graphics 630 with 1536 MB of memory. MATLAB was utilized for both the front-end and back-end in the experiments. The data was split in an 80/20 ratio, with 80% used for training and the remaining 20% for testing. To ensure a balanced distribution of classes, a generic selection method was employed for data segregation. A mini-batch size of 32 was used, and the categorical cross-entropy loss function was applied. Adam, the default optimizer, and RMSprop were used to train the CNN. Results were validated on the test dataset using accuracy, sensitivity, and specificity, which are standard performance assessment metrics.

B. EVALUATION CRITERIA
The most effective DL model underwent a comprehensive evaluation using various metrics. This evaluation aims to determine the accuracy of classifying DED as either true or false. Initially, we present the confusion matrix in Fig. 11, obtained through cross-validation estimation [61]. This confusion matrix provides predictions for the following outcomes: True Positive (TP): correct diagnosis with the identification of anomalies; True Negative (TN): accurate exclusion of periodic instances; False Positive (FP): instances incorrectly grouped as periodic. The performance metrics outlined below are calculated from the values within the confusion matrix.

Accuracy: Accuracy (Acc) serves as a crucial metric when evaluating the performance of DL classifiers. It is the number of correct predictions, encompassing both true positives and true negatives, divided by the total number of elements in the matrix. While a highly accurate model is desirable, it is important to ensure the use of balanced datasets, where false positive and false negative values are approximately equal. To evaluate the effectiveness of the proposed classification model on the DED dataset, we calculate the elements of the previously mentioned confusion matrix.

$$Acc(\%) = \frac{TP + TN}{TP + FN + TN + FP} \times 100$$

Sensitivity: Sensitivity (Sen) is determined by dividing the count of correct positive predictions by the total count of actual positives. Sen ranges from 0.0 (lowest) to 1.0 (highest). The following equation is used to compute Sen:

$$Sen = \frac{TP}{TP + FN}$$
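As a small worked example of these metrics, the sketch below computes accuracy and sensitivity from confusion-matrix counts, together with specificity using its standard definition TN / (TN + FP). The function name and the counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy (in percent), sensitivity and specificity (in [0, 1])."""
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)   # share of actual positives that were detected
    specificity = tn / (tn + fp)   # share of actual negatives that were excluded
    return accuracy, sensitivity, specificity


# Hypothetical counts for one class in a one-vs-rest evaluation.
acc, sen, spe = classification_metrics(tp=45, tn=40, fp=10, fn=5)
```

For a multi-class problem such as this one, the counts would typically be taken per class in a one-vs-rest fashion and the resulting metrics reported per class.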

Specificity: Specificity (Spe) is determined by dividing the count of correct negative predictions by the total count of negatives. Spe also ranges from 0.0 (lowest) to 1.0 (highest). The following equation is used to calculate Spe:

$$\mathrm{Spe} = \frac{TN}{TN + FP}$$

FIGURE 11. Illustration of a confusion matrix

C. RESULTS
In this study, the Acc of four distinct pre-trained DL models, namely ResNet50, VGG-16, Xception, and EfficientNetB7, was compared and analyzed against the new DCNN model. Large-scale ImageNet data was used to train and evaluate the pre-trained models used in this study. This data includes images of vehicles, animals, flowers, and more. While such models are successful in object image categorization, their usefulness is limited in specific domains such as medical lesion (DED) detection. Retinal fundus images include a variety of complicated characteristics and lesion localizations that influence the prediction of pathological indications. Each CNN layer creates a unique representation of the input image by successively extracting its most salient features. For example, the first layer can learn edges, whereas the last layer can recognize a lesion as a DED classification characteristic. Consequently, the following structures were examined: the BV, macular areas, and the OD were all recognized, localized, and segmented as regions of interest.

For each phase of the proposed system, a blend of standard image segmentation methods was employed. All of these algorithms yielded successful segmentation outcomes for the specified area of interest, as demonstrated in Fig. 12. To establish a high-performance system, a series of steps was taken, encompassing image enhancement, segmentation of BV, OD identification and extraction, macular region extraction, BV removal, OD elimination, feature extraction, and feature classification. Following segmentation, the images were resized to a feasible dimension based on the input specifications of each network. The ImageDataGenerator class in Keras was used to augment the imbalanced dataset in real time, reducing the possibility of model overfitting. The pre-trained models were fine-tuned by discarding and re-training n layers, where n depends on the CNN architecture.

TABLE III
MODEL'S AVERAGE PERFORMANCE ON ORIGINAL IMAGES
DED      Model            Acc%    Sen%    Spe%    Prec%
CA       ResNet50         60.87   67      58      63
         VGG-16           65.87   72      63      68
         Xception         61.67   76.43   58.85   86
         EfficientNetB7   85.43   81.92   90      79
DR       ResNet50         62.07   65      60      62
         VGG-16           67.07   70      65      67
         Xception         56.72   85      56      45
         EfficientNetB7   82.79   85      86      82
GL       ResNet50         63.41   85.71   58.82   80
         VGG-16           68.41   90.71   62.82   85
         Xception         85.49   82      89      81
         EfficientNetB7   90.80   99.12   88.33   90
NORMAL   ResNet50         63.07   66      61      63
         VGG-16           68.07   71      66      68
         Xception         61.67   76.43   68.85   80
         EfficientNetB7   86.49   82      89      81

TABLE IV
THE EFFICIENTNETB7 MODEL'S AVERAGE PERFORMANCE ON PRE-PROCESSED IMAGES
DED      Model            Acc%    Sen%    Spe%    Prec%
CA       EfficientNetB7   94.13   90      96      97
DR       EfficientNetB7   88.43   91      90.71   83
GL       EfficientNetB7   93      96      95      95
NORMAL   EfficientNetB7   90      91      93      90.71

Tables III and IV display the conclusive results for each model used for comparison, with Acc as the key metric. Among the four pre-trained DL models, EfficientNetB7 exhibited superior classification performance, surpassing ResNet50, VGG-16, and Xception. Similarly, the newly developed CNN model, leveraging pre-processed retinal images, demonstrated strong performance, in line with the proficiency of the pre-trained models.

TABLE V
THE DCNN MODEL'S AVERAGE PERFORMANCE ON ORIGINAL IMAGES
DED      Model   Acc%    Sen%    Spe%    Prec%
CA       DCNN    68.33   58.33   78.33   81
DR       DCNN    87.36   88.33   87.35   86
GL       DCNN    92.77   93      86      88
NORMAL   DCNN    91      88      92.75   89.13
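The idea behind augmenting an imbalanced dataset with geometric transformations, mentioned in the Results, can be pictured with the following minimal sketch. It is a pure-Python stand-in and does not use or reproduce the Keras ImageDataGenerator class; the transform set, the function names, and the toy 2x2 "images" are all assumptions chosen for illustration.

```python
import random


def hflip(img):
    """Mirror the image left-to-right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror the image top-to-bottom."""
    return [row[:] for row in img[::-1]]


def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


TRANSFORMS = [hflip, vflip, rot90]


def balance_by_augmentation(samples, seed=0):
    """Oversample minority classes with random geometric transforms.

    samples: list of (image, label) pairs, an image being a list of pixel rows.
    Returns a new list in which every class has as many samples as the
    largest class; the original samples are kept unchanged.
    """
    rng = random.Random(seed)
    by_label = {}
    for img, label in samples:
        by_label.setdefault(label, []).append(img)
    target = max(len(imgs) for imgs in by_label.values())
    balanced = list(samples)
    for label, imgs in by_label.items():
        for _ in range(target - len(imgs)):
            transform = rng.choice(TRANSFORMS)
            balanced.append((transform(rng.choice(imgs)), label))
    return balanced


# Toy data: three "DR" images and one "GL" image (hypothetical).
toy = [([[1, 2], [3, 4]], "DR")] * 3 + [([[5, 6], [7, 8]], "GL")]
balanced = balance_by_augmentation(toy)
counts = {}
for _, label in balanced:
    counts[label] = counts.get(label, 0) + 1
print(counts)
```

A real-time generator would typically apply such transforms per batch during training rather than materializing the extra samples up front; the sketch only shows how transformed copies can even out class counts.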


FIGURE 12. Segmentation Results

The reliability of the results is compared in Tables V and VI.


TABLE VI
THE DCNN MODEL'S AVERAGE PERFORMANCE ON PRE-PROCESSED IMAGES
DED      Model   Acc%    Sen%    Spe%    Prec%
CA       DCNN    96.43   99.46   93.24   99
DR       DCNN    98.33   99.32   91.67   94
GL       DCNN    97      98      88.24   94
NORMAL   DCNN    96      93      94.65   91.71

The custom-built DCNN demonstrates higher classification Acc than any other tested model. For the identification of retinal abnormalities, more comprehensive screening classification models were developed. For the multi-class classification of healthy and various DED statuses, the ROC curves and confusion matrices of EfficientNetB7, the best-performing pre-trained DL model, and of the built DCNN model are depicted in Fig. 13 and Fig. 14.
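The effect of preprocessing can be read off Tables III to VI as a per-class accuracy gain. The short sketch below does that arithmetic; the Acc values are copied from the tables, while the dictionary layout and the function name are illustrative assumptions.

```python
# Acc (%) copied from Tables III-VI: (original images, pre-processed images).
ACC = {
    "EfficientNetB7": {"CA": (85.43, 94.13), "DR": (82.79, 88.43),
                       "GL": (90.80, 93.00), "NORMAL": (86.49, 90.00)},
    "DCNN":           {"CA": (68.33, 96.43), "DR": (87.36, 98.33),
                       "GL": (92.77, 97.00), "NORMAL": (91.00, 96.00)},
}


def accuracy_gain(model):
    """Percentage-point gain in Acc per class after preprocessing."""
    return {cls: round(after - before, 2)
            for cls, (before, after) in ACC[model].items()}


for model in ACC:
    print(model, accuracy_gain(model))
```

On these figures the gain is largest for CA with the DCNN (about 28 percentage points) and smallest for GL with EfficientNetB7 (about 2 percentage points).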

FIGURE 13. EfficientNet B7 model performance


FIGURE 14. Built DCNN model performance

D. DISCUSSION
This research investigates the application of multi-class classification DL techniques to automatically detect three distinct DEDs. The findings of this study highlight that the effectiveness of DL algorithms is primarily affected by the quality and quantity of the available data, specifically ocular fundus images, rather than by the method itself. In this study, publicly accessible annotated fundus image data were utilized for experimentation. It is worth noting that labeled hospital fundus images could potentially yield more robust, practical, and realistic results for computer-aided clinical applications.

CA, DR, and GL are three of the most common retinal disorders associated with diabetes. Without timely assessment and intervention, these conditions have the potential to cause significant and irreversible visual impairment [1, 2]. Increasing life expectancy, busy lifestyles, and various other factors all point to a rise in the number of diabetics [1]. Early detection of abnormal symptoms reduces the future progression of the disease, its impact on affected persons, and associated medical expenses. Consequently, the DED identification system has the potential to fully or partially automate the eye-screening process. The first approach necessitates a high level of Acc, similar to that of retinal specialists. In line with the guidelines of the British Diabetic Association (BDA), the chosen approach must meet a minimum threshold of 95% Spe and 80% Sen for the detection of vision-threatening DR. The second approach condenses the results of massive screening efforts to identify possible DED instances for further review by human experts. Both of these alternatives substantially reduce the need for trained ophthalmologists and specialist facilities, opening up the procedure to a much larger population and making it more feasible in areas with limited resources.

Obtaining extensive datasets encompassing various diabetic eye disorders has posed significant challenges. Additionally, addressing early categorization issues remains a key clinical concern. Previous studies predominantly centered on binary classification for predicting diabetic eye diseases. Even though Google has developed a DL model that surpasses the performance of ophthalmologists, their Inception-v3 model, utilizing the GoogLeNet architecture, was specifically optimized for binary classification in DR identification, as demonstrated by Gulshan et al. [14]. The evaluation of this approach relied on an extensive image collection built specifically for DR screening in diabetic patients, both healthy and unhealthy. Gulshan et al. reported a recall of 93%-96% for binary disease classification; however, they also pointed out that increasing the training set from 60,000 to 120,000 samples of nonpublic data yielded no improvement in recall.

Visualizations of CNN-extracted features show that the patterns used in classification are discernible to the human eye [62, 63]. Diabetic retinal images in the moderate and severe classes contain macroscopic features at a scale that modern CNN architectures are well suited to classify: as in the ImageNet collection of images, the discernible features occupy a substantial portion of the image. However, the distinction between mild disease and a normal retina often rests on a very small region, comprising less than 1% of the total pixel volume. This degree of differentiation can prove challenging even for human interpreters to discern. This research suggests that additional investigation into automated diagnosis utilizing ocular fundus imaging should be conducted to create multi-class DED classification. The experiment begins with typical image processing to enhance moderate DED characteristics. To extract DED lesions, several standard image processing approaches were used.

Transfer learning-trained CNN models demonstrate strong performance when applied to object-oriented images like flowers, vehicles, and animals. However, they do not perform as effectively with lesion-based medical images. Therefore, the primary objective of this study is to establish standardized methods for identifying multi-class DED lesions. This is achieved through region of interest segmentation, followed by the use of transfer learning and a CNN for feature extraction.

Initially, the top layer of the model is eliminated (as per the previous technique), and a thorough assessment of four distinct CNN architectures, encompassing cutting-edge techniques, is carried out. Following this, the upper layers (n layers) are "unfrozen" and retrained to accommodate the intricacies of the application's case study, as detailed in this methodology. The training of the system is carried out using datasets including Messidor, Messidor-2, and retinal images. To evaluate potential Acc gains for the categorization of Normal/multi-class DED images, two training sets were created using the available dataset: (i) before and (ii) after preprocessing.

Because the signs of impairment are very minor, multi-class DED is sometimes very difficult to distinguish from a normal retina; an enhancement in data quality was therefore expected to make abnormal features more visible. EfficientNetB7, the top-performing pre-trained architecture, with its top layer removed and retrained, produced Acc values of 94.13%, 88.43%, 93%, and 90% for CA, DR, GL, and NORMAL, respectively (Table IV). Xception and ResNet50 achieved the lowest performance. The effect of fine-tuning differed across models. The observed Acc gain was minimal, confirming the suitability of networks that are pre-trained by default for DED classification tasks. In simpler terms, even though these CNN networks were trained on a diverse range of images from the ImageNet library, they were able to differentiate between multi-class DED and a healthy retina. Unfreezing is not recommended if it does not increase Acc, since it would waste computing resources and time.

The Acc of the built DCNN model was 96.43%, 98.33%, 97%, and 96%, respectively. The performance of the utilized models was compared in two scenarios: (1) before image preprocessing and (2) after image preprocessing. To address overfitting, models were trained on the raw dataset without preprocessing, with data augmentation involving geometric transformations applied to the Messidor, Messidor-2, and DRISHTI-GS datasets. After image preprocessing, the datasets were subjected to various conventional image processing techniques, which raised classification performance to 98.33% (the greatest Acc, obtained for DR). After evaluating the high-performing technique on the CA, DR, GL, and NORMAL detection tasks, maximum Sen rates of 99.46%, 99.32%, 98%, and 93% were achieved, along with maximum Spe rates of 93.24%, 91.67%, 88.24%, and 94.65%. Therefore, early DED detection met the BDA Sen requirement in all four classes, although Spe fell short of the 95% threshold by 1.76, 3.33, 6.76, and 0.35 percentage points, respectively.

VI. CONCLUSION
This work presents a method for identifying multi-class DED, which has not been thoroughly described in earlier research. A number of DL performance optimization strategies have been used. First, image enhancement methods, namely green channel extraction, CLAHE, and illumination correction, were applied. Subsequently, image segmentation methods such as the Tyler Coye algorithm, Otsu thresholding, and the Circular Hough Transform were applied to extract the essential ROIs, such as the BV, the macular region, and the optic nerve, from the raw ocular fundus images. After preprocessing, these images were used to train EfficientNetB7, the best performer among the four pre-trained models (ResNet50, VGG-16, Xception, and EfficientNetB7), and the proposed DCNN model. The proposed DCNN methodology shows promising results for the CA, DR, GL, and NORMAL detection tasks, achieving accuracies of 96.43%, 98.33%, 97%, and 96%, respectively. Automatic identification capabilities that are highly selective across categories are another advantage of DL. This approach helps overcome the technical constraints linked to the analytical and frequently subjective process of manual feature extraction. Moreover, the study incorporated comprehensive datasets from various origins to assess the system's robustness and its capacity to handle real-world scenarios. The proposed model streamlines labor-intensive eye-screening procedures and acts as a supplementary diagnostic tool, minimizing human subjectivity.

VII. REFERENCES
[1] Vadduri, Maneesha, and P. Kuppusamy. "Diabetic Eye Diseases Detection and Classification Using Deep Learning Techniques—A Survey." In Intl. Conf. on Info. and Commun. Tech. for Competitive Strategies (ICTCS), pp. 443-454. Singapore, 2022.
[2] Sarki, Rubina, Khandakar Ahmed, Hua Wang, Yanchun Zhang, Jiangang Ma, and Kate Wang. "Image preprocessing in classification and identification of diabetic eye diseases." Data Science and Engineering 6, no. 4 (2021): 455-471.
[3] Abràmoff, Michael D., Joseph M. Reinhardt, Stephen R. Russell, James C. Folk, Vinit B. Mahajan, Meindert Niemeijer, and Gwénolé Quellec. "Automated early detection of diabetic retinopathy." Ophthalmology 117, no. 6 (2010): 1147-1154.
[4] Kuppusamy, P., Mehfooza Munavar Basha, and Che-Lun Hung. "Retinal blood vessel segmentation using random forest with Gabor and Canny edge features." In 2022 International Conference on Smart Technologies and Systems for Next Generation Computing (ICSTSN), pp. 1-4. IEEE, 2022.
[5] Kuppusamy, P., and Che-Lun Hung. "Enriching the multi-object detection using convolutional neural network in macro-image." In 2021 International Conference on Computer Communication and Informatics (ICCCI), pp. 1-5. IEEE, 2021.
[6] Li, Hu, Ye Wang, Hua Wang, and Bin Zhou. "Multi-window based ensemble learning for classification of imbalanced streaming data." World Wide Web 20 (2017): 1507-1525.
[7] Lam, Carson, Darvin Yi, Margaret Guo, and Tony Lindsey. "Automated detection of diabetic retinopathy using deep learning." AMIA Summits on Translational Science Proceedings 2018 (2018): 147.


[8] Sarki, Rubina, Khandakar Ahmed, Hua Wang, and Yanchun Zhang. "Automatic detection of diabetic eye disease through deep learning using fundus images: a survey." IEEE Access 8 (2020): 151133-151149.
[9] Mookiah, Muthu Rama Krishnan, U. Rajendra Acharya, Chua Kuang Chua, Choo Min Lim, E. Y. K. Ng, and Augustinus Laude. "Computer-aided diagnosis of diabetic retinopathy: A review." Computers in Biology and Medicine 43, no. 12 (2013): 2136-2155.
[10] Kaur, Manpreet, and Mandeep Kaur. "A hybrid approach for automatic exudates detection in eye fundus image." Int J 5, no. 6 (2015): 411-417.
[11] Karegowda, Asha Gowda, Asfiya Nasiha, M. A. Jayaram, and A. S. Manjunath. "Exudates detection in retinal images using back propagation neural network." International Journal of Computer Applications 25, no. 3 (2011): 25-31.
[12] Sopharak, Akara, and Bunyarit Uyyanonvara. "Automatic exudates detection from diabetic retinopathy retinal image using fuzzy c-means and morphological methods." In Conf. on Advances in Computer Science and Technology, pp. 359-364. 2007.
[13] Juneja, Mamta, Shaswat Singh, Naman Agarwal, Shivank Bali, Shubham Gupta, Niharika Thakur, and Prashant Jindal. "Automated detection of Glaucoma using deep learning convolution network (G-net)." Multimedia Tools and Applications 79 (2020): 15531-15553.
[14] Gulshan, Varun, Lily Peng, Marc Coram, Martin C. Stumpe, Derek Wu, Arunachalam Narayanaswamy, Subhashini Venugopalan et al. "Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs." JAMA 316, no. 22 (2016): 2402-2410.
[15] Gargeya, Rishab, and Theodore Leng. "Automated identification of diabetic retinopathy using deep learning." Ophthalmology 124, no. 7 (2017): 962-969.
[16] Chaudhuri, Subhasis, Shankar Chatterjee, Norman Katz, Mark Nelson, and Michael Goldbaum. "Detection of blood vessels in retinal images using two-dimensional matched filters." IEEE Transactions on Medical Imaging 8, no. 3 (1989): 263-269.
[17] Vallabha, Deepika, Ramprasath Dorairaj, Kamesh Namuduri, and Hilary Thompson. "Automated detection and classification of vascular abnormalities in diabetic retinopathy." In Conference Record of the Thirty-Eighth Asilomar Conference on Signals, Systems and Computers, 2004, vol. 2, pp. 1625-1629. IEEE, 2004.
[18] Noronha, Kevin, Jagadish Nayak, and S. N. Bhat. "Enhancement of retinal fundus image to highlight the features for detection of abnormal eyes." In TENCON 2006-2006 IEEE Region 10 Conference, pp. 1-4. IEEE, 2006.
[19] Gardner, G. Gail, David Keating, Tom H. Williamson, and Alex T. Elliott. "Automatic detection of diabetic retinopathy using an artificial neural network: a screening tool." British Journal of Ophthalmology 80, no. 11 (1996): 940-944.
[20] Bezdek, James C., James Keller, Raghu Krisnapuram, and Nikhil Pal. Fuzzy models and algorithms for pattern recognition and image processing. Vol. 4. Springer Science & Business Media, 1999.
[21] Markham, R., A. Osareh, M. Mirmehdi, B. Thomas, and M. Macipe. "Automated identification of diabetic retinal exudates using support vector machines and neural networks." Investigative Ophthalmology & Visual Science 44, no. 13 (2003): 4000-4000.
[22] Acharya, Udyavara R., Choo M. Lim, E. Yin Kwee Ng, Caroline Chee, and Toshiyo Tamura. "Computer-based detection of diabetes retinopathy stages using digital fundus images." Proceedings of the Institution of Mechanical Engineers, Part H: Journal of Engineering in Medicine 223, no. 5 (2009): 545-553.
[23] Uchino, Eiichiro, Kanata Suzuki, Noriaki Sato, Ryosuke Kojima, Yoshinori Tamada, Shusuke Hiragi, Hideki Yokoi et al. "Classification of glomerular pathological findings using deep learning and nephrologist–AI collective intelligence approach." International Journal of Medical Informatics 141 (2020): 104231.
[24] Junayed, Masum Shah, Afsana Ahsan Jeny, Syeda Tanjila Atik, Nafis Neehal, Asif Karim, Sami Azam, and Bharanidharan Shanmugam. "AcneNet-a deep CNN based classification approach for acne classes." In 2019 12th International Conference on Information & Communication Technology and System (ICTS), pp. 203-208. IEEE, 2019.
[25] Gao, Xinting, Stephen Lin, and Tien Yin Wong. "Automatic feature learning to grade nuclear cataracts based on deep learning." IEEE Transactions on Biomedical Engineering 62, no. 11 (2015): 2693-2701.
[26] Syarifah, Mas Andam, Alhadi Bustamam, and Patuan P. Tampubolon. "Cataract classification based on fundus image using an optimized convolution neural network with lookahead optimizer." In AIP Conference Proceedings, vol. 2296, no. 1. AIP Publishing, 2020.
[27] Junayed, Masum Shah, Md Baharul Islam, Arezoo Sadeghzadeh, and Saimunur Rahman. "CataractNet: An automated cataract detection system using deep learning for fundus images." IEEE Access 9 (2021): 128799-128808.
[28] Pratap, Turimerla, and Priyanka Kokil. "Computer-aided diagnosis of cataract using deep transfer learning." Biomedical Signal Processing and Control 53 (2019): 101533.
[29] Jun, Tae Joon, Youngsub Eom, Cherry Kim, and Daeyoung Kim. "Tournament based ranking CNN for the cataract grading." In 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 1630-1636. IEEE, 2019.
[30] Hossain, Md Rajib, Sadia Afroze, Nazmul Siddique, and Mohammed Moshiul Hoque. "Automatic detection of eye cataract using deep convolution neural networks (DCNNs)." In 2020 IEEE Region 10 Symposium (TENSYMP), pp. 1333-1338. IEEE, 2020.
[31] Zhang, Xiaofei, Jiancheng Lv, Heng Zheng, and Yongsheng Sang. "Attention-based multi-model ensemble for automatic cataract detection in b-scan eye ultrasound images." In 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1-10. IEEE, 2020.
[32] Khan, Md Sajjad Mahmud, Mahiuddin Ahmed, Raseduz Zaman Rasel, and Mohammad Monirujjaman Khan. "Cataract detection using convolutional neural network with VGG-19 model." In 2021 IEEE World AI IoT Congress (AIIoT), pp. 0209-0212. IEEE, 2021.
[33] Kaggle. (2021). Ocular Disease Recognition. Accessed: Feb. 11, 2021. [Online]. Available: https://www.kaggle.com/andrewmvd/ocular-diseaserecognitionodir5k
[34] Amin, Javeria, Muhammad Sharif, Amjad Rehman, Mudassar Raza, and Muhammad Rafiq Mufti. "Diabetic retinopathy detection and classification using hybrid feature set." Microscopy Research and Technique 81, no. 9 (2018): 990-996.
[35] Patel, Sanskruti. "Diabetic retinopathy detection and classification using pre-trained convolutional neural networks." International Journal on Emerging Technologies 11, no. 3 (2020): 1082-1087.
[36] Das, Sraddha, Krity Kharbanda, M. Suchetha, Rajiv Raman, and D. EdwinDhas. "Deep learning architecture based on segmented fundus image features for classification of diabetic retinopathy." Biomed. Signal Process. Control 68 (2021): 102600.
[37] Math, Laxmi, and Ruksar Fatima. "Adaptive machine learning classification for diabetic retinopathy." Multimedia Tools and Applications 80, no. 4 (2021): 5173-5186.
[38] Abu Samah, Abdul Hafiz, Fadzil Ahmad, Muhammad Khusairi Osman, Noritawati Md Tahir, and Mohaiyedin Idris. "Diabetic retinopathy pathological signs detection using image enhancement technique and deep learning." Journal of Electrical and Electronic Systems Research (JEESR) 18 (2021): 44-52.
[39] Lu, Si-Yuan, Shui-Hua Wang, Xin Zhang, and Yu-Dong Zhang. "TBNet: a context-aware graph network for tuberculosis diagnosis." Computer Methods and Programs in Biomedicine 214 (2022): 106587.
[40] Lu, Siyuan, Shui-Hua Wang, and Yu-Dong Zhang. "Detection of abnormal brain in MRI via improved AlexNet and ELM optimized by chaotic bat algorithm." Neural Computing and Applications 33 (2021): 10799-10811.
[41] Jadhav, Ambaji S., Pushpa B. Patil, and Sunil Biradar. "Analysis on diagnosing diabetic retinopathy by segmenting blood vessels, optic disc and retinal abnormalities." Journal of Medical Engineering & Technology 44, no. 6 (2020): 299-316.
[42] Civit-Masot, Javier, Manuel J. Domínguez-Morales, Saturnino Vicente-Díaz, and Anton Civit. "Dual machine-learning system to aid glaucoma diagnosis using disc and cup feature extraction." IEEE Access 8 (2020): 127519-127529.
[43] Al Ghamdi, Manal, Mingqi Li, Mohamed Abdel-Mottaleb, and Mohamed Abou El-Ghar. "Multi-task deep learning for glaucoma detection and classification." Computers in Biology and Medicine 137 (2021): 104750.
[44] Gómez-Valverde, Juan J., Alfonso Antón, Gianluca Fatti, Bart Liefers, Alejandra Herranz, Andrés Santos, Clara I. Sánchez, and María J. Ledesma-Carbayo. "Automatic glaucoma classification using color fundus images based on convolutional neural networks and transfer learning." Biomedical Optics Express 10, no. 2 (2019): 892-913.
[45] Diaz-Pinto, Andres, Sandra Morales, Valery Naranjo, Thomas Köhler, Jose M. Mossi, and Amparo Navea. "CNNs for automatic glaucoma assessment using fundus images: an extensive validation." Biomedical Engineering Online 18 (2019): 1-19.
[46] Solomon, Chris, and Toby Breckon. Fundamentals of Digital Image Processing: A practical approach with examples in Matlab. John Wiley & Sons, 2011.
[47] Zuiderveld, Karel. "Contrast limited adaptive histogram equalization." Graphics Gems (1994): 474-485.
[48] Gardner, G. Gail, David Keating, Tom H. Williamson, and Alex T. Elliott. "Automatic detection of diabetic retinopathy using an artificial neural network: a screening tool." British Journal of Ophthalmology 80, no. 11 (1996): 940-944.
[49] Bezdek, James C., James Keller, Raghu Krisnapuram, and Nikhil Pal. Fuzzy models and algorithms for pattern recognition and image processing. Vol. 4. Springer Science & Business Media, 1999.
[50] Jun, Tae Joon, Youngsub Eom, Cherry Kim, and Daeyoung Kim. "Tournament based ranking CNN for the cataract grading." In 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 1630-1636. IEEE, 2019.
[51] Hossain, Md Rajib, Sadia Afroze, Nazmul Siddique, and Mohammed Moshiul Hoque. "Automatic detection of eye cataract using deep convolution neural networks (DCNNs)." In 2020 IEEE Region 10 Symposium (TENSYMP), pp. 1333-1338. IEEE, 2020.
[52] Zhang, Xiaofei, Jiancheng Lv, Heng Zheng, and Yongsheng Sang. "Attention-based multi-model ensemble for automatic cataract detection in b-scan eye ultrasound images." In 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1-10. IEEE, 2020.
[53] Khan, Md Sajjad Mahmud, Mahiuddin Ahmed, Raseduz Zaman Rasel, and Mohammad Monirujjaman Khan. "Cataract detection using convolutional neural network with VGG-19 model." In 2021 IEEE World AI IoT Congress (AIIoT), pp. 0209-0212. IEEE, 2021.
[54] Kaggle. (2021). Ocular Disease Recognition. Accessed: Feb. 11, 2021. [Online]. Available: https://www.kaggle.com/andrewmvd/ocular-diseaserecognitionodir5k
[55] Amin, Javeria, Muhammad Sharif, Amjad Rehman, Mudassar Raza, and Muhammad Rafiq Mufti. "Diabetic retinopathy detection and classification using hybrid feature set." Microscopy Research and Technique 81, no. 9 (2018): 990-996.
[56] Patel, Sanskruti. "Diabetic retinopathy detection and classification using pre-trained convolutional neural networks." International Journal on Emerging Technologies 11, no. 3 (2020): 1082-1087.
[57] Das, Sraddha, Krity Kharbanda, M. Suchetha, Rajiv Raman, and D. EdwinDhas. "Deep learning architecture based on segmented fundus image features for classification of diabetic retinopathy." Biomed. Signal Process. Control 68 (2021): 102600.
[58] Math, Laxmi, and Ruksar Fatima. "Adaptive machine learning classification for diabetic retinopathy." Multimedia Tools and Applications 80, no. 4 (2021): 5173-5186.
[59] Abu Samah, Abdul Hafiz, Fadzil Ahmad, Muhammad Khusairi Osman, Noritawati Md Tahir, and Mohaiyedin Idris. "Diabetic retinopathy pathological signs detection using image enhancement technique and deep learning." Journal of Electrical and Electronic Systems Research (JEESR) 18 (2021): 44-52.
[60] Lu, Si-Yuan, Shui-Hua Wang, Xin Zhang, and Yu-Dong Zhang. "TBNet: a context-aware graph network for tuberculosis diagnosis." Computer Methods and Programs in Biomedicine 214 (2022): 106587.
[61] Lu, Siyuan, Shui-Hua Wang, and Yu-Dong Zhang. "Detection of abnormal brain in MRI via improved AlexNet and ELM optimized by chaotic bat algorithm." Neural Computing and Applications 33 (2021): 10799-10811.
[62] Jadhav, Ambaji S., Pushpa B. Patil, and Sunil Biradar. "Analysis on diagnosing diabetic retinopathy by segmenting blood vessels, optic disc and retinal abnormalities." Journal of Medical Engineering & Technology 44, no. 6 (2020): 299-316.
[63] Civit-Masot, Javier, Manuel J. Domínguez-Morales, Saturnino Vicente-Díaz, and Anton Civit. "Dual machine-learning system to aid glaucoma diagnosis using disc and cup feature extraction." IEEE Access 8 (2020): 127519-127529.
[64] Chandrapati, Lalitha Manasa, and Ch Koteswara Rao. "Integrated Assessment of Teaching Efficacy: A Natural Language Processing Approach." International Journal of Advanced Computer Science and Applications 14, no. 1 (2023).

MANEESHA VADDURI received the Bachelor's degree in Computer Science and Engineering from JNTUK, Andhra Pradesh, INDIA, in 2017, and the Master's degree in Computer Science and Engineering from JNTUK, Andhra Pradesh, INDIA, in 2019. She is currently pursuing the Ph.D. degree in the School of Computer Science and Engineering at VIT-AP University, AP, INDIA. Her research interests include Deep Learning, Machine Learning, and Computer Vision.

P. KUPPUSAMY received the Bachelor's degree in Computer Science and Engineering from Madras University, India, in 2002, and the Master's degree in Computer Science and Engineering from Anna University, Chennai, India, in 2007. He completed the Ph.D. in Information and Communication Engineering at Anna University, Chennai, in 2014. At present, he is working as an Associate Professor in the School of Computer Science & Engineering, VIT-AP University, Amaravati, AP, INDIA. He has published 50+ research papers in leading international journals and international conferences, and a book on Computer Science and Engineering. He is currently working on Machine Learning, Deep Learning, and Internet of Things based research projects with sensors, Raspberry Pi, and Arduino for handling smart devices. He serves as a reviewer for various journals and international conferences. He has organized research workshops and seminars based on the Arduino controller with smart devices, as well as FDPs. He is a Member of IEEE, ISTE, IAENG, and IACSIT. His current research interests are Machine Learning, Deep Learning, Image Processing, and Internet of Things.
