1 s2.0 S0957417423000581 Main
ARTICLE INFO

Keywords:
Contrast limited adaptive histogram equalization (CLAHE)
Diabetic retinopathy (DR)
Parallel convolutional neural network (PCNN)
Extreme learning machine (ELM)

ABSTRACT

Diabetic retinopathy (DR) is an incurable retinal condition caused by excessive blood sugar that, if left untreated, can even result in blindness. This paper proposes a novel automated technique for DR detection. To accentuate the lesions, the fundus images (FIs) were preprocessed using Contrast Limited Adaptive Histogram Equalization (CLAHE). A parallel convolutional neural network (PCNN) was employed for feature extraction, and the extreme learning machine (ELM) technique was then used for DR classification. Compared with a similar CNN structure, the PCNN design uses fewer parameters and layers, which reduces the time required to extract distinctive features. The technique was evaluated on two datasets, the Kaggle DR 2015 competition dataset (Dataset 1; 34,984 FIs) and APTOS 2019 (3,662 FIs), with promising results: the proposed technique attained accuracies of 91.78 % and 97.27 %, respectively. A secondary finding of the study was that the proposed framework remained stable for both larger and smaller datasets, as well as for balanced and imbalanced datasets. Furthermore, in terms of classifier performance metrics, model parameters and layers, and prediction time, the proposed approach outperformed existing state-of-the-art models, which could substantially help medical practitioners identify DR accurately.
* Corresponding author.
E-mail addresses: [email protected] (M. Ahsan), [email protected] (J. Haider), [email protected] (M. Kowalski).
https://doi.org/10.1016/j.eswa.2023.119557
Received 15 February 2022; Received in revised form 2 December 2022; Accepted 12 January 2023
Available online 13 January 2023
0957-4174/© 2023 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Md. Nahiduzzaman et al. Expert Systems With Applications 217 (2023) 119557

1. Introduction

Diabetic retinopathy (DR) is a chronic retinal disease regarded as the sixth most common cause of blindness worldwide. It is a hidden, progressive disease among diabetic patients. According to 2013 statistics, 382 million people were affected by diabetes-related retinal disease, and by 2025 this number is projected to exceed 592 million (Pandey & Sharma, 2018). DR shows no clear early signs, and as the condition worsens, complete blindness is often the end result. Regular screening can identify DR at an early stage, when appropriate medication can help arrest further damage. High-resolution fundus images (FIs) are used to detect the tiny lesions and to grade the severity level. Non-proliferative DR (NPDR) and proliferative DR (PDR) are the two primary forms of DR. Together with the No DR class, NPDR is graded into four severity levels: No DR, Mild, Moderate, and Severe (Mumtaz et al., 2018). Fig. 1 shows some common symptoms of DR (Mumtaz et al., 2018). A small, dark reddish, dot-like lesion visible near the terminal point of a blood vessel is called a microaneurysm (MA). Hypertension and blockage of the retinal veins cause retinal hemorrhages (HMs), another consequence of DR; small HMs can at times look very similar to MAs. Exudates are yellow flecks made up of lipid and protein residues that leak out of injured capillaries.

DR is difficult to treat in its later phases. Only a few microaneurysms appear in Mild NPDR. In contrast, Moderate NPDR shows multiple MAs, hemorrhages, and venous beading, which compromise the blood supply to the retina. Severe NPDR is defined by any of the following: more than 20 intraretinal hemorrhages in each of the four quadrants, visible venous beading in two or more quadrants, or substantial intraretinal microvascular abnormality (IRMA) in one or more quadrants. In the PDR stage, new blood vessels form in addition to the anomalies above (Chudzik et al., 2018).

Fig. 1. Fundus image with various lesions for DR classification.

DR is diagnosed from fundus images: expert ophthalmologists locate the lesions in the images, grade the DR level from them, and suggest appropriate treatment accordingly. Because the lesions are small and the boundaries between consecutive DR grades often overlap, even expert ophthalmologists do not always give consistent diagnoses for the same fundus images, and the grading is time-consuming. The research community has therefore recognized an urgent need for computer-aided systems.

Various computer-aided systems have been proposed for DR screening. Since ophthalmologists grade severity by examining the lesions present in FIs, some lesion segmentation techniques were developed to mimic this process, marking out the tiny lesions to assist ophthalmologists in making a correct diagnosis. Image processing techniques were frequently used to segment lesions in FIs. Using image processing techniques, Mumtaz et al. (2018) showed the automatic identification of one of the red lesions, hemorrhage, which is one of the most recognizable symptoms of retinal disorders among diabetic patients. Akram et al. (2014) detected MAs from small patches extracted from the FIs, using PCA for dimensionality reduction. Rahim et al. (2016) used fuzzy C-means (FCM) clustering to provide a novel automated diagnosis of DR and maculopathy in fundus images. Kar and Maity (2017) developed a four-part lesion detection technique comprising vessel extraction and optic disc removal, preprocessing, candidate lesion detection, and post-processing. Dark lesions were separated from the poorly lit retinal background using curvelet-based edge enhancement, while the contrast between bright lesions and the background was improved with a well-designed wideband bandpass filter. The mutual information of the maximum matched filter response and the maximum Laplacian of Gaussian response was then jointly maximized. Finally, morphology-based post-processing was used to exclude incorrectly identified candidate pixels. Umapathy et al. (2019) extracted texture features using image processing and classified them with a Decision Tree (DT) classifier; in a second method, the authors used transfer learning. Because the complex features were extracted with hand-crafted image processing, the accuracy was not very high. Deep learning models were therefore also proposed for lesion segmentation. For the segmentation of microaneurysms, Chudzik et al. (2018) presented a patch-based Convolutional Neural Network (CNN) with batch normalization layers and a Dice loss function. Pixel-wise exudate detection with a deep CNN was proposed by Yu et al. (2017). Gondal et al. (2017) presented a weakly supervised CNN model that highlighted salient regions of the retinal images, obtaining high classification and sensitivity scores. The Mask R-CNN model was proposed to segment small lesions (MAs and exudates) by Shenavarmasouleh and Arabnia (2007). The authors used the transfer learning (TL) approach to reuse the weights of a pre-trained ResNet101 model and achieved an mAP score of 45 %.

Besides segmentation, image-level classification is also popular for DR grading: the whole image is assigned a grade based on the distinctive features it contains. Several studies used traditional machine learning (ML) methods such as DT, support vector machine (SVM), random forest (RF), logistic regression (LR), and Gaussian Naïve Bayes (GNB). For traditional ML-based classification, features were first extracted with image processing techniques and then used to develop the models. For example, Lachure et al. (2015) used morphological operations such as erosion, dilation, opening, and closing to segment MAs and exudates, and then fed the features to SVM and k-nearest neighbors (KNN) classifiers to grade the FIs. Asha and Karpagavalli (2015) detected retinal exudates using machine learning techniques: the FIs were segmented with the fuzzy C-means algorithm, and exudate features were then extracted from the Luv color space. The classifiers included NB, Multilayer Perceptron (MLP), and Extreme Learning Machine (ELM), with ELM providing the best results. Honnungar et al. (2016) studied ML techniques for automatically identifying and categorizing DR from retinal images. Their method comprised image preprocessing with Contrast Limited Adaptive Histogram Equalization (CLAHE), feature extraction with a bag-of-visual-words model, and classification into distinct DR stages using multiclass classifiers (logistic regression, SVM, and RF). Raman et al. (2016) applied CLAHE to enhance the images, then used the Sobel operator and contours with a circular Hough transform for optic disc segmentation, morphological operations for blood vessel segmentation, region growing for exudate segmentation, and a mixture model for microaneurysm segmentation; finally, an artificial neural network (ANN) was used as the classifier. Carrera et al. (2017) used image processing to isolate blood vessels, microaneurysms, and hard exudates for feature extraction, and then fed the features to an SVM classifier, obtaining a sensitivity of 95 % and an accuracy of 94 %. Somasundaram and Ali (2017) developed an ML bagging ensemble classifier (ML-BEC) using t-distributed Stochastic Neighbor Embedding (t-SNE) features. Ramani et al. (2017) proposed a two-level classification for DR grading: an ensemble of Best First Trees (BFTs) was used first, and the misclassified instances were then passed to a second-level ensemble of J48 Graft Trees. Chetoui et al. (2018) extracted texture features using Local Ternary Pattern (LTP) and Local Energy-based Shape Histogram (LESH), used a histogram binning method for feature representation, and classified with SVMs using various kernel functions. They showed that LESH with an RBF-kernel SVM performed best, with an accuracy of 90 %. Ali et al. (2020) presented ML approaches for the segmentation and categorization of DR, proposing a new clustering-based region-growing paradigm. They used four types of texture features, namely histogram (H), wavelet (W), co-occurrence matrix (COM), and run-length matrix (RLM), and applied data fusion to create hybrid-feature datasets that increased classification accuracy. To obtain 13 optimal features, they used Fisher, correlation-based feature selection, mutual information, and probability of error plus average correlation. Finally, five classifiers were used: SMO (sequential minimal optimization), Lg (logistic), MLP (multilayer perceptron), and SLg (simple logistic). Gayathri et al. (2021) designed a multipath convolutional neural network (M-CNN) for extracting global and local features from fundus images, followed by SVM, RF, and J48 classifiers for the final DR grade prediction; the M-CNN obtained the best result with the J48 classifier. Mahmoud et al. (2021) introduced a hybrid inductive ML algorithm (HIMLA) for automatic DR detection. Color FIs were normalized, and a convolutional encoder-decoder was used to segment blood vessels. A multiple-instance learning technique was used for feature extraction and classification. Reddy et al. (2020) experimented with an ensemble learning method with
Adaboost, RF, DT, KNN, and Logistic Regression, using grid search for hyperparameter tuning. Odeh et al. (2016) proposed an ensemble method using RF for robust and powerful learning, NN for improving precision, and SVM for accurate, time-saving prediction; for feature selection, the authors used info-gain attribute evaluation and wrapper subset evaluation algorithms.

One problem with traditional ML is that the complex features must be extracted first, and this manual feature extraction using image processing sometimes fails to capture all the complex features needed for accurate classification. This is where the deep learning (DL) approach, now used for imaging in a wide range of applications, comes in. DL models have also been applied to DR identification with significant success, because convolutional layers can accurately extract complex features. Islam et al. (2018) proposed a CNN architecture with 4 × 4 kernels, together with some preprocessing and augmentation methods, for detecting DR; the authors employed an L2 regularizer and dropout to reduce overfitting and achieved 98 % sensitivity and 94 % specificity with a kappa score of 85 %. Zhou et al. (2018) proposed a multitasking deep learning model for DR grading: because the DR stages are interrelated, the authors predicted the labels with both classification and regression, obtaining a kappa score of 84 %. A Siamese-like architecture was proposed for DR detection by Zeng et al. (2019); the model took binocular fundus images as input and was trained with a transfer learning strategy. An attention-based DL model, BiRA-Net, was proposed by Zhao et al. (2019). Islam et al. (2020) proposed a VGG16-based transfer learning approach with a color version of preprocessing, using stratified K-fold cross-validation to reduce overfitting. For a smaller Kaggle dataset, Samanta et al. (2020) proposed a transfer learning-based DenseNet and attained a kappa score of 0.8836 on the validation set. On the Messidor-1 and APTOS datasets, Gangwar and Ravi (2021) used a pre-trained Inception-ResNet-v2 model with a custom layer on top, achieving accuracies of 72.33 % and 82.18 %, respectively. Islam et al. (2021) developed a customized VGG19 model and a downsampling technique for DR detection. Majumder and Kehtarnavaz (2021) proposed a multitasking deep learning model to detect the five grades of DR, composed of one regression model, one classification model, and one regression model for inter-dependency; they achieved kappa scores of 90 % and 88 % on the APTOS and EyePACS datasets, respectively. An integrated shallow network was also proposed by Chen et al. (2020).

Although various models have been developed, further improvement is still required, particularly for multiclass classification. Several studies employed ML models, but their classification performance was not satisfactory, even though their complexity was lower than that of existing DL models. To overcome these shortcomings, researchers used various transfer learning (TL) models to achieve higher classification performance. However, TL models have a vast number of parameters and layers and take a long time to train. Therefore, this study proposes a framework that balances the ML and DL approaches, increasing classification performance while reducing the number of parameters and layers, and hence the processing time. In this study, the FIs were preprocessed using CLAHE to highlight DR lesions. A lightweight parallel CNN model was developed to extract the most discriminant features, which were then standardized using a standard scaler. Finally, an ML model with a single hidden layer, the ELM, was used to classify DR. The novelty of the proposed framework lies in its smaller number of parameters and layers and its comparatively lower processing time. The framework also performs well across different data settings, for instance, small or large datasets, balanced or imbalanced datasets, and low-resolution FIs.

2. Dataset description

In this study, two widely used datasets were employed: the Kaggle DR 2015 competition dataset (Dataset-1) and APTOS, 2019, provided by EyePACS and Aravind Eye Hospital, respectively, via Kaggle (California Healthcare Foundation, 2019; APTOS, 2019). Both datasets label FIs with five DR grades, with 34,984 FIs in Dataset-1 and 3,662 FIs in APTOS, 2019. Some FIs were lost during image extraction from the Kaggle DR 2015 dataset. Because both datasets come from Kaggle competitions, their test images were kept private; only the released training images were therefore used, and they were further split into training (80 %) and testing (20 %) sets for the classification task. Table 1 shows the number of FIs per class for both datasets, and representative samples from each class are shown in Fig. 2.

Table 1
The number of FIs per class for Dataset-1 and APTOS, 2019.
Level Dataset-1 (Image Ratio) APTOS, 2019 (Image Ratio)
No DR 25,707 (0.73) 1,805 (0.49)
Mild DR 2,435 (0.07) 370 (0.10)
Moderate DR 5,268 (0.15) 999 (0.27)
Severe DR 869 (0.025) 193 (0.05)
PDR 705 (0.02) 295 (0.08)
Total 34,984 3,662

3. Proposed framework

This study proposes a framework for grading DR severity. It combines the benefits of ML and DL algorithms to obtain a robust framework that trades off processing cost against classification performance. Fig. 3 shows the proposed framework for detecting DR from FIs. First, the FIs were preprocessed using CLAHE to highlight the lesions more clearly, then normalized, and finally resized. Next, a lightweight CNN model was developed to extract the most discriminant features from the processed FIs. The extracted features were standardized and fed into the ELM algorithm, which classifies the severity level of DR. Each component of the framework is explained in the following subsections.

3.1. Pre-processing

Image preprocessing is crucial for medical image analysis because classification performance depends on how well the images have been preprocessed. CLAHE has shown favorable results for enhancing image quality in medical image preprocessing (Nahiduzzaman et al., 2021a,b). Since the datasets contained images of varying quality, CLAHE was used to improve low-contrast images while emphasizing the lesions in the FIs. In CLAHE, the contrast enhancement is controlled by clipping the histogram at a user-defined value called the clip limit; the clip level determines how much of the histogram is clipped and redistributed, and thereby limits the contrast adjustment. In this study, the color version of CLAHE was used with a tile size of (4 × 4) and a clip limit of 2.0. After CLAHE, the FIs were normalized by dividing by 255 so that every pixel lies between 0 and 1, which also reduced the complexity of the model. Because the datasets contained FIs of different sizes, all FIs were resized to (124 × 124) to fit the CNN model. Fig. 4 shows the effect of CLAHE on the FIs.

3.2. Feature extraction using parallel convolutional layers

One of the main goals of this study was to design a CNN with fewer parameters and layers, which shortens the processing time while still extracting the most prominent features.
Fig. 2. Samples of No DR, Mild, Moderate, Severe, and PDR from Dataset-1 and APTOS, 2019.
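Section 2 states that 80 % of each dataset was used for training and the rest for testing, but not how the split was drawn. As a hedged illustration, the sketch below assumes a per-class (stratified) split that puts floor(0.8 · n) images of each class into training. With the APTOS, 2019 class counts from Table 1, this assumption reproduces the per-class training counts (1444, 296, 799, 154, 236) and test counts (361, 74, 200, 39, 59) given in the results. The function name `stratified_split` and the fixed seed are illustrative choices, not the authors' code.

```python
import numpy as np


def stratified_split(labels, train_percent=80, seed=0):
    """Shuffle and split sample indices class by class.

    Assumption: floor(n * train_percent / 100) samples of each class go to training.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        n_train = len(idx) * train_percent // 100  # integer arithmetic avoids float rounding
        train_idx.extend(idx[:n_train].tolist())
        test_idx.extend(idx[n_train:].tolist())
    return np.array(train_idx), np.array(test_idx)


# APTOS, 2019 class counts from Table 1 (No DR, Mild, Moderate, Severe, PDR).
aptos_counts = [1805, 370, 999, 193, 295]
labels = np.repeat(np.arange(5), aptos_counts)
train_idx, test_idx = stratified_split(labels)
print(np.bincount(labels[train_idx]).tolist())  # [1444, 296, 799, 154, 236]
print(np.bincount(labels[test_idx]).tolist())   # [361, 74, 200, 39, 59]
```

Splitting per class keeps the class ratios of Table 1 in both subsets, which matters for the heavily imbalanced Severe and PDR classes.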
Fig. 4. FIs of the five DR levels without preprocessing and after preprocessing with CLAHE.
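Section 3.1 describes CLAHE in terms of a clip limit and a tile grid. The NumPy-only sketch below illustrates these two ideas on a single-channel uint8 image. Each tile's histogram is clipped at `clip_limit` times the average bin count, a relative convention assumed here by analogy with OpenCV's `clipLimit`. The clipped excess is spread evenly over all bins, and the tile is remapped through the resulting cumulative distribution.

The sketch deliberately omits the bilinear interpolation between neighboring tiles that full CLAHE uses, so tile seams can appear. It also skips the (124 × 124) resize. In practice, a library routine such as OpenCV's `cv2.createCLAHE(clipLimit=2.0, tileGridSize=(4, 4))` would be used. Reading the "(4 × 4) tile size" as a 4 × 4 grid of tiles, and applying CLAHE per channel for color FIs, are both assumptions. The function names are hypothetical.

```python
import numpy as np


def clipped_tile_lut(tile, clip_limit=2.0, n_bins=256):
    """Lookup table that equalizes one uint8 tile using a clipped histogram."""
    hist = np.bincount(tile.ravel(), minlength=n_bins).astype(np.float64)
    # Absolute clip level = clip_limit x average bin count (assumed OpenCV-style convention).
    limit = max(1.0, clip_limit * tile.size / n_bins)
    excess = np.clip(hist - limit, 0.0, None).sum()
    # Clip the peaks and spread the removed counts evenly over all bins.
    hist = np.minimum(hist, limit) + excess / n_bins
    cdf = np.cumsum(hist)
    cdf /= cdf[-1]
    return np.round(cdf * (n_bins - 1)).astype(np.uint8)


def tiled_clipped_equalize(gray, grid=(4, 4), clip_limit=2.0):
    """Apply clipped equalization tile by tile (no bilinear blending between tiles)."""
    out = np.empty_like(gray)
    row_blocks = np.array_split(np.arange(gray.shape[0]), grid[0])
    col_blocks = np.array_split(np.arange(gray.shape[1]), grid[1])
    for rows in row_blocks:
        for cols in col_blocks:
            r0, r1 = rows[0], rows[-1] + 1
            c0, c1 = cols[0], cols[-1] + 1
            tile = gray[r0:r1, c0:c1]
            out[r0:r1, c0:c1] = clipped_tile_lut(tile, clip_limit)[tile]
    return out


def preprocess(gray):
    """Clipped equalization followed by scaling to [0, 1] (division by 255)."""
    return tiled_clipped_equalize(gray).astype(np.float32) / 255.0


# A low-contrast image (values 100..110) gets its intensity range stretched.
rng = np.random.default_rng(0)
low = rng.integers(100, 111, size=(64, 64)).astype(np.uint8)
eq = tiled_clipped_equalize(low)
print(int(low.min()), int(low.max()), "->", int(eq.min()), int(eq.max()))
```

A lower clip limit flattens the histogram less, so noise in nearly uniform regions is amplified less. This is the trade-off the clip limit controls.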
These prominent features help the ELM model accurately detect the DR levels. In a CNN, convolutional layers (CLs) are typically stacked sequentially to obtain the best features. Too few CLs might lose some discriminant features, whereas too many might lead to overfitting. Hence, the number of CLs must be chosen carefully to extract the most relevant features. In this study, six CLs were selected to extract the prominent features while reducing
overfitting. The lightweight parallel CNN is shown in Fig. 5. In this network, four CLs were placed in parallel, which lowered the number of parameters and the processing time: because the four CLs run in parallel, they act as a single layer while performing like four CLs. Each parallel CL had 64 filters, with kernel sizes of 9 × 9, 7 × 7, 5 × 5, and 3 × 3 for the first, second, third, and fourth CLs, respectively, and ReLU activation. "Same" padding was used in these four CLs so that border pixels, which can hold important information in FIs, are also covered. The outputs of the parallel CLs were then concatenated and fed into the sequential part of the CNN. The last two CLs had 32 and 16 filters, respectively, with a kernel size of 3 × 3 and "valid" padding. Each of these CLs was followed by batch normalization, an activation, and a max-pooling layer. Max-pooling with 2 × 2 filters keeps the highest value in each region, retaining the most important regions of the feature maps. There were two fully connected (FC) layers, and the features were extracted from the last FC layer. Two dropout layers with a probability of 0.5 were used, one after the last CL and another after the first FC layer; dropout reduces overfitting and speeds up training by randomly skipping 50 % of the nodes. To train the feature extractor, the CNN model was run for 50 epochs with a batch size of 64, a learning rate of 0.001, the Adam optimizer, and sparse categorical cross-entropy loss. A total of 120 features were taken from the last FC layer, a number chosen by trial and error. The summary of the CNN model is shown in Table 2.

Table 2
Summary of the proposed lightweight CNN for feature extraction.
Layer (Type) Output Shape Parameters
model (Functional) (None, 124, 124, 256) 31,744
conv5 (Conv2D) (None, 122, 122, 32) 73,760
bn1 (BatchNormalization) (None, 122, 122, 32) 128
av5 (Activation) (None, 122, 122, 32) 0
mp1 (MaxPooling2D) (None, 61, 61, 32) 0
conv6 (Conv2D) (None, 59, 59, 16) 4,624
bn2 (BatchNormalization) (None, 59, 59, 16) 64
av2 (Activation) (None, 59, 59, 16) 0
mp2 (MaxPooling2D) (None, 29, 29, 16) 0
dp1 (Dropout) (None, 29, 29, 16) 0
ft (Flatten) (None, 13456) 0
dense (Dense) (None, 250) 3,364,250
bn4 (BatchNormalization) (None, 250) 1,000
av4 (Activation) (None, 250) 0
dp2 (Dropout) (None, 250) 0
Feature Extraction (Dense) (None, 120) 30,120
Total Parameters 3,506,775
Trainable Parameters 3,505,939
Non-trainable Parameters 836

3.3. Extreme learning machine

Before the features were fed into the ELM, they were standardized by subtracting the mean and scaling to unit variance. A standard scaler was used to standardize the extracted features, which improved the classification performance of the models (Nahiduzzaman et al., 2019). The standard score y of a sample x is calculated using Eq. (1) (Farrell and Saloner, 1985):

$$ y = \frac{x - \bar{x}}{\sigma} \qquad (1) $$

where x̄ is the mean of the samples and σ is the standard deviation of the samples.

Huang et al. (2006) proposed the ELM, a feedforward neural network with a single hidden layer. The 120 standardized features were classified using a single hidden layer with 1000 nodes for Dataset-1 and 200 nodes for APTOS, 2019, both selected by trial and error. For both datasets, the input and output layers of the ELM had 120 and 5 nodes, respectively, and ReLU was used as the activation function. The weights from the input to the hidden layer are assigned randomly, whereas the weights from the hidden layer to the output layer are computed with the pseudoinverse. Because no backpropagation is needed, training can be up to a thousand times faster than for a typical NN, and it has been reported to give better generalization and higher classification performance (Huang et al., 2006; Nahiduzzaman et al., 2021a,b). The lightweight CNN feature extractor has 3,505,939 trainable parameters. For classification on Dataset-1 and APTOS, 2019, the ELM has 125,000 (120 × 1000 + 1000 × 5) and 25,000 (120 × 200 + 200 × 5) parameters, respectively, giving totals of 3,630,939 and 3,530,939 trainable parameters.
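The standardization of Eq. (1) and the ELM training described above can be sketched in NumPy as follows. This is an illustrative implementation under stated assumptions, not the authors' code. The random input weights and hidden biases are drawn from a standard normal distribution, the hidden biases being an assumption of this sketch. Labels are one-hot encoded, and the output weights come from `np.linalg.pinv` (the Moore–Penrose pseudoinverse). Because real CNN features are not available here, a synthetic five-class, 120-feature dataset stands in for them.

```python
import numpy as np


def standardize(train, test):
    """Eq. (1): y = (x - mean) / std, with statistics estimated on the training features only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    return (train - mean) / std, (test - mean) / std


class ELM:
    """Single-hidden-layer ELM: random input weights, ReLU hidden layer, pseudoinverse output weights."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.maximum(X @ self.W + self.b, 0.0)  # ReLU activation

    def fit(self, X, y, n_classes):
        # Input-to-hidden weights and biases are random and never trained.
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        T = np.eye(n_classes)[y]  # one-hot targets
        # Hidden-to-output weights: least-squares solution via the pseudoinverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)


# Synthetic stand-in for the 120 CNN features: five well-separated Gaussian classes.
rng = np.random.default_rng(1)
centers = rng.normal(0.0, 3.0, size=(5, 120))
y_train = np.repeat(np.arange(5), 100)
y_test = np.repeat(np.arange(5), 20)
X_train = centers[y_train] + rng.normal(size=(500, 120))
X_test = centers[y_test] + rng.normal(size=(100, 120))

X_train, X_test = standardize(X_train, X_test)
elm = ELM(n_hidden=200).fit(X_train, y_train, n_classes=5)  # 200 hidden nodes, as for APTOS, 2019
accuracy = (elm.predict(X_test) == y_test).mean()
print(f"test accuracy: {accuracy:.2f}")
```

Only `beta` is solved for, so training reduces to a single pseudoinverse of the samples × hidden-nodes matrix. This is where ELM's speed advantage over backpropagation comes from.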
Fig. 5. The lightweight parallel CNN used to extract features from the FIs.
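As a worked check of Table 2 and of the ELM sizes in Section 3.3, the standard Keras counting rules can be applied:

- A k × k convolution with C_in input channels and F filters has F(k²·C_in + 1) weights and biases.
- A dense layer has (inputs + 1) × outputs.
- Batch normalization has 4 parameters per channel, half of them non-trainable.
- A "valid" 3 × 3 convolution shrinks each spatial side by 2, and 2 × 2 max-pooling halves it, rounding down.

The ELM count uses input × hidden + hidden × output weights without biases, an assumption that matches the quoted totals of 3,630,939 and 3,530,939. The helper names are illustrative.

```python
def conv2d_params(k, c_in, filters):
    """Weights plus one bias per filter for a k x k convolution."""
    return filters * (k * k * c_in + 1)


def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return (n_in + 1) * n_out


def batchnorm_params(channels):
    """gamma, beta (trainable) + moving mean, moving variance (non-trainable)."""
    return 4 * channels


def elm_weights(n_in, n_hidden, n_out):
    """ELM weight count without biases (assumption that matches the reported totals)."""
    return n_in * n_hidden + n_hidden * n_out


# Four parallel "same"-padded branches on a 124 x 124 x 3 input, 64 filters each.
parallel = sum(conv2d_params(k, 3, 64) for k in (9, 7, 5, 3))
conv5 = conv2d_params(3, 4 * 64, 32)  # input is the 256-channel concatenation
conv6 = conv2d_params(3, 32, 16)

# Spatial size: valid 3x3 conv (-2), then 2x2 max-pooling (floor division by 2).
side = 124
side = (side - 2) // 2  # conv5 + mp1 -> 61
side = (side - 2) // 2  # conv6 + mp2 -> 29
flat = side * side * 16

dense = dense_params(flat, 250)
features = dense_params(250, 120)

cnn_trainable = 3_505_939  # from Table 2
print(parallel, conv5, conv6, flat, dense, features)  # 31744 73760 4624 13456 3364250 30120
print(elm_weights(120, 1000, 5), elm_weights(120, 200, 5))  # 125000 25000
print(cnn_trainable + elm_weights(120, 1000, 5),
      cnn_trainable + elm_weights(120, 200, 5))  # 3630939 3530939
```

The dense layer after flattening holds about 96 % of the CNN's parameters. Most of the remaining saving from the parallel design comes from running the large 9 × 9 and 7 × 7 kernels only once, on the 3-channel input.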
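$$ \text{F1-Score} = \frac{2 \times (\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}} \qquad (5) $$

$$ \text{AUC} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right) \qquad (6) $$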
where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. True positives are DR-affected patients correctly identified as DR, true negatives are normal patients correctly detected as normal, false positives are normal patients wrongly detected as DR, and false negatives are DR patients wrongly detected as normal.
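For the five DR levels, precision, recall, Eq. (5), and Eq. (6) can be computed per class from a multiclass confusion matrix by treating each level one-vs-rest. The sketch below assumes rows are true labels and columns are predictions, and also reports overall accuracy as the trace divided by the total. Note that Eq. (6) averages sensitivity and specificity at a single operating point (balanced accuracy), rather than integrating a full ROC curve. The function name and the toy matrix are illustrative only.

```python
import numpy as np


def per_class_metrics(cm):
    """One-vs-rest precision, recall, F1 (Eq. 5), and AUC as in Eq. (6), plus overall accuracy."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as this class, actually another class
    fn = cm.sum(axis=1) - tp  # actually this class, predicted as another class
    tn = cm.sum() - tp - fp - fn
    precision = tp / (tp + fp)  # undefined (nan) if a class is never predicted
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    auc = 0.5 * (tp / (tp + fn) + tn / (tn + fp))
    accuracy = tp.sum() / cm.sum()
    return precision, recall, f1, auc, accuracy


# Toy 3-class confusion matrix (hypothetical numbers, rows = true, columns = predicted).
cm = [[50, 2, 0],
      [3, 20, 2],
      [0, 1, 10]]
precision, recall, f1, auc, accuracy = per_class_metrics(cm)
print(np.round(precision, 3), np.round(recall, 3), np.round(f1, 3), np.round(auc, 3), round(accuracy, 3))
```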
All code was written in Python and run in PyCharm Community Edition (2021.2.3). The CNN model was built with Keras, using TensorFlow as the backend. The ELM models were trained and tested on a PC with a 64-bit Windows 10 Pro operating system, an Intel(R) Core(TM) i9-11900 CPU @ 2.50 GHz, 32 GB of RAM, and an NVIDIA GeForce RTX 3090 GPU with 24 GB of memory.
In this section, different aspects of performance are investigated to show the robustness of the proposed framework. A lightweight customized CNN extracted 120 prominent features from the preprocessed FIs, and these features were standardized and fed into the ELM model to classify the DR levels. In short, the feature-extraction capability of the CNN was combined with the ELM classifier, and this combination was evaluated on two datasets.

Fig. 6. Confusion Matrix (CM) of ELM for Dataset-1.
Table 3
Classification performance comparison by Precision for Dataset-1.
DR Level ELM SVM GNB RF DT LR
No DR 0.93 0.82 0.86 0.82 0.82 0.82
Mild 0.87 0.42 0.24 0.42 0.28 0.41
Moderate 0.87 0.48 0.40 0.47 0.44 0.47
Severe 0.95 0.56 0.47 0.55 0.46 0.52
PDR 0.94 0.61 0.56 0.64 0.61 0.65
Average 0.91 0.58 0.50 0.58 0.52 0.57

Table 4
Classification performance comparison by F1-Score for Dataset-1.
DR Level ELM SVM GNB RF DT LR
No DR 0.95 0.86 0.82 0.86 0.85 0.86
Mild 0.77 0.31 0.28 0.30 0.25 0.31
Moderate 0.82 0.43 0.45 0.43 0.41 0.43
Severe 0.89 0.39 0.40 0.39 0.36 0.39
PDR 0.91 0.46 0.46 0.47 0.44 0.49
Average 0.87 0.49 0.48 0.49 0.46 0.49

Table 5
Classification performance comparison by Recall for Dataset-1.
DR Level ELM SVM GNB RF DT LR
No DR 0.97 0.91 0.78 0.90 0.87 0.90
Mild 0.70 0.24 0.33 0.24 0.23 0.25
Moderate 0.78 0.39 0.52 0.40 0.39 0.39
Severe 0.84 0.30 0.35 0.30 0.29 0.31
PDR 0.89 0.37 0.39 0.37 0.34 0.39
Average 0.83 0.44 0.48 0.44 0.43 0.45

Table 6
Classification performance comparison by Precision for APTOS, 2019 dataset.
DR Level ELM SVM GNB RF DT LR
No DR 1.00 0.96 0.96 0.97 0.96 0.97
Mild 0.99 0.74 0.74 0.79 0.74 0.75
Moderate 0.94 0.80 0.80 0.79 0.79 0.80
Severe 0.90 0.75 0.75 0.71 0.69 0.71
PDR 0.96 0.73 0.73 0.68 0.57 0.72
Average 0.96 0.80 0.80 0.79 0.75 0.79

Table 7
Classification performance comparison by F1-Score for APTOS, 2019 dataset.
DR Level ELM SVM GNB RF DT LR
No DR 0.99 0.97 0.97 0.98 0.97 0.98
Mild 0.97 0.72 0.72 0.74 0.70 0.73
Moderate 0.96 0.84 0.84 0.83 0.81 0.83
Severe 0.92 0.68 0.68 0.63 0.59 0.63
PDR 0.92 0.62 0.62 0.60 0.58 0.63
Average 0.95 0.77 0.77 0.76 0.73 0.76

Table 8
Classification performance comparison by Recall for APTOS, 2019 dataset.
DR Level ELM SVM GNB RF DT LR
No DR 0.99 0.98 0.98 0.99 0.97 0.98
Mild 0.96 0.70 0.70 0.70 0.66 0.70
Moderate 0.97 0.88 0.88 0.87 0.83 0.88
Severe 0.95 0.62 0.62 0.56 0.51 0.56
PDR 0.88 0.54 0.54 0.54 0.59 0.56
Average 0.95 0.74 0.74 0.73 0.72 0.74
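The Average rows in Tables 3–8 appear to be unweighted (macro) means of the five per-class values, up to rounding. This interpretation is an assumption, since the text does not state how the averages were computed. The short check below recomputes a few of them from the ELM columns.

```python
from statistics import mean

# Per-class ELM values (No DR, Mild, Moderate, Severe, PDR) and the reported Average, copied from the tables.
elm_columns = {
    ("Dataset-1", "precision"): ([0.93, 0.87, 0.87, 0.95, 0.94], 0.91),
    ("Dataset-1", "F1-score"): ([0.95, 0.77, 0.82, 0.89, 0.91], 0.87),
    ("APTOS, 2019", "precision"): ([1.00, 0.99, 0.94, 0.90, 0.96], 0.96),
    ("APTOS, 2019", "recall"): ([0.99, 0.96, 0.97, 0.95, 0.88], 0.95),
}

for (dataset, metric), (values, reported) in elm_columns.items():
    macro = mean(values)  # unweighted mean: every DR level counts equally, regardless of class size
    print(f"{dataset} {metric}: macro mean {macro:.3f}, reported {reported:.2f}")
```

A macro mean weights the rare Severe and PDR classes as heavily as No DR. On imbalanced data such as Dataset-1, it is therefore a stricter summary than overall accuracy.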
and the other five ML models, whereas the numbers of No DR, Mild, Moderate, Severe, and PDR FIs were 1444, 296, 799, 154, and 236, respectively. To evaluate the classification performance of the ELM, a CM was built using 733 FIs (No DR: 361, Mild: 74, Moderate: 200, Severe: 39, and PDR: 59). The class-wise precision, F1-score, and recall in Tables 6–8 show that the ELM model performed well even on this smaller, imbalanced dataset. The best accuracy (97.27 %) was achieved by the ELM model on the APTOS, 2019 dataset, with a recall of 95
Table 9
Class-wise classification performance of the proposed framework (PF) compared with previous studies for Dataset-1.
Level | Precision: Pratt et al. (2016), Qummar et al. (2019), PF | Recall: Pratt et al. (2016), Qummar et al. (2019), PF | F1-Score: Pratt et al. (2016), Qummar et al. (2019), PF | AUC: Pratt et al. (2016), Qummar et al. (2019), PF
No DR | 0.78, 0.84, 0.93 | 0.95, 0.97, 0.97 | 0.85, 0.90, 0.95 | –, 0.85, 0.94
Mild | 0.00, 0.51, 0.89 | 0.00, 0.80, 0.68 | 0.00, 0.15, 0.78 | –, 0.71, 0.92
Moderate | 0.23, 0.65, 0.87 | 0.23, 0.41, 0.78 | 0.29, 0.50, 0.82 | –, 0.85, 0.95
Severe | 0.78, 0.48, 0.92 | 0.78, 0.51, 0.83 | 0.10, 0.49, 0.88 | –, 0.96, 0.96
PDR | 0.44, 0.69, 0.93 | 0.44, 0.56, 0.88 | 0.37, 0.62, 0.90 | –, 0.97, 0.97
the imbalanced datasets. The proposed framework achieved promising results on the performance metrics considered in this study and eliminated the additional time otherwise required for data augmentation.

5. Conclusion

This study proposed a novel framework for fast and accurate detection of DR severity levels from FIs, which can help diabetic patients prevent or delay vision loss. CLAHE was adopted to make the lesions clearer so that a CNN model could easily extract the most discriminating features. A lightweight parallel CNN extracted 120 features, reducing processing time and model complexity. Finally, these features were standardized and fed into the ELM model to distinguish the different levels of DR. The proposed framework showed promising results on both the 34,984-image Dataset-1 and the 3,662-image APTOS 2019 dataset. It delivered higher classification performance while significantly reducing the number of parameters, the number of layers and the processing time. The framework also outperformed the existing state-of-the-art (SOTA) models on both datasets. The proposed model can accurately detect the severity of DR at an early stage, thereby helping to reduce patients' vision loss and saving medical practitioners valuable time.

CRediT authorship contribution statement

Md. Nahiduzzaman: Data curation, Conceptualization, Investigation, Methodology, Validation, Formal analysis, Writing – original draft. Md. Robiul Islam: Conceptualization, Investigation, Methodology, Validation, Formal analysis, Data curation, Writing – original draft. Md. Omaer Faruq Goni: Conceptualization, Methodology, Validation, Formal analysis, Data curation, Writing – original draft, Writing – review & editing. Md. Shamim Anower: Conceptualization, Methodology, Validation, Formal analysis, Investigation, Writing – review & editing, Supervision. Mominul Ahsan: Methodology, Visualization, Conceptualization, Formal analysis, Writing – review & editing, Supervision. Julfikar Haider: Visualization, Formal analysis, Conceptualization, Methodology, Validation, Writing – review & editing, Supervision. Marcin Kowalski: Conceptualization, Methodology, Formal analysis, Writing – review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The authors do not have permission to share data.

References

Afza, F., Sharif, M., Khan, M. A., Tariq, U., Yong, H. S., & Cha, J. (2021). Multiclass skin lesion classification using hybrid deep features selection and extreme learning machine. Sensors, 22(3), 799.
Akram, M. U., Khalid, S., Tariq, A., Khan, S. A., & Azam, F. (2014). Detection and classification of retinal lesions for grading of diabetic retinopathy. Computers in Biology and Medicine, 45, 161–171.
Ali, A., Qadri, S., Mashwani, W. K., Kumam, W., Kumam, P., Naeem, S., … Anam, S. (2020). Machine learning based automated segmentation and hybrid feature analysis for diabetic retinopathy classification using fundus image. Entropy, 22(5), 567.
Alenezi, F., Armghan, A., & Polat, K. (2023). Wavelet transform based deep residual neural network and ReLU based Extreme Learning Machine for skin lesion classification. Expert Systems with Applications, 213, Article 119064.
Alyoubi, W. L., Abulkhair, M. F., & Shalash, W. M. (2021). Diabetic retinopathy fundus image classification and lesions localization system using deep learning. Sensors, 21(11), 3704.
Asha, P., & Karpagavalli, S. (2015). Diabetic retinal exudates detection using extreme learning machine. In Emerging ICT for Bridging the Future: Proceedings of the 49th Annual Convention of the Computer Society of India CSI Volume 2 (pp. 573–578). Springer.
Bodapati, J. D., Naralasetti, V., Shareef, S. N., Hakak, S., Bilal, M., Maddikunta, P. K. R., & Jo, O. (2020). Blended multi-modal deep convnet features for diabetic retinopathy severity prediction. Electronics, 9(6), 914.
Bodapati, J. D., Shaik, N. S., & Naralasetti, V. (2021). Composite deep neural network with gated-attention mechanism for diabetic retinopathy severity classification. Journal of Ambient Intelligence and Humanized Computing, 1–15.
Chudzik, P., Majumdar, S., Calivá, F., Al-Diri, B., & Hunter, A. (2018). Microaneurysm detection using fully convolutional neural networks. Computer Methods and Programs in Biomedicine, 158, 185–192.
Chetoui, M., Akhloufi, M. A., & Kardouchi, M. (2018). Diabetic retinopathy detection using machine learning and texture features. In 2018 IEEE Canadian Conference on Electrical & Computer Engineering (CCECE) (pp. 1–4). IEEE.
Chen, W., Yang, B., Li, J., & Wang, J. (2020). An approach to detecting diabetic retinopathy based on integrated shallow convolutional neural networks. IEEE Access, 8, 178552–178562.
Carrera, E. V., González, A., & Carrera, R. (2017). Automated detection of diabetic retinopathy using SVM. In 2017 IEEE XXIV International Conference on Electronics, Electrical Engineering and Computing (INTERCON) (pp. 1–4). IEEE.
Dondeti, V., Bodapati, J. D., Shareef, S. N., & Veeranjaneyulu, N. (2020). Deep convolution features in non-linear embedding space for fundus image classification. Rev. d’Intelligence Artif., 34(3), 307–313.
[dataset 1] California Healthcare Foundation. (2015). Diabetic retinopathy detection. https://www.kaggle.com/c/diabetic-retinopathy-detection/data (accessed 1 February 2022).
[dataset 2] Asia Pacific Tele-Ophthalmology Society (APTOS). (2019). APTOS 2019 blindness detection. https://www.kaggle.com/c/aptos2019-blindness-detection/data (accessed 1 February 2022).
Farrell, J., & Saloner, G. (1985). Standardization, compatibility, and innovation. The RAND Journal of Economics, 70–83.
Gangwar, A. K., & Ravi, V. (2021). Diabetic retinopathy detection using transfer learning and deep learning. In Evolution in Computational Intelligence (pp. 679–689). Springer.
Gayathri, S., Gopi, V. P., & Palanisamy, P. (2021). Diabetic retinopathy classification based on multipath CNN and machine learning classifiers. Physical and Engineering Sciences in Medicine, 1–15.
Gondal, W. M., Köhler, J. M., Grzeszick, R., Fink, G. A., & Hirsch, M. (2017). Weakly-supervised localization of diabetic retinopathy lesions in retinal fundus images. In 2017 IEEE International Conference on Image Processing (ICIP) (pp. 2069–2073). IEEE.
Huang, G.-B., Zhu, Q.-Y., & Siew, C.-K. (2006). Extreme learning machine: Theory and applications. Neurocomputing, 70(1–3), 489–501.
Honnungar, S., Mehra, S., & Joseph, S. (2016). Diabetic retinopathy identification and severity classification. Fall 2016.
Islam, M. R., Hasan, M. A. M., & Sayeed, A. (2020). Transfer learning based diabetic retinopathy detection with a novel preprocessed layer. In 2020 IEEE Region 10 Symposium (TENSYMP) (pp. 888–891). IEEE.
Islam, M. R., Hasan, M. N., & Nahiduzzaman, M. Severity grading of diabetic retinopathy using deep convolutional neural network. International Journal of Innovative Science and Research Technology, 6(1), 1395–1401.
Islam, S. M. S., Hasan, M. M., & Abdullah, S. (2018). Deep learning based early detection and grading of diabetic retinopathy using retinal fundus images. arXiv preprint arXiv:1812.10595.
Kassani, S. H., Kassani, P. H., Khazaeinezhad, R., Wesolowski, M. J., Schneider, K. A., & Deters, R. (2019). Diabetic retinopathy classification using a modified Xception architecture. In 2019 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT) (pp. 1–6). IEEE.
Kar, S. S., & Maity, S. P. (2017). Automatic detection of retinal lesions for screening of diabetic retinopathy. IEEE Transactions on Biomedical Engineering, 65(3), 608–618.
Liu, H., Yue, K., Cheng, S., Pan, C., Sun, J., & Li, W. (2020). Hybrid model structure for diabetic retinopathy classification. Journal of Healthcare Engineering, 2020.
Lachure, J., Deorankar, A., Lachure, S., Gupta, S., & Jadhav, R. (2015). Diabetic retinopathy using morphological operations and machine learning. In 2015 IEEE International Advance Computing Conference (IACC) (pp. 617–622). IEEE.
Majumder, S., & Kehtarnavaz, N. (2021). Multitasking deep learning model for detection of five stages of diabetic retinopathy. arXiv preprint arXiv:2103.04207.
Mahmoud, M. H., Alamery, S., Fouad, H., Altinawi, A., & Youssef, A. E. (2021). An automatic detection system of diabetic retinopathy using a hybrid inductive machine learning algorithm. Personal and Ubiquitous Computing, 1–15.
Mumtaz, R., Hussain, M., Sarwar, S., Khan, K., Mumtaz, S., & Mumtaz, M. (2018). Automatic detection of retinal hemorrhages by exploiting image processing techniques for screening retinal diseases in diabetic patients. International Journal of Diabetes in Developing Countries, 38(1), 80–87.
Nahiduzzaman, M., Islam, M. R., Islam, S. R., Goni, M. O. F., Anower, M. S., & Kwak, K. S. (2021). Hybrid CNN-SVD based prominent feature extraction and selection for grading diabetic retinopathy using extreme learning machine algorithm. IEEE Access, 9, 152261–152274.
Nahiduzzaman, M., Nayeem, M. J., Ahmed, M. T., & Zaman, M. S. U. (2019). Prediction of heart disease using multi-layer perceptron neural network and support vector machine. In 2019 4th International Conference on Electrical Information and Communication Technology (EICT) (pp. 1–6). IEEE.
Nahiduzzaman, M., Goni, M. O. F., Anower, M. S., Islam, M. R., Ahsan, M., Haider, J., Gurusamy, S., Hassan, R., & Islam, M. R. (2021). A novel method for multivariant pneumonia classification based on hybrid CNN-PCA based feature extraction using extreme learning machine with CXR images. IEEE Access, 9, 147512–147526.
Odeh, I., Alkasassbeh, M., & Alauthman, M. (2021). Diabetic retinopathy detection using ensemble machine learning. arXiv preprint arXiv:2106.12545.
Pandey, S. K., & Sharma, V. (2018). World diabetes day 2018: Battling the emerging epidemic of diabetic retinopathy. Indian Journal of Ophthalmology, 66(11), 1652.
Powers, D. M. (2020). Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv preprint arXiv:2010.16061.
Pratt, H., Coenen, F., Broadbent, D. M., Harding, S. P., & Zheng, Y. (2016). Convolutional neural networks for diabetic retinopathy. Procedia Computer Science, 90, 200–205.
Qummar, S., Khan, F. G., Shah, S., Khan, A., Shamshirband, S., Rehman, Z. U., Khan, I. A., & Jadoon, W. (2019). A deep learning ensemble approach for diabetic retinopathy detection. IEEE Access, 7, 150530–150539.
Rahim, S. S., Palade, V., Shuttleworth, J., & Jayne, C. (2016). Automatic screening and classification of diabetic retinopathy and maculopathy using fuzzy image processing. Brain Informatics, 3(4), 249–267.
Raman, V., Then, P., & Sumari, P. (2016). Proposed retinal abnormality detection and classification approach: Computer aided detection for diabetic retinopathy by machine learning approaches. In 2016 8th IEEE International Conference on Communication Software and Networks (ICCSN) (pp. 636–641). IEEE.
Ramani, R. G., & Lakshmi, B. (2017). Automatic diabetic retinopathy detection through ensemble classification techniques automated diabetic retinopathy classification. In 2017 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC) (pp. 1–4). IEEE.
Reddy, G. T., Bhattacharya, S., Ramakrishnan, S. S., Chowdhary, C. L., Hakak, S., Kaluri, R., & Reddy, M. P. K. (2020). An ensemble-based machine learning model for diabetic retinopathy classification. In 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE) (pp. 1–6). IEEE.
Samanta, A., Saha, A., Satapathy, S. C., Fernandes, S. L., & Zhang, Y.-D. (2020). Automated detection of diabetic retinopathy using convolutional neural networks on a small dataset. Pattern Recognition Letters, 135, 293–298.
Sikder, N., Masud, M., Bairagi, A. K., Arif, A. S. M., Nahid, A. A., & Alhumyani, H. A. (2021). Severity classification of diabetic retinopathy using an ensemble learning algorithm through analyzing retinal images. Symmetry, 13(4), 670.
Shenavarmasouleh, F., & Arabnia, H. R. (2020). DRDR: Automatic masking of exudates and microaneurysms caused by diabetic retinopathy using Mask R-CNN and transfer learning. arXiv preprint arXiv:2007.02026.
Somasundaram, S., & Ali, P. (2017). A machine learning ensemble classifier for early prediction of diabetic retinopathy. Journal of Medical Systems, 41(12), 1–12.
Umapathy, A., Sreenivasan, A., Nairy, D. S., Natarajan, S., & Rao, B. N. (2019). Image processing, textural feature extraction and transfer learning based detection of diabetic retinopathy. In Proceedings of the 2019 9th International Conference on Bioscience, Biochemistry and Bioinformatics (pp. 17–21).
Yu, S., Xiao, D., & Kanagasingam, Y. (2017). Exudate detection for diabetic retinopathy with convolutional neural networks. In 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (pp. 1744–1747). IEEE.
Zhou, K., Gu, Z., Liu, W., Luo, W., Cheng, J., Gao, S., & Liu, J. (2018). Multi-cell multi-task convolutional neural networks for diabetic retinopathy grading. In 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (pp. 2724–2727). IEEE.
Zeng, X., Chen, H., Luo, Y., & Ye, W. (2019). Automated diabetic retinopathy detection based on binocular Siamese-like convolutional neural network. IEEE Access, 7, 30744–30753.
Zhao, Z., Zhang, K., Hao, X., Tian, J., Chua, M. C. H., Chen, L., & Xu, X. (2019). BiRA-Net: Bilinear attention net for diabetic retinopathy grading. In 2019 IEEE International Conference on Image Processing (ICIP) (pp. 1385–1389). IEEE.