
IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY SECTION

Received March 24, 2022, accepted April 1, 2022, date of publication April 6, 2022, date of current version April 14, 2022.
Digital Object Identifier 10.1109/ACCESS.2022.3165193

Automatic Severity Classification of Diabetic Retinopathy Based on DenseNet and Convolutional Block Attention Module
MOHAMED M. FARAG 1, MARIAM FOUAD 1,2 , AND AMR T. ABDEL-HAMID 1
1 Department of Electronics Engineering, German University in Cairo, New Cairo 11835, Egypt
2 Chair of Medical Engineering, Ruhr Universitat Bochum 102148, Germany

Corresponding author: Mohamed M. Farag ([email protected])

ABSTRACT Diabetic Retinopathy (DR), a complication that develops from heightened blood glucose levels, is deemed one of the most sight-threatening diseases. Unfortunately, DR screening is performed manually by an ophthalmologist, a process that can be erroneous and time-consuming. Accordingly, automated DR diagnostics have become a focus of research in recent years due to the tremendous increase in diabetic patients. Moreover, the recent accomplishments demonstrated by Convolutional Neural Networks (CNN) establish them as the state of the art for DR stage identification. This paper proposes a new automatic deep-learning-based approach for severity detection that uses a single Color Fundus Photograph (CFP). The proposed technique employs DenseNet169's encoder to construct a visual embedding. Furthermore, a Convolutional Block Attention Module (CBAM) is introduced on top of the encoder to reinforce its discriminative power. Finally, the model is trained with a cross-entropy loss on the Kaggle Asia Pacific Tele-Ophthalmology Society (APTOS) dataset. On the binary classification task, we accomplished 97% accuracy, 97% sensitivity, 98.3% specificity, and a 0.9455 Quadratic Weighted Kappa (QWK) score, competitive with the state of the art. Moreover, our network showed high competency (82% accuracy, 0.888 QWK) for severity grading. The significant contribution of the proposed framework is that it efficiently grades the severity level of diabetic retinopathy while reducing the required time and space complexity, which makes it a promising candidate for autonomous diagnosis.

INDEX TERMS Diabetic retinopathy, convolutional neural networks (CNN), attention mechanism, deep
learning.

I. INTRODUCTION
Diabetes Mellitus is a chronic metabolic disease characterized by elevated blood glucose levels (hyperglycemia), which over time affects the blood vessels in the human body on both micro and macro scales. According to the World Health Organization (WHO), the number of diabetic people hiked to 422 million in 2014, with an expectation to reach 700 million by 2045 [1], [2]. One of the long-term diabetic micro-vascular effects is diabetic retinopathy, a progressive abnormality revealed and detected through ocular pathologies, which leads to blocking and bleeding of the retinal capillaries. Fortunately, early detection can prevent vision impairment. However, without frequent screening, it may induce irreversible damage. The International Diabetes Federation (IDF) affirmed that 93 million diabetics suffer from eye damage, yet only 200,000 ophthalmologists are available worldwide [3]. Grading inconsistency, a critical deficiency in the available number of ophthalmologists, and the laborious screening process remain hindering factors for diabetic retinopathy detection. Therefore, automating retinopathy diagnostics is desired to reduce the high strain on health care systems. Motivated by this, significant efforts have been directed to enhance Computer-Aided Medical Diagnosis (CAMD) systems.

DR grading systems can be categorized into two clusters: segregation of diabetic retinas from healthy ones (binary classification task) and severity estimation of affected retinas (multi-class classification task) from class 0 (healthy) to class 4, proliferative DR (PDR). Traditional Machine Learning (ML) algorithms are Artificial Intelligence (AI) techniques that learn through experience by being exposed to data. They were employed for detecting the diabetes type based on patient attributes by Nagaraj et al. [4], who utilized the

The associate editor coordinating the review of this manuscript and approving it for publication was Chulhong Kim.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 10, 2022 38299
M. M. Farag et al.: Automatic Severity Classification of DR Based on DenseNet and CBAM

Artificial Flora Algorithm (AFA) [5] for feature selection in addition to using Gradient Boosted Trees (GBT) [6] as a classification model. Feature engineering was likewise exploited by Gharaibeh et al. in [7] and [8], followed by Support Vector Machines (SVM) [9] as a classifier for DR detection. Despite their effectiveness, ML algorithms need personalized experience and domain knowledge to find the most informative representation.

Deep Learning (DL) has gained a foothold in various fields by representing the world as a nested hierarchy of concepts, with each concept defined through its relation to simpler concepts [10]. The Convolutional Neural Network was the standout DL architecture in the late nineties. Since then, it has been used extensively for processing data such as images and time series. Moreover, it has demonstrated outstanding performance in practical applications such as Natural Language Processing (NLP) [11], [12] and Computer Vision (CV) problems [13]–[15].

Exploiting convolutional neural networks' power in the medical domain has produced more robust solutions, specifically in the DR domain. [16] and [17] demonstrated the effectiveness of such a technique for retinal vessel segmentation. Similarly, by leveraging Generative Adversarial Networks (GANs), Zhao et al. [18] could synthesize fundus images. Dai et al. [19] utilized a multi-sieving convolutional neural network and image-to-text mapping for early Micro-aneurysm (MA) detection. [20] evaluated the performance of three recognized CNN architectures, VGG16, VGG19, and InceptionV3 [21], [22], by employing transfer learning and fine-tuning for binary and multi-class classification. Zeng et al. [23] introduced a Siamese-like architecture [24] trained with transfer learning to classify fundus images into two grades. Kassani et al. [25] used a Multi-Layer Perceptron (MLP) as a classification head on top of a modified Xception network [26] by concatenating feature maps from different convolutional layers. Four Inception models were utilized in [27] for multi-class classification: each fundus image was sliced into four quadrants, and each quadrant was classified by one of the four models. [28] exploited blended models to enhance data representation, and Gangwar et al. [29] investigated a new hybrid model inherited from the Inception and ResNet architectures. Al Antary et al. [30] designed a ResNet architecture integrated with a Multi-Scale Attention mechanism (MSA) to enhance the representational power of the encoder. Moreover, they employed a multi-level approach to feature reuse for further improvement. Since our focus in this paper is to enhance the grading system on both the binary and multi-class classification tasks, we observed drawbacks related to the aforementioned algorithms despite their success, ranging from high time and space complexity to failure to mitigate the severe data imbalance inherited in the data.

DR severity grading remains a challenging task due to three factors: (i) Data rarity. Acquiring massive labeled data is a crucial issue for DL and is even more significant in the medical domain due to data privacy issues and/or the cost of the devices needed to acquire high-quality images. (ii) Implicit stochasticity. Retinal fundus images experience large variations caused by different devices and environmental conditions regarding color, contrast, illumination, and size. As a result, the model's decision may be distorted. (iii) Fading class disparity. The threshold for classifying an image between two closely distributed classes (e.g., mild and moderate in the APTOS dataset) is blurry, as will be shown in Section III.C, due to the dependence on microscale ocular pathologies. To solve the problem of fading disparity, large CNN architectures were employed in the literature to extract more informative features, data augmentation and preprocessing were used to enhance CNNs' generalizability, and transfer learning was exploited to overcome the data shortage.

In this paper, we investigate the efficacy of a light-weight deep learning architecture for fast and robust severity grading of diabetic retinopathy. Our framework is based on a modified version of DenseNet [31], integrating an attention mechanism with the former architecture for further feature refinement. Furthermore, we observe the effect of data imbalance on model performance and mitigate it by using an imbalanced learning technique. As shown in Fig.1, we first preprocess the retinal image for quality enhancement; afterward, the images are passed to the DenseNet encoder C for feature extraction, and the features are then sent to the attention module A for an improved representation. We train our model by freezing DenseNet's encoder, pre-trained on the ImageNet [32] dataset, to accelerate the model's convergence by using the pre-trained weights θC, and training only the attention module and the classification head using APTOS data in a supervised approach to update θA and θM. Our main contributions are as follows:

1) We developed a modified architecture to reduce the time needed for training and inference while enhancing DR severity grading by using a relatively small model with 8.5 million parameters, compared to 10.8 million in previous work.
2) We exploited the effect of using an attention mechanism as a supplementary module for feature refinement, which led to an increase in accuracy while preserving low model complexity.
3) We tested the effect of using an imbalanced learning approach to alleviate the impact of data imbalance on the model's performance and proved its efficiency in enhancing the overall metrics.
4) We utilized transfer learning only by freezing the convolutional encoder without extra fine-tuning, which led to a relatively low number of learnable parameters (150K).

The paper is organized as follows. Related work is presented in Section II. In Section III, the methodology is presented. In Section IV, the results and discussions are demonstrated. Finally, conclusions are provided in Section V.


FIGURE 1. Scheme of our proposed approach. In the network training step (upper), we pass a batch of labeled preprocessed images X to our convolutional encoder C for feature extraction, then to an attention mechanism A for feature refinement. In the testing phase (lower), we directly pass the data to the network to predict the image class.

II. RELATED WORK
Deep learning has been deployed extensively in DR due to the rise of the transfer learning paradigm, which offers fast convergence and performance enhancement while reducing the need for massive data and computational resources. This has opened the door for more robust algorithms in the medical domain. Wang et al. [33] developed LesionNet; the main aim of the network was to add lesion detection to severity grading to reinforce the representational power of the encoder. The architecture was built on InceptionV3, which was trained and validated using a private dataset. An ensemble stacking approach was investigated by Qummar et al. [34] using five reputable architectures (ResNet50, InceptionV3, Xception, DenseNet121, DenseNet169) in order to improve the produced feature maps. Furthermore, they used the Kaggle EyePACS dataset to assess the model. A hybrid deep learning model introduced by Cortes et al. [35] was built using an InceptionV3 encoder for feature extraction, followed by training a Gaussian Process (GP) regressor to obtain prediction uncertainty, using the EyePACS and Messidor-2 datasets for the DR binary classification task. The EfficientNet-B3 architecture was deployed by Sugeno et al. [36] for both binary and severity classification using the APTOS dataset. Furthermore, they developed a method for lesion detection and validated it against ground truth using the DIARETDB1 dataset.1 Meta-plasticity, a bio-inspired phenomenon, was artificially implemented in a CNN's back-propagation path by Boix et al. [37] to reinforce less common occurrences during the learning process for performance enhancement. Moreover, they deployed this technique in different deep learning architectures, using APTOS data for the binary and severity grading tasks. Zhang et al. deployed a Source-Free Transfer Learning (SFTL) [38] model for referable DR, which utilized unlabelled retinal images to alleviate the challenges of medical data annotation and privacy. They applied their algorithm to the APTOS dataset for the binary and multi-class classification tasks.

III. METHODOLOGY
In this section, we present the details of our framework. First, we introduce the APTOS data, followed by data preprocessing, then data augmentation, balancing, and analysis. Finally, we introduce our architecture, training settings, and evaluation metrics.

A. DATASETS
In 2019, the APTOS dataset2 was released on the Kaggle website3 as part of a public competition for DR detection. The main aim of using fundus imaging was to classify disease severity by producing a probability that an image belongs to one of five clusters: No DR, Mild, Moderate, Severe, and Proliferative DR. The data was collected by Aravind Eye Hospital in India; approximately 13,000 images were

1 https://www.it.lut.fi/project/imageret/diaretdb1/
2 https://www.kaggle.com/c/aptos2019-blindness-detection
3 https://www.kaggle.com/


provided at this competition; however, we had access to the ground truth labels of only 3662 images.

B. DATA PRE-PROCESSING
The uninformative black areas on the sides of the images were first trimmed, then a circular crop was applied to obtain a centered retinal image. Moreover, a filtering technique was exploited [39] to enhance the clarity of visual bio-markers, described by the following equations:

X′′ = α × X + β × X′ + γ (1)
X′ = G(σx) ∗ X (2)

X indicates the input data, G(σx) is a 2D Gaussian kernel with a standard deviation of σx = 15 in the x-direction, and ∗ is the convolution operation. α, β, and γ were chosen empirically to be 5, −4, and 70, respectively. Finally, each image was normalized to the range [0, 1], resized to 256 × 256 using bilinear interpolation, and decoded to 32-bit floating point. Fig.2 represents the input and output of the preprocessing step.

FIGURE 2. Visual comparison between (a) a raw fundus image and (b) a pre-processed fundus image. We observe the removal of the black side borders; by removing the black pixels and applying a Gaussian filter, the clarity of blood vessels and other bio-markers is enhanced significantly.

C. DATA AUGMENTATION, BALANCING & ANALYSIS
Investigating the APTOS data revealed severe class imbalance: 49.29%, 10.1%, 27.28%, 5.27%, and 8.05% of the images belong to the normal, mild, moderate, severe, and proliferative DR grades, respectively. Furthermore, the data was projected into a lower-dimensional feature space, using Principal Component Analysis (PCA) to lower the dimensionality to 500-D followed by the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm, to analyze the data distribution across the different classes [40]. Intuitions were developed by exploiting Fig.3:

• Class 0 forms feature clusters all over the 2-D space, making it one of the easiest classes to detect.
• Classes 1-4 have acute overlapping, which makes it challenging for the algorithm to fit a proper hyperplane.
• We artificially clustered the data to form only two regions (infected and healthy), and we observed that DL, based on our understanding, is robust enough to solve the binary classification problem.

FIGURE 3. 2D representation of the APTOS data. Class 0 forms dense clouds in the low-dimensional feature space, while the other classes have scattered representations due to data shortage.

Thus, to mitigate this effect, we used an Inverse Number of Samples (INS) learning approach where each class is weighted inversely proportionally to its distribution in the original dataset, as described in (3) and (4):

Wi = 1 / Si (3)
Wi = Wi / ((Σ_{i=1}^{N} Wi) × N) (4)

W and S are 1-D arrays containing the weight and the total number of samples for each class, respectively. N is the total number of classes and i is the class index. As a consequence, we used an updated version of the Categorical Cross-Entropy (CCE) loss function:

Jcce = −(1/M) Σ_{i=1}^{N} Σ_{m=1}^{M} wi × yim × log(hθ(xm, i)) (5)


FIGURE 4. Proposed network architecture for DR severity grading.

FIGURE 5. Convolutional block attention module illustration.

where
• M is the number of training samples
• N is the total number of classes
• wi is the weight for class i
• yim is the target label of training example m for class i
• xm is the input image of training example m
• hθ is the model with learnable parameters θ

Random horizontal and vertical flipping and rotation were applied to reduce overfitting and improve the model's generalizability. Furthermore, augmentation was employed on the fly, meaning it was utilized as a layer in our network that performs the mentioned transformations during the training phase.

D. ARCHITECTURE
Our algorithm consists of a backbone model (convolutional base) and an attention module. First, the backbone network is used as a feature extractor for the input fundus image, and then the features are refined using the Convolutional Block Attention Module (CBAM) for data representation enhancement. Afterward, the feature maps generated by the attention module are converted to a one-dimensional array by averaging each map using Global Average Pooling (GAP), followed by the classification head. Fig.4 demonstrates an illustration of our network.

1) DenseNet
DenseNet was used as the main backbone of the proposed approach. Huang et al. [31] demonstrated the robustness of the architecture against the vanishing gradient problem while reducing the number of parameters and reducing over-fitting on smaller datasets. The main idea was to connect CNN layers using a dense connectivity pattern such that each layer has a concatenated input of all preceding feature maps:

Xl = Hl([X0, X1, . . . , Xl−1]) (6)

where [X0, X1, . . . , Xl−1] is the concatenation of the feature maps passed to the l-th layer, and Hl(.) is a hidden layer that exploits consecutive


TABLE 1. Network architecture.

Algorithm 1 The Implementation of the DenseNet+CBAM Model
Input: Pre-trained DenseNet encoder C with ImageNet weights θC, labelled data (X, Y), α, β, γ, batch size B, class weights W.
Output: θA for the attention mechanism A, θM for the classification head.
Initialisation: Learning rate lr
1: Apply preprocessing X′ = Ftransform(X, α, β, γ)
2: for epoch i from 1 to N do
3:   for each mini-batch b do
4:     for image k in mini-batch b do
5:       Apply on-the-fly Keras augmentation
6:       Extract and refine the features z = hθA(hθC(X′[k]))
7:       Encode the flattened features z′ = hθM(z)
8:       Compute ŷk = argmax(e^{z′_k} / Σ_{j=1}^{N} e^{z′_j})
9:     end for
10:    Update the MLP via θM ← Adam(∇θM(JCCE), θM, W, lr)
11:    Update A via θA ← Adam(∇θA(JCCE), θA, W, lr)
12:  end for
13: end for
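To make the algorithm concrete, the numerical pieces of Algorithm 1 can be sketched in NumPy: the preprocessing transform of step 1 (eqs. (1)-(2), single-channel case), the INS class weights of (3)-(4), the weighted cross-entropy of (5), and the softmax prediction of step 8. The function names are ours and the example class counts in the usage note are derived from the reported APTOS percentages; this is an illustrative sketch under those assumptions, not the authors' released code.

```python
import numpy as np

def _gauss1d(sigma, radius):
    """1-D Gaussian kernel, normalized to sum to 1."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def preprocess(img, alpha=5.0, beta=-4.0, gamma=70.0, sigma=15.0):
    """Step 1 with eqs. (1)-(2): X' = G(sigma) * X via a separable
    Gaussian blur, then X'' = alpha*X + beta*X' + gamma, clipped and
    rescaled to [0, 1]. img is a single-channel array in [0, 255]."""
    k = _gauss1d(sigma, radius=int(2 * sigma))
    pad = len(k) // 2
    padded = np.pad(img, pad, mode='edge')
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode='valid')
    blur = np.apply_along_axis(np.convolve, 0, rows, k, mode='valid')
    out = np.clip(alpha * img + beta * blur + gamma, 0.0, 255.0)
    return out / 255.0

def ins_weights(samples_per_class):
    """Inverse Number of Samples weighting, eqs. (3)-(4):
    w_i = 1/S_i, then normalized by (sum_i w_i) * N."""
    s = np.asarray(samples_per_class, dtype=float)
    w = 1.0 / s                        # eq. (3)
    return w / (w.sum() * len(s))      # eq. (4)

def softmax(z):
    """Numerically stable softmax over the last axis (step 8)."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def weighted_cce(probs, labels, class_w):
    """Class-weighted categorical cross-entropy, eq. (5), for integer
    labels: probs is (M, N), labels is (M,), class_w is (N,)."""
    m = len(labels)
    p_true = probs[np.arange(m), labels]
    return -np.mean(class_w[labels] * np.log(p_true + 1e-12))

def predict(logits):
    """argmax over the softmax probabilities (step 8)."""
    return softmax(logits).argmax(axis=-1)
```

With approximate APTOS class counts such as [1805, 370, 999, 193, 295], the minority classes (severe, proliferative) receive the largest weights, so misclassifying them is penalized more heavily in (5).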

operations: batch normalization (BN) [41], followed by a rectified linear unit (ReLU) [42] and a convolution operation, to obtain a non-linear transformation of the input. The architecture design allows feature reuse by routing the previous feature maps to the next convolution layer. For pooling, a Transition Block (TB) was integrated, consisting of batch normalization, a 1 × 1 convolution, and 2 × 2 average pooling.

2) CONVOLUTIONAL BLOCK ATTENTION MODULE (CBAM)
CBAM has proved its success in generating more curated features and enhancing performance [43]. It consists of two sub-modules:
• Channel Attention Module.
• Spatial Attention Module.
The attention module is used to infer two attention maps:

Fatt = (Ms(Mc(F) ⊗ F)) ⊗ (Mc(F) ⊗ F) (7)

Fatt ∈ R^{H×W×C} is the refined features, F ∈ R^{H×W×C} is CBAM's input, Ms ∈ R^{H×W×1} is a 2-D spatial attention map, ⊗ denotes element-wise multiplication, and Mc ∈ R^{1×1×C} is a 1-D channel attention map:

Mc(F) = σ(MLP(GAP(F)) + MLP(GMP(F))) (8)

where σ(.) is the Sigmoid function, MLP is a shared network with hidden units ∈ R^{C/r×1×1}, C is the number of channels, r is a compression ratio, and GAP (Global Average Pooling) and GMP (Global Maximum Pooling) are applied across the spatial axes.

Ms(F′) = σ(K^{7×7}([SpAvgpool(F′); SpMaxpool(F′)])) (9)

F′ ∈ R^{H×W×C} is the channel attention module's output, and K^{7×7} is a convolution kernel with one filter applied to the concatenation of SpAvgpool and SpMaxpool, both of which are employed across the channel axis. Fig.5 shows an illustration of CBAM.

3) PROPOSED IMPLEMENTATION
DenseNet169 was selected from the DenseNet family after comparing different reputable pre-trained models. It demonstrated robust performance across all classes due to its nature; as discussed in Section III.D.1, the flow of information from low-level features to the upper layers allowed the model to exploit as many features as possible. A series of experiments was made to choose the best depth and check whether this high complexity is needed to achieve the best performance, and we decided to reduce the number of convolutional blocks in the fourth dense block to 12 instead of 32. Exploiting attention mechanisms offers DL algorithms more flexibility to focus on the vital information related to the target and discard what is not related. CBAM has proved capable of enhancing the model's representational power without increasing the complexity, so we tried different positions for CBAM in our modified DenseNet, and we observed that the best performance is obtained by positioning CBAM on top of the convolutional encoder, which also reduces the training time significantly due to the decrease in spatial dimensions.


TABLE 2. DR severity grading results on the APTOS dataset. The best, second best, and third best are marked by italics, boldface, and underline, respectively. M: million.

TABLE 3. Binary classification results on APTOS dataset. The best, second best, and third best are marked by italics, boldface, and underline, respectively.

TABLE 4. Statistics for the training and validation datasets.

Four trials were investigated to show the gradual increase in performance:
• Baseline DenseNet169.
• DenseNet169 + INS.
• DenseNet169 + CBAM.
• DenseNet169 + CBAM + INS.

Our baseline uses only DenseNet169's modified encoder, without attaching CBAM as a supplementary module and without dealing with the class imbalance inherited in the APTOS data. In the second trial, we demonstrated the effectiveness of cost-sensitive learning, penalizing our model more heavily on the minor classes and vice versa. CBAM was added to DenseNet without INS to investigate its effectiveness in the third trial. Finally, we investigated the enhancements added by CBAM and INS together. The four experiments followed the same settings, freezing DenseNet's encoder and using transfer learning to accelerate the training of the CBAM and Softmax layers. Fine-tuning was not used, in contrast to the conventional framework when the data domain differs from ImageNet; we based this decision on the interesting results provided by [44], where ImageNet weights demonstrated their robustness as a feature extractor for retinal disease detection. A reduction ratio (r = 32) and a kernel size (K 7×7) were used at CBAM's channel and spatial modules, respectively. Due to its performance, our fourth trial was compared to other state-of-the-art techniques. Detailed information regarding our architecture is given in Table.1.

E. TRAINING SETTINGS
Our dataset was split 90% to 10% to form the training and validation sets. A stratified data splitting technique was exploited to ensure the consistency of the class distribution between the aforementioned subsets and the original set. Table.4 demonstrates the training and validation data statistics. Furthermore, K-fold validation was implemented to obtain more robust results; due to the size of the dataset, we used 5 folds, training on 80% and testing on 20% of the original dataset in each trial. The maximum number of epochs was limited to 400 while using an early-stopping callback to avoid overfitting by saving the best weights corresponding to the minimum validation loss. Finally, we used the same stratified data splitting mechanism to ensure the same class distribution in each fold.

Our algorithm was implemented using TensorFlow [45] and trained on a Tesla V100 GPU provided by Google Colab. We trained four networks for 1000 epochs, and with a small


batch size of 32 images; the RGB images are passed to the network after being preprocessed. We used the Adam optimizer with learning rate 3 × 10−4, β1 = 0.9, β2 = 0.909, and the weighted CCE of (5) as a loss function; specifically, we exploited the sparse CCE based on the label encoding found in the dataset. All layers in CBAM were initialized with the He normal initializer [46], a Dropout layer with a rate of 0.5 was used to improve generalizability, and Softmax was used as the final layer [47]. For severity grading, the highest probability represents the level of the sample, whereas for binary classification the output was thresholded at 0.5. We introduce the overall training process of our proposed approach in Algorithm 1.
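For reference, a single Adam update with the hyper-parameters above can be sketched as follows. This is the textbook Adam rule rather than the authors' code, and the defaults mirror the values as printed (lr = 3 × 10−4, β1 = 0.9, β2 = 0.909; note that 0.999 is the more common β2 default).

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=3e-4, beta1=0.9, beta2=0.909, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient and its
    square, bias correction, then a scaled gradient step (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)          # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# toy check: minimize f(x) = x^2, gradient 2x
x = np.array([1.0])
m = np.zeros_like(x)
v = np.zeros_like(x)
for t in range(1, 2001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=1e-2)
```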

F. EVALUATION METRICS
Five common metrics were used to evaluate the model's performance.

1) ACCURACY (ACC)
The percentage of correct predictions that a model achieves:

Acc = (TP + TN) / (TP + TN + FP + FN) (10)

2) SENSITIVITY (SENS)
The percentage of positive cases that are classified as actual positives:

Sens = TP / (TP + FN) (11)

3) SPECIFICITY (SPEC)
The percentage of negative cases that are detected as actual negatives:

Spec = TN / (TN + FP) (12)

4) F1-SCORE (F1)
The harmonic mean of precision and recall:

F1 = TP / (TP + ½(FN + FP)) (13)
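Equations (10)–(13) can be computed directly from the four entries of a binary confusion matrix; a small dependency-free sketch (the function name is ours):

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, and F1 from confusion counts,
    following eqs. (10)-(13)."""
    acc = (tp + tn) / (tp + tn + fp + fn)    # eq. (10)
    sens = tp / (tp + fn)                    # eq. (11), a.k.a. recall
    spec = tn / (tn + fp)                    # eq. (12)
    f1 = tp / (tp + 0.5 * (fn + fp))         # eq. (13)
    return acc, sens, spec, f1
```

For example, with 90 true positives, 95 true negatives, 5 false positives, and 10 false negatives, accuracy is 0.925, sensitivity 0.9, and specificity 0.95.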

5) KAPPA-SCORE
Used to assess the agreement between our model and the original rater:

k = 1 − (Σi,j wi,j Oi,j) / (Σi,j wi,j Ei,j) (14)

where true positives (TP) are the samples classified correctly as positive by the algorithm, true negatives (TN) are samples predicted correctly as negative, false positives (FP) are samples misclassified as the positive class, and false negatives (FN) are samples misclassified as the negative class. Oi,j is the observed matrix and Ei,j is the expected one.

FIGURE 6. Normalized confusion matrices for (a) baseline DenseNet169, (b) DenseNet169 + CBAM, (c) DenseNet169 + INS, and (d) DenseNet169 + INS + CBAM.

IV. RESULTS AND DISCUSSIONS
Fig.6 illustrates the performance of our four algorithms. In Fig.6.a, we observe that without the weighted loss function,


it is easier for our model to be distorted, behaving robustly only in detecting the major classes (0 and 2) and vice versa. As shown in Fig.6.b, attaching CBAM to our encoder enhanced the detection of classes 1 and 3 by 63.3% and 90.9%, while reducing class 2 by only 4.6%. Class imbalance mitigation allowed better performance: as can be seen in Fig.6.c, class 1 and 3 detection is enhanced by 43.3% and 236.4%, respectively, with respect to the baseline algorithm. Finally, using CBAM with DenseNet169 while adding the weighted loss demonstrated thriving performance across all classes. Despite the reduction in class 2 by 14.63%, classes 1, 3, and 4 exhibit significant improvements of 44.2%, 43.24%, and 235%. Average QWK and accuracy values of 0.8072 and 72.3% were achieved, respectively, using the 5-fold k-validation technique. As explained in Section III.E, we trained our algorithm for only 400 epochs to reduce the computational cost of training five different models; further training would provide more intact results.

As shown in Table.2, the proposed method outperformed the literature work on the severity grading task and showed comparable results. Our model enhanced accuracy and QWK by 0.4% and 24.9% while decreasing inference time by cutting down the number of parameters by 83% compared to [28]. We achieved almost the same accuracy as [29] while reducing the model size. Our best trial had an increase in accuracy of about 7% compared to the AM-InceptionV3 [37] method. The SFTL model achieved high accuracy on the severity grading task; however, they did not tackle the problem of data imbalance. EfficientNet-B3 [36] achieved higher accuracy but only for the major classes, while we achieved comparable accuracy on the minor classes. Finally, we compared our best trial with the MSA network without multi-level feature reuse [30]: we had almost the same accuracy with an increase in QWK of 3.6%. Furthermore, we achieved a better confusion matrix across all classes than the literature while reducing time and space complexity through a 45% reduction in parameters. The severity grading F1-score was not reported in the literature; however, by using CBAM and INS, an enhancement of 21.4% was established with respect to the baseline DenseNet169.

Our algorithm demonstrated robustness against other deep learning architectures for the binary classification task, […] specificity [48]. Finally, our model achieved low training time (9 seconds/epoch) and relatively high inference speed (1.166 seconds/32 images) compared to the MSA network, which required 5 seconds with the same batch size.

V. CONCLUSION
In this study, we exploited a new CNN model based on the DenseNet169 architecture integrated with CBAM as an additional component for representational power enhancement. The proposed method demonstrated robust performance and comparable quality metrics while reducing the burden of space and time complexity. Furthermore, a 2-D Gaussian filter enhances the fundus images' quality. Finally, we used INS to form our weighted loss function to tackle the class imbalance and improve the model's predictions across all classes. As future research directions, we will evaluate the performance of different CBAM configurations; moreover, experimenting with different imbalanced learning techniques and increasing the dataset size will lead to better performance.

REFERENCES
[1] Global Report on Diabetes, World Health Organization (WHO), Geneva, Switzerland, 2016.
[2] Diabetes Atlas, International Diabetes Federation (IDF), Brussels, Belgium, 2019.
[3] Diabetes Eye Health: A Guide for Health Professionals, International Diabetes Federation (IDF), Brussels, Belgium, 2017.
[4] P. Nagaraj, P. Deepalakshmi, R. F. Mansour, and A. Almazroa, ''Artificial flora algorithm-based feature selection with gradient boosted tree model for diabetes classification,'' Diabetes, Metabolic Syndrome and Obesity: Targets and Therapy, vol. 14, pp. 2789–2806, Jun. 2021, doi: 10.2147/DMSO.S312787.
[5] L. Cheng, X.-H. Wu, and Y. Wang, ''Artificial flora (AF) optimization algorithm,'' Appl. Sci., vol. 8, no. 3, p. 329, Feb. 2018, doi: 10.3390/app8030329.
[6] T. Chen and C. Guestrin, ''XGBoost: A scalable tree boosting system,'' 2016, arXiv:1603.02754.
[7] N. Gharaibeh, O. M. Al-hazaimeh, A. Abu-Ein, and K. M. O. Nahar, ''A hybrid SVM naïve-Bayes classifier for bright lesions recognition in eye fundus images,'' Int. J. Electr. Eng. Informat., vol. 13, no. 3, pp. 530–545, Sep. 2021, doi: 10.15676/ijeei.2021.13.3.2.
[8] O. M. Al Hazaimeh, K. M. O. Nahar, B. Al Naami, and N. Gharaibeh, ''An effective image processing method for detection of diabetic retinopathy diseases from retinal fundus images,'' Int. J. Signal Imag. Syst. Eng., vol. 11, no. 4, p. 206, 2018, doi: 10.1504/IJSISE.2018.093825.
[9] T. Evgeniou and M. Pontil, ''Support vector machines: Theory and applications,'' in Advanced Course on Artificial Intelligence, vol. 2049. Berlin, Germany: Springer, 2001, pp. 249–257, doi: 10.1007/3-540-44673-7_12.
[10] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge,
as shown in Table.3. Above all, the literature did not deal MA, USA: MIT Press, 2016.
with the class imbalance problem. Most of the algorithms [11] O. Abdel-Hamid, A.-R. Mohamed, H. Jiang, L. Deng, G. Penn, and
implemented did not consider its effect on quality metrics D. Yu, ‘‘Convolutional neural networks for speech recognition,’’
IEEE/ACM Trans. Audio, Speech, Language Process., vol. 22, no. 10,
which provided overestimated outcomes, as most of them pp. 1533–1545, Jul. 2014.
were predicting perfectly only for major classes due to ignor- [12] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals,
ing data inherited imbalance. Furthermore, as mentioned A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, ‘‘WaveNet:
A generative model for raw audio,’’ 2016, arXiv:1609.03499.
in Section III.C, binary grading did not require complex [13] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel,
architectures to solve it, our algorithm with lower param- and Y. Bengio, ‘‘Show, attend and tell: Neural image caption generation
eters achieved almost the same metrics compared to other with visual attention,’’ 2015, arXiv:1502.03044.
[14] J. Redmon and A. Farhadi, ‘‘YOLOv3: An incremental improvement,’’
algorithms, plus when we artificially formed two clusters 2018, arXiv:1804.02767.
(infected and normal), the classes were balanced which [15] L. A. Gatys, A. S. Ecker, and M. Bethge, ‘‘A neural algorithm of artistic
helped literature algorithms to excel in such a task. More- style,’’ 2015, arXiv:1508.06576.
[16] Z. Yan, X. Yang, and K.-T. Cheng, ‘‘Joint segment-level and pixel-wise
over, our algorithm exceeds the minimum limits provided losses for deep learning based retinal vessel segmentation,’’ IEEE Trans.
by English National Screening Program for sensitivity, and Biomed. Eng., vol. 65, no. 9, pp. 1912–1923, Sep. 2018.
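As a concrete illustration of two components discussed in this section — the INS (inverse number of samples) weighting used to build the weighted loss function, and the quadratic weighted kappa (QWK) used to score severity grading — the following Python sketch shows how both are typically computed. This is an illustrative sketch, not the authors' released code: the class counts and the helper name `ins_class_weights` are assumptions, and scikit-learn's `cohen_kappa_score` stands in for whatever QWK implementation was actually used.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

def ins_class_weights(counts):
    """Inverse-number-of-samples weights, normalized so they sum to the
    number of classes (i.e., the average weight stays at 1)."""
    counts = np.asarray(counts, dtype=float)
    w = 1.0 / counts
    return w * len(counts) / w.sum()

# Hypothetical per-class image counts for the five DR severity grades (0-4);
# class 0 (no DR) dominates, grades 3 and 4 are rare.
counts = [25000, 2400, 5300, 870, 700]
weights = ins_class_weights(counts)

# The rare grades receive the largest loss weights.
assert weights[3] > weights[0] and weights[4] > weights[0]

# QWK penalizes disagreements by the squared distance between grades, so
# predicting grade 4 for a grade-0 eye costs far more than predicting grade 1.
y_true = [0, 0, 1, 2, 3, 4, 4, 2]
y_pred = [0, 1, 1, 2, 3, 3, 4, 0]
print(f"QWK = {cohen_kappa_score(y_true, y_pred, weights='quadratic'):.4f}")
```

Passing such weights into a categorical cross-entropy (for example, via the `class_weight` argument of Keras's `Model.fit`) makes errors on the minority grades cost proportionally more, which is the effect the INS scheme targets.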

VOLUME 10, 2022 38307


M. M. Farag et al.: Automatic Severity Classification of DR Based on DenseNet and CBAM

[17] Y. Wu, Y. Xia, Y. Song, Y. Zhang, and W. Cai, "NFN+: A novel network followed network for retinal vessel segmentation," Neural Netw., vol. 126, pp. 153–162, Jun. 2020.
[18] H. Zhao, H. Li, S. Maurer-Stroh, and L. Cheng, "Synthesizing retinal and neuronal images with generative adversarial nets," Med. Image Anal., vol. 49, pp. 14–26, Oct. 2018.
[19] L. Dai, R. Fang, H. Li, X. Hou, B. Sheng, Q. Wu, and W. Jia, "Clinical report guided retinal microaneurysm detection with multi-sieving deep learning," IEEE Trans. Med. Imag., vol. 37, no. 5, pp. 1149–1161, May 2018.
[20] A. Jain, A. Jalui, J. Jasani, Y. Lahoti, and R. Karani, "Deep learning for detection and severity classification of diabetic retinopathy," in Proc. 1st Int. Conf. Innov. Inf. Commun. Technol. (ICIICT), Apr. 2019, pp. 1–6.
[21] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014, arXiv:1409.1556.
[22] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," 2015, arXiv:1512.00567.
[23] X. Zeng, H. Chen, Y. Luo, and W. Ye, "Automated diabetic retinopathy detection based on binocular Siamese-like convolutional neural network," IEEE Access, vol. 7, pp. 30744–30753, 2019.
[24] G. Koch, R. Zemel, and R. Salakhutdinov, "Siamese neural networks for one-shot image recognition," in Proc. ICML Deep Learn. Workshop, vol. 2, Lille, France, 2015.
[25] S. H. Kassani, P. H. Kassani, R. Khazaeinezhad, M. J. Wesolowski, K. A. Schneider, and R. Deters, "Diabetic retinopathy classification using a modified Xception architecture," in Proc. IEEE Int. Symp. Signal Process. Inf. Technol. (ISSPIT), Dec. 2019, pp. 1–6.
[26] F. Chollet, "Xception: Deep learning with depthwise separable convolutions," 2016, arXiv:1610.02357.
[27] Z. Gao, J. Li, J. Guo, Y. Chen, Z. Yi, and J. Zhong, "Diagnosis of diabetic retinopathy using deep neural networks," IEEE Access, vol. 7, pp. 3360–3370, 2019.
[28] J. D. Bodapati, V. Naralasetti, S. N. Shareef, S. Hakak, M. Bilal, P. K. R. Maddikunta, and O. Jo, "Blended multi-modal deep ConvNet features for diabetic retinopathy severity prediction," Electronics, vol. 9, no. 6, p. 914, May 2020.
[29] A. K. Gangwar and V. Ravi, "Diabetic retinopathy detection using transfer learning and deep learning," in Evolution in Computational Intelligence, V. Bhateja, S.-L. Peng, S. C. Satapathy, and Y.-D. Zhang, Eds. Singapore: Springer, 2021, pp. 679–689.
[30] M. T. Al-Antary and Y. Arafa, "Multi-scale attention network for diabetic retinopathy classification," IEEE Access, vol. 9, pp. 54190–54200, 2021.
[31] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," 2016, arXiv:1608.06993.
[32] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, "ImageNet large scale visual recognition challenge," 2014, arXiv:1409.0575.
[33] Y. Wang, M. Yu, B. Hu, X. Jin, and Y. Li, "Deep learning-based detection and stage grading for optimising diagnosis of diabetic retinopathy," Diabetes/Metabolism Res. Rev., vol. 37, no. 4, p. e3445, 2021, doi: 10.1002/dmrr.3445.
[34] S. Qummar, F. G. Khan, S. Shah, A. Khan, S. Shamshirband, Z. U. Rehman, I. Ahmed Khan, and W. Jadoon, "A deep learning ensemble approach for diabetic retinopathy detection," IEEE Access, vol. 7, pp. 150530–150539, 2019, doi: 10.1109/ACCESS.2019.2947484.
[35] S. Toledo-Cortés, M. De La Pava, O. Perdómo, and F. A. González, "Hybrid deep learning Gaussian process for diabetic retinopathy diagnosis and uncertainty quantification," 2020, arXiv:2007.14994.
[36] A. Sugeno, Y. Ishikawa, T. Ohshima, and R. Muramatsu, "Simple methods for the lesion detection and severity grading of diabetic retinopathy by image processing and transfer learning," Comput. Biol. Med., vol. 137, Oct. 2021, Art. no. 104795, doi: 10.1016/j.compbiomed.2021.104795.
[37] V. Vives-Boix and D. Ruiz-Fernández, "Diabetic retinopathy detection through convolutional neural networks with synaptic metaplasticity," Comput. Methods Programs Biomed., vol. 206, Jul. 2021, Art. no. 106094, doi: 10.1016/j.cmpb.2021.106094.
[38] C. Zhang, T. Lei, and P. Chen, "Diabetic retinopathy grading by a source-free transfer learning approach," Biomed. Signal Process. Control, vol. 73, Mar. 2022, Art. no. 103423, doi: 10.1016/j.bspc.2021.103423.
[39] B. Graham. (2015). Kaggle Diabetic Retinopathy Detection Competition Report. [Online]. Available: https://www.kaggle.com/c/diabetic-retinopathy-detection/discussion/15801
[40] G. Hinton and S. Roweis, "Stochastic neighbor embedding," in Proc. Adv. Neural Inf. Process. Syst., 2002, pp. 833–840.
[41] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," 2015, arXiv:1502.03167.
[42] X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proc. 14th Int. Conf. Artif. Intell. Statist., vol. 15, Fort Lauderdale, FL, USA, 2011.
[43] S. Woo, J. Park, J.-Y. Lee, and I. So Kweon, "CBAM: Convolutional block attention module," 2018, arXiv:1807.06521.
[44] M. Raghu, C. Zhang, J. Kleinberg, and S. Bengio, "Transfusion: Understanding transfer learning for medical imaging," 2019, arXiv:1902.07208.
[45] M. Abadi et al., "TensorFlow: A system for large-scale machine learning," 2016, arXiv:1605.08695.
[46] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1026–1034.
[47] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," J. Mach. Learn. Res., vol. 15, no. 56, pp. 1929–1958, 2014.
[48] R. Taylor and D. Batey, The Need to Screen. Hoboken, NJ, USA: Wiley, 2012, ch. 4, pp. 29–41, doi: 10.1002/9781119968573.ch4.

MOHAMED M. FARAG received the B.Sc. degree in renewable energy engineering from The University of Ain-Shams, Cairo, Egypt, in 2018. He is currently pursuing the M.Sc. degree in electronics engineering with the German University in Cairo (GUC), Cairo, where he is also working as a Teaching Assistant with the Electronics Department. His research and professional interests include data science, medical image processing, machine learning, and deep learning.

MARIAM FOUAD was born in Cairo, Egypt, in 1993. She received the bachelor's degree in electronics engineering from the German University, Cairo, in 2015, and the master's degree in 2017 with the thesis titled "Joint Near-Infrared and Bio-Impedance Spectroscopy Non-Invasive Glucose Monitoring." She is currently pursuing the Ph.D. degree with the Department of Medical Engineering, Ruhr University Bochum, in collaboration with the German University in Cairo. Her current research interests include the utilization of deep learning concepts in special ultrasound applications, such as harmonic imaging and synthetic data generation.

AMR T. ABDEL-HAMID was born in Cairo, Egypt, in 1974. He received the B.S. degree in electronics and communications engineering from Cairo University, Cairo, in 1997, and the M.S. and Ph.D. degrees in electrical and computer engineering from Concordia University, Canada, in 2001 and 2005, respectively. He is currently an Assistant Professor with the Department of Electronics Engineering and the Vice Dean of Student Affairs with the German University in Cairo (GUC), Egypt. His main research interests include Internet of Things applications and security, system-on-a-chip design and verification, functional verification techniques, tools, and languages, IP watermarking, and security protocols verification.
