Keywords: Retinal images; Peripapillary Atrophy; Glaucoma diagnosis; Deep learning CNN models; Image processing

Peripapillary Atrophy (PPA, hereafter) is one of the major indicators of an irreversible eye disease named Glaucoma. An early detection of PPA is vital to avoid vision reduction caused by pathological myopia, or a permanent loss caused by Glaucoma. PPA is a pigmented crescent-shaped abnormality around the optic disc region. In this paper, we propose a fusion method to detect the atrophy by combining ResNet50-based deep features with clinically significant statistical features of the region of interest (containing PPA). We show results of extensive experimentation with six publicly available databases, on which the system is also trained. The testing is on a rather difficult dataset of community camp-based images captured under poor lighting conditions with hand-held low-resolution ophthalmoscopes. We show encouraging experimental results from the combination of the generalization power of deep features and the medical science behind clinical hand-crafted features. Such a feature combination outperforms either modality alone on the difficult experimental set. We compare our results with the state-of-the-art in the area. The proposed method outperforms existing methods, with average sensitivity, specificity and accuracy values of 95.83% each. To the best of our knowledge, this is the best accuracy reported in the literature on large and varied datasets.
Fig. 2. The variability in PPA in the retina: low, mild to severe (left-to-right), details
in Section 1.
construct a decision boundary for the optic disc and PPA region. A basic assumption here is that PPA pixels have higher red content than green (more towards white), i.e., the red-to-green ratio > 1. The authors in [10] use statistical feature-based automatic detection of PPA in retinal images. Instead of classifying images, the method learns and classifies the sector regions, i.e., inferior, superior, nasal and temporal. The authors in [9] further divide each sector into three sub-sectors of 30° each and compute six statistical features, namely mean, standard deviation, smoothness, third moment, uniformity, and entropy, for each sub-sector. The authors feed these feature values to KNN, SVM, and BPNN classifiers. Based on a similar set of statistical features, the authors in [17] perform a multi-class classification study. They use SVMs to classify the retinal image into mild-PPA, severe-PPA and no-PPA. The authors in [18] perform a novel study on the association between PPA and children with myopia. A BIF extraction along with an SVM classifier is used for PPA detection. The authors use an edge detection algorithm (based on greyscale variation), followed by outlier removal for the segmentation task.

Fig. 3. The complete pipeline for the proposed work, representing four major steps: (a) Region of Interest (ROI) extraction, (b) data augmentation, (c) feature extraction from a pre-trained ResNet50 [13] and a statistical model, and (d) concatenation of features and training of the final dense layers. Details in Section 3.
In recent years, deep learning has become popular. One such neural network-based method [11] constructs a feature vector of intensities (for each pixel) using a 25 × 25 square window. In addition to intensity features, it uses a distance-based feature (the Euclidean distance of a pixel from the optic disc centre). The authors in [12] propose a novel PPA area segmentation using a multi-task fully convolutional network (MFCN). The method estimates the disc and PPA-disc areas simultaneously. The authors build a glaucoma prediction model using the segmented PPA sizes (area in pixels) in the superior, nasal, inferior and temporal parts. A few studies have also been performed on OCT images for PPA detection; such images are not easy to obtain, as mentioned in Section 1. The proposed work, on the other hand, uses retinal images, which do not have the cost and portability issues associated with collecting OCT images.

Most of the above-mentioned methods require some kind of feature extraction procedure, which may be a cumbersome task, in order to select the right/optimal feature set for the classification. With the advent of deep learning methods, a large number of medical imaging problems can be solved with proper training and parameter tuning. Various deep learning classification models exist in the literature, out of which the ResNet50-based model [13] for image classification suggested in this paper outperforms the other networks with respect to the task at hand. In addition to the features extracted from a deep network, a set of clinically significant hand-crafted features is concatenated with these to enhance the robustness of PPA detection. The organization of the paper is as follows. Section 3 describes the complete methodology behind the problem at hand. Section 4 explains our fusion strategy for PPA detection. Section 5 presents results of extensive experiments with the system. Section 6 concludes the work and gives pointers to areas for related future work.

3. Methodology

Deep neural networks achieve outstanding performance in various domains, and thus their inclusion seems suitable for various segmentation and classification problems in the medical domain [10]. However, the problem with deep learning methods is the availability of large datasets with reliable ground-truth labelling. This is even more severe in the medical domain, where ground-truthed databases typically tend to be small. There seems to be a direct relationship between data availability and network performance. Transfer learning is a possible solution to the data scarcity problem: a model is typically trained on a huge dataset in one domain, and its knowledge is transferred to a small dataset in another domain [19]. This paper incorporates deep features learned from a pre-trained network. Additionally, we take advantage of statistical features, a commonly accepted choice for PPA classification [10,14]. Fig. 3 shows a stage diagram of the proposed method. The proposed work requires a series of pre-processing steps before the final training of the network: optic disc detection, region of interest extraction, and data augmentation to enhance the variability of the training data.

3.1. Preprocessing steps

PPA is a pigmented abnormality across the optic disc boundary, which is therefore the key area of interest for its detection [2]. Optic disc detection is the primary step in the pre-processing. Considering the known optic disc centre as the origin, we crop a square region around the optic disc (OD, hereafter) centre, since the PPA abnormality lies around the optic disc boundary. In addition to this, we use a series of geometric and colour/grey level transformation techniques for data augmentation, required to obtain a model with greater generalization.

3.1.1. Region of interest extraction

Due to inter-dataset image variability in terms of spatial dimensions, we resize the complete retinal image to 605 × 700. Further, as mentioned previously, Peripapillary Atrophy mainly lies around the optic disc boundary. Hence, it is computationally efficient to crop the region around the optic disc, which also prevents errors resulting from other image pathologies and artefacts [20]. For detecting the optic disc in the complete retinal image, the vessel convergence characteristics along with the circular brightness property of the optic disc are used to find the best possible point near the optic disc centre. Considering the detected optic disc as the centre of the region of interest, a square region of dimension 224 × 224 or 299 × 299 is cropped: the ResNet50 and VGG16 networks take input images of dimension 224 × 224, whereas InceptionV3 takes 299 × 299 [21,22]. The optic disc covers less than 20% of the area in complete retinal images [23,24], and the mean beta-PPA area is about 30%–34% of the mean optic disc area [25]. Thus the cropped region includes the complete optic disc and a major portion of the PPA. Fig. 3 shows the complete pipeline of the proposed work: in the first step, it extracts the region of interest containing the Peripapillary Atrophy. A further step of optic disc segmentation is performed for finding the features corresponding to non-OD pixels in the image. A sketch of the cropping step appears below.
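As an illustration of the ROI extraction described above, the following is a minimal sketch, assuming the optic disc centre has already been detected (e.g., via the vessel-convergence method of [35]); the function name and the window-clamping policy are our own illustrative choices, not the paper's code:

```python
import numpy as np
import cv2

def extract_roi(retina_bgr, od_centre, size=224):
    """Crop a size x size square around a detected optic disc centre.

    retina_bgr : full retinal image (any resolution).
    od_centre  : (x, y) optic disc centre, assumed to come from a separate
                 detector (e.g., vessel convergence + brightness, as in [35]).
    """
    # Normalize spatial dimensions across datasets (Section 3.1.1).
    img = cv2.resize(retina_bgr, (700, 605))   # dsize is (width, height)
    x, y = od_centre
    half = size // 2
    # Clamp the window so the crop stays fully inside the image bounds.
    x0 = int(np.clip(x - half, 0, img.shape[1] - size))
    y0 = int(np.clip(y - half, 0, img.shape[0] - size))
    return img[y0:y0 + size, x0:x0 + size]
```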
3.1.2. Data augmentation

Deep learning models do not generalize well on small datasets, and thus often over-fit the data. Some techniques to avoid this issue are to add regularization, dropout, or batch normalization layers. Data augmentation is another methodology to avoid over-fitting, where the training data is enhanced by performing various transformations [26]. Apart from geometric transformations, generative adversarial networks (GANs) have recently been used to synthetically generate new sets of images [27]. In practice, for medical imaging, choosing the right kind of augmentation is a tricky task. Various geometric and intensity transformations, such as rotation, vertical and horizontal flips of the original image, shear, zooming, and vertical and horizontal shifts, have been employed in this work. Further, we have also used image colour/grey level transformations such as sharpening, blurring, and noise (see Table 4).
Fig. 4. The proposed ResNet50 [13]-based architecture for PPA classification. The upper block represents the pre-trained layers from the ResNet50 model on the ImageNet dataset. The lower block represents the clinically significant statistical features extracted from the ROI. We concatenate features from both models to generate a 2117-dimensional feature vector. We finally train the dense layers (at the right) for binary classification. Our experimentation in Section 5.3 shows that the combination of deep features and these clinically significant features outperforms either individual model.
Fig. 5. Details of the CONV and IDEN blocks from the overall proposed network of Fig. 4. These blocks contain shortcut connections which help to deal with the vanishing-gradient problem in deeper networks. The upper block contains convolution operations, and the lower consists of the identity skip connections.

4. Fusion of deep and clinically significant features for PPA detection

A key contribution of the proposed work is the fusion of deep learning-based and clinically significant statistical features. The first model learns deep features corresponding to the PPA abnormality. The second model incorporates global pixel-level clinically significant statistical features to classify a given image as a healthy case, or PPA. The dense network in the last phase is trained on the fused set of features to achieve the best possible classification accuracy. The next two subsections explain the deep learning-based and statistical evaluation of features, and their concatenation in order to train the dense layers. Fig. 4 shows a detailed block diagram of the process.

4.1. Deep learning-based models

Detecting Peripapillary Atrophy (PPA) in the retinal image can be modelled as a binary-label classification problem in deep learning. In a traditional classification problem [13,28], the model seeks the best mapping function F : X → Y from the input training images X to the output labels Y [19,21], where X = {x_1, x_2, x_3, ..., x_n} are the training images and Y = {y_1, y_2, y_3, ..., y_n} their corresponding ground-truth labels in vector form. PPA detection is a binary classification with two labels, y_i ∈ {0, 1}. It has been shown that the introduction of deep learning in PPA classification effectively improves the accuracy rate. However, a large amount of data has to be prepared to train deep neural networks (see Fig. 5).

To deal with this problem, we use data augmentation [26] and transfer learning [21]. The data augmentation techniques have already been explained in Section 3.1.2. The next section explains our transfer learning process in detail.

4.1.1. Transfer learning

In any deep architecture, the shallow layers learn to capture local features (such as edges, corners, curves, etc.), whereas the deep layers learn global features specific to the dataset. The local features are usually common to all sets of general images and can be transferred to other datasets. This technique of transferring the knowledge of a network, learned on a particular task in one domain, saves training time and effort, and requires less training data for problems in other domains [19–22]. In this paper, we use a ResNet50 network with weights pre-trained on the ImageNet dataset [29] to extract the local features. Towards the end of the network, we re-train the final layers on the retinal dataset, keeping the weights of the initial layers unchanged. Additionally, we have experimented with fine-tuning some of the final layers, in order to make the network more robust to the specific problem at hand. We define the source ImageNet dataset domain D_s and the task T_s of 1000-class classification as two-element tuples:

D_s = (χ_I, P(X_I)),  T_s = (γ_I, η_I)    (1)

where χ represents the feature space, P(X) the marginal probability of a sample data point X, γ the label space, and η the required objective function. Here, η can be defined as P(Y|X), the probability distribution of Y given X, with Y = {y_1, y_2, ..., y_n} and X = {x_1, x_2, ..., x_n}. Analogously, the target domain D_t of the retinal database, paired with the task T_t of PPA classification, can be defined as:

D_t = (χ_R, P(X_R)),  T_t = (γ_R, η_R)    (2)

In the above equations, the subscripts I and R denote the ImageNet and retinal databases, respectively. Transfer learning can now be defined as a nonlinear mapping/function that tries to learn the target task T_t in the target domain D_t, using the given knowledge of the source domain D_s and task T_s. For the problem of PPA classification, D_s ≠ D_t and T_s ≠ T_t: the ImageNet dataset mostly consists of natural images [30], whereas the PPA retinal database has a different feature space in comparison with natural images. Further, our problem at hand is the PPA-or-healthy categorization of retinal images, which is entirely different from the ImageNet 1000-class classification of natural images.
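In Keras terms, this transfer of source-domain knowledge amounts to loading ImageNet weights and freezing the early layers; a minimal sketch follows (how many final layers to unfreeze for fine-tuning is a tunable assumption on our part):

```python
from tensorflow.keras.applications import ResNet50

# Source-domain (D_s) knowledge: ResNet50 trunk pre-trained on ImageNet.
base = ResNet50(weights="imagenet", include_top=False,
                input_shape=(224, 224, 3), pooling="avg")

# Freeze all pre-trained layers; only a new head is trained on the
# retinal (target domain D_t) data.
for layer in base.layers:
    layer.trainable = False

# Optional fine-tuning of some of the final layers, as described above
# (the count of 10 is purely illustrative).
for layer in base.layers[-10:]:
    layer.trainable = True
```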
Table 1
In the proposed architecture (Fig. 4), the layers up to CONVB+(IDENBx2) have been pre-trained on the ImageNet database. The 2048-dimensional extracted features are concatenated with a 69-dimensional clinically significant statistical feature vector, to generate a 2117-dimensional full feature vector. Towards the right of Fig. 4, a fully connected layer of 128 neurons is added, followed by one with a sigmoidal activation function. The classification training is performed only for the last layers; the rest are frozen, i.e., no weight updating is done for them.

S.No  Layer details       Output size      Layer status
1     conv-maxPool        112 × 112 × 64   freeze
2     CONVB+(IDENBx2)     56 × 56 × 256    freeze
3     CONVB+(IDENBx3)     28 × 28 × 512    freeze
4     CONVB+(IDENBx5)     14 × 14 × 1024   freeze
5     CONVB+(IDENBx2)     7 × 7 × 2048     freeze
6     Avg Pool            1 × 1 × 2048     s = 2, k = 3
7     Concatenate         1 × 1 × 2117     FT
8     Dense (FC)          1 × 128          FT
9     Loss (sigmoid)      2-D, BCE         –
4.1.2. Architecture implementation

The proposed ResNet50-based network has the following main variations compared to the original architecture.

1. The image size has been re-scaled to 224 × 224 (ResNet50 is designed to train on images of dimension 224 × 224 from the ImageNet dataset [13,31]). At the end, after the bottleneck layers, we add a pooling layer of stride (s = 2) and kernel size (k = 3) to reduce the feature dimension to (1, 2048). Apart from this, a concatenation layer is added to merge the two sets of features together. Fig. 4 shows the concatenation block and the final fully connected layers that estimate the probability of PPA (a code sketch of this head follows the list).
2. We have fine-tuned some of the layers and kept the remaining ones frozen, i.e., no weight updating is performed for them. Fig. 4 shows the respective changes made at each network branch.
3. The final classification layer has been modified completely, as per the problem requirement.
4. The proposed work follows end-to-end learning, and thus requires no additional human intervention while training (see Tables 1 and 2).
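As a concrete illustration of modifications 1–4 above, the following is a minimal Keras sketch of the fused head, assuming the 69-dimensional statistical vector is supplied as a second input; the ReLU activation on the 128-neuron layer and the input names are our own assumptions, not details stated in the paper:

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Concatenate, Dense, Dropout

# Frozen ResNet50 trunk; global average pooling yields the 2048-d deep features.
trunk = ResNet50(weights="imagenet", include_top=False,
                 input_shape=(224, 224, 3), pooling="avg")
for layer in trunk.layers:
    layer.trainable = False

image_in = Input(shape=(224, 224, 3), name="roi_image")
stats_in = Input(shape=(69,), name="clinical_stats")   # hand-crafted features

deep_feat = trunk(image_in)                    # (None, 2048)
fused = Concatenate()([deep_feat, stats_in])   # (None, 2117)
x = Dense(128, activation="relu")(fused)       # FC layer of 128 neurons
x = Dropout(0.05)(x)                           # 5% dropout, per Section 5.4
ppa_prob = Dense(1, activation="sigmoid")(x)   # probability of PPA

model = Model([image_in, stats_in], ppa_prob)
```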
To provide better insights into the proposed work, we have experimented with different popular deep network architectures: VGG16, ResNet50, and InceptionV3, using pre-trained layers from the ImageNet database. These three architectures have proven to perform best for ImageNet classification in different years (2014, 2015 and 2016) [29,30]. VGG16 follows conventional convolution layers along with max-pooling and activation layers. The ResNet50 network offers an advantage over VGG by using skip connections, which mitigate the vanishing-gradient problem in deep networks. Lastly, the Inception architecture uses kernels of multiple sizes for convolution, along with pooling, within one layer. All three networks have the same fully connected layers after the final pooling operation. In our experimentation, the ResNet50-based pre-trained network outperforms the others in terms of classification accuracy. In addition to this, we have also experimented with different network depths for the ResNet architecture. The depth has been changed to 38 and 101, as in the original paper [13]. The results show that the shallower ResNet-38 network (where the bottleneck blocks are reduced to three) does not achieve the best performance, due to insufficient learning. Further, for the deeper ResNet-101 network (where the convolutional blocks are increased from 5 to 22 in comparison with ResNet-50), we do not get any significant improvement in test accuracy, despite the extra computation in ResNet-101.

4.2. Clinically significant statistical handcrafted features

Apart from the deep features extracted by the ResNet50 model, we have also experimented with clinically significant statistical hand-crafted features obtained from the PPA images. For this purpose, the region-of-interest extraction requires optic disc segmentation [8]. In a healthy image, pixels lying outside the optic disc periphery have different texture and colour properties in comparison with the corresponding pixels in a PPA image [6]. Most existing methods rely on statistical features to diagnose PPA, as these visual characteristics can be learned through statistical methods. This motivates us to perform a textural study of PPA and non-PPA pixels. The key idea is to use the region outside the optic disc boundary as the main area of interest. We remove the optic disc portion from the cropped RGB image of dimension 224 × 224. The resultant image has black pixels inside the OD region, with the outside unchanged. This focal region is then transformed into a rectangular area using a polar transformation of the Cartesian image. The inspiration for this transformation comes from the work of [9], where the authors note that the polar coordinate system is size- and translation-invariant and dimensionless. This makes it invariant to the actual PPA location in the entire posterior region. Given the optic disc centre as the origin, the image is transformed considering the length to the boundary as the radius. In this case, the range (length) of the radial axis in the polar coordinate system may not be the same across images, since the detected optic disc coordinates might deviate from the true centre in different images. This results in different dimensions of the polar-transformed image for different cases. To standardize the input dimensions, we resize all rectangular images to one unique size. For the transformed region, we calculate statistical features such as Energy, Homogeneity, Contrast, and Correlation of the grey-level co-occurrence matrix (GLCM), in all three channels. Along with this, we take the same set of features from the raw RGB retinal images, and finally concatenate them into one feature vector. First-order statistical features are commonly used to obtain the texture information in an image [10]. Further, we incorporate histogram-of-intensity features, which help to capture the variation in colour patterns present. Inspired by [9], we also calculate the biologically inspired features for the same region. All these feature sets have been evaluated for PPA classification using a linear SVM classifier. The basic requirement for these methods is to find the best possible optic disc boundary, and hence the best possible region of interest. Out of all these feature sets, the first proposed set works best, with an average accuracy of 92%, as shown in Table 3.

In the proposed work, the deep features are concatenated with the statistical features, and the system is finally trained with fully connected layers. The clinically significant statistical feature set is normalized (standardized) before training, and the fused model gives an average accuracy of 95.83% on the augmented dataset. The novelty of the proposed work lies in the combination of ResNet50-based deep features with the statistical hand-crafted features, which outperforms all the previous approaches. Section 5 presents detailed experimental results.
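A sketch of the GLCM block of this feature set is shown below, under our own assumptions about the polar unwrap size and the GLCM distance (the paper does not state them); scikit-image and OpenCV supply the building blocks:

```python
import numpy as np
import cv2
from skimage.feature import graycomatrix, graycoprops

ANGLES = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]    # four GLCM directions
PROPS = ["contrast", "correlation", "energy", "homogeneity"]

def glcm_features(roi_rgb, od_centre, od_radius, out_size=(224, 64)):
    """48-d GLCM block of the statistical feature vector (Section 5.5):
    4 properties x 4 directions x 3 channels. The peripapillary band is
    unwrapped to a rectangle with a polar transform, then resized to a
    fixed size so every image yields the same dimensionality."""
    polar = cv2.warpPolar(roi_rgb, (0, 0), od_centre, 2 * od_radius,
                          cv2.WARP_POLAR_LINEAR)
    polar = cv2.resize(polar, out_size)
    feats = []
    for c in range(3):                    # per colour channel
        glcm = graycomatrix(polar[..., c], distances=[1], angles=ANGLES,
                            levels=256, symmetric=True, normed=True)
        for p in PROPS:
            feats.extend(graycoprops(glcm, p).ravel())   # one value per angle
    return np.asarray(feats)              # shape (48,)
```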
Table 2
Details of the seven retinal image datasets for PPA. The first six are publicly available with no PPA annotation; the last one was collected at AIIMS. All are used together for validation. The AIIMS set is a rather challenging one, collected from community camps of AIIMS, New Delhi, with images captured under poor lighting conditions, poor sitting setups, and low-resolution hand-held ophthalmoscopes. All its images have been manually marked by experienced ophthalmologists from AIIMS. Section 5.1 has the details.

S.No  Dataset         PPA images  Normal images  Resolution
1     Drishti [33]    25          20             2896 × 1944
2     Refugee [34]    48          110            1634 × 1634
3     Rim [5]         74          50             –
4     Messidor [32]   24          42             768 × 584
5     Drive [5]       –           3              768 × 584
6     Drions [5]      5           69             600 × 400
7     AIIMS Dataset   134         –              1536 × 1152
5. Experiments
5.1. Datasets

Fig. 6. The proposed method performs well for a wide variety of difficult PPA pathology variations, including impressive performance on a community camp-based dataset with poor illumination, poor sitting conditions and low-resolution hand-held ophthalmoscopes. The first row shows difficult cases with a wide variety of PPA textures. (The text box below each figure shows the corresponding PPA probability as estimated by our method.) The second row shows retinal images with poor resolution due to blurring or other motion artefacts. Section 5.3 has the details.

To validate the proposed architecture, we have used a wide variety of datasets. We use six public datasets: Rim, Drive, Drions [5], Messidor [32], Drishti [33], and Refugee [34]. All these images have been annotated by a Glaucoma expert and labelled as Healthy, PPA, or other, of which the first two categories are used in this paper. Apart from these, we have collected and used a dataset of 134 images. These challenging community camp-based images have been annotated by ophthalmologists from the All India Institute of Medical Sciences (AIIMS), New Delhi. In total, our complete dataset contains 315 PPA images, where {25, 74, 48, 24, 5, 134} are from the Drishti, Rim, Refugee, Messidor, Drions, and AIIMS datasets, respectively. Further, the dataset contains 315 Normal images in order to balance the training, where {20, 50, 110, 3, 42, 69} images are from the Drishti, Rim, Refugee, Drive, Messidor and Drions sets, respectively.

The proposed architecture has been trained and tested altogether on the publicly available datasets [5,32–34] and the locally collected dataset from AIIMS, New Delhi.
5.2. Augmentation

Fig. 7. A representative example of successful PPA detection in spite of high illumination, and some noise as well. In this case, the ResNet50 model-based features give a low probability value of 0.479 and mis-classify the image as healthy. The statistical features are relatively unaffected by the high illumination (a probability value of 0.66). The proposed fused feature technique enables the successful classification, with a probability value of 0.78.

A large number of augmentations has been performed to enhance the variability of the data. Table 4 shows all the combinations of augmentations along with their test accuracy. Initially, each type is experimented with individually; later, all possible combinations are evaluated. It has been observed that rotation in 45° increments performs poorly compared with the other augmentation types. The vertical flip of the image (rotation with 180°) performs better than the set of four rotations. Further, noise and motion blur are the most common artefacts in retinal images, as can be observed from the accuracy results in Table 4 corresponding to each of these augmentations. The final augmentation set used in the paper comprises sharpening, noise, motion blur, and vertical flips of the training images, as shown in Table 4.

5.3. Experiments with a wide variety of PPA images: poor illumination, blurriness and noise

In this section, we illustrate the encouraging performance of the proposed method in handling some rather difficult cases. The collected test images have wide variations of PPA in terms of colour, texture and intensity. In addition, the query images are exposed to different artefacts, such as varied lighting conditions, camera motion and noise; some have a combination of one or more of these. The proposed algorithm handles the different textures of PPA present, as shown in the first row of Fig. 6. Further, the fused strategy works successfully for images of bad quality due to blurriness, noise and camera motion artefacts, as shown in the second row of Fig. 6.

5.3.1. Robust PPA classification using the fusion approach: success in handling difficult cases

The proposed algorithm gives the best of both worlds, i.e., it draws on both hand-crafted and deep neural network-based features. Figs. 7–9 show some of the most challenging cases, where the proposed feature fusion clearly outperforms the individual statistical and ResNet50-based models. Fig. 7 shows an image with high illumination, which results in the PPA region having the same colour intensity as the optic disc. The ResNet50 method fails to classify it correctly, whereas the statistical and fusion approaches successfully detect PPA in the image. Another query image, with a slightly dark region around the optic disc, is mis-classified as PPA by the ResNet50 approach (predicted score 0.89), as shown in Fig. 8. However, the clinically significant statistical features predict it correctly as a healthy case (Fig. 8).
Table 3
Performance comparison with state-of-the-art methods. For the algorithms in [9,10], the test accuracies are 94% and 95% as stated in [9,10], and 79% and 90% on the challenging AIIMS community camp dataset (the first four rows of Table 3), respectively. We propose a new set of clinically significant statistical features (details in Section 4.2), which gives an impressive 92% accuracy (the sixth row). The last two rows are based on a deep learning model using a transfer learning (TL) framework. The last row is the proposed fusion of statistical and deep network features, which gives an accuracy of 95.83%. The authors in [9] and [10] present their results on their own private datasets, and have not publicly shared their code either. For a fair, workable comparison, we have implemented the algorithms described by these authors on our challenging AIIMS community camp dataset, and presented the results thereof. The ∗ indicates that the data used in the paper has been collected and annotated by experienced ophthalmologists at AIIMS New Delhi. Section 5.5 has the details.

S.No  Method                                                Dataset         Test accuracy (%)
1     Biologically_inspired features [9]                    Dataset [9]     94 [9]
2     Statistical features [10]                             Dataset [10]    95 [10]
3     Biologically_inspired features [9]                    AIIMS dataset∗  79
4     Statistical features [10]                             AIIMS dataset   90
5     Biologically inspired [9] + Statistical [10] features AIIMS dataset   72
6     Statistical_features (Proposed)                       AIIMS dataset   92
7     ResNet50 (with TL) (Proposed)                         AIIMS dataset   93.75
8     ResNet50 (with TL) + Statistical feat (Proposed)      AIIMS dataset   95.83
Table 4
Summary of the experimentation performed with various augmentation methods. We have experimented with standard geometric transformations such as rotation, flipping, motion blur, shear, etc., both individually and in combination. To enhance the variability of the training data, we have also experimented with colour/grey level-based transformations such as unsharp masking, blurring and noise. A combination of unsharp masking, noise, motion blur, and vertical flip gives the best performance among all types. Section 5.2 has the details.

S.No  Augmentation type                                No. of images  Test accuracy (%)
1     Rotation (45°)                                   2660           79
2     Motion blur                                      1064           85
3     Zoom                                             2128           89
4     Sharpening                                       1064           89
5     Gaussian noise                                   1064           92
6     Vertical flip                                    1064           82
7     Unsharp+Noise+MotionBlur                         2128           92.5
8     Unsharp+Noise+MotionBlur+Vertflip                2660           93.7
9     Unsharp+Noise+MotionBlur+Zoom                    3724           87.5
10    Unsharp+Noise+Blur+Vertflip+Zoom+Shear+Rotate    6384           89
5.4. Implementation details

Fig. 8. An example of another difficult query image (healthy), with some dark texture around the optic disc. The ResNet50 model misclassifies this as a PPA case. A classification with the clinically significant statistical features correctly labels it as a healthy case. The proposed fused feature technique enables the successful classification, with the minimum PPA probability among the three (0.009), which indicates a healthy eye. The adjoining bar plot indicates the relative PPA probability scores corresponding to the three methods.

We evaluate all methods with the standard classification metrics:

Accuracy = (TP + TN) / (total no. of images)    (3)

Specificity = TN / (TN + FP)    (4)

Sensitivity = TP / (TP + FN)    (5)
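For completeness, Eqs. (3)–(5) translate directly into a small helper (TP, TN, FP, FN being the confusion-matrix counts):

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, specificity and sensitivity, as defined in Eqs. (3)-(5)."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
    }
```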
In order to have a fair comparison with other existing methods, we have implemented two representative state-of-the-art methods, [9] and [10], on the collected test dataset. We achieve an accuracy score of 95.83%, which outscores both state-of-the-art methods on the same dataset (Table 3).

Our system implementation starts with cropping the region of interest, whose major portion contains the PPA around the optic disc [35]. In the Rim dataset, the images are disc-centred, so no cropping step is required. Afterwards, the images are re-scaled to dimensions 224 × 224 (the usual input size for the ResNet50 model).
This also seems to be a good spatial size for PPA detection, as the optic disc diameter is typically 50–100 pixels wide in a complete 605 × 700 resolution retinal image taken with common ophthalmoscopes. For the Rim database images, 80% of the retinal image region is occupied by the optic disc; in the other datasets, the optic disc covers only 10%–20% of the complete region. In order to have good spatial resolution across all datasets, we select an optimal resolution of 224 × 224.

In the proposed ResNet50-based network, the bottleneck layers are kept fixed and a global average pooling layer is added, followed by a fully connected layer of 128 neurons. At the end, binary classification is performed using a sigmoid activation function. To optimize the objective function, a stochastic gradient descent (SGD) optimizer is used, with the learning rate set to 0.001 and a step decay by half every 10 epochs of the training process. The momentum value is 0.9. The loss function is binary cross-entropy; Fig. 11 shows the loss and accuracy curves over 100 epochs. To cope with over-fitting, a dropout layer with a 5% rate is used at the end of the fully connected layer. The proposed algorithm has been implemented using the Keras library on an NVIDIA Quadro P5000 GPU, with 128 GB RAM, on a Windows 10 system. A sketch of this training configuration follows.
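A minimal sketch of the training configuration described above, assuming `model` is the fused network sketched in Section 4 and `train_gen`/`val_gen` are the augmented data feeds (these names are illustrative, not from the paper):

```python
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import LearningRateScheduler

def step_decay(epoch):
    # lr = 0.001, halved every 10 epochs (Section 5.4).
    return 1e-3 * (0.5 ** (epoch // 10))

def train(model, train_gen, val_gen):
    # SGD with momentum 0.9 and binary cross-entropy, as stated above.
    model.compile(optimizer=SGD(learning_rate=1e-3, momentum=0.9),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model.fit(train_gen, validation_data=val_gen, epochs=100,
                     callbacks=[LearningRateScheduler(step_decay)])
```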
Fig. 11. Implementation details: no over-fitting. The graphs show the loss and accuracy performance of our PPA classification on the training and validation data at each epoch. The model clearly learns the data without any over-fitting. More details are in Section 5.4.

Fig. 10. Successful classification on some representative challenging images: the first row shows query PPA images collected from community camps; the second row shows healthy images from publicly available datasets. The probability scores show successful classification in spite of blurry/varied pathological cases. The text box shows the PPA detection probability from our proposed fusion approach, and the ground truth (GT) marking by experienced ophthalmologists. Details in Section 5.3.1.

5.5. Comparison with other approaches

We have compared the proposed work with the other major state-of-the-art methods [9,10] on the same challenging AIIMS community camp dataset. The comparison has not been performed against the methods of [12], [17] and [18], as the former mainly focuses on segmenting the atrophy, and the latter study the association between PPA and myopia. Table 3 gives a summary, which puts the proposed work in perspective. The first four rows depict the test accuracies for the existing methods (biologically inspired feature detection [9] and statistical feature diagnosis [10]), whereas the remaining rows show the accuracies corresponding to the proposed approaches (the clinically significant statistical features, and the deep neural network-based model).

The methods in rows 1–6 of Table 3 are based on manual feature calculation from RGB images, using texture and colour characteristics of the retinal image. The existing methods build on two seminal approaches [9,10]. The first approach [9] uses biologically inspired features, and the second [10] performs sector-wise classification on the ROI by considering six different features (mean, standard deviation, smoothness, third moment, uniformity, and entropy) for each sub-sector. The test accuracies obtained using the algorithms of [9] and [10] on the challenging AIIMS community camp dataset are far inferior to those reported by the authors on their own private datasets in [9,10].

We propose a new set of clinically significant statistical features (of 69 dimensions) containing GLCM-extracted features (Contrast, Correlation, Energy, Homogeneity) in four different directions, for all three channels (3 × 4 × 4 = 48). In addition, raw RGB image pixel-based texture features (contrast, standard deviation, energy, mean) for all channels (3 × 4 = 12) are calculated. Finally, a set of features (mean, entropy, and standard deviation) is constructed from the vertical and horizontal gradients of the grey-scale image, for all three channels (3 × 3 = 9). Thus, we have a 69-dimensional feature vector (48 + 12 + 9 = 69). The PPA datasets corresponding to [9] and [10] are not available in the public domain (neither is their code). To have an acceptably fair evaluation, we have implemented the above state-of-the-art approaches and run our implementations of their algorithms on our AIIMS dataset (Table 3).
[30] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, Commun. ACM 60 (6) (2017) 84–90.
[31] G. Huang, Z. Liu, L. Van Der Maaten, K.Q. Weinberger, Densely connected convolutional networks, in: Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2261–2269.
[32] Messidor retinal database, 2016, http://www.adcis.net/en/third-party/messidor/ (Accessed 24 September 2016).
[33] Drishti retinal database, 2014, https://cvit.iiit.ac.in/projects/mip/drishti-gs/mip-dataset2/Home.php (Accessed April 2014).
[34] Refugee retinal database, 2018, https://refuge.grand-challenge.org/ (Accessed 18 July 2018).
[35] A. Sharma, M. Agrawal, B. Lall, Optic disc detection using vessel characteristics and disc features, in: Proc. National Conference on Communications (NCC), 2017, pp. 1–6.