
Fine-grained segmentation of Optical Disc and Cup with a novel Learner-Reviser CNN

Raja Chandrasekaran1*, Praveen2, Santhosh Krishna B V3


Associate Professor, Department of ECE, Vel Tech Rangarajan Dr. Sagunthala R & D Institute of
Science and Technology, Chennai, Tamil Nadu1

Assistant Professor, Department of Ophthalmology, ACSR Medical College, Nellore2


Associate Professor, Department of CSE, New Horizon College of Engineering, Bengaluru,
Karnataka3

Abstract – In retinal fundus images, at the boundary of the Optical Disc (OD) and Optical Cup (OC), there is high intra-class variability and small inter-class difference. We propose to address the ambiguities in segmentation of the OD and OC with a learner-reviser CNN (with patch-level fine-grained revision). In the learner module, the segmentation is done independently by two state-of-the-art CNNs (a deep encoder-decoder CNN and a U-Net). The pixels predicted differently by the two learner CNNs (ambiguous pixels) are tested in a pre-trained reviser CNN. The reviser CNN is pre-trained with patches generated around the intersectional region of the OD and OC labels, released as a patch dataset. Architecturally, the reviser CNN is built with quadratic layers to suit the fine-grained revision. We carried out the experimentation with four publicly available benchmark datasets: DRISHTI-GS (101 images), RIMONE-DL (159 images), REFUGE (400 images) and ORIGA (650 images), with n-fold cross-validation. Around 30% ambiguity was observed between the OD and OC predictions of the two state-of-the-art CNNs, and this was resolved by the fine-grained revision strategy. On the DRISHTI-GS and RIMONE datasets, the CDED-Net is equivalent in performance for OD segmentation, whereas for OC segmentation the learner-reviser CNN outperforms it with more than 3% improvement in accuracy. The patch-wise revision around the ambiguities has yielded better outcomes in OD and OC segmentation, directly addressing the semantic gap and the absence of revision mechanisms in other state-of-the-art CNNs. Across datasets, the proposed CNN also performs best in IoU for OC segmentation. The dataset of patches used for training the reviser CNN (the patch dataset) and the complete implemented code are available at [39].

Keywords: Glaucoma, Segmentation of Retinal Optical Disc and Cup, Deep Learning.

1. Introduction:

Glaucoma is a chronic eye disease that leads to permanent blindness. Diagnosis and treatment of glaucoma are vital for controlling the disease. To date, glaucoma diagnosis is carried out in two stages. First, a visual field test and an intraocular pressure measurement of the eye are taken. Secondly, precise measurement of the disc and cup areas of the eye is required. The Cup-to-Disc Ratio (CDR) is an important attribute for efficient diagnosis of the disease. At present, the cup-to-disc ratio is measured manually with the help of professionals [1]. However, depending on the experience and expertise of the professional, the accuracy varies because of the high intra-class variability and small inter-class differences. Precise measurement and diagnosis of the disease are very significant; therefore, careful segmentation of the OD and OC is essential. Hence, proposing an effective and autonomous OD and OC segmentation algorithm has attracted attention in the medical field over the last few decades and remains an open problem.

Among the segmentation networks, the encoder-decoder model is the most efficient [24]. The encoder refines concise feature maps, reducing their size, and the decoder recovers the original size. However, semantic gaps and lack of localization are major problems. These are addressed with patch-based learning: patch-based classification extracts features from image patches so as to retain the maximum level of spatial information. However, there are only limited articles on patch-based segmentation [3]. In [3], a patch dataset is generated from the training images whose labels are already known, and the output for each test patch is then predicted.

Specifically, in fundus images, at the boundary of the OD and OC, there is high intra-class variability and small inter-class difference, which needs to be addressed with a fine-grained revision. State-of-the-art CNN-based segmentation models like U-Nets also suffer from semantic-gap problems, as addressed in [22], [24], [25], [26], because of the cascade of the initial encoder and the terminal decoder. There is strong evidence that this problem needs to be addressed as a fine-grained task rather than by intensive feature extraction. Fine-grained learning is required for images with the same shape features but little difference [17]. From the machine-learning point of view, fine-grained classification tasks involve data with high intra-class variability and small inter-class differences [17], [21]. To address this issue, many research works train CNNs with image patches from the region of interest together with full images. Guo-Sen Xie et al. [21] proposed a local and global discrimination CNN, with one CNN focussed on image patches (input images with bounding boxes of significant regions) and another CNN focussed on the full image; weight sharing is then implemented to jointly exploit the patch and full-image details.

Several researchers, such as Chen-Lin Zhang et al. [18], propose new non-linear layers in the deep-learning framework (apart from the non-linear activation function) to improve performance on fine-grained tasks. Others utilize bilinear CNNs; for example, Yingqiong Peng et al. [19] proposed a bilinear CNN that combines low-level and high-level features after the feature-extraction stage, thus utilizing non-linear fully connected layers.

Zhang et al. proposed non-linearity with the power-mean transformation in the convolution and fully connected layers for fine-grained tasks; the non-linear classifier achieves higher accuracy than a traditional classifier with conventional linear units [18]. These non-linear layers are generally used in the fully connected layers, i.e., after feature extraction. We propose a learner-reviser CNN, in which the learner module encompasses two CNNs that semantically segment the OD and OC maps independently. The reviser CNN (which encompasses quadratic layers, as proposed for fine-grained tasks in works like [20]) then tests the patches encompassing the ambiguous pixels (those predicted differently by the learner CNNs). That is, we propose a semantic segmentation phase (the learner CNNs) with two conventional CNNs, and the identified ambiguous pixels are revised in the reviser CNN, which is a bank of quadratic layers pre-trained with image patches around the OD or OC boundaries (the expected ambiguities).

The CDED-Net in [24] encompasses an encoder-decoder CNN for the segmentation of OD and OC on three different datasets (DRISHTI-GS, RIMONE and REFUGE). The encoder is made up of eight convolution layers with skip connections, inspired by ResNet, and the decoder follows the SegNet architecture. The researchers also noted the semantic-gap problem in the U-Net. In [32], the authors proposed a joint OD and OC segmentation by modifying the well-known U-Net with feature fusion (the low-resolution feature maps from the subsequent branch of the U-Net are fused with the higher-level feature maps in the encoder). Spatial and channel attention mechanisms were also incorporated. With all these enhancements, the model achieved Dice scores of 98.2% and 92.6% for OD and OC segmentation respectively.

The CNN in [28] uses DeepLab V3 and a U-Net with attention gates for OD segmentation. The spatial pyramid pooling in DeepLab V3 encodes feature maps at multiple resolutions, and the concatenation of encoder-decoder feature maps in the conventional U-Net is enhanced with spatial attention. With these substantial architectural changes, the CNN produces an accuracy of 99.6% for OD segmentation. The CNN in [27] uses two CNNs with similar layers: the red channel is taken as input for disc segmentation, and drop-out is removed from the conventional U-Net to address the lack of training images. The CNN in [23] uses EfficientNet as the encoder and a U-Net as the decoder; the EfficientNet encoder encompasses mobile inverted bottleneck convolutions, and the layers are scaled in width, depth and resolution. This complex architecture achieves accuracies of 96.54% and 96.89% for OD and OC segmentation respectively. Yoel et al. [6] proposed patch-based segmentation for identifying three main boundaries of the cornea, namely the epithelium, Bowman's layer and the endothelium, in AS-OCT. The recognition rate of their method is improved by using boundary classifiers.

In [7], the researchers aimed at an autonomous segmentation model for accurate diagnosis. Wenyu et al. suggested a CNN patch-based automatic segmentation framework called CM-SegNet. The designed framework connects different spatial locations of each patch, collecting features of the related edges and adjacent patches. Comparatively, in our work, we generate the patches at the boundary of the optical disc and optical cup.

In [34], patches are generated from the boundaries in hippocampal images to refine the segmentation process. The work in [3] uses ambiguous patches as the query patches. Another work [8] differentiates lung cancer from tuberculosis in CT images using a Convolutional Siamese Neural Network (CSNN) for content-based image retrieval (CBIR); the method supports patch-based segmentation. In that application, lesion patches are cropped out to form lung-cancer and tuberculosis datasets, the patch dataset is used to train the CSNN, and a test patch is then posed as a query. By contrast, we generate the test patches around the pixels predicted differently by the two learner CNNs.

Previously, cataract has been diagnosed from color fundus images by a fully supervised CNN. In [14], the whole image was split into small patches, and the features of a CNN and an RNN were integrated. The outcomes reveal that the method is more accurate than baseline methods.

In [10], a classification algorithm was proposed for identifying hemorrhagic strokes and their after-effects in the brain using MRI scan images, on the publicly available ATLAS (Anatomical Tracing of Lesions After Stroke) dataset. Patch-level (patches of size 64 × 64) and whole-image-level learning are integrated, and the method achieved a mean Dice score of 0.754, higher than other existing methods.

The authors in [12] propose an alternate method for the diagnosis of prostate cancer. The anomalous regions in the images suffer from inter-observer variability; hence, many automated machine-learning algorithms have been proposed. In [12], however, the authors focused on supervised learning with fine-grained pixel-level annotations, enhancing the accuracy to 0.985.

In [13], a multi-scale feature fusion network based on a self-supervised feature extractor is proposed. All extracted image slices are divided into overlapping patches, and the self-supervised algorithm learns autonomously to improve the efficiency of feature extraction. In addition to the works listed above, several works employ patch-based learning in ophthalmology [4], pulmonology [5], oncology [11], [13], polyp segmentation [15] and mammography [16]. Regarding the patch datasets used in these methods, the patches are generated at the same size in [4], by a sliding window in [5], and around the ROI in [11]; similarly, in this work the patches are generated from the intersectional region of the OD and OC. The work [15], by proposing a hybrid loss function for the integrated learning of whole images and patches, and the work [16], by training the CNN with patches around mammogram anomalies, have both achieved improvements in accuracy.

In addition to patch-wise learning, attention mechanisms are implemented to concentrate on specific spatial locations of interest. In [9], a CNN system for detecting COVID-19 symptoms using a multiscale class residual attention (MCRA) network is proposed. The architecture performs chest X-ray image classification and, to tackle the inter-class interference problem, uses spatial attention. In our work, we use spatial attention focused toward the center of the patches. Experimental results show that the method in [9] performs better in terms of accuracy and F1 score.

As noted above, Guo-Sen Xie et al. [21] proposed a local and global discrimination CNN, with one CNN focussed on image patches (input images with bounding boxes of significant regions) and another focussed on the full image, with weight sharing. We propose the reverse strategy: full-image learning first, followed by a patch-based revision.

The proposed work includes a deep encoder-decoder CNN (termed CNN 1) and a U-Net (termed CNN 2) as the learner CNNs for training and testing on the full images. The independent results (segmentation maps) are recorded. The segmentation maps of each image produced by the two CNNs are subjected to a logical EX-OR, which outputs "High" for the ambiguous pixels. These ambiguous pixels are then tested with the reviser CNN for refinement.

Before that, the reviser CNN is trained with patches of different sizes, created around the boundaries/intersectional region of the OD and OC labels (the expected ambiguities). The patches are extracted from both the image and the ground truth, so that supervised learning can be carried out by a CNN with quadratic layers. That is, fine-grained learning is executed for the ambiguous regions (high intra-class variability and small inter-class differences).

The gold-standard dataset generation process employs multiple human experts (ophthalmologists) to create the OD and OC maps. In [28], the OD is labelled by an expert at three different times; the researchers in [29] describe the low inter-class variability between the OD and OC as "obscured"; and the researchers in [27] mention that the ground truths in the DRISHTI-GS dataset were obtained from four different experts with different numbers of years of experience in ophthalmology. Juneja, M. et al. [27] mention that this is done to rule out inter-observer variance. The proposed learner-reviser CNN is the first of its kind to mimic this procedure of manual gold-standard dataset creation.

2. Materials and Methods:


2.1. Datasets:

We collected the images from the DRISHTI-GS, RIMONE-DL, REFUGE and ORIGA datasets via the source [39].

DRISHTI-GS

The DRISHTI-GS dataset [30] consists of 101 images. The researchers in [30] collected the images from Aravind Eye Hospital, Madurai, with consent. The images are resized to 512 × 512 pixels from their original size of 2896 × 1944 pixels. The dataset encompasses OD and OC segmentation maps obtained with a dedicated marking tool from multiple human experts. In this work, we used 90 images for training and 10 images for testing, with ten-fold cross-validation.

RIMONE-DL

The RIMONE-DL dataset [31] consists of 159 images. The researchers collected the images from three Spanish hospitals. The images are resized to 512 × 512 pixels for uniformity of training and testing across datasets. The dataset encompasses OD and OC segmentation maps drawn by multiple human experts. In this work, we used 106 images for training and 53 images for testing, with 3-fold cross-validation.

REFUGE:

The REFUGE (Retinal Fundus Glaucoma Challenge) dataset [37] consists of 400 images publicly released for research. The large dataset was released to benchmark deep-learning tasks for glaucoma diagnosis and the prior segmentation of the OD and OC, as part of an AI challenge held at the OMIA (Ophthalmic Medical Image Analysis) workshop. In this work, we used 300 images for training and 100 for testing, with four-fold cross-validation.

ORIGA:

The ORIGA (Online Retinal fundus Image database for Glaucoma Analysis and research) dataset [38] consists of 650 original fundus images with the corresponding segmentation maps for the OD and OC. The images were annotated by trained medical professionals from the Singapore Eye Research Institute. In this work, we used 600 images for training and 50 for testing, with thirteen-fold cross-validation.

Patch Dataset

The dataset that provides the patches used in this work is given at [39]; it contains 50,000 patches. The patch dataset is created by cropping portions of size 25 × 25, 29 × 29 and 33 × 33 from several fundus images in the DRISHTI-GS dataset.

The patches are created around the intersectional region of the OD and OC (the OC is known to lie completely inside the OD), where the ambiguity is expected to be maximum. The intersectional region is shown on the left-hand side of Fig. 1; the white pixels are the expected ambiguities. Such ambiguous points (in class belongingness as OD or OC) are extracted by executing a logical EX-OR of the OD and OC labels. The folder named "Expected ambiguities" in the dataset URL shows the images with the expected ambiguities (i.e., the region of the OD excluding the OC). The cropping of patches around the borders is illustrated on the right-hand side of Fig. 1. In the dataset portal, an Excel file called "Expected ambiguities" lists the 50,000 ambiguous points (pixels) selected from several images, which are excluded from whole-image learning. The 50,000 patches are extracted from both the original image and the OD ground truth for supervised learning. The dataset of patches used for training the reviser CNN (the patch dataset) and the complete implemented code are available at [39].
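The following minimal sketch shows how the expected ambiguities and the multi-size crops can be obtained; it assumes NumPy arrays for the image and binary label maps, and the function and argument names are illustrative, not taken from the released code at [39]:

```python
import numpy as np

def extract_ambiguity_patches(image, od_mask, oc_mask, sizes=(25, 29, 33)):
    # Expected ambiguities: pixels inside the OD but outside the OC,
    # obtained as the logical EX-OR of the two binary label maps.
    ambiguity = np.logical_xor(od_mask.astype(bool), oc_mask.astype(bool))
    patches = {s: [] for s in sizes}
    for y, x in zip(*np.nonzero(ambiguity)):
        for s in sizes:
            h = s // 2
            # Keep only patches that lie fully inside the image.
            if h <= y < image.shape[0] - h and h <= x < image.shape[1] - h:
                patches[s].append(image[y - h:y + h + 1, x - h:x + h + 1])
    return ambiguity, patches
```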

2.2 Working:

The overall sequence of operations of the learner-reviser CNN is as follows (a consolidated code sketch of steps 2-4 is given after the list):

1. The two learner CNNs are trained with the training images (full-size fundus images of size 512 × 512), as shown in the learner CNN sub-block of Fig. 2.
   a. Simultaneously, the reviser CNN (shown in the reviser CNN sub-block of Fig. 2) is trained with image patches from the patch dataset.
   b. The reviser CNN exhibits multiscale feature extraction, with classification in the quadratic dense layers.
2. The two learner CNNs' output soft-map predictions are combined with a logical EX-OR, so that the ambiguous pixels are identified, as depicted in Fig. 3.
3. With each ambiguous pixel as the center, patches of three sizes (the same sizes as the patches used in the training phase of the reviser CNN) are created. These patches are tested with the reviser CNN (shown in the vertical direction in the reviser CNN sub-block of Fig. 2).
4. The reviser CNN's prediction for each ambiguous pixel is substituted at the corresponding spatial coordinate in the learner CNNs' predicted soft-map.
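As a minimal sketch of steps 2-4 at test time (cnn1, cnn2 and reviser are assumed stand-ins for the trained models; the real interfaces in the released code [39] may differ):

```python
import numpy as np

def crop(image, y, x, size):
    # Patch of the given size centred on (y, x); border handling omitted.
    h = size // 2
    return image[y - h:y + h + 1, x - h:x + h + 1]

def learner_reviser_inference(image, cnn1, cnn2, reviser, sizes=(25, 29, 33)):
    pred1 = cnn1(image) > 0.5                  # step 1: soft-map of learner CNN 1
    pred2 = cnn2(image) > 0.5                  # step 1: soft-map of learner CNN 2
    ambiguous = np.logical_xor(pred1, pred2)   # step 2: EX-OR flags disagreements
    final = pred1.copy()
    for y, x in zip(*np.nonzero(ambiguous)):
        patches = [crop(image, y, x, s) for s in sizes]  # step 3: 3-scale patches
        final[y, x] = reviser(patches)                   # step 4: substitute verdict
    return final
```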

The detailed explanation is as follows:

In the proposed design, we first implement the segmentation of the OD and OC with two CNNs (a deep encoder-decoder CNN and a U-Net). Henceforth, the deep encoder-decoder CNN and the U-Net are referred to as CNN 1 and CNN 2 respectively.

In CNN 1, the encoder path is designed with 20 sets of convolution and batch normalization, repeated thrice. Striding of an appropriate value is used to maintain size consistency before max-pooling; the max-pooling takes the feature maps to the next resolution, where convolution and batch normalization are repeated. In the decoder, 20 sets of convolution and up-sampling are applied to localize the feature maps.

CNN 2, a U-Net model, consists of convolution and batch normalization repeated twice, followed by max-pooling; "ReLU" is used as the activation function. A max-pool layer precedes the flow of tensors to the next resolution. The encoder steps down through eight resolutions, which form the steps of the encoder layers.

Conversely, the decoder layers consist of convolution layers followed by up-samplers, which recover the image dimensions at each level of the decoder to match the parallel step of the encoder. The specialty of the U-Net is the direct (skip) connection from each encoder step to the parallel decoder step, which enables the encoder features to be appended to the recovered features.
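For illustration, one encoder/decoder step of CNN 2 can be sketched as below (PyTorch; the channel sizes and module names are our assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class UNetStep(nn.Module):
    # One resolution step of CNN 2: Conv+BatchNorm+ReLU repeated twice,
    # max-pooling on the way down, up-sampling plus skip concatenation
    # on the way up. Channel sizes are illustrative assumptions.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU())
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2)

    def encode(self, x):
        skip = self.block(x)           # features kept for the skip connection
        return self.pool(skip), skip   # pooled tensor flows to the next resolution

    def decode(self, x, skip):
        x = self.up(x)                       # recover this step's spatial size
        return torch.cat([x, skip], dim=1)   # append the encoder features
```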

2.2.1. Reviser CNN

In parallel with the learner CNNs, the feature extraction and training of the reviser CNN are executed with image patches surrounding the boundaries of (i) the OD and the background, and (ii) the OD and the OC (the expected ambiguities).

The training phase of the reviser CNN is shown in the horizontal orientation of the reviser CNN sub-block in Fig. 2; it is trained with patches of three different sizes from the created patch dataset, with the boundary pixel at the centre. We use patches of size 25 × 25, 29 × 29 and 33 × 33 [6]. Patches of the same size from the same spatial coordinates are also taken from the labels for supervised learning. In this work, the reviser CNN, as shown in the sub-block of Fig. 2, encompasses a convolution layer for feature extraction, followed by quadratic dense layers for classification.

In the multi-scale feature extraction block (in the reviser CNN sub-block of Fig. 2), we use convolution kernels of size m, (m+k) and (m+2k), where "m" stands for the size of the smallest kernel and "k" for the difference in size between the image patches. This produces convolution outputs of the same size, (l − m + 1) with l the smallest patch size, unlike [33], even though the feature maps carry multiscale information.
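A small sketch verifying this size relation (PyTorch; the patch sizes and k = 4 follow the paper, while the smallest kernel size m and the channel count are assumed for illustration):

```python
import torch
import torch.nn as nn

m, k = 5, 4  # smallest kernel size m is an illustrative assumption
branches = {25: nn.Conv2d(3, 8, kernel_size=m),          # 25 - 5  + 1 = 21
            29: nn.Conv2d(3, 8, kernel_size=m + k),      # 29 - 9  + 1 = 21
            33: nn.Conv2d(3, 8, kernel_size=m + 2 * k)}  # 33 - 13 + 1 = 21

for size, conv in branches.items():
    patch = torch.randn(1, 3, size, size)
    print(size, tuple(conv(patch).shape))  # every branch emits (1, 8, 21, 21)
```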

The SAA-Net in [33] extracts multi-scale features for fine-grained segmentation. To focus on multiple receptive fields, the authors in [33] used masks of different sizes for the image patches, and the scale attention employs adaptive weights at each scale to emphasize the needed scale. Our multiscale feature-extraction block is inspired by [33], but we make a major change by producing feature maps of the same size, which nevertheless hold multi-scale information by using patches of different sizes around the same pixel (the boundary pixel in training and the ambiguous pixel in testing).

As the interest is in the center pixel, the spatial attention maps are "weighted" at the center of the patches of all sizes. In the testing phase of the reviser CNN, the two learner CNNs' output soft-map predictions are combined with a logical EX-OR, so that the ambiguous pixels are identified, as depicted in Fig. 2.

Quadratic CNN

Compared to [33], in this work we complement the fine-grained learning with quadratic dense layers for classification, to address the micro-level inter-class differences [18]. The CNN used in the revision phase includes quadratic (non-linear) layers, as the ambiguous pixels identified by the learner CNNs are revised in the reviser CNN. The bank of quadratic layers is pre-trained with image patches around the OD or OC boundaries (the expected ambiguities). The textures of micro-level inter-class differences are proven to be significant for fine-grained classification tasks [18]; hence it is natural to extend the CNN training by including layers with a higher-order learning space [18]. The higher-order training is accomplished by increasing the dimension of the weight vector (W) in equation (2), which is quadratic, compared to equation (1), which is linear. The fine-grained task rules out the ambiguity (high intra-class variability and small inter-class differences) between (1) the OD and OC, and (2) the neuro-retinal rim and the background.

The non-quadratic or linear kernel used by convention in the fully connected layer can be expressed as

y = Wx + B --- (1)

where "W" and "B" are the weights and biases, and "x" and "y" are the inputs and outputs respectively. The weight vector W ∈ R^(d1 × d2), where d1 and d2 represent the dimensions of the input and output respectively. For a linear kernel, as conventionally used in CNNs, d1 = d2 = 1, as both the input and output would be 1-D vectors.

Whereas the quadratic kernel of nth order is expressed as

K_(n−1)(x) = (x^((n−1)/2))^T W_q (x^((n−1)/2)) --- (2)

where "x" is the input and W_q is the weight tensor, with W_q ∈ R^(d1 × d2 × H), where d1 and d2 represent the dimensions of the input and output respectively, as in the linear kernel. Additionally, H is the new dimension that offers an extended scope for the learning.
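A minimal sketch of such a quadratic dense layer, under our reading of equation (2) (PyTorch; the initialization and the added linear term are our assumptions, not the authors' released implementation):

```python
import torch
import torch.nn as nn

class QuadraticDense(nn.Module):
    # Second-order dense layer in the spirit of Eq. (2): each of the
    # d_out output units computes a bilinear form x^T W_q x, so the
    # weight tensor carries the extra dimension H = d_out.
    def __init__(self, d_in, d_out):
        super().__init__()
        self.Wq = nn.Parameter(0.01 * torch.randn(d_out, d_in, d_in))
        self.linear = nn.Linear(d_in, d_out)   # conventional first-order term

    def forward(self, x):                      # x: (batch, d_in)
        quad = torch.einsum('bi,oij,bj->bo', x, self.Wq, x)  # x^T W_q x
        return quad + self.linear(x)
```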

Spatial Attention maps

We use spatial attention maps to emphasize the center pixel, as the patches are extracted around the
boundaries (in the training phase) and around the ambiguous pixels (in the testing phase).

As can be inferred from the image patches shown in the reviser CNN sub-block of Fig. 2, patch-wise training should give more weightage to the center pixel (the ambiguous one); we therefore implement an attention map similar to the one in [12], as given in equation (3):

O = C1(C2 x + C3 g) --- (3)

Here "O" represents the output feature map; C1, C2 and C3 represent 1 × 1 2-D convolution layers; x represents the input feature map; and "g" represents the gating signal, which here is a centrally-weighted spatial attention map. Feature refinement confined to the ambiguity of the center pixel is attained by focusing on that pixel. The input feature map (x) is added to the attention map (g) for adaptive feature refinement, after x and g are filtered with the convolution kernels C2 and C3 respectively.
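A sketch of this gate (PyTorch; the Gaussian form of the centrally-weighted map g is our assumption for "weighted toward the centre"):

```python
import torch
import torch.nn as nn

class CenterAttentionGate(nn.Module):
    # Sketch of Eq. (3), O = C1(C2*x + C3*g): C1-C3 are 1x1 convolutions
    # and g is a centrally-weighted spatial map.
    def __init__(self, ch):
        super().__init__()
        self.c1, self.c2 = nn.Conv2d(ch, ch, 1), nn.Conv2d(ch, ch, 1)
        self.c3 = nn.Conv2d(1, ch, 1)

    def forward(self, x):                      # x: (batch, ch, H, W)
        h, w = x.shape[-2:]
        ys = torch.linspace(-1, 1, h).view(h, 1).expand(h, w)
        xs = torch.linspace(-1, 1, w).view(1, w).expand(h, w)
        g = torch.exp(-(ys ** 2 + xs ** 2) / 0.5)   # peaks at the patch centre
        g = g.expand(x.shape[0], 1, h, w).to(x.device)
        return self.c1(self.c2(x) + self.c3(g))
```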

3. Results and discussions:

The patch-wise revision around the ambiguities has yielded better outcomes in OD and OC segmentation, directly addressing the semantic gap and the absence of revision mechanisms in state-of-the-art CNNs, across the following datasets: DRISHTI-GS (101 images), RIMONE-DL (159 images), REFUGE (400 images) and ORIGA (650 images), with the cross-validation folds described in Section 2.1.

Each row of Fig. 3 shows, in order: the fundus image; the OD segmentation outputs of the conventional CNN 1 and CNN 2; the ambiguities between the segmentation outputs of CNN 1 and CNN 2; the segmentation output of the proposed (learner-reviser) CNN after ambiguity resolution in the reviser CNN; and the ground truth from the dataset.

The deep encoder-decoder CNN is termed CNN 1 and the conventional U-Net is termed CNN 2. The performance of these CNNs is compared to the proposed learner-reviser CNN as an ablation study. Tables 1, 2, 3 and 4 show the ablation study with different evaluation metrics on DRISHTI-GS, RIMONE-DL, REFUGE and ORIGA, in order. It is inferred that the proposed CNN best segments the OD and OC in RIMONE and DRISHTI-GS, with the highest segmentation accuracy. On the DRISHTI-GS dataset there are improvements of 4% and 5% in segmentation accuracy with the proposed CNN on the OD and OC segmentation tasks respectively. The Dice coefficient is also best for the proposed CNN across datasets, with the largest improvements on the DRISHTI-GS (approximately 95% to 99%) and REFUGE (approximately 97% to 99%) datasets and modest improvements on the others. With respect to OC segmentation, the proposed CNN shows the best improvement on the REFUGE dataset and modest improvements on the other datasets (in the REFUGE and ORIGA datasets, there is a 2% improvement in the Dice coefficient). With respect to the IoU measure, the reviser CNN consistently produced better outcomes, as per the values in Tables 1 to 4. The OD and OC segmentation accuracy of the proposed (learner-reviser) CNN is better than the pixel accuracies of CNN 1 and CNN 2, which can be subjectively inferred from the 5th column of Fig. 3 and Fig. 4, after the resolution of the ambiguities (white pixels in the 4th column of Fig. 3 and Fig. 4). The same holds for the optical cup segmentation, as inferred from Fig. 4. It is also inferred from Fig. 3 and Fig. 4 that there are ambiguities in the segmentation of the OD and OC in both CNN 1 and CNN 2; they are measured to be around 30% between the two CNNs (i.e., the percentage of white pixels in column 4 of Fig. 3 and Fig. 4 with respect to the number of pixels in the image).
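For concreteness, the measurement just described can be sketched as follows (NumPy; a sketch of the computation, not the authors' evaluation script):

```python
import numpy as np

def ambiguity_percentage(pred1, pred2):
    # White (differing) pixels in the EX-OR of the two learner
    # predictions, as a share of all pixels in the image.
    xor_map = np.logical_xor(pred1.astype(bool), pred2.astype(bool))
    return 100.0 * xor_map.mean()
```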

The metrics of the proposed CNN are compared with the reported metrics (as per the corresponding literature) of state-of-the-art CNNs, as shown in Table 5; the OD and OC segmentation accuracy is best with the proposed learner-reviser CNN in most of the settings.

• Regarding OD segmentation with the DRISHTI-GS dataset, the accuracy is best with the proposed CNN at 99.8%, which is higher than the OD segmentation accuracy of all the other published state-of-the-art models: CDED-Net [24], EfficientNet [23] and U-Net with feature fusion [32].
o With the DRISHTI-GS and RIMONE datasets, the OD and OC segmentation accuracies are 99.8% and 99.7% with the proposed CNN, while the CDED-Net in [24] has an accuracy of 99.7%. The CDED-Net is thus equivalent in performance for OD segmentation on the DRISHTI-GS and RIMONE datasets.
o With the RIM-ONE dataset, the OC segmentation accuracy is best with the proposed CNN at 97.3%, whereas the CDED-Net in [24] reaches only 94.6%.

The CDED-Net was not tested by the researchers in [24] on the REFUGE and ORIGA datasets. The proposed learner-reviser CNN produced competitive outcomes on REFUGE (OD segmentation accuracy of 97.8%, OC segmentation accuracy of 96.7%) and ORIGA (OD segmentation accuracy of 98.2%, OC segmentation accuracy of 97.9%).

• Further, the FAU-Net [32] on the DRISHTI-GS dataset achieves a Dice of 98.2% and an IoU of 96% for OD segmentation, whereas the proposed CNN achieves a Dice and IoU of around 99%.
o For OC segmentation, the FAU-Net achieves a Dice of 92.6% and an IoU of 87%, whereas the proposed CNN achieves a Dice and IoU of 99%.
o For the REFUGE dataset, the proposed CNN exhibits 1.3% and 7.9% improvements in IoU for the OD and OC respectively.

Thus, across datasets, the proposed CNN performs significantly better in segmentation accuracy and in IoU for the OC, because the fine-grained learning in the intersectional areas of the OD and OC addresses the high intra-class variability and small inter-class differences.

State-of-the-art CNN-based segmentation models like U-Nets possess semantic-gap problems, as addressed in [22], [24], [25], [26], because of the cascade of the initial encoder and terminal decoder. The patch-wise revision around the ambiguities has yielded better outcomes in OD and OC segmentation, directly addressing the semantic gap and the absence of revision mechanisms in state-of-the-art CNNs. The high intra-class variability and small inter-class differences at the boundaries are thus evident, and there is a strong requirement to deploy fine-grained learning with a revision strategy. That is why we pre-train the reviser CNN with the patches, created as the patch dataset [39], at the intersectional areas of the OD and OC.

Regarding the significance of the accuracy of AI algorithms deployed for clinical care, the authors of [35] state that for around 20% of patients whose diseases are misdiagnosed, the misdiagnosis causes serious harm [35]. Treatments arising from false-positive or false-negative diagnoses affect the specific anatomy or organ under treatment, or may even be life-threatening. Hence, improvements in metrics carry a particular significance in the clinical domain, compared to other domains.

4. Conclusion:

In our experimentation, the small inter-class differences at the boundary of the OD and OC are reflected as an ambiguity (of around 30%) between the two learner CNNs in both OD and OC segmentation. This is due to the semantic gap and the absence of a revision mechanism in state-of-the-art CNNs. The ambiguity is addressed by the fine-grained testing in the reviser CNN, with patch-wise learning and quadratic layers. The CDED-Net is equivalent in performance for OD segmentation on the DRISHTI-GS and RIMONE datasets, whereas OC segmentation with the proposed CNN is significantly improved. Across the four datasets, the proposed CNN performs significantly better in segmentation accuracy and IoU for OC segmentation, and moderately better in the metrics for OD segmentation. The proposed CNN has also achieved a Dice and IoU of 99%, significantly higher than the ablation models and the state-of-the-art CNNs in the recent literature. The strategy of learning and revising mimics the manual dataset-creation process, with its fusion of multiple decisions and ambiguity resolution. An extended version of the proposed model could be deployed to generate segmentation datasets.

Data availability statement:

The DRISHTI-GS [30] and RIMONE-DL [31] datasets are freely available to researchers.

The patch dataset was created exclusively for this work by cropping patches of size 25 × 25, 29 × 29 and 33 × 33 from fundus images in the DRISHTI-GS dataset, around the borders of the Optical Disc (OD) and Optical Cup (OC), where the ambiguity is expected to be high. These expected ambiguous points (in class belongingness as OD or OC) are extracted by executing a logical EX-OR of the OD and OC labels.

The dataset providing the 50,000 patches from the images and the corresponding labels is given at [39].

5. References:

1. Meng Y, Zhang H, Zhao Y, Gao D, Hamill B, Patri G, Peto T, Madhusudhan S, Zheng Y. Dual Consistency Enabled Weakly and Semi-Supervised Optic Disc and Cup Segmentation With Dual Adaptive Graph Convolutional Networks. IEEE Trans Med Imaging. 2023 Feb;42(2):416-429. doi: 10.1109/TMI.2022.3203318. PMID: 36044486.
2. R. Gu et al., "CA-Net: Comprehensive Attention Convolutional Neural Networks for Explainable Medical Image Segmentation," IEEE Transactions on Medical Imaging, vol. 40, no. 2, pp. 699-711, Feb. 2021. doi: 10.1109/TMI.2020.3035253.
3. S. F. Qadri, L. Shen, M. Ahmad, S. Qadri, S. S. Zareen and S. Khan, "OP-convNet: A Patch Classification-Based Framework for CT Vertebrae Segmentation," IEEE Access, vol. 9, pp. 158227-158240, 2021. doi: 10.1109/ACCESS.2021.3131216.
4. Bi, L., Feng, D. & Kim, J. Dual-Path Adversarial Learning for Fully Convolutional Network (FCN)-Based Medical Image Segmentation. Vis Comput 34, 1043-1052 (2018). https://doi.org/10.1007/s00371-018-1519-5
5. Cheng, Z., Qu, A. & He, X. Contour-aware semantic segmentation network with spatial attention mechanism for medical image. Vis Comput 38, 749-762 (2022). https://doi.org/10.1007/s00371-021-02075-9
6. Yoel F. Garcia-Marin, David Alonso-Caneiro, Damien Fisher, Stephen J. Vincent, Michael J. Collins, "Patch-based CNN for corneal segmentation of AS-OCT images: Effect of the number of classes and image quality upon performance," Computers in Biology and Medicine, Volume 152, 2023, 106342. https://doi.org/10.1016/j.compbiomed.2022.106342
7. Wenyu Xing, Zhibin Zhu, Dongni Hou, Yaoting Yue, Fei Dai, Yifang Li, Lin Tong, Yuanlin Song, Dean Ta, "CM-SegNet: A deep learning-based automatic segmentation approach for medical images by combining convolution and multilayer perceptron," Computers in Biology and Medicine, Volume 147, 2022, 105797. https://doi.org/10.1016/j.compbiomed.2022.105797
8. Kai Zhang, Shouliang Qi, Jiumei Cai, Dan Zhao, Tao Yu, Yong Yue, Yudong Yao, Wei Qian, "Content-based image retrieval with a Convolutional Siamese Neural Network: Distinguishing lung cancer and tuberculosis in CT images," Computers in Biology and Medicine, Volume 140, 2022, 105096. https://doi.org/10.1016/j.compbiomed.2021.105096
9. Shangwang Liu, Tongbo Cai, Xiufang Tang, Yangyang Zhang, Changgeng Wang, "COVID-19 diagnosis via chest X-ray image classification based on multiscale class residual attention," Computers in Biology and Medicine, Volume 149, 2022, 106065. https://doi.org/10.1016/j.compbiomed.2022.106065
10. Hani Alquhayz, Hafiz Zahid Tufail, Basit Raza, "The multi-level classification network (MCN) with modified residual U-Net for ischemic stroke lesions segmentation from ATLAS," Computers in Biology and Medicine, Volume 151, Part A, 2022, 106332. https://doi.org/10.1016/j.compbiomed.2022.106332
11. Jiang S, Suriawinata AA, Hassanpour S. MHAttnSurv: Multi-head attention for survival prediction using whole-slide pathology images. Comput Biol Med. 2023 May;158:106883. doi: 10.1016/j.compbiomed.2023.106883. PMID: 37031509; PMCID: PMC10148238.
12. Jinxi Xiang, Xiyue Wang, Xinran Wang, Jun Zhang, Sen Yang, Wei Yang, Xiao Han, Yueping Liu, "Automatic diagnosis and grading of Prostate Cancer with weakly supervised learning on whole slide images," Computers in Biology and Medicine, Volume 152, 2023, 106340. https://doi.org/10.1016/j.compbiomed.2022.106340
13. Le Li, Yong Liang, Mingwen Shao, Shanghui Lu, Shuilin Liao, Dong Ouyang, "Self-supervised learning-based Multi-Scale feature Fusion Network for survival analysis from whole slide images," Computers in Biology and Medicine, Volume 153, 2023, 106482. https://doi.org/10.1016/j.compbiomed.2022.106482
14. Imran, A., Li, J., Pei, Y. et al. Fundus image-based cataract classification using a hybrid convolutional and recurrent neural network. Vis Comput 37, 2407-2417 (2021). https://doi.org/10.1007/s00371-020-01994-3
15. Huilin Lai, Ye Luo, Guokai Zhang, Xiaoang Shen, Bo Li, Jianwei Lu. Toward accurate polyp segmentation with cascade boundary-guided attention. Vis Comput 39, 1453-1469 (2023). https://doi.org/10.1007/s00371-022-02422-4
16. Xi, P., Guan, H., Shu, C. et al. An integrated approach for medical abnormality detection using deep patch convolutional neural networks. Vis Comput 36, 1869-1882 (2020). https://doi.org/10.1007/s00371-019-01775-7
17. Milan Sulc, Jiri Matas, "Fine-grained recognition of plants from images," Plant Methods, 2017, pp. 1-14.
18. Chen-Lin Zhang, Jianxin Wu, "Improving CNN layers with power mean non-linearity," Pattern Recognition, vol. 89, 2018, pp. 12-21.
19. Yingqiong Peng, Muxin Liao, "FB-CNN: Feature fusion based bilinear CNN for classification of fruit fly image," IEEE Access, vol. 8, 2020, pp. 3987-3995.
20. Shujun Wang, Lequan Yu, "Patch-based output space adversarial learning for joint Optic Disc and Cup segmentation," IEEE Transactions on Medical Imaging, 38(11), 2018, pp. 2485-2495.
21. Guo-Sen Xie, Xu-Yao Zhang, "LG-CNN: From local parts to global discrimination for fine-grained recognition," Pattern Recognition, 71, 2017, pp. 118-131.
22. C. Raja, L. Balaji, "An automatic detection of blood vessel in retinal images using convolution neural network for diabetic retinopathy detection," Pattern Recognition and Image Analysis, 29, 2019, pp. 533-545.
23. Neeraj Gupta, Hitendra Garg, Rohit Agarwal, "A robust framework for glaucoma detection using CLAHE and EfficientNet," The Visual Computer, 2021.
24. M. Tabassum, Tariq M. Khan, Muhammad Arsalan, et al., "CDED-Net: Joint Segmentation of Optic Disc and Optic Cup for Glaucoma Screening," IEEE Access, vol. 8, pp. 102733-102747, 2020.
25. Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, J. Liang, "UNet++: A nested U-Net architecture for medical image segmentation," Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support (DLMIA) (Lecture Notes in Computer Science), vol. 11045, 2018.
26. N. Ibtehaz, M. S. Rahman, "MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation," Neural Networks, vol. 121, pp. 74-87, 2020.
27. Juneja, M., Singh, S., et al., "Automated detection of Glaucoma using deep learning convolution network (G-net)," Multimed Tools Appl, vol. 79, pp. 15531-15553, 2020.
28. B. J. Bhatkalkar, D. R. Reddy, et al., "Improving the Performance of Convolutional Neural Network for the Segmentation of Optic Disc in Fundus Images Using Attention Gates and Conditional Random Fields," IEEE Access, vol. 8, pp. 29299-29310, 2020.
29. Jiang Y, Duan L, et al., "JointRCNN: A Region-Based Convolutional Neural Network for Optic Disc and Cup Segmentation," IEEE Trans Biomed Eng, vol. 67(2), pp. 335-343, 2020.
30. Sivaswamy, J. et al., "A Comprehensive Retinal Image Dataset for the Assessment of Glaucoma from the Optic Nerve Head Analysis," JSM Biomedical Imaging Data Papers, 2(1): 1004, 2015.
31. Fumero Batista, Francisco José et al., "RIM-ONE DL: A Unified Retinal Image Database for Assessing Glaucoma Using Deep Learning," Image Analysis & Stereology, v. 39, n. 3, pp. 161-167, Nov. 2020. https://doi.org/10.5566/ias.2346
32. Xiaoxin Guo, Jiahui Li, Qifeng Lin, Zhenchuan Tu, Xiaoying Hu, Songtian Che, "Joint optic disc and cup segmentation using feature fusion and attention," Computers in Biology and Medicine, Volume 150, 2022, 106094. https://doi.org/10.1016/j.compbiomed.2022.106094
33. Chi Zhang, Jingben Lu, Qianqian Hua, Chunguo Li, Pengwei Wang, "SAA-Net: U-shaped network with Scale-Axis-Attention for liver tumor segmentation," Biomedical Signal Processing and Control, v. 73, 2022.
34. He, G., Zhang, G., Zhou, L. et al., "Deep convolutional neural network for hippocampus segmentation with boundary region refinement," Medical and Biological Engineering and Computing, vol. 61, pp. 2329-2339, 2023.
35. Richens, J.G., Lee, C.M. & Johri, S. Improving the accuracy of medical diagnosis with causal machine learning. Nat Commun 11, 3923 (2020). https://doi.org/10.1038/s41467-020-17419-7
36. https://ieee-dataport.org/documents/ddrdnetdataset
37. José Ignacio Orlando, Huazhu Fu, João Barbosa Breda, "REFUGE Challenge: A unified framework for evaluating automated methods for glaucoma assessment from fundus photographs," Medical Image Analysis, Volume 59, 2020. https://doi.org/10.1016/j.media.2019.101570
38. Zhang Z, Yin FS, Liu J, Wong WK, et al., "ORIGA(-light): an online retinal fundus image database for glaucoma analysis and research," Annu Int Conf IEEE Eng Med Biol Soc, 2010, pp. 3065-3068. doi: 10.1109/IEMBS.2010.5626137. PMID: 21095735.
39. https://github.com/Rajachandru/Fine-Grained-Segmentation-of-OD-and-OC