Digital Object Identifier 10.1109/ACCESS.2021.3063716
ABSTRACT Computer-aided detection, localisation, and segmentation methods can help improve
colonoscopy procedures. Even though many methods have been built to tackle automatic detection and
segmentation of polyps, benchmarking of state-of-the-art methods still remains an open problem. This
is due to the increasing number of researched computer vision methods that can be applied to polyp
datasets. Benchmarking of novel methods can provide a direction to the development of automated polyp
detection and segmentation tasks. Furthermore, it ensures that the produced results in the community are
reproducible and provide a fair comparison of developed methods. In this paper, we benchmark several
recent state-of-the-art methods using Kvasir-SEG, an open-access dataset of colonoscopy images for polyp
detection, localisation, and segmentation, evaluating both method accuracy and speed. Whilst most methods in the literature have competitive accuracy, we show that the proposed ColonSegNet achieved a better trade-off, with an average precision of 0.8000 and mean IoU of 0.8100, and the fastest speed of 180 frames per second for the detection and localisation task. Likewise, the proposed ColonSegNet achieved a competitive dice coefficient of 0.8206 and the best average speed of 182.38 frames per second for the segmentation task. Our comprehensive comparison with various state-of-the-art methods reveals the importance of benchmarking deep learning methods for automated real-time polyp identification and delineation, which can potentially transform current clinical practices and minimise miss-detection rates.
INDEX TERMS Medical image segmentation, ColonSegNet, colonoscopy, polyps, deep learning, detection,
localisation, benchmarking, Kvasir-SEG.
The associate editor coordinating the review of this manuscript and approving it for publication was Alberto Cano.

I. INTRODUCTION
Colorectal Cancer (CRC) has the third highest mortality rate among all cancers. The overall five-year survival rate of colon cancer is around 68%, and that of stomach cancer is only around 44% [1]. Searching for and removing precancerous anomalies is one of the most effective ways to avoid CRC-related mortality. Among these abnormalities, polyps in the colon are important to detect because they can develop into CRC at a late stage. Thus, early detection of CRC is crucial for survival.

Besides lifestyle modification, the main prevention of CRC is regular screening of the colon. Different research studies suggest that population-wide screening advances the prognosis and can even reduce the incidence of CRC [2]. Colonoscopy is an invasive medical procedure where an endoscopist examines and operates on the colon using a flexible endoscope. It is considered to be the best
diagnostic tool for colon examination for early detection and removal of polyps. Therefore, colonoscopic screening is the most preferred technique among gastroenterologists.

Polyps are abnormal growths of tissue protruding from the mucous membrane. They can occur anywhere in the gastrointestinal (GI) tract but are mostly found in the colorectal area and are often considered a precursor of CRC [3], [4]. Polyps may be pedunculated (having a well-defined stalk) or sessile (without a defined stalk). Colorectal polyps can be categorised into two classes: non-neoplastic and neoplastic. Non-neoplastic polyps are further sub-categorised into hyperplastic, inflammatory, and hamartomatous polyps. These types of polyps are non-cancerous and not harmful. Neoplastic polyps are further sub-categorised into adenomas and serrated polyps. These polyps carry a risk of developing into cancer. Based on their size, colorectal polyps can be categorised into three classes, namely, diminutive (≤5 mm), small (6 to 9 mm), and advanced (large) (≥10 mm) [5]. Usually, larger polyps can be detected and resected.

There exists a significant risk with small and diminutive colorectal polyps [6]. A polypectomy is a technique for the removal of small and diminutive polyps. There are five different polypectomy techniques for resection of diminutive polyps, namely, cold forceps polypectomy, hot forceps polypectomy, cold snare polypectomy, hot snare polypectomy, and endoscopic mucosal resection [5]. Among these techniques, cold snare polypectomy is considered the best polypectomy technique for resecting small colorectal polyps [7].

Colonoscopy is an invasive procedure that requires high-quality bowel preparation as well as air insufflation during examination [8]. It is both an expensive and time-demanding procedure. Nevertheless, on average, 20% of polyps are missed during examinations. The risk of getting cancer therefore relates to the individual endoscopist's ability to detect polyps [9]. Recent studies have shown that new endoscopic devices and diagnostic tools have improved the adenoma detection rate and polyp detection rate [10], [11]. However, the problem of overlooked polyps remains the same.

The colonoscopy videos recorded at clinical centers store a significant amount of colonoscopy data. However, the collected data are not used efficiently, as reviewing them is labour-intensive for endoscopists [12]. Thus, a second review of the videos is often not done. This may largely lead to missed detection at an early stage. Automated data curation and annotation of video data is a prerequisite for building reliable Computer Aided Diagnosis (CADx) systems that can help to assess clinical endoscopy more thoroughly [13]. A fraction of the collected colonoscopy data can be curated to develop computer-aided systems for automated detection and delineation of polyps either during the clinical procedure or after the reporting. At the same time, to build a robust system, it is vital to incorporate data variability related to patients, endoscopic procedures, and endoscope manufacturers. Even though recent developments in computer vision and system designs have enabled us to build accurate and efficient systems, these largely depend on data availability, as most recent methods are data voracious. The lack of availability of public datasets [14] is a critical bottleneck to accelerating algorithm development in this realm.

In general, curating medical datasets is challenging, and it requires domain expertise. Reaching a consensus on ground truth labels from different experts on the same dataset is another obstacle. Typically, in colonoscopy, smaller polyps or flat/sessile polyps that are usually missed during a procedure can be difficult to observe even during manual labeling. Other challenges include patient variability and the presence of different sizes, shapes, textures, colors, and orientations of these polyps [3]. Therefore, during polyp data curation and the development of automated systems for colonoscopy, it is vital that the various challenges that often accompany routine colonoscopy are taken into consideration.

Automatic polyp detection and segmentation systems based on Deep Learning (DL) have a high overall performance in both colonoscopy images and colonoscopy videos [15], [16]. Ideally, automatic CADx systems for polyp detection, localisation, and segmentation should have: 1) consistent performance and improved robustness to patient variability, i.e., the system should be able to produce reliable outputs, 2) high overall performance surpassing the set bar for algorithms, 3) real-time performance required for clinical applicability, and 4) an easy-to-use design that can provide clinically interpretable outputs. Scaling this to a population-sized cohort is also very resource-demanding and incurs enormous costs. As a first step, we therefore target the detection, localisation, and segmentation of colorectal polyps known as precursors of CRC. The reason for starting with this scenario is that most colon cancers arise from benign adenomatous polyps (around 20%) containing dysplastic cells. Detection and removal of polyps prevent the development of cancer, and the risk of getting CRC in the following 60 months after a colonoscopy depends largely on the endoscopist's ability to detect polyps [9].

Detection and localisation of polyps are usually critical during routine surveillance and for measuring the polyp load of the patient at the end of the surveillance, while pixel-wise segmentation becomes vital to automate polyp boundary delineation during surgical procedures or radio-frequency ablations. In this paper, we evaluate DL methods for both detection (with localisation referring to bounding box detection) and segmentation (pixel-wise classification or semantic segmentation), benchmarking SOTA methods on the Kvasir-SEG dataset [17] to provide a comprehensive benchmark for colonoscopy images. The main aim of the paper is to establish a new strong benchmark with existing successful computer vision approaches. Our contributions can be summarised as follows:
• We propose ColonSegNet, an encoder-decoder architecture for segmentation of colonoscopic images. The architecture is very efficient in terms of processing speed
(i.e., it produces segmentation of colonoscopic polyps in real-time) and competitive in terms of performance.
• A comprehensive comparison of state-of-the-art computer vision baseline methods on the Kvasir-SEG dataset is presented. The best approaches show real-time performance for polyp detection, localisation, and segmentation.
• We have established a strong benchmark for detection and localisation on the Kvasir-SEG dataset. Additionally, we have extended the segmentation baseline as compared to [3], [17], [18]. These benchmarks can be useful to develop reliable and clinically applicable methods.
• Detection, localisation, and semantic segmentation performances are evaluated on standard computer vision metrics.
• A detailed analysis is presented with a specific focus on the best and worst performing cases, which allows dissecting method success and failure modes required to accelerate algorithm development.

The rest of the paper is organized as follows: In Section II, we present related work in the field. In Section III, we present the materials. Section IV presents the detection, localisation, and segmentation methods. Results are presented in Section V. A discussion of the best performing detection, localisation, and semantic segmentation approaches is presented in Section VI, and finally a conclusion is provided in Section VII.

II. RELATED WORK
Automated polyp detection has been an active research topic for the last two decades, and considerable work has been done to develop efficient methods and algorithms. Earlier works were especially focused on polyp color and texture, using handcrafted descriptor-based feature learning [27], [28]. More recently, methods based on Convolutional Neural Networks (CNNs) have received significant attention [29], [30], and have been the go-to approach for those competing in public challenges [31], [32].

Wang et al. [33] designed algorithms and developed software modules for fast polyp edge detection and polyp shot detection, including a polyp alert software system. Shin et al. [34] used a region-based CNN for automatic polyp detection in colonoscopy videos and images. They used Inception ResNet as a transfer learning approach and post-processing techniques for reliable polyp detection in colonoscopy. Later on, Shin et al. [14] used a generative adversarial network [35], where they showed that the generated polyp images are not qualitatively realistic; however, they can help to improve the detection performance. Lee et al. [15] used YOLO-v2 [36], [37] for the development of a polyp detection and localisation algorithm. The algorithm produced high sensitivity and near real-time performance. Yamada et al. [38] developed an artificial intelligence system that can automatically detect the sign of CRC during colonoscopy with high sensitivity and specificity. They claimed that their system could aid endoscopists in real-time detection to avoid abnormalities and enable early disease detection.

In addition to the work related to automatic detection and localisation, pixel-wise classification (segmentation) of the disease provides an exact polyp boundary and hence is also of high significance for clinical surveillance and procedures. Bernal et al. [31] presented the results of the automatic polyp detection subchallenge, which was part of the endoscopic vision challenge at the Medical Image Computing and Computer Assisted Intervention (MICCAI) 2015 conference. This work compared the performance of eight teams and provided an analysis of various detection methods applied on the provided polyp challenge data. Wang et al. [16] proposed a DL-based SegNet [39] that had a real-time performance with an inference of more than 25 frames per second. Guo and Matuszewski [40] used fully convolutional dilation networks on the Gastrointestinal Image ANAlysis (GIANA) polyp segmentation dataset. Jha et al. [3] proposed ResUNet++, demonstrating a 10% improvement compared to the widely used UNet baseline on the Kvasir-SEG dataset. They further applied the trained model on the CVC-ClinicDB [23] dataset, showing more than 15% improvement over UNet. Ali et al. [32] did a comprehensive evaluation of both detection and segmentation approaches for the artifacts present in clinical endoscopy, including colonoscopy data [41]. Wang et al. [42] proposed a boundary-aware neural network (BA-Net) for medical image segmentation. BA-Net is an encoder-decoder network that is capable of capturing the high-level context and preserving the spatial information. Later on, Jha et al. [43] proposed DoubleUNet for segmentation, which was applied to four biomedical imaging datasets. The proposed DoubleUNet is a combination of two UNets stacked on top of each other with some additional blocks. Experimental results on the CVC-ClinicDB and ETIS-Larib polyp datasets show state-of-the-art (SOTA) performance. In addition to the related work on polyp segmentation, there are studies on segmentation approaches in general [44]–[47].

Datasets have been instrumental for medical research. Table 1 shows the list of the available endoscopic image and video datasets. Kvasir-SEG, ETIS-Larib, and CVC-ClinicDB contain colonoscopy images, whereas Kvasir, Nerthus, and HyperKvasir contain images from the whole GI tract. Kvasir-Capsule contains images from video capsule endoscopy. All the datasets contain images acquired with the conventional White Light (WL) imaging technique except the EDD dataset, which contains images from both WL imaging and Narrow Band Imaging (NBI) techniques. All of these datasets contain at least a polyp class. Out of the nine available datasets, Kvasir-SEG [17], ETIS-Larib [22], and CVC-ClinicDB [23] have manually labeled ground truth masks. Among them, Kvasir-SEG offers the largest number of annotated samples, providing both ground truth masks and bounding boxes, and thereby supports detection, localisation, and segmentation tasks. All of the datasets are publicly available.
IV. METHOD
Detection methods aim to predict the object class and regress bounding boxes for localisation, while segmentation methods aim to classify the object class for each pixel in an image. In Figure 1, ground truth masks for the segmentation task are shown in the 2nd column, while the corresponding bounding boxes for the detection task are in the 3rd column. This section describes the baseline methods for detection, localisation, and segmentation used for the automated detection and segmentation of polyps in the Kvasir-SEG dataset.
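Since Kvasir-SEG provides pixel-precise ground truth masks, the bounding boxes used for the detection and localisation task can be derived directly from the masks. A minimal sketch of such a conversion in Python/NumPy is given below (our illustration only; the dataset also ships ready-made bounding boxes, and frames with multiple polyps would additionally need connected-component labelling):

    import numpy as np

    def mask_to_bbox(mask: np.ndarray):
        """Tight (x_min, y_min, x_max, y_max) box around a binary polyp mask."""
        ys, xs = np.where(mask > 0)      # coordinates of foreground pixels
        if xs.size == 0:                 # no polyp present in this frame
            return None
        return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())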
The decoder starts with a transpose convolution, where the first decoder uses a stride value of 4, which increases the feature map spatial dimensions by 4. Similarly, the second decoder uses a stride value of 2, increasing the spatial dimensions by 2. Then, the network follows a simple concatenation and a residual block. Next, it is concatenated with the second skip connection and again followed by a residual block. The output of the last decoder block passes through a 1 × 1 convolution and a sigmoid activation function, generating the binary segmentation mask.
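The ColonSegNet implementation used in our experiments is in PyTorch; a minimal sketch of one such decoder block, written from the description above, is given below. The channel widths and the residual-block internals are assumptions made for illustration, not the released code:

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        # Assumed internals: two 3x3 convolutions with BN/ReLU and a 1x1 shortcut.
        def __init__(self, in_c, out_c):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(in_c, out_c, 3, padding=1), nn.BatchNorm2d(out_c),
                nn.ReLU(inplace=True),
                nn.Conv2d(out_c, out_c, 3, padding=1), nn.BatchNorm2d(out_c))
            self.skip = nn.Conv2d(in_c, out_c, 1)  # match channels for the shortcut
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.relu(self.body(x) + self.skip(x))

    class DecoderBlock(nn.Module):
        """Upsample by `stride` (4 in the first decoder, 2 in the second),
        concatenate a skip connection, then refine with a residual block."""
        def __init__(self, in_c, skip_c, out_c, stride):
            super().__init__()
            self.up = nn.ConvTranspose2d(in_c, out_c, kernel_size=stride, stride=stride)
            self.res = ResidualBlock(out_c + skip_c, out_c)

        def forward(self, x, skip):
            x = self.up(x)                   # transpose convolution
            x = torch.cat([x, skip], dim=1)  # simple concatenation
            return self.res(x)

    # Final head as described: 1x1 convolution followed by a sigmoid.
    head = nn.Sequential(nn.Conv2d(32, 1, kernel_size=1), nn.Sigmoid())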
1) DATA AUGMENTATION
Supervised learning methods are data voracious and require a large amount of data to obtain reliable and well-performing models. Acquiring such training data through data collection, curation, and annotation is a manual process that needs significant resources and man-hours from both clinical experts and computational scientists.

Data augmentation is a common technique to computationally increase the number of training samples in a dataset. For our DL models, we use basic augmentation techniques such as horizontal flipping, vertical flipping, random rotation, random scaling, and random cropping. The images used in all the experiments undergo normalization and are resized to a fixed size of 512 × 512. For the normalization, we subtract the mean from the image and divide it by the standard deviation.
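For illustration, such a pipeline could be assembled with the albumentations library, which applies identical spatial transforms to an image and its mask; the library choice, probabilities, rotation limit, and scale range below are assumptions, not the paper's exact configuration:

    import albumentations as A
    import numpy as np

    train_transform = A.Compose([
        A.HorizontalFlip(p=0.5),
        A.VerticalFlip(p=0.5),
        A.Rotate(limit=90, p=0.5),              # random rotation
        A.RandomScale(scale_limit=0.2, p=0.5),  # random scaling
        A.PadIfNeeded(min_height=512, min_width=512),
        A.RandomCrop(height=512, width=512),    # random crop to the fixed size
        A.Normalize(),                          # (image - mean) / std
    ])

    # Dummy inputs; the same spatial ops are applied to the image and the mask.
    image = np.random.randint(0, 256, (576, 720, 3), dtype=np.uint8)
    mask = np.zeros((576, 720), dtype=np.uint8)
    out = train_transform(image=image, mask=mask)
    image_aug, mask_aug = out["image"], out["mask"]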
V. RESULTS
In this section, we first present our evaluation metrics and experimental setup. Then, we present both quantitative and qualitative results.

A. EVALUATION METRICS
We have used standard computer vision metrics to evaluate the polyp detection and localisation methods and the semantic segmentation methods on the Kvasir-SEG dataset.

1) DETECTION AND LOCALISATION TASK
For the object detection and localisation task, the commonly used Average Precision (AP) and IoU have been used [68], [69].
• IoU: This metric measures the overlap between two bounding boxes A and B as the ratio of the overlapped area to the area of their union:

    IoU(A, B) = |A ∩ B| / |A ∪ B|    (1)

• AP: AP is computed as the Area Under the Curve (AUC) of the precision-recall curve of the detections, sampled at all unique recall values (r1, r2, ...) whenever the maximum precision value drops:

    AP = Σ_n (r_{n+1} − r_n) · p_interp(r_{n+1}),    (2)

with p_interp(r_{n+1}) = max_{r̃ ≥ r_{n+1}} p(r̃). Here, p(r_n) denotes the precision value at a given recall value. This definition ensures monotonically decreasing precision. AP was computed as an average of the APs for IoU thresholds from 0.25 to 0.75 with a step size of 0.05, which means an average over 11 IoU levels is used (AP @[.25 : .05 : .75]).
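For illustration, Eq. (1) for axis-aligned boxes and the interpolated AP of Eq. (2) can be written compactly in NumPy as sketched below (our illustration; the reported numbers follow the standard PASCAL VOC/COCO definitions [68], [69]):

    import numpy as np

    def box_iou(a, b):
        # Eq. (1) for boxes given as (x1, y1, x2, y2).
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union

    def average_precision(recall, precision):
        # Eq. (2): make precision monotonically decreasing, then sum the
        # areas of the resulting step function over the recall axis.
        r = np.concatenate(([0.0], recall, [1.0]))
        p = np.concatenate(([0.0], precision, [0.0]))
        for i in range(p.size - 2, -1, -1):   # p_interp(r) = max_{r~ >= r} p(r~)
            p[i] = max(p[i], p[i + 1])
        idx = np.where(r[1:] != r[:-1])[0]    # unique recall points
        return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))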
2) SEGMENTATION TASK
For the polyp segmentation task, we have used widely accepted computer vision metrics that include the Dice Coefficient (DSC), Jaccard Coefficient (JC), precision (p), recall (r), and overall accuracy (Acc). JC is also termed IoU. We have also included Frames Per Second (FPS) to evaluate the clinical applicability of the segmentation methods in terms of inference time during the test.
To define each metric, let tp, fp, tn, and fn represent true positives, false positives, true negatives, and false negatives, respectively:

    DSC = (2 · tp) / (2 · tp + fp + fn)    (3)
    IoU = tp / (tp + fp + fn)    (4)
    r = tp / (tp + fn)    (5)
    p = tp / (tp + fp)    (6)
    F2 = (5 · p · r) / (4 · p + r)    (7)
    Acc = (tp + tn) / (tp + tn + fp + fn)    (8)
    FPS = #frames / sec = 1 / (sec / frame)    (9)
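Eqs. (3)–(8) can be computed directly from a pair of binary masks; the NumPy sketch below (our illustration, not the evaluation script behind the reported numbers) mirrors the definitions, while FPS in Eq. (9) is obtained by timing inference:

    import numpy as np

    def segmentation_metrics(pred, gt, eps=1e-8):
        """Eqs. (3)-(8) from binary {0, 1} prediction and ground-truth masks."""
        pred, gt = pred.astype(bool), gt.astype(bool)
        tp = np.sum(pred & gt)
        fp = np.sum(pred & ~gt)
        fn = np.sum(~pred & gt)
        tn = np.sum(~pred & ~gt)
        p = tp / (tp + fp + eps)   # precision, Eq. (6)
        r = tp / (tp + fn + eps)   # recall, Eq. (5)
        return {
            "DSC": 2 * tp / (2 * tp + fp + fn + eps),   # Eq. (3)
            "IoU": tp / (tp + fp + fn + eps),           # Eq. (4)
            "precision": p,
            "recall": r,
            "F2": 5 * p * r / (4 * p + r + eps),        # Eq. (7)
            "Acc": (tp + tn) / (tp + tn + fp + fn),     # Eq. (8)
        }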
B. EXPERIMENTAL SETUP AND CONFIGURATION
The UNet, ResUNet, ResUNet++, DoubleUNet, and HRNet methods were implemented using Keras [70] with a TensorFlow [71] back-end and were run on a Volta 100 GPU and an NVIDIA DGX-2 AI system. A PyTorch implementation was used for the FCN8, PSPNet, DeepLabv3+, UNet-ResNet34, and ColonSegNet networks. Training of these methods was conducted on an NVIDIA Quadro RTX 6000. An NVIDIA GTX 2080Ti was used for test inference for all methods reported in the paper. All of the detection methods were implemented using PyTorch and trained on NVIDIA Quadro RTX 6000 hardware.

In all of the cases, we used 880 images for training and the remaining 120 images for validation. Due to the different image sizes in the dataset, we resized the images to 512 × 512. Hyperparameters are important for DL algorithms to find the optimal solution. However, picking the optimal hyperparameters is difficult. There are algorithms such as grid search, random search, and advanced solutions such as Bayesian optimization for finding the optimal parameters. However, an algorithm such as Bayesian optimization is computationally costly, making it difficult to use when testing several DL algorithms.
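As an illustration, the simplest of these strategies, grid search, exhaustively evaluates every combination of a small set of candidate values. In the sketch below, build_model and train_and_validate are hypothetical placeholders, and the candidate values are illustrative rather than the search space behind Tables 2 and 4:

    from itertools import product

    LEARNING_RATES = [1e-3, 1e-4, 1e-5]   # illustrative candidates
    BATCH_SIZES = [8, 16]

    best_dsc, best_cfg = 0.0, None
    for lr, batch_size in product(LEARNING_RATES, BATCH_SIZES):
        model = build_model()                                # hypothetical helper
        val_dsc = train_and_validate(model, lr, batch_size)  # hypothetical helper
        if val_dsc > best_dsc:
            best_dsc, best_cfg = val_dsc, (lr, batch_size)
    print("best configuration:", best_cfg, "validation DSC:", best_dsc)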
FIGURE 4. Detection and localisation results on the test dataset: On the right of the black solid line are images where EfficientDet-D0, YOLOv4, Faster R-CNN and RetinaNet (with ResNet50 backbone) have similar results and in most cases obtained the highest IoU. On the left are images with a failure case (worse localisation) for at least one of the methods. Confidence scores are provided on the top-left of the red prediction boxes.
TABLE 2. Hyperparameters used for the baseline methods for the polyp detection and localisation task on Kvasir-SEG. Here, CIoU: complete intersection-over-union loss, MSE: mean square error, CE: cross-entropy.
TABLE 3. Results on the polyp detection and localisation task on the Kvasir-SEG dataset. The two best scores are highlighted in bold.
We have done an extensive hyperparameter search to find the optimal hyperparameters for the polyp detection, localisation, and segmentation tasks. These sets of hyperparameters were chosen based on empirical evaluation. The hyperparameters used for the Kvasir-SEG dataset are reported in Table 2 and Table 4.

C. QUANTITATIVE EVALUATION
1) DETECTION AND LOCALISATION
Table 3 shows the detailed results for the polyp detection and localisation task on the Kvasir-SEG dataset. It can be observed that RetinaNet shows improvement over YOLOv3 and YOLOv4 for mean average precision computed over multiple IoU thresholds and for average precision at IoU thresholds of 0.25 (AP25) and 0.50 (AP50). RetinaNet with the ResNet101 backbone achieved an average precision of 0.8745, while YOLOv4 yielded 0.8513. However, for the IoU threshold of 0.75, YOLOv4 showed improvement over RetinaNet, with an AP75 of 0.7594 against 0.7132 for RetinaNet with the ResNet101 backbone. Similarly, an average IoU of 0.8248 was observed for YOLOv3, which is nearly an 8% improvement over RetinaNet. IoU determines the preciseness of the bounding box localisation. EfficientDet-D0 obtained the lowest AP of 0.4756 and IoU of 0.4322. Faster R-CNN obtained an AP of 0.7866.
TABLE 4. Hyperparameters used for the baseline methods for the polyp segmentation task on the Kvasir-SEG dataset.
TABLE 5. Baseline methods for polyp segmentation on the Kvasir-SEG dataset. The two best scores are highlighted in bold. ‘‘-’’ indicates that no backbone is used in the network.
However, Faster R-CNN only obtained an FPS of 8. YOLOv4 with Darknet53 as the backbone obtained an FPS of 48, which is 6× faster than Faster R-CNN. The other competitive network was YOLOv3, with an average FPS of 45.01. However, its average precision value is 5% less than that of YOLOv4. Thus, the quantitative results show that YOLOv4 with Darknet53 can detect different types of polyps at a real-time speed of 48 FPS and an average precision of 0.8513. Therefore, from the evaluation metrics comparison, YOLOv4 with Darknet53 is the best model for detection and localisation of polyps. The results suggest that the model can help gastroenterologists find missed polyps and decrease the polyp miss-rate. Even though the proposed ColonSegNet is primarily built for real-time segmentation of polyps, we compared the bounding box predictions of the proposed network with the SOTA detection methods. It can be observed that the inference of the proposed method is nearly four times faster (180 FPS) than YOLOv4. Additionally, it also obtains competitive scores on both AP and IoU metrics (IoU of 0.81 and AP of 0.80). Therefore, it can also be considered one of the best detection and localisation techniques.

2) SEGMENTATION
Table 5 shows the obtained results on the polyp segmentation task. It can be observed that the UNet with ResNet34 backbone performs better than the other SOTA segmentation methods in terms of DSC and IoU. However, the proposed ColonSegNet outperforms them in terms of processing speed. ColonSegNet is more than four times faster than UNet-ResNet34 in processing colonoscopy frames. The complexity of the network is six times smaller than that of the UNet-ResNet34 network. The proposed network is even smaller than the conventional UNet, with its size being only around 0.75 times that of UNet, while achieving higher scores on the evaluation metrics compared to the classical UNet and its derivatives such as ResUNet and ResUNet++. Additionally, the recall and overall accuracy metrics of ColonSegNet are close to those of the highest performing UNet-ResNet34 network, which shows the proposed method's efficiency.

The original implementation of UNet obtained the lowest DSC of 0.5969, whereas the UNet with ResNet34 as the backbone model obtained the highest DSC of 0.8757. The second and third best DSC scores of 0.8643 and 0.8572 were obtained for DeepLabv3+ with ResNet101 and DeepLabv3+ with ResNet50 as the backbone, respectively. From the table, it is seen that DeepLabv3+ with ResNet101 performs better than DeepLabv3+ with ResNet50. This may be because the top-5 accuracy (i.e., the validation results on the ImageNet model) of ResNet101 is slightly better than that of ResNet50.¹

¹ https://keras.io/api/applications/
FIGURE 5. Best and worst performing samples for polyp segmentation: a) top-scored (left) and bottom-scored (right) sets, b) predicted masks for the top-scored images and c) for the bottom-scored images for all methods, compared to the ground truth (GT) masks. Green rectangles represent the images selected from the top-scored set and red rectangles those from the bottom-scored set. Here, UNet-RN34: UNet-ResNet34, RUNet++: ResUNet++, D-UNet: DoubleUNet, DLabv3+: DeepLabv3+ (ResNet50).
Despite DeepLabv3+ with the ResNet101 backbone having more than 11 times the total number of trainable parameters, and DeepLabv3+ with ResNet34 having nearly eight times the computational complexity, the DSC of ColonSegNet is competitive with both of these networks. However, in terms of processing speed, it is almost 11 times faster than DeepLabv3+ with ResNet101 and nearly seven times faster than DeepLabv3 with the ResNet34 backbone.

FCN8, HRNet and DoubleUNet provided similar results with DSCs of 0.8310, 0.8446, and 0.8129, while ResUNet++ achieved a DSC of only 0.7143. A similar trend can be observed in the F2-score for all methods. For precision, UNet with the ResNet34 backbone achieved the maximum score of p = 0.9435, and DeepLabv3+ with the ResNet50 backbone achieved the highest score of r = 0.8616, while UNet scored the worst with p = 0.6722 and r = 0.6171. The overall accuracy was outstanding for most methods, with the highest for UNet with ResNet34 as the backbone. IoU is also provided in the table for each segmentation method for scientific completeness. Again, UNet with ResNet34 surpassed the others with an mIoU score of 0.8100. Also, UNet with ResNet34 achieved an FPS rate of 35, which is acceptable in terms of speed and relatively fast as compared to DeepLabv3+ with ResNet50 (27.90 FPS), DeepLabv3+ with ResNet101 (16.75 FPS), and other SOTA methods. Additionally, when we consider the number of parameters used (see Table 4), UNet with the ResNet34 backbone uses fewer parameters than the FCN8 or DeepLabv3+ networks.

Due to its low number of trainable parameters and fastest inference time, ColonSegNet is computationally efficient and becomes the best choice when considering the need for real-time segmentation of polyps (182.38 FPS on an NVIDIA GTX 2080Ti), with deployment possible even on low-end hardware devices, making it feasible for many clinical settings. In contrast, UNet with the ResNet34 backbone seems the best choice when taking the DSC metric into account, however with a speed of only 35 FPS on an NVIDIA GTX 2080Ti.

D. QUALITATIVE EVALUATION
Figure 4 shows the qualitative results for the polyp detection and localisation task along with the corresponding confidence scores. It can be observed that for most images on the left side of the vertical line, both YOLOv4 and RetinaNet are able to detect and localise polyps with higher confidence, except for the third-column sample where most of these methods can identify only some polyp areas. Similarly, on the right side of the vertical line, the detected bounding boxes for the 5th and 6th column images are too wide for RetinaNet, while YOLOv4 has the best localisation of the polyp (observe the bounding box). Also, in the seventh column, RetinaNet and EfficientDet-D0 miss the polyp. In the eighth column, YOLOv4 and EfficientDet-D0 miss the small polyp completely, while both stool and polyp are detected as polyp by Faster R-CNN and RetinaNet.

Figure 5 shows the results for the top-scored and bottom-scored sets, selected based on their dice similarity coefficient values for the semantic segmentation methods. It can be seen that all the algorithms are able to detect large polyps and produce high-quality masks (see Figure 5(b)). Here, the best segmentation results can be observed for DeepLabv3+ and UNet-ResNet34. However, as shown in Figure 5(c), the segmentation results are affected for flat polyps (very small), images with a certain degree of inclined view, and images with saturated areas. The proposed ColonSegNet is able to achieve shapes similar to those of the ground truth with some outliers in the predictions, as can be seen in Figure 5(b), while for the predictions on the worst performing images in Figure 5(c), our proposed network provides comparatively improved predictions on almost all samples.

VI. DISCUSSION
It is evident that there is a growing interest in the investigation of computational support systems for decision making through endoscopic images. For the first time, we are using Kvasir-SEG for detection and localisation tasks, and comparing segmentation methods with the most recent SOTA methods. We provide a reproducible benchmarking of the DL methods using standard computer vision metrics in object detection and localisation, and semantic segmentation. The choice of methods is based on their popularity in the medical image domain for detection and segmentation (e.g., UNet, Faster R-CNN), speed (e.g., UNet with ResNet34, YOLOv3), accuracy (e.g., PSPNet, FCN8, or DoubleUNet), or a combination of all (e.g., DeepLabv3+, YOLOv4).

From the experimental results in Table 3, we can observe that the combination of YOLOv3 with the Darknet53 backbone shows improvement over other methods in terms of mIoU, which means a better localisation compared to its counterpart RetinaNet. However, YOLOv4 is 3× faster than RetinaNet and has a good trade-off between average precision and IoU. This is because of its Cross-Stage-Partial connections (CSP) and CIoU loss for bounding box regression. However, RetinaNet with the ResNet101 backbone shows competitive results, surpassing other methods on average precision, but with nearly 5% less IoU compared to YOLOv4 and nearly 5% less than YOLOv3-spp. Similarly, the state-of-the-art methods Faster R-CNN and EfficientDet-D0 provided the lowest AP and IoU.

A choice between computational speed, accuracy and precision is vital in object detection and localisation tasks, especially for colonoscopy video data, where speed is a vital element to achieve real-time performance. Therefore, we consider YOLOv4 with the Darknet53 and CSP backbone as the best approach in the table for the polyp detection and localisation task.

For the semantic segmentation tasks, ColonSegNet showed improvement over all the methods in terms of speed. The method obtained the highest FPS of 182.38. The qualitative results in Figure 5(b) showed the most accurate delineation of polyp pixels compared to the other SOTA methods considered in this paper. The most competitive method to ColonSegNet was UNet with the ResNet34 backbone. The other comparable method was DeepLabv3+, whose accuracy can be attributed to its ability to navigate the semantically meaningful regions with its atrous convolution and spatial-pyramid pooling mechanism. Additionally, the feature concatenation from previous feature maps may have helped to compute more accurate maps for object semantic representation and hence segmentation.
The other competitor was PSPNet, which is based on a similar idea but aggregates the global context information from different regions rather than using dilated convolutions. The higher computational speed of DeepLabv3+ with the same ResNet50 backbone as used in PSPNet in our experiments comes from the fact that 1D separable convolutions and an SPP network are used in DeepLabv3+. We also evaluated the most recent popular SOTA method in segmentation, ‘‘HRNet’’ [65]. While HRNet produced competitive results compared to other SOTA methods, UNet with the ResNet34 backbone and DeepLabv3+ outperformed it on most evaluation metrics, with ColonSegNet being competitive in recall and overall accuracy and outperforming the other SOTA methods significantly.

Figure 5 shows an example of the 16 top-scored and 16 bottom-scored images on DSC for segmentation. From the results in Figure 5(c), it can be observed that there are polyps whose appearance under the given lighting conditions is very similar to the healthy surrounding gastrointestinal tissue. We suggest that including more samples with variable texture, different lighting conditions, and different angular views (refer to the samples in Figure 5(a) on the right, and (c)) can help to improve the DSC and other segmentation metrics. We also observed that the presence of sessile or flat polyps was a major limiting factor for algorithm robustness. Thus, including smaller polyps with respect to image size can help the algorithms generalise better, thereby making these methods more usable for early detection of hard-to-find polyps. In this regard, we also suggest the use of spatial pyramid layers to handle small polyps and the use of context-aware methods, such as the incorporation of artifact or shape information, to improve the robustness of these methods.

A possible limitation of the study is its retrospective design. Clinical studies are required for the validation of the approach in a real-world setting [72]. Additionally, in the presented study design we have resized the images, which can lead to loss of information and affect the algorithm performance. Moreover, we have optimized all the algorithms based on empirical evaluation. Even though optimal hyper-parameters have been set after experiments, we acknowledge that these can be further adjusted. Similarly, meta-learning approaches can be exploited to optimize the hyper-parameters so that they work even in resource-constrained settings.

VII. CONCLUSION
In this paper, we benchmarked deep learning methods on the Kvasir-SEG dataset. We conducted thorough and extensive experiments for the polyp detection, localisation, and segmentation tasks and have shown how different algorithms perform on variable polyp sizes and image resolutions. The proposed ColonSegNet detected and localised polyps at 180 frames per second. Similarly, ColonSegNet segmented polyps at a speed of 182.38 frames per second. The automatic polyp detection, localisation, and segmentation algorithms showed good performance, as evidenced by high average precision, IoU, and FPS for the detection algorithms and DSC, IoU, precision, recall, F2-score, and FPS for the segmentation algorithms. While the algorithms investigated in this paper show a clear strength to be used in clinical settings to help gastroenterologists with the polyp detection, localisation, and segmentation tasks, computational scientists can build upon these methods to further improve them in terms of accuracy, speed and robustness.

Additionally, the qualitative results provide insight into failure cases. This gives an opportunity to address the challenges present in the Kvasir-SEG dataset. Moreover, we have provided experimental results using well-established performance metrics along with the dataset for a fair comparison of the approaches. We believe that further data augmentation, fine-tuning, and more advanced methods can improve the results. Additionally, incorporating artifact issues [73] (e.g., saturation, specularity, bubbles, and contrast) can help improve the performance of polyp detection, localisation, and segmentation. In the future, research should be more focused on designing even better algorithms for the detection, localisation, and segmentation tasks, and models should be built taking the number of parameters into consideration, as required by most clinical systems.

ACKNOWLEDGMENT
Debesh Jha is funded by the Research Council of Norway, project number 263248 (Privaton). The computations in this paper were performed on equipment provided by the Experimental Infrastructure for Exploration of Exascale Computing (eX3), which is financially supported by the Research Council of Norway under contract 270053. Parts of the computational resources were also used from the research supported by the National Institute for Health Research (NIHR) Oxford BRC with additional support from the Wellcome Trust Core Award Grant Number 203141/Z/16/Z. Sharib Ali is supported by the NIHR Oxford Biomedical Research Centre. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health. (Debesh Jha and Sharib Ali contributed equally to this work.)

REFERENCES
[1] J. Asplund, J. H. Kauppila, F. Mattsson, and J. Lagergren, ‘‘Survival trends in gastric adenocarcinoma: A population-based study in Sweden,’’ Ann. Surgical Oncol., vol. 25, no. 9, pp. 2693–2702, Sep. 2018.
[2] Ø. Holme, M. Bretthauer, A. Fretheim, J. Odgaard-Jensen, and G. Hoff, ‘‘Flexible sigmoidoscopy versus faecal occult blood testing for colorectal cancer screening in asymptomatic individuals,’’ Cochrane Database Systematic Rev., vol. 9. Munich, Germany: Zuckschwerdt, Oct. 2013.
[3] D. Jha, P. H. Smedsrud, M. A. Riegler, D. Johansen, T. D. Lange, P. Halvorsen, and H. D. Johansen, ‘‘ResUNet++: An advanced architecture for medical image segmentation,’’ in Proc. IEEE Int. Symp. Multimedia (ISM), Dec. 2019, pp. 225–2255.
[4] R. G. Holzheimer and J. A. Mannick, Surgical Treatment: Evidence-Based Problem-Oriented. 2001.
[5] J. Lee, ‘‘Resection of diminutive and small colorectal polyps: What is the optimal technique?’’ Clin. Endoscopy, vol. 49, no. 4, p. 355, 2016.
[6] P. L. Ponugoti, O. W. Cummings, and D. K. Rex, ‘‘Risk of cancer in small and diminutive colorectal polyps,’’ Digestive Liver Disease, vol. 49, no. 1, pp. 34–37, Jan. 2017.
[7] C. V. Tranquillini, W. M. Bernardo, V. O. Brunaldi, E. T. D. Moura, S. B. Marques, and E. G. H. D. Moura, ‘‘Best polypectomy technique for small and diminutive colorectal polyps: A systematic review and meta-analysis,’’ Arquivos de Gastroenterologia, vol. 55, no. 4, pp. 358–368, Dec. 2018.
[8] O. Kronborg and J. Regula, ‘‘Population screening for colorectal cancer: Advantages and drawbacks,’’ Digestive Diseases, vol. 25, no. 3, pp. 270–273, 2007.
[9] M. F. Kaminski, J. Regula, U. Wojciechowska, E. Kraszewska, M. Polkowski, J. Didkowska, M. Zwierko, M. Rupinski, M. P. Nowacki, and E. Butruk, ‘‘Quality indicators for colonoscopy and the risk of interval cancer,’’ New England J. Med., vol. 362, no. 19, pp. 1795–1803, May 2010.
[10] D. Castaneda, V. B. Popov, E. Verheyen, P. Wander, and S. A. Gross, ‘‘New technologies improve adenoma detection rate, adenoma miss rate, and polyp detection rate: A systematic review and meta-analysis,’’ Gastrointestinal Endoscopy, vol. 88, no. 2, pp. 209–222, 2018.
[11] M. Matyja, A. Pasternak, M. Szura, M. Wysocki, M. Pędziwiatr, and K. Rembiasz, ‘‘How to improve the adenoma detection rate in colorectal cancer screening? Clinical factors and technological advancements,’’ Arch. Med. Sci., AMS, vol. 15, no. 2, p. 424, 2019.
[12] M. Riegler, ‘‘Eir—A medical multimedia system for efficient computer aided diagnosis,’’ Ph.D. dissertation, Dept. Inform., Univ. Oslo, Oslo, Norway, 2017.
[13] T. D. Lange, P. Halvorsen, and M. Riegler, ‘‘Methodology to develop machine learning algorithms to improve performance in gastrointestinal endoscopy,’’ World J. Gastroenterol., vol. 24, no. 45, p. 5057, 2018.
[14] Y. Shin, H. A. Qadir, and I. Balasingham, ‘‘Abnormal colon polyp image synthesis using conditional adversarial networks for improved detection performance,’’ IEEE Access, vol. 6, pp. 56007–56017, 2018.
[15] J. Y. Lee, J. Jeong, E. M. Song, C. Ha, H. J. Lee, J. E. Koo, D.-H. Yang, N. Kim, and J.-S. Byeon, ‘‘Real-time detection of colon polyps during colonoscopy using deep learning: Systematic validation with four independent datasets,’’ Sci. Rep., vol. 10, no. 1, pp. 1–9, Dec. 2020.
[16] P. Wang, X. Xiao, J. R. Glissen Brown, T. M. Berzin, M. Tu, F. Xiong, X. Hu, P. Liu, Y. Song, D. Zhang, X. Yang, L. Li, J. He, X. Yi, J. Liu, and X. Liu, ‘‘Development and validation of a deep-learning algorithm for the detection of polyps during colonoscopy,’’ Nature Biomed. Eng., vol. 2, no. 10, pp. 741–748, Oct. 2018.
[17] D. Jha, P. H. Smedsrud, M. A. Riegler, P. Halvorsen, T. D. Lange, D. Johansen, and H. D. Johansen, ‘‘Kvasir-SEG: A segmented polyp dataset,’’ in Proc. Int. Conf. Multimedia Modeling (MMM), 2020, pp. 451–462.
[18] D. Jha, P. H. Smedsrud, D. Johansen, T. D. Lange, H. Johansen, P. Halvorsen, and M. Riegler, ‘‘A comprehensive study on colorectal polyp segmentation with ResUNet++, conditional random field and test-time augmentation,’’ IEEE J. Biomed. Health Inform., early access, Jan. 5, 2021, doi: 10.1109/JBHI.2021.3049304.
[19] K. Pogorelov, K. R. Randel, C. Griwodz, S. L. Eskeland, T. D. Lange, D. Johansen, C. Spampinato, D. T. Dang-Nguyen, M. Lux, P. T. Schmidt, and M. Riegler, ‘‘Kvasir: A multi-class image dataset for computer aided gastrointestinal disease detection,’’ in Proc. 8th ACM Multimedia Syst. Conf., 2017, pp. 164–169.
[20] K. Pogorelov, K. R. Randel, T. D. Lange, S. L. Eskeland, C. Griwodz, D. Johansen, C. Spampinato, M. Taschwer, M. Lux, P. T. Schmidt, and M. Riegler, ‘‘Nerthus: A bowel preparation quality video dataset,’’ in Proc. ACM Multimedia Syst. Conf. (MMSys), 2017, pp. 170–174.
[21] H. Borgli, V. Thambawita, P. H. Smedsrud, S. Hicks, D. Jha, S. L. Eskeland, K. R. Randel, K. Pogorelov, M. Lux, D. T. D. Nguyen, D. Johansen, C. Griwodz, H. K. Stensland, E. Garcia-Ceja, P. T. Schmidt, H. L. Hammer, M. A. Riegler, P. Halvorsen, and T. D. Lange, ‘‘HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy,’’ Sci. Data, vol. 7, no. 1, pp. 1–14, Dec. 2020.
[22] J. Silva, A. Histace, O. Romain, X. Dray, and B. Granado, ‘‘Toward embedded detection of polyps in WCE images for early diagnosis of colorectal cancer,’’ Int. J. Comput. Assist. Radiol. Surgery, vol. 9, no. 2, pp. 283–293, Mar. 2014.
[23] J. Bernal, F. J. Sánchez, G. Fernández-Esparrach, D. Gil, C. Rodríguez, and F. Vilariño, ‘‘WM-DOVA maps for accurate polyp highlighting in colonoscopy: Validation vs. Saliency maps from physicians,’’ Computerized Med. Imag. Graph., vol. 43, pp. 99–111, Jul. 2015.
[24] P. H. Smedsrud, H. L. Gjestang, O. O. Nedrejord, E. Næss, V. Thambawita, S. Hicks, H. Borgli, D. Jha, T. J. Berstad, S. L. Eskeland, and M. Lux, ‘‘Kvasir-capsule, a video capsule endoscopy dataset,’’ Sci. Data, 2021.
[25] S. Ali et al., ‘‘Deep learning for detection and segmentation of artefact and disease instances in gastrointestinal endoscopy,’’ Med. Image Anal., vol. 70, May 2021, Art. no. 102002.
[26] D. Jha, S. Ali, K. Emanuelsen, S. A. Hicks, V. Thambawita, E. Garcia-Ceja, M. A. Riegler, T. D. Lange, P. T. Schmidt, H. D. Johansen, D. Johansen, and P. Halvorsen, ‘‘Kvasir-instrument: Diagnostic and therapeutic tool segmentation dataset in gastrointestinal endoscopy,’’ in Proc. Int. Conf. Multimedia Modeling (MMM), 2021, pp. 218–229.
[27] S. A. Karkanis, D. K. Iakovidis, D. E. Maroulis, D. A. Karras, and M. Tzivras, ‘‘Computer-aided tumor detection in endoscopic video using color wavelet features,’’ IEEE Trans. Inf. Technol. Biomed., vol. 7, no. 3, pp. 141–152, Sep. 2003.
[28] S. Ameling, S. Wirth, D. Paulus, G. Lacey, and F. Vilarino, ‘‘Texture-based polyp detection in colonoscopy,’’ in Bildverarbeitung für die Medizin 2009. Informatik aktuell, H. P. Meinzer, T. M. Deserno, H. Handels, and T. Tolxdorff, Eds. Berlin, Germany: Springer, 2009, pp. 346–350, doi: 10.1007/978-3-540-93860-6_70.
[29] N. Tajbakhsh, J. Y. Shin, S. R. Gurudu, R. T. Hurst, C. B. Kendall, M. B. Gotway, and J. Liang, ‘‘Convolutional neural networks for medical image analysis: Full training or fine tuning?’’ IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1299–1312, May 2016.
[30] H.-C. Shin, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J. Yao, D. Mollura, and R. M. Summers, ‘‘Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning,’’ IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1285–1298, May 2016.
[31] J. Bernal et al., ‘‘Comparative validation of polyp detection methods in video colonoscopy: Results from the MICCAI 2015 endoscopic vision challenge,’’ IEEE Trans. Med. Imag., vol. 36, no. 6, pp. 1231–1249, Jun. 2017.
[32] S. Ali et al., ‘‘An objective comparison of detection and segmentation algorithms for artefacts in clinical endoscopy,’’ Sci. Rep., vol. 10, no. 1, pp. 1–15, Dec. 2020.
[33] Y. Wang, W. Tavanapong, J. Wong, J. H. Oh, and P. C. D. Groen, ‘‘Polyp-alert: Near real-time feedback during colonoscopy,’’ Comput. Methods Programs Biomed., vol. 120, no. 3, pp. 164–179, Jul. 2015.
[34] Y. Shin, H. A. Qadir, L. Aabakken, J. Bergsland, and I. Balasingham, ‘‘Automatic colon polyp detection using region based deep CNN and post learning approaches,’’ IEEE Access, vol. 6, pp. 40950–40962, 2018.
[35] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, ‘‘Generative adversarial nets,’’ in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2672–2680.
[36] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, ‘‘You only look once: Unified, real-time object detection,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 779–788.
[37] J. Redmon and A. Farhadi, ‘‘YOLO9000: Better, faster, stronger,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 7263–7271.
[38] M. Yamada, Y. Saito, H. Imaoka, M. Saiko, S. Yamada, H. Kondo, H. Takamaru, T. Sakamoto, J. Sese, A. Kuchiba, T. Shibata, and R. Hamamoto, ‘‘Development of a real-time endoscopic image diagnosis support system using deep learning technology in colonoscopy,’’ Sci. Rep., vol. 9, no. 1, pp. 1–9, Dec. 2019.
[39] V. Badrinarayanan, A. Kendall, and R. Cipolla, ‘‘SegNet: A deep convolutional encoder-decoder architecture for image segmentation,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 12, pp. 2481–2495, Dec. 2017.
[40] Y. Guo and B. Matuszewski, ‘‘GIANA polyp segmentation with fully convolutional dilation neural networks,’’ in Proc. 14th Int. Joint Conf. Comput. Vis., Imag. Comput. Graph. Theory Appl., 2019, pp. 632–641.
[41] S. Ali, F. Zhou, C. Daul, B. Braden, A. Bailey, S. Realdon, J. East, G. Wagnières, V. Loschenov, E. Grisan, W. Blondel, and J. Rittscher, ‘‘Endoscopy artifact detection (EAD 2019) challenge dataset,’’ 2019, arXiv:1905.03209. [Online]. Available: http://arxiv.org/abs/1905.03209
[42] R. Wang, S. Chen, C. Ji, J. Fan, and Y. Li, ‘‘Boundary-aware context neural network for medical image segmentation,’’ 2020, arXiv:2005.00966. [Online]. Available: http://arxiv.org/abs/2005.00966
[43] D. Jha, M. A. Riegler, D. Johansen, P. Halvorsen, and H. D. Johansen, ‘‘DoubleU-Net: A deep convolutional neural network for medical image segmentation,’’ in Proc. IEEE 33rd Int. Symp. Comput.-Based Med. Syst. (CBMS), Jul. 2020, pp. 558–564.
[44] S. Minaee, Y. Boykov, F. Porikli, A. Plaza, N. Kehtarnavaz, and D. Terzopoulos, ‘‘Image segmentation using deep learning: A survey,’’ 2020, arXiv:2001.05566. [Online]. Available: http://arxiv.org/abs/2001.05566
[45] M. Baldeon-Calisto and S. K. Lai-Yuen, ‘‘AdaResU-Net: Multiobjective adaptive convolutional neural network for medical image segmentation,’’ Neurocomputing, vol. 392, pp. 325–340, Jun. 2020.
[46] N. Saeedizadeh, S. Minaee, R. Kafieh, S. Yazdani, and M. Sonka, ‘‘COVID TV-UNet: Segmenting COVID-19 chest CT images using connectivity imposed U-Net,’’ 2020, arXiv:2007.12303. [Online]. Available: http://arxiv.org/abs/2007.12303
[47] Y. Meng, M. Wei, D. Gao, Y. Zhao, X. Yang, X. Huang, and Y. Zheng, ‘‘CNN-GCN aggregation enabled boundary regression for biomedical image segmentation,’’ in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent., 2020, pp. 352–362.
[48] D. Vázquez, A. M. López, F. J. Sánchez, J. Bernal, A. Romero, G. Fernández-Esparrach, M. Drozdzal, and A. Courville, ‘‘A benchmark for endoluminal scene segmentation of colonoscopy images,’’ J. Healthcare Eng., vol. 2017, Jul. 2017, Art. no. 4037190.
[49] T. Roß et al., ‘‘Comparative validation of multi-instance instrument segmentation in endoscopy: Results of the ROBUST-MIS 2019 challenge,’’ Med. Image Anal., vol. 70, Nov. 2020, Art. no. 101920.
[50] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, ‘‘Focal loss for dense object detection,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2980–2988.
[51] S. Ren, K. He, R. Girshick, and J. Sun, ‘‘Faster R-CNN: Towards real-time object detection with region proposal networks,’’ in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 91–99.
[52] J. Dai, Y. Li, K. He, and J. Sun, ‘‘R-FCN: Object detection via region-based fully convolutional networks,’’ in Proc. Adv. Neural Inf. Process. Syst., 2016, pp. 379–387.
[53] M. Tan, R. Pang, and Q. V. Le, ‘‘EfficientDet: Scalable and efficient object detection,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2020, pp. 10781–10790.
[54] M. Tan and Q. V. Le, ‘‘EfficientNet: Rethinking model scaling for convolutional neural networks,’’ 2019, arXiv:1905.11946. [Online]. Available: http://arxiv.org/abs/1905.11946
[55] R. Girshick, ‘‘Fast R-CNN,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1440–1448.
[56] J. Redmon and A. Farhadi, ‘‘YOLOv3: An incremental improvement,’’ 2018, arXiv:1804.02767. [Online]. Available: http://arxiv.org/abs/1804.02767
[57] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, ‘‘YOLOv4: Optimal speed and accuracy of object detection,’’ 2020, arXiv:2004.10934. [Online]. Available: http://arxiv.org/abs/2004.10934
[58] J. Long, E. Shelhamer, and T. Darrell, ‘‘Fully convolutional networks for semantic segmentation,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 3431–3440.
[59] O. Ronneberger, P. Fischer, and T. Brox, ‘‘U-Net: Convolutional networks for biomedical image segmentation,’’ in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. (MICCAI), 2015, pp. 234–241.
[60] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, ‘‘Pyramid scene parsing network,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 2881–2890.
[61] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, ‘‘Encoder-decoder with atrous separable convolution for semantic image segmentation,’’ in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 801–818.
[62] Z. Zhang, Q. Liu, and Y. Wang, ‘‘Road extraction by deep residual U-Net,’’ IEEE Geosci. Remote Sens. Lett., vol. 15, no. 5, pp. 749–753, May 2018.
[63] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, ‘‘ImageNet: A large-scale hierarchical image database,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 248–255.
[64] K. Simonyan and A. Zisserman, ‘‘Very deep convolutional networks for large-scale image recognition,’’ 2014, arXiv:1409.1556. [Online]. Available: http://arxiv.org/abs/1409.1556
[65] J. Wang, K. Sun, T. Cheng, B. Jiang, C. Deng, Y. Zhao, D. Liu, Y. Mu, M. Tan, X. Wang, W. Liu, and B. Xiao, ‘‘Deep high-resolution representation learning for visual recognition,’’ IEEE Trans. Pattern Anal. Mach. Intell., early access, Apr. 1, 2020, doi: 10.1109/TPAMI.2020.2983686.
[66] K. He, X. Zhang, S. Ren, and J. Sun, ‘‘Deep residual learning for image recognition,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[67] J. Hu, L. Shen, and G. Sun, ‘‘Squeeze-and-excitation networks,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 7132–7141.
[68] M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, ‘‘The Pascal visual object classes challenge: A retrospective,’’ Int. J. Comput. Vis., vol. 111, no. 1, pp. 98–136, Jan. 2015.
[69] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, ‘‘Microsoft COCO: Common objects in context,’’ in Proc. Eur. Conf. Comput. Vis., 2014, pp. 740–755, doi: 10.1007/978-3-319-10602-1_48.
[70] F. Chollet et al., ‘‘Keras,’’ Tech. Rep., 2015.
[71] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, and M. Kudlur, ‘‘TensorFlow: A system for large-scale machine learning,’’ in Proc. USENIX Symp. Operating Syst. Design Implement. (OSDI), 2016, pp. 265–283.
[72] Y. Mori et al., ‘‘Real-time use of artificial intelligence in identification of diminutive polyps during colonoscopy: A prospective study,’’ Ann. Internal Med., vol. 169, no. 6, pp. 357–366, 2018.
[73] S. Ali, F. Zhou, A. Bailey, B. Braden, J. E. East, X. Lu, and J. Rittscher, ‘‘A deep learning framework for quality assessment and restoration in video endoscopy,’’ Med. Image Anal., vol. 68, Feb. 2021, Art. no. 101900.

DEBESH JHA received the master's degree in information and communication engineering from Chosun University, Gwangju, Republic of Korea. He is currently pursuing the Ph.D. degree with SimulaMet, Oslo, Norway, and UiT—The Arctic University of Norway, Tromsø, Norway. His research interests include computer vision, machine learning, deep learning, and medical image analysis.

SHARIB ALI received the Ph.D. degree from the University of Lorraine, France. He worked as a Postdoctoral Researcher at the Biomedical Computer Vision Group and the German Cancer Research Center (DKFZ), University of Heidelberg, Heidelberg, Germany. He is currently working at the Department of Engineering Science, Institute of Biomedical Engineering, University of Oxford, Oxford, U.K. His research interests include computer vision and medical image analysis.

NIKHIL KUMAR TOMAR received the bachelor's degree in computer application from Indira Gandhi Open University, New Delhi, India. He is currently doing collaborative research at SimulaMet. His research interests include computer vision, artificial intelligence, parallel processing, and medical image segmentation.
HÅVARD D. JOHANSEN received the Ph.D. degree from the UiT—The Arctic University of Norway. He is currently a Professor with the Department of Informatics, UiT—The Arctic University of Norway. His major research interests include computing networks, cloud computing, network security, information security, and network architecture.

MICHAEL A. RIEGLER received the Ph.D. degree from the Department of Informatics, University of Oslo, Oslo, Norway, in 2015. He is currently working as a Chief Research Scientist at SimulaMet, Oslo, Norway. His research interests include machine learning, video analysis and understanding, image processing, image retrieval, crowdsourcing, social computing, and user intentions.