
Generative Adversarial Networks for Automatic Polyp Segmentation

Awadelrahman M. A. Ahmed
University of Oslo, Norway; [email protected]

ABSTRACT

This paper aims to contribute to benchmarking the automatic polyp segmentation problem using the generative adversarial networks framework. Perceiving the problem as an image-to-image translation task, conditional generative adversarial networks are utilized to generate masks conditioned on the input images. Both the generator and the discriminator are based on convolutional neural networks. The model achieved a Jaccard index of 0.4382 and an F2 score of 0.611.

Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). MediaEval'20, December 14-15 2020, Online.

1 INTRODUCTION

Developing an automated computer-aided diagnosis system for polyp segmentation is one of the potential solutions that can assist colonoscopy, the current gold-standard medical procedure for examining the colon, in the sense that it can lessen the percentage of overlooked polyps. Motivated by that, this paper contributes by examining a model based on generative adversarial networks (GANs) to serve as one of the benchmark methods that other cutting-edge new methods can be compared to. This paper discusses a submission to the Medico automatic polyp segmentation challenge [3], task 1, which asks the participants to develop algorithms for segmenting polyps in a comprehensive data set. In this work we perceive the polyp segmentation problem as an image-to-image translation problem: we are given a gastrointestinal polyp image and the task is to translate it to the corresponding mask that locates the polyps. The GANs framework has been successfully applied to image-to-image translation problems in many application fields. The authors of [2] investigated conditional GANs as a general-purpose solution to image-to-image translation problems and evaluated them in different application fields such as translating aerial images to maps and reconstructing objects from edge maps or templates. The challenge task can be seen in the same way, hence we adapted the model architecture suggested in [2] to fit the polyp segmentation problem. The following sections illustrate the model details, evaluation and results, followed by the conclusion and future work.

2 GANS FOR POLYP SEGMENTATION

The GANs framework is a generative model scheme based on a game-theoretic formulation for training data-synthesis models [1]. A GAN consists of two models (e.g., neural networks) trained in opposition to each other. A generator network 𝐺 takes a random noise vector as input and maps it to an output which represents fake data. The other network is the discriminator 𝐷, which receives that fake data and classifies it as fake (i.e., generated by 𝐺) or as real data from a real dataset. However, our particular task is to map each gastrointestinal polyp image to the corresponding mask that defines the polyp area. This description does not fit within the standard GAN setting described above, where the input is noise and the output is an arbitrary (but realistic-looking) image. Therefore, we utilize a prominent variation of GANs which produces an output conditioned on its input, hence called a conditional GAN.

The polyp segmentation GAN-based model consists of two networks: a generator, which takes the images as input and tries to produce realistic-looking masks conditioned on this input, and a discriminator, which is basically a classifier that has access to the ground-truth masks and tries to classify whether the generated masks are real or not. To stabilize the training, the images are concatenated with the masks (generated or real) before being fed to the discriminator. The ultimate goal is to find the generator parameter set 𝜃𝐺∗ that satisfies the minimax optimization problem shown in (1). We particularly draw a batch of images 𝑥 and their corresponding masks 𝑦 from their distributions 𝑋 and 𝑌 respectively. We feed 𝑥 to the generator 𝐺 to produce a fake mask set 𝐺(𝑥). We then concatenate both masks (real and fake) with the input images; this concatenation is denoted by 𝐶. These concatenated pairs are then fed to the discriminator 𝐷. To learn these models, both networks are updated in an adversarial fashion based on the discriminator output, meaning that the discriminator aims to maximize the exact function that the generator aims to minimize. Over training time, the generator becomes able to generate realistic-looking masks that are good enough to deceive the discriminator, i.e., that it classifies as real.

    𝜃𝐺∗ = min𝜃𝐺 max𝜃𝐷 E𝑥,𝑦∼𝑋,𝑌 [log 𝐷(𝐶(𝑥, 𝑦))] + E𝑥∼𝑋 [log(1 − 𝐷(𝐶(𝑥, 𝐺(𝑥))))]    (1)

The model block diagram is shown in Figure 1 and the model details are shown in Table 1. Both networks are based on convolutional neural networks (CNNs). The generator has two segments: one based on convolution operations, followed by deconvolution layers, with skip connections between them.

3 EXPERIMENTAL EVALUATION

As required by the challenge, the data set used is the Kvasir-SEG [4] training data set, which consists of 1000 image-mask pairs. We first split the dataset into a training set of 800 pairs and a validation set of 200 pairs in order to optimize the parameters. Once satisfied with the model, we re-trained it on all 1000 data pairs and used it to produce the masks of a separate 160-image test set that we submitted. The model is implemented in Python [6] with the PyTorch [5] framework on a 2×14-core Intel/128 GiB machine.
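The adversarial objective in equation (1) can be sketched as a PyTorch training step. This is a minimal illustration, not the authors' released code: the shallow `gen` and `disc` stacks are toy stand-ins for the deeper networks in Table 1, and only the channel-wise concatenation 𝐶 and the loss formulation follow the description above.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the paper's CNN generator and discriminator.
# The real generator has 8 conv / 8 deconv layers with skip
# connections (see Table 1); these shallow stacks are placeholders.
gen = nn.Sequential(
    nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Tanh(),
)
disc = nn.Sequential(
    nn.Conv2d(3 + 1, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
)

def concat(x, m):
    """C(x, m): channel-wise concatenation of image and mask."""
    return torch.cat([x, m], dim=1)

bce = nn.BCELoss()
opt_g = torch.optim.Adam(gen.parameters(), lr=0.002)
opt_d = torch.optim.Adam(disc.parameters(), lr=0.002)

def train_step(x, y):
    # Discriminator: minimize the negative log-likelihood, i.e.
    # maximize log D(C(x, y)) + log(1 - D(C(x, G(x)))).
    fake = gen(x)
    d_real = disc(concat(x, y))
    d_fake = disc(concat(x, fake.detach()))
    loss_d = bce(d_real, torch.ones_like(d_real)) + \
             bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: minimize the negative of the discriminator's
    # objective; only the log(1 - D(C(x, G(x)))) term depends on G.
    d_fake = disc(concat(x, fake))
    loss_g = -bce(d_fake, torch.zeros_like(d_fake))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()

x = torch.randn(4, 3, 64, 64)          # a batch of images
y = torch.rand(4, 1, 64, 64) * 2 - 1   # masks scaled to [-1, 1]
ld, lg = train_step(x, y)
```

Detaching the fake mask for the discriminator update keeps the two updates adversarially separate: each optimizer only steps its own network's parameters.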

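The preprocessing described in Section 3.1, fitting each image to the mean width and height by cropping or zero-padding and then scaling pixels to [-1, 1], might be implemented along these lines. The centered crop and the 8-bit input range are assumptions; the paper does not specify either.

```python
import numpy as np

def fit_to_size(img, target_h, target_w):
    """Crop larger images and zero-pad smaller ones so every image
    ends up at the dataset's mean height/width. The centered crop
    is an assumption; the paper does not say where it crops."""
    h, w = img.shape[:2]
    if h > target_h:                      # crop height if too large
        top = (h - target_h) // 2
        img = img[top:top + target_h]
    if w > target_w:                      # crop width if too large
        left = (w - target_w) // 2
        img = img[:, left:left + target_w]
    h, w = img.shape[:2]                  # zero-pad any shortfall
    img = np.pad(img, ((0, target_h - h), (0, target_w - w), (0, 0)))
    return img

def normalize(img):
    """Scale 8-bit pixel values from [0, 255] to [-1, 1]."""
    return img.astype(np.float32) / 127.5 - 1.0

img = np.random.randint(0, 256, (600, 500, 3), dtype=np.uint8)
out = normalize(fit_to_size(img, 512, 512))
```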
Figure 1: Model block diagram

Generator network
  Conv1              f*1, LeakyReLU
  Conv2              f*2, LeakyReLU
  Conv3              f*4, LeakyReLU
  Conv4, 5, 6, 7, 8  f*8, LeakyReLU
  DeConv1, 2, 3, 4   f*8, ReLU, Dropout(0.5)
  DeConv5            f*4, ReLU, Dropout(0.5)
  DeConv6            f*2, ReLU, Dropout(0.5)
  DeConv7            f*1, ReLU, Dropout(0.5)
  DeConv8            1, Tanh
Discriminator network
  Conv1              f*1, LeakyReLU
  Conv2              f*2, LeakyReLU
  Conv3              f*4, LeakyReLU
  Conv4              1, Sigmoid
𝑓: number of filters = 64

Table 1: Model details

3.1 Data Preparation and Training

As the images have different dimensions, we fit the images to the mean width and height by cropping the larger images and padding the smaller images with zeros. The pixel values are normalized between -1 and 1. No data augmentation was used. The networks in Table 1 were trained by optimizing the loss function in equation (1): the discriminator minimizes the negative log-likelihood and the generator minimizes the negative of that. We tried other options such as feature matching, i.e. minimizing the loss between the features of the discriminator's penultimate layer and the original masks instead of using the last classification layer's output, but this seemed to hurt performance. We used the Adam optimizer for both networks with a learning rate of 0.002, for 12 epochs with a batch size of 4. Samples of the masks generated at several epochs during training are shown in Figure 2.

Figure 2: Generated masks in different epochs

3.2 Results

To quantify the overlap percentage between the ground-truth masks and our generated masks, we report both the Jaccard index and the Dice similarity coefficient (DSC). We also report the per-pixel recall, precision, accuracy and F2 score (which gives more weight to recall). Results on the test set are in Table 2. It is difficult to comment on these results objectively because of the uniqueness of the dataset [4], i.e. there are no previous publications to compare with, but in general the higher recall than precision reflects that the model favors avoiding false negatives, which is desirable for this application. The test throughput is 16 frames/sec on a 2×14-core Intel/128 GiB machine.

  Jaccard   DSC     Recall   Precision   Accuracy   F2
  0.4382    0.562   0.697    0.556       0.881      0.611

Table 2: Test results as reported by the challenge organizers

We report the model outputs for some samples in Figure 3. Even though we do not have access to the ground truth of the test set, we can observe that the model incorrectly identified the polyp location in the bottom two samples, whereas it did far better at locating the polyp areas in the other samples (top and middle rows). We do not have a clear explanation for this, but we speculate that the small receptive field of the convolution layers could be a reason. In other words, the convolution layers attend to the area close to each pixel, and if this area is rich enough in features (e.g. has sharp edges) the polyp is more distinguishable. This is why incorporating an attention mechanism [7] might help the model attend to fine details and longer ranges; hence we suggest studying it as an extended model.

Figure 3: Model outputs from the test set

4 CONCLUSION

This paper aimed to benchmark the polyp segmentation problem using GANs. The problem was perceived as an image-to-image translation task and we utilized a conditional GAN architecture. The model was able to learn the masks; however, higher performance could be achieved through improvements such as adding a reconstruction loss and enlarging the dataset with data augmentation. An interesting extension of this model is to incorporate an attention layer [7], which can help the convolution layers in both the generator and the discriminator attend to fine details and expand the receptive field.
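For completeness, the overlap metrics reported in Table 2 follow standard definitions, which can be sketched as below. This is an illustration of the textbook formulas, not the challenge organizers' evaluation script, whose averaging and thresholding details are not specified here.

```python
import numpy as np

def overlap_metrics(pred, truth):
    """Pixel-wise Jaccard, Dice (DSC), and F2 for binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()   # true positives
    fp = np.logical_and(pred, ~truth).sum()  # false positives
    fn = np.logical_and(~pred, truth).sum()  # false negatives
    jaccard = tp / (tp + fp + fn)
    dice = 2 * tp / (2 * tp + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F-beta with beta = 2 weights recall higher than precision.
    f2 = 5 * precision * recall / (4 * precision + recall)
    return jaccard, dice, f2

pred = np.array([[1, 1, 0], [0, 1, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1]])
j, d, f2 = overlap_metrics(pred, truth)
```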

REFERENCES
[1] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley,
Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial
nets. In Advances in neural information processing systems. 2672–2680.
[2] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. 2017. Image-to-
image translation with conditional adversarial networks. In Proceedings of the
IEEE conference on computer vision and pattern recognition. 1125–1134.
[3] Debesh Jha, Steven A. Hicks, Krister Emanuelsen, Håvard Johansen, Dag Johansen,
Thomas de Lange, Michael A. Riegler, and Pål Halvorsen. 14-15 December 2020.
Medico Multimedia Task at MediaEval 2020: Automatic Polyp Segmentation. In
Proc. of the MediaEval 2020 Workshop, Online.
[4] Debesh Jha, Pia H Smedsrud, Michael A Riegler, Pål Halvorsen, Thomas de Lange,
Dag Johansen, and Håvard D Johansen. 2020. Kvasir-SEG: A segmented polyp
dataset. In Proc. of International Conference on Multimedia Modeling. 451–462.
[5] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang,
Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer.
2017. Automatic differentiation in PyTorch. (2017).
[6] Guido Van Rossum and Fred L. Drake. 2009. Python 3 Reference Manual. CreateSpace, Scotts Valley, CA.
[7] Han Zhang, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. 2019. Self-
attention generative adversarial networks. In International conference on machine
learning. PMLR, 7354–7363.
