Qinghui Liu
Deep Learning Applied to Automatic Polyp Detection in
Colonoscopy Images
Abstract
Deep learning is an improvement over conventional neural networks, adding more computational layers that allow for higher levels of abstraction and prediction from data. It has become a leading machine learning tool for general imaging and computer vision. Current trends in research have also demonstrated that deep convolutional neural networks (DCNNs) are very effective in automatically analyzing images. However, the requirement for a large number of annotated samples prevents their wide use in medical image analysis, since collecting and labeling a large amount of data is difficult due to the challenges of obtaining data from the medical domain.
Polyps are known to be possible precursors of colorectal cancer, and their early detection is of great importance, but highly challenging from an image processing standpoint. In this work, we evaluate several state-of-the-art machine learning and deep learning methods for medical image processing and investigate how they can be utilized more efficiently for the automatic detection of polyps in endoscopy and colonoscopy images.
This work proposes an effective transfer learning (TL) framework relying on DCNNs pre-trained on a large collection of natural images from ImageNet. This has been achieved by evaluating various cutting-edge techniques, from traditional machine learning methods, training feature-based classifiers from scratch, to modern DCNN algorithms with TL and fine-tuning of pre-trained models. We transfer the learned ImageNet weights as initial weights, and then fine-tune this model combined with a new deep classifier, the fully connected network (FCN), using data augmentation and patch extraction of colonoscopy images to automatically detect polyps. In the case of insufficient colonoscopy images, patch-based data augmentation and deep features extracted using the TL strategy can provide sufficient and balanced classification information.
With the proposed TL framework and our optimized hyper-parameters, the system achieved an overall 96.00% polyp detection precision and sensitivity, outperforming the traditional machine learning classification methods in each defined performance metric. Moreover, the proposed TL framework is scalable and flexible, so that it can easily be extended to include other types of disease detection in the future, and is also able to integrate further DCNN models to boost its generalization capabilities.
Acknowledgements
First and foremost, I would like to express my sincere gratitude to my thesis advisors,
Dr. António L. L. Ramos at USN, Norway, and Dr. Sergio L. Netto at UFRJ, Brazil,
for their guidance and support while conducting this research work as well as in the
writing of this thesis. I could not have had better advisors and mentors for my master
study and research. I am very thankful to Dr. António L. L. Ramos for giving me
this task, and to Dr. Sergio L. Netto and his research group, especially Lucas Cinelli
and Bruno Afonso, for sharing their expertise and experience in the field of Machine
Learning.
I would like to thank Dr. Olaf Hallan Graven, our head of department, for pro-
viding all the necessary means and facilities to carry out this task. I also take this
opportunity to express my gratitude to my lecturers and faculty members at USN for
sharing their expertise, and sincere and valuable guidance and encouragement.
I would also like to thank all my classmates, especially my lab-mates Zhili Shao,
Blessing, Vegard, and David, for the stimulating discussions and for all the fun we have
had in the past two years. My sincere thanks also go to Mr. Paul Stupkin for reading
part of the manuscript and for the valuable comments on the English grammar.
Last but not least, I would especially like to thank my family. Words cannot express how grateful I am to my mother-in-law, father-in-law, and my mother and father for all of the sacrifices you have made on my behalf. To my beloved wife, GU Yan, I can't thank you enough for supporting me in everything and for encouraging me throughout this experience.
Contents
Abstract iii
Acknowledgements v
1 Introduction 1
    1.1 Background 1
    1.2 Problem statement 2
    1.3 Motivation 3
    1.4 Objective 4
    1.5 Approach 4
    1.6 Outline 4
2 Literature review 7
    2.1 Machine learning 7
        2.1.1 Overview 7
        2.1.2 Texture features 8
        2.1.3 Shape features 9
        2.1.4 Texture and shape features 10
        2.1.5 Classifiers 11
    2.2 Deep learning 11
        2.2.1 Deep architectures 11
        2.2.2 CNNs-based CAD systems 13
        2.2.3 Pre-trained CNNs 15
    2.3 Summary 15
3 Methodology 17
    3.1 Proposed frameworks 17
    3.2 Image preprocessing 18
        3.2.1 Histogram modification 18
        3.2.2 Noise filtering 21
        3.2.3 Data augmentation 21
        3.2.4 Dimension reduction 23
    3.3 Neural network design 23
        3.3.1 Neural networks 24
        3.3.2 Activation functions 25
        3.3.3 Softmax functions 25
        3.3.4 Gradient descent optimizers 26
    3.4 Convolutional networks 27
        3.4.1 Convolutional layer 27
        3.4.2 Pooling layer 28
        3.4.3 Dropout layer 30
Bibliography 87
List of Figures
List of Tables
List of Abbreviations
AI Artificial Intelligence.
API Application Programming Interface.
BN Batch Normalization.
CAD Computer Aided Diagnosis.
CLBP Completed Local Binary Pattern.
CNN Convolutional Neural Network.
ConvNet Convolutional Network.
CUDA Compute Unified Device Architecture.
cuDNN CUDA Deep Neural Network library.
CV Cross Validation.
CWC Color Wavelet Covariance.
DCNN Deep Convolutional Neural Network.
DL Deep Learning.
DT Decision Tree.
FCN Fully Connected Network.
GI Gastro-Intestinal.
GP Gaussian Process.
GPU Graphics Processing Unit.
ILSVRC ImageNet Large-Scale Visual Recognition Challenge.
KNN K-Nearest Neighbors.
LBP Local Binary Pattern.
ML Machine Learning.
MLP Multi-Layer Perceptron.
NN Neural Network.
PCA Principal Component Analysis.
RBF Radial Basis Function.
ReLU Rectified Linear Unit.
RF Random Forests.
ResNet Residual Network.
RMS Root Mean Squared.
ROI Region Of Interest.
SGD Stochastic Gradient Descent.
SIFT Scale Invariant Feature Transform.
SVM Support Vector Machine.
TL Transfer Learning.
TPE Tree-structured Parzen Estimator.
TSCH Texture Spectrum and Color Histogram.
TSH Texture Spectrum Histogram.
VCE Video Capsule Endoscopy.
VGG Visual Geometry Group.
WCE Wireless Capsule Endoscopy.
Chapter 1
Introduction
1.1 Background
Colorectal cancer is the third most common type of cancer in men and women in the United States of America, and also the second leading cause of cancer deaths [1]. Early detection of polyps, protrusions from the colon surface, is vital to the prevention of colorectal cancer, since colorectal cancer is highly curable when detected early. It often begins as a benign polyp of the tissue lining the colon or rectum and, without proper treatment at an early stage, may eventually develop into a cancer. Therefore, one of the major goals of endoscopy and colonoscopy is the early detection of polyps and cancers.
Two main procedures are used for this purpose: endoscopy, performed by inserting the endoscope through the mouth to examine the esophagus, stomach, and small intestine; and colonoscopy, performed by inserting the endoscope via the anus to examine the large intestine, colon, and rectum.
Wireless (video) capsule endoscopy (WCE/VCE) is a noninvasive technology designed primarily to provide diagnostic imaging of the small intestine. The capsule measures 26 by 11 mm, the size of a large vitamin pill, and is propelled through the small bowel by peristalsis. Wireless capsule endoscopes have also been developed for the esophagus and colon, but their use in those areas is not yet as widespread [54]. Colonoscopy is still the preferred technique for colon cancer screening and prevention.
Appearances of polyps
Polyps appear in different shapes, ranging from flat to pedunculated forms. Flat polyps are attached to the colon wall by their base, while pedunculated polyps are attached via a stem. Figure 1.2 shows some examples of colonic polyps extracted from different colonoscopy videos from CVC-ColonDB.
Besides, a polyp may appear at different scales depending on the distance between the polyp and the colonoscopy camera. This is shown in Figure 1.3, where the same polyp appears at a different scale in each image.
During a single examination, over 50,000 images are captured for analysis, which is time-consuming for physicians to assess manually.
To reduce missed detections of polyps caused by human factors, as well as the cost and time of screening a large number of colonoscopy or WCE frames, a large number of techniques have recently been studied and exploited for the automatic detection of polyps in colonic images.
However, computer-aided automatic detection of polyps is still a difficult task due to the variety of shapes, sizes, colors, textures, and scales in the captured images. Additionally, the complex structure of the GI tract, the similar color of polyp and non-polyp regions, poor image quality, and the variation in appearance of the same polyp caused by frequent camera angle changes create further challenges.
1.3 Motivation
Current trends in research have demonstrated that deep learning methods, especially
deep convolutional neural networks (DCNNs), are very effective for automatic analy-
sis of images. So far, DCNNs have become a leading machine learning tool for general
imaging and computer vision. Indeed, recent advances in deep learning frameworks
and methods have shown great potential to enhance performance in computer vision applications, owing to their robust learning capabilities [17]. This captured our
curiosity to explore and develop an effective approach based on cutting edge DL algo-
rithms to solve a real-world problem in medical image analysis. This work, focusing on automatic polyp detection, can potentially be a lifesaver, and builds upon our initial study presented in [31].
1.4 Objective
The objective of this work is to develop high performance, scalable and reliable auto-
mated polyp detection systems that can tolerate polyp variability. By handling differ-
ences such as shape, size, color, and texture, computer-aided automatic polyp detection
systems become more feasible in clinical practice.
1.5 Approach
To achieve our objectives and desired outcomes, we chose the SCRUM methodology.
This choice was based on the type of project to be carried out. This method allowed us
to achieve maximum efficiency using iterative weekly sprints on which the work done
the last week was reviewed and new tasks were organized and defined for the next
week. We proposed the following sub-tasks:
• Researching prior work on topics related to automatic polyp detection.
• Studying and evaluating imaging processing algorithms and the state-of-the-art
machine learning approaches.
• Extensively studying DCNN algorithms and choosing the most appropriate models for automatic polyp detection tasks.
• Developing pre-processing techniques for dataset preparation and performing preliminary experiments on the domain dataset.
• Designing and implementing the DCNNs models for the detection of polyps.
• Performing extensive tests and fine-tuning the models to obtain the best perfor-
mance.
• Final evaluation and suggesting future work.
We first studied the literature that focused on image processing algorithms and
machine learning methods for polyp classification. We then built tools to evaluate these
techniques. Meanwhile, we extensively studied and investigated the newest DCNNs
architectures which could be employed in our work. Then we developed a scalable
transfer learning framework to utilize pre-trained DCNN models for polyp detection
tasks. After the proposed DCNNs models were implemented, we performed extensive
tests and fine-tuning of the models in order to obtain the best performance. Eventually,
we evaluated all proposed methods using comprehensive performance metrics and
suggested an outlook on future work.
1.6 Outline
The remainder of this thesis is organized as follows.
• In Chapter 2, we provide an overview of the literature on topics related to automatic polyp detection. This covers a brief description of traditional texture/shape based methods, conventional machine learning classification, and deep learning concepts.
Chapter 2
Literature review
This chapter covers general aspects of ML and DL methods in order to build the necessary foundations to understand the scope and results presented in this work. First, an overview of machine learning techniques is given, with a brief discussion of different learning types such as supervised, unsupervised, and reinforcement learning. Subsequently, different low-level feature extraction approaches, covering texture, shape, and fused texture-and-shape features, are presented separately in the context of automatic polyp detection. Next, the chapter focuses on deep learning methods, which represent the current state of the art and the future trend of this field. We first present deep learning concepts in general. Then several cutting-edge DCNN models, including AlexNet, VGG Net, GoogLeNet, and ResNet, are described in detail, since DCNN models are used as an important part of our work. In the subsequent sections, we analyze different deep learning applications for the automatic detection of polyps and publications related to this topic, grouped into two separate sections, namely CNN-based CAD systems and pre-trained CNNs, according to the methods they utilize. Finally, we summarize and evaluate the results against our specific requirements.

2.1 Machine learning
2.1.1 Overview
The typical goal of machine learning is to determine a mapping from input patterns to an output value [4]. A machine learning algorithm can be expressed as a function y(x) that takes an input x and generates an output y, usually encoded in the same way as the target vectors [4]. The form of the function y(x) is determined during the training (or learning) phase, based on a training data set. Once the model is trained, it is then evaluated on new data referred to as the test set.
Machine learning algorithms are typically classified into three categories, based on
the nature of the training signals or feedback to the learning system, as follows [43]:
• Supervised learning: In supervised learning problems, the training data is made up of tuples (xi, yi), where xi is the input and yi the corresponding target vector [10]. The goal is to learn a general rule, also called a mapping function f : X → Y, that maps inputs to outputs (see the sketch after this list). Supervised learning tasks can be further classified into classification and regression categories based on the desired output of a machine-learned system [4]: classification, where the output is a discrete class label, and regression, where the output is one or more continuous values.
• Unsupervised learning: No labels are given; the goal is to discover structure hidden in the input data.
• Reinforcement learning: The system learns by interacting with a dynamic environment, guided by rewards and penalties.
In most machine learning techniques, the main stages are a feature extraction/description step and a decision-making stage called the classification step. Additional steps can be added prior to the feature extraction stage, such as image smoothing or noise filtering and region-of-interest (ROI) selection.
There are primarily two types of features, namely shape/geometric features and texture-color features. Both types have been utilized in the literature for polyp detection in medical images. To further improve the quality of the features and to acquire more information from the images, feature fusion approaches have been employed as well. This is done by combining geometric and textural features of the image to benefit from the information that both provide.
However, texture-color based analysis has two major limitations [23]: it uses a fixed-size analysis window, and it relies heavily on an exhaustive training set of images, which makes it very sensitive to parameter tuning.
Nawarathna et al. [37] made use of texton histograms for identifying abnormal regions with different classifiers such as SVM and KNN. An accuracy of 95.27% was obtained for polyp detection using Schmid filter-bank based textons and an SVM classifier. Nawarathna et al. [36] later extended this approach further with a local binary pattern (LBP) feature. In addition, a bigger filter bank (Leung–Malik), which includes Gaussian filters, was proposed for capturing texture more effectively. These approaches use only texture features, without any color or geometric features. The best performance, 92% accuracy, was obtained with the Leung–Malik-LBP filter bank and a KNN classifier.
Yuan and Meng [59] utilized scale-invariant feature transform (SIFT) feature vectors with K-means clustering for a bag-of-features representation of polyps. The authors calculated weighted histograms of the visual words by integrating histograms over both saliency and non-saliency regions. These were fed into an SVM classifier, and experiments on 872 images with 436 polyp frames showed that 92% detection accuracy was obtained.
• It did not detect the actual location of the polyp in the colon. This problem is particularly exacerbated in capsule colonoscopy.
• Its effectiveness lies partially in the use of a pre-selection criterion; however, the proposed pre-selection approach, while robust in some sense, was not sophisticated enough. It was less effective at filtering out frames with bubbles.
• It only utilized texture and geometry information; the color content was discarded, since each frame had to be converted to grayscale before processing.
The suggested system was tested using two public polyp databases containing 300 unique polyps, and achieved a sensitivity of 88.0%. However, the author also pointed out that the suggested system might fail to detect polyps with faint gradients around their boundaries, resulting in polyp localization failures. In addition, unsuccessful edge classification could also lead to localization failures.
2.1.5 Classifiers
In most of the cases we studied, SVM was the most widely used classifier in medical image processing. SVM determines support vectors from the feature space which help determine the optimal hyperplane that separates a set of objects with maximum margin [12]. However, there is no single classification method that outperforms all others on all data sets, and there are also other state-of-the-art classifiers such as Random Forests (RF) [32], KNN, and so on. We evaluate all of them in this work.
In the VGG architecture, the convolutional layers are arranged in groups: the number of convolution kernels stays the same within a group and increases from 64 in the first group to 512 in the last one. The total number of learnable layers can be 11, 13, 16, or 19, depending on the number of convolutional layers in each group. Figure 2.1 illustrates the architecture of the 16-layer VGG net (VGG16). VGG Net is one of the most influential architectures, since it strengthened the intuitive notion that CNNs need deep layers in order to build a hierarchical representation of visual data.
GoogLeNet [49] was the winner of ILSVRC 2014 with a top-5 error of 6.7%. The authors introduced a novel Inception module which performs pooling and convolutional operations in parallel. GoogLeNet used 9 Inception modules with over 100 layers in total, but had 12x fewer parameters than AlexNet. It was the first model built on the idea that CNN layers with different kernel filters can be stacked up and operated in parallel. By utilizing the creative Inception module, GoogLeNet achieves improved performance and computational efficiency, since it avoids stacking all convolution layers and adding huge numbers of filters sequentially, which would require far more computational and memory resources and increase the chance of over-fitting as well.
ResNet was originally introduced in the paper "Deep Residual Learning for Image Recognition" [18] by He et al. It won ILSVRC 2015 with a new 152-layer convolutional network architecture (ResNet152), trained on an 8-GPU machine for two to three weeks. It achieved an incredible top-5 error of 3.6%, setting new records in classification, detection, and localization. ResNet architectures were demonstrated with 50, 101, and 152 layers; the deeper the ResNets got, the better their performance grew.
The authors of ResNet proposed a residual learning approach to ease the difficulty of training deeper networks by reformulating the layers as residual blocks, with each block containing two branches: one directly connecting the input to the output, the other performing two to three convolutions and calculating the residual function with reference to the layer inputs. The outputs of these two branches are then added up, as shown in Figure 2.3.
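In the notation of [18], each residual block thus computes

\[ \mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x} \]

where \(\mathbf{x}\) is the block input, \(\mathcal{F}\) is the residual mapping learned by the convolutional branch, and the addition is the element-wise sum of the two branches before the final activation.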
• Fine-tuned CNNs were more robust to the size of the training set than CNNs trained from scratch.
• Neither shallow tuning nor deep tuning was necessarily the optimal choice for a particular application.
• A layer-wise fine-tuning scheme could offer a practical way to reach the best performance for the application at hand, based on the amount of available data.
These results showed that knowledge transfer from natural images to medical images is possible, and suggested [53] that layer-wise fine-tuning might offer a practical way to achieve the best performance for a given medical image application, based on the amount of available data.
2.3 Summary
In summary, we discussed the polyp detection approaches covered so far using machine learning and deep learning techniques, the classifiers utilized, the datasets, and performance details (whenever available). We can see that plenty of improvements were made in the pre-processing techniques, feature extraction algorithms, classification methods, or all of these, and that there is a clear trend toward the use of deep learning frameworks, especially CNN-based architectures. However, it can also be seen that these proposed methods are tuned to obtain the best achievable detection accuracy on their corresponding datasets, so we believe that the majority of these methods suffer, to a greater or lesser degree, from over-fitting or under-fitting problems.
Chapter 3
Methodology
In this chapter, we describe in detail the different techniques used for automatic polyp detection. The first section presents our 3 major frameworks (ML-framework, DL-framework, and TL-framework) for automatically detecting polyps in colonic images; we also describe a scalable framework for computer-aided diagnosis systems, based on the fusion of the overall state-of-the-art techniques, to generalize and extend our project in the future with versatile capabilities in the medical domain.
The subsequent section analyzes the various image preprocessing methods that are utilized in our work and that are necessary for most machine learning and deep learning systems. These techniques cover histogram modification, noise filtering, data augmentation, and dimension reduction. Next, the chapter focuses on neural network design methodologies, covering the algorithms necessary to build an effective artificial neural network, such as the feed-forward structure, activation functions, softmax functions, loss functions, regularization, gradient descent optimizers, and backpropagation methods.
Finally, in the last section, we describe the methodologies needed for designing deep convolutional networks, which represent the current state of the art; these include the convolution algorithm with zero-padding and stride methods, and pooling and dropout techniques. At last, we describe the 50-layer ResNet deep learning model with its detailed structure; ResNet50 is the major deep convolutional network architecture utilized in our project.
The proposed CAD framework can also be extended to detect or predict other types of diseases. Generally, it consists of four stages, preprocessing, feature extraction, classification, and post-processing, as shown in Figure 3.2; the red dashed line represents the process of training the system.
First, the preprocessing stage is quite important in order to properly prepare the data by removing noise or unwanted parts. The objective of preprocessing is to refine the quality of the digital images. It can consist of subsampling, enhancement, edge detection, scaling or region-of-interest (ROI) patch extraction, and so on, and it has a large impact on the subsequent feature extraction and classification stages.
In the feature extraction phase, the focus is on extracting key characteristics of candidates, such as texture and shape, using a set of low-level image processing algorithms. However, more and more DL techniques like CNNs have recently been utilized as feature descriptors in medical image analysis, and we also took advantage of deep CNN techniques in our work.
In the classification stage, various classifiers are utilized to discriminate between multiple objects on the basis of the features defined and extracted in the previous phase. Finally, the post-processing stage is needed to properly display the results, formulate diagnosis reports, or localize and annotate the diseases for further evaluation by medical physicians.
The purpose of this suggested CAD architecture is to serve as a roadmap for building versatile CAD systems in the future by reproducing, generalizing, and extending our work on automatic polyp detection systems.
Low-contrast images have pixel values concentrated in a narrow range, without much change in their levels, and are characterized by very narrow histogram peaks. Different stretching techniques have been developed to stretch this narrow range over the whole of the available dynamic range. Figure 3.3 shows the different histogram performance of three algorithms: contrast stretching, histogram equalization, and adaptive equalization.
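A minimal sketch of these three operations using scikit-image; the sample image and the parameter choices (percentile range, clip limit) are illustrative assumptions, not the exact settings used in our experiments:

import numpy as np
from skimage import data, exposure

# Any grayscale image serves as a stand-in for a colonoscopy frame.
img = data.camera()

# Contrast stretching: map the 2nd-98th percentile range onto the full range.
p2, p98 = np.percentile(img, (2, 98))
img_stretched = exposure.rescale_intensity(img, in_range=(p2, p98))

# Global histogram equalization: flatten the cumulative histogram.
img_eq = exposure.equalize_hist(img)

# Adaptive (CLAHE) equalization: equalize locally, limited by a clip limit.
img_adapteq = exposure.equalize_adapthist(img, clip_limit=0.03)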
• Calculate the covariance matrix, which measures how two different variables change together. The covariance between X and Y is given by formula (3.1), and the covariance matrix can then be computed in the form (3.2):
\[ \mathrm{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} \tag{3.1} \]

\[ C = \begin{pmatrix} \mathrm{cov}(x,x) & \mathrm{cov}(x,y) & \mathrm{cov}(x,z) \\ \mathrm{cov}(y,x) & \mathrm{cov}(y,y) & \mathrm{cov}(y,z) \\ \mathrm{cov}(z,x) & \mathrm{cov}(z,y) & \mathrm{cov}(z,z) \end{pmatrix} \tag{3.2} \]
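As a concrete check, NumPy's np.cov applies formula (3.1) pairwise, including the (n − 1) denominator, and returns a matrix of the form (3.2) directly (the observation values below are made up):

import numpy as np

# Each row is one variable (x, y, z); each column is one observation.
X = np.array([[2.5, 0.5, 2.2, 1.9],
              [2.4, 0.7, 2.9, 2.2],
              [1.2, 0.3, 1.4, 1.0]])

C = np.cov(X)   # 3x3 covariance matrix as in Eq. (3.2)
print(C)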
We can then relate each layer's activations to those of the previous layer via the following relation:

\[ a_j^i = \sigma\left( \sum_k w_{jk}^i a_k^{i-1} + b_j^i \right) \tag{3.3} \]

• \(a_j^i\) represents the activation value of the j-th neuron in the i-th layer.
The sigmoid function is a common choice of activation:

\[ a_j^i = \sigma(z_j^i) = \frac{1}{1 + e^{-z_j^i}} \tag{3.5} \]
The tanh function, with the mathematical form below, is a rescaled version of the sigmoid whose output range is (-1, 1):

\[ a_j^i = \sigma(z_j^i) = \tanh(z_j^i) \tag{3.6} \]
The ReLU function is the most popular choice for deeper architectures. It can be seen as a ramp function whose range runs from 0 to infinity, and it is much cheaper to compute than the sigmoid. Its biggest benefit is that it counteracts the vanishing gradient problem.
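For reference, the three activations described above can be written compactly as element-wise NumPy functions (a sketch):

import numpy as np

def sigmoid(z):
    # Eq. (3.5): squashes z into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Eq. (3.6): a rescaled sigmoid with output range (-1, 1).
    return np.tanh(z)

def relu(z):
    # Ramp function: 0 for negative inputs, identity for positive ones.
    return np.maximum(0.0, z)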
Here the softmax function takes as input a C-dimensional vector z and outputs a C-dimensional vector y of real values between 0 and 1:

\[ y_c = \frac{e^{z_c}}{\sum_{d=1}^{C} e^{z_d}} \]

The denominator \(\sum_{d=1}^{C} e^{z_d}\) acts as a normalizer, making sure that \(\sum_{c=1}^{C} y_c = 1\).
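A numerically stable sketch of this computation in NumPy (subtracting the maximum before exponentiating is a standard trick that leaves the output unchanged):

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # shift by max(z) for numerical stability
    return e / e.sum()          # outputs lie in (0, 1) and sum to 1

print(softmax(np.array([2.0, 1.0, 0.1])))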
Loss functions
A loss function, or cost function, is used for parameter estimation when training neural networks. The choice of loss function is an important aspect of designing a deep neural network. In our project, we make use of the cross-entropy loss function, which is defined as:

\[ L(X, Y) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y^{(i)} \ln a(x^{(i)}) + \left(1 - y^{(i)}\right) \ln\left(1 - a(x^{(i)})\right) \right] \tag{3.9} \]
Here \(X = \{x^{(1)}, \ldots, x^{(n)}\}\) is the set of input examples in the training dataset, and \(Y = \{y^{(1)}, \ldots, y^{(n)}\}\) is the corresponding set of labels for those input examples. \(a(x)\) is the output of the neural network given input x, which is typically restricted to the open interval (0, 1) by using a sigmoid (3.5) or softmax activation function.
Regularization
Regularization is a very important technique in neural network design to prevent over-
fitting. Regularization works by extending the loss function with a regularization
penalty (R(W )) as:
\[ L = \underbrace{L(X, Y)}_{\text{loss function}} + \underbrace{\lambda R(W)}_{\text{regularization penalty}} \tag{3.10} \]
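In Keras, such a penalty can be attached per layer. Below is a one-line sketch of an L2 penalty, where R(W) is the sum of squared weights scaled by λ (the value 0.01 is an illustrative assumption; note that the Keras 1 API used elsewhere in this thesis names the argument W_regularizer, whereas Keras 2 uses kernel_regularizer):

from keras.layers import Dense
from keras.regularizers import l2

# Adds lambda * sum(W^2) for this layer's weights to the loss of Eq. (3.10).
layer = Dense(64, activation='relu', kernel_regularizer=l2(0.01))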
Although the Adadelta algorithm strives to do away with learning rate tuning, in practice the issue isn't completely solved. Setting and tuning the constant \(\epsilon\) and decay rate \(\rho\) are still important and necessary in our work to achieve a sound performance curve, even though the adaptation can effectively counter the learning rate with its own scaling if the optimization directs it in that direction. The constant \(\epsilon\) can be considered the 'learning rate' of Adadelta, because it actually determines the update of \(\Delta x_t\), since \(RMS[\Delta x]_t = \sqrt{E[\Delta x^2]_t + \epsilon}\) and \(E[\Delta x^2]_t = \rho E[\Delta x^2]_{t-1} + (1 - \rho)\Delta x_t^2\), where RMS stands for root mean squared.
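For reference, one Adadelta update step for a single parameter tensor, following Zeiler [60] (a sketch; the rho and eps values are the paper's suggested defaults):

import numpy as np

def adadelta_step(x, g, Eg2, Edx2, rho=0.95, eps=1e-6):
    # Decaying average of squared gradients.
    Eg2 = rho * Eg2 + (1 - rho) * g ** 2
    # Scale the step by the ratio of the update RMS to the gradient RMS.
    dx = -np.sqrt(Edx2 + eps) / np.sqrt(Eg2 + eps) * g
    # Decaying average of squared updates.
    Edx2 = rho * Edx2 + (1 - rho) * dx ** 2
    return x + dx, Eg2, Edx2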
Backpropagation
Technically, the backpropagation algorithm is a supervised learning method for train-
ing the weights in multilayer feed-forward neural networks. The algorithm can be
divided into two phases: propagation and weight update.
The propagation phase covers 2 steps: first, forward propagation of a training input through the neural network, and then backward propagation of the generated deltas (the error between the targeted and actual output values). The weight update follows 2 steps as well: first, the weight's delta and input activation are multiplied to determine the gradient of the weight, and then a ratio of that gradient is subtracted from the weight.
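The two phases can be seen in a minimal NumPy sketch of a one-hidden-layer network trained on made-up data (sigmoid activations and a squared-error delta are used for brevity):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.RandomState(0)
X = rng.rand(4, 3)                        # 4 training inputs
y = np.array([[0.], [1.], [1.], [0.]])    # target outputs
W1, W2 = rng.randn(3, 5), rng.randn(5, 1)
eta = 0.5

for _ in range(1000):
    # Phase 1a: forward propagation of the training inputs.
    a1 = sigmoid(X @ W1)
    a2 = sigmoid(a1 @ W2)
    # Phase 1b: backward propagation of the deltas (targeted vs actual output).
    d2 = (a2 - y) * a2 * (1 - a2)
    d1 = (d2 @ W2.T) * a1 * (1 - a1)
    # Phase 2: gradient = input activation times delta; a ratio (eta)
    # of that gradient is subtracted from each weight.
    W2 -= eta * (a1.T @ d2)
    W1 -= eta * (X.T @ d1)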
Here the input volume size is represented as \(H_i \times W_i \times C_i\), and the kernel filter setting is \(F \times F \times C_i \times K\), where F stands for the size of the kernel, \(C_i\) is the number of channels of the kernel (which must equal the number of input channels), and K stands for the number of kernel filters. Given a stride of S and a zero-padding of P, the volume of the output maps \((H_o \times W_o \times C_o)\) is produced by the standard relations below:

\[ H_o = \frac{H_i - F + 2P}{S} + 1, \qquad W_o = \frac{W_i - F + 2P}{S} + 1, \qquad C_o = K \]
Figure 3.11: Max and average pooling examples for subsampling features.
In addition to max pooling, average pooling or even L2-norm pooling was often used historically. Average pooling has recently fallen out of favor compared to max pooling, which has been shown to work better in practice [26]. Nevertheless, we made use of both max pooling and average pooling methods in our neural networks, and our results showed that average pooling performed better than max pooling in some situations.
Instead of learning a transformation from x directly to f(x), in ResNet we compute the output y by adding f(x) to the identity x, as shown in Figure 3.13.
The residual network design addresses the problem of vanishing gradients in the simplest way possible; the main challenge in training deeper networks is that accuracy degrades with network depth. The concept of residual learning is a great innovation and has become one of the standard ways to build deep convolutional neural networks. Arguably, the ResNet model is now the best single CNN architecture for object detection, which is the main reason we chose this model for our work. Figure 3.14 illustrates the ResNet with 50 layers. ResNets use bottleneck blocks with different numbers of repetitions; they converge very fast and can be trained with hundreds or thousands of layers.
Chapter 4
Implementation and Results
as our hardware platform. Table 4.1 shows the basic configurations and the tested
configurations for our project.
• TensorFlow and Keras: We use TensorFlow [2], Google's open source numerical computation library, as the backend, with the Keras API on top of it to simplify building the deep learning models; both are described further in Appendix A.
• Other APIs: Besides the above libraries, we also utilize some other open source APIs that focus on more specific tasks, including OpenCV, Pandas, NumPy, Matplotlib, SciPy, H5py, and QtPy. For more details, please refer to Appendix A.
4.2 Input data preparation
• Positive patches: we extract a patch (300×300) which covers the whole polyp from every frame (574×500).
• Negative patches (non-polyp patches): we crop a region which does not contain any part of a polyp, or covers only a small part of one, from each frame.
Figure 4.1 illustrates the process of extracting positive and negative patches from a positive frame (containing a polyp).
After patch extraction, we apply data augmentation techniques (horizontal and vertical flips, random rotations, and so on) to artificially boost the number of positive and negative samples. Finally, we generate our new balanced dataset with 2200 training samples and 400 test samples. The positive and negative sets are equal in size, as shown in Table 4.2.
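A sketch of this augmentation step using the Keras ImageDataGenerator (the parameter values and the directory name are illustrative assumptions, not the exact settings of our pipeline):

from keras.preprocessing.image import ImageDataGenerator

# Random flips and rotations artificially enlarge the patch dataset.
datagen = ImageDataGenerator(horizontal_flip=True,
                             vertical_flip=True,
                             rotation_range=90)

# Typical use: stream augmented batches from a directory of patches.
# for batch_x, batch_y in datagen.flow_from_directory('patches/train'): ...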
• Accuracy: The proportion of all predictions that are correct; a measurement of how good the model is overall.
  Accuracy = (TP + TN) / (TP + TN + FP + FN)
• Precision: The proportion of all positive predictions that are correct; a measure of how many positive predictions were actual positive observations.
  Precision = TP / (TP + FP)
• Recall/Sensitivity: The proportion of all real positive observations that are correctly detected.
  Recall/Sensitivity = TP / (TP + FN)
• Specificity: The proportion of all real negative observations that are correctly detected.
  Specificity = TN / (TN + FP)
• F1-score: The harmonic mean of precision and recall.
  F1-score = 2 · (Precision · Recall) / (Precision + Recall)
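All five metrics above can be computed from a confusion matrix; a sketch with scikit-learn (the label vectors are made up, and since scikit-learn has no direct specificity helper, it is derived from the confusion matrix):

from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # ground truth (1 = polyp)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # classifier output

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
specificity = tn / float(tn + fp)

print(accuracy_score(y_true, y_pred), precision_score(y_true, y_pred),
      recall_score(y_true, y_pred), f1_score(y_true, y_pred), specificity)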
In practice, sensitivity/recall indicates how good a test is at detecting the positives, while specificity represents how good a test is at avoiding false alarms, and precision illustrates how many of the positively classified samples were relevant. Sensitivity, specificity, and precision are the most widely used performance metrics in the medical field.
Though SVM seems to be the most popular classifier and has achieved very good performance in image classification tasks according to our literature review, there is no single classification method that outperforms all others on all data sets. Therefore, we decided to evaluate 10 different state-of-the-art classifiers together to compare their performance on our own data set. The 10 classifiers are listed below.
• KNN: The K-nearest neighbors (KNN) classifier makes predictions based on the K nearest neighbors of each query point.
• Linear SVM: An implementation of SVM with a linear kernel, which reduces training and testing times and requires only one hyper-parameter C, trading off misclassification of training examples against simplicity of the decision surface.
• RBF SVM: Another implementation of SVM, with a radial basis function (RBF) kernel, requiring two parameters C and γ, where γ defines how much influence a single training example has.
• SGD: The stochastic gradient descent (SGD) classifier requires a number of hyper-parameters and is sensitive to feature scaling.
• GP: The Gaussian process (GP) classifier uses the information of all samples to perform the prediction.
• DT: The decision tree (DT) classifier is a non-parametric method using a tree-like decision model.
• RF: The random forest (RF) classifier is an ensemble of decision trees, where each tree is built or grown from a randomly selected subset of the training data.
• Naive Bayes: The naive Bayes classifier is a set of learning algorithms based on Bayes' theorem, with the naive assumption of independence between every pair of features.
Our TL framework builds on the pre-trained ResNet50 model, which offers state-of-the-art accuracy with a comparatively inexpensive architecture. These weights are ported from the ones released by Kaiming He under the MIT license.
• Learning rate (η): The learning rate is one of the most important and sensitive parameters; it multiplies the computed gradient in the update. A common approach is to start with a small learning rate and increase it exponentially if two epochs in a row reduce the error, while decreasing it rapidly if a significant error increase occurs.
• Decay rate (ρ): When training a deep neural network, it is necessary to lower the learning rate as training progresses by setting a proper decay rate. The learning rate determines how much an updating step influences the current value of the weights, while the weight decay is an additional term in the weight update rule that prevents over-fitting and leads to faster convergence. Adadelta [60] uses an exponential decaying method; the detailed algorithm was presented in Section 3.4.4.
• Batch size (Bs): In practice, batch size and learning rate are linked: if the batch size is too small, the gradients become more unstable and the learning rate needs to be reduced. Moreover, the higher the batch size, the more memory is needed. Due to the limits of our hardware configuration, the maximum batch size is 10 with a 224×224 input size, and 32 for 100×100 inputs.
• Input size (Is): The size images are resized to before being fed to the model. It is tightly coupled to the batch size through the GPU's capability, so we had to compromise on this setting because of the limitations of our hardware configuration, as mentioned above.
• Training epochs (Te): One epoch means one forward pass and one backward pass over all the training samples. Early stopping can be applied, given a large enough training dataset, along with a k-fold cross-validation strategy. Typically a patience number (the number of epochs to wait before stopping early if no progress is made on the validation set) should be defined first; it is often set somewhere between 3 and 20.
• Dropout rate (Dr): Dropout is a simple but quite effective way to regularize neural networks and address the over-fitting problem. It has been demonstrated that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification, and computational biology, obtaining state-of-the-art results on many benchmark data sets [48]. However, a rate that is too high can result in under-fitting as well, based on our experiments.
• Pooling size (Ps): In our neural networks, we utilized average pooling before the fully connected layers in order to reduce the resolution of the feature map while retaining the features required for classification through translational and rotational invariance. Its default filter size is 7 × 7, which should be carefully decreased to smaller sizes (2 × 2, 3 × 3, or 5 × 5) in order to fit different input image sizes.
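To make the roles of these hyper-parameters concrete, the sketch below shows where each one plugs into a Keras training setup (the values are examples from the ranges discussed above, not a prescription; the model object is assumed to be the assembled TL network with its Dropout and AveragePooling2D layers already in place):

from keras.optimizers import Adadelta

Is = (224, 224)   # input size
Bs = 10           # batch size (limited by GPU memory at this input size)
Te = 50           # training epochs
Dr = 0.8          # dropout rate used inside the model
Ps = (7, 7)       # average-pooling size before the FC layers

# Learning rate eta and the Adadelta decay constant rho.
opt = Adadelta(lr=0.05, rho=0.95)
# model.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(X_train, y_train, batch_size=Bs, nb_epoch=Te, validation_split=0.2)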
Table 4.6: The suggested setting ranges of hyper-parameters for the proposed TL framework.
Table 4.7 shows an example of the 3-fold CV process. The training data is split into 3 folds; folds 1-2 first become the training set, while fold 3 is denoted as the validation fold for tuning the hyper-parameters. Once the training process of the given epochs is completed, the model is tested a single time on the test data (marked yellow in the table). After running 3-fold CV we therefore obtain 3 different performance scores on the test dataset, which we can summarize using a mean and a standard deviation. The result is more accurate because the model is trained and evaluated multiple times on different data.
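The fold rotation itself can be expressed with scikit-learn's KFold; a self-contained sketch with a stand-in classifier and made-up data:

import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

X = np.random.RandomState(0).rand(15, 4)   # stand-in features
y = np.arange(15) % 2                      # stand-in labels

scores = []
for train_idx, val_idx in KFold(n_splits=3).split(X):
    clf = KNeighborsClassifier(n_neighbors=3).fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[val_idx], y[val_idx]))

# Summarize the three per-fold scores with a mean and a standard deviation.
print(np.mean(scores), np.std(scores))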
Figure 4.11: The impact of different input image sizes. Larger input sizes can dramatically improve the model's performance, but more system memory and training time are required as well.
Comparing Model-0 (100×100) and Model-3 (150×150) with Model-6 (224×224), the hyper-parameters have almost the same settings, but Model-6 achieved much better performance than Model-3 and Model-0: accuracy (92.75% vs 82.75% vs 75.25%), precision (90.91% vs 80.18% vs 77.60%), sensitivity (95.00% vs 87.00% vs 71.00%), and F1-score (92.91% vs 83.45% vs 74.15%), as shown in Figure 4.11. The best sensitivity rates achieved with input sizes of 100×100 and 150×150 are only 81.50% (Model-1) and 87.00% (Model-3) respectively among all the models, while for the input size of 224×224 we finally achieved 96.00% sensitivity and F1-score with Model-8.
Though larger input sizes improve the performance of the models, a longer time is required for each training epoch, and more system memory is occupied as well. Out-of-memory issues may occur when training DCNNs with overly large input sizes. In that case, either the input size and batch size must be decreased, or the system hardware configuration must be improved with more memory and more powerful GPUs.
By decreasing the dropout rate from 0.805 to 0.8, Model-8 mitigated the over-fitting problem and achieved better performance than Model-6, as shown in Table 4.9.
Meanwhile, from our observations, we also found that a bigger batch size can mitigate over-fitting to some extent, and that greater input sizes commonly require slightly higher dropout and decay rates to avoid over-fitting than smaller input sizes. However, too high a dropout or decay rate can result in under-fitting problems as well, as shown by the curves of Model-7 and Model-0.
However, through our experiments we observe, surprisingly, that increasing the number of folds or training epochs does not lead to an overall performance improvement, as seen by comparing Model-1 with Model-2, or Model-4 with Model-5 or Model-3. Meanwhile, the decay rates should also be set a little higher to avoid over-fitting if the training epochs are increased significantly.
4.5.4 Generalization
As we can observe in our experiments, the proposed TL models generalize quite well, given that the training accuracies are almost all 100%. First, the dropout strategy improves the generalization capability of our models substantially (with a dropout rate of 0.8 for Model-8). Second, in the ResNet structure, the batch normalization applied in the convolutional blocks also helps improve both training speed and generalization. Another important reason is that we replace the fully-connected layer of ResNet50 with a global average pooling layer followed by 2 FC layers before the softmax output layer, which greatly reduces the number of parameters. Thus, our TL DCNN models demonstrate very strong generalization capability with state-of-the-art performance.
4.5.5 Constraints
Since we are using a pre-trained model, the convolutional filters, the kernel size, and the number of layers are fixed in our TL architecture. We are also slightly constrained in terms of the model architecture; for example, we can't arbitrarily take out certain convolutional layers from ResNet50. However, the input layer can be customized to different image sizes thanks to parameter sharing.
Another constraint of our work is the hardware. When training a deep neural network, the system has to keep all the intermediate activation outputs for the backward pass. So we need to compute how much memory is required to store all the relevant activation outputs of the forward pass, in addition to other memory costs such as storing the weights on the GPU. Since our model is quite deep, with 50 layers, we have to use a smaller batch size, as we do not have enough system and GPU memory. For instance, we are not able to use a batch size over 10 given an input size of 224×224, due to our GPU's memory constraints, which actually limited our system's performance. In practice, especially in the case of deep learning with GPUs, larger batches are very attractive computationally, and it is very common to use larger batch sizes that fully leverage the GPU.
• Tuning some key hyper-parameters on a small subset of the database allows you to quickly establish a rough but very valuable tuning range for each parameter. The subset should be sub-sampled from your own entire dataset.
• Once you have established a rough tuning range for each hyper-parameter, you can conduct a further set of specific experiments within that range, but at a smaller scale, altering one parameter at a time.
• After the above two steps, you obtain both more accurate setting ranges for each parameter and highly valuable insights into the performance of your system under different settings.
In addition, from what we can observe over a large number of experiments, the system's test performance, in terms of accuracy, precision, sensitivity, specificity, and F1-score, can be significantly affected by just slight changes to several key hyper-parameters, such as the dropout rate, decay rate, and learning rate in our case. For instance, looking at Model-6, -7, and -8 in Table 4.9, Model-8 just slightly increased the learning rate from 0.049 to 0.05, decreased the dropout rate from 0.805 to 0.8, and kept the decay rate at 0.0025, the same as Model-7, yet surprisingly Model-8 yields much better results than Model-6 and Model-7.
All in all, DNN hyper-parameter tuning is still considered a 'dark art'; mastering it requires not only a solid background in machine learning algorithms, but also extensive experience working with real-world datasets.
Chapter 5
Conclusion and Future Work
5.1 Conclusion
In this thesis, we investigated various techniques and solutions for the automatic detection of polyps in endoscopic images. The goal of our study was to explore the use of cutting-edge machine learning, computer vision, and deep learning algorithms to achieve automated disease diagnosis.
We first studied and discussed work on topics related to automatic polyp detection in colon images. We consider shape and texture-based classification techniques (using classifiers such as SVM, KNN, etc.) as the conventional machine learning methods, to distinguish them from deep learning based ones. For traditional ML-based techniques, we first provided an overview of machine learning approaches with a brief discussion of different learning types such as supervised and unsupervised learning. Then we discussed the different feature extraction and classification algorithms utilized for polyp detection tasks, covering shape and texture-color based methods. As for DL-based techniques, we first studied a set of state-of-the-art deep learning networks such as AlexNet, VGG Net, GoogLeNet, and ResNet, which have demonstrated outstanding effectiveness in the image classification domain and can also be applied to medical image processing pipelines. Subsequently, CNN-based CAD systems along with pre-trained CNN techniques were discussed.
Based on our literature review, we first proposed three different schemes for the automatic detection of colorectal polyps, named the ML-framework, DL-framework, and TL-framework, standing respectively for the machine learning, deep learning, and transfer learning frameworks. We also provided a scalable CAD framework consisting of 4 flexible modules, based on the fusion of a set of state-of-the-art image processing algorithms, in order to generalize and extend our work on automatic polyp detection in the future with versatile capabilities in the medical domain. We then presented and analyzed various image preprocessing methods, including histogram modification, noise filtering, data augmentation, and dimension reduction. The next and most important part of our work relates to the detailed design methodologies of deep neural networks, which are also among our major contributions. We analyzed the cutting-edge techniques and algorithms necessary to build a highly effective deep learning network: the general neuron algorithm, the feed-forward network, activation and loss functions with regularization approaches, and gradient descent optimization algorithms with the backpropagation process. Lastly, we described in detail the key techniques for deep ConvNets, covering the convolution algorithm with stride and padding methods, different pooling techniques, and dropout methodologies. Finally, we analyzed the 50-layer ResNet architecture, the major deep learning model utilized in our transfer-learning framework.
In the implementation phase, we developed a set of software tools to extract patches from the ground-truth CVC-ColonDB, enlarged the data set with automatic augmentation algorithms, and finally produced our patch-balanced dataset, of sufficient size for our research and experiments. Meanwhile, we built 10 classifiers (Linear SVM, RBF SVM, KNN, RF, GP, SGD, MLP, AdaBoost, and Bayes) along with a set of low-level feature extractors (histograms and a set of different filters) to evaluate their performance for detecting polyps. We then established benchmarks from these experiments on our own dataset using these conventional machine learning methods, which can later be used as a comparison base against the DCNNs' performance.
Based on our extensive study and research of different cutting-edge DCNN techniques, we successfully developed an effective transfer learning architecture consisting of a new FCN classifier and input layer combined with a pre-trained 50-layer ResNet model. We implemented the proposed TL-framework in Python with TensorFlow and CUDA as the backend, to make the best use of the parallel computational power of GPUs.
DCNNs are very sensitive to the setting of their hyper-parameters. In our TL-framework, we expose 8 hyper-parameters: learning rate (η), decay rate (ρ), batch size (Bs), input size (Is), epoch number (Te), dropout rate (Dr), k-fold number (K), and pooling size (Ps). These hyper-parameters make our system very flexible and scalable. However, fine-tuning the hyper-parameters is a tricky process. Though there are automatic tuning approaches such as grid search, random search, Bayesian optimization, and TPE algorithms, all of these methods are either too costly and time-consuming or too difficult to apply to unique deep neural networks. Therefore, experimentation with hand-tuning is still the best approach to date for fine-tuning deep learning systems. In our work, we devised a highly effective hand-tuning strategy: first establishing a rough range for each hyper-parameter by conducting a set of quick experiments on a small sub-sampled training set, and then further fine-tuning each parameter on the whole dataset to determine a more accurate setting range. This hand-tuning method saved us a lot of time in searching for and selecting the most suitable hyper-parameter settings to obtain better performance in terms of accuracy, precision, sensitivity, and so on.
We finally achieved overall 96.00% detection accuracy and precision, 96.00% sensitivity and specificity, and a 96.00% F1-score using the proposed TL framework with our optimized hyper-parameters, which outperformed the traditional machine learning classification methods in each defined performance metric. Moreover, the proposed TL framework is scalable and flexible, so that it can easily be extended to include other types of disease detection in the future.
Appendix A
Required toolkits and libraries
Based on the specific requirements and time constraints of our project, we chose to use the Python programming language together with the toolkits and libraries described below.
Python:
Python is an interpreted, object-oriented, high-level programming language developed under an OSI-approved open source license, making it freely usable and distributable. Its high-level built-in data structures and dynamic typing and binding make it very attractive for rapid prototyping and application development, especially in the big data and deep learning domains. For more information, please refer to Python.org.
TensorFlow:
TensorFlow [2] is an open source Python library for fast numerical computing, created by Google and released under the Apache 2.0 open source license. It is a foundation library that can be used to create deep learning models directly, or through wrapper libraries such as Keras that simplify the process on top of TensorFlow. It can run on single-CPU systems and GPUs, as well as on mobile devices and large-scale distributed systems of hundreds of machines. Please refer to Tensorflow.org.
Keras:
Keras is an open source API written in Python which uses either Theano or TensorFlow as its backend. It was developed with a focus on enabling fast experimentation, so that it is easier to build complete solutions, and is easy to read, with a great selection of state-of-the-art algorithms (optimizers, normalization routines, activation functions). Please refer to Keras.io.
OpenCV:
OpenCV is a famous open source computer vision library. It is free for both commer-
cial and research use under a BSD license. The library is cross-platform, and runs on
Windows, Linux, Mac OS X, mobile Android and iOS with support of C/C++, Python
and Java interfaces. The library itself is written in C/C++, but Python bindings are
provided when running the installer. We utilized OpenCV 3.0 in our application. For
more details, please refer to OpenCV.org.
Scikit-learn:
Scikit-learn is an open source library built on Numpy, Scipy and Matplotlib. It is devel-
oped by a large community of developers and machine learning experts. Scikit-learn
provides a set of tools for many of the standard machine-learning tasks (such as clus-
tering, classification, regression, etc.). It can be commercially usable under BSD license.
For more details, please refer to Scikit-learn.org.
Others:
There are also some other open source APIs utilized in our applications that include
NumPy, SciPy, Matplotlib, Pandas, H5py, QtPy, etc. We will not present them here
in more detail, since it is convenient to get these resources on-line.
Appendix B
Code snippets for implementation

B.1 ResNet50 model
x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))
x = identity_block(x, 3, [64, 64, 256], stage=2, block='b')
x = identity_block(x, 3, [64, 64, 256], stage=2, block='c')

x = conv_block(x, 3, [128, 128, 512], stage=3, block='a')
x = identity_block(x, 3, [128, 128, 512], stage=3, block='b')
x = identity_block(x, 3, [128, 128, 512], stage=3, block='c')
x = identity_block(x, 3, [128, 128, 512], stage=3, block='d')

x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a')
x = identity_block(x, 3, [256, 256, 1024], stage=4, block='b')
x = identity_block(x, 3, [256, 256, 1024], stage=4, block='c')
x = identity_block(x, 3, [256, 256, 1024], stage=4, block='d')
x = identity_block(x, 3, [256, 256, 1024], stage=4, block='e')
x = identity_block(x, 3, [256, 256, 1024], stage=4, block='f')

x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a')
x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b')
x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c')

LISTING B.1: Create ResNet50 model with a customized top layer for
transfer learning (excerpt).
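The listing above shows only the residual stages. For context, a minimal sketch of how a pre-trained ResNet50 base can be combined with a new fully connected top in Keras is given below; the layer sizes, the two-class softmax output, and the use of keras.applications are illustrative assumptions, not the exact configuration of this work.

from keras.applications.resnet50 import ResNet50
from keras.layers import Flatten, Dense
from keras.models import Model

# load the convolutional base with ImageNet weights,
# dropping the original 1000-way classifier
base_model = ResNet50(weights='imagenet', include_top=False,
                      input_shape=(224, 224, 3))

# attach a new fully connected classifier head (sizes are illustrative)
x = Flatten()(base_model.output)
x = Dense(256, activation='relu')(x)
predictions = Dense(2, activation='softmax')(x)
model = Model(base_model.input, predictions)

# freeze the pre-trained layers before the first round of training
for layer in base_model.layers:
    layer.trainable = False

model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])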
def identity_block(input_tensor, kernel_size, filters, stage, block):
    """
    The identity_block is the block that has no conv layer at shortcut
    Arguments
        input_tensor: input tensor
        kernel_size: default 3, the kernel size of middle conv layer at main path
        filters: list of integers, the nb_filters of the 3 conv layers at main path
        stage: integer, current stage label, used for generating layer names
        block: 'a', 'b'..., current block label, used for generating layer names
    """
    nb_filter1, nb_filter2, nb_filter3 = filters
    bn_axis = 3
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = Convolution2D(nb_filter1, 1, 1, name=conv_name_base + '2a')(input_tensor)
    x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2a')(x)
    x = Activation('relu')(x)

    x = Convolution2D(nb_filter2, kernel_size, kernel_size,
                      border_mode='same', name=conv_name_base + '2b')(x)
    x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2b')(x)
    x = Activation('relu')(x)

    x = Convolution2D(nb_filter3, 1, 1, name=conv_name_base + '2c')(x)
    x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2c')(x)

    x = merge([x, input_tensor], mode='sum')
    x = Activation('relu')(x)
    return x
# (excerpt from conv_block: unlike identity_block, the shortcut branch
# applies its own convolution and batch normalization)
x = Convolution2D(nb_filter2, kernel_size, kernel_size, border_mode='same',
                  name=conv_name_base + '2b')(x)

shortcut = Convolution2D(nb_filter3, 1, 1, subsample=strides,
                         name=conv_name_base + '1')(input_tensor)
shortcut = BatchNormalization(axis=bn_axis, name=bn_name_base + '1')(shortcut)

x = merge([x, shortcut], mode='sum')
x = Activation('relu')(x)
return x
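These blocks use the Keras 1.x API (Convolution2D, border_mode, subsample, merge). For readers reproducing them on Keras 2, a minimal sketch of the equivalent calls, reusing the same variable names as above:

from keras.layers import Conv2D, add

# Keras 2 equivalents: padding replaces border_mode, strides replaces
# subsample, and the functional add() replaces merge(..., mode='sum')
x = Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
           name=conv_name_base + '2b')(x)
shortcut = Conv2D(nb_filter3, (1, 1), strides=strides,
                  name=conv_name_base + '1')(input_tensor)
x = add([x, shortcut])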
import numpy as np
import argparse
import imutils
import cv2
import os


def extract_color_histogram(image, bins=(8, 8, 8)):
    # extract a 3D color histogram from the HSV color space using
    # the supplied number of `bins` per channel
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, bins,
                        [0, 180, 0, 256, 0, 256])
    # (the snippet is truncated here in the source; a typical
    # continuation normalizes the histogram in place and returns
    # it flattened)
    cv2.normalize(hist, hist)
    return hist.flatten()
The code snippet in Listing B.5 implements the comparison experiments on the
10 suggested classifiers.
# import the necessary packages
from sklearn.model_selection import train_test_split
# ... (the remaining imports and the feature-extraction loop over the
# training images are omitted in this excerpt)

hist = extract_color_histogram(image)
data_test.append(hist)
labels_test.append(label)

# show an update every 100 images
if i > 0 and i % 100 == 0:
    print("Processed {}/{}".format(i, len(imageTestPaths)))

# encode the labels, converting them from strings to integers
le = LabelEncoder()
labels = le.fit_transform(labels)
labels_test = le.fit_transform(labels_test)

# partition the data into training and testing splits, using 75%
# of the data for training and the remaining 25% for testing
# print("[INFO] constructing training/testing split...")
# (trainData, testData, trainLabels, testLabels) = train_test_split(
#     np.array(data), labels, test_size=0.25, random_state=42)

X_train = np.array(data)
y_train = labels
X_test = np.array(data_test)
y_test = labels_test

names = ["Nearest Neighbors", "Linear SVM", "RBF SVM", "SGDClassifier",
         "Gaussian Process", "Decision Tree", "Random Forest",
         "MLPClassifier", "AdaBoost", "Naive Bayes"]

classifiers = [
    KNeighborsClassifier(59),
    LinearSVC(),
    SVC(kernel='poly', C=0.1, gamma=0.01, degree=3),
    SGDClassifier(loss="log", n_iter=10),
    GaussianProcessClassifier(1.0 * RBF(1.0), warm_start=True),
    DecisionTreeClassifier(max_depth=15),
    RandomForestClassifier(n_estimators=100, max_features='sqrt'),
    MLPClassifier(alpha=1),
    AdaBoostClassifier(learning_rate=0.1),
    GaussianNB()]

# cross-validation accuracy experiments
results = {}
for name, clf in zip(names, classifiers):
    scores = cross_val_score(clf, X_train, y_train, cv=5)
    results[name] = scores

for name, scores in results.items():
    print("%20s | Accuracy: %0.2f%% (+/- %0.2f%%)"
          % (name, 100 * scores.mean(), 100 * scores.std() * 2))

# iterate over classifiers by using fixed additional validation data
for name, model in zip(names, classifiers):
    print("Training and evaluating classifier {}".format(name))
    model.fit(X_train, y_train)

    predictions = model.predict(X_test)
    print(classification_report(y_test, predictions,
                                target_names=le.classes_))
# (excerpt from resizeImage(); the lines defining infile, outfile,
# size, extension and output_dir are omitted in the source)
if infile != outfile:
    try:
        im = Image.open(infile)
        im.thumbnail(size, Image.ANTIALIAS)
        im.save(output_dir + outfile + extension, "JPEG")
    except IOError:
        print("cannot reduce image for {}".format(infile))


if __name__ == "__main__":
    output_dir = "resized"
    dir = os.getcwd()

    if not os.path.exists(os.path.join(dir, output_dir)):
        os.mkdir(output_dir)

    for file in os.listdir(dir):
        resizeImage(file)
# mean (box) filter via 2D convolution
k = np.ones((5, 5)) / 25
b = sn.filters.convolve(a, k)
pyplot.subplot(1, 2, 2)
pyplot.imshow(b, cmap='gray')  # try cmap='bone_r' or other colormaps
pyplot.title('mean filter output')
pyplot.show()
# convert ndarray to an image and save it
b = scipy.misc.toimage(b)
b.save('mean_output.png')

# median filter output (the line computing b_median, presumably a call
# to sn.filters.median_filter, is missing in the source)
b_median = scipy.misc.toimage(b_median)
pyplot.subplot(1, 2, 1)
pyplot.imshow(b_median, 'gray')
pyplot.title('median filter output')
b_median.save('b_median.png')
# maximum filter (the beginning of this call is cut off in the source;
# sn.filters.maximum_filter with a size argument is assumed)
b_max = sn.filters.maximum_filter(a, size=5,
                                  output=None, mode='reflect',
                                  cval=0.0, origin=0)
b_max = scipy.misc.toimage(b_max)
pyplot.subplot(1, 2, 2)
pyplot.imshow(b_max, 'gray')
pyplot.title('max filter output')
b_max.save('b_max.png')
# Sobel and Prewitt edge filters (the lines computing b_edge,
# b_prewitt and b_hprewitt are missing in the source)
pyplot.imshow(b_edge, 'gray')
pyplot.title('sobel filter output')
b_edge = scipy.misc.toimage(b_edge)
b_edge.save('b_edge.png')

b_prewitt = scipy.misc.toimage(b_prewitt)
b_prewitt.save('b_prewitt.png')
b_hprewitt = scipy.misc.toimage(b_hprewitt)
b_hprewitt.save('b_hprewitt.png')
# canny and laplace filters
b_canny = feature.canny(a, sigma=0.1)
pyplot.subplot(1, 2, 1)
pyplot.imshow(b_canny, 'gray')
pyplot.title('canny filter output')
b_canny = scipy.misc.toimage(b_canny)
b_canny.save('b_canny.png')

# b_laplace = skimage.filters.laplace(a, ksize=3)
b_laplace = sn.filters.laplace(a, mode='reflect')
pyplot.subplot(1, 2, 2)
pyplot.imshow(b_laplace, 'gray')
pyplot.title('laplace filter output')
b_laplace = scipy.misc.toimage(b_laplace)
b_laplace.save('b_laplace.png')
# Histogram Equalization
# refer to: http://scikit-image.org/docs/dev/auto_examples
# (excerpt from a plotting helper; its function header and docstring
# are truncated in the source)
img = img_as_float(img)
ax_img, ax_hist = axes
ax_cdf = ax_hist.twinx()

# Display image
ax_img.imshow(img, cmap='gray')
ax_img.set_axis_off()
ax_img.set_adjustable('box-forced')

# Display histogram
ax_hist.hist(img.ravel(), bins=bins, histtype='step', color='black')
ax_hist.ticklabel_format(axis='y', style='scientific', scilimits=(0, 0))
ax_hist.set_xlabel('Pixel intensity')
ax_hist.set_xlim(0, 1)
ax_hist.set_yticks([])
ax_cdf.set_yticks([])
ax_hist.set_ylabel('Number of pixels')
ax_hist.set_yticks(np.linspace(0, y_max, 5))

img = a
# Contrast stretching
p2, p98 = np.percentile(img, (2, 98))
img_rescale = exposure.rescale_intensity(img, in_range=(p2, p98))
# Equalization
img_eq = exposure.equalize_hist(img)
# Adaptive Equalization
img_adapteq = exposure.equalize_adapthist(img, clip_limit=0.03)
# prevent overlap of y-axis labels
fig.tight_layout()
pyplot.show()
## image inverse transformation
# t(i,j) = L - 1 - I(i,j): transforms dark intensities to bright
# intensities and vice versa
im2 = 255 - imp
im3 = scipy.misc.toimage(im2)
im3.save('b_invers.png')
pyplot.subplot(1, 2, 1)
pyplot.imshow(im_pl, 'gray')
pyplot.title('img power-law output')
pyplot.subplot(1, 2, 2)
pyplot.imshow(im3, 'gray')
pyplot.title('img inverse output')

## image log transformation
p1 = imp1.astype(float)
p2 = numpy.max(p1)
c = (255.0 * numpy.log(1 + p1)) / numpy.log(1 + p2)
c1 = c.astype(int)
im_log = scipy.misc.toimage(c1)
im_log.save('b_logTrans.png')
pyplot.subplot(1, 2, 2)
pyplot.imshow(im_log, 'gray')
pyplot.title('img log trans output')
# Gamma correction
gamma_corrected = exposure.adjust_gamma(img, 2)
# Logarithmic correction
logarithmic_corrected = exposure.adjust_log(img, 1)
ax_img.set_title('Gamma correction')

# apply the Sobel filter to each RGB channel separately
@adapt_rgb(each_channel)
def sobel_each(image):
    return filters.sobel(image)
# apply the Sobel filter on the value channel of the HSV representation
@adapt_rgb(hsv_value)
def sobel_hsv(image):
    return filters.sobel(image)

# we use 1 - sobel_hsv(image), but this will not work if the image is
# not normalized
ax_hsv.imshow(rescale_intensity(1 - sobel_hsv(image)))
ax_hsv.set_xticks([]), ax_hsv.set_yticks([])
ax_hsv.set_title("Sobel filter\non each HSV")
ax_orig.imshow(image)
ax_orig.set_xticks([]), ax_orig.set_yticks([])
ax_orig.set_title("Original RGB image")
## Thresholding: create a binary image from a grayscale image
from skimage.filters import threshold_otsu, threshold_isodata, threshold_li

# image = skimage.color.rgb2gray(image)
# image = img_eq  # .astype(float)
thresh = threshold_otsu(image)
# thresh = threshold_li(image)
# thresh = threshold_isodata(image)
# (the binarization itself, e.g. binary = image > thresh, is not shown
# in the source)

ax[0].imshow(image, cmap=pyplot.cm.gray)
ax[0].set_title('Original')
ax[0].axis('off')
ax[1].hist(image.ravel(), bins=256)
ax[1].set_title('Histogram')
ax[1].axvline(thresh, color='r')
pyplot.show()
Bibliography
[16] Ross Girshick et al. “Rich feature hierarchies for accurate object detection and
semantic segmentation”. In: Proceedings of the IEEE conference on computer vision
and pattern recognition. 2014, pp. 580–587.
[17] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning.
http://www.deeplearningbook.org. MIT Press, 2016.
[18] Kaiming He et al. “Deep residual learning for image recognition”. In: Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition. 2016, pp. 770–
778.
[19] Geoffrey E Hinton, Simon Osindero, and Yee-Whye Teh. “A fast learning algo-
rithm for deep belief nets”. In: Neural computation 18.7 (2006), pp. 1527–1554.
[20] Chih-Wei Hsu, Chih-Chung Chang, Chih-Jen Lin, et al. “A practical guide to
support vector classification”. In: (2003).
[21] Wang Hui-Hui et al. “Audio signals encoding for cough classification using con-
volutional neural networks: A comparative study”. In: Bioinformatics and Biomedicine
(BIBM), 2015 IEEE International Conference on, pp. 442–445. DOI: 10.1109/BIBM.
2015.7359724.
[22] S. Hwang and M. E. Celebi. “Polyp detection in Wireless Capsule Endoscopy
videos based on image segmentation and geometric feature”. In: 2010 IEEE In-
ternational Conference on Acoustics, Speech and Signal Processing, pp. 678–681. ISBN:
1520-6149. DOI: 10.1109/ICASSP.2010.5495103.
[23] S. Hwang et al. “Polyp Detection in Colonoscopy Video using Elliptical Shape
Feature”. In: 2007 IEEE International Conference on Image Processing. Vol. 2, pp. II
–465–II –468. ISBN: 1522-4880. DOI: 10.1109/ICIP.2007.4379193.
[24] D. K. Iakovidis et al. “A comparative study of texture features for the discrimina-
tion of gastric polyps in endoscopic video”. In: 18th IEEE Symposium on Computer-
Based Medical Systems (CBMS’05), pp. 575–580. ISBN: 1063-7125. DOI: 10.1109/
CBMS.2005.6.
[25] Xiao Jia and Max Q-H Meng. “A deep convolutional neural network for bleeding
detection in Wireless Capsule Endoscopy images”. In: Engineering in Medicine
and Biology Society (EMBC), 2016 IEEE 38th Annual International Conference of the.
IEEE. 2016, pp. 639–642.
[26] Andrej Karpathy. “Stanford University CS231n: Convolutional Neural Networks
for Visual Recognition”. URL: http://cs231n.stanford.edu/syllabus.html.
[27] Diederik Kingma and Jimmy Ba. “Adam: A method for stochastic optimization”.
In: arXiv preprint arXiv:1412.6980 (2014).
[28] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. “Imagenet classifica-
tion with deep convolutional neural networks”. In: Advances in neural information
processing systems. 2012, pp. 1097–1105.
[29] Yann LeCun, Yoshua Bengio, et al. “Convolutional networks for images, speech,
and time series”. In: The handbook of brain theory and neural networks 3361.10 (1995),
p. 1995.
[30] B. Li, M. Q. H. Meng, and C. Hu. “A comparative study of endoscopic polyp de-
tection by textural features”. In: Intelligent Control and Automation (WCICA), 2012
10th World Congress on, pp. 4671–4675. DOI: 10.1109/WCICA.2012.6359363.
[31] Qinghui Liu and António L. L. Ramos. “Advances and Future Perspectives on
Computer-Aided Diagnosis: The Case of Automatic Polyp Detection Based on
Gastrointestinal Imaging”. In: The 21st International Conference on Emerging Trends
and Technologies in Designing Healthcare Systems. Society For Design And Process
Science. SDPS, 2016, pp. 216–222.
[32] Gilles Louppe. “Understanding random forests: From theory to practice”. In:
arXiv preprint arXiv:1407.7502 (2014).
[33] A. V. Mamonov et al. “Automated Polyp Detection in Colon Capsule Endoscopy”.
In: IEEE Transactions on Medical Imaging 33.7 (2014), pp. 1488–1502. ISSN: 0278-
0062. DOI: 10.1109/TMI.2014.2314959.
[34] Andriy Mnih and Geoffrey E Hinton. “A scalable hierarchical distributed lan-
guage model”. In: Advances in neural information processing systems. 2009, pp. 1081–
1088.
[35] Jonas Mockus. Bayesian approach to global optimization: theory and applications. Vol. 37.
Springer Science & Business Media, 2012.
[36] Ruwan Nawarathna et al. “Abnormal image detection in endoscopy videos us-
ing a filter bank and local binary patterns”. In: Neurocomputing 144 (2014), pp. 70–
91.
[37] Ruwan Dharshana Nawarathna et al. “Abnormal image detection using texton
method in wireless capsule endoscopy videos”. In: International Conference on
Medical Biometrics. Springer. 2010, pp. 153–162.
[38] Mengqi Pei et al. “Small bowel motility assessment based on fully convolutional
networks and long short-term memory”. In: Knowledge-Based Systems 121 (2017),
pp. 163–172.
[39] Otávio AB Penatti, Keiller Nogueira, and Jefersson A dos Santos. “Do deep fea-
tures generalize from everyday objects to remote sensing and aerial scenes do-
mains?” In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recog-
nition Workshops. 2015, pp. 44–51.
[40] E. Ribeiro, A. Uhl, and M. Häfner. “Colonic Polyp Classification with Convolu-
tional Neural Networks”. In: 2016 IEEE 29th International Symposium on Computer-
Based Medical Systems (CBMS), pp. 253–258. DOI: 10.1109/CBMS.2016.39.
[41] E. Ribeiro et al. “Colonic Polyp Classification with Convolutional Neural Net-
works”. In: 2016 IEEE 29th International Symposium on Computer-Based Medical
Systems (CBMS), pp. 253–258. DOI: 10.1109/CBMS.2016.39.
[42] Olga Russakovsky et al. “Imagenet large scale visual recognition challenge”. In:
International Journal of Computer Vision 115.3 (2015), pp. 211–252.
[43] Stuart Russell, Peter Norvig, and Artificial Intelligence. “A modern approach”.
In: Artificial Intelligence. Prentice-Hall, Englewood Cliffs 25 (1995), p. 27.
[53] N. Tajbakhsh et al. “Convolutional Neural Networks for Medical Image Analy-
sis: Full Training or Fine Tuning?” In: IEEE Transactions on Medical Imaging 35.5
(2016), pp. 1299–1312. ISSN: 0278-0062. DOI: 10.1109/TMI.2016.2535302.
[54] Pietro Valdastri, Massimiliano Simi, and Robert J. Webster. “Advanced Technolo-
gies for Gastrointestinal Endoscopy”. In: Annual Review of Biomedical Engineering
14.1 (2012), pp. 397–429. ISSN: 1523-9829. DOI: 10.1146/annurev-bioeng-071811-150006.
URL: http://dx.doi.org/10.1146/annurev-bioeng-071811-150006.
[55] Pascal Vincent et al. “Extracting and composing robust features with denoising
autoencoders”. In: Proceedings of the 25th international conference on Machine learn-
ing. ACM. 2008, pp. 1096–1103.
[56] Jason Weston et al. “Deep learning via semi-supervised embedding”. In: Neural
Networks: Tricks of the Trade. Springer, 2012, pp. 639–655.
[57] G Wimmer et al. “Convolutional Neural Network Architectures for the Auto-
mated Diagnosis of Celiac Disease”. In: International Workshop on Computer-Assisted
and Robotic Endoscopy. Springer. 2016, pp. 104–113.
[58] Xiang Wu, Ran He, and Zhenan Sun. “A lightened CNN for deep face represen-
tation”. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR). 2015.
[59] Yixuan Yuan and Max Q-H Meng. “Polyp classification based on bag of features
and saliency in wireless capsule endoscopy”. In: Robotics and Automation (ICRA),
2014 IEEE International Conference on. IEEE. 2014, pp. 3930–3935.
[60] Matthew D Zeiler. “ADADELTA: an adaptive learning rate method”. In: arXiv
preprint arXiv:1212.5701 (2012).
[61] Rongsheng Zhu, Rong Zhang, and Dixiu Xue. “Lesion detection of endoscopy
images based on convolutional neural network features”. In: Image and Signal
Processing (CISP), 2015 8th International Congress on. IEEE. 2015, pp. 372–376.