CS 231N Final Project Report: Cervical Cancer Screening
1. Abstract
at cervical cancer classification combined image features from the last fully connected layer of a pre-trained AlexNet with biological features extracted from a Pap smear to make the prediction [4]. Another group used features computed from images of cells from a cervix biopsy as input to a feed-forward neural network to predict the presence of cancer [5]. Other manual features from a Pap smear, such as grey level, wavelet, and grey-level co-occurrence matrix features, have been used for cancer detection [6]. Deep learning has also been used for other types of cancer detection. A convolutional neural network (CNN) following OxfordNet's structure was used to detect mammographic lesions [7]. A CNN with parameters pre-trained on a similar dataset was also used to differentiate between mammographic cysts and lesions [8]. A recent paper from a group of Stanford researchers that has excited the medical community uses a pre-trained Inception-v3 model and a hierarchical algorithm to classify different skin malignancies, with results comparable to expert dermatologists [9]. Another study analyzed colonoscopy video footage and used a CNN to compute image features, which were later used to predict bounding boxes for different polyps [10]. No pre-segmentation was used in one study of lung nodule classification, which used a CNN as a feature extractor [11].
Automated cervix and cervical cell segmentation is another important area of study. One method takes care to remove glare from the photo and uses K-nearest neighbors (KNN), with images pre-segmented by a distance metric based on the histogram of oriented gradients, to locate the most similar bounding boxes and average them [12]. A model by researchers at the Medical College of Georgia also used glare removal, K-means clustering, and texture features to segment the different cell types around the cervix [13]. A similar method fed color and cell area features into K-means to segment the cervix [14]. Another group performed cervix segmentation by first transforming the image from RGB to luminosity, red-green chromaticism, and blue-yellow chromaticism, then running K-means and selecting the largest region [15]. One group found that using a CNN to segment cervical cell cytoplasm and nuclei outperformed traditional filters and classification methods, especially when multiple cells were in the picture [16]. LeNet5 served as inspiration for another group's epithelial cell segmentation task [17]; they coped with dataset scarcity by extensively augmenting the dataset with flips and rotations. Similarly, a LeNet-like architecture was also used for segmentation of bones in x-rays using pixel-wise classification [18].
4. Dataset

Kaggle provides a dataset of approximately 1500 labeled cervix images. The images are graphic and may offend some viewers. Included in Figure 2 are several thumbnail-sized versions of the training data. Our training set contained a total of 1481 images (see Table 1 for a breakdown by type), while the test set contains 512 images whose labels are not publicly available. Notice that Type 2 makes up over half of the available training data, while Type 1 makes up only 17%. The images vary in size, but all are in color.

Table 1. Class distribution of the dataset

           Type 1   Type 2   Type 3
Count      251      782      451
Fraction   16.9%    52.7%    30.4%

Kaggle provides additional data for training, but the additional data is of low quality. Manual inspection of the data reveals that many images are duplicated, and some images are not even of cervixes (e.g., we found a picture of a woman's face, a picture of a finger, and a picture of some newsprint). We found that training on the additional data did not improve model performance; this is likely because the additional dataset is not drawn from the same distribution as the original training dataset. We excluded the additional data from our analysis for this reason.

For each experiment, we used either only the original dataset or the original dataset combined with the additional dataset, together with different data augmentation methods (see the Preprocessing section). We then randomly chose 10% of the labeled data for validation and used the rest for training.

In an attempt to visualize our dataset, we performed PCA on the raw image values to look for clustering and grouping by type, and also performed t-SNE analysis on the first 3 principal components using scikit-learn [23]. Unsurprisingly, as can be seen in Figures 3 and 4, the cervix types do not fall into clusters under either analysis, indicating that our input data points resemble each other.

Figure 3. Training data distribution over the first two principal components. No obvious clusters have emerged.
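For reference, a minimal sketch of this visualization step, assuming the images have already been resized and flattened into a matrix X of raw pixel values (one row per image); the function name is ours, not from the report:

```python
# Sketch of the PCA + t-SNE visualization described above.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def embed_for_plotting(X):
    """X: (n_samples, n_pixels) array of flattened raw images."""
    # Project the raw pixel values onto the first 3 principal components...
    pcs = PCA(n_components=3).fit_transform(X)
    # ...plot the first two directly (Figure 3), and run t-SNE on all
    # three components for a second 2-D embedding (Figure 4).
    return pcs[:, :2], TSNE(n_components=2).fit_transform(pcs)
```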
5. Methods

5.1. Preprocessing

Since the initial images provided were much too large (more than 2000 pixels on a side) as well as irregularly shaped, the first step was to crop each image to a square whose side length is that of the shorter initial side. Then, a 160x160 or 224x224 segment was cut from the center of the larger image. The assumption, which turns out to be true most of the time, is that the cervix, being the most important object, will be in the center of the image.

We attempted a variety of dataset augmentation methods to cope with the small dataset. We performed random horizontal and vertical flipping, 90° and 270° rotations, as well as random rotation, random cropping, and random scaling of the inputs.
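A minimal numpy sketch of these two steps follows; the 224-pixel crop size comes from the text above, while the 0.5 probabilities and function names are assumptions:

```python
# Sketch of the center-crop and flip/rotation augmentations described above.
import numpy as np

def center_crop(img, out_size=224):
    """Crop to a centered square with the shorter side's length, then cut
    an out_size x out_size segment from its center (assumes the square is
    at least out_size pixels on a side)."""
    h, w = img.shape[:2]
    s = min(h, w)
    top, left = (h - s) // 2, (w - s) // 2
    square = img[top:top + s, left:left + s]
    off = (s - out_size) // 2
    return square[off:off + out_size, off:off + out_size]

def augment(img, rng=np.random):
    """Random flips and 90/270-degree rotations."""
    if rng.rand() < 0.5:
        img = img[:, ::-1]                         # horizontal flip
    if rng.rand() < 0.5:
        img = img[::-1, :]                         # vertical flip
    if rng.rand() < 0.5:
        img = np.rot90(img, k=rng.choice([1, 3]))  # 90° or 270° rotation
    return img
```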
5.2. Segmentation

Although not the main focus of this project, given the attention paid to segmentation in the literature, we thought it best to make an effort to segment the cervix, which would help remove extraneous objects and tissues from the input. We took inspiration from [12, 13, 14, 15], who used K-means and KNN to aid their segmentation processes. Our segmentation pipeline is as follows: first, the image is run through scikit-image's SLIC segmentation algorithm, which uses K-means to create roughly K image patches based on proximity and color similarity [21, 24]. Then KNN is used to determine which of these patches is cervical tissue. While [12] used the relatively sophisticated histogram-of-oriented-gradients approach to find the patches closest to pre-segmented cervices, we did not have the luxury of many pre-segmented cervices. Instead, we manually segmented 10 random cervices and computed the average red, green, and blue values, giving us a 3-element feature vector. Then, to decide which of the K patches contained cervical tissue, we performed KNN using the average color vector of each patch as the feature. We took the M patches with the lowest distances, as well as any patches contained within those patches, and used them to create a binary mask. This final modification was necessary because the center of the cervix frequently had a redder color than could be represented by the average color vector, causing it to be mistakenly excluded. In practice, we used K = 10 and M = 5. The higher K and M are, the higher the chance of including cervical tissue, but also of including extraneous objects. Some successful and unsuccessful segmentations are shown in Figures 5-7.
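As a concrete illustration, here is a minimal sketch of this pipeline using scikit-image's SLIC and a single reference color. The reference RGB value below is a made-up placeholder (ours came from the 10 manually segmented images), and the step that also keeps patches contained within the selected ones is omitted for brevity:

```python
# Sketch of the SLIC (K-means) + nearest-color segmentation pipeline.
import numpy as np
from skimage.segmentation import slic

CERVIX_RGB = np.array([150.0, 80.0, 90.0])  # hypothetical reference color

def segment_cervix(image, k=10, m=5):
    """Keep the m superpixels whose mean color is closest to CERVIX_RGB."""
    # SLIC partitions the image into roughly k patches by proximity and color.
    labels = slic(image, n_segments=k, compactness=10)
    patch_ids = np.unique(labels)
    # Average R, G, B of each patch: one 3-element feature vector per patch.
    patch_colors = np.array([image[labels == p].mean(axis=0) for p in patch_ids])
    # Distance from each patch's mean color to the reference cervix color.
    dists = np.linalg.norm(patch_colors - CERVIX_RGB, axis=1)
    keep = patch_ids[np.argsort(dists)[:m]]  # the m closest patches
    mask = np.isin(labels, keep)             # binary cervix mask
    return image * mask[..., None]           # zero out everything else
```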
Figure 5. A successful segmentation. The initial image, the K-means patches, the KNN binary mask, and the final image. Note the glove and speculum are removed but the cervix remains.

Figure 6. A semi-successful segmentation. The initial image, the K-means patches, the KNN binary mask, and the final image. Perhaps the average color of the plastic was close enough to cervical tissue to be included.

Figure 7. A failed segmentation. The initial image, the K-means patches, the KNN binary mask, and the final image. Because the initial image was so zoomed in, the final segmentation actually lost tissue.
5.3. Model Architectures

We built two models from scratch for this project: CervixNet-1, a shallow net with two convolutional layers, and CervixNet-2, a deeper net with five convolutional layers.

5.3.1 CervixNet-1
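The report specifies only the depth of CervixNet-1, so the following tf.keras sketch should be read as one plausible instantiation: the filter counts, kernel sizes, pooling, and head are assumptions, with only the two-convolutional-layer structure taken from the text.

```python
import tensorflow as tf

def build_cervixnet1(input_shape=(160, 160, 3), num_classes=3):
    """A two-convolutional-layer classifier; all layer sizes are assumptions."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPool2D(),
        tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPool2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(num_classes),  # logits over the three cervix types
    ])
```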
In addition to the models we built ourselves, we experimented with two established architectures:

• ResNet v1 [19]
• Inception v2 [20]
6. Experiments

6.1. CervixNet-1

Table 2. Results of some of the best experiments on CervixNet-1

Figure 12. Loss curves for the best performing model.
To choose the learning rate and regularization strength, we trained the model for 50 iterations at 5 different learning rates distributed logarithmically from 10^-4 to 10^-2 and regularization strengths of 10^-3, 10^-4, and 10^-6. These cross-validation values were chosen based on the hyperparameter values reported in the paper [19]. To preprocess the data, images are randomly dilated (i.e., resized), randomly cropped, and randomly flipped [19]. This both augments the dataset and prevents over-fitting to irrelevant image features related to spatial location or image size.
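Concretely, the sweep amounts to a small grid; a sketch follows, with the actual 50-iteration training call replaced by a placeholder:

```python
# Sketch of the hyperparameter sweep described above: 5 log-spaced learning
# rates crossed with three regularization strengths.
import numpy as np
from itertools import product

learning_rates = np.logspace(-4, -2, num=5)   # 1e-4 ... 1e-2
reg_strengths = [1e-3, 1e-4, 1e-6]
for lr, reg in product(learning_rates, reg_strengths):
    # Placeholder for the 50-iteration training run at this setting.
    print(f"would train 50 iterations with lr={lr:.1e}, reg={reg:.0e}")
```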
Figure 13 shows a representative training loss curve for the ResNet training. The learning rate was annealed by a factor of 0.4 at iterations 100 and 200, and the logging frequency was reduced by a factor of 5 at iteration 230.
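For reference, this schedule can be expressed as a small function; the base learning rate here is a placeholder, not a value from the report:

```python
# Sketch of the step-wise annealing described above: the learning rate is
# multiplied by 0.4 at iterations 100 and 200.
def learning_rate(iteration, base_lr=1e-3):  # base_lr is an assumption
    lr = base_lr
    for boundary in (100, 200):
        if iteration >= boundary:
            lr *= 0.4
    return lr
```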
Though the training loss decreased appreciably over the first hundred iterations, the loss begins to plateau after iteration 100. Due to time and resource constraints, the ResNet model could only be trained for a limited number of gradient steps. It is likely that model performance could have been improved by training for more iterations. The decrease in loss over the first 100 iterations is likely due to the last fully connected layer training to fit the data; it yields significant progress relatively quickly. For the rest of the time, the entire model is training, and will likely take on the order of 10^4 iterations to fully converge.

Figure 14. Confusion matrix for 32 validation data points with the ResNet v1 model.

Table 3. ResNet predicted class distributions vs. actual class distributions

                                     Type 1   Type 2   Type 3
Average predicted class prevalence   0.499    0.454    0.046
Actual class prevalence              0.169    0.527    0.304
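One plausible way to compute the "average predicted class prevalence" row of Table 3, assuming it is the mean of the model's softmax outputs over the validation set (the report does not state the exact definition):

```python
import numpy as np

def average_predicted_prevalence(softmax_probs):
    """softmax_probs: (n_samples, 3) array of class probabilities.
    Returns the per-class mean probability, directly comparable to the
    actual class frequencies."""
    return softmax_probs.mean(axis=0)
```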
6.4. Inception
Table 5. Performance statistics for the best performing models of each architecture
Figure 15. Train (solid) and validation (dashed) plots for the different experiments listed in Table 2.
spent more time to rigorously cleanse the additional data. Batch normalization proved to increase performance on some models at the cost of lower speed. We could have tried weight normalization instead.

Our best training cross-entropy loss score of 0.817 puts us within the top 200 submissions on Kaggle. Given more time to experiment and refine, we expect this score could be improved.

We learned a lot from the project, both about image processing and deep neural networks. This was also the first Kaggle competition for all of our team members, and we all thought it was a fun experience. This motivates us not only to enter more Kaggle competitions in the future, but also to apply what we have learned in class to real-world problems.

References

[1] Ioffe, S., and Szegedy, C. "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift." arXiv preprint arXiv:1502.03167v3 (2015).
[2] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. "Dropout: A simple way to prevent neural networks from overfitting." Journal of Machine Learning Research 15 (2014): 1929-1958.
[3] MobileODT, Intel, and Kaggle Inc. "Intel & MobileODT Cervical Cancer Screening." www.kaggle.com/c/intel-mobileodt-cervical-cancer-screening (2017).
[4] Xu, Tao, et al. "Multimodal Deep Learning for Cervical Dysplasia Diagnosis." International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer International Publishing, 2016.
[5] Sokouti, Babak, Siamak Haghipour, and Ali Dastranj Tabrizi. "A framework for diagnosing cervical cancer disease based on feedforward MLP neural network and ThinPrep histopathological cell image features." Neural Computing and Applications 24.1 (2014): 221-232.
[6] Sukumar, P., and R. K. Gnanamurthy. "Computer aided detection of cervical cancer using PAP smear images based on hybrid classifier." International Journal of Applied Engineering Research 10.8 (2015): 21021-21032.
[7] Kooi, Thijs, et al. "Large scale deep learning for computer aided detection of mammographic lesions." Medical Image Analysis 35 (2017): 303-312.
[8] Kooi, Thijs, et al. "Discriminating solitary cysts from soft tissue lesions in mammography using a pretrained deep convolutional neural network." Medical Physics 44.3 (2017): 1017-1027.
[9] Esteva, Andre, et al. "Dermatologist-level classification of skin cancer with deep neural networks." Nature 542.7639 (2017): 115-118.
[10] Park, Sun Young, and Dusty Sargent. "Colonoscopic polyp detection using convolutional neural networks." SPIE Medical Imaging. International Society for Optics and Photonics, 2016.
[11] Shen, Wei, et al. "Multi-scale convolutional neural networks for lung nodule classification." International Conference on Information Processing in Medical Imaging. Springer International Publishing, 2015.
[12] Song, Dezhao, et al. "Multimodal Entity Coreference for Cervical Dysplasia Diagnosis." IEEE Transactions on Medical Imaging 34.1 (2015): 229-245.
[13] Li, Wenjing, et al. "Automated image analysis of uterine cervical images." Medical Imaging. International Society for Optics and Photonics, 2007.
[14] Srinivasan, Yeshwanth, et al. "A probabilistic approach to segmentation and classification of neoplasia in uterine cervix images using color and geometric features." Medical Imaging. International Society for Optics and Photonics, 2005.
[15] Das, Abhishek, Avijit Kar, and Debasis Bhattacharyya. "Elimination of specular reflection and identification of ROI: The first step in automated detection of Cervical Cancer using Digital Colposcopy." Imaging Systems and Techniques (IST), 2011 IEEE International Conference on. IEEE, 2011.
[16] Song, Youyi, et al. "A deep learning based framework for accurate segmentation of cervical cytoplasm and nuclei." Engineering in Medicine and Biology Society (EMBC), 2014 36th Annual International Conference of the IEEE. IEEE, 2014.
[17] Malon, Christopher, et al. "Identifying histological elements with convolutional neural networks." Proceedings of the 5th International Conference on Soft Computing as Transdisciplinary Science and Technology. ACM, 2008.
[18] Cernazanu-Glavan, Cosmin, and Stefan Holban. "Segmentation of bone structure in X-ray images using convolutional neural network." Advances in Electrical and Computer Engineering 13.1 (2013): 87-94.
[19] He, K., et al. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015).
[20] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. "Rethinking the inception architecture for computer vision." arXiv preprint arXiv:1512.00567 (2015).
[21] Achanta, Radhakrishna, et al. "SLIC Superpixels Compared to State-of-the-Art Superpixel Methods." IEEE Transactions on Pattern Analysis and Machine Intelligence, May 2012.
[22] Hinton, Geoffrey, Nitish Srivastava, and Kevin Swersky. "Neural Networks for Machine Learning, Lecture 6a: Overview of mini-batch gradient descent." (2012).
[23] Pedregosa, Fabian, et al. "Scikit-learn: Machine learning in Python." Journal of Machine Learning Research 12 (2011): 2825-2830.
[24] Van der Walt, Stefan, et al. "scikit-image: image processing in Python." PeerJ 2 (2014): e453.
[25] Abadi, Martín, et al. "TensorFlow: Large-scale machine learning on heterogeneous systems." (2015). Software available from tensorflow.org.