Combining Deep Learning With Traditional Features For Classification and

Simin He, Jun Ruan, Yi Long, Jianlian Wang, Junqiu Yue, Yanggeling Zhang, Chenchen Wu, Guanglu Ye, Jingfan Zhou
School of Information Engineering, Wuhan University of Technology, Wuhan, China
Department of Pathology, Hubei Cancer Hospital, Wuhan, China
e-mail: [email protected]
Abstract—The automatic classification of pathological images of breast cancer has important clinical value. To improve the accuracy and efficiency of cancer detection, we implement two classification approaches in this paper: (1) train deep convolutional neural network (CNN) models based on the AlexNet and GoogLeNet network structures; (2) apply transfer learning to complete the training of the classification models. We use CNNs to extract image features, select the most discriminative features to simplify the feature set, and combine them with texture features of the images. Finally, a Support Vector Machine (SVM) is used for feature learning and classification.

Keywords: CNN; transfer learning; texture features; feature learning

I. INTRODUCTION

Cancer is one of the major threats to human health and life, and breast cancer is the most common cancer among women. According to the overview of female breast cancer statistics, including data on incidence, mortality, survival, and screening, provided by the American Cancer Society, approximately 252,710 new cases of invasive breast cancer and 40,610 breast cancer deaths were expected to occur among US women in 2017. In China the situation is much the same: the incidence rate is rising year by year, and patients are becoming younger [1].

With the development of digital imaging technology, it has become possible to detect cancer in digital whole slide images (WSIs). Owing to advantages such as accurate measurement, labeling, and multi-slide viewing, WSI has played an important role in the development of digital pathology [2].

Automatic classification of pathological images of breast cancer is a very challenging task. First, the complex characteristics of pathological images, such as subtle differences among different images, overlapping cells, and uneven color distribution, bring great difficulties to image classification. Second, the lack of large-scale, open, labeled datasets hampers algorithm research [3]. Despite all this, scholars have conducted many studies on the automatic classification of pathological images of breast cancer.

In recent years, it has become a trend to use CNN features to accomplish the classification of pathological images. The main structural layers of a CNN are the convolutional layers and pooling layers, which extract and highlight features of the image; these layers can extract rich features from images automatically. When training a CNN model, a very common scenario is that there is not enough labeled data, and the generalization ability of a model trained on the existing dataset often fails to meet the requirements. At this point, transfer learning becomes an ideal solution. Although CNNs extract features very well, due to the high complexity of pathological images, combining texture features such as the Gray-Level Co-occurrence Matrix (GLCM) and Local Binary Pattern (LBP) to analyze pathological images can achieve better classification results [4]. Based on this, the main works of this article are as follows: (1) train classification models based on the AlexNet and GoogLeNet network structures; (2) take advantage of transfer learning to implement the feature extraction network; (3) propose a new method combining the features extracted by CNN with texture features, with an SVM classifier used for feature learning and classification; (4) choose the best classification solution to achieve automatic detection of cancer regions.

II. METHODS

A. Dataset

The experimental dataset consists of H&E (hematoxylin and eosin) stained histological WSIs of 121 cancer patients provided by Hubei Cancer Hospital. Each slide is stained with H&E, which renders the nuclei dark blue and the cytoplasm and tissue areas lavender. Each whole image is 70,000 × 80,000 pixels, scanned at 20× magnification. Each slide has a mask file containing the annotated cancer and normal areas, as shown in Figure 1.

(a) Tissue biopsy after H&E staining (b) WSI annotation information
Figure 1. Slide scanning and its annotation; the blue line (the outer line) indicates the cancer area, and the green line (the inner line) indicates the normal area.
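The grid-based patch sampling used throughout this pipeline can be sketched as a pure coordinate computation (a minimal illustration; the paper does not name a WSI library, and the function name is our own):

```python
def patch_grid(width, height, patch=256, stride=256):
    """Top-left coordinates of non-overlapping patches covering a slide.

    Patches are sampled on a rectangular grid, as in Figure 2; a patch
    that would overrun the slide border is skipped.
    """
    coords = []
    for y in range(0, height - patch + 1, stride):
        for x in range(0, width - patch + 1, stride):
            coords.append((x, y))
    return coords

# A 70,000 x 80,000 pixel slide yields a 273 x 312 grid of 256-pixel patches.
grid = patch_grid(70_000, 80_000)
```

In practice each coordinate would be passed to a WSI reader (e.g. an OpenSlide-style `read_region` call) to fetch the pixel data at 20× magnification.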
Figure 2. Classification framework. Square patches of 256×256 pixels are sampled on a rectangular grid. A 4096-dimensional or 1000-dimensional feature vector is then extracted from the CNN models for each patch, and a 100-dimensional vector is selected via feature selection for each image. The CNN, GLCM, and LBP features are combined into a higher-dimensional feature vector, or the CNN features are used directly for classification.
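The feature-fusion step in the framework above can be sketched as follows. The paper does not state which feature-selection method reduces the CNN vector to 100 dimensions, so variance ranking is used here purely as a placeholder:

```python
import numpy as np

def fuse_features(cnn_feats, glcm_feats, lbp_feats, k=100):
    """Select the k highest-variance CNN features and concatenate them
    with GLCM and LBP texture vectors, mirroring Figure 2.

    The variance criterion is an assumption: the paper only says
    "feature selection" without naming the method.
    """
    variances = cnn_feats.var(axis=0)
    top_k = np.argsort(variances)[::-1][:k]   # indices of the k most variable features
    selected = cnn_feats[:, top_k]            # shape (n_samples, k)
    return np.hstack([selected, glcm_feats, lbp_feats])

# e.g. 4096-dim AlexNet features + 8-dim GLCM + 10-dim LBP per sample
X = fuse_features(np.random.rand(50, 4096),
                  np.random.rand(50, 8),
                  np.random.rand(50, 10))
```

The fused vectors are then fed to the SVM described below; the texture-vector widths shown are illustrative, not taken from the paper.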
classifying, and in front of the SVM they are all used for extracting features. The open-source toolbox LIBLINEAR [10] is used to optimize the SVM, whose cost function is

\min_{w} \; \frac{1}{2} w^{T} w + C \sum_{i=1}^{l} \max\left(0,\; 1 - y_{i} w^{T} x_{i}\right),

and the radial basis function (RBF) kernel is K(X_i, X_j) = \exp\left(-\gamma \, \lVert X_i - X_j \rVert^{2}\right), \; \gamma > 0. The optimal values of the parameters C and γ are determined by cross-validation on the training data. The SVM parameters we set are shown in TABLE III.

TABLE III. PARAMETERS OF SVM

methods                          C      γ
AlexNet+SVM (AS)                 32.0   0.0078125
GoogLeNet+SVM (GS)               8.0    0.125
AlexNet+GLCM+LBP+SVM (AGLS)      2.0    0.03125
GoogLeNet+GLCM+LBP+SVM (GGLS)    32.0   0.03125

D. Segmentation

In large-scale histopathology images, the various types of cell regions are often contiguous; that is, cancer cells or normal cells lie in adjacent areas, showing a clustered rather than a scattered distribution. Therefore, each 256×256 pixels patch of a large image is strongly correlated with its surrounding patches. Based on this, a voting classification mechanism is used.

First, each histopathology image is divided into a set of overlapping square patches of 112×112 pixels with a stride of 8 at 10× magnification. The patches are then scaled to 224×224 pixels and fed to the classifier we trained. Since one 8×8 pixels block can be covered by many overlapping 112×112 pixels patches with different labels, the final label of each 8×8 pixels block is decided by the majority vote of the patches covering it. Finally, the classification results are compared with the pathologists' annotations to evaluate the accuracy of the segmentation.

III. RESULTS AND DISCUSSION

The WSIs were cut into small patches to obtain the training and testing sample sets. We first manually marked the cancer area in each WSI at 1.25× magnification and extracted positive and negative patches of 256×256 pixels at 20× magnification. We then screened the patches manually, removing noisy ones from the training set, and obtained 5,000 representative positive and negative patches from the WSIs. Data augmentation was then applied: the sample set was horizontally flipped and rotated clockwise by 90°, 180°, and 270°, making the new set 8 times the original data, for a total of 40,000 samples (20,000 images for training, 2,000 for validation, and 18,000 for testing).

A. Classification Results

All experiments are performed on the CAFFE platform. We select six datasets from the testing data to evaluate the classification schemes, each containing 3,000 images. The classification result for each patch is compared with its label to determine whether it is correct. TABLE IV shows the average accuracy of the experiments.
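The majority-vote mechanism of Section D can be sketched as vote accumulation over 8×8 blocks. The function signature and label encoding (0 = normal, 1 = cancer) are our own illustration, not taken from the paper:

```python
import numpy as np

def majority_vote(patch_labels, image_h, image_w, patch=112, stride=8):
    """Assign each 8x8 block the majority label of the overlapping
    112x112 patches that cover it, as described in Section D.

    patch_labels: dict mapping a patch's top-left (x, y) coordinate to
    the label (0 or 1) predicted by the trained classifier.
    Ties default to label 0 via argmax.
    """
    votes = np.zeros((image_h // stride, image_w // stride, 2), dtype=int)
    for (x, y), label in patch_labels.items():
        # every 8x8 block inside this patch receives one vote for its label
        for by in range(y // stride, (y + patch) // stride):
            for bx in range(x // stride, (x + patch) // stride):
                if by < votes.shape[0] and bx < votes.shape[1]:
                    votes[by, bx, label] += 1
    return votes.argmax(axis=-1)  # per-block final label map

# tiny example: one 16x16 patch labeled cancerous covers a 16x16 image
seg = majority_vote({(0, 0): 1}, 16, 16, patch=16, stride=8)
```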
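The 8× data augmentation described above (horizontal flip plus clockwise rotations of 90°, 180°, and 270°) can be sketched with NumPy. Grouping the eight variants as "original and flip, each at four rotations" is one reading of the paper's description:

```python
import numpy as np

def augment_8x(img):
    """Return the 8 augmentation variants of a patch: the original and
    its horizontal flip, each rotated clockwise by 0, 90, 180, 270 degrees.
    """
    variants = []
    for base in (img, np.fliplr(img)):
        for k in range(4):
            variants.append(np.rot90(base, k=-k))  # negative k rotates clockwise
    return variants

# an asymmetric 3x3 patch produces 8 distinct variants,
# so 5,000 patches expand to 40,000 samples as in the paper
aug = augment_8x(np.arange(9).reshape(3, 3))
```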
TABLE IV. THE ACCURACY OF CLASSIFICATION

classification method            patch classification accuracy
AlexNet                          93.40%
GoogLeNet                        94.12%
AlexNet+SVM (AS)                 94.79%
GoogLeNet+SVM (GS)               95.15%
AlexNet+GLCM+LBP+SVM (AGLS)      98.83%
GoogLeNet+GLCM+LBP+SVM (GGLS)    98.90%

As shown in TABLE IV, our AGLS reaches an accuracy of 98.83% on the testing data and GGLS reaches 98.90%, higher than the previous four methods. Although GGLS is slightly more accurate than AGLS, AlexNet, because of its simpler structure, is much faster than GoogLeNet in our experiments. Considering both accuracy and speed, this paper chooses AGLS for the final cancer detection task.

B. Segmentation Results

For each WSI, large-scale images were extracted for cancer detection. We then deepen the color of the detected cancer areas to make the segmentation results more visible. Finally, the segmentation results are compared with specialist pathologists' judgement, as shown in Figure 3.

Figure 3. Segmentation results on large-scale pathological images. (a) Original images. (b) Our segmentation results. (c) Heatmap for classification. (d) Ground truth from pathologists (the red line area represents the cancer area).

The comparison of the experimental results with expert diagnosis shows that our AGLS classification method is feasible for cancer detection, producing results very similar to those of professional pathologists.

IV. CONCLUSIONS

In the era of widespread digital pathological images, in order to reduce the workload of pathologists and support professional and reliable diagnosis, this paper combines computer image processing and deep learning to perform automatic cancer detection on WSIs. The main works of this paper are: (1) extract patches from WSIs to obtain the training and testing sets; (2) train CNN-based classification models to classify cancer areas; (3) train a feature extraction network via CNN and transfer learning, then combine CNN features with texture features to train an SVM classifier used for cancer detection, and our proposed classification method achieves higher accuracy; (4) choose the best classification solution to achieve automatic detection of cancer regions on large-scale pathological images via a sliding window and voting scoring, with final detection results that closely approach the diagnoses of professional pathologists.

ACKNOWLEDGMENT

This research was supported by the "Training Project for Young and Middle-aged Medical Talents" (to JQY) from the Health and Family Planning Commission of Wuhan City, China.

REFERENCES

[1] Carol E. DeSantis, Jiemin Ma, Ann Goding Sauer, et al. "Breast Cancer Statistics, 2017, Racial Disparity in Mortality by State".
[2] Fine J L, Grzybicki D M, Silowash R, et al. Evaluation of whole slide image immunohistochemistry interpretation in challenging prostate needle biopsies[J]. Human Pathology, 2008, 39(4):564-572.
[3] Kalkan H, Nap M, Duin R P W, et al. Automated classification of local patches in colon histopathology[C]// International Conference on Pattern Recognition. IEEE, 2012:61-64.
[4] Wan T, Cao J, Chen J, et al. Automated grading of breast cancer histopathology using cascaded ensemble with combination of multi-level image features[J]. Neurocomputing, 2017, 229(C):34-44.
[5] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[C]// International Conference on Neural Information Processing Systems. Curran Associates Inc., 2012:1097-1105.
[6] Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]// IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 2015:1-9.
[7] Pan S J, Yang Q. A survey on transfer learning[J]. IEEE Transactions on Knowledge & Data Engineering, 2010, 22(10):1345-1359.
[8] Russakovsky O, Deng J, Su H, et al. ImageNet Large Scale Visual Recognition Challenge[J]. International Journal of Computer Vision, 2015, 115(3):211-252.
[9] Xu Y, Jia Z, Wang L B, et al. Large scale tissue histopathology image classification, segmentation, and visualization via deep convolutional activation features[J]. BMC Bioinformatics, 2017, 18(1):281.
[10] Fan R E, Chang K W, Hsieh C J, Wang X R, Lin C J. LIBLINEAR: a library for large linear classification[J]. Journal of Machine Learning Research, 2008, 9:1871-1874.